mirror of
https://github.com/johnkerl/miller.git
synced 2026-01-24 02:36:15 +00:00
553 lines
No EOL
42 KiB
HTML
<!DOCTYPE html>

<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>DSL reference: output statements — Miller 6.0.0-alpha documentation</title>

<link rel="stylesheet" href="_static/scrolls.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="_static/print.css" type="text/css" />

<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/language_data.js"></script>
<script src="_static/theme_extras.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="DSL reference: unset statements" href="reference-dsl-unset-statements.html" />
<link rel="prev" title="DSL reference: built-in functions" href="reference-dsl-builtin-functions.html" />
</head><body>
<div id="content">
<div class="header">
<h1 class="heading"><a href="index.html"
title="back to the documentation overview"><span>DSL reference: output statements</span></a></h1>
</div>
<div class="relnav" role="navigation" aria-label="related navigation">
<a href="reference-dsl-builtin-functions.html">« DSL reference: built-in functions</a> |
<a href="#">DSL reference: output statements</a>
| <a href="reference-dsl-unset-statements.html">DSL reference: unset statements »</a>
</div>
<div id="contentwrapper">
<div id="toc" role="navigation" aria-label="table of contents navigation">
<h3>Table of Contents</h3>
<ul>
<li><a class="reference internal" href="#">DSL reference: output statements</a><ul>
<li><a class="reference internal" href="#print-statements">Print statements</a></li>
<li><a class="reference internal" href="#dump-statements">Dump statements</a></li>
<li><a class="reference internal" href="#tee-statements">Tee statements</a></li>
<li><a class="reference internal" href="#redirected-output-statements">Redirected-output statements</a></li>
<li><a class="reference internal" href="#emit-statements">Emit statements</a></li>
<li><a class="reference internal" href="#multi-emit-statements">Multi-emit statements</a></li>
<li><a class="reference internal" href="#emit-all-statements">Emit-all statements</a></li>
</ul>
</li>
</ul>

</div>
<div role="main">
<div class="section" id="dsl-reference-output-statements">
<h1>DSL reference: output statements<a class="headerlink" href="#dsl-reference-output-statements" title="Permalink to this headline">¶</a></h1>
<p>You can <strong>output</strong> variable-values or expressions in <strong>five ways</strong>:</p>
<ul class="simple">
<li><p><strong>Assign</strong> them to stream-record fields. For example, <code class="docutils literal notranslate"><span class="pre">$cumulative_sum</span> <span class="pre">=</span> <span class="pre">@sum</span></code>. For another example, <code class="docutils literal notranslate"><span class="pre">$nr</span> <span class="pre">=</span> <span class="pre">NR</span></code> adds a field named <code class="docutils literal notranslate"><span class="pre">nr</span></code> to each output record, containing the value of the built-in variable <code class="docutils literal notranslate"><span class="pre">NR</span></code> as of when that record was ingested.</p></li>
<li><p>Use the <strong>print</strong> or <strong>eprint</strong> keywords, which immediately print an expression <em>directly to standard output or standard error</em>, respectively. Note that <code class="docutils literal notranslate"><span class="pre">dump</span></code>, <code class="docutils literal notranslate"><span class="pre">edump</span></code>, <code class="docutils literal notranslate"><span class="pre">print</span></code>, and <code class="docutils literal notranslate"><span class="pre">eprint</span></code> don’t output records that participate in <code class="docutils literal notranslate"><span class="pre">then</span></code>-chaining; rather, they’re just immediate prints to stdout/stderr. The <code class="docutils literal notranslate"><span class="pre">printn</span></code> and <code class="docutils literal notranslate"><span class="pre">eprintn</span></code> keywords are the same except that they don’t print final newlines. Additionally, you can print to a specified file instead of stdout/stderr.</p></li>
<li><p>Use the <strong>dump</strong> or <strong>edump</strong> keywords, which <em>immediately print all out-of-stream variables as a JSON data structure to the standard output or standard error</em> (respectively).</p></li>
<li><p>Use <strong>tee</strong>, which formats the current stream record (not just an arbitrary string as with <strong>print</strong>) and writes it to a specific file.</p></li>
<li><p>Use <strong>emit</strong>/<strong>emitp</strong>/<strong>emitf</strong> to send out-of-stream variables’ current values to the output record stream, e.g. <code class="docutils literal notranslate"><span class="pre">@sum</span> <span class="pre">+=</span> <span class="pre">$x;</span> <span class="pre">emit</span> <span class="pre">@sum</span></code>, which produces an extra output record such as <code class="docutils literal notranslate"><span class="pre">sum=3.1648382</span></code>.</p></li>
</ul>
<p>For the first and last options you are populating the output-record stream, which feeds into the next verb in a <code class="docutils literal notranslate"><span class="pre">then</span></code>-chain (if any) or is otherwise formatted for output using <code class="docutils literal notranslate"><span class="pre">--o...</span></code> flags.</p>
<p>For the second, third, and fourth options you are sending output directly to standard output, standard error, or a file.</p>
<div class="section" id="print-statements">
<span id="reference-dsl-print-statements"></span><h2>Print statements<a class="headerlink" href="#print-statements" title="Permalink to this headline">¶</a></h2>
<p>The <code class="docutils literal notranslate"><span class="pre">print</span></code> statement is perhaps self-explanatory, but with a few light caveats:</p>
<ul class="simple">
<li><p>There are four variants: <code class="docutils literal notranslate"><span class="pre">print</span></code> goes to stdout with a final newline, <code class="docutils literal notranslate"><span class="pre">printn</span></code> goes to stdout without a final newline (you can include one using “\n” in your output string), <code class="docutils literal notranslate"><span class="pre">eprint</span></code> goes to stderr with a final newline, and <code class="docutils literal notranslate"><span class="pre">eprintn</span></code> goes to stderr without a final newline.</p></li>
<li><p>Output goes directly to stdout/stderr, respectively: data produced this way do not go downstream to the next verb in a <code class="docutils literal notranslate"><span class="pre">then</span></code>-chain. (Use <code class="docutils literal notranslate"><span class="pre">emit</span></code> for that.)</p></li>
<li><p>Print statements are for strings (<code class="docutils literal notranslate"><span class="pre">print</span> <span class="pre">"hello"</span></code>), or things which can be made into strings: numbers (<code class="docutils literal notranslate"><span class="pre">print</span> <span class="pre">3</span></code>, <code class="docutils literal notranslate"><span class="pre">print</span> <span class="pre">$a</span> <span class="pre">+</span> <span class="pre">$b</span></code>), or concatenations thereof (<code class="docutils literal notranslate"><span class="pre">print</span> <span class="pre">"a</span> <span class="pre">+</span> <span class="pre">b</span> <span class="pre">=</span> <span class="pre">"</span> <span class="pre">.</span> <span class="pre">($a</span> <span class="pre">+</span> <span class="pre">$b)</span></code>). Maps (in <code class="docutils literal notranslate"><span class="pre">$*</span></code>, map-valued out-of-stream or local variables, and map literals) aren’t convertible into strings. If you print a map, you get <code class="docutils literal notranslate"><span class="pre">{is-a-map}</span></code> as output. Please use <code class="docutils literal notranslate"><span class="pre">dump</span></code> to print maps.</p></li>
<li><p>You can redirect print output to a file: <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">--from</span> <span class="pre">myfile.dat</span> <span class="pre">put</span> <span class="pre">'print</span> <span class="pre">></span> <span class="pre">"tap.txt",</span> <span class="pre">$x'</span></code> or <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">--from</span> <span class="pre">myfile.dat</span> <span class="pre">put</span> <span class="pre">'o=$*;</span> <span class="pre">print</span> <span class="pre">></span> <span class="pre">$a.".txt",</span> <span class="pre">$x'</span></code>.</p></li>
<li><p>See also <a class="reference internal" href="#reference-dsl-redirected-output-statements"><span class="std std-ref">Redirected-output statements</span></a> for examples.</p></li>
</ul>
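<p>The four print variants differ only in their destination and in whether a final newline is appended. The Python sketch below models that table with in-memory streams. It is an illustration, not Miller’s implementation, and the names <code class="docutils literal notranslate"><span class="pre">VARIANTS</span></code> and <code class="docutils literal notranslate"><span class="pre">run_print</span></code> are hypothetical.</p>

```python
import io

# Hypothetical model (not Miller's code) of the four print variants:
# keyword -> (destination stream, whether a final newline is appended).
VARIANTS = {
    "print": ("stdout", True),
    "printn": ("stdout", False),
    "eprint": ("stderr", True),
    "eprintn": ("stderr", False),
}


def run_print(keyword, value, streams):
    """Stringify value and write it to the stream chosen by the variant."""
    dest, newline = VARIANTS[keyword]
    streams[dest].write(str(value) + ("\n" if newline else ""))


# In-memory stand-ins for the process's stdout and stderr.
streams = {"stdout": io.StringIO(), "stderr": io.StringIO()}
run_print("printn", "x=", streams)       # like: printn "x="
run_print("print", 3, streams)           # like: print 3 (numbers are stringified)
run_print("eprintn", "done\n", streams)  # the n-variants can embed their own "\n"
print(repr(streams["stdout"].getvalue()))  # 'x=3\n'
print(repr(streams["stderr"].getvalue()))  # 'done\n'
```

<p>Python’s <code class="docutils literal notranslate"><span class="pre">str</span></code> stands in for Miller’s own number-to-string formatting here, which may differ for floating-point values.</p>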

</div>
<div class="section" id="dump-statements">
<span id="reference-dsl-dump-statements"></span><h2>Dump statements<a class="headerlink" href="#dump-statements" title="Permalink to this headline">¶</a></h2>
<p>The <code class="docutils literal notranslate"><span class="pre">dump</span></code> statement is for printing expressions, including maps, directly to stdout or stderr:</p>
<ul class="simple">
<li><p>There are two variants: <code class="docutils literal notranslate"><span class="pre">dump</span></code> prints to stdout; <code class="docutils literal notranslate"><span class="pre">edump</span></code> prints to stderr.</p></li>
<li><p>Output goes directly to stdout/stderr, respectively: data produced this way do not go downstream to the next verb in a <code class="docutils literal notranslate"><span class="pre">then</span></code>-chain. (Use <code class="docutils literal notranslate"><span class="pre">emit</span></code> for that.)</p></li>
<li><p>You can use <code class="docutils literal notranslate"><span class="pre">dump</span></code> to output single strings, numbers, or expressions including map-valued data. Map-valued data are printed as JSON. Miller allows string and integer keys in its map literals while JSON allows only string keys, so use <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">put</span> <span class="pre">--jknquoteint</span></code> if you want integer-valued map keys printed without double quotes.</p></li>
<li><p>If you use <code class="docutils literal notranslate"><span class="pre">dump</span></code> (or <code class="docutils literal notranslate"><span class="pre">edump</span></code>) with no arguments, you get a JSON structure representing the current values of all out-of-stream variables.</p></li>
<li><p>As with <code class="docutils literal notranslate"><span class="pre">print</span></code>, you can redirect output to files.</p></li>
<li><p>See also <a class="reference internal" href="#reference-dsl-redirected-output-statements"><span class="std std-ref">Redirected-output statements</span></a> for examples.</p></li>
</ul>
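<p>To see why integer map keys come out double-quoted, recall that JSON object keys must be strings. The sketch below uses Python’s standard <code class="docutils literal notranslate"><span class="pre">json</span></code> module as a stand-in for Miller’s JSON writer. This substitution is an assumption for illustration only. A map that mixes string and integer keys, like the ones Miller allows, is encoded with every key quoted.</p>

```python
import json

# Hypothetical out-of-stream variables: Miller maps may mix string keys
# with integer keys, as @v[NR] = $* does with the integer NR.
oosvars = {"count": 2, "v": {1: {"a": "pan"}, 2: {"a": "eks"}}}

# JSON allows only string keys, so the encoder quotes the integer keys 1 and 2.
text = json.dumps(oosvars, indent=2)
print(text)
```

<p>Miller’s own indentation and number formatting may differ from Python’s. The point is only that the integer keys 1 and 2 become the JSON strings <code class="docutils literal notranslate"><span class="pre">"1"</span></code> and <code class="docutils literal notranslate"><span class="pre">"2"</span></code>.</p>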

</div>
<div class="section" id="tee-statements">
<h2>Tee statements<a class="headerlink" href="#tee-statements" title="Permalink to this headline">¶</a></h2>
<p>Records produced by <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">put</span></code> go downstream to the next verb in your <code class="docutils literal notranslate"><span class="pre">then</span></code>-chain, if any, or otherwise to standard output. If you also want to copy records out to files, you can do that using <code class="docutils literal notranslate"><span class="pre">tee</span></code>.</p>
<p>By example, the syntax is <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">--from</span> <span class="pre">myfile.dat</span> <span class="pre">put</span> <span class="pre">'tee</span> <span class="pre">></span> <span class="pre">"tap.dat",</span> <span class="pre">$*'</span> <span class="pre">then</span> <span class="pre">sort</span> <span class="pre">-n</span> <span class="pre">index</span></code>. First comes <code class="docutils literal notranslate"><span class="pre">tee</span> <span class="pre">></span></code>, then the filename expression (which can be an expression such as <code class="docutils literal notranslate"><span class="pre">"tap.".$a.".dat"</span></code>), then a comma, then <code class="docutils literal notranslate"><span class="pre">$*</span></code>. (Nothing but <code class="docutils literal notranslate"><span class="pre">$*</span></code> can be teed.)</p>
<p>See also <a class="reference internal" href="#reference-dsl-redirected-output-statements"><span class="std std-ref">Redirected-output statements</span></a> for examples.</p>
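<p>With <code class="docutils literal notranslate"><span class="pre">tee</span></code>, the tap file sees each record as it passes through <code class="docutils literal notranslate"><span class="pre">put</span></code>, and the same record also continues downstream. The Python sketch below models the <code class="docutils literal notranslate"><span class="pre">tee</span> <span class="pre">></span> <span class="pre">"tap.dat",</span> <span class="pre">$*</span></code> example followed by <code class="docutils literal notranslate"><span class="pre">then</span> <span class="pre">sort</span> <span class="pre">-n</span> <span class="pre">index</span></code>, using in-memory lists. The function name <code class="docutils literal notranslate"><span class="pre">put_with_tee</span></code> and the DKVP-style line format are assumptions for illustration.</p>

```python
import io


def put_with_tee(records, tap):
    """Hypothetical model of put 'tee > "tap.dat", $*': write each record
    to the tap immediately, and also pass it downstream unchanged."""
    downstream = []
    for rec in records:
        # DKVP-style line such as "index=2,a=pan" (format chosen for illustration).
        tap.write(",".join(f"{k}={v}" for k, v in rec.items()) + "\n")
        downstream.append(rec)
    return downstream


records = [{"index": 2, "a": "pan"}, {"index": 1, "a": "eks"}]
tap = io.StringIO()  # stands in for the file tap.dat
# "then sort -n index" runs after put, so it reorders only the downstream records.
out = sorted(put_with_tee(records, tap), key=lambda r: r["index"])
print(tap.getvalue(), end="")     # the tap keeps input order: index=2 first
print([r["index"] for r in out])  # [1, 2]
```

<p>The tap therefore holds the records in input order, while the downstream records are sorted by the next verb.</p>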

</div>
<div class="section" id="redirected-output-statements">
<span id="reference-dsl-redirected-output-statements"></span><h2>Redirected-output statements<a class="headerlink" href="#redirected-output-statements" title="Permalink to this headline">¶</a></h2>
<p>The <strong>print</strong>, <strong>dump</strong>, <strong>tee</strong>, <strong>emitf</strong>, <strong>emit</strong>, and <strong>emitp</strong> keywords all allow you to redirect output to one or more files or pipe-to commands. The filenames/commands are strings which can be constructed using record-dependent values, so you can do things like splitting a table into multiple files, one for each account ID, and so on.</p>
<p>Details:</p>
<ul class="simple">
<li><p>The <code class="docutils literal notranslate"><span class="pre">print</span></code> and <code class="docutils literal notranslate"><span class="pre">dump</span></code> keywords produce output immediately to standard output, or to the specified file(s) or pipe-to command if a redirect is present.</p></li>
</ul>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll">mlr help keyword print
</span>print: prints expression immediately to stdout.

Example: mlr --from f.dat put -q 'print "The sum of x and y is ".($x+$y)'
Example: mlr --from f.dat put -q 'for (k, v in $*) { print k . " => " . v }'
Example: mlr --from f.dat put '(NR %% 1000 == 0) { print > stderr, "Checkpoint ".NR}'
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll">mlr help keyword dump
</span>dump: prints all currently defined out-of-stream variables immediately
to stdout as JSON.

With >, >>, or |, the data do not become part of the output record stream but
are instead redirected.

The > and >> are for write and append, as in the shell, but (as with awk) the
file-overwrite for > is on first write, not per record. The | is for piping to
a process which will process the data. There will be one open file for each
distinct file name (for > and >>) or one subordinate process for each distinct
value of the piped-to command (for |). Output-formatting flags are taken from
the main command line.

Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump }'
Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump > "mytap.dat"}'
Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump >> "mytap.dat"}'
Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump | "jq .[]"}'
</pre></div>
</div>
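<p>The redirect rules quoted in the help text above can be modeled in a few lines. There is one open file per distinct name, and <code class="docutils literal notranslate"><span class="pre">></span></code> overwrites on the first write rather than on every record. This Python sketch is hypothetical: <code class="docutils literal notranslate"><span class="pre">Redirector</span></code> is an illustrative name, <code class="docutils literal notranslate"><span class="pre">io.StringIO</span></code> objects stand in for files on disk, and pipes (<code class="docutils literal notranslate"><span class="pre">|</span></code>) are not modeled.</p>

```python
import io


class Redirector:
    """Hypothetical model of the > and >> redirects: one handle per
    distinct file name, and > truncates on the first write only."""

    def __init__(self, files):
        self.files = files  # name -> io.StringIO standing in for a file
        self.handles = {}   # names already opened during this run

    def write(self, mode, name, text):
        if name not in self.handles:
            if mode == ">":
                # Overwrite happens once, when the name is first written.
                self.files[name] = io.StringIO()
            elif mode == ">>":
                # Append: keep any existing contents and move to the end.
                self.files.setdefault(name, io.StringIO()).seek(0, io.SEEK_END)
            else:
                raise ValueError(f"unsupported mode {mode!r}")
            self.handles[name] = self.files[name]
        self.handles[name].write(text)


# Two files that already exist before the run.
files = {"tap-pan.dat": io.StringIO("old\n"), "log.dat": io.StringIO("old\n")}
r = Redirector(files)
for a, x in [("pan", 1), ("eks", 2), ("pan", 3)]:
    r.write(">", "tap-" + a + ".dat", f"x={x}\n")  # like: > "tap-".$a.".dat"
r.write(">>", "log.dat", "new\n")                   # like: >> "log.dat"

print(repr(files["tap-pan.dat"].getvalue()))  # 'x=1\nx=3\n'
print(repr(files["tap-eks.dat"].getvalue()))  # 'x=2\n'
print(repr(files["log.dat"].getvalue()))      # 'old\nnew\n'
```

<p>With per-record truncation, <code class="docutils literal notranslate"><span class="pre">tap-pan.dat</span></code> would end up holding only <code class="docutils literal notranslate"><span class="pre">x=3</span></code>. The first-write rule keeps every record written during the run.</p>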
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">put</span></code> sends the current record (possibly modified by the <code class="docutils literal notranslate"><span class="pre">put</span></code> expression) to the output record stream. Records are then input to the following verb in a <code class="docutils literal notranslate"><span class="pre">then</span></code>-chain (if any), else printed to standard output (unless <code class="docutils literal notranslate"><span class="pre">put</span> <span class="pre">-q</span></code>). The <strong>tee</strong> keyword <em>additionally</em> writes the output record to specified file(s) or pipe-to command, or immediately to <code class="docutils literal notranslate"><span class="pre">stdout</span></code>/<code class="docutils literal notranslate"><span class="pre">stderr</span></code>.</p></li>
</ul>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll">mlr help keyword tee
</span>tee: prints the current record to specified file.
This is an immediate print to the specified file (except for pprint format
which of course waits until the end of the input stream to format all output).

The > and >> are for write and append, as in the shell, but (as with awk) the
file-overwrite for > is on first write, not per record. The | is for piping to
a process which will process the data. There will be one open file for each
distinct file name (for > and >>) or one subordinate process for each distinct
value of the piped-to command (for |). Output-formatting flags are taken from
the main command line.

You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
etc., to control the format of the output. See also mlr -h.

emit with redirect and tee with redirect are identical, except tee can only
output $*.

Example: mlr --from f.dat put 'tee > "/tmp/data-".$a, $*'
Example: mlr --from f.dat put 'tee >> "/tmp/data-".$a.$b, $*'
Example: mlr --from f.dat put 'tee > stderr, $*'
Example: mlr --from f.dat put -q 'tee | "tr \[a-z\\] \[A-Z\\]", $*'
Example: mlr --from f.dat put -q 'tee | "tr \[a-z\\] \[A-Z\\] > /tmp/data-".$a, $*'
Example: mlr --from f.dat put -q 'tee | "gzip > /tmp/data-".$a.".gz", $*'
Example: mlr --from f.dat put -q --ojson 'tee | "gzip > /tmp/data-".$a.".gz", $*'
</pre></div>
</div>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">put</span></code>’s <code class="docutils literal notranslate"><span class="pre">emitf</span></code>, <code class="docutils literal notranslate"><span class="pre">emitp</span></code>, and <code class="docutils literal notranslate"><span class="pre">emit</span></code> send out-of-stream variables to the output record stream. These are then input to the following verb in a <code class="docutils literal notranslate"><span class="pre">then</span></code>-chain (if any), else printed to standard output. When redirected with <code class="docutils literal notranslate"><span class="pre">></span></code>, <code class="docutils literal notranslate"><span class="pre">>></span></code>, or <code class="docutils literal notranslate"><span class="pre">|</span></code>, they <em>instead</em> write the out-of-stream variable(s) to specified file(s) or pipe-to command, or immediately to <code class="docutils literal notranslate"><span class="pre">stdout</span></code>/<code class="docutils literal notranslate"><span class="pre">stderr</span></code>.</p></li>
</ul>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll">mlr help keyword emitf
</span>emitf: inserts non-indexed out-of-stream variable(s) side-by-side into the
output record stream.

With >, >>, or |, the data do not become part of the output record stream but
are instead redirected.

The > and >> are for write and append, as in the shell, but (as with awk) the
file-overwrite for > is on first write, not per record. The | is for piping to
a process which will process the data. There will be one open file for each
distinct file name (for > and >>) or one subordinate process for each distinct
value of the piped-to command (for |). Output-formatting flags are taken from
the main command line.

You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
etc., to control the format of the output if the output is redirected. See also mlr -h.

Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf @a'
Example: mlr --from f.dat put --oxtab '@a=$i;@b+=$x;@c+=$y; emitf > "tap-".$i.".dat", @a'
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf @a, @b, @c'
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf > "mytap.dat", @a, @b, @c'
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf >> "mytap.dat", @a, @b, @c'
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf > stderr, @a, @b, @c'
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern", @a, @b, @c'
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern > mytap.dat", @a, @b, @c'

Please see https://johnkerl.org/miller6://johnkerl.org/miller/doc for more information.
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll">mlr help keyword emitp
</span>emitp: inserts an out-of-stream variable into the output record stream.
Hashmap indices present in the data but not slotted by emitp arguments are
output concatenated with ":".

With >, >>, or |, the data do not become part of the output record stream but
are instead redirected.

The > and >> are for write and append, as in the shell, but (as with awk) the
file-overwrite for > is on first write, not per record. The | is for piping to
a process which will process the data. There will be one open file for each
distinct file name (for > and >>) or one subordinate process for each distinct
value of the piped-to command (for |). Output-formatting flags are taken from
the main command line.

You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
etc., to control the format of the output if the output is redirected. See also mlr -h.

Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @sums'
Example: mlr --from f.dat put --opprint '@sums[$a][$b]+=$x; emitp > "tap-".$a.$b.".dat", @sums'
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @sums, "index1", "index2"'
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @*, "index1", "index2"'
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp > "mytap.dat", @*, "index1", "index2"'
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp >> "mytap.dat", @*, "index1", "index2"'
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "gzip > mytap.dat.gz", @*, "index1", "index2"'
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp > stderr, @*, "index1", "index2"'
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "grep somepattern", @*, "index1", "index2"'

Please see https://johnkerl.org/miller6://johnkerl.org/miller/doc for more information.
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll">mlr help keyword emit
</span>emit: inserts an out-of-stream variable into the output record stream. Hashmap
indices present in the data but not slotted by emit arguments are not output.

With >, >>, or |, the data do not become part of the output record stream but
are instead redirected.

The > and >> are for write and append, as in the shell, but (as with awk) the
file-overwrite for > is on first write, not per record. The | is for piping to
a process which will process the data. There will be one open file for each
distinct file name (for > and >>) or one subordinate process for each distinct
value of the piped-to command (for |). Output-formatting flags are taken from
the main command line.

You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
etc., to control the format of the output if the output is redirected. See also mlr -h.

Example: mlr --from f.dat put 'emit > "/tmp/data-".$a, $*'
Example: mlr --from f.dat put 'emit > "/tmp/data-".$a, mapexcept($*, "a")'
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @sums'
Example: mlr --from f.dat put --ojson '@sums[$a][$b]+=$x; emit > "tap-".$a.$b.".dat", @sums'
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @sums, "index1", "index2"'
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @*, "index1", "index2"'
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit > "mytap.dat", @*, "index1", "index2"'
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit >> "mytap.dat", @*, "index1", "index2"'
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "gzip > mytap.dat.gz", @*, "index1", "index2"'
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit > stderr, @*, "index1", "index2"'
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "grep somepattern", @*, "index1", "index2"'

Please see https://johnkerl.org/miller6://johnkerl.org/miller/doc for more information.
</pre></div>
</div>
</div>
<div class="section" id="emit-statements">
<span id="reference-dsl-emit-statements"></span><h2>Emit statements<a class="headerlink" href="#emit-statements" title="Permalink to this headline">¶</a></h2>
<p>There are three variants: <code class="docutils literal notranslate"><span class="pre">emitf</span></code>, <code class="docutils literal notranslate"><span class="pre">emit</span></code>, and <code class="docutils literal notranslate"><span class="pre">emitp</span></code>. Keep in mind that out-of-stream variables are a nested, multi-level hashmap (directly viewable as JSON using <code class="docutils literal notranslate"><span class="pre">dump</span></code>), whereas Miller output records are lists of single-level key-value pairs. The three emit variants allow you to control how the multilevel hashmaps are flattened down to output records. You can emit any map-valued expression, including <code class="docutils literal notranslate"><span class="pre">$*</span></code>, map-valued out-of-stream variables, the entire out-of-stream-variable collection <code class="docutils literal notranslate"><span class="pre">@*</span></code>, map-valued local variables, map literals, or map-valued function return values.</p>
<p>Use <strong>emitf</strong> to output several out-of-stream variables side-by-side in the same output record. With <code class="docutils literal notranslate"><span class="pre">emitf</span></code>, these variables must not be indexed using <code class="docutils literal notranslate"><span class="pre">@name[...]</span></code>. Example:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll">mlr put -q '
</span><span class="hll">  @count += 1;
</span><span class="hll">  @x_sum += $x;
</span><span class="hll">  @y_sum += $y;
</span><span class="hll">  end { emitf @count, @x_sum, @y_sum}
</span><span class="hll">' data/small
</span>count=5,x_sum=2.264761728567491,y_sum=2.585085709781158
</pre></div>
</div>
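<p>Conceptually, <code class="docutils literal notranslate"><span class="pre">emitf</span></code> collects several non-indexed out-of-stream variables into one record, in the order they are named. The Python sketch below is a hypothetical model: its function <code class="docutils literal notranslate"><span class="pre">emitf</span></code> is not Miller’s API. The sketch raises <code class="docutils literal notranslate"><span class="pre">TypeError</span></code> for map-valued variables, which is an assumed stand-in for the rule that <code class="docutils literal notranslate"><span class="pre">emitf</span></code> variables must not be indexed. It replays the accumulation from the example above on the <code class="docutils literal notranslate"><span class="pre">data/small</span></code> values.</p>

```python
def emitf(oosvars, *names):
    """Hypothetical model of emitf: put the named non-indexed
    out-of-stream variables side by side in ONE output record."""
    record = {}
    for name in names:
        value = oosvars[name]
        if isinstance(value, dict):
            # Assumed stand-in for the rule that emitf variables must not be indexed.
            raise TypeError(f"@{name} is indexed; use emit or emitp instead")
        record[name] = value
    return record


# (x, y) pairs from data/small, accumulated as in the example above:
# @count += 1; @x_sum += $x; @y_sum += $y
rows = [
    (0.3467901443380824, 0.7268028627434533),
    (0.7586799647899636, 0.5221511083334797),
    (0.20460330576630303, 0.33831852551664776),
    (0.38139939387114097, 0.13418874328430463),
    (0.5732889198020006, 0.8636244699032729),
]
oosvars = {"count": 0, "x_sum": 0.0, "y_sum": 0.0}
for x, y in rows:
    oosvars["count"] += 1
    oosvars["x_sum"] += x
    oosvars["y_sum"] += y

# One record with three fields, like: end { emitf @count, @x_sum, @y_sum }
print(emitf(oosvars, "count", "x_sum", "y_sum"))
```
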
<p>Use <strong>emit</strong> to output an out-of-stream variable. If it’s non-indexed, you’ll get a simple key-value pair:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll">cat data/small
</span>a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll">mlr put -q '@sum += $x; end { dump }' data/small
</span>{
  "sum": 2.264761728567491
}
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll">mlr put -q '@sum += $x; end { emit @sum }' data/small
</span>sum=2.264761728567491
</pre></div>
</div>
|
||
<p>If it’s indexed then use as many names after <code class="docutils literal notranslate"><span class="pre">emit</span></code> as there are indices:</p>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr put -q '@sum[$a] += $x; end { dump }' data/small
|
||
</span> {
|
||
"sum": {
|
||
"pan": 0.3467901443380824,
|
||
"eks": 1.1400793586611044,
|
||
"wye": 0.7778922255683036
|
||
}
|
||
}
|
||
</pre></div>
|
||
</div>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr put -q '@sum[$a] += $x; end { emit @sum, "a" }' data/small
|
||
</span> a=pan,sum=0.3467901443380824
a=eks,sum=1.1400793586611044
a=wye,sum=0.7778922255683036
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll">mlr put -q '@sum[$a][$b] += $x; end { dump }' data/small
</span>{
  "sum": {
    "pan": {
      "pan": 0.3467901443380824
    },
    "eks": {
      "pan": 0.7586799647899636,
      "wye": 0.38139939387114097
    },
    "wye": {
      "wye": 0.20460330576630303,
      "pan": 0.5732889198020006
    }
  }
}
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll">mlr put -q '@sum[$a][$b] += $x; end { emit @sum, "a", "b" }' data/small
</span>a=pan,b=pan,sum=0.3467901443380824
a=eks,b=pan,sum=0.7586799647899636
a=eks,b=wye,sum=0.38139939387114097
a=wye,b=wye,sum=0.20460330576630303
a=wye,b=pan,sum=0.5732889198020006
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll">mlr put -q '@sum[$a][$b][$i] += $x; end { dump }' data/small
</span>{
  "sum": {
    "pan": {
      "pan": {
        "1": 0.3467901443380824
      }
    },
    "eks": {
      "pan": {
        "2": 0.7586799647899636
      },
      "wye": {
        "4": 0.38139939387114097
      }
    },
    "wye": {
      "wye": {
        "3": 0.20460330576630303
      },
      "pan": {
        "5": 0.5732889198020006
      }
    }
  }
}
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll">mlr put -q '
</span><span class="hll">  @sum[$a][$b][$i] += $x;
</span><span class="hll">  end { emit @sum, "a", "b", "i" }
</span><span class="hll">' data/small
</span>a=pan,b=pan,i=1,sum=0.3467901443380824
a=eks,b=pan,i=2,sum=0.7586799647899636
a=eks,b=wye,i=4,sum=0.38139939387114097
a=wye,b=wye,i=3,sum=0.20460330576630303
a=wye,b=pan,i=5,sum=0.5732889198020006
</pre></div>
</div>
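<p>A keyed emit with one name per map level can be pictured as a walk down every key path. The keys along each path become the named fields, and the leaf value is stored under the variable’s name. The Python sketch below models that reading of the outputs above; it is an illustration, not Miller’s implementation. The names <code class="docutils literal notranslate"><span class="pre">emit_by_names</span></code> and <code class="docutils literal notranslate"><span class="pre">at_sum</span></code> are hypothetical, the dictionary copies the values from the <code class="docutils literal notranslate"><span class="pre">dump</span></code> output, and the sketch assumes exactly one name per map level.</p>
<div class="highlight-none notranslate"><div class="highlight"><pre>
def emit_by_names(var_name, value, names):
    """Model of `emit @var, names...` with exactly one name per map level."""
    records = []

    def walk(node, depth, fields):
        # Stop once every name is used (or a non-map value is reached):
        # the record is the named fields plus the leaf stored under var_name.
        if depth == len(names) or not isinstance(node, dict):
            record = dict(fields)
            record[var_name] = node
            records.append(record)
            return
        # Each key at this level becomes the value of the field names[depth].
        for key, child in node.items():
            walk(child, depth + 1, {**fields, names[depth]: key})

    walk(value, 0, {})
    return records


# Hypothetical stand-in for @sum after '@sum[$a][$b] += $x' on data/small,
# with the values copied from the dump output above.
at_sum = {
    "pan": {"pan": 0.3467901443380824},
    "eks": {"pan": 0.7586799647899636, "wye": 0.38139939387114097},
    "wye": {"wye": 0.20460330576630303, "pan": 0.5732889198020006},
}

for record in emit_by_names("sum", at_sum, ["a", "b"]):
    print(",".join(f"{k}={v}" for k, v in record.items()))
</pre></div>
</div>
<p>Python dictionaries preserve insertion order, so the loop visits the key paths in the same order as the <code class="docutils literal notranslate"><span class="pre">emit</span> <span class="pre">@sum,</span> <span class="pre">"a",</span> <span class="pre">"b"</span></code> example above and prints records with the same fields.</p>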
<p>Now for <strong>emitp</strong>: if you have as many names following <code class="docutils literal notranslate"><span class="pre">emit</span></code> as there are levels in the out-of-stream variable’s hashmap, then <code class="docutils literal notranslate"><span class="pre">emit</span></code> and <code class="docutils literal notranslate"><span class="pre">emitp</span></code> do the same thing. They differ when you specify fewer names than there are hashmap levels. In that case, Miller must flatten the remaining map levels into output-record keys by joining the keys along each path. <code class="docutils literal notranslate"><span class="pre">emitp</span></code> starts each key with the variable name (hence the <code class="docutils literal notranslate"><span class="pre">p</span></code>, for prefix, in <code class="docutils literal notranslate"><span class="pre">emitp</span></code>), while <code class="docutils literal notranslate"><span class="pre">emit</span></code> uses only the remaining hashmap keys:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll">mlr put -q '@sum[$a][$b] += $x; end { dump }' data/small
</span>{
  "sum": {
    "pan": {
      "pan": 0.3467901443380824
    },
    "eks": {
      "pan": 0.7586799647899636,
      "wye": 0.38139939387114097
    },
    "wye": {
      "wye": 0.20460330576630303,
      "pan": 0.5732889198020006
    }
  }
}
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll">mlr put -q '@sum[$a][$b] += $x; end { emit @sum, "a" }' data/small
</span>a=pan,pan=0.3467901443380824
a=eks,pan=0.7586799647899636,wye=0.38139939387114097
a=wye,wye=0.20460330576630303,pan=0.5732889198020006
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll">mlr put -q '@sum[$a][$b] += $x; end { emit @sum }' data/small
</span>pan.pan=0.3467901443380824,eks.pan=0.7586799647899636,eks.wye=0.38139939387114097,wye.wye=0.20460330576630303,wye.pan=0.5732889198020006
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll">mlr put -q '@sum[$a][$b] += $x; end { emitp @sum, "a" }' data/small
</span>a=pan,sum.pan=0.3467901443380824
a=eks,sum.pan=0.7586799647899636,sum.wye=0.38139939387114097
a=wye,sum.wye=0.20460330576630303,sum.pan=0.5732889198020006
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll">mlr put -q '@sum[$a][$b] += $x; end { emitp @sum }' data/small
</span>sum.pan.pan=0.3467901443380824,sum.eks.pan=0.7586799647899636,sum.eks.wye=0.38139939387114097,sum.wye.wye=0.20460330576630303,sum.wye.pan=0.5732889198020006
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll">mlr --oxtab put -q '@sum[$a][$b] += $x; end { emitp @sum }' data/small
</span>sum.pan.pan 0.3467901443380824
sum.eks.pan 0.7586799647899636
sum.eks.wye 0.38139939387114097
sum.wye.wye 0.20460330576630303
sum.wye.pan 0.5732889198020006
</pre></div>
</div>
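<p>Read together, the outputs above suggest a simple rule. The named levels become fields, and the remaining map levels are joined into output keys. <code class="docutils literal notranslate"><span class="pre">emitp</span></code> additionally starts each key with the variable name (<code class="docutils literal notranslate"><span class="pre">sum.pan.pan</span></code> versus <code class="docutils literal notranslate"><span class="pre">pan.pan</span></code>). The Python sketch below encodes that rule as a model of what this page shows, not as Miller’s source. The function <code class="docutils literal notranslate"><span class="pre">emit_model</span></code>, its <code class="docutils literal notranslate"><span class="pre">prefixed</span></code> flag, and its <code class="docutils literal notranslate"><span class="pre">sep</span></code> parameter (the join character, a period in these outputs) are hypothetical.</p>
<div class="highlight-none notranslate"><div class="highlight"><pre>
def emit_model(var_name, value, names, prefixed=False, sep="."):
    """Model of emit (prefixed=False) and emitp (prefixed=True)."""
    records = []

    def flatten(prefix, node, out):
        # Join the remaining key path with sep; a bare leaf is keyed by var_name.
        if isinstance(node, dict):
            for key, child in node.items():
                flatten(prefix + [key], child, out)
        else:
            out[sep.join(prefix) if prefix else var_name] = node

    def split(node, depth, fields):
        # The first len(names) map levels become named fields.
        if depth == len(names) or not isinstance(node, dict):
            record = dict(fields)
            # Whatever remains is flattened; emitp starts each key with var_name.
            flatten([var_name] if prefixed else [], node, record)
            records.append(record)
            return
        for key, child in node.items():
            split(child, depth + 1, {**fields, names[depth]: key})

    split(value, 0, {})
    return records


# Hypothetical stand-in for @sum, with values copied from the dump output above.
at_sum = {
    "pan": {"pan": 0.3467901443380824},
    "eks": {"pan": 0.7586799647899636, "wye": 0.38139939387114097},
    "wye": {"wye": 0.20460330576630303, "pan": 0.5732889198020006},
}

for prefixed in (False, True):
    for record in emit_model("sum", at_sum, ["a"], prefixed=prefixed):
        print(",".join(f"{k}={v}" for k, v in record.items()))
</pre></div>
</div>
<p>With <code class="docutils literal notranslate"><span class="pre">prefixed=False</span></code>, the loop prints the three records of the <code class="docutils literal notranslate"><span class="pre">emit</span> <span class="pre">@sum,</span> <span class="pre">"a"</span></code> example. With <code class="docutils literal notranslate"><span class="pre">prefixed=True</span></code>, it prints those of the <code class="docutils literal notranslate"><span class="pre">emitp</span> <span class="pre">@sum,</span> <span class="pre">"a"</span></code> example. Passing an empty name list reproduces the unkeyed <code class="docutils literal notranslate"><span class="pre">emit</span> <span class="pre">@sum</span></code> and <code class="docutils literal notranslate"><span class="pre">emitp</span> <span class="pre">@sum</span></code> lines.</p>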
<p>Use <strong>--oflatsep</strong> to specify the character that joins multilevel keys for <code class="docutils literal notranslate"><span class="pre">emitp</span></code>. The outputs above, produced without this flag, join keys with a period (<code class="docutils literal notranslate"><span class="pre">.</span></code>):</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll">mlr put -q --oflatsep / '@sum[$a][$b] += $x; end { emitp @sum, "a" }' data/small
</span>a=pan,sum.pan=0.3467901443380824
a=eks,sum.pan=0.7586799647899636,sum.wye=0.38139939387114097
a=wye,sum.wye=0.20460330576630303,sum.pan=0.5732889198020006
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll">mlr put -q --oflatsep / '@sum[$a][$b] += $x; end { emitp @sum }' data/small
</span>sum.pan.pan=0.3467901443380824,sum.eks.pan=0.7586799647899636,sum.eks.wye=0.38139939387114097,sum.wye.wye=0.20460330576630303,sum.wye.pan=0.5732889198020006
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll">mlr --oxtab put -q --oflatsep / '
</span><span class="hll">  @sum[$a][$b] += $x;
</span><span class="hll">  end { emitp @sum }
</span><span class="hll">' data/small
</span>sum.pan.pan 0.3467901443380824
sum.eks.pan 0.7586799647899636
sum.eks.wye 0.38139939387114097
sum.wye.wye 0.20460330576630303
sum.wye.pan 0.5732889198020006
</pre></div>
</div>
</div>
<div class="section" id="multi-emit-statements">
<h2>Multi-emit statements<a class="headerlink" href="#multi-emit-statements" title="Permalink to this headline">¶</a></h2>
<p>You can emit <strong>multiple map-valued expressions side-by-side</strong> by including their names in parentheses:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll">mlr --from data/medium --opprint put -q '
</span><span class="hll">  @x_count[$a][$b] += 1;
</span><span class="hll">  @x_sum[$a][$b] += $x;
</span><span class="hll">  end {
</span><span class="hll">    for ((a, b), _ in @x_count) {
</span><span class="hll">      @x_mean[a][b] = @x_sum[a][b] / @x_count[a][b]
</span><span class="hll">    }
</span><span class="hll">    emit (@x_sum, @x_count, @x_mean), "a", "b"
</span><span class="hll">  }
</span><span class="hll">'
</span>a b x_sum x_count x_mean
pan pan 219.1851288316854 427 0.5133141190437597
pan wye 198.43293070748447 395 0.5023618498923658
pan eks 216.07522773165525 429 0.5036718595143479
pan hat 205.22277621488686 417 0.492140950155604
pan zee 205.09751802331917 413 0.4966041598627583
eks pan 179.96303047250723 371 0.48507555383425127
eks wye 196.9452860713734 407 0.4838950517724162
eks zee 176.8803651584733 357 0.49546320772681596
eks eks 215.91609712937984 413 0.5227992666570941
eks hat 208.783170520597 417 0.5006790659966355
wye wye 185.29584980261419 377 0.49150092785839306
wye pan 195.84790012056564 392 0.4996119901034838
wye hat 212.0331829346132 426 0.4977304763723314
wye zee 194.77404756708714 385 0.5059066170573692
wye eks 204.8129608356315 386 0.5306035254809106
zee pan 202.21380378504267 389 0.5198298297816007
zee wye 233.9913939194868 455 0.5142667998230479
zee eks 190.9617780631925 391 0.4883932942792647
zee zee 206.64063510417319 403 0.5127559183726382
zee hat 191.30000620900935 409 0.46772617655014515
hat wye 208.8830097609959 423 0.49381326184632596
hat zee 196.3494502965293 385 0.5099985721987774
hat eks 189.0067933716193 389 0.48587864619953547
hat hat 182.8535323148762 381 0.47993053101017374
hat pan 168.5538067327806 363 0.4643355557376876
</pre></div>
</div>
<p>This walks through the first out-of-stream variable (<code class="docutils literal notranslate"><span class="pre">@x_sum</span></code> in this example) as usual. For each keylist found (e.g. <code class="docutils literal notranslate"><span class="pre">pan,wye</span></code>), it then includes the values at the same keys from the remaining out-of-stream variables (here, <code class="docutils literal notranslate"><span class="pre">@x_count</span></code> and <code class="docutils literal notranslate"><span class="pre">@x_mean</span></code>). Use this when all out-of-stream variables in the emit statement have <strong>the same shape and the same keylists</strong>.</p>
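<p>The lockstep walk described above can be sketched in Python as follows. <code class="docutils literal notranslate"><span class="pre">emit_lockstep</span></code> is a hypothetical helper, not Miller’s implementation. The <code class="docutils literal notranslate"><span class="pre">x_sum</span></code> and <code class="docutils literal notranslate"><span class="pre">x_count</span></code> maps reuse the first two rows of the <code class="docutils literal notranslate"><span class="pre">data/medium</span></code> table above, and <code class="docutils literal notranslate"><span class="pre">x_mean</span></code> is computed as in the DSL <code class="docutils literal notranslate"><span class="pre">for</span></code> loop. The sketch assumes every variable is nested exactly as deep as the number of names. It simply raises <code class="docutils literal notranslate"><span class="pre">KeyError</span></code> if the keylists differ; that is a property of the sketch, not a description of Miller’s behavior.</p>
<div class="highlight-none notranslate"><div class="highlight"><pre>
def emit_lockstep(variables, names):
    """Model of `emit (@v1, @v2, ...), names...` for same-shaped variables.

    variables is a list of (name, nested dict) pairs; the first one drives the walk.
    """
    records = []

    def walk(node, depth, keys):
        if depth == len(names):
            record = dict(zip(names, keys))
            # Look up the same keylist in every variable, in the order given.
            for var_name, var in variables:
                leaf = var
                for key in keys:
                    leaf = leaf[key]  # raises KeyError if the keylists differ
                record[var_name] = leaf
            records.append(record)
            return
        for key, child in node.items():
            walk(child, depth + 1, keys + [key])

    walk(variables[0][1], 0, [])
    return records


# First two rows of the data/medium table above (pan/pan and pan/wye).
x_sum = {"pan": {"pan": 219.1851288316854, "wye": 198.43293070748447}}
x_count = {"pan": {"pan": 427, "wye": 395}}
# Same computation as the for-loop in the DSL example.
x_mean = {a: {b: x_sum[a][b] / x_count[a][b] for b in row} for a, row in x_count.items()}

variables = [("x_sum", x_sum), ("x_count", x_count), ("x_mean", x_mean)]
for record in emit_lockstep(variables, ["a", "b"]):
    print(record)
</pre></div>
</div>
<p>Each printed record carries <code class="docutils literal notranslate"><span class="pre">a</span></code> and <code class="docutils literal notranslate"><span class="pre">b</span></code> followed by one field per variable, in the order the variables are listed. This matches the column order of the table above.</p>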
</div>
<div class="section" id="emit-all-statements">
<h2>Emit-all statements<a class="headerlink" href="#emit-all-statements" title="Permalink to this headline">¶</a></h2>
<p>Use <strong>emit all</strong> (or <code class="docutils literal notranslate"><span class="pre">emit</span> <span class="pre">@*</span></code>, which is synonymous) to output all out-of-stream variables. You can use the following idiom to output various accumulators side-by-side (reminiscent of <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">stats1</span></code>):</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll">mlr --from data/small --opprint put -q '
</span><span class="hll">  @v[$a][$b]["sum"] += $x;
</span><span class="hll">  @v[$a][$b]["count"] += 1;
</span><span class="hll">  end{emit @*,"a","b"}
</span><span class="hll">'
</span>a b pan.sum pan.count
v pan 0.3467901443380824 1

a b pan.sum pan.count wye.sum wye.count
v eks 0.7586799647899636 1 0.38139939387114097 1

a b wye.sum wye.count pan.sum pan.count
v wye 0.20460330576630303 1 0.5732889198020006 1
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll">mlr --from data/small --opprint put -q '
</span><span class="hll">  @sum[$a][$b] += $x;
</span><span class="hll">  @count[$a][$b] += 1;
</span><span class="hll">  end{emit @*,"a","b"}
</span><span class="hll">'
</span>a b pan
sum pan 0.3467901443380824

a b pan wye
sum eks 0.7586799647899636 0.38139939387114097

a b wye pan
sum wye 0.20460330576630303 0.5732889198020006

a b pan
count pan 1

a b pan wye
count eks 1 1

a b wye pan
count wye 1 1
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll">mlr --from data/small --opprint put -q '
</span><span class="hll">  @sum[$a][$b] += $x;
</span><span class="hll">  @count[$a][$b] += 1;
</span><span class="hll">  end{emit (@sum, @count),"a","b"}
</span><span class="hll">'
</span>a b sum count
pan pan 0.3467901443380824 1
eks pan 0.7586799647899636 1
eks wye 0.38139939387114097 1
wye wye 0.20460330576630303 1
wye pan 0.5732889198020006 1
</pre></div>
</div>
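<p>Judging from the first two outputs above, <code class="docutils literal notranslate"><span class="pre">emit</span> <span class="pre">@*,</span> <span class="pre">"a",</span> <span class="pre">"b"</span></code> behaves like an ordinary keyed <code class="docutils literal notranslate"><span class="pre">emit</span></code> of a single map whose top-level keys are the variable names. The first name (<code class="docutils literal notranslate"><span class="pre">a</span></code>) receives <code class="docutils literal notranslate"><span class="pre">v</span></code>, <code class="docutils literal notranslate"><span class="pre">sum</span></code>, or <code class="docutils literal notranslate"><span class="pre">count</span></code>. The second name receives each variable’s first-level key, and anything deeper is flattened into keys such as <code class="docutils literal notranslate"><span class="pre">pan.sum</span></code>. The Python sketch below encodes that interpretation of the outputs. <code class="docutils literal notranslate"><span class="pre">emit_keyed</span></code>, <code class="docutils literal notranslate"><span class="pre">emit_all</span></code>, and the <code class="docutils literal notranslate"><span class="pre">oosvars</span></code> dictionary (a subset of the <code class="docutils literal notranslate"><span class="pre">data/small</span></code> values) are hypothetical, and scalar out-of-stream variables are not modeled.</p>
<div class="highlight-none notranslate"><div class="highlight"><pre>
def emit_keyed(var_name, value, names, sep="."):
    """Model of a non-prefixed emit: named levels become fields, the rest is flattened."""
    records = []

    def flatten(prefix, node, out):
        # Join the remaining key path with sep; a bare leaf is keyed by var_name.
        if isinstance(node, dict):
            for key, child in node.items():
                flatten(prefix + [key], child, out)
        else:
            out[sep.join(prefix) if prefix else var_name] = node

    def split(node, depth, fields):
        if depth == len(names) or not isinstance(node, dict):
            record = dict(fields)
            flatten([], node, record)
            records.append(record)
            return
        for key, child in node.items():
            split(child, depth + 1, {**fields, names[depth]: key})

    split(value, 0, {})
    return records


def emit_all(oosvars, names):
    # Treat all out-of-stream variables as one map keyed by variable name, so the
    # first name receives "sum", "count", etc. "all" is only a placeholder key for
    # bare (non-map) values, which this sketch does not try to model.
    return emit_keyed("all", oosvars, names)


# Subset of @sum and @count from data/small (a=pan and a=eks only), values from above.
oosvars = {
    "sum": {
        "pan": {"pan": 0.3467901443380824},
        "eks": {"pan": 0.7586799647899636, "wye": 0.38139939387114097},
    },
    "count": {"pan": {"pan": 1}, "eks": {"pan": 1, "wye": 1}},
}

for record in emit_all(oosvars, ["a", "b"]):
    print(record)
</pre></div>
</div>
<p>Run as-is, the loop prints two records with <code class="docutils literal notranslate"><span class="pre">a</span></code> set to <code class="docutils literal notranslate"><span class="pre">sum</span></code> followed by two with <code class="docutils literal notranslate"><span class="pre">a</span></code> set to <code class="docutils literal notranslate"><span class="pre">count</span></code>. This mirrors the grouping in the second output above.</p>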
</div>
</div>


</div>
</div>
</div>

<div class="footer" role="contentinfo">
© Copyright 2021, John Kerl.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 3.2.1.
</div>
</body>
</html>