mirror of
https://github.com/johnkerl/miller.git
synced 2026-01-23 18:25:45 +00:00
6239 lines
175 KiB
HTML
<!DOCTYPE html>
|
|
<html lang="en">
|
|
|
|
<!-- PAGE GENERATED FROM template.html and content-for-reference-dsl.html BY poki. -->
|
|
<!-- PLEASE MAKE CHANGES THERE AND THEN RE-RUN poki. -->
|
|
<head>
|
|
<meta http-equiv="Content-type" content="text/html;charset=UTF-8"/>
|
|
<meta name="description" content="Miller documentation"/>
|
|
<meta name="viewport" content="width=device-width, initial-scale=1.0"/> <!-- mobile-friendly -->
|
|
<meta name="keywords"
|
|
content="John Kerl, Kerl, Miller, miller, mlr, OLAP, data analysis software, regression, correlation, variance, data tools, " />
|
|
|
|
<title> DSL reference </title>
|
|
<link rel="stylesheet" type="text/css" href="css/miller.css"/>
|
|
<link rel="stylesheet" type="text/css" href="css/poki-callbacks.css"/>
|
|
</head>
|
|
|
|
<!-- ================================================================ -->
|
|
<script type="text/javascript">
|
|
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
|
|
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
|
|
</script>
|
|
<script type="text/javascript">
|
|
try {
|
|
var pageTracker = _gat._getTracker("UA-15651652-1");
|
|
pageTracker._trackPageview();
|
|
} catch(err) {}
|
|
</script>
|
|
<!-- ================================================================ -->
|
|
|
|
<body bgcolor="#ffffff">
|
|
|
|
<!-- ================================================================ -->
|
|
|
|
<!-- navbar -->
|
|
<div class="pokinav">
|
|
<center><titleinbody>Miller</titleinbody></center>
|
|
|
|
<!-- NAVBAR GENERATED FROM template.html BY poki -->
|
|
<br/>
|
|
<a class="poki-navbar-element" href="index.html">Overview</a>
|
|
|
|
<a class="poki-navbar-element" href="faq.html">Using</a>
|
|
|
|
<a class="poki-navbar-element" href="reference.html"><b>Reference</b></a>
|
|
|
|
<a class="poki-navbar-element" href="why.html">Background</a>
|
|
|
|
<a class="poki-navbar-element" href="contact.html">Repository</a>
|
|
|
|
<br/>
|
|
<br/><a href="reference.html">Main reference</a>
|
|
<br/><a href="reference-verbs.html">Verbs reference</a>
|
|
<br/><a href="reference-dsl.html"><b>DSL reference</b></a>
|
|
<br/><a href="manpage.html">Manpage</a>
|
|
<br/><a href="release-docs.html">Documents by release</a>
|
|
<br/><a href="build.html">Installation</a>
|
|
</div>
|
|
|
|
<!-- page body -->
|
|
<p/>
|
|
|
|
<!-- BODY COPIED FROM content-for-reference-dsl.html BY poki -->
|
|
<div class="pokitoc">
|
|
<center><titleinbody>DSL reference</titleinbody></center>
|
|
• <a href="#Overview">Overview</a><br/>
|
|
• <a href="#Syntax">Syntax</a><br/>
|
|
• <a href="#Expression_formatting">Expression formatting</a><br/>
|
|
• <a href="#Expressions_from_files">Expressions from files</a><br/>
|
|
• <a href="#Semicolons,_commas,_newlines,_and_curly_braces">Semicolons, commas, newlines, and curly braces</a><br/>
|
|
• <a href="#Variables">Variables</a><br/>
|
|
• <a href="#Built-in_variables">Built-in variables</a><br/>
|
|
• <a href="#Field_names">Field names</a><br/>
|
|
• <a href="#Positional_field_names">Positional field names</a><br/>
|
|
• <a href="#Out-of-stream_variables">Out-of-stream variables</a><br/>
|
|
• <a href="#Indexed_out-of-stream_variables">Indexed out-of-stream variables</a><br/>
|
|
• <a href="#Local_variables">Local variables</a><br/>
|
|
• <a href="#Map_literals">Map literals</a><br/>
|
|
• <a href="#Type-checking">Type-checking</a><br/>
|
|
• <a href="#Type-test_and_type-assertion_expressions">Type-test and type-assertion expressions</a><br/>
|
|
• <a href="#Type-declarations_for_local_variables,_function_parameter,_and_function_return_values">Type-declarations for local variables, function parameter, and function return values</a><br/>
|
|
• <a href="#Null_data:_empty_and_absent">Null data: empty and absent</a><br/>
|
|
• <a href="#Aggregate_variable_assignments">Aggregate variable assignments</a><br/>
|
|
• <a href="#Keywords_for_filter_and_put">Keywords for filter and put</a><br/>
|
|
• <a href="#Operator_precedence">Operator precedence</a><br/>
|
|
• <a href="#Operator_and_function_semantics">Operator and function semantics</a><br/>
|
|
• <a href="#Control_structures">Control structures</a><br/>
|
|
• <a href="#Pattern-action_blocks">Pattern-action blocks</a><br/>
|
|
• <a href="#If-statements">If-statements</a><br/>
|
|
• <a href="#While_and_do-while_loops">While and do-while loops</a><br/>
|
|
• <a href="#For-loops">For-loops</a><br/>
|
|
• <a href="#Key-only_for-loops">Key-only for-loops</a><br/>
|
|
• <a href="#Key-value_for-loops">Key-value for-loops</a><br/>
|
|
• <a href="#C-style_triple-for_loops">C-style triple-for loops</a><br/>
|
|
• <a href="#Begin/end_blocks">Begin/end blocks</a><br/>
|
|
• <a href="#Output_statements">Output statements</a><br/>
|
|
• <a href="#Print_statements">Print statements</a><br/>
|
|
• <a href="#Dump_statements">Dump statements</a><br/>
|
|
• <a href="#Tee_statements">Tee statements</a><br/>
|
|
• <a href="#Redirected-output_statements">Redirected-output statements</a><br/>
|
|
• <a href="#Emit_statements">Emit statements</a><br/>
|
|
• <a href="#Multi-emit_statements">Multi-emit statements</a><br/>
|
|
• <a href="#Emit-all_statements">Emit-all statements</a><br/>
|
|
• <a href="#Unset_statements">Unset statements</a><br/>
|
|
• <a href="#Filter_statements">Filter statements</a><br/>
|
|
• <a href="#Built-in_functions_for_filter_and_put,_summary">Built-in functions for filter and put, summary</a><br/>
|
|
• <a href="#Built-in_functions_for_filter_and_put">Built-in functions for filter and put</a><br/>
|
|
• <a href="#User-defined_functions_and_subroutines">User-defined functions and subroutines</a><br/>
|
|
• <a href="#User-defined_functions">User-defined functions</a><br/>
|
|
• <a href="#User-defined_subroutines">User-defined subroutines</a><br/>
|
|
• <a href="#Errors_and_transparency">Errors and transparency</a><br/>
|
|
• <a href="#A_note_on_the_complexity_of_Miller’s_expression_language">A note on the complexity of Miller’s expression language</a><br/>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>
|
|
<button style="font-weight:bold;color:maroon;border:0" onclick="bodyToggler.expandAll();" href="javascript:;">Expand all sections</button>
|
|
<button style="font-weight:bold;color:maroon;border:0" onclick="bodyToggler.collapseAll();" href="javascript:;">Collapse all sections</button>
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Overview"/><h1>Overview</h1>
|
|
<button style="font-weight:bold;color:maroon;border:0" padding=0 onclick="bodyToggler.toggle('body_section_toggle_overview');" href="javascript:;">Toggle section visibility</button>
|
|
<div id="body_section_toggle_overview" style="display: block">
|
|
|
|
<p/> Here’s a comparison of verbs and <code>put</code>/<code>filter</code> DSL expressions:
|
|
|
|
<table border=1>
|
|
<tr> <td>
|
|
Example:
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr stats1 -a sum -f x -g a data/small
|
|
a=pan,x_sum=0.346790
|
|
a=eks,x_sum=1.140079
|
|
a=wye,x_sum=0.777892
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<ul>
|
|
<li/> Verbs are coded in C
|
|
<li/> They run a bit faster
|
|
<li/> They take fewer keystrokes
|
|
<li/> There is less to learn
|
|
<li/> Their customization is limited to each verb’s options
|
|
</ul>
|
|
</td>
|
|
<td>
|
|
Example:
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@x_sum[$a] += $x; end{emit @x_sum, "a"}' data/small
|
|
a=pan,x_sum=0.346790
|
|
a=eks,x_sum=1.140079
|
|
a=wye,x_sum=0.777892
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<ul>
|
|
<li/> You get to write your own DSL expressions
|
|
<li/> They run a bit slower
|
|
<li/> They take more keystrokes
|
|
<li/> There is more to learn
|
|
<li/> They are highly customizable
|
|
</ul>
|
|
</td> </tr>
|
|
</table>
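<p/>As a rough illustration of what both commands in the table compute (not of how Miller implements them), here is a Python sketch of a sum of <code>x</code> grouped by <code>a</code>. The record list copies the <code>a</code> and <code>x</code> fields of <code>data/small</code>; the names <code>records</code>, <code>x_sum</code>, and <code>lines</code> are illustrative assumptions, not Miller identifiers.

```python
# Records from data/small, reduced to the two fields the example uses.
records = [
    {"a": "pan", "x": 0.3467901443380824},
    {"a": "eks", "x": 0.7586799647899636},
    {"a": "wye", "x": 0.20460330576630303},
    {"a": "eks", "x": 0.38139939387114097},
    {"a": "wye", "x": 0.5732889198020006},
]

# Analogue of '@x_sum[$a] += $x': one running sum per distinct value of a.
x_sum = {}
for rec in records:
    x_sum[rec["a"]] = x_sum.get(rec["a"], 0.0) + rec["x"]

# Analogue of 'end{emit @x_sum, "a"}': one output line per group,
# in first-seen order (Python dicts preserve insertion order).
lines = ["a=%s,x_sum=%.6f" % (a, total) for a, total in x_sum.items()]
print("\n".join(lines))
```

<p/>The verb form hides this loop behind options, while the DSL form spells it out, which is where the extra keystrokes and the extra flexibility both come from.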
|
|
|
|
<p/>Please see <a href="reference-verbs.html">here</a> for information on
|
|
verbs other than <code>put</code> and <code>filter</code>.
|
|
|
|
<p/>
|
|
The essential usages of <code>mlr filter</code> and <code>mlr put</code> are for
|
|
record-selection and record-updating expressions, respectively. For example, given the following input data:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ cat data/small
|
|
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/> you might retain only the records whose <code>a</code> field has value <code>eks</code>:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr filter '$a == "eks"' data/small
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/> or you might add a new field which is a function of existing fields:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put '$ab = $a . "_" . $b ' data/small
|
|
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533,ab=pan_pan
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797,ab=eks_pan
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776,ab=wye_wye
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463,ab=eks_wye
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729,ab=wye_pan
|
|
</pre>
|
|
</div>
|
|
<p/>
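<p/>The difference between record selection and record updating can be sketched in Python as follows. This is only a model of the behavior shown in the two examples above; <code>mlr_filter</code>, <code>mlr_put</code>, and <code>add_ab</code> are hypothetical helper names, and the records keep just three fields of <code>data/small</code>.

```python
records = [
    {"a": "pan", "b": "pan", "i": 1},
    {"a": "eks", "b": "pan", "i": 2},
    {"a": "wye", "b": "wye", "i": 3},
    {"a": "eks", "b": "wye", "i": 4},
    {"a": "wye", "b": "pan", "i": 5},
]

def mlr_filter(recs, predicate):
    """Keep only the records for which the predicate is true."""
    return [rec for rec in recs if predicate(rec)]

def mlr_put(recs, update):
    """Apply an update to a copy of each record and pass every record through."""
    out = []
    for rec in recs:
        rec = dict(rec)  # copy so the input list is left untouched
        update(rec)
        out.append(rec)
    return out

def add_ab(rec):
    # Analogue of '$ab = $a . "_" . $b'
    rec["ab"] = rec["a"] + "_" + rec["b"]

kept = mlr_filter(records, lambda rec: rec["a"] == "eks")
updated = mlr_put(records, add_ab)
print([rec["i"] for rec in kept])
print([rec["ab"] for rec in updated])
```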
|
|
|
|
<p/>The two verbs <code>mlr filter</code> and <code>mlr put</code> are essentially the
|
|
same. The only differences are:
|
|
|
|
<ul>
|
|
|
|
<li/> Expressions sent to <code>mlr filter</code> must end with a boolean expression,
|
|
which is the filtering criterion;
|
|
|
|
<li/> <code>mlr filter</code> expressions may not
|
|
reference the <code>filter</code> keyword within them; and
|
|
|
|
<li/> <code>mlr filter</code> expressions may not use <code>tee</code>, <code>emit</code>,
|
|
<code>emitp</code>, or <code>emitf</code>.
|
|
|
|
</ul>
|
|
|
|
<p/> All the rest is the same: in particular, you can define and invoke
|
|
functions and subroutines to help produce the final boolean statement, and
|
|
record fields may be assigned to in the statements preceding the final boolean
|
|
statement.
|
|
|
|
<p/>Further details and options are covered in the following sections.
|
|
|
|
</div>
|
|
<!-- ================================================================ -->
|
|
<a id="Syntax"/><h1>Syntax</h1>
|
|
<button style="font-weight:bold;color:maroon;border:0" padding=0 onclick="bodyToggler.toggle('body_section_toggle_syntax');" href="javascript:;">Toggle section visibility</button>
|
|
<div id="body_section_toggle_syntax" style="display: block">
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Expression_formatting"/><h2>Expression formatting</h2>
|
|
|
|
<p/>Multiple expressions may be given, separated by semicolons, and each may refer to the ones before:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ ruby -e '10.times{|i|puts "i=#{i}"}' | mlr --opprint put '$j = $i + 1; $k = $i +$j'
|
|
i j k
|
|
0 1 1
|
|
1 2 3
|
|
2 3 5
|
|
3 4 7
|
|
4 5 9
|
|
5 6 11
|
|
6 7 13
|
|
7 8 15
|
|
8 9 17
|
|
9 10 19
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
Newlines within the expression are ignored, which can help increase legibility of complex expressions:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --opprint put '
|
|
$nf = NF;
|
|
$nr = NR;
|
|
$fnr = FNR;
|
|
$filenum = FILENUM;
|
|
$filename = FILENAME
|
|
' data/small data/small2
|
|
a b i x y nf nr fnr filenum filename
|
|
pan pan 1 0.3467901443380824 0.7268028627434533 5 1 1 1 data/small
|
|
eks pan 2 0.7586799647899636 0.5221511083334797 5 2 2 1 data/small
|
|
wye wye 3 0.20460330576630303 0.33831852551664776 5 3 3 1 data/small
|
|
eks wye 4 0.38139939387114097 0.13418874328430463 5 4 4 1 data/small
|
|
wye pan 5 0.5732889198020006 0.8636244699032729 5 5 5 1 data/small
|
|
pan eks 9999 0.267481232652199086 0.557077185510228001 5 6 1 2 data/small2
|
|
wye eks 10000 0.734806020620654365 0.884788571337605134 5 7 2 2 data/small2
|
|
pan wye 10001 0.870530722602517626 0.009854780514656930 5 8 3 2 data/small2
|
|
hat wye 10002 0.321507044286237609 0.568893318795083758 5 9 4 2 data/small2
|
|
pan zee 10003 0.272054845593895200 0.425789896597056627 5 10 5 2 data/small2
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --opprint filter '($x > 0.5 && $y < 0.5) || ($x < 0.5 && $y > 0.5)' then stats2 -a corr -f x,y data/medium
|
|
x_y_corr
|
|
-0.747994
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Expressions_from_files"/><h2>Expressions from files</h2>
|
|
|
|
<p/>The simplest way to enter expressions for <code>put</code> and <code>filter</code> is between single quotes on the command line, e.g.
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from data/small put '$xy = sqrt($x**2 + $y**2)'
|
|
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533,xy=0.805299
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797,xy=0.920998
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776,xy=0.395376
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463,xy=0.404317
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729,xy=1.036584
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from data/small put 'func f(a, b) { return sqrt(a**2 + b**2) } $xy = f($x, $y)'
|
|
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533,xy=0.805299
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797,xy=0.920998
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776,xy=0.395376
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463,xy=0.404317
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729,xy=1.036584
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>You may, though, find it convenient to put expressions into files for reuse, and read them
|
|
<b>using the -f option</b>. For example:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ cat data/fe-example-3.mlr
|
|
func f(a, b) {
|
|
return sqrt(a**2 + b**2)
|
|
}
|
|
$xy = f($x, $y)
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from data/small put -f data/fe-example-3.mlr
|
|
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533,xy=0.805299
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797,xy=0.920998
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776,xy=0.395376
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463,xy=0.404317
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729,xy=1.036584
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>If you have some of the logic in a file and you want to write the rest on the command line, you
|
|
can <b>use the -f and -e options together</b>:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ cat data/fe-example-4.mlr
|
|
func f(a, b) {
|
|
return sqrt(a**2 + b**2)
|
|
}
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from data/small put -f data/fe-example-4.mlr -e '$xy = f($x, $y)'
|
|
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533,xy=0.805299
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797,xy=0.920998
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776,xy=0.395376
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463,xy=0.404317
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729,xy=1.036584
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>A suggested use-case here is defining functions in files, and calling them from command-line expressions.
|
|
|
|
<p/>Another suggested use-case is putting default parameter values in files, e.g.
<code>begin{@count=is_present(@count)?@count:10}</code> in the file, which you can override by preceding it with
<code>begin{@count=40}</code> via <code>-e</code>.
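<p/>The default-value idiom works because the override, when present, runs first and the default only fills in a value that is still missing. A Python sketch of that ordering, under the assumption that a plain dict can stand in for the out-of-stream variables (<code>run_begin_blocks</code>, <code>default_count</code>, and <code>override_count</code> are hypothetical names):

```python
def run_begin_blocks(blocks):
    """Run begin-block analogues in order against one shared variable store."""
    oosvars = {}
    for block in blocks:
        block(oosvars)
    return oosvars

def default_count(oosvars):
    # Analogue of begin{@count = is_present(@count) ? @count : 10}
    oosvars["count"] = oosvars["count"] if "count" in oosvars else 10

def override_count(oosvars):
    # Analogue of begin{@count = 40}
    oosvars["count"] = 40

only_default = run_begin_blocks([default_count])
with_override = run_begin_blocks([override_count, default_count])
print(only_default, with_override)
```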
|
|
|
|
<p/>Moreover, you can have one or more <code>-f</code> expressions (maybe one
|
|
function per file, for example) and one or more <code>-e</code> expressions on the
|
|
command line. If you mix <code>-f</code> and <code>-e</code> then the expressions are
|
|
evaluated in the order encountered. (Since the expressions are all simply
|
|
concatenated together in order, don’t forget intervening semicolons: e.g.
|
|
not <code>mlr put -e '$x=1' -e '$y=2' ...</code> but rather <code>mlr put -e '$x=1;' -e
'$y=2' ...</code>.)
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Semicolons,_commas,_newlines,_and_curly_braces"/><h2>Semicolons, commas, newlines, and curly braces</h2>
|
|
|
|
<p/>Miller uses <b>semicolons as statement separators</b>, not statement terminators. This means you can write:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
mlr put 'x=1'
|
|
mlr put 'x=1;$y=2'
|
|
mlr put 'x=1;$y=2;'
|
|
mlr put 'x=1;;;;$y=2;'
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>Semicolons are optional after closing curly braces (which close conditionals and loops as discussed below).
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ echo x=1,y=2 | mlr put 'while (NF < 10) { $[NF+1] = ""} $foo = "bar"'
|
|
x=1,y=2,3=,4=,5=,6=,7=,8=,9=,10=,foo=bar
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ echo x=1,y=2 | mlr put 'while (NF < 10) { $[NF+1] = ""}; $foo = "bar"'
|
|
x=1,y=2,3=,4=,5=,6=,7=,8=,9=,10=,foo=bar
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>Semicolons are required between statements even if those statements are on
separate lines. <b>Newlines</b> are for your convenience but have no syntactic
meaning: line endings do not terminate statements. For example, adjacent
assignment statements on separate lines still need a semicolon between them:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
mlr put '
|
|
$x = 1
|
|
$y = 2 # Syntax error
|
|
'
|
|
|
|
mlr put '
|
|
$x = 1;
|
|
$y = 2 # This is OK
|
|
'
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/><b>Trailing commas</b> are allowed in function/subroutine definitions,
|
|
function/subroutine callsites, and map literals. This is intended for (although
|
|
not restricted to) the multi-line case:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --csvlite --from data/a.csv put '
|
|
func f(
|
|
num a,
|
|
num b,
|
|
): num {
|
|
return a**2 + b**2;
|
|
}
|
|
$* = {
|
|
"s": $a + $b,
|
|
"t": $a - $b,
|
|
"u": f(
|
|
$a,
|
|
$b,
|
|
),
|
|
"v": NR,
|
|
}
|
|
'
|
|
s,t,u,v
|
|
3,-1,5.000000,1
|
|
9,-1,41.000000,2
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>Bodies for all compound statements must be enclosed in <b>curly braces</b>, even if the body is a single statement:
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
mlr put 'if ($x == 1) $y = 2' # Syntax error
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
mlr put 'if ($x == 1) { $y = 2 }' # This is OK
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>Bodies for compound statements may be empty:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
mlr put 'if ($x == 1) { }' # This no-op is syntactically acceptable
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
</div>
|
|
<!-- ================================================================ -->
|
|
<a id="Variables"/><h1>Variables</h1>
|
|
<button style="font-weight:bold;color:maroon;border:0" padding=0 onclick="bodyToggler.toggle('body_section_toggle_variables');" href="javascript:;">Toggle section visibility</button>
|
|
<div id="body_section_toggle_variables" style="display: block">
|
|
|
|
<p/>Miller has the following kinds of variables:
|
|
|
|
<p/> <b>Built-in variables</b> such as <code>NF</code>, <code>NR</code>,
|
|
<code>FILENAME</code>, <code>M_PI</code>, and <code>M_E</code>. These are written in all capital letters
|
|
and are read-only (although some of them change value from one record to
|
|
another).
|
|
|
|
<p/> <b>Fields of stream records</b>, accessed using the <code>$</code> prefix.
|
|
These refer to fields of the current data-stream record. For example, in
|
|
<code>echo x=1,y=2 | mlr put '$z = $x + $y'</code>, <code>$x</code> and <code>$y</code>
|
|
refer to input fields, and <code>$z</code> refers to a new, computed output field.
|
|
In a few contexts, presented below, you can refer to the entire record as
|
|
<code>$*</code>.
|
|
|
|
<p/> <b>Out-of-stream variables</b> accessed using the <code>@</code> prefix. These
|
|
refer to data which persist from one record to the next, including in
|
|
<code>begin</code> and <code>end</code> blocks (which execute before/after the record
|
|
stream is consumed, respectively). You use them to remember values across
|
|
records, such as sums, differences, counters, and so on. In a few contexts,
|
|
presented below, you can refer to the entire out-of-stream-variables collection
|
|
as <code>@*</code>.
|
|
|
|
<p/> <b>Local variables</b> are limited in scope and extent to the current
|
|
statements being executed: these include function arguments, bound variables in
|
|
for loops, and explicitly declared local variables.
|
|
|
|
<p/> <b>Keywords</b> are not variables, but since their names are reserved, you
|
|
cannot use these names for local variables.
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Built-in_variables"/><h2>Built-in variables</h2>
|
|
|
|
<p/> These are written all in capital letters, such as <code>NR</code>,
|
|
<code>NF</code>, <code>FILENAME</code>, and only a small, specific set of them is
|
|
defined by Miller.
|
|
|
|
<p/>Namely, Miller supports the following five built-in variables for <a
|
|
href="reference-verbs.html#filter"><code>filter</code></a> and <code>put</code>, all <code>awk</code>-inspired:
|
|
<code>NF</code>, <code>NR</code>, <code>FNR</code>, <code>FILENUM</code>, and
|
|
<code>FILENAME</code>, as well as the mathematical constants <code>M_PI</code> and
|
|
<code>M_E</code>. Lastly, the <code>ENV</code> hashmap allows read access to environment
|
|
variables, e.g. <code>ENV["HOME"]</code> or <code>ENV["foo_".$hostname]</code>.
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr filter 'FNR == 2' data/small*
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
|
|
1=pan,2=pan,3=1,4=0.3467901443380824,5=0.7268028627434533
|
|
a=wye,b=eks,i=10000,x=0.734806020620654365,y=0.884788571337605134
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put '$fnr = FNR' data/small*
|
|
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533,fnr=1
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797,fnr=2
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776,fnr=3
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463,fnr=4
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729,fnr=5
|
|
1=a,2=b,3=i,4=x,5=y,fnr=1
|
|
1=pan,2=pan,3=1,4=0.3467901443380824,5=0.7268028627434533,fnr=2
|
|
1=eks,2=pan,3=2,4=0.7586799647899636,5=0.5221511083334797,fnr=3
|
|
1=wye,2=wye,3=3,4=0.20460330576630303,5=0.33831852551664776,fnr=4
|
|
1=eks,2=wye,3=4,4=0.38139939387114097,5=0.13418874328430463,fnr=5
|
|
1=wye,2=pan,3=5,4=0.5732889198020006,5=0.8636244699032729,fnr=6
|
|
a=pan,b=eks,i=9999,x=0.267481232652199086,y=0.557077185510228001,fnr=1
|
|
a=wye,b=eks,i=10000,x=0.734806020620654365,y=0.884788571337605134,fnr=2
|
|
a=pan,b=wye,i=10001,x=0.870530722602517626,y=0.009854780514656930,fnr=3
|
|
a=hat,b=wye,i=10002,x=0.321507044286237609,y=0.568893318795083758,fnr=4
|
|
a=pan,b=zee,i=10003,x=0.272054845593895200,y=0.425789896597056627,fnr=5
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/> The values of <code>NF</code>, <code>NR</code>, <code>FNR</code>, <code>FILENUM</code>,
|
|
and <code>FILENAME</code> change from one record to the next as Miller scans
|
|
through your input data stream. The mathematical constants, of course, do not
|
|
change; <code>ENV</code> is populated from the system environment variables at the
|
|
time Miller starts and is read-only for the remainder of program execution.
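<p/>How the per-record variables advance can be modeled with a short Python sketch. It is an approximation of the bookkeeping described above, not Miller's reader code; <code>scan</code> is a hypothetical helper, and the two in-memory "files" and their names are made up for illustration.

```python
def scan(files):
    """Yield (record, context) pairs for a list of (filename, records) inputs."""
    nr = 0
    for filenum, (filename, records) in enumerate(files, start=1):
        for fnr, rec in enumerate(records, start=1):
            nr += 1
            context = {
                "NF": len(rec),        # number of fields in this record
                "NR": nr,              # record count across all files
                "FNR": fnr,            # record count within the current file
                "FILENUM": filenum,    # 1-up index of the current file
                "FILENAME": filename,
            }
            yield rec, context

# Two tiny in-memory "files"; names and contents are made up for illustration.
files = [
    ("first.dkvp", [{"a": "pan", "i": 1}, {"a": "eks", "i": 2}]),
    ("second.dkvp", [{"a": "wye", "i": 3, "x": 0.5}]),
]

for rec, ctx in scan(files):
    print(rec, ctx)
```

<p/>Note that <code>NR</code> keeps counting across file boundaries while <code>FNR</code> restarts at 1 for each file.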
|
|
|
|
<p/> Their <b>scope is global</b>: you can refer to them in any <code>filter</code>
|
|
or <code>put</code> statement. Their values are assigned by the input-record
|
|
reader:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --csv put '$nr = NR' data/a.csv
|
|
a,b,c,nr
|
|
1,2,3,1
|
|
4,5,6,2
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --csv repeat -n 3 then put '$nr = NR' data/a.csv
|
|
a,b,c,nr
|
|
1,2,3,1
|
|
1,2,3,1
|
|
1,2,3,1
|
|
4,5,6,2
|
|
4,5,6,2
|
|
4,5,6,2
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/> The <b>extent</b> is for the duration of the put/filter: in a
|
|
<code>begin</code> statement (which executes before the first input record is
|
|
consumed) you will find <code>NR=1</code> and in an <code>end</code> statement (which
|
|
is executed after the last input record is consumed) you will find <code>NR</code>
|
|
to be the total number of records ingested.
|
|
|
|
<p/> These are all <b>read-only</b> for the <code>mlr put</code> and <code>mlr
|
|
filter</code> DSLs: they may be assigned from, e.g. <code>$nr=NR</code>, but they may
|
|
not be assigned to: <code>NR=100</code> is a syntax error.
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Field_names"/><h2>Field names</h2>
|
|
|
|
<p/>Names of fields within stream records must be specified using a <code>$</code>
|
|
in <code>filter</code> and <a href="reference-verbs.html#put"><code>put</code></a>
|
|
expressions, even though the dollar signs don’t appear in the data stream
|
|
itself. For integer-indexed data, this looks like <code>awk</code>’s
|
|
<code>$1,$2,$3</code>, except that Miller allows non-numeric names such as
|
|
<code>$quantity</code> or <code>$hostname</code>. Likewise, enclose string literals
|
|
in double quotes in <code>filter</code> expressions even though they don’t
|
|
appear in file data. In particular, <code>mlr filter '$x=="abc"'</code> passes
|
|
through the record <code>x=abc</code>.
|
|
|
|
<p/>If field names have <b>special characters</b> such as <code>.</code> then you
|
|
can use braces, e.g. <code>'${field.name}'</code>.
|
|
|
|
<p/>You may also use a <b>computed field name</b> in square brackets, e.g.
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ echo a=3,b=4 | mlr filter '$["x"] < 0.5'
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ echo s=green,t=blue,a=3,b=4 | mlr put '$[$s."_".$t] = $a * $b'
|
|
s=green,t=blue,a=3,b=4,green_blue=12
|
|
</pre>
|
|
</div>
|
|
<p/>
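<p/>A computed field name is analogous to building a dictionary key at run time. The Python sketch below mirrors the <code>green_blue</code> example above, assuming the numeric fields have already been parsed as integers; it is an analogy rather than a description of Miller's internals.

```python
rec = {"s": "green", "t": "blue", "a": 3, "b": 4}

# Analogue of '$[$s."_".$t] = $a * $b': the key is built from field values
# at run time, then assigned like any other field.
rec[rec["s"] + "_" + rec["t"]] = rec["a"] * rec["b"]

line = ",".join("%s=%s" % (k, v) for k, v in rec.items())
print(line)
```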
|
|
|
|
Notes:
|
|
|
|
<p/> The names of record fields depend on the contents of your input data stream, and their
|
|
values change from one record to the next as Miller scans through your input
|
|
data stream.
|
|
|
|
<p/> Their <b>extent</b> is limited to the current record; their <b>scope</b>
|
|
is the <code>filter</code> or <code>put</code> command in which they appear.
|
|
|
|
<p/> These are <b>read-write</b>: you can do <code>$y=2*$x</code>,
|
|
<code>$x=$x+1</code>, etc.
|
|
|
|
<p/> Records are Miller’s output: field names present in the input
|
|
stream are passed through to output (written to standard output) unless fields
|
|
are removed with <code>cut</code>, or records are excluded with <code>filter</code> or
|
|
<code>put -q</code>, etc. Simply assign a value to a field and it will be output.
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Positional_field_names"/><h2>Positional field names</h2>
|
|
|
|
<p/> Even though Miller’s main selling point is
|
|
name-indexing, sometimes you really want to refer to a field name by its
|
|
positional index (starting from 1).
|
|
|
|
<p/> Use <code>$[[3]]</code> to access the name of field 3. More generally, any
|
|
expression evaluating to an integer can go between <code>$[[</code> and
|
|
<code>]]</code>.
|
|
|
|
Then using a computed field name, <code>$[ $[[3]] ]</code> is the value in the third field.
|
|
This has the shorter equivalent notation <code>$[[[3]]]</code>.
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr cat data/small
|
|
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put '$[[3]] = "NEW"' data/small
|
|
a=pan,b=pan,NEW=1,x=0.3467901443380824,y=0.7268028627434533
|
|
a=eks,b=pan,NEW=2,x=0.7586799647899636,y=0.5221511083334797
|
|
a=wye,b=wye,NEW=3,x=0.20460330576630303,y=0.33831852551664776
|
|
a=eks,b=wye,NEW=4,x=0.38139939387114097,y=0.13418874328430463
|
|
a=wye,b=pan,NEW=5,x=0.5732889198020006,y=0.8636244699032729
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put '$[[[3]]] = "NEW"' data/small
|
|
a=pan,b=pan,i=NEW,x=0.3467901443380824,y=0.7268028627434533
|
|
a=eks,b=pan,i=NEW,x=0.7586799647899636,y=0.5221511083334797
|
|
a=wye,b=wye,i=NEW,x=0.20460330576630303,y=0.33831852551664776
|
|
a=eks,b=wye,i=NEW,x=0.38139939387114097,y=0.13418874328430463
|
|
a=wye,b=pan,i=NEW,x=0.5732889198020006,y=0.8636244699032729
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put '$NEW = $[[NR]]' data/small
|
|
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533,NEW=a
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797,NEW=b
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776,NEW=i
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463,NEW=x
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729,NEW=y
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put '$NEW = $[[[NR]]]' data/small
|
|
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533,NEW=pan
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797,NEW=pan
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776,NEW=3
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463,NEW=0.381399
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729,NEW=0.863624
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put '$[[[NR]]] = "NEW"' data/small
|
|
a=NEW,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
|
|
a=eks,b=NEW,i=2,x=0.7586799647899636,y=0.5221511083334797
|
|
a=wye,b=wye,i=NEW,x=0.20460330576630303,y=0.33831852551664776
|
|
a=eks,b=wye,i=4,x=NEW,y=0.13418874328430463
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=NEW
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
Right-hand side accesses to non-existent fields (i.e. with index less
than 1 or greater than <code>NF</code>) return an absent value. Likewise,
left-hand side accesses only refer to fields which already exist. For example,
if a record has 5 fields then assigning the name or value of the 6th (or 600th)
field results in a no-op.
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put '$[[6]] = "NEW"' data/small
|
|
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put '$[[[6]]] = "NEW"' data/small
|
|
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Out-of-stream_variables"/><h2>Out-of-stream variables</h2>
|
|
|
|
<p/> These are prefixed with an at-sign, e.g. <code>@sum</code>. Furthermore,
unlike built-in variables and stream-record fields, they are maintained in an
arbitrarily nested hashmap: you can do <code>@sum += $quantity</code>, or
<code>@sum[$color] += $quantity</code>, or <code>@sum[$color][$shape] +=
$quantity</code>. The keys for the multi-level hashmap can be any expression which
evaluates to string or integer: e.g. <code>@sum[NR] = $a + $b</code>,
<code>@sum[$a."-".$b] = $x</code>, etc.
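<p/> As a rough analogy only (this is Python, not Miller, and the input records
are hypothetical), the nested accumulation above can be sketched with nested
dictionaries whose missing levels are created on first use. The assumption
that a missing total behaves like zero under <code>+=</code> is a
simplification of how Miller treats absent values:

```python
from collections import defaultdict

# Hypothetical input records standing in for a Miller record stream.
records = [
    {"color": "red", "shape": "square", "quantity": 3},
    {"color": "red", "shape": "circle", "quantity": 4},
    {"color": "blue", "shape": "square", "quantity": 5},
    {"color": "red", "shape": "square", "quantity": 6},
]

# Rough analogue of: @sum[$color][$shape] += $quantity
# Missing levels are created on first use; a missing total acts like 0.
sums = defaultdict(lambda: defaultdict(int))
for rec in records:
    sums[rec["color"]][rec["shape"]] += rec["quantity"]

# Convert to plain dicts for display.
result = {color: dict(by_shape) for color, by_shape in sums.items()}
print(result)
```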
|
|
|
|
<p/> Their names and their values are entirely under your control; they change
|
|
only when you assign to them.
|
|
|
|
<p/> Just as for field names in stream records, if you want to define out-of-stream variables
|
|
with <b>special characters</b> such as <code>.</code> then you can use braces, e.g. <code>'@{variable.name}["index"]'</code>.
|
|
|
|
<p/>You may use a <b>computed key</b> in square brackets, e.g.
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ echo s=green,t=blue,a=3,b=4 | mlr put -q '@[$s."_".$t] = $a * $b; emit all'
|
|
green_blue=12
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/> Out-of-stream variables are <b>scoped</b> to the <code>put</code> command in
|
|
which they appear. In particular, if you have two or more <code>put</code>
|
|
commands separated by <code>then</code>, each put will have its own set of
|
|
out-of-stream variables:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ cat data/a.dkvp
|
|
a=1,b=2,c=3
|
|
a=4,b=5,c=6
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put '@sum += $a; end {emit @sum}' then put 'is_present($a) {$a=10*$a; @sum += $a}; end {emit @sum}' data/a.dkvp
|
|
a=10,b=2,c=3
|
|
a=40,b=5,c=6
|
|
sum=5
|
|
sum=50
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/> Out-of-stream variables&rsquo; <b>extent</b> is from the start to the end of the record stream:
they persist across every execution of the <code>put</code> or <code>filter</code> expression referring to them.
|
|
|
|
<p/> Out-of-stream variables are <b>read-write</b>: you can do <code>$sum=@sum</code>, <code>@sum=$sum</code>,
|
|
etc.
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Indexed_out-of-stream_variables"/><h2>Indexed out-of-stream variables</h2>
|
|
|
|
<p/>Using an index on the <code>@count</code> and <code>@sum</code> variables, we get the benefit of the
|
|
<code>-g</code> (group-by) option which <code>mlr stats1</code> and various other Miller commands have:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '
|
|
@x_count[$a] += 1;
|
|
@x_sum[$a] += $x;
|
|
end {
|
|
emit @x_count, "a";
|
|
emit @x_sum, "a";
|
|
}
|
|
' ../data/small
|
|
a=pan,x_count=2
|
|
a=eks,x_count=3
|
|
a=wye,x_count=2
|
|
a=zee,x_count=2
|
|
a=hat,x_count=1
|
|
a=pan,x_sum=0.849416
|
|
a=eks,x_sum=1.751863
|
|
a=wye,x_sum=0.777892
|
|
a=zee,x_sum=1.125680
|
|
a=hat,x_sum=0.031442
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr stats1 -a count,sum -f x -g a ../data/small
|
|
a=pan,x_count=2,x_sum=0.849416
|
|
a=eks,x_count=3,x_sum=1.751863
|
|
a=wye,x_count=2,x_sum=0.777892
|
|
a=zee,x_count=2,x_sum=1.125680
|
|
a=hat,x_count=1,x_sum=0.031442
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>Indices can be arbitrarily deep; here there are two of them:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from data/medium put -q '
|
|
@x_count[$a][$b] += 1;
|
|
@x_sum[$a][$b] += $x;
|
|
end {
|
|
emit (@x_count, @x_sum), "a", "b";
|
|
}
|
|
'
|
|
a=pan,b=pan,x_count=427,x_sum=219.185129
|
|
a=pan,b=wye,x_count=395,x_sum=198.432931
|
|
a=pan,b=eks,x_count=429,x_sum=216.075228
|
|
a=pan,b=hat,x_count=417,x_sum=205.222776
|
|
a=pan,b=zee,x_count=413,x_sum=205.097518
|
|
a=eks,b=pan,x_count=371,x_sum=179.963030
|
|
a=eks,b=wye,x_count=407,x_sum=196.945286
|
|
a=eks,b=zee,x_count=357,x_sum=176.880365
|
|
a=eks,b=eks,x_count=413,x_sum=215.916097
|
|
a=eks,b=hat,x_count=417,x_sum=208.783171
|
|
a=wye,b=wye,x_count=377,x_sum=185.295850
|
|
a=wye,b=pan,x_count=392,x_sum=195.847900
|
|
a=wye,b=hat,x_count=426,x_sum=212.033183
|
|
a=wye,b=zee,x_count=385,x_sum=194.774048
|
|
a=wye,b=eks,x_count=386,x_sum=204.812961
|
|
a=zee,b=pan,x_count=389,x_sum=202.213804
|
|
a=zee,b=wye,x_count=455,x_sum=233.991394
|
|
a=zee,b=eks,x_count=391,x_sum=190.961778
|
|
a=zee,b=zee,x_count=403,x_sum=206.640635
|
|
a=zee,b=hat,x_count=409,x_sum=191.300006
|
|
a=hat,b=wye,x_count=423,x_sum=208.883010
|
|
a=hat,b=zee,x_count=385,x_sum=196.349450
|
|
a=hat,b=eks,x_count=389,x_sum=189.006793
|
|
a=hat,b=hat,x_count=381,x_sum=182.853532
|
|
a=hat,b=pan,x_count=363,x_sum=168.553807
|
|
</pre>
|
|
</div>
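<p/> The way the parenthesized emit above turns two same-shaped nested maps
into one record per key pair can be sketched as follows. This is a Python
model under stated assumptions, not Miller&rsquo;s implementation: the helper
name <code>emit_lashed</code> and the sample values are hypothetical, and
both maps are assumed to have identical keys at both levels.

```python
# Hypothetical nested maps keyed by a, then b (values are made up).
x_count = {"pan": {"pan": 2, "wye": 1}, "eks": {"pan": 3}}
x_sum = {"pan": {"pan": 0.5, "wye": 0.25}, "eks": {"pan": 1.5}}


def emit_lashed(named_maps, key_names):
    """Flatten same-shaped two-level maps into one record per key pair."""
    first = next(iter(named_maps.values()))
    records = []
    for k1, inner in first.items():
        for k2 in inner:
            # The two index levels become named fields in the record.
            rec = {key_names[0]: k1, key_names[1]: k2}
            # Each map contributes one column, side by side.
            for name, m in named_maps.items():
                rec[name] = m[k1][k2]
            records.append(rec)
    return records


records = emit_lashed({"x_count": x_count, "x_sum": x_sum}, ["a", "b"])
for rec in records:
    print(",".join(f"{k}={v}" for k, v in rec.items()))
```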
|
|
<p/>
|
|
|
|
The idea is that <code>stats1</code>, and other Miller verbs, encapsulate
|
|
frequently-used patterns with a minimum of keystroking (and run a little
|
|
faster), whereas using out-of-stream variables you have more flexibility and
|
|
control in what you do.
|
|
|
|
<p/>Begin/end blocks can be mixed with pattern/action blocks. For example:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put '
|
|
begin {
|
|
@num_total = 0;
|
|
@num_positive = 0;
|
|
};
|
|
@num_total += 1;
|
|
$x > 0.0 {
|
|
@num_positive += 1;
|
|
$y = log10($x); $z = sqrt($y)
|
|
};
|
|
end {
|
|
emitf @num_total, @num_positive
|
|
}
|
|
' data/put-gating-example-1.dkvp
|
|
x=-1
|
|
x=0
|
|
x=1,y=0.000000,z=0.000000
|
|
x=2,y=0.301030,z=0.548662
|
|
x=3,y=0.477121,z=0.690740
|
|
num_total=5,num_positive=3
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Local_variables"/><h2>Local variables</h2>
|
|
|
|
<p/>Local variables are similar to out-of-stream variables, except that
|
|
their extent is limited to the expressions in which they appear (and their
|
|
basenames can’t be computed using square brackets).
|
|
There are three kinds of local variables: <b>arguments</b> to
|
|
functions/subroutines, <b>variables bound within for-loops</b>, and
|
|
<b>locals</b> defined within control blocks. They may be untyped using
|
|
<code>var</code>, or typed using <code>num</code>, <code>int</code>, <code>float</code>,
|
|
<code>str</code>, <code>bool</code>, and <code>map</code>.
|
|
|
|
<p/>For example:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ # Here I'm using a specified random-number seed so this example always
|
|
# produces the same output for this web document: in everyday practice we
|
|
# would leave off the --seed 12345 part.
|
|
mlr --seed 12345 seqgen --start 1 --stop 10 then put '
|
|
func f(a, b) { # function arguments a and b
|
|
r = 0.0; # local r scoped to the function
|
|
for (int i = 0; i < 6; i += 1) { # local i scoped to the for-loop
|
|
num u = urand(); # local u scoped to the for-loop
|
|
r += u; # updates r from the enclosing scope
|
|
}
|
|
r /= 6;
|
|
return a + (b - a) * r;
|
|
}
|
|
num o = f(10, 20); # local to the top-level scope
|
|
$o = o;
|
|
'
|
|
i=1,o=14.662901
|
|
i=2,o=17.881983
|
|
i=3,o=14.586560
|
|
i=4,o=16.402409
|
|
i=5,o=16.336598
|
|
i=6,o=14.622701
|
|
i=7,o=15.983753
|
|
i=8,o=13.852177
|
|
i=9,o=15.472899
|
|
i=10,o=15.643912
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>Things which are completely unsurprising, resembling many other languages:
|
|
|
|
<ul>
|
|
|
|
<li/> Parameter names are bound to their arguments but can be reassigned, e.g.
|
|
if there is a parameter named <code>a</code> then you can reassign the value of
|
|
<code>a</code> to be something else within the function if you like.
|
|
|
|
<li/> However, you cannot redeclare the <i>type</i> of an argument or a local:
|
|
<code>var a=1; var a=2</code> is an error but
|
|
<code>var a=1; a=2</code> is OK.
|
|
|
|
<li/> All argument-passing is positional rather than by name; arguments are
passed by value, not by reference. (This is also true for map-valued variables:
they are not, and cannot be, passed by reference.)
|
|
|
|
<li/> You can define locals (using <code>var</code>, <code>num</code>, etc.) at any
|
|
scope (if-statements, else-statements, while-loops, for-loops, or the top-level
|
|
scope), and nested scopes will have access (more details on scope in the next
|
|
section). If you define a local variable with the same name inside an inner
|
|
scope, then a new variable is created with the narrower scope.
|
|
|
|
<li/> If you assign to a local variable for the first time in a scope without
|
|
declaring it as <code>var</code>, <code>num</code>, etc. then: if it exists in an outer
|
|
scope, that outer-scope variable will be updated; if not, it will be defined in
|
|
the current scope as if <code>var</code> had been used. (See also <a
|
|
href="#Type-checking">here</a> for an example.) I recommend always declaring
|
|
variables explicitly to make the intended scoping clear.
|
|
|
|
<li/> Functions and subroutines never have access to locals from their caller
(unless passed by value as arguments).
|
|
|
|
</ul>
|
|
|
|
<p/>Things which are perhaps surprising compared to other languages:
|
|
|
|
<ul>
|
|
|
|
<li/> Type declarations, either untyped using <code>var</code> or typed using <code>num</code>,
<code>int</code>, <code>float</code>, <code>str</code>, <code>bool</code>, or <code>map</code>, are necessary to
declare local variables. Function arguments and variables bound in for-loops
over stream records and out-of-stream variables are <i>implicitly</i> declared
using <code>var</code>. (Some examples are shown below.)
|
|
|
|
<li/> Type-checking is done at assignment time. For example, <code>float f =
|
|
0</code> is an error (since <code>0</code> is an integer), as is <code>float f = 0.0; f
|
|
= 1</code>. For this reason I prefer to use <code>num</code> over <code>float</code> in
|
|
most contexts since <code>num</code> encompasses integer and floating-point values.
|
|
More information about type-checking is <a href="#Type-checking">here</a>.
|
|
|
|
<li/> Bound variables in for-loops over stream records and out-of-stream
variables are implicitly local to that block. E.g. in
<code>for (k, v in $*) { ... }</code> and
<code>for ((k1, k2), v in @*) { ... }</code>,
if there are <code>k</code>, <code>v</code>, etc. in the enclosing scope then those
will be masked by the loop-local bound variables in the loop, and moreover
the values of the loop-local bound variables are not available after the
end of the loop.
|
|
|
|
<li/> For C-style triple-for loops, if a for-loop variable is defined using
<code>var</code>, <code>int</code>, etc. then it is scoped to that for-loop, e.g.
<code>for (int i = 0; i < 10; i += 1) { ... }</code>. (This is unsurprising.)
If there is no typedecl, as in <code>for (i = 0; i < 10; i += 1) { ... }</code>,
and an outer-scope variable of that name exists, then it is used. (This is also
unsurprising.) But if there is no outer-scope variable of that name then the
variable is scoped to the for-loop only.
|
|
|
|
</ul>
|
|
|
|
<p/> The following example demonstrates the scope rules:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ cat data/scope-example.mlr
|
|
func f(a) { # argument is local to the function
|
|
var b = 100; # local to the function
|
|
c = 100; # local to the function; does not overwrite outer c
|
|
return a + 1;
|
|
}
|
|
var a = 10; # local at top level
|
|
var b = 20; # local at top level
|
|
c = 30; # local at top level; there is no more-outer-scope c
|
|
if (NR == 3) {
|
|
var a = 40; # scoped to the if-statement; doesn't overwrite outer a
|
|
b = 50; # not scoped to the if-statement; overwrites outer b
|
|
c = 60; # not scoped to the if-statement; overwrites outer c
|
|
d = 70; # there is no outer d so a local d is created here
|
|
|
|
$inner_a = a;
|
|
$inner_b = b;
|
|
$inner_c = c;
|
|
$inner_d = d;
|
|
}
|
|
$outer_a = a;
|
|
$outer_b = b;
|
|
$outer_c = c;
|
|
$outer_d = d; # there is no outer d defined so no assignment happens
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ cat data/scope-example.dat
|
|
n=1,x=123
|
|
n=2,x=456
|
|
n=3,x=789
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --oxtab --from data/scope-example.dat put -f data/scope-example.mlr
|
|
n 1
|
|
x 123
|
|
outer_a 10
|
|
outer_b 20
|
|
outer_c 30
|
|
|
|
n 2
|
|
x 456
|
|
outer_a 10
|
|
outer_b 20
|
|
outer_c 30
|
|
|
|
n 3
|
|
x 789
|
|
inner_a 40
|
|
inner_b 50
|
|
inner_c 60
|
|
inner_d 70
|
|
outer_a 10
|
|
outer_b 50
|
|
outer_c 60
|
|
</pre>
|
|
</div>
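<p/> The assignment rules seen in the output above can be modelled with a chain
of scopes. The following Python sketch is illustrative only: the
<code>Scope</code> class and its method names are hypothetical, and it covers
only the third record, where the if-statement body runs.

```python
class Scope:
    """One curly-braced scope with a link to its enclosing scope."""

    def __init__(self, parent=None):
        self.vars = {}
        self.parent = parent

    def declare(self, name, value):
        # Like 'var name = value': always creates the variable in this scope.
        if name in self.vars:
            raise NameError(f"{name} already declared in this scope")
        self.vars[name] = value

    def assign(self, name, value):
        # Like 'name = value': update the nearest enclosing definition,
        # or define it in this scope if no enclosing scope has one.
        scope = self
        while scope is not None:
            if name in scope.vars:
                scope.vars[name] = value
                return
            scope = scope.parent
        self.vars[name] = value

    def lookup(self, name):
        # None stands in for Miller's absent value.
        scope = self
        while scope is not None:
            if name in scope.vars:
                return scope.vars[name]
            scope = scope.parent
        return None


top = Scope()
top.declare("a", 10)
top.declare("b", 20)
top.assign("c", 30)

inner = Scope(parent=top)  # the body of 'if (NR == 3) { ... }'
inner.declare("a", 40)     # new inner a; outer a is untouched
inner.assign("b", 50)      # overwrites outer b
inner.assign("c", 60)      # overwrites outer c
inner.assign("d", 70)      # no outer d, so d is local to the inner scope

inner_values = {n: inner.lookup(n) for n in "abcd"}
outer_values = {n: top.lookup(n) for n in "abcd"}
print(inner_values)
print(outer_values)
```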
|
|
<p/>
|
|
|
|
<p/> And this example demonstrates the type-declaration rules:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ cat data/type-decl-example.mlr
|
|
subr s(a, str b, int c) { # a is implicitly var (untyped).
|
|
# b is explicitly str.
|
|
# c is explicitly int.
|
|
# The type-checking is done at the callsite
|
|
# when arguments are bound to parameters.
|
|
#
|
|
var b = 100; # error # Re-declaration in the same scope is disallowed.
|
|
int n = 10; # Declaration of variable local to the subroutine.
|
|
n = 20; # Assignment is OK.
|
|
int n = 30; # error # Re-declaration in the same scope is disallowed.
|
|
str n = "abc"; # error # Re-declaration in the same scope is disallowed.
|
|
#
|
|
float f1 = 1; # error # 1 is an int, not a float.
|
|
float f2 = 2.0; # 2.0 is a float.
|
|
num f3 = 3; # 3 is a num.
|
|
num f4 = 4.0; # 4.0 is a num.
|
|
} #
|
|
#
|
|
call s(1, 2, 3); # Type-assertion '3 is int' is done here at the callsite.
|
|
#
|
|
k = "def"; # Top-level variable k.
|
|
#
|
|
for (str k, v in $*) { # k and v are bound here, masking outer k.
|
|
print k . ":" . v; # k is explicitly str; v is implicitly var.
|
|
} #
|
|
#
|
|
print "k is".k; # k at this scope level is still "def".
|
|
print "v is".v; # v is undefined in this scope.
|
|
#
|
|
i = -1; #
|
|
for (i = 1, int j = 2; i <= 10; i += 1, j *= 2) { # C-style triple-for variables use enclosing scope, unless
|
|
# declared local: i is outer, j is local to the loop.
|
|
print "inner i =" . i; #
|
|
print "inner j =" . j; #
|
|
} #
|
|
print "outer i =" . i; # i has been modified by the loop.
|
|
print "outer j =" . j; # j is undefined in this scope.
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Map_literals"/><h2>Map literals</h2>
|
|
|
|
<p/>Miller’s <code>put</code>/<code>filter</code> DSL has four kinds of hashmaps.
|
|
<b>Stream records</b> are (single-level) maps from name to value.
|
|
<b>Out-of-stream variables</b> and <b>local variables</b> can also be maps,
|
|
although they can be multi-level hashmaps (e.g. <code>@sum[$x][$y]</code>). The
|
|
fourth kind is <b>map literals</b>. These cannot be on the left-hand side of
|
|
assignment expressions. Syntactically they look like JSON, although Miller
|
|
allows string and integer keys in its map literals while JSON allows only
|
|
string keys (e.g. <code>"3"</code> rather than <code>3</code>).
|
|
|
|
<p/> For example, the following swaps the input stream’s <code>a</code> and
|
|
<code>i</code> fields, modifies <code>y</code>, and drops the rest:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --opprint put '
|
|
$* = {
|
|
"a": $i,
|
|
"i": $a,
|
|
"y": $y * 10,
|
|
}
|
|
' data/small
|
|
a i y
|
|
1 pan 7.268029
|
|
2 eks 5.221511
|
|
3 wye 3.383185
|
|
4 eks 1.341887
|
|
5 wye 8.636245
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>Likewise, you can assign map literals to out-of-stream variables or local variables,
pass them as arguments to user-defined functions, return them from functions, and so on:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from data/small put '
|
|
func f(map m): map {
|
|
m["x"] *= 200;
|
|
return m;
|
|
}
|
|
$* = f({"a": $a, "x": $x});
|
|
'
|
|
a=pan,x=69.358029
|
|
a=eks,x=151.735993
|
|
a=wye,x=40.920661
|
|
a=eks,x=76.279879
|
|
a=wye,x=114.657784
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/> Like out-of-stream and local variables, map literals can be multi-level:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from data/small put -q '
|
|
begin {
|
|
@o = {
|
|
"nrec": 0,
|
|
"nkey": {"numeric":0, "non-numeric":0},
|
|
};
|
|
}
|
|
@o["nrec"] += 1;
|
|
for (k, v in $*) {
|
|
if (is_numeric(v)) {
|
|
@o["nkey"]["numeric"] += 1;
|
|
} else {
|
|
@o["nkey"]["non-numeric"] += 1;
|
|
}
|
|
}
|
|
end {
|
|
dump @o;
|
|
}
|
|
'
|
|
{
|
|
"nrec": 5,
|
|
"nkey": {
|
|
"numeric": 15,
|
|
"non-numeric": 10
|
|
}
|
|
}
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>By default, map-valued expressions are dumped using JSON formatting. If you
|
|
use <code>dump</code> to print a hashmap with integer keys and you don’t want
|
|
them double-quoted (JSON-style) then you can use <code>mlr put
|
|
--jknquoteint</code>. See also <code>mlr put --help</code>.
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Type-checking"/><h2>Type-checking</h2>
|
|
|
|
<p/> Miller&rsquo;s <code>put</code>/<code>filter</code> DSLs support two optional
kinds of type-checking. One is inline <b>type-tests</b> and
<b>type-assertions</b> within expressions. The other is <b>type
declarations</b> for assignments to local variables, binding of arguments to
user-defined functions, and return values from user-defined functions. These
are discussed in the following subsections.
|
|
|
|
<p/> Use of type-checking is entirely up to you: omit it if you want
|
|
flexibility with heterogeneous data; use it if you want to help catch
|
|
misspellings in your DSL code or unexpected irregularities in your input data.
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Type-test_and_type-assertion_expressions"/><h3>Type-test and type-assertion expressions</h3>
|
|
|
|
<p/> The following <code>is_...</code> functions take a value and return a boolean
indicating whether the argument is of the indicated type. The
<code>asserting_...</code> functions return their argument if it is of the specified
type, and cause a fatal error otherwise:
|
|
|
|
<table>
|
|
<tr>
|
|
<td>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr -F | grep ^is
|
|
is_absent
|
|
is_bool
|
|
is_boolean
|
|
is_empty
|
|
is_empty_map
|
|
is_float
|
|
is_int
|
|
is_map
|
|
is_nonempty_map
|
|
is_not_empty
|
|
is_not_map
|
|
is_not_null
|
|
is_null
|
|
is_numeric
|
|
is_present
|
|
is_string
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
</td>
|
|
<td>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr -F | grep ^assert
|
|
asserting_absent
|
|
asserting_bool
|
|
asserting_boolean
|
|
asserting_empty
|
|
asserting_empty_map
|
|
asserting_float
|
|
asserting_int
|
|
asserting_map
|
|
asserting_nonempty_map
|
|
asserting_not_empty
|
|
asserting_not_map
|
|
asserting_not_null
|
|
asserting_null
|
|
asserting_numeric
|
|
asserting_present
|
|
asserting_string
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
</td>
|
|
</tr>
|
|
</table>
|
|
|
|
<p/> Please see the <a href="cookbook.html#Data-cleaning_examples">Cookbook part 1</a> for examples
|
|
of how to use these.
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Type-declarations_for_local_variables,_function_parameter,_and_function_return_values"/><h3>Type-declarations for local variables, function parameters, and function return values</h3>
|
|
|
|
<p/> Local variables can be defined either untyped as in <code>x = 1</code>, or
|
|
typed as in <code>int x = 1</code>. Types include <b>var</b> (explicitly untyped),
|
|
<b>int</b>, <b>float</b>, <b>num</b> (int or float), <b>str</b>, <b>bool</b>,
|
|
and <b>map</b>. These optional type declarations are enforced at the time
|
|
values are assigned to variables: whether at the initial value assignment as in
|
|
<code>int x = 1</code> or in any subsequent assignments to the same variable
|
|
farther down in the scope.
|
|
|
|
<p/> The reason for <code>num</code> is that <code>int</code> and <code>float</code> typedecls are very precise:
|
|
|
|
<div class="pokipanel">
|
|
<pre>
|
|
float a = 0; # Runtime error since 0 is int not float
|
|
int b = 1.0; # Runtime error since 1.0 is float not int
|
|
num c = 0; # OK
|
|
num d = 1.0; # OK
|
|
</pre>
|
|
</div>
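<p/> To make the strictness concrete, here is a small Python model of
assignment-time checking. It is a sketch only, not Miller&rsquo;s implementation:
the helper names <code>check_type</code> and <code>accepts</code> are
hypothetical, and Python values stand in for Miller values.

```python
def check_type(typedecl, value):
    """Return value if it satisfies typedecl, else raise TypeError."""
    is_bool = isinstance(value, bool)
    # In Python, bool is a subclass of int, so exclude it explicitly.
    is_int = isinstance(value, int) and not is_bool
    is_float = isinstance(value, float)
    allowed = {
        "var": True,                   # explicitly untyped
        "int": is_int,
        "float": is_float,
        "num": is_int or is_float,     # int or float
        "str": isinstance(value, str),
        "bool": is_bool,
        "map": isinstance(value, dict),
    }
    if not allowed[typedecl]:
        raise TypeError(f"{value!r} is not of type {typedecl}")
    return value


def accepts(typedecl, value):
    """True if assigning value to a variable declared typedecl would pass."""
    try:
        check_type(typedecl, value)
        return True
    except TypeError:
        return False


print(accepts("float", 0))    # float a = 0
print(accepts("int", 1.0))    # int b = 1.0
print(accepts("num", 0))      # num c = 0
print(accepts("num", 1.0))    # num d = 1.0
```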
|
|
|
|
<p/> A suggestion is to use <code>num</code> for general use when you want numeric
|
|
content, and use <code>int</code> when you genuinely want integer-only values, e.g.
|
|
in loop indices or map keys (since Miller map keys can only be strings or
|
|
ints).
|
|
|
|
<p/> The <code>var</code> type declaration indicates no type restrictions, e.g.
<code>var x = 1</code> has the same type restrictions on <code>x</code> as <code>x =
1</code>. The difference is in intentional shadowing: if you have <code>x = 1</code>
in outer scope and <code>x = 2</code> in inner scope (e.g. within a for-loop or an
if-statement) then outer-scope <code>x</code> has value 2 after the second
assignment. But if you have <code>var x = 2</code> in the inner scope, then you
are declaring a variable scoped to the inner block. For example:
|
|
|
|
<div class="pokipanel">
|
|
<pre>
|
|
x = 1;
|
|
if (NR == 4) {
|
|
x = 2; # Refers to outer-scope x: value changes from 1 to 2.
|
|
}
|
|
print x; # Value of x is now two
|
|
</pre>
|
|
</div>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
x = 1;
|
|
if (NR == 4) {
|
|
var x = 2; # Defines a new inner-scope x with value 2
|
|
}
|
|
print x; # Value of this x is still 1
|
|
</pre>
|
|
</div>
|
|
|
|
<p/> Likewise function arguments can optionally be typed, with type enforced
|
|
when the function is called:
|
|
|
|
<div class="pokipanel">
|
|
<pre>
|
|
func f(map m, int i) {
|
|
...
|
|
}
|
|
$a = f({1:2, 3:4}, 5); # OK
|
|
$b = f({1:2, 3:4}, "abc"); # Runtime error
|
|
$c = f({1:2, 3:4}, $x); # Runtime error for records with non-integer field named x
|
|
</pre>
|
|
</div>
|
|
|
|
<p/> Thirdly, function return values can be type-checked at the point of
|
|
<code>return</code> using <code>:</code> and a typedecl after the parameter list:
|
|
|
|
<div class="pokipanel">
|
|
<pre>
|
|
func f(map m, int i): bool {
|
|
...
|
|
...
|
|
if (...) {
|
|
return "false"; # Runtime error if this branch is taken
|
|
}
|
|
...
|
|
...
|
|
if (...) {
|
|
return retval; # Runtime error if this function doesn't have an in-scope
|
|
# boolean-valued variable named retval
|
|
}
|
|
...
|
|
...
|
|
# In Miller if your functions don't explicitly return a value, they return absent-null.
|
|
# So it would also be a runtime error on reaching the end of this function without
|
|
# an explicit return statement.
|
|
}
|
|
</pre>
|
|
</div>
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Null_data:_empty_and_absent"/><h2>Null data: empty and absent</h2>
|
|
|
|
<p/> Please see
|
|
<a href="reference.html#Null_data:_empty_and_absent">here</a>.
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Aggregate_variable_assignments"/><h2>Aggregate variable assignments</h2>
|
|
|
|
<p/>There are three remaining kinds of variable assignment using out-of-stream
|
|
variables, the last two of which use the <code>$*</code> syntax:
|
|
<ul>
|
|
<li/> Recursive copy of out-of-stream variables
|
|
<li/> Out-of-stream variable assigned to full stream record
|
|
<li/> Full stream record assigned to an out-of-stream variable
|
|
</ul>
|
|
|
|
<p/> Example recursive copy of out-of-stream variables:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --opprint put -q '@v["sum"] += $x; @v["count"] += 1; end{dump; @w = @v; dump}' data/small
|
|
{
|
|
"v": {
|
|
"sum": 2.264762,
|
|
"count": 5
|
|
}
|
|
}
|
|
{
|
|
"v": {
|
|
"sum": 2.264762,
|
|
"count": 5
|
|
},
|
|
"w": {
|
|
"sum": 2.264762,
|
|
"count": 5
|
|
}
|
|
}
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>Example of out-of-stream variable assigned to full stream record, where the 2nd record is stashed, and the 4th record is overwritten with that:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put 'NR == 2 {@keep = $*}; NR == 4 {$* = @keep}' data/small
|
|
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>Example of full stream record assigned to an out-of-stream variable, finding
|
|
the record for which the <code>x</code> field has the largest value in the input
|
|
stream:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ cat data/small
|
|
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --opprint put -q 'is_null(@xmax) || $x > @xmax {@xmax=$x; @recmax=$*}; end {emit @recmax}' data/small
|
|
a b i x y
|
|
eks pan 2 0.7586799647899636 0.5221511083334797
|
|
</pre>
|
|
</div>
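<p/> The keep-the-best-record-so-far pattern in that one-liner can be sketched
in Python as follows. This is an analogy rather than Miller code: the records
are the five lines of <code>data/small</code> shown above, and
<code>None</code> stands in for an out-of-stream variable that has not been
assigned yet.

```python
# The five records of data/small, as shown above.
records = [
    {"a": "pan", "b": "pan", "i": 1, "x": 0.3467901443380824, "y": 0.7268028627434533},
    {"a": "eks", "b": "pan", "i": 2, "x": 0.7586799647899636, "y": 0.5221511083334797},
    {"a": "wye", "b": "wye", "i": 3, "x": 0.20460330576630303, "y": 0.33831852551664776},
    {"a": "eks", "b": "wye", "i": 4, "x": 0.38139939387114097, "y": 0.13418874328430463},
    {"a": "wye", "b": "pan", "i": 5, "x": 0.5732889198020006, "y": 0.8636244699032729},
]

xmax = None    # plays the role of @xmax before any assignment
recmax = None  # plays the role of @recmax

for rec in records:
    # Analogue of: is_null(@xmax) || $x > @xmax {@xmax=$x; @recmax=$*}
    if xmax is None or rec["x"] > xmax:
        xmax = rec["x"]
        recmax = dict(rec)  # copy the whole record, like @recmax = $*

# Analogue of: end {emit @recmax}
print(recmax)
```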
|
|
<p/>
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Keywords_for_filter_and_put"/><h2>Keywords for filter and put</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --help-all-keywords
|
|
all: used in "emit", "emitp", and "unset" as a synonym for @*
|
|
|
|
begin: defines a block of statements to be executed before input records
|
|
are ingested. The body statements must be wrapped in curly braces.
|
|
Example: 'begin { @count = 0 }'
|
|
|
|
bool: declares a boolean local variable in the current curly-braced scope.
|
|
Type-checking happens at assignment: 'bool b = 1' is an error.
|
|
|
|
break: causes execution to continue after the body of the current
|
|
for/while/do-while loop.
|
|
|
|
call: used for invoking a user-defined subroutine.
|
|
Example: 'subr s(k,v) { print k . " is " . v} call s("a", $a)'
|
|
|
|
continue: causes execution to skip the remaining statements in the body of
|
|
the current for/while/do-while loop. For-loop increments are still applied.
|
|
|
|
do: with "while", introduces a do-while loop. The body statements must be wrapped
|
|
in curly braces.
|
|
|
|
dump: prints all currently defined out-of-stream variables immediately
|
|
to stdout as JSON.
|
|
|
|
With >, >>, or |, the data do not become part of the output record stream but
|
|
are instead redirected.
|
|
|
|
The > and >> are for write and append, as in the shell, but (as with awk) the
|
|
file-overwrite for > is on first write, not per record. The | is for piping to
|
|
a process which will process the data. There will be one open file for each
|
|
distinct file name (for > and >>) or one subordinate process for each distinct
|
|
value of the piped-to command (for |). Output-formatting flags are taken from
|
|
the main command line.
|
|
|
|
Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump }'
|
|
Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump > "mytap.dat"}'
|
|
Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump >> "mytap.dat"}'
|
|
Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump | "jq .[]"}'
|
|
|
|
edump: prints all currently defined out-of-stream variables immediately
|
|
to stderr as JSON.
|
|
|
|
Example: mlr --from f.dat put -q '@v[NR]=$*; end { edump }'
|
|
|
|
elif: the way Miller spells "else if". The body statements must be wrapped
|
|
in curly braces.
|
|
|
|
else: terminates an if/elif/elif chain. The body statements must be wrapped
|
|
in curly braces.
|
|
|
|
emit: inserts an out-of-stream variable into the output record stream. Hashmap
|
|
indices present in the data but not slotted by emit arguments are not output.
|
|
|
|
With >, >>, or |, the data do not become part of the output record stream but
|
|
are instead redirected.
|
|
|
|
The > and >> are for write and append, as in the shell, but (as with awk) the
|
|
file-overwrite for > is on first write, not per record. The | is for piping to
|
|
a process which will process the data. There will be one open file for each
|
|
distinct file name (for > and >>) or one subordinate process for each distinct
|
|
value of the piped-to command (for |). Output-formatting flags are taken from
|
|
the main command line.
|
|
|
|
You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
|
|
etc., to control the format of the output if the output is redirected. See also mlr -h.
|
|
|
|
Example: mlr --from f.dat put 'emit > "/tmp/data-".$a, $*'
|
|
Example: mlr --from f.dat put 'emit > "/tmp/data-".$a, mapexcept($*, "a")'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @sums'
|
|
Example: mlr --from f.dat put --ojson '@sums[$a][$b]+=$x; emit > "tap-".$a.$b.".dat", @sums'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @sums, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit > "mytap.dat", @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit >> "mytap.dat", @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "gzip > mytap.dat.gz", @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit > stderr, @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "grep somepattern", @*, "index1", "index2"'
|
|
|
|
Please see http://johnkerl.org/miller/doc for more information.
|
|
|
|
emitf: inserts non-indexed out-of-stream variable(s) side-by-side into the
|
|
output record stream.
|
|
|
|
With >, >>, or |, the data do not become part of the output record stream but
|
|
are instead redirected.
|
|
|
|
The > and >> are for write and append, as in the shell, but (as with awk) the
|
|
file-overwrite for > is on first write, not per record. The | is for piping to
|
|
a process which will process the data. There will be one open file for each
|
|
distinct file name (for > and >>) or one subordinate process for each distinct
|
|
value of the piped-to command (for |). Output-formatting flags are taken from
|
|
the main command line.
|
|
|
|
You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
|
|
etc., to control the format of the output if the output is redirected. See also mlr -h.
|
|
|
|
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf @a'
|
|
Example: mlr --from f.dat put --oxtab '@a=$i;@b+=$x;@c+=$y; emitf > "tap-".$i.".dat", @a'
|
|
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf @a, @b, @c'
|
|
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf > "mytap.dat", @a, @b, @c'
|
|
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf >> "mytap.dat", @a, @b, @c'
|
|
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf > stderr, @a, @b, @c'
|
|
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern", @a, @b, @c'
|
|
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern > mytap.dat", @a, @b, @c'
|
|
|
|
Please see http://johnkerl.org/miller/doc for more information.
|
|
|
|
emitp: inserts an out-of-stream variable into the output record stream.
|
|
Hashmap indices present in the data but not slotted by emitp arguments are
|
|
output concatenated with ":".
|
|
|
|
With >, >>, or |, the data do not become part of the output record stream but
|
|
are instead redirected.
|
|
|
|
The > and >> are for write and append, as in the shell, but (as with awk) the
|
|
file-overwrite for > is on first write, not per record. The | is for piping to
|
|
a process which will process the data. There will be one open file for each
|
|
distinct file name (for > and >>) or one subordinate process for each distinct
|
|
value of the piped-to command (for |). Output-formatting flags are taken from
|
|
the main command line.
|
|
|
|
You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
|
|
etc., to control the format of the output if the output is redirected. See also mlr -h.
|
|
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @sums'
|
|
Example: mlr --from f.dat put --opprint '@sums[$a][$b]+=$x; emitp > "tap-".$a.$b.".dat", @sums'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @sums, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp > "mytap.dat", @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp >> "mytap.dat", @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "gzip > mytap.dat.gz", @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp > stderr, @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "grep somepattern", @*, "index1", "index2"'
|
|
|
|
Please see http://johnkerl.org/miller/doc for more information.
|
|
|
|
end: defines a block of statements to be executed after input records
|
|
are ingested. The body statements must be wrapped in curly braces.
|
|
Example: 'end { emit @count }'
|
|
Example: 'end { eprint "Final count is " . @count }'
|
|
|
|
eprint: prints expression immediately to stderr.
|
|
Example: mlr --from f.dat put -q 'eprint "The sum of x and y is ".($x+$y)'
|
|
Example: mlr --from f.dat put -q 'for (k, v in $*) { eprint k . " => " . v }'
|
|
Example: mlr --from f.dat put '(NR % 1000 == 0) { eprint "Checkpoint ".NR}'
|
|
|
|
eprintn: prints expression immediately to stderr, without trailing newline.
|
|
Example: mlr --from f.dat put -q 'eprintn "The sum of x and y is ".($x+$y); eprint ""'
|
|
|
|
false: the boolean literal value.
|
|
|
|
filter: includes/excludes the record in the output record stream.
|
|
|
|
Example: mlr --from f.dat put 'filter (NR == 2 || $x > 5.4)'
|
|
|
|
Instead of put with 'filter false' you can simply use put -q. The following
|
|
uses the input record to accumulate data but only prints the running sum
|
|
without printing the input record:
|
|
|
|
Example: mlr --from f.dat put -q '@running_sum += $x * $y; emit @running_sum'
|
|
|
|
float: declares a floating-point local variable in the current curly-braced scope.
|
|
Type-checking happens at assignment: 'float x = 0' is an error.
|
|
|
|
for: defines a for-loop using one of three styles. The body statements must
|
|
be wrapped in curly braces.
|
|
For-loop over stream record:
|
|
Example: 'for (k, v in $*) { ... }'
|
|
For-loop over out-of-stream variables:
|
|
Example: 'for (k, v in @counts) { ... }'
|
|
Example: 'for ((k1, k2), v in @counts) { ... }'
|
|
Example: 'for ((k1, k2, k3), v in @*) { ... }'
|
|
C-style for-loop:
|
|
Example: 'for (var i = 0, var b = 1; i < 10; i += 1, b *= 2) { ... }'
|
|
|
|
func: used for defining a user-defined function.
|
|
Example: 'func f(a,b) { return sqrt(a**2+b**2)} $d = f($x, $y)'
|
|
|
|
if: starts an if/elif/else chain. The body statements must be wrapped
|
|
in curly braces.
|
|
|
|
in: used in for-loops over stream records or out-of-stream variables.
|
|
|
|
int: declares an integer local variable in the current curly-braced scope.
|
|
Type-checking happens at assignment: 'int x = 0.0' is an error.
|
|
|
|
map: declares a map-valued local variable in the current curly-braced scope.
|
|
Type-checking happens at assignment: 'map b = 0' is an error. map b = {} is
|
|
always OK. map b = a is OK or not depending on whether a is a map.
|
|
|
|
num: declares an int/float local variable in the current curly-braced scope.
|
|
Type-checking happens at assignment: 'num b = true' is an error.
|
|
|
|
print: prints expression immediately to stdout.
|
|
Example: mlr --from f.dat put -q 'print "The sum of x and y is ".($x+$y)'
|
|
Example: mlr --from f.dat put -q 'for (k, v in $*) { print k . " => " . v }'
|
|
Example: mlr --from f.dat put '(NR % 1000 == 0) { print > stderr, "Checkpoint ".NR}'
|
|
|
|
printn: prints expression immediately to stdout, without trailing newline.
|
|
Example: mlr --from f.dat put -q 'printn "."; end { print "" }'
|
|
|
|
return: specifies the return value from a user-defined function.
|
|
Omitted return statements (including via if-branches) result in an absent-null
|
|
return value, which in turn results in a skipped assignment to an LHS.
|
|
|
|
stderr: Used for tee, emit, emitf, emitp, print, and dump in place of filename
|
|
to print to standard error.
|
|
|
|
stdout: Used for tee, emit, emitf, emitp, print, and dump in place of filename
|
|
to print to standard output.
|
|
|
|
str: declares a string local variable in the current curly-braced scope.
|
|
Type-checking happens at assignment.
|
|
|
|
subr: used for defining a subroutine.
|
|
Example: 'subr s(k,v) { print k . " is " . v} call s("a", $a)'
|
|
|
|
tee: prints the current record to specified file.
|
|
This is an immediate print to the specified file (except for pprint format
|
|
which of course waits until the end of the input stream to format all output).
|
|
|
|
The > and >> are for write and append, as in the shell, but (as with awk) the
|
|
file-overwrite for > is on first write, not per record. The | is for piping to
|
|
a process which will process the data. There will be one open file for each
|
|
distinct file name (for > and >>) or one subordinate process for each distinct
|
|
value of the piped-to command (for |). Output-formatting flags are taken from
|
|
the main command line.
|
|
|
|
You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
|
|
etc., to control the format of the output. See also mlr -h.
|
|
|
|
emit with redirect and tee with redirect are identical, except tee can only
|
|
output $*.
|
|
|
|
Example: mlr --from f.dat put 'tee > "/tmp/data-".$a, $*'
|
|
Example: mlr --from f.dat put 'tee >> "/tmp/data-".$a.$b, $*'
|
|
Example: mlr --from f.dat put 'tee > stderr, $*'
|
|
Example: mlr --from f.dat put -q 'tee | "tr [a-z\] [A-Z\]", $*'
|
|
Example: mlr --from f.dat put -q 'tee | "tr [a-z\] [A-Z\] > /tmp/data-".$a, $*'
|
|
Example: mlr --from f.dat put -q 'tee | "gzip > /tmp/data-".$a.".gz", $*'
|
|
Example: mlr --from f.dat put -q --ojson 'tee | "gzip > /tmp/data-".$a.".gz", $*'
|
|
|
|
true: the boolean literal value.
|
|
|
|
unset: clears field(s) from the current record, or an out-of-stream or local variable.
|
|
|
|
Example: mlr --from f.dat put 'unset $x'
|
|
Example: mlr --from f.dat put 'unset $*'
|
|
Example: mlr --from f.dat put 'for (k, v in $*) { if (k =~ "a.*") { unset $[k] } }'
|
|
Example: mlr --from f.dat put '...; unset @sums'
|
|
Example: mlr --from f.dat put '...; unset @sums["green"]'
|
|
Example: mlr --from f.dat put '...; unset @*'
|
|
|
|
var: declares an untyped local variable in the current curly-braced scope.
|
|
Examples: 'var a=1', 'var xyz=""'
|
|
|
|
while: introduces a while loop, or with "do", introduces a do-while loop.
|
|
The body statements must be wrapped in curly braces.
|
|
|
|
ENV: access to environment variables by name, e.g. '$home = ENV["HOME"]'
|
|
|
|
FILENAME: evaluates to the name of the current file being processed.
|
|
|
|
FILENUM: evaluates to the number of the current file being processed,
|
|
starting with 1.
|
|
|
|
FNR: evaluates to the number of the current record within the current file
|
|
being processed, starting with 1. Resets at the start of each file.
|
|
|
|
IFS: evaluates to the input field separator from the command line.
|
|
|
|
IPS: evaluates to the input pair separator from the command line.
|
|
|
|
IRS: evaluates to the input record separator from the command line,
|
|
or to LF or CRLF from the input data if in autodetect mode (which is
|
|
the default).
|
|
|
|
M_E: the mathematical constant e.
|
|
|
|
M_PI: the mathematical constant pi.
|
|
|
|
NF: evaluates to the number of fields in the current record.
|
|
|
|
NR: evaluates to the number of the current record over all files
|
|
being processed, starting with 1. Does not reset at the start of each file.
|
|
|
|
OFS: evaluates to the output field separator from the command line.
|
|
|
|
OPS: evaluates to the output pair separator from the command line.
|
|
|
|
ORS: evaluates to the output record separator from the command line,
|
|
or to LF or CRLF from the input data if in autodetect mode (which is
|
|
the default).
|
|
</pre>
|
|
</div>
|
|
<p/>
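<p/>The redirect rules stated above for <code>tee</code>, <code>emit</code>, <code>emitf</code>, and <code>emitp</code> (one open file per distinct file name, with the overwrite for <code>&gt;</code> happening on first write rather than per record) can be modeled roughly in Python. This is only a sketch of the described behavior, not Miller's implementation; the names <code>tee_by_field</code> and <code>format_dkvp</code> are made up for illustration.

```python
import os
import tempfile


def format_dkvp(record):
    """Render a record as comma-separated key=value pairs."""
    return ",".join(f"{key}={value}" for key, value in record.items())


def tee_by_field(records, directory, field="a"):
    """Rough model of: tee > "data-".$a, $*  (hypothetical helper)."""
    handles = {}  # one open file per distinct file name
    try:
        for record in records:
            path = os.path.join(directory, "data-" + str(record[field]))
            if path not in handles:
                # Truncation happens only here, on the first write to this name.
                handles[path] = open(path, "w")
            handles[path].write(format_dkvp(record) + "\n")
    finally:
        for handle in handles.values():
            handle.close()
    return sorted(handles)


records = [
    {"a": "pan", "x": 1},
    {"a": "eks", "x": 2},
    {"a": "pan", "x": 3},
]
with tempfile.TemporaryDirectory() as tmp:
    paths = tee_by_field(records, tmp)
    with open(os.path.join(tmp, "data-pan")) as handle:
        pan_lines = handle.read().splitlines()
```

<p/>Both <code>pan</code> records end up in the same file because the file is opened once and then reused, which is the point of the first-write rule.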
|
|
|
|
</div>
|
|
<!-- ================================================================ -->
|
|
<a id="Operator_precedence"/><h1>Operator precedence</h1>
|
|
<button style="font-weight:bold;color:maroon;border:0" padding=0 onclick="bodyToggler.toggle('body_section_toggle_operator_precedence');" href="javascript:;">Toggle section visibility</button>
|
|
<div id="body_section_toggle_operator_precedence" style="display: block">
|
|
|
|
<p/>Operators are listed in order of decreasing precedence, highest first.
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
Operators Associativity
|
|
--------- -------------
|
|
() left to right
|
|
** right to left
|
|
! ~ unary+ unary- & right to left
|
|
binary* / // % left to right
|
|
binary+ binary- . left to right
|
|
<< >> left to right
|
|
& left to right
|
|
^ left to right
|
|
| left to right
|
|
< <= > >= left to right
|
|
== != =~ !=~ left to right
|
|
&& left to right
|
|
^^ left to right
|
|
|| left to right
|
|
? : right to left
|
|
= N/A for Miller (there is no $a=$b=$c)
|
|
</pre>
|
|
</div>
|
|
<p/>
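<p/>To make the table concrete, here is a small precedence-climbing evaluator in Python covering only a few of its rows (<code>**</code>, unary minus, binary <code>*</code>, binary <code>+</code> and <code>-</code>). It is a sketch of how precedence and associativity as listed would group an expression, not Miller's parser, and all names in it are illustrative.

```python
import operator
import re

# operator -> (precedence, right_associative); a higher number binds tighter.
BINARY = {
    "**": (4, True),
    "*": (2, False),
    "+": (1, False),
    "-": (1, False),
}
UNARY_PRECEDENCE = 3  # unary minus sits below ** and above binary *
APPLY = {
    "**": operator.pow,
    "*": operator.mul,
    "+": operator.add,
    "-": operator.sub,
}


def tokenize(text):
    """Split into integers, operators, and parentheses; whitespace is skipped."""
    return re.findall(r"\d+|\*\*|[-+*()]", text)


def parse(tokens, min_precedence=1):
    """Evaluate tokens, consuming only operators at or above min_precedence."""
    left = parse_operand(tokens)
    while tokens and tokens[0] in BINARY:
        precedence, right_associative = BINARY[tokens[0]]
        if precedence < min_precedence:
            break
        op = tokens.pop(0)
        # Right-associative operators let the same level recurse on the right.
        next_min = precedence if right_associative else precedence + 1
        left = APPLY[op](left, parse(tokens, next_min))
    return left


def parse_operand(tokens):
    token = tokens.pop(0)
    if token == "(":
        value = parse(tokens)
        tokens.pop(0)  # discard the closing parenthesis
        return value
    if token == "-":
        return -parse(tokens, UNARY_PRECEDENCE)
    return int(token)


def evaluate(text):
    return parse(tokenize(text))
```

<p/>Under this reading of the table, <code>evaluate("2 ** 3 ** 2")</code> groups as <code>2 ** (3 ** 2)</code> and <code>evaluate("-2 ** 2")</code> groups as <code>-(2 ** 2)</code>.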
|
|
|
|
</div>
|
|
<!-- ================================================================ -->
|
|
<a id="Operator_and_function_semantics"/><h1>Operator and function semantics</h1>
|
|
<button style="font-weight:bold;color:maroon;border:0" padding=0 onclick="bodyToggler.toggle('body_section_toggle_operator_and_function_semantics');" href="javascript:;">Toggle section visibility</button>
|
|
<div id="body_section_toggle_operator_and_function_semantics" style="display: block">
|
|
|
|
<ul>
|
|
|
|
<li/> Functions are in general pass-throughs straight to the system-standard C
|
|
library.
|
|
|
|
<li/> The <code>min</code> and <code>max</code> functions are different from other
|
|
multi-argument functions which return null if any of their inputs are null: for
|
|
<code>min</code> and <code>max</code>, by contrast, if one argument is absent-null, the other
|
|
is returned. Empty-null loses min or max against numeric or boolean; empty-null
|
|
is less than any other string.
|
|
|
|
<li/> Symmetrically with respect to the bitwise OR, XOR, and AND operators
|
|
<code>|</code>, <code>^</code>, <code>&</code>, Miller has logical operators
|
|
<code>||</code>, <code>^^</code>, <code>&&</code>; of these, the logical XOR <code>^^</code> has no
|
|
counterpart in C.
|
|
|
|
<li/> The exponentiation operator <code>**</code> is familiar from many languages.
|
|
|
|
<li/> The regex-match and regex-not-match operators <code>=~</code> and
|
|
<code>!=~</code> are similar to those in Ruby and Perl.
|
|
|
|
</ul>
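<p/>The null-handling rules for <code>min</code> and <code>max</code> listed above can be modeled in a few lines of Python. This is a sketch under stated assumptions: <code>ABSENT</code> is a made-up sentinel standing in for absent-null, the empty string stands in for empty-null, and mixed string/number comparisons other than the empty-string case are not modeled.

```python
ABSENT = object()  # stand-in for absent-null, e.g. a field that is not present


def _is_number_or_bool(value):
    return isinstance(value, (bool, int, float))


def pick(a, b, chooser):
    """Apply min or max (the chooser) with the null rules described above."""
    if a is ABSENT:
        return b  # absent-null: the other argument is returned
    if b is ABSENT:
        return a
    if a == "" and _is_number_or_bool(b):
        return b  # empty-null loses against a number or boolean
    if b == "" and _is_number_or_bool(a):
        return a
    # Two numbers or two strings: the empty string sorts below any other string.
    return chooser(a, b)


def model_min(a, b):
    return pick(a, b, min)


def model_max(a, b):
    return pick(a, b, max)
```

<p/>For example, <code>model_max(ABSENT, 3)</code> and <code>model_min("", 3)</code> both return <code>3</code>, while <code>model_min("", "abc")</code> returns the empty string.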
|
|
|
|
</div>
|
|
<!-- ================================================================ -->
|
|
<a id="Control_structures"/><h1>Control structures</h1>
|
|
<button style="font-weight:bold;color:maroon;border:0" padding=0 onclick="bodyToggler.toggle('body_section_toggle_control_structures');" href="javascript:;">Toggle section visibility</button>
|
|
<div id="body_section_toggle_control_structures" style="display: block">
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Pattern-action_blocks"/><h2>Pattern-action blocks</h2>
|
|
|
|
<p/>These are reminiscent of <code>awk</code> syntax. They can be used to allow
|
|
assignments to be done only when appropriate — e.g. for math-function
|
|
domain restrictions, regex-matching, and so on:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr cat data/put-gating-example-1.dkvp
|
|
x=-1
|
|
x=0
|
|
x=1
|
|
x=2
|
|
x=3
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put '$x > 0.0 { $y = log10($x); $z = sqrt($y) }' data/put-gating-example-1.dkvp
|
|
x=-1
|
|
x=0
|
|
x=1,y=0.000000,z=0.000000
|
|
x=2,y=0.301030,z=0.548662
|
|
x=3,y=0.477121,z=0.690740
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr cat data/put-gating-example-2.dkvp
|
|
a=abc_123
|
|
a=some other name
|
|
a=xyz_789
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put '$a =~ "([a-z]+)_([0-9]+)" { $b = "left_\1"; $c = "right_\2" }' data/put-gating-example-2.dkvp
|
|
a=abc_123,b=left_abc,c=right_123
|
|
a=some other name
|
|
a=xyz_789,b=left_xyz,c=right_789
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>This produces heterogeneous output which Miller, of course, has no problems
|
|
with (see <a href="record-heterogeneity.html">Record-heterogeneity</a>). But if you
|
|
want homogeneous output, the curly braces can be replaced with a semicolon
|
|
between the expression and the body statements. This causes <code>put</code> to
|
|
evaluate the boolean expression (along with any side effects, namely,
|
|
regex-captures <code>\1</code>, <code>\2</code>, etc.) but doesn’t use it as a
|
|
criterion for whether subsequent assignments should be executed. Instead,
|
|
subsequent assignments are done unconditionally:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put '$x > 0.0; $y = log10($x); $z = sqrt($y)' data/put-gating-example-1.dkvp
|
|
x=-1,y=nan,z=nan
|
|
x=0,y=-inf,z=nan
|
|
x=1,y=0.000000,z=0.000000
|
|
x=2,y=0.301030,z=0.548662
|
|
x=3,y=0.477121,z=0.690740
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put '$a =~ "([a-z]+)_([0-9]+)"; $b = "left_\1"; $c = "right_\2"' data/put-gating-example-2.dkvp
|
|
a=abc_123,b=left_abc,c=right_123
|
|
a=some other name,b=left_,c=right_
|
|
a=xyz_789,b=left_xyz,c=right_789
|
|
</pre>
|
|
</div>
|
|
<p/>
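<p/>The difference between the curly-brace form and the semicolon form can be restated in Python. The sketch below models the regex example only; it assumes, as the output above suggests, that captures from a failed match interpolate as empty strings. The function names are illustrative and not part of Miller.

```python
import re

PATTERN = re.compile(r"([a-z]+)_([0-9]+)")


def gated(record):
    """Curly-brace form: the assignments run only when the pattern matches."""
    out = dict(record)
    match = PATTERN.search(out["a"])
    if match:
        out["b"] = "left_" + match.group(1)
        out["c"] = "right_" + match.group(2)
    return out


def ungated(record):
    """Semicolon form: the match is evaluated, then assignments always run."""
    out = dict(record)
    match = PATTERN.search(out["a"])
    first = match.group(1) if match else ""   # assumed: empty capture on no match
    second = match.group(2) if match else ""
    out["b"] = "left_" + first
    out["c"] = "right_" + second
    return out


def format_dkvp(record):
    return ",".join(f"{key}={value}" for key, value in record.items())


rows = [{"a": "abc_123"}, {"a": "some other name"}, {"a": "xyz_789"}]
gated_lines = [format_dkvp(gated(row)) for row in rows]
ungated_lines = [format_dkvp(ungated(row)) for row in rows]
```

<p/>Here <code>gated_lines</code> has a different set of keys on the middle record, while every record in <code>ungated_lines</code> has the keys <code>a</code>, <code>b</code>, and <code>c</code>.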
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="If-statements"/><h2>If-statements</h2>
|
|
|
|
<p/>These are again reminiscent of <code>awk</code>. Pattern-action blocks are a special case of <code>if</code> with no
|
|
<code>elif</code> or <code>else</code> blocks, no <code>if</code> keyword, and parentheses optional around the boolean expression:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
mlr put 'NR == 4 {$foo = "bar"}'
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
mlr put 'if (NR == 4) {$foo = "bar"}'
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>Compound statements use <code>elif</code> (rather than <code>elsif</code> or <code>else if</code>):
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
mlr put '
|
|
if (NR == 2) {
|
|
...
|
|
} elif (NR ==4) {
|
|
...
|
|
} elif (NR ==6) {
|
|
...
|
|
} else {
|
|
...
|
|
}
|
|
'
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="While_and_do-while_loops"/><h2>While and do-while loops</h2>
|
|
|
|
<p/>Miller’s <code>while</code> and <code>do-while</code> are unsurprising in
|
|
comparison to various languages, as are <code>break</code> and <code>continue</code>:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ echo x=1,y=2 | mlr put '
|
|
while (NF < 10) {
|
|
$[NF+1] = ""
|
|
}
|
|
$foo = "bar"
|
|
'
|
|
x=1,y=2,3=,4=,5=,6=,7=,8=,9=,10=,foo=bar
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ echo x=1,y=2 | mlr put '
|
|
do {
|
|
$[NF+1] = "";
|
|
if (NF == 5) {
|
|
break
|
|
}
|
|
} while (NF < 10);
|
|
$foo = "bar"
|
|
'
|
|
x=1,y=2,3=,4=,5=,foo=bar
|
|
</pre>
|
|
</div>
|
|
<p/>
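<p/>For readers more used to general-purpose languages, the two loops above correspond roughly to the following Python, with a dict standing in for the record and <code>len(record)</code> playing the role of <code>NF</code>. This is a sketch of the control flow only, and the function names are made up.

```python
def format_dkvp(record):
    return ",".join(f"{key}={value}" for key, value in record.items())


def pad_with_while(record):
    """Model of: while (NF < 10) { $[NF+1] = "" } $foo = "bar"."""
    record = dict(record)
    while len(record) < 10:
        record[str(len(record) + 1)] = ""
    record["foo"] = "bar"
    return record


def pad_with_do_while(record):
    """Model of the do-while version, which breaks out once NF reaches 5."""
    record = dict(record)
    while True:
        record[str(len(record) + 1)] = ""  # the body always runs at least once
        if len(record) == 5:
            break
        if not len(record) < 10:  # the do-while condition, tested after the body
            break
    record["foo"] = "bar"
    return record


while_line = format_dkvp(pad_with_while({"x": 1, "y": 2}))
do_while_line = format_dkvp(pad_with_do_while({"x": 1, "y": 2}))
```

<p/>Both loops terminate because the body itself changes the field count; a condition that depends only on something the body cannot change would not.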
|
|
|
|
<p/> A <code>break</code> or <code>continue</code> within nested conditional blocks or
|
|
if-statements will, of course, propagate to the innermost loop enclosing them,
|
|
if any. A <code>break</code> or <code>continue</code> outside a loop is a syntax error
|
|
that will be flagged as soon as the expression is parsed, before any input
|
|
records are ingested.
|
|
|
|
<p/> The existence of <code>while</code>, <code>do-while</code>, and <code>for</code> loops
|
|
in Miller’s DSL means that you can create infinite-loop scenarios
|
|
inadvertently. In particular, please recall that DSL statements are executed
|
|
once if in <code>begin</code> or <code>end</code> blocks, and once <i>per record</i>
|
|
otherwise. For example, <b><code>while (NR < 10)</code> will never terminate as
|
|
<code>NR</code> is only incremented between records</b>.
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="For-loops"/><h2>For-loops</h2>
|
|
|
|
<p/>While Miller’s <code>while</code> and <code>do-while</code> statements are
|
|
much as in many other languages, <code>for</code> loops are more idiosyncratic to
|
|
Miller. They are loops over key-value pairs, whether in stream records,
|
|
out-of-stream variables, local variables, or map-literals: more reminiscent of
|
|
<code>foreach</code>, as in (for example) PHP. There are <b>for-loops over map
|
|
keys</b> and <b>for-loops over key-value tuples</b>. Additionally, Miller has a
|
|
<b>C-style triple-for loop</b> with initialize, test, and update statements.
|
|
|
|
<p/>As with <code>while</code> and <code>do-while</code>, a <code>break</code> or
|
|
<code>continue</code> within nested control structures will propagate to the
|
|
innermost loop enclosing them, if any, and a <code>break</code> or
|
|
<code>continue</code> outside a loop is a syntax error that will be flagged as soon
|
|
as the expression is parsed, before any input records are ingested.
|
|
|
|
<a id="Key-only_for-loops"/><h3>Key-only for-loops </h3>
|
|
|
|
<p/>The <code>key</code> variable is always bound to the <i>key</i> of key-value pairs:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from data/small put '
|
|
print "NR = ".NR;
|
|
for (key in $*) {
|
|
value = $[key];
|
|
print " key:" . key . " value:".value;
|
|
}
|
|
|
|
'
|
|
NR = 1
|
|
key:a value:pan
|
|
key:b value:pan
|
|
key:i value:1
|
|
key:x value:0.346790
|
|
key:y value:0.726803
|
|
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
|
|
NR = 2
|
|
key:a value:eks
|
|
key:b value:pan
|
|
key:i value:2
|
|
key:x value:0.758680
|
|
key:y value:0.522151
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
|
|
NR = 3
|
|
key:a value:wye
|
|
key:b value:wye
|
|
key:i value:3
|
|
key:x value:0.204603
|
|
key:y value:0.338319
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
|
|
NR = 4
|
|
key:a value:eks
|
|
key:b value:wye
|
|
key:i value:4
|
|
key:x value:0.381399
|
|
key:y value:0.134189
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
|
|
NR = 5
|
|
key:a value:wye
|
|
key:b value:pan
|
|
key:i value:5
|
|
key:x value:0.573289
|
|
key:y value:0.863624
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr -n put '
|
|
end {
|
|
o = {1:2, 3:{4:5}};
|
|
for (key in o) {
|
|
print " key:" . key . " valuetype:" . typeof(o[key]);
|
|
}
|
|
}
|
|
'
|
|
key:1 valuetype:int
|
|
key:3 valuetype:map
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>Note that the value corresponding to a given key may be obtained through a
|
|
<b>computed field name</b> using square brackets as in <code>$[key]</code> for
|
|
stream records, or by indexing the looped-over variable using square brackets.
|
|
|
|
<a id="Key-value_for-loops"/><h3>Key-value for-loops </h3>
|
|
|
|
<p/>Single-level keys may be gotten at using either <code>for(k,v)</code> or
|
|
<code>for((k),v)</code>; multi-level keys may be gotten at using
|
|
<code>for((k1,k2,k3),v)</code> and so on. The <code>v</code> variable will be bound to
|
|
a scalar value (a string or a number) if the map stops at that level, or to
|
|
a map-valued variable if the map goes deeper. If the map isn’t deep
|
|
enough then the loop body won’t be executed.
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ cat data/for-srec-example.tbl
|
|
label1 label2 f1 f2 f3
|
|
blue green 100 240 350
|
|
red green 120 11 195
|
|
yellow blue 140 0 240
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --pprint --from data/for-srec-example.tbl put '
|
|
$sum1 = $f1 + $f2 + $f3;
|
|
$sum2 = 0;
|
|
$sum3 = 0;
|
|
for (key, value in $*) {
|
|
if (key =~ "^f[0-9]+") {
|
|
$sum2 += value;
|
|
$sum3 += $[key];
|
|
}
|
|
}
|
|
'
|
|
label1 label2 f1 f2 f3 sum1 sum2 sum3
|
|
blue green 100 240 350 690 690 690
|
|
red green 120 11 195 326 326 326
|
|
yellow blue 140 0 240 380 380 380
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from data/small --opprint put 'for (k,v in $*) { $[k."_type"] = typeof(v) }'
|
|
a b i x y a_type b_type i_type x_type y_type
|
|
pan pan 1 0.3467901443380824 0.7268028627434533 string string int float float
|
|
eks pan 2 0.7586799647899636 0.5221511083334797 string string int float float
|
|
wye wye 3 0.20460330576630303 0.33831852551664776 string string int float float
|
|
eks wye 4 0.38139939387114097 0.13418874328430463 string string int float float
|
|
wye pan 5 0.5732889198020006 0.8636244699032729 string string int float float
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>Note that the value of the current field in the for-loop can be gotten either using the bound
|
|
variable <code>value</code>, or through a <b>computed field name</b> using square brackets as in <code>$[key]</code>.
|
|
|
|
<p/>Important note: to avoid inconsistent looping behavior in case you’re
|
|
setting new fields (and/or unsetting existing ones) while looping over the
|
|
record, <b>Miller makes a copy of the record before the loop: loop variables
|
|
are bound from the copy and all other reads/writes involve the record
|
|
itself</b>:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from data/small --opprint put '
|
|
$sum1 = 0;
|
|
$sum2 = 0;
|
|
for (k,v in $*) {
|
|
if (is_numeric(v)) {
|
|
$sum1 +=v;
|
|
$sum2 += $[k];
|
|
}
|
|
}
|
|
'
|
|
a b i x y sum1 sum2
|
|
pan pan 1 0.3467901443380824 0.7268028627434533 2.073593 8.294372
|
|
eks pan 2 0.7586799647899636 0.5221511083334797 3.280831 13.123324
|
|
wye wye 3 0.20460330576630303 0.33831852551664776 3.542922 14.171687
|
|
eks wye 4 0.38139939387114097 0.13418874328430463 4.515588 18.062353
|
|
wye pan 5 0.5732889198020006 0.8636244699032729 6.436913 25.747654
|
|
</pre>
|
|
</div>
|
|
<p/>
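<p/>The reason <code>sum2</code> comes out four times as large as <code>sum1</code> is easier to see in a Python restatement. The sketch below models the copy-before-loop rule with an explicit snapshot; it illustrates the rule as described and is not Miller's code.

```python
def sum_numeric_fields(record):
    """Model of the loop above for a single record."""
    record = dict(record)
    record["sum1"] = 0
    record["sum2"] = 0
    snapshot = dict(record)  # the copy taken before the loop starts
    for key, value in snapshot.items():
        if isinstance(value, (int, float)):
            record["sum1"] += value        # v is bound from the copy
            record["sum2"] += record[key]  # $[k] reads the live record
    return record


first = sum_numeric_fields(
    {"a": "pan", "b": "pan", "i": 1,
     "x": 0.3467901443380824, "y": 0.7268028627434533}
)
```

<p/>The snapshot contains <code>sum1</code> and <code>sum2</code> with value 0, so they contribute nothing to <code>sum1</code>. Reading them from the live record, however, adds the running <code>sum1</code> and then the running <code>sum2</code> into <code>sum2</code>, doubling it twice.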
|
|
|
|
<p/>It can be confusing to modify the stream record while iterating over a copy of it, so
|
|
instead you might find it simpler to use a local variable in the loop and only update
|
|
the stream record after the loop:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from data/small --opprint put '
|
|
sum = 0;
|
|
for (k,v in $*) {
|
|
if (is_numeric(v)) {
|
|
sum += $[k];
|
|
}
|
|
}
|
|
$sum = sum
|
|
'
|
|
a b i x y sum
|
|
pan pan 1 0.3467901443380824 0.7268028627434533 2.073593
|
|
eks pan 2 0.7586799647899636 0.5221511083334797 3.280831
|
|
wye wye 3 0.20460330576630303 0.33831852551664776 3.542922
|
|
eks wye 4 0.38139939387114097 0.13418874328430463 4.515588
|
|
wye pan 5 0.5732889198020006 0.8636244699032729 6.436913
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>You can also start iterating on sub-hashmaps of an out-of-stream or local
|
|
variable; you can loop over nested keys; you can loop over all out-of-stream
|
|
variables. The bound variables are bound to a copy of the sub-hashmap as it
|
|
was before the loop started. The sub-hashmap is specified by square-bracketed
|
|
indices after <code>in</code>, and additional deeper indices are bound to loop
|
|
key-variables. The terminal values are bound to the loop value-variable
|
|
whenever the keys are not too shallow. The value-variable may refer to a
|
|
terminal (string, number) or it may be map-valued if the map goes deeper.
|
|
Example indexing is as follows:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
# Parentheses are optional for single key:
|
|
for (k1, v in @a["b"]["c"]) { ... }
|
|
for ((k1), v in @a["b"]["c"]) { ... }
|
|
# Parentheses are required for multiple keys:
|
|
for ((k1, k2), v in @a["b"]["c"]) { ... } # Loop over subhashmap of a variable
|
|
for ((k1, k2, k3), v in @a["b"]["c"]) { ... } # Ditto
|
|
for ((k1, k2, k3), v in @a) { ... } # Loop over variable starting from basename
|
|
for ((k1, k2, k3), v in @*) { ... } # Loop over all variables (k1 is bound to basename)
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>That’s confusing in the abstract, so a concrete example is in order.
|
|
Suppose the out-of-stream variable <code>@myvar</code> is populated as follows:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr -n put --jknquoteint -q '
|
|
begin {
|
|
@myvar = {
|
|
1: 2,
|
|
3: { 4 : 5 },
|
|
6: { 7: { 8: 9 } }
|
|
}
|
|
}
|
|
end { dump }
|
|
'
|
|
{
|
|
"myvar": {
|
|
1: 2,
|
|
3: {
|
|
4: 5
|
|
},
|
|
6: {
|
|
7: {
|
|
8: 9
|
|
}
|
|
}
|
|
}
|
|
}
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/> Then we can get at various values as follows:
|
|
|
|
<table><tr><td>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr -n put --jknquoteint -q '
|
|
begin {
|
|
@myvar = {
|
|
1: 2,
|
|
3: { 4 : 5 },
|
|
6: { 7: { 8: 9 } }
|
|
}
|
|
}
|
|
end {
|
|
for (k, v in @myvar) {
|
|
print
|
|
"key=" . k .
|
|
",valuetype=" . typeof(v);
|
|
}
|
|
}
|
|
'
|
|
key=1,valuetype=int
|
|
key=3,valuetype=map
|
|
key=6,valuetype=map
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
</td><td>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr -n put --jknquoteint -q '
|
|
begin {
|
|
@myvar = {
|
|
1: 2,
|
|
3: { 4 : 5 },
|
|
6: { 7: { 8: 9 } }
|
|
}
|
|
}
|
|
end {
|
|
for ((k1, k2), v in @myvar) {
|
|
print
|
|
"key1=" . k1 .
|
|
",key2=" . k2 .
|
|
",valuetype=" . typeof(v);
|
|
}
|
|
}
|
|
'
|
|
key1=3,key2=4,valuetype=int
|
|
key1=6,key2=7,valuetype=map
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
</td><td>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr -n put --jknquoteint -q '
|
|
begin {
|
|
@myvar = {
|
|
1: 2,
|
|
3: { 4 : 5 },
|
|
6: { 7: { 8: 9 } }
|
|
}
|
|
}
|
|
end {
|
|
for ((k1, k2), v in @myvar[6]) {
|
|
print
|
|
"key1=" . k1 .
|
|
",key2=" . k2 .
|
|
",valuetype=" . typeof(v);
|
|
}
|
|
}
|
|
'
|
|
key1=7,key2=8,valuetype=int
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
</td></tr></table>
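<p/>The multi-key binding rule shown in the three examples above can be sketched as a short recursive generator in Python. The function name <code>walk</code> is made up for illustration; it yields one tuple of keys per loop iteration and skips branches that are too shallow, which is the behavior the outputs above show.

```python
def walk(mapping, depth):
    """Yield (keys, value) pairs exactly `depth` levels down.

    Branches that end before reaching `depth` produce no iterations.
    """
    for key, value in mapping.items():
        if depth == 1:
            yield (key,), value
        elif isinstance(value, dict):
            for keys, leaf in walk(value, depth - 1):
                yield (key,) + keys, leaf


myvar = {1: 2, 3: {4: 5}, 6: {7: {8: 9}}}

one_key = list(walk(myvar, 1))        # like: for (k, v in @myvar)
two_keys = list(walk(myvar, 2))       # like: for ((k1, k2), v in @myvar)
two_keys_sub = list(walk(myvar[6], 2))  # like: for ((k1, k2), v in @myvar[6])
```

<p/>With two keys, the entry <code>1: 2</code> is skipped because it is only one level deep, and the value bound for keys <code>(6, 7)</code> is still a map.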
|
|
|
|
<a id="C-style_triple-for_loops"/><h3>C-style triple-for loops</h3>
|
|
|
|
<p/> These are supported as follows:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from data/small --opprint put '
|
|
num suma = 0;
|
|
for (a = 1; a <= NR; a += 1) {
|
|
suma += a;
|
|
}
|
|
$suma = suma;
|
|
'
|
|
a b i x y suma
|
|
pan pan 1 0.3467901443380824 0.7268028627434533 1
|
|
eks pan 2 0.7586799647899636 0.5221511083334797 3
|
|
wye wye 3 0.20460330576630303 0.33831852551664776 6
|
|
eks wye 4 0.38139939387114097 0.13418874328430463 10
|
|
wye pan 5 0.5732889198020006 0.8636244699032729 15
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from data/small --opprint put '
|
|
num suma = 0;
|
|
num sumb = 0;
|
|
for (num a = 1, num b = 1; a <= NR; a += 1, b *= 2) {
|
|
suma += a;
|
|
sumb += b;
|
|
}
|
|
$suma = suma;
|
|
$sumb = sumb;
|
|
'
|
|
a b i x y suma sumb
|
|
pan pan 1 0.3467901443380824 0.7268028627434533 1 1
|
|
eks pan 2 0.7586799647899636 0.5221511083334797 3 3
|
|
wye wye 3 0.20460330576630303 0.33831852551664776 6 7
|
|
eks wye 4 0.38139939387114097 0.13418874328430463 10 15
|
|
wye pan 5 0.5732889198020006 0.8636244699032729 15 31
|
|
</pre>
|
|
</div>
|
|
<p/>
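<p/>The second example above can be restated as a Python <code>while</code> loop, which spells out where the start, continuation, and update steps run. This is a sketch of the control flow for one record with a given <code>NR</code>; the function name is illustrative.

```python
def sums_for(nr):
    """Model of: for (num a = 1, num b = 1; a <= NR; a += 1, b *= 2) {...}."""
    suma = 0
    sumb = 0
    a, b = 1, 1        # start statements, run once
    while a <= nr:     # continuation, tested before each pass
        suma += a      # body
        sumb += b
        a += 1         # update statements, run after each pass
        b *= 2
    return suma, sumb


per_record = [sums_for(nr) for nr in range(1, 6)]
```

<p/>The pairs in <code>per_record</code> line up with the <code>suma</code> and <code>sumb</code> columns in the output above for records 1 through 5.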
|
|
|
|
Notes:
|
|
<ul>
|
|
|
|
<li/> In <code>for (start; continuation; update) { body }</code>, the start,
|
|
continuation, and update statements may be empty, single statements, or
|
|
multiple comma-separated statements. If the continuation is empty (e.g. <code>for(i=1;;i+=1)</code>) it defaults
|
|
to true.
|
|
|
|
<li/> In particular, you may use <code>$</code>-variables and/or
|
|
<code>@</code>-variables in the start, continuation, and/or update steps (as well
|
|
as the body, of course).
|
|
|
|
<li/> The typedecls such as <code>int</code> or <code>num</code> are optional. If a
|
|
typedecl is provided (for a local variable), it binds a variable scoped to the
|
|
for-loop regardless of whether a same-name variable is present in outer scope.
|
|
If a typedecl is not provided, then the variable is scoped to the for-loop if
|
|
no same-name variable is present in outer scope, or if a same-name variable is
|
|
present in outer scope then it is modified.
|
|
|
|
<li/> Miller has no <code>++</code> or <code>--</code> operators.
|
|
|
|
<li/> As with all for/if/while statements in Miller, the curly braces are
|
|
required even if the body is a single statement, or empty.
|
|
|
|
</ul>
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Begin/end_blocks"/><h2>Begin/end blocks</h2>
|
|
|
|
<p/>Miller supports an <code>awk</code>-like <code>begin/end</code> syntax. The
|
|
statements in the <code>begin</code> block are executed before any input records
|
|
are read; the statements in the <code>end</code> block are executed after the last
|
|
input record is read. (If you want to execute some statement at the start of
|
|
each file, not at the start of the first file as with <code>begin</code>, you might
|
|
use a pattern/action block of the form <code>FNR == 1 { ... }</code>.) All
|
|
statements outside of <code>begin</code> or <code>end</code> are, of course, executed
|
|
on every input record. Semicolons separate statements inside or outside of
|
|
begin/end blocks; semicolons are required between begin/end block bodies and
|
|
any subsequent statement. For example:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put '
|
|
begin { @x_sum = 0 };
|
|
@x_sum += $x;
|
|
end { emit @x_sum }
|
|
' ../data/small
|
|
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
|
|
a=zee,b=pan,i=6,x=0.5271261600918548,y=0.49322128674835697
|
|
a=eks,b=zee,i=7,x=0.6117840605678454,y=0.1878849191181694
|
|
a=zee,b=wye,i=8,x=0.5985540091064224,y=0.976181385699006
|
|
a=hat,b=wye,i=9,x=0.03144187646093577,y=0.7495507603507059
|
|
a=pan,b=wye,i=10,x=0.5026260055412137,y=0.9526183602969864
|
|
x_sum=4.536294
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>Since uninitialized out-of-stream variables default to 0 for
|
|
addition/subtraction and 1 for multiplication when they appear on expression
|
|
right-hand sides (as in <code>awk</code>), the above can be written more succinctly
|
|
as
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put '
|
|
@x_sum += $x;
|
|
end { emit @x_sum }
|
|
' ../data/small
|
|
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
|
|
a=zee,b=pan,i=6,x=0.5271261600918548,y=0.49322128674835697
|
|
a=eks,b=zee,i=7,x=0.6117840605678454,y=0.1878849191181694
|
|
a=zee,b=wye,i=8,x=0.5985540091064224,y=0.976181385699006
|
|
a=hat,b=wye,i=9,x=0.03144187646093577,y=0.7495507603507059
|
|
a=pan,b=wye,i=10,x=0.5026260055412137,y=0.9526183602969864
|
|
x_sum=4.536294
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>The <b>put -q</b> option is a shorthand which suppresses printing of each
|
|
output record, with only <code>emit</code> statements being output. So to get only
|
|
summary outputs, one could write
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '
|
|
@x_sum += $x;
|
|
end { emit @x_sum }
|
|
' ../data/small
|
|
x_sum=4.536294
|
|
</pre>
|
|
</div>
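<p/>The two behaviors just described, an unset out-of-stream variable acting as
0 under <code>+=</code> and <code>put -q</code> suppressing per-record output,
can be modeled outside Miller. The following is a minimal Python sketch under
stated assumptions, not Miller's implementation: the function name
<code>run_put</code> and the dict-based records are hypothetical, and it is
assumed that nothing is emitted when no records were read.

<div class="pokipanel">
<pre>
```python
def run_put(records, quiet=False):
    """Model of: mlr put [-q] '@x_sum += $x; end { emit @x_sum }'."""
    oosvars = {}   # out-of-stream variables persist across records
    output = []
    for record in records:
        # An unset @x_sum behaves like 0 on the right-hand side of +=.
        oosvars["x_sum"] = oosvars.get("x_sum", 0) + record["x"]
        if not quiet:
            # Without -q, each input record is also passed downstream.
            output.append(record)
    # The end block runs once, after the last input record.
    # Assumption: if @x_sum was never set, no summary record is produced.
    if "x_sum" in oosvars:
        output.append({"x_sum": oosvars["x_sum"]})
    return output


if __name__ == "__main__":
    data = [{"i": 1, "x": 1.5}, {"i": 2, "x": 2.5}]
    print(run_put(data))              # both records, then the summary record
    print(run_put(data, quiet=True))  # the summary record only
```
</pre>
</div>

<p/>The <code>quiet</code> flag plays the role of <code>-q</code>: the
accumulation is identical in both cases, and only the per-record output differs.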
|
|
<p/>
|
|
|
|
<p/>We can do similarly with multiple out-of-stream variables:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '
|
|
@x_count += 1;
|
|
@x_sum += $x;
|
|
end {
|
|
emit @x_count;
|
|
emit @x_sum;
|
|
}
|
|
' ../data/small
|
|
x_count=10
|
|
x_sum=4.536294
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
This is of course not much different than
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr stats1 -a count,sum -f x ../data/small
|
|
x_count=10,x_sum=4.536294
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>Note that it’s a syntax error for begin/end blocks to refer to field
|
|
names (beginning with <code>$</code>), since these execute outside the context of
|
|
input records.
|
|
|
|
</div>
|
|
<!-- ================================================================ -->
|
|
<a id="Output_statements"/><h1>Output statements</h1>
|
|
<button style="font-weight:bold;color:maroon;border:0" padding=0 onclick="bodyToggler.toggle('body_section_toggle_output_statements');" href="javascript:;">Toggle section visibility</button>
|
|
<div id="body_section_toggle_output_statements" style="display: block">
|
|
|
|
<p/>You can <b>output</b> variable-values or expressions in <b>five ways</b>:
|
|
|
|
<ul>
|
|
|
|
<li/> <b>Assign</b> them to stream-record fields. For example,
|
|
<code>$cumulative_sum = @sum</code>. For another example, <code>$nr = NR</code> adds a
|
|
field named <code>nr</code> to each output record, containing the value of the
|
|
built-in variable <code>NR</code> as of when that record was ingested.
|
|
|
|
<li/> Use the <b>print</b> or <b>eprint</b> keywords which immediately print an
|
|
expression <i>directly to standard output or standard error</i>, respectively.
|
|
Note that <code>dump</code>, <code>edump</code>, <code>print</code>, and <code>eprint</code>
|
|
don’t output records which participate in <code>then</code>-chaining; rather,
|
|
they’re just immediate prints to stdout/stderr. The <code>printn</code> and
|
|
<code>eprintn</code> keywords are the same except that they don’t print final
|
|
newlines. Additionally, you can print to a specified file instead of
|
|
stdout/stderr.
|
|
|
|
<li/> Use the <b>dump</b> or <b>edump</b> keywords, which <i>immediately print
|
|
all out-of-stream variables as a JSON data structure to the standard output or
|
|
standard error</i> (respectively).
|
|
|
|
<li/> Use <b>tee</b>, which writes the current stream record (not just an
|
|
arbitrary string as with <b>print</b>) to a specific file.
|
|
|
|
<li/> Use <b>emit</b>/<b>emitp</b>/<b>emitf</b> to send out-of-stream
|
|
variables’ current values to the output record stream, e.g. <code>@sum +=
|
|
$x; emit @sum</code> which produces an extra output record such as
|
|
<code>sum=3.1648382</code>.
|
|
|
|
</ul>
|
|
|
|
<p/>With assignment and <b>emit</b> (the first and last options) you are populating the output-records stream
|
|
which feeds into the next verb in a <code>then</code>-chain (if any), or which otherwise
|
|
is formatted for output using <code>--o...</code> flags.
|
|
|
|
<p/>With <b>print</b>, <b>dump</b>, and <b>tee</b> (the middle three options) you are sending output directly to standard
|
|
output, standard error, or a file.
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Print_statements"/><h2>Print statements</h2>
|
|
|
|
<p/>The <code>print</code> statement is perhaps self-explanatory, but with a few
|
|
light caveats:
|
|
|
|
<ul>
|
|
|
|
<li/> There are four variants: <code>print</code> goes to stdout with final
|
|
newline, <code>printn</code> goes to stdout without final newline (you can include
|
|
one using "\n" in your output string), <code>eprint</code> goes to stderr with
|
|
final newline, and <code>eprintn</code> goes to stderr without final newline.
|
|
|
|
<li/> Output goes directly to stdout/stderr, respectively: data produced this
|
|
way do not go downstream to the next verb in a <code>then</code>-chain. (Use
|
|
<code>emit</code> for that.)
|
|
|
|
<li/> Print statements are for strings (<code>print "hello"</code>), or things
|
|
which can be made into strings: numbers (<code>print 3</code>, <code>print $a +
|
|
$b</code>), or concatenations thereof (<code>print "a + b = " . ($a + $b)</code>).
|
|
Maps (in <code>$*</code>, map-valued out-of-stream or local variables, and map
|
|
literals) aren’t convertible into strings. If you print a map, you get
|
|
<code>{is-a-map}</code> as output. Please use <code>dump</code> to print maps.
|
|
|
|
<li/>You can redirect print output to a file:
|
|
<code>mlr --from myfile.dat put 'print > "tap.txt", $x'</code> or
|
|
<code>mlr --from myfile.dat put 'o=$*; print > $a.".txt", $x'</code>.
|
|
|
|
<li/> See also the <a href="#Redirected-output_statements">section on redirected output</a> for examples.
|
|
|
|
</ul>
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Dump_statements"/><h2>Dump statements</h2>
|
|
|
|
<p/>The <code>dump</code> statement is for printing expressions, including maps,
|
|
directly to stdout or stderr:
|
|
|
|
<ul>
|
|
|
|
<li/> There are two variants: <code>dump</code> prints to stdout; <code>edump</code>
|
|
prints to stderr.
|
|
|
|
<li/> Output goes directly to stdout/stderr, respectively: data produced this
|
|
way do not go downstream to the next verb in a <code>then</code>-chain. (Use
|
|
<code>emit</code> for that.)
|
|
|
|
<li/> You can use <code>dump</code> to output single strings, numbers,
|
|
or expressions including map-valued data. Map-valued data are printed
|
|
as JSON. Miller allows string and integer keys in its map literals while
|
|
JSON allows only string keys, so use <code>mlr put --jknquoteint</code> if
|
|
you want integer-valued map keys not double-quoted.
|
|
|
|
<li/> If you use <code>dump</code> (or <code>edump</code>) with no arguments, you get a
|
|
JSON structure representing the current values of all out-of-stream variables.
|
|
|
|
<li/> As with <code>print</code>, you can redirect output to files.
|
|
|
|
<li/> See also the <a href="#Redirected-output_statements">section on redirected output</a> for examples.
|
|
|
|
</ul>
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Tee_statements"/><h2>Tee statements</h2>
|
|
|
|
<p/> Records produced by a <code>mlr put</code> go downstream to the next verb in
|
|
your <code>then</code>-chain, if any, or otherwise to standard output. If you want
|
|
to additionally copy out records to files, you can do that using <code>tee</code>.
|
|
|
|
<p/>The syntax is, by example, <code>mlr --from myfile.dat put 'tee >
|
|
"tap.dat", $*' then sort -n index</code>. First is <code>tee ></code>, then the
|
|
filename expression (which can be an expression such as
|
|
<code>"tap.".$a.".dat"</code>), then a comma, then <code>$*</code>. (Nothing else but
|
|
<code>$*</code> is teeable.)
|
|
|
|
<p/> See also the <a href="#Redirected-output_statements">section on redirected
|
|
output</a> for examples.
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Redirected-output_statements"/><h2>Redirected-output statements</h2>
|
|
|
|
The <b>print</b>, <b>dump</b>, <b>tee</b>, <b>emitf</b>, <b>emit</b>, and
|
|
<b>emitp</b> keywords all allow you to redirect output to one or more files or
|
|
pipe-to commands. The filenames/commands are strings which can be constructed
|
|
using record-dependent values, so you can do things like splitting a table into
|
|
multiple files, one for each account ID, and so on.
|
|
|
|
<p/> Details:
|
|
|
|
<ul>
|
|
|
|
<li/> The <code>print</code> and <code>dump</code> keywords produce output immediately
|
|
to standard output, or to specified file(s) or pipe-to command if present.
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --help-keyword print
|
|
print: prints expression immediately to stdout.
|
|
Example: mlr --from f.dat put -q 'print "The sum of x and y is ".($x+$y)'
|
|
Example: mlr --from f.dat put -q 'for (k, v in $*) { print k . " => " . v }'
|
|
Example: mlr --from f.dat put '(NR % 1000 == 0) { print > stderr, "Checkpoint ".NR}'
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --help-keyword dump
|
|
dump: prints all currently defined out-of-stream variables immediately
|
|
to stdout as JSON.
|
|
|
|
With >, >>, or |, the data do not become part of the output record stream but
|
|
are instead redirected.
|
|
|
|
The > and >> are for write and append, as in the shell, but (as with awk) the
|
|
file-overwrite for > is on first write, not per record. The | is for piping to
|
|
a process which will process the data. There will be one open file for each
|
|
distinct file name (for > and >>) or one subordinate process for each distinct
|
|
value of the piped-to command (for |). Output-formatting flags are taken from
|
|
the main command line.
|
|
|
|
Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump }'
|
|
Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump > "mytap.dat"}'
|
|
Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump >> "mytap.dat"}'
|
|
Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump | "jq .[]"}'
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<li/> <code>mlr put</code> sends the current record (possibly modified by the
|
|
<code>put</code> expression) to the output record stream. Records are then input to
|
|
the following verb in a <code>then</code>-chain (if any), else printed to standard
|
|
output (unless <code>put -q</code>). The <b>tee</b> keyword <i>additionally</i>
|
|
writes the output record to specified file(s) or pipe-to command, or
|
|
immediately to <code>stdout</code>/<code>stderr</code>.
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --help-keyword tee
|
|
tee: prints the current record to specified file.
|
|
This is an immediate print to the specified file (except for pprint format
|
|
which of course waits until the end of the input stream to format all output).
|
|
|
|
The > and >> are for write and append, as in the shell, but (as with awk) the
|
|
file-overwrite for > is on first write, not per record. The | is for piping to
|
|
a process which will process the data. There will be one open file for each
|
|
distinct file name (for > and >>) or one subordinate process for each distinct
|
|
value of the piped-to command (for |). Output-formatting flags are taken from
|
|
the main command line.
|
|
|
|
You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
|
|
etc., to control the format of the output. See also mlr -h.
|
|
|
|
emit with redirect and tee with redirect are identical, except tee can only
|
|
output $*.
|
|
|
|
Example: mlr --from f.dat put 'tee > "/tmp/data-".$a, $*'
|
|
Example: mlr --from f.dat put 'tee >> "/tmp/data-".$a.$b, $*'
|
|
Example: mlr --from f.dat put 'tee > stderr, $*'
|
|
Example: mlr --from f.dat put -q 'tee | "tr [a-z\] [A-Z\]", $*'
|
|
Example: mlr --from f.dat put -q 'tee | "tr [a-z\] [A-Z\] > /tmp/data-".$a, $*'
|
|
Example: mlr --from f.dat put -q 'tee | "gzip > /tmp/data-".$a.".gz", $*'
|
|
Example: mlr --from f.dat put -q --ojson 'tee | "gzip > /tmp/data-".$a.".gz", $*'
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<li/> <code>mlr put</code>’s <code>emitf</code>, <code>emitp</code>, and
|
|
<code>emit</code> send out-of-stream variables to the output record stream. These
|
|
are then input to the following verb in a <code>then</code>-chain (if any), else
|
|
printed to standard output. When redirected with <code>></code>,
|
|
<code>>></code>, or <code>|</code>, they <i>instead</i> write the out-of-stream
|
|
variable(s) to specified file(s) or pipe-to command, or immediately to
|
|
<code>stdout</code>/<code>stderr</code>.
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --help-keyword emitf
|
|
emitf: inserts non-indexed out-of-stream variable(s) side-by-side into the
|
|
output record stream.
|
|
|
|
With >, >>, or |, the data do not become part of the output record stream but
|
|
are instead redirected.
|
|
|
|
The > and >> are for write and append, as in the shell, but (as with awk) the
|
|
file-overwrite for > is on first write, not per record. The | is for piping to
|
|
a process which will process the data. There will be one open file for each
|
|
distinct file name (for > and >>) or one subordinate process for each distinct
|
|
value of the piped-to command (for |). Output-formatting flags are taken from
|
|
the main command line.
|
|
|
|
You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
|
|
etc., to control the format of the output if the output is redirected. See also mlr -h.
|
|
|
|
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf @a'
|
|
Example: mlr --from f.dat put --oxtab '@a=$i;@b+=$x;@c+=$y; emitf > "tap-".$i.".dat", @a'
|
|
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf @a, @b, @c'
|
|
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf > "mytap.dat", @a, @b, @c'
|
|
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf >> "mytap.dat", @a, @b, @c'
|
|
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf > stderr, @a, @b, @c'
|
|
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern", @a, @b, @c'
|
|
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern > mytap.dat", @a, @b, @c'
|
|
|
|
Please see http://johnkerl.org/miller/doc for more information.
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --help-keyword emitp
|
|
emitp: inserts an out-of-stream variable into the output record stream.
|
|
Hashmap indices present in the data but not slotted by emitp arguments are
|
|
output concatenated with ":".
|
|
|
|
With >, >>, or |, the data do not become part of the output record stream but
|
|
are instead redirected.
|
|
|
|
The > and >> are for write and append, as in the shell, but (as with awk) the
|
|
file-overwrite for > is on first write, not per record. The | is for piping to
|
|
a process which will process the data. There will be one open file for each
|
|
distinct file name (for > and >>) or one subordinate process for each distinct
|
|
value of the piped-to command (for |). Output-formatting flags are taken from
|
|
the main command line.
|
|
|
|
You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
|
|
etc., to control the format of the output if the output is redirected. See also mlr -h.
|
|
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @sums'
|
|
Example: mlr --from f.dat put --opprint '@sums[$a][$b]+=$x; emitp > "tap-".$a.$b.".dat", @sums'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @sums, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp > "mytap.dat", @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp >> "mytap.dat", @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "gzip > mytap.dat.gz", @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp > stderr, @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "grep somepattern", @*, "index1", "index2"'
|
|
|
|
Please see http://johnkerl.org/miller/doc for more information.
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --help-keyword emit
|
|
emit: inserts an out-of-stream variable into the output record stream. Hashmap
|
|
indices present in the data but not slotted by emit arguments are not output.
|
|
|
|
With >, >>, or |, the data do not become part of the output record stream but
|
|
are instead redirected.
|
|
|
|
The > and >> are for write and append, as in the shell, but (as with awk) the
|
|
file-overwrite for > is on first write, not per record. The | is for piping to
|
|
a process which will process the data. There will be one open file for each
|
|
distinct file name (for > and >>) or one subordinate process for each distinct
|
|
value of the piped-to command (for |). Output-formatting flags are taken from
|
|
the main command line.
|
|
|
|
You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
|
|
etc., to control the format of the output if the output is redirected. See also mlr -h.
|
|
|
|
Example: mlr --from f.dat put 'emit > "/tmp/data-".$a, $*'
|
|
Example: mlr --from f.dat put 'emit > "/tmp/data-".$a, mapexcept($*, "a")'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @sums'
|
|
Example: mlr --from f.dat put --ojson '@sums[$a][$b]+=$x; emit > "tap-".$a.$b.".dat", @sums'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @sums, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit > "mytap.dat", @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit >> "mytap.dat", @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "gzip > mytap.dat.gz", @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit > stderr, @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "grep somepattern", @*, "index1", "index2"'
|
|
|
|
Please see http://johnkerl.org/miller/doc for more information.
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
</ul>
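<p/>The redirect rules quoted in the help output above (one open file per
distinct filename, with the overwrite for <code>></code> happening on first
write rather than per record) can be sketched in Python. This is an
illustrative model only, not how Miller is implemented; the class
<code>RedirectTable</code>, the function <code>tee_by_field</code>, and the
<code>data-</code> filename prefix are hypothetical names chosen for the
example.

<div class="pokipanel">
<pre>
```python
import os
import tempfile


class RedirectTable:
    """Keeps one open handle per distinct output filename."""

    def __init__(self):
        self.handles = {}

    def write(self, filename, line, append=False):
        handle = self.handles.get(filename)
        if handle is None:
            # Truncation for ">" happens here, on first write only;
            # later writes to the same name reuse the open handle.
            handle = open(filename, "a" if append else "w")
            self.handles[filename] = handle
        handle.write(line + "\n")

    def close(self):
        for handle in self.handles.values():
            handle.close()
        self.handles.clear()


def tee_by_field(records, field, directory):
    """Split records into one file per distinct value of a field."""
    table = RedirectTable()
    for record in records:
        filename = os.path.join(directory, "data-" + str(record[field]))
        # Format the record as comma-separated key=value pairs.
        line = ",".join(f"{k}={v}" for k, v in record.items())
        table.write(filename, line)
    table.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        tee_by_field(
            [{"a": "pan", "i": 1}, {"a": "eks", "i": 2}, {"a": "pan", "i": 3}],
            "a",
            directory,
        )
        print(sorted(os.listdir(directory)))  # one file per distinct value of a
```
</pre>
</div>

<p/>Because the handle is cached by name, the second <code>pan</code> record
is added to the same file as the first instead of overwriting it, which is the
behavior the help text describes for <code>></code>.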
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Emit_statements"/><h2>Emit statements</h2>
|
|
|
|
<p/>There are three variants: <code>emitf</code>, <code>emit</code>, and
|
|
<code>emitp</code>. Keep in mind that out-of-stream variables are a nested,
|
|
multi-level hashmap (directly viewable as JSON using <code>dump</code>), whereas
|
|
Miller output records are lists of single-level key-value pairs. The three emit
|
|
variants allow you to control how the multilevel hashmaps are flattened down to
|
|
output records. You can emit any map-valued expression, including <code>$*</code>,
|
|
map-valued out-of-stream variables, the entire out-of-stream-variable
|
|
collection <code>@*</code>, map-valued local variables, map literals, or map-valued
|
|
function return values.
|
|
|
|
<p/>Use <b>emitf</b> to output several out-of-stream variables side-by-side in the same output record.
|
|
For <code>emitf</code> these mustn’t have indexing using <code>@name[...]</code>. Example:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@count += 1; @x_sum += $x; @y_sum += $y; end { emitf @count, @x_sum, @y_sum}' data/small
|
|
count=5,x_sum=2.264762,y_sum=2.585086
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>Use <b>emit</b> to output an out-of-stream variable. If it’s non-indexed you’ll get a simple key-value pair:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ cat data/small
|
|
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@sum += $x; end { dump }' data/small
|
|
{
|
|
"sum": 2.264762
|
|
}
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@sum += $x; end { emit @sum }' data/small
|
|
sum=2.264762
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>If it’s indexed then use as many names after <code>emit</code> as there are indices:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@sum[$a] += $x; end { dump }' data/small
|
|
{
|
|
"sum": {
|
|
"pan": 0.346790,
|
|
"eks": 1.140079,
|
|
"wye": 0.777892
|
|
}
|
|
}
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@sum[$a] += $x; end { emit @sum, "a" }' data/small
|
|
a=pan,sum=0.346790
|
|
a=eks,sum=1.140079
|
|
a=wye,sum=0.777892
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@sum[$a][$b] += $x; end { dump }' data/small
|
|
{
|
|
"sum": {
|
|
"pan": {
|
|
"pan": 0.346790
|
|
},
|
|
"eks": {
|
|
"pan": 0.758680,
|
|
"wye": 0.381399
|
|
},
|
|
"wye": {
|
|
"wye": 0.204603,
|
|
"pan": 0.573289
|
|
}
|
|
}
|
|
}
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@sum[$a][$b] += $x; end { emit @sum, "a", "b" }' data/small
|
|
a=pan,b=pan,sum=0.346790
|
|
a=eks,b=pan,sum=0.758680
|
|
a=eks,b=wye,sum=0.381399
|
|
a=wye,b=wye,sum=0.204603
|
|
a=wye,b=pan,sum=0.573289
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@sum[$a][$b][$i] += $x; end { dump }' data/small
|
|
{
|
|
"sum": {
|
|
"pan": {
|
|
"pan": {
|
|
"1": 0.346790
|
|
}
|
|
},
|
|
"eks": {
|
|
"pan": {
|
|
"2": 0.758680
|
|
},
|
|
"wye": {
|
|
"4": 0.381399
|
|
}
|
|
},
|
|
"wye": {
|
|
"wye": {
|
|
"3": 0.204603
|
|
},
|
|
"pan": {
|
|
"5": 0.573289
|
|
}
|
|
}
|
|
}
|
|
}
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@sum[$a][$b][$i] += $x; end { emit @sum, "a", "b", "i" }' data/small
|
|
a=pan,b=pan,i=1,sum=0.346790
|
|
a=eks,b=pan,i=2,sum=0.758680
|
|
a=eks,b=wye,i=4,sum=0.381399
|
|
a=wye,b=wye,i=3,sum=0.204603
|
|
a=wye,b=pan,i=5,sum=0.573289
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>Now for <b>emitp</b>: if you have as many names following <code>emit</code> as
|
|
there are levels in the out-of-stream variable’s hashmap, then <code>emit</code> and <code>emitp</code> do the same
|
|
thing. Where they differ is when you don’t specify as many names as there are hashmap levels. In this
|
|
case, Miller needs to flatten multiple map indices down to output-record keys: <code>emitp</code> includes full
|
|
prefixing (hence the <code>p</code> in <code>emitp</code>) while <code>emit</code> takes the deepest hashmap key as the
|
|
output-record key:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@sum[$a][$b] += $x; end { dump }' data/small
|
|
{
|
|
"sum": {
|
|
"pan": {
|
|
"pan": 0.346790
|
|
},
|
|
"eks": {
|
|
"pan": 0.758680,
|
|
"wye": 0.381399
|
|
},
|
|
"wye": {
|
|
"wye": 0.204603,
|
|
"pan": 0.573289
|
|
}
|
|
}
|
|
}
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@sum[$a][$b] += $x; end { emit @sum, "a" }' data/small
|
|
a=pan,pan=0.346790
|
|
a=eks,pan=0.758680,wye=0.381399
|
|
a=wye,wye=0.204603,pan=0.573289
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@sum[$a][$b] += $x; end { emit @sum }' data/small
|
|
pan=0.346790
|
|
pan=0.758680,wye=0.381399
|
|
wye=0.204603,pan=0.573289
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@sum[$a][$b] += $x; end { emitp @sum, "a" }' data/small
|
|
a=pan,sum:pan=0.346790
|
|
a=eks,sum:pan=0.758680,sum:wye=0.381399
|
|
a=wye,sum:wye=0.204603,sum:pan=0.573289
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@sum[$a][$b] += $x; end { emitp @sum }' data/small
|
|
sum:pan:pan=0.346790,sum:eks:pan=0.758680,sum:eks:wye=0.381399,sum:wye:wye=0.204603,sum:wye:pan=0.573289
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --oxtab put -q '@sum[$a][$b] += $x; end { emitp @sum }' data/small
|
|
sum:pan:pan 0.346790
|
|
sum:eks:pan 0.758680
|
|
sum:eks:wye 0.381399
|
|
sum:wye:wye 0.204603
|
|
sum:wye:pan 0.573289
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>Use <b>--oflatsep</b> to specify the character which joins multilevel
|
|
keys for <code>emitp</code> (it defaults to a colon):
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q --oflatsep / '@sum[$a][$b] += $x; end { emitp @sum, "a" }' data/small
|
|
a=pan,sum/pan=0.346790
|
|
a=eks,sum/pan=0.758680,sum/wye=0.381399
|
|
a=wye,sum/wye=0.204603,sum/pan=0.573289
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q --oflatsep / '@sum[$a][$b] += $x; end { emitp @sum }' data/small
|
|
sum/pan/pan=0.346790,sum/eks/pan=0.758680,sum/eks/wye=0.381399,sum/wye/wye=0.204603,sum/wye/pan=0.573289
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --oxtab put -q --oflatsep / '@sum[$a][$b] += $x; end { emitp @sum }' data/small
|
|
sum/pan/pan 0.346790
|
|
sum/eks/pan 0.758680
|
|
sum/eks/wye 0.381399
|
|
sum/wye/wye 0.204603
|
|
sum/wye/pan 0.573289
|
|
</pre>
|
|
</div>
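<p/>The difference between <code>emit</code> and <code>emitp</code> shown in
the examples above can be summarized as a small Python model. It is a sketch
written to reproduce those examples, not Miller's actual code: the function
<code>emit_records</code> and its parameters are hypothetical, nested dicts
stand in for out-of-stream hashmaps, and the handling of maps nested more
deeply than in the examples is an assumption.

<div class="pokipanel">
<pre>
```python
def emit_records(name, value, names=(), prefixed=False, flatsep=":"):
    """Flatten a nested dict into records, emit-style or emitp-style."""
    records = []

    def flatten(prefix, node, out):
        # emitp-style: join every remaining key onto the prefix.
        for key, child in node.items():
            path = prefix + flatsep + str(key)
            if isinstance(child, dict):
                flatten(path, child, out)
            else:
                out[path] = child

    def walk(node, remaining, split_keys):
        if not isinstance(node, dict):
            # Every level was consumed by a name: one name=value pair.
            records.append({**split_keys, name: node})
        elif remaining:
            # Peel one level off into a named column, one branch per key.
            for key, child in node.items():
                walk(child, remaining[1:], {**split_keys, remaining[0]: key})
        elif prefixed:
            # emitp with levels left over: one record, fully prefixed keys.
            out = dict(split_keys)
            flatten(name, node, out)
            records.append(out)
        elif any(isinstance(child, dict) for child in node.values()):
            # emit with levels left over: one record per innermost map
            # (assumed to recurse the same way for deeper nesting).
            for child in node.values():
                walk(child, (), split_keys)
        else:
            # emit at the innermost map: its keys become the record keys.
            records.append({**split_keys, **node})

    walk(value, tuple(names), {})
    return records


if __name__ == "__main__":
    sums = {
        "pan": {"pan": 0.346790},
        "eks": {"pan": 0.758680, "wye": 0.381399},
        "wye": {"wye": 0.204603, "pan": 0.573289},
    }
    for record in emit_records("sum", sums, ["a"]):
        print(record)
    for record in emit_records("sum", sums, ["a"], prefixed=True):
        print(record)
```
</pre>
</div>

<p/>The <code>flatsep</code> parameter plays the role of
<code>--oflatsep</code>: passing <code>flatsep="/"</code> with
<code>prefixed=True</code> yields keys such as <code>sum/eks/wye</code>.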
|
|
<p/>
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Multi-emit_statements"/><h2>Multi-emit statements</h2>
|
|
|
|
<p/>You can emit <b>multiple map-valued expressions side-by-side</b> by
|
|
including their names in parentheses:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from data/medium --opprint put -q '
|
|
@x_count[$a][$b] += 1;
|
|
@x_sum[$a][$b] += $x;
|
|
end {
|
|
for ((a, b), _ in @x_count) {
|
|
@x_mean[a][b] = @x_sum[a][b] / @x_count[a][b]
|
|
}
|
|
emit (@x_sum, @x_count, @x_mean), "a", "b"
|
|
}
|
|
'
|
|
a b x_sum x_count x_mean
|
|
pan pan 219.185129 427 0.513314
|
|
pan wye 198.432931 395 0.502362
|
|
pan eks 216.075228 429 0.503672
|
|
pan hat 205.222776 417 0.492141
|
|
pan zee 205.097518 413 0.496604
|
|
eks pan 179.963030 371 0.485076
|
|
eks wye 196.945286 407 0.483895
|
|
eks zee 176.880365 357 0.495463
|
|
eks eks 215.916097 413 0.522799
|
|
eks hat 208.783171 417 0.500679
|
|
wye wye 185.295850 377 0.491501
|
|
wye pan 195.847900 392 0.499612
|
|
wye hat 212.033183 426 0.497730
|
|
wye zee 194.774048 385 0.505907
|
|
wye eks 204.812961 386 0.530604
|
|
zee pan 202.213804 389 0.519830
|
|
zee wye 233.991394 455 0.514267
|
|
zee eks 190.961778 391 0.488393
|
|
zee zee 206.640635 403 0.512756
|
|
zee hat 191.300006 409 0.467726
|
|
hat wye 208.883010 423 0.493813
|
|
hat zee 196.349450 385 0.509999
|
|
hat eks 189.006793 389 0.485879
|
|
hat hat 182.853532 381 0.479931
|
|
hat pan 168.553807 363 0.464336
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
What this does is walk through the first out-of-stream variable
|
|
(<code>@x_sum</code> in this example) as usual, then for each keylist found (e.g.
|
|
<code>pan,wye</code>), include the values for the remaining out-of-stream variables
|
|
(here, <code>@x_count</code> and <code>@x_mean</code>). You should use this when all
|
|
out-of-stream variables in the emit statement have <b>the same shape and the same
|
|
keylists</b>.
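<p/>The walk just described can be modeled in a few lines of Python. This is a
sketch for illustration, not Miller's implementation: the function
<code>emit_lashed</code> is a hypothetical name, nested dicts stand in for the
out-of-stream variables, and it assumes one name per map level.

<div class="pokipanel">
<pre>
```python
def emit_lashed(variables, names):
    """variables maps each variable name to a nested dict of the same shape."""
    first_name = next(iter(variables))
    records = []

    def walk(node, keylist):
        if len(keylist) == len(names):
            # A full keylist was found in the first variable: start the
            # record with the name=key columns ...
            record = dict(zip(names, keylist))
            # ... then look the same keylist up in every variable.
            for var_name, root in variables.items():
                value = root
                for key in keylist:
                    value = value[key]  # KeyError if the shapes differ
                record[var_name] = value
            records.append(record)
        else:
            for key, child in node.items():
                walk(child, keylist + [key])

    walk(variables[first_name], [])
    return records


if __name__ == "__main__":
    x_sum = {"pan": {"pan": 0.346790}, "eks": {"pan": 0.758680, "wye": 0.381399}}
    x_count = {"pan": {"pan": 1}, "eks": {"pan": 1, "wye": 1}}
    for record in emit_lashed({"x_sum": x_sum, "x_count": x_count}, ["a", "b"]):
        print(record)
```
</pre>
</div>

<p/>The <code>KeyError</code> raised for a missing keylist is a property of
this sketch only; it illustrates why the variables should share the same shape
and keylists.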
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Emit-all_statements"/><h2>Emit-all statements</h2>
|
|
|
|
<p/>Use <b>emit all</b> (or <code>emit @*</code> which is synonymous) to output all
|
|
out-of-stream variables. You can use the following idiom to get various
|
|
accumulators output side-by-side (reminiscent of <code>mlr stats1</code>):
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from data/small --opprint put -q '@v[$a][$b]["sum"] += $x; @v[$a][$b]["count"] += 1; end{emit @*,"a","b"}'
|
|
a b sum count
|
|
pan pan 0.346790 1
|
|
eks pan 0.758680 1
|
|
eks wye 0.381399 1
|
|
wye wye 0.204603 1
|
|
wye pan 0.573289 1
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from data/small --opprint put -q '@sum[$a][$b] += $x; @count[$a][$b] += 1; end{emit @*,"a","b"}'
|
|
a b sum
|
|
pan pan 0.346790
|
|
eks pan 0.758680
|
|
eks wye 0.381399
|
|
wye wye 0.204603
|
|
wye pan 0.573289
|
|
|
|
a b count
|
|
pan pan 1
|
|
eks pan 1
|
|
eks wye 1
|
|
wye wye 1
|
|
wye pan 1
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from data/small --opprint put -q '@sum[$a][$b] += $x; @count[$a][$b] += 1; end{emit (@sum, @count),"a","b"}'
|
|
a b sum count
|
|
pan pan 0.346790 1
|
|
eks pan 0.758680 1
|
|
eks wye 0.381399 1
|
|
wye wye 0.204603 1
|
|
wye pan 0.573289 1
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
</div>
|
|
<!-- ================================================================ -->
|
|
<a id="Unset_statements"/><h1>Unset statements</h1>
|
|
<button style="font-weight:bold;color:maroon;border:0" padding=0 onclick="bodyToggler.toggle('body_section_toggle_unset_statements');" href="javascript:;">Toggle section visibility</button>
|
|
<div id="body_section_toggle_unset_statements" style="display: block">
|
|
|
|
<p/>You can clear a map key by assigning the empty string as its value: <code>$x=""</code> or <code>@x=""</code>.
|
|
Using <code>unset</code> you can remove the key entirely. Examples:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ cat data/small
|
|
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put 'unset $x, $a' data/small
|
|
b=pan,i=1,y=0.7268028627434533
|
|
b=pan,i=2,y=0.5221511083334797
|
|
b=wye,i=3,y=0.33831852551664776
|
|
b=wye,i=4,y=0.13418874328430463
|
|
b=pan,i=5,y=0.8636244699032729
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>This can also be done, of course, using <code>mlr cut -x</code>. You can also
|
|
clear out-of-stream or local variables, at the base name level, or at an
|
|
indexed sublevel:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@sum[$a][$b] += $x; end { dump; unset @sum; dump }' data/small
|
|
{
|
|
"sum": {
|
|
"pan": {
|
|
"pan": 0.346790
|
|
},
|
|
"eks": {
|
|
"pan": 0.758680,
|
|
"wye": 0.381399
|
|
},
|
|
"wye": {
|
|
"wye": 0.204603,
|
|
"pan": 0.573289
|
|
}
|
|
}
|
|
}
|
|
{
|
|
}
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@sum[$a][$b] += $x; end { dump; unset @sum["eks"]; dump }' data/small
|
|
{
|
|
"sum": {
|
|
"pan": {
|
|
"pan": 0.346790
|
|
},
|
|
"eks": {
|
|
"pan": 0.758680,
|
|
"wye": 0.381399
|
|
},
|
|
"wye": {
|
|
"wye": 0.204603,
|
|
"pan": 0.573289
|
|
}
|
|
}
|
|
}
|
|
{
|
|
"sum": {
|
|
"pan": {
|
|
"pan": 0.346790
|
|
},
|
|
"wye": {
|
|
"wye": 0.204603,
|
|
"pan": 0.573289
|
|
}
|
|
}
|
|
}
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>If you use <code>unset all</code> (or <code>unset @*</code> which is synonymous), that will unset all out-of-stream
|
|
variables which have been defined up to that point.
|
|
|
|
</div>
|
|
<!-- ================================================================ -->
|
|
<a id="Filter_statements"/><h1>Filter statements</h1>
|
|
<button style="font-weight:bold;color:maroon;border:0" padding=0 onclick="bodyToggler.toggle('body_section_toggle_filter_statements');" href="javascript:;">Toggle section visibility</button>
|
|
<div id="body_section_toggle_filter_statements" style="display: block">
|
|
|
|
<p/> You can use <code>filter</code> within <code>put</code>. In fact, the
|
|
following two are synonymous:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr filter 'NR==2 || NR==3' data/small
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put 'filter NR==2 || NR==3' data/small
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>The former, of course, is much easier to type. But the latter allows you to define more complex expressions
|
|
for the filter, and/or do other things in addition to the filter:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put '@running_sum += $x; filter @running_sum > 1.3' data/small
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put '$z = $x * $y; filter $z > 0.3' data/small
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797,z=0.396146
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729,z=0.495106
|
|
</pre>
|
|
</div>
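<p/>The running-sum example above can be read as "run each statement per record, then keep the record only if the filter condition holds". A minimal Python sketch of that reading follows; it is an illustration rather than Miller's implementation, the function name is hypothetical, and the first <code>x</code> value is the rounded figure from the earlier dump output:

```python
# x values for data/small as they appear on this page.
records = [
    {"i": 1, "x": 0.346790},  # rounded value taken from the dump output
    {"i": 2, "x": 0.7586799647899636},
    {"i": 3, "x": 0.20460330576630303},
    {"i": 4, "x": 0.38139939387114097},
    {"i": 5, "x": 0.5732889198020006},
]


def put_with_filter(records, threshold):
    """Mimic: mlr put '@running_sum += $x; filter @running_sum > threshold'."""
    running_sum = 0.0                    # plays the role of @running_sum
    for record in records:
        running_sum += record["x"]       # first statement: accumulate
        if running_sum > threshold:      # second statement: filter
            yield record


kept = [record["i"] for record in put_with_filter(records, 1.3)]
print(kept)
```

<p/>Records 3, 4 and 5 are kept, which matches the output shown for the <code>@running_sum</code> example.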
|
|
<p/>
|
|
|
|
</div>
|
|
<!-- ================================================================ -->
|
|
<a id="Built-in_functions_for_filter_and_put,_summary"/><h1>Built-in functions for filter and put, summary</h1>
|
|
<button style="font-weight:bold;color:maroon;border:0" padding=0 onclick="bodyToggler.toggle('body_section_toggle_built_in_functions_summary');" href="javascript:;">Toggle section visibility</button>
|
|
<div id="body_section_toggle_built_in_functions_summary" style="display: block">
|
|
|
|
<table border=1>
|
|
<tr class="mlrbg">
|
|
<th>Name</th> <th>Class</th> <th>#Args</th>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#+">+</a></td> <td>arithmetic</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#+">+</a></td> <td>arithmetic</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#-">-</a></td> <td>arithmetic</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#-">-</a></td> <td>arithmetic</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#*">*</a></td> <td>arithmetic</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#/">/</a></td> <td>arithmetic</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#//">//</a></td> <td>arithmetic</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#.+">.+</a></td> <td>arithmetic</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#.+">.+</a></td> <td>arithmetic</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#.-">.-</a></td> <td>arithmetic</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#.-">.-</a></td> <td>arithmetic</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#.*">.*</a></td> <td>arithmetic</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#./">./</a></td> <td>arithmetic</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#.//">.//</a></td> <td>arithmetic</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#%">%</a></td> <td>arithmetic</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#**">**</a></td> <td>arithmetic</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#|">|</a></td> <td>arithmetic</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#^">^</a></td> <td>arithmetic</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#&">&amp;</a></td> <td>arithmetic</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#~">~</a></td> <td>arithmetic</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#<<">&lt;&lt;</a></td> <td>arithmetic</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#>>">>></a></td> <td>arithmetic</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#bitcount">bitcount</a></td> <td>arithmetic</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#==">==</a></td> <td>boolean</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#!=">!=</a></td> <td>boolean</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#=~">=~</a></td> <td>boolean</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#!=~">!=~</a></td> <td>boolean</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#>">></a></td> <td>boolean</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#>=">>=</a></td> <td>boolean</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#<">&lt;</a></td> <td>boolean</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#<=">&lt;=</a></td> <td>boolean</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#&&">&amp;&amp;</a></td> <td>boolean</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#||">||</a></td> <td>boolean</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#^^">^^</a></td> <td>boolean</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#!">!</a></td> <td>boolean</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#? :">? :</a></td> <td>boolean</td> <td>3</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#.">.</a></td> <td>string</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#gsub">gsub</a></td> <td>string</td> <td>3</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#regextract">regextract</a></td> <td>string</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#regextract_or_else">regextract_or_else</a></td> <td>string</td> <td>3</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#strlen">strlen</a></td> <td>string</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#sub">sub</a></td> <td>string</td> <td>3</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#ssub">ssub</a></td> <td>string</td> <td>3</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#substr">substr</a></td> <td>string</td> <td>3</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#tolower">tolower</a></td> <td>string</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#toupper">toupper</a></td> <td>string</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#capitalize">capitalize</a></td> <td>string</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#lstrip">lstrip</a></td> <td>string</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#rstrip">rstrip</a></td> <td>string</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#strip">strip</a></td> <td>string</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#collapse_whitespace">collapse_whitespace</a></td> <td>string</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#clean_whitespace">clean_whitespace</a></td> <td>string</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#system">system</a></td> <td>string</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#abs">abs</a></td> <td>math</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#acos">acos</a></td> <td>math</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#acosh">acosh</a></td> <td>math</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#asin">asin</a></td> <td>math</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#asinh">asinh</a></td> <td>math</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#atan">atan</a></td> <td>math</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#atan2">atan2</a></td> <td>math</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#atanh">atanh</a></td> <td>math</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#cbrt">cbrt</a></td> <td>math</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#ceil">ceil</a></td> <td>math</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#cos">cos</a></td> <td>math</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#cosh">cosh</a></td> <td>math</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#erf">erf</a></td> <td>math</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#erfc">erfc</a></td> <td>math</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#exp">exp</a></td> <td>math</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#expm1">expm1</a></td> <td>math</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#floor">floor</a></td> <td>math</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#invqnorm">invqnorm</a></td> <td>math</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#log">log</a></td> <td>math</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#log10">log10</a></td> <td>math</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#log1p">log1p</a></td> <td>math</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#logifit">logifit</a></td> <td>math</td> <td>3</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#madd">madd</a></td> <td>math</td> <td>3</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#max">max</a></td> <td>math</td> <td>variadic</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#mexp">mexp</a></td> <td>math</td> <td>3</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#min">min</a></td> <td>math</td> <td>variadic</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#mmul">mmul</a></td> <td>math</td> <td>3</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#msub">msub</a></td> <td>math</td> <td>3</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#pow">pow</a></td> <td>math</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#qnorm">qnorm</a></td> <td>math</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#round">round</a></td> <td>math</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#roundm">roundm</a></td> <td>math</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#sgn">sgn</a></td> <td>math</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#sin">sin</a></td> <td>math</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#sinh">sinh</a></td> <td>math</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#sqrt">sqrt</a></td> <td>math</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#tan">tan</a></td> <td>math</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#tanh">tanh</a></td> <td>math</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#urand">urand</a></td> <td>math</td> <td>0</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#urandrange">urandrange</a></td> <td>math</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#urand32">urand32</a></td> <td>math</td> <td>0</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#urandint">urandint</a></td> <td>math</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#dhms2fsec">dhms2fsec</a></td> <td>time</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#dhms2sec">dhms2sec</a></td> <td>time</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#fsec2dhms">fsec2dhms</a></td> <td>time</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#fsec2hms">fsec2hms</a></td> <td>time</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#gmt2sec">gmt2sec</a></td> <td>time</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#localtime2sec">localtime2sec</a></td> <td>time</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#hms2fsec">hms2fsec</a></td> <td>time</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#hms2sec">hms2sec</a></td> <td>time</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#sec2dhms">sec2dhms</a></td> <td>time</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#sec2gmt">sec2gmt</a></td> <td>time</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#sec2gmt">sec2gmt</a></td> <td>time</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#sec2gmtdate">sec2gmtdate</a></td> <td>time</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#sec2localtime">sec2localtime</a></td> <td>time</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#sec2localtime">sec2localtime</a></td> <td>time</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#sec2localdate">sec2localdate</a></td> <td>time</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#sec2hms">sec2hms</a></td> <td>time</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#strftime">strftime</a></td> <td>time</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#strftime_local">strftime_local</a></td> <td>time</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#strptime">strptime</a></td> <td>time</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#strptime_local">strptime_local</a></td> <td>time</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#systime">systime</a></td> <td>time</td> <td>0</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#is_absent">is_absent</a></td> <td>typing</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#is_bool">is_bool</a></td> <td>typing</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#is_boolean">is_boolean</a></td> <td>typing</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#is_empty">is_empty</a></td> <td>typing</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#is_empty_map">is_empty_map</a></td> <td>typing</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#is_float">is_float</a></td> <td>typing</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#is_int">is_int</a></td> <td>typing</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#is_map">is_map</a></td> <td>typing</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#is_nonempty_map">is_nonempty_map</a></td> <td>typing</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#is_not_empty">is_not_empty</a></td> <td>typing</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#is_not_map">is_not_map</a></td> <td>typing</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#is_not_null">is_not_null</a></td> <td>typing</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#is_null">is_null</a></td> <td>typing</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#is_numeric">is_numeric</a></td> <td>typing</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#is_present">is_present</a></td> <td>typing</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#is_string">is_string</a></td> <td>typing</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#asserting_absent">asserting_absent</a></td> <td>typing</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#asserting_bool">asserting_bool</a></td> <td>typing</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#asserting_boolean">asserting_boolean</a></td> <td>typing</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#asserting_empty">asserting_empty</a></td> <td>typing</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#asserting_empty_map">asserting_empty_map</a></td> <td>typing</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#asserting_float">asserting_float</a></td> <td>typing</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#asserting_int">asserting_int</a></td> <td>typing</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#asserting_map">asserting_map</a></td> <td>typing</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#asserting_nonempty_map">asserting_nonempty_map</a></td> <td>typing</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#asserting_not_empty">asserting_not_empty</a></td> <td>typing</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#asserting_not_map">asserting_not_map</a></td> <td>typing</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#asserting_not_null">asserting_not_null</a></td> <td>typing</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#asserting_null">asserting_null</a></td> <td>typing</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#asserting_numeric">asserting_numeric</a></td> <td>typing</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#asserting_present">asserting_present</a></td> <td>typing</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#asserting_string">asserting_string</a></td> <td>typing</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#boolean">boolean</a></td> <td>conversion</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#float">float</a></td> <td>conversion</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#fmtnum">fmtnum</a></td> <td>conversion</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#hexfmt">hexfmt</a></td> <td>conversion</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#int">int</a></td> <td>conversion</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#string">string</a></td> <td>conversion</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#typeof">typeof</a></td> <td>conversion</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#depth">depth</a></td> <td>maps</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#haskey">haskey</a></td> <td>maps</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#joink">joink</a></td> <td>maps</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#joinkv">joinkv</a></td> <td>maps</td> <td>3</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#joinv">joinv</a></td> <td>maps</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#leafcount">leafcount</a></td> <td>maps</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#length">length</a></td> <td>maps</td> <td>1</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#mapdiff">mapdiff</a></td> <td>maps</td> <td>variadic</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#mapexcept">mapexcept</a></td> <td>maps</td> <td>variadic</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#mapselect">mapselect</a></td> <td>maps</td> <td>variadic</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#mapsum">mapsum</a></td> <td>maps</td> <td>variadic</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#splitkv">splitkv</a></td> <td>maps</td> <td>3</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#splitkvx">splitkvx</a></td> <td>maps</td> <td>3</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#splitnv">splitnv</a></td> <td>maps</td> <td>2</td>
|
|
</tr>
|
|
<tr>
|
|
<td><a href="#splitnvx">splitnvx</a></td> <td>maps</td> <td>2</td>
|
|
</tr>
|
|
</table>
|
|
|
|
</div>
|
|
<!-- ================================================================ -->
|
|
<a id="Built-in_functions_for_filter_and_put"/><h1>Built-in functions for filter and put</h1>
|
|
<button style="font-weight:bold;color:maroon;border:0" padding=0 onclick="bodyToggler.toggle('body_section_toggle_built_in_functions');" href="javascript:;">Toggle section visibility</button>
|
|
<div id="body_section_toggle_built_in_functions" style="display: block">
|
|
|
|
<p/>Each function takes a specific number of arguments, as shown below, except
|
|
for functions marked as variadic such as <code>min</code> and <code>max</code>. (The
|
|
latter compute min and max of any number of numerical arguments.) There is no
|
|
notion of optional or default-on-absent arguments. All argument-passing is
|
|
positional rather than by name; arguments are passed by value, not by
|
|
reference.
|
|
|
|
<p/>You can get a list of all functions using <b>mlr -F</b>.
|
|
|
|
|
|
<a id="+"/>
|
|
<h2>+</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
+ (class=arithmetic #args=2): Addition.
|
|
|
|
+ (class=arithmetic #args=1): Unary plus.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="-"/>
|
|
<h2>-</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
- (class=arithmetic #args=2): Subtraction.
|
|
|
|
- (class=arithmetic #args=1): Unary minus.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="*"/>
|
|
<h2>*</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
* (class=arithmetic #args=2): Multiplication.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="/"/>
|
|
<h2>/</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
/ (class=arithmetic #args=2): Division.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="//"/>
|
|
<h2>//</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
// (class=arithmetic #args=2): Integer division: rounds toward negative infinity (pythonic).
|
|
</pre>
|
|
</div>
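<p/>Since the description calls this behavior pythonic, Python's own <code>//</code> operator is a convenient way to see the rounding direction. The sketch below contrasts it with truncation toward zero (as in C); the helper names are hypothetical and nothing here calls Miller itself:

```python
import math


def floor_div(a, b):
    """Integer division rounding toward negative infinity (Python's //)."""
    return a // b


def trunc_div(a, b):
    """Integer division rounding toward zero, shown only for contrast."""
    return math.trunc(a / b)


for a, b in [(7, 2), (-7, 2), (7, -2), (-7, -2)]:
    print(a, b, floor_div(a, b), trunc_div(a, b))
```

<p/>The two conventions differ only when the operands have opposite signs and the division is inexact: <code>-7 // 2</code> floors to <code>-4</code>, whereas truncation gives <code>-3</code>.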
|
|
|
|
|
|
<a id=".+"/>
|
|
<h2>.+</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
.+ (class=arithmetic #args=2): Addition, with integer-to-integer overflow.
|
|
|
|
.+ (class=arithmetic #args=1): Unary plus, with integer-to-integer overflow.
|
|
</pre>
|
|
</div>
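<p/>One way to picture "integer-to-integer overflow" is wraparound within a fixed-width integer. The Python sketch below assumes signed 64-bit two's-complement integers; that width is an assumption made for illustration, not something stated on this page, and the helper names are hypothetical:

```python
INT64_MAX = (1 << 63) - 1
INT64_MIN = -(1 << 63)


def wrap_int64(n):
    """Reduce n to the signed 64-bit range, as two's-complement hardware would."""
    n &= (1 << 64) - 1                      # keep the low 64 bits
    return n - (1 << 64) if n >= (1 << 63) else n


def dot_plus(a, b):
    """Sketch of an integer-preserving addition that wraps on overflow."""
    return wrap_int64(a + b)


print(dot_plus(2, 3))            # ordinary case: no overflow
print(dot_plus(INT64_MAX, 1))    # overflow wraps to the most negative value
```

<p/>Under this assumption the result is always an integer, at the cost of wrapping around when the true sum does not fit.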
|
|
|
|
|
|
<a id=".-"/>
|
|
<h2>.-</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
.- (class=arithmetic #args=2): Subtraction, with integer-to-integer overflow.
|
|
|
|
.- (class=arithmetic #args=1): Unary minus, with integer-to-integer overflow.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id=".*"/>
|
|
<h2>.*</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
.* (class=arithmetic #args=2): Multiplication, with integer-to-integer overflow.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="./"/>
|
|
<h2>./</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
./ (class=arithmetic #args=2): Division, with integer-to-integer overflow.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id=".//"/>
|
|
<h2>.//</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
.// (class=arithmetic #args=2): Integer division: rounds toward negative infinity (pythonic), with integer-to-integer overflow.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="%"/>
|
|
<h2>%</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
% (class=arithmetic #args=2): Remainder; never negative-valued (pythonic).
|
|
</pre>
|
|
</div>
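<p/>Python's <code>%</code> shows the same "never negative" convention when the modulus is positive. The sketch below is Python only, with a hypothetical helper name; how Miller treats a negative modulus is not stated here and is not modeled:

```python
def remainder(a, m):
    """Remainder paired with floor division; lies in [0, m) when m > 0."""
    return a % m


m = 5
for a in (-7, -5, -1, 0, 1, 7):
    q, r = a // m, remainder(a, m)
    # Floor division and this remainder always reconstruct the dividend.
    print(a, q, r, q * m + r == a)
```

<p/>For example, <code>-7 % 5</code> is <code>3</code> because <code>-7 = (-2) * 5 + 3</code>.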
|
|
|
|
|
|
<a id="**"/>
|
|
<h2>**</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
** (class=arithmetic #args=2): Exponentiation; same as pow, but as an infix
|
|
operator.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="|"/>
|
|
<h2>|</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
| (class=arithmetic #args=2): Bitwise OR.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="^"/>
|
|
<h2>^</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
^ (class=arithmetic #args=2): Bitwise XOR.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="&"/>
|
|
<h2>&amp;</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
&amp; (class=arithmetic #args=2): Bitwise AND.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="~"/>
|
|
<h2>~</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
~ (class=arithmetic #args=1): Bitwise NOT. Beware '$y=~$x' since =~ is the
|
|
regex-match operator: try '$y = ~$x'.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="<<"/>
|
|
<h2>&lt;&lt;</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
&lt;&lt; (class=arithmetic #args=2): Bitwise left-shift.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id=">>"/>
|
|
<h2>>></h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
>> (class=arithmetic #args=2): Bitwise right-shift.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="=="/>
|
|
<h2>==</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
== (class=boolean #args=2): String/numeric equality. Mixing number and string
|
|
results in string compare.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="!="/>
|
|
<h2>!=</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
!= (class=boolean #args=2): String/numeric inequality. Mixing number and string
|
|
results in string compare.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="=~"/>
|
|
<h2>=~</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
=~ (class=boolean #args=2): String (left-hand side) matches regex (right-hand
|
|
side), e.g. '$name =~ "^a.*b$"'.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="!=~"/>
|
|
<h2>!=~</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
!=~ (class=boolean #args=2): String (left-hand side) does not match regex
|
|
(right-hand side), e.g. '$name !=~ "^a.*b$"'.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id=">"/>
|
|
<h2>></h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
> (class=boolean #args=2): String/numeric greater-than. Mixing number and string
|
|
results in string compare.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id=">="/>
|
|
<h2>>=</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
>= (class=boolean #args=2): String/numeric greater-than-or-equals. Mixing number
|
|
and string results in string compare.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="<"/>
|
|
<h2>&lt;</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
&lt; (class=boolean #args=2): String/numeric less-than. Mixing number and string
|
|
results in string compare.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="<="/>
|
|
<h2>&lt;=</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
&lt;= (class=boolean #args=2): String/numeric less-than-or-equals. Mixing number
|
|
and string results in string compare.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="&&"/>
|
|
<h2>&amp;&amp;</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
&amp;&amp; (class=boolean #args=2): Logical AND.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="||"/>
|
|
<h2>||</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
|| (class=boolean #args=2): Logical OR.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="^^"/>
|
|
<h2>^^</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
^^ (class=boolean #args=2): Logical XOR.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="!"/>
|
|
<h2>!</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
! (class=boolean #args=1): Logical negation.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="? :"/>
|
|
<h2>? :</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
? : (class=boolean #args=3): Ternary operator.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="."/>
|
|
<h2>.</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
. (class=string #args=2): String concatenation.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="abs"/>
|
|
<h2>abs</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
abs (class=math #args=1): Absolute value.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="acos"/>
|
|
<h2>acos</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
acos (class=math #args=1): Inverse trigonometric cosine.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="acosh"/>
|
|
<h2>acosh</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
acosh (class=math #args=1): Inverse hyperbolic cosine.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asin"/>
|
|
<h2>asin</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asin (class=math #args=1): Inverse trigonometric sine.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asinh"/>
|
|
<h2>asinh</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asinh (class=math #args=1): Inverse hyperbolic sine.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asserting_absent"/>
|
|
<h2>asserting_absent</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asserting_absent (class=typing #args=1): Returns argument if it is absent in the input data, else
|
|
throws an error.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asserting_bool"/>
|
|
<h2>asserting_bool</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asserting_bool (class=typing #args=1): Returns argument if it is present with boolean value, else
|
|
throws an error.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asserting_boolean"/>
|
|
<h2>asserting_boolean</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asserting_boolean (class=typing #args=1): Returns argument if it is present with boolean value, else
|
|
throws an error.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asserting_empty"/>
|
|
<h2>asserting_empty</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asserting_empty (class=typing #args=1): Returns argument if it is present in input with empty value,
|
|
else throws an error.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asserting_empty_map"/>
|
|
<h2>asserting_empty_map</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asserting_empty_map (class=typing #args=1): Returns argument if it is a map with empty value, else
|
|
throws an error.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asserting_float"/>
|
|
<h2>asserting_float</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asserting_float (class=typing #args=1): Returns argument if it is present with float value, else
|
|
throws an error.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asserting_int"/>
|
|
<h2>asserting_int</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asserting_int (class=typing #args=1): Returns argument if it is present with int value, else
|
|
throws an error.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asserting_map"/>
|
|
<h2>asserting_map</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asserting_map (class=typing #args=1): Returns argument if it is a map, else throws an error.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asserting_nonempty_map"/>
|
|
<h2>asserting_nonempty_map</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asserting_nonempty_map (class=typing #args=1): Returns argument if it is a non-empty map, else throws
|
|
an error.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asserting_not_empty"/>
|
|
<h2>asserting_not_empty</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asserting_not_empty (class=typing #args=1): Returns argument if it is present in input with non-empty
|
|
value, else throws an error.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asserting_not_map"/>
|
|
<h2>asserting_not_map</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asserting_not_map (class=typing #args=1): Returns argument if it is not a map, else throws an error.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asserting_not_null"/>
|
|
<h2>asserting_not_null</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asserting_not_null (class=typing #args=1): Returns argument if it is non-null (non-empty and non-absent),
|
|
else throws an error.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asserting_null"/>
|
|
<h2>asserting_null</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asserting_null (class=typing #args=1): Returns argument if it is null (empty or absent), else throws
|
|
an error.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asserting_numeric"/>
|
|
<h2>asserting_numeric</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asserting_numeric (class=typing #args=1): Returns argument if it is present with int or float value,
|
|
else throws an error.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asserting_present"/>
|
|
<h2>asserting_present</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asserting_present (class=typing #args=1): Returns argument if it is present in input, else throws
|
|
an error.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asserting_string"/>
|
|
<h2>asserting_string</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asserting_string (class=typing #args=1): Returns argument if it is present with string (including
|
|
empty-string) value, else throws an error.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="atan"/>
|
|
<h2>atan</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
atan (class=math #args=1): One-argument arctangent.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="atan2"/>
|
|
<h2>atan2</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
atan2 (class=math #args=2): Two-argument arctangent.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="atanh"/>
|
|
<h2>atanh</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
atanh (class=math #args=1): Inverse hyperbolic tangent.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="bitcount"/>
|
|
<h2>bitcount</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
bitcount (class=arithmetic #args=1): Count of 1-bits.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="boolean"/>
|
|
<h2>boolean</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
boolean (class=conversion #args=1): Convert int/float/bool/string to boolean.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="capitalize"/>
|
|
<h2>capitalize</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
capitalize (class=string #args=1): Convert string's first character to uppercase.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="cbrt"/>
|
|
<h2>cbrt</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
cbrt (class=math #args=1): Cube root.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="ceil"/>
|
|
<h2>ceil</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
ceil (class=math #args=1): Ceiling: nearest integer at or above.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="clean_whitespace"/>
|
|
<h2>clean_whitespace</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
clean_whitespace (class=string #args=1): Same as collapse_whitespace and strip.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="collapse_whitespace"/>
|
|
<h2>collapse_whitespace</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
collapse_whitespace (class=string #args=1): Strip repeated whitespace from string.
|
|
</pre>
|
|
</div>
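<p/>The relationship between <code>collapse_whitespace</code>, <code>strip</code> and <code>clean_whitespace</code> can be sketched in Python as follows. This is an approximation: it assumes "repeated whitespace" means any run of whitespace characters replaced by a single space, which this page does not spell out.

```python
import re


def collapse_whitespace(s):
    """Replace each run of whitespace with one space (assumed semantics)."""
    return re.sub(r"\s+", " ", s)


def strip(s):
    """Remove leading and trailing whitespace."""
    return s.strip()


def clean_whitespace(s):
    """Collapse interior runs, then strip both ends."""
    return strip(collapse_whitespace(s))


sample = "  hello    big   world  "
print(repr(collapse_whitespace(sample)))
print(repr(clean_whitespace(sample)))
```

<p/>Collapsing alone still leaves one space at each end of the sample; stripping afterwards removes it.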
|
|
|
|
|
|
<a id="cos"/>
|
|
<h2>cos</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
cos (class=math #args=1): Trigonometric cosine.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="cosh"/>
|
|
<h2>cosh</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
cosh (class=math #args=1): Hyperbolic cosine.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="depth"/>
|
|
<h2>depth</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
depth (class=maps #args=1): Returns maximum depth of hashmap. Scalars have depth 0.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="dhms2fsec"/>
|
|
<h2>dhms2fsec</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
dhms2fsec (class=time #args=1): Recovers floating-point seconds as in
|
|
dhms2fsec("5d18h53m20.250000s") = 500000.250000
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="dhms2sec"/>
|
|
<h2>dhms2sec</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
dhms2sec (class=time #args=1): Recovers integer seconds as in
|
|
dhms2sec("5d18h53m20s") = 500000
|
|
</pre>
|
|
</div>
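<p/>As a small sketch of how the day/hour/minute/second conversions pair up, the
command below round-trips the sample values quoted in this section. It assumes
the <code>-n</code> flag (run the DSL without reading any input) is available in
your Miller version. Going by the function descriptions, each
<code>sec2...</code> call should be undone by the matching <code>...2sec</code> call.

<p/>
<div class="pokipanel">
<pre>
mlr -n put 'end {
  # Integer seconds to d/h/m/s text and back again
  print sec2dhms(500000);
  print dhms2sec("5d18h53m20s");
  # Floating-point seconds to d/h/m/s text and back again
  print fsec2dhms(500000.25);
  print dhms2fsec("5d18h53m20.250000s");
}'
</pre>
</div>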
|
|
|
|
|
|
<a id="erf"/>
|
|
<h2>erf</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
erf (class=math #args=1): Error function.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="erfc"/>
|
|
<h2>erfc</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
erfc (class=math #args=1): Complementary error function.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="exp"/>
|
|
<h2>exp</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
exp (class=math #args=1): Exponential function e**x.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="expm1"/>
|
|
<h2>expm1</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
expm1 (class=math #args=1): e**x - 1.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="float"/>
|
|
<h2>float</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
float (class=conversion #args=1): Convert int/float/bool/string to float.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="floor"/>
|
|
<h2>floor</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
floor (class=math #args=1): Floor: nearest integer at or below.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="fmtnum"/>
|
|
<h2>fmtnum</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
fmtnum (class=conversion #args=2): Convert int/float/bool to string using
|
|
printf-style format string, e.g. '$s = fmtnum($n, "%06lld")'. WARNING: Miller numbers
|
|
are all long long or double. If you use formats like %d or %f, behavior is undefined.
|
|
</pre>
|
|
</div>
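<p/>The warning above is easiest to follow with a concrete case. The sketch
below uses only <code>ll</code>-qualified integer formats and <code>lf</code>
floating-point formats; the particular widths and precisions are illustrative
choices, and the <code>-n</code> flag (no input) is assumed to be available.

<p/>
<div class="pokipanel">
<pre>
mlr -n put 'end {
  # Integer input: use long-long formats such as %lld or %llx
  print fmtnum(17, "%06lld");
  print fmtnum(17, "%llx");
  # Floating-point input: use %lf-style formats
  print fmtnum(3.1, "%.4lf");
}'
</pre>
</div>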
|
|
|
|
|
|
<a id="fsec2dhms"/>
|
|
<h2>fsec2dhms</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
fsec2dhms (class=time #args=1): Formats floating-point seconds as in
|
|
fsec2dhms(500000.25) = "5d18h53m20.250000s"
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="fsec2hms"/>
|
|
<h2>fsec2hms</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
fsec2hms (class=time #args=1): Formats floating-point seconds as in
|
|
fsec2hms(5000.25) = "01:23:20.250000"
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="gmt2sec"/>
|
|
<h2>gmt2sec</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
gmt2sec (class=time #args=1): Parses GMT timestamp as integer seconds since
|
|
the epoch.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="gsub"/>
|
|
<h2>gsub</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
gsub (class=string #args=3): Example: '$name=gsub($name, "old", "new")'
|
|
(replace all).
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="haskey"/>
|
|
<h2>haskey</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
haskey (class=maps #args=2): True/false if map has/hasn't key, e.g. 'haskey($*, "a")' or
|
|
'haskey(mymap, mykey)'. Error if 1st argument is not a map.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="hexfmt"/>
|
|
<h2>hexfmt</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
hexfmt (class=conversion #args=1): Convert int to string, e.g. 255 to "0xff".
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="hms2fsec"/>
|
|
<h2>hms2fsec</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
hms2fsec (class=time #args=1): Recovers floating-point seconds as in
|
|
hms2fsec("01:23:20.250000") = 5000.250000
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="hms2sec"/>
|
|
<h2>hms2sec</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
hms2sec (class=time #args=1): Recovers integer seconds as in
|
|
hms2sec("01:23:20") = 5000
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="int"/>
|
|
<h2>int</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
int (class=conversion #args=1): Convert int/float/bool/string to int.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="invqnorm"/>
|
|
<h2>invqnorm</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
invqnorm (class=math #args=1): Inverse of normal cumulative distribution
|
|
function. Note that invqnorm(urand()) is normally distributed.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="is_absent"/>
|
|
<h2>is_absent</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
is_absent (class=typing #args=1): False if field is present in input, true otherwise
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="is_bool"/>
|
|
<h2>is_bool</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
is_bool (class=typing #args=1): True if field is present with boolean value. Synonymous with is_boolean.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="is_boolean"/>
|
|
<h2>is_boolean</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
is_boolean (class=typing #args=1): True if field is present with boolean value. Synonymous with is_bool.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="is_empty"/>
|
|
<h2>is_empty</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
is_empty (class=typing #args=1): True if field is present in input with empty string value, false otherwise.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="is_empty_map"/>
|
|
<h2>is_empty_map</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
is_empty_map (class=typing #args=1): True if argument is a map which is empty.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="is_float"/>
|
|
<h2>is_float</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
is_float (class=typing #args=1): True if field is present with value inferred to be float
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="is_int"/>
|
|
<h2>is_int</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
is_int (class=typing #args=1): True if field is present with value inferred to be int
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="is_map"/>
|
|
<h2>is_map</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
is_map (class=typing #args=1): True if argument is a map.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="is_nonempty_map"/>
|
|
<h2>is_nonempty_map</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
is_nonempty_map (class=typing #args=1): True if argument is a map which is non-empty.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="is_not_empty"/>
|
|
<h2>is_not_empty</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
is_not_empty (class=typing #args=1): False if field is present in input with empty value, true otherwise
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="is_not_map"/>
|
|
<h2>is_not_map</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
is_not_map (class=typing #args=1): True if argument is not a map.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="is_not_null"/>
|
|
<h2>is_not_null</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
is_not_null (class=typing #args=1): False if argument is null (empty or absent), true otherwise.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="is_null"/>
|
|
<h2>is_null</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
is_null (class=typing #args=1): True if argument is null (empty or absent), false otherwise.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="is_numeric"/>
|
|
<h2>is_numeric</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
is_numeric (class=typing #args=1): True if field is present with value inferred to be int or float
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="is_present"/>
|
|
<h2>is_present</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
is_present (class=typing #args=1): True if field is present in input, false otherwise.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="is_string"/>
|
|
<h2>is_string</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
is_string (class=typing #args=1): True if field is present with string (including empty-string) value
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="joink"/>
|
|
<h2>joink</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
joink (class=maps #args=2): Makes string from map keys. E.g. 'joink($*, ",")'.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="joinkv"/>
|
|
<h2>joinkv</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
joinkv (class=maps #args=3): Makes string from map key-value pairs. E.g. 'joinkv(@v[2], "=", ",")'
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="joinv"/>
|
|
<h2>joinv</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
joinv (class=maps #args=2): Makes string from map values. E.g. 'joinv(mymap, ",")'.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="leafcount"/>
|
|
<h2>leafcount</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
leafcount (class=maps #args=1): Counts total number of terminal values in hashmap. For single-level maps,
|
|
same as length.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="length"/>
|
|
<h2>length</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
length (class=maps #args=1): Counts number of top-level entries in hashmap. Scalars have length 1.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="localtime2sec"/>
|
|
<h2>localtime2sec</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
localtime2sec (class=time #args=1): Parses local timestamp as integer seconds since
|
|
the epoch. Consults $TZ environment variable.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="log"/>
|
|
<h2>log</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
log (class=math #args=1): Natural (base-e) logarithm.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="log10"/>
|
|
<h2>log10</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
log10 (class=math #args=1): Base-10 logarithm.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="log1p"/>
|
|
<h2>log1p</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
log1p (class=math #args=1): log(1+x).
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="logifit"/>
|
|
<h2>logifit</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
logifit (class=math #args=3): Given m and b from logistic regression, compute
|
|
fit: $yhat=logifit($x,$m,$b).
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="lstrip"/>
|
|
<h2>lstrip</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
lstrip (class=string #args=1): Strip leading whitespace from string.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="madd"/>
|
|
<h2>madd</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
madd (class=math #args=3): a + b mod m (integers)
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="mapdiff"/>
|
|
<h2>mapdiff</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
mapdiff (class=maps variadic): With 0 args, returns empty map. With 1 arg, returns copy of arg.
|
|
With 2 or more, returns copy of arg 1 with all keys from any of remaining argument maps removed.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="mapexcept"/>
|
|
<h2>mapexcept</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
mapexcept (class=maps variadic): Returns a map with keys from remaining arguments, if any, unset.
|
|
E.g. 'mapexcept({1:2,3:4,5:6}, 1, 5, 7)' is '{3:4}'.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="mapselect"/>
|
|
<h2>mapselect</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
mapselect (class=maps variadic): Returns a map with only keys from remaining arguments set.
|
|
E.g. 'mapselect({1:2,3:4,5:6}, 1, 5, 7)' is '{1:2,5:6}'.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="mapsum"/>
|
|
<h2>mapsum</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
mapsum (class=maps variadic): With 0 args, returns empty map. With >= 1 arg, returns a map with
|
|
key-value pairs from all arguments. Rightmost collisions win, e.g. 'mapsum({1:2,3:4},{1:5})' is '{1:5,3:4}'.
|
|
</pre>
|
|
</div>
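<p/>The four variadic map functions above can be compared side by side. This
is a minimal sketch that reuses the map literals from their descriptions; the
out-of-stream variable names are made up for illustration, and the
<code>-n</code> flag (no input) is assumed to be available. The bare
<code>dump</code> prints all out-of-stream variables as JSON.

<p/>
<div class="pokipanel">
<pre>
mlr -n put 'end {
  # Rightmost collisions win
  @summed   = mapsum({1:2, 3:4}, {1:5});
  # Keys 1 and 5 are unset; the nonexistent key 7 is ignored
  @excepted = mapexcept({1:2, 3:4, 5:6}, 1, 5, 7);
  # Only keys 1 and 5 are kept
  @selected = mapselect({1:2, 3:4, 5:6}, 1, 5, 7);
  # Keys found in the second map are removed from a copy of the first
  @diffed   = mapdiff({1:2, 3:4, 5:6}, {3:0});
  dump;
}'
</pre>
</div>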
|
|
|
|
|
|
<a id="max"/>
|
|
<h2>max</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
max (class=math variadic): Max of n numbers; null loses
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="mexp"/>
|
|
<h2>mexp</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
mexp (class=math #args=3): a ** b mod m (integers)
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="min"/>
|
|
<h2>min</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
min (class=math variadic): Min of n numbers; null loses
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="mmul"/>
|
|
<h2>mmul</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
mmul (class=math #args=3): a * b mod m (integers)
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="msub"/>
|
|
<h2>msub</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
msub (class=math #args=3): a - b mod m (integers)
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="pow"/>
|
|
<h2>pow</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
pow (class=math #args=2): Exponentiation; same as **.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="qnorm"/>
|
|
<h2>qnorm</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
qnorm (class=math #args=1): Normal cumulative distribution function.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="regextract"/>
|
|
<h2>regextract</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
regextract (class=string #args=2): Example: '$name=regextract($name, "[A-Z]{3}[0-9]{2}")'
|
|
.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="regextract_or_else"/>
|
|
<h2>regextract_or_else</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
regextract_or_else (class=string #args=3): Example: '$name=regextract_or_else($name, "[A-Z]{3}[0-9]{2}", "default")'
|
|
.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="round"/>
|
|
<h2>round</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
round (class=math #args=1): Round to nearest integer.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="roundm"/>
|
|
<h2>roundm</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
roundm (class=math #args=2): Round to nearest multiple of m: roundm($x,$m) is
|
|
the same as round($x/$m)*$m
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="rstrip"/>
|
|
<h2>rstrip</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
rstrip (class=string #args=1): Strip trailing whitespace from string.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="sec2dhms"/>
|
|
<h2>sec2dhms</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
sec2dhms (class=time #args=1): Formats integer seconds as in sec2dhms(500000)
|
|
= "5d18h53m20s"
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="sec2gmt"/>
|
|
<h2>sec2gmt</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
sec2gmt (class=time #args=1): Formats seconds since epoch (integer part)
|
|
as GMT timestamp, e.g. sec2gmt(1440768801.7) = "2015-08-28T13:33:21Z".
|
|
Leaves non-numbers as-is.
|
|
|
|
sec2gmt (class=time #args=2): Formats seconds since epoch as GMT timestamp with n
|
|
decimal places for seconds, e.g. sec2gmt(1440768801.7,1) = "2015-08-28T13:33:21.7Z".
|
|
Leaves non-numbers as-is.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="sec2gmtdate"/>
|
|
<h2>sec2gmtdate</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
sec2gmtdate (class=time #args=1): Formats seconds since epoch (integer part)
|
|
as GMT timestamp with year-month-date, e.g. sec2gmtdate(1440768801.7) = "2015-08-28".
|
|
Leaves non-numbers as-is.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="sec2hms"/>
|
|
<h2>sec2hms</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
sec2hms (class=time #args=1): Formats integer seconds as in
|
|
sec2hms(5000) = "01:23:20"
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="sec2localdate"/>
|
|
<h2>sec2localdate</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
sec2localdate (class=time #args=1): Formats seconds since epoch (integer part)
|
|
as local timestamp with year-month-date, e.g. sec2localdate(1440768801.7) = "2015-08-28".
|
|
Consults $TZ environment variable. Leaves non-numbers as-is.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="sec2localtime"/>
|
|
<h2>sec2localtime</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
sec2localtime (class=time #args=1): Formats seconds since epoch (integer part)
|
|
as local timestamp, e.g. sec2localtime(1440768801.7) = "2015-08-28T13:33:21Z".
|
|
Consults $TZ environment variable. Leaves non-numbers as-is.
|
|
|
|
sec2localtime (class=time #args=2): Formats seconds since epoch as local timestamp with n
|
|
decimal places for seconds, e.g. sec2localtime(1440768801.7,1) = "2015-08-28T13:33:21.7Z".
|
|
Consults $TZ environment variable. Leaves non-numbers as-is.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="sgn"/>
|
|
<h2>sgn</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
sgn (class=math #args=1): +1 for positive input, 0 for zero input, -1 for
|
|
negative input.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="sin"/>
|
|
<h2>sin</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
sin (class=math #args=1): Trigonometric sine.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="sinh"/>
|
|
<h2>sinh</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
sinh (class=math #args=1): Hyperbolic sine.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="splitkv"/>
|
|
<h2>splitkv</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
splitkv (class=maps #args=3): Splits string by separators into map with type inference.
|
|
E.g. 'splitkv("a=1,b=2,c=3", "=", ",")' gives '{"a" : 1, "b" : 2, "c" : 3}'.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="splitkvx"/>
|
|
<h2>splitkvx</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
splitkvx (class=maps #args=3): Splits string by separators into map without type inference (keys and
|
|
values are strings). E.g. 'splitkvx("a=1,b=2,c=3", "=", ",")' gives
|
|
'{"a" : "1", "b" : "2", "c" : "3"}'.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="splitnv"/>
|
|
<h2>splitnv</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
splitnv (class=maps #args=2): Splits string by separator into integer-indexed map with type inference.
|
|
E.g. 'splitnv("a,b,c" , ",")' gives '{1 : "a", 2 : "b", 3 : "c"}'.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="splitnvx"/>
|
|
<h2>splitnvx</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
splitnvx (class=maps #args=2): Splits string by separator into integer-indexed map without type
|
|
inference (values are strings). E.g. 'splitnvx("4,5,6" , ",")' gives '{1 : "4", 2 : "5", 3 : "6"}'.
|
|
</pre>
|
|
</div>
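<p/>The difference between the inferring and non-inferring splitters can be
checked with the typing functions listed earlier. This is a sketch only: the
local-variable names are illustrative, and the <code>-n</code> flag (no input)
is assumed to be available.

<p/>
<div class="pokipanel">
<pre>
mlr -n put 'end {
  map with_inference    = splitnv("4,5,6", ",");
  map without_inference = splitnvx("4,5,6", ",");
  # With type inference the values are expected to test as int
  print is_int(with_inference[1]);
  # Without type inference the values are expected to test as string
  print is_string(without_inference[1]);
}'
</pre>
</div>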
|
|
|
|
|
|
<a id="sqrt"/>
|
|
<h2>sqrt</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
sqrt (class=math #args=1): Square root.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="ssub"/>
|
|
<h2>ssub</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
ssub (class=string #args=3): Like sub but does no regexing. No characters are special.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="strftime"/>
|
|
<h2>strftime</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
strftime (class=time #args=2): Formats seconds since the epoch as timestamp, e.g.
|
|
strftime(1440768801.7,"%Y-%m-%dT%H:%M:%SZ") = "2015-08-28T13:33:21Z", and
|
|
strftime(1440768801.7,"%Y-%m-%dT%H:%M:%3SZ") = "2015-08-28T13:33:21.700Z".
|
|
Format strings are as in the C library (please see "man strftime" on your system),
|
|
with the Miller-specific addition of "%1S" through "%9S" which format the seconds
|
|
with 1 through 9 decimal places, respectively. ("%S" uses no decimal places.)
|
|
See also strftime_local.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="strftime_local"/>
|
|
<h2>strftime_local</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
strftime_local (class=time #args=2): Like strftime but consults the $TZ environment variable to get local time zone.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="string"/>
|
|
<h2>string</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
string (class=conversion #args=1): Convert int/float/bool/string to string.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="strip"/>
|
|
<h2>strip</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
strip (class=string #args=1): Strip leading and trailing whitespace from string.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="strlen"/>
|
|
<h2>strlen</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
strlen (class=string #args=1): String length.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="strptime"/>
|
|
<h2>strptime</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
strptime (class=time #args=2): Parses timestamp as floating-point seconds since the epoch,
|
|
e.g. strptime("2015-08-28T13:33:21Z","%Y-%m-%dT%H:%M:%SZ") = 1440768801.000000,
|
|
and strptime("2015-08-28T13:33:21.345Z","%Y-%m-%dT%H:%M:%SZ") = 1440768801.345000.
|
|
See also strptime_local.
|
|
</pre>
|
|
</div>
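<p/>Since <code>strptime</code> and <code>strftime</code> are inverses of one
another for a given format string, a round trip is a quick sanity check. The
sketch below reuses the timestamp and formats from the descriptions above; the
local variable <code>t</code> is illustrative and the <code>-n</code> flag (no
input) is assumed to be available.

<p/>
<div class="pokipanel">
<pre>
mlr -n put 'end {
  # Parse a GMT timestamp into seconds since the epoch
  var t = strptime("2015-08-28T13:33:21Z", "%Y-%m-%dT%H:%M:%SZ");
  print t;
  # Format it back, first with whole seconds, then with 3 decimal places
  print strftime(t, "%Y-%m-%dT%H:%M:%SZ");
  print strftime(t, "%Y-%m-%dT%H:%M:%3SZ");
}'
</pre>
</div>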
|
|
|
|
|
|
<a id="strptime_local"/>
|
|
<h2>strptime_local</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
strptime_local (class=time #args=2): Like strptime, but consults $TZ environment variable to find and use local timezone.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="sub"/>
|
|
<h2>sub</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
sub (class=string #args=3): Example: '$name=sub($name, "old", "new")'
|
|
(replace once).
|
|
</pre>
|
|
</div>
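<p/>The three substitution functions differ in how many matches they replace
and in whether the second argument is treated as a regular expression. The
sketch below contrasts them on one input; the bracket expression
<code>[.]</code> is used to match a literal dot without relying on backslash
escapes, and the <code>-n</code> flag (no input) is assumed to be available.

<p/>
<div class="pokipanel">
<pre>
mlr -n put 'end {
  var s = "a.b.c";
  # Regex, first match only
  print sub(s, "[.]", "-");
  # Regex, all matches
  print gsub(s, "[.]", "-");
  # No regex: the dot is taken literally
  print ssub(s, ".", "-");
  # Regex: an unescaped dot matches any character, here the leading a
  print sub(s, ".", "-");
}'
</pre>
</div>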
|
|
|
|
|
|
<a id="substr"/>
|
|
<h2>substr</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
substr (class=string #args=3): substr(s,m,n) gives substring of s from 0-up position m to n
|
|
inclusive. Negative indices -len .. -1 alias to 0 .. len-1.
|
|
</pre>
|
|
</div>
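<p/>To illustrate the 0-up, inclusive indexing, here is a short sketch. If the
description above holds, both calls select positions 1 through 3 of
<code>"hello"</code>, since the negative indices -4 and -2 alias to 1 and 3 for
a five-character string. The <code>-n</code> flag (no input) is assumed to be
available.

<p/>
<div class="pokipanel">
<pre>
mlr -n put 'end {
  # Positions 1..3 inclusive, counting from 0
  print substr("hello", 1, 3);
  # The same positions, written with negative indices
  print substr("hello", -4, -2);
}'
</pre>
</div>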
|
|
|
|
|
|
<a id="system"/>
|
|
<h2>system</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
system (class=string #args=1): Run command string, yielding its stdout minus final newline.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="systime"/>
|
|
<h2>systime</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
systime (class=time #args=0): Floating-point seconds since the epoch,
|
|
e.g. 1440768801.748936.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="tan"/>
|
|
<h2>tan</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
tan (class=math #args=1): Trigonometric tangent.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="tanh"/>
|
|
<h2>tanh</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
tanh (class=math #args=1): Hyperbolic tangent.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="tolower"/>
|
|
<h2>tolower</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
tolower (class=string #args=1): Convert string to lowercase.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="toupper"/>
|
|
<h2>toupper</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
toupper (class=string #args=1): Convert string to uppercase.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="typeof"/>
|
|
<h2>typeof</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
typeof (class=conversion #args=1): Convert argument to type of argument (e.g.
|
|
MT_STRING). For debug.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="urand"/>
|
|
<h2>urand</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
urand (class=math #args=0): Floating-point numbers uniformly distributed on the unit interval.
|
|
Int-valued example: '$n=floor(20+urand()*11)'.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="urand32"/>
|
|
<h2>urand32</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
urand32 (class=math #args=0): Integer uniformly distributed between 0 and 2**32-1
|
|
inclusive.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="urandint"/>
|
|
<h2>urandint</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
urandint (class=math #args=2): Integer uniformly distributed between inclusive
|
|
integer endpoints.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="urandrange"/>
|
|
<h2>urandrange</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
urandrange (class=math #args=2): Floating-point numbers uniformly distributed on the interval [a, b).
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<!-- ================================================================ -->
|
|
|
|
</div>
|
|
<!-- ================================================================ -->
|
|
<a id="User-defined_functions_and_subroutines"/><h1>User-defined functions and subroutines</h1>
|
|
<button style="font-weight:bold;color:maroon;border:0" padding=0 onclick="bodyToggler.toggle('body_section_toggle_user_defined_functions');" href="javascript:;">Toggle section visibility</button>
|
|
<div id="body_section_toggle_user_defined_functions" style="display: block">
|
|
|
|
<p/> As of Miller 5.0.0 you can define your own functions, as well as subroutines.
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="User-defined_functions"/><h2>User-defined functions</h2>
|
|
|
|
<p/>Here’s the obligatory example of a recursive function to compute the factorial function:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --opprint --from data/small put '
|
|
func f(n) {
|
|
if (is_numeric(n)) {
|
|
if (n > 0) {
|
|
return n * f(n-1);
|
|
} else {
|
|
return 1;
|
|
}
|
|
}
|
|
# implicitly return absent-null if non-numeric
|
|
}
|
|
$ox = f($x + NR);
|
|
$oi = f($i);
|
|
'
|
|
a b i x y ox oi
|
|
pan pan 1 0.3467901443380824 0.7268028627434533 0.467054 1
|
|
eks pan 2 0.7586799647899636 0.5221511083334797 3.680838 2
|
|
wye wye 3 0.20460330576630303 0.33831852551664776 1.741251 6
|
|
eks wye 4 0.38139939387114097 0.13418874328430463 18.588349 24
|
|
wye pan 5 0.5732889198020006 0.8636244699032729 211.387310 120
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>Properties of user-defined functions:
|
|
|
|
<ul>
|
|
|
|
<li/> Function bodies start with <code>func</code> and a parameter list, defined
|
|
outside of <code>begin</code>, <code>end</code>, or other <code>func</code> or
|
|
<code>subr</code> blocks. (I.e. the Miller DSL has no nested functions.)
|
|
|
|
<li/> A function (uniqified by its name) may not be redefined: either by
|
|
redefining a user-defined function, or by redefining a built-in function.
|
|
However, functions and subroutines have separate namespaces: you can define a
|
|
subroutine <code>log</code> which does not clash with the mathematical <code>log</code>
|
|
function.
|
|
|
|
<li/> Functions may be defined either before or after use (there is an
|
|
object-binding/linkage step at startup). More specifically, functions may be
|
|
either recursive or mutually recursive. Functions may not call subroutines.
|
|
|
|
<li/> Functions may be defined and called either within <code>mlr put</code> or
|
|
<code>mlr filter</code>.
|
|
|
|
<li/> Functions have read access to <code>$</code>-variables and
|
|
<code>@</code>-variables but may not modify them.
|
|
See also
|
|
<a href="cookbook.html#Memoization_with_out-of-stream_variables">this cookbook item</a> for an example.
|
|
|
|
<li/> Argument values may be reassigned: they are not read-only.
|
|
|
|
<li/> When a return value is not explicitly returned, this results in a return
|
|
value of absent-null. (In the example above, if there were records for which
|
|
the argument to <code>f</code> is non-numeric, the assignments would be skipped.)
|
|
See also the section on
|
|
<a href="#Null_data:_empty_and_absent">empty and absent null data</a>.
|
|
|
|
<li/> See the section on <a href="#Local_variables">local variables</a> for
|
|
information on scope and extent of arguments, as well as for information on the
|
|
use of local variables within functions.
|
|
|
|
<li/> See the section on <a href="#Expressions_from_files">expressions from
|
|
files</a> for information on the use of <code>-f</code> and <code>-e</code> flags.
|
|
|
|
</ul>
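<p/>Two of the properties above, definition after use and mutual recursion,
can be seen together in a small sketch. The function names
<code>is_even</code> and <code>is_odd</code> are invented for this
illustration, they return 1 or 0 rather than booleans to keep the printing
simple, and the <code>-n</code> flag (no input) is assumed to be available.

<p/>
<div class="pokipanel">
<pre>
mlr -n put '
  end {
    # is_even is called here although it is defined further down
    print "is_even(10)=" . is_even(10);
    print "is_even(7)=" . is_even(7);
  }
  func is_even(n) {
    if (n == 0) {
      return 1;
    }
    return is_odd(n - 1);
  }
  func is_odd(n) {
    if (n == 0) {
      return 0;
    }
    return is_even(n - 1);
  }
'
</pre>
</div>

<p/>Each function calls the other until the argument reaches zero, so this is
only suitable for small non-negative integers.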
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="User-defined_subroutines"/><h2>User-defined subroutines</h2>
|
|
|
|
<p/>Example:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --opprint --from data/small put -q '
|
|
begin {
|
|
@call_count = 0;
|
|
}
|
|
subr s(n) {
|
|
@call_count += 1;
|
|
if (is_numeric(n)) {
|
|
if (n > 1) {
|
|
call s(n-1);
|
|
} else {
|
|
print "numcalls=" . @call_count;
|
|
}
|
|
}
|
|
}
|
|
print "NR=" . NR;
|
|
call s(NR);
|
|
'
|
|
NR=1
|
|
numcalls=1
|
|
NR=2
|
|
numcalls=3
|
|
NR=3
|
|
numcalls=6
|
|
NR=4
|
|
numcalls=10
|
|
NR=5
|
|
numcalls=15
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>Properties of user-defined subroutines:
|
|
|
|
<ul>
|
|
|
|
<li/> Subroutine bodies start with <code>subr</code> and a parameter list, defined
|
|
outside of <code>begin</code>, <code>end</code>, or other <code>func</code> or
|
|
<code>subr</code> blocks. (I.e. the Miller DSL has no nested subroutines.)
|
|
|
|
<li/> A subroutine (uniqified by its name) may not be redefined.
|
|
However, functions and subroutines have separate namespaces: you can define a
|
|
subroutine <code>log</code> which does not clash with the mathematical <code>log</code>
|
|
function.
|
|
|
|
<li/> Subroutines may be defined either before or after use (there is an
|
|
object-binding/linkage step at startup). More specifically, subroutines may be
|
|
either recursive or mutually recursive. Subroutines may call functions.
|
|
|
|
<li/> Subroutines may be defined and called either within <code>mlr put</code> or
|
|
<code>mlr filter</code>.
|
|
|
|
<li/> Subroutines have read/write access to <code>$</code>-variables and
|
|
<code>@</code>-variables.
|
|
|
|
<li/> Argument values may be reassigned: they are not read-only.
|
|
|
|
<li/> See the section on <a href="#Local_variables">local variables</a> for
|
|
information on scope and extent of arguments, as well as for information on the
|
|
use of local variables within subroutines.
|
|
|
|
<li/> See the section on <a href="#Expressions_from_files">expressions from
|
|
files</a> for information on the use of <code>-f</code> and <code>-e</code> flags.
|
|
|
|
</ul>
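<p/>As a sketch of the read/write access described above, the subroutine below
assigns to <code>$</code>-variables of the current record and calls a
user-defined function along the way. The names <code>twice</code> and
<code>annotate</code>, and the one-field input record, are made up for this
illustration.

<p/>
<div class="pokipanel">
<pre>
echo "x=3" | mlr put '
  func twice(v) {
    return 2 * v;
  }
  subr annotate(tag) {
    # Subroutines may write to fields of the current record
    $tag = tag;
    # Subroutines may call functions
    $x2 = twice($x);
  }
  call annotate("seen");
'
</pre>
</div>

<p/>The subroutine returns no value; its effect is visible only through the
fields it assigns.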
|
|
|
|
</div>
|
|
<!-- ================================================================ -->
|
|
<a id="Errors_and_transparency"/><h1>Errors and transparency</h1>
|
|
<button style="font-weight:bold;color:maroon;border:0" padding=0 onclick="bodyToggler.toggle('body_section_toggle_transparency');" href="javascript:;">Toggle section visibility</button>
|
|
<div id="body_section_toggle_transparency" style="display: block">
|
|
|
|
<p/>As soon as you have a programming language, you start having the problem
|
|
<i>What is my code doing, and why?</i> This includes getting syntax errors
|
|
— which are always annoying — as well as the even more annoying
|
|
problem of a program which parses without syntax error but doesn’t do
|
|
what you expect.
|
|
|
|
<p/> The <code>syntax error</code> message is cryptic: it says <code>syntax error at
|
|
</code> followed by the next symbol it couldn’t parse. This is good, but
|
|
(as of 5.0.0) it doesn’t say things like <code>syntax error at line 17,
|
|
character 22</code>. Here are some common causes of syntax errors:
|
|
|
|
<ul>
|
|
|
|
<li/> Don’t forget <code>;</code> at end of line, before another statement on
|
|
the next line.
|
|
|
|
<li/> Miller’s DSL lacks the <code>++</code> and <code>--</code> operators.
|
|
|
|
<li/> Curly braces are required for the bodies of
|
|
<code>if</code>/<code>while</code>/<code>for</code> blocks, even when the body is a single
|
|
statement.
|
|
|
|
</ul>
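<p/>The three points above can be seen in one short program that avoids each
pitfall. This is an illustrative sketch: the out-of-stream variable
<code>@i</code> is invented for the example and the <code>-n</code> flag (no
input) is assumed to be available.

<p/>
<div class="pokipanel">
<pre>
mlr -n put '
  end {
    @i = 0;
    # Curly braces are required even for a one-statement body
    while (@i &lt; 3) {
      # Use += 1 since there is no ++ operator
      @i += 1;
    }
    if (@i == 3) {
      print "i=" . @i;
    }
  }
'
</pre>
</div>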
|
|
|
|
<p/>Now for transparency:
|
|
|
|
<ul>
|
|
|
|
<li/>As in any language, you can do
|
|
<a href="#Print_statements"><code>print</code></a> (or <code>eprint</code> to print to
|
|
stderr). See also <a href="#Dump_statements"><code>dump</code></a> and <a
|
|
href="#Emit_statements"><code>emit</code></a>.
|
|
|
|
<li/> The <code>-v</code> option to <code>mlr put</code> and <code>mlr filter</code> prints
|
|
abstract syntax trees for your code. While not all details here will be of
|
|
interest to everyone, certainly this makes questions such as operator
|
|
precedence completely unambiguous.
|
|
|
|
<li/> The <code>-T</code> option prints a trace of each statement executed.
|
|
|
|
<li/> The <code>-t</code> and <code>-a</code> options show low-level details for the
|
|
parsing process and for stack-variable-index allocation, respectively. These
|
|
will likely be of interest to people who enjoy compilers, and probably less
|
|
useful for a more general audience.
|
|
|
|
<li/> Please see the <a href="#Type-checking">type-checking section</a> for
type declarations and type assertions you can use to make sure expressions and
the data flowing through them are evaluating as you expect. I made them optional
because one of Miller’s important use-cases is being able to say simple
things like <code>mlr put '$y = $x + 1' myfile.dat</code> with a minimum of
punctuational bric-a-brac. For programs over a few lines, though, I generally
find that the more type-specification, the better.
|
|
|
|
</ul>
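<p/> A couple of these techniques can be sketched as follows. This is only an
illustration: it assumes <code>mlr</code> is on your path, and the input record
<code>x=3</code> and the local name <code>a</code> are made up for the example.
The first command prints the syntax tree for an expression without reading any
data, which shows how <code>+</code> and <code>*</code> are grouped. The second
combines a typed local with <code>eprint</code>, so the trace goes to stderr
while the records go to stdout.

```shell
# Show the abstract syntax tree only; -n means no input is read.
mlr -n put -v '$y = $x + 2 * $z'

# Declare a typed local and trace its value on stderr.
echo 'x=3' | mlr put 'num a = $x; eprint "a is " . a; $y = a + 1'
```

<p/> The second command should write <code>a is 3</code> to stderr and
<code>x=3,y=4</code> to stdout. The <code>num</code> declaration is there so
that a non-numeric <code>x</code> is caught at the assignment rather than
propagating silently into <code>$y</code>.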
|
|
|
|
</div>
|
|
<!-- ================================================================ -->
|
|
<a id="A_note_on_the_complexity_of_Miller’s_expression_language"/><h1>A note on the complexity of Miller’s expression language</h1>
|
|
<button style="font-weight:bold;color:maroon;border:0" padding=0 onclick="bodyToggler.toggle('body_section_toggle_a_note_on_complexity');" href="javascript:;">Toggle section visibility</button>
|
|
<div id="body_section_toggle_a_note_on_complexity" style="display: block">
|
|
|
|
<p/> One of Miller’s strengths is its brevity: it’s much quicker
(and less error-prone) to type <code>mlr stats1 -a sum -f x,y -g
a,b</code> than to track summation variables as in <code>awk</code>, or to use
Miller’s out-of-stream variables. And the more language features
Miller’s put-DSL has (for-loops, if-statements, nested control
structures, user-defined functions, etc.), the <i>less</i> powerful it
begins to seem, because of the other programming-language features it
<i>doesn’t</i> have (classes, exceptions, and so on).
|
|
|
|
<p/> When I was originally prototyping Miller in 2015, the decision I faced was
whether to hand-code in a low-level language like C or Rust, with my own
hand-rolled DSL, or whether to use a higher-level language (like Python or Lua
or Nim) and let the <code>put</code> statements be handled by the implementation
language’s own <code>eval</code>: the implementation language would take the
place of a DSL. Multiple performance experiments showed me I could get better
throughput using the former, and using C in particular, by a wide margin. So
Miller is C under the hood with a hand-rolled DSL.
|
|
|
|
<p/> I do want to keep focusing on what Miller is good at (concise
notation, low latency, and high throughput) and not add too much in
terms of high-level-language features to the DSL. That said, some sort of
customizability is a basic thing to want. As of 4.1.0 we have nestable
for/while/if structures on about the same complexity level as <code>awk</code>; as
of 5.0.0 we have user-defined functions and map-valued variables, again on
about the same complexity level as <code>awk</code>, along with optional
type-declaration syntax. While I’m excited by these powerful language
features, I hope to keep new features beyond 5.0.0 focused on Miller’s
sweet spot, which is speed plus simplicity.
|
|
|
|
</div>
|
|
|
|
<!-- ================================================================ -->
|
|
<script type="text/javascript" src="js/miller-doc-toggler.js"></script>
|
|
<!-- wtf -->
|
|
<script type="text/javascript">
|
|
// Put this at the bottom of the page since its constructor scans the
|
|
// document's div tags to find the toggleables.
|
|
const bodyToggler = new MillerDocToggler(
|
|
"body_section_toggle_",
|
|
'maroon',
|
|
'maroon',
|
|
);
|
|
</script>
|
|
|
|
</body>
|
|
</html>
|