mirror of
https://github.com/johnkerl/miller.git
synced 2026-01-23 18:25:45 +00:00
5450 lines
164 KiB
HTML
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
|
|
<html lang="en">
|
|
|
|
<!-- PAGE GENERATED FROM template.html and content-for-reference-dsl.html BY poki. -->
|
|
<!-- PLEASE MAKE CHANGES THERE AND THEN RE-RUN poki. -->
|
|
<head>
|
|
<meta http-equiv="Content-type" content="text/html;charset=UTF-8"/>
|
|
<meta name="description" content="Miller documentation"/>
|
|
<meta name="viewport" content="width=device-width, initial-scale=1.0"/> <!-- mobile-friendly -->
|
|
<meta name="keywords"
|
|
content="John Kerl, Kerl, Miller, miller, mlr, OLAP, data analysis software, regression, correlation, variance, data tools, " />
|
|
|
|
<title> Reference: DSL </title>
|
|
<link rel="stylesheet" type="text/css" href="css/miller.css"/>
|
|
<link rel="stylesheet" type="text/css" href="css/poki-callbacks.css"/>
|
|
</head>
|
|
|
|
<!-- ================================================================ -->
|
|
<script type="text/javascript">
|
|
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
|
|
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
|
|
</script>
|
|
<script type="text/javascript">
|
|
try {
|
|
var pageTracker = _gat._getTracker("UA-15651652-1");
|
|
pageTracker._trackPageview();
|
|
} catch(err) {}
|
|
</script>
|
|
|
|
<!-- ================================================================ -->
|
|
<script type="text/javascript">
|
|
function toggle_div(div) {
|
|
if (div != null) {
|
|
if (div.id.startsWith("section_toggle_")) {
|
|
var state = div.style.display;
|
|
if (state == "block") {
|
|
div.style.display = "none";
|
|
} else {
|
|
div.style.display = "block";
|
|
}
|
|
}
|
|
}
|
|
}
|
|
function expand_div(div) {
|
|
if (div != null) {
|
|
if (div.id.startsWith("section_toggle_")) {
|
|
div.style.display = "block";
|
|
}
|
|
}
|
|
}
|
|
function collapse_div(div) {
|
|
if (div != null) {
|
|
if (div.id.startsWith("section_toggle_")) {
|
|
div.style.display = "none";
|
|
}
|
|
}
|
|
}
|
|
|
|
function toggle_by_name(divName) {
|
|
toggle_div(document.getElementById(divName));
|
|
}
|
|
function expand_by_name(divName) {
|
|
expand_div(document.getElementById(divName));
|
|
}
|
|
function collapse_by_name(divName) {
|
|
collapse_div(document.getElementById(divName));
|
|
}
|
|
|
|
function expand_all() {
|
|
var divs = document.getElementsByTagName("div");
|
|
for(var i = 0; i < divs.length; i++) {
|
|
expand_div(divs[i]);
|
|
}
|
|
}
|
|
function collapse_all() {
|
|
var divs = document.getElementsByTagName("div");
|
|
for(var i = 0; i < divs.length; i++){
|
|
collapse_div(divs[i]);
|
|
}
|
|
}
|
|
</script>
|
|
|
|
<!--
|
|
The background image is from a screenshot of a Google search for "data analysis
|
|
tools", lightened and sepia-toned. Over this was placed a Mac Terminal app with
|
|
very light-grey font and translucent background, in which a few statistical
|
|
Miller commands were run with pretty-print-tabular output format.
|
|
<body background="pix/sepia-overlay.jpg">
|
|
-->
|
|
<body bgcolor="#ffffff">
|
|
|
|
<!-- ================================================================ -->
|
|
<table width="100%">
|
|
<tr>
|
|
|
|
<!-- navbar -->
|
|
<td width="15%">
|
|
<!--
|
|
<img src="pix/mlr.jpg" />
|
|
<img style="border-width:1px; color:black;" src="pix/mlr.jpg" />
|
|
-->
|
|
|
|
<div class="pokinav">
|
|
<center><titleinbody>Miller</titleinbody></center>
|
|
|
|
<!-- PAGE LIST GENERATED FROM template.html BY poki -->
|
|
<br/><b>Overview:</b>
|
|
<br/>• <a href="index.html">About Miller</a>
|
|
<br/>• <a href="10-min.html">Miller in 10 minutes</a>
|
|
<br/>• <a href="file-formats.html">File formats</a>
|
|
<br/>• <a href="feature-comparison.html">Miller features in the context of the Unix toolkit</a>
|
|
<br/>• <a href="record-heterogeneity.html">Record-heterogeneity</a>
|
|
<br/>• <a href="internationalization.html">Internationalization</a>
|
|
<br/><b>Using Miller:</b>
|
|
<br/>• <a href="faq.html">FAQ</a>
|
|
<br/>• <a href="cookbook.html">Cookbook part 1</a>
|
|
<br/>• <a href="cookbook2.html">Cookbook part 2</a>
|
|
<br/>• <a href="cookbook3.html">Cookbook part 3</a>
|
|
<br/>• <a href="data-examples.html">Data-diving examples</a>
|
|
<br/>• <a href="manpage.html">Manpage</a>
|
|
<br/>• <a href="reference.html">Reference</a>
|
|
<br/>• <a href="reference-verbs.html">Reference: Verbs</a>
|
|
<br/>• <a href="reference-dsl.html"><b>Reference: DSL</b></a>
|
|
<br/>• <a href="release-docs.html">Documents by release</a>
|
|
<br/>• <a href="build.html">Installation, portability, dependencies, and testing</a>
|
|
<br/><b>Background:</b>
|
|
<br/>• <a href="why.html">Why?</a>
|
|
<br/>• <a href="whyc.html">Why C?</a>
|
|
<br/>• <a href="etymology.html">Why call it Miller?</a>
|
|
<br/>• <a href="originality.html">How original is Miller?</a>
|
|
<br/>• <a href="performance.html">Performance</a>
|
|
<br/><b>Repository:</b>
|
|
<br/>• <a href="to-do.html">Things to do</a>
|
|
<br/>• <a href="contact.html">Contact information</a>
|
|
<br/>• <a href="https://github.com/johnkerl/miller">GitHub repo</a>
|
|
<br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/>
|
|
<br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/>
|
|
<br/> <br/> <br/> <br/> <br/> <br/>
|
|
</div>
|
|
</td>
|
|
|
|
<!-- page body -->
|
|
<td>
|
|
<!--
|
|
This is a visually gorgeous feature (here & in the CSS): it allows for
|
|
independent scroll of the nav and body panels. In particular the nav
|
|
stays on-screen as you scroll the body.
|
|
|
|
However, two problems:
|
|
|
|
(1) In Firefox & Chrome both I get janky end-of-body scrolls: there is
|
|
more content but I can't scroll down to it unless I repeatedly retry the
|
|
scrolldown. Which is weird.
|
|
|
|
(2) Worse, only the first page renders in PDF (again, Firefox & Chrome).
|
|
|
|
For now I'm disabling this separate-scroll feature. A frontender, I am
|
|
not ... maybe someday I'll find a config which gets *all* the features
|
|
I want; for now, it's a tradeoff.
|
|
-->
|
|
|
|
<!-- Implementation details: one bit is right here:
|
|
|
|
div style="overflow-y:scroll;height:1500px"
|
|
|
|
and the other bit is in css/poki-callbacks.css:
|
|
|
|
.pokinav {
|
|
display: inline-block;
|
|
background: #e8d9bc;
|
|
border: 1;
|
|
box-shadow: 0px 0px 3px 3px #C9C9C9;
|
|
margin: 10px;
|
|
padding-top: 10px;
|
|
padding-bottom: 10px;
|
|
padding-left: 10px;
|
|
padding-right: 10px;
|
|
overflow-y: scroll; < - - - - - - here
|
|
height: 1500px;
|
|
}
|
|
|
|
-->
|
|
<div>
|
|
<center> <titleinbody> Reference: DSL </titleinbody> </center>
|
|
<p/>
|
|
|
|
<!-- BODY COPIED FROM content-for-reference-dsl.html BY poki -->
|
|
<div class="pokitoc">
|
|
<center><b>Contents:</b></center>
|
|
• <a href="#Overview">Overview</a><br/>
|
|
• <a href="#Syntax">Syntax</a><br/>
|
|
• <a href="#Expression_formatting">Expression formatting</a><br/>
|
|
• <a href="#Expressions_from_files">Expressions from files</a><br/>
|
|
• <a href="#Semicolons,_commas,_newlines,_and_curly_braces">Semicolons, commas, newlines, and curly braces</a><br/>
|
|
• <a href="#Variables">Variables</a><br/>
|
|
• <a href="#Built-in_variables">Built-in variables</a><br/>
|
|
• <a href="#Field_names">Field names</a><br/>
|
|
• <a href="#Out-of-stream_variables">Out-of-stream variables</a><br/>
|
|
• <a href="#Indexed_out-of-stream_variables">Indexed out-of-stream variables</a><br/>
|
|
• <a href="#Local_variables">Local variables</a><br/>
|
|
• <a href="#Map_literals">Map literals</a><br/>
|
|
• <a href="#Type-checking">Type-checking</a><br/>
|
|
• <a href="#Type-test_and_type-assertion_expressions">Type-test and type-assertion expressions</a><br/>
|
|
• <a href="#Type-declarations_for_local_variables,_function_parameter,_and_function_return_values">Type-declarations for local variables, function parameter, and function return values</a><br/>
|
|
• <a href="#Null_data:_empty_and_absent">Null data: empty and absent</a><br/>
|
|
• <a href="#Aggregate_variable_assignments">Aggregate variable assignments</a><br/>
|
|
• <a href="#Keywords_for_filter_and_put">Keywords for filter and put</a><br/>
|
|
• <a href="#Operator_precedence">Operator precedence</a><br/>
|
|
• <a href="#Operator_and_function_semantics">Operator and function semantics</a><br/>
|
|
• <a href="#Control_structures">Control structures</a><br/>
|
|
• <a href="#Pattern-action_blocks">Pattern-action blocks</a><br/>
|
|
• <a href="#If-statements">If-statements</a><br/>
|
|
• <a href="#While_and_do-while_loops">While and do-while loops</a><br/>
|
|
• <a href="#For-loops">For-loops</a><br/>
|
|
• <a href="#Key-only_for-loops">Key-only for-loops</a><br/>
|
|
• <a href="#Key-value_for-loops">Key-value for-loops</a><br/>
|
|
• <a href="#C-style_triple-for_loops">C-style triple-for loops</a><br/>
|
|
• <a href="#Begin/end_blocks">Begin/end blocks</a><br/>
|
|
• <a href="#Output_statements">Output statements</a><br/>
|
|
• <a href="#Print_statements">Print statements</a><br/>
|
|
• <a href="#Dump_statements">Dump statements</a><br/>
|
|
• <a href="#Tee_statements">Tee statements</a><br/>
|
|
• <a href="#Redirected-output_statements">Redirected-output statements</a><br/>
|
|
• <a href="#Emit_statements">Emit statements</a><br/>
|
|
• <a href="#Multi-emit_statements">Multi-emit statements</a><br/>
|
|
• <a href="#Emit-all_statements">Emit-all statements</a><br/>
|
|
• <a href="#Unset_statements">Unset statements</a><br/>
|
|
• <a href="#Filter_statements">Filter statements</a><br/>
|
|
• <a href="#Built-in_functions_for_filter_and_put">Built-in functions for filter and put</a><br/>
|
|
• <a href="#+">+</a><br/>
|
|
• <a href="#-">-</a><br/>
|
|
• <a href="#*">*</a><br/>
|
|
• <a href="#/">/</a><br/>
|
|
• <a href="#//">//</a><br/>
|
|
• <a href="#%">%</a><br/>
|
|
• <a href="#**">**</a><br/>
|
|
• <a href="#|">|</a><br/>
|
|
• <a href="#^">^</a><br/>
|
|
• <a href="#&">&</a><br/>
|
|
• <a href="#~">~</a><br/>
|
|
• <a href="#<<"><<</a><br/>
|
|
• <a href="#>>">>></a><br/>
|
|
• <a href="#==">==</a><br/>
|
|
• <a href="#!=">!=</a><br/>
|
|
• <a href="#=~">=~</a><br/>
|
|
• <a href="#!=~">!=~</a><br/>
|
|
• <a href="#>">></a><br/>
|
|
• <a href="#>=">>=</a><br/>
|
|
• <a href="#<"><</a><br/>
|
|
• <a href="#<="><=</a><br/>
|
|
• <a href="#&&">&&</a><br/>
|
|
• <a href="#||">||</a><br/>
|
|
• <a href="#^^">^^</a><br/>
|
|
• <a href="#!">!</a><br/>
|
|
• <a href="#?_:">? :</a><br/>
|
|
• <a href="#.">.</a><br/>
|
|
• <a href="#abs">abs</a><br/>
|
|
• <a href="#acos">acos</a><br/>
|
|
• <a href="#acosh">acosh</a><br/>
|
|
• <a href="#asin">asin</a><br/>
|
|
• <a href="#asinh">asinh</a><br/>
|
|
• <a href="#asserting_absent">asserting_absent</a><br/>
|
|
• <a href="#asserting_bool">asserting_bool</a><br/>
|
|
• <a href="#asserting_boolean">asserting_boolean</a><br/>
|
|
• <a href="#asserting_empty">asserting_empty</a><br/>
|
|
• <a href="#asserting_empty_map">asserting_empty_map</a><br/>
|
|
• <a href="#asserting_float">asserting_float</a><br/>
|
|
• <a href="#asserting_int">asserting_int</a><br/>
|
|
• <a href="#asserting_map">asserting_map</a><br/>
|
|
• <a href="#asserting_nonempty_map">asserting_nonempty_map</a><br/>
|
|
• <a href="#asserting_not_empty">asserting_not_empty</a><br/>
|
|
• <a href="#asserting_not_map">asserting_not_map</a><br/>
|
|
• <a href="#asserting_not_null">asserting_not_null</a><br/>
|
|
• <a href="#asserting_null">asserting_null</a><br/>
|
|
• <a href="#asserting_numeric">asserting_numeric</a><br/>
|
|
• <a href="#asserting_present">asserting_present</a><br/>
|
|
• <a href="#asserting_string">asserting_string</a><br/>
|
|
• <a href="#atan">atan</a><br/>
|
|
• <a href="#atan2">atan2</a><br/>
|
|
• <a href="#atanh">atanh</a><br/>
|
|
• <a href="#boolean">boolean</a><br/>
|
|
• <a href="#cbrt">cbrt</a><br/>
|
|
• <a href="#ceil">ceil</a><br/>
|
|
• <a href="#cos">cos</a><br/>
|
|
• <a href="#cosh">cosh</a><br/>
|
|
• <a href="#depth">depth</a><br/>
|
|
• <a href="#dhms2fsec">dhms2fsec</a><br/>
|
|
• <a href="#dhms2sec">dhms2sec</a><br/>
|
|
• <a href="#erf">erf</a><br/>
|
|
• <a href="#erfc">erfc</a><br/>
|
|
• <a href="#exp">exp</a><br/>
|
|
• <a href="#expm1">expm1</a><br/>
|
|
• <a href="#float">float</a><br/>
|
|
• <a href="#floor">floor</a><br/>
|
|
• <a href="#fmtnum">fmtnum</a><br/>
|
|
• <a href="#fsec2dhms">fsec2dhms</a><br/>
|
|
• <a href="#fsec2hms">fsec2hms</a><br/>
|
|
• <a href="#gmt2sec">gmt2sec</a><br/>
|
|
• <a href="#gsub">gsub</a><br/>
|
|
• <a href="#haskey">haskey</a><br/>
|
|
• <a href="#hexfmt">hexfmt</a><br/>
|
|
• <a href="#hms2fsec">hms2fsec</a><br/>
|
|
• <a href="#hms2sec">hms2sec</a><br/>
|
|
• <a href="#int">int</a><br/>
|
|
• <a href="#invqnorm">invqnorm</a><br/>
|
|
• <a href="#is_absent">is_absent</a><br/>
|
|
• <a href="#is_bool">is_bool</a><br/>
|
|
• <a href="#is_boolean">is_boolean</a><br/>
|
|
• <a href="#is_empty">is_empty</a><br/>
|
|
• <a href="#is_empty_map">is_empty_map</a><br/>
|
|
• <a href="#is_float">is_float</a><br/>
|
|
• <a href="#is_int">is_int</a><br/>
|
|
• <a href="#is_map">is_map</a><br/>
|
|
• <a href="#is_nonempty_map">is_nonempty_map</a><br/>
|
|
• <a href="#is_not_empty">is_not_empty</a><br/>
|
|
• <a href="#is_not_map">is_not_map</a><br/>
|
|
• <a href="#is_not_null">is_not_null</a><br/>
|
|
• <a href="#is_null">is_null</a><br/>
|
|
• <a href="#is_numeric">is_numeric</a><br/>
|
|
• <a href="#is_present">is_present</a><br/>
|
|
• <a href="#is_string">is_string</a><br/>
|
|
• <a href="#joink">joink</a><br/>
|
|
• <a href="#joinkv">joinkv</a><br/>
|
|
• <a href="#joinv">joinv</a><br/>
|
|
• <a href="#leafcount">leafcount</a><br/>
|
|
• <a href="#length">length</a><br/>
|
|
• <a href="#log">log</a><br/>
|
|
• <a href="#log10">log10</a><br/>
|
|
• <a href="#log1p">log1p</a><br/>
|
|
• <a href="#logifit">logifit</a><br/>
|
|
• <a href="#madd">madd</a><br/>
|
|
• <a href="#mapdiff">mapdiff</a><br/>
|
|
• <a href="#mapexcept">mapexcept</a><br/>
|
|
• <a href="#mapselect">mapselect</a><br/>
|
|
• <a href="#mapsum">mapsum</a><br/>
|
|
• <a href="#max">max</a><br/>
|
|
• <a href="#mexp">mexp</a><br/>
|
|
• <a href="#min">min</a><br/>
|
|
• <a href="#mmul">mmul</a><br/>
|
|
• <a href="#msub">msub</a><br/>
|
|
• <a href="#pow">pow</a><br/>
|
|
• <a href="#qnorm">qnorm</a><br/>
|
|
• <a href="#round">round</a><br/>
|
|
• <a href="#roundm">roundm</a><br/>
|
|
• <a href="#sec2dhms">sec2dhms</a><br/>
|
|
• <a href="#sec2gmt">sec2gmt</a><br/>
|
|
• <a href="#sec2gmtdate">sec2gmtdate</a><br/>
|
|
• <a href="#sec2hms">sec2hms</a><br/>
|
|
• <a href="#sgn">sgn</a><br/>
|
|
• <a href="#sin">sin</a><br/>
|
|
• <a href="#sinh">sinh</a><br/>
|
|
• <a href="#splitkv">splitkv</a><br/>
|
|
• <a href="#splitkvx">splitkvx</a><br/>
|
|
• <a href="#splitnv">splitnv</a><br/>
|
|
• <a href="#splitnvx">splitnvx</a><br/>
|
|
• <a href="#sqrt">sqrt</a><br/>
|
|
• <a href="#strftime">strftime</a><br/>
|
|
• <a href="#string">string</a><br/>
|
|
• <a href="#strlen">strlen</a><br/>
|
|
• <a href="#strptime">strptime</a><br/>
|
|
• <a href="#sub">sub</a><br/>
|
|
• <a href="#substr">substr</a><br/>
|
|
• <a href="#systime">systime</a><br/>
|
|
• <a href="#tan">tan</a><br/>
|
|
• <a href="#tanh">tanh</a><br/>
|
|
• <a href="#tolower">tolower</a><br/>
|
|
• <a href="#toupper">toupper</a><br/>
|
|
• <a href="#typeof">typeof</a><br/>
|
|
• <a href="#urand">urand</a><br/>
|
|
• <a href="#urand32">urand32</a><br/>
|
|
• <a href="#urandint">urandint</a><br/>
|
|
• <a href="#User-defined_functions_and_subroutines">User-defined functions and subroutines</a><br/>
|
|
• <a href="#User-defined_functions">User-defined functions</a><br/>
|
|
• <a href="#User-defined_subroutines">User-defined subroutines</a><br/>
|
|
• <a href="#Errors_and_transparency">Errors and transparency</a><br/>
|
|
• <a href="#A_note_on_the_complexity_of_Miller’s_expression_language">A note on the complexity of Miller’s expression language</a><br/>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>
|
|
<button style="font-weight:bold;color:maroon;border:0" onclick="expand_all();" href="javascript:;">Expand all sections</button>
|
|
<button style="font-weight:bold;color:maroon;border:0" onclick="collapse_all();" href="javascript:;">Collapse all sections</button>
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Overview"/><h1>Overview</h1>
|
|
<button style="font-weight:bold;color:maroon;border:0" padding=0 onclick="toggle_by_name('section_toggle_overview');" href="javascript:;">Toggle section visibility</button>
|
|
<div id="section_toggle_overview" style="display: block">
|
|
|
|
<p/> Here’s a comparison of verbs and <tt>put</tt>/<tt>filter</tt> DSL expressions:
|
|
|
|
<table border=1>
|
|
<tr> <td>
|
|
Example:
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr stats1 -a sum -f x -g a data/small
|
|
a=pan,x_sum=0.346790
|
|
a=eks,x_sum=1.140079
|
|
a=wye,x_sum=0.777892
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<ul>
|
|
<li/> Verbs are coded in C
|
|
<li/> They run a bit faster
|
|
<li/> They take fewer keystrokes
|
|
<li/> There is less to learn
|
|
<li/> Their customization is limited to each verb’s options
|
|
</ul>
|
|
</td>
|
|
<td>
|
|
Example:
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@x_sum[$a] += $x; end{emit @x_sum, "a"}' data/small
|
|
a=pan,x_sum=0.346790
|
|
a=eks,x_sum=1.140079
|
|
a=wye,x_sum=0.777892
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<ul>
|
|
<li/> You get to write your own DSL expressions
|
|
<li/> They run a bit slower
|
|
<li/> They take more keystrokes
|
|
<li/> There is more to learn
|
|
<li/> They are highly customizable
|
|
</ul>
|
|
</td> </tr>
|
|
</table>
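<p/>As a rough illustration of what the DSL expression in the right-hand column computes, here is a minimal Python sketch. It is not how Miller is implemented; the names <tt>records</tt>, <tt>x_sum</tt>, and <tt>emitted</tt> are hypothetical, and the values are the <tt>a</tt> and <tt>x</tt> fields of <tt>data/small</tt> as listed further down this page.

```python
# Sketch of: mlr put -q '@x_sum[$a] += $x; end{emit @x_sum, "a"}' data/small
# The records below copy the a and x fields of data/small (other fields omitted).
records = [
    {"a": "pan", "x": 0.3467901443380824},
    {"a": "eks", "x": 0.7586799647899636},
    {"a": "wye", "x": 0.20460330576630303},
    {"a": "eks", "x": 0.38139939387114097},
    {"a": "wye", "x": 0.5732889198020006},
]

# Plays the role of the out-of-stream variable @x_sum, keyed by $a.
x_sum = {}

# Main block: runs once per record; -q suppresses the records themselves.
for rec in records:
    x_sum[rec["a"]] = x_sum.get(rec["a"], 0.0) + rec["x"]

# end block: emit one output record per key, naming the key field "a".
emitted = [{"a": key, "x_sum": total} for key, total in x_sum.items()]
for out in emitted:
    print(f"a={out['a']},x_sum={out['x_sum']:.6f}")
```

<p/>The per-key accumulation is the part a verb such as <tt>stats1</tt> does for you; in the DSL you spell it out, which is where both the extra keystrokes and the extra flexibility come from.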
|
|
|
|
<p/>Please see <a href="reference-verbs.html">here</a> for information on
|
|
verbs other than <tt>put</tt> and <tt>filter</tt>.
|
|
|
|
<p/>
|
|
The essential usages of <tt>mlr filter</tt> and <tt>mlr put</tt> are for
|
|
record-selection and record-updating expressions, respectively. For example, given the following input data:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ cat data/small
|
|
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/> you might retain only the records whose <tt>a</tt> field has value <tt>eks</tt>:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr filter '$a == "eks"' data/small
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/> or you might add a new field which is a function of existing fields:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put '$ab = $a . "_" . $b ' data/small
|
|
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533,ab=pan_pan
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797,ab=eks_pan
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776,ab=wye_wye
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463,ab=eks_wye
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729,ab=wye_pan
|
|
</pre>
|
|
</div>
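<p/>The difference between record selection and record updating can be sketched in Python as follows. This is only an analogy under simplifying assumptions: <tt>mlr_filter</tt>, <tt>mlr_put</tt>, and <tt>add_ab</tt> are hypothetical helper names, and only the <tt>a</tt>, <tt>b</tt>, and <tt>i</tt> fields of <tt>data/small</tt> are reproduced.

```python
# Sketch of record selection (filter) versus record updating (put).
# Only the a, b, and i fields of data/small are copied here.
records = [
    {"a": "pan", "b": "pan", "i": 1},
    {"a": "eks", "b": "pan", "i": 2},
    {"a": "wye", "b": "wye", "i": 3},
    {"a": "eks", "b": "wye", "i": 4},
    {"a": "wye", "b": "pan", "i": 5},
]

def mlr_filter(recs, predicate):
    """Keep only the records for which the predicate is true."""
    return [rec for rec in recs if predicate(rec)]

def mlr_put(recs, update):
    """Apply the update to a copy of every record; no record is dropped."""
    out = []
    for rec in recs:
        new_rec = dict(rec)
        update(new_rec)
        out.append(new_rec)
    return out

def add_ab(rec):
    # Corresponds to: $ab = $a . "_" . $b
    rec["ab"] = rec["a"] + "_" + rec["b"]

selected = mlr_filter(records, lambda rec: rec["a"] == "eks")
updated = mlr_put(records, add_ab)

print([rec["i"] for rec in selected])
print([rec["ab"] for rec in updated])
```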
|
|
<p/>
|
|
|
|
<p/>The two verbs <tt>mlr filter</tt> and <tt>mlr put</tt> are essentially the
|
|
same. The only differences are:
|
|
|
|
<ul>
|
|
|
|
<li/> Expressions sent to <tt>mlr filter</tt> must end with a boolean expression,
|
|
which is the filtering criterion;
|
|
|
|
<li/> <tt>mlr filter</tt> expressions may not
|
|
reference the <tt>filter</tt> keyword within them; and
|
|
|
|
<li/> <tt>mlr filter</tt> expressions may not use <tt>tee</tt>, <tt>emit</tt>,
|
|
<tt>emitp</tt>, or <tt>emitf</tt>.
|
|
|
|
</ul>
|
|
|
|
<p/> All the rest is the same: in particular, you can define and invoke
|
|
functions and subroutines to help produce the final boolean statement, and
|
|
record fields may be assigned to in the statements preceding the final boolean
|
|
statement.
|
|
|
|
<p/>The following sections cover these points in more detail, along with further options.
|
|
|
|
</div>
|
|
<!-- ================================================================ -->
|
|
<a id="Syntax"/><h1>Syntax</h1>
|
|
<button style="font-weight:bold;color:maroon;border:0" padding=0 onclick="toggle_by_name('section_toggle_syntax');" href="javascript:;">Toggle section visibility</button>
|
|
<div id="section_toggle_syntax" style="display: block">
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Expression_formatting"/><h2>Expression formatting</h2>
|
|
|
|
<p/>Multiple expressions may be given, separated by semicolons, and each may refer to the ones before:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ ruby -e '10.times{|i|puts "i=#{i}"}' | mlr --opprint put '$j = $i + 1; $k = $i +$j'
|
|
i j k
|
|
0 1 1
|
|
1 2 3
|
|
2 3 5
|
|
3 4 7
|
|
4 5 9
|
|
5 6 11
|
|
6 7 13
|
|
7 8 15
|
|
8 9 17
|
|
9 10 19
|
|
</pre>
|
|
</div>
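<p/>To make the left-to-right evaluation explicit, here is a small Python sketch of the same two statements applied to records <tt>i=0</tt> through <tt>i=9</tt>. The dictionary <tt>rec</tt> and the list <tt>rows</tt> are hypothetical stand-ins for Miller's records.

```python
# Sketch of: mlr put '$j = $i + 1; $k = $i + $j' on records i=0 .. i=9
rows = []
for i in range(10):
    rec = {"i": i}
    rec["j"] = rec["i"] + 1          # first statement
    rec["k"] = rec["i"] + rec["j"]   # second statement reads $j assigned just above
    rows.append(rec)

for rec in rows:
    print(rec["i"], rec["j"], rec["k"])
```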
|
|
<p/>
|
|
|
|
Newlines within the expression are ignored, which can help increase legibility of complex expressions:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --opprint put '
|
|
$nf = NF;
|
|
$nr = NR;
|
|
$fnr = FNR;
|
|
$filenum = FILENUM;
|
|
$filename = FILENAME
|
|
' data/small data/small2
|
|
a b i x y nf nr fnr filenum filename
|
|
pan pan 1 0.3467901443380824 0.7268028627434533 5 1 1 1 data/small
|
|
eks pan 2 0.7586799647899636 0.5221511083334797 5 2 2 1 data/small
|
|
wye wye 3 0.20460330576630303 0.33831852551664776 5 3 3 1 data/small
|
|
eks wye 4 0.38139939387114097 0.13418874328430463 5 4 4 1 data/small
|
|
wye pan 5 0.5732889198020006 0.8636244699032729 5 5 5 1 data/small
|
|
pan eks 9999 0.267481232652199086 0.557077185510228001 5 6 1 2 data/small2
|
|
wye eks 10000 0.734806020620654365 0.884788571337605134 5 7 2 2 data/small2
|
|
pan wye 10001 0.870530722602517626 0.009854780514656930 5 8 3 2 data/small2
|
|
hat wye 10002 0.321507044286237609 0.568893318795083758 5 9 4 2 data/small2
|
|
pan zee 10003 0.272054845593895200 0.425789896597056627 5 10 5 2 data/small2
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --opprint filter '($x > 0.5 && $y < 0.5) || ($x < 0.5 && $y > 0.5)' then stats2 -a corr -f x,y data/medium
|
|
x_y_corr
|
|
-0.747994
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Expressions_from_files"/><h2>Expressions from files</h2>
|
|
|
|
<p/>The simplest way to enter expressions for <tt>put</tt> and <tt>filter</tt> is between single quotes on the command line, e.g.
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from data/small put '$xy = sqrt($x**2 + $y**2)'
|
|
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533,xy=0.805299
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797,xy=0.920998
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776,xy=0.395376
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463,xy=0.404317
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729,xy=1.036584
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from data/small put 'func f(a, b) { return sqrt(a**2 + b**2) } $xy = f($x, $y)'
|
|
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533,xy=0.805299
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797,xy=0.920998
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776,xy=0.395376
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463,xy=0.404317
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729,xy=1.036584
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>You may, though, find it convenient to put expressions into files for reuse, and read them
|
|
<b>using the -f option</b>. For example:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ cat data/fe-example-3.mlr
|
|
func f(a, b) {
|
|
return sqrt(a**2 + b**2)
|
|
}
|
|
$xy = f($x, $y)
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from data/small put -f data/fe-example-3.mlr
|
|
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533,xy=0.805299
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797,xy=0.920998
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776,xy=0.395376
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463,xy=0.404317
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729,xy=1.036584
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>If you have some of the logic in a file and you want to write the rest on the command line, you
|
|
can <b>use the -f and -e options together</b>:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ cat data/fe-example-4.mlr
|
|
func f(a, b) {
|
|
return sqrt(a**2 + b**2)
|
|
}
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from data/small put -f data/fe-example-4.mlr -e '$xy = f($x, $y)'
|
|
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533,xy=0.805299
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797,xy=0.920998
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776,xy=0.395376
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463,xy=0.404317
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729,xy=1.036584
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>A suggested use-case here is defining functions in files, and calling them from command-line expressions.
|
|
|
|
<p/>Another suggested use-case is putting default parameter values in files, e.g. using
<tt>begin{@count=is_present(@count)?@count:10}</tt> in the file, which you can override by
preceding it with <tt>begin{@count=40}</tt> given via <tt>-e</tt>.
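<p/>The default-value idiom can be pictured with the following Python sketch. It assumes, for illustration only, that each <tt>begin</tt> block is a function acting on one shared map; <tt>run_begin_blocks</tt>, <tt>from_command_line</tt>, and <tt>from_file</tt> are hypothetical names.

```python
# Sketch of the default-parameter idiom; oosvars stands in for Miller's @-variables.
def run_begin_blocks(blocks):
    """Run begin-block stand-ins in the order given, sharing one oosvar map."""
    oosvars = {}
    for block in blocks:
        block(oosvars)
    return oosvars

def from_command_line(oosvars):
    # Corresponds to: -e 'begin{@count=40}'
    oosvars["count"] = 40

def from_file(oosvars):
    # Corresponds to: begin{@count=is_present(@count)?@count:10}
    oosvars["count"] = oosvars["count"] if "count" in oosvars else 10

with_override = run_begin_blocks([from_command_line, from_file])
without_override = run_begin_blocks([from_file])
print(with_override["count"], without_override["count"])
```

<p/>Because the file's block only assigns the default when the variable is not yet present, the order matters: the overriding value has to be set first.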
|
|
|
|
<p/>Moreover, you can have one or more <tt>-f</tt> expressions (maybe one
|
|
function per file, for example) and one or more <tt>-e</tt> expressions on the
|
|
command line. If you mix <tt>-f</tt> and <tt>-e</tt> then the expressions are
|
|
evaluated in the order encountered. (Since the expressions are all simply
|
|
concatenated together in order, don’t forget intervening semicolons: e.g.
|
|
not <tt>mlr put -e '$x=1' -e '$y=2' ...</tt> but rather <tt>mlr put -e '$x=1;' -e
'$y=2' ...</tt>.)
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Semicolons,_commas,_newlines,_and_curly_braces"/><h2>Semicolons, commas, newlines, and curly braces</h2>
|
|
|
|
<p/>Miller uses <b>semicolons as statement separators</b>, not statement terminators. This means you can write:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
mlr put 'x=1'
|
|
mlr put 'x=1;$y=2'
|
|
mlr put 'x=1;$y=2;'
|
|
mlr put 'x=1;;;;$y=2;'
|
|
</pre>
|
|
</div>
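<p/>The separator-versus-terminator behavior can be illustrated with a deliberately naive Python sketch. The function <tt>split_statements</tt> is hypothetical and is not Miller's parser: it ignores string literals, comments, and curly braces, and only shows why empty statements between or after semicolons are harmless.

```python
def split_statements(expression):
    """Naive sketch: split on semicolons and drop empty statements.

    A real parser must also respect string literals, comments, and curly
    braces; this only illustrates separator-versus-terminator behavior.
    """
    return [part.strip() for part in expression.split(";") if part.strip()]

for source in ["x=1", "x=1;$y=2", "x=1;$y=2;", "x=1;;;;$y=2;"]:
    print(repr(source), "->", split_statements(source))
```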
|
|
<p/>
|
|
|
|
<p/>Semicolons are optional after closing curly braces (which close conditionals and loops as discussed below).
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ echo x=1,y=2 | mlr put 'while (NF < 10) { $[NF+1] = ""} $foo = "bar"'
|
|
x=1,y=2,3=,4=,5=,6=,7=,8=,9=,10=,foo=bar
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ echo x=1,y=2 | mlr put 'while (NF < 10) { $[NF+1] = ""}; $foo = "bar"'
|
|
x=1,y=2,3=,4=,5=,6=,7=,8=,9=,10=,foo=bar
|
|
</pre>
|
|
</div>
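<p/>For readers who want to trace the loop by hand, here is a Python sketch of the same logic. It assumes a record can be modeled as an insertion-ordered dictionary; <tt>rec</tt> and <tt>line</tt> are hypothetical names.

```python
# Sketch of: echo x=1,y=2 | mlr put 'while (NF < 10) { $[NF+1] = ""} $foo = "bar"'
rec = {"x": "1", "y": "2"}  # Python dicts keep insertion order, like record fields

while len(rec) < 10:             # NF is the number of fields in the record
    rec[str(len(rec) + 1)] = ""  # $[NF+1] = "": new field named by the next integer
rec["foo"] = "bar"               # runs once the loop has finished

line = ",".join(f"{key}={value}" for key, value in rec.items())
print(line)
```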
|
|
<p/>
|
|
|
|
<p/>Semicolons are required between statements even if those statements are on
|
|
separate lines. <b>Newlines</b> are for your convenience but have no syntactic
|
|
meaning: line endings do not terminate statements. For example, adjacent
|
|
assignment statements must be separated by semicolons even if those statements
|
|
are on separate lines:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
mlr put '
|
|
$x = 1
|
|
$y = 2 # Syntax error
|
|
'
|
|
|
|
mlr put '
|
|
$x = 1;
|
|
$y = 2 # This is OK
|
|
'
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/><b>Trailing commas</b> are allowed in function/subroutine definitions,
|
|
function/subroutine callsites, and map literals. This is intended for (although
|
|
not restricted to) the multi-line case:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --csvlite --from data/a.csv put '
|
|
func f(
|
|
num a,
|
|
num b,
|
|
): num {
|
|
return a**2 + b**2;
|
|
}
|
|
$* = {
|
|
"s": $a + $b,
|
|
"t": $a - $b,
|
|
"u": f(
|
|
$a,
|
|
$b,
|
|
),
|
|
"v": NR,
|
|
}
|
|
'
|
|
s,t,u,v
|
|
3,-1,5.000000,1
|
|
9,-1,41.000000,2
|
|
</pre>
|
|
</div>
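<p/>Python happens to accept trailing commas in the same three places, so the example above can be mirrored almost line for line. In this sketch the input values are an assumption: the contents of <tt>data/a.csv</tt> are not shown on this page, so the <tt>a</tt> and <tt>b</tt> values are inferred from the output above. Note also that Python prints <tt>u</tt> as an integer rather than with six decimal places.

```python
def f(
    a,
    b,
):  # trailing comma in the parameter list
    return a**2 + b**2

# Assumed input: (a, b) pairs inferred from the output shown above.
inputs = [
    {"a": 1, "b": 2},
    {"a": 4, "b": 5},
]

outputs = []
for nr, rec in enumerate(inputs, start=1):
    # Corresponds to assigning a map literal to $*: the record is replaced.
    outputs.append({
        "s": rec["a"] + rec["b"],
        "t": rec["a"] - rec["b"],
        "u": f(
            rec["a"],
            rec["b"],
        ),
        "v": nr,
    })

for out in outputs:
    print(out)
```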
|
|
<p/>
|
|
|
|
<p/>Bodies for all compound statements must be enclosed in <b>curly braces</b>, even if the body is a single statement:
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
mlr put 'if ($x == 1) $y = 2' # Syntax error
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
mlr put 'if ($x == 1) { $y = 2 }' # This is OK
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>Bodies for compound statements may be empty:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
mlr put 'if ($x == 1) { }' # This no-op is syntactically acceptable
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
</div>
|
|
<!-- ================================================================ -->
|
|
<a id="Variables"/><h1>Variables</h1>
|
|
<button style="font-weight:bold;color:maroon;border:0" padding=0 onclick="toggle_by_name('section_toggle_variables');" href="javascript:;">Toggle section visibility</button>
|
|
<div id="section_toggle_variables" style="display: block">
|
|
|
|
<p/>Miller has the following kinds of variables:
|
|
|
|
<p/> <b>Built-in variables</b> such as <tt>NR</tt>, <tt>NF</tt>,
|
|
<tt>FILENAME</tt>, <tt>PI</tt>, and <tt>E</tt>. These are all capital letters
|
|
and are read-only (although some of them change value from one record to
|
|
another).
|
|
|
|
<p/> <b>Fields of stream records</b>, accessed using the <tt>$</tt> prefix.
|
|
These refer to fields of the current data-stream record. For example, in
|
|
<tt>echo x=1,y=2 | mlr put '$z = $x + $y'</tt>, <tt>$x</tt> and <tt>$y</tt>
|
|
refer to input fields, and <tt>$z</tt> refers to a new, computed output field.
|
|
In a few contexts, presented below, you can refer to the entire record as
|
|
<tt>$*</tt>.
|
|
|
|
<p/> <b>Out-of-stream variables</b> accessed using the <tt>@</tt> prefix. These
|
|
refer to data which persist from one record to the next, including in
|
|
<tt>begin</tt> and <tt>end</tt> blocks (which execute before/after the record
|
|
stream is consumed, respectively). You use them to remember values across
|
|
records, such as sums, differences, counters, and so on. In a few contexts,
|
|
presented below, you can refer to the entire out-of-stream-variables collection
|
|
as <tt>@*</tt>.
|
|
|
|
<p/> <b>Local variables</b> are limited in scope and extent to the current
|
|
statements being executed: these include function arguments, bound variables in
|
|
for loops, and explicitly declared local variables.
|
|
|
|
<p/> <b>Keywords</b> are not variables, but since their names are reserved, you
|
|
cannot use these names for local variables.
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Built-in_variables"/><h2>Built-in variables</h2>
|
|
|
|
<p/> These are written all in capital letters, such as <tt>NR</tt>,
|
|
<tt>NF</tt>, <tt>FILENAME</tt>, and only a small, specific set of them is
|
|
defined by Miller.
|
|
|
|
<p/>Namely, Miller supports the following five built-in variables for <a
|
|
href="reference-verbs.html#filter"><tt>filter</tt></a> and <tt>put</tt>, all <tt>awk</tt>-inspired:
|
|
<tt>NF</tt>, <tt>NR</tt>, <tt>FNR</tt>, <tt>FILENUM</tt>, and
|
|
<tt>FILENAME</tt>, as well as the mathematical constants <tt>PI</tt> and
|
|
<tt>E</tt>. Lastly, the <tt>ENV</tt> hashmap allows read access to environment
|
|
variables, e.g. <tt>ENV["HOME"]</tt> or <tt>ENV["foo_".$hostname]</tt>.
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr filter 'FNR == 2' data/small*
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
|
|
1=pan,2=pan,3=1,4=0.3467901443380824,5=0.7268028627434533
|
|
a=wye,b=eks,i=10000,x=0.734806020620654365,y=0.884788571337605134
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put '$fnr = FNR' data/small*
|
|
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533,fnr=1
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797,fnr=2
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776,fnr=3
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463,fnr=4
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729,fnr=5
|
|
1=a,2=b,3=i,4=x,5=y,fnr=1
|
|
1=pan,2=pan,3=1,4=0.3467901443380824,5=0.7268028627434533,fnr=2
|
|
1=eks,2=pan,3=2,4=0.7586799647899636,5=0.5221511083334797,fnr=3
|
|
1=wye,2=wye,3=3,4=0.20460330576630303,5=0.33831852551664776,fnr=4
|
|
1=eks,2=wye,3=4,4=0.38139939387114097,5=0.13418874328430463,fnr=5
|
|
1=wye,2=pan,3=5,4=0.5732889198020006,5=0.8636244699032729,fnr=6
|
|
a=pan,b=eks,i=9999,x=0.267481232652199086,y=0.557077185510228001,fnr=1
|
|
a=wye,b=eks,i=10000,x=0.734806020620654365,y=0.884788571337605134,fnr=2
|
|
a=pan,b=wye,i=10001,x=0.870530722602517626,y=0.009854780514656930,fnr=3
|
|
a=hat,b=wye,i=10002,x=0.321507044286237609,y=0.568893318795083758,fnr=4
|
|
a=pan,b=zee,i=10003,x=0.272054845593895200,y=0.425789896597056627,fnr=5
|
|
</pre>
|
|
</div>
|
|
<p/>
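<p/> As a further sketch, the following puts <tt>FILENAME</tt>, <tt>FILENUM</tt>, <tt>NR</tt>, and <tt>FNR</tt> side by side across two inputs. The file names <tt>first.dkvp</tt> and <tt>second.dkvp</tt>, the temporary directory, and the shell variable names are made up for illustration; the expected output is what the definitions above imply, not captured output:

```shell
# Create two small DKVP files in a scratch directory (hypothetical names).
workdir=$(mktemp -d)
printf 'a=1\na=2\n' > "$workdir/first.dkvp"
printf 'a=3\n' > "$workdir/second.dkvp"

# NR counts records across all files; FNR restarts at 1 for each file.
out=$(cd "$workdir" && mlr put '
  $filename = FILENAME;
  $filenum = FILENUM;
  $nr = NR;
  $fnr = FNR;
' first.dkvp second.dkvp)
echo "$out"
rm -r "$workdir"

# Expected:
# a=1,filename=first.dkvp,filenum=1,nr=1,fnr=1
# a=2,filename=first.dkvp,filenum=1,nr=2,fnr=2
# a=3,filename=second.dkvp,filenum=2,nr=3,fnr=1
```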
|
|
|
|
<p/> The values of <tt>NF</tt>, <tt>NR</tt>, <tt>FNR</tt>, <tt>FILENUM</tt>,
|
|
and <tt>FILENAME</tt> change from one record to the next as Miller scans
|
|
through your input data stream. The mathematical constants, of course, do not
|
|
change; <tt>ENV</tt> is populated from the system environment variables at the
|
|
time Miller starts and is read-only for the remainder of program execution.
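<p/> A minimal sketch of reading an environment variable through <tt>ENV</tt>. The variable name <tt>GREETING</tt> is made up for this example and is set only for the one command; <tt>mlr -n</tt> is used so that no input is read and only the <tt>end</tt> block runs:

```shell
# GREETING is a hypothetical variable, exported only to this mlr invocation.
out=$(GREETING=hello mlr -n put 'end { print ENV["GREETING"] }')
echo "$out"
# Expected: hello
```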
|
|
|
|
<p/> Their <b>scope is global</b>: you can refer to them in any <tt>filter</tt>
|
|
or <tt>put</tt> statement. Their values are assigned by the input-record
|
|
reader:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --csv put '$nr = NR' data/a.csv
|
|
a,b,c,nr
|
|
1,2,3,1
|
|
4,5,6,2
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --csv repeat -n 3 then put '$nr = NR' data/a.csv
|
|
a,b,c,nr
|
|
1,2,3,1
|
|
1,2,3,1
|
|
1,2,3,1
|
|
4,5,6,2
|
|
4,5,6,2
|
|
4,5,6,2
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/> The <b>extent</b> is for the duration of the put/filter: in a
|
|
<tt>begin</tt> statement (which executes before the first input record is
|
|
consumed) you will find <tt>NR=1</tt> and in an <tt>end</tt> statement (which
|
|
is executed after the last input record is consumed) you will find <tt>NR</tt>
|
|
to be the total number of records ingested.
|
|
|
|
<p/> These are all <b>read-only</b> for the <tt>mlr put</tt> and <tt>mlr
|
|
filter</tt> DSLs: they may be assigned from, e.g. <tt>$nr=NR</tt>, but they may
|
|
not be assigned to: <tt>NR=100</tt> is a syntax error.
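<p/> The read-only rule can be checked from the shell. This is a sketch that assumes only that Miller exits with a non-zero status when the DSL expression fails to parse; the shell variable name <tt>nr_assign</tt> is made up, and the exact error text is not shown since it may differ between versions:

```shell
# Reading NR is fine: it is copied into a field of each record.
echo 'x=1' | mlr put '$nr = NR'
# Expected: x=1,nr=1

# Assigning to NR is rejected before any record is processed.
if mlr -n put 'NR = 100' >/dev/null 2>&1; then
  nr_assign=accepted
else
  nr_assign=rejected
fi
echo "$nr_assign"
# Expected: rejected
```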
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Field_names"/><h2>Field names</h2>
|
|
|
|
<p/>Names of fields within stream records must be specified using a <tt>$</tt>
|
|
in <tt>filter</tt> and <a href="reference-verbs.html#put"><tt>put</tt></a>
|
|
expressions, even though the dollar signs don’t appear in the data stream
|
|
itself. For integer-indexed data, this looks like <tt>awk</tt>’s
|
|
<tt>$1,$2,$3</tt>, except that Miller allows non-numeric names such as
|
|
<tt>$quantity</tt> or <tt>$hostname</tt>. Likewise, enclose string literals
|
|
in double quotes in <tt>filter</tt> expressions even though they don’t
|
|
appear in file data. In particular, <tt>mlr filter '$x=="abc"'</tt> passes
|
|
through the record <tt>x=abc</tt>.
|
|
|
|
<p/>If field names have <b>special characters</b> such as <tt>.</tt> then you
|
|
can use braces, e.g. <tt>'${field.name}'</tt>.
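<p/> For instance, here is a short sketch of the brace syntax on both sides of an assignment. The field names <tt>a.b</tt> and <tt>c.d</tt> are made up for illustration:

```shell
# Both the input field a.b and the new field c.d contain a dot,
# so each is written with ${...} rather than a bare $name.
out=$(echo 'a.b=3' | mlr put '${c.d} = 2 * ${a.b}')
echo "$out"
# Expected: a.b=3,c.d=6
```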
|
|
|
|
<p/>You may also use a <b>computed field name</b> in square brackets, e.g.
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ echo a=3,b=4 | mlr filter '$["x"] < 0.5'
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ echo s=green,t=blue,a=3,b=4 | mlr put '$[$s."_".$t] = $a * $b'
|
|
s=green,t=blue,a=3,b=4,green_blue=12
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/> The names of record fields depend on the contents of your input data stream, and their
|
|
values change from one record to the next as Miller scans through your input
|
|
data stream.
|
|
|
|
<p/> Their <b>extent</b> is limited to the current record; their <b>scope</b>
|
|
is the <tt>filter</tt> or <tt>put</tt> command in which they appear.
|
|
|
|
<p/> These are <b>read-write</b>: you can do <tt>$y=2*$x</tt>,
|
|
<tt>$x=$x+1</tt>, etc.
|
|
|
|
<p/> Records are Miller’s output: field names present in the input
|
|
stream are passed through to output (written to standard output) unless fields
|
|
are removed with <tt>cut</tt>, or records are excluded with <tt>filter</tt> or
|
|
<tt>put -q</tt>, etc. Simply assign a value to a field and it will be output.
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Out-of-stream_variables"/><h2>Out-of-stream variables</h2>
|
|
|
|
<p/> These are prefixed with an at-sign, e.g. <tt>@sum</tt>. Furthermore,
|
|
unlike built-in variables and stream-record fields, they are maintained in an
|
|
arbitrarily nested hashmap: you can do <tt>@sum += $quantity</tt>, or
|
|
<tt>@sum[$color] += $quantity</tt>, or <tt>@sum[$color][$shape] +=
|
|
$quantity</tt>. The keys for the multi-level hashmap can be any expression which
|
|
evaluates to string or integer: e.g. <tt>@sum[NR] = $a + $b</tt>,
|
|
<tt>@sum[$a."-".$b] = $x</tt>, etc.
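<p/> As a small sketch of the two-level case, the following sums <tt>quantity</tt> by <tt>color</tt> and then <tt>shape</tt>. The three input records are made up for illustration, and the expected output is derived by hand from them:

```shell
# Accumulate into a two-level out-of-stream map keyed by color, then shape;
# emit splits the two levels back out into fields named color and shape.
out=$(printf 'color=red,shape=square,quantity=2\ncolor=red,shape=circle,quantity=3\ncolor=red,shape=square,quantity=5\n' |
  mlr put -q '
    @sum[$color][$shape] += $quantity;
    end { emit @sum, "color", "shape" }
  ')
echo "$out"
# Expected:
# color=red,shape=square,sum=7
# color=red,shape=circle,sum=3
```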
|
|
|
|
<p/> Their names and their values are entirely under your control; they change
|
|
only when you assign to them.
|
|
|
|
<p/> Just as for field names in stream records, if you want to define out-of-stream variables
|
|
with <b>special characters</b> such as <tt>.</tt> then you can use braces, e.g. <tt>'@{variable.name}["index"]'</tt>.
|
|
|
|
<p/>You may use a <b>computed key</b> in square brackets, e.g.
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ echo s=green,t=blue,a=3,b=4 | mlr put -q '@[$s."_".$t] = $a * $b; emit all'
|
|
green_blue=12
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/> Out-of-stream variables are <b>scoped</b> to the <tt>put</tt> command in
|
|
which they appear. In particular, if you have two or more <tt>put</tt>
|
|
commands separated by <tt>then</tt>, each put will have its own set of
|
|
out-of-stream variables:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ cat data/a.dkvp
|
|
a=1,b=2,c=3
|
|
a=4,b=5,c=6
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put '@sum += $a; end {emit @sum}' then put 'is_present($a) {$a=10*$a; @sum += $a}; end {emit @sum}' data/a.dkvp
|
|
a=10,b=2,c=3
|
|
a=40,b=5,c=6
|
|
sum=5
|
|
sum=50
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/> Out-of-stream variables’ <b>extent</b> is from the start to the end of the record stream,
|
|
i.e. every time the <tt>put</tt> or <tt>filter</tt> statement referring to them is executed.
|
|
|
|
<p/> Out-of-stream variables are <b>read-write</b>: you can do <tt>$sum=@sum</tt>, <tt>@sum=$sum</tt>,
|
|
etc.
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Indexed_out-of-stream_variables"/><h2>Indexed out-of-stream variables</h2>
|
|
|
|
<p/>Using an index on the <tt>@count</tt> and <tt>@sum</tt> variables, we get the benefit of the
|
|
<tt>-g</tt> (group-by) option which <tt>mlr stats1</tt> and various other Miller commands have:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '
|
|
@x_count[$a] += 1;
|
|
@x_sum[$a] += $x;
|
|
end {
|
|
emit @x_count, "a";
|
|
emit @x_sum, "a";
|
|
}
|
|
' ../data/small
|
|
a=pan,x_count=2
|
|
a=eks,x_count=3
|
|
a=wye,x_count=2
|
|
a=zee,x_count=2
|
|
a=hat,x_count=1
|
|
a=pan,x_sum=0.849416
|
|
a=eks,x_sum=1.751863
|
|
a=wye,x_sum=0.777892
|
|
a=zee,x_sum=1.125680
|
|
a=hat,x_sum=0.031442
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr stats1 -a count,sum -f x -g a ../data/small
|
|
a=pan,x_count=2,x_sum=0.849416
|
|
a=eks,x_count=3,x_sum=1.751863
|
|
a=wye,x_count=2,x_sum=0.777892
|
|
a=zee,x_count=2,x_sum=1.125680
|
|
a=hat,x_count=1,x_sum=0.031442
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>Indices can be arbitrarily deep; here there are two of them:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from data/medium put -q '
|
|
@x_count[$a][$b] += 1;
|
|
@x_sum[$a][$b] += $x;
|
|
end {
|
|
emit (@x_count, @x_sum), "a", "b";
|
|
}
|
|
'
|
|
a=pan,b=pan,x_count=427,x_sum=219.185129
|
|
a=pan,b=wye,x_count=395,x_sum=198.432931
|
|
a=pan,b=eks,x_count=429,x_sum=216.075228
|
|
a=pan,b=hat,x_count=417,x_sum=205.222776
|
|
a=pan,b=zee,x_count=413,x_sum=205.097518
|
|
a=eks,b=pan,x_count=371,x_sum=179.963030
|
|
a=eks,b=wye,x_count=407,x_sum=196.945286
|
|
a=eks,b=zee,x_count=357,x_sum=176.880365
|
|
a=eks,b=eks,x_count=413,x_sum=215.916097
|
|
a=eks,b=hat,x_count=417,x_sum=208.783171
|
|
a=wye,b=wye,x_count=377,x_sum=185.295850
|
|
a=wye,b=pan,x_count=392,x_sum=195.847900
|
|
a=wye,b=hat,x_count=426,x_sum=212.033183
|
|
a=wye,b=zee,x_count=385,x_sum=194.774048
|
|
a=wye,b=eks,x_count=386,x_sum=204.812961
|
|
a=zee,b=pan,x_count=389,x_sum=202.213804
|
|
a=zee,b=wye,x_count=455,x_sum=233.991394
|
|
a=zee,b=eks,x_count=391,x_sum=190.961778
|
|
a=zee,b=zee,x_count=403,x_sum=206.640635
|
|
a=zee,b=hat,x_count=409,x_sum=191.300006
|
|
a=hat,b=wye,x_count=423,x_sum=208.883010
|
|
a=hat,b=zee,x_count=385,x_sum=196.349450
|
|
a=hat,b=eks,x_count=389,x_sum=189.006793
|
|
a=hat,b=hat,x_count=381,x_sum=182.853532
|
|
a=hat,b=pan,x_count=363,x_sum=168.553807
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>The idea is that <tt>stats1</tt>, and other Miller verbs, encapsulate
|
|
frequently-used patterns with a minimum of keystroking (and run a little
|
|
faster), whereas using out-of-stream variables you have more flexibility and
|
|
control in what you do.
|
|
|
|
<p/>Begin/end blocks can be mixed with pattern/action blocks. For example:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put '
|
|
begin {
|
|
@num_total = 0;
|
|
@num_positive = 0;
|
|
};
|
|
@num_total += 1;
|
|
$x > 0.0 {
|
|
@num_positive += 1;
|
|
$y = log10($x); $z = sqrt($y)
|
|
};
|
|
end {
|
|
emitf @num_total, @num_positive
|
|
}
|
|
' data/put-gating-example-1.dkvp
|
|
x=-1
|
|
x=0
|
|
x=1,y=0.000000,z=0.000000
|
|
x=2,y=0.301030,z=0.548662
|
|
x=3,y=0.477121,z=0.690740
|
|
num_total=5,num_positive=3
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Local_variables"/><h2>Local variables</h2>
|
|
|
|
<p/>Local variables are similar to out-of-stream variables, except that
|
|
their extent is limited to the expressions in which they appear (and their
|
|
basenames can’t be computed using square brackets).
|
|
There are three kinds of local variables: <b>arguments</b> to
|
|
functions/subroutines, <b>variables bound within for-loops</b>, and
|
|
<b>locals</b> defined within control blocks. They may be untyped using
|
|
<tt>var</tt>, or typed using <tt>num</tt>, <tt>int</tt>, <tt>float</tt>,
|
|
<tt>str</tt>, <tt>bool</tt>, and <tt>map</tt>.
|
|
|
|
<p/>For example:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ # Here I'm using a specified random-number seed so this example always
|
|
# produces the same output for this web document: in everyday practice we
|
|
# would leave off the --seed 12345 part.
|
|
mlr --seed 12345 seqgen --start 1 --stop 10 then put '
|
|
func f(a, b) { # function arguments a and b
|
|
r = 0.0; # local r scoped to the function
|
|
for (int i = 0; i < 6; i += 1) { # local i scoped to the for-loop
|
|
num u = urand(); # local u scoped to the for-loop
|
|
r += u; # updates r from the enclosing scope
|
|
}
|
|
r /= 6;
|
|
return a + (b - a) * r;
|
|
}
|
|
num o = f(10, 20); # local to the top-level scope
|
|
$o = o;
|
|
'
|
|
i=1,o=14.662901
|
|
i=2,o=17.881983
|
|
i=3,o=14.586560
|
|
i=4,o=16.402409
|
|
i=5,o=16.336598
|
|
i=6,o=14.622701
|
|
i=7,o=15.983753
|
|
i=8,o=13.852177
|
|
i=9,o=15.472899
|
|
i=10,o=15.643912
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>Things which are completely unsurprising, resembling many other languages:
|
|
|
|
<ul>
|
|
|
|
<li/> Parameter names are bound to their arguments but can be reassigned, e.g.
|
|
if there is a parameter named <tt>a</tt> then you can reassign the value of
|
|
<tt>a</tt> to be something else within the function if you like.
|
|
|
|
<li/> However, you cannot redeclare the <i>type</i> of an argument or a local:
|
|
<tt>var a=1; var a=2</tt> is an error but
|
|
<tt>var a=1; a=2</tt> is OK.
|
|
|
|
<li/> All argument-passing is positional rather than by name; arguments are
|
|
passed by value, not by reference. (This is also true for map-valued variables:
|
|
they are not, and cannot be, passed by reference.)
|
|
|
|
<li/> You can define locals (using <tt>var</tt>, <tt>num</tt>, etc.) at any
|
|
scope (if-statements, else-statements, while-loops, for-loops, or the top-level
|
|
scope), and nested scopes will have access (more details on scope in the next
|
|
section). If you define a local variable with the same name inside an inner
|
|
scope, then a new variable is created with the narrower scope.
|
|
|
|
<li/> If you assign to a local variable for the first time in a scope without
|
|
declaring it as <tt>var</tt>, <tt>num</tt>, etc. then: if it exists in an outer
|
|
scope, that outer-scope variable will be updated; if not, it will be defined in
|
|
the current scope as if <tt>var</tt> had been used. (See also <a
|
|
href="#Type-checking">here</a> for an example.) I recommend always declaring
|
|
variables explicitly to make the intended scoping clear.
|
|
|
|
<li/> Functions and subroutines never have access to locals from their caller
|
|
(unless passed by value as arguments).
|
|
|
|
</ul>
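<p/> The pass-by-value rule for maps in the list above can be sketched as follows. The function name <tt>bump</tt> and the local names <tt>original</tt> and <tt>modified</tt> are made up for illustration; the function changes its own copy of the map and the caller's map is left as it was:

```shell
out=$(mlr -n put '
  func bump(map m): map {
    m["x"] = 999;          # changes the function-local copy only
    return m;
  }
  end {
    map original = {"x": 1};
    map modified = bump(original);
    print "original x=" . original["x"];
    print "modified x=" . modified["x"];
  }
')
echo "$out"
# Expected:
# original x=1
# modified x=999
```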
|
|
|
|
<p/>Things which are perhaps surprising compared to other languages:
|
|
|
|
<ul>
|
|
|
|
<li/> Type declarations using <tt>var</tt>, or typed using <tt>num</tt>,
|
|
<tt>int</tt>, <tt>float</tt>, <tt>str</tt>, and <tt>bool</tt> are necessary to
|
|
declare local variables. Function arguments and variables bound in for-loops
|
|
over stream records and out-of-stream variables are <i>implicitly</i> declared
|
|
using <tt>var</tt>. (Some examples are shown below.)
|
|
|
|
<li/> Type-checking is done at assignment time. For example, <tt>float f =
|
|
0</tt> is an error (since <tt>0</tt> is an integer), as is <tt>float f = 0.0; f
|
|
= 1</tt>. For this reason I prefer to use <tt>num</tt> over <tt>float</tt> in
|
|
most contexts since <tt>num</tt> encompasses integer and floating-point values.
|
|
More information about type-checking is <a href="#Type-checking">here</a>.
|
|
|
|
<li/> Bound variables in for-loops over stream records and out-of-stream
|
|
variables are implicitly local to that block. E.g. in
|
|
<tt>for (k, v in $*) { ... }</tt>
|
|
<tt>for ((k1, k2), v in @*) { ... }</tt>
|
|
if there are <tt>k</tt>, <tt>v</tt>, etc. in the enclosing scope then those
|
|
will be masked by the loop-local bound variables in the loop, and moreover
|
|
the values of the loop-local bound variables are not available after the
|
|
end of the loop.
|
|
|
|
<li/> For C-style triple-for loops, if a for-loop variable is defined using
|
|
<tt>var</tt>, <tt>int</tt>, etc. then it is scoped to that for-loop. E.g.
|
|
<tt>for (var i = 0; i < 10; i += 1) { ... }</tt> and <tt>for (int i = 0; i < 10; i
|
|
+= 1) { ... }</tt>. (This is unsurprising.) If there is no typedecl and an
|
|
outer-scope variable of that name exists, then it is used. (This is also
|
|
unsurprising.) But if there is no outer-scope variable of that name then the
|
|
variable is scoped to the for-loop only.
|
|
|
|
</ul>
|
|
|
|
<p/> The following example demonstrates the scope rules:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ cat data/scope-example.mlr
|
|
func f(a) { # argument is local to the function
|
|
var b = 100; # local to the function
|
|
c = 100; # local to the function; does not overwrite outer c
|
|
return a + 1;
|
|
}
|
|
var a = 10; # local at top level
|
|
var b = 20; # local at top level
|
|
c = 30; # local at top level; there is no more-outer-scope c
|
|
if (NR == 3) {
|
|
var a = 40; # scoped to the if-statement; doesn't overwrite outer a
|
|
b = 50; # not scoped to the if-statement; overwrites outer b
|
|
c = 60; # not scoped to the if-statement; overwrites outer c
|
|
d = 70; # there is no outer d so a local d is created here
|
|
|
|
$inner_a = a;
|
|
$inner_b = b;
|
|
$inner_c = c;
|
|
$inner_d = d;
|
|
}
|
|
$outer_a = a;
|
|
$outer_b = b;
|
|
$outer_c = c;
|
|
$outer_d = d; # there is no outer d defined so no assignment happens
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ cat data/scope-example.dat
|
|
n=1,x=123
|
|
n=2,x=456
|
|
n=3,x=789
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --oxtab --from data/scope-example.dat put -f data/scope-example.mlr
|
|
n 1
|
|
x 123
|
|
outer_a 10
|
|
outer_b 20
|
|
outer_c 30
|
|
|
|
n 2
|
|
x 456
|
|
outer_a 10
|
|
outer_b 20
|
|
outer_c 30
|
|
|
|
n 3
|
|
x 789
|
|
inner_a 40
|
|
inner_b 50
|
|
inner_c 60
|
|
inner_d 70
|
|
outer_a 10
|
|
outer_b 50
|
|
outer_c 60
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/> And this example demonstrates the type-declaration rules:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ cat data/type-decl-example.mlr
|
|
subr s(a, str b, int c) { # a is implicitly var (untyped).
|
|
# b is explicitly str.
|
|
# c is explicitly int.
|
|
# The type-checking is done at the callsite
|
|
# when arguments are bound to parameters.
|
|
#
|
|
var b = 100; # error # Re-declaration in the same scope is disallowed.
|
|
int n = 10; # Declaration of variable local to the subroutine.
|
|
n = 20; # Assignment is OK.
|
|
int n = 30; # error # Re-declaration in the same scope is disallowed.
|
|
str n = "abc"; # error # Re-declaration in the same scope is disallowed.
|
|
#
|
|
float f1 = 1; # error # 1 is an int, not a float.
|
|
float f2 = 2.0; # 2.0 is a float.
|
|
num f3 = 3; # 3 is a num.
|
|
num f4 = 4.0; # 4.0 is a num.
|
|
} #
|
|
#
|
|
call s(1, 2, 3); # Type-assertion '3 is int' is done here at the callsite.
|
|
#
|
|
k = "def"; # Top-level variable k.
|
|
#
|
|
for (str k, v in $*) { # k and v are bound here, masking outer k.
|
|
print k . ":" . v; # k is explicitly str; v is implicitly var.
|
|
} #
|
|
#
|
|
print "k is".k; # k at this scope level is still "def".
|
|
print "v is".v; # v is undefined in this scope.
|
|
#
|
|
i = -1; #
|
|
for (i = 1, int j = 2; i <= 10; i += 1, j *= 2) { # C-style triple-for variables use enclosing scope, unless
|
|
# declared local: i is outer, j is local to the loop.
|
|
print "inner i =" . i; #
|
|
print "inner j =" . j; #
|
|
} #
|
|
print "outer i =" . i; # i has been modified by the loop.
|
|
print "outer j =" . j; # j is undefined in this scope.
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Map_literals"/><h2>Map literals</h2>
|
|
|
|
<p/>Miller’s <tt>put</tt>/<tt>filter</tt> DSL has four kinds of hashmaps.
|
|
<b>Stream records</b> are (single-level) maps from name to value.
|
|
<b>Out-of-stream variables</b> and <b>local variables</b> can also be maps,
|
|
although they can be multi-level hashmaps (e.g. <tt>@sum[$x][$y]</tt>). The
|
|
fourth kind is <b>map literals</b>. These cannot be on the left-hand side of
|
|
assignment expressions. Syntactically they look like JSON, although Miller
|
|
allows string and integer keys in its map literals while JSON allows only
|
|
string keys (e.g. <tt>"3"</tt> rather than <tt>3</tt>).
|
|
|
|
<p/> For example, the following swaps the input stream’s <tt>a</tt> and
|
|
<tt>i</tt> fields, modifies <tt>y</tt>, and drops the rest:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --opprint put '
|
|
$* = {
|
|
"a": $i,
|
|
"i": $a,
|
|
"y": $y * 10,
|
|
}
|
|
' data/small
|
|
a i y
|
|
1 pan 7.268029
|
|
2 eks 5.221511
|
|
3 wye 3.383185
|
|
4 eks 1.341887
|
|
5 wye 8.636245
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>Likewise, you can assign map literals to out-of-stream variables or local variables;
|
|
pass them as arguments to user-defined functions, return them from functions, and so on:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from ../c/s put '
|
|
func f(map m): map {
|
|
m["x"] *= 200;
|
|
return m;
|
|
}
|
|
$* = f({"a": $a, "x": $x});
|
|
'
|
|
a=pan,x=69.358029
|
|
a=eks,x=151.735993
|
|
a=wye,x=40.920661
|
|
a=eks,x=76.279879
|
|
a=wye,x=114.657784
|
|
a=zee,x=105.425232
|
|
a=eks,x=122.356812
|
|
a=zee,x=119.710802
|
|
a=hat,x=6.288375
|
|
a=pan,x=100.525201
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/> Like out-of-stream and local variables, map literals can be multi-level:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from data/small put -q '
|
|
begin {
|
|
@o = {
|
|
"nrec": 0,
|
|
"nkey": {"numeric":0, "non-numeric":0},
|
|
};
|
|
}
|
|
@o["nrec"] += 1;
|
|
for (k, v in $*) {
|
|
if (is_numeric(v)) {
|
|
@o["nkey"]["numeric"] += 1;
|
|
} else {
|
|
@o["nkey"]["non-numeric"] += 1;
|
|
}
|
|
}
|
|
end {
|
|
dump @o;
|
|
}
|
|
'
|
|
{
|
|
"nrec": 5,
|
|
"nkey": {
|
|
"numeric": 15,
|
|
"non-numeric": 10
|
|
}
|
|
}
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>By default, map-valued expressions are dumped using JSON formatting. If you
|
|
use <tt>dump</tt> to print a hashmap with integer keys and you don’t want
|
|
them double-quoted (JSON-style) then you can use <tt>mlr put
|
|
--jknquoteint</tt>. See also <tt>mlr put --help</tt>.
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Type-checking"/><h2>Type-checking</h2>
|
|
|
|
<p/> Miller’s <tt>put</tt>/<tt>filter</tt> DSLs support two optional
|
|
kinds of type-checking. One is inline <b>type-tests</b> and
|
|
<b>type-assertions</b> within expressions. The other is <b>type
|
|
declarations</b> for assignments to local variables, binding of arguments to
|
|
user-defined functions, and return values from user-defined functions. These
|
|
are discussed in the following subsections.
|
|
|
|
<p/> Use of type-checking is entirely up to you: omit it if you want
|
|
flexibility with heterogeneous data; use it if you want to help catch
|
|
misspellings in your DSL code or unexpected irregularities in your input data.
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Type-test_and_type-assertion_expressions"/><h3>Type-test and type-assertion expressions</h3>
|
|
|
|
<p/> The following <tt>is...</tt> functions take a value and return a boolean
|
|
indicating whether the argument is of the indicated type. The
|
|
<tt>asserting_...</tt> functions return their argument if it is of the specified
|
|
type, and cause a fatal error otherwise:
|
|
|
|
<table>
|
|
<tr>
|
|
<td>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr -F | grep ^is
|
|
is_absent
|
|
is_bool
|
|
is_boolean
|
|
is_empty
|
|
is_empty_map
|
|
is_float
|
|
is_int
|
|
is_map
|
|
is_nonempty_map
|
|
is_not_empty
|
|
is_not_map
|
|
is_not_null
|
|
is_null
|
|
is_numeric
|
|
is_present
|
|
is_string
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
</td>
|
|
<td>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr -F | grep ^assert
|
|
asserting_absent
|
|
asserting_bool
|
|
asserting_boolean
|
|
asserting_empty
|
|
asserting_empty_map
|
|
asserting_float
|
|
asserting_int
|
|
asserting_map
|
|
asserting_nonempty_map
|
|
asserting_not_empty
|
|
asserting_not_map
|
|
asserting_not_null
|
|
asserting_null
|
|
asserting_numeric
|
|
asserting_present
|
|
asserting_string
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
</td>
|
|
</tr>
|
|
</table>
|
|
|
|
<p/> Please see the <a href="cookbook.html#Data-cleaning_examples">Cookbook part 1</a> for examples
|
|
of how to use these.
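<p/> As a quick sketch using functions from the two lists above: the type-tests produce booleans, while a type-assertion either passes its argument through or stops Miller. The input records and the shell variable name <tt>assert_status</tt> are made up for illustration, and the error message itself is not shown since its wording may vary:

```shell
# Type-tests return booleans; $z is not in the input, so is_present is false.
out=$(echo 'x=3,y=abc' | mlr put '
  $x_is_int = is_int($x);
  $y_is_int = is_int($y);
  $z_present = is_present($z);
')
echo "$out"
# Expected: x=3,y=abc,x_is_int=true,y_is_int=false,z_present=false

# A type-assertion returns its argument when the type matches ...
echo 'x=3' | mlr put '$y = asserting_int($x)'
# Expected: x=3,y=3

# ... and is fatal otherwise, so mlr exits with a non-zero status.
if echo 'x=abc' | mlr put '$y = asserting_int($x)' >/dev/null 2>&1; then
  assert_status=passed
else
  assert_status=failed
fi
echo "$assert_status"
# Expected: failed
```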
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Type-declarations_for_local_variables,_function_parameter,_and_function_return_values"/><h3>Type-declarations for local variables, function parameters, and function return values</h3>
|
|
|
|
<p/> Local variables can be defined either untyped as in <tt>x = 1</tt>, or
|
|
typed as in <tt>int x = 1</tt>. Types include <b>var</b> (explicitly untyped),
|
|
<b>int</b>, <b>float</b>, <b>num</b> (int or float), <b>str</b>, <b>bool</b>,
|
|
and <b>map</b>. These optional type declarations are enforced at the time
|
|
values are assigned to variables: whether at the initial value assignment as in
|
|
<tt>int x = 1</tt> or in any subsequent assignments to the same variable
|
|
farther down in the scope.
|
|
|
|
<p/> The reason for <tt>num</tt> is that <tt>int</tt> and <tt>float</tt> typedecls are very precise:
|
|
|
|
<div class="pokipanel">
|
|
<pre>
|
|
float a = 0; # Runtime error since 0 is int not float
|
|
int b = 1.0; # Runtime error since 1.0 is float not int
|
|
num c = 0; # OK
|
|
num d = 1.0; # OK
|
|
</pre>
|
|
</div>
|
|
|
|
<p/> A suggestion is to use <tt>num</tt> for general use when you want numeric
|
|
content, and use <tt>int</tt> when you genuinely want integer-only values, e.g.
|
|
in loop indices or map keys (since Miller map keys can only be strings or
|
|
ints).
|
|
|
|
<p/> The <tt>var</tt> type declaration indicates no type restrictions, e.g.
|
|
<tt>var x = 1</tt> has the same type restrictions on <tt>x</tt> as <tt>x =
|
|
1</tt>. The difference is in intentional shadowing: if you have <tt>x = 1</tt>
|
|
in outer scope and <tt>x = 2</tt> in inner scope (e.g. within a for-loop or an
|
|
if-statement) then outer-scope <tt>x</tt> has value 2 after the second
|
|
assignment. But if you have <tt>var x = 2</tt> in the inner scope, then you
|
|
are declaring a variable scoped to the inner block. For example:
|
|
|
|
<div class="pokipanel">
|
|
<pre>
|
|
x = 1;
|
|
if (NR == 4) {
|
|
x = 2; # Refers to outer-scope x: value changes from 1 to 2.
|
|
}
|
|
print x; # Value of x is now two
|
|
</pre>
|
|
</div>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
x = 1;
|
|
if (NR == 4) {
|
|
var x = 2; # Defines a new inner-scope x with value 2
|
|
}
|
|
print x; # Value of this x is still 1
|
|
</pre>
|
|
</div>
|
|
|
|
<p/> Likewise function arguments can optionally be typed, with type enforced
|
|
when the function is called:
|
|
|
|
<div class="pokipanel">
|
|
<pre>
|
|
func f(map m, int i) {
|
|
...
|
|
}
|
|
$a = f({1:2, 3:4}, 5); # OK
|
|
$b = f({1:2, 3:4}, "abc"); # Runtime error
|
|
$c = f({1:2, 3:4}, $x); # Runtime error for records with non-integer field named x
|
|
</pre>
|
|
</div>
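<p/> A runnable sketch of the same idea, with a complete function body. The function name <tt>increment</tt> and the shell variable names are made up for illustration; the second call is expected to fail at run time because <tt>abc</tt> is not an integer, and only the exit status is checked since the error wording may vary:

```shell
# The DSL text is kept in a shell variable so both runs use the same code.
dsl='
  func increment(int i): int {
    return i + 1;
  }
  $y = increment($x);
'

# An integer-valued field satisfies the int parameter declaration.
out=$(echo 'x=5' | mlr put "$dsl")
echo "$out"
# Expected: x=5,y=6

# A string-valued field does not, so the call is a runtime error.
if echo 'x=abc' | mlr put "$dsl" >/dev/null 2>&1; then
  typed_call=accepted
else
  typed_call=rejected
fi
echo "$typed_call"
# Expected: rejected
```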
|
|
|
|
<p/> Thirdly, function return values can be type-checked at the point of
|
|
<tt>return</tt> using <tt>:</tt> and a typedecl after the parameter list:
|
|
|
|
<div class="pokipanel">
|
|
<pre>
|
|
func f(map m, int i): bool {
|
|
...
|
|
...
|
|
if (...) {
|
|
return "false"; # Runtime error if this branch is taken
|
|
}
|
|
...
|
|
...
|
|
if (...) {
|
|
return retval; # Runtime error if this function doesn't have an in-scope
|
|
# boolean-valued variable named retval
|
|
}
|
|
...
|
|
...
|
|
# In Miller if your functions don't explicitly return a value, they return absent-null.
|
|
# So it would also be a runtime error on reaching the end of this function without
|
|
# an explicit return statement.
|
|
}
|
|
</pre>
|
|
</div>
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Null_data:_empty_and_absent"/><h2>Null data: empty and absent</h2>
|
|
|
|
<p/> Please see
|
|
<a href="reference.html#Null_data:_empty_and_absent">here</a>.
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Aggregate_variable_assignments"/><h2>Aggregate variable assignments</h2>
|
|
|
|
<p/>There are three remaining kinds of variable assignment using out-of-stream
|
|
variables, the last two of which use the <tt>$*</tt> syntax:
|
|
<ul>
|
|
<li/> Recursive copy of out-of-stream variables
|
|
<li/> Out-of-stream variable assigned to full stream record
|
|
<li/> Full stream record assigned to an out-of-stream variable
|
|
</ul>
|
|
|
|
<p/> Example recursive copy of out-of-stream variables:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --opprint put -q '@v["sum"] += $x; @v["count"] += 1; end{dump; @w = @v; dump}' data/small
|
|
{
|
|
"v": {
|
|
"sum": 2.264762,
|
|
"count": 5
|
|
}
|
|
}
|
|
{
|
|
"v": {
|
|
"sum": 2.264762,
|
|
"count": 5
|
|
},
|
|
"w": {
|
|
"sum": 2.264762,
|
|
"count": 5
|
|
}
|
|
}
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>Example of out-of-stream variable assigned to full stream record, where the 2nd record is stashed, and the 4th record is overwritten with that:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put 'NR == 2 {@keep = $*}; NR == 4 {$* = @keep}' data/small
|
|
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>Example of full stream record assigned to an out-of-stream variable, finding
|
|
the record for which the <tt>x</tt> field has the largest value in the input
|
|
stream:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ cat data/small
|
|
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --opprint put -q 'is_null(@xmax) || $x > @xmax {@xmax=$x; @recmax=$*}; end {emit @recmax}' data/small
|
|
a b i x y
|
|
eks pan 2 0.7586799647899636 0.5221511083334797
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Keywords_for_filter_and_put"/><h2>Keywords for filter and put</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --help-all-keywords
|
|
all: used in "emit", "emitp", and "unset" as a synonym for @*
|
|
|
|
begin: defines a block of statements to be executed before input records
|
|
are ingested. The body statements must be wrapped in curly braces.
|
|
Example: 'begin { @count = 0 }'
|
|
|
|
bool: declares a boolean local variable in the current curly-braced scope.
|
|
Type-checking happens at assignment: 'bool b = 1' is an error.
|
|
|
|
break: causes execution to continue after the body of the current
|
|
for/while/do-while loop.
|
|
|
|
call: used for invoking a user-defined subroutine.
|
|
Example: 'subr s(k,v) { print k . " is " . v} call s("a", $a)'
|
|
|
|
continue: causes execution to skip the remaining statements in the body of
|
|
the current for/while/do-while loop. For-loop increments are still applied.
|
|
|
|
do: with "while", introduces a do-while loop. The body statements must be wrapped
|
|
in curly braces.
|
|
|
|
dump: prints all currently defined out-of-stream variables immediately
|
|
to stdout as JSON.
|
|
|
|
With >, >>, or |, the data do not become part of the output record stream but
|
|
are instead redirected.
|
|
|
|
The > and >> are for write and append, as in the shell, but (as with awk) the
|
|
file-overwrite for > is on first write, not per record. The | is for piping to
|
|
a process which will process the data. There will be one open file for each
|
|
distinct file name (for > and >>) or one subordinate process for each distinct
|
|
value of the piped-to command (for |). Output-formatting flags are taken from
|
|
the main command line.
|
|
|
|
Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump }'
|
|
Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump > "mytap.dat"}'
|
|
Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump >> "mytap.dat"}'
|
|
Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump | "jq .[]"}'
|
|
|
|
edump: prints all currently defined out-of-stream variables immediately
|
|
to stderr as JSON.
|
|
|
|
Example: mlr --from f.dat put -q '@v[NR]=$*; end { edump }'
|
|
|
|
elif: the way Miller spells "else if". The body statements must be wrapped
|
|
in curly braces.
|
|
|
|
else: terminates an if/elif/elif chain. The body statements must be wrapped
|
|
in curly braces.
|
|
|
|
emit: inserts an out-of-stream variable into the output record stream. Hashmap
|
|
indices present in the data but not slotted by emit arguments are not output.
|
|
|
|
With >, >>, or |, the data do not become part of the output record stream but
|
|
are instead redirected.
|
|
|
|
The > and >> are for write and append, as in the shell, but (as with awk) the
|
|
file-overwrite for > is on first write, not per record. The | is for piping to
|
|
a process which will process the data. There will be one open file for each
|
|
distinct file name (for > and >>) or one subordinate process for each distinct
|
|
value of the piped-to command (for |). Output-formatting flags are taken from
|
|
the main command line.
|
|
|
|
You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
|
|
etc., to control the format of the output if the output is redirected. See also mlr -h.
|
|
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @sums'
|
|
Example: mlr --from f.dat put --ojson '@sums[$a][$b]+=$x; emit > "tap-".$a.$b.".dat", @sums'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @sums, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit > "mytap.dat", @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit >> "mytap.dat", @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "gzip > mytap.dat.gz", @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit > stderr, @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "grep somepattern", @*, "index1", "index2"'
|
|
|
|
Please see http://johnkerl.org/miller/doc for more information.
|
|
|
|
emitf: inserts non-indexed out-of-stream variable(s) side-by-side into the
|
|
output record stream.
|
|
|
|
With >, >>, or |, the data do not become part of the output record stream but
|
|
are instead redirected.
|
|
|
|
The > and >> are for write and append, as in the shell, but (as with awk) the
|
|
file-overwrite for > is on first write, not per record. The | is for piping to
|
|
a process which will process the data. There will be one open file for each
|
|
distinct file name (for > and >>) or one subordinate process for each distinct
|
|
value of the piped-to command (for |). Output-formatting flags are taken from
|
|
the main command line.
|
|
|
|
You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
|
|
etc., to control the format of the output if the output is redirected. See also mlr -h.
|
|
|
|
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf @a'
|
|
Example: mlr --from f.dat put --oxtab '@a=$i;@b+=$x;@c+=$y; emitf > "tap-".$i.".dat", @a'
|
|
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf @a, @b, @c'
|
|
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf > "mytap.dat", @a, @b, @c'
|
|
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf >> "mytap.dat", @a, @b, @c'
|
|
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf > stderr, @a, @b, @c'
|
|
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern", @a, @b, @c'
|
|
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern > mytap.dat", @a, @b, @c'
|
|
|
|
Please see http://johnkerl.org/miller/doc for more information.
|
|
|
|
emitp: inserts an out-of-stream variable into the output record stream.
|
|
Hashmap indices present in the data but not slotted by emitp arguments are
|
|
output concatenated with ":".
|
|
|
|
With >, >>, or |, the data do not become part of the output record stream but
|
|
are instead redirected.
|
|
|
|
The > and >> are for write and append, as in the shell, but (as with awk) the
|
|
file-overwrite for > is on first write, not per record. The | is for piping to
|
|
a process which will process the data. There will be one open file for each
|
|
distinct file name (for > and >>) or one subordinate process for each distinct
|
|
value of the piped-to command (for |). Output-formatting flags are taken from
|
|
the main command line.
|
|
|
|
You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
|
|
etc., to control the format of the output if the output is redirected. See also mlr -h.
|
|
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @sums'
|
|
Example: mlr --from f.dat put --opprint '@sums[$a][$b]+=$x; emitp > "tap-".$a.$b.".dat", @sums'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @sums, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp > "mytap.dat", @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp >> "mytap.dat", @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "gzip > mytap.dat.gz", @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp > stderr, @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "grep somepattern", @*, "index1", "index2"'
|
|
|
|
Please see http://johnkerl.org/miller/doc for more information.
|
|
|
|
end: defines a block of statements to be executed after input records
|
|
are ingested. The body statements must be wrapped in curly braces.
|
|
Example: 'end { emit @count }'
|
|
Example: 'end { eprint "Final count is " . @count }'
|
|
|
|
eprint: prints expression immediately to stderr.
|
|
Example: mlr --from f.dat put -q 'eprint "The sum of x and y is ".($x+$y)'
|
|
Example: mlr --from f.dat put -q 'for (k, v in $*) { eprint k . " => " . v }'
|
|
Example: mlr --from f.dat put '(NR % 1000 == 0) { eprint "Checkpoint ".NR}'
|
|
|
|
eprintn: prints expression immediately to stderr, without trailing newline.
|
|
Example: mlr --from f.dat put -q 'eprintn "The sum of x and y is ".($x+$y); eprint ""'
|
|
|
|
false: the boolean literal value.
|
|
|
|
filter: includes/excludes the record in the output record stream.
|
|
|
|
Example: mlr --from f.dat put 'filter (NR == 2 || $x > 5.4)'
|
|
|
|
Instead of put with 'filter false' you can simply use put -q. The following
|
|
uses the input record to accumulate data but only prints the running sum
|
|
without printing the input record:
|
|
|
|
Example: mlr --from f.dat put -q '@running_sum += $x * $y; emit @running_sum'
|
|
|
|
float: declares a floating-point local variable in the current curly-braced scope.
|
|
Type-checking happens at assignment: 'float x = 0' is an error.
|
|
|
|
for: defines a for-loop using one of three styles. The body statements must
|
|
be wrapped in curly braces.
|
|
For-loop over stream record:
|
|
Example: 'for (k, v in $*) { ... }'
|
|
For-loop over out-of-stream variables:
|
|
Example: 'for (k, v in @counts) { ... }'
|
|
Example: 'for ((k1, k2), v in @counts) { ... }'
|
|
Example: 'for ((k1, k2, k3), v in @*) { ... }'
|
|
C-style for-loop:
|
|
Example: 'for (var i = 0, var b = 1; i < 10; i += 1, b *= 2) { ... }'
|
|
|
|
func: used for defining a user-defined function.
|
|
Example: 'func f(a,b) { return sqrt(a**2+b**2)} $d = f($x, $y)'
|
|
|
|
if: starts an if/elif/elif chain. The body statements must be wrapped
|
|
in curly braces.
|
|
|
|
in: used in for-loops over stream records or out-of-stream variables.
|
|
|
|
int: declares an integer local variable in the current curly-braced scope.
|
|
Type-checking happens at assignment: 'int x = 0.0' is an error.
|
|
|
|
map: declares a map-valued local variable in the current curly-braced scope.
|
|
Type-checking happens at assignment: 'map b = 0' is an error. map b = {} is
|
|
always OK. map b = a is OK or not depending on whether a is a map.
|
|
|
|
num: declares an int/float local variable in the current curly-braced scope.
|
|
Type-checking happens at assignment: 'num b = true' is an error.
|
|
|
|
print: prints expression immediately to stdout.
|
|
Example: mlr --from f.dat put -q 'print "The sum of x and y is ".($x+$y)'
|
|
Example: mlr --from f.dat put -q 'for (k, v in $*) { print k . " => " . v }'
|
|
Example: mlr --from f.dat put '(NR % 1000 == 0) { print > stderr, "Checkpoint ".NR}'
|
|
|
|
printn: prints expression immediately to stdout, without trailing newline.
|
|
Example: mlr --from f.dat put -q 'printn "."; end { print "" }'
|
|
|
|
return: specifies the return value from a user-defined function.
|
|
Omitted return statements (including via if-branches) result in an absent-null
|
|
return value, which in turn results in a skipped assignment to an LHS.
|
|
|
|
stderr: Used for tee, emit, emitf, emitp, print, and dump in place of filename
|
|
to print to standard error.
|
|
|
|
stdout: Used for tee, emit, emitf, emitp, print, and dump in place of filename
|
|
to print to standard output.
|
|
|
|
str: declares a string local variable in the current curly-braced scope.
|
|
Type-checking happens at assignment.
|
|
|
|
subr: used for defining a subroutine.
|
|
Example: 'subr s(k,v) { print k . " is " . v} call s("a", $a)'
|
|
|
|
tee: prints the current record to specified file.
|
|
This is an immediate print to the specified file (except for pprint format
|
|
which of course waits until the end of the input stream to format all output).
|
|
|
|
The > and >> are for write and append, as in the shell, but (as with awk) the
|
|
file-overwrite for > is on first write, not per record. The | is for piping to
|
|
a process which will process the data. There will be one open file for each
|
|
distinct file name (for > and >>) or one subordinate process for each distinct
|
|
value of the piped-to command (for |). Output-formatting flags are taken from
|
|
the main command line.
|
|
|
|
You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
|
|
etc., to control the format of the output. See also mlr -h.
|
|
|
|
emit with redirect and tee with redirect are identical, except tee can only
|
|
output $*.
|
|
|
|
Example: mlr --from f.dat put 'tee > "/tmp/data-".$a, $*'
|
|
Example: mlr --from f.dat put 'tee > "/tmp/data-".$a, mapexcept($*, "a")'
|
|
Example: mlr --from f.dat put 'tee >> "/tmp/data-".$a.$b, $*'
|
|
Example: mlr --from f.dat put 'tee > stderr, $*'
|
|
Example: mlr --from f.dat put -q 'tee | "tr [a-z\] [A-Z\]", $*'
|
|
Example: mlr --from f.dat put -q 'tee | "tr [a-z\] [A-Z\] > /tmp/data-".$a, $*'
|
|
Example: mlr --from f.dat put -q 'tee | "gzip > /tmp/data-".$a.".gz", $*'
|
|
Example: mlr --from f.dat put -q --ojson 'tee | "gzip > /tmp/data-".$a.".gz", $*'
|
|
|
|
true: the boolean literal value.
|
|
|
|
unset: clears field(s) from the current record, or an out-of-stream or local variable.
|
|
|
|
Example: mlr --from f.dat put 'unset $x'
|
|
Example: mlr --from f.dat put 'unset $*'
|
|
Example: mlr --from f.dat put 'for (k, v in $*) { if (k =~ "a.*") { unset $[k] } }'
|
|
Example: mlr --from f.dat put '...; unset @sums'
|
|
Example: mlr --from f.dat put '...; unset @sums["green"]'
|
|
Example: mlr --from f.dat put '...; unset @*'
|
|
|
|
var: declares an untyped local variable in the current curly-braced scope.
|
|
Examples: 'var a=1', 'var xyz=""'
|
|
|
|
while: introduces a while loop, or with "do", introduces a do-while loop.
|
|
The body statements must be wrapped in curly braces.
|
|
|
|
E: the mathematical constant.
|
|
|
|
ENV: access to environment variables by name, e.g. '$home = ENV["HOME"]'
|
|
|
|
FILENAME: evaluates to the name of the current file being processed.
|
|
|
|
FILENUM: evaluates to the number of the current file being processed,
|
|
starting with 1.
|
|
|
|
FNR: evaluates to the number of the current record within the current file
|
|
being processed, starting with 1. Resets at the start of each file.
|
|
|
|
IFS: evaluates to the input field separator from the command line.
|
|
|
|
IPS: evaluates to the input pair separator from the command line.
|
|
|
|
IRS: evaluates to the input record separator from the command line,
|
|
or to LF or CRLF from the input data if in autodetect mode (which is
|
|
the default).
|
|
|
|
NF: evaluates to the number of fields in the current record.
|
|
|
|
NR: evaluates to the number of the current record over all files
|
|
being processed, starting with 1. Does not reset at the start of each file.
|
|
|
|
OFS: evaluates to the output field separator from the command line.
|
|
|
|
OPS: evaluates to the output pair separator from the command line.
|
|
|
|
ORS: evaluates to the output record separator from the command line,
|
|
or to LF or CRLF from the input data if in autodetect mode (which is
|
|
the default).
|
|
|
|
PI: the mathematical constant.
|
|
</pre>
|
|
</div>
|
|
<p/>
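<p/>The redirect rules repeated in the help text above (overwrite for <tt>&gt;</tt> happens on first write rather than per record, with one open file per distinct file name) can be modelled with a small handle cache. The following Python sketch is an assumption-laden illustration, not Miller's code: the class name <tt>RedirectPool</tt> and the demo file name are invented for the example.

```python
import os
import tempfile


class RedirectPool:
    """Keeps one open handle per distinct file name, as the help text describes."""

    def __init__(self):
        self._handles = {}

    def write(self, filename, line, append=False):
        handle = self._handles.get(filename)
        if handle is None:
            # Truncation (for '>') happens only here, on the first write.
            handle = open(filename, "a" if append else "w")
            self._handles[filename] = handle
        handle.write(line + "\n")

    def close(self):
        for handle in self._handles.values():
            handle.close()
        self._handles.clear()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "tap.dat")
        with open(path, "w") as f:
            f.write("stale\n")            # pre-existing content to be overwritten
        pool = RedirectPool()
        for i in (1, 2, 3):               # three "records"
            pool.write(path, f"i={i}")    # loosely like: tee > "tap.dat", $*
        pool.close()
        with open(path) as f:
            return f.read().splitlines()


print(demo())
```

Because the file is opened once and then reused, later records are added to the same output instead of each record truncating the file again.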
|
|
|
|
</div>
|
|
<!-- ================================================================ -->
|
|
<a id="Operator_precedence"/><h1>Operator precedence</h1>
|
|
<button style="font-weight:bold;color:maroon;border:0" padding=0 onclick="toggle_by_name('section_toggle_operator_precedence');" href="javascript:;">Toggle section visibility</button>
|
|
<div id="section_toggle_operator_precedence" style="display: block">
|
|
|
|
<p/>Operators are listed in order of decreasing precedence, highest first.
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
Operators Associativity
|
|
--------- -------------
|
|
() left to right
|
|
** right to left
|
|
! ~ unary+ unary- & right to left
|
|
binary* / // % left to right
|
|
binary+ binary- . left to right
|
|
<< >> left to right
|
|
& left to right
|
|
^ left to right
|
|
| left to right
|
|
< <= > >= left to right
|
|
== != =~ !=~ left to right
|
|
&& left to right
|
|
^^ left to right
|
|
|| left to right
|
|
? : right to left
|
|
= N/A for Miller (there is no $a=$b=$c)
|
|
</pre>
|
|
</div>
|
|
<p/>
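<p/>To make the table easier to read, the sketch below checks a few groupings in Python, which happens to order these particular operators the same way. This is an analogy for interpreting the rows, not Miller output; operators such as <tt>.</tt> and <tt>^^</tt> have no Python counterpart and are not covered.

```python
# Each entry pairs an unparenthesized expression with the grouping the
# precedence table implies; the two values should agree.
checks = {
    "2 ** 3 ** 2": (2 ** 3 ** 2, 2 ** (3 ** 2)),   # ** associates right to left
    "-2 ** 2": (-2 ** 2, -(2 ** 2)),               # ** binds tighter than unary minus
    "1 + 2 * 3": (1 + 2 * 3, 1 + (2 * 3)),         # * binds tighter than binary +
    "1 << 2 + 1": (1 << 2 + 1, 1 << (2 + 1)),      # binary + binds tighter than <<
    "6 | 5 & 3": (6 | 5 & 3, 6 | (5 & 3)),         # & binds tighter than |
    "5 & 3 == 1": (5 & 3 == 1, (5 & 3) == 1),      # bitwise binds tighter than ==
}

for text, (bare, grouped) in checks.items():
    print(f"{text:12} -> {bare!r:5} same as grouped: {bare == grouped}")
```

The last row is the one most likely to surprise C programmers, since C places the bitwise operators below the comparison operators.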
|
|
|
|
</div>
|
|
<!-- ================================================================ -->
|
|
<a id="Operator_and_function_semantics"/><h1>Operator and function semantics</h1>
|
|
<button style="font-weight:bold;color:maroon;border:0" padding=0 onclick="toggle_by_name('section_toggle_operator_and_function_semantics');" href="javascript:;">Toggle section visibility</button>
|
|
<div id="section_toggle_operator_and_function_semantics" style="display: block">
|
|
|
|
<ul>
|
|
|
|
<li/> Functions are in general pass-throughs straight to the system-standard C
|
|
library.
|
|
|
|
<li/> The <tt>min</tt> and <tt>max</tt> functions differ from other
multi-argument functions, which return null if any of their inputs is null. For
<tt>min</tt> and <tt>max</tt>, if one argument is absent-null, the other
is returned. Empty-null loses min or max against numeric or boolean; empty-null
is less than any other string.
|
|
|
|
<li/> Alongside the bitwise OR, XOR, and AND operators
<tt>|</tt>, <tt>^</tt>, <tt>&</tt>, Miller has the corresponding logical operators
<tt>||</tt>, <tt>^^</tt>, <tt>&&</tt>. The logical XOR <tt>^^</tt> has no
counterpart in C.
|
|
|
|
<li/> The exponentiation operator <tt>**</tt> is familiar from many languages.
|
|
|
|
<li/> The regex-match and regex-not-match operators <tt>=~</tt> and
|
|
<tt>!=~</tt> are similar to those in Ruby and Perl.
|
|
|
|
</ul>
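<p/>The <tt>min</tt>/<tt>max</tt> rules in the list above can be sketched as a small Python model. This covers only the cases the text states (absent-null, and empty-null against numbers, booleans, or strings); the sentinel <tt>ABSENT</tt> and the function names are assumptions for illustration, and mixed string/number comparisons are deliberately left out.

```python
ABSENT = object()   # hypothetical stand-in for Miller's absent-null
EMPTY = ""          # stand-in for empty-null


def _pick(a, b, chooser):
    # If one argument is absent, the other is returned.
    if a is ABSENT:
        return b
    if b is ABSENT:
        return a
    # Empty loses against a number or boolean, for both min and max.
    if a == EMPTY and isinstance(b, (int, float, bool)):
        return b
    if b == EMPTY and isinstance(a, (int, float, bool)):
        return a
    # Otherwise compare normally; "" already sorts before any other string.
    return chooser(a, b)


def mlr_max(a, b):
    return _pick(a, b, max)


def mlr_min(a, b):
    return _pick(a, b, min)


print(mlr_max(ABSENT, 3), mlr_min(EMPTY, 3), repr(mlr_min(EMPTY, "abc")))
```

The absent-null rule is what lets an accumulator such as <tt>@xmax = max(@xmax, $x)</tt> work on the first record, before the accumulator has been assigned.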
|
|
|
|
</div>
|
|
<!-- ================================================================ -->
|
|
<a id="Control_structures"/><h1>Control structures</h1>
|
|
<button style="font-weight:bold;color:maroon;border:0" padding=0 onclick="toggle_by_name('section_toggle_control_structures');" href="javascript:;">Toggle section visibility</button>
|
|
<div id="section_toggle_control_structures" style="display: block">
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Pattern-action_blocks"/><h2>Pattern-action blocks</h2>
|
|
|
|
<p/>These are reminiscent of <tt>awk</tt> syntax. They can be used to allow
|
|
assignments to be done only when appropriate — e.g. for math-function
|
|
domain restrictions, regex-matching, and so on:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr cat data/put-gating-example-1.dkvp
|
|
x=-1
|
|
x=0
|
|
x=1
|
|
x=2
|
|
x=3
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put '$x > 0.0 { $y = log10($x); $z = sqrt($y) }' data/put-gating-example-1.dkvp
|
|
x=-1
|
|
x=0
|
|
x=1,y=0.000000,z=0.000000
|
|
x=2,y=0.301030,z=0.548662
|
|
x=3,y=0.477121,z=0.690740
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr cat data/put-gating-example-2.dkvp
|
|
a=abc_123
|
|
a=some other name
|
|
a=xyz_789
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put '$a =~ "([a-z]+)_([0-9]+)" { $b = "left_\1"; $c = "right_\2" }' data/put-gating-example-2.dkvp
|
|
a=abc_123,b=left_abc,c=right_123
|
|
a=some other name
|
|
a=xyz_789,b=left_xyz,c=right_789
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>This produces heterogeneous output which Miller, of course, has no problems
|
|
with (see <a href="record-heterogeneity.html">Record-heterogeneity</a>). But if you
|
|
want homogeneous output, the curly braces can be replaced with a semicolon
|
|
between the expression and the body statements. This causes <tt>put</tt> to
|
|
evaluate the boolean expression (along with any side effects, namely,
|
|
regex-captures <tt>\1</tt>, <tt>\2</tt>, etc.) but doesn’t use it as a
|
|
criterion for whether subsequent assignments should be executed. Instead,
|
|
subsequent assignments are done unconditionally:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put '$x > 0.0; $y = log10($x); $z = sqrt($y)' data/put-gating-example-1.dkvp
|
|
x=-1,y=nan,z=nan
|
|
x=0,y=-inf,z=nan
|
|
x=1,y=0.000000,z=0.000000
|
|
x=2,y=0.301030,z=0.548662
|
|
x=3,y=0.477121,z=0.690740
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put '$a =~ "([a-z]+)_([0-9]+)"; $b = "left_\1"; $c = "right_\2"' data/put-gating-example-2.dkvp
|
|
a=abc_123,b=left_abc,c=right_123
|
|
a=some other name,b=left_,c=right_
|
|
a=xyz_789,b=left_xyz,c=right_789
|
|
</pre>
|
|
</div>
|
|
<p/>
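<p/>The difference between the curly-brace (gated) and semicolon (ungated) forms can be modelled in Python as follows. This is a sketch of the behavior shown in the regex examples above, not of Miller internals: the function names are invented, and unmatched captures are modelled as empty strings because that is what the recorded output shows.

```python
import re

PATTERN = re.compile(r"([a-z]+)_([0-9]+)")


def gated(record):
    # Pattern-action form: the assignments run only when the match succeeds.
    out = dict(record)
    m = PATTERN.search(out["a"])
    if m:
        out["b"] = "left_" + m.group(1)
        out["c"] = "right_" + m.group(2)
    return out


def ungated(record):
    # Semicolon form: the match is still evaluated for its captures,
    # but the assignments run unconditionally.
    out = dict(record)
    m = PATTERN.search(out["a"])
    out["b"] = "left_" + (m.group(1) if m else "")
    out["c"] = "right_" + (m.group(2) if m else "")
    return out


for a in ("abc_123", "some other name", "xyz_789"):
    print(gated({"a": a}), ungated({"a": a}))
```

The gated form leaves non-matching records untouched (heterogeneous output), while the ungated form gives every record the same keys.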
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="If-statements"/><h2>If-statements</h2>
|
|
|
|
<p/>These are again reminiscent of <tt>awk</tt>. Pattern-action blocks are a special case of <tt>if</tt> with no
|
|
<tt>elif</tt> or <tt>else</tt> blocks, no <tt>if</tt> keyword, and parentheses optional around the boolean expression:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
mlr put 'NR == 4 {$foo = "bar"}'
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
mlr put 'if (NR == 4) {$foo = "bar"}'
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>Compound statements use <tt>elif</tt> (rather than <tt>elsif</tt> or <tt>else if</tt>):
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
mlr put '
|
|
if (NR == 2) {
|
|
...
|
|
} elif (NR == 4) {
|
|
...
|
|
} elif (NR == 6) {
|
|
...
|
|
} else {
|
|
...
|
|
}
|
|
'
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="While_and_do-while_loops"/><h2>While and do-while loops</h2>
|
|
|
|
<p/>Miller’s <tt>while</tt> and <tt>do-while</tt> loops behave much as
they do in other languages, as do <tt>break</tt> and <tt>continue</tt>:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ echo x=1,y=2 | mlr put '
|
|
while (NF < 10) {
|
|
$[NF+1] = ""
|
|
}
|
|
$foo = "bar"
|
|
'
|
|
x=1,y=2,3=,4=,5=,6=,7=,8=,9=,10=,foo=bar
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ echo x=1,y=2 | mlr put '
|
|
do {
|
|
$[NF+1] = "";
|
|
if (NF == 5) {
|
|
break
|
|
}
|
|
} while (NF < 10);
|
|
$foo = "bar"
|
|
'
|
|
x=1,y=2,3=,4=,5=,foo=bar
|
|
</pre>
|
|
</div>
|
|
<p/>
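<p/>For readers who want to trace the two loops above step by step, here is a Python model of the same logic, with <tt>len(rec)</tt> standing in for <tt>NF</tt>. It is an illustrative sketch under that assumption; the function names and the <tt>limit</tt> and <tt>stop_at</tt> parameters are invented for the example.

```python
def pad_with_while(record, limit=10):
    # Models: while (NF < 10) { $[NF+1] = "" }
    rec = dict(record)
    while len(rec) < limit:
        rec[str(len(rec) + 1)] = ""
    rec["foo"] = "bar"
    return rec


def pad_with_do_while(record, limit=10, stop_at=5):
    # Models: do { $[NF+1] = ""; if (NF == 5) { break } } while (NF < 10);
    rec = dict(record)
    while True:                      # Python has no do-while; the body runs at least once
        rec[str(len(rec) + 1)] = ""
        if len(rec) == stop_at:
            break                    # leaves the loop early
        if not len(rec) < limit:     # the trailing while-condition
            break
    rec["foo"] = "bar"
    return rec


print(list(pad_with_while({"x": "1", "y": "2"})))
print(list(pad_with_do_while({"x": "1", "y": "2"})))
```

Both loops terminate because the body changes the quantity being tested; a condition on something the body cannot change would loop forever.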
|
|
|
|
<p/> A <tt>break</tt> or <tt>continue</tt> within nested conditional blocks or
|
|
if-statements will, of course, propagate to the innermost loop enclosing them,
|
|
if any. A <tt>break</tt> or <tt>continue</tt> outside a loop is a syntax error
|
|
that will be flagged as soon as the expression is parsed, before any input
|
|
records are ingested.
|
|
|
|
<p/> The existence of <tt>while</tt>, <tt>do-while</tt>, and <tt>for</tt> loops
|
|
in Miller’s DSL means that you can create infinite-loop scenarios
|
|
inadvertently. In particular, please recall that DSL statements are executed
|
|
once if in <tt>begin</tt> or <tt>end</tt> blocks, and once <i>per record</i>
|
|
otherwise. For example, <b><tt>while (NR < 10)</tt> will never terminate as
|
|
<tt>NR</tt> is only incremented between records</b>.
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="For-loops"/><h2>For-loops</h2>
|
|
|
|
<p/>While Miller’s <tt>while</tt> and <tt>do-while</tt> statements are
|
|
much as in many other languages, <tt>for</tt> loops are more idiosyncratic to
|
|
Miller. They are loops over key-value pairs, whether in stream records,
|
|
out-of-stream variables, local variables, or map-literals: more reminiscent of
|
|
<tt>foreach</tt>, as in (for example) PHP. There are <b>for-loops over map
|
|
keys</b> and <b>for-loops over key-value tuples</b>. Additionally, Miller has a
|
|
<b>C-style triple-for loop</b> with initialize, test, and update statements.
|
|
|
|
<p/>As with <tt>while</tt> and <tt>do-while</tt>, a <tt>break</tt> or
|
|
<tt>continue</tt> within nested control structures will propagate to the
|
|
innermost loop enclosing them, if any, and a <tt>break</tt> or
|
|
<tt>continue</tt> outside a loop is a syntax error that will be flagged as soon
|
|
as the expression is parsed, before any input records are ingested.
|
|
|
|
<a id="Key-only_for-loops"/><h3>Key-only for-loops </h3>
|
|
|
|
<p/>The <tt>key</tt> variable is always bound to the <i>key</i> of key-value pairs:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from data/small put '
|
|
print "NR = ".NR;
|
|
for (key in $*) {
|
|
value = $[key];
|
|
print " key:" . key . " value:".value;
|
|
}
|
|
|
|
'
|
|
NR = 1
|
|
key:a value:pan
|
|
key:b value:pan
|
|
key:i value:1
|
|
key:x value:0.346790
|
|
key:y value:0.726803
|
|
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
|
|
NR = 2
|
|
key:a value:eks
|
|
key:b value:pan
|
|
key:i value:2
|
|
key:x value:0.758680
|
|
key:y value:0.522151
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
|
|
NR = 3
|
|
key:a value:wye
|
|
key:b value:wye
|
|
key:i value:3
|
|
key:x value:0.204603
|
|
key:y value:0.338319
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
|
|
NR = 4
|
|
key:a value:eks
|
|
key:b value:wye
|
|
key:i value:4
|
|
key:x value:0.381399
|
|
key:y value:0.134189
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
|
|
NR = 5
|
|
key:a value:wye
|
|
key:b value:pan
|
|
key:i value:5
|
|
key:x value:0.573289
|
|
key:y value:0.863624
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr -n put '
|
|
end {
|
|
o = {1:2, 3:{4:5}};
|
|
for (key in o) {
|
|
print " key:" . key . " valuetype:" . typeof(o[key]);
|
|
}
|
|
}
|
|
'
|
|
key:1 valuetype:int
|
|
key:3 valuetype:map
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>Note that the value corresponding to a given key may be obtained through a
<b>computed field name</b> using square brackets, as in <tt>$[key]</tt> for
stream records, or by indexing the looped-over variable using square brackets.
|
|
|
|
<a id="Key-value_for-loops"/><h3>Key-value for-loops </h3>
|
|
|
|
<p/>Single-level keys may be accessed using either <tt>for(k,v)</tt> or
<tt>for((k),v)</tt>; multi-level keys may be accessed using
<tt>for((k1,k2,k3),v)</tt> and so on. The <tt>v</tt> variable will be bound
to a scalar value (a string or a number) if the map stops at that level, or to
a map-valued variable if the map goes deeper. If the map isn’t deep
enough then the loop body won’t be executed.
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ cat data/for-srec-example.tbl
|
|
label1 label2 f1 f2 f3
|
|
blue green 100 240 350
|
|
red green 120 11 195
|
|
yellow blue 140 0 240
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --pprint --from data/for-srec-example.tbl put '
|
|
$sum1 = $f1 + $f2 + $f3;
|
|
$sum2 = 0;
|
|
$sum3 = 0;
|
|
for (key, value in $*) {
|
|
if (key =~ "^f[0-9]+") {
|
|
$sum2 += value;
|
|
$sum3 += $[key];
|
|
}
|
|
}
|
|
'
|
|
label1 label2 f1 f2 f3 sum1 sum2 sum3
|
|
blue green 100 240 350 690 690 690
|
|
red green 120 11 195 326 326 326
|
|
yellow blue 140 0 240 380 380 380
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from data/small --opprint put 'for (k,v in $*) { $[k."_type"] = typeof(v) }'
|
|
a b i x y a_type b_type i_type x_type y_type
|
|
pan pan 1 0.3467901443380824 0.7268028627434533 string string int float float
|
|
eks pan 2 0.7586799647899636 0.5221511083334797 string string int float float
|
|
wye wye 3 0.20460330576630303 0.33831852551664776 string string int float float
|
|
eks wye 4 0.38139939387114097 0.13418874328430463 string string int float float
|
|
wye pan 5 0.5732889198020006 0.8636244699032729 string string int float float
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>Note that the value of the current field in the for-loop can be gotten either using the bound
|
|
variable <tt>value</tt>, or through a <b>computed field name</b> using square brackets as in <tt>$[key]</tt>.
|
|
|
|
<p/>Important note: to avoid inconsistent looping behavior in case you’re
|
|
setting new fields (and/or unsetting existing ones) while looping over the
|
|
record, <b>Miller makes a copy of the record before the loop: loop variables
|
|
are bound from the copy and all other reads/writes involve the record
|
|
itself</b>:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from data/small --opprint put '
|
|
$sum1 = 0;
|
|
$sum2 = 0;
|
|
for (k,v in $*) {
|
|
if (is_numeric(v)) {
|
|
$sum1 += v;
|
|
$sum2 += $[k];
|
|
}
|
|
}
|
|
'
|
|
a b i x y sum1 sum2
|
|
pan pan 1 0.3467901443380824 0.7268028627434533 2.073593 8.294372
|
|
eks pan 2 0.7586799647899636 0.5221511083334797 3.280831 13.123324
|
|
wye wye 3 0.20460330576630303 0.33831852551664776 3.542922 14.171687
|
|
eks wye 4 0.38139939387114097 0.13418874328430463 4.515588 18.062353
|
|
wye pan 5 0.5732889198020006 0.8636244699032729 6.436913 25.747654
|
|
</pre>
|
|
</div>
|
|
<p/>
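<p/>The copy-before-loop rule can be traced with a short Python model of the example above. This is a sketch of the described semantics rather than Miller's implementation: the function name is invented, and field values are pre-parsed numbers.

```python
def sums_over_snapshot(record):
    rec = dict(record)
    rec["sum1"] = 0
    rec["sum2"] = 0
    snapshot = dict(rec)              # copy taken before the loop starts
    for k, v in snapshot.items():     # loop variables are bound from the copy
        if isinstance(v, (int, float)):
            rec["sum1"] += v          # bound value: comes from the copy
            rec["sum2"] += rec[k]     # like $[k]: read from the live record
    return rec


row = sums_over_snapshot({"a": "pan", "b": "pan", "i": 1,
                          "x": 0.3467901443380824, "y": 0.7268028627434533})
print(round(row["sum1"], 6), round(row["sum2"], 6))
```

The loop also visits the <tt>sum1</tt> and <tt>sum2</tt> keys. Their bound values are the zeroes stored in the copy, but reading them through the live record picks up the running totals, which is why the second sum grows so much faster than the first.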
|
|
|
|
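<p/>In this example <tt>sum2</tt> comes out four times <tt>sum1</tt>: the loop
also visits the <tt>sum1</tt> and <tt>sum2</tt> keys, whose bound values
<tt>v</tt> are the zeroes stored in the copy, while <tt>$[k]</tt> reads their
current, already-accumulated values from the record itself.
It can be confusing to modify the stream record while iterating over a copy of it, so
instead you might find it simpler to use a local variable in the loop and only update
the stream record after the loop: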
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from data/small --opprint put '
|
|
sum = 0;
|
|
for (k,v in $*) {
|
|
if (is_numeric(v)) {
|
|
sum += $[k];
|
|
}
|
|
}
|
|
$sum = sum
|
|
'
|
|
a b i x y sum
|
|
pan pan 1 0.3467901443380824 0.7268028627434533 2.073593
|
|
eks pan 2 0.7586799647899636 0.5221511083334797 3.280831
|
|
wye wye 3 0.20460330576630303 0.33831852551664776 3.542922
|
|
eks wye 4 0.38139939387114097 0.13418874328430463 4.515588
|
|
wye pan 5 0.5732889198020006 0.8636244699032729 6.436913
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>You can also start iterating on sub-hashmaps of an out-of-stream or local
|
|
variable; you can loop over nested keys; you can loop over all out-of-stream
|
|
variables. The bound variables are bound to a copy of the sub-hashmap as it
|
|
was before the loop started. The sub-hashmap is specified by square-bracketed
|
|
indices after <tt>in</tt>, and additional deeper indices are bound to loop
|
|
key-variables. The terminal values are bound to the loop value-variable
|
|
whenever the keys are not too shallow. The value-variable may refer to a
|
|
terminal (string, number) or it may be map-valued if the map goes deeper.
|
|
Example indexing is as follows:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
# Parentheses are optional for single key:
|
|
for (k1, v in @a["b"]["c"]) { ... }
|
|
for ((k1), v in @a["b"]["c"]) { ... }
|
|
# Parentheses are required for multiple keys:
|
|
for ((k1, k2), v in @a["b"]["c"]) { ... } # Loop over subhashmap of a variable
|
|
for ((k1, k2, k3), v in @a["b"]["c"]) { ... } # Ditto
|
|
for ((k1, k2, k3), v in @a) { ... } # Loop over variable starting from basename
|
|
for ((k1, k2, k3), v in @*) { ... } # Loop over all variables (k1 is bound to basename)
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>That’s confusing in the abstract, so a concrete example is in order.
|
|
Suppose the out-of-stream variable <tt>@myvar</tt> is populated as follows:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr -n put --jknquoteint -q '
|
|
begin {
|
|
@myvar = {
|
|
1: 2,
|
|
3: { 4 : 5 },
|
|
6: { 7: { 8: 9 } }
|
|
}
|
|
}
|
|
end { dump }
|
|
'
|
|
{
|
|
"myvar": {
|
|
1: 2,
|
|
3: {
|
|
4: 5
|
|
},
|
|
6: {
|
|
7: {
|
|
8: 9
|
|
}
|
|
}
|
|
}
|
|
}
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/> Then we can get at various values as follows:
|
|
|
|
<table><tr><td>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr -n put --jknquoteint -q '
|
|
begin {
|
|
@myvar = {
|
|
1: 2,
|
|
3: { 4 : 5 },
|
|
6: { 7: { 8: 9 } }
|
|
}
|
|
}
|
|
end {
|
|
for (k, v in @myvar) {
|
|
print
|
|
"key=" . k .
|
|
",valuetype=" . typeof(v);
|
|
}
|
|
}
|
|
'
|
|
key=1,valuetype=int
|
|
key=3,valuetype=map
|
|
key=6,valuetype=map
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
</td><td>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr -n put --jknquoteint -q '
|
|
begin {
|
|
@myvar = {
|
|
1: 2,
|
|
3: { 4 : 5 },
|
|
6: { 7: { 8: 9 } }
|
|
}
|
|
}
|
|
end {
|
|
for ((k1, k2), v in @myvar) {
|
|
print
|
|
"key1=" . k1 .
|
|
",key2=" . k2 .
|
|
",valuetype=" . typeof(v);
|
|
}
|
|
}
|
|
'
|
|
key1=3,key2=4,valuetype=int
|
|
key1=6,key2=7,valuetype=map
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
</td><td>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr -n put --jknquoteint -q '
|
|
begin {
|
|
@myvar = {
|
|
1: 2,
|
|
3: { 4 : 5 },
|
|
6: { 7: { 8: 9 } }
|
|
}
|
|
}
|
|
end {
|
|
for ((k1, k2), v in @myvar[6]) {
|
|
print
|
|
"key1=" . k1 .
|
|
",key2=" . k2 .
|
|
",valuetype=" . typeof(v);
|
|
}
|
|
}
|
|
'
|
|
key1=7,key2=8,valuetype=int
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
</td></tr></table>
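<p/>The three outputs above follow one rule: a loop with <i>n</i> key variables visits every path of exactly <i>n</i> keys and skips branches that are too shallow. Here is a minimal Python model of that rule, using nested dicts in place of <tt>@myvar</tt>; the helper names <tt>walk</tt> and <tt>typename</tt> are assumptions for illustration.

```python
def walk(m, depth):
    """Yield (keys, value) for every path of exactly `depth` keys in nested dict `m`."""
    if depth == 1:
        for k, v in m.items():
            yield (k,), v
        return
    for k, sub in m.items():
        if isinstance(sub, dict):              # too-shallow branches are skipped
            for keys, v in walk(sub, depth - 1):
                yield (k,) + keys, v


def typename(v):
    return "map" if isinstance(v, dict) else type(v).__name__


myvar = {1: 2, 3: {4: 5}, 6: {7: {8: 9}}}

for label, start, depth in (("for (k, v in @myvar)", myvar, 1),
                            ("for ((k1, k2), v in @myvar)", myvar, 2),
                            ("for ((k1, k2), v in @myvar[6])", myvar[6], 2)):
    print(label, [(keys, typename(v)) for keys, v in walk(start, depth)])
```

The value at the end of a path may itself be a map, matching the <tt>valuetype=map</tt> lines in the recorded output.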
|
|
|
|
<a id="C-style_triple-for_loops"/><h3>C-style triple-for loops</h3>
|
|
|
|
<p/> These are supported as follows:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from data/small --opprint put '
|
|
num suma = 0;
|
|
for (a = 1; a <= NR; a += 1) {
|
|
suma += a;
|
|
}
|
|
$suma = suma;
|
|
'
|
|
a b i x y suma
|
|
pan pan 1 0.3467901443380824 0.7268028627434533 1
|
|
eks pan 2 0.7586799647899636 0.5221511083334797 3
|
|
wye wye 3 0.20460330576630303 0.33831852551664776 6
|
|
eks wye 4 0.38139939387114097 0.13418874328430463 10
|
|
wye pan 5 0.5732889198020006 0.8636244699032729 15
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from data/small --opprint put '
|
|
num suma = 0;
|
|
num sumb = 0;
|
|
for (num a = 1, num b = 1; a <= NR; a += 1, b *= 2) {
|
|
suma += a;
|
|
sumb += b;
|
|
}
|
|
$suma = suma;
|
|
$sumb = sumb;
|
|
'
|
|
a b i x y suma sumb
|
|
pan pan 1 0.3467901443380824 0.7268028627434533 1 1
|
|
eks pan 2 0.7586799647899636 0.5221511083334797 3 3
|
|
wye wye 3 0.20460330576630303 0.33831852551664776 6 7
|
|
eks wye 4 0.38139939387114097 0.13418874328430463 10 15
|
|
wye pan 5 0.5732889198020006 0.8636244699032729 15 31
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
Notes:
|
|
<ul>
|
|
|
|
<li/> In <tt>for (start; continuation; update) { body }</tt>, the start,
|
|
continuation, and update statements may be empty, single statements, or
|
|
multiple comma-separated statements. If the continuation is empty (e.g. <tt>for(i=1;;i+=1)</tt>) it defaults
|
|
to true.
|
|
|
|
<li/> In particular, you may use <tt>$</tt>-variables and/or
|
|
<tt>@</tt>-variables in the start, continuation, and/or update steps (as well
|
|
as the body, of course).
|
|
|
|
<li/> The typedecls such as <tt>int</tt> or <tt>num</tt> are optional. If a
|
|
typedecl is provided (for a local variable), it binds a variable scoped to the
|
|
for-loop regardless of whether a same-name variable is present in outer scope.
|
|
If a typedecl is not provided, then the variable is scoped to the for-loop if
|
|
no same-name variable is present in outer scope, or if a same-name variable is
|
|
present in outer scope then it is modified.
|
|
|
|
<li/> Miller has no <tt>++</tt> or <tt>--</tt> operators.
|
|
|
|
<li/> As with all for/if/while statements in Miller, the curly braces are
|
|
required even if the body is a single statement, or empty.
|
|
|
|
</ul>
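<p/>Because a <tt>put</tt> expression runs once per record, the second example
above restarts its loop for every record with the current value of
<tt>NR</tt>. As a rough cross-check of the <tt>suma</tt> and <tt>sumb</tt>
columns, here is the same start/continuation/update structure written as a
Python <tt>while</tt> loop. The function name <tt>loop_sums</tt> is hypothetical
and stands in for one evaluation of the expression.

```python
def loop_sums(nr):
    """Model one per-record run of the two-variable triple-for loop."""
    suma = 0
    sumb = 0
    a, b = 1, 1              # start: num a = 1, num b = 1
    while a <= nr:           # continuation: a <= NR
        suma += a            # body
        sumb += b
        a += 1               # update: a += 1, b *= 2
        b *= 2
    return suma, sumb


# One row per record, NR = 1 through 5.
for nr in range(1, 6):
    print(nr, *loop_sums(nr))
```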
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Begin/end_blocks"/><h2>Begin/end blocks</h2>
|
|
|
|
<p/>Miller supports an <tt>awk</tt>-like <tt>begin/end</tt> syntax. The
|
|
statements in the <tt>begin</tt> block are executed before any input records
|
|
are read; the statements in the <tt>end</tt> block are executed after the last
|
|
input record is read. (If you want to execute some statement at the start of
|
|
each file, not at the start of the first file as with <tt>begin</tt>, you might
|
|
use a pattern/action block of the form <tt>FNR == 1 { ... }</tt>.) All
|
|
statements outside of <tt>begin</tt> or <tt>end</tt> are, of course, executed
|
|
on every input record. Semicolons separate statements inside or outside of
|
|
begin/end blocks; semicolons are required between begin/end block bodies and
|
|
any subsequent statement. For example:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put '
|
|
  begin { @x_sum = 0 };
|
|
@x_sum += $x;
|
|
end { emit @x_sum }
|
|
' ../data/small
|
|
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
|
|
a=zee,b=pan,i=6,x=0.5271261600918548,y=0.49322128674835697
|
|
a=eks,b=zee,i=7,x=0.6117840605678454,y=0.1878849191181694
|
|
a=zee,b=wye,i=8,x=0.5985540091064224,y=0.976181385699006
|
|
a=hat,b=wye,i=9,x=0.03144187646093577,y=0.7495507603507059
|
|
a=pan,b=wye,i=10,x=0.5026260055412137,y=0.9526183602969864
|
|
x_sum=4.536294
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>Since uninitialized out-of-stream variables default to 0 for
|
|
addition/subtraction and 1 for multiplication when they appear on expression
|
|
right-hand sides (as in <tt>awk</tt>), the above can be written more succinctly
|
|
as
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put '
|
|
@x_sum += $x;
|
|
end { emit @x_sum }
|
|
' ../data/small
|
|
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
|
|
a=zee,b=pan,i=6,x=0.5271261600918548,y=0.49322128674835697
|
|
a=eks,b=zee,i=7,x=0.6117840605678454,y=0.1878849191181694
|
|
a=zee,b=wye,i=8,x=0.5985540091064224,y=0.976181385699006
|
|
a=hat,b=wye,i=9,x=0.03144187646093577,y=0.7495507603507059
|
|
a=pan,b=wye,i=10,x=0.5026260055412137,y=0.9526183602969864
|
|
x_sum=4.536294
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>The <b>put -q</b> option is a shorthand which suppresses printing of each
|
|
output record, with only <tt>emit</tt> statements being output. So to get only
|
|
summary outputs, one could write
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '
|
|
@x_sum += $x;
|
|
end { emit @x_sum }
|
|
' ../data/small
|
|
x_sum=4.536294
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>We can do similarly with multiple out-of-stream variables:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '
|
|
@x_count += 1;
|
|
@x_sum += $x;
|
|
end {
|
|
emit @x_count;
|
|
emit @x_sum;
|
|
}
|
|
' ../data/small
|
|
x_count=10
|
|
x_sum=4.536294
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>This is, of course, not much different from
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr stats1 -a count,sum -f x ../data/small
|
|
x_count=10,x_sum=4.536294
|
|
</pre>
|
|
</div>
|
|
<p/>
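<p/>The begin / per-record / end flow above can be pictured as an ordinary
loop with a setup step before it and a summary step after it. The sketch below
is a Python model under that assumption, not Miller's implementation; the
function <tt>run</tt> and its <tt>quiet</tt> flag (standing in for
<tt>put -q</tt>) are names invented for illustration.

```python
def run(records, quiet=False):
    """Model of: put [-q] '@x_count += 1; @x_sum += $x; end { emit ... }'."""
    oosvars = {}                 # begin: nothing to set; absent reads as 0 for +=
    output = []
    for rec in records:          # main block: executed once per input record
        oosvars["x_count"] = oosvars.get("x_count", 0) + 1
        oosvars["x_sum"] = oosvars.get("x_sum", 0) + rec["x"]
        if not quiet:            # without -q each record is passed through
            output.append(rec)
    for name in ("x_count", "x_sum"):    # end block: emit each variable
        if name in oosvars:
            output.append({name: oosvars[name]})
    return output


xs = [0.3467901443380824, 0.7586799647899636, 0.20460330576630303,
      0.38139939387114097, 0.5732889198020006, 0.5271261600918548,
      0.6117840605678454, 0.5985540091064224, 0.03144187646093577,
      0.5026260055412137]
summary = run([{"x": x} for x in xs], quiet=True)
print(summary[0]["x_count"], "%.6f" % summary[1]["x_sum"])
```

With <tt>quiet=False</tt> the ten records come first and the two summary
records follow, matching the ordering in the earlier examples.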
|
|
|
|
<p/>Note that it’s a syntax error for begin/end blocks to refer to field
|
|
names (beginning with <tt>$</tt>), since these execute outside the context of
|
|
input records.
|
|
|
|
</div>
|
|
<!-- ================================================================ -->
|
|
<a id="Output_statements"/><h1>Output statements</h1>
|
|
<button style="font-weight:bold;color:maroon;border:0" padding=0 onclick="toggle_by_name('section_toggle_output_statements');" href="javascript:;">Toggle section visibility</button>
|
|
<div id="section_toggle_output_statements" style="display: block">
|
|
|
|
<p/>You can <b>output</b> variable-values or expressions in <b>five ways</b>:
|
|
|
|
<ul>
|
|
|
|
<li/> <b>Assign</b> them to stream-record fields. For example,
|
|
<tt>$cumulative_sum = @sum</tt>. For another example, <tt>$nr = NR</tt> adds a
|
|
field named <tt>nr</tt> to each output record, containing the value of the
|
|
built-in variable <tt>NR</tt> as of when that record was ingested.
|
|
|
|
<li/> Use the <b>print</b> or <b>eprint</b> keywords which immediately print an
|
|
expression <i>directly to standard output or standard error</i>, respectively.
|
|
Note that <tt>dump</tt>, <tt>edump</tt>, <tt>print</tt>, and <tt>eprint</tt>
|
|
don’t output records which participate in <tt>then</tt>-chaining; rather,
|
|
they’re just immediate prints to stdout/stderr. The <tt>printn</tt> and
|
|
<tt>eprintn</tt> keywords are the same except that they don’t print final
|
|
newlines. Additionally, you can print to a specified file instead of
|
|
stdout/stderr.
|
|
|
|
<li/> Use the <b>dump</b> or <b>edump</b> keywords, which <i>immediately print
|
|
all out-of-stream variables as a JSON data structure to the standard output or
|
|
standard error</i> (respectively).
|
|
|
|
<li/> Use <b>tee</b> which formats the current stream record (not just an
|
|
arbitrary string as with <b>print</b>) to a specific file.
|
|
|
|
<li/> Use <b>emit</b>/<b>emitp</b>/<b>emitf</b> to send out-of-stream
|
|
variables’ current values to the output record stream, e.g. <tt>@sum +=
|
|
$x; emit @sum</tt> which produces an extra output record such as
|
|
<tt>sum=3.1648382</tt>.
|
|
|
|
</ul>
|
|
|
|
<p/>For the first and last options (assignment and <tt>emit</tt>) you are
populating the output-records stream which feeds into the next verb in a
<tt>then</tt>-chain (if any), or which otherwise is formatted for output using
<tt>--o...</tt> flags.

<p/>For the middle three options (<tt>print</tt>, <tt>dump</tt>, and
<tt>tee</tt>) you are sending output directly to standard output, standard
error, or a file.
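<p/>To make the two destinations concrete, here is a small Python model in
which a record stream and a console are separate objects. It is only a sketch:
<tt>put_stage</tt> and <tt>console</tt> are hypothetical names, and the
expression being modelled (an assignment, a <tt>print</tt>, and an
<tt>emit</tt> in an <tt>end</tt> block) is an assumed example rather than one
taken from this page.

```python
import io


def put_stage(records, stdout):
    """Yield the record stream; write print-style text to `stdout` on the side."""
    total = 0.0
    for rec in records:
        total += rec["x"]
        out = dict(rec)
        out["cumulative_sum"] = total            # assignment: rides along in the record
        print("i=" + str(rec["i"]), file=stdout)  # print: immediate text, not a record
        yield out
    yield {"sum": total}                         # emit at end: one extra record


console = io.StringIO()          # stands in for standard output
stream = list(put_stage([{"i": 1, "x": 0.25}, {"i": 2, "x": 0.5}], console))

# A following verb in a then-chain would see `stream` only, never `console`.
print(len(stream), repr(console.getvalue()))
```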
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Print_statements"/><h2>Print statements</h2>
|
|
|
|
<p/>The <tt>print</tt> statement is perhaps self-explanatory, but with a few
|
|
light caveats:
|
|
|
|
<ul>
|
|
|
|
<li/> There are four variants: <tt>print</tt> goes to stdout with final
|
|
newline, <tt>printn</tt> goes to stdout without final newline (you can include
|
|
one using "\n" in your output string), <tt>eprint</tt> goes to stderr with
|
|
final newline, and <tt>eprintn</tt> goes to stderr without final newline.
|
|
|
|
<li/> Output goes directly to stdout/stderr, respectively: data produced this
|
|
way do not go downstream to the next verb in a <tt>then</tt>-chain. (Use
|
|
<tt>emit</tt> for that.)
|
|
|
|
<li/> Print statements are for strings (<tt>print "hello"</tt>), or things
|
|
which can be made into strings: numbers (<tt>print 3</tt>, <tt>print $a +
$b</tt>), or concatenations thereof (<tt>print "a + b = " . ($a + $b)</tt>).
|
|
Maps (in <tt>$*</tt>, map-valued out-of-stream or local variables, and map
|
|
literals) aren’t convertible into strings. If you print a map, you get
|
|
<tt>{is-a-map}</tt> as output. Please use <tt>dump</tt> to print maps.
|
|
|
|
<li/>You can redirect print output to a file:
|
|
<tt>mlr --from myfile.dat put 'print > "tap.txt", $x'</tt> or
|
|
<tt>mlr --from myfile.dat put 'o=$*; print > $a.".txt", $x'</tt>.
|
|
|
|
<li/> See also the <a href="#Redirected-output_statements">section on redirected output</a> for examples.
|
|
|
|
</ul>
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Dump_statements"/><h2>Dump statements</h2>
|
|
|
|
<p/>The <tt>dump</tt> statement is for printing expressions, including maps,
|
|
directly to stdout or stderr:
|
|
|
|
<ul>
|
|
|
|
<li/> There are two variants: <tt>dump</tt> prints to stdout; <tt>edump</tt>
|
|
prints to stderr.
|
|
|
|
<li/> Output goes directly to stdout/stderr, respectively: data produced this
|
|
way do not go downstream to the next verb in a <tt>then</tt>-chain. (Use
|
|
<tt>emit</tt> for that.)
|
|
|
|
<li/> You can use <tt>dump</tt> to output single strings, numbers,
|
|
or expressions including map-valued data. Map-valued data are printed
|
|
as JSON. Miller allows string and integer keys in its map literals while
|
|
JSON allows only string keys, so use <tt>mlr put --jknquoteint</tt> if
|
|
you want integer-valued map keys not double-quoted.
|
|
|
|
<li/> If you use <tt>dump</tt> (or <tt>edump</tt>) with no arguments, you get a
|
|
JSON structure representing the current values of all out-of-stream variables.
|
|
|
|
<li/> As with <tt>print</tt>, you can redirect output to files.
|
|
|
|
<li/> See also the <a href="#Redirected-output_statements">section on redirected output</a> for examples.
|
|
|
|
</ul>
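<p/>The point about integer map keys is a property of JSON itself rather than
of Miller. As a language-neutral illustration, Python's standard
<tt>json</tt> module does the same thing: integer keys are written as quoted
strings, and they come back as strings when parsed. This is an analogy only; it
says nothing about how Miller's own JSON writer is implemented.

```python
import json

# A map with integer keys, shaped like the @myvar example earlier on this page.
myvar = {1: 2, 3: {4: 5}}

# json.dumps converts the integer keys to strings, since JSON object keys
# must be strings.
text = json.dumps(myvar, indent=2)
print(text)

# Parsing the text back therefore yields string keys, not integers.
round_trip = json.loads(text)
print(sorted(round_trip.keys()))
```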
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Tee_statements"/><h2>Tee statements</h2>
|
|
|
|
<p/> Records produced by a <tt>mlr put</tt> go downstream to the next verb in
|
|
your <tt>then</tt>-chain, if any, or otherwise to standard output. If you want
|
|
to additionally copy out records to files, you can do that using <tt>tee</tt>.
|
|
|
|
<p/>The syntax is, by example, <tt>mlr --from myfile.dat put 'tee >
|
|
"tap.dat", $*' then sort -n index</tt>. First is <tt>tee ></tt>, then the
|
|
filename expression (which can be an expression such as
|
|
<tt>"tap.".$a.".dat"</tt>), then a comma, then <tt>$*</tt>. (Nothing else but
|
|
<tt>$*</tt> is teeable.)
|
|
|
|
<p/> See also the <a href="#Redirected-output_statements">section on redirected
|
|
output</a> for examples.
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Redirected-output_statements"/><h2>Redirected-output statements</h2>
|
|
|
|
The <b>print</b>, <b>dump</b>, <b>tee</b>, <b>emitf</b>, <b>emit</b>, and
|
|
<b>emitp</b> keywords all allow you to redirect output to one or more files or
|
|
pipe-to commands. The filenames/commands are strings which can be constructed
|
|
using record-dependent values, so you can do things like splitting a table into
|
|
multiple files, one for each account ID, and so on.
|
|
|
|
<p/> Details:
|
|
|
|
<ul>
|
|
|
|
<li/> The <tt>print</tt> and <tt>dump</tt> keywords produce output immediately
|
|
to standard output, or to specified file(s) or pipe-to command if present.
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --help-keyword print
|
|
print: prints expression immediately to stdout.
|
|
Example: mlr --from f.dat put -q 'print "The sum of x and y is ".($x+$y)'
|
|
Example: mlr --from f.dat put -q 'for (k, v in $*) { print k . " => " . v }'
|
|
Example: mlr --from f.dat put '(NR % 1000 == 0) { print > stderr, "Checkpoint ".NR}'
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --help-keyword dump
|
|
dump: prints all currently defined out-of-stream variables immediately
|
|
to stdout as JSON.
|
|
|
|
With >, >>, or |, the data do not become part of the output record stream but
|
|
are instead redirected.
|
|
|
|
The > and >> are for write and append, as in the shell, but (as with awk) the
|
|
file-overwrite for > is on first write, not per record. The | is for piping to
|
|
a process which will process the data. There will be one open file for each
|
|
distinct file name (for > and >>) or one subordinate process for each distinct
|
|
value of the piped-to command (for |). Output-formatting flags are taken from
|
|
the main command line.
|
|
|
|
Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump }'
|
|
Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump > "mytap.dat"}'
|
|
Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump >> "mytap.dat"}'
|
|
Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump | "jq .[]"}'
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<li/> <tt>mlr put</tt> sends the current record (possibly modified by the
|
|
<tt>put</tt> expression) to the output record stream. Records are then input to
|
|
the following verb in a <tt>then</tt>-chain (if any), else printed to standard
|
|
output (unless <tt>put -q</tt>). The <b>tee</b> keyword <i>additionally</i>
|
|
writes the output record to specified file(s) or pipe-to command, or
|
|
immediately to <tt>stdout</tt>/<tt>stderr</tt>.
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --help-keyword tee
|
|
tee: prints the current record to specified file.
|
|
This is an immediate print to the specified file (except for pprint format
|
|
which of course waits until the end of the input stream to format all output).
|
|
|
|
The > and >> are for write and append, as in the shell, but (as with awk) the
|
|
file-overwrite for > is on first write, not per record. The | is for piping to
|
|
a process which will process the data. There will be one open file for each
|
|
distinct file name (for > and >>) or one subordinate process for each distinct
|
|
value of the piped-to command (for |). Output-formatting flags are taken from
|
|
the main command line.
|
|
|
|
You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
|
|
etc., to control the format of the output. See also mlr -h.
|
|
|
|
emit with redirect and tee with redirect are identical, except tee can only
|
|
output $*.
|
|
|
|
Example: mlr --from f.dat put 'tee > "/tmp/data-".$a, $*'
|
|
Example: mlr --from f.dat put 'tee > "/tmp/data-".$a, mapexcept($*, "a")'
|
|
Example: mlr --from f.dat put 'tee >> "/tmp/data-".$a.$b, $*'
|
|
Example: mlr --from f.dat put 'tee > stderr, $*'
|
|
Example: mlr --from f.dat put -q 'tee | "tr [a-z\] [A-Z\]", $*'
|
|
Example: mlr --from f.dat put -q 'tee | "tr [a-z\] [A-Z\] > /tmp/data-".$a, $*'
|
|
Example: mlr --from f.dat put -q 'tee | "gzip > /tmp/data-".$a.".gz", $*'
|
|
Example: mlr --from f.dat put -q --ojson 'tee | "gzip > /tmp/data-".$a.".gz", $*'
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<li/> <tt>mlr put</tt>’s <tt>emitf</tt>, <tt>emitp</tt>, and
|
|
<tt>emit</tt> send out-of-stream variables to the output record stream. These
|
|
are then input to the following verb in a <tt>then</tt>-chain (if any), else
|
|
printed to standard output. When redirected with <tt>></tt>,
|
|
<tt>>></tt>, or <tt>|</tt>, they <i>instead</i> write the out-of-stream
|
|
variable(s) to specified file(s) or pipe-to command, or immediately to
|
|
<tt>stdout</tt>/<tt>stderr</tt>.
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --help-keyword emitf
|
|
emitf: inserts non-indexed out-of-stream variable(s) side-by-side into the
|
|
output record stream.
|
|
|
|
With >, >>, or |, the data do not become part of the output record stream but
|
|
are instead redirected.
|
|
|
|
The > and >> are for write and append, as in the shell, but (as with awk) the
|
|
file-overwrite for > is on first write, not per record. The | is for piping to
|
|
a process which will process the data. There will be one open file for each
|
|
distinct file name (for > and >>) or one subordinate process for each distinct
|
|
value of the piped-to command (for |). Output-formatting flags are taken from
|
|
the main command line.
|
|
|
|
You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
|
|
etc., to control the format of the output if the output is redirected. See also mlr -h.
|
|
|
|
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf @a'
|
|
Example: mlr --from f.dat put --oxtab '@a=$i;@b+=$x;@c+=$y; emitf > "tap-".$i.".dat", @a'
|
|
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf @a, @b, @c'
|
|
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf > "mytap.dat", @a, @b, @c'
|
|
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf >> "mytap.dat", @a, @b, @c'
|
|
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf > stderr, @a, @b, @c'
|
|
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern", @a, @b, @c'
|
|
Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern > mytap.dat", @a, @b, @c'
|
|
|
|
Please see http://johnkerl.org/miller/doc for more information.
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --help-keyword emitp
|
|
emitp: inserts an out-of-stream variable into the output record stream.
|
|
Hashmap indices present in the data but not slotted by emitp arguments are
|
|
output concatenated with ":".
|
|
|
|
With >, >>, or |, the data do not become part of the output record stream but
|
|
are instead redirected.
|
|
|
|
The > and >> are for write and append, as in the shell, but (as with awk) the
|
|
file-overwrite for > is on first write, not per record. The | is for piping to
|
|
a process which will process the data. There will be one open file for each
|
|
distinct file name (for > and >>) or one subordinate process for each distinct
|
|
value of the piped-to command (for |). Output-formatting flags are taken from
|
|
the main command line.
|
|
|
|
You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
|
|
etc., to control the format of the output if the output is redirected. See also mlr -h.
|
|
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @sums'
|
|
Example: mlr --from f.dat put --opprint '@sums[$a][$b]+=$x; emitp > "tap-".$a.$b.".dat", @sums'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @sums, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp > "mytap.dat", @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp >> "mytap.dat", @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "gzip > mytap.dat.gz", @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp > stderr, @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "grep somepattern", @*, "index1", "index2"'
|
|
|
|
Please see http://johnkerl.org/miller/doc for more information.
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --help-keyword emit
|
|
emit: inserts an out-of-stream variable into the output record stream. Hashmap
|
|
indices present in the data but not slotted by emit arguments are not output.
|
|
|
|
With >, >>, or |, the data do not become part of the output record stream but
|
|
are instead redirected.
|
|
|
|
The > and >> are for write and append, as in the shell, but (as with awk) the
|
|
file-overwrite for > is on first write, not per record. The | is for piping to
|
|
a process which will process the data. There will be one open file for each
|
|
distinct file name (for > and >>) or one subordinate process for each distinct
|
|
value of the piped-to command (for |). Output-formatting flags are taken from
|
|
the main command line.
|
|
|
|
You can use any of the output-format command-line flags, e.g. --ocsv, --ofs,
|
|
etc., to control the format of the output if the output is redirected. See also mlr -h.
|
|
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @sums'
|
|
Example: mlr --from f.dat put --ojson '@sums[$a][$b]+=$x; emit > "tap-".$a.$b.".dat", @sums'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @sums, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit > "mytap.dat", @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit >> "mytap.dat", @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "gzip > mytap.dat.gz", @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit > stderr, @*, "index1", "index2"'
|
|
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "grep somepattern", @*, "index1", "index2"'
|
|
|
|
Please see http://johnkerl.org/miller/doc for more information.
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
</ul>
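<p/>The help text above says there is one open file per distinct file name
and that <tt>&gt;</tt> overwrites on first write rather than per record. A
minimal Python sketch of that bookkeeping follows, modelled loosely on the
<tt>tee &gt; "/tmp/data-".$a, $*</tt> example. The class
<tt>RedirectedWriter</tt> and the function <tt>split_by_field</tt> are
hypothetical names, and the sketch assumes nothing about Miller's internals
beyond what the help text states.

```python
import os
import tempfile


class RedirectedWriter:
    """Keep one open handle per distinct file name.

    A file is opened (and, unless appending, truncated) only the first time
    its name is seen; later writes reuse the same handle.
    """

    def __init__(self):
        self.handles = {}

    def write(self, filename, line, append=False):
        handle = self.handles.get(filename)
        if handle is None:
            handle = open(filename, "a" if append else "w")
            self.handles[filename] = handle
        handle.write(line + "\n")

    def close(self):
        for handle in self.handles.values():
            handle.close()
        self.handles.clear()


def split_by_field(records, field, directory):
    """Write each record to a file whose name depends on one of its fields."""
    writer = RedirectedWriter()
    for rec in records:
        path = os.path.join(directory, "data-" + str(rec[field]))
        writer.write(path, ",".join(k + "=" + str(v) for k, v in rec.items()))
    writer.close()


records = [{"a": "pan", "i": 1}, {"a": "eks", "i": 2}, {"a": "wye", "i": 3},
           {"a": "eks", "i": 4}, {"a": "wye", "i": 5}]
with tempfile.TemporaryDirectory() as tmpdir:
    split_by_field(records, "a", tmpdir)
    names = sorted(os.listdir(tmpdir))
    with open(os.path.join(tmpdir, "data-eks")) as f:
        eks_lines = f.read().splitlines()
print(names, eks_lines)
```

Because the handle for <tt>data-eks</tt> is opened once, the second
<tt>eks</tt> record is added after the first instead of replacing it.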
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Emit_statements"/><h2>Emit statements</h2>
|
|
|
|
<p/>There are three variants: <tt>emitf</tt>, <tt>emit</tt>, and
|
|
<tt>emitp</tt>. Keep in mind that out-of-stream variables are a nested,
|
|
multi-level hashmap (directly viewable as JSON using <tt>dump</tt>), whereas
|
|
Miller output records are lists of single-level key-value pairs. The three emit
|
|
variants allow you to control how the multilevel hashmaps are flattened down to
|
|
output records. You can emit any map-valued expression, including <tt>$*</tt>,
|
|
map-valued out-of-stream variables, the entire out-of-stream-variable
|
|
collection <tt>@*</tt>, map-valued local variables, map literals, or map-valued
|
|
function return values.
|
|
|
|
<p/>Use <b>emitf</b> to output several out-of-stream variables side-by-side in the same output record.
|
|
For <tt>emitf</tt> these mustn’t have indexing using <tt>@name[...]</tt>. Example:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@count += 1; @x_sum += $x; @y_sum += $y; end { emitf @count, @x_sum, @y_sum}' data/small
|
|
count=5,x_sum=2.264762,y_sum=2.585086
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>Use <b>emit</b> to output an out-of-stream variable. If it’s non-indexed you’ll get a simple key-value pair:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ cat data/small
|
|
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@sum += $x; end { dump }' data/small
|
|
{
|
|
"sum": 2.264762
|
|
}
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@sum += $x; end { emit @sum }' data/small
|
|
sum=2.264762
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>If it’s indexed then use as many names after <tt>emit</tt> as there are indices:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@sum[$a] += $x; end { dump }' data/small
|
|
{
|
|
"sum": {
|
|
"pan": 0.346790,
|
|
"eks": 1.140079,
|
|
"wye": 0.777892
|
|
}
|
|
}
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@sum[$a] += $x; end { emit @sum, "a" }' data/small
|
|
a=pan,sum=0.346790
|
|
a=eks,sum=1.140079
|
|
a=wye,sum=0.777892
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@sum[$a][$b] += $x; end { dump }' data/small
|
|
{
|
|
"sum": {
|
|
"pan": {
|
|
"pan": 0.346790
|
|
},
|
|
"eks": {
|
|
"pan": 0.758680,
|
|
"wye": 0.381399
|
|
},
|
|
"wye": {
|
|
"wye": 0.204603,
|
|
"pan": 0.573289
|
|
}
|
|
}
|
|
}
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@sum[$a][$b] += $x; end { emit @sum, "a", "b" }' data/small
|
|
a=pan,b=pan,sum=0.346790
|
|
a=eks,b=pan,sum=0.758680
|
|
a=eks,b=wye,sum=0.381399
|
|
a=wye,b=wye,sum=0.204603
|
|
a=wye,b=pan,sum=0.573289
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@sum[$a][$b][$i] += $x; end { dump }' data/small
|
|
{
|
|
"sum": {
|
|
"pan": {
|
|
"pan": {
|
|
"1": 0.346790
|
|
}
|
|
},
|
|
"eks": {
|
|
"pan": {
|
|
"2": 0.758680
|
|
},
|
|
"wye": {
|
|
"4": 0.381399
|
|
}
|
|
},
|
|
"wye": {
|
|
"wye": {
|
|
"3": 0.204603
|
|
},
|
|
"pan": {
|
|
"5": 0.573289
|
|
}
|
|
}
|
|
}
|
|
}
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@sum[$a][$b][$i] += $x; end { emit @sum, "a", "b", "i" }' data/small
|
|
a=pan,b=pan,i=1,sum=0.346790
|
|
a=eks,b=pan,i=2,sum=0.758680
|
|
a=eks,b=wye,i=4,sum=0.381399
|
|
a=wye,b=wye,i=3,sum=0.204603
|
|
a=wye,b=pan,i=5,sum=0.573289
|
|
</pre>
|
|
</div>
|
|
<p/>
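<p/>The <tt>dump</tt>/<tt>emit</tt> pairs above suggest a simple reading of
<tt>emit @sum, "a", "b"</tt>: each name peels off one level of the nested map
and becomes a field holding that level's key. The Python sketch below follows
that reading. It is not Miller's implementation, the name
<tt>emit_records</tt> is hypothetical, and it models only two cases: the
remainder after splitting is a single value, or it is a single-level map.

```python
def emit_records(name, value, names):
    """Split a nested map into flat records, one level per entry in `names`."""
    if names and isinstance(value, dict):
        records = []
        for key, child in value.items():
            for rec in emit_records(name, child, names[1:]):
                # The level's key is stored under the supplied field name.
                records.append({names[0]: key, **rec})
        return records
    if isinstance(value, dict):
        # One level left over: its keys become field names directly.
        return [dict(value)]
    # A plain value is keyed by the variable's own name.
    return [{name: value}]


# Values copied from the dump output above.
sums = {
    "pan": {"pan": 0.346790},
    "eks": {"pan": 0.758680, "wye": 0.381399},
    "wye": {"wye": 0.204603, "pan": 0.573289},
}

for rec in emit_records("sum", sums, ["a", "b"]):
    print(",".join(k + "=" + str(v) for k, v in rec.items()))
```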
|
|
|
|
<p/>Now for <b>emitp</b>: if you have as many names following <tt>emit</tt> as
|
|
there are levels in the out-of-stream variable’s hashmap, then <tt>emit</tt> and <tt>emitp</tt> do the same
|
|
thing. Where they differ is when you don’t specify as many names as there are hashmap levels. In this
|
|
case, Miller needs to flatten multiple map indices down to output-record keys: <tt>emitp</tt> includes full
|
|
prefixing (hence the <tt>p</tt> in <tt>emitp</tt>) while <tt>emit</tt> takes the deepest hashmap key as the
|
|
output-record key:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@sum[$a][$b] += $x; end { dump }' data/small
|
|
{
|
|
"sum": {
|
|
"pan": {
|
|
"pan": 0.346790
|
|
},
|
|
"eks": {
|
|
"pan": 0.758680,
|
|
"wye": 0.381399
|
|
},
|
|
"wye": {
|
|
"wye": 0.204603,
|
|
"pan": 0.573289
|
|
}
|
|
}
|
|
}
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@sum[$a][$b] += $x; end { emit @sum, "a" }' data/small
|
|
a=pan,pan=0.346790
|
|
a=eks,pan=0.758680,wye=0.381399
|
|
a=wye,wye=0.204603,pan=0.573289
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@sum[$a][$b] += $x; end { emit @sum }' data/small
|
|
pan=0.346790
|
|
pan=0.758680,wye=0.381399
|
|
wye=0.204603,pan=0.573289
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@sum[$a][$b] += $x; end { emitp @sum, "a" }' data/small
|
|
a=pan,sum:pan=0.346790
|
|
a=eks,sum:pan=0.758680,sum:wye=0.381399
|
|
a=wye,sum:wye=0.204603,sum:pan=0.573289
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@sum[$a][$b] += $x; end { emitp @sum }' data/small
|
|
sum:pan:pan=0.346790,sum:eks:pan=0.758680,sum:eks:wye=0.381399,sum:wye:wye=0.204603,sum:wye:pan=0.573289
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --oxtab put -q '@sum[$a][$b] += $x; end { emitp @sum }' data/small
|
|
sum:pan:pan 0.346790
|
|
sum:eks:pan 0.758680
|
|
sum:eks:wye 0.381399
|
|
sum:wye:wye 0.204603
|
|
sum:wye:pan 0.573289
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>Use <b>--oflatsep</b> to specify the character which joins multilevel
|
|
keys for <tt>emitp</tt> (it defaults to a colon):
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q --oflatsep / '@sum[$a][$b] += $x; end { emitp @sum, "a" }' data/small
|
|
a=pan,sum/pan=0.346790
|
|
a=eks,sum/pan=0.758680,sum/wye=0.381399
|
|
a=wye,sum/wye=0.204603,sum/pan=0.573289
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q --oflatsep / '@sum[$a][$b] += $x; end { emitp @sum }' data/small
|
|
sum/pan/pan=0.346790,sum/eks/pan=0.758680,sum/eks/wye=0.381399,sum/wye/wye=0.204603,sum/wye/pan=0.573289
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --oxtab put -q --oflatsep / '@sum[$a][$b] += $x; end { emitp @sum }' data/small
|
|
sum/pan/pan 0.346790
|
|
sum/eks/pan 0.758680
|
|
sum/eks/wye 0.381399
|
|
sum/wye/wye 0.204603
|
|
sum/wye/pan 0.573289
|
|
</pre>
|
|
</div>
|
|
<p/>
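<p/>The prefixing done by <tt>emitp</tt> can be sketched as two steps: split
on the supplied names, then flatten whatever is left by joining the variable
name and the remaining keys with the separator. The Python below is an
illustrative model under that assumption, not Miller's code; <tt>flatten</tt>,
<tt>emitp</tt>, and the <tt>sep</tt> parameter (standing in for
<tt>--oflatsep</tt>) are names chosen for this sketch.

```python
def flatten(prefix, value, sep):
    """Collapse a nested map into one flat record with joined key names."""
    if not isinstance(value, dict):
        return {prefix: value}
    flat = {}
    for key, child in value.items():
        flat.update(flatten(prefix + sep + str(key), child, sep))
    return flat


def emitp(name, value, names=(), sep=":"):
    """Split on `names`, then flatten the remainder with full prefixing."""
    if names and isinstance(value, dict):
        records = []
        for key, child in value.items():
            for rec in emitp(name, child, names[1:], sep):
                records.append({names[0]: key, **rec})
        return records
    return [flatten(name, value, sep)]


# Values copied from the dump output earlier in this section.
sums = {
    "pan": {"pan": 0.346790},
    "eks": {"pan": 0.758680, "wye": 0.381399},
    "wye": {"wye": 0.204603, "pan": 0.573289},
}

for rec in emitp("sum", sums, ["a"], sep="/"):
    print(",".join(k + "=" + str(v) for k, v in rec.items()))
```

With no names at all, the whole map collapses into a single record, which is
the shape shown in the <tt>emitp @sum</tt> examples.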
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Multi-emit_statements"/><h2>Multi-emit statements</h2>
|
|
|
|
<p/>You can emit <b>multiple map-valued expressions side-by-side</b> by
|
|
including their names in parentheses:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from data/medium --opprint put -q '
|
|
@x_count[$a][$b] += 1;
|
|
@x_sum[$a][$b] += $x;
|
|
end {
|
|
for ((a, b), _ in @x_count) {
|
|
@x_mean[a][b] = @x_sum[a][b] / @x_count[a][b]
|
|
}
|
|
emit (@x_sum, @x_count, @x_mean), "a", "b"
|
|
}
|
|
'
|
|
a b x_sum x_count x_mean
|
|
pan pan 219.185129 427 0.513314
|
|
pan wye 198.432931 395 0.502362
|
|
pan eks 216.075228 429 0.503672
|
|
pan hat 205.222776 417 0.492141
|
|
pan zee 205.097518 413 0.496604
|
|
eks pan 179.963030 371 0.485076
|
|
eks wye 196.945286 407 0.483895
|
|
eks zee 176.880365 357 0.495463
|
|
eks eks 215.916097 413 0.522799
|
|
eks hat 208.783171 417 0.500679
|
|
wye wye 185.295850 377 0.491501
|
|
wye pan 195.847900 392 0.499612
|
|
wye hat 212.033183 426 0.497730
|
|
wye zee 194.774048 385 0.505907
|
|
wye eks 204.812961 386 0.530604
|
|
zee pan 202.213804 389 0.519830
|
|
zee wye 233.991394 455 0.514267
|
|
zee eks 190.961778 391 0.488393
|
|
zee zee 206.640635 403 0.512756
|
|
zee hat 191.300006 409 0.467726
|
|
hat wye 208.883010 423 0.493813
|
|
hat zee 196.349450 385 0.509999
|
|
hat eks 189.006793 389 0.485879
|
|
hat hat 182.853532 381 0.479931
|
|
hat pan 168.553807 363 0.464336
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
What this does is walk through the first out-of-stream variable
|
|
(<tt>@x_sum</tt> in this example) as usual, then for each keylist found (e.g.
|
|
<tt>pan,wye</tt>), include the values for the remaining out-of-stream variables
|
|
(here, <tt>@x_count</tt> and <tt>@x_mean</tt>). You should use this when all
|
|
out-of-stream variables in the emit statement have <b>the same shape and the same
|
|
keylists</b>.
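<p/>The walk described above can be sketched in a few lines of Python: the
first variable supplies the keylists, and every variable (including the first)
is then looked up at each keylist. This is a model of the description, not of
Miller's implementation. The name <tt>emit_lashed</tt> is hypothetical, and the
sketch assumes, as the text recommends, that all variables share the same shape
and keylists; a missing key would raise <tt>KeyError</tt> here.

```python
def emit_lashed(variables, names):
    """Emit several same-shaped nested maps side by side.

    `variables` maps a variable name to its nested map; `names` are the
    field names for the key levels, e.g. ["a", "b"].
    """
    records = []
    first = next(iter(variables.values()))

    def walk(node, keys):
        if len(keys) == len(names):
            rec = dict(zip(names, keys))
            for var_name, var_map in variables.items():
                leaf = var_map
                for key in keys:        # follow the same keylist in each map
                    leaf = leaf[key]
                rec[var_name] = leaf
            records.append(rec)
            return
        for key, child in node.items():
            walk(child, keys + [key])

    walk(first, [])
    return records


# Small hand-made inputs in the shape of @x_sum and @x_count.
x_sum = {"pan": {"pan": 0.346790}, "eks": {"pan": 0.758680, "wye": 0.381399}}
x_count = {"pan": {"pan": 1}, "eks": {"pan": 1, "wye": 1}}

for rec in emit_lashed({"x_sum": x_sum, "x_count": x_count}, ["a", "b"]):
    print(rec)
```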
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="Emit-all_statements"/><h2>Emit-all statements</h2>
|
|
|
|
<p/>Use <b>emit all</b> (or <tt>emit @*</tt> which is synonymous) to output all
|
|
out-of-stream variables. You can use the following idiom to get various
|
|
accumulators output side-by-side (reminiscent of <tt>mlr stats1</tt>):
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from data/small --opprint put -q '@v[$a][$b]["sum"] += $x; @v[$a][$b]["count"] += 1; end{emit @*,"a","b"}'
|
|
a b sum count
|
|
pan pan 0.346790 1
|
|
eks pan 0.758680 1
|
|
eks wye 0.381399 1
|
|
wye wye 0.204603 1
|
|
wye pan 0.573289 1
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from data/small --opprint put -q '@sum[$a][$b] += $x; @count[$a][$b] += 1; end{emit @*,"a","b"}'
|
|
a b sum
|
|
pan pan 0.346790
|
|
eks pan 0.758680
|
|
eks wye 0.381399
|
|
wye wye 0.204603
|
|
wye pan 0.573289
|
|
|
|
a b count
|
|
pan pan 1
|
|
eks pan 1
|
|
eks wye 1
|
|
wye wye 1
|
|
wye pan 1
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --from data/small --opprint put -q '@sum[$a][$b] += $x; @count[$a][$b] += 1; end{emit (@sum, @count),"a","b"}'
|
|
a b sum count
|
|
pan pan 0.346790 1
|
|
eks pan 0.758680 1
|
|
eks wye 0.381399 1
|
|
wye wye 0.204603 1
|
|
wye pan 0.573289 1
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
</div>
|
|
<!-- ================================================================ -->
|
|
<a id="Unset_statements"/><h1>Unset statements</h1>
|
|
<button style="font-weight:bold;color:maroon;border:0" padding=0 onclick="toggle_by_name('section_toggle_unset_statements');" href="javascript:;">Toggle section visibility</button>
|
|
<div id="section_toggle_unset_statements" style="display: block">
|
|
|
|
<p/>You can clear a map key by assigning the empty string as its value: <tt>$x=""</tt> or <tt>@x=""</tt>.
|
|
Using <tt>unset</tt> you can remove the key entirely. Examples:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ cat data/small
|
|
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put 'unset $x, $a' data/small
|
|
b=pan,i=1,y=0.7268028627434533
|
|
b=pan,i=2,y=0.5221511083334797
|
|
b=wye,i=3,y=0.33831852551664776
|
|
b=wye,i=4,y=0.13418874328430463
|
|
b=pan,i=5,y=0.8636244699032729
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>This can also be done, of course, using <tt>mlr cut -x</tt>. You can also
|
|
clear out-of-stream or local variables, at the base name level, or at an
|
|
indexed sublevel:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@sum[$a][$b] += $x; end { dump; unset @sum; dump }' data/small
|
|
{
|
|
"sum": {
|
|
"pan": {
|
|
"pan": 0.346790
|
|
},
|
|
"eks": {
|
|
"pan": 0.758680,
|
|
"wye": 0.381399
|
|
},
|
|
"wye": {
|
|
"wye": 0.204603,
|
|
"pan": 0.573289
|
|
}
|
|
}
|
|
}
|
|
{
|
|
}
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put -q '@sum[$a][$b] += $x; end { dump; unset @sum["eks"]; dump }' data/small
|
|
{
|
|
"sum": {
|
|
"pan": {
|
|
"pan": 0.346790
|
|
},
|
|
"eks": {
|
|
"pan": 0.758680,
|
|
"wye": 0.381399
|
|
},
|
|
"wye": {
|
|
"wye": 0.204603,
|
|
"pan": 0.573289
|
|
}
|
|
}
|
|
}
|
|
{
|
|
"sum": {
|
|
"pan": {
|
|
"pan": 0.346790
|
|
},
|
|
"wye": {
|
|
"wye": 0.204603,
|
|
"pan": 0.573289
|
|
}
|
|
}
|
|
}
|
|
</pre>
|
|
</div>
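<p/>As a loose analogy, the difference between clearing a key and unsetting it, and between unsetting at the base level and at an indexed sublevel, can be mimicked with plain Python dicts. The dicts below are hypothetical stand-ins for a record and for the out-of-stream variables; this is a sketch, not how Miller stores them.

```python
# Stand-in for one record from data/small (a subset of its fields).
record = {"a": "pan", "b": "pan", "i": 1}

record["a"] = ""   # like $a="": the key stays, with an empty value
del record["b"]    # like unset $b: the key is removed entirely

# Stand-in for @sum after the main block has run.
oosvars = {
    "sum": {
        "pan": {"pan": 0.346790},
        "eks": {"pan": 0.758680, "wye": 0.381399},
        "wye": {"wye": 0.204603, "pan": 0.573289},
    }
}

del oosvars["sum"]["eks"]   # like unset @sum["eks"]: only that sublevel goes

print(record)
print(oosvars)
```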
|
|
<p/>
|
|
|
|
<p/>If you use <tt>unset all</tt> (or <tt>unset @*</tt> which is synonymous), that will unset all out-of-stream
|
|
variables which have been defined up to that point.
|
|
|
|
</div>
|
|
<!-- ================================================================ -->
|
|
<a id="Filter_statements"/><h1>Filter statements</h1>
|
|
<button style="font-weight:bold;color:maroon;border:0" padding=0 onclick="toggle_by_name('section_toggle_filter_statements');" href="javascript:;">Toggle section visibility</button>
|
|
<div id="section_toggle_filter_statements" style="display: block">
|
|
|
|
<p/> You can use <tt>filter</tt> within <tt>put</tt>. In fact, the
|
|
following two are synonymous:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr filter 'NR==2 || NR==3' data/small
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put 'filter NR==2 || NR==3' data/small
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>The former, of course, is much easier to type. But the latter allows you to define more complex expressions
|
|
for the filter, and/or do other things in addition to the filter:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put '@running_sum += $x; filter @running_sum > 1.3' data/small
|
|
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
|
|
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr put '$z = $x * $y; filter $z > 0.3' data/small
|
|
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797,z=0.396146
|
|
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729,z=0.495106
|
|
</pre>
|
|
</div>
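<p/>The two examples above can be checked by hand with a short Python sketch. The <tt>x</tt> and <tt>y</tt> values are copied from <tt>data/small</tt>; the helper name <tt>running_sum_filter</tt> is hypothetical.

```python
# x and y columns of data/small, in record order (i = 1..5).
xs = [0.3467901443380824, 0.7586799647899636, 0.20460330576630303,
      0.38139939387114097, 0.5732889198020006]
ys = [0.7268028627434533, 0.5221511083334797, 0.33831852551664776,
      0.13418874328430463, 0.8636244699032729]


def running_sum_filter(values, threshold):
    """Return the 1-based record numbers whose running sum exceeds threshold."""
    total = 0.0
    kept = []
    for i, x in enumerate(values, start=1):
        total += x            # like '@running_sum += $x'
        if total > threshold:  # like 'filter @running_sum > 1.3'
            kept.append(i)
    return kept


kept_by_sum = running_sum_filter(xs, 1.3)

# Like '$z = $x * $y; filter $z > 0.3'.
kept_by_product = [i for i, (x, y) in enumerate(zip(xs, ys), start=1)
                   if x * y > 0.3]

print(kept_by_sum, kept_by_product)
```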
|
|
<p/>
|
|
|
|
</div>
|
|
<!-- ================================================================ -->
|
|
<a id="Built-in_functions_for_filter_and_put"/><h1>Built-in functions for filter and put</h1>
|
|
<button style="font-weight:bold;color:maroon;border:0" padding=0 onclick="toggle_by_name('section_toggle_built_in_functions');" href="javascript:;">Toggle section visibility</button>
|
|
<div id="section_toggle_built_in_functions" style="display: block">
|
|
|
|
<p/>Each function takes a specific number of arguments, as shown below, except
|
|
for functions marked as variadic such as <tt>min</tt> and <tt>max</tt>. (The
|
|
latter compute min and max of any number of numerical arguments.) There is no
|
|
notion of optional or default-on-absent arguments. All argument-passing is
|
|
positional rather than by name; arguments are passed by value, not by
|
|
reference.
|
|
|
|
<p/>You can get a list of all functions using <b>mlr -F</b>.
|
|
|
|
|
|
<a id="+"/><h2>+</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
+ (class=arithmetic #args=2): Addition.
|
|
|
|
+ (class=arithmetic #args=1): Unary plus.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="-"/><h2>-</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
- (class=arithmetic #args=2): Subtraction.
|
|
|
|
- (class=arithmetic #args=1): Unary minus.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="*"/><h2>*</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
* (class=arithmetic #args=2): Multiplication.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="/"/><h2>/</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
/ (class=arithmetic #args=2): Division.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="//"/><h2>//</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
// (class=arithmetic #args=2): Integer division: rounds toward negative infinity (pythonic).
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="%"/><h2>%</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
% (class=arithmetic #args=2): Remainder; never negative-valued (pythonic).
|
|
</pre>
|
|
</div>
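<p/>Since <tt>//</tt> and <tt>%</tt> are described as pythonic, Python itself gives a feel for the semantics. This sketch only shows Python's behavior; it assumes a positive right-hand side and does not claim that Miller agrees with Python in every edge case. The function names are hypothetical.

```python
def int_divide(a, b):
    """Floor division: the quotient is rounded toward negative infinity."""
    return a // b


def remainder(a, b):
    """Remainder paired with floor division; non-negative when b > 0."""
    return a % b


print(int_divide(7, 2), int_divide(-7, 2))
print(remainder(7, 5), remainder(-7, 5))
```

<p/>In a language with truncating division, <tt>-7 / 2</tt> would give -3 and the remainder of <tt>-7</tt> by <tt>5</tt> would be -2; the floor-based convention avoids negative remainders for positive moduli.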
|
|
|
|
|
|
<a id="**"/><h2>**</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
** (class=arithmetic #args=2): Exponentiation; same as pow, but as an infix
|
|
operator.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="|"/><h2>|</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
| (class=arithmetic #args=2): Bitwise OR.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="^"/><h2>^</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
^ (class=arithmetic #args=2): Bitwise XOR.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="&"/><h2>&amp;</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
&amp; (class=arithmetic #args=2): Bitwise AND.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="~"/><h2>~</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
~ (class=arithmetic #args=1): Bitwise NOT. Beware '$y=~$x' since =~ is the
|
|
regex-match operator: try '$y = ~$x'.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="<<"/><h2>&lt;&lt;</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
&lt;&lt; (class=arithmetic #args=2): Bitwise left-shift.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id=">>"/><h2>>></h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
>> (class=arithmetic #args=2): Bitwise right-shift.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="=="/><h2>==</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
== (class=boolean #args=2): String/numeric equality. Mixing number and string
|
|
results in string compare.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="!="/><h2>!=</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
!= (class=boolean #args=2): String/numeric inequality. Mixing number and string
|
|
results in string compare.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="=~"/><h2>=~</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
=~ (class=boolean #args=2): String (left-hand side) matches regex (right-hand
|
|
side), e.g. '$name =~ "^a.*b$"'.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="!=~"/><h2>!=~</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
!=~ (class=boolean #args=2): String (left-hand side) does not match regex
|
|
(right-hand side), e.g. '$name !=~ "^a.*b$"'.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id=">"/><h2>></h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
> (class=boolean #args=2): String/numeric greater-than. Mixing number and string
|
|
results in string compare.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id=">="/><h2>>=</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
>= (class=boolean #args=2): String/numeric greater-than-or-equals. Mixing number
|
|
and string results in string compare.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="<"/><h2>&lt;</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
&lt; (class=boolean #args=2): String/numeric less-than. Mixing number and string
|
|
results in string compare.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="<="/><h2>&lt;=</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
&lt;= (class=boolean #args=2): String/numeric less-than-or-equals. Mixing number
|
|
and string results in string compare.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="&&"/><h2>&amp;&amp;</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
&amp;&amp; (class=boolean #args=2): Logical AND.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="||"/><h2>||</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
|| (class=boolean #args=2): Logical OR.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="^^"/><h2>^^</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
^^ (class=boolean #args=2): Logical XOR.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="!"/><h2>!</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
! (class=boolean #args=1): Logical negation.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="?_:"/><h2>? :</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
? : (class=boolean #args=3): Ternary operator.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="."/><h2>.</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
. (class=string #args=2): String concatenation.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="abs"/><h2>abs</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
abs (class=math #args=1): Absolute value.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="acos"/><h2>acos</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
acos (class=math #args=1): Inverse trigonometric cosine.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="acosh"/><h2>acosh</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
acosh (class=math #args=1): Inverse hyperbolic cosine.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asin"/><h2>asin</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asin (class=math #args=1): Inverse trigonometric sine.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asinh"/><h2>asinh</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asinh (class=math #args=1): Inverse hyperbolic sine.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asserting_absent"/><h2>asserting_absent</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asserting_absent (class=typing #args=1): Returns argument if it is absent in the input data, else
|
|
throws an error.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asserting_bool"/><h2>asserting_bool</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asserting_bool (class=typing #args=1): Returns argument if it is present with boolean value, else
|
|
throws an error.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asserting_boolean"/><h2>asserting_boolean</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asserting_boolean (class=typing #args=1): Returns argument if it is present with boolean value, else
|
|
throws an error.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asserting_empty"/><h2>asserting_empty</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asserting_empty (class=typing #args=1): Returns argument if it is present in input with empty value,
|
|
else throws an error.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asserting_empty_map"/><h2>asserting_empty_map</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asserting_empty_map (class=typing #args=1): Returns argument if it is an empty map, else
|
|
throws an error.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asserting_float"/><h2>asserting_float</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asserting_float (class=typing #args=1): Returns argument if it is present with float value, else
|
|
throws an error.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asserting_int"/><h2>asserting_int</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asserting_int (class=typing #args=1): Returns argument if it is present with int value, else
|
|
throws an error.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asserting_map"/><h2>asserting_map</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asserting_map (class=typing #args=1): Returns argument if it is a map, else throws an error.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asserting_nonempty_map"/><h2>asserting_nonempty_map</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asserting_nonempty_map (class=typing #args=1): Returns argument if it is a non-empty map, else throws
|
|
an error.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asserting_not_empty"/><h2>asserting_not_empty</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asserting_not_empty (class=typing #args=1): Returns argument if it is present in input with non-empty
|
|
value, else throws an error.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asserting_not_map"/><h2>asserting_not_map</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asserting_not_map (class=typing #args=1): Returns argument if it is not a map, else throws an error.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asserting_not_null"/><h2>asserting_not_null</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asserting_not_null (class=typing #args=1): Returns argument if it is non-null (non-empty and non-absent),
|
|
else throws an error.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asserting_null"/><h2>asserting_null</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asserting_null (class=typing #args=1): Returns argument if it is null (empty or absent), else throws
|
|
an error.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asserting_numeric"/><h2>asserting_numeric</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asserting_numeric (class=typing #args=1): Returns argument if it is present with int or float value,
|
|
else throws an error.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asserting_present"/><h2>asserting_present</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asserting_present (class=typing #args=1): Returns argument if it is present in input, else throws
|
|
an error.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="asserting_string"/><h2>asserting_string</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
asserting_string (class=typing #args=1): Returns argument if it is present with string (including
|
|
empty-string) value, else throws an error.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="atan"/><h2>atan</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
atan (class=math #args=1): One-argument arctangent.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="atan2"/><h2>atan2</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
atan2 (class=math #args=2): Two-argument arctangent.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="atanh"/><h2>atanh</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
atanh (class=math #args=1): Inverse hyperbolic tangent.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="boolean"/><h2>boolean</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
boolean (class=conversion #args=1): Convert int/float/bool/string to boolean.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="cbrt"/><h2>cbrt</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
cbrt (class=math #args=1): Cube root.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="ceil"/><h2>ceil</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
ceil (class=math #args=1): Ceiling: nearest integer at or above.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="cos"/><h2>cos</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
cos (class=math #args=1): Trigonometric cosine.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="cosh"/><h2>cosh</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
cosh (class=math #args=1): Hyperbolic cosine.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="depth"/><h2>depth</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
depth (class=maps #args=1): Prints maximum depth of hashmap. Scalars have depth 0.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="dhms2fsec"/><h2>dhms2fsec</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
dhms2fsec (class=time #args=1): Recovers floating-point seconds as in
|
|
dhms2fsec("5d18h53m20.250000s") = 500000.250000
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="dhms2sec"/><h2>dhms2sec</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
dhms2sec (class=time #args=1): Recovers integer seconds as in
|
|
dhms2sec("5d18h53m20s") = 500000
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="erf"/><h2>erf</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
erf (class=math #args=1): Error function.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="erfc"/><h2>erfc</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
erfc (class=math #args=1): Complementary error function.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="exp"/><h2>exp</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
exp (class=math #args=1): Exponential function e**x.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="expm1"/><h2>expm1</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
expm1 (class=math #args=1): e**x - 1.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="float"/><h2>float</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
float (class=conversion #args=1): Convert int/float/bool/string to float.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="floor"/><h2>floor</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
floor (class=math #args=1): Floor: nearest integer at or below.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="fmtnum"/><h2>fmtnum</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
fmtnum (class=conversion #args=2): Convert int/float/bool to string using
|
|
printf-style format string, e.g. '$s = fmtnum($n, "%06lld")'.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="fsec2dhms"/><h2>fsec2dhms</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
fsec2dhms (class=time #args=1): Formats floating-point seconds as in
|
|
fsec2dhms(500000.25) = "5d18h53m20.250000s"
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="fsec2hms"/><h2>fsec2hms</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
fsec2hms (class=time #args=1): Formats floating-point seconds as in
|
|
fsec2hms(5000.25) = "01:23:20.250000"
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="gmt2sec"/><h2>gmt2sec</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
gmt2sec (class=time #args=1): Parses GMT timestamp as integer seconds since
|
|
the epoch.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="gsub"/><h2>gsub</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
gsub (class=string #args=3): Example: '$name=gsub($name, "old", "new")'
|
|
(replace all).
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="haskey"/><h2>haskey</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
haskey (class=maps #args=2): True/false if map has/hasn't key, e.g. 'haskey($*, "a")' or
|
|
'haskey(mymap, mykey)'. Error if 1st argument is not a map.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="hexfmt"/><h2>hexfmt</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
hexfmt (class=conversion #args=1): Convert int to string, e.g. 255 to "0xff".
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="hms2fsec"/><h2>hms2fsec</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
hms2fsec (class=time #args=1): Recovers floating-point seconds as in
|
|
hms2fsec("01:23:20.250000") = 5000.250000
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="hms2sec"/><h2>hms2sec</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
hms2sec (class=time #args=1): Recovers integer seconds as in
|
|
hms2sec("01:23:20") = 5000
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="int"/><h2>int</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
int (class=conversion #args=1): Convert int/float/bool/string to int.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="invqnorm"/><h2>invqnorm</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
invqnorm (class=math #args=1): Inverse of normal cumulative distribution
|
|
function. Note that invqnorm(urand()) is normally distributed.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="is_absent"/><h2>is_absent</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
is_absent (class=typing #args=1): False if field is present in input, true otherwise.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="is_bool"/><h2>is_bool</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
is_bool (class=typing #args=1): True if field is present with boolean value. Synonymous with is_boolean.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="is_boolean"/><h2>is_boolean</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
is_boolean (class=typing #args=1): True if field is present with boolean value. Synonymous with is_bool.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="is_empty"/><h2>is_empty</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
is_empty (class=typing #args=1): True if field is present in input with empty string value, false otherwise.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="is_empty_map"/><h2>is_empty_map</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
is_empty_map (class=typing #args=1): True if argument is a map which is empty.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="is_float"/><h2>is_float</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
is_float (class=typing #args=1): True if field is present with value inferred to be float
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="is_int"/><h2>is_int</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
is_int (class=typing #args=1): True if field is present with value inferred to be int
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="is_map"/><h2>is_map</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
is_map (class=typing #args=1): True if argument is a map.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="is_nonempty_map"/><h2>is_nonempty_map</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
is_nonempty_map (class=typing #args=1): True if argument is a map which is non-empty.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="is_not_empty"/><h2>is_not_empty</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
is_not_empty (class=typing #args=1): True if field is present in input with non-empty value, false otherwise.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="is_not_map"/><h2>is_not_map</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
is_not_map (class=typing #args=1): True if argument is not a map.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="is_not_null"/><h2>is_not_null</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
is_not_null (class=typing #args=1): False if argument is null (empty or absent), true otherwise.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="is_null"/><h2>is_null</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
is_null (class=typing #args=1): True if argument is null (empty or absent), false otherwise.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="is_numeric"/><h2>is_numeric</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
is_numeric (class=typing #args=1): True if field is present with value inferred to be int or float
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="is_present"/><h2>is_present</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
is_present (class=typing #args=1): True if field is present in input, false otherwise.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="is_string"/><h2>is_string</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
is_string (class=typing #args=1): True if field is present with string (including empty-string) value
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="joink"/><h2>joink</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
joink (class=maps #args=2): Makes string from map keys. E.g. 'joink($*, ",")'.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="joinkv"/><h2>joinkv</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
joinkv (class=maps #args=3): Makes string from map key-value pairs. E.g. 'joinkv(@v[2], "=", ",")'
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="joinv"/><h2>joinv</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
joinv (class=maps #args=2): Makes string from map values. E.g. 'joinv(mymap, ",")'.
|
|
</pre>
|
|
</div>
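<p/>A minimal Python sketch of the three join functions, using dicts in place of Miller maps. It assumes keys and values are simply converted to strings in map order; Miller's exact number formatting is not modeled.

```python
def joink(m, sep):
    """Join the map's keys with sep."""
    return sep.join(str(k) for k in m)


def joinv(m, sep):
    """Join the map's values with sep."""
    return sep.join(str(v) for v in m.values())


def joinkv(m, pair_sep, field_sep):
    """Join key-value pairs: pair_sep inside a pair, field_sep between pairs."""
    return field_sep.join(f"{k}{pair_sep}{v}" for k, v in m.items())


mymap = {"a": 1, "b": 2, "c": 3}
print(joink(mymap, ","))
print(joinv(mymap, ","))
print(joinkv(mymap, "=", ","))
```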
|
|
|
|
|
|
<a id="leafcount"/><h2>leafcount</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
leafcount (class=maps #args=1): Counts total number of terminal values in hashmap. For single-level maps,
|
|
same as length.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="length"/><h2>length</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
length (class=maps #args=1): Counts number of top-level entries in hashmap. Scalars have length 1.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="log"/><h2>log</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
log (class=math #args=1): Natural (base-e) logarithm.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="log10"/><h2>log10</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
log10 (class=math #args=1): Base-10 logarithm.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="log1p"/><h2>log1p</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
log1p (class=math #args=1): log(1+x).
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="logifit"/><h2>logifit</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
logifit (class=math #args=3): Given m and b from logistic regression, compute
|
|
fit: $yhat=logifit($x,$m,$b).
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="madd"/><h2>madd</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
madd (class=math #args=3): a + b mod m (integers)
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="mapdiff"/><h2>mapdiff</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
mapdiff (class=maps variadic): With 0 args, returns empty map. With 1 arg, returns copy of arg.
|
|
With 2 or more, returns copy of arg 1 with all keys from any of remaining argument maps removed.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="mapexcept"/><h2>mapexcept</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
mapexcept (class=maps variadic): Returns a map with keys from remaining arguments, if any, unset.
|
|
E.g. 'mapexcept({1:2,3:4,5:6}, 1, 5, 7)' is '{3:4}'.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="mapselect"/><h2>mapselect</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
mapselect (class=maps variadic): Returns a map with only keys from remaining arguments set.
|
|
E.g. 'mapselect({1:2,3:4,5:6}, 1, 5, 7)' is '{1:2,5:6}'.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="mapsum"/><h2>mapsum</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
mapsum (class=maps variadic): With 0 args, returns empty map. With >= 1 arg, returns a map with
|
|
key-value pairs from all arguments. Rightmost collisions win, e.g. 'mapsum({1:2,3:4},{1:5})' is '{1:5,3:4}'.
|
|
</pre>
|
|
</div>
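<p/>The behavior described for <tt>mapdiff</tt>, <tt>mapexcept</tt>, <tt>mapselect</tt> and <tt>mapsum</tt> can be sketched with Python dicts. This is an illustration of the documented results, not Miller's code; the Python functions merely reuse the Miller names.

```python
def mapsum(*maps):
    """Union of all maps; on key collisions the rightmost map wins."""
    out = {}
    for m in maps:
        out.update(m)
    return out


def mapdiff(*maps):
    """Copy of the first map minus every key found in the remaining maps."""
    if not maps:
        return {}
    out = dict(maps[0])
    for m in maps[1:]:
        for k in m:
            out.pop(k, None)
    return out


def mapexcept(m, *keys):
    """Copy of m with the listed keys removed; missing keys are ignored."""
    return {k: v for k, v in m.items() if k not in keys}


def mapselect(m, *keys):
    """Copy of m with only the listed keys kept; missing keys are ignored."""
    return {k: v for k, v in m.items() if k in keys}


print(mapsum({1: 2, 3: 4}, {1: 5}))
print(mapexcept({1: 2, 3: 4, 5: 6}, 1, 5, 7))
print(mapselect({1: 2, 3: 4, 5: 6}, 1, 5, 7))
```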
|
|
|
|
|
|
<a id="max"/><h2>max</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
max (class=math variadic): Max of n numbers; null loses
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="mexp"/><h2>mexp</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
mexp (class=math #args=3): a ** b mod m (integers)
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="min"/><h2>min</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
min (class=math variadic): Min of n numbers; null loses
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="mmul"/><h2>mmul</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
mmul (class=math #args=3): a * b mod m (integers)
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="msub"/><h2>msub</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
msub (class=math #args=3): a - b mod m (integers)
|
|
</pre>
|
|
</div>
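<p/>The modular-arithmetic functions <tt>madd</tt>, <tt>msub</tt>, <tt>mmul</tt> and <tt>mexp</tt> read most naturally as "do the operation, then reduce mod m". The Python sketch below follows that reading for integer arguments and assumes a positive modulus; the way Miller handles overflow or negative moduli is not covered.

```python
def madd(a, b, m):
    """(a + b) mod m."""
    return (a + b) % m


def msub(a, b, m):
    """(a - b) mod m; non-negative for m > 0."""
    return (a - b) % m


def mmul(a, b, m):
    """(a * b) mod m."""
    return (a * b) % m


def mexp(a, b, m):
    """(a ** b) mod m, via Python's three-argument pow."""
    return pow(a, b, m)


print(madd(5, 3, 7), msub(5, 6, 7), mmul(3, 4, 5), mexp(2, 10, 7))
```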
|
|
|
|
|
|
<a id="pow"/><h2>pow</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
pow (class=math #args=2): Exponentiation; same as **.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="qnorm"/><h2>qnorm</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
qnorm (class=math #args=1): Normal cumulative distribution function.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="round"/><h2>round</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
round (class=math #args=1): Round to nearest integer.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="roundm"/><h2>roundm</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
roundm (class=math #args=2): Round to nearest multiple of m: roundm($x,$m) is
|
|
the same as round($x/$m)*$m
|
|
</pre>
|
|
</div>
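<p/>The identity given for <tt>roundm</tt> translates directly into Python. One caveat of this sketch: Python's <tt>round</tt> sends exact halves to the nearest even integer, and the text above does not say how Miller treats ties, so only non-tie inputs are shown.

```python
def roundm(x, m):
    """Round x to the nearest multiple of m, as round(x / m) * m."""
    return round(x / m) * m


print(roundm(17, 5))
print(roundm(7.3, 2))
```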
|
|
|
|
|
|
<a id="sec2dhms"/><h2>sec2dhms</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
sec2dhms (class=time #args=1): Formats integer seconds as in sec2dhms(500000)
|
|
= "5d18h53m20s"
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="sec2gmt"/><h2>sec2gmt</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
sec2gmt (class=time #args=1): Formats seconds since epoch (integer part)
|
|
as GMT timestamp, e.g. sec2gmt(1440768801.7) = "2015-08-28T13:33:21Z".
|
|
Leaves non-numbers as-is.
|
|
|
|
sec2gmt (class=time #args=2): Formats seconds since epoch as GMT timestamp with n
|
|
decimal places for seconds, e.g. sec2gmt(1440768801.7,1) = "2015-08-28T13:33:21.7Z".
|
|
Leaves non-numbers as-is.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="sec2gmtdate"/><h2>sec2gmtdate</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
sec2gmtdate (class=time #args=1): Formats seconds since epoch (integer part)
|
|
as GMT timestamp with year-month-date, e.g. sec2gmtdate(1440768801.7) = "2015-08-28".
|
|
Leaves non-numbers as-is.
|
|
</pre>
|
|
</div>
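<p/>The one-argument <tt>sec2gmt</tt> and <tt>sec2gmtdate</tt> examples can be reproduced with Python's <tt>datetime</tt> module. This sketch assumes only what is stated above: the integer part of the input is used, and non-numbers are returned as-is. The two-argument form of <tt>sec2gmt</tt> is not covered.

```python
from datetime import datetime, timezone


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def sec2gmt(secs):
    """Format seconds since the epoch (integer part) as a GMT timestamp."""
    if not _is_number(secs):
        return secs
    moment = datetime.fromtimestamp(int(secs), tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


def sec2gmtdate(secs):
    """Format seconds since the epoch (integer part) as a GMT year-month-day."""
    if not _is_number(secs):
        return secs
    moment = datetime.fromtimestamp(int(secs), tz=timezone.utc)
    return moment.strftime("%Y-%m-%d")


print(sec2gmt(1440768801.7))
print(sec2gmtdate(1440768801.7))
print(sec2gmt("abc"))
```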
|
|
|
|
|
|
<a id="sec2hms"/><h2>sec2hms</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
sec2hms (class=time #args=1): Formats integer seconds as in
|
|
sec2hms(5000) = "01:23:20"
|
|
</pre>
|
|
</div>
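<p/>The integer-seconds conversions (<tt>sec2dhms</tt>, <tt>dhms2sec</tt>, <tt>sec2hms</tt>, <tt>hms2sec</tt>) amount to repeated division by 86400, 3600 and 60. The Python sketch below reproduces the documented examples for non-negative inputs. How Miller pads small fields or omits leading zero units is not stated here, so the two-digit padding in <tt>sec2dhms</tt> is an assumption.

```python
import re


def sec2dhms(secs):
    """Format integer seconds as days, hours, minutes, seconds."""
    d, rem = divmod(secs, 86400)
    h, rem = divmod(rem, 3600)
    m, s = divmod(rem, 60)
    return f"{d}d{h:02d}h{m:02d}m{s:02d}s"  # padding is an assumption


def dhms2sec(text):
    """Recover integer seconds from a string such as '5d18h53m20s'."""
    units = {"d": 86400, "h": 3600, "m": 60, "s": 1}
    return sum(int(n) * units[u] for n, u in re.findall(r"(\d+)([dhms])", text))


def sec2hms(secs):
    """Format integer seconds as HH:MM:SS."""
    h, rem = divmod(secs, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def hms2sec(text):
    """Recover integer seconds from HH:MM:SS."""
    h, m, s = (int(part) for part in text.split(":"))
    return h * 3600 + m * 60 + s


print(sec2dhms(500000), dhms2sec("5d18h53m20s"))
print(sec2hms(5000), hms2sec("01:23:20"))
```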
|
|
|
|
|
|
<a id="sgn"/><h2>sgn</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
sgn (class=math #args=1): +1 for positive input, 0 for zero input, -1 for
|
|
negative input.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="sin"/><h2>sin</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
sin (class=math #args=1): Trigonometric sine.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="sinh"/><h2>sinh</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
sinh (class=math #args=1): Hyperbolic sine.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="splitkv"/><h2>splitkv</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
splitkv (class=maps #args=3): Splits string by separators into map with type inference.
|
|
E.g. 'splitkv("a=1,b=2,c=3", "=", ",")' gives '{"a" : 1, "b" : 2, "c" : 3}'.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="splitkvx"/><h2>splitkvx</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
splitkvx (class=maps #args=3): Splits string by separators into map without type inference (keys and
|
|
values are strings). E.g. 'splitkvx("a=1,b=2,c=3", "=", ",")' gives
|
|
'{"a" : "1", "b" : "2", "c" : "3"}'.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="splitnv"/><h2>splitnv</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
splitnv (class=maps #args=2): Splits string by separator into integer-indexed map with type inference.
|
|
E.g. 'splitnv("a,b,c" , ",")' gives '{1 : "a", 2 : "b", 3 : "c"}'.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="splitnvx"/><h2>splitnvx</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
splitnvx (class=maps #args=2): Splits string by separator into integer-indexed map without type
|
|
inference (values are strings). E.g. 'splitnvx("4,5,6" , ",")' gives '{1 : "4", 2 : "5", 3 : "6"}'.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="sqrt"/><h2>sqrt</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
sqrt (class=math #args=1): Square root.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="strftime"/><h2>strftime</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
strftime (class=time #args=2): Formats seconds since the epoch as timestamp, e.g.
|
|
strftime(1440768801.7,"%Y-%m-%dT%H:%M:%SZ") = "2015-08-28T13:33:21Z", and
|
|
strftime(1440768801.7,"%Y-%m-%dT%H:%M:%3SZ") = "2015-08-28T13:33:21.700Z".
|
|
Format strings are as in the C library (please see "man strftime" on your system),
|
|
with the Miller-specific addition of "%1S" through "%9S" which format the seconds
|
|
with 1 through 9 decimal places, respectively. ("%S" uses no decimal places.)
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="string"/><h2>string</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
string (class=conversion #args=1): Convert int/float/bool/string to string.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="strlen"/><h2>strlen</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
strlen (class=string #args=1): String length.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="strptime"/><h2>strptime</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
strptime (class=time #args=2): Parses timestamp as floating-point seconds since the epoch,
|
|
e.g. strptime("2015-08-28T13:33:21Z","%Y-%m-%dT%H:%M:%SZ") = 1440768801.000000,
|
|
and strptime("2015-08-28T13:33:21.345Z","%Y-%m-%dT%H:%M:%SZ") = 1440768801.345000.
|
|
</pre>
|
|
</div>
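<p/>As a rough illustration of the format strings used by <tt>strftime</tt>
and <tt>strptime</tt>, the sketch below round-trips the timestamp from the
descriptions above. It assumes that <tt>mlr -n</tt> (run without input) together
with an <tt>end</tt> block is available in your Miller version as a way to
evaluate expressions; the local variable name <tt>t</tt> is only illustrative.

<div class="pokipanel">
<pre>
mlr -n put '
  end {
    # Parse the timestamp into seconds since the epoch, then format it back.
    t = strptime("2015-08-28T13:33:21Z", "%Y-%m-%dT%H:%M:%SZ");
    print strftime(t, "%Y-%m-%dT%H:%M:%SZ");
    # The Miller-specific %3S formats the seconds with three decimal places.
    print strftime(1440768801.7, "%Y-%m-%dT%H:%M:%3SZ");
  }
'
</pre>
</div>

<p/>Going by the descriptions above, the first line printed should match the
input timestamp and the second should end in <tt>21.700Z</tt>.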
|
|
|
|
|
|
<a id="sub"/><h2>sub</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
sub (class=string #args=3): Example: '$name=sub($name, "old", "new")'
|
|
(replace once).
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="substr"/><h2>substr</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
substr (class=string #args=3): substr(s,m,n) gives substring of s from 0-up position m to n
|
|
inclusive. Negative indices -len .. -1 alias to 0 .. len-1.
|
|
</pre>
|
|
</div>
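<p/>To make the 0-up, inclusive indexing concrete, here is a small sketch. The
string <tt>"hello"</tt> is just an illustrative value, and running the
expressions via <tt>mlr -n</tt> with an <tt>end</tt> block is an assumption about
your setup.

<div class="pokipanel">
<pre>
mlr -n put '
  end {
    # 0-up positions 1 through 3 of "hello" are e, l, l.
    print substr("hello", 1, 3);
    # Negative indices count from the end: -3 .. -1 alias to 2 .. 4.
    print substr("hello", -3, -1);
  }
'
</pre>
</div>

<p/>By the description above, these should print <tt>ell</tt> and <tt>llo</tt>
respectively.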
|
|
|
|
|
|
<a id="systime"/><h2>systime</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
systime (class=time #args=0): Floating-point seconds since the epoch,
|
|
e.g. 1440768801.748936.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="tan"/><h2>tan</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
tan (class=math #args=1): Trigonometric tangent.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="tanh"/><h2>tanh</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
tanh (class=math #args=1): Hyperbolic tangent.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="tolower"/><h2>tolower</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
tolower (class=string #args=1): Convert string to lowercase.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="toupper"/><h2>toupper</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
toupper (class=string #args=1): Convert string to uppercase.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="typeof"/><h2>typeof</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
typeof (class=conversion #args=1): Convert argument to type of argument (e.g.
|
|
MT_STRING). For debug.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="urand"/><h2>urand</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
urand (class=math #args=0): Floating-point numbers on the unit interval.
|
|
Int-valued example: '$n=floor(20+urand()*11)'.
|
|
</pre>
|
|
</div>
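<p/>The int-valued example above can be tried as follows. This is a sketch
which assumes that <tt>urand()</tt> is always strictly less than 1, so that the
floor lands on an integer from 20 through 30; it also shows <tt>urandint</tt>
(described further down) as a more direct way to ask for the same range.

<div class="pokipanel">
<pre>
mlr -n put '
  end {
    # 20 + urand()*11 lies between 20 and 31, so the floor is 20 through 30.
    print floor(20 + urand() * 11);
    # urandint takes the inclusive integer endpoints directly.
    print urandint(20, 30);
  }
'
</pre>
</div>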
|
|
|
|
|
|
<a id="urand32"/><h2>urand32</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
urand32 (class=math #args=0): Integer uniformly distributed between 0 and 2**32-1
|
|
inclusive.
|
|
</pre>
|
|
</div>
|
|
|
|
|
|
<a id="urandint"/><h2>urandint</h2>
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
urandint (class=math #args=2): Integer uniformly distributed between inclusive
|
|
integer endpoints.
|
|
</pre>
|
|
</div>
|
|
|
|
<!-- ================================================================ -->
|
|
|
|
</div>
|
|
<!-- ================================================================ -->
|
|
<a id="User-defined_functions_and_subroutines"/><h1>User-defined functions and subroutines</h1>
|
|
<button style="font-weight:bold;color:maroon;border:0" padding=0 onclick="toggle_by_name('section_toggle_user_defined_functions');" href="javascript:;">Toggle section visibility</button>
|
|
<div id="section_toggle_user_defined_functions" style="display: block">
|
|
|
|
<p/> As of Miller 5.0.0 you can define your own functions, as well as subroutines.
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="User-defined_functions"/><h2>User-defined functions</h2>
|
|
|
|
<p/>Here&rsquo;s the obligatory example of a recursive function, which computes the factorial:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --opprint --from data/small put '
|
|
func f(n) {
|
|
if (is_numeric(n)) {
|
|
if (n > 0) {
|
|
return n * f(n-1);
|
|
} else {
|
|
return 1;
|
|
}
|
|
}
|
|
# implicitly return absent-null if non-numeric
|
|
}
|
|
$ox = f($x + NR);
|
|
$oi = f($i);
|
|
'
|
|
a b i x y ox oi
|
|
pan pan 1 0.3467901443380824 0.7268028627434533 0.467054 1
|
|
eks pan 2 0.7586799647899636 0.5221511083334797 3.680838 2
|
|
wye wye 3 0.20460330576630303 0.33831852551664776 1.741251 6
|
|
eks wye 4 0.38139939387114097 0.13418874328430463 18.588349 24
|
|
wye pan 5 0.5732889198020006 0.8636244699032729 211.387310 120
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>Properties of user-defined functions:
|
|
|
|
<ul>
|
|
|
|
<li/> Function bodies start with <tt>func</tt> and a parameter list, defined
|
|
outside of <tt>begin</tt>, <tt>end</tt>, or other <tt>func</tt> or
|
|
<tt>subr</tt> blocks. (I.e. the Miller DSL has no nested functions.)
|
|
|
|
<li/> A function (identified uniquely by its name) may not be redefined: either by
|
|
redefining a user-defined function, or by redefining a built-in function.
|
|
However, functions and subroutines have separate namespaces: you can define a
|
|
subroutine <tt>log</tt> which does not clash with the mathematical <tt>log</tt>
|
|
function.
|
|
|
|
<li/> Functions may be defined either before or after use (there is an
|
|
object-binding/linkage step at startup). More specifically, functions may be
|
|
either recursive or mutually recursive. Functions may not call subroutines.
|
|
|
|
<li/> Functions may be defined and called either within <tt>mlr put</tt> or
|
|
<tt>mlr filter</tt>.
|
|
|
|
<li/> Functions have read access to <tt>$</tt>-variables and
|
|
<tt>@</tt>-variables but may not modify them.
|
|
See also
|
|
<a href="cookbook.html#Memoization_with_out-of-stream_variables">this cookbook item</a> for an example.
|
|
|
|
<li/> Argument values may be reassigned: they are not read-only.
|
|
|
|
<li/> When a return value is not explicitly returned, this results in a return
|
|
value of absent-null. (In the example above, if there were records for which
|
|
the argument to <tt>f</tt> is non-numeric, the assignments would be skipped.)
|
|
See also the section on
|
|
<a href="#Null_data:_empty_and_absent">empty and absent null data</a>.
|
|
|
|
<li/> See the section on <a href="#Local_variables">local variables</a> for
|
|
information on scope and extent of arguments, as well as for information on the
|
|
use of local variables within functions.
|
|
|
|
<li/> See the section on <a href="#Expressions_from_files">expressions from
|
|
files</a> for information on the use of <tt>-f</tt> and <tt>-e</tt> flags.
|
|
|
|
</ul>
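<p/>As a minimal sketch of the define-after-use property listed above, the
following calls a function before its definition appears in the expression.
The function name <tt>triple</tt> is hypothetical, and the use of <tt>mlr
-n</tt> with an <tt>end</tt> block to run without input is an assumption about
your Miller version.

<div class="pokipanel">
<pre>
mlr -n put '
  end {
    # The function is called here, before its definition below.
    print triple(7);
  }
  func triple(n) {
    return 3 * n;
  }
'
</pre>
</div>

<p/>If the linkage step works as described, this should print <tt>21</tt>.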
|
|
|
|
<!-- ================================================================ -->
|
|
<a id="User-defined_subroutines"/><h2>User-defined subroutines</h2>
|
|
|
|
<p/>Example:
|
|
|
|
<p/>
|
|
<div class="pokipanel">
|
|
<pre>
|
|
$ mlr --opprint --from data/small put -q '
|
|
begin {
|
|
@call_count = 0;
|
|
}
|
|
subr s(n) {
|
|
@call_count += 1;
|
|
if (is_numeric(n)) {
|
|
if (n > 1) {
|
|
call s(n-1);
|
|
} else {
|
|
print "numcalls=" . @call_count;
|
|
}
|
|
}
|
|
}
|
|
print "NR=" . NR;
|
|
call s(NR);
|
|
'
|
|
NR=1
|
|
numcalls=1
|
|
NR=2
|
|
numcalls=3
|
|
NR=3
|
|
numcalls=6
|
|
NR=4
|
|
numcalls=10
|
|
NR=5
|
|
numcalls=15
|
|
</pre>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>Properties of user-defined subroutines:
|
|
|
|
<ul>
|
|
|
|
<li/> Subroutine bodies start with <tt>subr</tt> and a parameter list, defined
|
|
outside of <tt>begin</tt>, <tt>end</tt>, or other <tt>func</tt> or
|
|
<tt>subr</tt> blocks. (I.e. the Miller DSL has no nested subroutines.)
|
|
|
|
<li/> A subroutine (identified uniquely by its name) may not be redefined.
|
|
However, functions and subroutines have separate namespaces: you can define a
|
|
subroutine <tt>log</tt> which does not clash with the mathematical <tt>log</tt>
|
|
function.
|
|
|
|
<li/> Subroutines may be defined either before or after use (there is an
|
|
object-binding/linkage step at startup). More specifically, subroutines may be
|
|
either recursive or mutually recursive. Subroutines may call functions.
|
|
|
|
<li/> Subroutines may be defined and called either within <tt>mlr put</tt> or
|
|
<tt>mlr filter</tt>.
|
|
|
|
<li/> Subroutines have read/write access to <tt>$</tt>-variables and
|
|
<tt>@</tt>-variables.
|
|
|
|
<li/> Argument values may be reassigned: they are not read-only.
|
|
|
|
<li/> See the section on <a href="#Local_variables">local variables</a> for
|
|
information on scope and extent of arguments, as well as for information on the
|
|
use of local variables within subroutines.
|
|
|
|
<li/> See the section on <a href="#Expressions_from_files">expressions from
|
|
files</a> for information on the use of <tt>-f</tt> and <tt>-e</tt> flags.
|
|
|
|
</ul>
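<p/>The following sketch illustrates two of the properties above: a
subroutine calling a function, and a subroutine writing to a
<tt>$</tt>-variable. The names <tt>twice</tt>, <tt>set_twice</tt>, and the
output field <tt>i2</tt> are hypothetical; <tt>data/small</tt> is the same input
file used in the examples above.

<div class="pokipanel">
<pre>
mlr --opprint --from data/small put '
  func twice(n) {
    return 2 * n;
  }
  subr set_twice(n) {
    # Subroutines may call functions and may assign to $-variables.
    $i2 = twice(n);
  }
  call set_twice($i);
'
</pre>
</div>

<p/>Each output record should then carry a new field <tt>i2</tt> holding twice
the value of <tt>i</tt>.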
|
|
|
|
</div>
|
|
<!-- ================================================================ -->
|
|
<a id="Errors_and_transparency"/><h1>Errors and transparency</h1>
|
|
<button style="font-weight:bold;color:maroon;border:0" padding=0 onclick="toggle_by_name('section_toggle_transparency');" href="javascript:;">Toggle section visibility</button>
|
|
<div id="section_toggle_transparency" style="display: block">
|
|
|
|
<p/>As soon as you have a programming language, you start having the problem
|
|
<i>What is my code doing, and why?</i> This includes getting syntax errors
|
|
— which are always annoying — as well as the even more annoying
|
|
problem of a program which parses without syntax error but doesn’t do
|
|
what you expect.
|
|
|
|
<p/> The <tt>syntax error</tt> message is cryptic: it says <tt>syntax error at
|
|
</tt> followed by the next symbol it couldn’t parse. This is good, but
|
|
(as of 5.0.0) it doesn’t say things like <tt>syntax error at line 17,
|
|
character 22</tt>. Here are some common causes of syntax errors:
|
|
|
|
<ul>
|
|
|
|
<li/> Don’t forget <tt>;</tt> at end of line, before another statement on
|
|
the next line.
|
|
|
|
<li/> Miller’s DSL lacks the <tt>++</tt> and <tt>--</tt> operators.
|
|
|
|
<li/> Curly braces are required for the bodies of
|
|
<tt>if</tt>/<tt>while</tt>/<tt>for</tt> blocks, even when the body is a single
|
|
statement.
|
|
|
|
</ul>
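<p/>A short sketch which avoids all three pitfalls is shown here. The field
name <tt>i</tt> comes from <tt>data/small</tt> as used earlier in this document;
the particular condition and increment are only illustrative.

<div class="pokipanel">
<pre>
mlr --opprint --from data/small put '
  # Curly braces are kept even though the body is a single statement.
  if (NR == 2) {
    # Use += since the DSL has no ++ operator; end the statement with a semicolon.
    $i += 10;
  }
  $z = $a . "_" . $b;
'
</pre>
</div>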
|
|
|
|
<p/>Now for transparency:
|
|
|
|
<ul>
|
|
|
|
<li/>As in any language, you can do
|
|
<a href="#Print_statements"><tt>print</tt></a> (or <tt>eprint</tt> to print to
|
|
stderr). See also <a href="#Dump_statements"><tt>dump</tt></a> and <a
|
|
href="#Emit_statements"><tt>emit</tt></a>.
|
|
|
|
<li/> The <tt>-v</tt> option to <tt>mlr put</tt> and <tt>mlr filter</tt> prints
|
|
abstract syntax trees for your code. While not all details here will be of
|
|
interest to everyone, certainly this makes questions such as operator
|
|
precedence completely unambiguous.
|
|
|
|
<li/> The <tt>-T</tt> option prints a trace of each statement executed.
|
|
|
|
<li/> The <tt>-t</tt> and <tt>-a</tt> options show low-level details for the
|
|
parsing process and for stack-variable-index allocation, respectively. These
|
|
will likely be of interest to people who enjoy compilers, and probably less
|
|
useful for a more general audience.
|
|
|
|
<li/> Please see the <a href="#Type-checking">type-checking section</a> for
|
|
type declarations and type-assertions you can use to make sure expressions and
|
|
the data flowing through them are evaluating as you expect. I made them optional
|
|
because one of Miller’s important use-cases is being able to say simple
|
|
things like <tt>mlr put '$y = $x + 1' myfile.dat</tt> with a minimum of
|
|
punctuational bric-a-brac — but for programs over a few lines I generally
|
|
find that the more type-specification, the better.
|
|
|
|
</ul>
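<p/>For instance, the <tt>-v</tt> option described above can be tried on a
one-line expression without any input data. This sketch assumes <tt>mlr -n</tt>
is available to skip input processing; the printed syntax tree is not
reproduced here since its exact layout may differ between Miller versions.

<div class="pokipanel">
<pre>
# Print the abstract syntax tree for the expression; no records are read.
mlr -n put -v '$y = 2 + $x'
</pre>
</div>

<p/>Reading the tree is a quick way to confirm how an expression was grouped
before running it against real data.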
|
|
|
|
</div>
|
|
<!-- ================================================================ -->
|
|
<a id="A_note_on_the_complexity_of_Miller’s_expression_language"/><h1>A note on the complexity of Miller’s expression language</h1>
|
|
<button style="font-weight:bold;color:maroon;border:0" padding=0 onclick="toggle_by_name('section_toggle_a_note_on_complexity');" href="javascript:;">Toggle section visibility</button>
|
|
<div id="section_toggle_a_note_on_complexity" style="display: block">
|
|
|
|
<p/> One of Miller’s strengths is its brevity: it’s much quicker
|
|
— and less error-prone — to type <tt>mlr stats1 -a sum -f x,y -g
|
|
a,b</tt> than having to track summation variables as in <tt>awk</tt>, or using
|
|
Miller’s out-of-stream variables. And the more language features
|
|
Miller’s put-DSL has (for-loops, if-statements, nested control
|
|
structures, user-defined functions, etc.), the <i>less</i> powerful it
|
|
begins to seem: because of the other programming-language features it
|
|
<i>doesn&rsquo;t</i> have (classes, exceptions, and so on).
|
|
|
|
<p/> When I was originally prototyping Miller in 2015, the decision I had was
|
|
whether to hand-code in a low-level language like C or Rust, with my own
|
|
hand-rolled DSL, or whether to use a higher-level language (like Python or Lua
|
|
or Nim) and let the <tt>put</tt> statements be handled by the implementation
|
|
language’s own <tt>eval</tt>: the implementation language would take the
|
|
place of a DSL. Multiple performance experiments showed me I could get better
|
|
throughput using the former, and using C in particular — by a wide margin. So
|
|
Miller is C under the hood with a hand-rolled DSL.
|
|
|
|
<p/> I do want to keep focusing on what Miller is good at — concise
|
|
notation, low latency, and high throughput — and not add too much in
|
|
terms of high-level-language features to the DSL. That said, some sort of
|
|
customizability is a basic thing to want. As of 4.1.0 we have recursive
|
|
for/while/if structures on about the same complexity level as <tt>awk</tt>; as
|
|
of 5.0.0 we have user-defined functions and map-valued variables, again on
|
|
about the same complexity level as <tt>awk</tt> along with optional
|
|
type-declaration syntax. While I’m excited by these powerful language
|
|
features, I hope to keep new features beyond 5.0.0 focused on Miller’s
|
|
sweet spot which is speed plus simplicity.
|
|
|
|
</div>
|
|
</div>
|
|
</td>
|
|
|
|
</table>
|
|
</body>
|
|
</html>
|