iterating

This commit is contained in:
John Kerl 2021-05-24 22:02:43 -04:00 committed by John Kerl
parent 2eef6d852c
commit 01ce429a90
124 changed files with 32 additions and 51094 deletions

1
.gitignore vendored

@@ -82,6 +82,7 @@ push2
data/.gitignore
docs/_build
docs6/_build
c/mlr.static
miller-*.src.rpm


@@ -52,7 +52,7 @@ but it can also do format conversion (here, you can pretty-print in tabular form
yellow circle 1 87 63.5058 8.3350
purple square 0 91 72.3735 8.2430
-``mlr head`` and ``mlr tail`` count records rather than lines. Whethere you're getting the first few records or the last few, the CSV header is included either way::
+``mlr head`` and ``mlr tail`` count records rather than lines. Whether you're getting the first few records or the last few, the CSV header is included either way::
$ mlr --csv head -n 4 example.csv
color,shape,flag,index,quantity,rate
@@ -208,12 +208,6 @@ OK, CSV and pretty-print are fine. But Miller can also convert between a few oth
{ "color": "yellow", "shape": "CIRCLE", "flag": 1, "index": 87, "quantity": 63.5058, "rate": 8.3350, "ratio": 7.619172 }
{ "color": "purple", "shape": "SQUARE", "flag": 0, "index": 91, "quantity": 72.3735, "rate": 8.2430, "ratio": 8.779995 }
Or, JSON output with vertical-formatting flags::
$ mlr --icsv --ojson tail -n 2 example.csv
{ "color": "yellow", "shape": "circle", "flag": 1, "index": 87, "quantity": 63.5058, "rate": 8.3350 }
{ "color": "purple", "shape": "square", "flag": 0, "index": 91, "quantity": 72.3735, "rate": 8.2430 }
Sorts and stats
^^^^^^^^^^^^^^^


@@ -16,7 +16,7 @@ but it can also do format conversion (here, you can pretty-print in tabular form
POKI_RUN_COMMAND{{mlr --icsv --opprint cat example.csv}}HERE
-``mlr head`` and ``mlr tail`` count records rather than lines. Whethere you're getting the first few records or the last few, the CSV header is included either way::
+``mlr head`` and ``mlr tail`` count records rather than lines. Whether you're getting the first few records or the last few, the CSV header is included either way::
POKI_RUN_COMMAND{{mlr --csv head -n 4 example.csv}}HERE
@@ -67,10 +67,6 @@ OK, CSV and pretty-print are fine. But Miller can also convert between a few oth
POKI_RUN_COMMAND{{mlr --icsv --ojson put '$ratio = $quantity/$rate; $shape = toupper($shape)' example.csv}}HERE
Or, JSON output with vertical-formatting flags::
POKI_RUN_COMMAND{{mlr --icsv --ojson tail -n 2 example.csv}}HERE
Sorts and stats
^^^^^^^^^^^^^^^

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.


@@ -1,4 +0,0 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: e016f5df2654b2ea86739097b17e0daf
tags: 645f666f9bcd5a90fca523b33c5a78b7


@@ -1,495 +0,0 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Miller in 10 minutes &#8212; Miller 5.10.2 documentation</title>
<link rel="stylesheet" href="_static/classic.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/language_data.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Unix-toolkit context" href="feature-comparison.html" />
<link rel="prev" title="Features" href="features.html" />
</head><body>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="feature-comparison.html" title="Unix-toolkit context"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="features.html" title="Features"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Miller in 10 minutes</a></li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="miller-in-10-minutes">
<h1>Miller in 10 minutes<a class="headerlink" href="#miller-in-10-minutes" title="Permalink to this headline"></a></h1>
<div class="section" id="csv-file-examples">
<h2>CSV-file examples<a class="headerlink" href="#csv-file-examples" title="Permalink to this headline"></a></h2>
<p>Suppose you have this CSV data file:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ cat example.csv
color,shape,flag,index,quantity,rate
yellow,triangle,1,11,43.6498,9.8870
red,square,1,15,79.2778,0.0130
red,circle,1,16,13.8103,2.9010
red,square,0,48,77.5542,7.4670
purple,triangle,0,51,81.2290,8.5910
red,square,0,64,77.1991,9.5310
purple,triangle,0,65,80.1405,5.8240
yellow,circle,1,73,63.9785,4.2370
yellow,circle,1,87,63.5058,8.3350
purple,square,0,91,72.3735,8.2430
</pre></div>
</div>
<p><code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">cat</span></code> is like cat in that it passes the data through unmodified:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --csv cat example.csv
color,shape,flag,index,quantity,rate
yellow,triangle,1,11,43.6498,9.8870
red,square,1,15,79.2778,0.0130
red,circle,1,16,13.8103,2.9010
red,square,0,48,77.5542,7.4670
purple,triangle,0,51,81.2290,8.5910
red,square,0,64,77.1991,9.5310
purple,triangle,0,65,80.1405,5.8240
yellow,circle,1,73,63.9785,4.2370
yellow,circle,1,87,63.5058,8.3350
purple,square,0,91,72.3735,8.2430
</pre></div>
</div>
<p>but it can also do format conversion (here, you can pretty-print in tabular format):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --icsv --opprint cat example.csv
color shape flag index quantity rate
yellow triangle 1 11 43.6498 9.8870
red square 1 15 79.2778 0.0130
red circle 1 16 13.8103 2.9010
red square 0 48 77.5542 7.4670
purple triangle 0 51 81.2290 8.5910
red square 0 64 77.1991 9.5310
purple triangle 0 65 80.1405 5.8240
yellow circle 1 73 63.9785 4.2370
yellow circle 1 87 63.5058 8.3350
purple square 0 91 72.3735 8.2430
</pre></div>
</div>
<p><code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">head</span></code> and <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">tail</span></code> count records rather than lines. Whethere you're getting the first few records or the last few, the CSV header is included either way:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --csv head -n 4 example.csv
color,shape,flag,index,quantity,rate
yellow,triangle,1,11,43.6498,9.8870
red,square,1,15,79.2778,0.0130
red,circle,1,16,13.8103,2.9010
red,square,0,48,77.5542,7.4670
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --csv tail -n 4 example.csv
color,shape,flag,index,quantity,rate
purple,triangle,0,65,80.1405,5.8240
yellow,circle,1,73,63.9785,4.2370
yellow,circle,1,87,63.5058,8.3350
purple,square,0,91,72.3735,8.2430
</pre></div>
</div>
<p>You can sort primarily alphabetically on one field, then secondarily numerically descending on another field:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --icsv --opprint sort -f shape -nr index example.csv
color shape flag index quantity rate
yellow circle 1 87 63.5058 8.3350
yellow circle 1 73 63.9785 4.2370
red circle 1 16 13.8103 2.9010
purple square 0 91 72.3735 8.2430
red square 0 64 77.1991 9.5310
red square 0 48 77.5542 7.4670
red square 1 15 79.2778 0.0130
purple triangle 0 65 80.1405 5.8240
purple triangle 0 51 81.2290 8.5910
yellow triangle 1 11 43.6498 9.8870
</pre></div>
</div>
<p>You can use <code class="docutils literal notranslate"><span class="pre">cut</span></code> to retain only specified fields, in the same order they appeared in the input data:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --icsv --opprint cut -f flag,shape example.csv
shape flag
triangle 1
square 1
circle 1
square 0
triangle 0
square 0
triangle 0
circle 1
circle 1
square 0
</pre></div>
</div>
<p>You can also use <code class="docutils literal notranslate"><span class="pre">cut</span> <span class="pre">-o</span></code> to retain only specified fields in your preferred order:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --icsv --opprint cut -o -f flag,shape example.csv
flag shape
1 triangle
1 square
1 circle
0 square
0 triangle
0 square
0 triangle
1 circle
1 circle
0 square
</pre></div>
</div>
<p>You can use <code class="docutils literal notranslate"><span class="pre">cut</span> <span class="pre">-x</span></code> to omit fields you don't care about:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --icsv --opprint cut -x -f flag,shape example.csv
color index quantity rate
yellow 11 43.6498 9.8870
red 15 79.2778 0.0130
red 16 13.8103 2.9010
red 48 77.5542 7.4670
purple 51 81.2290 8.5910
red 64 77.1991 9.5310
purple 65 80.1405 5.8240
yellow 73 63.9785 4.2370
yellow 87 63.5058 8.3350
purple 91 72.3735 8.2430
</pre></div>
</div>
<p>You can use <code class="docutils literal notranslate"><span class="pre">filter</span></code> to keep only records you care about:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --icsv --opprint filter &#39;$color == &quot;red&quot;&#39; example.csv
color shape flag index quantity rate
red square 1 15 79.2778 0.0130
red circle 1 16 13.8103 2.9010
red square 0 48 77.5542 7.4670
red square 0 64 77.1991 9.5310
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --icsv --opprint filter &#39;$color == &quot;red&quot; &amp;&amp; $flag == 1&#39; example.csv
color shape flag index quantity rate
red square 1 15 79.2778 0.0130
red circle 1 16 13.8103 2.9010
</pre></div>
</div>
<p>You can use <code class="docutils literal notranslate"><span class="pre">put</span></code> to create new fields which are computed from other fields:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --icsv --opprint put &#39;$ratio = $quantity / $rate; $color_shape = $color . &quot;_&quot; . $shape&#39; example.csv
color shape flag index quantity rate ratio color_shape
yellow triangle 1 11 43.6498 9.8870 4.414868 yellow_triangle
red square 1 15 79.2778 0.0130 6098.292308 red_square
red circle 1 16 13.8103 2.9010 4.760531 red_circle
red square 0 48 77.5542 7.4670 10.386260 red_square
purple triangle 0 51 81.2290 8.5910 9.455127 purple_triangle
red square 0 64 77.1991 9.5310 8.099790 red_square
purple triangle 0 65 80.1405 5.8240 13.760388 purple_triangle
yellow circle 1 73 63.9785 4.2370 15.099953 yellow_circle
yellow circle 1 87 63.5058 8.3350 7.619172 yellow_circle
purple square 0 91 72.3735 8.2430 8.779995 purple_square
</pre></div>
</div>
<p>Even though Miller's main selling point is name-indexing, sometimes you really want to refer to a field name by its positional index. Use <code class="docutils literal notranslate"><span class="pre">$[[3]]</span></code> to access the name of field 3 or <code class="docutils literal notranslate"><span class="pre">$[[[3]]]</span></code> to access the value of field 3:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --icsv --opprint put &#39;$[[3]] = &quot;NEW&quot;&#39; example.csv
color shape NEW index quantity rate
yellow triangle 1 11 43.6498 9.8870
red square 1 15 79.2778 0.0130
red circle 1 16 13.8103 2.9010
red square 0 48 77.5542 7.4670
purple triangle 0 51 81.2290 8.5910
red square 0 64 77.1991 9.5310
purple triangle 0 65 80.1405 5.8240
yellow circle 1 73 63.9785 4.2370
yellow circle 1 87 63.5058 8.3350
purple square 0 91 72.3735 8.2430
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --icsv --opprint put &#39;$[[[3]]] = &quot;NEW&quot;&#39; example.csv
color shape flag index quantity rate
yellow triangle NEW 11 43.6498 9.8870
red square NEW 15 79.2778 0.0130
red circle NEW 16 13.8103 2.9010
red square NEW 48 77.5542 7.4670
purple triangle NEW 51 81.2290 8.5910
red square NEW 64 77.1991 9.5310
purple triangle NEW 65 80.1405 5.8240
yellow circle NEW 73 63.9785 4.2370
yellow circle NEW 87 63.5058 8.3350
purple square NEW 91 72.3735 8.2430
</pre></div>
</div>
</div>
<div class="section" id="json-file-examples">
<h2>JSON-file examples<a class="headerlink" href="#json-file-examples" title="Permalink to this headline"></a></h2>
<p>OK, CSV and pretty-print are fine. But Miller can also convert between a few other formats. Let's take a look at JSON output:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --icsv --ojson put &#39;$ratio = $quantity/$rate; $shape = toupper($shape)&#39; example.csv
{ &quot;color&quot;: &quot;yellow&quot;, &quot;shape&quot;: &quot;TRIANGLE&quot;, &quot;flag&quot;: 1, &quot;index&quot;: 11, &quot;quantity&quot;: 43.6498, &quot;rate&quot;: 9.8870, &quot;ratio&quot;: 4.414868 }
{ &quot;color&quot;: &quot;red&quot;, &quot;shape&quot;: &quot;SQUARE&quot;, &quot;flag&quot;: 1, &quot;index&quot;: 15, &quot;quantity&quot;: 79.2778, &quot;rate&quot;: 0.0130, &quot;ratio&quot;: 6098.292308 }
{ &quot;color&quot;: &quot;red&quot;, &quot;shape&quot;: &quot;CIRCLE&quot;, &quot;flag&quot;: 1, &quot;index&quot;: 16, &quot;quantity&quot;: 13.8103, &quot;rate&quot;: 2.9010, &quot;ratio&quot;: 4.760531 }
{ &quot;color&quot;: &quot;red&quot;, &quot;shape&quot;: &quot;SQUARE&quot;, &quot;flag&quot;: 0, &quot;index&quot;: 48, &quot;quantity&quot;: 77.5542, &quot;rate&quot;: 7.4670, &quot;ratio&quot;: 10.386260 }
{ &quot;color&quot;: &quot;purple&quot;, &quot;shape&quot;: &quot;TRIANGLE&quot;, &quot;flag&quot;: 0, &quot;index&quot;: 51, &quot;quantity&quot;: 81.2290, &quot;rate&quot;: 8.5910, &quot;ratio&quot;: 9.455127 }
{ &quot;color&quot;: &quot;red&quot;, &quot;shape&quot;: &quot;SQUARE&quot;, &quot;flag&quot;: 0, &quot;index&quot;: 64, &quot;quantity&quot;: 77.1991, &quot;rate&quot;: 9.5310, &quot;ratio&quot;: 8.099790 }
{ &quot;color&quot;: &quot;purple&quot;, &quot;shape&quot;: &quot;TRIANGLE&quot;, &quot;flag&quot;: 0, &quot;index&quot;: 65, &quot;quantity&quot;: 80.1405, &quot;rate&quot;: 5.8240, &quot;ratio&quot;: 13.760388 }
{ &quot;color&quot;: &quot;yellow&quot;, &quot;shape&quot;: &quot;CIRCLE&quot;, &quot;flag&quot;: 1, &quot;index&quot;: 73, &quot;quantity&quot;: 63.9785, &quot;rate&quot;: 4.2370, &quot;ratio&quot;: 15.099953 }
{ &quot;color&quot;: &quot;yellow&quot;, &quot;shape&quot;: &quot;CIRCLE&quot;, &quot;flag&quot;: 1, &quot;index&quot;: 87, &quot;quantity&quot;: 63.5058, &quot;rate&quot;: 8.3350, &quot;ratio&quot;: 7.619172 }
{ &quot;color&quot;: &quot;purple&quot;, &quot;shape&quot;: &quot;SQUARE&quot;, &quot;flag&quot;: 0, &quot;index&quot;: 91, &quot;quantity&quot;: 72.3735, &quot;rate&quot;: 8.2430, &quot;ratio&quot;: 8.779995 }
</pre></div>
</div>
<p>Or, JSON output with vertical-formatting flags:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --icsv --ojson tail -n 2 example.csv
{ &quot;color&quot;: &quot;yellow&quot;, &quot;shape&quot;: &quot;circle&quot;, &quot;flag&quot;: 1, &quot;index&quot;: 87, &quot;quantity&quot;: 63.5058, &quot;rate&quot;: 8.3350 }
{ &quot;color&quot;: &quot;purple&quot;, &quot;shape&quot;: &quot;square&quot;, &quot;flag&quot;: 0, &quot;index&quot;: 91, &quot;quantity&quot;: 72.3735, &quot;rate&quot;: 8.2430 }
</pre></div>
</div>
</div>
<div class="section" id="sorts-and-stats">
<h2>Sorts and stats<a class="headerlink" href="#sorts-and-stats" title="Permalink to this headline"></a></h2>
<p>Now suppose you want to sort the data on a given column, <em>and then</em> take the top few in that ordering. You can use Miller's <code class="docutils literal notranslate"><span class="pre">then</span></code> feature to pipe commands together.</p>
<p>Here are the records with the top three <code class="docutils literal notranslate"><span class="pre">index</span></code> values:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --icsv --opprint sort -f shape -nr index then head -n 3 example.csv
color shape flag index quantity rate
yellow circle 1 87 63.5058 8.3350
yellow circle 1 73 63.9785 4.2370
red circle 1 16 13.8103 2.9010
</pre></div>
</div>
<p>Lots of Miller commands take a <code class="docutils literal notranslate"><span class="pre">-g</span></code> option for group-by: here, <code class="docutils literal notranslate"><span class="pre">head</span> <span class="pre">-n</span> <span class="pre">1</span> <span class="pre">-g</span> <span class="pre">shape</span></code> outputs the first record for each distinct value of the <code class="docutils literal notranslate"><span class="pre">shape</span></code> field. This means we're finding the record with highest <code class="docutils literal notranslate"><span class="pre">index</span></code> field for each distinct <code class="docutils literal notranslate"><span class="pre">shape</span></code> field:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --icsv --opprint sort -f shape -nr index then head -n 1 -g shape example.csv
color shape flag index quantity rate
yellow circle 1 87 63.5058 8.3350
purple square 0 91 72.3735 8.2430
purple triangle 0 65 80.1405 5.8240
</pre></div>
</div>
<p>Statistics can be computed with or without group-by field(s):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --icsv --opprint --from example.csv stats1 -a count,min,mean,max -f quantity -g shape
shape quantity_count quantity_min quantity_mean quantity_max
triangle 3 43.649800 68.339767 81.229000
square 4 72.373500 76.601150 79.277800
circle 3 13.810300 47.098200 63.978500
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --icsv --opprint --from example.csv stats1 -a count,min,mean,max -f quantity -g shape,color
shape color quantity_count quantity_min quantity_mean quantity_max
triangle yellow 1 43.649800 43.649800 43.649800
square red 3 77.199100 78.010367 79.277800
circle red 1 13.810300 13.810300 13.810300
triangle purple 2 80.140500 80.684750 81.229000
circle yellow 2 63.505800 63.742150 63.978500
square purple 1 72.373500 72.373500 72.373500
</pre></div>
</div>
<p>If your output has a lot of columns, you can use XTAB format to line things up vertically for you instead:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --icsv --oxtab --from example.csv stats1 -a p0,p10,p25,p50,p75,p90,p99,p100 -f rate
rate_p0 0.013000
rate_p10 2.901000
rate_p25 4.237000
rate_p50 8.243000
rate_p75 8.591000
rate_p90 9.887000
rate_p99 9.887000
rate_p100 9.887000
</pre></div>
</div>
</div>
<div class="section" id="choices-for-printing-to-files">
<span id="min-choices-for-printing-to-files"></span><h2>Choices for printing to files<a class="headerlink" href="#choices-for-printing-to-files" title="Permalink to this headline"></a></h2>
<p>Often we want to print output to the screen. Miller does this by default, as we've seen in the previous examples.</p>
<p>Sometimes we want to print output to another file: just use <strong>&gt; outputfilenamegoeshere</strong> at the end of your command:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="o">%</span> <span class="n">mlr</span> <span class="o">--</span><span class="n">icsv</span> <span class="o">--</span><span class="n">opprint</span> <span class="n">cat</span> <span class="n">example</span><span class="o">.</span><span class="n">csv</span> <span class="o">&gt;</span> <span class="n">newfile</span><span class="o">.</span><span class="n">csv</span>
<span class="c1"># Output goes to the new file;</span>
<span class="c1"># nothing is printed to the screen.</span>
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="o">%</span> <span class="n">cat</span> <span class="n">newfile</span><span class="o">.</span><span class="n">csv</span>
<span class="n">color</span> <span class="n">shape</span> <span class="n">flag</span> <span class="n">index</span> <span class="n">quantity</span> <span class="n">rate</span>
<span class="n">yellow</span> <span class="n">triangle</span> <span class="mi">1</span> <span class="mi">11</span> <span class="mf">43.6498</span> <span class="mf">9.8870</span>
<span class="n">red</span> <span class="n">square</span> <span class="mi">1</span> <span class="mi">15</span> <span class="mf">79.2778</span> <span class="mf">0.0130</span>
<span class="n">red</span> <span class="n">circle</span> <span class="mi">1</span> <span class="mi">16</span> <span class="mf">13.8103</span> <span class="mf">2.9010</span>
<span class="n">red</span> <span class="n">square</span> <span class="mi">0</span> <span class="mi">48</span> <span class="mf">77.5542</span> <span class="mf">7.4670</span>
<span class="n">purple</span> <span class="n">triangle</span> <span class="mi">0</span> <span class="mi">51</span> <span class="mf">81.2290</span> <span class="mf">8.5910</span>
<span class="n">red</span> <span class="n">square</span> <span class="mi">0</span> <span class="mi">64</span> <span class="mf">77.1991</span> <span class="mf">9.5310</span>
<span class="n">purple</span> <span class="n">triangle</span> <span class="mi">0</span> <span class="mi">65</span> <span class="mf">80.1405</span> <span class="mf">5.8240</span>
<span class="n">yellow</span> <span class="n">circle</span> <span class="mi">1</span> <span class="mi">73</span> <span class="mf">63.9785</span> <span class="mf">4.2370</span>
<span class="n">yellow</span> <span class="n">circle</span> <span class="mi">1</span> <span class="mi">87</span> <span class="mf">63.5058</span> <span class="mf">8.3350</span>
<span class="n">purple</span> <span class="n">square</span> <span class="mi">0</span> <span class="mi">91</span> <span class="mf">72.3735</span> <span class="mf">8.2430</span>
</pre></div>
</div>
<p>Other times we just want our files to be <strong>changed in-place</strong>: just use <strong>mlr -I</strong>:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="o">%</span> <span class="n">cp</span> <span class="n">example</span><span class="o">.</span><span class="n">csv</span> <span class="n">newfile</span><span class="o">.</span><span class="n">txt</span>
<span class="o">%</span> <span class="n">cat</span> <span class="n">newfile</span><span class="o">.</span><span class="n">txt</span>
<span class="n">color</span><span class="p">,</span><span class="n">shape</span><span class="p">,</span><span class="n">flag</span><span class="p">,</span><span class="n">index</span><span class="p">,</span><span class="n">quantity</span><span class="p">,</span><span class="n">rate</span>
<span class="n">yellow</span><span class="p">,</span><span class="n">triangle</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">11</span><span class="p">,</span><span class="mf">43.6498</span><span class="p">,</span><span class="mf">9.8870</span>
<span class="n">red</span><span class="p">,</span><span class="n">square</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">15</span><span class="p">,</span><span class="mf">79.2778</span><span class="p">,</span><span class="mf">0.0130</span>
<span class="n">red</span><span class="p">,</span><span class="n">circle</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">16</span><span class="p">,</span><span class="mf">13.8103</span><span class="p">,</span><span class="mf">2.9010</span>
<span class="n">red</span><span class="p">,</span><span class="n">square</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">48</span><span class="p">,</span><span class="mf">77.5542</span><span class="p">,</span><span class="mf">7.4670</span>
<span class="n">purple</span><span class="p">,</span><span class="n">triangle</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">51</span><span class="p">,</span><span class="mf">81.2290</span><span class="p">,</span><span class="mf">8.5910</span>
<span class="n">red</span><span class="p">,</span><span class="n">square</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">64</span><span class="p">,</span><span class="mf">77.1991</span><span class="p">,</span><span class="mf">9.5310</span>
<span class="n">purple</span><span class="p">,</span><span class="n">triangle</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">65</span><span class="p">,</span><span class="mf">80.1405</span><span class="p">,</span><span class="mf">5.8240</span>
<span class="n">yellow</span><span class="p">,</span><span class="n">circle</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">73</span><span class="p">,</span><span class="mf">63.9785</span><span class="p">,</span><span class="mf">4.2370</span>
<span class="n">yellow</span><span class="p">,</span><span class="n">circle</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">87</span><span class="p">,</span><span class="mf">63.5058</span><span class="p">,</span><span class="mf">8.3350</span>
<span class="n">purple</span><span class="p">,</span><span class="n">square</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">91</span><span class="p">,</span><span class="mf">72.3735</span><span class="p">,</span><span class="mf">8.2430</span>
<span class="o">%</span> <span class="n">mlr</span> <span class="o">-</span><span class="n">I</span> <span class="o">--</span><span class="n">icsv</span> <span class="o">--</span><span class="n">opprint</span> <span class="n">cat</span> <span class="n">newfile</span><span class="o">.</span><span class="n">txt</span>
<span class="o">%</span> <span class="n">cat</span> <span class="n">newfile</span><span class="o">.</span><span class="n">txt</span>
<span class="n">color</span> <span class="n">shape</span> <span class="n">flag</span> <span class="n">index</span> <span class="n">quantity</span> <span class="n">rate</span>
<span class="n">yellow</span> <span class="n">triangle</span> <span class="mi">1</span> <span class="mi">11</span> <span class="mf">43.6498</span> <span class="mf">9.8870</span>
<span class="n">red</span> <span class="n">square</span> <span class="mi">1</span> <span class="mi">15</span> <span class="mf">79.2778</span> <span class="mf">0.0130</span>
<span class="n">red</span> <span class="n">circle</span> <span class="mi">1</span> <span class="mi">16</span> <span class="mf">13.8103</span> <span class="mf">2.9010</span>
<span class="n">red</span> <span class="n">square</span> <span class="mi">0</span> <span class="mi">48</span> <span class="mf">77.5542</span> <span class="mf">7.4670</span>
<span class="n">purple</span> <span class="n">triangle</span> <span class="mi">0</span> <span class="mi">51</span> <span class="mf">81.2290</span> <span class="mf">8.5910</span>
<span class="n">red</span> <span class="n">square</span> <span class="mi">0</span> <span class="mi">64</span> <span class="mf">77.1991</span> <span class="mf">9.5310</span>
<span class="n">purple</span> <span class="n">triangle</span> <span class="mi">0</span> <span class="mi">65</span> <span class="mf">80.1405</span> <span class="mf">5.8240</span>
<span class="n">yellow</span> <span class="n">circle</span> <span class="mi">1</span> <span class="mi">73</span> <span class="mf">63.9785</span> <span class="mf">4.2370</span>
<span class="n">yellow</span> <span class="n">circle</span> <span class="mi">1</span> <span class="mi">87</span> <span class="mf">63.5058</span> <span class="mf">8.3350</span>
<span class="n">purple</span> <span class="n">square</span> <span class="mi">0</span> <span class="mi">91</span> <span class="mf">72.3735</span> <span class="mf">8.2430</span>
</pre></div>
</div>
<p>Also using <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">-I</span></code> you can bulk-operate on lots of files: e.g.:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">mlr</span> <span class="o">-</span><span class="n">I</span> <span class="o">--</span><span class="n">csv</span> <span class="n">cut</span> <span class="o">-</span><span class="n">x</span> <span class="o">-</span><span class="n">f</span> <span class="n">unwanted_column_name</span> <span class="o">*.</span><span class="n">csv</span>
</pre></div>
</div>
<p>If you like, you can first copy off your original data somewhere else, before doing in-place operations.</p>
<p>Lastly, using <code class="docutils literal notranslate"><span class="pre">tee</span></code> within <code class="docutils literal notranslate"><span class="pre">put</span></code>, you can split your input data into separate files per one or more field names:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --csv --from example.csv put -q &#39;tee &gt; $shape.&quot;.csv&quot;, $*&#39;
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ cat circle.csv
color,shape,flag,index,quantity,rate
red,circle,1,16,13.8103,2.9010
yellow,circle,1,73,63.9785,4.2370
yellow,circle,1,87,63.5058,8.3350
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ cat square.csv
color,shape,flag,index,quantity,rate
red,square,1,15,79.2778,0.0130
red,square,0,48,77.5542,7.4670
red,square,0,64,77.1991,9.5310
purple,square,0,91,72.3735,8.2430
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ cat triangle.csv
color,shape,flag,index,quantity,rate
yellow,triangle,1,11,43.6498,9.8870
purple,triangle,0,51,81.2290,8.5910
purple,triangle,0,65,80.1405,5.8240
</pre></div>
</div>
</div>
<div class="section" id="other-format-examples">
<h2>Other-format examples<a class="headerlink" href="#other-format-examples" title="Permalink to this headline"></a></h2>
<p>What's a CSV file, really? It's an array of rows, or <em>records</em>, each being a list of key-value pairs, or <em>fields</em>: for CSV it so happens that all the keys are shared in the header line and the values vary data line by data line.</p>
<p>For example, if you have:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">shape</span><span class="p">,</span><span class="n">flag</span><span class="p">,</span><span class="n">index</span>
<span class="n">circle</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">24</span>
<span class="n">square</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">36</span>
</pre></div>
</div>
<p>then that's a way of saying:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">shape</span><span class="o">=</span><span class="n">circle</span><span class="p">,</span><span class="n">flag</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span><span class="n">index</span><span class="o">=</span><span class="mi">24</span>
<span class="n">shape</span><span class="o">=</span><span class="n">square</span><span class="p">,</span><span class="n">flag</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span><span class="n">index</span><span class="o">=</span><span class="mi">36</span>
</pre></div>
</div>
<p>Data written this way are called <strong>DKVP</strong>, for <em>delimited key-value pairs</em>.</p>
<p>We've also already seen other ways to write the same data:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">CSV</span> <span class="n">PPRINT</span> <span class="n">JSON</span>
<span class="n">shape</span><span class="p">,</span><span class="n">flag</span><span class="p">,</span><span class="n">index</span> <span class="n">shape</span> <span class="n">flag</span> <span class="n">index</span> <span class="p">[</span>
<span class="n">circle</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">24</span> <span class="n">circle</span> <span class="mi">1</span> <span class="mi">24</span> <span class="p">{</span>
<span class="n">square</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">36</span> <span class="n">square</span> <span class="mi">0</span> <span class="mi">36</span> <span class="s2">&quot;shape&quot;</span><span class="p">:</span> <span class="s2">&quot;circle&quot;</span><span class="p">,</span>
<span class="s2">&quot;flag&quot;</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
<span class="s2">&quot;index&quot;</span><span class="p">:</span> <span class="mi">24</span>
<span class="p">},</span>
<span class="n">DKVP</span> <span class="n">XTAB</span> <span class="p">{</span>
<span class="n">shape</span><span class="o">=</span><span class="n">circle</span><span class="p">,</span><span class="n">flag</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span><span class="n">index</span><span class="o">=</span><span class="mi">24</span> <span class="n">shape</span> <span class="n">circle</span> <span class="s2">&quot;shape&quot;</span><span class="p">:</span> <span class="s2">&quot;square&quot;</span><span class="p">,</span>
<span class="n">shape</span><span class="o">=</span><span class="n">square</span><span class="p">,</span><span class="n">flag</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span><span class="n">index</span><span class="o">=</span><span class="mi">36</span> <span class="n">flag</span> <span class="mi">1</span> <span class="s2">&quot;flag&quot;</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
<span class="n">index</span> <span class="mi">24</span> <span class="s2">&quot;index&quot;</span><span class="p">:</span> <span class="mi">36</span>
<span class="p">}</span>
<span class="n">shape</span> <span class="n">square</span> <span class="p">]</span>
<span class="n">flag</span> <span class="mi">0</span>
<span class="n">index</span> <span class="mi">36</span>
</pre></div>
</div>
<p>Anything we can do with CSV input data, we can do with any other format input data. And you can read from one format, do any record-processing, and output to the same format as the input, or to a different output format.</p>
</div>
</div>
<div class="clearer"></div>
</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Miller in 10 minutes</a><ul>
<li><a class="reference internal" href="#csv-file-examples">CSV-file examples</a></li>
<li><a class="reference internal" href="#json-file-examples">JSON-file examples</a></li>
<li><a class="reference internal" href="#sorts-and-stats">Sorts and stats</a></li>
<li><a class="reference internal" href="#choices-for-printing-to-files">Choices for printing to files</a></li>
<li><a class="reference internal" href="#other-format-examples">Other-format examples</a></li>
</ul>
</li>
</ul>
<h4>Previous topic</h4>
<p class="topless"><a href="features.html"
title="previous chapter">Features</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="feature-comparison.html"
title="next chapter">Unix-toolkit context</a></p>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="_sources/10min.rst.txt"
rel="nofollow">Show Source</a></li>
</ul>
</div>
<div id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="search.html" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" />
<input type="submit" value="Go" />
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
>index</a></li>
<li class="right" >
<a href="feature-comparison.html" title="Unix-toolkit context"
>next</a> |</li>
<li class="right" >
<a href="features.html" title="Features"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Miller in 10 minutes</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2020, John Kerl.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 3.2.1.
</div>
</body>
</html>

View file

@ -1,402 +0,0 @@
..
PLEASE DO NOT EDIT DIRECTLY. EDIT THE .rst.in FILE PLEASE.
Miller in 10 minutes
====================
CSV-file examples
^^^^^^^^^^^^^^^^^
Suppose you have this CSV data file::
$ cat example.csv
color,shape,flag,index,quantity,rate
yellow,triangle,1,11,43.6498,9.8870
red,square,1,15,79.2778,0.0130
red,circle,1,16,13.8103,2.9010
red,square,0,48,77.5542,7.4670
purple,triangle,0,51,81.2290,8.5910
red,square,0,64,77.1991,9.5310
purple,triangle,0,65,80.1405,5.8240
yellow,circle,1,73,63.9785,4.2370
yellow,circle,1,87,63.5058,8.3350
purple,square,0,91,72.3735,8.2430
``mlr cat`` is like cat -- it passes the data through unmodified::
$ mlr --csv cat example.csv
color,shape,flag,index,quantity,rate
yellow,triangle,1,11,43.6498,9.8870
red,square,1,15,79.2778,0.0130
red,circle,1,16,13.8103,2.9010
red,square,0,48,77.5542,7.4670
purple,triangle,0,51,81.2290,8.5910
red,square,0,64,77.1991,9.5310
purple,triangle,0,65,80.1405,5.8240
yellow,circle,1,73,63.9785,4.2370
yellow,circle,1,87,63.5058,8.3350
purple,square,0,91,72.3735,8.2430
but it can also do format conversion (here, you can pretty-print in tabular format)::
$ mlr --icsv --opprint cat example.csv
color shape flag index quantity rate
yellow triangle 1 11 43.6498 9.8870
red square 1 15 79.2778 0.0130
red circle 1 16 13.8103 2.9010
red square 0 48 77.5542 7.4670
purple triangle 0 51 81.2290 8.5910
red square 0 64 77.1991 9.5310
purple triangle 0 65 80.1405 5.8240
yellow circle 1 73 63.9785 4.2370
yellow circle 1 87 63.5058 8.3350
purple square 0 91 72.3735 8.2430
``mlr head`` and ``mlr tail`` count records rather than lines. Whether you're getting the first few records or the last few, the CSV header is included either way::
$ mlr --csv head -n 4 example.csv
color,shape,flag,index,quantity,rate
yellow,triangle,1,11,43.6498,9.8870
red,square,1,15,79.2778,0.0130
red,circle,1,16,13.8103,2.9010
red,square,0,48,77.5542,7.4670
::
$ mlr --csv tail -n 4 example.csv
color,shape,flag,index,quantity,rate
purple,triangle,0,65,80.1405,5.8240
yellow,circle,1,73,63.9785,4.2370
yellow,circle,1,87,63.5058,8.3350
purple,square,0,91,72.3735,8.2430
You can sort primarily alphabetically on one field, then secondarily numerically descending on another field::
$ mlr --icsv --opprint sort -f shape -nr index example.csv
color shape flag index quantity rate
yellow circle 1 87 63.5058 8.3350
yellow circle 1 73 63.9785 4.2370
red circle 1 16 13.8103 2.9010
purple square 0 91 72.3735 8.2430
red square 0 64 77.1991 9.5310
red square 0 48 77.5542 7.4670
red square 1 15 79.2778 0.0130
purple triangle 0 65 80.1405 5.8240
purple triangle 0 51 81.2290 8.5910
yellow triangle 1 11 43.6498 9.8870
You can use ``cut`` to retain only specified fields, in the same order they appeared in the input data::
$ mlr --icsv --opprint cut -f flag,shape example.csv
shape flag
triangle 1
square 1
circle 1
square 0
triangle 0
square 0
triangle 0
circle 1
circle 1
square 0
You can also use ``cut -o`` to retain only specified fields in your preferred order::
$ mlr --icsv --opprint cut -o -f flag,shape example.csv
flag shape
1 triangle
1 square
1 circle
0 square
0 triangle
0 square
0 triangle
1 circle
1 circle
0 square
You can use ``cut -x`` to omit fields you don't care about::
$ mlr --icsv --opprint cut -x -f flag,shape example.csv
color index quantity rate
yellow 11 43.6498 9.8870
red 15 79.2778 0.0130
red 16 13.8103 2.9010
red 48 77.5542 7.4670
purple 51 81.2290 8.5910
red 64 77.1991 9.5310
purple 65 80.1405 5.8240
yellow 73 63.9785 4.2370
yellow 87 63.5058 8.3350
purple 91 72.3735 8.2430
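If it helps to see what the three ``cut`` variants do to each record, here is a rough Python sketch. It only illustrates the behavior shown above and is not how Miller is implemented; the helper name ``cut`` and its keyword arguments are made up for this example.

```python
import csv
import io

# First two records of example.csv, inlined so the sketch is self-contained.
EXAMPLE_CSV = """color,shape,flag,index,quantity,rate
yellow,triangle,1,11,43.6498,9.8870
red,square,1,15,79.2778,0.0130
"""
records = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))


def cut(record, fields, arg_order=False, complement=False):
    """Hypothetical helper mimicking cut -f, cut -o -f, and cut -x -f."""
    if complement:
        # Like cut -x: drop the named fields, keep everything else.
        return {k: v for k, v in record.items() if k not in fields}
    if arg_order:
        # Like cut -o: keep the named fields in the order they were requested.
        return {f: record[f] for f in fields if f in record}
    # Like plain cut -f: keep the named fields in their original record order.
    return {k: v for k, v in record.items() if k in fields}


first = records[0]
plain = cut(first, ["flag", "shape"])
ordered = cut(first, ["flag", "shape"], arg_order=True)
rest = cut(first, ["flag", "shape"], complement=True)
print(list(plain), list(ordered), list(rest))
```

The point is that a record is an ordered list of key-value pairs, so "which fields" and "in what order" are separate choices.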
You can use ``filter`` to keep only records you care about::
$ mlr --icsv --opprint filter '$color == "red"' example.csv
color shape flag index quantity rate
red square 1 15 79.2778 0.0130
red circle 1 16 13.8103 2.9010
red square 0 48 77.5542 7.4670
red square 0 64 77.1991 9.5310
::
$ mlr --icsv --opprint filter '$color == "red" && $flag == 1' example.csv
color shape flag index quantity rate
red square 1 15 79.2778 0.0130
red circle 1 16 13.8103 2.9010
You can use ``put`` to create new fields which are computed from other fields::
$ mlr --icsv --opprint put '$ratio = $quantity / $rate; $color_shape = $color . "_" . $shape' example.csv
color shape flag index quantity rate ratio color_shape
yellow triangle 1 11 43.6498 9.8870 4.414868 yellow_triangle
red square 1 15 79.2778 0.0130 6098.292308 red_square
red circle 1 16 13.8103 2.9010 4.760531 red_circle
red square 0 48 77.5542 7.4670 10.386260 red_square
purple triangle 0 51 81.2290 8.5910 9.455127 purple_triangle
red square 0 64 77.1991 9.5310 8.099790 red_square
purple triangle 0 65 80.1405 5.8240 13.760388 purple_triangle
yellow circle 1 73 63.9785 4.2370 15.099953 yellow_circle
yellow circle 1 87 63.5058 8.3350 7.619172 yellow_circle
purple square 0 91 72.3735 8.2430 8.779995 purple_square
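For readers who think in terms of a general-purpose language, the ``put`` expression above amounts to roughly the following Python. This is a sketch of the semantics only; the function name ``add_fields`` is an assumption for illustration.

```python
import csv
import io

# First two records of example.csv, inlined so the sketch is self-contained.
EXAMPLE_CSV = """color,shape,flag,index,quantity,rate
yellow,triangle,1,11,43.6498,9.8870
red,square,1,15,79.2778,0.0130
"""
records = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))


def add_fields(record):
    """Hypothetical helper: append ratio and color_shape to a copy of the record."""
    out = dict(record)
    # $ratio = $quantity / $rate
    out["ratio"] = float(record["quantity"]) / float(record["rate"])
    # $color_shape = $color . "_" . $shape
    out["color_shape"] = record["color"] + "_" + record["shape"]
    return out


augmented = [add_fields(r) for r in records]
for r in augmented:
    print(f'{r["ratio"]:.6f}', r["color_shape"])
```

New fields land at the end of each record, which matches where the ``ratio`` and ``color_shape`` columns appear in the output above.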
Even though Miller's main selling point is name-indexing, sometimes you really want to refer to a field name by its positional index. Use ``$[[3]]`` to access the name of field 3 or ``$[[[3]]]`` to access the value of field 3::
$ mlr --icsv --opprint put '$[[3]] = "NEW"' example.csv
color shape NEW index quantity rate
yellow triangle 1 11 43.6498 9.8870
red square 1 15 79.2778 0.0130
red circle 1 16 13.8103 2.9010
red square 0 48 77.5542 7.4670
purple triangle 0 51 81.2290 8.5910
red square 0 64 77.1991 9.5310
purple triangle 0 65 80.1405 5.8240
yellow circle 1 73 63.9785 4.2370
yellow circle 1 87 63.5058 8.3350
purple square 0 91 72.3735 8.2430
::
$ mlr --icsv --opprint put '$[[[3]]] = "NEW"' example.csv
color shape flag index quantity rate
yellow triangle NEW 11 43.6498 9.8870
red square NEW 15 79.2778 0.0130
red circle NEW 16 13.8103 2.9010
red square NEW 48 77.5542 7.4670
purple triangle NEW 51 81.2290 8.5910
red square NEW 64 77.1991 9.5310
purple triangle NEW 65 80.1405 5.8240
yellow circle NEW 73 63.9785 4.2370
yellow circle NEW 87 63.5058 8.3350
purple square NEW 91 72.3735 8.2430
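The difference between the two positional forms is easy to mix up, so here is a small Python sketch of what each one did in the two examples above. The helper names are invented for illustration, and positions are 1-up as in the Miller examples.

```python
# One record from example.csv, as an ordered mapping of field name to value.
record = {
    "color": "yellow",
    "shape": "triangle",
    "flag": "1",
    "index": "11",
    "quantity": "43.6498",
    "rate": "9.8870",
}


def rename_by_position(rec, position, new_name):
    """Hypothetical helper: like assigning to $[[3]], the name changes, the value stays."""
    items = list(rec.items())
    _, value = items[position - 1]
    items[position - 1] = (new_name, value)
    return dict(items)


def assign_by_position(rec, position, new_value):
    """Hypothetical helper: like assigning to $[[[3]]], the value changes, the name stays."""
    name = list(rec)[position - 1]
    return {**rec, name: new_value}


renamed = rename_by_position(record, 3, "NEW")
assigned = assign_by_position(record, 3, "NEW")
print(renamed)
print(assigned)
```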
JSON-file examples
^^^^^^^^^^^^^^^^^^
OK, CSV and pretty-print are fine. But Miller can also convert between a few other formats -- let's take a look at JSON output::
$ mlr --icsv --ojson put '$ratio = $quantity/$rate; $shape = toupper($shape)' example.csv
{ "color": "yellow", "shape": "TRIANGLE", "flag": 1, "index": 11, "quantity": 43.6498, "rate": 9.8870, "ratio": 4.414868 }
{ "color": "red", "shape": "SQUARE", "flag": 1, "index": 15, "quantity": 79.2778, "rate": 0.0130, "ratio": 6098.292308 }
{ "color": "red", "shape": "CIRCLE", "flag": 1, "index": 16, "quantity": 13.8103, "rate": 2.9010, "ratio": 4.760531 }
{ "color": "red", "shape": "SQUARE", "flag": 0, "index": 48, "quantity": 77.5542, "rate": 7.4670, "ratio": 10.386260 }
{ "color": "purple", "shape": "TRIANGLE", "flag": 0, "index": 51, "quantity": 81.2290, "rate": 8.5910, "ratio": 9.455127 }
{ "color": "red", "shape": "SQUARE", "flag": 0, "index": 64, "quantity": 77.1991, "rate": 9.5310, "ratio": 8.099790 }
{ "color": "purple", "shape": "TRIANGLE", "flag": 0, "index": 65, "quantity": 80.1405, "rate": 5.8240, "ratio": 13.760388 }
{ "color": "yellow", "shape": "CIRCLE", "flag": 1, "index": 73, "quantity": 63.9785, "rate": 4.2370, "ratio": 15.099953 }
{ "color": "yellow", "shape": "CIRCLE", "flag": 1, "index": 87, "quantity": 63.5058, "rate": 8.3350, "ratio": 7.619172 }
{ "color": "purple", "shape": "SQUARE", "flag": 0, "index": 91, "quantity": 72.3735, "rate": 8.2430, "ratio": 8.779995 }
Or, JSON output for just the last two records::
$ mlr --icsv --ojson tail -n 2 example.csv
{ "color": "yellow", "shape": "circle", "flag": 1, "index": 87, "quantity": 63.5058, "rate": 8.3350 }
{ "color": "purple", "shape": "square", "flag": 0, "index": 91, "quantity": 72.3735, "rate": 8.2430 }
Sorts and stats
^^^^^^^^^^^^^^^
Now suppose you want to sort the data on a given column, *and then* take the top few in that ordering. You can use Miller's ``then`` feature to pipe commands together.
Here are the first three records in that ordering (sorted by ``shape``, then by descending ``index``)::
$ mlr --icsv --opprint sort -f shape -nr index then head -n 3 example.csv
color shape flag index quantity rate
yellow circle 1 87 63.5058 8.3350
yellow circle 1 73 63.9785 4.2370
red circle 1 16 13.8103 2.9010
Lots of Miller commands take a ``-g`` option for group-by: here, ``head -n 1 -g shape`` outputs the first record for each distinct value of the ``shape`` field. This means we're finding the record with highest ``index`` field for each distinct ``shape`` field::
$ mlr --icsv --opprint sort -f shape -nr index then head -n 1 -g shape example.csv
color shape flag index quantity rate
yellow circle 1 87 63.5058 8.3350
purple square 0 91 72.3735 8.2430
purple triangle 0 65 80.1405 5.8240
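As a cross-check on what the two chained commands above compute, here is a rough Python equivalent. It is only a sketch of the semantics (sort, then take the first few, overall or per group) and says nothing about Miller's internals.

```python
import csv
import io

EXAMPLE_CSV = """color,shape,flag,index,quantity,rate
yellow,triangle,1,11,43.6498,9.8870
red,square,1,15,79.2778,0.0130
red,circle,1,16,13.8103,2.9010
red,square,0,48,77.5542,7.4670
purple,triangle,0,51,81.2290,8.5910
red,square,0,64,77.1991,9.5310
purple,triangle,0,65,80.1405,5.8240
yellow,circle,1,73,63.9785,4.2370
yellow,circle,1,87,63.5058,8.3350
purple,square,0,91,72.3735,8.2430
"""
records = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))

# sort -f shape -nr index: shape ascending, then index numerically descending.
ordered = sorted(records, key=lambda r: (r["shape"], -int(r["index"])))

# then head -n 3: the first three records of the sorted stream.
top3 = ordered[:3]

# then head -n 1 -g shape: the first record seen for each distinct shape.
first_per_shape = {}
for r in ordered:
    first_per_shape.setdefault(r["shape"], r)

print([r["index"] for r in top3])
print([(s, r["index"]) for s, r in first_per_shape.items()])
```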
Statistics can be computed with or without group-by field(s)::
$ mlr --icsv --opprint --from example.csv stats1 -a count,min,mean,max -f quantity -g shape
shape quantity_count quantity_min quantity_mean quantity_max
triangle 3 43.649800 68.339767 81.229000
square 4 72.373500 76.601150 79.277800
circle 3 13.810300 47.098200 63.978500
::
$ mlr --icsv --opprint --from example.csv stats1 -a count,min,mean,max -f quantity -g shape,color
shape color quantity_count quantity_min quantity_mean quantity_max
triangle yellow 1 43.649800 43.649800 43.649800
square red 3 77.199100 78.010367 79.277800
circle red 1 13.810300 13.810300 13.810300
triangle purple 2 80.140500 80.684750 81.229000
circle yellow 2 63.505800 63.742150 63.978500
square purple 1 72.373500 72.373500 72.373500
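The group-by idea can be sketched in a few lines of Python: bucket the values by the group-by field, then compute each statistic per bucket. This is an illustration of the first ``stats1`` example above (grouping by ``shape`` only), not Miller's implementation.

```python
import csv
import io

EXAMPLE_CSV = """color,shape,flag,index,quantity,rate
yellow,triangle,1,11,43.6498,9.8870
red,square,1,15,79.2778,0.0130
red,circle,1,16,13.8103,2.9010
red,square,0,48,77.5542,7.4670
purple,triangle,0,51,81.2290,8.5910
red,square,0,64,77.1991,9.5310
purple,triangle,0,65,80.1405,5.8240
yellow,circle,1,73,63.9785,4.2370
yellow,circle,1,87,63.5058,8.3350
purple,square,0,91,72.3735,8.2430
"""
records = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))

# Bucket quantity values by shape, keeping groups in first-seen order.
groups = {}
for r in records:
    groups.setdefault(r["shape"], []).append(float(r["quantity"]))

# count, min, mean, max per group.
stats = {
    shape: {
        "count": len(values),
        "min": min(values),
        "mean": sum(values) / len(values),
        "max": max(values),
    }
    for shape, values in groups.items()
}

for shape, s in stats.items():
    print(shape, s["count"], f'{s["min"]:.6f}', f'{s["mean"]:.6f}', f'{s["max"]:.6f}')
```

Grouping by more than one field works the same way with a tuple such as ``(r["shape"], r["color"])`` as the bucket key.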
If your output has a lot of columns, you can use XTAB format to line things up vertically for you instead::
$ mlr --icsv --oxtab --from example.csv stats1 -a p0,p10,p25,p50,p75,p90,p99,p100 -f rate
rate_p0 0.013000
rate_p10 2.901000
rate_p25 4.237000
rate_p50 8.243000
rate_p75 8.591000
rate_p90 9.887000
rate_p99 9.887000
rate_p100 9.887000
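Notice that every percentile above is one of the actual ``rate`` values, i.e. nothing is interpolated. One simple index rule that reproduces this particular output is sketched below in Python. Treat the rule as an assumption that happens to match these numbers; Miller's own computation may differ in detail.

```python
# The ten rate values from example.csv.
rates = [9.8870, 0.0130, 2.9010, 7.4670, 8.5910, 9.5310, 5.8240, 4.2370, 8.3350, 8.2430]


def percentile(values, p):
    """Non-interpolated percentile for integer p: pick index floor(p*n/100), clamped to the last element."""
    ordered = sorted(values)
    n = len(ordered)
    idx = min((p * n) // 100, n - 1)
    return ordered[idx]


for p in (0, 10, 25, 50, 75, 90, 99, 100):
    print(f"rate_p{p}", f"{percentile(rates, p):.6f}")
```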
.. _10min-choices-for-printing-to-files:
Choices for printing to files
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Often we want to print output to the screen. Miller does this by default, as we've seen in the previous examples.
Sometimes we want to print output to another file: just use **> outputfilenamegoeshere** at the end of your command:
::
% mlr --icsv --opprint cat example.csv > newfile.csv
# Output goes to the new file;
# nothing is printed to the screen.
::
% cat newfile.csv
color shape flag index quantity rate
yellow triangle 1 11 43.6498 9.8870
red square 1 15 79.2778 0.0130
red circle 1 16 13.8103 2.9010
red square 0 48 77.5542 7.4670
purple triangle 0 51 81.2290 8.5910
red square 0 64 77.1991 9.5310
purple triangle 0 65 80.1405 5.8240
yellow circle 1 73 63.9785 4.2370
yellow circle 1 87 63.5058 8.3350
purple square 0 91 72.3735 8.2430
Other times we just want our files to be **changed in-place**: just use **mlr -I**::
% cp example.csv newfile.txt
% cat newfile.txt
color,shape,flag,index,quantity,rate
yellow,triangle,1,11,43.6498,9.8870
red,square,1,15,79.2778,0.0130
red,circle,1,16,13.8103,2.9010
red,square,0,48,77.5542,7.4670
purple,triangle,0,51,81.2290,8.5910
red,square,0,64,77.1991,9.5310
purple,triangle,0,65,80.1405,5.8240
yellow,circle,1,73,63.9785,4.2370
yellow,circle,1,87,63.5058,8.3350
purple,square,0,91,72.3735,8.2430
% mlr -I --icsv --opprint cat newfile.txt
% cat newfile.txt
color shape flag index quantity rate
yellow triangle 1 11 43.6498 9.8870
red square 1 15 79.2778 0.0130
red circle 1 16 13.8103 2.9010
red square 0 48 77.5542 7.4670
purple triangle 0 51 81.2290 8.5910
red square 0 64 77.1991 9.5310
purple triangle 0 65 80.1405 5.8240
yellow circle 1 73 63.9785 4.2370
yellow circle 1 87 63.5058 8.3350
purple square 0 91 72.3735 8.2430
Also using ``mlr -I`` you can bulk-operate on lots of files: e.g.::
mlr -I --csv cut -x -f unwanted_column_name *.csv
If you like, you can first copy off your original data somewhere else, before doing in-place operations.
Lastly, using ``tee`` within ``put``, you can split your input data into separate files, one per distinct value of one or more fields::
$ mlr --csv --from example.csv put -q 'tee > $shape.".csv", $*'
::
$ cat circle.csv
color,shape,flag,index,quantity,rate
red,circle,1,16,13.8103,2.9010
yellow,circle,1,73,63.9785,4.2370
yellow,circle,1,87,63.5058,8.3350
::
$ cat square.csv
color,shape,flag,index,quantity,rate
red,square,1,15,79.2778,0.0130
red,square,0,48,77.5542,7.4670
red,square,0,64,77.1991,9.5310
purple,square,0,91,72.3735,8.2430
::
$ cat triangle.csv
color,shape,flag,index,quantity,rate
yellow,triangle,1,11,43.6498,9.8870
purple,triangle,0,51,81.2290,8.5910
purple,triangle,0,65,80.1405,5.8240
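The split-by-field idea above can be sketched in Python as follows. This is an illustration only: the helper name ``split_by_field`` is made up, and the files go to a temporary directory rather than the current one.

```python
import csv
import io
import os
import tempfile

EXAMPLE_CSV = """color,shape,flag,index,quantity,rate
yellow,triangle,1,11,43.6498,9.8870
red,square,1,15,79.2778,0.0130
red,circle,1,16,13.8103,2.9010
red,square,0,48,77.5542,7.4670
purple,triangle,0,51,81.2290,8.5910
red,square,0,64,77.1991,9.5310
purple,triangle,0,65,80.1405,5.8240
yellow,circle,1,73,63.9785,4.2370
yellow,circle,1,87,63.5058,8.3350
purple,square,0,91,72.3735,8.2430
"""
records = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))


def split_by_field(recs, field, outdir):
    """Hypothetical helper: write one CSV per distinct value of `field`."""
    groups = {}
    for r in recs:
        groups.setdefault(r[field], []).append(r)
    for key, rows in groups.items():
        path = os.path.join(outdir, key + ".csv")
        with open(path, "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(rows[0]), lineterminator="\n")
            writer.writeheader()
            writer.writerows(rows)
    return list(groups)


outdir = tempfile.mkdtemp()
names = split_by_field(records, "shape", outdir)
print(names)
```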
Other-format examples
^^^^^^^^^^^^^^^^^^^^^
What's a CSV file, really? It's an array of rows, or *records*, each being a list of key-value pairs, or *fields*: for CSV it so happens that all the keys are shared in the header line and the values vary data line by data line.
For example, if you have::
shape,flag,index
circle,1,24
square,0,36
then that's a way of saying::
shape=circle,flag=1,index=24
shape=square,flag=0,index=36
Data written this way are called **DKVP**, for *delimited key-value pairs*.
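To make the CSV-to-DKVP correspondence concrete, here is a short Python sketch that reads the three-line CSV above and prints each record as delimited key-value pairs. The helper name ``to_dkvp`` is an assumption for illustration.

```python
import csv
import io

CSV_TEXT = """shape,flag,index
circle,1,24
square,0,36
"""


def to_dkvp(record):
    """Hypothetical helper: join key=value pairs with commas."""
    return ",".join(f"{key}={value}" for key, value in record.items())


# Each CSV data line becomes a record whose keys come from the header line.
records = list(csv.DictReader(io.StringIO(CSV_TEXT)))
dkvp_lines = [to_dkvp(r) for r in records]
for line in dkvp_lines:
    print(line)
```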
We've also already seen other ways to write the same data::
CSV PPRINT JSON
shape,flag,index shape flag index [
circle,1,24 circle 1 24 {
square,0,36 square 0 36 "shape": "circle",
"flag": 1,
"index": 24
},
DKVP XTAB {
shape=circle,flag=1,index=24 shape circle "shape": "square",
shape=square,flag=0,index=36 flag 1 "flag": 0,
index 24 "index": 36
}
shape square ]
flag 0
index 36
Anything we can do with CSV input data, we can do with any other format input data. And you can read from one format, do any record-processing, and output to the same format as the input, or to a different output format.
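A sketch of that read-process-write idea in Python: parse DKVP lines into records, then write the same records back out as CSV. The function names are invented for this example, and the CSV writer here assumes every record has the same keys, which is a simplification.

```python
import csv
import io

DKVP_TEXT = """shape=circle,flag=1,index=24
shape=square,flag=0,index=36
"""


def parse_dkvp(text):
    """Hypothetical reader: one record per line, comma-separated key=value pairs."""
    records = []
    for line in text.splitlines():
        if line:
            records.append(dict(pair.split("=", 1) for pair in line.split(",")))
    return records


def write_csv(records):
    """Hypothetical writer: header from the first record's keys, then one line per record."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


records = parse_dkvp(DKVP_TEXT)
csv_text = write_csv(records)
print(csv_text, end="")
```

Any record-processing step (filtering, sorting, adding fields) would sit between the reader and the writer, independent of either format.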

View file

@ -1,211 +0,0 @@
..
PLEASE DO NOT EDIT DIRECTLY. EDIT THE .rst.in FILE PLEASE.
Building from source
================================================================
Please also see :doc:`install` for information about pre-built executables.
Miller license
----------------------------------------------------------------
Two-clause BSD license https://github.com/johnkerl/miller/blob/master/LICENSE.txt.
From release tarball using autoconfig
----------------------------------------------------------------
Miller allows you the option of using GNU ``autoconfigure`` to build portably.
Grateful acknowledgement: Miller's GNU autoconfig work was done by the generous and expert efforts of `Thomas Klausner <https://github.com/0-wiz-0/>`_.
* Obtain ``mlr-i.j.k.tar.gz`` from https://github.com/johnkerl/miller/tags, replacing ``i.j.k`` with the desired release, e.g. ``2.2.1``.
* ``tar zxvf mlr-i.j.k.tar.gz``
* ``cd mlr-i.j.k``
* Install the following packages using your system's package manager (``apt-get``, ``yum install``, etc.): **flex**
* Various configuration options of your choice, e.g.
* ``./configure``
* ``./configure --prefix=/usr/local``
* ``./configure --prefix=$HOME/pkgs``
* ``./configure CC=clang``
* ``./configure --disable-shared`` (to make a statically linked executable)
* ``./configure 'CFLAGS=-Wall -std=gnu99 -O3'``
* etc.
* ``make`` creates the ``c/mlr`` executable
* ``make check``
* ``make install`` copies the ``c/mlr`` executable to your prefix's ``bin`` subdirectory.
From git clone using autoconfig
----------------------------------------------------------------
* ``git clone https://github.com/johnkerl/miller``
* ``cd miller``
* Install the following packages using your system's package manager (``apt-get``, ``yum install``, etc.): **automake autoconf libtool flex**
* Run ``autoreconf -fiv``. (This is necessary when building from head as discussed in https://github.com/johnkerl/miller/issues/131.)
* Then continue from "Install the following ... " as above.
Without using autoconfig
----------------------------------------------------------------
GNU autoconfig is familiar to many users, and indeed plenty of folks won't bother to use an open-source software package which doesn't have autoconfig support. And this is for good reason: GNU autoconfig allows us to build software on a wide diversity of platforms. For this reason I'm happy that Miller supports autoconfig.
But, many others (myself included!) find autoconfig confusing: if it works without errors, great, but if not, the ``./configure && make`` output can be exceedingly difficult to decipher. And this also can be a turn-off for using open-source software: if you can't figure out the build errors, you may just keep walking. For this reason I'm happy that Miller allows you to build without autoconfig. (Of course, if you have any build errors, feel free to contact me at mailto:kerl.john.r+miller@gmail.com -- or, better, open an issue with "New Issue" at https://github.com/johnkerl/miller/issues.)
Steps:
* Obtain a release tarball or git clone.
* ``cd`` into the ``c`` subdirectory.
* Edit the ``INSTALLDIR`` in ``Makefile.no-autoconfig``.
* To change the C compiler, edit the ``CC=`` lines in ``Makefile.no-autoconfig`` and ``dsls/Makefile.no-autoconfig``.
* ``make -f Makefile.no-autoconfig`` creates the ``mlr`` executable and runs unit/regression tests (i.e. the equivalent of both ``make`` and ``make check`` using autoconfig).
* ``make install`` copies the ``mlr`` executable to your install directory.
The ``Makefile.no-autoconfig`` is simple: little more than ``gcc *.c``. Customizing is less automatic than autoconfig, but more transparent. I expect this makefile to work with few modifications on a large fraction of modern Linux/BSD-like systems: I'm aware of successful use with ``gcc`` and ``clang``, on Ubuntu 12.04 LTS, SELinux, Darwin (MacOS Yosemite), and FreeBSD.
Windows
----------------------------------------------------------------
*Disclaimer: I'm now relying exclusively on* `Appveyor <https://ci.appveyor.com/project/johnkerl/miller>`_ *for Windows builds; I haven't built from source using MSYS in quite a while.*
Miller has been built on Windows using MSYS2: http://www.msys2.org/. You can install MSYS2 and build Miller from its source code within MSYS2, and then you can use the binary from outside MSYS2. You can also use a precompiled binary (see above).
You will first need to install MSYS2: http://www.msys2.org/. Then, start an MSYS2 shell, e.g. (supposing you installed MSYS2 to ``C:\msys2\``) run ``C:\msys2\mingw64.exe``. Within the MSYS2 shell, you can run the following to install dependent packages:
::
pacman -Syu
pacman -Su
pacman -S base-devel
pacman -S msys2-devel
pacman -S mingw-w64-x86_64-toolchain
pacman -S mingw-w64-x86_64-pcre
pacman -S msys2-runtime
The list of dependent packages may also be found in **appveyor.yml** in the Miller base directory.
Then, simply run **msys2-build.sh** which is a thin wrapper around ``./configure && make`` which accommodates certain Windows/MSYS2 idiosyncrasies.
There is a unit-test false-negative issue involving the semantics of the ``mkstemp`` library routine but a ``make -k`` in the ``c`` subdirectory has been producing a ``mlr.exe`` for me.
Within MSYS2 you can run ``mlr``: simply copy it from the ``c`` subdirectory to your desired location somewhere within your MSYS2 ``$PATH``. To run ``mlr`` outside of MSYS2, just as with precompiled binaries as described above, you'll need ``msys-2.0.dll``. One way to do this is to augment your path:
::
C:\> set PATH=%PATH%;\msys64\mingw64\bin
Another way to do it is to copy the Miller executable and the DLL to the same directory:
::
C:\> mkdir \mbin
C:\> copy \msys64\mingw64\bin\msys-2.0.dll \mbin
C:\> copy \msys64\wherever\you\installed\miller\c\mlr.exe \mbin
C:\> set PATH=%PATH%;\mbin
In case of problems
----------------------------------------------------------------
If you have any build errors, feel free to contact me at mailto:kerl.john.r+miller@gmail.com -- or, better, open an issue with "New Issue" at https://github.com/johnkerl/miller/issues.
Dependencies
----------------------------------------------------------------
Required external dependencies
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
These are necessary to produce the ``mlr`` executable.
* ``gcc``, ``clang``, etc. (or presumably other compilers; please open an issue or send me a pull request if you have information for me about other 21st-century compilers)
* The standard C library
* ``flex``
* ``automake``, ``autoconf``, and ``libtool``, if you build with autoconfig
Optional external dependencies
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This documentation pageset is built using Sphinx. Please see `./README.md` for details.
Internal dependencies
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
These are included within the `Miller source tree <https://github.com/johnkerl/miller>`_ and do not need to be separately installed (and in fact any separate installation will not be picked up in the Miller build):
* `Mersenne Twister <http://en.wikipedia.org/wiki/Mersenne_Twister>`_ for pseudorandom-number generation: `C implementation by Nishimura and Matsumoto <https://github.com/johnkerl/miller/blob/master/c/lib/mtrand.c>`_ with license terms respected.
* `MinUnit <http://www.jera.com/techinfo/jtns/jtn002.html>`_ for unit-testing, with as-is-no-warranty license http://www.jera.com/techinfo/jtns/jtn002.html#License, https://github.com/johnkerl/miller/blob/master/c/lib/minunit.h.
* The `Lemon parser-generator <http://www.hwaci.com/sw/lemon/>`_, the author of which explicitly disclaims copyright.
* The `udp JSON parser <https://github.com/udp/json-parser>`_, with BSD2 license.
* The `sheredom UTF-8 library <https://github.com/sheredom/utf8.h>`_, which is free and unencumbered software released into the public domain.
* The NetBSD ``strptime`` (needed for the Windows/MSYS2 port since MSYS2 lacks this), with BSD license.
Creating a new release: for developers
----------------------------------------------------------------
At present I'm the primary developer so this is just my checklist for making new releases.
In this example I am using version 3.4.0; of course that will change for subsequent revisions.
* Update version found in ``mlr --version`` and ``man mlr``:
* Edit ``configure.ac``, ``c/mlrvers.h``, ``miller.spec``, and ``docs/conf.py`` from ``3.3.2-dev`` to ``3.4.0``.
* Do a fresh ``autoreconf -fiv`` and commit the output. (Preferably on a Linux host, rather than MacOS, to reduce needless diffs in autogen build files.)
* ``make -C c -f Makefile.no-autoconfig installhome && make -C man -f Makefile.no-autoconfig installhome && make -C docs -f Makefile.no-autoconfig html``
* The ordering is important: the first build creates ``mlr``; the second runs ``mlr`` to create ``manpage.txt``; the third includes ``manpage.txt`` into one of its outputs.
* Commit and push.
* Create the release tarball and SRPM:
* On buildbox: ``./configure && make distcheck``
* On buildbox: make SRPM as in https://github.com/johnkerl/miller/blob/master/README-RPM.md
* On all buildboxes: ``cd c`` and ``make -f Makefile.no-autoconfig mlr.static``. Then copy ``mlr.static`` to ``../mlr.{arch}``. (This may require as prerequisite ``sudo yum install glibc-static`` or the like.)
* For static binaries, please do ``ldd mlr.static`` and make sure it says ``not a dynamic executable``.
* Then ``mv mlr.static ../mlr.linux_x86_64``
* Pull back release tarball ``mlr-3.4.0.tar.gz`` and SRPM ``miller-3.4.0-1.el6.src.rpm`` from buildbox, and ``mlr.{arch}`` binaries from whatever buildboxes.
* Download ``mlr.exe`` and ``msys-2.0.dll`` from https://ci.appveyor.com/project/johnkerl/miller/build/artifacts.
* Create the Github release tag:
* Don't forget the ``v`` in ``v3.4.0``
* Write the release notes
* Attach the release tarball, SRPM, and binaries. Double-check assets were successfully uploaded.
* Publish the release
* Check the release-specific docs:
* Look at https://miller.readthedocs.io for new-version docs, after a few minutes' propagation time.
* Notify:
* Submit ``brew`` pull request; notify any other distros which don't appear to have autoupdated since the previous release (notes below)
* Similarly for ``macports``: https://github.com/macports/macports-ports/blob/master/textproc/miller/Portfile.
* Social-media updates.
::
git remote add upstream https://github.com/Homebrew/homebrew-core # one-time setup only
git fetch upstream
git rebase upstream/master
git checkout -b miller-3.4.0
shasum -a 256 /path/to/mlr-3.4.0.tar.gz
edit Formula/miller.rb
# Test the URL from the line like
# url "https://github.com/johnkerl/miller/releases/download/v3.4.0/mlr-3.4.0.tar.gz"
# in a browser for typos
# A '@BrewTestBot Test this please' comment within the homebrew-core pull request will restart the homebrew travis build
git add Formula/miller.rb
git commit -m 'miller 3.4.0'
git push -u origin miller-3.4.0
(submit the pull request)
* Afterwork:
* Edit ``configure.ac`` and ``c/mlrvers.h`` to change version from ``3.4.0`` to ``3.4.0-dev``.
* ``make -C c -f Makefile.no-autoconfig installhome && make -C doc -f Makefile.no-autoconfig all installhome``
* Commit and push.
Misc. development notes
----------------------------------------------------------------
I use terminal width 120 and tabwidth 4.

View file

@ -1,11 +0,0 @@
..
PLEASE DO NOT EDIT DIRECTLY. EDIT THE .rst.in FILE PLEASE.
Contact
================================================================
Bug reports, feature requests, etc.: https://github.com/johnkerl/miller/issues
For issues involving this documentation site please also use https://github.com/johnkerl/miller/issues
Other correspondence: mailto:kerl.john.r+miller@gmail.com

File diff suppressed because it is too large Load diff

View file

@ -1,526 +0,0 @@
..
PLEASE DO NOT EDIT DIRECTLY. EDIT THE .rst.in FILE PLEASE.
Cookbook part 2: Random things, and some math
================================================================
Randomly selecting words from a list
----------------------------------------------------------------
Given this `word list <https://github.com/johnkerl/miller/blob/master/docs/data/english-words.txt>`_, first take a look to see what the first few lines look like:
::
$ head data/english-words.txt
a
aa
aal
aalii
aam
aardvark
aardwolf
aba
abac
abaca
Then the following will randomly sample ten words with four to eight characters in them:
::
$ mlr --from data/english-words.txt --nidx filter -S 'n=strlen($1);4<=n&&n<=8' then sample -k 10
thionine
birchman
mildewy
avigate
addedly
abaze
askant
aiming
insulant
coinmate
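The same filter-then-sample idea can be sketched outside Miller. The Python below is illustrative only: the function name ``sample_words`` and the tiny in-memory word list are assumptions made for this sketch, and ``random.sample`` stands in for the draw, which is not necessarily the algorithm behind ``mlr sample``.

```python
import random

def sample_words(words, k=10, min_len=4, max_len=8, seed=None):
    """Keep words whose length is in [min_len, max_len], then draw up to k of them."""
    rng = random.Random(seed)
    eligible = [w for w in words if min_len <= len(w) <= max_len]
    # Never ask for more items than are eligible
    return rng.sample(eligible, min(k, len(eligible)))

# Hypothetical input: the first ten lines of the word list shown above
words = ["a", "aa", "aal", "aalii", "aam", "aardvark", "aardwolf", "aba", "abac", "abaca"]
print(sample_words(words, k=3, seed=1))
```

Passing a seed makes the draw repeatable, which is handy for documentation; leave it as ``None`` for everyday use.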
Randomly generating jabberwocky words
----------------------------------------------------------------
These are simple *n*-grams as `described here <http://johnkerl.org/randspell/randspell-slides-ts.pdf>`_. Some common functions are `located here <https://github.com/johnkerl/miller/blob/master/docs/ngrams/ngfuncs.mlr.txt>`_. Then here are scripts for `1-grams <https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng1.mlr.txt>`_, `2-grams <https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng2.mlr.txt>`_, `3-grams <https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng3.mlr.txt>`_, `4-grams <https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng4.mlr.txt>`_, and `5-grams <https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng5.mlr.txt>`_.
The idea is that words from the input file are consumed, then taken apart and pasted back together in ways which imitate the letter-to-letter transitions found in the word list -- giving us automatically generated words in the same vein as *bromance* and *spork*:
::
$ mlr --nidx --from ./ngrams/gsl-2000.txt put -q -f ./ngrams/ngfuncs.mlr -f ./ngrams/ng5.mlr
beard
plastinguish
politicially
noise
loan
country
controductionary
suppery
lose
lessors
dollar
judge
rottendence
lessenger
diffendant
suggestional
Program timing
----------------------------------------------------------------
This admittedly artificial example demonstrates using Miller time and stats functions to introspectively acquire some information about Miller's own runtime. The ``delta`` stepper of the ``step`` verb computes the difference between successive timestamps.
::
$ ruby -e '10000.times{|i|puts "i=#{i+1}"}' > lines.txt
$ head -n 5 lines.txt
i=1
i=2
i=3
i=4
i=5
$ mlr --ofmt '%.9le' --opprint put '$t=systime()' then step -a delta -f t lines.txt | head -n 7
i t t_delta
1 1430603027.018016 1.430603027e+09
2 1430603027.018043 2.694129944e-05
3 1430603027.018048 5.006790161e-06
4 1430603027.018052 4.053115845e-06
5 1430603027.018055 2.861022949e-06
6 1430603027.018058 3.099441528e-06
$ mlr --ofmt '%.9le' --oxtab \
put '$t=systime()' then \
step -a delta -f t then \
filter '$i>1' then \
stats1 -a min,mean,max -f t_delta \
lines.txt
t_delta_min 2.861022949e-06
t_delta_mean 4.077508505e-06
t_delta_max 5.388259888e-05
Computing interquartile ranges
----------------------------------------------------------------
For one or more specified field names, simply compute p25 and p75, then write the IQR as the difference of p75 and p25:
::
$ mlr --oxtab stats1 -f x -a p25,p75 \
then put '$x_iqr = $x_p75 - $x_p25' \
data/medium
x_p25 0.246670
x_p75 0.748186
x_iqr 0.501516
For wildcarded field names, first compute p25 and p75, then loop over field names with ``p25`` in them:
::
$ mlr --oxtab stats1 --fr '[i-z]' -a p25,p75 \
then put 'for (k,v in $*) {
if (k =~ "(.*)_p25") {
$["\1_iqr"] = $["\1_p75"] - $["\1_p25"]
}
}' \
data/medium
i_p25 2501
i_p75 7501
x_p25 0.246670
x_p75 0.748186
y_p25 0.252137
y_p75 0.764003
i_iqr 5000
x_iqr 0.501516
y_iqr 0.511866
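For intuition, here is a small Python sketch of a non-interpolated percentile and the IQR built from it. The indexing rule (take the element at index ``int(p/100 * n)`` of the sorted values, clamped to the valid range) is an assumption of this sketch rather than a statement of Miller's implementation, although it does reproduce ``i_p25`` and ``i_p75`` above if ``i`` takes each value from 1 through 10000 once.

```python
def percentile(sorted_values, p):
    """Non-interpolated percentile: element at index int(p/100 * n), clamped."""
    n = len(sorted_values)
    index = int(p / 100.0 * n)
    index = max(0, min(index, n - 1))
    return sorted_values[index]

def iqr(values):
    """Interquartile range: p75 minus p25."""
    s = sorted(values)
    return percentile(s, 75) - percentile(s, 25)

# Assumed input: the integers 1..10000, like the i field described above
print(iqr(range(1, 10001)))  # prints 5000
```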
Computing weighted means
----------------------------------------------------------------
This might be more elegantly implemented as an option within the ``stats1`` verb. Meanwhile, it's expressible within the DSL:
::
$ mlr --from data/medium put -q '
# Using the y field for weighting in this example
weight = $y;
# Using the a field for weighted aggregation in this example
@sumwx[$a] += weight * $i;
@sumw[$a] += weight;
@sumx[$a] += $i;
@sumn[$a] += 1;
end {
map wmean = {};
map mean = {};
for (a in @sumwx) {
wmean[a] = @sumwx[a] / @sumw[a]
}
for (a in @sumx) {
mean[a] = @sumx[a] / @sumn[a]
}
#emit wmean, "a";
#emit mean, "a";
emit (wmean, mean), "a";
}'
a=pan,wmean=4979.563722,mean=5028.259010
a=eks,wmean=4890.381593,mean=4956.290076
a=wye,wmean=4946.987746,mean=4920.001017
a=zee,wmean=5164.719685,mean=5123.092330
a=hat,wmean=4925.533162,mean=4967.743946
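The accumulate-then-divide pattern used above translates directly to other languages. The following Python sketch mirrors the four running sums; the function name ``grouped_means`` and the three sample records are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
from collections import defaultdict

def grouped_means(records, group_key, value_key, weight_key):
    """Return {group: (weighted_mean, plain_mean)} in one pass over the records."""
    sumwx = defaultdict(float)  # sum of weight * value
    sumw = defaultdict(float)   # sum of weights
    sumx = defaultdict(float)   # sum of values
    count = defaultdict(int)    # number of records
    for r in records:
        g, x, w = r[group_key], r[value_key], r[weight_key]
        sumwx[g] += w * x
        sumw[g] += w
        sumx[g] += x
        count[g] += 1
    return {g: (sumwx[g] / sumw[g], sumx[g] / count[g]) for g in sumwx}

# Hypothetical records, not taken from data/medium
records = [
    {"a": "pan", "i": 1, "y": 1.0},
    {"a": "pan", "i": 3, "y": 3.0},
    {"a": "eks", "i": 2, "y": 0.5},
]
print(grouped_means(records, "a", "i", "y"))
```

For ``pan`` the weighted mean is (1*1 + 3*3) / (1 + 3) = 2.5 while the plain mean is 2.0, which shows how the heavier weight pulls the result toward the larger value.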
Generating random numbers from various distributions
----------------------------------------------------------------
Here we can chain together a few simple building blocks:
::
$ cat expo-sample.sh
# Generate 100,000 pairs of independent and identically distributed
# exponentially distributed random variables with the same rate parameter
# (namely, 2.5). Then compute histograms of one of them, along with
# histograms for their sum and their product.
#
# See also https://en.wikipedia.org/wiki/Exponential_distribution
#
# Here I'm using a specified random-number seed so this example always
# produces the same output for this web document: in everyday practice we
# wouldn't do that.
mlr -n \
--seed 0 \
--opprint \
seqgen --stop 100000 \
then put '
# https://en.wikipedia.org/wiki/Inverse_transform_sampling
func expo_sample(lambda) {
return -log(1-urand())/lambda
}
$u = expo_sample(2.5);
$v = expo_sample(2.5);
$s = $u + $v;
$p = $u * $v;
' \
then histogram -f u,s,p --lo 0 --hi 2 --nbins 50 \
then bar -f u_count,s_count,p_count --auto -w 20
Namely:
* Set the Miller random-number seed so this webdoc looks the same every time I regenerate it.
* Use pretty-printed tabular output.
* Use ``seqgen`` to produce 100,000 records ``i=1``, ``i=2``, etc.
* Send those to a ``put`` step which defines an inverse-transform-sampling function and calls it twice, then computes the sum and product of samples.
* Send those to a histogram, and from there to a bar-plotter. This is just for visualization; you could just as well output CSV and send that off to your own plotting tool, etc.
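The inverse-transform-sampling step can be checked independently of Miller. The Python sketch below uses the same formula as ``expo_sample`` in the script; the use of Python's ``random`` module and the seed value are assumptions of this sketch, so the numbers will not match Miller's output sample for sample.

```python
import math
import random

def expo_sample(rate, rng):
    """If U ~ Uniform[0,1), then -ln(1-U)/rate is Exponential(rate)."""
    return -math.log(1.0 - rng.random()) / rate

rng = random.Random(0)
samples = [expo_sample(2.5, rng) for _ in range(100000)]
# The mean of an exponential distribution with rate r is 1/r, so expect about 0.4
print(sum(samples) / len(samples))
```

Using ``1 - U`` rather than ``U`` keeps the argument of the logarithm strictly positive, since ``random()`` can return 0 but never 1.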
The output is as follows:
::
$ sh expo-sample.sh
bin_lo bin_hi u_count s_count p_count
0.000000 0.040000 [78]*******************#[9497] [353]#...................[3732] [20]*******************#[39755]
0.040000 0.080000 [78]******************..[9497] [353]*****...............[3732] [20]*******.............[39755]
0.080000 0.120000 [78]****************....[9497] [353]*********...........[3732] [20]****................[39755]
0.120000 0.160000 [78]**************......[9497] [353]************........[3732] [20]***.................[39755]
0.160000 0.200000 [78]*************.......[9497] [353]**************......[3732] [20]**..................[39755]
0.200000 0.240000 [78]************........[9497] [353]****************....[3732] [20]*...................[39755]
0.240000 0.280000 [78]**********..........[9497] [353]******************..[3732] [20]*...................[39755]
0.280000 0.320000 [78]**********..........[9497] [353]******************..[3732] [20]*...................[39755]
0.320000 0.360000 [78]*********...........[9497] [353]*******************.[3732] [20]#...................[39755]
0.360000 0.400000 [78]********............[9497] [353]*******************.[3732] [20]#...................[39755]
0.400000 0.440000 [78]*******.............[9497] [353]*******************#[3732] [20]#...................[39755]
0.440000 0.480000 [78]******..............[9497] [353]******************..[3732] [20]#...................[39755]
0.480000 0.520000 [78]*****...............[9497] [353]******************..[3732] [20]#...................[39755]
0.520000 0.560000 [78]*****...............[9497] [353]******************..[3732] [20]#...................[39755]
0.560000 0.600000 [78]****................[9497] [353]*****************...[3732] [20]#...................[39755]
0.600000 0.640000 [78]****................[9497] [353]*****************...[3732] [20]#...................[39755]
0.640000 0.680000 [78]****................[9497] [353]****************....[3732] [20]#...................[39755]
0.680000 0.720000 [78]***.................[9497] [353]****************....[3732] [20]#...................[39755]
0.720000 0.760000 [78]***.................[9497] [353]**************......[3732] [20]#...................[39755]
0.760000 0.800000 [78]**..................[9497] [353]**************......[3732] [20]#...................[39755]
0.800000 0.840000 [78]**..................[9497] [353]*************.......[3732] [20]#...................[39755]
0.840000 0.880000 [78]**..................[9497] [353]************........[3732] [20]#...................[39755]
0.880000 0.920000 [78]**..................[9497] [353]***********.........[3732] [20]#...................[39755]
0.920000 0.960000 [78]*...................[9497] [353]***********.........[3732] [20]#...................[39755]
0.960000 1.000000 [78]*...................[9497] [353]**********..........[3732] [20]#...................[39755]
1.000000 1.040000 [78]*...................[9497] [353]*********...........[3732] [20]#...................[39755]
1.040000 1.080000 [78]*...................[9497] [353]*********...........[3732] [20]#...................[39755]
1.080000 1.120000 [78]*...................[9497] [353]********............[3732] [20]#...................[39755]
1.120000 1.160000 [78]*...................[9497] [353]********............[3732] [20]#...................[39755]
1.160000 1.200000 [78]#...................[9497] [353]*******.............[3732] [20]#...................[39755]
1.200000 1.240000 [78]#...................[9497] [353]******..............[3732] [20]#...................[39755]
1.240000 1.280000 [78]#...................[9497] [353]*****...............[3732] [20]#...................[39755]
1.280000 1.320000 [78]#...................[9497] [353]*****...............[3732] [20]#...................[39755]
1.320000 1.360000 [78]#...................[9497] [353]*****...............[3732] [20]#...................[39755]
1.360000 1.400000 [78]#...................[9497] [353]****................[3732] [20]#...................[39755]
1.400000 1.440000 [78]#...................[9497] [353]****................[3732] [20]#...................[39755]
1.440000 1.480000 [78]#...................[9497] [353]***.................[3732] [20]#...................[39755]
1.480000 1.520000 [78]#...................[9497] [353]***.................[3732] [20]#...................[39755]
1.520000 1.560000 [78]#...................[9497] [353]***.................[3732] [20]#...................[39755]
1.560000 1.600000 [78]#...................[9497] [353]**..................[3732] [20]#...................[39755]
1.600000 1.640000 [78]#...................[9497] [353]**..................[3732] [20]#...................[39755]
1.640000 1.680000 [78]#...................[9497] [353]*...................[3732] [20]#...................[39755]
1.680000 1.720000 [78]#...................[9497] [353]*...................[3732] [20]#...................[39755]
1.720000 1.760000 [78]#...................[9497] [353]*...................[3732] [20]#...................[39755]
1.760000 1.800000 [78]#...................[9497] [353]*...................[3732] [20]#...................[39755]
1.800000 1.840000 [78]#...................[9497] [353]#...................[3732] [20]#...................[39755]
1.840000 1.880000 [78]#...................[9497] [353]#...................[3732] [20]#...................[39755]
1.880000 1.920000 [78]#...................[9497] [353]#...................[3732] [20]#...................[39755]
1.920000 1.960000 [78]#...................[9497] [353]#...................[3732] [20]#...................[39755]
1.960000 2.000000 [78]#...................[9497] [353]#...................[3732] [20]#...................[39755]
Sieve of Eratosthenes
----------------------------------------------------------------
The `Sieve of Eratosthenes <http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes>`_ is a standard introductory programming topic. The idea is to find all primes up to some *N* by making a list of the numbers 1 to *N*, then striking out all multiples of 2 except 2 itself, all multiples of 3 except 3 itself, all multiples of 4 except 4 itself, and so on. Whatever survives that without getting marked is a prime. This is easy enough in Miller. Notice that here all the work is in ``begin`` and ``end`` statements; there is no file input (so we use ``mlr -n`` to keep Miller from waiting for input data).
::
$ cat programs/sieve.mlr
# ================================================================
# Sieve of Eratosthenes: simple example of Miller DSL as programming language.
# ================================================================
# Put this in a begin-block so we can do either
# mlr -n put -q -f name-of-this-file.mlr
# or
# mlr -n put -q -f name-of-this-file.mlr -e '@n = 200'
# i.e. 100 is the default upper limit, and another can be specified using -e.
begin {
@n = 100;
}
end {
for (int i = 0; i <= @n; i += 1) {
@s[i] = true;
}
@s[0] = false; # 0 is neither prime nor composite
@s[1] = false; # 1 is neither prime nor composite
# Strike out multiples
for (int i = 2; i <= @n; i += 1) {
for (int j = i+i; j <= @n; j += i) {
@s[j] = false;
}
}
# Print survivors
for (int i = 0; i <= @n; i += 1) {
if (@s[i]) {
print i;
}
}
}
::
$ mlr -n put -f programs/sieve.mlr
2
3
5
7
11
13
17
19
23
29
31
37
41
43
47
53
59
61
67
71
73
79
83
89
97
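For comparison, here is the same algorithm as a Python sketch. It mirrors the structure of ``sieve.mlr`` (initialize, strike out multiples, collect survivors); the function name ``primes_up_to`` is an assumption of this sketch.

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: return all primes <= n."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    # Strike out multiples of each i, starting from 2*i so i itself survives
    for i in range(2, n + 1):
        for j in range(i + i, n + 1, i):
            is_prime[j] = False
    # Collect survivors
    return [i for i, flag in enumerate(is_prime) if flag]

print(primes_up_to(100))
```

A common refinement, not used here so that the sketch stays parallel to the Miller program, is to stop the outer loop once ``i * i`` exceeds ``n`` and to skip values of ``i`` that are already struck out.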
Mandelbrot-set generator
----------------------------------------------------------------
The `Mandelbrot set <http://en.wikipedia.org/wiki/Mandelbrot_set>`_ is also easily expressed. This isn't an important case of data-processing in the vein for which Miller was designed, but it is an example of Miller as a general-purpose programming language -- a test case for the expressiveness of the language.
The (approximate) computation of points in the complex plane which are and aren't members is just a few lines of complex arithmetic (see the Wikipedia article); how to render them is another task. Using graphics libraries you can create PNG or JPEG files, but another fun way to do this is by printing various characters to the screen:
::
$ cat programs/mand.mlr
# Mandelbrot set generator: simple example of Miller DSL as programming language.
begin {
# Set defaults
@rcorn = -2.0;
@icorn = -2.0;
@side = 4.0;
@iheight = 50;
@iwidth = 100;
@maxits = 100;
@levelstep = 5;
@chars = "@X*o-."; # Palette of characters to print to the screen.
@verbose = false;
@do_julia = false;
@jr = 0.0; # Real part of Julia point, if any
@ji = 0.0; # Imaginary part of Julia point, if any
}
# Here, we can override defaults from an input file (if any). In Miller's
# put/filter DSL, absent-null right-hand sides result in no assignment so we
# can simply put @rcorn = $rcorn: if there is a field in the input like
# 'rcorn = -1.847' we'll read and use it, else we'll keep the default.
@rcorn = $rcorn;
@icorn = $icorn;
@side = $side;
@iheight = $iheight;
@iwidth = $iwidth;
@maxits = $maxits;
@levelstep = $levelstep;
@chars = $chars;
@verbose = $verbose;
@do_julia = $do_julia;
@jr = $jr;
@ji = $ji;
end {
if (@verbose) {
print "RCORN = ".@rcorn;
print "ICORN = ".@icorn;
print "SIDE = ".@side;
print "IHEIGHT = ".@iheight;
print "IWIDTH = ".@iwidth;
print "MAXITS = ".@maxits;
print "LEVELSTEP = ".@levelstep;
print "CHARS = ".@chars;
}
# Iterate over a matrix of rows and columns, printing one character for each cell.
for (int ii = @iheight-1; ii >= 0; ii -= 1) {
num pi = @icorn + (ii/@iheight) * @side;
for (int ir = 0; ir < @iwidth; ir += 1) {
num pr = @rcorn + (ir/@iwidth) * @side;
printn get_point_plot(pr, pi, @maxits, @do_julia, @jr, @ji);
}
print;
}
}
# This is a function to approximate membership in the Mandelbrot set (or Julia
# set for a given Julia point if do_julia == true) for a given point in the
# complex plane.
func get_point_plot(pr, pi, maxits, do_julia, jr, ji) {
num zr = 0.0;
num zi = 0.0;
num cr = 0.0;
num ci = 0.0;
if (!do_julia) {
zr = 0.0;
zi = 0.0;
cr = pr;
ci = pi;
} else {
zr = pr;
zi = pi;
cr = jr;
ci = ji;
}
int iti = 0;
bool escaped = false;
num zt = 0;
for (iti = 0; iti < maxits; iti += 1) {
num mag = zr*zr + zi*zi;
if (mag > 4.0) {
escaped = true;
break;
}
# z := z^2 + c
zt = zr*zr - zi*zi + cr;
zi = 2*zr*zi + ci;
zr = zt;
}
if (!escaped) {
return ".";
} else {
# The // operator is Miller's (pythonic) integer-division operator
int level = (iti // @levelstep) % strlen(@chars);
return substr(@chars, level, level);
}
}
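The core of ``get_point_plot`` is the escape-time iteration: repeat z := z^2 + c and stop once the squared magnitude zr^2 + zi^2 exceeds 4 (that is, once \|z\| exceeds 2). A minimal Python sketch of just that loop for the Mandelbrot case follows; the function name ``escape_iteration`` and its return convention are assumptions of this sketch.

```python
def escape_iteration(cr, ci, maxits=100):
    """Return (escaped, iteration) for z := z^2 + c starting from z = 0."""
    zr = zi = 0.0
    for it in range(maxits):
        # Squared magnitude of z; comparing with 4 avoids a square root
        if zr * zr + zi * zi > 4.0:
            return True, it
        # z := z^2 + c, computing both parts from the old zr and zi
        zr, zi = zr * zr - zi * zi + cr, 2.0 * zr * zi + ci
    return False, maxits

print(escape_iteration(0.0, 0.0))  # never escapes: (False, 100)
print(escape_iteration(1.0, 1.0))  # escapes quickly: (True, 2)
```

The tuple assignment plays the role of the temporary ``zt`` in the Miller code: the new imaginary part must be computed from the old real part.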
At standard resolution this makes a nice little ASCII plot:
::
$ mlr -n put -f ./programs/mand.mlr
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@XXXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@XXXX.XXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@XXXXXXXooXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@XXXXX**o..*XXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@XXXXXX*-....-oXXXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@XXXXX@XXXXXXXXXX*......o*XXXXXXXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@XXXXXXXXXX**oo*-.-........oo.XXXXXXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@XXXXXXXXXXXXX....................X..o-XXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@XXXXXXXXXXXXXXX*oo......................oXXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@XXX*XXXXXXXXXXXX**o........................*X*X@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@XXXXXXooo***o*.*XX**X..........................o-XX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@XXXXXXXX*-.......-***.............................oXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@XXXXXXXX*@..........Xo............................*XX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@XXXX@XXXXXXXX*o@oX...........@...........................oXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
.........................................................o*XXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@XXXXXXXXX*-.oX...........@...........................oXXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@XXXXXXXXXX**@..........*o............................*XXXXXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@XXXXXXXXXXXXX-........***.............................oXXXXXXXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@XXXXXXXXXXXXoo****o*.XX***@..........................o-XXXXXXXXXXXXX@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@XXXXX*XXXX*XXXXXXX**-........................***XXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@XXXXXXXXXXXXX*o*.....................@o*XXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@XXXXXXXXXXXX*....................*..o-XX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@XXXXX*ooo*-.o........oo.X*XXXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@XXXXXXXXX**@.....*XXXXXXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@XXXXXXXXX*o....-o*XXXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@XXXXXXXXXXo*o..*XXXXXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@XXXXXXXXXXXXX*o*XXXXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@XXXXXXXXXXXX@XXXXXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@XXXXXXXXX@@XXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@XXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
But using a very small font size (as small as my Mac will let me go), and by choosing the coordinates to zoom in on a particular part of the complex plane, we can get a nice little picture:
::
#!/bin/bash
# Get the number of rows and columns from the terminal window dimensions
iheight=$(stty size | mlr --nidx --fs space cut -f 1)
iwidth=$(stty size | mlr --nidx --fs space cut -f 2)
echo "rcorn=-1.755350,icorn=+0.014230,side=0.000020,maxits=10000,iheight=$iheight,iwidth=$iwidth" \
| mlr put -f programs/mand.mlr
.. image:: pix/mand.png

View file

@ -1,321 +0,0 @@
..
PLEASE DO NOT EDIT DIRECTLY. EDIT THE .rst.in FILE PLEASE.
Cookbook part 3: Stats with and without out-of-stream variables
================================================================
Overview
----------------------------------------------------------------
One of Miller's strengths is its compact notation: for example, given input of the form
::
$ head -n 5 ../data/medium
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
you can simply do
::
$ mlr --oxtab stats1 -a sum -f x ../data/medium
x_sum 4986.019682
or
::
$ mlr --opprint stats1 -a sum -f x -g b ../data/medium
b x_sum
pan 965.763670
wye 1023.548470
zee 979.742016
eks 1016.772857
hat 1000.192668
rather than the more tedious
::
$ mlr --oxtab put -q '
@x_sum += $x;
end {
emit @x_sum
}
' data/medium
x_sum 4986.019682
or
::
$ mlr --opprint put -q '
@x_sum[$b] += $x;
end {
emit @x_sum, "b"
}
' data/medium
b x_sum
pan 965.763670
wye 1023.548470
zee 979.742016
eks 1016.772857
hat 1000.192668
The former (``mlr stats1`` et al.) has the advantages of being easier to type, being less error-prone to type, and running faster.
Nonetheless, out-of-stream variables (which I whimsically call *oosvars*), begin/end blocks, and emit statements let you implement logic, if you wish to do so, which isn't present in other Miller verbs. (If you find yourself using the same out-of-stream-variable logic over and over, please file a request at https://github.com/johnkerl/miller/issues to get it implemented directly in C as a Miller verb of its own.)
The following examples compute some things using oosvars which are already computable using Miller verbs, by way of providing food for thought.
Mean without/with oosvars
----------------------------------------------------------------
::
$ mlr --opprint stats1 -a mean -f x data/medium
x_mean
0.498602
::
$ mlr --opprint put -q '
@x_sum += $x;
@x_count += 1;
end {
@x_mean = @x_sum / @x_count;
emit @x_mean
}
' data/medium
x_mean
0.498602
Keyed mean without/with oosvars
----------------------------------------------------------------
::
$ mlr --opprint stats1 -a mean -f x -g a,b data/medium
a b x_mean
pan pan 0.513314
eks pan 0.485076
wye wye 0.491501
eks wye 0.483895
wye pan 0.499612
zee pan 0.519830
eks zee 0.495463
zee wye 0.514267
hat wye 0.493813
pan wye 0.502362
zee eks 0.488393
hat zee 0.509999
hat eks 0.485879
wye hat 0.497730
pan eks 0.503672
eks eks 0.522799
hat hat 0.479931
hat pan 0.464336
zee zee 0.512756
pan hat 0.492141
pan zee 0.496604
zee hat 0.467726
wye zee 0.505907
eks hat 0.500679
wye eks 0.530604
::
$ mlr --opprint put -q '
@x_sum[$a][$b] += $x;
@x_count[$a][$b] += 1;
end{
for ((a, b), v in @x_sum) {
@x_mean[a][b] = @x_sum[a][b] / @x_count[a][b];
}
emit @x_mean, "a", "b"
}
' data/medium
a b x_mean
pan pan 0.513314
pan wye 0.502362
pan eks 0.503672
pan hat 0.492141
pan zee 0.496604
eks pan 0.485076
eks wye 0.483895
eks zee 0.495463
eks eks 0.522799
eks hat 0.500679
wye wye 0.491501
wye pan 0.499612
wye hat 0.497730
wye zee 0.505907
wye eks 0.530604
zee pan 0.519830
zee wye 0.514267
zee eks 0.488393
zee zee 0.512756
zee hat 0.467726
hat wye 0.493813
hat zee 0.509999
hat eks 0.485879
hat hat 0.479931
hat pan 0.464336
Variance and standard deviation without/with oosvars
----------------------------------------------------------------
::
$ mlr --oxtab stats1 -a count,sum,mean,var,stddev -f x data/medium
x_count 10000
x_sum 4986.019682
x_mean 0.498602
x_var 0.084270
x_stddev 0.290293
::
$ cat variance.mlr
@n += 1;
@sumx += $x;
@sumx2 += $x**2;
end {
@mean = @sumx / @n;
@var = (@sumx2 - @mean * (2 * @sumx - @n * @mean)) / (@n - 1);
@stddev = sqrt(@var);
emitf @n, @sumx, @sumx2, @mean, @var, @stddev
}
::
$ mlr --oxtab put -q -f variance.mlr data/medium
n 10000
sumx 4986.019682
sumx2 3328.652400
mean 0.498602
var 0.084270
stddev 0.290293
You can also do this keyed, of course, imitating the keyed-mean example above.
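The variance expression in ``variance.mlr`` follows from expanding the sum of squared deviations: with mean m, the sum of (x - m)^2 equals sumx2 - 2*m*sumx + n*m^2, which factors as sumx2 - m*(2*sumx - n*m). The Python sketch below accumulates the same three running sums and compares the result with the standard library; the function name ``streaming_stats`` is an assumption of this sketch, and the inputs are the five ``x`` values from ``data/small`` shown elsewhere in this document.

```python
import math
import statistics

def streaming_stats(xs):
    """One-pass count, mean, sample variance, and standard deviation."""
    n = 0
    sumx = 0.0
    sumx2 = 0.0
    for x in xs:
        n += 1
        sumx += x
        sumx2 += x * x
    mean = sumx / n
    var = (sumx2 - mean * (2.0 * sumx - n * mean)) / (n - 1)
    return n, mean, var, math.sqrt(var)

xs = [0.3467901443380824, 0.7586799647899636, 0.20460330576630303,
      0.38139939387114097, 0.5732889198020006]
n, mean, var, stddev = streaming_stats(xs)
# The two variance figures should agree up to floating-point rounding
print(var, statistics.variance(xs))
```

This sum-of-squares form is convenient for streaming, but it can lose precision when the mean is large relative to the spread; that is a general property of the formula, not something specific to Miller.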
Min/max without/with oosvars
----------------------------------------------------------------
::
$ mlr --oxtab stats1 -a min,max -f x data/medium
x_min 0.000045
x_max 0.999953
::
$ mlr --oxtab put -q '@x_min = min(@x_min, $x); @x_max = max(@x_max, $x); end{emitf @x_min, @x_max}' data/medium
x_min 0.000045
x_max 0.999953
Keyed min/max without/with oosvars
----------------------------------------------------------------
::
$ mlr --opprint stats1 -a min,max -f x -g a data/medium
a x_min x_max
pan 0.000204 0.999403
eks 0.000692 0.998811
wye 0.000187 0.999823
zee 0.000549 0.999490
hat 0.000045 0.999953
::
$ mlr --opprint --from data/medium put -q '
@min[$a] = min(@min[$a], $x);
@max[$a] = max(@max[$a], $x);
end{
emit (@min, @max), "a";
}
'
a min max
pan 0.000204 0.999403
eks 0.000692 0.998811
wye 0.000187 0.999823
zee 0.000549 0.999490
hat 0.000045 0.999953
Delta without/with oosvars
----------------------------------------------------------------
::
$ mlr --opprint step -a delta -f x data/small
a b i x y x_delta
pan pan 1 0.3467901443380824 0.7268028627434533 0
eks pan 2 0.7586799647899636 0.5221511083334797 0.411890
wye wye 3 0.20460330576630303 0.33831852551664776 -0.554077
eks wye 4 0.38139939387114097 0.13418874328430463 0.176796
wye pan 5 0.5732889198020006 0.8636244699032729 0.191890
::
$ mlr --opprint put '$x_delta = is_present(@last) ? $x - @last : 0; @last = $x' data/small
a b i x y x_delta
pan pan 1 0.3467901443380824 0.7268028627434533 0
eks pan 2 0.7586799647899636 0.5221511083334797 0.411890
wye wye 3 0.20460330576630303 0.33831852551664776 -0.554077
eks wye 4 0.38139939387114097 0.13418874328430463 0.176796
wye pan 5 0.5732889198020006 0.8636244699032729 0.191890
Keyed delta without/with oosvars
----------------------------------------------------------------
::
$ mlr --opprint step -a delta -f x -g a data/small
a b i x y x_delta
pan pan 1 0.3467901443380824 0.7268028627434533 0
eks pan 2 0.7586799647899636 0.5221511083334797 0
wye wye 3 0.20460330576630303 0.33831852551664776 0
eks wye 4 0.38139939387114097 0.13418874328430463 -0.377281
wye pan 5 0.5732889198020006 0.8636244699032729 0.368686
::
$ mlr --opprint put '$x_delta = is_present(@last[$a]) ? $x - @last[$a] : 0; @last[$a]=$x' data/small
a b i x y x_delta
pan pan 1 0.3467901443380824 0.7268028627434533 0
eks pan 2 0.7586799647899636 0.5221511083334797 0
wye wye 3 0.20460330576630303 0.33831852551664776 0
eks wye 4 0.38139939387114097 0.13418874328430463 -0.377281
wye pan 5 0.5732889198020006 0.8636244699032729 0.368686
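The keyed version simply keeps one "last value" per group. As a Python sketch (the function name ``keyed_deltas`` is an assumption; the records are the ``a`` and ``x`` fields of ``data/small`` as shown above):

```python
def keyed_deltas(records, value_key, group_key):
    """Difference from the previous value in the same group; 0 for a group's first record."""
    last = {}
    out = []
    for r in records:
        g, x = r[group_key], r[value_key]
        out.append(x - last[g] if g in last else 0.0)
        last[g] = x
    return out

records = [
    {"a": "pan", "x": 0.3467901443380824},
    {"a": "eks", "x": 0.7586799647899636},
    {"a": "wye", "x": 0.20460330576630303},
    {"a": "eks", "x": 0.38139939387114097},
    {"a": "wye", "x": 0.5732889198020006},
]
print(keyed_deltas(records, "x", "a"))
```

The dictionary membership test plays the role of ``is_present(@last[$a])`` in the Miller expression.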
Exponentially weighted moving averages without/with oosvars
----------------------------------------------------------------
::
$ mlr --opprint step -a ewma -d 0.1 -f x data/small
a b i x y x_ewma_0.1
pan pan 1 0.3467901443380824 0.7268028627434533 0.346790
eks pan 2 0.7586799647899636 0.5221511083334797 0.387979
wye wye 3 0.20460330576630303 0.33831852551664776 0.369642
eks wye 4 0.38139939387114097 0.13418874328430463 0.370817
wye pan 5 0.5732889198020006 0.8636244699032729 0.391064
::
$ mlr --opprint put '
begin{ @a=0.1 };
$e = NR==1 ? $x : @a * $x + (1 - @a) * @e;
@e=$e
' data/small
a b i x y e
pan pan 1 0.3467901443380824 0.7268028627434533 0.346790
eks pan 2 0.7586799647899636 0.5221511083334797 0.387979
wye wye 3 0.20460330576630303 0.33831852551664776 0.369642
eks wye 4 0.38139939387114097 0.13418874328430463 0.370817
wye pan 5 0.5732889198020006 0.8636244699032729 0.391064
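The recurrence is short enough to verify by hand or in a few lines of Python. In this sketch the function name ``ewma`` is an assumption; the smoothing factor 0.1 and the inputs are the ones used above.

```python
def ewma(xs, alpha):
    """Exponentially weighted moving average, seeded with the first value."""
    out = []
    prev = None
    for x in xs:
        prev = x if prev is None else alpha * x + (1.0 - alpha) * prev
        out.append(prev)
    return out

xs = [0.3467901443380824, 0.7586799647899636, 0.20460330576630303,
      0.38139939387114097, 0.5732889198020006]
for value in ewma(xs, 0.1):
    print("%.6f" % value)
```

A smaller ``alpha`` weights history more heavily and so smooths more; ``alpha = 1`` reproduces the input unchanged.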

View file

@ -1,92 +0,0 @@
..
PLEASE DO NOT EDIT DIRECTLY. EDIT THE .rst.in FILE PLEASE.
Customization: .mlrrc
================================================================
How to use .mlrrc
----------------------------------------------------------------
Suppose you always use CSV files. Then instead of always having to type ``--csv`` as in
::
mlr --csv cut -x -f extra mydata.csv
::
mlr --csv sort -n id mydata.csv
and so on, you can instead put the following into your ``$HOME/.mlrrc``:
::
--csv
Then you can just type things like
::
mlr cut -x -f extra mydata.csv
::
mlr sort -n id mydata.csv
and the ``--csv`` part will automatically be understood. (If you do want to process, say, a JSON file then ``mlr --json ...`` at the command line will override the default from your ``.mlrrc``.)
What you can put in your .mlrrc
----------------------------------------------------------------
* You can include any command-line flags, except the "terminal" ones such as ``--help``.
* The formatting rule is one flag, beginning with ``--``, per line: for example, ``--csv`` on one line and ``--nr-progress-mod 1000`` on a separate line.
* Since every line starts with a ``--`` option, you can leave off the initial ``--`` if you want. For example, ``ojson`` is the same as ``--ojson``, and ``nr-progress-mod 1000`` is the same as ``--nr-progress-mod 1000``.
* Comments are from a ``#`` to the end of the line.
* Empty lines are ignored -- including lines which are empty after comments are removed.
Here is an example ``.mlrrc`` file:
::
# These are my preferred default settings for Miller
# Input and output formats are CSV by default (unless otherwise specified
# on the mlr command line):
csv
# If a data line has fewer fields than the header line, instead of erroring
# (which is the default), just insert empty values for the missing ones:
allow-ragged-csv-input
# These are no-ops for CSV, but when I do use JSON output, I want these
# pretty-printing options to be used:
jvstack
jlistwrap
# Use "@", rather than "#", for comments within data files:
skip-comments-with @
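The formatting rules above can be sketched as a small parser. This Python is an illustration of the rules as stated in this section, not Miller's actual ``.mlrrc`` reader; the function name ``mlrrc_text_to_flags`` is hypothetical, and edge cases such as a literal ``#`` inside a flag value are deliberately ignored.

```python
def mlrrc_text_to_flags(text):
    """Turn .mlrrc text into an argv-style list of flags and their arguments."""
    args = []
    for line in text.splitlines():
        # Comments run from '#' to end of line
        line = line.split("#", 1)[0].strip()
        # Skip lines that are empty, including those empty after comment removal
        if not line:
            continue
        # The leading '--' is optional in the file
        if not line.startswith("--"):
            line = "--" + line
        args.extend(line.split())
    return args

example = """
# Input and output formats are CSV by default
csv
allow-ragged-csv-input
jvstack
jlistwrap
skip-comments-with @
"""
print(mlrrc_text_to_flags(example))
```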
Where to put your .mlrrc
----------------------------------------------------------------
* If the environment variable ``MLRRC`` is set:
  * If its value is ``__none__`` then no ``.mlrrc`` files are processed. (This is nice for things like regression testing.)
  * Otherwise, its value (as a filename) is loaded and processed. If there are syntax errors, they abort ``mlr`` with a usage message (as if you had mistyped something on the command line). If the file can't be loaded at all, though, it is silently skipped.
  * Any ``.mlrrc`` in your home directory or current directory is ignored whenever ``MLRRC`` is set in the environment.
  * Example line in your shell's rc file: ``export MLRRC=/path/to/my/mlrrc``
* Otherwise:
  * If ``$HOME/.mlrrc`` exists, it's processed as above.
  * If ``./.mlrrc`` exists, it's then also processed as above.
  * The idea is you can have all your settings in your ``$HOME/.mlrrc``, then override maybe one or two for your current directory if you like.
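The lookup order above can be written down as a small Python function. This is a sketch under stated assumptions rather than Miller's implementation: ``mlrrc_paths`` is a hypothetical name, and "can't be loaded" is approximated by a simple existence check that can be swapped out for testing.

```python
import os


def mlrrc_paths(environ, exists=os.path.exists):
    """Return the .mlrrc files that would be processed, in order.

    Hypothetical helper illustrating the documented lookup rules.
    """
    if "MLRRC" in environ:
        value = environ["MLRRC"]
        # The special value __none__ disables .mlrrc processing entirely.
        if value == "__none__":
            return []
        # Otherwise only this file is considered; home and current
        # directory are ignored. A file that can't be loaded is skipped.
        return [value] if exists(value) else []

    candidates = []
    if environ.get("HOME"):
        candidates.append(os.path.join(environ["HOME"], ".mlrrc"))
    # The current-directory file comes second, so it can override
    # settings from the home-directory file.
    candidates.append(os.path.join(".", ".mlrrc"))
    return [path for path in candidates if exists(path)]


print(mlrrc_paths(dict(os.environ)))
```

Because the current-directory file is processed after the home-directory one, its settings are the ones that win when both set the same option.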

View file

@ -1,194 +0,0 @@
..
PLEASE DO NOT EDIT DIRECTLY. EDIT THE .rst.in FILE PLEASE.
Data-diving examples
================================================================
flins data
----------------------------------------------------------------
The `flins.csv <data/flins.csv>`_ file is some sample data obtained from https://support.spatialkey.com/spatialkey-sample-csv-data.
Vertical-tabular format is good for a quick look at CSV data layout -- seeing what columns you have to work with:
::
$ head -n 2 data/flins.csv | mlr --icsv --oxtab cat
county Seminole
tiv_2011 22890.55
tiv_2012 20848.71
line Residential
A few simple queries:
::
$ mlr --from data/flins.csv --icsv --opprint count-distinct -f county | head
county count
Seminole 1
Miami Dade 2
Palm Beach 1
Highlands 2
Duval 1
St. Johns 1
$ mlr --from data/flins.csv --icsv --opprint count-distinct -f construction,line
Categorization of total insured value:
::
$ mlr --from data/flins.csv --icsv --opprint stats1 -a min,mean,max -f tiv_2012
tiv_2012_min tiv_2012_mean tiv_2012_max
19757.910000 1061531.463750 2785551.630000
$ mlr --from data/flins.csv --icsv --opprint stats1 -a min,mean,max -f tiv_2012 -g construction,line
$ mlr --from data/flins.csv --icsv --oxtab stats1 -a p0,p10,p50,p90,p95,p99,p100 -f hu_site_deductible
hu_site_deductible_p0
hu_site_deductible_p10
hu_site_deductible_p50
hu_site_deductible_p90
hu_site_deductible_p95
hu_site_deductible_p99
hu_site_deductible_p100
$ mlr --from data/flins.csv --icsv --opprint stats1 -a p95,p99,p100 -f hu_site_deductible -g county then sort -f county | head
county hu_site_deductible_p95 hu_site_deductible_p99 hu_site_deductible_p100
Duval - - -
Highlands - - -
Miami Dade - - -
Palm Beach - - -
Seminole - - -
St. Johns - - -
$ mlr --from data/flins.csv --icsv --oxtab stats2 -a corr,linreg-ols,r2 -f tiv_2011,tiv_2012
tiv_2011_tiv_2012_corr 0.935363
tiv_2011_tiv_2012_ols_m 1.089091
tiv_2011_tiv_2012_ols_b 103095.523356
tiv_2011_tiv_2012_ols_n 8
tiv_2011_tiv_2012_r2 0.874904
$ mlr --from data/flins.csv --icsv --opprint stats2 -a corr,linreg-ols,r2 -f tiv_2011,tiv_2012 -g county
county tiv_2011_tiv_2012_corr tiv_2011_tiv_2012_ols_m tiv_2011_tiv_2012_ols_b tiv_2011_tiv_2012_ols_n tiv_2011_tiv_2012_r2
Seminole - - - 1 -
Miami Dade 1.000000 0.930643 -2311.154328 2 1.000000
Palm Beach - - - 1 -
Highlands 1.000000 1.055693 -4529.793939 2 1.000000
Duval - - - 1 -
St. Johns - - - 1 -
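For readers who want to see what ``corr``, ``linreg-ols``, and ``r2`` mean, here is a minimal Python sketch assuming the usual textbook definitions (Pearson correlation and a least-squares straight line with intercept). The function name ``fit_stats`` and the sample values are made up for illustration; Miller's own numerics may differ in detail.

```python
import math


def fit_stats(xs, ys):
    """Return (corr, slope, intercept, n, r2) for paired samples.

    Hypothetical helper using textbook formulas, not Miller's code.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centered sums of squares and cross-products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # corresponds to ..._ols_m
    intercept = mean_y - slope * mean_x    # corresponds to ..._ols_b
    corr = sxy / math.sqrt(sxx * syy)      # Pearson correlation
    r2 = corr ** 2                         # holds for a straight-line fit
    return corr, slope, intercept, n, r2


# Hypothetical values, not taken from flins.csv:
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
print(fit_stats(xs, ys))
```

Two observations line up with the output above. First, ``r2`` is the square of ``corr`` for this kind of fit: 0.935363 squared is about 0.874904. Second, a group with a single record has no spread, so the quantities are undefined (the sketch would divide by zero), which is consistent with the ``-`` entries; a group with exactly two distinct points always fits perfectly, hence the ``1.000000`` values.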
Color/shape data
----------------------------------------------------------------
The `colored-shapes.dkvp <https://github.com/johnkerl/miller/blob/master/docs/data/colored-shapes.dkvp>`_ file is some sample data produced by the `mkdat2 <https://github.com/johnkerl/miller/blob/master/doc/datagen/mkdat2>`_ script. The idea is:
* Produce some data with known distributions and correlations, and verify that Miller recovers those properties empirically.
* Each record is labeled with one of a few colors and one of a few shapes.
* The ``flag`` field is 0 or 1, with probability dependent on color.
* The ``u`` field is plain uniform on the unit interval.
* The ``v`` field is the same, except tightly correlated with ``u`` for red circles.
* The ``w`` field is autocorrelated for each color/shape pair.
* The ``x`` field is boring Gaussian with mean 5 and standard deviation about 1.2, with no dependence on color or shape.
Peek at the data:
::
$ wc -l data/colored-shapes.dkvp
10078 data/colored-shapes.dkvp
$ head -n 6 data/colored-shapes.dkvp | mlr --opprint cat
color shape flag i u v w x
yellow triangle 1 11 0.6321695890307647 0.9887207810889004 0.4364983936735774 5.7981881667050565
red square 1 15 0.21966833570651523 0.001257332190235938 0.7927778364718627 2.944117399716207
red circle 1 16 0.20901671281497636 0.29005231936593445 0.13810280912907674 5.065034003400998
red square 0 48 0.9562743938458542 0.7467203085342884 0.7755423050923582 7.117831369597269
purple triangle 0 51 0.4355354501763202 0.8591292672156728 0.8122903963006748 5.753094629505863
red square 0 64 0.2015510269821953 0.9531098083420033 0.7719912015786777 5.612050466474166
Look at uncategorized stats (using `creach <https://github.com/johnkerl/scripts/blob/master/fundam/creach>`_ for spacing).
Here it looks reasonable that ``u`` is unit-uniform; something's up with ``v`` but we can't yet see what:
::
$ mlr --oxtab stats1 -a min,mean,max -f flag,u,v data/colored-shapes.dkvp | creach 3
flag_min 0
flag_mean 0.398889
flag_max 1
u_min 0.000044
u_mean 0.498326
u_max 0.999969
v_min -0.092709
v_mean 0.497787
v_max 1.072500
The histogram shows the different distribution of 0/1 flags:
::
$ mlr --opprint histogram -f flag,u,v --lo -0.1 --hi 1.1 --nbins 12 data/colored-shapes.dkvp
bin_lo bin_hi flag_count u_count v_count
-0.100000 0.000000 6058 0 36
0.000000 0.100000 0 1062 988
0.100000 0.200000 0 985 1003
0.200000 0.300000 0 1024 1014
0.300000 0.400000 0 1002 991
0.400000 0.500000 0 989 1041
0.500000 0.600000 0 1001 1016
0.600000 0.700000 0 972 962
0.700000 0.800000 0 1035 1070
0.800000 0.900000 0 995 993
0.900000 1.000000 4020 1013 939
1.000000 1.100000 0 0 25
Look at univariate stats by color and shape. In particular, color-dependent flag probabilities pop out, aligning with their original Bernoulli probabilities from the data-generator script:
::
$ mlr --opprint stats1 -a min,mean,max -f flag,u,v -g color then sort -f color data/colored-shapes.dkvp
color flag_min flag_mean flag_max u_min u_mean u_max v_min v_mean v_max
blue 0 0.584354 1 0.000044 0.517717 0.999969 0.001489 0.491056 0.999576
green 0 0.209197 1 0.000488 0.504861 0.999936 0.000501 0.499085 0.999676
orange 0 0.521452 1 0.001235 0.490532 0.998885 0.002449 0.487764 0.998475
purple 0 0.090193 1 0.000266 0.494005 0.999647 0.000364 0.497051 0.999975
red 0 0.303167 1 0.000671 0.492560 0.999882 -0.092709 0.496535 1.072500
yellow 0 0.892427 1 0.001300 0.497129 0.999923 0.000711 0.510627 0.999919
$ mlr --opprint stats1 -a min,mean,max -f flag,u,v -g shape then sort -f shape data/colored-shapes.dkvp
shape flag_min flag_mean flag_max u_min u_mean u_max v_min v_mean v_max
circle 0 0.399846 1 0.000044 0.498555 0.999923 -0.092709 0.495524 1.072500
square 0 0.396112 1 0.000188 0.499385 0.999969 0.000089 0.496538 0.999975
triangle 0 0.401542 1 0.000881 0.496859 0.999661 0.000717 0.501050 0.999995
Look at bivariate stats by color and shape. In particular, ``u,v`` pairwise correlation for red circles pops out:
::
$ mlr --opprint --right stats2 -a corr -f u,v,w,x data/colored-shapes.dkvp
u_v_corr w_x_corr
0.133418 -0.011320
$ mlr --opprint --right stats2 -a corr -f u,v,w,x -g color,shape then sort -nr u_v_corr data/colored-shapes.dkvp
color shape u_v_corr w_x_corr
red circle 0.980798 -0.018565
orange square 0.176858 -0.071044
green circle 0.057644 0.011795
red square 0.055745 -0.000680
yellow triangle 0.044573 0.024605
yellow square 0.043792 -0.044623
purple circle 0.035874 0.134112
blue square 0.032412 -0.053508
blue triangle 0.015356 -0.000608
orange circle 0.010519 -0.162795
red triangle 0.008098 0.012486
purple triangle 0.005155 -0.045058
purple square -0.025680 0.057694
green square -0.025776 -0.003265
orange triangle -0.030457 -0.131870
yellow circle -0.064773 0.073695
blue circle -0.102348 -0.030529
green triangle -0.109018 -0.048488

View file

@ -1,319 +0,0 @@
..
PLEASE DO NOT EDIT DIRECTLY. EDIT THE .rst.in FILE PLEASE.
Mixing with other languages
================================================================
As discussed in the section on :doc:`file-formats`, Miller supports several different file formats. Different tools are good at different things, so it's important to be able to move data into and out of other languages. **CSV** and **JSON** are well-known, of course; here are some examples using **DKVP** format, with **Ruby** and **Python**. Last, we show how to use arbitrary **shell commands** to extend functionality beyond Miller's domain-specific language.
DKVP I/O in Python
----------------------------------------------------------------
Here are the I/O routines:
::
#!/usr/bin/env python
# ================================================================
# Example of DKVP I/O using Python.
#
# Key point: Use Miller for what it's good at; pass data into/out of tools in
# other languages to do what they're good at.
#
# bash$ python -i dkvp_io.py
#
# # READ
# >>> map = dkvpline2map('x=1,y=2', '=', ',')
# >>> map
# OrderedDict([('x', 1), ('y', 2)])
#
# # MODIFY
# >>> map['z'] = map['x'] + map['y']
# >>> map
# OrderedDict([('x', 1), ('y', 2), ('z', 3)])
#
# # WRITE
# >>> line = map2dkvpline(map, '=', ',')
# >>> line
# 'x=1,y=2,z=3'
#
# ================================================================
import re
import collections
# ----------------------------------------------------------------
# ips and ifs (input pair separator and input field separator) are nominally '=' and ','.
def dkvpline2map(line, ips, ifs):
    pairs = re.split(ifs, line)
    map = collections.OrderedDict()
    for pair in pairs:
        key, value = re.split(ips, pair, 1)
        # Type inference: try int, then float, else leave as string.
        try:
            value = int(value)
        except ValueError:
            try:
                value = float(value)
            except ValueError:
                pass
        map[key] = value
    return map
# ----------------------------------------------------------------
# ops and ofs (output pair separator and output field separator) are nominally '=' and ','.
def map2dkvpline(map, ops, ofs):
    pairs = []
    for key in map:
        pairs.append(str(key) + ops + str(map[key]))
    return ofs.join(pairs)
And here is an example using them:
::
$ cat polyglot-dkvp-io/example.py
#!/usr/bin/env python
import sys
import re
import copy
import dkvp_io
while True:
    # Read the original record:
    line = sys.stdin.readline().strip()
    if line == '':
        break
    map = dkvp_io.dkvpline2map(line, '=', ',')
    # Drop a field:
    map.pop('x')
    # Compute some new fields:
    map['ab'] = map['a'] + map['b']
    map['iy'] = map['i'] + map['y']
    # Add new fields which show type of each already-existing field:
    omap = copy.copy(map) # since otherwise the for-loop will modify what it loops over
    keys = omap.keys()
    for key in keys:
        # Convert "<type 'int'>" to just "int", etc.:
        type_string = str(map[key].__class__)
        type_string = re.sub("<type '", "", type_string) # python2
        type_string = re.sub("<class '", "", type_string) # python3
        type_string = re.sub("'>", "", type_string)
        map['t'+key] = type_string
    # Write the modified record:
    print(dkvp_io.map2dkvpline(map, '=', ','))
Run as-is:
::
$ python polyglot-dkvp-io/example.py < data/small
a=pan,b=pan,i=1,y=0.7268028627434533,ab=panpan,iy=1.7268028627434533,ta=str,tb=str,ti=int,ty=float,tab=str,tiy=float
a=eks,b=pan,i=2,y=0.5221511083334797,ab=ekspan,iy=2.5221511083334796,ta=str,tb=str,ti=int,ty=float,tab=str,tiy=float
a=wye,b=wye,i=3,y=0.33831852551664776,ab=wyewye,iy=3.3383185255166477,ta=str,tb=str,ti=int,ty=float,tab=str,tiy=float
a=eks,b=wye,i=4,y=0.13418874328430463,ab=ekswye,iy=4.134188743284304,ta=str,tb=str,ti=int,ty=float,tab=str,tiy=float
a=wye,b=pan,i=5,y=0.8636244699032729,ab=wyepan,iy=5.863624469903273,ta=str,tb=str,ti=int,ty=float,tab=str,tiy=float
Run as-is, then pipe to Miller for pretty-printing:
::
$ python polyglot-dkvp-io/example.py < data/small | mlr --opprint cat
a b i y ab iy ta tb ti ty tab tiy
pan pan 1 0.7268028627434533 panpan 1.7268028627434533 str str int float str float
eks pan 2 0.5221511083334797 ekspan 2.5221511083334796 str str int float str float
wye wye 3 0.33831852551664776 wyewye 3.3383185255166477 str str int float str float
eks wye 4 0.13418874328430463 ekswye 4.134188743284304 str str int float str float
wye pan 5 0.8636244699032729 wyepan 5.863624469903273 str str int float str float
DKVP I/O in Ruby
----------------------------------------------------------------
Here are the I/O routines:
::
#!/usr/bin/env ruby
# ================================================================
# Example of DKVP I/O using Ruby.
#
# Key point: Use Miller for what it's good at; pass data into/out of tools in
# other languages to do what they're good at.
#
# bash$ irb -I. -r dkvp_io.rb
#
# # READ
# irb(main):001:0> map = dkvpline2map('x=1,y=2', '=', ',')
# => {"x"=>1, "y"=>2}
#
# # MODIFY
# irb(main):001:0> map['z'] = map['x'] + map['y']
# => 3
#
# # WRITE
# irb(main):002:0> line = map2dkvpline(map, '=', ',')
# => "x=1,y=2,z=3"
#
# ================================================================
# ----------------------------------------------------------------
# ips and ifs (input pair separator and input field separator) are nominally '=' and ','.
def dkvpline2map(line, ips, ifs)
  map = {}
  line.split(ifs).each do |pair|
    (k, v) = pair.split(ips, 2)
    # Type inference:
    begin
      v = Integer(v)
    rescue ArgumentError
      begin
        v = Float(v)
      rescue ArgumentError
        # Leave as string
      end
    end
    map[k] = v
  end
  map
end
# ----------------------------------------------------------------
# ops and ofs (output pair separator and output field separator) are nominally '=' and ','.
def map2dkvpline(map, ops, ofs)
  map.collect{|k,v| k.to_s + ops + v.to_s}.join(ofs)
end
And here is an example using them:
::
$ cat polyglot-dkvp-io/example.rb
#!/usr/bin/env ruby
require 'dkvp_io'
ARGF.each do |line|
  # Read the original record:
  map = dkvpline2map(line.chomp, '=', ',')
  # Drop a field:
  map.delete('x')
  # Compute some new fields:
  map['ab'] = map['a'] + map['b']
  map['iy'] = map['i'] + map['y']
  # Add new fields which show type of each already-existing field:
  keys = map.keys
  keys.each do |key|
    map['t'+key] = map[key].class
  end
  # Write the modified record:
  puts map2dkvpline(map, '=', ',')
end
Run as-is:
::
$ ruby -I./polyglot-dkvp-io polyglot-dkvp-io/example.rb data/small
a=pan,b=pan,i=1,y=0.7268028627434533,ab=panpan,iy=1.7268028627434533,ta=String,tb=String,ti=Integer,ty=Float,tab=String,tiy=Float
a=eks,b=pan,i=2,y=0.5221511083334797,ab=ekspan,iy=2.5221511083334796,ta=String,tb=String,ti=Integer,ty=Float,tab=String,tiy=Float
a=wye,b=wye,i=3,y=0.33831852551664776,ab=wyewye,iy=3.3383185255166477,ta=String,tb=String,ti=Integer,ty=Float,tab=String,tiy=Float
a=eks,b=wye,i=4,y=0.13418874328430463,ab=ekswye,iy=4.134188743284304,ta=String,tb=String,ti=Integer,ty=Float,tab=String,tiy=Float
a=wye,b=pan,i=5,y=0.8636244699032729,ab=wyepan,iy=5.863624469903273,ta=String,tb=String,ti=Integer,ty=Float,tab=String,tiy=Float
Run as-is, then pipe to Miller for pretty-printing:
::
$ ruby -I./polyglot-dkvp-io polyglot-dkvp-io/example.rb data/small | mlr --opprint cat
a b i y ab iy ta tb ti ty tab tiy
pan pan 1 0.7268028627434533 panpan 1.7268028627434533 String String Integer Float String Float
eks pan 2 0.5221511083334797 ekspan 2.5221511083334796 String String Integer Float String Float
wye wye 3 0.33831852551664776 wyewye 3.3383185255166477 String String Integer Float String Float
eks wye 4 0.13418874328430463 ekswye 4.134188743284304 String String Integer Float String Float
wye pan 5 0.8636244699032729 wyepan 5.863624469903273 String String Integer Float String Float
SQL-output examples
----------------------------------------------------------------
Please see :ref:`sql-output-examples`.
SQL-input examples
----------------------------------------------------------------
Please see :ref:`sql-input-examples`.
Running shell commands
----------------------------------------------------------------
The :ref:`reference-dsl-system` DSL function allows you to run a specific shell command and put its output -- minus the final newline -- into a record field. The command itself is any string, either a literal string, or a concatenation of strings, perhaps including other field values or what have you.
::
$ mlr --opprint put '$o = system("echo hello world")' data/small
a b i x y o
pan pan 1 0.3467901443380824 0.7268028627434533 hello world
eks pan 2 0.7586799647899636 0.5221511083334797 hello world
wye wye 3 0.20460330576630303 0.33831852551664776 hello world
eks wye 4 0.38139939387114097 0.13418874328430463 hello world
wye pan 5 0.5732889198020006 0.8636244699032729 hello world
::
$ mlr --opprint put '$o = system("echo {" . NR . "}")' data/small
a b i x y o
pan pan 1 0.3467901443380824 0.7268028627434533 {1}
eks pan 2 0.7586799647899636 0.5221511083334797 {2}
wye wye 3 0.20460330576630303 0.33831852551664776 {3}
eks wye 4 0.38139939387114097 0.13418874328430463 {4}
wye pan 5 0.5732889198020006 0.8636244699032729 {5}
::
$ mlr --opprint put '$o = system("echo -n ".$a."| sha1sum")' data/small
a b i x y o
pan pan 1 0.3467901443380824 0.7268028627434533 f29c748220331c273ef16d5115f6ecd799947f13 -
eks pan 2 0.7586799647899636 0.5221511083334797 456d988ecb3bf1b75f057fc6e9fe70db464e9388 -
wye wye 3 0.20460330576630303 0.33831852551664776 eab0de043d67f441c7fd1e335f0ca38708e6ebf7 -
eks wye 4 0.38139939387114097 0.13418874328430463 456d988ecb3bf1b75f057fc6e9fe70db464e9388 -
wye pan 5 0.5732889198020006 0.8636244699032729 eab0de043d67f441c7fd1e335f0ca38708e6ebf7 -
Note that running a subprocess on every record takes a non-trivial amount of time. Compare asking the system ``date`` command for the current time in nanoseconds with computing it in-process:
..
hard-coded, not live-code, since %N doesn't exist on all platforms
::
$ mlr --opprint put '$t=system("date +%s.%N")' then step -a delta -f t data/small
a b i x y t t_delta
pan pan 1 0.3467901443380824 0.7268028627434533 1568774318.513903817 0
eks pan 2 0.7586799647899636 0.5221511083334797 1568774318.514722876 0.000819
wye wye 3 0.20460330576630303 0.33831852551664776 1568774318.515618046 0.000895
eks wye 4 0.38139939387114097 0.13418874328430463 1568774318.516547441 0.000929
wye pan 5 0.5732889198020006 0.8636244699032729 1568774318.517518828 0.000971
::
$ mlr --opprint put '$t=systime()' then step -a delta -f t data/small
a b i x y t t_delta
pan pan 1 0.3467901443380824 0.7268028627434533 1568774318.518699 0
eks pan 2 0.7586799647899636 0.5221511083334797 1568774318.518717 0.000018
wye wye 3 0.20460330576630303 0.33831852551664776 1568774318.518723 0.000006
eks wye 4 0.38139939387114097 0.13418874328430463 1568774318.518727 0.000004
wye pan 5 0.5732889198020006 0.8636244699032729 1568774318.518730 0.000003

View file

@ -1,9 +0,0 @@
..
PLEASE DO NOT EDIT DIRECTLY. EDIT THE .rst.in FILE PLEASE.
Why call it Miller?
================================================================
The Unix toolkit was created in the **1970s** and is a mainstay to this day. Miller is written in plain C, and its look and feel adheres closely to the `classic toolkit style <http://en.wikipedia.org/wiki/Unix_philosophy>`_: if this were music, Miller would be a **tribute album**. Likewise, since commands are subcommands of the ``mlr`` executable, the result is a **band**, if you will, of command-line tools. Put these together and the namesake is another classic product of the 1970s: the `Steve Miller Band <http://en.wikipedia.org/wiki/Steve%5fMiller%5fBand>`_.
(Additionally, and far more prosaically ... just as a miller is someone who grinds and mixes grain into flour to extend its usefulness, Miller grinds and mixes data for you.)

View file

@ -1,658 +0,0 @@
..
PLEASE DO NOT EDIT DIRECTLY. EDIT THE .rst.in FILE PLEASE.
FAQ
=========
No output at all
----------------------------------------------------------------
Try ``od -xcv`` and/or ``cat -e`` on your file to check for non-printable characters.
If you're using a Miller version earlier than 5.0.0, which is when the line-ending-autodetect feature was introduced (try ``mlr --version`` on your system to find out), please see http://johnkerl.org/miller-releases/miller-4.5.0/doc/index.html.
Fields not selected
----------------------------------------------------------------
Check the field-separators of the data, e.g. with the command-line ``head`` program. Example: for CSV, Miller's default field separator is comma; if your data is tab-delimited, e.g. ``aTABbTABc``, then Miller won't find three fields named ``a``, ``b``, and ``c`` but rather just one named ``aTABbTABc``. Solution in this case: ``mlr --fs tab {remaining arguments ...}``.
Also try ``od -xcv`` and/or ``cat -e`` on your file to check for non-printable characters.
Diagnosing delimiter specifications
----------------------------------------------------------------
::
# Use the `file` command to see if there are CR/LF terminators (in this case,
# there are not):
$ file data/colours.csv
data/colours.csv: UTF-8 Unicode text
# Look at the file to find names of fields
$ cat data/colours.csv
KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR
masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Wit;Biały;Alb;Beyaz
masterdata_colourcode_2;Schwarz;Black;Negro;Musta;Noir;Nero;Zwart;Czarny;Negru;Siyah
# Extract a few fields:
$ mlr --csv cut -f KEY,PL,RO data/colours.csv
(only blank lines appear)
# Use XTAB output format to get a sharper picture of where records/fields
# are being split:
$ mlr --icsv --oxtab cat data/colours.csv
KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Wit;Biały;Alb;Beyaz
KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR masterdata_colourcode_2;Schwarz;Black;Negro;Musta;Noir;Nero;Zwart;Czarny;Negru;Siyah
# Using XTAB output format makes it clearer that KEY;DE;...;RO;TR is being
# treated as a single field name in the CSV header, and likewise each
# subsequent line is being treated as a single field value. This is because
# the default field separator is a comma but we have semicolons here.
# Use XTAB again with a different input field separator (--ifs semicolon):
$ mlr --icsv --ifs semicolon --oxtab cat data/colours.csv
KEY masterdata_colourcode_1
DE Weiß
EN White
ES Blanco
FI Valkoinen
FR Blanc
IT Bianco
NL Wit
PL Biały
RO Alb
TR Beyaz
KEY masterdata_colourcode_2
DE Schwarz
EN Black
ES Negro
FI Musta
FR Noir
IT Nero
NL Zwart
PL Czarny
RO Negru
TR Siyah
# Using the new field-separator, retry the cut:
$ mlr --csv --fs semicolon cut -f KEY,PL,RO data/colours.csv
KEY;PL;RO
masterdata_colourcode_1;Biały;Alb
masterdata_colourcode_2;Czarny;Negru
How do I suppress numeric conversion?
----------------------------------------------------------------
**TL;DR use put -S**.
Within ``mlr put`` and ``mlr filter``, the default behavior for scanning input records is to parse them as integer, if possible, then as float, if possible, else leave them as string:
::
$ cat data/scan-example-1.tbl
value
1
2.0
3x
hello
::
$ mlr --pprint put '$copy = $value; $type = typeof($value)' data/scan-example-1.tbl
value copy type
1 1 int
2.0 2.000000 float
3x 3x string
hello hello string
The numeric-conversion rule is simple:
* Try to scan as integer (``"1"`` should be int);
* If that doesn't succeed, try to scan as float (``"1.0"`` should be float);
* If that doesn't succeed, leave the value as a string (``"1x"`` is string).
This is a sensible default: you should be able to put ``'$z = $x + $y'`` without having to write ``'$z = int($x) + float($y)'``. Note that the default output format for floating-point numbers created by ``put`` (and other verbs such as ``stats1``) is six decimal places; you can override this using ``mlr --ofmt``. Note also that Miller uses your system's C library functions whenever possible: e.g. ``sscanf`` for converting strings to integer or floating-point.
But now suppose you have data like these:
::
$ cat data/scan-example-2.tbl
value
0001
0002
0005
0005WA
0006
0007
0007WA
0008
0009
0010
::
$ mlr --pprint put '$copy = $value; $type = typeof($value)' data/scan-example-2.tbl
value copy type
0001 1 int
0002 2 int
0005 5 int
0005WA 0005WA string
0006 6 int
0007 7 int
0007WA 0007WA string
0008 8.000000 float
0009 9.000000 float
0010 8 int
The same conversion rules as above are being used. Namely:
* By default field values are inferred to int, else float, else string;
* leading zeroes indicate octal for integers (``sscanf`` semantics);
* since ``0008`` doesn't scan as integer (leading 0 requests octal but 8 isn't a valid octal digit), the float scan is tried next and it succeeds;
* default floating-point output format is 6 decimal places (override with ``mlr --ofmt``).
Taken individually the rules make sense; taken collectively they produce a mishmash of types here.
The solution is to **use the -S flag** for ``mlr put`` and/or ``mlr filter``. Then all field values are left as string. You can type-coerce on demand using syntax like ``'$z = int($x) + float($y)'``. (See also :doc:`reference-dsl`; see also https://github.com/johnkerl/miller/issues/150.)
::
$ mlr --pprint put -S '$copy = $value; $type = typeof($value)' data/scan-example-2.tbl
value copy type
0001 0001 string
0002 0002 string
0005 0005 string
0005WA 0005WA string
0006 0006 string
0007 0007 string
0007WA 0007WA string
0008 0008 string
0009 0009 string
0010 0010 string
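The inference rules discussed in this section can be approximated in Python as follows. This is only a sketch of the described behavior (octal for leading zeroes, then float, then string): the ``infer`` function and its ``keep_strings`` parameter are hypothetical stand-ins, and Python's ``float()`` is more permissive than a C-library scan in some corner cases (for example it accepts ``nan`` and ``inf``).

```python
import re

# Leading zero followed by octal digits only, e.g. "0010".
OCTAL = re.compile(r"[+-]?0[0-7]+")
# Plain decimal integer with no leading zero, e.g. "1" or "-42".
DECIMAL = re.compile(r"[+-]?(0|[1-9][0-9]*)")


def infer(text, keep_strings=False):
    """Infer int, else float, else string from a field value.

    keep_strings=True plays the role of the -S flag: no inference at all.
    """
    if keep_strings:
        return text
    if OCTAL.fullmatch(text):
        return int(text, 8)      # "0010" scans as octal, i.e. 8
    if DECIMAL.fullmatch(text):
        return int(text, 10)
    try:
        return float(text)       # "0008" is not valid octal, but is a float
    except ValueError:
        return text              # "0005WA" stays a string


for value in ["0001", "0005WA", "0008", "0010"]:
    result = infer(value)
    print(value, repr(result), type(result).__name__)
    print(value, repr(infer(value, keep_strings=True)), "str")
```

The point of the sketch is that each rule is reasonable in isolation, while the combination turns one column of similar-looking identifiers into three different types unless inference is switched off.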
How do I examine then-chaining?
----------------------------------------------------------------
Miller's then-chaining is intended to function the same as Unix pipes, but with fewer keystrokes. You can print your data one pipeline step at a time, to see how the intermediate output of one step becomes the input to the next step.
First, look at the input data:
::
$ cat data/then-example.csv
Status,Payment_Type,Amount
paid,cash,10.00
pending,debit,20.00
paid,cash,50.00
pending,credit,40.00
paid,debit,30.00
Next, run the first step of your command, omitting anything from the first ``then`` onward:
::
$ mlr --icsv --opprint count-distinct -f Status,Payment_Type data/then-example.csv
Status Payment_Type count
paid cash 2
pending debit 1
pending credit 1
paid debit 1
After that, run it with the next ``then`` step included:
::
$ mlr --icsv --opprint count-distinct -f Status,Payment_Type then sort -nr count data/then-example.csv
Status Payment_Type count
paid cash 2
pending debit 1
pending credit 1
paid debit 1
Now if you use ``then`` to include another verb after that, the columns ``Status``, ``Payment_Type``, and ``count`` will be the input to that verb.
Note, by the way, that you'll get the same results using pipes:
::
$ mlr --csv count-distinct -f Status,Payment_Type data/then-example.csv | mlr --icsv --opprint sort -nr count
Status Payment_Type count
paid cash 2
pending debit 1
pending credit 1
paid debit 1
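If it helps to think of a ``then``-chain as function composition, the same two steps can be sketched in Python. The helper names ``count_distinct`` and ``sort_nr`` merely mirror the verb names and are not Miller APIs; the records are the ones from ``data/then-example.csv`` above.

```python
from collections import Counter


def count_distinct(records, fields):
    """Count records per distinct combination of the given fields."""
    counts = Counter(tuple(record[f] for f in fields) for record in records)
    # Counter keeps first-seen order, like the output shown above.
    return [dict(zip(fields, key), count=count) for key, count in counts.items()]


def sort_nr(records, field):
    """Sort numerically descending; ties keep their original order."""
    return sorted(records, key=lambda record: record[field], reverse=True)


records = [
    {"Status": "paid", "Payment_Type": "cash", "Amount": "10.00"},
    {"Status": "pending", "Payment_Type": "debit", "Amount": "20.00"},
    {"Status": "paid", "Payment_Type": "cash", "Amount": "50.00"},
    {"Status": "pending", "Payment_Type": "credit", "Amount": "40.00"},
    {"Status": "paid", "Payment_Type": "debit", "Amount": "30.00"},
]

# Step one on its own, then step two applied to step one's output:
step1 = count_distinct(records, ["Status", "Payment_Type"])
step2 = sort_nr(step1, "count")
for record in step2:
    print(record)
```

Inspecting ``step1`` before computing ``step2`` is the same debugging move as running the ``mlr`` command with everything from the first ``then`` onward removed.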
I assigned $9 and it's not 9th
----------------------------------------------------------------
Miller records are ordered lists of key-value pairs. For NIDX format, DKVP format when keys are missing, or CSV/CSV-lite format with ``--implicit-csv-header``, Miller will sequentially assign keys of the form ``1``, ``2``, etc. But these are not integer array indices: they're just field names taken from the initial field ordering in the input data.
::
$ echo x,y,z | mlr --dkvp cat
1=x,2=y,3=z
::
$ echo x,y,z | mlr --dkvp put '$6="a";$4="b";$55="cde"'
1=x,2=y,3=z,6=a,4=b,55=cde
::
$ echo x,y,z | mlr --nidx cat
x,y,z
::
$ echo x,y,z | mlr --csv --implicit-csv-header cat
1,2,3
x,y,z
::
$ echo x,y,z | mlr --dkvp rename 2,999
1=x,999=y,3=z
::
$ echo x,y,z | mlr --dkvp rename 2,newname
1=x,newname=y,3=z
::
$ echo x,y,z | mlr --csv --implicit-csv-header reorder -f 3,1,2
3,1,2
z,x,y
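One way to picture this is as an insertion-ordered dictionary whose keys happen to look like numbers. The following Python sketch reproduces two of the examples above; ``to_dkvp`` and ``rename`` are hypothetical helpers written for this illustration, not part of Miller.

```python
def to_dkvp(record, ops="=", ofs=","):
    """Format a record as a DKVP line, in the record's own key order."""
    return ofs.join(f"{key}{ops}{value}" for key, value in record.items())


def rename(record, old, new):
    """Rename a key in place, keeping its position in the record."""
    return {(new if key == old else key): value for key, value in record.items()}


# Keys "1", "2", "3" are assigned from the initial field ordering ...
base = dict(zip(["1", "2", "3"], "x,y,z".split(",")))

# ... but they are names, not positions: new keys are simply appended.
record = dict(base)
record["6"] = "a"
record["4"] = "b"
record["55"] = "cde"

print(to_dkvp(record))                      # 1=x,2=y,3=z,6=a,4=b,55=cde
print(to_dkvp(rename(base, "2", "999")))    # 1=x,999=y,3=z
```

Assigning to key ``"6"`` appends a sixth-looking name to a three-field record; nothing is padded, and nothing is moved to a sixth position.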
How can I filter by date?
----------------------------------------------------------------
Given input like
::
$ cat dates.csv
date,event
2018-02-03,initialization
2018-03-07,discovery
2018-02-03,allocation
we can use ``strptime`` to parse the date field into seconds-since-epoch and then do numeric comparisons. Simply match your input dataset's date-formatting to the :ref:`reference-dsl-strptime` format-string. For example:
::
$ mlr --csv filter 'strptime($date, "%Y-%m-%d") > strptime("2018-03-03", "%Y-%m-%d")' dates.csv
date,event
2018-03-07,discovery
Caveat: localtime-handling in timezones with DST is still a work in progress; see https://github.com/johnkerl/miller/issues/170. See also https://github.com/johnkerl/miller/issues/208 -- thanks @aborruso!
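The same parse-then-compare idea, sketched in Python for comparison. This uses the standard library's ``time.strptime`` and ``calendar.timegm`` and treats all dates as UTC, which sidesteps the DST caveat; the helper name ``epoch_seconds`` is made up for this example.

```python
import calendar
import time


def epoch_seconds(text, fmt="%Y-%m-%d"):
    """Parse a date string into seconds since the epoch, as UTC."""
    return calendar.timegm(time.strptime(text, fmt))


rows = [
    ("2018-02-03", "initialization"),
    ("2018-03-07", "discovery"),
    ("2018-02-03", "allocation"),
]

cutoff = epoch_seconds("2018-03-03")
# Once both sides are numbers, the filter is an ordinary comparison.
kept = [row for row in rows if epoch_seconds(row[0]) > cutoff]
print(kept)
```

The format string must match the data: if your dates look like ``03/07/2018``, the format becomes ``%m/%d/%Y`` on both sides of the comparison.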
How can I handle commas-as-data in various formats?
----------------------------------------------------------------
:doc:`CSV <file-formats>` handles this well and by design:
::
$ cat commas.csv
Name,Role
"Xiao, Lin",administrator
"Khavari, Darius",tester
Likewise :ref:`file-formats-json`:
::
$ mlr --icsv --ojson cat commas.csv
{ "Name": "Xiao, Lin", "Role": "administrator" }
{ "Name": "Khavari, Darius", "Role": "tester" }
For Miller's :ref:`vertical-tabular format <file-formats-xtab>` there is no escaping for carriage returns, but commas work fine:
::
$ mlr --icsv --oxtab cat commas.csv
Name Xiao, Lin
Role administrator
Name Khavari, Darius
Role tester
But for :ref:`key-value pairs <file-formats-dkvp>` and :ref:`index-numbered <file-formats-nidx>`, commas are the default field separator. And -- as of Miller 5.4.0 anyway -- these formats have no double-quote handling like there is for CSV. So commas within the data look like delimiters:
::
$ mlr --icsv --odkvp cat commas.csv
Name=Xiao, Lin,Role=administrator
Name=Khavari, Darius,Role=tester
One solution is to use a different delimiter, such as a pipe character:
::
$ mlr --icsv --odkvp --ofs pipe cat commas.csv
Name=Xiao, Lin|Role=administrator
Name=Khavari, Darius|Role=tester
To be extra-sure to avoid data/delimiter clashes, you can also use control
characters as delimiters -- here, control-A:
::
$ mlr --icsv --odkvp --ofs '\001' cat commas.csv | cat -v
Name=Xiao, Lin^ARole=administrator
Name=Khavari, Darius^ARole=tester
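To see why the delimiter choice matters, here is a small Python sketch that writes a record as DKVP and then splits it again the naive way. The helpers ``to_dkvp`` and ``from_dkvp`` are illustrative only and assume no quoting at all, which is the situation described above.

```python
import csv
import io

CSV_TEXT = 'Name,Role\n"Xiao, Lin",administrator\n"Khavari, Darius",tester\n'

# The csv module honors the double quotes, so the embedded comma is data.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))


def to_dkvp(record, ofs=",", ops="="):
    """Join key-value pairs with no quoting whatsoever."""
    return ofs.join(f"{key}{ops}{value}" for key, value in record.items())


def from_dkvp(line, ifs=",", ips="="):
    """Split a DKVP line back into [key, value] pieces, again with no quoting."""
    return [pair.split(ips, 1) for pair in line.split(ifs)]


ambiguous = to_dkvp(rows[0])            # comma is both data and delimiter
safe = to_dkvp(rows[0], ofs="|")        # pipe does not occur in the data

print(from_dkvp(ambiguous))             # three pieces: the name got split
print(from_dkvp(safe, ifs="|"))         # two clean key-value pairs
```

Any separator works as long as it never occurs in the data, which is why a control character such as control-A is the most conservative choice.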
How can I handle field names with special symbols in them?
----------------------------------------------------------------
Simply surround the field names with curly braces:
::
$ echo 'x.a=3,y:b=4,z/c=5' | mlr put '${product.all} = ${x.a} * ${y:b} * ${z/c}'
x.a=3,y:b=4,z/c=5,product.all=60
How to escape '?' in regexes?
----------------------------------------------------------------
One way is to use square brackets; an alternative is to use simple string-substitution rather than a regular expression.
::
$ cat data/question.dat
a=is it?,b=it is!
$ mlr --oxtab put '$c = gsub($a, "[?]"," ...")' data/question.dat
a is it?
b it is!
c is it ...
$ mlr --oxtab put '$c = ssub($a, "?"," ...")' data/question.dat
a is it?
b it is!
c is it ...
The ``ssub`` function exists precisely for this reason: so you don't have to escape anything.
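The same distinction exists in most languages. As a point of comparison only (this is Python, not Miller's DSL), ``re.sub`` plays the role of ``gsub`` and ``str.replace`` plays the role of ``ssub``:

```python
import re

text = "is it?"

# Regex substitution: "?" is a metacharacter, so it needs a character
# class (or an escape) to be taken literally.
via_class = re.sub("[?]", " ...", text)
via_escape = re.sub(re.escape("?"), " ...", text)

# Plain string substitution: nothing is special, nothing needs escaping.
via_plain = text.replace("?", " ...")

print(via_class)
print(via_escape)
print(via_plain)
```

All three produce ``is it ...``; the plain-string form is the one that stays correct when the search text comes from data and may contain arbitrary punctuation.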
How can I put single-quotes into strings?
----------------------------------------------------------------
This is a little tricky due to the shell's handling of quotes. For simplicity, let's first put an update script into a file:
::
$a = "It's OK, I said, then 'for now'."
::
$ echo a=bcd | mlr put -f data/single-quote-example.mlr
a=It's OK, I said, then 'for now'.
So, it's simple: Miller's DSL uses double quotes for strings, and you can put single quotes (or backslash-escaped double-quotes) inside strings, no problem.
Without putting the update expression in a file, it's messier:
::
$ echo a=bcd | mlr put '$a="It'\''s OK, I said, '\''for now'\''."'
a=It's OK, I said, 'for now'.
The idea is that the outermost single-quotes are to protect the ``put`` expression from the shell, and the double quotes within them are for Miller. To get a single quote in the middle there, you need to actually put it *outside* the single-quoting for the shell. The pieces are the following, all concatenated together:
* ``$a="It``
* ``\'``
* ``s OK, I said,``
* ``\'``
* ``for now``
* ``\'``
* ``.``
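If you want to check how a POSIX-style shell will split such a command before running it, Python's standard ``shlex`` module can do the concatenation for you. This is a side-check in Python, not something Miller provides:

```python
import shlex

# The command exactly as typed at the shell prompt (raw string, so the
# backslashes reach shlex unchanged).
command = r"""mlr put '$a="It'\''s OK, I said, '\''for now'\''."'"""

args = shlex.split(command)
for arg in args:
    print(repr(arg))

# Going the other way: let shlex produce a safely quoted argument.
expression = args[2]
quoted = shlex.quote(expression)
print(quoted)
```

The third argument is what ``mlr put`` actually receives: a single string with the single quotes inside it. ``shlex.quote`` uses a different but equivalent spelling for the embedded single quotes; either form splits back to the same expression.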
Why doesn't mlr cut put fields in the order I want?
----------------------------------------------------------------
Example: columns ``x,i,a`` were requested but they appear here in the order ``a,i,x``:
::
$ cat data/small
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
::
$ mlr cut -f x,i,a data/small
a=pan,i=1,x=0.3467901443380824
a=eks,i=2,x=0.7586799647899636
a=wye,i=3,x=0.20460330576630303
a=eks,i=4,x=0.38139939387114097
a=wye,i=5,x=0.5732889198020006
The issue is that Miller's ``cut``, by default, outputs cut fields in the order they appear in the input data. This design decision was made intentionally to parallel the Unix/Linux system ``cut`` command, which has the same semantics.
The solution is to use the ``-o`` option:
::
$ mlr cut -o -f x,i,a data/small
x=0.3467901443380824,i=1,a=pan
x=0.7586799647899636,i=2,a=eks
x=0.20460330576630303,i=3,a=wye
x=0.38139939387114097,i=4,a=eks
x=0.5732889198020006,i=5,a=wye
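The two orderings can be sketched in Python, treating a record as an insertion-ordered dict. The ``cut`` function and its ``ordered`` parameter are hypothetical names for illustration, not Miller's code:

```python
def cut(record, fields, ordered=False):
    """Keep only `fields` from `record`.

    ordered=False mimics the default: fields come out in record order.
    ordered=True mimics `cut -o`: fields come out in the requested order.
    """
    if ordered:
        return {k: record[k] for k in fields if k in record}
    wanted = set(fields)
    return {k: v for k, v in record.items() if k in wanted}


record = {"a": "pan", "b": "pan", "i": 1, "x": 0.3467901443380824}
print(list(cut(record, ["x", "i", "a"])))                # ['a', 'i', 'x']
print(list(cut(record, ["x", "i", "a"], ordered=True)))  # ['x', 'i', 'a']
```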
NR is not consecutive after then-chaining
----------------------------------------------------------------
Given this input data:
::
$ cat data/small
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
why don't I see ``NR=1`` and ``NR=2`` here?
::
$ mlr filter '$x > 0.5' then put '$NR = NR' data/small
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797,NR=2
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729,NR=5
The reason is that ``NR`` is computed for the original input records and isn't dynamically updated. By contrast, ``NF`` is dynamically updated: it's the number of fields in the current record, and if you add/remove a field, the value of ``NF`` will change:
::
$ echo x=1,y=2,z=3 | mlr put '$nf1 = NF; $u = 4; $nf2 = NF; unset $x,$y,$z; $nf3 = NF'
nf1=3,u=4,nf2=5,nf3=3
``NR``, by contrast (and ``FNR`` as well), retains the value from the original input stream, and records may be dropped by a ``filter`` within a ``then``-chain. To recover consecutive record numbers, you can use out-of-stream variables as follows:
::
$ mlr --opprint --from data/small put '
begin{ @nr1 = 0 }
@nr1 += 1;
$nr1 = @nr1
' \
then filter '$x>0.5' \
then put '
begin{ @nr2 = 0 }
@nr2 += 1;
$nr2 = @nr2
'
a b i x y nr1 nr2
eks pan 2 0.7586799647899636 0.5221511083334797 2 1
wye pan 5 0.5732889198020006 0.8636244699032729 5 2
Or, simply use ``mlr cat -n``:
::
$ mlr filter '$x > 0.5' then cat -n data/small
n=1,a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
n=2,a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
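The distinction is the same as numbering before versus after a filter in any language. A minimal Python sketch, using the ``x`` values from ``data/small`` (this models the behavior only and is not how Miller is implemented):

```python
xs = [
    0.3467901443380824,
    0.7586799647899636,
    0.20460330576630303,
    0.38139939387114097,
    0.5732889198020006,
]

# NR is assigned as records are read from the input ...
numbered = [{"x": x, "NR": nr} for nr, x in enumerate(xs, start=1)]

# ... so a later filter leaves gaps in it.
kept = [r for r in numbered if r["x"] > 0.5]
print([r["NR"] for r in kept])  # [2, 5]

# Counting again after the filter (like `cat -n`) gives consecutive numbers.
renumbered = [dict(r, n=n) for n, r in enumerate(kept, start=1)]
print([r["n"] for r in renumbered])  # [1, 2]
```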
Why am I not seeing all possible joins occur?
----------------------------------------------------------------
**This section describes behavior before Miller 5.1.0. As of 5.1.0, -u is the default.**
For example, the right file here has nine records, and the left file should add in the ``hostname`` column -- so the join output should also have 9 records:
::
$ mlr --icsvlite --opprint cat data/join-u-left.csv
hostname ipaddr
nadir.east.our.org 10.3.1.18
zenith.west.our.org 10.3.1.27
apoapsis.east.our.org 10.4.5.94
::
$ mlr --icsvlite --opprint cat data/join-u-right.csv
ipaddr timestamp bytes
10.3.1.27 1448762579 4568
10.3.1.18 1448762578 8729
10.4.5.94 1448762579 17445
10.3.1.27 1448762589 12
10.3.1.18 1448762588 44558
10.4.5.94 1448762589 8899
10.3.1.27 1448762599 0
10.3.1.18 1448762598 73425
10.4.5.94 1448762599 12200
::
$ mlr --icsvlite --opprint join -s -j ipaddr -f data/join-u-left.csv data/join-u-right.csv
ipaddr hostname timestamp bytes
10.3.1.27 zenith.west.our.org 1448762579 4568
10.4.5.94 apoapsis.east.our.org 1448762579 17445
10.4.5.94 apoapsis.east.our.org 1448762589 8899
10.4.5.94 apoapsis.east.our.org 1448762599 12200
The issue is that Miller's ``join``, by default (before 5.1.0), took input sorted (lexically ascending) by the join keys on both the left and right files. This design decision was made intentionally to parallel the Unix/Linux system ``join`` command, which has the same semantics. The benefit of this default is that the joiner program can stream through the left and right files, needing to load neither entirely into memory. The drawback, of course, is that it requires sorted input.
The solution (besides pre-sorting the input files on the join keys) is to simply use **mlr join -u** (which is now the default). This loads the left file entirely into memory (while the right file is still streamed one line at a time) and does all possible joins without requiring sorted input:
::
$ mlr --icsvlite --opprint join -u -j ipaddr -f data/join-u-left.csv data/join-u-right.csv
ipaddr hostname timestamp bytes
10.3.1.27 zenith.west.our.org 1448762579 4568
10.3.1.18 nadir.east.our.org 1448762578 8729
10.4.5.94 apoapsis.east.our.org 1448762579 17445
10.3.1.27 zenith.west.our.org 1448762589 12
10.3.1.18 nadir.east.our.org 1448762588 44558
10.4.5.94 apoapsis.east.our.org 1448762589 8899
10.3.1.27 zenith.west.our.org 1448762599 0
10.3.1.18 nadir.east.our.org 1448762598 73425
10.4.5.94 apoapsis.east.our.org 1448762599 12200
General advice is to make sure the left-file is relatively small, e.g. containing name-to-number mappings, while saving large amounts of data for the right file.
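To see why the left file is the one that should be small, here is a minimal Python sketch of an unsorted join under the description above: the left records are held in memory, keyed by the join field, while the right records are streamed. The function name and data layout are assumptions for illustration, not Miller's implementation:

```python
def join_unsorted(left, right, key):
    """Pair every right record with all left records sharing `key`."""
    # Load the whole left side into memory, bucketed by join-key value.
    buckets = {}
    for lrec in left:
        buckets.setdefault(lrec[key], []).append(lrec)

    # Stream the right side one record at a time.
    for rrec in right:
        for lrec in buckets.get(rrec[key], []):
            out = {key: rrec[key]}
            out.update({k: v for k, v in lrec.items() if k != key})
            out.update({k: v for k, v in rrec.items() if k != key})
            yield out


left = [
    {"hostname": "nadir.east.our.org", "ipaddr": "10.3.1.18"},
    {"hostname": "zenith.west.our.org", "ipaddr": "10.3.1.27"},
    {"hostname": "apoapsis.east.our.org", "ipaddr": "10.4.5.94"},
]
right = [
    {"ipaddr": "10.3.1.27", "timestamp": 1448762579, "bytes": 4568},
    {"ipaddr": "10.3.1.18", "timestamp": 1448762578, "bytes": 8729},
    {"ipaddr": "10.4.5.94", "timestamp": 1448762579, "bytes": 17445},
    {"ipaddr": "10.3.1.27", "timestamp": 1448762589, "bytes": 12},
]

joined = list(join_unsorted(left, right, "ipaddr"))
for rec in joined:
    print(rec)
```

Memory use grows with the size of the left input only; every right record is paired regardless of input order.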
How to rectangularize after joins with unpaired?
----------------------------------------------------------------
Suppose you have the following two data files:
::
id,code
3,0000ff
2,00ff00
4,ff0000
::
id,color
4,red
2,green
Joining on ``id``, the results are as expected:
::
$ mlr --csv join -j id -f data/color-codes.csv data/color-names.csv
id,code,color
4,ff0000,red
2,00ff00,green
However, if we ask for left-unpaireds, since there's no ``color`` column, we get a row not having the same column names as the others:
::
$ mlr --csv join --ul -j id -f data/color-codes.csv data/color-names.csv
id,code,color
4,ff0000,red
2,00ff00,green
id,code
3,0000ff
To fix this, we can use **unsparsify**:
::
$ mlr --csv join --ul -j id -f data/color-codes.csv then unsparsify --fill-with "" data/color-names.csv
id,code,color
4,ff0000,red
2,00ff00,green
3,0000ff,
Thanks to @aborruso for the tip!
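Conceptually, this fills each record out to the union of all field names seen. A minimal Python sketch of that idea follows; the function is a hypothetical stand-in, and it assumes all records are available before any are emitted:

```python
def unsparsify(records, fill_with=""):
    """Give every record the union of all keys, filling gaps with `fill_with`."""
    records = list(records)
    # Collect keys in first-seen order (dicts preserve insertion order).
    keys = {}
    for rec in records:
        for k in rec:
            keys.setdefault(k, None)
    return [{k: rec.get(k, fill_with) for k in keys} for rec in records]


rows = [
    {"id": 4, "code": "ff0000", "color": "red"},
    {"id": 2, "code": "00ff00", "color": "green"},
    {"id": 3, "code": "0000ff"},  # left-unpaired: no color
]
for rec in unsparsify(rows):
    print(rec)
```

After this step every record has the same keys, so a single CSV header suffices.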
What about XML or JSON file formats?
----------------------------------------------------------------
Miller handles **tabular data**, which is a list of records each having fields which are key-value pairs. Miller also doesn't require that each record have the same field names (see also :doc:`record-heterogeneity`). Regardless, tabular data is a **non-recursive data structure**.
XML, JSON, etc. are, by contrast, all **recursive** or **nested** data structures. For example, in JSON you can represent a hash map whose values are lists of lists.
Now, you can put tabular data into these formats -- since list-of-key-value-pairs is one of the things representable in XML or JSON. Example:
::
# DKVP
x=1,y=2
z=3
::
# XML
<table>
<record>
<field>
<key> x </key> <value> 1 </value>
</field>
<field>
<key> y </key> <value> 2 </value>
</field>
</record>
<record>
<field>
<key> z </key> <value> 3 </value>
</field>
</record>
</table>
::
# JSON
[{"x":1,"y":2},{"z":3}]
However, a tool like Miller which handles non-recursive data is never going to be able to handle full XML/JSON semantics -- only a small subset. If tabular data represented in XML/JSON/etc are sufficiently well-structured, it may be easy to grep/sed out the data into a simpler text form -- this is a general text-processing problem.
Miller does support tabular data represented in JSON: please see :doc:`file-formats`. See also `jq <https://stedolan.github.io/jq/>`_ for a truly powerful, JSON-specific tool.
For XML, my suggestion is to use a tool like `ff-extractor <http://ff-extractor.sourceforge.net>`_ to do format conversion.


@ -1,67 +0,0 @@
..
PLEASE DO NOT EDIT DIRECTLY. EDIT THE .rst.in FILE PLEASE.
Unix-toolkit context
================================================================
How does Miller fit within the Unix toolkit (``grep``, ``sed``, ``awk``, etc.)?
File-format awareness
----------------------------------------------------------------
Miller respects CSV headers. If you do ``mlr --csv cat *.csv`` then the header line is written once:
::
$ cat data/a.csv
a,b,c
1,2,3
4,5,6
::
$ cat data/b.csv
a,b,c
7,8,9
::
$ mlr --csv cat data/a.csv data/b.csv
a,b,c
1,2,3
4,5,6
7,8,9
::
$ mlr --csv sort -nr b data/a.csv data/b.csv
a,b,c
7,8,9
4,5,6
1,2,3
Likewise with ``mlr sort``, ``mlr tac``, and so on.
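For comparison, here is roughly what header-aware concatenation involves if done by hand. This Python sketch uses the standard ``csv`` module and assumes all inputs share the same header; it is an illustration, not Miller's code:

```python
import csv
import io


def csv_cat(*texts):
    """Concatenate CSV texts, writing the header line only once."""
    out = io.StringIO()
    writer = None
    for text in texts:
        reader = csv.DictReader(io.StringIO(text))
        for row in reader:
            if writer is None:
                # First data row seen: emit the header exactly once.
                writer = csv.DictWriter(
                    out, fieldnames=reader.fieldnames, lineterminator="\n"
                )
                writer.writeheader()
            writer.writerow(row)
    return out.getvalue()


a_csv = "a,b,c\n1,2,3\n4,5,6\n"
b_csv = "a,b,c\n7,8,9\n"
print(csv_cat(a_csv, b_csv), end="")
```

A plain ``cat data/a.csv data/b.csv`` would instead repeat the ``a,b,c`` line in the middle of the output.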
awk-like features: mlr filter and mlr put
----------------------------------------------------------------
* ``mlr filter`` includes/excludes records based on a filter expression, e.g. ``mlr filter '$count > 10'``.
* ``mlr put`` adds a new field as a function of others, e.g. ``mlr put '$xy = $x * $y'`` or ``mlr put '$counter = NR'``.
* The ``$name`` syntax is straight from ``awk``'s ``$1 $2 $3`` (adapted to name-based indexing), as are the variables ``FS``, ``OFS``, ``RS``, ``ORS``, ``NF``, ``NR``, and ``FILENAME``. The ``ENV[...]`` syntax is from Ruby.
* While ``awk`` functions are record-based, Miller subcommands (or *verbs*) are stream-based: each of them maps a stream of records into another stream of records.
* Like ``awk``, Miller (as of v5.0.0) allows you to define new functions within its ``put`` and ``filter`` expression language. Further programmability comes from chaining with ``then``.
* As with ``awk``, ``$``-variables are stream variables and all verbs (such as ``cut``, ``stats1``, ``put``, etc.) as well as ``put``/``filter`` statements operate on streams. This means that you define actions to be done on each record and then stream your data through those actions. The built-in variables ``NF``, ``NR``, etc. change from one line to another, ``$x`` is a label for field ``x`` in the current record, and the input to ``sqrt($x)`` changes from one record to the next. The expression language for the ``put`` and ``filter`` verbs additionally allows you to define ``begin {...}`` and ``end {...}`` blocks for actions to be taken before and after records are processed, respectively.
* As with ``awk``, Miller's ``put``/``filter`` language lets you set ``@sum=0`` before records are read, then update that sum on each record, then print its value at the end. Unlike ``awk``, Miller makes syntactically explicit the difference between variables with extent across all records (names starting with ``@``, such as ``@sum``) and variables which are local to the current expression (names starting without ``@``, such as ``sum``).
* Miller can be faster than ``awk``, ``cut``, and so on, depending on platform; see also :doc:`performance`. In particular, Miller's DSL syntax is parsed into C control structures at startup time, with the bulk data-stream processing all done in C.
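The begin/per-record/end pattern described in the list above can be sketched in Python as follows. The ``state`` dict plays the role of ``@``-variables and ``rec`` the role of the current record; these names are illustrative assumptions, not Miller internals:

```python
def run(records):
    # begin { @sum = 0 }
    state = {"sum": 0}

    # Main block: executed once per record, in stream order.
    for rec in records:
        # @sum += $x
        state["sum"] += rec["x"]

    # end { emit @sum }
    return state["sum"]


print(run([{"x": 1}, {"x": 2}, {"x": 3}]))  # 6
```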
See also
----------------------------------------------------------------
See :doc:`reference-verbs` for more on Miller's subcommands ``cat``, ``cut``, ``head``, ``sort``, ``tac``, ``tail``, ``top``, and ``uniq``, as well as :doc:`reference-dsl` for more on the awk-like ``mlr filter`` and ``mlr put``.


@ -1,43 +0,0 @@
..
PLEASE DO NOT EDIT DIRECTLY. EDIT THE .rst.in FILE PLEASE.
Features
================================================================
Miller is like awk, sed, cut, join, and sort for **name-indexed data such as
CSV, TSV, and tabular JSON**. You get to work with your data using named
fields, without needing to count positional column indices.
This is something the Unix toolkit always could have done, and arguably
always should have done. It operates on key-value-pair data while the familiar
Unix tools operate on integer-indexed fields: if the natural data structure for
the latter is the array, then Miller's natural data structure is the
insertion-ordered hash map. This encompasses a **variety of data formats**,
including but not limited to the familiar CSV, TSV, and JSON. (Miller can handle
**positionally-indexed data** as a special case.)
* Miller is **multi-purpose**: it's useful for **data cleaning**, **data reduction**, **statistical reporting**, **devops**, **system administration**, **log-file processing**, **format conversion**, and **database-query post-processing**.
* You can use Miller to snarf and munge **log-file data**, including selecting out relevant substreams, then produce CSV format and load that into all-in-memory/data-frame utilities for further statistical and/or graphical processing.
* Miller complements **data-analysis tools** such as **R**, **pandas**, etc.: you can use Miller to **clean** and **prepare** your data. While you can do **basic statistics** entirely in Miller, its streaming-data feature and single-pass algorithms enable you to **reduce very large data sets**.
* Miller complements SQL **databases**: you can slice, dice, and reformat data on the client side on its way into or out of a database. (Examples :ref:`here <sql-input-examples>` and :ref:`here <sql-output-examples>`.) You can also reap some of the benefits of databases for quick, setup-free one-off tasks when you just need to query some data in disk files in a hurry.
* Miller also goes beyond the classic Unix tools by stepping fully into our modern, **no-SQL** world: its essential record-heterogeneity property allows Miller to operate on data where records with different schema (field names) are interleaved.
* Miller is **streaming**: most operations need only a single record in memory at a time, rather than ingesting all input before producing any output. For those operations which require deeper retention (``sort``, ``tac``, ``stats1``), Miller retains only as much data as needed. This means that whenever functionally possible, you can operate on files which are larger than your system's available RAM, and you can use Miller in **tail -f** contexts.
* Miller is **pipe-friendly** and interoperates with the Unix toolkit
* Miller's I/O formats include **tabular pretty-printing**, **positionally indexed** (Unix-toolkit style), CSV, JSON, and others
* Miller does **conversion** between formats
* Miller's **processing is format-aware**: e.g. CSV ``sort`` and ``tac`` keep header lines first
* Miller has high-throughput **performance** on par with the Unix toolkit
* Not unlike `jq <https://stedolan.github.io/jq/>`_ (for JSON), Miller is written in portable, modern C, with **zero runtime dependencies**. You can download or compile a single binary, ``scp`` it to a faraway machine, and expect it to work.
Releases and release notes: https://github.com/johnkerl/miller/releases.


@ -1,589 +0,0 @@
..
PLEASE DO NOT EDIT DIRECTLY. EDIT THE .rst.in FILE PLEASE.
File formats
================================================================
Miller handles name-indexed data using several formats: some you probably know by name, such as CSV, TSV, and JSON -- and other formats you're likely already seeing and using in your structured data. Additionally, Miller gives you the option of including comments within your data.
Examples
----------------------------------------------------------------
::
$ mlr --usage-data-format-examples
DKVP: delimited key-value pairs (Miller default format)
+---------------------+
| apple=1,bat=2,cog=3 | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
| dish=7,egg=8,flint | Record 2: "dish" => "7", "egg" => "8", "3" => "flint"
+---------------------+
NIDX: implicitly numerically indexed (Unix-toolkit style)
+---------------------+
| the quick brown | Record 1: "1" => "the", "2" => "quick", "3" => "brown"
| fox jumped | Record 2: "1" => "fox", "2" => "jumped"
+---------------------+
CSV/CSV-lite: comma-separated values with separate header line
+---------------------+
| apple,bat,cog |
| 1,2,3 | Record 1: "apple => "1", "bat" => "2", "cog" => "3"
| 4,5,6 | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
+---------------------+
Tabular JSON: nested objects are supported, although arrays within them are not:
+---------------------+
| { |
| "apple": 1, | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
| "bat": 2, |
| "cog": 3 |
| } |
| { |
| "dish": { | Record 2: "dish:egg" => "7", "dish:flint" => "8", "garlic" => ""
| "egg": 7, |
| "flint": 8 |
| }, |
| "garlic": "" |
| } |
+---------------------+
PPRINT: pretty-printed tabular
+---------------------+
| apple bat cog |
| 1 2 3 | Record 1: "apple => "1", "bat" => "2", "cog" => "3"
| 4 5 6 | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
+---------------------+
XTAB: pretty-printed transposed tabular
+---------------------+
| apple 1 | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
| bat 2 |
| cog 3 |
| |
| dish 7 | Record 2: "dish" => "7", "egg" => "8"
| egg 8 |
+---------------------+
Markdown tabular (supported for output only):
+-----------------------+
| | apple | bat | cog | |
| | --- | --- | --- | |
| | 1 | 2 | 3 | | Record 1: "apple => "1", "bat" => "2", "cog" => "3"
| | 4 | 5 | 6 | | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
+-----------------------+
.. _file-formats-csv:
CSV/TSV/ASV/USV/etc.
----------------------------------------------------------------
When ``mlr`` is invoked with the ``--csv`` or ``--csvlite`` option, key names are found on the first record and values are taken from subsequent records. This includes the case of CSV-formatted files. See :doc:`record-heterogeneity` for how Miller handles changes of field names within a single data stream.
Miller has record separator ``RS`` and field separator ``FS``, just as ``awk`` does. For TSV, use ``--fs tab``; to convert TSV to CSV, use ``--ifs tab --ofs comma``, etc. (See also :ref:`reference-separators`.)
**TSV (tab-separated values):** the following are synonymous pairs:
* ``--tsv`` and ``--csv --fs tab``
* ``--itsv`` and ``--icsv --ifs tab``
* ``--otsv`` and ``--ocsv --ofs tab``
* ``--tsvlite`` and ``--csvlite --fs tab``
* ``--itsvlite`` and ``--icsvlite --ifs tab``
* ``--otsvlite`` and ``--ocsvlite --ofs tab``
**ASV (ASCII-separated values):** the flags ``--asv``, ``--iasv``, ``--oasv``, ``--asvlite``, ``--iasvlite``, and ``--oasvlite`` are analogous except they use ASCII FS and RS 0x1f and 0x1e, respectively.
**USV (Unicode-separated values):** likewise, the flags ``--usv``, ``--iusv``, ``--ousv``, ``--usvlite``, ``--iusvlite``, and ``--ousvlite`` use Unicode FS and RS U+241F (UTF-8 0xe2909f) and U+241E (UTF-8 0xe2909e), respectively.
Miller's ``--csv`` flag supports `RFC-4180 CSV <https://tools.ietf.org/html/rfc4180>`_. This includes CRLF line-terminators by default, regardless of platform.
Here are the differences between CSV and CSV-lite:
* CSV supports `RFC-4180 <https://tools.ietf.org/html/rfc4180>`_-style double-quoting, including the ability to have commas and/or LF/CRLF line-endings contained within an input field; CSV-lite does not.
* CSV does not allow heterogeneous data; CSV-lite does (see also :doc:`record-heterogeneity`).
* The CSV-lite input-reading code is fractionally more efficient than the CSV input-reader.
Here are things they have in common:
* The ability to specify record/field separators other than the default, e.g. CR-LF vs. LF, or tab instead of comma for TSV, and so on.
* The ``--implicit-csv-header`` flag for input and the ``--headerless-csv-output`` flag for output.
.. _file-formats-dkvp:
DKVP: Key-value pairs
----------------------------------------------------------------
Miller's default file format is DKVP, for **delimited key-value pairs**. Example::
$ mlr cat data/small
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
Such data are easy to generate, e.g. in Ruby with
::
puts "host=#{hostname},seconds=#{t2-t1},message=#{msg}"
::
puts mymap.collect{|k,v| "#{k}=#{v}"}.join(',')
or ``print`` statements in various languages, e.g.
::
echo "type=3,user=$USER,date=$date\n";
::
logger.log("type=3,user=$USER,date=$date\n");
Fields lacking an IPS will have positional index (starting at 1) used as the key, as in NIDX format. For example, ``dish=7,egg=8,flint`` is parsed as ``"dish" => "7", "egg" => "8", "3" => "flint"`` and ``dish,egg,flint`` is parsed as ``"1" => "dish", "2" => "egg", "3" => "flint"``.
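The positional-key rule can be sketched in a few lines of Python. This is a simplified illustration with hypothetical parameter names ``ifs`` and ``ips`` for the field and pair separators; it is not Miller's parser:

```python
def parse_dkvp(line, ifs=",", ips="="):
    """Parse one DKVP line into an insertion-ordered dict of strings."""
    record = {}
    for position, field in enumerate(line.split(ifs), start=1):
        key, sep, value = field.partition(ips)
        if sep:
            record[key] = value
        else:
            # No pair separator: use the 1-up position as the key.
            record[str(position)] = field
    return record


print(parse_dkvp("dish=7,egg=8,flint"))
# {'dish': '7', 'egg': '8', '3': 'flint'}
print(parse_dkvp("dish,egg,flint"))
# {'1': 'dish', '2': 'egg', '3': 'flint'}
```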
As discussed in :doc:`record-heterogeneity`, Miller handles changes of field names within the same data stream. But using DKVP format this is particularly natural. One of my favorite use-cases for Miller is in application/server logs, where I log all sorts of lines such as
::
resource=/path/to/file,loadsec=0.45,ok=true
record_count=100, resource=/path/to/file
resource=/some/other/path,loadsec=0.97,ok=false
etc. and I just log them as needed. Then later, I can use ``grep``, ``mlr --opprint group-like``, etc. to analyze my logs.
See :doc:`reference` regarding how to specify separators other than the default equals-sign and comma.
.. _file-formats-nidx:
NIDX: Index-numbered (toolkit style)
----------------------------------------------------------------
With ``--inidx --ifs ' ' --repifs``, Miller splits lines on whitespace and assigns integer field names starting with 1. This recapitulates Unix-toolkit behavior.
Example with index-numbered output:
::
$ cat data/small
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
$ mlr --onidx --ofs ' ' cat data/small
pan pan 1 0.3467901443380824 0.7268028627434533
eks pan 2 0.7586799647899636 0.5221511083334797
wye wye 3 0.20460330576630303 0.33831852551664776
eks wye 4 0.38139939387114097 0.13418874328430463
wye pan 5 0.5732889198020006 0.8636244699032729
Example with index-numbered input:
::
$ cat data/mydata.txt
oh say can you see
by the dawn's
early light
$ mlr --inidx --ifs ' ' --odkvp cat data/mydata.txt
1=oh,2=say,3=can,4=you,5=see
1=by,2=the,3=dawn's
1=early,2=light
Example with index-numbered input and output:
::
$ cat data/mydata.txt
oh say can you see
by the dawn's
early light
$ mlr --nidx --fs ' ' --repifs cut -f 2,3 data/mydata.txt
say can
the dawn's
light
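The NIDX behavior shown above amounts to splitting on runs of whitespace and numbering the pieces from 1. A minimal Python sketch (illustrative function names, not Miller's code):

```python
def parse_nidx(line):
    """Split on runs of whitespace; keys are 1-up positions as strings."""
    return {str(i): v for i, v in enumerate(line.split(), start=1)}


def cut_fields(record, fields):
    """Keep only the named fields that are present, in record order."""
    return {k: v for k, v in record.items() if k in fields}


for line in ["oh say can you see", "by the dawn's", "early light"]:
    record = parse_nidx(line)
    print(" ".join(cut_fields(record, {"2", "3"}).values()))
```

The third line has no field 3, so only ``light`` is printed for it, matching the output above.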
.. _file-formats-json:
Tabular JSON
----------------------------------------------------------------
JSON is a format which supports arbitrarily deep nesting of "objects" (hashmaps) and "arrays" (lists), while Miller is a tool for handling **tabular data** only. This means Miller cannot (and should not) handle arbitrary JSON. (Check out `jq <https://stedolan.github.io/jq/>`_.)
But if you have tabular data represented in JSON then Miller can handle that for you.
Single-level JSON objects
^^^^^^^^^^^^^^^^^^^^^^^^^
An **array of single-level objects** is, quite simply, **a table**:
::
$ mlr --json head -n 2 then cut -f color,shape data/json-example-1.json
{ "color": "yellow", "shape": "triangle" }
{ "color": "red", "shape": "square" }
$ mlr --json --jvstack head -n 2 then cut -f color,u,v data/json-example-1.json
{
"color": "yellow",
"u": 0.6321695890307647,
"v": 0.9887207810889004
}
{
"color": "red",
"u": 0.21966833570651523,
"v": 0.001257332190235938
}
$ mlr --ijson --opprint stats1 -a mean,stddev,count -f u -g shape data/json-example-1.json
shape u_mean u_stddev u_count
triangle 0.583995 0.131184 3
square 0.409355 0.365428 4
circle 0.366013 0.209094 3
Nested JSON objects
^^^^^^^^^^^^^^^^^^^^^^^^^
Additionally, Miller can **tabularize nested objects by concatenating keys**:
::
$ mlr --json --jvstack head -n 2 data/json-example-2.json
{
"flag": 1,
"i": 11,
"attributes": {
"color": "yellow",
"shape": "triangle"
},
"values": {
"u": 0.632170,
"v": 0.988721,
"w": 0.436498,
"x": 5.798188
}
}
{
"flag": 1,
"i": 15,
"attributes": {
"color": "red",
"shape": "square"
},
"values": {
"u": 0.219668,
"v": 0.001257,
"w": 0.792778,
"x": 2.944117
}
}
$ mlr --ijson --opprint head -n 4 data/json-example-2.json
flag i attributes:color attributes:shape values:u values:v values:w values:x
1 11 yellow triangle 0.632170 0.988721 0.436498 5.798188
1 15 red square 0.219668 0.001257 0.792778 2.944117
1 16 red circle 0.209017 0.290052 0.138103 5.065034
0 48 red square 0.956274 0.746720 0.775542 7.117831
Note in particular that as far as Miller's ``put`` and ``filter``, as well as other I/O formats, are concerned, these are simply field names with colons in them::
$ mlr --json --jvstack head -n 1 then put '${values:uv} = ${values:u} * ${values:v}' data/json-example-2.json
{
"flag": 1,
"i": 11,
"attributes": {
"color": "yellow",
"shape": "triangle"
},
"values": {
"u": 0.632170,
"v": 0.988721,
"w": 0.436498,
"x": 5.798188,
"uv": 0.625040
}
}
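The key-concatenation idea can be sketched in Python as a pair of helper functions. These are hypothetical illustrations of the flatten/re-expand round trip using a colon separator; they are not Miller's implementation:

```python
def flatten(obj, sep=":", prefix=""):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, sep, name))
        else:
            flat[name] = value
    return flat


def unflatten(flat, sep=":"):
    """Re-expand `sep`-joined keys into nested dicts."""
    nested = {}
    for name, value in flat.items():
        parts = name.split(sep)
        node = nested
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


record = {"flag": 1, "values": {"u": 2, "v": 3}}
flat = flatten(record)
print(list(flat))  # ['flag', 'values:u', 'values:v']

# A new field whose name contains the separator lands inside the nested object.
flat["values:uv"] = flat["values:u"] * flat["values:v"]
print(unflatten(flat))
```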
Arrays
^^^^^^^^^^^^^^^^^^^^^^^^^
Arrays aren't supported in Miller's ``put``/``filter`` DSL. By default, JSON arrays are read in as integer-keyed maps.
Suppose we have arrays like this in our input data::
$ cat data/json-example-3.json
{
"label": "orange",
"values": [12.2, 13.8, 17.2]
}
{
"label": "purple",
"values": [27.0, 32.4]
}
Then integer indices (starting from 0 and counting up) are used as map keys::
$ mlr --ijson --oxtab cat data/json-example-3.json
label orange
values:0 12.2
values:1 13.8
values:2 17.2
label purple
values:0 27.0
values:1 32.4
When the data are written back out as JSON, field names are re-expanded as above, but what were arrays on input are now maps on output::
$ mlr --json --jvstack cat data/json-example-3.json
{
"label": "orange",
"values": {
"0": 12.2,
"1": 13.8,
"2": 17.2
}
}
{
"label": "purple",
"values": {
"0": 27.0,
"1": 32.4
}
}
This is non-ideal, but it allows Miller (5.x release being latest as of this writing) to handle JSON arrays at all.
You might also use ``mlr --json-skip-arrays-on-input`` or ``mlr --json-fatal-arrays-on-input``.
To truly handle JSON, please use a JSON-processing tool such as `jq <https://stedolan.github.io/jq/>`_.
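The array-to-map conversion described above can be modeled in Python as follows. This is a sketch of the observable behavior only, with a hypothetical function name:

```python
import json


def arrays_to_maps(value):
    """Recursively replace lists with dicts keyed by 0-up string indices."""
    if isinstance(value, list):
        return {str(i): arrays_to_maps(v) for i, v in enumerate(value)}
    if isinstance(value, dict):
        return {k: arrays_to_maps(v) for k, v in value.items()}
    return value


record = json.loads('{"label": "orange", "values": [12.2, 13.8, 17.2]}')
print(json.dumps(arrays_to_maps(record), indent=2))
```

The ``values`` array comes back out as an object with keys ``"0"``, ``"1"``, and ``"2"``, which is why the round trip is lossy.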
Formatting JSON options
^^^^^^^^^^^^^^^^^^^^^^^^^
JSON isn't a parameterized format, so ``RS``, ``FS``, ``PS`` aren't specifiable. Nonetheless, you can do the following:
* Use ``--jvstack`` to pretty-print JSON objects with multi-line (vertically stacked) spacing. By default, each Miller record (JSON object) is one per line.
* Keystroke-savers: ``--jsonx`` simply means ``--json --jvstack``, and ``--ojsonx`` simply means ``--ojson --jvstack``.
* Use ``--jlistwrap`` to print the sequence of JSON objects wrapped in an outermost ``[`` and ``]``. By default, these aren't printed.
* Use ``--jquoteall`` to double-quote all object values. By default, integers, floating-point numbers, and booleans ``true`` and ``false`` are not double-quoted when they appear as JSON-object values.
* Use ``--jflatsep yourstringhere`` to specify the string used for key concatenation: this defaults to a single colon.
* Use ``--jofmt`` to force Miller to apply the global ``--ofmt`` to floating-point values. First note: please use sprintf-style codes for double precision, e.g. ending in ``%lf``, ``%le``, or ``%lg``. Miller floats are double-precision so behavior using ``%f``, ``%d``, etc. is undefined. Second note: ``0.123`` is valid JSON; ``.123`` is not. Thus this feature allows you to emit JSON which may be unparseable by other tools.
Again, please see `jq <https://stedolan.github.io/jq/>`_ for a truly powerful, JSON-specific tool.
JSON non-streaming
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The JSON parser Miller uses does not return until all input is parsed: in particular this means that, unlike for other file formats, Miller does not (at present) handle JSON files in ``tail -f`` contexts.
.. _file-formats-pprint:
PPRINT: Pretty-printed tabular
----------------------------------------------------------------
Miller's pretty-print format is like CSV, but column-aligned. For example, compare
::
$ mlr --ocsv cat data/small
a,b,i,x,y
pan,pan,1,0.3467901443380824,0.7268028627434533
eks,pan,2,0.7586799647899636,0.5221511083334797
wye,wye,3,0.20460330576630303,0.33831852551664776
eks,wye,4,0.38139939387114097,0.13418874328430463
wye,pan,5,0.5732889198020006,0.8636244699032729
$ mlr --opprint cat data/small
a b i x y
pan pan 1 0.3467901443380824 0.7268028627434533
eks pan 2 0.7586799647899636 0.5221511083334797
wye wye 3 0.20460330576630303 0.33831852551664776
eks wye 4 0.38139939387114097 0.13418874328430463
wye pan 5 0.5732889198020006 0.8636244699032729
Note that while Miller is a line-at-a-time processor and retains input lines in memory only where necessary (e.g. for sort), pretty-print output requires it to accumulate all input lines (so that it can compute maximum column widths) before producing any output. This has two consequences: (a) pretty-print output won't work in ``tail -f`` contexts, where Miller will be waiting for an end-of-file marker which never arrives; (b) pretty-print output for large files is constrained by available machine memory.
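The need to see all records first is inherent in column alignment, as this minimal Python sketch shows. It assumes every record has the same keys and is not Miller's formatter:

```python
def pprint_lines(records):
    """Return column-aligned lines; needs every record before emitting any."""
    records = list(records)  # retain all input
    if not records:
        return []
    header = list(records[0])
    rows = [header] + [[str(rec[k]) for k in header] for rec in records]
    # A column's width is only known once its longest value has been seen.
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    return [
        " ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    ]


for line in pprint_lines([{"a": "pan", "i": 1}, {"a": "eks", "i": 22}]):
    print(line)
```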
See :doc:`record-heterogeneity` for how Miller handles changes of field names within a single data stream.
For output only (this isn't supported in the input-scanner as of 5.0.0) you can use ``--barred`` with pprint output format::
$ mlr --opprint --barred cat data/small
+-----+-----+---+---------------------+---------------------+
| a | b | i | x | y |
+-----+-----+---+---------------------+---------------------+
| pan | pan | 1 | 0.3467901443380824 | 0.7268028627434533 |
| eks | pan | 2 | 0.7586799647899636 | 0.5221511083334797 |
| wye | wye | 3 | 0.20460330576630303 | 0.33831852551664776 |
| eks | wye | 4 | 0.38139939387114097 | 0.13418874328430463 |
| wye | pan | 5 | 0.5732889198020006 | 0.8636244699032729 |
+-----+-----+---+---------------------+---------------------+
.. _file-formats-xtab:
XTAB: Vertical tabular
----------------------------------------------------------------
This is perhaps most useful for looking at very wide and/or multi-column data which causes line-wraps on the screen (but see also
`ngrid <https://github.com/twosigma/ngrid/>`_ for an entirely different, very powerful option). Namely::
$ grep -v '^#' /etc/passwd | head -n 6 | mlr --nidx --fs : --opprint cat
1 2 3 4 5 6 7
nobody * -2 -2 Unprivileged User /var/empty /usr/bin/false
root * 0 0 System Administrator /var/root /bin/sh
daemon * 1 1 System Services /var/root /usr/bin/false
_uucp * 4 4 Unix to Unix Copy Protocol /var/spool/uucp /usr/sbin/uucico
_taskgated * 13 13 Task Gate Daemon /var/empty /usr/bin/false
_networkd * 24 24 Network Services /var/networkd /usr/bin/false
$ grep -v '^#' /etc/passwd | head -n 2 | mlr --nidx --fs : --oxtab cat
1 nobody
2 *
3 -2
4 -2
5 Unprivileged User
6 /var/empty
7 /usr/bin/false
1 root
2 *
3 0
4 0
5 System Administrator
6 /var/root
7 /bin/sh
$ grep -v '^#' /etc/passwd | head -n 2 | \
mlr --nidx --fs : --ojson --jvstack --jlistwrap label name,password,uid,gid,gecos,home_dir,shell
[
{
"name": "nobody",
"password": "*",
"uid": -2,
"gid": -2,
"gecos": "Unprivileged User",
"home_dir": "/var/empty",
"shell": "/usr/bin/false"
}
,{
"name": "root",
"password": "*",
"uid": 0,
"gid": 0,
"gecos": "System Administrator",
"home_dir": "/var/root",
"shell": "/bin/sh"
}
]
Markdown tabular
----------------------------------------------------------------
Markdown format looks like this::
$ mlr --omd cat data/small
| a | b | i | x | y |
| --- | --- | --- | --- | --- |
| pan | pan | 1 | 0.3467901443380824 | 0.7268028627434533 |
| eks | pan | 2 | 0.7586799647899636 | 0.5221511083334797 |
| wye | wye | 3 | 0.20460330576630303 | 0.33831852551664776 |
| eks | wye | 4 | 0.38139939387114097 | 0.13418874328430463 |
| wye | pan | 5 | 0.5732889198020006 | 0.8636244699032729 |
which renders like this when dropped into various web tools (e.g. github comments):
.. image:: pix/omd.png
As of Miller 4.3.0, markdown format is supported only for output, not input.
Data-conversion keystroke-savers
----------------------------------------------------------------
While you can do format conversion using ``mlr --icsv --ojson cat myfile.csv``, there are also keystroke-savers for this purpose, such as ``mlr --c2j cat myfile.csv``. For a complete list::
$ mlr --usage-format-conversion-keystroke-saver-options
As keystroke-savers for format-conversion you may use the following:
--c2t --c2d --c2n --c2j --c2x --c2p --c2m
--t2c --t2d --t2n --t2j --t2x --t2p --t2m
--d2c --d2t --d2n --d2j --d2x --d2p --d2m
--n2c --n2t --n2d --n2j --n2x --n2p --n2m
--j2c --j2t --j2d --j2n --j2x --j2p --j2m
--x2c --x2t --x2d --x2n --x2j --x2p --x2m
--p2c --p2t --p2d --p2n --p2j --p2x --p2m
The letters c t d n j x p m refer to formats CSV, TSV, DKVP, NIDX, JSON, XTAB,
PPRINT, and markdown, respectively. Note that markdown format is available for
output only.
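The table above follows a regular pattern, which the following Python sketch reproduces. The ``expand`` helper is hypothetical and only approximates what each saver stands for; it assumes each saver is equivalent to one ``--i...`` flag plus one ``--o...`` flag, which may not capture every detail of Miller's behavior.

```python
# Letter-to-flag-suffix mapping, following the usage text above.
FORMATS = {"c": "csv", "t": "tsv", "d": "dkvp", "n": "nidx",
           "j": "json", "x": "xtab", "p": "pprint", "m": "md"}

def keystroke_savers():
    """List --X2Y flags: every input format except markdown, every other output format."""
    flags = []
    for src in "ctdnjxp":          # markdown is output-only, so no --m2... flags
        for dst in "ctdnjxpm":
            if src != dst:
                flags.append(f"--{src}2{dst}")
    return flags

def expand(flag):
    """Spell out a saver as long-form flags (hypothetical helper, not part of Miller)."""
    src, dst = flag[2], flag[4]
    return f"--i{FORMATS[src]} --o{FORMATS[dst]}"

print(len(keystroke_savers()))
print(expand("--c2j"))
```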
Autodetect of line endings
----------------------------------------------------------------
Default line endings (``--irs`` and ``--ors``) are ``'auto'`` which means **autodetect from the input file format**, as long as the input file(s) have lines ending in either LF (also known as linefeed, ``'\n'``, ``0x0a``, Unix-style) or CRLF (also known as carriage-return/linefeed pairs, ``'\r\n'``, ``0x0d 0x0a``, Windows style).
**If both IRS and ORS are auto (which is the default) then LF input will lead to LF output and CRLF input will lead to CRLF output, regardless of the platform you're running on.**
The line-ending autodetector triggers on the first line ending detected in the input stream. E.g. if you specify a CRLF-terminated file on the command line followed by an LF-terminated file then autodetected line endings will be CRLF.
If you use ``--ors {something else}`` with (default or explicitly specified) ``--irs auto`` then line endings are autodetected on input and set to what you specify on output.
If you use ``--irs {something else}`` with (default or explicitly specified) ``--ors auto`` then the output line endings used are LF on Unix/Linux/BSD/MacOSX, and CRLF on Windows.
See also :ref:`reference-separators` for more information about record/field/pair separators.
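The rules above can be summarized as a small decision function. This is a sketch of the described behavior in Python, not Miller's actual code; the names ``detect_line_ending`` and ``choose_ors`` and the ``on_windows`` parameter are assumptions made for illustration.

```python
def detect_line_ending(data: bytes) -> str:
    """Return the first line ending found in the input: CRLF or LF (LF if none found)."""
    pos = data.find(b"\n")
    if pos > 0 and data[pos - 1:pos] == b"\r":
        return "\r\n"
    return "\n"

def choose_ors(irs: str, ors: str, data: bytes, on_windows: bool = False) -> str:
    """Pick the output line ending following the rules described above."""
    if ors != "auto":
        return ors                          # an explicit --ors is used as given
    if irs == "auto":
        return detect_line_ending(data)     # both auto: output follows input
    return "\r\n" if on_windows else "\n"   # explicit --irs, auto --ors: platform default

print(repr(choose_ors("auto", "auto", b"a=1\r\nb=2\n")))
```

Note that only the first line ending in the stream is consulted, so a CRLF-terminated first line decides the output even if later lines end in LF.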
Comments in data
----------------------------------------------------------------
You can include comments within your data files, and either have them ignored, or passed directly through to the standard output as soon as they are encountered::
$ mlr --usage-comments-in-data
--skip-comments Ignore commented lines (prefixed by "#")
within the input.
--skip-comments-with {string} Ignore commented lines within input, with
specified prefix.
--pass-comments Immediately print commented lines (prefixed by "#")
within the input.
--pass-comments-with {string} Immediately print commented lines within input, with
specified prefix.
Notes:
* Comments are only honored at the start of a line.
* In the absence of any of the above four options, comments are data like
any other text.
* When pass-comments is used, comment lines are written to standard output
immediately upon being read; they are not part of the record stream.
Results may be counterintuitive. A suggestion is to place comments at the
start of data files.
Examples::
$ cat data/budget.csv
# Asana -- here are the budget figures you asked for!
type,quantity
purple,456.78
green,678.12
orange,123.45
$ mlr --skip-comments --icsv --opprint sort -nr quantity data/budget.csv
type quantity
green 678.12
purple 456.78
orange 123.45
$ mlr --pass-comments --icsv --opprint sort -nr quantity data/budget.csv
# Asana -- here are the budget figures you asked for!
type quantity
green 678.12
purple 456.78
orange 123.45
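The reason the comment appears before the header in the ``--pass-comments`` output is that comment lines bypass the record stream entirely. A minimal Python sketch of that idea follows; ``split_comments`` is a made-up name, and this is not how Miller is implemented internally.

```python
import io
import sys

def split_comments(lines, mode="skip", prefix="#", out=sys.stdout):
    """Yield data lines; drop ("skip") or immediately print ("pass") comment lines.

    Only a prefix at the very start of a line counts as a comment.
    """
    for line in lines:
        if line.startswith(prefix):
            if mode == "pass":
                out.write(line)   # written right away, not part of the records
            continue
        yield line

lines = [
    "# Asana -- here are the budget figures you asked for!\n",
    "type,quantity\n",
    "purple,456.78\n",
]
buf = io.StringIO()               # stands in for standard output
kept = list(split_comments(lines, mode="pass", out=buf))
```

Since a verb like ``sort`` cannot emit anything until it has read all its input, any passed-through comments are printed first, wherever they occurred in the file.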

View file

@ -1,64 +0,0 @@
Miller Docs v2
================================================================
Overview
----------------------------------------------------------------
.. toctree::
:maxdepth: 1
quick-examples
features
10min
feature-comparison
file-formats
record-heterogeneity
customization
install
internationalization
contact
Details
----------------------------------------------------------------
.. toctree::
:maxdepth: 1
faq
sql-examples
log-processing-examples
data-examples
cookbook
cookbook2
cookbook3
data-sharing
Reference
----------------------------------------------------------------
.. toctree::
:maxdepth: 1
reference
reference-verbs
reference-dsl
manpage
release-docs
build
Background
----------------------------------------------------------------
.. toctree::
:maxdepth: 1
why
etymology
originality
performance
Index
----------------------------------------------------------------
* :ref:`genindex`
* :ref:`search`

View file

@ -1,54 +0,0 @@
..
PLEASE DO NOT EDIT DIRECTLY. EDIT THE .rst.in FILE PLEASE.
Installation
================================================================
Prebuilt executables via package managers
----------------------------------------------------------------
`Homebrew <https://brew.sh/>`_ installation support for OSX is available via
::
brew update && brew install miller
...and also via `MacPorts <https://www.macports.org/>`_:
::
sudo port selfupdate && sudo port install miller
You may already have the ``mlr`` executable available in your platform's package manager on NetBSD, Debian Linux, Ubuntu Xenial and upward, Arch Linux, or perhaps other distributions. For example, on various Linux distributions you might do one of the following:
::
sudo apt-get install miller
::
sudo apt install miller
::
sudo yum install miller
On Windows, Miller is available via `Chocolatey <https://chocolatey.org/>`_:
::
choco install miller
Prebuilt executables via GitHub per release
----------------------------------------------------------------
Please see https://github.com/johnkerl/miller/releases where there are builds for OSX Yosemite, Linux x86-64 (dynamically linked), and Windows (via Appveyor build artifacts).
Miller is autobuilt for **Linux** using **Travis** on every commit (https://travis-ci.org/johnkerl/miller/builds). This was set up by the generous assistance of `SikhNerd <https://github.com/SikhNerd>`_ on Github, tracked in https://github.com/johnkerl/miller/issues/15. Analogously, Miller is autobuilt for **Windows** using the **Appveyor** continuous-build system: https://ci.appveyor.com/project/johnkerl/miller.
Miller releases from `5.1.0 <https://github.com/johnkerl/miller/releases/tag/v5.1.0w>`_ onward will have a precompiled Windows binary, in addition to the MacOSX and Linux 64-bit precompiled binaries as on previous releases. Specifically, at https://ci.appveyor.com/project/johnkerl/miller you can select *Latest Build* and then *Artifacts* to always get the current head build. Miller releases from 5.3.0 onward will simply point to a particular Appveyor artifact associated with the release.
Building from source
----------------------------------------------------------------
Please see :doc:`build`.

View file

@ -1,17 +0,0 @@
..
PLEASE DO NOT EDIT DIRECTLY. EDIT THE .rst.in FILE PLEASE.
Internationalization
================================================================
Miller handles strings with any characters other than 0x00 or 0xff, using explicit UTF-8-friendly string-length computations. (I have no plans to support UTF-16 or ISO-8859-1.)
By and large, Miller treats strings as sequences of non-null bytes without need to interpret them semantically. Intentional support for internationalization includes:
* Tabular output formats such as pprint and xtab (see :doc:`file-formats`) are aligned correctly.
* The :ref:`reference-dsl-strlen` function correctly counts UTF-8 codepoints rather than bytes.
* The :ref:`reference-dsl-toupper`, :ref:`reference-dsl-tolower`, and :ref:`reference-dsl-capitalize` DSL functions work within the capabilities of https://github.com/sheredom/utf8.h.
Meanwhile, regular expressions and the DSL functions :ref:`reference-dsl-sub` and :ref:`reference-dsl-gsub` function correctly, albeit without explicit intentional support.
Please file an issue at https://github.com/johnkerl/miller if you encounter bugs related to internationalization (or anything else for that matter).
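To illustrate the distinction between bytes and UTF-8 codepoints mentioned above, here is a small Python example. It shows the general idea only and says nothing about how Miller's C code computes lengths.

```python
s = "h\u00e9llo"                      # "héllo": five characters, one non-ASCII
num_bytes = len(s.encode("utf-8"))    # the accented letter takes two bytes in UTF-8
num_codepoints = len(s)               # Python str length counts code points
print(num_bytes, num_codepoints)
```

A byte-counting length function would report 6 here, which would also throw off column alignment in pprint-style output.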

View file

@ -1,185 +0,0 @@
..
PLEASE DO NOT EDIT DIRECTLY. EDIT THE .rst.in FILE PLEASE.
Log-processing examples
----------------------------------------------------------------
Another of my favorite use-cases for Miller is doing ad-hoc processing of log-file data. Here's where DKVP format really shines: since the field names and field values are present on every line, every line stands on its own. That means you can ``grep`` or what have you. It also means not every line needs to have the same list of field names ("schema").
Again, all the examples in the CSV section apply here -- just change the input-format flags. But there's more you can do when not all the records have the same shape.
Writing a program -- in any language whatsoever -- you can have it print out log lines as it goes along, with items for various events jumbled together. After the program has finished running you can sort it all out, filter it, analyze it, and learn from it.
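For instance, a logging helper along the following lines would be enough to produce DKVP output. This is a hypothetical Python sketch: the name ``log_dkvp`` is made up, and it assumes values contain no commas or equals signs, since it does no escaping.

```python
import time

def log_dkvp(**fields):
    """Format keyword arguments as one DKVP line: key=value pairs joined by commas."""
    return ",".join(f"{k}={v}" for k, v in fields.items())

# Each call carries only local information; different events have different fields.
print(log_dkvp(op="enter", time=int(time.time())))
print(log_dkvp(op="cache", type="A9", hit=0))
```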
Suppose your program has printed something like this::
$ cat log.txt
op=enter,time=1472819681
op=cache,type=A9,hit=0
op=cache,type=A4,hit=1
time=1472819690,batch_size=100,num_filtered=237
op=cache,type=A1,hit=1
op=cache,type=A9,hit=0
op=cache,type=A1,hit=1
op=cache,type=A9,hit=0
op=cache,type=A9,hit=0
op=cache,type=A1,hit=1
time=1472819705,batch_size=100,num_filtered=348
op=cache,type=A4,hit=1
op=cache,type=A9,hit=0
op=cache,type=A9,hit=0
op=cache,type=A9,hit=0
op=cache,type=A9,hit=0
op=cache,type=A4,hit=1
time=1472819713,batch_size=100,num_filtered=493
op=cache,type=A9,hit=1
op=cache,type=A1,hit=1
op=cache,type=A9,hit=0
op=cache,type=A9,hit=0
op=cache,type=A9,hit=1
time=1472819720,batch_size=100,num_filtered=554
op=cache,type=A1,hit=0
op=cache,type=A4,hit=1
op=cache,type=A9,hit=0
op=cache,type=A9,hit=0
op=cache,type=A9,hit=0
op=cache,type=A4,hit=0
op=cache,type=A4,hit=0
op=cache,type=A9,hit=0
time=1472819736,batch_size=100,num_filtered=612
op=cache,type=A1,hit=1
op=cache,type=A9,hit=0
op=cache,type=A9,hit=0
op=cache,type=A9,hit=0
op=cache,type=A9,hit=0
op=cache,type=A4,hit=1
op=cache,type=A1,hit=1
op=cache,type=A9,hit=0
op=cache,type=A9,hit=0
time=1472819742,batch_size=100,num_filtered=728
Each print statement simply contains local information: the current timestamp, whether a particular cache was hit or not, etc. Then using either the system ``grep`` command, or Miller's ``having-fields``, or ``is_present``, we can pick out the parts we want and analyze them::
$ grep op=cache log.txt \
| mlr --idkvp --opprint stats1 -a mean -f hit -g type then sort -f type
type hit_mean
A1 0.857143
A4 0.714286
A9 0.090909
::
$ mlr --from log.txt --opprint \
filter 'is_present($batch_size)' \
then step -a delta -f time,num_filtered \
then sec2gmt time
time batch_size num_filtered time_delta num_filtered_delta
2016-09-02T12:34:50Z 100 237 0 0
2016-09-02T12:35:05Z 100 348 15 111
2016-09-02T12:35:13Z 100 493 8 145
2016-09-02T12:35:20Z 100 554 7 61
2016-09-02T12:35:36Z 100 612 16 58
2016-09-02T12:35:42Z 100 728 6 116
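The filter-then-delta pipeline above can be sketched in a few lines of Python, which may help clarify what ``step -a delta`` computes. The helper names are assumptions, the input is a handful of lines in the style of ``log.txt``, and this is not Miller's implementation.

```python
def parse_dkvp(line):
    """Split one DKVP line into a dict of strings, keeping field order."""
    return dict(pair.split("=", 1) for pair in line.strip().split(","))

def step_delta(records, field):
    """Add <field>_delta: difference from the previous record's value, 0 for the first."""
    prev = None
    for rec in records:
        cur = int(rec[field])
        rec[field + "_delta"] = 0 if prev is None else cur - prev
        prev = cur
        yield rec

lines = [
    "op=enter,time=1472819681",
    "time=1472819690,batch_size=100,num_filtered=237",
    "op=cache,type=A1,hit=1",
    "time=1472819705,batch_size=100,num_filtered=348",
    "time=1472819713,batch_size=100,num_filtered=493",
]
records = (parse_dkvp(line) for line in lines)
batches = [r for r in records if "batch_size" in r]   # like is_present($batch_size)
deltas = [r["time_delta"] for r in step_delta(batches, "time")]
print(deltas)
```

The filter has to come first: otherwise the ``op=enter`` record, which also has a ``time`` field, would take part in the delta computation.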
Alternatively, we can simply group the similar data for a better look::
$ mlr --opprint group-like log.txt
op time
enter 1472819681
op type hit
cache A9 0
cache A4 1
cache A1 1
cache A9 0
cache A1 1
cache A9 0
cache A9 0
cache A1 1
cache A4 1
cache A9 0
cache A9 0
cache A9 0
cache A9 0
cache A4 1
cache A9 1
cache A1 1
cache A9 0
cache A9 0
cache A9 1
cache A1 0
cache A4 1
cache A9 0
cache A9 0
cache A9 0
cache A4 0
cache A4 0
cache A9 0
cache A1 1
cache A9 0
cache A9 0
cache A9 0
cache A9 0
cache A4 1
cache A1 1
cache A9 0
cache A9 0
time batch_size num_filtered
1472819690 100 237
1472819705 100 348
1472819713 100 493
1472819720 100 554
1472819736 100 612
1472819742 100 728
::
$ mlr --opprint group-like then sec2gmt time log.txt
op time
enter 2016-09-02T12:34:41Z
op type hit
cache A9 0
cache A4 1
cache A1 1
cache A9 0
cache A1 1
cache A9 0
cache A9 0
cache A1 1
cache A4 1
cache A9 0
cache A9 0
cache A9 0
cache A9 0
cache A4 1
cache A9 1
cache A1 1
cache A9 0
cache A9 0
cache A9 1
cache A1 0
cache A4 1
cache A9 0
cache A9 0
cache A9 0
cache A4 0
cache A4 0
cache A9 0
cache A1 1
cache A9 0
cache A9 0
cache A9 0
cache A9 0
cache A4 1
cache A1 1
cache A9 0
cache A9 0
time batch_size num_filtered
2016-09-02T12:34:50Z 100 237
2016-09-02T12:35:05Z 100 348
2016-09-02T12:35:13Z 100 493
2016-09-02T12:35:20Z 100 554
2016-09-02T12:35:36Z 100 612
2016-09-02T12:35:42Z 100 728

File diff suppressed because it is too large Load diff

View file

@ -1,43 +0,0 @@
..
PLEASE DO NOT EDIT DIRECTLY. EDIT THE .rst.in FILE PLEASE.
How original is Miller?
================================================================
It isn't. Miller is one of many, many participants in the online-analytical-processing culture. Other key participants include ``awk``, SQL, spreadsheets, etc. etc. etc. Far from being an original concept, Miller explicitly strives to imitate several existing tools:
**The Unix toolkit**: Intentional similarities as described in :doc:`feature-comparison`.
Recipes abound for command-line data analysis using the Unix toolkit. Here are just a few of my favorites:
* http://en.wikibooks.org/wiki/Ad_Hoc_Data_Analysis_From_The_Unix_Command_Line
* http://www.gregreda.com/2013/07/15/unix-commands-for-data-science
* https://github.com/dbohdan/structured-text-tools
**RecordStream**: Miller owes particular inspiration to `RecordStream <https://github.com/benbernard/RecordStream>`_. The key difference is that RecordStream is a Perl-based tool for manipulating JSON (other formats such as CSV must be separately converted into and out of JSON), while Miller is fast C which handles its formats natively. The similarities include the ``sort``, ``stats1`` (analog of RecordStream's ``collate``), and ``delta`` operations, as well as ``filter`` and ``put``, and pretty-print formatting.
**stats_m**: A third source of lineage is my Python `stats_m <https://github.com/johnkerl/scripts-math/tree/master/stats>`_ module. This includes simple single-pass algorithms which form Miller's ``stats1`` and ``stats2`` subcommands.
**SQL**: Fourthly, Miller's ``group-by`` command name is from SQL, as is the term ``aggregate``.
**Added value**: Miller's added values include:
* Name-indexing, compared to the Unix toolkit's positional indexing.
* Raw speed, compared to ``awk``, RecordStream, ``stats_m``, or various other kinds of Python/Ruby/etc. scripts one can easily create.
* Compact keystroking for many common tasks, with a decent amount of flexibility.
* Ability to handle text files on the Unix pipe, without need for creating database tables, compared to SQL databases.
* Various file formats, and on-the-fly format conversion.
**jq**: Miller does for name-indexed text what `jq <https://stedolan.github.io/jq/>`_ does for JSON. If you're not already familiar with ``jq``, please check it out!
**What about similar tools?**
Here's a comprehensive list: https://github.com/dbohdan/structured-text-tools. Last I knew it doesn't mention `rows <https://github.com/turicas/rows>`_ so here's a plug for that as well. As it turns out, I learned about most of these after writing Miller.
**What about DOTADIW?** One of the key points of the `Unix philosophy <http://en.wikipedia.org/wiki/Unix_philosophy>`_ is that a tool should do one thing and do it well. Hence ``sort`` and ``cut`` do just one thing. Why does Miller put ``awk``-like processing, a few SQL-like operations, and statistical reduction all into one tool (see also :doc:`reference`)? This is a fair question. First note that many standard tools, such as ``awk`` and ``perl``, do quite a few things -- as does ``jq``. But I could have pushed for putting format awareness and name-indexing options into ``cut``, ``awk``, and so on (so you could do ``cut -f hostname,uptime`` or ``awk '{sum += $x*$y}END{print sum}'``). Patching ``cut``, ``sort``, etc. on multiple operating systems is a non-starter in terms of uptake. Moreover, it makes sense for me to have Miller be a tool which collects together format-aware record-stream processing into one place, with good reuse of Miller-internal library code for its various features.
**Why not use Perl/Python/Ruby etc.?** Maybe you should. With those tools you'll get far more expressive power, and sufficiently quick turnaround time for small-to-medium-sized data. Using Miller you'll get something less than a complete programming language, but which is fast, with moderate amounts of flexibility and much less keystroking.
When I was first developing Miller I made a survey of several languages. Using low-level implementation languages like C, Go, Rust, and Nim, I'd need to create my own domain-specific language (DSL) which would always be less featured than a full programming language, but I'd get better performance. Using high-level interpreted languages such as Perl/Python/Ruby I'd get the language's ``eval`` for free and I wouldn't need a DSL; Miller would have mainly been a set of format-specific I/O hooks. If I'd gotten good enough performance from the latter I'd have done it without question and Miller would be far more flexible. But C won on performance by a landslide, so we have Miller in C with a custom DSL.
**No, really, why one more command-line data-manipulation tool?** I wrote Miller because I was frustrated with tools like ``grep``, ``sed``, and so on being *line-aware* without being *format-aware*. The single most poignant example I can think of is seeing people grep data lines out of their CSV files and sadly losing their header lines. While some lighter-than-SQL processing is very nice to have, at core I wanted the format-awareness of `RecordStream <https://github.com/benbernard/RecordStream>`_ combined with the raw speed of the Unix toolkit. Miller does precisely that.

View file

@ -1,23 +0,0 @@
..
PLEASE DO NOT EDIT DIRECTLY. EDIT THE .rst.in FILE PLEASE.
Performance
================================================================
Disclaimer
----------------------------------------------------------------
In a previous version of this page (see `here <http://johnkerl.org/miller-releases/miller-5.1.0/doc/performance.html>`_) I compared Miller to some items in the Unix toolkit in terms of run time. But such comparisons are very much not apples-to-apples:
* Miller's principal strength is that it handles **key-value data in various formats** while the system tools **do not**. So if you time ``mlr sort`` on a CSV file against system ``sort``, it's not relevant to say which is faster by how many percent -- Miller will respect the header line, leaving it in place, while the system sort will move it, sorting it along with all the data lines. This would be comparing the run times of two programs which produce different outputs. Likewise, ``awk`` doesn't respect header lines, although you can code up some CSV-handling using ``if (NR==1) { ... } else { ... }``. And that's just CSV: I don't know any simple way to get ``sort``, ``awk``, etc. to handle DKVP, JSON, etc. -- which is the main reason I wrote Miller.
* **Implementations differ by platform**: one ``awk`` may be fundamentally faster than another, and ``mawk`` has a very efficient bytecode implementation -- which handles positionally indexed data far faster than Miller does.
* The system ``sort`` command will, on some systems, handle too-large-for-RAM datasets by spilling to disk; Miller (as of version 5.2.0, mid-2017) does not. Miller sorts are always stable; GNU ``sort`` supports stable and unstable variants.
* Etc.
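The header-line point in the first bullet above is easy to demonstrate. The Python below is an illustrative sketch, using the budget data from the file-formats examples; it contrasts a line-oriented sort with a header-aware one, without claiming to mirror either ``sort`` or Miller internally.

```python
lines = ["type,quantity", "purple,456.78", "green,678.12", "orange,123.45"]

# Line-oriented sort: the header is treated as just another line.
naive = sorted(lines)

# Header-aware sort: hold the header aside and sort only the data lines.
header, data = lines[0], lines[1:]
aware = [header] + sorted(data)

print(naive)
print(aware)
```

The two results differ in where the header ends up, so timing one approach against the other compares programs that do not produce the same output.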
Summary
----------------------------------------------------------------
Miller can do many kinds of processing on key-value-pair data in elapsed time of roughly the same order of magnitude as items in the Unix toolkit take to handle positionally indexed data. Specific results vary widely by platform, implementation details, and multi-core use (or not). Lastly, specific special-purpose non-record-aware processing will run far faster in ``grep``, ``sed``, etc.

View file

@ -1,74 +0,0 @@
..
PLEASE DO NOT EDIT DIRECTLY. EDIT THE .rst.in FILE PLEASE.
Quick examples
================================================================
.. image:: coverart/cover-combined.png
Column select::
% mlr --csv cut -f hostname,uptime mydata.csv
Add new columns as function of other columns::
% mlr --nidx put '$sum = $7 < 0.0 ? 3.5 : $7 + 2.1*$8' *.dat
Row filter::
% mlr --csv filter '$status != "down" && $upsec >= 10000' *.csv
Apply column labels and pretty-print::
% grep -v '^#' /etc/group | mlr --ifs : --nidx --opprint label group,pass,gid,member then sort -f group
Join multiple data sources on key columns::
% mlr join -j account_id -f accounts.dat then group-by account_name balances.dat
Multiple formats including JSON::
% mlr --json put '$attr = sub($attr, "([0-9]+)_([0-9]+)_.*", "\1:\2")' data/*.json
Aggregate per-column statistics::
% mlr stats1 -a min,mean,max,p10,p50,p90 -f flag,u,v data/*
Linear regression::
% mlr stats2 -a linreg-pca -f u,v -g shape data/*
Aggregate custom per-column statistics::
% mlr put -q '@sum[$a][$b] += $x; end {emit @sum, "a", "b"}' data/*
Iterate over data using DSL expressions::
% mlr --from estimates.tbl put '
for (k,v in $*) {
if (is_numeric(v) && k =~ "^[t-z].*$") {
$sum += v; $count += 1
}
}
$mean = $sum / $count # no assignment if count unset
'
Run DSL expressions from a script file::
% mlr --from infile.dat put -f analyze.mlr
Split/reduce output to multiple filenames::
% mlr --from infile.dat put 'tee > "./taps/data-".$a."-".$b, $*'
Compressed I/O::
% mlr --from infile.dat put 'tee | "gzip > ./taps/data-".$a."-".$b.".gz", $*'
Interoperate with other data-processing tools using standard pipes::
% mlr --from infile.dat put -q '@v=$*; dump | "jq .[]"'
Tap/trace::
% mlr --from infile.dat put '(NR % 1000 == 0) { print > stderr, "Checkpoint ".NR}'

View file

@ -1,206 +0,0 @@
..
PLEASE DO NOT EDIT DIRECTLY. EDIT THE .rst.in FILE PLEASE.
Record-heterogeneity
================================================================
We think of CSV tables as rectangular: if there are 17 columns in the header then there are 17 columns for every row, else the data have a formatting error.
But heterogeneous data abound (today's no-SQL databases for example). Miller handles this.
For I/O
----------------------------------------------------------------
CSV and pretty-print
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Miller simply prints a newline and a new header when there is a schema change. When there is no schema change, you get CSV per se as a special case. Likewise, Miller reads heterogeneous CSV or pretty-print input the same way. The difference between CSV and CSV-lite is that the former is RFC4180-compliant, while the latter readily handles heterogeneous data (which is non-compliant). For example:
::
$ cat data/het.dkvp
resource=/path/to/file,loadsec=0.45,ok=true
record_count=100,resource=/path/to/file
resource=/path/to/second/file,loadsec=0.32,ok=true
record_count=150,resource=/path/to/second/file
resource=/some/other/path,loadsec=0.97,ok=false
::
$ mlr --ocsvlite cat data/het.dkvp
resource,loadsec,ok
/path/to/file,0.45,true
record_count,resource
100,/path/to/file
resource,loadsec,ok
/path/to/second/file,0.32,true
record_count,resource
150,/path/to/second/file
resource,loadsec,ok
/some/other/path,0.97,false
::
$ mlr --opprint cat data/het.dkvp
resource loadsec ok
/path/to/file 0.45 true
record_count resource
100 /path/to/file
resource loadsec ok
/path/to/second/file 0.32 true
record_count resource
150 /path/to/second/file
resource loadsec ok
/some/other/path 0.97 false
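The schema-change rule described above (a newline and a new header whenever the list of field names changes) can be sketched as follows. This is an illustrative Python writer under the assumption that a schema is the ordered list of keys; ``write_csvlite`` is a made-up name and not Miller code.

```python
def write_csvlite(records):
    """Return CSV-lite text: a blank line and a fresh header whenever the key list changes."""
    out = []
    prev_keys = None
    for rec in records:
        keys = list(rec.keys())
        if keys != prev_keys:
            if prev_keys is not None:
                out.append("")              # blank line separates schema blocks
            out.append(",".join(keys))
            prev_keys = keys
        out.append(",".join(str(v) for v in rec.values()))
    return "\n".join(out)

records = [
    {"resource": "/path/to/file", "loadsec": "0.45", "ok": "true"},
    {"record_count": "100", "resource": "/path/to/file"},
]
print(write_csvlite(records))
```

When every record has the same keys, the header is written once and the output is ordinary CSV, which is the special case mentioned above.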
Miller handles explicit header changes as just shown. If your CSV input contains ragged data -- if there are implicit header changes -- you can use ``--allow-ragged-csv-input`` (or keystroke-saver ``--ragged``). For too-short data lines, values are filled with empty string; for too-long data lines, missing field names are replaced with positional indices (counting up from 1, not 0), as follows:
::
$ cat data/ragged.csv
a,b,c
1,2,3
4,5
6,7,8,9
::
$ mlr --icsv --oxtab --allow-ragged-csv-input cat data/ragged.csv
a 1
b 2
c 3
a 4
b 5
c
a 6
b 7
c 8
4 9
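The padding and positional-naming rules for ragged input can be written down directly. The following Python is a sketch of those rules as described above, with a made-up helper name ``read_ragged`` and no handling of quoting; it is not Miller's CSV reader.

```python
def read_ragged(lines, sep=","):
    """Apply the ragged-CSV rules above: pad short rows, number extra fields from 1."""
    header = lines[0].split(sep)
    records = []
    for line in lines[1:]:
        values = line.split(sep)
        rec = {}
        for i, value in enumerate(values):
            # Too-long line: extra fields are keyed by 1-up position.
            key = header[i] if i < len(header) else str(i + 1)
            rec[key] = value
        for key in header[len(values):]:
            rec[key] = ""                   # too-short line: fill with empty string
        records.append(rec)
    return records

records = read_ragged(["a,b,c", "1,2,3", "4,5", "6,7,8,9"])
for rec in records:
    print(rec)
```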
You may also find Miller's ``group-like`` feature handy (see also :doc:`reference`):
::
$ mlr --ocsvlite group-like data/het.dkvp
resource,loadsec,ok
/path/to/file,0.45,true
/path/to/second/file,0.32,true
/some/other/path,0.97,false
record_count,resource
100,/path/to/file
150,/path/to/second/file
::
$ mlr --opprint group-like data/het.dkvp
resource loadsec ok
/path/to/file 0.45 true
/path/to/second/file 0.32 true
/some/other/path 0.97 false
record_count resource
100 /path/to/file
150 /path/to/second/file
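Conceptually, ``group-like`` buckets records by their field names and then emits the buckets one after another. A minimal Python sketch of that idea follows; it assumes that two records are "alike" when they have the same field names in the same order, and ``group_like`` is an illustrative name rather than Miller's code.

```python
def group_like(records):
    """Bucket records by their ordered key list; emit buckets in first-seen order."""
    groups = {}
    for rec in records:
        groups.setdefault(tuple(rec.keys()), []).append(rec)
    return [rec for bucket in groups.values() for rec in bucket]

records = [
    {"resource": "/path/to/file", "loadsec": "0.45", "ok": "true"},
    {"record_count": "100", "resource": "/path/to/file"},
    {"resource": "/path/to/second/file", "loadsec": "0.32", "ok": "true"},
    {"record_count": "150", "resource": "/path/to/second/file"},
]
for rec in group_like(records):
    print(rec)
```

Because all records must be read before any bucket is complete, this kind of operation holds its input in memory rather than streaming.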
Key-value-pair, vertical-tabular, and index-numbered formats
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
For these formats, record-heterogeneity comes naturally:
::
$ cat data/het.dkvp
resource=/path/to/file,loadsec=0.45,ok=true
record_count=100,resource=/path/to/file
resource=/path/to/second/file,loadsec=0.32,ok=true
record_count=150,resource=/path/to/second/file
resource=/some/other/path,loadsec=0.97,ok=false
::
$ mlr --onidx --ofs ' ' cat data/het.dkvp
/path/to/file 0.45 true
100 /path/to/file
/path/to/second/file 0.32 true
150 /path/to/second/file
/some/other/path 0.97 false
::
$ mlr --oxtab cat data/het.dkvp
resource /path/to/file
loadsec 0.45
ok true
record_count 100
resource /path/to/file
resource /path/to/second/file
loadsec 0.32
ok true
record_count 150
resource /path/to/second/file
resource /some/other/path
loadsec 0.97
ok false
::
$ mlr --oxtab group-like data/het.dkvp
resource /path/to/file
loadsec 0.45
ok true
resource /path/to/second/file
loadsec 0.32
ok true
resource /some/other/path
loadsec 0.97
ok false
record_count 100
resource /path/to/file
record_count 150
resource /path/to/second/file
For processing
----------------------------------------------------------------
Miller operates on specified fields and takes the rest along: for example, if you are sorting on the ``count`` field then all records in the input stream must have a ``count`` field but the other fields can vary, and moreover the sorted-on field name(s) don't need to be in the same position on each line:
::
$ cat data/sort-het.dkvp
count=500,color=green
count=600
status=ok,count=250,hours=0.22
status=ok,count=200,hours=3.4
count=300,color=blue
count=100,color=green
count=450
::
$ mlr sort -n count data/sort-het.dkvp
count=100,color=green
status=ok,count=200,hours=3.4
status=ok,count=250,hours=0.22
count=300,color=blue
count=450
count=500,color=green
count=600
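The same behavior can be sketched in Python: sort on one named field and carry everything else along unchanged. The snippet below uses the ``sort-het.dkvp`` data shown above; the helper name is an assumption and the code is an illustration, not Miller's sorter.

```python
def parse_dkvp(line):
    """Split one DKVP line into a dict of strings, keeping field order."""
    return dict(pair.split("=", 1) for pair in line.split(","))

lines = [
    "count=500,color=green",
    "count=600",
    "status=ok,count=250,hours=0.22",
    "status=ok,count=200,hours=3.4",
    "count=300,color=blue",
    "count=100,color=green",
    "count=450",
]
records = [parse_dkvp(line) for line in lines]

# Look the field up by name, so its position within each record does not matter.
by_count = sorted(records, key=lambda r: float(r["count"]))
counts = [r["count"] for r in by_count]
print(counts)
```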

File diff suppressed because it is too large Load diff

File diff suppressed because it is too large Load diff

File diff suppressed because it is too large Load diff

View file

@ -1,21 +0,0 @@
..
PLEASE DO NOT EDIT DIRECTLY. EDIT THE .rst.in FILE PLEASE.
Documents by release
================================================================
As of September 2020, for 5.9.1 onward, release-specific docs will be handled automatically by https://miller.readthedocs.io whenever a new release is tagged at https://github.com/johnkerl/miller/releases.
Information here is for documents from before the readthedocs port:
* `Head <https://miller.readthedocs.io>`_
* `Miller 5.9.1 <https://johnkerl.org/miller-releases/miller-5.10.0/docs/_build/html>`_
* `Miller 5.9.0 <https://johnkerl.org//miller-releases/miller-5.9.0/doc/index.html>`_
* `Miller 5.8.0 <https://johnkerl.org//miller-releases/miller-5.8.0/doc/index.html>`_
* `Miller 5.7.0 <https://johnkerl.org//miller-releases/miller-5.7.0/doc/index.html>`_
* `Miller 5.6.2 <https://johnkerl.org//miller-releases/miller-5.6.2/doc/index.html>`_
* `Miller 5.6.1 <https://johnkerl.org//miller-releases/miller-5.6.1/doc/index.html>`_
* `Miller 5.6.0 <https://johnkerl.org//miller-releases/miller-5.6.0/doc/index.html>`_
* `Miller 5.5.0 <https://johnkerl.org//miller-releases/miller-5.5.0/doc/index.html>`_
* `Miller 5.4.0 <https://johnkerl.org//miller-releases/miller-5.4.0/doc/index.html>`_
* `Miller 5.3.0 <https://johnkerl.org//miller-releases/miller-5.3.0/doc/index.html>`_

View file

@ -1,227 +0,0 @@
..
PLEASE DO NOT EDIT DIRECTLY. EDIT THE .rst.in FILE PLEASE.
SQL examples
====================
.. _sql-output-examples:
SQL-output examples
^^^^^^^^^^^^^^^^^^^
I like to produce SQL-query output with header-column and tab delimiter: this is CSV but with a tab instead of a comma, also known as TSV. Then I post-process with ``mlr --tsv`` or ``mlr --tsvlite``. This means I can do some (or all, or none) of my data processing within SQL queries, and some (or none, or all) of my data processing using Miller -- whichever is most convenient for my needs at the moment.
For example, using default output formatting in ``mysql`` we get formatting like Miller's ``--opprint --barred``::
$ mysql --database=mydb -e 'show columns in mytable'
+------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------+--------------+------+-----+---------+-------+
| id | bigint(20) | NO | MUL | NULL | |
| category | varchar(256) | NO | | NULL | |
| is_permanent | tinyint(1) | NO | | NULL | |
| assigned_to | bigint(20) | YES | | NULL | |
| last_update_time | int(11) | YES | | NULL | |
+------------------+--------------+------+-----+---------+-------+
Using ``mysql``'s ``-B`` we get TSV output::
$ mysql --database=mydb -B -e 'show columns in mytable' | mlr --itsvlite --opprint cat
Field Type Null Key Default Extra
id bigint(20) NO MUL NULL -
category varchar(256) NO - NULL -
is_permanent tinyint(1) NO - NULL -
assigned_to bigint(20) YES - NULL -
last_update_time int(11) YES - NULL -
Since Miller handles TSV output, we can do as much or as little processing as we want in the SQL query, then send the rest on to Miller. This includes outputting as JSON, doing further selects/joins in Miller, doing stats, etc. etc.::
$ mysql --database=mydb -B -e 'show columns in mytable' | mlr --itsvlite --ojson --jlistwrap --jvstack cat
[
{
"Field": "id",
"Type": "bigint(20)",
"Null": "NO",
"Key": "MUL",
"Default": "NULL",
"Extra": ""
},
{
"Field": "category",
"Type": "varchar(256)",
"Null": "NO",
"Key": "",
"Default": "NULL",
"Extra": ""
},
{
"Field": "is_permanent",
"Type": "tinyint(1)",
"Null": "NO",
"Key": "",
"Default": "NULL",
"Extra": ""
},
{
"Field": "assigned_to",
"Type": "bigint(20)",
"Null": "YES",
"Key": "",
"Default": "NULL",
"Extra": ""
},
{
"Field": "last_update_time",
"Type": "int(11)",
"Null": "YES",
"Key": "",
"Default": "NULL",
"Extra": ""
}
]
$ mysql --database=mydb -B -e 'select * from mytable' > query.tsv
$ mlr --from query.tsv --t2p stats1 -a count -f id -g category,assigned_to
category assigned_to id_count
special 10000978 207
special 10003924 385
special 10009872 168
standard 10000978 524
standard 10003924 392
standard 10009872 108
...
Again, all the examples in the CSV section apply here -- just change the input-format flags.
.. _sql-input-examples:
SQL-input examples
^^^^^^^^^^^^^^^^^^
One use of NIDX (value-only, no keys) format is for loading up SQL tables.
Create and load SQL table::
mysql> CREATE TABLE abixy(
a VARCHAR(32),
b VARCHAR(32),
i BIGINT(10),
x DOUBLE,
y DOUBLE
);
Query OK, 0 rows affected (0.01 sec)
bash$ mlr --onidx --fs comma cat data/medium > medium.nidx
mysql> LOAD DATA LOCAL INFILE 'medium.nidx' REPLACE INTO TABLE abixy FIELDS TERMINATED BY ',' ;
Query OK, 10000 rows affected (0.07 sec)
Records: 10000 Deleted: 0 Skipped: 0 Warnings: 0
mysql> SELECT COUNT(*) AS count FROM abixy;
+-------+
| count |
+-------+
| 10000 |
+-------+
1 row in set (0.00 sec)
mysql> SELECT * FROM abixy LIMIT 10;
+------+------+------+---------------------+---------------------+
| a | b | i | x | y |
+------+------+------+---------------------+---------------------+
| pan | pan | 1 | 0.3467901443380824 | 0.7268028627434533 |
| eks | pan | 2 | 0.7586799647899636 | 0.5221511083334797 |
| wye | wye | 3 | 0.20460330576630303 | 0.33831852551664776 |
| eks | wye | 4 | 0.38139939387114097 | 0.13418874328430463 |
| wye | pan | 5 | 0.5732889198020006 | 0.8636244699032729 |
| zee | pan | 6 | 0.5271261600918548 | 0.49322128674835697 |
| eks | zee | 7 | 0.6117840605678454 | 0.1878849191181694 |
| zee | wye | 8 | 0.5985540091064224 | 0.976181385699006 |
| hat | wye | 9 | 0.03144187646093577 | 0.7495507603507059 |
| pan | wye | 10 | 0.5026260055412137 | 0.9526183602969864 |
+------+------+------+---------------------+---------------------+
Aggregate counts within SQL::
mysql> SELECT a, b, COUNT(*) AS count FROM abixy GROUP BY a, b ORDER BY COUNT DESC;
+------+------+-------+
| a | b | count |
+------+------+-------+
| zee | wye | 455 |
| pan | eks | 429 |
| pan | pan | 427 |
| wye | hat | 426 |
| hat | wye | 423 |
| pan | hat | 417 |
| eks | hat | 417 |
| pan | zee | 413 |
| eks | eks | 413 |
| zee | hat | 409 |
| eks | wye | 407 |
| zee | zee | 403 |
| pan | wye | 395 |
| wye | pan | 392 |
| zee | eks | 391 |
| zee | pan | 389 |
| hat | eks | 389 |
| wye | eks | 386 |
| wye | zee | 385 |
| hat | zee | 385 |
| hat | hat | 381 |
| wye | wye | 377 |
| eks | pan | 371 |
| hat | pan | 363 |
| eks | zee | 357 |
+------+------+-------+
25 rows in set (0.01 sec)
Aggregate counts within Miller::
$ mlr --opprint uniq -c -g a,b then sort -nr count data/medium
a b count
zee wye 455
pan eks 429
pan pan 427
wye hat 426
hat wye 423
pan hat 417
eks hat 417
eks eks 413
pan zee 413
zee hat 409
eks wye 407
zee zee 403
pan wye 395
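wye pan 392
zee eks 391
zee pan 389
hat eks 389
wye eks 386
hat zee 385
wye zee 385
hat hat 381
wye wye 377
eks pan 371
hat pan 363
eks zee 357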
Pipe SQL output to aggregate counts within Miller::
$ mysql -D miller -B -e 'select * from abixy' | mlr --itsv --opprint uniq -c -g a,b then sort -nr count
a b count
zee wye 455
pan eks 429
pan pan 427
wye hat 426
hat wye 423
pan hat 417
eks hat 417
eks eks 413
pan zee 413
zee hat 409
eks wye 407
zee zee 403
pan wye 395
wye pan 392
zee eks 391
zee pan 389
hat eks 389
wye eks 386
hat zee 385
wye zee 385
hat hat 381
wye wye 377
eks pan 371
hat pan 363
eks zee 357
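The group-and-count step used in these examples (``uniq -c -g a,b then sort -nr count``, or ``GROUP BY a, b ORDER BY count DESC`` in SQL) can be sketched in plain Python. This is only an illustration under stated assumptions, not how Miller does it: ``count_by`` is a hypothetical helper, the sample records are made up, and records missing a grouping field are simply skipped.

```python
from collections import Counter


def count_by(records, group_fields):
    """Count records per distinct combination of group_fields, largest count first."""
    counts = Counter(
        tuple(rec[f] for f in group_fields)
        for rec in records
        if all(f in rec for f in group_fields)  # assumption: skip incomplete records
    )
    rows = [dict(zip(group_fields, key), count=n) for key, n in counts.items()]
    # Stable sort: groups with equal counts keep first-seen order.
    rows.sort(key=lambda row: row["count"], reverse=True)
    return rows


# Hypothetical sample data with the same a/b fields as the examples.
records = [
    {"a": "pan", "b": "pan"},
    {"a": "eks", "b": "pan"},
    {"a": "pan", "b": "pan"},
    {"a": "wye", "b": "wye"},
]

for row in count_by(records, ["a", "b"]):
    print(row["a"], row["b"], row["count"])
```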


@ -1,56 +0,0 @@
..
PLEASE DO NOT EDIT DIRECTLY. EDIT THE .rst.in FILE PLEASE.
Why?
================================================================
Someone asked me the other day about design, tradeoffs, thought process, why I felt it necessary to build Miller, etc. Here are some answers.
Who is Miller for?
----------------------------------------------------------------
For background, I'm a software engineer, with a heavy devops bent and a non-trivial amount of data-engineering in my career. **Initially I wrote Miller mainly for myself:** I'm coder-friendly (being a coder); I'm Github-friendly; most of my data are well-structured or easily structurable (TSV-formatted SQL-query output, CSV files, log files, JSON data structures); I care about interoperability between all the various formats Miller supports (I've encountered them all); I do all my work on Linux or OSX.
But now there's this neat little tool **which seems to be useful for people in various disciplines**. I don't even know entirely *who*. I can click through Github starrers and read a bit about what they seem to do, but not everyone that uses Miller is even *on* Github (or stars things). I've gotten a lot of feature requests through Github -- but only from people who are Github users. Not everyone's a coder (it seems like a lot of Miller's Github starrers are devops folks like myself, or data-science-ish people, or biology/genomics folks.) A lot of people care 100% about CSV. And so on.
So I wonder (please drop a note at https://github.com/johnkerl/miller/issues): does Miller do what you need? Do you use it for all sorts of things, or just one or two nice things? Are there things you wish it did but it doesn't? Is it almost there, or just nowhere near what you want? Are there not enough features or way too many? Are the docs too complicated; do you have a hard time finding out how to do what you want? Should I think differently about what this tool even *is* in the first place? Should I think differently about who it's for?
What was Miller created to do?
----------------------------------------------------------------
First: there are tools like ``xsv``, which handles CSV marvelously, and ``jq``, which handles JSON marvelously, and so on -- but over the years of my career in the software industry I've found myself, and others, doing a lot of ad-hoc things which really were fundamentally the same *except* for format. So the number one thing about Miller is doing common things while supporting **multiple formats**: (a) ingest a list of records where a record is a list of key-value pairs (however represented in the input files); (b) transform that stream of records; (c) emit the transformed stream -- either in the same format as input, or in a different format.
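To make the ingest/transform/emit idea concrete, here is a small Python sketch. It is an illustration only, not Miller's code: the reader, transform, and writer functions are hypothetical names, and the ``key=value`` reader is a deliberately simplified take on that style of input (no quoting or escaping). The point is that the same transform runs unchanged once every input format has been turned into key-value records.

```python
import csv
import io
import json


def read_csv(text):
    """Ingest: yield each CSV row as a key-value record."""
    yield from csv.DictReader(io.StringIO(text))


def read_key_value_lines(text):
    """Ingest: yield records from lines like 'color=yellow,shape=circle'."""
    for line in text.splitlines():
        if line:
            yield dict(pair.split("=", 1) for pair in line.split(","))


def uppercase_shape(records):
    """Transform: the same logic regardless of the input format."""
    for rec in records:
        rec = dict(rec)
        rec["shape"] = rec["shape"].upper()
        yield rec


def write_json_lines(records):
    """Emit: one JSON object per record."""
    return "".join(json.dumps(rec) + "\n" for rec in records)


csv_text = "color,shape\nyellow,circle\npurple,square\n"
kv_text = "color=yellow,shape=circle\ncolor=purple,shape=square\n"

from_csv = write_json_lines(uppercase_shape(read_csv(csv_text)))
from_kv = write_json_lines(uppercase_shape(read_key_value_lines(kv_text)))
print(from_csv, end="")
```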
Second thing, a lot like the first: just as I didn't want to build something only for a single file format, I didn't want to build something only for one problem domain. In my work doing software engineering, devops, data engineering, etc. I saw a lot of commonalities and I wanted to **solve as many problems simultaneously as possible**.
Third: it had to be **streaming**. As time goes by and we (some of us, sometimes) have machines with tens or hundreds of GB of RAM, it's maybe less important, but I'm unhappy with tools which ingest all data, then do stuff, then emit all data. One reason is to be able to handle files bigger than available RAM. Another reason is to be able to handle input which trickles in, e.g. you have some process emitting data now and then and you can pipe it to Miller and it will emit transformed records one at a time.
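The streaming property can be illustrated with Python generators. This is a sketch under assumptions rather than Miller's implementation: ``parse_lines``, ``add_ratio``, and ``slow_source`` are hypothetical names, and the input is a simplified ``key=value`` line format. Each stage pulls one record at a time, so the first transformed record is available after only one input line has been read.

```python
def parse_lines(lines):
    """Turn each 'k=v,k=v' line into a record as soon as it arrives."""
    for line in lines:
        yield dict(pair.split("=", 1) for pair in line.rstrip("\n").split(","))


def add_ratio(records):
    """Transform one record at a time; nothing is buffered."""
    for rec in records:
        rec["ratio"] = float(rec["quantity"]) / float(rec["rate"])
        yield rec


consumed = []  # tracks how much input has actually been read


def slow_source():
    """Stand-in for input that trickles in, such as a pipe from another process."""
    for line in ["quantity=10,rate=4\n", "quantity=9,rate=3\n", "quantity=8,rate=2\n"]:
        consumed.append(line)
        yield line


stream = add_ratio(parse_lines(slow_source()))
first = next(stream)  # only one input line has been consumed at this point
print(first, len(consumed))
```

In real use the source would be something like ``sys.stdin``, which is also iterated line by line, so memory use stays independent of file size for transforms that don't need to see all records at once.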
Fourth: it had to be **fast**. This precludes all sorts of very nice things written in Ruby, for example. I love Ruby as a very expressive language, and I have several very useful little utility scripts written in Ruby. But a few years ago I ported some of my old tried-and-true C programs over to Ruby and the lines-of-code count was a *lot* lower -- it was great! Until I ran them on multi-GB files and realized they took 60x as long to complete. So I couldn't write Miller in Ruby, or in languages like it. I was going to have to do something in a low-level language in order to make it performant. I did simple experiments in several languages, and nothing was as fast as C, so I used C.
Fifth thing: I wanted Miller to be **pipe-friendly and interoperate with other command-line tools**. Since the basic paradigm is ingest records, transform records, emit records -- where the input and output formats can be the same or different, and the transform can be complex, or just pass-through -- this means you can use it to transform data, or re-format it, or both. So if you just want to do data-cleaning/prep/formatting and do all the "real" work in R, you can. If you just want a little glue script between other tools you can get that. And if you want to do non-trivial data-reduction in Miller you can.
Sixth thing: Must have **comprehensive documentation and unit tests**. Since Miller handles a lot of formats and solves a lot of problems, there's a lot to test and a lot to keep working correctly as I add features or optimize. And I wanted it to be able to explain itself -- not only through web docs like the one you're reading but also through ``man mlr`` and ``mlr --help``, ``mlr sort --help``, etc.
Seventh thing: **Must have a domain-specific language** (DSL) **but also must let you do common things without it**. All those little verbs Miller has to help you *avoid* having to write for-loops are great. I use them for keystroke-saving: ``mlr stats1 -a mean,stddev,min,max -f quantity``, for example, does its job without your having to write for-loops or define accumulator variables. But you also have to be able to break out of that and write arbitrary code when you want to: ``mlr put '$distance = $rate * $time'`` or anything else you can think up. In Perl/AWK/etc. it's all DSL. In xsv et al. it's all verbs. In Miller I like having the combination.
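The contrast between a canned verb and free-form DSL code can be sketched in Python. This is a loose analogy, not Miller's internals: ``stats1`` and ``put`` here are hypothetical Python functions named after the Miller verbs, only a few accumulators are modeled, and the sample records are made up. The verb-like function hides the loop and the accumulators; the DSL-like function accepts whatever per-record assignment you hand it.

```python
from statistics import mean


def stats1(records, field, accumulators):
    """Verb-style: named accumulators over one field, no user-written loop."""
    values = [float(rec[field]) for rec in records if field in rec]
    funcs = {"mean": mean, "min": min, "max": max, "count": len}
    return {f"{field}_{name}": funcs[name](values) for name in accumulators}


def put(records, assign):
    """DSL-style: apply an arbitrary per-record assignment."""
    out = []
    for rec in records:
        rec = dict(rec)  # copy so the input records are left untouched
        assign(rec)
        out.append(rec)
    return out


# Hypothetical sample records.
records = [{"rate": 2.0, "time": 3.0}, {"rate": 4.0, "time": 0.5}]

summary = stats1(records, "rate", ["mean", "min", "max"])
with_distance = put(records, lambda r: r.update(distance=r["rate"] * r["time"]))
print(summary)
print(with_distance)
```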
Eighth thing: It's an **awful lot of fun to write**. In my experience I didn't find any tools which do multi-format, streaming, efficient, multi-purpose, with DSL and non-DSL, so I wrote one. But I don't guarantee it's unique in the world. It fills a niche in the world (people use it) but it also fills a niche in my life.
Tradeoffs
----------------------------------------------------------------
Miller is command-line-only by design. People who want a graphical user interface won't find it here. This is in part (a) accommodating my personal preferences, and in part (b) guided by my experience/belief that the command line is very expressive. Steep learning curve, yes. I consider that price worth paying.
Another tradeoff: supporting lists of records -- each with only one depth -- keeps me supporting only what can be expressed in *all* of those formats. E.g. in JSON you can have lists of lists of lists which Miller just doesn't handle. So Miller can't (and won't) handle arbitrary JSON because it only handles tabular data which can be expressed in a variety of formats.
A third tradeoff is doing build-from-scratch in a low-level language. It'd be quicker to write (but slower to run) if written in a high-level language. If Miller were written in Python, it would be implemented in significantly fewer lines of code than its current C implementation. The DSL would just be an ``eval`` of Python code. And it would run slower, but maybe not enough slower to be a problem for most folks. Later I found out about the `rows <https://github.com/turicas/rows>`_ tool -- if you find Miller useful, you should check out ``rows`` as well.
A fourth tradeoff is in the DSL (more visibly so in 5.0.0 but already in pre-5.0.0): how much to make it dynamically typed -- so you can just say ``y=x+1`` with a minimum number of keystrokes -- vs. having it do a good job of telling you when you've made a typo. This is a common tradeoff across *all* languages. In some, like Ruby, you don't declare anything and they're quick to code little stuff in, but programs of even a few thousand lines (which isn't large in the software world) become insanely unmanageable. Then there's Java at the other extreme, which is very typesafe but makes you type in a lot of punctuation, angle brackets, datatypes, repetition, etc. just to be able to get anything done. And there are some in the middle, like Go, which are typesafe but with type inference, and which aim to do the best of both. In the Miller (5.0.0) DSL you get ``y=x+1`` by default but you can have things like ``int y = x+1`` etc. so the typesafety is opt-in. See also :ref:`reference-dsl-type-checking` for more information on type-checking.
Related tools
----------------------------------------------------------------
Here's a comprehensive list: https://github.com/dbohdan/structured-text-tools. It doesn't mention `rows <https://github.com/turicas/rows>`_ so here's a plug for that as well.
Moving forward
----------------------------------------------------------------
I originally aimed Miller at people who already know what ``sed``/``awk``/``cut``/``sort``/``join`` are and wanted some options. But as time goes by I realize that tools like this can be useful to folks who *don't* know what those things are; people who aren't primarily coders; people who are scientists, or data scientists. These days some journalists do data analysis. So moving forward in terms of docs, I am working on having more cookbook, follow-by-example stuff in addition to the existing language-reference kinds of stuff. And continuing to seek out input from people who use Miller on where to go next.


@ -1,860 +0,0 @@
/*
* basic.css
* ~~~~~~~~~
*
* Sphinx stylesheet -- basic theme.
*
* :copyright: Copyright 2007-2020 by the Sphinx team, see AUTHORS.
* :license: BSD, see LICENSE for details.
*
*/
/* -- main layout ----------------------------------------------------------- */
div.clearer {
clear: both;
}
div.section::after {
display: block;
content: '';
clear: left;
}
/* -- relbar ---------------------------------------------------------------- */
div.related {
width: 100%;
font-size: 90%;
}
div.related h3 {
display: none;
}
div.related ul {
margin: 0;
padding: 0 0 0 10px;
list-style: none;
}
div.related li {
display: inline;
}
/* CHANGE ME */
div.body li {
margin:10px 0 0 0;
}
div.related li.right {
float: right;
margin-right: 5px;
}
/* -- sidebar --------------------------------------------------------------- */
div.sphinxsidebarwrapper {
padding: 10px 5px 0 10px;
}
div.sphinxsidebar {
float: left;
width: 230px;
margin-left: -100%;
font-size: 90%;
word-wrap: break-word;
overflow-wrap : break-word;
}
div.sphinxsidebar ul {
list-style: none;
}
div.sphinxsidebar ul ul,
div.sphinxsidebar ul.want-points {
margin-left: 20px;
list-style: square;
}
div.sphinxsidebar ul ul {
margin-top: 0;
margin-bottom: 0;
}
div.sphinxsidebar form {
margin-top: 10px;
}
div.sphinxsidebar input {
border: 1px solid #98dbcc;
font-family: sans-serif;
font-size: 1em;
}
div.sphinxsidebar #searchbox form.search {
overflow: hidden;
}
div.sphinxsidebar #searchbox input[type="text"] {
float: left;
width: 80%;
padding: 0.25em;
box-sizing: border-box;
}
div.sphinxsidebar #searchbox input[type="submit"] {
float: left;
width: 20%;
border-left: none;
padding: 0.25em;
box-sizing: border-box;
}
img {
border: 0;
max-width: 100%;
}
/* -- search page ----------------------------------------------------------- */
ul.search {
margin: 10px 0 0 20px;
padding: 0;
}
ul.search li {
padding: 5px 0 5px 20px;
background-image: url(file.png);
background-repeat: no-repeat;
background-position: 0 7px;
}
ul.search li a {
font-weight: bold;
}
ul.search li div.context {
color: #888;
margin: 2px 0 0 30px;
text-align: left;
}
ul.keywordmatches li.goodmatch a {
font-weight: bold;
}
/* -- index page ------------------------------------------------------------ */
table.contentstable {
width: 90%;
margin-left: auto;
margin-right: auto;
}
table.contentstable p.biglink {
line-height: 150%;
}
a.biglink {
font-size: 1.3em;
}
span.linkdescr {
font-style: italic;
padding-top: 5px;
font-size: 90%;
}
/* -- general index --------------------------------------------------------- */
table.indextable {
width: 100%;
}
table.indextable td {
text-align: left;
vertical-align: top;
}
table.indextable ul {
margin-top: 0;
margin-bottom: 0;
list-style-type: none;
}
table.indextable > tbody > tr > td > ul {
padding-left: 0em;
}
table.indextable tr.pcap {
height: 10px;
}
table.indextable tr.cap {
margin-top: 10px;
background-color: #f2f2f2;
}
img.toggler {
margin-right: 3px;
margin-top: 3px;
cursor: pointer;
}
div.modindex-jumpbox {
border-top: 1px solid #ddd;
border-bottom: 1px solid #ddd;
margin: 1em 0 1em 0;
padding: 0.4em;
}
div.genindex-jumpbox {
border-top: 1px solid #ddd;
border-bottom: 1px solid #ddd;
margin: 1em 0 1em 0;
padding: 0.4em;
}
/* -- domain module index --------------------------------------------------- */
table.modindextable td {
padding: 2px;
border-collapse: collapse;
}
/* -- general body styles --------------------------------------------------- */
div.body {
min-width: 450px;
max-width: 800px;
}
div.body p, div.body dd, div.body li, div.body blockquote {
-moz-hyphens: auto;
-ms-hyphens: auto;
-webkit-hyphens: auto;
hyphens: auto;
}
a.headerlink {
visibility: hidden;
}
a.brackets:before,
span.brackets > a:before{
content: "[";
}
a.brackets:after,
span.brackets > a:after {
content: "]";
}
h1:hover > a.headerlink,
h2:hover > a.headerlink,
h3:hover > a.headerlink,
h4:hover > a.headerlink,
h5:hover > a.headerlink,
h6:hover > a.headerlink,
dt:hover > a.headerlink,
caption:hover > a.headerlink,
p.caption:hover > a.headerlink,
div.code-block-caption:hover > a.headerlink {
visibility: visible;
}
div.body p.caption {
text-align: inherit;
}
div.body td {
text-align: left;
}
.first {
margin-top: 0 !important;
}
p.rubric {
margin-top: 30px;
font-weight: bold;
}
img.align-left, .figure.align-left, object.align-left {
clear: left;
float: left;
margin-right: 1em;
}
img.align-right, .figure.align-right, object.align-right {
clear: right;
float: right;
margin-left: 1em;
}
img.align-center, .figure.align-center, object.align-center {
display: block;
margin-left: auto;
margin-right: auto;
}
img.align-default, .figure.align-default {
display: block;
margin-left: auto;
margin-right: auto;
}
.align-left {
text-align: left;
}
.align-center {
text-align: center;
}
.align-default {
text-align: center;
}
.align-right {
text-align: right;
}
/* -- sidebars -------------------------------------------------------------- */
div.sidebar {
margin: 0 0 0.5em 1em;
border: 1px solid #ddb;
padding: 7px;
background-color: #ffe;
width: 40%;
float: right;
clear: right;
overflow-x: auto;
}
p.sidebar-title {
font-weight: bold;
}
div.admonition, div.topic, blockquote {
clear: left;
}
/* -- topics ---------------------------------------------------------------- */
div.topic {
border: 1px solid #ccc;
padding: 7px;
margin: 10px 0 10px 0;
}
p.topic-title {
font-size: 1.1em;
font-weight: bold;
margin-top: 10px;
}
/* -- admonitions ----------------------------------------------------------- */
div.admonition {
margin-top: 10px;
margin-bottom: 10px;
padding: 7px;
}
div.admonition dt {
font-weight: bold;
}
p.admonition-title {
margin: 0px 10px 5px 0px;
font-weight: bold;
}
div.body p.centered {
text-align: center;
margin-top: 25px;
}
/* -- content of sidebars/topics/admonitions -------------------------------- */
div.sidebar > :last-child,
div.topic > :last-child,
div.admonition > :last-child {
margin-bottom: 0;
}
div.sidebar::after,
div.topic::after,
div.admonition::after,
blockquote::after {
display: block;
content: '';
clear: both;
}
/* -- tables ---------------------------------------------------------------- */
table.docutils {
margin-top: 10px;
margin-bottom: 10px;
border: 0;
border-collapse: collapse;
}
table.align-center {
margin-left: auto;
margin-right: auto;
}
table.align-default {
margin-left: auto;
margin-right: auto;
}
table caption span.caption-number {
font-style: italic;
}
table caption span.caption-text {
}
table.docutils td, table.docutils th {
padding: 1px 8px 1px 5px;
border-top: 0;
border-left: 0;
border-right: 0;
border-bottom: 1px solid #aaa;
}
table.footnote td, table.footnote th {
border: 0 !important;
}
th {
text-align: left;
padding-right: 5px;
}
table.citation {
border-left: solid 1px gray;
margin-left: 1px;
}
table.citation td {
border-bottom: none;
}
th > :first-child,
td > :first-child {
margin-top: 0px;
}
th > :last-child,
td > :last-child {
margin-bottom: 0px;
}
/* -- figures --------------------------------------------------------------- */
div.figure {
margin: 0.5em;
padding: 0.5em;
}
div.figure p.caption {
padding: 0.3em;
}
div.figure p.caption span.caption-number {
font-style: italic;
}
div.figure p.caption span.caption-text {
}
/* -- field list styles ----------------------------------------------------- */
table.field-list td, table.field-list th {
border: 0 !important;
}
.field-list ul {
margin: 0;
padding-left: 1em;
}
.field-list p {
margin: 0;
}
.field-name {
-moz-hyphens: manual;
-ms-hyphens: manual;
-webkit-hyphens: manual;
hyphens: manual;
}
/* -- hlist styles ---------------------------------------------------------- */
table.hlist {
margin: 1em 0;
}
table.hlist td {
vertical-align: top;
}
/* -- other body styles ----------------------------------------------------- */
ol.arabic {
list-style: decimal;
}
ol.loweralpha {
list-style: lower-alpha;
}
ol.upperalpha {
list-style: upper-alpha;
}
ol.lowerroman {
list-style: lower-roman;
}
ol.upperroman {
list-style: upper-roman;
}
:not(li) > ol > li:first-child > :first-child,
:not(li) > ul > li:first-child > :first-child {
margin-top: 0px;
}
:not(li) > ol > li:last-child > :last-child,
:not(li) > ul > li:last-child > :last-child {
margin-bottom: 0px;
}
ol.simple ol p,
ol.simple ul p,
ul.simple ol p,
ul.simple ul p {
margin-top: 0;
}
ol.simple > li:not(:first-child) > p,
ul.simple > li:not(:first-child) > p {
margin-top: 0;
}
ol.simple p,
ul.simple p {
margin-bottom: 0;
}
dl.footnote > dt,
dl.citation > dt {
float: left;
margin-right: 0.5em;
}
dl.footnote > dd,
dl.citation > dd {
margin-bottom: 0em;
}
dl.footnote > dd:after,
dl.citation > dd:after {
content: "";
clear: both;
}
dl.field-list {
display: grid;
grid-template-columns: fit-content(30%) auto;
}
dl.field-list > dt {
font-weight: bold;
word-break: break-word;
padding-left: 0.5em;
padding-right: 5px;
}
dl.field-list > dt:after {
content: ":";
}
dl.field-list > dd {
padding-left: 0.5em;
margin-top: 0em;
margin-left: 0em;
margin-bottom: 0em;
}
dl {
margin-bottom: 15px;
}
dd > :first-child {
margin-top: 0px;
}
dd ul, dd table {
margin-bottom: 10px;
}
dd {
margin-top: 3px;
margin-bottom: 10px;
margin-left: 30px;
}
dl > dd:last-child,
dl > dd:last-child > :last-child {
margin-bottom: 0;
}
dt:target, span.highlighted {
background-color: #fbe54e;
}
rect.highlighted {
fill: #fbe54e;
}
dl.glossary dt {
font-weight: bold;
font-size: 1.1em;
}
.optional {
font-size: 1.3em;
}
.sig-paren {
font-size: larger;
}
.versionmodified {
font-style: italic;
}
.system-message {
background-color: #fda;
padding: 5px;
border: 3px solid red;
}
.footnote:target {
background-color: #ffa;
}
.line-block {
display: block;
margin-top: 1em;
margin-bottom: 1em;
}
.line-block .line-block {
margin-top: 0;
margin-bottom: 0;
margin-left: 1.5em;
}
.guilabel, .menuselection {
font-family: sans-serif;
}
.accelerator {
text-decoration: underline;
}
.classifier {
font-style: oblique;
}
.classifier:before {
font-style: normal;
margin: 0.5em;
content: ":";
}
abbr, acronym {
border-bottom: dotted 1px;
cursor: help;
}
/* -- code displays --------------------------------------------------------- */
pre {
overflow: auto;
overflow-y: hidden; /* fixes display issues on Chrome browsers */
}
pre, div[class*="highlight-"] {
clear: both;
}
span.pre {
-moz-hyphens: none;
-ms-hyphens: none;
-webkit-hyphens: none;
hyphens: none;
}
div[class*="highlight-"] {
margin: 1em 0;
}
td.linenos pre {
border: 0;
background-color: transparent;
color: #aaa;
}
table.highlighttable {
display: block;
}
table.highlighttable tbody {
display: block;
}
table.highlighttable tr {
display: flex;
}
table.highlighttable td {
margin: 0;
padding: 0;
}
table.highlighttable td.linenos {
padding-right: 0.5em;
}
table.highlighttable td.code {
flex: 1;
overflow: hidden;
}
.highlight .hll {
display: block;
}
div.highlight pre,
table.highlighttable pre {
margin: 0;
}
div.code-block-caption + div {
margin-top: 0;
}
div.code-block-caption {
margin-top: 1em;
padding: 2px 5px;
font-size: small;
}
div.code-block-caption code {
background-color: transparent;
}
table.highlighttable td.linenos,
div.doctest > div.highlight span.gp { /* gp: Generic.Prompt */
user-select: none;
}
div.code-block-caption span.caption-number {
padding: 0.1em 0.3em;
font-style: italic;
}
div.code-block-caption span.caption-text {
}
div.literal-block-wrapper {
margin: 1em 0;
}
code.descname {
background-color: transparent;
font-weight: bold;
font-size: 1.2em;
}
code.descclassname {
background-color: transparent;
}
code.xref, a code {
background-color: transparent;
font-weight: bold;
}
h1 code, h2 code, h3 code, h4 code, h5 code, h6 code {
background-color: transparent;
}
.viewcode-link {
float: right;
}
.viewcode-back {
float: right;
font-family: sans-serif;
}
div.viewcode-block:target {
margin: -1px -10px;
padding: 0 10px;
}
/* -- math display ---------------------------------------------------------- */
img.math {
vertical-align: middle;
}
div.body div.math p {
text-align: center;
}
span.eqno {
float: right;
}
span.eqno a.headerlink {
position: absolute;
z-index: 1;
}
div.math:hover a.headerlink {
visibility: visible;
}
/* -- printout stylesheet --------------------------------------------------- */
@media print {
div.document,
div.documentwrapper,
div.bodywrapper {
margin: 0 !important;
width: 100%;
}
div.sphinxsidebar,
div.related,
div.footer,
#top-link {
display: none;
}
}


@ -1,271 +0,0 @@
/*
* classic.css_t
* ~~~~~~~~~~~~~
*
* Sphinx stylesheet -- classic theme.
*
* :copyright: Copyright 2007-2020 by the Sphinx team, see AUTHORS.
* :license: BSD, see LICENSE for details.
*
*/
@import url("basic.css");
/* -- page layout ----------------------------------------------------------- */
html {
/* CSS hack for macOS's scrollbar (see #1125) */
background-color: #FFFFFF;
}
body {
font-family: sans-serif;
font-size: 100%;
background-color: #808080;
color: #000;
margin: 0;
padding: 0;
}
div.document {
/* CHANGE ME */
background-color: #c0c0c0;
}
div.documentwrapper {
float: left;
width: 100%;
}
div.bodywrapper {
margin: 0 0 0 230px;
}
div.body {
background-color: #ffffff;
color: #000000;
padding: 0 20px 30px 20px;
}
div.footer {
color: #ffffff;
width: 100%;
padding: 9px 0 9px 0;
text-align: center;
font-size: 75%;
}
div.footer a {
color: #ffffff;
text-decoration: underline;
}
div.related {
/* CHANGE ME */
background-color: #808080;
line-height: 30px;
color: #000000;
}
div.related a {
/* CHANGE ME */
color: #000000;
}
div.sphinxsidebar {
}
div.sphinxsidebar h3 {
font-family: 'Trebuchet MS', sans-serif;
color: #000000;
font-size: 1.4em;
font-weight: normal;
margin: 0;
padding: 0;
}
div.sphinxsidebar h3 a {
color: #000000;
}
div.sphinxsidebar h4 {
font-family: 'Trebuchet MS', sans-serif;
color: #000000;
font-size: 1.3em;
font-weight: normal;
margin: 5px 0 0 0;
padding: 0;
}
div.sphinxsidebar p {
color: #000000;
}
div.sphinxsidebar p.topless {
margin: 5px 10px 10px 10px;
}
div.sphinxsidebar ul {
margin: 10px;
padding: 0;
color: #ffffff;
}
div.sphinxsidebar a {
/* CHANGE ME */
color: #404040;
}
div.sphinxsidebar input {
border: 1px solid #98dbcc;
font-family: sans-serif;
font-size: 1em;
}
/* -- hyperlink styles ------------------------------------------------------ */
a {
color: #800000;
text-decoration: none;
}
a:visited {
color: #800000;
text-decoration: none;
}
a:hover {
text-decoration: underline;
}
/* -- body styles ----------------------------------------------------------- */
div.body h1,
div.body h2,
div.body h3,
div.body h4,
div.body h5,
div.body h6 {
font-family: 'Trebuchet MS', sans-serif;
background-color: #f2f2f2;
font-weight: normal;
/* CHANGE ME */
color: #800000;
border-bottom: 1px solid #ccc;
margin: 20px -20px 10px -20px;
padding: 3px 0 3px 10px;
}
div.body h1 { margin-top: 0; font-size: 200%; }
div.body h2 { font-size: 160%; }
div.body h3 { font-size: 140%; }
div.body h4 { font-size: 120%; }
div.body h5 { font-size: 110%; }
div.body h6 { font-size: 100%; }
a.headerlink {
color: #c60f0f;
font-size: 0.8em;
padding: 0 4px 0 4px;
text-decoration: none;
}
a.headerlink:hover {
background-color: #c60f0f;
color: white;
}
div.body p, div.body dd, div.body li, div.body blockquote {
text-align: justify;
line-height: 130%;
}
div.admonition p.admonition-title + p {
display: inline;
}
div.admonition p {
margin-bottom: 5px;
}
div.admonition pre {
margin-bottom: 5px;
}
div.admonition ul, div.admonition ol {
margin-bottom: 5px;
}
div.note {
background-color: #eee;
border: 1px solid #ccc;
}
div.seealso {
background-color: #ffc;
border: 1px solid #ff6;
}
div.topic {
background-color: #eee;
}
div.warning {
background-color: #ffe4e4;
border: 1px solid #f66;
}
p.admonition-title {
display: inline;
}
p.admonition-title:after {
content: ":";
}
pre {
padding: 5px;
background-color: unset;
color: unset;
line-height: 120%;
border: 1px solid #ac9;
border-left: none;
border-right: none;
}
code {
background-color: #ecf0f3;
padding: 0 1px 0 1px;
font-size: 0.95em;
}
th, dl.field-list > dt {
background-color: #ede;
}
.warning code {
background: #efc2c2;
}
.note code {
background: #d6d6d6;
}
.viewcode-back {
font-family: sans-serif;
}
div.viewcode-block:target {
background-color: #f4debf;
border-top: 1px solid #ac9;
border-bottom: 1px solid #ac9;
}
div.code-block-caption {
color: #efefef;
background-color: #1c4e63;
}


@ -1,315 +0,0 @@
/*
* doctools.js
* ~~~~~~~~~~~
*
* Sphinx JavaScript utilities for all documentation.
*
* :copyright: Copyright 2007-2020 by the Sphinx team, see AUTHORS.
* :license: BSD, see LICENSE for details.
*
*/
/**
* select a different prefix for underscore
*/
$u = _.noConflict();
/**
* make the code below compatible with browsers without
* an installed firebug like debugger
if (!window.console || !console.firebug) {
var names = ["log", "debug", "info", "warn", "error", "assert", "dir",
"dirxml", "group", "groupEnd", "time", "timeEnd", "count", "trace",
"profile", "profileEnd"];
window.console = {};
for (var i = 0; i < names.length; ++i)
window.console[names[i]] = function() {};
}
*/
/**
* small helper function to urldecode strings
*/
jQuery.urldecode = function(x) {
return decodeURIComponent(x).replace(/\+/g, ' ');
};
/**
* small helper function to urlencode strings
*/
jQuery.urlencode = encodeURIComponent;
/**
* This function returns the parsed url parameters of the
* current request. Multiple values per key are supported,
* it will always return arrays of strings for the value parts.
*/
jQuery.getQueryParameters = function(s) {
if (typeof s === 'undefined')
s = document.location.search;
var parts = s.substr(s.indexOf('?') + 1).split('&');
var result = {};
for (var i = 0; i < parts.length; i++) {
var tmp = parts[i].split('=', 2);
var key = jQuery.urldecode(tmp[0]);
var value = jQuery.urldecode(tmp[1]);
if (key in result)
result[key].push(value);
else
result[key] = [value];
}
return result;
};
/**
* highlight a given string on a jquery object by wrapping it in
* span elements with the given class name.
*/
jQuery.fn.highlightText = function(text, className) {
function highlight(node, addItems) {
if (node.nodeType === 3) {
var val = node.nodeValue;
var pos = val.toLowerCase().indexOf(text);
if (pos >= 0 &&
!jQuery(node.parentNode).hasClass(className) &&
!jQuery(node.parentNode).hasClass("nohighlight")) {
var span;
var isInSVG = jQuery(node).closest("body, svg, foreignObject").is("svg");
if (isInSVG) {
span = document.createElementNS("http://www.w3.org/2000/svg", "tspan");
} else {
span = document.createElement("span");
span.className = className;
}
span.appendChild(document.createTextNode(val.substr(pos, text.length)));
node.parentNode.insertBefore(span, node.parentNode.insertBefore(
document.createTextNode(val.substr(pos + text.length)),
node.nextSibling));
node.nodeValue = val.substr(0, pos);
if (isInSVG) {
var rect = document.createElementNS("http://www.w3.org/2000/svg", "rect");
var bbox = node.parentElement.getBBox();
rect.x.baseVal.value = bbox.x;
rect.y.baseVal.value = bbox.y;
rect.width.baseVal.value = bbox.width;
rect.height.baseVal.value = bbox.height;
rect.setAttribute('class', className);
addItems.push({
"parent": node.parentNode,
"target": rect});
}
}
}
else if (!jQuery(node).is("button, select, textarea")) {
jQuery.each(node.childNodes, function() {
highlight(this, addItems);
});
}
}
var addItems = [];
var result = this.each(function() {
highlight(this, addItems);
});
for (var i = 0; i < addItems.length; ++i) {
jQuery(addItems[i].parent).before(addItems[i].target);
}
return result;
};
/*
* backward compatibility for jQuery.browser
* This will be supported until firefox bug is fixed.
*/
if (!jQuery.browser) {
jQuery.uaMatch = function(ua) {
ua = ua.toLowerCase();
var match = /(chrome)[ \/]([\w.]+)/.exec(ua) ||
/(webkit)[ \/]([\w.]+)/.exec(ua) ||
/(opera)(?:.*version|)[ \/]([\w.]+)/.exec(ua) ||
/(msie) ([\w.]+)/.exec(ua) ||
ua.indexOf("compatible") < 0 && /(mozilla)(?:.*? rv:([\w.]+)|)/.exec(ua) ||
[];
return {
browser: match[ 1 ] || "",
version: match[ 2 ] || "0"
};
};
jQuery.browser = {};
jQuery.browser[jQuery.uaMatch(navigator.userAgent).browser] = true;
}
/**
* Small JavaScript module for the documentation.
*/
var Documentation = {
init : function() {
this.fixFirefoxAnchorBug();
this.highlightSearchWords();
this.initIndexTable();
if (DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) {
this.initOnKeyListeners();
}
},
/**
* i18n support
*/
TRANSLATIONS : {},
PLURAL_EXPR : function(n) { return n === 1 ? 0 : 1; },
LOCALE : 'unknown',
// gettext and ngettext don't access this so that the functions
// can safely bound to a different name (_ = Documentation.gettext)
gettext : function(string) {
var translated = Documentation.TRANSLATIONS[string];
if (typeof translated === 'undefined')
return string;
return (typeof translated === 'string') ? translated : translated[0];
},
ngettext : function(singular, plural, n) {
var translated = Documentation.TRANSLATIONS[singular];
if (typeof translated === 'undefined')
return (n == 1) ? singular : plural;
return translated[Documentation.PLURAL_EXPR(n)];
},
addTranslations : function(catalog) {
for (var key in catalog.messages)
this.TRANSLATIONS[key] = catalog.messages[key];
this.PLURAL_EXPR = new Function('n', 'return +(' + catalog.plural_expr + ')');
this.LOCALE = catalog.locale;
},
/**
* add context elements like header anchor links
*/
addContextElements : function() {
$('div[id] > :header:first').each(function() {
$('<a class="headerlink">\u00B6</a>').
attr('href', '#' + this.id).
attr('title', _('Permalink to this headline')).
appendTo(this);
});
$('dt[id]').each(function() {
$('<a class="headerlink">\u00B6</a>').
attr('href', '#' + this.id).
attr('title', _('Permalink to this definition')).
appendTo(this);
});
},
/**
* workaround a firefox stupidity
* see: https://bugzilla.mozilla.org/show_bug.cgi?id=645075
*/
fixFirefoxAnchorBug : function() {
if (document.location.hash && $.browser.mozilla)
window.setTimeout(function() {
document.location.href += '';
}, 10);
},
/**
* highlight the search words provided in the url in the text
*/
highlightSearchWords : function() {
var params = $.getQueryParameters();
var terms = (params.highlight) ? params.highlight[0].split(/\s+/) : [];
if (terms.length) {
var body = $('div.body');
if (!body.length) {
body = $('body');
}
window.setTimeout(function() {
$.each(terms, function() {
body.highlightText(this.toLowerCase(), 'highlighted');
});
}, 10);
$('<p class="highlight-link"><a href="javascript:Documentation.' +
'hideSearchWords()">' + _('Hide Search Matches') + '</a></p>')
.appendTo($('#searchbox'));
}
},
/**
* init the domain index toggle buttons
*/
initIndexTable : function() {
var togglers = $('img.toggler').click(function() {
var src = $(this).attr('src');
var idnum = $(this).attr('id').substr(7);
$('tr.cg-' + idnum).toggle();
if (src.substr(-9) === 'minus.png')
$(this).attr('src', src.substr(0, src.length-9) + 'plus.png');
else
$(this).attr('src', src.substr(0, src.length-8) + 'minus.png');
}).css('display', '');
if (DOCUMENTATION_OPTIONS.COLLAPSE_INDEX) {
togglers.click();
}
},
/**
* helper function to hide the search marks again
*/
hideSearchWords : function() {
$('#searchbox .highlight-link').fadeOut(300);
$('span.highlighted').removeClass('highlighted');
},
/**
* make the url absolute
*/
makeURL : function(relativeURL) {
return DOCUMENTATION_OPTIONS.URL_ROOT + '/' + relativeURL;
},
/**
* get the current relative url
*/
getCurrentURL : function() {
var path = document.location.pathname;
var parts = path.split(/\//);
$.each(DOCUMENTATION_OPTIONS.URL_ROOT.split(/\//), function() {
if (this === '..')
parts.pop();
});
var url = parts.join('/');
return path.substring(url.lastIndexOf('/') + 1, path.length - 1);
},
initOnKeyListeners: function() {
$(document).keydown(function(event) {
var activeElementType = document.activeElement.tagName;
// don't navigate when in search box or textarea
if (activeElementType !== 'TEXTAREA' && activeElementType !== 'INPUT' && activeElementType !== 'SELECT'
&& !event.altKey && !event.ctrlKey && !event.metaKey && !event.shiftKey) {
switch (event.keyCode) {
case 37: // left
var prevHref = $('link[rel="prev"]').prop('href');
if (prevHref) {
window.location.href = prevHref;
return false;
}
break;
case 39: // right
var nextHref = $('link[rel="next"]').prop('href');
if (nextHref) {
window.location.href = nextHref;
return false;
}
}
}
});
}
};
// quick alias for translations
_ = Documentation.gettext;
$(document).ready(function() {
Documentation.init();
});
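
The translation helpers above can be sketched in isolation. This is a minimal sketch, not the Sphinx implementation itself: `DemoI18n`, `demoCatalog`, and the German strings are hypothetical names and data made up for illustration, and the catalog shape is only assumed to match what `addTranslations` receives.

```javascript
// Hypothetical catalog: a plain string for a simple message, and an
// array of plural forms for a message that varies with a count.
var demoCatalog = {
  messages: {
    "Search Results": "Suchergebnisse",
    "%s page": ["%s Seite", "%s Seiten"]
  },
  plural_expr: "(n != 1)",
  locale: "de"
};

var DemoI18n = {
  TRANSLATIONS: {},
  // Default rule: index 0 for exactly one item, index 1 otherwise.
  PLURAL_EXPR: function (n) { return n === 1 ? 0 : 1; },
  LOCALE: "unknown",

  // Refers to DemoI18n rather than `this`, so it can be aliased freely.
  gettext: function (string) {
    var translated = DemoI18n.TRANSLATIONS[string];
    if (typeof translated === "undefined") return string;
    return (typeof translated === "string") ? translated : translated[0];
  },

  ngettext: function (singular, plural, n) {
    var translated = DemoI18n.TRANSLATIONS[singular];
    if (typeof translated === "undefined") return (n === 1) ? singular : plural;
    // The plural expression yields the index into the list of forms.
    return translated[DemoI18n.PLURAL_EXPR(n)];
  },

  addTranslations: function (catalog) {
    for (var key in catalog.messages) {
      this.TRANSLATIONS[key] = catalog.messages[key];
    }
    // "+(...)" coerces a boolean plural rule to 0 or 1.
    this.PLURAL_EXPR = new Function("n", "return +(" + catalog.plural_expr + ")");
    this.LOCALE = catalog.locale;
  }
};

DemoI18n.addTranslations(demoCatalog);
var _demo = DemoI18n.gettext; // alias, called without a receiver
var demoTitle = _demo("Search Results");
var demoOne = DemoI18n.ngettext("%s page", "%s pages", 1);
var demoMany = DemoI18n.ngettext("%s page", "%s pages", 3);
var demoFallback = DemoI18n.ngettext("hit", "hits", 2);
```

Because the lookups go through the object name instead of `this`, the alias keeps working when it is called as a bare function. Untranslated messages fall back to the English singular or plural that was passed in.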

View file

@ -1,12 +0,0 @@
var DOCUMENTATION_OPTIONS = {
URL_ROOT: document.getElementById("documentation_options").getAttribute('data-url_root'),
VERSION: '5.10.2',
LANGUAGE: 'None',
COLLAPSE_INDEX: false,
BUILDER: 'html',
FILE_SUFFIX: '.html',
LINK_SUFFIX: '.html',
HAS_SOURCE: true,
SOURCELINK_SUFFIX: '.txt',
NAVIGATION_WITH_KEYS: false
};

Binary file not shown.

File diff suppressed because it is too large Load diff

File diff suppressed because one or more lines are too long

View file

@ -1,297 +0,0 @@
/*
* language_data.js
* ~~~~~~~~~~~~~~~~
*
* This script contains the language-specific data used by searchtools.js,
* namely the list of stopwords, stemmer, scorer and splitter.
*
* :copyright: Copyright 2007-2020 by the Sphinx team, see AUTHORS.
* :license: BSD, see LICENSE for details.
*
*/
var stopwords = ["a","and","are","as","at","be","but","by","for","if","in","into","is","it","near","no","not","of","on","or","such","that","the","their","then","there","these","they","this","to","was","will","with"];
/* The non-minified version of this JS is _stemmer.js, if that file is provided */
/**
* Porter Stemmer
*/
var Stemmer = function() {
var step2list = {
ational: 'ate',
tional: 'tion',
enci: 'ence',
anci: 'ance',
izer: 'ize',
bli: 'ble',
alli: 'al',
entli: 'ent',
eli: 'e',
ousli: 'ous',
ization: 'ize',
ation: 'ate',
ator: 'ate',
alism: 'al',
iveness: 'ive',
fulness: 'ful',
ousness: 'ous',
aliti: 'al',
iviti: 'ive',
biliti: 'ble',
logi: 'log'
};
var step3list = {
icate: 'ic',
ative: '',
alize: 'al',
iciti: 'ic',
ical: 'ic',
ful: '',
ness: ''
};
var c = "[^aeiou]"; // consonant
var v = "[aeiouy]"; // vowel
var C = c + "[^aeiouy]*"; // consonant sequence
var V = v + "[aeiou]*"; // vowel sequence
var mgr0 = "^(" + C + ")?" + V + C; // [C]VC... is m>0
var meq1 = "^(" + C + ")?" + V + C + "(" + V + ")?$"; // [C]VC[V] is m=1
var mgr1 = "^(" + C + ")?" + V + C + V + C; // [C]VCVC... is m>1
var s_v = "^(" + C + ")?" + v; // vowel in stem
this.stemWord = function (w) {
var stem;
var suffix;
var firstch;
var origword = w;
if (w.length < 3)
return w;
var re;
var re2;
var re3;
var re4;
firstch = w.substr(0,1);
if (firstch == "y")
w = firstch.toUpperCase() + w.substr(1);
// Step 1a
re = /^(.+?)(ss|i)es$/;
re2 = /^(.+?)([^s])s$/;
if (re.test(w))
w = w.replace(re,"$1$2");
else if (re2.test(w))
w = w.replace(re2,"$1$2");
// Step 1b
re = /^(.+?)eed$/;
re2 = /^(.+?)(ed|ing)$/;
if (re.test(w)) {
var fp = re.exec(w);
re = new RegExp(mgr0);
if (re.test(fp[1])) {
re = /.$/;
w = w.replace(re,"");
}
}
else if (re2.test(w)) {
var fp = re2.exec(w);
stem = fp[1];
re2 = new RegExp(s_v);
if (re2.test(stem)) {
w = stem;
re2 = /(at|bl|iz)$/;
re3 = new RegExp("([^aeiouylsz])\\1$");
re4 = new RegExp("^" + C + v + "[^aeiouwxy]$");
if (re2.test(w))
w = w + "e";
else if (re3.test(w)) {
re = /.$/;
w = w.replace(re,"");
}
else if (re4.test(w))
w = w + "e";
}
}
// Step 1c
re = /^(.+?)y$/;
if (re.test(w)) {
var fp = re.exec(w);
stem = fp[1];
re = new RegExp(s_v);
if (re.test(stem))
w = stem + "i";
}
// Step 2
re = /^(.+?)(ational|tional|enci|anci|izer|bli|alli|entli|eli|ousli|ization|ation|ator|alism|iveness|fulness|ousness|aliti|iviti|biliti|logi)$/;
if (re.test(w)) {
var fp = re.exec(w);
stem = fp[1];
suffix = fp[2];
re = new RegExp(mgr0);
if (re.test(stem))
w = stem + step2list[suffix];
}
// Step 3
re = /^(.+?)(icate|ative|alize|iciti|ical|ful|ness)$/;
if (re.test(w)) {
var fp = re.exec(w);
stem = fp[1];
suffix = fp[2];
re = new RegExp(mgr0);
if (re.test(stem))
w = stem + step3list[suffix];
}
// Step 4
re = /^(.+?)(al|ance|ence|er|ic|able|ible|ant|ement|ment|ent|ou|ism|ate|iti|ous|ive|ize)$/;
re2 = /^(.+?)(s|t)(ion)$/;
if (re.test(w)) {
var fp = re.exec(w);
stem = fp[1];
re = new RegExp(mgr1);
if (re.test(stem))
w = stem;
}
else if (re2.test(w)) {
var fp = re2.exec(w);
stem = fp[1] + fp[2];
re2 = new RegExp(mgr1);
if (re2.test(stem))
w = stem;
}
// Step 5
re = /^(.+?)e$/;
if (re.test(w)) {
var fp = re.exec(w);
stem = fp[1];
re = new RegExp(mgr1);
re2 = new RegExp(meq1);
re3 = new RegExp("^" + C + v + "[^aeiouwxy]$");
if (re.test(stem) || (re2.test(stem) && !(re3.test(stem))))
w = stem;
}
re = /ll$/;
re2 = new RegExp(mgr1);
if (re.test(w) && re2.test(w)) {
re = /.$/;
w = w.replace(re,"");
}
// and turn initial Y back to y
if (firstch == "y")
w = firstch.toLowerCase() + w.substr(1);
return w;
}
}
var splitChars = (function() {
var result = {};
var singles = [96, 180, 187, 191, 215, 247, 749, 885, 903, 907, 909, 930, 1014, 1648,
1748, 1809, 2416, 2473, 2481, 2526, 2601, 2609, 2612, 2615, 2653, 2702,
2706, 2729, 2737, 2740, 2857, 2865, 2868, 2910, 2928, 2948, 2961, 2971,
2973, 3085, 3089, 3113, 3124, 3213, 3217, 3241, 3252, 3295, 3341, 3345,
3369, 3506, 3516, 3633, 3715, 3721, 3736, 3744, 3748, 3750, 3756, 3761,
3781, 3912, 4239, 4347, 4681, 4695, 4697, 4745, 4785, 4799, 4801, 4823,
4881, 5760, 5901, 5997, 6313, 7405, 8024, 8026, 8028, 8030, 8117, 8125,
8133, 8181, 8468, 8485, 8487, 8489, 8494, 8527, 11311, 11359, 11687, 11695,
11703, 11711, 11719, 11727, 11735, 12448, 12539, 43010, 43014, 43019, 43587,
43696, 43713, 64286, 64297, 64311, 64317, 64319, 64322, 64325, 65141];
var i, j, start, end;
for (i = 0; i < singles.length; i++) {
result[singles[i]] = true;
}
var ranges = [[0, 47], [58, 64], [91, 94], [123, 169], [171, 177], [182, 184], [706, 709],
[722, 735], [741, 747], [751, 879], [888, 889], [894, 901], [1154, 1161],
[1318, 1328], [1367, 1368], [1370, 1376], [1416, 1487], [1515, 1519], [1523, 1568],
[1611, 1631], [1642, 1645], [1750, 1764], [1767, 1773], [1789, 1790], [1792, 1807],
[1840, 1868], [1958, 1968], [1970, 1983], [2027, 2035], [2038, 2041], [2043, 2047],
[2070, 2073], [2075, 2083], [2085, 2087], [2089, 2307], [2362, 2364], [2366, 2383],
[2385, 2391], [2402, 2405], [2419, 2424], [2432, 2436], [2445, 2446], [2449, 2450],
[2483, 2485], [2490, 2492], [2494, 2509], [2511, 2523], [2530, 2533], [2546, 2547],
[2554, 2564], [2571, 2574], [2577, 2578], [2618, 2648], [2655, 2661], [2672, 2673],
[2677, 2692], [2746, 2748], [2750, 2767], [2769, 2783], [2786, 2789], [2800, 2820],
[2829, 2830], [2833, 2834], [2874, 2876], [2878, 2907], [2914, 2917], [2930, 2946],
[2955, 2957], [2966, 2968], [2976, 2978], [2981, 2983], [2987, 2989], [3002, 3023],
[3025, 3045], [3059, 3076], [3130, 3132], [3134, 3159], [3162, 3167], [3170, 3173],
[3184, 3191], [3199, 3204], [3258, 3260], [3262, 3293], [3298, 3301], [3312, 3332],
[3386, 3388], [3390, 3423], [3426, 3429], [3446, 3449], [3456, 3460], [3479, 3481],
[3518, 3519], [3527, 3584], [3636, 3647], [3655, 3663], [3674, 3712], [3717, 3718],
[3723, 3724], [3726, 3731], [3752, 3753], [3764, 3772], [3774, 3775], [3783, 3791],
[3802, 3803], [3806, 3839], [3841, 3871], [3892, 3903], [3949, 3975], [3980, 4095],
[4139, 4158], [4170, 4175], [4182, 4185], [4190, 4192], [4194, 4196], [4199, 4205],
[4209, 4212], [4226, 4237], [4250, 4255], [4294, 4303], [4349, 4351], [4686, 4687],
[4702, 4703], [4750, 4751], [4790, 4791], [4806, 4807], [4886, 4887], [4955, 4968],
[4989, 4991], [5008, 5023], [5109, 5120], [5741, 5742], [5787, 5791], [5867, 5869],
[5873, 5887], [5906, 5919], [5938, 5951], [5970, 5983], [6001, 6015], [6068, 6102],
[6104, 6107], [6109, 6111], [6122, 6127], [6138, 6159], [6170, 6175], [6264, 6271],
[6315, 6319], [6390, 6399], [6429, 6469], [6510, 6511], [6517, 6527], [6572, 6592],
[6600, 6607], [6619, 6655], [6679, 6687], [6741, 6783], [6794, 6799], [6810, 6822],
[6824, 6916], [6964, 6980], [6988, 6991], [7002, 7042], [7073, 7085], [7098, 7167],
[7204, 7231], [7242, 7244], [7294, 7400], [7410, 7423], [7616, 7679], [7958, 7959],
[7966, 7967], [8006, 8007], [8014, 8015], [8062, 8063], [8127, 8129], [8141, 8143],
[8148, 8149], [8156, 8159], [8173, 8177], [8189, 8303], [8306, 8307], [8314, 8318],
[8330, 8335], [8341, 8449], [8451, 8454], [8456, 8457], [8470, 8472], [8478, 8483],
[8506, 8507], [8512, 8516], [8522, 8525], [8586, 9311], [9372, 9449], [9472, 10101],
[10132, 11263], [11493, 11498], [11503, 11516], [11518, 11519], [11558, 11567],
[11622, 11630], [11632, 11647], [11671, 11679], [11743, 11822], [11824, 12292],
[12296, 12320], [12330, 12336], [12342, 12343], [12349, 12352], [12439, 12444],
[12544, 12548], [12590, 12592], [12687, 12689], [12694, 12703], [12728, 12783],
[12800, 12831], [12842, 12880], [12896, 12927], [12938, 12976], [12992, 13311],
[19894, 19967], [40908, 40959], [42125, 42191], [42238, 42239], [42509, 42511],
[42540, 42559], [42592, 42593], [42607, 42622], [42648, 42655], [42736, 42774],
[42784, 42785], [42889, 42890], [42893, 43002], [43043, 43055], [43062, 43071],
[43124, 43137], [43188, 43215], [43226, 43249], [43256, 43258], [43260, 43263],
[43302, 43311], [43335, 43359], [43389, 43395], [43443, 43470], [43482, 43519],
[43561, 43583], [43596, 43599], [43610, 43615], [43639, 43641], [43643, 43647],
[43698, 43700], [43703, 43704], [43710, 43711], [43715, 43738], [43742, 43967],
[44003, 44015], [44026, 44031], [55204, 55215], [55239, 55242], [55292, 55295],
[57344, 63743], [64046, 64047], [64110, 64111], [64218, 64255], [64263, 64274],
[64280, 64284], [64434, 64466], [64830, 64847], [64912, 64913], [64968, 65007],
[65020, 65135], [65277, 65295], [65306, 65312], [65339, 65344], [65371, 65381],
[65471, 65473], [65480, 65481], [65488, 65489], [65496, 65497]];
for (i = 0; i < ranges.length; i++) {
start = ranges[i][0];
end = ranges[i][1];
for (j = start; j <= end; j++) {
result[j] = true;
}
}
return result;
})();
function splitQuery(query) {
var result = [];
var start = -1;
for (var i = 0; i < query.length; i++) {
if (splitChars[query.charCodeAt(i)]) {
if (start !== -1) {
result.push(query.slice(start, i));
start = -1;
}
} else if (start === -1) {
start = i;
}
}
if (start !== -1) {
result.push(query.slice(start));
}
return result;
}
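
The table-driven splitting above can be tried out with a much smaller table. This is a minimal sketch under stated assumptions: `buildSplitTable`, `demoSplitChars`, and `demoSplitQuery` are hypothetical names, and the table covers only a handful of ASCII ranges rather than the full generated Unicode list.

```javascript
// Build a lookup table keyed by character code from inclusive ranges.
function buildSplitTable(ranges) {
  var table = {};
  ranges.forEach(function (range) {
    for (var code = range[0]; code <= range[1]; code++) {
      table[code] = true;
    }
  });
  return table;
}

// Assumed, simplified table: ASCII control characters, whitespace and
// most punctuation. The underscore (95) is deliberately not a separator.
var demoSplitChars = buildSplitTable([[0, 47], [58, 64], [91, 94], [123, 127]]);

// Same scanning loop as splitQuery: remember where a token starts and
// emit it when a separator (or the end of the string) is reached.
function demoSplitQuery(query) {
  var result = [];
  var start = -1;
  for (var i = 0; i < query.length; i++) {
    if (demoSplitChars[query.charCodeAt(i)]) {
      if (start !== -1) {
        result.push(query.slice(start, i));
        start = -1;
      }
    } else if (start === -1) {
      start = i;
    }
  }
  if (start !== -1) {
    result.push(query.slice(start));
  }
  return result;
}

var demoTokens = demoSplitQuery("mlr --icsv head_tail, sort-by");
// demoTokens is ["mlr", "icsv", "head_tail", "sort", "by"]
```

Runs of separators never produce empty tokens, because a token is only pushed when one is in progress. With this table a hyphen is a separator, so `sort-by` splits into two terms.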

Binary file not shown.

Binary file not shown.

View file

@ -1,76 +0,0 @@
pre { line-height: 125%; margin: 0; }
td.linenos pre { color: #000000; background-color: #f0f0f0; padding: 0 5px 0 5px; }
span.linenos { color: #000000; background-color: #f0f0f0; padding: 0 5px 0 5px; }
td.linenos pre.special { color: #000000; background-color: #ffffc0; padding: 0 5px 0 5px; }
span.linenos.special { color: #000000; background-color: #ffffc0; padding: 0 5px 0 5px; }
.highlight .hll { background-color: #ffffcc }
/*.highlight { background: #eeffcc; }*/
/* CHANGE ME */
.highlight { background: #eae2cb; }
.highlight .c { color: #408090; font-style: italic } /* Comment */
.highlight .err { border: 1px solid #FF0000 } /* Error */
.highlight .k { color: #007020; font-weight: bold } /* Keyword */
.highlight .o { color: #666666 } /* Operator */
.highlight .ch { color: #408090; font-style: italic } /* Comment.Hashbang */
.highlight .cm { color: #408090; font-style: italic } /* Comment.Multiline */
.highlight .cp { color: #007020 } /* Comment.Preproc */
.highlight .cpf { color: #408090; font-style: italic } /* Comment.PreprocFile */
.highlight .c1 { color: #408090; font-style: italic } /* Comment.Single */
.highlight .cs { color: #408090; background-color: #fff0f0 } /* Comment.Special */
.highlight .gd { color: #A00000 } /* Generic.Deleted */
.highlight .ge { font-style: italic } /* Generic.Emph */
.highlight .gr { color: #FF0000 } /* Generic.Error */
.highlight .gh { color: #000080; font-weight: bold } /* Generic.Heading */
.highlight .gi { color: #00A000 } /* Generic.Inserted */
.highlight .go { color: #333333 } /* Generic.Output */
.highlight .gp { color: #c65d09; font-weight: bold } /* Generic.Prompt */
.highlight .gs { font-weight: bold } /* Generic.Strong */
.highlight .gu { color: #800080; font-weight: bold } /* Generic.Subheading */
.highlight .gt { color: #0044DD } /* Generic.Traceback */
.highlight .kc { color: #007020; font-weight: bold } /* Keyword.Constant */
.highlight .kd { color: #007020; font-weight: bold } /* Keyword.Declaration */
.highlight .kn { color: #007020; font-weight: bold } /* Keyword.Namespace */
.highlight .kp { color: #007020 } /* Keyword.Pseudo */
.highlight .kr { color: #007020; font-weight: bold } /* Keyword.Reserved */
.highlight .kt { color: #902000 } /* Keyword.Type */
.highlight .m { color: #208050 } /* Literal.Number */
.highlight .s { color: #4070a0 } /* Literal.String */
.highlight .na { color: #4070a0 } /* Name.Attribute */
.highlight .nb { color: #007020 } /* Name.Builtin */
.highlight .nc { color: #0e84b5; font-weight: bold } /* Name.Class */
.highlight .no { color: #60add5 } /* Name.Constant */
.highlight .nd { color: #555555; font-weight: bold } /* Name.Decorator */
.highlight .ni { color: #d55537; font-weight: bold } /* Name.Entity */
.highlight .ne { color: #007020 } /* Name.Exception */
.highlight .nf { color: #06287e } /* Name.Function */
.highlight .nl { color: #002070; font-weight: bold } /* Name.Label */
.highlight .nn { color: #0e84b5; font-weight: bold } /* Name.Namespace */
.highlight .nt { color: #062873; font-weight: bold } /* Name.Tag */
.highlight .nv { color: #bb60d5 } /* Name.Variable */
.highlight .ow { color: #007020; font-weight: bold } /* Operator.Word */
.highlight .w { color: #bbbbbb } /* Text.Whitespace */
.highlight .mb { color: #208050 } /* Literal.Number.Bin */
.highlight .mf { color: #208050 } /* Literal.Number.Float */
.highlight .mh { color: #208050 } /* Literal.Number.Hex */
.highlight .mi { color: #208050 } /* Literal.Number.Integer */
.highlight .mo { color: #208050 } /* Literal.Number.Oct */
.highlight .sa { color: #4070a0 } /* Literal.String.Affix */
.highlight .sb { color: #4070a0 } /* Literal.String.Backtick */
.highlight .sc { color: #4070a0 } /* Literal.String.Char */
.highlight .dl { color: #4070a0 } /* Literal.String.Delimiter */
.highlight .sd { color: #4070a0; font-style: italic } /* Literal.String.Doc */
.highlight .s2 { color: #4070a0 } /* Literal.String.Double */
.highlight .se { color: #4070a0; font-weight: bold } /* Literal.String.Escape */
.highlight .sh { color: #4070a0 } /* Literal.String.Heredoc */
.highlight .si { color: #70a0d0; font-style: italic } /* Literal.String.Interpol */
.highlight .sx { color: #c65d09 } /* Literal.String.Other */
.highlight .sr { color: #235388 } /* Literal.String.Regex */
.highlight .s1 { color: #4070a0 } /* Literal.String.Single */
.highlight .ss { color: #517918 } /* Literal.String.Symbol */
.highlight .bp { color: #007020 } /* Name.Builtin.Pseudo */
.highlight .fm { color: #06287e } /* Name.Function.Magic */
.highlight .vc { color: #bb60d5 } /* Name.Variable.Class */
.highlight .vg { color: #bb60d5 } /* Name.Variable.Global */
.highlight .vi { color: #bb60d5 } /* Name.Variable.Instance */
.highlight .vm { color: #bb60d5 } /* Name.Variable.Magic */
.highlight .il { color: #208050 } /* Literal.Number.Integer.Long */

View file

@ -1,514 +0,0 @@
/*
* searchtools.js
* ~~~~~~~~~~~~~~~~
*
* Sphinx JavaScript utilities for the full-text search.
*
* :copyright: Copyright 2007-2020 by the Sphinx team, see AUTHORS.
* :license: BSD, see LICENSE for details.
*
*/
if (!Scorer) {
/**
* Simple result scoring code.
*/
var Scorer = {
// Implement the following function to further tweak the score for each result
// The function takes a result array [filename, title, anchor, descr, score]
// and returns the new score.
/*
score: function(result) {
return result[4];
},
*/
// query matches the full name of an object
objNameMatch: 11,
// or matches in the last dotted part of the object name
objPartialMatch: 6,
// Additive scores depending on the priority of the object
objPrio: {0: 15, // used to be importantResults
1: 5, // used to be objectResults
2: -5}, // used to be unimportantResults
// Used when the priority is not in the mapping.
objPrioDefault: 0,
// query found in title
title: 15,
partialTitle: 7,
// query found in terms
term: 5,
partialTerm: 2
};
}
if (!splitQuery) {
function splitQuery(query) {
return query.split(/\s+/);
}
}
/**
* Search Module
*/
var Search = {
_index : null,
_queued_query : null,
_pulse_status : -1,
htmlToText : function(htmlString) {
var htmlElement = document.createElement('span');
htmlElement.innerHTML = htmlString;
$(htmlElement).find('.headerlink').remove();
var docContent = $(htmlElement).find('[role=main]')[0];
if(docContent === undefined) {
console.warn("Content block not found. Sphinx search tries to obtain it " +
"via '[role=main]'. Could you check your theme or template.");
return "";
}
return docContent.textContent || docContent.innerText;
},
init : function() {
var params = $.getQueryParameters();
if (params.q) {
var query = params.q[0];
$('input[name="q"]')[0].value = query;
this.performSearch(query);
}
},
loadIndex : function(url) {
$.ajax({type: "GET", url: url, data: null,
dataType: "script", cache: true,
complete: function(jqxhr, textstatus) {
if (textstatus != "success") {
document.getElementById("searchindexloader").src = url;
}
}});
},
setIndex : function(index) {
var q;
this._index = index;
if ((q = this._queued_query) !== null) {
this._queued_query = null;
Search.query(q);
}
},
hasIndex : function() {
return this._index !== null;
},
deferQuery : function(query) {
this._queued_query = query;
},
stopPulse : function() {
this._pulse_status = 0;
},
startPulse : function() {
if (this._pulse_status >= 0)
return;
function pulse() {
var i;
Search._pulse_status = (Search._pulse_status + 1) % 4;
var dotString = '';
for (i = 0; i < Search._pulse_status; i++)
dotString += '.';
Search.dots.text(dotString);
if (Search._pulse_status > -1)
window.setTimeout(pulse, 500);
}
pulse();
},
/**
* perform a search for something (or wait until index is loaded)
*/
performSearch : function(query) {
// create the required interface elements
this.out = $('#search-results');
this.title = $('<h2>' + _('Searching') + '</h2>').appendTo(this.out);
this.dots = $('<span></span>').appendTo(this.title);
this.status = $('<p class="search-summary">&nbsp;</p>').appendTo(this.out);
this.output = $('<ul class="search"/>').appendTo(this.out);
$('#search-progress').text(_('Preparing search...'));
this.startPulse();
// index already loaded, the browser was quick!
if (this.hasIndex())
this.query(query);
else
this.deferQuery(query);
},
/**
* execute search (requires search index to be loaded)
*/
query : function(query) {
var i;
// stem the searchterms and add them to the correct list
var stemmer = new Stemmer();
var searchterms = [];
var excluded = [];
var hlterms = [];
var tmp = splitQuery(query);
var objectterms = [];
for (i = 0; i < tmp.length; i++) {
if (tmp[i] !== "") {
objectterms.push(tmp[i].toLowerCase());
}
if ($u.indexOf(stopwords, tmp[i].toLowerCase()) != -1 || tmp[i] === "") {
// skip this "word"
continue;
}
// stem the word
var word = stemmer.stemWord(tmp[i].toLowerCase());
// prevent the stemmer from cutting a word down to fewer than three chars
if(word.length < 3 && tmp[i].length >= 3) {
word = tmp[i];
}
var toAppend;
// select the correct list
if (word[0] == '-') {
toAppend = excluded;
word = word.substr(1);
}
else {
toAppend = searchterms;
hlterms.push(tmp[i].toLowerCase());
}
// only add if not already in the list
if (!$u.contains(toAppend, word))
toAppend.push(word);
}
var highlightstring = '?highlight=' + $.urlencode(hlterms.join(" "));
// console.debug('SEARCH: searching for:');
// console.info('required: ', searchterms);
// console.info('excluded: ', excluded);
// prepare search
var terms = this._index.terms;
var titleterms = this._index.titleterms;
// array of [filename, title, anchor, descr, score]
var results = [];
$('#search-progress').empty();
// lookup as object
for (i = 0; i < objectterms.length; i++) {
var others = [].concat(objectterms.slice(0, i),
objectterms.slice(i+1, objectterms.length));
results = results.concat(this.performObjectSearch(objectterms[i], others));
}
// lookup as search terms in fulltext
results = results.concat(this.performTermsSearch(searchterms, excluded, terms, titleterms));
// let the scorer override scores with a custom scoring function
if (Scorer.score) {
for (i = 0; i < results.length; i++)
results[i][4] = Scorer.score(results[i]);
}
// now sort the results by score (in opposite order of appearance, since the
// display function below uses pop() to retrieve items) and then
// alphabetically
results.sort(function(a, b) {
var left = a[4];
var right = b[4];
if (left > right) {
return 1;
} else if (left < right) {
return -1;
} else {
// same score: sort alphabetically
left = a[1].toLowerCase();
right = b[1].toLowerCase();
return (left > right) ? -1 : ((left < right) ? 1 : 0);
}
});
// for debugging
//Search.lastresults = results.slice(); // a copy
//console.info('search results:', Search.lastresults);
// print the results
var resultCount = results.length;
function displayNextItem() {
// results left, load the summary and display it
if (results.length) {
var item = results.pop();
var listItem = $('<li style="display:none"></li>');
var requestUrl = "";
var linkUrl = "";
if (DOCUMENTATION_OPTIONS.BUILDER === 'dirhtml') {
// dirhtml builder
var dirname = item[0] + '/';
if (dirname.match(/\/index\/$/)) {
dirname = dirname.substring(0, dirname.length-6);
} else if (dirname == 'index/') {
dirname = '';
}
requestUrl = DOCUMENTATION_OPTIONS.URL_ROOT + dirname;
linkUrl = requestUrl;
} else {
// normal html builders
requestUrl = DOCUMENTATION_OPTIONS.URL_ROOT + item[0] + DOCUMENTATION_OPTIONS.FILE_SUFFIX;
linkUrl = item[0] + DOCUMENTATION_OPTIONS.LINK_SUFFIX;
}
listItem.append($('<a/>').attr('href',
linkUrl +
highlightstring + item[2]).html(item[1]));
if (item[3]) {
listItem.append($('<span> (' + item[3] + ')</span>'));
Search.output.append(listItem);
listItem.slideDown(5, function() {
displayNextItem();
});
} else if (DOCUMENTATION_OPTIONS.HAS_SOURCE) {
$.ajax({url: requestUrl,
dataType: "text",
complete: function(jqxhr, textstatus) {
var data = jqxhr.responseText;
if (data !== '' && data !== undefined) {
listItem.append(Search.makeSearchSummary(data, searchterms, hlterms));
}
Search.output.append(listItem);
listItem.slideDown(5, function() {
displayNextItem();
});
}});
} else {
// no source available, just display title
Search.output.append(listItem);
listItem.slideDown(5, function() {
displayNextItem();
});
}
}
// search finished, update title and status message
else {
Search.stopPulse();
Search.title.text(_('Search Results'));
if (!resultCount)
Search.status.text(_('Your search did not match any documents. Please make sure that all words are spelled correctly and that you\'ve selected enough categories.'));
else
Search.status.text(_('Search finished, found %s page(s) matching the search query.').replace('%s', resultCount));
Search.status.fadeIn(500);
}
}
displayNextItem();
},
/**
* search for object names
*/
performObjectSearch : function(object, otherterms) {
var filenames = this._index.filenames;
var docnames = this._index.docnames;
var objects = this._index.objects;
var objnames = this._index.objnames;
var titles = this._index.titles;
var i;
var results = [];
for (var prefix in objects) {
for (var name in objects[prefix]) {
var fullname = (prefix ? prefix + '.' : '') + name;
var fullnameLower = fullname.toLowerCase();
if (fullnameLower.indexOf(object) > -1) {
var score = 0;
var parts = fullnameLower.split('.');
// check for different match types: exact matches of full name or
// "last name" (i.e. last dotted part)
if (fullnameLower == object || parts[parts.length - 1] == object) {
score += Scorer.objNameMatch;
// matches in last name
} else if (parts[parts.length - 1].indexOf(object) > -1) {
score += Scorer.objPartialMatch;
}
var match = objects[prefix][name];
var objname = objnames[match[1]][2];
var title = titles[match[0]];
// If more than one term searched for, we require other words to be
// found in the name/title/description
if (otherterms.length > 0) {
var haystack = (prefix + ' ' + name + ' ' +
objname + ' ' + title).toLowerCase();
var allfound = true;
for (i = 0; i < otherterms.length; i++) {
if (haystack.indexOf(otherterms[i]) == -1) {
allfound = false;
break;
}
}
if (!allfound) {
continue;
}
}
var descr = objname + _(', in ') + title;
var anchor = match[3];
if (anchor === '')
anchor = fullname;
else if (anchor == '-')
anchor = objnames[match[1]][1] + '-' + fullname;
// add custom score for some objects according to scorer
if (Scorer.objPrio.hasOwnProperty(match[2])) {
score += Scorer.objPrio[match[2]];
} else {
score += Scorer.objPrioDefault;
}
results.push([docnames[match[0]], fullname, '#'+anchor, descr, score, filenames[match[0]]]);
}
}
}
return results;
},
/**
* search for full-text terms in the index
*/
performTermsSearch : function(searchterms, excluded, terms, titleterms) {
var docnames = this._index.docnames;
var filenames = this._index.filenames;
var titles = this._index.titles;
var i, j, file;
var fileMap = {};
var scoreMap = {};
var results = [];
// perform the search on the required terms
for (i = 0; i < searchterms.length; i++) {
var word = searchterms[i];
var files = [];
var _o = [
{files: terms[word], score: Scorer.term},
{files: titleterms[word], score: Scorer.title}
];
// add support for partial matches
if (word.length > 2) {
for (var w in terms) {
if (w.match(word) && !terms[word]) {
_o.push({files: terms[w], score: Scorer.partialTerm})
}
}
for (var w in titleterms) {
if (w.match(word) && !titleterms[word]) {
_o.push({files: titleterms[w], score: Scorer.partialTitle})
}
}
}
// no match but word was a required one
if ($u.every(_o, function(o){return o.files === undefined;})) {
break;
}
// found search word in contents
$u.each(_o, function(o) {
var _files = o.files;
if (_files === undefined)
return
if (_files.length === undefined)
_files = [_files];
files = files.concat(_files);
// set score for the word in each file to Scorer.term
for (j = 0; j < _files.length; j++) {
file = _files[j];
if (!(file in scoreMap))
scoreMap[file] = {};
scoreMap[file][word] = o.score;
}
});
// create the mapping
for (j = 0; j < files.length; j++) {
file = files[j];
if (file in fileMap && fileMap[file].indexOf(word) === -1)
fileMap[file].push(word);
else
fileMap[file] = [word];
}
}
// now check if the files don't contain excluded terms
for (file in fileMap) {
var valid = true;
// check if all requirements are matched
var filteredTermCount = // as search terms with length < 3 are discarded: ignore
searchterms.filter(function(term){return term.length > 2}).length
if (
fileMap[file].length != searchterms.length &&
fileMap[file].length != filteredTermCount
) continue;
// ensure that none of the excluded terms is in the search result
for (i = 0; i < excluded.length; i++) {
if (terms[excluded[i]] == file ||
titleterms[excluded[i]] == file ||
$u.contains(terms[excluded[i]] || [], file) ||
$u.contains(titleterms[excluded[i]] || [], file)) {
valid = false;
break;
}
}
// if we still have a valid result, we can add it to the result list
if (valid) {
// select one (max) score for the file.
// for better ranking, we should calculate ranking by using words statistics like basic tf-idf...
var score = $u.max($u.map(fileMap[file], function(w){return scoreMap[file][w]}));
results.push([docnames[file], titles[file], '', null, score, filenames[file]]);
}
}
return results;
},
/**
* helper function to return a node containing the
* search summary for a given text. keywords is a list
* of stemmed words, hlwords is the list of normal, unstemmed
* words. the first one is used to find the occurrence, the
* latter for highlighting it.
*/
makeSearchSummary : function(htmlText, keywords, hlwords) {
var text = Search.htmlToText(htmlText);
var textLower = text.toLowerCase();
var start = 0;
$.each(keywords, function() {
var i = textLower.indexOf(this.toLowerCase());
if (i > -1)
start = i;
});
start = Math.max(start - 120, 0);
var excerpt = ((start > 0) ? '...' : '') +
$.trim(text.substr(start, 240)) +
((start + 240 < text.length) ? '...' : '');
var rv = $('<div class="context"></div>').text(excerpt);
$.each(hlwords, function() {
rv = rv.highlightText(this, 'highlighted');
});
return rv;
}
};
$(document).ready(function() {
Search.init();
});
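
The result ordering used by `query` is easy to misread, since the list is sorted ascending and then consumed with `pop()`. The sketch below isolates that comparator. `compareForPop` and the sample rows are hypothetical; only the `[filename, title, anchor, descr, score]` layout is taken from the code above.

```javascript
// Ascending by score, and reverse-alphabetical by title within a tie,
// so that pop() yields the best score first and A before Z.
function compareForPop(a, b) {
  if (a[4] > b[4]) return 1;
  if (a[4] < b[4]) return -1;
  var left = a[1].toLowerCase();
  var right = b[1].toLowerCase();
  return (left > right) ? -1 : ((left < right) ? 1 : 0);
}

// Made-up rows in the [filename, title, anchor, descr, score] layout.
var demoResults = [
  ["faq", "FAQ", "", null, 5],
  ["install", "Installation", "", null, 15],
  ["cookbook", "Cookbook", "", null, 5],
  ["index", "Home", "", null, 15]
];

demoResults.sort(compareForPop);

// Drain the list from the end, as displayNextItem() does.
var displayOrder = [];
while (demoResults.length) {
  displayOrder.push(demoResults.pop()[1]);
}
// displayOrder is ["Home", "Installation", "Cookbook", "FAQ"]
```

Taking items from the end of the array avoids shifting the remaining elements on every step, which is presumably why the sort order is inverted instead of the consumption order.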

View file

@ -1,159 +0,0 @@
/*
* sidebar.js
* ~~~~~~~~~~
*
* This script makes the Sphinx sidebar collapsible.
*
* .sphinxsidebar contains .sphinxsidebarwrapper. This script adds
* in .sphinxsidebar, after .sphinxsidebarwrapper, the #sidebarbutton
* used to collapse and expand the sidebar.
*
* When the sidebar is collapsed the .sphinxsidebarwrapper is hidden
* and the width of the sidebar and the margin-left of the document
* are decreased. When the sidebar is expanded the opposite happens.
* This script saves a per-browser/per-session cookie used to
* remember the position of the sidebar among the pages.
* Once the browser is closed the cookie is deleted and the position
* reset to the default (expanded).
*
* :copyright: Copyright 2007-2020 by the Sphinx team, see AUTHORS.
* :license: BSD, see LICENSE for details.
*
*/
$(function() {
// global elements used by the functions.
// the 'sidebarbutton' element is defined as global after its
// creation, in the add_sidebar_button function
var bodywrapper = $('.bodywrapper');
var sidebar = $('.sphinxsidebar');
var sidebarwrapper = $('.sphinxsidebarwrapper');
// for some reason, the document has no sidebar; do not run into errors
if (!sidebar.length) return;
// original margin-left of the bodywrapper and width of the sidebar
// with the sidebar expanded
var bw_margin_expanded = bodywrapper.css('margin-left');
var ssb_width_expanded = sidebar.width();
// margin-left of the bodywrapper and width of the sidebar
// with the sidebar collapsed
var bw_margin_collapsed = '.8em';
var ssb_width_collapsed = '.8em';
// colors used by the current theme
var dark_color = $('.related').css('background-color');
var light_color = $('.document').css('background-color');
function sidebar_is_collapsed() {
return sidebarwrapper.is(':not(:visible)');
}
function toggle_sidebar() {
if (sidebar_is_collapsed())
expand_sidebar();
else
collapse_sidebar();
}
function collapse_sidebar() {
sidebarwrapper.hide();
sidebar.css('width', ssb_width_collapsed);
bodywrapper.css('margin-left', bw_margin_collapsed);
sidebarbutton.css({
'margin-left': '0',
'height': bodywrapper.height()
});
sidebarbutton.find('span').text('»');
sidebarbutton.attr('title', _('Expand sidebar'));
document.cookie = 'sidebar=collapsed';
}
function expand_sidebar() {
bodywrapper.css('margin-left', bw_margin_expanded);
sidebar.css('width', ssb_width_expanded);
sidebarwrapper.show();
sidebarbutton.css({
'margin-left': ssb_width_expanded-12,
'height': bodywrapper.height()
});
sidebarbutton.find('span').text('«');
sidebarbutton.attr('title', _('Collapse sidebar'));
document.cookie = 'sidebar=expanded';
}
function add_sidebar_button() {
sidebarwrapper.css({
'float': 'left',
'margin-right': '0',
'width': ssb_width_expanded - 28
});
// create the button
sidebar.append(
'<div id="sidebarbutton"><span>&laquo;</span></div>'
);
var sidebarbutton = $('#sidebarbutton');
light_color = sidebarbutton.css('background-color');
// find the height of the viewport to center the '<<' in the page
var viewport_height;
if (window.innerHeight)
viewport_height = window.innerHeight;
else
viewport_height = $(window).height();
sidebarbutton.find('span').css({
'display': 'block',
'margin-top': (viewport_height - sidebar.position().top - 20) / 2
});
sidebarbutton.click(toggle_sidebar);
sidebarbutton.attr('title', _('Collapse sidebar'));
sidebarbutton.css({
'color': '#FFFFFF',
'border-left': '1px solid ' + dark_color,
'font-size': '1.2em',
'cursor': 'pointer',
'height': bodywrapper.height(),
'padding-top': '1px',
'margin-left': ssb_width_expanded - 12
});
sidebarbutton.hover(
function () {
$(this).css('background-color', dark_color);
},
function () {
$(this).css('background-color', light_color);
}
);
}
function set_position_from_cookie() {
if (!document.cookie)
return;
var items = document.cookie.split(';');
for(var k=0; k<items.length; k++) {
var key_val = items[k].split('=');
var key = key_val[0].replace(/ /, ""); // strip leading spaces
if (key == 'sidebar') {
var value = key_val[1];
if ((value == 'collapsed') && (!sidebar_is_collapsed()))
collapse_sidebar();
else if ((value == 'expanded') && (sidebar_is_collapsed()))
expand_sidebar();
}
}
}
add_sidebar_button();
var sidebarbutton = $('#sidebarbutton');
set_position_from_cookie();
});
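
The cookie lookup in `set_position_from_cookie` above can be isolated into a pure function that takes the cookie string as an argument. This is a minimal sketch under stated assumptions: `readCookie` is a hypothetical name, and it uses `trim()` on the key where the original removes a single leading space.

```javascript
// Hypothetical helper: find one cookie value in a `document.cookie`-style string.
function readCookie(cookieString, name) {
  for (const item of cookieString.split(';')) {
    const eq = item.indexOf('=');
    if (eq === -1) continue; // skip empty or malformed items
    const key = item.slice(0, eq).trim();
    if (key === name) return item.slice(eq + 1);
  }
  return undefined; // cookie not present
}
```

In a browser this would be called as `readCookie(document.cookie, 'sidebar')`; passing the string in keeps the function testable without a DOM.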

View file

@ -1,999 +0,0 @@
// Underscore.js 1.3.1
// (c) 2009-2012 Jeremy Ashkenas, DocumentCloud Inc.
// Underscore is freely distributable under the MIT license.
// Portions of Underscore are inspired or borrowed from Prototype,
// Oliver Steele's Functional, and John Resig's Micro-Templating.
// For all details and documentation:
// http://documentcloud.github.com/underscore
(function() {
// Baseline setup
// --------------
// Establish the root object, `window` in the browser, or `global` on the server.
var root = this;
// Save the previous value of the `_` variable.
var previousUnderscore = root._;
// Establish the object that gets returned to break out of a loop iteration.
var breaker = {};
// Save bytes in the minified (but not gzipped) version:
var ArrayProto = Array.prototype, ObjProto = Object.prototype, FuncProto = Function.prototype;
// Create quick reference variables for speed access to core prototypes.
var slice = ArrayProto.slice,
unshift = ArrayProto.unshift,
toString = ObjProto.toString,
hasOwnProperty = ObjProto.hasOwnProperty;
// All **ECMAScript 5** native function implementations that we hope to use
// are declared here.
var
nativeForEach = ArrayProto.forEach,
nativeMap = ArrayProto.map,
nativeReduce = ArrayProto.reduce,
nativeReduceRight = ArrayProto.reduceRight,
nativeFilter = ArrayProto.filter,
nativeEvery = ArrayProto.every,
nativeSome = ArrayProto.some,
nativeIndexOf = ArrayProto.indexOf,
nativeLastIndexOf = ArrayProto.lastIndexOf,
nativeIsArray = Array.isArray,
nativeKeys = Object.keys,
nativeBind = FuncProto.bind;
// Create a safe reference to the Underscore object for use below.
var _ = function(obj) { return new wrapper(obj); };
// Export the Underscore object for **Node.js**, with
// backwards-compatibility for the old `require()` API. If we're in
// the browser, add `_` as a global object via a string identifier,
// for Closure Compiler "advanced" mode.
if (typeof exports !== 'undefined') {
if (typeof module !== 'undefined' && module.exports) {
exports = module.exports = _;
}
exports._ = _;
} else {
root['_'] = _;
}
// Current version.
_.VERSION = '1.3.1';
// Collection Functions
// --------------------
// The cornerstone, an `each` implementation, aka `forEach`.
// Handles objects with the built-in `forEach`, arrays, and raw objects.
// Delegates to **ECMAScript 5**'s native `forEach` if available.
var each = _.each = _.forEach = function(obj, iterator, context) {
if (obj == null) return;
if (nativeForEach && obj.forEach === nativeForEach) {
obj.forEach(iterator, context);
} else if (obj.length === +obj.length) {
for (var i = 0, l = obj.length; i < l; i++) {
if (i in obj && iterator.call(context, obj[i], i, obj) === breaker) return;
}
} else {
for (var key in obj) {
if (_.has(obj, key)) {
if (iterator.call(context, obj[key], key, obj) === breaker) return;
}
}
}
};
// Return the results of applying the iterator to each element.
// Delegates to **ECMAScript 5**'s native `map` if available.
_.map = _.collect = function(obj, iterator, context) {
var results = [];
if (obj == null) return results;
if (nativeMap && obj.map === nativeMap) return obj.map(iterator, context);
each(obj, function(value, index, list) {
results[results.length] = iterator.call(context, value, index, list);
});
if (obj.length === +obj.length) results.length = obj.length;
return results;
};
// **Reduce** builds up a single result from a list of values, aka `inject`,
// or `foldl`. Delegates to **ECMAScript 5**'s native `reduce` if available.
_.reduce = _.foldl = _.inject = function(obj, iterator, memo, context) {
var initial = arguments.length > 2;
if (obj == null) obj = [];
if (nativeReduce && obj.reduce === nativeReduce) {
if (context) iterator = _.bind(iterator, context);
return initial ? obj.reduce(iterator, memo) : obj.reduce(iterator);
}
each(obj, function(value, index, list) {
if (!initial) {
memo = value;
initial = true;
} else {
memo = iterator.call(context, memo, value, index, list);
}
});
if (!initial) throw new TypeError('Reduce of empty array with no initial value');
return memo;
};
// The right-associative version of reduce, also known as `foldr`.
// Delegates to **ECMAScript 5**'s native `reduceRight` if available.
_.reduceRight = _.foldr = function(obj, iterator, memo, context) {
var initial = arguments.length > 2;
if (obj == null) obj = [];
if (nativeReduceRight && obj.reduceRight === nativeReduceRight) {
if (context) iterator = _.bind(iterator, context);
return initial ? obj.reduceRight(iterator, memo) : obj.reduceRight(iterator);
}
var reversed = _.toArray(obj).reverse();
if (context && !initial) iterator = _.bind(iterator, context);
return initial ? _.reduce(reversed, iterator, memo, context) : _.reduce(reversed, iterator);
};
// Return the first value which passes a truth test. Aliased as `detect`.
_.find = _.detect = function(obj, iterator, context) {
var result;
any(obj, function(value, index, list) {
if (iterator.call(context, value, index, list)) {
result = value;
return true;
}
});
return result;
};
// Return all the elements that pass a truth test.
// Delegates to **ECMAScript 5**'s native `filter` if available.
// Aliased as `select`.
_.filter = _.select = function(obj, iterator, context) {
var results = [];
if (obj == null) return results;
if (nativeFilter && obj.filter === nativeFilter) return obj.filter(iterator, context);
each(obj, function(value, index, list) {
if (iterator.call(context, value, index, list)) results[results.length] = value;
});
return results;
};
// Return all the elements for which a truth test fails.
_.reject = function(obj, iterator, context) {
var results = [];
if (obj == null) return results;
each(obj, function(value, index, list) {
if (!iterator.call(context, value, index, list)) results[results.length] = value;
});
return results;
};
// Determine whether all of the elements match a truth test.
// Delegates to **ECMAScript 5**'s native `every` if available.
// Aliased as `all`.
_.every = _.all = function(obj, iterator, context) {
var result = true;
if (obj == null) return result;
if (nativeEvery && obj.every === nativeEvery) return obj.every(iterator, context);
each(obj, function(value, index, list) {
if (!(result = result && iterator.call(context, value, index, list))) return breaker;
});
return result;
};
// Determine if at least one element in the object matches a truth test.
// Delegates to **ECMAScript 5**'s native `some` if available.
// Aliased as `any`.
var any = _.some = _.any = function(obj, iterator, context) {
iterator || (iterator = _.identity);
var result = false;
if (obj == null) return result;
if (nativeSome && obj.some === nativeSome) return obj.some(iterator, context);
each(obj, function(value, index, list) {
if (result || (result = iterator.call(context, value, index, list))) return breaker;
});
return !!result;
};
// Determine if a given value is included in the array or object using `===`.
// Aliased as `contains`.
_.include = _.contains = function(obj, target) {
var found = false;
if (obj == null) return found;
if (nativeIndexOf && obj.indexOf === nativeIndexOf) return obj.indexOf(target) != -1;
found = any(obj, function(value) {
return value === target;
});
return found;
};
// Invoke a method (with arguments) on every item in a collection.
_.invoke = function(obj, method) {
var args = slice.call(arguments, 2);
return _.map(obj, function(value) {
return (_.isFunction(method) ? method || value : value[method]).apply(value, args);
});
};
// Convenience version of a common use case of `map`: fetching a property.
_.pluck = function(obj, key) {
return _.map(obj, function(value){ return value[key]; });
};
// Return the maximum element (or element-based computation).
_.max = function(obj, iterator, context) {
if (!iterator && _.isArray(obj)) return Math.max.apply(Math, obj);
if (!iterator && _.isEmpty(obj)) return -Infinity;
var result = {computed : -Infinity};
each(obj, function(value, index, list) {
var computed = iterator ? iterator.call(context, value, index, list) : value;
computed >= result.computed && (result = {value : value, computed : computed});
});
return result.value;
};
// Return the minimum element (or element-based computation).
_.min = function(obj, iterator, context) {
if (!iterator && _.isArray(obj)) return Math.min.apply(Math, obj);
if (!iterator && _.isEmpty(obj)) return Infinity;
var result = {computed : Infinity};
each(obj, function(value, index, list) {
var computed = iterator ? iterator.call(context, value, index, list) : value;
computed < result.computed && (result = {value : value, computed : computed});
});
return result.value;
};
// Shuffle an array.
_.shuffle = function(obj) {
var shuffled = [], rand;
each(obj, function(value, index, list) {
if (index == 0) {
shuffled[0] = value;
} else {
rand = Math.floor(Math.random() * (index + 1));
shuffled[index] = shuffled[rand];
shuffled[rand] = value;
}
});
return shuffled;
};
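
The loop in `_.shuffle` above follows the inside-out form of the Fisher-Yates shuffle: each new value is swapped into a random slot among those filled so far. A minimal standalone sketch, assuming a hypothetical `shuffleInsideOut` name and an injectable `random` parameter (not part of Underscore) so the behaviour can be checked deterministically:

```javascript
// Hypothetical sketch of an inside-out Fisher-Yates shuffle.
function shuffleInsideOut(items, random = Math.random) {
  const shuffled = [];
  items.forEach((value, index) => {
    // Pick a slot in [0, index]; move its occupant to the new end,
    // then place the incoming value in the chosen slot.
    const rand = Math.floor(random() * (index + 1));
    shuffled[index] = shuffled[rand];
    shuffled[rand] = value;
  });
  return shuffled;
}

// With a random source that always returns 0, every value lands in slot 0.
console.log(shuffleInsideOut(['a', 'b', 'c'], () => 0)); // [ 'c', 'a', 'b' ]
```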
// Sort the object's values by a criterion produced by an iterator.
_.sortBy = function(obj, iterator, context) {
return _.pluck(_.map(obj, function(value, index, list) {
return {
value : value,
criteria : iterator.call(context, value, index, list)
};
}).sort(function(left, right) {
var a = left.criteria, b = right.criteria;
return a < b ? -1 : a > b ? 1 : 0;
}), 'value');
};
// Groups the object's values by a criterion. Pass either a string attribute
// to group by, or a function that returns the criterion.
_.groupBy = function(obj, val) {
var result = {};
var iterator = _.isFunction(val) ? val : function(obj) { return obj[val]; };
each(obj, function(value, index) {
var key = iterator(value, index);
(result[key] || (result[key] = [])).push(value);
});
return result;
};
// Use a comparator function to figure out at what index an object should
// be inserted so as to maintain order. Uses binary search.
_.sortedIndex = function(array, obj, iterator) {
iterator || (iterator = _.identity);
var low = 0, high = array.length;
while (low < high) {
var mid = (low + high) >> 1;
iterator(array[mid]) < iterator(obj) ? low = mid + 1 : high = mid;
}
return low;
};
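
The binary search in `_.sortedIndex` above returns the lowest index at which a value can be inserted while keeping the array sorted. A minimal standalone sketch of the same loop; `lowerBound` and `iteratee` are hypothetical names, and the array is assumed to be sorted already:

```javascript
// Hypothetical sketch: lowest insertion index in a sorted array.
function lowerBound(array, value, iteratee = (x) => x) {
  let low = 0;
  let high = array.length;
  const target = iteratee(value);
  while (low < high) {
    const mid = (low + high) >> 1; // midpoint, rounded down
    if (iteratee(array[mid]) < target) low = mid + 1;
    else high = mid;
  }
  return low;
}

console.log(lowerBound([10, 20, 30, 40], 25)); // 2
```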
// Safely convert anything iterable into a real, live array.
_.toArray = function(iterable) {
if (!iterable) return [];
if (iterable.toArray) return iterable.toArray();
if (_.isArray(iterable)) return slice.call(iterable);
if (_.isArguments(iterable)) return slice.call(iterable);
return _.values(iterable);
};
// Return the number of elements in an object.
_.size = function(obj) {
return _.toArray(obj).length;
};
// Array Functions
// ---------------
// Get the first element of an array. Passing **n** will return the first N
// values in the array. Aliased as `head`. The **guard** check allows it to work
// with `_.map`.
_.first = _.head = function(array, n, guard) {
return (n != null) && !guard ? slice.call(array, 0, n) : array[0];
};
// Returns everything but the last entry of the array. Especially useful on
// the arguments object. Passing **n** will return all the values in
// the array, excluding the last N. The **guard** check allows it to work with
// `_.map`.
_.initial = function(array, n, guard) {
return slice.call(array, 0, array.length - ((n == null) || guard ? 1 : n));
};
// Get the last element of an array. Passing **n** will return the last N
// values in the array. The **guard** check allows it to work with `_.map`.
_.last = function(array, n, guard) {
if ((n != null) && !guard) {
return slice.call(array, Math.max(array.length - n, 0));
} else {
return array[array.length - 1];
}
};
// Returns everything but the first entry of the array. Aliased as `tail`.
// Especially useful on the arguments object. Passing an **index** will return
// the rest of the values in the array from that index onward. The **guard**
// check allows it to work with `_.map`.
_.rest = _.tail = function(array, index, guard) {
return slice.call(array, (index == null) || guard ? 1 : index);
};
// Trim out all falsy values from an array.
_.compact = function(array) {
return _.filter(array, function(value){ return !!value; });
};
// Return a completely flattened version of an array.
_.flatten = function(array, shallow) {
return _.reduce(array, function(memo, value) {
if (_.isArray(value)) return memo.concat(shallow ? value : _.flatten(value));
memo[memo.length] = value;
return memo;
}, []);
};
// Return a version of the array that does not contain the specified value(s).
_.without = function(array) {
return _.difference(array, slice.call(arguments, 1));
};
// Produce a duplicate-free version of the array. If the array has already
// been sorted, you have the option of using a faster algorithm.
// Aliased as `unique`.
_.uniq = _.unique = function(array, isSorted, iterator) {
var initial = iterator ? _.map(array, iterator) : array;
var result = [];
_.reduce(initial, function(memo, el, i) {
if (0 == i || (isSorted === true ? _.last(memo) != el : !_.include(memo, el))) {
memo[memo.length] = el;
result[result.length] = array[i];
}
return memo;
}, []);
return result;
};
// Produce an array that contains the union: each distinct element from all of
// the passed-in arrays.
_.union = function() {
return _.uniq(_.flatten(arguments, true));
};
// Produce an array that contains every item shared between all the
// passed-in arrays. (Aliased as "intersect" for back-compat.)
_.intersection = _.intersect = function(array) {
var rest = slice.call(arguments, 1);
return _.filter(_.uniq(array), function(item) {
return _.every(rest, function(other) {
return _.indexOf(other, item) >= 0;
});
});
};
// Take the difference between one array and a number of other arrays.
// Only the elements present in just the first array will remain.
_.difference = function(array) {
var rest = _.flatten(slice.call(arguments, 1));
return _.filter(array, function(value){ return !_.include(rest, value); });
};
// Zip together multiple lists into a single array -- elements that share
// an index go together.
_.zip = function() {
var args = slice.call(arguments);
var length = _.max(_.pluck(args, 'length'));
var results = new Array(length);
for (var i = 0; i < length; i++) results[i] = _.pluck(args, "" + i);
return results;
};
// If the browser doesn't supply us with indexOf (I'm looking at you, **MSIE**),
// we need this function. Return the position of the first occurrence of an
// item in an array, or -1 if the item is not included in the array.
// Delegates to **ECMAScript 5**'s native `indexOf` if available.
// If the array is large and already in sort order, pass `true`
// for **isSorted** to use binary search.
_.indexOf = function(array, item, isSorted) {
if (array == null) return -1;
var i, l;
if (isSorted) {
i = _.sortedIndex(array, item);
return array[i] === item ? i : -1;
}
if (nativeIndexOf && array.indexOf === nativeIndexOf) return array.indexOf(item);
for (i = 0, l = array.length; i < l; i++) if (i in array && array[i] === item) return i;
return -1;
};
// Delegates to **ECMAScript 5**'s native `lastIndexOf` if available.
_.lastIndexOf = function(array, item) {
if (array == null) return -1;
if (nativeLastIndexOf && array.lastIndexOf === nativeLastIndexOf) return array.lastIndexOf(item);
var i = array.length;
while (i--) if (i in array && array[i] === item) return i;
return -1;
};
// Generate an integer Array containing an arithmetic progression. A port of
// the native Python `range()` function. See
// [the Python documentation](http://docs.python.org/library/functions.html#range).
_.range = function(start, stop, step) {
if (arguments.length <= 1) {
stop = start || 0;
start = 0;
}
step = arguments[2] || 1;
var len = Math.max(Math.ceil((stop - start) / step), 0);
var idx = 0;
var range = new Array(len);
while(idx < len) {
range[idx++] = start;
start += step;
}
return range;
};
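
The length computation in `_.range` above, `ceil((stop - start) / step)` clamped at zero, is what makes negative steps and empty ranges work. A minimal standalone sketch, using the hypothetical name `rangeSketch`:

```javascript
// Hypothetical sketch of a Python-style range().
function rangeSketch(start, stop, step) {
  if (stop === undefined) {
    // Single-argument form: count from 0 up to `start`.
    stop = start || 0;
    start = 0;
  }
  step = step || 1; // a missing or zero step falls back to 1
  const len = Math.max(Math.ceil((stop - start) / step), 0);
  const out = new Array(len);
  for (let i = 0; i < len; i++, start += step) out[i] = start;
  return out;
}

console.log(rangeSketch(0, 10, 3)); // [ 0, 3, 6, 9 ]
console.log(rangeSketch(5, 0, -2)); // [ 5, 3, 1 ]
```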
// Function (ahem) Functions
// ------------------
// Reusable constructor function for prototype setting.
var ctor = function(){};
// Create a function bound to a given object (assigning `this`, and arguments,
// optionally). Binding with arguments is also known as `curry`.
// Delegates to **ECMAScript 5**'s native `Function.bind` if available.
// We check for `func.bind` first, to fail fast when `func` is undefined.
_.bind = function bind(func, context) {
var bound, args;
if (func.bind === nativeBind && nativeBind) return nativeBind.apply(func, slice.call(arguments, 1));
if (!_.isFunction(func)) throw new TypeError;
args = slice.call(arguments, 2);
return bound = function() {
if (!(this instanceof bound)) return func.apply(context, args.concat(slice.call(arguments)));
ctor.prototype = func.prototype;
var self = new ctor;
var result = func.apply(self, args.concat(slice.call(arguments)));
if (Object(result) === result) return result;
return self;
};
};
// Bind all of an object's methods to that object. Useful for ensuring that
// all callbacks defined on an object belong to it.
_.bindAll = function(obj) {
var funcs = slice.call(arguments, 1);
if (funcs.length == 0) funcs = _.functions(obj);
each(funcs, function(f) { obj[f] = _.bind(obj[f], obj); });
return obj;
};
// Memoize an expensive function by storing its results.
_.memoize = function(func, hasher) {
var memo = {};
hasher || (hasher = _.identity);
return function() {
var key = hasher.apply(this, arguments);
return _.has(memo, key) ? memo[key] : (memo[key] = func.apply(this, arguments));
};
};
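
The caching pattern in `_.memoize` above can be illustrated in isolation. This is a minimal sketch rather than Underscore's code: `memoizeSketch` is a hypothetical name, and it stores results in a `Map` (keyed by identity) where the original uses a plain object (keyed by string). As in the original, the default hasher looks only at the first argument.

```javascript
// Hypothetical sketch: cache results keyed by hasher(...args).
function memoizeSketch(fn, hasher = (first) => first) {
  const memo = new Map();
  return function (...args) {
    const key = hasher.apply(this, args);
    if (!memo.has(key)) memo.set(key, fn.apply(this, args));
    return memo.get(key);
  };
}

let calls = 0;
const square = memoizeSketch((n) => { calls++; return n * n; });
square(4);
square(4);
console.log(calls); // 1
```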
// Delays a function for the given number of milliseconds, and then calls
// it with the arguments supplied.
_.delay = function(func, wait) {
var args = slice.call(arguments, 2);
return setTimeout(function(){ return func.apply(func, args); }, wait);
};
// Defers a function, scheduling it to run after the current call stack has
// cleared.
_.defer = function(func) {
return _.delay.apply(_, [func, 1].concat(slice.call(arguments, 1)));
};
// Returns a function, that, when invoked, will only be triggered at most once
// during a given window of time.
_.throttle = function(func, wait) {
var context, args, timeout, throttling, more;
var whenDone = _.debounce(function(){ more = throttling = false; }, wait);
return function() {
context = this; args = arguments;
var later = function() {
timeout = null;
if (more) func.apply(context, args);
whenDone();
};
if (!timeout) timeout = setTimeout(later, wait);
if (throttling) {
more = true;
} else {
func.apply(context, args);
}
whenDone();
throttling = true;
};
};
// Returns a function, that, as long as it continues to be invoked, will not
// be triggered. The function will be called after it stops being called for
// N milliseconds.
_.debounce = function(func, wait) {
var timeout;
return function() {
var context = this, args = arguments;
var later = function() {
timeout = null;
func.apply(context, args);
};
clearTimeout(timeout);
timeout = setTimeout(later, wait);
};
};
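
The behaviour of `_.debounce` above (only the last call in a burst runs, `wait` milliseconds after the burst ends) is easier to check when the timer functions can be swapped out. A minimal sketch, assuming a hypothetical `debounceSketch` name and a hypothetical `timers` parameter that is not part of Underscore:

```javascript
// Hypothetical sketch: debounce with injectable timer functions.
function debounceSketch(fn, wait, timers = {
  set: (callback, ms) => setTimeout(callback, ms),
  clear: (id) => clearTimeout(id),
}) {
  let timeout;
  return function (...args) {
    // Cancel the pending call, if any, and schedule a fresh one.
    timers.clear(timeout);
    timeout = timers.set(() => {
      timeout = null;
      fn.apply(this, args);
    }, wait);
  };
}
```

With the default `timers`, `debounceSketch(save, 250)` behaves like a conventional debounce; a test can pass fake `set` and `clear` functions to fire the pending callback by hand.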
// Returns a function that will be executed at most one time, no matter how
// often you call it. Useful for lazy initialization.
_.once = function(func) {
var ran = false, memo;
return function() {
if (ran) return memo;
ran = true;
return memo = func.apply(this, arguments);
};
};
// Returns the first function passed as an argument to the second,
// allowing you to adjust arguments, run code before and after, and
// conditionally execute the original function.
_.wrap = function(func, wrapper) {
return function() {
var args = [func].concat(slice.call(arguments, 0));
return wrapper.apply(this, args);
};
};
// Returns a function that is the composition of a list of functions, each
// consuming the return value of the function that follows.
_.compose = function() {
var funcs = arguments;
return function() {
var args = arguments;
for (var i = funcs.length - 1; i >= 0; i--) {
args = [funcs[i].apply(this, args)];
}
return args[0];
};
};
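
The right-to-left application order of `_.compose` above means `compose(f, g)(x)` evaluates `f(g(x))`. A minimal standalone sketch, using the hypothetical name `composeSketch`:

```javascript
// Hypothetical sketch: compose functions right to left.
function composeSketch(...funcs) {
  return function (...args) {
    for (let i = funcs.length - 1; i >= 0; i--) {
      // Each result becomes the single argument of the next function.
      args = [funcs[i].apply(this, args)];
    }
    return args[0];
  };
}

const greet = (name) => 'hi: ' + name;
const exclaim = (statement) => statement + '!';
console.log(composeSketch(greet, exclaim)('moe')); // hi: moe!
```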
// Returns a function that will only be executed after being called N times.
_.after = function(times, func) {
if (times <= 0) return func();
return function() {
if (--times < 1) { return func.apply(this, arguments); }
};
};
// Object Functions
// ----------------
// Retrieve the names of an object's properties.
// Delegates to **ECMAScript 5**'s native `Object.keys`
_.keys = nativeKeys || function(obj) {
if (obj !== Object(obj)) throw new TypeError('Invalid object');
var keys = [];
for (var key in obj) if (_.has(obj, key)) keys[keys.length] = key;
return keys;
};
// Retrieve the values of an object's properties.
_.values = function(obj) {
return _.map(obj, _.identity);
};
// Return a sorted list of the function names available on the object.
// Aliased as `methods`
_.functions = _.methods = function(obj) {
var names = [];
for (var key in obj) {
if (_.isFunction(obj[key])) names.push(key);
}
return names.sort();
};
// Extend a given object with all the properties in passed-in object(s).
_.extend = function(obj) {
each(slice.call(arguments, 1), function(source) {
for (var prop in source) {
obj[prop] = source[prop];
}
});
return obj;
};
// Fill in a given object with default properties.
_.defaults = function(obj) {
each(slice.call(arguments, 1), function(source) {
for (var prop in source) {
if (obj[prop] == null) obj[prop] = source[prop];
}
});
return obj;
};
// Create a (shallow-cloned) duplicate of an object.
_.clone = function(obj) {
if (!_.isObject(obj)) return obj;
return _.isArray(obj) ? obj.slice() : _.extend({}, obj);
};
// Invokes interceptor with the obj, and then returns obj.
// The primary purpose of this method is to "tap into" a method chain, in
// order to perform operations on intermediate results within the chain.
_.tap = function(obj, interceptor) {
interceptor(obj);
return obj;
};
// Internal recursive comparison function.
function eq(a, b, stack) {
// Identical objects are equal. `0 === -0`, but they aren't identical.
// See the Harmony `egal` proposal: http://wiki.ecmascript.org/doku.php?id=harmony:egal.
if (a === b) return a !== 0 || 1 / a == 1 / b;
// A strict comparison is necessary because `null == undefined`.
if (a == null || b == null) return a === b;
// Unwrap any wrapped objects.
if (a._chain) a = a._wrapped;
if (b._chain) b = b._wrapped;
// Invoke a custom `isEqual` method if one is provided.
if (a.isEqual && _.isFunction(a.isEqual)) return a.isEqual(b);
if (b.isEqual && _.isFunction(b.isEqual)) return b.isEqual(a);
// Compare `[[Class]]` names.
var className = toString.call(a);
if (className != toString.call(b)) return false;
switch (className) {
// Strings, numbers, dates, and booleans are compared by value.
case '[object String]':
// Primitives and their corresponding object wrappers are equivalent; thus, `"5"` is
// equivalent to `new String("5")`.
return a == String(b);
case '[object Number]':
// `NaN`s are equivalent, but non-reflexive. An `egal` comparison is performed for
// other numeric values.
return a != +a ? b != +b : (a == 0 ? 1 / a == 1 / b : a == +b);
case '[object Date]':
case '[object Boolean]':
// Coerce dates and booleans to numeric primitive values. Dates are compared by their
// millisecond representations. Note that invalid dates with millisecond representations
// of `NaN` are not equivalent.
return +a == +b;
// RegExps are compared by their source patterns and flags.
case '[object RegExp]':
return a.source == b.source &&
a.global == b.global &&
a.multiline == b.multiline &&
a.ignoreCase == b.ignoreCase;
}
if (typeof a != 'object' || typeof b != 'object') return false;
// Assume equality for cyclic structures. The algorithm for detecting cyclic
// structures is adapted from ES 5.1 section 15.12.3, abstract operation `JO`.
var length = stack.length;
while (length--) {
// Linear search. Performance is inversely proportional to the number of
// unique nested structures.
if (stack[length] == a) return true;
}
// Add the first object to the stack of traversed objects.
stack.push(a);
var size = 0, result = true;
// Recursively compare objects and arrays.
if (className == '[object Array]') {
// Compare array lengths to determine if a deep comparison is necessary.
size = a.length;
result = size == b.length;
if (result) {
// Deep compare the contents, ignoring non-numeric properties.
while (size--) {
// Ensure commutative equality for sparse arrays.
if (!(result = size in a == size in b && eq(a[size], b[size], stack))) break;
}
}
} else {
// Objects with different constructors are not equivalent.
if ('constructor' in a != 'constructor' in b || a.constructor != b.constructor) return false;
// Deep compare objects.
for (var key in a) {
if (_.has(a, key)) {
// Count the expected number of properties.
size++;
// Deep compare each member.
if (!(result = _.has(b, key) && eq(a[key], b[key], stack))) break;
}
}
// Ensure that both objects contain the same number of properties.
if (result) {
for (key in b) {
if (_.has(b, key) && !(size--)) break;
}
result = !size;
}
}
// Remove the first object from the stack of traversed objects.
stack.pop();
return result;
}
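
The first line of `eq` above distinguishes `0` from `-0`, and its `[object Number]` branch treats `NaN` as equal to itself. A minimal sketch of just that primitive comparison, using the hypothetical name `egal`; modern engines expose the same semantics as `Object.is`, so the sketch can be checked against it.

```javascript
// Hypothetical sketch of an "egal" comparison for primitive values.
function egal(a, b) {
  // Strictly equal values are identical unless they are 0 and -0,
  // which differ in the sign of their reciprocal.
  if (a === b) return a !== 0 || 1 / a === 1 / b;
  // NaN is the only value not equal to itself.
  return a !== a && b !== b;
}

console.log(egal(0, -0));    // false
console.log(egal(NaN, NaN)); // true
```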
// Perform a deep comparison to check if two objects are equal.
_.isEqual = function(a, b) {
return eq(a, b, []);
};
// Is a given array, string, or object empty?
// An "empty" object has no enumerable own-properties.
_.isEmpty = function(obj) {
if (_.isArray(obj) || _.isString(obj)) return obj.length === 0;
for (var key in obj) if (_.has(obj, key)) return false;
return true;
};
// Is a given value a DOM element?
_.isElement = function(obj) {
return !!(obj && obj.nodeType == 1);
};
// Is a given value an array?
// Delegates to ECMA5's native Array.isArray
_.isArray = nativeIsArray || function(obj) {
return toString.call(obj) == '[object Array]';
};
// Is a given variable an object?
_.isObject = function(obj) {
return obj === Object(obj);
};
// Is a given variable an arguments object?
_.isArguments = function(obj) {
return toString.call(obj) == '[object Arguments]';
};
if (!_.isArguments(arguments)) {
_.isArguments = function(obj) {
return !!(obj && _.has(obj, 'callee'));
};
}
// Is a given value a function?
_.isFunction = function(obj) {
return toString.call(obj) == '[object Function]';
};
// Is a given value a string?
_.isString = function(obj) {
return toString.call(obj) == '[object String]';
};
// Is a given value a number?
_.isNumber = function(obj) {
return toString.call(obj) == '[object Number]';
};
// Is the given value `NaN`?
_.isNaN = function(obj) {
// `NaN` is the only value for which `===` is not reflexive.
return obj !== obj;
};
// Is a given value a boolean?
_.isBoolean = function(obj) {
return obj === true || obj === false || toString.call(obj) == '[object Boolean]';
};
// Is a given value a date?
_.isDate = function(obj) {
return toString.call(obj) == '[object Date]';
};
// Is the given value a regular expression?
_.isRegExp = function(obj) {
return toString.call(obj) == '[object RegExp]';
};
// Is a given value equal to null?
_.isNull = function(obj) {
return obj === null;
};
// Is a given variable undefined?
_.isUndefined = function(obj) {
return obj === void 0;
};
// Has own property?
_.has = function(obj, key) {
return hasOwnProperty.call(obj, key);
};
// Utility Functions
// -----------------
// Run Underscore.js in *noConflict* mode, returning the `_` variable to its
// previous owner. Returns a reference to the Underscore object.
_.noConflict = function() {
root._ = previousUnderscore;
return this;
};
// Keep the identity function around for default iterators.
_.identity = function(value) {
return value;
};
// Run a function **n** times.
_.times = function (n, iterator, context) {
for (var i = 0; i < n; i++) iterator.call(context, i);
};
// Escape a string for HTML interpolation.
_.escape = function(string) {
return (''+string).replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/"/g, '&quot;').replace(/'/g, '&#x27;').replace(/\//g,'&#x2F;');
};
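
The replacement order in `_.escape` above matters: the ampersand has to be handled first, otherwise the `&` in entities produced by the later replacements would be escaped a second time. A minimal standalone sketch, using the hypothetical name `escapeHtml`:

```javascript
// Hypothetical sketch: escape a string for HTML interpolation.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;') // must run first
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#x27;')
    .replace(/\//g, '&#x2F;');
}

console.log(escapeHtml('Tom & Jerry')); // Tom &amp; Jerry
```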
// Add your own custom functions to the Underscore object, ensuring that
// they're correctly added to the OOP wrapper as well.
_.mixin = function(obj) {
each(_.functions(obj), function(name){
addToWrapper(name, _[name] = obj[name]);
});
};
// Generate a unique integer id (unique within the entire client session).
// Useful for temporary DOM ids.
var idCounter = 0;
_.uniqueId = function(prefix) {
var id = idCounter++;
return prefix ? prefix + id : id;
};
// By default, Underscore uses ERB-style template delimiters; change the
// following template settings to use alternative delimiters.
_.templateSettings = {
evaluate : /<%([\s\S]+?)%>/g,
interpolate : /<%=([\s\S]+?)%>/g,
escape : /<%-([\s\S]+?)%>/g
};
// When customizing `templateSettings`, if you don't want to define an
// interpolation, evaluation or escaping regex, we need one that is
// guaranteed not to match.
var noMatch = /.^/;
// Within an interpolation, evaluation, or escaping, remove HTML escaping
// that had been previously added.
var unescape = function(code) {
return code.replace(/\\\\/g, '\\').replace(/\\'/g, "'");
};
// JavaScript micro-templating, similar to John Resig's implementation.
// Underscore templating handles arbitrary delimiters, preserves whitespace,
// and correctly escapes quotes within interpolated code.
_.template = function(str, data) {
var c = _.templateSettings;
var tmpl = 'var __p=[],print=function(){__p.push.apply(__p,arguments);};' +
'with(obj||{}){__p.push(\'' +
str.replace(/\\/g, '\\\\')
.replace(/'/g, "\\'")
.replace(c.escape || noMatch, function(match, code) {
return "',_.escape(" + unescape(code) + "),'";
})
.replace(c.interpolate || noMatch, function(match, code) {
return "'," + unescape(code) + ",'";
})
.replace(c.evaluate || noMatch, function(match, code) {
return "');" + unescape(code).replace(/[\r\n\t]/g, ' ') + ";__p.push('";
})
.replace(/\r/g, '\\r')
.replace(/\n/g, '\\n')
.replace(/\t/g, '\\t')
+ "');}return __p.join('');";
var func = new Function('obj', '_', tmpl);
if (data) return func(data, _);
return function(data) {
return func.call(this, data, _);
};
};
// Add a "chain" function, which will delegate to the wrapper.
_.chain = function(obj) {
return _(obj).chain();
};
// The OOP Wrapper
// ---------------
// If Underscore is called as a function, it returns a wrapped object that
// can be used OO-style. This wrapper holds altered versions of all the
// underscore functions. Wrapped objects may be chained.
var wrapper = function(obj) { this._wrapped = obj; };
// Expose `wrapper.prototype` as `_.prototype`
_.prototype = wrapper.prototype;
// Helper function to continue chaining intermediate results.
var result = function(obj, chain) {
return chain ? _(obj).chain() : obj;
};
// A method to easily add functions to the OOP wrapper.
var addToWrapper = function(name, func) {
wrapper.prototype[name] = function() {
var args = slice.call(arguments);
unshift.call(args, this._wrapped);
return result(func.apply(_, args), this._chain);
};
};
// Add all of the Underscore functions to the wrapper object.
_.mixin(_);
// Add all mutator Array functions to the wrapper.
each(['pop', 'push', 'reverse', 'shift', 'sort', 'splice', 'unshift'], function(name) {
var method = ArrayProto[name];
wrapper.prototype[name] = function() {
var wrapped = this._wrapped;
method.apply(wrapped, arguments);
var length = wrapped.length;
if ((name == 'shift' || name == 'splice') && length === 0) delete wrapped[0];
return result(wrapped, this._chain);
};
});
// Add all accessor Array functions to the wrapper.
each(['concat', 'join', 'slice'], function(name) {
var method = ArrayProto[name];
wrapper.prototype[name] = function() {
return result(method.apply(this._wrapped, arguments), this._chain);
};
});
// Start chaining a wrapped Underscore object.
wrapper.prototype.chain = function() {
this._chain = true;
return this;
};
// Extracts the result from a wrapped and chained object.
wrapper.prototype.value = function() {
return this._wrapped;
};
}).call(this);

View file

@ -1,31 +0,0 @@
// Underscore.js 1.3.1
// (c) 2009-2012 Jeremy Ashkenas, DocumentCloud Inc.
// Underscore is freely distributable under the MIT license.
// Portions of Underscore are inspired or borrowed from Prototype,
// Oliver Steele's Functional, and John Resig's Micro-Templating.
// For all details and documentation:
// http://documentcloud.github.com/underscore
(function(){function q(a,c,d){if(a===c)return a!==0||1/a==1/c;if(a==null||c==null)return a===c;if(a._chain)a=a._wrapped;if(c._chain)c=c._wrapped;if(a.isEqual&&b.isFunction(a.isEqual))return a.isEqual(c);if(c.isEqual&&b.isFunction(c.isEqual))return c.isEqual(a);var e=l.call(a);if(e!=l.call(c))return false;switch(e){case "[object String]":return a==String(c);case "[object Number]":return a!=+a?c!=+c:a==0?1/a==1/c:a==+c;case "[object Date]":case "[object Boolean]":return+a==+c;case "[object RegExp]":return a.source==
c.source&&a.global==c.global&&a.multiline==c.multiline&&a.ignoreCase==c.ignoreCase}if(typeof a!="object"||typeof c!="object")return false;for(var f=d.length;f--;)if(d[f]==a)return true;d.push(a);var f=0,g=true;if(e=="[object Array]"){if(f=a.length,g=f==c.length)for(;f--;)if(!(g=f in a==f in c&&q(a[f],c[f],d)))break}else{if("constructor"in a!="constructor"in c||a.constructor!=c.constructor)return false;for(var h in a)if(b.has(a,h)&&(f++,!(g=b.has(c,h)&&q(a[h],c[h],d))))break;if(g){for(h in c)if(b.has(c,
h)&&!f--)break;g=!f}}d.pop();return g}var r=this,G=r._,n={},k=Array.prototype,o=Object.prototype,i=k.slice,H=k.unshift,l=o.toString,I=o.hasOwnProperty,w=k.forEach,x=k.map,y=k.reduce,z=k.reduceRight,A=k.filter,B=k.every,C=k.some,p=k.indexOf,D=k.lastIndexOf,o=Array.isArray,J=Object.keys,s=Function.prototype.bind,b=function(a){return new m(a)};if(typeof exports!=="undefined"){if(typeof module!=="undefined"&&module.exports)exports=module.exports=b;exports._=b}else r._=b;b.VERSION="1.3.1";var j=b.each=
b.forEach=function(a,c,d){if(a!=null)if(w&&a.forEach===w)a.forEach(c,d);else if(a.length===+a.length)for(var e=0,f=a.length;e<f;e++){if(e in a&&c.call(d,a[e],e,a)===n)break}else for(e in a)if(b.has(a,e)&&c.call(d,a[e],e,a)===n)break};b.map=b.collect=function(a,c,b){var e=[];if(a==null)return e;if(x&&a.map===x)return a.map(c,b);j(a,function(a,g,h){e[e.length]=c.call(b,a,g,h)});if(a.length===+a.length)e.length=a.length;return e};b.reduce=b.foldl=b.inject=function(a,c,d,e){var f=arguments.length>2;a==
null&&(a=[]);if(y&&a.reduce===y)return e&&(c=b.bind(c,e)),f?a.reduce(c,d):a.reduce(c);j(a,function(a,b,i){f?d=c.call(e,d,a,b,i):(d=a,f=true)});if(!f)throw new TypeError("Reduce of empty array with no initial value");return d};b.reduceRight=b.foldr=function(a,c,d,e){var f=arguments.length>2;a==null&&(a=[]);if(z&&a.reduceRight===z)return e&&(c=b.bind(c,e)),f?a.reduceRight(c,d):a.reduceRight(c);var g=b.toArray(a).reverse();e&&!f&&(c=b.bind(c,e));return f?b.reduce(g,c,d,e):b.reduce(g,c)};b.find=b.detect=
function(a,c,b){var e;E(a,function(a,g,h){if(c.call(b,a,g,h))return e=a,true});return e};b.filter=b.select=function(a,c,b){var e=[];if(a==null)return e;if(A&&a.filter===A)return a.filter(c,b);j(a,function(a,g,h){c.call(b,a,g,h)&&(e[e.length]=a)});return e};b.reject=function(a,c,b){var e=[];if(a==null)return e;j(a,function(a,g,h){c.call(b,a,g,h)||(e[e.length]=a)});return e};b.every=b.all=function(a,c,b){var e=true;if(a==null)return e;if(B&&a.every===B)return a.every(c,b);j(a,function(a,g,h){if(!(e=
e&&c.call(b,a,g,h)))return n});return e};var E=b.some=b.any=function(a,c,d){c||(c=b.identity);var e=false;if(a==null)return e;if(C&&a.some===C)return a.some(c,d);j(a,function(a,b,h){if(e||(e=c.call(d,a,b,h)))return n});return!!e};b.include=b.contains=function(a,c){var b=false;if(a==null)return b;return p&&a.indexOf===p?a.indexOf(c)!=-1:b=E(a,function(a){return a===c})};b.invoke=function(a,c){var d=i.call(arguments,2);return b.map(a,function(a){return(b.isFunction(c)?c||a:a[c]).apply(a,d)})};b.pluck=
function(a,c){return b.map(a,function(a){return a[c]})};b.max=function(a,c,d){if(!c&&b.isArray(a))return Math.max.apply(Math,a);if(!c&&b.isEmpty(a))return-Infinity;var e={computed:-Infinity};j(a,function(a,b,h){b=c?c.call(d,a,b,h):a;b>=e.computed&&(e={value:a,computed:b})});return e.value};b.min=function(a,c,d){if(!c&&b.isArray(a))return Math.min.apply(Math,a);if(!c&&b.isEmpty(a))return Infinity;var e={computed:Infinity};j(a,function(a,b,h){b=c?c.call(d,a,b,h):a;b<e.computed&&(e={value:a,computed:b})});
return e.value};b.shuffle=function(a){var b=[],d;j(a,function(a,f){f==0?b[0]=a:(d=Math.floor(Math.random()*(f+1)),b[f]=b[d],b[d]=a)});return b};b.sortBy=function(a,c,d){return b.pluck(b.map(a,function(a,b,g){return{value:a,criteria:c.call(d,a,b,g)}}).sort(function(a,b){var c=a.criteria,d=b.criteria;return c<d?-1:c>d?1:0}),"value")};b.groupBy=function(a,c){var d={},e=b.isFunction(c)?c:function(a){return a[c]};j(a,function(a,b){var c=e(a,b);(d[c]||(d[c]=[])).push(a)});return d};b.sortedIndex=function(a,
c,d){d||(d=b.identity);for(var e=0,f=a.length;e<f;){var g=e+f>>1;d(a[g])<d(c)?e=g+1:f=g}return e};b.toArray=function(a){return!a?[]:a.toArray?a.toArray():b.isArray(a)?i.call(a):b.isArguments(a)?i.call(a):b.values(a)};b.size=function(a){return b.toArray(a).length};b.first=b.head=function(a,b,d){return b!=null&&!d?i.call(a,0,b):a[0]};b.initial=function(a,b,d){return i.call(a,0,a.length-(b==null||d?1:b))};b.last=function(a,b,d){return b!=null&&!d?i.call(a,Math.max(a.length-b,0)):a[a.length-1]};b.rest=
b.tail=function(a,b,d){return i.call(a,b==null||d?1:b)};b.compact=function(a){return b.filter(a,function(a){return!!a})};b.flatten=function(a,c){return b.reduce(a,function(a,e){if(b.isArray(e))return a.concat(c?e:b.flatten(e));a[a.length]=e;return a},[])};b.without=function(a){return b.difference(a,i.call(arguments,1))};b.uniq=b.unique=function(a,c,d){var d=d?b.map(a,d):a,e=[];b.reduce(d,function(d,g,h){if(0==h||(c===true?b.last(d)!=g:!b.include(d,g)))d[d.length]=g,e[e.length]=a[h];return d},[]);
return e};b.union=function(){return b.uniq(b.flatten(arguments,true))};b.intersection=b.intersect=function(a){var c=i.call(arguments,1);return b.filter(b.uniq(a),function(a){return b.every(c,function(c){return b.indexOf(c,a)>=0})})};b.difference=function(a){var c=b.flatten(i.call(arguments,1));return b.filter(a,function(a){return!b.include(c,a)})};b.zip=function(){for(var a=i.call(arguments),c=b.max(b.pluck(a,"length")),d=Array(c),e=0;e<c;e++)d[e]=b.pluck(a,""+e);return d};b.indexOf=function(a,c,
d){if(a==null)return-1;var e;if(d)return d=b.sortedIndex(a,c),a[d]===c?d:-1;if(p&&a.indexOf===p)return a.indexOf(c);for(d=0,e=a.length;d<e;d++)if(d in a&&a[d]===c)return d;return-1};b.lastIndexOf=function(a,b){if(a==null)return-1;if(D&&a.lastIndexOf===D)return a.lastIndexOf(b);for(var d=a.length;d--;)if(d in a&&a[d]===b)return d;return-1};b.range=function(a,b,d){arguments.length<=1&&(b=a||0,a=0);for(var d=arguments[2]||1,e=Math.max(Math.ceil((b-a)/d),0),f=0,g=Array(e);f<e;)g[f++]=a,a+=d;return g};
var F=function(){};b.bind=function(a,c){var d,e;if(a.bind===s&&s)return s.apply(a,i.call(arguments,1));if(!b.isFunction(a))throw new TypeError;e=i.call(arguments,2);return d=function(){if(!(this instanceof d))return a.apply(c,e.concat(i.call(arguments)));F.prototype=a.prototype;var b=new F,g=a.apply(b,e.concat(i.call(arguments)));return Object(g)===g?g:b}};b.bindAll=function(a){var c=i.call(arguments,1);c.length==0&&(c=b.functions(a));j(c,function(c){a[c]=b.bind(a[c],a)});return a};b.memoize=function(a,
c){var d={};c||(c=b.identity);return function(){var e=c.apply(this,arguments);return b.has(d,e)?d[e]:d[e]=a.apply(this,arguments)}};b.delay=function(a,b){var d=i.call(arguments,2);return setTimeout(function(){return a.apply(a,d)},b)};b.defer=function(a){return b.delay.apply(b,[a,1].concat(i.call(arguments,1)))};b.throttle=function(a,c){var d,e,f,g,h,i=b.debounce(function(){h=g=false},c);return function(){d=this;e=arguments;var b;f||(f=setTimeout(function(){f=null;h&&a.apply(d,e);i()},c));g?h=true:
a.apply(d,e);i();g=true}};b.debounce=function(a,b){var d;return function(){var e=this,f=arguments;clearTimeout(d);d=setTimeout(function(){d=null;a.apply(e,f)},b)}};b.once=function(a){var b=false,d;return function(){if(b)return d;b=true;return d=a.apply(this,arguments)}};b.wrap=function(a,b){return function(){var d=[a].concat(i.call(arguments,0));return b.apply(this,d)}};b.compose=function(){var a=arguments;return function(){for(var b=arguments,d=a.length-1;d>=0;d--)b=[a[d].apply(this,b)];return b[0]}};
b.after=function(a,b){return a<=0?b():function(){if(--a<1)return b.apply(this,arguments)}};b.keys=J||function(a){if(a!==Object(a))throw new TypeError("Invalid object");var c=[],d;for(d in a)b.has(a,d)&&(c[c.length]=d);return c};b.values=function(a){return b.map(a,b.identity)};b.functions=b.methods=function(a){var c=[],d;for(d in a)b.isFunction(a[d])&&c.push(d);return c.sort()};b.extend=function(a){j(i.call(arguments,1),function(b){for(var d in b)a[d]=b[d]});return a};b.defaults=function(a){j(i.call(arguments,
1),function(b){for(var d in b)a[d]==null&&(a[d]=b[d])});return a};b.clone=function(a){return!b.isObject(a)?a:b.isArray(a)?a.slice():b.extend({},a)};b.tap=function(a,b){b(a);return a};b.isEqual=function(a,b){return q(a,b,[])};b.isEmpty=function(a){if(b.isArray(a)||b.isString(a))return a.length===0;for(var c in a)if(b.has(a,c))return false;return true};b.isElement=function(a){return!!(a&&a.nodeType==1)};b.isArray=o||function(a){return l.call(a)=="[object Array]"};b.isObject=function(a){return a===Object(a)};
b.isArguments=function(a){return l.call(a)=="[object Arguments]"};if(!b.isArguments(arguments))b.isArguments=function(a){return!(!a||!b.has(a,"callee"))};b.isFunction=function(a){return l.call(a)=="[object Function]"};b.isString=function(a){return l.call(a)=="[object String]"};b.isNumber=function(a){return l.call(a)=="[object Number]"};b.isNaN=function(a){return a!==a};b.isBoolean=function(a){return a===true||a===false||l.call(a)=="[object Boolean]"};b.isDate=function(a){return l.call(a)=="[object Date]"};
b.isRegExp=function(a){return l.call(a)=="[object RegExp]"};b.isNull=function(a){return a===null};b.isUndefined=function(a){return a===void 0};b.has=function(a,b){return I.call(a,b)};b.noConflict=function(){r._=G;return this};b.identity=function(a){return a};b.times=function(a,b,d){for(var e=0;e<a;e++)b.call(d,e)};b.escape=function(a){return(""+a).replace(/&/g,"&amp;").replace(/</g,"&lt;").replace(/>/g,"&gt;").replace(/"/g,"&quot;").replace(/'/g,"&#x27;").replace(/\//g,"&#x2F;")};b.mixin=function(a){j(b.functions(a),
function(c){K(c,b[c]=a[c])})};var L=0;b.uniqueId=function(a){var b=L++;return a?a+b:b};b.templateSettings={evaluate:/<%([\s\S]+?)%>/g,interpolate:/<%=([\s\S]+?)%>/g,escape:/<%-([\s\S]+?)%>/g};var t=/.^/,u=function(a){return a.replace(/\\\\/g,"\\").replace(/\\'/g,"'")};b.template=function(a,c){var d=b.templateSettings,d="var __p=[],print=function(){__p.push.apply(__p,arguments);};with(obj||{}){__p.push('"+a.replace(/\\/g,"\\\\").replace(/'/g,"\\'").replace(d.escape||t,function(a,b){return"',_.escape("+
u(b)+"),'"}).replace(d.interpolate||t,function(a,b){return"',"+u(b)+",'"}).replace(d.evaluate||t,function(a,b){return"');"+u(b).replace(/[\r\n\t]/g," ")+";__p.push('"}).replace(/\r/g,"\\r").replace(/\n/g,"\\n").replace(/\t/g,"\\t")+"');}return __p.join('');",e=new Function("obj","_",d);return c?e(c,b):function(a){return e.call(this,a,b)}};b.chain=function(a){return b(a).chain()};var m=function(a){this._wrapped=a};b.prototype=m.prototype;var v=function(a,c){return c?b(a).chain():a},K=function(a,c){m.prototype[a]=
function(){var a=i.call(arguments);H.call(a,this._wrapped);return v(c.apply(b,a),this._chain)}};b.mixin(b);j("pop,push,reverse,shift,sort,splice,unshift".split(","),function(a){var b=k[a];m.prototype[a]=function(){var d=this._wrapped;b.apply(d,arguments);var e=d.length;(a=="shift"||a=="splice")&&e===0&&delete d[0];return v(d,this._chain)}});j(["concat","join","slice"],function(a){var b=k[a];m.prototype[a]=function(){return v(b.apply(this._wrapped,arguments),this._chain)}});m.prototype.chain=function(){this._chain=
true;return this};m.prototype.value=function(){return this._wrapped}}).call(this);

View file

@ -1,318 +0,0 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Building from source &#8212; Miller 5.10.2 documentation</title>
<link rel="stylesheet" href="_static/classic.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/language_data.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Why?" href="why.html" />
<link rel="prev" title="Documents by release" href="release-docs.html" />
</head><body>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="why.html" title="Why?"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="release-docs.html" title="Documents by release"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Building from source</a></li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="building-from-source">
<h1>Building from source<a class="headerlink" href="#building-from-source" title="Permalink to this headline"></a></h1>
<p>Please also see <a class="reference internal" href="install.html"><span class="doc">Installation</span></a> for information about pre-built executables.</p>
<div class="section" id="miller-license">
<h2>Miller license<a class="headerlink" href="#miller-license" title="Permalink to this headline"></a></h2>
<p>Two-clause BSD license <a class="reference external" href="https://github.com/johnkerl/miller/blob/master/LICENSE.txt">https://github.com/johnkerl/miller/blob/master/LICENSE.txt</a>.</p>
</div>
<div class="section" id="from-release-tarball-using-autoconfig">
<h2>From release tarball using autoconfig<a class="headerlink" href="#from-release-tarball-using-autoconfig" title="Permalink to this headline"></a></h2>
<p>Miller allows you the option of using GNU <code class="docutils literal notranslate"><span class="pre">autoconfigure</span></code> to build portably.</p>
<p>Grateful acknowledgement: Miller’s GNU autoconfig work was done by the generous and expert efforts of <a class="reference external" href="https://github.com/0-wiz-0/">Thomas Klausner</a>.</p>
<ul class="simple">
<li><p>Obtain <code class="docutils literal notranslate"><span class="pre">mlr-i.j.k.tar.gz</span></code> from <a class="reference external" href="https://github.com/johnkerl/miller/tags">https://github.com/johnkerl/miller/tags</a>, replacing <code class="docutils literal notranslate"><span class="pre">i.j.k</span></code> with the desired release, e.g. <code class="docutils literal notranslate"><span class="pre">2.2.1</span></code>.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">tar</span> <span class="pre">zxvf</span> <span class="pre">mlr-i.j.k.tar.gz</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">cd</span> <span class="pre">mlr-i.j.k</span></code></p></li>
<li><p>Install the following packages using your system’s package manager (<code class="docutils literal notranslate"><span class="pre">apt-get</span></code>, <code class="docutils literal notranslate"><span class="pre">yum</span> <span class="pre">install</span></code>, etc.): <strong>flex</strong></p></li>
<li><p>Various configuration options of your choice, e.g.</p>
<ul>
<li><p><code class="docutils literal notranslate"><span class="pre">./configure</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">./configure</span> <span class="pre">--prefix=/usr/local</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">./configure</span> <span class="pre">--prefix=$HOME/pkgs</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">./configure</span> <span class="pre">CC=clang</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">./configure</span> <span class="pre">--disable-shared</span></code> (to make a statically linked executable)</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">./configure</span> <span class="pre">'CFLAGS=-Wall</span> <span class="pre">-std=gnu99</span> <span class="pre">-O3'</span></code></p></li>
<li><p>etc.</p></li>
</ul>
</li>
<li><p><code class="docutils literal notranslate"><span class="pre">make</span></code> creates the <code class="docutils literal notranslate"><span class="pre">c/mlr</span></code> executable</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">make</span> <span class="pre">check</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">make</span> <span class="pre">install</span></code> copies the <code class="docutils literal notranslate"><span class="pre">c/mlr</span></code> executable to your prefix’s <code class="docutils literal notranslate"><span class="pre">bin</span></code> subdirectory.</p></li>
</ul>
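<p>Taken together, the steps above amount to roughly the following shell session. This is only a sketch: <code class="docutils literal notranslate"><span class="pre">i.j.k</span></code> remains a placeholder for the release you downloaded, and <code class="docutils literal notranslate"><span class="pre">--prefix=$HOME/pkgs</span></code> is just one of the configure options listed above, chosen here for illustration.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span># Assumes flex is already installed via your package manager.
tar zxvf mlr-i.j.k.tar.gz
cd mlr-i.j.k
# Pick whichever configure options suit you; this prefix is an example.
./configure --prefix=$HOME/pkgs
# Build c/mlr, run the tests, then copy c/mlr into the prefix's bin directory.
make
make check
make install
</pre></div>
</div>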
</div>
<div class="section" id="from-git-clone-using-autoconfig">
<h2>From git clone using autoconfig<a class="headerlink" href="#from-git-clone-using-autoconfig" title="Permalink to this headline"></a></h2>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">git</span> <span class="pre">clone</span> <span class="pre">https://github.com/johnkerl/miller</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">cd</span> <span class="pre">miller</span></code></p></li>
<li><p>Install the following packages using your system’s package manager (<code class="docutils literal notranslate"><span class="pre">apt-get</span></code>, <code class="docutils literal notranslate"><span class="pre">yum</span> <span class="pre">install</span></code>, etc.): <strong>automake autoconf libtool flex</strong></p></li>
<li><p>Run <code class="docutils literal notranslate"><span class="pre">autoreconf</span> <span class="pre">-fiv</span></code>. (This is necessary when building from head as discussed in <a class="reference external" href="https://github.com/johnkerl/miller/issues/131">https://github.com/johnkerl/miller/issues/131</a>.)</p></li>
<li><p>Then continue from “Install the following … ” as above.</p></li>
</ul>
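<p>As a rough sketch, the git-clone route combines the steps in this list with the configure and make steps from the tarball section. The bare <code class="docutils literal notranslate"><span class="pre">./configure</span></code> shown here is an assumption for illustration; any of the configure options from the tarball section could be substituted.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span># Assumes automake, autoconf, libtool, and flex are already installed.
git clone https://github.com/johnkerl/miller
cd miller
# Generate the configure script; needed when building from head.
autoreconf -fiv
./configure
make
make check
make install
</pre></div>
</div>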
</div>
<div class="section" id="without-using-autoconfig">
<h2>Without using autoconfig<a class="headerlink" href="#without-using-autoconfig" title="Permalink to this headline"></a></h2>
<p>GNU autoconfig is familiar to many users, and indeed plenty of folks won’t bother to use an open-source software package which doesn’t have autoconfig support. And this is for good reason: GNU autoconfig allows us to build software on a wide diversity of platforms. For this reason I’m happy that Miller supports autoconfig.</p>
<p>But many others (myself included!) find autoconfig confusing: if it works without errors, great, but if not, the <code class="docutils literal notranslate"><span class="pre">./configure</span> <span class="pre">&amp;&amp;</span> <span class="pre">make</span></code> output can be exceedingly difficult to decipher. And this also can be a turn-off for using open-source software: if you can’t figure out the build errors, you may just keep walking. For this reason I’m happy that Miller allows you to build without autoconfig. (Of course, if you have any build errors, feel free to contact me at <a class="reference external" href="mailto:kerl&#46;john&#46;r+miller&#37;&#52;&#48;gmail&#46;com">mailto:kerl<span>&#46;</span>john<span>&#46;</span>r+miller<span>&#64;</span>gmail<span>&#46;</span>com</a> or, better, open an issue with “New Issue” at <a class="reference external" href="https://github.com/johnkerl/miller/issues">https://github.com/johnkerl/miller/issues</a>.)</p>
<p>Steps:</p>
<ul class="simple">
<li><p>Obtain a release tarball or git clone.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">cd</span></code> into the <code class="docutils literal notranslate"><span class="pre">c</span></code> subdirectory.</p></li>
<li><p>Edit the <code class="docutils literal notranslate"><span class="pre">INSTALLDIR</span></code> in <code class="docutils literal notranslate"><span class="pre">Makefile.no-autoconfig</span></code>.</p></li>
<li><p>To change the C compiler, edit the <code class="docutils literal notranslate"><span class="pre">CC=</span></code> lines in <code class="docutils literal notranslate"><span class="pre">Makefile.no-autoconfig</span></code> and <code class="docutils literal notranslate"><span class="pre">dsls/Makefile.no-autoconfig</span></code>.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">make</span> <span class="pre">-f</span> <span class="pre">Makefile.no-autoconfig</span></code> creates the <code class="docutils literal notranslate"><span class="pre">mlr</span></code> executable and runs unit/regression tests (i.e. the equivalent of both <code class="docutils literal notranslate"><span class="pre">make</span></code> and <code class="docutils literal notranslate"><span class="pre">make</span> <span class="pre">check</span></code> using autoconfig).</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">make</span> <span class="pre">install</span></code> copies the <code class="docutils literal notranslate"><span class="pre">mlr</span></code> executable to your install directory.</p></li>
</ul>
<p>The <code class="docutils literal notranslate"><span class="pre">Makefile.no-autoconfig</span></code> is simple: little more than <code class="docutils literal notranslate"><span class="pre">gcc</span> <span class="pre">*.c</span></code>. Customizing is less automatic than autoconfig, but more transparent. I expect this makefile to work with few modifications on a large fraction of modern Linux/BSD-like systems: I’m aware of successful use with <code class="docutils literal notranslate"><span class="pre">gcc</span></code> and <code class="docutils literal notranslate"><span class="pre">clang</span></code>, on Ubuntu 12.04 LTS, SELinux, Darwin (MacOS Yosemite), and FreeBSD.</p>
</div>
<div class="section" id="windows">
<h2>Windows<a class="headerlink" href="#windows" title="Permalink to this headline"></a></h2>
<p><em>Disclaimer: I’m now relying exclusively on</em> <a class="reference external" href="https://ci.appveyor.com/project/johnkerl/miller">Appveyor</a> <em>for Windows builds; I haven’t built from source using MSYS in quite a while.</em></p>
<p>Miller has been built on Windows using MSYS2: <a class="reference external" href="http://www.msys2.org/">http://www.msys2.org/</a>. You can install MSYS2 and build Miller from its source code within MSYS2, and then you can use the binary from outside MSYS2. You can also use a precompiled binary (see above).</p>
<p>You will first need to install MSYS2: <a class="reference external" href="http://www.msys2.org/">http://www.msys2.org/</a>. Then, start an MSYS2 shell, e.g. (supposing you installed MSYS2 to <code class="docutils literal notranslate"><span class="pre">C:\msys2\</span></code>) run <code class="docutils literal notranslate"><span class="pre">C:\msys2\mingw64.exe</span></code>. Within the MSYS2 shell, you can run the following to install dependent packages:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">pacman</span> <span class="o">-</span><span class="n">Syu</span>
<span class="n">pacman</span> <span class="o">-</span><span class="n">Su</span>
<span class="n">pacman</span> <span class="o">-</span><span class="n">S</span> <span class="n">base</span><span class="o">-</span><span class="n">devel</span>
<span class="n">pacman</span> <span class="o">-</span><span class="n">S</span> <span class="n">msys2</span><span class="o">-</span><span class="n">devel</span>
<span class="n">pacman</span> <span class="o">-</span><span class="n">S</span> <span class="n">mingw</span><span class="o">-</span><span class="n">w64</span><span class="o">-</span><span class="n">x86_64</span><span class="o">-</span><span class="n">toolchain</span>
<span class="n">pacman</span> <span class="o">-</span><span class="n">S</span> <span class="n">mingw</span><span class="o">-</span><span class="n">w64</span><span class="o">-</span><span class="n">x86_64</span><span class="o">-</span><span class="n">pcre</span>
<span class="n">pacman</span> <span class="o">-</span><span class="n">S</span> <span class="n">msys2</span><span class="o">-</span><span class="n">runtime</span>
</pre></div>
</div>
<p>The list of dependent packages may also be found in <strong>appveyor.yml</strong> in the Miller base directory.</p>
<p>Then, simply run <strong>msys2-build.sh</strong>, a thin wrapper around <code class="docutils literal notranslate"><span class="pre">./configure</span> <span class="pre">&amp;&amp;</span> <span class="pre">make</span></code> which accommodates certain Windows/MSYS2 idiosyncrasies.</p>
<p>There is a unit-test false-negative issue involving the semantics of the <code class="docutils literal notranslate"><span class="pre">mkstemp</span></code> library routine but a <code class="docutils literal notranslate"><span class="pre">make</span> <span class="pre">-k</span></code> in the <code class="docutils literal notranslate"><span class="pre">c</span></code> subdirectory has been producing a <code class="docutils literal notranslate"><span class="pre">mlr.exe</span></code> for me.</p>
<p>Within MSYS2 you can run <code class="docutils literal notranslate"><span class="pre">mlr</span></code>: simply copy it from the <code class="docutils literal notranslate"><span class="pre">c</span></code> subdirectory to your desired location somewhere within your MSYS2 <code class="docutils literal notranslate"><span class="pre">$PATH</span></code>. To run <code class="docutils literal notranslate"><span class="pre">mlr</span></code> outside of MSYS2, just as with precompiled binaries as described above, you’ll need <code class="docutils literal notranslate"><span class="pre">msys-2.0.dll</span></code>. One way to do this is to augment your path:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">C</span><span class="p">:</span>\<span class="o">&gt;</span> <span class="nb">set</span> <span class="n">PATH</span><span class="o">=%</span><span class="n">PATH</span><span class="o">%</span><span class="p">;</span>\<span class="n">msys64</span>\<span class="n">mingw64</span>\<span class="nb">bin</span>
</pre></div>
</div>
<p>Another way to do it is to copy the Miller executable and the DLL to the same directory:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">C</span><span class="p">:</span>\<span class="o">&gt;</span> <span class="n">mkdir</span> \<span class="n">mbin</span>
<span class="n">C</span><span class="p">:</span>\<span class="o">&gt;</span> <span class="n">copy</span> \<span class="n">msys64</span>\<span class="n">mingw64</span>\<span class="nb">bin</span>\<span class="n">msys</span><span class="o">-</span><span class="mf">2.0</span><span class="o">.</span><span class="n">dll</span> \<span class="n">mbin</span>
<span class="n">C</span><span class="p">:</span>\<span class="o">&gt;</span> <span class="n">copy</span> \<span class="n">msys64</span>\<span class="n">wherever</span>\<span class="n">you</span>\<span class="n">installed</span>\<span class="n">miller</span>\<span class="n">c</span>\<span class="n">mlr</span><span class="o">.</span><span class="n">exe</span> \<span class="n">mbin</span>
<span class="n">C</span><span class="p">:</span>\<span class="o">&gt;</span> <span class="nb">set</span> <span class="n">PATH</span><span class="o">=%</span><span class="n">PATH</span><span class="o">%</span><span class="p">;</span>\<span class="n">mbin</span>
</pre></div>
</div>
</div>
<div class="section" id="in-case-of-problems">
<h2>In case of problems<a class="headerlink" href="#in-case-of-problems" title="Permalink to this headline"></a></h2>
<p>If you have any build errors, feel free to contact me at <a class="reference external" href="mailto:kerl&#46;john&#46;r+miller&#37;&#52;&#48;gmail&#46;com">mailto:kerl<span>&#46;</span>john<span>&#46;</span>r+miller<span>&#64;</span>gmail<span>&#46;</span>com</a> or, better, open an issue with “New Issue” at <a class="reference external" href="https://github.com/johnkerl/miller/issues">https://github.com/johnkerl/miller/issues</a>.</p>
</div>
<div class="section" id="dependencies">
<h2>Dependencies<a class="headerlink" href="#dependencies" title="Permalink to this headline"></a></h2>
<div class="section" id="required-external-dependencies">
<h3>Required external dependencies<a class="headerlink" href="#required-external-dependencies" title="Permalink to this headline"></a></h3>
<p>These are necessary to produce the <code class="docutils literal notranslate"><span class="pre">mlr</span></code> executable.</p>
<ul class="simple">
<li><p>A C compiler such as <code class="docutils literal notranslate"><span class="pre">gcc</span></code> or <code class="docutils literal notranslate"><span class="pre">clang</span></code> (other compilers presumably work too; please open an issue or send me a pull request if you have information about other 21st-century compilers)</p></li>
<li><p>The standard C library</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">flex</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">automake</span></code>, <code class="docutils literal notranslate"><span class="pre">autoconf</span></code>, and <code class="docutils literal notranslate"><span class="pre">libtool</span></code>, if you build with autoconfig</p></li>
</ul>
</div>
<div class="section" id="optional-external-dependencies">
<h3>Optional external dependencies<a class="headerlink" href="#optional-external-dependencies" title="Permalink to this headline"></a></h3>
<p>This documentation pageset is built using Sphinx. Please see <cite>./README.md</cite> for details.</p>
</div>
<div class="section" id="internal-dependencies">
<h3>Internal dependencies<a class="headerlink" href="#internal-dependencies" title="Permalink to this headline"></a></h3>
<p>These are included within the <a class="reference external" href="https://github.com/johnkerl/miller">Miller source tree</a> and do not need to be separately installed (and in fact any separate installation will not be picked up in the Miller build):</p>
<ul class="simple">
<li><p><a class="reference external" href="http://en.wikipedia.org/wiki/Mersenne_Twister">Mersenne Twister</a> for pseudorandom-number generation: <a class="reference external" href="https://github.com/johnkerl/miller/blob/master/c/lib/mtrand.c">C implementation by Nishimura and Matsumoto</a> with license terms respected.</p></li>
<li><p><a class="reference external" href="http://www.jera.com/techinfo/jtns/jtn002.html">MinUnit</a> for unit-testing, with an as-is, no-warranty license (<a class="reference external" href="http://www.jera.com/techinfo/jtns/jtn002.html#License">http://www.jera.com/techinfo/jtns/jtn002.html#License</a>); the copy in the Miller tree is at <a class="reference external" href="https://github.com/johnkerl/miller/blob/master/c/lib/minunit.h">https://github.com/johnkerl/miller/blob/master/c/lib/minunit.h</a>.</p></li>
<li><p>The <a class="reference external" href="http://www.hwaci.com/sw/lemon/">Lemon parser-generator</a>, the author of which explicitly disclaims copyright.</p></li>
<li><p>The <a class="reference external" href="https://github.com/udp/json-parser">udp JSON parser</a>, with BSD2 license.</p></li>
<li><p>The <a class="reference external" href="https://github.com/sheredom/utf8.h">sheredom UTF-8 library</a>, which is free and unencumbered software released into the public domain.</p></li>
<li><p>The NetBSD <code class="docutils literal notranslate"><span class="pre">strptime</span></code> (needed for the Windows/MSYS2 port since MSYS2 lacks this), with BSD license.</p></li>
</ul>
</div>
</div>
<div class="section" id="creating-a-new-release-for-developers">
<h2>Creating a new release: for developers<a class="headerlink" href="#creating-a-new-release-for-developers" title="Permalink to this headline"></a></h2>
<p>At present I’m the primary developer, so this is just my checklist for making new releases.</p>
<p>In this example I am using version 3.4.0; of course that will change for subsequent revisions.</p>
<ul class="simple">
<li><p>Update version found in <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">--version</span></code> and <code class="docutils literal notranslate"><span class="pre">man</span> <span class="pre">mlr</span></code>:</p>
<ul>
<li><p>Edit <code class="docutils literal notranslate"><span class="pre">configure.ac</span></code>, <code class="docutils literal notranslate"><span class="pre">c/mlrvers.h</span></code>, <code class="docutils literal notranslate"><span class="pre">miller.spec</span></code>, and <code class="docutils literal notranslate"><span class="pre">docs/conf.py</span></code> from <code class="docutils literal notranslate"><span class="pre">3.3.2-dev</span></code> to <code class="docutils literal notranslate"><span class="pre">3.4.0</span></code>.</p></li>
<li><p>Do a fresh <code class="docutils literal notranslate"><span class="pre">autoreconf</span> <span class="pre">-fiv</span></code> and commit the output. (Preferably on a Linux host, rather than macOS, to reduce needless diffs in autogen build files.)</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">make</span> <span class="pre">-C</span> <span class="pre">c</span> <span class="pre">-f</span> <span class="pre">Makefile.no-autoconfig</span> <span class="pre">installhome</span> <span class="pre">&amp;&amp;</span> <span class="pre">make</span> <span class="pre">-C</span> <span class="pre">man</span> <span class="pre">-f</span> <span class="pre">Makefile.no-autoconfig</span> <span class="pre">installhome</span> <span class="pre">&amp;&amp;</span> <span class="pre">make</span> <span class="pre">-C</span> <span class="pre">docs</span> <span class="pre">-f</span> <span class="pre">Makefile.no-autoconfig</span> <span class="pre">html</span></code></p></li>
<li><p>The ordering is important: the first build creates <code class="docutils literal notranslate"><span class="pre">mlr</span></code>; the second runs <code class="docutils literal notranslate"><span class="pre">mlr</span></code> to create <code class="docutils literal notranslate"><span class="pre">manpage.txt</span></code>; the third includes <code class="docutils literal notranslate"><span class="pre">manpage.txt</span></code> into one of its outputs.</p></li>
<li><p>Commit and push.</p></li>
</ul>
</li>
<li><p>Create the release tarball and SRPM:</p>
<ul>
<li><p>On buildbox: <code class="docutils literal notranslate"><span class="pre">./configure</span> <span class="pre">&amp;&amp;</span> <span class="pre">make</span> <span class="pre">distcheck</span></code></p></li>
<li><p>On buildbox: make SRPM as in <a class="reference external" href="https://github.com/johnkerl/miller/blob/master/README-RPM.md">https://github.com/johnkerl/miller/blob/master/README-RPM.md</a></p></li>
<li><p>On all buildboxes: <code class="docutils literal notranslate"><span class="pre">cd</span> <span class="pre">c</span></code> and <code class="docutils literal notranslate"><span class="pre">make</span> <span class="pre">-f</span> <span class="pre">Makefile.no-autoconfig</span> <span class="pre">mlr.static</span></code>. Then copy <code class="docutils literal notranslate"><span class="pre">mlr.static</span></code> to <code class="docutils literal notranslate"><span class="pre">../mlr.{arch}</span></code>. (This may require as prerequisite <code class="docutils literal notranslate"><span class="pre">sudo</span> <span class="pre">yum</span> <span class="pre">install</span> <span class="pre">glibc-static</span></code> or the like.)</p></li>
<li><p>For static binaries, please do <code class="docutils literal notranslate"><span class="pre">ldd</span> <span class="pre">mlr.static</span></code> and make sure it says <code class="docutils literal notranslate"><span class="pre">not</span> <span class="pre">a</span> <span class="pre">dynamic</span> <span class="pre">executable</span></code>.</p></li>
<li><p>Then <code class="docutils literal notranslate"><span class="pre">mv</span> <span class="pre">mlr.static</span> <span class="pre">../mlr.linux_x86_64</span></code></p></li>
<li><p>Pull back release tarball <code class="docutils literal notranslate"><span class="pre">mlr-3.4.0.tar.gz</span></code> and SRPM <code class="docutils literal notranslate"><span class="pre">miller-3.4.0-1.el6.src.rpm</span></code> from buildbox, and <code class="docutils literal notranslate"><span class="pre">mlr.{arch}</span></code> binaries from whatever buildboxes.</p></li>
<li><p>Download <code class="docutils literal notranslate"><span class="pre">mlr.exe</span></code> and <code class="docutils literal notranslate"><span class="pre">msys-2.0.dll</span></code> from <a class="reference external" href="https://ci.appveyor.com/project/johnkerl/miller/build/artifacts">https://ci.appveyor.com/project/johnkerl/miller/build/artifacts</a>.</p></li>
</ul>
</li>
<li><p>Create the GitHub release tag:</p>
<ul>
<li><p>Don’t forget the <code class="docutils literal notranslate"><span class="pre">v</span></code> in <code class="docutils literal notranslate"><span class="pre">v3.4.0</span></code></p></li>
<li><p>Write the release notes</p></li>
<li><p>Attach the release tarball, SRPM, and binaries. Double-check assets were successfully uploaded.</p></li>
<li><p>Publish the release</p></li>
</ul>
</li>
<li><p>Check the release-specific docs:</p>
<ul>
<li><p>Look at <a class="reference external" href="https://miller.readthedocs.io">https://miller.readthedocs.io</a> for new-version docs, after a few minutes’ propagation time.</p></li>
</ul>
</li>
<li><p>Notify:</p>
<ul>
<li><p>Submit <code class="docutils literal notranslate"><span class="pre">brew</span></code> pull request; notify any other distros which don’t appear to have autoupdated since the previous release (notes below)</p></li>
<li><p>Similarly for <code class="docutils literal notranslate"><span class="pre">macports</span></code>: <a class="reference external" href="https://github.com/macports/macports-ports/blob/master/textproc/miller/Portfile">https://github.com/macports/macports-ports/blob/master/textproc/miller/Portfile</a>.</p></li>
<li><p>Social-media updates.</p></li>
</ul>
</li>
</ul>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">git</span> <span class="n">remote</span> <span class="n">add</span> <span class="n">upstream</span> <span class="n">https</span><span class="p">:</span><span class="o">//</span><span class="n">github</span><span class="o">.</span><span class="n">com</span><span class="o">/</span><span class="n">Homebrew</span><span class="o">/</span><span class="n">homebrew</span><span class="o">-</span><span class="n">core</span> <span class="c1"># one-time setup only</span>
<span class="n">git</span> <span class="n">fetch</span> <span class="n">upstream</span>
<span class="n">git</span> <span class="n">rebase</span> <span class="n">upstream</span><span class="o">/</span><span class="n">master</span>
<span class="n">git</span> <span class="n">checkout</span> <span class="o">-</span><span class="n">b</span> <span class="n">miller</span><span class="o">-</span><span class="mf">3.4</span><span class="o">.</span><span class="mi">0</span>
<span class="n">shasum</span> <span class="o">-</span><span class="n">a</span> <span class="mi">256</span> <span class="o">/</span><span class="n">path</span><span class="o">/</span><span class="n">to</span><span class="o">/</span><span class="n">mlr</span><span class="o">-</span><span class="mf">3.4</span><span class="o">.</span><span class="mf">0.</span><span class="n">tar</span><span class="o">.</span><span class="n">gz</span>
<span class="n">edit</span> <span class="n">Formula</span><span class="o">/</span><span class="n">miller</span><span class="o">.</span><span class="n">rb</span>
<span class="c1"># Test the URL from the line like</span>
<span class="c1"># url &quot;https://github.com/johnkerl/miller/releases/download/v3.4.0/mlr-3.4.0.tar.gz&quot;</span>
<span class="c1"># in a browser for typos</span>
<span class="c1"># A &#39;@BrewTestBot Test this please&#39; comment within the homebrew-core pull request will restart the homebrew travis build</span>
<span class="n">git</span> <span class="n">add</span> <span class="n">Formula</span><span class="o">/</span><span class="n">miller</span><span class="o">.</span><span class="n">rb</span>
<span class="n">git</span> <span class="n">commit</span> <span class="o">-</span><span class="n">m</span> <span class="s1">&#39;miller 3.4.0&#39;</span>
<span class="n">git</span> <span class="n">push</span> <span class="o">-</span><span class="n">u</span> <span class="n">origin</span> <span class="n">miller</span><span class="o">-</span><span class="mf">3.4</span><span class="o">.</span><span class="mi">0</span>
<span class="p">(</span><span class="n">submit</span> <span class="n">the</span> <span class="n">pull</span> <span class="n">request</span><span class="p">)</span>
</pre></div>
</div>
<ul class="simple">
<li><p>Afterwork:</p>
<ul>
<li><p>Edit <code class="docutils literal notranslate"><span class="pre">configure.ac</span></code> and <code class="docutils literal notranslate"><span class="pre">c/mlrvers.h</span></code> to change version from <code class="docutils literal notranslate"><span class="pre">3.4.0</span></code> to <code class="docutils literal notranslate"><span class="pre">3.4.0-dev</span></code>.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">make</span> <span class="pre">-C</span> <span class="pre">c</span> <span class="pre">-f</span> <span class="pre">Makefile.no-autoconfig</span> <span class="pre">installhome</span> <span class="pre">&amp;&amp;</span> <span class="pre">make</span> <span class="pre">-C</span> <span class="pre">doc</span> <span class="pre">-f</span> <span class="pre">Makefile.no-autoconfig</span> <span class="pre">all</span> <span class="pre">installhome</span></code></p></li>
<li><p>Commit and push.</p></li>
</ul>
</li>
</ul>
</div>
<div class="section" id="misc-development-notes">
<h2>Misc. development notes<a class="headerlink" href="#misc-development-notes" title="Permalink to this headline"></a></h2>
<p>I use terminal width 120 and tabwidth 4.</p>
</div>
</div>
<div class="clearer"></div>
</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Building from source</a><ul>
<li><a class="reference internal" href="#miller-license">Miller license</a></li>
<li><a class="reference internal" href="#from-release-tarball-using-autoconfig">From release tarball using autoconfig</a></li>
<li><a class="reference internal" href="#from-git-clone-using-autoconfig">From git clone using autoconfig</a></li>
<li><a class="reference internal" href="#without-using-autoconfig">Without using autoconfig</a></li>
<li><a class="reference internal" href="#windows">Windows</a></li>
<li><a class="reference internal" href="#in-case-of-problems">In case of problems</a></li>
<li><a class="reference internal" href="#dependencies">Dependencies</a><ul>
<li><a class="reference internal" href="#required-external-dependencies">Required external dependencies</a></li>
<li><a class="reference internal" href="#optional-external-dependencies">Optional external dependencies</a></li>
<li><a class="reference internal" href="#internal-dependencies">Internal dependencies</a></li>
</ul>
</li>
<li><a class="reference internal" href="#creating-a-new-release-for-developers">Creating a new release: for developers</a></li>
<li><a class="reference internal" href="#misc-development-notes">Misc. development notes</a></li>
</ul>
</li>
</ul>
<h4>Previous topic</h4>
<p class="topless"><a href="release-docs.html"
title="previous chapter">Documents by release</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="why.html"
title="next chapter">Why?</a></p>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="_sources/build.rst.txt"
rel="nofollow">Show Source</a></li>
</ul>
</div>
<div id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="search.html" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" />
<input type="submit" value="Go" />
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
>index</a></li>
<li class="right" >
<a href="why.html" title="Why?"
>next</a> |</li>
<li class="right" >
<a href="release-docs.html" title="Documents by release"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Building from source</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2020, John Kerl.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 3.2.1.
</div>
</body>
</html>

@ -1,107 +0,0 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Contact &#8212; Miller 5.10.2 documentation</title>
<link rel="stylesheet" href="_static/classic.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/language_data.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="FAQ" href="faq.html" />
<link rel="prev" title="Internationalization" href="internationalization.html" />
</head><body>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="faq.html" title="FAQ"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="internationalization.html" title="Internationalization"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Contact</a></li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="contact">
<h1>Contact<a class="headerlink" href="#contact" title="Permalink to this headline"></a></h1>
<p>Bug reports, feature requests, etc.: <a class="reference external" href="https://github.com/johnkerl/miller/issues">https://github.com/johnkerl/miller/issues</a></p>
<p>For issues involving this documentation site please also use <a class="reference external" href="https://github.com/johnkerl/miller/issues">https://github.com/johnkerl/miller/issues</a></p>
<p>Other correspondence: <a class="reference external" href="mailto:kerl&#46;john&#46;r+miller&#37;&#52;&#48;gmail&#46;com">mailto:kerl<span>&#46;</span>john<span>&#46;</span>r+miller<span>&#64;</span>gmail<span>&#46;</span>com</a></p>
</div>
<div class="clearer"></div>
</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h4>Previous topic</h4>
<p class="topless"><a href="internationalization.html"
title="previous chapter">Internationalization</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="faq.html"
title="next chapter">FAQ</a></p>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="_sources/contact.rst.txt"
rel="nofollow">Show Source</a></li>
</ul>
</div>
<div id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="search.html" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" />
<input type="submit" value="Go" />
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
>index</a></li>
<li class="right" >
<a href="faq.html" title="FAQ"
>next</a> |</li>
<li class="right" >
<a href="internationalization.html" title="Internationalization"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Contact</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2020, John Kerl.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 3.2.1.
</div>
</body>
</html>

@ -1,610 +0,0 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Cookbook part 2: Random things, and some math &#8212; Miller 5.10.2 documentation</title>
<link rel="stylesheet" href="_static/classic.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/language_data.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Cookbook part 3: Stats with and without out-of-stream variables" href="cookbook3.html" />
<link rel="prev" title="Cookbook part 1: common patterns" href="cookbook.html" />
</head><body>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="cookbook3.html" title="Cookbook part 3: Stats with and without out-of-stream variables"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="cookbook.html" title="Cookbook part 1: common patterns"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Cookbook part 2: Random things, and some math</a></li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="cookbook-part-2-random-things-and-some-math">
<h1>Cookbook part 2: Random things, and some math<a class="headerlink" href="#cookbook-part-2-random-things-and-some-math" title="Permalink to this headline"></a></h1>
<div class="section" id="randomly-selecting-words-from-a-list">
<h2>Randomly selecting words from a list<a class="headerlink" href="#randomly-selecting-words-from-a-list" title="Permalink to this headline"></a></h2>
<p>Given this <a class="reference external" href="https://github.com/johnkerl/miller/blob/master/docs/data/english-words.txt">word list</a>, first take a look to see what the first few lines look like:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ head data/english-words.txt
a
aa
aal
aalii
aam
aardvark
aardwolf
aba
abac
abaca
</pre></div>
</div>
<p>Then the following will randomly sample ten words with four to eight characters in them:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --from data/english-words.txt --nidx filter -S &#39;n=strlen($1);4&lt;=n&amp;&amp;n&lt;=8&#39; then sample -k 10
thionine
birchman
mildewy
avigate
addedly
abaze
askant
aiming
insulant
coinmate
</pre></div>
</div>
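<p>For comparison, the same filter-then-sample idea can be sketched in Python. This is only an illustration of the technique, not how Miller implements <code class="docutils literal notranslate"><span class="pre">sample</span></code>; the function name <code class="docutils literal notranslate"><span class="pre">sample_words</span></code> and the short inline word list are made up for the example.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>import random

def sample_words(words, k, min_len=4, max_len=8, seed=None):
    """Keep words whose length is in [min_len, max_len], then pick up to k at random."""
    rng = random.Random(seed)
    # range() membership expresses the two-sided length check
    eligible = [w for w in words if len(w) in range(min_len, max_len + 1)]
    # sample without replacement; never ask for more words than are eligible
    return rng.sample(eligible, min(k, len(eligible)))

# Hypothetical input: the first ten lines of the word list shown above
words = ["a", "aa", "aal", "aalii", "aam", "aardvark", "aardwolf", "aba", "abac", "abaca"]
picked = sample_words(words, 3, seed=0)
print(picked)
</pre></div>
</div>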
</div>
<div class="section" id="randomly-generating-jabberwocky-words">
<h2>Randomly generating jabberwocky words<a class="headerlink" href="#randomly-generating-jabberwocky-words" title="Permalink to this headline"></a></h2>
<p>These are simple <em>n</em>-grams as <a class="reference external" href="http://johnkerl.org/randspell/randspell-slides-ts.pdf">described here</a>. Some common functions are <a class="reference external" href="https://github.com/johnkerl/miller/blob/master/docs/ngrams/ngfuncs.mlr.txt">located here</a>. Here are scripts for <a class="reference external" href="https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng1.mlr.txt">1-grams</a>, <a class="reference external" href="https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng2.mlr.txt">2-grams</a>, <a class="reference external" href="https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng3.mlr.txt">3-grams</a>, <a class="reference external" href="https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng4.mlr.txt">4-grams</a>, and <a class="reference external" href="https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng5.mlr.txt">5-grams</a>.</p>
<p>The idea is that words from the input file are consumed, then taken apart and pasted back together in ways which imitate the letter-to-letter transitions found in the word list, giving us automatically generated words in the same vein as <em>bromance</em> and <em>spork</em>:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --nidx --from ./ngrams/gsl-2000.txt put -q -f ./ngrams/ngfuncs.mlr -f ./ngrams/ng5.mlr
beard
plastinguish
politicially
noise
loan
country
controductionary
suppery
lose
lessors
dollar
judge
rottendence
lessenger
diffendant
suggestional
</pre></div>
</div>
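<p>The take-apart-and-paste-together idea can be sketched in Python as a table of letter transitions. This is a minimal illustration under my own assumptions, not a port of the Miller scripts linked above: the names <code class="docutils literal notranslate"><span class="pre">build_transitions</span></code> and <code class="docutils literal notranslate"><span class="pre">generate_word</span></code>, the padding characters, and the four-word input are all hypothetical.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>import random
from collections import defaultdict

def build_transitions(words, order=2):
    """Map each context of `order` letters to the letters seen following it."""
    transitions = defaultdict(list)
    for word in words:
        padded = "^" * order + word + "$"  # ^ pads the start, $ marks the end
        for i in range(len(padded) - order):
            transitions[padded[i:i + order]].append(padded[i + order])
    return transitions

def generate_word(transitions, order=2, rng=random, max_len=30):
    """Walk the transition table from the start context until the end marker."""
    context, letters = "^" * order, []
    while len(letters) != max_len:
        nxt = rng.choice(transitions[context])
        if nxt == "$":
            break
        letters.append(nxt)
        context = (context + nxt)[-order:]  # slide the context window forward
    return "".join(letters)

# Hypothetical tiny word list; a real run would read a file of words
table = build_transitions(["brother", "romance", "spoon", "fork"])
print(generate_word(table, rng=random.Random(0)))
</pre></div>
</div>
<p>A larger <code class="docutils literal notranslate"><span class="pre">order</span></code> keeps more context per step, so the generated words stay closer to the input words.</p>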
</div>
<div class="section" id="program-timing">
<h2>Program timing<a class="headerlink" href="#program-timing" title="Permalink to this headline"></a></h2>
<p>This admittedly artificial example demonstrates using Miller time and stats functions to introspectively acquire some information about Miller’s own runtime. The <code class="docutils literal notranslate"><span class="pre">delta</span></code> function computes the difference between successive timestamps.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ ruby -e &#39;10000.times{|i|puts &quot;i=#{i+1}&quot;}&#39; &gt; lines.txt
$ head -n 5 lines.txt
i=1
i=2
i=3
i=4
i=5
$ mlr --ofmt &#39;%.9le&#39; --opprint put &#39;$t=systime()&#39; then step -a delta -f t lines.txt | head -n 7
i t t_delta
1 1430603027.018016 1.430603027e+09
2 1430603027.018043 2.694129944e-05
3 1430603027.018048 5.006790161e-06
4 1430603027.018052 4.053115845e-06
5 1430603027.018055 2.861022949e-06
6 1430603027.018058 3.099441528e-06
$ mlr --ofmt &#39;%.9le&#39; --oxtab \
put &#39;$t=systime()&#39; then \
step -a delta -f t then \
filter &#39;$i&gt;1&#39; then \
stats1 -a min,mean,max -f t_delta \
lines.txt
t_delta_min 2.861022949e-06
t_delta_mean 4.077508505e-06
t_delta_max 5.388259888e-05
</pre></div>
</div>
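<p>The delta-then-summarize pattern can also be sketched in Python. This is an illustration only, and the helper name <code class="docutils literal notranslate"><span class="pre">deltas</span></code> is hypothetical. Unlike the Miller run above, it has no first-record delta to filter out, since it only forms differences between neighbouring timestamps.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>import time

def deltas(values):
    """Differences between each value and its predecessor (the first value has none)."""
    return [b - a for a, b in zip(values, values[1:])]

# Collect a few timestamps, in the role of $t=systime() per record
stamps = []
for _ in range(5):
    stamps.append(time.time())

d = deltas(stamps)
# min, mean, and max of the deltas, in the role of stats1 -a min,mean,max
print(min(d), sum(d) / len(d), max(d))
</pre></div>
</div>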
</div>
<div class="section" id="computing-interquartile-ranges">
<h2>Computing interquartile ranges<a class="headerlink" href="#computing-interquartile-ranges" title="Permalink to this headline"></a></h2>
<p>For one or more specified field names, simply compute p25 and p75, then write the IQR as the difference of p75 and p25:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --oxtab stats1 -f x -a p25,p75 \
then put &#39;$x_iqr = $x_p75 - $x_p25&#39; \
data/medium
x_p25 0.246670
x_p75 0.748186
x_iqr 0.501516
</pre></div>
</div>
<p>For wildcarded field names, first compute p25 and p75, then loop over field names with <code class="docutils literal notranslate"><span class="pre">p25</span></code> in them:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --oxtab stats1 --fr &#39;[i-z]&#39; -a p25,p75 \
then put &#39;for (k,v in $*) {
if (k =~ &quot;(.*)_p25&quot;) {
$[&quot;\1_iqr&quot;] = $[&quot;\1_p75&quot;] - $[&quot;\1_p25&quot;]
}
}&#39; \
data/medium
i_p25 2501
i_p75 7501
x_p25 0.246670
x_p75 0.748186
y_p25 0.252137
y_p75 0.764003
i_iqr 5000
x_iqr 0.501516
y_iqr 0.511866
</pre></div>
</div>
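<p>As a cross-check on the arithmetic, here is a small Python sketch of a non-interpolated percentile and the resulting IQR. The index rule used here (the element at position <code class="docutils literal notranslate"><span class="pre">int(p/100</span> <span class="pre">*</span> <span class="pre">n)</span></code> of the sorted values, clamped to the valid range) is an assumption made for the sketch, and Miller’s exact rule may differ. For the integers 1 through 10000 it gives 2501 and 7501, consistent with the <code class="docutils literal notranslate"><span class="pre">i_p25</span></code> and <code class="docutils literal notranslate"><span class="pre">i_p75</span></code> values shown above.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>def percentile(sorted_values, p):
    """Non-interpolated percentile: the element at index int(p/100 * n), clamped."""
    n = len(sorted_values)
    index = min(max(int(p / 100 * n), 0), n - 1)
    return sorted_values[index]

def iqr(values):
    """Interquartile range: p75 minus p25."""
    ordered = sorted(values)
    return percentile(ordered, 75) - percentile(ordered, 25)

print(iqr(range(1, 10001)))  # prints 5000
</pre></div>
</div>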
</div>
<div class="section" id="computing-weighted-means">
<h2>Computing weighted means<a class="headerlink" href="#computing-weighted-means" title="Permalink to this headline"></a></h2>
<p>This might be more elegantly implemented as an option within the <code class="docutils literal notranslate"><span class="pre">stats1</span></code> verb. Meanwhile, it’s expressible within the DSL:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --from data/medium put -q &#39;
# Using the y field for weighting in this example
weight = $y;
# Using the a field for weighted aggregation in this example
@sumwx[$a] += weight * $i;
@sumw[$a] += weight;
@sumx[$a] += $i;
@sumn[$a] += 1;
end {
map wmean = {};
map mean = {};
for (a in @sumwx) {
wmean[a] = @sumwx[a] / @sumw[a]
}
for (a in @sumx) {
mean[a] = @sumx[a] / @sumn[a]
}
#emit wmean, &quot;a&quot;;
#emit mean, &quot;a&quot;;
emit (wmean, mean), &quot;a&quot;;
}&#39;
a=pan,wmean=4979.563722,mean=5028.259010
a=eks,wmean=4890.381593,mean=4956.290076
a=wye,wmean=4946.987746,mean=4920.001017
a=zee,wmean=5164.719685,mean=5123.092330
a=hat,wmean=4925.533162,mean=4967.743946
</pre></div>
</div>
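<p>The same accumulate-then-divide logic can be sketched in Python, which may help when checking the DSL version by hand. The function name <code class="docutils literal notranslate"><span class="pre">grouped_means</span></code> and the three sample records are hypothetical; the four accumulators mirror <code class="docutils literal notranslate"><span class="pre">@sumwx</span></code>, <code class="docutils literal notranslate"><span class="pre">@sumw</span></code>, <code class="docutils literal notranslate"><span class="pre">@sumx</span></code>, and <code class="docutils literal notranslate"><span class="pre">@sumn</span></code> above.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>from collections import defaultdict

def grouped_means(records, group_key, value_key, weight_key):
    """Per-group weighted and unweighted means of value_key."""
    sumwx = defaultdict(float)  # sum of weight * value
    sumw = defaultdict(float)   # sum of weights
    sumx = defaultdict(float)   # sum of values
    count = defaultdict(int)    # number of records
    for r in records:
        g, x, w = r[group_key], r[value_key], r[weight_key]
        sumwx[g] += w * x
        sumw[g] += w
        sumx[g] += x
        count[g] += 1
    return {
        g: {"wmean": sumwx[g] / sumw[g], "mean": sumx[g] / count[g]}
        for g in sumwx
    }

# Hypothetical records: group by a, average i, weight by y
records = [
    {"a": "pan", "i": 1, "y": 0.5},
    {"a": "pan", "i": 3, "y": 1.5},
    {"a": "eks", "i": 2, "y": 1.0},
]
print(grouped_means(records, "a", "i", "y"))
</pre></div>
</div>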
</div>
<div class="section" id="generating-random-numbers-from-various-distributions">
<h2>Generating random numbers from various distributions<a class="headerlink" href="#generating-random-numbers-from-various-distributions" title="Permalink to this headline"></a></h2>
<p>Here we can chain together a few simple building blocks:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ cat expo-sample.sh
# Generate 100,000 pairs of independent and identically distributed
# exponentially distributed random variables with the same rate parameter
# (namely, 2.5). Then compute histograms of one of them, along with
# histograms for their sum and their product.
#
# See also https://en.wikipedia.org/wiki/Exponential_distribution
#
# Here I&#39;m using a specified random-number seed so this example always
# produces the same output for this web document: in everyday practice we
# wouldn&#39;t do that.
mlr -n \
--seed 0 \
--opprint \
seqgen --stop 100000 \
then put &#39;
# https://en.wikipedia.org/wiki/Inverse_transform_sampling
func expo_sample(lambda) {
return -log(1-urand())/lambda
}
$u = expo_sample(2.5);
$v = expo_sample(2.5);
$s = $u + $v;
$p = $u * $v;
&#39; \
then histogram -f u,s,p --lo 0 --hi 2 --nbins 50 \
then bar -f u_count,s_count,p_count --auto -w 20
</pre></div>
</div>
<p>Namely:</p>
<ul class="simple">
<li><p>Set the Miller random-number seed so this webdoc looks the same every time I regenerate it.</p></li>
<li><p>Use pretty-printed tabular output.</p></li>
<li><p>Use <code class="docutils literal notranslate"><span class="pre">seqgen</span></code> to produce 100,000 records <code class="docutils literal notranslate"><span class="pre">i=1</span></code>, <code class="docutils literal notranslate"><span class="pre">i=2</span></code>, etc.</p></li>
<li><p>Send those to a <code class="docutils literal notranslate"><span class="pre">put</span></code> step which defines an inverse-transform-sampling function and calls it twice, then computes the sum and product of samples.</p></li>
<li><p>Send those to a histogram, and from there to a bar-plotter. This is just for visualization; you could just as well output CSV and send that off to your own plotting tool, etc.</p></li>
</ul>
<p>The output is as follows:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ sh expo-sample.sh
bin_lo bin_hi u_count s_count p_count
0.000000 0.040000 [78]*******************#[9497] [353]#...................[3732] [20]*******************#[39755]
0.040000 0.080000 [78]******************..[9497] [353]*****...............[3732] [20]*******.............[39755]
0.080000 0.120000 [78]****************....[9497] [353]*********...........[3732] [20]****................[39755]
0.120000 0.160000 [78]**************......[9497] [353]************........[3732] [20]***.................[39755]
0.160000 0.200000 [78]*************.......[9497] [353]**************......[3732] [20]**..................[39755]
0.200000 0.240000 [78]************........[9497] [353]****************....[3732] [20]*...................[39755]
0.240000 0.280000 [78]**********..........[9497] [353]******************..[3732] [20]*...................[39755]
0.280000 0.320000 [78]**********..........[9497] [353]******************..[3732] [20]*...................[39755]
0.320000 0.360000 [78]*********...........[9497] [353]*******************.[3732] [20]#...................[39755]
0.360000 0.400000 [78]********............[9497] [353]*******************.[3732] [20]#...................[39755]
0.400000 0.440000 [78]*******.............[9497] [353]*******************#[3732] [20]#...................[39755]
0.440000 0.480000 [78]******..............[9497] [353]******************..[3732] [20]#...................[39755]
0.480000 0.520000 [78]*****...............[9497] [353]******************..[3732] [20]#...................[39755]
0.520000 0.560000 [78]*****...............[9497] [353]******************..[3732] [20]#...................[39755]
0.560000 0.600000 [78]****................[9497] [353]*****************...[3732] [20]#...................[39755]
0.600000 0.640000 [78]****................[9497] [353]*****************...[3732] [20]#...................[39755]
0.640000 0.680000 [78]****................[9497] [353]****************....[3732] [20]#...................[39755]
0.680000 0.720000 [78]***.................[9497] [353]****************....[3732] [20]#...................[39755]
0.720000 0.760000 [78]***.................[9497] [353]**************......[3732] [20]#...................[39755]
0.760000 0.800000 [78]**..................[9497] [353]**************......[3732] [20]#...................[39755]
0.800000 0.840000 [78]**..................[9497] [353]*************.......[3732] [20]#...................[39755]
0.840000 0.880000 [78]**..................[9497] [353]************........[3732] [20]#...................[39755]
0.880000 0.920000 [78]**..................[9497] [353]***********.........[3732] [20]#...................[39755]
0.920000 0.960000 [78]*...................[9497] [353]***********.........[3732] [20]#...................[39755]
0.960000 1.000000 [78]*...................[9497] [353]**********..........[3732] [20]#...................[39755]
1.000000 1.040000 [78]*...................[9497] [353]*********...........[3732] [20]#...................[39755]
1.040000 1.080000 [78]*...................[9497] [353]*********...........[3732] [20]#...................[39755]
1.080000 1.120000 [78]*...................[9497] [353]********............[3732] [20]#...................[39755]
1.120000 1.160000 [78]*...................[9497] [353]********............[3732] [20]#...................[39755]
1.160000 1.200000 [78]#...................[9497] [353]*******.............[3732] [20]#...................[39755]
1.200000 1.240000 [78]#...................[9497] [353]******..............[3732] [20]#...................[39755]
1.240000 1.280000 [78]#...................[9497] [353]*****...............[3732] [20]#...................[39755]
1.280000 1.320000 [78]#...................[9497] [353]*****...............[3732] [20]#...................[39755]
1.320000 1.360000 [78]#...................[9497] [353]*****...............[3732] [20]#...................[39755]
1.360000 1.400000 [78]#...................[9497] [353]****................[3732] [20]#...................[39755]
1.400000 1.440000 [78]#...................[9497] [353]****................[3732] [20]#...................[39755]
1.440000 1.480000 [78]#...................[9497] [353]***.................[3732] [20]#...................[39755]
1.480000 1.520000 [78]#...................[9497] [353]***.................[3732] [20]#...................[39755]
1.520000 1.560000 [78]#...................[9497] [353]***.................[3732] [20]#...................[39755]
1.560000 1.600000 [78]#...................[9497] [353]**..................[3732] [20]#...................[39755]
1.600000 1.640000 [78]#...................[9497] [353]**..................[3732] [20]#...................[39755]
1.640000 1.680000 [78]#...................[9497] [353]*...................[3732] [20]#...................[39755]
1.680000 1.720000 [78]#...................[9497] [353]*...................[3732] [20]#...................[39755]
1.720000 1.760000 [78]#...................[9497] [353]*...................[3732] [20]#...................[39755]
1.760000 1.800000 [78]#...................[9497] [353]*...................[3732] [20]#...................[39755]
1.800000 1.840000 [78]#...................[9497] [353]#...................[3732] [20]#...................[39755]
1.840000 1.880000 [78]#...................[9497] [353]#...................[3732] [20]#...................[39755]
1.880000 1.920000 [78]#...................[9497] [353]#...................[3732] [20]#...................[39755]
1.920000 1.960000 [78]#...................[9497] [353]#...................[3732] [20]#...................[39755]
1.960000 2.000000 [78]#...................[9497] [353]#...................[3732] [20]#...................[39755]
</pre></div>
</div>
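<p>As a cross-check outside Miller, the same inverse-transform sampling can be sketched in Python. This is only an illustration: the names <code class="docutils literal notranslate"><span class="pre">RATE</span></code>, <code class="docutils literal notranslate"><span class="pre">N</span></code>, <code class="docutils literal notranslate"><span class="pre">BIN_WIDTH</span></code> and <code class="docutils literal notranslate"><span class="pre">rng</span></code> are assumptions of the sketch, and Python's random stream differs from Miller's, so the counts will not match the table above digit for digit. For an exponential distribution with rate 2.5 the sample means should land near 0.4 for <code class="docutils literal notranslate"><span class="pre">u</span></code>, 0.8 for the sum, and 0.16 for the product, and the first histogram bin should hold roughly 1 - exp(-0.1), about 9.5%, of the <code class="docutils literal notranslate"><span class="pre">u</span></code> samples, which is consistent with the largest <code class="docutils literal notranslate"><span class="pre">u_count</span></code> of 9497 out of 100,000 shown above.</p>

```python
import math
import random
from collections import Counter

RATE = 2.5          # same rate parameter as expo-sample.sh
N = 100_000         # same sample count as the seqgen step
BIN_WIDTH = 0.04    # (hi - lo) / nbins = (2 - 0) / 50

rng = random.Random(0)  # fixed seed; Python's stream differs from Miller's


def expo_sample(lam):
    """Inverse-transform sampling: solve u = 1 - exp(-lam * x) for x."""
    return -math.log(1.0 - rng.random()) / lam


us = [expo_sample(RATE) for _ in range(N)]
vs = [expo_sample(RATE) for _ in range(N)]
sums = [u + v for u, v in zip(us, vs)]
prods = [u * v for u, v in zip(us, vs)]

mean_u = sum(us) / N        # theory: 1 / RATE = 0.4
mean_s = sum(sums) / N      # theory: 2 / RATE = 0.8
mean_p = sum(prods) / N     # theory: 1 / RATE**2 = 0.16, by independence

# Count samples per histogram bin, indexed by floor(x / BIN_WIDTH).
u_bins = Counter(int(u // BIN_WIDTH) for u in us)
first_bin_fraction = u_bins[0] / N   # theory: 1 - exp(-0.1)

print(mean_u, mean_s, mean_p, first_bin_fraction)
```

<p>The sketch draws from <code class="docutils literal notranslate"><span class="pre">1</span> <span class="pre">-</span> <span class="pre">rng.random()</span></code>, which lies in (0, 1], so the logarithm is always defined.</p>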
</div>
<div class="section" id="sieve-of-eratosthenes">
<h2>Sieve of Eratosthenes<a class="headerlink" href="#sieve-of-eratosthenes" title="Permalink to this headline"></a></h2>
<p>The <a class="reference external" href="http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes">Sieve of Eratosthenes</a> is a standard introductory programming topic. The idea is to find all primes up to some <em>N</em> by making a list of the numbers 1 to <em>N</em>, then striking out all multiples of 2 except 2 itself, all multiples of 3 except 3 itself, all multiples of 4 except 4 itself, and so on. Whatever survives that without getting marked is a prime. This is easy enough in Miller. Notice that here all the work is in <code class="docutils literal notranslate"><span class="pre">begin</span></code> and <code class="docutils literal notranslate"><span class="pre">end</span></code> statements; there is no file input (so we use <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">-n</span></code> to keep Miller from waiting for input data).</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ cat programs/sieve.mlr
# ================================================================
# Sieve of Eratosthenes: simple example of Miller DSL as programming language.
# ================================================================
# Put this in a begin-block so we can do either
# mlr -n put -q -f name-of-this-file.mlr
# or
# mlr -n put -q -f name-of-this-file.mlr -e &#39;@n = 200&#39;
# i.e. 100 is the default upper limit, and another can be specified using -e.
begin {
@n = 100;
}
end {
for (int i = 0; i &lt;= @n; i += 1) {
@s[i] = true;
}
@s[0] = false; # 0 is neither prime nor composite
@s[1] = false; # 1 is neither prime nor composite
# Strike out multiples
for (int i = 2; i &lt;= @n; i += 1) {
for (int j = i+i; j &lt;= @n; j += i) {
@s[j] = false;
}
}
# Print survivors
for (int i = 0; i &lt;= @n; i += 1) {
if (@s[i]) {
print i;
}
}
}
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr -n put -f programs/sieve.mlr
2
3
5
7
11
13
17
19
23
29
31
37
41
43
47
53
59
61
67
71
73
79
83
89
97
</pre></div>
</div>
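<p>For readers who want to compare against another language, here is a minimal Python sketch of the same three passes (initialize, strike out multiples, collect survivors). The function name <code class="docutils literal notranslate"><span class="pre">sieve</span></code> is an assumption of this sketch and not part of Miller; it deliberately keeps the unoptimized loop bounds of <code class="docutils literal notranslate"><span class="pre">sieve.mlr</span></code> so the two are easy to read side by side.</p>

```python
def sieve(n=100):
    """Return the primes up to and including n, mirroring sieve.mlr."""
    is_prime = [True] * (n + 1)
    is_prime[0] = False          # 0 is neither prime nor composite
    if n >= 1:
        is_prime[1] = False      # 1 is neither prime nor composite
    # Strike out multiples of every i, starting at 2*i so i itself survives.
    for i in range(2, n + 1):
        for j in range(i + i, n + 1, i):
            is_prime[j] = False
    # Collect survivors.
    return [i for i in range(n + 1) if is_prime[i]]


primes = sieve(100)
print(len(primes), primes[:5], primes[-1])  # 25 [2, 3, 5, 7, 11] 97
```

<p>With the default limit of 100 this should reproduce the 25 values printed by the Miller run above.</p>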
</div>
<div class="section" id="mandelbrot-set-generator">
<h2>Mandelbrot-set generator<a class="headerlink" href="#mandelbrot-set-generator" title="Permalink to this headline"></a></h2>
<p>The <a class="reference external" href="http://en.wikipedia.org/wiki/Mandelbrot_set">Mandelbrot set</a> is also easily expressed. This isn't an important case of data-processing in the vein for which Miller was designed, but it is an example of Miller as a general-purpose programming language: a test case for the expressiveness of the language.</p>
<p>The (approximate) computation of which points in the complex plane are and aren't members of the set is just a few lines of complex arithmetic (see the Wikipedia article); how to render them is another task. Using graphics libraries you can create PNG or JPEG files, but another fun way to do this is by printing various characters to the screen:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ cat programs/mand.mlr
# Mandelbrot set generator: simple example of Miller DSL as programming language.
begin {
# Set defaults
@rcorn = -2.0;
@icorn = -2.0;
@side = 4.0;
@iheight = 50;
@iwidth = 100;
@maxits = 100;
@levelstep = 5;
@chars = &quot;@X*o-.&quot;; # Palette of characters to print to the screen.
@verbose = false;
@do_julia = false;
@jr = 0.0; # Real part of Julia point, if any
@ji = 0.0; # Imaginary part of Julia point, if any
}
# Here, we can override defaults from an input file (if any). In Miller&#39;s
# put/filter DSL, absent-null right-hand sides result in no assignment so we
# can simply put @rcorn = $rcorn: if there is a field in the input like
# &#39;rcorn = -1.847&#39; we&#39;ll read and use it, else we&#39;ll keep the default.
@rcorn = $rcorn;
@icorn = $icorn;
@side = $side;
@iheight = $iheight;
@iwidth = $iwidth;
@maxits = $maxits;
@levelstep = $levelstep;
@chars = $chars;
@verbose = $verbose;
@do_julia = $do_julia;
@jr = $jr;
@ji = $ji;
end {
if (@verbose) {
print &quot;RCORN = &quot;.@rcorn;
print &quot;ICORN = &quot;.@icorn;
print &quot;SIDE = &quot;.@side;
print &quot;IHEIGHT = &quot;.@iheight;
print &quot;IWIDTH = &quot;.@iwidth;
print &quot;MAXITS = &quot;.@maxits;
print &quot;LEVELSTEP = &quot;.@levelstep;
print &quot;CHARS = &quot;.@chars;
}
# Iterate over a matrix of rows and columns, printing one character for each cell.
for (int ii = @iheight-1; ii &gt;= 0; ii -= 1) {
num pi = @icorn + (ii/@iheight) * @side;
for (int ir = 0; ir &lt; @iwidth; ir += 1) {
num pr = @rcorn + (ir/@iwidth) * @side;
printn get_point_plot(pr, pi, @maxits, @do_julia, @jr, @ji);
}
print;
}
}
# This is a function to approximate membership in the Mandelbrot set (or Julia
# set for a given Julia point if do_julia == true) for a given point in the
# complex plane.
func get_point_plot(pr, pi, maxits, do_julia, jr, ji) {
num zr = 0.0;
num zi = 0.0;
num cr = 0.0;
num ci = 0.0;
if (!do_julia) {
zr = 0.0;
zi = 0.0;
cr = pr;
ci = pi;
} else {
zr = pr;
zi = pi;
cr = jr;
ci = ji;
}
int iti = 0;
bool escaped = false;
num zt = 0;
for (iti = 0; iti &lt; maxits; iti += 1) {
num mag = zr*zr + zi*zi;
if (mag &gt; 4.0) {
escaped = true;
break;
}
# z := z^2 + c
zt = zr*zr - zi*zi + cr;
zi = 2*zr*zi + ci;
zr = zt;
}
if (!escaped) {
return &quot;.&quot;;
} else {
# The // operator is Miller&#39;s (pythonic) integer-division operator
int level = (iti // @levelstep) % strlen(@chars);
return substr(@chars, level, level);
}
}
</pre></div>
</div>
<p>At standard resolution this makes a nice little ASCII plot:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr -n put -f ./programs/mand.mlr
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@XXXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@XXXX.XXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@XXXXXXXooXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@XXXXX**o..*XXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@XXXXXX*-....-oXXXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@XXXXX@XXXXXXXXXX*......o*XXXXXXXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@XXXXXXXXXX**oo*-.-........oo.XXXXXXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@XXXXXXXXXXXXX....................X..o-XXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@XXXXXXXXXXXXXXX*oo......................oXXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@XXX*XXXXXXXXXXXX**o........................*X*X@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@XXXXXXooo***o*.*XX**X..........................o-XX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@XXXXXXXX*-.......-***.............................oXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@XXXXXXXX*@..........Xo............................*XX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@XXXX@XXXXXXXX*o@oX...........@...........................oXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
.........................................................o*XXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@XXXXXXXXX*-.oX...........@...........................oXXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@XXXXXXXXXX**@..........*o............................*XXXXXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@XXXXXXXXXXXXX-........***.............................oXXXXXXXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@XXXXXXXXXXXXoo****o*.XX***@..........................o-XXXXXXXXXXXXX@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@XXXXX*XXXX*XXXXXXX**-........................***XXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@XXXXXXXXXXXXX*o*.....................@o*XXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@XXXXXXXXXXXX*....................*..o-XX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@XXXXX*ooo*-.o........oo.X*XXXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@XXXXXXXXX**@.....*XXXXXXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@XXXXXXXXX*o....-o*XXXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@XXXXXXXXXXo*o..*XXXXXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@XXXXXXXXXXXXX*o*XXXXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@XXXXXXXXXXXX@XXXXXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@XXXXXXXXX@@XXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@XXXXX@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
</pre></div>
</div>
<p>But using a very small font size (as small as my Mac will let me go), and by choosing the coordinates to zoom in on a particular part of the complex plane, we can get a nice little picture:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>#!/bin/bash
# Get the number of rows and columns from the terminal window dimensions
iheight=$(stty size | mlr --nidx --fs space cut -f 1)
iwidth=$(stty size | mlr --nidx --fs space cut -f 2)
echo &quot;rcorn=-1.755350,icorn=+0.014230,side=0.000020,maxits=10000,iheight=$iheight,iwidth=$iwidth&quot; \
| mlr put -f programs/mand.mlr
</pre></div>
</div>
<img alt="_images/mand.png" src="_images/mand.png" />
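<p>The escape-time test at the heart of <code class="docutils literal notranslate"><span class="pre">get_point_plot</span></code> can be sketched in Python as follows. This is an illustration under stated assumptions: it covers only the Mandelbrot case (no Julia mode), the names <code class="docutils literal notranslate"><span class="pre">escape_iteration</span></code> and <code class="docutils literal notranslate"><span class="pre">plot_char</span></code> are invented for the sketch, and it uses the squared magnitude <code class="docutils literal notranslate"><span class="pre">zr*zr</span> <span class="pre">+</span> <span class="pre">zi*zi</span></code> for the escape check.</p>

```python
def escape_iteration(cr, ci, maxits=100):
    """Return the iteration at which z escapes, or None if it never does."""
    zr, zi = 0.0, 0.0
    for iti in range(maxits):
        mag = zr * zr + zi * zi      # squared magnitude |z|^2
        if mag > 4.0:                # equivalent to |z| > 2
            return iti
        # z := z^2 + c, written out in real and imaginary parts
        zr, zi = zr * zr - zi * zi + cr, 2 * zr * zi + ci
    return None


def plot_char(cr, ci, maxits=100, levelstep=5, chars="@X*o-."):
    """Pick the character for one cell, in the manner of get_point_plot."""
    iti = escape_iteration(cr, ci, maxits)
    if iti is None:
        return "."                   # treated as a member of the set
    level = (iti // levelstep) % len(chars)
    return chars[level]


print(plot_char(0.0, 0.0), plot_char(-2.0, -2.0), escape_iteration(1.0, 1.0))
```

<p>Points that escape within the first few iterations, such as the corner at -2-2i, map to the first palette character, which is why the border of the plot is filled with <code class="docutils literal notranslate"><span class="pre">@</span></code>.</p>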
</div>
</div>
<div class="clearer"></div>
</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Cookbook part 2: Random things, and some math</a><ul>
<li><a class="reference internal" href="#randomly-selecting-words-from-a-list">Randomly selecting words from a list</a></li>
<li><a class="reference internal" href="#randomly-generating-jabberwocky-words">Randomly generating jabberwocky words</a></li>
<li><a class="reference internal" href="#program-timing">Program timing</a></li>
<li><a class="reference internal" href="#computing-interquartile-ranges">Computing interquartile ranges</a></li>
<li><a class="reference internal" href="#computing-weighted-means">Computing weighted means</a></li>
<li><a class="reference internal" href="#generating-random-numbers-from-various-distributions">Generating random numbers from various distributions</a></li>
<li><a class="reference internal" href="#sieve-of-eratosthenes">Sieve of Eratosthenes</a></li>
<li><a class="reference internal" href="#mandelbrot-set-generator">Mandelbrot-set generator</a></li>
</ul>
</li>
</ul>
<h4>Previous topic</h4>
<p class="topless"><a href="cookbook.html"
title="previous chapter">Cookbook part 1: common patterns</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="cookbook3.html"
title="next chapter">Cookbook part 3: Stats with and without out-of-stream variables</a></p>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="_sources/cookbook2.rst.txt"
rel="nofollow">Show Source</a></li>
</ul>
</div>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
>index</a></li>
<li class="right" >
<a href="cookbook3.html" title="Cookbook part 3: Stats with and without out-of-stream variables"
>next</a> |</li>
<li class="right" >
<a href="cookbook.html" title="Cookbook part 1: common patterns"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Cookbook part 2: Random things, and some math</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2020, John Kerl.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 3.2.1.
</div>
</body>
</html>

View file

@ -1,405 +0,0 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Cookbook part 3: Stats with and without out-of-stream variables &#8212; Miller 5.10.2 documentation</title>
<link rel="stylesheet" href="_static/classic.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/language_data.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Mixing with other languages" href="data-sharing.html" />
<link rel="prev" title="Cookbook part 2: Random things, and some math" href="cookbook2.html" />
</head><body>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="data-sharing.html" title="Mixing with other languages"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="cookbook2.html" title="Cookbook part 2: Random things, and some math"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Cookbook part 3: Stats with and without out-of-stream variables</a></li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="cookbook-part-3-stats-with-and-without-out-of-stream-variables">
<h1>Cookbook part 3: Stats with and without out-of-stream variables<a class="headerlink" href="#cookbook-part-3-stats-with-and-without-out-of-stream-variables" title="Permalink to this headline"></a></h1>
<div class="section" id="overview">
<h2>Overview<a class="headerlink" href="#overview" title="Permalink to this headline"></a></h2>
<p>One of Miller's strengths is its compact notation: for example, given input of the form</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ head -n 5 ../data/medium
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
</pre></div>
</div>
<p>you can simply do</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --oxtab stats1 -a sum -f x ../data/medium
x_sum 4986.019682
</pre></div>
</div>
<p>or</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --opprint stats1 -a sum -f x -g b ../data/medium
b x_sum
pan 965.763670
wye 1023.548470
zee 979.742016
eks 1016.772857
hat 1000.192668
</pre></div>
</div>
<p>rather than the more tedious</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --oxtab put -q &#39;
@x_sum += $x;
end {
emit @x_sum
}
&#39; data/medium
x_sum 4986.019682
</pre></div>
</div>
<p>or</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --opprint put -q &#39;
@x_sum[$b] += $x;
end {
emit @x_sum, &quot;b&quot;
}
&#39; data/medium
b x_sum
pan 965.763670
wye 1023.548470
zee 979.742016
eks 1016.772857
hat 1000.192668
</pre></div>
</div>
<p>The former (<code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">stats1</span></code> et al.) has the advantages of being easier to type, being less error-prone to type, and running faster.</p>
<p>Nonetheless, out-of-stream variables (which I whimsically call <em>oosvars</em>), begin/end blocks, and emit statements give you the ability to implement logic, if you wish to do so, which isn't present in other Miller verbs. (If you find yourself using the same out-of-stream-variable logic over and over, please file a request at <a class="reference external" href="https://github.com/johnkerl/miller/issues">https://github.com/johnkerl/miller/issues</a> to get it implemented directly in C as a Miller verb of its own.)</p>
<p>The following examples compute some things using oosvars which are already computable using Miller verbs, by way of providing food for thought.</p>
</div>
<div class="section" id="mean-without-with-oosvars">
<h2>Mean without/with oosvars<a class="headerlink" href="#mean-without-with-oosvars" title="Permalink to this headline"></a></h2>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --opprint stats1 -a mean -f x data/medium
x_mean
0.498602
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --opprint put -q &#39;
@x_sum += $x;
@x_count += 1;
end {
@x_mean = @x_sum / @x_count;
emit @x_mean
}
&#39; data/medium
x_mean
0.498602
</pre></div>
</div>
</div>
<div class="section" id="keyed-mean-without-with-oosvars">
<h2>Keyed mean without/with oosvars<a class="headerlink" href="#keyed-mean-without-with-oosvars" title="Permalink to this headline"></a></h2>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --opprint stats1 -a mean -f x -g a,b data/medium
a b x_mean
pan pan 0.513314
eks pan 0.485076
wye wye 0.491501
eks wye 0.483895
wye pan 0.499612
zee pan 0.519830
eks zee 0.495463
zee wye 0.514267
hat wye 0.493813
pan wye 0.502362
zee eks 0.488393
hat zee 0.509999
hat eks 0.485879
wye hat 0.497730
pan eks 0.503672
eks eks 0.522799
hat hat 0.479931
hat pan 0.464336
zee zee 0.512756
pan hat 0.492141
pan zee 0.496604
zee hat 0.467726
wye zee 0.505907
eks hat 0.500679
wye eks 0.530604
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --opprint put -q &#39;
@x_sum[$a][$b] += $x;
@x_count[$a][$b] += 1;
end{
for ((a, b), v in @x_sum) {
@x_mean[a][b] = @x_sum[a][b] / @x_count[a][b];
}
emit @x_mean, &quot;a&quot;, &quot;b&quot;
}
&#39; data/medium
a b x_mean
pan pan 0.513314
pan wye 0.502362
pan eks 0.503672
pan hat 0.492141
pan zee 0.496604
eks pan 0.485076
eks wye 0.483895
eks zee 0.495463
eks eks 0.522799
eks hat 0.500679
wye wye 0.491501
wye pan 0.499612
wye hat 0.497730
wye zee 0.505907
wye eks 0.530604
zee pan 0.519830
zee wye 0.514267
zee eks 0.488393
zee zee 0.512756
zee hat 0.467726
hat wye 0.493813
hat zee 0.509999
hat eks 0.485879
hat hat 0.479931
hat pan 0.464336
</pre></div>
</div>
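<p>The two outputs above list the same means in different orders: the <code class="docutils literal notranslate"><span class="pre">stats1</span></code> rows appear in the order each (a, b) pair is first seen, while the <code class="docutils literal notranslate"><span class="pre">emit</span></code> rows are grouped by <code class="docutils literal notranslate"><span class="pre">a</span></code> because the oosvar is a two-level map. The following Python sketch imitates the two-level accumulation using insertion-ordered dictionaries. It is a hedged illustration, not Miller's implementation: the <code class="docutils literal notranslate"><span class="pre">records</span></code> list holds only the first five records shown in the overview, so its means are not those of the full file.</p>

```python
records = [
    {"a": "pan", "b": "pan", "x": 0.3467901443380824},
    {"a": "eks", "b": "pan", "x": 0.7586799647899636},
    {"a": "wye", "b": "wye", "x": 0.20460330576630303},
    {"a": "eks", "b": "wye", "x": 0.38139939387114097},
    {"a": "wye", "b": "pan", "x": 0.5732889198020006},
]

x_sum = {}
x_count = {}
for rec in records:
    a, b = rec["a"], rec["b"]
    # Create the inner maps on first sight of each a, like @x_sum[$a][$b].
    x_sum.setdefault(a, {})
    x_count.setdefault(a, {})
    x_sum[a][b] = x_sum[a].get(b, 0.0) + rec["x"]
    x_count[a][b] = x_count[a].get(b, 0) + 1

# The "end" step: divide sums by counts, level by level.
x_mean = {
    a: {b: x_sum[a][b] / x_count[a][b] for b in x_sum[a]}
    for a in x_sum
}

# Walking the nested map groups rows by a, then by b.
for a, by_b in x_mean.items():
    for b, mean in by_b.items():
        print(a, b, f"{mean:.6f}")
```

<p>Here the rows come out grouped as pan, eks, eks, wye, wye, even though the input alternates between <code class="docutils literal notranslate"><span class="pre">eks</span></code> and <code class="docutils literal notranslate"><span class="pre">wye</span></code>.</p>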
</div>
<div class="section" id="variance-and-standard-deviation-without-with-oosvars">
<h2>Variance and standard deviation without/with oosvars<a class="headerlink" href="#variance-and-standard-deviation-without-with-oosvars" title="Permalink to this headline"></a></h2>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --oxtab stats1 -a count,sum,mean,var,stddev -f x data/medium
x_count 10000
x_sum 4986.019682
x_mean 0.498602
x_var 0.084270
x_stddev 0.290293
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ cat variance.mlr
@n += 1;
@sumx += $x;
@sumx2 += $x**2;
end {
@mean = @sumx / @n;
@var = (@sumx2 - @mean * (2 * @sumx - @n * @mean)) / (@n - 1);
@stddev = sqrt(@var);
emitf @n, @sumx, @sumx2, @mean, @var, @stddev
}
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --oxtab put -q -f variance.mlr data/medium
n 10000
sumx 4986.019682
sumx2 3328.652400
mean 0.498602
var 0.084270
stddev 0.290293
</pre></div>
</div>
<p>You can also do this keyed, of course, imitating the keyed-mean example above.</p>
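<p>The variance line in <code class="docutils literal notranslate"><span class="pre">variance.mlr</span></code> follows from expanding the sum of squared deviations about the mean m: sum((x - m)^2) = sumx2 - 2*m*sumx + n*m^2 = sumx2 - m*(2*sumx - n*m), which is then divided by n - 1 for the sample variance. A small Python sketch can confirm that the accumulator form agrees with a library implementation. The function name <code class="docutils literal notranslate"><span class="pre">streaming_stats</span></code> and the five-value sample are assumptions of this sketch.</p>

```python
import math
import statistics


def streaming_stats(xs):
    """One pass over xs keeping only n, sum(x) and sum(x^2)."""
    n = 0
    sumx = 0.0
    sumx2 = 0.0
    for x in xs:
        n += 1
        sumx += x
        sumx2 += x ** 2
    mean = sumx / n
    # Same expression as in variance.mlr
    var = (sumx2 - mean * (2.0 * sumx - n * mean)) / (n - 1)
    return {"n": n, "mean": mean, "var": var, "stddev": math.sqrt(var)}


xs = [
    0.3467901443380824,
    0.7586799647899636,
    0.20460330576630303,
    0.38139939387114097,
    0.5732889198020006,
]
result = streaming_stats(xs)
print(result["var"], statistics.variance(xs))
```

<p>One caveat worth keeping in mind: accumulating raw sums of squares can lose precision when the mean is large compared with the spread of the data, so this form is best suited to well-scaled inputs such as the values in [0, 1] used here.</p>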
</div>
<div class="section" id="min-max-without-with-oosvars">
<h2>Min/max without/with oosvars<a class="headerlink" href="#min-max-without-with-oosvars" title="Permalink to this headline"></a></h2>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --oxtab stats1 -a min,max -f x data/medium
x_min 0.000045
x_max 0.999953
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --oxtab put -q &#39;@x_min = min(@x_min, $x); @x_max = max(@x_max, $x); end{emitf @x_min, @x_max}&#39; data/medium
x_min 0.000045
x_max 0.999953
</pre></div>
</div>
</div>
<div class="section" id="keyed-min-max-without-with-oosvars">
<h2>Keyed min/max without/with oosvars<a class="headerlink" href="#keyed-min-max-without-with-oosvars" title="Permalink to this headline"></a></h2>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --opprint stats1 -a min,max -f x -g a data/medium
a x_min x_max
pan 0.000204 0.999403
eks 0.000692 0.998811
wye 0.000187 0.999823
zee 0.000549 0.999490
hat 0.000045 0.999953
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --opprint --from data/medium put -q &#39;
@min[$a] = min(@min[$a], $x);
@max[$a] = max(@max[$a], $x);
end{
emit (@min, @max), &quot;a&quot;;
}
&#39;
a min max
pan 0.000204 0.999403
eks 0.000692 0.998811
wye 0.000187 0.999823
zee 0.000549 0.999490
hat 0.000045 0.999953
</pre></div>
</div>
</div>
<div class="section" id="delta-without-with-oosvars">
<h2>Delta without/with oosvars<a class="headerlink" href="#delta-without-with-oosvars" title="Permalink to this headline"></a></h2>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --opprint step -a delta -f x data/small
a b i x y x_delta
pan pan 1 0.3467901443380824 0.7268028627434533 0
eks pan 2 0.7586799647899636 0.5221511083334797 0.411890
wye wye 3 0.20460330576630303 0.33831852551664776 -0.554077
eks wye 4 0.38139939387114097 0.13418874328430463 0.176796
wye pan 5 0.5732889198020006 0.8636244699032729 0.191890
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --opprint put &#39;$x_delta = is_present(@last) ? $x - @last : 0; @last = $x&#39; data/small
a b i x y x_delta
pan pan 1 0.3467901443380824 0.7268028627434533 0
eks pan 2 0.7586799647899636 0.5221511083334797 0.411890
wye wye 3 0.20460330576630303 0.33831852551664776 -0.554077
eks wye 4 0.38139939387114097 0.13418874328430463 0.176796
wye pan 5 0.5732889198020006 0.8636244699032729 0.191890
</pre></div>
</div>
</div>
<div class="section" id="keyed-delta-without-with-oosvars">
<h2>Keyed delta without/with oosvars<a class="headerlink" href="#keyed-delta-without-with-oosvars" title="Permalink to this headline"></a></h2>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --opprint step -a delta -f x -g a data/small
a b i x y x_delta
pan pan 1 0.3467901443380824 0.7268028627434533 0
eks pan 2 0.7586799647899636 0.5221511083334797 0
wye wye 3 0.20460330576630303 0.33831852551664776 0
eks wye 4 0.38139939387114097 0.13418874328430463 -0.377281
wye pan 5 0.5732889198020006 0.8636244699032729 0.368686
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --opprint put &#39;$x_delta = is_present(@last[$a]) ? $x - @last[$a] : 0; @last[$a]=$x&#39; data/small
a b i x y x_delta
pan pan 1 0.3467901443380824 0.7268028627434533 0
eks pan 2 0.7586799647899636 0.5221511083334797 0
wye wye 3 0.20460330576630303 0.33831852551664776 0
eks wye 4 0.38139939387114097 0.13418874328430463 -0.377281
wye pan 5 0.5732889198020006 0.8636244699032729 0.368686
</pre></div>
</div>
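<p>The keyed variant differs only in remembering one previous value per key. A minimal Python sketch, assuming records are plain dictionaries (the function name <code class="docutils literal notranslate"><span class="pre">keyed_delta</span></code> is hypothetical; the dictionary <code class="docutils literal notranslate"><span class="pre">last</span></code> corresponds to the map-valued <code class="docutils literal notranslate"><span class="pre">@last[$a]</span></code>):</p>

```python
def keyed_delta(records, key, field):
    """Return per-record deltas of `field`, tracked separately per `key`.

    Hypothetical helper: `last` stands in for the oosvar map @last[$a].
    The first record seen for each key gets a delta of 0.
    """
    last = {}
    out = []
    for rec in records:
        k = rec[key]
        out.append(rec[field] - last[k] if k in last else 0)
        last[k] = rec[field]
    return out


# The a and x columns from data/small, as shown in the output above.
records = [
    {"a": "pan", "x": 0.3467901443380824},
    {"a": "eks", "x": 0.7586799647899636},
    {"a": "wye", "x": 0.20460330576630303},
    {"a": "eks", "x": 0.38139939387114097},
    {"a": "wye", "x": 0.5732889198020006},
]
deltas = keyed_delta(records, "a", "x")
print(["%.6f" % d for d in deltas])
```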
</div>
<div class="section" id="exponentially-weighted-moving-averages-without-with-oosvars">
<h2>Exponentially weighted moving averages without/with oosvars<a class="headerlink" href="#exponentially-weighted-moving-averages-without-with-oosvars" title="Permalink to this headline"></a></h2>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --opprint step -a ewma -d 0.1 -f x data/small
a b i x y x_ewma_0.1
pan pan 1 0.3467901443380824 0.7268028627434533 0.346790
eks pan 2 0.7586799647899636 0.5221511083334797 0.387979
wye wye 3 0.20460330576630303 0.33831852551664776 0.369642
eks wye 4 0.38139939387114097 0.13418874328430463 0.370817
wye pan 5 0.5732889198020006 0.8636244699032729 0.391064
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --opprint put &#39;
begin{ @a=0.1 };
$e = NR==1 ? $x : @a * $x + (1 - @a) * @e;
@e=$e
&#39; data/small
a b i x y e
pan pan 1 0.3467901443380824 0.7268028627434533 0.346790
eks pan 2 0.7586799647899636 0.5221511083334797 0.387979
wye wye 3 0.20460330576630303 0.33831852551664776 0.369642
eks wye 4 0.38139939387114097 0.13418874328430463 0.370817
wye pan 5 0.5732889198020006 0.8636244699032729 0.391064
</pre></div>
</div>
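<p>The recurrence used in the <code class="docutils literal notranslate"><span class="pre">put</span></code> expression above is e<sub>1</sub> = x<sub>1</sub> and e<sub>n</sub> = a x<sub>n</sub> + (1 - a) e<sub>n-1</sub>. A small Python sketch of the same recurrence, under the assumption that the smoothing factor is 0.1 as in the example (the function name <code class="docutils literal notranslate"><span class="pre">ewma</span></code> is ours, not part of Miller):</p>

```python
def ewma(values, alpha):
    """Exponentially weighted moving average, seeded with the first value.

    Hypothetical helper mirroring the put expression above:
    `prev` stands in for @e and `alpha` for @a.
    """
    out = []
    prev = None
    for x in values:
        prev = x if prev is None else alpha * x + (1 - alpha) * prev
        out.append(prev)
    return out


# The x column from data/small, as shown in the output above.
xs = [0.3467901443380824, 0.7586799647899636, 0.20460330576630303,
      0.38139939387114097, 0.5732889198020006]
smoothed = ewma(xs, 0.1)
print(["%.6f" % e for e in smoothed])
```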
</div>
</div>
<div class="clearer"></div>
</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Cookbook part 3: Stats with and without out-of-stream variables</a><ul>
<li><a class="reference internal" href="#overview">Overview</a></li>
<li><a class="reference internal" href="#mean-without-with-oosvars">Mean without/with oosvars</a></li>
<li><a class="reference internal" href="#keyed-mean-without-with-oosvars">Keyed mean without/with oosvars</a></li>
<li><a class="reference internal" href="#variance-and-standard-deviation-without-with-oosvars">Variance and standard deviation without/with oosvars</a></li>
<li><a class="reference internal" href="#min-max-without-with-oosvars">Min/max without/with oosvars</a></li>
<li><a class="reference internal" href="#keyed-min-max-without-with-oosvars">Keyed min/max without/with oosvars</a></li>
<li><a class="reference internal" href="#delta-without-with-oosvars">Delta without/with oosvars</a></li>
<li><a class="reference internal" href="#keyed-delta-without-with-oosvars">Keyed delta without/with oosvars</a></li>
<li><a class="reference internal" href="#exponentially-weighted-moving-averages-without-with-oosvars">Exponentially weighted moving averages without/with oosvars</a></li>
</ul>
</li>
</ul>
<h4>Previous topic</h4>
<p class="topless"><a href="cookbook2.html"
title="previous chapter">Cookbook part 2: Random things, and some math</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="data-sharing.html"
title="next chapter">Mixing with other languages</a></p>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="_sources/cookbook3.rst.txt"
rel="nofollow">Show Source</a></li>
</ul>
</div>
<div id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="search.html" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" />
<input type="submit" value="Go" />
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
>index</a></li>
<li class="right" >
<a href="data-sharing.html" title="Mixing with other languages"
>next</a> |</li>
<li class="right" >
<a href="cookbook2.html" title="Cookbook part 2: Random things, and some math"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Cookbook part 3: Stats with and without out-of-stream variables</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2020, John Kerl.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 3.2.1.
</div>
</body>
</html>

View file

@ -1,186 +0,0 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Customization: .mlrrc &#8212; Miller 5.10.2 documentation</title>
<link rel="stylesheet" href="_static/classic.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/language_data.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Installation" href="install.html" />
<link rel="prev" title="Record-heterogeneity" href="record-heterogeneity.html" />
</head><body>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="install.html" title="Installation"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="record-heterogeneity.html" title="Record-heterogeneity"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Customization: .mlrrc</a></li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="customization-mlrrc">
<h1>Customization: .mlrrc<a class="headerlink" href="#customization-mlrrc" title="Permalink to this headline"></a></h1>
<div class="section" id="how-to-use-mlrrc">
<h2>How to use .mlrrc<a class="headerlink" href="#how-to-use-mlrrc" title="Permalink to this headline"></a></h2>
<p>Suppose you always use CSV files. Then instead of always having to type <code class="docutils literal notranslate"><span class="pre">--csv</span></code> as in</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">mlr</span> <span class="o">--</span><span class="n">csv</span> <span class="n">cut</span> <span class="o">-</span><span class="n">x</span> <span class="o">-</span><span class="n">f</span> <span class="n">extra</span> <span class="n">mydata</span><span class="o">.</span><span class="n">csv</span>
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">mlr</span> <span class="o">--</span><span class="n">csv</span> <span class="n">sort</span> <span class="o">-</span><span class="n">n</span> <span class="nb">id</span> <span class="n">mydata</span><span class="o">.</span><span class="n">csv</span>
</pre></div>
</div>
<p>and so on, you can instead put the following into your <code class="docutils literal notranslate"><span class="pre">$HOME/.mlrrc</span></code>:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="o">--</span><span class="n">csv</span>
</pre></div>
</div>
<p>Then you can just type things like</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">mlr</span> <span class="n">cut</span> <span class="o">-</span><span class="n">x</span> <span class="o">-</span><span class="n">f</span> <span class="n">extra</span> <span class="n">mydata</span><span class="o">.</span><span class="n">csv</span>
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">mlr</span> <span class="n">sort</span> <span class="o">-</span><span class="n">n</span> <span class="nb">id</span> <span class="n">mydata</span><span class="o">.</span><span class="n">csv</span>
</pre></div>
</div>
<p>and the <code class="docutils literal notranslate"><span class="pre">--csv</span></code> part will automatically be understood. (If you do want to process, say, a JSON file then <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">--json</span> <span class="pre">...</span></code> at the command line will override the default from your <code class="docutils literal notranslate"><span class="pre">.mlrrc</span></code>.)</p>
</div>
<div class="section" id="what-you-can-put-in-your-mlrrc">
<h2>What you can put in your .mlrrc<a class="headerlink" href="#what-you-can-put-in-your-mlrrc" title="Permalink to this headline"></a></h2>
<ul class="simple">
<li><p>You can include any command-line flags, except the “terminal” ones such as <code class="docutils literal notranslate"><span class="pre">--help</span></code>.</p></li>
<li><p>The formatting rule is you need to put one flag beginning with <code class="docutils literal notranslate"><span class="pre">--</span></code> per line: for example, <code class="docutils literal notranslate"><span class="pre">--csv</span></code> on one line and <code class="docutils literal notranslate"><span class="pre">--nr-progress-mod</span> <span class="pre">1000</span></code> on a separate line.</p></li>
<li><p>Since every line starts with a <code class="docutils literal notranslate"><span class="pre">--</span></code> option, you can leave off the initial <code class="docutils literal notranslate"><span class="pre">--</span></code> if you want. For example, <code class="docutils literal notranslate"><span class="pre">ojson</span></code> is the same as <code class="docutils literal notranslate"><span class="pre">--ojson</span></code>, and <code class="docutils literal notranslate"><span class="pre">nr-progress-mod</span> <span class="pre">1000</span></code> is the same as <code class="docutils literal notranslate"><span class="pre">--nr-progress-mod</span> <span class="pre">1000</span></code>.</p></li>
<li><p>Comments are from a <code class="docutils literal notranslate"><span class="pre">#</span></code> to the end of the line.</p></li>
<li><p>Empty lines are ignored, including lines which are empty after comments are removed.</p></li>
</ul>
<p>Here is an example <code class="docutils literal notranslate"><span class="pre">.mlrrc</span></code> file:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="c1"># These are my preferred default settings for Miller</span>
<span class="c1"># Input and output formats are CSV by default (unless otherwise specified</span>
<span class="c1"># on the mlr command line):</span>
<span class="n">csv</span>
<span class="c1"># If a data line has fewer fields than the header line, instead of erroring</span>
<span class="c1"># (which is the default), just insert empty values for the missing ones:</span>
<span class="n">allow</span><span class="o">-</span><span class="n">ragged</span><span class="o">-</span><span class="n">csv</span><span class="o">-</span><span class="nb">input</span>
<span class="c1"># These are no-ops for CSV, but when I do use JSON output, I want these</span>
<span class="c1"># pretty-printing options to be used:</span>
<span class="n">jvstack</span>
<span class="n">jlistwrap</span>
<span class="c1"># Use &quot;@&quot;, rather than &quot;#&quot;, for comments within data files:</span>
<span class="n">skip</span><span class="o">-</span><span class="n">comments</span><span class="o">-</span><span class="k">with</span> <span class="o">@</span>
</pre></div>
</div>
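<p>To make the formatting rules concrete, here is a rough Python sketch of how such a file could be turned into command-line arguments. It follows only the rules listed above (comments, empty lines, optional leading <code class="docutils literal notranslate"><span class="pre">--</span></code>); it is not Miller's actual parser, and the function name <code class="docutils literal notranslate"><span class="pre">parse_mlrrc</span></code> is hypothetical.</p>

```python
def parse_mlrrc(text):
    """Turn .mlrrc text into a flat list of command-line tokens.

    Hypothetical sketch of the documented rules only:
    - everything from '#' to end of line is a comment;
    - lines that are empty after comment removal are skipped;
    - a missing leading '--' is supplied.
    """
    args = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        if not line.startswith("--"):
            line = "--" + line
        # A flag may carry an argument, e.g. "nr-progress-mod 1000".
        args.extend(line.split())
    return args


example = """\
# These are my preferred default settings for Miller
csv
allow-ragged-csv-input
jvstack
jlistwrap
# Use "@", rather than "#", for comments within data files:
skip-comments-with @
"""
args = parse_mlrrc(example)
print(args)
```

<p>Because comment stripping happens first, a flag argument that itself contains <code class="docutils literal notranslate"><span class="pre">#</span></code> would be truncated by this sketch; how Miller treats that case is not covered here.</p>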
</div>
<div class="section" id="where-to-put-your-mlrrc">
<h2>Where to put your .mlrrc<a class="headerlink" href="#where-to-put-your-mlrrc" title="Permalink to this headline"></a></h2>
<ul class="simple">
<li><p>If the environment variable <code class="docutils literal notranslate"><span class="pre">MLRRC</span></code> is set:</p>
<ul>
<li><p>If its value is <code class="docutils literal notranslate"><span class="pre">__none__</span></code> then no <code class="docutils literal notranslate"><span class="pre">.mlrrc</span></code> files are processed. (This is nice for things like regression testing.)</p></li>
<li><p>Otherwise, its value (as a filename) is loaded and processed. If there are syntax errors, they abort <code class="docutils literal notranslate"><span class="pre">mlr</span></code> with a usage message (as if you had mistyped something on the command line). If the file can't be loaded at all, though, it is silently skipped.</p></li>
<li><p>Any <code class="docutils literal notranslate"><span class="pre">.mlrrc</span></code> in your home directory or current directory is ignored whenever <code class="docutils literal notranslate"><span class="pre">MLRRC</span></code> is set in the environment.</p></li>
<li><p>Example line in your shell's rc file: <code class="docutils literal notranslate"><span class="pre">export</span> <span class="pre">MLRRC=/path/to/my/mlrrc</span></code></p></li>
</ul>
</li>
<li><p>Otherwise:</p>
<ul>
<li><p>If <code class="docutils literal notranslate"><span class="pre">$HOME/.mlrrc</span></code> exists, it's processed as above.</p></li>
<li><p>If <code class="docutils literal notranslate"><span class="pre">./.mlrrc</span></code> exists, it's then also processed as above.</p></li>
<li><p>The idea is you can have all your settings in your <code class="docutils literal notranslate"><span class="pre">$HOME/.mlrrc</span></code>, then override maybe one or two for your current directory if you like.</p></li>
</ul>
</li>
</ul>
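<p>The lookup order described above can be summarized as a short Python sketch. This is an illustration of the documented rules rather than Miller's own code: the function name <code class="docutils literal notranslate"><span class="pre">mlrrc_paths</span></code> and the injectable <code class="docutils literal notranslate"><span class="pre">exists</span></code> predicate are assumptions made for the example, and "can't be loaded" is approximated by a simple existence check.</p>

```python
import os


def mlrrc_paths(env, exists=os.path.exists):
    """Return the .mlrrc files to process, in order.

    Hypothetical sketch of the documented lookup rules; `env` is a
    mapping such as os.environ, and `exists` decides whether a path
    can be loaded (unloadable files are silently skipped).
    """
    if "MLRRC" in env:
        value = env["MLRRC"]
        if value == "__none__":
            return []  # no .mlrrc processing at all
        # Home and current-directory files are ignored when MLRRC is set.
        return [value] if exists(value) else []

    candidates = []
    if "HOME" in env:
        candidates.append(os.path.join(env["HOME"], ".mlrrc"))
    # The current-directory file comes second so it can override settings.
    candidates.append(os.path.join(".", ".mlrrc"))
    return [p for p in candidates if exists(p)]


print(mlrrc_paths(os.environ))
```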
</div>
</div>
<div class="clearer"></div>
</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Customization: .mlrrc</a><ul>
<li><a class="reference internal" href="#how-to-use-mlrrc">How to use .mlrrc</a></li>
<li><a class="reference internal" href="#what-you-can-put-in-your-mlrrc">What you can put in your .mlrrc</a></li>
<li><a class="reference internal" href="#where-to-put-your-mlrrc">Where to put your .mlrrc</a></li>
</ul>
</li>
</ul>
<h4>Previous topic</h4>
<p class="topless"><a href="record-heterogeneity.html"
title="previous chapter">Record-heterogeneity</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="install.html"
title="next chapter">Installation</a></p>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="_sources/customization.rst.txt"
rel="nofollow">Show Source</a></li>
</ul>
</div>
<div id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="search.html" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" />
<input type="submit" value="Go" />
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
>index</a></li>
<li class="right" >
<a href="install.html" title="Installation"
>next</a> |</li>
<li class="right" >
<a href="record-heterogeneity.html" title="Record-heterogeneity"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Customization: .mlrrc</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2020, John Kerl.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 3.2.1.
</div>
</body>
</html>

View file

@ -1,284 +0,0 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Data-diving examples &#8212; Miller 5.10.2 documentation</title>
<link rel="stylesheet" href="_static/classic.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/language_data.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Cookbook part 1: common patterns" href="cookbook.html" />
<link rel="prev" title="Log-processing examples" href="log-processing-examples.html" />
</head><body>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="cookbook.html" title="Cookbook part 1: common patterns"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="log-processing-examples.html" title="Log-processing examples"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Data-diving examples</a></li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="data-diving-examples">
<h1>Data-diving examples<a class="headerlink" href="#data-diving-examples" title="Permalink to this headline"></a></h1>
<div class="section" id="flins-data">
<h2>flins data<a class="headerlink" href="#flins-data" title="Permalink to this headline"></a></h2>
<p>The <a class="reference external" href="data/flins.csv">flins.csv</a> file is some sample data obtained from <a class="reference external" href="https://support.spatialkey.com/spatialkey-sample-csv-data">https://support.spatialkey.com/spatialkey-sample-csv-data</a>.</p>
<p>Vertical-tabular format is good for a quick look at CSV data layout, showing which columns you have to work with:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ head -n 2 data/flins.csv | mlr --icsv --oxtab cat
county Seminole
tiv_2011 22890.55
tiv_2012 20848.71
line Residential
</pre></div>
</div>
<p>A few simple queries:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --from data/flins.csv --icsv --opprint count-distinct -f county | head
county count
Seminole 1
Miami Dade 2
Palm Beach 1
Highlands 2
Duval 1
St. Johns 1
$ mlr --from data/flins.csv --icsv --opprint count-distinct -f construction,line
</pre></div>
</div>
<p>Categorization of total insured value:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --from data/flins.csv --icsv --opprint stats1 -a min,mean,max -f tiv_2012
tiv_2012_min tiv_2012_mean tiv_2012_max
19757.910000 1061531.463750 2785551.630000
$ mlr --from data/flins.csv --icsv --opprint stats1 -a min,mean,max -f tiv_2012 -g construction,line
$ mlr --from data/flins.csv --icsv --oxtab stats1 -a p0,p10,p50,p90,p95,p99,p100 -f hu_site_deductible
hu_site_deductible_p0
hu_site_deductible_p10
hu_site_deductible_p50
hu_site_deductible_p90
hu_site_deductible_p95
hu_site_deductible_p99
hu_site_deductible_p100
$ mlr --from data/flins.csv --icsv --opprint stats1 -a p95,p99,p100 -f hu_site_deductible -g county then sort -f county | head
county hu_site_deductible_p95 hu_site_deductible_p99 hu_site_deductible_p100
Duval - - -
Highlands - - -
Miami Dade - - -
Palm Beach - - -
Seminole - - -
St. Johns - - -
$ mlr --from data/flins.csv --icsv --oxtab stats2 -a corr,linreg-ols,r2 -f tiv_2011,tiv_2012
tiv_2011_tiv_2012_corr 0.935363
tiv_2011_tiv_2012_ols_m 1.089091
tiv_2011_tiv_2012_ols_b 103095.523356
tiv_2011_tiv_2012_ols_n 8
tiv_2011_tiv_2012_r2 0.874904
$ mlr --from data/flins.csv --icsv --opprint stats2 -a corr,linreg-ols,r2 -f tiv_2011,tiv_2012 -g county
county tiv_2011_tiv_2012_corr tiv_2011_tiv_2012_ols_m tiv_2011_tiv_2012_ols_b tiv_2011_tiv_2012_ols_n tiv_2011_tiv_2012_r2
Seminole - - - 1 -
Miami Dade 1.000000 0.930643 -2311.154328 2 1.000000
Palm Beach - - - 1 -
Highlands 1.000000 1.055693 -4529.793939 2 1.000000
Duval - - - 1 -
St. Johns - - - 1 -
</pre></div>
</div>
</div>
<div class="section" id="color-shape-data">
<h2>Color/shape data<a class="headerlink" href="#color-shape-data" title="Permalink to this headline"></a></h2>
<p>The <a class="reference external" href="https://github.com/johnkerl/miller/blob/master/docs/data/colored-shapes.dkvp">colored-shapes.dkvp</a> file is some sample data produced by the <a class="reference external" href="https://github.com/johnkerl/miller/blob/master/doc/datagen/mkdat2">mkdat2</a> script. The idea is:</p>
<ul class="simple">
<li><p>Produce some data with known distributions and correlations, and verify that Miller recovers those properties empirically.</p></li>
<li><p>Each record is labeled with one of a few colors and one of a few shapes.</p></li>
<li><p>The <code class="docutils literal notranslate"><span class="pre">flag</span></code> field is 0 or 1, with probability dependent on color.</p></li>
<li><p>The <code class="docutils literal notranslate"><span class="pre">u</span></code> field is plain uniform on the unit interval.</p></li>
<li><p>The <code class="docutils literal notranslate"><span class="pre">v</span></code> field is the same, except tightly correlated with <code class="docutils literal notranslate"><span class="pre">u</span></code> for red circles.</p></li>
<li><p>The <code class="docutils literal notranslate"><span class="pre">w</span></code> field is autocorrelated for each color/shape pair.</p></li>
<li><p>The <code class="docutils literal notranslate"><span class="pre">x</span></code> field is boring Gaussian with mean 5 and standard deviation about 1.2, with no dependence on color or shape.</p></li>
</ul>
<p>Peek at the data:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ wc -l data/colored-shapes.dkvp
10078 data/colored-shapes.dkvp
$ head -n 6 data/colored-shapes.dkvp | mlr --opprint cat
color shape flag i u v w x
yellow triangle 1 11 0.6321695890307647 0.9887207810889004 0.4364983936735774 5.7981881667050565
red square 1 15 0.21966833570651523 0.001257332190235938 0.7927778364718627 2.944117399716207
red circle 1 16 0.20901671281497636 0.29005231936593445 0.13810280912907674 5.065034003400998
red square 0 48 0.9562743938458542 0.7467203085342884 0.7755423050923582 7.117831369597269
purple triangle 0 51 0.4355354501763202 0.8591292672156728 0.8122903963006748 5.753094629505863
red square 0 64 0.2015510269821953 0.9531098083420033 0.7719912015786777 5.612050466474166
</pre></div>
</div>
<p>Look at uncategorized stats (using <a class="reference external" href="https://github.com/johnkerl/scripts/blob/master/fundam/creach">creach</a> for spacing).</p>
<p>Here it looks reasonable that <code class="docutils literal notranslate"><span class="pre">u</span></code> is unit-uniform; something's up with <code class="docutils literal notranslate"><span class="pre">v</span></code> but we can't yet see what:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --oxtab stats1 -a min,mean,max -f flag,u,v data/colored-shapes.dkvp | creach 3
flag_min 0
flag_mean 0.398889
flag_max 1
u_min 0.000044
u_mean 0.498326
u_max 0.999969
v_min -0.092709
v_mean 0.497787
v_max 1.072500
</pre></div>
</div>
<p>The histogram shows the different distribution of 0/1 flags:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --opprint histogram -f flag,u,v --lo -0.1 --hi 1.1 --nbins 12 data/colored-shapes.dkvp
bin_lo bin_hi flag_count u_count v_count
-0.100000 0.000000 6058 0 36
0.000000 0.100000 0 1062 988
0.100000 0.200000 0 985 1003
0.200000 0.300000 0 1024 1014
0.300000 0.400000 0 1002 991
0.400000 0.500000 0 989 1041
0.500000 0.600000 0 1001 1016
0.600000 0.700000 0 972 962
0.700000 0.800000 0 1035 1070
0.800000 0.900000 0 995 993
0.900000 1.000000 4020 1013 939
1.000000 1.100000 0 0 25
</pre></div>
</div>
<p>Look at univariate stats by color and shape. In particular, color-dependent flag probabilities pop out, aligning with their original Bernoulli probabilities from the data-generator script:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --opprint stats1 -a min,mean,max -f flag,u,v -g color then sort -f color data/colored-shapes.dkvp
color flag_min flag_mean flag_max u_min u_mean u_max v_min v_mean v_max
blue 0 0.584354 1 0.000044 0.517717 0.999969 0.001489 0.491056 0.999576
green 0 0.209197 1 0.000488 0.504861 0.999936 0.000501 0.499085 0.999676
orange 0 0.521452 1 0.001235 0.490532 0.998885 0.002449 0.487764 0.998475
purple 0 0.090193 1 0.000266 0.494005 0.999647 0.000364 0.497051 0.999975
red 0 0.303167 1 0.000671 0.492560 0.999882 -0.092709 0.496535 1.072500
yellow 0 0.892427 1 0.001300 0.497129 0.999923 0.000711 0.510627 0.999919
$ mlr --opprint stats1 -a min,mean,max -f flag,u,v -g shape then sort -f shape data/colored-shapes.dkvp
shape flag_min flag_mean flag_max u_min u_mean u_max v_min v_mean v_max
circle 0 0.399846 1 0.000044 0.498555 0.999923 -0.092709 0.495524 1.072500
square 0 0.396112 1 0.000188 0.499385 0.999969 0.000089 0.496538 0.999975
triangle 0 0.401542 1 0.000881 0.496859 0.999661 0.000717 0.501050 0.999995
</pre></div>
</div>
<p>Look at bivariate stats by color and shape. In particular, <code class="docutils literal notranslate"><span class="pre">u,v</span></code> pairwise correlation for red circles pops out:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --opprint --right stats2 -a corr -f u,v,w,x data/colored-shapes.dkvp
u_v_corr w_x_corr
0.133418 -0.011320
$ mlr --opprint --right stats2 -a corr -f u,v,w,x -g color,shape then sort -nr u_v_corr data/colored-shapes.dkvp
color shape u_v_corr w_x_corr
red circle 0.980798 -0.018565
orange square 0.176858 -0.071044
green circle 0.057644 0.011795
red square 0.055745 -0.000680
yellow triangle 0.044573 0.024605
yellow square 0.043792 -0.044623
purple circle 0.035874 0.134112
blue square 0.032412 -0.053508
blue triangle 0.015356 -0.000608
orange circle 0.010519 -0.162795
red triangle 0.008098 0.012486
purple triangle 0.005155 -0.045058
purple square -0.025680 0.057694
green square -0.025776 -0.003265
orange triangle -0.030457 -0.131870
yellow circle -0.064773 0.073695
blue circle -0.102348 -0.030529
green triangle -0.109018 -0.048488
</pre></div>
</div>
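<p>As a cross-check outside Miller, the per-group correlation can be recomputed by hand. The sketch below assumes that <code class="docutils literal notranslate"><span class="pre">corr</span></code> is the usual Pearson sample correlation; the function name <code class="docutils literal notranslate"><span class="pre">pearson</span></code> and the toy inputs are ours, not taken from the data file.</p>

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Hypothetical helper; assumes at least two points and that
    neither sequence is constant (otherwise the denominator is 0).
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Toy data: perfectly correlated and perfectly anti-correlated pairs.
up = pearson([1, 2, 3, 4], [2, 4, 6, 8])
down = pearson([1, 2, 3, 4], [8, 6, 4, 2])
print(up, down)
```

<p>Applied to the <code class="docutils literal notranslate"><span class="pre">u</span></code> and <code class="docutils literal notranslate"><span class="pre">v</span></code> values of one color/shape group at a time, a function like this should land close to the corresponding row of the table above, provided the assumption about the estimator holds.</p>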
</div>
</div>
<div class="clearer"></div>
</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Data-diving examples</a><ul>
<li><a class="reference internal" href="#flins-data">flins data</a></li>
<li><a class="reference internal" href="#color-shape-data">Color/shape data</a></li>
</ul>
</li>
</ul>
<h4>Previous topic</h4>
<p class="topless"><a href="log-processing-examples.html"
title="previous chapter">Log-processing examples</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="cookbook.html"
title="next chapter">Cookbook part 1: common patterns</a></p>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="_sources/data-examples.rst.txt"
rel="nofollow">Show Source</a></li>
</ul>
</div>
<div id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="search.html" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" />
<input type="submit" value="Go" />
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
>index</a></li>
<li class="right" >
<a href="cookbook.html" title="Cookbook part 1: common patterns"
>next</a> |</li>
<li class="right" >
<a href="log-processing-examples.html" title="Log-processing examples"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Data-diving examples</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2020, John Kerl.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 3.2.1.
</div>
</body>
</html>

View file

@ -1,401 +0,0 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Mixing with other languages &#8212; Miller 5.10.2 documentation</title>
<link rel="stylesheet" href="_static/classic.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/language_data.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Main reference" href="reference.html" />
<link rel="prev" title="Cookbook part 3: Stats with and without out-of-stream variables" href="cookbook3.html" />
</head><body>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="reference.html" title="Main reference"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="cookbook3.html" title="Cookbook part 3: Stats with and without out-of-stream variables"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Mixing with other languages</a></li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="mixing-with-other-languages">
<h1>Mixing with other languages<a class="headerlink" href="#mixing-with-other-languages" title="Permalink to this headline"></a></h1>
<p>As discussed in the section on <a class="reference internal" href="file-formats.html"><span class="doc">File formats</span></a>, Miller supports several different file formats. Different tools are good at different things, so it's important to be able to move data into and out of other languages. <strong>CSV</strong> and <strong>JSON</strong> are well-known, of course; here are some examples using <strong>DKVP</strong> format, with <strong>Ruby</strong> and <strong>Python</strong>. Last, we show how to use arbitrary <strong>shell commands</strong> to extend functionality beyond Miller's domain-specific language.</p>
<div class="section" id="dkvp-i-o-in-python">
<h2>DKVP I/O in Python<a class="headerlink" href="#dkvp-i-o-in-python" title="Permalink to this headline"></a></h2>
<p>Here are the I/O routines:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="ch">#!/usr/bin/env python</span>
<span class="c1"># ================================================================</span>
<span class="c1"># Example of DKVP I/O using Python.</span>
<span class="c1">#</span>
<span class="c1"># Key point: Use Miller for what it&#39;s good at; pass data into/out of tools in</span>
<span class="c1"># other languages to do what they&#39;re good at.</span>
<span class="c1">#</span>
<span class="c1"># bash$ python -i dkvp_io.py</span>
<span class="c1">#</span>
<span class="c1"># # READ</span>
<span class="c1"># &gt;&gt;&gt; map = dkvpline2map(&#39;x=1,y=2&#39;, &#39;=&#39;, &#39;,&#39;)</span>
<span class="c1"># &gt;&gt;&gt; map</span>
<span class="c1"># OrderedDict([(&#39;x&#39;, 1), (&#39;y&#39;, 2)])</span>
<span class="c1">#</span>
<span class="c1"># # MODIFY</span>
<span class="c1"># &gt;&gt;&gt; map[&#39;z&#39;] = map[&#39;x&#39;] + map[&#39;y&#39;]</span>
<span class="c1"># &gt;&gt;&gt; map</span>
<span class="c1"># OrderedDict([(&#39;x&#39;, 1), (&#39;y&#39;, 2), (&#39;z&#39;, 3)])</span>
<span class="c1">#</span>
<span class="c1"># # WRITE</span>
<span class="c1"># &gt;&gt;&gt; line = map2dkvpline(map, &#39;=&#39;, &#39;,&#39;)</span>
<span class="c1"># &gt;&gt;&gt; line</span>
<span class="c1"># &#39;x=1,y=2,z=3&#39;</span>
<span class="c1">#</span>
<span class="c1"># ================================================================</span>
<span class="kn">import</span> <span class="nn">re</span>
<span class="kn">import</span> <span class="nn">collections</span>
<span class="c1"># ----------------------------------------------------------------</span>
<span class="c1"># ips and ifs (input pair separator and input field separator) are nominally &#39;=&#39; and &#39;,&#39;.</span>
<span class="k">def</span> <span class="nf">dkvpline2map</span><span class="p">(</span><span class="n">line</span><span class="p">,</span> <span class="n">ips</span><span class="p">,</span> <span class="n">ifs</span><span class="p">):</span>
    <span class="n">pairs</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">ifs</span><span class="p">,</span> <span class="n">line</span><span class="p">)</span>
    <span class="nb">map</span> <span class="o">=</span> <span class="n">collections</span><span class="o">.</span><span class="n">OrderedDict</span><span class="p">()</span>
    <span class="k">for</span> <span class="n">pair</span> <span class="ow">in</span> <span class="n">pairs</span><span class="p">:</span>
        <span class="n">key</span><span class="p">,</span> <span class="n">value</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">ips</span><span class="p">,</span> <span class="n">pair</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
        <span class="c1"># Type inference: try int, then float, else leave as string.</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="n">value</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
        <span class="k">except</span> <span class="ne">ValueError</span><span class="p">:</span>
            <span class="k">try</span><span class="p">:</span>
                <span class="n">value</span> <span class="o">=</span> <span class="nb">float</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
            <span class="k">except</span> <span class="ne">ValueError</span><span class="p">:</span>
                <span class="k">pass</span>
        <span class="nb">map</span><span class="p">[</span><span class="n">key</span><span class="p">]</span> <span class="o">=</span> <span class="n">value</span>
    <span class="k">return</span> <span class="nb">map</span>
<span class="c1"># ----------------------------------------------------------------</span>
<span class="c1"># ops and ofs (output pair separator and output field separator) are nominally &#39;=&#39; and &#39;,&#39;.</span>
<span class="k">def</span> <span class="nf">map2dkvpline</span><span class="p">(</span><span class="nb">map</span><span class="p">,</span> <span class="n">ops</span><span class="p">,</span> <span class="n">ofs</span><span class="p">):</span>
    <span class="n">line</span> <span class="o">=</span> <span class="s1">&#39;&#39;</span>
    <span class="n">pairs</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">for</span> <span class="n">key</span> <span class="ow">in</span> <span class="nb">map</span><span class="p">:</span>
        <span class="n">pairs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">key</span><span class="p">)</span> <span class="o">+</span> <span class="n">ops</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="nb">map</span><span class="p">[</span><span class="n">key</span><span class="p">]))</span>
    <span class="k">return</span> <span class="nb">str</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">ofs</span><span class="p">,</span> <span class="n">pairs</span><span class="p">)</span>
</pre></div>
</div>
<p>And here is an example using them:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ cat polyglot-dkvp-io/example.py
#!/usr/bin/env python
import sys
import re
import copy
import dkvp_io
while True:
    # Read the original record:
    line = sys.stdin.readline().strip()
    if line == &#39;&#39;:
        break
    map = dkvp_io.dkvpline2map(line, &#39;=&#39;, &#39;,&#39;)
    # Drop a field:
    map.pop(&#39;x&#39;)
    # Compute some new fields:
    map[&#39;ab&#39;] = map[&#39;a&#39;] + map[&#39;b&#39;]
    map[&#39;iy&#39;] = map[&#39;i&#39;] + map[&#39;y&#39;]
    # Add new fields which show type of each already-existing field:
    omap = copy.copy(map) # since otherwise the for-loop will modify what it loops over
    keys = omap.keys()
    for key in keys:
        # Convert &quot;&lt;type &#39;int&#39;&gt;&quot; to just &quot;int&quot;, etc.:
        type_string = str(map[key].__class__)
        type_string = re.sub(&quot;&lt;type &#39;&quot;, &quot;&quot;, type_string) # python2
        type_string = re.sub(&quot;&lt;class &#39;&quot;, &quot;&quot;, type_string) # python3
        type_string = re.sub(&quot;&#39;&gt;&quot;, &quot;&quot;, type_string)
        map[&#39;t&#39;+key] = type_string
    # Write the modified record:
    print(dkvp_io.map2dkvpline(map, &#39;=&#39;, &#39;,&#39;))
</pre></div>
</div>
<p>Run as-is:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ python polyglot-dkvp-io/example.py &lt; data/small
a=pan,b=pan,i=1,y=0.7268028627434533,ab=panpan,iy=1.7268028627434533,ta=str,tb=str,ti=int,ty=float,tab=str,tiy=float
a=eks,b=pan,i=2,y=0.5221511083334797,ab=ekspan,iy=2.5221511083334796,ta=str,tb=str,ti=int,ty=float,tab=str,tiy=float
a=wye,b=wye,i=3,y=0.33831852551664776,ab=wyewye,iy=3.3383185255166477,ta=str,tb=str,ti=int,ty=float,tab=str,tiy=float
a=eks,b=wye,i=4,y=0.13418874328430463,ab=ekswye,iy=4.134188743284304,ta=str,tb=str,ti=int,ty=float,tab=str,tiy=float
a=wye,b=pan,i=5,y=0.8636244699032729,ab=wyepan,iy=5.863624469903273,ta=str,tb=str,ti=int,ty=float,tab=str,tiy=float
</pre></div>
</div>
<p>Run as-is, then pipe to Miller for pretty-printing:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ python polyglot-dkvp-io/example.py &lt; data/small | mlr --opprint cat
a b i y ab iy ta tb ti ty tab tiy
pan pan 1 0.7268028627434533 panpan 1.7268028627434533 str str int float str float
eks pan 2 0.5221511083334797 ekspan 2.5221511083334796 str str int float str float
wye wye 3 0.33831852551664776 wyewye 3.3383185255166477 str str int float str float
eks wye 4 0.13418874328430463 ekswye 4.134188743284304 str str int float str float
wye pan 5 0.8636244699032729 wyepan 5.863624469903273 str str int float str float
</pre></div>
</div>
</div>
<div class="section" id="dkvp-i-o-in-ruby">
<h2>DKVP I/O in Ruby<a class="headerlink" href="#dkvp-i-o-in-ruby" title="Permalink to this headline"></a></h2>
<p>Here are the I/O routines:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="ch">#!/usr/bin/env ruby</span>
<span class="c1"># ================================================================</span>
<span class="c1"># Example of DKVP I/O using Ruby.</span>
<span class="c1">#</span>
<span class="c1"># Key point: Use Miller for what it&#39;s good at; pass data into/out of tools in</span>
<span class="c1"># other languages to do what they&#39;re good at.</span>
<span class="c1">#</span>
<span class="c1"># bash$ irb -I. -r dkvp_io.rb</span>
<span class="c1">#</span>
<span class="c1"># # READ</span>
<span class="c1"># irb(main):001:0&gt; map = dkvpline2map(&#39;x=1,y=2&#39;, &#39;=&#39;, &#39;,&#39;)</span>
<span class="c1"># =&gt; {&quot;x&quot;=&gt;1, &quot;y&quot;=&gt;2}</span>
<span class="c1">#</span>
<span class="c1"># # MODIFY</span>
<span class="c1"># irb(main):002:0&gt; map[&#39;z&#39;] = map[&#39;x&#39;] + map[&#39;y&#39;]</span>
<span class="c1"># =&gt; 3</span>
<span class="c1">#</span>
<span class="c1"># # WRITE</span>
<span class="c1"># irb(main):003:0&gt; line = map2dkvpline(map, &#39;=&#39;, &#39;,&#39;)</span>
<span class="c1"># =&gt; &quot;x=1,y=2,z=3&quot;</span>
<span class="c1">#</span>
<span class="c1"># ================================================================</span>
<span class="c1"># ----------------------------------------------------------------</span>
<span class="c1"># ips and ifs (input pair separator and input field separator) are nominally &#39;=&#39; and &#39;,&#39;.</span>
<span class="k">def</span> <span class="nf">dkvpline2map</span><span class="p">(</span><span class="n">line</span><span class="p">,</span> <span class="n">ips</span><span class="p">,</span> <span class="n">ifs</span><span class="p">)</span>
  <span class="nb">map</span> <span class="o">=</span> <span class="p">{}</span>
  <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">ifs</span><span class="p">)</span><span class="o">.</span><span class="n">each</span> <span class="n">do</span> <span class="o">|</span><span class="n">pair</span><span class="o">|</span>
    <span class="p">(</span><span class="n">k</span><span class="p">,</span> <span class="n">v</span><span class="p">)</span> <span class="o">=</span> <span class="n">pair</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">ips</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
    <span class="c1"># Type inference:</span>
    <span class="n">begin</span>
      <span class="n">v</span> <span class="o">=</span> <span class="n">Integer</span><span class="p">(</span><span class="n">v</span><span class="p">)</span>
    <span class="n">rescue</span> <span class="n">ArgumentError</span>
      <span class="n">begin</span>
        <span class="n">v</span> <span class="o">=</span> <span class="n">Float</span><span class="p">(</span><span class="n">v</span><span class="p">)</span>
      <span class="n">rescue</span> <span class="n">ArgumentError</span>
        <span class="c1"># Leave as string</span>
      <span class="n">end</span>
    <span class="n">end</span>
    <span class="nb">map</span><span class="p">[</span><span class="n">k</span><span class="p">]</span> <span class="o">=</span> <span class="n">v</span>
  <span class="n">end</span>
  <span class="nb">map</span>
<span class="n">end</span>
<span class="c1"># ----------------------------------------------------------------</span>
<span class="c1"># ops and ofs (output pair separator and output field separator) are nominally &#39;=&#39; and &#39;,&#39;.</span>
<span class="k">def</span> <span class="nf">map2dkvpline</span><span class="p">(</span><span class="nb">map</span><span class="p">,</span> <span class="n">ops</span><span class="p">,</span> <span class="n">ofs</span><span class="p">)</span>
  <span class="nb">map</span><span class="o">.</span><span class="n">collect</span><span class="p">{</span><span class="o">|</span><span class="n">k</span><span class="p">,</span><span class="n">v</span><span class="o">|</span> <span class="n">k</span><span class="o">.</span><span class="n">to_s</span> <span class="o">+</span> <span class="n">ops</span> <span class="o">+</span> <span class="n">v</span><span class="o">.</span><span class="n">to_s</span><span class="p">}</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">ofs</span><span class="p">)</span>
<span class="n">end</span>
</pre></div>
</div>
<p>And here is an example using them:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ cat polyglot-dkvp-io/example.rb
#!/usr/bin/env ruby
require &#39;dkvp_io&#39;
ARGF.each do |line|
  # Read the original record:
  map = dkvpline2map(line.chomp, &#39;=&#39;, &#39;,&#39;)
  # Drop a field:
  map.delete(&#39;x&#39;)
  # Compute some new fields:
  map[&#39;ab&#39;] = map[&#39;a&#39;] + map[&#39;b&#39;]
  map[&#39;iy&#39;] = map[&#39;i&#39;] + map[&#39;y&#39;]
  # Add new fields which show type of each already-existing field:
  keys = map.keys
  keys.each do |key|
    map[&#39;t&#39;+key] = map[key].class
  end
  # Write the modified record:
  puts map2dkvpline(map, &#39;=&#39;, &#39;,&#39;)
end
</pre></div>
</div>
<p>Run as-is:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ ruby -I./polyglot-dkvp-io polyglot-dkvp-io/example.rb data/small
a=pan,b=pan,i=1,y=0.7268028627434533,ab=panpan,iy=1.7268028627434533,ta=String,tb=String,ti=Integer,ty=Float,tab=String,tiy=Float
a=eks,b=pan,i=2,y=0.5221511083334797,ab=ekspan,iy=2.5221511083334796,ta=String,tb=String,ti=Integer,ty=Float,tab=String,tiy=Float
a=wye,b=wye,i=3,y=0.33831852551664776,ab=wyewye,iy=3.3383185255166477,ta=String,tb=String,ti=Integer,ty=Float,tab=String,tiy=Float
a=eks,b=wye,i=4,y=0.13418874328430463,ab=ekswye,iy=4.134188743284304,ta=String,tb=String,ti=Integer,ty=Float,tab=String,tiy=Float
a=wye,b=pan,i=5,y=0.8636244699032729,ab=wyepan,iy=5.863624469903273,ta=String,tb=String,ti=Integer,ty=Float,tab=String,tiy=Float
</pre></div>
</div>
<p>Run as-is, then pipe to Miller for pretty-printing:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ ruby -I./polyglot-dkvp-io polyglot-dkvp-io/example.rb data/small | mlr --opprint cat
a b i y ab iy ta tb ti ty tab tiy
pan pan 1 0.7268028627434533 panpan 1.7268028627434533 String String Integer Float String Float
eks pan 2 0.5221511083334797 ekspan 2.5221511083334796 String String Integer Float String Float
wye wye 3 0.33831852551664776 wyewye 3.3383185255166477 String String Integer Float String Float
eks wye 4 0.13418874328430463 ekswye 4.134188743284304 String String Integer Float String Float
wye pan 5 0.8636244699032729 wyepan 5.863624469903273 String String Integer Float String Float
</pre></div>
</div>
</div>
<div class="section" id="sql-output-examples">
<h2>SQL-output examples<a class="headerlink" href="#sql-output-examples" title="Permalink to this headline"></a></h2>
<p>Please see <a class="reference internal" href="sql-examples.html#sql-output-examples"><span class="std std-ref">SQL-output examples</span></a>.</p>
</div>
<div class="section" id="sql-input-examples">
<h2>SQL-input examples<a class="headerlink" href="#sql-input-examples" title="Permalink to this headline"></a></h2>
<p>Please see <a class="reference internal" href="sql-examples.html#sql-input-examples"><span class="std std-ref">SQL-input examples</span></a>.</p>
</div>
<div class="section" id="running-shell-commands">
<h2>Running shell commands<a class="headerlink" href="#running-shell-commands" title="Permalink to this headline"></a></h2>
<p>The <a class="reference internal" href="reference-dsl.html#reference-dsl-system"><span class="std std-ref">system</span></a> DSL function runs a shell command and puts its output, minus the final newline, into a record field. The command can be any string: a literal, or a concatenation of strings that may include field values or built-in variables such as <code class="docutils literal notranslate"><span class="pre">NR</span></code>.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --opprint put &#39;$o = system(&quot;echo hello world&quot;)&#39; data/small
a b i x y o
pan pan 1 0.3467901443380824 0.7268028627434533 hello world
eks pan 2 0.7586799647899636 0.5221511083334797 hello world
wye wye 3 0.20460330576630303 0.33831852551664776 hello world
eks wye 4 0.38139939387114097 0.13418874328430463 hello world
wye pan 5 0.5732889198020006 0.8636244699032729 hello world
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --opprint put &#39;$o = system(&quot;echo {&quot; . NR . &quot;}&quot;)&#39; data/small
a b i x y o
pan pan 1 0.3467901443380824 0.7268028627434533 {1}
eks pan 2 0.7586799647899636 0.5221511083334797 {2}
wye wye 3 0.20460330576630303 0.33831852551664776 {3}
eks wye 4 0.38139939387114097 0.13418874328430463 {4}
wye pan 5 0.5732889198020006 0.8636244699032729 {5}
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --opprint put &#39;$o = system(&quot;echo -n &quot;.$a.&quot;| sha1sum&quot;)&#39; data/small
a b i x y o
pan pan 1 0.3467901443380824 0.7268028627434533 f29c748220331c273ef16d5115f6ecd799947f13 -
eks pan 2 0.7586799647899636 0.5221511083334797 456d988ecb3bf1b75f057fc6e9fe70db464e9388 -
wye wye 3 0.20460330576630303 0.33831852551664776 eab0de043d67f441c7fd1e335f0ca38708e6ebf7 -
eks wye 4 0.38139939387114097 0.13418874328430463 456d988ecb3bf1b75f057fc6e9fe70db464e9388 -
wye pan 5 0.5732889198020006 0.8636244699032729 eab0de043d67f441c7fd1e335f0ca38708e6ebf7 -
</pre></div>
</div>
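<p>Note that running a subprocess on every record takes a non-trivial amount of time. Compare asking the system <code class="docutils literal notranslate"><span class="pre">date</span></code> command for the current time, with nanosecond resolution, against computing the time in-process with <code class="docutils literal notranslate"><span class="pre">systime</span></code>; the <code class="docutils literal notranslate"><span class="pre">t_delta</span></code> column shows the seconds elapsed between consecutive records:</p>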
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --opprint put &#39;$t=system(&quot;date +%s.%N&quot;)&#39; then step -a delta -f t data/small
a b i x y t t_delta
pan pan 1 0.3467901443380824 0.7268028627434533 1568774318.513903817 0
eks pan 2 0.7586799647899636 0.5221511083334797 1568774318.514722876 0.000819
wye wye 3 0.20460330576630303 0.33831852551664776 1568774318.515618046 0.000895
eks wye 4 0.38139939387114097 0.13418874328430463 1568774318.516547441 0.000929
wye pan 5 0.5732889198020006 0.8636244699032729 1568774318.517518828 0.000971
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --opprint put &#39;$t=systime()&#39; then step -a delta -f t data/small
a b i x y t t_delta
pan pan 1 0.3467901443380824 0.7268028627434533 1568774318.518699 0
eks pan 2 0.7586799647899636 0.5221511083334797 1568774318.518717 0.000018
wye wye 3 0.20460330576630303 0.33831852551664776 1568774318.518723 0.000006
eks wye 4 0.38139939387114097 0.13418874328430463 1568774318.518727 0.000004
wye pan 5 0.5732889198020006 0.8636244699032729 1568774318.518730 0.000003
</pre></div>
</div>
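<p>When the per-record work is something a general-purpose language can do in-process, one way to avoid the subprocess cost is to do it on the other side of a DKVP pipe instead. The following is a minimal Python sketch of the SHA-1 example above, not part of Miller: the helper names (<code class="docutils literal notranslate"><span class="pre">dkvp_to_dict</span></code>, <code class="docutils literal notranslate"><span class="pre">dict_to_dkvp</span></code>, <code class="docutils literal notranslate"><span class="pre">add_sha1</span></code>, <code class="docutils literal notranslate"><span class="pre">process</span></code>) are hypothetical, and it assumes the default <code class="docutils literal notranslate"><span class="pre">=</span></code> and <code class="docutils literal notranslate"><span class="pre">,</span></code> separators with no type inference.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre>
```python
import hashlib


def dkvp_to_dict(line, ips="=", ifs=","):
    """Split one DKVP line into a dict of strings (no type inference)."""
    return dict(pair.split(ips, 1) for pair in line.split(ifs))


def dict_to_dkvp(record, ops="=", ofs=","):
    """Join a dict back into one DKVP line."""
    return ofs.join(str(k) + ops + str(v) for k, v in record.items())


def add_sha1(record, field="a", out="o"):
    """Hash one field in-process rather than spawning sha1sum per record."""
    record[out] = hashlib.sha1(record[field].encode("utf-8")).hexdigest()
    return record


def process(lines):
    """Yield one output DKVP line per non-blank input line."""
    for line in lines:
        line = line.rstrip("\n")
        if line:
            yield dict_to_dkvp(add_sha1(dkvp_to_dict(line)))


# Small in-memory demonstration; a real driver would pass sys.stdin instead.
for out_line in process(["a=pan,b=pan,i=1", "a=eks,b=pan,i=2"]):
    print(out_line)
```
</pre></div>
</div>
<p>A driver script that feeds <code class="docutils literal notranslate"><span class="pre">sys.stdin</span></code> to <code class="docutils literal notranslate"><span class="pre">process</span></code> could then sit between two Miller invocations, as in the Python and Ruby examples earlier on this page. Note that <code class="docutils literal notranslate"><span class="pre">hexdigest</span></code> returns only the hash, without the trailing <code class="docutils literal notranslate"><span class="pre">-</span></code> marker that <code class="docutils literal notranslate"><span class="pre">sha1sum</span></code> prints.</p>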
</div>
</div>
<div class="clearer"></div>
</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Mixing with other languages</a><ul>
<li><a class="reference internal" href="#dkvp-i-o-in-python">DKVP I/O in Python</a></li>
<li><a class="reference internal" href="#dkvp-i-o-in-ruby">DKVP I/O in Ruby</a></li>
<li><a class="reference internal" href="#sql-output-examples">SQL-output examples</a></li>
<li><a class="reference internal" href="#sql-input-examples">SQL-input examples</a></li>
<li><a class="reference internal" href="#running-shell-commands">Running shell commands</a></li>
</ul>
</li>
</ul>
<h4>Previous topic</h4>
<p class="topless"><a href="cookbook3.html"
title="previous chapter">Cookbook part 3: Stats with and without out-of-stream variables</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="reference.html"
title="next chapter">Main reference</a></p>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="_sources/data-sharing.rst.txt"
rel="nofollow">Show Source</a></li>
</ul>
</div>
<div id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="search.html" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" />
<input type="submit" value="Go" />
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
>index</a></li>
<li class="right" >
<a href="reference.html" title="Main reference"
>next</a> |</li>
<li class="right" >
<a href="cookbook3.html" title="Cookbook part 3: Stats with and without out-of-stream variables"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Mixing with other languages</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2020, John Kerl.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 3.2.1.
</div>
</body>
</html>

View file

@ -1,106 +0,0 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Why call it Miller? &#8212; Miller 5.10.2 documentation</title>
<link rel="stylesheet" href="_static/classic.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/language_data.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="How original is Miller?" href="originality.html" />
<link rel="prev" title="Why?" href="why.html" />
</head><body>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="originality.html" title="How original is Miller?"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="why.html" title="Why?"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Why call it Miller?</a></li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="why-call-it-miller">
<h1>Why call it Miller?<a class="headerlink" href="#why-call-it-miller" title="Permalink to this headline"></a></h1>
<p>The Unix toolkit was created in the <strong>1970s</strong> and is a mainstay to this day. Miller is written in plain C, and its look and feel adheres closely to the <a class="reference external" href="http://en.wikipedia.org/wiki/Unix_philosophy">classic toolkit style</a>: if this were music, Miller would be a <strong>tribute album</strong>. Likewise, since commands are subcommands of the <code class="docutils literal notranslate"><span class="pre">mlr</span></code> executable, the result is a <strong>band</strong>, if you will, of command-line tools. Put these together and the namesake is another classic product of the 1970s: the <a class="reference external" href="http://en.wikipedia.org/wiki/Steve%5fMiller%5fBand">Steve Miller Band</a>.</p>
<p>(Additionally, and far more prosaically … just as a miller is someone who grinds and mixes grain into flour to extend its usefulness, Miller grinds and mixes data for you.)</p>
</div>
<div class="clearer"></div>
</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h4>Previous topic</h4>
<p class="topless"><a href="why.html"
title="previous chapter">Why?</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="originality.html"
title="next chapter">How original is Miller?</a></p>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="_sources/etymology.rst.txt"
rel="nofollow">Show Source</a></li>
</ul>
</div>
<div id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="search.html" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" />
<input type="submit" value="Go" />
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
>index</a></li>
<li class="right" >
<a href="originality.html" title="How original is Miller?"
>next</a> |</li>
<li class="right" >
<a href="why.html" title="Why?"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Why call it Miller?</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2020, John Kerl.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 3.2.1.
</div>
</body>
</html>

View file

@ -1,671 +0,0 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>FAQ &#8212; Miller 5.10.2 documentation</title>
<link rel="stylesheet" href="_static/classic.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/language_data.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="SQL examples" href="sql-examples.html" />
<link rel="prev" title="Contact" href="contact.html" />
</head><body>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="sql-examples.html" title="SQL examples"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="contact.html" title="Contact"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">FAQ</a></li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="faq">
<h1>FAQ<a class="headerlink" href="#faq" title="Permalink to this headline"></a></h1>
<div class="section" id="no-output-at-all">
<h2>No output at all<a class="headerlink" href="#no-output-at-all" title="Permalink to this headline"></a></h2>
<p>Try <code class="docutils literal notranslate"><span class="pre">od</span> <span class="pre">-xcv</span></code> and/or <code class="docutils literal notranslate"><span class="pre">cat</span> <span class="pre">-e</span></code> on your file to check for non-printable characters.</p>
<p>If you're using a Miller version older than 5.0.0, the release in which the line-ending-autodetect feature was introduced (try <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">--version</span></code> on your system to find out), please see <a class="reference external" href="http://johnkerl.org/miller-releases/miller-4.5.0/doc/index.html">http://johnkerl.org/miller-releases/miller-4.5.0/doc/index.html</a>.</p>
</div>
<div class="section" id="fields-not-selected">
<h2>Fields not selected<a class="headerlink" href="#fields-not-selected" title="Permalink to this headline"></a></h2>
<p>Check the field separators of the data, e.g. with the command-line <code class="docutils literal notranslate"><span class="pre">head</span></code> program. Example: for CSV, Miller's default field separator is comma; if your data is tab-delimited, e.g. <code class="docutils literal notranslate"><span class="pre">aTABbTABc</span></code>, then Miller won't find three fields named <code class="docutils literal notranslate"><span class="pre">a</span></code>, <code class="docutils literal notranslate"><span class="pre">b</span></code>, and <code class="docutils literal notranslate"><span class="pre">c</span></code> but rather just one named <code class="docutils literal notranslate"><span class="pre">aTABbTABc</span></code>. Solution in this case: <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">--fs</span> <span class="pre">tab</span> <span class="pre">{remaining</span> <span class="pre">arguments</span> <span class="pre">...}</span></code>.</p>
<p>Also try <code class="docutils literal notranslate"><span class="pre">od</span> <span class="pre">-xcv</span></code> and/or <code class="docutils literal notranslate"><span class="pre">cat</span> <span class="pre">-e</span></code> on your file to check for non-printable characters.</p>
</div>
<div class="section" id="diagnosing-delimiter-specifications">
<h2>Diagnosing delimiter specifications<a class="headerlink" href="#diagnosing-delimiter-specifications" title="Permalink to this headline"></a></h2>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span># Use the `file` command to see if there are CR/LF terminators (in this case,
# there are not):
$ file data/colours.csv
data/colours.csv: UTF-8 Unicode text
# Look at the file to find names of fields
$ cat data/colours.csv
KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR
masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Wit;Biały;Alb;Beyaz
masterdata_colourcode_2;Schwarz;Black;Negro;Musta;Noir;Nero;Zwart;Czarny;Negru;Siyah
# Extract a few fields:
$ mlr --csv cut -f KEY,PL,RO data/colours.csv
(only blank lines appear)
# Use XTAB output format to get a sharper picture of where records/fields
# are being split:
$ mlr --icsv --oxtab cat data/colours.csv
KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Wit;Biały;Alb;Beyaz
KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR masterdata_colourcode_2;Schwarz;Black;Negro;Musta;Noir;Nero;Zwart;Czarny;Negru;Siyah
# Using XTAB output format makes it clearer that KEY;DE;...;RO;TR is being
# treated as a single field name in the CSV header, and likewise each
# subsequent line is being treated as a single field value. This is because
# the default field separator is a comma but we have semicolons here.
# Use XTAB again with a different input field separator (--ifs semicolon):
$ mlr --icsv --ifs semicolon --oxtab cat data/colours.csv
KEY masterdata_colourcode_1
DE Weiß
EN White
ES Blanco
FI Valkoinen
FR Blanc
IT Bianco
NL Wit
PL Biały
RO Alb
TR Beyaz
KEY masterdata_colourcode_2
DE Schwarz
EN Black
ES Negro
FI Musta
FR Noir
IT Nero
NL Zwart
PL Czarny
RO Negru
TR Siyah
# Using the new field-separator, retry the cut:
$ mlr --csv --fs semicolon cut -f KEY,PL,RO data/colours.csv
KEY;PL;RO
masterdata_colourcode_1;Biały;Alb
masterdata_colourcode_2;Czarny;Negru
</pre></div>
</div>
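<p>As a rough cross-check before reaching for <code class="docutils literal notranslate"><span class="pre">--fs</span></code> or <code class="docutils literal notranslate"><span class="pre">--ifs</span></code>, you can count candidate separators in the header line. The following is a minimal Python sketch, not part of Miller; the helper name <code class="docutils literal notranslate"><span class="pre">guess_field_separator</span></code> and the candidate list are assumptions made for illustration.</p>

```python
from collections import Counter


def guess_field_separator(header_line, candidates=",;\t|"):
    """Hypothetical helper: return the candidate separator that occurs
    most often in the header line, or None if none of them occurs."""
    counts = Counter(ch for ch in header_line if ch in candidates)
    if not counts:
        return None  # single-column data, or a separator not in the list
    return counts.most_common(1)[0][0]


header = "KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR"
print(repr(guess_field_separator(header)))
```

<p>This is only a heuristic: a header whose field names themselves contain commas or semicolons can fool it, so confirm the result with the XTAB output as shown above.</p>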
</div>
<div class="section" id="how-do-i-suppress-numeric-conversion">
<h2>How do I suppress numeric conversion?<a class="headerlink" href="#how-do-i-suppress-numeric-conversion" title="Permalink to this headline"></a></h2>
<p><strong>TL;DR use put -S</strong>.</p>
<p>Within <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">put</span></code> and <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">filter</span></code>, the default behavior for scanning input records is to parse them as integer, if possible, then as float, if possible, else leave them as string:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ cat data/scan-example-1.tbl
value
1
2.0
3x
hello
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --pprint put &#39;$copy = $value; $type = typeof($value)&#39; data/scan-example-1.tbl
value copy type
1 1 int
2.0 2.000000 float
3x 3x string
hello hello string
</pre></div>
</div>
<p>The numeric-conversion rule is simple:</p>
<ul class="simple">
<li><p>Try to scan as integer (<code class="docutils literal notranslate"><span class="pre">&quot;1&quot;</span></code> should be int);</p></li>
<li><p>If that doesn't succeed, try to scan as float (<code class="docutils literal notranslate"><span class="pre">&quot;1.0&quot;</span></code> should be float);</p></li>
<li><p>If that doesn't succeed, leave the value as a string (<code class="docutils literal notranslate"><span class="pre">&quot;1x&quot;</span></code> is string).</p></li>
</ul>
<p>This is a sensible default: you should be able to put <code class="docutils literal notranslate"><span class="pre">'$z</span> <span class="pre">=</span> <span class="pre">$x</span> <span class="pre">+</span> <span class="pre">$y'</span></code> without having to write <code class="docutils literal notranslate"><span class="pre">'$z</span> <span class="pre">=</span> <span class="pre">int($x)</span> <span class="pre">+</span> <span class="pre">float($y)'</span></code>. Also note that the default output format for floating-point numbers created by <code class="docutils literal notranslate"><span class="pre">put</span></code> (and other verbs such as <code class="docutils literal notranslate"><span class="pre">stats1</span></code>) is six decimal places; you can override this using <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">--ofmt</span></code>. Also note that Miller uses your system's C library functions whenever possible: e.g. <code class="docutils literal notranslate"><span class="pre">sscanf</span></code> for converting strings to integer or floating-point.</p>
<p>But now suppose you have data like these:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ cat data/scan-example-2.tbl
value
0001
0002
0005
0005WA
0006
0007
0007WA
0008
0009
0010
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --pprint put &#39;$copy = $value; $type = typeof($value)&#39; data/scan-example-2.tbl
value copy type
0001 1 int
0002 2 int
0005 5 int
0005WA 0005WA string
0006 6 int
0007 7 int
0007WA 0007WA string
0008 8.000000 float
0009 9.000000 float
0010 8 int
</pre></div>
</div>
<p>The same conversion rules as above are being used. Namely:</p>
<ul class="simple">
<li><p>By default, field values are inferred as int, else float, else string;</p></li>
<li><p>Leading zeroes indicate octal for integers (<code class="docutils literal notranslate"><span class="pre">sscanf</span></code> semantics), which is why <code class="docutils literal notranslate"><span class="pre">0010</span></code> becomes 8;</p></li>
<li><p>Since <code class="docutils literal notranslate"><span class="pre">0008</span></code> doesn't scan as integer (the leading 0 requests octal but 8 isn't a valid octal digit), the float scan is tried next and it succeeds;</p></li>
<li><p>The default floating-point output format is 6 decimal places (override with <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">--ofmt</span></code>).</p></li>
</ul>
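<p>The rules above can be modeled in a few lines of Python. This is an illustrative sketch only, not Miller's implementation: the function name <code class="docutils literal notranslate"><span class="pre">infer</span></code> and the two regular expressions are assumptions, hexadecimal input is left out, and Python's <code class="docutils literal notranslate"><span class="pre">float</span></code> accepts a few spellings that C's <code class="docutils literal notranslate"><span class="pre">sscanf</span></code> may not.</p>

```python
import re

# Assumed patterns for this sketch: a leading zero followed by octal
# digits is octal; otherwise a plain decimal integer.
OCTAL = re.compile(r"[+-]?0[0-7]+")
DECIMAL = re.compile(r"[+-]?(0|[1-9][0-9]*)")


def infer(text):
    """Return (type_name, value) using int-then-float-then-string order."""
    if OCTAL.fullmatch(text):
        return "int", int(text, 8)
    if DECIMAL.fullmatch(text):
        return "int", int(text, 10)
    try:
        return "float", float(text)
    except ValueError:
        return "string", text


for value in ["0001", "0005WA", "0008", "0010"]:
    print(value, *infer(value))
```

<p>In this model <code class="docutils literal notranslate"><span class="pre">0008</span></code> fails both integer patterns and falls through to the float scan, while <code class="docutils literal notranslate"><span class="pre">0010</span></code> is read as octal, mirroring the table above.</p>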
<p>Taken individually the rules make sense; taken collectively they produce a mishmash of types here.</p>
<p>The solution is to <strong>use the -S flag</strong> for <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">put</span></code> and/or <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">filter</span></code>. Then all field values are left as string. You can type-coerce on demand using syntax like <code class="docutils literal notranslate"><span class="pre">'$z</span> <span class="pre">=</span> <span class="pre">int($x)</span> <span class="pre">+</span> <span class="pre">float($y)'</span></code>. (See also <a class="reference internal" href="reference-dsl.html"><span class="doc">DSL reference</span></a>; see also <a class="reference external" href="https://github.com/johnkerl/miller/issues/150">https://github.com/johnkerl/miller/issues/150</a>.)</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --pprint put -S &#39;$copy = $value; $type = typeof($value)&#39; data/scan-example-2.tbl
value copy type
0001 0001 string
0002 0002 string
0005 0005 string
0005WA 0005WA string
0006 0006 string
0007 0007 string
0007WA 0007WA string
0008 0008 string
0009 0009 string
0010 0010 string
</pre></div>
</div>
</div>
<div class="section" id="how-do-i-examine-then-chaining">
<h2>How do I examine then-chaining?<a class="headerlink" href="#how-do-i-examine-then-chaining" title="Permalink to this headline"></a></h2>
<p>Miller's then-chaining is intended to function the same as Unix pipes, but with fewer keystrokes. You can print your data one pipeline step at a time, to see how the intermediate output of one step becomes the input to the next.</p>
<p>First, look at the input data:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ cat data/then-example.csv
Status,Payment_Type,Amount
paid,cash,10.00
pending,debit,20.00
paid,cash,50.00
pending,credit,40.00
paid,debit,30.00
</pre></div>
</div>
<p>Next, run the first step of your command, omitting anything from the first <code class="docutils literal notranslate"><span class="pre">then</span></code> onward:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --icsv --opprint count-distinct -f Status,Payment_Type data/then-example.csv
Status Payment_Type count
paid cash 2
pending debit 1
pending credit 1
paid debit 1
</pre></div>
</div>
<p>After that, run it with the next <code class="docutils literal notranslate"><span class="pre">then</span></code> step included:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --icsv --opprint count-distinct -f Status,Payment_Type then sort -nr count data/then-example.csv
Status Payment_Type count
paid cash 2
pending debit 1
pending credit 1
paid debit 1
</pre></div>
</div>
<p>Now if you use <code class="docutils literal notranslate"><span class="pre">then</span></code> to include another verb after that, the columns <code class="docutils literal notranslate"><span class="pre">Status</span></code>, <code class="docutils literal notranslate"><span class="pre">Payment_Type</span></code>, and <code class="docutils literal notranslate"><span class="pre">count</span></code> will be the input to that verb.</p>
<p>Note, by the way, that you'll get the same results using pipes:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --csv count-distinct -f Status,Payment_Type data/then-example.csv | mlr --icsv --opprint sort -nr count
Status Payment_Type count
paid cash 2
pending debit 1
pending credit 1
paid debit 1
</pre></div>
</div>
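<p>The same step-by-step inspection can be pictured as plain function composition. The Python sketch below is an analogy rather than Miller code; the function names <code class="docutils literal notranslate"><span class="pre">count_distinct</span></code> and <code class="docutils literal notranslate"><span class="pre">sort_by_count_descending</span></code> are made up for illustration, and the rows are the ones from the example file.</p>

```python
from collections import Counter

rows = [
    ("paid", "cash"),
    ("pending", "debit"),
    ("paid", "cash"),
    ("pending", "credit"),
    ("paid", "debit"),
]


def count_distinct(pairs):
    # Counter keeps keys in first-seen order (Python 3.7+).
    return [(status, kind, n) for (status, kind), n in Counter(pairs).items()]


def sort_by_count_descending(records):
    # sorted() is stable, so records with equal counts keep their order.
    return sorted(records, key=lambda rec: -rec[2])


step1 = count_distinct(rows)               # inspect this first
step2 = sort_by_count_descending(step1)    # then add the next step
for rec in step2:
    print(*rec)
```

<p>Printing <code class="docutils literal notranslate"><span class="pre">step1</span></code> before adding the second function plays the role of running the command with everything from the first <code class="docutils literal notranslate"><span class="pre">then</span></code> onward removed.</p>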
</div>
<div class="section" id="i-assigned-9-and-it-s-not-9th">
<h2>I assigned $9 and it's not 9th<a class="headerlink" href="#i-assigned-9-and-it-s-not-9th" title="Permalink to this headline"></a></h2>
<p>Miller records are ordered lists of key-value pairs. For NIDX format, DKVP format when keys are missing, or CSV/CSV-lite format with <code class="docutils literal notranslate"><span class="pre">--implicit-csv-header</span></code>, Miller will sequentially assign keys of the form <code class="docutils literal notranslate"><span class="pre">1</span></code>, <code class="docutils literal notranslate"><span class="pre">2</span></code>, etc. But these are not integer array indices: they're just field names taken from the initial field ordering in the input data.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ echo x,y,z | mlr --dkvp cat
1=x,2=y,3=z
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ echo x,y,z | mlr --dkvp put &#39;$6=&quot;a&quot;;$4=&quot;b&quot;;$55=&quot;cde&quot;&#39;
1=x,2=y,3=z,6=a,4=b,55=cde
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ echo x,y,z | mlr --nidx cat
x,y,z
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ echo x,y,z | mlr --csv --implicit-csv-header cat
1,2,3
x,y,z
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ echo x,y,z | mlr --dkvp rename 2,999
1=x,999=y,3=z
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ echo x,y,z | mlr --dkvp rename 2,newname
1=x,newname=y,3=z
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ echo x,y,z | mlr --csv --implicit-csv-header reorder -f 3,1,2
3,1,2
z,x,y
</pre></div>
</div>
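<p>One way to picture this is an insertion-ordered map whose keys happen to look like numbers. The following Python sketch is only an analogy (Miller does not use Python dictionaries), but it reproduces the ordering seen in the <code class="docutils literal notranslate"><span class="pre">put</span></code> example above.</p>

```python
# Keys are strings, not positions: "6" is just a name that sorts nowhere.
record = {"1": "x", "2": "y", "3": "z"}

# New keys are appended in assignment order, regardless of their spelling.
record["6"] = "a"
record["4"] = "b"
record["55"] = "cde"

dkvp = ",".join(f"{key}={value}" for key, value in record.items())
print(dkvp)
```

<p>Assigning to key <code class="docutils literal notranslate"><span class="pre">6</span></code> does not create or skip positions 4 and 5; it simply adds one more named field at the end.</p>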
</div>
<div class="section" id="how-can-i-filter-by-date">
<h2>How can I filter by date?<a class="headerlink" href="#how-can-i-filter-by-date" title="Permalink to this headline"></a></h2>
<p>Given input like</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ cat dates.csv
date,event
2018-02-03,initialization
2018-03-07,discovery
2018-02-03,allocation
</pre></div>
</div>
<p>we can use <code class="docutils literal notranslate"><span class="pre">strptime</span></code> to parse the date field into seconds-since-epoch and then do numeric comparisons. Simply match your input dataset's date-formatting to the <a class="reference internal" href="reference-dsl.html#reference-dsl-strptime"><span class="std std-ref">strptime</span></a> format-string. For example:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --csv filter &#39;strptime($date, &quot;%Y-%m-%d&quot;) &gt; strptime(&quot;2018-03-03&quot;, &quot;%Y-%m-%d&quot;)&#39; dates.csv
date,event
2018-03-07,discovery
</pre></div>
</div>
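<p>The underlying idea, parsing each date to seconds-since-epoch and comparing numbers, can be sketched in Python as follows. This is an illustration rather than Miller's code: the helper name <code class="docutils literal notranslate"><span class="pre">epoch_seconds</span></code> is made up, and the sketch assumes the dates are to be interpreted as UTC.</p>

```python
import calendar
import time


def epoch_seconds(text, fmt="%Y-%m-%d"):
    """Parse text with a strptime format and return UTC epoch seconds."""
    return calendar.timegm(time.strptime(text, fmt))


rows = [
    ("2018-02-03", "initialization"),
    ("2018-03-07", "discovery"),
    ("2018-02-03", "allocation"),
]

cutoff = epoch_seconds("2018-03-03")
kept = [row for row in rows if epoch_seconds(row[0]) > cutoff]
print(kept)
```

<p>Because both sides of the comparison go through the same format string, a mismatch between the format and the data shows up as a parse error rather than as a silently wrong comparison.</p>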
<p>Caveat: localtime-handling in timezones with DST is still a work in progress; see <a class="reference external" href="https://github.com/johnkerl/miller/issues/170">https://github.com/johnkerl/miller/issues/170</a>. See also <a class="reference external" href="https://github.com/johnkerl/miller/issues/208">https://github.com/johnkerl/miller/issues/208</a> (thanks &#64;aborruso!).</p>
</div>
<div class="section" id="how-can-i-handle-commas-as-data-in-various-formats">
<h2>How can I handle commas-as-data in various formats?<a class="headerlink" href="#how-can-i-handle-commas-as-data-in-various-formats" title="Permalink to this headline"></a></h2>
<p><a class="reference internal" href="file-formats.html"><span class="doc">CSV</span></a> handles this well and by design:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ cat commas.csv
Name,Role
&quot;Xiao, Lin&quot;,administrator
&quot;Khavari, Darius&quot;,tester
</pre></div>
</div>
<p>Likewise <a class="reference internal" href="file-formats.html#file-formats-json"><span class="std std-ref">Tabular JSON</span></a>:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --icsv --ojson cat commas.csv
{ &quot;Name&quot;: &quot;Xiao, Lin&quot;, &quot;Role&quot;: &quot;administrator&quot; }
{ &quot;Name&quot;: &quot;Khavari, Darius&quot;, &quot;Role&quot;: &quot;tester&quot; }
</pre></div>
</div>
<p>For Miller's <a class="reference internal" href="file-formats.html#file-formats-xtab"><span class="std std-ref">vertical-tabular format</span></a> there is no escaping for carriage returns, but commas work fine:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --icsv --oxtab cat commas.csv
Name Xiao, Lin
Role administrator
Name Khavari, Darius
Role tester
</pre></div>
</div>
<p>But for <a class="reference internal" href="file-formats.html#file-formats-dkvp"><span class="std std-ref">Key-value pairs</span></a> and <a class="reference internal" href="file-formats.html#file-formats-nidx"><span class="std std-ref">index-numbered</span></a> formats, commas are the default field separator, and as of Miller 5.4.0 there is no CSV-style double-quote handling for them. So commas within the data look like delimiters:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --icsv --odkvp cat commas.csv
Name=Xiao, Lin,Role=administrator
Name=Khavari, Darius,Role=tester
</pre></div>
</div>
<p>One solution is to use a different delimiter, such as a pipe character:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --icsv --odkvp --ofs pipe cat commas.csv
Name=Xiao, Lin|Role=administrator
Name=Khavari, Darius|Role=tester
</pre></div>
</div>
<p>To be extra-sure to avoid data/delimiter clashes, you can also use control
characters as delimiters; here, control-A:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --icsv --odkvp --ofs &#39;\001&#39; cat commas.csv | cat -v
Name=Xiao, Lin^ARole=administrator
Name=Khavari, Darius^ARole=tester
</pre></div>
</div>
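<p>The difference between a quote-aware format and a plain delimited one can also be seen outside Miller. The Python sketch below uses the standard <code class="docutils literal notranslate"><span class="pre">csv</span></code> module to read the quoted input and then writes key-value pairs with a pipe separator; it is an illustration of the idea, not a substitute for the commands above.</p>

```python
import csv
import io

csv_text = 'Name,Role\n"Xiao, Lin",administrator\n"Khavari, Darius",tester\n'

# The CSV reader honors double quotes, so the embedded commas stay data.
reader = csv.DictReader(io.StringIO(csv_text))

# Key-value output has no quoting here, so choose a separator that
# does not occur in the data.
lines = ["|".join(f"{key}={value}" for key, value in row.items()) for row in reader]
for line in lines:
    print(line)
```

<p>The choice of separator only needs to avoid characters present in the data, which is why a control character is the safest option.</p>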
</div>
<div class="section" id="how-can-i-handle-field-names-with-special-symbols-in-them">
<h2>How can I handle field names with special symbols in them?<a class="headerlink" href="#how-can-i-handle-field-names-with-special-symbols-in-them" title="Permalink to this headline"></a></h2>
<p>Simply surround the field names with curly braces:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ echo &#39;x.a=3,y:b=4,z/c=5&#39; | mlr put &#39;${product.all} = ${x.a} * ${y:b} * ${z/c}&#39;
x.a=3,y:b=4,z/c=5,product.all=60
</pre></div>
</div>
</div>
<div class="section" id="how-to-escape-in-regexes">
<h2>How to escape '?' in regexes?<a class="headerlink" href="#how-to-escape-in-regexes" title="Permalink to this headline"></a></h2>
<p>One way is to use square brackets; an alternative is to use simple string-substitution rather than a regular expression.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ cat data/question.dat
a=is it?,b=it is!
$ mlr --oxtab put &#39;$c = gsub($a, &quot;[?]&quot;,&quot; ...&quot;)&#39; data/question.dat
a is it?
b it is!
c is it ...
$ mlr --oxtab put &#39;$c = ssub($a, &quot;?&quot;,&quot; ...&quot;)&#39; data/question.dat
a is it?
b it is!
c is it ...
</pre></div>
</div>
<p>The <code class="docutils literal notranslate"><span class="pre">ssub</span></code> function exists precisely for this reason: so you don't have to escape anything.</p>
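<p>The same distinction between regex substitution and literal substitution exists in most languages. As a point of comparison, here is a short Python sketch (Python's <code class="docutils literal notranslate"><span class="pre">re</span></code> module, not Miller's regex library) showing a bracket expression, an escaped pattern, and a plain string replacement giving the same result.</p>

```python
import re

text = "is it?"

# A bracket expression makes "?" a literal character in the regex.
with_class = re.sub("[?]", " ...", text)

# re.escape builds a pattern that matches the string literally.
with_escape = re.sub(re.escape("?"), " ...", text)

# Plain string replacement involves no regex at all.
plain = text.replace("?", " ...")

print(with_class, with_escape, plain, sep="\n")
```

<p>The plain replacement plays the role that <code class="docutils literal notranslate"><span class="pre">ssub</span></code> plays in Miller.</p>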
</div>
<div class="section" id="how-can-i-put-single-quotes-into-strings">
<h2>How can I put single-quotes into strings?<a class="headerlink" href="#how-can-i-put-single-quotes-into-strings" title="Permalink to this headline"></a></h2>
<p>This is a little tricky due to the shell's handling of quotes. For simplicity, let's first put an update script into a file:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$a = &quot;It&#39;s OK, I said, then &#39;for now&#39;.&quot;
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ echo a=bcd | mlr put -f data/single-quote-example.mlr
a=It&#39;s OK, I said, then &#39;for now&#39;.
</pre></div>
</div>
<p>So, it's simple: Miller's DSL uses double quotes for strings, and you can put single quotes (or backslash-escaped double-quotes) inside strings, no problem.</p>
<p>Without putting the update expression in a file, it's messier:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ echo a=bcd | mlr put &#39;$a=&quot;It&#39;\&#39;&#39;s OK, I said, &#39;\&#39;&#39;for now&#39;\&#39;&#39;.&quot;&#39;
a=It&#39;s OK, I said, &#39;for now&#39;.
</pre></div>
</div>
<p>The idea is that the outermost single-quotes are to protect the <code class="docutils literal notranslate"><span class="pre">put</span></code> expression from the shell, and the double quotes within them are for Miller. To get a single quote in the middle there, you need to actually put it <em>outside</em> the single-quoting for the shell. The pieces are the following, all concatenated together:</p>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">$a=&quot;It</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">\'</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">s</span> <span class="pre">OK,</span> <span class="pre">I</span> <span class="pre">said,</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">\'</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">for</span> <span class="pre">now</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">\'</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">.&quot;</span></code></p></li>
</ul>
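<p>To check how a POSIX shell would assemble those pieces into a single argument, you can tokenize the command word with Python's <code class="docutils literal notranslate"><span class="pre">shlex</span></code> module, which follows POSIX quoting rules by default. This is a sketch for verification only; the variable names are arbitrary, and the final piece is written with its closing double quote.</p>

```python
import shlex

# The pieces as typed at the shell prompt: single-quoted runs alternate
# with backslash-escaped single quotes that sit outside any quoting.
pieces = [
    "'$a=\"It'",
    "\\'",
    "'s OK, I said, '",
    "\\'",
    "'for now'",
    "\\'",
    "'.\"'",
]
shell_word = "".join(pieces)

# shlex.split applies POSIX quoting rules and returns the argument list.
argv = shlex.split(shell_word)
print(argv[0])
```

<p>The result is one argument, which is what <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">put</span></code> receives as its expression.</p>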
</div>
<div class="section" id="why-doesn-t-mlr-cut-put-fields-in-the-order-i-want">
<h2>Why doesn't mlr cut put fields in the order I want?<a class="headerlink" href="#why-doesn-t-mlr-cut-put-fields-in-the-order-i-want" title="Permalink to this headline"></a></h2>
<p>Example: columns <code class="docutils literal notranslate"><span class="pre">x,i,a</span></code> were requested but they appear here in the order <code class="docutils literal notranslate"><span class="pre">a,i,x</span></code>:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ cat data/small
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr cut -f x,i,a data/small
a=pan,i=1,x=0.3467901443380824
a=eks,i=2,x=0.7586799647899636
a=wye,i=3,x=0.20460330576630303
a=eks,i=4,x=0.38139939387114097
a=wye,i=5,x=0.5732889198020006
</pre></div>
</div>
<p>The issue is that Miller's <code class="docutils literal notranslate"><span class="pre">cut</span></code>, by default, outputs cut fields in the order they appear in the input data. This design decision was made intentionally to parallel the Unix/Linux system <code class="docutils literal notranslate"><span class="pre">cut</span></code> command, which has the same semantics.</p>
<p>The solution is to use the <code class="docutils literal notranslate"><span class="pre">-o</span></code> option:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr cut -o -f x,i,a data/small
x=0.3467901443380824,i=1,a=pan
x=0.7586799647899636,i=2,a=eks
x=0.20460330576630303,i=3,a=wye
x=0.38139939387114097,i=4,a=eks
x=0.5732889198020006,i=5,a=wye
</pre></div>
</div>
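<p>The two behaviors differ only in which sequence drives the loop: the record's fields or the requested names. A minimal Python model, with a hypothetical <code class="docutils literal notranslate"><span class="pre">cut</span></code> function and an <code class="docutils literal notranslate"><span class="pre">ordered</span></code> parameter standing in for <code class="docutils literal notranslate"><span class="pre">-o</span></code>, might look like this:</p>

```python
def cut(record, names, ordered=False):
    """Keep only the named fields of an insertion-ordered record."""
    if ordered:
        # Like -o: iterate over the requested names, in the order given.
        return {name: record[name] for name in names if name in record}
    # Default: iterate over the record, so input order is preserved.
    return {key: value for key, value in record.items() if key in names}


record = {
    "a": "pan",
    "b": "pan",
    "i": "1",
    "x": "0.3467901443380824",
    "y": "0.7268028627434533",
}

print(list(cut(record, ["x", "i", "a"])))
print(list(cut(record, ["x", "i", "a"], ordered=True)))
```

<p>The default needs only one pass over each record, which fits a streaming tool; the ordered variant trades that for control over column order.</p>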
</div>
<div class="section" id="nr-is-not-consecutive-after-then-chaining">
<h2>NR is not consecutive after then-chaining<a class="headerlink" href="#nr-is-not-consecutive-after-then-chaining" title="Permalink to this headline"></a></h2>
<p>Given this input data:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ cat data/small
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
</pre></div>
</div>
<p>why don't I see <code class="docutils literal notranslate"><span class="pre">NR=1</span></code> and <code class="docutils literal notranslate"><span class="pre">NR=2</span></code> here?</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr filter &#39;$x &gt; 0.5&#39; then put &#39;$NR = NR&#39; data/small
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797,NR=2
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729,NR=5
</pre></div>
</div>
<p>The reason is that <code class="docutils literal notranslate"><span class="pre">NR</span></code> is computed for the original input records and isn't dynamically updated. By contrast, <code class="docutils literal notranslate"><span class="pre">NF</span></code> is dynamically updated: it's the number of fields in the current record, and if you add/remove a field, the value of <code class="docutils literal notranslate"><span class="pre">NF</span></code> will change:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ echo x=1,y=2,z=3 | mlr put &#39;$nf1 = NF; $u = 4; $nf2 = NF; unset $x,$y,$z; $nf3 = NF&#39;
nf1=3,u=4,nf2=5,nf3=3
</pre></div>
</div>
<p><code class="docutils literal notranslate"><span class="pre">NR</span></code>, by contrast (and <code class="docutils literal notranslate"><span class="pre">FNR</span></code> as well), retains the value from the original input stream, and records may be dropped by a <code class="docutils literal notranslate"><span class="pre">filter</span></code> within a <code class="docutils literal notranslate"><span class="pre">then</span></code>-chain. To recover consecutive record numbers, you can use out-of-stream variables as follows:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --opprint --from data/small put &#39;
begin{ @nr1 = 0 }
@nr1 += 1;
$nr1 = @nr1
&#39; \
then filter &#39;$x&gt;0.5&#39; \
then put &#39;
begin{ @nr2 = 0 }
@nr2 += 1;
$nr2 = @nr2
&#39;
a b i x y nr1 nr2
eks pan 2 0.7586799647899636 0.5221511083334797 2 1
wye pan 5 0.5732889198020006 0.8636244699032729 5 2
</pre></div>
</div>
<p>Or, simply use <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">cat</span> <span class="pre">-n</span></code>:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr filter &#39;$x &gt; 0.5&#39; then cat -n data/small
n=1,a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
n=2,a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
</pre></div>
</div>
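<p>The distinction between a number assigned at read time and a number assigned after filtering can be modeled in a few lines of Python. This is an analogy using the <code class="docutils literal notranslate"><span class="pre">x</span></code> values from <code class="docutils literal notranslate"><span class="pre">data/small</span></code>; the variable names are arbitrary.</p>

```python
xs = [
    0.3467901443380824,
    0.7586799647899636,
    0.20460330576630303,
    0.38139939387114097,
    0.5732889198020006,
]

# NR-like numbering: attached when the record is read, never revised.
numbered = list(enumerate(xs, start=1))

# The filter drops records but leaves the original numbers alone.
kept = [(nr, x) for nr, x in numbered if x > 0.5]

# A fresh counter after the filter gives consecutive numbers again.
renumbered = [(nr, n) for n, (nr, _) in enumerate(kept, start=1)]
print(renumbered)
```

<p>Each pair holds the original record number followed by the post-filter count, matching the <code class="docutils literal notranslate"><span class="pre">nr1</span></code> and <code class="docutils literal notranslate"><span class="pre">nr2</span></code> columns above.</p>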
</div>
<div class="section" id="why-am-i-not-seeing-all-possible-joins-occur">
<h2>Why am I not seeing all possible joins occur?<a class="headerlink" href="#why-am-i-not-seeing-all-possible-joins-occur" title="Permalink to this headline"></a></h2>
<p><strong>This section describes behavior before Miller 5.1.0. As of 5.1.0, -u is the default.</strong></p>
<p>For example, the right file here has nine records, and the left file should add in the <code class="docutils literal notranslate"><span class="pre">hostname</span></code> column, so the join output should also have nine records:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --icsvlite --opprint cat data/join-u-left.csv
hostname ipaddr
nadir.east.our.org 10.3.1.18
zenith.west.our.org 10.3.1.27
apoapsis.east.our.org 10.4.5.94
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --icsvlite --opprint cat data/join-u-right.csv
ipaddr timestamp bytes
10.3.1.27 1448762579 4568
10.3.1.18 1448762578 8729
10.4.5.94 1448762579 17445
10.3.1.27 1448762589 12
10.3.1.18 1448762588 44558
10.4.5.94 1448762589 8899
10.3.1.27 1448762599 0
10.3.1.18 1448762598 73425
10.4.5.94 1448762599 12200
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --icsvlite --opprint join -s -j ipaddr -f data/join-u-left.csv data/join-u-right.csv
ipaddr hostname timestamp bytes
10.3.1.27 zenith.west.our.org 1448762579 4568
10.4.5.94 apoapsis.east.our.org 1448762579 17445
10.4.5.94 apoapsis.east.our.org 1448762589 8899
10.4.5.94 apoapsis.east.our.org 1448762599 12200
</pre></div>
</div>
<p>The issue is that Miller's <code class="docutils literal notranslate"><span class="pre">join</span></code>, by default (before 5.1.0), expected input sorted (lexically ascending) by the join keys on both the left and right files. This design decision was made intentionally to parallel the Unix/Linux system <code class="docutils literal notranslate"><span class="pre">join</span></code> command, which has the same semantics. The benefit of this default is that the joiner program can stream through the left and right files, needing to load neither entirely into memory. The drawback, of course, is that it requires sorted input.</p>
<p>The solution (besides pre-sorting the input files on the join keys) is to simply use <strong>mlr join -u</strong> (which is now the default). This loads the left file entirely into memory (while the right file is still streamed one line at a time) and does all possible joins without requiring sorted input:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --icsvlite --opprint join -u -j ipaddr -f data/join-u-left.csv data/join-u-right.csv
ipaddr hostname timestamp bytes
10.3.1.27 zenith.west.our.org 1448762579 4568
10.3.1.18 nadir.east.our.org 1448762578 8729
10.4.5.94 apoapsis.east.our.org 1448762579 17445
10.3.1.27 zenith.west.our.org 1448762589 12
10.3.1.18 nadir.east.our.org 1448762588 44558
10.4.5.94 apoapsis.east.our.org 1448762589 8899
10.3.1.27 zenith.west.our.org 1448762599 0
10.3.1.18 nadir.east.our.org 1448762598 73425
10.4.5.94 apoapsis.east.our.org 1448762599 12200
</pre></div>
</div>
<p>General advice is to make sure the left-file is relatively small, e.g. containing name-to-number mappings, while saving large amounts of data for the right file.</p>
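<p>The unsorted strategy described above, holding the left file in memory and streaming the right file, is essentially a hash join. The Python sketch below illustrates that idea under stated assumptions: the function name <code class="docutils literal notranslate"><span class="pre">join_unsorted</span></code> is made up, records are plain dictionaries, and only paired records are emitted. It is not Miller's implementation.</p>

```python
def join_unsorted(left_rows, right_rows, key):
    """Yield joined records without requiring either input to be sorted."""
    # Bucket the (small) left side by join key, entirely in memory.
    buckets = {}
    for left in left_rows:
        buckets.setdefault(left[key], []).append(left)
    # Stream the right side one record at a time.
    for right in right_rows:
        for left in buckets.get(right[key], []):
            # Join key first, then left fields, then right fields.
            yield {key: right[key], **left, **right}


left = [
    {"hostname": "nadir.east.our.org", "ipaddr": "10.3.1.18"},
    {"hostname": "zenith.west.our.org", "ipaddr": "10.3.1.27"},
    {"hostname": "apoapsis.east.our.org", "ipaddr": "10.4.5.94"},
]
right = [
    {"ipaddr": "10.3.1.27", "timestamp": "1448762579", "bytes": "4568"},
    {"ipaddr": "10.3.1.18", "timestamp": "1448762578", "bytes": "8729"},
    {"ipaddr": "10.4.5.94", "timestamp": "1448762579", "bytes": "17445"},
]

joined = list(join_unsorted(left, right, "ipaddr"))
for rec in joined:
    print(rec)
```

<p>Memory use grows with the left input only, which is the reason for keeping the left file small.</p>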
</div>
<div class="section" id="how-to-rectangularize-after-joins-with-unpaired">
<h2>How to rectangularize after joins with unpaired?<a class="headerlink" href="#how-to-rectangularize-after-joins-with-unpaired" title="Permalink to this headline"></a></h2>
<p>Suppose you have the following two data files:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="nb">id</span><span class="p">,</span><span class="n">code</span>
<span class="mi">3</span><span class="p">,</span><span class="mi">0000</span><span class="n">ff</span>
<span class="mi">2</span><span class="p">,</span><span class="mi">00</span><span class="n">ff00</span>
<span class="mi">4</span><span class="p">,</span><span class="n">ff0000</span>
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="nb">id</span><span class="p">,</span><span class="n">color</span>
<span class="mi">4</span><span class="p">,</span><span class="n">red</span>
<span class="mi">2</span><span class="p">,</span><span class="n">green</span>
</pre></div>
</div>
<p>Joining on <code class="docutils literal notranslate"><span class="pre">id</span></code>, the results are as expected:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --csv join -j id -f data/color-codes.csv data/color-names.csv
id,code,color
4,ff0000,red
2,00ff00,green
</pre></div>
</div>
<p>However, if we ask for left-unpaired records, since there's no <code class="docutils literal notranslate"><span class="pre">color</span></code> column for them, we get a row that doesn't have the same column names as the others:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --csv join --ul -j id -f data/color-codes.csv data/color-names.csv
id,code,color
4,ff0000,red
2,00ff00,green
id,code
3,0000ff
</pre></div>
</div>
<p>To fix this, we can use <strong>unsparsify</strong>:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --csv join --ul -j id -f data/color-codes.csv then unsparsify --fill-with &quot;&quot; data/color-names.csv
id,code,color
4,ff0000,red
2,00ff00,green
3,0000ff,
</pre></div>
</div>
<p>Thanks to &#64;aborruso for the tip!</p>
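<p>Conceptually, rectangularizing means taking the union of all field names and filling in whatever is missing. A minimal Python sketch of that idea follows; the function name and the <code class="docutils literal notranslate"><span class="pre">fill_with</span></code> parameter are chosen to echo the verb but are otherwise assumptions, and this is not Miller's code.</p>

```python
def unsparsify(records, fill_with=""):
    """Give every record the union of all field names, in first-seen order."""
    names = []
    for rec in records:
        for key in rec:
            if key not in names:
                names.append(key)
    return [{key: rec.get(key, fill_with) for key in names} for rec in records]


records = [
    {"id": "4", "code": "ff0000", "color": "red"},
    {"id": "2", "code": "00ff00", "color": "green"},
    {"id": "3", "code": "0000ff"},
]

for rec in unsparsify(records):
    print(",".join(rec.values()))
```

<p>Note that this sketch has to see every record before it can emit the first one, since a later record may introduce a new field name.</p>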
</div>
<div class="section" id="what-about-xml-or-json-file-formats">
<h2>What about XML or JSON file formats?<a class="headerlink" href="#what-about-xml-or-json-file-formats" title="Permalink to this headline"></a></h2>
<p>Miller handles <strong>tabular data</strong>, which is a list of records each having fields which are key-value pairs. Miller also doesn't require that each record have the same field names (see also <a class="reference internal" href="record-heterogeneity.html"><span class="doc">Record-heterogeneity</span></a>). Regardless, tabular data is a <strong>non-recursive data structure</strong>.</p>
<p>XML, JSON, etc. are, by contrast, all <strong>recursive</strong> or <strong>nested</strong> data structures. For example, in JSON you can represent a hash map whose values are lists of lists.</p>
<p>Now, you can put tabular data into these formats since list-of-key-value-pairs is one of the things representable in XML or JSON. Example:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="c1"># DKVP</span>
<span class="n">x</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span><span class="n">y</span><span class="o">=</span><span class="mi">2</span>
<span class="n">z</span><span class="o">=</span><span class="mi">3</span>
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="c1"># XML</span>
<span class="o">&lt;</span><span class="n">table</span><span class="o">&gt;</span>
<span class="o">&lt;</span><span class="n">record</span><span class="o">&gt;</span>
<span class="o">&lt;</span><span class="n">field</span><span class="o">&gt;</span>
<span class="o">&lt;</span><span class="n">key</span><span class="o">&gt;</span> <span class="n">x</span> <span class="o">&lt;/</span><span class="n">key</span><span class="o">&gt;</span> <span class="o">&lt;</span><span class="n">value</span><span class="o">&gt;</span> <span class="mi">1</span> <span class="o">&lt;/</span><span class="n">value</span><span class="o">&gt;</span>
<span class="o">&lt;/</span><span class="n">field</span><span class="o">&gt;</span>
<span class="o">&lt;</span><span class="n">field</span><span class="o">&gt;</span>
<span class="o">&lt;</span><span class="n">key</span><span class="o">&gt;</span> <span class="n">y</span> <span class="o">&lt;/</span><span class="n">key</span><span class="o">&gt;</span> <span class="o">&lt;</span><span class="n">value</span><span class="o">&gt;</span> <span class="mi">2</span> <span class="o">&lt;/</span><span class="n">value</span><span class="o">&gt;</span>
<span class="o">&lt;/</span><span class="n">field</span><span class="o">&gt;</span>
<span class="o">&lt;/</span><span class="n">record</span><span class="o">&gt;</span>
<span class="o">&lt;</span><span class="n">record</span><span class="o">&gt;</span>
<span class="o">&lt;</span><span class="n">field</span><span class="o">&gt;</span>
<span class="o">&lt;</span><span class="n">key</span><span class="o">&gt;</span> <span class="n">z</span> <span class="o">&lt;/</span><span class="n">key</span><span class="o">&gt;</span> <span class="o">&lt;</span><span class="n">value</span><span class="o">&gt;</span> <span class="mi">3</span> <span class="o">&lt;/</span><span class="n">value</span><span class="o">&gt;</span>
<span class="o">&lt;/</span><span class="n">field</span><span class="o">&gt;</span>
<span class="o">&lt;/</span><span class="n">record</span><span class="o">&gt;</span>
<span class="o">&lt;/</span><span class="n">table</span><span class="o">&gt;</span>
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="c1"># JSON</span>
<span class="p">[{</span><span class="s2">&quot;x&quot;</span><span class="p">:</span><span class="mi">1</span><span class="p">,</span><span class="s2">&quot;y&quot;</span><span class="p">:</span><span class="mi">2</span><span class="p">},{</span><span class="s2">&quot;z&quot;</span><span class="p">:</span><span class="mi">3</span><span class="p">}]</span>
</pre></div>
</div>
<p>However, a tool like Miller, which handles non-recursive data, is never going to be able to handle full XML/JSON semantics, only a small subset. If tabular data represented in XML/JSON/etc. is sufficiently well-structured, it may be easy to grep/sed out the data into a simpler text form; this is a general text-processing problem.</p>
<p>Miller does support tabular data represented in JSON: please see <a class="reference internal" href="file-formats.html"><span class="doc">File formats</span></a>. See also <a class="reference external" href="https://stedolan.github.io/jq/">jq</a> for a truly powerful, JSON-specific tool.</p>
<p>For XML, my suggestion is to use a tool like <a class="reference external" href="http://ff-extractor.sourceforge.net">ff-extractor</a> to do format conversion.</p>
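<p>As a rough illustration of the grep/sed approach mentioned above, the following sketch flattens the XML example from this section into DKVP lines using plain <code class="docutils literal notranslate"><span class="pre">awk</span></code>. The helper name <code class="docutils literal notranslate"><span class="pre">xml_to_dkvp</span></code> is hypothetical, and the script assumes each key/value pair sits on one line, as in the example; it is not a general XML parser.</p>

```shell
# Hypothetical helper: flatten <record>/<field>/<key>/<value> XML into DKVP lines.
# Assumes each <key>...</key> <value>...</value> pair sits on a single line.
xml_to_dkvp() {
  awk '
    /<record>/ { line = "" }
    /<key>/ {
      k = $0; sub(/.*<key>[ \t]*/, "", k); sub(/[ \t]*<\/key>.*/, "", k)
      v = $0; sub(/.*<value>[ \t]*/, "", v); sub(/[ \t]*<\/value>.*/, "", v)
      line = (line == "" ? "" : line ",") k "=" v
    }
    /<\/record>/ { print line }
  '
}

# The XML example from this section, reproduced as input.
xml_input='<table>
<record>
<field>
<key> x </key> <value> 1 </value>
</field>
<field>
<key> y </key> <value> 2 </value>
</field>
</record>
<record>
<field>
<key> z </key> <value> 3 </value>
</field>
</record>
</table>'

printf '%s\n' "$xml_input" | xml_to_dkvp
```

<p>The resulting lines are in Miller's default DKVP format, so they could be piped into <code class="docutils literal notranslate"><span class="pre">mlr</span></code> for further processing.</p>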
</div>
</div>
<div class="clearer"></div>
</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">FAQ</a><ul>
<li><a class="reference internal" href="#no-output-at-all">No output at all</a></li>
<li><a class="reference internal" href="#fields-not-selected">Fields not selected</a></li>
<li><a class="reference internal" href="#diagnosing-delimiter-specifications">Diagnosing delimiter specifications</a></li>
<li><a class="reference internal" href="#how-do-i-suppress-numeric-conversion">How do I suppress numeric conversion?</a></li>
<li><a class="reference internal" href="#how-do-i-examine-then-chaining">How do I examine then-chaining?</a></li>
<li><a class="reference internal" href="#i-assigned-9-and-it-s-not-9th">I assigned $9 and it's not 9th</a></li>
<li><a class="reference internal" href="#how-can-i-filter-by-date">How can I filter by date?</a></li>
<li><a class="reference internal" href="#how-can-i-handle-commas-as-data-in-various-formats">How can I handle commas-as-data in various formats?</a></li>
<li><a class="reference internal" href="#how-can-i-handle-field-names-with-special-symbols-in-them">How can I handle field names with special symbols in them?</a></li>
<li><a class="reference internal" href="#how-to-escape-in-regexes">How to escape ? in regexes?</a></li>
<li><a class="reference internal" href="#how-can-i-put-single-quotes-into-strings">How can I put single-quotes into strings?</a></li>
<li><a class="reference internal" href="#why-doesn-t-mlr-cut-put-fields-in-the-order-i-want">Why doesn't mlr cut put fields in the order I want?</a></li>
<li><a class="reference internal" href="#nr-is-not-consecutive-after-then-chaining">NR is not consecutive after then-chaining</a></li>
<li><a class="reference internal" href="#why-am-i-not-seeing-all-possible-joins-occur">Why am I not seeing all possible joins occur?</a></li>
<li><a class="reference internal" href="#how-to-rectangularize-after-joins-with-unpaired">How to rectangularize after joins with unpaired?</a></li>
<li><a class="reference internal" href="#what-about-xml-or-json-file-formats">What about XML or JSON file formats?</a></li>
</ul>
</li>
</ul>
<h4>Previous topic</h4>
<p class="topless"><a href="contact.html"
title="previous chapter">Contact</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="sql-examples.html"
title="next chapter">SQL examples</a></p>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="_sources/faq.rst.txt"
rel="nofollow">Show Source</a></li>
</ul>
</div>
<div id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="search.html" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" />
<input type="submit" value="Go" />
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
>index</a></li>
<li class="right" >
<a href="sql-examples.html" title="SQL examples"
>next</a> |</li>
<li class="right" >
<a href="contact.html" title="Contact"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">FAQ</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2020, John Kerl.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 3.2.1.
</div>
</body>
</html>

View file

@ -1,162 +0,0 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Unix-toolkit context &#8212; Miller 5.10.2 documentation</title>
<link rel="stylesheet" href="_static/classic.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/language_data.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="File formats" href="file-formats.html" />
<link rel="prev" title="Miller in 10 minutes" href="10min.html" />
</head><body>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="file-formats.html" title="File formats"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="10min.html" title="Miller in 10 minutes"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Unix-toolkit context</a></li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="unix-toolkit-context">
<h1>Unix-toolkit context<a class="headerlink" href="#unix-toolkit-context" title="Permalink to this headline"></a></h1>
<p>How does Miller fit within the Unix toolkit (<cite>grep</cite>, <cite>sed</cite>, <cite>awk</cite>, etc.)?</p>
<div class="section" id="file-format-awareness">
<h2>File-format awareness<a class="headerlink" href="#file-format-awareness" title="Permalink to this headline"></a></h2>
<p>Miller respects CSV headers. If you do <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">--csv</span> <span class="pre">cat</span> <span class="pre">*.csv</span></code> then the header line is written once:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ cat data/a.csv
a,b,c
1,2,3
4,5,6
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ cat data/b.csv
a,b,c
7,8,9
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --csv cat data/a.csv data/b.csv
a,b,c
1,2,3
4,5,6
7,8,9
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --csv sort -nr b data/a.csv data/b.csv
a,b,c
7,8,9
4,5,6
1,2,3
</pre></div>
</div>
<p>Likewise with <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">sort</span></code>, <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">tac</span></code>, and so on.</p>
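<p>For comparison, here is a minimal sketch of what header-aware concatenation takes with plain <code class="docutils literal notranslate"><span class="pre">awk</span></code>. The function name <code class="docutils literal notranslate"><span class="pre">csv_cat</span></code> is hypothetical, and the sketch assumes every input file carries the same header on its first line; it does not handle quoted fields or differing headers the way Miller does.</p>

```shell
# Hypothetical helper: concatenate CSV files, printing the header line only once.
# Naive sketch: assumes every file has the same header on its first line.
csv_cat() {
  # FNR is the line number within the current file; NR counts across all files.
  awk 'FNR == 1 && NR != 1 { next } { print }' "$@"
}

# Recreate the two example files from above in a temporary directory.
tmpdir=$(mktemp -d)
printf 'a,b,c\n1,2,3\n4,5,6\n' > "$tmpdir/a.csv"
printf 'a,b,c\n7,8,9\n' > "$tmpdir/b.csv"

csv_cat "$tmpdir/a.csv" "$tmpdir/b.csv"
```

<p>With Miller, the same effect comes from <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">--csv</span> <span class="pre">cat</span></code> without any per-file bookkeeping.</p>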
</div>
<div class="section" id="awk-like-features-mlr-filter-and-mlr-put">
<h2>awk-like features: mlr filter and mlr put<a class="headerlink" href="#awk-like-features-mlr-filter-and-mlr-put" title="Permalink to this headline"></a></h2>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">filter</span></code> includes/excludes records based on a filter expression, e.g. <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">filter</span> <span class="pre">'$count</span> <span class="pre">&gt;</span> <span class="pre">10'</span></code>.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">put</span></code> adds a new field as a function of others, e.g. <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">put</span> <span class="pre">'$xy</span> <span class="pre">=</span> <span class="pre">$x</span> <span class="pre">*</span> <span class="pre">$y'</span></code> or <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">put</span> <span class="pre">'$counter</span> <span class="pre">=</span> <span class="pre">NR'</span></code>.</p></li>
<li><p>The <code class="docutils literal notranslate"><span class="pre">$name</span></code> syntax is straight from <code class="docutils literal notranslate"><span class="pre">awk</span></code>'s <code class="docutils literal notranslate"><span class="pre">$1</span> <span class="pre">$2</span> <span class="pre">$3</span></code> (adapted to name-based indexing), as are the variables <code class="docutils literal notranslate"><span class="pre">FS</span></code>, <code class="docutils literal notranslate"><span class="pre">OFS</span></code>, <code class="docutils literal notranslate"><span class="pre">RS</span></code>, <code class="docutils literal notranslate"><span class="pre">ORS</span></code>, <code class="docutils literal notranslate"><span class="pre">NF</span></code>, <code class="docutils literal notranslate"><span class="pre">NR</span></code>, and <code class="docutils literal notranslate"><span class="pre">FILENAME</span></code>. The <code class="docutils literal notranslate"><span class="pre">ENV[...]</span></code> syntax is from Ruby.</p></li>
<li><p>While <code class="docutils literal notranslate"><span class="pre">awk</span></code> functions are record-based, Miller subcommands (or <em>verbs</em>) are stream-based: each of them maps a stream of records into another stream of records.</p></li>
<li><p>Like <code class="docutils literal notranslate"><span class="pre">awk</span></code>, Miller (as of v5.0.0) allows you to define new functions within its <code class="docutils literal notranslate"><span class="pre">put</span></code> and <code class="docutils literal notranslate"><span class="pre">filter</span></code> expression language. Further programmability comes from chaining with <code class="docutils literal notranslate"><span class="pre">then</span></code>.</p></li>
<li><p>As with <code class="docutils literal notranslate"><span class="pre">awk</span></code>, <code class="docutils literal notranslate"><span class="pre">$</span></code>-variables are stream variables and all verbs (such as <code class="docutils literal notranslate"><span class="pre">cut</span></code>, <code class="docutils literal notranslate"><span class="pre">stats1</span></code>, <code class="docutils literal notranslate"><span class="pre">put</span></code>, etc.) as well as <code class="docutils literal notranslate"><span class="pre">put</span></code>/<code class="docutils literal notranslate"><span class="pre">filter</span></code> statements operate on streams. This means that you define actions to be done on each record and then stream your data through those actions. The built-in variables <code class="docutils literal notranslate"><span class="pre">NF</span></code>, <code class="docutils literal notranslate"><span class="pre">NR</span></code>, etc. change from one line to another, <code class="docutils literal notranslate"><span class="pre">$x</span></code> is a label for field <code class="docutils literal notranslate"><span class="pre">x</span></code> in the current record, and the input to <code class="docutils literal notranslate"><span class="pre">sqrt($x)</span></code> changes from one record to the next. The expression language for the <code class="docutils literal notranslate"><span class="pre">put</span></code> and <code class="docutils literal notranslate"><span class="pre">filter</span></code> verbs additionally allows you to define <code class="docutils literal notranslate"><span class="pre">begin</span> <span class="pre">{...}</span></code> and <code class="docutils literal notranslate"><span class="pre">end</span> <span class="pre">{...}</span></code> blocks for actions to be taken before and after records are processed, respectively.</p></li>
<li><p>As with <code class="docutils literal notranslate"><span class="pre">awk</span></code>, Miller's <code class="docutils literal notranslate"><span class="pre">put</span></code>/<code class="docutils literal notranslate"><span class="pre">filter</span></code> language lets you set <code class="docutils literal notranslate"><span class="pre">&#64;sum=0</span></code> before records are read, then update that sum on each record, then print its value at the end. Unlike <code class="docutils literal notranslate"><span class="pre">awk</span></code>, Miller makes syntactically explicit the difference between variables with extent across all records (names starting with <code class="docutils literal notranslate"><span class="pre">&#64;</span></code>, such as <code class="docutils literal notranslate"><span class="pre">&#64;sum</span></code>) and variables which are local to the current expression (names not starting with <code class="docutils literal notranslate"><span class="pre">&#64;</span></code>, such as <code class="docutils literal notranslate"><span class="pre">sum</span></code>).</p></li>
<li><p>Miller can be faster than <code class="docutils literal notranslate"><span class="pre">awk</span></code>, <code class="docutils literal notranslate"><span class="pre">cut</span></code>, and so on, depending on platform; see also <a class="reference internal" href="performance.html"><span class="doc">Performance</span></a>. In particular, Miller's DSL syntax is parsed into C control structures at startup time, with the bulk data-stream processing all done in C.</p></li>
</ul>
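<p>The begin/per-record/end pattern described in the list above can be sketched in plain <code class="docutils literal notranslate"><span class="pre">awk</span></code> as follows. The function name <code class="docutils literal notranslate"><span class="pre">sum_b</span></code> and the sample data are illustrative assumptions. The Miller command in the comment is an untested analogue built from the <code class="docutils literal notranslate"><span class="pre">begin</span></code>/<code class="docutils literal notranslate"><span class="pre">end</span></code> blocks and <code class="docutils literal notranslate"><span class="pre">&#64;sum</span></code> variable described above; see the DSL reference for the authoritative syntax.</p>

```shell
# Hypothetical helper: sum the second comma-separated field, skipping the header line.
# A Miller analogue (assumption, not run here) would be along the lines of:
#   mlr --icsv --opprint put -q 'begin {@sum = 0} @sum += $b; end {emit @sum}' data.csv
sum_b() {
  awk -F, '
    BEGIN  { sum = 0 }      # runs before any record is read
    NR > 1 { sum += $2 }    # runs once per data record (line 1 is the header)
    END    { print sum }    # runs after the last record
  '
}

printf 'a,b,c\n1,2,3\n4,5,6\n7,8,9\n' | sum_b
```

<p>Note that in the <code class="docutils literal notranslate"><span class="pre">awk</span></code> version the field is addressed by position (<code class="docutils literal notranslate"><span class="pre">$2</span></code>) and the header must be skipped by hand, whereas Miller addresses it by name.</p>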
</div>
<div class="section" id="see-also">
<h2>See also<a class="headerlink" href="#see-also" title="Permalink to this headline"></a></h2>
<p>See <a class="reference internal" href="reference-verbs.html"><span class="doc">Verbs reference</span></a> for more on Miller's subcommands <code class="docutils literal notranslate"><span class="pre">cat</span></code>, <code class="docutils literal notranslate"><span class="pre">cut</span></code>, <code class="docutils literal notranslate"><span class="pre">head</span></code>, <code class="docutils literal notranslate"><span class="pre">sort</span></code>, <code class="docutils literal notranslate"><span class="pre">tac</span></code>, <code class="docutils literal notranslate"><span class="pre">tail</span></code>, <code class="docutils literal notranslate"><span class="pre">top</span></code>, and <code class="docutils literal notranslate"><span class="pre">uniq</span></code>, as well as <a class="reference internal" href="reference-dsl.html"><span class="doc">DSL reference</span></a> for more on the awk-like <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">filter</span></code> and <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">put</span></code>.</p>
</div>
</div>
<div class="clearer"></div>
</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Unix-toolkit context</a><ul>
<li><a class="reference internal" href="#file-format-awareness">File-format awareness</a></li>
<li><a class="reference internal" href="#awk-like-features-mlr-filter-and-mlr-put">awk-like features: mlr filter and mlr put</a></li>
<li><a class="reference internal" href="#see-also">See also</a></li>
</ul>
</li>
</ul>
<h4>Previous topic</h4>
<p class="topless"><a href="10min.html"
title="previous chapter">Miller in 10 minutes</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="file-formats.html"
title="next chapter">File formats</a></p>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="_sources/feature-comparison.rst.txt"
rel="nofollow">Show Source</a></li>
</ul>
</div>
<div id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="search.html" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" />
<input type="submit" value="Go" />
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
>index</a></li>
<li class="right" >
<a href="file-formats.html" title="File formats"
>next</a> |</li>
<li class="right" >
<a href="10min.html" title="Miller in 10 minutes"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Unix-toolkit context</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2020, John Kerl.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 3.2.1.
</div>
</body>
</html>

View file

@ -1,129 +0,0 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Features &#8212; Miller 5.10.2 documentation</title>
<link rel="stylesheet" href="_static/classic.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/language_data.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Miller in 10 minutes" href="10min.html" />
<link rel="prev" title="Quick examples" href="quick-examples.html" />
</head><body>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="10min.html" title="Miller in 10 minutes"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="quick-examples.html" title="Quick examples"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Features</a></li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="features">
<h1>Features<a class="headerlink" href="#features" title="Permalink to this headline"></a></h1>
<p>Miller is like awk, sed, cut, join, and sort for <strong>name-indexed data such as
CSV, TSV, and tabular JSON</strong>. You get to work with your data using named
fields, without needing to count positional column indices.</p>
<p>This is something the Unix toolkit always could have done, and arguably
always should have done. It operates on key-value-pair data while the familiar
Unix tools operate on integer-indexed fields: if the natural data structure for
the latter is the array, then Miller's natural data structure is the
insertion-ordered hash map. This encompasses a <strong>variety of data formats</strong>,
including but not limited to the familiar CSV, TSV, and JSON. (Miller can handle
<strong>positionally-indexed data</strong> as a special case.)</p>
<ul class="simple">
<li><p>Miller is <strong>multi-purpose</strong>: it's useful for <strong>data cleaning</strong>, <strong>data reduction</strong>, <strong>statistical reporting</strong>, <strong>devops</strong>, <strong>system administration</strong>, <strong>log-file processing</strong>, <strong>format conversion</strong>, and <strong>database-query post-processing</strong>.</p></li>
<li><p>You can use Miller to snarf and munge <strong>log-file data</strong>, including selecting out relevant substreams, then produce CSV format and load that into all-in-memory/data-frame utilities for further statistical and/or graphical processing.</p></li>
<li><p>Miller complements <strong>data-analysis tools</strong> such as <strong>R</strong>, <strong>pandas</strong>, etc.: you can use Miller to <strong>clean</strong> and <strong>prepare</strong> your data. While you can do <strong>basic statistics</strong> entirely in Miller, its streaming-data feature and single-pass algorithms enable you to <strong>reduce very large data sets</strong>.</p></li>
<li><p>Miller complements SQL <strong>databases</strong>: you can slice, dice, and reformat data on the client side on its way into or out of a database. (Examples <a class="reference internal" href="sql-examples.html#sql-input-examples"><span class="std std-ref">here</span></a> and <a class="reference internal" href="sql-examples.html#sql-output-examples"><span class="std std-ref">here</span></a>.) You can also reap some of the benefits of databases for quick, setup-free one-off tasks when you just need to query some data in disk files in a hurry.</p></li>
<li><p>Miller also goes beyond the classic Unix tools by stepping fully into our modern, <strong>no-SQL</strong> world: its essential record-heterogeneity property allows Miller to operate on data where records with different schema (field names) are interleaved.</p></li>
<li><p>Miller is <strong>streaming</strong>: most operations need only a single record in memory at a time, rather than ingesting all input before producing any output. For those operations which require deeper retention (<code class="docutils literal notranslate"><span class="pre">sort</span></code>, <code class="docutils literal notranslate"><span class="pre">tac</span></code>, <code class="docutils literal notranslate"><span class="pre">stats1</span></code>), Miller retains only as much data as needed. This means that whenever functionally possible, you can operate on files which are larger than your system's available RAM, and you can use Miller in <strong>tail -f</strong> contexts.</p></li>
<li><p>Miller is <strong>pipe-friendly</strong> and interoperates with the Unix toolkit.</p></li>
<li><p>Miller's I/O formats include <strong>tabular pretty-printing</strong>, <strong>positionally indexed</strong> (Unix-toolkit style), CSV, JSON, and others.</p></li>
<li><p>Miller does <strong>conversion</strong> between formats.</p></li>
<li><p>Miller's <strong>processing is format-aware</strong>: e.g. CSV <code class="docutils literal notranslate"><span class="pre">sort</span></code> and <code class="docutils literal notranslate"><span class="pre">tac</span></code> keep header lines first.</p></li>
<li><p>Miller has high-throughput <strong>performance</strong> on par with the Unix toolkit.</p></li>
<li><p>Not unlike <a class="reference external" href="https://stedolan.github.io/jq/">jq</a> (for JSON), Miller is written in portable, modern C, with <strong>zero runtime dependencies</strong>. You can download or compile a single binary, <code class="docutils literal notranslate"><span class="pre">scp</span></code> it to a faraway machine, and expect it to work.</p></li>
</ul>
<p>Releases and release notes: <a class="reference external" href="https://github.com/johnkerl/miller/releases">https://github.com/johnkerl/miller/releases</a>.</p>
</div>
<div class="clearer"></div>
</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h4>Previous topic</h4>
<p class="topless"><a href="quick-examples.html"
title="previous chapter">Quick examples</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="10min.html"
title="next chapter">Miller in 10 minutes</a></p>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="_sources/features.rst.txt"
rel="nofollow">Show Source</a></li>
</ul>
</div>
<div id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="search.html" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" />
<input type="submit" value="Go" />
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
>index</a></li>
<li class="right" >
<a href="10min.html" title="Miller in 10 minutes"
>next</a> |</li>
<li class="right" >
<a href="quick-examples.html" title="Quick examples"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Features</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2020, John Kerl.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 3.2.1.
</div>
</body>
</html>

View file

@ -1,643 +0,0 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>File formats &#8212; Miller 5.10.2 documentation</title>
<link rel="stylesheet" href="_static/classic.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/language_data.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Record-heterogeneity" href="record-heterogeneity.html" />
<link rel="prev" title="Unix-toolkit context" href="feature-comparison.html" />
</head><body>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="record-heterogeneity.html" title="Record-heterogeneity"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="feature-comparison.html" title="Unix-toolkit context"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">File formats</a></li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="file-formats">
<h1>File formats<a class="headerlink" href="#file-formats" title="Permalink to this headline"></a></h1>
<p>Miller handles name-indexed data using several formats: some you probably know by name, such as CSV, TSV, and JSON, and other formats you're likely already seeing and using in your structured data. Additionally, Miller gives you the option of including comments within your data.</p>
<div class="section" id="examples">
<h2>Examples<a class="headerlink" href="#examples" title="Permalink to this headline"></a></h2>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --usage-data-format-examples
DKVP: delimited key-value pairs (Miller default format)
+---------------------+
| apple=1,bat=2,cog=3 | Record 1: &quot;apple&quot; =&gt; &quot;1&quot;, &quot;bat&quot; =&gt; &quot;2&quot;, &quot;cog&quot; =&gt; &quot;3&quot;
| dish=7,egg=8,flint | Record 2: &quot;dish&quot; =&gt; &quot;7&quot;, &quot;egg&quot; =&gt; &quot;8&quot;, &quot;3&quot; =&gt; &quot;flint&quot;
+---------------------+
NIDX: implicitly numerically indexed (Unix-toolkit style)
+---------------------+
| the quick brown | Record 1: &quot;1&quot; =&gt; &quot;the&quot;, &quot;2&quot; =&gt; &quot;quick&quot;, &quot;3&quot; =&gt; &quot;brown&quot;
| fox jumped | Record 2: &quot;1&quot; =&gt; &quot;fox&quot;, &quot;2&quot; =&gt; &quot;jumped&quot;
+---------------------+
CSV/CSV-lite: comma-separated values with separate header line
+---------------------+
| apple,bat,cog |
| 1,2,3 | Record 1: &quot;apple =&gt; &quot;1&quot;, &quot;bat&quot; =&gt; &quot;2&quot;, &quot;cog&quot; =&gt; &quot;3&quot;
| 4,5,6 | Record 2: &quot;apple&quot; =&gt; &quot;4&quot;, &quot;bat&quot; =&gt; &quot;5&quot;, &quot;cog&quot; =&gt; &quot;6&quot;
+---------------------+
Tabular JSON: nested objects are supported, although arrays within them are not:
+---------------------+
| { |
| &quot;apple&quot;: 1, | Record 1: &quot;apple&quot; =&gt; &quot;1&quot;, &quot;bat&quot; =&gt; &quot;2&quot;, &quot;cog&quot; =&gt; &quot;3&quot;
| &quot;bat&quot;: 2, |
| &quot;cog&quot;: 3 |
| } |
| { |
| &quot;dish&quot;: { | Record 2: &quot;dish:egg&quot; =&gt; &quot;7&quot;, &quot;dish:flint&quot; =&gt; &quot;8&quot;, &quot;garlic&quot; =&gt; &quot;&quot;
| &quot;egg&quot;: 7, |
| &quot;flint&quot;: 8 |
| }, |
| &quot;garlic&quot;: &quot;&quot; |
| } |
+---------------------+
PPRINT: pretty-printed tabular
+---------------------+
| apple bat cog |
| 1 2 3 | Record 1: &quot;apple =&gt; &quot;1&quot;, &quot;bat&quot; =&gt; &quot;2&quot;, &quot;cog&quot; =&gt; &quot;3&quot;
| 4 5 6 | Record 2: &quot;apple&quot; =&gt; &quot;4&quot;, &quot;bat&quot; =&gt; &quot;5&quot;, &quot;cog&quot; =&gt; &quot;6&quot;
+---------------------+
XTAB: pretty-printed transposed tabular
+---------------------+
| apple 1 | Record 1: &quot;apple&quot; =&gt; &quot;1&quot;, &quot;bat&quot; =&gt; &quot;2&quot;, &quot;cog&quot; =&gt; &quot;3&quot;
| bat 2 |
| cog 3 |
| |
| dish 7 | Record 2: &quot;dish&quot; =&gt; &quot;7&quot;, &quot;egg&quot; =&gt; &quot;8&quot;
| egg 8 |
+---------------------+
Markdown tabular (supported for output only):
+-----------------------+
| | apple | bat | cog | |
| | --- | --- | --- | |
| | 1 | 2 | 3 | | Record 1: &quot;apple&quot; =&gt; &quot;1&quot;, &quot;bat&quot; =&gt; &quot;2&quot;, &quot;cog&quot; =&gt; &quot;3&quot;
| | 4 | 5 | 6 | | Record 2: &quot;apple&quot; =&gt; &quot;4&quot;, &quot;bat&quot; =&gt; &quot;5&quot;, &quot;cog&quot; =&gt; &quot;6&quot;
+-----------------------+
</pre></div>
</div>
</div>
<div class="section" id="csv-tsv-asv-usv-etc">
<span id="file-formats-csv"></span><h2>CSV/TSV/ASV/USV/etc.<a class="headerlink" href="#csv-tsv-asv-usv-etc" title="Permalink to this headline"></a></h2>
<p>When <code class="docutils literal notranslate"><span class="pre">mlr</span></code> is invoked with the <code class="docutils literal notranslate"><span class="pre">--csv</span></code> or <code class="docutils literal notranslate"><span class="pre">--csvlite</span></code> option, key names are found on the first record and values are taken from subsequent records. This includes the case of CSV-formatted files. See <a class="reference internal" href="record-heterogeneity.html"><span class="doc">Record-heterogeneity</span></a> for how Miller handles changes of field names within a single data stream.</p>
<p>Miller has record separator <code class="docutils literal notranslate"><span class="pre">RS</span></code> and field separator <code class="docutils literal notranslate"><span class="pre">FS</span></code>, just as <code class="docutils literal notranslate"><span class="pre">awk</span></code> does. For TSV, use <code class="docutils literal notranslate"><span class="pre">--fs</span> <span class="pre">tab</span></code>; to convert TSV to CSV, use <code class="docutils literal notranslate"><span class="pre">--ifs</span> <span class="pre">tab</span> <span class="pre">--ofs</span> <span class="pre">comma</span></code>, etc. (See also <a class="reference internal" href="reference.html#reference-separators"><span class="std std-ref">Record/field/pair separators</span></a>.)</p>
<p><strong>TSV (tab-separated values):</strong> the following are synonymous pairs:</p>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">--tsv</span></code> and <code class="docutils literal notranslate"><span class="pre">--csv</span> <span class="pre">--fs</span> <span class="pre">tab</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">--itsv</span></code> and <code class="docutils literal notranslate"><span class="pre">--icsv</span> <span class="pre">--ifs</span> <span class="pre">tab</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">--otsv</span></code> and <code class="docutils literal notranslate"><span class="pre">--ocsv</span> <span class="pre">--ofs</span> <span class="pre">tab</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">--tsvlite</span></code> and <code class="docutils literal notranslate"><span class="pre">--csvlite</span> <span class="pre">--fs</span> <span class="pre">tab</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">--itsvlite</span></code> and <code class="docutils literal notranslate"><span class="pre">--icsvlite</span> <span class="pre">--ifs</span> <span class="pre">tab</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">--otsvlite</span></code> and <code class="docutils literal notranslate"><span class="pre">--ocsvlite</span> <span class="pre">--ofs</span> <span class="pre">tab</span></code></p></li>
</ul>
<p><strong>ASV (ASCII-separated values):</strong> the flags <code class="docutils literal notranslate"><span class="pre">--asv</span></code>, <code class="docutils literal notranslate"><span class="pre">--iasv</span></code>, <code class="docutils literal notranslate"><span class="pre">--oasv</span></code>, <code class="docutils literal notranslate"><span class="pre">--asvlite</span></code>, <code class="docutils literal notranslate"><span class="pre">--iasvlite</span></code>, and <code class="docutils literal notranslate"><span class="pre">--oasvlite</span></code> are analogous except they use ASCII FS and RS 0x1f and 0x1e, respectively.</p>
<p><strong>USV (Unicode-separated values):</strong> likewise, the flags <code class="docutils literal notranslate"><span class="pre">--usv</span></code>, <code class="docutils literal notranslate"><span class="pre">--iusv</span></code>, <code class="docutils literal notranslate"><span class="pre">--ousv</span></code>, <code class="docutils literal notranslate"><span class="pre">--usvlite</span></code>, <code class="docutils literal notranslate"><span class="pre">--iusvlite</span></code>, and <code class="docutils literal notranslate"><span class="pre">--ousvlite</span></code> use Unicode FS and RS U+241F (UTF-8 0xe2909f) and U+241E (UTF-8 0xe2909e), respectively.</p>
<p>Miller’s <code class="docutils literal notranslate"><span class="pre">--csv</span></code> flag supports <a class="reference external" href="https://tools.ietf.org/html/rfc4180">RFC-4180 CSV</a>. This includes CRLF line-terminators by default, regardless of platform.</p>
<p>Here are the differences between CSV and CSV-lite:</p>
<ul class="simple">
<li><p>CSV supports <a class="reference external" href="https://tools.ietf.org/html/rfc4180">RFC-4180</a>-style double-quoting, including the ability to have commas and/or LF/CRLF line-endings contained within an input field; CSV-lite does not.</p></li>
<li><p>CSV does not allow heterogeneous data; CSV-lite does (see also <a class="reference internal" href="record-heterogeneity.html"><span class="doc">Record-heterogeneity</span></a>).</p></li>
<li><p>The CSV-lite input-reading code is fractionally more efficient than the CSV input-reader.</p></li>
</ul>
<p>Here are things they have in common:</p>
<ul class="simple">
<li><p>The ability to specify record/field separators other than the default, e.g. CR-LF vs. LF, or tab instead of comma for TSV, and so on.</p></li>
<li><p>The <code class="docutils literal notranslate"><span class="pre">--implicit-csv-header</span></code> flag for input and the <code class="docutils literal notranslate"><span class="pre">--headerless-csv-output</span></code> flag for output.</p></li>
</ul>
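<p>The RFC-4180 quoting rule mentioned above (a double-quoted field may contain commas and line breaks) can be illustrated outside Miller. The following is a minimal sketch using Python’s standard <code class="docutils literal notranslate"><span class="pre">csv</span></code> module, not Miller’s own reader; the sample text and the names <code class="docutils literal notranslate"><span class="pre">text</span></code>, <code class="docutils literal notranslate"><span class="pre">header</span></code>, and <code class="docutils literal notranslate"><span class="pre">record</span></code> are made up for illustration.</p>

```python
import csv
import io

# One header line plus one data record. The second field of the record holds
# a comma and an embedded CRLF, both protected by RFC-4180 double quotes.
text = 'id,note\r\n1,"first line,\r\nsecond line"\r\n'

# newline="" stops Python from translating the line endings before parsing.
rows = list(csv.reader(io.StringIO(text, newline="")))
header, record = rows[0], rows[1]

# Pair header names with values, as a CSV reader does for each record.
print(dict(zip(header, record)))
```

<p>A reader without quote handling, which is roughly what the CSV-lite description above implies, would see three physical lines here rather than one header and one record.</p>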
</div>
<div class="section" id="dkvp-key-value-pairs">
<span id="file-formats-dkvp"></span><h2>DKVP: Key-value pairs<a class="headerlink" href="#dkvp-key-value-pairs" title="Permalink to this headline"></a></h2>
<p>Miller’s default file format is DKVP, for <strong>delimited key-value pairs</strong>. Example:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr cat data/small
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
</pre></div>
</div>
<p>Such data are easy to generate, e.g. in Ruby with</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">puts</span> <span class="s2">&quot;host=#</span><span class="si">{hostname}</span><span class="s2">,seconds=#{t2-t1},message=#</span><span class="si">{msg}</span><span class="s2">&quot;</span>
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">puts</span> <span class="n">mymap</span><span class="o">.</span><span class="n">collect</span><span class="p">{</span><span class="o">|</span><span class="n">k</span><span class="p">,</span><span class="n">v</span><span class="o">|</span> <span class="s2">&quot;#</span><span class="si">{k}</span><span class="s2">=#</span><span class="si">{v}</span><span class="s2">&quot;</span><span class="p">}</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="s1">&#39;,&#39;</span><span class="p">)</span>
</pre></div>
</div>
<p>or <code class="docutils literal notranslate"><span class="pre">print</span></code> statements in various languages, e.g.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">echo</span> <span class="s2">&quot;type=3,user=$USER,date=$date</span><span class="se">\n</span><span class="s2">&quot;</span><span class="p">;</span>
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">logger</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="s2">&quot;type=3,user=$USER,date=$date</span><span class="se">\n</span><span class="s2">&quot;</span><span class="p">);</span>
</pre></div>
</div>
<p>Fields lacking an IPS will have positional index (starting at 1) used as the key, as in NIDX format. For example, <code class="docutils literal notranslate"><span class="pre">dish=7,egg=8,flint</span></code> is parsed as <code class="docutils literal notranslate"><span class="pre">&quot;dish&quot;</span> <span class="pre">=&gt;</span> <span class="pre">&quot;7&quot;,</span> <span class="pre">&quot;egg&quot;</span> <span class="pre">=&gt;</span> <span class="pre">&quot;8&quot;,</span> <span class="pre">&quot;3&quot;</span> <span class="pre">=&gt;</span> <span class="pre">&quot;flint&quot;</span></code> and <code class="docutils literal notranslate"><span class="pre">dish,egg,flint</span></code> is parsed as <code class="docutils literal notranslate"><span class="pre">&quot;1&quot;</span> <span class="pre">=&gt;</span> <span class="pre">&quot;dish&quot;,</span> <span class="pre">&quot;2&quot;</span> <span class="pre">=&gt;</span> <span class="pre">&quot;egg&quot;,</span> <span class="pre">&quot;3&quot;</span> <span class="pre">=&gt;</span> <span class="pre">&quot;flint&quot;</span></code>.</p>
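<p>The positional-key rule can be sketched in a few lines of Python. This is an illustration of the rule as stated above, not Miller’s implementation; the function name <code class="docutils literal notranslate"><span class="pre">parse_dkvp</span></code> and its parameters are hypothetical.</p>

```python
def parse_dkvp(line, ifs=",", ips="="):
    """Split one DKVP line into a dict of string keys to string values."""
    record = {}
    for position, field in enumerate(line.split(ifs), start=1):
        key, sep, value = field.partition(ips)
        if sep:
            # The field contains a pair separator: use its own key.
            record[key] = value
        else:
            # No pair separator: fall back to the 1-based position as the key.
            record[str(position)] = field
    return record


print(parse_dkvp("dish=7,egg=8,flint"))
print(parse_dkvp("dish,egg,flint"))
```

<p>Note that the positional key counts every field on the line, so <code class="docutils literal notranslate"><span class="pre">flint</span></code> gets key 3 in both cases.</p>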
<p>As discussed in <a class="reference internal" href="record-heterogeneity.html"><span class="doc">Record-heterogeneity</span></a>, Miller handles changes of field names within the same data stream, and with DKVP format this is particularly natural. One of my favorite use-cases for Miller is in application/server logs, where I log all sorts of lines such as</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">resource</span><span class="o">=/</span><span class="n">path</span><span class="o">/</span><span class="n">to</span><span class="o">/</span><span class="n">file</span><span class="p">,</span><span class="n">loadsec</span><span class="o">=</span><span class="mf">0.45</span><span class="p">,</span><span class="n">ok</span><span class="o">=</span><span class="n">true</span>
<span class="n">record_count</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">resource</span><span class="o">=/</span><span class="n">path</span><span class="o">/</span><span class="n">to</span><span class="o">/</span><span class="n">file</span>
<span class="n">resource</span><span class="o">=/</span><span class="n">some</span><span class="o">/</span><span class="n">other</span><span class="o">/</span><span class="n">path</span><span class="p">,</span><span class="n">loadsec</span><span class="o">=</span><span class="mf">0.97</span><span class="p">,</span><span class="n">ok</span><span class="o">=</span><span class="n">false</span>
</pre></div>
</div>
<p>etc. and I just log them as needed. Then later, I can use <code class="docutils literal notranslate"><span class="pre">grep</span></code>, <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">--opprint</span> <span class="pre">group-like</span></code>, etc.
to analyze my logs.</p>
<p>See <a class="reference internal" href="reference.html"><span class="doc">Main reference</span></a> regarding how to specify separators other than the default equals-sign and comma.</p>
</div>
<div class="section" id="nidx-index-numbered-toolkit-style">
<span id="file-formats-nidx"></span><h2>NIDX: Index-numbered (toolkit style)<a class="headerlink" href="#nidx-index-numbered-toolkit-style" title="Permalink to this headline"></a></h2>
<p>With <code class="docutils literal notranslate"><span class="pre">--inidx</span> <span class="pre">--ifs</span> <span class="pre">'</span> <span class="pre">'</span> <span class="pre">--repifs</span></code>, Miller splits lines on whitespace and assigns integer field names starting with 1. This recapitulates Unix-toolkit behavior.</p>
<p>Example with index-numbered output:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ cat data/small
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
$ mlr --onidx --ofs &#39; &#39; cat data/small
pan pan 1 0.3467901443380824 0.7268028627434533
eks pan 2 0.7586799647899636 0.5221511083334797
wye wye 3 0.20460330576630303 0.33831852551664776
eks wye 4 0.38139939387114097 0.13418874328430463
wye pan 5 0.5732889198020006 0.8636244699032729
</pre></div>
</div>
<p>Example with index-numbered input:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ cat data/mydata.txt
oh say can you see
by the dawn&#39;s
early light
$ mlr --inidx --ifs &#39; &#39; --odkvp cat data/mydata.txt
1=oh,2=say,3=can,4=you,5=see
1=by,2=the,3=dawn&#39;s
1=early,2=light
</pre></div>
</div>
<p>Example with index-numbered input and output:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ cat data/mydata.txt
oh say can you see
by the dawn&#39;s
early light
$ mlr --nidx --fs &#39; &#39; --repifs cut -f 2,3 data/mydata.txt
say can
the dawn&#39;s
light
</pre></div>
</div>
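<p>As a rough model of what the last example does, the sketch below splits each line on spaces, collapses repeated separators, numbers the fields from 1, and keeps fields 2 and 3. It is written in Python for illustration only; <code class="docutils literal notranslate"><span class="pre">parse_nidx</span></code> and <code class="docutils literal notranslate"><span class="pre">cut_fields</span></code> are hypothetical names, and the handling of repeated separators is an assumption about what <code class="docutils literal notranslate"><span class="pre">--repifs</span></code> means.</p>

```python
def parse_nidx(line, ifs=" ", repifs=True):
    """Split a line into fields keyed by 1-based position."""
    fields = line.split(ifs)
    if repifs:
        # Treat a run of separators as one: drop the empty strings it produces.
        fields = [f for f in fields if f != ""]
    return {str(i): f for i, f in enumerate(fields, start=1)}


def cut_fields(record, names):
    """Keep only the named fields, preserving record order."""
    return {k: v for k, v in record.items() if k in names}


for line in ["oh say can you see", "by the dawn's", "early light"]:
    kept = cut_fields(parse_nidx(line), {"2", "3"})
    print(" ".join(kept.values()))
```

<p>The third line has no field 3, so only <code class="docutils literal notranslate"><span class="pre">light</span></code> survives, matching the transcript above.</p>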
</div>
<div class="section" id="tabular-json">
<span id="file-formats-json"></span><h2>Tabular JSON<a class="headerlink" href="#tabular-json" title="Permalink to this headline"></a></h2>
<p>JSON is a format which supports arbitrarily deep nesting of “objects” (hashmaps) and “arrays” (lists), while Miller is a tool for handling <strong>tabular data</strong> only. This means Miller cannot (and should not) handle arbitrary JSON. (Check out <a class="reference external" href="https://stedolan.github.io/jq/">jq</a>.)</p>
<p>But if you have tabular data represented in JSON then Miller can handle that for you.</p>
<div class="section" id="single-level-json-objects">
<h3>Single-level JSON objects<a class="headerlink" href="#single-level-json-objects" title="Permalink to this headline"></a></h3>
<p>An <strong>array of single-level objects</strong> is, quite simply, <strong>a table</strong>:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --json head -n 2 then cut -f color,shape data/json-example-1.json
{ &quot;color&quot;: &quot;yellow&quot;, &quot;shape&quot;: &quot;triangle&quot; }
{ &quot;color&quot;: &quot;red&quot;, &quot;shape&quot;: &quot;square&quot; }
$ mlr --json --jvstack head -n 2 then cut -f color,u,v data/json-example-1.json
{
&quot;color&quot;: &quot;yellow&quot;,
&quot;u&quot;: 0.6321695890307647,
&quot;v&quot;: 0.9887207810889004
}
{
&quot;color&quot;: &quot;red&quot;,
&quot;u&quot;: 0.21966833570651523,
&quot;v&quot;: 0.001257332190235938
}
$ mlr --ijson --opprint stats1 -a mean,stddev,count -f u -g shape data/json-example-1.json
shape u_mean u_stddev u_count
triangle 0.583995 0.131184 3
square 0.409355 0.365428 4
circle 0.366013 0.209094 3
</pre></div>
</div>
</div>
<div class="section" id="nested-json-objects">
<h3>Nested JSON objects<a class="headerlink" href="#nested-json-objects" title="Permalink to this headline"></a></h3>
<p>Additionally, Miller can <strong>tabularize nested objects by concatenating keys</strong>:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --json --jvstack head -n 2 data/json-example-2.json
{
&quot;flag&quot;: 1,
&quot;i&quot;: 11,
&quot;attributes&quot;: {
&quot;color&quot;: &quot;yellow&quot;,
&quot;shape&quot;: &quot;triangle&quot;
},
&quot;values&quot;: {
&quot;u&quot;: 0.632170,
&quot;v&quot;: 0.988721,
&quot;w&quot;: 0.436498,
&quot;x&quot;: 5.798188
}
}
{
&quot;flag&quot;: 1,
&quot;i&quot;: 15,
&quot;attributes&quot;: {
&quot;color&quot;: &quot;red&quot;,
&quot;shape&quot;: &quot;square&quot;
},
&quot;values&quot;: {
&quot;u&quot;: 0.219668,
&quot;v&quot;: 0.001257,
&quot;w&quot;: 0.792778,
&quot;x&quot;: 2.944117
}
}
$ mlr --ijson --opprint head -n 4 data/json-example-2.json
flag i attributes:color attributes:shape values:u values:v values:w values:x
1 11 yellow triangle 0.632170 0.988721 0.436498 5.798188
1 15 red square 0.219668 0.001257 0.792778 2.944117
1 16 red circle 0.209017 0.290052 0.138103 5.065034
0 48 red square 0.956274 0.746720 0.775542 7.117831
</pre></div>
</div>
<p>Note in particular that as far as Miller’s <code class="docutils literal notranslate"><span class="pre">put</span></code> and <code class="docutils literal notranslate"><span class="pre">filter</span></code>, as well as other I/O formats, are concerned, these are simply field names with colons in them:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --json --jvstack head -n 1 then put &#39;${values:uv} = ${values:u} * ${values:v}&#39; data/json-example-2.json
{
&quot;flag&quot;: 1,
&quot;i&quot;: 11,
&quot;attributes&quot;: {
&quot;color&quot;: &quot;yellow&quot;,
&quot;shape&quot;: &quot;triangle&quot;
},
&quot;values&quot;: {
&quot;u&quot;: 0.632170,
&quot;v&quot;: 0.988721,
&quot;w&quot;: 0.436498,
&quot;x&quot;: 5.798188,
&quot;uv&quot;: 0.625040
}
}
</pre></div>
</div>
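<p>Key concatenation of the kind shown in this section can be sketched as a short recursive function. This is a Python illustration under the assumption that only nested objects (no arrays) appear; the name <code class="docutils literal notranslate"><span class="pre">flatten</span></code> and the sample record are hypothetical, with values borrowed from the example above.</p>

```python
def flatten(obj, sep=":", prefix=""):
    """Turn nested dicts into one flat dict whose keys are joined with sep."""
    flat = {}
    for key, value in obj.items():
        name = prefix + sep + key if prefix else key
        if isinstance(value, dict):
            # Recurse, carrying the joined key so far as the new prefix.
            flat.update(flatten(value, sep, name))
        else:
            flat[name] = value
    return flat


record = {
    "flag": 1,
    "i": 11,
    "attributes": {"color": "yellow", "shape": "triangle"},
    "values": {"u": 0.632170, "v": 0.988721},
}
print(list(flatten(record)))
```

<p>The resulting names such as <code class="docutils literal notranslate"><span class="pre">values:u</span></code> are ordinary strings, which is why they can be used as field names elsewhere.</p>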
</div>
<div class="section" id="arrays">
<h3>Arrays<a class="headerlink" href="#arrays" title="Permalink to this headline"></a></h3>
<p>Arrays aren’t supported in Miller’s <code class="docutils literal notranslate"><span class="pre">put</span></code>/<code class="docutils literal notranslate"><span class="pre">filter</span></code> DSL. By default, JSON arrays are read in as integer-keyed maps.</p>
<p>Suppose we have arrays like this in our input data:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ cat data/json-example-3.json
{
&quot;label&quot;: &quot;orange&quot;,
&quot;values&quot;: [12.2, 13.8, 17.2]
}
{
&quot;label&quot;: &quot;purple&quot;,
&quot;values&quot;: [27.0, 32.4]
}
</pre></div>
</div>
<p>Then integer indices (starting from 0 and counting up) are used as map keys:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --ijson --oxtab cat data/json-example-3.json
label orange
values:0 12.2
values:1 13.8
values:2 17.2
label purple
values:0 27.0
values:1 32.4
</pre></div>
</div>
<p>When the data are written back out as JSON, field names are re-expanded as above, but what were arrays on input are now maps on output:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --json --jvstack cat data/json-example-3.json
{
&quot;label&quot;: &quot;orange&quot;,
&quot;values&quot;: {
&quot;0&quot;: 12.2,
&quot;1&quot;: 13.8,
&quot;2&quot;: 17.2
}
}
{
&quot;label&quot;: &quot;purple&quot;,
&quot;values&quot;: {
&quot;0&quot;: 27.0,
&quot;1&quot;: 32.4
}
}
</pre></div>
</div>
<p>This is non-ideal, but it allows Miller (5.x release being latest as of this writing) to handle JSON arrays at all.</p>
<p>You might also use <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">--json-skip-arrays-on-input</span></code> or <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">--json-fatal-arrays-on-input</span></code>.</p>
<p>To truly handle JSON, please use a JSON-processing tool such as <a class="reference external" href="https://stedolan.github.io/jq/">jq</a>.</p>
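<p>The reason arrays come back as maps can be seen in a small sketch of the re-expansion step. Once a record has been flattened to names like <code class="docutils literal notranslate"><span class="pre">values:0</span></code>, nothing records that the <code class="docutils literal notranslate"><span class="pre">0</span></code> was an array index rather than a key. The Python below is an illustration only; <code class="docutils literal notranslate"><span class="pre">unflatten</span></code> is a hypothetical name, and the input mirrors the XTAB output shown above.</p>

```python
import json


def unflatten(flat, sep=":"):
    """Rebuild nested dicts from keys joined with sep."""
    nested = {}
    for name, value in flat.items():
        parts = name.split(sep)
        node = nested
        for part in parts[:-1]:
            # Every intermediate level becomes a map, never a list.
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


flat = {"label": "orange", "values:0": 12.2, "values:1": 13.8, "values:2": 17.2}
print(json.dumps(unflatten(flat), indent=2))
```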
</div>
<div class="section" id="formatting-json-options">
<h3>Formatting JSON options<a class="headerlink" href="#formatting-json-options" title="Permalink to this headline"></a></h3>
<p>JSON isn’t a parameterized format, so <code class="docutils literal notranslate"><span class="pre">RS</span></code>, <code class="docutils literal notranslate"><span class="pre">FS</span></code>, <code class="docutils literal notranslate"><span class="pre">PS</span></code> aren’t specifiable. Nonetheless, you can do the following:</p>
<ul class="simple">
<li><p>Use <code class="docutils literal notranslate"><span class="pre">--jvstack</span></code> to pretty-print JSON objects with multi-line (vertically stacked) spacing. By default, each Miller record (JSON object) is one per line.</p></li>
<li><p>Keystroke-savers: <code class="docutils literal notranslate"><span class="pre">--jsonx</span></code> simply means <code class="docutils literal notranslate"><span class="pre">--json</span> <span class="pre">--jvstack</span></code>, and <code class="docutils literal notranslate"><span class="pre">--ojsonx</span></code> simply means <code class="docutils literal notranslate"><span class="pre">--ojson</span> <span class="pre">--jvstack</span></code>.</p></li>
<li><p>Use <code class="docutils literal notranslate"><span class="pre">--jlistwrap</span></code> to print the sequence of JSON objects wrapped in an outermost <code class="docutils literal notranslate"><span class="pre">[</span></code> and <code class="docutils literal notranslate"><span class="pre">]</span></code>. By default, these aren’t printed.</p></li>
<li><p>Use <code class="docutils literal notranslate"><span class="pre">--jquoteall</span></code> to double-quote all object values. By default, integers, floating-point numbers, and booleans <code class="docutils literal notranslate"><span class="pre">true</span></code> and <code class="docutils literal notranslate"><span class="pre">false</span></code> are not double-quoted when they appear as JSON-object values.</p></li>
<li><p>Use <code class="docutils literal notranslate"><span class="pre">--jflatsep</span> <span class="pre">yourstringhere</span></code> to specify the string used for key concatenation: this defaults to a single colon.</p></li>
<li><p>Use <code class="docutils literal notranslate"><span class="pre">--jofmt</span></code> to force Miller to apply the global <code class="docutils literal notranslate"><span class="pre">--ofmt</span></code> to floating-point values. First note: please use sprintf-style codes for double precision, e.g. ending in <code class="docutils literal notranslate"><span class="pre">%lf</span></code>, <code class="docutils literal notranslate"><span class="pre">%le</span></code>, or <code class="docutils literal notranslate"><span class="pre">%lg</span></code>. Miller floats are double-precision so behavior using <code class="docutils literal notranslate"><span class="pre">%f</span></code>, <code class="docutils literal notranslate"><span class="pre">%d</span></code>, etc. is undefined. Second note: <code class="docutils literal notranslate"><span class="pre">0.123</span></code> is valid JSON; <code class="docutils literal notranslate"><span class="pre">.123</span></code> is not. Thus this feature allows you to emit JSON which may be unparseable by other tools.</p></li>
</ul>
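<p>The second note on <code class="docutils literal notranslate"><span class="pre">--jofmt</span></code> is easy to check with any strict JSON parser. The sketch below uses Python’s standard <code class="docutils literal notranslate"><span class="pre">json</span></code> module as a stand-in for “other tools”; the helper name <code class="docutils literal notranslate"><span class="pre">is_valid_json</span></code> is made up for illustration.</p>

```python
import json


def is_valid_json(text):
    """Return True if a strict JSON parser accepts the text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json("0.123"))  # number with a leading zero
print(is_valid_json(".123"))   # number without a leading digit
```

<p>An output format that drops the leading zero therefore produces text that such a parser rejects.</p>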
<p>Again, please see <a class="reference external" href="https://stedolan.github.io/jq/">jq</a> for a truly powerful, JSON-specific tool.</p>
</div>
<div class="section" id="json-non-streaming">
<h3>JSON non-streaming<a class="headerlink" href="#json-non-streaming" title="Permalink to this headline"></a></h3>
<p>The JSON parser Miller uses does not return until all input is parsed: in particular this means that, unlike for other file formats, Miller does not (at present) handle JSON files in <code class="docutils literal notranslate"><span class="pre">tail</span> <span class="pre">-f</span></code> contexts.</p>
</div>
</div>
<div class="section" id="pprint-pretty-printed-tabular">
<span id="file-formats-pprint"></span><h2>PPRINT: Pretty-printed tabular<a class="headerlink" href="#pprint-pretty-printed-tabular" title="Permalink to this headline"></a></h2>
<p>Miller’s pretty-print format is like CSV, but column-aligned. For example, compare</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --ocsv cat data/small
a,b,i,x,y
pan,pan,1,0.3467901443380824,0.7268028627434533
eks,pan,2,0.7586799647899636,0.5221511083334797
wye,wye,3,0.20460330576630303,0.33831852551664776
eks,wye,4,0.38139939387114097,0.13418874328430463
wye,pan,5,0.5732889198020006,0.8636244699032729
$ mlr --opprint cat data/small
a b i x y
pan pan 1 0.3467901443380824 0.7268028627434533
eks pan 2 0.7586799647899636 0.5221511083334797
wye wye 3 0.20460330576630303 0.33831852551664776
eks wye 4 0.38139939387114097 0.13418874328430463
wye pan 5 0.5732889198020006 0.8636244699032729
</pre></div>
</div>
<p>Note that while Miller is a line-at-a-time processor and retains input lines in memory only where necessary (e.g. for sort), pretty-print output requires it to accumulate all input lines (so that it can compute maximum column widths) before producing any output. This has two consequences: (a) pretty-print output won’t work in <code class="docutils literal notranslate"><span class="pre">tail</span> <span class="pre">-f</span></code> contexts, where Miller will be waiting for an end-of-file marker which never arrives; (b) pretty-print output for large files is constrained by available machine memory.</p>
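<p>The need to see every record before printing the first one follows directly from the width computation. Here is a minimal Python sketch of column alignment, assuming all records share the same keys; <code class="docutils literal notranslate"><span class="pre">pprint_records</span></code> is a hypothetical name and this is not Miller’s code.</p>

```python
def pprint_records(records):
    """Return column-aligned lines for a list of same-keyed records."""
    header = list(records[0])
    rows = [header] + [[str(r[k]) for k in header] for r in records]
    # Each width depends on every row, so no line can be emitted early.
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    return [
        " ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    ]


records = [{"a": "pan", "i": 1}, {"a": "wye", "i": 10}]
for line in pprint_records(records):
    print(line)
```

<p>A record arriving later with a longer value would change the padding of every earlier line, which is why streaming output is not possible here.</p>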
<p>See <a class="reference internal" href="record-heterogeneity.html"><span class="doc">Record-heterogeneity</span></a> for how Miller handles changes of field names within a single data stream.</p>
<p>For output only (this isn’t supported in the input-scanner as of 5.0.0) you can use <code class="docutils literal notranslate"><span class="pre">--barred</span></code> with pprint output format:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --opprint --barred cat data/small
+-----+-----+---+---------------------+---------------------+
| a | b | i | x | y |
+-----+-----+---+---------------------+---------------------+
| pan | pan | 1 | 0.3467901443380824 | 0.7268028627434533 |
| eks | pan | 2 | 0.7586799647899636 | 0.5221511083334797 |
| wye | wye | 3 | 0.20460330576630303 | 0.33831852551664776 |
| eks | wye | 4 | 0.38139939387114097 | 0.13418874328430463 |
| wye | pan | 5 | 0.5732889198020006 | 0.8636244699032729 |
+-----+-----+---+---------------------+---------------------+
</pre></div>
</div>
</div>
<div class="section" id="xtab-vertical-tabular">
<span id="file-formats-xtab"></span><h2>XTAB: Vertical tabular<a class="headerlink" href="#xtab-vertical-tabular" title="Permalink to this headline"></a></h2>
<p>This is perhaps most useful for looking at very wide and/or multi-column data which causes line-wraps on the screen (but see also
<a class="reference external" href="https://github.com/twosigma/ngrid/">ngrid</a> for an entirely different, very powerful option). Namely:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ grep -v &#39;^#&#39; /etc/passwd | head -n 6 | mlr --nidx --fs : --opprint cat
1 2 3 4 5 6 7
nobody * -2 -2 Unprivileged User /var/empty /usr/bin/false
root * 0 0 System Administrator /var/root /bin/sh
daemon * 1 1 System Services /var/root /usr/bin/false
_uucp * 4 4 Unix to Unix Copy Protocol /var/spool/uucp /usr/sbin/uucico
_taskgated * 13 13 Task Gate Daemon /var/empty /usr/bin/false
_networkd * 24 24 Network Services /var/networkd /usr/bin/false
$ grep -v &#39;^#&#39; /etc/passwd | head -n 2 | mlr --nidx --fs : --oxtab cat
1 nobody
2 *
3 -2
4 -2
5 Unprivileged User
6 /var/empty
7 /usr/bin/false
1 root
2 *
3 0
4 0
5 System Administrator
6 /var/root
7 /bin/sh
$ grep -v &#39;^#&#39; /etc/passwd | head -n 2 | \
mlr --nidx --fs : --ojson --jvstack --jlistwrap label name,password,uid,gid,gecos,home_dir,shell
[
{
&quot;name&quot;: &quot;nobody&quot;,
&quot;password&quot;: &quot;*&quot;,
&quot;uid&quot;: -2,
&quot;gid&quot;: -2,
&quot;gecos&quot;: &quot;Unprivileged User&quot;,
&quot;home_dir&quot;: &quot;/var/empty&quot;,
&quot;shell&quot;: &quot;/usr/bin/false&quot;
}
,{
&quot;name&quot;: &quot;root&quot;,
&quot;password&quot;: &quot;*&quot;,
&quot;uid&quot;: 0,
&quot;gid&quot;: 0,
&quot;gecos&quot;: &quot;System Administrator&quot;,
&quot;home_dir&quot;: &quot;/var/root&quot;,
&quot;shell&quot;: &quot;/bin/sh&quot;
}
]
</pre></div>
</div>
</div>
<div class="section" id="markdown-tabular">
<h2>Markdown tabular<a class="headerlink" href="#markdown-tabular" title="Permalink to this headline"></a></h2>
<p>Markdown format looks like this:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --omd cat data/small
| a | b | i | x | y |
| --- | --- | --- | --- | --- |
| pan | pan | 1 | 0.3467901443380824 | 0.7268028627434533 |
| eks | pan | 2 | 0.7586799647899636 | 0.5221511083334797 |
| wye | wye | 3 | 0.20460330576630303 | 0.33831852551664776 |
| eks | wye | 4 | 0.38139939387114097 | 0.13418874328430463 |
| wye | pan | 5 | 0.5732889198020006 | 0.8636244699032729 |
</pre></div>
</div>
<p>which renders like this when dropped into various web tools (e.g. github comments):</p>
<img alt="_images/omd.png" src="_images/omd.png" />
<p>As of Miller 4.3.0, markdown format is supported only for output, not input.</p>
</div>
<div class="section" id="data-conversion-keystroke-savers">
<h2>Data-conversion keystroke-savers<a class="headerlink" href="#data-conversion-keystroke-savers" title="Permalink to this headline"></a></h2>
<p>While you can do format conversion using <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">--icsv</span> <span class="pre">--ojson</span> <span class="pre">cat</span> <span class="pre">myfile.csv</span></code>, there are also keystroke-savers for this purpose, such as <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">--c2j</span> <span class="pre">cat</span> <span class="pre">myfile.csv</span></code>. For a complete list:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --usage-format-conversion-keystroke-saver-options
As keystroke-savers for format-conversion you may use the following:
--c2t --c2d --c2n --c2j --c2x --c2p --c2m
--t2c --t2d --t2n --t2j --t2x --t2p --t2m
--d2c --d2t --d2n --d2j --d2x --d2p --d2m
--n2c --n2t --n2d --n2j --n2x --n2p --n2m
--j2c --j2t --j2d --j2n --j2x --j2p --j2m
--x2c --x2t --x2d --x2n --x2j --x2p --x2m
--p2c --p2t --p2d --p2n --p2j --p2x --p2m
The letters c t d n j x p m refer to formats CSV, TSV, DKVP, NIDX, JSON, XTAB,
PPRINT, and markdown, respectively. Note that markdown format is available for
output only.
</pre></div>
</div>
</div>
<div class="section" id="autodetect-of-line-endings">
<h2>Autodetect of line endings<a class="headerlink" href="#autodetect-of-line-endings" title="Permalink to this headline"></a></h2>
<p>Default line endings (<code class="docutils literal notranslate"><span class="pre">--irs</span></code> and <code class="docutils literal notranslate"><span class="pre">--ors</span></code>) are <code class="docutils literal notranslate"><span class="pre">'auto'</span></code> which means <strong>autodetect from the input file format</strong>, as long as the input file(s) have lines ending in either LF (also known as linefeed, <code class="docutils literal notranslate"><span class="pre">'\n'</span></code>, <code class="docutils literal notranslate"><span class="pre">0x0a</span></code>, Unix-style) or CRLF (also known as carriage-return/linefeed pairs, <code class="docutils literal notranslate"><span class="pre">'\r\n'</span></code>, <code class="docutils literal notranslate"><span class="pre">0x0d</span> <span class="pre">0x0a</span></code>, Windows style).</p>
<p><strong>If both IRS and ORS are auto (which is the default) then LF input will lead to LF output and CRLF input will lead to CRLF output, regardless of the platform you're running on.</strong></p>
<p>The line-ending autodetector triggers on the first line ending detected in the input stream. E.g. if you specify a CRLF-terminated file on the command line followed by an LF-terminated file then autodetected line endings will be CRLF.</p>
<p>If you use <code class="docutils literal notranslate"><span class="pre">--ors</span> <span class="pre">{something</span> <span class="pre">else}</span></code> with (default or explicitly specified) <code class="docutils literal notranslate"><span class="pre">--irs</span> <span class="pre">auto</span></code> then line endings are autodetected on input and set to what you specify on output.</p>
<p>If you use <code class="docutils literal notranslate"><span class="pre">--irs</span> <span class="pre">{something</span> <span class="pre">else}</span></code> with (default or explicitly specified) <code class="docutils literal notranslate"><span class="pre">--ors</span> <span class="pre">auto</span></code> then the output line endings used are LF on Unix/Linux/BSD/MacOSX, and CRLF on Windows.</p>
<p>See also <a class="reference internal" href="reference.html#reference-separators"><span class="std std-ref">Record/field/pair separators</span></a> for more information about record/field/pair separators.</p>
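<p>The rules above can be summarized as a small decision function. The following Python sketch is an illustration, not Miller's code: the names <code class="docutils literal notranslate"><span class="pre">detect_first_line_ending</span></code> and <code class="docutils literal notranslate"><span class="pre">resolve_separators</span></code> are hypothetical, and the fallback to the platform default when the input contains no line ending at all is an assumption, since the text above does not cover that case.</p>

```python
import os

LF = "\n"
CRLF = "\r\n"


def detect_first_line_ending(text):
    """Return CRLF or LF for the first line ending in text, or None if none."""
    pos = text.find(LF)
    if pos == -1:
        return None
    if pos != 0 and text[pos - 1] == "\r":
        return CRLF
    return LF


def resolve_separators(irs, ors, text, platform_default=None):
    """Return (effective_irs, effective_ors) following the rules above.

    irs and ors are either the string "auto" or an explicit separator.
    text stands for the concatenated input stream.
    """
    if platform_default is None:
        platform_default = CRLF if os.name == "nt" else LF

    if irs == "auto":
        # Autodetect triggers on the first line ending seen in the stream.
        # Assumption: fall back to the platform default on empty input.
        effective_irs = detect_first_line_ending(text) or platform_default
    else:
        effective_irs = irs

    if ors != "auto":
        effective_ors = ors               # explicit output separator wins
    elif irs == "auto":
        effective_ors = effective_irs     # both auto: output follows input
    else:
        effective_ors = platform_default  # explicit IRS, auto ORS
    return effective_irs, effective_ors
```

<p>For example, passing the contents of a CRLF-terminated file followed by an LF-terminated file as <code class="docutils literal notranslate"><span class="pre">text</span></code> with both separators set to <code class="docutils literal notranslate"><span class="pre">"auto"</span></code> resolves both to CRLF, because only the first line ending is consulted.</p>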
</div>
<div class="section" id="comments-in-data">
<h2>Comments in data<a class="headerlink" href="#comments-in-data" title="Permalink to this headline"></a></h2>
<p>You can include comments within your data files, and either have them ignored, or passed directly through to the standard output as soon as they are encountered:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --usage-comments-in-data
--skip-comments Ignore commented lines (prefixed by &quot;#&quot;)
within the input.
--skip-comments-with {string} Ignore commented lines within input, with
specified prefix.
--pass-comments Immediately print commented lines (prefixed by &quot;#&quot;)
within the input.
--pass-comments-with {string} Immediately print commented lines within input, with
specified prefix.
Notes:
* Comments are only honored at the start of a line.
* In the absence of any of the above four options, comments are data like
any other text.
* When pass-comments is used, comment lines are written to standard output
immediately upon being read; they are not part of the record stream.
Results may be counterintuitive. A suggestion is to place comments at the
start of data files.
</pre></div>
</div>
<p>Examples:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ cat data/budget.csv
# Asana -- here are the budget figures you asked for!
type,quantity
purple,456.78
green,678.12
orange,123.45
$ mlr --skip-comments --icsv --opprint sort -nr quantity data/budget.csv
type quantity
green 678.12
purple 456.78
orange 123.45
$ mlr --pass-comments --icsv --opprint sort -nr quantity data/budget.csv
# Asana -- here are the budget figures you asked for!
type quantity
green 678.12
purple 456.78
orange 123.45
</pre></div>
</div>
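<p>The three behaviors described in the notes above (comments as data, comments skipped, comments passed through) can be modeled with a short Python generator. This is a sketch of the documented behavior rather than Miller's implementation; <code class="docutils literal notranslate"><span class="pre">filter_comments</span></code> and its <code class="docutils literal notranslate"><span class="pre">mode</span></code> parameter are hypothetical names.</p>

```python
import io
import sys


def filter_comments(lines, mode=None, prefix="#", out=None):
    """Yield the lines that belong to the record stream.

    mode=None   : no comment option given, so comment lines are data.
    mode="skip" : comment lines are dropped.
    mode="pass" : comment lines are written to `out` as soon as they are
                  read and never enter the record stream.
    A line counts as a comment only if the prefix is at its very start.
    """
    if out is None:
        out = sys.stdout
    for line in lines:
        is_comment = line.startswith(prefix)
        if mode is None or not is_comment:
            yield line
        elif mode == "pass":
            out.write(line)
        # mode == "skip": the comment line is silently dropped


budget = [
    "# Asana -- here are the budget figures you asked for!\n",
    "type,quantity\n",
    "purple,456.78\n",
]
passed = io.StringIO()
records = list(filter_comments(budget, mode="pass", out=passed))
```

<p>Because passed comments bypass the record stream, a verb that holds records until the end of input, such as <code class="docutils literal notranslate"><span class="pre">sort</span></code>, emits its output after the comment has already been printed. This matches the ordering seen in the <code class="docutils literal notranslate"><span class="pre">--pass-comments</span></code> example above.</p>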
</div>
</div>
<div class="clearer"></div>
</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">File formats</a><ul>
<li><a class="reference internal" href="#examples">Examples</a></li>
<li><a class="reference internal" href="#csv-tsv-asv-usv-etc">CSV/TSV/ASV/USV/etc.</a></li>
<li><a class="reference internal" href="#dkvp-key-value-pairs">DKVP: Key-value pairs</a></li>
<li><a class="reference internal" href="#nidx-index-numbered-toolkit-style">NIDX: Index-numbered (toolkit style)</a></li>
<li><a class="reference internal" href="#tabular-json">Tabular JSON</a><ul>
<li><a class="reference internal" href="#single-level-json-objects">Single-level JSON objects</a></li>
<li><a class="reference internal" href="#nested-json-objects">Nested JSON objects</a></li>
<li><a class="reference internal" href="#arrays">Arrays</a></li>
<li><a class="reference internal" href="#formatting-json-options">Formatting JSON options</a></li>
<li><a class="reference internal" href="#json-non-streaming">JSON non-streaming</a></li>
</ul>
</li>
<li><a class="reference internal" href="#pprint-pretty-printed-tabular">PPRINT: Pretty-printed tabular</a></li>
<li><a class="reference internal" href="#xtab-vertical-tabular">XTAB: Vertical tabular</a></li>
<li><a class="reference internal" href="#markdown-tabular">Markdown tabular</a></li>
<li><a class="reference internal" href="#data-conversion-keystroke-savers">Data-conversion keystroke-savers</a></li>
<li><a class="reference internal" href="#autodetect-of-line-endings">Autodetect of line endings</a></li>
<li><a class="reference internal" href="#comments-in-data">Comments in data</a></li>
</ul>
</li>
</ul>
<h4>Previous topic</h4>
<p class="topless"><a href="feature-comparison.html"
title="previous chapter">Unix-toolkit context</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="record-heterogeneity.html"
title="next chapter">Record-heterogeneity</a></p>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="_sources/file-formats.rst.txt"
rel="nofollow">Show Source</a></li>
</ul>
</div>
<div id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="search.html" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" />
<input type="submit" value="Go" />
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
>index</a></li>
<li class="right" >
<a href="record-heterogeneity.html" title="Record-heterogeneity"
>next</a> |</li>
<li class="right" >
<a href="feature-comparison.html" title="Unix-toolkit context"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">File formats</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2020, John Kerl.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 3.2.1.
</div>
</body>
</html>


@ -1,80 +0,0 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Index &#8212; Miller 5.10.2 documentation</title>
<link rel="stylesheet" href="_static/classic.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/language_data.js"></script>
<link rel="index" title="Index" href="#" />
<link rel="search" title="Search" href="search.html" />
</head><body>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="#" title="General Index"
accesskey="I">index</a></li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Index</a></li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<h1 id="index">Index</h1>
<div class="genindex-jumpbox">
</div>
<div class="clearer"></div>
</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<div id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="search.html" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" />
<input type="submit" value="Go" />
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="#" title="General Index"
>index</a></li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Index</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2020, John Kerl.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 3.2.1.
</div>
</body>
</html>


@ -1,169 +0,0 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Miller Docs v2 &#8212; Miller 5.10.2 documentation</title>
<link rel="stylesheet" href="_static/classic.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/language_data.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Quick examples" href="quick-examples.html" />
</head><body>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="quick-examples.html" title="Quick examples"
accesskey="N">next</a> |</li>
<li class="nav-item nav-item-0"><a href="#">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Miller Docs v2</a></li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="miller-docs-v2">
<h1>Miller Docs v2<a class="headerlink" href="#miller-docs-v2" title="Permalink to this headline"></a></h1>
<div class="section" id="overview">
<h2>Overview<a class="headerlink" href="#overview" title="Permalink to this headline"></a></h2>
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="quick-examples.html">Quick examples</a></li>
<li class="toctree-l1"><a class="reference internal" href="features.html">Features</a></li>
<li class="toctree-l1"><a class="reference internal" href="10min.html">Miller in 10 minutes</a></li>
<li class="toctree-l1"><a class="reference internal" href="feature-comparison.html">Unix-toolkit context</a></li>
<li class="toctree-l1"><a class="reference internal" href="file-formats.html">File formats</a></li>
<li class="toctree-l1"><a class="reference internal" href="record-heterogeneity.html">Record-heterogeneity</a></li>
<li class="toctree-l1"><a class="reference internal" href="customization.html">Customization: .mlrrc</a></li>
<li class="toctree-l1"><a class="reference internal" href="install.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="internationalization.html">Internationalization</a></li>
<li class="toctree-l1"><a class="reference internal" href="contact.html">Contact</a></li>
</ul>
</div>
</div>
<div class="section" id="details">
<h2>Details<a class="headerlink" href="#details" title="Permalink to this headline"></a></h2>
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="faq.html">FAQ</a></li>
<li class="toctree-l1"><a class="reference internal" href="sql-examples.html">SQL examples</a></li>
<li class="toctree-l1"><a class="reference internal" href="log-processing-examples.html">Log-processing examples</a></li>
<li class="toctree-l1"><a class="reference internal" href="data-examples.html">Data-diving examples</a></li>
<li class="toctree-l1"><a class="reference internal" href="cookbook.html">Cookbook part 1: common patterns</a></li>
<li class="toctree-l1"><a class="reference internal" href="cookbook2.html">Cookbook part 2: Random things, and some math</a></li>
<li class="toctree-l1"><a class="reference internal" href="cookbook3.html">Cookbook part 3: Stats with and without out-of-stream variables</a></li>
<li class="toctree-l1"><a class="reference internal" href="data-sharing.html">Mixing with other languages</a></li>
</ul>
</div>
</div>
<div class="section" id="reference">
<h2>Reference<a class="headerlink" href="#reference" title="Permalink to this headline"></a></h2>
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="reference.html">Main reference</a></li>
<li class="toctree-l1"><a class="reference internal" href="reference-verbs.html">Verbs reference</a></li>
<li class="toctree-l1"><a class="reference internal" href="reference-dsl.html">DSL reference</a></li>
<li class="toctree-l1"><a class="reference internal" href="manpage.html">Manpage</a></li>
<li class="toctree-l1"><a class="reference internal" href="release-docs.html">Documents by release</a></li>
<li class="toctree-l1"><a class="reference internal" href="build.html">Building from source</a></li>
</ul>
</div>
</div>
<div class="section" id="background">
<h2>Background<a class="headerlink" href="#background" title="Permalink to this headline"></a></h2>
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="why.html">Why?</a></li>
<li class="toctree-l1"><a class="reference internal" href="etymology.html">Why call it Miller?</a></li>
<li class="toctree-l1"><a class="reference internal" href="originality.html">How original is Miller?</a></li>
<li class="toctree-l1"><a class="reference internal" href="performance.html">Performance</a></li>
</ul>
</div>
</div>
<div class="section" id="index">
<h2>Index<a class="headerlink" href="#index" title="Permalink to this headline"></a></h2>
<ul class="simple">
<li><p><a class="reference internal" href="genindex.html"><span class="std std-ref">Index</span></a></p></li>
<li><p><a class="reference internal" href="search.html"><span class="std std-ref">Search Page</span></a></p></li>
</ul>
</div>
</div>
<div class="clearer"></div>
</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="#">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Miller Docs v2</a><ul>
<li><a class="reference internal" href="#overview">Overview</a></li>
<li><a class="reference internal" href="#details">Details</a></li>
<li><a class="reference internal" href="#reference">Reference</a></li>
<li><a class="reference internal" href="#background">Background</a></li>
<li><a class="reference internal" href="#index">Index</a></li>
</ul>
</li>
</ul>
<h4>Next topic</h4>
<p class="topless"><a href="quick-examples.html"
title="next chapter">Quick examples</a></p>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="_sources/index.rst.txt"
rel="nofollow">Show Source</a></li>
</ul>
</div>
<div id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="search.html" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" />
<input type="submit" value="Go" />
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
>index</a></li>
<li class="right" >
<a href="quick-examples.html" title="Quick examples"
>next</a> |</li>
<li class="nav-item nav-item-0"><a href="#">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Miller Docs v2</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2020, John Kerl.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 3.2.1.
</div>
</body>
</html>


@ -1,149 +0,0 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Installation &#8212; Miller 5.10.2 documentation</title>
<link rel="stylesheet" href="_static/classic.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/language_data.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Internationalization" href="internationalization.html" />
<link rel="prev" title="Customization: .mlrrc" href="customization.html" />
</head><body>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="internationalization.html" title="Internationalization"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="customization.html" title="Customization: .mlrrc"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Installation</a></li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="installation">
<h1>Installation<a class="headerlink" href="#installation" title="Permalink to this headline"></a></h1>
<div class="section" id="prebuilt-executables-via-package-managers">
<h2>Prebuilt executables via package managers<a class="headerlink" href="#prebuilt-executables-via-package-managers" title="Permalink to this headline"></a></h2>
<p><a class="reference external" href="https://brew.sh/">Homebrew</a> installation support for OSX is available via</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">brew</span> <span class="n">update</span> <span class="o">&amp;&amp;</span> <span class="n">brew</span> <span class="n">install</span> <span class="n">miller</span>
</pre></div>
</div>
<p>…and also via <a class="reference external" href="https://www.macports.org/">MacPorts</a>:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">sudo</span> <span class="n">port</span> <span class="n">selfupdate</span> <span class="o">&amp;&amp;</span> <span class="n">sudo</span> <span class="n">port</span> <span class="n">install</span> <span class="n">miller</span>
</pre></div>
</div>
<p>You may already have the <code class="docutils literal notranslate"><span class="pre">mlr</span></code> executable available in your platform's package manager on NetBSD, Debian Linux, Ubuntu Xenial and upward, Arch Linux, or perhaps other distributions. For example, on various Linux distributions you might do one of the following:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">sudo</span> <span class="n">apt</span><span class="o">-</span><span class="n">get</span> <span class="n">install</span> <span class="n">miller</span>
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">sudo</span> <span class="n">apt</span> <span class="n">install</span> <span class="n">miller</span>
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">sudo</span> <span class="n">yum</span> <span class="n">install</span> <span class="n">miller</span>
</pre></div>
</div>
<p>On Windows, Miller is available via <a class="reference external" href="https://chocolatey.org/">Chocolatey</a>:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">choco</span> <span class="n">install</span> <span class="n">miller</span>
</pre></div>
</div>
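<p>After installing by any of the routes above, you may want to confirm that <code class="docutils literal notranslate"><span class="pre">mlr</span></code> is actually on your <code class="docutils literal notranslate"><span class="pre">PATH</span></code>. A minimal Python sketch follows; the helper names are hypothetical and nothing here is part of Miller itself. It assumes only that the executable accepts <code class="docutils literal notranslate"><span class="pre">--version</span></code>.</p>

```python
import shutil
import subprocess


def find_executable(name="mlr"):
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


def report_version(name="mlr"):
    """Return the text printed by `name --version`, or None if not installed."""
    path = find_executable(name)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"],
        capture_output=True,
        text=True,
        check=False,  # report whatever was printed rather than raising
    )
    return result.stdout.strip()


if __name__ == "__main__":
    version = report_version()
    if version is None:
        print("mlr was not found on PATH")
    else:
        print(version)
```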
</div>
<div class="section" id="prebuilt-executables-via-github-per-release">
<h2>Prebuilt executables via GitHub per release<a class="headerlink" href="#prebuilt-executables-via-github-per-release" title="Permalink to this headline"></a></h2>
<p>Please see <a class="reference external" href="https://github.com/johnkerl/miller/releases">https://github.com/johnkerl/miller/releases</a> where there are builds for OSX Yosemite, Linux x86-64 (dynamically linked), and Windows (via Appveyor build artifacts).</p>
<p>Miller is autobuilt for <strong>Linux</strong> using <strong>Travis</strong> on every commit (<a class="reference external" href="https://travis-ci.org/johnkerl/miller/builds">https://travis-ci.org/johnkerl/miller/builds</a>). This was set up with the generous assistance of <a class="reference external" href="https://github.com/SikhNerd">SikhNerd</a> on GitHub, tracked in <a class="reference external" href="https://github.com/johnkerl/miller/issues/15">https://github.com/johnkerl/miller/issues/15</a>. Analogously, Miller is autobuilt for <strong>Windows</strong> using the <strong>Appveyor</strong> continuous-build system: <a class="reference external" href="https://ci.appveyor.com/project/johnkerl/miller">https://ci.appveyor.com/project/johnkerl/miller</a>.</p>
<p>Miller releases from <a class="reference external" href="https://github.com/johnkerl/miller/releases/tag/v5.1.0w">5.1.0</a> onward will have a precompiled Windows binary, in addition to the MacOSX and Linux 64-bit precompiled binaries as on previous releases. Specifically, at <a class="reference external" href="https://ci.appveyor.com/project/johnkerl/miller">https://ci.appveyor.com/project/johnkerl/miller</a> you can select <em>Latest Build</em> and then <em>Artifacts</em> to always get the current head build. Miller releases from 5.3.0 onward will simply point to a particular Appveyor artifact associated with the release.</p>
</div>
<div class="section" id="building-from-source">
<h2>Building from source<a class="headerlink" href="#building-from-source" title="Permalink to this headline"></a></h2>
<p>Please see <a class="reference internal" href="build.html"><span class="doc">Building from source</span></a>.</p>
</div>
</div>
<div class="clearer"></div>
</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Installation</a><ul>
<li><a class="reference internal" href="#prebuilt-executables-via-package-managers">Prebuilt executables via package managers</a></li>
<li><a class="reference internal" href="#prebuilt-executables-via-github-per-release">Prebuilt executables via GitHub per release</a></li>
<li><a class="reference internal" href="#building-from-source">Building from source</a></li>
</ul>
</li>
</ul>
<h4>Previous topic</h4>
<p class="topless"><a href="customization.html"
title="previous chapter">Customization: .mlrrc</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="internationalization.html"
title="next chapter">Internationalization</a></p>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="_sources/install.rst.txt"
rel="nofollow">Show Source</a></li>
</ul>
</div>
<div id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="search.html" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" />
<input type="submit" value="Go" />
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
>index</a></li>
<li class="right" >
<a href="internationalization.html" title="Internationalization"
>next</a> |</li>
<li class="right" >
<a href="customization.html" title="Customization: .mlrrc"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Installation</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2020, John Kerl.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 3.2.1.
</div>
</body>
</html>


@ -1,113 +0,0 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Internationalization &#8212; Miller 5.10.2 documentation</title>
<link rel="stylesheet" href="_static/classic.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/language_data.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Contact" href="contact.html" />
<link rel="prev" title="Installation" href="install.html" />
</head><body>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="contact.html" title="Contact"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="install.html" title="Installation"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Internationalization</a></li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="internationalization">
<h1>Internationalization<a class="headerlink" href="#internationalization" title="Permalink to this headline"></a></h1>
<p>Miller handles strings with any characters other than 0x00 or 0xff, using explicit UTF-8-friendly string-length computations. (I have no plans to support UTF-16 or ISO-8859-1.)</p>
<p>By and large, Miller treats strings as sequences of non-null bytes without need to interpret them semantically. Intentional support for internationalization includes:</p>
<ul class="simple">
<li><p>Tabular output formats such as pprint and xtab (see <a class="reference internal" href="file-formats.html"><span class="doc">File formats</span></a>) are aligned correctly.</p></li>
<li><p>The <a class="reference internal" href="reference-dsl.html#reference-dsl-strlen"><span class="std std-ref">strlen</span></a> function correctly counts UTF-8 codepoints rather than bytes.</p></li>
<li><p>The <a class="reference internal" href="reference-dsl.html#reference-dsl-toupper"><span class="std std-ref">toupper</span></a>, <a class="reference internal" href="reference-dsl.html#reference-dsl-tolower"><span class="std std-ref">tolower</span></a>, and <a class="reference internal" href="reference-dsl.html#reference-dsl-capitalize"><span class="std std-ref">capitalize</span></a> DSL functions work within the capabilities of <a class="reference external" href="https://github.com/sheredom/utf8.h">https://github.com/sheredom/utf8.h</a>.</p></li>
</ul>
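<p>To make the codepoint-versus-byte distinction concrete, here is a short Python illustration. It is not Miller code; it only shows why the two counts differ for non-ASCII text, using hypothetical helper names.</p>

```python
def codepoint_length(s):
    """Number of Unicode code points, the quantity strlen is described as counting."""
    return len(s)


def byte_length(s):
    """Number of bytes in the UTF-8 encoding of s."""
    return len(s.encode("utf-8"))


word = "h\u00e9llo"  # "hello" with an e-acute as its second character
counts = (codepoint_length(word), byte_length(word))  # (5, 6)
```

<p>The accented letter is one code point but occupies two bytes in UTF-8, so a byte-counting length would report 6 where a codepoint-counting length reports 5.</p>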
<p>Meanwhile, regular expressions and the DSL functions <a class="reference internal" href="reference-dsl.html#reference-dsl-sub"><span class="std std-ref">sub</span></a> and <a class="reference internal" href="reference-dsl.html#reference-dsl-gsub"><span class="std std-ref">gsub</span></a> function correctly, albeit without explicit intentional support.</p>
<p>Please file an issue at <a class="reference external" href="https://github.com/johnkerl/miller">https://github.com/johnkerl/miller</a> if you encounter bugs related to internationalization (or anything else for that matter).</p>
</div>
<div class="clearer"></div>
</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h4>Previous topic</h4>
<p class="topless"><a href="install.html"
title="previous chapter">Installation</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="contact.html"
title="next chapter">Contact</a></p>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="_sources/internationalization.rst.txt"
rel="nofollow">Show Source</a></li>
</ul>
</div>
<div id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="search.html" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" />
<input type="submit" value="Go" />
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
>index</a></li>
<li class="right" >
<a href="contact.html" title="Contact"
>next</a> |</li>
<li class="right" >
<a href="install.html" title="Installation"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Internationalization</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2020, John Kerl.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 3.2.1.
</div>
</body>
</html>

Some files were not shown because too many files have changed in this diff