<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Cookbook part 3: Stats with and without out-of-stream variables &#8212; Miller 6.0.0-alpha documentation</title>
<link rel="stylesheet" href="_static/scrolls.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="_static/print.css" type="text/css" />
<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/language_data.js"></script>
<script src="_static/theme_extras.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Why?" href="why.html" />
<link rel="prev" title="Cookbook part 2: Random things, and some math" href="cookbook2.html" />
</head><body>
<div id="content">
<div class="header">
<h1 class="heading"><a href="index.html"
title="back to the documentation overview"><span>Cookbook part 3: Stats with and without out-of-stream variables</span></a></h1>
</div>
<div class="relnav" role="navigation" aria-label="related navigation">
<a href="cookbook2.html">&laquo; Cookbook part 2: Random things, and some math</a> |
<a href="#">Cookbook part 3: Stats with and without out-of-stream variables</a>
| <a href="why.html">Why? &raquo;</a>
</div>
<div id="contentwrapper">
<div id="toc" role="navigation" aria-label="table of contents navigation">
<h3>Table of Contents</h3>
<ul>
<li><a class="reference internal" href="#">Cookbook part 3: Stats with and without out-of-stream variables</a><ul>
<li><a class="reference internal" href="#overview">Overview</a></li>
<li><a class="reference internal" href="#mean-without-with-oosvars">Mean without/with oosvars</a></li>
<li><a class="reference internal" href="#keyed-mean-without-with-oosvars">Keyed mean without/with oosvars</a></li>
<li><a class="reference internal" href="#variance-and-standard-deviation-without-with-oosvars">Variance and standard deviation without/with oosvars</a></li>
<li><a class="reference internal" href="#min-max-without-with-oosvars">Min/max without/with oosvars</a></li>
<li><a class="reference internal" href="#keyed-min-max-without-with-oosvars">Keyed min/max without/with oosvars</a></li>
<li><a class="reference internal" href="#delta-without-with-oosvars">Delta without/with oosvars</a></li>
<li><a class="reference internal" href="#keyed-delta-without-with-oosvars">Keyed delta without/with oosvars</a></li>
<li><a class="reference internal" href="#exponentially-weighted-moving-averages-without-with-oosvars">Exponentially weighted moving averages without/with oosvars</a></li>
</ul>
</li>
</ul>
</div>
<div role="main">
<div class="section" id="cookbook-part-3-stats-with-and-without-out-of-stream-variables">
<h1>Cookbook part 3: Stats with and without out-of-stream variables<a class="headerlink" href="#cookbook-part-3-stats-with-and-without-out-of-stream-variables" title="Permalink to this headline"></a></h1>
<div class="section" id="overview">
<h2>Overview<a class="headerlink" href="#overview" title="Permalink to this headline"></a></h2>
<p>One of Miller's strengths is its compact notation: for example, given input of the form</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> head -n 5 ../data/medium
</span> a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
</pre></div>
</div>
<p>you can simply do</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --oxtab stats1 -a sum -f x ../data/medium
</span> x_sum 4986.019681679581
</pre></div>
</div>
<p>or</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --opprint stats1 -a sum -f x -g b ../data/medium
</span> b x_sum
pan 965.7636699425815
wye 1023.5484702619565
zee 979.7420161495838
eks 1016.7728571314786
hat 1000.192668193983
</pre></div>
</div>
<p>rather than the more tedious</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --oxtab put -q &#39;
</span><span class="hll">   @x_sum += $x;
</span><span class="hll">   end {
</span><span class="hll">     emit @x_sum
</span><span class="hll">   }
</span><span class="hll"> &#39; data/medium
</span> x_sum 4986.019681679581
</pre></div>
</div>
<p>or</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --opprint put -q &#39;
</span><span class="hll">   @x_sum[$b] += $x;
</span><span class="hll">   end {
</span><span class="hll">     emit @x_sum, &quot;b&quot;
</span><span class="hll">   }
</span><span class="hll"> &#39; data/medium
</span> b x_sum
pan 965.7636699425815
wye 1023.5484702619565
zee 979.7420161495838
eks 1016.7728571314786
hat 1000.192668193983
</pre></div>
</div>
<p>The verb-based forms (<code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">stats1</span></code> et al.) are easier to type, less error-prone, and faster to run.</p>
<p>Nonetheless, out-of-stream variables (which I whimsically call <em>oosvars</em>), begin/end blocks, and emit statements let you implement logic that isn't present in other Miller verbs. (If you find yourself using the same out-of-stream-variable logic over and over, please file a request at <a class="reference external" href="https://github.com/johnkerl/miller/issues">https://github.com/johnkerl/miller/issues</a> to get it implemented directly in Go as a Miller verb of its own.)</p>
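<p>As a minimal sketch of that kind of custom logic, the command below keeps a per-group minimum and maximum of <code class="docutils literal notranslate"><span class="pre">x</span></code> in oosvars and combines them in the end block into a spread (maximum minus minimum) for each value of <code class="docutils literal notranslate"><span class="pre">a</span></code>. The name <code class="docutils literal notranslate"><span class="pre">@spread</span></code> is an illustrative choice rather than anything built into Miller, and the output is not shown here.</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --opprint --from data/medium put -q &#39;
</span><span class="hll">   # Running minimum and maximum of x for each value of a.
</span><span class="hll">   @min[$a] = min(@min[$a], $x);
</span><span class="hll">   @max[$a] = max(@max[$a], $x);
</span><span class="hll">   end {
</span><span class="hll">     # Combine the two accumulators once all records have been read.
</span><span class="hll">     for (a, lo in @min) {
</span><span class="hll">       @spread[a] = @max[a] - lo;
</span><span class="hll">     }
</span><span class="hll">     emit @spread, &quot;a&quot;
</span><span class="hll">   }
</span><span class="hll"> &#39;
</span></pre></div>
</div>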
<p>The following examples use oosvars to compute things that Miller verbs can already compute, by way of providing food for thought.</p>
</div>
<div class="section" id="mean-without-with-oosvars">
<h2>Mean without/with oosvars<a class="headerlink" href="#mean-without-with-oosvars" title="Permalink to this headline"></a></h2>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --opprint stats1 -a mean -f x data/medium
</span> x_mean
0.49860196816795804
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --opprint put -q &#39;
</span><span class="hll">   @x_sum += $x;
</span><span class="hll">   @x_count += 1;
</span><span class="hll">   end {
</span><span class="hll">     @x_mean = @x_sum / @x_count;
</span><span class="hll">     emit @x_mean
</span><span class="hll">   }
</span><span class="hll"> &#39; data/medium
</span> x_mean
0.49860196816795804
</pre></div>
</div>
</div>
<div class="section" id="keyed-mean-without-with-oosvars">
<h2>Keyed mean without/with oosvars<a class="headerlink" href="#keyed-mean-without-with-oosvars" title="Permalink to this headline"></a></h2>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --opprint stats1 -a mean -f x -g a,b data/medium
</span> a b x_mean
pan pan 0.5133141190437597
eks pan 0.48507555383425127
wye wye 0.49150092785839306
eks wye 0.4838950517724162
wye pan 0.4996119901034838
zee pan 0.5198298297816007
eks zee 0.49546320772681596
zee wye 0.5142667998230479
hat wye 0.49381326184632596
pan wye 0.5023618498923658
zee eks 0.4883932942792647
hat zee 0.5099985721987774
hat eks 0.48587864619953547
wye hat 0.4977304763723314
pan eks 0.5036718595143479
eks eks 0.5227992666570941
hat hat 0.47993053101017374
hat pan 0.4643355557376876
zee zee 0.5127559183726382
pan hat 0.492140950155604
pan zee 0.4966041598627583
zee hat 0.46772617655014515
wye zee 0.5059066170573692
eks hat 0.5006790659966355
wye eks 0.5306035254809106
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --opprint put -q &#39;
</span><span class="hll">   @x_sum[$a][$b] += $x;
</span><span class="hll">   @x_count[$a][$b] += 1;
</span><span class="hll">   end {
</span><span class="hll">     for ((a, b), v in @x_sum) {
</span><span class="hll">       @x_mean[a][b] = @x_sum[a][b] / @x_count[a][b];
</span><span class="hll">     }
</span><span class="hll">     emit @x_mean, &quot;a&quot;, &quot;b&quot;
</span><span class="hll">   }
</span><span class="hll"> &#39; data/medium
</span> a b x_mean
pan pan 0.5133141190437597
pan wye 0.5023618498923658
pan eks 0.5036718595143479
pan hat 0.492140950155604
pan zee 0.4966041598627583
eks pan 0.48507555383425127
eks wye 0.4838950517724162
eks zee 0.49546320772681596
eks eks 0.5227992666570941
eks hat 0.5006790659966355
wye wye 0.49150092785839306
wye pan 0.4996119901034838
wye hat 0.4977304763723314
wye zee 0.5059066170573692
wye eks 0.5306035254809106
zee pan 0.5198298297816007
zee wye 0.5142667998230479
zee eks 0.4883932942792647
zee zee 0.5127559183726382
zee hat 0.46772617655014515
hat wye 0.49381326184632596
hat zee 0.5099985721987774
hat eks 0.48587864619953547
hat hat 0.47993053101017374
hat pan 0.4643355557376876
</pre></div>
</div>
</div>
<div class="section" id="variance-and-standard-deviation-without-with-oosvars">
<h2>Variance and standard deviation without/with oosvars<a class="headerlink" href="#variance-and-standard-deviation-without-with-oosvars" title="Permalink to this headline"></a></h2>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --oxtab stats1 -a count,sum,mean,var,stddev -f x data/medium
</span> x_count 10000
x_sum 4986.019681679581
x_mean 0.49860196816795804
x_var 0.08426974433144456
x_stddev 0.2902925151144007
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> cat variance.mlr
</span> @n += 1;
@sumx += $x;
@sumx2 += $x**2;
end {
  @mean = @sumx / @n;
  # Unbiased variance: the sum of (x - mean)**2 expands to
  # sumx2 - mean * (2 * sumx - n * mean).
  @var = (@sumx2 - @mean * (2 * @sumx - @n * @mean)) / (@n - 1);
  @stddev = sqrt(@var);
  emitf @n, @sumx, @sumx2, @mean, @var, @stddev
}
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --oxtab put -q -f variance.mlr data/medium
</span> n 10000
sumx 4986.019681679581
sumx2 3328.652400179729
mean 0.49860196816795804
var 0.08426974433144456
stddev 0.2902925151144007
</pre></div>
</div>
<p>You can also compute these statistics keyed, of course, following the pattern of the keyed-mean example above.</p>
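<p>A minimal sketch of the keyed form, assuming grouping by the field <code class="docutils literal notranslate"><span class="pre">a</span></code> and that every group has at least two records (otherwise the <code class="docutils literal notranslate"><span class="pre">n</span> <span class="pre">-</span> <span class="pre">1</span></code> divisor is zero). It reuses the accumulators from <code class="docutils literal notranslate"><span class="pre">variance.mlr</span></code>, indexed by <code class="docutils literal notranslate"><span class="pre">$a</span></code>; the output is not shown here.</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --opprint put -q &#39;
</span><span class="hll">   # Per-group count, sum, and sum of squares of x.
</span><span class="hll">   @n[$a] += 1;
</span><span class="hll">   @sumx[$a] += $x;
</span><span class="hll">   @sumx2[$a] += $x**2;
</span><span class="hll">   end {
</span><span class="hll">     for (a, n in @n) {
</span><span class="hll">       @mean[a] = @sumx[a] / n;
</span><span class="hll">       # Same variance formula as in variance.mlr, applied per group.
</span><span class="hll">       @var[a] = (@sumx2[a] - @mean[a] * (2 * @sumx[a] - n * @mean[a])) / (n - 1);
</span><span class="hll">       @stddev[a] = sqrt(@var[a]);
</span><span class="hll">     }
</span><span class="hll">     emit (@n, @mean, @var, @stddev), &quot;a&quot;
</span><span class="hll">   }
</span><span class="hll"> &#39; data/medium
</span></pre></div>
</div>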
</div>
<div class="section" id="min-max-without-with-oosvars">
<h2>Min/max without/with oosvars<a class="headerlink" href="#min-max-without-with-oosvars" title="Permalink to this headline"></a></h2>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --oxtab stats1 -a min,max -f x data/medium
</span> x_min 4.509679127584487e-05
x_max 0.999952670371898
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --oxtab put -q &#39;
</span><span class="hll">   @x_min = min(@x_min, $x);
</span><span class="hll">   @x_max = max(@x_max, $x);
</span><span class="hll">   end { emitf @x_min, @x_max }
</span><span class="hll"> &#39; data/medium
</span> x_min 4.509679127584487e-05
x_max 0.999952670371898
</pre></div>
</div>
</div>
<div class="section" id="keyed-min-max-without-with-oosvars">
<h2>Keyed min/max without/with oosvars<a class="headerlink" href="#keyed-min-max-without-with-oosvars" title="Permalink to this headline"></a></h2>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --opprint stats1 -a min,max -f x -g a data/medium
</span> a x_min x_max
pan 0.00020390740306253097 0.9994029107062516
eks 0.0006917972627396018 0.9988110946859143
wye 0.0001874794831505655 0.9998228522652893
zee 0.0005486114815762555 0.9994904324789629
hat 4.509679127584487e-05 0.999952670371898
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --opprint --from data/medium put -q &#39;
</span><span class="hll">   @min[$a] = min(@min[$a], $x);
</span><span class="hll">   @max[$a] = max(@max[$a], $x);
</span><span class="hll">   end {
</span><span class="hll">     emit (@min, @max), &quot;a&quot;;
</span><span class="hll">   }
</span><span class="hll"> &#39;
</span> a min max
pan 0.00020390740306253097 0.9994029107062516
eks 0.0006917972627396018 0.9988110946859143
wye 0.0001874794831505655 0.9998228522652893
zee 0.0005486114815762555 0.9994904324789629
hat 4.509679127584487e-05 0.999952670371898
</pre></div>
</div>
</div>
<div class="section" id="delta-without-with-oosvars">
<h2>Delta without/with oosvars<a class="headerlink" href="#delta-without-with-oosvars" title="Permalink to this headline"></a></h2>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --opprint step -a delta -f x data/small
</span> a b i x y x_delta
pan pan 1 0.3467901443380824 0.7268028627434533 0
eks pan 2 0.7586799647899636 0.5221511083334797 0.41188982045188116
wye wye 3 0.20460330576630303 0.33831852551664776 -0.5540766590236605
eks wye 4 0.38139939387114097 0.13418874328430463 0.17679608810483793
wye pan 5 0.5732889198020006 0.8636244699032729 0.19188952593085962
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --opprint put &#39;
</span><span class="hll">   $x_delta = is_present(@last) ? $x - @last : 0;
</span><span class="hll">   @last = $x
</span><span class="hll"> &#39; data/small
</span> a b i x y x_delta
pan pan 1 0.3467901443380824 0.7268028627434533 0
eks pan 2 0.7586799647899636 0.5221511083334797 0.41188982045188116
wye wye 3 0.20460330576630303 0.33831852551664776 -0.5540766590236605
eks wye 4 0.38139939387114097 0.13418874328430463 0.17679608810483793
wye pan 5 0.5732889198020006 0.8636244699032729 0.19188952593085962
</pre></div>
</div>
</div>
<div class="section" id="keyed-delta-without-with-oosvars">
<h2>Keyed delta without/with oosvars<a class="headerlink" href="#keyed-delta-without-with-oosvars" title="Permalink to this headline"></a></h2>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --opprint step -a delta -f x -g a data/small
</span> a b i x y x_delta
pan pan 1 0.3467901443380824 0.7268028627434533 0
eks pan 2 0.7586799647899636 0.5221511083334797 0
wye wye 3 0.20460330576630303 0.33831852551664776 0
eks wye 4 0.38139939387114097 0.13418874328430463 -0.3772805709188226
wye pan 5 0.5732889198020006 0.8636244699032729 0.36868561403569755
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --opprint put &#39;
</span><span class="hll">   $x_delta = is_present(@last[$a]) ? $x - @last[$a] : 0;
</span><span class="hll">   @last[$a] = $x
</span><span class="hll"> &#39; data/small
</span> a b i x y x_delta
pan pan 1 0.3467901443380824 0.7268028627434533 0
eks pan 2 0.7586799647899636 0.5221511083334797 0
wye wye 3 0.20460330576630303 0.33831852551664776 0
eks wye 4 0.38139939387114097 0.13418874328430463 -0.3772805709188226
wye pan 5 0.5732889198020006 0.8636244699032729 0.36868561403569755
</pre></div>
</div>
</div>
<div class="section" id="exponentially-weighted-moving-averages-without-with-oosvars">
<h2>Exponentially weighted moving averages without/with oosvars<a class="headerlink" href="#exponentially-weighted-moving-averages-without-with-oosvars" title="Permalink to this headline"></a></h2>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --opprint step -a ewma -d 0.1 -f x data/small
</span> a b i x y x_ewma_0.1
pan pan 1 0.3467901443380824 0.7268028627434533 0.3467901443380824
eks pan 2 0.7586799647899636 0.5221511083334797 0.3879791263832706
wye wye 3 0.20460330576630303 0.33831852551664776 0.36964154432157387
eks wye 4 0.38139939387114097 0.13418874328430463 0.37081732927653055
wye pan 5 0.5732889198020006 0.8636244699032729 0.3910644883290776
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --opprint put &#39;
</span><span class="hll">   begin{ @a=0.1 };
</span><span class="hll">   $e = NR==1 ? $x : @a * $x + (1 - @a) * @e;
</span><span class="hll">   @e = $e
</span><span class="hll"> &#39; data/small
</span> a b i x y e
pan pan 1 0.3467901443380824 0.7268028627434533 0.3467901443380824
eks pan 2 0.7586799647899636 0.5221511083334797 0.3879791263832706
wye wye 3 0.20460330576630303 0.33831852551664776 0.36964154432157387
eks wye 4 0.38139939387114097 0.13418874328430463 0.37081732927653055
wye pan 5 0.5732889198020006 0.8636244699032729 0.3910644883290776
</pre></div>
</div>
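<p>The oosvar version above seeds <code class="docutils literal notranslate"><span class="pre">e</span></code> with the first <code class="docutils literal notranslate"><span class="pre">x</span></code> and thereafter sets it to <code class="docutils literal notranslate"><span class="pre">@a</span> <span class="pre">*</span> <span class="pre">$x</span> <span class="pre">+</span> <span class="pre">(1</span> <span class="pre">-</span> <span class="pre">@a)</span> <span class="pre">*</span> <span class="pre">@e</span></code>. As a sketch that follows the keyed-delta pattern, the same recurrence can be kept separately for each value of <code class="docutils literal notranslate"><span class="pre">a</span></code> by indexing the oosvar. The names <code class="docutils literal notranslate"><span class="pre">@alpha</span></code> and <code class="docutils literal notranslate"><span class="pre">@e</span></code> are illustrative, and the output is not shown here. The verb-based counterpart would presumably add <code class="docutils literal notranslate"><span class="pre">-g</span> <span class="pre">a</span></code> to the <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">step</span></code> command, as in the keyed-delta example.</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --opprint put &#39;
</span><span class="hll">   begin { @alpha = 0.1 };
</span><span class="hll">   # Seed each group with its first x, then apply the EWMA recurrence.
</span><span class="hll">   $e = is_present(@e[$a]) ? @alpha * $x + (1 - @alpha) * @e[$a] : $x;
</span><span class="hll">   @e[$a] = $e
</span><span class="hll"> &#39; data/small
</span></pre></div>
</div>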
</div>
</div>
</div>
</div>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2021, John Kerl.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 3.2.1.
</div>
</body>
</html>