mirror of
https://github.com/johnkerl/miller.git
synced 2026-01-24 02:36:15 +00:00
365 lines
No EOL
18 KiB
HTML
365 lines
No EOL
18 KiB
HTML
|
||
<!DOCTYPE html>
|
||
|
||
<html>
|
||
<head>
|
||
<meta charset="utf-8" />
|
||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||
<title>Cookbook part 3: Stats with and without out-of-stream variables — Miller 6.0.0-alpha documentation</title>
|
||
|
||
<link rel="stylesheet" href="_static/scrolls.css" type="text/css" />
|
||
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
|
||
<link rel="stylesheet" href="_static/print.css" type="text/css" />
|
||
|
||
<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
|
||
<script src="_static/jquery.js"></script>
|
||
<script src="_static/underscore.js"></script>
|
||
<script src="_static/doctools.js"></script>
|
||
<script src="_static/language_data.js"></script>
|
||
<script src="_static/theme_extras.js"></script>
|
||
<link rel="index" title="Index" href="genindex.html" />
|
||
<link rel="search" title="Search" href="search.html" />
|
||
<link rel="next" title="Why?" href="why.html" />
|
||
<link rel="prev" title="Cookbook part 2: Random things, and some math" href="cookbook2.html" />
|
||
</head><body>
|
||
<div id="content">
|
||
<div class="header">
|
||
<h1 class="heading"><a href="index.html"
|
||
title="back to the documentation overview"><span>Cookbook part 3: Stats with and without out-of-stream variables</span></a></h1>
|
||
</div>
|
||
<div class="relnav" role="navigation" aria-label="related navigation">
|
||
<a href="cookbook2.html">« Cookbook part 2: Random things, and some math</a> |
|
||
<a href="#">Cookbook part 3: Stats with and without out-of-stream variables</a>
|
||
| <a href="why.html">Why? »</a>
|
||
</div>
|
||
<div id="contentwrapper">
|
||
<div id="toc" role="navigation" aria-label="table of contents navigation">
|
||
<h3>Table of Contents</h3>
|
||
<ul>
|
||
<li><a class="reference internal" href="#">Cookbook part 3: Stats with and without out-of-stream variables</a><ul>
|
||
<li><a class="reference internal" href="#overview">Overview</a></li>
|
||
<li><a class="reference internal" href="#mean-without-with-oosvars">Mean without/with oosvars</a></li>
|
||
<li><a class="reference internal" href="#keyed-mean-without-with-oosvars">Keyed mean without/with oosvars</a></li>
|
||
<li><a class="reference internal" href="#variance-and-standard-deviation-without-with-oosvars">Variance and standard deviation without/with oosvars</a></li>
|
||
<li><a class="reference internal" href="#min-max-without-with-oosvars">Min/max without/with oosvars</a></li>
|
||
<li><a class="reference internal" href="#keyed-min-max-without-with-oosvars">Keyed min/max without/with oosvars</a></li>
|
||
<li><a class="reference internal" href="#delta-without-with-oosvars">Delta without/with oosvars</a></li>
|
||
<li><a class="reference internal" href="#keyed-delta-without-with-oosvars">Keyed delta without/with oosvars</a></li>
|
||
<li><a class="reference internal" href="#exponentially-weighted-moving-averages-without-with-oosvars">Exponentially weighted moving averages without/with oosvars</a></li>
|
||
</ul>
|
||
</li>
|
||
</ul>
|
||
|
||
</div>
|
||
<div role="main">
|
||
|
||
<div class="section" id="cookbook-part-3-stats-with-and-without-out-of-stream-variables">
|
||
<h1>Cookbook part 3: Stats with and without out-of-stream variables<a class="headerlink" href="#cookbook-part-3-stats-with-and-without-out-of-stream-variables" title="Permalink to this headline">¶</a></h1>
|
||
<div class="section" id="overview">
|
||
<h2>Overview<a class="headerlink" href="#overview" title="Permalink to this headline">¶</a></h2>
|
||
<p>One of Miller’s strengths is its compact notation: for example, given input of the form</p>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> head -n 5 ../data/medium
|
||
</span> a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
|
||
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
|
||
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
|
||
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
|
||
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
|
||
</pre></div>
|
||
</div>
|
||
<p>you can simply do</p>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --oxtab stats1 -a sum -f x ../data/medium
|
||
</span> x_sum 4986.019681679581
|
||
</pre></div>
|
||
</div>
|
||
<p>or</p>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --opprint stats1 -a sum -f x -g b ../data/medium
|
||
</span> b x_sum
|
||
pan 965.7636699425815
|
||
wye 1023.5484702619565
|
||
zee 979.7420161495838
|
||
eks 1016.7728571314786
|
||
hat 1000.192668193983
|
||
</pre></div>
|
||
</div>
|
||
<p>rather than the more tedious</p>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --oxtab put -q '
|
||
</span><span class="hll"> @x_sum += $x;
|
||
</span><span class="hll"> end {
|
||
</span><span class="hll"> emit @x_sum
|
||
</span><span class="hll"> }
|
||
</span><span class="hll"> ' data/medium
|
||
</span> x_sum 4986.019681679581
|
||
</pre></div>
|
||
</div>
|
||
<p>or</p>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --opprint put -q '
|
||
</span><span class="hll"> @x_sum[$b] += $x;
|
||
</span><span class="hll"> end {
|
||
</span><span class="hll"> emit @x_sum, "b"
|
||
</span><span class="hll"> }
|
||
</span><span class="hll"> ' data/medium
|
||
</span> b x_sum
|
||
pan 965.7636699425815
|
||
wye 1023.5484702619565
|
||
zee 979.7420161495838
|
||
eks 1016.7728571314786
|
||
hat 1000.192668193983
|
||
</pre></div>
|
||
</div>
|
||
<p>The former (<code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">stats1</span></code> et al.) has the advantages of being easier to type, being less error-prone to type, and running faster.</p>
|
||
<p>Nonetheless, out-of-stream variables (which I whimsically call <em>oosvars</em>), begin/end blocks, and emit statements give you the ability to implement logic – if you wish to do so – which isn’t present in other Miller verbs. (If you find yourself often using the same out-of-stream-variable logic over and over, please file a request at <a class="reference external" href="https://github.com/johnkerl/miller/issues">https://github.com/johnkerl/miller/issues</a> to get it implemented directly in Go as a Miller verb of its own.)</p>
|
||
<p>The following examples compute some things using oosvars which are already computable using Miller verbs, by way of providing food for thought.</p>
|
||
</div>
|
||
<div class="section" id="mean-without-with-oosvars">
|
||
<h2>Mean without/with oosvars<a class="headerlink" href="#mean-without-with-oosvars" title="Permalink to this headline">¶</a></h2>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --opprint stats1 -a mean -f x data/medium
|
||
</span> x_mean
|
||
0.49860196816795804
|
||
</pre></div>
|
||
</div>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --opprint put -q '
|
||
</span><span class="hll"> @x_sum += $x;
|
||
</span><span class="hll"> @x_count += 1;
|
||
</span><span class="hll"> end {
|
||
</span><span class="hll"> @x_mean = @x_sum / @x_count;
|
||
</span><span class="hll"> emit @x_mean
|
||
</span><span class="hll"> }
|
||
</span><span class="hll"> ' data/medium
|
||
</span> x_mean
|
||
0.49860196816795804
|
||
</pre></div>
|
||
</div>
|
||
</div>
|
||
<div class="section" id="keyed-mean-without-with-oosvars">
|
||
<h2>Keyed mean without/with oosvars<a class="headerlink" href="#keyed-mean-without-with-oosvars" title="Permalink to this headline">¶</a></h2>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --opprint stats1 -a mean -f x -g a,b data/medium
|
||
</span> a b x_mean
|
||
pan pan 0.5133141190437597
|
||
eks pan 0.48507555383425127
|
||
wye wye 0.49150092785839306
|
||
eks wye 0.4838950517724162
|
||
wye pan 0.4996119901034838
|
||
zee pan 0.5198298297816007
|
||
eks zee 0.49546320772681596
|
||
zee wye 0.5142667998230479
|
||
hat wye 0.49381326184632596
|
||
pan wye 0.5023618498923658
|
||
zee eks 0.4883932942792647
|
||
hat zee 0.5099985721987774
|
||
hat eks 0.48587864619953547
|
||
wye hat 0.4977304763723314
|
||
pan eks 0.5036718595143479
|
||
eks eks 0.5227992666570941
|
||
hat hat 0.47993053101017374
|
||
hat pan 0.4643355557376876
|
||
zee zee 0.5127559183726382
|
||
pan hat 0.492140950155604
|
||
pan zee 0.4966041598627583
|
||
zee hat 0.46772617655014515
|
||
wye zee 0.5059066170573692
|
||
eks hat 0.5006790659966355
|
||
wye eks 0.5306035254809106
|
||
</pre></div>
|
||
</div>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --opprint put -q '
|
||
</span><span class="hll"> @x_sum[$a][$b] += $x;
|
||
</span><span class="hll"> @x_count[$a][$b] += 1;
|
||
</span><span class="hll"> end{
|
||
</span><span class="hll"> for ((a, b), v in @x_sum) {
|
||
</span><span class="hll"> @x_mean[a][b] = @x_sum[a][b] / @x_count[a][b];
|
||
</span><span class="hll"> }
|
||
</span><span class="hll"> emit @x_mean, "a", "b"
|
||
</span><span class="hll"> }
|
||
</span><span class="hll"> ' data/medium
|
||
</span> a b x_mean
|
||
pan pan 0.5133141190437597
|
||
pan wye 0.5023618498923658
|
||
pan eks 0.5036718595143479
|
||
pan hat 0.492140950155604
|
||
pan zee 0.4966041598627583
|
||
eks pan 0.48507555383425127
|
||
eks wye 0.4838950517724162
|
||
eks zee 0.49546320772681596
|
||
eks eks 0.5227992666570941
|
||
eks hat 0.5006790659966355
|
||
wye wye 0.49150092785839306
|
||
wye pan 0.4996119901034838
|
||
wye hat 0.4977304763723314
|
||
wye zee 0.5059066170573692
|
||
wye eks 0.5306035254809106
|
||
zee pan 0.5198298297816007
|
||
zee wye 0.5142667998230479
|
||
zee eks 0.4883932942792647
|
||
zee zee 0.5127559183726382
|
||
zee hat 0.46772617655014515
|
||
hat wye 0.49381326184632596
|
||
hat zee 0.5099985721987774
|
||
hat eks 0.48587864619953547
|
||
hat hat 0.47993053101017374
|
||
hat pan 0.4643355557376876
|
||
</pre></div>
|
||
</div>
|
||
</div>
|
||
<div class="section" id="variance-and-standard-deviation-without-with-oosvars">
|
||
<h2>Variance and standard deviation without/with oosvars<a class="headerlink" href="#variance-and-standard-deviation-without-with-oosvars" title="Permalink to this headline">¶</a></h2>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --oxtab stats1 -a count,sum,mean,var,stddev -f x data/medium
|
||
</span> x_count 10000
|
||
x_sum 4986.019681679581
|
||
x_mean 0.49860196816795804
|
||
x_var 0.08426974433144456
|
||
x_stddev 0.2902925151144007
|
||
</pre></div>
|
||
</div>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> cat variance.mlr
|
||
</span> @n += 1;
|
||
@sumx += $x;
|
||
@sumx2 += $x**2;
|
||
end {
|
||
@mean = @sumx / @n;
|
||
@var = (@sumx2 - @mean * (2 * @sumx - @n * @mean)) / (@n - 1);
|
||
@stddev = sqrt(@var);
|
||
emitf @n, @sumx, @sumx2, @mean, @var, @stddev
|
||
}
|
||
</pre></div>
|
||
</div>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --oxtab put -q -f variance.mlr data/medium
|
||
</span> n 10000
|
||
sumx 4986.019681679581
|
||
sumx2 3328.652400179729
|
||
mean 0.49860196816795804
|
||
var 0.08426974433144456
|
||
stddev 0.2902925151144007
|
||
</pre></div>
|
||
</div>
|
||
<p>You can also do this keyed, of course, imitating the keyed-mean example above.</p>
|
||
</div>
|
||
<div class="section" id="min-max-without-with-oosvars">
|
||
<h2>Min/max without/with oosvars<a class="headerlink" href="#min-max-without-with-oosvars" title="Permalink to this headline">¶</a></h2>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --oxtab stats1 -a min,max -f x data/medium
|
||
</span> x_min 4.509679127584487e-05
|
||
x_max 0.999952670371898
|
||
</pre></div>
|
||
</div>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --oxtab put -q '
|
||
</span><span class="hll"> @x_min = min(@x_min, $x);
|
||
</span><span class="hll"> @x_max = max(@x_max, $x);
|
||
</span><span class="hll"> end{emitf @x_min, @x_max}
|
||
</span><span class="hll"> ' data/medium
|
||
</span> x_min 4.509679127584487e-05
|
||
x_max 0.999952670371898
|
||
</pre></div>
|
||
</div>
|
||
</div>
|
||
<div class="section" id="keyed-min-max-without-with-oosvars">
|
||
<h2>Keyed min/max without/with oosvars<a class="headerlink" href="#keyed-min-max-without-with-oosvars" title="Permalink to this headline">¶</a></h2>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --opprint stats1 -a min,max -f x -g a data/medium
|
||
</span> a x_min x_max
|
||
pan 0.00020390740306253097 0.9994029107062516
|
||
eks 0.0006917972627396018 0.9988110946859143
|
||
wye 0.0001874794831505655 0.9998228522652893
|
||
zee 0.0005486114815762555 0.9994904324789629
|
||
hat 4.509679127584487e-05 0.999952670371898
|
||
</pre></div>
|
||
</div>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --opprint --from data/medium put -q '
|
||
</span><span class="hll"> @min[$a] = min(@min[$a], $x);
|
||
</span><span class="hll"> @max[$a] = max(@max[$a], $x);
|
||
</span><span class="hll"> end{
|
||
</span><span class="hll"> emit (@min, @max), "a";
|
||
</span><span class="hll"> }
|
||
</span><span class="hll"> '
|
||
</span> a min max
|
||
pan 0.00020390740306253097 0.9994029107062516
|
||
eks 0.0006917972627396018 0.9988110946859143
|
||
wye 0.0001874794831505655 0.9998228522652893
|
||
zee 0.0005486114815762555 0.9994904324789629
|
||
hat 4.509679127584487e-05 0.999952670371898
|
||
</pre></div>
|
||
</div>
|
||
</div>
|
||
<div class="section" id="delta-without-with-oosvars">
|
||
<h2>Delta without/with oosvars<a class="headerlink" href="#delta-without-with-oosvars" title="Permalink to this headline">¶</a></h2>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --opprint step -a delta -f x data/small
|
||
</span> a b i x y x_delta
|
||
pan pan 1 0.3467901443380824 0.7268028627434533 0
|
||
eks pan 2 0.7586799647899636 0.5221511083334797 0.41188982045188116
|
||
wye wye 3 0.20460330576630303 0.33831852551664776 -0.5540766590236605
|
||
eks wye 4 0.38139939387114097 0.13418874328430463 0.17679608810483793
|
||
wye pan 5 0.5732889198020006 0.8636244699032729 0.19188952593085962
|
||
</pre></div>
|
||
</div>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --opprint put '
|
||
</span><span class="hll"> $x_delta = is_present(@last) ? $x - @last : 0;
|
||
</span><span class="hll"> @last = $x
|
||
</span><span class="hll"> ' data/small
|
||
</span> a b i x y x_delta
|
||
pan pan 1 0.3467901443380824 0.7268028627434533 0
|
||
eks pan 2 0.7586799647899636 0.5221511083334797 0.41188982045188116
|
||
wye wye 3 0.20460330576630303 0.33831852551664776 -0.5540766590236605
|
||
eks wye 4 0.38139939387114097 0.13418874328430463 0.17679608810483793
|
||
wye pan 5 0.5732889198020006 0.8636244699032729 0.19188952593085962
|
||
</pre></div>
|
||
</div>
|
||
</div>
|
||
<div class="section" id="keyed-delta-without-with-oosvars">
|
||
<h2>Keyed delta without/with oosvars<a class="headerlink" href="#keyed-delta-without-with-oosvars" title="Permalink to this headline">¶</a></h2>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --opprint step -a delta -f x -g a data/small
|
||
</span> a b i x y x_delta
|
||
pan pan 1 0.3467901443380824 0.7268028627434533 0
|
||
eks pan 2 0.7586799647899636 0.5221511083334797 0
|
||
wye wye 3 0.20460330576630303 0.33831852551664776 0
|
||
eks wye 4 0.38139939387114097 0.13418874328430463 -0.3772805709188226
|
||
wye pan 5 0.5732889198020006 0.8636244699032729 0.36868561403569755
|
||
</pre></div>
|
||
</div>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --opprint put '
|
||
</span><span class="hll"> $x_delta = is_present(@last[$a]) ? $x - @last[$a] : 0;
|
||
</span><span class="hll"> @last[$a]=$x
|
||
</span><span class="hll"> ' data/small
|
||
</span> a b i x y x_delta
|
||
pan pan 1 0.3467901443380824 0.7268028627434533 0
|
||
eks pan 2 0.7586799647899636 0.5221511083334797 0
|
||
wye wye 3 0.20460330576630303 0.33831852551664776 0
|
||
eks wye 4 0.38139939387114097 0.13418874328430463 -0.3772805709188226
|
||
wye pan 5 0.5732889198020006 0.8636244699032729 0.36868561403569755
|
||
</pre></div>
|
||
</div>
|
||
</div>
|
||
<div class="section" id="exponentially-weighted-moving-averages-without-with-oosvars">
|
||
<h2>Exponentially weighted moving averages without/with oosvars<a class="headerlink" href="#exponentially-weighted-moving-averages-without-with-oosvars" title="Permalink to this headline">¶</a></h2>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --opprint step -a ewma -d 0.1 -f x data/small
|
||
</span> a b i x y x_ewma_0.1
|
||
pan pan 1 0.3467901443380824 0.7268028627434533 0.3467901443380824
|
||
eks pan 2 0.7586799647899636 0.5221511083334797 0.3879791263832706
|
||
wye wye 3 0.20460330576630303 0.33831852551664776 0.36964154432157387
|
||
eks wye 4 0.38139939387114097 0.13418874328430463 0.37081732927653055
|
||
wye pan 5 0.5732889198020006 0.8636244699032729 0.3910644883290776
|
||
</pre></div>
|
||
</div>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --opprint put '
|
||
</span><span class="hll"> begin{ @a=0.1 };
|
||
</span><span class="hll"> $e = NR==1 ? $x : @a * $x + (1 - @a) * @e;
|
||
</span><span class="hll"> @e=$e
|
||
</span><span class="hll"> ' data/small
|
||
</span> a b i x y e
|
||
pan pan 1 0.3467901443380824 0.7268028627434533 0.3467901443380824
|
||
eks pan 2 0.7586799647899636 0.5221511083334797 0.3879791263832706
|
||
wye wye 3 0.20460330576630303 0.33831852551664776 0.36964154432157387
|
||
eks wye 4 0.38139939387114097 0.13418874328430463 0.37081732927653055
|
||
wye pan 5 0.5732889198020006 0.8636244699032729 0.3910644883290776
|
||
</pre></div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
|
||
|
||
</div>
|
||
</div>
|
||
</div>
|
||
|
||
<div class="footer" role="contentinfo">
|
||
© Copyright 2021, John Kerl.
|
||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 3.2.1.
|
||
</div>
|
||
</body>
|
||
</html> |