mirror of
https://github.com/johnkerl/miller.git
synced 2026-01-24 02:36:15 +00:00
120 lines
No EOL
5.8 KiB
HTML
120 lines
No EOL
5.8 KiB
HTML
|
||
<!DOCTYPE html>
|
||
|
||
<html>
|
||
<head>
|
||
<meta charset="utf-8" />
|
||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||
<title>Statistics examples — Miller 6.0.0-alpha documentation</title>
|
||
|
||
<link rel="stylesheet" href="_static/scrolls.css" type="text/css" />
|
||
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
|
||
<link rel="stylesheet" href="_static/print.css" type="text/css" />
|
||
|
||
<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
|
||
<script src="_static/jquery.js"></script>
|
||
<script src="_static/underscore.js"></script>
|
||
<script src="_static/doctools.js"></script>
|
||
<script src="_static/language_data.js"></script>
|
||
<script src="_static/theme_extras.js"></script>
|
||
<link rel="index" title="Index" href="genindex.html" />
|
||
<link rel="search" title="Search" href="search.html" />
|
||
<link rel="next" title="Randomizing examples" href="randomizing-examples.html" />
|
||
<link rel="prev" title="Data-cleaning examples" href="data-cleaning-examples.html" />
|
||
</head><body>
|
||
<div id="content">
|
||
<div class="header">
|
||
<h1 class="heading"><a href="index.html"
|
||
title="back to the documentation overview"><span>Statistics examples</span></a></h1>
|
||
</div>
|
||
<div class="relnav" role="navigation" aria-label="related navigation">
|
||
<a href="data-cleaning-examples.html">« Data-cleaning examples</a> |
|
||
<a href="#">Statistics examples</a>
|
||
| <a href="randomizing-examples.html">Randomizing examples »</a>
|
||
</div>
|
||
<div id="contentwrapper">
|
||
<div id="toc" role="navigation" aria-label="table of contents navigation">
|
||
<h3>Table of Contents</h3>
|
||
<ul>
|
||
<li><a class="reference internal" href="#">Statistics examples</a><ul>
|
||
<li><a class="reference internal" href="#computing-interquartile-ranges">Computing interquartile ranges</a></li>
|
||
<li><a class="reference internal" href="#computing-weighted-means">Computing weighted means</a></li>
|
||
</ul>
|
||
</li>
|
||
</ul>
|
||
|
||
</div>
|
||
<div role="main">
|
||
|
||
<div class="section" id="statistics-examples">
|
||
<h1>Statistics examples<a class="headerlink" href="#statistics-examples" title="Permalink to this headline">¶</a></h1>
|
||
<div class="section" id="computing-interquartile-ranges">
|
||
<h2>Computing interquartile ranges<a class="headerlink" href="#computing-interquartile-ranges" title="Permalink to this headline">¶</a></h2>
|
||
<p>For one or more specified field names, simply compute p25 and p75, then write the IQR as the difference of p75 and p25:</p>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --oxtab stats1 -f x -a p25,p75 \
|
||
</span><span class="hll"> then put '$x_iqr = $x_p75 - $x_p25' \
|
||
</span><span class="hll"> data/medium
|
||
</span> x_p25 0.24667037823231752
|
||
x_p75 0.7481860062358446
|
||
x_iqr 0.5015156280035271
|
||
</pre></div>
|
||
</div>
|
||
<p>For wildcarded field names, first compute p25 and p75, then loop over field names with <code class="docutils literal notranslate"><span class="pre">p25</span></code> in them:</p>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --oxtab stats1 --fr '[i-z]' -a p25,p75 \
|
||
</span><span class="hll"> then put 'for (k,v in $*) {
|
||
</span><span class="hll"> if (k =~ "(.*)_p25") {
|
||
</span><span class="hll"> $["\1_iqr"] = $["\1_p75"] - $["\1_p25"]
|
||
</span><span class="hll"> }
|
||
</span><span class="hll"> }' \
|
||
</span><span class="hll"> data/medium
|
||
</span></pre></div>
|
||
</div>
|
||
</div>
|
||
<div class="section" id="computing-weighted-means">
|
||
<h2>Computing weighted means<a class="headerlink" href="#computing-weighted-means" title="Permalink to this headline">¶</a></h2>
|
||
<p>This might be more elegantly implemented as an option within the <code class="docutils literal notranslate"><span class="pre">stats1</span></code> verb. Meanwhile, it’s expressible within the DSL:</p>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --from data/medium put -q '
|
||
</span><span class="hll"> # Using the y field for weighting in this example
|
||
</span><span class="hll"> weight = $y;
|
||
</span><span class="hll">
|
||
</span><span class="hll"> # Using the a field for weighted aggregation in this example
|
||
</span><span class="hll"> @sumwx[$a] += weight * $i;
|
||
</span><span class="hll"> @sumw[$a] += weight;
|
||
</span><span class="hll">
|
||
</span><span class="hll"> @sumx[$a] += $i;
|
||
</span><span class="hll"> @sumn[$a] += 1;
|
||
</span><span class="hll">
|
||
</span><span class="hll"> end {
|
||
</span><span class="hll"> map wmean = {};
|
||
</span><span class="hll"> map mean = {};
|
||
</span><span class="hll"> for (a in @sumwx) {
|
||
</span><span class="hll"> wmean[a] = @sumwx[a] / @sumw[a]
|
||
</span><span class="hll"> }
|
||
</span><span class="hll"> for (a in @sumx) {
|
||
</span><span class="hll"> mean[a] = @sumx[a] / @sumn[a]
|
||
</span><span class="hll"> }
|
||
</span><span class="hll"> #emit wmean, "a";
|
||
</span><span class="hll"> #emit mean, "a";
|
||
</span><span class="hll"> emit (wmean, mean), "a";
|
||
</span><span class="hll"> }'
|
||
</span> a=pan,wmean=4979.563722208067,mean=5028.259010091302
|
||
a=eks,wmean=4890.3815931472145,mean=4956.2900763358775
|
||
a=wye,wmean=4946.987746229947,mean=4920.001017293998
|
||
a=zee,wmean=5164.719684856538,mean=5123.092330239375
|
||
a=hat,wmean=4925.533162478552,mean=4967.743946419371
|
||
</pre></div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
|
||
|
||
</div>
|
||
</div>
|
||
</div>
|
||
|
||
<div class="footer" role="contentinfo">
|
||
© Copyright 2021, John Kerl.
|
||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 3.2.1.
|
||
</div>
|
||
</body>
|
||
</html> |