mirror of
https://github.com/johnkerl/miller.git
synced 2026-01-24 02:36:15 +00:00
217 lines
No EOL
13 KiB
HTML
217 lines
No EOL
13 KiB
HTML
|
||
<!DOCTYPE html>
|
||
|
||
<html>
|
||
<head>
|
||
<meta charset="utf-8" />
|
||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||
<title>Randomizing examples — Miller 6.0.0-alpha documentation</title>
|
||
|
||
<link rel="stylesheet" href="_static/scrolls.css" type="text/css" />
|
||
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
|
||
<link rel="stylesheet" href="_static/print.css" type="text/css" />
|
||
|
||
<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
|
||
<script src="_static/jquery.js"></script>
|
||
<script src="_static/underscore.js"></script>
|
||
<script src="_static/doctools.js"></script>
|
||
<script src="_static/language_data.js"></script>
|
||
<script src="_static/theme_extras.js"></script>
|
||
<link rel="index" title="Index" href="genindex.html" />
|
||
<link rel="search" title="Search" href="search.html" />
|
||
<link rel="next" title="Two-pass algorithms" href="two-pass-algorithms.html" />
|
||
<link rel="prev" title="Statistics examples" href="statistics-examples.html" />
|
||
</head><body>
|
||
<div id="content">
|
||
<div class="header">
|
||
<h1 class="heading"><a href="index.html"
|
||
title="back to the documentation overview"><span>Randomizing examples</span></a></h1>
|
||
</div>
|
||
<div class="relnav" role="navigation" aria-label="related navigation">
|
||
<a href="statistics-examples.html">« Statistics examples</a> |
|
||
<a href="#">Randomizing examples</a>
|
||
| <a href="two-pass-algorithms.html">Two-pass algorithms »</a>
|
||
</div>
|
||
<div id="contentwrapper">
|
||
<div id="toc" role="navigation" aria-label="table of contents navigation">
|
||
<h3>Table of Contents</h3>
|
||
<ul>
|
||
<li><a class="reference internal" href="#">Randomizing examples</a><ul>
|
||
<li><a class="reference internal" href="#generating-random-numbers-from-various-distributions">Generating random numbers from various distributions</a></li>
|
||
<li><a class="reference internal" href="#randomly-selecting-words-from-a-list">Randomly selecting words from a list</a></li>
|
||
<li><a class="reference internal" href="#randomly-generating-jabberwocky-words">Randomly generating jabberwocky words</a></li>
|
||
</ul>
|
||
</li>
|
||
</ul>
|
||
|
||
</div>
|
||
<div role="main">
|
||
|
||
<div class="section" id="randomizing-examples">
|
||
<h1>Randomizing examples<a class="headerlink" href="#randomizing-examples" title="Permalink to this headline">¶</a></h1>
|
||
<div class="section" id="generating-random-numbers-from-various-distributions">
|
||
<h2>Generating random numbers from various distributions<a class="headerlink" href="#generating-random-numbers-from-various-distributions" title="Permalink to this headline">¶</a></h2>
|
||
<p>Here we can chain together a few simple building blocks:</p>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> cat expo-sample.sh
|
||
</span> # Generate 100,000 pairs of independent and identically distributed
|
||
# exponentially distributed random variables with the same rate parameter
|
||
# (namely, 2.5). Then compute histograms of one of them, along with
|
||
# histograms for their sum and their product.
|
||
#
|
||
# See also https://en.wikipedia.org/wiki/Exponential_distribution
|
||
#
|
||
# Here I'm using a specified random-number seed so this example always
|
||
# produces the same output for this web document: in everyday practice we
|
||
# wouldn't do that.
|
||
|
||
mlr -n \
|
||
--seed 0 \
|
||
--opprint \
|
||
seqgen --stop 100000 \
|
||
then put '
|
||
# https://en.wikipedia.org/wiki/Inverse_transform_sampling
|
||
func expo_sample(lambda) {
|
||
return -log(1-urand())/lambda
|
||
}
|
||
$u = expo_sample(2.5);
|
||
$v = expo_sample(2.5);
|
||
$s = $u + $v;
|
||
$p = $u * $v;
|
||
' \
|
||
then histogram -f u,s,p --lo 0 --hi 2 --nbins 50 \
|
||
then bar -f u_count,s_count,p_count --auto -w 20
|
||
</pre></div>
|
||
</div>
|
||
<p>Namely:</p>
|
||
<ul class="simple">
|
||
<li><p>Set the Miller random-number seed so this webdoc looks the same every time I regenerate it.</p></li>
|
||
<li><p>Use pretty-printed tabular output.</p></li>
|
||
<li><p>Use pretty-printed tabular output.</p></li>
|
||
<li><p>Use <code class="docutils literal notranslate"><span class="pre">seqgen</span></code> to produce 100,000 records <code class="docutils literal notranslate"><span class="pre">i=0</span></code>, <code class="docutils literal notranslate"><span class="pre">i=1</span></code>, etc.</p></li>
|
||
<li><p>Send those to a <code class="docutils literal notranslate"><span class="pre">put</span></code> step which defines an inverse-transform-sampling function and calls it twice, then computes the sum and product of samples.</p></li>
|
||
<li><p>Send those to a histogram, and from there to a bar-plotter. This is just for visualization; you could just as well output CSV and send that off to your own plotting tool, etc.</p></li>
|
||
</ul>
|
||
<p>The output is as follows:</p>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> sh expo-sample.sh
|
||
</span> bin_lo bin_hi u_count s_count p_count
|
||
0 0.04 [64]*******************#[9554] [326]#...................[3703] [19]*******************#[39809]
|
||
0.04 0.08 [64]*****************...[9554] [326]*****...............[3703] [19]*******.............[39809]
|
||
0.08 0.12 [64]****************....[9554] [326]*********...........[3703] [19]****................[39809]
|
||
0.12 0.16 [64]**************......[9554] [326]************........[3703] [19]***.................[39809]
|
||
0.16 0.2 [64]*************.......[9554] [326]**************......[3703] [19]**..................[39809]
|
||
0.2 0.24 [64]************........[9554] [326]*****************...[3703] [19]*...................[39809]
|
||
0.24 0.28 [64]**********..........[9554] [326]******************..[3703] [19]*...................[39809]
|
||
0.28 0.32 [64]*********...........[9554] [326]******************..[3703] [19]*...................[39809]
|
||
0.32 0.36 [64]********............[9554] [326]*******************.[3703] [19]#...................[39809]
|
||
0.36 0.4 [64]*******.............[9554] [326]*******************#[3703] [19]#...................[39809]
|
||
0.4 0.44 [64]*******.............[9554] [326]*******************.[3703] [19]#...................[39809]
|
||
0.44 0.48 [64]******..............[9554] [326]*******************.[3703] [19]#...................[39809]
|
||
0.48 0.52 [64]*****...............[9554] [326]******************..[3703] [19]#...................[39809]
|
||
0.52 0.56 [64]*****...............[9554] [326]******************..[3703] [19]#...................[39809]
|
||
0.56 0.6 [64]****................[9554] [326]*****************...[3703] [19]#...................[39809]
|
||
0.6 0.64 [64]****................[9554] [326]******************..[3703] [19]#...................[39809]
|
||
0.64 0.68 [64]***.................[9554] [326]****************....[3703] [19]#...................[39809]
|
||
0.68 0.72 [64]***.................[9554] [326]****************....[3703] [19]#...................[39809]
|
||
0.72 0.76 [64]***.................[9554] [326]***************.....[3703] [19]#...................[39809]
|
||
0.76 0.8 [64]**..................[9554] [326]**************......[3703] [19]#...................[39809]
|
||
0.8 0.84 [64]**..................[9554] [326]*************.......[3703] [19]#...................[39809]
|
||
0.84 0.88 [64]**..................[9554] [326]************........[3703] [19]#...................[39809]
|
||
0.88 0.92 [64]**..................[9554] [326]************........[3703] [19]#...................[39809]
|
||
0.92 0.96 [64]*...................[9554] [326]***********.........[3703] [19]#...................[39809]
|
||
0.96 1 [64]*...................[9554] [326]**********..........[3703] [19]#...................[39809]
|
||
1 1.04 [64]*...................[9554] [326]*********...........[3703] [19]#...................[39809]
|
||
1.04 1.08 [64]*...................[9554] [326]********............[3703] [19]#...................[39809]
|
||
1.08 1.12 [64]*...................[9554] [326]********............[3703] [19]#...................[39809]
|
||
1.12 1.16 [64]*...................[9554] [326]********............[3703] [19]#...................[39809]
|
||
1.16 1.2 [64]*...................[9554] [326]*******.............[3703] [19]#...................[39809]
|
||
1.2 1.24 [64]#...................[9554] [326]******..............[3703] [19]#...................[39809]
|
||
1.24 1.28 [64]#...................[9554] [326]*****...............[3703] [19]#...................[39809]
|
||
1.28 1.32 [64]#...................[9554] [326]*****...............[3703] [19]#...................[39809]
|
||
1.32 1.36 [64]#...................[9554] [326]****................[3703] [19]#...................[39809]
|
||
1.36 1.4 [64]#...................[9554] [326]****................[3703] [19]#...................[39809]
|
||
1.4 1.44 [64]#...................[9554] [326]****................[3703] [19]#...................[39809]
|
||
1.44 1.48 [64]#...................[9554] [326]***.................[3703] [19]#...................[39809]
|
||
1.48 1.52 [64]#...................[9554] [326]***.................[3703] [19]#...................[39809]
|
||
1.52 1.56 [64]#...................[9554] [326]***.................[3703] [19]#...................[39809]
|
||
1.56 1.6 [64]#...................[9554] [326]**..................[3703] [19]#...................[39809]
|
||
1.6 1.64 [64]#...................[9554] [326]**..................[3703] [19]#...................[39809]
|
||
1.64 1.68 [64]#...................[9554] [326]**..................[3703] [19]#...................[39809]
|
||
1.68 1.72 [64]#...................[9554] [326]*...................[3703] [19]#...................[39809]
|
||
1.72 1.76 [64]#...................[9554] [326]*...................[3703] [19]#...................[39809]
|
||
1.76 1.8 [64]#...................[9554] [326]*...................[3703] [19]#...................[39809]
|
||
1.8 1.84 [64]#...................[9554] [326]#...................[3703] [19]#...................[39809]
|
||
1.84 1.88 [64]#...................[9554] [326]#...................[3703] [19]#...................[39809]
|
||
1.88 1.92 [64]#...................[9554] [326]#...................[3703] [19]#...................[39809]
|
||
1.92 1.96 [64]#...................[9554] [326]#...................[3703] [19]#...................[39809]
|
||
1.96 2 [64]#...................[9554] [326]#...................[3703] [19]#...................[39809]
|
||
</pre></div>
|
||
</div>
|
||
</div>
|
||
<div class="section" id="randomly-selecting-words-from-a-list">
|
||
<h2>Randomly selecting words from a list<a class="headerlink" href="#randomly-selecting-words-from-a-list" title="Permalink to this headline">¶</a></h2>
|
||
<p>Given this <a class="reference external" href="./data/english-words.txt">word list</a>, first take a look to see what the first few lines look like:</p>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> head data/english-words.txt
|
||
</span> a
|
||
aa
|
||
aal
|
||
aalii
|
||
aam
|
||
aardvark
|
||
aardwolf
|
||
aba
|
||
abac
|
||
abaca
|
||
</pre></div>
|
||
</div>
|
||
<p>Then the following will randomly sample ten words with four to eight characters in them:</p>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --from data/english-words.txt --nidx filter -S 'n=strlen($1);4<=n&&n<=8' then sample -k 10
|
||
</span> thionine
|
||
birchman
|
||
mildewy
|
||
avigate
|
||
addedly
|
||
abaze
|
||
askant
|
||
aiming
|
||
insulant
|
||
coinmate
|
||
</pre></div>
|
||
</div>
|
||
</div>
|
||
<div class="section" id="randomly-generating-jabberwocky-words">
|
||
<h2>Randomly generating jabberwocky words<a class="headerlink" href="#randomly-generating-jabberwocky-words" title="Permalink to this headline">¶</a></h2>
|
||
<p>These are simple <em>n</em>-grams as <a class="reference external" href="http://johnkerl.org/randspell/randspell-slides-ts.pdf">described here</a>. Some common functions are <a class="reference external" href="https://github.com/johnkerl/miller/blob/master/docs/ngrams/ngfuncs.mlr.txt">located here</a>. Then here are scripts for <a class="reference external" href="https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng1.mlr.txt">1-grams</a> <a class="reference external" href="https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng2.mlr.txt">2-grams</a> <a class="reference external" href="https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng3.mlr.txt">3-grams</a> <a class="reference external" href="https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng4.mlr.txt">4-grams</a>, and <a class="reference external" href="https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng5.mlr.txt">5-grams</a>.</p>
|
||
<p>The idea is that words from the input file are consumed, then taken apart and pasted back together in ways which imitate the letter-to-letter transitions found in the word list – giving us automatically generated words in the same vein as <em>bromance</em> and <em>spork</em>:</p>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --nidx --from ./ngrams/gsl-2000.txt put -q -f ./ngrams/ngfuncs.mlr -f ./ngrams/ng5.mlr
|
||
</span> beard
|
||
plastinguish
|
||
politicially
|
||
noise
|
||
loan
|
||
country
|
||
controductionary
|
||
suppery
|
||
lose
|
||
lessors
|
||
dollar
|
||
judge
|
||
rottendence
|
||
lessenger
|
||
diffendant
|
||
suggestional
|
||
</pre></div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
|
||
|
||
</div>
|
||
</div>
|
||
</div>
|
||
|
||
<div class="footer" role="contentinfo">
|
||
© Copyright 2021, John Kerl.
|
||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 3.2.1.
|
||
</div>
|
||
</body>
|
||
</html> |