miller/docs6/docs/_build/html/randomizing-examples.html

217 lines
No EOL
13 KiB
HTML
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Randomizing examples &#8212; Miller 6.0.0-alpha documentation</title>
<link rel="stylesheet" href="_static/scrolls.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="_static/print.css" type="text/css" />
<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/language_data.js"></script>
<script src="_static/theme_extras.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Two-pass algorithms" href="two-pass-algorithms.html" />
<link rel="prev" title="Statistics examples" href="statistics-examples.html" />
</head><body>
<div id="content">
<div class="header">
<h1 class="heading"><a href="index.html"
title="back to the documentation overview"><span>Randomizing examples</span></a></h1>
</div>
<div class="relnav" role="navigation" aria-label="related navigation">
<a href="statistics-examples.html">&laquo; Statistics examples</a> |
<a href="#">Randomizing examples</a>
| <a href="two-pass-algorithms.html">Two-pass algorithms &raquo;</a>
</div>
<div id="contentwrapper">
<div id="toc" role="navigation" aria-label="table of contents navigation">
<h3>Table of Contents</h3>
<ul>
<li><a class="reference internal" href="#">Randomizing examples</a><ul>
<li><a class="reference internal" href="#generating-random-numbers-from-various-distributions">Generating random numbers from various distributions</a></li>
<li><a class="reference internal" href="#randomly-selecting-words-from-a-list">Randomly selecting words from a list</a></li>
<li><a class="reference internal" href="#randomly-generating-jabberwocky-words">Randomly generating jabberwocky words</a></li>
</ul>
</li>
</ul>
</div>
<div role="main">
<div class="section" id="randomizing-examples">
<h1>Randomizing examples<a class="headerlink" href="#randomizing-examples" title="Permalink to this headline"></a></h1>
<div class="section" id="generating-random-numbers-from-various-distributions">
<h2>Generating random numbers from various distributions<a class="headerlink" href="#generating-random-numbers-from-various-distributions" title="Permalink to this headline"></a></h2>
<p>Here we can chain together a few simple building blocks:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> cat expo-sample.sh
</span> # Generate 100,000 pairs of independent and identically distributed
# exponentially distributed random variables with the same rate parameter
# (namely, 2.5). Then compute histograms of one of them, along with
# histograms for their sum and their product.
#
# See also https://en.wikipedia.org/wiki/Exponential_distribution
#
# Here I&#39;m using a specified random-number seed so this example always
# produces the same output for this web document: in everyday practice we
# wouldn&#39;t do that.
mlr -n \
--seed 0 \
--opprint \
seqgen --stop 100000 \
then put &#39;
# https://en.wikipedia.org/wiki/Inverse_transform_sampling
func expo_sample(lambda) {
return -log(1-urand())/lambda
}
$u = expo_sample(2.5);
$v = expo_sample(2.5);
$s = $u + $v;
$p = $u * $v;
&#39; \
then histogram -f u,s,p --lo 0 --hi 2 --nbins 50 \
then bar -f u_count,s_count,p_count --auto -w 20
</pre></div>
</div>
<p>Namely:</p>
<ul class="simple">
<li><p>Set the Miller random-number seed so this webdoc looks the same every time I regenerate it.</p></li>
<li><p>Use pretty-printed tabular output.</p></li>
<li><p>Use pretty-printed tabular output.</p></li>
<li><p>Use <code class="docutils literal notranslate"><span class="pre">seqgen</span></code> to produce 100,000 records <code class="docutils literal notranslate"><span class="pre">i=0</span></code>, <code class="docutils literal notranslate"><span class="pre">i=1</span></code>, etc.</p></li>
<li><p>Send those to a <code class="docutils literal notranslate"><span class="pre">put</span></code> step which defines an inverse-transform-sampling function and calls it twice, then computes the sum and product of samples.</p></li>
<li><p>Send those to a histogram, and from there to a bar-plotter. This is just for visualization; you could just as well output CSV and send that off to your own plotting tool, etc.</p></li>
</ul>
<p>The output is as follows:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> sh expo-sample.sh
</span> bin_lo bin_hi u_count s_count p_count
0 0.04 [64]*******************#[9554] [326]#...................[3703] [19]*******************#[39809]
0.04 0.08 [64]*****************...[9554] [326]*****...............[3703] [19]*******.............[39809]
0.08 0.12 [64]****************....[9554] [326]*********...........[3703] [19]****................[39809]
0.12 0.16 [64]**************......[9554] [326]************........[3703] [19]***.................[39809]
0.16 0.2 [64]*************.......[9554] [326]**************......[3703] [19]**..................[39809]
0.2 0.24 [64]************........[9554] [326]*****************...[3703] [19]*...................[39809]
0.24 0.28 [64]**********..........[9554] [326]******************..[3703] [19]*...................[39809]
0.28 0.32 [64]*********...........[9554] [326]******************..[3703] [19]*...................[39809]
0.32 0.36 [64]********............[9554] [326]*******************.[3703] [19]#...................[39809]
0.36 0.4 [64]*******.............[9554] [326]*******************#[3703] [19]#...................[39809]
0.4 0.44 [64]*******.............[9554] [326]*******************.[3703] [19]#...................[39809]
0.44 0.48 [64]******..............[9554] [326]*******************.[3703] [19]#...................[39809]
0.48 0.52 [64]*****...............[9554] [326]******************..[3703] [19]#...................[39809]
0.52 0.56 [64]*****...............[9554] [326]******************..[3703] [19]#...................[39809]
0.56 0.6 [64]****................[9554] [326]*****************...[3703] [19]#...................[39809]
0.6 0.64 [64]****................[9554] [326]******************..[3703] [19]#...................[39809]
0.64 0.68 [64]***.................[9554] [326]****************....[3703] [19]#...................[39809]
0.68 0.72 [64]***.................[9554] [326]****************....[3703] [19]#...................[39809]
0.72 0.76 [64]***.................[9554] [326]***************.....[3703] [19]#...................[39809]
0.76 0.8 [64]**..................[9554] [326]**************......[3703] [19]#...................[39809]
0.8 0.84 [64]**..................[9554] [326]*************.......[3703] [19]#...................[39809]
0.84 0.88 [64]**..................[9554] [326]************........[3703] [19]#...................[39809]
0.88 0.92 [64]**..................[9554] [326]************........[3703] [19]#...................[39809]
0.92 0.96 [64]*...................[9554] [326]***********.........[3703] [19]#...................[39809]
0.96 1 [64]*...................[9554] [326]**********..........[3703] [19]#...................[39809]
1 1.04 [64]*...................[9554] [326]*********...........[3703] [19]#...................[39809]
1.04 1.08 [64]*...................[9554] [326]********............[3703] [19]#...................[39809]
1.08 1.12 [64]*...................[9554] [326]********............[3703] [19]#...................[39809]
1.12 1.16 [64]*...................[9554] [326]********............[3703] [19]#...................[39809]
1.16 1.2 [64]*...................[9554] [326]*******.............[3703] [19]#...................[39809]
1.2 1.24 [64]#...................[9554] [326]******..............[3703] [19]#...................[39809]
1.24 1.28 [64]#...................[9554] [326]*****...............[3703] [19]#...................[39809]
1.28 1.32 [64]#...................[9554] [326]*****...............[3703] [19]#...................[39809]
1.32 1.36 [64]#...................[9554] [326]****................[3703] [19]#...................[39809]
1.36 1.4 [64]#...................[9554] [326]****................[3703] [19]#...................[39809]
1.4 1.44 [64]#...................[9554] [326]****................[3703] [19]#...................[39809]
1.44 1.48 [64]#...................[9554] [326]***.................[3703] [19]#...................[39809]
1.48 1.52 [64]#...................[9554] [326]***.................[3703] [19]#...................[39809]
1.52 1.56 [64]#...................[9554] [326]***.................[3703] [19]#...................[39809]
1.56 1.6 [64]#...................[9554] [326]**..................[3703] [19]#...................[39809]
1.6 1.64 [64]#...................[9554] [326]**..................[3703] [19]#...................[39809]
1.64 1.68 [64]#...................[9554] [326]**..................[3703] [19]#...................[39809]
1.68 1.72 [64]#...................[9554] [326]*...................[3703] [19]#...................[39809]
1.72 1.76 [64]#...................[9554] [326]*...................[3703] [19]#...................[39809]
1.76 1.8 [64]#...................[9554] [326]*...................[3703] [19]#...................[39809]
1.8 1.84 [64]#...................[9554] [326]#...................[3703] [19]#...................[39809]
1.84 1.88 [64]#...................[9554] [326]#...................[3703] [19]#...................[39809]
1.88 1.92 [64]#...................[9554] [326]#...................[3703] [19]#...................[39809]
1.92 1.96 [64]#...................[9554] [326]#...................[3703] [19]#...................[39809]
1.96 2 [64]#...................[9554] [326]#...................[3703] [19]#...................[39809]
</pre></div>
</div>
</div>
<div class="section" id="randomly-selecting-words-from-a-list">
<h2>Randomly selecting words from a list<a class="headerlink" href="#randomly-selecting-words-from-a-list" title="Permalink to this headline"></a></h2>
<p>Given this <a class="reference external" href="./data/english-words.txt">word list</a>, first take a look to see what the first few lines look like:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> head data/english-words.txt
</span> a
aa
aal
aalii
aam
aardvark
aardwolf
aba
abac
abaca
</pre></div>
</div>
<p>Then the following will randomly sample ten words with four to eight characters in them:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --from data/english-words.txt --nidx filter -S &#39;n=strlen($1);4&lt;=n&amp;&amp;n&lt;=8&#39; then sample -k 10
</span> thionine
birchman
mildewy
avigate
addedly
abaze
askant
aiming
insulant
coinmate
</pre></div>
</div>
</div>
<div class="section" id="randomly-generating-jabberwocky-words">
<h2>Randomly generating jabberwocky words<a class="headerlink" href="#randomly-generating-jabberwocky-words" title="Permalink to this headline"></a></h2>
<p>These are simple <em>n</em>-grams as <a class="reference external" href="http://johnkerl.org/randspell/randspell-slides-ts.pdf">described here</a>. Some common functions are <a class="reference external" href="https://github.com/johnkerl/miller/blob/master/docs/ngrams/ngfuncs.mlr.txt">located here</a>. Then here are scripts for <a class="reference external" href="https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng1.mlr.txt">1-grams</a> <a class="reference external" href="https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng2.mlr.txt">2-grams</a> <a class="reference external" href="https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng3.mlr.txt">3-grams</a> <a class="reference external" href="https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng4.mlr.txt">4-grams</a>, and <a class="reference external" href="https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng5.mlr.txt">5-grams</a>.</p>
<p>The idea is that words from the input file are consumed, then taken apart and pasted back together in ways which imitate the letter-to-letter transitions found in the word list giving us automatically generated words in the same vein as <em>bromance</em> and <em>spork</em>:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --nidx --from ./ngrams/gsl-2000.txt put -q -f ./ngrams/ngfuncs.mlr -f ./ngrams/ng5.mlr
</span> beard
plastinguish
politicially
noise
loan
country
controductionary
suppery
lose
lessors
dollar
judge
rottendence
lessenger
diffendant
suggestional
</pre></div>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2021, John Kerl.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 3.2.1.
</div>
</body>
</html>