miller/docs6/docs/randomizing-examples.md.in
2021-08-22 11:41:31 -04:00

# Randomizing examples
## Generating random numbers from various distributions
Here we can chain together a few simple building blocks:
GENMD_RUN_COMMAND
cat expo-sample.sh
GENMD_EOF
Namely:
* Set the Miller random-number seed so this webdoc looks the same every time I regenerate it.
* Use pretty-printed tabular output.
* Use `seqgen` to produce 100,000 records `i=0`, `i=1`, etc.
* Send those to a `put` step which defines an inverse-transform-sampling function, calls it twice, and then computes the sum and product of the two samples.
* Send those to a histogram, and from there to a bar-plotter. This is just for visualization; you could just as well output CSV and send it to your own plotting tool.
The output is as follows:
GENMD_RUN_COMMAND
sh expo-sample.sh
GENMD_EOF
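
The inverse-transform-sampling step can also be sketched outside of Miller. The following Python is only an illustration, not a translation of `expo-sample.sh`: it assumes an exponential distribution (which the script name suggests), and the rate of 2.5, the bin range, and the function names are all hypothetical choices made for this sketch.

```python
import math
import random


def exponential_sample(rng, rate):
    """Draw one exponential variate by inverse-transform sampling."""
    u = rng.random()  # uniform on [0, 1)
    # Invert the CDF F(x) = 1 - exp(-rate * x) to get x = -log(1 - u) / rate.
    return -math.log(1.0 - u) / rate


def simulate(n=100_000, rate=2.5, seed=0):
    """Return n samples, plus the sum and product of n independent pairs."""
    rng = random.Random(seed)  # fixed seed so reruns give the same numbers
    xs, sums, products = [], [], []
    for _ in range(n):
        a = exponential_sample(rng, rate)
        b = exponential_sample(rng, rate)
        xs.append(a)
        sums.append(a + b)
        products.append(a * b)
    return xs, sums, products


def histogram(values, lo, hi, nbins):
    """Count the values falling into nbins equal-width bins over [lo, hi)."""
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for v in values:
        if lo <= v < hi:
            # min() guards against floating-point rounding at the top edge.
            idx = min(int((v - lo) / width), nbins - 1)
            counts[idx] += 1
    return counts


if __name__ == "__main__":
    xs, sums, products = simulate()
    for name, values in (("x", xs), ("sum", sums), ("product", products)):
        counts = histogram(values, 0.0, 2.0, 10)
        print(name)
        for i, count in enumerate(counts):
            # One '#' per 1,000 samples gives a crude text bar plot.
            print(f"  bin {i:2d} {'#' * (count // 1000)}")
```

With a fixed seed the output is reproducible, which is the same reason the Miller script sets one.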
## Randomly selecting words from a list
Given this [word list](./data/english-words.txt), first look at its first few lines:
GENMD_CARDIFY_HIGHLIGHT_ONE
head data/english-words.txt
a
aa
aal
aalii
aam
aardvark
aardwolf
aba
abac
abaca
GENMD_EOF
Then the following randomly samples ten words of four to eight characters each:
GENMD_CARDIFY_HIGHLIGHT_ONE
mlr --from data/english-words.txt --nidx filter -S 'n=strlen($1);4<=n&&n<=8' then sample -k 10
thionine
birchman
mildewy
avigate
addedly
abaze
askant
aiming
insulant
coinmate
GENMD_EOF
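
For comparison, here is a minimal Python sketch of the same filter-then-sample idea. It uses reservoir sampling, one standard way to pick `k` items uniformly in a single pass over the input; this is a swapped-in illustration and makes no claim about how Miller's `sample` verb is implemented. The function name `sample_words` and its parameters are hypothetical.

```python
import random


def sample_words(words, k=10, min_len=4, max_len=8, seed=None):
    """Pick up to k words of length min_len..max_len, uniformly at random.

    Uses reservoir sampling (Algorithm R), so `words` can be any iterable,
    including a file read line by line, and is traversed only once.
    """
    rng = random.Random(seed)
    reservoir = []
    seen = 0  # number of words that passed the length filter so far
    for word in words:
        if not (min_len <= len(word) <= max_len):
            continue
        seen += 1
        if len(reservoir) < k:
            # Fill the reservoir with the first k qualifying words.
            reservoir.append(word)
        else:
            # Replace a random slot with probability k / seen.
            j = rng.randrange(seen)
            if j < k:
                reservoir[j] = word
    return reservoir


if __name__ == "__main__":
    # The first ten lines of the word list shown above.
    demo = ["a", "aa", "aal", "aalii", "aam",
            "aardvark", "aardwolf", "aba", "abac", "abaca"]
    print(sample_words(demo, k=3, seed=0))
```

If fewer than `k` words pass the length filter, the sketch simply returns all of them.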
## Randomly generating jabberwocky words
These are simple *n*-grams as [described here](http://johnkerl.org/randspell/randspell-slides-ts.pdf). Some common functions are [located here](https://github.com/johnkerl/miller/blob/master/docs/ngrams/ngfuncs.mlr.txt), and there are scripts for [1-grams](https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng1.mlr.txt), [2-grams](https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng2.mlr.txt), [3-grams](https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng3.mlr.txt), [4-grams](https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng4.mlr.txt), and [5-grams](https://github.com/johnkerl/miller/blob/master/docs/ngrams/ng5.mlr.txt).
The idea is that words from the input file are consumed, taken apart, and pasted back together in ways that imitate the letter-to-letter transitions found in the word list, giving us automatically generated words in the same vein as *bromance* and *spork*:
GENMD_CARDIFY_HIGHLIGHT_ONE
mlr --nidx --from ./ngrams/gsl-2000.txt put -q -f ./ngrams/ngfuncs.mlr -f ./ngrams/ng5.mlr
beard
plastinguish
politicially
noise
loan
country
controductionary
suppery
lose
lessors
dollar
judge
rottendence
lessenger
diffendant
suggestional
GENMD_EOF
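
To make the take-apart-and-paste-together idea concrete, here is a simplified Python sketch of a character-level *n*-gram generator. It is not a port of `ngfuncs.mlr` or the `ng*.mlr` scripts, whose details may differ; the function names, the `^` and `$` boundary markers, and the `max_len` cap are assumptions made for this illustration, and the sketch assumes the input words do not themselves contain `^` or `$`.

```python
import random
from collections import defaultdict

START = "^"  # hypothetical marker padding the start of each word
END = "$"    # hypothetical marker for the end of each word


def build_transitions(words, n):
    """Map each (n-1)-character context to the characters seen after it."""
    transitions = defaultdict(list)
    for word in words:
        padded = START * (n - 1) + word + END
        for i in range(len(padded) - (n - 1)):
            context = padded[i:i + n - 1]
            # Repeats are kept, so frequent transitions are chosen more often.
            transitions[context].append(padded[i + n - 1])
    return dict(transitions)


def generate_word(transitions, n, rng, max_len=30):
    """Walk the transition table from the start context until END."""
    context = START * (n - 1)
    if context not in transitions:
        raise ValueError("transition table is empty; no words were supplied")
    letters = []
    while len(letters) < max_len:
        nxt = rng.choice(transitions[context])
        if nxt == END:
            break
        letters.append(nxt)
        # Slide the context window forward by one character.
        context = (context + nxt)[1:] if n > 1 else ""
    return "".join(letters)


if __name__ == "__main__":
    # A tiny stand-in word list; the real input above is much larger.
    corpus = ["beard", "noise", "loan", "country", "lose", "dollar", "judge"]
    rng = random.Random(0)
    table = build_transitions(corpus, 3)
    for _ in range(5):
        print(generate_word(table, 3, rng))
```

Smaller values of `n` give more scrambled output, while larger values tend to reproduce words from the input verbatim, because each context then has fewer distinct continuations.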