First pass at converting Miller 6 docs from Sphinx to Mkdocs (#616)

* Accept more passing emit cases

* Port docs from sphinx to mkdocs

* iterating

* rephrase internal-link syntax using mkdocs

* iterating
John Kerl 2021-08-04 01:54:01 -04:00 committed by GitHub
parent 86f31f2f9b
commit 11eac853d2
1056 changed files with 611281 additions and 50 deletions

2
.gitignore vendored

@ -119,3 +119,5 @@ experiments/dsl-parser/two/src
experiments/dsl-parser/two/main
experiments/cli-parser/cliparse
experiments/cli-parser/cliparse.exe
docs6b/site/

2
docs6b/docs/.vimrc Normal file

@ -0,0 +1,2 @@
map \d :w<C-m>:!clear;build-one %<C-m>
map \f :w<C-m>:!clear;make html<C-m>

2
docs6b/docs/10-1.sh Executable file

@ -0,0 +1,2 @@
grep op=cache log.txt \
| mlr --idkvp --opprint stats1 -a mean -f hit -g type then sort -f type

4
docs6b/docs/10-2.sh Executable file

@ -0,0 +1,4 @@
mlr --from log.txt --opprint \
filter 'is_present($batch_size)' \
then step -a delta -f time,num_filtered \
then sec2gmt time

608
docs6b/docs/10min.md Normal file

@ -0,0 +1,608 @@
<!--- PLEASE DO NOT EDIT DIRECTLY. EDIT THE .md.in FILE PLEASE. --->
# Miller in 10 minutes
## Obtaining Miller
You can install Miller for various platforms as follows:
* Linux: ``yum install miller`` or ``apt-get install miller`` depending on your flavor of Linux
* MacOS: ``brew install miller`` or ``port install miller`` depending on your preference of [Homebrew](https://brew.sh) or [MacPorts](https://macports.org).
* Windows: ``choco install miller`` using [Chocolatey](https://chocolatey.org).
* You can get latest builds for Linux, MacOS, and Windows by visiting https://github.com/johnkerl/miller/actions, selecting the latest build, and clicking _Artifacts_. (These are retained for 5 days after each commit.)
* See also the [build page](build.md) if you prefer -- in particular, if your platform's package manager doesn't have the latest release.
As a first check, you should be able to run ``mlr --version`` at your system's command prompt and see something like the following:
<pre>
<b>mlr --version</b>
Miller v6.0.0-dev
</pre>
As a second check, given [example.csv](./example.csv) you should be able to do
<pre>
<b>mlr --csv cat example.csv</b>
color,shape,flag,index,quantity,rate
yellow,triangle,true,11,43.6498,9.8870
red,square,true,15,79.2778,0.0130
red,circle,true,16,13.8103,2.9010
red,square,false,48,77.5542,7.4670
purple,triangle,false,51,81.2290,8.5910
red,square,false,64,77.1991,9.5310
purple,triangle,false,65,80.1405,5.8240
yellow,circle,true,73,63.9785,4.2370
yellow,circle,true,87,63.5058,8.3350
purple,square,false,91,72.3735,8.2430
</pre>
<pre>
<b>mlr --icsv --opprint cat example.csv</b>
color shape flag index quantity rate
yellow triangle true 11 43.6498 9.8870
red square true 15 79.2778 0.0130
red circle true 16 13.8103 2.9010
red square false 48 77.5542 7.4670
purple triangle false 51 81.2290 8.5910
red square false 64 77.1991 9.5310
purple triangle false 65 80.1405 5.8240
yellow circle true 73 63.9785 4.2370
yellow circle true 87 63.5058 8.3350
purple square false 91 72.3735 8.2430
</pre>
If you run into issues on these checks, please check out the resources on the [community page](community.md) for help.
## Miller verbs
Let's take a quick look at some of the most useful Miller verbs -- file-format-aware, name-index-empowered equivalents of standard system commands.
``mlr cat`` is like system ``cat`` (or ``type`` on Windows) -- it passes the data through unmodified:
<pre>
<b>mlr --csv cat example.csv</b>
color,shape,flag,index,quantity,rate
yellow,triangle,true,11,43.6498,9.8870
red,square,true,15,79.2778,0.0130
red,circle,true,16,13.8103,2.9010
red,square,false,48,77.5542,7.4670
purple,triangle,false,51,81.2290,8.5910
red,square,false,64,77.1991,9.5310
purple,triangle,false,65,80.1405,5.8240
yellow,circle,true,73,63.9785,4.2370
yellow,circle,true,87,63.5058,8.3350
purple,square,false,91,72.3735,8.2430
</pre>
But ``mlr cat`` can also do format conversion -- for example, you can pretty-print in tabular format:
<pre>
<b>mlr --icsv --opprint cat example.csv</b>
color shape flag index quantity rate
yellow triangle true 11 43.6498 9.8870
red square true 15 79.2778 0.0130
red circle true 16 13.8103 2.9010
red square false 48 77.5542 7.4670
purple triangle false 51 81.2290 8.5910
red square false 64 77.1991 9.5310
purple triangle false 65 80.1405 5.8240
yellow circle true 73 63.9785 4.2370
yellow circle true 87 63.5058 8.3350
purple square false 91 72.3735 8.2430
</pre>
``mlr head`` and ``mlr tail`` count records rather than lines. Whether you're getting the first few records or the last few, the CSV header is included either way:
<pre>
<b>mlr --csv head -n 4 example.csv</b>
color,shape,flag,index,quantity,rate
yellow,triangle,true,11,43.6498,9.8870
red,square,true,15,79.2778,0.0130
red,circle,true,16,13.8103,2.9010
red,square,false,48,77.5542,7.4670
</pre>
<pre>
<b>mlr --csv tail -n 4 example.csv</b>
color,shape,flag,index,quantity,rate
purple,triangle,false,65,80.1405,5.8240
yellow,circle,true,73,63.9785,4.2370
yellow,circle,true,87,63.5058,8.3350
purple,square,false,91,72.3735,8.2430
</pre>
<pre>
<b>mlr --icsv --ojson tail -n 2 example.csv</b>
{
"color": "yellow",
"shape": "circle",
"flag": true,
"index": 87,
"quantity": 63.5058,
"rate": 8.3350
}
{
"color": "purple",
"shape": "square",
"flag": false,
"index": 91,
"quantity": 72.3735,
"rate": 8.2430
}
</pre>
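The difference between counting lines and counting records can be sketched with standard tools. The helpers below (`csv_head` and `csv_tail`, hypothetical names that are not part of Miller) assume a single CSV input on standard input with one header line and no embedded newlines; they only approximate what ``mlr head`` and ``mlr tail`` do for CSV:

```shell
# Sample rows copied from example.csv above.
example_csv() {
  cat <<'EOF'
color,shape,flag,index,quantity,rate
yellow,triangle,true,11,43.6498,9.8870
red,square,true,15,79.2778,0.0130
red,circle,true,16,13.8103,2.9010
red,square,false,48,77.5542,7.4670
purple,triangle,false,51,81.2290,8.5910
EOF
}

# Hypothetical helper: header line plus the first N records.
csv_head() {
  awk -v n="$1" 'NR <= n + 1'
}

# Hypothetical helper: header line plus the last N records.
csv_tail() {
  awk -v n="$1" '
    NR == 1 { print; next }          # always emit the header
    { row[NR] = $0 }                 # buffer the data lines
    END {
      start = NR - n + 1
      if (start < 2) start = 2       # never re-emit the header
      for (i = start; i <= NR; i++) print row[i]
    }'
}

example_csv | head -n 2      # system head: the header counts as one of the two lines
example_csv | csv_head 2     # header line plus two records
example_csv | csv_tail 2     # header line plus the last two records
```

The point of the sketch is only that the header is not a record: a line-oriented tool spends one of its lines on it, while a record-oriented one does not.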
You can sort on a single field:
<pre>
<b>mlr --icsv --opprint sort -f shape example.csv</b>
color shape flag index quantity rate
red circle true 16 13.8103 2.9010
yellow circle true 73 63.9785 4.2370
yellow circle true 87 63.5058 8.3350
red square true 15 79.2778 0.0130
red square false 48 77.5542 7.4670
red square false 64 77.1991 9.5310
purple square false 91 72.3735 8.2430
yellow triangle true 11 43.6498 9.8870
purple triangle false 51 81.2290 8.5910
purple triangle false 65 80.1405 5.8240
</pre>
Or, you can sort primarily alphabetically on one field, then secondarily numerically descending on another field, and so on:
<pre>
<b>mlr --icsv --opprint sort -f shape -nr index example.csv</b>
color shape flag index quantity rate
yellow circle true 87 63.5058 8.3350
yellow circle true 73 63.9785 4.2370
red circle true 16 13.8103 2.9010
purple square false 91 72.3735 8.2430
red square false 64 77.1991 9.5310
red square false 48 77.5542 7.4670
red square true 15 79.2778 0.0130
purple triangle false 65 80.1405 5.8240
purple triangle false 51 81.2290 8.5910
yellow triangle true 11 43.6498 9.8870
</pre>
If there are fields you don't want to see in your data, you can use ``cut`` to keep only the ones you want, in the same order they appeared in the input data:
<pre>
<b>mlr --icsv --opprint cut -f flag,shape example.csv</b>
shape flag
triangle true
square true
circle true
square false
triangle false
square false
triangle false
circle true
circle true
square false
</pre>
You can also use ``cut -o`` to keep specified fields, but in your preferred order:
<pre>
<b>mlr --icsv --opprint cut -o -f flag,shape example.csv</b>
flag shape
true triangle
true square
true circle
false square
false triangle
false square
false triangle
true circle
true circle
false square
</pre>
You can use ``cut -x`` to omit fields you don't care about:
<pre>
<b>mlr --icsv --opprint cut -x -f flag,shape example.csv</b>
color index quantity rate
yellow 11 43.6498 9.8870
red 15 79.2778 0.0130
red 16 13.8103 2.9010
red 48 77.5542 7.4670
purple 51 81.2290 8.5910
red 64 77.1991 9.5310
purple 65 80.1405 5.8240
yellow 73 63.9785 4.2370
yellow 87 63.5058 8.3350
purple 91 72.3735 8.2430
</pre>
You can use ``filter`` to keep only records you care about:
<pre>
<b>mlr --icsv --opprint filter '$color == "red"' example.csv</b>
color shape flag index quantity rate
red square true 15 79.2778 0.0130
red circle true 16 13.8103 2.9010
red square false 48 77.5542 7.4670
red square false 64 77.1991 9.5310
</pre>
<pre>
<b>mlr --icsv --opprint filter '$color == "red" && $flag == true' example.csv</b>
color shape flag index quantity rate
red square true 15 79.2778 0.0130
red circle true 16 13.8103 2.9010
</pre>
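For comparison, the same selection can be approximated with `awk`, at the cost of referring to columns by position rather than by name. This is only a sketch: the function name `filter_red_true` is hypothetical, and the column numbers (1 for `color`, 3 for `flag`) are hard-coded assumptions about this particular file, which is the bookkeeping that name-indexing avoids.

```shell
# Rows copied from example.csv above.
example_csv() {
  cat <<'EOF'
color,shape,flag,index,quantity,rate
yellow,triangle,true,11,43.6498,9.8870
red,square,true,15,79.2778,0.0130
red,circle,true,16,13.8103,2.9010
red,square,false,48,77.5542,7.4670
purple,triangle,false,51,81.2290,8.5910
red,square,false,64,77.1991,9.5310
purple,triangle,false,65,80.1405,5.8240
yellow,circle,true,73,63.9785,4.2370
yellow,circle,true,87,63.5058,8.3350
purple,square,false,91,72.3735,8.2430
EOF
}

# Hypothetical helper: keep the header, plus rows whose first column is
# "red" and whose third column is "true". Positions are assumptions.
filter_red_true() {
  awk -F, 'NR == 1 || ($1 == "red" && $3 == "true")'
}

example_csv | filter_red_true
```

If the columns of the input were reordered, this sketch would silently select the wrong rows, whereas a filter written against `$color` and `$flag` would not need to change.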
You can use ``put`` to create new fields which are computed from other fields:
<pre>
<b>mlr --icsv --opprint put '</b>
<b> $ratio = $quantity / $rate;</b>
<b> $color_shape = $color . "_" . $shape</b>
<b>' example.csv</b>
color shape flag index quantity rate ratio color_shape
yellow triangle true 11 43.6498 9.8870 4.414868008496004 yellow_triangle
red square true 15 79.2778 0.0130 6098.292307692308 red_square
red circle true 16 13.8103 2.9010 4.760530851430541 red_circle
red square false 48 77.5542 7.4670 10.386259541984733 red_square
purple triangle false 51 81.2290 8.5910 9.455127458968688 purple_triangle
red square false 64 77.1991 9.5310 8.099790158430384 red_square
purple triangle false 65 80.1405 5.8240 13.760388049450551 purple_triangle
yellow circle true 73 63.9785 4.2370 15.09995279679018 yellow_circle
yellow circle true 87 63.5058 8.3350 7.619172165566886 yellow_circle
purple square false 91 72.3735 8.2430 8.779995147397793 purple_square
</pre>
Even though Miller's main selling point is name-indexing, sometimes you really want to refer to a field name by its positional index. Use ``$[[3]]`` to access the name of field 3 or ``$[[[3]]]`` to access the value of field 3:
<pre>
<b>mlr --icsv --opprint put '$[[3]] = "NEW"' example.csv</b>
color shape NEW index quantity rate
yellow triangle true 11 43.6498 9.8870
red square true 15 79.2778 0.0130
red circle true 16 13.8103 2.9010
red square false 48 77.5542 7.4670
purple triangle false 51 81.2290 8.5910
red square false 64 77.1991 9.5310
purple triangle false 65 80.1405 5.8240
yellow circle true 73 63.9785 4.2370
yellow circle true 87 63.5058 8.3350
purple square false 91 72.3735 8.2430
</pre>
<pre>
<b>mlr --icsv --opprint put '$[[[3]]] = "NEW"' example.csv</b>
color shape flag index quantity rate
yellow triangle NEW 11 43.6498 9.8870
red square NEW 15 79.2778 0.0130
red circle NEW 16 13.8103 2.9010
red square NEW 48 77.5542 7.4670
purple triangle NEW 51 81.2290 8.5910
red square NEW 64 77.1991 9.5310
purple triangle NEW 65 80.1405 5.8240
yellow circle NEW 73 63.9785 4.2370
yellow circle NEW 87 63.5058 8.3350
purple square NEW 91 72.3735 8.2430
</pre>
You can find the full list of verbs at the [Verbs Reference](reference-verbs.md) page.
## Multiple input files
Miller takes all the files from the command line as an input stream. But it's format-aware, so it doesn't repeat CSV header lines. For example, with input files [data/a.csv](data/a.csv) and [data/b.csv](data/b.csv), the system ``cat`` command will repeat header lines:
<pre>
<b>cat data/a.csv</b>
a,b,c
1,2,3
4,5,6
</pre>
<pre>
<b>cat data/b.csv</b>
a,b,c
7,8,9
</pre>
<pre>
<b>cat data/a.csv data/b.csv</b>
a,b,c
1,2,3
4,5,6
a,b,c
7,8,9
</pre>
However, ``mlr cat`` will not:
<pre>
<b>mlr --csv cat data/a.csv data/b.csv</b>
a,b,c
1,2,3
4,5,6
7,8,9
</pre>
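To make the header handling concrete, here is a rough equivalent using `awk`. The helper name `csv_cat` is hypothetical and not part of Miller, and the sketch assumes every input file starts with the same header line:

```shell
# Hypothetical helper: concatenate CSV files, keeping only the first
# file's header. FNR restarts at 1 for each file; NR does not.
csv_cat() {
  awk 'FNR == 1 && NR != 1 { next } { print }' "$@"
}

# Recreate the two small input files shown above in a scratch directory.
tmpdir=$(mktemp -d)
printf 'a,b,c\n1,2,3\n4,5,6\n' > "$tmpdir/a.csv"
printf 'a,b,c\n7,8,9\n' > "$tmpdir/b.csv"

csv_cat "$tmpdir/a.csv" "$tmpdir/b.csv"
```

Unlike this sketch, which drops the first line of every later file without looking at it, a format-aware tool knows that the line is a header.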
## Chaining verbs together
Often we want to chain queries together -- for example, sorting by a field and taking the top few values. We can do this using pipes:
<pre>
<b>mlr --csv sort -nr index example.csv | mlr --icsv --opprint head -n 3</b>
color shape flag index quantity rate
purple square false 91 72.3735 8.2430
yellow circle true 87 63.5058 8.3350
yellow circle true 73 63.9785 4.2370
</pre>
This works fine -- but Miller also lets you chain verbs together using the word ``then``. Think of this as a Miller-internal pipe that lets you use fewer keystrokes:
<pre>
<b>mlr --icsv --opprint sort -nr index then head -n 3 example.csv</b>
color shape flag index quantity rate
purple square false 91 72.3735 8.2430
yellow circle true 87 63.5058 8.3350
yellow circle true 73 63.9785 4.2370
</pre>
As another convenience, you can put the filename first using ``--from``. When you're interacting with your data at the command line, this makes it easier to up-arrow and append to the previous command:
<pre>
<b>mlr --icsv --opprint --from example.csv sort -nr index then head -n 3</b>
color shape flag index quantity rate
purple square false 91 72.3735 8.2430
yellow circle true 87 63.5058 8.3350
yellow circle true 73 63.9785 4.2370
</pre>
<pre>
<b>mlr --icsv --opprint --from example.csv \</b>
<b> sort -nr index \</b>
<b> then head -n 3 \</b>
<b> then cut -f shape,quantity</b>
shape quantity
square 72.3735
circle 63.5058
circle 63.9785
</pre>
## Sorts and stats
Now suppose you want to sort the data on a given column, *and then* take the top few in that ordering. You can use Miller's ``then`` feature to pipe commands together.
Here are the records with the top three ``index`` values:
<pre>
<b>mlr --icsv --opprint sort -nr index then head -n 3 example.csv</b>
color shape flag index quantity rate
purple square false 91 72.3735 8.2430
yellow circle true 87 63.5058 8.3350
yellow circle true 73 63.9785 4.2370
</pre>
Lots of Miller commands take a ``-g`` option for group-by: here, ``head -n 1 -g shape`` outputs the first record for each distinct value of the ``shape`` field. Since the records have already been sorted by descending ``index`` within each shape, this finds the record with the highest ``index`` value for each distinct ``shape`` value:
<pre>
<b>mlr --icsv --opprint sort -f shape -nr index then head -n 1 -g shape example.csv</b>
color shape flag index quantity rate
yellow circle true 87 63.5058 8.3350
purple square false 91 72.3735 8.2430
purple triangle false 65 80.1405 5.8240
</pre>
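The sort-then-take-first-per-group idea can also be sketched with `sort` and `awk`. The helper name `top_per_group` is hypothetical, and the column positions (2 for `shape`, 4 for `index`) are assumptions hard-coded for this file:

```shell
# Rows copied from example.csv above.
example_csv() {
  cat <<'EOF'
color,shape,flag,index,quantity,rate
yellow,triangle,true,11,43.6498,9.8870
red,square,true,15,79.2778,0.0130
red,circle,true,16,13.8103,2.9010
red,square,false,48,77.5542,7.4670
purple,triangle,false,51,81.2290,8.5910
red,square,false,64,77.1991,9.5310
purple,triangle,false,65,80.1405,5.8240
yellow,circle,true,73,63.9785,4.2370
yellow,circle,true,87,63.5058,8.3350
purple,square,false,91,72.3735,8.2430
EOF
}

# Hypothetical helper: reads CSV on stdin and prints the header plus,
# for each shape, the row with the largest index.
top_per_group() {
  IFS= read -r header                 # consume the header line only
  printf '%s\n' "$header"
  # Sort by shape ascending, then by index numerically descending;
  # awk then keeps the first row seen for each shape.
  LC_ALL=C sort -t, -k2,2 -k4,4nr | awk -F, '!seen[$2]++'
}

example_csv | top_per_group
```

The header has to be peeled off by hand before sorting, and the grouping column is named by position, which is the kind of detail that ``sort ... then head -n 1 -g shape`` takes care of.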
Statistics can be computed with or without group-by field(s):
<pre>
<b>mlr --icsv --opprint --from example.csv \</b>
<b> stats1 -a count,min,mean,max -f quantity -g shape</b>
shape quantity_count quantity_min quantity_mean quantity_max
triangle 3 43.6498 68.33976666666666 81.229
square 4 72.3735 76.60114999999999 79.2778
circle 3 13.8103 47.0982 63.9785
</pre>
<pre>
<b>mlr --icsv --opprint --from example.csv \</b>
<b> stats1 -a count,min,mean,max -f quantity -g shape,color</b>
shape color quantity_count quantity_min quantity_mean quantity_max
triangle yellow 1 43.6498 43.6498 43.6498
square red 3 77.1991 78.01036666666666 79.2778
circle red 1 13.8103 13.8103 13.8103
triangle purple 2 80.1405 80.68475000000001 81.229
circle yellow 2 63.5058 63.742149999999995 63.9785
square purple 1 72.3735 72.3735 72.3735
</pre>
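As an illustration of what grouped statistics involve, the following sketch computes count, min, mean, and max of `quantity` per `shape` with `awk`. The helper name `stats_by_shape` is hypothetical, the column positions (2 and 5) are assumptions about this file, and values are rounded to four decimal places, so the formatting differs from the ``stats1`` output above:

```shell
# Rows copied from example.csv above.
example_csv() {
  cat <<'EOF'
color,shape,flag,index,quantity,rate
yellow,triangle,true,11,43.6498,9.8870
red,square,true,15,79.2778,0.0130
red,circle,true,16,13.8103,2.9010
red,square,false,48,77.5542,7.4670
purple,triangle,false,51,81.2290,8.5910
red,square,false,64,77.1991,9.5310
purple,triangle,false,65,80.1405,5.8240
yellow,circle,true,73,63.9785,4.2370
yellow,circle,true,87,63.5058,8.3350
purple,square,false,91,72.3735,8.2430
EOF
}

# Hypothetical helper: per-shape count, min, mean, max of quantity.
stats_by_shape() {
  awk -F, '
    NR == 1 { next }                              # skip the header line
    {
      g = $2; v = $5 + 0
      if (!(g in count) || v < min[g]) min[g] = v
      if (!(g in count) || v > max[g]) max[g] = v
      count[g]++; sum[g] += v
    }
    END {
      for (g in count)
        printf "%s %d %.4f %.4f %.4f\n", g, count[g], min[g], sum[g] / count[g], max[g]
    }
  ' | sort                                        # awk iteration order is unspecified
}

example_csv | stats_by_shape
```

Each output line holds the shape followed by count, min, mean, and max, in that order. Grouping by a second field would mean building a composite key by hand, which is what ``-g shape,color`` expresses directly.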
If your output has a lot of columns, you can use XTAB format to line things up vertically for you instead:
<pre>
<b>mlr --icsv --oxtab --from example.csv \</b>
<b> stats1 -a p0,p10,p25,p50,p75,p90,p99,p100 -f rate</b>
rate_p0 0.0130
rate_p10 2.9010
rate_p25 4.2370
rate_p50 8.2430
rate_p75 8.5910
rate_p90 9.8870
rate_p99 9.8870
rate_p100 9.8870
</pre>
## File formats and format conversion
Miller supports the following formats:
* CSV (comma-separated values)
* TSV (tab-separated values)
* JSON (JavaScript Object Notation)
* PPRINT (pretty-printed tabular)
* XTAB (vertical-tabular or sideways-tabular)
* NIDX (numerically indexed, label-free, with implicit labels ``"1"``, ``"2"``, etc.)
* DKVP (delimited key-value pairs).
What's a CSV file, really? It's an array of rows, or *records*, each being a list of key-value pairs, or *fields*: for CSV it so happens that all the keys are shared in the header line and the values vary from one data line to another.
For example, if you have:
<pre>
shape,flag,index
circle,1,24
square,0,36
</pre>
then that's a way of saying:
<pre>
shape=circle,flag=1,index=24
shape=square,flag=0,index=36
</pre>
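The correspondence between a header line and per-record keys can be sketched in a few lines of `awk`. The helper name `csv_to_dkvp` is hypothetical, and the sketch assumes simple CSV with no quoted fields or embedded commas:

```shell
# Hypothetical helper: pair each value with the key from the header line.
csv_to_dkvp() {
  awk -F, '
    NR == 1 { for (i = 1; i <= NF; i++) key[i] = $i; next }   # remember the keys
    {
      line = ""
      for (i = 1; i <= NF; i++)
        line = line (i > 1 ? "," : "") key[i] "=" $i
      print line
    }
  '
}

printf 'shape,flag,index\ncircle,1,24\nsquare,0,36\n' | csv_to_dkvp
```

The header is consumed once and then applied to every data line, which is the sense in which all CSV records share their keys.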
Other ways to write the same data:
<pre>
CSV PPRINT
shape,flag,index shape flag index
circle,1,24 circle 1 24
square,0,36 square 0 36
JSON XTAB
{ shape circle
"shape": "circle", flag 1
"flag": 1, index 24
"index": 24 .
} shape square
{ flag 0
"shape": "square", index 36
"flag": 0,
"index": 36
}
DKVP
shape=circle,flag=1,index=24
shape=square,flag=0,index=36
</pre>
Anything we can do with CSV input data, we can do with any other format input data. And you can read from one format, do any record-processing, and output to the same format as the input, or to a different output format.
How to specify these to Miller:
* If you use ``--csv`` or ``--json`` or ``--pprint``, etc., then Miller will use that format for input and output.
* If you use ``--icsv`` and ``--ojson`` (note the extra ``i`` and ``o``) then Miller will use CSV for input and JSON for output, etc. See also [Keystroke Savers](keystroke-savers.md) for even shorter options like ``--c2j``.
You can read more about this at the [File Formats](file-formats.md) page.
## Choices for printing to files
Often we want to print output to the screen. Miller does this by default, as we've seen in the previous examples.
Sometimes, though, we want to print output to another file. Just use **> outputfilenamegoeshere** at the end of your command:
<pre>
<b>mlr --icsv --opprint cat example.csv > newfile.csv</b>
# Output goes to the new file;
# nothing is printed to the screen.
</pre>
<pre>
<b>cat newfile.csv</b>
color shape flag index quantity rate
yellow triangle true 11 43.6498 9.8870
red square true 15 79.2778 0.0130
red circle true 16 13.8103 2.9010
red square false 48 77.5542 7.4670
purple triangle false 51 81.2290 8.5910
red square false 64 77.1991 9.5310
purple triangle false 65 80.1405 5.8240
yellow circle true 73 63.9785 4.2370
yellow circle true 87 63.5058 8.3350
purple square false 91 72.3735 8.2430
</pre>
Other times we just want our files to be **changed in-place**: just use **mlr -I**:
<pre>
<b>cp example.csv newfile.txt</b>
</pre>
<pre>
<b>cat newfile.txt</b>
color,shape,flag,index,quantity,rate
yellow,triangle,true,11,43.6498,9.8870
red,square,true,15,79.2778,0.0130
red,circle,true,16,13.8103,2.9010
red,square,false,48,77.5542,7.4670
purple,triangle,false,51,81.2290,8.5910
red,square,false,64,77.1991,9.5310
purple,triangle,false,65,80.1405,5.8240
yellow,circle,true,73,63.9785,4.2370
yellow,circle,true,87,63.5058,8.3350
purple,square,false,91,72.3735,8.2430
</pre>
<pre>
<b>mlr -I --csv sort -f shape newfile.txt</b>
</pre>
<pre>
<b>cat newfile.txt</b>
color,shape,flag,index,quantity,rate
red,circle,true,16,13.8103,2.9010
yellow,circle,true,73,63.9785,4.2370
yellow,circle,true,87,63.5058,8.3350
red,square,true,15,79.2778,0.0130
red,square,false,48,77.5542,7.4670
red,square,false,64,77.1991,9.5310
purple,square,false,91,72.3735,8.2430
yellow,triangle,true,11,43.6498,9.8870
purple,triangle,false,51,81.2290,8.5910
purple,triangle,false,65,80.1405,5.8240
</pre>
Using ``mlr -I`` you can also bulk-operate on many files at once, for example:
<pre>
<b>mlr -I --csv cut -x -f unwanted_column_name *.csv</b>
</pre>
If you like, you can first copy off your original data somewhere else, before doing in-place operations.
Lastly, using ``tee`` within ``put``, you can split your input data into separate files per one or more field names:
<pre>
<b>mlr --csv --from example.csv put -q 'tee > $shape.".csv", $*'</b>
</pre>
<pre>
<b>cat circle.csv</b>
color,shape,flag,index,quantity,rate
red,circle,true,16,13.8103,2.9010
yellow,circle,true,73,63.9785,4.2370
yellow,circle,true,87,63.5058,8.3350
</pre>
<pre>
<b>cat square.csv</b>
color,shape,flag,index,quantity,rate
red,square,true,15,79.2778,0.0130
red,square,false,48,77.5542,7.4670
red,square,false,64,77.1991,9.5310
purple,square,false,91,72.3735,8.2430
</pre>
<pre>
<b>cat triangle.csv</b>
color,shape,flag,index,quantity,rate
yellow,triangle,true,11,43.6498,9.8870
purple,triangle,false,51,81.2290,8.5910
purple,triangle,false,65,80.1405,5.8240
</pre>
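For readers who want to see what this kind of split involves, here is a rough `awk` sketch of the same idea. The helper name `split_by_shape` is hypothetical, column 2 is assumed to hold `shape`, and the output goes to a scratch directory so that nothing in the current directory is overwritten:

```shell
# Rows copied from example.csv above.
example_csv() {
  cat <<'EOF'
color,shape,flag,index,quantity,rate
yellow,triangle,true,11,43.6498,9.8870
red,square,true,15,79.2778,0.0130
red,circle,true,16,13.8103,2.9010
red,square,false,48,77.5542,7.4670
purple,triangle,false,51,81.2290,8.5910
red,square,false,64,77.1991,9.5310
purple,triangle,false,65,80.1405,5.8240
yellow,circle,true,73,63.9785,4.2370
yellow,circle,true,87,63.5058,8.3350
purple,square,false,91,72.3735,8.2430
EOF
}

# Hypothetical helper: $1 is the output directory; CSV is read on stdin.
# Each distinct shape gets its own file, starting with the header line.
split_by_shape() {
  awk -F, -v dir="$1" '
    NR == 1 { header = $0; next }
    {
      out = dir "/" $2 ".csv"
      if (!(out in started)) { print header > out; started[out] = 1 }
      print > out                    # same open stream, so this appends
    }
  '
}

outdir=$(mktemp -d)
example_csv | split_by_shape "$outdir"
ls "$outdir"
cat "$outdir/circle.csv"
```

The sketch has to remember the header and write it once per output file; with ``tee`` inside ``put``, each output file receives its header as part of normal CSV writing.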

370
docs6b/docs/10min.md.in Normal file

@ -0,0 +1,370 @@
# Miller in 10 minutes
## Obtaining Miller
You can install Miller for various platforms as follows:
* Linux: ``yum install miller`` or ``apt-get install miller`` depending on your flavor of Linux
* MacOS: ``brew install miller`` or ``port install miller`` depending on your preference of [Homebrew](https://brew.sh) or [MacPorts](https://macports.org).
* Windows: ``choco install miller`` using [Chocolatey](https://chocolatey.org).
* You can get latest builds for Linux, MacOS, and Windows by visiting https://github.com/johnkerl/miller/actions, selecting the latest build, and clicking _Artifacts_. (These are retained for 5 days after each commit.)
* See also the [build page](build.md) if you prefer -- in particular, if your platform's package manager doesn't have the latest release.
As a first check, you should be able to run ``mlr --version`` at your system's command prompt and see something like the following:
GENMD_RUN_COMMAND
mlr --version
GENMD_EOF
As a second check, given [example.csv](./example.csv) you should be able to do
GENMD_RUN_COMMAND
mlr --csv cat example.csv
GENMD_EOF
GENMD_RUN_COMMAND
mlr --icsv --opprint cat example.csv
GENMD_EOF
If you run into issues on these checks, please check out the resources on the [community page](community.md) for help.
## Miller verbs
Let's take a quick look at some of the most useful Miller verbs -- file-format-aware, name-index-empowered equivalents of standard system commands.
``mlr cat`` is like system ``cat`` (or ``type`` on Windows) -- it passes the data through unmodified:
GENMD_RUN_COMMAND
mlr --csv cat example.csv
GENMD_EOF
But ``mlr cat`` can also do format conversion -- for example, you can pretty-print in tabular format:
GENMD_RUN_COMMAND
mlr --icsv --opprint cat example.csv
GENMD_EOF
``mlr head`` and ``mlr tail`` count records rather than lines. Whether you're getting the first few records or the last few, the CSV header is included either way:
GENMD_RUN_COMMAND
mlr --csv head -n 4 example.csv
GENMD_EOF
GENMD_RUN_COMMAND
mlr --csv tail -n 4 example.csv
GENMD_EOF
GENMD_RUN_COMMAND
mlr --icsv --ojson tail -n 2 example.csv
GENMD_EOF
You can sort on a single field:
GENMD_RUN_COMMAND
mlr --icsv --opprint sort -f shape example.csv
GENMD_EOF
Or, you can sort primarily alphabetically on one field, then secondarily numerically descending on another field, and so on:
GENMD_RUN_COMMAND
mlr --icsv --opprint sort -f shape -nr index example.csv
GENMD_EOF
If there are fields you don't want to see in your data, you can use ``cut`` to keep only the ones you want, in the same order they appeared in the input data:
GENMD_RUN_COMMAND
mlr --icsv --opprint cut -f flag,shape example.csv
GENMD_EOF
You can also use ``cut -o`` to keep specified fields, but in your preferred order:
GENMD_RUN_COMMAND
mlr --icsv --opprint cut -o -f flag,shape example.csv
GENMD_EOF
You can use ``cut -x`` to omit fields you don't care about:
GENMD_RUN_COMMAND
mlr --icsv --opprint cut -x -f flag,shape example.csv
GENMD_EOF
You can use ``filter`` to keep only records you care about:
GENMD_RUN_COMMAND
mlr --icsv --opprint filter '$color == "red"' example.csv
GENMD_EOF
GENMD_RUN_COMMAND
mlr --icsv --opprint filter '$color == "red" && $flag == true' example.csv
GENMD_EOF
You can use ``put`` to create new fields which are computed from other fields:
GENMD_RUN_COMMAND
mlr --icsv --opprint put '
$ratio = $quantity / $rate;
$color_shape = $color . "_" . $shape
' example.csv
GENMD_EOF
Even though Miller's main selling point is name-indexing, sometimes you really want to refer to a field name by its positional index. Use ``$[[3]]`` to access the name of field 3 or ``$[[[3]]]`` to access the value of field 3:
GENMD_RUN_COMMAND
mlr --icsv --opprint put '$[[3]] = "NEW"' example.csv
GENMD_EOF
GENMD_RUN_COMMAND
mlr --icsv --opprint put '$[[[3]]] = "NEW"' example.csv
GENMD_EOF
You can find the full list of verbs at the [Verbs Reference](reference-verbs.md) page.
## Multiple input files
Miller takes all the files from the command line as an input stream. But it's format-aware, so it doesn't repeat CSV header lines. For example, with input files [data/a.csv](data/a.csv) and [data/b.csv](data/b.csv), the system ``cat`` command will repeat header lines:
GENMD_RUN_COMMAND
cat data/a.csv
GENMD_EOF
GENMD_RUN_COMMAND
cat data/b.csv
GENMD_EOF
GENMD_RUN_COMMAND
cat data/a.csv data/b.csv
GENMD_EOF
However, ``mlr cat`` will not:
GENMD_RUN_COMMAND
mlr --csv cat data/a.csv data/b.csv
GENMD_EOF
## Chaining verbs together
Often we want to chain queries together -- for example, sorting by a field and taking the top few values. We can do this using pipes:
GENMD_RUN_COMMAND
mlr --csv sort -nr index example.csv | mlr --icsv --opprint head -n 3
GENMD_EOF
This works fine -- but Miller also lets you chain verbs together using the word ``then``. Think of this as a Miller-internal pipe that lets you use fewer keystrokes:
GENMD_RUN_COMMAND
mlr --icsv --opprint sort -nr index then head -n 3 example.csv
GENMD_EOF
As another convenience, you can put the filename first using ``--from``. When you're interacting with your data at the command line, this makes it easier to up-arrow and append to the previous command:
GENMD_RUN_COMMAND
mlr --icsv --opprint --from example.csv sort -nr index then head -n 3
GENMD_EOF
GENMD_RUN_COMMAND
mlr --icsv --opprint --from example.csv \
sort -nr index \
then head -n 3 \
then cut -f shape,quantity
GENMD_EOF
## Sorts and stats
Now suppose you want to sort the data on a given column, *and then* take the top few in that ordering. You can use Miller's ``then`` feature to pipe commands together.
Here are the records with the top three ``index`` values:
GENMD_RUN_COMMAND
mlr --icsv --opprint sort -nr index then head -n 3 example.csv
GENMD_EOF
Lots of Miller commands take a ``-g`` option for group-by: here, ``head -n 1 -g shape`` outputs the first record for each distinct value of the ``shape`` field. Since the records have already been sorted by descending ``index`` within each shape, this finds the record with the highest ``index`` value for each distinct ``shape`` value:
GENMD_RUN_COMMAND
mlr --icsv --opprint sort -f shape -nr index then head -n 1 -g shape example.csv
GENMD_EOF
Statistics can be computed with or without group-by field(s):
GENMD_RUN_COMMAND
mlr --icsv --opprint --from example.csv \
stats1 -a count,min,mean,max -f quantity -g shape
GENMD_EOF
GENMD_RUN_COMMAND
mlr --icsv --opprint --from example.csv \
stats1 -a count,min,mean,max -f quantity -g shape,color
GENMD_EOF
If your output has a lot of columns, you can use XTAB format to line things up vertically for you instead:
GENMD_RUN_COMMAND
mlr --icsv --oxtab --from example.csv \
stats1 -a p0,p10,p25,p50,p75,p90,p99,p100 -f rate
GENMD_EOF
## File formats and format conversion
Miller supports the following formats:
* CSV (comma-separated values)
* TSV (tab-separated values)
* JSON (JavaScript Object Notation)
* PPRINT (pretty-printed tabular)
* XTAB (vertical-tabular or sideways-tabular)
* NIDX (numerically indexed, label-free, with implicit labels ``"1"``, ``"2"``, etc.)
* DKVP (delimited key-value pairs).
What's a CSV file, really? It's an array of rows, or *records*, each being a list of key-value pairs, or *fields*: for CSV it so happens that all the keys are shared in the header line and the values vary from one data line to another.
For example, if you have:
GENMD_CARDIFY
shape,flag,index
circle,1,24
square,0,36
GENMD_EOF
then that's a way of saying:
GENMD_CARDIFY
shape=circle,flag=1,index=24
shape=square,flag=0,index=36
GENMD_EOF
Other ways to write the same data:
GENMD_CARDIFY
CSV PPRINT
shape,flag,index shape flag index
circle,1,24 circle 1 24
square,0,36 square 0 36
JSON XTAB
{ shape circle
"shape": "circle", flag 1
"flag": 1, index 24
"index": 24 .
} shape square
{ flag 0
"shape": "square", index 36
"flag": 0,
"index": 36
}
DKVP
shape=circle,flag=1,index=24
shape=square,flag=0,index=36
GENMD_EOF
Anything we can do with CSV input data, we can do with any other format input data. And you can read from one format, do any record-processing, and output to the same format as the input, or to a different output format.
How to specify these to Miller:
* If you use ``--csv`` or ``--json`` or ``--pprint``, etc., then Miller will use that format for input and output.
* If you use ``--icsv`` and ``--ojson`` (note the extra ``i`` and ``o``) then Miller will use CSV for input and JSON for output, etc. See also [Keystroke Savers](keystroke-savers.md) for even shorter options like ``--c2j``.
You can read more about this at the [File Formats](file-formats.md) page.
## Choices for printing to files
Often we want to print output to the screen. Miller does this by default, as we've seen in the previous examples.
Sometimes, though, we want to print output to another file. Just use **> outputfilenamegoeshere** at the end of your command:
GENMD_CARDIFY
mlr --icsv --opprint cat example.csv > newfile.csv
# Output goes to the new file;
# nothing is printed to the screen.
GENMD_EOF
GENMD_CARDIFY
cat newfile.csv
color shape flag index quantity rate
yellow triangle true 11 43.6498 9.8870
red square true 15 79.2778 0.0130
red circle true 16 13.8103 2.9010
red square false 48 77.5542 7.4670
purple triangle false 51 81.2290 8.5910
red square false 64 77.1991 9.5310
purple triangle false 65 80.1405 5.8240
yellow circle true 73 63.9785 4.2370
yellow circle true 87 63.5058 8.3350
purple square false 91 72.3735 8.2430
GENMD_EOF
Other times we just want our files to be **changed in-place**: just use **mlr -I**:
GENMD_CARDIFY
cp example.csv newfile.txt
GENMD_EOF
GENMD_CARDIFY
cat newfile.txt
color,shape,flag,index,quantity,rate
yellow,triangle,true,11,43.6498,9.8870
red,square,true,15,79.2778,0.0130
red,circle,true,16,13.8103,2.9010
red,square,false,48,77.5542,7.4670
purple,triangle,false,51,81.2290,8.5910
red,square,false,64,77.1991,9.5310
purple,triangle,false,65,80.1405,5.8240
yellow,circle,true,73,63.9785,4.2370
yellow,circle,true,87,63.5058,8.3350
purple,square,false,91,72.3735,8.2430
GENMD_EOF
GENMD_CARDIFY
mlr -I --csv sort -f shape newfile.txt
GENMD_EOF
GENMD_CARDIFY
cat newfile.txt
color,shape,flag,index,quantity,rate
red,circle,true,16,13.8103,2.9010
yellow,circle,true,73,63.9785,4.2370
yellow,circle,true,87,63.5058,8.3350
red,square,true,15,79.2778,0.0130
red,square,false,48,77.5542,7.4670
red,square,false,64,77.1991,9.5310
purple,square,false,91,72.3735,8.2430
yellow,triangle,true,11,43.6498,9.8870
purple,triangle,false,51,81.2290,8.5910
purple,triangle,false,65,80.1405,5.8240
GENMD_EOF
Using ``mlr -I`` you can also bulk-operate on many files at once, for example:
GENMD_CARDIFY
mlr -I --csv cut -x -f unwanted_column_name *.csv
GENMD_EOF
If you like, you can first copy off your original data somewhere else, before doing in-place operations.
Lastly, using ``tee`` within ``put``, you can split your input data into separate files per one or more field names:
GENMD_RUN_COMMAND
mlr --csv --from example.csv put -q 'tee > $shape.".csv", $*'
GENMD_EOF
GENMD_RUN_COMMAND
cat circle.csv
GENMD_EOF
GENMD_RUN_COMMAND
cat square.csv
GENMD_EOF
GENMD_RUN_COMMAND
cat triangle.csv
GENMD_EOF

28
docs6b/docs/Makefile Normal file

@ -0,0 +1,28 @@
# Minimal makefile for Sphinx documentation
#
# Note: run this after make in the ../c directory and make in the ../man directory
# since ../c/mlr is used to autogenerate ../man/manpage.txt which is used in this directory.
# See also https://miller.readthedocs.io/en/latest/build.html#creating-a-new-release-for-developers
# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build
# Respective MANPATH entries would include /usr/local/share/man or $HOME/man.
INSTALLDIR=/usr/local/share/man/man1
INSTALLHOME=$(HOME)/man/man1
# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
./genmds
$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

40
docs6b/docs/README.md Normal file
View file

@ -0,0 +1,40 @@
# Miller Sphinx docs
## Why use Sphinx
* Connects to https://miller.readthedocs.io so people can get their docmods onto the web instead of the self-hosted https://johnkerl.org/miller/doc. Thanks to @pabloab for the great advice!
* More standard look and feel -- lots of people use readthedocs for other things so this should feel familiar
* We get a Search feature for free
## Contributing
* You need `pip install sphinx` (or `pip3 install sphinx`)
* The docs include lots of live code examples which will be invoked using `mlr` which must be somewhere in your `$PATH`
* Clone https://github.com/johnkerl/miller and cd into `docs/` within your clone
* Editing loop:
  * Edit `*.md.in`
  * Run `make html`
  * Either `open _build/html/index.html` (MacOS) or point your browser to `file:///path/to/your/clone/of/miller/docs/_build/html/index.html`
* Submitting:
  * `git add` your modified files, `git commit`, `git push`, and submit a PR at https://github.com/johnkerl/miller
* A nice markup reference: https://www.sphinx-doc.org/en/1.8/usage/restructuredtext/basics.html
## Notes
* CSS:
  * I used the Sphinx Classic theme which I like a lot except the colors -- it's a blue scheme and Miller has never been blue.
  * Files are in `docs/_static/*.css` where I marked my mods with `/* CHANGE ME */`.
  * If you modify the CSS you must run `make clean html` (not just `make html`) then reload in your browser.
* Live code:
  * I didn't find a way to include non-Python live-code examples within Sphinx so I adapted the pre-Sphinx Miller-doc strategy which is to have a generator script read a template file (here, `foo.md.in`), run the marked lines, and generate the output file (`foo.md`).
  * Edit the `*.md.in` files, not `*.md` directly.
  * Within the `*.md.in` files are lines like `GENMD_RUN_COMMAND`. These will be run, and their output included, by `make html` which calls the `genmds` script for you.
* readthedocs:
  * https://readthedocs.org/
  * https://readthedocs.org/projects/miller/
  * https://readthedocs.org/projects/miller/builds/
  * https://miller.readthedocs.io/en/latest/
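
The live-code strategy described under Notes can be sketched roughly as follows. This is not the actual `genmds` script: the function name `expand_template` and the four-space indentation of the generated output are assumptions for illustration. Only the `GENMD_RUN_COMMAND` and `GENMD_EOF` markers come from the templates themselves.

```shell
# Newline held in a variable so multi-line commands can be accumulated portably.
nl='
'

# Read a template on stdin: pass ordinary lines through unchanged, and for each
# GENMD_RUN_COMMAND ... GENMD_EOF block print the command followed by its output.
expand_template() {
  in_block=0
  cmd=""
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      GENMD_RUN_COMMAND)
        in_block=1
        cmd=""
        ;;
      GENMD_EOF)
        # </dev/null keeps the command from consuming the rest of the template.
        { printf '%s\n' "$cmd"; sh -c "$cmd" </dev/null 2>&1; } | sed 's/^/    /'
        in_block=0
        ;;
      *)
        if [ "$in_block" -eq 1 ]; then
          cmd="${cmd:+$cmd$nl}$line"
        else
          printf '%s\n' "$line"
        fi
        ;;
    esac
  done
}

# Usage: expand a tiny in-line template.
expand_template <<'EOF'
Some prose before the example.
GENMD_RUN_COMMAND
echo hello
GENMD_EOF
Some prose after the example.
EOF
```

Because the command text and its captured output are generated together, the examples in the built pages cannot drift from what the command really prints, which is why the `*.md.in` files are the ones to edit.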
## To do
* Let's all discuss if/how we want the v2 docs to be structured better than the v1 docs.

Binary file not shown.

BIN
docs6b/docs/_build/doctrees/faq.doctree vendored Normal file

Binary file not shown.

BIN
docs6b/docs/_build/doctrees/foo.doctree vendored Normal file

Binary file not shown.

BIN
docs6b/docs/_build/doctrees/repl.doctree vendored Normal file

Binary file not shown.

BIN
docs6b/docs/_build/doctrees/why.doctree vendored Normal file

Binary file not shown.

4
docs6b/docs/_build/html/.buildinfo vendored Normal file
View file

@ -0,0 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 4993596e7a3406ca4625604594270a0e
tags: 645f666f9bcd5a90fca523b33c5a78b7

2
docs6b/docs/_build/html/10-1.sh vendored Normal file
View file

@ -0,0 +1,2 @@
grep op=cache log.txt \
| mlr --idkvp --opprint stats1 -a mean -f hit -g type then sort -f type

4
docs6b/docs/_build/html/10-2.sh vendored Normal file
View file

@ -0,0 +1,4 @@
mlr --from log.txt --opprint \
filter 'is_present($batch_size)' \
then step -a delta -f time,num_filtered \
then sec2gmt time

588
docs6b/docs/_build/html/10min.html vendored Normal file
View file

@ -0,0 +1,588 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Miller in 10 minutes &#8212; Miller 6.0.0-alpha documentation</title>
<link rel="stylesheet" href="_static/scrolls.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="_static/print.css" type="text/css" />
<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/language_data.js"></script>
<script src="_static/theme_extras.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Keystroke-savers" href="keystroke-savers.html" />
<link rel="prev" title="Introduction" href="introduction.html" />
</head><body>
<div id="content">
<div class="header">
<h1 class="heading"><a href="index.html"
title="back to the documentation overview"><span>Miller in 10 minutes</span></a></h1>
</div>
<div class="relnav" role="navigation" aria-label="related navigation">
<a href="introduction.html">&laquo; Introduction</a> |
<a href="#">Miller in 10 minutes</a>
| <a href="keystroke-savers.html">Keystroke-savers &raquo;</a>
</div>
<div id="contentwrapper">
<div id="toc" role="navigation" aria-label="table of contents navigation">
<h3>Table of Contents</h3>
<ul>
<li><a class="reference internal" href="#">Miller in 10 minutes</a><ul>
<li><a class="reference internal" href="#obtaining-miller">Obtaining Miller</a></li>
<li><a class="reference internal" href="#miller-verbs">Miller verbs</a></li>
<li><a class="reference internal" href="#multiple-input-files">Multiple input files</a></li>
<li><a class="reference internal" href="#chaining-verbs-together">Chaining verbs together</a></li>
<li><a class="reference internal" href="#sorts-and-stats">Sorts and stats</a></li>
<li><a class="reference internal" href="#file-formats-and-format-conversion">File formats and format conversion</a></li>
<li><a class="reference internal" href="#choices-for-printing-to-files">Choices for printing to files</a></li>
</ul>
</li>
</ul>
</div>
<div role="main">
<div class="section" id="miller-in-10-minutes">
<h1>Miller in 10 minutes<a class="headerlink" href="#miller-in-10-minutes" title="Permalink to this headline"></a></h1>
<div class="section" id="obtaining-miller">
<h2>Obtaining Miller<a class="headerlink" href="#obtaining-miller" title="Permalink to this headline"></a></h2>
<p>You can install Miller for various platforms as follows:</p>
<ul class="simple">
<li><p>Linux: <code class="docutils literal notranslate"><span class="pre">yum</span> <span class="pre">install</span> <span class="pre">miller</span></code> or <code class="docutils literal notranslate"><span class="pre">apt-get</span> <span class="pre">install</span> <span class="pre">miller</span></code> depending on your flavor of Linux</p></li>
<li><p>MacOS: <code class="docutils literal notranslate"><span class="pre">brew</span> <span class="pre">install</span> <span class="pre">miller</span></code> or <code class="docutils literal notranslate"><span class="pre">port</span> <span class="pre">install</span> <span class="pre">miller</span></code> depending on your preference of <a class="reference external" href="https://brew.sh">Homebrew</a> or <a class="reference external" href="https://macports.org">MacPorts</a>.</p></li>
<li><p>Windows: <code class="docutils literal notranslate"><span class="pre">choco</span> <span class="pre">install</span> <span class="pre">miller</span></code> using <a class="reference external" href="https://chocolatey.org">Chocolatey</a>.</p></li>
<li><p>You can get the latest builds for Linux, MacOS, and Windows by visiting <a class="reference external" href="https://github.com/johnkerl/miller/actions">https://github.com/johnkerl/miller/actions</a>, selecting the latest build, and clicking <em>Artifacts</em>. (These are retained for 5 days after each commit.)</p></li>
<li><p>See also <a class="reference internal" href="build.html"><span class="doc">Building from source</span></a> if you prefer -- in particular, if your platform's package manager doesn't have the latest release.</p></li>
</ul>
<p>As a first check, you should be able to run <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">--version</span></code> at your system's command prompt and see something like the following:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --version
</span> Miller v6.0.0-dev
</pre></div>
</div>
<p>As a second check, given (<a class="reference external" href="./example.csv">example.csv</a>) you should be able to do</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --csv cat example.csv
</span> color,shape,flag,index,quantity,rate
yellow,triangle,true,11,43.6498,9.8870
red,square,true,15,79.2778,0.0130
red,circle,true,16,13.8103,2.9010
red,square,false,48,77.5542,7.4670
purple,triangle,false,51,81.2290,8.5910
red,square,false,64,77.1991,9.5310
purple,triangle,false,65,80.1405,5.8240
yellow,circle,true,73,63.9785,4.2370
yellow,circle,true,87,63.5058,8.3350
purple,square,false,91,72.3735,8.2430
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --icsv --opprint cat example.csv
</span> color shape flag index quantity rate
yellow triangle true 11 43.6498 9.8870
red square true 15 79.2778 0.0130
red circle true 16 13.8103 2.9010
red square false 48 77.5542 7.4670
purple triangle false 51 81.2290 8.5910
red square false 64 77.1991 9.5310
purple triangle false 65 80.1405 5.8240
yellow circle true 73 63.9785 4.2370
yellow circle true 87 63.5058 8.3350
purple square false 91 72.3735 8.2430
</pre></div>
</div>
<p>If you run into issues on these checks, please check out the resources on the <a class="reference internal" href="community.html"><span class="doc">Community</span></a> page for help.</p>
</div>
<div class="section" id="miller-verbs">
<h2>Miller verbs<a class="headerlink" href="#miller-verbs" title="Permalink to this headline"></a></h2>
<p>Let's take a quick look at some of the most useful Miller verbs -- file-format-aware, name-index-empowered equivalents of standard system commands.</p>
<p><code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">cat</span></code> is like system <code class="docutils literal notranslate"><span class="pre">cat</span></code> (or <code class="docutils literal notranslate"><span class="pre">type</span></code> on Windows) -- it passes the data through unmodified:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --csv cat example.csv
</span> color,shape,flag,index,quantity,rate
yellow,triangle,true,11,43.6498,9.8870
red,square,true,15,79.2778,0.0130
red,circle,true,16,13.8103,2.9010
red,square,false,48,77.5542,7.4670
purple,triangle,false,51,81.2290,8.5910
red,square,false,64,77.1991,9.5310
purple,triangle,false,65,80.1405,5.8240
yellow,circle,true,73,63.9785,4.2370
yellow,circle,true,87,63.5058,8.3350
purple,square,false,91,72.3735,8.2430
</pre></div>
</div>
<p>But <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">cat</span></code> can also do format conversion -- for example, you can pretty-print in tabular format:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --icsv --opprint cat example.csv
</span> color shape flag index quantity rate
yellow triangle true 11 43.6498 9.8870
red square true 15 79.2778 0.0130
red circle true 16 13.8103 2.9010
red square false 48 77.5542 7.4670
purple triangle false 51 81.2290 8.5910
red square false 64 77.1991 9.5310
purple triangle false 65 80.1405 5.8240
yellow circle true 73 63.9785 4.2370
yellow circle true 87 63.5058 8.3350
purple square false 91 72.3735 8.2430
</pre></div>
</div>
<p><code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">head</span></code> and <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">tail</span></code> count records rather than lines. Whether you're getting the first few records or the last few, the CSV header is included either way:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --csv head -n 4 example.csv
</span> color,shape,flag,index,quantity,rate
yellow,triangle,true,11,43.6498,9.8870
red,square,true,15,79.2778,0.0130
red,circle,true,16,13.8103,2.9010
red,square,false,48,77.5542,7.4670
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --csv tail -n 4 example.csv
</span> color,shape,flag,index,quantity,rate
purple,triangle,false,65,80.1405,5.8240
yellow,circle,true,73,63.9785,4.2370
yellow,circle,true,87,63.5058,8.3350
purple,square,false,91,72.3735,8.2430
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --icsv --ojson tail -n 2 example.csv
</span> {
&quot;color&quot;: &quot;yellow&quot;,
&quot;shape&quot;: &quot;circle&quot;,
&quot;flag&quot;: true,
&quot;index&quot;: 87,
&quot;quantity&quot;: 63.5058,
&quot;rate&quot;: 8.3350
}
{
&quot;color&quot;: &quot;purple&quot;,
&quot;shape&quot;: &quot;square&quot;,
&quot;flag&quot;: false,
&quot;index&quot;: 91,
&quot;quantity&quot;: 72.3735,
&quot;rate&quot;: 8.2430
}
</pre></div>
</div>
<p>You can sort on a single field:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --icsv --opprint sort -f shape example.csv
</span> color shape flag index quantity rate
red circle true 16 13.8103 2.9010
yellow circle true 73 63.9785 4.2370
yellow circle true 87 63.5058 8.3350
red square true 15 79.2778 0.0130
red square false 48 77.5542 7.4670
red square false 64 77.1991 9.5310
purple square false 91 72.3735 8.2430
yellow triangle true 11 43.6498 9.8870
purple triangle false 51 81.2290 8.5910
purple triangle false 65 80.1405 5.8240
</pre></div>
</div>
<p>Or, you can sort primarily alphabetically on one field, then secondarily numerically descending on another field, and so on:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --icsv --opprint sort -f shape -nr index example.csv
</span> color shape flag index quantity rate
yellow circle true 87 63.5058 8.3350
yellow circle true 73 63.9785 4.2370
red circle true 16 13.8103 2.9010
purple square false 91 72.3735 8.2430
red square false 64 77.1991 9.5310
red square false 48 77.5542 7.4670
red square true 15 79.2778 0.0130
purple triangle false 65 80.1405 5.8240
purple triangle false 51 81.2290 8.5910
yellow triangle true 11 43.6498 9.8870
</pre></div>
</div>
<p>If there are fields you don't want to see in your data, you can use <code class="docutils literal notranslate"><span class="pre">cut</span></code> to keep only the ones you want, in the same order they appeared in the input data:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --icsv --opprint cut -f flag,shape example.csv
</span> shape flag
triangle true
square true
circle true
square false
triangle false
square false
triangle false
circle true
circle true
square false
</pre></div>
</div>
<p>You can also use <code class="docutils literal notranslate"><span class="pre">cut</span> <span class="pre">-o</span></code> to keep specified fields, but in your preferred order:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --icsv --opprint cut -o -f flag,shape example.csv
</span> flag shape
true triangle
true square
true circle
false square
false triangle
false square
false triangle
true circle
true circle
false square
</pre></div>
</div>
<p>You can use <code class="docutils literal notranslate"><span class="pre">cut</span> <span class="pre">-x</span></code> to omit fields you don't care about:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --icsv --opprint cut -x -f flag,shape example.csv
</span> color index quantity rate
yellow 11 43.6498 9.8870
red 15 79.2778 0.0130
red 16 13.8103 2.9010
red 48 77.5542 7.4670
purple 51 81.2290 8.5910
red 64 77.1991 9.5310
purple 65 80.1405 5.8240
yellow 73 63.9785 4.2370
yellow 87 63.5058 8.3350
purple 91 72.3735 8.2430
</pre></div>
</div>
<p>You can use <code class="docutils literal notranslate"><span class="pre">filter</span></code> to keep only records you care about:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --icsv --opprint filter &#39;$color == &quot;red&quot;&#39; example.csv
</span> color shape flag index quantity rate
red square true 15 79.2778 0.0130
red circle true 16 13.8103 2.9010
red square false 48 77.5542 7.4670
red square false 64 77.1991 9.5310
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --icsv --opprint filter &#39;$color == &quot;red&quot; &amp;&amp; $flag == true&#39; example.csv
</span> color shape flag index quantity rate
red square true 15 79.2778 0.0130
red circle true 16 13.8103 2.9010
</pre></div>
</div>
<p>You can use <code class="docutils literal notranslate"><span class="pre">put</span></code> to create new fields which are computed from other fields:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --icsv --opprint put &#39;
</span><span class="hll"> $ratio = $quantity / $rate;
</span><span class="hll"> $color_shape = $color . &quot;_&quot; . $shape
</span><span class="hll"> &#39; example.csv
</span> color shape flag index quantity rate ratio color_shape
yellow triangle true 11 43.6498 9.8870 4.414868008496004 yellow_triangle
red square true 15 79.2778 0.0130 6098.292307692308 red_square
red circle true 16 13.8103 2.9010 4.760530851430541 red_circle
red square false 48 77.5542 7.4670 10.386259541984733 red_square
purple triangle false 51 81.2290 8.5910 9.455127458968688 purple_triangle
red square false 64 77.1991 9.5310 8.099790158430384 red_square
purple triangle false 65 80.1405 5.8240 13.760388049450551 purple_triangle
yellow circle true 73 63.9785 4.2370 15.09995279679018 yellow_circle
yellow circle true 87 63.5058 8.3350 7.619172165566886 yellow_circle
purple square false 91 72.3735 8.2430 8.779995147397793 purple_square
</pre></div>
</div>
<p>Even though Miller's main selling point is name-indexing, sometimes you really want to refer to a field name by its positional index. Use <code class="docutils literal notranslate"><span class="pre">$[[3]]</span></code> to access the name of field 3 or <code class="docutils literal notranslate"><span class="pre">$[[[3]]]</span></code> to access the value of field 3:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --icsv --opprint put &#39;$[[3]] = &quot;NEW&quot;&#39; example.csv
</span> color shape NEW index quantity rate
yellow triangle true 11 43.6498 9.8870
red square true 15 79.2778 0.0130
red circle true 16 13.8103 2.9010
red square false 48 77.5542 7.4670
purple triangle false 51 81.2290 8.5910
red square false 64 77.1991 9.5310
purple triangle false 65 80.1405 5.8240
yellow circle true 73 63.9785 4.2370
yellow circle true 87 63.5058 8.3350
purple square false 91 72.3735 8.2430
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --icsv --opprint put &#39;$[[[3]]] = &quot;NEW&quot;&#39; example.csv
</span> color shape flag index quantity rate
yellow triangle NEW 11 43.6498 9.8870
red square NEW 15 79.2778 0.0130
red circle NEW 16 13.8103 2.9010
red square NEW 48 77.5542 7.4670
purple triangle NEW 51 81.2290 8.5910
red square NEW 64 77.1991 9.5310
purple triangle NEW 65 80.1405 5.8240
yellow circle NEW 73 63.9785 4.2370
yellow circle NEW 87 63.5058 8.3350
purple square NEW 91 72.3735 8.2430
</pre></div>
</div>
<p>You can find the full list of verbs at the <a class="reference internal" href="reference-verbs.html"><span class="doc">Reference: list of verbs</span></a> page.</p>
</div>
<div class="section" id="multiple-input-files">
<h2>Multiple input files<a class="headerlink" href="#multiple-input-files" title="Permalink to this headline"></a></h2>
<p>Miller takes all the files from the command line as an input stream. But it's format-aware, so it doesn't repeat CSV header lines. For example, with input files (<a class="reference external" href="data/a.csv">data/a.csv</a>) and (<a class="reference external" href="data/b.csv">data/b.csv</a>), the system <code class="docutils literal notranslate"><span class="pre">cat</span></code> command will repeat header lines:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> cat data/a.csv
</span> a,b,c
1,2,3
4,5,6
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> cat data/b.csv
</span> a,b,c
7,8,9
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> cat data/a.csv data/b.csv
</span> a,b,c
1,2,3
4,5,6
a,b,c
7,8,9
</pre></div>
</div>
<p>However, <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">cat</span></code> will not:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --csv cat data/a.csv data/b.csv
</span> a,b,c
1,2,3
4,5,6
7,8,9
</pre></div>
</div>
</div>
<div class="section" id="chaining-verbs-together">
<h2>Chaining verbs together<a class="headerlink" href="#chaining-verbs-together" title="Permalink to this headline"></a></h2>
<p>Often we want to chain queries together -- for example, sorting by a field and taking the top few values. We can do this using pipes:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --csv sort -nr index example.csv | mlr --icsv --opprint head -n 3
</span> color shape flag index quantity rate
purple square false 91 72.3735 8.2430
yellow circle true 87 63.5058 8.3350
yellow circle true 73 63.9785 4.2370
</pre></div>
</div>
<p>This works fine -- but Miller also lets you chain verbs together using the word <code class="docutils literal notranslate"><span class="pre">then</span></code>. Think of this as a Miller-internal pipe that lets you use fewer keystrokes:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --icsv --opprint sort -nr index then head -n 3 example.csv
</span> color shape flag index quantity rate
purple square false 91 72.3735 8.2430
yellow circle true 87 63.5058 8.3350
yellow circle true 73 63.9785 4.2370
</pre></div>
</div>
<p>As another convenience, you can put the filename first using <code class="docutils literal notranslate"><span class="pre">--from</span></code>. When you're interacting with your data at the command line, this makes it easier to up-arrow and append to the previous command:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --icsv --opprint --from example.csv sort -nr index then head -n 3
</span> color shape flag index quantity rate
purple square false 91 72.3735 8.2430
yellow circle true 87 63.5058 8.3350
yellow circle true 73 63.9785 4.2370
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --icsv --opprint --from example.csv \
</span><span class="hll"> sort -nr index \
</span><span class="hll"> then head -n 3 \
</span><span class="hll"> then cut -f shape,quantity
</span> shape quantity
square 72.3735
circle 63.5058
circle 63.9785
</pre></div>
</div>
</div>
<div class="section" id="sorts-and-stats">
<h2>Sorts and stats<a class="headerlink" href="#sorts-and-stats" title="Permalink to this headline"></a></h2>
<p>Now suppose you want to sort the data on a given column, <em>and then</em> take the top few in that ordering. You can use Miller's <code class="docutils literal notranslate"><span class="pre">then</span></code> feature to pipe commands together.</p>
<p>Here are the records with the top three <code class="docutils literal notranslate"><span class="pre">index</span></code> values:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --icsv --opprint sort -nr index then head -n 3 example.csv
</span> color shape flag index quantity rate
purple square false 91 72.3735 8.2430
yellow circle true 87 63.5058 8.3350
yellow circle true 73 63.9785 4.2370
</pre></div>
</div>
<p>Lots of Miller commands take a <code class="docutils literal notranslate"><span class="pre">-g</span></code> option for group-by: here, <code class="docutils literal notranslate"><span class="pre">head</span> <span class="pre">-n</span> <span class="pre">1</span> <span class="pre">-g</span> <span class="pre">shape</span></code> outputs the first record for each distinct value of the <code class="docutils literal notranslate"><span class="pre">shape</span></code> field. Since the records are already sorted by descending <code class="docutils literal notranslate"><span class="pre">index</span></code> within each shape, this means we're finding the record with the highest <code class="docutils literal notranslate"><span class="pre">index</span></code> field for each distinct <code class="docutils literal notranslate"><span class="pre">shape</span></code> field:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --icsv --opprint sort -f shape -nr index then head -n 1 -g shape example.csv
</span> color shape flag index quantity rate
yellow circle true 87 63.5058 8.3350
purple square false 91 72.3735 8.2430
purple triangle false 65 80.1405 5.8240
</pre></div>
</div>
<p>Statistics can be computed with or without group-by field(s):</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --icsv --opprint --from example.csv \
</span><span class="hll"> stats1 -a count,min,mean,max -f quantity -g shape
</span> shape quantity_count quantity_min quantity_mean quantity_max
triangle 3 43.6498 68.33976666666666 81.229
square 4 72.3735 76.60114999999999 79.2778
circle 3 13.8103 47.0982 63.9785
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --icsv --opprint --from example.csv \
</span><span class="hll"> stats1 -a count,min,mean,max -f quantity -g shape,color
</span> shape color quantity_count quantity_min quantity_mean quantity_max
triangle yellow 1 43.6498 43.6498 43.6498
square red 3 77.1991 78.01036666666666 79.2778
circle red 1 13.8103 13.8103 13.8103
triangle purple 2 80.1405 80.68475000000001 81.229
circle yellow 2 63.5058 63.742149999999995 63.9785
square purple 1 72.3735 72.3735 72.3735
</pre></div>
</div>
<p>If your output has a lot of columns, you can use XTAB format to line things up vertically for you instead:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --icsv --oxtab --from example.csv \
</span><span class="hll"> stats1 -a p0,p10,p25,p50,p75,p90,p99,p100 -f rate
</span> rate_p0 0.0130
rate_p10 2.9010
rate_p25 4.2370
rate_p50 8.2430
rate_p75 8.5910
rate_p90 9.8870
rate_p99 9.8870
rate_p100 9.8870
</pre></div>
</div>
</div>
<div class="section" id="file-formats-and-format-conversion">
<h2>File formats and format conversion<a class="headerlink" href="#file-formats-and-format-conversion" title="Permalink to this headline"></a></h2>
<p>Miller supports the following formats:</p>
<ul class="simple">
<li><p>CSV (comma-separated values)</p></li>
<li><p>TSV (tab-separated values)</p></li>
<li><p>JSON (JavaScript Object Notation)</p></li>
<li><p>PPRINT (pretty-printed tabular)</p></li>
<li><p>XTAB (vertical-tabular or sideways-tabular)</p></li>
<li><p>NIDX (numerically indexed, label-free, with implicit labels <code class="docutils literal notranslate"><span class="pre">&quot;1&quot;</span></code>, <code class="docutils literal notranslate"><span class="pre">&quot;2&quot;</span></code>, etc.)</p></li>
<li><p>DKVP (delimited key-value pairs).</p></li>
</ul>
<p>What's a CSV file, really? It's an array of rows, or <em>records</em>, each being a list of key-value pairs, or <em>fields</em>: for CSV it so happens that all the keys are shared in the header line and the values vary from one data line to another.</p>
<p>For example, if you have:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>shape,flag,index
circle,1,24
square,0,36
</pre></div>
</div>
<p>then that's a way of saying:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>shape=circle,flag=1,index=24
shape=square,flag=0,index=36
</pre></div>
</div>
<p>Other ways to write the same data:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>CSV PPRINT
shape,flag,index shape flag index
circle,1,24 circle 1 24
square,0,36 square 0 36
JSON XTAB
{ shape circle
&quot;shape&quot;: &quot;circle&quot;, flag 1
&quot;flag&quot;: 1, index 24
&quot;index&quot;: 24 .
} shape square
{ flag 0
&quot;shape&quot;: &quot;square&quot;, index 36
&quot;flag&quot;: 0,
&quot;index&quot;: 36
}
DKVP
shape=circle,flag=1,index=24
shape=square,flag=0,index=36
</pre></div>
</div>
<p>Anything we can do with CSV input data, we can do with any other format input data. And you can read from one format, do any record-processing, and output to the same format as the input, or to a different output format.</p>
<p>How to specify these to Miller:</p>
<ul class="simple">
<li><p>If you use <code class="docutils literal notranslate"><span class="pre">--csv</span></code> or <code class="docutils literal notranslate"><span class="pre">--json</span></code> or <code class="docutils literal notranslate"><span class="pre">--pprint</span></code>, etc., then Miller will use that format for input and output.</p></li>
<li><p>If you use <code class="docutils literal notranslate"><span class="pre">--icsv</span></code> and <code class="docutils literal notranslate"><span class="pre">--ojson</span></code> (note the extra <code class="docutils literal notranslate"><span class="pre">i</span></code> and <code class="docutils literal notranslate"><span class="pre">o</span></code>) then Miller will use CSV for input and JSON for output, etc. See also <a class="reference internal" href="keystroke-savers.html"><span class="doc">Keystroke-savers</span></a> for even shorter options like <code class="docutils literal notranslate"><span class="pre">--c2j</span></code>.</p></li>
</ul>
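<p>As a minimal sketch of the two styles of format flags, the commands below round-trip the two records shown above through DKVP, CSV, and JSON. The file names (<code class="docutils literal notranslate"><span class="pre">shapes.dkvp</span></code>, <code class="docutils literal notranslate"><span class="pre">shapes.csv</span></code>, and so on) are made up for this illustration and are not part of the Miller examples directory.</p>

```shell
# Build a small DKVP input file holding the same two records as above.
# (shapes.dkvp is a hypothetical file name used only for this sketch.)
cat > shapes.dkvp <<'EOF'
shape=circle,flag=1,index=24
shape=square,flag=0,index=36
EOF

# Separate input and output flags: DKVP in, CSV out.
mlr --idkvp --ocsv cat shapes.dkvp > shapes.csv

# A single flag covers both directions: CSV in, CSV out.
# Here the records are sorted by descending numeric index along the way.
mlr --csv sort -nr index shapes.csv > shapes-sorted.csv

# Separate flags again: CSV in, JSON out.
mlr --icsv --ojson cat shapes.csv > shapes.json

cat shapes.csv
```

<p>The record-processing part of each command (<code class="docutils literal notranslate"><span class="pre">cat</span></code>, <code class="docutils literal notranslate"><span class="pre">sort</span></code>) stays the same whichever format flags are chosen; only the reading and writing change.</p>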
<p>You can read more about this at the <a class="reference internal" href="file-formats.html"><span class="doc">File formats</span></a> page.</p>
</div>
<div class="section" id="choices-for-printing-to-files">
<span id="min-choices-for-printing-to-files"></span><h2>Choices for printing to files<a class="headerlink" href="#choices-for-printing-to-files" title="Permalink to this headline"></a></h2>
<p>Often we want to print output to the screen. Miller does this by default, as we've seen in the previous examples.</p>
<p>Sometimes, though, we want to print output to another file. Just use <strong>&gt; outputfilenamegoeshere</strong> at the end of your command:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --icsv --opprint cat example.csv &gt; newfile.csv
</span> # Output goes to the new file;
# nothing is printed to the screen.
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> cat newfile.csv
</span> color shape flag index quantity rate
yellow triangle true 11 43.6498 9.8870
red square true 15 79.2778 0.0130
red circle true 16 13.8103 2.9010
red square false 48 77.5542 7.4670
purple triangle false 51 81.2290 8.5910
red square false 64 77.1991 9.5310
purple triangle false 65 80.1405 5.8240
yellow circle true 73 63.9785 4.2370
yellow circle true 87 63.5058 8.3350
purple square false 91 72.3735 8.2430
</pre></div>
</div>
<p>Other times we just want our files to be <strong>changed in-place</strong>. For that, use <strong>mlr -I</strong>:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> cp example.csv newfile.txt
</span></pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> cat newfile.txt
</span> color,shape,flag,index,quantity,rate
yellow,triangle,true,11,43.6498,9.8870
red,square,true,15,79.2778,0.0130
red,circle,true,16,13.8103,2.9010
red,square,false,48,77.5542,7.4670
purple,triangle,false,51,81.2290,8.5910
red,square,false,64,77.1991,9.5310
purple,triangle,false,65,80.1405,5.8240
yellow,circle,true,73,63.9785,4.2370
yellow,circle,true,87,63.5058,8.3350
purple,square,false,91,72.3735,8.2430
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr -I --csv sort -f shape newfile.txt
</span></pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> cat newfile.txt
</span> color,shape,flag,index,quantity,rate
red,circle,true,16,13.8103,2.9010
yellow,circle,true,73,63.9785,4.2370
yellow,circle,true,87,63.5058,8.3350
red,square,true,15,79.2778,0.0130
red,square,false,48,77.5542,7.4670
red,square,false,64,77.1991,9.5310
purple,square,false,91,72.3735,8.2430
yellow,triangle,true,11,43.6498,9.8870
purple,triangle,false,51,81.2290,8.5910
purple,triangle,false,65,80.1405,5.8240
</pre></div>
</div>
<p>Also, using <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">-I</span></code>, you can bulk-operate on many files at once. For example:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr -I --csv cut -x -f unwanted_column_name *.csv
</span></pre></div>
</div>
<p>Since in-place mode overwrites the files it is given, you may want to first copy your original data somewhere else before doing in-place operations.</p>
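<p>One way to do that is sketched below. The file names, the sample contents, and the <code class="docutils literal notranslate"><span class="pre">backup</span></code> directory are all hypothetical and chosen only for this illustration.</p>

```shell
# Two small CSV files to operate on (hypothetical names and contents).
printf 'id,name,unwanted_column_name\n1,alice,x\n2,bob,y\n' > part1.csv
printf 'id,name,unwanted_column_name\n3,carol,z\n' > part2.csv

# Copy the originals into a backup directory before touching them.
mkdir -p backup
cp part1.csv part2.csv backup/

# Drop the column from both files in place.
mlr -I --csv cut -x -f unwanted_column_name part1.csv part2.csv

# The modified file and its untouched original can now be compared.
cat part1.csv
cat backup/part1.csv
```

<p>Naming the files explicitly, rather than using <code class="docutils literal notranslate"><span class="pre">*.csv</span></code>, keeps the in-place edit away from any other CSV files that happen to be in the same directory.</p>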
<p>Lastly, using <code class="docutils literal notranslate"><span class="pre">tee</span></code> within <code class="docutils literal notranslate"><span class="pre">put</span></code>, you can split your input data into separate files, one per distinct value of one or more fields:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --csv --from example.csv put -q &#39;tee &gt; $shape.&quot;.csv&quot;, $*&#39;
</span></pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> cat circle.csv
</span> color,shape,flag,index,quantity,rate
red,circle,true,16,13.8103,2.9010
yellow,circle,true,73,63.9785,4.2370
yellow,circle,true,87,63.5058,8.3350
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> cat square.csv
</span> color,shape,flag,index,quantity,rate
red,square,true,15,79.2778,0.0130
red,square,false,48,77.5542,7.4670
red,square,false,64,77.1991,9.5310
purple,square,false,91,72.3735,8.2430
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> cat triangle.csv
</span> color,shape,flag,index,quantity,rate
yellow,triangle,true,11,43.6498,9.8870
purple,triangle,false,51,81.2290,8.5910
purple,triangle,false,65,80.1405,5.8240
</pre></div>
</div>
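<p>The example above splits on a single field. A sketch of splitting on two fields at once follows, assuming a small made-up input file (<code class="docutils literal notranslate"><span class="pre">shapes-colors.csv</span></code>, not part of the Miller examples) and a hyphen as an arbitrary separator in the output file names.</p>

```shell
# A small hypothetical input file for this sketch.
cat > shapes-colors.csv <<'EOF'
color,shape,index
red,circle,16
yellow,circle,73
red,square,15
red,circle,99
EOF

# Build each output file name from both $shape and $color, so every
# distinct (shape, color) pair gets its own CSV file.
# put -q suppresses the usual record output to the screen.
mlr --csv --from shapes-colors.csv put -q 'tee > $shape."-".$color.".csv", $*'

# Both red circles land in the same file.
cat circle-red.csv
```

<p>With this input, three files are written: <code class="docutils literal notranslate"><span class="pre">circle-red.csv</span></code>, <code class="docutils literal notranslate"><span class="pre">circle-yellow.csv</span></code>, and <code class="docutils literal notranslate"><span class="pre">square-red.csv</span></code>.</p>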
</div>
</div>
</div>
</div>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2021, John Kerl.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 3.2.1.
</div>
</body>
</html>
