Miller in 10 minutes
====================

CSV-file examples
^^^^^^^^^^^^^^^^^

Suppose you have this CSV data file::

POKI_RUN_COMMAND{{cat example.csv}}HERE

``mlr cat`` is like cat -- it passes the data through unmodified::

POKI_RUN_COMMAND{{mlr --csv cat example.csv}}HERE

but it can also do format conversion (here, you can pretty-print in tabular format)::

POKI_RUN_COMMAND{{mlr --icsv --opprint cat example.csv}}HERE
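
As a rough illustration of what pretty-printing involves (a Python sketch, not Miller's implementation), each column can be padded to the width of its longest entry. The helper name ``pprint_table`` and the two sample records are made up for this example:

```python
def pprint_table(records):
    """Render a list of dicts as left-aligned, space-padded columns."""
    names = list(records[0])
    # Each column is as wide as its longest cell, header included.
    widths = {
        n: max([len(n)] + [len(str(r[n])) for r in records]) for n in names
    }
    rows = [names] + [[str(r[n]) for n in names] for r in records]
    lines = [
        " ".join(cell.ljust(widths[n]) for cell, n in zip(row, names)).rstrip()
        for row in rows
    ]
    return "\n".join(lines)


records = [
    {"shape": "circle", "flag": 1, "index": 24},
    {"shape": "square", "flag": 0, "index": 36},
]
table = pprint_table(records)
print(table)
```

The point is only that the header and the data share one set of column widths, which is why the output lines up.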

``mlr head`` and ``mlr tail`` count records rather than lines. Whether you're getting the first few records or the last few, the CSV header is included either way::

POKI_RUN_COMMAND{{mlr --csv head -n 4 example.csv}}HERE

::

POKI_RUN_COMMAND{{mlr --csv tail -n 4 example.csv}}HERE
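
To see why the header survives either way, it may help to think of the header as part of every record rather than as line 1 of the file. The Python sketch below models that idea with the standard ``csv`` module; the function names are hypothetical and the data is the first few lines of ``example.csv``, inlined:

```python
import csv
import io

# First few lines of example.csv, inlined so the sketch is self-contained.
CSV_TEXT = """color,shape,flag,index,quantity,rate
yellow,triangle,1,11,43.6498,9.8870
red,square,1,15,79.2778,0.0130
red,circle,1,16,13.8103,2.9010
red,square,0,48,77.5542,7.4670
purple,triangle,0,51,81.2290,8.5910
"""


def read_records(text):
    """Parse CSV text into a list of dicts, one per data line."""
    return list(csv.DictReader(io.StringIO(text)))


def write_csv(records):
    """Serialize records to CSV text; the header comes from the record keys."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


records = read_records(CSV_TEXT)
head_text = write_csv(records[:2])   # first two records
tail_text = write_csv(records[-2:])  # last two records
print(head_text)
print(tail_text)
```

Slicing happens on the list of records, so the header line is never one of the things being counted.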

You can sort primarily alphabetically on one field, then secondarily numerically descending on another field::

POKI_RUN_COMMAND{{mlr --icsv --opprint sort -f shape -nr index example.csv}}HERE
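
The same two-level ordering can be sketched in Python with a compound sort key. This is only a model of the ordering, using a few ``shape``/``index`` pairs taken from ``example.csv``:

```python
# A handful of (shape, index) pairs from example.csv, as records.
records = [
    {"shape": "triangle", "index": 11},
    {"shape": "square", "index": 15},
    {"shape": "circle", "index": 16},
    {"shape": "square", "index": 48},
    {"shape": "circle", "index": 73},
]

# Primary key: shape, ascending and lexical.
# Secondary key: index, numeric and descending (hence the negation).
ordered = sorted(records, key=lambda r: (r["shape"], -r["index"]))

for r in ordered:
    print(r["shape"], r["index"])
```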

You can use ``cut`` to retain only specified fields, in the same order they appeared in the input data::

POKI_RUN_COMMAND{{mlr --icsv --opprint cut -f flag,shape example.csv}}HERE

You can also use ``cut -o`` to retain only specified fields in your preferred order::

POKI_RUN_COMMAND{{mlr --icsv --opprint cut -o -f flag,shape example.csv}}HERE

You can use ``cut -x`` to omit fields you don't care about::

POKI_RUN_COMMAND{{mlr --icsv --opprint cut -x -f flag,shape example.csv}}HERE
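
The difference between the three variants comes down to which side drives the iteration: the record's own field order, or the list of names you passed. A Python sketch of that distinction, with hypothetical helper names, might look like this:

```python
record = {"color": "yellow", "shape": "triangle", "flag": "1", "index": "11"}


def cut(rec, names):
    """Keep the named fields, in the record's original order (like cut -f)."""
    return {k: v for k, v in rec.items() if k in names}


def cut_ordered(rec, names):
    """Keep the named fields, in the order requested (like cut -o -f)."""
    return {k: rec[k] for k in names if k in rec}


def cut_complement(rec, names):
    """Drop the named fields and keep the rest (like cut -x -f)."""
    return {k: v for k, v in rec.items() if k not in names}


wanted = ["flag", "shape"]
kept = cut(record, wanted)
kept_in_order = cut_ordered(record, wanted)
rest = cut_complement(record, wanted)
print(list(kept), list(kept_in_order), list(rest))
```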

You can use ``filter`` to keep only records you care about::

POKI_RUN_COMMAND{{mlr --icsv --opprint filter '$color == "red"' example.csv}}HERE

::

POKI_RUN_COMMAND{{mlr --icsv --opprint filter '$color == "red" && $flag == 1' example.csv}}HERE
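
Conceptually, a filter expression is a predicate evaluated once per record, and only records for which it is true are passed along. In Python terms (a sketch with a few sample records, where ``flag`` is already stored as an integer):

```python
records = [
    {"color": "yellow", "shape": "triangle", "flag": 1},
    {"color": "red", "shape": "square", "flag": 1},
    {"color": "red", "shape": "square", "flag": 0},
    {"color": "purple", "shape": "triangle", "flag": 0},
]

# One predicate per filter expression; records failing it are dropped.
red = [r for r in records if r["color"] == "red"]
red_flagged = [r for r in records if r["color"] == "red" and r["flag"] == 1]

print(len(red), len(red_flagged))
```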

You can use ``put`` to create new fields which are computed from other fields::

POKI_RUN_COMMAND{{mlr --icsv --opprint put '$ratio = $quantity / $rate; $color_shape = $color . "_" . $shape' example.csv}}HERE
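
A Python analogue of those two assignments, applied to one record from ``example.csv``, is sketched below. The function name ``add_fields`` is invented for the example; the new fields are appended after the existing ones:

```python
record = {"color": "red", "shape": "circle", "quantity": 13.8103, "rate": 2.9010}


def add_fields(rec):
    """Return a copy of the record with two computed fields appended."""
    out = dict(rec)
    out["ratio"] = out["quantity"] / out["rate"]           # numeric division
    out["color_shape"] = out["color"] + "_" + out["shape"]  # string concatenation
    return out


result = add_fields(record)
print(result)
```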

Even though Miller's main selling point is name-indexing, sometimes you really want to refer to a field name by its positional index. Use ``$[[3]]`` to access the name of field 3 or ``$[[[3]]]`` to access the value of field 3::

POKI_RUN_COMMAND{{mlr --icsv --opprint put '$[[3]] = "NEW"' example.csv}}HERE

::

POKI_RUN_COMMAND{{mlr --icsv --opprint put '$[[[3]]] = "NEW"' example.csv}}HERE
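
The two assignments differ in which half of the key-value pair they touch. The Python sketch below models a record as an ordered list of pairs; both helper names are hypothetical, positions are 1-based as in the text, and the sketch assumes the new name does not collide with an existing field:

```python
record = {"color": "yellow", "shape": "triangle", "flag": "1", "index": "11"}


def rename_field_at(rec, position, new_name):
    """Replace the NAME of the field at a 1-based position; keep its value."""
    items = list(rec.items())
    _, value = items[position - 1]
    items[position - 1] = (new_name, value)
    return dict(items)


def set_value_at(rec, position, new_value):
    """Replace the VALUE of the field at a 1-based position; keep its name."""
    items = list(rec.items())
    name, _ = items[position - 1]
    items[position - 1] = (name, new_value)
    return dict(items)


renamed = rename_field_at(record, 3, "NEW")
revalued = set_value_at(record, 3, "NEW")
print(renamed)
print(revalued)
```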

JSON-file examples
^^^^^^^^^^^^^^^^^^

OK, CSV and pretty-print are fine. But Miller can also convert between a few other formats -- let's take a look at JSON output::

POKI_RUN_COMMAND{{mlr --icsv --ojson put '$ratio = $quantity/$rate; $shape = toupper($shape)' example.csv}}HERE

Sorts and stats
^^^^^^^^^^^^^^^

Now suppose you want to sort the data on a given column, *and then* take the top few in that ordering. You can use Miller's ``then`` feature to pipe commands together.

Here are the first three records after sorting by ``shape`` and then by descending ``index``::

POKI_RUN_COMMAND{{mlr --icsv --opprint sort -f shape -nr index then head -n 3 example.csv}}HERE
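
One way to picture ``then`` is as function composition over a stream of records: each step receives the previous step's output. The Python sketch below models that with hypothetical helpers (``chain``, ``head``) and the ``shape``/``index`` pairs from ``example.csv``:

```python
from functools import reduce

# (shape, index) for all ten records of example.csv, in file order.
records = [
    {"shape": s, "index": i}
    for s, i in [
        ("triangle", 11), ("square", 15), ("circle", 16), ("square", 48),
        ("triangle", 51), ("square", 64), ("triangle", 65), ("circle", 73),
        ("circle", 87), ("square", 91),
    ]
]


def sort_shape_then_index_desc(recs):
    """Order by shape ascending, then by index descending."""
    return sorted(recs, key=lambda r: (r["shape"], -r["index"]))


def head(n):
    """Build a step that keeps the first n records it is given."""
    return lambda recs: recs[:n]


def chain(*steps):
    """Feed the output of each step into the next, left to right."""
    return lambda recs: reduce(lambda acc, step: step(acc), steps, recs)


pipeline = chain(sort_shape_then_index_desc, head(3))
top = pipeline(records)
print([(r["shape"], r["index"]) for r in top])
```

Because ``shape`` is the primary sort key, the three records kept here are all circles; to get the three largest ``index`` values overall, the sort step would have to order by ``index`` alone.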

Lots of Miller commands take a ``-g`` option for group-by: here, ``head -n 1 -g shape`` outputs the first record for each distinct value of the ``shape`` field. Because the records are already sorted by descending ``index`` within each shape, this finds the record with the highest ``index`` value for each distinct ``shape`` value::

POKI_RUN_COMMAND{{mlr --icsv --opprint sort -f shape -nr index then head -n 1 -g shape example.csv}}HERE
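
A grouped ``head`` can be modelled as a per-group counter: a record is kept only while its group has not yet reached the limit. The Python sketch below (the name ``head_grouped`` is an assumption for illustration) starts from records already sorted by ``shape`` and descending ``index``:

```python
# Records as they would look after sorting by shape, then descending index.
sorted_records = [
    {"shape": s, "index": i}
    for s, i in [
        ("circle", 87), ("circle", 73), ("circle", 16),
        ("square", 91), ("square", 64), ("square", 48), ("square", 15),
        ("triangle", 65), ("triangle", 51), ("triangle", 11),
    ]
]


def head_grouped(recs, n, group_field):
    """Keep the first n records seen for each distinct value of group_field."""
    seen = {}
    kept = []
    for r in recs:
        key = r[group_field]
        seen[key] = seen.get(key, 0) + 1
        if seen[key] <= n:
            kept.append(r)
    return kept


firsts = head_grouped(sorted_records, 1, "shape")
print([(r["shape"], r["index"]) for r in firsts])
```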

Statistics can be computed with or without group-by field(s)::

POKI_RUN_COMMAND{{mlr --icsv --opprint --from example.csv stats1 -a count,min,mean,max -f quantity -g shape}}HERE

::

POKI_RUN_COMMAND{{mlr --icsv --opprint --from example.csv stats1 -a count,min,mean,max -f quantity -g shape,color}}HERE
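
Grouping by one or several fields amounts to bucketing values under a key built from those fields and then summarising each bucket. A Python sketch of that idea, using the ``color``, ``shape`` and ``quantity`` columns of ``example.csv`` and a hypothetical ``stats_by_group`` helper:

```python
from collections import defaultdict

records = [
    {"color": c, "shape": s, "quantity": q}
    for c, s, q in [
        ("yellow", "triangle", 43.6498), ("red", "square", 79.2778),
        ("red", "circle", 13.8103), ("red", "square", 77.5542),
        ("purple", "triangle", 81.2290), ("red", "square", 77.1991),
        ("purple", "triangle", 80.1405), ("yellow", "circle", 63.9785),
        ("yellow", "circle", 63.5058), ("purple", "square", 72.3735),
    ]
]


def stats_by_group(recs, value_field, group_fields):
    """Compute count/min/mean/max of value_field per tuple of group values."""
    buckets = defaultdict(list)
    for r in recs:
        key = tuple(r[g] for g in group_fields)
        buckets[key].append(r[value_field])
    return {
        key: {
            "count": len(vals),
            "min": min(vals),
            "mean": sum(vals) / len(vals),
            "max": max(vals),
        }
        for key, vals in buckets.items()
    }


by_shape = stats_by_group(records, "quantity", ["shape"])
by_shape_color = stats_by_group(records, "quantity", ["shape", "color"])
print(by_shape[("circle",)])
print(by_shape_color[("circle", "yellow")])
```

Adding a second group-by field simply makes the key longer, so the buckets get smaller and more numerous.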

If your output has a lot of columns, you can use XTAB format to line things up vertically for you instead::

POKI_RUN_COMMAND{{mlr --icsv --oxtab --from example.csv stats1 -a p0,p10,p25,p50,p75,p90,p99,p100 -f rate}}HERE

.. _10min-choices-for-printing-to-files:

Choices for printing to files
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Often we want to print output to the screen. Miller does this by default, as we've seen in the previous examples.

Sometimes we want to print output to another file: just use **> outputfilenamegoeshere** at the end of your command:

::

    % mlr --icsv --opprint cat example.csv > newfile.csv
    # Output goes to the new file;
    # nothing is printed to the screen.

::

    % cat newfile.csv
    color  shape    flag index quantity rate
    yellow triangle 1    11    43.6498  9.8870
    red    square   1    15    79.2778  0.0130
    red    circle   1    16    13.8103  2.9010
    red    square   0    48    77.5542  7.4670
    purple triangle 0    51    81.2290  8.5910
    red    square   0    64    77.1991  9.5310
    purple triangle 0    65    80.1405  5.8240
    yellow circle   1    73    63.9785  4.2370
    yellow circle   1    87    63.5058  8.3350
    purple square   0    91    72.3735  8.2430

Other times we just want our files to be **changed in-place**: just use **mlr -I**::

    % cp example.csv newfile.txt

    % cat newfile.txt
    color,shape,flag,index,quantity,rate
    yellow,triangle,1,11,43.6498,9.8870
    red,square,1,15,79.2778,0.0130
    red,circle,1,16,13.8103,2.9010
    red,square,0,48,77.5542,7.4670
    purple,triangle,0,51,81.2290,8.5910
    red,square,0,64,77.1991,9.5310
    purple,triangle,0,65,80.1405,5.8240
    yellow,circle,1,73,63.9785,4.2370
    yellow,circle,1,87,63.5058,8.3350
    purple,square,0,91,72.3735,8.2430

    % mlr -I --icsv --opprint cat newfile.txt

    % cat newfile.txt
    color  shape    flag index quantity rate
    yellow triangle 1    11    43.6498  9.8870
    red    square   1    15    79.2778  0.0130
    red    circle   1    16    13.8103  2.9010
    red    square   0    48    77.5542  7.4670
    purple triangle 0    51    81.2290  8.5910
    red    square   0    64    77.1991  9.5310
    purple triangle 0    65    80.1405  5.8240
    yellow circle   1    73    63.9785  4.2370
    yellow circle   1    87    63.5058  8.3350
    purple square   0    91    72.3735  8.2430

Also using ``mlr -I`` you can bulk-operate on lots of files: e.g.::

    mlr -I --csv cut -x -f unwanted_column_name *.csv

Because ``mlr -I`` overwrites its input files, you may want to copy your original data somewhere else before doing in-place operations.

Lastly, using ``tee`` within ``put``, you can split your input data into separate files, one per distinct value of one or more fields::

POKI_RUN_COMMAND{{mlr --csv --from example.csv put -q 'tee > $shape.".csv", $*'}}HERE

::

POKI_RUN_COMMAND{{cat circle.csv}}HERE

::

POKI_RUN_COMMAND{{cat square.csv}}HERE

::

POKI_RUN_COMMAND{{cat triangle.csv}}HERE
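
The splitting logic can be sketched in Python as "route each record to an output named after one of its field values, writing a header the first time that output is used". To stay side-effect free, the sketch below collects each output as text in memory instead of writing files; ``split_by_field`` is a hypothetical name and the records are a reduced version of ``example.csv``:

```python
import csv
import io

records = [
    {"color": "yellow", "shape": "triangle", "index": "11"},
    {"color": "red", "shape": "square", "index": "15"},
    {"color": "red", "shape": "circle", "index": "16"},
    {"color": "red", "shape": "square", "index": "48"},
    {"color": "yellow", "shape": "circle", "index": "73"},
]


def split_by_field(recs, field):
    """Map '<value>.csv' to CSV text holding the records with that value."""
    buffers = {}
    writers = {}
    for r in recs:
        name = f"{r[field]}.csv"
        if name not in writers:
            # First record for this output: open it and write the header once.
            buffers[name] = io.StringIO()
            writers[name] = csv.DictWriter(
                buffers[name], fieldnames=list(r), lineterminator="\n"
            )
            writers[name].writeheader()
        writers[name].writerow(r)
    return {name: buf.getvalue() for name, buf in buffers.items()}


files = split_by_field(records, "shape")
for name in sorted(files):
    print(name)
    print(files[name])
```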

Other-format examples
^^^^^^^^^^^^^^^^^^^^^

What's a CSV file, really? It's an array of rows, or *records*, each being a list of key-value pairs, or *fields*: for CSV it so happens that all the keys are shared in the header line and the values vary data line by data line.

For example, if you have::

    shape,flag,index
    circle,1,24
    square,0,36

then that's a way of saying::

    shape=circle,flag=1,index=24
    shape=square,flag=0,index=36

Data written this way are called **DKVP**, for *delimited key-value pairs*.
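
The correspondence between the two notations is mechanical, as this Python sketch suggests. It assumes the default separators shown above (``,`` between fields, ``=`` between key and value) and does not handle values that themselves contain those characters; the function names are made up for the example:

```python
import csv
import io

CSV_TEXT = "shape,flag,index\ncircle,1,24\nsquare,0,36\n"


def to_dkvp(rec):
    """Write one record as comma-delimited key=value pairs."""
    return ",".join(f"{k}={v}" for k, v in rec.items())


def from_dkvp(line):
    """Parse one DKVP line back into a record (dict of strings)."""
    return dict(pair.split("=", 1) for pair in line.split(","))


# Every CSV data line becomes a record whose keys come from the header line.
records = list(csv.DictReader(io.StringIO(CSV_TEXT)))
dkvp_lines = [to_dkvp(r) for r in records]
print("\n".join(dkvp_lines))
```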

We've also already seen other ways to write the same data::

    CSV                               PPRINT                 JSON
    shape,flag,index                  shape  flag index      [
    circle,1,24                       circle 1    24           {
    square,0,36                       square 0    36             "shape": "circle",
                                                                 "flag": 1,
                                                                 "index": 24
                                                               },
    DKVP                              XTAB                     {
    shape=circle,flag=1,index=24      shape circle               "shape": "square",
    shape=square,flag=0,index=36      flag  1                    "flag": 0,
                                      index 24                   "index": 36
                                                               }
                                      shape square           ]
                                      flag  0
                                      index 36

Anything we can do with CSV input data, we can do with any other format input data. And you can read from one format, do any record-processing, and output to the same format as the input, or to a different output format.
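
This works because reading, processing and writing are separate stages that only share the record model. A Python sketch of that separation follows: one reader, and two interchangeable writers. All function names are hypothetical, and the simple ``infer_type`` helper is an assumption made so that numbers come out unquoted in JSON:

```python
import csv
import io
import json

CSV_TEXT = "shape,flag,index\ncircle,1,24\nsquare,0,36\n"


def infer_type(text):
    """Turn '24' into 24 and '1.5' into 1.5; leave other strings alone."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def csv_to_records(text):
    """Reader stage: CSV text in, list of records out."""
    return [
        {k: infer_type(v) for k, v in row.items()}
        for row in csv.DictReader(io.StringIO(text))
    ]


def records_to_json(records):
    """Writer stage: records in, JSON text out."""
    return json.dumps(records, indent=2)


def records_to_xtab(records):
    """Writer stage: one 'key value' line per field, blank line between records."""
    blocks = []
    for rec in records:
        width = max(len(k) for k in rec)
        blocks.append("\n".join(f"{k.ljust(width)} {v}" for k, v in rec.items()))
    return "\n\n".join(blocks)


records = csv_to_records(CSV_TEXT)
json_text = records_to_json(records)
xtab_text = records_to_xtab(records)
print(json_text)
print(xtab_text)
```

Any processing step that takes and returns records can sit between the reader and either writer without knowing which formats are in use.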