mirror of
https://github.com/johnkerl/miller.git
synced 2026-01-23 02:14:13 +00:00
Added a manpage for mlr
This commit is contained in:
parent
4cf9f76073
commit
4e457e3b6c
2 changed files with 491 additions and 1 deletions
10
Makefile
10
Makefile
|
|
@ -1,4 +1,7 @@
|
|||
all: c
|
||||
MANDIR ?= /usr/share/man
|
||||
DESTDIR ?=
|
||||
|
||||
all: c manpage
|
||||
devall: c install doc
|
||||
# TODO: the install target exists to put most-recent mlr executable in the
|
||||
# path to be picked up by the mlr-execs in the docs dir. better would be to
|
||||
|
|
@ -9,7 +12,12 @@ doc: .always
|
|||
cd doc && poki
|
||||
install: .always
|
||||
make -C c install
|
||||
install -d -m 0755 $(DESTDIR)/$(mandir)
|
||||
install -m 0644 doc/miller.1 $(DESTDIR)/$(mandir)
|
||||
clean: .always
|
||||
make -C c clean
|
||||
.PHONY: manpage
|
||||
manpage: doc/miller.1.txt
|
||||
( cd doc && a2x a2x -d manpage -f manpage miller.1.txt )
|
||||
.always:
|
||||
@true
|
||||
|
|
|
|||
482
doc/miller.1.txt
Normal file
482
doc/miller.1.txt
Normal file
|
|
@ -0,0 +1,482 @@
|
|||
MILLER(1)
|
||||
=========
|
||||
|
||||
NAME
|
||||
----
|
||||
|
||||
miller - sed, awk, cut, join, and sort for name-indexed data such as CSV
|
||||
|
||||
SYNOPSIS
|
||||
--------
|
||||
|
||||
mlr [I/O options] {verb} [verb-dependent options ...] {file names}
|
||||
|
||||
|
||||
DESCRIPTION
|
||||
-----------
|
||||
|
||||
With Miller you get to use named fields without needing to count positional
|
||||
indices. This is something the Unix toolkit always could have done, and
|
||||
arguably always should have done. It operates on key-value-pair data while the
|
||||
familiar Unix tools operate on integer-indexed fields: if the natural data
|
||||
structure for the latter is the array, then Miller’s natural data structure is
|
||||
the insertion-ordered hash map. This encompasses a variety of data formats,
|
||||
including but not limited to the familiar CSV. (Miller can handle
|
||||
positionally-indexed data as a special case.)
|
||||
|
||||
|
||||
EXAMPLES
|
||||
--------
|
||||
|
||||
% mlr --csv cut -f hostname,uptime mydata.csv
|
||||
% mlr --csv sort -f hostname,uptime mydata.csv
|
||||
% mlr --csv put '$z = $x + 2.7*$y' mydata.csv
|
||||
% mlr --csv filter '$status != "down"' mydata.csv
|
||||
|
||||
OPTIONS
|
||||
-------
|
||||
|
||||
In the following option flags, the version with "i" designates the input
|
||||
stream, "o" the output stream, and the version without prefix sets the option
|
||||
for both input and output stream. For example: +--irs+ sets the input record
|
||||
separator, +--ors+ the output record separator, and +--rs+ sets both the input
|
||||
and output separator to the given value.
|
||||
|
||||
SEPARATOR
|
||||
~~~~~~~~~
|
||||
|
||||
--rs, --irs, --ors::
|
||||
Record separators, defaulting to newline
|
||||
--fs, --ifs, --ofs, --repifs::
|
||||
Field separators, defaulting to ","
|
||||
--ps, --ips, --ops::
|
||||
Pair separators, defaulting to "="
|
||||
|
||||
DATA-FORMAT
|
||||
~~~~~~~~~~~
|
||||
|
||||
--dkvp, --idkvp, --odkvp::
|
||||
Delimited key-value pairs, e.g "a=1,b=2" (default)
|
||||
--nidx, --inidx, --onidx::
|
||||
Implicitly-integer-indexed fields (Unix-toolkit style)
|
||||
--csv, --icsv, --ocsv::
|
||||
Comma-separated value (or tab-separated with --fs tab, etc.)
|
||||
--pprint, --ipprint, --opprint, --right::
|
||||
Pretty-printed tabular (produces no output until all input is in)
|
||||
--pprint, --ipprint, --opprint, --right::
|
||||
Pretty-printed tabular (produces no output until all input is in)
|
||||
|
||||
+-p+ is a keystroke-saver for +--nidx --fs space --repifs+
|
||||
|
||||
NUMERICAL FORMAT
|
||||
~~~~~~~~~~~~~~~~
|
||||
|
||||
--ofmt {format}::
|
||||
Sets the numerical format given a printf-style format string.
|
||||
|
||||
OTHER
|
||||
~~~~~
|
||||
|
||||
--seed {n}::
|
||||
Seeds the random number generator used for put/filter +urand()+ with a
|
||||
number n of the form 12345678 or 0xcafefeed.
|
||||
|
||||
|
||||
VERBS
|
||||
~~~~~
|
||||
|
||||
cat
|
||||
^^^
|
||||
|
||||
Usage: +mlr cat+
|
||||
|
||||
Passes input records directly to output. Most useful for format conversion.
|
||||
|
||||
check
|
||||
^^^^^
|
||||
|
||||
Usage: +mlr check+
|
||||
|
||||
Consumes records without printing any output. Useful for doing a
|
||||
well-formatted check on input data.
|
||||
|
||||
count-distinct
|
||||
^^^^^^^^^^^^^^
|
||||
|
||||
Usage: +mlr count-distinct [options]+
|
||||
|
||||
Prints number of records having distinct values for specified field names.
|
||||
Same as +uniq -c+.
|
||||
|
||||
-f {a,b,c}::
|
||||
Field names for distinct count.
|
||||
|
||||
cut
|
||||
^^^
|
||||
|
||||
Usage: +mlr cut [options]+
|
||||
|
||||
Passes through input records with specified fields included/excluded.
|
||||
|
||||
-f {a,b,c}::
|
||||
Field names to include for cut.
|
||||
-o::
|
||||
Retain fields in the order specified here in the argument list.
|
||||
Default is to retain them in the order found in the input data.
|
||||
-x|--complement::
|
||||
Exclude, rather that include, field names specified by -f.
|
||||
|
||||
filter
|
||||
^^^^^^
|
||||
|
||||
Usage: +mlr filter [-v] {expression}+
|
||||
|
||||
Prints records for which {expression} evaluates to true. With -v, first
|
||||
prints the AST (abstract syntax tree) for the expression, which gives full
|
||||
transparency on the precedence and associativity rules of Miller's grammar.
|
||||
Please use a dollar sign for field names and double-quotes for string
|
||||
literals. Miller built-in variables are +NF+, +NR+, +FNR+, +FILENUM+,
|
||||
+FILENAME+, +PI+, +E+.
|
||||
|
||||
Examples:
|
||||
|
||||
mlr filter 'log10($count) > 4.0'
|
||||
mlr filter 'FNR == 2 (second record in each file)'
|
||||
mlr filter 'urand() < 0.001' (subsampling)
|
||||
mlr filter '$color != "blue" && $value > 4.2'
|
||||
mlr filter '($x<.5 && $y<.5) || ($x>.5 && $y>.5)'
|
||||
|
||||
Please see http://johnkerl.org/miller/doc/reference.html for more information including function list.
|
||||
|
||||
group-by
|
||||
^^^^^^^^
|
||||
|
||||
Usage: +mlr group-by {comma-separated field names}+
|
||||
|
||||
Outputs records in batches having identical values at specified field names.
|
||||
|
||||
group-like
|
||||
^^^^^^^^^^
|
||||
|
||||
Usage: +mlr group-like+
|
||||
|
||||
Outputs records in batches having identical field names.
|
||||
|
||||
having-fields
|
||||
^^^^^^^^^^^^^
|
||||
|
||||
Usage: +mlr having-fields [options]+
|
||||
|
||||
Conditionally passes through records depending on each record's field names.
|
||||
|
||||
Options:
|
||||
|
||||
--at-least {a,b,c}
|
||||
--which-are {a,b,c}
|
||||
--at-most {a,b,c}
|
||||
|
||||
head
|
||||
^^^^
|
||||
|
||||
Usage: +mlr head [options]+
|
||||
|
||||
Passes through the first n records, optionally by category.
|
||||
|
||||
Options:
|
||||
|
||||
-n {count}::
|
||||
Head count to print; default 10
|
||||
-g {a,b,c}::
|
||||
Optional group-by-field names for head counts
|
||||
|
||||
histogram
|
||||
^^^^^^^^^
|
||||
|
||||
Usage: +mlr histogram [options]+
|
||||
|
||||
Just a histogram. Input values < lo or > hi are not counted.
|
||||
|
||||
Options:
|
||||
|
||||
-f {a,b,c}::
|
||||
Value-field names for histogram counts
|
||||
--lo {lo}::
|
||||
Histogram low value
|
||||
--hi {hi}::
|
||||
Histogram high value
|
||||
--nbins {n}::
|
||||
Number of histogram bins
|
||||
|
||||
join
|
||||
^^^^
|
||||
|
||||
Usage: +mlr join [options]+
|
||||
|
||||
Joins records from specified left file name with records from all file names
|
||||
at the end of the Miller argument list. Functionality is essentially the same
|
||||
as the system "join" command, but for record streams.
|
||||
|
||||
Options:
|
||||
|
||||
-f {left file name}
|
||||
-j {a,b,c} Comma-separated join-field names for output
|
||||
-l {a,b,c} Comma-separated join-field names for left input file; defaults to -j values if omitted.
|
||||
-r {a,b,c} Comma-separated join-field names for right input file(s); defaults to -j values if omitted.
|
||||
--lp {text} Additional prefix for non-join output field names from the left file
|
||||
--rp {text} Additional prefix for non-join output field names from the right file(s)
|
||||
--np Do not emit paired records
|
||||
--ul Emit unpaired records from the left file
|
||||
--ur Emit unpaired records from the right file(s)
|
||||
-u Enable unsorted input. In this case, the entire left file will be loaded into memory.
|
||||
Without -u, records must be sorted lexically by their join-field names, else not all
|
||||
records will be paired.
|
||||
|
||||
File-format options default to those for the right file names on the Miller
|
||||
argument list, but may be overridden for the left file as follows. Please see
|
||||
the main "mlr --help" for more information on syntax for these arguments.
|
||||
|
||||
-i {one of csv,dkvp,nidx,pprint,xtab}
|
||||
--irs {record-separator character}
|
||||
--ifs {field-separator character}
|
||||
--ips {pair-separator character}
|
||||
--repifs
|
||||
--repips
|
||||
--use-mmap
|
||||
--no-mmap
|
||||
|
||||
Please see http://johnkerl.org/miller/doc/reference.html for more information
|
||||
including examples.
|
||||
|
||||
label
|
||||
^^^^^
|
||||
|
||||
Usage: +mlr label {new1,new2,new3,...}+
|
||||
|
||||
Given n comma-separated names, renames the first n fields of each record to
|
||||
have the respective name. (Fields past the nth are left with their original
|
||||
names.) Particularly useful with --inidx, to give useful names to otherwise
|
||||
integer-indexed fields.
|
||||
|
||||
put
|
||||
^^^
|
||||
|
||||
Usage: +mlr put [-v] {expression}+
|
||||
|
||||
Adds/updates specified field(s).
|
||||
|
||||
With +-v+, first prints the AST (abstract syntax tree) for the expression,
|
||||
which gives full transparency on the precedence and associativity rules of
|
||||
Miller's grammar. Please use a dollar sign for field names and double-quotes
|
||||
for string literals. Miller built-in variables are +NF+, +NR+, +FNR+,
|
||||
+FILENUM+, +FILENAME+, +PI+, +E+. Multiple assignments may be separated with
|
||||
a semicolon.
|
||||
|
||||
Examples:
|
||||
|
||||
mlr put '$y = log10($x); $z = sqrt($y)'
|
||||
mlr put '$filename = FILENAME'
|
||||
mlr put '$colored_shape = $color . "_" . $shape'
|
||||
mlr put '$y = cos($theta); $z = atan2($y, $x)'
|
||||
|
||||
Please see http://johnkerl.org/miller/doc/reference.html for more information
|
||||
including function list.
|
||||
|
||||
regularize
|
||||
^^^^^^^^^^
|
||||
|
||||
Usage: +mlr regularize+
|
||||
|
||||
For records seen earlier in the data stream with same field names in a different order,
|
||||
outputs them with field names in the previously encountered order.
|
||||
|
||||
Example:
|
||||
|
||||
input records a=1,c=2,b=3, then e=4,d=5, then c=7,a=6,b=8
|
||||
output as a=1,c=2,b=3, then e=4,d=5, then a=6,c=7,b=8
|
||||
|
||||
rename
|
||||
^^^^^^
|
||||
|
||||
Usage: +mlr rename {old1,new1,old2,new2,...}+
|
||||
|
||||
Renames specified fields.
|
||||
|
||||
reorder
|
||||
^^^^^^^
|
||||
|
||||
Usage: +mlr reorder [options]+
|
||||
|
||||
Options:
|
||||
|
||||
-f {a,b,c}::
|
||||
Field names to reorder.
|
||||
-e::
|
||||
Put specified field names at record end: default is to put at record start.
|
||||
|
||||
Examples:
|
||||
|
||||
mlr reorder -f a,b sends input record "d=4,b=2,a=1,c=3" to "a=1,b=2,d=4,c=3".
|
||||
mlr reorder -e -f a,b sends input record "d=4,b=2,a=1,c=3" to "d=4,c=3,a=1,b=2".
|
||||
|
||||
sort
|
||||
^^^^
|
||||
|
||||
Usage: +mlr sort {flags}+
|
||||
|
||||
Sorts records primarily by the first specified field, secondarily by the
|
||||
second field, and so on.
|
||||
|
||||
Flags:
|
||||
|
||||
-f {comma-separated field names}::
|
||||
Lexical ascending
|
||||
-n {comma-separated field names}::
|
||||
Numerical ascending; nulls sort last
|
||||
-nf {comma-separated field names}::
|
||||
Numerical ascending; nulls sort last
|
||||
-r {comma-separated field names}::
|
||||
Lexical descending
|
||||
-nr {comma-separated field names}::
|
||||
Numerical descending; nulls sort first
|
||||
|
||||
Example:
|
||||
|
||||
mlr sort -f a,b -nr x,y,z
|
||||
|
||||
which is the same as:
|
||||
|
||||
mlr sort -f a -f b -nr x -nr y -nr z
|
||||
|
||||
stats1
|
||||
^^^^^^
|
||||
|
||||
Usage: +mlr stats1 [options]+
|
||||
|
||||
-a {sum,count,...}::
|
||||
Names of accumulators: +p10+, +p25.2+, +p50+, +p98+, +p100+, etc. and/or
|
||||
one or more of: +count+, +mode+, +sum+, +mean+, +stddev+, +var+, +meaneb+,
|
||||
+min+, +max+.
|
||||
-f {a,b,c}::
|
||||
Value-field names on which to compute statistics
|
||||
-g {d,e,f}::
|
||||
Optional group-by-field names
|
||||
|
||||
Examples:
|
||||
|
||||
mlr stats1 -a min,p10,p50,p90,max -f value -g size,shape
|
||||
mlr stats1 -a count,mode -f size
|
||||
mlr stats1 -a count,mode -f size -g shape
|
||||
|
||||
Notes:
|
||||
|
||||
* p50 is a synonym for median.
|
||||
* min and max output the same results as p0 and p100, respectively, but use less memory.
|
||||
* count and mode allow text input; the rest require numeric input. In particular, 1 and 1.0
|
||||
are distinct text for count and mode.
|
||||
* When there are mode ties, the first-encountered datum wins.
|
||||
|
||||
stats2
|
||||
^^^^^^
|
||||
|
||||
Usage: +mlr stats2 [options]+
|
||||
|
||||
-a {linreg-ols,corr,...}::
|
||||
Names of accumulators: one or more of +linreg-pca+, +linreg-ols+, +r2+,
|
||||
+corr+, +cov+, +covx+. +r2+ is a quality metric for +linreg-ols+;
|
||||
+linrec-pca+ outputs its own quality metric.
|
||||
-f {a,b,c,d}::
|
||||
Value-field name-pairs on which to compute statistics. There must be an
|
||||
even number of names.
|
||||
-g {e,f,g}::
|
||||
Optional group-by-field names.
|
||||
-v::
|
||||
Print additional output for +linreg-pca+.
|
||||
|
||||
Examples:
|
||||
|
||||
mlr stats2 -a linreg-pca -f x,y
|
||||
mlr stats2 -a linreg-ols,r2 -f x,y -g size,shape
|
||||
mlr stats2 -a corr -f x,y
|
||||
|
||||
step
|
||||
^^^^
|
||||
|
||||
Usage: +mlr step [options]+
|
||||
|
||||
Computes values dependent on the previous record, optionally grouped by
|
||||
category.
|
||||
|
||||
-a {delta,rsum,...}::
|
||||
Names of steppers: one or more of delta rsum counter
|
||||
-f {a,b,c}::
|
||||
Value-field names on which to compute statistics
|
||||
-g {d,e,f}::
|
||||
Group-by-field names
|
||||
|
||||
tac
|
||||
^^^
|
||||
|
||||
Usage: +mlr tac+
|
||||
|
||||
Prints records in reverse order from the order in which they were encountered.
|
||||
|
||||
tail
|
||||
^^^^
|
||||
|
||||
Usage: +mlr tail [options]+
|
||||
|
||||
Passes through the last n records, optionally by category.
|
||||
|
||||
-n {count}::
|
||||
Tail count to print; default 10
|
||||
-g {a,b,c}::
|
||||
Optional group-by-field names for tail counts
|
||||
|
||||
top
|
||||
^^^
|
||||
|
||||
Usage: +mlr top [options]+
|
||||
|
||||
Prints the n records with smallest/largest values at specified fields,
|
||||
optionally by category.
|
||||
|
||||
-f {a,b,c}::
|
||||
Value-field names for top counts
|
||||
-g {d,e,f}::
|
||||
Optional group-by-field names for top counts
|
||||
-n {count}::
|
||||
How many records to print per category; default 1
|
||||
-a::
|
||||
Print all fields for top-value records; default is to print only
|
||||
value and group-by fields.
|
||||
--min::
|
||||
Print top smallest values; default is top largest values
|
||||
|
||||
uniq
|
||||
^^^^
|
||||
|
||||
Usage: +mlr uniq [options]+
|
||||
|
||||
Prints distinct values for specified field names. With -c, same as
|
||||
count-distinct.
|
||||
|
||||
-g {d,e,f}::
|
||||
Group-by-field names for uniq counts
|
||||
-c::
|
||||
Show repeat counts in addition to unique values
|
||||
|
||||
AUTHOR
|
||||
------
|
||||
|
||||
miller is written by John Kerl <kerl.john.r@gmail.com>.
|
||||
|
||||
This manual page has been composed from miller's help output by Eric MSP Veith
|
||||
<eveith@veith-m.de>.
|
||||
|
||||
|
||||
SEE ALSO
|
||||
--------
|
||||
|
||||
sed(1), awk(1), cut(1), join(1), sort(1), RFC 4180: Common Format and MIME
|
||||
Type for Comma-Separated Values (CSV) Files, the miller website
|
||||
http://johnkerl.org/miller/doc
|
||||
Loading…
Add table
Add a link
Reference in a new issue