mirror of
https://github.com/johnkerl/miller.git
synced 2026-01-23 02:14:13 +00:00
1360 lines
58 KiB
Text
================================================================
|
|
altkv
|
|
Usage: mlr altkv [options]
|
|
Given fields with values of the form a,b,c,d,e,f emits a=b,c=d,e=f pairs.
|
|
Options:
|
|
-h|--help Show this message.
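For illustration, here is a rough Python model of the pairing rule above. The function name altkv is hypothetical. The sketch assumes an even number of values, because it does not model how Miller names an unpaired trailing value.

```python
def altkv(values):
    """Pair alternating values: [a, b, c, d] -> {a: b, c: d}.

    Illustrative sketch only; assumes len(values) is even.
    """
    if len(values) % 2 != 0:
        raise ValueError("this sketch only handles an even number of values")
    # Even positions become keys; the value at the following odd position is the value.
    return dict(zip(values[0::2], values[1::2]))


print(altkv(["a", "b", "c", "d", "e", "f"]))
```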
|
|
|
|
================================================================
|
|
bar
|
|
Usage: mlr bar [options]
|
|
Replaces a numeric field with a number of asterisks, allowing for cheesy
|
|
bar plots. These align best with --opprint or --oxtab output format.
|
|
Options:
|
|
-f {a,b,c} Field names to convert to bars.
|
|
--lo {lo} Lower-limit value for min-width bar: default '0.000000'.
|
|
--hi {hi} Upper-limit value for max-width bar: default '100.000000'.
|
|
-w {n} Bar-field width: default '40'.
|
|
--auto Automatically computes limits, ignoring --lo and --hi.
|
|
Holds all records in memory before producing any output.
|
|
-c {character} Fill character: default '*'.
|
|
-x {character} Out-of-bounds character: default '#'.
|
|
-b {character} Blank character: default '.'.
|
|
Nominally the fill, out-of-bounds, and blank characters will be strings of length 1.
|
|
However you can make them all longer if you so desire.
|
|
-h|--help Show this message.
|
|
|
|
================================================================
|
|
bootstrap
|
|
Usage: mlr bootstrap [options]
|
|
Emits an n-sample, with replacement, of the input records.
|
|
See also mlr sample and mlr shuffle.
|
|
Options:
|
|
-n Number of samples to output. Defaults to number of input records.
|
|
Must be non-negative.
|
|
-h|--help Show this message.
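A minimal Python sketch of "an n-sample, with replacement" follows. The function bootstrap and its rng parameter are hypothetical. Returning nothing for empty input is an assumption of this sketch, not documented Miller behavior.

```python
import random


def bootstrap(records, n=None, rng=None):
    """Return n records drawn uniformly at random, with replacement.

    n defaults to the number of input records, as in the usage text.
    """
    rng = rng or random.Random()
    if n is None:
        n = len(records)
    if n < 0:
        raise ValueError("n must be non-negative")
    if not records:
        return []  # nothing to sample from (assumption of this sketch)
    # choices() samples with replacement, so repeated records are expected.
    return rng.choices(records, k=n)
```

Replacing choices() with random.sample(records, k=len(records)) would instead yield a permutation with no repeats, which is closer in spirit to mlr shuffle.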
|
|
|
|
================================================================
|
|
case
|
|
Usage: mlr case [options]
|
|
Changes the case of strings in record keys and/or values.
|
|
Options:
|
|
-k Case only keys, not keys and values.
|
|
-v Case only values, not keys and values.
|
|
-f {a,b,c} Specify which field names to case (default: all)
|
|
-u Convert to uppercase
|
|
-l Convert to lowercase
|
|
-s Convert to sentence case (capitalize first letter)
|
|
-t Convert to title case (capitalize words)
|
|
-h|--help Show this message.
|
|
|
|
================================================================
|
|
cat
|
|
Usage: mlr cat [options]
|
|
Passes input records directly to output. Most useful for format conversion.
|
|
Options:
|
|
-n Prepend field "n" to each record with record-counter starting at 1.
|
|
-N {name} Prepend field {name} to each record with record-counter starting at 1.
|
|
-g {a,b,c} Optional group-by-field names for counters, e.g. a,b,c
|
|
--filename Prepend current filename to each record.
|
|
--filenum Prepend current filenum (1-up) to each record.
|
|
-h|--help Show this message.
|
|
|
|
================================================================
|
|
check
|
|
Usage: mlr check [options]
|
|
Consumes records without printing any output,
|
|
with the exception that warnings are printed to stderr.
|
|
Useful for checking that input data are well-formed.
|
|
Current checks are:
|
|
* Data are parseable
|
|
* Whether any key is the empty string
|
|
Options:
|
|
-h|--help Show this message.
|
|
|
|
================================================================
|
|
clean-whitespace
|
|
Usage: mlr clean-whitespace [options]
|
|
For each record, for each field in the record, whitespace-cleans the keys and/or
|
|
values. Whitespace-cleaning entails stripping leading and trailing whitespace,
|
|
and replacing multiple whitespace with singles. For finer-grained control,
|
|
please see the DSL functions lstrip, rstrip, strip, collapse_whitespace,
|
|
and clean_whitespace.
|
|
|
|
Options:
|
|
-k|--keys-only Do not touch values.
|
|
-v|--values-only Do not touch keys.
|
|
It is an error to specify -k as well as -v -- to clean keys and values,
|
|
leave off -k as well as -v.
|
|
-h|--help Show this message.
|
|
|
|
================================================================
|
|
count-distinct
|
|
Usage: mlr count-distinct [options]
|
|
Prints number of records having distinct values for specified field names.
|
|
Same as uniq -c.
|
|
|
|
Options:
|
|
-f {a,b,c} Field names for distinct count.
|
|
-x {a,b,c} Field names to exclude for distinct count: use each record's others instead.
|
|
-n Show only the number of distinct values. Not compatible with -u.
|
|
-o {name} Field name for output count. Default "count".
|
|
Ignored with -u.
|
|
-u Do unlashed counts for multiple field names. With -f a,b and
|
|
without -u, computes counts for distinct combinations of a
|
|
and b field values. With -f a,b and with -u, computes counts
|
|
for distinct a field values and counts for distinct b field
|
|
values separately.
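The difference between the default (lashed) counting and -u can be modeled in a few lines of Python. The function count_distinct is hypothetical. Skipping records that lack one of the fields is an assumption of this sketch.

```python
from collections import Counter


def count_distinct(records, fields, unlashed=False):
    """Model of `mlr count-distinct -f ...` with and without -u.

    Assumption of this sketch: records lacking any of the fields are skipped.
    """
    if not unlashed:
        # Default: one count per distinct *combination* of the field values.
        return Counter(
            tuple(r[f] for f in fields)
            for r in records
            if all(f in r for f in fields)
        )
    # -u: an independent count of distinct values for each field.
    return {f: Counter(r[f] for r in records if f in r) for f in fields}
```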
|
|
|
|
================================================================
|
|
count
|
|
Usage: mlr count [options]
|
|
Prints number of records, optionally grouped by distinct values for specified field names.
|
|
Options:
|
|
-g {a,b,c} Optional group-by-field names for counts, e.g. a,b,c
|
|
-n Show only the number of distinct values. Not interesting without -g.
|
|
-o {name} Field name for output-count. Default "count".
|
|
-h|--help Show this message.
|
|
|
|
================================================================
|
|
count-similar
|
|
Usage: mlr count-similar [options]
|
|
Ingests all records, then emits each record augmented by a count of
|
|
the number of records (including itself) having the same group-by field values.
|
|
Options:
|
|
-g {a,b,c} Group-by-field names for counts, e.g. a,b,c
|
|
-o {name} Field name for output-counts. Defaults to "count".
|
|
-h|--help Show this message.
|
|
|
|
================================================================
|
|
cut
|
|
Usage: mlr cut [options]
|
|
Passes through input records with specified fields included/excluded.
|
|
Options:
|
|
-f {a,b,c} Comma-separated field names for cut, e.g. a,b,c.
|
|
-o Retain fields in the order specified here in the argument list.
|
|
Default is to retain them in the order found in the input data.
|
|
-x|--complement Exclude, rather than include, field names specified by -f.
|
|
-r Treat field names as regular expressions. "ab", "a.*b" will
|
|
match any field name containing the substring "ab" or matching
|
|
"a.*b", respectively; anchors of the form "^ab$", "^a.*b$" may
|
|
be used. The -o flag is ignored when -r is present.
|
|
-h|--help Show this message.
|
|
Examples:
|
|
mlr cut -f hostname,status
|
|
mlr cut -x -f hostname,status
|
|
mlr cut -r -f '^status$,sda[0-9]'
|
|
mlr cut -r -f '^status$,"sda[0-9]"'
|
|
mlr cut -r -f '^status$,"sda[0-9]"i' (this is case-insensitive)
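Here is a rough Python model of -r field selection. The helper names are hypothetical. The parsing of the quoted forms and the trailing i is a guess based on the examples above. Because the sketch splits the spec on commas, it does not support regexes that themselves contain commas.

```python
import re


def compile_field_regex(spec):
    """Compile one -r spec: bare `sda[0-9]`, quoted `"sda[0-9]"`, or `"sda[0-9]"i`."""
    flags = 0
    if len(spec) >= 3 and spec.startswith('"') and spec.endswith('"i'):
        spec, flags = spec[1:-2], re.IGNORECASE  # trailing i: case-insensitive
    elif len(spec) >= 2 and spec.startswith('"') and spec.endswith('"'):
        spec = spec[1:-1]
    return re.compile(spec, flags)


def cut_regex(record, specs, complement=False):
    """Keep (or, with complement=True, drop) fields whose names match any regex.

    re.search gives unanchored "contains" matching; ^...$ anchors it.
    Input field order is kept, consistent with -o being ignored under -r.
    """
    regexes = [compile_field_regex(s) for s in specs.split(",")]

    def matches(name):
        return any(rx.search(name) for rx in regexes)

    return {k: v for k, v in record.items() if matches(k) != complement}
```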
|
|
|
|
================================================================
|
|
decimate
|
|
Usage: mlr decimate [options]
|
|
Passes through one of every n records, optionally by category.
|
|
Options:
|
|
-b Decimate by printing first of every n.
|
|
-e Decimate by printing last of every n (default).
|
|
-g {a,b,c} Optional group-by-field names for decimate counts, e.g. a,b,c.
|
|
-n {n} Decimation factor (default 10).
|
|
-h|--help Show this message.
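The -b/-e distinction is easiest to see in a small Python model. The function decimate is hypothetical. The sketch assumes that a trailing partial batch of fewer than n records in a group produces no output.

```python
from collections import defaultdict


def decimate(records, n=10, first=False, group_by=()):
    """Emit one record per n, per group: the last (default, -e) or the first (-b)."""
    counts = defaultdict(int)
    remembered = {}
    out = []
    for rec in records:
        key = tuple(rec.get(f) for f in group_by)
        counts[key] += 1
        if counts[key] == 1:
            remembered[key] = rec  # first record of this group's current batch
        if counts[key] == n:
            out.append(remembered[key] if first else rec)
            counts[key] = 0  # start a new batch for this group
    return out
```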
|
|
|
|
================================================================
|
|
fill-down
|
|
Usage: mlr fill-down [options]
|
|
If a given record has a missing value for a given field, fill that from
|
|
the corresponding value from a previous record, if any.
|
|
By default, a 'missing' field either is absent, or has the empty-string value.
|
|
With -a, a field is 'missing' only if it is absent.
|
|
|
|
Options:
|
|
--all Operate on all fields in the input.
|
|
-a|--only-if-absent Treat a field as missing only if it is absent; an empty-string value is left as-is.
|
|
-f {a,b,c} Field names for fill-down.
|
|
-h|--help Show this message.
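A minimal Python sketch of the two notions of "missing" follows. The function fill_down is hypothetical. Remembering an empty string as the last value under -a is how this sketch behaves. It has not been checked against Miller.

```python
def fill_down(records, fields, only_if_absent=False):
    """Fill missing values in `fields` from the most recent non-missing value.

    Default: missing means absent or empty string.
    only_if_absent=True (like -a): missing means absent only.
    """
    last = {}
    out = []
    for rec in records:
        rec = dict(rec)  # copy so the caller's records are not mutated
        for f in fields:
            present = f in rec
            missing = (not present) if only_if_absent else (not present or rec[f] == "")
            if missing:
                if f in last:
                    rec[f] = last[f]
            else:
                last[f] = rec[f]  # remember the latest non-missing value
        out.append(rec)
    return out
```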
|
|
|
|
================================================================
|
|
fill-empty
|
|
Usage: mlr fill-empty [options]
|
|
Fills empty-string fields with specified fill-value.
|
|
Options:
|
|
-v {string} Fill-value: defaults to "N/A"
|
|
-S Don't infer type -- so '-v 0' would fill string 0 not int 0.
|
|
|
|
================================================================
|
|
filter
|
|
Usage: mlr filter [options] {DSL expression}
|
|
Options:
|
|
-f {file name} File containing a DSL expression (see examples below). If the filename
|
|
is a directory, all *.mlr files in that directory are loaded.
|
|
|
|
-e {expression} You can use this after -f to add an expression. Example use
|
|
case: define functions/subroutines in a file you specify with -f, then call
|
|
them with an expression you specify with -e.
|
|
|
|
(If you mix -e and -f then the expressions are evaluated in the order encountered.
|
|
Since the expression pieces are simply concatenated, please be sure to use intervening
|
|
semicolons to separate expressions.)
|
|
|
|
-s name=value: Predefines out-of-stream variable @name to have the given value.
|
|
Thus mlr put -s foo=97 '$column += @foo' is like
|
|
mlr put 'begin {@foo = 97} $column += @foo'.
|
|
The value part is subject to type-inferencing.
|
|
May be specified more than once, e.g. -s name1=value1 -s name2=value2.
|
|
Note: the value may be an environment variable, e.g. -s sequence=$SEQUENCE
|
|
|
|
-x (default false) Prints records for which {expression} evaluates to false, not true,
|
|
i.e. invert the sense of the filter expression.
|
|
|
|
-q Does not include the modified record in the output stream.
|
|
Useful for when all desired output is in begin and/or end blocks.
|
|
|
|
-S and -F: These are no-ops in Miller 6 and above, since now type-inferencing is done
|
|
by the record-readers before filter/put is executed. Supported as no-op pass-through
|
|
flags for backward compatibility.
|
|
|
|
-h|--help Show this message.
|
|
|
|
Parser-info options:
|
|
|
|
-w Print warnings about things like uninitialized variables.
|
|
|
|
-W Same as -w, but exit the process if there are any warnings.
|
|
|
|
-p Prints the expression's AST (abstract syntax tree) to stdout. This gives full
|
|
transparency on the precedence and associativity rules of Miller's grammar.
|
|
|
|
-d Like -p but uses a parenthesized-expression format for the AST.
|
|
|
|
-D Like -d but with output all on one line.
|
|
|
|
-E Echo DSL expression before printing parse-tree
|
|
|
|
-v Same as -E -p.
|
|
|
|
-X Exit after parsing but before stream-processing. Useful with -v/-d/-D, if you
|
|
only want to look at parser information.
|
|
|
|
Records will pass the filter depending on the last bare-boolean statement in
|
|
the DSL expression. That can be the result of <, ==, >, etc., the return value of a function call
|
|
which returns boolean, etc.
|
|
|
|
Examples:
|
|
mlr --csv --from example.csv filter '$color == "red"'
|
|
mlr --csv --from example.csv filter '$color == "red" && $flag == true'
|
|
More example filter expressions:
|
|
First record in each file:
|
|
'FNR == 1'
|
|
Subsampling:
|
|
'urand() < 0.001'
|
|
Compound booleans:
|
|
'$color != "blue" && $value > 4.2'
|
|
'($x < 0.5 && $y < 0.5) || ($x > 0.5 && $y > 0.5)'
|
|
Regexes with case-insensitive flag:
|
|
'($name =~ "^sys.*east$") || ($name =~ "^dev.[0-9]+"i)'
|
|
Assignments, then bare-boolean filter statement:
|
|
'$ab = $a+$b; $cd = $c+$d; $ab != $cd'
|
|
Bare-boolean filter statement within a conditional:
|
|
'if (NR < 100) {
|
|
$x > 0.3;
|
|
} else {
|
|
$x > 0.002;
|
|
}
|
|
'
|
|
Using 'any' higher-order function to see if $index is 10, 20, or 30:
|
|
'any([10,20,30], func(e) {return $index == e})'
|
|
|
|
See also https://miller.readthedocs.io/reference-dsl for more context.
|
|
|
|
================================================================
|
|
flatten
|
|
Usage: mlr flatten [options]
|
|
Flattens multi-level maps to single-level ones. Example: field with name 'a'
|
|
and value '{"b": { "c": 4 }}' becomes name 'a.b.c' and value 4.
|
|
Options:
|
|
-f Comma-separated list of field names to flatten (default all).
|
|
-s Separator, defaulting to mlr --flatsep value.
|
|
-h|--help Show this message.
|
|
|
|
================================================================
|
|
format-values
|
|
Usage: mlr format-values [options]
|
|
Applies format strings to all field values, depending on autodetected type.
|
|
* If a field value is detected to be integer, applies integer format.
|
|
* Else, if a field value is detected to be float, applies float format.
|
|
* Else, applies string format.
|
|
|
|
Note: this is a low-keystroke way to apply formatting to many fields. To get
|
|
finer control, please see the fmtnum function within the mlr put DSL.
|
|
|
|
Note: this verb lets you apply arbitrary format strings, which can produce
|
|
undefined behavior and/or program crashes. See your system's "man printf".
|
|
|
|
Options:
|
|
-i {integer format} Defaults to "%d".
|
|
Examples: "%06lld", "%08llx".
|
|
Note that Miller integers are long long so you must use
|
|
formats which apply to long long, e.g. with ll in them.
|
|
Undefined behavior results otherwise.
|
|
-f {float format} Defaults to "%f".
|
|
Examples: "%8.3lf", "%.6le".
|
|
Note that Miller floats are double-precision so you must
|
|
use formats which apply to double, e.g. with l[efg] in them.
|
|
Undefined behavior results otherwise.
|
|
-s {string format} Defaults to "%s".
|
|
Examples: "_%s", "%08s".
|
|
Note that you must use formats which apply to string, e.g.
|
|
with s in them. Undefined behavior results otherwise.
|
|
-n Coerce field values autodetected as int to float, and then
|
|
apply the float format.
|
|
|
|
================================================================
|
|
fraction
|
|
Usage: mlr fraction [options]
|
|
For each record's value in specified fields, computes the ratio of that
|
|
value to the sum of values in that field over all input records.
|
|
E.g. with input records x=1 x=2 x=3 and x=4, emits output records
|
|
x=1,x_fraction=0.1 x=2,x_fraction=0.2 x=3,x_fraction=0.3 and x=4,x_fraction=0.4
|
|
|
|
Note: this is internally a two-pass algorithm: on the first pass it retains
|
|
input records and accumulates sums; on the second pass it computes quotients
|
|
and emits output records. This means it produces no output until all input is read.
|
|
|
|
Options:
|
|
-f {a,b,c} Field name(s) for fraction calculation
|
|
-g {d,e,f} Optional group-by-field name(s) for fraction counts
|
|
-p Produce percents [0..100], not fractions [0..1]. Output field names
|
|
end with "_percent" rather than "_fraction"
|
|
-c Produce cumulative distributions, i.e. running sums: each output
|
|
value folds in the sum of the previous for the specified group
|
|
E.g. with input records x=1 x=2 x=3 and x=4, emits output records
|
|
x=1,x_cumulative_fraction=0.1 x=2,x_cumulative_fraction=0.3
|
|
x=3,x_cumulative_fraction=0.6 and x=4,x_cumulative_fraction=1.0
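The two-pass algorithm described above can be modeled in Python as follows. The function fraction is hypothetical. The field name for combined -p -c output is a guess, and the sketch does not guard against a zero sum.

```python
from collections import defaultdict


def fraction(records, field, group_by=(), percent=False, cumulative=False):
    """Two-pass model of `mlr fraction -f field [-g ...] [-p] [-c]`."""
    # Pass 1: retain the records and accumulate per-group sums.
    sums = defaultdict(float)
    for rec in records:
        if field in rec:
            sums[tuple(rec.get(g) for g in group_by)] += float(rec[field])

    # Pass 2: compute quotients (or running-sum quotients) and emit.
    multiplier = 100.0 if percent else 1.0
    suffix = ("_cumulative" if cumulative else "") + ("_percent" if percent else "_fraction")
    running = defaultdict(float)
    out = []
    for rec in records:
        rec = dict(rec)
        if field in rec:
            key = tuple(rec.get(g) for g in group_by)
            running[key] += float(rec[field])
            numerator = running[key] if cumulative else float(rec[field])
            rec[field + suffix] = multiplier * numerator / sums[key]
        out.append(rec)
    return out
```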
|
|
|
|
================================================================
|
|
gap
|
|
Usage: mlr gap [options]
|
|
Emits an empty record every n records, or when certain values change.
|
|
Options:
|
|
-g {a,b,c} Print a gap whenever values of these fields (e.g. a,b,c) change.
|
|
-n {n} Print a gap every n records.
|
|
One of -n or -g is required.
|
|
-n is ignored if -g is present.
|
|
-h|--help Show this message.
|
|
|
|
================================================================
|
|
grep
|
|
Usage: mlr grep [options] {regular expression}
|
|
Passes through records which match the regular expression.
|
|
Options:
|
|
-i Use case-insensitive search.
|
|
-v Invert: pass through records which do not match the regex.
|
|
-a Only grep for values, not keys and values.
|
|
-h|--help Show this message.
|
|
Note that "mlr filter" is more powerful, but requires you to know field names.
|
|
By contrast, "mlr grep" allows you to regex-match the entire record. It does this
|
|
by formatting each record in memory as DKVP (or NIDX, if -a is supplied), using
|
|
OFS "," and OPS "=", and matching the resulting line against the regex specified
|
|
here. In particular, the regex is not applied to the input stream: if you have
|
|
CSV with header line "x,y,z" and data line "1,2,3" then the regex will be
|
|
matched, not against either of these lines, but against the DKVP line
|
|
"x=1,y=2,z=3". Furthermore, not all the options to system grep are supported,
|
|
and this command is intended to be merely a keystroke-saver. To get all the
|
|
features of system grep, you can do
|
|
"mlr --odkvp ... | grep ... | mlr --idkvp ..."
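This Python sketch models the matching behavior described above: each record is rendered as a DKVP line (or, with -a, an NIDX-style line) using "," and "=", and the regex is tested against that line rather than the raw input. The function grep_records is hypothetical.

```python
import re


def grep_records(records, pattern, ignore_case=False, invert=False, values_only=False):
    """Model of `mlr grep`: regex-match each record's in-memory rendering."""
    rx = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    out = []
    for rec in records:
        if values_only:
            line = ",".join(str(v) for v in rec.values())  # NIDX-style, OFS ","
        else:
            line = ",".join(f"{k}={v}" for k, v in rec.items())  # DKVP, OFS "," and OPS "="
        if bool(rx.search(line)) != invert:
            out.append(rec)
    return out
```

For the CSV example above, the record is tested as "x=1,y=2,z=3", so a pattern such as "y=2" matches even though that text never appears in the input file.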
|
|
|
|
================================================================
|
|
group-by
|
|
Usage: mlr group-by [options] {comma-separated field names}
|
|
Outputs records in batches having identical values at specified field names.
|
|
Options:
|
|
-h|--help Show this message.
|
|
|
|
================================================================
|
|
group-like
|
|
Usage: mlr group-like [options]
|
|
Outputs records in batches having identical field names.
|
|
Options:
|
|
-h|--help Show this message.
|
|
|
|
================================================================
|
|
gsub
|
|
Usage: mlr gsub [options]
|
|
Replaces old string with new string in specified field(s), with regex support
|
|
for the old string and handling multiple matches, like the `gsub` DSL function.
|
|
See also the `sub` and `ssub` verbs.
|
|
Options:
|
|
-f {a,b,c} Field names to convert.
|
|
-h|--help Show this message.
|
|
|
|
================================================================
|
|
having-fields
|
|
Usage: mlr having-fields [options]
|
|
Conditionally passes through records depending on each record's field names.
|
|
Options:
|
|
--at-least {comma-separated names}
|
|
--which-are {comma-separated names}
|
|
--at-most {comma-separated names}
|
|
--all-matching {regular expression}
|
|
--any-matching {regular expression}
|
|
--none-matching {regular expression}
|
|
Examples:
|
|
mlr having-fields --which-are amount,status,owner
|
|
mlr having-fields --any-matching 'sda[0-9]'
|
|
mlr having-fields --any-matching '"sda[0-9]"'
|
|
mlr having-fields --any-matching '"sda[0-9]"i' (this is case-insensitive)
|
|
|
|
================================================================
|
|
head
|
|
Usage: mlr head [options]
|
|
Passes through the first n records, optionally by category.
|
|
Without -g, ceases consuming more input (i.e. is fast) when n records have been read.
|
|
Options:
|
|
-g {a,b,c} Optional group-by-field names for head counts, e.g. a,b,c.
|
|
-n {n} Head-count to print. Default 10.
|
|
-h|--help Show this message.
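Below is a minimal Python sketch of head with and without -g. The function head is hypothetical. The early exit reflects the note that, without -g, no further input is consumed once n records have been read.

```python
from collections import defaultdict


def head(records, n=10, group_by=()):
    """First n records overall, or the first n per distinct group-by value."""
    if n <= 0:
        return []
    seen = defaultdict(int)
    out = []
    for rec in records:
        key = tuple(rec.get(f) for f in group_by)
        if seen[key] < n:
            seen[key] += 1
            out.append(rec)
            if not group_by and seen[key] == n:
                break  # ungrouped: stop reading input as soon as n records are out
    return out
```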
|
|
|
|
================================================================
|
|
histogram
|
|
Just a histogram. Input values < lo or > hi are not counted.
|
|
Usage: mlr histogram [options]
|
|
-f {a,b,c} Value-field names for histogram counts
|
|
--lo {lo} Histogram low value
|
|
--hi {hi} Histogram high value
|
|
--nbins {n} Number of histogram bins. Defaults to 20.
|
|
--auto Automatically computes limits, ignoring --lo and --hi.
|
|
Holds all values in memory before producing any output.
|
|
-o {prefix} Prefix for output field name. Default: no prefix.
|
|
-h|--help Show this message.
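A rough Python model of equal-width binning with --lo, --hi, and --nbins follows. The function histogram is hypothetical. Placing a value exactly equal to hi in the last bin is an assumption of this sketch, and the returned (bin_lo, bin_hi, count) tuples do not reproduce Miller's output field names.

```python
def histogram(values, lo, hi, nbins=20):
    """Count values into nbins equal-width bins over [lo, hi].

    Values < lo or > hi are not counted. Assumption: v == hi goes in the last bin.
    """
    if hi <= lo or nbins <= 0:
        raise ValueError("need lo < hi and nbins > 0")
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for v in values:
        if v < lo or v > hi:
            continue  # out of range: not counted
        index = int((v - lo) / width)
        counts[min(index, nbins - 1)] += 1
    # Pair each bin's lower and upper edge with its count.
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]
```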
|
|
|
|
================================================================
|
|
json-parse
|
|
Usage: mlr json-parse [options]
|
|
Tries to convert string field values to parsed JSON, e.g. "[1,2,3]" -> [1,2,3].
|
|
Options:
|
|
-f {...} Comma-separated list of field names to json-parse (default all).
|
|
-k If supplied, then on parse fail for any cell, keep the (unparsable)
|
|
input value for the cell.
|
|
-h|--help Show this message.
|
|
|
|
================================================================
|
|
json-stringify
|
|
Usage: mlr json-stringify [options]
|
|
Produces string field values from field-value data, e.g. [1,2,3] -> "[1,2,3]".
|
|
Options:
|
|
-f {...} Comma-separated list of field names to json-stringify (default all).
|
|
--jvstack Produce multi-line JSON output.
|
|
--no-jvstack Produce single-line JSON output per record (default).
|
|
-h|--help Show this message.
|
|
|
|
================================================================
|
|
join
|
|
Usage: mlr join [options]
|
|
Joins records from specified left file name with records from all file names
|
|
at the end of the Miller argument list.
|
|
Functionality is essentially the same as the system "join" command, but for
|
|
record streams.
|
|
Options:
|
|
-f {left file name}
|
|
-j {a,b,c} Comma-separated join-field names for output
|
|
-l {a,b,c} Comma-separated join-field names for left input file;
|
|
defaults to -j values if omitted.
|
|
-r {a,b,c} Comma-separated join-field names for right input file(s);
|
|
defaults to -j values if omitted.
|
|
--lk|--left-keep-field-names {a,b,c} If supplied, this means keep only the specified field
|
|
names from the left file. Automatically includes the join-field name(s). Helpful
|
|
for when you only want a limited subset of information from the left file.
|
|
Tip: you can use --lk "": this means the left file becomes solely a row-selector
|
|
for the input files.
|
|
--lp {text} Additional prefix for non-join output field names from
|
|
the left file
|
|
--rp {text} Additional prefix for non-join output field names from
|
|
the right file(s)
|
|
--np Do not emit paired records
|
|
--ul Emit unpaired records from the left file
|
|
--ur Emit unpaired records from the right file(s)
|
|
-s|--sorted-input Require sorted input: records must be sorted
|
|
lexically by their join-field values, else not all records will
|
|
be paired. The only likely use case for this is with a left
|
|
file which is too big to fit into system memory otherwise.
|
|
-u Enable unsorted input. (This is the default even without -u.)
|
|
In this case, the entire left file will be loaded into memory.
|
|
--prepipe {command} As in main input options; see mlr --help for details.
|
|
If you wish to use a prepipe command for the main input as well
|
|
as here, it must be specified there as well as here.
|
|
--prepipex {command} Likewise.
|
|
File-format options default to those for the right file names on the Miller
|
|
argument list, but may be overridden for the left file as follows. Please see
|
|
the main "mlr --help" for more information on syntax for these arguments:
|
|
-i {one of csv,dkvp,nidx,pprint,xtab}
|
|
--irs {record-separator character}
|
|
--ifs {field-separator character}
|
|
--ips {pair-separator character}
|
|
--repifs
|
|
--implicit-csv-header
|
|
--implicit-tsv-header
|
|
--no-implicit-csv-header
|
|
--no-implicit-tsv-header
|
|
For example, if you have 'mlr --csv ... join -l foo ... ' then the left-file format will
|
|
be CSV as well unless you override with 'mlr --csv ... join --ijson -l foo' etc.
|
|
Likewise, if you have 'mlr --csv --implicit-csv-header ...' then the join-in file will be
|
|
expected to be headerless as well unless you put '--no-implicit-csv-header' after 'join'.
|
|
Please use "mlr --usage-separator-options" for information on specifying separators.
|
|
Please see https://miller.readthedocs.io/en/latest/reference-verbs.html#join for more information
|
|
including examples.
|
|
|
|
================================================================
|
|
label
|
|
Usage: mlr label [options] {new1,new2,new3,...}
|
|
Given n comma-separated names, renames the first n fields of each record to
|
|
have the respective name. (Fields past the nth are left with their original
|
|
names.) Particularly useful with --inidx or --implicit-csv-header, to give
|
|
useful names to otherwise integer-indexed fields.
|
|
|
|
Options:
|
|
-h|--help Show this message.
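For illustration, here is a short Python model of label's positional renaming. The function label is hypothetical, and the sketch does not handle collisions between a new name and a later field's existing name.

```python
def label(record, new_names):
    """Rename the first len(new_names) fields of a record, keeping field order."""
    names = new_names.split(",")
    out = {}
    for i, (key, value) in enumerate(record.items()):
        # Fields past the nth keep their original names.
        out[names[i] if i < len(names) else key] = value
    return out
```

With --implicit-csv-header input, where fields arrive keyed as "1", "2", "3", label(rec, "x,y") gives x, y, and 3.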
|
|
|
|
================================================================
|
|
latin1-to-utf8
|
|
Usage: mlr latin1-to-utf8, with no options.
|
|
Recursively converts record strings from Latin-1 to UTF-8.
|
|
For field-level control, please see the latin1_to_utf8 DSL function.
|
|
Options:
|
|
-h|--help Show this message.
|
|
|
|
================================================================
|
|
least-frequent
|
|
Usage: mlr least-frequent [options]
|
|
Shows the least frequently occurring distinct values for specified field names.
|
|
The first entry is the statistical anti-mode; the remaining are runners-up.
|
|
Options:
|
|
-f {one or more comma-separated field names}. Required flag.
|
|
-n {count}. Optional flag defaulting to 10.
|
|
-b Suppress counts; show only field values.
|
|
-o {name} Field name for output count. Default "count".
|
|
See also "mlr most-frequent".
|
|
|
|
================================================================
|
|
merge-fields
|
|
Usage: mlr merge-fields [options]
|
|
Computes univariate statistics for each input record, accumulated across
|
|
specified fields.
|
|
Options:
|
|
-a {sum,count,...} Names of accumulators. One or more of:
|
|
count Count instances of fields
|
|
null_count Count number of empty-string/JSON-null instances per field
|
|
distinct_count Count number of distinct values per field
|
|
mode Find most-frequently-occurring values for fields; first-found wins tie
|
|
antimode Find least-frequently-occurring values for fields; first-found wins tie
|
|
sum Compute sums of specified fields
|
|
mean Compute averages (sample means) of specified fields
|
|
var Compute sample variance of specified fields
|
|
stddev Compute sample standard deviation of specified fields
|
|
meaneb Estimate error bars for averages (assuming no sample autocorrelation)
|
|
skewness Compute sample skewness of specified fields
|
|
kurtosis Compute sample kurtosis of specified fields
|
|
min Compute minimum values of specified fields
|
|
max Compute maximum values of specified fields
|
|
minlen Compute minimum string-lengths of specified fields
|
|
maxlen Compute maximum string-lengths of specified fields
|
|
-f {a,b,c} Value-field names on which to compute statistics. Requires -o.
|
|
-r {a,b,c} Regular expressions for value-field names on which to compute
|
|
statistics. Requires -o.
|
|
-c {a,b,c} Substrings for collapse mode. All fields which have the same names
|
|
after removing substrings will be accumulated together. Please see
|
|
examples below.
|
|
-i Use interpolated percentiles, like R's type=7; default like type=1.
|
|
Not sensical for string-valued fields.
|
|
-o {name} Output field basename for -f/-r.
|
|
-k Keep the input fields which contributed to the output statistics;
|
|
the default is to omit them.
|
|
|
|
String-valued data make sense unless arithmetic on them is required,
|
|
e.g. for sum, mean, interpolated percentiles, etc. In case of mixed data,
|
|
numbers are less than strings.
|
|
|
|
Example input data: "a_in_x=1,a_out_x=2,b_in_y=4,b_out_x=8".
|
|
Example: mlr merge-fields -a sum,count -f a_in_x,a_out_x -o foo
|
|
produces "b_in_y=4,b_out_x=8,foo_sum=3,foo_count=2" since "a_in_x,a_out_x" are
|
|
summed over.
|
|
Example: mlr merge-fields -a sum,count -r in_,out_ -o bar
|
|
produces "bar_sum=15,bar_count=4" since all four fields are summed over.
|
|
Example: mlr merge-fields -a sum,count -c in_,out_
|
|
produces "a_x_sum=3,a_x_count=2,b_y_sum=4,b_y_count=1,b_x_sum=8,b_x_count=1"
|
|
since "a_in_x" and "a_out_x" both collapse to "a_x", "b_in_y" collapses to
|
|
"b_y", and "b_out_x" collapses to "b_x".
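The -c collapse example above can be reproduced with a small Python model restricted to the sum and count accumulators. The function merge_fields_collapse is hypothetical. The sketch assumes numeric values, removes only the first occurrence of the matched substring, and does not model -k.

```python
def merge_fields_collapse(record, substrings):
    """Model of `merge-fields -a sum,count -c ...` for numeric fields."""
    groups = {}  # collapsed name -> (sum, count), in first-seen order
    kept = {}
    for name, value in record.items():
        for sub in substrings:
            if sub in name:
                short = name.replace(sub, "", 1)  # e.g. a_in_x -> a_x
                total, count = groups.get(short, (0, 0))
                groups[short] = (total + value, count + 1)
                break
        else:
            kept[name] = value  # fields matching no substring pass through
    out = dict(kept)
    for short, (total, count) in groups.items():
        out[f"{short}_sum"] = total
        out[f"{short}_count"] = count
    return out
```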
|
|
|
|
================================================================
|
|
most-frequent
|
|
Usage: mlr most-frequent [options]
|
|
Shows the most frequently occurring distinct values for specified field names.
|
|
The first entry is the statistical mode; the remaining are runners-up.
|
|
Options:
|
|
-f {one or more comma-separated field names}. Required flag.
|
|
-n {count}. Optional flag defaulting to 10.
|
|
-b Suppress counts; show only field values.
|
|
-o {name} Field name for output count. Default "count".
|
|
See also "mlr least-frequent".
|
|
|
|
================================================================
|
|
nest
|
|
Usage: mlr nest [options]
|
|
Explodes specified field values into separate fields/records, or reverses this.
|
|
Options:
|
|
--explode,--implode One is required.
|
|
--values,--pairs One is required.
|
|
--across-records,--across-fields One is required.
|
|
-f {field name} Required.
|
|
--nested-fs {string} Defaults to ";". Field separator for nested values.
|
|
--nested-ps {string} Defaults to ":". Pair separator for nested key-value pairs.
|
|
--evar {string} Shorthand for --explode --values --across-records --nested-fs {string}
|
|
--ivar {string} Shorthand for --implode --values --across-records --nested-fs {string}
|
|
Please use "mlr --usage-separator-options" for information on specifying separators.
|
|
|
|
Examples:
|
|
|
|
mlr nest --explode --values --across-records -f x
|
|
with input record "x=a;b;c,y=d" produces output records
|
|
"x=a,y=d"
|
|
"x=b,y=d"
|
|
"x=c,y=d"
|
|
Use --implode to do the reverse.
|
|
|
|
mlr nest --explode --values --across-fields -f x
|
|
with input record "x=a;b;c,y=d" produces the output record
|
|
"x_1=a,x_2=b,x_3=c,y=d"
|
|
Use --implode to do the reverse.
|
|
|
|
mlr nest --explode --pairs --across-records -f x
|
|
with input record "x=a:1;b:2;c:3,y=d" produces output records
|
|
"a=1,y=d"
|
|
"b=2,y=d"
|
|
"c=3,y=d"
|
|
|
|
mlr nest --explode --pairs --across-fields -f x
|
|
with input record "x=a:1;b:2;c:3,y=d" produces the output record
|
|
"a=1,b=2,c=3,y=d"
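Three of the explode examples above can be modeled in Python as follows. The function names are hypothetical. The sketch assumes every nested pair contains the pair separator and does not model --implode.

```python
def explode_values_across_records(record, field, nested_fs=";"):
    """--explode --values --across-records: one output record per nested value."""
    out = []
    for piece in record[field].split(nested_fs):
        new = dict(record)
        new[field] = piece  # same key, so field order is unchanged
        out.append(new)
    return out


def explode_values_across_fields(record, field, nested_fs=";"):
    """--explode --values --across-fields: x=a;b;c becomes x_1=a,x_2=b,x_3=c in place."""
    out = {}
    for key, value in record.items():
        if key == field:
            for i, piece in enumerate(value.split(nested_fs), start=1):
                out[f"{field}_{i}"] = piece
        else:
            out[key] = value
    return out


def explode_pairs_across_fields(record, field, nested_fs=";", nested_ps=":"):
    """--explode --pairs --across-fields: x=a:1;b:2 becomes a=1,b=2 in place."""
    out = {}
    for key, value in record.items():
        if key == field:
            for pair in value.split(nested_fs):
                k, _, v = pair.partition(nested_ps)
                out[k] = v
        else:
            out[key] = value
    return out
```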
|
|
|
|
Notes:
|
|
* With --pairs, --implode doesn't make sense since the original field name has
|
|
been lost.
|
|
* The combination "--implode --values --across-records" is non-streaming:
|
|
no output records are produced until all input records have been read. In
|
|
particular, this means it won't work in `tail -f` contexts. But all other flag
|
|
combinations result in streaming (`tail -f` friendly) data processing.
|
|
If input is coming from `tail -f`, be sure to use `--records-per-batch 1`.
|
|
* It's up to you to ensure that the nested-fs is distinct from your data's IFS:
|
|
e.g. by default the former is semicolon and the latter is comma.
|
|
See also mlr reshape.
|
|
|
|
================================================================
|
|
nothing
|
|
Usage: mlr nothing [options]
|
|
Drops all input records. Useful for testing, or after tee/print/etc. have
|
|
produced other output.
|
|
Options:
|
|
-h|--help Show this message.
|
|
|
|
================================================================
|
|
put
|
|
Usage: mlr put [options] {DSL expression}
|
|
Options:
|
|
-f {file name} File containing a DSL expression (see examples below). If the filename
|
|
is a directory, all *.mlr files in that directory are loaded.
|
|
|
|
-e {expression} You can use this after -f to add an expression. Example use
|
|
case: define functions/subroutines in a file you specify with -f, then call
|
|
them with an expression you specify with -e.
|
|
|
|
(If you mix -e and -f then the expressions are evaluated in the order encountered.
|
|
Since the expression pieces are simply concatenated, please be sure to use intervening
|
|
semicolons to separate expressions.)
|
|
|
|
-s name=value: Predefines out-of-stream variable @name to have the given value.
|
|
Thus mlr put -s foo=97 '$column += @foo' is like
|
|
mlr put 'begin {@foo = 97} $column += @foo'.
|
|
The value part is subject to type-inferencing.
|
|
May be specified more than once, e.g. -s name1=value1 -s name2=value2.
|
|
Note: the value may be an environment variable, e.g. -s sequence=$SEQUENCE
|
|
|
|
-x (default false) Prints records for which {expression} evaluates to false, not true,
|
|
i.e. invert the sense of the filter expression.
|
|
|
|
-q Does not include the modified record in the output stream.
|
|
Useful for when all desired output is in begin and/or end blocks.
|
|
|
|
-S and -F: These are no-ops in Miller 6 and above, since now type-inferencing is done
|
|
by the record-readers before filter/put is executed. Supported as no-op pass-through
|
|
flags for backward compatibility.
|
|
|
|
-h|--help Show this message.
|
|
|
|
Parser-info options:
|
|
|
|
-w Print warnings about things like uninitialized variables.
|
|
|
|
-W Same as -w, but exit the process if there are any warnings.
|
|
|
|
-p Prints the expression's AST (abstract syntax tree) to stdout, which gives full
|

transparency on the precedence and associativity rules of Miller's grammar.
|
|
|
|
-d Like -p but uses a parenthesized-expression format for the AST.
|
|
|
|
-D Like -d but with output all on one line.
|
|
|
|
-E Echo the DSL expression before printing the parse tree.
|
|
|
|
-v Same as -E -p.
|
|
|
|
-X Exit after parsing but before stream-processing. Useful with -v/-d/-D, if you
|
|
only want to look at parser information.
|
|
|
|
Examples:
|
|
mlr --from example.csv put '$qr = $quantity * $rate'
|
|
More example put expressions:
|
|
If-statements:
|
|
'if ($flag == true) { $quantity *= 10}'
|
|
'if ($x > 0.0) { $y=log10($x); $z=sqrt($y) } else {$y = 0.0; $z = 0.0}'
|
|
Newly created fields can be read after being written:
|
|
'$new_field = $index**2; $qn = $quantity * $new_field'
|
|
Regex-replacement:
|
|
'$name = sub($name, "http.*com"i, "")'
|
|
Regex-capture:
|
|
'if ($a =~ "([a-z]+)_([0-9]+)") { $b = "left_\1"; $c = "right_\2" }'
|
|
Built-in variables:
|
|
'$filename = FILENAME'
|
|
Aggregations (use mlr put -q):
|
|
'@sum += $x; end {emit @sum}'
|
|
'@sum[$shape] += $quantity; end {emit @sum, "shape"}'
|
|
'@sum[$shape][$color] += $x; end {emit @sum, "shape", "color"}'
|
|
'
|
|
@min = min(@min,$x);
|
|
@max=max(@max,$x);
|
|
end{emitf @min, @max}
|
|
'
|
|
|
|
See also https://miller.readthedocs.io/reference-dsl for more context.
|
|
|
|
================================================================
|
|
regularize
|
|
Usage: mlr regularize [options]
|
|
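For records seen earlier in the data stream with the same field names in a different order, outputs them with the field names in the previously encountered order.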
|
|
Options:
|
|
-h|--help Show this message.
|
|
|
|
================================================================
|
|
remove-empty-columns
|
|
Usage: mlr remove-empty-columns [options]
|
|
Omits fields which are empty on every input row. Non-streaming.
|
|
Options:
|
|
-h|--help Show this message.
|
|
|
|
================================================================
|
|
rename
|
|
Usage: mlr rename [options] {old1,new1,old2,new2,...}
|
|
Renames specified fields.
|
|
Options:
|
|
-r Treat old field names as regular expressions. "ab", "a.*b"
|
|
will match any field name containing the substring "ab" or
|
|
matching "a.*b", respectively; anchors of the form "^ab$",
|
|
"^a.*b$" may be used. New field names may be plain strings,
|
|
or may contain capture groups of the form "\1" through
|
|
"\9". Wrapping the regex in double quotes is optional, but
|
|
is required if you wish to follow it with 'i' to indicate
|
|
case-insensitivity.
|
|
-g Do global replacement within each field name rather than
|
|
first-match replacement.
|
|
-h|--help Show this message.
|
|
Examples:
|
|
mlr rename old_name,new_name
|
|
mlr rename old_name_1,new_name_1,old_name_2,new_name_2
|
|
mlr rename -r 'Date_[0-9]+,Date' Rename all such fields to be "Date"
|
|
mlr rename -r '"Date_[0-9]+",Date' Same
|
|
mlr rename -r 'Date_([0-9]+).*,\1' Rename all such fields to be of the form 20151015
|
|
mlr rename -r '"name"i,Name' Rename "name", "Name", "NAME", etc. to "Name"
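A minimal Python sketch of how rename -r with capture groups and -g could behave on one record. It assumes Python's re module approximates Miller's regex dialect, and that only the matched portion of a field name is replaced (which is what the trailing ".*" in the examples above suggests). The function name rename_fields is hypothetical.

```python
import re


def rename_fields(record, pattern, replacement, global_replace=False, ignore_case=False):
    """Hypothetical sketch of `mlr rename -r` applied to one record (a dict).

    pattern         regex searched for in each field name (unanchored)
    replacement     new text; may use \\1 through \\9 for capture groups
    global_replace  like -g: replace every match, not just the first
    ignore_case     like a trailing "i" on a double-quoted regex
    """
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    count = 0 if global_replace else 1  # re.sub: count=0 means "all matches"
    out = {}
    for name, value in record.items():
        new_name = regex.sub(replacement, name, count=count)  # unchanged if no match
        # If two fields end up with the same name, the later one wins here;
        # Miller's exact collision handling is not modeled.
        out[new_name] = value
    return out


print(rename_fields({"Date_20151015_x": 7, "other": 1}, r"Date_([0-9]+).*", r"\1"))
# {'20151015': 7, 'other': 1}
print(rename_fields({"abab": 1}, "ab", "X", global_replace=True))  # {'XX': 1}
```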
|
|
|
|
================================================================
|
|
reorder
|
|
Usage: mlr reorder [options]
|
|
Moves specified names to start of record, or end of record.
|
|
Options:
|
|
-e Put specified field names at record end: default is to put them at record start.
|
|
-f {a,b,c} Field names to reorder.
|
|
-b {x} Put field names specified with -f before field name specified by {x},
|
|
if any. If {x} isn't present in a given record, the specified fields
|
|
will not be moved.
|
|
-a {x} Put field names specified with -f after field name specified by {x},
|
|
if any. If {x} isn't present in a given record, the specified fields
|
|
will not be moved.
|
|
-h|--help Show this message.
|
|
|
|
Examples:
|
|
mlr reorder -f a,b sends input record "d=4,b=2,a=1,c=3" to "a=1,b=2,d=4,c=3".
|
|
mlr reorder -e -f a,b sends input record "d=4,b=2,a=1,c=3" to "d=4,c=3,a=1,b=2".
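The two reorder examples above can be reproduced with a small Python sketch over an insertion-ordered dict. The function name reorder is hypothetical, and the -b/-a options are not modeled.

```python
def reorder(record, fields, at_end=False):
    """Sketch of `mlr reorder -f` (default) and `mlr reorder -e -f` for one record."""
    moved = [f for f in fields if f in record]      # absent names are ignored
    others = [k for k in record if k not in moved]  # keep original relative order
    order = others + moved if at_end else moved + others
    return {k: record[k] for k in order}


rec = {"d": 4, "b": 2, "a": 1, "c": 3}
print(reorder(rec, ["a", "b"]))               # {'a': 1, 'b': 2, 'd': 4, 'c': 3}
print(reorder(rec, ["a", "b"], at_end=True))  # {'d': 4, 'c': 3, 'a': 1, 'b': 2}
```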
|
|
|
|
================================================================
|
|
repeat
|
|
Usage: mlr repeat [options]
|
|
Copies input records to output records multiple times.
|
|
Options must be exactly one of the following:
|
|
-n {repeat count} Repeat each input record this many times.
|
|
-f {field name} Same, but take the repeat count from the specified
|
|
field name of each input record.
|
|
-h|--help Show this message.
|
|
Example:
|
|
echo x=0 | mlr repeat -n 4 then put '$x=urand()'
|
|
produces:
|
|
x=0.488189
|
|
x=0.484973
|
|
x=0.704983
|
|
x=0.147311
|
|
Example:
|
|
echo a=1,b=2,c=3 | mlr repeat -f b
|
|
produces:
|
|
a=1,b=2,c=3
|
|
a=1,b=2,c=3
|
|
Example:
|
|
echo a=1,b=2,c=3 | mlr repeat -f c
|
|
produces:
|
|
a=1,b=2,c=3
|
|
a=1,b=2,c=3
|
|
a=1,b=2,c=3
|
|
|
|
================================================================
|
|
reshape
|
|
Usage: mlr reshape [options]
|
|
Wide-to-long options:
|
|
-i {input field names} -o {key-field name,value-field name}
|
|
-r {input field regex} -o {key-field name,value-field name}
|
|
These pivot/reshape the input data such that the input fields are removed
|
|
and separate records are emitted for each key/value pair.
|
|
Note: if you have multiple regexes, please specify them using multiple -r,
|
|
since regexes can contain commas within them.
|
|
Note: this works with tail -f and produces output records for each input
|
|
record seen. If input is coming from `tail -f`, be sure to use
|
|
`--records-per-batch 1`.
|
|
Long-to-wide options:
|
|
-s {key-field name,value-field name}
|
|
These pivot/reshape the input data to undo the wide-to-long operation.
|
|
Note: this does not work with tail -f; it produces output records only after
|
|
all input records have been read.
|
|
|
|
Examples:
|
|
|
|
Input file "wide.txt":
|
|
time X Y
|
|
2009-01-01 0.65473572 2.4520609
|
|
2009-01-02 -0.89248112 0.2154713
|
|
2009-01-03 0.98012375 1.3179287
|
|
|
|
mlr --pprint reshape -i X,Y -o item,value wide.txt
|
|
time item value
|
|
2009-01-01 X 0.65473572
|
|
2009-01-01 Y 2.4520609
|
|
2009-01-02 X -0.89248112
|
|
2009-01-02 Y 0.2154713
|
|
2009-01-03 X 0.98012375
|
|
2009-01-03 Y 1.3179287
|
|
|
|
mlr --pprint reshape -r '[A-Z]' -o item,value wide.txt
|
|
time item value
|
|
2009-01-01 X 0.65473572
|
|
2009-01-01 Y 2.4520609
|
|
2009-01-02 X -0.89248112
|
|
2009-01-02 Y 0.2154713
|
|
2009-01-03 X 0.98012375
|
|
2009-01-03 Y 1.3179287
|
|
|
|
Input file "long.txt":
|
|
time item value
|
|
2009-01-01 X 0.65473572
|
|
2009-01-01 Y 2.4520609
|
|
2009-01-02 X -0.89248112
|
|
2009-01-02 Y 0.2154713
|
|
2009-01-03 X 0.98012375
|
|
2009-01-03 Y 1.3179287
|
|
|
|
mlr --pprint reshape -s item,value long.txt
|
|
time X Y
|
|
2009-01-01 0.65473572 2.4520609
|
|
2009-01-02 -0.89248112 0.2154713
|
|
2009-01-03 0.98012375 1.3179287
|
|
See also mlr nest.
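A minimal Python sketch of both reshape directions, using rows from the wide.txt/long.txt data above. Assumptions: records are dicts, wide-to-long takes an explicit field list (as with -i), records containing none of those fields pass through unchanged, and long-to-wide groups records by the values of their remaining fields while ignoring records that lack the key or value field. The function names are hypothetical.

```python
def wide_to_long(records, input_fields, key_name, value_name):
    """Sketch of `reshape -i ... -o key_name,value_name`."""
    out = []
    for rec in records:
        pairs = [(k, v) for k, v in rec.items() if k in input_fields]
        if not pairs:
            out.append(rec)  # assumption: nothing to pivot, pass through
            continue
        base = {k: v for k, v in rec.items() if k not in input_fields}
        for k, v in pairs:
            out.append({**base, key_name: k, value_name: v})
    return out


def long_to_wide(records, key_name, value_name):
    """Sketch of `reshape -s key_name,value_name`; needs all input first."""
    merged = {}  # values of the other fields -> output record
    for rec in records:
        if key_name not in rec or value_name not in rec:
            continue  # simplification: such records are dropped here
        base = {k: v for k, v in rec.items() if k not in (key_name, value_name)}
        out = merged.setdefault(tuple(base.items()), dict(base))
        out[rec[key_name]] = rec[value_name]
    return list(merged.values())


wide = [
    {"time": "2009-01-01", "X": "0.65473572", "Y": "2.4520609"},
    {"time": "2009-01-02", "X": "-0.89248112", "Y": "0.2154713"},
]
long = wide_to_long(wide, ["X", "Y"], "item", "value")
for rec in long:
    print(rec)
print(long_to_wide(long, "item", "value") == wide)  # True
```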
|
|
|
|
================================================================
|
|
sample
|
|
Usage: mlr sample [options]
|
|
Reservoir sampling (subsampling without replacement), optionally by category.
|
|
See also mlr bootstrap and mlr shuffle.
|
|
Options:
|
|
-g {a,b,c} Optional: group-by-field names for samples, e.g. a,b,c.
|
|
-k {k} Required: number of records to output in total, or by group if using -g.
|
|
-h|--help Show this message.
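Reservoir sampling keeps a fixed-size uniform sample of a stream whose length is not known in advance. Below is a Python sketch of the classic "Algorithm R" for a single group; with -g one would keep a separate reservoir and counter per group. It illustrates the technique named above and is not taken from Miller's source; the function name is hypothetical.

```python
import random


def reservoir_sample(records, k, rng=None):
    """Uniform sample of up to k items from an iterable, without replacement."""
    rng = rng or random.Random()
    reservoir = []
    for i, rec in enumerate(records):
        if i < k:
            reservoir.append(rec)   # fill the reservoir first
        else:
            j = rng.randint(0, i)   # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = rec  # item i is kept with probability k/(i+1)
    return reservoir


print(reservoir_sample(range(1000), 5, random.Random(42)))
```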
|
|
|
|
================================================================
|
|
sec2gmtdate
|
|
Usage: mlr sec2gmtdate {comma-separated list of field names}
|
|
Replaces a numeric field representing seconds since the epoch with the
|
|
corresponding GMT year-month-day timestamp; leaves non-numbers as-is.
|
|
This is nothing more than a keystroke-saver for the sec2gmtdate function:
|
|
mlr sec2gmtdate time1,time2
|
|
is the same as
|
|
mlr put '$time1 = sec2gmtdate($time1); $time2 = sec2gmtdate($time2)'
|
|
|
|
================================================================
|
|
sec2gmt
|
|
Usage: mlr sec2gmt [options] {comma-separated list of field names}
|
|
Replaces a numeric field representing seconds since the epoch with the
|
|
corresponding GMT timestamp; leaves non-numbers as-is. This is nothing
|
|
more than a keystroke-saver for the sec2gmt function:
|
|
mlr sec2gmt time1,time2
|
|
is the same as
|
|
mlr put '$time1 = sec2gmt($time1); $time2 = sec2gmt($time2)'
|
|
Options:
|
|
-1 through -9: format the seconds using 1..9 decimal places, respectively.
|
|
--millis Input numbers are treated as milliseconds since the epoch.
|
|
--micros Input numbers are treated as microseconds since the epoch.
|
|
--nanos Input numbers are treated as nanoseconds since the epoch.
|
|
-h|--help Show this message.
|
|
|
|
================================================================
|
|
seqgen
|
|
Usage: mlr seqgen [options]
|
|
Discards the input record stream and produces a sequence of counters, as
|

specified by the options.
|
|
|
|
Options:
|
|
-f {name} (default "i") Field name for counters.
|
|
--start {value} (default 1) Inclusive start value.
|
|
--step {value} (default 1) Step value.
|
|
--stop {value} (default 100) Inclusive stop value.
|
|
-h|--help Show this message.
|
|
Start, stop, and/or step may be floating-point. Output is integer if start,
|
|
stop, and step are all integers. Step may be negative. It may not be zero
|
|
unless start == stop.
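The counter rules above (inclusive stop, negative steps allowed, integer output only when start, stop, and step are all integers) can be sketched in Python. The function name and the single-record behavior when step is zero and start equals stop are assumptions. Repeated float addition can accumulate rounding error, so a floating-point stop value may or may not be hit exactly.

```python
def seqgen(start=1, stop=100, step=1, field="i"):
    """Sketch of `mlr seqgen`: returns a list of one-field records."""
    if step == 0:
        if start != stop:
            raise ValueError("step may not be zero unless start == stop")
        return [{field: start}]  # assumption: emit the single value once
    out = []
    value = start
    # Python keeps ints as ints unless a float is involved, mirroring the rule above.
    while (value <= stop) if step > 0 else (value >= stop):
        out.append({field: value})
        value += step
    return out


print([r["i"] for r in seqgen(1, 5)])                  # [1, 2, 3, 4, 5]
print([r["i"] for r in seqgen(5, 1, -2)])              # [5, 3, 1]
print([r["x"] for r in seqgen(1, 2, 0.5, field="x")])  # [1, 1.5, 2.0]
```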
|
|
|
|
================================================================
|
|
shuffle
|
|
Usage: mlr shuffle [options]
|
|
Outputs records randomly permuted. No output records are produced until
|
|
all input records are read. See also mlr bootstrap and mlr sample.
|
|
Options:
|
|
-h|--help Show this message.
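Randomly permuting a fully buffered list is typically done with the Fisher-Yates shuffle; a Python sketch follows. Whether Miller uses exactly this algorithm is not stated here, and the function name is hypothetical.

```python
import random


def shuffle_records(records, rng=None):
    """Fisher-Yates shuffle of a copy of the input; every permutation is equally likely."""
    rng = rng or random.Random()
    out = list(records)  # like mlr shuffle, needs all records before any output
    for i in range(len(out) - 1, 0, -1):
        j = rng.randint(0, i)  # 0 <= j <= i
        out[i], out[j] = out[j], out[i]
    return out


print(shuffle_records(["a", "b", "c", "d"], random.Random(1)))
```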
|
|
|
|
================================================================
|
|
skip-trivial-records
|
|
Usage: mlr skip-trivial-records [options]
|
|
Passes through all records except those with zero fields,
|
|
or those for which all fields have empty values.
|
|
Options:
|
|
-h|--help Show this message.
|
|
|
|
================================================================
|
|
sort
|
|
Usage: mlr sort {flags}
|
|
Sorts records primarily by the first specified field, secondarily by the second
|
|
field, and so on. (Any records not having all specified sort keys will appear
|
|
at the end of the output, in the order they were encountered, regardless of the
|
|
specified sort order.) The sort is stable: records that compare equal will sort
|
|
in the order they were encountered in the input record stream.
|
|
|
|
Options:
|
|
-f {comma-separated field names} Lexical ascending
|
|
-r {comma-separated field names} Lexical descending
|
|
-c {comma-separated field names} Case-folded lexical ascending
|
|
-cr {comma-separated field names} Case-folded lexical descending
|
|
-n {comma-separated field names} Numerical ascending; nulls sort last
|
|
-nf {comma-separated field names} Same as -n
|
|
-nr {comma-separated field names} Numerical descending; nulls sort first
|
|
-t {comma-separated field names} Natural ascending
|
|
-tr|-rt {comma-separated field names} Natural descending
|
|
-h|--help Show this message.
|
|
|
|
Example:
|
|
mlr sort -f a,b -nr x,y,z
|
|
which is the same as:
|
|
mlr sort -f a -f b -nr x -nr y -nr z
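Because the sort is stable, a multi-key sort with mixed directions can be sketched in Python as successive stable sorts from the least-significant to the most-significant key, with records missing any sort key appended afterward in arrival order. Only -f, -r, -nf, and -nr are modeled, numeric fields are assumed to parse as floats (null handling is not modeled), and the function name is hypothetical.

```python
def mlr_sort(records, keys):
    """keys: list of (field, flag) pairs, flag one of "f", "r", "nf", "nr"."""
    fields = [f for f, _ in keys]
    sortable = [r for r in records if all(f in r for f in fields)]
    missing = [r for r in records if not all(f in r for f in fields)]
    for field, flag in reversed(keys):  # least-significant key first
        if flag in ("f", "r"):
            key = lambda r, f=field: str(r[f])    # lexical
        else:
            key = lambda r, f=field: float(r[f])  # numerical
        # list.sort is stable, including with reverse=True.
        sortable.sort(key=key, reverse=flag in ("r", "nr"))
    return sortable + missing


recs = [{"a": "x", "n": 1}, {"a": "y", "n": 3}, {"a": "x", "n": 2}, {"b": 1}]
print(mlr_sort(recs, [("a", "f"), ("n", "nr")]))
# [{'a': 'x', 'n': 2}, {'a': 'x', 'n': 1}, {'a': 'y', 'n': 3}, {'b': 1}]
```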
|
|
|
|
================================================================
|
|
sort-within-records
|
|
Usage: mlr sort-within-records [options]
|
|
Outputs records sorted lexically ascending by keys.
|
|
Options:
|
|
-r Recursively sort subobjects/submaps, e.g. for JSON input.
|
|
-h|--help Show this message.
|
|
|
|
================================================================
|
|
split
|
|
Usage: mlr split [options] {filename}
|
|
Options:
|
|
-n {n}: Cap file sizes at N records.
|
|
-m {m}: Produce M files, round-robining records among them.
|
|
-g {a,b,c}: Write separate files with records having distinct values for fields named a,b,c.
|
|
Exactly one of -m, -n, or -g must be supplied.
|
|
--prefix {p} Specify filename prefix; default "split".
|
|
--suffix {s} Specify filename suffix; default is from mlr output format, e.g. "csv".
|
|
-a Append to existing file(s), if any, rather than overwriting.
|
|
-v Send records along to downstream verbs as well as splitting to files.
|
|
-e Do NOT URL-escape names of output files.
|
|
-j {J} Use string J to join filename parts; default "_".
|
|
-h|--help Show this message.
|
|
Any of the output-format command-line flags (see mlr -h). For example, using
|
|
mlr --icsv --from myfile.csv split --ojson -n 1000
|
|
the input is CSV, but the output files are JSON.
|
|
|
|
Examples: Suppose myfile.csv has 1,000,000 records.
|
|
|
|
100 output files, 10,000 records each. First 10,000 records in split_1.csv, next in split_2.csv, etc.
|
|
mlr --csv --from myfile.csv split -n 10000
|
|
|
|
10 output files, 100,000 records each. Records 1, 11, 21, etc. in split_1.csv; records 2, 12, 22, etc. in split_2.csv; etc.
|
|
mlr --csv --from myfile.csv split -m 10
|
|
Same, but with JSON output.
|
|
mlr --csv --from myfile.csv split -m 10 -o json
|
|
|
|
Same but instead of split_1.csv, split_2.csv, etc. there are test_1.dat, test_2.dat, etc.
|
|
mlr --csv --from myfile.csv split -m 10 --prefix test --suffix dat
|
|
Same, but written to the /tmp/ directory.
|
|
mlr --csv --from myfile.csv split -m 10 --prefix /tmp/test --suffix dat
|
|
|
|
If the shape field has values triangle and square, then there will be split_triangle.csv and split_square.csv.
|
|
mlr --csv --from myfile.csv split -g shape
|
|
|
|
If the color field has values yellow and green, and the shape field has values triangle and square,
|
|
then there will be split_yellow_triangle.csv, split_yellow_square.csv, etc.
|
|
mlr --csv --from myfile.csv split -g color,shape
|
|
|
|
See also the "tee" DSL function which lets you do more ad-hoc customization.
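The file-assignment rules in the examples above reduce to simple arithmetic on 1-up record numbers. Here is a Python sketch; the helper names are hypothetical, and URL-escaping of file names (disabled by -e) is not modeled.

```python
def file_index_by_count(record_number, n):
    """split -n n: records 1..n go to file 1, n+1..2n to file 2, and so on."""
    return (record_number - 1) // n + 1


def file_index_round_robin(record_number, m):
    """split -m m: records 1, m+1, 2m+1, ... go to file 1, and so on."""
    return (record_number - 1) % m + 1


def file_name_by_group(record, fields, prefix="split", suffix="csv", joiner="_"):
    """split -g: one file per distinct combination of the group-by values."""
    return joiner.join([prefix] + [str(record[f]) for f in fields]) + "." + suffix


print(file_index_by_count(10001, 10000))  # 2
print(file_index_round_robin(21, 10))     # 1
print(file_name_by_group({"color": "yellow", "shape": "triangle"}, ["color", "shape"]))
# split_yellow_triangle.csv
```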
|
|
|
|
================================================================
|
|
ssub
|
|
Usage: mlr ssub [options]
|
|
Replaces old string with new string in specified field(s), without regex support for
|
|
the old string, like the `ssub` DSL function. See also the `gsub` and `sub` verbs.
|
|
Options:
|
|
-f {a,b,c} Field names to convert.
|
|
-h|--help Show this message.
|
|
|
|
================================================================
|
|
stats1
|
|
Usage: mlr stats1 [options]
|
|
Computes univariate statistics for one or more given fields, accumulated across
|
|
the input record stream.
|
|
Options:
|
|
-a {sum,count,...} Names of accumulators: one or more of:
|
|
median This is the same as p50
|
|
p10 p25.2 p50 p98 p100 etc.
|
|
count Count instances of fields
|
|
null_count Count number of empty-string/JSON-null instances per field
|
|
distinct_count Count number of distinct values per field
|
|
mode Find most-frequently-occurring values for fields; first-found wins tie
|
|
antimode Find least-frequently-occurring values for fields; first-found wins tie
|
|
sum Compute sums of specified fields
|
|
mean Compute averages (sample means) of specified fields
|
|
var Compute sample variance of specified fields
|
|
stddev Compute sample standard deviation of specified fields
|
|
meaneb Estimate error bars for averages (assuming no sample autocorrelation)
|
|
skewness Compute sample skewness of specified fields
|
|
kurtosis Compute sample kurtosis of specified fields
|
|
min Compute minimum values of specified fields
|
|
max Compute maximum values of specified fields
|
|
minlen Compute minimum string-lengths of specified fields
|
|
maxlen Compute maximum string-lengths of specified fields
|
|
|
|
-f {a,b,c} Value-field names on which to compute statistics
|
|
--fr {regex} Regex for value-field names on which to compute statistics
|
|
(compute statistics on values in all field names matching regex)
|
|
--fx {regex} Inverted regex for value-field names on which to compute statistics
|
|
(compute statistics on values in all field names not matching regex)
|
|
|
|
-g {d,e,f} Optional group-by-field names
|
|
--gr {regex} Regex for optional group-by-field names
|
|
(group by values in field names matching regex)
|
|
--gx {regex} Inverted regex for optional group-by-field names
|
|
(group by values in field names not matching regex)
|
|
|
|
--grfx {regex} Shorthand for --gr {regex} --fx {that same regex}
|
|
|
|
-i Use interpolated percentiles, like R's type=7; default like type=1.
|
|
Not sensical for string-valued fields.
|
|
-s Print iterative stats. Useful in tail -f contexts, in which
|
|
case please avoid pprint-format output since end of input
|
|
stream will never be seen. Likewise, if input is coming from `tail -f`
|
|
be sure to use `--records-per-batch 1`.
|
|
-h|--help Show this message.
|
|
Example: mlr stats1 -a min,p10,p50,p90,max -f value -g size,shape
|
|
Example: mlr stats1 -a count,mode -f size
|
|
Example: mlr stats1 -a count,mode -f size -g shape
|
|
Example: mlr stats1 -a count,mode --fr '^[a-h].*$' --gr '^k.*$'
|
|
This computes count and mode statistics on all field names beginning
|
|
with a through h, grouped by all field names starting with k.
|
|
|
|
Notes:
|
|
* p50 and median are synonymous.
|
|
* min and max output the same results as p0 and p100, respectively, but use
|
|
less memory.
|
|
* String-valued data make sense unless arithmetic on them is required,
|
|
e.g. for sum, mean, interpolated percentiles, etc. In case of mixed data,
|
|
numbers are less than strings.
|
|
* count and mode allow text input; the rest require numeric input.
|
|
In particular, 1 and 1.0 are distinct text for count and mode.
|
|
* When there are mode ties, the first-encountered datum wins.
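The -i flag switches between two percentile definitions. Below is a Python sketch of R's type=7 (linear interpolation between order statistics) and R's type=1 (inverse empirical CDF), operating on an already-sorted list with p in [0, 100]. Miller's exact non-interpolated indexing may differ from this type=1 sketch in edge cases; the function names are hypothetical.

```python
import math


def percentile_type7(sorted_values, p):
    """Interpolated percentile, as in R's quantile(type=7)."""
    n = len(sorted_values)
    h = (n - 1) * p / 100.0
    lo = math.floor(h)
    hi = min(lo + 1, n - 1)  # guard against indexing past the end at p=100
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])


def percentile_type1(sorted_values, p):
    """Non-interpolated percentile, as in R's quantile(type=1)."""
    n = len(sorted_values)
    j = math.ceil(n * p / 100.0)  # 1-based rank
    return sorted_values[max(j, 1) - 1]


data = [1, 2, 3, 4]
print(percentile_type7(data, 50), percentile_type1(data, 50))  # 2.5 2
```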
|
|
|
|
================================================================
|
|
stats2
|
|
Usage: mlr stats2 [options]
|
|
Computes bivariate statistics for one or more given field-name pairs,
|
|
accumulated across the input record stream.
|
|
-a {linreg-ols,corr,...} Names of accumulators: one or more of:
|
|
linreg-ols Linear regression using ordinary least squares
|
|
linreg-pca Linear regression using principal component analysis
|
|
r2 Quality metric for linreg-ols (linreg-pca emits its own)
|
|
logireg Logistic regression
|
|
corr Sample correlation
|
|
cov Sample covariance
|
|
covx Sample-covariance matrix
|
|
-f {a,b,c,d} Value-field name-pairs on which to compute statistics.
|
|
There must be an even number of names.
|
|
-g {e,f,g} Optional group-by-field names.
|
|
-v Print additional output for linreg-pca.
|
|
-s Print iterative stats. Useful in tail -f contexts, in which
|
|
case please avoid pprint-format output since end of input
|
|
stream will never be seen. Likewise, if input is coming from
|
|
`tail -f`, be sure to use `--records-per-batch 1`.
|
|
--fit Rather than printing regression parameters, applies them to
|
|
the input data to compute new fit fields. All input records are
|
|
held in memory until end of input stream. Has effect only for
|
|
linreg-ols, linreg-pca, and logireg.
|
|
Only one of -s or --fit may be used.
|
|
Example: mlr stats2 -a linreg-pca -f x,y
|
|
Example: mlr stats2 -a linreg-ols,r2 -f x,y -g size,shape
|
|
Example: mlr stats2 -a corr -f x,y
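For reference, the ordinary-least-squares fit y = m*x + b can be computed from running sums (n, sum of x, sum of y, sum of x squared, sum of x*y), which is what makes it usable in a streaming setting; with --fit each record's fitted value would be m*x + b. A Python sketch of that formula and of the sample correlation, assuming at least two distinct x values. The function names are hypothetical, and Miller's output field names are not reproduced.

```python
import math


def linreg_ols(xs, ys):
    """Return slope m and intercept b minimizing sum((y - (m*x + b))**2)."""
    n = len(xs)
    sumx, sumy = sum(xs), sum(ys)
    sumx2 = sum(x * x for x in xs)
    sumxy = sum(x * y for x, y in zip(xs, ys))
    d = n * sumx2 - sumx * sumx  # zero if all x values are equal
    m = (n * sumxy - sumx * sumy) / d
    b = (sumx2 * sumy - sumx * sumxy) / d
    return m, b


def sample_corr(xs, ys):
    """Pearson correlation; the n-1 factors of covariance and variances cancel."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs, ys = [1, 2, 3, 4], [3, 5, 7, 9]
print(linreg_ols(xs, ys))  # (2.0, 1.0)
print(sample_corr(xs, ys))
```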
|
|
|
|
================================================================
|
|
step
|
|
Usage: mlr step [options]
|
|
Computes values dependent on earlier/later records, optionally grouped by category.
|
|
Options:
|
|
-a {delta,rsum,...} Names of steppers: comma-separated, one or more of:
|
|
counter Count instances of field(s) between successive records
|
|
delta Compute differences in field(s) between successive records
|
|
ewma Exponentially weighted moving average over successive records
|
|
from-first Compute differences in field(s) from first record
|
|
ratio Compute ratios in field(s) between successive records
|
|
rprod Compute running products of field(s) between successive records
|
|
rsum Compute running sums of field(s) between successive records
|
|
shift Alias for shift_lag
|
|
shift_lag Include value(s) in field(s) from the previous record, if any
|
|
shift_lead Include value(s) in field(s) from the next record, if any
|
|
slwin Sliding-window averages over m records back and n forward. E.g. slwin_7_2 for 7 back and 2 forward.
|
|
|
|
-f {a,b,c} Value-field names on which to compute statistics
|
|
-g {d,e,f} Optional group-by-field names
|
|
-F Computes integerable things (e.g. counter) in floating point.
|
|
As of Miller 6 this happens automatically, but the flag is accepted
|
|
as a no-op for backward compatibility with Miller 5 and below.
|
|
-d {x,y,z} Weights for EWMA. 1 means current sample gets all weight (no
|
|
smoothing), values just under 1 give light smoothing, values just above 0 give
|
|
heavy smoothing. Multiple weights may be specified, e.g.
|
|
"mlr step -a ewma -f sys_load -d 0.01,0.1,0.9". Default if omitted
|
|
is "-d 0.5".
|
|
-o {a,b,c} Custom suffixes for EWMA output fields. If omitted, these default to
|
|
the -d values. If supplied, the number of -o values must be the same
|
|
as the number of -d values.
|
|
-h|--help Show this message.
|
|
|
|
Examples:
|
|
mlr step -a rsum -f request_size
|
|
mlr step -a delta -f request_size -g hostname
|
|
mlr step -a ewma -d 0.1,0.9 -f x,y
|
|
mlr step -a ewma -d 0.1,0.9 -o smooth,rough -f x,y
|
|
mlr step -a ewma -d 0.1,0.9 -o smooth,rough -f x,y -g group_name
|
|
mlr step -a slwin_9_0,slwin_0_9 -f x
|
|
|
|
Please see https://miller.readthedocs.io/en/latest/reference-verbs.html#step or
|
|
https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average
|
|
for more information on EWMA.
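The EWMA recurrence is s_0 = x_0 and s_t = alpha*x_t + (1 - alpha)*s_(t-1), where alpha is one of the -d weights. A Python sketch, assuming the first value seeds the average as written; the function name is hypothetical.

```python
def ewma(values, alpha):
    """Exponentially weighted moving average; alpha=1 means no smoothing."""
    out = []
    smoothed = None
    for x in values:
        smoothed = x if smoothed is None else alpha * x + (1 - alpha) * smoothed
        out.append(smoothed)
    return out


print(ewma([1, 3, 5], 0.5))  # [1, 2.0, 3.5]
print(ewma([1, 3, 5], 1.0))  # [1, 3.0, 5.0]
```

Running this once per -d value gives the multiple smoothed outputs described above.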
|
|
|
|
================================================================
|
|
sub
|
|
Usage: mlr sub [options]
|
|
Replaces old string with new string in specified field(s), with regex support
|
|
for the old string and not handling multiple matches, like the `sub` DSL function.
|
|
See also the `gsub` and `ssub` verbs.
|
|
Options:
|
|
-f {a,b,c} Field names to convert.
|
|
-h|--help Show this message.
|
|
|
|
================================================================
|
|
summary
|
|
Usage: mlr summary [options]
|
|
Show summary statistics about the input data.
|
|
|
|
All summarizers:
|
|
field_type string, int, etc. -- if a column has mixed types, all encountered types are printed
|
|
count +1 for every instance of the field across all records in the input record stream
|
|
null_count count of field values either empty string or JSON null
|
|
distinct_count count of distinct values for the field
|
|
mode most-frequently-occurring value for the field
|
|
sum sum of field values
|
|
mean mean of the field values
|
|
stddev standard deviation of the field values
|
|
var variance of the field values
|
|
skewness skewness of the field values
|
|
minlen length of shortest string representation for the field
|
|
maxlen length of longest string representation for the field
|
|
min minimum field value
|
|
p25 first-quartile field value
|
|
median median field value
|
|
p75 third-quartile field value
|
|
max maximum field value
|
|
iqr interquartile range: p75 - p25
|
|
lof lower outer fence: p25 - 3.0 * iqr
|
|
lif lower inner fence: p25 - 1.5 * iqr
|
|
uif upper inner fence: p75 + 1.5 * iqr
|
|
uof upper outer fence: p75 + 3.0 * iqr
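The iqr and fence summarizers above are Tukey's boxplot fences. A Python sketch that derives them from given p25 and p75 values, plus the conventional reading that values beyond the outer fences are far outliers and values between an inner and an outer fence are mild outliers (that classification is a common convention, not a Miller summarizer). The function names are hypothetical.

```python
def fences(p25, p75):
    iqr = p75 - p25
    return {
        "iqr": iqr,
        "lof": p25 - 3.0 * iqr,
        "lif": p25 - 1.5 * iqr,
        "uif": p75 + 1.5 * iqr,
        "uof": p75 + 3.0 * iqr,
    }


def classify(x, f):
    if x < f["lof"] or x > f["uof"]:
        return "far outlier"
    if x < f["lif"] or x > f["uif"]:
        return "mild outlier"
    return "inside"


f = fences(2, 6)
print(f)  # {'iqr': 4, 'lof': -10.0, 'lif': -4.0, 'uif': 12.0, 'uof': 18.0}
print(classify(20, f), "/", classify(13, f), "/", classify(5, f))
```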
|
|
|
|
Default summarizers:
|
|
field_type count mean min max null_count distinct_count
|
|
|
|
Notes:
|
|
* min, p25, median, p75, and max work for strings as well as numbers
|
|
* Distinct-counts are computed on string representations -- so 4.1 and 4.10 are counted as distinct here.
|
|
* If the mode is not unique in the input data, the first-encountered value is reported as the mode.
|
|
|
|
Options:
|
|
-a {mean,sum,etc.} Use only the specified summarizers.
|
|
-x {mean,sum,etc.} Use all summarizers, except the specified ones.
|
|
--all Use all available summarizers.
|
|
-h|--help Show this message.
|
|
|
|
================================================================
|
|
tac
|
|
Usage: mlr tac [options]
|
|
Prints records in reverse order from the order in which they were encountered.
|
|
Options:
|
|
-h|--help Show this message.
|
|
|
|
================================================================
|
|
tail
|
|
Usage: mlr tail [options]
|
|
Passes through the last n records, optionally by category.
|
|
Options:
|
|
-g {a,b,c} Optional group-by-field names for tail counts, e.g. a,b,c.
|

-n {n} Tail-count to print. Default 10.
|
|
-h|--help Show this message.
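With -g, tail keeps the last n records per distinct combination of group-by values. A Python sketch using one bounded deque per group and emitting groups in first-seen order; dropping records that lack a group-by field is an assumption, and the function name is hypothetical.

```python
from collections import deque


def tail_by_group(records, n=10, group_by=()):
    groups = {}  # group-by values -> deque holding that group's last n records
    for rec in records:
        if not all(f in rec for f in group_by):
            continue  # assumption: records missing a group-by field are dropped
        key = tuple(rec[f] for f in group_by)
        if key not in groups:
            groups[key] = deque(maxlen=n)  # older records fall off the left end
        groups[key].append(rec)
    return [rec for dq in groups.values() for rec in dq]


recs = [{"g": "a", "i": 1}, {"g": "b", "i": 2}, {"g": "a", "i": 3}, {"g": "a", "i": 4}]
print(tail_by_group(recs, n=2, group_by=["g"]))
# [{'g': 'a', 'i': 3}, {'g': 'a', 'i': 4}, {'g': 'b', 'i': 2}]
```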
|
|
|
|
================================================================
|
|
tee
|
|
Usage: mlr tee [options] {filename}
|
|
Options:
|
|
-a Append to existing file, if any, rather than overwriting.
|
|
-p Treat filename as a pipe-to command.
|
|
Any of the output-format command-line flags (see mlr -h). Example: using
|
|
mlr --icsv --opprint put '...' then tee --ojson ./mytap.dat then stats1 ...
|
|
the input is CSV, the output is pretty-print tabular, but the tee-file output
|
|
is written in JSON format.
|
|
|
|
-h|--help Show this message.
|
|
|
|
================================================================
|
|
template
|
|
Usage: mlr template [options]
|
|
Places input-record fields in the order specified by list of column names.
|
|
If the input record is missing a specified field, it will be filled with the --fill-with value.
|
|
If the input record possesses an unspecified field, it will be discarded.
|
|
Options:
|
|
-f {a,b,c} Comma-separated field names for template, e.g. a,b,c.
|
|
-t {filename} CSV file whose header line will be used for template.
|
|
--fill-with {filler string} What to fill absent fields with. Defaults to the empty string.
|
|
-h|--help Show this message.
|
|
Example:
|
|
* Specified fields are a,b,c.
|
|
* Input record is c=3,a=1,f=6.
|
|
* Output record is a=1,b=,c=3.
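The example above amounts to a dict comprehension over the template fields. A Python sketch; the function name is hypothetical, and -t (reading the field list from a CSV header) is not modeled.

```python
def apply_template(record, fields, fill_with=""):
    """Sketch of `mlr template -f`: keep only `fields`, in that order, filling gaps."""
    return {f: record.get(f, fill_with) for f in fields}


print(apply_template({"c": 3, "a": 1, "f": 6}, ["a", "b", "c"]))  # {'a': 1, 'b': '', 'c': 3}
```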
|
|
|
|
================================================================
|
|
top
|
|
Usage: mlr top [options]
|
|
-f {a,b,c} Value-field names for top counts.
|
|
-g {d,e,f} Optional group-by-field names for top counts.
|
|
-n {count} How many records to print per category; default 1.
|
|
-a Print all fields for top-value records; default is
|
|
to print only value and group-by fields. Requires a single
|
|
value-field name only.
|
|
--min Print top smallest values; default is top largest values.
|
|
-F Keep top values as floats even if they look like integers.
|
|
-o {name} Field name for output indices. Default "top_idx".
|
|
This is ignored if -a is used.
|
|
Prints the n records with smallest/largest values at specified fields,
|
|
optionally by category. If -a is given, then the top records are emitted
|
|
with the same fields as they appeared in the input. Without -a, only fields
|
|
from -f, fields from -g, and the top-index field are emitted. For more information
|
|
please see https://miller.readthedocs.io/en/latest/reference-verbs#top
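Without -a, top reports the n largest (or, with --min, smallest) values per group. A Python sketch using heapq; the top_idx field name follows the -o default above, while the "{field}_top" value-field name, converting values to float, and skipping records that lack the value or group-by fields are assumptions. The function name is hypothetical.

```python
import heapq


def top_values(records, value_field, n=1, group_by=(), smallest=False):
    groups = {}  # group-by values -> list of numeric values seen
    for rec in records:
        if value_field not in rec or not all(f in rec for f in group_by):
            continue  # assumption: incomplete records are skipped
        key = tuple(rec[f] for f in group_by)
        groups.setdefault(key, []).append(float(rec[value_field]))
    pick = heapq.nsmallest if smallest else heapq.nlargest
    out = []
    for key, values in groups.items():
        for idx, value in enumerate(pick(n, values), start=1):
            row = dict(zip(group_by, key))
            row["top_idx"] = idx
            row[value_field + "_top"] = value  # hypothetical output field name
            out.append(row)
    return out


recs = [{"s": "a", "x": "1"}, {"s": "a", "x": "5"}, {"s": "b", "x": "3"}, {"s": "a", "x": "4"}]
print(top_values(recs, "x", n=2, group_by=["s"]))
```

Storing every value keeps the sketch short; a streaming version would keep only n values per group in a bounded heap.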
|
|
|
|
================================================================
|
|
utf8-to-latin1
|
|
Usage: mlr utf8-to-latin1, with no options.
|
|
Recursively converts record strings from UTF-8 to Latin-1.
|
|
For field-level control, please see the utf8_to_latin1 DSL function.
|
|
Options:
|
|
-h|--help Show this message.
|
|
|
|
================================================================
|
|
unflatten
|
|
Usage: mlr unflatten [options]
|
|
Reverses flatten. Example: field with name 'a.b.c' and value 4
|
|
becomes name 'a' and value '{"b": { "c": 4 }}'.
|
|
Options:
|
|
-f {a,b,c} Comma-separated list of field names to unflatten (default all).
|
|
-s {string} Separator, defaulting to mlr --flatsep value.
|
|
-h|--help Show this message.
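Unflattening splits each field name on the separator and builds nested maps. A Python sketch, assuming "." as the separator (consistent with the 'a.b.c' example above). A conflict such as a scalar field "a" alongside "a.b" is resolved here by replacing the scalar, which may not match Miller. The function name is hypothetical.

```python
def unflatten(record, sep="."):
    out = {}
    for name, value in record.items():
        *parents, leaf = name.split(sep)
        node = out
        for part in parents:
            child = node.get(part)
            if not isinstance(child, dict):
                child = {}  # create (or replace a scalar with) a submap
                node[part] = child
            node = child
        node[leaf] = value
    return out


print(unflatten({"a.b.c": 4, "x": 1}))  # {'a': {'b': {'c': 4}}, 'x': 1}
```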
|
|
|
|
================================================================
|
|
uniq
|
|
Usage: mlr uniq [options]
|
|
Prints distinct values for specified field names. With -c, same as
|
|
count-distinct. For uniq, -f is a synonym for -g.
|
|
|
|
Options:
|
|
-g {d,e,f} Group-by-field names for uniq counts.
|
|
-x {a,b,c} Field names to exclude for uniq: use each record's others instead.
|
|
-c Show repeat counts in addition to unique values.
|
|
-n Show only the number of distinct values.
|
|
-o {name} Field name for output count. Default "count".
|
|
-a Output each unique record only once. Incompatible with -g.
|
|
With -c, produces unique records, with repeat counts for each.
|
|
With -n, produces only one record which is the unique-record count.
|
|
With neither -c nor -n, produces unique records.
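Counting distinct group-by tuples, as uniq -g with -c does, can be sketched in Python with collections.Counter, which remembers first-seen order. The number of output rows is what -n reports. The function name is hypothetical, -x and -a are not modeled, skipping records that lack a group-by field is an assumption, and the count field name follows the -o default above.

```python
from collections import Counter


def uniq_counts(records, group_by, count_field="count"):
    counts = Counter(
        tuple(rec[f] for f in group_by)
        for rec in records
        if all(f in rec for f in group_by)  # assumption: skip incomplete records
    )
    return [{**dict(zip(group_by, key)), count_field: c} for key, c in counts.items()]


recs = [{"a": 1, "b": 2}, {"a": 1, "b": 3}, {"a": 1, "b": 2}]
print(uniq_counts(recs, ["a", "b"]))  # [{'a': 1, 'b': 2, 'count': 2}, {'a': 1, 'b': 3, 'count': 1}]
print(len(uniq_counts(recs, ["b"])))  # 2
```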
|
|
|
|
================================================================
|
|
unspace
|
|
Usage: mlr unspace [options]
|
|
Replaces spaces in record keys and/or values with _. This is helpful for PPRINT output.
|
|
Options:
|
|
-f {x} Replace spaces with specified filler character.
|
|
-k Unspace only keys, not keys and values.
|
|
-v Unspace only values, not keys and values.
|
|
-h|--help Show this message.
|
|
|
|
================================================================
|
|
unsparsify
|
|
Usage: mlr unsparsify [options]
|
|
Prints records with the union of field names over all input records.
|
|
For field names absent in a given record but present in others, fills in
|
|
a value. This verb retains all input before producing any output.
|
|
Options:
|
|
--fill-with {filler string} What to fill absent fields with. Defaults to
|
|
the empty string.
|
|
-f {a,b,c} Specify field names to be operated on. Any other fields won't be
|
|
modified, and operation will be streaming.
|
|
-h|--help Show this message.
|
|
Example: if the input is two records, one being 'a=1,b=2' and the other
|
|
being 'b=3,c=4', then the output is the two records 'a=1,b=2,c=' and
|
|
'a=,b=3,c=4'.
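The example above can be reproduced with a two-pass Python sketch: first collect the union of field names in first-seen order, then fill the gaps. The streaming -f variant only needs to add the named fields to each record. The function names are hypothetical, and appending missing -f fields at the end of the record is an assumption.

```python
def unsparsify(records, fill_with=""):
    """Non-streaming: every output record gets the union of all field names."""
    union = {}
    for rec in records:
        for name in rec:
            union.setdefault(name, None)  # dict keeps first-seen order
    return [{name: rec.get(name, fill_with) for name in union} for rec in records]


def unsparsify_fields(record, fields, fill_with=""):
    """Streaming -f variant: only the named fields are added if absent."""
    out = dict(record)
    for name in fields:
        out.setdefault(name, fill_with)
    return out


print(unsparsify([{"a": 1, "b": 2}, {"b": 3, "c": 4}]))
# [{'a': 1, 'b': 2, 'c': ''}, {'a': '', 'b': 3, 'c': 4}]
```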
|
|
================================================================
|