Under construction
Comments-in-data flags
Miller lets you put comments in your data, such as
# This is a comment for a CSV file
a,b,c
1,2,3
4,5,6
Notes:
- Comments are only honored at the start of a line.
- In the absence of any of the below four options, comments are data like any other text. (The comments-in-data feature is opt-in.)
- When --pass-comments is used, comment lines are written to standard output immediately upon being read; they are not part of the record stream. As a result, the ordering of comments relative to record output may be counterintuitive. A suggestion is to place comments at the start of data files.
Flags:
- --pass-comments: Immediately print commented lines (prefixed by #) within the input.
- --pass-comments-with {string}: Immediately print commented lines within input, with specified prefix.
- --skip-comments: Ignore commented lines (prefixed by #) within the input.
- --skip-comments-with {string}: Ignore commented lines within input, with specified prefix.
Compressed-data flags
Miller offers a few different ways to handle reading data files which have been compressed.
- Decompression done within the Miller process itself: --bz2in, --gzin, --zin
- Decompression done outside the Miller process: --prepipe, --prepipex
Using --prepipe and --prepipex you can specify an action to be
taken on each input file. The prepipe command must be able to read from
standard input; it will be invoked with {command} < {filename}. The
prepipex command must take a filename as argument; it will be invoked with
{command} {filename}.
Examples:
mlr --prepipe gunzip
mlr --prepipe 'zcat -cf'
mlr --prepipe 'xz -cd'
mlr --prepipe cat
Note that this feature is quite general and is not limited to decompression
utilities. You can use it to apply per-file filters of your choice. For output
compression (or other) utilities, simply pipe the output:
mlr ... | {your compression command} > outputfilenamegoeshere
Lastly, note that if --prepipe or --prepipex is specified, it replaces any
decisions that might have been made based on the file suffix. Likewise,
--gzin/--bz2in/--zin are ignored if --prepipe is also specified.
Flags:
- --bz2in: Uncompress bzip2 within the Miller process. Done by default if file ends in .bz2.
- --gzin: Uncompress gzip within the Miller process. Done by default if file ends in .gz.
- --prepipe {decompression command}: You can, of course, already do without this for single input files, e.g. gunzip < myfile.csv.gz | mlr .... Allowed at the command line, but not in .mlrrc to avoid unexpected code execution.
- --prepipe-bz2: Same as --prepipe bz2, except this is allowed in .mlrrc.
- --prepipe-gunzip: Same as --prepipe gunzip, except this is allowed in .mlrrc.
- --prepipe-zcat: Same as --prepipe zcat, except this is allowed in .mlrrc.
- --prepipex {decompression command}: Like --prepipe with one exception: doesn't insert < between command and filename at runtime. Useful for some commands like unzip -qc which don't read standard input. Allowed at the command line, but not in .mlrrc to avoid unexpected code execution.
- --zin: Uncompress zlib within the Miller process. Done by default if file ends in .z.
CSV-only flags
Flags:
- --allow-ragged-csv-input or --ragged: If a data line has fewer fields than the header line, fill remaining keys with empty string. If a data line has more fields than the header line, use integer field labels as in the implicit-header case.
- --headerless-csv-output: Print only CSV data lines; do not print CSV header lines.
- --implicit-csv-header: Use 1,2,3,... as field labels, rather than from line 1 of input files. Tip: combine with label to recreate missing headers.
- --no-implicit-csv-header: Opposite of --implicit-csv-header. This is the default anyway -- the main use is for the flags to mlr join if you have main file(s) which are headerless but you want to join in on a file which does have a CSV header. Then you could use mlr --csv --implicit-csv-header join --no-implicit-csv-header -l your-join-in-with-header.csv ... your-headerless.csv.
- -N: Keystroke-saver for --implicit-csv-header --headerless-csv-output.
File-format flags
TO DO: brief list of formats w/ xref to m6 webdocs.
Examples: --csv for CSV-formatted input and output; --icsv --opprint for
CSV-formatted input and pretty-printed output.
Please use --iformat1 --oformat2 rather than --format1 --oformat2.
The latter sets up input and output flags for format1, not all of which
are overridden in all cases by setting output format to format2.
Flags:
- --asv or --asvlite: Use ASV format for input and output data.
- --csv or -c: Use CSV format for input and output data.
- --csvlite: Use CSV-lite format for input and output data.
- --dkvp: Use DKVP format for input and output data.
- --iasv or --iasvlite: Use ASV format for input data.
- --icsv: Use CSV format for input data.
- --icsvlite: Use CSV-lite format for input data.
- --idkvp: Use DKVP format for input data.
- --ijson: Use JSON format for input data.
- --inidx: Use NIDX format for input data.
- --io {format name}: Use format name for input and output data. For example: --io csv is the same as --csv.
- --ipprint: Use PPRINT format for input data.
- --itsv: Use TSV format for input data.
- --itsvlite: Use TSV-lite format for input data.
- --iusv or --iusvlite: Use USV format for input data.
- --ixtab: Use XTAB format for input data.
- --json or -j: Use JSON format for input and output data.
- --nidx: Use NIDX format for input and output data.
- --oasv or --oasvlite: Use ASV format for output data.
- --ocsv: Use CSV format for output data.
- --ocsvlite: Use CSV-lite format for output data.
- --odkvp: Use DKVP format for output data.
- --ojson: Use JSON format for output data.
- --omd: Use markdown-tabular format for output data.
- --onidx: Use NIDX format for output data.
- --opprint: Use PPRINT format for output data.
- --otsv: Use TSV format for output data.
- --otsvlite: Use TSV-lite format for output data.
- --ousv or --ousvlite: Use USV format for output data.
- --oxtab: Use XTAB format for output data.
- --pprint: Use PPRINT format for input and output data.
- --tsv: Use TSV format for input and output data.
- --tsvlite or -t: Use TSV-lite format for input and output data.
- --usv or --usvlite: Use USV format for input and output data.
- --xtab: Use XTAB format for input and output data.
- -i {format name}: Use format name for input data. For example: -i csv is the same as --icsv.
- -o {format name}: Use format name for output data. For example: -o csv is the same as --ocsv.
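As a quick check of the equivalences listed above, the following sketch (assuming a POSIX shell and Miller 6; example.csv is a hypothetical file) spells the same conversion in more than one way:

```shell
# Hypothetical input file.
printf 'a,b,c\n1,2,3\n4,5,6\n' > example.csv

# CSV in, PPRINT out, using the long-form flags...
mlr --icsv --opprint cat example.csv
# ...and using the -i/-o spelling with format names.
mlr -i csv -o pprint cat example.csv

# --io csv is the same as --csv.
mlr --io csv cat example.csv
```

Following the advice above, the input and output formats are given separately (--icsv --opprint) rather than as --csv --opprint.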
Flatten-unflatten flags
Flags:
- --flatsep or --jflatsep or --oflatsep {string}: Separator for flattening multi-level JSON keys, e.g. {"a":{"b":3}} becomes a.b => 3 for non-JSON formats. Defaults to ".".
- --no-auto-flatten
- --no-auto-unflatten
Format-conversion keystroke-saver flags
The letters c, t, j, d, n, x, p, and m refer to formats CSV, TSV, JSON, DKVP, NIDX, XTAB,
PPRINT, and markdown, respectively. Note that markdown format is available for
output only.
| In \ out | CSV | TSV | JSON | DKVP | NIDX | XTAB | PPRINT | Markdown |
|---|---|---|---|---|---|---|---|---|
| CSV | | --c2t | --c2j | --c2d | --c2n | --c2x | --c2p | --c2m |
| TSV | --t2c | | --t2j | --t2d | --t2n | --t2x | --t2p | --t2m |
| JSON | --j2c | --j2t | | --j2d | --j2n | --j2x | --j2p | --j2m |
| DKVP | --d2c | --d2t | --d2j | | --d2n | --d2x | --d2p | --d2m |
| NIDX | --n2c | --n2t | --n2j | --n2d | | --n2x | --n2p | --n2m |
| XTAB | --x2c | --x2t | --x2j | --x2d | --x2n | | --x2p | --x2m |
| PPRINT | --p2c | --p2t | --p2j | --p2d | --p2n | --p2x | | --p2m |
Additionally:
- -p is a keystroke-saver for --nidx --fs space --repifs.
- -T is a keystroke-saver for --nidx --fs tab.
JSON-only flags
These are flags which are applicable to JSON format.
Flags:
- --jlistwrap or --jl: Wrap JSON output in outermost [ ].
- --jvstack: Put one key-value pair per line for JSON output (multi-line output).
- --no-jvstack: Put objects/arrays all on one line for JSON output.
Legacy flags
These are flags which don't do anything in the current Miller version. They are accepted as no-op flags in order to keep old scripts from breaking.
Flags:
- --jknquoteint: Type information from JSON input files is now preserved throughout the processing stream.
- --jquoteall: Type information from JSON input files is now preserved throughout the processing stream.
- --json-fatal-arrays-on-input: Miller now supports arrays as of version 6.
- --json-map-arrays-on-input: Miller now supports arrays as of version 6.
- --json-skip-arrays-on-input: Miller now supports arrays as of version 6.
- --jsonx: The --jvstack flag is now default true in Miller 6.
- --jvquoteall: Type information from JSON input files is now preserved throughout the processing stream.
- --mmap: Miller no longer uses memory-mapping to access data files.
- --no-fflush: The current implementation of Miller does not use buffered output, so there is no longer anything to suppress here.
- --no-mmap: Miller no longer uses memory-mapping to access data files.
- --ojsonx: The --jvstack flag is now default true in Miller 6.
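Because legacy flags are accepted and ignored, an old script that passes one still produces the same output. A minimal sketch using --no-mmap, assuming Miller 6 and a POSIX shell:

```shell
# Both commands should print the same DKVP record.
echo 'a=1,b=2' | mlr --no-mmap cat
echo 'a=1,b=2' | mlr cat
```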
Miscellaneous flags
Flags:
- --from {filename}: Use this to specify an input file before the verb(s), rather than after. May be used more than once. Example: mlr --from a.dat --from b.dat cat is the same as mlr cat a.dat b.dat.
- --load {filename}: Load DSL script file for all put/filter operations on the command line. If the name following --load is a directory, load all *.mlr files in that directory. This is just like put -f and filter -f except it's up-front on the command line, so you can do something like alias mlr='mlr --load ~/myscripts' if you like.
- --mfrom {filenames}: Use this to specify one or more input files before the verb(s), rather than after. May be used more than once. The list of filenames must end with --. This is useful for example since --from *.csv doesn't do what you might hope but --mfrom *.csv -- does.
- --mload {filenames}: Like --load but works with more than one filename, e.g. --mload *.mlr --.
- --ofmt {format}: E.g. %.18f, %.0f, %9.6e. Please use sprintf-style codes for floating-point numbers. If not specified, default formatting is used. See also the fmtnum function and the format-values verb.
- --seed {n}: Seed the random-number generator used by the put/filter functions urand, urandint, and urand32, with n of the form 12345678 or 0xcafefeed.
- -I: Process files in-place. For each file name on the command line, output is written to a temp file in the same directory, which is then renamed over the original. Each file is processed in isolation: if the output format is CSV, CSV headers will be present in each output file; statistics are computed only over each file's own records; and so on.
- -n: Process no input files, nor standard input either. Useful for mlr put with begin/end statements only. (Same as --from /dev/null.) Also useful in mlr -n put -v '...' for analyzing abstract syntax trees (if that's your thing).
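A sketch exercising several of these flags, assuming a POSIX shell and Miller 6; in.csv is a hypothetical file, and the -I step overwrites it:

```shell
# Hypothetical input file.
printf 'a,b\n1,2\n' > in.csv

# --from names the input before the verb; same as "mlr --csv cat in.csv".
mlr --csv --from in.csv cat

# -n: no input at all, handy for end-block-only DSL expressions.
mlr -n put 'end { print 1 + 2 }'

# --seed makes urandint reproducible: repeated runs print the same number.
mlr -n --seed 12345 put 'end { print urandint(1, 100) }'

# -I: rewrite in.csv in place (temp file, then renamed over the original).
mlr -I --csv put '$c = $a + $b' in.csv
cat in.csv
```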
Output-colorization flags
Miller uses colors to highlight outputs. You can specify color preferences. Note: output colorization does not work on Windows.
Things having colors:
- Keys in CSV header lines, JSON keys, etc.
- Values in CSV data lines, JSON scalar values, etc.
- "PASS" and "FAIL" in regression-test output
- Some online-help strings
Rules for coloring:
- By default, colorize output only if writing to stdout and stdout is a TTY.
  - Example: color: mlr --csv cat foo.csv
  - Example: no color: mlr --csv cat foo.csv > bar.csv
  - Example: no color: mlr --csv cat foo.csv | less
- The default colors were chosen since they look OK with white or black terminal background, and are differentiable with common varieties of human color vision.
Mechanisms for coloring:
- Miller uses ANSI escape sequences only. This does not work on Windows except within Cygwin.
- Requires the TERM environment variable to be set to a non-empty string.
- Miller does not check whether the terminal is capable of 256-color ANSI vs 16-color ANSI. Note that if colors are in the range 0..15 then 16-color ANSI escapes are used, so this is in the user's control.
How you can control colorization:
- Suppression/unsuppression:
  - Environment variable export MLR_NO_COLOR=true means don't color even if stdout+TTY.
  - Environment variable export MLR_ALWAYS_COLOR=true means do color even if not stdout+TTY. For example, you might want to use this when piping mlr output to less -r.
  - Command-line flags --no-color or -M, --always-color or -C.
- Color choices can be specified by using environment variables, or command-line flags, with values 0..255:
  - export MLR_KEY_COLOR=208, MLR_VALUE_COLOR=33, etc.: MLR_KEY_COLOR, MLR_VALUE_COLOR, MLR_PASS_COLOR, MLR_FAIL_COLOR, MLR_REPL_PS1_COLOR, MLR_REPL_PS2_COLOR, MLR_HELP_COLOR
  - Command-line flags --key-color 208, --value-color 33, etc.: --key-color, --value-color, --pass-color, --fail-color, --repl-ps1-color, --repl-ps2-color, --help-color
  - This is particularly useful if your terminal's background color clashes with current settings.
If environment-variable settings and command-line flags are both provided, the latter take precedence.
Please do mlr --list-color-codes to see the available color codes (like 170), and
mlr --list-color-names to see available names (like orchid).
Flags:
- --always-color or -C
- --fail-color
- --help-color
- --key-color
- --list-color-codes
- --list-color-names
- --no-color or -M
- --pass-color
- --value-color
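A sketch for checking colorization from a script, assuming Miller 6 and a POSIX shell; color.csv is a hypothetical file. Each command's output goes to a pipe rather than a TTY, so color appears only when forced, and cat -v makes the ANSI escape bytes visible as ^[[...m. TERM is set explicitly because colorization requires it to be non-empty:

```shell
# Hypothetical input file.
printf 'a,b\n1,2\n' > color.csv

# Piped output is not a TTY, so no escape sequences by default.
mlr --csv cat color.csv | cat -v

# -C (--always-color) forces color even into a pipe.
TERM=xterm mlr --csv -C cat color.csv | cat -v

# Same, with key color 208 chosen explicitly.
TERM=xterm mlr --csv -C --key-color 208 cat color.csv | cat -v
```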
PPRINT-only flags
These are flags which are applicable to PPRINT output format.
Flags:
- --barred: Prints a border around PPRINT output (not available for input).
- --right: Right-justifies all fields for PPRINT output.
Separator flags
Separator options:
--rs --irs --ors Record separators, e.g. 'lf' or '\r\n'
--fs --ifs --ofs --repifs Field separators, e.g. comma
--ps --ips --ops Pair separators, e.g. equals sign
TODO: auto-detect is still TBD for Miller 6
Notes about line endings:
- Default line endings (--irs and --ors) are "auto" which means autodetect from the input file format, as long as the input file(s) have lines ending in either LF (also known as linefeed, \n, 0x0a, or Unix-style) or CRLF (also known as carriage-return/linefeed pairs, \r\n, 0x0d 0x0a, or Windows-style).
- If both irs and ors are auto (which is the default) then LF input will lead to LF output and CRLF input will lead to CRLF output, regardless of the platform you're running on.
- The line-ending autodetector triggers on the first line ending detected in the input stream. E.g. if you specify a CRLF-terminated file on the command line followed by an LF-terminated file then autodetected line endings will be CRLF.
- If you use --ors {something else} with (default or explicitly specified) --irs auto then line endings are autodetected on input and set to what you specify on output.
- If you use --irs {something else} with (default or explicitly specified) --ors auto then the output line endings used are LF on Unix/Linux/BSD/MacOSX, and CRLF on Windows.
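Since line-ending autodetection is still marked TBD above, this sketch sets the output line ending explicitly rather than relying on auto. It assumes Miller 6 and a POSIX shell, and it assumes Miller accepts the separator name crlf (the list of names above is still a TODO); crlf.txt and crlf2.txt are hypothetical file names:

```shell
# LF-terminated DKVP input; request CRLF on output by name (assumed alias).
printf 'a=1,b=2\n' | mlr --ors crlf cat > crlf.txt
# od -c shows the record ending as \r \n.
od -c crlf.txt

# The C-style escape spelling should produce identical bytes.
printf 'a=1,b=2\n' | mlr --ors '\r\n' cat > crlf2.txt
```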
Notes about all other separators:
- IPS/OPS are only used for DKVP and XTAB formats, since only in these formats do key-value pairs appear juxtaposed.
- IRS/ORS are ignored for XTAB format. Nominally IFS and OFS are newlines; XTAB records are separated by two or more consecutive IFS/OFS -- i.e. a blank line. Everything above about --irs/--ors/--rs auto becomes --ifs/--ofs/--fs auto for XTAB format. (XTAB's default IFS/OFS are "auto".)
- OFS must be single-character for PPRINT format. This is because it is used with repetition for alignment; multi-character separators would make alignment impossible.
- OPS may be multi-character for XTAB format, in which case alignment is disabled.
- TSV is simply CSV using tab as field separator (--fs tab).
- FS/PS are ignored for markdown format; RS is used.
- All FS and PS options are ignored for JSON format, since they are not relevant to the JSON format.
- You can specify separators in any of the following ways, shown by example:
  - Type them out, quoting as necessary for shell escapes, e.g. --fs '|' --ips :
  - C-style escape sequences, e.g. --rs '\r\n' --fs '\t'.
  - To avoid backslashing, you can use any of the following names: TODO desc-to-chars map
- Default separators by format: TODO default_xses
Flags:
- --fs {string}: Specify FS for input and output.
- --ifs {string}: Specify FS for input.
- --ips {string}: Specify PS for input.
- --irs {string}: Specify RS for input.
- --ofs {string}: Specify FS for output.
- --ops {string}: Specify PS for output.
- --ors {string}: Specify RS for output.
- --ps {string}: Specify PS for input and output.
- --repifs: Let IFS be repeated: e.g. for splitting on multiple spaces.
- --rs {string}: Specify RS for input and output.