mirror of
https://github.com/johnkerl/miller.git
synced 2026-01-23 02:14:13 +00:00
67 lines
3.3 KiB
ReStructuredText
67 lines
3.3 KiB
ReStructuredText
..
|
|
PLEASE DO NOT EDIT DIRECTLY. EDIT THE .rst.in FILE PLEASE.
|
|
|
|
Unix-toolkit context
|
|
================================================================
|
|
|
|
How does Miller fit within the Unix toolkit (`grep`, `sed`, `awk`, etc.)?
|
|
|
|
File-format awareness
|
|
----------------------------------------------------------------
|
|
|
|
Miller respects CSV headers. If you do ``mlr --csv cat *.csv`` then the header line is written once:
|
|
|
|
::
|
|
|
|
$ cat data/a.csv
|
|
a,b,c
|
|
1,2,3
|
|
4,5,6
|
|
|
|
::
|
|
|
|
$ cat data/b.csv
|
|
a,b,c
|
|
7,8,9
|
|
|
|
::
|
|
|
|
$ mlr --csv cat data/a.csv data/b.csv
|
|
a,b,c
|
|
1,2,3
|
|
4,5,6
|
|
7,8,9
|
|
|
|
::
|
|
|
|
$ mlr --csv sort -nr b data/a.csv data/b.csv
|
|
a,b,c
|
|
7,8,9
|
|
4,5,6
|
|
1,2,3
|
|
|
|
Likewise with ``mlr sort``, ``mlr tac``, and so on.
|
|
|
|
awk-like features: mlr filter and mlr put
|
|
----------------------------------------------------------------
|
|
|
|
* ``mlr filter`` includes/excludes records based on a filter expression, e.g. ``mlr filter '$count > 10'``.
|
|
|
|
* ``mlr put`` adds a new field as a function of others, e.g. ``mlr put '$xy = $x * $y'`` or ``mlr put '$counter = NR'``.
|
|
|
|
* The ``$name`` syntax is straight from ``awk``'s ``$1 $2 $3`` (adapted to name-based indexing), as are the variables ``FS``, ``OFS``, ``RS``, ``ORS``, ``NF``, ``NR``, and ``FILENAME``. The ``ENV[...]`` syntax is from Ruby.
|
|
|
|
* While ``awk`` functions are record-based, Miller subcommands (or *verbs*) are stream-based: each of them maps a stream of records into another stream of records.
|
|
|
|
* Like ``awk``, Miller (as of v5.0.0) allows you to define new functions within its ``put`` and ``filter`` expression language. Further programmability comes from chaining with ``then``.
|
|
|
|
* As with ``awk``, ``$``-variables are stream variables and all verbs (such as ``cut``, ``stats1``, ``put``, etc.) as well as ``put``/``filter`` statements operate on streams. This means that you define actions to be done on each record and then stream your data through those actions. The built-in variables ``NF``, ``NR``, etc. change from one line to another, ``$x`` is a label for field ``x`` in the current record, and the input to ``sqrt($x)`` changes from one record to the next. The expression language for the ``put`` and ``filter`` verbs additionally allows you to define ``begin {...}`` and ``end {...}`` blocks for actions to be taken before and after records are processed, respectively.
|
|
|
|
* As with ``awk``, Miller's ``put``/``filter`` language lets you set ``@sum=0`` before records are read, then update that sum on each record, then print its value at the end. Unlike ``awk``, Miller makes syntactically explicit the difference between variables with extent across all records (names starting with ``@``, such as ``@sum``) and variables which are local to the current expression (names starting without ``@``, such as ``sum``).
|
|
|
|
* Miller can be faster than ``awk``, ``cut``, and so on, depending on platform; see also :doc:`performance`. In particular, Miller's DSL syntax is parsed into C control structures at startup time, with the bulk data-stream processing all done in C.
|
|
|
|
See also
|
|
----------------------------------------------------------------
|
|
|
|
See :doc:`reference-verbs` for more on Miller's subcommands ``cat``, ``cut``, ``head``, ``sort``, ``tac``, ``tail``, ``top``, and ``uniq``, as well as :doc:`reference-dsl` for more on the awk-like ``mlr filter`` and ``mlr put``.
|