John Kerl 2021-03-21 22:37:46 -04:00
parent 8b4191df00
commit 6e4fe0f03e
9 changed files with 41 additions and 37 deletions

@@ -532,7 +532,7 @@ Suppose you have some date-stamped data which may (or may not) be missing entrie
 ::
 $ wc -l data/miss-date.csv
-1372 data/miss-date.csv
+1372 data/miss-date.csv
 Since there are 1372 lines in the data file, some automation is called for. To find the missing dates, you can convert the dates to seconds since the epoch using ``strptime``, then compute adjacent differences (the ``cat -n`` simply inserts record-counters):
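A minimal Python sketch of the technique in that paragraph (parse each date with ``strptime``, convert it to seconds since the epoch, then difference adjacent records) follows. It is illustrative only and not part of this commit: the ``%Y-%m-%d`` date format and the helper names ``to_epoch_seconds`` and ``find_gaps`` are assumptions, since the contents of ``data/miss-date.csv`` aren't shown here.

```python
import calendar
import time

SECONDS_PER_DAY = 86400


def to_epoch_seconds(date_string, fmt="%Y-%m-%d"):
    """Parse a date string and return seconds since the Unix epoch."""
    # calendar.timegm treats the parsed struct_time as UTC, so local
    # daylight-saving transitions cannot distort day-to-day differences.
    return calendar.timegm(time.strptime(date_string, fmt))


def find_gaps(dates, fmt="%Y-%m-%d"):
    """Report adjacent records whose dates are not exactly one day apart.

    Returns (record_number, previous_date, date, delta_days) tuples, where
    record_number is 1-up, like the counters ``cat -n`` inserts.
    """
    gaps = []
    prev_date, prev_seconds = None, None
    for record_number, date in enumerate(dates, start=1):
        seconds = to_epoch_seconds(date, fmt)
        if prev_seconds is not None and seconds - prev_seconds != SECONDS_PER_DAY:
            gaps.append((record_number, prev_date, date,
                         (seconds - prev_seconds) // SECONDS_PER_DAY))
        prev_date, prev_seconds = date, seconds
    return gaps


if __name__ == "__main__":
    sample = ["2012-03-04", "2012-03-05", "2012-03-08", "2012-03-09"]
    print(find_gaps(sample))
```

With that hypothetical sample, the only gap reported is at record 3, which falls three days after record 2.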

@@ -96,7 +96,7 @@ Peek at the data:
 ::
 $ wc -l data/colored-shapes.dkvp
-10078 data/colored-shapes.dkvp
+10078 data/colored-shapes.dkvp
 $ head -n 6 data/colored-shapes.dkvp | mlr --opprint cat
 color shape flag i u v w x

@@ -117,23 +117,23 @@ Run as-is:
 ::
 $ python polyglot-dkvp-io/example.py < data/small
-a=pan,b=pan,i=1,y=0.7268028627434533,ab=panpan,iy=1.7268028627434533,ta=str,tb=str,ti=int,ty=float,tab=str,tiy=float
-a=eks,b=pan,i=2,y=0.5221511083334797,ab=ekspan,iy=2.5221511083334796,ta=str,tb=str,ti=int,ty=float,tab=str,tiy=float
-a=wye,b=wye,i=3,y=0.33831852551664776,ab=wyewye,iy=3.3383185255166477,ta=str,tb=str,ti=int,ty=float,tab=str,tiy=float
-a=eks,b=wye,i=4,y=0.13418874328430463,ab=ekswye,iy=4.134188743284304,ta=str,tb=str,ti=int,ty=float,tab=str,tiy=float
-a=wye,b=pan,i=5,y=0.8636244699032729,ab=wyepan,iy=5.863624469903273,ta=str,tb=str,ti=int,ty=float,tab=str,tiy=float
+a=pan,b=pan,i=1,y=0.726802862743,ab=panpan,iy=1.72680286274,ta=str,tb=str,ti=int,ty=float,tab=str,tiy=float
+a=eks,b=pan,i=2,y=0.522151108333,ab=ekspan,iy=2.52215110833,ta=str,tb=str,ti=int,ty=float,tab=str,tiy=float
+a=wye,b=wye,i=3,y=0.338318525517,ab=wyewye,iy=3.33831852552,ta=str,tb=str,ti=int,ty=float,tab=str,tiy=float
+a=eks,b=wye,i=4,y=0.134188743284,ab=ekswye,iy=4.13418874328,ta=str,tb=str,ti=int,ty=float,tab=str,tiy=float
+a=wye,b=pan,i=5,y=0.863624469903,ab=wyepan,iy=5.8636244699,ta=str,tb=str,ti=int,ty=float,tab=str,tiy=float
 Run as-is, then pipe to Miller for pretty-printing:
 ::
 $ python polyglot-dkvp-io/example.py < data/small | mlr --opprint cat
-a b i y ab iy ta tb ti ty tab tiy
-pan pan 1 0.7268028627434533 panpan 1.7268028627434533 str str int float str float
-eks pan 2 0.5221511083334797 ekspan 2.5221511083334796 str str int float str float
-wye wye 3 0.33831852551664776 wyewye 3.3383185255166477 str str int float str float
-eks wye 4 0.13418874328430463 ekswye 4.134188743284304 str str int float str float
-wye pan 5 0.8636244699032729 wyepan 5.863624469903273 str str int float str float
+a b i y ab iy ta tb ti ty tab tiy
+pan pan 1 0.726802862743 panpan 1.72680286274 str str int float str float
+eks pan 2 0.522151108333 ekspan 2.52215110833 str str int float str float
+wye wye 3 0.338318525517 wyewye 3.33831852552 str str int float str float
+eks wye 4 0.134188743284 ekswye 4.13418874328 str str int float str float
+wye pan 5 0.863624469903 wyepan 5.8636244699 str str int float str float
 DKVP I/O in Ruby
 ----------------------------------------------------------------
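The Python script whose output changes in the hunk above (``polyglot-dkvp-io/example.py``) is not itself shown in this commit. The sketch below is a hypothetical reconstruction that produces records of the same shape: it parses DKVP into an insertion-ordered dict, drops ``x``, adds ``ab`` and ``iy``, then appends ``t``-prefixed type names. The field logic and type-inference rule are inferred from the output, not taken from the real script, and all function names are invented here.

```python
def parse_value(text):
    """Infer a type as this sketch assumes: int, then float, else str."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text


def parse_dkvp(line, ifs=",", ips="="):
    """Split one DKVP line such as 'a=pan,i=1' into an insertion-ordered dict."""
    record = {}
    for pair in line.rstrip("\n").split(ifs):
        # A pair with no '=' becomes a key with an empty value (a simplification).
        key, _, value = pair.partition(ips)
        record[key] = parse_value(value)
    return record


def format_dkvp(record, ofs=",", ops="="):
    """Join a dict back into one DKVP line (floats print via str())."""
    return ofs.join(f"{key}{ops}{value}" for key, value in record.items())


def transform(record):
    """Reshape a record like the output above: drop x, add ab and iy, then type names."""
    record.pop("x", None)
    record["ab"] = f"{record['a']}{record['b']}"
    record["iy"] = record["i"] + record["y"]
    # Build the t-prefixed fields from a snapshot so they don't describe themselves.
    type_names = {"t" + key: type(value).__name__ for key, value in record.items()}
    record.update(type_names)
    return record


def process_stream(lines):
    """Yield one transformed DKVP line per non-blank input line."""
    for line in lines:
        if line.strip():
            yield format_dkvp(transform(parse_dkvp(line)))


if __name__ == "__main__":
    # First row of data/small, with the values printed elsewhere in this commit.
    sample = "a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533"
    print(next(process_stream([sample])))
```

To process a whole file, loop over ``process_stream(sys.stdin)`` after ``import sys``. Under Python 3, ``str()`` of a float is the shortest string that round-trips (e.g. ``0.7268028627434533``); the shorter ``0.726802862743`` form in the other block is what Python 2's ``str()`` gives (12 significant digits).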
@@ -287,11 +287,11 @@ The :ref:`reference-dsl-system` DSL function allows you to run a specific shell
 $ mlr --opprint put '$o = system("echo -n ".$a."| sha1sum")' data/small
 a b i x y o
-pan pan 1 0.3467901443380824 0.7268028627434533 f29c748220331c273ef16d5115f6ecd799947f13 -
-eks pan 2 0.7586799647899636 0.5221511083334797 456d988ecb3bf1b75f057fc6e9fe70db464e9388 -
-wye wye 3 0.20460330576630303 0.33831852551664776 eab0de043d67f441c7fd1e335f0ca38708e6ebf7 -
-eks wye 4 0.38139939387114097 0.13418874328430463 456d988ecb3bf1b75f057fc6e9fe70db464e9388 -
-wye pan 5 0.5732889198020006 0.8636244699032729 eab0de043d67f441c7fd1e335f0ca38708e6ebf7 -
+pan pan 1 0.3467901443380824 0.7268028627434533 bd2bd8216b9cb4aa5a12daa6cfc98eef2ee20e56 -
+eks pan 2 0.7586799647899636 0.5221511083334797 16191338e81a46c7d127f5c8899f5c92e3cd38e3 -
+wye wye 3 0.20460330576630303 0.33831852551664776 14ba3c3e96a2474ab6dc7409ebf9d6b9cc3d84f0 -
+eks wye 4 0.38139939387114097 0.13418874328430463 16191338e81a46c7d127f5c8899f5c92e3cd38e3 -
+wye pan 5 0.5732889198020006 0.8636244699032729 14ba3c3e96a2474ab6dc7409ebf9d6b9cc3d84f0 -
 Note that running a subprocess on every record takes a non-trivial amount of time. Comparing asking the system ``date`` command for the current time in nanoseconds versus computing it in process:
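The cost that note describes can be measured with a short Python sketch (not from the Miller docs). It spawns ``date`` once per iteration, as a per-record ``system`` call would, and compares that with reading the clock in-process via ``time.time_ns()``. It asks for ``date +%s`` rather than nanoseconds because the ``%N`` format is a GNU extension; the iteration count is arbitrary, and absolute timings depend on the machine.

```python
import subprocess
import time


def time_subprocess_date(n):
    """Seconds spent running `date +%s` n times, one child process per call."""
    start = time.perf_counter()
    for _ in range(n):
        subprocess.run(["date", "+%s"], capture_output=True, check=True, text=True)
    return time.perf_counter() - start


def time_in_process(n):
    """Seconds spent reading the clock n times without leaving the process."""
    start = time.perf_counter()
    for _ in range(n):
        time.time_ns()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 100
    sub = time_subprocess_date(n)
    inproc = time_in_process(n)
    print(f"subprocess: {sub / n * 1e6:.1f} us per call")
    print(f"in-process: {inproc / n * 1e6:.3f} us per call")
```

Process creation typically dominates, so the subprocess figure usually comes out orders of magnitude larger than the in-process one.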

@@ -28,7 +28,7 @@ This is simply a copy of what you should see on running **man mlr** at a command
 insertion-ordered hash map. This encompasses a variety of data
 formats, including but not limited to the familiar CSV, TSV, and JSON.
 (Miller can handle positionally-indexed data as a special case.) This
-manpage documents Miller v5.10.0-dev.
+manpage documents Miller v5.10.1.
 EXAMPLES
 COMMAND-LINE SYNTAX
@@ -493,7 +493,7 @@ This is simply a copy of what you should see on running **man mlr** at a command
 Useful for doing a well-formatted check on input data.
 clean-whitespace
-Usage: mlr clean-whitespace [options] {old1,new1,old2,new2,...}
+Usage: mlr clean-whitespace [options]
 For each record, for each field in the record, whitespace-cleans the keys and
 values. Whitespace-cleaning entails stripping leading and trailing whitespace,
 and replacing multiple whitespace with singles. For finer-grained control,
@@ -503,7 +503,8 @@ This is simply a copy of what you should see on running **man mlr** at a command
 Options:
 -k|--keys-only Do not touch values.
 -v|--values-only Do not touch keys.
-It is an error to specify -k as well as -v.
+It is an error to specify -k as well as -v -- to clean keys and values,
+leave off -k as well as -v.
 count
 Usage: mlr count [options]
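The whitespace-cleaning rule in the ``clean-whitespace`` usage text above (strip leading and trailing whitespace, collapse internal runs) is small enough to sketch in Python. This is a hypothetical illustration, not Miller's code: it assumes a run of any whitespace characters collapses to a single space, and the function names are invented here. The ``keys_only`` and ``values_only`` flags stand in for ``-k`` and ``-v``, including the rule that passing both is an error.

```python
import re

# One or more whitespace characters of any kind (space, tab, newline, ...).
_WHITESPACE_RUN = re.compile(r"\s+")


def clean_whitespace(text):
    """Strip leading/trailing whitespace and collapse internal runs to one space."""
    return _WHITESPACE_RUN.sub(" ", text.strip())


def clean_record(record, keys_only=False, values_only=False):
    """Return a new dict with cleaned keys and/or values, mirroring -k / -v."""
    if keys_only and values_only:
        raise ValueError("specify at most one of keys_only and values_only")
    cleaned = {}
    for key, value in record.items():
        new_key = key if values_only else clean_whitespace(key)
        new_value = value if keys_only else clean_whitespace(value)
        cleaned[new_key] = new_value
    return cleaned


if __name__ == "__main__":
    print(clean_record({"  first   name ": "  Ada \t Lovelace "}))
```

Cleaning keys can make two distinct keys identical (for example ``"a b"`` and ``"a  b"``); in this sketch the later field's value wins, which may not match Miller's behavior.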
@@ -2377,4 +2378,4 @@ This is simply a copy of what you should see on running **man mlr** at a command
-2021-03-03 MILLER(1)
+2021-03-22 MILLER(1)

@@ -18,7 +18,7 @@ DESCRIPTION
 insertion-ordered hash map. This encompasses a variety of data
 formats, including but not limited to the familiar CSV, TSV, and JSON.
 (Miller can handle positionally-indexed data as a special case.) This
-manpage documents Miller v5.10.0-dev.
+manpage documents Miller v5.10.1.
 EXAMPLES
 COMMAND-LINE SYNTAX
@@ -483,7 +483,7 @@ VERBS
 Useful for doing a well-formatted check on input data.
 clean-whitespace
-Usage: mlr clean-whitespace [options] {old1,new1,old2,new2,...}
+Usage: mlr clean-whitespace [options]
 For each record, for each field in the record, whitespace-cleans the keys and
 values. Whitespace-cleaning entails stripping leading and trailing whitespace,
 and replacing multiple whitespace with singles. For finer-grained control,
@@ -493,7 +493,8 @@ VERBS
 Options:
 -k|--keys-only Do not touch values.
 -v|--values-only Do not touch keys.
-It is an error to specify -k as well as -v.
+It is an error to specify -k as well as -v -- to clean keys and values,
+leave off -k as well as -v.
 count
 Usage: mlr count [options]
@@ -2367,4 +2368,4 @@ SEE ALSO
-2021-03-03 MILLER(1)
+2021-03-22 MILLER(1)

@@ -2,12 +2,12 @@
 .\" Title: mlr
 .\" Author: [see the "AUTHOR" section]
 .\" Generator: ./mkman.rb
-.\" Date: 2021-03-03
+.\" Date: 2021-03-22
 .\" Manual: \ \&
 .\" Source: \ \&
 .\" Language: English
 .\"
-.TH "MILLER" "1" "2021-03-03" "\ \&" "\ \&"
+.TH "MILLER" "1" "2021-03-22" "\ \&" "\ \&"
 .\" -----------------------------------------------------------------
 .\" * Portability definitions
 .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -38,7 +38,7 @@ on integer-indexed fields: if the natural data structure for the latter is the
 array, then Miller's natural data structure is the insertion-ordered hash map.
 This encompasses a variety of data formats, including but not limited to the
 familiar CSV, TSV, and JSON. (Miller can handle positionally-indexed data as
-a special case.) This manpage documents Miller v5.10.0-dev.
+a special case.) This manpage documents Miller v5.10.1.
 .SH "EXAMPLES"
 .sp
@@ -642,7 +642,7 @@ Useful for doing a well-formatted check on input data.
 .RS 0
 .\}
 .nf
-Usage: mlr clean-whitespace [options] {old1,new1,old2,new2,...}
+Usage: mlr clean-whitespace [options]
 For each record, for each field in the record, whitespace-cleans the keys and
 values. Whitespace-cleaning entails stripping leading and trailing whitespace,
 and replacing multiple whitespace with singles. For finer-grained control,
@@ -652,7 +652,8 @@ and clean_whitespace.
 Options:
 -k|--keys-only Do not touch values.
 -v|--values-only Do not touch keys.
-It is an error to specify -k as well as -v.
+It is an error to specify -k as well as -v -- to clean keys and values,
+leave off -k as well as -v.
 .fi
 .if n \{\
 .RE

@@ -1422,7 +1422,7 @@ This produces heteregenous output which Miller, of course, has no problems with
 $ mlr put '$x > 0.0; $y = log10($x); $z = sqrt($y)' data/put-gating-example-1.dkvp
 x=-1,y=nan,z=nan
-x=0,y=-inf,z=nan
+x=0,y=-inf,z=-nan
 x=1,y=0.000000,z=0.000000
 x=2,y=0.301030,z=0.548662
 x=3,y=0.477121,z=0.690740

@@ -294,7 +294,7 @@ clean-whitespace
 ::
 $ mlr clean-whitespace --help
-Usage: mlr clean-whitespace [options] {old1,new1,old2,new2,...}
+Usage: mlr clean-whitespace [options]
 For each record, for each field in the record, whitespace-cleans the keys and
 values. Whitespace-cleaning entails stripping leading and trailing whitespace,
 and replacing multiple whitespace with singles. For finer-grained control,
@@ -304,7 +304,8 @@ clean-whitespace
 Options:
 -k|--keys-only Do not touch values.
 -v|--values-only Do not touch keys.
-It is an error to specify -k as well as -v.
+It is an error to specify -k as well as -v -- to clean keys and values,
+leave off -k as well as -v.
 ::
@@ -3146,7 +3147,7 @@ There are two main ways to use ``mlr uniq``: the first way is with ``-g`` to spe
 ::
 $ wc -l data/colored-shapes.dkvp
-10078 data/colored-shapes.dkvp
+10078 data/colored-shapes.dkvp
 ::
@@ -3288,7 +3289,7 @@ The second main way to use ``mlr uniq`` is without group-by columns, using ``-a`
 ::
 $ wc -l data/repeats.dkvp
-57 data/repeats.dkvp
+57 data/repeats.dkvp
 ::

@@ -1066,7 +1066,7 @@ Examples:
 For more information, please invoke mlr {subcommand} --help
 For more information please see http://johnkerl.org/miller/doc and/or
-http://github.com/johnkerl/miller. This is Miller version v5.10.0-dev.
+http://github.com/johnkerl/miller. This is Miller version v5.10.1.
 ::