diff --git a/.gitignore b/.gitignore index 36e4b51ad..cdc106b58 100644 --- a/.gitignore +++ b/.gitignore @@ -1,4 +1,7 @@ go/mlr +go/mlr.exe +./mlr +./mlr.exe a.out *.dSYM catc diff --git a/Makefile b/Makefile index beaf67828..79f7cff9b 100644 --- a/Makefile +++ b/Makefile @@ -1,5 +1,9 @@ +# TODO: 'cp go/mlr .' or 'copy go\mlr.exe .' with reliable platform detection +# and no confusing error messages. + build: make -C go build + @echo Miller executable is: go/mlr check: make -C go check diff --git a/README.md b/README.md index 6670c720e..3ba7571c7 100644 --- a/README.md +++ b/README.md @@ -28,11 +28,11 @@ key-value-pair data in a variety of data formats. # Getting started +* [Miller in 10 minutes](https://miller.readthedocs.io/en/latest/10min) * [A quick tutorial on Miller](https://www.ict4g.net/adolfo/notes/data-analysis/miller-quick-tutorial.html) * [Tools to manipulate CSV files from the Command Line](https://www.ict4g.net/adolfo/notes/data-analysis/tools-to-manipulate-csv.html) * [www.togaware.com/linux/survivor/CSV_Files.html](https://www.togaware.com/linux/survivor/CSV_Files.html) * [MLR for CSV manipulation](https://guillim.github.io/terminal/2018/06/19/MLR-for-CSV-manipulation.html) -* [Miller in 10 minutes](https://miller.readthedocs.io/en/latest/10min.html) * [Linux Magazine: Process structured text files with Miller](https://www.linux-magazine.com/Issues/2016/187/Miller) * [Miller: Command Line CSV File Processing](https://onepointzero.app/posts/miller-command-line-csv-file-processing/) @@ -43,12 +43,6 @@ key-value-pair data in a variety of data formats. 
* [Notes about issue-labeling in the Github repo](https://github.com/johnkerl/miller/wiki/Issue-labeling) * [Active issues](https://github.com/johnkerl/miller/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc) -# Miller 6 pre-release - -* Pre-release/WIP docs are at [http://johnkerl.org/miller6](http://johnkerl.org/miller6) -* [go/README.md](./go/README.md) -* [Tracking issue](https://github.com/johnkerl/miller/issues/372) - # Installing There's a good chance you can get Miller pre-built for your system: @@ -81,9 +75,15 @@ See also [building from source](https://miller.readthedocs.io/en/latest/build.ht [![Go-port multi-platform build status](https://github.com/johnkerl/miller/actions/workflows/go.yml/badge.svg)](https://github.com/johnkerl/miller/actions) -[License: BSD2](https://github.com/johnkerl/miller/blob/master/LICENSE.txt) +# Building from source -[Docs](https://miller.readthedocs.io/en/latest/?badge=latest) +* `make` and `make check` +* The Miller executable is `go/mlr` (or `go\mlr.exe` on Windows) +* For more developer information please see [go/README.md](./go/README.md) + +# License + +[License: BSD2](https://github.com/johnkerl/miller/blob/master/LICENSE.txt) # Community diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml index 830541625..d3ba3fe98 100644 --- a/docs/mkdocs.yml +++ b/docs/mkdocs.yml @@ -105,7 +105,6 @@ nav: - 'Misc. reference': - "Auxiliary commands": "reference-main-auxiliary-commands.md" - "Manual page": "manpage.md" - - "Installation": "installation.md" - "Building from source": "build.md" - "Documents for previous releases": "release-docs.md" - "Glossary": "glossary.md" diff --git a/docs/src/10min.md.in b/docs/src/10min.md.in index f1d3866f9..a62da1d0b 100644 --- a/docs/src/10min.md.in +++ b/docs/src/10min.md.in @@ -8,69 +8,69 @@ For most of this section we'll use our [example.csv](./example.csv). 
`mlr cat` is like system `cat` (or `type` on Windows) -- it passes the data through unmodified: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv cat example.csv -GENMD_EOF +GENMD-EOF But `mlr cat` can also do format conversion -- for example, you can pretty-print in tabular format: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint cat example.csv -GENMD_EOF +GENMD-EOF `mlr head` and `mlr tail` count records rather than lines. Whether you're getting the first few records or the last few, the CSV header is included either way: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv head -n 4 example.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv tail -n 4 example.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --ojson tail -n 2 example.csv -GENMD_EOF +GENMD-EOF You can sort on a single field: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint sort -f shape example.csv -GENMD_EOF +GENMD-EOF Or, you can sort primarily alphabetically on one field, then secondarily numerically descending on another field, and so on: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint sort -f shape -nr index example.csv -GENMD_EOF +GENMD-EOF If there are fields you don't want to see in your data, you can use `cut` to keep only the ones you want, in the same order they appeared in the input data: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint cut -f flag,shape example.csv -GENMD_EOF +GENMD-EOF You can also use `cut -o` to keep specified fields, but in your preferred order: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint cut -o -f flag,shape example.csv -GENMD_EOF +GENMD-EOF You can use `cut -x` to omit fields you don't care about: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint cut -x -f flag,shape example.csv -GENMD_EOF +GENMD-EOF Even though Miller's main selling point is name-indexing, sometimes you really want to refer to a field name by its positional index. 
Use `$[[3]]` to access the name of field 3 or `$[[[3]]]` to access the value of field 3: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint put '$[[3]] = "NEW"' example.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint put '$[[[3]]] = "NEW"' example.csv -GENMD_EOF +GENMD-EOF You can find the full list of verbs at the [Verbs Reference](reference-verbs.md) page. @@ -78,33 +78,33 @@ You can find the full list of verbs at the [Verbs Reference](reference-verbs.md) You can use `filter` to keep only records you care about: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint filter '$color == "red"' example.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint filter '$color == "red" && $flag == true' example.csv -GENMD_EOF +GENMD-EOF ## Computing new fields You can use `put` to create new fields which are computed from other fields: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint put ' $ratio = $quantity / $rate; $color_shape = $color . "_" . $shape ' example.csv -GENMD_EOF +GENMD-EOF When you create a new field, it can immediately be used in subsequent statements: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint --from example.csv put ' $y = $index + 1; $z = $y**2 + $k; ' -GENMD_EOF +GENMD-EOF For `put` and `filter` we were able to type out expressions using a programming-language syntax. See the [Miller programming language page](miller-programming-language.md) for more information. @@ -113,50 +113,50 @@ See the [Miller programming language page](miller-programming-language.md) for m Miller takes all the files from the command line as an input stream. But it's format-aware, so it doesn't repeat CSV header lines. 
For example, with input files [data/a.csv](data/a.csv) and [data/b.csv](data/b.csv), the system `cat` command will repeat header lines: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/a.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/b.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/a.csv data/b.csv -GENMD_EOF +GENMD-EOF However, `mlr cat` will not: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv cat data/a.csv data/b.csv -GENMD_EOF +GENMD-EOF ## Chaining verbs together Often we want to chain queries together -- for example, sorting by a field and taking the top few values. We can do this using pipes: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv sort -nr index example.csv | mlr --icsv --opprint head -n 3 -GENMD_EOF +GENMD-EOF This works fine -- but Miller also lets you chain verbs together using the word `then`. Think of this as a Miller-internal pipe that lets you use fewer keystrokes: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint sort -nr index then head -n 3 example.csv -GENMD_EOF +GENMD-EOF As another convenience, you can put the filename first using `--from`. 
When you're interacting with your data at the command line, this makes it easier to up-arrow and append to the previous command: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint --from example.csv sort -nr index then head -n 3 -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint --from example.csv \ sort -nr index \ then head -n 3 \ then cut -f shape,quantity -GENMD_EOF +GENMD-EOF ## Sorts and stats @@ -164,55 +164,55 @@ Now suppose you want to sort the data on a given column, *and then* take the top Here are the records with the top three `index` values: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint sort -nr index then head -n 3 example.csv -GENMD_EOF +GENMD-EOF Lots of Miller commands take a `-g` option for group-by: here, `head -n 1 -g shape` outputs the first record for each distinct value of the `shape` field. This means we're finding the record with highest `index` field for each distinct `shape` field: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint sort -f shape -nr index then head -n 1 -g shape example.csv -GENMD_EOF +GENMD-EOF Statistics can be computed with or without group-by field(s): -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint --from example.csv \ stats1 -a count,min,mean,max -f quantity -g shape -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint --from example.csv \ stats1 -a count,min,mean,max -f quantity -g shape,color -GENMD_EOF +GENMD-EOF If your output has a lot of columns, you can use XTAB format to line things up vertically for you instead: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --oxtab --from example.csv \ stats1 -a p0,p10,p25,p50,p75,p90,p99,p100 -f rate -GENMD_EOF +GENMD-EOF ## Unicode and internationalization While Miller's function names, verb names, online help, etc. are all in English, Miller supports UTF-8 data. 
For example: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat παράδειγμα.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p filter '$σχήμα == "κύκλος"' παράδειγμα.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p sort -f σημαία παράδειγμα.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p put '$форма = toupper($форма); $длина = strlen($цвет)' пример.csv -GENMD_EOF +GENMD-EOF ## File formats and format conversion @@ -230,22 +230,22 @@ What's a CSV file, really? It's an array of rows, or *records*, each being a lis For example, if you have: -GENMD_CARDIFY +GENMD-CARDIFY shape,flag,index circle,1,24 square,0,36 -GENMD_EOF +GENMD-EOF then that's a way of saying: -GENMD_CARDIFY +GENMD-CARDIFY shape=circle,flag=1,index=24 shape=square,flag=0,index=36 -GENMD_EOF +GENMD-EOF Other ways to write the same data: -GENMD_CARDIFY +GENMD-CARDIFY CSV PPRINT shape,flag,index shape flag index circle,1,24 circle 1 24 @@ -266,7 +266,7 @@ JSON XTAB DKVP shape=circle,flag=1,index=24 shape=square,flag=0,index=36 -GENMD_EOF +GENMD-EOF Anything we can do with CSV input data, we can do with any other format input data. And you can read from one format, do any record-processing, and output to the same format as the input, or to a different output format. @@ -280,36 +280,36 @@ You can read more about this at the [File Formats](file-formats.md) page. If all record values are numbers, strings, etc., then converting back and forth between CSV and JSON is a matter of specifying input-format and output-format flags: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --json cat example.json -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --ijson --ocsv cat example.json -GENMD_EOF +GENMD-EOF However, if JSON data has map-valued or array-valued fields, Miller gives you choices on how to convert these to CSV columns. 
For example, here's some JSON data with map-valued fields: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/server-log.json -GENMD_EOF +GENMD-EOF We can convert this to CSV, or other tabular formats: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --ijson --ocsv cat data/server-log.json -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --ijson --oxtab cat data/server-log.json -GENMD_EOF +GENMD-EOF These transformations are reversible: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --ijson --oxtab cat data/server-log.json | mlr --ixtab --ojson cat -GENMD_EOF +GENMD-EOF See the [flatten/unflatten page](flatten-unflatten.md) for more information. @@ -319,13 +319,13 @@ Often we want to print output to the screen. Miller does this by default, as we' Sometimes, though, we want to print output to another file. Just use `> outputfilenamegoeshere` at the end of your command: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr --icsv --opprint cat example.csv > newfile.csv # Output goes to the new file; # nothing is printed to the screen. 
-GENMD_EOF +GENMD-EOF -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE cat newfile.csv color shape flag index quantity rate yellow triangle true 11 43.6498 9.8870 @@ -338,15 +338,15 @@ purple triangle false 65 80.1405 5.8240 yellow circle true 73 63.9785 4.2370 yellow circle true 87 63.5058 8.3350 purple square false 91 72.3735 8.2430 -GENMD_EOF +GENMD-EOF Other times we just want our files to be **changed in-place**: just use `mlr -I`: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE cp example.csv newfile.txt -GENMD_EOF +GENMD-EOF -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE cat newfile.txt color,shape,flag,index,quantity,rate yellow,triangle,true,11,43.6498,9.8870 @@ -359,13 +359,13 @@ purple,triangle,false,65,80.1405,5.8240 yellow,circle,true,73,63.9785,4.2370 yellow,circle,true,87,63.5058,8.3350 purple,square,false,91,72.3735,8.2430 -GENMD_EOF +GENMD-EOF -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr -I --csv sort -f shape newfile.txt -GENMD_EOF +GENMD-EOF -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE cat newfile.txt color,shape,flag,index,quantity,rate red,circle,true,16,13.8103,2.9010 @@ -378,30 +378,30 @@ purple,square,false,91,72.3735,8.2430 yellow,triangle,true,11,43.6498,9.8870 purple,triangle,false,51,81.2290,8.5910 purple,triangle,false,65,80.1405,5.8240 -GENMD_EOF +GENMD-EOF Also using `mlr -I` you can bulk-operate on lots of files: e.g.: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr -I --csv cut -x -f unwanted_column_name *.csv -GENMD_EOF +GENMD-EOF If you like, you can first copy off your original data somewhere else, before doing in-place operations. 
Lastly, using `tee` within `put`, you can split your input data into separate files per one or more field names: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv --from example.csv put -q 'tee > $shape.".csv", $*' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat circle.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat square.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat triangle.csv -GENMD_EOF +GENMD-EOF diff --git a/docs/src/build.md b/docs/src/build.md index 63db3643f..8c543aaa6 100644 --- a/docs/src/build.md +++ b/docs/src/build.md @@ -16,7 +16,7 @@ Quick links: # Building from source -Please also see [Installation](installation.md) for information about pre-built executables. +Please also see [Installation](installing-miller.md) for information about pre-built executables. You will need to first install Go version 1.15 or higher: please see [https://go.dev](https://go.dev). @@ -30,15 +30,18 @@ Two-clause BSD license [https://github.com/johnkerl/miller/blob/master/LICENSE.t * `tar zxvf mlr-i.j.k.tar.gz` * `cd mlr-i.j.k` * `cd go` -* `./build` creates the `go/mlr` executable and runs regression tests -* `go build mlr.go` creates the `go/mlr` executable without running regression tests +* `make` creates the `go/mlr` (or `go\mlr.exe` on Windows) executable +* `make check` runs tests +* `make install` installs the `mlr` executable and the `mlr` manpage +* On Windows, if you don't have `make`, then you can do `choco install make` -- or, alternatively: + * `cd go` + * `go build` creates `mlr.exe` + * `go test -v mlr\src\...` and `go test -v` runs tests ## From git clone * `git clone https://github.com/johnkerl/miller` -* `cd miller/go` -* `./build` creates the `go/mlr` executable and runs regression tests -* `go build mlr.go` creates the `go/mlr` executable without running regression tests +* `make`, `make check`, and `make install` as above ## In case of problems @@ -66,10 +69,8 @@ In this example I am using 
version 6.1.0 to 6.2.0; of course that will change fo * Update version found in `mlr --version` and `man mlr`: * Edit `go/src/version/version.go` from `6.1.0-dev` to `6.2.0`. - * `cd ../docs` - * `export PATH=../go:$PATH` - * `make html` - * The ordering is important: the first build creates `mlr`; the second runs `mlr` to create `manpage.txt`; the third includes `manpage.txt` into one of its outputs. + * `sh build-go-src-test-man-doc.sh` + * The ordering in this script is important: the first build creates `mlr`; the second runs `mlr` to create `manpage.txt`; the third includes `manpage.txt` into one of its outputs. * Commit and push. * Create the release tarball and SRPM: @@ -88,11 +89,9 @@ In this example I am using version 6.1.0 to 6.2.0; of course that will change fo * Check the release-specific docs: * Look at [https://miller.readthedocs.io](https://miller.readthedocs.io) for new-version docs, after a few minutes' propagation time. Note this won't work until Miller 6 is released. - * ISP-push to [https://johnkerl.org/miller6](https://johnkerl.org/miller6). (Until release: this is a temporary substitute for readthedocs.) * Notify: - * Only do these once Miller 6 is released: * Submit `brew` pull request; notify any other distros which don't appear to have autoupdated since the previous release (notes below) * Similarly for `macports`: [https://github.com/macports/macports-ports/blob/master/textproc/miller/Portfile](https://github.com/macports/macports-ports/blob/master/textproc/miller/Portfile) * Social-media updates. diff --git a/docs/src/build.md.in b/docs/src/build.md.in index 0c0eeb07a..557ea6914 100644 --- a/docs/src/build.md.in +++ b/docs/src/build.md.in @@ -1,6 +1,6 @@ # Building from source -Please also see [Installation](installation.md) for information about pre-built executables. +Please also see [Installation](installing-miller.md) for information about pre-built executables. 
You will need to first install Go version 1.15 or higher: please see [https://go.dev](https://go.dev). @@ -14,15 +14,18 @@ Two-clause BSD license [https://github.com/johnkerl/miller/blob/master/LICENSE.t * `tar zxvf mlr-i.j.k.tar.gz` * `cd mlr-i.j.k` * `cd go` -* `./build` creates the `go/mlr` executable and runs regression tests -* `go build mlr.go` creates the `go/mlr` executable without running regression tests +* `make` creates the `go/mlr` (or `go\mlr.exe` on Windows) executable +* `make check` runs tests +* `make install` installs the `mlr` executable and the `mlr` manpage +* On Windows, if you don't have `make`, then you can do `choco install make` -- or, alternatively: + * `cd go` + * `go build` creates `mlr.exe` + * `go test -v mlr\src\...` and `go test -v` runs tests ## From git clone * `git clone https://github.com/johnkerl/miller` -* `cd miller/go` -* `./build` creates the `go/mlr` executable and runs regression tests -* `go build mlr.go` creates the `go/mlr` executable without running regression tests +* `make`, `make check`, and `make install` as above ## In case of problems @@ -50,10 +53,8 @@ In this example I am using version 6.1.0 to 6.2.0; of course that will change fo * Update version found in `mlr --version` and `man mlr`: * Edit `go/src/version/version.go` from `6.1.0-dev` to `6.2.0`. - * `cd ../docs` - * `export PATH=../go:$PATH` - * `make html` - * The ordering is important: the first build creates `mlr`; the second runs `mlr` to create `manpage.txt`; the third includes `manpage.txt` into one of its outputs. + * `sh build-go-src-test-man-doc.sh` + * The ordering in this script is important: the first build creates `mlr`; the second runs `mlr` to create `manpage.txt`; the third includes `manpage.txt` into one of its outputs. * Commit and push. 
* Create the release tarball and SRPM: @@ -72,16 +73,14 @@ In this example I am using version 6.1.0 to 6.2.0; of course that will change fo * Check the release-specific docs: * Look at [https://miller.readthedocs.io](https://miller.readthedocs.io) for new-version docs, after a few minutes' propagation time. Note this won't work until Miller 6 is released. - * ISP-push to [https://johnkerl.org/miller6](https://johnkerl.org/miller6). (Until release: this is a temporary substitute for readthedocs.) * Notify: - * Only do these once Miller 6 is released: * Submit `brew` pull request; notify any other distros which don't appear to have autoupdated since the previous release (notes below) * Similarly for `macports`: [https://github.com/macports/macports-ports/blob/master/textproc/miller/Portfile](https://github.com/macports/macports-ports/blob/master/textproc/miller/Portfile) * Social-media updates. -GENMD_CARDIFY +GENMD-CARDIFY # brew notes: git remote add upstream https://github.com/Homebrew/homebrew-core # one-time setup only git fetch upstream @@ -98,7 +97,7 @@ git add Formula/miller.rb git commit -m 'miller 6.1.0' git push -u origin miller-6.1.0 (submit the pull request) -GENMD_EOF +GENMD-EOF * Afterwork: diff --git a/docs/src/contributing.md b/docs/src/contributing.md index 97634a34d..737dc3cfe 100644 --- a/docs/src/contributing.md +++ b/docs/src/contributing.md @@ -26,27 +26,16 @@ Pre-release Miller documentation is at [https://github.com/johnkerl/miller/tree/ Instructions for modifying, viewing, and submitting PRs for these are in the [docs/README.md](https://github.com/johnkerl/miller/blob/main/docs/README.md). -While Miller 6 is in pre-release, these docs are not viewable at -[https://miller.readthedocs.io](https://miller.readthedocs.io) which shows Miller 5 docs. -For now, I'll push Miller-6 docs to my ISP space at -[https://johnkerl.org/miller6](https://johnkerl.org/miller6) after your PR is merged. 
- - ## Testing As of Miller-6's current pre-release status, the best way to test is to either build from source via [Building from source](build.md), or by getting a recent binary at [https://github.com/johnkerl/miller/actions](https://github.com/johnkerl/miller/actions), then click latest build, then *Artifacts*. Then simply use Miller for whatever you do, and create an issue at [https://github.com/johnkerl/miller/issues](https://github.com/johnkerl/miller/issues). -Do note that as of mid-2021 a few things have not been ported to Miller 6 -- most notably, including localtime DSL functions and other issues. - ## Feature development Issues: [https://github.com/johnkerl/miller/issues](https://github.com/johnkerl/miller/issues) diff --git a/docs/src/contributing.md.in b/docs/src/contributing.md.in index 49a81f57b..f2c964948 100644 --- a/docs/src/contributing.md.in +++ b/docs/src/contributing.md.in @@ -10,27 +10,16 @@ Pre-release Miller documentation is at [https://github.com/johnkerl/miller/tree/ Instructions for modifying, viewing, and submitting PRs for these are in the [docs/README.md](https://github.com/johnkerl/miller/blob/main/docs/README.md). -While Miller 6 is in pre-release, these docs are not viewable at -[https://miller.readthedocs.io](https://miller.readthedocs.io) which shows Miller 5 docs. -For now, I'll push Miller-6 docs to my ISP space at -[https://johnkerl.org/miller6](https://johnkerl.org/miller6) after your PR is merged. - - ## Testing As of Miller-6's current pre-release status, the best way to test is to either build from source via [Building from source](build.md), or by getting a recent binary at [https://github.com/johnkerl/miller/actions](https://github.com/johnkerl/miller/actions), then click latest build, then *Artifacts*. Then simply use Miller for whatever you do, and create an issue at [https://github.com/johnkerl/miller/issues](https://github.com/johnkerl/miller/issues). 
-Do note that as of mid-2021 a few things have not been ported to Miller 6 -- most notably, including localtime DSL functions and other issues. - ## Feature development Issues: [https://github.com/johnkerl/miller/issues](https://github.com/johnkerl/miller/issues) diff --git a/docs/src/csv-with-and-without-headers.md.in b/docs/src/csv-with-and-without-headers.md.in index b7e38db9b..0d9ece9e7 100644 --- a/docs/src/csv-with-and-without-headers.md.in +++ b/docs/src/csv-with-and-without-headers.md.in @@ -4,37 +4,37 @@ Sometimes we get CSV files which lack a header. For example, [data/headerless.csv](./data/headerless.csv): -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/headerless.csv -GENMD_EOF +GENMD-EOF You can use Miller to add a header. The `--implicit-csv-header` applies positionally indexed labels: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv --implicit-csv-header cat data/headerless.csv -GENMD_EOF +GENMD-EOF Following that, you can rename the positionally indexed labels to names with meaning for your context. For example: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv --implicit-csv-header label name,age,status data/headerless.csv -GENMD_EOF +GENMD-EOF Likewise, if you need to produce CSV which is lacking its header, you can pipe Miller's output to the system command `sed 1d`, or you can use Miller's `--headerless-csv-output` option: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND head -5 data/colored-shapes.dkvp | mlr --ocsv cat -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND head -5 data/colored-shapes.dkvp | mlr --ocsv --headerless-csv-output cat -GENMD_EOF +GENMD-EOF Lastly, often we say "CSV" or "TSV" when we have positionally indexed data in columns which are separated by commas or tabs, respectively. In this case it's perhaps simpler to **just use NIDX format** which was designed for this purpose. (See also [File Formats](file-formats.md).) 
For example: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --inidx --ifs comma --oxtab cut -f 1,3 data/headerless.csv -GENMD_EOF +GENMD-EOF ## Headerless CSV with duplicate field values @@ -43,16 +43,16 @@ However, lots of folks think of CSV data -- comma-separated values -- as just th Here's some sample CSV data which is values-only, i.e. headerless: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/nas.csv -GENMD_EOF +GENMD-EOF There are clearly nine fields here, but if we try to have Miller parse it as CSV, we see there are fewer than nine columns: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv cat data/nas.csv -GENMD_EOF +GENMD-EOF What happened? @@ -67,36 +67,36 @@ values are being seen as duplicate keys. One solution is to use `--implicit-csv-header`, or its shorter alias `--hi`: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv --hi cat data/nas.csv -GENMD_EOF +GENMD-EOF Another solution is to use [NIDX format](file-formats.md#nidx-index-numbered-toolkit-style): -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --inidx --ifs comma --ocsv cat data/nas.csv -GENMD_EOF +GENMD-EOF Either way, since there is no explicit header, fields are named `1` through `9`. We can use the [label verb](reference-verbs.md#label) to apply more meaningful namees: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv --hi cat then label xsn,ysn,x,y,t,a,e29,e31,e32 data/nas.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --inidx --ifs comma --ocsv cat then label xsn,ysn,x,y,t,a,e29,e31,e32 data/nas.csv -GENMD_EOF +GENMD-EOF ## Regularizing ragged CSV Miller handles [RFC-4180-compliant CSV](file-formats.md#csvtsvasvusvetc): in particular, it's an error if the number of data fields in a given data line don't match the number of header lines. But in the event that you have a CSV file in which some lines have less than the full number of fields, you can use Miller to pad them out. The trick is to use NIDX format, for which each line stands on its own without respect to a header line. 
-GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/ragged.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --from data/ragged.csv --fs comma --nidx put ' @maxnf = max(@maxnf, NF); @nf = NF; @@ -105,17 +105,17 @@ mlr --from data/ragged.csv --fs comma --nidx put ' $[@nf] = "" } ' -GENMD_EOF +GENMD-EOF or, more simply, -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --from data/ragged.csv --fs comma --nidx put ' @maxnf = max(@maxnf, NF); while(NF < @maxnf) { $[NF+1] = ""; } ' -GENMD_EOF +GENMD-EOF See also the [record-heterogeneity page](record-heterogeneity.md). diff --git a/docs/src/customization.md.in b/docs/src/customization.md.in index 45509c89f..9a1d2894b 100644 --- a/docs/src/customization.md.in +++ b/docs/src/customization.md.in @@ -4,29 +4,29 @@ Suppose you always use CSV files. Then instead of always having to type `--csv` as in -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr --csv cut -x -f extra mydata.csv -GENMD_EOF +GENMD-EOF -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr --csv sort -n id mydata.csv -GENMD_EOF +GENMD-EOF and so on, you can instead put the following into your `$HOME/.mlrrc`: -GENMD_CARDIFY +GENMD-CARDIFY --csv -GENMD_EOF +GENMD-EOF Then you can just type things like -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr cut -x -f extra mydata.csv -GENMD_EOF +GENMD-EOF -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr sort -n id mydata.csv -GENMD_EOF +GENMD-EOF and the `--csv` part will automatically be understood. If you do want to process, say, a JSON file then `mlr --json ...` at the command line will still override the defaults you've placed in your `.mlrrc`. @@ -46,7 +46,7 @@ and the `--csv` part will automatically be understood. 
If you do want to process Here is an example `.mlrrc` file: -GENMD_INCLUDE_ESCAPED(sample_mlrrc) +GENMD-INCLUDE-ESCAPED(sample_mlrrc) ## Where to put your .mlrrc diff --git a/docs/src/data-cleaning-examples.md.in b/docs/src/data-cleaning-examples.md.in index 33c429186..cf0234e44 100644 --- a/docs/src/data-cleaning-examples.md.in +++ b/docs/src/data-cleaning-examples.md.in @@ -2,36 +2,36 @@ Here are some ways to use the type-checking options as described in the [Type-checking page](reference-dsl-variables.md#type-checking). Suppose you have the following data file, with inconsistent typing for boolean. (Also imagine that, for the sake of discussion, we have a million-line file rather than a four-line file, so we can't see it all at once and some automation is called for.) -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/het-bool.csv -GENMD_EOF +GENMD-EOF One option is to coerce everything to boolean, or integer: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint put '$reachable = boolean($reachable)' data/het-bool.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint put '$reachable = int(boolean($reachable))' data/het-bool.csv -GENMD_EOF +GENMD-EOF A second option is to flag badly formatted data within the output stream: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint put '$format_ok = is_string($reachable)' data/het-bool.csv -GENMD_EOF +GENMD-EOF Or perhaps to flag badly formatted data outside the output stream: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint put ' if (!is_string($reachable)) {eprint "Malformed at NR=".NR} ' data/het-bool.csv -GENMD_EOF +GENMD-EOF A third way is to abort the process on first instance of bad data: -GENMD_RUN_COMMAND_STDERR_ONLY +GENMD-RUN-COMMAND-STDERR-ONLY mlr --csv put '$reachable = asserting_string($reachable)' data/het-bool.csv -GENMD_EOF +GENMD-EOF diff --git a/docs/src/data-diving-examples.md.in b/docs/src/data-diving-examples.md.in index 67a96ffd0..2b63c97a1 100644 
--- a/docs/src/data-diving-examples.md.in +++ b/docs/src/data-diving-examples.md.in @@ -6,56 +6,56 @@ The [flins.csv](data/flins.csv) file is some sample data obtained from [https:// Vertical-tabular format is good for a quick look at CSV data layout -- seeing what columns you have to work with, as this is a file big enough that we can't just see it on a single screenful: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND wc -l data/flins.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2x --from data/flins.csv head -n 2 -GENMD_EOF +GENMD-EOF A few simple queries: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p --from data/flins.csv count-distinct -f county | head -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p --from data/flins.csv count-distinct -f line -GENMD_EOF +GENMD-EOF Categorization of total insured value: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2x --from data/flins.csv stats1 -a min,mean,max -f tiv_2012 -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p --from data/flins.csv \ stats1 -a min,mean,max -f tiv_2012 -g construction,line -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2x --from data/flins.csv \ stats1 -a p0,p10,p50,p90,p95,p99,p100 -f hu_site_deductible -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p --from data/flins.csv \ stats1 -a p95,p99,p100 -f hu_site_deductible -g county \ then sort -f county | head -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2x --from data/flins.csv \ stats2 -a corr,linreg-ols,r2 -f tiv_2011,tiv_2012 -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2x --from data/flins.csv --ofmt '%.4f' \ stats2 -a corr,linreg-ols,r2 -f tiv_2011,tiv_2012 -g county \ then head -n 5 -GENMD_EOF +GENMD-EOF ## Color/shape data @@ -71,50 +71,50 @@ The [data/colored-shapes.dkvp](data/colored-shapes.dkvp) file is some sample dat Peek at the data: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND wc -l 
data/colored-shapes.dkvp -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND head -n 6 data/colored-shapes.dkvp | mlr --opprint cat -GENMD_EOF +GENMD-EOF Look at uncategorized stats (using [creach](https://github.com/johnkerl/scripts/blob/master/fundam/creach) for spacing). Here it looks reasonable that `u` is unit-uniform; something's up with `v` but we can't yet see what: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --oxtab stats1 -a min,mean,max -f flag,u,v data/colored-shapes.dkvp | creach 3 -GENMD_EOF +GENMD-EOF The histogram shows the different distribution of 0/1 flags: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint histogram -f flag,u,v --lo -0.1 --hi 1.1 --nbins 12 data/colored-shapes.dkvp -GENMD_EOF +GENMD-EOF Look at univariate stats by color and shape. In particular, color-dependent flag probabilities pop out, aligning with their original Bernoulli probablities from the data-generator script: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint stats1 -a min,mean,max -f flag,u,v -g color \ then sort -f color \ data/colored-shapes.dkvp -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint stats1 -a min,mean,max -f flag,u,v -g shape \ then sort -f shape \ data/colored-shapes.dkvp -GENMD_EOF +GENMD-EOF Look at bivariate stats by color and shape. 
In particular, `u,v` pairwise correlation for red circles pops out: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint --right stats2 -a corr -f u,v,w,x data/colored-shapes.dkvp -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint --right \ stats2 -a corr -f u,v,w,x -g color,shape then sort -nr u_v_corr \ data/colored-shapes.dkvp -GENMD_EOF +GENMD-EOF diff --git a/docs/src/date-time-examples.md.in b/docs/src/date-time-examples.md.in index 14ac00fb8..4c37e5c14 100644 --- a/docs/src/date-time-examples.md.in +++ b/docs/src/date-time-examples.md.in @@ -4,17 +4,17 @@ Given input like -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat dates.csv -GENMD_EOF +GENMD-EOF we can use [strptime](reference-verbs.md#strptime) to parse the date field into seconds-since-epoch and then do numeric comparisons. Simply match your input dataset's date-formatting to the [strptime](reference-verbs.md#strptime) format-string. For example: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv filter ' strptime($date, "%Y-%m-%d") > strptime("2018-03-03", "%Y-%m-%d") ' dates.csv -GENMD_EOF +GENMD-EOF Caveat: localtime-handling in timezones with DST is still a work in progress; see [https://github.com/johnkerl/miller/issues/170](https://github.com/johnkerl/miller/issues/170) . See also [https://github.com/johnkerl/miller/issues/208](https://github.com/johnkerl/miller/issues/208) -- thanks @aborruso! @@ -22,40 +22,40 @@ Caveat: localtime-handling in timezones with DST is still a work in progress; se Suppose you have some date-stamped data which may (or may not) be missing entries for one or more dates: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND head -n 10 data/miss-date.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND wc -l data/miss-date.csv -GENMD_EOF +GENMD-EOF Since there are 1372 lines in the data file, some automation is called for. 
To find the missing dates, you can convert the dates to seconds since the epoch using `strptime`, then compute adjacent differences (the `cat -n` simply inserts record-counters): -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --from data/miss-date.csv --icsv \ cat -n \ then put '$datestamp = strptime($date, "%Y-%m-%d")' \ then step -a delta -f datestamp \ | head -GENMD_EOF +GENMD-EOF Then, filter for adjacent difference not being 86400 (the number of seconds in a day): -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --from data/miss-date.csv --icsv \ cat -n \ then put '$datestamp = strptime($date, "%Y-%m-%d")' \ then step -a delta -f datestamp \ then filter '$datestamp_delta != 86400 && $n != 1' -GENMD_EOF +GENMD-EOF Given this, it's now easy to see where the gaps are: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr cat -n then filter '$n >= 770 && $n <= 780' data/miss-date.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr cat -n then filter '$n >= 1115 && $n <= 1125' data/miss-date.csv -GENMD_EOF +GENMD-EOF diff --git a/docs/src/dkvp-examples.md.in b/docs/src/dkvp-examples.md.in index 43afd7a41..6e56c957d 100644 --- a/docs/src/dkvp-examples.md.in +++ b/docs/src/dkvp-examples.md.in @@ -4,46 +4,46 @@ Here are the I/O routines: -GENMD_INCLUDE_ESCAPED(polyglot-dkvp-io/dkvp_io.py) +GENMD-INCLUDE-ESCAPED(polyglot-dkvp-io/dkvp_io.py) And here is an example using them: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat polyglot-dkvp-io/example.py -GENMD_EOF +GENMD-EOF Run as-is: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND python polyglot-dkvp-io/example.py < data/small -GENMD_EOF +GENMD-EOF Run as-is, then pipe to Miller for pretty-printing: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND python polyglot-dkvp-io/example.py < data/small | mlr --opprint cat -GENMD_EOF +GENMD-EOF ## DKVP I/O in Ruby Here are the I/O routines: -GENMD_INCLUDE_ESCAPED(polyglot-dkvp-io/dkvp_io.rb) +GENMD-INCLUDE-ESCAPED(polyglot-dkvp-io/dkvp_io.rb) And here is an example using them: -GENMD_RUN_COMMAND 
+GENMD-RUN-COMMAND cat polyglot-dkvp-io/example.rb -GENMD_EOF +GENMD-EOF Run as-is: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND ruby -I./polyglot-dkvp-io polyglot-dkvp-io/example.rb data/small -GENMD_EOF +GENMD-EOF Run as-is, then pipe to Miller for pretty-printing: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND ruby -I./polyglot-dkvp-io polyglot-dkvp-io/example.rb data/small | mlr --opprint cat -GENMD_EOF +GENMD-EOF diff --git a/docs/src/file-formats.md.in b/docs/src/file-formats.md.in index 82b512224..a5ff1ca16 100644 --- a/docs/src/file-formats.md.in +++ b/docs/src/file-formats.md.in @@ -6,9 +6,9 @@ Additionally, Miller gives you the option of including comments within your data ## Examples -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr help file-formats -GENMD_EOF +GENMD-EOF ## CSV/TSV/ASV/USV/etc. @@ -67,39 +67,39 @@ you. An **array of single-level objects** is, quite simply, **a table**: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --json head -n 2 then cut -f color,shape data/json-example-1.json -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --json head -n 2 then cut -f color,u,v data/json-example-1.json -GENMD_EOF +GENMD-EOF Single-level JSON data goes back and forth between JSON and tabular formats in the direct way: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --ijson --opprint head -n 2 then cut -f color,u,v data/json-example-1.json -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --ijson --opprint cat data/json-example-1.json -GENMD_EOF +GENMD-EOF ### Nested JSON objects Additionally, Miller can **tabularize nested objects by concatentating keys**. 
If your processing has input as well as output in JSON format, JSON structure is preserved throughout the processing: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --json --jvstack head -n 2 data/json-example-2.json -GENMD_EOF +GENMD-EOF But if the input format is JSON and the output format is not (or vice versa) then key-concatenation applies: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --ijson --opprint head -n 4 data/json-example-2.json -GENMD_EOF +GENMD-EOF This is discussed in more detail on the page [Flatten/unflatten: JSON vs. tabular formats](flatten-unflatten.md). @@ -128,13 +128,13 @@ decode these in Miller. Miller's pretty-print format is like CSV, but column-aligned. For example, compare -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --ocsv cat data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint cat data/small -GENMD_EOF +GENMD-EOF Note that while Miller is a line-at-a-time processor and retains input lines in memory only where necessary (e.g. for sort), pretty-print output requires it to accumulate all input lines (so that it can compute maximum column widths) before producing any output. This has two consequences: (a) pretty-print output won't work on `tail -f` contexts, where Miller will be waiting for an end-of-file marker which never arrives; (b) pretty-print output for large files is constrained by available machine memory. @@ -142,17 +142,17 @@ See [Record Heterogeneity](record-heterogeneity.md) for how Miller handles chang For output only (this isn't supported in the input-scanner as of 5.0.0) you can use `--barred` with pprint output format: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint --barred cat data/small -GENMD_EOF +GENMD-EOF ## Markdown tabular Markdown format looks like this: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --omd cat data/small -GENMD_EOF +GENMD-EOF which renders like this when dropped into various web tools (e.g. 
github comments): @@ -165,7 +165,7 @@ As of Miller 4.3.0, markdown format is supported only for output, not input. This is perhaps most useful for looking a very wide and/or multi-column data which causes line-wraps on the screen (but see also [ngrid](https://github.com/twosigma/ngrid/) for an entirely different, very powerful option). Namely: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE $ grep -v '^#' /etc/passwd | head -n 6 | mlr --nidx --fs : --opprint cat 1 2 3 4 5 6 7 nobody * -2 -2 Unprivileged User /var/empty /usr/bin/false @@ -174,9 +174,9 @@ daemon * 1 1 System Services /var/root /usr/bin/false _uucp * 4 4 Unix to Unix Copy Protocol /var/spool/uucp /usr/sbin/uucico _taskgated * 13 13 Task Gate Daemon /var/empty /usr/bin/false _networkd * 24 24 Network Services /var/networkd /usr/bin/false -GENMD_EOF +GENMD-EOF -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE $ grep -v '^#' /etc/passwd | head -n 2 | mlr --nidx --fs : --oxtab cat 1 nobody 2 * @@ -193,9 +193,9 @@ $ grep -v '^#' /etc/passwd | head -n 2 | mlr --nidx --fs : --oxtab cat 5 System Administrator 6 /var/root 7 /bin/sh -GENMD_EOF +GENMD-EOF -GENMD_CARDIFY_HIGHLIGHT_THREE +GENMD-CARDIFY-HIGHLIGHT-THREE $ grep -v '^#' /etc/passwd | head -n 2 | \ mlr --nidx --fs : --ojson --jvstack --jlistwrap \ label name,password,uid,gid,gecos,home_dir,shell @@ -219,45 +219,45 @@ $ grep -v '^#' /etc/passwd | head -n 2 | \ "shell": "/bin/sh" } ] -GENMD_EOF +GENMD-EOF ## DKVP: Key-value pairs Miller's default file format is DKVP, for **delimited key-value pairs**. Example: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr cat data/small -GENMD_EOF +GENMD-EOF Such data are easy to generate, e.g. in Ruby with -GENMD_CARDIFY +GENMD-CARDIFY puts "host=#{hostname},seconds=#{t2-t1},message=#{msg}" -GENMD_EOF +GENMD-EOF -GENMD_CARDIFY +GENMD-CARDIFY puts mymap.collect{|k,v| "#{k}=#{v}"}.join(',') -GENMD_EOF +GENMD-EOF or `print` statements in various languages, e.g. 
-GENMD_CARDIFY +GENMD-CARDIFY echo "type=3,user=$USER,date=$date\n"; -GENMD_EOF +GENMD-EOF -GENMD_CARDIFY +GENMD-CARDIFY logger.log("type=3,user=$USER,date=$date\n"); -GENMD_EOF +GENMD-EOF Fields lacking an IPS will have positional index (starting at 1) used as the key, as in NIDX format. For example, `dish=7,egg=8,flint` is parsed as `"dish" => "7", "egg" => "8", "3" => "flint"` and `dish,egg,flint` is parsed as `"1" => "dish", "2" => "egg", "3" => "flint"`. As discussed in [Record Heterogeneity](record-heterogeneity.md), Miller handles changes of field names within the same data stream. But using DKVP format this is particularly natural. One of my favorite use-cases for Miller is in application/server logs, where I log all sorts of lines such as -GENMD_CARDIFY +GENMD-CARDIFY resource=/path/to/file,loadsec=0.45,ok=true record_count=100, resource=/path/to/file resource=/some/other/path,loadsec=0.97,ok=false -GENMD_EOF +GENMD-EOF etc. and I just log them as needed. Then later, I can use `grep`, `mlr --opprint group-like`, etc. to analyze my logs. @@ -272,41 +272,41 @@ This recapitulates Unix-toolkit behavior. 
Example with index-numbered output: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --onidx --ofs ' ' cat data/small -GENMD_EOF +GENMD-EOF Example with index-numbered input: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/mydata.txt -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --inidx --ifs ' ' --odkvp cat data/mydata.txt -GENMD_EOF +GENMD-EOF Example with index-numbered input and output: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/mydata.txt -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --nidx --fs ' ' --repifs cut -f 2,3 data/mydata.txt -GENMD_EOF +GENMD-EOF ## Data-conversion keystroke-savers While you can do format conversion using `mlr --icsv --ojson cat myfile.csv`, there are also keystroke-savers for this purpose, such as `mlr --c2j cat myfile.csv`. For a complete list: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr help format-conversion -GENMD_EOF +GENMD-EOF -
- -Quick links: -  -Flags -  -Verbs -  -Functions -  -Glossary -  -Release docs - -
-# Installation - -Note: - -* Miller 6 is in pre-release, and is described by the docs you're reading ([https://johnkerl.org/miller6](https://johnkerl.org/miller6)). -* Miller 5 is released, and is described by [https://miller.readthedocs.io](https://miller.readthedocs.io). Package managers will currently give you Miller 5. - -## Prebuilt executables via package managers (Miller 5 only) - -[Homebrew](https://brew.sh/) installation support for OS X is available via - -
-brew update && brew install miller
-
- -... and also via [MacPorts](https://www.macports.org/): - -
-sudo port selfupdate && sudo port install miller
-
- -Note that Homebrew is available for Linux as well: [https://docs.brew.sh/linux](https://docs.brew.sh/linux). - -You may already have the `mlr` executable available in your platform's package manager on NetBSD, Debian Linux, Ubuntu Xenial and upward, Arch Linux, or perhaps other distributions. For example, on various Linux distributions you might do one of the following: - -
-sudo apt-get install miller
-
- -
-sudo apt install miller
-
- -
-sudo yum install miller
-
- -On Windows, Miller is available via [Chocolatey](https://chocolatey.org/): - -
-choco install miller
-
- -## Prebuilt executables via GitHub per release (Miller 5 only) - -Please see [https://github.com/johnkerl/miller/releases](https://github.com/johnkerl/miller/releases) where there are builds for OS X Yosemite, Linux x86-64 (dynamically linked), and Windows. - -## Prebuilt executables via GitHub per commit (Miller 6) - -Miller is [autobuilt for **Linux**, **MacOS**, and **Windows** using **GitHub Actions** on every commit](https://github.com/johnkerl/miller/actions): select the latest build and click _Artifacts_. (These are retained for 5 days after each commit.) - -## Building from source (Miller 6) - -Please see [Building from source](build.md). diff --git a/docs/src/installation.md.in b/docs/src/installation.md.in deleted file mode 100644 index 1fdf5435b..000000000 --- a/docs/src/installation.md.in +++ /dev/null @@ -1,54 +0,0 @@ -# Installation - -Note: - -* Miller 6 is in pre-release, and is described by the docs you're reading ([https://johnkerl.org/miller6](https://johnkerl.org/miller6)). -* Miller 5 is released, and is described by [https://miller.readthedocs.io](https://miller.readthedocs.io). Package managers will currently give you Miller 5. - -## Prebuilt executables via package managers (Miller 5 only) - -[Homebrew](https://brew.sh/) installation support for OS X is available via - -GENMD_CARDIFY_HIGHLIGHT_ONE -brew update && brew install miller -GENMD_EOF - -... and also via [MacPorts](https://www.macports.org/): - -GENMD_CARDIFY_HIGHLIGHT_ONE -sudo port selfupdate && sudo port install miller -GENMD_EOF - -Note that Homebrew is available for Linux as well: [https://docs.brew.sh/linux](https://docs.brew.sh/linux). - -You may already have the `mlr` executable available in your platform's package manager on NetBSD, Debian Linux, Ubuntu Xenial and upward, Arch Linux, or perhaps other distributions. 
For example, on various Linux distributions you might do one of the following: - -GENMD_CARDIFY_HIGHLIGHT_ONE -sudo apt-get install miller -GENMD_EOF - -GENMD_CARDIFY_HIGHLIGHT_ONE -sudo apt install miller -GENMD_EOF - -GENMD_CARDIFY_HIGHLIGHT_ONE -sudo yum install miller -GENMD_EOF - -On Windows, Miller is available via [Chocolatey](https://chocolatey.org/): - -GENMD_CARDIFY_HIGHLIGHT_ONE -choco install miller -GENMD_EOF - -## Prebuilt executables via GitHub per release (Miller 5 only) - -Please see [https://github.com/johnkerl/miller/releases](https://github.com/johnkerl/miller/releases) where there are builds for OS X Yosemite, Linux x86-64 (dynamically linked), and Windows. - -## Prebuilt executables via GitHub per commit (Miller 6) - -Miller is [autobuilt for **Linux**, **MacOS**, and **Windows** using **GitHub Actions** on every commit](https://github.com/johnkerl/miller/actions): select the latest build and click _Artifacts_. (These are retained for 5 days after each commit.) - -## Building from source (Miller 6) - -Please see [Building from source](build.md). diff --git a/docs/src/installing-miller.md b/docs/src/installing-miller.md index 1df5732b5..658746c4a 100644 --- a/docs/src/installing-miller.md +++ b/docs/src/installing-miller.md @@ -18,12 +18,12 @@ Quick links: You can install Miller for various platforms as follows. -* Miller 6 is in pre-release, and is described by the docs you're reading ([https://johnkerl.org/miller6](https://johnkerl.org/miller6)). +* Miller 6 is in pre-release. * You can get latest Miller 6 builds for Linux, MacOS, and Windows by visiting [https://github.com/johnkerl/miller/actions](https://github.com/johnkerl/miller/actions), selecting the latest build, and clicking _Artifacts_. (These are retained for 5 days after each commit.) * See also the [build page](build.md) if you prefer -- in particular, if your platform's package manager doesn't have the latest release. 
-* Miller 5 is released, and is described by [https://miller.readthedocs.io](https://miller.readthedocs.io). +* Miller 5 is released. * Linux: `yum install miller` or `apt-get install miller` depending on your flavor of Linux, or [Homebrew](https://docs.brew.sh/linux). - * MacOS: `brew install miller` or `port install miller` depending on your preference of [Homebrew](https://brew.sh) or [MacPorts](https://macports.org). + * MacOS: `brew update` and `brew install miller`, or `sudo port selfupdate` and `sudo port install miller`, depending on your preference of [Homebrew](https://brew.sh) or [MacPorts](https://macports.org). * Windows: `choco install miller` using [Chocolatey](https://chocolatey.org). As a first check, you should be able to run `mlr --version` at your system's command prompt and see something like the following: diff --git a/docs/src/installing-miller.md.in b/docs/src/installing-miller.md.in index 2899ddcd5..74ef229d5 100644 --- a/docs/src/installing-miller.md.in +++ b/docs/src/installing-miller.md.in @@ -2,29 +2,29 @@ You can install Miller for various platforms as follows. -* Miller 6 is in pre-release, and is described by the docs you're reading ([https://johnkerl.org/miller6](https://johnkerl.org/miller6)). +* Miller 6 is in pre-release. * You can get latest Miller 6 builds for Linux, MacOS, and Windows by visiting [https://github.com/johnkerl/miller/actions](https://github.com/johnkerl/miller/actions), selecting the latest build, and clicking _Artifacts_. (These are retained for 5 days after each commit.) * See also the [build page](build.md) if you prefer -- in particular, if your platform's package manager doesn't have the latest release. -* Miller 5 is released, and is described by [https://miller.readthedocs.io](https://miller.readthedocs.io). +* Miller 5 is released. * Linux: `yum install miller` or `apt-get install miller` depending on your flavor of Linux, or [Homebrew](https://docs.brew.sh/linux). 
- * MacOS: `brew install miller` or `port install miller` depending on your preference of [Homebrew](https://brew.sh) or [MacPorts](https://macports.org). + * MacOS: `brew update` and `brew install miller`, or `sudo port selfupdate` and `sudo port install miller`, depending on your preference of [Homebrew](https://brew.sh) or [MacPorts](https://macports.org). * Windows: `choco install miller` using [Chocolatey](https://chocolatey.org). As a first check, you should be able to run `mlr --version` at your system's command prompt and see something like the following: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --version -GENMD_EOF +GENMD-EOF As a second check, given [example.csv](./example.csv) you should be able to do -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv cat example.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint cat example.csv -GENMD_EOF +GENMD-EOF If you run into issues on these checks, please check out the resources on the [community page](community.md) for help. diff --git a/docs/src/internationalization.md.in b/docs/src/internationalization.md.in index d48aa226b..7be279572 100644 --- a/docs/src/internationalization.md.in +++ b/docs/src/internationalization.md.in @@ -9,18 +9,18 @@ Support for internationalization includes: * The [toupper](reference-dsl-builtin-functions.md#toupper), [tolower](reference-dsl-builtin-functions.md#tolower), and [capitalize](reference-dsl-builtin-functions.md#capitalize) DSL functions operate within the capabilities of the Go libraries. * While Miller's function names, verb names, online help, etc. are all in English, you can write field names, string literals, variable names, etc in UTF-8. 
-GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat παράδειγμα.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p filter '$σχήμα == "κύκλος"' παράδειγμα.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p sort -f σημαία παράδειγμα.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p put '$форма = toupper($форма); $длина = strlen($цвет)' пример.csv -GENMD_EOF +GENMD-EOF diff --git a/docs/src/keystroke-savers.md.in b/docs/src/keystroke-savers.md.in index d66d1e289..c73229eef 100644 --- a/docs/src/keystroke-savers.md.in +++ b/docs/src/keystroke-savers.md.in @@ -4,13 +4,13 @@ In our examples so far we've often made use of `mlr --icsv --opprint` or `mlr --icsv --ojson`. These are such frequently occurring patterns that they have short options like `--c2p` and `--c2j`: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p head -n 2 example.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2j head -n 2 example.csv -GENMD_EOF +GENMD-EOF You can get the full list [here](file-formats.md#data-conversion-keystroke-savers). @@ -18,19 +18,19 @@ You can get the full list [here](file-formats.md#data-conversion-keystroke-saver Already we saw that you can put the filename first using `--from`. 
When you're interacting with your data at the command line, this makes it easier to up-arrow and append to the previous command: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p --from example.csv sort -nr index then head -n 3 -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p --from example.csv sort -nr index then head -n 3 then cut -f shape,quantity -GENMD_EOF +GENMD-EOF If there's more than one input file, you can use `--mfrom`, then however many file names, then `--` to indicate the end of your input-file-name list: -GENMD_SHOW_COMMAND +GENMD-SHOW-COMMAND mlr --c2p --mfrom data/*.csv -- sort -n index -GENMD_EOF +GENMD-EOF ## Shortest flags for CSV, TSV, and JSON diff --git a/docs/src/log-processing-examples.md.in b/docs/src/log-processing-examples.md.in index ad7ab9871..3ad6c9593 100644 --- a/docs/src/log-processing-examples.md.in +++ b/docs/src/log-processing-examples.md.in @@ -10,47 +10,47 @@ Writing a program -- in any language whatsoever -- you can have it print out log Suppose your program has printed something like this [log.txt](./log.txt): -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat log.txt -GENMD_EOF +GENMD-EOF Each print statement simply contains local information: the current timestamp, whether a particular cache was hit or not, etc. 
Then using either the system `grep` command, or Miller's [having-fields verb](reference-verbs.md#having-fields), or the [is_present DSL function](reference-dsl-builtin-functions.md#is_present), we can pick out the parts we want and analyze them: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND grep op=cache log.txt \ | mlr --idkvp --opprint stats1 -a mean -f hit -g type then sort -f type -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --from log.txt --opprint \ filter 'is_present($batch_size)' \ then step -a delta -f time,num_filtered \ then sec2gmt time -GENMD_EOF +GENMD-EOF Alternatively, we can simply group the similar data for a better look: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint group-like log.txt -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint group-like then sec2gmt time log.txt -GENMD_EOF +GENMD-EOF ## Parsing log-file output This, of course, depends highly on what's in your log files. But, as an example, suppose you have log-file lines such as -GENMD_CARDIFY +GENMD-CARDIFY 2015-10-08 08:29:09,445 INFO com.company.path.to.ClassName @ [sometext] various/sorts/of data {& punctuation} hits=1 status=0 time=2.378 -GENMD_EOF +GENMD-EOF I prefer to pre-filter with `grep` and/or `sed` to extract the structured text, then hand that to Miller. Example: -GENMD_CARDIFY_HIGHLIGHT_THREE +GENMD-CARDIFY-HIGHLIGHT-THREE grep 'various sorts' *.log \ | sed 's/.*} //' \ | mlr --fs space --repifs --oxtab stats1 -a min,p10,p50,p90,max -f time -g status ... output here ... -GENMD_EOF +GENMD-EOF diff --git a/docs/src/manpage.md b/docs/src/manpage.md index 4ab2cb23e..631ec3588 100644 --- a/docs/src/manpage.md +++ b/docs/src/manpage.md @@ -40,7 +40,7 @@ SYNOPSIS example.csv Please see 'mlr help topics' for more information. 
Please also see - https://johnkerl.org/miller6 + https://miller.readthedocs.io DESCRIPTION @@ -1019,7 +1019,7 @@ VERBS Using 'any' higher-order function to see if $index is 10, 20, or 30: 'any([10,20,30], func(e) {return $index == e})' - See also https://johnkerl.org/miller6/reference-dsl for more context. + See also https://miller.readthedocs.io/reference-dsl for more context. flatten Usage: mlr flatten [options] @@ -1437,7 +1437,7 @@ VERBS end{emitf @min, @max} ' - See also https://johnkerl.org/miller6/reference-dsl for more context. + See also https://miller.readthedocs.io/reference-dsl for more context. regularize Usage: mlr regularize [options] @@ -2692,7 +2692,7 @@ KEYWORDS FOR PUT AND FILTER Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit > stderr, @*, "index1", "index2"' Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "grep somepattern", @*, "index1", "index2"' - Please see https://johnkerl.org/miller6://johnkerl.org/miller/doc for more information. + Please see https://miller.readthedocs.io://johnkerl.org/miller/doc for more information. emitf emitf: inserts non-indexed out-of-stream variable(s) side-by-side into the @@ -2720,7 +2720,7 @@ KEYWORDS FOR PUT AND FILTER Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern", @a, @b, @c' Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern > mytap.dat", @a, @b, @c' - Please see https://johnkerl.org/miller6://johnkerl.org/miller/doc for more information. + Please see https://miller.readthedocs.io://johnkerl.org/miller/doc for more information. emitp emitp: inserts an out-of-stream variable into the output record stream. @@ -2750,7 +2750,7 @@ KEYWORDS FOR PUT AND FILTER Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp > stderr, @*, "index1", "index2"' Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "grep somepattern", @*, "index1", "index2"' - Please see https://johnkerl.org/miller6://johnkerl.org/miller/doc for more information. 
+ Please see https://miller.readthedocs.io://johnkerl.org/miller/doc for more information. end end: defines a block of statements to be executed after input records @@ -2975,5 +2975,5 @@ SEE ALSO - 2021-11-05 MILLER(1) + 2021-11-06 MILLER(1) diff --git a/docs/src/manpage.md.in b/docs/src/manpage.md.in index 05b2fc97a..24d381881 100644 --- a/docs/src/manpage.md.in +++ b/docs/src/manpage.md.in @@ -2,4 +2,4 @@ This is simply a copy of what you should see on running `man mlr` at a command prompt, once Miller is installed on your system. -GENMD_INCLUDE_ESCAPED(manpage.txt) +GENMD-INCLUDE-ESCAPED(manpage.txt) diff --git a/docs/src/manpage.txt b/docs/src/manpage.txt index 085886266..085790da2 100644 --- a/docs/src/manpage.txt +++ b/docs/src/manpage.txt @@ -19,7 +19,7 @@ SYNOPSIS example.csv Please see 'mlr help topics' for more information. Please also see - https://johnkerl.org/miller6 + https://miller.readthedocs.io DESCRIPTION @@ -998,7 +998,7 @@ VERBS Using 'any' higher-order function to see if $index is 10, 20, or 30: 'any([10,20,30], func(e) {return $index == e})' - See also https://johnkerl.org/miller6/reference-dsl for more context. + See also https://miller.readthedocs.io/reference-dsl for more context. flatten Usage: mlr flatten [options] @@ -1416,7 +1416,7 @@ VERBS end{emitf @min, @max} ' - See also https://johnkerl.org/miller6/reference-dsl for more context. + See also https://miller.readthedocs.io/reference-dsl for more context. regularize Usage: mlr regularize [options] @@ -2671,7 +2671,7 @@ KEYWORDS FOR PUT AND FILTER Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit > stderr, @*, "index1", "index2"' Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "grep somepattern", @*, "index1", "index2"' - Please see https://johnkerl.org/miller6://johnkerl.org/miller/doc for more information. + Please see https://miller.readthedocs.io://johnkerl.org/miller/doc for more information. 
emitf emitf: inserts non-indexed out-of-stream variable(s) side-by-side into the @@ -2699,7 +2699,7 @@ KEYWORDS FOR PUT AND FILTER Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern", @a, @b, @c' Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern > mytap.dat", @a, @b, @c' - Please see https://johnkerl.org/miller6://johnkerl.org/miller/doc for more information. + Please see https://miller.readthedocs.io://johnkerl.org/miller/doc for more information. emitp emitp: inserts an out-of-stream variable into the output record stream. @@ -2729,7 +2729,7 @@ KEYWORDS FOR PUT AND FILTER Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp > stderr, @*, "index1", "index2"' Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "grep somepattern", @*, "index1", "index2"' - Please see https://johnkerl.org/miller6://johnkerl.org/miller/doc for more information. + Please see https://miller.readthedocs.io://johnkerl.org/miller/doc for more information. end end: defines a block of statements to be executed after input records @@ -2954,4 +2954,4 @@ SEE ALSO - 2021-11-05 MILLER(1) + 2021-11-06 MILLER(1) diff --git a/docs/src/miller-on-windows.md b/docs/src/miller-on-windows.md index f6968360a..64541be27 100644 --- a/docs/src/miller-on-windows.md +++ b/docs/src/miller-on-windows.md @@ -24,7 +24,7 @@ Miller was originally developed for Unix-like operating systems including Linux MSYS2 is no longer required -- although you can of course still use Miller from within MSYS2 if you prefer. There is now simply a single `mlr.exe`, with no `msys2.dll` alongside anymore. -See [Installation](installation.md) for how to get a copy of `mlr.exe`. +See [Installation](installing-miller.md) for how to get a copy of `mlr.exe`. 
## Setup diff --git a/docs/src/miller-on-windows.md.in b/docs/src/miller-on-windows.md.in index 2bb8ef372..a28d267fe 100644 --- a/docs/src/miller-on-windows.md.in +++ b/docs/src/miller-on-windows.md.in @@ -8,7 +8,7 @@ Miller was originally developed for Unix-like operating systems including Linux MSYS2 is no longer required -- although you can of course still use Miller from within MSYS2 if you prefer. There is now simply a single `mlr.exe`, with no `msys2.dll` alongside anymore. -See [Installation](installation.md) for how to get a copy of `mlr.exe`. +See [Installation](installing-miller.md) for how to get a copy of `mlr.exe`. ## Setup diff --git a/docs/src/miller-programming-language.md.in b/docs/src/miller-programming-language.md.in index 866f35fca..bb098e2e0 100644 --- a/docs/src/miller-programming-language.md.in +++ b/docs/src/miller-programming-language.md.in @@ -10,9 +10,9 @@ In the [DSL reference](reference-dsl.md) page we have a complete reference to Mi Let's keep using the [example.csv](./example.csv) file: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p put '$cost = $quantity * $rate' example.csv -GENMD_EOF +GENMD-EOF When we type that, a few things are happening: @@ -25,26 +25,26 @@ When we type that, a few things are happening: You can use more than one statement, separating them with semicolons, and optionally putting them on lines of their own: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p put '$cost = $quantity * $rate; $index = $index * 100' example.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p put ' $cost = $quantity * $rate; $index *= 100 ' example.csv -GENMD_EOF +GENMD-EOF One of Miller's key features is the ability to express data-transformation right there at the keyboard, interactively. 
But if you find yourself using expressions repeatedly, you can put everything between the single quotes into a file and refer to that using `put -f`: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat dsl-example.mlr -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p put -f dsl-example.mlr example.csv -GENMD_EOF +GENMD-EOF This becomes particularly important on Windows. Quite a bit of effort was put into making Miller on Windows be able to handle the kinds of single-quoted expressions we're showing here, but if you get syntax-error messages on Windows using examples in this documentation, you can put the parts between single quotes into a file and refer to that using `mlr put -f` -- or, use the triple-double-quote trick as described in the [Miller on Windows page](miller-on-windows.md). @@ -56,34 +56,34 @@ Above we also saw that names like `$quantity` are bound to each record in turn. To make `begin` and `end` statements useful, we need somewhere to put things that persist across the duration of the record stream, and a way to emit them. 
Miller uses [**out-of-stream variables**](reference-dsl-variables.md#out-of-stream-variables) (or **oosvars** for short) whose names start with an `@` sigil, along with the [`emit`](reference-dsl-output-statements.md#emit-statements) keyword to write them into the output record stream: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p --from example.csv put 'begin { @sum = 0 } @sum += $quantity; end {emit @sum}' -GENMD_EOF +GENMD-EOF If you want the end-block output to be the only output, and not include the records from the input data, you can use `mlr put -q`: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p --from example.csv put -q 'begin { @sum = 0 } @sum += $quantity; end {emit @sum}' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2j --from example.csv put -q 'begin { @sum = 0 } @sum += $quantity; end {emit @sum}' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2j --from example.csv put -q ' begin { @count = 0; @sum = 0 } @count += 1; @sum += $quantity; end {emit (@count, @sum)} ' -GENMD_EOF +GENMD-EOF We'll see in the documentation for [stats1](reference-verbs.md#stats1) that there's a lower-keystroking way to get counts and sums of things: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2j --from example.csv stats1 -a sum,count -f quantity -GENMD_EOF +GENMD-EOF So, take this sum/count example as an indication of the kinds of things you can do using Miller's programming language. @@ -97,33 +97,33 @@ Also inspired by [AWK](https://en.wikipedia.org/wiki/AWK), the Miller DSL has th * `NR` -- starting from 1, counter of how many records processed so far. * `FNR` -- similar, but resets to 1 at the start of each file. 
-GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat context-example.mlr -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/a.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/b.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p put -f context-example.mlr data/a.csv data/b.csv -GENMD_EOF +GENMD-EOF ## Functions and local variables You can [define your own functions](reference-dsl-user-defined-functions.md): -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat factorial-example.mlr -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p --from example.csv put -f factorial-example.mlr -e '$fact = factorial(NR)' -GENMD_EOF +GENMD-EOF Note that here we used the `-f` flag to `put` to load our function definition, and also the `-e` flag to add another statement on the command @@ -135,13 +135,13 @@ future use.) Suppose you want to only compute sums conditionally -- you can use an `if` statement: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat if-example.mlr -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p --from example.csv put -q -f if-example.mlr -GENMD_EOF +GENMD-EOF Miller's else-if is spelled `elif`. @@ -154,17 +154,17 @@ haven't encountered maps and arrays yet in this introduction, but for now it suffices to know that `$*` is a special variable holding the current record as a map: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat for-example.mlr -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv cat data/a.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv --from data/a.csv put -qf for-example.mlr -GENMD_EOF +GENMD-EOF Here we used the local variables `k` and `v`. 
Now we've seen four kinds of variables: @@ -199,10 +199,10 @@ basic idea is: For example, you can sum up all the `$a` values across records without having to check whether they're present or not: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --json cat absent-example.json -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --json put '@sum_of_a += $a; end {emit @sum_of_a}' absent-example.json -GENMD_EOF +GENMD-EOF diff --git a/docs/src/misc-examples.md.in b/docs/src/misc-examples.md.in index d4b6bdc7d..80512c572 100644 --- a/docs/src/misc-examples.md.in +++ b/docs/src/misc-examples.md.in @@ -2,61 +2,61 @@ Column select: -GENMD_SHOW_COMMAND +GENMD-SHOW-COMMAND mlr --csv cut -f hostname,uptime mydata.csv -GENMD_EOF +GENMD-EOF Add new columns as function of other columns: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr --nidx put '$sum = $7 < 0.0 ? 3.5 : $7 + 2.1*$8' *.dat -GENMD_EOF +GENMD-EOF Row filter: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr --csv filter '$status != "down" && $upsec >= 10000' *.csv -GENMD_EOF +GENMD-EOF Apply column labels and pretty-print: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE grep -v '^#' /etc/group | mlr --ifs : --nidx --opprint label group,pass,gid,member then sort -f group -GENMD_EOF +GENMD-EOF Join multiple data sources on key columns: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr join -j account_id -f accounts.dat then group-by account_name balances.dat -GENMD_EOF +GENMD-EOF Mulltiple formats including JSON: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr --json put '$attr = sub($attr, "([0-9]+)_([0-9]+)_.*", "\1:\2")' data/*.json -GENMD_EOF +GENMD-EOF Aggregate per-column statistics: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr stats1 -a min,mean,max,p10,p50,p90 -f flag,u,v data/* -GENMD_EOF +GENMD-EOF Linear regression: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr stats2 -a linreg-pca -f u,v -g shape data/* 
-GENMD_EOF +GENMD-EOF Aggregate custom per-column statistics: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr put -q '@sum[$a][$b] += $x; end {emit @sum, "a", "b"}' data/* -GENMD_EOF +GENMD-EOF Iterate over data using DSL expressions: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr --from estimates.tbl put ' for (k,v in $*) { if (is_numeric(v) && k =~ "^[t-z].*$") { @@ -65,85 +65,85 @@ mlr --from estimates.tbl put ' } $mean = $sum / $count # no assignment if count unset ' -GENMD_EOF +GENMD-EOF Run DSL expressions from a script file: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr --from infile.dat put -f analyze.mlr -GENMD_EOF +GENMD-EOF Split/reduce output to multiple filenames: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr --from infile.dat put 'tee > "./taps/data-".$a."-".$b, $*' -GENMD_EOF +GENMD-EOF Compressed I/O: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr --from infile.dat put 'tee | "gzip > ./taps/data-".$a."-".$b.".gz", $*' -GENMD_EOF +GENMD-EOF Interoperate with other data-processing tools using standard pipes: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr --from infile.dat put -q '@v=$*; dump | "jq .[]"' -GENMD_EOF +GENMD-EOF Tap/trace: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr --from infile.dat put '(NR % 1000 == 0) { print > stderr, "Checkpoint ".NR}' -GENMD_EOF +GENMD-EOF ## Program timing This admittedly artificial example demonstrates using Miller time and stats functions to introspectively acquire some information about Miller's own runtime. The `delta` function computes the difference between successive timestamps. 
-GENMD_INCLUDE_ESCAPED(data/timing-example.txt) +GENMD-INCLUDE-ESCAPED(data/timing-example.txt) ## Showing differences between successive queries Suppose you have a database query which you run at one point in time, producing the output on the left, then again later producing the output on the right: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/previous_counters.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/current_counters.csv -GENMD_EOF +GENMD-EOF And, suppose you want to compute the differences in the counters between adjacent keys. Since the color names aren't all in the same order, nor are they all present on both sides, we can't just paste the two files side-by-side and do some column-four-minus-column-two arithmetic. First, rename counter columns to make them distinct: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv rename count,previous_count data/previous_counters.csv > data/prevtemp.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/prevtemp.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv rename count,current_count data/current_counters.csv > data/currtemp.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/currtemp.csv -GENMD_EOF +GENMD-EOF Then, join on the key field(s), and use unsparsify to zero-fill counters absent on one side but present on the other. Use `--ul` and `--ur` to emit unpaired records (namely, purple on the left and yellow on the right): -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint \ join -j color --ul --ur -f data/prevtemp.csv \ then unsparsify --fill-with 0 \ then put '$count_delta = $current_count - $previous_count' \ data/currtemp.csv -GENMD_EOF +GENMD-EOF See also the [record-heterogeneity page](record-heterogeneity.md). @@ -151,11 +151,11 @@ See also the [record-heterogeneity page](record-heterogeneity.md). The recursive function for the Fibonacci sequence is famous for its computational complexity. 
Namely, using f(0)=1, f(1)=1, f(n)=f(n-1)+f(n-2) for n>=2, the evaluation tree branches left as well as right at each non-trivial level, resulting in millions or more paths to the root 0/1 nodes for larger n. This program -GENMD_INCLUDE_ESCAPED(data/fibo-uncached.sh) +GENMD-INCLUDE-ESCAPED(data/fibo-uncached.sh) produces output like this: -GENMD_CARDIFY +GENMD-CARDIFY i o fcount seconds_delta 1 1 1 0 2 2 3 0.000039101 @@ -185,15 +185,15 @@ i o fcount seconds_delta 26 196418 392835 0.334423065 27 317811 635621 0.605969906 28 514229 1028457 0.971235037 -GENMD_EOF +GENMD-EOF Note that the time it takes to evaluate the function is blowing up exponentially as the input argument increases. Using `@`-variables, which persist across records, we can cache and reuse the results of previous computations: -GENMD_INCLUDE_ESCAPED(data/fibo-cached.sh) +GENMD-INCLUDE-ESCAPED(data/fibo-cached.sh) with output like this: -GENMD_CARDIFY +GENMD-CARDIFY i o fcount seconds_delta 1 1 1 0 2 2 3 0.000053883 @@ -223,4 +223,4 @@ i o fcount seconds_delta 26 196418 3 0.000012875 27 317811 3 0.000013113 28 514229 3 0.000012875 -GENMD_EOF +GENMD-EOF diff --git a/docs/src/morph b/docs/src/morph index cda715edf..1ce7ffb4c 100755 --- a/docs/src/morph +++ b/docs/src/morph @@ -7,12 +7,12 @@ while true break end - if line =~ /GENMD_RUN_COMMAND{{.*}}HERE/ - line.sub!("GENMD_RUN_COMMAND{{", "") + if line =~ /GENMD-RUN-COMMAND{{.*}}HERE/ + line.sub!("GENMD-RUN-COMMAND{{", "") line.sub!("}}HERE", "") - puts 'GENMD_RUN_COMMAND' + puts 'GENMD-RUN-COMMAND' puts line - puts 'GENMD_EOF' + puts 'GENMD-EOF' else puts line end diff --git a/docs/src/new-in-miller-6.md b/docs/src/new-in-miller-6.md index ebdb2cf02..d519ebd4f 100644 --- a/docs/src/new-in-miller-6.md +++ b/docs/src/new-in-miller-6.md @@ -72,7 +72,7 @@ See also the [Arrays reference](reference-main-arrays.md) for more information. Stronger support for Windows (with or without MSYS2), with a couple of exceptions. 
See [Miller on Windows](miller-on-windows.md) for more information. -Binaries are reliably available using GitHub Actions: see also [Installation](installation.md). +Binaries are reliably available using GitHub Actions: see also [Installation](installing-miller.md). ## In-process support for compressed input diff --git a/docs/src/new-in-miller-6.md.in b/docs/src/new-in-miller-6.md.in index d9a438d27..5882c9b62 100644 --- a/docs/src/new-in-miller-6.md.in +++ b/docs/src/new-in-miller-6.md.in @@ -56,7 +56,7 @@ See also the [Arrays reference](reference-main-arrays.md) for more information. Stronger support for Windows (with or without MSYS2), with a couple of exceptions. See [Miller on Windows](miller-on-windows.md) for more information. -Binaries are reliably available using GitHub Actions: see also [Installation](installation.md). +Binaries are reliably available using GitHub Actions: see also [Installation](installing-miller.md). ## In-process support for compressed input @@ -66,10 +66,10 @@ In addition to `--prepipe gunzip`, you can now use the `--gzin` flag. In fact, i You can read input with prefixes `https://`, `http://`, and `file://`: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv sort -f shape \ https://raw.githubusercontent.com/johnkerl/miller/main/docs/src/gz-example.csv.gz -GENMD_EOF +GENMD-EOF ## Output colorization @@ -98,13 +98,13 @@ strings throughout the processing chain. 
For example (see [https://github.com/johnkerl/miller/issues/178](https://github.com/johnkerl/miller/issues/178)) you can now do -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo '{ "a": "0123" }' | mlr --json cat -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo '{ "x": 1.230, "y": 1.230000000 }' | mlr --json cat -GENMD_EOF +GENMD-EOF ## REPL @@ -150,10 +150,10 @@ Miller 6 has getoptish command-line parsing ([pull request 467](https://github.c For `mlr put` and `mlr filter`, parse-error messages now include location information: -GENMD_CARDIFY +GENMD-CARDIFY mlr: cannot parse DSL expression. Parse error on token ">" at line 63 columnn 7. -GENMD_EOF +GENMD-EOF ## Developer-specific aspects diff --git a/docs/src/online-help.md b/docs/src/online-help.md index bf2bc4129..cce2cdde2 100644 --- a/docs/src/online-help.md +++ b/docs/src/online-help.md @@ -35,7 +35,7 @@ Output of one verb may be chained as input to another using "then", e.g. mlr --csv stats1 -a min,mean,max -f quantity then sort -f color example.csv Please see 'mlr help topics' for more information. -Please also see https://johnkerl.org/miller6 +Please also see https://miller.readthedocs.io
@@ -214,7 +214,7 @@ You can use `:h` or `:help` inside the [REPL](repl.md):
 
 Miller v6.0.0-dev REPL for darwin:amd64:go1.16.5
-Pre-release docs for Miller 6: https://johnkerl.org/miller6
+Docs: https://miller.readthedocs.io
 Type ':h' or ':help' for on-line help; ':q' or ':quit' to quit.
 [mlr] :h
 Options:
diff --git a/docs/src/online-help.md.in b/docs/src/online-help.md.in
index d7f0d45cc..e75ebf2dd 100644
--- a/docs/src/online-help.md.in
+++ b/docs/src/online-help.md.in
@@ -6,17 +6,17 @@ Miller has several online help mechanisms built in.
 
 The front door is `mlr --help` or its synonym `mlr -h`. This leads you to `mlr help topics` with its list of specific areas:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --help
-GENMD_EOF
+GENMD-EOF
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr help topics
-GENMD_EOF
+GENMD-EOF
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr help functions
-GENMD_EOF
+GENMD-EOF
 
 Etc.
 
@@ -30,17 +30,17 @@ See `mlr help flags` for a full listing.
 This is a command-line version of the [List of verbs](reference-verbs.md) page.
 Given the name of a verb (from `mlr -l`) you can invoke it with `--help` or `-h` -- or, use `mlr help verb`:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr cat --help
-GENMD_EOF
+GENMD-EOF
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr group-like -h
-GENMD_EOF
+GENMD-EOF
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr help verb sort
-GENMD_EOF
+GENMD-EOF
 
 Etc.
 
@@ -49,17 +49,17 @@ Etc.
 This is a command-line version of the [DSL built-in functions](reference-dsl-builtin-functions.md) page.
 Given the name of a DSL function (from `mlr -f`) you can use `mlr help function` for details:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr help function append
-GENMD_EOF
+GENMD-EOF
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr help function split
-GENMD_EOF
+GENMD-EOF
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr help function splita
-GENMD_EOF
+GENMD-EOF
 
 Etc.
 
@@ -68,10 +68,10 @@ Etc.
 You can use `:h` or `:help` inside the [REPL](repl.md):
 
 
-GENMD_CARDIFY_HIGHLIGHT_ONE
+GENMD-CARDIFY-HIGHLIGHT-ONE
 $ mlr repl
 Miller v6.0.0-dev REPL for darwin:amd64:go1.16.5
-Pre-release docs for Miller 6: https://johnkerl.org/miller6
+Docs: https://miller.readthedocs.io
 Type ':h' or ':help' for on-line help; ':q' or ':quit' to quit.
 [mlr] :h
 Options:
@@ -85,7 +85,7 @@ Options:
 :help {function name}, e.g. :help sec2gmt
 :help {function name}, e.g. :help sec2gmt
 [mlr]
-GENMD_EOF
+GENMD-EOF
 
 ## Manual page
 
diff --git a/docs/src/operating-on-all-fields.md.in b/docs/src/operating-on-all-fields.md.in
index fdb064271..6292872cf 100644
--- a/docs/src/operating-on-all-fields.md.in
+++ b/docs/src/operating-on-all-fields.md.in
@@ -4,55 +4,55 @@
 
 Suppose you want to replace spaces with underscores in your column names:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 cat data/spaces.csv
-GENMD_EOF
+GENMD-EOF
 
 The simplest way is to use `mlr rename` with `-g` (for global replace, not just first occurrence of space within each field) and `-r` for pattern-matching (rather than explicit single-column renames):
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --csv rename -g -r ' ,_'  data/spaces.csv
-GENMD_EOF
+GENMD-EOF
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --csv --opprint rename -g -r ' ,_'  data/spaces.csv
-GENMD_EOF
+GENMD-EOF
 
 You can also do this with a for-loop:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 cat data/bulk-rename-for-loop.mlr
-GENMD_EOF
+GENMD-EOF
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --icsv --opprint put -f data/bulk-rename-for-loop.mlr data/spaces.csv
-GENMD_EOF
+GENMD-EOF
 
 ## Search-and-replace over all fields
 
 How to do `$name = gsub($name, "old", "new")` for all fields?
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 cat data/sar.csv
-GENMD_EOF
+GENMD-EOF
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 cat data/sar.mlr
-GENMD_EOF
+GENMD-EOF
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --csv put -f data/sar.mlr data/sar.csv
-GENMD_EOF
+GENMD-EOF
 
 ## Full field renames and reassigns
 
 Using Miller 5.0.0's map literals and assigning to `$*`, you can fully generalize [rename](reference-verbs.md#rename), [reorder](reference-verbs.md#reorder), etc.
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 cat data/small
-GENMD_EOF
+GENMD-EOF
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr put '
   begin {
     @i_cumu = 0;
@@ -68,4 +68,4 @@ mlr put '
     "x": $y,
   };
 ' data/small
-GENMD_EOF
+GENMD-EOF
diff --git a/docs/src/operating-on-all-records.md.in b/docs/src/operating-on-all-records.md.in
index 46f07c2f3..ab93071c3 100644
--- a/docs/src/operating-on-all-records.md.in
+++ b/docs/src/operating-on-all-records.md.in
@@ -22,9 +22,9 @@ to retain sums, counters, etc.
 
 For example, let's look at our short data file [data/short.csv](data/short.csv):
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 cat data/short.csv
-GENMD_EOF
+GENMD-EOF
 
 We can track count and sum using
 [out-of-stream variables](reference-dsl-variables.md#out-of-stream-variables) -- the ones that
@@ -32,7 +32,7 @@ start with the `@` sigil -- then
 [emit](reference-dsl-output-statements.md#emit-statements) them as a new record
 after all the input is read.
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --icsv --ojson --from data/short.csv put '
   begin {
     @count = 0;
@@ -44,12 +44,12 @@ mlr --icsv --ojson --from data/short.csv put '
     emit (@count, @sum);
   }
 '
-GENMD_EOF
+GENMD-EOF
 
 And if all we want is the final output and not the input data, we can use `put
 -q` to not pass through the input records:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --icsv --ojson --from data/short.csv put -q '
   begin {
     @count = 0;
@@ -61,7 +61,7 @@ mlr --icsv --ojson --from data/short.csv put -q '
     emit (@count, @sum);
   }
 '
-GENMD_EOF
+GENMD-EOF
 
 As discussed a bit more on the page on [streaming processing and memory
 usage](streaming-and-memory.md), this doesn't keep all records in memory, only
@@ -74,11 +74,11 @@ The second option is to retain entire records in a [map](reference-main-maps.md)
 
 Let's use the same short data file [data/short.csv](data/short.csv):
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 cat data/short.csv
-GENMD_EOF
+GENMD-EOF
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --icsv --ojson --from data/short.csv put -q '
   # map
   begin {
@@ -95,11 +95,11 @@ mlr --icsv --ojson --from data/short.csv put -q '
     emit (count, sum);
   }
 '
-GENMD_EOF
+GENMD-EOF
 
 The downside to this, of course, is that this retains all records (plus data-structure overhead) in memory, so you're limited to processing files that fit in your computer's memory. The upside, though, is that you can do random access over the records using things like
 
-GENMD_CARDIFY
+GENMD-CARDIFY
     output = 0;
     for (i = 1; i <= NR; i += 1) {
       for (j = 1; j <= NR; j += 1) {
@@ -109,13 +109,13 @@ GENMD_CARDIFY
       }
     }
     # do something with the output
-GENMD_EOF
+GENMD-EOF
 
 ## Retaining records in an array
 
 The third option is to retain records in an [array](reference-main-arrays.md), then loop over them in an `end` block.
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --icsv --ojson --from data/short.csv put -q '
   # array
   begin {
@@ -132,7 +132,7 @@ mlr --icsv --ojson --from data/short.csv put -q '
     emit (count, sum);
   }
 '
-GENMD_EOF
+GENMD-EOF
 
 Just as with the retain-as-map approach, the downside is the overhead of
 retaining all records in memory, and the upside is that you get random access
@@ -149,7 +149,7 @@ start with 1, not 0 as discussed in the [Arrays](reference-main-arrays.md)
 page.) This means that if you are only retaining a subset of records then your
 array will have [null-gaps](reference-main-arrays.md) in it:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --icsv --ojson --from data/short.csv put -q '
   begin {
     @records = [];
@@ -161,11 +161,11 @@ mlr --icsv --ojson --from data/short.csv put -q '
     dump @records;
   }
 '
-GENMD_EOF
+GENMD-EOF
 
 You can index `@records` by `@count` rather than `NR` to get a contiguous array:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --icsv --ojson --from data/short.csv put -q '
   begin {
     @records = [];
@@ -186,11 +186,11 @@ mlr --icsv --ojson --from data/short.csv put -q '
     emit (count, sum);
   }
 '
-GENMD_EOF
+GENMD-EOF
 
 If you use a map to retain records, then this is a non-issue: maps can retain whatever values you like:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --icsv --ojson --from data/short.csv put -q '
   begin {
     @records = {};
@@ -209,7 +209,7 @@ mlr --icsv --ojson --from data/short.csv put -q '
     emit (count, sum);
   }
 '
-GENMD_EOF
+GENMD-EOF
 
 Do note that Miller [maps](reference-main-maps.md) preserve insertion order, so
 at the end you're guaranteed to loop over records in the same order you read
@@ -222,7 +222,7 @@ If all you need is one or a few attributes out of a record, you don't need to
 retain full records. You can retain a map, or array, of just the fields you're
 interested in:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --icsv --ojson --from data/short.csv put -q '
   begin {
     @values = {};
@@ -241,7 +241,7 @@ mlr --icsv --ojson --from data/short.csv put -q '
     emit (count, sum);
   }
 '
-GENMD_EOF
+GENMD-EOF
 
 ## Sorting
 
diff --git a/docs/src/programming-examples.md.in b/docs/src/programming-examples.md.in
index 4f101dbb8..351b84140 100644
--- a/docs/src/programming-examples.md.in
+++ b/docs/src/programming-examples.md.in
@@ -6,13 +6,13 @@ Here are a few things focusing on Miller's DSL as a programming language per se,
 
 The [Sieve of Eratosthenes](http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes) is a standard introductory programming topic. The idea is to find all primes up to some *N* by making a list of the numbers 1 to *N*, then striking out all multiples of 2 except 2 itself, all multiples of 3 except 3 itself, all multiples of 4 except 4 itself, and so on. Whatever survives that without getting marked is a prime. This is easy enough in Miller. Notice that here all the work is in `begin` and `end` statements; there is no file input (so we use `mlr -n` to keep Miller from waiting for input data).
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 cat programs/sieve.mlr
-GENMD_EOF
+GENMD-EOF
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr -n put -f programs/sieve.mlr
-GENMD_EOF
+GENMD-EOF
 
 ## Mandelbrot-set generator
 
@@ -20,19 +20,19 @@ The [Mandelbrot set](http://en.wikipedia.org/wiki/Mandelbrot_set) is also easily
 
 The (approximate) computation of points in the complex plane which are and aren't members is just a few lines of complex arithmetic (see the [Wikipedia article](https://en.wikipedia.org/wiki/Mandelbrot_set)); how to render them visually is another task.  Using graphics libraries you can create PNG or JPEG files, but another fun way to do this is by printing various characters to the screen:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 cat programs/mand.mlr
-GENMD_EOF
+GENMD-EOF
 
 At standard resolution this makes a nice little ASCII plot:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr -n put -s iheight=25 -s iwidth=50 -f ./programs/mand.mlr
-GENMD_EOF
+GENMD-EOF
 
 But using a very small font size (as small as my Mac will let me go), and by choosing the coordinates to zoom in on a particular part of the complex plane, we can get a nice little picture:
 
-GENMD_CARDIFY
+GENMD-CARDIFY
 #!/bin/bash
 # Get the number of rows and columns from the terminal window dimensions
 iheight=$(stty size | mlr --nidx --fs space cut -f 1)
@@ -40,6 +40,6 @@ iwidth=$(stty size | mlr --nidx --fs space cut -f 2)
 mlr -n put \
   -s rcorn=-1.755350 -s icorn=0.014230 -s side=0.000020 -s maxits=10000 -s iheight=$iheight -s iwidth=$iwidth \
   -f programs/mand.mlr
-GENMD_EOF
+GENMD-EOF
 
 ![pix/mand.png](pix/mand.png)
diff --git a/docs/src/questions-about-joins.md.in b/docs/src/questions-about-joins.md.in
index ececea305..e9a15141b 100644
--- a/docs/src/questions-about-joins.md.in
+++ b/docs/src/questions-about-joins.md.in
@@ -6,25 +6,25 @@
 
 For example, the right file here has nine records, and the left file should add in the `hostname` column -- so the join output should also have 9 records:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --icsvlite --opprint cat data/join-u-left.csv
-GENMD_EOF
+GENMD-EOF
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --icsvlite --opprint cat data/join-u-right.csv
-GENMD_EOF
+GENMD-EOF
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --icsvlite --opprint join -s -j ipaddr -f data/join-u-left.csv data/join-u-right.csv
-GENMD_EOF
+GENMD-EOF
 
 The issue is that Miller's `join`, by default (before 5.1.0), took input sorted (lexically ascending) by the sort keys on both the left and right files.  This design decision was made intentionally to parallel the Unix/Linux system `join` command, which has the same semantics. The benefit of this default is that the joiner program can stream through the left and right files, needing to load neither entirely into memory. The drawback, of course, is that it requires sorted input.
 
 The solution (besides pre-sorting the input files on the join keys) is to simply use **mlr join -u** (which is now the default). This loads the left file entirely into memory (while the right file is still streamed one line at a time) and does all possible joins without requiring sorted input:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --icsvlite --opprint join -u -j ipaddr -f data/join-u-left.csv data/join-u-right.csv
-GENMD_EOF
+GENMD-EOF
 
 General advice is to make sure the left-file is relatively small, e.g. containing name-to-number mappings, while saving large amounts of data for the right file.
 
@@ -32,29 +32,29 @@ General advice is to make sure the left-file is relatively small, e.g. containin
 
 Suppose you have the following two data files:
 
-GENMD_INCLUDE_ESCAPED(data/color-codes.csv)
+GENMD-INCLUDE-ESCAPED(data/color-codes.csv)
 
-GENMD_INCLUDE_ESCAPED(data/color-names.csv)
+GENMD-INCLUDE-ESCAPED(data/color-names.csv)
 
 Joining on color the results are as expected:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --csv join -j id -f data/color-codes.csv data/color-names.csv
-GENMD_EOF
+GENMD-EOF
 
 However, if we ask for left-unpaireds, since there's no `color` column, we get a row not having the same column names as the other:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --csv join --ul -j id -f data/color-codes.csv data/color-names.csv
-GENMD_EOF
+GENMD-EOF
 
 To fix this, we can use **unsparsify**:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --csv join --ul -j id -f data/color-codes.csv \
   then unsparsify --fill-with "" \
   data/color-names.csv
-GENMD_EOF
+GENMD-EOF
 
 Thanks to @aborruso for the tip!
 
@@ -64,24 +64,24 @@ See also the [record-heterogeneity page](record-heterogeneity.md).
 
 Suppose we have the following data:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 cat multi-join/input.csv
-GENMD_EOF
+GENMD-EOF
 
 And we want to augment the `id` column with lookups from the following data files:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 cat multi-join/name-lookup.csv
-GENMD_EOF
+GENMD-EOF
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 cat multi-join/status-lookup.csv
-GENMD_EOF
+GENMD-EOF
 
 We can run the input file through multiple `join` commands in a `then`-chain:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --icsv --opprint join -f multi-join/name-lookup.csv -j id \
   then join -f multi-join/status-lookup.csv -j id \
   multi-join/input.csv
-GENMD_EOF
+GENMD-EOF
diff --git a/docs/src/questions-about-then-chaining.md.in b/docs/src/questions-about-then-chaining.md.in
index c014e2838..ceb2736b8 100644
--- a/docs/src/questions-about-then-chaining.md.in
+++ b/docs/src/questions-about-then-chaining.md.in
@@ -6,55 +6,55 @@ Then-chaining found in Miller is intended to function the same as Unix pipes, bu
 
 First, look at the input data:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 cat data/then-example.csv
-GENMD_EOF
+GENMD-EOF
 
 Next, run the first step of your command, omitting anything from the first `then` onward:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --from data/then-example.csv --c2p count-distinct -f Status,Payment_Type
-GENMD_EOF
+GENMD-EOF
 
 After that, run it with the next `then` step included:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --from data/then-example.csv --c2p count-distinct -f Status,Payment_Type \
   then sort -nr count
-GENMD_EOF
+GENMD-EOF
 
 Now if you use `then` to include another verb after that, the columns `Status`, `Payment_Type`, and `count` will be the input to that verb.
 
 Note, by the way, that you'll get the same results using pipes:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --from data/then-example.csv --csv count-distinct -f Status,Payment_Type \
 | mlr --c2p sort -nr count
-GENMD_EOF
+GENMD-EOF
 
 ## NR is not consecutive after then-chaining
 
 Given this input data:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 cat data/small
-GENMD_EOF
+GENMD-EOF
 
 why don't I see `NR=1` and `NR=2` here?
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --from data/small filter '$x > 0.5' then put '$NR = NR'
-GENMD_EOF
+GENMD-EOF
 
 The reason is that `NR` is computed for the original input records and isn't dynamically updated. By contrast, `NF` is dynamically updated: it's the number of fields in the current record, and if you add/remove a field, the value of `NF` will change:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 echo x=1,y=2,z=3 | mlr put '$nf1 = NF; $u = 4; $nf2 = NF; unset $x,$y,$z; $nf3 = NF'
-GENMD_EOF
+GENMD-EOF
 
 `NR`, by contrast (and `FNR` as well), retains the value from the original input stream, and records may be dropped by a `filter` within a `then`-chain. To recover consecutive record numbers, you can use out-of-stream variables as follows:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --opprint --from data/small put '
   begin{ @nr1 = 0 }
   @nr1 += 1;
@@ -66,10 +66,10 @@ then put '
   @nr2 += 1;
   $nr2 = @nr2
 '
-GENMD_EOF
+GENMD-EOF
 
 Or, simply use `mlr cat -n`:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr filter '$x > 0.5' then cat -n data/small
-GENMD_EOF
+GENMD-EOF
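
The `NR` behavior described in this file can be modeled with a short Python sketch. It is a simplified illustration under stated assumptions, not Miller's code: `filter_then_number` and the sample records are hypothetical. The point is that the record number is fixed when a record is read, while a counter kept downstream of the filter sees only the survivors.

```python
def filter_then_number(records, keep):
    """Drop records failing `keep`; tag survivors with both counters."""
    out = []
    n = 0
    for nr, rec in enumerate(records, start=1):
        # nr is the position in the ORIGINAL input; filtering does not change it.
        if not keep(rec):
            continue
        # n counts only surviving records, like a counter kept after the filter.
        n += 1
        out.append({"n": n, "NR": nr, **rec})
    return out


# Hypothetical input, standing in for a small data file.
records = [{"x": 0.3}, {"x": 0.7}, {"x": 0.2}, {"x": 0.9}]
result = filter_then_number(records, lambda r: r["x"] > 0.5)
for row in result:
    print(row)
```

Here the surviving records carry `NR` values 2 and 4 but `n` values 1 and 2, which mirrors the difference between `$NR = NR` and `cat -n` after a `filter`.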
diff --git a/docs/src/randomizing-examples.md.in b/docs/src/randomizing-examples.md.in
index 0af804559..0ae6a8f2e 100644
--- a/docs/src/randomizing-examples.md.in
+++ b/docs/src/randomizing-examples.md.in
@@ -4,9 +4,9 @@
 
 Here we can chain together a few simple building blocks:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 cat expo-sample.sh
-GENMD_EOF
+GENMD-EOF
 
 Namely:
 
@@ -18,15 +18,15 @@ Namely:
 
 The output is as follows:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 sh expo-sample.sh
-GENMD_EOF
+GENMD-EOF
 
 ## Randomly selecting words from a list
 
 Given this [word list](./data/english-words.txt), first take a look to see what the first few lines look like:
 
-GENMD_CARDIFY_HIGHLIGHT_ONE
+GENMD-CARDIFY-HIGHLIGHT-ONE
 head data/english-words.txt
 a
 aa
@@ -38,11 +38,11 @@ aardwolf
 aba
 abac
 abaca
-GENMD_EOF
+GENMD-EOF
 
 Then the following will randomly sample ten words with four to eight characters in them:
 
-GENMD_CARDIFY_HIGHLIGHT_ONE
+GENMD-CARDIFY-HIGHLIGHT-ONE
 mlr --from data/english-words.txt --nidx filter -S 'n=strlen($1);4<=n&&n<=8' then sample -k 10
 thionine
 birchman
@@ -54,7 +54,7 @@ askant
 aiming
 insulant
 coinmate
-GENMD_EOF
+GENMD-EOF
 
 ## Randomly generating jabberwocky words
 
@@ -62,7 +62,7 @@ These are simple *n*-grams as [described here](http://johnkerl.org/randspell/ran
 
 The idea is that words from the input file are consumed, then taken apart and pasted back together in ways which imitate the letter-to-letter transitions found in the word list -- giving us automatically generated words in the same vein as *bromance* and *spork*:
 
-GENMD_CARDIFY_HIGHLIGHT_ONE
+GENMD-CARDIFY-HIGHLIGHT-ONE
 mlr --nidx --from ./ngrams/gsl-2000.txt put -q -f ./ngrams/ngfuncs.mlr -f ./ngrams/ng5.mlr
 beard
 plastinguish
@@ -80,4 +80,4 @@ rottendence
 lessenger
 diffendant
 suggestional
-GENMD_EOF
+GENMD-EOF
diff --git a/docs/src/record-heterogeneity.md b/docs/src/record-heterogeneity.md
index df2b8a62d..2c796d1df 100644
--- a/docs/src/record-heterogeneity.md
+++ b/docs/src/record-heterogeneity.md
@@ -127,8 +127,6 @@ If you `mlr --csv cat` this, you'll get an error message:
 mlr --csv cat data/het/ragged.csv
 
-a,b,c
-1,2,3
 mlr :  mlr: CSV header/data length mismatch 3 != 2 at filename data/het/ragged.csv row 3.
 
 
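
The length-mismatch error in this hunk comes from zipping header keys with data values. Below is a minimal Python sketch of that idea, including the ragged-tolerant mode that the record-heterogeneity page describes (fill too-short rows, key extra values by 1-up column number). The function `zip_csv`, its `allow_ragged` parameter, and the sample rows are assumptions for illustration; this is not Miller's reader.

```python
def zip_csv(header, data_rows, allow_ragged=False):
    """Build records by pairing header keys with each row's values."""
    records = []
    for row_number, values in enumerate(data_rows, start=2):  # header is row 1
        if len(values) != len(header) and not allow_ragged:
            raise ValueError(
                f"header/data length mismatch {len(header)} != {len(values)} "
                f"at row {row_number}"
            )
        record = dict(zip(header, values))
        # Too-short row: fill the remaining header keys with empty values.
        for key in header[len(values):]:
            record[key] = ""
        # Too-long row: key the extra values by 1-up column number.
        for position in range(len(header), len(values)):
            record[str(position + 1)] = values[position]
        records.append(record)
    return records


# Hypothetical ragged input.
header = ["a", "b", "c"]
rows = [["1", "2", "3"], ["4", "5"], ["7", "8", "9", "10"]]

try:
    zip_csv(header, rows)
except ValueError as err:
    print("strict mode:", err)

for record in zip_csv(header, rows, allow_ragged=True):
    print(record)
```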
diff --git a/docs/src/record-heterogeneity.md.in b/docs/src/record-heterogeneity.md.in
index 03b496e21..989e92ad5 100644
--- a/docs/src/record-heterogeneity.md.in
+++ b/docs/src/record-heterogeneity.md.in
@@ -17,35 +17,35 @@ Different kinds of heterogeneous data include _ragged_, _irregular_, and _sparse
 A **homogeneous** list of records is one in which all records have _the same keys, in the same order_. For example, here is a well-formed [CSV file](file-formats.md#csvtsvasvusvetc):
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --csv cat data/het/hom.csv
-GENMD_EOF
+GENMD-EOF
 
 It has three records (written here using JSON formatting):
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --icsv --ojson --no-jvstack cat data/het/hom.csv
-GENMD_EOF
+GENMD-EOF
 
 Here every row has the same keys, in the same order: `a,b,c`. These are also sometimes called **rectangular** since if we pretty-print them we get a nice rectangle:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --icsv --opprint cat data/het/hom.csv
-GENMD_EOF
+GENMD-EOF
 
 ### Fillable data
 
 A second example has some empty cells which could be **filled**:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --csv cat data/het/fillable.csv
-GENMD_EOF
+GENMD-EOF
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --icsv --ojson --no-jvstack cat data/het/fillable.csv
-GENMD_EOF
+GENMD-EOF
 
 This example is still homogeneous, though: every row has the same keys, in the same order: `a,b,c`. Empty values don't make the data heterogeneous.
@@ -53,23 +53,23 @@ Empty values don't make the data heterogeneous.
 Note however that we can use the [`fill-empty`](reference-verbs.md#fill-empty) verb to make these values non-empty, if we like:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --icsv --opprint fill-empty -v filler data/het/fillable.csv
-GENMD_EOF
+GENMD-EOF
 
 ### Ragged data
 
 Next let's look at non-well-formed CSV files.
For a third example:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 cat data/het/ragged.csv
-GENMD_EOF
+GENMD-EOF
 
 If you `mlr --csv cat` this, you'll get an error message:
 
-GENMD_RUN_COMMAND_TOLERATING_ERROR
+GENMD-RUN-COMMAND-TOLERATING-ERROR
 mlr --csv cat data/het/ragged.csv
-GENMD_EOF
+GENMD-EOF
 
 There are two kinds of raggedness here. Since CSVs form records by zipping the keys from the header line together with the values from each data line, the
@@ -80,18 +80,18 @@ Using the [`--allow-ragged-csv-input` flag](reference-main-flag-list.md#csv-only
 we can fill values in too-short rows, and provide a key (column number starting with 1) for too-long rows:
 
-GENMD_RUN_COMMAND_TOLERATING_ERROR
+GENMD-RUN-COMMAND-TOLERATING-ERROR
 mlr --icsv --ojson --allow-ragged-csv-input cat data/het/ragged.csv
-GENMD_EOF
+GENMD-EOF
 
 ### Irregular data
 
 Here's another situation -- this file has, in some sense, the "same" data as our `ragged.csv` example above:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 cat data/het/irregular.json
-GENMD_EOF
+GENMD-EOF
 
 For example, on the second record, `a` is 4, `b` is 5, `c` is 6. But this data is heterogeneous because the keys `a,b,c` aren't in the same order in each
@@ -107,9 +107,9 @@ We can use the [`regularize`](reference-verbs.md#regularize) or
 [`sort-within-records`](reference-verbs.md#sort-within-records) verb to order the keys:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --json --no-jvstack regularize data/het/irregular.json
-GENMD_EOF
+GENMD-EOF
 
 The `regularize` verb tries to re-order subsequent rows to look like the first (whatever order that is); the `sort-within-records` verb simply uses
@@ -121,24 +121,24 @@ record has keys in the order `a,b,c`).
 Here's another frequently occurring situation -- quite often, systems will log data for items which are present, but won't log data for items which aren't.
 
-GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --json cat data/het/sparse.json -GENMD_EOF +GENMD-EOF This data is called **sparse** (from the [data-storage term](https://en.wikipedia.org/wiki/Sparse_matrix)). We can use the [`unsparsify`](reference-verbs.md#unsparsify) verb to make sure every record has the same keys: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --json unsparsify data/het/sparse.json -GENMD_EOF +GENMD-EOF Since this data is now homogeneous (rectangular), it pretty-prints nicely: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --ijson --opprint unsparsify data/het/sparse.json -GENMD_EOF +GENMD-EOF ## Reading and writing heterogeneous data @@ -149,31 +149,31 @@ to transform the data to make it homogeneous. For these formats, record-heterogeneity comes naturally: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/het/sparse.json -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --ijson --onidx --ofs ' ' cat data/het/sparse.json -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --ijson --oxtab cat data/het/sparse.json -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --ijson --odkvp cat data/het/sparse.json -GENMD_EOF +GENMD-EOF Even then, we may wish to put like with like, using the [`group-like`](reference-verbs.md#group-like) verb: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --ijson --odkvp cat data/het.json -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --ijson --odkvp group-like data/het.json -GENMD_EOF +GENMD-EOF ### Rectangular file formats: CSV and pretty-print @@ -188,23 +188,23 @@ the same way. The difference between CSV and CSV-lite is that the former is [RFC-4180-compliant](file-formats.md#csvtsvasvusvetc), while the latter readily handles heterogeneous data (which is non-compliant). 
For example: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/het.json -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --ijson --opprint cat data/het.json -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --ijson --opprint group-like data/het.json -GENMD_EOF +GENMD-EOF Miller handles explicit header changes as just shown. If your CSV input contains ragged data -- if there are implicit header changes (no intervening blank line and new header line) as seen above -- you can use `--allow-ragged-csv-input` (or keystroke-saver `--ragged`). -GENMD_RUN_COMMAND_TOLERATING_ERROR +GENMD-RUN-COMMAND-TOLERATING-ERROR mlr --csv --ragged cat data/het/ragged.csv -GENMD_EOF +GENMD-EOF ## Processing heterogeneous data @@ -216,10 +216,10 @@ you are sorting on the `count` field then all records in the input stream must have a `count` field but the other fields can vary, and moreover the sorted-on field name(s) don't need to be in the same position on each line: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/sort-het.dkvp -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr sort -n count data/sort-het.dkvp -GENMD_EOF +GENMD-EOF diff --git a/docs/src/reference-dsl-builtin-functions.md.in b/docs/src/reference-dsl-builtin-functions.md.in index a0797a291..4bb51082c 100644 --- a/docs/src/reference-dsl-builtin-functions.md.in +++ b/docs/src/reference-dsl-builtin-functions.md.in @@ -3,12 +3,12 @@ These are functions in the [Miller programming language](miller-programming-language.md) that you can call when you use `mlr put` and `mlr filter`. For example, when you type -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint --from example.csv put ' $color = toupper($color); $shape = gsub($shape, "[aeiou]", "*"); ' -GENMD_EOF +GENMD-EOF the `toupper` and `gsub` bits are _functions_. @@ -35,4 +35,4 @@ say `x+y` so the details for the `+` operator say that its number of arguments is 2. 
Unary operators such as `!` and `~` show argument-count of 1; the ternary `? :` operator shows an argument-count of 3.
 
-GENMD_RUN_CONTENT_GENERATOR(./mk-func-info.rb)
+GENMD-RUN-CONTENT-GENERATOR(./mk-func-info.rb)
diff --git a/docs/src/reference-dsl-control-structures.md.in b/docs/src/reference-dsl-control-structures.md.in
index 50c4abb0d..5307cfcfb 100644
--- a/docs/src/reference-dsl-control-structures.md.in
+++ b/docs/src/reference-dsl-control-structures.md.in
@@ -4,65 +4,65 @@
 These are reminiscent of `awk` syntax. They can be used to allow assignments to be done only when appropriate -- e.g. for math-function domain restrictions, regex-matching, and so on:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr cat data/put-gating-example-1.dkvp
-GENMD_EOF
+GENMD-EOF
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr put '$x > 0.0 { $y = log10($x); $z = sqrt($y) }' data/put-gating-example-1.dkvp
-GENMD_EOF
+GENMD-EOF
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr cat data/put-gating-example-2.dkvp
-GENMD_EOF
+GENMD-EOF
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr put '
   $a =~ "([a-z]+)_([0-9]+)" {
     $b = "left_\1";
     $c = "right_\2"
   }' \
   data/put-gating-example-2.dkvp
-GENMD_EOF
+GENMD-EOF
 
 This produces heterogeneous output which Miller, of course, has no problems with (see [Record Heterogeneity](record-heterogeneity.md)). But if you want homogeneous output, the curly braces can be replaced with a semicolon between the expression and the body statements. This causes `put` to evaluate the boolean expression (along with any side effects, namely, regex-captures `\1`, `\2`, etc.) but doesn't use it as a criterion for whether subsequent assignments should be executed. Instead, subsequent assignments are done unconditionally:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --opprint put '
   $a =~ "([a-z]+)_([0-9]+)";
   $b = "left_\1";
   $c = "right_\2"
 ' data/put-gating-example-2.dkvp
-GENMD_EOF
+GENMD-EOF
 
 Note that pattern-action blocks are just a syntactic variation of if-statements.
The following do the same thing: -GENMD_CARDIFY +GENMD-CARDIFY boolean_condition { body } -GENMD_EOF +GENMD-EOF -GENMD_CARDIFY +GENMD-CARDIFY if (boolean_condition) { body } -GENMD_EOF +GENMD-EOF ## If-statements These are again reminiscent of `awk`. Pattern-action blocks are a special case of `if` with no `elif` or `else` blocks, no `if` keyword, and parentheses optional around the boolean expression: -GENMD_SHOW_COMMAND +GENMD-SHOW-COMMAND mlr put 'NR == 4 {$foo = "bar"}' -GENMD_EOF +GENMD-EOF -GENMD_SHOW_COMMAND +GENMD-SHOW-COMMAND mlr put 'if (NR == 4) {$foo = "bar"}' -GENMD_EOF +GENMD-EOF Compound statements use `elif` (rather than `elsif` or `else if`): -GENMD_SHOW_COMMAND +GENMD-SHOW-COMMAND mlr put ' if (NR == 2) { ... @@ -74,22 +74,22 @@ mlr put ' ... } ' -GENMD_EOF +GENMD-EOF ## While and do-while loops Miller's `while` and `do-while` are unsurprising in comparison to various languages, as are `break` and `continue`: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo x=1,y=2 | mlr put ' while (NF < 10) { $[NF+1] = "" } $foo = "bar" ' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo x=1,y=2 | mlr put ' do { $[NF+1] = ""; @@ -99,7 +99,7 @@ echo x=1,y=2 | mlr put ' } while (NF < 10); $foo = "bar" ' -GENMD_EOF +GENMD-EOF A `break` or `continue` within nested conditional blocks or if-statements will, of course, propagate to the innermost loop enclosing them, if any. 
A `break` or
@@ -128,16 +128,16 @@ As with `while` and `do-while`, a `break` or `continue` within nested control st
 
 For [maps](reference-main-maps.md), the single variable is always bound to the *key* of key-value pairs:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --from data/small put -q '
   print "NR = ".NR;
   for (e in $*) {
     print " key:", e, "value:", $[e];
   }
 '
-GENMD_EOF
+GENMD-EOF
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr -n put -q '
   end {
     o = {"a":1, "b":{"c":3}};
@@ -146,13 +146,13 @@ mlr -n put -q '
     }
   }
 '
-GENMD_EOF
+GENMD-EOF
 
 Note that the value corresponding to a given key may be gotten through a **computed field name** using square brackets as in `$[e]` for stream records, or by indexing the looped-over variable using square brackets.
 
 For [arrays](reference-main-arrays.md), the single variable is always bound to the *value* (not the array index):
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr -n put -q '
   end {
     o = [10, "20", {}, "four", true];
@@ -161,7 +161,7 @@ mlr -n put -q '
     }
   }
 '
-GENMD_EOF
+GENMD-EOF
 
 ### Key-value for-loops
 
@@ -171,11 +171,11 @@ variable is the (1-up) array index and the second is the value.
 
 Single-level keys may be gotten at using either `for(k,v)` or `for((k),v)`; multi-level keys may be gotten at using `for((k1,k2,k3),v)` and so on. The `v` variable will be bound to a scalar value (non-array/non-map) if the map stops at that level, or to a map-valued or array-valued variable if the map goes deeper. If the map isn't deep enough then the loop body won't be executed.
 
-GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/for-srec-example.tbl -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --pprint --from data/for-srec-example.tbl put ' $sum1 = $f1 + $f2 + $f3; $sum2 = 0; @@ -187,17 +187,17 @@ mlr --pprint --from data/for-srec-example.tbl put ' } } ' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --from data/small --opprint put 'for (k,v in $*) { $[k."_type"] = typeof(v) }' -GENMD_EOF +GENMD-EOF Note that the value of the current field in the for-loop can be gotten either using the bound variable `value`, or through a **computed field name** using square brackets as in `$[key]`. Important note: to avoid inconsistent looping behavior in case you're setting new fields (and/or unsetting existing ones) while looping over the record, **Miller makes a copy of the record before the loop: loop variables are bound from the copy and all other reads/writes involve the record itself**: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --from data/small --opprint put ' $sum1 = 0; $sum2 = 0; @@ -208,11 +208,11 @@ mlr --from data/small --opprint put ' } } ' -GENMD_EOF +GENMD-EOF It can be confusing to modify the stream record while iterating over a copy of it, so instead you might find it simpler to use a local variable in the loop and only update the stream record after the loop: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --from data/small --opprint put ' sum = 0; for (k,v in $*) { @@ -222,15 +222,15 @@ mlr --from data/small --opprint put ' } $sum = sum ' -GENMD_EOF +GENMD-EOF You can also start iterating on sub-maps of an out-of-stream or local variable; you can loop over nested keys; you can loop over all out-of-stream variables. The bound variables are bound to a copy of the sub-map as it was before the loop started. The sub-map is specified by square-bracketed indices after `in`, and additional deeper indices are bound to loop key-variables. 
The terminal values are bound to the loop value-variable whenever the keys are not too shallow. The value-variable may refer to a terminal (string, number) or it may be map-valued if the map goes deeper. Example indexing is as follows: -GENMD_INCLUDE_ESCAPED(data/for-oosvar-example-0a.txt) +GENMD-INCLUDE-ESCAPED(data/for-oosvar-example-0a.txt) That's confusing in the abstract, so a concrete example is in order. Suppose the out-of-stream variable `@myvar` is populated as follows: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put --jknquoteint -q ' begin { @myvar = { @@ -241,11 +241,11 @@ mlr -n put --jknquoteint -q ' } end { dump } ' -GENMD_EOF +GENMD-EOF Then we can get at various values as follows: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put --jknquoteint -q ' begin { @myvar = { @@ -262,9 +262,9 @@ mlr -n put --jknquoteint -q ' } } ' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put --jknquoteint -q ' begin { @myvar = { @@ -282,9 +282,9 @@ mlr -n put --jknquoteint -q ' } } ' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put --jknquoteint -q ' begin { @myvar = { @@ -302,13 +302,13 @@ mlr -n put --jknquoteint -q ' } } ' -GENMD_EOF +GENMD-EOF ### C-style triple-for loops These are supported as follows: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --from data/small --opprint put ' num suma = 0; for (a = 1; a <= NR; a += 1) { @@ -316,9 +316,9 @@ mlr --from data/small --opprint put ' } $suma = suma; ' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --from data/small --opprint put ' num suma = 0; num sumb = 0; @@ -329,7 +329,7 @@ mlr --from data/small --opprint put ' $suma = suma; $sumb = sumb; ' -GENMD_EOF +GENMD-EOF Notes: @@ -347,35 +347,35 @@ Notes: Miller supports an `awk`-like `begin/end` syntax. The statements in the `begin` block are executed before any input records are read; the statements in the `end` block are executed after the last input record is read. 
(If you want to execute some statement at the start of each file, not at the start of the first file as with `begin`, you might use a pattern/action block of the form `FNR == 1 { ... }`.) All statements outside of `begin` or `end` are, of course, executed on every input record. Semicolons separate statements inside or outside of begin/end blocks; semicolons are required between begin/end block bodies and any subsequent statement. For example:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr put '
   begin { @sum = 0 };
   @x_sum += $x;
   end { emit @x_sum }
 ' ./data/small
-GENMD_EOF
+GENMD-EOF
 
 Since uninitialized out-of-stream variables default to 0 for addition/subtraction and 1 for multiplication when they appear on expression right-hand sides (not quite as in `awk`, where they'd default to 0 either way), the above can be written more succinctly as
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr put '
   @x_sum += $x;
   end { emit @x_sum }
 ' ./data/small
-GENMD_EOF
+GENMD-EOF
 
 The **put -q** option suppresses printing of each output record, with only `emit` statements being output. So to get only summary outputs, you could write
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr put -q '
   @x_sum += $x;
   end { emit @x_sum }
 ' ./data/small
-GENMD_EOF
+GENMD-EOF
 
 We can do similarly with multiple out-of-stream variables:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr put -q '
   @x_count += 1;
   @x_sum += $x;
@@ -384,13 +384,13 @@ mlr put -q '
     emit @x_sum;
   }
 ' ./data/small
-GENMD_EOF
+GENMD-EOF
 
 This is of course (see also [here](reference-dsl.md#verbs-compared-to-dsl)) not much different than
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr stats1 -a count,sum -f x ./data/small
-GENMD_EOF
+GENMD-EOF
 
 Note that it's a syntax error for begin/end blocks to refer to field names (beginning with `$`), since begin/end blocks execute outside the context of input records.
diff --git a/docs/src/reference-dsl-differences.md.in b/docs/src/reference-dsl-differences.md.in index 097ed3cca..417867645 100644 --- a/docs/src/reference-dsl-differences.md.in +++ b/docs/src/reference-dsl-differences.md.in @@ -28,16 +28,16 @@ semicolon where one is needed . The parser tries to remind you about semicolons whenever there's a chance a missing semicolon might be involved in a parse error. -GENMD_RUN_COMMAND_TOLERATING_ERROR +GENMD-RUN-COMMAND-TOLERATING-ERROR mlr --csv --from example.csv put -q ' begin { @count = 0 # No semicolon required -- before closing curly brace } $x=1 # No semicolon required -- at end of expression ' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND_TOLERATING_ERROR +GENMD-RUN-COMMAND-TOLERATING-ERROR mlr --csv --from example.csv put -q ' begin { @count = 0 # No semicolon required -- before closing curly brace @@ -45,7 +45,7 @@ mlr --csv --from example.csv put -q ' $x=1 # Needs a semicolon after it $y=2 # No semicolon required -- at end of expression ' -GENMD_EOF +GENMD-EOF ## elif @@ -56,39 +56,39 @@ Miller has [`elif`](reference-dsl-control-structures.md#if-statements), not `els Miller is simple-minded about scoping [local variables](reference-dsl-variables.md#local-variables) to blocks. If you have -GENMD_CARDIFY +GENMD-CARDIFY if (something) { x = 1 } else { x = 2 } -GENMD_EOF +GENMD-EOF then there are two `x` variable, each confined only to their enclosing curly braces; there is no hoisting out of the `if` and `else` blocks. A suggestion is -GENMD_CARDIFY +GENMD-CARDIFY var x if (something) { x = 1 } else { x = 2 } -GENMD_EOF +GENMD-EOF ## Required curly braces Bodies for all compound statements must be enclosed in curly braces, even if the body is a single statement: -GENMD_SHOW_COMMAND +GENMD-SHOW-COMMAND mlr ... put 'if ($x == 1) $y = 2' # Syntax error -GENMD_EOF +GENMD-EOF -GENMD_SHOW_COMMAND +GENMD-SHOW-COMMAND mlr ... 
put 'if ($x == 1) { $y = 2 }' # This is OK -GENMD_EOF +GENMD-EOF ## No autoconvert to boolean @@ -105,7 +105,7 @@ As discussed on the [arithmetic page](reference-main-arithmetic.md) the sum, dif Likewise, while quotient and remainder are generally pythonic, the quotient and exponentiation of two integers is an integer when possible. -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE $ mlr repl -q [mlr] 6/2 3 @@ -124,7 +124,7 @@ int [mlr] typeof(7**80) float -GENMD_EOF +GENMD-EOF ## Print adds spaces around multiple arguments @@ -133,14 +133,14 @@ As seen in the previous example, comma-delimited arguments fills in intervening spaces for you. If you want to avoid this, use the dot operator for string-concatenation instead. -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put -q ' end { print "[", "a", "b", "c", "]"; print "[" . "a" . "b" . "c" . "]"; } ' -GENMD_EOF +GENMD-EOF Similarly, a final newline is printed for you; use [`printn`](reference-dsl-output-statements.md#print-statements) to avoid this. @@ -183,11 +183,11 @@ Arrays and strings are indexed starting with 1, not 0. This is discussed in detail on the [arrays page](reference-main-arrays.md) and the [strings page](reference-main-strings.md). -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv --from data/short.csv cat -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv --from data/short.csv put -q ' @records[NR] = $*; end { @@ -196,7 +196,7 @@ mlr --csv --from data/short.csv put -q ' } } ' -GENMD_EOF +GENMD-EOF Also, slices for arrays and strings are _doubly inclusive_: `x[3:5]` gets you elements 3, 4, and 5 of the array or string named `x`. diff --git a/docs/src/reference-dsl-filter-statements.md.in b/docs/src/reference-dsl-filter-statements.md.in index ee0f771fc..c3acd41e1 100644 --- a/docs/src/reference-dsl-filter-statements.md.in +++ b/docs/src/reference-dsl-filter-statements.md.in @@ -2,20 +2,20 @@ You can use the `filter` DSL keyword within the `put` verb. 
In fact, the following two are synonymous: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv filter 'NR==2 || NR==3' example.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv put 'filter NR==2 || NR==3' example.csv -GENMD_EOF +GENMD-EOF The former, of course, is a little easier to type. For another example: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv put '@running_sum += $quantity; filter @running_sum > 500' example.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv filter '@running_sum += $quantity; @running_sum > 500' example.csv -GENMD_EOF +GENMD-EOF diff --git a/docs/src/reference-dsl-higher-order-functions.md.in b/docs/src/reference-dsl-higher-order-functions.md.in index 85e9cea71..31826ca3e 100644 --- a/docs/src/reference-dsl-higher-order-functions.md.in +++ b/docs/src/reference-dsl-higher-order-functions.md.in @@ -33,7 +33,7 @@ A perhaps helpful analogy: the `select` function is to arrays and maps as the Array examples: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { my_array = [2, 9, 10, 3, 1, 4, 5, 8, 7, 6]; @@ -51,11 +51,11 @@ mlr -n put ' print; } ' -GENMD_EOF +GENMD-EOF Map examples: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107}; @@ -71,7 +71,7 @@ mlr -n put ' print select(my_map, func (k,v) { return v % 10 >= 5}); } ' -GENMD_EOF +GENMD-EOF ## apply @@ -88,7 +88,7 @@ A perhaps helpful analogy: the `apply` function is to arrays and maps as the Array examples: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { my_array = [2, 9, 10, 3, 1, 4, 5, 8, 7, 6]; @@ -108,9 +108,9 @@ mlr -n put ' print sort(apply(my_array, func(e) { return e**3 })); } ' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107}; @@ -130,7 +130,7 @@ mlr -n put ' print sort(apply(my_map, func(k,v) { return {toupper(k): v**3} })); } ' 
-GENMD_EOF +GENMD-EOF ## reduce @@ -146,7 +146,7 @@ accumulator. The start value for the accumulator is the first element for arrays, or the first element's key-value pair for maps. -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { my_array = [2, 9, 10, 3, 1, 4, 5, 8, 7, 6]; @@ -175,9 +175,9 @@ mlr -n put ' print reduce(my_array, func (acc,e) { return acc. "," . e }); } ' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107}; @@ -209,7 +209,7 @@ mlr -n put ' print reduce(my_map, func (acck,accv,ek,ev) { return {"joined": accv . "," . ev }}); } ' -GENMD_EOF +GENMD-EOF ## fold @@ -218,7 +218,7 @@ The [`fold`](reference-dsl-builtin-functions.md#fold) function is the same as taken from the first entry of the array/map, you specify it as the third argument. -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { my_array = [2, 9, 10, 3, 1, 4, 5, 8, 7, 6]; @@ -239,9 +239,9 @@ mlr -n put ' print fold(my_array, func (acc,e) { return acc + e }, 1000000); } ' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107}; @@ -265,7 +265,7 @@ mlr -n put ' print fold(my_map, func (acck,accv,ek,ev) { return {"sum": accv + ev} }, {"sum": 1000000}); } ' -GENMD_EOF +GENMD-EOF ## sort @@ -288,7 +288,7 @@ values. Array examples: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { my_array = [2, 9, 10, 3, 1, 4, 5, 8, 7, 6]; @@ -307,11 +307,11 @@ mlr -n put ' print sort(my_array, func (a,b) { return b <=> a }); } ' -GENMD_EOF +GENMD-EOF Map examples: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107}; @@ -338,7 +338,7 @@ mlr -n put ' print sort(my_map, func(ak,av,bk,bv) { return bv <=> av }); } ' -GENMD_EOF +GENMD-EOF Please see the [sorting page](sorting.md) for more examples. 
@@ -346,36 +346,36 @@ Please see the [sorting page](sorting.md) for more examples.
 This is a way to do a logical OR/AND, respectively, of several boolean expressions, without the explicit `||`/`&&` and without a `for`-loop. This is a keystroke-saving convenience.
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --c2p cat example.csv
-GENMD_EOF
+GENMD-EOF
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --c2p --from example.csv filter 'any({"color":"red","shape":"square"}, func(k,v) {return $[k] == v})'
-GENMD_EOF
+GENMD-EOF
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --c2p --from example.csv filter 'every({"color":"red","shape":"square"}, func(k,v) {return $[k] == v})'
-GENMD_EOF
+GENMD-EOF
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --c2p --from example.csv put '$is_red_square = every({"color":"red","shape":"square"}, func(k,v) {return $[k] == v})'
-GENMD_EOF
+GENMD-EOF
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --c2p --from example.csv filter 'any([16,51,61,64], func(e) {return $index == e})'
-GENMD_EOF
+GENMD-EOF
 
 This last example could also be done using a map:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --c2p --from example.csv filter '
   begin {
     @indices = {16:true, 51:true, 61:true, 64:true};
   }
   @indices[$index] == true;
 '
-GENMD_EOF
+GENMD-EOF
 
 ## Combined examples
 
@@ -383,11 +383,11 @@ Using a paradigm from the
 [page on operating on all records](operating-on-all-records.md), we can retain a column from the input data as an array, then apply some higher-order functions to it:
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --c2p cat example.csv
-GENMD_EOF
+GENMD-EOF
 
-GENMD_RUN_COMMAND
+GENMD-RUN-COMMAND
 mlr --c2p --from example.csv put -q '
   begin {
     @indexes = [] # So auto-extend will make an array, not a map
@@ -420,7 +420,7 @@ mlr --c2p --from example.csv put -q '
     )
   }
 '
-GENMD_EOF
+GENMD-EOF
 
 ## Caveats
 
@@ -428,21 +428,21 @@ GENMD_EOF
 From other languages it's easy to accidentally write
 
-GENMD_RUN_COMMAND_TOLERATING_ERROR
+GENMD-RUN-COMMAND-TOLERATING-ERROR
 mlr -n put 'end { print
select([1,2,3,4,5], func (e) { e >= 3 })}' -GENMD_EOF +GENMD-EOF instead of -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put 'end { print select([1,2,3,4,5], func (e) { return e >= 3 })}' -GENMD_EOF +GENMD-EOF ### No IIFEs As of September 2021, immediately invoked function expressions (IIFEs) are not part of the Miller DSL's grammar. For example, this doesn't work yet: -GENMD_RUN_COMMAND_TOLERATING_ERROR +GENMD-RUN-COMMAND-TOLERATING-ERROR mlr -n put ' end { x = 3; @@ -450,11 +450,11 @@ mlr -n put ' print y; } ' -GENMD_EOF +GENMD-EOF but this does: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { x = 3; @@ -463,7 +463,7 @@ mlr -n put ' print y; } ' -GENMD_EOF +GENMD-EOF ### Built-in functions currently unsupported as arguments @@ -474,7 +474,7 @@ be used directly as arguments to higher-order functions. For example, this doesn't work yet: -GENMD_RUN_COMMAND_TOLERATING_ERROR +GENMD-RUN-COMMAND-TOLERATING-ERROR mlr -n put ' end { notches = [0,1,2,3]; @@ -483,11 +483,11 @@ mlr -n put ' print cosines; } ' -GENMD_EOF +GENMD-EOF but this does: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { notches = [0,1,2,3]; @@ -497,4 +497,4 @@ mlr -n put ' print cosines; } ' -GENMD_EOF +GENMD-EOF diff --git a/docs/src/reference-dsl-operators.md.in b/docs/src/reference-dsl-operators.md.in index c226749fe..47344031f 100644 --- a/docs/src/reference-dsl-operators.md.in +++ b/docs/src/reference-dsl-operators.md.in @@ -54,45 +54,45 @@ The main use for the `.` operator is for string concatenation: `"abc" . "def"` i However, in Miller 6 it has optional use for map traversal. 
Example: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/server-log.json -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --json --from data/server-log.json put -q ' print $req["headers"]["host"]; print $req.headers.host; ' -GENMD_EOF +GENMD-EOF This also works on the left-hand sides of assignment statements: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --json --from data/server-log.json put ' $req.headers.host = "UPDATED"; ' -GENMD_EOF +GENMD-EOF A few caveats: * This is why `.` has higher precedece than `+` in the table above -- in Miller 5 and below, where `.` was only used for concatenation, it had the same precedence as `+`. So you can now do this: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --json --from data/server-log.json put -q ' print $req.id + $res.status_code ' -GENMD_EOF +GENMD-EOF * However (awkwardly), if you want to use `.` for map-traversal as well as string-concatenation in the same statement, you'll need to insert parentheses, as the default associativity is left-to-right: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --json --from data/server-log.json put -q ' print $req.method . " -- " . $req.path ' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --json --from data/server-log.json put -q ' print ($req.method) . " -- " . ($req.path) ' -GENMD_EOF +GENMD-EOF diff --git a/docs/src/reference-dsl-output-statements.md b/docs/src/reference-dsl-output-statements.md index 7c8311ae8..c91cb00bd 100644 --- a/docs/src/reference-dsl-output-statements.md +++ b/docs/src/reference-dsl-output-statements.md @@ -246,7 +246,7 @@ etc., to control the format of the output if the output is redirected. See also Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern", @a, @b, @c' Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern > mytap.dat", @a, @b, @c' -Please see https://johnkerl.org/miller6://johnkerl.org/miller/doc for more information. 
+Please see https://miller.readthedocs.io://johnkerl.org/miller/doc for more information.
@@ -280,7 +280,7 @@ etc., to control the format of the output if the output is redirected. See also
   Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp > stderr, @*, "index1", "index2"'
   Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "grep somepattern", @*, "index1", "index2"'
 
-Please see https://johnkerl.org/miller6://johnkerl.org/miller/doc for more information.
+Please see https://miller.readthedocs.io://johnkerl.org/miller/doc for more information.
 
@@ -315,7 +315,7 @@ etc., to control the format of the output if the output is redirected. See also
   Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit > stderr, @*, "index1", "index2"'
   Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "grep somepattern", @*, "index1", "index2"'
 
-Please see https://johnkerl.org/miller6://johnkerl.org/miller/doc for more information.
+Please see https://miller.readthedocs.io://johnkerl.org/miller/doc for more information.
 
## Emit statements diff --git a/docs/src/reference-dsl-output-statements.md.in b/docs/src/reference-dsl-output-statements.md.in index 3632adccd..5648d4119 100644 --- a/docs/src/reference-dsl-output-statements.md.in +++ b/docs/src/reference-dsl-output-statements.md.in @@ -28,15 +28,15 @@ The `print` statement is perhaps self-explanatory, but with a few light caveats: * You can redirect print output to a file: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr --from myfile.dat put 'print > "tap.txt", $x' -GENMD_EOF +GENMD-EOF * You can redirect print output to multiple files, split by values present in various records: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr --from myfile.dat put 'print > $a.".txt", $x' -GENMD_EOF +GENMD-EOF See also [Redirected-output statements](reference-dsl-output-statements.md#redirected-output-statements) for examples. @@ -62,34 +62,34 @@ Records produced by a `mlr put` go downstream to the next verb in your `then`-ch The syntax is, by example: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr --from myfile.dat put 'tee > "tap.dat", $*' then sort -n index -GENMD_EOF +GENMD-EOF First is `tee >`, then the filename expression (which can be an expression such as `"tap.".$a.".dat"`), then a comma, then `$*`. (Nothing else but `$*` is teeable.) 
You can also write to a variable file name -- for example, you can split a single file into multiple ones on field names: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv cat example.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv --from example.csv put -q 'tee > $shape.".csv", $*' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv cat circle.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv cat square.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv cat triangle.csv -GENMD_EOF +GENMD-EOF See also [Redirected-output statements](reference-dsl-output-statements.md#redirected-output-statements) for examples. @@ -101,33 +101,33 @@ Details: * The `print` and `dump` keywords produce output immediately to standard output, or to specified file(s) or pipe-to command if present. -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr help keyword print -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr help keyword dump -GENMD_EOF +GENMD-EOF * `mlr put` sends the current record (possibly modified by the `put` expression) to the output record stream. Records are then input to the following verb in a `then`-chain (if any), else printed to standard output (unless `put -q`). The **tee** keyword *additionally* writes the output record to specified file(s) or pipe-to command, or immediately to `stdout`/`stderr`. -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr help keyword tee -GENMD_EOF +GENMD-EOF * `mlr put`'s `emitf`, `emitp`, and `emit` send out-of-stream variables to the output record stream. These are then input to the following verb in a `then`-chain (if any), else printed to standard output. When redirected with `>`, `>>`, or `|`, they *instead* write the out-of-stream variable(s) to specified file(s) or pipe-to command, or immediately to `stdout`/`stderr`. 
-GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr help keyword emitf -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr help keyword emitp -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr help keyword emit -GENMD_EOF +GENMD-EOF ## Emit statements @@ -142,108 +142,108 @@ You can emit any map-valued expression, including `$*`, map-valued out-of-stream Use **emitf** to output several out-of-stream variables side-by-side in the same output record. For `emitf` these mustn't have indexing using `@name[...]`. Example: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put -q ' @count += 1; @x_sum += $x; @y_sum += $y; end { emitf @count, @x_sum, @y_sum} ' data/small -GENMD_EOF +GENMD-EOF Use **emit** to output an out-of-stream variable. If it's non-indexed you'll get a simple key-value pair: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put -q '@sum += $x; end { dump }' data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put -q '@sum += $x; end { emit @sum }' data/small -GENMD_EOF +GENMD-EOF If it's indexed then use as many names after `emit` as there are indices: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put -q '@sum[$a] += $x; end { dump }' data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put -q '@sum[$a] += $x; end { emit @sum, "a" }' data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put -q '@sum[$a][$b] += $x; end { dump }' data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put -q '@sum[$a][$b] += $x; end { emit @sum, "a", "b" }' data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put -q '@sum[$a][$b][$i] += $x; end { dump }' data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put -q ' @sum[$a][$b][$i] += $x; end { emit @sum, "a", "b", "i" } ' data/small -GENMD_EOF +GENMD-EOF Now for **emitp**: if you have as many names following `emit` as there 
are levels in the out-of-stream variable's map, then `emit` and `emitp` do the same thing. Where they differ is when you don't specify as many names as there are map levels. In this case, Miller needs to flatten multiple map indices down to output-record keys: `emitp` includes full prefixing (hence the `p` in `emitp`) while `emit` takes the deepest map key as the output-record key: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put -q '@sum[$a][$b] += $x; end { dump }' data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put -q '@sum[$a][$b] += $x; end { emit @sum, "a" }' data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put -q '@sum[$a][$b] += $x; end { emit @sum }' data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put -q '@sum[$a][$b] += $x; end { emitp @sum, "a" }' data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put -q '@sum[$a][$b] += $x; end { emitp @sum }' data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --oxtab put -q '@sum[$a][$b] += $x; end { emitp @sum }' data/small -GENMD_EOF +GENMD-EOF Use **--flatsep** to specify the character which joins multilevel keys for `emitp` (it defaults to a colon): -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --flatsep / put -q '@sum[$a][$b] += $x; end { emitp @sum, "a" }' data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --flatsep / put -q '@sum[$a][$b] += $x; end { emitp @sum }' data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --flatsep / --oxtab put -q ' @sum[$a][$b] += $x; end { emitp @sum } ' data/small -GENMD_EOF +GENMD-EOF ## Multi-emit statements You can emit **multiple map-valued expressions side-by-side** by including their names in parentheses: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --from data/medium --opprint put -q ' @x_count[$a][$b] += 1; @x_sum[$a][$b] += $x; @@ -254,7 +254,7 @@ mlr --from data/medium --opprint put -q ' emit (@x_sum, 
@x_count, @x_mean), "a", "b" } ' -GENMD_EOF +GENMD-EOF What this does is walk through the first out-of-stream variable (`@x_sum` in this example) as usual, then for each keylist found (e.g. `pan,wye`), include the values for the remaining out-of-stream variables (here, `@x_count` and `@x_mean`). You should use this when all out-of-stream variables in the emit statement have **the same shape and the same keylists**. @@ -262,27 +262,27 @@ What this does is walk through the first out-of-stream variable (`@x_sum` in thi Use **emit all** (or `emit @*` which is synonymous) to output all out-of-stream variables. You can use the following idiom to get various accumulators output side-by-side (reminiscent of `mlr stats1`): -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --from data/small --opprint put -q ' @v[$a][$b]["sum"] += $x; @v[$a][$b]["count"] += 1; end{emit @*,"a","b"} ' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --from data/small --opprint put -q ' @sum[$a][$b] += $x; @count[$a][$b] += 1; end{emit @*,"a","b"} ' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --from data/small --opprint put -q ' @sum[$a][$b] += $x; @count[$a][$b] += 1; end{emit (@sum, @count),"a","b"} ' -GENMD_EOF +GENMD-EOF diff --git a/docs/src/reference-dsl-syntax.md.in b/docs/src/reference-dsl-syntax.md.in index f2299757b..52bff2489 100644 --- a/docs/src/reference-dsl-syntax.md.in +++ b/docs/src/reference-dsl-syntax.md.in @@ -4,13 +4,13 @@ Multiple expressions may be given, separated by semicolons, and each may refer to the ones before: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND ruby -e '10.times{|i|puts "i=#{i}"}' | mlr --opprint put '$j = $i + 1; $k = $i +$j' -GENMD_EOF +GENMD-EOF Newlines within the expression are ignored, which can help increase legibility of complex expressions: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint put ' $nf = NF; $nr = NR; @@ -18,46 +18,46 @@ mlr --opprint put ' $filenum = FILENUM; $filename = FILENAME ' data/small data/small2 
-GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint filter '($x > 0.5 && $y < 0.5) || ($x < 0.5 && $y > 0.5)' \ then stats2 -a corr -f x,y \ data/medium -GENMD_EOF +GENMD-EOF ## Expressions from files The simplest way to enter expressions for `put` and `filter` is between single quotes on the command line (see also [here](miller-on-windows.md) for Windows). For example: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --from data/small put '$xy = sqrt($x**2 + $y**2)' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --from data/small put 'func f(a, b) { return sqrt(a**2 + b**2) } $xy = f($x, $y)' -GENMD_EOF +GENMD-EOF You may, though, find it convenient to put expressions into files for reuse, and read them **using the -f option**. For example: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/fe-example-3.mlr -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --from data/small put -f data/fe-example-3.mlr -GENMD_EOF +GENMD-EOF If you have some of the logic in a file and you want to write the rest on the command line, you can **use the -f and -e options together**: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/fe-example-4.mlr -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --from data/small put -f data/fe-example-4.mlr -e '$xy = f($x, $y)' -GENMD_EOF +GENMD-EOF A suggested use-case here is defining functions in files, and calling them from command-line expressions. @@ -69,25 +69,25 @@ Moreover, you can have one or more `-f` expressions (maybe one function per file Miller uses **semicolons as statement separators**, not statement terminators. This means you can write: -GENMD_INCLUDE_ESCAPED(data/semicolon-example.txt) +GENMD-INCLUDE-ESCAPED(data/semicolon-example.txt) Semicolons are optional after closing curly braces (which close conditionals and loops as discussed below). 
-GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo x=1,y=2 | mlr put 'while (NF < 10) { $[NF+1] = ""} $foo = "bar"' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo x=1,y=2 | mlr put 'while (NF < 10) { $[NF+1] = ""}; $foo = "bar"' -GENMD_EOF +GENMD-EOF Semicolons are required between statements even if those statements are on separate lines. **Newlines** are for your convenience but have no syntactic meaning: line endings do not terminate statements. For example, adjacent assignment statements must be separated by semicolons even if those statements are on separate lines: -GENMD_INCLUDE_ESCAPED(data/newline-example.txt) +GENMD-INCLUDE-ESCAPED(data/newline-example.txt) **Trailing commas** are allowed in function/subroutine definitions, function/subroutine callsites, and map literals. This is intended for (although not restricted to) the multi-line case: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csvlite --from data/a.csv put ' func f( num a, @@ -105,21 +105,21 @@ mlr --csvlite --from data/a.csv put ' "v": NR, } ' -GENMD_EOF +GENMD-EOF Bodies for all compound statements must be enclosed in **curly braces**, even if the body is a single statement: -GENMD_SHOW_COMMAND +GENMD-SHOW-COMMAND mlr put 'if ($x == 1) $y = 2' # Syntax error -GENMD_EOF +GENMD-EOF -GENMD_SHOW_COMMAND +GENMD-SHOW-COMMAND mlr put 'if ($x == 1) { $y = 2 }' # This is OK -GENMD_EOF +GENMD-EOF Bodies for compound statements may be empty: -GENMD_SHOW_COMMAND +GENMD-SHOW-COMMAND mlr put 'if ($x == 1) { }' # This no-op is syntactically acceptable -GENMD_EOF +GENMD-EOF diff --git a/docs/src/reference-dsl-time.md.in b/docs/src/reference-dsl-time.md.in index 1ba752ff8..a81a7283d 100644 --- a/docs/src/reference-dsl-time.md.in +++ b/docs/src/reference-dsl-time.md.in @@ -31,17 +31,17 @@ seconds, are common in some contexts, particulary JavaScript. If you ever (anywhere) see a timestamp for the year 49,000-something -- probably someone is treating epoch-milliseconds as epoch-seconds. 
-GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put 'end { print sec2gmt(1500000000); print sec2gmt(1500000000000); }' -GENMD_EOF +GENMD-EOF You can get the current system time, as epoch-seconds, using the [systime](reference-dsl-builtin-functions.md#systime) DSL function: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr --c2p --from example.csv put '$t = systime()' color shape flag k index quantity rate t yellow triangle true 1 11 43.6498 9.8870 1634784588.045347 @@ -54,7 +54,7 @@ purple triangle false 7 65 80.1405 5.8240 1634784588.045418 yellow circle true 8 73 63.9785 4.2370 1634784588.045419 yellow circle true 9 87 63.5058 8.3350 1634784588.045421 purple square false 10 91 72.3735 8.2430 1634784588.045422 -GENMD_EOF +GENMD-EOF The [systimeint](reference-dsl-builtin-functions.md#systimeint) DSL functions is nothing more than a keystroke-saver for `int(systime())`. @@ -72,7 +72,7 @@ You can get these from epoch-seconds using the (Note that the terms _UTC_ and _GMT_ are used interchangeably in Miller.) We also have [sec2gmtdate](reference-dsl-builtin-functions.md#sec2gmtdate) DSL function. 
-GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put 'end { print sec2gmt(0); print sec2gmt(1234567890.123); @@ -82,7 +82,7 @@ mlr -n put 'end { print sec2gmtdate(1234567890.123); print sec2gmtdate(-1234567890.123); }' -GENMD_EOF +GENMD-EOF # Local times with standard format; specifying timezones @@ -101,20 +101,20 @@ You can specify the timezone using any of the following: Regardless, if you specify an invalid timezone, you'll be clearly notified: -GENMD_RUN_COMMAND_TOLERATING_ERROR +GENMD-RUN-COMMAND-TOLERATING-ERROR mlr --from example.csv --tz This/Is/A/Typo cat -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND export TZ=Asia/Istanbul mlr -n put 'end { print sec2localtime(0) }' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --tz America/Sao_Paulo -n put 'end { print sec2localtime(0) }' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put 'end { ENV["TZ"] = "Asia/Istanbul"; print sec2localtime(0); @@ -126,9 +126,9 @@ mlr -n put 'end { print sec2localdate(0); print localtime2sec("2000-01-02 03:04:05"); }' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put 'end { print sec2localtime(0, 0, "Asia/Istanbul"); print sec2localdate(0, "Asia/Istanbul"); @@ -138,7 +138,7 @@ mlr -n put 'end { print sec2localdate(0, "America/Sao_Paulo"); print localtime2sec("2000-01-02 03:04:05", "America/Sao_Paulo"); }' -GENMD_EOF +GENMD-EOF Note that for local times, Miller omits the `T` and the `Z` you see in GMT times. 
@@ -146,22 +146,22 @@ We also have the [gmt2localtime](reference-dsl-builtin-functions.md#gmt2localtime) and [localtime2gmt](reference-dsl-builtin-functions.md#localtime2gmt) convenience functions: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put 'end { ENV["TZ"] = "Asia/Istanbul"; print gmt2localtime("1970-01-01T00:00:00Z"); print localtime2gmt("1970-01-01 00:00:00"); }' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put 'end { print gmt2localtime("1970-01-01T00:00:00Z", "America/Sao_Paulo"); print gmt2localtime("1970-01-01T00:00:00Z", "Asia/Istanbul"); print localtime2gmt("1970-01-01 00:00:00", "America/Sao_Paulo"); print localtime2gmt("1970-01-01 00:00:00", "Asia/Istanbul"); }' -GENMD_EOF +GENMD-EOF # GMT and local times with custom formats @@ -181,7 +181,7 @@ Notes: Some examples: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put 'end { ENV["TZ"] = "Asia/Istanbul"; print strftime(0, "%Y-%m-%d %H:%M:%S"); @@ -190,7 +190,7 @@ mlr -n put 'end { print strftime(0, "%A, %B %e, %Y"); print strftime(123456789, "%I:%M %p"); }' -GENMD_EOF +GENMD-EOF Unfortunately, names from `%A` and `%B` are only available in English, as an artifact of a design choice in the Go `time` library which Miller (and its @@ -200,7 +200,7 @@ We also have [strftimelocal](reference-dsl-builtin-functions.md#strftimelocal) and [strptimelocal](reference-dsl-builtin-functions.md#strptimelocal): -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put 'end { ENV["TZ"] = "America/Anchorage"; print strftime_local(0, "%Y-%m-%d %H:%M:%S %Z"); @@ -214,9 +214,9 @@ mlr -n put 'end { print strftime_local(0, "%A, %B %e, %Y"); print strptime_local("2020-03-01 00:00:00", "%Y-%m-%d %H:%M:%S"); }' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put 'end { print strftime_local(0, "%Y-%m-%d %H:%M:%S %Z", "America/Anchorage"); print strftime_local(0, "%Y-%m-%d %H:%M:%S %z", "America/Anchorage"); @@ -228,7 +228,7 @@ mlr -n put 'end { print strftime_local(0, "%A, %B %e, %Y", 
"Asia/Hong_Kong"); print strptime_local("2020-03-01 00:00:00", "%Y-%m-%d %H:%M:%S", "Asia/Hong_Kong"); }' -GENMD_EOF +GENMD-EOF # Relative times @@ -236,7 +236,7 @@ You can get the seconds since the Miller process start using [uptime](reference-dsl-builtin-functions.md#uptime): -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE color shape flag k index quantity rate u yellow triangle true 1 11 43.6498 9.8870 0.0011110305786132812 red square true 2 15 79.2778 0.0130 0.0011241436004638672 @@ -248,13 +248,13 @@ purple triangle false 7 65 80.1405 5.8240 0.0024831295013427734 yellow circle true 8 73 63.9785 4.2370 0.0024831295013427734 yellow circle true 9 87 63.5058 8.3350 0.0024852752685546875 purple square false 10 91 72.3735 8.2430 0.002485990524291992 -GENMD_EOF +GENMD-EOF Time-differences can be done in seconds, of course; you can also use the following if you like: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -F | grep hms -GENMD_EOF +GENMD-EOF # References diff --git a/docs/src/reference-dsl-unset-statements.md.in b/docs/src/reference-dsl-unset-statements.md.in index 28671b6fd..a144ea2da 100644 --- a/docs/src/reference-dsl-unset-statements.md.in +++ b/docs/src/reference-dsl-unset-statements.md.in @@ -2,22 +2,22 @@ You can clear a map key by assigning the empty string as its value: `$x=""` or `@x=""`. Using `unset` you can remove the key entirely. Examples: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put 'unset $x, $a' data/small -GENMD_EOF +GENMD-EOF This can also be done, of course, using `mlr cut -x`. 
You can also clear out-of-stream or local variables, at the base name level, or at an indexed sublevel: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put -q '@sum[$a][$b] += $x; end { dump; unset @sum; dump }' data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put -q '@sum[$a][$b] += $x; end { dump; unset @sum["eks"]; dump }' data/small -GENMD_EOF +GENMD-EOF If you use `unset all` (or `unset @*` which is synonymous), that will unset all out-of-stream variables which have been assigned up to that point. diff --git a/docs/src/reference-dsl-user-defined-functions.md.in b/docs/src/reference-dsl-user-defined-functions.md.in index c65f98624..1e20760f3 100644 --- a/docs/src/reference-dsl-user-defined-functions.md.in +++ b/docs/src/reference-dsl-user-defined-functions.md.in @@ -6,7 +6,7 @@ As of Miller 5.0.0 you can define your own functions, as well as subroutines. Here's the obligatory example of a recursive function to compute the factorial function: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint --from data/small put ' func f(n) { if (is_numeric(n)) { @@ -21,7 +21,7 @@ mlr --opprint --from data/small put ' $ox = f($x + NR); $oi = f($i); ' -GENMD_EOF +GENMD-EOF Properties of user-defined functions: @@ -45,7 +45,7 @@ Properties of user-defined functions: Example: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint --from data/small put -q ' begin { @call_count = 0; @@ -63,7 +63,7 @@ mlr --opprint --from data/small put -q ' print "NR=" . NR; call s(NR); ' -GENMD_EOF +GENMD-EOF Properties of user-defined subroutines: @@ -97,15 +97,15 @@ If you have a file with UDFs you use frequently, say `my-udfs.mlr`, you can use `--load` or `--mload` to define them for your Miller scripts. 
For example, in your shell, -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE alias mlr='mlr --load ~/my-functions.mlr' -GENMD_EOF +GENMD-EOF or -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE alias mlr='mlr --load /u/miller-udfs/' -GENMD_EOF +GENMD-EOF See the [miscellaneous-flags page](reference-main-flag-list.md#miscellaneous-flags) for more information. @@ -123,16 +123,16 @@ for more information on For example: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p --from example.csv put ' f = func(s, t) { return s . ":" . t; }; $z = f($color, $shape); ' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p --from example.csv put ' a = func(s, t) { return s . ":" . t . " above"; @@ -143,7 +143,7 @@ mlr --c2p --from example.csv put ' f = $index >= 50 ? a : b; $z = f($color, $shape); ' -GENMD_EOF +GENMD-EOF Note that you need a semicolon after the closing curly brace of the function literal. @@ -151,7 +151,7 @@ Unlike named functions, function literals (also known as unnamed functions) have access to local variables defined in their enclosing scope. That's so you can do things like this: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p --from example.csv put ' f = func(s, t, i) { if (i >= cap) { @@ -163,6 +163,6 @@ mlr --c2p --from example.csv put ' cap = 10; $z = f($color, $shape, $index); ' -GENMD_EOF +GENMD-EOF See the [page on higher-order functions](reference-dsl-higher-order-functions.md) for more. diff --git a/docs/src/reference-dsl-variables.md b/docs/src/reference-dsl-variables.md index 6d81b0b16..84028cde2 100644 --- a/docs/src/reference-dsl-variables.md +++ b/docs/src/reference-dsl-variables.md @@ -987,7 +987,7 @@ etc., to control the format of the output if the output is redirected. 
See also Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit > stderr, @*, "index1", "index2"' Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "grep somepattern", @*, "index1", "index2"' -Please see https://johnkerl.org/miller6://johnkerl.org/miller/doc for more information. +Please see https://miller.readthedocs.io://johnkerl.org/miller/doc for more information. emitf: inserts non-indexed out-of-stream variable(s) side-by-side into the output record stream. @@ -1014,7 +1014,7 @@ etc., to control the format of the output if the output is redirected. See also Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern", @a, @b, @c' Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern > mytap.dat", @a, @b, @c' -Please see https://johnkerl.org/miller6://johnkerl.org/miller/doc for more information. +Please see https://miller.readthedocs.io://johnkerl.org/miller/doc for more information. emitp: inserts an out-of-stream variable into the output record stream. Hashmap indices present in the data but not slotted by emitp arguments are @@ -1043,7 +1043,7 @@ etc., to control the format of the output if the output is redirected. See also Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp > stderr, @*, "index1", "index2"' Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "grep somepattern", @*, "index1", "index2"' -Please see https://johnkerl.org/miller6://johnkerl.org/miller/doc for more information. +Please see https://miller.readthedocs.io://johnkerl.org/miller/doc for more information. end: defines a block of statements to be executed after input records are ingested. The body statements must be wrapped in curly braces. 
diff --git a/docs/src/reference-dsl-variables.md.in b/docs/src/reference-dsl-variables.md.in index fc33164d5..553eef0b6 100644 --- a/docs/src/reference-dsl-variables.md.in +++ b/docs/src/reference-dsl-variables.md.in @@ -20,13 +20,13 @@ If field names have **special characters** such as `.` then you can use braces, You may also use a **computed field name** in square brackets, e.g. -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo a=3,b=4 | mlr filter '$["x"] < 0.5' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo s=green,t=blue,a=3,b=4 | mlr put '$[$s."_".$t] = $a * $b' -GENMD_EOF +GENMD-EOF Notes: @@ -46,39 +46,39 @@ Use `$[[3]]` to access the name of field 3. More generally, any expression eval Then using a computed field name, `$[ $[[3]] ]` is the value in the third field. This has the shorter equivalent notation `$[[[3]]]`. -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr cat data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put '$[[3]] = "NEW"' data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put '$[[[3]]] = "NEW"' data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put '$NEW = $[[NR]]' data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put '$NEW = $[[[NR]]]' data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put '$[[[NR]]] = "NEW"' data/small -GENMD_EOF +GENMD-EOF Right-hand side accesses to non-existent fields -- i.e. with index less than 1 or greater than `NF` -- return an absent value. Likewise, left-hand side accesses only refer to fields which already exist. For example, if a field has 5 records then assigning the name or value of the 6th (or 600th) field results in a no-op. 
-GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put '$[[6]] = "NEW"' data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put '$[[[6]]] = "NEW"' data/small -GENMD_EOF +GENMD-EOF ## Out-of-stream variables @@ -90,21 +90,21 @@ Just as for field names in stream records, if you want to define out-of-stream v You may use a **computed key** in square brackets, e.g. -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo s=green,t=blue,a=3,b=4 | mlr put -q '@[$s."_".$t] = $a * $b; emit all' -GENMD_EOF +GENMD-EOF Out-of-stream variables are **scoped** to the `put` command in which they appear. In particular, if you have two or more `put` commands separated by `then`, each put will have its own set of out-of-stream variables: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/a.dkvp -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put '@sum += $a; end {emit @sum}' \ then put 'is_present($a) {$a=10*$a; @sum += $a}; end {emit @sum}' \ data/a.dkvp -GENMD_EOF +GENMD-EOF Out-of-stream variables' **extent** is from the start to the end of the record stream, i.e. every time the `put` or `filter` statement referring to them is executed. 
@@ -114,7 +114,7 @@ Out-of-stream variables are **read-write**: you can do `$sum=@sum`, `@sum=$sum`, Using an index on the `@count` and `@sum` variables, we get the benefit of the `-g` (group-by) option which `mlr stats1` and various other Miller commands have: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put -q ' @x_count[$a] += 1; @x_sum[$a] += $x; @@ -123,15 +123,15 @@ mlr put -q ' emit @x_sum, "a"; } ' ./data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr stats1 -a count,sum -f x -g a ./data/small -GENMD_EOF +GENMD-EOF Indices can be arbitrarily deep -- here there are two or more of them: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --from data/medium put -q ' @x_count[$a][$b] += 1; @x_sum[$a][$b] += $x; @@ -139,13 +139,13 @@ mlr --from data/medium put -q ' emit (@x_count, @x_sum), "a", "b"; } ' -GENMD_EOF +GENMD-EOF The idea is that `stats1`, and other Miller verbs, encapsulate frequently-used patterns with a minimum of keystroking (and run a little faster), whereas using out-of-stream variables you have more flexibility and control in what you do. Begin/end blocks can be mixed with pattern/action blocks. For example: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put ' begin { @num_total = 0; @@ -160,7 +160,7 @@ mlr put ' emitf @num_total, @num_positive } ' data/put-gating-example-1.dkvp -GENMD_EOF +GENMD-EOF ## Local variables @@ -168,7 +168,7 @@ Local variables are similar to out-of-stream variables, except that their extent For example: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND # Here I'm using a specified random-number seed so this example always # produces the same output for this web document: in everyday practice we # would leave off the --seed 12345 part. 
@@ -185,7 +185,7 @@ mlr --seed 12345 seqgen --start 1 --stop 10 then put ' num o = f(10, 20); # local to the top-level scope $o = o; ' -GENMD_EOF +GENMD-EOF Things which are completely unsurprising, resembling many other languages: @@ -213,23 +213,23 @@ Things which are perhaps surprising compared to other languages: The following example demonstrates the scope rules: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/scope-example.mlr -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/scope-example.dat -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --oxtab --from data/scope-example.dat put -f data/scope-example.mlr -GENMD_EOF +GENMD-EOF And this example demonstrates the type-declaration rules: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/type-decl-example.mlr -GENMD_EOF +GENMD-EOF ## Map literals @@ -237,7 +237,7 @@ Miller's `put`/`filter` DSL has four kinds of maps. **Stream records** are (sing For example, the following swaps the input stream's `a` and `i` fields, modifies `y`, and drops the rest: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint put ' $* = { "a": $i, @@ -245,11 +245,11 @@ mlr --opprint put ' "y": $y * 10, } ' data/small -GENMD_EOF +GENMD-EOF Likewise, you can assign map literals to out-of-stream variables or local variables; pass them as arguments to user-defined functions, return them from functions, and so on: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --from data/small put ' func f(map m): map { m["x"] *= 200; @@ -257,11 +257,11 @@ mlr --from data/small put ' } $* = f({"a": $a, "x": $x}); ' -GENMD_EOF +GENMD-EOF Like out-of-stream and local variables, map literals can be multi-level: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --from data/small put -q ' begin { @o = { @@ -281,7 +281,7 @@ mlr --from data/small put -q ' dump @o; } ' -GENMD_EOF +GENMD-EOF See also the [Maps page](reference-main-maps.md). @@ -301,13 +301,13 @@ read/write access to environment variables, e.g. 
`ENV["HOME"]` or -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr filter 'FNR == 2' data/small* -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put '$fnr = FNR' data/small* -GENMD_EOF +GENMD-EOF Their values of `NF`, `NR`, `FNR`, `FILENUM`, and `FILENAME` change from one record to the next as Miller scans through your input data stream. The @@ -318,13 +318,13 @@ system environment variables at the time Miller starts. Any changes made to Their **scope is global**: you can refer to them in any `filter` or `put` statement. Their values are assigned by the input-record reader: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv put '$nr = NR' data/a.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv repeat -n 3 then put '$nr = NR' data/a.csv -GENMD_EOF +GENMD-EOF The **extent** is for the duration of the put/filter: in a `begin` statement (which executes before the first input record is consumed) you will find `NR=1` and in an `end` statement (which is executed after the last input record is consumed) you will find `NR` to be the total number of records ingested. @@ -340,13 +340,13 @@ Use of type-checking is entirely up to you: omit it if you want flexibility with The following `is_...` functions take a value and return a boolean indicating whether the argument is of the indicated type. The `assert_...` functions return their argument if it is of the specified type, and cause a fatal error otherwise: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -f | grep ^is -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -f | grep ^assert -GENMD_EOF +GENMD-EOF See [Data-cleaning Examples](data-cleaning-examples.md) for examples of how to use these. 
@@ -356,36 +356,36 @@ Local variables can be defined either untyped as in `x = 1`, or typed as in `int The reason for `num` is that `int` and `float` typedecls are very precise: -GENMD_CARDIFY +GENMD-CARDIFY float a = 0; # Runtime error since 0 is int not float int b = 1.0; # Runtime error since 1.0 is float not int num c = 0; # OK num d = 1.0; # OK -GENMD_EOF +GENMD-EOF A suggestion is to use `num` for general use when you want numeric content, and use `int` when you genuinely want integer-only values, e.g. in loop indices or map keys (since Miller map keys can only be strings or ints). The `var` type declaration indicates no type restrictions, e.g. `var x = 1` has the same type restrictions on `x` as `x = 1`. The difference is in intentional shadowing: if you have `x = 1` in outer scope and `x = 2` in inner scope (e.g. within a for-loop or an if-statement) then outer-scope `x` has value 2 after the second assignment. But if you have `var x = 2` in the inner scope, then you are declaring a variable scoped to the inner block.) For example: -GENMD_CARDIFY +GENMD-CARDIFY x = 1; if (NR == 4) { x = 2; # Refers to outer-scope x: value changes from 1 to 2. } print x; # Value of x is now two -GENMD_EOF +GENMD-EOF -GENMD_CARDIFY +GENMD-CARDIFY x = 1; if (NR == 4) { var x = 2; # Defines a new inner-scope x with value 2 } print x; # Value of this x is still 1 -GENMD_EOF +GENMD-EOF Likewise function arguments can optionally be typed, with type enforced when the function is called: -GENMD_CARDIFY +GENMD-CARDIFY func f(map m, int i) { ... } @@ -396,11 +396,11 @@ if (NR == 4) { var x = 2; # Defines a new inner-scope x with value 2 } print x; # Value of this x is still 1 -GENMD_EOF +GENMD-EOF Thirdly, function return values can be type-checked at the point of `return` using `:` and a typedecl after the parameter list: -GENMD_CARDIFY +GENMD-CARDIFY func f(map m, int i): bool { ... ... 
@@ -419,7 +419,7 @@ func f(map m, int i): bool { # So it would also be a runtime error on reaching the end of this function without # an explicit return statement. } -GENMD_EOF +GENMD-EOF ## Aggregate variable assignments @@ -431,7 +431,7 @@ There are three remaining kinds of variable assignment using out-of-stream varia Example recursive copy of out-of-stream variables: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint --from data/small put -q ' @v["sum"] += $x; @v["count"] += 1; @@ -441,35 +441,35 @@ mlr --opprint --from data/small put -q ' dump } ' -GENMD_EOF +GENMD-EOF Example of out-of-stream variable assigned to full stream record, where the 2nd record is stashed, and the 4th record is overwritten with that: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put 'NR == 2 {@keep = $*}; NR == 4 {$* = @keep}' data/small -GENMD_EOF +GENMD-EOF Example of full stream record assigned to an out-of-stream variable, finding the record for which the `x` field has the largest value in the input stream: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint put -q ' is_null(@xmax) || $x > @xmax {@xmax = $x; @recmax = $*}; end {emit @recmax} ' data/small -GENMD_EOF +GENMD-EOF ## Keywords for filter and put -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr help list-keywords # you can also use mlr -k -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr help usage-keywords # you can also use mlr -K -GENMD_EOF +GENMD-EOF diff --git a/docs/src/reference-dsl.md.in b/docs/src/reference-dsl.md.in index 42f53835d..eb2ae470a 100644 --- a/docs/src/reference-dsl.md.in +++ b/docs/src/reference-dsl.md.in @@ -16,9 +16,9 @@ Here's comparison of verbs and `put`/`filter` DSL expressions: Example: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr stats1 -a sum -f x -g a data/small -GENMD_EOF +GENMD-EOF * Verbs are coded in Go * They run a bit faster @@ -28,9 +28,9 @@ GENMD_EOF Example: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put 
-q '@x_sum[$a] += $x; end{emit @x_sum, "a"}' data/small -GENMD_EOF +GENMD-EOF * You get to write your own DSL expressions * They run a bit slower @@ -62,15 +62,15 @@ page on [operating on all records](operating-on-all-records.md).) To see this in action, let's take a look at the [data/short.csv](./data/short.csv) file: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/short.csv -GENMD_EOF +GENMD-EOF There are three records in this file, with `word=apple`, `word=ball`, and `word=cat`, respectively. Let's print something in a `begin` statement, add a field in a main statement, and print something else in an `end` statement: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv --from data/short.csv put ' begin { print "begin"; @@ -80,7 +80,7 @@ mlr --csv --from data/short.csv put ' print "end"; } ' -GENMD_EOF +GENMD-EOF The `print` statements for `begin` and `end` went out before the first record was seen and after the last was seen; the field-creation statement `$nr = NR` @@ -100,21 +100,21 @@ The essential usages of `mlr filter` and `mlr put` are for record-selection and record-updating expressions, respectively. For example, given the following input data: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/small -GENMD_EOF +GENMD-EOF you might retain only the records whose `a` field has value `eks`: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr filter '$a == "eks"' data/small -GENMD_EOF +GENMD-EOF or you might add a new field which is a function of existing fields: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put '$ab = $a . "_" . $b ' data/small -GENMD_EOF +GENMD-EOF ## Differences between put and filter @@ -128,7 +128,7 @@ The two verbs `mlr filter` and `mlr put` are essentially the same. The only diff You can define and invoke functions and subroutines to help produce the bare-boolean statement, and record fields may be assigned in the statements before or after the bare-boolean statement. 
For example: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p --from example.csv filter ' # Bare-boolean filter expression: only records matching this pass through: $quantity >= 70; @@ -139,9 +139,9 @@ mlr --c2p --from example.csv filter ' $description = "low rate"; } ' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p --from example.csv filter ' # Bare-boolean filter expression: only records matching this pass through: $shape =~ "^(...)(...)$"; @@ -150,7 +150,7 @@ mlr --c2p --from example.csv filter ' $left = "\1"; $right = "\2"; ' -GENMD_EOF +GENMD-EOF There are more details and more choices, of course, as detailed in the following sections. diff --git a/docs/src/reference-main-arithmetic.md.in b/docs/src/reference-main-arithmetic.md.in index c50563e57..6e481f736 100644 --- a/docs/src/reference-main-arithmetic.md.in +++ b/docs/src/reference-main-arithmetic.md.in @@ -22,7 +22,7 @@ The short of it is that Miller does this transparently for you so you needn't th Implementation details of this, for the interested: integer adds and subtracts overflow by at most one bit so it suffices to check sign-changes. Thus, Miller allows you to add and subtract arbitrary 64-bit signed integers, converting only to float precisely when the result is less than -2\*\*63 or greater than 2\*\*63 - 1. Multiplies, on the other hand, can overflow by a word size and a sign-change technique does not suffice to detect overflow. Instead, Miller tests whether the floating-point product exceeds the representable integer range. Now, 64-bit integers have 64-bit precision while IEEE-doubles have only 52-bit mantissas -- so, there are 53 bits including implicit leading one. 
The following experiment explicitly demonstrates the resolution at this range: -GENMD_CARDIFY +GENMD-CARDIFY 64-bit integer 64-bit integer Casted to double Back to 64-bit in hex in decimal integer 0x7ffffffffffff9ff 9223372036854774271 9223372036854773760.000000 0x7ffffffffffff800 @@ -33,7 +33,7 @@ in hex in decimal integer 0x7ffffffffffffe00 9223372036854775296 9223372036854775808.000000 0x8000000000000000 0x7ffffffffffffffe 9223372036854775806 9223372036854775808.000000 0x8000000000000000 0x7fffffffffffffff 9223372036854775807 9223372036854775808.000000 0x8000000000000000 -GENMD_EOF +GENMD-EOF That is, one cannot check an integer product to see if it is precisely greater than 2\*\*63 - 1 or less than -2\*\*63 using either integer arithmetic (it may have already overflowed) or using double-precision (due to granularity). Instead, Miller checks for overflow in 64-bit integer multiplication by seeing whether the absolute value of the double-precision product exceeds the largest representable IEEE double less than 2\*\*63, which we see from the listing above is 9223372036854774784. (An alternative would be to do all integer multiplies using handcrafted multi-word 128-bit arithmetic. This approach is not taken.) diff --git a/docs/src/reference-main-arrays.md.in b/docs/src/reference-main-arrays.md.in index 031e84688..e25dff3d1 100644 --- a/docs/src/reference-main-arrays.md.in +++ b/docs/src/reference-main-arrays.md.in @@ -10,18 +10,18 @@ of the major advantages of Miller 6. Array literals are written in square brackets braces with integer indices. Array slots can be any [Miller data type](reference-main-data-types.md) (including other arrays, or maps). 
-GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { x = [ "a", 1, "b", {"x": 2, "y": [3,4,5]}, 99, true]; print x; } ' -GENMD_EOF +GENMD-EOF As with maps and argument-lists, trailing commas are supported: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { x = [ @@ -32,7 +32,7 @@ mlr -n put ' print x; } ' -GENMD_EOF +GENMD-EOF Also note that several [built-in functions](reference-dsl-builtin-functions.md) operate on arrays and/or return arrays. @@ -60,7 +60,7 @@ are already 1-up in Miller, and always have been, mostly inherited from AWK: Imitating Python and other languages, you can use negative indices to read backward from the end of the array, while positive indices read forward from the start. If an array has length `n` then `-n..-1` are aliases for `1..n`, respectively; 0 is never a valid array index in Miller. -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { x = [10, 20, 30, 40, 50]; @@ -70,7 +70,7 @@ mlr -n put ' print x[-2:-1]; } ' -GENMD_EOF +GENMD-EOF ## Slicing @@ -79,7 +79,7 @@ in a slice can be negatively aliased as described above. Unlike in Python, Miller array-slice indices are inclusive on both sides: `x[3:5]` means `[x[3], x[4], x[5]]`. 
-GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { x = [10, 20, 30, 40, 50]; @@ -90,7 +90,7 @@ mlr -n put ' print x[2:-2]; } ' -GENMD_EOF +GENMD-EOF ## Out-of-bounds indexing @@ -98,7 +98,7 @@ Somewhat imitating Python, out-of-bounds index accesses are [absent](reference-main-null-data.md), but out-of-bounds slice accesses result in trimming the indices, resulting in a short array or even the empty array: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { x = [10, 20, 30, 40, 50]; @@ -107,9 +107,9 @@ mlr -n put ' print x[6]; # absent } ' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { x = [10, 20, 30, 40, 50]; @@ -118,7 +118,7 @@ mlr -n put ' print x[10:20]; } ' -GENMD_EOF +GENMD-EOF ## Auto-create results in maps @@ -126,7 +126,7 @@ As noted on the [maps page](reference-main-maps.md), indexing any as-yet-assigned local variable or out-of-stream variable results in **auto-create** of that variable as a map variable: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv --from example.csv put -q ' # You can do this but you do not need to: # begin { @last_rates = {} } @@ -135,12 +135,12 @@ mlr --csv --from example.csv put -q ' dump @last_rates; } ' -GENMD_EOF +GENMD-EOF *This also means that auto-create results in maps, not arrays, even if keys are integers.* If you want to auto-extend an [array](reference-main-arrays.md), initialize it explicitly to `[]`. -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv --from example.csv head -n 4 then put -q ' begin { @my_array = []; @@ -151,7 +151,7 @@ mlr --csv --from example.csv head -n 4 then put -q ' dump } ' -GENMD_EOF +GENMD-EOF ## Auto-extend and null-gaps @@ -170,7 +170,7 @@ However, if an array is written to more than one past its end, [values of type JSON-null](reference-main-data-types.md) are used to fill in the gaps. These are called **null-gaps**. 
-GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { no_gaps = []; @@ -185,13 +185,13 @@ mlr -n put ' print gaps; } ' -GENMD_EOF +GENMD-EOF ## Unset as shift Unsetting an array index results in shifting all higher-index elements down by one: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { x = [ "a", "b", "c", "d", "e"]; @@ -200,11 +200,11 @@ mlr -n put ' print x; } ' -GENMD_EOF +GENMD-EOF More generally, you can get shift and pop operations by unsetting indices 1 and -1: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE $ mlr repl -q [mlr] x=[1,2,3,4,5] [mlr] unset x[-1] @@ -222,7 +222,7 @@ $ mlr repl -q [mlr] x [3, 4, 5] [mlr] -GENMD_EOF +GENMD-EOF ## Looping diff --git a/docs/src/reference-main-auxiliary-commands.md.in b/docs/src/reference-main-auxiliary-commands.md.in index 59d1513ea..a2b40c527 100644 --- a/docs/src/reference-main-auxiliary-commands.md.in +++ b/docs/src/reference-main-auxiliary-commands.md.in @@ -2,46 +2,46 @@ There are a few nearly-standalone programs which have a little to do with the rest of Miller, do not participate in record streams, and do not deal with file formats. They might as well be little standalone executables, but instead they're delivered within the main Miller executable for convenience. -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr aux-list -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr lecat --help -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr termcvt --help -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr hex --help -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr unhex --help -GENMD_EOF +GENMD-EOF Examples: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo 'Hello, world!' | mlr lecat --mono -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo 'Hello, world!' 
| mlr termcvt --lf2crlf | mlr lecat --mono -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr hex data/budget.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr hex -r data/budget.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr hex -r data/budget.csv | sed 's/20/2a/g' | mlr unhex -GENMD_EOF +GENMD-EOF Additionally, [`mlr help`](online-help.md), [`mlr repl`](repl.md), and [`mlr regtest`](https://github.com/johnkerl/miller/blob/main/go/regtest/README.md) are implemented here. diff --git a/docs/src/reference-main-compressed-data.md.in b/docs/src/reference-main-compressed-data.md.in index 0ffb98f75..afd7eae7d 100644 --- a/docs/src/reference-main-compressed-data.md.in +++ b/docs/src/reference-main-compressed-data.md.in @@ -8,14 +8,14 @@ more general `--prepipe` option to support other decompression programs. If your files end in `.gz`, `.bz2`, or `.z` then Miller will autodetect by file extension: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE file gz-example.csv.gz gz-example.csv.gz: gzip compressed data, was "gz-example.csv", last modified: Mon Aug 23 02:04:34 2021, from Unix, original size modulo 2^32 429 -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv sort -f color gz-example.csv.gz -GENMD_EOF +GENMD-EOF This will decompress the input data on the fly, while leaving the disk file unmodified. This helps you save disk space, at the cost of some additional runtime CPU usage to decompress the data. 
@@ -23,9 +23,9 @@ This will decompress the input data on the fly, while leaving the disk file unmo If the filename doesn't in in `.gz`, `.bz2`, or `.z` then you can use the flags `--gzin`, `--bz2in`, or `--zin` to let Miller know: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr --csv --gzin sort -f color myfile.bin # myfile.bin has gzip contents -GENMD_EOF +GENMD-EOF ## External decompressors on input @@ -35,9 +35,9 @@ piping the standard output of that program to Miller's standard input. You can, of course, already do without this for single input files, for example: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND gunzip < gz-example.csv.gz | mlr --csv sort -f color -GENMD_EOF +GENMD-EOF The benefit of `--prepipe` is that Miller will run the specified program once per file, respecting file boundaries. @@ -81,27 +81,27 @@ For compressed output: file, which is annoying for version control. That can be suppressed by using 'gzip -n' but then that's confusing for the reader, who has no need for -n. We handle this by making this code sample non-live. 
---> -GENMD_CARDIFY_HIGHLIGHT_FOUR +GENMD-CARDIFY-HIGHLIGHT-FOUR mlr --from example.csv --csv put -q ' filename = $color.".csv.gz"; tee | "gzip > ".filename, $* ' -GENMD_EOF +GENMD-EOF -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE file red.csv.gz purple.csv.gz yellow.csv.gz red.csv.gz: gzip compressed data, last modified: Mon Aug 23 02:34:05 2021, from Unix, original size modulo 2^32 185 purple.csv.gz: gzip compressed data, last modified: Mon Aug 23 02:34:05 2021, from Unix, original size modulo 2^32 164 yellow.csv.gz: gzip compressed data, last modified: Mon Aug 23 02:34:05 2021, from Unix, original size modulo 2^32 158 -GENMD_EOF +GENMD-EOF -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr --csv cat yellow.csv.gz color,shape,flag,k,index,quantity,rate yellow,triangle,true,1,11,43.6498,9.8870 yellow,circle,true,8,73,63.9785,4.2370 yellow,circle,true,9,87,63.5058,8.3350 -GENMD_EOF +GENMD-EOF * Using the [in-place flag](reference-main-in-place-processing.md) `-I`, the overwritten file will be compressed when possible. See the [page on in-place mode](reference-main-in-place-processing.md) for details. diff --git a/docs/src/reference-main-data-types.md.in b/docs/src/reference-main-data-types.md.in index 0cbfc949e..d3e7e4268 100644 --- a/docs/src/reference-main-data-types.md.in +++ b/docs/src/reference-main-data-types.md.in @@ -49,11 +49,11 @@ dot operator has been generalized to stringify non-strings Examples: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv cat data/type-infer.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --oxtab --from data/type-infer.csv put ' $d = $a . $c; $e = 7; @@ -67,7 +67,7 @@ mlr --icsv --oxtab --from data/type-infer.csv put ' $tf = typeof($f); $tg = typeof($g); ' then reorder -f a,ta,b,tb,c,tc,d,td,e,te,f,tf,g,tg -GENMD_EOF +GENMD-EOF On input, string values representable as boolean (e.g. `"true"`, `"false"`) are *not* automatically treated as boolean. 
This is because `"true"` and @@ -90,21 +90,21 @@ they will not be auto-converted, but you can use the or the [`json_parse` DSL function](reference-dsl-builtin-functions.md#json_parse): -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv --from data/json-in-csv.csv cat -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --ojson --from data/json-in-csv.csv cat -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --ojson --from data/json-in-csv.csv json-parse -f blob -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --ojson --from data/json-in-csv.csv put '$blob = json_parse($blob)' -GENMD_EOF +GENMD-EOF These have their respective operations to convert back to string: the [`json-stringify` verb](reference-verbs.md#json-stringify) diff --git a/docs/src/reference-main-flag-list.md.in b/docs/src/reference-main-flag-list.md.in index 66d4f7d37..80bc0c1ff 100644 --- a/docs/src/reference-main-flag-list.md.in +++ b/docs/src/reference-main-flag-list.md.in @@ -2,13 +2,13 @@ Here are flags you can use when invoking Miller. For example, when you type -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --ojson head -n 1 example.csv -GENMD_EOF +GENMD-EOF the `--icsv` and `--ojson` bits are _flags_. See the [Miller command structure](reference-main-overview.md) page for context. Also, at the command line, you can use `mlr -g` for a list much like this one. -GENMD_RUN_CONTENT_GENERATOR(./mk-flag-info.rb) +GENMD-RUN-CONTENT-GENERATOR(./mk-flag-info.rb) diff --git a/docs/src/reference-main-maps.md.in b/docs/src/reference-main-maps.md.in index b875dd407..3ec63722a 100644 --- a/docs/src/reference-main-maps.md.in +++ b/docs/src/reference-main-maps.md.in @@ -11,7 +11,7 @@ there are a few differences as noted below. _Map literals_ are written in curly braces with string keys any [Miller data type](reference-main-data-types.md) (including other maps, or arrays) as values. 
Also, integers may be given as keys although they'll be stored as strings. -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { x = {"a": 1, "b": {"x": 2, "y": [3,4,5]}, 99: true}; @@ -20,11 +20,11 @@ mlr -n put ' print x["99"]; } ' -GENMD_EOF +GENMD-EOF As with arrays and argument-lists, trailing commas are supported: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { x = { @@ -35,20 +35,20 @@ mlr -n put ' print x; } ' -GENMD_EOF +GENMD-EOF The current record, accessible using `$*`, is a map. -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv --from example.csv head -n 2 then put -q ' dump $*; print "Color is", $*["color"]; ' -GENMD_EOF +GENMD-EOF The collection of all [out-of-stream variables](reference-dsl-variables.md#out-of-stream0variables), `@*`, is a map. -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv --from example.csv put -q ' begin { @last_rates = {}; @@ -59,7 +59,7 @@ mlr --csv --from example.csv put -q ' dump @*; } ' -GENMD_EOF +GENMD-EOF Also note that several [built-in functions](reference-dsl-builtin-functions.md) operate on maps and/or return maps. @@ -82,7 +82,7 @@ let people do `@records[NR] = $*`. Indexing any as-yet-assigned local variable or out-of-stream variable results in **auto-create** of that variable as a map variable: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv --from example.csv put -q ' # You can do this but you do not need to: # begin { @last_rates = {} } @@ -91,12 +91,12 @@ mlr --csv --from example.csv put -q ' dump @last_rates; } ' -GENMD_EOF +GENMD-EOF *This also means that auto-create results in maps, not arrays, even if keys are integers.* If you want to auto-extend an [array](reference-main-arrays.md), initialize it explicitly to `[]`. 
-GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv --from example.csv head -n 4 then put -q ' begin { @my_array = []; @@ -107,7 +107,7 @@ mlr --csv --from example.csv head -n 4 then put -q ' dump } ' -GENMD_EOF +GENMD-EOF ## Auto-deepen @@ -116,14 +116,14 @@ without first setting `@m["a"]={}` and `@m["a"]["b"]={}`. The reason for this is for doing data aggregations: for example if you want compute keyed sums, you can do that with a minimum of keystrokes. -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint --from example.csv put -q ' @quantity_sum[$color][$shape] += $rate; end { emit @quantity_sum, "color", "shape"; } ' -GENMD_EOF +GENMD-EOF ## Looping diff --git a/docs/src/reference-main-null-data.md.in b/docs/src/reference-main-null-data.md.in index 2091ed7a8..837780925 100644 --- a/docs/src/reference-main-null-data.md.in +++ b/docs/src/reference-main-null-data.md.in @@ -14,55 +14,55 @@ Miller has three kinds of null data: You can test these programatically using the functions `is_empty`/`is_not_empty`, `is_absent`/`is_present`, and `is_null`/`is_not_null`. For the last pair, note that null means either empty or absent. Here is a full list of such functions: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -f | grep is_ -GENMD_EOF +GENMD-EOF ## Rules for null-handling * Records with one or more empty sort-field values sort after records with all sort-field values present: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr cat data/sort-null.dat -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr sort -n a data/sort-null.dat -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr sort -nr a data/sort-null.dat -GENMD_EOF +GENMD-EOF * Functions/operators which have one or more *empty* arguments produce empty output: e.g. 
-GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo 'x=2,y=3' | mlr put '$a=$x+$y' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo 'x=,y=3' | mlr put '$a=$x+$y' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo 'x=,y=3' | mlr put '$a=log($x);$b=log($y)' -GENMD_EOF +GENMD-EOF with the exception that the `min` and `max` functions are special: if one argument is non-null, it wins: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo 'x=,y=3' | mlr put '$a=min($x,$y);$b=max($x,$y)' -GENMD_EOF +GENMD-EOF * Functions of *absent* variables (e.g. `mlr put '$y = log10($nonesuch)'`) evaluate to absent, and arithmetic/bitwise/boolean operators with both operands being absent evaluate to absent. Arithmetic operators with one absent operand return the other operand. More specifically, absent values act like zero for addition/subtraction, and one for multiplication: Furthermore, **any expression which evaluates to absent is not stored in the left-hand side of an assignment statement**: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo 'x=2,y=3' | mlr put '$a=$u+$v; $b=$u+$y; $c=$x+$y' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo 'x=2,y=3' | mlr put '$a=min($x,$v);$b=max($u,$y);$c=min($u,$v)' -GENMD_EOF +GENMD-EOF * Likewise, for assignment to maps, **absent-valued keys or values result in a skipped assignment**. @@ -80,22 +80,22 @@ The reasoning is as follows: Since absent plus absent is absent (and likewise for other operators), accumulations such as `@sum += $x` work correctly on heterogenous data, as do within-record formulas if both operands are absent. If one operand is present, you may get behavior you don't desire. 
To work around this -- namely, to set an output field only for records which have all the inputs present -- you can use a pattern-action block with `is_present`: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr cat data/het.dkvp -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put 'is_present($loadsec) { $loadmillis = $loadsec * 1000 }' data/het.dkvp -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put '$loadmillis = (is_present($loadsec) ? $loadsec : 0.0) * 1000' data/het.dkvp -GENMD_EOF +GENMD-EOF ## Arithmetic rules If you're interested in a formal description of how empty and absent fields participate in arithmetic, here's a table for plus (other arithmetic/boolean/bitwise operators are similar): -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr help type-arithmetic-info -GENMD_EOF +GENMD-EOF diff --git a/docs/src/reference-main-number-formatting.md.in b/docs/src/reference-main-number-formatting.md.in index 89b4ac78f..2e6605dc8 100644 --- a/docs/src/reference-main-number-formatting.md.in +++ b/docs/src/reference-main-number-formatting.md.in @@ -4,19 +4,19 @@ The command-line option `--ofmt {format string}` is the global number format for all numeric fields. Examples: -GENMD_CARDIFY +GENMD-CARDIFY --ofmt %.9e --ofmt %.6f --ofmt %.0f -GENMD_EOF +GENMD-EOF These are just familiar `printf` formats. (TODO: write about type-checking once that's implemented.) Additionally, if you use leading width (e.g. `%18.12f`) then the output will contain embedded whitespace, which may not be what you want if you pipe the output to something else, particularly CSV. I use Miller's pretty-print format (`mlr --opprint`) to column-align numerical data. 
-GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo 'x=3.1,y=4.3' | mlr --ofmt '%8.3f' cat -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo 'x=3.1,y=4.3' | mlr --ofmt '%11.8e' cat -GENMD_EOF +GENMD-EOF ## The format-values verb @@ -29,20 +29,20 @@ To apply formatting to a single field, you can also use [`fmtnum`](reference-dsl-builtin-functions.md#fmtnum) function within `mlr put`. For example: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo 'x=3.1,y=4.3' | mlr put '$z=fmtnum($x*$y,"%08f")' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo 'x=0xffff,y=0xff' | mlr put '$z=fmtnum(int($x*$y),"%08x")' -GENMD_EOF +GENMD-EOF Input conversion from hexadecimal is done automatically on fields handled by `mlr put` and `mlr filter` as long as the field value begins with `0x`. To apply output conversion to hexadecimal on a single column, you may use `fmtnum`, or the keystroke-saving [`hexfmt`](reference-dsl-builtin-functions.md#hexfmt) function. Example: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo 'x=0xffff,y=0xff' | mlr put '$z=$x*$y' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo 'x=0xffff,y=0xff' | mlr put '$z=hexfmt($x*$y)' -GENMD_EOF +GENMD-EOF diff --git a/docs/src/reference-main-overview.md.in b/docs/src/reference-main-overview.md.in index b3311cd37..3f1db0ee6 100644 --- a/docs/src/reference-main-overview.md.in +++ b/docs/src/reference-main-overview.md.in @@ -11,19 +11,19 @@ The outline of an invocation of Miller is: For example, reading from a file: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint head -n 2 then sort -f shape example.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --from example.csv --icsv --opprint head -n 2 then sort -f shape -GENMD_EOF +GENMD-EOF Reading from standard input: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat example.csv | mlr --icsv --opprint head -n 2 then sort -f shape -GENMD_EOF +GENMD-EOF The rest of this reference section gives you full information on each of these parts of 
the command line. @@ -41,9 +41,9 @@ Here's a comparison of verbs and `put`/`filter` DSL expressions: Example of using a verb for data processing: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr stats1 -a sum -f x -g a data/small -GENMD_EOF +GENMD-EOF * Verbs are coded in Go * They run a bit faster @@ -53,9 +53,9 @@ GENMD_EOF Example of doing the same thing using a DSL expression: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put -q '@x_sum[$a] += $x; end{emit @x_sum, "a"}' data/small -GENMD_EOF +GENMD-EOF * You get to write your own expressions in Miller's programming language * They run a bit slower diff --git a/docs/src/reference-main-regular-expressions.md.in b/docs/src/reference-main-regular-expressions.md.in index b5561ccd6..3997c328d 100644 --- a/docs/src/reference-main-regular-expressions.md.in +++ b/docs/src/reference-main-regular-expressions.md.in @@ -26,13 +26,13 @@ Points demonstrated by the above examples: Example: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/regex-in-data.dat -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr filter '$name =~ $regex' data/regex-in-data.dat -GENMD_EOF +GENMD-EOF ## Regex captures @@ -40,21 +40,21 @@ Regex captures of the form `\0` through `\9` are supported as * Captures have in-function context for `sub` and `gsub`. For example, the first `\1,\2` pair belong to the first `sub` and the second `\1,\2` pair belong to the second `sub`: -GENMD_SHOW_COMMAND +GENMD-SHOW-COMMAND mlr put '$b = sub($a, "(..)_(...)", "\2-\1"); $c = sub($a, "(..)_(.)(..)", ":\1:\2:\3")' -GENMD_EOF +GENMD-EOF * Captures endure for the entirety of a `put` for the `=~` and `!=~` operators. For example, here the `\1,\2` are set by the `=~` operator and are used by both subsequent assignment statements: -GENMD_SHOW_COMMAND +GENMD-SHOW-COMMAND mlr put '$a =~ "(..)_(....); $b = "left_\1"; $c = "right_\2"' -GENMD_EOF +GENMD-EOF * The captures are not retained across multiple puts. 
For example, here the `\1,\2` won't be expanded from the regex capture: -GENMD_SHOW_COMMAND +GENMD-SHOW-COMMAND mlr put '$a =~ "(..)_(....)' then {... something else ...} then put '$b = "left_\1"; $c = "right_\2"' -GENMD_EOF +GENMD-EOF * Up to nine matches are supported: `\1` through `\9`, while `\0` is the entire match string; `\15` is treated as `\1` followed by an unrelated `5`. diff --git a/docs/src/reference-main-separators.md.in b/docs/src/reference-main-separators.md.in index 9212ae8e4..ce6ec50ff 100644 --- a/docs/src/reference-main-separators.md.in +++ b/docs/src/reference-main-separators.md.in @@ -6,9 +6,9 @@ Miller has record separators, field separators, and pair separators. For example, given the following [DKVP](file-formats.md#dkvp-key-value-pairs) records: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/a.dkvp -GENMD_EOF +GENMD-EOF * the **record separator** is newline -- it separates records from one another; * the **field separator** is `,` -- it separates fields (key-value pairs) from one another; @@ -40,33 +40,33 @@ separators, `IFS` and `OFS` for the input and output field separators, and For example: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/a.dkvp -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --ifs , --ofs ';' --ips = --ops : cut -o -f c,a,b data/a.dkvp -GENMD_EOF +GENMD-EOF If your data has non-default separators and you don't want to change those between input and output, you can use `--rs`, `--fs`, and `--ps`. Setting `--fs :` is the same as setting `--ifs : --ofs :`, but with fewer keystrokes. 
-GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/modsep.dkvp -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --fs ';' --ps : cut -o -f c,a,b data/modsep.dkvp -GENMD_EOF +GENMD-EOF ## Multi-character and regular-expression separators The separators default to single characters, but can be multiple characters if you like: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --ifs ';' --ips : --ofs ';;;' --ops := cut -o -f c,a,b data/modsep.dkvp -GENMD_EOF +GENMD-EOF As of September 2021: @@ -89,22 +89,22 @@ is internally implemented in terms of `--repifs`. For example: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/extra-spaces.txt -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --ifs ' ' --repifs --inidx --oxtab cat data/extra-spaces.txt -GENMD_EOF +GENMD-EOF ## Aliases Many things we'd like to write as separators need to be escaped from the shell -- e.g. `--ifs ';'` or `--ofs '|'`, and so on. You can use the following if you like: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr help list-separator-aliases -GENMD_EOF +GENMD-EOF Note that `spaces`, `tabs`, and `whitespace` already are regexes so you shouldn't use `--repifs` with them. @@ -113,11 +113,11 @@ shouldn't use `--repifs` with them. Given the above, we now have seen the following flags: -GENMD_CARDIFY +GENMD-CARDIFY --rs --irs --ors --fs --ifs --ofs --repifs --ps --ips --ops -GENMD_EOF +GENMD-EOF See also the [separator-flags section](reference-main-flag-list.md#separator-flags). @@ -127,9 +127,9 @@ Miller exposes for you read-only [built-in variables](reference-dsl-variables.md names `IRS`, `ORS`, `IFS`, `OFS`, `IPS`, and `OPS`. Unlike in AWK, you can't set these in begin-blocks -- their values indicate what you specified at the command line -- so their use is limited. -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --ifs , --ofs ';' --ips = --ops : --from data/a.dkvp put '$d = ">>>" . IFS . "|||" . OFS . 
"<<<"' -GENMD_EOF +GENMD-EOF ## Which separators apply to which file formats diff --git a/docs/src/reference-main-strings.md.in b/docs/src/reference-main-strings.md.in index ef4f95701..37e6914a3 100644 --- a/docs/src/reference-main-strings.md.in +++ b/docs/src/reference-main-strings.md.in @@ -10,9 +10,9 @@ the single-quotes are consumed by the shell and Miller gets `$b=$a.".suffix"`. ( A basic string operation is the `.` (concatenation) operator: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p --from example.csv put '$output = $color . ":" . $shape' -GENMD_EOF +GENMD-EOF Also see the [list of string-related built-in functions](reference-dsl-builtin-functions.md#string-functions). @@ -47,7 +47,7 @@ backward from the end of the string, while positive indices read forward from the start. If a string has length `n` then `-n..-1` are aliases for `1..n`, respectively; 0 is never a valid string index in Miller. -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { x = "abcde"; @@ -57,7 +57,7 @@ mlr -n put ' print x[-2:-1]; } ' -GENMD_EOF +GENMD-EOF ## Slicing @@ -65,7 +65,7 @@ Miller supports slicing using `[lo:hi]` syntax. Either or both of the indices in a slice can be negatively aliased as described above. Unlike in Python, Miller string-slice indices are inclusive on both sides: `x[3:5]` means `x[3] . x[4] . x[5]`. 
-GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { x = "abcde"; @@ -76,7 +76,7 @@ mlr -n put ' print x[2:-2]; } ' -GENMD_EOF +GENMD-EOF ## Out-of-bounds indexing @@ -84,7 +84,7 @@ Somewhat imitating Python, out-of-bounds index accesses are [errors](reference-main-data-types.md), but out-of-bounds slice accesses result in trimming the indices, resulting in a short string or even the empty string: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { x = "abcde"; @@ -93,9 +93,9 @@ mlr -n put ' print x[6]; # absent } ' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { x = "abcde"; @@ -104,7 +104,7 @@ mlr -n put ' print x[10:20]; } ' -GENMD_EOF +GENMD-EOF ## Escape sequences for string literals diff --git a/docs/src/reference-main-then-chaining.md.in b/docs/src/reference-main-then-chaining.md.in index 9b8bf92b2..293124aa2 100644 --- a/docs/src/reference-main-then-chaining.md.in +++ b/docs/src/reference-main-then-chaining.md.in @@ -2,21 +2,21 @@ In accord with the [Unix philosophy](http://en.wikipedia.org/wiki/Unix_philosophy), you can pipe data into or out of Miller. For example: -GENMD_SHOW_COMMAND +GENMD-SHOW-COMMAND mlr cut --complement -f os_version *.dat | mlr sort -f hostname,uptime -GENMD_EOF +GENMD-EOF You can, if you like, instead simply chain commands together using the `then` keyword: -GENMD_SHOW_COMMAND +GENMD-SHOW-COMMAND mlr cut --complement -f os_version then sort -f hostname,uptime *.dat -GENMD_EOF +GENMD-EOF (You can precede the very first verb with `then`, if you like, for symmetry.) Here's a performance comparison: -GENMD_INCLUDE_ESCAPED(data/then-chaining-performance.txt) +GENMD-INCLUDE-ESCAPED(data/then-chaining-performance.txt) There are two reasons to use then-chaining: one is for performance, although I don't expect this to be a win in all cases. 
Using then-chaining avoids redundant string-parsing and string-formatting at each pipeline step: instead input records are parsed once, they are fed through each pipeline stage in memory, and then output records are formatted once. diff --git a/docs/src/reference-verbs.md b/docs/src/reference-verbs.md index dad722bf0..d20d24ee8 100644 --- a/docs/src/reference-verbs.md +++ b/docs/src/reference-verbs.md @@ -996,7 +996,7 @@ More example filter expressions: Using 'any' higher-order function to see if $index is 10, 20, or 30: 'any([10,20,30], func(e) {return $index == e})' -See also https://johnkerl.org/miller6/reference-dsl for more context. +See also https://miller.readthedocs.io/reference-dsl for more context. ### Features which filter shares with put @@ -2226,7 +2226,7 @@ More example put expressions: end{emitf @min, @max} ' -See also https://johnkerl.org/miller6/reference-dsl for more context. +See also https://miller.readthedocs.io/reference-dsl for more context. ### Features which put shares with filter diff --git a/docs/src/reference-verbs.md.in b/docs/src/reference-verbs.md.in index 55c103d50..187164c43 100644 --- a/docs/src/reference-verbs.md.in +++ b/docs/src/reference-verbs.md.in @@ -3,9 +3,9 @@ Verbs are the building blocks of how you can use Miller to process your data. When you type -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint sort -n quanity then head -n 4 example.csv -GENMD_EOF +GENMD-EOF the `sort` and `head` bits are _verbs_. See the [Miller command structure](reference-main-overview.md) page for context. 
@@ -19,7 +19,7 @@ Whereas the Unix toolkit is made of the separate executables `cat`, `tail`, `cut `sort`, etc., Miller has subcommands, or **verbs**, such as `mlr cat`, `mlr tail`, `mlr cut`, and `mlr sort`, invoked as follows: -GENMD_INCLUDE_ESCAPED(data/subcommand-example.txt) +GENMD-INCLUDE-ESCAPED(data/subcommand-example.txt) These fall into categories as follows: @@ -37,54 +37,54 @@ These fall into categories as follows: Map list of values to alternating key/value pairs. -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr altkv -h -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo 'a,b,c,d,e,f' | mlr altkv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo 'a,b,c,d,e,f,g' | mlr altkv -GENMD_EOF +GENMD-EOF ## bar Cheesy bar-charting. -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr bar -h -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint cat data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint bar --lo 0 --hi 1 -f x,y data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint bar --lo 0.4 --hi 0.6 -f x,y data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint bar --auto -f x,y -w 20 data/small -GENMD_EOF +GENMD-EOF ## bootstrap -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr bootstrap --help -GENMD_EOF +GENMD-EOF The canonical use for bootstrap sampling is to put error bars on statistical quantities, such as mean. 
For example: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr --c2p stats1 -a mean,count -f u -g color data/colored-shapes.csv color u_mean u_count yellow 0.4971291160651098 1413 @@ -93,9 +93,9 @@ purple 0.49400496322241666 1142 green 0.5048610595130744 1109 blue 0.5177171537414964 1470 orange 0.49053241584158375 303 -GENMD_EOF +GENMD-EOF -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr --c2p bootstrap then stats1 -a mean,count -f u -g color data/colored-shapes.csv color u_mean u_count red 0.49183858109559747 4655 @@ -104,9 +104,9 @@ green 0.5018994641860465 1075 orange 0.5005396620689654 290 blue 0.5309761257817928 1439 purple 0.4917481873438798 1201 -GENMD_EOF +GENMD-EOF -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE color u_mean u_count yellow 0.4809714157857651 1419 blue 0.5057790647530039 1498 @@ -114,9 +114,9 @@ red 0.49114305508382283 4593 purple 0.49652395202020194 1188 green 0.5011425433212993 1108 orange 0.48935696323529426 272 -GENMD_EOF +GENMD-EOF -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr --c2p bootstrap then stats1 -a mean,count -f u -g color data/colored-shapes.csv color u_mean u_count red 0.49934473217726466 4671 @@ -125,71 +125,71 @@ blue 0.5097866573146287 1497 yellow 0.4987188126740959 1436 orange 0.4802164827586204 290 green 0.5129018241860459 1075 -GENMD_EOF +GENMD-EOF ## cat Most useful for format conversions (see [File Formats](file-formats.md), and concatenating multiple same-schema CSV files to have the same header: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr cat -h -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/a.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/b.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv cat data/a.csv data/b.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --oxtab cat data/a.csv data/b.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv cat -n 
data/a.csv data/b.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint cat data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint cat -n -g a data/small -GENMD_EOF +GENMD-EOF ## check -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr check --help -GENMD_EOF +GENMD-EOF ## clean-whitespace -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr clean-whitespace --help -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --ojson cat data/clean-whitespace.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --ojson clean-whitespace -k data/clean-whitespace.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --ojson clean-whitespace -v data/clean-whitespace.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --ojson clean-whitespace data/clean-whitespace.csv -GENMD_EOF +GENMD-EOF Function links: @@ -201,143 +201,143 @@ Function links: ## count -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr count --help -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr count data/medium -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr count -g a data/medium -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr count -n -g a data/medium -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr count -g b data/medium -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr count -n -g b data/medium -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr count -g a,b data/medium -GENMD_EOF +GENMD-EOF ## count-distinct -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr count-distinct --help -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr count-distinct -f a,b then sort -nr count data/medium -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr count-distinct -u -f a,b data/medium -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr count-distinct -f a,b -o someothername then sort -nr 
someothername data/medium -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr count-distinct -n -f a,b data/medium -GENMD_EOF +GENMD-EOF ## count-similar -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr count-similar --help -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint head -n 20 data/medium -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint head -n 20 then count-similar -g a data/medium -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint head -n 20 then count-similar -g a then sort -f a data/medium -GENMD_EOF +GENMD-EOF ## cut -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr cut --help -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint cat data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint cut -f y,x,i data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo 'a=1,b=2,c=3' | mlr cut -f b,c,a -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo 'a=1,b=2,c=3' | mlr cut -o -f b,c,a -GENMD_EOF +GENMD-EOF ## decimate -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr decimate --help -GENMD_EOF +GENMD-EOF ## fill-down -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr fill-down --help -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/fillable.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv fill-down -f b data/fillable.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv fill-down -a -f b data/fillable.csv -GENMD_EOF +GENMD-EOF ## fill-empty -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr fill-empty --help -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/fillable.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv fill-empty data/fillable.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv fill-empty -v something data/fillable.csv -GENMD_EOF +GENMD-EOF ## filter -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr filter 
--help -GENMD_EOF +GENMD-EOF ### Features which filter shares with put @@ -345,363 +345,363 @@ Please see [DSL reference](reference-dsl.md) for more information about the expr ## flatten -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr flatten --help -GENMD_EOF +GENMD-EOF ## format-values -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr format-values --help -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint format-values data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint format-values -n data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint format-values -i %08llx -f %.6le -s X%sX data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint format-values -i %08llx -f %.6le -s X%sX -n data/small -GENMD_EOF +GENMD-EOF ## fraction -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr fraction --help -GENMD_EOF +GENMD-EOF For example, suppose you have the following CSV file: -GENMD_INCLUDE_ESCAPED(data/fraction-example.csv) +GENMD-INCLUDE-ESCAPED(data/fraction-example.csv) Then we can see what each record's `n` contributes to the total `n`: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint fraction -f n data/fraction-example.csv -GENMD_EOF +GENMD-EOF Using `-g` we can split those out by gender, or by color: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint fraction -f n -g u data/fraction-example.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint fraction -f n -g v data/fraction-example.csv -GENMD_EOF +GENMD-EOF We can see, for example, that 70.9% of females have red (on the left) while 94.5% of reds are for females. To convert fractions to percents, you may use `-p`: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint fraction -f n -p data/fraction-example.csv -GENMD_EOF +GENMD-EOF Another often-used idiom is to convert from a point distribution to a cumulative distribution, also known as "running sums". 
Here, you can use `-c`: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint fraction -f n -p -c data/fraction-example.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint fraction -f n -g u -p -c data/fraction-example.csv -GENMD_EOF +GENMD-EOF ## gap -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr gap -h -GENMD_EOF +GENMD-EOF ## grep -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr grep -h -GENMD_EOF +GENMD-EOF ## group-by -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr group-by --help -GENMD_EOF +GENMD-EOF This is similar to `sort` but with less work. Namely, Miller's sort has three steps: read through the data and append linked lists of records, one for each unique combination of the key-field values; after all records are read, sort the key-field values; then print each record-list. The group-by operation simply omits the middle sort. An example should make this more clear: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint sort -f a data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint group-by a data/small -GENMD_EOF +GENMD-EOF In this example, since the sort is on field `a`, the first step is to group together all records having the same value for field `a`; the second step is to sort the distinct `a`-field values `pan`, `eks`, and `wye` into `eks`, `pan`, and `wye`; the third step is to print out the record-list for `a=eks`, then the record-list for `a=pan`, then the record-list for `a=wye`. The group-by operation omits the middle sort and just puts like records together, for those times when a sort isn't desired. In particular, the ordering of group-by fields for group-by is the order in which they were encountered in the data stream, which in some cases may be more interesting to you. ## group-like -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr group-like --help -GENMD_EOF +GENMD-EOF This groups together records having the same schema (i.e. 
same ordered list of field names) which is useful for making sense of time-ordered output as described in [Record Heterogeneity](record-heterogeneity.md) -- in particular, in preparation for CSV or pretty-print output. -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr cat data/het.dkvp -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint group-like data/het.dkvp -GENMD_EOF +GENMD-EOF ## having-fields -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr having-fields --help -GENMD_EOF +GENMD-EOF Similar to [group-like](reference-verbs.md#group-like), this retains records with specified schema. -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr cat data/het.dkvp -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr having-fields --at-least resource data/het.dkvp -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr having-fields --which-are resource,ok,loadsec data/het.dkvp -GENMD_EOF +GENMD-EOF ## head -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr head --help -GENMD_EOF +GENMD-EOF Note that `head` is distinct from [top](reference-verbs.md#top) -- `head` shows fields which appear first in the data stream; `top` shows fields which are numerically largest (or smallest). -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint head -n 4 data/medium -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint head -n 1 -g b data/medium -GENMD_EOF +GENMD-EOF ## histogram -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr histogram --help -GENMD_EOF +GENMD-EOF This is just a histogram; there's not too much to say here. A note about binning, by example: Suppose you use `--lo 0.0 --hi 1.0 --nbins 10 -f x`. The input numbers less than 0 or greater than 1 aren't counted in any bin. Input numbers equal to 1 are counted in the last bin. That is, bin 0 has `0.0 ≤ x < 0.1`, bin 1 has `0.1 ≤ x < 0.2`, etc., but bin 9 has `0.9 ≤ x ≤ 1.0`. 
-GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint put '$x2=$x**2;$x3=$x2*$x' \ then histogram -f x,x2,x3 --lo 0 --hi 1 --nbins 10 \ data/medium -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint put '$x2=$x**2;$x3=$x2*$x' \ then histogram -f x,x2,x3 --lo 0 --hi 1 --nbins 10 -o my_ \ data/medium -GENMD_EOF +GENMD-EOF ## join -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr join --help -GENMD_EOF +GENMD-EOF Examples: Join larger table with IDs with smaller ID-to-name lookup table, showing only paired records: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsvlite --opprint cat data/join-left-example.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsvlite --opprint cat data/join-right-example.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsvlite --opprint \ join -u -j id -r idcode -f data/join-left-example.csv \ data/join-right-example.csv -GENMD_EOF +GENMD-EOF Same, but with sorting the input first: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsvlite --opprint sort -f idcode \ then join -j id -r idcode -f data/join-left-example.csv \ data/join-right-example.csv -GENMD_EOF +GENMD-EOF Same, but showing only unpaired records: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsvlite --opprint \ join --np --ul --ur -u -j id -r idcode -f data/join-left-example.csv \ data/join-right-example.csv -GENMD_EOF +GENMD-EOF Use prefixing options to disambiguate between otherwise identical non-join field names: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csvlite --opprint cat data/self-join.csv data/self-join.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csvlite --opprint join -j a --lp left_ --rp right_ -f data/self-join.csv data/self-join.csv -GENMD_EOF +GENMD-EOF Use zero join columns: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csvlite --opprint join -j "" --lp left_ --rp right_ -f data/self-join.csv data/self-join.csv -GENMD_EOF +GENMD-EOF ## json-parse -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr 
json-parse --help -GENMD_EOF +GENMD-EOF ## json-stringify -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr json-stringify --help -GENMD_EOF +GENMD-EOF ## label -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr label --help -GENMD_EOF +GENMD-EOF See also [rename](reference-verbs.md#rename). Example: Files such as `/etc/passwd`, `/etc/group`, and so on have implicit field names which are found in section-5 manpages. These field names may be made explicit as follows: -GENMD_INCLUDE_ESCAPED(data/label-example.txt) +GENMD-INCLUDE-ESCAPED(data/label-example.txt) Likewise, if you have CSV/CSV-lite input data which has somehow been bereft of its header line, you can re-add a header line using `--implicit-csv-header` and `label`: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/headerless.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv --implicit-csv-header cat data/headerless.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv --implicit-csv-header label name,age,status data/headerless.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --implicit-csv-header --opprint label name,age,status data/headerless.csv -GENMD_EOF +GENMD-EOF ## least-frequent -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr least-frequent -h -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p --from data/colored-shapes.csv least-frequent -f shape -n 5 -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p --from data/colored-shapes.csv least-frequent -f shape,color -n 5 -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p --from data/colored-shapes.csv least-frequent -f shape,color -n 5 -o someothername -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p --from data/colored-shapes.csv least-frequent -f shape,color -n 5 -b -GENMD_EOF +GENMD-EOF See also [most-frequent](reference-verbs.md#most-frequent). 
## merge-fields -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr merge-fields --help -GENMD_EOF +GENMD-EOF This is like `mlr stats1` but all accumulation is done across fields within each given record: horizontal rather than vertical statistics, if you will. Examples: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csvlite --opprint cat data/inout.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csvlite --opprint merge-fields -a min,max,sum -c _in,_out data/inout.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csvlite --opprint merge-fields -k -a sum -c _in,_out data/inout.csv -GENMD_EOF +GENMD-EOF ## most-frequent -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr most-frequent -h -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p --from data/colored-shapes.csv most-frequent -f shape -n 5 -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p --from data/colored-shapes.csv most-frequent -f shape,color -n 5 -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p --from data/colored-shapes.csv most-frequent -f shape,color -n 5 -o someothername -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p --from data/colored-shapes.csv most-frequent -f shape,color -n 5 -b -GENMD_EOF +GENMD-EOF See also [least-frequent](reference-verbs.md#least-frequent). 
## nest -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr nest -h -GENMD_EOF +GENMD-EOF ## nothing -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr nothing -h -GENMD_EOF +GENMD-EOF ## put -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr put --help -GENMD_EOF +GENMD-EOF ### Features which put shares with filter @@ -709,9 +709,9 @@ Please see the [DSL reference](reference-dsl.md) for more information about the ## regularize -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr regularize --help -GENMD_EOF +GENMD-EOF This exists since hash-map software in various languages and tools encountered in the wild does not always print similar rows with fields in the same order: `mlr regularize` helps clean that up. @@ -719,85 +719,85 @@ See also [reorder](reference-verbs.md#reorder). ## remove-empty-columns -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr remove-empty-columns --help -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/remove-empty-columns.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv remove-empty-columns data/remove-empty-columns.csv -GENMD_EOF +GENMD-EOF Since this verb needs to read all records to see if any of them has a non-empty value for a given field name, it is non-streaming: it will ingest all records before writing any. ## rename -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr rename --help -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint cat data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint rename i,INDEX,b,COLUMN2 data/small -GENMD_EOF +GENMD-EOF As discussed in [Performance](performance.md), `sed` is significantly faster than Miller at doing this. However, Miller is format-aware, so it knows to do renames only within specified field keys and not any others, nor in field values which may happen to contain the same pattern. 
Example: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND sed 's/y/COLUMN5/g' data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr rename y,COLUMN5 data/small -GENMD_EOF +GENMD-EOF See also [label](reference-verbs.md#label). ## reorder -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr reorder --help -GENMD_EOF +GENMD-EOF This pivots specified field names to the start or end of the record -- for example when you have highly multi-column data and you want to bring a field or two to the front of line where you can give a quick visual scan. -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint cat data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint reorder -f i,b data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint reorder -e -f i,b data/small -GENMD_EOF +GENMD-EOF ## repeat -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr repeat --help -GENMD_EOF +GENMD-EOF This is useful in at least two ways: one, as a data-generator as in the above example using `urand()`; two, for reconstructing individual samples from data which has been count-aggregated: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/repeat-example.dat -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr repeat -f count then cut -x -f count data/repeat-example.dat -GENMD_EOF +GENMD-EOF After expansion with `repeat`, such data can then be sent on to `stats1 -a mode`, or (if the data are numeric) to `stats1 -a @@ -805,15 +805,15 @@ p10,p50,p90`, etc. ## reshape -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr reshape --help -GENMD_EOF +GENMD-EOF ## sample -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr sample --help -GENMD_EOF +GENMD-EOF This is reservoir-sampling: select *k* items from *n* with uniform probability and no repeats in the sample. (If *n* is less than @@ -821,7 +821,7 @@ uniform probability and no repeats in the sample. (If *n* is less than {field names}`, produce a *k*-sample for each distinct value of the specified field names. 
-GENMD_INCLUDE_ESCAPED(data/sample-example.txt) +GENMD-INCLUDE-ESCAPED(data/sample-example.txt) Note that no output is produced until all inputs are in. Another way to do sampling, which works in the streaming case, is `mlr filter 'urand() & @@ -829,171 +829,171 @@ sampling, which works in the streaming case, is `mlr filter 'urand() & ## sec2gmt -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr sec2gmt -h -GENMD_EOF +GENMD-EOF ## sec2gmtdate -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr sec2gmtdate -h -GENMD_EOF +GENMD-EOF ## seqgen -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr seqgen -h -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr seqgen --stop 10 -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr seqgen --start 20 --stop 40 --step 4 -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr seqgen --start 40 --stop 20 --step -4 -GENMD_EOF +GENMD-EOF ## shuffle -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr shuffle -h -GENMD_EOF +GENMD-EOF ## skip-trivial-records -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr skip-trivial-records -h -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/trivial-records.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv skip-trivial-records data/trivial-records.csv -GENMD_EOF +GENMD-EOF ## sort -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr sort --help -GENMD_EOF +GENMD-EOF Example: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint sort -f a -nr x data/small -GENMD_EOF +GENMD-EOF Here's an example filtering log data: suppose multiple threads (labeled here by color) are all logging progress counts to a single log file. The log file is (by nature) chronological, so the progress of various threads is interleaved: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND head -n 10 data/multicountdown.dat -GENMD_EOF +GENMD-EOF We can group these by thread by sorting on the thread ID (here, `color`). 
Since Miller's sort is stable, this means that timestamps within each thread's log data are still chronological: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND head -n 20 data/multicountdown.dat | mlr --opprint sort -f color -GENMD_EOF +GENMD-EOF Any records not having all specified sort keys will appear at the end of the output, in the order they were encountered, regardless of the specified sort order: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr sort -n x data/sort-missing.dkvp -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr sort -nr x data/sort-missing.dkvp -GENMD_EOF +GENMD-EOF ## sort-within-records -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr sort-within-records -h -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/sort-within-records.json -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --ijson --opprint cat data/sort-within-records.json -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --json sort-within-records data/sort-within-records.json -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --ijson --opprint sort-within-records data/sort-within-records.json -GENMD_EOF +GENMD-EOF ## stats1 -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr stats1 --help -GENMD_EOF +GENMD-EOF These are simple univariate statistics on one or more number-valued fields (`count` and `mode` apply to non-numeric fields as well), optionally categorized by one or more other fields. 
-GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --oxtab stats1 -a count,sum,min,p10,p50,mean,p90,max -f x,y data/medium -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint stats1 -a mean -f x,y -g b then sort -f b data/medium -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p stats1 -a p50,p99 -f u,v -g color \ then put '$ur=$u_p99/$u_p50;$vr=$v_p99/$v_p50' \ data/colored-shapes.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p count-distinct -f shape then sort -nr count data/colored-shapes.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p stats1 -a mode -f color -g shape data/colored-shapes.csv -GENMD_EOF +GENMD-EOF ## stats2 -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr stats2 --help -GENMD_EOF +GENMD-EOF These are simple bivariate statistics on one or more pairs of number-valued fields, optionally categorized by one or more fields. -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --oxtab put '$x2=$x*$x; $xy=$x*$y; $y2=$y**2' \ then stats2 -a cov,corr -f x,y,y,y,x2,xy,x2,y2 \ data/medium -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint put '$x2=$x*$x; $xy=$x*$y; $y2=$y**2' \ then stats2 -a linreg-ols,r2 -f x,y,y,y,xy,y2 -g a \ data/medium -GENMD_EOF +GENMD-EOF Here's an example simple line-fit. The `x` and `y` fields of the `data/medium` dataset are just independent uniformly distributed on the unit interval. Here we remove half the data and fit a line to it. -GENMD_INCLUDE_ESCAPED(data/linreg-example.txt) +GENMD-INCLUDE-ESCAPED(data/linreg-example.txt) I use [pgr](https://github.com/johnkerl/pgr) for plotting; here's a screenshot. @@ -1003,206 +1003,206 @@ I use [pgr](https://github.com/johnkerl/pgr) for plotting; here's a screenshot. Here's an example estimating time-to-completion for a set of jobs. 
Input data comes from a log file, with number of work units left to do in the `count` field and accumulated seconds in the `upsec` field, labeled by the `color` field: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND head -n 10 data/multicountdown.dat -GENMD_EOF +GENMD-EOF We can do a linear regression on count remaining as a function of time: with `c = m*u+b` we want to find the time when the count goes to zero, i.e. `u=-b/m`. -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --oxtab stats2 -a linreg-pca -f upsec,count -g color \ then put '$donesec = -$upsec_count_pca_b/$upsec_count_pca_m' \ data/multicountdown.dat -GENMD_EOF +GENMD-EOF ## step -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr step --help -GENMD_EOF +GENMD-EOF Most Miller commands are record-at-a-time, with the exception of `stats1`, `stats2`, and `histogram` which compute aggregate output. The `step` command is intermediate: it allows the option of adding fields which are functions of fields from previous records. Rsum is short for *running sum*. -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint step -a shift,delta,rsum,counter -f x data/medium | head -15 -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint step -a shift,delta,rsum,counter -f x -g a data/medium | head -15 -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint step -a ewma -f x -d 0.1,0.9 data/medium | head -15 -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint step -a ewma -f x -d 0.1,0.9 -o smooth,rough data/medium | head -15 -GENMD_EOF +GENMD-EOF Example deriving uptime-delta from system uptime: -GENMD_INCLUDE_ESCAPED(data/ping-delta-example.txt) +GENMD-INCLUDE-ESCAPED(data/ping-delta-example.txt) ## tac -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr tac --help -GENMD_EOF +GENMD-EOF Prints the records in the input stream in reverse order. Note: this requires Miller to retain all input records in memory before any output records are produced. 
-GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint cat data/a.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint cat data/b.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint tac data/a.csv data/b.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint put '$filename=FILENAME' then tac data/a.csv data/b.csv -GENMD_EOF +GENMD-EOF ## tail -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr tail --help -GENMD_EOF +GENMD-EOF Prints the last *n* records in the input stream, optionally by category. -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p tail -n 4 data/colored-shapes.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p tail -n 1 -g shape data/colored-shapes.csv -GENMD_EOF +GENMD-EOF ## tee -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr tee --help -GENMD_EOF +GENMD-EOF ## template -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr template --help -GENMD_EOF +GENMD-EOF ## top -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr top --help -GENMD_EOF +GENMD-EOF Note that `top` is distinct from [head](reference-verbs.md#head) -- `head` shows fields which appear first in the data stream; `top` shows fields which are numerically largest (or smallest). -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint top -n 4 -f x data/medium -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint top -n 4 -f x -o someothername data/medium -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint top -n 2 -f x -g a then sort -f a data/medium -GENMD_EOF +GENMD-EOF ## unflatten -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr unflatten --help -GENMD_EOF +GENMD-EOF ## uniq -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr uniq --help -GENMD_EOF +GENMD-EOF There are two main ways to use `mlr uniq`: the first way is with `-g` to specify group-by columns. 
-GENMD_RUN_COMMAND +GENMD-RUN-COMMAND wc -l data/colored-shapes.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv uniq -g color,shape data/colored-shapes.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p uniq -g color,shape -c then sort -f color,shape data/colored-shapes.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p uniq -g color,shape -c -o someothername \ then sort -nr someothername \ data/colored-shapes.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p uniq -n -g color,shape data/colored-shapes.csv -GENMD_EOF +GENMD-EOF The second main way to use `mlr uniq` is without group-by columns, using `-a` instead: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/repeats.dkvp -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND wc -l data/repeats.dkvp -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint uniq -a data/repeats.dkvp -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint uniq -a -n data/repeats.dkvp -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint uniq -a -c data/repeats.dkvp -GENMD_EOF +GENMD-EOF ## unsparsify -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr unsparsify --help -GENMD_EOF +GENMD-EOF Examples: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/sparse.json -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --json unsparsify data/sparse.json -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --ijson --opprint unsparsify data/sparse.json -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --ijson --opprint unsparsify --fill-with missing data/sparse.json -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --ijson --opprint unsparsify -f a,b,u data/sparse.json -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --ijson --opprint unsparsify -f a,b,u,v,w,x then regularize data/sparse.json -GENMD_EOF +GENMD-EOF diff --git 
a/docs/src/repl.md.in b/docs/src/repl.md.in index 41171510f..e276ab7ab 100644 --- a/docs/src/repl.md.in +++ b/docs/src/repl.md.in @@ -4,12 +4,12 @@ The Miller REPL (read-evaluate-print loop) is an interactive counterpart to reco Miller's REPL isn't a source-level debugger which lets you execute one source-code *statement* at a time -- however, it does let you operate on one *record* at a time. Further, it lets you use "immediate expressions", namely, you can interact with the [Miller programming language](miller-programming-language.md) without having to provide data from an input file. -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr repl [mlr] 1 + 2 3 -GENMD_EOF +GENMD-EOF ## Using Miller without the REPL @@ -23,10 +23,10 @@ Using `put` and `filter`, you can do the following as we've seen above: * Specify statements to be executed on each record -- which are anything outside of `begin`/`end`/`func`/`subr`. * Example: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --ojson --from example.csv head -n 2 \ then put 'begin {print "HELLO"} $qr = $quantity / $rate; end {print "GOODBYE"}' -GENMD_EOF +GENMD-EOF ## Using Miller with the REPL @@ -74,7 +74,7 @@ printed to the terminal, e.g. if you type `1+2`, you will see `3`. 
Use the REPL to look at arithmetic: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr repl [mlr] 6/3 @@ -88,11 +88,11 @@ int [mlr] typeof(6/5) float -GENMD_EOF +GENMD-EOF Read the first record from a small file: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr repl [mlr] :open foo.dat @@ -115,11 +115,11 @@ FILENAME="foo.dat",FILENUM=1,NR=1,FNR=1 [mlr] :write a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463,z=4.381399393871141 -GENMD_EOF +GENMD-EOF Skip until deep into a larger file, then inspect a record: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr repl --csv [mlr] :open data/colored-shapes.csv @@ -136,7 +136,7 @@ mlr repl --csv "w": 0.4799125551304738, "x": 6.379888206335166 } -GENMD_EOF +GENMD-EOF ## History-editing diff --git a/docs/src/shapes-of-data.md.in b/docs/src/shapes-of-data.md.in index 3ab01abd8..a2ad7c7c3 100644 --- a/docs/src/shapes-of-data.md.in +++ b/docs/src/shapes-of-data.md.in @@ -16,39 +16,39 @@ Also try `od -xcv` and/or `cat -e` on your file to check for non-printable chara Use the `file` command to see if there are CR/LF terminators (in this case, there are not): -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE file data/colours.csv data/colours.csv: UTF-8 Unicode text -GENMD_EOF +GENMD-EOF Look at the file to find names of fields: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE cat data/colours.csv KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Wit;Biały;Alb;Beyaz masterdata_colourcode_2;Schwarz;Black;Negro;Musta;Noir;Nero;Zwart;Czarny;Negru;Siyah -GENMD_EOF +GENMD-EOF Extract a few fields: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr --csv cut -f KEY,PL,RO data/colours.csv (only blank lines appear) -GENMD_EOF +GENMD-EOF Use XTAB output format to get a sharper picture of where records/fields are being split: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr --icsv --oxtab cat 
data/colours.csv KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Wit;Biały;Alb;Beyaz KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR masterdata_colourcode_2;Schwarz;Black;Negro;Musta;Noir;Nero;Zwart;Czarny;Negru;Siyah -GENMD_EOF +GENMD-EOF Using XTAB output format makes it clearer that `KEY;DE;...;RO;TR` is being treated as a single field name in the CSV header, and likewise each subsequent line is being treated as a single field value. This is because the default field separator is a comma but we have semicolons here. Use XTAB again with different field separator (`--fs semicolon`): -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr --icsv --ifs semicolon --oxtab cat data/colours.csv KEY masterdata_colourcode_1 DE Weiß @@ -73,98 +73,98 @@ NL Zwart PL Czarny RO Negru TR Siyah -GENMD_EOF +GENMD-EOF Using the new field-separator, retry the cut: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr --csv --fs semicolon cut -f KEY,PL,RO data/colours.csv KEY;PL;RO masterdata_colourcode_1;Biały;Alb masterdata_colourcode_2;Czarny;Negru -GENMD_EOF +GENMD-EOF ## I assigned $9 and it's not 9th Miller records are ordered lists of key-value pairs. For NIDX format, DKVP format when keys are missing, or CSV/CSV-lite format with `--implicit-csv-header`, Miller will sequentially assign keys of the form `1`, `2`, etc. But these are not integer array indices: they're just field names taken from the initial field ordering in the input data, when it was originally read from the input file(s). 
-GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo x,y,z | mlr --dkvp cat -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo x,y,z | mlr --dkvp put '$6="a";$4="b";$55="cde"' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo x,y,z | mlr --nidx cat -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo x,y,z | mlr --csv --implicit-csv-header cat -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo x,y,z | mlr --dkvp rename 2,999 -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo x,y,z | mlr --dkvp rename 2,newname -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo x,y,z | mlr --csv --implicit-csv-header reorder -f 3,1,2 -GENMD_EOF +GENMD-EOF ## Why doesn't mlr cut put fields in the order I want? Example: columns `rate,shape,flag` were requested but they appear here in the order `shape,flag,rate`: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat example.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv cut -f rate,shape,flag example.csv -GENMD_EOF +GENMD-EOF The issue is that Miller's `cut`, by default, outputs cut fields in the order they appear in the input data. This design decision was made intentionally to parallel the Unix/Linux system `cut` command, which has the same semantics. 
The solution is to use the `-o` option: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv cut -o -f rate,shape,flag example.csv -GENMD_EOF +GENMD-EOF ## Numbering and renumbering records The `awk`-like built-in variable `NR` is incremented for each input record: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat example.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv put '$nr = NR' example.csv -GENMD_EOF +GENMD-EOF However, this is the record number within the original input stream -- not after any filtering you may have done: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv filter '$color == "yellow"' then put '$nr = NR' example.csv -GENMD_EOF +GENMD-EOF There are two good options here. One is to use the `cat` verb with `-n`: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv filter '$color == "yellow"' then cat -n example.csv -GENMD_EOF +GENMD-EOF The other is to keep your own counter within the `put` DSL: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv filter '$color == "yellow"' then put 'begin {@n = 1} $n = @n; @n += 1' example.csv -GENMD_EOF +GENMD-EOF The difference is a matter of taste (although `mlr cat -n` puts the counter first). 
@@ -172,50 +172,50 @@ The difference is a matter of taste (although `mlr cat -n` puts the counter firs Suppose you want to just keep the first two components of the hostnames: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/hosts.csv -GENMD_EOF +GENMD-EOF Using the [`splita`](reference-dsl-builtin-functions.md#splita) and [`joinv`](reference-dsl-builtin-functions.md#joinv) functions, along with [array slicing](reference-main-arrays.md#slicing), we get -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv --from data/hosts.csv put '$host = joinv(splita($host, ".")[1:2], ".")' -GENMD_EOF +GENMD-EOF ## Splitting nested fields Suppose you have a TSV file like this: -GENMD_INCLUDE_ESCAPED(data/nested.tsv) +GENMD-INCLUDE-ESCAPED(data/nested.tsv) The simplest option is to use [nest](reference-verbs.md#nest): -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --tsv nest --explode --values --across-records -f b --nested-fs : data/nested.tsv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --tsv nest --explode --values --across-fields -f b --nested-fs : data/nested.tsv -GENMD_EOF +GENMD-EOF While `mlr nest` is simplest, let's also take a look at a few ways to do this using the `put` DSL. 
One option to split out the colon-delimited values in the `b` column is to use `splitnv` to create an integer-indexed map and loop over it, adding new fields to the current record: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --from data/nested.tsv --itsv --oxtab put ' o = splitnv($b, ":"); for (k,v in o) { $["p".k]=v } ' -GENMD_EOF +GENMD-EOF while another is to loop over the same map from `splitnv` and use it (with `put -q` to suppress printing the original record) to produce multiple records: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --from data/nested.tsv --itsv --oxtab put -q ' o = splitnv($b, ":"); for (k,v in o) { @@ -223,16 +223,16 @@ mlr --from data/nested.tsv --itsv --oxtab put -q ' emit x } ' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --from data/nested.tsv --tsv put -q ' o = splitnv($b, ":"); for (k,v in o) { x = mapsum($*, {"b":v}); emit x } ' -GENMD_EOF +GENMD-EOF ## Options for dealing with duplicate rows @@ -244,23 +244,23 @@ If you want to look at partial uniqueness -- for example, show only the first re Suppose you have a method (in whatever language) which is printing things of the form -GENMD_INCLUDE_ESCAPED(data/rect-outer.txt) +GENMD-INCLUDE-ESCAPED(data/rect-outer.txt) and then calls another method which prints things of the form -GENMD_INCLUDE_ESCAPED(data/rect-middle.txt) +GENMD-INCLUDE-ESCAPED(data/rect-middle.txt) and then, perhaps, that second method calls a third method which prints things of the form -GENMD_INCLUDE_ESCAPED(data/rect-inner.txt) +GENMD-INCLUDE-ESCAPED(data/rect-inner.txt) with the result that your program's output is -GENMD_INCLUDE_ESCAPED(data/rect.txt) +GENMD-INCLUDE-ESCAPED(data/rect.txt) The idea here is that middles starting with a 1 belong to the outer value of 1, and so on. (For example, the outer values might be account IDs, the middle values might be invoice IDs, and the inner values might be invoice line-items.) 
If you want all the middle and inner lines to have the context of which outers they belong to, you can modify your software to pass all those through your methods. Alternatively, don't refactor your code just to handle some ad-hoc log-data formatting -- instead, use the following to [rectangularize the data](record-heterogeneity.md). The idea is to use an out-of-stream variable to accumulate fields across records. Clear that variable when you see an outer ID; accumulate fields; emit output when you see the inner IDs. -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --from data/rect.txt put -q ' is_present($outer) { unset @r @@ -271,7 +271,7 @@ mlr --from data/rect.txt put -q ' is_present($inner1) { emit @r }' -GENMD_EOF +GENMD-EOF See also the [record-heterogeneity page](record-heterogeneity.md); see in particular the [`regularize` verb](reference-verbs.md#regularize) for a way to diff --git a/docs/src/shell-commands.md.in b/docs/src/shell-commands.md.in index 8c9fd521d..6fe1079b1 100644 --- a/docs/src/shell-commands.md.in +++ b/docs/src/shell-commands.md.in @@ -4,23 +4,23 @@ TODO: while-read example from issues The [system](reference-dsl.md#system) DSL function allows you to run a specific shell command and put its output -- minus the final newline -- into a record field. The command itself is any string, either a literal string, or a concatenation of strings, perhaps including other field values or what have you. -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint put '$o = system("echo hello world")' data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint put '$o = system("echo {" . NR . "}")' data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint put '$o = system("echo -n ".$a."| md5")' data/small -GENMD_EOF +GENMD-EOF Note that running a subprocess on every record takes a non-trivial amount of time. 
Comparing asking the system `date` command for the current time in nanoseconds versus computing it in process: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr --opprint put '$t=system("date +%s.%N")' then step -a delta -f t data/small a b i x y t t_delta pan pan 1 0.3467901443380824 0.7268028627434533 1568774318.513903817 0 @@ -28,9 +28,9 @@ eks pan 2 0.7586799647899636 0.5221511083334797 1568774318.514722876 0.000819 wye wye 3 0.20460330576630303 0.33831852551664776 1568774318.515618046 0.000895 eks wye 4 0.38139939387114097 0.13418874328430463 1568774318.516547441 0.000929 wye pan 5 0.5732889198020006 0.8636244699032729 1568774318.517518828 0.000971 -GENMD_EOF +GENMD-EOF -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr --opprint put '$t=systime()' then step -a delta -f t data/small a b i x y t t_delta pan pan 1 0.3467901443380824 0.7268028627434533 1568774318.518699 0 @@ -38,4 +38,4 @@ eks pan 2 0.7586799647899636 0.5221511083334797 1568774318.518717 0.000018 wye wye 3 0.20460330576630303 0.33831852551664776 1568774318.518723 0.000006 eks wye 4 0.38139939387114097 0.13418874328430463 1568774318.518727 0.000004 wye pan 5 0.5732889198020006 0.8636244699032729 1568774318.518730 0.000003 -GENMD_EOF +GENMD-EOF diff --git a/docs/src/sorting.md.in b/docs/src/sorting.md.in index 081d435d1..22dcc34a2 100644 --- a/docs/src/sorting.md.in +++ b/docs/src/sorting.md.in @@ -16,21 +16,21 @@ another, etc. 
Input data: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p cat example.csv -GENMD_EOF +GENMD-EOF Sorted numerically ascending by rate: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p sort -n rate example.csv -GENMD_EOF +GENMD-EOF Sorted lexically ascending by color; then, within each color, numerically descending by quantity: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p sort -f color -nr quantity example.csv -GENMD_EOF +GENMD-EOF ## Sorting fields within records: the sort-within-records verb @@ -40,17 +40,17 @@ leaves records in their original order in the data stream, but reorders fields within each record. A typical use-case is for given all records the same column-ordering, in particular for converting JSON to CSV (or other tabular formats): -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/sort-within-records.json -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --ijson --opprint cat data/sort-within-records.json -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --ijson --opprint sort-within-records data/sort-within-records.json -GENMD_EOF +GENMD-EOF ## The sort function by example @@ -58,77 +58,77 @@ GENMD_EOF * Without second argument, uses the natural ordering. * With second which is string, takes sorting flags from it: `"f"` for lexical or `"c"` for case-folded lexical, and/or `"r"` for reverse/descending. 
-GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { # Sort array with natural ordering print sort([5,2,3,1,4]); } ' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { # Sort array with reverse-natural ordering print sort([5,2,3,1,4], "r"); } ' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { # Sort array with custom function: natural ordering print sort([5,2,3,1,4], func(a,b) { return a <=> b}); } ' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { # Sort array with custom function: reverse-natural ordering print sort([5,2,3,1,4], func(a,b) { return b <=> a}); } ' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { # Sort map with natural ordering on keys print sort({"c":2, "a": 3, "b": 1}); } ' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { # Sort map with reverse-natural ordering on keys print sort({"c":2, "a": 3, "b": 1}, "r"); } ' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { # Sort map with custom function: natural ordering on values print sort({"c":2, "a": 3, "b": 1}, func(ak,av,bk,bv){return av <=> bv}); } ' -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put ' end { # Sort map with custom function: reverse-natural ordering on values print sort({"c":2, "a": 3, "b": 1}, func(ak,av,bk,bv){return bv <=> av}); } ' -GENMD_EOF +GENMD-EOF In the rest of this page we'll look more closely at these variants. @@ -147,47 +147,47 @@ contain strings, floats, and booleans; if you need to sort an array whose values are themselves maps or arrays, you'll need `sort` with function argument as described further down in this page. 
-GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/sorta-example.csv -GENMD_EOF +GENMD-EOF Default sort is numerical ascending: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p --from data/sorta-example.csv put ' $values = splita($values, ";"); $values = sort($values); # default flags $values = joinv($values, ";"); ' -GENMD_EOF +GENMD-EOF Use the `"r"` flag for reverse, which is numerical descending: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p --from data/sorta-example.csv put ' $values = splita($values, ";"); $values = sort($values, "r"); # 'r' flag for reverse sort $values = joinv($values, ";"); ' -GENMD_EOF +GENMD-EOF Use the `"f"` flag for lexical ascending sort (and `"fr"` would lexical descending): -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p --from data/sorta-example.csv put ' $values = splita($values, ";"); $values = sort($values, "f"); # 'f' flag for lexical sort $values = joinv($values, ";"); ' -GENMD_EOF +GENMD-EOF Without and with case-folding: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/sorta-example-text.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --c2p --from data/sorta-example-text.csv put ' $values = splita($values, ";"); if (NR == 1) { @@ -197,7 +197,7 @@ mlr --c2p --from data/sorta-example-text.csv put ' } $values = joinv($values, ";"); ' -GENMD_EOF +GENMD-EOF ## Simple sorting of maps within records @@ -211,16 +211,16 @@ described further down in this page. Also note that, unlike the `sort-within-record` verb with its `-r` flag, `sort` doesn't recurse into submaps and sort those. -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/server-log.json -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --json --from data/server-log.json put ' $req = sort($req); # Ascending here $res = sort($res, "r"); # Descending here ' -GENMD_EOF +GENMD-EOF ## Simple sorting of maps across records @@ -235,7 +235,7 @@ of accumulating records in a map, then sorting the map. 
Using the `f` flag we're sorting the map keys (1-up NR) lexically, so we have 1, then 10, then 2: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint --from example.csv put -q ' begin { @records = {}; # Define as a map @@ -249,7 +249,7 @@ mlr --icsv --opprint --from example.csv put -q ' } } ' -GENMD_EOF +GENMD-EOF ## Custom sorting of arrays within records @@ -264,14 +264,14 @@ for comparing elements. For example, let's use the following input data. Instead of having an array, it has some semicolon-delimited data in a field which we can split and sort: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/sortaf-example.csv -GENMD_EOF +GENMD-EOF In the following example we sort data in several ways -- the first two just recaptiulate (for reference) what `sort` with default flags already does; the third is novel: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --ojson --from data/sortaf-example.csv put ' # Same as sort($values) @@ -302,7 +302,7 @@ mlr --icsv --ojson --from data/sortaf-example.csv put ' $reverse = sort(split_values, reverse); $even_then_odd = sort(split_values, even_then_odd); ' -GENMD_EOF +GENMD-EOF ## Custom sorting of arrays across records @@ -319,7 +319,7 @@ functions are maps -- and we have to access the `index` field using either indexing](reference-dsl-operators.md#the-double-purpose-dot-operator)) `a.index` and `b.index`. -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint --from example.csv put -q ' # Sort primarily ascending on the shape field, then secondarily # descending numeric on the index field. @@ -342,7 +342,7 @@ mlr --icsv --opprint --from example.csv put -q ' } } ' -GENMD_EOF +GENMD-EOF ## Custom sorting of maps within records @@ -356,7 +356,7 @@ keys and/or values. 
For example, we can sort ascending or descending by map key or map value: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr -n put -q ' func f1(ak, av, bk, bv) { return ak <=> bk @@ -383,7 +383,7 @@ mlr -n put -q ' print sort(x, f4); } ' -GENMD_EOF +GENMD-EOF ## Custom sorting of maps across records @@ -395,7 +395,7 @@ of them -- densely -- accumulating them in an array is fine. If we're only taking a subset -- sparsely -- and we want to retain the original `NR` as keys, using a map is handy, since we don't need continguous keys. -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --opprint --from example.csv put -q ' # Sort descending numeric on the index field func cmp(ak, av, bk, bv) { @@ -412,4 +412,4 @@ mlr --icsv --opprint --from example.csv put -q ' } } ' -GENMD_EOF +GENMD-EOF diff --git a/docs/src/special-symbols-and-formatting.md.in b/docs/src/special-symbols-and-formatting.md.in index d7ca33db0..051afe058 100644 --- a/docs/src/special-symbols-and-formatting.md.in +++ b/docs/src/special-symbols-and-formatting.md.in @@ -4,66 +4,66 @@ [CSV](file-formats.md) handles this well and by design: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat commas.csv -GENMD_EOF +GENMD-EOF Likewise [JSON](file-formats.md#json): -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --ojson cat commas.csv -GENMD_EOF +GENMD-EOF For Miller's [XTAB](file-formats.md#xtab-vertical-tabular) there is no escaping for carriage returns, but commas work fine: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --oxtab cat commas.csv -GENMD_EOF +GENMD-EOF But for [key-value-pairs](file-formats.md#dkvp-key-value-pairs) and [index-numbered](file-formats.md#nidx-index-numbered-toolkit-style) formats, commas are the default field separator. And -- as of Miller 5.4.0 anyway -- there is no CSV-style double-quote-handling like there is for CSV. 
So commas within the data look like delimiters: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --odkvp cat commas.csv -GENMD_EOF +GENMD-EOF One solution is to use a different delimiter, such as a pipe character: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --odkvp --ofs pipe cat commas.csv -GENMD_EOF +GENMD-EOF To be extra-sure to avoid data/delimiter clashes, you can also use control characters as delimiters -- here, control-A: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --icsv --odkvp --ofs '\001' cat commas.csv | cat -v -GENMD_EOF +GENMD-EOF ## How can I handle field names with special symbols in them? Simply surround the field names with curly braces: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo 'x.a=3,y:b=4,z/c=5' | mlr put '${product.all} = ${x.a} * ${y:b} * ${z/c}' -GENMD_EOF +GENMD-EOF ## How can I put single quotes into strings? This is a little tricky due to the shell's handling of quotes. For simplicity, let's first put an update script into a file: -GENMD_INCLUDE_ESCAPED(data/single-quote-example.mlr) +GENMD-INCLUDE-ESCAPED(data/single-quote-example.mlr) -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo a=bcd | mlr put -f data/single-quote-example.mlr -GENMD_EOF +GENMD-EOF So: Miller's DSL uses double quotes for strings, and you can put single quotes (or backslash-escaped double-quotes) inside strings, no problem. Without putting the update expression in a file, it's messier: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND echo a=bcd | mlr put '$a="It'\''s OK, I said, '\''for now'\''."' -GENMD_EOF +GENMD-EOF The idea is that the outermost single-quotes are to protect the `put` expression from the shell, and the double quotes within them are for Miller. To get a single quote in the middle there, you need to actually put it *outside* the single-quoting for the shell. 
The pieces are the following, all concatenated together: @@ -79,14 +79,14 @@ The idea is that the outermost single-quotes are to protect the `put` expression One way is to use square brackets; an alternative is to use simple string-substitution rather than a regular expression. -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/question.dat -GENMD_EOF -GENMD_RUN_COMMAND +GENMD-EOF +GENMD-RUN-COMMAND mlr --oxtab put '$c = gsub($a, "[?]"," ...")' data/question.dat -GENMD_EOF -GENMD_RUN_COMMAND +GENMD-EOF +GENMD-RUN-COMMAND mlr --oxtab put '$c = ssub($a, "?"," ...")' data/question.dat -GENMD_EOF +GENMD-EOF The `ssub` function exists precisely for this reason: so you don't have to escape anything. diff --git a/docs/src/sql-examples.md.in b/docs/src/sql-examples.md.in index fcc46113f..7555c7346 100644 --- a/docs/src/sql-examples.md.in +++ b/docs/src/sql-examples.md.in @@ -6,7 +6,7 @@ I like to produce SQL-query output with header-column and tab delimiter: this is For example, using default output formatting in `mysql` we get formatting like Miller's `--opprint --barred`: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mysql --database=mydb -e 'show columns in mytable' +------------------+--------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | @@ -17,11 +17,11 @@ mysql --database=mydb -e 'show columns in mytable' | assigned_to | bigint(20) | YES | | NULL | | | last_update_time | int(11) | YES | | NULL | | +------------------+--------------+------+-----+---------+-------+ -GENMD_EOF +GENMD-EOF Using `mysql`'s `-B` we get TSV output: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mysql --database=mydb -B -e 'show columns in mytable' | mlr --itsvlite --opprint cat Field Type Null Key Default Extra id bigint(20) NO MUL NULL - @@ -29,11 +29,11 @@ category varchar(256) NO - NULL - is_permanent tinyint(1) NO - NULL - assigned_to bigint(20) YES - NULL - last_update_time int(11) YES - NULL - -GENMD_EOF +GENMD-EOF Since 
Miller handles TSV output, we can do as much or as little processing as we want in the SQL query, then send the rest on to Miller. This includes outputting as JSON, doing further selects/joins in Miller, doing stats, etc. etc.: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mysql --database=mydb -B -e 'show columns in mytable' | mlr --itsvlite --ojson --jlistwrap --jvstack cat [ { @@ -77,13 +77,13 @@ mysql --database=mydb -B -e 'show columns in mytable' | mlr --itsvlite --ojson - "Extra": "" } ] -GENMD_EOF +GENMD-EOF -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mysql --database=mydb -B -e 'select * from mytable' > query.tsv -GENMD_EOF +GENMD-EOF -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr --from query.tsv --t2p stats1 -a count -f id -g category,assigned_to category assigned_to id_count special 10000978 207 @@ -93,7 +93,7 @@ standard 10000978 524 standard 10003924 392 standard 10009872 108 ... -GENMD_EOF +GENMD-EOF ## SQL-input examples @@ -101,7 +101,7 @@ One use of NIDX (value-only, no keys) format is for loading up SQL tables. 
Create and load SQL table: -GENMD_CARDIFY +GENMD-CARDIFY mysql> CREATE TABLE abixy( a VARCHAR(32), b VARCHAR(32), @@ -140,11 +140,11 @@ mysql> SELECT * FROM abixy LIMIT 10; | hat | wye | 9 | 0.03144187646093577 | 0.7495507603507059 | | pan | wye | 10 | 0.5026260055412137 | 0.9526183602969864 | +------+------+------+---------------------+---------------------+ -GENMD_EOF +GENMD-EOF Aggregate counts within SQL: -GENMD_CARDIFY +GENMD-CARDIFY mysql> SELECT a, b, COUNT(*) AS count FROM abixy GROUP BY a, b ORDER BY COUNT DESC; +------+------+-------+ | a | b | count | @@ -176,11 +176,11 @@ mysql> SELECT a, b, COUNT(*) AS count FROM abixy GROUP BY a, b ORDER BY COUNT DE | eks | zee | 357 | +------+------+-------+ 25 rows in set (0.01 sec) -GENMD_EOF +GENMD-EOF Aggregate counts within Miller: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mlr --opprint uniq -c -g a,b then sort -nr count data/medium a b count zee wye 455 @@ -198,11 +198,11 @@ zee zee 403 pan wye 395 hat pan 363 eks zee 357 -GENMD_EOF +GENMD-EOF Pipe SQL output to aggregate counts within Miller: -GENMD_CARDIFY_HIGHLIGHT_ONE +GENMD-CARDIFY-HIGHLIGHT-ONE mysql -D miller -B -e 'select * from abixy' | mlr --itsv --opprint uniq -c -g a,b then sort -nr count a b count zee wye 455 @@ -230,4 +230,4 @@ wye wye 377 eks pan 371 hat pan 363 eks zee 357 -GENMD_EOF +GENMD-EOF diff --git a/docs/src/statistics-examples.md.in b/docs/src/statistics-examples.md.in index 7ab879c9f..a98ead194 100644 --- a/docs/src/statistics-examples.md.in +++ b/docs/src/statistics-examples.md.in @@ -4,15 +4,15 @@ For one or more specified field names, simply compute p25 and p75, then write the IQR as the difference of p75 and p25: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --oxtab stats1 -f x -a p25,p75 \ then put '$x_iqr = $x_p75 - $x_p25' \ data/medium -GENMD_EOF +GENMD-EOF For wildcarded field names, first compute p25 and p75, then loop over field names with `p25` in them: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --oxtab stats1 
--fr '[i-z]' -a p25,p75 \ then put 'for (k,v in $*) { if (k =~ "(.*)_p25") { @@ -20,13 +20,13 @@ mlr --oxtab stats1 --fr '[i-z]' -a p25,p75 \ } }' \ data/medium -GENMD_EOF +GENMD-EOF ## Computing weighted means This might be more elegantly implemented as an option within the `stats1` verb. Meanwhile, it's expressible within the DSL: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --from data/medium put -q ' # Using the y field for weighting in this example weight = $y; @@ -51,4 +51,4 @@ mlr --from data/medium put -q ' #emit mean, "a"; emit (wmean, mean), "a"; }' -GENMD_EOF +GENMD-EOF diff --git a/docs/src/two-pass-algorithms.md.in b/docs/src/two-pass-algorithms.md.in index 73beb09d3..78fa029d5 100644 --- a/docs/src/two-pass-algorithms.md.in +++ b/docs/src/two-pass-algorithms.md.in @@ -16,43 +16,43 @@ multi-pass computations, at the price of retaining all input records in memory. One of Miller's strengths is its compact notation: for example, given input of the form -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND head -n 5 ./data/medium -GENMD_EOF +GENMD-EOF you can simply do -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --oxtab stats1 -a sum -f x ./data/medium -GENMD_EOF +GENMD-EOF or -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint stats1 -a sum -f x -g b ./data/medium -GENMD_EOF +GENMD-EOF rather than the more tedious -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --oxtab put -q ' @x_sum += $x; end { emit @x_sum } ' data/medium -GENMD_EOF +GENMD-EOF or -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint put -q ' @x_sum[$b] += $x; end { emit @x_sum, "b" } ' data/medium -GENMD_EOF +GENMD-EOF The former (`mlr stats1` et al.) has the advantages of being easier to type, being less error-prone to type, and running faster. 
@@ -73,7 +73,7 @@ The following examples compute some things using oosvars which are already compu For example, mapping numeric values down a column to the percentage between their min and max values is two-pass: on the first pass you find the min and max values, then on the second, map each record's value to a percentage. -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --from data/small --opprint put -q ' # These are executed once per record, which is the first pass. # The key is to use NR to index an out-of-stream variable to @@ -90,13 +90,13 @@ mlr --from data/small --opprint put -q ' emit (@x, @x_pct), "NR" } ' -GENMD_EOF +GENMD-EOF ## Line-number ratios Similarly, finding the total record count requires first reading through all the data: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint --from data/small put -q ' @records[NR] = $*; end { @@ -109,45 +109,45 @@ mlr --opprint --from data/small put -q ' emit @records,"I" } ' then reorder -f I,N,PCT -GENMD_EOF +GENMD-EOF ## Records having max value The idea is to retain records having the largest value of `n` in the following data: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --itsv --opprint cat data/maxrows.tsv -GENMD_EOF +GENMD-EOF Of course, the largest value of `n` isn't known until after all data have been read. Using an [out-of-stream variable](reference-dsl-variables.md#out-of-stream-variables) we can [retain all records as they are read](operating-on-all-records.md), then filter them at the end: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/maxrows.mlr -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --itsv --opprint put -q -f data/maxrows.mlr data/maxrows.tsv -GENMD_EOF +GENMD-EOF ## Feature-counting Suppose you have some [heterogeneous data](record-heterogeneity.md) like this: -GENMD_INCLUDE_ESCAPED(data/features.json) +GENMD-INCLUDE-ESCAPED(data/features.json) A reasonable question to ask is, how many occurrences of each field are there? 
And, what percentage of total row count has each of them? Since the denominator of the percentage is not known until the end, this is a two-pass algorithm: -GENMD_INCLUDE_ESCAPED(data/feature-count.mlr) +GENMD-INCLUDE-ESCAPED(data/feature-count.mlr) Then -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --json put -q -f data/feature-count.mlr data/features.json -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --ijson --opprint put -q -f data/feature-count.mlr data/features.json -GENMD_EOF +GENMD-EOF ## Unsparsing @@ -157,35 +157,35 @@ There is a keystroke-saving verb for this: [unsparsify](reference-verbs.md#unspa For example, suppose you have JSON input like this: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/sparse.json -GENMD_EOF +GENMD-EOF There are field names `a`, `b`, `v`, `u`, `x`, `w` in the data -- but not all in every record. Since we don't know the names of all the keys until we've read them all, this needs to be a two-pass algorithm. On the first pass, remember all the unique key names and all the records; on the second pass, loop through the records filling in absent values, then producing output. 
Use `put -q` since we don't want to produce per-record output, only emitting output in the `end` block: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/unsparsify.mlr -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --json put -q -f data/unsparsify.mlr data/sparse.json -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --ijson --ocsv put -q -f data/unsparsify.mlr data/sparse.json -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --ijson --opprint put -q -f data/unsparsify.mlr data/sparse.json -GENMD_EOF +GENMD-EOF ## Mean without/with oosvars -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint stats1 -a mean -f x data/medium -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint put -q ' @x_sum += $x; @x_count += 1; @@ -194,15 +194,15 @@ mlr --opprint put -q ' emit @x_mean } ' data/medium -GENMD_EOF +GENMD-EOF ## Keyed mean without/with oosvars -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint stats1 -a mean -f x -g a,b data/medium -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint put -q ' @x_sum[$a][$b] += $x; @x_count[$a][$b] += 1; @@ -213,45 +213,45 @@ mlr --opprint put -q ' emit @x_mean, "a", "b" } ' data/medium -GENMD_EOF +GENMD-EOF ## Variance and standard deviation without/with oosvars -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --oxtab stats1 -a count,sum,mean,var,stddev -f x data/medium -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat variance.mlr -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --oxtab put -q -f variance.mlr data/medium -GENMD_EOF +GENMD-EOF You can also do this keyed, of course, imitating the keyed-mean example above. 
## Min/max without/with oosvars -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --oxtab stats1 -a min,max -f x data/medium -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --oxtab put -q ' @x_min = min(@x_min, $x); @x_max = max(@x_max, $x); end{emitf @x_min, @x_max} ' data/medium -GENMD_EOF +GENMD-EOF ## Keyed min/max without/with oosvars -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint stats1 -a min,max -f x -g a data/medium -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint --from data/medium put -q ' @min[$a] = min(@min[$a], $x); @max[$a] = max(@max[$a], $x); @@ -259,44 +259,44 @@ mlr --opprint --from data/medium put -q ' emit (@min, @max), "a"; } ' -GENMD_EOF +GENMD-EOF ## Delta without/with oosvars -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint step -a delta -f x data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint put ' $x_delta = is_present(@last) ? $x - @last : 0; @last = $x ' data/small -GENMD_EOF +GENMD-EOF ## Keyed delta without/with oosvars -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint step -a delta -f x -g a data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint put ' $x_delta = is_present(@last[$a]) ? $x - @last[$a] : 0; @last[$a]=$x ' data/small -GENMD_EOF +GENMD-EOF ## Exponentially weighted moving averages without/with oosvars -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint step -a ewma -d 0.1 -f x data/small -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --opprint put ' begin{ @a=0.1 }; $e = NR==1 ? $x : @a * $x + (1 - @a) * @e; @e=$e ' data/small -GENMD_EOF +GENMD-EOF diff --git a/docs/src/unix-toolkit-context.md.in b/docs/src/unix-toolkit-context.md.in index a70363cf0..bea7b27f3 100644 --- a/docs/src/unix-toolkit-context.md.in +++ b/docs/src/unix-toolkit-context.md.in @@ -6,21 +6,21 @@ How does Miller fit within the Unix toolkit (`grep`, `sed`, `awk`, etc.)? Miller respects CSV headers. 
If you do `mlr --csv cat *.csv` then the header line is written once: -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/a.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND cat data/b.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv cat data/a.csv data/b.csv -GENMD_EOF +GENMD-EOF -GENMD_RUN_COMMAND +GENMD-RUN-COMMAND mlr --csv sort -nr b data/a.csv data/b.csv -GENMD_EOF +GENMD-EOF Likewise with `mlr sort`, `mlr tac`, and so on. diff --git a/go/README.md b/go/README.md index 3ec356604..06d0cf41d 100644 --- a/go/README.md +++ b/go/README.md @@ -1,47 +1,14 @@ -# Quickstart - -A TL;DR for anyone wanting to compile and run the Go port of Miller: +# Quickstart for developers * `go build` -- produces the `mlr` executable * Miller has tens of unit tests and thousands of regression tests: * `go test mlr/src/...` runs the unit tests. - * `go test` runs the regression tests. This runs the same tests that `mlr regtest` runs by default, but note that (see `mlr regtest -h`) the latter gives you more options. - * `./mlr regtest` -- runs `regtest/cases`, which are cases passing on all platforms - * `./mlr regtest regtest/cases-pending-go-port` -- needing Go code to be ported from C - * `./mlr regtest regtest/cases-pending-windows` -- for Go code already ported from C but needing some work for Windows + * `go test` or `mlr regtest` runs the regression tests in `regtest/cases/`. Using `mlr regtest -h` you can see more options available than are exposed by `go test`. -Pre-release/rough-draft docs are at http://johnkerl.org/miller6. +# Continuous integration -See also the tracking issue (somewhat redundant to this README file) https://github.com/johnkerl/miller/issues/372. - -A note on Continuous Integration: - -* The C implementation is auto-built for Linux using Travis: see [../.travis.yml](../.travis.yml). -* The C implementation is also auto-built for Windows using Appveyor: see [../appveyor.yml](../appveyor.yml). 
However Ifind that it often breaks and I'm bewildered as to how to fix it. -* See also [../README.md](../README.md). * The Go implementation is auto-built using GitHub Actions: see [../.github/workflows/go.yml](../.github/workflows/go.yml). This works splendidly on Linux, MacOS, and Windows. - -# Status of the Go port - -* This will be a full Go port of [Miller](https://miller.readthedocs.io/). Things are currently rough and iterative and incomplete. I don't have a firm timeline but I suspect it will take a few more months of late-evening/spare-time work. -* The released Go port will become Miller 6.0. As noted below, this will be a win both at the source-code level, and for users of Miller. -* I hope to retain backward compatibility at the command-line level as much as possible. -* In the meantime I will still keep fixing bugs, doing some features, etc. in C on Miller 5.x -- in the near term, support for Miller's C implementation continues as before. - -# Trying out the Go port - -* Caveat: *lots* of things present in the C implementation are currently missing in the Go implementation. So if something doesn't work, it's almost certainly because it doesn't work *yet*. -* That said, if anyone is interested in playing around with it and giving early feedback, I'll be happy for it. -* Building: - * Clone the Miller repo - * `cd go` - * `./build` should create `mlr`. If it doesn't do this on your platform, please [file an issue](https://github.com/johnkerl/miller/issues). -* Platforms tried so far: - * macOS with Go 1.14 and 1.16, Linux Mint with Go 1.10 and 1.16, and Windows 10 with Go 1.16 -* On-line help: - * `mlr --help` advertises some things the Go implementation doesn't actually do yet. - * `mlr --help-all-verbs` correctly lists verbs which do things in the Go implementation. -* See also https://github.com/johnkerl/miller/issues/372 +* See also [../README.md](../README.md). 
# Benefits of porting to Go @@ -62,10 +29,6 @@ A note on Continuous Integration: * Go is an up-and-coming language, with good reason -- it's mature, stable, with few of C's weaknesses and many of C's strengths. * The source code will be easier to read/maintain/write, by myself and others. -# Things which may change - -Please see https://github.com/johnkerl/miller/issues/372. - # Efficiency of the Go port As I wrote [here](http://johnkerl.org/miller/doc/whyc.html) back in 2015 I couldn't get Rust or Go (or any other language I tried) to do some test-case processing as quickly as C, so I stuck with C. diff --git a/go/regtest/cases/cli-help/0001/expout b/go/regtest/cases/cli-help/0001/expout index c5be9add4..b7ded340b 100644 --- a/go/regtest/cases/cli-help/0001/expout +++ b/go/regtest/cases/cli-help/0001/expout @@ -245,7 +245,7 @@ More example filter expressions: Using 'any' higher-order function to see if $index is 10, 20, or 30: 'any([10,20,30], func(e) {return $index == e})' -See also https://johnkerl.org/miller6/reference-dsl for more context. +See also https://miller.readthedocs.io/reference-dsl for more context. ================================================================ flatten @@ -683,7 +683,7 @@ More example put expressions: end{emitf @min, @max} ' -See also https://johnkerl.org/miller6/reference-dsl for more context. +See also https://miller.readthedocs.io/reference-dsl for more context. 
================================================================ regularize diff --git a/go/src/auxents/repl/prompt.go b/go/src/auxents/repl/prompt.go index 31529296f..77c693c8c 100644 --- a/go/src/auxents/repl/prompt.go +++ b/go/src/auxents/repl/prompt.go @@ -51,7 +51,7 @@ func getPrompt2() string { func (repl *Repl) printStartupBanner() { if repl.inputIsTerminal { fmt.Printf("Miller %s REPL for %s:%s:%s\n", version.STRING, runtime.GOOS, runtime.GOARCH, runtime.Version()) - fmt.Printf("Pre-release docs for Miller 6: %s\n", lib.DOC_URL) + fmt.Printf("Docs: %s\n", lib.DOC_URL) fmt.Printf("Type ':h' or ':help' for online help; ':q' or ':quit' to quit.\n") } } diff --git a/go/src/input/record_reader_csv.go b/go/src/input/record_reader_csv.go index 78afceaa6..36b629094 100644 --- a/go/src/input/record_reader_csv.go +++ b/go/src/input/record_reader_csv.go @@ -23,7 +23,7 @@ type RecordReaderCSV struct { // ---------------------------------------------------------------- func NewRecordReaderCSV(readerOptions *cli.TReaderOptions) (*RecordReaderCSV, error) { if readerOptions.IRS != "\n" { - return nil, errors.New("CSV IRS can only be newline") + return nil, errors.New("CSV IRS can only be newline; LF vs CR/LF is autodetected.") } if len(readerOptions.IFS) != 1 { return nil, errors.New("CSV IFS can only be a single character") diff --git a/go/src/lib/docurl.go b/go/src/lib/docurl.go index 5d74876f7..d3cb2f386 100644 --- a/go/src/lib/docurl.go +++ b/go/src/lib/docurl.go @@ -1,6 +1,3 @@ package lib -// DOC_URL is for the current location of Miller 6 docs on the web. -// Miller 5 is released and its docs are at https://miller.readthedocs.io/en/latest. -// Miller 6 is pre-release and its doc are at the following location. 
-const DOC_URL = "https://johnkerl.org/miller6" +const DOC_URL = "https://miller.readthedocs.io" diff --git a/go/todo.txt b/go/todo.txt index b83cc9f8f..65aced04a 100644 --- a/go/todo.txt +++ b/go/todo.txt @@ -1,10 +1,28 @@ ================================================================ PUNCHDOWN LIST -* ./configure equivalent - o make: - - windoc note 'choco install make' - - (works in GH CI due to their toolchain) +* plan: + o blockers: + - fractional-strptime + - cmp-matrices + - all-contribs + - license triple-checks + - ./configure --prefix + ? alpha? + - csv irs lf/crlf ignores -- ? already is so? + - `mlr put` -> coverart + o doc / release: + - ? auto-cp go/mlr // go/mlr.exe to basedir? + - auto-path (../go/mlr) in docs dir ... + - brew as first trial -- ? + > brew macports chocolatey + ubuntu debian fedora gentoo prolinux archlinux + netbsd freebsd + +* post-release: + w installing-miller.md.in + w build.md.in developer/release notes + ? RTD -> GP -- ? ? twi-dm re all-contribs: all-contributors.org * nikos materials -> fold in @@ -46,7 +64,6 @@ PUNCHDOWN LIST o TODO in *.go & *.mi o release notes per se o ./configure whatever equivalent - o readthedocs -- find out what's necessary to get per-version history * doc o new-in-miller-6: missings: diff --git a/man/manpage.txt b/man/manpage.txt index 085886266..085790da2 100644 --- a/man/manpage.txt +++ b/man/manpage.txt @@ -19,7 +19,7 @@ SYNOPSIS example.csv Please see 'mlr help topics' for more information. Please also see - https://johnkerl.org/miller6 + https://miller.readthedocs.io DESCRIPTION @@ -998,7 +998,7 @@ VERBS Using 'any' higher-order function to see if $index is 10, 20, or 30: 'any([10,20,30], func(e) {return $index == e})' - See also https://johnkerl.org/miller6/reference-dsl for more context. + See also https://miller.readthedocs.io/reference-dsl for more context. 
flatten Usage: mlr flatten [options] @@ -1416,7 +1416,7 @@ VERBS end{emitf @min, @max} ' - See also https://johnkerl.org/miller6/reference-dsl for more context. + See also https://miller.readthedocs.io/reference-dsl for more context. regularize Usage: mlr regularize [options] @@ -2671,7 +2671,7 @@ KEYWORDS FOR PUT AND FILTER Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit > stderr, @*, "index1", "index2"' Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "grep somepattern", @*, "index1", "index2"' - Please see https://johnkerl.org/miller6://johnkerl.org/miller/doc for more information. + Please see https://miller.readthedocs.io://johnkerl.org/miller/doc for more information. emitf emitf: inserts non-indexed out-of-stream variable(s) side-by-side into the @@ -2699,7 +2699,7 @@ KEYWORDS FOR PUT AND FILTER Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern", @a, @b, @c' Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern > mytap.dat", @a, @b, @c' - Please see https://johnkerl.org/miller6://johnkerl.org/miller/doc for more information. + Please see https://miller.readthedocs.io://johnkerl.org/miller/doc for more information. emitp emitp: inserts an out-of-stream variable into the output record stream. @@ -2729,7 +2729,7 @@ KEYWORDS FOR PUT AND FILTER Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp > stderr, @*, "index1", "index2"' Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "grep somepattern", @*, "index1", "index2"' - Please see https://johnkerl.org/miller6://johnkerl.org/miller/doc for more information. + Please see https://miller.readthedocs.io://johnkerl.org/miller/doc for more information. 
end end: defines a block of statements to be executed after input records @@ -2954,4 +2954,4 @@ SEE ALSO - 2021-11-05 MILLER(1) + 2021-11-06 MILLER(1) diff --git a/man/mlr.1 b/man/mlr.1 index 45276b43d..61fd0858f 100644 --- a/man/mlr.1 +++ b/man/mlr.1 @@ -2,12 +2,12 @@ .\" Title: mlr .\" Author: [see the "AUTHOR" section] .\" Generator: ./mkman.rb -.\" Date: 2021-11-05 +.\" Date: 2021-11-06 .\" Manual: \ \& .\" Source: \ \& .\" Language: English .\" -.TH "MILLER" "1" "2021-11-05" "\ \&" "\ \&" +.TH "MILLER" "1" "2021-11-06" "\ \&" "\ \&" .\" ----------------------------------------------------------------- .\" * Portability definitions .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -38,7 +38,7 @@ Output of one verb may be chained as input to another using "then", e.g. mlr --csv stats1 -a min,mean,max -f quantity then sort -f color example.csv Please see 'mlr help topics' for more information. -Please also see https://johnkerl.org/miller6 +Please also see https://miller.readthedocs.io .SH "DESCRIPTION" .sp @@ -1245,7 +1245,7 @@ More example filter expressions: Using 'any' higher-order function to see if $index is 10, 20, or 30: 'any([10,20,30], func(e) {return $index == e})' -See also https://johnkerl.org/miller6/reference-dsl for more context. +See also https://miller.readthedocs.io/reference-dsl for more context. .fi .if n \{\ .RE @@ -1783,7 +1783,7 @@ More example put expressions: end{emitf @min, @max} ' -See also https://johnkerl.org/miller6/reference-dsl for more context. +See also https://miller.readthedocs.io/reference-dsl for more context. .fi .if n \{\ .RE @@ -4488,7 +4488,7 @@ etc., to control the format of the output if the output is redirected. See also Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit > stderr, @*, "index1", "index2"' Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "grep somepattern", @*, "index1", "index2"' -Please see https://johnkerl.org/miller6://johnkerl.org/miller/doc for more information. 
+Please see https://miller.readthedocs.io://johnkerl.org/miller/doc for more information. .fi .if n \{\ .RE @@ -4522,7 +4522,7 @@ etc., to control the format of the output if the output is redirected. See also Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern", @a, @b, @c' Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern > mytap.dat", @a, @b, @c' -Please see https://johnkerl.org/miller6://johnkerl.org/miller/doc for more information. +Please see https://miller.readthedocs.io://johnkerl.org/miller/doc for more information. .fi .if n \{\ .RE @@ -4558,7 +4558,7 @@ etc., to control the format of the output if the output is redirected. See also Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp > stderr, @*, "index1", "index2"' Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "grep somepattern", @*, "index1", "index2"' -Please see https://johnkerl.org/miller6://johnkerl.org/miller/doc for more information. +Please see https://miller.readthedocs.io://johnkerl.org/miller/doc for more information. .fi .if n \{\ .RE