From 38d7de545db5148d7dd66d56cafa33a1ab9e9ab3 Mon Sep 17 00:00:00 2001 From: John Kerl Date: Sat, 4 Mar 2023 00:15:01 -0500 Subject: [PATCH] Docs re `tail -f` and `--records-per-batch 1` (#1218) --- docs/src/manpage.md | 20 ++++++++++++-------- docs/src/manpage.txt | 20 ++++++++++++-------- docs/src/new-in-miller-6.md | 2 +- docs/src/new-in-miller-6.md.in | 2 +- docs/src/reference-verbs.md | 18 +++++++++++------- docs/src/streaming-and-memory.md | 12 ++++++------ docs/src/streaming-and-memory.md.in | 12 ++++++------ internal/pkg/transformers/nest.go | 5 +++-- internal/pkg/transformers/reshape.go | 3 ++- internal/pkg/transformers/stats1.go | 7 ++++--- internal/pkg/transformers/stats2.go | 5 +++-- man/manpage.txt | 20 ++++++++++++-------- man/mlr.1 | 22 +++++++++++++--------- test/cases/cli-help/0001/expout | 18 +++++++++++------- 14 files changed, 97 insertions(+), 69 deletions(-) diff --git a/docs/src/manpage.md b/docs/src/manpage.md index a7ddaccde..1e31d4269 100644 --- a/docs/src/manpage.md +++ b/docs/src/manpage.md @@ -1450,8 +1450,9 @@ MILLER(1) MILLER(1) been lost. * The combination "--implode --values --across-records" is non-streaming: no output records are produced until all input records have been read. In - particular, this means it won't work in tail -f contexts. But all other flag - combinations result in streaming (tail -f friendly) data processing. + particular, this means it won't work in `tail -f` contexts. But all other flag + combinations result in streaming (`tail -f` friendly) data processing. + If input is coming from `tail -f`, be sure to use `--records-per-batch 1`. * It's up to you to ensure that the nested-fs is distinct from your data's IFS: e.g. by default the former is semicolon and the latter is comma. See also mlr reshape. @@ -1633,7 +1634,8 @@ MILLER(1) MILLER(1) Note: if you have multiple regexes, please specify them using multiple -r, since regexes can contain commas within them. 
Note: this works with tail -f and produces output records for each input - record seen. + record seen. If input is coming from `tail -f`, be sure to use + `--records-per-batch 1`. Long-to-wide options: -s {key-field name,value-field name} These pivot/reshape the input data to undo the wide-to-long operation. @@ -1858,9 +1860,10 @@ MILLER(1) MILLER(1) -i Use interpolated percentiles, like R's type=7; default like type=1. Not sensical for string-valued fields.\n"); - -s Print iterative stats. Useful in tail -f contexts (in which + -s Print iterative stats. Useful in tail -f contexts, in which case please avoid pprint-format output since end of input - stream will never be seen). + stream will never be seen. Likewise, if input is coming from `tail -f` + be sure to use `--records-per-batch 1`. -h|--help Show this message. Example: mlr stats1 -a min,p10,p50,p90,max -f value -g size,shape Example: mlr stats1 -a count,mode -f size @@ -1896,9 +1899,10 @@ MILLER(1) MILLER(1) There must be an even number of names. -g {e,f,g} Optional group-by-field names. -v Print additional output for linreg-pca. - -s Print iterative stats. Useful in tail -f contexts (in which + -s Print iterative stats. Useful in tail -f contexts, in which case please avoid pprint-format output since end of input - stream will never be seen). + stream will never be seen. Likewise, if input is coming from + `tail -f`, be sure to use `--records-per-batch 1`. --fit Rather than printing regression parameters, applies them to the input data to compute new fit fields. All input records are held in memory until end of input stream. Has effect only for @@ -3314,5 +3318,5 @@ MILLER(1) MILLER(1) - 2023-03-02 MILLER(1) + 2023-03-04 MILLER(1) diff --git a/docs/src/manpage.txt b/docs/src/manpage.txt index ef001ac32..9651b66a1 100644 --- a/docs/src/manpage.txt +++ b/docs/src/manpage.txt @@ -1429,8 +1429,9 @@ MILLER(1) MILLER(1) been lost. 
* The combination "--implode --values --across-records" is non-streaming: no output records are produced until all input records have been read. In - particular, this means it won't work in tail -f contexts. But all other flag - combinations result in streaming (tail -f friendly) data processing. + particular, this means it won't work in `tail -f` contexts. But all other flag + combinations result in streaming (`tail -f` friendly) data processing. + If input is coming from `tail -f`, be sure to use `--records-per-batch 1`. * It's up to you to ensure that the nested-fs is distinct from your data's IFS: e.g. by default the former is semicolon and the latter is comma. See also mlr reshape. @@ -1612,7 +1613,8 @@ MILLER(1) MILLER(1) Note: if you have multiple regexes, please specify them using multiple -r, since regexes can contain commas within them. Note: this works with tail -f and produces output records for each input - record seen. + record seen. If input is coming from `tail -f`, be sure to use + `--records-per-batch 1`. Long-to-wide options: -s {key-field name,value-field name} These pivot/reshape the input data to undo the wide-to-long operation. @@ -1837,9 +1839,10 @@ MILLER(1) MILLER(1) -i Use interpolated percentiles, like R's type=7; default like type=1. Not sensical for string-valued fields.\n"); - -s Print iterative stats. Useful in tail -f contexts (in which + -s Print iterative stats. Useful in tail -f contexts, in which case please avoid pprint-format output since end of input - stream will never be seen). + stream will never be seen. Likewise, if input is coming from `tail -f` + be sure to use `--records-per-batch 1`. -h|--help Show this message. Example: mlr stats1 -a min,p10,p50,p90,max -f value -g size,shape Example: mlr stats1 -a count,mode -f size @@ -1875,9 +1878,10 @@ MILLER(1) MILLER(1) There must be an even number of names. -g {e,f,g} Optional group-by-field names. -v Print additional output for linreg-pca. - -s Print iterative stats. 
Useful in tail -f contexts (in which + -s Print iterative stats. Useful in tail -f contexts, in which case please avoid pprint-format output since end of input - stream will never be seen). + stream will never be seen. Likewise, if input is coming from + `tail -f`, be sure to use `--records-per-batch 1`. --fit Rather than printing regression parameters, applies them to the input data to compute new fit fields. All input records are held in memory until end of input stream. Has effect only for @@ -3293,4 +3297,4 @@ MILLER(1) MILLER(1) - 2023-03-02 MILLER(1) + 2023-03-04 MILLER(1) diff --git a/docs/src/new-in-miller-6.md b/docs/src/new-in-miller-6.md index 13e17a06f..3170819c9 100644 --- a/docs/src/new-in-miller-6.md +++ b/docs/src/new-in-miller-6.md @@ -178,7 +178,7 @@ now a keyword so this is no longer usable as a local-variable or UDF name.) JSON support is improved: * Direct support for arrays means that you can now use Miller to process more JSON files than ever before. -* Streamable JSON parsing: Miller's internal record-processing pipeline starts as soon as the first record is read (which was already the case for other file formats). This means that, unless records are wrapped with outermost `[...]`, Miller now handles JSON / JSON Lines in `tail -f` contexts like it does for other file formats. +* Streamable JSON parsing: Miller's internal record-processing pipeline starts as soon as the first record is read (which was already the case for other file formats). This means that, unless records are wrapped with outermost `[...]`, Miller now handles JSON / JSON Lines in `tail -f` contexts like it does for other file formats. (Note: use `--records-per-batch 1` for `tail -f` input.) * Flatten/unflatten -- conversion of JSON nested data structures (arrays and/or maps in record values) to/from non-JSON formats is a powerful new feature, discussed in the page [Flatten/unflatten: JSON vs. tabular formats](flatten-unflatten.md).
* Since types are better handled now, the workaround flags `--jvquoteall` and `--jknquoteint` no longer have meaning -- although they're accepted as no-ops at the command line for backward compatibility. * Update: `--jvquoteall` was restored shortly after 6he 6.4.0 release. diff --git a/docs/src/new-in-miller-6.md.in b/docs/src/new-in-miller-6.md.in index 8b0b7f3fe..43ea44d90 100644 --- a/docs/src/new-in-miller-6.md.in +++ b/docs/src/new-in-miller-6.md.in @@ -147,7 +147,7 @@ now a keyword so this is no longer usable as a local-variable or UDF name.) JSON support is improved: * Direct support for arrays means that you can now use Miller to process more JSON files than ever before. -* Streamable JSON parsing: Miller's internal record-processing pipeline starts as soon as the first record is read (which was already the case for other file formats). This means that, unless records are wrapped with outermost `[...]`, Miller now handles JSON / JSON Lines in `tail -f` contexts like it does for other file formats. +* Streamable JSON parsing: Miller's internal record-processing pipeline starts as soon as the first record is read (which was already the case for other file formats). This means that, unless records are wrapped with outermost `[...]`, Miller now handles JSON / JSON Lines in `tail -f` contexts like it does for other file formats. (Note: use `--records-per-batch 1` for `tail -f` input.) * Flatten/unflatten -- conversion of JSON nested data structures (arrays and/or maps in record values) to/from non-JSON formats is a powerful new feature, discussed in the page [Flatten/unflatten: JSON vs. tabular formats](flatten-unflatten.md). * Since types are better handled now, the workaround flags `--jvquoteall` and `--jknquoteint` no longer have meaning -- although they're accepted as no-ops at the command line for backward compatibility. * Update: `--jvquoteall` was restored shortly after 6he 6.4.0 release.
diff --git a/docs/src/reference-verbs.md b/docs/src/reference-verbs.md index 6bc3fd8bf..853adfc33 100644 --- a/docs/src/reference-verbs.md +++ b/docs/src/reference-verbs.md @@ -2175,8 +2175,9 @@ Notes: been lost. * The combination "--implode --values --across-records" is non-streaming: no output records are produced until all input records have been read. In - particular, this means it won't work in tail -f contexts. But all other flag - combinations result in streaming (tail -f friendly) data processing. + particular, this means it won't work in `tail -f` contexts. But all other flag + combinations result in streaming (`tail -f` friendly) data processing. + If input is coming from `tail -f`, be sure to use `--records-per-batch 1`. * It's up to you to ensure that the nested-fs is distinct from your data's IFS: e.g. by default the former is semicolon and the latter is comma. See also mlr reshape. @@ -2561,7 +2562,8 @@ Wide-to-long options: Note: if you have multiple regexes, please specify them using multiple -r, since regexes can contain commas within them. Note: this works with tail -f and produces output records for each input - record seen. + record seen. If input is coming from `tail -f`, be sure to use + `--records-per-batch 1`. Long-to-wide options: -s {key-field name,value-field name} These pivot/reshape the input data to undo the wide-to-long operation. @@ -3118,9 +3120,10 @@ Options: -i Use interpolated percentiles, like R's type=7; default like type=1. Not sensical for string-valued fields.\n"); --s Print iterative stats. Useful in tail -f contexts (in which +-s Print iterative stats. Useful in tail -f contexts, in which case please avoid pprint-format output since end of input - stream will never be seen). + stream will never be seen. Likewise, if input is coming from `tail -f` + be sure to use `--records-per-batch 1`. -h|--help Show this message. 
Example: mlr stats1 -a min,p10,p50,p90,max -f value -g size,shape Example: mlr stats1 -a count,mode -f size @@ -3235,9 +3238,10 @@ accumulated across the input record stream. There must be an even number of names. -g {e,f,g} Optional group-by-field names. -v Print additional output for linreg-pca. --s Print iterative stats. Useful in tail -f contexts (in which +-s Print iterative stats. Useful in tail -f contexts, in which case please avoid pprint-format output since end of input - stream will never be seen). + stream will never be seen. Likewise, if input is coming from + `tail -f`, be sure to use `--records-per-batch 1`. --fit Rather than printing regression parameters, applies them to the input data to compute new fit fields. All input records are held in memory until end of input stream. Has effect only for diff --git a/docs/src/streaming-and-memory.md b/docs/src/streaming-and-memory.md index 0404ae9df..01020b876 100644 --- a/docs/src/streaming-and-memory.md +++ b/docs/src/streaming-and-memory.md @@ -33,12 +33,12 @@ rather than looping through them explicitly. Since Miller takes the streaming approach when possible (see below for exceptions), you can often operate on files which are larger than your system's -memory . It also means you can do `tail -f some-file | mlr --some-flags` and -Miller will operate on records as they arrive one at a time. You don't have to -wait for and end-of-file marker (which never arrives with `tail-f`) to start -seeing partial results. This also means if you pipe Miller's output to other -streaming tools (like `cat`, `grep`, `sed`, and so on), they will also output -partial results as data arrives. +memory. It also means you can do `tail -f some-file | mlr --records-per-batch 1 +--some-flags` and Miller will operate on records as they arrive one at a time. +You don't have to wait for an end-of-file marker (which never arrives with +`tail -f`) to start seeing partial results.
This also means if you pipe Miller's +output to other streaming tools (like `cat`, `grep`, `sed`, and so on), they +will also output partial results as data arrives. The statements in the [Miller programming language](miller-programming-language.md) (outside of optional `begin`/`end` blocks which execute before and after all diff --git a/docs/src/streaming-and-memory.md.in b/docs/src/streaming-and-memory.md.in index 45b9754eb..616a0cc52 100644 --- a/docs/src/streaming-and-memory.md.in +++ b/docs/src/streaming-and-memory.md.in @@ -17,12 +17,12 @@ rather than looping through them explicitly. Since Miller takes the streaming approach when possible (see below for exceptions), you can often operate on files which are larger than your system's -memory . It also means you can do `tail -f some-file | mlr --some-flags` and -Miller will operate on records as they arrive one at a time. You don't have to -wait for and end-of-file marker (which never arrives with `tail-f`) to start -seeing partial results. This also means if you pipe Miller's output to other -streaming tools (like `cat`, `grep`, `sed`, and so on), they will also output -partial results as data arrives. +memory. It also means you can do `tail -f some-file | mlr --records-per-batch 1 +--some-flags` and Miller will operate on records as they arrive one at a time. +You don't have to wait for an end-of-file marker (which never arrives with +`tail -f`) to start seeing partial results. This also means if you pipe Miller's +output to other streaming tools (like `cat`, `grep`, `sed`, and so on), they +will also output partial results as data arrives.
The statements in the [Miller programming language](miller-programming-language.md) (outside of optional `begin`/`end` blocks which execute before and after all diff --git a/internal/pkg/transformers/nest.go b/internal/pkg/transformers/nest.go index 55ea4cc9b..29a034989 100644 --- a/internal/pkg/transformers/nest.go +++ b/internal/pkg/transformers/nest.go @@ -80,8 +80,9 @@ func transformerNestUsage( fmt.Fprintf(o, " been lost.\n") fmt.Fprintf(o, "* The combination \"--implode --values --across-records\" is non-streaming:\n") fmt.Fprintf(o, " no output records are produced until all input records have been read. In\n") - fmt.Fprintf(o, " particular, this means it won't work in tail -f contexts. But all other flag\n") - fmt.Fprintf(o, " combinations result in streaming (tail -f friendly) data processing.\n") + fmt.Fprintf(o, " particular, this means it won't work in `tail -f` contexts. But all other flag\n") + fmt.Fprintf(o, " combinations result in streaming (`tail -f` friendly) data processing.\n") + fmt.Fprintf(o, " If input is coming from `tail -f`, be sure to use `--records-per-batch 1`.\n") fmt.Fprintf(o, "* It's up to you to ensure that the nested-fs is distinct from your data's IFS:\n") fmt.Fprintf(o, " e.g. by default the former is semicolon and the latter is comma.\n") fmt.Fprintf(o, "See also %s reshape.\n", argv0) diff --git a/internal/pkg/transformers/reshape.go b/internal/pkg/transformers/reshape.go index 49f7048ff..f82c16dab 100644 --- a/internal/pkg/transformers/reshape.go +++ b/internal/pkg/transformers/reshape.go @@ -66,7 +66,8 @@ func transformerReshapeUsage( fmt.Fprintf(o, " Note: if you have multiple regexes, please specify them using multiple -r,\n") fmt.Fprintf(o, " since regexes can contain commas within them.\n") fmt.Fprintf(o, " Note: this works with tail -f and produces output records for each input\n") - fmt.Fprintf(o, " record seen.\n") + fmt.Fprintf(o, " record seen. 
If input is coming from `tail -f`, be sure to use\n") + fmt.Fprintf(o, " `--records-per-batch 1`.\n") fmt.Fprintf(o, "Long-to-wide options:\n") fmt.Fprintf(o, " -s {key-field name,value-field name}\n") fmt.Fprintf(o, " These pivot/reshape the input data to undo the wide-to-long operation.\n") diff --git a/internal/pkg/transformers/stats1.go b/internal/pkg/transformers/stats1.go index 94702fee1..ade693989 100644 --- a/internal/pkg/transformers/stats1.go +++ b/internal/pkg/transformers/stats1.go @@ -55,11 +55,12 @@ Options: -i Use interpolated percentiles, like R's type=7; default like type=1. Not sensical for string-valued fields.\n"); --s Print iterative stats. Useful in tail -f contexts (in which +-s Print iterative stats. Useful in tail -f contexts, in which case please avoid pprint-format output since end of input - stream will never be seen). --h|--help Show this message. `) + fmt.Fprintln(o, " stream will never be seen. Likewise, if input is coming from `tail -f`") + fmt.Fprintln(o, " be sure to use `--records-per-batch 1`.") + fmt.Fprintln(o, "-h|--help Show this message.") fmt.Fprintln(o, "Example: mlr stats1 -a min,p10,p50,p90,max -f value -g size,shape") diff --git a/internal/pkg/transformers/stats2.go b/internal/pkg/transformers/stats2.go index 0fd40dbab..c8f163911 100644 --- a/internal/pkg/transformers/stats2.go +++ b/internal/pkg/transformers/stats2.go @@ -44,9 +44,10 @@ func transformerStats2Usage( fmt.Fprintf(o, " There must be an even number of names.\n") fmt.Fprintf(o, "-g {e,f,g} Optional group-by-field names.\n") fmt.Fprintf(o, "-v Print additional output for linreg-pca.\n") - fmt.Fprintf(o, "-s Print iterative stats. Useful in tail -f contexts (in which\n") + fmt.Fprintf(o, "-s Print iterative stats. Useful in tail -f contexts, in which\n") fmt.Fprintf(o, " case please avoid pprint-format output since end of input\n") - fmt.Fprintf(o, " stream will never be seen).\n") + fmt.Fprintf(o, " stream will never be seen. 
Likewise, if input is coming from\n") + fmt.Fprintf(o, " `tail -f`, be sure to use `--records-per-batch 1`.\n") fmt.Fprintf(o, "--fit Rather than printing regression parameters, applies them to\n") fmt.Fprintf(o, " the input data to compute new fit fields. All input records are\n") fmt.Fprintf(o, " held in memory until end of input stream. Has effect only for\n") diff --git a/man/manpage.txt b/man/manpage.txt index ef001ac32..9651b66a1 100644 --- a/man/manpage.txt +++ b/man/manpage.txt @@ -1429,8 +1429,9 @@ MILLER(1) MILLER(1) been lost. * The combination "--implode --values --across-records" is non-streaming: no output records are produced until all input records have been read. In - particular, this means it won't work in tail -f contexts. But all other flag - combinations result in streaming (tail -f friendly) data processing. + particular, this means it won't work in `tail -f` contexts. But all other flag + combinations result in streaming (`tail -f` friendly) data processing. + If input is coming from `tail -f`, be sure to use `--records-per-batch 1`. * It's up to you to ensure that the nested-fs is distinct from your data's IFS: e.g. by default the former is semicolon and the latter is comma. See also mlr reshape. @@ -1612,7 +1613,8 @@ MILLER(1) MILLER(1) Note: if you have multiple regexes, please specify them using multiple -r, since regexes can contain commas within them. Note: this works with tail -f and produces output records for each input - record seen. + record seen. If input is coming from `tail -f`, be sure to use + `--records-per-batch 1`. Long-to-wide options: -s {key-field name,value-field name} These pivot/reshape the input data to undo the wide-to-long operation. @@ -1837,9 +1839,10 @@ MILLER(1) MILLER(1) -i Use interpolated percentiles, like R's type=7; default like type=1. Not sensical for string-valued fields.\n"); - -s Print iterative stats. Useful in tail -f contexts (in which + -s Print iterative stats. 
Useful in tail -f contexts, in which case please avoid pprint-format output since end of input - stream will never be seen). + stream will never be seen. Likewise, if input is coming from `tail -f` + be sure to use `--records-per-batch 1`. -h|--help Show this message. Example: mlr stats1 -a min,p10,p50,p90,max -f value -g size,shape Example: mlr stats1 -a count,mode -f size @@ -1875,9 +1878,10 @@ MILLER(1) MILLER(1) There must be an even number of names. -g {e,f,g} Optional group-by-field names. -v Print additional output for linreg-pca. - -s Print iterative stats. Useful in tail -f contexts (in which + -s Print iterative stats. Useful in tail -f contexts, in which case please avoid pprint-format output since end of input - stream will never be seen). + stream will never be seen. Likewise, if input is coming from + `tail -f`, be sure to use `--records-per-batch 1`. --fit Rather than printing regression parameters, applies them to the input data to compute new fit fields. All input records are held in memory until end of input stream. Has effect only for @@ -3293,4 +3297,4 @@ MILLER(1) MILLER(1) - 2023-03-02 MILLER(1) + 2023-03-04 MILLER(1) diff --git a/man/mlr.1 b/man/mlr.1 index 6f5fe38fe..ff9da5df7 100644 --- a/man/mlr.1 +++ b/man/mlr.1 @@ -2,12 +2,12 @@ .\" Title: mlr .\" Author: [see the "AUTHOR" section] .\" Generator: ./mkman.rb -.\" Date: 2023-03-02 +.\" Date: 2023-03-04 .\" Manual: \ \& .\" Source: \ \& .\" Language: English .\" -.TH "MILLER" "1" "2023-03-02" "\ \&" "\ \&" +.TH "MILLER" "1" "2023-03-04" "\ \&" "\ \&" .\" ----------------------------------------------------------------- .\" * Portability definitions .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -1798,8 +1798,9 @@ Notes: been lost. * The combination "--implode --values --across-records" is non-streaming: no output records are produced until all input records have been read. In - particular, this means it won't work in tail -f contexts. 
But all other flag - combinations result in streaming (tail -f friendly) data processing. + particular, this means it won't work in `tail -f` contexts. But all other flag + combinations result in streaming (`tail -f` friendly) data processing. + If input is coming from `tail -f`, be sure to use `--records-per-batch 1`. * It's up to you to ensure that the nested-fs is distinct from your data's IFS: e.g. by default the former is semicolon and the latter is comma. See also mlr reshape. @@ -2029,7 +2030,8 @@ Wide-to-long options: Note: if you have multiple regexes, please specify them using multiple -r, since regexes can contain commas within them. Note: this works with tail -f and produces output records for each input - record seen. + record seen. If input is coming from `tail -f`, be sure to use + `--records-per-batch 1`. Long-to-wide options: -s {key-field name,value-field name} These pivot/reshape the input data to undo the wide-to-long operation. @@ -2314,9 +2316,10 @@ Options: -i Use interpolated percentiles, like R's type=7; default like type=1. Not sensical for string-valued fields.\en"); --s Print iterative stats. Useful in tail -f contexts (in which +-s Print iterative stats. Useful in tail -f contexts, in which case please avoid pprint-format output since end of input - stream will never be seen). + stream will never be seen. Likewise, if input is coming from `tail -f` + be sure to use `--records-per-batch 1`. -h|--help Show this message. Example: mlr stats1 -a min,p10,p50,p90,max -f value -g size,shape Example: mlr stats1 -a count,mode -f size @@ -2358,9 +2361,10 @@ accumulated across the input record stream. There must be an even number of names. -g {e,f,g} Optional group-by-field names. -v Print additional output for linreg-pca. --s Print iterative stats. Useful in tail -f contexts (in which +-s Print iterative stats. Useful in tail -f contexts, in which case please avoid pprint-format output since end of input - stream will never be seen). 
+ stream will never be seen. Likewise, if input is coming from + `tail -f`, be sure to use `--records-per-batch 1`. --fit Rather than printing regression parameters, applies them to the input data to compute new fit fields. All input records are held in memory until end of input stream. Has effect only for diff --git a/test/cases/cli-help/0001/expout b/test/cases/cli-help/0001/expout index 1c0795f0b..100f50b19 100644 --- a/test/cases/cli-help/0001/expout +++ b/test/cases/cli-help/0001/expout @@ -609,8 +609,9 @@ Notes: been lost. * The combination "--implode --values --across-records" is non-streaming: no output records are produced until all input records have been read. In - particular, this means it won't work in tail -f contexts. But all other flag - combinations result in streaming (tail -f friendly) data processing. + particular, this means it won't work in `tail -f` contexts. But all other flag + combinations result in streaming (`tail -f` friendly) data processing. + If input is coming from `tail -f`, be sure to use `--records-per-batch 1`. * It's up to you to ensure that the nested-fs is distinct from your data's IFS: e.g. by default the former is semicolon and the latter is comma. See also mlr reshape. @@ -800,7 +801,8 @@ Wide-to-long options: Note: if you have multiple regexes, please specify them using multiple -r, since regexes can contain commas within them. Note: this works with tail -f and produces output records for each input - record seen. + record seen. If input is coming from `tail -f`, be sure to use + `--records-per-batch 1`. Long-to-wide options: -s {key-field name,value-field name} These pivot/reshape the input data to undo the wide-to-long operation. @@ -1035,9 +1037,10 @@ Options: -i Use interpolated percentiles, like R's type=7; default like type=1. Not sensical for string-valued fields.\n"); --s Print iterative stats. Useful in tail -f contexts (in which +-s Print iterative stats. 
Useful in tail -f contexts, in which case please avoid pprint-format output since end of input - stream will never be seen). + stream will never be seen. Likewise, if input is coming from `tail -f` + be sure to use `--records-per-batch 1`. -h|--help Show this message. Example: mlr stats1 -a min,p10,p50,p90,max -f value -g size,shape Example: mlr stats1 -a count,mode -f size @@ -1074,9 +1077,10 @@ accumulated across the input record stream. There must be an even number of names. -g {e,f,g} Optional group-by-field names. -v Print additional output for linreg-pca. --s Print iterative stats. Useful in tail -f contexts (in which +-s Print iterative stats. Useful in tail -f contexts, in which case please avoid pprint-format output since end of input - stream will never be seen). + stream will never be seen. Likewise, if input is coming from + `tail -f`, be sure to use `--records-per-batch 1`. --fit Rather than printing regression parameters, applies them to the input data to compute new fit fields. All input records are held in memory until end of input stream. Has effect only for