Docs re tail -f and --records-per-batch 1 (#1218)

John Kerl 2023-03-04 00:15:01 -05:00 committed by GitHub
parent e10a0d35ae
commit 38d7de545d
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
14 changed files with 97 additions and 69 deletions


@@ -1450,8 +1450,9 @@ MILLER(1) MILLER(1)
 been lost.
 * The combination "--implode --values --across-records" is non-streaming:
 no output records are produced until all input records have been read. In
-particular, this means it won't work in tail -f contexts. But all other flag
-combinations result in streaming (tail -f friendly) data processing.
+particular, this means it won't work in `tail -f` contexts. But all other flag
+combinations result in streaming (`tail -f` friendly) data processing.
+If input is coming from `tail -f`, be sure to use `--records-per-batch 1`.
 * It's up to you to ensure that the nested-fs is distinct from your data's IFS:
 e.g. by default the former is semicolon and the latter is comma.
 See also mlr reshape.
@@ -1633,7 +1634,8 @@ MILLER(1) MILLER(1)
 Note: if you have multiple regexes, please specify them using multiple -r,
 since regexes can contain commas within them.
 Note: this works with tail -f and produces output records for each input
-record seen.
+record seen. If input is coming from `tail -f`, be sure to use
+`--records-per-batch 1`.
 Long-to-wide options:
 -s {key-field name,value-field name}
 These pivot/reshape the input data to undo the wide-to-long operation.
@@ -1858,9 +1860,10 @@ MILLER(1) MILLER(1)
 -i Use interpolated percentiles, like R's type=7; default like type=1.
 Not sensical for string-valued fields.\n");
--s Print iterative stats. Useful in tail -f contexts (in which
+-s Print iterative stats. Useful in tail -f contexts, in which
 case please avoid pprint-format output since end of input
-stream will never be seen).
+stream will never be seen. Likewise, if input is coming from `tail -f`,
+be sure to use `--records-per-batch 1`.
 -h|--help Show this message.
 Example: mlr stats1 -a min,p10,p50,p90,max -f value -g size,shape
 Example: mlr stats1 -a count,mode -f size
@@ -1896,9 +1899,10 @@ MILLER(1) MILLER(1)
 There must be an even number of names.
 -g {e,f,g} Optional group-by-field names.
 -v Print additional output for linreg-pca.
--s Print iterative stats. Useful in tail -f contexts (in which
+-s Print iterative stats. Useful in tail -f contexts, in which
 case please avoid pprint-format output since end of input
-stream will never be seen).
+stream will never be seen. Likewise, if input is coming from
+`tail -f`, be sure to use `--records-per-batch 1`.
 --fit Rather than printing regression parameters, applies them to
 the input data to compute new fit fields. All input records are
 held in memory until end of input stream. Has effect only for
@@ -3314,5 +3318,5 @@ MILLER(1) MILLER(1)
-2023-03-02 MILLER(1)
+2023-03-04 MILLER(1)


@@ -1429,8 +1429,9 @@ MILLER(1) MILLER(1)
 been lost.
 * The combination "--implode --values --across-records" is non-streaming:
 no output records are produced until all input records have been read. In
-particular, this means it won't work in tail -f contexts. But all other flag
-combinations result in streaming (tail -f friendly) data processing.
+particular, this means it won't work in `tail -f` contexts. But all other flag
+combinations result in streaming (`tail -f` friendly) data processing.
+If input is coming from `tail -f`, be sure to use `--records-per-batch 1`.
 * It's up to you to ensure that the nested-fs is distinct from your data's IFS:
 e.g. by default the former is semicolon and the latter is comma.
 See also mlr reshape.
@@ -1612,7 +1613,8 @@ MILLER(1) MILLER(1)
 Note: if you have multiple regexes, please specify them using multiple -r,
 since regexes can contain commas within them.
 Note: this works with tail -f and produces output records for each input
-record seen.
+record seen. If input is coming from `tail -f`, be sure to use
+`--records-per-batch 1`.
 Long-to-wide options:
 -s {key-field name,value-field name}
 These pivot/reshape the input data to undo the wide-to-long operation.
@@ -1837,9 +1839,10 @@ MILLER(1) MILLER(1)
 -i Use interpolated percentiles, like R's type=7; default like type=1.
 Not sensical for string-valued fields.\n");
--s Print iterative stats. Useful in tail -f contexts (in which
+-s Print iterative stats. Useful in tail -f contexts, in which
 case please avoid pprint-format output since end of input
-stream will never be seen).
+stream will never be seen. Likewise, if input is coming from `tail -f`,
+be sure to use `--records-per-batch 1`.
 -h|--help Show this message.
 Example: mlr stats1 -a min,p10,p50,p90,max -f value -g size,shape
 Example: mlr stats1 -a count,mode -f size
@@ -1875,9 +1878,10 @@ MILLER(1) MILLER(1)
 There must be an even number of names.
 -g {e,f,g} Optional group-by-field names.
 -v Print additional output for linreg-pca.
--s Print iterative stats. Useful in tail -f contexts (in which
+-s Print iterative stats. Useful in tail -f contexts, in which
 case please avoid pprint-format output since end of input
-stream will never be seen).
+stream will never be seen. Likewise, if input is coming from
+`tail -f`, be sure to use `--records-per-batch 1`.
 --fit Rather than printing regression parameters, applies them to
 the input data to compute new fit fields. All input records are
 held in memory until end of input stream. Has effect only for
@@ -3293,4 +3297,4 @@ MILLER(1) MILLER(1)
-2023-03-02 MILLER(1)
+2023-03-04 MILLER(1)


@@ -178,7 +178,7 @@ now a keyword so this is no longer usable as a local-variable or UDF name.)
 JSON support is improved:
 * Direct support for arrays means that you can now use Miller to process more JSON files than ever before.
-* Streamable JSON parsing: Miller's internal record-processing pipeline starts as soon as the first record is read (which was already the case for other file formats). This means that, unless records are wrapped with outermost `[...]`, Miller now handles JSON / JSON Lines in `tail -f` contexts like it does for other file formats.
+* Streamable JSON parsing: Miller's internal record-processing pipeline starts as soon as the first record is read (which was already the case for other file formats). This means that, unless records are wrapped with outermost `[...]`, Miller now handles JSON / JSON Lines in `tail -f` contexts like it does for other file formats. (Note: use `--records-per-batch 1` for `tail -f` input.)
 * Flatten/unflatten -- conversion of JSON nested data structures (arrays and/or maps in record values) to/from non-JSON formats is a powerful new feature, discussed in the page [Flatten/unflatten: JSON vs. tabular formats](flatten-unflatten.md).
 * Since types are better handled now, the workaround flags `--jvquoteall` and `--jknquoteint` no longer have meaning -- although they're accepted as no-ops at the command line for backward compatibility.
 * Update: `--jvquoteall` was restored shortly after the 6.4.0 release.
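The streamable-JSON note above can be tried end to end. This is a sketch assuming Miller (`mlr`) is installed; the file and field names are hypothetical, and a finite file stands in for a followed log so the pipeline terminates:

```shell
# Write a small JSON Lines file (one record per line, no outermost [...]).
printf '{"x":1}\n{"x":2}\n' > events.jsonl

# In a live setting you would follow the file instead:
#   tail -f events.jsonl | mlr --ijsonl --ojsonl --records-per-batch 1 cat
mlr --ijsonl --ojsonl --records-per-batch 1 cat events.jsonl
```

With `--records-per-batch 1`, each record is forwarded through the pipeline as soon as it is parsed, rather than waiting for a batch to fill.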


@@ -147,7 +147,7 @@ now a keyword so this is no longer usable as a local-variable or UDF name.)
 JSON support is improved:
 * Direct support for arrays means that you can now use Miller to process more JSON files than ever before.
-* Streamable JSON parsing: Miller's internal record-processing pipeline starts as soon as the first record is read (which was already the case for other file formats). This means that, unless records are wrapped with outermost `[...]`, Miller now handles JSON / JSON Lines in `tail -f` contexts like it does for other file formats.
+* Streamable JSON parsing: Miller's internal record-processing pipeline starts as soon as the first record is read (which was already the case for other file formats). This means that, unless records are wrapped with outermost `[...]`, Miller now handles JSON / JSON Lines in `tail -f` contexts like it does for other file formats. (Note: use `--records-per-batch 1` for `tail -f` input.)
 * Flatten/unflatten -- conversion of JSON nested data structures (arrays and/or maps in record values) to/from non-JSON formats is a powerful new feature, discussed in the page [Flatten/unflatten: JSON vs. tabular formats](flatten-unflatten.md).
 * Since types are better handled now, the workaround flags `--jvquoteall` and `--jknquoteint` no longer have meaning -- although they're accepted as no-ops at the command line for backward compatibility.
 * Update: `--jvquoteall` was restored shortly after the 6.4.0 release.


@@ -2175,8 +2175,9 @@ Notes:
 been lost.
 * The combination "--implode --values --across-records" is non-streaming:
 no output records are produced until all input records have been read. In
-particular, this means it won't work in tail -f contexts. But all other flag
-combinations result in streaming (tail -f friendly) data processing.
+particular, this means it won't work in `tail -f` contexts. But all other flag
+combinations result in streaming (`tail -f` friendly) data processing.
+If input is coming from `tail -f`, be sure to use `--records-per-batch 1`.
 * It's up to you to ensure that the nested-fs is distinct from your data's IFS:
 e.g. by default the former is semicolon and the latter is comma.
 See also mlr reshape.
@@ -2561,7 +2562,8 @@ Wide-to-long options:
 Note: if you have multiple regexes, please specify them using multiple -r,
 since regexes can contain commas within them.
 Note: this works with tail -f and produces output records for each input
-record seen.
+record seen. If input is coming from `tail -f`, be sure to use
+`--records-per-batch 1`.
 Long-to-wide options:
 -s {key-field name,value-field name}
 These pivot/reshape the input data to undo the wide-to-long operation.
@@ -3118,9 +3120,10 @@ Options:
 -i Use interpolated percentiles, like R's type=7; default like type=1.
 Not sensical for string-valued fields.\n");
--s Print iterative stats. Useful in tail -f contexts (in which
+-s Print iterative stats. Useful in tail -f contexts, in which
 case please avoid pprint-format output since end of input
-stream will never be seen).
+stream will never be seen. Likewise, if input is coming from `tail -f`,
+be sure to use `--records-per-batch 1`.
 -h|--help Show this message.
 Example: mlr stats1 -a min,p10,p50,p90,max -f value -g size,shape
 Example: mlr stats1 -a count,mode -f size
@@ -3235,9 +3238,10 @@ accumulated across the input record stream.
 There must be an even number of names.
 -g {e,f,g} Optional group-by-field names.
 -v Print additional output for linreg-pca.
--s Print iterative stats. Useful in tail -f contexts (in which
+-s Print iterative stats. Useful in tail -f contexts, in which
 case please avoid pprint-format output since end of input
-stream will never be seen).
+stream will never be seen. Likewise, if input is coming from
+`tail -f`, be sure to use `--records-per-batch 1`.
 --fit Rather than printing regression parameters, applies them to
 the input data to compute new fit fields. All input records are
 held in memory until end of input stream. Has effect only for
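The `-s` (iterative-stats) flag documented in the hunks above can be sketched as follows, assuming `mlr` is installed; the field name `value` is hypothetical, and a finite pipe stands in for `tail -f` input so the command terminates:

```shell
# Emit running statistics after each record, rather than once at end
# of stream. With real tail -f input, add --records-per-batch 1 and
# avoid --opprint, since the end of the stream is never seen.
printf 'value=1\nvalue=2\nvalue=3\n' \
  | mlr --records-per-batch 1 stats1 -s -a mean -f value
```

One stats record (containing a `value_mean` field) is emitted per input record, so downstream consumers see results as data arrives.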


@@ -33,12 +33,12 @@ rather than looping through them explicitly.
 Since Miller takes the streaming approach when possible (see below for
 exceptions), you can often operate on files which are larger than your system's
-memory . It also means you can do `tail -f some-file | mlr --some-flags` and
-Miller will operate on records as they arrive one at a time. You don't have to
-wait for and end-of-file marker (which never arrives with `tail-f`) to start
-seeing partial results. This also means if you pipe Miller's output to other
-streaming tools (like `cat`, `grep`, `sed`, and so on), they will also output
-partial results as data arrives.
+memory. It also means you can do `tail -f some-file | mlr --records-per-batch 1
+--some-flags` and Miller will operate on records as they arrive one at a time.
+You don't have to wait for an end-of-file marker (which never arrives with
+`tail -f`) to start seeing partial results. This also means if you pipe Miller's
+output to other streaming tools (like `cat`, `grep`, `sed`, and so on), they
+will also output partial results as data arrives.
 The statements in the [Miller programming language](miller-programming-language.md)
 (outside of optional `begin`/`end` blocks which execute before and after all
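The streaming paragraph above can be exercised concretely. This is a sketch assuming `mlr` is installed; the data is hypothetical, and a finite pipe of DKVP lines stands in for the followed file so the command terminates:

```shell
# In a live setting: tail -f some-file | mlr --records-per-batch 1 cat
# --records-per-batch 1 stops Miller from holding records in a
# partially filled batch, so downstream tools see output immediately.
printf 'a=1,b=2\na=3,b=4\n' | mlr --records-per-batch 1 cat
```

Piping this through `grep` or `sed` behaves the same way: each record is printed as soon as it is read, with no wait for end of input.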


@@ -17,12 +17,12 @@ rather than looping through them explicitly.
 Since Miller takes the streaming approach when possible (see below for
 exceptions), you can often operate on files which are larger than your system's
-memory . It also means you can do `tail -f some-file | mlr --some-flags` and
-Miller will operate on records as they arrive one at a time. You don't have to
-wait for and end-of-file marker (which never arrives with `tail-f`) to start
-seeing partial results. This also means if you pipe Miller's output to other
-streaming tools (like `cat`, `grep`, `sed`, and so on), they will also output
-partial results as data arrives.
+memory. It also means you can do `tail -f some-file | mlr --records-per-batch 1
+--some-flags` and Miller will operate on records as they arrive one at a time.
+You don't have to wait for an end-of-file marker (which never arrives with
+`tail -f`) to start seeing partial results. This also means if you pipe Miller's
+output to other streaming tools (like `cat`, `grep`, `sed`, and so on), they
+will also output partial results as data arrives.
 The statements in the [Miller programming language](miller-programming-language.md)
 (outside of optional `begin`/`end` blocks which execute before and after all


@@ -80,8 +80,9 @@ func transformerNestUsage(
 fmt.Fprintf(o, " been lost.\n")
 fmt.Fprintf(o, "* The combination \"--implode --values --across-records\" is non-streaming:\n")
 fmt.Fprintf(o, " no output records are produced until all input records have been read. In\n")
-fmt.Fprintf(o, " particular, this means it won't work in tail -f contexts. But all other flag\n")
-fmt.Fprintf(o, " combinations result in streaming (tail -f friendly) data processing.\n")
+fmt.Fprintf(o, " particular, this means it won't work in `tail -f` contexts. But all other flag\n")
+fmt.Fprintf(o, " combinations result in streaming (`tail -f` friendly) data processing.\n")
+fmt.Fprintf(o, " If input is coming from `tail -f`, be sure to use `--records-per-batch 1`.\n")
 fmt.Fprintf(o, "* It's up to you to ensure that the nested-fs is distinct from your data's IFS:\n")
 fmt.Fprintf(o, " e.g. by default the former is semicolon and the latter is comma.\n")
 fmt.Fprintf(o, "See also %s reshape.\n", argv0)


@@ -66,7 +66,8 @@ func transformerReshapeUsage(
 fmt.Fprintf(o, " Note: if you have multiple regexes, please specify them using multiple -r,\n")
 fmt.Fprintf(o, " since regexes can contain commas within them.\n")
 fmt.Fprintf(o, " Note: this works with tail -f and produces output records for each input\n")
-fmt.Fprintf(o, " record seen.\n")
+fmt.Fprintf(o, " record seen. If input is coming from `tail -f`, be sure to use\n")
+fmt.Fprintf(o, " `--records-per-batch 1`.\n")
 fmt.Fprintf(o, "Long-to-wide options:\n")
 fmt.Fprintf(o, " -s {key-field name,value-field name}\n")
 fmt.Fprintf(o, " These pivot/reshape the input data to undo the wide-to-long operation.\n")
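The wide-to-long behavior whose usage text is edited above can be sketched like so, assuming `mlr` is installed; the field names `id`, `x`, `y`, `key`, and `value` are hypothetical:

```shell
# Wide-to-long: pivot fields x and y into key/value pairs.
# This direction is streaming-friendly, so it also works on
# tail -f input (add --records-per-batch 1 there).
printf 'id=1,x=10,y=20\n' | mlr reshape -i x,y -o key,value
```

Each input record yields one output record per pivoted field, emitted as soon as the input record is seen.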


@@ -55,11 +55,12 @@ Options:
 -i Use interpolated percentiles, like R's type=7; default like type=1.
 Not sensical for string-valued fields.\n");
--s Print iterative stats. Useful in tail -f contexts (in which
+-s Print iterative stats. Useful in tail -f contexts, in which
 case please avoid pprint-format output since end of input
-stream will never be seen).
--h|--help Show this message.
-`)
+fmt.Fprintln(o, " stream will never be seen. Likewise, if input is coming from `tail -f`,")
+fmt.Fprintln(o, " be sure to use `--records-per-batch 1`.")
+fmt.Fprintln(o, "-h|--help Show this message.")
 fmt.Fprintln(o,
 "Example: mlr stats1 -a min,p10,p50,p90,max -f value -g size,shape")


@@ -44,9 +44,10 @@ func transformerStats2Usage(
 fmt.Fprintf(o, " There must be an even number of names.\n")
 fmt.Fprintf(o, "-g {e,f,g} Optional group-by-field names.\n")
 fmt.Fprintf(o, "-v Print additional output for linreg-pca.\n")
-fmt.Fprintf(o, "-s Print iterative stats. Useful in tail -f contexts (in which\n")
+fmt.Fprintf(o, "-s Print iterative stats. Useful in tail -f contexts, in which\n")
 fmt.Fprintf(o, " case please avoid pprint-format output since end of input\n")
-fmt.Fprintf(o, " stream will never be seen).\n")
+fmt.Fprintf(o, " stream will never be seen. Likewise, if input is coming from\n")
+fmt.Fprintf(o, " `tail -f`, be sure to use `--records-per-batch 1`.\n")
 fmt.Fprintf(o, "--fit Rather than printing regression parameters, applies them to\n")
 fmt.Fprintf(o, " the input data to compute new fit fields. All input records are\n")
 fmt.Fprintf(o, " held in memory until end of input stream. Has effect only for\n")


@@ -1429,8 +1429,9 @@ MILLER(1) MILLER(1)
 been lost.
 * The combination "--implode --values --across-records" is non-streaming:
 no output records are produced until all input records have been read. In
-particular, this means it won't work in tail -f contexts. But all other flag
-combinations result in streaming (tail -f friendly) data processing.
+particular, this means it won't work in `tail -f` contexts. But all other flag
+combinations result in streaming (`tail -f` friendly) data processing.
+If input is coming from `tail -f`, be sure to use `--records-per-batch 1`.
 * It's up to you to ensure that the nested-fs is distinct from your data's IFS:
 e.g. by default the former is semicolon and the latter is comma.
 See also mlr reshape.
@@ -1612,7 +1613,8 @@ MILLER(1) MILLER(1)
 Note: if you have multiple regexes, please specify them using multiple -r,
 since regexes can contain commas within them.
 Note: this works with tail -f and produces output records for each input
-record seen.
+record seen. If input is coming from `tail -f`, be sure to use
+`--records-per-batch 1`.
 Long-to-wide options:
 -s {key-field name,value-field name}
 These pivot/reshape the input data to undo the wide-to-long operation.
@@ -1837,9 +1839,10 @@ MILLER(1) MILLER(1)
 -i Use interpolated percentiles, like R's type=7; default like type=1.
 Not sensical for string-valued fields.\n");
--s Print iterative stats. Useful in tail -f contexts (in which
+-s Print iterative stats. Useful in tail -f contexts, in which
 case please avoid pprint-format output since end of input
-stream will never be seen).
+stream will never be seen. Likewise, if input is coming from `tail -f`,
+be sure to use `--records-per-batch 1`.
 -h|--help Show this message.
 Example: mlr stats1 -a min,p10,p50,p90,max -f value -g size,shape
 Example: mlr stats1 -a count,mode -f size
@@ -1875,9 +1878,10 @@ MILLER(1) MILLER(1)
 There must be an even number of names.
 -g {e,f,g} Optional group-by-field names.
 -v Print additional output for linreg-pca.
--s Print iterative stats. Useful in tail -f contexts (in which
+-s Print iterative stats. Useful in tail -f contexts, in which
 case please avoid pprint-format output since end of input
-stream will never be seen).
+stream will never be seen. Likewise, if input is coming from
+`tail -f`, be sure to use `--records-per-batch 1`.
 --fit Rather than printing regression parameters, applies them to
 the input data to compute new fit fields. All input records are
 held in memory until end of input stream. Has effect only for
@@ -3293,4 +3297,4 @@ MILLER(1) MILLER(1)
-2023-03-02 MILLER(1)
+2023-03-04 MILLER(1)


@@ -2,12 +2,12 @@
 .\" Title: mlr
 .\" Author: [see the "AUTHOR" section]
 .\" Generator: ./mkman.rb
-.\" Date: 2023-03-02
+.\" Date: 2023-03-04
 .\" Manual: \ \&
 .\" Source: \ \&
 .\" Language: English
 .\"
-.TH "MILLER" "1" "2023-03-02" "\ \&" "\ \&"
+.TH "MILLER" "1" "2023-03-04" "\ \&" "\ \&"
 .\" -----------------------------------------------------------------
 .\" * Portability definitions
 .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1798,8 +1798,9 @@ Notes:
 been lost.
 * The combination "--implode --values --across-records" is non-streaming:
 no output records are produced until all input records have been read. In
-particular, this means it won't work in tail -f contexts. But all other flag
-combinations result in streaming (tail -f friendly) data processing.
+particular, this means it won't work in `tail -f` contexts. But all other flag
+combinations result in streaming (`tail -f` friendly) data processing.
+If input is coming from `tail -f`, be sure to use `--records-per-batch 1`.
 * It's up to you to ensure that the nested-fs is distinct from your data's IFS:
 e.g. by default the former is semicolon and the latter is comma.
 See also mlr reshape.
@@ -2029,7 +2030,8 @@ Wide-to-long options:
 Note: if you have multiple regexes, please specify them using multiple -r,
 since regexes can contain commas within them.
 Note: this works with tail -f and produces output records for each input
-record seen.
+record seen. If input is coming from `tail -f`, be sure to use
+`--records-per-batch 1`.
 Long-to-wide options:
 -s {key-field name,value-field name}
 These pivot/reshape the input data to undo the wide-to-long operation.
@@ -2314,9 +2316,10 @@ Options:
 -i Use interpolated percentiles, like R's type=7; default like type=1.
 Not sensical for string-valued fields.\en");
--s Print iterative stats. Useful in tail -f contexts (in which
+-s Print iterative stats. Useful in tail -f contexts, in which
 case please avoid pprint-format output since end of input
-stream will never be seen).
+stream will never be seen. Likewise, if input is coming from `tail -f`,
+be sure to use `--records-per-batch 1`.
 -h|--help Show this message.
 Example: mlr stats1 -a min,p10,p50,p90,max -f value -g size,shape
 Example: mlr stats1 -a count,mode -f size
@@ -2358,9 +2361,10 @@ accumulated across the input record stream.
 There must be an even number of names.
 -g {e,f,g} Optional group-by-field names.
 -v Print additional output for linreg-pca.
--s Print iterative stats. Useful in tail -f contexts (in which
+-s Print iterative stats. Useful in tail -f contexts, in which
 case please avoid pprint-format output since end of input
-stream will never be seen).
+stream will never be seen. Likewise, if input is coming from
+`tail -f`, be sure to use `--records-per-batch 1`.
 --fit Rather than printing regression parameters, applies them to
 the input data to compute new fit fields. All input records are
 held in memory until end of input stream. Has effect only for


@@ -609,8 +609,9 @@ Notes:
 been lost.
 * The combination "--implode --values --across-records" is non-streaming:
 no output records are produced until all input records have been read. In
-particular, this means it won't work in tail -f contexts. But all other flag
-combinations result in streaming (tail -f friendly) data processing.
+particular, this means it won't work in `tail -f` contexts. But all other flag
+combinations result in streaming (`tail -f` friendly) data processing.
+If input is coming from `tail -f`, be sure to use `--records-per-batch 1`.
 * It's up to you to ensure that the nested-fs is distinct from your data's IFS:
 e.g. by default the former is semicolon and the latter is comma.
 See also mlr reshape.
@@ -800,7 +801,8 @@ Wide-to-long options:
 Note: if you have multiple regexes, please specify them using multiple -r,
 since regexes can contain commas within them.
 Note: this works with tail -f and produces output records for each input
-record seen.
+record seen. If input is coming from `tail -f`, be sure to use
+`--records-per-batch 1`.
 Long-to-wide options:
 -s {key-field name,value-field name}
 These pivot/reshape the input data to undo the wide-to-long operation.
@@ -1035,9 +1037,10 @@ Options:
 -i Use interpolated percentiles, like R's type=7; default like type=1.
 Not sensical for string-valued fields.\n");
--s Print iterative stats. Useful in tail -f contexts (in which
+-s Print iterative stats. Useful in tail -f contexts, in which
 case please avoid pprint-format output since end of input
-stream will never be seen).
+stream will never be seen. Likewise, if input is coming from `tail -f`,
+be sure to use `--records-per-batch 1`.
 -h|--help Show this message.
 Example: mlr stats1 -a min,p10,p50,p90,max -f value -g size,shape
 Example: mlr stats1 -a count,mode -f size
@@ -1074,9 +1077,10 @@ accumulated across the input record stream.
 There must be an even number of names.
 -g {e,f,g} Optional group-by-field names.
 -v Print additional output for linreg-pca.
--s Print iterative stats. Useful in tail -f contexts (in which
+-s Print iterative stats. Useful in tail -f contexts, in which
 case please avoid pprint-format output since end of input
-stream will never be seen).
+stream will never be seen. Likewise, if input is coming from
+`tail -f`, be sure to use `--records-per-batch 1`.
 --fit Rather than printing regression parameters, applies them to
 the input data to compute new fit fields. All input records are
 held in memory until end of input stream. Has effect only for