mirror of
https://github.com/johnkerl/miller.git
synced 2026-01-23 02:14:13 +00:00
Support ZSTD compression in-process (#1360)
* Support ZSTD compression in-process
* doc mods
* unit-test cases
* doc-gen artifacts
parent
8b22708c27
commit
d4a3bf99b2
27 changed files with 130 additions and 28 deletions
|
|
@ -50,7 +50,7 @@ and the `--csv` part will automatically be understood. If you do want to process
|
|||
|
||||
* You can include any command-line flags, except the "terminal" ones such as `--help`.
|
||||
|
||||
* The `--prepipe`, `--load`, and `--mload` flags aren't allowed in `.mlrrc` as they control code execution, and could result in your scripts running things you don't expect if you receive data from someone with a `./.mlrrc` in it. You can use `--prepipe-bz2`, `--prepipe-gunzip`, and `--prepipe-zcat` in `.mlrrc`, though.
|
||||
* The `--prepipe`, `--load`, and `--mload` flags aren't allowed in `.mlrrc` as they control code execution, and could result in your scripts running things you don't expect if you receive data from someone with a `./.mlrrc` in it. You can use `--prepipe-bz2`, `--prepipe-gunzip`, `--prepipe-zcat`, and `--prepipe-zstdcat` in `.mlrrc`, though.
|
||||
|
||||
* The formatting rule is you need to put one flag beginning with `--` per line: for example, `--csv` on one line and `--nr-progress-mod 1000` on a separate line.
|
||||
|
||||
|
|
|
|||
|
|
@ -34,7 +34,7 @@ and the `--csv` part will automatically be understood. If you do want to process
|
|||
|
||||
* You can include any command-line flags, except the "terminal" ones such as `--help`.
|
||||
|
||||
* The `--prepipe`, `--load`, and `--mload` flags aren't allowed in `.mlrrc` as they control code execution, and could result in your scripts running things you don't expect if you receive data from someone with a `./.mlrrc` in it. You can use `--prepipe-bz2`, `--prepipe-gunzip`, and `--prepipe-zcat` in `.mlrrc`, though.
|
||||
* The `--prepipe`, `--load`, and `--mload` flags aren't allowed in `.mlrrc` as they control code execution, and could result in your scripts running things you don't expect if you receive data from someone with a `./.mlrrc` in it. You can use `--prepipe-bz2`, `--prepipe-gunzip`, `--prepipe-zcat`, and `--prepipe-zstdcat` in `.mlrrc`, though.
|
||||
|
||||
* The formatting rule is you need to put one flag beginning with `--` per line: for example, `--csv` on one line and `--nr-progress-mod 1000` on a separate line.
|
||||
|
||||
|
|
|
|||
|
|
@ -905,3 +905,8 @@ See also the [arrays page](reference-main-arrays.md), as well as the page on
|
|||
|
||||
A [data-compression format supported by Miller](reference-main-compressed-data.md).
|
||||
Files compressed using ZLIB compression normally end in `.z`.
|
||||
|
||||
## ZSTD / .zst
|
||||
|
||||
A [data-compression format supported by Miller](reference-main-compressed-data.md).
|
||||
Files compressed using ZSTD compression normally end in `.zst`.
|
||||
|
|
|
|||
|
|
@ -889,3 +889,8 @@ See also the [arrays page](reference-main-arrays.md), as well as the page on
|
|||
|
||||
A [data-compression format supported by Miller](reference-main-compressed-data.md).
|
||||
Files compressed using ZLIB compression normally end in `.z`.
|
||||
|
||||
## ZSTD / .zst
|
||||
|
||||
A [data-compression format supported by Miller](reference-main-compressed-data.md).
|
||||
Files compressed using ZSTD compression normally end in `.zst`.
|
||||
|
|
|
|||
|
|
@ -262,7 +262,7 @@ MILLER(1) MILLER(1)
|
|||
Miller offers a few different ways to handle reading data files
|
||||
which have been compressed.
|
||||
|
||||
* Decompression done within the Miller process itself: `--bz2in` `--gzin` `--zin`
|
||||
* Decompression done within the Miller process itself: `--bz2in` `--gzin` `--zin` `--zstdin`
|
||||
* Decompression done outside the Miller process: `--prepipe` `--prepipex`
|
||||
|
||||
Using `--prepipe` and `--prepipex` you can specify an action to be
|
||||
|
|
@ -285,7 +285,7 @@ MILLER(1) MILLER(1)
|
|||
|
||||
Lastly, note that if `--prepipe` or `--prepipex` is specified, it replaces any
|
||||
decisions that might have been made based on the file suffix. Likewise,
|
||||
`--gzin`/`--bz2in`/`--zin` are ignored if `--prepipe` is also specified.
|
||||
`--gzin`/`--bz2in`/`--zin`/`--zstdin` are ignored if `--prepipe` is also specified.
|
||||
|
||||
--bz2in Uncompress bzip2 within the Miller process. Done by
|
||||
default if file ends in `.bz2`.
|
||||
|
|
@ -302,6 +302,8 @@ MILLER(1) MILLER(1)
|
|||
`.mlrrc`.
|
||||
--prepipe-zcat Same as `--prepipe zcat`, except this is allowed in
|
||||
`.mlrrc`.
|
||||
--prepipe-zstdcat Same as `--prepipe zstdcat`, except this is allowed
|
||||
in `.mlrrc`.
|
||||
--prepipex {decompression command}
|
||||
Like `--prepipe` with one exception: doesn't insert
|
||||
`<` between command and filename at runtime. Useful
|
||||
|
|
@ -310,6 +312,8 @@ MILLER(1) MILLER(1)
|
|||
in `.mlrrc` to avoid unexpected code execution.
|
||||
--zin Uncompress zlib within the Miller process. Done by
|
||||
default if file ends in `.z`.
|
||||
--zstdin Uncompress zstd within the Miller process. Done by
|
||||
default if file ends in `.zst`.
|
||||
|
||||
CSV/TSV-ONLY FLAGS
|
||||
These are flags which are applicable to CSV format.
|
||||
|
|
|
|||
|
|
@ -241,7 +241,7 @@ MILLER(1) MILLER(1)
|
|||
Miller offers a few different ways to handle reading data files
|
||||
which have been compressed.
|
||||
|
||||
* Decompression done within the Miller process itself: `--bz2in` `--gzin` `--zin`
|
||||
* Decompression done within the Miller process itself: `--bz2in` `--gzin` `--zin` `--zstdin`
|
||||
* Decompression done outside the Miller process: `--prepipe` `--prepipex`
|
||||
|
||||
Using `--prepipe` and `--prepipex` you can specify an action to be
|
||||
|
|
@ -264,7 +264,7 @@ MILLER(1) MILLER(1)
|
|||
|
||||
Lastly, note that if `--prepipe` or `--prepipex` is specified, it replaces any
|
||||
decisions that might have been made based on the file suffix. Likewise,
|
||||
`--gzin`/`--bz2in`/`--zin` are ignored if `--prepipe` is also specified.
|
||||
`--gzin`/`--bz2in`/`--zin`/`--zstdin` are ignored if `--prepipe` is also specified.
|
||||
|
||||
--bz2in Uncompress bzip2 within the Miller process. Done by
|
||||
default if file ends in `.bz2`.
|
||||
|
|
@ -281,6 +281,8 @@ MILLER(1) MILLER(1)
|
|||
`.mlrrc`.
|
||||
--prepipe-zcat Same as `--prepipe zcat`, except this is allowed in
|
||||
`.mlrrc`.
|
||||
--prepipe-zstdcat Same as `--prepipe zstdcat`, except this is allowed
|
||||
in `.mlrrc`.
|
||||
--prepipex {decompression command}
|
||||
Like `--prepipe` with one exception: doesn't insert
|
||||
`<` between command and filename at runtime. Useful
|
||||
|
|
@ -289,6 +291,8 @@ MILLER(1) MILLER(1)
|
|||
in `.mlrrc` to avoid unexpected code execution.
|
||||
--zin Uncompress zlib within the Miller process. Done by
|
||||
default if file ends in `.z`.
|
||||
--zstdin Uncompress zstd within the Miller process. Done by
|
||||
default if file ends in `.zst`.
|
||||
|
||||
CSV/TSV-ONLY FLAGS
|
||||
These are flags which are applicable to CSV format.
|
||||
|
|
|
|||
|
|
@ -143,7 +143,7 @@ the `TZ` environment variable. Please see [DSL datetime/timezone functions](refe
|
|||
|
||||
### In-process support for compressed input
|
||||
|
||||
In addition to `--prepipe gunzip`, you can now use the `--gzin` flag. In fact, if your files end in `.gz` you don't even need to do that -- Miller will autodetect by file extension and automatically uncompress `mlr --csv cat foo.csv.gz`. Similarly for `.z` and `.bz2` files. Please see the page on [Compressed data](reference-main-compressed-data.md) for more information.
|
||||
In addition to `--prepipe gunzip`, you can now use the `--gzin` flag. In fact, if your files end in `.gz` you don't even need to do that -- Miller will autodetect by file extension and automatically uncompress `mlr --csv cat foo.csv.gz`. Similarly for `.z`, `.bz2`, and `.zst` files. Please see the page on [Compressed data](reference-main-compressed-data.md) for more information.
|
||||
|
||||
### Support for reading web URLs
|
||||
|
||||
|
|
|
|||
|
|
@ -125,7 +125,7 @@ the `TZ` environment variable. Please see [DSL datetime/timezone functions](refe
|
|||
|
||||
### In-process support for compressed input
|
||||
|
||||
In addition to `--prepipe gunzip`, you can now use the `--gzin` flag. In fact, if your files end in `.gz` you don't even need to do that -- Miller will autodetect by file extension and automatically uncompress `mlr --csv cat foo.csv.gz`. Similarly for `.z` and `.bz2` files. Please see the page on [Compressed data](reference-main-compressed-data.md) for more information.
|
||||
In addition to `--prepipe gunzip`, you can now use the `--gzin` flag. In fact, if your files end in `.gz` you don't even need to do that -- Miller will autodetect by file extension and automatically uncompress `mlr --csv cat foo.csv.gz`. Similarly for `.z`, `.bz2`, and `.zst` files. Please see the page on [Compressed data](reference-main-compressed-data.md) for more information.
|
||||
|
||||
### Support for reading web URLs
|
||||
|
||||
|
|
|
|||
|
|
@ -16,13 +16,13 @@ Quick links:
|
|||
</div>
|
||||
# Compressed data
|
||||
|
||||
As of [Miller 6](new-in-miller-6.md), Miller supports reading GZIP, BZIP2, and
|
||||
ZLIB formats transparently, and in-process. And (as before Miller 6) you have a
|
||||
As of [Miller 6](new-in-miller-6.md), Miller supports reading GZIP, BZIP2, ZLIB, and
|
||||
ZSTD formats transparently, and in-process. And (as before Miller 6) you have a
|
||||
more general `--prepipe` option to support other decompression programs.
|
||||
|
||||
## Automatic detection on input
|
||||
|
||||
If your files end in `.gz`, `.bz2`, or `.z` then Miller will autodetect by file extension:
|
||||
If your files end in `.gz`, `.bz2`, `.z`, or `.zst` then Miller will autodetect by file extension:
|
||||
|
||||
<pre class="pre-highlight-in-pair">
|
||||
<b>file gz-example.csv.gz</b>
|
||||
|
|
@ -52,7 +52,7 @@ This will decompress the input data on the fly, while leaving the disk file unmo
|
|||
|
||||
## Manual detection on input
|
||||
|
||||
If the filename doesn't in in `.gz`, `.bz2`, or `.z` then you can use the flags `--gzin`, `--bz2in`, or `--zin` to let Miller know:
|
||||
If the filename doesn't end in `.gz`, `.bz2`, `.z`, or `.zst` then you can use the flags `--gzin`, `--bz2in`, `--zin`, or `--zstdin` to let Miller know:
|
||||
|
||||
<pre class="pre-highlight-non-pair">
|
||||
<b>mlr --csv --gzin sort -f color myfile.bin # myfile.bin has gzip contents</b>
|
||||
|
|
@ -94,7 +94,7 @@ If the command has flags, quote them: e.g. `mlr --prepipe 'zcat -cf'`.
|
|||
|
||||
In your [.mlrrc file](customization.md), `--prepipe` and `--prepipex` are not
|
||||
allowed as they could be used for unexpected code execution. You can use
|
||||
`--prepipe-bz2`, `--prepipe-gunzip`, and `--prepipe-zcat` in `.mlrrc`, though.
|
||||
`--prepipe-bz2`, `--prepipe-gunzip`, `--prepipe-zcat`, and `--prepipe-zstdcat` in `.mlrrc`, though.
|
||||
|
||||
Note that this feature is quite general and is not limited to decompression
|
||||
utilities. You can use it to apply per-file filters of your choice: e.g. `mlr
|
||||
|
|
@ -107,7 +107,7 @@ There is a `--prepipe` and a `--prepipex`:
|
|||
|
||||
Lastly, note that if `--prepipe` or `--prepipex` is specified on the Miller
|
||||
command line, it replaces any autodetect decisions that might have been made
|
||||
based on the filename extension. Likewise, `--gzin`/`--bz2in`/`--zin` are ignored if
|
||||
based on the filename extension. Likewise, `--gzin`/`--bz2in`/`--zin`/`--zstdin` are ignored if
|
||||
`--prepipe` or `--prepipex` is also specified.
|
||||
|
||||
## Compressed output
|
||||
|
|
|
|||
|
|
@ -1,12 +1,12 @@
|
|||
# Compressed data
|
||||
|
||||
As of [Miller 6](new-in-miller-6.md), Miller supports reading GZIP, BZIP2, and
|
||||
ZLIB formats transparently, and in-process. And (as before Miller 6) you have a
|
||||
As of [Miller 6](new-in-miller-6.md), Miller supports reading GZIP, BZIP2, ZLIB, and
|
||||
ZSTD formats transparently, and in-process. And (as before Miller 6) you have a
|
||||
more general `--prepipe` option to support other decompression programs.
|
||||
|
||||
## Automatic detection on input
|
||||
|
||||
If your files end in `.gz`, `.bz2`, or `.z` then Miller will autodetect by file extension:
|
||||
If your files end in `.gz`, `.bz2`, `.z`, or `.zst` then Miller will autodetect by file extension:
|
||||
|
||||
GENMD-CARDIFY-HIGHLIGHT-ONE
|
||||
file gz-example.csv.gz
|
||||
|
|
@ -21,7 +21,7 @@ This will decompress the input data on the fly, while leaving the disk file unmo
|
|||
|
||||
## Manual detection on input
|
||||
|
||||
If the filename doesn't in in `.gz`, `.bz2`, or `.z` then you can use the flags `--gzin`, `--bz2in`, or `--zin` to let Miller know:
|
||||
If the filename doesn't end in `.gz`, `.bz2`, `.z`, or `.zst` then you can use the flags `--gzin`, `--bz2in`, `--zin`, or `--zstdin` to let Miller know:
|
||||
|
||||
GENMD-CARDIFY-HIGHLIGHT-ONE
|
||||
mlr --csv --gzin sort -f color myfile.bin # myfile.bin has gzip contents
|
||||
|
|
@ -50,7 +50,7 @@ If the command has flags, quote them: e.g. `mlr --prepipe 'zcat -cf'`.
|
|||
|
||||
In your [.mlrrc file](customization.md), `--prepipe` and `--prepipex` are not
|
||||
allowed as they could be used for unexpected code execution. You can use
|
||||
`--prepipe-bz2`, `--prepipe-gunzip`, and `--prepipe-zcat` in `.mlrrc`, though.
|
||||
`--prepipe-bz2`, `--prepipe-gunzip`, `--prepipe-zcat`, and `--prepipe-zstdcat` in `.mlrrc`, though.
|
||||
|
||||
Note that this feature is quite general and is not limited to decompression
|
||||
utilities. You can use it to apply per-file filters of your choice: e.g. `mlr
|
||||
|
|
@ -63,7 +63,7 @@ There is a `--prepipe` and a `--prepipex`:
|
|||
|
||||
Lastly, note that if `--prepipe` or `--prepipex` is specified on the Miller
|
||||
command line, it replaces any autodetect decisions that might have been made
|
||||
based on the filename extension. Likewise, `--gzin`/`--bz2in`/`--zin` are ignored if
|
||||
based on the filename extension. Likewise, `--gzin`/`--bz2in`/`--zin`/`--zstdin` are ignored if
|
||||
`--prepipe` or `--prepipex` is also specified.
|
||||
|
||||
## Compressed output
|
||||
|
|
|
|||
|
|
@ -72,7 +72,7 @@ Notes:
|
|||
Miller offers a few different ways to handle reading data files
|
||||
which have been compressed.
|
||||
|
||||
* Decompression done within the Miller process itself: `--bz2in` `--gzin` `--zin`
|
||||
* Decompression done within the Miller process itself: `--bz2in` `--gzin` `--zin` `--zstdin`
|
||||
* Decompression done outside the Miller process: `--prepipe` `--prepipex`
|
||||
|
||||
Using `--prepipe` and `--prepipex` you can specify an action to be
|
||||
|
|
@ -95,7 +95,7 @@ compression (or other) utilities, simply pipe the output:
|
|||
|
||||
Lastly, note that if `--prepipe` or `--prepipex` is specified, it replaces any
|
||||
decisions that might have been made based on the file suffix. Likewise,
|
||||
`--gzin`/`--bz2in`/`--zin` are ignored if `--prepipe` is also specified.
|
||||
`--gzin`/`--bz2in`/`--zin`/`--zstdin` are ignored if `--prepipe` is also specified.
|
||||
|
||||
|
||||
**Flags:**
|
||||
|
|
@ -106,8 +106,10 @@ decisions that might have been made based on the file suffix. Likewise,
|
|||
* `--prepipe-bz2`: Same as `--prepipe bz2`, except this is allowed in `.mlrrc`.
|
||||
* `--prepipe-gunzip`: Same as `--prepipe gunzip`, except this is allowed in `.mlrrc`.
|
||||
* `--prepipe-zcat`: Same as `--prepipe zcat`, except this is allowed in `.mlrrc`.
|
||||
* `--prepipe-zstdcat`: Same as `--prepipe zstdcat`, except this is allowed in `.mlrrc`.
|
||||
* `--prepipex {decompression command}`: Like `--prepipe` with one exception: doesn't insert `<` between command and filename at runtime. Useful for some commands like `unzip -qc` which don't read standard input. Allowed at the command line, but not in `.mlrrc` to avoid unexpected code execution.
|
||||
* `--zin`: Uncompress zlib within the Miller process. Done by default if file ends in `.z`.
|
||||
* `--zstdin`: Uncompress zstd within the Miller process. Done by default if file ends in `.zst`.
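
The difference between `--prepipe` and `--prepipex` described in the flag list can be sketched as follows. This is a minimal illustration only: `buildPrepipeCommand` is a hypothetical helper, not Miller's actual code, and it does no shell quoting of the filename.

```go
package main

import "fmt"

// buildPrepipeCommand is a hypothetical helper, not part of Miller.
// With raw == false it mimics --prepipe, which places "<" between the
// command and the filename. With raw == true it mimics --prepipex, which
// passes the filename as a plain argument instead.
// Assumption: no shell quoting is applied to the filename here.
func buildPrepipeCommand(prepipe, filename string, raw bool) string {
	if raw {
		return prepipe + " " + filename
	}
	return prepipe + " < " + filename
}

func main() {
	// --prepipe-zstdcat behaves like --prepipe zstdcat.
	fmt.Println(buildPrepipeCommand("zstdcat", "data.csv.zst", false))
	// unzip -qc does not read standard input, so it needs the raw form.
	fmt.Println(buildPrepipeCommand("unzip -qc", "data.zip", true))
}
```

The raw form exists for commands such as `unzip -qc` that take the filename as an argument rather than reading standard input.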
|
||||
|
||||
## CSV/TSV-only flags
|
||||
|
||||
|
|
|
|||
1
go.mod
1
go.mod
|
|
@ -34,6 +34,7 @@ require (
|
|||
github.com/davecgh/go-spew v1.1.1 // indirect
|
||||
github.com/felixge/fgprof v0.9.3 // indirect
|
||||
github.com/google/pprof v0.0.0-20211214055906-6f57359322fd // indirect
|
||||
github.com/klauspost/compress v1.16.7 // indirect
|
||||
github.com/pkg/errors v0.9.1 // indirect
|
||||
github.com/pmezard/go-difflib v1.0.0 // indirect
|
||||
gopkg.in/yaml.v3 v3.0.1 // indirect
|
||||
|
|
|
|||
2
go.sum
2
go.sum
|
|
@ -15,6 +15,8 @@ github.com/johnkerl/lumin v1.0.0 h1:CV34cHZOJ92Y02RbQ0rd4gA0C06Qck9q8blOyaPoWpU=
|
|||
github.com/johnkerl/lumin v1.0.0/go.mod h1:eLf5AdQOaLvzZ2zVy4REr/DSeEwG+CZreHwNLICqv9E=
|
||||
github.com/kballard/go-shellquote v0.0.0-20180428030007-95032a82bc51 h1:Z9n2FFNUXsshfwJMBgNA0RU6/i7WVaAegv3PtuIHPMs=
|
||||
github.com/kballard/go-shellquote v0.0.0-20180428030007-95032a82bc51/go.mod h1:CzGEWj7cYgsdH8dAjBGEr58BoE7ScuLd+fwFZ44+/x8=
|
||||
github.com/klauspost/compress v1.16.7 h1:2mk3MPGNzKyxErAw8YaohYh69+pa4sIQSC0fPGCFR9I=
|
||||
github.com/klauspost/compress v1.16.7/go.mod h1:ntbaceVETuRiXiv4DpjP66DpAtAGkEQskQzEyD//IeE=
|
||||
github.com/lestrrat-go/envload v0.0.0-20180220234015-a3eb8ddeffcc h1:RKf14vYWi2ttpEmkA4aQ3j4u9dStX2t4M8UM6qqNsG8=
|
||||
github.com/lestrrat-go/envload v0.0.0-20180220234015-a3eb8ddeffcc/go.mod h1:kopuH9ugFRkIXf3YoqHKyrJ9YfUFsckUU9S7B+XP+is=
|
||||
github.com/lestrrat-go/strftime v1.0.6 h1:CFGsDEt1pOpFNU+TJB0nhz9jl+K0hZSLE205AhTIGQQ=
|
||||
|
|
|
|||
|
|
@ -2200,7 +2200,8 @@ func CompressedDataPrintInfo() {
|
|||
fmt.Print(`Miller offers a few different ways to handle reading data files
|
||||
which have been compressed.
|
||||
|
||||
* Decompression done within the Miller process itself: ` + "`--bz2in`" + ` ` + "`--gzin`" + ` ` + "`--zin`" + `
|
||||
* Decompression done within the Miller process itself: ` + "`--bz2in`" + ` ` + "`--gzin`" + ` ` + "`--zin`" + ` ` + "`--zstdin`" +
|
||||
`
|
||||
* Decompression done outside the Miller process: ` + "`--prepipe`" + ` ` + "`--prepipex`" + `
|
||||
|
||||
Using ` + "`--prepipe`" + ` and ` + "`--prepipex`" + ` you can specify an action to be
|
||||
|
|
@ -2223,7 +2224,7 @@ compression (or other) utilities, simply pipe the output:
|
|||
|
||||
Lastly, note that if ` + "`--prepipe`" + ` or ` + "`--prepipex`" + ` is specified, it replaces any
|
||||
decisions that might have been made based on the file suffix. Likewise,
|
||||
` + "`--gzin`" + `/` + "`--bz2in`" + `/` + "`--zin`" + ` are ignored if ` + "`--prepipe`" + ` is also specified.
|
||||
` + "`--gzin`" + `/` + "`--bz2in`" + `/` + "`--zin`" + `/` + "`--zstdin`" + ` are ignored if ` + "`--prepipe`" + ` is also specified.
|
||||
`)
|
||||
}
|
||||
|
||||
|
|
@ -2278,6 +2279,16 @@ var CompressedDataFlagSection = FlagSection{
|
|||
},
|
||||
},
|
||||
|
||||
{
|
||||
name: "--prepipe-zstdcat",
|
||||
help: "Same as `--prepipe zstdcat`, except this is allowed in `.mlrrc`.",
|
||||
parser: func(args []string, argc int, pargi *int, options *TOptions) {
|
||||
options.ReaderOptions.Prepipe = "zstdcat"
|
||||
options.ReaderOptions.PrepipeIsRaw = false
|
||||
*pargi += 1
|
||||
},
|
||||
},
|
||||
|
||||
{
|
||||
name: "--prepipe-bz2",
|
||||
help: "Same as `--prepipe bz2`, except this is allowed in `.mlrrc`.",
|
||||
|
|
@ -2314,6 +2325,15 @@ var CompressedDataFlagSection = FlagSection{
|
|||
*pargi += 1
|
||||
},
|
||||
},
|
||||
|
||||
{
|
||||
name: "--zstdin",
|
||||
help: "Uncompress zstd within the Miller process. Done by default if file ends in `.zst`.",
|
||||
parser: func(args []string, argc int, pargi *int, options *TOptions) {
|
||||
options.ReaderOptions.FileInputEncoding = lib.FileInputEncodingZstd
|
||||
*pargi += 1
|
||||
},
|
||||
},
|
||||
},
|
||||
}
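
The flag-table pattern used by the two new entries above can be sketched in isolation. The types below (`readerOptions`, `flagSpec`) and the `parseFlags` loop are simplified, hypothetical stand-ins for Miller's `TOptions` and `FlagSection`. Only the idea that each parser sets an option and advances the argument index is taken from the diff.

```go
package main

import "fmt"

// readerOptions is a simplified stand-in for Miller's reader options.
type readerOptions struct {
	Prepipe      string
	PrepipeIsRaw bool
	Encoding     string // hypothetical: the diff uses an enum constant instead
}

// flagSpec pairs a flag name with a parser that mutates the options
// and advances the argument index past whatever it consumed.
type flagSpec struct {
	name   string
	parser func(args []string, pargi *int, opts *readerOptions)
}

var flagTable = []flagSpec{
	{
		name: "--prepipe-zstdcat",
		parser: func(args []string, pargi *int, opts *readerOptions) {
			opts.Prepipe = "zstdcat"
			opts.PrepipeIsRaw = false
			*pargi += 1
		},
	},
	{
		name: "--zstdin",
		parser: func(args []string, pargi *int, opts *readerOptions) {
			opts.Encoding = "zstd"
			*pargi += 1
		},
	},
}

// parseFlags consumes leading arguments that match the table and returns
// the index of the first argument that is not a known flag.
func parseFlags(args []string, opts *readerOptions) int {
	argi := 0
	for argi < len(args) {
		matched := false
		for _, f := range flagTable {
			if f.name == args[argi] {
				f.parser(args, &argi, opts)
				matched = true
				break
			}
		}
		if !matched {
			break
		}
	}
	return argi
}

func main() {
	var opts readerOptions
	rest := parseFlags([]string{"--zstdin", "count", "-g", "a"}, &opts)
	fmt.Println(opts.Encoding, rest)
}
```

Because each parser advances the index itself, flags that take a value (such as `--prepipe {command}`) can consume two arguments while these two consume one.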
|
||||
|
||||
|
|
|
|||
|
|
@ -25,6 +25,7 @@ import (
|
|||
"compress/gzip"
|
||||
"compress/zlib"
|
||||
"fmt"
|
||||
"github.com/klauspost/compress/zstd"
|
||||
"io"
|
||||
"net/http"
|
||||
"os"
|
||||
|
|
@ -38,6 +39,7 @@ const (
|
|||
FileInputEncodingBzip2
|
||||
FileInputEncodingGzip
|
||||
FileInputEncodingZlib
|
||||
FileInputEncodingZstd
|
||||
)
|
||||
|
||||
// OpenFileForRead: If prepipe is non-empty, popens "{prepipe} < {filename}"
|
||||
|
|
@ -160,6 +162,8 @@ func openEncodedHandleForRead(
|
|||
return gzip.NewReader(handle)
|
||||
case FileInputEncodingZlib:
|
||||
return zlib.NewReader(handle)
|
||||
case FileInputEncodingZstd:
|
||||
return NewZstdReadCloser(handle)
|
||||
}
|
||||
|
||||
InternalCodingErrorIf(encoding != FileInputEncodingDefault)
|
||||
|
|
@ -173,6 +177,9 @@ func openEncodedHandleForRead(
|
|||
if strings.HasSuffix(filename, ".z") {
|
||||
return zlib.NewReader(handle)
|
||||
}
|
||||
if strings.HasSuffix(filename, ".zst") {
|
||||
return NewZstdReadCloser(handle)
|
||||
}
|
||||
|
||||
// Pass along os.Stdin or os.Open(filename)
|
||||
return handle, nil
|
||||
|
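
The suffix-based autodetection in this hunk can be sketched on its own. `encodingFromSuffix` is a hypothetical function that returns a string label rather than Miller's encoding constants. The `.gz` and `.bz2` cases are assumed from the documentation, since they fall outside the lines shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// encodingFromSuffix is a hypothetical helper that mirrors the
// suffix checks in the hunk above, returning a label instead of a reader.
func encodingFromSuffix(filename string) string {
	switch {
	case strings.HasSuffix(filename, ".bz2"):
		return "bzip2"
	case strings.HasSuffix(filename, ".gz"):
		return "gzip"
	case strings.HasSuffix(filename, ".z"):
		return "zlib"
	case strings.HasSuffix(filename, ".zst"):
		return "zstd"
	}
	// No recognized suffix: the file handle is passed along unchanged.
	return "default"
}

func main() {
	for _, name := range []string{"a.csv.gz", "a.csv.zst", "a.csv.zstd", "a.csv"} {
		fmt.Println(name, encodingFromSuffix(name))
	}
}
```

Under these checks a file named with `.zstd` is not autodetected. It would need the `--zstdin` flag or a `--prepipe` command.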
|
@ -200,6 +207,32 @@ func (rc *BZip2ReadCloser) Close() error {
|
|||
return rc.originalHandle.Close()
|
||||
}
|
||||
|
||||
// ----------------------------------------------------------------
|
||||
// ZstdReadCloser remedies the fact that the *zstd.Decoder returned by zstd.NewReader does not implement io.ReadCloser: its Close method returns no error.
|
||||
type ZstdReadCloser struct {
|
||||
originalHandle io.ReadCloser
|
||||
zstdHandle io.Reader
|
||||
}
|
||||
|
||||
func NewZstdReadCloser(handle io.ReadCloser) (*ZstdReadCloser, error) {
|
||||
zstdHandle, err := zstd.NewReader(handle)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
return &ZstdReadCloser{
|
||||
originalHandle: handle,
|
||||
zstdHandle: zstdHandle,
|
||||
}, nil
|
||||
}
|
||||
|
||||
func (rc *ZstdReadCloser) Read(p []byte) (n int, err error) {
|
||||
return rc.zstdHandle.Read(p)
|
||||
}
|
||||
|
||||
func (rc *ZstdReadCloser) Close() error {
|
||||
return rc.originalHandle.Close()
|
||||
}
|
||||
|
||||
// ----------------------------------------------------------------
|
||||
|
||||
// IsEOF handles the following problem: reading past end of files opened with
|
||||
|
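
The wrapper pattern behind `ZstdReadCloser` (read from the decoder, close the original handle) can be tried out with the standard library alone. The sketch below uses `compress/gzip` as a stand-in so that it runs without the third-party zstd module. `decodingReadCloser`, `trackedHandle`, and `roundTrip` are hypothetical names introduced for illustration.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// trackedHandle wraps a reader and records whether Close was called,
// standing in for an opened file.
type trackedHandle struct {
	io.Reader
	closed bool
}

func (h *trackedHandle) Close() error {
	h.closed = true
	return nil
}

// decodingReadCloser reads through a decoder but closes the original
// handle, following the same shape as ZstdReadCloser in the diff.
type decodingReadCloser struct {
	originalHandle io.ReadCloser
	decoder        io.Reader
}

func (rc *decodingReadCloser) Read(p []byte) (int, error) {
	return rc.decoder.Read(p)
}

func (rc *decodingReadCloser) Close() error {
	return rc.originalHandle.Close()
}

// roundTrip compresses text in memory, reads it back through the wrapper,
// and reports the decoded text and whether the handle was closed.
func roundTrip(text string) (string, bool, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write([]byte(text)); err != nil {
		return "", false, err
	}
	if err := zw.Close(); err != nil {
		return "", false, err
	}

	handle := &trackedHandle{Reader: &buf}
	zr, err := gzip.NewReader(handle)
	if err != nil {
		return "", false, err
	}
	rc := &decodingReadCloser{originalHandle: handle, decoder: zr}

	decoded, err := io.ReadAll(rc)
	if err != nil {
		return "", false, err
	}
	if err := rc.Close(); err != nil {
		return "", false, err
	}
	return string(decoded), handle.closed, nil
}

func main() {
	text, closed, err := roundTrip("a=pan,count=8\n")
	fmt.Printf("%q %v %v\n", text, closed, err)
}
```

As in the diff, `Close` only closes the underlying handle. Whether the decoder itself should also be released on close is a separate design choice that this sketch leaves out.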
|
|
|||
|
|
@ -241,7 +241,7 @@ MILLER(1) MILLER(1)
|
|||
Miller offers a few different ways to handle reading data files
|
||||
which have been compressed.
|
||||
|
||||
* Decompression done within the Miller process itself: `--bz2in` `--gzin` `--zin`
|
||||
* Decompression done within the Miller process itself: `--bz2in` `--gzin` `--zin` `--zstdin`
|
||||
* Decompression done outside the Miller process: `--prepipe` `--prepipex`
|
||||
|
||||
Using `--prepipe` and `--prepipex` you can specify an action to be
|
||||
|
|
@ -264,7 +264,7 @@ MILLER(1) MILLER(1)
|
|||
|
||||
Lastly, note that if `--prepipe` or `--prepipex` is specified, it replaces any
|
||||
decisions that might have been made based on the file suffix. Likewise,
|
||||
`--gzin`/`--bz2in`/`--zin` are ignored if `--prepipe` is also specified.
|
||||
`--gzin`/`--bz2in`/`--zin`/`--zstdin` are ignored if `--prepipe` is also specified.
|
||||
|
||||
--bz2in Uncompress bzip2 within the Miller process. Done by
|
||||
default if file ends in `.bz2`.
|
||||
|
|
@ -281,6 +281,8 @@ MILLER(1) MILLER(1)
|
|||
`.mlrrc`.
|
||||
--prepipe-zcat Same as `--prepipe zcat`, except this is allowed in
|
||||
`.mlrrc`.
|
||||
--prepipe-zstdcat Same as `--prepipe zstdcat`, except this is allowed
|
||||
in `.mlrrc`.
|
||||
--prepipex {decompression command}
|
||||
Like `--prepipe` with one exception: doesn't insert
|
||||
`<` between command and filename at runtime. Useful
|
||||
|
|
@ -289,6 +291,8 @@ MILLER(1) MILLER(1)
|
|||
in `.mlrrc` to avoid unexpected code execution.
|
||||
--zin Uncompress zlib within the Miller process. Done by
|
||||
default if file ends in `.z`.
|
||||
--zstdin Uncompress zstd within the Miller process. Done by
|
||||
default if file ends in `.zst`.
|
||||
|
||||
CSV/TSV-ONLY FLAGS
|
||||
These are flags which are applicable to CSV format.
|
||||
|
|
|
|||
|
|
@ -304,7 +304,7 @@ Notes:
|
|||
Miller offers a few different ways to handle reading data files
|
||||
which have been compressed.
|
||||
|
||||
* Decompression done within the Miller process itself: `--bz2in` `--gzin` `--zin`
|
||||
* Decompression done within the Miller process itself: `--bz2in` `--gzin` `--zin` `--zstdin`
|
||||
* Decompression done outside the Miller process: `--prepipe` `--prepipex`
|
||||
|
||||
Using `--prepipe` and `--prepipex` you can specify an action to be
|
||||
|
|
@ -327,7 +327,7 @@ compression (or other) utilities, simply pipe the output:
|
|||
|
||||
Lastly, note that if `--prepipe` or `--prepipex` is specified, it replaces any
|
||||
decisions that might have been made based on the file suffix. Likewise,
|
||||
`--gzin`/`--bz2in`/`--zin` are ignored if `--prepipe` is also specified.
|
||||
`--gzin`/`--bz2in`/`--zin`/`--zstdin` are ignored if `--prepipe` is also specified.
|
||||
|
||||
--bz2in Uncompress bzip2 within the Miller process. Done by
|
||||
default if file ends in `.bz2`.
|
||||
|
|
@ -344,6 +344,8 @@ decisions that might have been made based on the file suffix. Likewise,
|
|||
`.mlrrc`.
|
||||
--prepipe-zcat Same as `--prepipe zcat`, except this is allowed in
|
||||
`.mlrrc`.
|
||||
--prepipe-zstdcat Same as `--prepipe zstdcat`, except this is allowed
|
||||
in `.mlrrc`.
|
||||
--prepipex {decompression command}
|
||||
Like `--prepipe` with one exception: doesn't insert
|
||||
`<` between command and filename at runtime. Useful
|
||||
|
|
@ -352,6 +354,8 @@ decisions that might have been made based on the file suffix. Likewise,
|
|||
in `.mlrrc` to avoid unexpected code execution.
|
||||
--zin Uncompress zlib within the Miller process. Done by
|
||||
default if file ends in `.z`.
|
||||
--zstdin Uncompress zstd within the Miller process. Done by
|
||||
default if file ends in `.zst`.
|
||||
.fi
|
||||
.if n \{\
|
||||
.RE
|
||||
|
|
|
|||
1
test/cases/io-compressed-input/0014/cmd
Normal file
1
test/cases/io-compressed-input/0014/cmd
Normal file
|
|
@ -0,0 +1 @@
|
|||
mlr count -g a test/input/medium.zst
|
||||
0
test/cases/io-compressed-input/0014/experr
Normal file
0
test/cases/io-compressed-input/0014/experr
Normal file
5
test/cases/io-compressed-input/0014/expout
Normal file
5
test/cases/io-compressed-input/0014/expout
Normal file
|
|
@ -0,0 +1,5 @@
|
|||
a=pan,count=8
|
||||
a=eks,count=10
|
||||
a=wye,count=7
|
||||
a=zee,count=8
|
||||
a=hat,count=7
|
||||
1
test/cases/io-compressed-input/0015/cmd
Normal file
1
test/cases/io-compressed-input/0015/cmd
Normal file
|
|
@ -0,0 +1 @@
|
|||
mlr --zstdin count -g a < test/input/medium.zst
|
||||
0
test/cases/io-compressed-input/0015/experr
Normal file
0
test/cases/io-compressed-input/0015/experr
Normal file
5
test/cases/io-compressed-input/0015/expout
Normal file
5
test/cases/io-compressed-input/0015/expout
Normal file
|
|
@ -0,0 +1,5 @@
|
|||
a=pan,count=8
|
||||
a=eks,count=10
|
||||
a=wye,count=7
|
||||
a=zee,count=8
|
||||
a=hat,count=7
|
||||
1
test/cases/io-compressed-input/0016/cmd
Normal file
1
test/cases/io-compressed-input/0016/cmd
Normal file
|
|
@ -0,0 +1 @@
|
|||
mlr --zstdin count -g a test/input/medium.zst
|
||||
0
test/cases/io-compressed-input/0016/experr
Normal file
0
test/cases/io-compressed-input/0016/experr
Normal file
5
test/cases/io-compressed-input/0016/expout
Normal file
5
test/cases/io-compressed-input/0016/expout
Normal file
|
|
@ -0,0 +1,5 @@
|
|||
a=pan,count=8
|
||||
a=eks,count=10
|
||||
a=wye,count=7
|
||||
a=zee,count=8
|
||||
a=hat,count=7
|
||||
BIN
test/input/medium.zst
Normal file
BIN
test/input/medium.zst
Normal file
Binary file not shown.