Make TSV finally true TSV (#923)

* Spec-TSV

* doc mods; more test cases
John Kerl 2022-02-06 00:13:55 -05:00 committed by GitHub
parent ac47c7052a
commit 66c4a077fd
30 changed files with 705 additions and 139 deletions

.vimrc
View file

@ -1,4 +1,5 @@
map \d :w<C-m>:!clear;echo Building ...; echo; make mlr<C-m>
map \f :w<C-m>:!clear;echo Building ...; echo; make ut<C-m>
map \r :w<C-m>:!clear;echo Building ...; echo; make ut-scan ut-mlv<C-m>
"map \r :w<C-m>:!clear;echo Building ...; echo; make ut-scan ut-mlv<C-m>
map \r :w<C-m>:!clear;echo Building ...; echo; make ut-lib<C-m>
map \t :w<C-m>:!clear;go test github.com/johnkerl/miller/internal/pkg/transformers/...<C-m>

View file

@ -104,36 +104,34 @@ NIDX: implicitly numerically indexed (Unix-toolkit style)
When `mlr` is invoked with the `--csv` or `--csvlite` option, key names are found on the first record and values are taken from subsequent records. This includes the case of CSV-formatted files. See [Record Heterogeneity](record-heterogeneity.md) for how Miller handles changes of field names within a single data stream.
Miller has record separator `RS` and field separator `FS`, just as `awk` does. For TSV, use `--fs tab`; to convert TSV to CSV, use `--ifs tab --ofs comma`, etc. (See also the [separators page](reference-main-separators.md).)
Miller has record separator `RS` and field separator `FS`, just as `awk` does. (See also the [separators page](reference-main-separators.md).)
**TSV (tab-separated values):** the following are synonymous pairs:
**TSV (tab-separated values):** `FS` is tab and `RS` is newline (or carriage return + linefeed for
Windows). On input, if fields have `\r`, `\n`, `\t`, or `\\`, those are decoded as carriage return,
newline, tab, and backslash, respectively. On output, the reverse is done -- for example, if a field
has an embedded newline, that newline is replaced by `\n`.
* `--tsv` and `--csv --fs tab`
* `--itsv` and `--icsv --ifs tab`
* `--otsv` and `--ocsv --ofs tab`
* `--tsvlite` and `--csvlite --fs tab`
* `--itsvlite` and `--icsvlite --ifs tab`
* `--otsvlite` and `--ocsvlite --ofs tab`
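To make the escape/unescape rule above concrete, here is a minimal Go sketch -- illustration only, not Miller's internal code; the commit's actual implementation is `TSVEncodeField`/`TSVDecodeField` in `internal/pkg/lib`, shown further down in this diff:

```go
package main

import (
	"fmt"
	"strings"
)

// Encoding: a literal backslash, CR, LF, or tab in a field value becomes
// \\, \r, \n, or \t so the value fits on a single TSV line.
var tsvEncode = strings.NewReplacer("\\", `\\`, "\r", `\r`, "\n", `\n`, "\t", `\t`)

// Decoding: the reverse mapping. `\\` is listed first so that the input
// backslash-backslash-n decodes to a backslash followed by the letter n,
// not to a backslash followed by a newline.
var tsvDecode = strings.NewReplacer(`\\`, "\\", `\r`, "\r", `\n`, "\n", `\t`, "\t")

func main() {
	field := "line one\nline two\thas a tab"
	encoded := tsvEncode.Replace(field)
	fmt.Println(encoded)                             // line one\nline two\thas a tab
	fmt.Println(tsvDecode.Replace(encoded) == field) // true: the round trip is lossless
}
```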
**ASV (ASCII-separated values):** the flags `--asv`, `--iasv`, `--oasv`, `--asvlite`, `--iasvlite`, and `--oasvlite` are analogous except they use ASCII FS and RS `0x1f` and `0x1e`, respectively.
**ASV (ASCII-separated values):** the flags `--asv`, `--iasv`, `--oasv`, `--asvlite`, `--iasvlite`, and `--oasvlite` are analogous except they use ASCII FS and RS 0x1f and 0x1e, respectively.
**USV (Unicode-separated values):** likewise, the flags `--usv`, `--iusv`, `--ousv`, `--usvlite`, `--iusvlite`, and `--ousvlite` use Unicode FS and RS U+241F (UTF-8 0xe2909f) and U+241E (UTF-8 0xe2909e), respectively.
**USV (Unicode-separated values):** likewise, the flags `--usv`, `--iusv`, `--ousv`, `--usvlite`, `--iusvlite`, and `--ousvlite` use Unicode FS and RS `U+241F` (UTF-8 `0xe2909f`) and `U+241E` (UTF-8 `0xe2909e`), respectively.
Miller's `--csv` flag supports [RFC-4180 CSV](https://tools.ietf.org/html/rfc4180). This includes CRLF line-terminators by default, regardless of platform.
Here are the differences between CSV and CSV-lite:
* CSV-lite naively splits lines on newline, and fields on comma -- embedded commas and newlines are not escaped in any way.
* CSV supports [RFC-4180](https://tools.ietf.org/html/rfc4180)-style double-quoting, including the ability to have commas and/or LF/CRLF line-endings contained within an input field; CSV-lite does not.
* CSV does not allow heterogeneous data; CSV-lite does (see also [Record Heterogeneity](record-heterogeneity.md)).
* The CSV-lite input-reading code is fractionally more efficient than the CSV input-reader.
* TSV-lite is simply CSV-lite with field separator set to tab instead of comma.
Here are things they have in common:
* CSV-lite allows changing FS and/or RS to any values, perhaps multi-character.
* The ability to specify record/field separators other than the default, e.g. CR-LF vs. LF, or tab instead of comma for TSV, and so on.
* In short, use-cases for CSV-lite and TSV-lite are often found when dealing with CSV/TSV files which are formatted in some non-standard way -- you have a little more flexibility available to you. (As an example of this flexibility: ASV and USV are nothing more than CSV-lite with different values for FS and RS.)
* The `--implicit-csv-header` flag for input and the `--headerless-csv-output` flag for output.
CSV, TSV, CSV-lite, and TSV-lite have in common the `--implicit-csv-header` flag for input and the `--headerless-csv-output` flag for output.
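To illustrate the quoting difference listed above -- using Go's standard `encoding/csv` for the RFC-4180 side and a naive split for the lite side; this is not Miller's reader code:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

func main() {
	line := `a,"hello, world",c`

	// CSV-lite-style naive split: the comma inside the quotes is treated
	// as a field separator, yielding 4 pieces.
	naive := strings.Split(line, ",")
	fmt.Println(len(naive), naive) // 4 [a "hello  world" c]

	// RFC-4180 parsing: the quoted comma stays inside one field.
	rfc, err := csv.NewReader(strings.NewReader(line)).Read()
	if err != nil {
		panic(err)
	}
	fmt.Println(len(rfc), rfc) // 3 [a hello, world c]
}
```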
## JSON

View file

@ -16,36 +16,34 @@ GENMD-EOF
When `mlr` is invoked with the `--csv` or `--csvlite` option, key names are found on the first record and values are taken from subsequent records. This includes the case of CSV-formatted files. See [Record Heterogeneity](record-heterogeneity.md) for how Miller handles changes of field names within a single data stream.
Miller has record separator `RS` and field separator `FS`, just as `awk` does. For TSV, use `--fs tab`; to convert TSV to CSV, use `--ifs tab --ofs comma`, etc. (See also the [separators page](reference-main-separators.md).)
Miller has record separator `RS` and field separator `FS`, just as `awk` does. (See also the [separators page](reference-main-separators.md).)
**TSV (tab-separated values):** the following are synonymous pairs:
**TSV (tab-separated values):** `FS` is tab and `RS` is newline (or carriage return + linefeed for
Windows). On input, if fields have `\r`, `\n`, `\t`, or `\\`, those are decoded as carriage return,
newline, tab, and backslash, respectively. On output, the reverse is done -- for example, if a field
has an embedded newline, that newline is replaced by `\n`.
* `--tsv` and `--csv --fs tab`
* `--itsv` and `--icsv --ifs tab`
* `--otsv` and `--ocsv --ofs tab`
* `--tsvlite` and `--csvlite --fs tab`
* `--itsvlite` and `--icsvlite --ifs tab`
* `--otsvlite` and `--ocsvlite --ofs tab`
**ASV (ASCII-separated values):** the flags `--asv`, `--iasv`, `--oasv`, `--asvlite`, `--iasvlite`, and `--oasvlite` are analogous except they use ASCII FS and RS `0x1f` and `0x1e`, respectively.
**ASV (ASCII-separated values):** the flags `--asv`, `--iasv`, `--oasv`, `--asvlite`, `--iasvlite`, and `--oasvlite` are analogous except they use ASCII FS and RS 0x1f and 0x1e, respectively.
**USV (Unicode-separated values):** likewise, the flags `--usv`, `--iusv`, `--ousv`, `--usvlite`, `--iusvlite`, and `--ousvlite` use Unicode FS and RS U+241F (UTF-8 0xe2909f) and U+241E (UTF-8 0xe2909e), respectively.
**USV (Unicode-separated values):** likewise, the flags `--usv`, `--iusv`, `--ousv`, `--usvlite`, `--iusvlite`, and `--ousvlite` use Unicode FS and RS `U+241F` (UTF-8 `0xe2909f`) and `U+241E` (UTF-8 `0xe2909e`), respectively.
Miller's `--csv` flag supports [RFC-4180 CSV](https://tools.ietf.org/html/rfc4180). This includes CRLF line-terminators by default, regardless of platform.
Here are the differences between CSV and CSV-lite:
* CSV-lite naively splits lines on newline, and fields on comma -- embedded commas and newlines are not escaped in any way.
* CSV supports [RFC-4180](https://tools.ietf.org/html/rfc4180)-style double-quoting, including the ability to have commas and/or LF/CRLF line-endings contained within an input field; CSV-lite does not.
* CSV does not allow heterogeneous data; CSV-lite does (see also [Record Heterogeneity](record-heterogeneity.md)).
* The CSV-lite input-reading code is fractionally more efficient than the CSV input-reader.
* TSV-lite is simply CSV-lite with field separator set to tab instead of comma.
Here are things they have in common:
* CSV-lite allows changing FS and/or RS to any values, perhaps multi-character.
* The ability to specify record/field separators other than the default, e.g. CR-LF vs. LF, or tab instead of comma for TSV, and so on.
* In short, use-cases for CSV-lite and TSV-lite are often found when dealing with CSV/TSV files which are formatted in some non-standard way -- you have a little more flexibility available to you. (As an example of this flexibility: ASV and USV are nothing more than CSV-lite with different values for FS and RS.)
* The `--implicit-csv-header` flag for input and the `--headerless-csv-output` flag for output.
CSV, TSV, CSV-lite, and TSV-lite have in common the `--implicit-csv-header` flag for input and the `--headerless-csv-output` flag for output.
## JSON

View file

@ -92,11 +92,11 @@ If there's more than one input file, you can use `--mfrom`, then however many fi
The following have even shorter versions:
* `-c` is the same as `--csv`
* `-t` is the same as `--tsvlite`
* `-t` is the same as `--tsv`
* `-j` is the same as `--json`
I don't use these within these documents, since I want the docs to be self-explanatory on every page, and
I think `mlr --csv ...` explains itself better than `mlr -c ...`. Nonetheless, they're there for you to use.
I think `mlr --csv ...` explains itself better than `mlr -c ...`. Nonetheless, they're always there for you to use.
## .mlrrc file

View file

@ -37,11 +37,11 @@ GENMD-EOF
The following have even shorter versions:
* `-c` is the same as `--csv`
* `-t` is the same as `--tsvlite`
* `-t` is the same as `--tsv`
* `-j` is the same as `--json`
I don't use these within these documents, since I want the docs to be self-explanatory on every page, and
I think `mlr --csv ...` explains itself better than `mlr -c ...`. Nonetheless, they're there for you to use.
I think `mlr --csv ...` explains itself better than `mlr -c ...`. Nonetheless, they're always there for you to use.
## .mlrrc file

View file

@ -386,7 +386,7 @@ FILE-FORMAT FLAGS
--oxtab Use XTAB format for output data.
--pprint Use PPRINT format for input and output data.
--tsv Use TSV format for input and output data.
--tsvlite or -t Use TSV-lite format for input and output data.
--tsv or -t Use TSV-lite format for input and output data.
--usv or --usvlite Use USV format for input and output data.
--xtab Use XTAB format for input and output data.
-i {format name} Use format name for input data. For example: `-i csv`
@ -708,7 +708,6 @@ SEPARATOR FLAGS
alignment impossible.
* OPS may be multi-character for XTAB format, in which case alignment is
disabled.
* TSV is simply CSV using tab as field separator (`--fs tab`).
* FS/PS are ignored for markdown format; RS is used.
* All FS and PS options are ignored for JSON format, since they are not relevant
to the JSON format.
@ -763,6 +762,7 @@ SEPARATOR FLAGS
markdown " " N/A "\n"
nidx " " N/A "\n"
pprint " " N/A "\n"
tsv " " N/A "\n"
xtab "\n" " " "\n\n"
--fs {string} Specify FS for input and output.
@ -3157,5 +3157,5 @@ SEE ALSO
2022-02-05 MILLER(1)
2022-02-06 MILLER(1)
</pre>

View file

@ -365,7 +365,7 @@ FILE-FORMAT FLAGS
--oxtab Use XTAB format for output data.
--pprint Use PPRINT format for input and output data.
--tsv Use TSV format for input and output data.
--tsvlite or -t Use TSV-lite format for input and output data.
--tsv or -t Use TSV-lite format for input and output data.
--usv or --usvlite Use USV format for input and output data.
--xtab Use XTAB format for input and output data.
-i {format name} Use format name for input data. For example: `-i csv`
@ -687,7 +687,6 @@ SEPARATOR FLAGS
alignment impossible.
* OPS may be multi-character for XTAB format, in which case alignment is
disabled.
* TSV is simply CSV using tab as field separator (`--fs tab`).
* FS/PS are ignored for markdown format; RS is used.
* All FS and PS options are ignored for JSON format, since they are not relevant
to the JSON format.
@ -742,6 +741,7 @@ SEPARATOR FLAGS
markdown " " N/A "\n"
nidx " " N/A "\n"
pprint " " N/A "\n"
tsv " " N/A "\n"
xtab "\n" " " "\n\n"
--fs {string} Specify FS for input and output.
@ -3136,4 +3136,4 @@ SEE ALSO
2022-02-05 MILLER(1)
2022-02-06 MILLER(1)

View file

@ -177,7 +177,7 @@ are overridden in all cases by setting output format to `format2`.
* `--oxtab`: Use XTAB format for output data.
* `--pprint`: Use PPRINT format for input and output data.
* `--tsv`: Use TSV format for input and output data.
* `--tsvlite or -t`: Use TSV-lite format for input and output data.
* `--tsv`: Use TSV format for input and output data.
* `--usv or --usvlite`: Use USV format for input and output data.
* `--xtab`: Use XTAB format for input and output data.
* `-i {format name}`: Use format name for input data. For example: `-i csv` is the same as `--icsv`.
@ -405,7 +405,6 @@ Notes about all other separators:
alignment impossible.
* OPS may be multi-character for XTAB format, in which case alignment is
disabled.
* TSV is simply CSV using tab as field separator (`--fs tab`).
* FS/PS are ignored for markdown format; RS is used.
* All FS and PS options are ignored for JSON format, since they are not relevant
to the JSON format.
@ -460,6 +459,7 @@ Notes about all other separators:
markdown " " N/A "\n"
nidx " " N/A "\n"
pprint " " N/A "\n"
tsv " " N/A "\n"
xtab "\n" " " "\n\n"

View file

@ -261,8 +261,9 @@ a:4;b:5;c:6;d:>>>,|||;<<<
Notes:
* If CSV field separator is tab, we have TSV; see more examples (ASV, USV, etc.) in the [CSV section](file-formats.md#csvtsvasvusvetc).
* CSV IRS and ORS must be newline, and CSV IFS must be a single character. (CSV-lite does not have these restrictions.)
* TSV IRS and ORS must be newline, and TSV IFS must be a tab. (TSV-lite does not have these restrictions.)
* See the [CSV section](file-formats.md#csvtsvasvusvetc) for information about ASV and USV.
* JSON: ignores all separator flags from the command line.
* Headerless CSV overlaps quite a bit with NIDX format using comma for IFS. See also the page on [CSV with and without headers](csv-with-and-without-headers.md).
* For XTAB, the record separator is a repetition of the field separator. For example, if one record has `x=1,y=2` and the next has `x=3,y=4`, and OFS is newline, then output lines are `x 1`, then `y 2`, then an extra newline, then `x 3`, then `y 4`. This means: to customize XTAB, set `OFS` rather than `ORS`.

View file

@ -151,8 +151,9 @@ GENMD-EOF
Notes:
* If CSV field separator is tab, we have TSV; see more examples (ASV, USV, etc.) in the [CSV section](file-formats.md#csvtsvasvusvetc).
* CSV IRS and ORS must be newline, and CSV IFS must be a single character. (CSV-lite does not have these restrictions.)
* TSV IRS and ORS must be newline, and TSV IFS must be a tab. (TSV-lite does not have these restrictions.)
* See the [CSV section](file-formats.md#csvtsvasvusvetc) for information about ASV and USV.
* JSON: ignores all separator flags from the command line.
* Headerless CSV overlaps quite a bit with NIDX format using comma for IFS. See also the page on [CSV with and without headers](csv-with-and-without-headers.md).
* For XTAB, the record separator is a repetition of the field separator. For example, if one record has `x=1,y=2` and the next has `x=3,y=4`, and OFS is newline, then output lines are `x 1`, then `y 2`, then an extra newline, then `x 3`, then `y 4`. This means: to customize XTAB, set `OFS` rather than `ORS`.

View file

@ -147,7 +147,6 @@ Notes about all other separators:
alignment impossible.
* OPS may be multi-character for XTAB format, in which case alignment is
disabled.
* TSV is simply CSV using tab as field separator (` + "`--fs tab`" + `).
* FS/PS are ignored for markdown format; RS is used.
* All FS and PS options are ignored for JSON format, since they are not relevant
to the JSON format.
@ -629,9 +628,7 @@ var FileFormatFlagSection = FlagSection{
name: "--itsv",
help: "Use TSV format for input data.",
parser: func(args []string, argc int, pargi *int, options *TOptions) {
options.ReaderOptions.InputFileFormat = "csv"
options.ReaderOptions.IFS = "\t"
options.ReaderOptions.ifsWasSpecified = true
options.ReaderOptions.InputFileFormat = "tsv"
*pargi += 1
},
},
@ -824,7 +821,7 @@ var FileFormatFlagSection = FlagSection{
name: "--otsv",
help: "Use TSV format for output data.",
parser: func(args []string, argc int, pargi *int, options *TOptions) {
options.WriterOptions.OutputFileFormat = "csv"
options.WriterOptions.OutputFileFormat = "tsv"
options.WriterOptions.OFS = "\t"
options.WriterOptions.ofsWasSpecified = true
*pargi += 1
@ -981,27 +978,19 @@ var FileFormatFlagSection = FlagSection{
name: "--tsv",
help: "Use TSV format for input and output data.",
parser: func(args []string, argc int, pargi *int, options *TOptions) {
options.ReaderOptions.InputFileFormat = "csv"
options.WriterOptions.OutputFileFormat = "csv"
options.ReaderOptions.IFS = "\t"
options.WriterOptions.OFS = "\t"
options.ReaderOptions.ifsWasSpecified = true
options.WriterOptions.ofsWasSpecified = true
options.ReaderOptions.InputFileFormat = "tsv"
options.WriterOptions.OutputFileFormat = "tsv"
*pargi += 1
},
},
{
name: "--tsvlite",
name: "--tsv",
help: "Use TSV-lite format for input and output data.",
altNames: []string{"-t"},
parser: func(args []string, argc int, pargi *int, options *TOptions) {
options.ReaderOptions.InputFileFormat = "csvlite"
options.WriterOptions.OutputFileFormat = "csvlite"
options.ReaderOptions.IFS = "\t"
options.WriterOptions.OFS = "\t"
options.ReaderOptions.ifsWasSpecified = true
options.WriterOptions.ofsWasSpecified = true
options.ReaderOptions.InputFileFormat = "tsv"
options.WriterOptions.OutputFileFormat = "tsv"
*pargi += 1
},
},
@ -1181,11 +1170,8 @@ var FormatConversionKeystrokeSaverFlagSection = FlagSection{
suppressFlagEnumeration: true,
parser: func(args []string, argc int, pargi *int, options *TOptions) {
options.ReaderOptions.InputFileFormat = "csv"
options.WriterOptions.OutputFileFormat = "csv"
options.WriterOptions.OFS = "\t"
options.WriterOptions.OutputFileFormat = "tsv"
options.ReaderOptions.irsWasSpecified = true
options.WriterOptions.ofsWasSpecified = true
options.WriterOptions.orsWasSpecified = true
*pargi += 1
},
},
@ -1308,12 +1294,8 @@ var FormatConversionKeystrokeSaverFlagSection = FlagSection{
// need to print a tedious 60-line list.
suppressFlagEnumeration: true,
parser: func(args []string, argc int, pargi *int, options *TOptions) {
options.ReaderOptions.InputFileFormat = "csv"
options.ReaderOptions.IFS = "\t"
options.ReaderOptions.InputFileFormat = "tsv"
options.WriterOptions.OutputFileFormat = "csv"
options.ReaderOptions.ifsWasSpecified = true
options.ReaderOptions.irsWasSpecified = true
options.WriterOptions.orsWasSpecified = true
*pargi += 1
},
},
@ -1324,11 +1306,8 @@ var FormatConversionKeystrokeSaverFlagSection = FlagSection{
// need to print a tedious 60-line list.
suppressFlagEnumeration: true,
parser: func(args []string, argc int, pargi *int, options *TOptions) {
options.ReaderOptions.InputFileFormat = "csv"
options.ReaderOptions.IFS = "\t"
options.ReaderOptions.InputFileFormat = "tsv"
options.WriterOptions.OutputFileFormat = "dkvp"
options.ReaderOptions.ifsWasSpecified = true
options.ReaderOptions.irsWasSpecified = true
*pargi += 1
},
},
@ -1339,12 +1318,9 @@ var FormatConversionKeystrokeSaverFlagSection = FlagSection{
// need to print a tedious 60-line list.
suppressFlagEnumeration: true,
parser: func(args []string, argc int, pargi *int, options *TOptions) {
options.ReaderOptions.InputFileFormat = "csv"
options.ReaderOptions.IFS = "\t"
options.ReaderOptions.InputFileFormat = "tsv"
options.WriterOptions.OutputFileFormat = "nidx"
options.WriterOptions.OFS = " "
options.ReaderOptions.ifsWasSpecified = true
options.ReaderOptions.irsWasSpecified = true
options.WriterOptions.ofsWasSpecified = true
*pargi += 1
},
@ -1356,13 +1332,10 @@ var FormatConversionKeystrokeSaverFlagSection = FlagSection{
// need to print a tedious 60-line list.
suppressFlagEnumeration: true,
parser: func(args []string, argc int, pargi *int, options *TOptions) {
options.ReaderOptions.InputFileFormat = "csv"
options.ReaderOptions.IFS = "\t"
options.ReaderOptions.InputFileFormat = "tsv"
options.WriterOptions.OutputFileFormat = "json"
options.WriterOptions.WrapJSONOutputInOuterList = true
options.WriterOptions.JSONOutputMultiline = true
options.ReaderOptions.ifsWasSpecified = true
options.ReaderOptions.irsWasSpecified = true
*pargi += 1
},
},
@ -1373,13 +1346,10 @@ var FormatConversionKeystrokeSaverFlagSection = FlagSection{
// need to print a tedious 60-line list.
suppressFlagEnumeration: true,
parser: func(args []string, argc int, pargi *int, options *TOptions) {
options.ReaderOptions.InputFileFormat = "csv"
options.ReaderOptions.IFS = "\t"
options.ReaderOptions.InputFileFormat = "tsv"
options.WriterOptions.OutputFileFormat = "json"
options.WriterOptions.WrapJSONOutputInOuterList = false
options.WriterOptions.JSONOutputMultiline = false
options.ReaderOptions.ifsWasSpecified = true
options.ReaderOptions.irsWasSpecified = true
*pargi += 1
},
},
@ -1390,11 +1360,8 @@ var FormatConversionKeystrokeSaverFlagSection = FlagSection{
// need to print a tedious 60-line list.
suppressFlagEnumeration: true,
parser: func(args []string, argc int, pargi *int, options *TOptions) {
options.ReaderOptions.InputFileFormat = "csv"
options.ReaderOptions.IFS = "\t"
options.ReaderOptions.InputFileFormat = "tsv"
options.WriterOptions.OutputFileFormat = "pprint"
options.ReaderOptions.ifsWasSpecified = true
options.ReaderOptions.irsWasSpecified = true
*pargi += 1
},
},
@ -1405,12 +1372,9 @@ var FormatConversionKeystrokeSaverFlagSection = FlagSection{
// need to print a tedious 60-line list.
suppressFlagEnumeration: true,
parser: func(args []string, argc int, pargi *int, options *TOptions) {
options.ReaderOptions.InputFileFormat = "csv"
options.ReaderOptions.IFS = "\t"
options.ReaderOptions.InputFileFormat = "tsv"
options.WriterOptions.OutputFileFormat = "pprint"
options.WriterOptions.BarredPprintOutput = true
options.ReaderOptions.ifsWasSpecified = true
options.ReaderOptions.irsWasSpecified = true
*pargi += 1
},
},
@ -1421,11 +1385,8 @@ var FormatConversionKeystrokeSaverFlagSection = FlagSection{
// need to print a tedious 60-line list.
suppressFlagEnumeration: true,
parser: func(args []string, argc int, pargi *int, options *TOptions) {
options.ReaderOptions.InputFileFormat = "csv"
options.ReaderOptions.IFS = "\t"
options.ReaderOptions.InputFileFormat = "tsv"
options.WriterOptions.OutputFileFormat = "xtab"
options.ReaderOptions.ifsWasSpecified = true
options.ReaderOptions.irsWasSpecified = true
*pargi += 1
},
},
@ -1436,11 +1397,8 @@ var FormatConversionKeystrokeSaverFlagSection = FlagSection{
// need to print a tedious 60-line list.
suppressFlagEnumeration: true,
parser: func(args []string, argc int, pargi *int, options *TOptions) {
options.ReaderOptions.InputFileFormat = "csv"
options.ReaderOptions.IFS = "\t"
options.ReaderOptions.InputFileFormat = "tsv"
options.WriterOptions.OutputFileFormat = "markdown"
options.ReaderOptions.ifsWasSpecified = true
options.ReaderOptions.irsWasSpecified = true
*pargi += 1
},
},
@ -1465,7 +1423,7 @@ var FormatConversionKeystrokeSaverFlagSection = FlagSection{
suppressFlagEnumeration: true,
parser: func(args []string, argc int, pargi *int, options *TOptions) {
options.ReaderOptions.InputFileFormat = "dkvp"
options.WriterOptions.OutputFileFormat = "csv"
options.WriterOptions.OutputFileFormat = "tsv"
options.WriterOptions.OFS = "\t"
options.WriterOptions.ofsWasSpecified = true
options.WriterOptions.orsWasSpecified = true
@ -1585,10 +1543,7 @@ var FormatConversionKeystrokeSaverFlagSection = FlagSection{
suppressFlagEnumeration: true,
parser: func(args []string, argc int, pargi *int, options *TOptions) {
options.ReaderOptions.InputFileFormat = "nidx"
options.WriterOptions.OutputFileFormat = "csv"
options.WriterOptions.OFS = "\t"
options.WriterOptions.ofsWasSpecified = true
options.WriterOptions.orsWasSpecified = true
options.WriterOptions.OutputFileFormat = "tsv"
*pargi += 1
},
},
@ -1703,10 +1658,7 @@ var FormatConversionKeystrokeSaverFlagSection = FlagSection{
suppressFlagEnumeration: true,
parser: func(args []string, argc int, pargi *int, options *TOptions) {
options.ReaderOptions.InputFileFormat = "json"
options.WriterOptions.OutputFileFormat = "csv"
options.WriterOptions.OFS = "\t"
options.WriterOptions.ofsWasSpecified = true
options.WriterOptions.orsWasSpecified = true
options.WriterOptions.OutputFileFormat = "tsv"
*pargi += 1
},
},
@ -1805,10 +1757,7 @@ var FormatConversionKeystrokeSaverFlagSection = FlagSection{
suppressFlagEnumeration: true,
parser: func(args []string, argc int, pargi *int, options *TOptions) {
options.ReaderOptions.InputFileFormat = "json"
options.WriterOptions.OutputFileFormat = "csv"
options.WriterOptions.OFS = "\t"
options.WriterOptions.ofsWasSpecified = true
options.WriterOptions.orsWasSpecified = true
options.WriterOptions.OutputFileFormat = "tsv"
*pargi += 1
},
},
@ -1910,11 +1859,8 @@ var FormatConversionKeystrokeSaverFlagSection = FlagSection{
parser: func(args []string, argc int, pargi *int, options *TOptions) {
options.ReaderOptions.InputFileFormat = "pprint"
options.ReaderOptions.IFS = " "
options.WriterOptions.OutputFileFormat = "csv"
options.WriterOptions.OFS = "\t"
options.WriterOptions.OutputFileFormat = "tsv"
options.ReaderOptions.ifsWasSpecified = true
options.WriterOptions.ofsWasSpecified = true
options.WriterOptions.orsWasSpecified = true
*pargi += 1
},
},
@ -2028,10 +1974,7 @@ var FormatConversionKeystrokeSaverFlagSection = FlagSection{
suppressFlagEnumeration: true,
parser: func(args []string, argc int, pargi *int, options *TOptions) {
options.ReaderOptions.InputFileFormat = "xtab"
options.WriterOptions.OutputFileFormat = "csv"
options.WriterOptions.OFS = "\t"
options.WriterOptions.ofsWasSpecified = true
options.WriterOptions.orsWasSpecified = true
options.WriterOptions.OutputFileFormat = "tsv"
*pargi += 1
},
},

View file

@ -89,6 +89,7 @@ var defaultFSes = map[string]string{
"nidx": " ",
"markdown": " ",
"pprint": " ",
"tsv": "\t",
"xtab": "\n", // todo: windows-dependent ...
}
@ -100,6 +101,7 @@ var defaultPSes = map[string]string{
"markdown": "N/A",
"nidx": "N/A",
"pprint": "N/A",
"tsv": "N/A",
"xtab": " ",
}
@ -111,6 +113,7 @@ var defaultRSes = map[string]string{
"markdown": "\n",
"nidx": "\n",
"pprint": "\n",
"tsv": "\n",
"xtab": "\n\n", // todo: maybe jettison the idea of this being alterable
}
@ -122,5 +125,6 @@ var defaultAllowRepeatIFSes = map[string]bool{
"markdown": false,
"nidx": false,
"pprint": true,
"tsv": false,
"xtab": false,
}

View file

@ -20,6 +20,8 @@ func Create(readerOptions *cli.TReaderOptions, recordsPerBatch int64) (IRecordRe
return NewRecordReaderNIDX(readerOptions, recordsPerBatch)
case "pprint":
return NewRecordReaderPPRINT(readerOptions, recordsPerBatch)
case "tsv":
return NewRecordReaderTSV(readerOptions, recordsPerBatch)
case "xtab":
return NewRecordReaderXTAB(readerOptions, recordsPerBatch)
case "gen":

View file

@ -0,0 +1,378 @@
package input
import (
"container/list"
"fmt"
"io"
"strconv"
"strings"
"github.com/johnkerl/miller/internal/pkg/cli"
"github.com/johnkerl/miller/internal/pkg/lib"
"github.com/johnkerl/miller/internal/pkg/mlrval"
"github.com/johnkerl/miller/internal/pkg/types"
)
// recordBatchGetterTSV points to either an explicit-TSV-header or
// implicit-TSV-header record-batch getter.
type recordBatchGetterTSV func(
reader *RecordReaderTSV,
linesChannel <-chan *list.List,
filename string,
context *types.Context,
errorChannel chan error,
) (
recordsAndContexts *list.List,
eof bool,
)
type RecordReaderTSV struct {
readerOptions *cli.TReaderOptions
recordsPerBatch int64 // distinct from readerOptions.RecordsPerBatch for join/repl
fieldSplitter iFieldSplitter
recordBatchGetter recordBatchGetterTSV
inputLineNumber int64
headerStrings []string
}
func NewRecordReaderTSV(
readerOptions *cli.TReaderOptions,
recordsPerBatch int64,
) (*RecordReaderTSV, error) {
if readerOptions.IFS != "\t" {
return nil, fmt.Errorf("for TSV, IFS cannot be altered")
}
if readerOptions.IRS != "\n" && readerOptions.IRS != "\r\n" {
return nil, fmt.Errorf("for TSV, IRS cannot be altered; LF vs CR/LF is autodetected")
}
reader := &RecordReaderTSV{
readerOptions: readerOptions,
recordsPerBatch: recordsPerBatch,
fieldSplitter: newFieldSplitter(readerOptions),
}
if reader.readerOptions.UseImplicitCSVHeader {
reader.recordBatchGetter = getRecordBatchImplicitTSVHeader
} else {
reader.recordBatchGetter = getRecordBatchExplicitTSVHeader
}
return reader, nil
}
func (reader *RecordReaderTSV) Read(
filenames []string,
context types.Context,
readerChannel chan<- *list.List, // list of *types.RecordAndContext
errorChannel chan error,
downstreamDoneChannel <-chan bool, // for mlr head
) {
if filenames != nil { // nil for mlr -n
if len(filenames) == 0 { // read from stdin
handle, err := lib.OpenStdin(
reader.readerOptions.Prepipe,
reader.readerOptions.PrepipeIsRaw,
reader.readerOptions.FileInputEncoding,
)
if err != nil {
errorChannel <- err
return
}
reader.processHandle(
handle,
"(stdin)",
&context,
readerChannel,
errorChannel,
downstreamDoneChannel,
)
} else {
for _, filename := range filenames {
handle, err := lib.OpenFileForRead(
filename,
reader.readerOptions.Prepipe,
reader.readerOptions.PrepipeIsRaw,
reader.readerOptions.FileInputEncoding,
)
if err != nil {
errorChannel <- err
return
}
reader.processHandle(
handle,
filename,
&context,
readerChannel,
errorChannel,
downstreamDoneChannel,
)
handle.Close()
}
}
}
readerChannel <- types.NewEndOfStreamMarkerList(&context)
}
func (reader *RecordReaderTSV) processHandle(
handle io.Reader,
filename string,
context *types.Context,
readerChannel chan<- *list.List, // list of *types.RecordAndContext
errorChannel chan error,
downstreamDoneChannel <-chan bool, // for mlr head
) {
context.UpdateForStartOfFile(filename)
reader.inputLineNumber = 0
reader.headerStrings = nil
recordsPerBatch := reader.recordsPerBatch
lineScanner := NewLineScanner(handle, reader.readerOptions.IRS)
linesChannel := make(chan *list.List, recordsPerBatch)
go channelizedLineScanner(lineScanner, linesChannel, downstreamDoneChannel, recordsPerBatch)
for {
recordsAndContexts, eof := reader.recordBatchGetter(reader, linesChannel, filename, context, errorChannel)
if recordsAndContexts.Len() > 0 {
readerChannel <- recordsAndContexts
}
if eof {
break
}
}
}
func getRecordBatchExplicitTSVHeader(
reader *RecordReaderTSV,
linesChannel <-chan *list.List,
filename string,
context *types.Context,
errorChannel chan error,
) (
recordsAndContexts *list.List,
eof bool,
) {
recordsAndContexts = list.New()
dedupeFieldNames := reader.readerOptions.DedupeFieldNames
lines, more := <-linesChannel
if !more {
return recordsAndContexts, true
}
for e := lines.Front(); e != nil; e = e.Next() {
line := e.Value.(string)
reader.inputLineNumber++
// Check for comments-in-data feature
// TODO: function-pointer this away
if reader.readerOptions.CommentHandling != cli.CommentsAreData {
if strings.HasPrefix(line, reader.readerOptions.CommentString) {
if reader.readerOptions.CommentHandling == cli.PassComments {
recordsAndContexts.PushBack(types.NewOutputString(line+"\n", context))
continue
} else if reader.readerOptions.CommentHandling == cli.SkipComments {
continue
}
// else comments are data
}
}
if line == "" {
// Reset to new schema
reader.headerStrings = nil
continue
}
fields := reader.fieldSplitter.Split(line)
if reader.headerStrings == nil {
reader.headerStrings = fields
// Get data lines on subsequent loop iterations
} else {
if !reader.readerOptions.AllowRaggedCSVInput && len(reader.headerStrings) != len(fields) {
err := fmt.Errorf(
"mlr: TSV header/data length mismatch %d != %d "+
"at filename %s line %d.\n",
len(reader.headerStrings), len(fields), filename, reader.inputLineNumber,
)
errorChannel <- err
return
}
record := mlrval.NewMlrmapAsRecord()
if !reader.readerOptions.AllowRaggedCSVInput {
for i, field := range fields {
field = lib.TSVDecodeField(field)
value := mlrval.FromDeferredType(field)
_, err := record.PutReferenceMaybeDedupe(reader.headerStrings[i], value, dedupeFieldNames)
if err != nil {
errorChannel <- err
return
}
}
} else {
nh := int64(len(reader.headerStrings))
nd := int64(len(fields))
n := lib.IntMin2(nh, nd)
var i int64
for i = 0; i < n; i++ {
field := lib.TSVDecodeField(fields[i])
value := mlrval.FromDeferredType(field)
_, err := record.PutReferenceMaybeDedupe(reader.headerStrings[i], value, dedupeFieldNames)
if err != nil {
errorChannel <- err
return
}
}
if nh < nd {
// if header shorter than data: use 1-up itoa keys
for i = nh; i < nd; i++ {
key := strconv.FormatInt(i+1, 10)
field := lib.TSVDecodeField(fields[i])
value := mlrval.FromDeferredType(field)
_, err := record.PutReferenceMaybeDedupe(key, value, dedupeFieldNames)
if err != nil {
errorChannel <- err
return
}
}
}
if nh > nd {
// if header longer than data: use "" values
for i = nd; i < nh; i++ {
record.PutCopy(reader.headerStrings[i], mlrval.VOID)
}
}
}
context.UpdateForInputRecord()
recordsAndContexts.PushBack(types.NewRecordAndContext(record, context))
}
}
return recordsAndContexts, false
}
func getRecordBatchImplicitTSVHeader(
reader *RecordReaderTSV,
linesChannel <-chan *list.List,
filename string,
context *types.Context,
errorChannel chan error,
) (
recordsAndContexts *list.List,
eof bool,
) {
recordsAndContexts = list.New()
dedupeFieldNames := reader.readerOptions.DedupeFieldNames
lines, more := <-linesChannel
if !more {
return recordsAndContexts, true
}
for e := lines.Front(); e != nil; e = e.Next() {
line := e.Value.(string)
reader.inputLineNumber++
// Check for comments-in-data feature
// TODO: function-pointer this away
if reader.readerOptions.CommentHandling != cli.CommentsAreData {
if strings.HasPrefix(line, reader.readerOptions.CommentString) {
if reader.readerOptions.CommentHandling == cli.PassComments {
recordsAndContexts.PushBack(types.NewOutputString(line+"\n", context))
continue
} else if reader.readerOptions.CommentHandling == cli.SkipComments {
continue
}
// else comments are data
}
}
// This is how to do a chomp:
line = strings.TrimRight(line, reader.readerOptions.IRS)
line = strings.TrimRight(line, "\r")
if line == "" {
// Reset to new schema
reader.headerStrings = nil
continue
}
fields := reader.fieldSplitter.Split(line)
if reader.headerStrings == nil {
n := len(fields)
reader.headerStrings = make([]string, n)
for i := 0; i < n; i++ {
reader.headerStrings[i] = strconv.Itoa(i + 1)
}
} else {
if !reader.readerOptions.AllowRaggedCSVInput && len(reader.headerStrings) != len(fields) {
err := fmt.Errorf(
"mlr: TSV header/data length mismatch %d != %d "+
"at filename %s line %d.\n",
len(reader.headerStrings), len(fields), filename, reader.inputLineNumber,
)
errorChannel <- err
return
}
}
record := mlrval.NewMlrmapAsRecord()
if !reader.readerOptions.AllowRaggedCSVInput {
for i, field := range fields {
field = lib.TSVDecodeField(field)
value := mlrval.FromDeferredType(field)
_, err := record.PutReferenceMaybeDedupe(reader.headerStrings[i], value, dedupeFieldNames)
if err != nil {
errorChannel <- err
return
}
}
} else {
nh := int64(len(reader.headerStrings))
nd := int64(len(fields))
n := lib.IntMin2(nh, nd)
var i int64
for i = 0; i < n; i++ {
field := lib.TSVDecodeField(fields[i])
value := mlrval.FromDeferredType(field)
_, err := record.PutReferenceMaybeDedupe(reader.headerStrings[i], value, dedupeFieldNames)
if err != nil {
errorChannel <- err
return
}
}
if nh < nd {
// if header shorter than data: use 1-up itoa keys
for i = nh; i < nd; i++ {
key := strconv.FormatInt(i+1, 10)
field := lib.TSVDecodeField(fields[i])
value := mlrval.FromDeferredType(field)
_, err := record.PutReferenceMaybeDedupe(key, value, dedupeFieldNames)
if err != nil {
errorChannel <- err
return
}
}
}
if nh > nd {
// if header longer than data: use "" values
for i = nd; i < nh; i++ {
_, err := record.PutReferenceMaybeDedupe(reader.headerStrings[i], mlrval.VOID.Copy(), dedupeFieldNames)
if err != nil {
errorChannel <- err
return
}
}
}
}
context.UpdateForInputRecord()
recordsAndContexts.PushBack(types.NewRecordAndContext(record, context))
}
return recordsAndContexts, false
}
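A hypothetical illustration (not part of this commit) of the ragged-input policy implemented by both batch getters above, for `--allow-ragged-csv-input` with a TSV header line of `a<TAB>b`:

```go
// data line "1<TAB>2<TAB>3"  ->  record a=1, b=2, 3=3
//                                (extra fields get 1-up positional keys)
// data line "4"              ->  record a=4, b=""
//                                (missing fields are filled with empty values)
// blank line                 ->  header state resets: explicit mode reads the
//                                next non-blank line as a new header; implicit
//                                mode regenerates 1-up positional keys
// Without --allow-ragged-csv-input, any header/data length mismatch is an error.
```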

View file

@ -0,0 +1,68 @@
package lib
import (
"bytes"
)
// * https://en.wikipedia.org/wiki/Tab-separated_values
// * https://www.iana.org/assignments/media-types/text/tab-separated-values
// \n for newline,
// \r for carriage return,
// \t for tab,
// \\ for backslash.
// TSVDecodeField is for the TSV record-reader.
func TSVDecodeField(input string) string {
var buffer bytes.Buffer
n := len(input)
for i := 0; i < n; /* increment in loop */ {
c := input[i]
if c == '\\' && i < n-1 {
d := input[i+1]
if d == '\\' {
buffer.WriteByte('\\')
i += 2
} else if d == 'n' {
buffer.WriteByte('\n')
i += 2
} else if d == 'r' {
buffer.WriteByte('\r')
i += 2
} else if d == 't' {
buffer.WriteByte('\t')
i += 2
} else {
buffer.WriteByte(c)
i++
}
} else {
buffer.WriteByte(c)
i++
}
}
return buffer.String()
}
// TSVEncodeField is for the TSV record-writer.
func TSVEncodeField(input string) string {
var buffer bytes.Buffer
for i := range input {
c := input[i]
if c == '\\' {
buffer.WriteByte('\\')
buffer.WriteByte('\\')
} else if c == '\n' {
buffer.WriteByte('\\')
buffer.WriteByte('n')
} else if c == '\r' {
buffer.WriteByte('\\')
buffer.WriteByte('r')
} else if c == '\t' {
buffer.WriteByte('\\')
buffer.WriteByte('t')
} else {
buffer.WriteByte(c)
}
}
return buffer.String()
}

View file

@ -0,0 +1,35 @@
package lib
import (
"testing"
"github.com/stretchr/testify/assert"
)
func TestTSVDecodeField(t *testing.T) {
assert.Equal(t, "", TSVDecodeField(""))
assert.Equal(t, "a", TSVDecodeField("a"))
assert.Equal(t, "abc", TSVDecodeField("abc"))
assert.Equal(t, `\`, TSVDecodeField(`\`))
assert.Equal(t, "\n", TSVDecodeField(`\n`))
assert.Equal(t, "\r", TSVDecodeField(`\r`))
assert.Equal(t, "\t", TSVDecodeField(`\t`))
assert.Equal(t, "\\", TSVDecodeField(`\\`))
assert.Equal(t, `\n`, TSVDecodeField(`\\n`))
assert.Equal(t, "\\\n", TSVDecodeField(`\\\n`))
assert.Equal(t, "abc\r\ndef\r\n", TSVDecodeField(`abc\r\ndef\r\n`))
}
func TestTSVEncodeField(t *testing.T) {
assert.Equal(t, "", TSVEncodeField(""))
assert.Equal(t, "a", TSVEncodeField("a"))
assert.Equal(t, "abc", TSVEncodeField("abc"))
assert.Equal(t, `\\`, TSVEncodeField(`\`))
assert.Equal(t, `\n`, TSVEncodeField("\n"))
assert.Equal(t, `\r`, TSVEncodeField("\r"))
assert.Equal(t, `\t`, TSVEncodeField("\t"))
assert.Equal(t, `\\`, TSVEncodeField("\\"))
assert.Equal(t, `\\n`, TSVEncodeField("\\n"))
assert.Equal(t, `\\\n`, TSVEncodeField("\\\n"))
assert.Equal(t, `abc\r\ndef\r\n`, TSVEncodeField("abc\r\ndef\r\n"))
}

View file

@ -22,6 +22,8 @@ func Create(writerOptions *cli.TWriterOptions) (IRecordWriter, error) {
return NewRecordWriterNIDX(writerOptions)
case "pprint":
return NewRecordWriterPPRINT(writerOptions)
case "tsv":
return NewRecordWriterTSV(writerOptions)
case "xtab":
return NewRecordWriterXTAB(writerOptions)
default:

View file

@ -0,0 +1,104 @@
package output
import (
"bufio"
"fmt"
"strings"
"github.com/johnkerl/miller/internal/pkg/cli"
"github.com/johnkerl/miller/internal/pkg/colorizer"
"github.com/johnkerl/miller/internal/pkg/lib"
"github.com/johnkerl/miller/internal/pkg/mlrval"
)
type RecordWriterTSV struct {
writerOptions *cli.TWriterOptions
// For reporting schema changes: we print a newline and the new header
lastJoinedHeader *string
// Only write one blank line for schema changes / blank input lines
justWroteEmptyLine bool
}
func NewRecordWriterTSV(writerOptions *cli.TWriterOptions) (*RecordWriterTSV, error) {
if writerOptions.OFS != "\t" {
return nil, fmt.Errorf("for TSV, OFS cannot be altered")
}
if writerOptions.ORS != "\n" && writerOptions.ORS != "\r\n" {
return nil, fmt.Errorf("for CSV, ORS cannot be altered")
}
return &RecordWriterTSV{
writerOptions: writerOptions,
lastJoinedHeader: nil,
justWroteEmptyLine: false,
}, nil
}
func (writer *RecordWriterTSV) Write(
outrec *mlrval.Mlrmap,
bufferedOutputStream *bufio.Writer,
outputIsStdout bool,
) {
// End of record stream: nothing special for this output format
if outrec == nil {
return
}
if outrec.IsEmpty() {
if !writer.justWroteEmptyLine {
bufferedOutputStream.WriteString(writer.writerOptions.ORS)
}
joinedHeader := ""
writer.lastJoinedHeader = &joinedHeader
writer.justWroteEmptyLine = true
return
}
needToPrintHeader := false
joinedHeader := strings.Join(outrec.GetKeys(), ",")
if writer.lastJoinedHeader == nil || *writer.lastJoinedHeader != joinedHeader {
if writer.lastJoinedHeader != nil {
if !writer.justWroteEmptyLine {
bufferedOutputStream.WriteString(writer.writerOptions.ORS)
}
writer.justWroteEmptyLine = true
}
writer.lastJoinedHeader = &joinedHeader
needToPrintHeader = true
}
if needToPrintHeader && !writer.writerOptions.HeaderlessCSVOutput {
for pe := outrec.Head; pe != nil; pe = pe.Next {
bufferedOutputStream.WriteString(
colorizer.MaybeColorizeKey(
lib.TSVEncodeField(
pe.Key,
),
outputIsStdout,
),
)
if pe.Next != nil {
bufferedOutputStream.WriteString(writer.writerOptions.OFS)
}
}
bufferedOutputStream.WriteString(writer.writerOptions.ORS)
}
for pe := outrec.Head; pe != nil; pe = pe.Next {
bufferedOutputStream.WriteString(
colorizer.MaybeColorizeValue(
lib.TSVEncodeField(
pe.Value.String(),
),
outputIsStdout,
),
)
if pe.Next != nil {
bufferedOutputStream.WriteString(writer.writerOptions.OFS)
}
}
bufferedOutputStream.WriteString(writer.writerOptions.ORS)
writer.justWroteEmptyLine = false
}
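A hypothetical test-style sketch (not part of this commit) of the schema-change behavior implemented above: when the field-name set changes between records, the writer emits a blank line and then a fresh header. It assumes the exported constructors and fields (`cli.TWriterOptions`, `mlrval.NewMlrmapAsRecord`, `mlrval.FromDeferredType`, `Mlrmap.PutCopy`) behave as they are used elsewhere in this diff.

```go
package output

import (
	"bufio"
	"fmt"
	"strings"

	"github.com/johnkerl/miller/internal/pkg/cli"
	"github.com/johnkerl/miller/internal/pkg/mlrval"
)

func ExampleRecordWriterTSV_schemaChange() {
	writer, err := NewRecordWriterTSV(&cli.TWriterOptions{OFS: "\t", ORS: "\n"})
	if err != nil {
		panic(err)
	}

	var sb strings.Builder
	buf := bufio.NewWriter(&sb)

	// First record: fields a, b -> header "a<TAB>b" then data row "1<TAB>2".
	rec1 := mlrval.NewMlrmapAsRecord()
	rec1.PutCopy("a", mlrval.FromDeferredType("1"))
	rec1.PutCopy("b", mlrval.FromDeferredType("2"))
	writer.Write(rec1, buf, false)

	// Second record: different field set -> blank line, header "x", then "3".
	rec2 := mlrval.NewMlrmapAsRecord()
	rec2.PutCopy("x", mlrval.FromDeferredType("3"))
	writer.Write(rec2, buf, false)

	buf.Flush()
	fmt.Print(sb.String())
}
```

Based on the `Write` logic above, the printed stream should be the `a`/`b` header and the `1`/`2` row, a blank line, then an `x` header and a `3` row.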

View file

@ -365,7 +365,7 @@ FILE-FORMAT FLAGS
--oxtab Use XTAB format for output data.
--pprint Use PPRINT format for input and output data.
--tsv Use TSV format for input and output data.
--tsvlite or -t Use TSV-lite format for input and output data.
--tsv or -t Use TSV-lite format for input and output data.
--usv or --usvlite Use USV format for input and output data.
--xtab Use XTAB format for input and output data.
-i {format name} Use format name for input data. For example: `-i csv`
@ -687,7 +687,6 @@ SEPARATOR FLAGS
alignment impossible.
* OPS may be multi-character for XTAB format, in which case alignment is
disabled.
* TSV is simply CSV using tab as field separator (`--fs tab`).
* FS/PS are ignored for markdown format; RS is used.
* All FS and PS options are ignored for JSON format, since they are not relevant
to the JSON format.
@ -742,6 +741,7 @@ SEPARATOR FLAGS
markdown " " N/A "\n"
nidx " " N/A "\n"
pprint " " N/A "\n"
tsv " " N/A "\n"
xtab "\n" " " "\n\n"
--fs {string} Specify FS for input and output.
@ -3136,4 +3136,4 @@ SEE ALSO
2022-02-05 MILLER(1)
2022-02-06 MILLER(1)

View file

@ -2,12 +2,12 @@
.\" Title: mlr
.\" Author: [see the "AUTHOR" section]
.\" Generator: ./mkman.rb
.\" Date: 2022-02-05
.\" Date: 2022-02-06
.\" Manual: \ \&
.\" Source: \ \&
.\" Language: English
.\"
.TH "MILLER" "1" "2022-02-05" "\ \&" "\ \&"
.TH "MILLER" "1" "2022-02-06" "\ \&" "\ \&"
.\" -----------------------------------------------------------------
.\" * Portability definitions
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@ -444,7 +444,7 @@ are overridden in all cases by setting output format to `format2`.
--oxtab Use XTAB format for output data.
--pprint Use PPRINT format for input and output data.
--tsv Use TSV format for input and output data.
--tsvlite or -t Use TSV-lite format for input and output data.
--tsv or -t Use TSV-lite format for input and output data.
--usv or --usvlite Use USV format for input and output data.
--xtab Use XTAB format for input and output data.
-i {format name} Use format name for input data. For example: `-i csv`
@ -830,7 +830,6 @@ Notes about all other separators:
alignment impossible.
* OPS may be multi-character for XTAB format, in which case alignment is
disabled.
* TSV is simply CSV using tab as field separator (`--fs tab`).
* FS/PS are ignored for markdown format; RS is used.
* All FS and PS options are ignored for JSON format, since they are not relevant
to the JSON format.
@ -885,6 +884,7 @@ Notes about all other separators:
markdown " " N/A "\en"
nidx " " N/A "\en"
pprint " " N/A "\en"
tsv " " N/A "\en"
xtab "\en" " " "\en\en"
--fs {string} Specify FS for input and output.

View file

@ -1,5 +1,5 @@
"a b i x y"
"pan pan 1 0.3467901443380824 0.7268028627434533"
"eks pan 2 0.7586799647899636 0.5221511083334797"
"wye wye 3 0.20460330576630303 0.33831852551664776"
"eks wye 4 0.38139939387114097 0.13418874328430463"
a\tb\ti\tx\ty
pan\tpan\t1\t0.3467901443380824\t0.7268028627434533
eks\tpan\t2\t0.7586799647899636\t0.5221511083334797
wye\twye\t3\t0.20460330576630303\t0.33831852551664776
eks\twye\t4\t0.38139939387114097\t0.13418874328430463

View file

@ -0,0 +1 @@
mlr --itsv --ojson cat ${CASEDIR}/data.tsv

View file

@ -0,0 +1,2 @@
a\tb,c\nd,e
1\r2,3\\4,5
1 a\tb,c\nd,e
2 1\r2,3\\4,5

View file

View file

@ -0,0 +1,5 @@
[
{
"a\\tb,c\\nd,e": "1\r2,3\\4,5"
}
]

View file

@ -0,0 +1 @@
mlr --ijson --otsv cat ${CASEDIR}/data.json

View file

@ -0,0 +1,5 @@
[
{
"a\\tb,c\\nd,e": "1\r2,3\\4,5"
}
]

View file

View file

@ -0,0 +1,2 @@
a\\tb,c\\nd,e
1\r2,3\\4,5

View file

@ -2,26 +2,41 @@
RELEASES
* plan 6.1.0
! IANA-TSV w/ \{X}
? w/ natural sort order
? strptime
? datediff et al.
? mlr join --left-fields a,b,c
? rank
? ?foo and ??foo @ repl help
o fmt/unfmt/regex doc
o FAQ/examples reorg
k default colors; bold/underline/reverse
k array concat
k format/unformat
k split
k split verb
k slwin & shift-lead
m unicode string literals
k 0o.. octal literals in the DSL
k codeql/codespell/goreleaser binaries/zips
k :rb
k ?foo and ??foo @ repl help
k doc-improves
* plan 6.2.0
? YAML
================================================================
FEATURES
----------------------------------------------------------------
TSV etc
? also: some escapes perhaps for dkvp, xtab, pprint -- ?
o nidx is a particular pure-text, leave-as-is
? try out nidx single-line w/ \001, \002 FS/PS & \n or \n\n RS
o make/publicize a shorthand for this -- ?
o --words && --lines & --paragraphs -- ?
* still need csv --lazy-quotes
----------------------------------------------------------------
* natural sort order
https://github.com/facette/natsort