mirror of
https://github.com/johnkerl/miller.git
synced 2026-01-23 02:14:13 +00:00
Auto-unsparsify CSV and TSV on output (#1479)
* Auto-unsparsify CSV
* Update unit-test cases
* More unit-test cases
* Key-change handling for CSV output
* Same for TSV, with unit-test and doc updates
This commit is contained in:
parent
af021f28d7
commit
ac65675ab1
61 changed files with 481 additions and 221 deletions
docs/src/data/key-change.json (new normal file, 5 lines)
@@ -0,0 +1,5 @@
[
{ "a": 1, "b": 2, "c": 3 },
{ "a": 4, "b": 5, "c": 6 },
{ "a": 7, "X": 8, "c": 9 }
]
docs/src/data/under-over.json (new normal file, 6 lines)
@@ -0,0 +1,6 @@
[
{ "a": 1, "b": 2, "c": 3 },
{ "a": 4, "b": 5, "c": 6, "d": 7 },
{ "a": 7, "b": 8 },
{ "a": 9, "b": 10, "c": 11 }
]
@@ -130,6 +130,74 @@ In particular, no encode/decode of `\r`, `\n`, `\t`, or `\\` is done.

* CSV-lite allows changing FS and/or RS to any values, perhaps multi-character.

* CSV-lite and TSV-lite handle schema changes ("schema" meaning "ordered list of field names in a given record") by adding a newline and re-emitting the header. CSV and TSV, by contrast, do the following:
    * If there are too few keys, but these match the header, empty fields are emitted.
    * If there are too many keys, but these match the header up to the number of header fields, the extra fields are emitted.
    * If keys don't match the header, this is an error.

<pre class="pre-highlight-in-pair">
<b>cat data/under-over.json</b>
</pre>
<pre class="pre-non-highlight-in-pair">
[
{ "a": 1, "b": 2, "c": 3 },
{ "a": 4, "b": 5, "c": 6, "d": 7 },
{ "a": 7, "b": 8 },
{ "a": 9, "b": 10, "c": 11 }
]
</pre>

<pre class="pre-highlight-in-pair">
<b>mlr --ijson --ocsvlite cat data/under-over.json</b>
</pre>
<pre class="pre-non-highlight-in-pair">
a,b,c
1,2,3

a,b,c,d
4,5,6,7

a,b
7,8

a,b,c
9,10,11
</pre>

<pre class="pre-highlight-in-pair">
<b>mlr --ijson --ocsvlite cat data/key-change.json</b>
</pre>
<pre class="pre-non-highlight-in-pair">
a,b,c
1,2,3
4,5,6

a,X,c
7,8,9
</pre>

<pre class="pre-highlight-in-pair">
<b>mlr --ijson --ocsv cat data/under-over.json</b>
</pre>
<pre class="pre-non-highlight-in-pair">
a,b,c
1,2,3
4,5,6,7
7,8,
9,10,11
</pre>

<pre class="pre-highlight-in-pair">
<b>mlr --ijson --ocsv cat data/key-change.json</b>
</pre>
<pre class="pre-non-highlight-in-pair">
a,b,c
1,2,3
4,5,6
mlr: CSV schema change: first keys "a,b,c"; current keys "a,X,c"
mlr: exiting due to data error.
</pre>

* In short, CSV-lite and TSV-lite are useful when dealing with CSV/TSV files which are formatted in some non-standard way -- they give you a little more flexibility. (As an example of this flexibility: ASV and USV are nothing more than CSV-lite with different values for FS and RS.)

CSV, TSV, CSV-lite, and TSV-lite have in common the `--implicit-csv-header` flag for input and the `--headerless-csv-output` flag for output.
@@ -42,6 +42,31 @@ In particular, no encode/decode of `\r`, `\n`, `\t`, or `\\` is done.

* CSV-lite allows changing FS and/or RS to any values, perhaps multi-character.

* CSV-lite and TSV-lite handle schema changes ("schema" meaning "ordered list of field names in a given record") by adding a newline and re-emitting the header. CSV and TSV, by contrast, do the following:
    * If there are too few keys, but these match the header, empty fields are emitted.
    * If there are too many keys, but these match the header up to the number of header fields, the extra fields are emitted.
    * If keys don't match the header, this is an error.

GENMD-RUN-COMMAND
cat data/under-over.json
GENMD-EOF

GENMD-RUN-COMMAND
mlr --ijson --ocsvlite cat data/under-over.json
GENMD-EOF

GENMD-RUN-COMMAND-TOLERATING-ERROR
mlr --ijson --ocsvlite cat data/key-change.json
GENMD-EOF

GENMD-RUN-COMMAND
mlr --ijson --ocsv cat data/under-over.json
GENMD-EOF

GENMD-RUN-COMMAND-TOLERATING-ERROR
mlr --ijson --ocsv cat data/key-change.json
GENMD-EOF

* In short, CSV-lite and TSV-lite are useful when dealing with CSV/TSV files which are formatted in some non-standard way -- they give you a little more flexibility. (As an example of this flexibility: ASV and USV are nothing more than CSV-lite with different values for FS and RS.)

CSV, TSV, CSV-lite, and TSV-lite have in common the `--implicit-csv-header` flag for input and the `--headerless-csv-output` flag for output.
@@ -118,9 +118,7 @@ However, if we ask for left-unpaireds, since there's no `color` column, we get a
id,code,color
4,ff0000,red
2,00ff00,green

id,code
3,0000ff
3,0000ff,
</pre>

To fix this, we can use **unsparsify**:
@@ -375,13 +375,12 @@ record_count=150,resource=/path/to/second/file
CSV and pretty-print formats expect rectangular structure. But Miller lets you
process non-rectangular data using CSV and pretty-print.

Miller simply prints a newline and a new header when there is a schema change
-- where by _schema_ we mean simply the list of record keys in the order they
are encountered. When there is no schema change, you get CSV per se as a
special case. Likewise, Miller reads heterogeneous CSV or pretty-print input
the same way. The difference between CSV and CSV-lite is that the former is
[RFC-4180-compliant](file-formats.md#csvtsvasvusvetc), while the latter readily
handles heterogeneous data (which is non-compliant). For example:
For CSV-lite and TSV-lite, Miller simply prints a newline and a new header when there is a schema
change -- where by _schema_ we mean simply the list of record keys in the order they are
encountered. When there is no schema change, you get CSV per se as a special case. Likewise, Miller
reads heterogeneous CSV or pretty-print input the same way. The difference between CSV and CSV-lite
is that the former is [RFC-4180-compliant](file-formats.md#csvtsvasvusvetc), while the latter
readily handles heterogeneous data (which is non-compliant). For example:

<pre class="pre-highlight-in-pair">
<b>cat data/het.json</b>
@@ -446,19 +445,43 @@ record_count resource
150          /path/to/second/file
</pre>

Miller handles explicit header changes as just shown. If your CSV input contains ragged data -- if there are implicit header changes (no intervening blank line and new header line) as seen above -- you can use `--allow-ragged-csv-input` (or keystroke-saver `--ragged`).
<pre class="pre-highlight-in-pair">
<b>mlr --ijson --ocsvlite group-like data/het.json</b>
</pre>
<pre class="pre-non-highlight-in-pair">
resource,loadsec,ok
/path/to/file,0.45,true
/path/to/second/file,0.32,true
/some/other/path,0.97,false

record_count,resource
100,/path/to/file
150,/path/to/second/file
</pre>

<pre class="pre-highlight-in-pair">
<b>mlr --csv --ragged cat data/het/ragged.csv</b>
<b>mlr --ijson --ocsv group-like data/het.json</b>
</pre>
<pre class="pre-non-highlight-in-pair">
resource,loadsec,ok
/path/to/file,0.45,true
/path/to/second/file,0.32,true
/some/other/path,0.97,false
mlr: CSV schema change: first keys "resource,loadsec,ok"; current keys "record_count,resource"
mlr: exiting due to data error.
</pre>

Miller handles explicit header changes as just shown. If your CSV input contains ragged data -- if
there are implicit header changes (no intervening blank line and new header line) as seen above --
you can use `--allow-ragged-csv-input` (or keystroke-saver `--ragged`).

<pre class="pre-highlight-in-pair">
<b>mlr --csv --allow-ragged-csv-input cat data/het/ragged.csv</b>
</pre>
<pre class="pre-non-highlight-in-pair">
a,b,c
1,2,3

a,b
4,5

a,b,c,4
4,5,
7,8,9,10
</pre>
@@ -180,13 +180,12 @@ GENMD-EOF
CSV and pretty-print formats expect rectangular structure. But Miller lets you
process non-rectangular data using CSV and pretty-print.

Miller simply prints a newline and a new header when there is a schema change
-- where by _schema_ we mean simply the list of record keys in the order they
are encountered. When there is no schema change, you get CSV per se as a
special case. Likewise, Miller reads heterogeneous CSV or pretty-print input
the same way. The difference between CSV and CSV-lite is that the former is
[RFC-4180-compliant](file-formats.md#csvtsvasvusvetc), while the latter readily
handles heterogeneous data (which is non-compliant). For example:
For CSV-lite and TSV-lite, Miller simply prints a newline and a new header when there is a schema
change -- where by _schema_ we mean simply the list of record keys in the order they are
encountered. When there is no schema change, you get CSV per se as a special case. Likewise, Miller
reads heterogeneous CSV or pretty-print input the same way. The difference between CSV and CSV-lite
is that the former is [RFC-4180-compliant](file-formats.md#csvtsvasvusvetc), while the latter
readily handles heterogeneous data (which is non-compliant). For example:

GENMD-RUN-COMMAND
cat data/het.json
@@ -200,10 +199,20 @@ GENMD-RUN-COMMAND
mlr --ijson --opprint group-like data/het.json
GENMD-EOF

Miller handles explicit header changes as just shown. If your CSV input contains ragged data -- if there are implicit header changes (no intervening blank line and new header line) as seen above -- you can use `--allow-ragged-csv-input` (or keystroke-saver `--ragged`).
GENMD-RUN-COMMAND
mlr --ijson --ocsvlite group-like data/het.json
GENMD-EOF

GENMD-RUN-COMMAND-TOLERATING-ERROR
mlr --csv --ragged cat data/het/ragged.csv
mlr --ijson --ocsv group-like data/het.json
GENMD-EOF

Miller handles explicit header changes as just shown. If your CSV input contains ragged data -- if
there are implicit header changes (no intervening blank line and new header line) as seen above --
you can use `--allow-ragged-csv-input` (or keystroke-saver `--ragged`).

GENMD-RUN-COMMAND
mlr --csv --allow-ragged-csv-input cat data/het/ragged.csv
GENMD-EOF

## Processing heterogeneous data
@@ -94,7 +94,11 @@ func channelWriterHandleBatch(
	}

	if record != nil {
		recordWriter.Write(record, bufferedOutputStream, outputIsStdout)
		err := recordWriter.Write(record, bufferedOutputStream, outputIsStdout)
		if err != nil {
			fmt.Fprintf(os.Stderr, "mlr: %v\n", err)
			return true, true
		}
	}

	outputString := recordAndContext.OutputString

@@ -111,8 +115,13 @@ func channelWriterHandleBatch(
			// queued up. For example, PPRINT needs to see all same-schema
			// records before printing any, since it needs to compute max width
			// down columns.
			recordWriter.Write(nil, bufferedOutputStream, outputIsStdout)
			return true, false
			err := recordWriter.Write(nil, bufferedOutputStream, outputIsStdout)
			if err != nil {
				fmt.Fprintf(os.Stderr, "mlr: %v\n", err)
				return true, true
			} else {
				return true, false
			}
		}
	}
	return false, false
@@ -20,5 +20,5 @@ type IRecordWriter interface {
	outrec *mlrval.Mlrmap,
	bufferedOutputStream *bufio.Writer,
	outputIsStdout bool,
)
) error
}
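The interface change above means every record-writer `Write` call site must now check a returned error, as `channelWriterHandleBatch` does in the hunk further up. A minimal sketch of that propagation pattern, with simplified stand-in types rather than Miller's actual `IRecordWriter`/`Mlrmap`:

```go
// Sketch of the Write-returns-error pattern from this commit, using
// simplified stand-in types (record, recordWriter are illustrative only).
package main

import (
	"errors"
	"fmt"
	"os"
)

type record map[string]string

// recordWriter mirrors the updated IRecordWriter shape: Write returns error.
type recordWriter interface {
	Write(rec record) error
}

// failingWriter always reports a data error, like a CSV schema change.
type failingWriter struct{}

func (w *failingWriter) Write(rec record) error {
	return errors.New(`CSV schema change: first keys "a,b,c"; current keys "a,X,c"`)
}

// handleRecord mirrors channelWriterHandleBatch: on error, report to stderr
// and signal (done=true, errored=true) so the stream stops.
func handleRecord(w recordWriter, rec record) (done bool, errored bool) {
	if err := w.Write(rec); err != nil {
		fmt.Fprintf(os.Stderr, "mlr: %v\n", err)
		return true, true
	}
	return false, false
}

func main() {
	done, errored := handleRecord(&failingWriter{}, record{"a": "7"})
	fmt.Println(done, errored) // true true
}
```

The design point: writers no longer print bad output and continue; a data error surfaces to the goroutine that owns the output stream, which exits with a nonzero status.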
@@ -12,15 +12,13 @@ import (
)

type RecordWriterCSV struct {
	writerOptions *cli.TWriterOptions
	ofs0 byte // Go's CSV library only lets its 'Comma' be a single character
	csvWriter *csv.Writer
	// For reporting schema changes: we print a newline and the new header
	lastJoinedHeader *string
	// Only write one blank line for schema changes / blank input lines
	justWroteEmptyLine bool
	// For double-quote around all fields
	quoteAll bool
	writerOptions *cli.TWriterOptions
	ofs0 byte // Go's CSV library only lets its 'Comma' be a single character
	csvWriter *csv.Writer
	needToPrintHeader bool
	firstRecordKeys []string
	firstRecordNF int64
	quoteAll bool // For double-quote around all fields
}

func NewRecordWriterCSV(writerOptions *cli.TWriterOptions) (*RecordWriterCSV, error) {
@@ -30,23 +28,25 @@ func NewRecordWriterCSV(writerOptions *cli.TWriterOptions) (*RecordWriterCSV, er
	if writerOptions.ORS != "\n" && writerOptions.ORS != "\r\n" {
		return nil, fmt.Errorf("for CSV, ORS cannot be altered")
	}
	return &RecordWriterCSV{
		writerOptions: writerOptions,
		csvWriter: nil, // will be set on first Write() wherein we have the output stream
		lastJoinedHeader: nil,
		justWroteEmptyLine: false,
		quoteAll: writerOptions.CSVQuoteAll,
	}, nil
	writer := &RecordWriterCSV{
		writerOptions: writerOptions,
		csvWriter: nil, // will be set on first Write() wherein we have the output stream
		needToPrintHeader: !writerOptions.HeaderlessOutput,
		firstRecordKeys: nil,
		firstRecordNF: -1,
		quoteAll: writerOptions.CSVQuoteAll,
	}
	return writer, nil
}

func (writer *RecordWriterCSV) Write(
	outrec *mlrval.Mlrmap,
	bufferedOutputStream *bufio.Writer,
	outputIsStdout bool,
) {
) error {
	// End of record stream: nothing special for this output format
	if outrec == nil {
		return
		return nil
	}

	if writer.csvWriter == nil {
@@ -54,46 +54,46 @@ func (writer *RecordWriterCSV) Write(
		writer.csvWriter.Comma = rune(writer.writerOptions.OFS[0]) // xxx temp
	}

	if outrec.IsEmpty() {
		if !writer.justWroteEmptyLine {
			bufferedOutputStream.WriteString("\n")
		}
		joinedHeader := ""
		writer.lastJoinedHeader = &joinedHeader
		writer.justWroteEmptyLine = true
		return
	if writer.firstRecordKeys == nil {
		writer.firstRecordKeys = outrec.GetKeys()
		writer.firstRecordNF = int64(len(writer.firstRecordKeys))
	}

	needToPrintHeader := false
	joinedHeader := strings.Join(outrec.GetKeys(), ",")
	if writer.lastJoinedHeader == nil || *writer.lastJoinedHeader != joinedHeader {
		if writer.lastJoinedHeader != nil {
			if !writer.justWroteEmptyLine {
				bufferedOutputStream.WriteString("\n")
			}
			writer.justWroteEmptyLine = true
		}
		writer.lastJoinedHeader = &joinedHeader
		needToPrintHeader = true
	}

	if needToPrintHeader && !writer.writerOptions.HeaderlessOutput {
	if writer.needToPrintHeader {
		fields := make([]string, outrec.FieldCount)
		i := 0
		for pe := outrec.Head; pe != nil; pe = pe.Next {
			fields[i] = pe.Key
			i++
		}
		//////writer.csvWriter.Write(fields)
		writer.WriteCSVRecordMaybeColorized(fields, bufferedOutputStream, outputIsStdout, true, writer.quoteAll)
		writer.needToPrintHeader = false
	}

	fields := make([]string, outrec.FieldCount)
	i := 0
	var outputNF int64 = outrec.FieldCount
	if outputNF < writer.firstRecordNF {
		outputNF = writer.firstRecordNF
	}

	fields := make([]string, outputNF)
	var i int64 = 0
	for pe := outrec.Head; pe != nil; pe = pe.Next {
		if i < writer.firstRecordNF && pe.Key != writer.firstRecordKeys[i] {
			return fmt.Errorf(
				"CSV schema change: first keys \"%s\"; current keys \"%s\"",
				strings.Join(writer.firstRecordKeys, writer.writerOptions.OFS),
				strings.Join(outrec.GetKeys(), writer.writerOptions.OFS),
			)
		}
		fields[i] = pe.Value.String()
		i++
	}

	for ; i < outputNF; i++ {
		fields[i] = ""
	}

	writer.WriteCSVRecordMaybeColorized(fields, bufferedOutputStream, outputIsStdout, false, writer.quoteAll)
	writer.justWroteEmptyLine = false

	return nil
}
@@ -29,10 +29,10 @@ func (writer *RecordWriterCSVLite) Write(
	outrec *mlrval.Mlrmap,
	bufferedOutputStream *bufio.Writer,
	outputIsStdout bool,
) {
) error {
	// End of record stream: nothing special for this output format
	if outrec == nil {
		return
		return nil
	}

	if outrec.IsEmpty() {

@@ -42,7 +42,7 @@ func (writer *RecordWriterCSVLite) Write(
		joinedHeader := ""
		writer.lastJoinedHeader = &joinedHeader
		writer.justWroteEmptyLine = true
		return
		return nil
	}

	needToPrintHeader := false

@@ -79,4 +79,6 @@ func (writer *RecordWriterCSVLite) Write(
	bufferedOutputStream.WriteString(writer.writerOptions.ORS)

	writer.justWroteEmptyLine = false

	return nil
}
@@ -22,15 +22,15 @@ func (writer *RecordWriterDKVP) Write(
	outrec *mlrval.Mlrmap,
	bufferedOutputStream *bufio.Writer,
	outputIsStdout bool,
) {
) error {
	// End of record stream: nothing special for this output format
	if outrec == nil {
		return
		return nil
	}

	if outrec.IsEmpty() {
		bufferedOutputStream.WriteString(writer.writerOptions.ORS)
		return
		return nil
	}

	for pe := outrec.Head; pe != nil; pe = pe.Next {

@@ -42,4 +42,6 @@ func (writer *RecordWriterDKVP) Write(
		}
	}
	bufferedOutputStream.WriteString(writer.writerOptions.ORS)

	return nil
}
@@ -39,7 +39,7 @@ func (writer *RecordWriterJSON) Write(
	outrec *mlrval.Mlrmap,
	bufferedOutputStream *bufio.Writer,
	outputIsStdout bool,
) {
) error {
	if outrec != nil && writer.jvQuoteAll {
		outrec.StringifyValuesRecursively()
	}

@@ -49,6 +49,7 @@ func (writer *RecordWriterJSON) Write(
	} else {
		writer.writeWithoutListWrap(outrec, bufferedOutputStream, outputIsStdout)
	}
	return nil
}

// ----------------------------------------------------------------
@@ -31,9 +31,9 @@ func (writer *RecordWriterMarkdown) Write(
	outrec *mlrval.Mlrmap,
	bufferedOutputStream *bufio.Writer,
	outputIsStdout bool,
) {
) error {
	if outrec == nil { // end of record stream
		return
		return nil
	}

	currentJoinedHeader := outrec.GetKeysJoined()

@@ -73,4 +73,6 @@ func (writer *RecordWriterMarkdown) Write(
		bufferedOutputStream.WriteString(" |")
	}
	bufferedOutputStream.WriteString(writer.writerOptions.ORS)

	return nil
}
@@ -21,10 +21,10 @@ func (writer *RecordWriterNIDX) Write(
	outrec *mlrval.Mlrmap,
	bufferedOutputStream *bufio.Writer,
	outputIsStdout bool,
) {
) error {
	// End of record stream: nothing special for this output format
	if outrec == nil {
		return
		return nil
	}

	for pe := outrec.Head; pe != nil; pe = pe.Next {

@@ -34,4 +34,6 @@ func (writer *RecordWriterNIDX) Write(
		}
	}
	bufferedOutputStream.WriteString(writer.writerOptions.ORS)

	return nil
}
@@ -37,7 +37,7 @@ func (writer *RecordWriterPPRINT) Write(
	outrec *mlrval.Mlrmap,
	bufferedOutputStream *bufio.Writer,
	outputIsStdout bool,
) {
) error {
	// Group records by have-same-schema or not. Pretty-print each
	// homogeneous sublist, or "batch".
	//

@@ -83,6 +83,8 @@ func (writer *RecordWriterPPRINT) Write(
			bufferedOutputStream, outputIsStdout)
		}
	}

	return nil
}

// ----------------------------------------------------------------
@@ -12,11 +12,10 @@ import (
)

type RecordWriterTSV struct {
	writerOptions *cli.TWriterOptions
	// For reporting schema changes: we print a newline and the new header
	lastJoinedHeader *string
	// Only write one blank line for schema changes / blank input lines
	justWroteEmptyLine bool
	writerOptions *cli.TWriterOptions
	needToPrintHeader bool
	firstRecordKeys []string
	firstRecordNF int64
}

func NewRecordWriterTSV(writerOptions *cli.TWriterOptions) (*RecordWriterTSV, error) {

@@ -27,9 +26,10 @@ func NewRecordWriterTSV(writerOptions *cli.TWriterOptions) (*RecordWriterTSV, er
		return nil, fmt.Errorf("for CSV, ORS cannot be altered")
	}
	return &RecordWriterTSV{
		writerOptions: writerOptions,
		lastJoinedHeader: nil,
		justWroteEmptyLine: false,
		writerOptions: writerOptions,
		needToPrintHeader: !writerOptions.HeaderlessOutput,
		firstRecordKeys: nil,
		firstRecordNF: -1,
	}, nil
}

@@ -37,42 +37,28 @@ func (writer *RecordWriterTSV) Write(
	outrec *mlrval.Mlrmap,
	bufferedOutputStream *bufio.Writer,
	outputIsStdout bool,
) {
) error {
	// End of record stream: nothing special for this output format
	if outrec == nil {
		return
		return nil
	}

	if outrec.IsEmpty() {
		if !writer.justWroteEmptyLine {
			bufferedOutputStream.WriteString(writer.writerOptions.ORS)
	if writer.firstRecordKeys == nil {
		writer.firstRecordKeys = outrec.GetKeys()
		writer.firstRecordNF = int64(len(writer.firstRecordKeys))
	}

	if writer.needToPrintHeader {
		fields := make([]string, outrec.FieldCount)
		i := 0
		for pe := outrec.Head; pe != nil; pe = pe.Next {
			fields[i] = pe.Key
			i++
		}
		joinedHeader := ""
		writer.lastJoinedHeader = &joinedHeader
		writer.justWroteEmptyLine = true
		return
	}

	needToPrintHeader := false
	joinedHeader := strings.Join(outrec.GetKeys(), ",")
	if writer.lastJoinedHeader == nil || *writer.lastJoinedHeader != joinedHeader {
		if writer.lastJoinedHeader != nil {
			if !writer.justWroteEmptyLine {
				bufferedOutputStream.WriteString(writer.writerOptions.ORS)
			}
			writer.justWroteEmptyLine = true
		}
		writer.lastJoinedHeader = &joinedHeader
		needToPrintHeader = true
	}

	if needToPrintHeader && !writer.writerOptions.HeaderlessOutput {
		for pe := outrec.Head; pe != nil; pe = pe.Next {
			bufferedOutputStream.WriteString(
				colorizer.MaybeColorizeKey(
					lib.TSVEncodeField(
						pe.Key,
					),
					lib.TSVEncodeField(pe.Key),
					outputIsStdout,
				),
			)
@@ -83,22 +69,44 @@ func (writer *RecordWriterTSV) Write(
		}

		bufferedOutputStream.WriteString(writer.writerOptions.ORS)

		writer.needToPrintHeader = false
	}

	var outputNF int64 = outrec.FieldCount
	if outputNF < writer.firstRecordNF {
		outputNF = writer.firstRecordNF
	}

	fields := make([]string, outputNF)
	var i int64 = 0
	for pe := outrec.Head; pe != nil; pe = pe.Next {
		bufferedOutputStream.WriteString(
			colorizer.MaybeColorizeValue(
				lib.TSVEncodeField(
					pe.Value.String(),
				),
				outputIsStdout,
			),
		if i < writer.firstRecordNF && pe.Key != writer.firstRecordKeys[i] {
			return fmt.Errorf(
				"TSV schema change: first keys \"%s\"; current keys \"%s\"",
				strings.Join(writer.firstRecordKeys, writer.writerOptions.OFS),
				strings.Join(outrec.GetKeys(), writer.writerOptions.OFS),
			)
		}
		fields[i] = colorizer.MaybeColorizeValue(
			lib.TSVEncodeField(pe.Value.String()),
			outputIsStdout,
		)
		if pe.Next != nil {
		i++
	}

	for ; i < outputNF; i++ {
		fields[i] = ""
	}

	for j, field := range fields {
		if j > 0 {
			bufferedOutputStream.WriteString(writer.writerOptions.OFS)
		}
		bufferedOutputStream.WriteString(field)
	}

	bufferedOutputStream.WriteString(writer.writerOptions.ORS)

	writer.justWroteEmptyLine = false
	return nil
}
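The TSV writer above routes every key and value through `lib.TSVEncodeField` before joining fields with tabs. A hedged sketch of what such an encoder does (this is an illustrative reimplementation, not Miller's code, and assumes the common TSV convention of escaping backslash, tab, and newline as two-character sequences):

```go
// Sketch of TSV field encoding: a literal backslash, tab, or newline inside a
// field value is written as the two-character escape \\, \t, or \n, so that
// one output line stays one record. Illustrative only, not Miller's code.
package main

import (
	"fmt"
	"strings"
)

func tsvEncodeField(s string) string {
	// Replacements are applied in a single left-to-right pass, so an
	// already-present backslash cannot double-escape a following t or n.
	r := strings.NewReplacer("\\", "\\\\", "\t", "\\t", "\n", "\\n")
	return r.Replace(s)
}

func main() {
	fmt.Println(tsvEncodeField("contains\ttab and\nnewline"))
}
```

This is why TSV can carry tabs and newlines inside field values while TSV-lite (per the doc hunk earlier) does no such encode/decode at all.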
@@ -45,10 +45,10 @@ func (writer *RecordWriterXTAB) Write(
	outrec *mlrval.Mlrmap,
	bufferedOutputStream *bufio.Writer,
	outputIsStdout bool,
) {
) error {
	// End of record stream: nothing special for this output format
	if outrec == nil {
		return
		return nil
	}

	maxKeyLength := 1

@@ -64,6 +64,8 @@ func (writer *RecordWriterXTAB) Write(
	} else {
		writer.writeWithLeftAlignedValues(outrec, bufferedOutputStream, outputIsStdout, maxKeyLength)
	}

	return nil
}

func (writer *RecordWriterXTAB) writeWithLeftAlignedValues(
test/cases/io-csv-auto-unsparsify/at/cmd (new normal file, 1 line)
@@ -0,0 +1 @@
mlr -i json -o csv cat ${CASEDIR}/input.json

test/cases/io-csv-auto-unsparsify/at/experr (new normal file, empty)

test/cases/io-csv-auto-unsparsify/at/expout (new normal file, 4 lines)
@@ -0,0 +1,4 @@
a,b,c
1,2,3
4,5,6
7,8,9

test/cases/io-csv-auto-unsparsify/at/input.json (new normal file, 17 lines)
@@ -0,0 +1,17 @@
[
  {
    "a": 1,
    "b": 2,
    "c": 3
  },
  {
    "a": 4,
    "b": 5,
    "c": 6
  },
  {
    "a": 7,
    "b": 8,
    "c": 9
  }
]
test/cases/io-csv-auto-unsparsify/key-change/cmd (new normal file, 1 line)
@@ -0,0 +1 @@
mlr -i json -o csv cat ${CASEDIR}/input.json

test/cases/io-csv-auto-unsparsify/key-change/experr (new normal file, 2 lines)
@@ -0,0 +1,2 @@
mlr: CSV schema change: first keys "a,b,c"; current keys "a,X,c"
mlr: exiting due to data error.

test/cases/io-csv-auto-unsparsify/key-change/expout (new normal file, 3 lines)
@@ -0,0 +1,3 @@
a,b,c
1,2,3
4,5,6

test/cases/io-csv-auto-unsparsify/key-change/input.json (new normal file, 17 lines)
@@ -0,0 +1,17 @@
[
  {
    "a": 1,
    "b": 2,
    "c": 3
  },
  {
    "a": 4,
    "b": 5,
    "c": 6
  },
  {
    "a": 7,
    "X": 8,
    "c": 9
  }
]

test/cases/io-csv-auto-unsparsify/key-change/should-fail (new normal file, empty)
test/cases/io-csv-auto-unsparsify/over/cmd (new normal file, 1 line)
@@ -0,0 +1 @@
mlr -i json -o csv cat ${CASEDIR}/input.json

test/cases/io-csv-auto-unsparsify/over/experr (new normal file, empty)

test/cases/io-csv-auto-unsparsify/over/expout (new normal file, 4 lines)
@@ -0,0 +1,4 @@
a,b,c
1,2,3
4,5,6,7
7,8,9

test/cases/io-csv-auto-unsparsify/over/input.json (new normal file, 18 lines)
@@ -0,0 +1,18 @@
[
  {
    "a": 1,
    "b": 2,
    "c": 3
  },
  {
    "a": 4,
    "b": 5,
    "c": 6,
    "d": 7
  },
  {
    "a": 7,
    "b": 8,
    "c": 9
  }
]
test/cases/io-csv-auto-unsparsify/under/cmd (new normal file, 1 line)
@@ -0,0 +1 @@
mlr -i json -o csv cat ${CASEDIR}/input.json

test/cases/io-csv-auto-unsparsify/under/experr (new normal file, empty)

test/cases/io-csv-auto-unsparsify/under/expout (new normal file, 4 lines)
@@ -0,0 +1,4 @@
a,b,c
1,2,3
4,5,
7,8,9

test/cases/io-csv-auto-unsparsify/under/input.json (new normal file, 16 lines)
@@ -0,0 +1,16 @@
[
  {
    "a": 1,
    "b": 2,
    "c": 3
  },
  {
    "a": 4,
    "b": 5
  },
  {
    "a": 7,
    "b": 8,
    "c": 9
  }
]
@@ -0,0 +1,2 @@
mlr: CSV schema change: first keys "host"; current keys "df/tmp,uptime"
mlr: exiting due to data error.

@@ -1,35 +1,2 @@
host
jupiter

df/tmp,uptime
2.43MB,32345sec

host
saturn

df/tmp,uptime
1.34MB,234214132sec

host
mars

df/tmp,uptime
4.97MB,345089805sec

host
jupiter

df/tmp,uptime
0.04MB,890sec

host
mars

df/tmp,uptime
8.55MB,787897777sec

host
saturn

df/tmp,uptime
9.47MB,234289080sec

test/cases/io-multi/0010/should-fail (new normal file, empty)
@@ -0,0 +1,2 @@
mlr: CSV schema change: first keys "host"; current keys "df/tmp,uptime"
mlr: exiting due to data error.

@@ -1,35 +1,2 @@
host
jupiter

df/tmp,uptime
2.43MB,32345sec

host
saturn

df/tmp,uptime
1.34MB,234214132sec

host
mars

df/tmp,uptime
4.97MB,345089805sec

host
jupiter

df/tmp,uptime
0.04MB,890sec

host
mars

df/tmp,uptime
8.55MB,787897777sec

host
saturn

df/tmp,uptime
9.47MB,234289080sec

test/cases/io-multi/0033/should-fail (new normal file, empty)
|
|
@@ -0,0 +1,2 @@
+mlr: CSV schema change: first keys "host"; current keys "df/tmp,uptime"
+mlr: exiting due to data error.
@@ -1,23 +1 @@
 jupiter
-
-2.43MB,32345sec
-
-saturn
-
-1.34MB,234214132sec
-
-mars
-
-4.97MB,345089805sec
-
-jupiter
-
-0.04MB,890sec
-
-mars
-
-8.55MB,787897777sec
-
-saturn
-
-9.47MB,234289080sec
test/cases/io-multi/0034/should-fail (new file, empty)
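The io-multi cases above were converted from CSV-lite's header re-emission to the new strict-CSV behavior: output now aborts when a record's keys cannot be reconciled with the first record's header. A minimal Python sketch of that check (a hypothetical helper for illustration, not Miller's actual Go implementation):

```python
# Hypothetical sketch of the strict-CSV schema check these io-multi cases
# now expect. Unlike CSV-lite, which emits a fresh header block on schema
# change, CSV output only tolerates records whose keys are a prefix of the
# header ("under") or an extension of it ("over"); anything else is fatal.

def check_schema(first_keys, current_keys):
    if current_keys == first_keys[: len(current_keys)]:
        return  # fewer keys matching the header prefix: padded on output
    if current_keys[: len(first_keys)] == first_keys:
        return  # extra keys beyond the full header: extras are emitted
    raise ValueError(
        'mlr: CSV schema change: first keys "%s"; current keys "%s"'
        % (",".join(first_keys), ",".join(current_keys))
    )

check_schema(["host"], ["host"])  # accepted
try:
    check_schema(["host"], ["df/tmp", "uptime"])
except ValueError as e:
    print(e)  # the message captured in the experr fixtures above
```

The under/over allowances mirror the padding rules documented for CSV and TSV output; only a genuine key mismatch is treated as a data error.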
test/cases/io-tsv-auto-unsparsify/at/cmd (new file, 1 line)
@@ -0,0 +1 @@
+mlr -i json -o tsv cat ${CASEDIR}/input.json
test/cases/io-tsv-auto-unsparsify/at/experr (new file, empty)
test/cases/io-tsv-auto-unsparsify/at/expout (new file, 4 lines)
@@ -0,0 +1,4 @@
+a b c
+1 2 3
+4 5 6
+7 8 9
test/cases/io-tsv-auto-unsparsify/at/input.json (new file, 17 lines)
@@ -0,0 +1,17 @@
+[
+  {
+    "a": 1,
+    "b": 2,
+    "c": 3
+  },
+  {
+    "a": 4,
+    "b": 5,
+    "c": 6
+  },
+  {
+    "a": 7,
+    "b": 8,
+    "c": 9
+  }
+]
test/cases/io-tsv-auto-unsparsify/key-change/cmd (new file, 1 line)
@@ -0,0 +1 @@
+mlr -i json -o tsv cat ${CASEDIR}/input.json
test/cases/io-tsv-auto-unsparsify/key-change/experr (new file, 2 lines)
@@ -0,0 +1,2 @@
+mlr: TSV schema change: first keys "a b c"; current keys "a X c"
+mlr: exiting due to data error.
test/cases/io-tsv-auto-unsparsify/key-change/expout (new file, 3 lines)
@@ -0,0 +1,3 @@
+a b c
+1 2 3
+4 5 6
test/cases/io-tsv-auto-unsparsify/key-change/input.json (new file, 17 lines)
@@ -0,0 +1,17 @@
+[
+  {
+    "a": 1,
+    "b": 2,
+    "c": 3
+  },
+  {
+    "a": 4,
+    "b": 5,
+    "c": 6
+  },
+  {
+    "a": 7,
+    "X": 8,
+    "c": 9
+  }
+]
test/cases/io-tsv-auto-unsparsify/key-change/should-fail (new file, empty)
test/cases/io-tsv-auto-unsparsify/over/cmd (new file, 1 line)
@@ -0,0 +1 @@
+mlr -i json -o tsv cat ${CASEDIR}/input.json
test/cases/io-tsv-auto-unsparsify/over/experr (new file, empty)
test/cases/io-tsv-auto-unsparsify/over/expout (new file, 4 lines)
@@ -0,0 +1,4 @@
+a b c
+1 2 3
+4 5 6 7
+7 8 9
test/cases/io-tsv-auto-unsparsify/over/input.json (new file, 18 lines)
@@ -0,0 +1,18 @@
+[
+  {
+    "a": 1,
+    "b": 2,
+    "c": 3
+  },
+  {
+    "a": 4,
+    "b": 5,
+    "c": 6,
+    "d": 7
+  },
+  {
+    "a": 7,
+    "b": 8,
+    "c": 9
+  }
+]
test/cases/io-tsv-auto-unsparsify/under/cmd (new file, 1 line)
@@ -0,0 +1 @@
+mlr -i json -o tsv cat ${CASEDIR}/input.json
test/cases/io-tsv-auto-unsparsify/under/experr (new file, empty)
test/cases/io-tsv-auto-unsparsify/under/expout (new file, 4 lines)
@@ -0,0 +1,4 @@
+a b c
+1 2 3
+4 5
+7 8 9
test/cases/io-tsv-auto-unsparsify/under/input.json (new file, 16 lines)
@@ -0,0 +1,16 @@
+[
+  {
+    "a": 1,
+    "b": 2,
+    "c": 3
+  },
+  {
+    "a": 4,
+    "b": 5
+  },
+  {
+    "a": 7,
+    "b": 8,
+    "c": 9
+  }
+]
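Taken together, the at/under/over/key-change fixtures pin down the auto-unsparsify output rules: records shorter than the header are padded with empty fields, records that extend the header emit the extra trailing fields, and any other key mismatch is a fatal schema-change error. A minimal Python sketch of those rules (a hypothetical helper for illustration, not Miller's Go code):

```python
# Sketch of the auto-unsparsify rules these fixtures encode (illustrative
# only). `header` is the first record's ordered key list; the return value
# is the list of output field values for one row.

def unsparsify_row(header, record):
    keys = list(record.keys())  # Python 3.7+ dicts preserve insertion order
    if len(keys) <= len(header) and keys == header[: len(keys)]:
        # "under": keys match a header prefix -- pad missing fields empty
        return [str(record.get(k, "")) for k in header]
    if keys[: len(header)] == header:
        # "over": record extends the header -- emit the extra fields too
        return [str(record[k]) for k in keys]
    raise ValueError(
        'schema change: first keys "%s"; current keys "%s"'
        % (" ".join(header), " ".join(keys))
    )

rows = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 4, "b": 5},                  # under
    {"a": 4, "b": 5, "c": 6, "d": 7},  # over
]
header = list(rows[0].keys())
for r in rows:
    print("\t".join(unsparsify_row(header, r)))
```

Feeding it the key-change input (`{"a": 7, "X": 8, "c": 9}`) raises the error, matching the experr fixture above.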