mlr split verb (#898)

* mlr split
* regression-test cases
* doc-build artifacts

parent dad6456022
commit f7ff63124b

62 changed files with 969 additions and 19 deletions
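For orientation, a minimal usage sketch of the verb this commit adds, using only flags documented in the help text below (myfile.csv is the placeholder filename from that help text):

  # One CSV file per distinct value of the shape field: split_triangle.csv, split_square.csv, ...
  mlr --csv --from myfile.csv split -g shape

  # Round-robin the records into 10 files named split_1.csv ... split_10.csv
  mlr --csv --from myfile.csv split -m 10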
@@ -195,8 +195,8 @@ VERB LIST
json-stringify join label least-frequent merge-fields most-frequent nest
nothing put regularize remove-empty-columns rename reorder repeat reshape
sample sec2gmtdate sec2gmt seqgen shuffle skip-trivial-records sort
sort-within-records stats1 stats2 step tac tail tee template top unflatten
uniq unsparsify
sort-within-records split stats1 stats2 step tac tail tee template top
unflatten uniq unsparsify

FUNCTION LIST
abs acos acosh any append apply arrayify asin asinh asserting_absent

@@ -1737,6 +1737,46 @@ VERBS
-r Recursively sort subobjects/submaps, e.g. for JSON input.
-h|--help Show this message.

split
Usage: mlr split [options] {filename}
Options:
-n {n}: Cap file sizes at N records.
-m {m}: Produce M files, round-robining records among them.
-g {a,b,c}: Write separate files with records having distinct values for fields named a,b,c.
Exactly one of -m, -n, or -g must be supplied.
--prefix {p} Specify filename prefix; default "split".
--suffix {s} Specify filename suffix; default is from mlr output format, e.g. "csv".
-a Append to existing file(s), if any, rather than overwriting.
-v Send records along to downstream verbs as well as splitting to files.
-h|--help Show this message.
Any of the output-format command-line flags (see mlr -h). For example, using
mlr --icsv --from myfile.csv split --ojson -n 1000
the input is CSV, but the output files are JSON.

Examples: Suppose myfile.csv has 1,000,000 records.

100 output files, 10,000 records each. First 10,000 records in split_1.csv, next in split_2.csv, etc.
mlr --csv --from myfile.csv split -n 10000

10 output files, 100,000 records each. Records 1,11,21,etc in split_1.csv, records 2,12,22, etc in split_2.csv, etc.
mlr --csv --from myfile.csv split -m 10
Same, but with JSON output.
mlr --csv --from myfile.csv split -m 10 -o json

Same but instead of split_1.csv, split_2.csv, etc. there are test_1.dat, test_2.dat, etc.
mlr --csv --from myfile.csv split -m 10 --prefix test --suffix dat
Same, but written to the /tmp/ directory.
mlr --csv --from myfile.csv split -m 10 --prefix /tmp/test --suffix dat

If the shape field has values triangle and square, then there will be split_triangle.csv and split_square.csv.
mlr --csv --from myfile.csv split -g shape

If the color field has values yellow and green, and the shape field has values triangle and square,
then there will be split_yellow_triangle.csv, split_yellow_square.csv, etc.
mlr --csv --from myfile.csv split -g color,shape

See also the "tee" DSL function which lets you do more ad-hoc customization.

stats1
Usage: mlr stats1 [options]
Computes univariate statistics for one or more given fields, accumulated across

@@ -3091,5 +3131,5 @@ SEE ALSO



2022-01-25 MILLER(1)
2022-01-27 MILLER(1)
</pre>
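A usage note restating the two ungrouped modes documented above: -n fills each file with N consecutive records (chunked), while -m deals records out round-robin across M files. A small sketch against the 10-record test/input/example.csv added by this commit:

  # -n 2: consecutive chunks -> split_1.csv holds records 1-2, split_2.csv holds 3-4, ... (5 files)
  mlr --csv --from test/input/example.csv split -n 2

  # -m 2: round-robin -> split_1.csv holds records 1,3,5,7,9; split_2.csv holds 2,4,6,8,10
  mlr --csv --from test/input/example.csv split -m 2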
@@ -174,8 +174,8 @@ VERB LIST
json-stringify join label least-frequent merge-fields most-frequent nest
nothing put regularize remove-empty-columns rename reorder repeat reshape
sample sec2gmtdate sec2gmt seqgen shuffle skip-trivial-records sort
sort-within-records stats1 stats2 step tac tail tee template top unflatten
uniq unsparsify
sort-within-records split stats1 stats2 step tac tail tee template top
unflatten uniq unsparsify

FUNCTION LIST
abs acos acosh any append apply arrayify asin asinh asserting_absent

@@ -1716,6 +1716,46 @@ VERBS
-r Recursively sort subobjects/submaps, e.g. for JSON input.
-h|--help Show this message.

split
Usage: mlr split [options] {filename}
Options:
-n {n}: Cap file sizes at N records.
-m {m}: Produce M files, round-robining records among them.
-g {a,b,c}: Write separate files with records having distinct values for fields named a,b,c.
Exactly one of -m, -n, or -g must be supplied.
--prefix {p} Specify filename prefix; default "split".
--suffix {s} Specify filename suffix; default is from mlr output format, e.g. "csv".
-a Append to existing file(s), if any, rather than overwriting.
-v Send records along to downstream verbs as well as splitting to files.
-h|--help Show this message.
Any of the output-format command-line flags (see mlr -h). For example, using
mlr --icsv --from myfile.csv split --ojson -n 1000
the input is CSV, but the output files are JSON.

Examples: Suppose myfile.csv has 1,000,000 records.

100 output files, 10,000 records each. First 10,000 records in split_1.csv, next in split_2.csv, etc.
mlr --csv --from myfile.csv split -n 10000

10 output files, 100,000 records each. Records 1,11,21,etc in split_1.csv, records 2,12,22, etc in split_2.csv, etc.
mlr --csv --from myfile.csv split -m 10
Same, but with JSON output.
mlr --csv --from myfile.csv split -m 10 -o json

Same but instead of split_1.csv, split_2.csv, etc. there are test_1.dat, test_2.dat, etc.
mlr --csv --from myfile.csv split -m 10 --prefix test --suffix dat
Same, but written to the /tmp/ directory.
mlr --csv --from myfile.csv split -m 10 --prefix /tmp/test --suffix dat

If the shape field has values triangle and square, then there will be split_triangle.csv and split_square.csv.
mlr --csv --from myfile.csv split -g shape

If the color field has values yellow and green, and the shape field has values triangle and square,
then there will be split_yellow_triangle.csv, split_yellow_square.csv, etc.
mlr --csv --from myfile.csv split -g color,shape

See also the "tee" DSL function which lets you do more ad-hoc customization.

stats1
Usage: mlr stats1 [options]
Computes univariate statistics for one or more given fields, accumulated across

@@ -3070,4 +3110,4 @@ SEE ALSO



2022-01-25 MILLER(1)
2022-01-27 MILLER(1)
@@ -2978,6 +2978,52 @@ a b c
9 8 7
</pre>

## split

<pre class="pre-highlight-in-pair">
<b>mlr split --help</b>
</pre>
<pre class="pre-non-highlight-in-pair">
Usage: mlr split [options] {filename}
Options:
-n {n}: Cap file sizes at N records.
-m {m}: Produce M files, round-robining records among them.
-g {a,b,c}: Write separate files with records having distinct values for fields named a,b,c.
Exactly one of -m, -n, or -g must be supplied.
--prefix {p} Specify filename prefix; default "split".
--suffix {s} Specify filename suffix; default is from mlr output format, e.g. "csv".
-a Append to existing file(s), if any, rather than overwriting.
-v Send records along to downstream verbs as well as splitting to files.
-h|--help Show this message.
Any of the output-format command-line flags (see mlr -h). For example, using
mlr --icsv --from myfile.csv split --ojson -n 1000
the input is CSV, but the output files are JSON.

Examples: Suppose myfile.csv has 1,000,000 records.

100 output files, 10,000 records each. First 10,000 records in split_1.csv, next in split_2.csv, etc.
mlr --csv --from myfile.csv split -n 10000

10 output files, 100,000 records each. Records 1,11,21,etc in split_1.csv, records 2,12,22, etc in split_2.csv, etc.
mlr --csv --from myfile.csv split -m 10
Same, but with JSON output.
mlr --csv --from myfile.csv split -m 10 -o json

Same but instead of split_1.csv, split_2.csv, etc. there are test_1.dat, test_2.dat, etc.
mlr --csv --from myfile.csv split -m 10 --prefix test --suffix dat
Same, but written to the /tmp/ directory.
mlr --csv --from myfile.csv split -m 10 --prefix /tmp/test --suffix dat

If the shape field has values triangle and square, then there will be split_triangle.csv and split_square.csv.
mlr --csv --from myfile.csv split -g shape

If the color field has values yellow and green, and the shape field has values triangle and square,
then there will be split_yellow_triangle.csv, split_yellow_square.csv, etc.
mlr --csv --from myfile.csv split -g color,shape

See also the "tee" DSL function which lets you do more ad-hoc customization.
</pre>

## stats1

<pre class="pre-highlight-in-pair">
@@ -936,6 +936,12 @@ GENMD-RUN-COMMAND
mlr --ijson --opprint sort-within-records data/sort-within-records.json
GENMD-EOF

## split

GENMD-RUN-COMMAND
mlr split --help
GENMD-EOF

## stats1

GENMD-RUN-COMMAND
@@ -56,6 +56,17 @@ type MultiOutputHandlerManager struct {
}

// ----------------------------------------------------------------
func NewFileOutputHandlerManager(
    recordWriterOptions *cli.TWriterOptions,
    doAppend bool,
) *MultiOutputHandlerManager {
    if doAppend {
        return NewFileAppendHandlerManager(recordWriterOptions)
    } else {
        return NewFileWritetHandlerManager(recordWriterOptions)
    }
}

func NewFileWritetHandlerManager(
    recordWriterOptions *cli.TWriterOptions,
) *MultiOutputHandlerManager {

@@ -228,6 +239,18 @@ func newOutputHandlerCommon(
}

// ----------------------------------------------------------------
func NewFileOutputHandler(
    filename string,
    recordWriterOptions *cli.TWriterOptions,
    doAppend bool,
) (*FileOutputHandler, error) {
    if doAppend {
        return NewFileAppendOutputHandler(filename, recordWriterOptions)
    } else {
        return NewFileWriteOutputHandler(filename, recordWriterOptions)
    }
}

func NewFileWriteOutputHandler(
    filename string,
    recordWriterOptions *cli.TWriterOptions,

@@ -59,6 +59,7 @@ var TRANSFORMER_LOOKUP_TABLE = []TransformerSetup{
    SkipTrivialRecordsSetup,
    SortSetup,
    SortWithinRecordsSetup,
    SplitSetup,
    Stats1Setup,
    Stats2Setup,
    StepSetup,
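The doAppend branches above are what back the verb's -a flag: with -a, existing output files are opened for append rather than truncated. A hedged two-run sketch (day1.csv and day2.csv are hypothetical inputs; the flags are the documented ones):

  # First run creates split_triangle.csv, split_square.csv, ...
  mlr --csv --from day1.csv split -g shape
  # Second run appends day2's records to those same files instead of overwriting them.
  mlr --csv --from day2.csv split -g shape -a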
437  internal/pkg/transformers/split.go  Normal file
@@ -0,0 +1,437 @@
package transformers

import (
    "bytes"
    "container/list"
    "fmt"
    "net/url"
    "os"
    "strings"

    "github.com/johnkerl/miller/internal/pkg/cli"
    "github.com/johnkerl/miller/internal/pkg/mlrval"
    "github.com/johnkerl/miller/internal/pkg/output"
    "github.com/johnkerl/miller/internal/pkg/types"
)

// ----------------------------------------------------------------
const verbNameSplit = "split"
const splitDefaultOutputFileNamePrefix = "split"

var SplitSetup = TransformerSetup{
    Verb:         verbNameSplit,
    UsageFunc:    transformerSplitUsage,
    ParseCLIFunc: transformerSplitParseCLI,
    IgnoresInput: false,
}

func transformerSplitUsage(
    o *os.File,
    doExit bool,
    exitCode int,
) {
    fmt.Fprintf(o, "Usage: %s %s [options] {filename}\n", "mlr", verbNameSplit)
    fmt.Fprintf(o,
        `Options:
-n {n}: Cap file sizes at N records.
-m {m}: Produce M files, round-robining records among them.
-g {a,b,c}: Write separate files with records having distinct values for fields named a,b,c.
Exactly one of -m, -n, or -g must be supplied.
--prefix {p} Specify filename prefix; default "`+splitDefaultOutputFileNamePrefix+`".
--suffix {s} Specify filename suffix; default is from mlr output format, e.g. "csv".
-a Append to existing file(s), if any, rather than overwriting.
-v Send records along to downstream verbs as well as splitting to files.
-h|--help Show this message.
Any of the output-format command-line flags (see mlr -h). For example, using
mlr --icsv --from myfile.csv split --ojson -n 1000
the input is CSV, but the output files are JSON.

Examples: Suppose myfile.csv has 1,000,000 records.

100 output files, 10,000 records each. First 10,000 records in split_1.csv, next in split_2.csv, etc.
mlr --csv --from myfile.csv split -n 10000

10 output files, 100,000 records each. Records 1,11,21,etc in split_1.csv, records 2,12,22, etc in split_2.csv, etc.
mlr --csv --from myfile.csv split -m 10
Same, but with JSON output.
mlr --csv --from myfile.csv split -m 10 -o json

Same but instead of split_1.csv, split_2.csv, etc. there are test_1.dat, test_2.dat, etc.
mlr --csv --from myfile.csv split -m 10 --prefix test --suffix dat
Same, but written to the /tmp/ directory.
mlr --csv --from myfile.csv split -m 10 --prefix /tmp/test --suffix dat

If the shape field has values triangle and square, then there will be split_triangle.csv and split_square.csv.
mlr --csv --from myfile.csv split -g shape

If the color field has values yellow and green, and the shape field has values triangle and square,
then there will be split_yellow_triangle.csv, split_yellow_square.csv, etc.
mlr --csv --from myfile.csv split -g color,shape

See also the "tee" DSL function which lets you do more ad-hoc customization.
`)
    if doExit {
        os.Exit(exitCode)
    }
}

func transformerSplitParseCLI(
    pargi *int,
    argc int,
    args []string,
    mainOptions *cli.TOptions,
    doConstruct bool, // false for first pass of CLI-parse, true for second pass
) IRecordTransformer {

    // Skip the verb name from the current spot in the mlr command line
    argi := *pargi
    verb := args[argi]
    argi++

    var n int = 0
    var doMod bool = false
    var doSize bool = false
    var groupByFieldNames []string = nil
    var emitDownstream bool = false
    var doAppend bool = false
    var outputFileNamePrefix string = splitDefaultOutputFileNamePrefix
    var outputFileNameSuffix string = "uninit"
    haveOutputFileNameSuffix := false

    var localOptions *cli.TOptions = nil
    if mainOptions != nil {
        copyThereof := *mainOptions // struct copy
        localOptions = &copyThereof
    }

    // Parse local flags.
    for argi < argc /* variable increment: 1 or 2 depending on flag */ {
        opt := args[argi]
        if !strings.HasPrefix(opt, "-") {
            break // No more flag options to process
        }
        if args[argi] == "--" {
            break // All transformers must do this so main-flags can follow verb-flags
        }
        argi++

        if opt == "-h" || opt == "--help" {
            transformerSplitUsage(os.Stdout, true, 0)

        } else if opt == "-n" {
            n = cli.VerbGetIntArgOrDie(verb, opt, args, &argi, argc)
            doSize = true

        } else if opt == "-m" {
            n = cli.VerbGetIntArgOrDie(verb, opt, args, &argi, argc)
            doMod = true

        } else if opt == "-g" {
            groupByFieldNames = cli.VerbGetStringArrayArgOrDie(verb, opt, args, &argi, argc)

        } else if opt == "--prefix" {
            outputFileNamePrefix = cli.VerbGetStringArgOrDie(verb, opt, args, &argi, argc)

        } else if opt == "--suffix" {
            outputFileNameSuffix = cli.VerbGetStringArgOrDie(verb, opt, args, &argi, argc)
            haveOutputFileNameSuffix = true

        } else if opt == "-a" {
            doAppend = true

        } else if opt == "-v" {
            emitDownstream = true

        } else {
            // This is inelegant. For error-proofing we advance argi already in our
            // loop (so individual if-statements don't need to). However,
            // ParseWriterOptions expects it unadvanced.
            largi := argi - 1
            if cli.FLAG_TABLE.Parse(args, argc, &largi, localOptions) {
                // This lets mlr main and mlr split have different output formats.
                // Nothing else to handle here.
                argi = largi
            } else {
                transformerSplitUsage(os.Stderr, true, 1)
            }
        }
    }

    doGroup := groupByFieldNames != nil
    if !doMod && !doSize && !doGroup {
        fmt.Fprintf(os.Stderr, "mlr %s: At least one of -m, -n, or -g is required.\n", verb)
        os.Exit(1)
    }
    if (doMod && doSize) || (doMod && doGroup) || (doSize && doGroup) {
        fmt.Fprintf(os.Stderr, "mlr %s: Only one of -m, -n, or -g is required.\n", verb)
        os.Exit(1)
    }

    cli.FinalizeWriterOptions(&localOptions.WriterOptions)
    if !haveOutputFileNameSuffix {
        outputFileNameSuffix = localOptions.WriterOptions.OutputFileFormat
    }

    *pargi = argi
    if !doConstruct { // All transformers must do this for main command-line parsing
        return nil
    }

    transformer, err := NewTransformerSplit(
        n,
        doMod,
        doSize,
        groupByFieldNames,
        emitDownstream,
        doAppend,
        outputFileNamePrefix,
        outputFileNameSuffix,
        &localOptions.WriterOptions,
    )
    if err != nil {
        // Error message already printed out
        os.Exit(1)
    }

    return transformer
}

// ----------------------------------------------------------------
type TransformerSplit struct {
    n int
    outputFileNamePrefix string
    outputFileNameSuffix string
    emitDownstream bool
    ungroupedCounter int
    groupByFieldNames []string
    recordWriterOptions *cli.TWriterOptions
    doAppend bool

    // For doSize ungrouped: only one file open at a time
    outputHandler output.OutputHandler
    previousQuotient int

    // For all other cases: multiple files open at a time
    outputHandlerManager output.OutputHandlerManager

    recordTransformerFunc RecordTransformerFunc
}

func NewTransformerSplit(
    n int,
    doMod bool,
    doSize bool,
    groupByFieldNames []string,
    emitDownstream bool,
    doAppend bool,
    outputFileNamePrefix string,
    outputFileNameSuffix string,
    recordWriterOptions *cli.TWriterOptions,
) (*TransformerSplit, error) {

    tr := &TransformerSplit{
        n: n,
        outputFileNamePrefix: outputFileNamePrefix,
        outputFileNameSuffix: outputFileNameSuffix,
        emitDownstream: emitDownstream,
        ungroupedCounter: 0,
        groupByFieldNames: groupByFieldNames,
        recordWriterOptions: recordWriterOptions,
        doAppend: doAppend,

        outputHandler: nil,
        previousQuotient: -1,
    }

    tr.outputHandlerManager = output.NewFileOutputHandlerManager(recordWriterOptions, doAppend)

    if groupByFieldNames != nil {
        tr.recordTransformerFunc = tr.splitGrouped
    } else if doMod {
        tr.recordTransformerFunc = tr.splitModUngrouped
    } else {
        tr.recordTransformerFunc = tr.splitSizeUngrouped
    }

    return tr, nil
}

func (tr *TransformerSplit) Transform(
    inrecAndContext *types.RecordAndContext,
    outputRecordsAndContexts *list.List, // list of *types.RecordAndContext
    inputDownstreamDoneChannel <-chan bool,
    outputDownstreamDoneChannel chan<- bool,
) {
    HandleDefaultDownstreamDone(inputDownstreamDoneChannel, outputDownstreamDoneChannel)
    tr.recordTransformerFunc(inrecAndContext, outputRecordsAndContexts, inputDownstreamDoneChannel,
        outputDownstreamDoneChannel)
}

func (tr *TransformerSplit) splitModUngrouped(
    inrecAndContext *types.RecordAndContext,
    outputRecordsAndContexts *list.List, // list of *types.RecordAndContext
    inputDownstreamDoneChannel <-chan bool,
    outputDownstreamDoneChannel chan<- bool,
) {
    if !inrecAndContext.EndOfStream {
        remainder := 1 + (tr.ungroupedCounter % tr.n)
        filename := tr.makeUngroupedOutputFileName(remainder)

        err := tr.outputHandlerManager.WriteRecordAndContext(inrecAndContext, filename)
        if err != nil {
            fmt.Fprintf(os.Stderr, "mlr: file-write error: %v\n", err)
            os.Exit(1)
        }

        if tr.emitDownstream {
            outputRecordsAndContexts.PushBack(inrecAndContext)
        }

        tr.ungroupedCounter++

    } else {
        outputRecordsAndContexts.PushBack(inrecAndContext) // end-of-stream marker
        errs := tr.outputHandlerManager.Close()
        if len(errs) > 0 {
            for _, err := range errs {
                fmt.Fprintf(os.Stderr, "mlr: file-close error: %v\n", err)
            }
            os.Exit(1)
        }
    }
}

func (tr *TransformerSplit) splitSizeUngrouped(
    inrecAndContext *types.RecordAndContext,
    outputRecordsAndContexts *list.List, // list of *types.RecordAndContext
    inputDownstreamDoneChannel <-chan bool,
    outputDownstreamDoneChannel chan<- bool,
) {
    var err error
    if !inrecAndContext.EndOfStream {
        quotient := 1 + (tr.ungroupedCounter / tr.n)

        if quotient != tr.previousQuotient {
            if tr.outputHandler != nil {
                err = tr.outputHandler.Close()
                if err != nil {
                    fmt.Fprintf(os.Stderr, "mlr: file-close error: %v\n", err)
                    os.Exit(1)
                }
            }

            filename := tr.makeUngroupedOutputFileName(quotient)
            tr.outputHandler, err = output.NewFileOutputHandler(
                filename,
                tr.recordWriterOptions,
                tr.doAppend,
            )
            if err != nil {
                fmt.Fprintf(os.Stderr, "mlr: file-open error: %v\n", err)
                os.Exit(1)
            }

            tr.previousQuotient = quotient
        }

        err = tr.outputHandler.WriteRecordAndContext(inrecAndContext)
        if err != nil {
            fmt.Fprintf(os.Stderr, "mlr: file-write error: %v\n", err)
            os.Exit(1)
        }

        if tr.emitDownstream {
            outputRecordsAndContexts.PushBack(inrecAndContext)
        }

        tr.ungroupedCounter++

    } else {
        outputRecordsAndContexts.PushBack(inrecAndContext) // end-of-stream marker

        if tr.outputHandler != nil {
            err := tr.outputHandler.Close()
            if err != nil {
                fmt.Fprintf(os.Stderr, "mlr: file-close error: %v\n", err)
                os.Exit(1)
            }
        }
    }
}

func (tr *TransformerSplit) splitGrouped(
    inrecAndContext *types.RecordAndContext,
    outputRecordsAndContexts *list.List, // list of *types.RecordAndContext
    inputDownstreamDoneChannel <-chan bool,
    outputDownstreamDoneChannel chan<- bool,
) {
    if !inrecAndContext.EndOfStream {
        var filename string
        groupByFieldValues, ok := inrecAndContext.Record.GetSelectedValues(tr.groupByFieldNames)
        if !ok {
            filename = fmt.Sprintf("%s_ungrouped.%s", tr.outputFileNamePrefix, tr.outputFileNameSuffix)
        } else {
            filename = tr.makeGroupedOutputFileName(groupByFieldValues)
        }
        err := tr.outputHandlerManager.WriteRecordAndContext(inrecAndContext, filename)
        if err != nil {
            fmt.Fprintf(os.Stderr, "mlr: %v\n", err)
            os.Exit(1)
        }

        if tr.emitDownstream {
            outputRecordsAndContexts.PushBack(inrecAndContext)
        }

    } else {
        outputRecordsAndContexts.PushBack(inrecAndContext) // emit end-of-stream marker

        errs := tr.outputHandlerManager.Close()
        if len(errs) > 0 {
            for _, err := range errs {
                fmt.Fprintf(os.Stderr, "mlr: file-close error: %v\n", err)
            }
            os.Exit(1)
        }
    }
}

// makeUngroupedOutputFileName example: "split_53.csv"
func (tr *TransformerSplit) makeUngroupedOutputFileName(k int) string {
    return fmt.Sprintf("%s_%d.%s", tr.outputFileNamePrefix, k, tr.outputFileNameSuffix)
}

// makeGroupedOutputFileName example: "split_orange.csv"
func (tr *TransformerSplit) makeGroupedOutputFileName(
    groupByFieldValues []*mlrval.Mlrval,
) string {
    var buffer bytes.Buffer
    buffer.WriteString(tr.outputFileNamePrefix)
    for _, groupByFieldValue := range groupByFieldValues {
        buffer.WriteString("_")
        buffer.WriteString(url.QueryEscape(groupByFieldValue.String()))
    }
    buffer.WriteString(".")
    buffer.WriteString(tr.outputFileNameSuffix)
    return buffer.String()
}

// makeGroupedIndexedOutputFileName example: "split_yellow_53.csv"
func (tr *TransformerSplit) makeGroupedIndexedOutputFileName(
    groupByFieldValues []*mlrval.Mlrval,
    index int,
) string {
    // URL-escape the fields which come from data and which may have '/'
    // etc within. Don't URL-escape the prefix since people may want to
    // use prefixes like '/tmp/split' to write to the /tmp directory, etc.
    var buffer bytes.Buffer
    buffer.WriteString(tr.outputFileNamePrefix)
    for _, groupByFieldValue := range groupByFieldValues {
        buffer.WriteString("_")
        buffer.WriteString(url.QueryEscape(groupByFieldValue.String()))
    }
    buffer.WriteString(fmt.Sprintf("_%d", index))
    buffer.WriteString(".")
    buffer.WriteString(tr.outputFileNameSuffix)
    return buffer.String()
}
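Since makeGroupedOutputFileName passes each group-by value through url.QueryEscape, data values containing '/' or other filename-hostile characters are escaped in the output name, while the --prefix is left untouched (so prefixes like /tmp/split keep working). An illustrative sketch, assuming a hypothetical field named dir whose value is "a/b":

  # A record with dir=a/b lands in split_a%2Fb.csv rather than in a subdirectory.
  mlr --csv --from myfile.csv split -g dir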
@@ -174,8 +174,8 @@ VERB LIST
json-stringify join label least-frequent merge-fields most-frequent nest
nothing put regularize remove-empty-columns rename reorder repeat reshape
sample sec2gmtdate sec2gmt seqgen shuffle skip-trivial-records sort
sort-within-records stats1 stats2 step tac tail tee template top unflatten
uniq unsparsify
sort-within-records split stats1 stats2 step tac tail tee template top
unflatten uniq unsparsify

FUNCTION LIST
abs acos acosh any append apply arrayify asin asinh asserting_absent

@@ -1716,6 +1716,46 @@ VERBS
-r Recursively sort subobjects/submaps, e.g. for JSON input.
-h|--help Show this message.

split
Usage: mlr split [options] {filename}
Options:
-n {n}: Cap file sizes at N records.
-m {m}: Produce M files, round-robining records among them.
-g {a,b,c}: Write separate files with records having distinct values for fields named a,b,c.
Exactly one of -m, -n, or -g must be supplied.
--prefix {p} Specify filename prefix; default "split".
--suffix {s} Specify filename suffix; default is from mlr output format, e.g. "csv".
-a Append to existing file(s), if any, rather than overwriting.
-v Send records along to downstream verbs as well as splitting to files.
-h|--help Show this message.
Any of the output-format command-line flags (see mlr -h). For example, using
mlr --icsv --from myfile.csv split --ojson -n 1000
the input is CSV, but the output files are JSON.

Examples: Suppose myfile.csv has 1,000,000 records.

100 output files, 10,000 records each. First 10,000 records in split_1.csv, next in split_2.csv, etc.
mlr --csv --from myfile.csv split -n 10000

10 output files, 100,000 records each. Records 1,11,21,etc in split_1.csv, records 2,12,22, etc in split_2.csv, etc.
mlr --csv --from myfile.csv split -m 10
Same, but with JSON output.
mlr --csv --from myfile.csv split -m 10 -o json

Same but instead of split_1.csv, split_2.csv, etc. there are test_1.dat, test_2.dat, etc.
mlr --csv --from myfile.csv split -m 10 --prefix test --suffix dat
Same, but written to the /tmp/ directory.
mlr --csv --from myfile.csv split -m 10 --prefix /tmp/test --suffix dat

If the shape field has values triangle and square, then there will be split_triangle.csv and split_square.csv.
mlr --csv --from myfile.csv split -g shape

If the color field has values yellow and green, and the shape field has values triangle and square,
then there will be split_yellow_triangle.csv, split_yellow_square.csv, etc.
mlr --csv --from myfile.csv split -g color,shape

See also the "tee" DSL function which lets you do more ad-hoc customization.

stats1
Usage: mlr stats1 [options]
Computes univariate statistics for one or more given fields, accumulated across

@@ -3070,4 +3110,4 @@ SEE ALSO



2022-01-25 MILLER(1)
2022-01-27 MILLER(1)
54  man/mlr.1
@@ -2,12 +2,12 @@
.\" Title: mlr
.\" Author: [see the "AUTHOR" section]
.\" Generator: ./mkman.rb
.\" Date: 2022-01-25
.\" Date: 2022-01-27
.\" Manual: \ \&
.\" Source: \ \&
.\" Language: English
.\"
.TH "MILLER" "1" "2022-01-25" "\ \&" "\ \&"
.TH "MILLER" "1" "2022-01-27" "\ \&" "\ \&"
.\" -----------------------------------------------------------------
.\" * Portability definitions
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -215,8 +215,8 @@ fraction gap grep group-by group-like having-fields head histogram json-parse
json-stringify join label least-frequent merge-fields most-frequent nest
nothing put regularize remove-empty-columns rename reorder repeat reshape
sample sec2gmtdate sec2gmt seqgen shuffle skip-trivial-records sort
sort-within-records stats1 stats2 step tac tail tee template top unflatten
uniq unsparsify
sort-within-records split stats1 stats2 step tac tail tee template top
unflatten uniq unsparsify
.fi
.if n \{\
.RE
@@ -2169,6 +2169,52 @@ Options:
.fi
.if n \{\
.RE
.SS "split"
.if n \{\
.RS 0
.\}
.nf
Usage: mlr split [options] {filename}
Options:
-n {n}: Cap file sizes at N records.
-m {m}: Produce M files, round-robining records among them.
-g {a,b,c}: Write separate files with records having distinct values for fields named a,b,c.
Exactly one of -m, -n, or -g must be supplied.
--prefix {p} Specify filename prefix; default "split".
--suffix {s} Specify filename suffix; default is from mlr output format, e.g. "csv".
-a Append to existing file(s), if any, rather than overwriting.
-v Send records along to downstream verbs as well as splitting to files.
-h|--help Show this message.
Any of the output-format command-line flags (see mlr -h). For example, using
mlr --icsv --from myfile.csv split --ojson -n 1000
the input is CSV, but the output files are JSON.

Examples: Suppose myfile.csv has 1,000,000 records.

100 output files, 10,000 records each. First 10,000 records in split_1.csv, next in split_2.csv, etc.
mlr --csv --from myfile.csv split -n 10000

10 output files, 100,000 records each. Records 1,11,21,etc in split_1.csv, records 2,12,22, etc in split_2.csv, etc.
mlr --csv --from myfile.csv split -m 10
Same, but with JSON output.
mlr --csv --from myfile.csv split -m 10 -o json

Same but instead of split_1.csv, split_2.csv, etc. there are test_1.dat, test_2.dat, etc.
mlr --csv --from myfile.csv split -m 10 --prefix test --suffix dat
Same, but written to the /tmp/ directory.
mlr --csv --from myfile.csv split -m 10 --prefix /tmp/test --suffix dat

If the shape field has values triangle and square, then there will be split_triangle.csv and split_square.csv.
mlr --csv --from myfile.csv split -g shape

If the color field has values yellow and green, and the shape field has values triangle and square,
then there will be split_yellow_triangle.csv, split_yellow_square.csv, etc.
mlr --csv --from myfile.csv split -g color,shape

See also the "tee" DSL function which lets you do more ad-hoc customization.
.fi
.if n \{\
.RE
.SS "stats1"
.if n \{\
.RS 0
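A note on the -v flag documented above: it forwards each record downstream in addition to writing it to the split files, so the verb can sit mid-pipeline rather than only at the end. A sketch (the then-chained verb here is an arbitrary choice from Miller's verb list):

  # Records are written to split_1.csv/split_2.csv and also flow on to the downstream head verb.
  mlr --csv --from myfile.csv split -m 2 -v then head -n 3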
|||
|
|
@ -929,6 +929,47 @@ Options:
|
|||
-r Recursively sort subobjects/submaps, e.g. for JSON input.
|
||||
-h|--help Show this message.
|
||||
|
||||
================================================================
|
||||
split
|
||||
Usage: mlr split [options] {filename}
|
||||
Options:
|
||||
-n {n}: Cap file sizes at N records.
|
||||
-m {m}: Produce M files, round-robining records among them.
|
||||
-g {a,b,c}: Write separate files with records having distinct values for fields named a,b,c.
|
||||
Exactly one of -m, -n, or -g must be supplied.
|
||||
--prefix {p} Specify filename prefix; default "split".
|
||||
--suffix {s} Specify filename suffix; default is from mlr output format, e.g. "csv".
|
||||
-a Append to existing file(s), if any, rather than overwriting.
|
||||
-v Send records along to downstream verbs as well as splitting to files.
|
||||
-h|--help Show this message.
|
||||
Any of the output-format command-line flags (see mlr -h). For example, using
|
||||
mlr --icsv --from myfile.csv split --ojson -n 1000
|
||||
the input is CSV, but the output files are JSON.
|
||||
|
||||
Examples: Suppose myfile.csv has 1,000,000 records.
|
||||
|
||||
100 output files, 10,000 records each. First 10,000 records in split_1.csv, next in split_2.csv, etc.
|
||||
mlr --csv --from myfile.csv split -n 10000
|
||||
|
||||
10 output files, 100,000 records each. Records 1,11,21,etc in split_1.csv, records 2,12,22, etc in split_2.csv, etc.
|
||||
mlr --csv --from myfile.csv split -m 10
|
||||
Same, but with JSON output.
|
||||
mlr --csv --from myfile.csv split -m 10 -o json
|
||||
|
||||
Same but instead of split_1.csv, split_2.csv, etc. there are test_1.dat, test_2.dat, etc.
|
||||
mlr --csv --from myfile.csv split -m 10 --prefix test --suffix dat
|
||||
Same, but written to the /tmp/ directory.
|
||||
mlr --csv --from myfile.csv split -m 10 --prefix /tmp/test --suffix dat
|
||||
|
||||
If the shape field has values triangle and square, then there will be split_triangle.csv and split_square.csv.
|
||||
mlr --csv --from myfile.csv split -g shape
|
||||
|
||||
If the color field has values yellow and green, and the shape field has values triangle and square,
|
||||
then there will be split_yellow_triangle.csv, split_yellow_square.csv, etc.
|
||||
mlr --csv --from myfile.csv split -g color,shape
|
||||
|
||||
See also the "tee" DSL function which lets you do more ad-hoc customization.
|
||||
|
||||
================================================================
|
||||
stats1
|
||||
Usage: mlr stats1 [options]
|
||||
|
|
|
|||
1
test/cases/verb-split/0001/cmd
Normal file
1
test/cases/verb-split/0001/cmd
Normal file
|
|
@ -0,0 +1 @@
|
|||
mlr --csv split -m 2 --prefix ${CASEDIR}/split test/input/example.csv
|
||||
0
test/cases/verb-split/0001/experr
Normal file
0
test/cases/verb-split/0001/experr
Normal file
0
test/cases/verb-split/0001/expout
Normal file
0
test/cases/verb-split/0001/expout
Normal file
3
test/cases/verb-split/0001/postcmp
Normal file
3
test/cases/verb-split/0001/postcmp
Normal file
|
|
@ -0,0 +1,3 @@
|
|||
${CASEDIR}/split_1.csv.expect ${CASEDIR}/split_1.csv
|
||||
${CASEDIR}/split_2.csv.expect ${CASEDIR}/split_2.csv
|
||||
|
||||
6
test/cases/verb-split/0001/split_1.csv.expect
Normal file
6
test/cases/verb-split/0001/split_1.csv.expect
Normal file
|
|
@ -0,0 +1,6 @@
|
|||
color,shape,flag,k,index,quantity,rate
|
||||
yellow,triangle,true,1,11,43.6498,9.8870
|
||||
red,circle,true,3,16,13.8103,2.9010
|
||||
purple,triangle,false,5,51,81.2290,8.5910
|
||||
purple,triangle,false,7,65,80.1405,5.8240
|
||||
yellow,circle,true,9,87,63.5058,8.3350
|
||||
6
test/cases/verb-split/0001/split_2.csv.expect
Normal file
6
test/cases/verb-split/0001/split_2.csv.expect
Normal file
|
|
@ -0,0 +1,6 @@
|
|||
color,shape,flag,k,index,quantity,rate
|
||||
red,square,true,2,15,79.2778,0.0130
|
||||
red,square,false,4,48,77.5542,7.4670
|
||||
red,square,false,6,64,77.1991,9.5310
|
||||
yellow,circle,true,8,73,63.9785,4.2370
|
||||
purple,square,false,10,91,72.3735,8.2430
|
||||
1
test/cases/verb-split/0002/cmd
Normal file
1
test/cases/verb-split/0002/cmd
Normal file
|
|
@ -0,0 +1 @@
|
|||
mlr --csv split -n 2 --prefix ${CASEDIR}/split test/input/example.csv
|
||||
0
test/cases/verb-split/0002/experr
Normal file
0
test/cases/verb-split/0002/experr
Normal file
0
test/cases/verb-split/0002/expout
Normal file
0
test/cases/verb-split/0002/expout
Normal file
6
test/cases/verb-split/0002/postcmp
Normal file
6
test/cases/verb-split/0002/postcmp
Normal file
|
|
@ -0,0 +1,6 @@
|
|||
${CASEDIR}/split_1.csv.expect ${CASEDIR}/split_1.csv
|
||||
${CASEDIR}/split_2.csv.expect ${CASEDIR}/split_2.csv
|
||||
${CASEDIR}/split_3.csv.expect ${CASEDIR}/split_3.csv
|
||||
${CASEDIR}/split_4.csv.expect ${CASEDIR}/split_4.csv
|
||||
${CASEDIR}/split_5.csv.expect ${CASEDIR}/split_5.csv
|
||||
|
||||
3
test/cases/verb-split/0002/split_1.csv.expect
Normal file
3
test/cases/verb-split/0002/split_1.csv.expect
Normal file
|
|
@ -0,0 +1,3 @@
|
|||
color,shape,flag,k,index,quantity,rate
|
||||
yellow,triangle,true,1,11,43.6498,9.8870
|
||||
red,square,true,2,15,79.2778,0.0130
|
||||
3
test/cases/verb-split/0002/split_2.csv.expect
Normal file
3
test/cases/verb-split/0002/split_2.csv.expect
Normal file
|
|
@ -0,0 +1,3 @@
|
|||
color,shape,flag,k,index,quantity,rate
|
||||
red,circle,true,3,16,13.8103,2.9010
|
||||
red,square,false,4,48,77.5542,7.4670
|
||||
3
test/cases/verb-split/0002/split_3.csv.expect
Normal file
3
test/cases/verb-split/0002/split_3.csv.expect
Normal file
|
|
@ -0,0 +1,3 @@
|
|||
color,shape,flag,k,index,quantity,rate
|
||||
purple,triangle,false,5,51,81.2290,8.5910
|
||||
red,square,false,6,64,77.1991,9.5310
|
||||
3
test/cases/verb-split/0002/split_4.csv.expect
Normal file
3
test/cases/verb-split/0002/split_4.csv.expect
Normal file
|
|
@ -0,0 +1,3 @@
|
|||
color,shape,flag,k,index,quantity,rate
|
||||
purple,triangle,false,7,65,80.1405,5.8240
|
||||
yellow,circle,true,8,73,63.9785,4.2370
|
||||
3
test/cases/verb-split/0002/split_5.csv.expect
Normal file
3
test/cases/verb-split/0002/split_5.csv.expect
Normal file
|
|
@ -0,0 +1,3 @@
|
|||
color,shape,flag,k,index,quantity,rate
|
||||
yellow,circle,true,9,87,63.5058,8.3350
|
||||
purple,square,false,10,91,72.3735,8.2430
|
||||
1
test/cases/verb-split/0003/cmd
Normal file
1
test/cases/verb-split/0003/cmd
Normal file
|
|
@ -0,0 +1 @@
|
|||
mlr --csv split -g shape --prefix ${CASEDIR}/split test/input/example.csv
|
||||
0
test/cases/verb-split/0003/experr
Normal file
0
test/cases/verb-split/0003/experr
Normal file
0
test/cases/verb-split/0003/expout
Normal file
0
test/cases/verb-split/0003/expout
Normal file
3
test/cases/verb-split/0003/postcmp
Normal file
3
test/cases/verb-split/0003/postcmp
Normal file
|
|
@ -0,0 +1,3 @@
|
|||
${CASEDIR}/split_square.csv.expect ${CASEDIR}/split_square.csv
|
||||
${CASEDIR}/split_circle.csv.expect ${CASEDIR}/split_circle.csv
|
||||
${CASEDIR}/split_triangle.csv.expect ${CASEDIR}/split_triangle.csv
|
||||
4
test/cases/verb-split/0003/split_circle.csv.expect
Normal file
4
test/cases/verb-split/0003/split_circle.csv.expect
Normal file
|
|
@ -0,0 +1,4 @@
|
|||
color,shape,flag,k,index,quantity,rate
|
||||
red,circle,true,3,16,13.8103,2.9010
|
||||
yellow,circle,true,8,73,63.9785,4.2370
|
||||
yellow,circle,true,9,87,63.5058,8.3350
|
||||
5
test/cases/verb-split/0003/split_square.csv.expect
Normal file
5
test/cases/verb-split/0003/split_square.csv.expect
Normal file
|
|
@ -0,0 +1,5 @@
|
|||
color,shape,flag,k,index,quantity,rate
|
||||
red,square,true,2,15,79.2778,0.0130
|
||||
red,square,false,4,48,77.5542,7.4670
|
||||
red,square,false,6,64,77.1991,9.5310
|
||||
purple,square,false,10,91,72.3735,8.2430
|
||||
4
test/cases/verb-split/0003/split_triangle.csv.expect
Normal file
4
test/cases/verb-split/0003/split_triangle.csv.expect
Normal file
|
|
@ -0,0 +1,4 @@
|
|||
color,shape,flag,k,index,quantity,rate
|
||||
yellow,triangle,true,1,11,43.6498,9.8870
|
||||
purple,triangle,false,5,51,81.2290,8.5910
|
||||
purple,triangle,false,7,65,80.1405,5.8240
|
||||
1
test/cases/verb-split/0004/cmd
Normal file
1
test/cases/verb-split/0004/cmd
Normal file
|
|
@ -0,0 +1 @@
|
|||
mlr --csv split -g color,shape --prefix ${CASEDIR}/split test/input/example.csv
|
||||
0
test/cases/verb-split/0004/experr
Normal file
0
test/cases/verb-split/0004/experr
Normal file
0
test/cases/verb-split/0004/expout
Normal file
0
test/cases/verb-split/0004/expout
Normal file
7
test/cases/verb-split/0004/postcmp
Normal file
7
test/cases/verb-split/0004/postcmp
Normal file
|
|
@ -0,0 +1,7 @@
|
|||
${CASEDIR}/split_purple_square.csv.expect ${CASEDIR}/split_purple_square.csv
|
||||
${CASEDIR}/split_purple_triangle.csv.expect ${CASEDIR}/split_purple_triangle.csv
|
||||
${CASEDIR}/split_red_circle.csv.expect ${CASEDIR}/split_red_circle.csv
|
||||
${CASEDIR}/split_red_square.csv.expect ${CASEDIR}/split_red_square.csv
|
||||
${CASEDIR}/split_yellow_circle.csv.expect ${CASEDIR}/split_yellow_circle.csv
|
||||
${CASEDIR}/split_yellow_triangle.csv.expect ${CASEDIR}/split_yellow_triangle.csv
|
||||
|
||||
|
|
@ -0,0 +1,2 @@
|
|||
color,shape,flag,k,index,quantity,rate
|
||||
purple,square,false,10,91,72.3735,8.2430
|
||||
|
|
@ -0,0 +1,3 @@
|
|||
color,shape,flag,k,index,quantity,rate
|
||||
purple,triangle,false,5,51,81.2290,8.5910
|
||||
purple,triangle,false,7,65,80.1405,5.8240
|
||||
2
test/cases/verb-split/0004/split_red_circle.csv.expect
Normal file
2
test/cases/verb-split/0004/split_red_circle.csv.expect
Normal file
|
|
@ -0,0 +1,2 @@
|
|||
color,shape,flag,k,index,quantity,rate
|
||||
red,circle,true,3,16,13.8103,2.9010
|
||||
4
test/cases/verb-split/0004/split_red_square.csv.expect
Normal file
4
test/cases/verb-split/0004/split_red_square.csv.expect
Normal file
|
|
@ -0,0 +1,4 @@
|
|||
color,shape,flag,k,index,quantity,rate
|
||||
red,square,true,2,15,79.2778,0.0130
|
||||
red,square,false,4,48,77.5542,7.4670
|
||||
red,square,false,6,64,77.1991,9.5310
|
||||
|
|
@ -0,0 +1,3 @@
|
|||
color,shape,flag,k,index,quantity,rate
|
||||
yellow,circle,true,8,73,63.9785,4.2370
|
||||
yellow,circle,true,9,87,63.5058,8.3350
|
||||
|
|
@ -0,0 +1,2 @@
|
|||
color,shape,flag,k,index,quantity,rate
|
||||
yellow,triangle,true,1,11,43.6498,9.8870
|
||||
1
test/cases/verb-split/0005/cmd
Normal file
1
test/cases/verb-split/0005/cmd
Normal file
|
|
@ -0,0 +1 @@
|
|||
mlr --csv split -m 2 --prefix ${CASEDIR}/split --suffix dat test/input/example.csv
|
||||
0
test/cases/verb-split/0005/experr
Normal file
0
test/cases/verb-split/0005/experr
Normal file
0
test/cases/verb-split/0005/expout
Normal file
0
test/cases/verb-split/0005/expout
Normal file
3
test/cases/verb-split/0005/postcmp
Normal file
3
test/cases/verb-split/0005/postcmp
Normal file
|
|
@ -0,0 +1,3 @@
|
|||
${CASEDIR}/split_1.dat.expect ${CASEDIR}/split_1.dat
|
||||
${CASEDIR}/split_2.dat.expect ${CASEDIR}/split_2.dat
|
||||
|
||||
6
test/cases/verb-split/0005/split_1.dat.expect
Normal file
6
test/cases/verb-split/0005/split_1.dat.expect
Normal file
|
|
@ -0,0 +1,6 @@
|
|||
color,shape,flag,k,index,quantity,rate
|
||||
yellow,triangle,true,1,11,43.6498,9.8870
|
||||
red,circle,true,3,16,13.8103,2.9010
|
||||
purple,triangle,false,5,51,81.2290,8.5910
|
||||
purple,triangle,false,7,65,80.1405,5.8240
|
||||
yellow,circle,true,9,87,63.5058,8.3350
|
||||
6
test/cases/verb-split/0005/split_2.dat.expect
Normal file
6
test/cases/verb-split/0005/split_2.dat.expect
Normal file
|
|
@ -0,0 +1,6 @@
|
|||
color,shape,flag,k,index,quantity,rate
|
||||
red,square,true,2,15,79.2778,0.0130
|
||||
red,square,false,4,48,77.5542,7.4670
|
||||
red,square,false,6,64,77.1991,9.5310
|
||||
yellow,circle,true,8,73,63.9785,4.2370
|
||||
purple,square,false,10,91,72.3735,8.2430
|
||||
1
test/cases/verb-split/0006/cmd
Normal file
1
test/cases/verb-split/0006/cmd
Normal file
|
|
@ -0,0 +1 @@
|
|||
mlr --csv split -m 2 --prefix ${CASEDIR}/split --ojson test/input/example.csv
|
||||
0
test/cases/verb-split/0006/experr
Normal file
0
test/cases/verb-split/0006/experr
Normal file
0
test/cases/verb-split/0006/expout
Normal file
0
test/cases/verb-split/0006/expout
Normal file
3
test/cases/verb-split/0006/postcmp
Normal file
3
test/cases/verb-split/0006/postcmp
Normal file
|
|
@ -0,0 +1,3 @@
|
|||
${CASEDIR}/split_1.json.expect ${CASEDIR}/split_1.json
|
||||
${CASEDIR}/split_2.json.expect ${CASEDIR}/split_2.json
|
||||
|
||||
47
test/cases/verb-split/0006/split_1.json.expect
Normal file
47
test/cases/verb-split/0006/split_1.json.expect
Normal file
|
|
@ -0,0 +1,47 @@
|
|||
[
|
||||
{
|
||||
"color": "yellow",
|
||||
"shape": "triangle",
|
||||
"flag": "true",
|
||||
"k": 1,
|
||||
"index": 11,
|
||||
"quantity": 43.6498,
|
||||
"rate": 9.8870
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"shape": "circle",
|
||||
"flag": "true",
|
||||
"k": 3,
|
||||
"index": 16,
|
||||
"quantity": 13.8103,
|
||||
"rate": 2.9010
|
||||
},
|
||||
{
|
||||
"color": "purple",
|
||||
"shape": "triangle",
|
||||
"flag": "false",
|
||||
"k": 5,
|
||||
"index": 51,
|
||||
"quantity": 81.2290,
|
||||
"rate": 8.5910
|
||||
},
|
||||
{
|
||||
"color": "purple",
|
||||
"shape": "triangle",
|
||||
"flag": "false",
|
||||
"k": 7,
|
||||
"index": 65,
|
||||
"quantity": 80.1405,
|
||||
"rate": 5.8240
|
||||
},
|
||||
{
|
||||
"color": "yellow",
|
||||
"shape": "circle",
|
||||
"flag": "true",
|
||||
"k": 9,
|
||||
"index": 87,
|
||||
"quantity": 63.5058,
|
||||
"rate": 8.3350
|
||||
}
|
||||
]
|
||||
47
test/cases/verb-split/0006/split_2.json.expect
Normal file
47
test/cases/verb-split/0006/split_2.json.expect
Normal file
|
|
@ -0,0 +1,47 @@
|
|||
[
|
||||
{
|
||||
"color": "red",
|
||||
"shape": "square",
|
||||
"flag": "true",
|
||||
"k": 2,
|
||||
"index": 15,
|
||||
"quantity": 79.2778,
|
||||
"rate": 0.0130
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"shape": "square",
|
||||
"flag": "false",
|
||||
"k": 4,
|
||||
"index": 48,
|
||||
"quantity": 77.5542,
|
||||
"rate": 7.4670
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"shape": "square",
|
||||
"flag": "false",
|
||||
"k": 6,
|
||||
"index": 64,
|
||||
"quantity": 77.1991,
|
||||
"rate": 9.5310
|
||||
},
|
||||
{
|
||||
"color": "yellow",
|
||||
"shape": "circle",
|
||||
"flag": "true",
|
||||
"k": 8,
|
||||
"index": 73,
|
||||
"quantity": 63.9785,
|
||||
"rate": 4.2370
|
||||
},
|
||||
{
|
||||
"color": "purple",
|
||||
"shape": "square",
|
||||
"flag": "false",
|
||||
"k": 10,
|
||||
"index": 91,
|
||||
"quantity": 72.3735,
|
||||
"rate": 8.2430
|
||||
}
|
||||
]
|
||||
1
test/cases/verb-split/0007/cmd
Normal file
1
test/cases/verb-split/0007/cmd
Normal file
|
|
@ -0,0 +1 @@
|
|||
mlr --csv split -m 2 -v --prefix ${CASEDIR}/split test/input/example.csv
|
||||
0
test/cases/verb-split/0007/experr
Normal file
0
test/cases/verb-split/0007/experr
Normal file
11
test/cases/verb-split/0007/expout
Normal file
11
test/cases/verb-split/0007/expout
Normal file
|
|
@ -0,0 +1,11 @@
|
|||
color,shape,flag,k,index,quantity,rate
|
||||
yellow,triangle,true,1,11,43.6498,9.8870
|
||||
red,square,true,2,15,79.2778,0.0130
|
||||
red,circle,true,3,16,13.8103,2.9010
|
||||
red,square,false,4,48,77.5542,7.4670
|
||||
purple,triangle,false,5,51,81.2290,8.5910
|
||||
red,square,false,6,64,77.1991,9.5310
|
||||
purple,triangle,false,7,65,80.1405,5.8240
|
||||
yellow,circle,true,8,73,63.9785,4.2370
|
||||
yellow,circle,true,9,87,63.5058,8.3350
|
||||
purple,square,false,10,91,72.3735,8.2430
|
||||
3
test/cases/verb-split/0007/postcmp
Normal file
3
test/cases/verb-split/0007/postcmp
Normal file
|
|
@ -0,0 +1,3 @@
|
|||
${CASEDIR}/split_1.csv.expect ${CASEDIR}/split_1.csv
|
||||
${CASEDIR}/split_2.csv.expect ${CASEDIR}/split_2.csv
|
||||
|
||||
6
test/cases/verb-split/0007/split_1.csv.expect
Normal file
6
test/cases/verb-split/0007/split_1.csv.expect
Normal file
|
|
@ -0,0 +1,6 @@
|
|||
color,shape,flag,k,index,quantity,rate
|
||||
yellow,triangle,true,1,11,43.6498,9.8870
|
||||
red,circle,true,3,16,13.8103,2.9010
|
||||
purple,triangle,false,5,51,81.2290,8.5910
|
||||
purple,triangle,false,7,65,80.1405,5.8240
|
||||
yellow,circle,true,9,87,63.5058,8.3350
|
||||
6
test/cases/verb-split/0007/split_2.csv.expect
Normal file
6
test/cases/verb-split/0007/split_2.csv.expect
Normal file
|
|
@ -0,0 +1,6 @@
|
|||
color,shape,flag,k,index,quantity,rate
|
||||
red,square,true,2,15,79.2778,0.0130
|
||||
red,square,false,4,48,77.5542,7.4670
|
||||
red,square,false,6,64,77.1991,9.5310
|
||||
yellow,circle,true,8,73,63.9785,4.2370
|
||||
purple,square,false,10,91,72.3735,8.2430
|
||||
11
test/input/example.csv
Normal file
11
test/input/example.csv
Normal file
|
|
@ -0,0 +1,11 @@
|
|||
color,shape,flag,k,index,quantity,rate
|
||||
yellow,triangle,true,1,11,43.6498,9.8870
|
||||
red,square,true,2,15,79.2778,0.0130
|
||||
red,circle,true,3,16,13.8103,2.9010
|
||||
red,square,false,4,48,77.5542,7.4670
|
||||
purple,triangle,false,5,51,81.2290,8.5910
|
||||
red,square,false,6,64,77.1991,9.5310
|
||||
purple,triangle,false,7,65,80.1405,5.8240
|
||||
yellow,circle,true,8,73,63.9785,4.2370
|
||||
yellow,circle,true,9,87,63.5058,8.3350
|
||||
purple,square,false,10,91,72.3735,8.2430
|
||||
11  todo.txt
@@ -1,4 +1,4 @@
================================================================
===============================================================
RELEASES

* follow ...

@@ -26,6 +26,10 @@ FEATURES
o format/unformat
o strmatch
o =~
* separate examples from FAQs
* mlr split -- needs an example page along with the tee DSL function
* new example entry, with ccump and pgr
o slwin --prune (or somesuch) to only emit averages over full windows -- ?

----------------------------------------------------------------
k better print-interpolate with {} etc

@@ -33,15 +37,10 @@ k better print-interpolate with {} etc
----------------------------------------------------------------
! sysdate, sysdate_local; datediff ...

----------------------------------------------------------------
mlr split ... -n, -g -- ?
- how to specify filenames?

----------------------------------------------------------------
! strmatch https://github.com/johnkerl/miller/issues/77#issuecomment-538790927

----------------------------------------------------------------
* new example entry, with ccump and pgr
* make a lag-by-n and lead-by-n

----------------------------------------------------------------