miller/scripts/make-big-files
#!/bin/bash
set -x
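# Generate large test files, in several Miller-supported formats, for performance testing.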
mkdir -p ~/tmp/
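# Build a big CSV: repeat each record of the example input 100000 times, shuffle the
# result, and add synthetic k/index/quantity/rate fields with random values.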
mlr --csv \
  repeat -n 100000 \
  then shuffle \
  then put '
    begin{@index=1}
    $k = NR;
    @index += urandint(2,10);
    $index=@index;
    $quantity=fmtnum(urandrange(50,100),"%.4f");
    $rate=fmtnum(urandrange(1,10),"%.4f");
  ' \
  docs/src/example.csv > ~/tmp/big.csv
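
# Re-emit the big CSV in DKVP, JSON, NIDX, and XTAB formats.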
mlr --c2d cat ~/tmp/big.csv > ~/tmp/big.dkvp
mlr --c2j cat ~/tmp/big.csv > ~/tmp/big.json
mlr --c2n cat ~/tmp/big.csv > ~/tmp/big.nidx
mlr --c2x cat ~/tmp/big.csv > ~/tmp/big.xtab