miller/scripts/make-data-stream
John Kerl f233923351
Performance improvement: record-batching (#779)
* Rename inputChannel,outputChannel to readerChannel,writerChannel

* Rename inputChannel,outputChannel to readerChannel,writerChannel (#772)

* Start batched-reader API mods

* Singleton-list step for reader-batching at input

* CLI options for records-per-batch and hash-records

* Push channelized-reader logic into DKVP reader

* Push batching logic into chain-transformer, transformers, and channel-writer

* foo

* cmd/mprof and cmd/mprof2

* cmd/mprof3 and cmd/mprof4

* narrowed in on regexp-splitting on IFS/IPS as perf-hit

* neaten

* channelize nidx

* cmd/mprof5

* channelize CSV reader

* channelize NIDX reader

* Dedupe DKVP-reader and NIDX-reader source files

* channelize CSV-lite reader

* channelize XTAB reader

* batchify JSON reader

* channelize GEN pseudo-reader

* scripts for perf-testing on larger files

* merge with main for #776

* Fix record-batching for join and repl

* Fix comment-handling in channelized XTAB reader

* Fix bug found in positional-rename
2021-12-13 00:57:52 -05:00

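#!/bin/bash

# Generate a large synthetic CSV stream from Miller's GEN pseudo-reader, for
# perf-testing on larger files. The GEN reader supplies a single counter field,
# i, counting up to $stop; the put/filter/rename/cut chain derives the rest.
# The two commented-out settings select a shorter run with CPU profiling.
stop=1000000000
profile=""
#stop=1000000
#profile="cpuprofile cpu.pprof"

# $profile is left unquoted on purpose: when empty it expands to no argument
# at all, and when set it word-splits into two arguments.
# Note that the second put reads $k before "rename i,k" has run, so $k is
# absent there and, under Miller's absent-value rules, adds nothing to $y.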
mlr \
$profile \
--ocsv \
--igen --gen-stop $stop \
put '
begin {
@colors=["red","purple","yellow","green","blue","orange"];
@shapes=["triangle","square","circle","pentagon","hexagon"];
@index = 1;
}
$color = @colors[urandint(1,length(@colors))];
$shape = @shapes[urandint(1,length(@shapes))];
$flag = (urand() < 0.6) ? "true" : "false"; # urand() is a uniform float on the unit interval
$index = @index;
$quantity=fmtnum(urandrange(50,100),"%.4f");
$rate=fmtnum(urandrange(1,10),"%.4f");
@index += urandint(2,10);
' \
then filter '$quantity > 60.0' \
then put '$y = $k + $index**2 + log10($rate/$quantity)' \
then rename i,k \
then cut -xf index \
then filter '$rate > 2'
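The script passes its optional profiling arguments through an unquoted `$profile`, relying on the shell to expand an empty value to nothing and a two-word value to two arguments. A minimal POSIX sh sketch of that behavior follows; `count_args` is a hypothetical helper (not part of Miller) that only reports how many arguments it receives. The last part shows positional parameters as one portable alternative that keeps each argument intact without depending on word-splitting.

```sh
#!/bin/sh
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

profile=""
count_args $profile            # unquoted and empty: no arguments, prints 0
count_args "$profile"          # quoted and empty: one empty argument, prints 1

profile="cpuprofile cpu.pprof"
count_args $profile            # unquoted: word-splits into two, prints 2

# Positional parameters preserve each argument exactly and expand to nothing
# when unset, so no reliance on word-splitting is needed.
set --                         # no profiling arguments
count_args "$@"                # prints 0
set -- cpuprofile cpu.pprof    # two profiling arguments
count_args "$@"                # prints 2
```

The unquoted form is fine here because the value never contains glob characters or embedded spaces within a single argument; the positional-parameter form would also handle a filename containing spaces.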