Miller regression tests

A few files are unit-tested with Go's testing package -- a few dozen cases in total.

The vast majority of Miller's test cases, though -- thousands of them -- are exercised by running scripted invocations of mlr with various flags and inputs, comparing the output against expected output, and checking the exit code returned to the shell.

How to run the regression tests, in brief

Note: although this README.md file is in the go/reg-test/ subdirectory, all paths in this file are written as if you have cd'ed into the go/ directory, i.e. this directory's parent directory.

rr in the Miller go/ subdirectory is an alias-like script for reg-test/run.
Run reg-test/run --help for help.
Without -v, this runs all cases with a pass/fail indication per case, and an overall pass/fail indication -- overall pass only if all cases pass. With -v, individual invocation lines are shown, as well as diff output comparing actual to expected results.

Items for the duration of the Go port

rr -o runs only cases where the output was generated by the C version of mlr: case-files whose names start with case-c-.
rr -n runs cases which were either written for the Go version or verified to work with it: case-files whose names do not start with case-c-. (As the port progresses, the case-c- and case-go- prefixes will be removed and replaced with simply case-.)
rr -c runs the C version of Miller from the local checkout; rr -g (which is the default) runs the Go version of Miller from the local checkout.
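
The -o and -n selection rules come down to a test on the filename prefix. The following is a minimal sketch of that idea only; select_cases is a hypothetical helper invented for illustration and is not taken from reg-test/run, which may implement the selection differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not taken from reg-test/run: filter case-file paths
# by the naming convention described above.
#   old -> names starting with case-c-      (like rr -o)
#   new -> names not starting with case-c-  (like rr -n)
select_cases() {
  local mode="$1"
  shift
  local path base
  for path in "$@"; do
    base=$(basename "$path")
    case "$mode:$base" in
      old:case-c-*) echo "$path" ;;
      new:case-c-*) ;;                 # C-generated case: skipped in "new" mode
      new:case-*)   echo "$path" ;;
    esac
  done
}

all_cases="reg-test/cases/case-altkv.sh reg-test/cases/case-c-bar.sh reg-test/cases/case-go-sort.sh"

# Word splitting of $all_cases is intentional here.
old_cases=$(select_cases old $all_cases)
new_cases=$(select_cases new $all_cases)

echo "old:"
echo "$old_cases"
echo "new:"
echo "$new_cases"
```

With the three sample paths, only case-c-bar.sh is selected in old mode, and the other two are selected in new mode.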

Examples

All new/ported cases:

$ reg-test/run -n
Using mlr executable ./reg-test/../../go/mlr
PASS  ./reg-test/cases/case-altkv.sh
PASS  ./reg-test/cases/case-bootstrap.sh
PASS  ./reg-test/cases/case-cat.sh
PASS  ./reg-test/cases/case-env.sh
PASS  ./reg-test/cases/case-go-chain.sh
...
PASS  ./reg-test/cases/case-sample.sh
PASS  ./reg-test/cases/case-shuffle.sh
PASS  ./reg-test/cases/case-subr.sh

NUMBER OF MILLER INVOCATIONS 807
NUMBER OF CASES PASSED 84
NUMBER OF CASES FAILED 0

PASS

All cases (as of November 2020 -- Go port in progress, not all C cases succeeding yet as many things are not ported):

$ reg-test/run
Using mlr executable ./reg-test/../../go/mlr
PASS  ./reg-test/cases/case-altkv.sh
PASS  ./reg-test/cases/case-bootstrap.sh
FAIL  ./reg-test/cases/case-c-auxents.sh
PASS  ./reg-test/cases/case-c-awkish-conds.sh
FAIL  ./reg-test/cases/case-c-bar.sh
...
PASS  ./reg-test/cases/case-go-skip-trivial-records.sh
PASS  ./reg-test/cases/case-go-sort-within-records.sh
PASS  ./reg-test/cases/case-go-sort.sh
PASS  ./reg-test/cases/case-go-tail.sh
PASS  ./reg-test/cases/case-go-unsparsify.sh
PASS  ./reg-test/cases/case-min-max-types.sh
PASS  ./reg-test/cases/case-no-filter-in-filter.sh
PASS  ./reg-test/cases/case-remove-empty-columns.sh
PASS  ./reg-test/cases/case-sample.sh
PASS  ./reg-test/cases/case-shuffle.sh
PASS  ./reg-test/cases/case-subr.sh

NUMBER OF MILLER INVOCATIONS 3767
NUMBER OF CASES PASSED 127
NUMBER OF CASES FAILED 91

FAIL

Single case, with verbosity:

$ reg-test/run -v reg-test/cases/case-cat.sh
Using mlr executable reg-test/../../go/mlr

----------------------------------------------------------------
BEGIN reg-test/cases/case-cat.sh

mlr cat reg-test/input/abixy
mlr cat /dev/null
mlr cat -n reg-test/input/abixy-het
mlr cat -N counter reg-test/input/abixy-het
mlr cat -g a,b reg-test/input/abixy-het
mlr cat -g a,b reg-test/input/abixy-het
mlr cat -g a,b -n reg-test/input/abixy-het
mlr cat -g a,b -N counter reg-test/input/abixy-het
mlr cat
mlr cat
mlr --opprint cat reg-test/input/s.dkvp
mlr --opprint cat -n reg-test/input/s.dkvp
mlr --opprint cat -n -g a reg-test/input/s.dkvp
mlr --opprint cat -n -g a,b reg-test/input/s.dkvp

num_invocations_attempted  14
num_invocations_passed     14
num_invocations_failed     0

diff -a -I ^mlr -I ^Miller: -I ^cat reg-test/expected/case-cat.sh.out output-reg-test/case-cat.sh.out

PASS  reg-test/cases/case-cat.sh
----------------------------------------------------------------

NUMBER OF MILLER INVOCATIONS 14
NUMBER OF CASES PASSED 1
NUMBER OF CASES FAILED 0

PASS

More details

reg-test/cases/case*.sh files consist of "invocations" of Miller.
Each case ./reg-test/cases/case-foo.sh has expected output ./reg-test/expected/case-foo.sh.out and ./reg-test/expected/case-foo.sh.err, along with actual output output-reg-test/case-foo.sh.out and output-reg-test/case-foo.sh.err.
The reg-test/run script loops over all of the case-*.sh files and executes each one by sourcing it with the Bash . operator.
Each has lines of the form run_mlr ... or mlr_expect_fail .... Those functions are defined in reg-test/run. They take mlr command lines as arguments and invoke mlr with the specified arguments, along with logic to check and count the shell exit codes, save the output for comparison against expected output, etc.
Each case-file can fail in the following ways:
- Zero invocations were attempted.
- A given run_mlr ... invocation exits with non-zero when it should exit with zero.
- A given mlr_expect_fail ... invocation exits with zero when it should exit with non-zero.
- The output of the invocations in the case's actual-output file differs from the case's expected-output file.
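
The exit-code bookkeeping described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in reg-test/run: the mlr function is a stand-in so the sketch runs without Miller installed, case_verdict is a hypothetical name, and the comparison against expected output is left out.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; the real functions are defined in reg-test/run and may differ.

# Stand-in for the mlr executable so the sketch runs without Miller installed:
# it exits zero for the "cat" verb and non-zero for anything else.
mlr() {
  [ "$1" = "cat" ]
}

num_invocations_attempted=0
num_invocations_passed=0
num_invocations_failed=0

# Invoke mlr, expecting a zero exit code.
run_mlr() {
  num_invocations_attempted=$((num_invocations_attempted + 1))
  if mlr "$@"; then
    num_invocations_passed=$((num_invocations_passed + 1))
  else
    num_invocations_failed=$((num_invocations_failed + 1))
  fi
}

# Invoke mlr, expecting a non-zero exit code.
mlr_expect_fail() {
  num_invocations_attempted=$((num_invocations_attempted + 1))
  if mlr "$@"; then
    num_invocations_failed=$((num_invocations_failed + 1))
  else
    num_invocations_passed=$((num_invocations_passed + 1))
  fi
}

# Case-level verdict from the exit-code checks only (hypothetical name).
case_verdict() {
  if [ "$num_invocations_attempted" -eq 0 ]; then
    echo FAIL                          # zero invocations were attempted
  elif [ "$num_invocations_failed" -gt 0 ]; then
    echo FAIL                          # at least one unexpected exit code
  else
    echo PASS
  fi
}

verdict_before=$(case_verdict)         # nothing has been attempted yet

run_mlr cat /dev/null                  # exits zero, as expected
mlr_expect_fail nosuchverb             # exits non-zero, as expected
verdict_after=$(case_verdict)

echo "num_invocations_attempted  $num_invocations_attempted"
echo "num_invocations_passed     $num_invocations_passed"
echo "num_invocations_failed     $num_invocations_failed"
echo "verdict before: $verdict_before, after: $verdict_after"
```

Note that an empty case counts as a failure rather than a vacuous pass, which matches the first failure mode in the list above.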

Example single case-file:

$ cat reg-test/cases/case-cat.sh
run_mlr cat $indir/abixy
run_mlr cat /dev/null

run_mlr cat -n $indir/abixy-het
run_mlr cat -N counter $indir/abixy-het

run_mlr cat -g a,b $indir/abixy-het
run_mlr cat -g a,b $indir/abixy-het

run_mlr cat -g a,b -n $indir/abixy-het
run_mlr cat -g a,b -N counter $indir/abixy-het

run_mlr cat <<EOF
a,b,c,d,e,f
EOF
run_mlr cat <<EOF
a,b,c,d,e,f,g
EOF

run_mlr --opprint cat           $indir/s.dkvp
run_mlr --opprint cat -n        $indir/s.dkvp
run_mlr --opprint cat -n -g a   $indir/s.dkvp
run_mlr --opprint cat -n -g a,b $indir/s.dkvp
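
Because case files like the one above are sourced rather than executed, they can use run_mlr and $indir without defining them. The sketch below illustrates that mechanism in a temporary directory. It is an assumption-laden illustration, not the actual loop in reg-test/run: the two tiny case files are made up, and run_mlr is replaced by a stand-in that only records the invocation.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of sourcing case files; not the actual reg-test/run code.

tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/cases"
indir="$tmpdir/input"    # case files refer to their inputs via $indir

# Two tiny illustrative case files. The quoted 'EOF' keeps $indir unexpanded
# until the file is sourced.
cat > "$tmpdir/cases/case-one.sh" <<'EOF'
run_mlr cat $indir/abixy
run_mlr cat /dev/null
EOF
cat > "$tmpdir/cases/case-two.sh" <<'EOF'
run_mlr cat -n $indir/abixy-het
EOF

# Stand-in for run_mlr: records the invocation instead of calling mlr.
num_invocations=0
run_mlr() {
  num_invocations=$((num_invocations + 1))
  echo "mlr $*"
}

num_cases=0
for case_file in "$tmpdir"/cases/case-*.sh; do
  num_cases=$((num_cases + 1))
  echo "BEGIN $(basename "$case_file")"
  # Sourcing runs the file in the current shell, so run_mlr and $indir
  # defined above are visible inside it.
  . "$case_file"
done

echo "cases=$num_cases invocations=$num_invocations"
rm -rf "$tmpdir"
```

Sourcing also means that counters updated inside run_mlr persist across case files, which is what makes overall totals possible.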

Debugging failures of existing cases

If a case fails, you can run it by itself with -v if you like: e.g. ./reg-test/run -v reg-test/cases/case-cat.sh.
You can also pass -C 1 or -C 5, etc. (note the space) to control the number of context lines in the diff output.
You can also run rr -s ... to view the output without diffing against expected results.
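
To see what the context-line count changes, it can help to run diff by hand on two small files. The sketch below assumes that the -C value ends up as diff's context-line option and reuses the -a and -I ^mlr options visible in the verbose example earlier. The file names and contents are made up for illustration, and the -I option assumes a diff implementation that supports it (GNU and BSD diff do).

```bash
#!/usr/bin/env bash
# Illustrative only: file names and contents are invented for this sketch.
tmpdir=$(mktemp -d)

# Expected and actual output differ in the echoed command line (line 1)
# and in the last record (line 6).
printf 'mlr cat x\na=1\na=2\na=3\na=4\na=5\n' > "$tmpdir/expected.out"
printf 'mlr cat y\na=1\na=2\na=3\na=4\na=6\n' > "$tmpdir/actual.out"

# -I '^mlr' ignores changes made up entirely of lines starting with "mlr";
# -C 1 shows one line of context around each remaining change.
if diff_output=$(diff -a -I '^mlr' -C 1 "$tmpdir/expected.out" "$tmpdir/actual.out"); then
  diff_verdict=PASS
else
  diff_verdict=FAIL
fi

echo "$diff_output"
echo "$diff_verdict"
rm -rf "$tmpdir"
```

Here the changed command line is ignored but the changed record is not, so the comparison fails. A larger -C value shows more of the surrounding records.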

Example with -s:

$ rr -s $cases/case-cat.sh
Using mlr executable ./reg-test/../../go/mlr


mlr cat ./reg-test/input/abixy
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
a=zee,b=pan,i=6,x=0.5271261600918548,y=0.49322128674835697
a=eks,b=zee,i=7,x=0.6117840605678454,y=0.1878849191181694
a=zee,b=wye,i=8,x=0.5985540091064224,y=0.976181385699006
a=hat,b=wye,i=9,x=0.03144187646093577,y=0.7495507603507059
a=pan,b=wye,i=10,x=0.5026260055412137,y=0.9526183602969864

mlr cat /dev/null

...

mlr --opprint cat -n -g a,b ./reg-test/input/s.dkvp
n a   b   i x                   y
1 pan pan 1 0.3467901443380824  0.7268028627434533
1 eks pan 2 0.7586799647899636  0.5221511083334797
1 wye wye 3 0.20460330576630303 0.33831852551664776
1 eks wye 4 0.38139939387114097 0.13418874328430463

Creating new cases

Edit reg-test/cases/case-new-name-goes-here.sh. Note that the reg-test/cases directory path, the filename starting with case-, and the filename ending with .sh are all required.
Run reg-test/run reg-test/cases/case-new-name-goes-here.sh.
That will create output-reg-test/case-new-name-goes-here.sh.out.
If this all looks OK, run accept-case case-new-name-goes-here.sh, which will copy the actual output to reg-test/expected/case-new-name-goes-here.sh.out.
Add the case-*.sh file and the expected-output file to source control.
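
The accept step amounts to promoting the actual output to expected output. The sketch below mimics that in a temporary directory. It is a guess at the idea based on the description above and does not reproduce the accept-case script itself; the sample output content is invented.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the accept step; the real accept-case script may differ.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/output-reg-test" "$tmpdir/reg-test/expected"

case_name=case-new-name-goes-here.sh

# Pretend that running the new case produced this actual output.
printf 'mlr cat\na=1,b=2\n' > "$tmpdir/output-reg-test/$case_name.out"

# Accept: the actual output becomes the expected output.
cp "$tmpdir/output-reg-test/$case_name.out" "$tmpdir/reg-test/expected/$case_name.out"

# A later run passes as long as actual and expected output still match.
if cmp -s "$tmpdir/output-reg-test/$case_name.out" "$tmpdir/reg-test/expected/$case_name.out"; then
  accept_verdict=PASS
else
  accept_verdict=FAIL
fi
echo "$accept_verdict"
rm -rf "$tmpdir"
```

Since accepting simply records whatever the run produced, review the actual output by eye before accepting it.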