From 1c1d239e2750986ef198e4b1cd7a1178a5c1542a Mon Sep 17 00:00:00 2001 From: John Kerl Date: Tue, 21 Dec 2021 22:36:38 -0500 Subject: [PATCH] update mlr -O behavior for #756 --- Makefile | 5 ++- docs/src/manpage.md | 7 ++-- docs/src/manpage.txt | 7 ++-- docs/src/new-in-miller-6.md | 2 +- docs/src/new-in-miller-6.md.in | 2 +- docs/src/reference-main-flag-list.md | 4 +- internal/pkg/cli/option_parse.go | 7 ++-- internal/pkg/mlrval/mlrval_infer.go | 56 +++++++++++++++++----------- internal/pkg/mlrval/mlrval_output.go | 13 +++++-- man/manpage.txt | 7 ++-- man/mlr.1 | 9 +++-- todo.txt | 1 + 12 files changed, 75 insertions(+), 45 deletions(-) diff --git a/Makefile b/Makefile index 187182701..0e8e8cf27 100644 --- a/Makefile +++ b/Makefile @@ -189,6 +189,9 @@ dev: make -C docs @echo DONE +docs: + make -C docs + # ---------------------------------------------------------------- # Keystroke-savers it: build check @@ -216,4 +219,4 @@ release_tarball: build check # ================================================================ # Go does its own dependency management, outside of make. -.PHONY: build mlr mprof mprof2 mprof3 mprof4 mprof5 check unit_test regression_test fmt dev +.PHONY: build mlr mprof mprof2 mprof3 mprof4 mprof5 check unit_test regression_test fmt dev docs diff --git a/docs/src/manpage.md b/docs/src/manpage.md index 12aadf7c3..dc04973c3 100644 --- a/docs/src/manpage.md +++ b/docs/src/manpage.md @@ -488,10 +488,11 @@ MISCELLANEOUS FLAGS slight performance benefit. --infer-int-as-float or -A Cast all integers in data files to floats. - --infer-no-octal or -O Treat numbers like 0123 in data files as string - "0123", not octal for decimal 83 etc. --infer-none or -S Don't treat values like 123 or 456.7 in data files as int/float; leave them as strings. + --infer-octal or -O Treat numbers like 0123 in data files as numeric; + default is string. Note that 00--07 etc scan as int; + 08-09 scan as float. 
--load {filename} Load DSL script file for all put/filter operations on the command line. If the name following `--load` is a directory, load all `*.mlr` files in that directory. @@ -3006,5 +3007,5 @@ SEE ALSO - 2021-12-15 MILLER(1) + 2021-12-22 MILLER(1) diff --git a/docs/src/manpage.txt b/docs/src/manpage.txt index 36a13164c..e2b0211a2 100644 --- a/docs/src/manpage.txt +++ b/docs/src/manpage.txt @@ -467,10 +467,11 @@ MISCELLANEOUS FLAGS slight performance benefit. --infer-int-as-float or -A Cast all integers in data files to floats. - --infer-no-octal or -O Treat numbers like 0123 in data files as string - "0123", not octal for decimal 83 etc. --infer-none or -S Don't treat values like 123 or 456.7 in data files as int/float; leave them as strings. + --infer-octal or -O Treat numbers like 0123 in data files as numeric; + default is string. Note that 00--07 etc scan as int; + 08-09 scan as float. --load {filename} Load DSL script file for all put/filter operations on the command line. If the name following `--load` is a directory, load all `*.mlr` files in that directory. @@ -2985,4 +2986,4 @@ SEE ALSO - 2021-12-15 MILLER(1) + 2021-12-22 MILLER(1) diff --git a/docs/src/new-in-miller-6.md b/docs/src/new-in-miller-6.md index 2c9b1acef..9c160dfca 100644 --- a/docs/src/new-in-miller-6.md +++ b/docs/src/new-in-miller-6.md @@ -251,7 +251,7 @@ The following differences are rather technical. If they don't sound familiar to * See also `mlr help legacy-flags` or the [legacy-flags reference](reference-main-flag-list.md#legacy-flags). * Type-inference: * The `-S` and `-F` flags to `mlr put` and `mlr filter` are ignored, since type-inference is no longer done in `mlr put` and `mlr filter`, but rather, when records are first read. You can use `mlr -S` and `mlr -A`, respectively, instead to control type-inference within the record-readers. - * Similarly, use `mlr -O` to force octal-looking strings to remain strings like `"0123"`, not ints like `0123` which is 83 in decimal. 
+ * Octal numbers like `0123` and `07` are type-inferred as string. Use `mlr -O` to infer them as octal integers. Note that `08` and `09` will then infer as float. * See also the [miscellaneous-flags reference](reference-main-flag-list.md#miscellaneous-flags). * Emitting a map-valued expression now requires either a temporary variable or the new `emit1` keyword. Please see the [page on emit statements](reference-dsl-output-statements.md#emit1-and-emitemitpemitf) for more information. diff --git a/docs/src/new-in-miller-6.md.in b/docs/src/new-in-miller-6.md.in index 25a2c20e3..171df64ea 100644 --- a/docs/src/new-in-miller-6.md.in +++ b/docs/src/new-in-miller-6.md.in @@ -209,7 +209,7 @@ The following differences are rather technical. If they don't sound familiar to * See also `mlr help legacy-flags` or the [legacy-flags reference](reference-main-flag-list.md#legacy-flags). * Type-inference: * The `-S` and `-F` flags to `mlr put` and `mlr filter` are ignored, since type-inference is no longer done in `mlr put` and `mlr filter`, but rather, when records are first read. You can use `mlr -S` and `mlr -A`, respectively, instead to control type-inference within the record-readers. - * Similarly, use `mlr -O` to force octal-looking strings to remain strings like `"0123"`, not ints like `0123` which is 83 in decimal. + * Octal numbers like `0123` and `07` are type-inferred as string. Use `mlr -O` to infer them as octal integers. Note that `08` and `09` will then infer as float. * See also the [miscellaneous-flags reference](reference-main-flag-list.md#miscellaneous-flags). * Emitting a map-valued expression now requires either a temporary variable or the new `emit1` keyword. Please see the [page on emit statements](reference-dsl-output-statements.md#emit1-and-emitemitpemitf) for more information. 
diff --git a/docs/src/reference-main-flag-list.md b/docs/src/reference-main-flag-list.md index 3c4ce7ea2..3aa2eeda8 100644 --- a/docs/src/reference-main-flag-list.md +++ b/docs/src/reference-main-flag-list.md @@ -345,10 +345,10 @@ These are flags which don't fit into any other category. `: This is an internal parameter which normally does not need to be modified. It controls the mechanism by which Miller accesses fields within records. In general --no-hash-records is faster, and is the default. For specific use-cases involving data having many fields, and many of them being processed during a given processing run, --hash-records might offer a slight performance benefit. * `--infer-int-as-float or -A `: Cast all integers in data files to floats. -* `--infer-no-octal or -O -`: Treat numbers like 0123 in data files as string "0123", not octal for decimal 83 etc. * `--infer-none or -S `: Don't treat values like 123 or 456.7 in data files as int/float; leave them as strings. +* `--infer-octal or -O +`: Treat numbers like 0123 in data files as numeric; default is string. Note that 00--07 etc scan as int; 08-09 scan as float. * `--load {filename} `: Load DSL script file for all put/filter operations on the command line. If the name following `--load` is a directory, load all `*.mlr` files in that directory. This is just like `put -f` and `filter -f` except it's up-front on the command line, so you can do something like `alias mlr='mlr --load ~/myscripts'` if you like. 
* `--mfrom {filenames} diff --git a/internal/pkg/cli/option_parse.go b/internal/pkg/cli/option_parse.go index 68a680dbc..c88a7bd4c 100644 --- a/internal/pkg/cli/option_parse.go +++ b/internal/pkg/cli/option_parse.go @@ -2619,11 +2619,12 @@ data having many fields, and many of them being processed during a given process }, { - name: "--infer-no-octal", + name: "--infer-octal", altNames: []string{"-O"}, - help: `Treat numbers like 0123 in data files as string "0123", not octal for decimal 83 etc.`, + help: `Treat numbers like 0123 in data files as numeric; default is string. +Note that 00--07 etc scan as int; 08-09 scan as float.`, parser: func(args []string, argc int, pargi *int, options *TOptions) { - mlrval.SetInferrerNoOctal() + mlrval.SetInferrerOctalAsInt() *pargi += 1 }, }, diff --git a/internal/pkg/mlrval/mlrval_infer.go b/internal/pkg/mlrval/mlrval_infer.go index e445d04ad..bdfb10a2b 100644 --- a/internal/pkg/mlrval/mlrval_infer.go +++ b/internal/pkg/mlrval/mlrval_infer.go @@ -7,7 +7,7 @@ import ( "github.com/johnkerl/miller/internal/pkg/lib" ) -// TODO: no infer-bool from data files. Always false in this path. +// TODO: comment no infer-bool from data files. Always false in this path. // It's essential that we use mv.Type() not mv.mvtype since types are // JIT-computed on first access for most data-file values. See type.go for more @@ -23,14 +23,24 @@ func (mv *Mlrval) Type() MVType { // Support for mlr -S, mlr -A, mlr -O. type tInferrer func(mv *Mlrval, input string, inferBool bool) *Mlrval -var packageLevelInferrer tInferrer = inferNormally +var packageLevelInferrer tInferrer = inferWithOctalAsString -func SetInferrerNoOctal() { - packageLevelInferrer = inferWithOctalSuppress +// SetInferrerOctalAsString is for default behavior. +func SetInferrerOctalAsString() { + packageLevelInferrer = inferWithOctalAsString +} + +// SetInferrerOctalAsInt is for mlr -O.
+func SetInferrerOctalAsInt() { + packageLevelInferrer = inferWithOctalAsInt +} + +// SetInferrerIntAsFloat is for mlr -A. func SetInferrerIntAsFloat() { packageLevelInferrer = inferWithIntAsFloat } + +// SetInferrerStringOnly is for mlr -S. func SetInferrerStringOnly() { packageLevelInferrer = inferStringOnly } @@ -47,7 +57,24 @@ var downcasedFloatNamesToNotInfer = map[string]bool{ "nan": true, } -func inferNormally(mv *Mlrval, input string, inferBool bool) *Mlrval { +var octalDetector = regexp.MustCompile("^-?0[0-9]+") + +// inferWithOctalAsString is for default behavior. +func inferWithOctalAsString(mv *Mlrval, input string, inferBool bool) *Mlrval { + inferWithOctalAsInt(mv, input, inferBool) + if mv.mvtype != MT_INT && mv.mvtype != MT_FLOAT { + return mv + } + + if octalDetector.MatchString(mv.printrep) { + return mv.SetFromString(input) + } else { + return mv + } +} + +// inferWithOctalAsInt is for mlr -O. +func inferWithOctalAsInt(mv *Mlrval, input string, inferBool bool) *Mlrval { if input == "" { return mv.SetFromVoid() } @@ -73,23 +100,9 @@ func inferNormally(mv *Mlrval, input string, inferBool bool) *Mlrval { return mv.SetFromString(input) } -var octalDetector = regexp.MustCompile("^-?0[0-9]+") - -func inferWithOctalSuppress(mv *Mlrval, input string, inferBool bool) *Mlrval { - inferNormally(mv, input, inferBool) - if mv.mvtype != MT_INT && mv.mvtype != MT_FLOAT { - return mv - } - - if octalDetector.MatchString(mv.printrep) { - return mv.SetFromString(input) - } else { - return mv - } -} - +// inferWithIntAsFloat is for mlr -A. func inferWithIntAsFloat(mv *Mlrval, input string, inferBool bool) *Mlrval { - inferNormally(mv, input, inferBool) + inferWithOctalAsString(mv, input, inferBool) if mv.Type() == MT_INT { mv.floatval = float64(mv.intval) mv.mvtype = MT_FLOAT @@ -97,6 +110,7 @@ func inferWithIntAsFloat(mv *Mlrval, input string, inferBool bool) *Mlrval { return mv } +// inferStringOnly is for mlr -S.
func inferStringOnly(mv *Mlrval, input string, inferBool bool) *Mlrval { return mv.SetFromString(input) } diff --git a/internal/pkg/mlrval/mlrval_output.go b/internal/pkg/mlrval/mlrval_output.go index 672f9b93e..437f77347 100644 --- a/internal/pkg/mlrval/mlrval_output.go +++ b/internal/pkg/mlrval/mlrval_output.go @@ -19,10 +19,17 @@ func (mv *Mlrval) String() string { if floatOutputFormatter != nil && mv.Type() == MT_FLOAT { // Use the format string from global --ofmt, if supplied return floatOutputFormatter.FormatFloat(mv.floatval) - } else { - mv.setPrintRep() - return mv.printrep } + + // TODO: track dirty-flag checking / somesuch. + // At present it's cumbersome to check if an array or map has been modified + // and it's safest to always recompute the string-rep. + if mv.IsArrayOrMap() { + mv.printrepValid = false + } + + mv.setPrintRep() + return mv.printrep } // See mlrval.go for more about JIT-formatting of string backings diff --git a/man/manpage.txt b/man/manpage.txt index 36a13164c..e2b0211a2 100644 --- a/man/manpage.txt +++ b/man/manpage.txt @@ -467,10 +467,11 @@ MISCELLANEOUS FLAGS slight performance benefit. --infer-int-as-float or -A Cast all integers in data files to floats. - --infer-no-octal or -O Treat numbers like 0123 in data files as string - "0123", not octal for decimal 83 etc. --infer-none or -S Don't treat values like 123 or 456.7 in data files as int/float; leave them as strings. + --infer-octal or -O Treat numbers like 0123 in data files as numeric; + default is string. Note that 00--07 etc scan as int; + 08-09 scan as float. --load {filename} Load DSL script file for all put/filter operations on the command line. If the name following `--load` is a directory, load all `*.mlr` files in that directory. 
@@ -2985,4 +2986,4 @@ SEE ALSO - 2021-12-15 MILLER(1) + 2021-12-22 MILLER(1) diff --git a/man/mlr.1 b/man/mlr.1 index d8880c326..172e326c8 100644 --- a/man/mlr.1 +++ b/man/mlr.1 @@ -2,12 +2,12 @@ .\" Title: mlr .\" Author: [see the "AUTHOR" section] .\" Generator: ./mkman.rb -.\" Date: 2021-12-15 +.\" Date: 2021-12-22 .\" Manual: \ \& .\" Source: \ \& .\" Language: English .\" -.TH "MILLER" "1" "2021-12-15" "\ \&" "\ \&" +.TH "MILLER" "1" "2021-12-22" "\ \&" "\ \&" .\" ----------------------------------------------------------------- .\" * Portability definitions .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -586,10 +586,11 @@ These are flags which don't fit into any other category. slight performance benefit. --infer-int-as-float or -A Cast all integers in data files to floats. ---infer-no-octal or -O Treat numbers like 0123 in data files as string - "0123", not octal for decimal 83 etc. --infer-none or -S Don't treat values like 123 or 456.7 in data files as int/float; leave them as strings. +--infer-octal or -O Treat numbers like 0123 in data files as numeric; + default is string. Note that 00--07 etc scan as int; + 08-09 scan as float. --load {filename} Load DSL script file for all put/filter operations on the command line. If the name following `--load` is a directory, load all `*.mlr` files in that directory. diff --git a/todo.txt b/todo.txt index 5e576bca7..334b8aee2 100644 --- a/todo.txt +++ b/todo.txt @@ -5,6 +5,7 @@ PUNCHDOWN LIST - sort-hof check - more linux perf checks - mlr -O / abor! + > doc 07 int 08 float - --ifs-regex & --ips-regex -- guessing is not safe as evidence by '.' and '|' - big-picture item @ Rmd (csv memes; and beyond); also webdoc intro page - function: randsel for arrays; use for example-csv-expander