update mlr -O behavior for #756 (#788)

John Kerl 2021-12-21 22:40:34 -05:00 committed by GitHub
parent fafff68c20
commit 93862f16f9
12 changed files with 75 additions and 45 deletions

@ -189,6 +189,9 @@ dev:
make -C docs
@echo DONE
docs:
make -C docs
# ----------------------------------------------------------------
# Keystroke-savers
it: build check
@ -216,4 +219,4 @@ release_tarball: build check
# ================================================================
# Go does its own dependency management, outside of make.
.PHONY: build mlr mprof mprof2 mprof3 mprof4 mprof5 check unit_test regression_test fmt dev
.PHONY: build mlr mprof mprof2 mprof3 mprof4 mprof5 check unit_test regression_test fmt dev docs

@ -488,10 +488,11 @@ MISCELLANEOUS FLAGS
slight performance benefit.
--infer-int-as-float or -A
Cast all integers in data files to floats.
--infer-no-octal or -O Treat numbers like 0123 in data files as string
"0123", not octal for decimal 83 etc.
--infer-none or -S Don't treat values like 123 or 456.7 in data files as
int/float; leave them as strings.
--infer-octal or -O Treat numbers like 0123 in data files as numeric;
default is string. Note that 00--07 etc scan as int;
08-09 scan as float.
--load {filename} Load DSL script file for all put/filter operations on
the command line. If the name following `--load` is a
directory, load all `*.mlr` files in that directory.
@ -3006,5 +3007,5 @@ SEE ALSO
2021-12-15 MILLER(1)
2021-12-22 MILLER(1)
</pre>

@ -467,10 +467,11 @@ MISCELLANEOUS FLAGS
slight performance benefit.
--infer-int-as-float or -A
Cast all integers in data files to floats.
--infer-no-octal or -O Treat numbers like 0123 in data files as string
"0123", not octal for decimal 83 etc.
--infer-none or -S Don't treat values like 123 or 456.7 in data files as
int/float; leave them as strings.
--infer-octal or -O Treat numbers like 0123 in data files as numeric;
default is string. Note that 00--07 etc scan as int;
08-09 scan as float.
--load {filename} Load DSL script file for all put/filter operations on
the command line. If the name following `--load` is a
directory, load all `*.mlr` files in that directory.
@ -2985,4 +2986,4 @@ SEE ALSO
2021-12-15 MILLER(1)
2021-12-22 MILLER(1)

@ -251,7 +251,7 @@ The following differences are rather technical. If they don't sound familiar to
* See also `mlr help legacy-flags` or the [legacy-flags reference](reference-main-flag-list.md#legacy-flags).
* Type-inference:
* The `-S` and `-F` flags to `mlr put` and `mlr filter` are ignored, since type-inference is no longer done in `mlr put` and `mlr filter`, but rather, when records are first read. You can use `mlr -S` and `mlr -A`, respectively, instead to control type-inference within the record-readers.
* Similarly, use `mlr -O` to force octal-looking strings to remain strings like `"0123"`, not ints like `0123` which is 83 in decimal.
* Octal numbers like `0123` and `07` are type-inferred as string. Use `mlr -O` to infer them as octal integers. Note that `08` and `09` will then infer as float.
* See also the [miscellaneous-flags reference](reference-main-flag-list.md#miscellaneous-flags).
* Emitting a map-valued expression now requires either a temporary variable or the new `emit1` keyword. Please see the
[page on emit statements](reference-dsl-output-statements.md#emit1-and-emitemitpemitf) for more information.

@ -209,7 +209,7 @@ The following differences are rather technical. If they don't sound familiar to
* See also `mlr help legacy-flags` or the [legacy-flags reference](reference-main-flag-list.md#legacy-flags).
* Type-inference:
* The `-S` and `-F` flags to `mlr put` and `mlr filter` are ignored, since type-inference is no longer done in `mlr put` and `mlr filter`, but rather, when records are first read. You can use `mlr -S` and `mlr -A`, respectively, instead to control type-inference within the record-readers.
* Similarly, use `mlr -O` to force octal-looking strings to remain strings like `"0123"`, not ints like `0123` which is 83 in decimal.
* Octal numbers like `0123` and `07` are type-inferred as string. Use `mlr -O` to infer them as octal integers. Note that `08` and `09` will then infer as float.
* See also the [miscellaneous-flags reference](reference-main-flag-list.md#miscellaneous-flags).
* Emitting a map-valued expression now requires either a temporary variable or the new `emit1` keyword. Please see the
[page on emit statements](reference-dsl-output-statements.md#emit1-and-emitemitpemitf) for more information.

@ -345,10 +345,10 @@ These are flags which don't fit into any other category.
`: This is an internal parameter which normally does not need to be modified. It controls the mechanism by which Miller accesses fields within records. In general --no-hash-records is faster, and is the default. For specific use-cases involving data having many fields, and many of them being processed during a given processing run, --hash-records might offer a slight performance benefit.
* `--infer-int-as-float or -A
`: Cast all integers in data files to floats.
* `--infer-no-octal or -O
`: Treat numbers like 0123 in data files as string "0123", not octal for decimal 83 etc.
* `--infer-none or -S
`: Don't treat values like 123 or 456.7 in data files as int/float; leave them as strings.
* `--infer-octal or -O
`: Treat numbers like 0123 in data files as numeric; default is string. Note that 00--07 etc scan as int; 08-09 scan as float.
* `--load {filename}
`: Load DSL script file for all put/filter operations on the command line. If the name following `--load` is a directory, load all `*.mlr` files in that directory. This is just like `put -f` and `filter -f` except it's up-front on the command line, so you can do something like `alias mlr='mlr --load ~/myscripts'` if you like.
* `--mfrom {filenames}

@ -2619,11 +2619,12 @@ data having many fields, and many of them being processed during a given process
},
{
name: "--infer-no-octal",
name: "--infer-octal",
altNames: []string{"-O"},
help: `Treat numbers like 0123 in data files as string "0123", not octal for decimal 83 etc.`,
help: `Treat numbers like 0123 in data files as numeric; default is string.
Note that 00--07 etc scan as int; 08-09 scan as float.`,
parser: func(args []string, argc int, pargi *int, options *TOptions) {
mlrval.SetInferrerNoOctal()
mlrval.SetInferrerOctalAsInt()
*pargi += 1
},
},

@ -7,7 +7,7 @@ import (
"github.com/johnkerl/miller/internal/pkg/lib"
)
// TODO: no infer-bool from data files. Always false in this path.
// TODO: comment no infer-bool from data files. Always false in this path.
// It's essential that we use mv.Type() not mv.mvtype since types are
// JIT-computed on first access for most data-file values. See type.go for more
@ -23,14 +23,24 @@ func (mv *Mlrval) Type() MVType {
// Support for mlr -S, mlr -A, mlr -O.
type tInferrer func(mv *Mlrval, input string, inferBool bool) *Mlrval
var packageLevelInferrer tInferrer = inferNormally
var packageLevelInferrer tInferrer = inferWithOctalAsString
func SetInferrerNoOctal() {
packageLevelInferrer = inferWithOctalSuppress
// SetInferrerOctalAsString is for default behavior.
func SetInferrerOctalAsString() {
packageLevelInferrer = inferWithOctalAsString
}
// SetInferrerOctalAsInt is for mlr -O.
func SetInferrerOctalAsInt() {
packageLevelInferrer = inferWithOctalAsInt
}
// SetInferrerIntAsFloat is for mlr -A.
func SetInferrerIntAsFloat() {
packageLevelInferrer = inferWithIntAsFloat
}
// SetInferrerStringOnly is for mlr -S.
func SetInferrerStringOnly() {
packageLevelInferrer = inferStringOnly
}
@ -47,7 +57,24 @@ var downcasedFloatNamesToNotInfer = map[string]bool{
"nan": true,
}
func inferNormally(mv *Mlrval, input string, inferBool bool) *Mlrval {
var octalDetector = regexp.MustCompile("^-?0[0-9]+")
// inferWithOctalAsString is for default behavior.
func inferWithOctalAsString(mv *Mlrval, input string, inferBool bool) *Mlrval {
inferWithOctalAsInt(mv, input, inferBool)
if mv.mvtype != MT_INT && mv.mvtype != MT_FLOAT {
return mv
}
if octalDetector.MatchString(mv.printrep) {
return mv.SetFromString(input)
} else {
return mv
}
}
// inferWithOctalAsInt is for mlr -O.
func inferWithOctalAsInt(mv *Mlrval, input string, inferBool bool) *Mlrval {
if input == "" {
return mv.SetFromVoid()
}
@ -73,23 +100,9 @@ func inferNormally(mv *Mlrval, input string, inferBool bool) *Mlrval {
return mv.SetFromString(input)
}
var octalDetector = regexp.MustCompile("^-?0[0-9]+")
func inferWithOctalSuppress(mv *Mlrval, input string, inferBool bool) *Mlrval {
inferNormally(mv, input, inferBool)
if mv.mvtype != MT_INT && mv.mvtype != MT_FLOAT {
return mv
}
if octalDetector.MatchString(mv.printrep) {
return mv.SetFromString(input)
} else {
return mv
}
}
// inferWithIntAsFloat is for mlr -A.
func inferWithIntAsFloat(mv *Mlrval, input string, inferBool bool) *Mlrval {
inferNormally(mv, input, inferBool)
inferWithOctalAsString(mv, input, inferBool)
if mv.Type() == MT_INT {
mv.floatval = float64(mv.intval)
mv.mvtype = MT_FLOAT
@ -97,6 +110,7 @@ func inferWithIntAsFloat(mv *Mlrval, input string, inferBool bool) *Mlrval {
return mv
}
// inferStringOnly is for mlr -S.
func inferStringOnly(mv *Mlrval, input string, inferBool bool) *Mlrval {
return mv.SetFromString(input)
}
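The change above makes leading-zero values stay strings by default and only infer as numbers under `mlr -O`. A minimal standalone sketch of that decision follows. The name `inferKind` is hypothetical and not part of Miller, and `strconv` with base 0 stands in for Miller's own number scanner, so edge cases (for example underscores in digits, or names like `nan` and `inf`) may differ from the real code.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// Same pattern as in the diff: optional minus sign, a leading zero,
// then at least one more decimal digit.
var octalDetector = regexp.MustCompile("^-?0[0-9]+")

// inferKind (hypothetical helper) reports how an input would be typed.
// octalAsInt=false models the new default; octalAsInt=true models mlr -O.
func inferKind(input string, octalAsInt bool) string {
	if input == "" {
		return "empty"
	}
	// Default behavior: leading-zero values remain strings.
	if !octalAsInt && octalDetector.MatchString(input) {
		return "string"
	}
	// Base 0 lets a leading "0" select octal, so "0123" parses as 83
	// while "08" fails as an int.
	if _, err := strconv.ParseInt(input, 0, 64); err == nil {
		return "int"
	}
	// "08" is not valid octal but is a valid decimal float.
	if _, err := strconv.ParseFloat(input, 64); err == nil {
		return "float"
	}
	return "string"
}

func main() {
	fmt.Println(inferKind("0123", false)) // string
	fmt.Println(inferKind("0123", true))  // int
	fmt.Println(inferKind("08", true))    // float
	fmt.Println(inferKind("123", false))  // int
}
```

Unlike the code in the diff, which infers first and then reverts numeric results whose print representation matches the pattern, this sketch tests the pattern up front. For the inputs shown the outcome is the same.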

@ -19,10 +19,17 @@ func (mv *Mlrval) String() string {
if floatOutputFormatter != nil && mv.Type() == MT_FLOAT {
// Use the format string from global --ofmt, if supplied
return floatOutputFormatter.FormatFloat(mv.floatval)
} else {
mv.setPrintRep()
return mv.printrep
}
// TODO: track dirty-flag checking / somesuch.
// At present it's cumbersome to check if an array or map has been modified
// and it's safest to always recompute the string-rep.
if mv.IsArrayOrMap() {
mv.printrepValid = false
}
mv.setPrintRep()
return mv.printrep
}
// See mlrval.go for more about JIT-formatting of string backings
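The `String()` change above always invalidates the cached print representation for arrays and maps, because there is not yet a dirty flag to say whether they were modified. A minimal sketch of that pattern follows. The type `cachedList` and its fields are hypothetical stand-ins, not Miller's actual `Mlrval` layout.

```go
package main

import (
	"fmt"
	"strings"
)

// cachedList is a hypothetical stand-in for an array-valued value
// that caches its string representation.
type cachedList struct {
	items         []string
	printrep      string
	printrepValid bool
}

// String drops the cache on every call: items can be modified without
// the cache being notified, so recomputing is the safe choice.
func (c *cachedList) String() string {
	c.printrepValid = false
	c.setPrintRep()
	return c.printrep
}

// setPrintRep recomputes the string representation only when the
// cached copy is marked invalid.
func (c *cachedList) setPrintRep() {
	if !c.printrepValid {
		c.printrep = "[" + strings.Join(c.items, ", ") + "]"
		c.printrepValid = true
	}
}

func main() {
	c := &cachedList{items: []string{"a", "b"}}
	fmt.Println(c.String()) // [a, b]
	c.items[1] = "z"        // modified behind the cache's back
	fmt.Println(c.String()) // [a, z]
}
```

The cost is a recomputation on every call for collections. A dirty flag set by each mutating operation would avoid that, which is what the TODO in the diff points toward.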

@ -467,10 +467,11 @@ MISCELLANEOUS FLAGS
slight performance benefit.
--infer-int-as-float or -A
Cast all integers in data files to floats.
--infer-no-octal or -O Treat numbers like 0123 in data files as string
"0123", not octal for decimal 83 etc.
--infer-none or -S Don't treat values like 123 or 456.7 in data files as
int/float; leave them as strings.
--infer-octal or -O Treat numbers like 0123 in data files as numeric;
default is string. Note that 00--07 etc scan as int;
08-09 scan as float.
--load {filename} Load DSL script file for all put/filter operations on
the command line. If the name following `--load` is a
directory, load all `*.mlr` files in that directory.
@ -2985,4 +2986,4 @@ SEE ALSO
2021-12-15 MILLER(1)
2021-12-22 MILLER(1)

@ -2,12 +2,12 @@
.\" Title: mlr
.\" Author: [see the "AUTHOR" section]
.\" Generator: ./mkman.rb
.\" Date: 2021-12-15
.\" Date: 2021-12-22
.\" Manual: \ \&
.\" Source: \ \&
.\" Language: English
.\"
.TH "MILLER" "1" "2021-12-15" "\ \&" "\ \&"
.TH "MILLER" "1" "2021-12-22" "\ \&" "\ \&"
.\" -----------------------------------------------------------------
.\" * Portability definitions
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@ -586,10 +586,11 @@ These are flags which don't fit into any other category.
slight performance benefit.
--infer-int-as-float or -A
Cast all integers in data files to floats.
--infer-no-octal or -O Treat numbers like 0123 in data files as string
"0123", not octal for decimal 83 etc.
--infer-none or -S Don't treat values like 123 or 456.7 in data files as
int/float; leave them as strings.
--infer-octal or -O Treat numbers like 0123 in data files as numeric;
default is string. Note that 00--07 etc scan as int;
08-09 scan as float.
--load {filename} Load DSL script file for all put/filter operations on
the command line. If the name following `--load` is a
directory, load all `*.mlr` files in that directory.

@ -5,6 +5,7 @@ PUNCHDOWN LIST
- sort-hof check
- more linux perf checks
- mlr -O / abor!
> doc 07 int 08 float
- --ifs-regex & --ips-regex -- guessing is not safe as evidenced by '.' and '|'
- big-picture item @ Rmd (csv memes; and beyond); also webdoc intro page
- function: randsel for arrays; use for example-csv-expander