Mirror of https://github.com/johnkerl/miller.git, synced 2026-01-23 02:14:13 +00:00
Export library code in pkg/ (#1391)
* Export library code in `pkg/`
* New doc page
Parent: 93b7c8eac0
Commit: 268a96d002
358 changed files with 1076 additions and 693 deletions
39
pkg/types/README.md
Normal file
@ -0,0 +1,39 @@
This directory contains the implementation of the [`types.Mlrval`](./mlrval.go) datatype, which is used for record values as well as for expression/variable values in the Miller `put`/`filter` DSL.

# Mlrval

The [`types.Mlrval`](./mlrval.go) structure includes **string, int, float, boolean, array-of-mlrval, map-string-to-mlrval, void, absent, and error** types, as well as type-conversion logic for various operators.

* Miller's `absent` type is like JavaScript's `undefined`: it's for times when there is no such key, as in a DSL expression `$out = $foo` when the input record is `$x=3,$y=4`. There is no `$foo`, so `$foo` has `absent` type, and nothing is written to the `$out` field in this case. See also [here](https://miller.readthedocs.io/en/latest/reference-main-null-data) for more information.
* Miller's `void` type is like JavaScript's `null`: it's for times when there is a key with no value, as in `$out = $x` when the input record is `$x=,$y=4`. This overlaps with the `string` type, since a void value looks like an empty string. I've gone back and forth on this (including when I was writing the C implementation) as to whether to retain `void` as a type distinct from empty-string. I ended up keeping it, as it made the `Mlrval` logic easier to understand.
* Miller's `error` type is for things like adding strings that can't be coerced to numbers. Data-dependent errors are intended to result in `(error)`-valued output rather than crashing Miller. See also [here](https://miller.readthedocs.io/en/latest/reference-main-data-types) for more information.
* Miller's number handling makes auto-overflow from int to float transparent, while preserving the possibility of 64-bit bitwise arithmetic.
  * This is different from JavaScript, which has only double-precision floats and thus no native support for 64-bit integers (note, however, that there is now [`BigInt`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/BigInt)).
  * This is also different from C and Go, wherein casts are necessary; without them, integer arithmetic overflows.
  * Using `$a * $b` in Miller will auto-overflow to float. Using `$a .* $b` will stick with 64-bit integers (if `$a` and `$b` are already 64-bit integers).
  * More generally:
    * Bitwise operators such as `|`, `&`, and `^` map ints to ints.
    * The auto-overflowing math operators `+`, `*`, etc. map ints to ints unless they overflow, in which case a float is produced.
    * The int-preserving math operators `.+`, `.*`, etc. map ints to ints even if they overflow.
  * See also [here](https://miller.readthedocs.io/en/latest/reference-main-arithmetic) for the semantics of Miller arithmetic, which the `Mlrval` type implements.
* Since a Mlrval can be of type array-of-mlrval or map-string-to-mlrval, a Mlrval is suited for JSON decoding/encoding.

# Mlrmap

[`types.Mlrmap`](./mlrmap.go) is the sequence of key-value pairs that represents a Miller record. The key-lookup mechanism is optimized for Miller's read/write usage patterns; please see `mlrmap.go` for more details.

It's an ordered map structure, with string keys and Mlrval values. It is also used within Mlrval itself, for map-string-to-mlrval values.
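
The key property of a record, string keys kept in insertion order with updates leaving a key in place, can be sketched with a slice plus an index map. This `orderedMap` type is hypothetical and much simpler than the real `Mlrmap`, whose internals and lookup optimizations are described in `mlrmap.go` and are not reproduced here; values are plain strings rather than Mlrvals.

```go
package main

import (
	"fmt"
	"strings"
)

// orderedMap is a hypothetical, simplified stand-in for a Miller record:
// string keys, insertion order preserved, lookup via an index map.
type orderedMap struct {
	keys  []string
	vals  []string
	index map[string]int // key -> position in keys/vals
}

func newOrderedMap() *orderedMap {
	return &orderedMap{index: make(map[string]int)}
}

// Put updates an existing key in place (keeping its position) or appends a new one.
func (m *orderedMap) Put(key, val string) {
	if i, ok := m.index[key]; ok {
		m.vals[i] = val
		return
	}
	m.index[key] = len(m.keys)
	m.keys = append(m.keys, key)
	m.vals = append(m.vals, val)
}

// Get returns the value for key and whether the key is present.
func (m *orderedMap) Get(key string) (string, bool) {
	i, ok := m.index[key]
	if !ok {
		return "", false
	}
	return m.vals[i], true
}

// String renders the map in DKVP style, e.g. "x=3,y=4".
func (m *orderedMap) String() string {
	pairs := make([]string, len(m.keys))
	for i, k := range m.keys {
		pairs[i] = k + "=" + m.vals[i]
	}
	return strings.Join(pairs, ",")
}

func main() {
	m := newOrderedMap()
	m.Put("x", "3")
	m.Put("y", "4")
	m.Put("x", "5")
	fmt.Println(m) // x=5,y=4
}
```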

# Context

[`types.Context`](./context.go) supports AWK-like variables such as `FILENAME`, `NR`, and `FNR`, along with Miller's `FILENUM`. `NF` is not stored in the struct; it is computed from the current record's field count.

# A note on JSON

* The code for JSON I/O is split between `Mlrval` and `Mlrmap`. This is unsurprising, since JSON is a mutually recursive data structure: arrays can contain maps and vice versa.
* JSON has non-collection types (string, int, float, etc.) as well as collection types (array and object). Support for objects is principally in [./mlrmap_json.go](mlrmap_json.go); support for non-collection types as well as arrays is in [./mlrval_json.go](mlrval_json.go).
* Both multi-line and single-line formats are supported.
* Callsites for JSON output are record-writing (e.g. `--ojson`), the `dump` and `print` DSL routines, and the `json_stringify` DSL function.
  * The choice between single-line and multi-line for JSON record-writing is controlled by `--jvstack` and `--no-jvstack`, the former (multi-line) being the default.
  * The `dump` and `print` DSL routines produce multi-line output, with no way for the user to choose single-line output.
  * The `json_stringify` DSL function lets the user specify multi-line or single-line, with the former being the default.
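
The single-line versus multi-line choice, and the mutual recursion of maps and arrays, can be illustrated with Go's standard `encoding/json` package. This is only an analogy: Miller has its own JSON encoder (in `mlrval_json.go` and `mlrmap_json.go`), and one visible difference is that `encoding/json` sorts map keys, whereas a Miller record keeps its insertion order. The `stringify` helper is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stringify (hypothetical) renders v as single-line or multi-line JSON,
// loosely analogous to the choice json_stringify offers.
func stringify(v any, multiline bool) (string, error) {
	var b []byte
	var err error
	if multiline {
		b, err = json.MarshalIndent(v, "", "  ") // two-space indentation
	} else {
		b, err = json.Marshal(v)
	}
	return string(b), err
}

func main() {
	// Mutually recursive data: a map holding an array holding a map.
	rec := map[string]any{
		"a": 1,
		"b": []any{map[string]any{"c": "x"}},
	}
	single, _ := stringify(rec, false)
	multi, _ := stringify(rec, true)
	fmt.Println(single) // {"a":1,"b":[{"c":"x"}]}
	fmt.Println(multi)
}
```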
166
pkg/types/context.go
Normal file
@ -0,0 +1,166 @@
package types

import (
	"bytes"
	"container/list"
	"strconv"

	"github.com/johnkerl/miller/pkg/mlrval"
)

// Since Go is concurrent, the context struct (AWK-like variables such as
// FILENAME, NF, NR, FNR, etc.) needs to be duplicated and passed through the
// channels along with each record.
//
// Strings to be printed from put/filter DSL print/dump/etc statements are
// passed along to the output channel via this OutputString rather than
// fmt.Println directly in the put/filter handlers since we want all print
// statements and record-output to be in the same goroutine, for deterministic
// output ordering.

type RecordAndContext struct {
	Record       *mlrval.Mlrmap
	Context      Context
	OutputString string
	EndOfStream  bool
}

func NewRecordAndContext(
	record *mlrval.Mlrmap,
	context *Context,
) *RecordAndContext {
	return &RecordAndContext{
		Record: record,
		// Since Go is concurrent, the context struct needs to be duplicated and
		// passed through the channels along with each record. Here is where
		// the copy happens, via the '*' in *context.
		Context:      *context,
		OutputString: "",
		EndOfStream:  false,
	}
}

// Copy returns a new RecordAndContext holding a copy of the record (via
// Mlrmap.Copy) and a by-value copy of the context. OutputString and
// EndOfStream are reset in the copy.
func (rac *RecordAndContext) Copy() *RecordAndContext {
	if rac == nil {
		return nil
	}
	recordCopy := rac.Record.Copy()
	contextCopy := rac.Context
	return &RecordAndContext{
		Record:       recordCopy,
		Context:      contextCopy,
		OutputString: "",
		EndOfStream:  false,
	}
}

// For print/dump/etc to insert strings sequenced into the record-output
// stream. This avoids race conditions between different goroutines printing
// to stdout: we have a single designated goroutine printing to stdout. This
// makes output more predictable and intuitive for users; it also makes our
// regression tests run reliably the same each time.
func NewOutputString(
	outputString string,
	context *Context,
) *RecordAndContext {
	return &RecordAndContext{
		Record:       nil,
		Context:      *context,
		OutputString: outputString,
		EndOfStream:  false,
	}
}

// NewEndOfStreamMarker returns an item with no record which signals the end
// of the record stream; it carries a copy of the given context.
func NewEndOfStreamMarker(context *Context) *RecordAndContext {
	return &RecordAndContext{
		Record:       nil,
		Context:      *context,
		OutputString: "",
		EndOfStream:  true,
	}
}

// NewEndOfStreamMarkerList returns a single-element list holding an
// end-of-stream marker.
func NewEndOfStreamMarkerList(context *Context) *list.List {
	ell := list.New()
	ell.PushBack(NewEndOfStreamMarker(context))
	return ell
}

// ----------------------------------------------------------------
type Context struct {
	FILENAME string
	FILENUM  int64

	// This is computed dynamically from the current record's field count:
	// NF int
	NR  int64
	FNR int64
}

// NewContext returns a Context in its initial state: FILENAME is "(stdin)"
// and FILENUM, NR, and FNR are zero.
func NewContext() *Context {
	context := &Context{
		FILENAME: "(stdin)",
		FILENUM:  0,
		NR:       0,
		FNR:      0,
	}

	return context
}

// NewNilContext currently returns the same initial Context as NewContext.
func NewNilContext() *Context { // TODO: rename
	context := &Context{
		FILENAME: "(stdin)",
		FILENUM:  0,
		NR:       0,
		FNR:      0,
	}

	return context
}

// For the record-readers to update their initial context as each new file is opened.
func (context *Context) UpdateForStartOfFile(filename string) {
	context.FILENAME = filename
	context.FILENUM++
	context.FNR = 0
}

// For the record-readers to update their initial context as each new record is read.
func (context *Context) UpdateForInputRecord() {
	context.NR++
	context.FNR++
}

// Copy returns a pointer to a copy of the context.
func (context *Context) Copy() *Context {
	other := *context
	return &other
}

// GetStatusString formats the context as, for example,
// FILENAME="foo.csv",FILENUM=1,NR=3,FNR=3
func (context *Context) GetStatusString() string {
	var buffer bytes.Buffer // stdio is non-buffered in Go, so buffer for speed increase
	buffer.WriteString("FILENAME=\"")
	buffer.WriteString(context.FILENAME)

	buffer.WriteString("\",FILENUM=")
	buffer.WriteString(strconv.FormatInt(context.FILENUM, 10))

	buffer.WriteString(",NR=")
	buffer.WriteString(strconv.FormatInt(context.NR, 10))

	buffer.WriteString(",FNR=")
	buffer.WriteString(strconv.FormatInt(context.FNR, 10))

	return buffer.String()
}
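
The comments in `context.go` make two points: each record carries its own by-value copy of the context, and a single goroutine does all the output so ordering is deterministic. The following self-contained sketch illustrates both with a trimmed copy of `Context` and a hypothetical `item` type standing in for `RecordAndContext`. The `reader`, `writer`, and `run` functions and the input file names are illustrative only; they are not Miller's actual record-reader or writer code.

```go
package main

import (
	"fmt"
	"strings"
)

// Context is a trimmed, illustrative copy of types.Context.
type Context struct {
	FILENAME string
	FILENUM  int64
	NR       int64
	FNR      int64
}

func (c *Context) UpdateForStartOfFile(filename string) {
	c.FILENAME = filename
	c.FILENUM++
	c.FNR = 0
}

func (c *Context) UpdateForInputRecord() {
	c.NR++
	c.FNR++
}

// item is a hypothetical stand-in for RecordAndContext. Context is held by
// value, so each item keeps its own snapshot of the counters.
type item struct {
	record      string
	context     Context
	endOfStream bool
}

type inputFile struct {
	name  string
	lines []string
}

// reader sends one item per input line, then an end-of-stream marker.
func reader(files []inputFile, out chan<- item) {
	context := &Context{FILENAME: "(stdin)"}
	for _, f := range files {
		context.UpdateForStartOfFile(f.name)
		for _, line := range f.lines {
			context.UpdateForInputRecord()
			out <- item{record: line, context: *context} // '*' copies the struct
		}
	}
	out <- item{context: *context, endOfStream: true}
}

// writer is the single consumer, so output order matches send order.
func writer(in <-chan item) []string {
	var lines []string
	for it := range in {
		if it.endOfStream {
			break
		}
		c := it.context
		lines = append(lines, fmt.Sprintf("%s FILENAME=%s FILENUM=%d NR=%d FNR=%d",
			it.record, c.FILENAME, c.FILENUM, c.NR, c.FNR))
	}
	return lines
}

func run() []string {
	files := []inputFile{
		{"a.dkvp", []string{"x=1", "x=2"}},
		{"b.dkvp", []string{"x=3"}},
	}
	// Buffered, so the reader can run ahead of the writer; the by-value
	// snapshots keep each item's counters correct regardless.
	ch := make(chan item, 16)
	go reader(files, ch)
	return writer(ch)
}

func main() {
	fmt.Println(strings.Join(run(), "\n"))
}
```

Had `item` held a `*Context` instead, items still sitting in the buffer would all observe the reader's latest counter values; copying by value avoids that.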
4
pkg/types/doc.go
Normal file
@ -0,0 +1,4 @@
// Package types contains the implementation of the Mlrval datatype which is
// used for record values, as well as expression/variable values in the Miller
// put/filter DSL.
package types
40
pkg/types/indexed-lvalues.md
Normal file
@ -0,0 +1,40 @@
# Supported indexable lvalues

* Direct/indirect field name like `$x` or `$["x"]`
* Direct/indirect oosvar like `@x` or `@["x"]`
* Local variable like `x`
* Full srec `$*`
* Full oosvar `@*`

# Supported indexing

Each level by int or string:

* `$x[1]` or `$x["a"]`
* `@x[1]` or `@x["a"]`
* `x[1]` or `x["a"]`
* `$*[1]` or `$*["a"]`
* `@*[1]` (not supported) or `@*["a"]`

Multiple levels:

* Each can be further indexed, e.g. `$x[1]["a"][3]`

Auto-deepen:

* `x[1][2][3] = 4` should auto-deepen
* `x["a"]["b"]["c"] = 4` should auto-deepen
* Create new maps at each level if necessary, unless a level already holds something else, e.g. `x["a"]` is already an int, array, etc.

# Indexed types

* `$x` is a `Mlrval`
* `@x` is a `Mlrval`
* `x` is a `Mlrval`
* `$*` is a `Mlrmap`
* `@*` is a `Mlrmap`

# Implementation

* `*Mlrval` needs a `PutIndexed` which takes `indices []*Mlrval` and `rvalue *Mlrval`.
* `*Mlrmap` needs a `PutIndexed` which takes `indices []*Mlrval` and `rvalue *Mlrval`.
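
The auto-deepen rule can be sketched as a loop that walks the indices, creating an empty map at each missing level and failing if a level already holds a non-map value. This `putIndexed` is hypothetical and simplified: values are Go `any` rather than `*Mlrval`, maps are Go's unordered `map[string]any` rather than `Mlrmap`, int indices are simply stringified, and array indexing is not handled.

```go
package main

import (
	"errors"
	"fmt"
)

// putIndexed (hypothetical) sketches auto-deepening assignment such as
// x["a"]["b"]["c"] = 4 or x[1][2][3] = 4.
func putIndexed(base map[string]any, indices []any, rvalue any) error {
	if len(indices) == 0 {
		return errors.New("putIndexed: need at least one index")
	}
	level := base
	for i, index := range indices {
		key := fmt.Sprint(index) // simplification: 1 -> "1", "a" -> "a"
		if i == len(indices)-1 {
			level[key] = rvalue // last index: assign
			return nil
		}
		next, present := level[key]
		if !present {
			// Auto-deepen: create an empty map at this level.
			child := map[string]any{}
			level[key] = child
			level = child
			continue
		}
		child, isMap := next.(map[string]any)
		if !isMap {
			return fmt.Errorf("putIndexed: %q holds %T, not a map", key, next)
		}
		level = child
	}
	return nil // not reached: the loop returns at the last index
}

func main() {
	x := map[string]any{}
	if err := putIndexed(x, []any{"a", "b", "c"}, 4); err != nil {
		fmt.Println(err)
	}
	fmt.Println(x) // map[a:map[b:map[c:4]]]

	y := map[string]any{"a": 7}
	fmt.Println(putIndexed(y, []any{"a", "b"}, 1)) // error: "a" holds int
}
```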
100
pkg/types/mlrval_typing.go
Normal file
@ -0,0 +1,100 @@
// ================================================================
// Support for things like 'num x = $a + $b' in the DSL, wherein we check types
// at assignment time.
// ================================================================

package types

import (
	"fmt"

	"github.com/johnkerl/miller/pkg/mlrval"
)

// ----------------------------------------------------------------
type TypeGatedMlrvalName struct {
	Name     string
	TypeName string
	TypeMask int
}

func NewTypeGatedMlrvalName(
	name string,     // e.g. "x"
	typeName string, // e.g. "num"
) (*TypeGatedMlrvalName, error) {
	typeMask, ok := mlrval.TypeNameToMask(typeName)
	if !ok {
		return nil, fmt.Errorf("mlr: couldn't resolve type name \"%s\".", typeName)
	}
	return &TypeGatedMlrvalName{
		Name:     name,
		TypeName: typeName,
		TypeMask: typeMask,
	}, nil
}

func (tname *TypeGatedMlrvalName) Check(value *mlrval.Mlrval) error {
	bit := value.GetTypeBit()
	if bit&tname.TypeMask != 0 {
		return nil
	} else {
		return fmt.Errorf(
			"mlr: couldn't assign variable %s %s from value %s %s\n",
			tname.TypeName, tname.Name, value.GetTypeName(), value.String(),
		)
	}
}

// ----------------------------------------------------------------
type TypeGatedMlrvalVariable struct {
	typeGatedMlrvalName *TypeGatedMlrvalName
	value               *mlrval.Mlrval
}

func NewTypeGatedMlrvalVariable(
	name string,     // e.g. "x"
	typeName string, // e.g. "num"
	value *mlrval.Mlrval,
) (*TypeGatedMlrvalVariable, error) {
	typeGatedMlrvalName, err := NewTypeGatedMlrvalName(name, typeName)
	if err != nil {
		return nil, err
	}

	err = typeGatedMlrvalName.Check(value)
	if err != nil {
		return nil, err
	}

	return &TypeGatedMlrvalVariable{
		typeGatedMlrvalName,
		value.Copy(),
	}, nil
}

func (tvar *TypeGatedMlrvalVariable) GetName() string {
	return tvar.typeGatedMlrvalName.Name
}

func (tvar *TypeGatedMlrvalVariable) GetValue() *mlrval.Mlrval {
	return tvar.value
}

func (tvar *TypeGatedMlrvalVariable) ValueString() string {
	return tvar.value.String()
}

func (tvar *TypeGatedMlrvalVariable) Assign(value *mlrval.Mlrval) error {
	err := tvar.typeGatedMlrvalName.Check(value)
	if err != nil {
		return err
	}

	// TODO: revisit copy-reduction
	tvar.value = value.Copy()
	return nil
}

func (tvar *TypeGatedMlrvalVariable) Unassign() {
	tvar.value = mlrval.ABSENT
}
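
The gate in `TypeGatedMlrvalName.Check` is a bitmask test: each value type has one bit, each declared type name maps to a mask of admissible bits, and a value passes when `bit&mask != 0`. The sketch below reproduces that shape with hypothetical constants and a stand-in `typeNameToMask` table; Miller's real bits and masks come from `mlrval.TypeNameToMask` and `GetTypeBit` and may differ, for example in how absent or void values are treated.

```go
package main

import "fmt"

// Hypothetical one-bit-per-type constants (1, 2, 4, 8).
const (
	bitString = 1 << iota
	bitInt
	bitFloat
	bitBoolean
)

// typeNameToMask is a hypothetical stand-in for mlrval.TypeNameToMask:
// each declared type name maps to the set of value types it admits.
var typeNameToMask = map[string]int{
	"str":   bitString,
	"int":   bitInt,
	"float": bitFloat,
	"num":   bitInt | bitFloat, // "num" admits either int or float
	"bool":  bitBoolean,
}

// check mirrors the shape of TypeGatedMlrvalName.Check: a value passes when
// its single type bit is set in the declared type's mask.
func check(typeName string, valueBit int) error {
	mask, ok := typeNameToMask[typeName]
	if !ok {
		return fmt.Errorf("couldn't resolve type name %q", typeName)
	}
	if valueBit&mask == 0 {
		return fmt.Errorf("type %s does not admit a value with type bit %d", typeName, valueBit)
	}
	return nil
}

func main() {
	fmt.Println(check("num", bitFloat))  // <nil>
	fmt.Println(check("num", bitString)) // type num does not admit a value with type bit 1
}
```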