Export library code in pkg/ (#1391)

* Export library code in `pkg/`

* new doc page
This commit is contained in:
John Kerl 2023-09-10 17:15:13 -04:00 committed by GitHub
parent 93b7c8eac0
commit 268a96d002
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
358 changed files with 1076 additions and 693 deletions

39
pkg/types/README.md Normal file
View file

@ -0,0 +1,39 @@
This contains the implementation of the [`types.Mlrval`](./mlrval.go) datatype which is used for record values, as well as expression/variable values in the Miller `put`/`filter` DSL.
## Mlrval
The [`types.Mlrval`](./mlrval.go) structure includes **string, int, float, boolean, array-of-mlrval, map-string-to-mlrval, void, absent, and error** types as well as type-conversion logic for various operators.
* Miller's `absent` type is like Javascript's `undefined` -- it's for times when there is no such key, as in a DSL expression `$out = $foo` when the input record is `$x=3,y=4` -- there is no `$foo` so `$foo` has `absent` type. Nothing is written to the `$out` field in this case. See also [here](https://miller.readthedocs.io/en/latest/reference-main-null-data) for more information.
* Miller's `void` type is like Javascript's `null` -- it's for times when there is a key with no value, as in `$out = $x` when the input record is `$x=,$y=4`. This is an overlap with `string` type, since a void value looks like an empty string. I've gone back and forth on this (including when I was writing the C implementation) -- whether to retain `void` as a distinct type from empty-string, or not. I ended up keeping it as it made the `Mlrval` logic easier to understand.
* Miller's `error` type is for things like doing type-uncoerced addition of strings. Data-dependent errors are intended to result in `(error)`-valued output, rather than crashing Miller. See also [here](https://miller.readthedocs.io/en/latest/reference-main-data-types) for more information.
* Miller's number handling makes auto-overflow from int to float transparent, while preserving the possibility of 64-bit bitwise arithmetic.
* This is different from JavaScript, which has only double-precision floats and thus no support for 64-bit numbers (note however that there is now [`BigInt`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/BigInt)).
* This is also different from C and Go, wherein casts are necessary -- without which int arithmetic overflows.
* Using `$a * $b` in Miller will auto-overflow to float. Using `$a .* $b` will stick with 64-bit integers (if `$a` and `$b` are already 64-bit integers).
* More generally:
* Bitwise operators such as `|`, `&`, and `^` map ints to ints.
* The auto-overflowing math operators `+`, `*`, etc. map ints to ints unless they overflow in which case float is produced.
* The int-preserving math operators `.+`, `.*`, etc. map ints to ints even if they overflow.
* See also [here](https://miller.readthedocs.io/en/latest/reference-main-arithmetic) for the semantics of Miller arithmetic, which the `Mlrval` class implements.
* Since a Mlrval can be of type array-of-mlrval or map-string-to-mlrval, a Mlrval is suited for JSON decoding/encoding.
# Mlrmap
[`types.Mlrmap`](./mlrmap.go) is the sequence of key-value pairs which represents a Miller record. The key-lookup mechanism is optimized for Miller read/write usage patterns -- please see `mlrmap.go` for more details.
It's also an ordered map structure, with string keys and Mlrval values. This is used within Mlrval itself.
# Context
[`types.Context`](./context.go) supports AWK-like variables such as `FILENAME`, `NF`, `NR`, and so on.
# A note on JSON
* The code for JSON I/O is mixed between `Mlrval` and `Mlrmap. This is unsurprising since JSON is a mutually recursive data structure -- arrays can contain maps and vice versa.
* JSON has non-collection types (string, int, float, etc) as well as collection types (array and object). Support for objects is principally in [./mlrmap_json.go](mlrmap_json.go); support for non-collection types as well as arrays is in [./mlrval_json.go](mlrval_json.go).
* Both multi-line and single-line formats are supported.
* Callsites for JSON output are record-writing (e.g. `--ojson`), the `dump` and `print` DSL routines, and the `json_stringify` DSL function.
* The choice between single-line and multi-line for JSON record-writing is controlled by `--jvstack` and `--no-jvstack`, the former (multiline) being the default.
* The `dump` and `print` DSL routines produce multi-line output without a way for the user to choose single-line output.
* The `json_stringify` DSL function lets the user specify multi-line or single-line, with the former being the default,

166
pkg/types/context.go Normal file
View file

@ -0,0 +1,166 @@
package types
import (
"bytes"
"container/list"
"strconv"
"github.com/johnkerl/miller/pkg/mlrval"
)
// Since Go is concurrent, the context struct (AWK-like variables such as
// FILENAME, NF, NR, FNR, etc.) needs to be duplicated and passed through the
// channels along with each record.
//
// Strings to be printed from put/filter DSL print/dump/etc statements are
// passed along to the output channel via this OutputString rather than
// fmt.Println directly in the put/filter handlers since we want all print
// statements and record-output to be in the same goroutine, for deterministic
// output ordering.
type RecordAndContext struct {
Record *mlrval.Mlrmap
Context Context
OutputString string
EndOfStream bool
}
func NewRecordAndContext(
record *mlrval.Mlrmap,
context *Context,
) *RecordAndContext {
return &RecordAndContext{
Record: record,
// Since Go is concurrent, the context struct needs to be duplicated and
// passed through the channels along with each record. Here is where
// the copy happens, via the '*' in *context.
Context: *context,
OutputString: "",
EndOfStream: false,
}
}
// For the record-readers to update their initial context as each new record is read.
func (rac *RecordAndContext) Copy() *RecordAndContext {
if rac == nil {
return nil
}
recordCopy := rac.Record.Copy()
contextCopy := rac.Context
return &RecordAndContext{
Record: recordCopy,
Context: contextCopy,
OutputString: "",
EndOfStream: false,
}
}
// For print/dump/etc to insert strings sequenced into the record-output
// stream. This avoids race conditions between different goroutines printing
// to stdout: we have a single designated goroutine printing to stdout. This
// makes output more predictable and intuitive for users; it also makes our
// regression tests run reliably the same each time.
func NewOutputString(
outputString string,
context *Context,
) *RecordAndContext {
return &RecordAndContext{
Record: nil,
Context: *context,
OutputString: outputString,
EndOfStream: false,
}
}
// For the record-readers to update their initial context as each new record is read.
func NewEndOfStreamMarker(context *Context) *RecordAndContext {
return &RecordAndContext{
Record: nil,
Context: *context,
OutputString: "",
EndOfStream: true,
}
}
// TODO: comment
// For the record-readers to update their initial context as each new record is read.
func NewEndOfStreamMarkerList(context *Context) *list.List {
ell := list.New()
ell.PushBack(NewEndOfStreamMarker(context))
return ell
}
// ----------------------------------------------------------------
type Context struct {
FILENAME string
FILENUM int64
// This is computed dynammically from the current record's field-count
// NF int
NR int64
FNR int64
}
// TODO: comment: Remember command-line values to pass along to CST evaluators.
// The options struct-pointer can be nil when invoked by non-DSL verbs such as
// join or seqgen.
func NewContext() *Context {
context := &Context{
FILENAME: "(stdin)",
FILENUM: 0,
NR: 0,
FNR: 0,
}
return context
}
// TODO: comment: Remember command-line values to pass along to CST evaluators.
// The options struct-pointer can be nil when invoked by non-DSL verbs such as
// join or seqgen.
func NewNilContext() *Context { // TODO: rename
context := &Context{
FILENAME: "(stdin)",
FILENUM: 0,
NR: 0,
FNR: 0,
}
return context
}
// For the record-readers to update their initial context as each new file is opened.
func (context *Context) UpdateForStartOfFile(filename string) {
context.FILENAME = filename
context.FILENUM++
context.FNR = 0
}
// For the record-readers to update their initial context as each new record is read.
func (context *Context) UpdateForInputRecord() {
context.NR++
context.FNR++
}
func (context *Context) Copy() *Context {
other := *context
return &other
}
func (context *Context) GetStatusString() string {
var buffer bytes.Buffer // stdio is non-buffered in Go, so buffer for speed increase
buffer.WriteString("FILENAME=\"")
buffer.WriteString(context.FILENAME)
buffer.WriteString("\",FILENUM=")
buffer.WriteString(strconv.FormatInt(context.FILENUM, 10))
buffer.WriteString(",NR=")
buffer.WriteString(strconv.FormatInt(context.NR, 10))
buffer.WriteString(",FNR=")
buffer.WriteString(strconv.FormatInt(context.FNR, 10))
return buffer.String()
}

4
pkg/types/doc.go Normal file
View file

@ -0,0 +1,4 @@
// Package types contains the implementation of the Mlrval datatype which is
// used for record values, as well as expression/variable values in the Miller
// put/filter DSL.
package types

View file

@ -0,0 +1,40 @@
# Supported indexable lvalues
* Direct/indirect field name like `$x` or `$["x"]`
* Direct/indirect oosvar like `@x` or `@["x"]`
* Local variable like `x`
* Full srec `$*`
* Full oosvar `@*`
# Supported indexing
Each level by int or string:
* `$x[1]` or `$x["a"]`
* `@x[1]` or `@x["a"]`
* `x[1]` or `x["a"]`
* `$*[1]` or `$*["a"]`
* `@*[1]` (not supported) or `@*["a"]`
Multiple levels:
* Each can be further indexed, e.g. `$x[1]["a"][3]`
Auto-deepen:
* `x[1][2][3] = 4` should auto-deepen
* `x["a"]["b"]["c"] = 4` should auto-deepen
* Create new maps at each level if necessary, unless they're already something else -- like `x["a"]` is already int/array/etc.
# Indexed types
* `$x` is a `Mlrval`
* `@x` is a `Mlrval`
* `x` is a `Mlrval
* `$*` is a `Mlrmap`
* `@*` is a `Mlrmap`
# Implementation
* `*Mlrval` needs a `PutIndexed` which takes `indices []*Mlrval` and `rvalue *Mlrval`.
* `*Mlrmap` needs a `PutIndexed` which takes `indices []*Mlrval` and `rvalue *Mlrval`.

100
pkg/types/mlrval_typing.go Normal file
View file

@ -0,0 +1,100 @@
// ================================================================
// Support for things like 'num x = $a + $b' in the DSL, wherein we check types
// at assignment time.
// ================================================================
package types
import (
"fmt"
"github.com/johnkerl/miller/pkg/mlrval"
)
// ----------------------------------------------------------------
type TypeGatedMlrvalName struct {
Name string
TypeName string
TypeMask int
}
func NewTypeGatedMlrvalName(
name string, // e.g. "x"
typeName string, // e.g. "num"
) (*TypeGatedMlrvalName, error) {
typeMask, ok := mlrval.TypeNameToMask(typeName)
if !ok {
return nil, fmt.Errorf("mlr: couldn't resolve type name \"%s\".", typeName)
}
return &TypeGatedMlrvalName{
Name: name,
TypeName: typeName,
TypeMask: typeMask,
}, nil
}
func (tname *TypeGatedMlrvalName) Check(value *mlrval.Mlrval) error {
bit := value.GetTypeBit()
if bit&tname.TypeMask != 0 {
return nil
} else {
return fmt.Errorf(
"mlr: couldn't assign variable %s %s from value %s %s\n",
tname.TypeName, tname.Name, value.GetTypeName(), value.String(),
)
}
}
// ----------------------------------------------------------------
type TypeGatedMlrvalVariable struct {
typeGatedMlrvalName *TypeGatedMlrvalName
value *mlrval.Mlrval
}
func NewTypeGatedMlrvalVariable(
name string, // e.g. "x"
typeName string, // e.g. "num"
value *mlrval.Mlrval,
) (*TypeGatedMlrvalVariable, error) {
typeGatedMlrvalName, err := NewTypeGatedMlrvalName(name, typeName)
if err != nil {
return nil, err
}
err = typeGatedMlrvalName.Check(value)
if err != nil {
return nil, err
}
return &TypeGatedMlrvalVariable{
typeGatedMlrvalName,
value.Copy(),
}, nil
}
func (tvar *TypeGatedMlrvalVariable) GetName() string {
return tvar.typeGatedMlrvalName.Name
}
func (tvar *TypeGatedMlrvalVariable) GetValue() *mlrval.Mlrval {
return tvar.value
}
func (tvar *TypeGatedMlrvalVariable) ValueString() string {
return tvar.value.String()
}
func (tvar *TypeGatedMlrvalVariable) Assign(value *mlrval.Mlrval) error {
err := tvar.typeGatedMlrvalName.Check(value)
if err != nil {
return err
}
// TODO: revisit copy-reduction
tvar.value = value.Copy()
return nil
}
func (tvar *TypeGatedMlrvalVariable) Unassign() {
tvar.value = mlrval.ABSENT
}