more of same

This commit is contained in:
John Kerl 2021-02-14 01:25:32 -05:00
parent ab89fa15fd
commit d447ebd71f
14 changed files with 51 additions and 51 deletions

View file

@@ -81,10 +81,10 @@ During the coding of Miller, I've been guided by the following:
* Names of files, variables, functions, etc. should be fully spelled out (e.g. `NewEvaluableLeafNode`), except for a small number of most-used names where a longer name would cause unnecessary line-wraps (e.g. `Mlrval` instead of `MillerValue` since this appears very often).
* Code should not be too clever. This means accepting a reasonable amount of code duplication from time to time, to keep things inline rather than turning into lasagna code.
* Things should be transparent. For example, `mlr -n put -v '$y = 3 + 0.1 * $x'` shows you the abstract syntax tree derived from the DSL expression.
* Comments should be robust with respect to reasonably anticipated changes. For example, one package should cross-link to another in its comments, but I try to avoid mentioning specific filenames too much in the comments and README files since these may change over time. I make an exception for stable points such as [mlr.go](./mlr.go), [mlr.bnf](./src/miller/parsing/mlr.bnf), [stream.go](./src/miller/stream/stream.go), etc.
* Comments should be robust with respect to reasonably anticipated changes. For example, one package should cross-link to another in its comments, but I try to avoid mentioning specific filenames too much in the comments and README files since these may change over time. I make an exception for stable points such as [mlr.go](./mlr.go), [mlr.bnf](./src/parsing/mlr.bnf), [stream.go](./src/stream/stream.go), etc.
* *Miller should be pleasant to write.*
* It should be quick to answer the question *Did I just break anything?* -- hence the `build` and `reg_test/run` regression scripts.
* It should be quick to find out what to do next as you iteratively develop -- see for example [cst/README.md](https://github.com/johnkerl/miller/blob/master/go/src/miller/dsl/cst/README.md).
* It should be quick to find out what to do next as you iteratively develop -- see for example [cst/README.md](https://github.com/johnkerl/miller/blob/master/go/src/dsl/cst/README.md).
* *The language should be an asset, not a liability.*
* One of the reasons I chose Go is that (personally anyway) I find it to be reasonably efficient, well-supported with standard libraries, straightforward, and fun. I hope you enjoy it as much as I have.
@@ -103,10 +103,10 @@ sequence of key-value pairs. The basic **stream** operation is:
So, in broad overview, the key packages are:
* [src/miller/stream](./src/miller/stream) -- connect input -> transforms -> output via Go channels
* [src/miller/input](./src/miller/input) -- read input records
* [src/miller/transforming](./src/miller/transforming) -- transform input records to output records
* [src/miller/output](./src/miller/output) -- write output records
* [src/stream](./src/stream) -- connect input -> transforms -> output via Go channels
* [src/input](./src/input) -- read input records
* [src/transforming](./src/transforming) -- transform input records to output records
* [src/output](./src/output) -- write output records
* The rest are details to support this.
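For orientation, here is a minimal, hypothetical sketch of that channel wiring -- toy type and function names, not Miller's actual API -- showing records flowing from a reader goroutine, through a transformer, to a writer:

```go
package main

import (
	"fmt"
	"strings"
)

// Record stands in for Miller's ordered key-value record type (Mlrmap).
type Record map[string]string

// readRecords plays the role of src/input: parse DKVP-style lines into
// records and send them down a channel, closing it at end of stream.
func readRecords(lines []string, out chan<- Record) {
	for _, line := range lines {
		rec := Record{}
		for _, pair := range strings.Split(line, ",") {
			kv := strings.SplitN(pair, "=", 2)
			if len(kv) == 2 {
				rec[kv[0]] = kv[1]
			}
		}
		out <- rec
	}
	close(out)
}

// uppercaseValues plays the role of src/transforming: read records from
// one channel and write transformed records to the next.
func uppercaseValues(in <-chan Record, out chan<- Record) {
	for rec := range in {
		for k, v := range rec {
			rec[k] = strings.ToUpper(v)
		}
		out <- rec
	}
	close(out)
}

// writeRecords plays the role of src/output.
func writeRecords(in <-chan Record, done chan<- bool) {
	for rec := range in {
		fmt.Println(rec)
	}
	done <- true
}

func main() {
	readerToTransformer := make(chan Record)
	transformerToWriter := make(chan Record)
	done := make(chan bool)

	go readRecords([]string{"a=pan,b=wye", "a=eks,b=hat"}, readerToTransformer)
	go uppercaseValues(readerToTransformer, transformerToWriter)
	go writeRecords(transformerToWriter, done)
	<-done
}
```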
## Directory-structure details
@@ -122,21 +122,21 @@ So, in broad overview, the key packages are:
### Miller per se
* The main entry point is [mlr.go](./mlr.go); everything else is in [src/miller](./src/miller).
* [src/miller/lib](./src/miller/lib):
* Implementation of the [`Mlrval`](./src/miller/types/mlrval.go) datatype which includes string/int/float/boolean/void/absent/error types. These are used for record values, as well as expression/variable values in the Miller `put`/`filter` DSL. See also below for more details.
* [`Mlrmap`](./src/miller/types/mlrmap.go) is the sequence of key-value pairs which represents a Miller record. The key-lookup mechanism is optimized for Miller read/write usage patterns -- please see [mlrmap.go](./src/miller/types/mlrmap.go) for more details.
* [`context`](./src/miller/types/context.go) supports AWK-like variables such as `FILENAME`, `NF`, `NR`, and so on.
* [src/miller/cli](./src/miller/cli) is the flag-parsing logic for supporting Miller's command-line interface. When you type something like `mlr --icsv --ojson put '$sum = $a + $b' then filter '$sum > 1000' myfile.csv`, it's the CLI parser which makes it possible for Miller to construct a CSV record-reader, a transformer-chain of `put` then `filter`, and a JSON record-writer.
* [src/miller/cliutil](./src/miller/cliutil) contains datatypes for the CLI-parser, which was split out to avoid a Go package-import cycle.
* [src/miller/stream](./src/miller/stream) is as above -- it uses Go channels to pipe together file-reads, to record-reading/parsing, to a chain of record-transformers, to record-writing/formatting, to terminal standard output.
* [src/miller/input](./src/miller/input) is as above -- one record-reader type per supported input file format, and a factory method.
* [src/miller/output](./src/miller/output) is as above -- one record-writer type per supported output file format, and a factory method.
* [src/miller/transforming](./src/miller/transforming) contains the abstract record-transformer interface datatype, as well as the Go-channel chaining mechanism for piping one transformer into the next.
* [src/miller/transformers](./src/miller/transformers) contains all the concrete record-transformers such as `cat`, `tac`, `sort`, `put`, and so on. I put it here, not in `transforming`, so that all files in `transformers` are of the same kind.
* [src/miller/parsing](./src/miller/parsing) contains a single source file, `mlr.bnf`, which is the lexical/semantic grammar file for the Miller `put`/`filter` DSL using the GOCC framework. All subdirectories of `src/miller/parsing/` are autogen code created by GOCC's processing of `mlr.bnf`.
* [src/miller/dsl](./src/miller/dsl) contains [`ast_types.go`](src/miller/dsl/ast_types.go) which is the abstract syntax tree datatype shared between GOCC and Miller. I didn't use a `src/miller/dsl/ast` naming convention, although that would have been nice, in order to avoid a Go package-dependency cycle.
* [src/miller/dsl/cst](./src/miller/dsl/cst) is the concrete syntax tree, constructed from an AST produced by GOCC. The CST is what is actually executed on every input record when you do things like `$z = $x * 0.3 * $y`. Please see the [src/miller/dsl/cst/README.md](./src/miller/dsl/cst/README.md) for more information.
* The main entry point is [mlr.go](./mlr.go); everything else is in [src](./src).
* [src/lib](./src/lib):
* Implementation of the [`Mlrval`](./src/types/mlrval.go) datatype which includes string/int/float/boolean/void/absent/error types. These are used for record values, as well as expression/variable values in the Miller `put`/`filter` DSL. See also below for more details.
* [`Mlrmap`](./src/types/mlrmap.go) is the sequence of key-value pairs which represents a Miller record. The key-lookup mechanism is optimized for Miller read/write usage patterns -- please see [mlrmap.go](./src/types/mlrmap.go) for more details.
* [`context`](./src/types/context.go) supports AWK-like variables such as `FILENAME`, `NF`, `NR`, and so on.
* [src/cli](./src/cli) is the flag-parsing logic for supporting Miller's command-line interface. When you type something like `mlr --icsv --ojson put '$sum = $a + $b' then filter '$sum > 1000' myfile.csv`, it's the CLI parser which makes it possible for Miller to construct a CSV record-reader, a transformer-chain of `put` then `filter`, and a JSON record-writer.
* [src/cliutil](./src/cliutil) contains datatypes for the CLI-parser, which was split out to avoid a Go package-import cycle.
* [src/stream](./src/stream) is as above -- it uses Go channels to pipe together file-reads, to record-reading/parsing, to a chain of record-transformers, to record-writing/formatting, to terminal standard output.
* [src/input](./src/input) is as above -- one record-reader type per supported input file format, and a factory method.
* [src/output](./src/output) is as above -- one record-writer type per supported output file format, and a factory method.
* [src/transforming](./src/transforming) contains the abstract record-transformer interface datatype, as well as the Go-channel chaining mechanism for piping one transformer into the next.
* [src/transformers](./src/transformers) contains all the concrete record-transformers such as `cat`, `tac`, `sort`, `put`, and so on. I put it here, not in `transforming`, so that all files in `transformers` are of the same kind.
* [src/parsing](./src/parsing) contains a single source file, `mlr.bnf`, which is the lexical/semantic grammar file for the Miller `put`/`filter` DSL using the GOCC framework. All subdirectories of `src/parsing/` are autogen code created by GOCC's processing of `mlr.bnf`.
* [src/dsl](./src/dsl) contains [`ast_types.go`](src/dsl/ast_types.go) which is the abstract syntax tree datatype shared between GOCC and Miller. I didn't use a `src/dsl/ast` naming convention, although that would have been nice, in order to avoid a Go package-dependency cycle.
* [src/dsl/cst](./src/dsl/cst) is the concrete syntax tree, constructed from an AST produced by GOCC. The CST is what is actually executed on every input record when you do things like `$z = $x * 0.3 * $y`. Please see the [src/dsl/cst/README.md](./src/dsl/cst/README.md) for more information.
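As a hedged illustration of the `Mlrmap` bullet above -- why an insertion-order-preserving structure matters for records -- here is a toy ordered map. The real `Mlrmap` has different internals and a more careful lookup strategy; this only shows the idea of keeping field order while retaining map-speed lookups.

```go
package main

import "fmt"

// orderedMap is a toy stand-in for Mlrmap: it keeps fields in insertion
// order (a slice of keys) while still offering map lookups by key.
type orderedMap struct {
	keys   []string
	values map[string]string
}

func newOrderedMap() *orderedMap {
	return &orderedMap{values: make(map[string]string)}
}

func (m *orderedMap) Put(key, value string) {
	if _, ok := m.values[key]; !ok {
		m.keys = append(m.keys, key) // new key: remember its position
	}
	m.values[key] = value // existing key: overwrite in place
}

func (m *orderedMap) Get(key string) (string, bool) {
	value, ok := m.values[key]
	return value, ok
}

func main() {
	rec := newOrderedMap()
	rec.Put("a", "pan")
	rec.Put("b", "wye")
	rec.Put("i", "1")
	for _, k := range rec.keys {
		v, _ := rec.Get(k)
		fmt.Printf("%s=%s ", k, v)
	}
	fmt.Println() // a=pan b=wye i=1 -- same order as inserted
}
```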
## Nil-record conventions
@@ -168,7 +168,7 @@ nil through the reader/transformer/writer sequence.
## More about mlrvals
[`Mlrval`](./src/miller/types/mlrval.go) is the datatype of record values, as well as expression/variable values in the Miller `put`/`filter` DSL. It includes string/int/float/boolean/void/absent/error types, not unlike PHP's `zval`.
[`Mlrval`](./src/types/mlrval.go) is the datatype of record values, as well as expression/variable values in the Miller `put`/`filter` DSL. It includes string/int/float/boolean/void/absent/error types, not unlike PHP's `zval`.
* Miller's `absent` type is like JavaScript's `undefined` -- it's for times when there is no such key, as in a DSL expression `$out = $foo` when the input record is `$x=3,$y=4` -- there is no `$foo` so `$foo` has `absent` type. Nothing is written to the `$out` field in this case. See also [here](http://johnkerl.org/miller/doc/reference.html#Null_data:_empty_and_absent) for more information.
* Miller's `void` type is like JavaScript's `null` -- it's for times when there is a key with no value, as in `$out = $x` when the input record is `$x=,$y=4`. This is an overlap with `string` type, since a void value looks like an empty string. I've gone back and forth on this (including when I was writing the C implementation) -- whether to retain `void` as a distinct type from empty-string, or not. I ended up keeping it as it made the `Mlrval` logic easier to understand.
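The following is a small, hypothetical sketch (not the real `Mlrval`) of how absent-aware assignment can behave, matching the `$out = $foo` example above:

```go
package main

import "fmt"

// valueType is a toy version of Mlrval's type tag.
type valueType int

const (
	typeAbsent valueType = iota // key not present in the record at all
	typeVoid                    // key present but with empty value, e.g. x=
	typeString
	typeInt
)

type value struct {
	vtype valueType
	s     string
	i     int64
}

// assign mimics `$out = $rhs`: absent right-hand sides assign nothing.
func assign(record map[string]value, key string, rhs value) {
	if rhs.vtype == typeAbsent {
		return // nothing is written, matching Miller's absent semantics
	}
	record[key] = rhs
}

func main() {
	record := map[string]value{
		"x": {vtype: typeInt, i: 3},
	}
	foo, ok := record["foo"]
	if !ok {
		foo = value{vtype: typeAbsent}
	}
	assign(record, "out", foo) // $out = $foo -- no $foo, so no $out
	_, hasOut := record["out"]
	fmt.Println("out written?", hasOut) // false
}
```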
@@ -176,7 +176,7 @@ nil through the reader/transformer/writer sequence.
* Miller's number handling makes auto-overflow from int to float transparent, while preserving the possibility of 64-bit bitwise arithmetic.
* This is different from JavaScript, which has only double-precision floats and thus no support for 64-bit numbers (note however that there is now [`BigInt`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/BigInt)).
* This is also different from C and Go, wherein casts are necessary -- without which int arithmetic overflows.
* See also [here](http://johnkerl.org/miller/doc/reference.html#Arithmetic) for the semantics of Miller arithmetic, which the [`Mlrval`](./src/miller/types/mlrval.go) class implements.
* See also [here](http://johnkerl.org/miller/doc/reference.html#Arithmetic) for the semantics of Miller arithmetic, which the [`Mlrval`](./src/types/mlrval.go) class implements.
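A hedged sketch of the int-to-float auto-overflow idea described in the bullets above -- illustrative only; Miller's actual arithmetic rules are the ones documented at the reference link:

```go
package main

import "fmt"

// addAutoOverflow adds two int64s, returning a float64 result only when
// the integer addition would overflow -- a sketch of the
// int-to-float auto-overflow behavior described above.
func addAutoOverflow(a, b int64) interface{} {
	sum := a + b
	// Overflow happened iff the operands share a sign and the sum's
	// sign differs from theirs.
	if (a > 0 && b > 0 && sum < 0) || (a < 0 && b < 0 && sum >= 0) {
		return float64(a) + float64(b)
	}
	return sum
}

func main() {
	fmt.Println(addAutoOverflow(2, 3))         // 5 (stays int64)
	fmt.Println(addAutoOverflow(1<<62, 1<<62)) // 9.223372036854776e+18 (promoted to float64)
}
```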
## Software-testing methodology

View file

@@ -11,5 +11,5 @@ set -euo pipefail
#
# See also https://stackoverflow.com/questions/8204367/ctag-database-for-go
ctags -f gosource.tags -R `pwd`/src/miller
ctags -f gosource.tags -R `pwd`/src
mv gosource.tags tags

View file

@@ -52,23 +52,23 @@ echo "Parser-autogen OK"
# ----------------------------------------------------------------
# Override GOCC codegen with customized error handling
cp ../../src/miller/parsing/errors.go.template src/experimental/errors/errors.go
cp ../../src/parsing/errors.go.template src/experimental/errors/errors.go
sed -i .bak 's:miller/parsing:experimental:' src/experimental/errors/errors.go
# ----------------------------------------------------------------
# Copy AST files from the main Miller tree
rm -rf ./src/miller/lib/
rm -rf ./src/miller/dsl/
rm -rf ./src/lib/
rm -rf ./src/dsl/
mkdir -p ./src/miller/lib/
mkdir -p ./src/miller/dsl/
mkdir -p ./src/lib/
mkdir -p ./src/dsl/
cp ../../src/miller/lib/*.go ./src/miller/lib/
cp ../../src/miller/dsl/ast*.go ./src/miller/dsl/
cp ../../src/lib/*.go ./src/lib/
cp ../../src/dsl/ast*.go ./src/dsl/
# Different path to autogen between main Miller tree and here
sed -i .bak 's:miller/parsing:experimental:' src/miller/dsl/ast*go
sed -i .bak 's:miller/parsing:experimental:' src/dsl/ast*go
# ----------------------------------------------------------------
# Compile the main and the parser-autogen

View file

@@ -1,5 +1,5 @@
Logic for parsing the Miller command line.
* `src/miller/cli` is the flag-parsing logic for supporting Miller's command-line interface. When you type something like `mlr --icsv --ojson put '$sum = $a + $b' then filter '$sum > 1000' myfile.csv`, it's the CLI parser which makes it possible for Miller to construct a CSV record-reader, a transformer chain of `put` then `filter`, and a JSON record-writer.
* `src/miller/cliutil` contains datatypes for the CLI-parser, which was split out to avoid a Go package-import cycle.
* `src/cli` is the flag-parsing logic for supporting Miller's command-line interface. When you type something like `mlr --icsv --ojson put '$sum = $a + $b' then filter '$sum > 1000' myfile.csv`, it's the CLI parser which makes it possible for Miller to construct a CSV record-reader, a transformer chain of `put` then `filter`, and a JSON record-writer.
* `src/cliutil` contains datatypes for the CLI-parser, which was split out to avoid a Go package-import cycle.
* I don't use the Go [`flag`](https://golang.org/pkg/flag/) package here, although I do use it within the transformers' subcommand flag-handling. The `flag` package is quite fine, but Miller's command-line processing does double duty -- serving CLI needs per se as well as manpage/docfile generation -- and I found it simplest to roll my own command-line handling here.
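For contrast, here is what per-verb flag handling with the standard library's `flag.NewFlagSet` looks like -- the verb and option names below are invented for illustration, not Miller's actual flags:

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseHeadFlags sketches subcommand-style flag parsing for a verb such
// as `head`, using a dedicated FlagSet rather than the global flag set.
func parseHeadFlags(args []string) (n int, groupBy string) {
	fs := flag.NewFlagSet("head", flag.ExitOnError)
	fs.IntVar(&n, "n", 10, "number of records to pass through")
	fs.StringVar(&groupBy, "g", "", "group-by field names")
	fs.Parse(args)
	return n, groupBy
}

func main() {
	// e.g. simulating: mlr head -n 5 -g shape
	n, groupBy := parseHeadFlags(os.Args[1:])
	fmt.Printf("head: n=%d group-by=%q\n", n, groupBy)
}
```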

View file

@@ -1,4 +1,4 @@
Datatypes for parsing the Miller command line.
* `src/miller/cli` is the flag-parsing logic for supporting Miller's command-line interface. When you type something like `mlr --icsv --ojson put '$sum = $a + $b' then filter '$sum > 1000' myfile.csv`, it's the CLI parser which makes it possible for Miller to construct a CSV record-reader, a transformer chain of `put` then `filter`, and a JSON record-writer.
* `src/miller/cliutil` contains datatypes for the CLI-parser, which was split out to avoid a Go package-import cycle.
* `src/cli` is the flag-parsing logic for supporting Miller's command-line interface. When you type something like `mlr --icsv --ojson put '$sum = $a + $b' then filter '$sum > 1000' myfile.csv`, it's the CLI parser which makes it possible for Miller to construct a CSV record-reader, a transformer chain of `put` then `filter`, and a JSON record-writer.
* `src/cliutil` contains datatypes for the CLI-parser, which was split out to avoid a Go package-import cycle.

View file

@@ -97,5 +97,5 @@ tree is executed once on every data record.
# Source directories/files
* The AST logic is in `./ast*.go`. I didn't use a `src/miller/dsl/ast` naming convention, although that would have been nice, in order to avoid a Go package-dependency cycle.
* The AST logic is in `./ast*.go`. I didn't use a `src/dsl/ast` naming convention, although that would have been nice, in order to avoid a Go package-dependency cycle.
* The CST logic is in [`./cst`](./cst). Please see [cst/README.md](./cst/README.md) for more information.

View file

@@ -1,4 +1,4 @@
See [go/src/miller/dsl/README.md](https://github.com/johnkerl/miller/blob/master/go/src/miller/dsl/README.md) for more information about Miller's use of abstract syntax trees (ASTs) and concrete syntax trees (CSTs) within the Miller `put`/`filter` domain-specific language (DSL).
See [go/src/dsl/README.md](https://github.com/johnkerl/miller/blob/master/go/src/dsl/README.md) for more information about Miller's use of abstract syntax trees (ASTs) and concrete syntax trees (CSTs) within the Miller `put`/`filter` domain-specific language (DSL).
## Files
@@ -11,7 +11,7 @@ See [go/src/miller/dsl/README.md](https://github.com/johnkerl/miller/blob/master
Go is a strongly typed language, but the AST is polymorphic. This results in if/else or switch statements as an AST is walked.
Also, when we modify code, there can be changes in the [BNF grammar](../../parsing/mlr.bnf) not yet reflected in the [AST](../../src/miller/dsl/ast_types.go). Likewise, there can be AST changes not yet reflected here. (Example: you are partway through adding a new binary operator to the grammar.)
Also, when we modify code, there can be changes in the [BNF grammar](../../parsing/mlr.bnf) not yet reflected in the [AST](../../src/dsl/ast_types.go). Likewise, there can be AST changes not yet reflected here. (Example: you are partway through adding a new binary operator to the grammar.)
As a result, throughout the code, there are error checks which may seem redundant but which are in place to make incremental development more pleasant and robust.
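Here is a hedged, made-up example of the per-node-kind dispatch described above; Miller's real AST node fields and kind names differ:

```go
package main

import (
	"errors"
	"fmt"
)

// node is a made-up AST node: a kind tag plus children, loosely in the
// spirit of a grammar-produced tree.
type node struct {
	kind     string // e.g. "int-literal", "operator"
	token    string
	children []*node
}

// evaluate walks the tree with a switch on node kind; unrecognized kinds
// are reported rather than ignored, so partial grammar changes surface
// quickly during development.
func evaluate(n *node) (int, error) {
	switch n.kind {
	case "int-literal":
		var v int
		fmt.Sscanf(n.token, "%d", &v)
		return v, nil
	case "operator":
		left, err := evaluate(n.children[0])
		if err != nil {
			return 0, err
		}
		right, err := evaluate(n.children[1])
		if err != nil {
			return 0, err
		}
		switch n.token {
		case "+":
			return left + right, nil
		case "*":
			return left * right, nil
		}
		return 0, errors.New("unhandled operator " + n.token)
	default:
		return 0, errors.New("unhandled node kind " + n.kind)
	}
}

func main() {
	// 3 + 4 * 5
	tree := &node{kind: "operator", token: "+", children: []*node{
		{kind: "int-literal", token: "3"},
		{kind: "operator", token: "*", children: []*node{
			{kind: "int-literal", token: "4"},
			{kind: "int-literal", token: "5"},
		}},
	}}
	fmt.Println(evaluate(tree)) // 23 <nil>
}
```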

View file

@@ -1,3 +1,3 @@
This directory contains a single source file, `mlr.bnf`, which is the lexical/semantic grammar file for the Miller `put`/`filter` DSL using the GOCC framework. (In a classical Lex/Yacc framework, there would be separate `mlr.l` and `mlr.y` files; using GOCC, there is a single `mlr.bnf` file.)
All subdirectories of `src/miller/parsing/` are autogen code created by GOCC's processing of `mlr.bnf`. They are nonetheless committed to source control, since running GOCC takes quite a bit longer than `go build mlr.go` does, and the BNF file doesn't often change. See the top-level `miller/go` build scripts for how to rerun GOCC. As of this writing, it's `bin/gocc -o src/miller/parsing src/miller/parsing/mlr.bnf` as invoked from the `miller/go` base directory.
All subdirectories of `src/parsing/` are autogen code created by GOCC's processing of `mlr.bnf`. They are nonetheless committed to source control, since running GOCC takes quite a bit longer than `go build mlr.go` does, and the BNF file doesn't often change. See the top-level `miller/go` build scripts for how to rerun GOCC. As of this writing, it's `bin/gocc -o src/parsing src/parsing/mlr.bnf` as invoked from the `miller/go` base directory.
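For reference, this is roughly how a GOCC-generated lexer/parser pair is typically driven from Go code. It will not compile without the generated packages, and the import paths below are assumptions standing in for wherever `bin/gocc -o` wrote its output:

```go
package main

import (
	"fmt"

	// These import paths are illustrative only; they point at whatever
	// directory GOCC generated into (here, assumed to be src/parsing).
	"miller/src/parsing/lexer"
	"miller/src/parsing/parser"
)

func main() {
	dslString := `$z = $x * 0.3 * $y`

	// Typical GOCC driving pattern: lex the input bytes, then hand the
	// lexer to the generated parser, which returns the AST root as an
	// interface{} along with any parse error.
	theLexer := lexer.NewLexer([]byte(dslString))
	theParser := parser.NewParser()
	astRootAsInterface, err := theParser.Parse(theLexer)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("parsed AST root: %T\n", astRootAsInterface)
}
```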

View file

@@ -3,8 +3,8 @@
// over the top of GOCC codegen so that we can customize handling of error
// messages.
//
// Source: src/miller/parsing/errors.go.template Destination:
// src/miller/parsing/errors/errors.go
// Source: src/parsing/errors.go.template Destination:
// src/parsing/errors/errors.go
// ================================================================
package errors

View file

@@ -3,8 +3,8 @@
// over the top of GOCC codegen so that we can customize handling of error
// messages.
//
// Source: src/miller/parsing/errors.go.template Destination:
// src/miller/parsing/errors/errors.go
// Source: src/parsing/errors.go.template Destination:
// src/parsing/errors/errors.go
// ================================================================
package errors

View file

@@ -37,7 +37,7 @@
// interface{}/error since they are meant for nesting as arguments here
// within this file.
//
// * Please see src/miller/dsl/ast*.go for more about what the <<...>>
// * Please see src/dsl/ast*.go for more about what the <<...>>
// code here is calling.
// ================================================================

View file

@@ -1,4 +1,4 @@
Logic for transforming input records into output records as requested by the user (sort, filter, etc.).
* `src/miller/transforming` contains the abstract record-transformer interface datatype, as well as the Go-channel chaining mechanism for piping one transformer into the next.
* `src/miller/transformers` contains all the concrete record-transformers such as `cat`, `tac`, `sort`, `put`, and so on. I put it here, not in `transforming`, so that all files in `transformers` are of the same kind.
* `src/transforming` contains the abstract record-transformer interface datatype, as well as the Go-channel chaining mechanism for piping one transformer into the next.
* `src/transformers` contains all the concrete record-transformers such as `cat`, `tac`, `sort`, `put`, and so on. I put it here, not in `transforming`, so that all files in `transformers` are of the same kind.
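A hedged sketch of what such an abstract transformer interface plus channel chaining can look like, with invented names rather than Miller's actual ones:

```go
package main

import (
	"fmt"
	"strings"
)

// Record and RecordTransformer are invented stand-ins; Miller's actual
// record type and transformer interface have different names and signatures.
type Record map[string]string

type RecordTransformer interface {
	Run(in <-chan Record, out chan<- Record)
}

// upperCaser and valuePrefixer are two toy concrete "verbs".
type upperCaser struct{}

func (t *upperCaser) Run(in <-chan Record, out chan<- Record) {
	for rec := range in {
		for k, v := range rec {
			rec[k] = strings.ToUpper(v)
		}
		out <- rec
	}
	close(out)
}

type valuePrefixer struct{ prefix string }

func (t *valuePrefixer) Run(in <-chan Record, out chan<- Record) {
	for rec := range in {
		for k, v := range rec {
			rec[k] = t.prefix + v
		}
		out <- rec
	}
	close(out)
}

// chain wires each transformer's output channel to the next one's input,
// mirroring `then`-chaining on the Miller command line.
func chain(transformers []RecordTransformer, in, out chan Record) {
	current := in
	for i, t := range transformers {
		next := out
		if i < len(transformers)-1 {
			next = make(chan Record)
		}
		go t.Run(current, next)
		current = next
	}
}

func main() {
	in := make(chan Record)
	out := make(chan Record)
	chain([]RecordTransformer{&upperCaser{}, &valuePrefixer{prefix: ">"}}, in, out)

	go func() {
		in <- Record{"a": "pan"}
		in <- Record{"a": "wye"}
		close(in)
	}()
	for rec := range out {
		fmt.Println(rec)
	}
}
```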

View file

@@ -1,4 +1,4 @@
Logic for transforming input records into output records as requested by the user (sort, filter, etc.).
* `src/miller/transforming` contains the abstract record-transformer interface datatype, as well as the Go-channel chaining mechanism for piping one transformer into the next.
* `src/miller/transformers` contains all the concrete record-transformers such as `cat`, `tac`, `sort`, `put`, and so on. I put it here, not in `transforming`, so that all files in `transformers` are of the same kind.
* `src/transforming` contains the abstract record-transformer interface datatype, as well as the Go-channel chaining mechanism for piping one transformer into the next.
* `src/transformers` contains all the concrete record-transformers such as `cat`, `tac`, `sort`, `put`, and so on. I put it here, not in `transforming`, so that all files in `transformers` are of the same kind.

View file

@@ -2,13 +2,13 @@
wc -l \
$(find src -name \*.go | grep -v src/parsing) \
src/miller/parsing/mlr.bnf \
src/parsing/mlr.bnf \
| sort -n
echo
wc -c \
$(find src -name \*.go | grep -v src/parsing) \
src/miller/parsing/mlr.bnf \
src/parsing/mlr.bnf \
| sort -n \
| tail -n 5