From d447ebd71f9db45a8039893c1add562dfd1a89b8 Mon Sep 17 00:00:00 2001
From: John Kerl
Date: Sun, 14 Feb 2021 01:25:32 -0500
Subject: [PATCH] more of same

---
 go/README.md                      | 46 +++++++++++++++----------------
 go/mktags                         |  2 +-
 go/parser-experiments/two/build   | 16 +++++------
 go/src/cli/README.md              |  4 +--
 go/src/cliutil/README.md          |  4 +--
 go/src/dsl/README.md              |  2 +-
 go/src/dsl/cst/README.md          |  4 +--
 go/src/parsing/README.md          |  2 +-
 go/src/parsing/errors.go.template |  4 +--
 go/src/parsing/errors/errors.go   |  4 +--
 go/src/parsing/mlr.bnf            |  2 +-
 go/src/transformers/README.md     |  4 +--
 go/src/transforming/README.md     |  4 +--
 go/tools/mcountlines              |  4 +--
 14 files changed, 51 insertions(+), 51 deletions(-)

diff --git a/go/README.md b/go/README.md
index 38ecbb579..c9e847d48 100644
--- a/go/README.md
+++ b/go/README.md
@@ -81,10 +81,10 @@ During the coding of Miller, I've been guided by the following:
   * Names of files, variables, functions, etc. should be fully spelled out (e.g. `NewEvaluableLeafNode`), except for a small number of most-used names where a longer name would cause unnecessary line-wraps (e.g. `Mlrval` instead of `MillerValue` since this appears very very often).
   * Code should not be too clever. This includes some reasonable amounts of code duplication from time to time, to keep things inline, rather than lasagna code.
   * Things should be transparent. For example, `mlr -n put -v '$y = 3 + 0.1 * $x'` shows you the abstract syntax tree derived from the DSL expression.
-  * Comments should be robust with respect to reasonably anticipated changes. For example, one package should cross-link to another in its comments, but I try to avoid mentioning specific filenames too much in the comments and README files since these may change over time. I make an exception for stable points such as [mlr.go](./mlr.go), [mlr.bnf](./src/miller/parsing/mlr.bnf), [stream.go](./src/miller/stream/stream.go), etc.
+  * Comments should be robust with respect to reasonably anticipated changes. For example, one package should cross-link to another in its comments, but I try to avoid mentioning specific filenames too much in the comments and README files since these may change over time. I make an exception for stable points such as [mlr.go](./mlr.go), [mlr.bnf](./src/parsing/mlr.bnf), [stream.go](./src/stream/stream.go), etc.
 * *Miller should be pleasant to write.*
   * It should be quick to answer the question *Did I just break anything?* -- hence the `build` and `reg_test/run` regression scripts.
-  * It should be quick to find out what to do next as you iteratively develop -- see for example [cst/README.md](https://github.com/johnkerl/miller/blob/master/go/src/miller/dsl/cst/README.md).
+  * It should be quick to find out what to do next as you iteratively develop -- see for example [cst/README.md](https://github.com/johnkerl/miller/blob/master/go/src/dsl/cst/README.md).
 * *The language should be an asset, not a liability.*
   * One of the reasons I chose Go is that (personally anyway) I find it to be reasonably efficient, well-supported with standard libraries, straightforward, and fun. I hope you enjoy it as much as I have.
 
@@ -103,10 +103,10 @@ sequence of key-value pairs. The basic **stream** operation is:
 
 So, in broad overview, the key packages are:
 
-* [src/miller/stream](./src/miller/stream) -- connect input -> transforms -> output via Go channels
-* [src/miller/input](./src/miller/input) -- read input records
-* [src/miller/transforming](./src/miller/transforming) -- transform input records to output records
-* [src/miller/output](./src/miller/output) -- write output records
+* [src/stream](./src/stream) -- connect input -> transforms -> output via Go channels
+* [src/input](./src/input) -- read input records
+* [src/transforming](./src/transforming) -- transform input records to output records
+* [src/output](./src/output) -- write output records
 * The rest are details to support this.
 
 ## Directory-structure details
@@ -122,21 +122,21 @@ So, in broad overview, the key packages are:
 
 ### Miller per se
 
-* The main entry point is [mlr.go](./mlr.go); everything else in [src/miller](./src/miller).
-* [src/miller/lib](./src/miller/lib):
-  * Implementation of the [`Mlrval`](./src/miller/types/mlrval.go) datatype which includes string/int/float/boolean/void/absent/error types. These are used for record values, as well as expression/variable values in the Miller `put`/`filter` DSL. See also below for more details.
-  * [`Mlrmap`](./src/miller/types/mlrmap.go) is the sequence of key-value pairs which represents a Miller record. The key-lookup mechanism is optimized for Miller read/write usage patterns -- please see [mlrmap.go](./src/miller/types/mlrmap.go) for more details.
-  * [`context`](./src/miller/types/context.go) supports AWK-like variables such as `FILENAME`, `NF`, `NR`, and so on.
-* [src/miller/cli](./src/miller/cli) is the flag-parsing logic for supporting Miller's command-line interface. When you type something like `mlr --icsv --ojson put '$sum = $a + $b' then filter '$sum > 1000' myfile.csv`, it's the CLI parser which makes it possible for Miller to construct a CSV record-reader, a transformer-chain of `put` then `filter`, and a JSON record-writer.
-* [src/miller/cliutil](./src/miller/cliutil) contains datatypes for the CLI-parser, which was split out to avoid a Go package-import cycle.
-* [src/miller/stream](./src/miller/stream) is as above -- it uses Go channels to pipe together file-reads, to record-reading/parsing, to a chain of record-transformers, to record-writing/formatting, to terminal standard output.
-* [src/miller/input](./src/miller/input) is as above -- one record-reader type per supported input file format, and a factory method.
-* [src/miller/output](./src/miller/output) is as above -- one record-writer type per supported output file format, and a factory method.
-* [src/miller/transforming](./src/miller/transforming) contains the abstract record-transformer interface datatype, as well as the Go-channel chaining mechanism for piping one transformer into the next.
-* [src/miller/transformers](./src/miller/transformers) is all the concrete record-transformers such as `cat`, `tac`, `sort`, `put`, and so on. I put it here, not in `transforming`, so all files in `transformers` would be of the same type.
-* [src/miller/parsing](./src/miller/parsing) contains a single source file, `mlr.bnf`, which is the lexical/semantic grammar file for the Miller `put`/`filter` DSL using the GOCC framework. All subdirectories of `src/miller/parsing/` are autogen code created by GOCC's processing of `mlr.bnf`.
-* [src/miller/dsl](./src/miller/dsl) contains [`ast_types.go`](src/miller/dsl/ast_types.go) which is the abstract syntax tree datatype shared between GOCC and Miller. I didn't use a `src/miller/dsl/ast` naming convention, although that would have been nice, in order to avoid a Go package-dependency cycle.
-* [src/miller/dsl/cst](./src/miller/dsl/cst) is the concrete syntax tree, constructed from an AST produced by GOCC. The CST is what is actually executed on every input record when you do things like `$z = $x * 0.3 * $y`. Please see the [src/miller/dsl/cst/README.md](./src/miller/dsl/cst/README.md) for more information.
+* The main entry point is [mlr.go](./mlr.go); everything else in [src](./src).
+* [src/lib](./src/lib):
+  * Implementation of the [`Mlrval`](./src/types/mlrval.go) datatype which includes string/int/float/boolean/void/absent/error types. These are used for record values, as well as expression/variable values in the Miller `put`/`filter` DSL. See also below for more details.
+  * [`Mlrmap`](./src/types/mlrmap.go) is the sequence of key-value pairs which represents a Miller record. The key-lookup mechanism is optimized for Miller read/write usage patterns -- please see [mlrmap.go](./src/types/mlrmap.go) for more details.
+  * [`context`](./src/types/context.go) supports AWK-like variables such as `FILENAME`, `NF`, `NR`, and so on.
+* [src/cli](./src/cli) is the flag-parsing logic for supporting Miller's command-line interface. When you type something like `mlr --icsv --ojson put '$sum = $a + $b' then filter '$sum > 1000' myfile.csv`, it's the CLI parser which makes it possible for Miller to construct a CSV record-reader, a transformer-chain of `put` then `filter`, and a JSON record-writer.
+* [src/cliutil](./src/cliutil) contains datatypes for the CLI-parser, which was split out to avoid a Go package-import cycle.
+* [src/stream](./src/stream) is as above -- it uses Go channels to pipe together file-reads, to record-reading/parsing, to a chain of record-transformers, to record-writing/formatting, to terminal standard output.
+* [src/input](./src/input) is as above -- one record-reader type per supported input file format, and a factory method.
+* [src/output](./src/output) is as above -- one record-writer type per supported output file format, and a factory method.
+* [src/transforming](./src/transforming) contains the abstract record-transformer interface datatype, as well as the Go-channel chaining mechanism for piping one transformer into the next.
+* [src/transformers](./src/transformers) is all the concrete record-transformers such as `cat`, `tac`, `sort`, `put`, and so on. I put it here, not in `transforming`, so all files in `transformers` would be of the same type.
+* [src/parsing](./src/parsing) contains a single source file, `mlr.bnf`, which is the lexical/semantic grammar file for the Miller `put`/`filter` DSL using the GOCC framework. All subdirectories of `src/parsing/` are autogen code created by GOCC's processing of `mlr.bnf`.
+* [src/dsl](./src/dsl) contains [`ast_types.go`](src/dsl/ast_types.go) which is the abstract syntax tree datatype shared between GOCC and Miller. I didn't use a `src/dsl/ast` naming convention, although that would have been nice, in order to avoid a Go package-dependency cycle.
+* [src/dsl/cst](./src/dsl/cst) is the concrete syntax tree, constructed from an AST produced by GOCC. The CST is what is actually executed on every input record when you do things like `$z = $x * 0.3 * $y`. Please see the [src/dsl/cst/README.md](./src/dsl/cst/README.md) for more information.
 
 ## Nil-record conventions
 
@@ -168,7 +168,7 @@ nil through the reader/transformer/writer sequence.
 
 ## More about mlrvals
 
-[`Mlrval`](./src/miller/types/mlrval.go) is the datatype of record values, as well as expression/variable values in the Miller `put`/`filter` DSL. It includes string/int/float/boolean/void/absent/error types, not unlike PHP's `zval`.
+[`Mlrval`](./src/types/mlrval.go) is the datatype of record values, as well as expression/variable values in the Miller `put`/`filter` DSL. It includes string/int/float/boolean/void/absent/error types, not unlike PHP's `zval`.
 
 * Miller's `absent` type is like Javascript's `undefined` -- it's for times when there is no such key, as in a DSL expression `$out = $foo` when the input record is `$x=3,y=4` -- there is no `$foo` so `$foo` has `absent` type. Nothing is written to the `$out` field in this case. See also [here](http://johnkerl.org/miller/doc/reference.html#Null_data:_empty_and_absent) for more information.
 * Miller's `void` type is like Javascript's `null` -- it's for times when there is a key with no value, as in `$out = $x` when the input record is `$x=,$y=4`. This is an overlap with `string` type, since a void value looks like an empty string. I've gone back and forth on this (including when I was writing the C implementation) -- whether to retain `void` as a distinct type from empty-string, or not. I ended up keeping it as it made the `Mlrval` logic easier to understand.
@@ -176,7 +176,7 @@ nil through the reader/transformer/writer sequence.
 * Miller's number handling makes auto-overflow from int to float transparent, while preserving the possibility of 64-bit bitwise arithmetic.
   * This is different from JavaScript, which has only double-precision floats and thus no support for 64-bit numbers (note however that there is now [`BigInt`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/BigInt)).
   * This is also different from C and Go, wherein casts are necessary -- without which int arithmetic overflows.
-  * See also [here](http://johnkerl.org/miller/doc/reference.html#Arithmetic) for the semantics of Miller arithmetic, which the [`Mlrval`](./src/miller/types/mlrval.go) class implements.
+  * See also [here](http://johnkerl.org/miller/doc/reference.html#Arithmetic) for the semantics of Miller arithmetic, which the [`Mlrval`](./src/types/mlrval.go) class implements.
 
 ## Software-testing methodology
 
diff --git a/go/mktags b/go/mktags
index 8b72d11f1..e35854eee 100755
--- a/go/mktags
+++ b/go/mktags
@@ -11,5 +11,5 @@ set -euo pipefail
 #
 # See also https://stackoverflow.com/questions/8204367/ctag-database-for-go
 
-ctags -f gosource.tags -R `pwd`/src/miller
+ctags -f gosource.tags -R `pwd`/src
 mv gosource.tags tags
diff --git a/go/parser-experiments/two/build b/go/parser-experiments/two/build
index 1ffdc8aac..1ff29b03a 100755
--- a/go/parser-experiments/two/build
+++ b/go/parser-experiments/two/build
@@ -52,23 +52,23 @@ echo "Parser-autogen OK"
 # ----------------------------------------------------------------
 # Override GOCC codegen with customized error handling
 
-cp ../../src/miller/parsing/errors.go.template src/experimental/errors/errors.go
+cp ../../src/parsing/errors.go.template src/experimental/errors/errors.go
 sed -i .bak 's:miller/parsing:experimental:' src/experimental/errors/errors.go
 
 # ----------------------------------------------------------------
 # Copy AST files from the main Miller tree
 
-rm -rf ./src/miller/lib/
-rm -rf ./src/miller/dsl/
+rm -rf ./src/lib/
+rm -rf ./src/dsl/
 
-mkdir -p ./src/miller/lib/
-mkdir -p ./src/miller/dsl/
+mkdir -p ./src/lib/
+mkdir -p ./src/dsl/
 
-cp ../../src/miller/lib/*.go ./src/miller/lib/
-cp ../../src/miller/dsl/ast*.go ./src/miller/dsl/
+cp ../../src/lib/*.go ./src/lib/
+cp ../../src/dsl/ast*.go ./src/dsl/
 
 # Different path to autogen between main Miller tree and here
-sed -i .bak 's:miller/parsing:experimental:' src/miller/dsl/ast*go
+sed -i .bak 's:miller/parsing:experimental:' src/dsl/ast*go
 
 # ----------------------------------------------------------------
 # Compile the main and the parser-autogen
diff --git a/go/src/cli/README.md b/go/src/cli/README.md
index b46d9b5f1..44a679216 100644
--- a/go/src/cli/README.md
+++ b/go/src/cli/README.md
@@ -1,5 +1,5 @@
 Logic for parsing the Miller command line.
 
-* `src/miller/cli` is the flag-parsing logic for supporting Miller's command-line interface. When you type something like `mlr --icsv --ojson put '$sum = $a + $b' then filter '$sum > 1000' myfile.csv`, it's the CLI parser which makes it possible for Miller to construct a CSV record-reader, a transformer chain of `put` then `filter`, and a JSON record-writer.
-* `src/miller/cliutil` contains datatypes for the CLI-parser, which was split out to avoid a Go package-import cycle.
+* `src/cli` is the flag-parsing logic for supporting Miller's command-line interface. When you type something like `mlr --icsv --ojson put '$sum = $a + $b' then filter '$sum > 1000' myfile.csv`, it's the CLI parser which makes it possible for Miller to construct a CSV record-reader, a transformer chain of `put` then `filter`, and a JSON record-writer.
+* `src/cliutil` contains datatypes for the CLI-parser, which was split out to avoid a Go package-import cycle.
 * I don't use the Go [`flag`](https://golang.org/pkg/flag/) package here, although I do use it within the transformers' subcommand flag-handling. The `flag` package is quite fine; Miller's command-line processing is multi-purpose between serving CLI needs per se as well as for manpage/docfile generation, and I found it simplest to roll my own command-line handling here.
diff --git a/go/src/cliutil/README.md b/go/src/cliutil/README.md
index c8c398153..6c3ded04a 100644
--- a/go/src/cliutil/README.md
+++ b/go/src/cliutil/README.md
@@ -1,4 +1,4 @@
 Datatypes for parsing the Miller command line.
 
-* `src/miller/cli` is the flag-parsing logic for supporting Miller's command-line interface. When you type something like `mlr --icsv --ojson put '$sum = $a + $b' then filter '$sum > 1000' myfile.csv`, it's the CLI parser which makes it possible for Miller to construct a CSV record-reader, a transformer chain of `put` then `filter`, and a JSON record-writer.
-* `src/miller/cliutil` contains datatypes for the CLI-parser, which was split out to avoid a Go package-import cycle.
+* `src/cli` is the flag-parsing logic for supporting Miller's command-line interface. When you type something like `mlr --icsv --ojson put '$sum = $a + $b' then filter '$sum > 1000' myfile.csv`, it's the CLI parser which makes it possible for Miller to construct a CSV record-reader, a transformer chain of `put` then `filter`, and a JSON record-writer.
+* `src/cliutil` contains datatypes for the CLI-parser, which was split out to avoid a Go package-import cycle.
diff --git a/go/src/dsl/README.md b/go/src/dsl/README.md
index 951b7f7ef..12283b81d 100644
--- a/go/src/dsl/README.md
+++ b/go/src/dsl/README.md
@@ -97,5 +97,5 @@ tree is executed once on every data record.
 
 # Source directories/files
 
-* The AST logic is in `./ast*.go`. I didn't use a `src/miller/dsl/ast` naming convention, although that would have been nice, in order to avoid a Go package-dependency cycle.
+* The AST logic is in `./ast*.go`. I didn't use a `src/dsl/ast` naming convention, although that would have been nice, in order to avoid a Go package-dependency cycle.
 * The CST logic is in [`./cst`](./cst). Please see [cst/README.md](./cst/README.md) for more information.
diff --git a/go/src/dsl/cst/README.md b/go/src/dsl/cst/README.md
index 1d19e4957..2f3952dac 100644
--- a/go/src/dsl/cst/README.md
+++ b/go/src/dsl/cst/README.md
@@ -1,4 +1,4 @@
-See [go/src/miller/dsl/README.md](https://github.com/johnkerl/miller/blob/master/go/src/miller/dsl/README.md) for more information about Miller's use of abstract syntax trees (ASTs) and concrete syntax trees (CSTs) within the Miller `put`/`filter` domain-specific language (DSL).
+See [go/src/dsl/README.md](https://github.com/johnkerl/miller/blob/master/go/src/dsl/README.md) for more information about Miller's use of abstract syntax trees (ASTs) and concrete syntax trees (CSTs) within the Miller `put`/`filter` domain-specific language (DSL).
 
 ## Files
 
@@ -11,7 +11,7 @@ See [go/src/miller/dsl/README.md](https://github.com/johnkerl/miller/blob/master
 
 Go is a strongly typed language, but the AST is polymorphic. This results in if/else or switch statements as an AST is walked.
 
-Also, when we modify code, there can be changes in the [BNF grammar](../../parsing/mlr.bnf) not yet reflected in the [AST](../../src/miller/dsl/ast_types.go). Likewise, there can be AST changes not yet reflected here. (Example: you are partway through adding a new binary operator to the grammar.)
+Also, when we modify code, there can be changes in the [BNF grammar](../../parsing/mlr.bnf) not yet reflected in the [AST](../../src/dsl/ast_types.go). Likewise, there can be AST changes not yet reflected here. (Example: you are partway through adding a new binary operator to the grammar.)
 
 As a result, throughout the code, there are error checks which may seem redundant but which are in place to make incremental development more pleasant and robust.
 
diff --git a/go/src/parsing/README.md b/go/src/parsing/README.md
index 5a29c4a0b..870c5619a 100644
--- a/go/src/parsing/README.md
+++ b/go/src/parsing/README.md
@@ -1,3 +1,3 @@
 This directory contains a single source file, `mlr.bnf`, which is the lexical/semantic grammar file for the Miller `put`/`filter` DSL using the GOCC framework. (In a classical Lex/Yacc framework, there would be separate `mlr.l` and `mlr.y` files; using GOCC, there is a single `mlr.bnf` file.)
 
-All subdirectories of `src/miller/parsing/` are autogen code created by GOCC's processing of `mlr.bnf`. They are nonetheless committed to source control, since running GOCC takes quite a bit longer than the `go build mlr.go` does, and the BNF file doesn't often change. See the top-level `miller/go` build scripts for how to rerun GOCC. As of this writing, it's `bin/gocc -o src/miller/parsing src/miller/parsing/mlr.bnf` as invoked from the `miller/go` base directory.
+All subdirectories of `src/parsing/` are autogen code created by GOCC's processing of `mlr.bnf`. They are nonetheless committed to source control, since running GOCC takes quite a bit longer than the `go build mlr.go` does, and the BNF file doesn't often change. See the top-level `miller/go` build scripts for how to rerun GOCC. As of this writing, it's `bin/gocc -o src/parsing src/parsing/mlr.bnf` as invoked from the `miller/go` base directory.
diff --git a/go/src/parsing/errors.go.template b/go/src/parsing/errors.go.template
index f8984ac56..a152d5dde 100644
--- a/go/src/parsing/errors.go.template
+++ b/go/src/parsing/errors.go.template
@@ -3,8 +3,8 @@
 // over the top of GOCC codegen so that we can customize handling of error
 // messages.
 //
-// Source: src/miller/parsing/errors.go.template Destionation:
-// src/miller/parsing/errors/errors.go
+// Source: src/parsing/errors.go.template Destionation:
+// src/parsing/errors/errors.go
 // ================================================================
 
 package errors
diff --git a/go/src/parsing/errors/errors.go b/go/src/parsing/errors/errors.go
index f8984ac56..a152d5dde 100644
--- a/go/src/parsing/errors/errors.go
+++ b/go/src/parsing/errors/errors.go
@@ -3,8 +3,8 @@
 // over the top of GOCC codegen so that we can customize handling of error
 // messages.
 //
-// Source: src/miller/parsing/errors.go.template Destionation:
-// src/miller/parsing/errors/errors.go
+// Source: src/parsing/errors.go.template Destionation:
+// src/parsing/errors/errors.go
 // ================================================================
 
 package errors
diff --git a/go/src/parsing/mlr.bnf b/go/src/parsing/mlr.bnf
index 6da955770..53d67c8b0 100644
--- a/go/src/parsing/mlr.bnf
+++ b/go/src/parsing/mlr.bnf
@@ -37,7 +37,7 @@
 // interface{}/error since they are meant for nesting as arguments here
 // within this file.
 //
-// * Please see src/miller/dsl/ast*.go for more about what the <<...>>
+// * Please see src/dsl/ast*.go for more about what the <<...>>
 // code here is calling.
 // ================================================================
 
diff --git a/go/src/transformers/README.md b/go/src/transformers/README.md
index 04b56ed47..e9d3b4284 100644
--- a/go/src/transformers/README.md
+++ b/go/src/transformers/README.md
@@ -1,4 +1,4 @@
 Logic for transforming input records into output records as requested by the user (sort, filter, etc.).
 
-* `src/miller/transforming` contains the abstract record-transformer interface datatype, as well as the Go-channel chaining mechanism for piping one transformer into the next.
-* `src/miller/transformers` is all the concrete record-transformers such as `cat`, `tac`, `sort`, `put`, and so on. I put it here, not in `transforming`, so all files in `transformers` would be of the same type.
+* `src/transforming` contains the abstract record-transformer interface datatype, as well as the Go-channel chaining mechanism for piping one transformer into the next.
+* `src/transformers` is all the concrete record-transformers such as `cat`, `tac`, `sort`, `put`, and so on. I put it here, not in `transforming`, so all files in `transformers` would be of the same type.
diff --git a/go/src/transforming/README.md b/go/src/transforming/README.md
index 04b56ed47..e9d3b4284 100644
--- a/go/src/transforming/README.md
+++ b/go/src/transforming/README.md
@@ -1,4 +1,4 @@
 Logic for transforming input records into output records as requested by the user (sort, filter, etc.).
 
-* `src/miller/transforming` contains the abstract record-transformer interface datatype, as well as the Go-channel chaining mechanism for piping one transformer into the next.
-* `src/miller/transformers` is all the concrete record-transformers such as `cat`, `tac`, `sort`, `put`, and so on. I put it here, not in `transforming`, so all files in `transformers` would be of the same type.
+* `src/transforming` contains the abstract record-transformer interface datatype, as well as the Go-channel chaining mechanism for piping one transformer into the next.
+* `src/transformers` is all the concrete record-transformers such as `cat`, `tac`, `sort`, `put`, and so on. I put it here, not in `transforming`, so all files in `transformers` would be of the same type.
diff --git a/go/tools/mcountlines b/go/tools/mcountlines
index 1ca6bd46c..87ffd1b6a 100755
--- a/go/tools/mcountlines
+++ b/go/tools/mcountlines
@@ -2,13 +2,13 @@
 
 wc -l \
   $(find src -name \*.go | grep -v src/parsing) \
-  src/miller/parsing/mlr.bnf \
+  src/parsing/mlr.bnf \
   | sort -n
 
 echo
 
 wc -c \
   $(find src -name \*.go | grep -v src/parsing) \
-  src/miller/parsing/mlr.bnf \
+  src/parsing/mlr.bnf \
   | sort -n \
   | tail -n 5