New doc page: Parsing and formatting fields (#973)

2026-01-23 02:14:13 +00:00 · 2022-03-06 23:28:16 -05:00 · 2022-03-06 23:28:16 -05:00 · 1eae19421b
commit 1eae19421b
parent 9350fed34d
9 changed files with 596 additions and 216 deletions
--- a/docs/mkdocs.yml
+++ b/docs/mkdocs.yml
@ -59,11 +59,6 @@ nav:
    - "Two-pass algorithms": "two-pass-algorithms.md"
    - "Programming-language examples": "programming-examples.md"
    - "Miscellaneous examples": "misc-examples.md"
-  - 'Background':
-    - "Why?": "why.md"
-    - "Why call it Miller?": "etymology.md"
-    - "How original is Miller?": "originality.md"
-    - "Performance": "performance.md"
  - 'Main reference':
    - "Miller command structure": "reference-main-overview.md"
    - "Then-chaining": "reference-main-then-chaining.md"
@ -72,6 +67,7 @@ nav:
    - "In-place mode": "reference-main-in-place-processing.md"
    - "Number formatting": "reference-main-number-formatting.md"
    - "Separators": "reference-main-separators.md"
+    - "Parsing and formatting fields": "parsing-and-formatting-fields.md"
    - "Flatten/unflatten: converting between JSON and tabular formats": "flatten-unflatten.md"
    - "Sorting": "sorting.md"
    - "Streaming processing, and memory usage": "streaming-and-memory.md"
@ -103,6 +99,11 @@ nav:
    - "DSL errors and transparency": "reference-dsl-errors.md"
    - "Differences from other programming languages": "reference-dsl-differences.md"
    - "A note on the complexity of Miller's expression language": "reference-dsl-complexity.md"
+  - 'Background':
+    - "Why?": "why.md"
+    - "Why call it Miller?": "etymology.md"
+    - "How original is Miller?": "originality.md"
+    - "Performance": "performance.md"
  - 'Misc. reference':
    - "Auxiliary commands": "reference-main-auxiliary-commands.md"
    - "Manual page": "manpage.md"
--- a/docs/src/data/sec2dhms.csv
+++ b/docs/src/data/sec2dhms.csv
@ -0,0 +1,5 @@
+sec
+1
+100
+10000
+1000000
--- a/docs/src/data/split1.csv
+++ b/docs/src/data/split1.csv
@ -0,0 +1,3 @@
+name,nicknames,codes
+Alice,"Allie,Skater","1,3,5"
+Robert,"Bob,Bobby,Biker","2,4,6"
--- a/docs/src/data/split2.csv
+++ b/docs/src/data/split2.csv
@ -0,0 +1,5 @@
+stamp,event
+5-18:53:20,open
+5-18:53:22,close
+5-19:07:34,open
+5-19:07:56,close
--- a/docs/src/parsing-and-formatting-fields.md
+++ b/docs/src/parsing-and-formatting-fields.md
@ -0,0 +1,385 @@
+<!---  PLEASE DO NOT EDIT DIRECTLY. EDIT THE .md.in FILE PLEASE. --->
+<div>
+<span class="quicklinks">
+Quick links:
+&nbsp;
+<a class="quicklink" href="../reference-main-flag-list/index.html">Flags</a>
+&nbsp;
+<a class="quicklink" href="../reference-verbs/index.html">Verbs</a>
+&nbsp;
+<a class="quicklink" href="../reference-dsl-builtin-functions/index.html">Functions</a>
+&nbsp;
+<a class="quicklink" href="../glossary/index.html">Glossary</a>
+&nbsp;
+<a class="quicklink" href="../release-docs/index.html">Release docs</a>
+</span>
+</div>
+# Parsing and formatting fields
+
+Miller offers several ways to split strings into pieces (parsing them), and to put things together
+into a string (formatting them).
+
+## Splitting and joining with the same separator
+
+One pattern we often have is items separated by the same separator, e.g. a field with value
+`1;2;3;4` -- with a `;` between every pair of items. There are several useful
+[DSL](miller-programming-language.md) [functions](reference-dsl-builtin-functions.md) for splitting
+a string into pieces, and joining pieces into a string.
+
+For example, suppose we have a CSV file like this:
+
+<pre class="pre-highlight-in-pair">
+<b>cat data/split1.csv</b>
+</pre>
+<pre class="pre-non-highlight-in-pair">
+name,nicknames,codes
+Alice,"Allie,Skater","1,3,5"
+Robert,"Bob,Bobby,Biker","2,4,6"
+</pre>
+
+<pre class="pre-highlight-in-pair">
+<b>mlr --icsv --ojson cat data/split1.csv</b>
+</pre>
+<pre class="pre-non-highlight-in-pair">
+[
+{
+  "name": "Alice",
+  "nicknames": "Allie,Skater",
+  "codes": "1,3,5"
+},
+{
+  "name": "Robert",
+  "nicknames": "Bob,Bobby,Biker",
+  "codes": "2,4,6"
+}
+]
+</pre>
+
+Then we can use the [`splita`](reference-dsl-builtin-functions.md#splita) function to split the
+`nicknames` string into an array of strings:
+
+<pre class="pre-highlight-in-pair">
+<b>mlr --icsv --ojson --from data/split1.csv put '$nicknames = splita($nicknames, ",")'</b>
+</pre>
+<pre class="pre-non-highlight-in-pair">
+[
+{
+  "name": "Alice",
+  "nicknames": ["Allie", "Skater"],
+  "codes": "1,3,5"
+},
+{
+  "name": "Robert",
+  "nicknames": ["Bob", "Bobby", "Biker"],
+  "codes": "2,4,6"
+}
+]
+</pre>
+
+Likewise we can split the `codes` field. Since these look like numbers, we can again use `splita`,
+which type-infers ints and floats when it finds them. Alternatively, we can use
+[splitax](reference-dsl-builtin-functions.md#splitax) to ask for the string to be split up into
+substrings, with no type inference:
+
+<pre class="pre-highlight-in-pair">
+<b>mlr --icsv --ojson --from data/split1.csv put '$codes = splita($codes, ",")'</b>
+</pre>
+<pre class="pre-non-highlight-in-pair">
+[
+{
+  "name": "Alice",
+  "nicknames": "Allie,Skater",
+  "codes": [1, 3, 5]
+},
+{
+  "name": "Robert",
+  "nicknames": "Bob,Bobby,Biker",
+  "codes": [2, 4, 6]
+}
+]
+</pre>
+
+<pre class="pre-highlight-in-pair">
+<b>mlr --icsv --ojson --from data/split1.csv put '$codes = splitax($codes, ",")'</b>
+</pre>
+<pre class="pre-non-highlight-in-pair">
+[
+{
+  "name": "Alice",
+  "nicknames": "Allie,Skater",
+  "codes": ["1", "3", "5"]
+},
+{
+  "name": "Robert",
+  "nicknames": "Bob,Bobby,Biker",
+  "codes": ["2", "4", "6"]
+}
+]
+</pre>
+
+We can do operations on the array, then use [joinv](reference-dsl-builtin-functions.md#joinv) to put
+its elements back together into a string:
+
+<pre class="pre-highlight-in-pair">
+<b>mlr --icsv --ojson --from data/split1.csv put '</b>
+<b>  $codes = splita($codes, ",");                       # split into array of integers</b>
+<b>  $codes = apply($codes, func(e) { return e * 100 }); # do math on the array of integers</b>
+<b>  $codes = joinv($codes, ",");                        # join the updated array back into a string</b>
+<b>'</b>
+</pre>
+<pre class="pre-non-highlight-in-pair">
+[
+{
+  "name": "Alice",
+  "nicknames": "Allie,Skater",
+  "codes": "100,300,500"
+},
+{
+  "name": "Robert",
+  "nicknames": "Bob,Bobby,Biker",
+  "codes": "200,400,600"
+}
+]
+</pre>
+
+<pre class="pre-highlight-in-pair">
+<b>mlr --csv --from data/split1.csv put '</b>
+<b>  $codes = splita($codes, ",");                       # split into array of integers</b>
+<b>  $codes = apply($codes, func(e) { return e * 100 }); # do math on the array of integers</b>
+<b>  $codes = joinv($codes, ",");                        # join the updated array back into a string</b>
+<b>'</b>
+</pre>
+<pre class="pre-non-highlight-in-pair">
+name,nicknames,codes
+Alice,"Allie,Skater","100,300,500"
+Robert,"Bob,Bobby,Biker","200,400,600"
+</pre>
+
+The full list of split functions includes
+[splita](reference-dsl-builtin-functions.md#splita),
+[splitax](reference-dsl-builtin-functions.md#splitax),
+[splitkv](reference-dsl-builtin-functions.md#splitkv),
+[splitkvx](reference-dsl-builtin-functions.md#splitkvx),
+[splitnv](reference-dsl-builtin-functions.md#splitnv), and
+[splitnvx](reference-dsl-builtin-functions.md#splitnvx). The flavors have to do with what the output is
+-- arrays or maps -- and whether or not type-inference is done.
+
+The full list of join functions includes [joink](reference-dsl-builtin-functions.md#joink),
+[joinv](reference-dsl-builtin-functions.md#joinv), and
+[joinkv](reference-dsl-builtin-functions.md#joinkv). Here the flavors have to do with whether we put
+array/map keys, values, or both into the resulting string.
+
+## Example: shortening hostnames
+
+Suppose you want to just keep the first two components of the hostnames:
+
+<pre class="pre-highlight-in-pair">
+<b>cat data/hosts.csv</b>
+</pre>
+<pre class="pre-non-highlight-in-pair">
+host,status
+xy01.east.acme.org,up
+ab02.west.acme.org,down
+ac91.west.acme.org,up
+</pre>
+
+Using the [`splita`](reference-dsl-builtin-functions.md#splita) and
+[`joinv`](reference-dsl-builtin-functions.md#joinv) functions, along with
+[array slicing](reference-main-arrays.md#slicing), we get:
+
+<pre class="pre-highlight-in-pair">
+<b>mlr --csv --from data/hosts.csv put '$host = joinv(splita($host, ".")[1:2], ".")'</b>
+</pre>
+<pre class="pre-non-highlight-in-pair">
+host,status
+xy01.east,up
+ab02.west,down
+ac91.west,up
+</pre>
+
+## Flatten/unflatten: representing arrays in CSV
+
+In the above examples, when we split a string field into an array, we used JSON output. That's
+because JSON permits nested data structures. For CSV output, Miller uses, by default, a
+_flatten/unflatten strategy_: array-valued fields are turned into multiple CSV columns. For example:
+
+<pre class="pre-highlight-in-pair">
+<b>mlr --icsv --ojson --from data/split1.csv put '$codes = splitax($codes, ",")'</b>
+</pre>
+<pre class="pre-non-highlight-in-pair">
+[
+{
+  "name": "Alice",
+  "nicknames": "Allie,Skater",
+  "codes": ["1", "3", "5"]
+},
+{
+  "name": "Robert",
+  "nicknames": "Bob,Bobby,Biker",
+  "codes": ["2", "4", "6"]
+}
+]
+</pre>
+
+<pre class="pre-highlight-in-pair">
+<b>mlr --csv --from data/split1.csv put '$codes = splitax($codes, ",")'</b>
+</pre>
+<pre class="pre-non-highlight-in-pair">
+name,nicknames,codes.1,codes.2,codes.3
+Alice,"Allie,Skater",1,3,5
+Robert,"Bob,Bobby,Biker",2,4,6
+</pre>
+
+See the [flatten/unflatten: converting between JSON and tabular formats](flatten-unflatten.md)
+page for more on this default behavior, including how to override it when you prefer.
+
+## Splitting and joining with different separators
+
+The above is well and good when a string's pieces are all separated by the same separator.
+However, sometimes we have input like `5-18:53:20`, which mixes separators. Here we can use the more flexible
+[unformat](reference-dsl-builtin-functions.md#unformat) and
+[format](reference-dsl-builtin-functions.md#format) DSL functions.  (As above, there's an
+[unformatx](reference-dsl-builtin-functions.md#unformatx) function if you want Miller to just split
+the string into string pieces without trying to type-infer them.)
+
+<pre class="pre-highlight-in-pair">
+<b>cat data/split2.csv</b>
+</pre>
+<pre class="pre-non-highlight-in-pair">
+stamp,event
+5-18:53:20,open
+5-18:53:22,close
+5-19:07:34,open
+5-19:07:56,close
+</pre>
+
+<pre class="pre-highlight-in-pair">
+<b>mlr --icsv --ojson --from data/split2.csv put '$pieces = unformat("{}-{}:{}:{}", $stamp)'</b>
+</pre>
+<pre class="pre-non-highlight-in-pair">
+[
+{
+  "stamp": "5-18:53:20",
+  "event": "open",
+  "pieces": [5, 18, 53, 20]
+},
+{
+  "stamp": "5-18:53:22",
+  "event": "close",
+  "pieces": [5, 18, 53, 22]
+},
+{
+  "stamp": "5-19:07:34",
+  "event": "open",
+  "pieces": [5, 19, "07", 34]
+},
+{
+  "stamp": "5-19:07:56",
+  "event": "close",
+  "pieces": [5, 19, "07", 56]
+}
+]
+</pre>
+
+<pre class="pre-highlight-in-pair">
+<b>mlr --icsv --opprint --from data/split2.csv put '</b>
+<b>  pieces = unformat("{}-{}:{}:{}", $stamp);</b>
+<b>  $description = format("{} day(s) {} hour(s) {} minute(s) {} second(s)", pieces[1], pieces[2], pieces[3], pieces[4]);</b>
+<b>'</b>
+</pre>
+<pre class="pre-non-highlight-in-pair">
+stamp      event description
+5-18:53:20 open  5 day(s) 18 hour(s) 53 minute(s) 20 second(s)
+5-18:53:22 close 5 day(s) 18 hour(s) 53 minute(s) 22 second(s)
+5-19:07:34 open  5 day(s) 19 hour(s) 07 minute(s) 34 second(s)
+5-19:07:56 close 5 day(s) 19 hour(s) 07 minute(s) 56 second(s)
+</pre>
+
+## Using regular expressions and capture groups
+
+If you prefer [regular expressions](reference-main-regular-expressions.md), those can be used in this context as well:
+
+<pre class="pre-highlight-in-pair">
+<b>mlr --icsv --opprint --from data/split2.csv put '</b>
+<b>  if ($stamp =~ "([0-9]+)-([0-9]+):([0-9]+):([0-9]+)") {</b>
+<b>    $description = "\1 day(s) \2 hour(s) \3 minute(s) \4 second(s)";</b>
+<b>  }</b>
+<b>'</b>
+</pre>
+<pre class="pre-non-highlight-in-pair">
+stamp      event description
+5-18:53:20 open  5 day(s) 18 hour(s) 53 minute(s) 20 second(s)
+5-18:53:22 close 5 day(s) 18 hour(s) 53 minute(s) 22 second(s)
+5-19:07:34 open  5 day(s) 19 hour(s) 07 minute(s) 34 second(s)
+5-19:07:56 close 5 day(s) 19 hour(s) 07 minute(s) 56 second(s)
+</pre>
+
+## Special case: timestamps
+
+Timestamps are complex enough to merit their own handling: see the
+[DSL datetime/timezone functions page](reference-dsl-time.md), in particular the
+[strptime](reference-dsl-builtin-functions.md#strptime)
+and
+[strftime](reference-dsl-builtin-functions.md#strftime)
+functions.
+
+## Special case: dhms and seconds
+
+For historical reasons, Miller has a way to represent seconds in a more human-readable format, using days,
+hours, minutes, and seconds. For example:
+
+<pre class="pre-highlight-in-pair">
+<b>mlr --c2p --from data/sec2dhms.csv put '$dhms = sec2dhms($sec)'</b>
+</pre>
+<pre class="pre-non-highlight-in-pair">
+sec     dhms
+1       1s
+100     1m40s
+10000   2h46m40s
+1000000 11d13h46m40s
+</pre>
+
+Please see
+[sec2dhms](reference-dsl-builtin-functions.md#sec2dhms)
+and
+[dhms2sec](reference-dsl-builtin-functions.md#dhms2sec) for details.
+
+## Special case: financial values
+
+One way to handle currency values is to strip out the currency marker (like `$`) as well as any commas:
+
+<pre class="pre-highlight-in-pair">
+<b>echo 'd=$1234.56' | mlr put '$d = float(gsub(ssub($d, "$", ""), ",", ""))'</b>
+</pre>
+<pre class="pre-non-highlight-in-pair">
+d=1234.56
+</pre>
+
+## Nesting and unnesting fields
+
+Sometimes we want to split a string not into an array within a single record, but rather across multiple records.
+
+For example:
+
+<pre class="pre-highlight-in-pair">
+<b>mlr --c2p cat data/split1.csv</b>
+</pre>
+<pre class="pre-non-highlight-in-pair">
+name   nicknames       codes
+Alice  Allie,Skater    1,3,5
+Robert Bob,Bobby,Biker 2,4,6
+</pre>
+
+<pre class="pre-highlight-in-pair">
+<b>mlr --c2p nest --evar , -f nicknames data/split1.csv</b>
+</pre>
+<pre class="pre-non-highlight-in-pair">
+name   nicknames codes
+Alice  Allie     1,3,5
+Alice  Skater    1,3,5
+Robert Bob       2,4,6
+Robert Bobby     2,4,6
+Robert Biker     2,4,6
+</pre>
+
+See the [documentation on the nest verb](reference-verbs.md#nest) for general information on how to do this.
--- a/docs/src/parsing-and-formatting-fields.md.in
+++ b/docs/src/parsing-and-formatting-fields.md.in
@ -0,0 +1,190 @@
+# Parsing and formatting fields
+
+Miller offers several ways to split strings into pieces (parsing them), and to put things together
+into a string (formatting them).
+
+## Splitting and joining with the same separator
+
+One pattern we often have is items separated by the same separator, e.g. a field with value
+`1;2;3;4` -- with a `;` between every pair of items. There are several useful
+[DSL](miller-programming-language.md) [functions](reference-dsl-builtin-functions.md) for splitting
+a string into pieces, and joining pieces into a string.
+
+For example, suppose we have a CSV file like this:
+
+GENMD-RUN-COMMAND
+cat data/split1.csv
+GENMD-EOF
+
+GENMD-RUN-COMMAND
+mlr --icsv --ojson cat data/split1.csv
+GENMD-EOF
+
+Then we can use the [`splita`](reference-dsl-builtin-functions.md#splita) function to split the
+`nicknames` string into an array of strings:
+
+GENMD-RUN-COMMAND
+mlr --icsv --ojson --from data/split1.csv put '$nicknames = splita($nicknames, ",")'
+GENMD-EOF
+
+Likewise we can split the `codes` field. Since these look like numbers, we can again use `splita`,
+which type-infers ints and floats when it finds them. Alternatively, we can use
+[splitax](reference-dsl-builtin-functions.md#splitax) to ask for the string to be split up into
+substrings, with no type inference:
+
+GENMD-RUN-COMMAND
+mlr --icsv --ojson --from data/split1.csv put '$codes = splita($codes, ",")'
+GENMD-EOF
+
+GENMD-RUN-COMMAND
+mlr --icsv --ojson --from data/split1.csv put '$codes = splitax($codes, ",")'
+GENMD-EOF
+
+We can do operations on the array, then use [joinv](reference-dsl-builtin-functions.md#joinv) to put
+its elements back together into a string:
+
+GENMD-RUN-COMMAND
+mlr --icsv --ojson --from data/split1.csv put '
+  $codes = splita($codes, ",");                       # split into array of integers
+  $codes = apply($codes, func(e) { return e * 100 }); # do math on the array of integers
+  $codes = joinv($codes, ",");                        # join the updated array back into a string
+'
+GENMD-EOF
+
+GENMD-RUN-COMMAND
+mlr --csv --from data/split1.csv put '
+  $codes = splita($codes, ",");                       # split into array of integers
+  $codes = apply($codes, func(e) { return e * 100 }); # do math on the array of integers
+  $codes = joinv($codes, ",");                        # join the updated array back into a string
+'
+GENMD-EOF
+
+The full list of split functions includes
+[splita](reference-dsl-builtin-functions.md#splita),
+[splitax](reference-dsl-builtin-functions.md#splitax),
+[splitkv](reference-dsl-builtin-functions.md#splitkv),
+[splitkvx](reference-dsl-builtin-functions.md#splitkvx),
+[splitnv](reference-dsl-builtin-functions.md#splitnv), and
+[splitnvx](reference-dsl-builtin-functions.md#splitnvx). The flavors have to do with what the output is
+-- arrays or maps -- and whether or not type-inference is done.
+
+The full list of join functions includes [joink](reference-dsl-builtin-functions.md#joink),
+[joinv](reference-dsl-builtin-functions.md#joinv), and
+[joinkv](reference-dsl-builtin-functions.md#joinkv). Here the flavors have to do with whether we put
+array/map keys, values, or both into the resulting string.
+
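+As a quick sketch of the key-value flavors: the input string below is made up for illustration,
+and `mlr -n` is used so that no input file is needed.
+
+GENMD-RUN-COMMAND
+mlr -n put '
+  end {
+    m = splitkv("a=1,b=2,c=3", "=", ","); # hypothetical input: split into a map, with type inference
+    print joink(m, ",");                  # keys only
+    print joinv(m, ",");                  # values only
+    print joinkv(m, "=", ",");            # keys and values
+  }
+'
+GENMD-EOF
+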
+## Example: shortening hostnames
+
+Suppose you want to just keep the first two components of the hostnames:
+
+GENMD-RUN-COMMAND
+cat data/hosts.csv
+GENMD-EOF
+
+Using the [`splita`](reference-dsl-builtin-functions.md#splita) and
+[`joinv`](reference-dsl-builtin-functions.md#joinv) functions, along with
+[array slicing](reference-main-arrays.md#slicing), we get:
+
+GENMD-RUN-COMMAND
+mlr --csv --from data/hosts.csv put '$host = joinv(splita($host, ".")[1:2], ".")'
+GENMD-EOF
+
+## Flatten/unflatten: representing arrays in CSV
+
+In the above examples, when we split a string field into an array, we used JSON output. That's
+because JSON permits nested data structures. For CSV output, Miller uses, by default, a
+_flatten/unflatten strategy_: array-valued fields are turned into multiple CSV columns. For example:
+
+GENMD-RUN-COMMAND
+mlr --icsv --ojson --from data/split1.csv put '$codes = splitax($codes, ",")'
+GENMD-EOF
+
+GENMD-RUN-COMMAND
+mlr --csv --from data/split1.csv put '$codes = splitax($codes, ",")'
+GENMD-EOF
+
+See the [flatten/unflatten: converting between JSON and tabular formats](flatten-unflatten.md)
+page for more on this default behavior, including how to override it when you prefer.
+
+## Splitting and joining with different separators
+
+The above is well and good when a string's pieces are all separated by the same separator.
+However, sometimes we have input like `5-18:53:20`, which mixes separators. Here we can use the more flexible
+[unformat](reference-dsl-builtin-functions.md#unformat) and
+[format](reference-dsl-builtin-functions.md#format) DSL functions.  (As above, there's an
+[unformatx](reference-dsl-builtin-functions.md#unformatx) function if you want Miller to just split
+the string into string pieces without trying to type-infer them.)
+
+GENMD-RUN-COMMAND
+cat data/split2.csv
+GENMD-EOF
+
+GENMD-RUN-COMMAND
+mlr --icsv --ojson --from data/split2.csv put '$pieces = unformat("{}-{}:{}:{}", $stamp)'
+GENMD-EOF
+
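+For comparison, here is a sketch of the same split using `unformatx`, which should leave every piece
+as a string, with no type inference. The `pieces` field name is reused from the example above.
+
+GENMD-RUN-COMMAND
+mlr --icsv --ojson --from data/split2.csv put '$pieces = unformatx("{}-{}:{}:{}", $stamp)'
+GENMD-EOF
+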
+GENMD-RUN-COMMAND
+mlr --icsv --opprint --from data/split2.csv put '
+  pieces = unformat("{}-{}:{}:{}", $stamp);
+  $description = format("{} day(s) {} hour(s) {} minute(s) {} second(s)", pieces[1], pieces[2], pieces[3], pieces[4]);
+'
+GENMD-EOF
+
+## Using regular expressions and capture groups
+
+If you prefer [regular expressions](reference-main-regular-expressions.md), those can be used in this context as well:
+
+GENMD-RUN-COMMAND
+mlr --icsv --opprint --from data/split2.csv put '
+  if ($stamp =~ "([0-9]+)-([0-9]+):([0-9]+):([0-9]+)") {
+    $description = "\1 day(s) \2 hour(s) \3 minute(s) \4 second(s)";
+  }
+'
+GENMD-EOF
+
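+Capture groups can also be used directly in the replacement argument of
+[sub](reference-dsl-builtin-functions.md#sub). The following is a sketch only; the `hms` output
+field is a name chosen for illustration.
+
+GENMD-RUN-COMMAND
+mlr --icsv --opprint --from data/split2.csv put '
+  # Drop the leading day count, keeping hours:minutes:seconds; "hms" is a hypothetical field name
+  $hms = sub($stamp, "^[0-9]+-(.*)$", "\1");
+'
+GENMD-EOF
+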
+## Special case: timestamps
+
+Timestamps are complex enough to merit their own handling: see the
+[DSL datetime/timezone functions page](reference-dsl-time.md), in particular the
+[strptime](reference-dsl-builtin-functions.md#strptime)
+and
+[strftime](reference-dsl-builtin-functions.md#strftime)
+functions.
+
+## Special case: dhms and seconds
+
+For historical reasons, Miller has a way to represent seconds in a more human-readable format, using days,
+hours, minutes, and seconds. For example:
+
+GENMD-RUN-COMMAND
+mlr --c2p --from data/sec2dhms.csv put '$dhms = sec2dhms($sec)'
+GENMD-EOF
+
+Please see
+[sec2dhms](reference-dsl-builtin-functions.md#sec2dhms)
+and
+[dhms2sec](reference-dsl-builtin-functions.md#dhms2sec) for details.
+
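+The conversion also goes the other way. As a sketch, `dhms2sec` applied to the last `dhms` value
+from the example above should recover the original seconds count. The input string is copied from
+that output.
+
+GENMD-RUN-COMMAND
+mlr -n put 'end { print dhms2sec("11d13h46m40s") }'
+GENMD-EOF
+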
+## Special case: financial values
+
+One way to handle currency values is to strip out the currency marker (like `$`) as well as any commas:
+
+GENMD-RUN-COMMAND
+echo 'd=$1234.56' | mlr put '$d = float(gsub(ssub($d, "$", ""), ",", ""))'
+GENMD-EOF
+
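+The input above has no thousands separator. Here is a sketch with one, using a string literal
+(chosen for illustration) rather than field data, since in Miller's default DKVP format a comma in
+the input would be read as the field separator:
+
+GENMD-RUN-COMMAND
+mlr -n put 'end { print float(gsub(ssub("$1,234.56", "$", ""), ",", "")) }'
+GENMD-EOF
+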
+## Nesting and unnesting fields
+
+Sometimes we want to split a string not into an array within a single record, but rather across multiple records.
+
+For example:
+
+GENMD-RUN-COMMAND
+mlr --c2p cat data/split1.csv
+GENMD-EOF
+
+GENMD-RUN-COMMAND
+mlr --c2p nest --evar , -f nicknames data/split1.csv
+GENMD-EOF
+
+See the [documentation on the nest verb](reference-verbs.md#nest) for general information on how to do this.
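+
+Going the other way, as a sketch: the `nest --ivar` flag implodes values across records back into a
+single field. Here the `;` separator is chosen only to make the round trip visible in the output.
+
+GENMD-RUN-COMMAND
+mlr --c2p nest --evar , -f nicknames then nest --ivar ";" -f nicknames data/split1.csv
+GENMD-EOF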
--- a/docs/src/shapes-of-data.md
+++ b/docs/src/shapes-of-data.md
@ -302,133 +302,6 @@ yellow,circle,true,9,87,63.5058,8.3350,3

 The difference is a matter of taste (although `mlr cat -n` puts the counter first).

-## Splitting a string and taking a few of the components
-
-Suppose you want to just keep the first two components of the hostnames:
-
-<pre class="pre-highlight-in-pair">
-<b>cat data/hosts.csv</b>
-</pre>
-<pre class="pre-non-highlight-in-pair">
-host,status
-xy01.east.acme.org,up
-ab02.west.acme.org,down
-ac91.west.acme.org,up
-</pre>
-
-Using the [`splita`](reference-dsl-builtin-functions.md#splita) and
-[`joinv`](reference-dsl-builtin-functions.md#joinv) functions, along with
-[array slicing](reference-main-arrays.md#slicing), we get
-
-<pre class="pre-highlight-in-pair">
-<b>mlr --csv --from data/hosts.csv put '$host = joinv(splita($host, ".")[1:2], ".")'</b>
-</pre>
-<pre class="pre-non-highlight-in-pair">
-host,status
-xy01.east,up
-ab02.west,down
-ac91.west,up
-</pre>
-
-## Splitting nested fields
-
-Suppose you have a TSV file like this:
-
-<pre class="pre-non-highlight-non-pair">
-a	b
-x	z
-s	u:v:w
-</pre>
-
-The simplest option is to use [nest](reference-verbs.md#nest):
-
-<pre class="pre-highlight-in-pair">
-<b>mlr --tsv nest --explode --values --across-records -f b --nested-fs : data/nested.tsv</b>
-</pre>
-<pre class="pre-non-highlight-in-pair">
-a	b
-x	z
-s	u
-s	v
-s	w
-</pre>
-
-<pre class="pre-highlight-in-pair">
-<b>mlr --tsv nest --explode --values --across-fields  -f b --nested-fs : data/nested.tsv</b>
-</pre>
-<pre class="pre-non-highlight-in-pair">
-a	b_1
-x	z
-
-a	b_1	b_2	b_3
-s	u	v	w
-</pre>
-
-While `mlr nest` is simplest, let's also take a look at a few ways to do this using the `put` DSL.
-
-One option to split out the colon-delimited values in the `b` column is to use `splitnv` to create an integer-indexed map and loop over it, adding new fields to the current record:
-
-<pre class="pre-highlight-in-pair">
-<b>mlr --from data/nested.tsv --itsv --oxtab put '</b>
-<b>  o = splitnv($b, ":");</b>
-<b>  for (k,v in o) {</b>
-<b>    $["p".k]=v</b>
-<b>  }</b>
-<b>'</b>
-</pre>
-<pre class="pre-non-highlight-in-pair">
-a  x
-b  z
-p1 z
-
-a  s
-b  u:v:w
-p1 u
-p2 v
-p3 w
-</pre>
-
-while another is to loop over the same map from `splitnv` and use it (with `put -q` to suppress printing the original record) to produce multiple records:
-
-<pre class="pre-highlight-in-pair">
-<b>mlr --from data/nested.tsv --itsv --oxtab put -q '</b>
-<b>  o = splitnv($b, ":");</b>
-<b>  for (k,v in o) {</b>
-<b>    x = mapsum($*, {"b":v});</b>
-<b>    emit x</b>
-<b>  }</b>
-<b>'</b>
-</pre>
-<pre class="pre-non-highlight-in-pair">
-a x
-b z
-
-a s
-b u
-
-a s
-b v
-
-a s
-b w
-</pre>
-
-<pre class="pre-highlight-in-pair">
-<b>mlr --from data/nested.tsv --tsv put -q '</b>
-<b>  o = splitnv($b, ":");</b>
-<b>  for (k,v in o) {</b>
-<b>    x = mapsum($*, {"b":v}); emit x</b>
-<b>  }</b>
-<b>'</b>
-</pre>
-<pre class="pre-non-highlight-in-pair">
-a	b
-x	z
-s	u
-s	v
-s	w
-</pre>
-
 ## Options for dealing with duplicate rows

 If your data has records appearing multiple times, you can use [uniq](reference-verbs.md#uniq) to show and/or count the unique records.
--- a/docs/src/shapes-of-data.md.in
+++ b/docs/src/shapes-of-data.md.in
@ -168,72 +168,6 @@ GENMD-EOF

 The difference is a matter of taste (although `mlr cat -n` puts the counter first).

-## Splitting a string and taking a few of the components
-
-Suppose you want to just keep the first two components of the hostnames:
-
-GENMD-RUN-COMMAND
-cat data/hosts.csv
-GENMD-EOF
-
-Using the [`splita`](reference-dsl-builtin-functions.md#splita) and
-[`joinv`](reference-dsl-builtin-functions.md#joinv) functions, along with
-[array slicing](reference-main-arrays.md#slicing), we get
-
-GENMD-RUN-COMMAND
-mlr --csv --from data/hosts.csv put '$host = joinv(splita($host, ".")[1:2], ".")'
-GENMD-EOF
-
-## Splitting nested fields
-
-Suppose you have a TSV file like this:
-
-GENMD-INCLUDE-ESCAPED(data/nested.tsv)
-
-The simplest option is to use [nest](reference-verbs.md#nest):
-
-GENMD-RUN-COMMAND
-mlr --tsv nest --explode --values --across-records -f b --nested-fs : data/nested.tsv
-GENMD-EOF
-
-GENMD-RUN-COMMAND
-mlr --tsv nest --explode --values --across-fields  -f b --nested-fs : data/nested.tsv
-GENMD-EOF
-
-While `mlr nest` is simplest, let's also take a look at a few ways to do this using the `put` DSL.
-
-One option to split out the colon-delimited values in the `b` column is to use `splitnv` to create an integer-indexed map and loop over it, adding new fields to the current record:
-
-GENMD-RUN-COMMAND
-mlr --from data/nested.tsv --itsv --oxtab put '
-  o = splitnv($b, ":");
-  for (k,v in o) {
-    $["p".k]=v
-  }
-'
-GENMD-EOF
-
-while another is to loop over the same map from `splitnv` and use it (with `put -q` to suppress printing the original record) to produce multiple records:
-
-GENMD-RUN-COMMAND
-mlr --from data/nested.tsv --itsv --oxtab put -q '
-  o = splitnv($b, ":");
-  for (k,v in o) {
-    x = mapsum($*, {"b":v});
-    emit x
-  }
-'
-GENMD-EOF
-
-GENMD-RUN-COMMAND
-mlr --from data/nested.tsv --tsv put -q '
-  o = splitnv($b, ":");
-  for (k,v in o) {
-    x = mapsum($*, {"b":v}); emit x
-  }
-'
-GENMD-EOF
-
 ## Options for dealing with duplicate rows

 If your data has records appearing multiple times, you can use [uniq](reference-verbs.md#uniq) to show and/or count the unique records.
--- a/todo.txt
+++ b/todo.txt
@ -2,8 +2,8 @@
 RELEASES
 * plan 6.1.0
  o unsparsify -f CSV by default -- ? into CSV record-writer -- ? caveat that record 1 controls all ...
-  o fmt/unfmt/regex doc
-  o FAQ/examples reorg
+    - \d etc to DSL :( -- & parsing-and-formatting-fields.md.in
+  o mlr split -- needs an example page along with the tee DSL function

  o https://github.com/johnkerl/miller/issues?q=is%3Aissue+is%3Aopen+label%3Aneeds-documentation

@ -121,22 +121,6 @@ strict-mode ideas
 * srec:
  o abend unless $?x -- ?

----------------------------------------------------------------
-mlr join --left-fields a,b,c
-
----------------------------------------------------------------
-! Better functions for values manipulation, e.g. easier conversion of strings like "$1,234.56" into numeric values
-  o note on is_error(x) (or string(x) == "(error)")
-  ?  dhms w/ optional separgs -- ? what about fenceposting? ["d","h","m","s"] vs ["-",":",":",""] -- ?
-  o 'Ability to specify some formats that are fixed. Like we can process
-    "5d18h53m20s" format in *dhms* commands, but what about "5-18:53:20"? This is
-    a common format used by the SLURM resource manager.'
-  o linked-to faqent w/ -f -s etc ...
-
----------------------------------------------------------------
-k better print-interpolate with {} etc
-  doc: mlr --csv --from example.csv  put -q 'print format("Index {} at quantity {} and rate {}", $index, $quantity, $rate)'
-
 ----------------------------------------------------------------
 ! sysdate, sysdate_local; datediff ...