diff --git a/.gitignore b/.gitignore
index 9106aea83..3c1a41203 100644
--- a/.gitignore
+++ b/.gitignore
@@ -62,7 +62,6 @@ tags
*~
.deps/
.libs/
-Makefile
config.h
config.log
config.status
@@ -94,3 +93,25 @@ msys-2.0.dll
data/big.*
data/nmc?.*
+
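+# Autoconf-generated Makefiles; the new Sphinx makefile at docs/Makefile stays tracked.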
+/Makefile
+c/Makefile
+c/auxents/Makefile
+c/cli/Makefile
+c/containers/Makefile
+c/dsl/Makefile
+c/experimental/Makefile
+c/input/Makefile
+c/lib/Makefile
+c/mapping/Makefile
+c/output/Makefile
+c/parsing/Makefile
+c/reg_test/Makefile
+c/reg_test/expected/Makefile
+c/reg_test/input/Makefile
+c/reg_test/input/comments/Makefile
+c/reg_test/input/rfc-csv/Makefile
+c/stream/Makefile
+c/unit_test/Makefile
+doc/Makefile
diff --git a/docs/Makefile b/docs/Makefile
new file mode 100644
index 000000000..9f9aa64ac
--- /dev/null
+++ b/docs/Makefile
@@ -0,0 +1,23 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS ?=
+SPHINXBUILD ?= sphinx-build
+SOURCEDIR = .
+BUILDDIR = _build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	./genrst
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
diff --git a/docs/README.md b/docs/README.md
index 2a18c3941..240fbaf3c 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -37,6 +37,8 @@
## To do
+* separate install from build; move the latter to a reference section
+* unix-toolkit context: needs a leading paragraph
* Let's all discuss if/how we want the v2 docs to be structured better than the v1 docs.
* !! cross-references all need work !!
* Scan for hrefs and other non-ported markup
diff --git a/docs/cookbook.rst b/docs/cookbook.rst
index 84fe44f19..261f2ed98 100644
--- a/docs/cookbook.rst
+++ b/docs/cookbook.rst
@@ -207,7 +207,7 @@ How to do ``$name = gsub($name, "old", "new")`` for all fields?
Full field renames and reassigns
----------------------------------------------------------------
-Using Miller 5.0.0's map literals and assigning to ``$*``, you can fully generalize ``mlr rename``, ``mlr reorder``, etc.:
+Using Miller 5.0.0's map literals and assigning to ``$*``, you can fully generalize :ref:`mlr rename <reference-verbs-rename>`, :ref:`mlr reorder <reference-verbs-reorder>`, etc.
::
@@ -293,10 +293,9 @@ The difference is a matter of taste (although ``mlr cat -n`` puts the counter fi
Options for dealing with duplicate rows
----------------------------------------------------------------
-If your data has records appearing multiple times, you can use mlr uniq to show and/or count the unique
-records.
+If your data has records appearing multiple times, you can use :ref:`mlr uniq <reference-verbs-uniq>` to show and/or count the unique records.
-If you want to look at partial uniqueness -- for example, show only the first record for each unique combination of the ``account_id`` and ``account_status`` fields -- you might use ``mlr head -n 1 -g account_id,account_status``. Please also see mlr head.
+If you want to look at partial uniqueness -- for example, show only the first record for each unique combination of the ``account_id`` and ``account_status`` fields -- you might use ``mlr head -n 1 -g account_id,account_status``. Please also see :ref:`mlr head <reference-verbs-head>`.
.. _cookbook-data-cleaning-examples:
@@ -379,7 +378,7 @@ Suppose you have a TSV file like this:
x z
s u:v:w
-The simplest option is to use ``mlr nest``:
+The simplest option is to use :ref:`mlr nest <reference-verbs-nest>`:
::
@@ -1009,7 +1008,7 @@ There are field names ``a``, ``b``, ``v``, ``u``, ``x``, ``w`` in the data -- bu
1 - 2 - 3 -
 - - 1 - - 2
-There is a keystroke-saving verb for this: ``mlr unsparsify``.
+There is a keystroke-saving verb for this: :ref:`mlr unsparsify <reference-verbs-unsparsify>`.
Parsing log-file output
----------------------------------------------------------------
diff --git a/docs/cookbook.rst.in b/docs/cookbook.rst.in
index 42a4f1e21..58557c4d5 100644
--- a/docs/cookbook.rst.in
+++ b/docs/cookbook.rst.in
@@ -112,7 +112,7 @@ POKI_RUN_COMMAND{{mlr --csv put -f data/sar.mlr data/sar.csv}}HERE
Full field renames and reassigns
----------------------------------------------------------------
-Using Miller 5.0.0's map literals and assigning to ``$*``, you can fully generalize ``mlr rename``, ``mlr reorder``, etc.:
+Using Miller 5.0.0's map literals and assigning to ``$*``, you can fully generalize :ref:`mlr rename <reference-verbs-rename>`, :ref:`mlr reorder <reference-verbs-reorder>`, etc.
::
@@ -158,10 +158,9 @@ The difference is a matter of taste (although ``mlr cat -n`` puts the counter fi
Options for dealing with duplicate rows
----------------------------------------------------------------
-If your data has records appearing multiple times, you can use mlr uniq to show and/or count the unique
-records.
+If your data has records appearing multiple times, you can use :ref:`mlr uniq <reference-verbs-uniq>` to show and/or count the unique records.
-If you want to look at partial uniqueness -- for example, show only the first record for each unique combination of the ``account_id`` and ``account_status`` fields -- you might use ``mlr head -n 1 -g account_id,account_status``. Please also see mlr head.
+If you want to look at partial uniqueness -- for example, show only the first record for each unique combination of the ``account_id`` and ``account_status`` fields -- you might use ``mlr head -n 1 -g account_id,account_status``. Please also see :ref:`mlr head <reference-verbs-head>`.
.. _cookbook-data-cleaning-examples:
@@ -211,7 +210,7 @@ Suppose you have a TSV file like this:
POKI_INCLUDE_ESCAPED(data/nested.tsv)HERE
-The simplest option is to use ``mlr nest``:
+The simplest option is to use :ref:`mlr nest <reference-verbs-nest>`:
::
@@ -461,7 +460,7 @@ POKI_RUN_COMMAND{{mlr --ijson --ocsv put -q -f data/unsparsify.mlr data/sparse.j
POKI_RUN_COMMAND{{mlr --ijson --opprint put -q -f data/unsparsify.mlr data/sparse.json}}HERE
-There is a keystroke-saving verb for this: ``mlr unsparsify``.
+There is a keystroke-saving verb for this: :ref:`mlr unsparsify <reference-verbs-unsparsify>`.
Parsing log-file output
----------------------------------------------------------------
diff --git a/docs/cookbook3.rst b/docs/cookbook3.rst
index 93b54d63f..1537c4cd6 100644
--- a/docs/cookbook3.rst
+++ b/docs/cookbook3.rst
@@ -68,9 +68,9 @@ or
The former (``mlr stats1`` et al.) has the advantages of being easier to type, being less error-prone to type, and running faster.
-Nonetheless, out-of-stream variables (which I whimsically call *oosvars*), begin/end blocks, and emit statements give you the ability to implement logic -- if you wish to do so -- which isn't present in other Miller verbs. (If you find yourself often using the same out-of-stream-variable logic over and over, please file a request at https://github.com/johnkerl/miller/issues to get it implemented directly in C as a Miller verb of its own.)
+Nonetheless, out-of-stream variables (which I whimsically call *oosvars*), begin/end blocks, and emit statements give you the ability to implement logic -- if you wish to do so -- which isn't present in other Miller verbs. (If you find yourself often using the same out-of-stream-variable logic over and over, please file a request at https://github.com/johnkerl/miller/issues to get it implemented directly in C as a Miller verb of its own.)
-The following examples compute some things using oosvars which are already computable using Miller verbs, by way of providing food for thought.
+The following examples compute some things using oosvars which are already computable using Miller verbs, by way of providing food for thought.
Mean without/with oosvars
----------------------------------------------------------------
diff --git a/docs/faq.rst b/docs/faq.rst
index eb8426635..db412d134 100644
--- a/docs/faq.rst
+++ b/docs/faq.rst
@@ -158,7 +158,7 @@ The same conversion rules as above are being used. Namely:
Taken individually the rules make sense; taken collectively they produce a mishmash of types here.
-The solution is to **use the -S flag** for ``mlr put`` and/or ``mlr filter``. Then all field values are left as string. You can type-coerce on demand using syntax like ``'$z = int($x) + float($y)'``. (See also the put documentation; see also https://github.com/johnkerl/miller/issues/150.)
+The solution is to **use the -S flag** for ``mlr put`` and/or ``mlr filter``. Then all field values are left as string. You can type-coerce on demand using syntax like ``'$z = int($x) + float($y)'``. (See also :doc:`reference-dsl`; see also https://github.com/johnkerl/miller/issues/150.)
::
@@ -282,7 +282,7 @@ Given input like
2018-03-07,discovery
2018-02-03,allocation
-we can use ``strptime`` to parse the date field into seconds-since-epoch and then do numeric comparisons. Simply match your input dataset's date-formatting to the strptime format-string. For example:
+we can use ``strptime`` to parse the date field into seconds-since-epoch and then do numeric comparisons. Simply match your input dataset's date-formatting to the :ref:`reference-dsl-strptime` format-string. For example:
::
@@ -304,7 +304,7 @@ How can I handle commas-as-data in various formats?
"Xiao, Lin",administrator
"Khavari, Darius",tester
-Likewise JSON:
+Likewise :ref:`file-formats-json`:
::
@@ -312,7 +312,7 @@ Likewise JSON:
{ "Name": "Xiao, Lin", "Role": "administrator" }
{ "Name": "Khavari, Darius", "Role": "tester" }
-For Miller's XTAB there is no escaping for carriage returns, but commas work fine:
+For Miller's :ref:`vertical-tabular format <file-formats-xtab>` there is no escaping for carriage returns, but commas work fine:
::
@@ -323,7 +323,7 @@ For Miller's XTAB there is no escaping for carriage returns, but commas work fin
Name Khavari, Darius
Role tester
-But for DKVP and NIDX, commas are the default field separator. And -- as of Miller 5.4.0 anyway -- there is no CSV-style double-quote-handling like there is for CSV. So commas within the data look like delimiters:
+But for :ref:`key-value pairs <file-formats-dkvp>` and :ref:`index-numbered <file-formats-nidx>`, commas are the default field separator. And -- as of Miller 5.4.0 anyway -- there is no CSV-style double-quote-handling like there is for CSV. So commas within the data look like delimiters:
::
diff --git a/docs/faq.rst.in b/docs/faq.rst.in
index c1f88de4c..8fbca2c11 100644
--- a/docs/faq.rst.in
+++ b/docs/faq.rst.in
@@ -67,7 +67,7 @@ The same conversion rules as above are being used. Namely:
Taken individually the rules make sense; taken collectively they produce a mishmash of types here.
-The solution is to **use the -S flag** for ``mlr put`` and/or ``mlr filter``. Then all field values are left as string. You can type-coerce on demand using syntax like ``'$z = int($x) + float($y)'``. (See also the put documentation; see also https://github.com/johnkerl/miller/issues/150.)
+The solution is to **use the -S flag** for ``mlr put`` and/or ``mlr filter``. Then all field values are left as string. You can type-coerce on demand using syntax like ``'$z = int($x) + float($y)'``. (See also :doc:`reference-dsl`; see also https://github.com/johnkerl/miller/issues/150.)
::
@@ -146,7 +146,7 @@ Given input like
POKI_RUN_COMMAND{{cat dates.csv}}HERE
-we can use ``strptime`` to parse the date field into seconds-since-epoch and then do numeric comparisons. Simply match your input dataset's date-formatting to the strptime format-string. For example:
+we can use ``strptime`` to parse the date field into seconds-since-epoch and then do numeric comparisons. Simply match your input dataset's date-formatting to the :ref:`reference-dsl-strptime` format-string. For example:
::
@@ -163,19 +163,19 @@ How can I handle commas-as-data in various formats?
POKI_RUN_COMMAND{{cat commas.csv}}HERE
-Likewise JSON:
+Likewise :ref:`file-formats-json`:
::
POKI_RUN_COMMAND{{mlr --icsv --ojson cat commas.csv}}HERE
-For Miller's XTAB there is no escaping for carriage returns, but commas work fine:
+For Miller's :ref:`vertical-tabular format <file-formats-xtab>` there is no escaping for carriage returns, but commas work fine:
::
POKI_RUN_COMMAND{{mlr --icsv --oxtab cat commas.csv}}HERE
-But for DKVP and NIDX, commas are the default field separator. And -- as of Miller 5.4.0 anyway -- there is no CSV-style double-quote-handling like there is for CSV. So commas within the data look like delimiters:
+But for :ref:`key-value pairs <file-formats-dkvp>` and :ref:`index-numbered <file-formats-nidx>`, commas are the default field separator. And -- as of Miller 5.4.0 anyway -- there is no CSV-style double-quote-handling like there is for CSV. So commas within the data look like delimiters:
::
diff --git a/docs/file-formats.rst b/docs/file-formats.rst
index 533e6906b..d2f10ab5b 100644
--- a/docs/file-formats.rst
+++ b/docs/file-formats.rst
@@ -72,6 +72,8 @@ Examples
| | 4 | 5 | 6 | | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
 +-----------------------+
+.. _file-formats-csv:
+
CSV/TSV/ASV/USV/etc.
----------------------------------------------------------------
@@ -108,6 +110,8 @@ Here are things they have in common:
* The ``--implicit-csv-header`` flag for input and the ``--headerless-csv-output`` flag for output.
+.. _file-formats-dkvp:
+
DKVP: Key-value pairs
----------------------------------------------------------------
@@ -151,6 +155,8 @@ to analyze my logs.
See :doc:`reference` regarding how to specify separators other than the default equals-sign and comma.
+.. _file-formats-nidx:
+
NIDX: Index-numbered (toolkit style)
----------------------------------------------------------------
@@ -202,6 +208,8 @@ Example with index-numbered input and output:
the dawn's
light
+.. _file-formats-json:
+
Tabular JSON
----------------------------------------------------------------
@@ -379,6 +387,8 @@ JSON non-streaming
The JSON parser Miller uses does not return until all input is parsed: in particular this means that, unlike for other file formats, Miller does not (at present) handle JSON files in ``tail -f`` contexts.
+.. _file-formats-pprint:
+
PPRINT: Pretty-printed tabular
----------------------------------------------------------------
@@ -419,6 +429,8 @@ For output only (this isn't supported in the input-scanner as of 5.0.0) you can
| wye | pan | 5 | 0.5732889198020006 | 0.8636244699032729 |
 +-----+-----+---+---------------------+---------------------+
+.. _file-formats-xtab:
+
XTAB: Vertical tabular
----------------------------------------------------------------
diff --git a/docs/file-formats.rst.in b/docs/file-formats.rst.in
index 0549b9577..7d05718df 100644
--- a/docs/file-formats.rst.in
+++ b/docs/file-formats.rst.in
@@ -10,6 +10,8 @@ Examples
POKI_RUN_COMMAND{{mlr --usage-data-format-examples}}HERE
+.. _file-formats-csv:
+
CSV/TSV/ASV/USV/etc.
----------------------------------------------------------------
@@ -46,6 +48,8 @@ Here are things they have in common:
* The ``--implicit-csv-header`` flag for input and the ``--headerless-csv-output`` flag for output.
+.. _file-formats-dkvp:
+
DKVP: Key-value pairs
----------------------------------------------------------------
@@ -84,6 +88,8 @@ to analyze my logs.
See :doc:`reference` regarding how to specify separators other than the default equals-sign and comma.
+.. _file-formats-nidx:
+
NIDX: Index-numbered (toolkit style)
----------------------------------------------------------------
@@ -113,6 +119,8 @@ POKI_RUN_COMMAND{{cat data/mydata.txt}}HERE
POKI_RUN_COMMAND{{mlr --nidx --fs ' ' --repifs cut -f 2,3 data/mydata.txt}}HERE
+.. _file-formats-json:
+
Tabular JSON
----------------------------------------------------------------
@@ -195,6 +203,8 @@ JSON non-streaming
The JSON parser Miller uses does not return until all input is parsed: in particular this means that, unlike for other file formats, Miller does not (at present) handle JSON files in ``tail -f`` contexts.
+.. _file-formats-pprint:
+
PPRINT: Pretty-printed tabular
----------------------------------------------------------------
@@ -214,6 +224,8 @@ For output only (this isn't supported in the input-scanner as of 5.0.0) you can
POKI_RUN_COMMAND{{mlr --opprint --barred cat data/small}}HERE
+.. _file-formats-xtab:
+
XTAB: Vertical tabular
----------------------------------------------------------------
diff --git a/docs/index.rst b/docs/index.rst
index 2c732a68d..7f6afa27b 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -32,11 +32,11 @@ Details
.. toctree::
:maxdepth: 1
-   data-sharing
faq
cookbook
cookbook2
cookbook3
+   data-sharing
Reference
----------------------------------------------------------------
diff --git a/docs/mk-func-h2s.sh b/docs/mk-func-h2s.sh
index cde43e23e..cd2e41750 100755
--- a/docs/mk-func-h2s.sh
+++ b/docs/mk-func-h2s.sh
@@ -27,6 +27,10 @@ mlr -F | grep -v '^[a-zA-Z]' | uniq | while read funcname; do
elif [ "$funcname" = ':' ]; then
displayname='\:'
linkname='colon'
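+  # "? :" (the ternary operator) needs a spelled-out label -- its literal name
+  # would emit the invalid RST anchor ".. _reference-dsl-? ::".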
+  elif [ "$funcname" = '? :' ]; then
+    displayname='\?'
+    linkname='question-mark-colon'
fi
echo ""
@@ -66,6 +71,11 @@ mlr -F | grep '^[a-zA-Z]' | sort -u | while read funcname; do
elif [ "$funcname" = ':' ]; then
displayname='\:'
linkname='colon'
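+  # Same "? :" special case as in the loop above.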
+  elif [ "$funcname" = '? :' ]; then
+    displayname='\?'
+    linkname='question-mark-colon'
fi
echo ""
diff --git a/docs/reference-dsl.rst b/docs/reference-dsl.rst
index 3be31008f..13e1cd05f 100644
--- a/docs/reference-dsl.rst
+++ b/docs/reference-dsl.rst
@@ -301,7 +301,7 @@ Built-in variables
These are written all in capital letters, such as ``NR``, ``NF``, ``FILENAME``, and only a small, specific set of them is defined by Miller.
-Namely, Miller supports the following five built-in variables for ``filter`` and ``put``, all ``awk``-inspired: ``NF``, ``NR``, ``FNR``, ``FILENUM``, and ``FILENAME``, as well as the mathematical constants ``M_PI`` and ``M_E``. Lastly, the ``ENV`` hashmap allows read access to environment variables, e.g. ``ENV["HOME"]`` or ``ENV["foo_".$hostname]``.
+Namely, Miller supports the following five built-in variables for :doc:`filter and put <reference-verbs>`, all ``awk``-inspired: ``NF``, ``NR``, ``FNR``, ``FILENUM``, and ``FILENAME``, as well as the mathematical constants ``M_PI`` and ``M_E``. Lastly, the ``ENV`` hashmap allows read access to environment variables, e.g. ``ENV["HOME"]`` or ``ENV["foo_".$hostname]``.
::
@@ -359,7 +359,7 @@ These are all **read-only** for the ``mlr put`` and ``mlr filter`` DSLs: they ma
Field names
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Names of fields within stream records must be specified using a ``$`` in ``filter`` and ``put`` expressions, even though the dollar signs don't appear in the data stream itself. For integer-indexed data, this looks like ``awk``'s ``$1,$2,$3``, except that Miller allows non-numeric names such as ``$quantity`` or ``$hostname``. Likewise, enclose string literals in double quotes in ``filter`` expressions even though they don't appear in file data. In particular, ``mlr filter '$x=="abc"'`` passes through the record ``x=abc``.
+Names of fields within stream records must be specified using a ``$`` in :doc:`filter and put expressions <reference-verbs>`, even though the dollar signs don't appear in the data stream itself. For integer-indexed data, this looks like ``awk``'s ``$1,$2,$3``, except that Miller allows non-numeric names such as ``$quantity`` or ``$hostname``. Likewise, enclose string literals in double quotes in ``filter`` expressions even though they don't appear in file data. In particular, ``mlr filter '$x=="abc"'`` passes through the record ``x=abc``.
If field names have **special characters** such as ``.`` then you can use braces, e.g. ``'${field.name}'``.
@@ -3230,9 +3230,9 @@ You can get a list of all functions using **mlr -F**.
-.. _reference-dsl-? ::
+.. _reference-dsl-question-mark-colon:
-? :
+\?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
::
diff --git a/docs/reference-dsl.rst.in b/docs/reference-dsl.rst.in
index 2a4cd8274..1f8025142 100644
--- a/docs/reference-dsl.rst.in
+++ b/docs/reference-dsl.rst.in
@@ -191,7 +191,7 @@ Built-in variables
These are written all in capital letters, such as ``NR``, ``NF``, ``FILENAME``, and only a small, specific set of them is defined by Miller.
-Namely, Miller supports the following five built-in variables for ``filter`` and ``put``, all ``awk``-inspired: ``NF``, ``NR``, ``FNR``, ``FILENUM``, and ``FILENAME``, as well as the mathematical constants ``M_PI`` and ``M_E``. Lastly, the ``ENV`` hashmap allows read access to environment variables, e.g. ``ENV["HOME"]`` or ``ENV["foo_".$hostname]``.
+Namely, Miller supports the following five built-in variables for :doc:`filter and put <reference-verbs>`, all ``awk``-inspired: ``NF``, ``NR``, ``FNR``, ``FILENUM``, and ``FILENAME``, as well as the mathematical constants ``M_PI`` and ``M_E``. Lastly, the ``ENV`` hashmap allows read access to environment variables, e.g. ``ENV["HOME"]`` or ``ENV["foo_".$hostname]``.
::
@@ -220,7 +220,7 @@ These are all **read-only** for the ``mlr put`` and ``mlr filter`` DSLs: they ma
Field names
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Names of fields within stream records must be specified using a ``$`` in ``filter`` and ``put`` expressions, even though the dollar signs don't appear in the data stream itself. For integer-indexed data, this looks like ``awk``'s ``$1,$2,$3``, except that Miller allows non-numeric names such as ``$quantity`` or ``$hostname``. Likewise, enclose string literals in double quotes in ``filter`` expressions even though they don't appear in file data. In particular, ``mlr filter '$x=="abc"'`` passes through the record ``x=abc``.
+Names of fields within stream records must be specified using a ``$`` in :doc:`filter and put expressions <reference-verbs>`, even though the dollar signs don't appear in the data stream itself. For integer-indexed data, this looks like ``awk``'s ``$1,$2,$3``, except that Miller allows non-numeric names such as ``$quantity`` or ``$hostname``. Likewise, enclose string literals in double quotes in ``filter`` expressions even though they don't appear in file data. In particular, ``mlr filter '$x=="abc"'`` passes through the record ``x=abc``.
If field names have **special characters** such as ``.`` then you can use braces, e.g. ``'${field.name}'``.