diff --git a/.gitignore b/.gitignore
index 9106aea83..3c1a41203 100644
--- a/.gitignore
+++ b/.gitignore
@@ -62,7 +62,6 @@ tags
 *~
 .deps/
 .libs/
-Makefile
 config.h
 config.log
 config.status
@@ -94,3 +93,24 @@ msys-2.0.dll
 
 data/big.*
 data/nmc?.*
+
+/Makefile
+c/Makefile
+c/auxents/Makefile
+c/cli/Makefile
+c/containers/Makefile
+c/dsl/Makefile
+c/experimental/Makefile
+c/input/Makefile
+c/lib/Makefile
+c/mapping/Makefile
+c/output/Makefile
+c/parsing/Makefile
+c/reg_test/Makefile
+c/reg_test/expected/Makefile
+c/reg_test/input/Makefile
+c/reg_test/input/comments/Makefile
+c/reg_test/input/rfc-csv/Makefile
+c/stream/Makefile
+c/unit_test/Makefile
+doc/Makefile
diff --git a/docs/Makefile b/docs/Makefile
new file mode 100644
index 000000000..9f9aa64ac
--- /dev/null
+++ b/docs/Makefile
@@ -0,0 +1,24 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS    ?=
+SPHINXBUILD   ?= sphinx-build
+SOURCEDIR     = .
+BUILDDIR      = _build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	./genrst
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+foo:
+	@$(SPHINXBUILD) -M html "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
diff --git a/docs/README.md b/docs/README.md
index 2a18c3941..240fbaf3c 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -37,6 +37,8 @@
 
 ## To do
 
+* separate install from build; latter to reference section
+* unix-toolkit context: needs a leading paragraph
 * Let's all discuss if/how we want the v2 docs to be structured better than the v1 docs.
 * !! cross-references all need work !!
 * Scan for hrefs and other non-ported markup
diff --git a/docs/cookbook.rst b/docs/cookbook.rst
index 84fe44f19..261f2ed98 100644
--- a/docs/cookbook.rst
+++ b/docs/cookbook.rst
@@ -207,7 +207,7 @@ How to do ``$name = gsub($name, "old", "new")`` for all fields?
 Full field renames and reassigns
 ----------------------------------------------------------------
 
-Using Miller 5.0.0's map literals and assigning to ``$*``, you can fully generalize ``mlr rename``, ``mlr reorder``, etc.:
+Using Miller 5.0.0's map literals and assigning to ``$*``, you can fully generalize :ref:`mlr rename <reference-verbs-rename>`, :ref:`mlr reorder <reference-verbs-reorder>`, etc.
 
 ::
 
@@ -293,10 +293,9 @@ The difference is a matter of taste (although ``mlr cat -n`` puts the counter fi
 Options for dealing with duplicate rows
 ----------------------------------------------------------------
 
-If your data has records appearing multiple times, you can use mlr uniq to show and/or count the unique
-records.
+If your data has records appearing multiple times, you can use :ref:`mlr uniq <reference-verbs-uniq>` to show and/or count the unique records.
 
-If you want to look at partial uniqueness -- for example, show only the first record for each unique combination of the ``account_id`` and ``account_status`` fields -- you might use ``mlr head -n 1 -g account_id,account_status``. Please also see mlr head.
+If you want to look at partial uniqueness -- for example, show only the first record for each unique combination of the ``account_id`` and ``account_status`` fields -- you might use ``mlr head -n 1 -g account_id,account_status``. Please also see :ref:`mlr head <reference-verbs-head>`.
 
 .. _cookbook-data-cleaning-examples:
 
@@ -379,7 +378,7 @@ Suppose you have a TSV file like this:
 
     x z
     s u:v:w
 
-The simplest option is to use ``mlr nest``:
+The simplest option is to use :ref:`mlr nest <reference-verbs-nest>`:
 
 ::
 
@@ -1009,7 +1008,7 @@ There are field names ``a``, ``b``, ``v``, ``u``, ``x``, ``w`` in the data -- bu
 
     1 - 2 - 3 -
     - - 1 - - 2
 
-There is a keystroke-saving verb for this: ``mlr unsparsify``.
+There is a keystroke-saving verb for this: :ref:`mlr unsparsify <reference-verbs-unsparsify>`.
 
 Parsing log-file output
 ----------------------------------------------------------------
diff --git a/docs/cookbook.rst.in b/docs/cookbook.rst.in
index 42a4f1e21..58557c4d5 100644
--- a/docs/cookbook.rst.in
+++ b/docs/cookbook.rst.in
@@ -112,7 +112,7 @@ POKI_RUN_COMMAND{{mlr --csv put -f data/sar.mlr data/sar.csv}}HERE
 Full field renames and reassigns
 ----------------------------------------------------------------
 
-Using Miller 5.0.0's map literals and assigning to ``$*``, you can fully generalize ``mlr rename``, ``mlr reorder``, etc.:
+Using Miller 5.0.0's map literals and assigning to ``$*``, you can fully generalize :ref:`mlr rename <reference-verbs-rename>`, :ref:`mlr reorder <reference-verbs-reorder>`, etc.
 
 ::
 
@@ -158,10 +158,9 @@ The difference is a matter of taste (although ``mlr cat -n`` puts the counter fi
 Options for dealing with duplicate rows
 ----------------------------------------------------------------
 
-If your data has records appearing multiple times, you can use mlr uniq to show and/or count the unique
-records.
+If your data has records appearing multiple times, you can use :ref:`mlr uniq <reference-verbs-uniq>` to show and/or count the unique records.
 
-If you want to look at partial uniqueness -- for example, show only the first record for each unique combination of the ``account_id`` and ``account_status`` fields -- you might use ``mlr head -n 1 -g account_id,account_status``. Please also see mlr head.
+If you want to look at partial uniqueness -- for example, show only the first record for each unique combination of the ``account_id`` and ``account_status`` fields -- you might use ``mlr head -n 1 -g account_id,account_status``. Please also see :ref:`mlr head <reference-verbs-head>`.
 
 .. _cookbook-data-cleaning-examples:
 
@@ -211,7 +210,7 @@ Suppose you have a TSV file like this:
 
 POKI_INCLUDE_ESCAPED(data/nested.tsv)HERE
 
-The simplest option is to use ``mlr nest``:
+The simplest option is to use :ref:`mlr nest <reference-verbs-nest>`:
 
 ::
 
@@ -461,7 +460,7 @@ POKI_RUN_COMMAND{{mlr --ijson --ocsv put -q -f data/unsparsify.mlr data/sparse.j
 
 POKI_RUN_COMMAND{{mlr --ijson --opprint put -q -f data/unsparsify.mlr data/sparse.json}}HERE
 
-There is a keystroke-saving verb for this: ``mlr unsparsify``.
+There is a keystroke-saving verb for this: :ref:`mlr unsparsify <reference-verbs-unsparsify>`.
 
 Parsing log-file output
 ----------------------------------------------------------------
diff --git a/docs/cookbook3.rst b/docs/cookbook3.rst
index 93b54d63f..1537c4cd6 100644
--- a/docs/cookbook3.rst
+++ b/docs/cookbook3.rst
@@ -68,9 +68,9 @@ or
 
 The former (``mlr stats1`` et al.) has the advantages of being easier to type, being less error-prone to type, and running faster.
 
-Nonetheless, out-of-stream variables (which I whimsically call *oosvars*), begin/end blocks, and emit statements give you the ability to implement logic -- if you wish to do so -- which isn't present in other Miller verbs. (If you find yourself often using the same out-of-stream-variable logic over and over, please file a request at https://github.com/johnkerl/miller/issues to get it implemented directly in C as a Miller verb of its own.)
+Nonetheless, out-of-stream variables (which I whimsically call *oosvars*), begin/end blocks, and emit statements give you the ability to implement logic -- if you wish to do so -- which isn't present in other Miller verbs. (If you find yourself often using the same out-of-stream-variable logic over and over, please file a request at https://github.com/johnkerl/miller/issues to get it implemented directly in C as a Miller verb of its own.)
 
-The following examples compute some things using oosvars which are already computable using Miller verbs, by way of providing food for thought.
+The following examples compute some things using oosvars which are already computable using Miller verbs, by way of providing food for thought.
 
 Mean without/with oosvars
 ----------------------------------------------------------------
diff --git a/docs/faq.rst b/docs/faq.rst
index eb8426635..db412d134 100644
--- a/docs/faq.rst
+++ b/docs/faq.rst
@@ -158,7 +158,7 @@ The same conversion rules as above are being used. Namely:
 
 Taken individually the rules make sense; taken collectively they produce a mishmash of types here.
 
-The solution is to **use the -S flag** for ``mlr put`` and/or ``mlr filter``. Then all field values are left as string. You can type-coerce on demand using syntax like ``'$z = int($x) + float($y)'``. (See also the put documentation; see also https://github.com/johnkerl/miller/issues/150.)
+The solution is to **use the -S flag** for ``mlr put`` and/or ``mlr filter``. Then all field values are left as string. You can type-coerce on demand using syntax like ``'$z = int($x) + float($y)'``. (See also :doc:`reference-dsl`; see also https://github.com/johnkerl/miller/issues/150.)
 
 ::
 
@@ -282,7 +282,7 @@ Given input like
 
     2018-03-07,discovery
     2018-02-03,allocation
 
-we can use ``strptime`` to parse the date field into seconds-since-epoch and then do numeric comparisons. Simply match your input dataset's date-formatting to the strptime format-string. For example:
+we can use ``strptime`` to parse the date field into seconds-since-epoch and then do numeric comparisons. Simply match your input dataset's date-formatting to the :ref:`reference-dsl-strptime` format-string. For example:
 
 ::
 
@@ -304,7 +304,7 @@ How can I handle commas-as-data in various formats?
 
     "Xiao, Lin",administrator
     "Khavari, Darius",tester
 
-Likewise JSON:
+Likewise :ref:`file-formats-json`:
 
 ::
 
@@ -312,7 +312,7 @@ Likewise JSON:
 
     { "Name": "Xiao, Lin", "Role": "administrator" }
     { "Name": "Khavari, Darius", "Role": "tester" }
 
-For Miller's XTAB there is no escaping for carriage returns, but commas work fine:
+For Miller's :ref:`vertical-tabular format <file-formats-xtab>` there is no escaping for carriage returns, but commas work fine:
 
 ::
 
@@ -323,7 +323,7 @@ For Miller's XTAB there i
 
    Name Khavari, Darius
    Role tester
 
-But for DKVP and NIDX, commas are the default field separator. And -- as of Miller 5.4.0 anyway -- there is no CSV-style double-quote-handling like there is for CSV. So commas within the data look like delimiters:
+But for :ref:`Key-value_pairs <file-formats-dkvp>` and :ref:`index-numbered <file-formats-nidx>`, commas are the default field separator. And -- as of Miller 5.4.0 anyway -- there is no CSV-style double-quote-handling like there is for CSV. So commas within the data look like delimiters:
 
 ::
 
diff --git a/docs/faq.rst.in b/docs/faq.rst.in
index c1f88de4c..8fbca2c11 100644
--- a/docs/faq.rst.in
+++ b/docs/faq.rst.in
@@ -67,7 +67,7 @@ The same conversion rules as above are being used. Namely:
 
 Taken individually the rules make sense; taken collectively they produce a mishmash of types here.
 
-The solution is to **use the -S flag** for ``mlr put`` and/or ``mlr filter``. Then all field values are left as string. You can type-coerce on demand using syntax like ``'$z = int($x) + float($y)'``. (See also the put documentation; see also https://github.com/johnkerl/miller/issues/150.)
+The solution is to **use the -S flag** for ``mlr put`` and/or ``mlr filter``. Then all field values are left as string. You can type-coerce on demand using syntax like ``'$z = int($x) + float($y)'``. (See also :doc:`reference-dsl`; see also https://github.com/johnkerl/miller/issues/150.)
 
 ::
 
@@ -146,7 +146,7 @@ Given input like
 
 POKI_RUN_COMMAND{{cat dates.csv}}HERE
 
-we can use ``strptime`` to parse the date field into seconds-since-epoch and then do numeric comparisons. Simply match your input dataset's date-formatting to the strptime format-string. For example:
+we can use ``strptime`` to parse the date field into seconds-since-epoch and then do numeric comparisons. Simply match your input dataset's date-formatting to the :ref:`reference-dsl-strptime` format-string. For example:
 
 ::
 
@@ -163,19 +163,19 @@ How can I handle commas-as-data in various formats?
 
 POKI_RUN_COMMAND{{cat commas.csv}}HERE
 
-Likewise JSON:
+Likewise :ref:`file-formats-json`:
 
 ::
 
 POKI_RUN_COMMAND{{mlr --icsv --ojson cat commas.csv}}HERE
 
-For Miller's XTAB there is no escaping for carriage returns, but commas work fine:
+For Miller's :ref:`vertical-tabular format <file-formats-xtab>` there is no escaping for carriage returns, but commas work fine:
 
 ::
 
 POKI_RUN_COMMAND{{mlr --icsv --oxtab cat commas.csv}}HERE
 
-But for DKVP and NIDX, commas are the default field separator. And -- as of Miller 5.4.0 anyway -- there is no CSV-style double-quote-handling like there is for CSV. So commas within the data look like delimiters:
+But for :ref:`Key-value_pairs <file-formats-dkvp>` and :ref:`index-numbered <file-formats-nidx>`, commas are the default field separator. And -- as of Miller 5.4.0 anyway -- there is no CSV-style double-quote-handling like there is for CSV. So commas within the data look like delimiters:
 
 ::
 
diff --git a/docs/file-formats.rst b/docs/file-formats.rst
index 533e6906b..d2f10ab5b 100644
--- a/docs/file-formats.rst
+++ b/docs/file-formats.rst
@@ -72,6 +72,8 @@ Examples
 | | 4 | 5 | 6 | | Record 2: "apple" => "4", "bat" => "5", "cog" => "6" |
 +-----------------------+
 
+.. _file-formats-csv:
+
 CSV/TSV/ASV/USV/etc.
 ----------------------------------------------------------------
 
@@ -108,6 +110,8 @@ Here are things they have in common:
 
 * The ``--implicit-csv-header`` flag for input and the ``--headerless-csv-output`` flag for output.
 
+.. _file-formats-dkvp:
+
 DKVP: Key-value pairs
 ----------------------------------------------------------------
 
@@ -151,6 +155,8 @@ to analyze my logs.
 
 See :doc:`reference` regarding how to specify separators other than the default equals-sign and comma.
 
+.. _file-formats-nidx:
+
 NIDX: Index-numbered (toolkit style)
 ----------------------------------------------------------------
 
@@ -202,6 +208,8 @@ Example with index-numbered input and output:
 
     the dawn's light
 
+.. _file-formats-json:
+
 Tabular JSON
 ----------------------------------------------------------------
 
@@ -379,6 +387,8 @@ JSON non-streaming
 
 The JSON parser Miller uses does not return until all input is parsed: in particular this means that, unlike for other file formats, Miller does not (at present) handle JSON files in ``tail -f`` contexts.
 
+.. _file-formats-pprint:
+
 PPRINT: Pretty-printed tabular
 ----------------------------------------------------------------
 
@@ -419,6 +429,8 @@ For output only (this isn't supported in the input-scanner as of 5.0.0) you can
 
     | wye | pan | 5 | 0.5732889198020006 | 0.8636244699032729 |
     +-----+-----+---+---------------------+---------------------+
 
+.. _file-formats-xtab:
+
 XTAB: Vertical tabular
 ----------------------------------------------------------------
diff --git a/docs/file-formats.rst.in b/docs/file-formats.rst.in
index 0549b9577..7d05718df 100644
--- a/docs/file-formats.rst.in
+++ b/docs/file-formats.rst.in
@@ -10,6 +10,8 @@ Examples
 
 POKI_RUN_COMMAND{{mlr --usage-data-format-examples}}HERE
 
+.. _file-formats-csv:
+
 CSV/TSV/ASV/USV/etc.
 ----------------------------------------------------------------
 
@@ -46,6 +48,8 @@ Here are things they have in common:
 
 * The ``--implicit-csv-header`` flag for input and the ``--headerless-csv-output`` flag for output.
 
+.. _file-formats-dkvp:
+
 DKVP: Key-value pairs
 ----------------------------------------------------------------
 
@@ -84,6 +88,8 @@ to analyze my logs.
 
 See :doc:`reference` regarding how to specify separators other than the default equals-sign and comma.
 
+.. _file-formats-nidx:
+
 NIDX: Index-numbered (toolkit style)
 ----------------------------------------------------------------
 
@@ -113,6 +119,8 @@ POKI_RUN_COMMAND{{cat data/mydata.txt}}HERE
 
 POKI_RUN_COMMAND{{mlr --nidx --fs ' ' --repifs cut -f 2,3 data/mydata.txt}}HERE
 
+.. _file-formats-json:
+
 Tabular JSON
 ----------------------------------------------------------------
 
@@ -195,6 +203,8 @@ JSON non-streaming
 
 The JSON parser Miller uses does not return until all input is parsed: in particular this means that, unlike for other file formats, Miller does not (at present) handle JSON files in ``tail -f`` contexts.
 
+.. _file-formats-pprint:
+
 PPRINT: Pretty-printed tabular
 ----------------------------------------------------------------
 
@@ -214,6 +224,8 @@ For output only (this isn't supported in the input-scanner as of 5.0.0) you can
 
 POKI_RUN_COMMAND{{mlr --opprint --barred cat data/small}}HERE
 
+.. _file-formats-xtab:
+
 XTAB: Vertical tabular
 ----------------------------------------------------------------
diff --git a/docs/index.rst b/docs/index.rst
index 2c732a68d..7f6afa27b 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -32,11 +32,11 @@ Details
 .. toctree::
    :maxdepth: 1
 
-   data-sharing
    faq
    cookbook
    cookbook2
    cookbook3
+   data-sharing
 
 Reference
 ----------------------------------------------------------------
diff --git a/docs/mk-func-h2s.sh b/docs/mk-func-h2s.sh
index cde43e23e..cd2e41750 100755
--- a/docs/mk-func-h2s.sh
+++ b/docs/mk-func-h2s.sh
@@ -27,6 +27,9 @@ mlr -F | grep -v '^[a-zA-Z]' | uniq | while read funcname; do
   elif [ "$funcname" = ':' ]; then
     displayname='\:'
     linkname='colon'
+  elif [ "$funcname" = '? :' ]; then
+    displayname='\?'
+    linkname='question-mark-colon'
   fi
 
   echo ""
@@ -66,6 +69,9 @@ mlr -F | grep '^[a-zA-Z]' | sort -u | while read funcname; do
   elif [ "$funcname" = ':' ]; then
     displayname='\:'
     linkname='colon'
+  elif [ "$funcname" = '? :' ]; then
+    displayname='\?'
+    linkname='question-mark-colon'
   fi
 
   echo ""
diff --git a/docs/reference-dsl.rst b/docs/reference-dsl.rst
index 3be31008f..13e1cd05f 100644
--- a/docs/reference-dsl.rst
+++ b/docs/reference-dsl.rst
@@ -301,7 +301,7 @@ Built-in variables
 
 These are written all in capital letters, such as ``NR``, ``NF``, ``FILENAME``, and only a small, specific set of them is defined by Miller.
 
-Namely, Miller supports the following five built-in variables for ``filter`` and ``put``, all ``awk``-inspired: ``NF``, ``NR``, ``FNR``, ``FILENUM``, and ``FILENAME``, as well as the mathematical constants ``M_PI`` and ``M_E``. Lastly, the ``ENV`` hashmap allows read access to environment variables, e.g. ``ENV["HOME"]`` or ``ENV["foo_".$hostname]``.
+Namely, Miller supports the following five built-in variables for :doc:`filter and put <reference-verbs>`, all ``awk``-inspired: ``NF``, ``NR``, ``FNR``, ``FILENUM``, and ``FILENAME``, as well as the mathematical constants ``M_PI`` and ``M_E``. Lastly, the ``ENV`` hashmap allows read access to environment variables, e.g. ``ENV["HOME"]`` or ``ENV["foo_".$hostname]``.
 
 ::
 
@@ -359,7 +359,7 @@ These are all **read-only** for the ``mlr put`` and ``mlr filter`` DSLs: they ma
 
 Field names
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-Names of fields within stream records must be specified using a ``$`` in ``filter`` and ``put`` expressions, even though the dollar signs don't appear in the data stream itself. For integer-indexed data, this looks like ``awk``'s ``$1,$2,$3``, except that Miller allows non-numeric names such as ``$quantity`` or ``$hostname``. Likewise, enclose string literals in double quotes in ``filter`` expressions even though they don't appear in file data. In particular, ``mlr filter '$x=="abc"'`` passes through the record ``x=abc``.
+Names of fields within stream records must be specified using a ``$`` in :doc:`filter and put expressions <reference-verbs>`, even though the dollar signs don't appear in the data stream itself. For integer-indexed data, this looks like ``awk``'s ``$1,$2,$3``, except that Miller allows non-numeric names such as ``$quantity`` or ``$hostname``. Likewise, enclose string literals in double quotes in ``filter`` expressions even though they don't appear in file data. In particular, ``mlr filter '$x=="abc"'`` passes through the record ``x=abc``.
 
 If field names have **special characters** such as ``.`` then you can use braces, e.g. ``'${field.name}'``.
 
@@ -3230,9 +3230,9 @@ You can get a list of all functions using **mlr -F**.
 
 
 
-.. _reference-dsl-? ::
+.. _reference-dsl-question-mark-colon:
 
-? :
+\?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 ::
diff --git a/docs/reference-dsl.rst.in b/docs/reference-dsl.rst.in
index 2a4cd8274..1f8025142 100644
--- a/docs/reference-dsl.rst.in
+++ b/docs/reference-dsl.rst.in
@@ -191,7 +191,7 @@ Built-in variables
 
 These are written all in capital letters, such as ``NR``, ``NF``, ``FILENAME``, and only a small, specific set of them is defined by Miller.
 
-Namely, Miller supports the following five built-in variables for ``filter`` and ``put``, all ``awk``-inspired: ``NF``, ``NR``, ``FNR``, ``FILENUM``, and ``FILENAME``, as well as the mathematical constants ``M_PI`` and ``M_E``. Lastly, the ``ENV`` hashmap allows read access to environment variables, e.g. ``ENV["HOME"]`` or ``ENV["foo_".$hostname]``.
+Namely, Miller supports the following five built-in variables for :doc:`filter and put <reference-verbs>`, all ``awk``-inspired: ``NF``, ``NR``, ``FNR``, ``FILENUM``, and ``FILENAME``, as well as the mathematical constants ``M_PI`` and ``M_E``. Lastly, the ``ENV`` hashmap allows read access to environment variables, e.g. ``ENV["HOME"]`` or ``ENV["foo_".$hostname]``.
 
 ::
 
@@ -220,7 +220,7 @@ These are all **read-only** for the ``mlr put`` and ``mlr filter`` DSLs: they ma
 
 Field names
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-Names of fields within stream records must be specified using a ``$`` in ``filter`` and ``put`` expressions, even though the dollar signs don't appear in the data stream itself. For integer-indexed data, this looks like ``awk``'s ``$1,$2,$3``, except that Miller allows non-numeric names such as ``$quantity`` or ``$hostname``. Likewise, enclose string literals in double quotes in ``filter`` expressions even though they don't appear in file data. In particular, ``mlr filter '$x=="abc"'`` passes through the record ``x=abc``.
+Names of fields within stream records must be specified using a ``$`` in :doc:`filter and put expressions <reference-verbs>`, even though the dollar signs don't appear in the data stream itself. For integer-indexed data, this looks like ``awk``'s ``$1,$2,$3``, except that Miller allows non-numeric names such as ``$quantity`` or ``$hostname``. Likewise, enclose string literals in double quotes in ``filter`` expressions even though they don't appear in file data. In particular, ``mlr filter '$x=="abc"'`` passes through the record ``x=abc``.
 
 If field names have **special characters** such as ``.`` then you can use braces, e.g. ``'${field.name}'``.
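
With this patch applied, the Sphinx side of the docs is driven by the new docs/Makefile: every target other than "help" falls through to the catch-all "%" rule, which runs ./genrst to regenerate the .rst sources and then hands the requested target to sphinx-build in make-mode. A minimal usage sketch, assuming sphinx-build is on the PATH and the genrst script already exists in docs/ (the Makefile invokes it, but this patch does not add it):

    # Build the HTML docs; the catch-all rule runs ./genrst first,
    # then "sphinx-build -M html . _build".
    cd docs
    make html
    # List Sphinx's make-mode targets (the explicit default target).
    make help

HTML output lands under docs/_build/html, per the BUILDDIR setting above.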