Attempt to unbreak readthedocs build

John Kerl 2021-05-31 01:41:01 -04:00
parent 5da252172b
commit c7556cda26
29 changed files with 201 additions and 201 deletions
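Nearly all of the paired deletion/addition lines below apply one mechanical fix: each bare ``.. code-block::`` directive gains an explicit ``bash`` language argument. This matters because some Sphinx/docutils combinations reject the directive without an argument (older Sphinx required it), which can break the readthedocs build. A sketch of the before/after pattern, on an illustrative snippet taken from the quick-examples page:

```rst
Before (can fail the build where the language argument is required):

.. code-block::

    % mlr --csv cut -f hostname,uptime mydata.csv

After:

.. code-block:: bash

    % mlr --csv cut -f hostname,uptime mydata.csv
```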


@@ -344,14 +344,14 @@ Often we want to print output to the screen. Miller does this by default, as we'
 Sometimes we want to print output to another file: just use **> outputfilenamegoeshere** at the end of your command:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % mlr --icsv --opprint cat example.csv > newfile.csv
     # Output goes to the new file;
     # nothing is printed to the screen.
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % cat newfile.csv
@@ -369,12 +369,12 @@ Sometimes we want to print output to another file: just use **> outputfilenamego
 Other times we just want our files to be **changed in-place**: just use **mlr -I**:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % cp example.csv newfile.txt
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % cat newfile.txt
@@ -390,12 +390,12 @@ Other times we just want our files to be **changed in-place**: just use **mlr -I
     yellow,circle,1,87,63.5058,8.3350
     purple,square,0,91,72.3735,8.2430
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % mlr -I --icsv --opprint cat newfile.txt
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % cat newfile.txt
@@ -413,7 +413,7 @@ Other times we just want our files to be **changed in-place**: just use **mlr -I
 Also using ``mlr -I`` you can bulk-operate on lots of files: e.g.:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     mlr -I --csv cut -x -f unwanted_column_name *.csv
@@ -462,7 +462,7 @@ What's a CSV file, really? It's an array of rows, or *records*, each being a lis
 For example, if you have:
-.. code-block::
+.. code-block:: bash
     shape,flag,index
     circle,1,24
@@ -470,7 +470,7 @@ For example, if you have:
 then that's a way of saying:
-.. code-block::
+.. code-block:: bash
     shape=circle,flag=1,index=24
     shape=square,flag=0,index=36
@@ -479,7 +479,7 @@ Data written this way are called **DKVP**, for *delimited key-value pairs*.
 We've also already seen other ways to write the same data:
-.. code-block::
+.. code-block:: bash
     CSV PPRINT JSON
     shape,flag,index shape flag index [


@@ -97,14 +97,14 @@ Often we want to print output to the screen. Miller does this by default, as we'
 Sometimes we want to print output to another file: just use **> outputfilenamegoeshere** at the end of your command:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % mlr --icsv --opprint cat example.csv > newfile.csv
     # Output goes to the new file;
     # nothing is printed to the screen.
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % cat newfile.csv
@@ -122,12 +122,12 @@ Sometimes we want to print output to another file: just use **> outputfilenamego
 Other times we just want our files to be **changed in-place**: just use **mlr -I**:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % cp example.csv newfile.txt
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % cat newfile.txt
@@ -143,12 +143,12 @@ Other times we just want our files to be **changed in-place**: just use **mlr -I
     yellow,circle,1,87,63.5058,8.3350
     purple,square,0,91,72.3735,8.2430
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % mlr -I --icsv --opprint cat newfile.txt
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % cat newfile.txt
@@ -166,7 +166,7 @@ Other times we just want our files to be **changed in-place**: just use **mlr -I
 Also using ``mlr -I`` you can bulk-operate on lots of files: e.g.:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     mlr -I --csv cut -x -f unwanted_column_name *.csv
@@ -190,7 +190,7 @@ What's a CSV file, really? It's an array of rows, or *records*, each being a lis
 For example, if you have:
-.. code-block::
+.. code-block:: bash
     shape,flag,index
     circle,1,24
@@ -198,7 +198,7 @@ For example, if you have:
 then that's a way of saying:
-.. code-block::
+.. code-block:: bash
     shape=circle,flag=1,index=24
     shape=square,flag=0,index=36
@@ -207,7 +207,7 @@ Data written this way are called **DKVP**, for *delimited key-value pairs*.
 We've also already seen other ways to write the same data:
-.. code-block::
+.. code-block:: bash
     CSV PPRINT JSON
     shape,flag,index shape flag index [


@@ -24,5 +24,5 @@ help:
 # Catch-all target: route all unknown targets to Sphinx using the new
 # "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
 %: Makefile
-	##### temp test ./genrst
+	#### temp ./genrst
 	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)


@@ -72,7 +72,7 @@ Miller has been built on Windows using MSYS2: http://www.msys2.org/. You can in
 You will first need to install MSYS2: http://www.msys2.org/. Then, start an MSYS2 shell, e.g. (supposing you installed MSYS2 to ``C:\msys2\``) run ``C:\msys2\mingw64.exe``. Within the MSYS2 shell, you can run the following to install dependent packages:
-.. code-block::
+.. code-block:: bash
     pacman -Syu
     pacman -Su
@@ -90,13 +90,13 @@ There is a unit-test false-negative issue involving the semantics of the ``mkste
 Within MSYS2 you can run ``mlr``: simply copy it from the ``c`` subdirectory to your desired location somewhere within your MSYS2 ``$PATH``. To run ``mlr`` outside of MSYS2, just as with precompiled binaries as described above, you'll need ``msys-2.0.dll``. One way to do this is to augment your path:
-.. code-block::
+.. code-block:: bash
     C:\> set PATH=%PATH%;\msys64\mingw64\bin
 Another way to do it is to copy the Miller executable and the DLL to the same directory:
-.. code-block::
+.. code-block:: bash
     C:\> mkdir \mbin
     C:\> copy \msys64\mingw64\bin\msys-2.0.dll \mbin
@@ -181,7 +181,7 @@ In this example I am using version 3.4.0; of course that will change for subsequ
 * Similarly for ``macports``: https://github.com/macports/macports-ports/blob/master/textproc/miller/Portfile.
 * Social-media updates.
-.. code-block::
+.. code-block:: bash
     git remote add upstream https://github.com/Homebrew/homebrew-core # one-time setup only
     git fetch upstream


@@ -69,7 +69,7 @@ Miller has been built on Windows using MSYS2: http://www.msys2.org/. You can in
 You will first need to install MSYS2: http://www.msys2.org/. Then, start an MSYS2 shell, e.g. (supposing you installed MSYS2 to ``C:\msys2\``) run ``C:\msys2\mingw64.exe``. Within the MSYS2 shell, you can run the following to install dependent packages:
-.. code-block::
+.. code-block:: bash
     pacman -Syu
     pacman -Su
@@ -87,13 +87,13 @@ There is a unit-test false-negative issue involving the semantics of the ``mkste
 Within MSYS2 you can run ``mlr``: simply copy it from the ``c`` subdirectory to your desired location somewhere within your MSYS2 ``$PATH``. To run ``mlr`` outside of MSYS2, just as with precompiled binaries as described above, you'll need ``msys-2.0.dll``. One way to do this is to augment your path:
-.. code-block::
+.. code-block:: bash
     C:\> set PATH=%PATH%;\msys64\mingw64\bin
 Another way to do it is to copy the Miller executable and the DLL to the same directory:
-.. code-block::
+.. code-block:: bash
     C:\> mkdir \mbin
     C:\> copy \msys64\mingw64\bin\msys-2.0.dll \mbin
@@ -178,7 +178,7 @@ In this example I am using version 3.4.0; of course that will change for subsequ
 * Similarly for ``macports``: https://github.com/macports/macports-ports/blob/master/textproc/miller/Portfile.
 * Social-media updates.
-.. code-block::
+.. code-block:: bash
     git remote add upstream https://github.com/Homebrew/homebrew-core # one-time setup only
     git fetch upstream


@@ -1080,13 +1080,13 @@ Parsing log-file output
 This, of course, depends highly on what's in your log files. But, as an example, suppose you have log-file lines such as
-.. code-block::
+.. code-block:: bash
     2015-10-08 08:29:09,445 INFO com.company.path.to.ClassName @ [sometext] various/sorts/of data {& punctuation} hits=1 status=0 time=2.378
 I prefer to pre-filter with ``grep`` and/or ``sed`` to extract the structured text, then hand that to Miller. Example:
-.. code-block::
+.. code-block:: bash
     grep 'various sorts' *.log | sed 's/.*} //' | mlr --fs space --repifs --oxtab stats1 -a min,p10,p50,p90,max -f time -g status
@@ -1118,7 +1118,7 @@ The recursive function for the Fibonacci sequence is famous for its computationa
 produces output like this:
-.. code-block::
+.. code-block:: bash
     i o fcount seconds_delta
     1 1 1 0
@@ -1175,7 +1175,7 @@ Note that the time it takes to evaluate the function is blowing up exponentially
 with output like this:
-.. code-block::
+.. code-block:: bash
     i o fcount seconds_delta
     1 1 1 0


@@ -323,13 +323,13 @@ Parsing log-file output
 This, of course, depends highly on what's in your log files. But, as an example, suppose you have log-file lines such as
-.. code-block::
+.. code-block:: bash
     2015-10-08 08:29:09,445 INFO com.company.path.to.ClassName @ [sometext] various/sorts/of data {& punctuation} hits=1 status=0 time=2.378
 I prefer to pre-filter with ``grep`` and/or ``sed`` to extract the structured text, then hand that to Miller. Example:
-.. code-block::
+.. code-block:: bash
     grep 'various sorts' *.log | sed 's/.*} //' | mlr --fs space --repifs --oxtab stats1 -a min,p10,p50,p90,max -f time -g status
@@ -344,7 +344,7 @@ POKI_INCLUDE_ESCAPED(data/fibo-uncached.sh)HERE
 produces output like this:
-.. code-block::
+.. code-block:: bash
     i o fcount seconds_delta
     1 1 1 0
@@ -382,7 +382,7 @@ POKI_INCLUDE_ESCAPED(data/fibo-cached.sh)HERE
 with output like this:
-.. code-block::
+.. code-block:: bash
     i o fcount seconds_delta
     1 1 1 0


@@ -9,7 +9,7 @@ Randomly selecting words from a list
 Given this `word list <https://github.com/johnkerl/miller/blob/master/docs/data/english-words.txt>`_, first take a look to see what the first few lines look like:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     $ head data/english-words.txt
@@ -26,7 +26,7 @@ Given this `word list <https://github.com/johnkerl/miller/blob/master/docs/data/
 Then the following will randomly sample ten words with four to eight characters in them:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     $ mlr --from data/english-words.txt --nidx filter -S 'n=strlen($1);4<=n&&n<=8' then sample -k 10
@@ -48,7 +48,7 @@ These are simple *n*-grams as `described here <http://johnkerl.org/randspell/ran
 The idea is that words from the input file are consumed, then taken apart and pasted back together in ways which imitate the letter-to-letter transitions found in the word list -- giving us automatically generated words in the same vein as *bromance* and *spork*:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     $ mlr --nidx --from ./ngrams/gsl-2000.txt put -q -f ./ngrams/ngfuncs.mlr -f ./ngrams/ng5.mlr
@@ -526,7 +526,7 @@ At standard resolution this makes a nice little ASCII plot:
 But using a very small font size (as small as my Mac will let me go), and by choosing the coordinates to zoom in on a particular part of the complex plane, we can get a nice little picture:
-.. code-block::
+.. code-block:: bash
     #!/bin/bash
     # Get the number of rows and columns from the terminal window dimensions


@@ -6,7 +6,7 @@ Randomly selecting words from a list
 Given this `word list <https://github.com/johnkerl/miller/blob/master/docs/data/english-words.txt>`_, first take a look to see what the first few lines look like:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     $ head data/english-words.txt
@@ -23,7 +23,7 @@ Given this `word list <https://github.com/johnkerl/miller/blob/master/docs/data/
 Then the following will randomly sample ten words with four to eight characters in them:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     $ mlr --from data/english-words.txt --nidx filter -S 'n=strlen($1);4<=n&&n<=8' then sample -k 10
@@ -45,7 +45,7 @@ These are simple *n*-grams as `described here <http://johnkerl.org/randspell/ran
 The idea is that words from the input file are consumed, then taken apart and pasted back together in ways which imitate the letter-to-letter transitions found in the word list -- giving us automatically generated words in the same vein as *bromance* and *spork*:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     $ mlr --nidx --from ./ngrams/gsl-2000.txt put -q -f ./ngrams/ngfuncs.mlr -f ./ngrams/ng5.mlr
@@ -135,7 +135,7 @@ POKI_RUN_COMMAND{{mlr -n put -f ./programs/mand.mlr}}HERE
 But using a very small font size (as small as my Mac will let me go), and by choosing the coordinates to zoom in on a particular part of the complex plane, we can get a nice little picture:
-.. code-block::
+.. code-block:: bash
     #!/bin/bash
     # Get the number of rows and columns from the terminal window dimensions


@@ -9,30 +9,30 @@ How to use .mlrrc
 Suppose you always use CSV files. Then instead of always having to type ``--csv`` as in
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     mlr --csv cut -x -f extra mydata.csv
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     mlr --csv sort -n id mydata.csv
 and so on, you can instead put the following into your ``$HOME/.mlrrc``:
-.. code-block::
+.. code-block:: bash
     --csv
 Then you can just type things like
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     mlr cut -x -f extra mydata.csv
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     mlr sort -n id mydata.csv


@@ -6,30 +6,30 @@ How to use .mlrrc
 Suppose you always use CSV files. Then instead of always having to type ``--csv`` as in
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     mlr --csv cut -x -f extra mydata.csv
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     mlr --csv sort -n id mydata.csv
 and so on, you can instead put the following into your ``$HOME/.mlrrc``:
-.. code-block::
+.. code-block:: bash
     --csv
 Then you can just type things like
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     mlr cut -x -f extra mydata.csv
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     mlr sort -n id mydata.csv


@@ -307,7 +307,7 @@ Note that running a subprocess on every record takes a non-trivial amount of tim
 ..
     hard-coded, not live-code, since %N doesn't exist on all platforms
-.. code-block::
+.. code-block:: bash
     $ mlr --opprint put '$t=system("date +%s.%N")' then step -a delta -f t data/small
     a b i x y t t_delta
@@ -317,7 +317,7 @@ Note that running a subprocess on every record takes a non-trivial amount of tim
     eks wye 4 0.38139939387114097 0.13418874328430463 1568774318.516547441 0.000929
     wye pan 5 0.5732889198020006 0.8636244699032729 1568774318.517518828 0.000971
-.. code-block::
+.. code-block:: bash
     $ mlr --opprint put '$t=systime()' then step -a delta -f t data/small
     a b i x y t t_delta


@@ -67,7 +67,7 @@ Note that running a subprocess on every record takes a non-trivial amount of tim
 ..
     hard-coded, not live-code, since %N doesn't exist on all platforms
-.. code-block::
+.. code-block:: bash
     $ mlr --opprint put '$t=system("date +%s.%N")' then step -a delta -f t data/small
     a b i x y t t_delta
@@ -77,7 +77,7 @@ Note that running a subprocess on every record takes a non-trivial amount of tim
     eks wye 4 0.38139939387114097 0.13418874328430463 1568774318.516547441 0.000929
     wye pan 5 0.5732889198020006 0.8636244699032729 1568774318.517518828 0.000971
-.. code-block::
+.. code-block:: bash
     $ mlr --opprint put '$t=systime()' then step -a delta -f t data/small
     a b i x y t t_delta


@@ -670,13 +670,13 @@ XML, JSON, etc. are, by contrast, all **recursive** or **nested** data structure
 Now, you can put tabular data into these formats -- since list-of-key-value-pairs is one of the things representable in XML or JSON. Example:
-.. code-block::
+.. code-block:: bash
     # DKVP
     x=1,y=2
     z=3
-.. code-block::
+.. code-block:: bash
     # XML
     <table>
@@ -695,7 +695,7 @@ Now, you can put tabular data into these formats -- since list-of-key-value-pair
     </record>
     </table>
-.. code-block::
+.. code-block:: bash
     # JSON
     [{"x":1,"y":2},{"z":3}]


@@ -278,13 +278,13 @@ XML, JSON, etc. are, by contrast, all **recursive** or **nested** data structure
 Now, you can put tabular data into these formats -- since list-of-key-value-pairs is one of the things representable in XML or JSON. Example:
-.. code-block::
+.. code-block:: bash
     # DKVP
     x=1,y=2
     z=3
-.. code-block::
+.. code-block:: bash
     # XML
     <table>
@@ -303,7 +303,7 @@ Now, you can put tabular data into these formats -- since list-of-key-value-pair
     </record>
     </table>
-.. code-block::
+.. code-block:: bash
     # JSON
     [{"x":1,"y":2},{"z":3}]


@@ -130,21 +130,21 @@ Miller's default file format is DKVP, for **delimited key-value pairs**. Example
 Such data are easy to generate, e.g. in Ruby with
-.. code-block::
+.. code-block:: bash
     puts "host=#{hostname},seconds=#{t2-t1},message=#{msg}"
-.. code-block::
+.. code-block:: bash
     puts mymap.collect{|k,v| "#{k}=#{v}"}.join(',')
 or ``print`` statements in various languages, e.g.
-.. code-block::
+.. code-block:: bash
     echo "type=3,user=$USER,date=$date\n";
-.. code-block::
+.. code-block:: bash
     logger.log("type=3,user=$USER,date=$date\n");
@@ -152,7 +152,7 @@ Fields lacking an IPS will have positional index (starting at 1) used as the key
 As discussed in :doc:`record-heterogeneity`, Miller handles changes of field names within the same data stream. But using DKVP format this is particularly natural. One of my favorite use-cases for Miller is in application/server logs, where I log all sorts of lines such as
-.. code-block::
+.. code-block:: bash
     resource=/path/to/file,loadsec=0.45,ok=true
     record_count=100, resource=/path/to/file


@@ -57,21 +57,21 @@ POKI_RUN_COMMAND{{mlr cat data/small}}HERE
 Such data are easy to generate, e.g. in Ruby with
-.. code-block::
+.. code-block:: bash
     puts "host=#{hostname},seconds=#{t2-t1},message=#{msg}"
-.. code-block::
+.. code-block:: bash
     puts mymap.collect{|k,v| "#{k}=#{v}"}.join(',')
 or ``print`` statements in various languages, e.g.
-.. code-block::
+.. code-block:: bash
     echo "type=3,user=$USER,date=$date\n";
-.. code-block::
+.. code-block:: bash
     logger.log("type=3,user=$USER,date=$date\n");
@@ -79,7 +79,7 @@ Fields lacking an IPS will have positional index (starting at 1) used as the key
 As discussed in :doc:`record-heterogeneity`, Miller handles changes of field names within the same data stream. But using DKVP format this is particularly natural. One of my favorite use-cases for Miller is in application/server logs, where I log all sorts of lines such as
-.. code-block::
+.. code-block:: bash
     resource=/path/to/file,loadsec=0.45,ok=true
     record_count=100, resource=/path/to/file


@@ -9,38 +9,38 @@ Prebuilt executables via package managers
 `Homebrew <https://brew.sh/>`_ installation support for OSX is available via
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     brew update && brew install miller
 ...and also via `MacPorts <https://www.macports.org/>`_:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     sudo port selfupdate && sudo port install miller
 You may already have the ``mlr`` executable available in your platform's package manager on NetBSD, Debian Linux, Ubuntu Xenial and upward, Arch Linux, or perhaps other distributions. For example, on various Linux distributions you might do one of the following:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     sudo apt-get install miller
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     sudo apt install miller
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     sudo yum install miller
 On Windows, Miller is available via `Chocolatey <https://chocolatey.org/>`_:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     choco install miller


@@ -6,38 +6,38 @@ Prebuilt executables via package managers
 `Homebrew <https://brew.sh/>`_ installation support for OSX is available via
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     brew update && brew install miller
 ...and also via `MacPorts <https://www.macports.org/>`_:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     sudo port selfupdate && sudo port install miller
 You may already have the ``mlr`` executable available in your platform's package manager on NetBSD, Debian Linux, Ubuntu Xenial and upward, Arch Linux, or perhaps other distributions. For example, on various Linux distributions you might do one of the following:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     sudo apt-get install miller
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     sudo apt install miller
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     sudo yum install miller
 On Windows, Miller is available via `Chocolatey <https://chocolatey.org/>`_:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     choco install miller


@@ -6,70 +6,70 @@ Quick examples
 Column select:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % mlr --csv cut -f hostname,uptime mydata.csv
 Add new columns as function of other columns:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % mlr --nidx put '$sum = $7 < 0.0 ? 3.5 : $7 + 2.1*$8' *.dat
 Row filter:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % mlr --csv filter '$status != "down" && $upsec >= 10000' *.csv
 Apply column labels and pretty-print:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % grep -v '^#' /etc/group | mlr --ifs : --nidx --opprint label group,pass,gid,member then sort -f group
 Join multiple data sources on key columns:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % mlr join -j account_id -f accounts.dat then group-by account_name balances.dat
 Multiple formats including JSON:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % mlr --json put '$attr = sub($attr, "([0-9]+)_([0-9]+)_.*", "\1:\2")' data/*.json
 Aggregate per-column statistics:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % mlr stats1 -a min,mean,max,p10,p50,p90 -f flag,u,v data/*
 Linear regression:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % mlr stats2 -a linreg-pca -f u,v -g shape data/*
 Aggregate custom per-column statistics:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % mlr put -q '@sum[$a][$b] += $x; end {emit @sum, "a", "b"}' data/*
 Iterate over data using DSL expressions:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % mlr --from estimates.tbl put '
@@ -83,35 +83,35 @@ Iterate over data using DSL expressions:
 Run DSL expressions from a script file:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % mlr --from infile.dat put -f analyze.mlr
 Split/reduce output to multiple filenames:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % mlr --from infile.dat put 'tee > "./taps/data-".$a."-".$b, $*'
 Compressed I/O:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % mlr --from infile.dat put 'tee | "gzip > ./taps/data-".$a."-".$b.".gz", $*'
 Interoperate with other data-processing tools using standard pipes:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % mlr --from infile.dat put -q '@v=$*; dump | "jq .[]"'
 Tap/trace:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % mlr --from infile.dat put '(NR % 1000 == 0) { print > stderr, "Checkpoint ".NR}'


@@ -3,70 +3,70 @@ Quick examples
 Column select:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % mlr --csv cut -f hostname,uptime mydata.csv
 Add new columns as function of other columns:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % mlr --nidx put '$sum = $7 < 0.0 ? 3.5 : $7 + 2.1*$8' *.dat
 Row filter:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % mlr --csv filter '$status != "down" && $upsec >= 10000' *.csv
 Apply column labels and pretty-print:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % grep -v '^#' /etc/group | mlr --ifs : --nidx --opprint label group,pass,gid,member then sort -f group
 Join multiple data sources on key columns:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % mlr join -j account_id -f accounts.dat then group-by account_name balances.dat
 Multiple formats including JSON:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % mlr --json put '$attr = sub($attr, "([0-9]+)_([0-9]+)_.*", "\1:\2")' data/*.json
 Aggregate per-column statistics:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % mlr stats1 -a min,mean,max,p10,p50,p90 -f flag,u,v data/*
 Linear regression:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % mlr stats2 -a linreg-pca -f u,v -g shape data/*
 Aggregate custom per-column statistics:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % mlr put -q '@sum[$a][$b] += $x; end {emit @sum, "a", "b"}' data/*
 Iterate over data using DSL expressions:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % mlr --from estimates.tbl put '
@@ -80,35 +80,35 @@ Iterate over data using DSL expressions:
 Run DSL expressions from a script file:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % mlr --from infile.dat put -f analyze.mlr
 Split/reduce output to multiple filenames:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % mlr --from infile.dat put 'tee > "./taps/data-".$a."-".$b, $*'
 Compressed I/O:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % mlr --from infile.dat put 'tee | "gzip > ./taps/data-".$a."-".$b.".gz", $*'
 Interoperate with other data-processing tools using standard pipes:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % mlr --from infile.dat put -q '@v=$*; dump | "jq .[]"'
 Tap/trace:
-.. code-block::
+.. code-block:: bash
     :emphasize-lines: 1,1
     % mlr --from infile.dat put '(NR % 1000 == 0) { print > stderr, "Checkpoint ".NR}'


@@ -286,17 +286,17 @@ Semicolons are required between statements even if those statements are on separ
 Bodies for all compound statements must be enclosed in **curly braces**, even if the body is a single statement:
-.. code-block::
+.. code-block:: bash
     mlr put 'if ($x == 1) $y = 2' # Syntax error
-.. code-block::
+.. code-block:: bash
     mlr put 'if ($x == 1) { $y = 2 }' # This is OK
 Bodies for compound statements may be empty:
-.. code-block::
+.. code-block:: bash
     mlr put 'if ($x == 1) { }' # This no-op is syntactically acceptable
@@ -956,7 +956,7 @@ Local variables can be defined either untyped as in ``x = 1``, or typed as in ``
 The reason for ``num`` is that ``int`` and ``float`` typedecls are very precise:
-.. code-block::
+.. code-block:: bash
     float a = 0; # Runtime error since 0 is int not float
     int b = 1.0; # Runtime error since 1.0 is float not int
@@ -967,7 +967,7 @@ A suggestion is to use ``num`` for general use when you want numeric content, an
 The ``var`` type declaration indicates no type restrictions, e.g. ``var x = 1`` has the same type restrictions on ``x`` as ``x = 1``. The difference is in intentional shadowing: if you have ``x = 1`` in outer scope and ``x = 2`` in inner scope (e.g. within a for-loop or an if-statement) then outer-scope ``x`` has value 2 after the second assignment. But if you have ``var x = 2`` in the inner scope, then you are declaring a variable scoped to the inner block.) For example:
-.. code-block::
+.. code-block:: bash
     x = 1;
     if (NR == 4) {
@@ -975,7 +975,7 @@ The ``var`` type declaration indicates no type restrictions, e.g. ``var x = 1``
     }
     print x; # Value of x is now two
-.. code-block::
+.. code-block:: bash
     x = 1;
     if (NR == 4) {
@@ -985,7 +985,7 @@ The ``var`` type declaration indicates no type restrictions, e.g. ``var x = 1``
 Likewise function arguments can optionally be typed, with type enforced when the function is called:
-.. code-block::
+.. code-block:: bash
     func f(map m, int i) {
     ...
@@ -1000,7 +1000,7 @@ Likewise function arguments can optionally be typed, with type enforced when the
 Thirdly, function return values can be type-checked at the point of ``return`` using ``:`` and a typedecl after the parameter list:
-.. code-block::
+.. code-block:: bash
     func f(map m, int i): bool {
     ...
@@ -1395,7 +1395,7 @@ Operator precedence
 Operators are listed in order of decreasing precedence, highest first.
-.. code-block::
+.. code-block:: bash
     Operators Associativity
     --------- -------------
@@ -1498,11 +1498,11 @@ If-statements
 These are again reminiscent of ``awk``. Pattern-action blocks are a special case of ``if`` with no ``elif`` or ``else`` blocks, no ``if`` keyword, and parentheses optional around the boolean expression:
-.. code-block::
+.. code-block:: bash
     mlr put 'NR == 4 {$foo = "bar"}'
-.. code-block::
+.. code-block:: bash
     mlr put 'if (NR == 4) {$foo = "bar"}'


@@ -121,17 +121,17 @@ POKI_INCLUDE_AND_RUN_ESCAPED(data/trailing-commas.sh)HERE
 Bodies for all compound statements must be enclosed in **curly braces**, even if the body is a single statement:
-.. code-block::
+.. code-block:: bash
     mlr put 'if ($x == 1) $y = 2' # Syntax error
-.. code-block::
+.. code-block:: bash
     mlr put 'if ($x == 1) { $y = 2 }' # This is OK
 Bodies for compound statements may be empty:
-.. code-block::
+.. code-block:: bash
     mlr put 'if ($x == 1) { }' # This no-op is syntactically acceptable
@@ -360,7 +360,7 @@ Local variables can be defined either untyped as in ``x = 1``, or typed as in ``
 The reason for ``num`` is that ``int`` and ``float`` typedecls are very precise:
-.. code-block::
+.. code-block:: bash
     float a = 0; # Runtime error since 0 is int not float
     int b = 1.0; # Runtime error since 1.0 is float not int
@@ -371,7 +371,7 @@ A suggestion is to use ``num`` for general use when you want numeric content, an
 The ``var`` type declaration indicates no type restrictions, e.g. ``var x = 1`` has the same type restrictions on ``x`` as ``x = 1``. The difference is in intentional shadowing: if you have ``x = 1`` in outer scope and ``x = 2`` in inner scope (e.g. within a for-loop or an if-statement) then outer-scope ``x`` has value 2 after the second assignment. But if you have ``var x = 2`` in the inner scope, then you are declaring a variable scoped to the inner block.) For example:
-.. code-block::
+.. code-block:: bash
     x = 1;
     if (NR == 4) {
@@ -379,7 +379,7 @@ The ``var`` type declaration indicates no type restrictions, e.g. ``var x = 1``
     }
     print x; # Value of x is now two
-.. code-block::
+.. code-block:: bash
     x = 1;
     if (NR == 4) {
@@ -389,7 +389,7 @@ The ``var`` type declaration indicates no type restrictions, e.g. ``var x = 1``
 Likewise function arguments can optionally be typed, with type enforced when the function is called:
-.. code-block::
+.. code-block:: bash
     func f(map m, int i) {
     ...
@@ -404,7 +404,7 @@ Likewise function arguments can optionally be typed, with type enforced when the
 Thirdly, function return values can be type-checked at the point of ``return`` using ``:`` and a typedecl after the parameter list:
-.. code-block::
+.. code-block:: bash
     func f(map m, int i): bool {
     ...
@@ -463,7 +463,7 @@ Operator precedence
 Operators are listed in order of decreasing precedence, highest first.
-.. code-block::
+.. code-block:: bash
     Operators Associativity
     --------- -------------
@@ -524,11 +524,11 @@ If-statements
 These are again reminiscent of ``awk``. Pattern-action blocks are a special case of ``if`` with no ``elif`` or ``else`` blocks, no ``if`` keyword, and parentheses optional around the boolean expression:
-.. code-block::
+.. code-block:: bash
     mlr put 'NR == 4 {$foo = "bar"}'
-.. code-block::
+.. code-block:: bash
     mlr put 'if (NR == 4) {$foo = "bar"}'


@ -158,7 +158,7 @@ bootstrap
The canonical use for bootstrap sampling is to put error bars on statistical quantities, such as mean. For example:
.. code-block::
.. code-block:: bash
$ mlr --opprint stats1 -a mean,count -f u -g color data/colored-shapes.dkvp
color u_mean u_count
@ -169,7 +169,7 @@ The canonical use for bootstrap sampling is to put error bars on statistical qua
blue 0.517717 1470
orange 0.490532 303
.. code-block::
.. code-block:: bash
$ mlr --opprint bootstrap then stats1 -a mean,count -f u -g color data/colored-shapes.dkvp
color u_mean u_count
@ -180,7 +180,7 @@ The canonical use for bootstrap sampling is to put error bars on statistical qua
blue 0.512529 1496
orange 0.521030 321
.. code-block::
.. code-block:: bash
$ mlr --opprint bootstrap then stats1 -a mean,count -f u -g color data/colored-shapes.dkvp
color u_mean u_count
@ -191,7 +191,7 @@ The canonical use for bootstrap sampling is to put error bars on statistical qua
green 0.496803 1075
purple 0.486337 1199
.. code-block::
.. code-block:: bash
$ mlr --opprint bootstrap then stats1 -a mean,count -f u -g color data/colored-shapes.dkvp
color u_mean u_count
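The resampling idea behind ``bootstrap`` can also be sketched outside Miller. This is a minimal Python illustration of the technique, not Miller's implementation; the ``data`` values and resample count are made up for the example:

```python
import random

# Bootstrap: resample the data with replacement, same size as the
# original, then recompute the statistic. The spread of the statistic
# across resamples estimates its error bar.
def bootstrap_means(data, num_resamples):
    means = []
    for _ in range(num_resamples):
        resample = [random.choice(data) for _ in data]
        means.append(sum(resample) / len(resample))
    return means

data = [0.42, 0.51, 0.49, 0.55, 0.47, 0.53, 0.50, 0.44]
means = bootstrap_means(data, 1000)
print(f"mean spread across resamples: {min(means):.3f} .. {max(means):.3f}")
```

Each resample gives a slightly different mean, just as each ``mlr bootstrap`` run above gives slightly different ``u_mean`` values per color.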

View file

@ -69,7 +69,7 @@ POKI_RUN_COMMAND{{mlr bootstrap --help}}HERE
The canonical use for bootstrap sampling is to put error bars on statistical quantities, such as mean. For example:
.. code-block::
.. code-block:: bash
$ mlr --opprint stats1 -a mean,count -f u -g color data/colored-shapes.dkvp
color u_mean u_count
@ -80,7 +80,7 @@ The canonical use for bootstrap sampling is to put error bars on statistical qua
blue 0.517717 1470
orange 0.490532 303
.. code-block::
.. code-block:: bash
$ mlr --opprint bootstrap then stats1 -a mean,count -f u -g color data/colored-shapes.dkvp
color u_mean u_count
@ -91,7 +91,7 @@ The canonical use for bootstrap sampling is to put error bars on statistical qua
blue 0.512529 1496
orange 0.521030 321
.. code-block::
.. code-block:: bash
$ mlr --opprint bootstrap then stats1 -a mean,count -f u -g color data/colored-shapes.dkvp
color u_mean u_count
@ -102,7 +102,7 @@ The canonical use for bootstrap sampling is to put error bars on statistical qua
green 0.496803 1075
purple 0.486337 1199
.. code-block::
.. code-block:: bash
$ mlr --opprint bootstrap then stats1 -a mean,count -f u -g color data/colored-shapes.dkvp
color u_mean u_count

View file

@ -39,7 +39,7 @@ Formats
Options:
.. code-block::
.. code-block:: bash
--dkvp --idkvp --odkvp
--nidx --inidx --onidx
@ -97,14 +97,14 @@ Compression
Options:
.. code-block::
.. code-block:: bash
--prepipe {command}
The prepipe command is anything which reads from standard input and produces data acceptable to Miller. Nominally this allows you to use whichever decompression utilities you have installed on your system, on a per-file basis. If the command has flags, quote them: e.g. ``mlr --prepipe 'zcat -cf'``. Examples:
.. code-block::
.. code-block:: bash
# These two produce the same output:
$ gunzip < myfile1.csv.gz | mlr cut -f hostname,uptime
@ -113,14 +113,14 @@ The prepipe command is anything which reads from standard input and produces dat
$ mlr --prepipe gunzip cut -f hostname,uptime myfile1.csv.gz myfile2.csv.gz
$ mlr --prepipe gunzip --idkvp --oxtab cut -f hostname,uptime myfile1.dat.gz myfile2.dat.gz
.. code-block::
.. code-block:: bash
# Similar to the above, but with compressed output as well as input:
$ gunzip < myfile1.csv.gz | mlr cut -f hostname,uptime | gzip > outfile.csv.gz
$ mlr --prepipe gunzip cut -f hostname,uptime myfile1.csv.gz | gzip > outfile.csv.gz
$ mlr --prepipe gunzip cut -f hostname,uptime myfile1.csv.gz myfile2.csv.gz | gzip > outfile.csv.gz
.. code-block::
.. code-block:: bash
# Similar to the above, but with different compression tools for input and output:
$ gunzip < myfile1.csv.gz | mlr cut -f hostname,uptime | xz -z > outfile.csv.xz
@ -136,7 +136,7 @@ Miller has record separators ``IRS`` and ``ORS``, field separators ``IFS`` and `
Options:
.. code-block::
.. code-block:: bash
--rs --irs --ors
--fs --ifs --ofs --repifs
@ -157,7 +157,7 @@ Number formatting
The command-line option ``--ofmt {format string}`` is the global number format for commands which generate numeric output, e.g. ``stats1``, ``stats2``, ``histogram``, and ``step``, as well as ``mlr put``. Examples:
.. code-block::
.. code-block:: bash
--ofmt %.9le --ofmt %.6lf --ofmt %.0lf
@ -200,13 +200,13 @@ then-chaining
In accord with the `Unix philosophy <http://en.wikipedia.org/wiki/Unix_philosophy>`_, you can pipe data into or out of Miller. For example:
.. code-block::
.. code-block:: bash
mlr cut --complement -f os_version *.dat | mlr sort -f hostname,uptime
You can, if you like, instead simply chain commands together using the ``then`` keyword:
.. code-block::
.. code-block:: bash
mlr cut --complement -f os_version then sort -f hostname,uptime *.dat
@ -602,25 +602,25 @@ Regex captures of the form ``\0`` through ``\9`` are supported as
* Captures have in-function context for ``sub`` and ``gsub``. For example, the first ``\1,\2`` pair belongs to the first ``sub`` and the second ``\1,\2`` pair belongs to the second ``sub``:
.. code-block::
.. code-block:: bash
mlr put '$b = sub($a, "(..)_(...)", "\2-\1"); $c = sub($a, "(..)_(.)(..)", ":\1:\2:\3")'
* Captures endure for the entirety of a ``put`` for the ``=~`` and ``!=~`` operators. For example, here the ``\1,\2`` are set by the ``=~`` operator and are used by both subsequent assignment statements:
.. code-block::
.. code-block:: bash
mlr put '$a =~ "(..)_(....)"; $b = "left_\1"; $c = "right_\2"'
* The captures are not retained across multiple puts. For example, here the ``\1,\2`` won't be expanded from the regex capture:
.. code-block::
.. code-block:: bash
mlr put '$a =~ "(..)_(....)"' then {... something else ...} then put '$b = "left_\1"; $c = "right_\2"'
* Captures are ignored in ``filter`` for the ``=~`` and ``!=~`` operators. For example, there is no mechanism provided to refer to the first ``(..)`` as ``\1`` or to the second ``(....)`` as ``\2`` in the following filter statement:
.. code-block::
.. code-block:: bash
mlr filter '$a =~ "(..)_(....)"'
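As a rough analogy for the capture-scoping rules above, Python's ``re`` module makes the scope explicit: captures live on the match object, so a ``\1``-style reference cannot outlive the match that set it. This is only an analogy to Miller's DSL, not its implementation:

```python
import re

# Each match carries its own captures, much as Miller's =~ sets
# \1..\9 for the remainder of the enclosing put expression.
m = re.match(r"(..)_(....)", "ab_cdef")
if m:
    b = "left_" + m.group(1)   # like $b = "left_\1"
    c = "right_" + m.group(2)  # like $c = "right_\2"
print(b, c)  # left_ab right_cdef
```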
@ -650,7 +650,7 @@ The short of it is that Miller does this transparently for you so you needn't th
Implementation details of this, for the interested: integer adds and subtracts overflow by at most one bit so it suffices to check sign-changes. Thus, Miller allows you to add and subtract arbitrary 64-bit signed integers, converting only to float precisely when the result is less than -2\ :sup:`63` or greater than 2\ :sup:`63`\ -1. Multiplies, on the other hand, can overflow by a word size and a sign-change technique does not suffice to detect overflow. Instead Miller tests whether the floating-point product exceeds the representable integer range. Now, 64-bit integers have 64-bit precision while IEEE-doubles have only 52-bit mantissas -- so, there are 53 bits including implicit leading one. The following experiment explicitly demonstrates the resolution at this range:
.. code-block::
.. code-block:: bash
64-bit integer 64-bit integer Casted to double Back to 64-bit
in hex in decimal integer
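The resolution loss described above is easy to reproduce. A small Python check (Python ints are arbitrary-precision, so the round trip through ``float`` exposes the 53-bit mantissa limit; this is a sketch of the idea, not Miller's code):

```python
# Just below 2^63, doubles are spaced 1024 apart (53 significant bits),
# so casting to float and back does not return the original integer.
n = 2**63 - 1                  # max signed 64-bit integer
roundtrip = int(float(n))
print(n, roundtrip, roundtrip - n)

# Overflow test in the style described: does the floating-point
# product leave the representable signed-64-bit range?
a, b = 2**62, 3
overflows = not (-(2.0**63) <= float(a) * float(b) <= 2.0**63 - 1)
print(overflows)
```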

View file

@ -32,7 +32,7 @@ Formats
Options:
.. code-block::
.. code-block:: bash
--dkvp --idkvp --odkvp
--nidx --inidx --onidx
@ -72,14 +72,14 @@ Compression
Options:
.. code-block::
.. code-block:: bash
--prepipe {command}
The prepipe command is anything which reads from standard input and produces data acceptable to Miller. Nominally this allows you to use whichever decompression utilities you have installed on your system, on a per-file basis. If the command has flags, quote them: e.g. ``mlr --prepipe 'zcat -cf'``. Examples:
.. code-block::
.. code-block:: bash
# These two produce the same output:
$ gunzip < myfile1.csv.gz | mlr cut -f hostname,uptime
@ -88,14 +88,14 @@ The prepipe command is anything which reads from standard input and produces dat
$ mlr --prepipe gunzip cut -f hostname,uptime myfile1.csv.gz myfile2.csv.gz
$ mlr --prepipe gunzip --idkvp --oxtab cut -f hostname,uptime myfile1.dat.gz myfile2.dat.gz
.. code-block::
.. code-block:: bash
# Similar to the above, but with compressed output as well as input:
$ gunzip < myfile1.csv.gz | mlr cut -f hostname,uptime | gzip > outfile.csv.gz
$ mlr --prepipe gunzip cut -f hostname,uptime myfile1.csv.gz | gzip > outfile.csv.gz
$ mlr --prepipe gunzip cut -f hostname,uptime myfile1.csv.gz myfile2.csv.gz | gzip > outfile.csv.gz
.. code-block::
.. code-block:: bash
# Similar to the above, but with different compression tools for input and output:
$ gunzip < myfile1.csv.gz | mlr cut -f hostname,uptime | xz -z > outfile.csv.xz
@ -111,7 +111,7 @@ Miller has record separators ``IRS`` and ``ORS``, field separators ``IFS`` and `
Options:
.. code-block::
.. code-block:: bash
--rs --irs --ors
--fs --ifs --ofs --repifs
@ -132,7 +132,7 @@ Number formatting
The command-line option ``--ofmt {format string}`` is the global number format for commands which generate numeric output, e.g. ``stats1``, ``stats2``, ``histogram``, and ``step``, as well as ``mlr put``. Examples:
.. code-block::
.. code-block:: bash
--ofmt %.9le --ofmt %.6lf --ofmt %.0lf
@ -163,13 +163,13 @@ then-chaining
In accord with the `Unix philosophy <http://en.wikipedia.org/wiki/Unix_philosophy>`_, you can pipe data into or out of Miller. For example:
.. code-block::
.. code-block:: bash
mlr cut --complement -f os_version *.dat | mlr sort -f hostname,uptime
You can, if you like, instead simply chain commands together using the ``then`` keyword:
.. code-block::
.. code-block:: bash
mlr cut --complement -f os_version then sort -f hostname,uptime *.dat
@ -365,25 +365,25 @@ Regex captures of the form ``\0`` through ``\9`` are supported as
* Captures have in-function context for ``sub`` and ``gsub``. For example, the first ``\1,\2`` pair belongs to the first ``sub`` and the second ``\1,\2`` pair belongs to the second ``sub``:
.. code-block::
.. code-block:: bash
mlr put '$b = sub($a, "(..)_(...)", "\2-\1"); $c = sub($a, "(..)_(.)(..)", ":\1:\2:\3")'
* Captures endure for the entirety of a ``put`` for the ``=~`` and ``!=~`` operators. For example, here the ``\1,\2`` are set by the ``=~`` operator and are used by both subsequent assignment statements:
.. code-block::
.. code-block:: bash
mlr put '$a =~ "(..)_(....)"; $b = "left_\1"; $c = "right_\2"'
* The captures are not retained across multiple puts. For example, here the ``\1,\2`` won't be expanded from the regex capture:
.. code-block::
.. code-block:: bash
mlr put '$a =~ "(..)_(....)"' then {... something else ...} then put '$b = "left_\1"; $c = "right_\2"'
* Captures are ignored in ``filter`` for the ``=~`` and ``!=~`` operators. For example, there is no mechanism provided to refer to the first ``(..)`` as ``\1`` or to the second ``(....)`` as ``\2`` in the following filter statement:
.. code-block::
.. code-block:: bash
mlr filter '$a =~ "(..)_(....)"'
@ -413,7 +413,7 @@ The short of it is that Miller does this transparently for you so you needn't th
Implementation details of this, for the interested: integer adds and subtracts overflow by at most one bit so it suffices to check sign-changes. Thus, Miller allows you to add and subtract arbitrary 64-bit signed integers, converting only to float precisely when the result is less than -2\ :sup:`63` or greater than 2\ :sup:`63`\ -1. Multiplies, on the other hand, can overflow by a word size and a sign-change technique does not suffice to detect overflow. Instead Miller tests whether the floating-point product exceeds the representable integer range. Now, 64-bit integers have 64-bit precision while IEEE-doubles have only 52-bit mantissas -- so, there are 53 bits including implicit leading one. The following experiment explicitly demonstrates the resolution at this range:
.. code-block::
.. code-block:: bash
64-bit integer 64-bit integer Casted to double Back to 64-bit
in hex in decimal integer

View file

@ -13,7 +13,7 @@ I like to produce SQL-query output with header-column and tab delimiter: this is
For example, using default output formatting in ``mysql`` we get formatting like Miller's ``--opprint --barred``:
.. code-block::
.. code-block:: bash
:emphasize-lines: 1,1
$ mysql --database=mydb -e 'show columns in mytable'
@ -29,7 +29,7 @@ For example, using default output formatting in ``mysql`` we get formatting like
Using ``mysql``'s ``-B`` we get TSV output:
.. code-block::
.. code-block:: bash
:emphasize-lines: 1,1
$ mysql --database=mydb -B -e 'show columns in mytable' | mlr --itsvlite --opprint cat
@ -42,7 +42,7 @@ Using ``mysql``'s ``-B`` we get TSV output:
Since Miller handles TSV output, we can do as much or as little processing as we want in the SQL query, then send the rest on to Miller. This includes outputting as JSON, doing further selects/joins in Miller, doing stats, etc. etc.:
.. code-block::
.. code-block:: bash
:emphasize-lines: 1,1
$ mysql --database=mydb -B -e 'show columns in mytable' | mlr --itsvlite --ojson --jlistwrap --jvstack cat
@ -89,12 +89,12 @@ Since Miller handles TSV output, we can do as much or as little processing as we
}
]
.. code-block::
.. code-block:: bash
:emphasize-lines: 1,1
$ mysql --database=mydb -B -e 'select * from mytable' > query.tsv
.. code-block::
.. code-block:: bash
:emphasize-lines: 1,1
$ mlr --from query.tsv --t2p stats1 -a count -f id -g category,assigned_to
@ -118,7 +118,7 @@ One use of NIDX (value-only, no keys) format is for loading up SQL tables.
Create and load SQL table:
.. code-block::
.. code-block:: bash
:emphasize-lines: 1,1
mysql> CREATE TABLE abixy(
@ -130,19 +130,19 @@ Create and load SQL table:
);
Query OK, 0 rows affected (0.01 sec)
.. code-block::
.. code-block:: bash
:emphasize-lines: 1,1
bash$ mlr --onidx --fs comma cat data/medium > medium.nidx
.. code-block::
.. code-block:: bash
:emphasize-lines: 1,1
mysql> LOAD DATA LOCAL INFILE 'medium.nidx' REPLACE INTO TABLE abixy FIELDS TERMINATED BY ',' ;
Query OK, 10000 rows affected (0.07 sec)
Records: 10000 Deleted: 0 Skipped: 0 Warnings: 0
.. code-block::
.. code-block:: bash
:emphasize-lines: 1,1
mysql> SELECT COUNT(*) AS count FROM abixy;
@ -153,7 +153,7 @@ Create and load SQL table:
+-------+
1 row in set (0.00 sec)
.. code-block::
.. code-block:: bash
:emphasize-lines: 1,1
mysql> SELECT * FROM abixy LIMIT 10;
@ -174,7 +174,7 @@ Create and load SQL table:
Aggregate counts within SQL:
.. code-block::
.. code-block:: bash
:emphasize-lines: 1,1
mysql> SELECT a, b, COUNT(*) AS count FROM abixy GROUP BY a, b ORDER BY COUNT DESC;
@ -211,7 +211,7 @@ Aggregate counts within SQL:
Aggregate counts within Miller:
.. code-block::
.. code-block:: bash
:emphasize-lines: 1,1
$ mlr --opprint uniq -c -g a,b then sort -nr count data/medium
@ -234,7 +234,7 @@ Aggregate counts within Miller:
Pipe SQL output to aggregate counts within Miller:
.. code-block::
.. code-block:: bash
:emphasize-lines: 1,1
$ mysql -D miller -B -e 'select * from abixy' | mlr --itsv --opprint uniq -c -g a,b then sort -nr count

View file

@ -10,7 +10,7 @@ I like to produce SQL-query output with header-column and tab delimiter: this is
For example, using default output formatting in ``mysql`` we get formatting like Miller's ``--opprint --barred``:
.. code-block::
.. code-block:: bash
:emphasize-lines: 1,1
$ mysql --database=mydb -e 'show columns in mytable'
@ -26,7 +26,7 @@ For example, using default output formatting in ``mysql`` we get formatting like
Using ``mysql``'s ``-B`` we get TSV output:
.. code-block::
.. code-block:: bash
:emphasize-lines: 1,1
$ mysql --database=mydb -B -e 'show columns in mytable' | mlr --itsvlite --opprint cat
@ -39,7 +39,7 @@ Using ``mysql``'s ``-B`` we get TSV output:
Since Miller handles TSV output, we can do as much or as little processing as we want in the SQL query, then send the rest on to Miller. This includes outputting as JSON, doing further selects/joins in Miller, doing stats, etc. etc.:
.. code-block::
.. code-block:: bash
:emphasize-lines: 1,1
$ mysql --database=mydb -B -e 'show columns in mytable' | mlr --itsvlite --ojson --jlistwrap --jvstack cat
@ -86,12 +86,12 @@ Since Miller handles TSV output, we can do as much or as little processing as we
}
]
.. code-block::
.. code-block:: bash
:emphasize-lines: 1,1
$ mysql --database=mydb -B -e 'select * from mytable' > query.tsv
.. code-block::
.. code-block:: bash
:emphasize-lines: 1,1
$ mlr --from query.tsv --t2p stats1 -a count -f id -g category,assigned_to
@ -115,7 +115,7 @@ One use of NIDX (value-only, no keys) format is for loading up SQL tables.
Create and load SQL table:
.. code-block::
.. code-block:: bash
:emphasize-lines: 1,1
mysql> CREATE TABLE abixy(
@ -127,19 +127,19 @@ Create and load SQL table:
);
Query OK, 0 rows affected (0.01 sec)
.. code-block::
.. code-block:: bash
:emphasize-lines: 1,1
bash$ mlr --onidx --fs comma cat data/medium > medium.nidx
.. code-block::
.. code-block:: bash
:emphasize-lines: 1,1
mysql> LOAD DATA LOCAL INFILE 'medium.nidx' REPLACE INTO TABLE abixy FIELDS TERMINATED BY ',' ;
Query OK, 10000 rows affected (0.07 sec)
Records: 10000 Deleted: 0 Skipped: 0 Warnings: 0
.. code-block::
.. code-block:: bash
:emphasize-lines: 1,1
mysql> SELECT COUNT(*) AS count FROM abixy;
@ -150,7 +150,7 @@ Create and load SQL table:
+-------+
1 row in set (0.00 sec)
.. code-block::
.. code-block:: bash
:emphasize-lines: 1,1
mysql> SELECT * FROM abixy LIMIT 10;
@ -171,7 +171,7 @@ Create and load SQL table:
Aggregate counts within SQL:
.. code-block::
.. code-block:: bash
:emphasize-lines: 1,1
mysql> SELECT a, b, COUNT(*) AS count FROM abixy GROUP BY a, b ORDER BY COUNT DESC;
@ -208,7 +208,7 @@ Aggregate counts within SQL:
Aggregate counts within Miller:
.. code-block::
.. code-block:: bash
:emphasize-lines: 1,1
$ mlr --opprint uniq -c -g a,b then sort -nr count data/medium
@ -231,7 +231,7 @@ Aggregate counts within Miller:
Pipe SQL output to aggregate counts within Miller:
.. code-block::
.. code-block:: bash
:emphasize-lines: 1,1
$ mysql -D miller -B -e 'select * from abixy' | mlr --itsv --opprint uniq -c -g a,b then sort -nr count