mirror of
https://github.com/johnkerl/miller.git
synced 2026-01-24 02:36:15 +00:00
280 lines
No EOL
11 KiB
HTML
280 lines
No EOL
11 KiB
HTML
|
||
<!DOCTYPE html>
|
||
|
||
<html>
|
||
<head>
|
||
<meta charset="utf-8" />
|
||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||
<title>SQL examples — Miller 6.0.0-alpha documentation</title>
|
||
|
||
<link rel="stylesheet" href="_static/scrolls.css" type="text/css" />
|
||
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
|
||
<link rel="stylesheet" href="_static/print.css" type="text/css" />
|
||
|
||
<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
|
||
<script src="_static/jquery.js"></script>
|
||
<script src="_static/underscore.js"></script>
|
||
<script src="_static/doctools.js"></script>
|
||
<script src="_static/language_data.js"></script>
|
||
<script src="_static/theme_extras.js"></script>
|
||
<link rel="index" title="Index" href="genindex.html" />
|
||
<link rel="search" title="Search" href="search.html" />
|
||
<link rel="next" title="Data-cleaning examples" href="data-cleaning-examples.html" />
|
||
<link rel="prev" title="Log-processing examples" href="log-processing-examples.html" />
|
||
</head><body>
|
||
<div id="content">
|
||
<div class="header">
|
||
<h1 class="heading"><a href="index.html"
|
||
title="back to the documentation overview"><span>SQL examples</span></a></h1>
|
||
</div>
|
||
<div class="relnav" role="navigation" aria-label="related navigation">
|
||
<a href="log-processing-examples.html">« Log-processing examples</a> |
|
||
<a href="#">SQL examples</a>
|
||
| <a href="data-cleaning-examples.html">Data-cleaning examples »</a>
|
||
</div>
|
||
<div id="contentwrapper">
|
||
<div id="toc" role="navigation" aria-label="table of contents navigation">
|
||
<h3>Table of Contents</h3>
|
||
<ul>
|
||
<li><a class="reference internal" href="#">SQL examples</a><ul>
|
||
<li><a class="reference internal" href="#sql-output-examples">SQL-output examples</a></li>
|
||
<li><a class="reference internal" href="#sql-input-examples">SQL-input examples</a></li>
|
||
</ul>
|
||
</li>
|
||
</ul>
|
||
|
||
</div>
|
||
<div role="main">
|
||
|
||
<div class="section" id="sql-examples">
|
||
<h1>SQL examples<a class="headerlink" href="#sql-examples" title="Permalink to this headline">¶</a></h1>
|
||
<div class="section" id="sql-output-examples">
|
||
<span id="id1"></span><h2>SQL-output examples<a class="headerlink" href="#sql-output-examples" title="Permalink to this headline">¶</a></h2>
|
||
<p>I like to produce SQL-query output with header-column and tab delimiter: this is CSV but with a tab instead of a comma, also known as TSV. Then I post-process with <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">--tsv</span></code> or <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">--tsvlite</span></code>. This means I can do some (or all, or none) of my data processing within SQL queries, and some (or none, or all) of my data processing using Miller – whichever is most convenient for my needs at the moment.</p>
|
||
<p>For example, using default output formatting in <code class="docutils literal notranslate"><span class="pre">mysql</span></code> we get formatting like Miller’s <code class="docutils literal notranslate"><span class="pre">--opprint</span> <span class="pre">--barred</span></code>:</p>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mysql --database=mydb -e 'show columns in mytable'
|
||
</span> +------------------+--------------+------+-----+---------+-------+
|
||
| Field | Type | Null | Key | Default | Extra |
|
||
+------------------+--------------+------+-----+---------+-------+
|
||
| id | bigint(20) | NO | MUL | NULL | |
|
||
| category | varchar(256) | NO | | NULL | |
|
||
| is_permanent | tinyint(1) | NO | | NULL | |
|
||
| assigned_to | bigint(20) | YES | | NULL | |
|
||
| last_update_time | int(11) | YES | | NULL | |
|
||
+------------------+--------------+------+-----+---------+-------+
|
||
</pre></div>
|
||
</div>
|
||
<p>Using <code class="docutils literal notranslate"><span class="pre">mysql</span></code>’s <code class="docutils literal notranslate"><span class="pre">-B</span></code> we get TSV output:</p>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mysql --database=mydb -B -e 'show columns in mytable' | mlr --itsvlite --opprint cat
|
||
</span> Field Type Null Key Default Extra
|
||
id bigint(20) NO MUL NULL -
|
||
category varchar(256) NO - NULL -
|
||
is_permanent tinyint(1) NO - NULL -
|
||
assigned_to bigint(20) YES - NULL -
|
||
last_update_time int(11) YES - NULL -
|
||
</pre></div>
|
||
</div>
|
||
<p>Since Miller handles TSV output, we can do as much or as little processing as we want in the SQL query, then send the rest on to Miller. This includes outputting as JSON, doing further selects/joins in Miller, doing stats, etc. etc.:</p>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mysql --database=mydb -B -e 'show columns in mytable' | mlr --itsvlite --ojson --jlistwrap --jvstack cat
|
||
</span> [
|
||
{
|
||
"Field": "id",
|
||
"Type": "bigint(20)",
|
||
"Null": "NO",
|
||
"Key": "MUL",
|
||
"Default": "NULL",
|
||
"Extra": ""
|
||
},
|
||
{
|
||
"Field": "category",
|
||
"Type": "varchar(256)",
|
||
"Null": "NO",
|
||
"Key": "",
|
||
"Default": "NULL",
|
||
"Extra": ""
|
||
},
|
||
{
|
||
"Field": "is_permanent",
|
||
"Type": "tinyint(1)",
|
||
"Null": "NO",
|
||
"Key": "",
|
||
"Default": "NULL",
|
||
"Extra": ""
|
||
},
|
||
{
|
||
"Field": "assigned_to",
|
||
"Type": "bigint(20)",
|
||
"Null": "YES",
|
||
"Key": "",
|
||
"Default": "NULL",
|
||
"Extra": ""
|
||
},
|
||
{
|
||
"Field": "last_update_time",
|
||
"Type": "int(11)",
|
||
"Null": "YES",
|
||
"Key": "",
|
||
"Default": "NULL",
|
||
"Extra": ""
|
||
}
|
||
]
|
||
</pre></div>
|
||
</div>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mysql --database=mydb -B -e 'select * from mytable' > query.tsv
|
||
</span></pre></div>
|
||
</div>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --from query.tsv --t2p stats1 -a count -f id -g category,assigned_to
|
||
</span> category assigned_to id_count
|
||
special 10000978 207
|
||
special 10003924 385
|
||
special 10009872 168
|
||
standard 10000978 524
|
||
standard 10003924 392
|
||
standard 10009872 108
|
||
...
|
||
</pre></div>
|
||
</div>
|
||
<p>Again, all the examples in the CSV section apply here – just change the input-format flags.</p>
|
||
</div>
|
||
<div class="section" id="sql-input-examples">
|
||
<span id="id2"></span><h2>SQL-input examples<a class="headerlink" href="#sql-input-examples" title="Permalink to this headline">¶</a></h2>
|
||
<p>One use of NIDX (value-only, no keys) format is for loading up SQL tables.</p>
|
||
<p>Create and load SQL table:</p>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>mysql> CREATE TABLE abixy(
|
||
a VARCHAR(32),
|
||
b VARCHAR(32),
|
||
i BIGINT(10),
|
||
x DOUBLE,
|
||
y DOUBLE
|
||
);
|
||
Query OK, 0 rows affected (0.01 sec)
|
||
|
||
bash$ mlr --onidx --fs comma cat data/medium > medium.nidx
|
||
|
||
mysql> LOAD DATA LOCAL INFILE 'medium.nidx' REPLACE INTO TABLE abixy FIELDS TERMINATED BY ',' ;
|
||
Query OK, 10000 rows affected (0.07 sec)
|
||
Records: 10000 Deleted: 0 Skipped: 0 Warnings: 0
|
||
|
||
mysql> SELECT COUNT(*) AS count FROM abixy;
|
||
+-------+
|
||
| count |
|
||
+-------+
|
||
| 10000 |
|
||
+-------+
|
||
1 row in set (0.00 sec)
|
||
|
||
mysql> SELECT * FROM abixy LIMIT 10;
|
||
+------+------+------+---------------------+---------------------+
|
||
| a | b | i | x | y |
|
||
+------+------+------+---------------------+---------------------+
|
||
| pan | pan | 1 | 0.3467901443380824 | 0.7268028627434533 |
|
||
| eks | pan | 2 | 0.7586799647899636 | 0.5221511083334797 |
|
||
| wye | wye | 3 | 0.20460330576630303 | 0.33831852551664776 |
|
||
| eks | wye | 4 | 0.38139939387114097 | 0.13418874328430463 |
|
||
| wye | pan | 5 | 0.5732889198020006 | 0.8636244699032729 |
|
||
| zee | pan | 6 | 0.5271261600918548 | 0.49322128674835697 |
|
||
| eks | zee | 7 | 0.6117840605678454 | 0.1878849191181694 |
|
||
| zee | wye | 8 | 0.5985540091064224 | 0.976181385699006 |
|
||
| hat | wye | 9 | 0.03144187646093577 | 0.7495507603507059 |
|
||
| pan | wye | 10 | 0.5026260055412137 | 0.9526183602969864 |
|
||
+------+------+------+---------------------+---------------------+
|
||
</pre></div>
|
||
</div>
|
||
<p>Aggregate counts within SQL:</p>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>mysql> SELECT a, b, COUNT(*) AS count FROM abixy GROUP BY a, b ORDER BY COUNT DESC;
|
||
+------+------+-------+
|
||
| a | b | count |
|
||
+------+------+-------+
|
||
| zee | wye | 455 |
|
||
| pan | eks | 429 |
|
||
| pan | pan | 427 |
|
||
| wye | hat | 426 |
|
||
| hat | wye | 423 |
|
||
| pan | hat | 417 |
|
||
| eks | hat | 417 |
|
||
| pan | zee | 413 |
|
||
| eks | eks | 413 |
|
||
| zee | hat | 409 |
|
||
| eks | wye | 407 |
|
||
| zee | zee | 403 |
|
||
| pan | wye | 395 |
|
||
| wye | pan | 392 |
|
||
| zee | eks | 391 |
|
||
| zee | pan | 389 |
|
||
| hat | eks | 389 |
|
||
| wye | eks | 386 |
|
||
| wye | zee | 385 |
|
||
| hat | zee | 385 |
|
||
| hat | hat | 381 |
|
||
| wye | wye | 377 |
|
||
| eks | pan | 371 |
|
||
| hat | pan | 363 |
|
||
| eks | zee | 357 |
|
||
+------+------+-------+
|
||
25 rows in set (0.01 sec)
|
||
</pre></div>
|
||
</div>
|
||
<p>Aggregate counts within Miller:</p>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mlr --opprint uniq -c -g a,b then sort -nr count data/medium
|
||
</span> a b count
|
||
zee wye 455
|
||
pan eks 429
|
||
pan pan 427
|
||
wye hat 426
|
||
hat wye 423
|
||
pan hat 417
|
||
eks hat 417
|
||
eks eks 413
|
||
pan zee 413
|
||
zee hat 409
|
||
eks wye 407
|
||
zee zee 403
|
||
pan wye 395
|
||
hat pan 363
|
||
eks zee 357
|
||
</pre></div>
|
||
</div>
|
||
<p>Pipe SQL output to aggregate counts within Miller:</p>
|
||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> mysql -D miller -B -e 'select * from abixy' | mlr --itsv --opprint uniq -c -g a,b then sort -nr count
|
||
</span> a b count
|
||
zee wye 455
|
||
pan eks 429
|
||
pan pan 427
|
||
wye hat 426
|
||
hat wye 423
|
||
pan hat 417
|
||
eks hat 417
|
||
eks eks 413
|
||
pan zee 413
|
||
zee hat 409
|
||
eks wye 407
|
||
zee zee 403
|
||
pan wye 395
|
||
wye pan 392
|
||
zee eks 391
|
||
zee pan 389
|
||
hat eks 389
|
||
wye eks 386
|
||
hat zee 385
|
||
wye zee 385
|
||
hat hat 381
|
||
wye wye 377
|
||
eks pan 371
|
||
hat pan 363
|
||
eks zee 357
|
||
</pre></div>
|
||
</div>
|
||
</div>
|
||
</div>
|
||
|
||
|
||
</div>
|
||
</div>
|
||
</div>
|
||
|
||
<div class="footer" role="contentinfo">
|
||
© Copyright 2021, John Kerl.
|
||
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 3.2.1.
|
||
</div>
|
||
</body>
|
||
</html> |