<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Unix-toolkit context &#8212; Miller 5.10.2 documentation</title>
<link rel="stylesheet" href="_static/classic.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/language_data.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="File formats" href="file-formats.html" />
<link rel="prev" title="Miller in 10 minutes" href="10min.html" />
</head><body>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="file-formats.html" title="File formats"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="10min.html" title="Miller in 10 minutes"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Unix-toolkit context</a></li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="unix-toolkit-context">
<h1>Unix-toolkit context<a class="headerlink" href="#unix-toolkit-context" title="Permalink to this headline"></a></h1>
<p>How does Miller fit within the Unix toolkit (<cite>grep</cite>, <cite>sed</cite>, <cite>awk</cite>, etc.)?</p>
<div class="section" id="file-format-awareness">
<h2>File-format awareness<a class="headerlink" href="#file-format-awareness" title="Permalink to this headline"></a></h2>
<p>Miller respects CSV headers. If you do <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">--csv</span> <span class="pre">cat</span> <span class="pre">*.csv</span></code> then the header line is written once, rather than once per input file:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ cat data/a.csv
a,b,c
1,2,3
4,5,6
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ cat data/b.csv
a,b,c
7,8,9
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --csv cat data/a.csv data/b.csv
a,b,c
1,2,3
4,5,6
7,8,9
</pre></div>
</div>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ mlr --csv sort -nr b data/a.csv data/b.csv
a,b,c
7,8,9
4,5,6
1,2,3
</pre></div>
</div>
<p>Likewise with <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">sort</span></code>, <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">tac</span></code>, and so on.</p>
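<p>For contrast, here is a minimal sketch of what header-unaware tools do with the same two files. It recreates <code class="docutils literal notranslate"><span class="pre">data/a.csv</span></code> and <code class="docutils literal notranslate"><span class="pre">data/b.csv</span></code> in a scratch directory (the directory and the shell variable names are assumptions made for illustration), shows that plain <code class="docutils literal notranslate"><span class="pre">cat</span></code> repeats the header line, and shows the kind of <code class="docutils literal notranslate"><span class="pre">awk</span></code> workaround that is needed to drop the repeats by hand:</p>

```shell
#!/bin/sh
# Recreate the two sample files shown above in a scratch directory.
# (The temporary directory is an assumption made for this sketch.)
workdir=$(mktemp -d)
mkdir -p "$workdir/data"
printf 'a,b,c\n1,2,3\n4,5,6\n' > "$workdir/data/a.csv"
printf 'a,b,c\n7,8,9\n' > "$workdir/data/b.csv"

# Plain cat is not format-aware: the header line appears once per file.
plain=$(cat "$workdir/data/a.csv" "$workdir/data/b.csv")
printf '%s\n' "$plain"

# An awk workaround: keep the very first line overall (NR == 1), and
# otherwise keep only non-header lines of each file (FNR > 1).
deduped=$(awk 'NR == 1 || FNR > 1' "$workdir/data/a.csv" "$workdir/data/b.csv")
printf '%s\n' "$deduped"
```

<p>The workaround only handles the simple case where every file has the same header in the same column order; it is shown here to illustrate the bookkeeping that format awareness removes.</p>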
</div>
<div class="section" id="awk-like-features-mlr-filter-and-mlr-put">
<h2>awk-like features: mlr filter and mlr put<a class="headerlink" href="#awk-like-features-mlr-filter-and-mlr-put" title="Permalink to this headline"></a></h2>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">filter</span></code> includes/excludes records based on a filter expression, e.g. <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">filter</span> <span class="pre">'$count</span> <span class="pre">&gt;</span> <span class="pre">10'</span></code>.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">put</span></code> adds a new field as a function of others, e.g. <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">put</span> <span class="pre">'$xy</span> <span class="pre">=</span> <span class="pre">$x</span> <span class="pre">*</span> <span class="pre">$y'</span></code> or <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">put</span> <span class="pre">'$counter</span> <span class="pre">=</span> <span class="pre">NR'</span></code>.</p></li>
<li><p>The <code class="docutils literal notranslate"><span class="pre">$name</span></code> syntax is straight from <code class="docutils literal notranslate"><span class="pre">awk</span></code>&#8217;s <code class="docutils literal notranslate"><span class="pre">$1</span> <span class="pre">$2</span> <span class="pre">$3</span></code> (adapted to name-based indexing), as are the variables <code class="docutils literal notranslate"><span class="pre">FS</span></code>, <code class="docutils literal notranslate"><span class="pre">OFS</span></code>, <code class="docutils literal notranslate"><span class="pre">RS</span></code>, <code class="docutils literal notranslate"><span class="pre">ORS</span></code>, <code class="docutils literal notranslate"><span class="pre">NF</span></code>, <code class="docutils literal notranslate"><span class="pre">NR</span></code>, and <code class="docutils literal notranslate"><span class="pre">FILENAME</span></code>. The <code class="docutils literal notranslate"><span class="pre">ENV[...]</span></code> syntax is from Ruby.</p></li>
<li><p>While <code class="docutils literal notranslate"><span class="pre">awk</span></code> functions are record-based, Miller subcommands (or <em>verbs</em>) are stream-based: each of them maps a stream of records into another stream of records.</p></li>
<li><p>Like <code class="docutils literal notranslate"><span class="pre">awk</span></code>, Miller (as of v5.0.0) allows you to define new functions within its <code class="docutils literal notranslate"><span class="pre">put</span></code> and <code class="docutils literal notranslate"><span class="pre">filter</span></code> expression language. Further programmability comes from chaining with <code class="docutils literal notranslate"><span class="pre">then</span></code>.</p></li>
<li><p>As with <code class="docutils literal notranslate"><span class="pre">awk</span></code>, <code class="docutils literal notranslate"><span class="pre">$</span></code>-variables are stream variables and all verbs (such as <code class="docutils literal notranslate"><span class="pre">cut</span></code>, <code class="docutils literal notranslate"><span class="pre">stats1</span></code>, <code class="docutils literal notranslate"><span class="pre">put</span></code>, etc.) as well as <code class="docutils literal notranslate"><span class="pre">put</span></code>/<code class="docutils literal notranslate"><span class="pre">filter</span></code> statements operate on streams. This means that you define actions to be done on each record and then stream your data through those actions. The built-in variables <code class="docutils literal notranslate"><span class="pre">NF</span></code>, <code class="docutils literal notranslate"><span class="pre">NR</span></code>, etc. change from one record to another, <code class="docutils literal notranslate"><span class="pre">$x</span></code> is a label for field <code class="docutils literal notranslate"><span class="pre">x</span></code> in the current record, and the input to <code class="docutils literal notranslate"><span class="pre">sqrt($x)</span></code> changes from one record to the next. The expression language for the <code class="docutils literal notranslate"><span class="pre">put</span></code> and <code class="docutils literal notranslate"><span class="pre">filter</span></code> verbs additionally allows you to define <code class="docutils literal notranslate"><span class="pre">begin</span> <span class="pre">{...}</span></code> and <code class="docutils literal notranslate"><span class="pre">end</span> <span class="pre">{...}</span></code> blocks for actions to be taken before and after records are processed, respectively.</p></li>
<li><p>As with <code class="docutils literal notranslate"><span class="pre">awk</span></code>, Miller&#8217;s <code class="docutils literal notranslate"><span class="pre">put</span></code>/<code class="docutils literal notranslate"><span class="pre">filter</span></code> language lets you set <code class="docutils literal notranslate"><span class="pre">&#64;sum=0</span></code> before records are read, then update that sum on each record, then print its value at the end. Unlike <code class="docutils literal notranslate"><span class="pre">awk</span></code>, Miller makes syntactically explicit the difference between variables with extent across all records (names starting with <code class="docutils literal notranslate"><span class="pre">&#64;</span></code>, such as <code class="docutils literal notranslate"><span class="pre">&#64;sum</span></code>) and variables which are local to the current expression (names not starting with <code class="docutils literal notranslate"><span class="pre">&#64;</span></code>, such as <code class="docutils literal notranslate"><span class="pre">sum</span></code>).</p></li>
<li><p>Miller can be faster than <code class="docutils literal notranslate"><span class="pre">awk</span></code>, <code class="docutils literal notranslate"><span class="pre">cut</span></code>, and so on, depending on platform; see also <a class="reference internal" href="performance.html"><span class="doc">Performance</span></a>. In particular, Miller&#8217;s DSL syntax is parsed into C control structures at startup time, with the bulk data-stream processing all done in C.</p></li>
</ul>
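<p>The running-sum pattern described in the list above can be sketched as follows. The <code class="docutils literal notranslate"><span class="pre">awk</span></code> half always runs; the Miller half runs only if <code class="docutils literal notranslate"><span class="pre">mlr</span></code> is found on the <code class="docutils literal notranslate"><span class="pre">PATH</span></code>. The scratch directory, the file names, and the choice to sum field <code class="docutils literal notranslate"><span class="pre">b</span></code> are assumptions made for illustration:</p>

```shell
#!/bin/sh
# Sample CSV files in a scratch directory (an assumption for this sketch).
workdir=$(mktemp -d)
printf 'a,b,c\n1,2,3\n4,5,6\n' > "$workdir/a.csv"
printf 'a,b,c\n7,8,9\n' > "$workdir/b.csv"

# awk: the field is addressed by position ($2), and the header line of
# each file has to be skipped by hand (FNR > 1).
awk_sum=$(awk -F, 'BEGIN { sum = 0 } FNR > 1 { sum += $2 } END { print sum }' \
  "$workdir/a.csv" "$workdir/b.csv")
echo "awk sum of b: $awk_sum"

# Miller: the field is addressed by name ($b), @sum persists across
# records, and --icsv takes care of the header lines.
# put -q suppresses per-record output so only the emitted sum is printed.
if command -v mlr >/dev/null; then
  mlr --icsv --opprint put -q '
    begin { @sum = 0 }
    @sum += $b;
    end { emit @sum }
  ' "$workdir/a.csv" "$workdir/b.csv"
fi
```

<p>In the Miller version the <code class="docutils literal notranslate"><span class="pre">&#64;</span></code> prefix is what marks <code class="docutils literal notranslate"><span class="pre">&#64;sum</span></code> as persisting from one record to the next, whereas in <code class="docutils literal notranslate"><span class="pre">awk</span></code> nothing in the name <code class="docutils literal notranslate"><span class="pre">sum</span></code> signals that.</p>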
</div>
<div class="section" id="see-also">
<h2>See also<a class="headerlink" href="#see-also" title="Permalink to this headline"></a></h2>
<p>See <a class="reference internal" href="reference-verbs.html"><span class="doc">Verbs reference</span></a> for more on Miller&#8217;s subcommands <code class="docutils literal notranslate"><span class="pre">cat</span></code>, <code class="docutils literal notranslate"><span class="pre">cut</span></code>, <code class="docutils literal notranslate"><span class="pre">head</span></code>, <code class="docutils literal notranslate"><span class="pre">sort</span></code>, <code class="docutils literal notranslate"><span class="pre">tac</span></code>, <code class="docutils literal notranslate"><span class="pre">tail</span></code>, <code class="docutils literal notranslate"><span class="pre">top</span></code>, and <code class="docutils literal notranslate"><span class="pre">uniq</span></code>, as well as <a class="reference internal" href="reference-dsl.html"><span class="doc">DSL reference</span></a> for more on the awk-like <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">filter</span></code> and <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">put</span></code>.</p>
</div>
</div>
<div class="clearer"></div>
</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">Unix-toolkit context</a><ul>
<li><a class="reference internal" href="#file-format-awareness">File-format awareness</a></li>
<li><a class="reference internal" href="#awk-like-features-mlr-filter-and-mlr-put">awk-like features: mlr filter and mlr put</a></li>
<li><a class="reference internal" href="#see-also">See also</a></li>
</ul>
</li>
</ul>
<h4>Previous topic</h4>
<p class="topless"><a href="10min.html"
title="previous chapter">Miller in 10 minutes</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="file-formats.html"
title="next chapter">File formats</a></p>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="_sources/feature-comparison.rst.txt"
rel="nofollow">Show Source</a></li>
</ul>
</div>
<div id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="search.html" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" />
<input type="submit" value="Go" />
</form>
</div>
</div>
<script>$('#searchbox').show(0);</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
>index</a></li>
<li class="right" >
<a href="file-formats.html" title="File formats"
>next</a> |</li>
<li class="right" >
<a href="10min.html" title="Miller in 10 minutes"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Miller 5.10.2 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Unix-toolkit context</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2020, John Kerl.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 3.2.1.
</div>
</body>
</html>