mirror of
https://github.com/johnkerl/miller.git
synced 2026-01-23 02:14:13 +00:00
194 lines
6.6 KiB
HTML
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">

<!-- PAGE GENERATED FROM template.html and content-for-feature-comparison.html BY poki. -->
<!-- PLEASE MAKE CHANGES THERE AND THEN RE-RUN poki. -->
<head>
<meta http-equiv="Content-type" content="text/html;charset=UTF-8"/>
<meta name="description" content="Miller documentation"/>
<meta name="viewport" content="width=device-width, initial-scale=1.0"/> <!-- mobile-friendly -->
<title> Miller features in the context of the Unix toolkit </title>
<link rel="stylesheet" type="text/css" href="css/miller.css"/>
<link rel="stylesheet" type="text/css" href="css/poki-callbacks.css"/>
</head>

<!-- ================================================================ -->
<script type="text/javascript">
  var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
  document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
  try {
    var pageTracker = _gat._getTracker("UA-15651652-1");
    pageTracker._trackPageview();
  } catch(err) {}
</script>

<script type="text/javascript">
  function toggle(divName) {
    var eleDiv = document.getElementById(divName);
    if (eleDiv != null) {
      if (eleDiv.style.display == "block") {
        eleDiv.style.display = "none";
      } else {
        eleDiv.style.display = "block";
      }
    }
  }
</script>

<!--
The background image is from a screenshot of a Google search for "data analysis
tools", lightened and sepia-toned. Over this was placed a Mac Terminal app with
very light-grey font and translucent background, in which a few statistical
Miller commands were run with pretty-print-tabular output format.
-->
<body background="pix/sepia-overlay.jpg">

<!-- ================================================================ -->
<table width="100%">
<tr>

<!-- navbar -->
<td width="15%">
<!--
<img src="pix/mlr.jpg" />
<img style="border-width:1px; color:black;" src="pix/mlr.jpg" />
-->

<div class="pokinav">
<center><titleinbody>Miller</titleinbody></center>

<!-- PAGE LIST GENERATED FROM template.html BY poki -->
<br/>User info:
<br/>• <a href="index.html">About</a>
<br/>• <a href="file-formats.html">File formats</a>
<br/>• <a href="feature-comparison.html"><b>Miller features in the context of the Unix toolkit</b></a>
<br/>• <a href="record-heterogeneity.html">Record-heterogeneity</a>
<br/>• <a href="performance.html">Performance</a>
<br/>• <a href="etymology.html">Why call it Miller?</a>
<br/>• <a href="originality.html">How original is Miller?</a>
<br/>• <a href="reference.html">Reference</a>
<br/>• <a href="data-examples.html">Data examples</a>
<br/>• <a href="to-do.html">Things to do</a>
<br/>Developer info:
<br/>• <a href="build.html">Compiling, portability, dependencies, and testing</a>
<br/>• <a href="whyc.html">Why C?</a>
<br/>• <a href="contact.html">Contact information</a>
<br/>• <a href="https://github.com/johnkerl/miller">GitHub repo</a>
<br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/>
<br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/> <br/>
<br/> <br/> <br/> <br/> <br/> <br/>
</div>
</td>

<!-- page body -->
<td>
<div style="overflow-y:scroll;height:1500px">
<center> <titleinbody> Miller features in the context of the Unix toolkit </titleinbody> </center>
<p/>

<!-- BODY COPIED FROM content-for-feature-comparison.html BY poki -->
<div class="pokitoc">
<center><b>Contents:</b></center>
• <a href="#File-format_awareness">File-format awareness</a><br/>
• <a href="#awk-like_features:_mlr_filter_and_mlr_put">awk-like features: mlr filter and mlr put</a><br/>
• <a href="#See_also">See also</a><br/>
</div>
<p/>

<a id="File-format_awareness"/><h1>File-format awareness</h1>

Miller respects CSV headers. If you do <tt>mlr --csv cat *.csv</tt> then the header line is written once:

<table><tr>
<td>
<p/>
<div class="pokipanel">
<pre>
$ cat a.csv
a,b,c
1,2,3
4,5,6
</pre>
</div>
<p/>
</td>
<td>
<p/>
<div class="pokipanel">
<pre>
$ cat b.csv
a,b,c
7,8,9
</pre>
</div>
<p/>
</td>
<td>
<p/>
<div class="pokipanel">
<pre>
$ mlr --csv cat a.csv b.csv
a,b,c
1,2,3
4,5,6
7,8,9
</pre>
</div>
<p/>
</td>
</tr></table>

Likewise with <tt>mlr sort</tt>, <tt>mlr tac</tt>, and so on.
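
<p/>For contrast, plain <tt>cat</tt> is not format-aware and repeats the header line of every file. The sketch below is not taken from the Miller distribution: it recreates the <tt>a.csv</tt> and <tt>b.csv</tt> files from the panels above in a scratch directory (the names <tt>workdir</tt>, <tt>plain</tt>, and <tt>merged</tt> are illustrative), and then shows a common <tt>awk</tt> workaround, which assumes that every input file carries the same header.

```shell
# Scratch directory so the sketch does not touch real files.
workdir=$(mktemp -d)
printf 'a,b,c\n1,2,3\n4,5,6\n' > "$workdir/a.csv"
printf 'a,b,c\n7,8,9\n' > "$workdir/b.csv"

# Plain cat is format-unaware: the header of b.csv reappears mid-stream.
plain=$(cat "$workdir/a.csv" "$workdir/b.csv")

# awk workaround (assumes identical headers): drop the first line of every
# file except the first. FNR restarts at 1 for each file; NR never restarts.
merged=$(awk 'FNR == 1 && NR != 1 { next } { print }' "$workdir/a.csv" "$workdir/b.csv")

printf '%s\n' "--- cat ---" "$plain" "--- awk ---" "$merged"

rm -r "$workdir"
```

The <tt>awk</tt> version only suppresses repeated first lines; it does not check that the headers actually match.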

<a id="awk-like_features:_mlr_filter_and_mlr_put"/><h1>awk-like features: mlr filter and mlr put</h1>

<ul>

<li/> <tt>mlr filter</tt> includes/excludes records based on a filter
expression, e.g. <tt>mlr filter '$count &gt; 10'</tt>.

<li/> <tt>mlr put</tt> adds a new field as a function of others, e.g.
<tt>mlr put '$xy = $x * $y'</tt> or <tt>mlr put '$counter = NR'</tt>.

<li/> The <tt>$name</tt> syntax is straight from <tt>awk</tt>’s
<tt>$1 $2 $3</tt> (adapted to name-based indexing), as are the variables <tt>FS</tt>,
<tt>OFS</tt>, <tt>RS</tt>, <tt>ORS</tt>, <tt>NF</tt>, <tt>NR</tt>, and
<tt>FILENAME</tt>.

<li/> While <tt>awk</tt> functions are record-based, Miller subcommands (or
functions, if you like) are stream-based: each of them maps a stream of records
into another stream of records.

<li/> Unlike <tt>awk</tt>, Miller doesn’t allow you to define new functions.
Its domain-specific languages are limited to the <tt>filter</tt> and
<tt>put</tt> syntax. Further programmability comes from chaining with
<tt>then</tt>.

<li/> Unlike with <tt>awk</tt>, all variables are stream variables and all
functions are stream functions. This means <tt>NF</tt>, <tt>NR</tt>, etc.
change from record to record, <tt>$x</tt> is a label for field <tt>x</tt> in
the current record, and the input to <tt>sqrt($x)</tt> changes from one record
to the next. Miller doesn’t let you set, say, <tt>sum=0</tt> and then
update that on each record.

<li/> Miller is faster than <tt>awk</tt>, <tt>cut</tt>, and so on (depending on
platform; see also <a href="performance.html">Performance</a>). In
particular, Miller’s DSL syntax is parsed into C control structures at
startup time, with the bulk data-stream processing all done in C.

</ul>
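
<p/>The contrasts with <tt>awk</tt> in the list above can be made concrete. The following sketch uses only standard <tt>awk</tt>, not Miller; the file <tt>xy.csv</tt> and the names <tt>workdir</tt>, <tt>col</tt>, <tt>products</tt>, and <tt>total</tt> are made up for illustration. The first command emulates name-based field access by building a name-to-position map from the header line, which is roughly the bookkeeping that <tt>mlr put '$xy = $x * $y'</tt> spares you. The second keeps a running total across records, the kind of user-set variable that the list says Miller does not offer.

```shell
# Scratch directory and a small CSV file (both hypothetical).
workdir=$(mktemp -d)
printf 'x,y\n2,3\n4,5\n' > "$workdir/xy.csv"

# Name-based access in plain awk: on the header line, remember the position
# of each column name and extend the header; on data lines, append x * y.
products=$(awk -F, '
  NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; print $0 ",xy"; next }
          { print $0 "," ($(col["x"]) * $(col["y"])) }
' "$workdir/xy.csv")

# A variable that persists across records: sum the first column,
# skipping the header line, and print the total at end of input.
total=$(awk -F, 'NR > 1 { sum += $1 } END { print sum }' "$workdir/xy.csv")

printf '%s\n' "$products" "total of x: $total"

rm -r "$workdir"
```

Note that the <tt>awk</tt> version has to handle the header line itself, whereas the Miller examples above refer to fields by name directly.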

<a id="See_also"/><h1>See also</h1>

<p/>See <a href="reference.html">Reference</a> for more on Miller’s
subcommands <tt>cat</tt>, <tt>cut</tt>, <tt>head</tt>, <tt>sort</tt>,
<tt>tac</tt>, <tt>tail</tt>, <tt>top</tt>, and <tt>uniq</tt>, as well as awk-like
<tt>mlr filter</tt> and <tt>mlr put</tt>.
</div>
</td>

</table>
</body>
</html>