mirror of
https://github.com/johnkerl/miller.git
synced 2026-01-23 18:25:45 +00:00
135 lines
5.3 KiB
HTML
135 lines
5.3 KiB
HTML
<!DOCTYPE html>
|
|
<html lang="en">
|
|
|
|
<!-- PAGE GENERATED FROM template.html and content-for-performance.html BY poki. -->
|
|
<!-- PLEASE MAKE CHANGES THERE AND THEN RE-RUN poki. -->
|
|
<head>
|
|
<meta http-equiv="Content-type" content="text/html;charset=UTF-8"/>
|
|
<meta name="description" content="Miller documentation"/>
|
|
<meta name="viewport" content="width=device-width, initial-scale=1.0"/> <!-- mobile-friendly -->
|
|
<meta name="keywords"
|
|
content="John Kerl, Kerl, Miller, miller, mlr, OLAP, data analysis software, regression, correlation, variance, data tools, " />
|
|
|
|
<title> Performance </title>
|
|
<link rel="stylesheet" type="text/css" href="css/miller.css"/>
|
|
<link rel="stylesheet" type="text/css" href="css/poki-callbacks.css"/>
|
|
</head>
|
|
|
|
<!-- ================================================================ -->
|
|
<script type="text/javascript">
|
|
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
|
|
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
|
|
</script>
|
|
<script type="text/javascript">
|
|
try {
|
|
var pageTracker = _gat._getTracker("UA-15651652-1");
|
|
pageTracker._trackPageview();
|
|
} catch(err) {}
|
|
</script>
|
|
<!-- ================================================================ -->
|
|
|
|
<body bgcolor="#ffffff">
|
|
|
|
<!-- ================================================================ -->
|
|
|
|
<!-- navbar -->
|
|
<div class="pokinav">
|
|
<center><titleinbody>Miller</titleinbody></center>
|
|
|
|
<!-- NAVBAR GENERATED FROM template.html BY poki -->
|
|
<br/>
|
|
<a class="poki-navbar-element" href="index.html">Overview</a>
|
|
|
|
<a class="poki-navbar-element" href="faq.html">Using</a>
|
|
|
|
<a class="poki-navbar-element" href="reference.html">Reference</a>
|
|
|
|
<a class="poki-navbar-element" href="why.html"><b>Background</b></a>
|
|
|
|
<a class="poki-navbar-element" href="contact.html">Repository</a>
|
|
|
|
<br/>
|
|
<br/><a href="why.html">Why?</a>
|
|
<br/><a href="whyc.html">Why C?</a>
|
|
<br/><a href="whyc-details.html">Why C: details</a>
|
|
<br/><a href="etymology.html">Why call it Miller?</a>
|
|
<br/><a href="originality.html">How original is Miller?</a>
|
|
<br/><a href="performance.html"><b>Performance</b></a>
|
|
</div>
|
|
|
|
<!-- page body -->
|
|
<p/>
|
|
|
|
<!-- BODY COPIED FROM content-for-performance.html BY poki -->
|
|
<div class="pokitoc">
|
|
<center><titleinbody>Performance</titleinbody></center>
|
|
• <a href="#Disclaimer">Disclaimer</a><br/>
|
|
• <a href="#Summary">Summary</a><br/>
|
|
</div>
|
|
<p/>
|
|
|
|
<p/>
|
|
<button style="font-weight:bold;color:maroon;border:0" onclick="bodyToggler.expandAll();" href="javascript:;">Expand all sections</button>
|
|
<button style="font-weight:bold;color:maroon;border:0" onclick="bodyToggler.collapseAll();" href="javascript:;">Collapse all sections</button>
|
|
|
|
<a id="Disclaimer"/><h1>Disclaimer</h1>
|
|
|
|
In a previous version of this page (see <a
|
|
href="http://johnkerl.org/miller-releases/miller-5.1.0/doc/performance.html">here</a>)
|
|
I compared Miller to some items in the Unix toolkit in terms of run time. But
|
|
such comparisons are very much not apples-to-apples:
|
|
|
|
<ul>
|
|
|
|
<li/> Miller’s principal strength is that it handles <b>key-value data in
|
|
various formats</b> while the system tools <b>do not</b>. So if you time
|
|
<code>mlr sort</code> on a CSV file against system <code>sort</code>, it's not relevant
|
|
to say which is faster by how many percent — Miller will respect the
|
|
header line, leaving it in place, while the system sort will move it, sorting
|
|
it along with all the other header lines. This would be comparing the run times
|
|
of two programs produce different outputs. Likewise, <code>awk</code>
|
|
doesn’t respect header lines, although you can code up some CSV-handling
|
|
using <code>if (NR==1) { ... } else { ... }</code>. And that’s just CSV: I
|
|
don’t know any simple way to get <code>sort</code>, <code>awk</code>, etc. to
|
|
handle DKVP, JSON, etc. — which is the main rreason I wrote Miller.
|
|
|
|
<li/> <b>Implementations differ by platform</b>: one <code>awk</code> may be
|
|
fundamentally faster than another, and <code>mawk</code> has a very efficient
|
|
bytecode implementation — which handles positionally indexed data
|
|
far faster than Miller does.
|
|
|
|
<li/> The system <code>sort</code> command will, on some systems, handle
|
|
too-large-for-RAM datasets by spilling to disk; Miller (as of version 5.2.0,
|
|
mid-2017) does not. Miller sorts are always stable; GNU supports stable and
|
|
unstable variants.
|
|
|
|
<li/> Etc.
|
|
|
|
</ul>
|
|
|
|
<a id="Summary"/><h1>Summary</h1>
|
|
|
|
<p/> Miller can do many kinds of processing on key-value-pair data using
|
|
elapsed time roughly of the same order of magnitude as items in the Unix
|
|
toolkit can handle positionally indexed data. Specific results vary widely
|
|
by platform, implementation details, multi-core use (or not). Lastly,
|
|
specific special-purpose non-record-aware processing will run far faster
|
|
in <code>grep</code>, <code>sed</code>, etc.
|
|
|
|
</div>
|
|
|
|
<!-- ================================================================ -->
|
|
<script type="text/javascript" src="js/miller-doc-toggler.js"></script>
|
|
<!-- wtf -->
|
|
<script type="text/javascript">
|
|
// Put this at the bottom of the page since its constructor scans the
|
|
// document's div tags to find the toggleables.
|
|
const bodyToggler = new MillerDocToggler(
|
|
"body_section_toggle_",
|
|
'maroon',
|
|
'maroon',
|
|
);
|
|
</script>
|
|
|
|
</body>
|
|
</html>
|