miller/docs6/docs/_build/html/faq.html


<!DOCTYPE html>

<html>
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>FAQ &#8212; Miller 6.0.0-alpha documentation</title>
    <link rel="stylesheet" href="_static/classic.css" type="text/css" />
    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />

    <script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
    <script src="_static/jquery.js"></script>
    <script src="_static/underscore.js"></script>
    <script src="_static/doctools.js"></script>
    <script src="_static/language_data.js"></script>

    <link rel="index" title="Index" href="genindex.html" />
    <link rel="search" title="Search" href="search.html" />
    <link rel="next" title="Cookbook part 1: common patterns" href="cookbook.html" />
    <link rel="prev" title="Joins" href="joins.html" />
  </head><body>
    <div class="related" role="navigation" aria-label="related navigation">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="genindex.html" title="General Index"
             accesskey="I">index</a></li>
        <li class="right" >
          <a href="cookbook.html" title="Cookbook part 1: common patterns"
             accesskey="N">next</a> |</li>
        <li class="right" >
          <a href="joins.html" title="Joins"
             accesskey="P">previous</a> |</li>
        <li class="nav-item nav-item-0"><a href="index.html">Miller 6.0.0-alpha documentation</a> &#187;</li>
        <li class="nav-item nav-item-this"><a href="">FAQ</a></li>
      </ul>
    </div>

    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">
          <div class="body" role="main">

  <div class="section" id="faq">
<h1>FAQ<a class="headerlink" href="#faq" title="Permalink to this headline">¶</a></h1>
<div class="section" id="how-do-i-suppress-numeric-conversion">
<h2>How do I suppress numeric conversion?<a class="headerlink" href="#how-do-i-suppress-numeric-conversion" title="Permalink to this headline">¶</a></h2>
<p><strong>TL;DR use put -S</strong>.</p>
<p>Within <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">put</span></code> and <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">filter</span></code>, the default behavior for scanning input records is to parse them as integer, if possible, then as float, if possible, else leave them as string:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ cat data/scan-example-1.tbl
</span> value
 1
 2.0
 3x
 hello
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ mlr --pprint put &#39;$copy = $value; $type = typeof($value)&#39; data/scan-example-1.tbl
</span> value copy  type
 1     1     int
 2.0   2.0   float
 3x    3x    string
 hello hello string
</pre></div>
</div>
<p>The numeric-conversion rule is simple:</p>
<ul class="simple">
<li><p>Try to scan as integer (<code class="docutils literal notranslate"><span class="pre">&quot;1&quot;</span></code> should be int);</p></li>
<li><p>If that doesn’t succeed, try to scan as float (<code class="docutils literal notranslate"><span class="pre">&quot;1.0&quot;</span></code> should be float);</p></li>
<li><p>If that doesn’t succeed, leave the value as a string (<code class="docutils literal notranslate"><span class="pre">&quot;1x&quot;</span></code> is string).</p></li>
</ul>
<p>This is a sensible default: you should be able to put <code class="docutils literal notranslate"><span class="pre">'$z</span> <span class="pre">=</span> <span class="pre">$x</span> <span class="pre">+</span> <span class="pre">$y'</span></code> without having to write <code class="docutils literal notranslate"><span class="pre">'$z</span> <span class="pre">=</span> <span class="pre">int($x)</span> <span class="pre">+</span> <span class="pre">float($y)'</span></code>.  Also note that default output format for floating-point numbers created by <code class="docutils literal notranslate"><span class="pre">put</span></code> (and other verbs such as <code class="docutils literal notranslate"><span class="pre">stats1</span></code>) is six decimal places; you can override this using <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">--ofmt</span></code>.  Also note that Miller uses your system’s Go library functions whenever possible: e.g. <code class="docutils literal notranslate"><span class="pre">sscanf</span></code> for converting strings to integer or floating-point.</p>
<p>But now suppose you have data like these:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ cat data/scan-example-2.tbl
</span> value
 0001
 0002
 0005
 0005WA
 0006
 0007
 0007WA
 0008
 0009
 0010
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ mlr --pprint put &#39;$copy = $value; $type = typeof($value)&#39; data/scan-example-2.tbl
</span> value  copy   type
 0001   0001   int
 0002   0002   int
 0005   0005   int
 0005WA 0005WA string
 0006   0006   int
 0007   0007   int
 0007WA 0007WA string
 0008   0008   float
 0009   0009   float
 0010   0010   int
</pre></div>
</div>
<p>The same conversion rules as above are being used. Namely:</p>
<ul class="simple">
<li><p>By default field values are inferred to int, else float, else string;</p></li>
<li><p>leading zeroes indicate octal for integers (<code class="docutils literal notranslate"><span class="pre">sscanf</span></code> semantics);</p></li>
<li><p>since <code class="docutils literal notranslate"><span class="pre">0008</span></code> doesn’t scan as integer (leading 0 requests octal but 8 isn’t a valid octal digit), the float scan is tried next and it succeeds;</p></li>
<li><p>default floating-point output format is 6 decimal places (override with <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">--ofmt</span></code>).</p></li>
</ul>
<p>Taken individually the rules make sense; taken collectively they produce a mishmash of types here.</p>
<p>The solution is to <strong>use the -S flag</strong> for <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">put</span></code> and/or <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">filter</span></code>. Then all field values are left as string. You can type-coerce on demand using syntax like <code class="docutils literal notranslate"><span class="pre">'$z</span> <span class="pre">=</span> <span class="pre">int($x)</span> <span class="pre">+</span> <span class="pre">float($y)'</span></code>. (See also <a class="reference internal" href="reference-dsl.html"><span class="doc">DSL reference</span></a>; see also <a class="reference external" href="https://github.com/johnkerl/miller/issues/150">https://github.com/johnkerl/miller/issues/150</a>.)</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ mlr --pprint put -S &#39;$copy = $value; $type = typeof($value)&#39; data/scan-example-2.tbl
</span> value  copy   type
 0001   0001   int
 0002   0002   int
 0005   0005   int
 0005WA 0005WA string
 0006   0006   int
 0007   0007   int
 0007WA 0007WA string
 0008   0008   float
 0009   0009   float
 0010   0010   int
</pre></div>
</div>
</div>
<div class="section" id="how-do-i-examine-then-chaining">
<h2>How do I examine then-chaining?<a class="headerlink" href="#how-do-i-examine-then-chaining" title="Permalink to this headline">¶</a></h2>
<p>Then-chaining found in Miller is intended to function the same as Unix pipes, but with less keystroking. You can print your data one pipeline step at a time, to see what intermediate output at one step becomes the input to the next step.</p>
<p>First, look at the input data:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ cat data/then-example.csv
</span> Status,Payment_Type,Amount
 paid,cash,10.00
 pending,debit,20.00
 paid,cash,50.00
 pending,credit,40.00
 paid,debit,30.00
</pre></div>
</div>
<p>Next, run the first step of your command, omitting anything from the first <code class="docutils literal notranslate"><span class="pre">then</span></code> onward:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ mlr --icsv --opprint count-distinct -f Status,Payment_Type data/then-example.csv
</span> Status  Payment_Type count
 paid    cash         2
 pending debit        1
 pending credit       1
 paid    debit        1
</pre></div>
</div>
<p>After that, run it with the next <code class="docutils literal notranslate"><span class="pre">then</span></code> step included:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ mlr --icsv --opprint count-distinct -f Status,Payment_Type then sort -nr count data/then-example.csv
</span> Status  Payment_Type count
 paid    cash         2
 pending debit        1
 pending credit       1
 paid    debit        1
</pre></div>
</div>
<p>Now if you use <code class="docutils literal notranslate"><span class="pre">then</span></code> to include another verb after that, the columns <code class="docutils literal notranslate"><span class="pre">Status</span></code>, <code class="docutils literal notranslate"><span class="pre">Payment_Type</span></code>, and <code class="docutils literal notranslate"><span class="pre">count</span></code> will be the input to that verb.</p>
<p>Note, by the way, that you’ll get the same results using pipes:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ mlr --csv count-distinct -f Status,Payment_Type data/then-example.csv | mlr --icsv --opprint sort -nr count
</span> Status  Payment_Type count
 paid    cash         2
 pending debit        1
 pending credit       1
 paid    debit        1
</pre></div>
</div>
</div>
<div class="section" id="how-can-i-filter-by-date">
<h2>How can I filter by date?<a class="headerlink" href="#how-can-i-filter-by-date" title="Permalink to this headline">¶</a></h2>
<p>Given input like</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ cat dates.csv
</span> date,event
 2018-02-03,initialization
 2018-03-07,discovery
 2018-02-03,allocation
</pre></div>
</div>
<p>we can use <code class="docutils literal notranslate"><span class="pre">strptime</span></code> to parse the date field into seconds-since-epoch and then do numeric comparisons.  Simply match your input dataset’s date-formatting to the <a class="reference internal" href="reference-dsl.html#reference-dsl-strptime"><span class="std std-ref">strptime</span></a> format-string.  For example:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ mlr --csv filter &#39;strptime($date, &quot;%Y-%m-%d&quot;) &gt; strptime(&quot;2018-03-03&quot;, &quot;%Y-%m-%d&quot;)&#39; dates.csv
</span> date,event
 2018-03-07,discovery
</pre></div>
</div>
<p>Caveat: localtime-handling in timezones with DST is still a work in progress; see <a class="reference external" href="https://github.com/johnkerl/miller/issues/170">https://github.com/johnkerl/miller/issues/170</a>. See also <a class="reference external" href="https://github.com/johnkerl/miller/issues/208">https://github.com/johnkerl/miller/issues/208</a> – thanks &#64;aborruso!</p>
</div>
<div class="section" id="how-can-i-handle-commas-as-data-in-various-formats">
<h2>How can I handle commas-as-data in various formats?<a class="headerlink" href="#how-can-i-handle-commas-as-data-in-various-formats" title="Permalink to this headline">¶</a></h2>
<p><a class="reference internal" href="file-formats.html"><span class="doc">CSV</span></a> handles this well and by design:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ cat commas.csv
</span> Name,Role
 &quot;Xiao, Lin&quot;,administrator
 &quot;Khavari, Darius&quot;,tester
</pre></div>
</div>
<p>Likewise <a class="reference internal" href="file-formats.html#file-formats-json"><span class="std std-ref">Tabular JSON</span></a>:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ mlr --icsv --ojson cat commas.csv
</span> {
   &quot;Name&quot;: &quot;Xiao, Lin&quot;,
   &quot;Role&quot;: &quot;administrator&quot;
 }
 {
   &quot;Name&quot;: &quot;Khavari, Darius&quot;,
   &quot;Role&quot;: &quot;tester&quot;
 }
</pre></div>
</div>
<p>For Miller’s <a class="reference internal" href="file-formats.html#file-formats-xtab"><span class="std std-ref">vertical-tabular format</span></a> there is no escaping for carriage returns, but commas work fine:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ mlr --icsv --oxtab cat commas.csv
</span> Name Xiao, Lin
 Role administrator

 Name Khavari, Darius
 Role tester
</pre></div>
</div>
<p>But for <a class="reference internal" href="file-formats.html#file-formats-dkvp"><span class="std std-ref">Key-value_pairs</span></a> and <a class="reference internal" href="file-formats.html#file-formats-nidx"><span class="std std-ref">index-numbered</span></a>, commas are the default field separator. And – as of Miller 5.4.0 anyway – there is no CSV-style double-quote-handling like there is for CSV. So commas within the data look like delimiters:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ mlr --icsv --odkvp cat commas.csv
</span> Name=Xiao, Lin,Role=administrator
 Name=Khavari, Darius,Role=tester
</pre></div>
</div>
<p>One solution is to use a different delimiter, such as a pipe character:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ mlr --icsv --odkvp --ofs pipe cat commas.csv
</span> Name=Xiao, Lin|Role=administrator
 Name=Khavari, Darius|Role=tester
</pre></div>
</div>
<p>To be extra-sure to avoid data/delimiter clashes, you can also use control
characters as delimiters – here, control-A:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ mlr --icsv --odkvp --ofs &#39;\001&#39;  cat commas.csv | cat -v
</span> Name=Xiao, Lin\001Role=administrator
 Name=Khavari, Darius\001Role=tester
</pre></div>
</div>
</div>
<div class="section" id="how-can-i-handle-field-names-with-special-symbols-in-them">
<h2>How can I handle field names with special symbols in them?<a class="headerlink" href="#how-can-i-handle-field-names-with-special-symbols-in-them" title="Permalink to this headline">¶</a></h2>
<p>Simply surround the field names with curly braces:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ echo &#39;x.a=3,y:b=4,z/c=5&#39; | mlr put &#39;${product.all} = ${x.a} * ${y:b} * ${z/c}&#39;
</span> x.a=3,y:b=4,z/c=5,product.all=60
</pre></div>
</div>
</div>
<div class="section" id="how-to-escape-in-regexes">
<h2>How to escape ‘?’ in regexes?<a class="headerlink" href="#how-to-escape-in-regexes" title="Permalink to this headline">¶</a></h2>
<p>One way is to use square brackets; an alternative is to use simple string-substitution rather than a regular expression.</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ cat data/question.dat
</span> a=is it?,b=it is!
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ mlr --oxtab put &#39;$c = gsub($a, &quot;[?]&quot;,&quot; ...&quot;)&#39; data/question.dat
</span> a is it?
 b it is!
 c is it ...
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ mlr --oxtab put &#39;$c = ssub($a, &quot;?&quot;,&quot; ...&quot;)&#39; data/question.dat
</span> a is it?
 b it is!
 c is it ...
</pre></div>
</div>
<p>The <code class="docutils literal notranslate"><span class="pre">ssub</span></code> function exists precisely for this reason: so you don’t have to escape anything.</p>
</div>
<div class="section" id="how-can-i-put-single-quotes-into-strings">
<h2>How can I put single-quotes into strings?<a class="headerlink" href="#how-can-i-put-single-quotes-into-strings" title="Permalink to this headline">¶</a></h2>
<p>This is a little tricky due to the shell’s handling of quotes. For simplicity, let’s first put an update script into a file:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>$a = &quot;It&#39;s OK, I said, then &#39;for now&#39;.&quot;
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ echo a=bcd | mlr put -f data/single-quote-example.mlr
</span> a=It&#39;s OK, I said, then &#39;for now&#39;.
</pre></div>
</div>
<p>So, it’s simple: Miller’s DSL uses double quotes for strings, and you can put single quotes (or backslash-escaped double-quotes) inside strings, no problem.</p>
<p>Without putting the update expression in a file, it’s messier:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ echo a=bcd | mlr put &#39;$a=&quot;It&#39;\&#39;&#39;s OK, I said, &#39;\&#39;&#39;for now&#39;\&#39;&#39;.&quot;&#39;
</span> a=It&#39;s OK, I said, &#39;for now&#39;.
</pre></div>
</div>
<p>The idea is that the outermost single-quotes are to protect the <code class="docutils literal notranslate"><span class="pre">put</span></code> expression from the shell, and the double quotes within them are for Miller. To get a single quote in the middle there, you need to actually put it <em>outside</em> the single-quoting for the shell. The pieces are the following, all concatenated together:</p>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">$a=&quot;It</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">\'</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">s</span> <span class="pre">OK,</span> <span class="pre">I</span> <span class="pre">said,</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">\'</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">for</span> <span class="pre">now</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">\'</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">.</span></code></p></li>
</ul>
</div>
<div class="section" id="nr-is-not-consecutive-after-then-chaining">
<h2>NR is not consecutive after then-chaining<a class="headerlink" href="#nr-is-not-consecutive-after-then-chaining" title="Permalink to this headline">¶</a></h2>
<p>Given this input data:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ cat data/small
</span> a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
 a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
 a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
 a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
 a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
</pre></div>
</div>
<p>why don’t I see <code class="docutils literal notranslate"><span class="pre">NR=1</span></code> and <code class="docutils literal notranslate"><span class="pre">NR=2</span></code> here??</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ mlr filter &#39;$x &gt; 0.5&#39; then put &#39;$NR = NR&#39; data/small
</span> a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797,NR=2
 a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729,NR=5
</pre></div>
</div>
<p>The reason is that <code class="docutils literal notranslate"><span class="pre">NR</span></code> is computed for the original input records and isn’t dynamically updated. By contrast, <code class="docutils literal notranslate"><span class="pre">NF</span></code> is dynamically updated: it’s the number of fields in the current record, and if you add/remove a field, the value of <code class="docutils literal notranslate"><span class="pre">NF</span></code> will change:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ echo x=1,y=2,z=3 | mlr put &#39;$nf1 = NF; $u = 4; $nf2 = NF; unset $x,$y,$z; $nf3 = NF&#39;
</span> nf1=3,u=4,nf2=5,nf3=3
</pre></div>
</div>
<p><code class="docutils literal notranslate"><span class="pre">NR</span></code>, by contrast (and <code class="docutils literal notranslate"><span class="pre">FNR</span></code> as well), retains the value from the original input stream, and records may be dropped by a <code class="docutils literal notranslate"><span class="pre">filter</span></code> within a <code class="docutils literal notranslate"><span class="pre">then</span></code>-chain. To recover consecutive record numbers, you can use out-of-stream variables as follows:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ mlr --opprint --from data/small put &#39;
</span>   begin{ @nr1 = 0 }
   @nr1 += 1;
   $nr1 = @nr1
 &#39; \
 then filter &#39;$x&gt;0.5&#39; \
 then put &#39;
   begin{ @nr2 = 0 }
   @nr2 += 1;
   $nr2 = @nr2
 &#39;
 a   b   i x                  y                  nr1 nr2
 eks pan 2 0.7586799647899636 0.5221511083334797 2   1
 wye pan 5 0.5732889198020006 0.8636244699032729 5   2
</pre></div>
</div>
<p>Or, simply use <code class="docutils literal notranslate"><span class="pre">mlr</span> <span class="pre">cat</span> <span class="pre">-n</span></code>:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ mlr filter &#39;$x &gt; 0.5&#39; then cat -n data/small
</span> n=1,a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
 n=2,a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
</pre></div>
</div>
</div>
<div class="section" id="why-am-i-not-seeing-all-possible-joins-occur">
<h2>Why am I not seeing all possible joins occur?<a class="headerlink" href="#why-am-i-not-seeing-all-possible-joins-occur" title="Permalink to this headline">¶</a></h2>
<p><strong>This section describes behavior before Miller 5.1.0. As of 5.1.0, -u is the default.</strong></p>
<p>For example, the right file here has nine records, and the left file should add in the <code class="docutils literal notranslate"><span class="pre">hostname</span></code> column – so the join output should also have 9 records:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ mlr --icsvlite --opprint cat data/join-u-left.csv
</span> hostname              ipaddr
 nadir.east.our.org    10.3.1.18
 zenith.west.our.org   10.3.1.27
 apoapsis.east.our.org 10.4.5.94
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ mlr --icsvlite --opprint cat data/join-u-right.csv
</span> ipaddr    timestamp  bytes
 10.3.1.27 1448762579 4568
 10.3.1.18 1448762578 8729
 10.4.5.94 1448762579 17445
 10.3.1.27 1448762589 12
 10.3.1.18 1448762588 44558
 10.4.5.94 1448762589 8899
 10.3.1.27 1448762599 0
 10.3.1.18 1448762598 73425
 10.4.5.94 1448762599 12200
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ mlr --icsvlite --opprint join -s -j ipaddr -f data/join-u-left.csv data/join-u-right.csv
</span> ipaddr    hostname              timestamp  bytes
 10.3.1.27 zenith.west.our.org   1448762579 4568
 10.4.5.94 apoapsis.east.our.org 1448762579 17445
 10.4.5.94 apoapsis.east.our.org 1448762589 8899
 10.4.5.94 apoapsis.east.our.org 1448762599 12200
</pre></div>
</div>
<p>The issue is that Miller’s <code class="docutils literal notranslate"><span class="pre">join</span></code>, by default (before 5.1.0), took input sorted (lexically ascending) by the sort keys on both the left and right files.  This design decision was made intentionally to parallel the Unix/Linux system <code class="docutils literal notranslate"><span class="pre">join</span></code> command, which has the same semantics. The benefit of this default is that the joiner program can stream through the left and right files, needing to load neither entirely into memory. The drawback, of course, is that is requires sorted input.</p>
<p>The solution (besides pre-sorting the input files on the join keys) is to simply use <strong>mlr join -u</strong> (which is now the default). This loads the left file entirely into memory (while the right file is still streamed one line at a time) and does all possible joins without requiring sorted input:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ mlr --icsvlite --opprint join -u -j ipaddr -f data/join-u-left.csv data/join-u-right.csv
</span> ipaddr    hostname              timestamp  bytes
 10.3.1.27 zenith.west.our.org   1448762579 4568
 10.3.1.18 nadir.east.our.org    1448762578 8729
 10.4.5.94 apoapsis.east.our.org 1448762579 17445
 10.3.1.27 zenith.west.our.org   1448762589 12
 10.3.1.18 nadir.east.our.org    1448762588 44558
 10.4.5.94 apoapsis.east.our.org 1448762589 8899
 10.3.1.27 zenith.west.our.org   1448762599 0
 10.3.1.18 nadir.east.our.org    1448762598 73425
 10.4.5.94 apoapsis.east.our.org 1448762599 12200
</pre></div>
</div>
<p>General advice is to make sure the left-file is relatively small, e.g. containing name-to-number mappings, while saving large amounts of data for the right file.</p>
</div>
<div class="section" id="how-to-rectangularize-after-joins-with-unpaired">
<h2>How to rectangularize after joins with unpaired?<a class="headerlink" href="#how-to-rectangularize-after-joins-with-unpaired" title="Permalink to this headline">¶</a></h2>
<p>Suppose you have the following two data files:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>id,code
3,0000ff
2,00ff00
4,ff0000
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>id,color
4,red
2,green
</pre></div>
</div>
<p>Joining on color the results are as expected:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ mlr --csv join -j id -f data/color-codes.csv data/color-names.csv
</span> id,code,color
 4,ff0000,red
 2,00ff00,green
</pre></div>
</div>
<p>However, if we ask for left-unpaireds, since there’s no <code class="docutils literal notranslate"><span class="pre">color</span></code> column, we get a row not having the same column names as the other:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ mlr --csv join --ul -j id -f data/color-codes.csv data/color-names.csv
</span> id,code,color
 4,ff0000,red
 2,00ff00,green

 id,code
 3,0000ff
</pre></div>
</div>
<p>To fix this, we can use <strong>unsparsify</strong>:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll"> $ mlr --csv join --ul -j id -f data/color-codes.csv then unsparsify --fill-with &quot;&quot; data/color-names.csv
</span> id,code,color
 4,ff0000,red
 2,00ff00,green
 3,0000ff,
</pre></div>
</div>
<p>Thanks to &#64;aborruso for the tip!</p>
</div>
</div>


            <div class="clearer"></div>
          </div>
        </div>
      </div>
      <div class="sphinxsidebar" role="navigation" aria-label="main navigation">
        <div class="sphinxsidebarwrapper">
  <h3><a href="index.html">Table of Contents</a></h3>
  <ul>
<li><a class="reference internal" href="#">FAQ</a><ul>
<li><a class="reference internal" href="#how-do-i-suppress-numeric-conversion">How do I suppress numeric conversion?</a></li>
<li><a class="reference internal" href="#how-do-i-examine-then-chaining">How do I examine then-chaining?</a></li>
<li><a class="reference internal" href="#how-can-i-filter-by-date">How can I filter by date?</a></li>
<li><a class="reference internal" href="#how-can-i-handle-commas-as-data-in-various-formats">How can I handle commas-as-data in various formats?</a></li>
<li><a class="reference internal" href="#how-can-i-handle-field-names-with-special-symbols-in-them">How can I handle field names with special symbols in them?</a></li>
<li><a class="reference internal" href="#how-to-escape-in-regexes">How to escape ‘?’ in regexes?</a></li>
<li><a class="reference internal" href="#how-can-i-put-single-quotes-into-strings">How can I put single-quotes into strings?</a></li>
<li><a class="reference internal" href="#nr-is-not-consecutive-after-then-chaining">NR is not consecutive after then-chaining</a></li>
<li><a class="reference internal" href="#why-am-i-not-seeing-all-possible-joins-occur">Why am I not seeing all possible joins occur?</a></li>
<li><a class="reference internal" href="#how-to-rectangularize-after-joins-with-unpaired">How to rectangularize after joins with unpaired?</a></li>
</ul>
</li>
</ul>

  <h4>Previous topic</h4>
  <p class="topless"><a href="joins.html"
                        title="previous chapter">Joins</a></p>
  <h4>Next topic</h4>
  <p class="topless"><a href="cookbook.html"
                        title="next chapter">Cookbook part 1: common patterns</a></p>
  <div role="note" aria-label="source link">
    <h3>This Page</h3>
    <ul class="this-page-menu">
      <li><a href="_sources/faq.rst.txt"
            rel="nofollow">Show Source</a></li>
    </ul>
   </div>
<div id="searchbox" style="display: none" role="search">
  <h3 id="searchlabel">Quick search</h3>
    <div class="searchformwrapper">
    <form class="search" action="search.html" method="get">
      <input type="text" name="q" aria-labelledby="searchlabel" />
      <input type="submit" value="Go" />
    </form>
    </div>
</div>
<script>$('#searchbox').show(0);</script>
        </div>
      </div>
      <div class="clearer"></div>
    </div>
    <div class="related" role="navigation" aria-label="related navigation">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="genindex.html" title="General Index"
             >index</a></li>
        <li class="right" >
          <a href="cookbook.html" title="Cookbook part 1: common patterns"
             >next</a> |</li>
        <li class="right" >
          <a href="joins.html" title="Joins"
             >previous</a> |</li>
        <li class="nav-item nav-item-0"><a href="index.html">Miller 6.0.0-alpha documentation</a> &#187;</li>
        <li class="nav-item nav-item-this"><a href="">FAQ</a></li>
      </ul>
    </div>
    <div class="footer" role="contentinfo">
        &#169; Copyright 2020, John Kerl.
      Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 3.2.1.
    </div>
  </body>
</html>