mirror of
https://github.com/johnkerl/miller.git
synced 2026-01-24 02:36:15 +00:00
254 lines
No EOL
9.7 KiB
HTML
<!DOCTYPE html>

<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Log-processing examples — Miller 6.0.0-alpha documentation</title>

<link rel="stylesheet" href="_static/scrolls.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="_static/print.css" type="text/css" />

<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/doctools.js"></script>
<script src="_static/language_data.js"></script>
<script src="_static/theme_extras.js"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="SQL examples" href="sql-examples.html" />
<link rel="prev" title="Data-diving examples" href="data-diving-examples.html" />
</head><body>
<div id="content">
<div class="header">
<h1 class="heading"><a href="index.html"
title="back to the documentation overview"><span>Log-processing examples</span></a></h1>
</div>
<div class="relnav" role="navigation" aria-label="related navigation">
<a href="data-diving-examples.html">« Data-diving examples</a> |
<a href="#">Log-processing examples</a>
| <a href="sql-examples.html">SQL examples »</a>
</div>
<div id="contentwrapper">
<div id="toc" role="navigation" aria-label="table of contents navigation">
<h3>Table of Contents</h3>
<ul>
<li><a class="reference internal" href="#">Log-processing examples</a><ul>
<li><a class="reference internal" href="#generating-and-aggregating-log-file-output">Generating and aggregating log-file output</a></li>
<li><a class="reference internal" href="#parsing-log-file-output">Parsing log-file output</a></li>
</ul>
</li>
</ul>

</div>
<div role="main">

<div class="section" id="log-processing-examples">
<h1>Log-processing examples<a class="headerlink" href="#log-processing-examples" title="Permalink to this headline">¶</a></h1>
<p>Another of my favorite use-cases for Miller is doing ad-hoc processing of log-file data. Here’s where DKVP (delimited key-value pairs) format really shines. First, since the field names and field values are present on every line, every line stands on its own, so you can filter the file with <code class="docutils literal notranslate"><span class="pre">grep</span></code> or any other line-oriented tool. Second, not every line needs to have the same list of field names (“schema”).</p>
<div class="section" id="generating-and-aggregating-log-file-output">
<h2>Generating and aggregating log-file output<a class="headerlink" href="#generating-and-aggregating-log-file-output" title="Permalink to this headline">¶</a></h2>
<p>Again, all the examples in the CSV section apply here – just change the input-format flags. But there’s more you can do when not all the records have the same shape.</p>
<p>When writing a program – in any language whatsoever – you can have it print out log lines as it goes along, with items for various events jumbled together. After the program has finished running you can sort it all out, filter it, analyze it, and learn from it.</p>
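<p>As a minimal sketch of the producing side, assuming the program is written in Python: the helper names <code class="docutils literal notranslate"><span class="pre">format_dkvp</span></code> and <code class="docutils literal notranslate"><span class="pre">log_event</span></code> are hypothetical and not part of Miller. Each call prints one self-describing line, and different calls are free to use different field names.</p>

```python
import sys
import time


def format_dkvp(**fields):
    """Format keyword arguments as one DKVP line: key=value pairs joined by commas."""
    return ",".join(f"{key}={value}" for key, value in fields.items())


def log_event(stream=sys.stdout, **fields):
    """Write one self-describing log line to the given stream."""
    print(format_dkvp(**fields), file=stream)


if __name__ == "__main__":
    # Records of different shapes can be interleaved freely.
    log_event(op="enter", time=int(time.time()))
    log_event(op="cache", type="A9", hit=0)
    log_event(time=int(time.time()), batch_size=100, num_filtered=237)
```

<p>This sketch does no escaping, so it assumes that field values contain neither commas nor equals signs.</p>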
<p>Suppose your program has printed something like this (<a class="reference external" href="./log.txt">log.txt</a>):</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll">cat log.txt
</span>op=enter,time=1472819681
op=cache,type=A9,hit=0
op=cache,type=A4,hit=1
time=1472819690,batch_size=100,num_filtered=237
op=cache,type=A1,hit=1
op=cache,type=A9,hit=0
op=cache,type=A1,hit=1
op=cache,type=A9,hit=0
op=cache,type=A9,hit=0
op=cache,type=A1,hit=1
time=1472819705,batch_size=100,num_filtered=348
op=cache,type=A4,hit=1
op=cache,type=A9,hit=0
op=cache,type=A9,hit=0
op=cache,type=A9,hit=0
op=cache,type=A9,hit=0
op=cache,type=A4,hit=1
time=1472819713,batch_size=100,num_filtered=493
op=cache,type=A9,hit=1
op=cache,type=A1,hit=1
op=cache,type=A9,hit=0
op=cache,type=A9,hit=0
op=cache,type=A9,hit=1
time=1472819720,batch_size=100,num_filtered=554
op=cache,type=A1,hit=0
op=cache,type=A4,hit=1
op=cache,type=A9,hit=0
op=cache,type=A9,hit=0
op=cache,type=A9,hit=0
op=cache,type=A4,hit=0
op=cache,type=A4,hit=0
op=cache,type=A9,hit=0
time=1472819736,batch_size=100,num_filtered=612
op=cache,type=A1,hit=1
op=cache,type=A9,hit=0
op=cache,type=A9,hit=0
op=cache,type=A9,hit=0
op=cache,type=A9,hit=0
op=cache,type=A4,hit=1
op=cache,type=A1,hit=1
op=cache,type=A9,hit=0
op=cache,type=A9,hit=0
time=1472819742,batch_size=100,num_filtered=728
</pre></div>
</div>
<p>Each print statement simply contains local information: the current timestamp, whether a particular cache was hit or not, etc. Then, using the system <code class="docutils literal notranslate"><span class="pre">grep</span></code> command, Miller’s <code class="docutils literal notranslate"><span class="pre">having-fields</span></code> verb, or the <code class="docutils literal notranslate"><span class="pre">is_present</span></code> function, we can pick out the parts we want and analyze them:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll">grep op=cache log.txt \
</span><span class="hll">  | mlr --idkvp --opprint stats1 -a mean -f hit -g type then sort -f type
</span>type hit_mean
A1   0.8571428571428571
A4   0.7142857142857143
A9   0.09090909090909091
</pre></div>
</div>
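<p>To make the aggregation above concrete, here is a rough Python sketch of the same computation: keep the cache records, then average <code class="docutils literal notranslate"><span class="pre">hit</span></code> per <code class="docutils literal notranslate"><span class="pre">type</span></code>. It illustrates the idea only and is not how Miller implements <code class="docutils literal notranslate"><span class="pre">stats1</span></code>; the function names and the six-line sample (the first six lines of log.txt) are chosen for this illustration.</p>

```python
from collections import defaultdict


def parse_dkvp(line):
    """Split one DKVP line such as 'op=cache,type=A9,hit=0' into a dict of strings."""
    return dict(pair.split("=", 1) for pair in line.strip().split(","))


def mean_hit_by_type(lines):
    """Average the 'hit' field per 'type' over records whose op is 'cache'."""
    totals = defaultdict(lambda: [0, 0])  # type -> [sum of hits, record count]
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        record = parse_dkvp(line)
        if record.get("op") != "cache":
            continue  # records of other shapes are skipped, not rejected
        totals[record["type"]][0] += int(record["hit"])
        totals[record["type"]][1] += 1
    # Sort by type, as the 'sort -f type' step does.
    return {name: total / count for name, (total, count) in sorted(totals.items())}


sample = [
    "op=enter,time=1472819681",
    "op=cache,type=A9,hit=0",
    "op=cache,type=A4,hit=1",
    "time=1472819690,batch_size=100,num_filtered=237",
    "op=cache,type=A1,hit=1",
    "op=cache,type=A9,hit=0",
]
print(mean_hit_by_type(sample))
```

<p>One difference from the shell pipeline: <code class="docutils literal notranslate"><span class="pre">grep</span></code> matches the text <code class="docutils literal notranslate"><span class="pre">op=cache</span></code> anywhere on the line, whereas this sketch tests the parsed <code class="docutils literal notranslate"><span class="pre">op</span></code> field.</p>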
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll">mlr --from log.txt --opprint \
</span><span class="hll">  filter 'is_present($batch_size)' \
</span><span class="hll">  then step -a delta -f time,num_filtered \
</span><span class="hll">  then sec2gmt time
</span>time                 batch_size num_filtered time_delta num_filtered_delta
2016-09-02T12:34:50Z 100        237          0          0
2016-09-02T12:35:05Z 100        348          15         111
2016-09-02T12:35:13Z 100        493          8          145
2016-09-02T12:35:20Z 100        554          7          61
2016-09-02T12:35:36Z 100        612          16         58
2016-09-02T12:35:42Z 100        728          6          116
</pre></div>
</div>
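<p>The three chained steps above (keep the records that have <code class="docutils literal notranslate"><span class="pre">batch_size</span></code>, compute the change from the previous record, format the timestamp) can be sketched in Python as follows. This is an illustration of the logic under the assumption that the first record’s deltas are reported as 0, as in the output above; the function names are hypothetical and this is not Miller’s implementation.</p>

```python
from datetime import datetime, timezone


def parse_dkvp(line):
    """Split one DKVP line into a dict of strings."""
    return dict(pair.split("=", 1) for pair in line.strip().split(","))


def batch_deltas(lines):
    """Keep records that have batch_size and add deltas from the previous kept record."""
    rows = []
    previous = None  # (time, num_filtered) of the previous kept record
    for line in lines:
        record = parse_dkvp(line)
        if "batch_size" not in record:
            continue  # plays the role of filter 'is_present($batch_size)'
        seconds = int(record["time"])
        filtered = int(record["num_filtered"])
        stamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
        rows.append({
            "time": stamp.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "batch_size": int(record["batch_size"]),
            "num_filtered": filtered,
            "time_delta": 0 if previous is None else seconds - previous[0],
            "num_filtered_delta": 0 if previous is None else filtered - previous[1],
        })
        previous = (seconds, filtered)
    return rows


sample = [
    "time=1472819690,batch_size=100,num_filtered=237",
    "op=cache,type=A1,hit=1",
    "time=1472819705,batch_size=100,num_filtered=348",
]
for row in batch_deltas(sample):
    print(row)
```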
<p>Alternatively, we can simply group the similar data for a better look:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll">mlr --opprint group-like log.txt
</span>op    time
enter 1472819681

op    type hit
cache A9   0
cache A4   1
cache A1   1
cache A9   0
cache A1   1
cache A9   0
cache A9   0
cache A1   1
cache A4   1
cache A9   0
cache A9   0
cache A9   0
cache A9   0
cache A4   1
cache A9   1
cache A1   1
cache A9   0
cache A9   0
cache A9   1
cache A1   0
cache A4   1
cache A9   0
cache A9   0
cache A9   0
cache A4   0
cache A4   0
cache A9   0
cache A1   1
cache A9   0
cache A9   0
cache A9   0
cache A9   0
cache A4   1
cache A1   1
cache A9   0
cache A9   0

time       batch_size num_filtered
1472819690 100        237
1472819705 100        348
1472819713 100        493
1472819720 100        554
1472819736 100        612
1472819742 100        728
</pre></div>
</div>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll">mlr --opprint group-like then sec2gmt time log.txt
</span>op    time
enter 2016-09-02T12:34:41Z

op    type hit
cache A9   0
cache A4   1
cache A1   1
cache A9   0
cache A1   1
cache A9   0
cache A9   0
cache A1   1
cache A4   1
cache A9   0
cache A9   0
cache A9   0
cache A9   0
cache A4   1
cache A9   1
cache A1   1
cache A9   0
cache A9   0
cache A9   1
cache A1   0
cache A4   1
cache A9   0
cache A9   0
cache A9   0
cache A4   0
cache A4   0
cache A9   0
cache A1   1
cache A9   0
cache A9   0
cache A9   0
cache A9   0
cache A4   1
cache A1   1
cache A9   0
cache A9   0

time                 batch_size num_filtered
2016-09-02T12:34:50Z 100        237
2016-09-02T12:35:05Z 100        348
2016-09-02T12:35:13Z 100        493
2016-09-02T12:35:20Z 100        554
2016-09-02T12:35:36Z 100        612
2016-09-02T12:35:42Z 100        728
</pre></div>
</div>
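<p>The grouping shown above can be pictured as bucketing records by their list of field names. The Python sketch below assumes the grouping key is the ordered tuple of field names and that groups come out in first-seen order, which matches the output above; it is an illustration with hypothetical names, not a description of Miller’s internals.</p>

```python
def parse_dkvp(line):
    """Split one DKVP line into a dict of strings (insertion order is preserved)."""
    return dict(pair.split("=", 1) for pair in line.strip().split(","))


def group_like(lines):
    """Bucket records by their ordered field names, keeping first-seen group order."""
    groups = {}
    for line in lines:
        record = parse_dkvp(line)
        schema = tuple(record)  # e.g. ('op', 'type', 'hit')
        groups.setdefault(schema, []).append(record)
    return groups


sample = [
    "op=enter,time=1472819681",
    "op=cache,type=A9,hit=0",
    "op=cache,type=A4,hit=1",
    "time=1472819690,batch_size=100,num_filtered=237",
    "op=cache,type=A1,hit=1",
    "op=cache,type=A9,hit=0",
]
for schema, records in group_like(sample).items():
    print(" ".join(schema))
    for record in records:
        print(" ".join(record.values()))
    print()
```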
</div>
<div class="section" id="parsing-log-file-output">
<h2>Parsing log-file output<a class="headerlink" href="#parsing-log-file-output" title="Permalink to this headline">¶</a></h2>
<p>This, of course, depends highly on what’s in your log files. But, as an example, suppose you have log-file lines such as:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>2015-10-08 08:29:09,445 INFO com.company.path.to.ClassName @ [sometext] various/sorts/of data {& punctuation} hits=1 status=0 time=2.378
</pre></div>
</div>
<p>I prefer to pre-filter with <code class="docutils literal notranslate"><span class="pre">grep</span></code> and/or <code class="docutils literal notranslate"><span class="pre">sed</span></code> to extract the structured text, then hand that to Miller. In the example, <code class="docutils literal notranslate"><span class="pre">sed</span></code> strips everything up to and including the closing brace, leaving only the space-separated key-value pairs:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span><span class="hll">grep 'various/sorts' *.log \
</span><span class="hll">  | sed 's/.*} //' \
</span><span class="hll">  | mlr --fs space --repifs --oxtab stats1 -a min,p10,p50,p90,max -f time -g status
</span></pre></div>
</div>
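<p>For comparison, the extraction step can also be sketched in Python, assuming the structured part always follows the last closing brace as in the sample line. The names <code class="docutils literal notranslate"><span class="pre">KV_PATTERN</span></code> and <code class="docutils literal notranslate"><span class="pre">extract_fields</span></code> are hypothetical, and the sketch covers only the field extraction, not the statistics.</p>

```python
import re

# One key=value token: a word-like key, an equals sign, a run of non-space characters.
KV_PATTERN = re.compile(r"(\w+)=(\S+)")


def extract_fields(line):
    """Drop everything up to the last '} ' and parse the remaining key=value pairs."""
    # rpartition splits at the last occurrence, like the greedy sed pattern 's/.*} //'.
    _, _, tail = line.rpartition("} ")
    return dict(KV_PATTERN.findall(tail))


line = (
    "2015-10-08 08:29:09,445 INFO com.company.path.to.ClassName @ [sometext] "
    "various/sorts/of data {& punctuation} hits=1 status=0 time=2.378"
)
print(extract_fields(line))
```

<p>The values come back as strings, so convert them (for example with <code class="docutils literal notranslate"><span class="pre">float</span></code>) before computing statistics on them.</p>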
</div>
</div>


</div>
</div>
</div>

<div class="footer" role="contentinfo">
© Copyright 2021, John Kerl.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 3.2.1.
</div>
</body>
</html>