diff --git a/doc/10-min.html b/doc/10-min.html index 8b58f5857..7696ab938 100644 --- a/doc/10-min.html +++ b/doc/10-min.html @@ -115,7 +115,6 @@ Miller commands were run with pretty-print-tabular output format.
• Cookbook part 1
• Cookbook part 2
• Cookbook part 3 -
• Cookbook part 4
• Data-diving examples
• Manpage
• Reference diff --git a/doc/build.html b/doc/build.html index 01ece4c72..0bd74910a 100644 --- a/doc/build.html +++ b/doc/build.html @@ -115,7 +115,6 @@ Miller commands were run with pretty-print-tabular output format.
• Cookbook part 1
• Cookbook part 2
• Cookbook part 3 -
• Cookbook part 4
• Data-diving examples
• Manpage
• Reference diff --git a/doc/contact.html b/doc/contact.html index e15b7d91b..b2714ee1c 100644 --- a/doc/contact.html +++ b/doc/contact.html @@ -115,7 +115,6 @@ Miller commands were run with pretty-print-tabular output format.
• Cookbook part 1
• Cookbook part 2
• Cookbook part 3 -
• Cookbook part 4
• Data-diving examples
• Manpage
• Reference diff --git a/doc/content-for-cookbook3.html b/doc/content-for-cookbook3.html index 9e5e3fe27..4050aeb89 100644 --- a/doc/content-for-cookbook3.html +++ b/doc/content-for-cookbook3.html @@ -1,7 +1,7 @@

-Comparing verbs and DSL +Stats with and without out-of-stream variables
POKI_PUT_TOC_HERE @@ -15,6 +15,117 @@ POKI_PUT_TOC_HERE
-

To do. +

One of Miller’s strengths is its compact notation: for example, given input of the form + +POKI_RUN_COMMAND{{head -n 5 ../data/medium}}HERE + +you can simply do + +POKI_RUN_COMMAND{{mlr --oxtab stats1 -a sum -f x ../data/medium}}HERE + +or + +POKI_RUN_COMMAND{{mlr --opprint stats1 -a sum -f x -g b ../data/medium}}HERE + +rather than the more tedious + +POKI_INCLUDE_AND_RUN_ESCAPED(oosvar-example-sum.sh)HERE + +or + +POKI_INCLUDE_AND_RUN_ESCAPED(oosvar-example-sum-grouped.sh)HERE + +

The former (mlr stats1 et al.) has the advantages of being easier +to type, being less error-prone to type, and running faster. + +

Nonetheless, out-of-stream variables (which I whimsically call +oosvars), begin/end blocks, and emit statements give you the ability to +implement logic — if you wish to do so — which isn’t present +in other Miller verbs. (If you find yourself often using the same +out-of-stream-variable logic over and over, please file a request at https://github.com/johnkerl/miller/issues +to get it implemented directly in C as a Miller verb of its own.) + +

The following examples compute some things using oosvars which are already +computable using Miller verbs, by way of providing food for thought. + +

+ +

Mean without/with oosvars

+ +
+ +POKI_RUN_COMMAND{{mlr --opprint stats1 -a mean -f x data/medium}}HERE +POKI_INCLUDE_AND_RUN_ESCAPED(data/mean-with-oosvars.sh)HERE + +
+ +

Keyed mean without/with oosvars

+ +
+ +POKI_RUN_COMMAND{{mlr --opprint stats1 -a mean -f x -g a,b data/medium}}HERE +POKI_INCLUDE_AND_RUN_ESCAPED(data/keyed-mean-with-oosvars.sh)HERE + +
+ +

Variance and standard deviation without/with oosvars

+ +
+ +POKI_RUN_COMMAND{{mlr --oxtab stats1 -a count,sum,mean,var,stddev -f x data/medium}}HERE +POKI_RUN_COMMAND{{cat variance.mlr}}HERE +POKI_RUN_COMMAND{{mlr --oxtab put -q -f variance.mlr data/medium}}HERE + +You can also do this keyed, of course, imitating the keyed-mean example above. + +
+ +

Min/max without/with oosvars

+ +
+ +POKI_RUN_COMMAND{{mlr --oxtab stats1 -a min,max -f x data/medium}}HERE + +POKI_RUN_COMMAND{{mlr --oxtab put -q '@x_min = min(@x_min, $x); @x_max = max(@x_max, $x); end{emitf @x_min, @x_max}' data/medium}}HERE + +
+ +

Keyed min/max without/with oosvars

+ +
+ +POKI_RUN_COMMAND{{mlr --opprint stats1 -a min,max -f x -g a data/medium}}HERE +POKI_INCLUDE_AND_RUN_ESCAPED(data/keyed-min-max-with-oosvars.sh)HERE + +
+ +

Delta without/with oosvars

+ +
+ +POKI_RUN_COMMAND{{mlr --opprint step -a delta -f x data/small}}HERE + +POKI_RUN_COMMAND{{mlr --opprint put '$x_delta = is_present(@last) ? $x - @last : 0; @last = $x' data/small}}HERE + +
+ +

Keyed delta without/with oosvars

+ +
+ +POKI_RUN_COMMAND{{mlr --opprint step -a delta -f x -g a data/small}}HERE + +POKI_RUN_COMMAND{{mlr --opprint put '$x_delta = is_present(@last[$a]) ? $x - @last[$a] : 0; @last[$a]=$x' data/small}}HERE + +
+ +

Exponentially weighted moving averages without/with oosvars

+ +
+ +POKI_INCLUDE_AND_RUN_ESCAPED(verb-example-ewma.sh)HERE + +POKI_INCLUDE_AND_RUN_ESCAPED(oosvar-example-ewma.sh)HERE
diff --git a/doc/content-for-cookbook4.html b/doc/content-for-cookbook4.html deleted file mode 100644 index 4050aeb89..000000000 --- a/doc/content-for-cookbook4.html +++ /dev/null @@ -1,131 +0,0 @@ - -

-

-Stats with and without out-of-stream variables -
- -POKI_PUT_TOC_HERE - -

- - - - -

Overview

- -
- -

One of Miller’s strengths is its compact notation: for example, given input of the form - -POKI_RUN_COMMAND{{head -n 5 ../data/medium}}HERE - -you can simply do - -POKI_RUN_COMMAND{{mlr --oxtab stats1 -a sum -f x ../data/medium}}HERE - -or - -POKI_RUN_COMMAND{{mlr --opprint stats1 -a sum -f x -g b ../data/medium}}HERE - -rather than the more tedious - -POKI_INCLUDE_AND_RUN_ESCAPED(oosvar-example-sum.sh)HERE - -or - -POKI_INCLUDE_AND_RUN_ESCAPED(oosvar-example-sum-grouped.sh)HERE - -

The former (mlr stats1 et al.) has the advantages of being easier -to type, being less error-prone to type, and running faster. - -

Nonetheless, out-of-stream variables (which I whimsically call -oosvars), begin/end blocks, and emit statements give you the ability to -implement logic — if you wish to do so — which isn’t present -in other Miller verbs. (If you find yourself often using the same -out-of-stream-variable logic over and over, please file a request at https://github.com/johnkerl/miller/issues -to get it implemented directly in C as a Miller verb of its own.) - -

The following examples compute some things using oosvars which are already -computable using Miller verbs, by way of providing food for thought. - -

- -

Mean without/with oosvars

- -
- -POKI_RUN_COMMAND{{mlr --opprint stats1 -a mean -f x data/medium}}HERE -POKI_INCLUDE_AND_RUN_ESCAPED(data/mean-with-oosvars.sh)HERE - -
- -

Keyed mean without/with oosvars

- -
- -POKI_RUN_COMMAND{{mlr --opprint stats1 -a mean -f x -g a,b data/medium}}HERE -POKI_INCLUDE_AND_RUN_ESCAPED(data/keyed-mean-with-oosvars.sh)HERE - -
- -

Variance and standard deviation without/with oosvars

- -
- -POKI_RUN_COMMAND{{mlr --oxtab stats1 -a count,sum,mean,var,stddev -f x data/medium}}HERE -POKI_RUN_COMMAND{{cat variance.mlr}}HERE -POKI_RUN_COMMAND{{mlr --oxtab put -q -f variance.mlr data/medium}}HERE - -You can also do this keyed, of course, imitating the keyed-mean example above. - -
- -

Min/max without/with oosvars

- -
- -POKI_RUN_COMMAND{{mlr --oxtab stats1 -a min,max -f x data/medium}}HERE - -POKI_RUN_COMMAND{{mlr --oxtab put -q '@x_min = min(@x_min, $x); @x_max = max(@x_max, $x); end{emitf @x_min, @x_max}' data/medium}}HERE - -
- -

Keyed min/max without/with oosvars

- -
- -POKI_RUN_COMMAND{{mlr --opprint stats1 -a min,max -f x -g a data/medium}}HERE -POKI_INCLUDE_AND_RUN_ESCAPED(data/keyed-min-max-with-oosvars.sh)HERE - -
- -

Delta without/with oosvars

- -
- -POKI_RUN_COMMAND{{mlr --opprint step -a delta -f x data/small}}HERE - -POKI_RUN_COMMAND{{mlr --opprint put '$x_delta = is_present(@last) ? $x - @last : 0; @last = $x' data/small}}HERE - -
- -

Keyed delta without/with oosvars

- -
- -POKI_RUN_COMMAND{{mlr --opprint step -a delta -f x -g a data/small}}HERE - -POKI_RUN_COMMAND{{mlr --opprint put '$x_delta = is_present(@last[$a]) ? $x - @last[$a] : 0; @last[$a]=$x' data/small}}HERE - -
- -

Exponentially weighted moving averages without/with oosvars

- -
- -POKI_INCLUDE_AND_RUN_ESCAPED(verb-example-ewma.sh)HERE - -POKI_INCLUDE_AND_RUN_ESCAPED(oosvar-example-ewma.sh)HERE - -
diff --git a/doc/content-for-originality.html b/doc/content-for-originality.html index 9999210e7..b5be809f3 100644 --- a/doc/content-for-originality.html +++ b/doc/content-for-originality.html @@ -45,6 +45,12 @@ Miller’s added values include: jq does for JSON. If you’re not already familiar with jq, please check it out!. +

What about similar tools? +Here’s a comprehensive list: +https://github.com/dbohdan/structured-text-tools. +It doesn’t mention rows so here’s a plug for that as well. +As it turns out, I learned about most of these after writing Miller. +

What about DOTADIW? One of the key points of the Unix philosophy is that a tool should do one thing and do it well. Hence sort and diff --git a/doc/cookbook.html b/doc/cookbook.html index c8069848e..95cd45a08 100644 --- a/doc/cookbook.html +++ b/doc/cookbook.html @@ -115,7 +115,6 @@ Miller commands were run with pretty-print-tabular output format.
• Cookbook part 1
• Cookbook part 2
• Cookbook part 3 -
• Cookbook part 4
• Data-diving examples
• Manpage
• Reference diff --git a/doc/cookbook2.html b/doc/cookbook2.html index 7ccebe000..2142c20c4 100644 --- a/doc/cookbook2.html +++ b/doc/cookbook2.html @@ -115,7 +115,6 @@ Miller commands were run with pretty-print-tabular output format.
• Cookbook part 1
• Cookbook part 2
• Cookbook part 3 -
• Cookbook part 4
• Data-diving examples
• Manpage
• Reference diff --git a/doc/cookbook3.html b/doc/cookbook3.html index f76212e42..9084a5450 100644 --- a/doc/cookbook3.html +++ b/doc/cookbook3.html @@ -115,7 +115,6 @@ Miller commands were run with pretty-print-tabular output format.
• Cookbook part 1
• Cookbook part 2
• Cookbook part 3 -
• Cookbook part 4
• Data-diving examples
• Manpage
• Reference @@ -188,12 +187,20 @@ Miller commands were run with pretty-print-tabular output format.

-Comparing verbs and DSL +Stats with and without out-of-stream variables
Contents:
• Overview
+• Mean without/with oosvars
+• Keyed mean without/with oosvars
+• Variance and standard deviation without/with oosvars
+• Min/max without/with oosvars
+• Keyed min/max without/with oosvars
+• Delta without/with oosvars
+• Keyed delta without/with oosvars
+• Exponentially weighted moving averages without/with oosvars

@@ -206,7 +213,433 @@ Miller commands were run with pretty-print-tabular output format.

-

To do. +

One of Miller’s strengths is its compact notation: for example, given input of the form + +

+

+
+$ head -n 5 ../data/medium
+a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
+a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
+a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
+a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
+a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
+
+
+

+ +you can simply do + +

+

+
+$ mlr --oxtab stats1 -a sum -f x ../data/medium
+x_sum 4986.019682
+
+
+

+ +or + +

+

+
+$ mlr --opprint stats1 -a sum -f x -g b ../data/medium
+b   x_sum
+pan 965.763670
+wye 1023.548470
+zee 979.742016
+eks 1016.772857
+hat 1000.192668
+
+
+

+ +rather than the more tedious + +

+

+
+$ mlr --oxtab put -q '
+  @x_sum += $x;
+  end {
+    emit @x_sum
+  }
+' data/medium
+x_sum 4986.019682
+
+
+

+ +or + +

+

+
+$ mlr --opprint put -q '
+  @x_sum[$b] += $x;
+  end {
+    emit @x_sum, "b"
+  }
+' data/medium
+b   x_sum
+pan 965.763670
+wye 1023.548470
+zee 979.742016
+eks 1016.772857
+hat 1000.192668
+
+
+

+ +

The former (mlr stats1 et al.) has the advantages of being easier +to type, being less error-prone to type, and running faster. + +

Nonetheless, out-of-stream variables (which I whimsically call +oosvars), begin/end blocks, and emit statements give you the ability to +implement logic — if you wish to do so — which isn’t present +in other Miller verbs. (If you find yourself often using the same +out-of-stream-variable logic over and over, please file a request at https://github.com/johnkerl/miller/issues +to get it implemented directly in C as a Miller verb of its own.) + +

The following examples compute some things using oosvars which are already +computable using Miller verbs, by way of providing food for thought. + +

+ +

Mean without/with oosvars

+ +
+ +

+

+
+$ mlr --opprint stats1 -a mean -f x data/medium
+x_mean
+0.498602
+
+
+

+

+

+
+$ mlr --opprint put -q '
+  @x_sum += $x;
+  @x_count += 1;
+  end {
+    @x_mean = @x_sum / @x_count;
+    emit @x_mean
+  }
+' data/medium
+x_mean
+0.498602
+
+
+

+ +

+ +

Keyed mean without/with oosvars

+ +
+ +

+

+
+$ mlr --opprint stats1 -a mean -f x -g a,b data/medium
+a   b   x_mean
+pan pan 0.513314
+eks pan 0.485076
+wye wye 0.491501
+eks wye 0.483895
+wye pan 0.499612
+zee pan 0.519830
+eks zee 0.495463
+zee wye 0.514267
+hat wye 0.493813
+pan wye 0.502362
+zee eks 0.488393
+hat zee 0.509999
+hat eks 0.485879
+wye hat 0.497730
+pan eks 0.503672
+eks eks 0.522799
+hat hat 0.479931
+hat pan 0.464336
+zee zee 0.512756
+pan hat 0.492141
+pan zee 0.496604
+zee hat 0.467726
+wye zee 0.505907
+eks hat 0.500679
+wye eks 0.530604
+
+
+

+

+

+
+$ mlr --opprint put -q '
+  @x_sum[$a][$b] += $x;
+  @x_count[$a][$b] += 1;
+  end{
+    for ((a, b), v in @x_sum) {
+      @x_mean[a][b] = @x_sum[a][b] / @x_count[a][b];
+    }
+    emit @x_mean, "a", "b"
+  }
+' data/medium
+a   b   x_mean
+pan pan 0.513314
+pan wye 0.502362
+pan eks 0.503672
+pan hat 0.492141
+pan zee 0.496604
+eks pan 0.485076
+eks wye 0.483895
+eks zee 0.495463
+eks eks 0.522799
+eks hat 0.500679
+wye wye 0.491501
+wye pan 0.499612
+wye hat 0.497730
+wye zee 0.505907
+wye eks 0.530604
+zee pan 0.519830
+zee wye 0.514267
+zee eks 0.488393
+zee zee 0.512756
+zee hat 0.467726
+hat wye 0.493813
+hat zee 0.509999
+hat eks 0.485879
+hat hat 0.479931
+hat pan 0.464336
+
+
+

+ +

+ +

Variance and standard deviation without/with oosvars

+ +
+ +

+

+
+$ mlr --oxtab stats1 -a count,sum,mean,var,stddev -f x data/medium
+x_count  10000
+x_sum    4986.019682
+x_mean   0.498602
+x_var    0.084270
+x_stddev 0.290293
+
+
+

+

+

+
+$ cat variance.mlr
+@n += 1;
+@sumx += $x;
+@sumx2 += $x**2;
+end {
+  @mean = @sumx / @n;
+  @var = (@sumx2 - @mean * (2 * @sumx - @n * @mean)) / (@n - 1);
+  @stddev = sqrt(@var);
+  emitf @n, @sumx, @sumx2, @mean, @var, @stddev
+}
+
+
+

+

+

+
+$ mlr --oxtab put -q -f variance.mlr data/medium
+n      10000
+sumx   4986.019682
+sumx2  3328.652400
+mean   0.498602
+var    0.084270
+stddev 0.290293
+
+
+

+ +You can also do this keyed, of course, imitating the keyed-mean example above. + +
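Reviewer's note, not part of the diff: the `@var` expression in `variance.mlr` is the sum of squared deviations expanded in terms of the three accumulators, `sumx2 - mean * (2 * sumx - n * mean)`, divided by `n - 1`. The Python sketch below mirrors that logic and compares it with the standard library's sample variance. The function name `streaming_variance` is hypothetical, and the inputs are only the five `x` values printed in the `data/small` examples, not the 10000-record `data/medium` file.

```python
import math
import statistics


def streaming_variance(xs):
    """One pass over xs keeping n, sum(x), sum(x^2), as variance.mlr does."""
    n = 0
    sumx = 0.0
    sumx2 = 0.0
    for x in xs:
        n += 1
        sumx += x
        sumx2 += x ** 2
    mean = sumx / n
    # Same expression as the end block: sum((x - mean)^2) written with accumulators.
    var = (sumx2 - mean * (2.0 * sumx - n * mean)) / (n - 1)
    return {"n": n, "sumx": sumx, "sumx2": sumx2,
            "mean": mean, "var": var, "stddev": math.sqrt(var)}


# The five x values shown in the data/small examples of this diff.
xs = [0.3467901443380824, 0.7586799647899636, 0.20460330576630303,
      0.38139939387114097, 0.5732889198020006]

result = streaming_variance(xs)
print(result["var"], statistics.variance(xs))
```

The accumulator form needs only one pass and constant memory, which is why it suits a streaming tool; on data with a large mean relative to its spread it can lose precision compared with a two-pass computation.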

+ +

Min/max without/with oosvars

+ +
+ +

+

+
+$ mlr --oxtab stats1 -a min,max -f x data/medium
+x_min 0.000045
+x_max 0.999953
+
+
+

+ +

+

+
+$ mlr --oxtab put -q '@x_min = min(@x_min, $x); @x_max = max(@x_max, $x); end{emitf @x_min, @x_max}' data/medium
+x_min 0.000045
+x_max 0.999953
+
+
+

+ +

+ +

Keyed min/max without/with oosvars

+ +
+ +

+

+
+$ mlr --opprint stats1 -a min,max -f x -g a data/medium
+a   x_min    x_max
+pan 0.000204 0.999403
+eks 0.000692 0.998811
+wye 0.000187 0.999823
+zee 0.000549 0.999490
+hat 0.000045 0.999953
+
+
+

+

+

+
+$ mlr --opprint --from data/medium put -q '
+  @min[$a] = min(@min[$a], $x);
+  @max[$a] = max(@max[$a], $x);
+  end{
+    emit (@min, @max), "a";
+  }
+'
+a   min      max
+pan 0.000204 0.999403
+eks 0.000692 0.998811
+wye 0.000187 0.999823
+zee 0.000549 0.999490
+hat 0.000045 0.999953
+
+
+

+ +

+ +

Delta without/with oosvars

+ +
+ +

+

+
+$ mlr --opprint step -a delta -f x data/small
+a   b   i x                   y                   x_delta
+pan pan 1 0.3467901443380824  0.7268028627434533  0
+eks pan 2 0.7586799647899636  0.5221511083334797  0.411890
+wye wye 3 0.20460330576630303 0.33831852551664776 -0.554077
+eks wye 4 0.38139939387114097 0.13418874328430463 0.176796
+wye pan 5 0.5732889198020006  0.8636244699032729  0.191890
+
+
+

+ +

+

+
+$ mlr --opprint put '$x_delta = is_present(@last) ? $x - @last : 0; @last = $x' data/small
+a   b   i x                   y                   x_delta
+pan pan 1 0.3467901443380824  0.7268028627434533  0
+eks pan 2 0.7586799647899636  0.5221511083334797  0.411890
+wye wye 3 0.20460330576630303 0.33831852551664776 -0.554077
+eks wye 4 0.38139939387114097 0.13418874328430463 0.176796
+wye pan 5 0.5732889198020006  0.8636244699032729  0.191890
+
+
+

+ +

+ +

Keyed delta without/with oosvars

+ +
+ +

+

+
+$ mlr --opprint step -a delta -f x -g a data/small
+a   b   i x                   y                   x_delta
+pan pan 1 0.3467901443380824  0.7268028627434533  0
+eks pan 2 0.7586799647899636  0.5221511083334797  0
+wye wye 3 0.20460330576630303 0.33831852551664776 0
+eks wye 4 0.38139939387114097 0.13418874328430463 -0.377281
+wye pan 5 0.5732889198020006  0.8636244699032729  0.368686
+
+
+

+ +

+

+
+$ mlr --opprint put '$x_delta = is_present(@last[$a]) ? $x - @last[$a] : 0; @last[$a]=$x' data/small
+a   b   i x                   y                   x_delta
+pan pan 1 0.3467901443380824  0.7268028627434533  0
+eks pan 2 0.7586799647899636  0.5221511083334797  0
+wye wye 3 0.20460330576630303 0.33831852551664776 0
+eks wye 4 0.38139939387114097 0.13418874328430463 -0.377281
+wye pan 5 0.5732889198020006  0.8636244699032729  0.368686
+
+
+
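Reviewer's note, not part of the diff: the keyed-delta one-liner keeps one "last seen" value per key, which is what `@last[$a]` does. A minimal Python sketch of the same idea follows, assuming records are plain dicts; the helper name `keyed_delta` is hypothetical, and only the `a` and `x` fields of the five `data/small` records shown above are reproduced.

```python
def keyed_delta(records, key, field):
    """Add <field>_delta: difference from the previous record with the same key."""
    last = {}  # plays the role of the @last[...] out-of-stream map
    out = []
    for rec in records:
        k = rec[key]
        # The first record seen for a key gets 0, like the is_present(...) test.
        delta = rec[field] - last[k] if k in last else 0
        last[k] = rec[field]
        out.append({**rec, field + "_delta": delta})
    return out


# The a and x fields of the five data/small records shown in this diff.
records = [
    {"a": "pan", "x": 0.3467901443380824},
    {"a": "eks", "x": 0.7586799647899636},
    {"a": "wye", "x": 0.20460330576630303},
    {"a": "eks", "x": 0.38139939387114097},
    {"a": "wye", "x": 0.5732889198020006},
]

rows = keyed_delta(records, "a", "x")
for row in rows:
    print(row["a"], format(row["x_delta"], ".6f"))
```

Dropping the key lookup and keeping a single `last` value gives the unkeyed delta from the previous section.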

+ +

+ +

Exponentially weighted moving averages without/with oosvars

+ +
+ +

+

+
+$ mlr --opprint step -a ewma -d 0.1 -f x data/small
+a   b   i x                   y                   x_ewma_0.1
+pan pan 1 0.3467901443380824  0.7268028627434533  0.346790
+eks pan 2 0.7586799647899636  0.5221511083334797  0.387979
+wye wye 3 0.20460330576630303 0.33831852551664776 0.369642
+eks wye 4 0.38139939387114097 0.13418874328430463 0.370817
+wye pan 5 0.5732889198020006  0.8636244699032729  0.391064
+
+
+

+ +

+

+
+$ mlr --opprint put '
+  begin{ @a=0.1 };
+  $e = NR==1 ? $x : @a * $x + (1 - @a) * @e;
+  @e=$e
+' data/small
+a   b   i x                   y                   e
+pan pan 1 0.3467901443380824  0.7268028627434533  0.346790
+eks pan 2 0.7586799647899636  0.5221511083334797  0.387979
+wye wye 3 0.20460330576630303 0.33831852551664776 0.369642
+eks wye 4 0.38139939387114097 0.13418874328430463 0.370817
+wye pan 5 0.5732889198020006  0.8636244699032729  0.391064
+
+
+
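Reviewer's note, not part of the diff: the oosvar EWMA example seeds the average with the first `x` and then applies `alpha * x + (1 - alpha) * previous` on every later record. The Python sketch below restates that recurrence; the function name `ewma` is hypothetical, and the inputs are the five `x` values from the `data/small` output above, with `alpha = 0.1` as in both commands.

```python
def ewma(xs, alpha):
    """Exponentially weighted moving average seeded with the first value."""
    out = []
    prev = None  # corresponds to @e before the first record
    for x in xs:
        # First record: take x as-is (the NR==1 branch); afterwards blend.
        cur = x if prev is None else alpha * x + (1 - alpha) * prev
        out.append(cur)
        prev = cur
    return out


# The five x values shown in the data/small output of this diff.
xs = [0.3467901443380824, 0.7586799647899636, 0.20460330576630303,
      0.38139939387114097, 0.5732889198020006]

smoothed = ewma(xs, 0.1)
print([f"{v:.6f}" for v in smoothed])
```

Formatted to six decimal places, these values should agree with the `x_ewma_0.1` and `e` columns shown above.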

diff --git a/doc/cookbook4.html b/doc/cookbook4.html deleted file mode 100644 index 127ce8891..000000000 --- a/doc/cookbook4.html +++ /dev/null @@ -1,651 +0,0 @@ - - - - - - - - - - - - Cookbook part 4 - - - - - - - - - - - - - - - - - - - - - - - - -
- - - - - - - -
-
Cookbook part 4
-

- - - -

-

-Stats with and without out-of-stream variables -
- - -

- -

- - - - -

Overview

- -
- -

One of Miller’s strengths is its compact notation: for example, given input of the form - -

-

-
-$ head -n 5 ../data/medium
-a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
-a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
-a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
-a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
-a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
-
-
-

- -you can simply do - -

-

-
-$ mlr --oxtab stats1 -a sum -f x ../data/medium
-x_sum 4986.019682
-
-
-

- -or - -

-

-
-$ mlr --opprint stats1 -a sum -f x -g b ../data/medium
-b   x_sum
-pan 965.763670
-wye 1023.548470
-zee 979.742016
-eks 1016.772857
-hat 1000.192668
-
-
-

- -rather than the more tedious - -

-

-
-$ mlr --oxtab put -q '
-  @x_sum += $x;
-  end {
-    emit @x_sum
-  }
-' data/medium
-x_sum 4986.019682
-
-
-

- -or - -

-

-
-$ mlr --opprint put -q '
-  @x_sum[$b] += $x;
-  end {
-    emit @x_sum, "b"
-  }
-' data/medium
-b   x_sum
-pan 965.763670
-wye 1023.548470
-zee 979.742016
-eks 1016.772857
-hat 1000.192668
-
-
-

- -

The former (mlr stats1 et al.) has the advantages of being easier -to type, being less error-prone to type, and running faster. - -

Nonetheless, out-of-stream variables (which I whimsically call -oosvars), begin/end blocks, and emit statements give you the ability to -implement logic — if you wish to do so — which isn’t present -in other Miller verbs. (If you find yourself often using the same -out-of-stream-variable logic over and over, please file a request at https://github.com/johnkerl/miller/issues -to get it implemented directly in C as a Miller verb of its own.) - -

The following examples compute some things using oosvars which are already -computable using Miller verbs, by way of providing food for thought. - -

- -

Mean without/with oosvars

- -
- -

-

-
-$ mlr --opprint stats1 -a mean -f x data/medium
-x_mean
-0.498602
-
-
-

-

-

-
-$ mlr --opprint put -q '
-  @x_sum += $x;
-  @x_count += 1;
-  end {
-    @x_mean = @x_sum / @x_count;
-    emit @x_mean
-  }
-' data/medium
-x_mean
-0.498602
-
-
-

- -

- -

Keyed mean without/with oosvars

- -
- -

-

-
-$ mlr --opprint stats1 -a mean -f x -g a,b data/medium
-a   b   x_mean
-pan pan 0.513314
-eks pan 0.485076
-wye wye 0.491501
-eks wye 0.483895
-wye pan 0.499612
-zee pan 0.519830
-eks zee 0.495463
-zee wye 0.514267
-hat wye 0.493813
-pan wye 0.502362
-zee eks 0.488393
-hat zee 0.509999
-hat eks 0.485879
-wye hat 0.497730
-pan eks 0.503672
-eks eks 0.522799
-hat hat 0.479931
-hat pan 0.464336
-zee zee 0.512756
-pan hat 0.492141
-pan zee 0.496604
-zee hat 0.467726
-wye zee 0.505907
-eks hat 0.500679
-wye eks 0.530604
-
-
-

-

-

-
-$ mlr --opprint put -q '
-  @x_sum[$a][$b] += $x;
-  @x_count[$a][$b] += 1;
-  end{
-    for ((a, b), v in @x_sum) {
-      @x_mean[a][b] = @x_sum[a][b] / @x_count[a][b];
-    }
-    emit @x_mean, "a", "b"
-  }
-' data/medium
-a   b   x_mean
-pan pan 0.513314
-pan wye 0.502362
-pan eks 0.503672
-pan hat 0.492141
-pan zee 0.496604
-eks pan 0.485076
-eks wye 0.483895
-eks zee 0.495463
-eks eks 0.522799
-eks hat 0.500679
-wye wye 0.491501
-wye pan 0.499612
-wye hat 0.497730
-wye zee 0.505907
-wye eks 0.530604
-zee pan 0.519830
-zee wye 0.514267
-zee eks 0.488393
-zee zee 0.512756
-zee hat 0.467726
-hat wye 0.493813
-hat zee 0.509999
-hat eks 0.485879
-hat hat 0.479931
-hat pan 0.464336
-
-
-

- -

- -

Variance and standard deviation without/with oosvars

- -
- -

-

-
-$ mlr --oxtab stats1 -a count,sum,mean,var,stddev -f x data/medium
-x_count  10000
-x_sum    4986.019682
-x_mean   0.498602
-x_var    0.084270
-x_stddev 0.290293
-
-
-

-

-

-
-$ cat variance.mlr
-@n += 1;
-@sumx += $x;
-@sumx2 += $x**2;
-end {
-  @mean = @sumx / @n;
-  @var = (@sumx2 - @mean * (2 * @sumx - @n * @mean)) / (@n - 1);
-  @stddev = sqrt(@var);
-  emitf @n, @sumx, @sumx2, @mean, @var, @stddev
-}
-
-
-

-

-

-
-$ mlr --oxtab put -q -f variance.mlr data/medium
-n      10000
-sumx   4986.019682
-sumx2  3328.652400
-mean   0.498602
-var    0.084270
-stddev 0.290293
-
-
-

- -You can also do this keyed, of course, imitating the keyed-mean example above. - -

- -

Min/max without/with oosvars

- -
- -

-

-
-$ mlr --oxtab stats1 -a min,max -f x data/medium
-x_min 0.000045
-x_max 0.999953
-
-
-

- -

-

-
-$ mlr --oxtab put -q '@x_min = min(@x_min, $x); @x_max = max(@x_max, $x); end{emitf @x_min, @x_max}' data/medium
-x_min 0.000045
-x_max 0.999953
-
-
-

- -

- -

Keyed min/max without/with oosvars

- -
- -

-

-
-$ mlr --opprint stats1 -a min,max -f x -g a data/medium
-a   x_min    x_max
-pan 0.000204 0.999403
-eks 0.000692 0.998811
-wye 0.000187 0.999823
-zee 0.000549 0.999490
-hat 0.000045 0.999953
-
-
-

-

-

-
-$ mlr --opprint --from data/medium put -q '
-  @min[$a] = min(@min[$a], $x);
-  @max[$a] = max(@max[$a], $x);
-  end{
-    emit (@min, @max), "a";
-  }
-'
-a   min      max
-pan 0.000204 0.999403
-eks 0.000692 0.998811
-wye 0.000187 0.999823
-zee 0.000549 0.999490
-hat 0.000045 0.999953
-
-
-

- -

- -

Delta without/with oosvars

- -
- -

-

-
-$ mlr --opprint step -a delta -f x data/small
-a   b   i x                   y                   x_delta
-pan pan 1 0.3467901443380824  0.7268028627434533  0
-eks pan 2 0.7586799647899636  0.5221511083334797  0.411890
-wye wye 3 0.20460330576630303 0.33831852551664776 -0.554077
-eks wye 4 0.38139939387114097 0.13418874328430463 0.176796
-wye pan 5 0.5732889198020006  0.8636244699032729  0.191890
-
-
-

- -

-

-
-$ mlr --opprint put '$x_delta = is_present(@last) ? $x - @last : 0; @last = $x' data/small
-a   b   i x                   y                   x_delta
-pan pan 1 0.3467901443380824  0.7268028627434533  0
-eks pan 2 0.7586799647899636  0.5221511083334797  0.411890
-wye wye 3 0.20460330576630303 0.33831852551664776 -0.554077
-eks wye 4 0.38139939387114097 0.13418874328430463 0.176796
-wye pan 5 0.5732889198020006  0.8636244699032729  0.191890
-
-
-

- -

- -

Keyed delta without/with oosvars

- -
- -

-

-
-$ mlr --opprint step -a delta -f x -g a data/small
-a   b   i x                   y                   x_delta
-pan pan 1 0.3467901443380824  0.7268028627434533  0
-eks pan 2 0.7586799647899636  0.5221511083334797  0
-wye wye 3 0.20460330576630303 0.33831852551664776 0
-eks wye 4 0.38139939387114097 0.13418874328430463 -0.377281
-wye pan 5 0.5732889198020006  0.8636244699032729  0.368686
-
-
-

- -

-

-
-$ mlr --opprint put '$x_delta = is_present(@last[$a]) ? $x - @last[$a] : 0; @last[$a]=$x' data/small
-a   b   i x                   y                   x_delta
-pan pan 1 0.3467901443380824  0.7268028627434533  0
-eks pan 2 0.7586799647899636  0.5221511083334797  0
-wye wye 3 0.20460330576630303 0.33831852551664776 0
-eks wye 4 0.38139939387114097 0.13418874328430463 -0.377281
-wye pan 5 0.5732889198020006  0.8636244699032729  0.368686
-
-
-

- -

- -

Exponentially weighted moving averages without/with oosvars

- -
- -

-

-
-$ mlr --opprint step -a ewma -d 0.1 -f x data/small
-a   b   i x                   y                   x_ewma_0.1
-pan pan 1 0.3467901443380824  0.7268028627434533  0.346790
-eks pan 2 0.7586799647899636  0.5221511083334797  0.387979
-wye wye 3 0.20460330576630303 0.33831852551664776 0.369642
-eks wye 4 0.38139939387114097 0.13418874328430463 0.370817
-wye pan 5 0.5732889198020006  0.8636244699032729  0.391064
-
-
-

- -

-

-
-$ mlr --opprint put '
-  begin{ @a=0.1 };
-  $e = NR==1 ? $x : @a * $x + (1 - @a) * @e;
-  @e=$e
-' data/small
-a   b   i x                   y                   e
-pan pan 1 0.3467901443380824  0.7268028627434533  0.346790
-eks pan 2 0.7586799647899636  0.5221511083334797  0.387979
-wye wye 3 0.20460330576630303 0.33831852551664776 0.369642
-eks wye 4 0.38139939387114097 0.13418874328430463 0.370817
-wye pan 5 0.5732889198020006  0.8636244699032729  0.391064
-
-
-

- -

-
-
- - diff --git a/doc/data-examples.html b/doc/data-examples.html index b7b049e26..1e3da2db8 100644 --- a/doc/data-examples.html +++ b/doc/data-examples.html @@ -115,7 +115,6 @@ Miller commands were run with pretty-print-tabular output format.
• 
Cookbook part 1
• Cookbook part 2
• Cookbook part 3 -
• Cookbook part 4
• Data-diving examples
• Manpage
• Reference diff --git a/doc/etymology.html b/doc/etymology.html index ab0be59a8..aa9b76501 100644 --- a/doc/etymology.html +++ b/doc/etymology.html @@ -115,7 +115,6 @@ Miller commands were run with pretty-print-tabular output format.
• Cookbook part 1
• Cookbook part 2
• Cookbook part 3 -
• Cookbook part 4
• Data-diving examples
• Manpage
• Reference diff --git a/doc/faq.html b/doc/faq.html index 52cc18460..3bfddcd31 100644 --- a/doc/faq.html +++ b/doc/faq.html @@ -115,7 +115,6 @@ Miller commands were run with pretty-print-tabular output format.
• Cookbook part 1
• Cookbook part 2
• Cookbook part 3 -
• Cookbook part 4
• Data-diving examples
• Manpage
• Reference diff --git a/doc/feature-comparison.html b/doc/feature-comparison.html index e393f273d..eb7a24da2 100644 --- a/doc/feature-comparison.html +++ b/doc/feature-comparison.html @@ -115,7 +115,6 @@ Miller commands were run with pretty-print-tabular output format.
• Cookbook part 1
• Cookbook part 2
• Cookbook part 3 -
• Cookbook part 4
• Data-diving examples
• Manpage
• Reference diff --git a/doc/file-formats.html b/doc/file-formats.html index 163ba9da3..f540962b2 100644 --- a/doc/file-formats.html +++ b/doc/file-formats.html @@ -115,7 +115,6 @@ Miller commands were run with pretty-print-tabular output format.
• Cookbook part 1
• Cookbook part 2
• Cookbook part 3 -
• Cookbook part 4
• Data-diving examples
• Manpage
• Reference diff --git a/doc/index.html b/doc/index.html index bd38f6905..a3515ac3c 100644 --- a/doc/index.html +++ b/doc/index.html @@ -115,7 +115,6 @@ Miller commands were run with pretty-print-tabular output format.
• Cookbook part 1
• Cookbook part 2
• Cookbook part 3 -
• Cookbook part 4
• Data-diving examples
• Manpage
• Reference diff --git a/doc/internationalization.html b/doc/internationalization.html index 92e73af4b..d5b2c2055 100644 --- a/doc/internationalization.html +++ b/doc/internationalization.html @@ -115,7 +115,6 @@ Miller commands were run with pretty-print-tabular output format.
• Cookbook part 1
• Cookbook part 2
• Cookbook part 3 -
• Cookbook part 4
• Data-diving examples
• Manpage
• Reference diff --git a/doc/manpage.html b/doc/manpage.html index f7610aeef..859998ea4 100644 --- a/doc/manpage.html +++ b/doc/manpage.html @@ -115,7 +115,6 @@ Miller commands were run with pretty-print-tabular output format.
• Cookbook part 1
• Cookbook part 2
• Cookbook part 3 -
• Cookbook part 4
• Data-diving examples
• Manpage
• Reference diff --git a/doc/originality.html b/doc/originality.html index a09c3f3ff..7bb8074ff 100644 --- a/doc/originality.html +++ b/doc/originality.html @@ -115,7 +115,6 @@ Miller commands were run with pretty-print-tabular output format.
• Cookbook part 1
• Cookbook part 2
• Cookbook part 3 -
• Cookbook part 4
• Data-diving examples
• Manpage
• Reference @@ -232,6 +231,12 @@ Miller’s added values include: jq does for JSON. If you’re not already familiar with jq, please check it out!. +

What about similar tools? +Here’s a comprehensive list: +https://github.com/dbohdan/structured-text-tools. +It doesn’t mention rows so here’s a plug for that as well. +As it turns out, I learned about most of these after writing Miller. +

What about DOTADIW? One of the key points of the Unix philosophy is that a tool should do one thing and do it well. Hence sort and diff --git a/doc/performance.html b/doc/performance.html index ef38d9828..e83ad1c84 100644 --- a/doc/performance.html +++ b/doc/performance.html @@ -115,7 +115,6 @@ Miller commands were run with pretty-print-tabular output format.
• Cookbook part 1
• Cookbook part 2
• Cookbook part 3 -
• Cookbook part 4
• Data-diving examples
• Manpage
• Reference diff --git a/doc/poki.cfg b/doc/poki.cfg index e7297b0cb..b8e21d869 100644 --- a/doc/poki.cfg +++ b/doc/poki.cfg @@ -11,7 +11,6 @@ faq.html FAQ cookbook.html Cookbook part 1 cookbook2.html Cookbook part 2 cookbook3.html Cookbook part 3 -cookbook4.html Cookbook part 4 data-examples.html Data-diving examples manpage.html Manpage reference.html Reference diff --git a/doc/record-heterogeneity.html b/doc/record-heterogeneity.html index 047959037..59de38e50 100644 --- a/doc/record-heterogeneity.html +++ b/doc/record-heterogeneity.html @@ -115,7 +115,6 @@ Miller commands were run with pretty-print-tabular output format.
• Cookbook part 1
• Cookbook part 2
• Cookbook part 3 -
• Cookbook part 4
• Data-diving examples
• Manpage
• Reference diff --git a/doc/reference-dsl.html b/doc/reference-dsl.html index e86a93f94..56dcdf534 100644 --- a/doc/reference-dsl.html +++ b/doc/reference-dsl.html @@ -115,7 +115,6 @@ Miller commands were run with pretty-print-tabular output format.
• Cookbook part 1
• Cookbook part 2
• Cookbook part 3 -
• Cookbook part 4
• Data-diving examples
• Manpage
• Reference diff --git a/doc/reference-verbs.html b/doc/reference-verbs.html index c242521f5..86ffd1284 100644 --- a/doc/reference-verbs.html +++ b/doc/reference-verbs.html @@ -115,7 +115,6 @@ Miller commands were run with pretty-print-tabular output format.
• Cookbook part 1
• Cookbook part 2
• Cookbook part 3 -
• Cookbook part 4
• Data-diving examples
• Manpage
• Reference diff --git a/doc/reference.html b/doc/reference.html index c6d8c49d7..32ac5faef 100644 --- a/doc/reference.html +++ b/doc/reference.html @@ -115,7 +115,6 @@ Miller commands were run with pretty-print-tabular output format.
• Cookbook part 1
• Cookbook part 2
• Cookbook part 3 -
• Cookbook part 4
• Data-diving examples
• Manpage
• Reference diff --git a/doc/release-docs.html b/doc/release-docs.html index 1e9039c4f..61396bbde 100644 --- a/doc/release-docs.html +++ b/doc/release-docs.html @@ -115,7 +115,6 @@ Miller commands were run with pretty-print-tabular output format.
• Cookbook part 1
• Cookbook part 2
• Cookbook part 3 -
• Cookbook part 4
• Data-diving examples
• Manpage
• Reference diff --git a/doc/to-do.html b/doc/to-do.html index 2bd02359b..1eecf529b 100644 --- a/doc/to-do.html +++ b/doc/to-do.html @@ -115,7 +115,6 @@ Miller commands were run with pretty-print-tabular output format.
• Cookbook part 1
• Cookbook part 2
• Cookbook part 3 -
• Cookbook part 4
• Data-diving examples
• Manpage
• Reference diff --git a/doc/why.html b/doc/why.html index 8f245038f..9bfc6310b 100644 --- a/doc/why.html +++ b/doc/why.html @@ -115,7 +115,6 @@ Miller commands were run with pretty-print-tabular output format.
• Cookbook part 1
• Cookbook part 2
• Cookbook part 3 -
• Cookbook part 4
• Data-diving examples
• Manpage
• Reference diff --git a/doc/whyc.html b/doc/whyc.html index dff9b39f6..5ce091237 100644 --- a/doc/whyc.html +++ b/doc/whyc.html @@ -115,7 +115,6 @@ Miller commands were run with pretty-print-tabular output format.
• Cookbook part 1
• Cookbook part 2
• Cookbook part 3 -
• Cookbook part 4
• Data-diving examples
• Manpage
• Reference