DSL reference: overview¶
Overview¶
Here’s comparison of verbs and put/filter DSL expressions:
Example:
mlr stats1 -a sum -f x -g a data/small
a=pan,x_sum=0.3467901443380824
a=eks,x_sum=1.1400793586611044
a=wye,x_sum=0.7778922255683036
Verbs are coded in Go
They run a bit faster
They take fewer keystrokes
There is less to learn
Their customization is limited to each verb’s options
Example:
mlr put -q '@x_sum[$a] += $x; end{emit @x_sum, "a"}' data/small
a=pan,x_sum=0.3467901443380824
a=eks,x_sum=1.1400793586611044
a=wye,x_sum=0.7778922255683036
You get to write your own DSL expressions
They run a bit slower
They take more keystrokes
There is more to learn
They are highly customizable
Please see Reference: list of verbs for information on verbs other than put and filter.
The essential usages of mlr filter and mlr put are for record-selection and record-updating expressions, respectively. For example, given the following input data:
cat data/small
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
you might retain only the records whose a field has value eks:
mlr filter '$a == "eks"' data/small
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
or you might add a new field which is a function of existing fields:
mlr put '$ab = $a . "_" . $b ' data/small
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533,ab=pan_pan
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797,ab=eks_pan
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776,ab=wye_wye
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463,ab=eks_wye
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729,ab=wye_pan
The two verbs mlr filter and mlr put are essentially the same. The only differences are:
Expressions sent to
mlr filtermust end with a boolean expression, which is the filtering criterion;mlr filterexpressions may not reference thefilterkeyword within them; andmlr filterexpressions may not usetee,emit,emitp, oremitf.
All the rest is the same: in particular, you can define and invoke functions and subroutines to help produce the final boolean statement, and record fields may be assigned to in the statements preceding the final boolean statement.
There are more details and more choices, of course, as detailed in the following sections.