DSL user-defined functions

As of Miller 5.0.0 you can define your own functions, as well as subroutines.

User-defined functions

Here's the obligatory example of a recursive function to compute the factorial function:

mlr --opprint --from data/small put '
    func f(n) {
        if (is_numeric(n)) {
            if (n > 0) {
                return n * f(n-1);
            } else {
                return 1;
            }
        }
        # implicitly return absent-null if non-numeric
    }
    $ox = f($x + NR);
    $oi = f($i);
'

a   b   i x                   y                   ox                  oi
pan pan 1 0.3467901443380824  0.7268028627434533  0.46705354854811026 1
eks pan 2 0.7586799647899636  0.5221511083334797  3.680838410072862   2
wye wye 3 0.20460330576630303 0.33831852551664776 1.7412511955594865  6
eks wye 4 0.38139939387114097 0.13418874328430463 18.588348778962008  24
wye pan 5 0.5732889198020006  0.8636244699032729  211.38730958519247  120

Properties of user-defined functions:

Function bodies start with func and a parameter list, defined outside of begin, end, or other func or subr blocks. (I.e. the Miller DSL has no nested functions.)
A function (uniqified by its name) may not be redefined: either by redefining a user-defined function, or by redefining a built-in function. However, functions and subroutines have separate namespaces: you can define a subroutine log (for logging messages to stderr, say) which does not clash with the mathematical log (logarithm) function.
Functions may be defined either before or after use -- there is an object-binding/linkage step at startup. More specifically, functions may be either recursive or mutually recursive.
Functions may be defined and called either within mlr filter or mlr put.
Argument values may be reassigned: they are not read-only.
When a return value is not implicitly returned, this results in a return value of absent-null. (In the example above, if there were records for which the argument to f is non-numeric, the assignments would be skipped.) See also the null-data reference page.
See the section on Local variables for information on scope and extent of arguments, as well as for information on the use of local variables within functions.
See the section on Expressions from files for information on the use of -f and -e flags.

User-defined subroutines

Example:

mlr --opprint --from data/small put -q '
  begin {
    @call_count = 0;
  }
  subr s(n) {
    @call_count += 1;
    if (is_numeric(n)) {
      if (n > 1) {
        call s(n-1);
      } else {
        print "numcalls=" . @call_count;
      }
    }
  }
  print "NR=" . NR;
  call s(NR);
'

NR=1
numcalls=1
NR=2
numcalls=3
NR=3
numcalls=6
NR=4
numcalls=10
NR=5
numcalls=15

Properties of user-defined subroutines:

Subroutine bodies start with subr and a parameter list, defined outside of begin, end, or other func or subr blocks. (I.e. the Miller DSL has no nested subroutines.)
A subroutine (uniqified by its name) may not be redefined. However, functions and subroutines have separate namespaces: you can define a subroutine log which does not clash with the mathematical log function.
Subroutines may be defined either before or after use -- there is an object-binding/linkage step at startup. More specifically, subroutines may be either recursive or mutually recursive. Subroutines may call functions.
Subroutines may be defined and called either within mlr put or mlr put.
Subroutines have read/write access to $-variables and @-variables.
Argument values may be reassigned: they are not read-only.
See the section on local variables for information on scope and extent of arguments, as well as for information on the use of local variables within functions.
See the section on Expressions from files for information on the use of -f and -e flags.

4.5 KiB Raw Permalink Blame History

DSL user-defined functions

User-defined functions

User-defined subroutines

4.5 KiB

Raw Permalink Blame History