User-defined functions

Here is the obligatory example: a recursive function that computes the factorial.

mlr --opprint --from data/small put '
    func f(n) {
        if (is_numeric(n)) {
            if (n > 0) {
                return n * f(n-1);
            } else {
                return 1;
            }
        }
        # implicitly return absent-null if non-numeric
    }
    $ox = f($x + NR);
    $oi = f($i);
'

a   b   i x        y        ox                 oi
pan pan 1 0.346791 0.726802 0.4670549976810001 1
eks pan 2 0.758679 0.522151 3.6808304227112796 2
wye wye 3 0.204603 0.338318 1.7412477437471126 6
eks wye 4 0.381399 0.134188 18.588317372151177 24
wye pan 5 0.573288 0.863624 211.38663947090302 120

Properties of user-defined functions:

Function bodies start with func and a parameter list, defined outside of begin, end, or other func or subr blocks. (I.e. the Miller DSL has no nested functions.)
A function (identified by its name) may not be redefined: you cannot define two user-defined functions with the same name, nor give a user-defined function the name of a built-in function. However, functions and subroutines have separate namespaces: you can define a subroutine log (for logging messages to stderr, say) which does not clash with the mathematical log (logarithm) function.
Functions may be defined either before or after use -- there is an object-binding/linkage step at startup. In particular, functions may be either recursive or mutually recursive.
Functions may be defined and called either within mlr filter or mlr put.
Argument values may be reassigned: they are not read-only.
When a function does not explicitly return a value, its return value is absent-null. (In the example above, if there were records for which the argument to f is non-numeric, the assignments would be skipped.) See also the null-data reference page.
See the section on Local variables for information on scope and extent of arguments, as well as for information on the use of local variables within functions.
See the section on Expressions from files for information on the use of -f and -e flags.

User-defined subroutines

Example:

mlr --opprint --from data/small put -q '
  begin {
    @call_count = 0;
  }
  subr s(n) {
    @call_count += 1;
    if (is_numeric(n)) {
      if (n > 1) {
        call s(n-1);
      } else {
        print "numcalls=" . @call_count;
      }
    }
  }
  print "NR=" . NR;
  call s(NR);
'

NR=1
numcalls=1
NR=2
numcalls=3
NR=3
numcalls=6
NR=4
numcalls=10
NR=5
numcalls=15

Properties of user-defined subroutines:

Subroutine bodies start with subr and a parameter list, defined outside of begin, end, or other func or subr blocks. (I.e. the Miller DSL has no nested subroutines.)
A subroutine (identified by its name) may not be redefined. However, functions and subroutines have separate namespaces: you can define a subroutine log which does not clash with the mathematical log function.
Subroutines may be defined either before or after use -- there is an object-binding/linkage step at startup. In particular, subroutines may be either recursive or mutually recursive. Subroutines may call functions.
Subroutines may be defined and called either within mlr filter or mlr put.
Subroutines have read/write access to $-variables and @-variables.
Argument values may be reassigned: they are not read-only.
See the section on Local variables for information on scope and extent of arguments, as well as for information on the use of local variables within subroutines.
See the section on Expressions from files for information on the use of -f and -e flags.

Differences between functions and subroutines

Subroutines cannot return values, and they are invoked by the keyword call.

In hindsight, subroutines needn't have been invented. If foo is a function then you can write foo(1,2,3) while ignoring its return value, and that plays the role of a subroutine quite well.
