As of Miller 5.0.0 you can define your own functions, as well as subroutines.
User-defined functions
Here's the obligatory example: a recursive function that computes the factorial.
mlr --opprint --from data/small put '
  func f(n) {
    if (is_numeric(n)) {
      if (n > 0) {
        return n * f(n-1);
      } else {
        return 1;
      }
    }
    # implicitly return absent-null if non-numeric
  }
  $ox = f($x + NR);
  $oi = f($i);
'
a   b   i x        y        ox                 oi
pan pan 1 0.346791 0.726802 0.4670549976810001 1
eks pan 2 0.758679 0.522151 3.6808304227112796 2
wye wye 3 0.204603 0.338318 1.7412477437471126 6
eks wye 4 0.381399 0.134188 18.588317372151177 24
wye pan 5 0.573288 0.863624 211.38663947090302 120
Properties of user-defined functions:
- Function bodies start with func and a parameter list, defined outside of begin, end, or other func or subr blocks. (I.e. the Miller DSL has no nested functions.)
- A function (uniqified by its name) may not be redefined: either by redefining a user-defined function, or by redefining a built-in function. However, functions and subroutines have separate namespaces: you can define a subroutine log (for logging messages to stderr, say) which does not clash with the mathematical log (logarithm) function.
- Functions may be defined either before or after use -- there is an object-binding/linkage step at startup. More specifically, functions may be either recursive or mutually recursive.
- Functions may be defined and called either within mlr filter or mlr put.
- Argument values may be reassigned: they are not read-only.
- When a return value is not explicitly returned, the function returns absent-null. (In the example above, if there were records for which the argument to f is non-numeric, the assignments would be skipped.) See also the null-data reference page.
- See the section on Local variables for information on scope and extent of arguments, as well as for information on the use of local variables within functions.
- See the section on Expressions from files for information on the use of the -f and -e flags.
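Two of the properties above, definition after use and mutual recursion, can be seen together in a minimal sketch. The function names is_even and is_odd are hypothetical and chosen only for illustration. The -n flag tells Miller to read no input, so only the end block runs. The sketch assumes non-negative integer arguments; a negative argument would recurse without terminating.

```shell
mlr -n put '
  end {
    # Both functions are called here, before their definitions appear below.
    print is_even(10);
    print is_odd(10);
  }

  # is_even and is_odd call each other: mutual recursion.
  func is_even(n) {
    if (n == 0) {
      return true;
    }
    return is_odd(n - 1);
  }

  func is_odd(n) {
    if (n == 0) {
      return false;
    }
    return is_even(n - 1);
  }
'
```

Moving the two func blocks above the end block should make no difference, since names are resolved in the linkage step at startup rather than in source order.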
User-defined subroutines
Example:
mlr --opprint --from data/small put -q '
  begin {
    @call_count = 0;
  }
  subr s(n) {
    @call_count += 1;
    if (is_numeric(n)) {
      if (n > 1) {
        call s(n-1);
      } else {
        print "numcalls=" . @call_count;
      }
    }
  }
  print "NR=" . NR;
  call s(NR);
'
NR=1
numcalls=1
NR=2
numcalls=3
NR=3
numcalls=6
NR=4
numcalls=10
NR=5
numcalls=15
Properties of user-defined subroutines:
- Subroutine bodies start with subr and a parameter list, defined outside of begin, end, or other func or subr blocks. (I.e. the Miller DSL has no nested subroutines.)
- A subroutine (uniqified by its name) may not be redefined. However, functions and subroutines have separate namespaces: you can define a subroutine log which does not clash with the mathematical log function.
- Subroutines may be defined either before or after use -- there is an object-binding/linkage step at startup. More specifically, subroutines may be either recursive or mutually recursive. Subroutines may call functions.
- Subroutines may be defined and called either within mlr filter or mlr put.
- Subroutines have read/write access to $-variables and @-variables.
- Argument values may be reassigned: they are not read-only.
- See the section on Local variables for information on scope and extent of arguments, as well as for information on the use of local variables within subroutines.
- See the section on Expressions from files for information on the use of the -f and -e flags.
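As a minimal sketch of the read/write access and argument reassignment described above: the subroutine name bump, the field names x and calls, and the one-record input are all made up for illustration. The input is read in Miller's default key=value format.

```shell
echo 'x=3' | mlr put '
  subr bump(amount) {
    if (amount < 0) {
      amount = 0;   # arguments may be reassigned inside the subroutine
    }
    $x += amount;   # write to a field of the current record
    @calls += 1;    # write to an out-of-stream variable
  }
  call bump(10);
  call bump(-4);
  $calls = @calls;
'
```

Because a subroutine cannot return a value, its effects reach the caller only through $-variables, @-variables, or output statements such as print.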
Differences between functions and subroutines
Subroutines cannot return values, and they are invoked by the keyword call.
In hindsight, subroutines needn't have been invented. If foo is a function
then you can write foo(1,2,3) while ignoring its return value, and that plays
the role of a subroutine quite well.