miller/docs6/docs/reference-dsl-differences.md
2021-08-25 22:01:32 -04:00

Differences from other programming languages

The Miller programming language is intended to be straightforward and familiar, as well as not overly complex. It doesn't try to break new ground in terms of syntax; there are no classes or closures, and so on.

Miller generally follows the Principle of Least Surprise, but the following points may still be surprising.

No ++ or --

There is no ++ or -- operator. To increment x, use x = x+1 or x += 1, and similarly for decrement.

Semicolons as delimiters

You don't need a semicolon to end expressions, only to separate them. This was done intentionally from the very start of Miller: you should be able to do simple things like mlr put '$z = $x * $y' myfile.dat without needing a semicolon.

Note that since you also don't need a semicolon before or after closing curly braces (such as begin/end blocks, if-statements, for-loops, etc.), it's easy to key in several semicolon-free statements and then forget a semicolon where one is needed. The parser tries to remind you about semicolons whenever there's a chance a missing semicolon might be involved in a parse error.

No autoconvert to boolean

Boolean tests in if/while/for/etc. must always take a boolean expression: if (1) {...} results in the parse error Miller: conditional expression did not evaluate to boolean. The same holds for if (x) {...}, unless x is a variable of boolean type. Use if (x != 0) {...} and the like instead.

Integer-preserving arithmetic

As discussed on the arithmetic page, the sum, difference, and product of two integers are again integers, unless overflow occurs, in which case Miller tries to convert to float in the least obtrusive way possible.

Likewise, while quotient and remainder are generally Pythonic, the quotient and exponentiation of two integers yield an integer when possible: that is, when the division is exact or the power fits in an integer. Otherwise the result is a float.

$ mlr repl -q
[mlr] 6/2
3

[mlr] typeof(6/2)
int

[mlr] 6/5
1.2

[mlr] typeof(6/5)
float

[mlr] typeof(7**8)
int

[mlr] typeof(7**80)
float

1-up array indices

Arrays are indexed starting with 1, not 0. This is discussed in detail on the arrays page.

mlr --csv --from data/short.csv cat
word,value
apple,37
ball,28
cat,54
mlr --csv --from data/short.csv put -q '
  @records[NR] = $*;
  end {
    for (i = 1; i <= NR; i += 1) {
      print "Record", i, "has word", @records[i]["word"];
    }
  }
'
Record 1 has word apple
Record 2 has word ball
Record 3 has word cat

Print adds spaces around multiple arguments

As seen in the previous example, print with multiple comma-delimited arguments fills in intervening spaces for you. If you want to avoid this, use the dot operator for string-concatenation instead.

mlr -n put -q '
  end {
    print "[", "a", "b", "c", "]";
    print "[" . "a" . "b" . "c" . "]";
  }
'
[ a b c ]
[abc]

Similarly, a final newline is printed for you; use printn to avoid this.

Insertion-order-preserving hashmaps

Miller's hashmaps (as in many modern languages) preserve insertion order. If you set x["foo"]=1 and then x["bar"]=2, then you are guaranteed that any looping over x will retrieve the "foo" key-value pair first, and the "bar" key-value pair second.

mlr -n put -q 'end {
  x["foo"] = 1;
  x["bar"] = 2;
  dump x;
  for (k,v in x) {
    print "key", k, "value", v
  }
}'
{
  "foo": 1,
  "bar": 2
}
key foo value 1
key bar value 2

Two-variable for-loops

Miller has a key-value loop flavor: whether x is a map or an array, in for (k,v in x) { ... } the k will be bound to successive map keys (for maps) or 1-up array indices (for arrays), and the v will be bound to successive map values or array elements, respectively.

Semantics for one-variable for-loops

Miller also has a single-variable loop flavor. If x is a map then for (e in x) { ... } binds e to successive map keys (not values as in PHP). But if x is an array then for (e in x) { ... } binds e to successive array values (not indices).

Absent-null

Miller has a somewhat novel flavor of null data called absent: if a record has a field x then $y=$x creates a field y, but if it doesn't then the assignment is skipped. See the null-data page for more information.