miller/docs6/docs/reference-dsl-differences.md.in

188 lines
7.1 KiB
Markdown

# Differences from other programming languages
The Miller programming language is intended to be straightforward and familiar,
as well as [not overly complex](reference-dsl-complexity.md). It doesn't try to
break new ground in terms of syntax; there are no classes or closures, and so
on.
While the [Principle of Least
Surprise](https://en.wikipedia.org/wiki/Principle_of_least_astonishment) is
often held to, nonetheless the following may be surprising.
## No ++ or --
There is no `++` or `--` [operator](reference-dsl-operators.md). To increment
`x`, use `x = x+1` or `x += 1`, and similarly for decrement.
## Semicolons as delimiters
You don't need a semicolon to end expressions, only to separate them. This
was done intentionally from the very start of Miller: you should be able to do
simple things like `mlr put '$z = $x * $y' myfile.dat` without needing a
semicolon.
Note that since you also don't need a semicolon before or after closing curly
braces (such as `begin`/`end` blocks, `if`-statements, `for`-loops, etc.) it's
easy to key in a few semicolon-free statements, and then to forget a
semicolon where one is needed . The parser tries to remind you about semicolons
whenever there's a chance a missing semicolon might be involved in a parse
error.
GENMD_RUN_COMMAND_TOLERATING_ERROR
mlr --csv --from example.csv put -q '
begin {
@count = 0 # No semicolon required -- before closing curly brace
}
$x=1 # No semicolon required -- at end of expression
'
GENMD_EOF
GENMD_RUN_COMMAND_TOLERATING_ERROR
mlr --csv --from example.csv put -q '
begin {
@count = 0 # No semicolon required -- before closing curly brace
}
$x=1 # Needs a semicolon after it
$y=2 # No semicolon required -- at end of expression
'
GENMD_EOF
## elif
Miller has [`elif`](reference-dsl-control-structures.md#if-statements), not `else if` or `elsif`.
## Required curly braces
Bodies for all compound statements must be enclosed in curly braces, even if the body is a single statement:
GENMD_SHOW_COMMAND
mlr ... put 'if ($x == 1) $y = 2' # Syntax error
GENMD_EOF
GENMD_SHOW_COMMAND
mlr ... put 'if ($x == 1) { $y = 2 }' # This is OK
GENMD_EOF
## No autoconvert to boolean
Boolean tests in `if`/`while`/`for`/etc must always take a boolean expression:
`if (1) {...}` results in the parse error
`Miller: conditional expression did not evaluate to boolean.`,
Likewise `if (x) {...}`, unless `x` is a variable of boolean type.
Please use `if (x != 0) {...}`, etc.
## Integer-preserving arithmetic
As discussed on the [arithmetic page](reference-main-arithmetic.md) the sum, difference, and product of two integers is again an integer, unless overflow occurs -- in which case Miller tries to convert to float in the least obtrusive way possible.
Likewise, while quotient and remainder are generally pythonic, the quotient and exponentiation of two integers is an integer when possible.
GENMD_CARDIFY_HIGHLIGHT_ONE
$ mlr repl -q
[mlr] 6/2
3
[mlr] typeof(6/2)
int
[mlr] 6/5
1.2
[mlr] typeof(6/5)
float
[mlr] typeof(7**8)
int
[mlr] typeof(7**80)
float
GENMD_EOF
## Print adds spaces around multiple arguments
As seen in the previous example,
[`print`](reference-dsl-output-statements.md#print-statements) with multiple
comma-delimited arguments fills in intervening spaces for you. If you want to
avoid this, use the dot operator for string-concatenation instead.
GENMD_RUN_COMMAND
mlr -n put -q '
end {
print "[", "a", "b", "c", "]";
print "[" . "a" . "b" . "c" . "]";
}
'
GENMD_EOF
Similarly, a final newline is printed for you; use [`printn`](reference-dsl-output-statements.md#print-statements) to avoid this.
## String literals with double quotes only
In some languages, like Ruby and Bash, string literals can be in single quotes or double quotes,
where single quotes suppress `\n` converting to a newline character and double quotes allowing it:
`'a\nb'` prints as the four characters `a`, `\`, `n`, and `b` on one line; `"a\nb"` prints as an
`a` on one line and a `b` on another.
In others, like Python and JavaScript, string literals can be in single quotes or double quotes,
interchangeably -- so you can have `"don't"` or `'the "right" thing'` as you wish.
In yet others, such as C/C++ and Java, string literals are in double auotes, like `"abc"`,
while single quotes are for character literals like `'a'` or `'\n'`. In these, if `s` is a non-empty string,
then `s[0]` is its first character.
In the [Miller programming language](programming-language.md):
* String literals are always in double quotes, like `"abc"`.
* String-indexing/slicing always results in strings (even of length 1): `"abc"[1:1]` is the string `"a"`, and there is no notion in the Miller programming language of a character type.
* The single-quote character plays no role whatsoever in the grammar of the Miller programming language.
* Single quotes are reserved for wrapping expressions at the system command line. For example, in `mlr put '$message = "hello"' ...`, the [`put` verb](reference-dsl.md) gets the string `$message = "hello"`; the shell has consumed the outer single quotes by the time the Miller parser receives it.
* Things are a little different on Windows, where `"""` sequences are sometimes necessary: see the [Miller on Windows page](miller-on-windows.md).
## Absent-null
Miller has a somewhat novel flavor of null data called _absent_: if a record
has a field `x` then `$y=$x` creates a field `y`, but if it doesn't then the assignment
is skipped. See the [null-data page](reference-main-null-data.md) for more
information.
## Maps
See the [maps page](reference-main-maps.md).
## Arrays, including 1-up array indices
Arrays are indexed starting with 1, not 0. This is discussed in detail on the [arrays page](reference-main-arrays.md).
GENMD_RUN_COMMAND
mlr --csv --from data/short.csv cat
GENMD_EOF
GENMD_RUN_COMMAND
mlr --csv --from data/short.csv put -q '
@records[NR] = $*;
end {
for (i = 1; i <= NR; i += 1) {
print "Record", i, "has word", @records[i]["word"];
}
}
'
GENMD_EOF
See the [arrays page](reference-main-arrays.md) for more about arrays.
## Two-variable for-loops
Miller has a [key-value loop flavor](reference-dsl-control-structures.md#key-value-for-loops): whether `x` is a map or array, in `for (k,v in x) { ... }` the `k` will be bound to successive map keys (for maps) or 1-up array indices (for arrays), and the `v` will be bound to successive map values.
## Semantics for one-variable for-loops
Miller also has a [single-variable loop flavor](reference-dsl-control-structures.md#single-variable-for-loops). If `x` is a map then `for (e in x) { ... }` binds `e` to successive map _keys_ (not values as in PHP). But if `x` is an array then `for e in x) { ... }` binds `e` to successive array _values_ (not indices).
## JSON parse, stringify, decode, and encode
Miller has the verbs
[`json-parse`](reference-verbs.md#json-parse) and
[`json-stringify`](reference-verbs.md#json-stringify), and the DSL functions
[`json_parse`](reference-dsl-builtin-functions.md#json_parse) and
[`json_stringify`](reference-dsl-builtin-functions.md#json_stringify).
In some other lannguages these are called `json_decode` and `json_encode`.