miller/docs6/docs/reference-main-maps.md.in
John Kerl 4f1424789e
Doc6 proofreads 3 (#638)
2021-09-03 23:19:32 -04:00


# Maps
Miller data types are listed on the [Data types](reference-main-data-types.md) page; here we focus specifically on maps.
On the whole, maps are as in most other programming languages. However, following the
[Principle of Least Surprise](https://en.wikipedia.org/wiki/Principle_of_least_astonishment)
and aiming to reduce keystroking for Miller's most-used streaming-record-processing model,
there are a few differences as noted below.
## Types of maps
_Map literals_ are written in curly braces, with string keys and any [Miller data type](reference-main-data-types.md) (including other maps, or arrays) as values. Integers may also be given as keys, although they'll be stored as strings.
GENMD_RUN_COMMAND
mlr -n put '
  end {
    x = {"a": 1, "b": {"x": 2, "y": [3,4,5]}, 99: true};
    dump x;
    print x[99];
    print x["99"];
  }
'
GENMD_EOF
As with arrays and argument-lists, trailing commas are supported:
GENMD_RUN_COMMAND
mlr -n put '
  end {
    x = {
      "a" : 1,
      "b" : 2,
      "c" : 3,
    };
    print x;
  }
'
GENMD_EOF
The current record, accessible using `$*`, is a map.
GENMD_RUN_COMMAND
mlr --csv --from example.csv head -n 2 then put -q '
  dump $*;
  print "Color is", $*["color"];
'
GENMD_EOF
The collection of all [out-of-stream variables](reference-dsl-variables.md#out-of-stream-variables), `@*`, is a map.
GENMD_RUN_COMMAND
mlr --csv --from example.csv put -q '
  begin {
    @last_rates = {};
  }
  @last_rates[$shape] = $rate;
  @last_color = $color;
  end {
    dump @*;
  }
'
GENMD_EOF
Also note that several [built-in functions](reference-dsl-builtin-functions.md) operate on maps and/or return maps.
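As a minimal sketch of what such functions look like in use, the following calls `haskey`, `length`, `mapsum`, and `mapexcept` on a small map literal. The map `m` and its keys are made up for illustration; see the built-in functions page for the full list and exact semantics.

```shell
mlr -n put '
  end {
    m = {"a": 1, "b": 2};
    print haskey(m, "a");       # is "a" a key of m?
    print length(m);            # number of key-value pairs in m
    print mapsum(m, {"c": 3});  # a new map with the keys of both arguments
    print mapexcept(m, "b");    # a copy of m without the key "b"
  }
'
```

The last two calls return maps, so `print` shows them in JSON form.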
## Insertion order is preserved
Miller maps preserve insertion order. So if you write `@m["y"]=7` and then `@m["x"]=3`, any loop over
the map `@m` will give you the keys `"y"` and `"x"` in that order.
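The assignments just described can be tried directly. This is a small sketch using a key-value for-loop; the `k=v` output format is only an illustrative choice.

```shell
mlr -n put '
  end {
    @m["y"] = 7;
    @m["x"] = 3;
    # Keys come back in the order they were inserted, not sorted
    for (k, v in @m) {
      print k . "=" . v;
    }
  }
'
```

This should print `y=7` before `x=3`.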
## String keys, with conversion from/to integer
All Miller map keys are strings. If a map is indexed with an integer, whether for
read or write (that is, on either the right-hand side or the left-hand side of an
assignment), then the integer is converted to a string. So
`@m[3]` is the same as `@m["3"]`. The reason for this is situations like
[operating on all records](operating-on-all-records.md), where it's important to
let people write `@records[NR] = $*`.
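A short sketch of the conversion in both directions follows. The map name `@m` and the values are arbitrary.

```shell
mlr -n put '
  end {
    @m[3] = "three";    # write with an integer key
    print @m["3"];      # read back with the string key
    @m["4"] = "four";   # write with a string key
    print @m[4];        # read back with the integer key
    print length(@m);   # two distinct keys in total
  }
'
```

Both reads should find the values stored under the other spelling of the key.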
## Auto-create
Indexing any as-yet-unassigned local variable or out-of-stream variable results
in **auto-create** of that variable as a map variable:
GENMD_RUN_COMMAND
mlr --csv --from example.csv put -q '
  # You can do this but you do not need to:
  # begin { @last_rates = {} }
  @last_rates[$shape] = $rate;
  end {
    dump @last_rates;
  }
'
GENMD_EOF
*This also means that auto-create results in maps, not arrays, even if keys are integers.*
If you want to auto-extend an [array](reference-main-arrays.md), initialize it explicitly to `[]`.
GENMD_RUN_COMMAND
mlr --csv --from example.csv head -n 4 then put -q '
  begin {
    @my_array = [];
  }
  @my_array[NR] = $quantity;
  @my_map[NR] = $rate;
  end {
    dump
  }
'
GENMD_EOF
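To make the distinction explicit without an input file, here is a sketch that checks the resulting types with `typeof`. The variable names `@auto` and `@explicit` are hypothetical.

```shell
mlr -n put '
  end {
    @auto[1] = "a";         # never assigned before: auto-created as a map
    @explicit = [];         # explicitly initialized as an empty array
    @explicit[1] = "a";     # extends the array rather than creating a map
    print typeof(@auto);
    print typeof(@explicit);
  }
'
```

The first variable should be reported as a map and the second as an array.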
## Auto-deepen
Similarly, maps are **auto-deepened**: you can write `@m["a"]["b"]["c"]=3`
without first setting `@m["a"]={}` and `@m["a"]["b"]={}`. The reason for this
is data aggregation: for example, if you want to compute keyed sums, you
can do that with a minimum of keystrokes.
GENMD_RUN_COMMAND
mlr --icsv --opprint --from example.csv put -q '
  @quantity_sum[$color][$shape] += $rate;
  end {
    emit @quantity_sum, "color", "shape";
  }
'
GENMD_EOF
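The nested assignment mentioned at the start of this section can also be tried on its own, with no input file. This is a minimal sketch; `@m` and the keys `a`, `b`, and `c` are arbitrary.

```shell
mlr -n put '
  end {
    # No prior @m = {}, @m["a"] = {}, or @m["a"]["b"] = {} is needed
    @m["a"]["b"]["c"] = 3;
    dump @m;
  }
'
```

The dump should show three levels of nesting, with `3` stored under `"c"`.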
## Looping
See [single-variable for-loops](reference-dsl-control-structures.md#single-variable-for-loops) and [key-value for-loops](reference-dsl-control-structures.md#key-value-for-loops).