This commit is contained in:
John Kerl 2015-07-27 21:50:02 -04:00
parent 792b09971b
commit 12ac07bbce
2 changed files with 5 additions and 4 deletions

View file

@ -9,7 +9,7 @@ With Miller you get to use named fields without needing to count positional indi
% mlr --csv filter '$status != "down"' mydata.csv
```
This is something the Unix toolkit always could have done, and arguably always should have done. It operates on key-value-pair data while the familiar Unix tools operate on integer-indexed fields: if the natural data structure for the latter is the array, then Miller's natural data structure is the insertion-ordered hash map. This encompasses a **variety of data formats**, including but not limited to the familiar CSV. (Miller can handle positionally-indexed data as a special case.)
This is something the Unix toolkit always could have done, and arguably always should have done. It operates on **key-value-pair data** while the familiar Unix tools operate on integer-indexed fields: if the natural data structure for the latter is the array, then Miller's natural data structure is the insertion-ordered hash map. This encompasses a **variety of data formats**, including but not limited to the familiar **CSV**. (Miller can handle positionally-indexed data as a special case.)
Features:

View file

@ -6,8 +6,10 @@ TOP OF LIST
* make a -D for hash-collision stats ...
* go through remaining functions to decide when null-through is ok.
* also document thoroughly. emphasize this is crucial for heterogeneous data.
* document nullability thoroughly: emphasize this is crucial for heterogeneous data.
- UTs cases for all
- note null-loses logic for min/max.
- separate doc section?
* doc w/ very specific examples of sed/grep/etc preprocessing to structurize semi-structured data (e.g. logs)
@ -15,7 +17,6 @@ TOP OF LIST
!! booleans into put-dsl: should be able to do: mlr put '$ok = $x < 10'
* min/max functions: f_ff. need to impl null-loses logic.
* regularization mapper: when same names, reorder the same as 1st occurrence.
e.g. one record w/ a,b,c & subsequent with a,c,b.