Introduction
Miller is a command-line tool for querying, shaping, and reformatting data files in various formats including CSV, TSV, and JSON.
In several senses, Miller is more than one tool:
Format conversion: You can convert CSV files to JSON, or vice versa, or pretty-print your data horizontally or vertically to make it easier to read.
Data manipulation: With a few keystrokes you can remove columns you don't care about -- or, make new ones.
Pre-processing/post-processing vs standalone use: You can use Miller to clean data files and put them into standard formats, perhaps in preparation for loading them into a database or a hands-off data-processing pipeline. Or you can use it to post-process and summarize database-query output. You can also use Miller to explore and analyze your data interactively.
Compact verbs vs programming language: With very few keystrokes you can do things like
mlr --csv sort -f name input.csv
mlr --json head -n 1 myfile.json
Here sort and head are called verbs. They're analogs of familiar command-line tools such as sort and head, but they're aware of name-indexed, multi-line file formats like CSV, TSV, and JSON. In addition, using Miller's put verb you can write programming-language statements and expressions like
mlr --csv put '$rate = $units / $seconds' input.csv
which allow you to succinctly express your own logic.
Multiple domains: People use Miller for data analysis, data science, software engineering, devops/system-administration, journalism, scientific research, and more.
In the following you can see how CSV, TSV, tabular, JSON, and other file formats share a common theme: each is a list of key-value pairs. Miller embraces this common theme.
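To make the key-value idea concrete, here is a minimal Python sketch. It is an illustration only, not how Miller is implemented. The sample data and the helper names `records_from_csv` and `records_from_json` are hypothetical; the field names are borrowed from the put example above. It reads the same two records from CSV text and from JSON text, and shows that both come out as lists of records with the same keys.

```python
import csv
import io
import json

# Hypothetical sample data: the same two records, once as CSV and once as JSON.
CSV_TEXT = "name,units,seconds\nalice,10,2\nbob,9,3\n"
JSON_TEXT = (
    '[{"name": "alice", "units": 10, "seconds": 2},'
    ' {"name": "bob", "units": 9, "seconds": 3}]'
)


def records_from_csv(text):
    """In CSV, the header line supplies the keys and each data line the values."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def records_from_json(text):
    """In JSON, every object already carries its own keys."""
    return json.loads(text)


csv_records = records_from_csv(CSV_TEXT)
json_records = records_from_json(JSON_TEXT)

# Both formats yield a list of records, each an ordered set of key-value pairs.
for record in csv_records + json_records:
    print(list(record.items()))
```

Note that the CSV reader in this sketch leaves every value as a string, while the JSON parser yields numbers; the keys and their order are what the two formats have in common here.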
