miller

mirror of https://github.com/johnkerl/miller.git synced 2026-01-23 18:25:45 +00:00

John Kerl 1a893097ac doc update re releases		2015-08-21 23:43:31 -04:00
c	use CCOMP=gcc in c/dsls/Makefile as well as c/Makefile	2015-08-21 23:40:39 -04:00
data	data-dir reorgs	2015-06-01 09:23:42 -04:00
doc	doc update re releases	2015-08-21 23:43:31 -04:00
perf	[csv iterate] string-builder iterate	2015-08-21 22:31:53 -04:00
python	doc neatens	2015-05-14 14:05:45 -04:00
.gitignore	[csv iterate] string-builder iterate	2015-08-21 22:31:53 -04:00
index.html	Initial commit	2015-05-03 16:11:45 -07:00
LICENSE.txt	neaten	2015-08-15 10:15:50 -07:00
Makefile	doc updates, including separating required from optional external dependencies	2015-08-17 23:25:23 -04:00
name-ideas.txt	fix nidx reader for null fields	2015-05-09 13:50:05 -07:00
README.md	HN feedbacks	2015-08-19 18:31:59 -04:00

README.md

Miller is like sed, awk, cut, join, and sort for name-indexed data such as CSV.

With Miller you get to use named fields without needing to count positional indices. For example:

% mlr --csv cut -f hostname,uptime mydata.csv
% mlr --csv sort -f hostname,uptime mydata.csv
% mlr --csv put '$z = $x + 2.7*$y' mydata.csv
% mlr --csv filter '$status != "down"' mydata.csv

This is something the Unix toolkit always could have done, and arguably always should have done. Miller operates on key-value-pair data, while the familiar Unix tools operate on integer-indexed fields: if the natural data structure for the latter is the array, then Miller's natural data structure is the insertion-ordered hash map. This encompasses a variety of data formats, including but not limited to the familiar CSV. (Miller can handle positionally-indexed data as a special case.)
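To make the "insertion-ordered hash map" idea concrete, here is a minimal Python sketch of the record model. It is an illustration only, not Miller's C implementation: the sample data and the helper names (`read_records`, `cut`, `put_z`) are hypothetical, and the helpers only loosely mirror the `cut` and `put` examples above.

```python
import csv
import io

# Hypothetical sample data standing in for mydata.csv.
CSV_TEXT = """hostname,uptime,status,x,y
web1,100,up,1,2
db1,250,down,3,4
"""

def read_records(text):
    """Yield each CSV row as an insertion-ordered dict keyed by header name."""
    for row in csv.DictReader(io.StringIO(text)):
        yield dict(row)

def cut(records, fields):
    """Keep only the named fields; surviving fields stay in record order."""
    for rec in records:
        yield {k: v for k, v in rec.items() if k in fields}

def put_z(records):
    """Append a computed field z = x + 2.7*y to the end of each record."""
    for rec in records:
        out = dict(rec)
        out["z"] = float(rec["x"]) + 2.7 * float(rec["y"])
        yield out

records = list(read_records(CSV_TEXT))
cut_records = list(cut(records, {"hostname", "uptime"}))
with_z = list(put_z(records))
```

Fields are addressed by name throughout, so nothing depends on a column's position, and a newly assigned field simply lands at the end of the map.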

Features:

I/O formats including tabular pretty-printing
Conversion between formats
Format-aware processing: for example, sort and tac on CSV data keep the header line first
High-throughput performance on par with the Unix toolkit
Miller is pipe-friendly and interoperates with the Unix toolkit
It complements SQL databases: you can slice, dice, and reformat data on the client side on its way into or out of a database. You can also reap some of the benefits of databases for quick, setup-free one-off tasks when you just need to query some data in disk files in a hurry.
Miller also goes beyond classic Unix tools by stepping into our modern, NoSQL world: its essential record-heterogeneity property allows it to operate on data in which records with different schemas (field names) are interleaved.
Not unlike jq (http://stedolan.github.io/jq/) for JSON, Miller is written in modern C, and it has zero runtime dependencies. You can download or compile a single binary, scp it to a faraway machine, and expect it to work.
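The record-heterogeneity property mentioned in the list above can be sketched in Python as follows. This is a hedged illustration of the idea, not Miller's implementation: the sample records and the helper names (`select_with`, `schemas`) are made up for the example.

```python
# Hypothetical stream in which records with different field names are interleaved.
records = [
    {"hostname": "web1", "uptime": 100},
    {"resource": "/api", "latency_ms": 12, "hostname": "web1"},
    {"hostname": "db1", "uptime": 250},
]

def select_with(records, field):
    """Return the records that contain `field`, whatever other fields they carry."""
    return [r for r in records if field in r]

def schemas(records):
    """Return the distinct field-name tuples, in order of first appearance."""
    seen = []
    for r in records:
        keys = tuple(r)
        if keys not in seen:
            seen.append(keys)
    return seen

uptime_records = select_with(records, "uptime")
distinct_schemas = schemas(records)
```

Because each record carries its own field names, no global header has to be agreed on up front; an operation can act on whichever records have the fields it needs.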

For more information please see http://johnkerl.org/miller/doc