mirror of https://github.com/johnkerl/miller.git synced 2026-01-23 10:15:36 +00:00

John Kerl d59d321a92 todo		2021-05-30 21:53:43 -04:00
.github/workflows	Abandon Travis; GitHub Actions as forward path	2021-05-23 00:46:21 +00:00
autotools	autoreconf -fiv	2020-01-26 10:31:43 -05:00
c	Document --no-implicit-csv-header	2021-05-22 12:43:43 -04:00
data	neaten	2018-02-10 12:37:01 -05:00
docs	fix release-docs links	2021-05-30 20:44:52 -04:00
docs6	fix release-docs links	2021-05-30 20:44:52 -04:00
experiments	prototype for platform-independent cli-parse	2021-03-28 16:53:56 -04:00
go	todo	2021-05-30 21:53:43 -04:00
m4	autoreconf -fiv	2020-01-26 10:31:43 -05:00
man	Makefile build-order comments	2021-03-31 17:40:51 -04:00
perf	add timings.txt for PR #430	2021-02-23 23:08:35 -05:00
python	doc neatens	2015-05-14 14:05:45 -04:00
vim	Complete mlr.vim syntax file	2021-04-18 17:05:43 -04:00
.gitattributes	fix mlr termcvt	2017-05-13 19:37:40 -04:00
.gitignore	iterating	2021-05-24 22:02:43 -04:00
.travis.yml	Abandon Travis; GitHub Actions as forward path	2021-05-23 00:55:04 +00:00
aclocal.m4	autoreconf -fiv	2020-01-26 10:31:43 -05:00
appveyor.yml	Abandon Travis; GitHub Actions as forward path	2021-05-23 00:55:04 +00:00
config.h.in	autoreconf -fiv	2019-04-11 21:35:31 -04:00
configure	5.10.2	2021-03-23 23:05:17 -04:00
configure.ac	post-5.10.2	2021-03-23 23:36:07 -04:00
index.html	Initial commit	2015-05-03 16:11:45 -07:00
LICENSE.txt	neaten	2015-08-15 10:15:50 -07:00
Makefile.am	Restore mlr.1 manpage in dist-file	2021-03-22 23:32:31 -04:00
Makefile.in	more	2021-03-23 00:33:34 -04:00
Makefile.no-autoconfig	more	2021-03-23 00:33:34 -04:00
miller.spec	5.10.2	2021-03-23 23:02:54 -04:00
msys2-build.sh	appveyor iterate	2017-07-02 21:11:22 -04:00
name-ideas.txt	fix nidx reader for null fields	2015-05-09 13:50:05 -07:00
README-autobuild.md	Abandon Travis; GitHub Actions as forward path	2021-05-23 00:55:04 +00:00
README-RPM.md	doc fixups in prep for 5.8.0 release	2020-05-17 23:33:37 -04:00
README.md	Survey link	2021-05-27 10:08:12 -04:00

README.md

Take the 2021 Miller User Survey!

What is Miller?

Miller is like awk, sed, cut, join, and sort for data formats such as CSV, TSV, tabular JSON, and positionally indexed data.

What can Miller do for me?

With Miller, you get to use named fields without needing to count positional indices, using familiar formats such as CSV, TSV, JSON, and positionally indexed data. Then, on the fly, you can add new fields which are functions of existing fields, drop fields, sort, aggregate statistically, pretty-print, and more.

Miller operates on key-value-pair data while the familiar Unix tools operate on integer-indexed fields: if the natural data structure for the latter is the array, then Miller's natural data structure is the insertion-ordered hash map.
Miller handles a variety of data formats, including but not limited to the familiar CSV, TSV, and JSON. (Miller can handle positionally-indexed data too!)
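
To make the array-versus-map contrast concrete, here is a minimal Python sketch. It is not Miller's implementation and not Miller syntax; the sample data and the field names `name`, `x`, `y`, and `sum` are made up for illustration. Python's `dict` preserves insertion order, so it stands in for the insertion-ordered hash map described above.

```python
import csv
import io

# Hypothetical sample data; the field names are illustrative only.
CSV_TEXT = "name,x,y\npan,3,4\neks,5,6\n"

# Positional style: each row is an array, and you must count columns.
positional_rows = list(csv.reader(io.StringIO(CSV_TEXT)))[1:]  # skip the header row
positional_sums = [int(row[1]) + int(row[2]) for row in positional_rows]

# Key-value style: each record is an insertion-ordered map keyed by field name.
records = list(csv.DictReader(io.StringIO(CSV_TEXT)))
for record in records:
    # A new field computed from existing fields is appended after them.
    record["sum"] = int(record["x"]) + int(record["y"])
    # Dropping a field by name leaves the order of the remaining fields unchanged.
    del record["y"]

print(positional_sums)
print([list(record.keys()) for record in records])
```

In the positional version, reordering the input columns silently changes what `row[1]` means; in the key-value version, the code keeps working as long as the field names are present.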

Getting started

Build status

License: BSD2

Docs

Community

Discussion forum: https://github.com/johnkerl/miller/discussions
Feature requests / bug reports: https://github.com/johnkerl/miller/issues

Contributors

Thanks to all the fine people who help make Miller better by contributing commits/PRs! (I wish there were an equally fine way to honor all the fine people who contribute through issues and feature requests!)

Distributions

There's a good chance you can get Miller pre-built for your system:

OS	Installation command
Linux	`yum install miller` or `apt-get install miller`
Mac	`brew install miller` or `port install miller`
Windows	`choco install miller`

Features

Miller is multi-purpose: it's useful for data cleaning, data reduction, statistical reporting, devops, system administration, log-file processing, format conversion, and database-query post-processing.
You can use Miller to snarf and munge log-file data, including selecting out relevant substreams, then produce CSV format and load that into all-in-memory/data-frame utilities for further statistical and/or graphical processing.
Miller complements data-analysis tools such as R, pandas, etc.: you can use Miller to clean and prepare your data. You can do basic statistics entirely in Miller, and its streaming-data feature and single-pass algorithms enable you to reduce very large data sets.
Miller complements SQL databases: you can slice, dice, and reformat data on the client side on its way into or out of a database. You can also reap some of the benefits of databases for quick, setup-free one-off tasks when you just need to query some data in disk files in a hurry.
Miller also goes beyond the classic Unix tools by stepping fully into our modern, NoSQL world: its essential record-heterogeneity property allows Miller to operate on data where records with different schemas (field names) are interleaved.
Miller is streaming: most operations need only a single record in memory at a time, rather than ingesting all input before producing any output. For those operations which require deeper retention (sort, tac, stats1), Miller retains only as much data as needed. This means that whenever functionally possible, you can operate on files which are larger than your system’s available RAM, and you can use Miller in tail -f contexts.
Miller is pipe-friendly and interoperates with the Unix toolkit.
Miller's I/O formats include tabular pretty-printing, positionally indexed (Unix-toolkit style), CSV, JSON, and others.
Miller converts between formats.
Miller's processing is format-aware: e.g. CSV sort and tac keep header lines first.
Miller has high-throughput performance on par with the Unix toolkit.
Not unlike jq (http://stedolan.github.io/jq/) for JSON, Miller is written in portable, modern C, with zero runtime dependencies. You can download or compile a single binary, scp it to a faraway machine, and expect it to work.
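
The streaming and record-heterogeneity points in the list above can be illustrated with a small Python sketch. This is an assumption-laden illustration, not how Miller is implemented: the `key=value,key=value` line format, the helper names `parse_records` and `streaming_mean`, and the sample field names are all hypothetical. The generator yields one record at a time, and the aggregate keeps only a running count and total, so memory use does not grow with input size.

```python
from typing import Dict, Iterable, Iterator

Record = Dict[str, str]


def parse_records(lines: Iterable[str]) -> Iterator[Record]:
    """Lazily yield one record per line of 'key=value,key=value' text."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        yield dict(pair.split("=", 1) for pair in line.split(","))


def streaming_mean(records: Iterable[Record], field: str) -> float:
    """Single-pass mean of `field`, skipping records that lack it."""
    count = 0
    total = 0.0
    for record in records:
        if field in record:  # records with other schemas pass through untouched
            count += 1
            total += float(record[field])
    if count == 0:
        raise ValueError(f"no record has field {field!r}")
    return total / count


# Hypothetical input in which records with different field names are interleaved.
LINES = [
    "host=a,latency=10",
    "user=bob,action=login",
    "host=b,latency=30",
]

mean_latency = streaming_mean(parse_records(LINES), "latency")
print(mean_latency)
```

Because `parse_records` accepts any iterable of lines, the same code could be fed an open file handle or standard input instead of an in-memory list. An operation such as sorting, by contrast, would have to retain every record before emitting any output.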
