Miller 6 is written in Go, which supports multicore programming.
Miller uses Go's goroutines and channels. The following all run as separate goroutines, connected to one another by channels:
- One goroutine for the input-record reader, parsing input file(s) into record objects
- One goroutine for each verb in a then-chain
- One goroutine for the output-record writer, formatting output records as text
- One controller goroutine which coordinates all of these, without much work to do
For example, `mlr --csv cut -f somefield then sort -f otherfield then put '$z = $x + $y' a.csv b.csv c.csv` will have six goroutines running: the input-reader,
cut, sort, put, the output-writer, and the controller. A minimal sketch of this goroutine-and-channel layout is shown below.
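The sketch below is not Miller's actual source; it is a minimal, self-contained illustration of the same shape, with a hypothetical `Record` type and a single hypothetical `cutVerb` standing in for a then-chain. The reader, the verb, and the writer each run as their own goroutine and hand records along via channels.

```go
package main

import (
	"fmt"
	"strings"
)

// Record is a hypothetical stand-in for Miller's record objects.
type Record map[string]string

// reader parses simple "k=v,k=v" lines into records and sends them downstream.
func reader(lines []string, out chan<- Record) {
	for _, line := range lines {
		rec := Record{}
		for _, pair := range strings.Split(line, ",") {
			kv := strings.SplitN(pair, "=", 2)
			rec[kv[0]] = kv[1]
		}
		out <- rec
	}
	close(out) // closing the channel signals end-of-stream to downstream goroutines
}

// cutVerb is a streaming verb: it handles each record as it arrives and passes it on.
func cutVerb(field string, in <-chan Record, out chan<- Record) {
	for rec := range in {
		out <- Record{field: rec[field]}
	}
	close(out)
}

// writer formats each record as text as it arrives.
func writer(in <-chan Record) {
	for rec := range in {
		fmt.Println(rec)
	}
}

func main() {
	c1 := make(chan Record)
	c2 := make(chan Record)
	go reader([]string{"x=1,y=2", "x=3,y=4"}, c1)
	go cutVerb("x", c1, c2)
	writer(c2) // the writer runs on the main goroutine and drains the final channel
}
```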
If all the verbs in the chain are streaming --
operating on each record as it arrives, then passing it on -- then all verbs in
the chain will be active at once. On the other hand, if there is a
non-streaming verb in the chain -- one which produces output only after
receiving all of its input, for example sort -- then the verbs after it will
sit idle until the end of the input stream is reached, at which point the sort
does its computation and sends its output on to the downstream verbs. A sketch
of such a buffering verb follows.
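Continuing the hypothetical sketch above (again, not Miller's real implementation), a non-streaming verb looks roughly like this: it buffers every record until its input channel is closed, and only then emits anything downstream.

```go
package main

import (
	"fmt"
	"sort"
)

// Record is the same hypothetical stand-in used in the previous sketch.
type Record map[string]string

// sortVerb is a non-streaming verb: it emits nothing until the input channel
// is closed, i.e. until the end of the input stream has been reached.
func sortVerb(field string, in <-chan Record, out chan<- Record) {
	var buffered []Record
	for rec := range in {
		buffered = append(buffered, rec) // downstream verbs see nothing yet
	}
	// End of stream: sort the buffered records, then release them downstream.
	sort.Slice(buffered, func(i, j int) bool {
		return buffered[i][field] < buffered[j][field]
	})
	for _, rec := range buffered {
		out <- rec
	}
	close(out)
}

func main() {
	in := make(chan Record)
	out := make(chan Record)
	go sortVerb("x", in, out)
	go func() {
		for _, v := range []string{"3", "1", "2"} {
			in <- Record{"x": v}
		}
		close(in)
	}()
	for rec := range out {
		fmt.Println(rec) // prints the records in sorted order: 1, 2, 3
	}
}
```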
In practice, profiling has shown that the input-reader uses the most CPU of all the above. This means CPUs running verbs may not be 100% utilized, since they are likely to be spending some of their time waiting for input data.
Running Miller on a machine with more CPUs than active goroutines (as listed above) won't speed up a given invocation of Miller. However, you can of course run more invocations of Miller at the same time if you like.
You can set the Go-standard environment variable GOMAXPROCS if you like. If
you don't, Miller will (as is standard for Go programs in Go 1.16 and above)
use up to all available CPUs.
If you set GOMAXPROCS=1 in the environment, that's fine -- the Go runtime
will simply multiplex the different channel-handling goroutines onto the same CPU.
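If you want to check what the runtime will actually use, a tiny Go program (not part of Miller) reports it; run it with GOMAXPROCS=1 set in the environment to see the override being picked up, or without it to see the default of all available CPUs.

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// GOMAXPROCS(0) reports the current setting without changing it.
	fmt.Println("NumCPU:    ", runtime.NumCPU())
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
}
```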