mirror of https://github.com/johnkerl/miller.git
synced 2026-01-23 02:14:13 +00:00

Update README-profiling.md (#810)

This commit is contained in:
parent b6097a2793
commit 2e431d4234

5 changed files with 38 additions and 119 deletions
@@ -163,6 +163,7 @@ nil through the reader/transformer/writer sequence.

Key performance-related PRs for the Go port include:

* [#424](https://github.com/johnkerl/miller/pull/424): In C, lots of little mallocs are fine. In Go, not so much. This is not the garbage-collection penalty -- it's the penalty of _allocating_ -- lots of `duffcopy` and `madvise` appearing in the flame graphs. The idea was to reduce data copies in the DSL.

* [#765](https://github.com/johnkerl/miller/pull/765): In C, prints to `stdout` are buffered a line at a time if the output is to the terminal, or a few KB at a time if not (i.e. a file or pipe). The cost is how often the process does a `write` system call, with the associated overhead of context-switching into the kernel and back out. The C behavior is the right thing to do. In the Go port, very early on, writes were all unbuffered -- several per record. Buffering was soon switched to per-record, which was an improvement. But as of #765, the buffering is done at the library level, and it's done C-style -- much less frequently when output is not to a terminal.

* [#774](https://github.com/johnkerl/miller/pull/774): For CSV-lite and DKVP, this avoids using regexes to split strings when `strings.Split` will do.

* [#779](https://github.com/johnkerl/miller/pull/779): The basic idea of the Miller Go port was that the record-reader writes records one at a time over a channel to the first verb; the first verb writes records one at a time to the second verb, and so on; and the last verb writes records one at a time to the record-writer. This is very simple, but for large files the Go runtime-scheduler overhead is too large -- the data are chopped up into too many pieces. As of #779, records are written 500 (or fewer) per batch, and all the channels from record-reader, to verbs, to record-writer carry record batches. This lets Miller spend more time doing its job and less time yielding to the goroutine scheduler.
@@ -170,9 +171,12 @@ Key performance-related PRs for the Go port include:

* [#787](https://github.com/johnkerl/miller/pull/787): This removed an unnecessary data copy in the `mlrval.String()` method. Originally this method had a non-pointer receiver to conform with the `fmt.Stringer` interface. However, that's a false economy: `fmt.Println(someMlrval)` is a corner case, and stream processing is the primary concern. Implementing this as a pointer-receiver method was a performance improvement.

* [#809](https://github.com/johnkerl/miller/pull/809): This reduced the number of passes through fields for just-in-time type-inference. For example, for `$y = $x + 1`, each record's `$x` field's raw string (if not already accessed in the processing chain) needs to be checked to see if it's an int (like `123`), a float (like `123.4` or `1.2e3`), or a string (anything else). Previously, succinct calls to built-in Go library functions were used. That was easy to code, but made too many expensive calls that were avoidable by lighter peeking at strings. In particular, an is-octal regex was being invoked unnecessarily on every field type-infer operation.
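The kind of "lighter peeking" #809 describes can be sketched like this. This is a hedged illustration only -- the function name and the exact rules here are assumptions, not Miller's actual inference logic -- showing how a one-byte peek rejects obviously non-numeric fields before paying for a full parse or a regex.

```go
package main

import (
	"fmt"
	"strconv"
)

// inferType is a hypothetical sketch of just-in-time type inference:
// peek at the string's first byte, and only call the (more expensive)
// strconv parsers when the field looks numeric at all. Non-numeric
// fields never trigger a parse attempt or a regex.
func inferType(s string) string {
	if s == "" {
		return "string"
	}
	c := s[0]
	if c != '-' && c != '+' && c != '.' && (c < '0' || c > '9') {
		return "string" // cheap rejection: no strconv call needed
	}
	if _, err := strconv.ParseInt(s, 10, 64); err == nil {
		return "int"
	}
	if _, err := strconv.ParseFloat(s, 64); err == nil {
		return "float"
	}
	return "string"
}

func main() {
	for _, s := range []string{"123", "123.4", "1.2e3", "hello", "-7"} {
		fmt.Println(s, inferType(s))
	}
}
```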
See also [./README-profiling.md](./README-profiling.md) and [https://miller.readthedocs.io/en/latest/new-in-miller-6/#performance-benchmarks](https://miller.readthedocs.io/en/latest/new-in-miller-6/#performance-benchmarks).

In summary:

* #765, #774, and #787 were low-hanging fruit.
* #424 was a bit more involved, and reveals that memory allocation -- not just GC -- needs to be handled more mindfully in Go than in C.
* #779 was a bit more involved, and reveals that Go's elegant goroutine/channel processing model comes with the caveat that channelized data should not be organized in many small pieces.
* #809 was also a bit more involved, and reveals that library functions are convenient, but profiling and analysis can sometimes reveal an opportunity for an impactful custom solution.
* #786 was a massive refactor involving about 10KLOC -- in hindsight it would have been best to do this work at the start of the Go port, not at the end.
@@ -1,60 +1,56 @@
# How to run the profiler
# See also

Running Miller:

* [https://github.com/johnkerl/miller/blob/readme-profiling/README-go-port.md#performance-optimizations](https://github.com/johnkerl/miller/blob/readme-profiling/README-go-port.md#performance-optimizations)
* [https://miller.readthedocs.io/en/latest/new-in-miller-6/#performance-benchmarks](https://miller.readthedocs.io/en/latest/new-in-miller-6/#performance-benchmarks)

# How to view profile data

Run the profiler:

```
mlr --cpuprofile cpu.pprof put -f u/example.mlr then nothing ~/tmp/huge > /dev/null
mlr --cpuprofile cpu.pprof --csv put -f scripts/chain-1.mlr ~/tmp/big.csv > /dev/null
```

(or whatever command-line flags).
(or whatever command-line flags for Miller).

# How to view the profiling results

## Text mode
Text mode:

```
go tool pprof mlr cpu.pprof
top10
```

## PDF mode
Graphical mode:

```
go tool pprof --pdf mlr cpu.pprof > mlr-call-graph.pdf
mv mlr-call-graph.pdf ~/Desktop
go tool pprof -http=:8080 cpu.pprof
```

## Flame-graph mode
and let it pop open a browser window.

One-time setup:
# Benchmark scripts

Scripts:

```
export GOPATH=$HOME/go
mkdir -p $HOME/go
```

* [./scripts/make-big-files](./scripts/make-big-files) -- Create million-record data files in various formats.
* [./scripts/chain-cmps.sh](./scripts/chain-cmps.sh) -- Run a few processing scenarios on the million-record CSV file.
* [./scripts/chain-1.mlr](./scripts/chain-1.mlr) -- An example `mlr put` used by the previous script.
* [./scripts/time-big-files](./scripts/time-big-files) -- Runs `mlr cat` for million-record files of various file formats. Catting files isn't intrinsically interesting, but it shows how input and output processing vary across file formats.
* [./scripts/time-big-file](./scripts/time-big-file) -- Helper script for the former.
* [./scripts/chain-lengths.sh](./scripts/chain-lengths.sh) -- Run longer and longer chains of `scripts/chain-1.mlr`, showing how Miller handles multicore and concurrency.
* [./scripts/make-data-stream](./scripts/make-data-stream) -- Create an endless stream of data to be piped into Miller for steady-state load-testing: e.g. `scripts/make-data-stream | mlr ...`, then look at `htop` in another window.

```
go get -u github.com/google/pprof
ll ~/go/bin/pprof
go get -u github.com/uber/go-torch
```

Notes:

```
mkdir -p ~/git/brendangregg
cd ~/git/brendangregg
git clone https://github.com/brendangregg/FlameGraph
```

* Any of the above can be run using the profiler. I find flame-graph mode particularly informative for drill-down.
* The above refer to `mlr5` and `~/tmp/miller/mlr` as well as `./mlr`. The idea is that I have a copy of Miller 5.10.3 (the C implementation) saved off in my path as `mlr5`. Then I keep `~/tmp/miller` on recent HEAD. Then I have `.` on a dev branch. Comparing `mlr5` to `./mlr` shows the relative performance of the C and Go implementations. Comparing `~/tmp/miller/mlr` to `./mlr` shows the relative performance of whatever optimization I'm currently working on.
* Several of the above scripts use [justtime](https://github.com/johnkerl/scripts/blob/main/fundam/justtime) to get one-line timing information.

Per run:
# How to vary compiler versions

```
cd /path/to/mlr/go
export PATH=${PATH}:~/git/brendangregg/FlameGraph/
go-torch cpu.pprof
mv torch.svg ~/Desktop/
```

* [./scripts/compiler-versions-install](./scripts/compiler-versions-install)
* [./scripts/compiler-versions-build](./scripts/compiler-versions-build)
* [./scripts/compiler-versions-time](./scripts/compiler-versions-time)

# How to control garbage collection

@@ -69,79 +65,3 @@ GOGC=1000 GODEBUG=gctrace=1 mlr -n put -q -f u/mand.mlr 1> /dev/null
# Turn off GC entirely and see where time is spent:
GOGC=off GODEBUG=gctrace=1 mlr -n put -q -f u/mand.mlr 1> /dev/null
```

# Findings from the profiler as of 2021-02

* GC on: lots of GC
* GC off: lots of `duffcopy` and `madvise`
* From benchmark `u/mand.mlr`: the issue is allocation created by expressions
* Things like `type BinaryFunc func(input1 *Mlrval, input2 *Mlrval) (output Mlrval)`
* `z = 2 + x + 4 + y * 3` results in an AST, mapped to a CST, with a malloc on the output of every unary/binary/ternary function
* Idea: replace with `type BinaryFunc func(output *Mlrval, input1 *Mlrval, input2 *Mlrval)`: no allocation at all.
  * Breaks the Fibonacci test since the binary-operator node is no longer re-entrant
* Idea: replace with `type BinaryFunc func(input1 *Mlrval, input2 *Mlrval) (output *Mlrval)`: better.
  * Makes possible zero-copy eval of literal nodes, etc.
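The two candidate signatures can be sketched as follows, using a toy `Mlrval` stand-in rather than Miller's real type. The point is that the value-returning form copies an `Mlrval` out of every operator node, while the pointer-returning form hands back a freshly allocated (or reused) value with no copy, and stays re-entrant because no caller-owned output slot is shared.

```go
package main

import "fmt"

// Mlrval is a toy stand-in for Miller's value type, just to
// illustrate the signature change.
type Mlrval struct {
	ival int
}

// Value-returning style: every call copies an Mlrval out, so an
// expression tree like 2 + x + 4 copies at every node.
type BinaryFuncByValue func(a, b *Mlrval) Mlrval

// Pointer-returning style: the callee returns a pointer, enabling
// zero-copy evaluation of literal nodes while keeping the operator
// node re-entrant (unlike writing into a caller-owned output).
type BinaryFuncByPointer func(a, b *Mlrval) *Mlrval

func addByValue(a, b *Mlrval) Mlrval    { return Mlrval{a.ival + b.ival} }
func addByPointer(a, b *Mlrval) *Mlrval { return &Mlrval{a.ival + b.ival} }

// Compile-time checks that the functions match the two signatures.
var (
	_ BinaryFuncByValue   = addByValue
	_ BinaryFuncByPointer = addByPointer
)

func main() {
	x, y := &Mlrval{2}, &Mlrval{3}
	fmt.Println(addByValue(x, y).ival, addByPointer(x, y).ival)
}
```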

```
for i in 100 200 300 400 500 600 700 800 900 1000 ; do
  for j in 1 2 3 4 5 ; do
    echo $i;
    justtime GOGC=$i mlr -n put -q -f u/mand.mlr > /dev/null
  done
done
```

```
100 23.754
100 23.883
100 24.021
100 24.022
100 24.305
200 20.864
200 20.211
200 19.980
200 20.251
200 20.691
300 19.140
300 18.610
300 18.793
300 19.111
300 19.027
400 18.067
400 18.274
400 18.344
400 18.378
400 18.250
500 17.791
500 17.644
500 17.814
500 18.064
500 18.403
600 17.878
600 17.892
600 18.034
600 18.125
600 18.008
700 18.153
700 18.286
700 17.342
700 21.136
700 20.729
800 19.585
800 19.116
800 17.170
800 18.549
800 18.236
900 16.950
900 17.883
900 17.532
900 17.551
900 17.804
1000 20.076
1000 20.745
1000 19.657
1000 18.733
1000 18.560
```

Sweet spot around 500. Note https://golang.org/pkg/runtime/debug/#SetGCPercent.

@@ -1,4 +1,3 @@

# Introduction

**Miller is a command-line tool for querying, shaping, and reformatting data files in various formats including CSV, TSV, and JSON.**

@@ -35,3 +34,5 @@ which allow you to succinctly express your own logic.

In the following you can see how CSV, TSV, tabular, JSON, and other **file formats** share a common theme, which is **lists of key-value-pairs**. Miller embraces this common theme.
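As an illustration of that common theme -- a minimal Python sketch, not Miller code -- the same ordered list of key-value pairs can be rendered as DKVP, CSV, or JSON:

```python
import csv
import io
import json

# One record: an ordered list of key-value pairs.
record = {"name": "alice", "age": "34", "city": "boston"}

# DKVP: delimited key-value pairs, one k=v per field.
dkvp = ",".join(f"{k}={v}" for k, v in record.items())

# CSV: the keys become the header line, the values the data line.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(record))
writer.writeheader()
writer.writerow(record)

# JSON: the keys and values map naturally onto an object.
as_json = json.dumps(record)

print(dkvp)            # name=alice,age=34,city=boston
print(buf.getvalue(), end="")
print(as_json)
```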

**The big picture:** Even well into the 21st century, our world is full of text-formatted data like CSV. Google _CSV memes_, for example. We need tooling to _thrive in this world_, nimbly manipulating data that lives in CSV files. And we need tooling to _move beyond CSV_, to be able to pull data out of and into other storage and processing systems. Miller is designed for both of these goals.

@@ -1,6 +1,6 @@

#!/bin/bash

mlr="mlr -S"
mlr="mlr"
suffix="dkvp"

iflag=""

8 todo.txt

@@ -1,20 +1,14 @@

================================================================
PUNCHDOWN LIST

* numeric-inference perf
  o README-profiling.md -- update ...
  o README-go-port.md re various scripts

* blockers:
  - fractional-strptime
  - improved regex doc w/ lots of examples
  - cmp-matrices
  - all-contribs: twi dm
    https://github.com/all-contributors/all-contributors
  - license triple-checks
  - `mlr put` -> coverart
  - 0b1011 olh/webdoc
  - big-picture note
  ? array/map fields: marshal as JSON_SINGLE_LINE

* nikos materials -> fold in