Mirror of https://github.com/NanXiao/perf-little-book.git, synced 2026-01-23 02:14:39 +00:00

Format document

parent 3f9629f4e7
commit 0752154821

3 changed files with 11 additions and 11 deletions
@@ -2,7 +2,7 @@
 Let's see a simple example:

-$ cat add_vec.cpp
+# cat add_vec.cpp
 #include <algorithm>
 #include <iostream>
 #include <vector>
@@ -41,14 +41,14 @@ Let's see a simple example:
 Build and run "`perf record`" command to profile it:

-$ g++ add_vec.cpp -o add_vec
-$ perf record ./add_vec
+# g++ add_vec.cpp -o add_vec
+# perf record ./add_vec
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.003 MB perf.data (27 samples) ]

 A "perf.data" file will be generated. Use "`perf report`" command to analyze it:

-$ perf report
+# perf report
 Samples: 27 of event 'cycles:uppp', Event count (approx.): 2149376
 Overhead Command Shared Object Symbol
 33.67% add_vec ld-2.28.so [.] do_lookup_x
@@ -91,11 +91,11 @@ At least from output, `do_lookup_x` in `ld-2.28.so` is sampled mostly. Highlight
 If you want to map the assembly code to source code, try to build program with `-g` option:

-$ g++ add_vec.cpp -g -o add_vec
+# g++ add_vec.cpp -g -o add_vec

 Another useful option of using "`perf record`" is `-g`, which records function call stack information.

-$ perf record -g ./add_vec
+# perf record -g ./add_vec
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.004 MB perf.data (28 samples) ]
 $ perf report
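The hunk above notes that `perf record -g` records call-stack information, but add_vec.cpp itself is only partly visible in this diff. As a minimal sketch for trying the option (hypothetical C code, not the book's program), `run_add_vec` and `add_vec_kernel` below form a two-level call chain. The `noinline` attribute is a GCC/Clang extension, assumed here so that each function keeps its own stack frame and the chain stays visible in the report.

```c
#include <stddef.h>
#include <stdlib.h>

/* Innermost frame: element-wise addition out[i] = a[i] + b[i].
   noinline (GCC/Clang extension) keeps it a separate stack frame. */
__attribute__((noinline))
static void add_vec_kernel(const int *a, const int *b, int *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Outer frame (hypothetical driver): fills a[i] = i and b[i] = 2 * i,
   calls the kernel `rounds` times and returns the sum of out[] so the
   work is observable. n must be > 0; returns -1 if allocation fails. */
__attribute__((noinline))
long long run_add_vec(size_t n, int rounds)
{
    int *a = malloc(n * sizeof *a);
    int *b = malloc(n * sizeof *b);
    int *out = calloc(n, sizeof *out); /* zeroed, so rounds == 0 sums to 0 */
    long long checksum = -1;

    if (a != NULL && b != NULL && out != NULL) {
        for (size_t i = 0; i < n; i++) {
            a[i] = (int)i;
            b[i] = (int)(2 * i);
        }
        for (int r = 0; r < rounds; r++)
            add_vec_kernel(a, b, out, n);

        checksum = 0;
        for (size_t i = 0; i < n; i++)
            checksum += out[i];
    }
    free(a);
    free(b);
    free(out);
    return checksum;
}
```

Calling `run_add_vec(10000000, 50)` from a small `main`, building with `gcc -g -O2 -fno-omit-frame-pointer`, and running it under `perf record -g` would be expected to put most samples in `add_vec_kernel` beneath `run_add_vec`. Frame pointers are kept because `-g` defaults to frame-pointer unwinding; `perf record --call-graph dwarf` is the alternative when they are omitted.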
@@ -4,7 +4,7 @@ Generally speaking, cache false sharing is one processor modifies the data in on
 Check following code:

-$ cat false_share.c
+# cat false_share.c
 #include <omp.h>

 #define N 100000000
@@ -33,12 +33,12 @@ The size of `sum` array is `64` bytes on my `X64` platform, and resides in one c
 sum[i] += values[j] >> i;
 It will cause cache false sharing issue. Build and use "`perf c2c record`" command to profile it:

-$ gcc -fopenmp -g false_share.c -o false_share
-$ perf c2c record ./false_share
+# gcc -fopenmp -g false_share.c -o false_share
+# perf c2c record ./false_share

 Use "`perf c2c report`" to analyze it, and "`HITM`" event is the central issue:

-$ perf c2c report --stdio
+# perf c2c report --stdio
 ......
 =================================================
 Trace Event Information
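The usual fix for this kind of false sharing, giving each per-thread counter its own cache line, can be sketched in C. This is not the book's false_share.c: the 64-byte line size (`CACHE_LINE`), the counter count (`NSUMS`) and every name below are assumptions for illustration. `packed_sums` mirrors the problem layout (eight `long` counters in one line), while `padded_sums` uses C11 `_Alignas` so each counter fills a whole line.

```c
#include <stddef.h>

#define CACHE_LINE 64 /* assumed cache-line size (typical of x86-64) */
#define NSUMS 8       /* hypothetical number of per-thread counters */

/* Problem layout: 8 x 8-byte counters share one 64-byte line, so threads
   updating neighbouring elements keep invalidating each other's copy. */
long packed_sums[NSUMS];

/* Fix: _Alignas on the member makes the struct 64-byte aligned, and its
   size is rounded up to a multiple of that, so each counter owns a line. */
struct padded_counter {
    _Alignas(CACHE_LINE) long value;
};
_Static_assert(sizeof(struct padded_counter) == CACHE_LINE,
               "each padded counter must fill exactly one cache line");

struct padded_counter padded_sums[NSUMS];

/* Same shape as the false_share.c loop: counter i accumulates values[j] >> i.
   With -fopenmp the i-iterations run on different threads; without it the
   pragma is ignored and the loop runs serially. At -O0 every += goes to
   memory; an optimizing build may keep the sum in a register instead. */
void accumulate_packed(const int *values, size_t n)
{
#pragma omp parallel for
    for (int i = 0; i < NSUMS; i++) {
        packed_sums[i] = 0;
        for (size_t j = 0; j < n; j++)
            packed_sums[i] += values[j] >> i;
    }
}

void accumulate_padded(const int *values, size_t n)
{
#pragma omp parallel for
    for (int i = 0; i < NSUMS; i++) {
        padded_sums[i].value = 0;
        for (size_t j = 0; j < n; j++)
            padded_sums[i].value += values[j] >> i;
    }
}
```

Profiling a driver that calls each function on a large input with `perf c2c record` (built with `-fopenmp -g`, as above) would be expected to show HITM activity on the line holding `packed_sums` and little for `padded_sums`; the padded layout pays 64 bytes per counter for that.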
@@ -3,7 +3,7 @@
 `perf mem` command can be used to profile memory access. I.e, `perf mem record` samples while `perf mem report` shows the results. By default, `perf mem record` will count both load and store operations, and `-t` option can be used to specify one of them (e.g, `-t load`). Check following example:

 # perf mem record ./stream
-# perf mem report --stdio
+# perf mem report --stdio
 # To display the perf.data header info, please use --header/--header-only options.
 #
 #
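The `./stream` binary profiled above is not shown in this diff. A minimal C sketch of a comparable workload, assuming a STREAM-style "triad" (the names `triad` and `run_triad` are hypothetical), makes the load/store mix that `perf mem record` samples explicit: each element costs two loads and one store, so the `-t load` option mentioned above would restrict sampling to the two loads.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical STREAM-style "triad" kernel: a[i] = b[i] + scalar * c[i].
   Each iteration issues two loads (b[i], c[i]) and one store (a[i]). */
void triad(double *a, const double *b, const double *c, double scalar, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + scalar * c[i];
}

/* Hypothetical driver: allocates three arrays of n doubles (pick n large
   enough to exceed the caches when profiling), runs the kernel once and
   returns the sum of a[]. Returns -1.0 if allocation fails. */
double run_triad(size_t n)
{
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    double *c = malloc(n * sizeof *c);
    double total = -1.0;

    if (a != NULL && b != NULL && c != NULL) {
        for (size_t i = 0; i < n; i++) {
            b[i] = 1.0;
            c[i] = 2.0;
        }
        triad(a, b, c, 3.0, n); /* every a[i] becomes 1.0 + 3.0 * 2.0 = 7.0 */

        total = 0.0;
        for (size_t i = 0; i < n; i++)
            total += a[i];
    }
    free(a);
    free(b);
    free(c);
    return total;
}
```

Calling `run_triad` with a few tens of millions of elements from a `main`, building with `-g`, and running it under `perf mem record` is one way to produce load and store samples for `perf mem report` to break down.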