Format document

Nan Xiao 2019-04-12 17:47:00 +08:00
parent 3f9629f4e7
commit 0752154821
3 changed files with 11 additions and 11 deletions

@@ -2,7 +2,7 @@
Let's see a simple example:
$ cat add_vec.cpp
# cat add_vec.cpp
#include <algorithm>
#include <iostream>
#include <vector>
@@ -41,14 +41,14 @@ Let's see a simple example:
Build the program and run the "`perf record`" command to profile it:
$ g++ add_vec.cpp -o add_vec
$ perf record ./add_vec
# g++ add_vec.cpp -o add_vec
# perf record ./add_vec
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.003 MB perf.data (27 samples) ]
A "perf.data" file will be generated. Use the "`perf report`" command to analyze it:
$ perf report
# perf report
Samples: 27 of event 'cycles:uppp', Event count (approx.): 2149376
Overhead Command Shared Object Symbol
33.67% add_vec ld-2.28.so [.] do_lookup_x
@@ -91,11 +91,11 @@ At least from output, `do_lookup_x` in `ld-2.28.so` is sampled mostly. Highlight
If you want to map the assembly code to source code, build the program with the `-g` option:
$ g++ add_vec.cpp -g -o add_vec
# g++ add_vec.cpp -g -o add_vec
Another useful option of "`perf record`" is `-g`, which records function call stack information.
$ perf record -g ./add_vec
# perf record -g ./add_vec
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.004 MB perf.data (28 samples) ]
$ perf report

@@ -4,7 +4,7 @@ Generally speaking, cache false sharing is one processor modifies the data in on
Check the following code:
$ cat false_share.c
# cat false_share.c
#include <omp.h>
#define N 100000000
@@ -33,12 +33,12 @@ The size of `sum` array is `64` bytes on my `X64` platform, and resides in one c
sum[i] += values[j] >> i;
This causes a cache false sharing issue. Build the program and use the "`perf c2c record`" command to profile it:
$ gcc -fopenmp -g false_share.c -o false_share
$ perf c2c record ./false_share
# gcc -fopenmp -g false_share.c -o false_share
# perf c2c record ./false_share
Use "`perf c2c report`" to analyze it; the "`HITM`" event is the central issue:
$ perf c2c report --stdio
# perf c2c report --stdio
......
=================================================
Trace Event Information

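The false sharing described for the `sum` array comes from several per-thread accumulators living in the same cache line. A common way to avoid it is to pad each accumulator so that it occupies a line of its own. The sketch below only illustrates the memory layout; it is not taken from `false_share.c`. The names `packed_slot`, `padded_slot`, `lines_spanned`, and `NTHREADS`, as well as the 64-byte line size, are assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumption: 64-byte cache lines */
#define NTHREADS 8      /* hypothetical number of worker threads */

/* Packed layout: neighbouring accumulators sit next to each other,
 * so several of them end up in the same cache line. */
typedef struct {
    long value;
} packed_slot;

/* Padded layout: the alignment request makes every slot start on a
 * cache-line boundary and rounds its size up to a full line. */
typedef struct {
    _Alignas(CACHE_LINE) long value;
} padded_slot;

static_assert(sizeof(padded_slot) == CACHE_LINE,
              "each padded slot should fill exactly one cache line");

/* Both arrays start on a cache-line boundary. */
_Alignas(CACHE_LINE) packed_slot packed[NTHREADS];
padded_slot padded[NTHREADS];

/* Number of cache lines touched by the `bytes` bytes starting at `base`. */
size_t lines_spanned(const void *base, size_t bytes)
{
    uintptr_t first = (uintptr_t)base / CACHE_LINE;
    uintptr_t last = ((uintptr_t)base + bytes - 1) / CACHE_LINE;
    return (size_t)(last - first + 1);
}
```

With this layout, thread `i` would accumulate into `padded[i].value` instead of a shared packed array. Under the stated assumptions, `lines_spanned(packed, sizeof packed)` reports a single line shared by all slots, while `lines_spanned(padded, sizeof padded)` reports one line per slot. Whether this actually reduces the `HITM` counts on a given machine should be checked by rerunning "`perf c2c record`" and "`perf c2c report`".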
@@ -3,7 +3,7 @@
The `perf mem` command can be used to profile memory access: `perf mem record` collects the samples, while `perf mem report` shows the results. By default, `perf mem record` samples both load and store operations, and the `-t` option can be used to select one of them (e.g., `-t load`). Check the following example:
# perf mem record ./stream
# perf mem report --stdio
# perf mem report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#