Format document

2026-01-23 02:14:39 +00:00 · 2019-04-12 17:47:00 +08:00 · 2019-04-12 17:47:00 +08:00 · 0752154821
commit 0752154821
parent 3f9629f4e7
3 changed files with 11 additions and 11 deletions
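The change in this commit is mechanical enough to script. Below is a minimal Python sketch. It assumes the posts keep their shell sessions in tab-indented Markdown code blocks, as the hunks below show. The sketch rewrites a leading `$ ` prompt to `# ` and drops stray spaces before the tab, as on the `perf mem report` line. The names `normalize_prompts` and `rewrite_tree` are hypothetical and are not part of the repository. This commit left the `$ perf report` context line in `an-example-of-profiling-application.md` unchanged, but the sketch would rewrite that line too.

```python
import re
from pathlib import Path

# Hypothetical pattern: a tab-indented Markdown code line that starts with a
# "$ " or "# " prompt, optionally preceded by stray spaces before the tab.
# "#include"/"#define" lines are not matched because "#" must be followed by
# a space; lines already starting with "\t# " are rewritten to themselves.
PROMPT = re.compile(r"^ *\t[$#] ")


def normalize_prompts(text: str) -> str:
    """Rewrite each matching line prefix to a single tab followed by "# "."""
    lines = text.splitlines(keepends=True)
    return "".join(PROMPT.sub("\t# ", line, count=1) for line in lines)


def rewrite_tree(root: str = "posts") -> None:
    """Apply normalize_prompts in place to every *.md file directly under root.

    The "posts" directory layout is an assumption taken from the paths in this diff.
    """
    for path in sorted(Path(root).glob("*.md")):
        text = path.read_text(encoding="utf-8")
        new_text = normalize_prompts(text)
        if new_text != text:
            path.write_text(new_text, encoding="utf-8")
```

Calling `rewrite_tree("posts")` from the repository root would apply the change in place. Review the resulting `git diff` before committing, because any command output line that happens to start with `$ ` would be rewritten as well.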
--- a/posts/an-example-of-profiling-application.md
+++ b/posts/an-example-of-profiling-application.md
@@ -2,7 +2,7 @@

 Let's see a simple example:  

-	$ cat add_vec.cpp
+	# cat add_vec.cpp
 	#include <algorithm>
 	#include <iostream>
 	#include <vector>
@@ -41,14 +41,14 @@ Let's see a simple example:

 Build and run "`perf record`" command to profile it:  

-	$ g++ add_vec.cpp -o add_vec
-	$ perf record ./add_vec
+	# g++ add_vec.cpp -o add_vec
+	# perf record ./add_vec
 	[ perf record: Woken up 1 times to write data ]
 	[ perf record: Captured and wrote 0.003 MB perf.data (27 samples) ]

 A "perf.data" file will be generated. Use "`perf report`" command to analyze it:  

-	$ perf report
+	# perf report
 	Samples: 27  of event 'cycles:uppp', Event count (approx.): 2149376
 	Overhead  Command  Shared Object     Symbol
 	  33.67%  add_vec  ld-2.28.so        [.] do_lookup_x
@@ -91,11 +91,11 @@ At least from output, `do_lookup_x` in `ld-2.28.so` is sampled mostly. Highlight

 If you want to map the assembly code to source code, try to build program with `-g` option:  

-	$ g++ add_vec.cpp -g -o add_vec
+	# g++ add_vec.cpp -g -o add_vec

 Another useful option of using "`perf record`" is `-g`, which records function call stack information.

-	$ perf record -g ./add_vec
+	# perf record -g ./add_vec
 	[ perf record: Woken up 1 times to write data ]
 	[ perf record: Captured and wrote 0.004 MB perf.data (28 samples) ]
 	$ perf report
--- a/posts/check-cache-false-sharing.md
+++ b/posts/check-cache-false-sharing.md
@@ -4,7 +4,7 @@ Generally speaking, cache false sharing is one processor modifies the data in on

 Check following code:  

-	$ cat false_share.c
+	# cat false_share.c
 	#include <omp.h>

 	#define N 100000000
@@ -33,12 +33,12 @@ The size of `sum` array is `64` bytes on my `X64` platform, and resides in one c
 	sum[i] += values[j] >> i;
 It will cause cache false sharing issue. Build and use "`perf c2c record`" command to profile it:  

-	$ gcc -fopenmp -g false_share.c -o false_share
-	$ perf c2c record ./false_share
+	# gcc -fopenmp -g false_share.c -o false_share
+	# perf c2c record ./false_share

 Use "`perf c2c report`" to analyze it, and "`HITM`" event is the central issue:  

-	$ perf c2c report --stdio
+	# perf c2c report --stdio
 	......
 	=================================================
 	            Trace Event Information
--- a/posts/profile-memory-access.md
+++ b/posts/profile-memory-access.md
@@ -3,7 +3,7 @@
 `perf mem` command can be used to profile memory access. I.e, `perf mem record` samples while `perf mem report` shows the results. By default, `perf mem record` will count both load and store operations, and `-t` option can be used to specify one of them (e.g, `-t load`). Check following example:  

 	# perf mem record ./stream
-  	# perf mem report --stdio
+	# perf mem report --stdio
 	# To display the perf.data header info, please use --header/--header-only options.
 	#
 	#