Memoro
Thierry Treyer
Performance & Capacity Intern
Mark Santaniello
Performance & Capacity Engineer
James Larus
EPFL IC School Dean
Scaling an LLVM-Based Heap Profiler
vector<BigT> getValues(
    map<Id, BigT>& largeMap,
    vector<Id>& keys) {
  vector<BigT> values;
  values.reserve(largeMap.size());
  for (const auto& key: keys)
    values.emplace_back(largeMap[key]);
  return values;
}
vector<BigT> getValues(
    map<Id, BigT>& largeMap,
    vector<Id>& keys) {
  vector<BigT> values;
  values.reserve(keys.size());
  for (const auto& key: keys)
    values.emplace_back(largeMap[key]);
  return values;
}
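The two versions of getValues above differ only in the argument passed to reserve. The sketch below puts both side by side so the effect on capacity can be observed. Id, BigT and the name getValuesOverReserved are stand-ins chosen for illustration; the slides do not define them.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-ins for the types named on the slide.
using Id = int;
struct BigT { char payload[256]; };

// First version from the slides: reserves room for every map entry,
// even when only a handful of keys are requested.
std::vector<BigT> getValuesOverReserved(std::map<Id, BigT>& largeMap,
                                        std::vector<Id>& keys) {
  std::vector<BigT> values;
  values.reserve(largeMap.size());
  for (const auto& key : keys)
    values.emplace_back(largeMap[key]);
  return values;
}

// Second version from the slides: reserves only what will be filled.
std::vector<BigT> getValues(std::map<Id, BigT>& largeMap,
                            std::vector<Id>& keys) {
  std::vector<BigT> values;
  values.reserve(keys.size());
  for (const auto& key : keys)
    values.emplace_back(largeMap[key]);
  return values;
}
```

Both functions return the same elements. With a large map and a short key list, the first result holds a capacity of at least largeMap.size() elements while using only keys.size() of them, which is the kind of unused heap memory a heap profiler is meant to expose.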
LLVM-Based Profiler
Manipulate the IR
Infrastructure
Collecting and Displaying data
Memoro +
Run-Time Overhead
Visualizer
Open Challenges
Overview
Source Code: No modification
Compile, INSTRUMENTATION PASS (LLVM): Instrument loads/stores; Instrument intrinsics; Collect types
Run, RUN-TIME (COMPILER-RT): Intercept alloc/free; Intercept loads/stores; Intercept syscalls; Collect stats
Analyze, VISUALIZER (ELECTRON): Score AP; Guide exploration
Run-Time Overhead
slowdown due to Memoro's run-time
Run-Time Sampling
// Per-thread counter of intercepted accesses
THREADLOCAL int sample_count = 0;

void interceptLoadStore(…) {
  // Sample accesses
  if (sample_count++ % access_sampling_rate != 0)
    return;
  /* Process access... */
}
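The sampling check on this slide can be exercised in isolation. The following is a minimal sketch, assuming a plain global for the rate and a hypothetical processed_count that stands in for the real "process access" work; neither is taken from Memoro's sources.

```cpp
#include <cstdint>

// Assumption: the rate is a plain global here; in the slides it is a flag
// that the user can change.
int access_sampling_rate = 4;

// One counter per thread, so threads do not share or contend on it.
thread_local int sample_count = 0;

// Hypothetical stand-in for the real per-access processing.
thread_local std::uint64_t processed_count = 0;

void interceptLoadStore(const void* addr) {
  (void)addr;  // the address is unused in this sketch
  // Sample accesses: only every access_sampling_rate-th call gets through.
  if (sample_count++ % access_sampling_rate != 0)
    return;
  ++processed_count;
}
```

With a rate of 4, one call in four reaches the processing step; raising the rate trades profile detail for lower overhead.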
Power to the user!
MEMORO_OPTIONS="…" ./myapp
// Public API: memoro_interface.h
#include <memoro_interface.h>

void foo(…) {
  MemoroFlags *mflags = memoro::getFlags();
  mflags->access_sampling_rate = 50;
  /* ... */
}
Time spent by address type
[Chart, 0% to 100%. Categories: Primary Heap, Secondary Heap, Not Heap (Stack). Highlighted value: 99%]
The Allocators
ld 0x…
Metadata: Addr, Size, First Access Time, Access Range Low, … 🔓
Issue with non-heap addresses
[Chart, 0% to 100%. Categories: Primary Heap, Secondary Heap, Stack]
Run-Time Filter
0xABCD 0xAAAA 0xAABB 0x1234
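The slide does not spell out how the filter decides which addresses to drop. One cheap possibility, assumed here purely for illustration, is a range check against the heap's address span that runs before any metadata lookup. HeapRange and mayBeHeap are hypothetical names, and the range bounds are made up to match the example addresses on the slide.

```cpp
#include <cstdint>

// Hypothetical half-open address span [begin, end) covered by the heap.
struct HeapRange {
  std::uintptr_t begin;
  std::uintptr_t end;
};

// Returns true when the address could belong to the heap and therefore
// deserves the more expensive metadata lookup.
bool mayBeHeap(const HeapRange& range, std::uintptr_t addr) {
  return addr >= range.begin && addr < range.end;
}
```

With a range of 0xA000 to 0xB000, the first three addresses on the slide pass the check and 0x1234 is rejected without touching any allocator metadata.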
Time spent by address type
[Chart, 0% to 100%. Categories: Primary Heap, Secondary Heap, Not heap, Stack, Filtered. Labeled values: 0.58%, 0.58%, 99%, <2%]
slowdown due to Memoro's run-time
Visualizer
Stack Traces +
1B
Allocations
Truncate
[Chart. Axis labels: Score, Bin Size. Ticks: 0% to 40%; 100, 300, 1k, 3k, 10k, 30k, 100k. Annotation: HIDE]
Death by a thousand cuts
foo() VS. bar()
main() . . . main() . . .
main() main()
bar() . . . foo() . . .
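One way to read the two layouts above is as the same stack traces grouped from opposite ends: once starting at main(), once starting at the innermost function. Grouping by the innermost frame lets many small contributions that arrive through different callers add up in one place. The sketch below shows that grouping under this assumption; Sample and bytesByLeaf are hypothetical names, not Memoro's data model.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical record: one allocation with its call stack.
// frames.front() is the outermost caller, frames.back() the allocating function.
struct Sample {
  std::vector<std::string> frames;
  std::size_t bytes;
};

// Sum allocated bytes per innermost frame, regardless of the path taken.
std::map<std::string, std::size_t> bytesByLeaf(const std::vector<Sample>& samples) {
  std::map<std::string, std::size_t> totals;
  for (const auto& sample : samples) {
    if (!sample.frames.empty())
      totals[sample.frames.back()] += sample.bytes;
  }
  return totals;
}
```

Each individual trace ending in foo() may look negligible when the view starts at main(); the per-leaf totals make their combined weight visible.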
Memoro +
vector<BigT> getValues(
    map<Id, BigT>& largeMap,
    vector<Id>& keys) {
  vector<BigT> values;
  values.reserve(largeMap.size());
  for (const auto& key: keys)
    values.emplace_back(largeMap[key]);
  return values;
}
Demo
Open Challenges
Dumping Profile
Your regular service
  AtExit()
Facebook service
  lldb: call AtExit()
  a. Signal to dump (SIGPROF)
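The "signal to dump" option can be sketched as a handler that only records the request, leaving the actual dump to normal code. This is an assumption about how such a mechanism might be structured, not Memoro's implementation; installDumpHandler, dumpIfRequested and the placeholder output are hypothetical.

```cpp
#include <csignal>
#include <cstdio>

// Flag written by the handler; sig_atomic_t is the type the standard
// allows a signal handler to write safely.
static volatile std::sig_atomic_t dump_requested = 0;

// The handler does nothing but record the request.
static void onDumpSignal(int) { dump_requested = 1; }

// Register the handler for SIGPROF, the signal named on the slide.
void installDumpHandler() { std::signal(SIGPROF, onDumpSignal); }

// Called from regular code (for example a periodic check): writes a
// placeholder profile if a dump was requested since the last call.
bool dumpIfRequested(std::FILE* out) {
  if (!dump_requested)
    return false;
  dump_requested = 0;
  std::fputs("profile dump placeholder\n", out);
  return true;
}
```

A long-running service that never reaches its exit handlers could then be asked for a profile with kill -PROF <pid>, without attaching a debugger.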
Compile-Time Stack Analysis
llvm::GetUnderlyingObject()
[Chart. Labels: Ratio; Instrumented load/store (ticks 22500, 45000, 67500, 90000); GetUnderlyingObject(depth = X) for X = 1, 2, 4, 8]
[Diagram labels: ld/st, bar(), foo()]
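The idea of tracing a load or store back to the object it addresses, with a bounded number of steps, can be shown with a toy model. This is not LLVM's API: Node, Kind, underlyingObject and canSkipInstrumentation are hypothetical, and the depth parameter is assumed to play the role of the "depth = X" on the chart.

```cpp
#include <vector>

// Toy pointer provenance: a stack slot, a heap allocation call, or a
// pointer derived from another pointer (cast or offset).
enum class Kind { Alloca, HeapCall, Derived };

struct Node {
  Kind kind;
  int base;  // index of the source pointer for Derived, -1 otherwise
};

// Follow at most maxLookup derivation steps starting from `start`.
int underlyingObject(const std::vector<Node>& nodes, int start, unsigned maxLookup) {
  int current = start;
  for (unsigned step = 0; step < maxLookup && nodes[current].kind == Kind::Derived; ++step)
    current = nodes[current].base;
  return current;
}

// An access is provably on the stack only if the walk reaches an Alloca.
bool canSkipInstrumentation(const std::vector<Node>& nodes, int pointer, unsigned maxLookup) {
  return nodes[underlyingObject(nodes, pointer, maxLookup)].kind == Kind::Alloca;
}
```

A larger depth lets the walk see through longer derivation chains, so more accesses can be proven to target the stack and left uninstrumented, at the cost of more compile-time work.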
Thank you!
github.com/epfl-vlsc/memoro