Simone Economo, Davide Cingolani, Alessandro Pellegrini and Francesco Quaglia DIAG - Sapienza University of Rome Configurable and Efficient Memory Access Tracing via Selective Expression-based x86 Binary Instrumentation


  1. Configurable and Efficient Memory Access Tracing via Selective Expression-based x86 Binary Instrumentation
     Simone Economo, Davide Cingolani, Alessandro Pellegrini and Francesco Quaglia
     DIAG - Sapienza University of Rome
     {economo,cingolani,pellegrini,quaglia}@diag.uniroma1.it

  2. Memory access tracing
     • Interception of memory accesses issued by a program
     • Off-line and on-line applications
       – Performance evaluation of architectures
         • e.g., trace-driven simulation
       – Detection of security vulnerabilities
         • e.g., buffer overflows
       – Detection of memory inefficiencies
         • e.g., memory leaks
       – Runtime optimization of programs
         • e.g., CC-NUMA systems

  3. Tracing challenges
     • Memory access tracing is interesting because
       – Intercepting all accesses may lead to excessive runtime overhead
         • e.g., profilers and debuggers
       – Intercepting only some accesses may lead to inaccurate tracing results
         • e.g., trace-driven simulation, run-time optimization
       – Users may want a trade-off between accuracy and overhead
         • e.g., "I'm willing to sacrifice some accuracy for less overhead"
       – Users may be interested in tracing accesses to bigger chunks
         • e.g., OS pages, cache lines, malloc chunks, etc.

  4. Tracing techniques
     • Hardware-based
       – Performance Monitoring Units (PMUs)
         • Tracing performed implicitly by the hardware running the program
     • Software-based
       – Kernel-level
         • Usually limited to OS-page granularity (e.g., 4KB or 2MB)
       – Library-level
         • Usually limited to very specific application domains (e.g., MPI applications)
       – Binary Code Instrumentation
         • Performed explicitly and transparently by injecting additional code in the program
         • Our approach!

  5. Our goals
     1. Instrument a subset of the accesses using a smart selection algorithm (Efficient)
        – rather than the entire stream
        – directly affects the tracing overhead
     2. Make this subset representative (Accurate)
        – directly affects the tracing accuracy
     3. Add flexibility to tracing (Configurable)
        – in terms of subset size and tracing granularity
        – should affect both overhead and accuracy

  6. Instrumentation issues
     • Memory addresses are encoded as expressions
       – linear combinations of registers and constants
       – evaluated to actual addresses at run-time
       – e.g., x86 SIB expressions (Scale-Index-Base)
         • evaluated to Base + Index * Scale + Displacement
     • Memory address expressions are subject to some issues
       – Address multiplexing
         • A single expression can encode different addresses over time
       – Address aliasing
         • Different expressions can encode the same address at the same time
       – False chunk sharing
         • Constants in expressions don't carry memory-alignment information
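To make the first two issues on this slide concrete, here is a minimal Python sketch that evaluates an SIB expression as Base + Index * Scale + Displacement. The helper name `effective_address`, the register dictionary and all register values are assumptions chosen for illustration; they do not come from the slides.

```python
def effective_address(regs, base=None, index=None, scale=1, disp=0):
    """Evaluate Base + Index * Scale + Displacement for given register values.

    `regs` maps register names to their current contents (hypothetical values).
    A missing base or index contributes zero.
    """
    b = regs[base] if base else 0
    i = regs[index] if index else 0
    return b + i * scale + disp


# Address multiplexing: one expression, 0x601120(,%rax,4), yields different
# addresses as %rax changes over time.
first = effective_address({"rax": 0}, index="rax", scale=4, disp=0x601120)
later = effective_address({"rax": 3}, index="rax", scale=4, disp=0x601120)

# Address aliasing: two different expressions, -0x4(%rbp) and (%rax),
# yield the same address for these (made-up) register contents.
regs = {"rbp": 0x7FFC0010, "rax": 0x7FFC000C}
via_rbp = effective_address(regs, base="rbp", disp=-0x4)
via_rax = effective_address(regs, base="rax")
```

With these values, `first` and `later` differ even though the expression is the same, while `via_rbp` and `via_rax` coincide even though the expressions differ. Neither fact is visible from the instruction text alone, which is why a static selection has to reason about expressions rather than addresses.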

  7. Instrumentation issues on x86/GCC/Linux
     • x86 SIB addressing (Scale-Index-Base) is complex
       – The same structure is used for addressing different types of memory
         • e.g., The base address of a static object can be specified through an immediate
             mov 0x601120(,%rax,4),%edi
         • e.g., The base address of a dynamic object must be specified through a register
             mov -0x4(%rbp),%edx
       – An address can be computed in more convoluted ways
         • e.g., A register in a SIB expression can be the result of another SIB expression
             lea 0x0(,%rax,4),%rdx
             add %rdx,%rax
             mov (%rax),%esi

  8. Our contributions
     • An abstract addressing model
       – Formalizes the structure and complexity of SIB expressions
     • A selection algorithm
       – Deals with the intrinsic issues of tracing via instrumentation
       – Satisfies the efficiency, accuracy and flexibility goals

  9. Base-Index-Displacement (BID) model
     • A BID address field is a placeholder for a value
       – either a register identifier or an immediate
     • A BID address expression is a tuple of fields <b,i,d>
       – evaluates to the address b + i + d
     • A BID template is a family of expressions
       – sharing the same type (register or immediate) for each field
     ➡ x86 SIB expressions fall into two BID templates:
       1. RRI, when the base address is a register (e.g., dynamic memory or convoluted accesses to all kinds of memory)
       2. IRR, when the base address is an immediate (e.g., static memory)
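The BID definitions can be sketched as a small data structure. In the Python sketch below, the class name `BID`, the convention that a string is a register identifier and an integer is an immediate, and the sample register values are all assumptions made for illustration. The slides do not say how each SIB component (scale in particular) is folded into the three fields, so the sketch leaves that mapping out.

```python
from typing import Dict, NamedTuple, Union

# Assumption: a field is either a register identifier (str) or an immediate (int).
Field = Union[str, int]


class BID(NamedTuple):
    b: Field
    i: Field
    d: Field

    def template(self) -> str:
        """One letter per field: R for a register, I for an immediate."""
        return "".join("R" if isinstance(f, str) else "I" for f in self)

    def evaluate(self, regs: Dict[str, int]) -> int:
        """Evaluate b + i + d, reading register fields from `regs`."""
        return sum(regs[f] if isinstance(f, str) else f for f in self)


# Hypothetical expressions, one per template named on the slide.
dynamic = BID(b="rbp", i="rax", d=-4)        # base held in a register
static = BID(b=0x601120, i="rax", d="rbx")   # base given as an immediate

regs = {"rbp": 0x1000, "rax": 8, "rbx": 0}   # made-up register contents
```

Here `dynamic.template()` is `"RRI"` and `static.template()` is `"IRR"`, and `dynamic.evaluate(regs)` computes 0x1000 + 8 - 4. Keeping the template as a derived property means two expressions can only be compared field by field when their templates match.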

  10. Selection algorithm
      • It relies on two user-defined parameters:
        1. Instrumentation factor = ω
           • Determines the percentage of traced accesses at runtime
           • Affects overhead and accuracy
        2. Chunk size = C
           • Determines the granularity of tracing
           • Partially affects accuracy
      • It elides the address multiplexing problem
        – Register values coming from multiple control-flow paths are ignored
          • The internal state is discarded at basic-block boundaries
        – Updates to the contents of registers are tracked
          • Including possible updates coming from conditional data-flow instructions

  11. Expression equality
      • Two BID expressions are equal if and only if
        – they share the same fields
        – they share the same values for each field
      • Pointer aliasing can still occur
        – because the contents of registers are unpredictable
        – ...but there are no false positives

  12. Expression representatives
      • Equal expressions form a cluster led by a representative
        – so that further analysis doesn't have to consider the whole cluster
        – its access count is the size of the cluster that it represents
      ➡ Tracing a representative means tracing the cluster
        – a single instrumentation coin buys tracing of the whole cluster
        – reduces the overhead without affecting the accuracy

  13. Expression distance
      • The distance between two representatives is
        – evaluated on a field-by-field basis
          • by comparing register identifiers against equality (e.g., rax ≠ rbx)
          • by comparing immediates against their absolute difference (e.g., |0x10 - 0x18|)
        – zero if they are likely to fall into the same C-byte chunk
        – greater if they are likely to produce more distant addresses
      • False chunk sharing is still possible
        – because only runtime addresses have memory-alignment information
        – ...but the probability of false positives decreases with increasing C's
        – ...and also with decreasing gaps between immediates

  14. Distance function for RRI expressions
      [Figure: decision tree with True/False branches on the tests b1 = b2, i1 = i2 and |d1 - d2| ≥ C; leaf distances 0, 1, 3, 4 and 5]

  15. Distance function for IRR expressions
      [Figure: decision tree with True/False branches on the tests |b1 - b2| ≥ C, i1 = i2 and d1 = d2; leaf distances 0, 1, 3, 4 and 5]

  16. Example of false chunk sharing
      [Figure: two expressions e1 and e2 whose absolute difference is less than C]
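False chunk sharing can be reproduced with two displacements off the same base register. The Python sketch below is a simplified illustration, not the distance function from the slides: `chunk_of` and `static_guess_same_chunk` are hypothetical helpers, and the two `rbp` values are made up. It only shows that a static comparison of immediates cannot see the runtime alignment of the base.

```python
C = 16  # chunk size in bytes, as in the deck's example


def chunk_of(address, size=C):
    """Index of the size-byte chunk that a runtime address falls into."""
    return address // size


def static_guess_same_chunk(d1, d2, size=C):
    """Static guess: immediates closer than one chunk probably share a chunk."""
    return abs(d1 - d2) < size


d1, d2 = -0x4, -0x8  # e.g., -0x4(%rbp) and -0x8(%rbp)

# The static guess is the same regardless of what %rbp holds at run-time.
guess = static_guess_same_chunk(d1, d2)

# With a chunk-aligned base the guess is right...
aligned_rbp = 0x1010
same_when_aligned = chunk_of(aligned_rbp + d1) == chunk_of(aligned_rbp + d2)

# ...but with this base the two accesses straddle a chunk boundary.
unaligned_rbp = 0x1006
same_when_unaligned = chunk_of(unaligned_rbp + d1) == chunk_of(unaligned_rbp + d2)
```

The guess is `True` in both cases, yet only the first base actually places both accesses in one 16-byte chunk. A larger C or a smaller gap between the immediates leaves fewer base alignments for which the guess fails.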

  17. Expression scores
      • The score of a representative is a tuple composed of
        1. Access count = how many other accesses are traced for free
        2. Average distance ≃ how well the access "samples" the address space
      ➡ The higher the score, the more valuable the access
        – tells where an instrumentation coin is best spent
        – improves the accuracy without affecting the overhead

  18. Selecting expressions
      • Reduced to a (0,1)-knapsack problem, solved iteratively
        – Items are representatives
        – Values are scores
        – Weights are all equal
        – The knapsack size is ω % of all representatives
        – Iteration i sees the residual space left by iteration i - 1
      ➡ Maximize sum of values, for all representatives, such that
        – items in the knapsack don't exceed the residual space

  19. The iterative (0,1)-knapsack
      • Base step
        – Choose representatives and compute scores
      • Iterative step
        1. Solve a residual (0,1)-knapsack instance
           1. Select the next most-valuable representative (ignoring frozen ones)
           2. Place it in the knapsack
           3. Freeze all zero-distance representatives
        2. If there is residual space in the knapsack
           1. Unfreeze all representatives
           2. Start a new iterative step
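One possible reading of the iterative step is sketched below in Python. This is an interpretation under stated assumptions, not the authors' implementation: the function name `select`, the `zero_distance` predicate and the toy scores are hypothetical, scores are reduced to a single comparable number, and the knapsack capacity is passed in directly because the slides do not say how ω % of the representatives is rounded. Since all weights are equal, each residual instance reduces to picking the highest-valued representatives that are not frozen.

```python
def select(scores, capacity, zero_distance):
    """Pick up to `capacity` representatives, highest score first.

    scores:        dict mapping a representative to a comparable score
    capacity:      knapsack size (number of representatives to instrument)
    zero_distance: predicate telling whether two representatives are at
                   distance zero (likely in the same chunk)
    """
    chosen = []
    # Each pass of the outer loop is one iterative step on the residual space.
    while len(chosen) < capacity and len(chosen) < len(scores):
        frozen = set()  # "unfreeze all representatives" at the start of a step
        candidates = sorted(
            (r for r in scores if r not in chosen), key=scores.get, reverse=True
        )
        for rep in candidates:
            if len(chosen) == capacity:
                break
            if rep in frozen:
                continue
            chosen.append(rep)  # place it in the knapsack
            # Freeze everything at distance zero from the one just chosen.
            frozen.update(
                o for o in candidates if o != rep and zero_distance(rep, o)
            )
    return chosen


# Toy input: "a" and "b" are assumed to fall into the same chunk.
toy_scores = {"a": 5, "b": 4, "c": 3, "d": 1}
toy_chunk = {"a": 0, "b": 0, "c": 1, "d": 2}


def same_chunk(x, y):
    return toy_chunk[x] == toy_chunk[y]
```

With a capacity of 2, `select(toy_scores, 2, same_chunk)` skips `"b"` because it is frozen by `"a"` and picks `"c"` instead, so the two instrumentation coins cover two different chunks. With a capacity of 4, the first step picks `"a"`, `"c"` and `"d"`, and a second step then spends the residual space on `"b"`.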

  20. Example (ω = 50%, C = 16B, n = 18, m = ?)
       1. RRI  mov -0x4(%rbp),%edx
       2. RRI  mov -0x8(%rbp),%eax
       3. RRI  mov -0x18(%rbp),%rax
       4. RRI  mov -0x4(%rbp),%edx
       5. RRI  mov -0x8(%rbp),%eax
       6. RRI  mov -0x18(%rbp),%rax
       7. RRI  mov (%rax),%esi
       8. RRI  mov -0x4(%rbp),%edx
       9. RRI  mov -0xc(%rbp),%eax
      10. IRR  mov 0x601120(,%rax,4),%edi
      11. RRI  mov -0xc(%rbp),%edx
      12. RRI  mov -0x8(%rbp),%eax
      13. IRR  mov 0x601060(,%rax,4),%eax
      14. RRI  mov -0x4(%rbp),%edx
      15. RRI  mov -0x8(%rbp),%eax
      16. RRI  mov -0x18(%rbp),%rax
      17. RRI  mov -0x4(%rbp),%edx
      18. RRI  mov (%rax),%esi

  21. Example (ω = 50%, C = 16B, n = 18, m = ?), with the access count shown next to each representative
       1. RRI  5  mov -0x4(%rbp),%edx
       2. RRI  4  mov -0x8(%rbp),%eax
       3. RRI  3  mov -0x18(%rbp),%rax
       4. RRI     mov -0x4(%rbp),%edx
       5. RRI     mov -0x8(%rbp),%eax
       6. RRI     mov -0x18(%rbp),%rax
       7. RRI  1  mov (%rax),%esi
       8. RRI     mov -0x4(%rbp),%edx
       9. RRI  2  mov -0xc(%rbp),%eax
      10. IRR  1  mov 0x601120(,%rax,4),%edi
      11. RRI     mov -0xc(%rbp),%edx
      12. RRI     mov -0x8(%rbp),%eax
      13. IRR  1  mov 0x601060(,%rax,4),%eax
      14. RRI     mov -0x4(%rbp),%edx
      15. RRI     mov -0x8(%rbp),%eax
      16. RRI     mov -0x18(%rbp),%rax
      17. RRI     mov -0x4(%rbp),%edx
      18. RRI  1  mov (%rax),%esi

  22. Example (ω = 50%, C = 16B, n = 18, m = 8)
      1. RRI  mov -0x4(%rbp),%edx          score = <5, ?>
      2. RRI  mov -0x8(%rbp),%eax          score = <4, ?>
      3. RRI  mov -0x18(%rbp),%rax         score = <3, ?>
      4. RRI  mov (%rax),%esi              score = <1, ?>
      5. RRI  mov -0xc(%rbp),%eax          score = <2, ?>
      6. IRR  mov 0x601120(,%rax,4),%edi   score = <1, ?>
      7. IRR  mov 0x601060(,%rax,4),%eax   score = <1, ?>
      8. RRI  mov (%rax),%esi              score = <1, ?>
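The access counts in slides 21 and 22 can be reproduced with a short Python sketch. It rests on one assumption that the slides only hint at ("updates to the contents of registers are tracked"): each write to a register bumps a per-register version counter, and two expressions are equal only if they name the same registers at the same versions with the same immediates. The tuple encoding, the `cluster` function and the choice to treat all 18 instructions as a single basic block are hypothetical. The instructions are transcribed by hand from slide 20, with 32-bit destinations written as their 64-bit parent register.

```python
# (base register, index register, immediate, destination register)
ACCESSES = [
    ("rbp", None, -0x4, "rdx"),
    ("rbp", None, -0x8, "rax"),
    ("rbp", None, -0x18, "rax"),
    ("rbp", None, -0x4, "rdx"),
    ("rbp", None, -0x8, "rax"),
    ("rbp", None, -0x18, "rax"),
    ("rax", None, 0x0, "rsi"),
    ("rbp", None, -0x4, "rdx"),
    ("rbp", None, -0xC, "rax"),
    (None, "rax", 0x601120, "rdi"),
    ("rbp", None, -0xC, "rdx"),
    ("rbp", None, -0x8, "rax"),
    (None, "rax", 0x601060, "rax"),
    ("rbp", None, -0x4, "rdx"),
    ("rbp", None, -0x8, "rax"),
    ("rbp", None, -0x18, "rax"),
    ("rbp", None, -0x4, "rdx"),
    ("rax", None, 0x0, "rsi"),
]


def cluster(accesses):
    """Group equal expressions; return {expression key: access count}.

    Keys appear in order of first occurrence, so the first member of each
    cluster plays the role of its representative.
    """
    version = {}   # register -> number of writes seen so far
    clusters = {}  # insertion-ordered in Python 3.7+
    for base, index, imm, dest in accesses:
        # The address is computed from register contents *before* the write.
        key = (base, version.get(base, 0), index, version.get(index, 0), imm)
        clusters[key] = clusters.get(key, 0) + 1
        version[dest] = version.get(dest, 0) + 1
    return clusters


clusters = cluster(ACCESSES)
counts = list(clusters.values())
```

Under this assumption `counts` is `[5, 4, 3, 1, 2, 1, 1, 1]` and there are 8 clusters, matching m = 8 and the first score component of each representative in slide 22. The two `mov (%rax),%esi` accesses end up in separate clusters because `%rax` is overwritten between them, which is how the approach avoids treating a multiplexed expression as a single address.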
