
An Analysis of Call-site Patching Without Strong Hardware Support



  1. An Analysis of Call-site Patching Without Strong Hardware Support for Self-Modifying-Code
     Tim Hartley, Foivos Zakkak, Christos Kotselidis, Mikel Lujan
     first.last@manchester.ac.uk
     MPLR’19, 2019-10-22

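  2. Call-Sites
     Direct branching: Method A executes call/jmp <offset>, which transfers control straight to Method B or Method C.
     Indirect branching: Method A first loads the target from memory (ld target, 0xabcd) and then executes call/jmp target to reach Method B or Method C.
     2019-10-22 MPLR’19 @foivoszakkak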

  3. Call-Site Patching
     § Tiered compilation
     § De-optimization
     § Etc.

  4. JIT compilation and Caches
     Code-stream vs Data-stream (diagram: Main Memory, I-Cache, D-Cache, CPU)
     1. Code gets fetched to I-Cache
     2. Data get fetched to D-Cache
     3. CPU executes code from I-Cache
     4. CPU writes data to D-Cache
     5. D-Cache writes-back to memory
     6. D-Cache fetches code to be edited
     7. CPU writes code to D-Cache
     8. D-Cache writes-back code
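The eight steps above can be made concrete with a toy model. The sketch below is only an illustration under strong assumptions: the class `ToySplitCache`, its method names, and the instruction strings are hypothetical, caches are modelled as plain dictionaries, and no real hardware behaviour (line sizes, coherence protocols, barriers) is represented. It shows why code written through the data stream stays invisible to the code stream until the D-cache line is written back and the I-cache line is discarded.

```python
class ToySplitCache:
    """Toy model of one core with separate I- and D-caches over shared memory."""

    def __init__(self, memory):
        self.memory = dict(memory)   # address -> instruction word (as text)
        self.icache = {}             # lines fetched through the code stream
        self.dcache = {}             # dirty lines written through the data stream

    def fetch(self, addr):
        """Instruction fetch: served from the I-cache, filled from memory on a miss."""
        if addr not in self.icache:
            self.icache[addr] = self.memory[addr]
        return self.icache[addr]

    def store(self, addr, value):
        """Data store: lands in the D-cache only; memory changes on write-back."""
        self.dcache[addr] = value

    def clean_dcache(self, addr):
        """Write a dirty D-cache line back to main memory."""
        if addr in self.dcache:
            self.memory[addr] = self.dcache.pop(addr)

    def invalidate_icache(self, addr):
        """Drop the I-cache line so the next fetch re-reads main memory."""
        self.icache.pop(addr, None)


core = ToySplitCache({0x1000: "BL old_target"})
before = core.fetch(0x1000)            # old code is now cached in the I-cache
core.store(0x1000, "BL new_target")    # steps 6-7: patch through the D-cache
stale = core.fetch(0x1000)             # the I-cache still holds the old instruction
core.clean_dcache(0x1000)              # step 8: write the patched code back
still_stale = core.fetch(0x1000)       # memory is new, the I-cache line is not
core.invalidate_icache(0x1000)
fresh = core.fetch(0x1000)             # refetched from main memory
```

In this model the patch only becomes visible to instruction fetch after both the write-back and the invalidation, which is the synchronisation a JIT has to request explicitly on architectures that keep the two streams separate.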

  5. Low-power architectures and call-site patching
     § Fixed-size instructions
       – Limit the range of direct branches/calls
         • ±128MiB on AArch64
         • ±1MiB on RISC-V
       – Require multiple instructions to perform long-range calls
     Figure labels: AArch64 128MiB, x86-64 240MiB
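The range limits above follow from the immediate fields of the branch instructions: AArch64 B/BL carry a signed 26-bit word offset, and RISC-V JAL carries a signed 21-bit byte offset whose lowest bit is always zero. A minimal sketch of the resulting reachability checks follows; the function names are hypothetical and chosen only for this illustration.

```python
MIB = 1024 * 1024

def fits_signed(value, bits):
    """True if value is representable as a two's-complement number of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))

def aarch64_direct_reachable(branch_addr, target_addr):
    """B/BL encode a signed 26-bit word offset (byte offset divided by 4)."""
    delta = target_addr - branch_addr
    return delta % 4 == 0 and fits_signed(delta // 4, 26)

def riscv_jal_reachable(branch_addr, target_addr):
    """JAL encodes a signed 21-bit byte offset; bit 0 is implicitly zero."""
    delta = target_addr - branch_addr
    return delta % 2 == 0 and fits_signed(delta, 21)

def encode_aarch64_bl(branch_addr, target_addr):
    """Encode BL: opcode bits 100101 followed by the 26-bit immediate."""
    if not aarch64_direct_reachable(branch_addr, target_addr):
        raise ValueError("target out of direct-branch range")
    imm26 = ((target_addr - branch_addr) // 4) & 0x3FFFFFF
    return 0x94000000 | imm26
```

A call site whose callee fails the reachability check has to fall back to one of the multi-instruction sequences, which is what makes patching it harder.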

  6. Low-power architectures and call-site patching (cont.)
     § Weak memory models and self-modifying-code (SMC) support
       – SW explicitly issues memory barriers
       – Code-stream handled separately from data-stream (need to sync them)
     § Not all instructions are safe to patch
       – ARM (armv7 and armv8) and IBM (Power) limit the instructions that are safe to be patched while executing
         • Even if using atomic writes

  7. Patchable call-site implementations in AArch64

     Direct Branching (short-range only)
         B TARGET

     Relative-Load Indirect Branching
         CALLEE_1: .quad 0x0123456789ABCDEF
         ...
         CALLEE_N: .quad 0x01234ABCDEF56789
         START:    ...
                   LDR X16, CALLEE_1
                   BLR X16

     Absolute-Load Indirect Branching
         MOVZ X16, #0xABCD            ; Craft the address
         MOVK X16, #0xEF89, lsl #16   ; holding
         MOVK X16, #0x7654, lsl #32   ; the
         MOVK X16, #0x0213, lsl #48   ; target
         LDR  X16, [X16]
         BLR  X16

     Trampolines (OpenJDK approach)
         L:      LDR X16, CALLEE
                 BR X16               ; Don't link
         CALLEE: .quad 0x0123456789ABCDEF
         START:  ...
                 BL SHORT_TARGET      ; or L
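In the Absolute-Load variant, the MOVZ/MOVK sequence builds a 64-bit address 16 bits at a time, lowest chunk first; that address names the memory slot from which the target is then loaded. The sketch below shows how such a sequence could be derived from an address. The helper name `movz_movk_sequence` is hypothetical, and the output is assembly text for illustration rather than encoded machine instructions.

```python
def movz_movk_sequence(address, reg="X16"):
    """Split a 64-bit address into four 16-bit chunks, lowest first,
    and render the MOVZ/MOVK sequence that rebuilds it in `reg`."""
    if not 0 <= address < 1 << 64:
        raise ValueError("address must fit in 64 bits")
    chunks = [(address >> shift) & 0xFFFF for shift in (0, 16, 32, 48)]
    # MOVZ zeroes the register and sets bits 0-15.
    lines = [f"MOVZ {reg}, #0x{chunks[0]:04X}"]
    # Each MOVK overwrites one further 16-bit field and keeps the rest.
    for i, chunk in enumerate(chunks[1:], start=1):
        lines.append(f"MOVK {reg}, #0x{chunk:04X}, lsl #{16 * i}")
    return chunks, lines


chunks, lines = movz_movk_sequence(0x02137654EF89ABCD)
```

With the address 0x02137654EF89ABCD this reproduces the four immediates shown on the slide. Changing the address means rewriting up to four instructions, which is one reason this variant is awkward to patch.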

  8. Comparison of call-site implementation approaches

  9. Evaluation Setup
     § Odroid-C2
       – Quad-core Cortex-A53 @ 1.54GHz (pinned)
         • 8-stage pipelined processor with 2-way superscalar, in-order pipeline
       – 2 GB DDR3 RAM
       – Ubuntu 18.04.02 LTS
       – Kernel: Odroid 3.16.68-41
       – GCC 8.3.0
       – MaxineVM 2.8.0
       – OpenJDK 8u212

  10. Microbenchmark
     § Generates inline call-sites
     § Callers are ret-only methods
     § To patch we call a patcher method instead of a ret-only
     § Patcher always patches the next call-site (allows us to control number of patches)
     § Patcher performs the necessary barriers as it would in a real system

  11. Microbenchmark results

  12. DaCapo and MaxineVM
     § We take the two best-performing approaches (Direct and Relative-Load Indirect) and evaluate them with DaCapo using MaxineVM
     § We had to tweak Relative-Load Indirect to make it work with MaxineVM
       – Due to its metacircular nature, MaxineVM can only operate with offsets (relative branches), since at boot image creation the absolute targets are not known yet

     Indirect-Maxine
                 ADR X17, CALL        ; Get address of BLR
                 LDR X16, OFFSET      ; Load offset
                 ADD X16, X16, X17    ; Add them
                 B #8                 ; Jump over inline offset
         OFFSET: .int CALLEE_1 - CALL
         CALL:   BLR X16
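The Indirect-Maxine sequence computes its target as the address of the BLR instruction plus the inline word, so for the ADD to produce the callee address that word has to hold the callee address minus the address of CALL. A small sketch of this arithmetic follows; the function names and example addresses are hypothetical, and the 32-bit limit is an assumption taken from the `.int` directive on the slide.

```python
MASK64 = (1 << 64) - 1

def inline_offset(call_addr, callee_addr):
    """Value for the inline word: distance from the BLR instruction to the callee."""
    offset = callee_addr - call_addr
    # Assumption: the inline word is 32 bits wide, as `.int` suggests.
    if not -(1 << 31) <= offset < (1 << 31):
        raise ValueError("offset does not fit in a 32-bit inline word")
    return offset

def resolve_target(call_addr, offset):
    """What ADR + LDR + ADD compute: the address of CALL plus the loaded offset."""
    return (call_addr + offset) & MASK64


call_addr = 0x40001008                  # hypothetical address of the BLR
backward = inline_offset(call_addr, 0x40000000)
forward = inline_offset(call_addr, 0x48000000)
```

Because only position-relative values appear in the sequence, it stays valid wherever the boot image is eventually loaded, which matches the constraint described for MaxineVM.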

  13. Indirect-Maxine in Microbenchmark results

  14. DaCapo Results

  15. Conclusions
     § OpenJDK’s method seems the best for AArch64 since it penalizes only long-range branches and avoids explicit instruction cache invalidations on callers.
       – If you have a higher (#long-range calls / #short-range calls) ratio then maybe Relative-Load is better
     § The most promising approach in theory would be combining the following gadgets

         Indirect (long-range)
             ADRP X16, CALLEE
             ADD  X16, X16, :lo12:CALLEE
             BLR  X16

         Direct (short-range only)
             B TARGET

       – On AArch64 this is not possible though since ADRP and ADD cannot be safely overwritten if they are being executed concurrently with the modifications.
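The ADRP/ADD gadget addresses the callee in two parts: ADRP produces the 4 KiB page of the target relative to the page of the current instruction, using a signed 21-bit page count, and ADD supplies the low 12 bits (`:lo12:`). The sketch below shows only that address arithmetic; the function names and example addresses are hypothetical, and it says nothing about whether the sequence can be patched safely.

```python
PAGE = 1 << 12

def adrp_add_operands(pc, target):
    """Return (page_delta, lo12) such that
    (pc & ~0xFFF) + page_delta * 4096 + lo12 == target."""
    page_delta = (target >> 12) - (pc >> 12)
    # ADRP holds a signed 21-bit page count, i.e. roughly +/-4 GiB.
    if not -(1 << 20) <= page_delta < (1 << 20):
        raise ValueError("target outside the ADRP range")
    lo12 = target & (PAGE - 1)
    return page_delta, lo12

def adrp_add_result(pc, page_delta, lo12):
    """Emulate ADRP X16, <page> followed by ADD X16, X16, #lo12."""
    x16 = (pc & ~(PAGE - 1)) + page_delta * PAGE   # ADRP
    return x16 + lo12                              # ADD


pc = 0x00007F0012345678                 # hypothetical address of the ADRP
target = 0x00007F009ABCDEF0             # hypothetical callee address
page_delta, lo12 = adrp_add_operands(pc, target)
```

Retargeting such a call site generally changes both immediates, so two instructions would have to be rewritten consistently while other cores may be executing them.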
