Generating Low-Overhead Dynamic Binary Translators - Mathias Payer - PowerPoint PPT Presentation




Generating Low-Overhead Dynamic Binary Translators

Mathias Payer and Thomas R. Gross, Department of Computer Science, ETH Zürich


ETH Zürich / LST / Mathias Payer, 2010-05-26

Motivation

  • Binary Translation (BT) is a well-known technique for "late" transformations
  • Extend or add features on the fly
  • The flexibility of dynamic software BT incurs runtime overhead
  • The complexity of transformations can be a challenge
  • Offer a high-level interface at compile time, compiled into efficient translation tables


Outline

  • Introduction
  • Design and Implementation
  • Table generation
  • Translator
  • Optimization
  • Conclusion

Binary Translation in a Nutshell

Static translation

[Figure: blocks 1-4 of the original program are statically translated into blocks 0'-4' of the instrumented program.]

What about:

  • Self-modifying code?
  • Shared libraries?
  • Obfuscated code?


Binary Translation in a Nutshell

Dynamic translation

[Figure: blocks 1-4 of the original program are translated on demand into blocks 0'-3' of the instrumented program.]

Features:

  • Translates all executed code
  • Captures all indirect control flow transfers
  • Just-in-time translation
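The translate-on-demand scheme above can be sketched in a few lines: a mapping table connects original to translated code, and an untranslated block is translated lazily on first use. This is an illustrative sketch, not fastBT's actual data structures; names like `CodeCache` and the bump-pointer allocation are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Sketch of the lazy-translation core of a dynamic binary translator.
// Names (CodeCache, lookup_or_translate) are illustrative only.
using addr_t = std::uint64_t;

struct CodeCache {
    std::unordered_map<addr_t, addr_t> mapping;  // original PC -> translated PC
    addr_t next_free = 0x1000;                   // bump pointer into the cache

    // Return the translated entry point, translating the block on a miss.
    addr_t lookup_or_translate(addr_t orig_pc) {
        auto it = mapping.find(orig_pc);
        if (it != mapping.end())
            return it->second;           // hit: block already in the cache
        addr_t trans_pc = next_free;     // miss: "translate" the block
        next_free += 0x40;               // pretend each block is 64 bytes
        mapping[orig_pc] = trans_pc;     // record the pair in the mapping
        return trans_pc;
    }
};
```

Because every executed block passes through this lookup, all indirect control flow is captured: an untranslated target simply takes the miss path.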

Binary Translation in a Nutshell

[Figure: the translator reads blocks 1-4 of the original program and emits blocks 1'-3' into the code cache, driven by generated opcode tables.]

The table generator supplies the generated opcode tables at compile time.


Binary Translation in a Nutshell

[Figure: blocks 1-3 are translated into the code cache, and a trampoline is emitted to translate block 4 on demand; the mapping table records the pairs 1→1', 2→2', 3→3'.]


fastBT

  • Prototype for a dynamic BT system
  • Machine-independent, OS-independent
  • Focus of this talk: IA32, Linux

Table Generation

  • Translation tables describe individual instructions and are used to select the correct adapter functions
  • Manual table construction is hard and cumbersome
  • There are many instructions; writing machine-code tables by hand is error-prone
  • Use automation and a high-level description!
  • Information about opcodes, possible encodings, and properties
  • Specify default translation actions

[Figure: the table generator combines the Intel IA32 opcode tables, the high-level interface, and the adapter functions into an optimized translator table.]


Table Generation

  • Use the table generator to offer a high-level interface
  • Transform opcode tables into runtime translation tables
  • Add analysis functions to control the table generation
  • Memory access?
  • What are src, dst, aux parameters?
  • FPU usage?
  • What kind of opcode?
  • What opcode class (load, store, arithmetic, control flow, ...)?
  • Immediate value as pointer?
  • etc.
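The analysis questions above can be pictured as a small decision function that the table generator evaluates per opcode to pick an adapter function. This is an illustrative sketch only; the struct fields and action names are assumptions, not fastBT's actual layout (a real analysis function from fastBT appears in the appendix slide).

```cpp
#include <cassert>
#include <string>

// Sketch: the table generator inspects each opcode's properties and
// maps them to the name of an adapter function. Fields and action
// names are hypothetical, for illustration only.
struct OpcodeProps {
    bool is_control_flow;
    bool accesses_memory;
    bool uses_fpu;
};

std::string select_action(const OpcodeProps& p) {
    if (p.is_control_flow) return "handleControlFlow";
    if (p.accesses_memory) return "handleMemOp";
    if (p.uses_fpu)        return "handleFpuOp";
    return "copyInstruction";   // default: emit the instruction unchanged
}
```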

Translator implementation

  • The translator uses an iterator-based approach and per-instruction actions
  • Fundamentals for low overhead:
  • Code cache
  • Inlining
  • Mastering (indirect) control transfers

Optimization

  • Indirect control flow transfers are expensive
  • A runtime lookup and patching are required
  • Each indirect control transfer is replaced by a software trap
  • Optimizations in fastBT:
  • Local branch prediction
  • Inlining a fast lookup into the code cache
  • Building on-the-fly shadow jump tables

Optimization: Branch prediction

  • Cache the last one or two targets
  • If there is a cache hit
  • No lookup is needed
  • Results in 3 to 5 instructions
  • If there is a cache miss
  • Lookup the target and cache it for future use
  • Updating the cache costs additional instructions
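The prediction scheme above can be sketched as a one-entry inline cache per indirect control transfer. This is a C++ sketch of the idea, not fastBT's actual (assembly-level) implementation; the names and the miss counter are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// One-entry prediction cache for a single indirect control transfer,
// sketching the local branch prediction described above.
using addr_t = std::uint64_t;

struct Prediction {
    addr_t cached_src = 0;   // last original target seen at this site
    addr_t cached_dst = 0;   // its translated counterpart
    int misses = 0;          // miss count, usable by an adaptive policy

    // On a hit the compare-and-branch costs only a few instructions;
    // on a miss we do the full lookup and cache the result.
    addr_t resolve(addr_t target, addr_t (*full_lookup)(addr_t)) {
        if (cached_src != 0 && target == cached_src)
            return cached_dst;            // hit: no lookup needed
        ++misses;                         // miss: lookup and re-cache
        cached_src = target;
        cached_dst = full_lookup(target);
        return cached_dst;
    }
};
```

A two-target variant simply keeps a second (src, dst) pair and checks both before falling back to the lookup.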

Optimization: Fast lookup

  • Emit an inlined fast lookup into the code cache
  • Uses the mapping table to translate the target
  • Optimized for direct hit in the mapping table
  • Results in 13 or 14 instructions
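A C++ sketch of such an inlined fast lookup: hash the target, probe one slot of the mapping table, and fall back to the translator on a mismatch. The table size and hash function here are assumptions for illustration, not fastBT's actual values.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the inlined fast lookup: one hash, one probe, one compare.
using addr_t = std::uint64_t;
constexpr std::size_t TBL_SIZE = 1u << 16;   // illustrative table size

struct MapEntry { addr_t src = 0; addr_t dst = 0; };
MapEntry mapping[TBL_SIZE];

inline std::size_t hash_pc(addr_t pc) {
    return (pc >> 2) & (TBL_SIZE - 1);       // cheap mask-based hash
}

// Returns the translated target on a direct hit, or 0 to signal that
// the slow path (the translator itself) must be taken.
addr_t fast_lookup(addr_t target) {
    const MapEntry& e = mapping[hash_pc(target)];
    return (e.src == target) ? e.dst : 0;
}
```

Optimizing for the direct hit keeps the common path short; collisions and misses take the slow path into the translator.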

Optimization: Shadow jump table

  • Build a shadow jump table iff the original indirect control transfer uses a jump table

  • Initialize all entries with catch-all function
  • Lazy lookup and write-back in catch-all
  • Results in 5 instructions if the target is translated
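The lazy write-back scheme can be sketched as follows; the sentinel value and names are illustrative assumptions, not fastBT's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch of a shadow jump table: a mirror of the original jump table
// whose slots hold translated targets. Untranslated slots point at a
// catch-all that translates lazily and writes the result back, so all
// later dispatches through that slot are direct.
using addr_t = std::uint64_t;
constexpr addr_t CATCH_ALL = ~addr_t(0);   // sentinel "stub" address

struct ShadowTable {
    std::vector<addr_t> slots;
    explicit ShadowTable(std::size_t n) : slots(n, CATCH_ALL) {}

    addr_t dispatch(std::size_t idx, addr_t (*translate)(std::size_t)) {
        if (slots[idx] == CATCH_ALL)       // lazy: first use of this slot
            slots[idx] = translate(idx);   // write back translated target
        return slots[idx];                 // later uses: direct dispatch
    }
};
```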

Optimization: Problem

  • Each optimization is effective only for some program locations and a specific program behavior
  • Low number of targets, few changes
  • Use a cache
  • High number of targets, many changes
  • Use fast lookup
  • Location has many different targets, all close to each other
  • Use a shadow jump table
  • An adaptive runtime optimization can select the best optimization for each indirect control transfer

Adaptive Optimization

  • fastBT offers an adaptive optimization for indirect control transfers
  • Start with a prediction for 1 or 2 locations and count misses
  • Recover to a fast lookup if the count exceeds a threshold
  • Construct a shadow jump table if the control transfer uses a jump table
  • Adaptive optimizations bring competitive performance!
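The selection policy above can be sketched as a small state transition per indirect control transfer. The threshold value here is an assumption for illustration, not fastBT's actual constant.

```cpp
#include <cassert>

// Sketch of the adaptive policy: each indirect control transfer starts
// with a prediction; once its miss counter exceeds a threshold it
// recovers to the fast lookup, or to a shadow jump table when the
// transfer is driven by a jump table.
enum class Strategy { Predict, FastLookup, ShadowTable };

Strategy adapt(Strategy current, int misses, bool uses_jump_table,
               int threshold = 16) {   // threshold is illustrative
    if (current != Strategy::Predict)
        return current;                // already recovered: stay put
    if (misses <= threshold)
        return Strategy::Predict;      // prediction still pays off
    return uses_jump_table ? Strategy::ShadowTable : Strategy::FastLookup;
}
```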

Benchmarks: Setup

  • Used a null transformation to show the translation overhead
  • Used the SPEC CPU2006 benchmarks to evaluate performance
  • We use the Test dataset for short-running programs and the Ref dataset for long-running programs
  • Machine: Intel Core2 Duo E6850 @ 3.00 GHz

Related work

  • HDTrans
  • S. Sridhar et al. HDTrans: a low-overhead dynamic translator. SIGARCH'07
  • Table-based dynamic BT, no high-level interface
  • DynamoRIO
  • D. Bruening et al. Design and implementation of a dynamic optimization framework for Windows. In ACM Workshop on Feedback-Directed Dynamic Optimization (FDDO-4), 2001
  • IR-based optimizing BT, does not export a translation interface
  • PIN
  • C.-K. Luk et al. Pin: building customized program analysis tools with dynamic instrumentation. In PLDI'05
  • High overhead, offers a high-level interface

Benchmarks: Ref dataset

[Figure: overhead of fastBT, HDTrans, PIN, and DynamoRIO on the Ref dataset for 400.perlbench, 445.gobmk, 483.xalancbmk, 447.dealII, and the average; the y-axis ranges from 0% to 100%, with one bar clipped at 126%.]


Benchmarks: Ref dataset

Benchmark     | Function calls 1) | inlined | Indirect jumps 1) | jmptbl | pred  | Indirect calls 1) | pred
400.perlbench | 25'814            | 8.1%    | 21'930            | 93.7%  | 6.3%  | 3'903             | 7.4%
445.gobmk     | 18'001            | 1.3%    | 93                | 1.0%   | 99.0% | 185               | 4.1%
483.xalancbmk | 28'888            | 10.6%   | 2'627             | 27.0%  | 63.6% | 9'161             | 96.1%
447.dealII    | 52'756            | 54.5%   | 21'147            | 1.7%   | 98.3% | 540               | 98.4%

1) All numbers are ×10^6


Benchmarks: Test dataset

[Figure: overhead of fastBT, HDTrans, PIN, and DynamoRIO on the Test dataset for 400.perlbench, 445.gobmk, 483.xalancbmk, 447.dealII, and the average; the y-axis ranges from 0% to 140%, with clipped bars at 308%, 745%, 1415%, and 3481%.]


Benchmarks: Ref vs. Test Dataset

Benchmark     | Ref: no BT [s] | Ref: fastBT | Test: no BT [s] | Test: fastBT
400.perlbench | 486            | 56%         | 4               | 29%
445.gobmk     | 611            | 18%         | 21              | 18%
483.xalancbmk | 371            | 24%         | <1              | 56%
447.dealII    | 552            | 44%         | 25              | 36%
Average       | 839            | 6%          | 8               | 10%


Benchmarks: Summary

  • High overhead:
  • Many indirect control transfers
  • Function calls incur high overhead, even with optimizations
  • Indirect control transfers without caches or jump tables add overhead
  • High collision rate in mapping table
  • Expensive recoveries, try different rescheduling strategies
  • Low overhead:
  • Few indirect control transfers
  • Cost of indirect control transfers is reduced through optimizations

Conclusion

  • fastBT shows that it is possible to combine ease of use with efficient binary translation
  • Adaptive optimizations select the best optimization for individual locations
  • Adaptive optimizations are necessary for low overhead in table-based binary translators


Thanks for your attention!

  • fastBT project page: http://nebelwelt.net/fastBT
  • Contact: mathias.payer@inf.ethz.ch
  • Kudos to:
  • Marcel Wirth, Peter Suter, Stephan Classen, and Antonio Barresi for code contributions
  • My colleagues for endless comments and reviews



Table Generation: Analysis Function

bool isMemOp(const unsigned char* opcode, const instr& disInf,
             std::string& action) {
    bool res;
    /* check for memory access in instr. */
    res  = mayOpAccessMem(disInf.dstFlags);
    res |= mayOpAccessMem(disInf.srcFlags);
    res |= mayOpAccessMem(disInf.auxFlags);
    /* change the default action */
    if (res) {
        action = "handleMemOp";
    }
    return res;
}

// in main function:
addAnalysFunction(isMemOp);


Optimization: Efficient Code

  • Static ind. call: call *(fixed_location)

pushl src_addr                      (1)
cmpl  $cached_target, *xx(i_trgt)   (2)
je    $trans_target
pushl *xx(ind_target)               (3)
pushl $tld
pushl $addr_of_cached_target
call  fix_ind_call_predict
pushl src_addr
jmp   *xx(ind_target)

  • 1. Push the original source IP
  • 2. Compare the actual target with the cached target and branch if the prediction is correct
  • 3. Recover if there is a misprediction

Optimization: Efficient Code

  • Dynamic ind. call: call *(reg)

pushl src_addr, *(reg), %ebx, %ecx   # prologue
movl  12(%esp), %ebx                 # load target
movl  %ebx, %ecx                     # duplicate ip
andl  HASH_PATTERN, %ebx             # hash fct
cmpl  hashtlb(0, %ebx, 8), %ecx      # check
jne   nohit
movl  hashtlb+4(0, %ebx, 8), %ebx    # load trgt
movl  %ebx, (tld->ind_jmp_targt)
popl  %ecx, %ebx                     # epilogue
leal  4(%esp), %esp                  # readjust stack
jmp   *(tld->ind_jmp_targt)          # jmp to trans. trgt
nohit: use ind_jump to recover

pushl src_addr
jmp   *(reg)