Performance, Correctness, Exceptions: Pick Three



SLIDE 1

Performance, Correctness, Exceptions: Pick Three

Andrea Gussoni, Alessandro Di Federico, Pietro Fezzardi, Giovanni Agosta

Politecnico di Milano

24 February 2019

SLIDE 2

Table of Contents

1. Motivations
2. rev.ng
3. Design
4. Experimental Results
5. Conclusions

SLIDE 3

Motivations

Static binary translation has a variety of possible uses: support for legacy code, performance improvement for legacy architectures, and instrumentation of code.

SLIDE 4

Goals

Improve the performance of the translated binaries. Do not reinvent the wheel: use off-the-shelf components as much as possible. Be architecture-independent, like the rest of the rev.ng framework.

SLIDE 6

rev.ng

QEMU

SLIDE 8

rev.ng

input.elf → Lift to QEMU IR → Translate to LLVM IR → Recompile → output.elf

Function Isolation

SLIDE 9

The root Function

At present, the lifting phase places all the code recovered from the binary in a single (and often large) LLVM function, which we call root.

SLIDE 10

The root Function

[Figure: control-flow graph of the root function. Its nodes are the basic blocks of every recovered function (bb.main, bb.printf_core.*, bb.fmt_fp.*, bb.vfprintf.*, bb.__init_tls.*, ...), all placed in one single graph.]

SLIDE 11

The Dispatcher

What about indirect branches or indirect function calls (e.g. jmp rax)? We need the dispatcher.

SLIDE 12

The Dispatcher

switch i64 @pc, label %dispatcher.default [
  i64 4194536, label %bb._init
  i64 4194542, label %bb._init.0x6
  i64 4194547, label %bb._init.0xb
  i64 4194560, label %bb._start
  i64 4194582, label %bb._start_c
  i64 4194614, label %bb._start.0x36
  i64 4194624, label %bb.deregister_tm_clones
  i64 4194645, label %bb.deregister_tm_clones.0x15
  i64 4194655, label %bb.deregister_tm_clones.0x1f
  i64 4194672, label %bb.deregister_tm_clones.0x30
  i64 4194688, label %bb.register_tm_clones
  i64 4194723, label %bb.register_tm_clones.0x23
  i64 4194733, label %bb.register_tm_clones.0x2d
  i64 4194744, label %bb.register_tm_clones.0x38
]
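The role of the dispatcher can be sketched in C++ as a switch over an emulated program counter. This is only an illustration under assumptions: the global pc, the functions dispatch and indirect_jump, and the idea of returning the label as a string are all hypothetical devices, not rev.ng code. The addresses are the first four entries of the switch above.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for the translated program counter (@pc on the slide).
static std::uint64_t pc = 0;

// Returns the label of the basic block the dispatcher would jump to.
// Returning a string is an illustration device only.
std::string dispatch() {
  switch (pc) {
    case 4194536: return "bb._init";
    case 4194542: return "bb._init.0x6";
    case 4194547: return "bb._init.0xb";
    case 4194560: return "bb._start";
    default:      return "dispatcher.default";
  }
}

// Emulates an indirect jump such as `jmp rax`: the target is known only
// at run time, so it is stored in pc and resolved by the dispatcher.
std::string indirect_jump(std::uint64_t target) {
  pc = target;
  return dispatch();
}
```

In this sketch, indirect_jump(4194560) resolves to bb._start, while any address that is not a known basic block falls through to dispatcher.default.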

SLIDE 13

Current Limitations

One major problem of the dispatcher is that every time we need to pass through it, we pay a high cost in terms of performance. The CFG of the root function contains a lot of unnecessary edges, which leads to a convoluted topology. This topology prevents many of the optimizations that opt would otherwise perform.

SLIDE 14

Current Limitations

bb.main:
  store 0 @rax
  br %bb.main.0x8

bb.main.0x8:
  %1 = load @rax

bb.dispatcher:
  switch . . .

SLIDE 16

A naive approach

The natural thing to do is to try to reconstruct (with some approximations) the original function layout. Will things break? (Spoiler: yes, they will.)

SLIDE 17

Bird's-Eye View

def root():
  bb.foo:
  bb.foo.0x8:
  bb.main:
  bb.main.0xa:
  bb.bar:
  bb.bar.0x4:

def main():
  bb.main:
  bb.main.0xa:

def foo():
  bb.foo:
  bb.foo.0x8:

def bar():
  bb.bar:
  bb.bar.0x4:

br %bb.bar?

SLIDE 20

What if we make the isolated functions and the root function coexist?

SLIDE 21

Isolated and Non-Isolated Realms

We define two realms:

Isolated Realm: in this realm, we have a new LLVM function for each function discovered by the Function Boundaries Detection Analysis (FBDA).

Non-Isolated Realm: in this realm, the original root function is preserved, basically unaltered.

SLIDE 22

To the Isolated Realm and Back

Transitioning to the isolated realm is easy: every time we find a basic block that is the entry point of a function, we call the corresponding isolated function. The transition in the opposite direction is more complicated: our idea is to exploit the exception handling mechanism provided by LLVM.

SLIDE 23

To the Isolated Realm and Back

def root():
  dispatcher:
    switch (pc):
      1 label %bb.foo
      2 label %bb.main
      3 label %bb.bar
  bb.foo:
    invoke foo()
  bb.main:
    invoke main()
  bb.bar:
    invoke bar()

def main():
  bb.main:
    call foo()
    ret

def foo():
  bb.foo:
    br bb.bar  (replaced by: throw exception)

def bar():
  bb.bar:
    . . .
    ret
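The round trip between the two realms can be imitated with ordinary C++ exceptions. The sketch below is an analogy under assumptions, not the LLVM-level implementation: the names LeaveIsolatedRealm, main_isolated, run_from_main and the trace vector are hypothetical, the try/catch stands in for an invoke with its landing pad, and the value PC_EXIT is invented to let the loop terminate. The pc values 1, 2 and 3 follow the switch on the slide.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// pc values as on the slide; PC_EXIT is a hypothetical "program finished" value.
constexpr std::uint64_t PC_EXIT = 0, PC_FOO = 1, PC_MAIN = 2, PC_BAR = 3;

static std::uint64_t pc = PC_MAIN;      // emulated program counter
static std::vector<std::string> trace;  // records what ran, in order

// Thrown when an isolated function has to leave the isolated realm.
struct LeaveIsolatedRealm {};

void bar() { trace.push_back("bar"); pc = PC_EXIT; }

// foo ends with a jump to bb.bar, which lies outside foo. It cannot be a
// plain branch, so foo stores the target in pc and throws.
void foo() {
  trace.push_back("foo");
  pc = PC_BAR;
  throw LeaveIsolatedRealm{};
}

void main_isolated() {
  trace.push_back("main");
  foo();         // ordinary call inside the isolated realm
  pc = PC_EXIT;  // not reached in this sketch, because foo throws
}

// Non-isolated realm: dispatch on pc until the program is done.
void root() {
  while (pc != PC_EXIT) {
    try {
      switch (pc) {
        case PC_FOO:  foo(); break;
        case PC_MAIN: main_isolated(); break;
        case PC_BAR:  bar(); break;
        default:      return;  // unknown pc: give up in this sketch
      }
    } catch (const LeaveIsolatedRealm&) {
      trace.push_back("dispatcher");  // back in root, dispatch again on pc
    }
  }
}

// Resets the emulated state, runs the program from main, returns the trace.
std::vector<std::string> run_from_main() {
  pc = PC_MAIN;
  trace.clear();
  root();
  assert(pc == PC_EXIT);
  return trace;
}
```

Calling run_from_main() records main, foo, dispatcher, bar: the exception thrown in foo unwinds through main_isolated back to root, which then dispatches to bar.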

SLIDE 29

Exception Handling Mechanism

Our fallback mechanism is implemented using the exception support provided by the LLVM framework and the stack unwinding mechanism provided by libgcc.

SLIDE 30

Function Isolation

Function isolation is performed on the basis of the information provided by the Function Boundaries Detection Analysis (FBDA) pass. The accuracy of the FBDA is an important factor in performing high-quality function isolation. The quality of the function isolation determines how often the fallback mechanism is actually employed.

SLIDE 31

Function Boundaries Analysis Limitations

There are situations (e.g. exceptions in the original code) where even a good, or optimal, FBDA is not sufficient. Our fallback mechanism guarantees that we can handle the execution in these situations. We handle exceptions with exceptions!

SLIDE 33

Experimental Setup

We used the SPECint 2006 benchmark suite in 4 configurations: native, QEMU, rev.ng, and rev.ng with isolation.

SLIDE 34

Experimental Results

Benchmark     qemu     rev.ng   isolated
libquantum    2.2×     1.6×     1.1×
h264ref       10.6×    3.7×     2.7×
bzip2         6.2×     3.1×     2.2×
perlbench     17.1×    5.1×     3.7×
sjeng         11.1×    3.9×     2.6×
gcc           5.8×     3.1×     2.1×
xalancbmk     10.2×    5.5×     2.8×
gobmk         9.8×     5.2×     3.3×

Figure: Slowdown of the different translation techniques compared to native code. Logarithmic scale. Lower is better.
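One way to condense the slowdown figures into a single number per configuration is the geometric mean, a common choice for averaging ratios. The slides do not report such a summary, so this is an added illustration. The sketch assumes that the three rows of values correspond to qemu, rev.ng and isolated in the order of the legend; the names kQemu, kRevng, kIsolated and geometric_mean are hypothetical.

```cpp
#include <array>
#include <cmath>

// Slowdowns relative to native, copied from the figure data.
// Assumption: the rows follow the legend order qemu, rev.ng, isolated.
constexpr std::array<double, 8> kQemu     = {2.2, 10.6, 6.2, 17.1, 11.1, 5.8, 10.2, 9.8};
constexpr std::array<double, 8> kRevng    = {1.6, 3.7, 3.1, 5.1, 3.9, 3.1, 5.5, 5.2};
constexpr std::array<double, 8> kIsolated = {1.1, 2.7, 2.2, 3.7, 2.6, 2.1, 2.8, 3.3};

// Geometric mean: exponential of the arithmetic mean of the logarithms.
double geometric_mean(const std::array<double, 8>& xs) {
  double log_sum = 0.0;
  for (double x : xs) log_sum += std::log(x);
  return std::exp(log_sum / static_cast<double>(xs.size()));
}
```

Since every isolated value is below the corresponding rev.ng value, and every rev.ng value is below the corresponding qemu value, the three geometric means are ordered the same way.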

SLIDE 36

Future Work

Recognize function parameters. Recognize return values. Promote global variables (registers) to local variables when possible.

SLIDE 37

Resources

The function isolation feature has been implemented in rev.ng as an LLVM pass. The artifacts produced during this work, the code, and the instructions to reproduce them are available at https://rev.ng/gitlab/revng-bar-2019/artifacts. For more general instructions on how to get started with rev.ng, see the official website at https://rev.ng/getting-started.html.

SLIDE 38

Questions?

SLIDE 39

License

These slides are published under a Creative Commons Attribution-ShareAlike 4.0 license.

SLIDE 40

Backup Slides

SLIDE 41

LLVM IR

C source:

    int counter;

    void myfunction(void);

    int main(int argc) {
      if (argc > 5) {
        counter++;
      } else {
        myfunction();
      }
      return 1;
    }

Corresponding LLVM IR:

    @counter = common global i32 0

    declare void @myfunction()

    define i32 @main(i32 %argc) {
      %1 = icmp sgt i32 %argc, 5
      br i1 %1, label %yes, label %no

    yes:
      %2 = load i32, i32* @counter
      %3 = add i32 %2, 1
      store i32 %3, i32* @counter
      br label %end

    no:
      call void @myfunction()
      br label %end

    end:
      ret i32 1
    }

SLIDE 42

Exception Handling Mechanism

The fallback relies mainly on the exception handling mechanism provided by LLVM. In our solution, this mechanism recovers from potentially faulty situations, for example when static analysis cannot foresee the destination of a jump, by redirecting the execution to a component that decides what to do next.

SLIDE 43

Exception Handling Mechanism

At the implementation level, using exceptions requires us to:

1. Replace, in the root function, the body of each function-entry basic block with an invoke instruction (a call instruction that also specifies where to continue if the callee unwinds) to the isolated function.
2. Throw an exception in the isolated realm each time we need to exit from an isolated function in an unexpected manner.
3. Provide LLVM with a personality function, which specifies the runtime behavior when an exception is thrown.

SLIDE 44

CSV

rev.ng represents the current CPU state using the so-called CSVs (CPU State Variables), which are LLVM global variables. In the general case, this is a major performance bottleneck, since every access has to go through memory.
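The difference between keeping the CPU state in globals and working on local copies can be illustrated with a small C++ sketch. Everything here is hypothetical: the globals rax and rcx stand in for CSVs, and the two functions are invented examples rather than rev.ng output. Whether an optimizer can perform this promotion on its own depends on what else may observe the globals, so the sketch only shows the shape of the transformation.

```cpp
#include <cstdint>

// Hypothetical CSVs: one global variable per guest register.
std::uint64_t rax = 0;
std::uint64_t rcx = 0;

// Translated code in the style described above: every step reads and
// writes the global CPU state directly.
void sum_with_csv(std::uint64_t n) {
  rax = 0;
  for (rcx = 0; rcx < n; ++rcx)
    rax += rcx;
}

// Same computation on local copies; the global state is written back
// once, at the end of the function.
void sum_with_locals(std::uint64_t n) {
  std::uint64_t local_rax = 0;
  std::uint64_t local_rcx = 0;
  for (; local_rcx < n; ++local_rcx)
    local_rax += local_rcx;
  rax = local_rax;
  rcx = local_rcx;
}
```

Both functions leave the same values in rax and rcx; only the second one gives the compiler purely local values to work with inside the loop.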
