slide-1
SLIDE 1

ILLINOIS

Replay Debugging: Leveraging Record and Replay for Program Debugging

Nima Honarmand and Josep Torrellas University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu/


slide-2
SLIDE 2


#2 “Replay Debugging” Nima Honarmand and Josep Torrellas

Motivation: Bug Reproduction is Difficult

Especially for bugs in production runs, due to
− Complex inputs
− Non-deterministic timing in concurrent programs

Record and Deterministic Replay (RnR) can help
− Recreates the execution of a program
− Record: capture non-deterministic events in a log
− Replay: use the log to recreate the exact same execution

Problem: Current RnR solutions are not well suited to use as debugging tools.
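The record/replay split can be sketched in C as follows. This is a minimal illustration, not how QuickRec or any real RnR system is implemented: the names rnr_log, rnr_input, sum_two_inputs, and changing_source are all assumptions made for the example, and the only non-deterministic event modeled is a single integer-returning input source.

```c
#include <assert.h>
#include <stddef.h>

#define RNR_LOG_CAPACITY 64

/* Hypothetical in-memory log of non-deterministic input values. */
typedef struct {
    int    events[RNR_LOG_CAPACITY];
    size_t count;   /* number of events recorded */
    size_t cursor;  /* next event to hand out during replay */
} rnr_log;

typedef enum { RNR_RECORD, RNR_REPLAY } rnr_mode;

/* A non-deterministic source: anything whose result may differ between runs. */
typedef int (*rnr_source)(void);

/* In record mode, call the real source and append its result to the log.
 * In replay mode, ignore the source and return the logged result instead. */
int rnr_input(rnr_log *input_log, rnr_mode mode, rnr_source source)
{
    if (mode == RNR_RECORD) {
        assert(input_log->count < RNR_LOG_CAPACITY);
        int value = source();
        input_log->events[input_log->count++] = value;
        return value;
    }
    assert(input_log->cursor < input_log->count);  /* replay must not outrun the log */
    return input_log->events[input_log->cursor++];
}

/* Example program logic whose result depends on two inputs. */
int sum_two_inputs(rnr_log *input_log, rnr_mode mode, rnr_source source)
{
    int first = rnr_input(input_log, mode, source);
    int second = rnr_input(input_log, mode, source);
    return first * 10 + second;
}

/* Stand-in for a non-deterministic source: a different value on every call. */
int changing_source(void)
{
    static int next = 3;
    return next++;
}
```

Running sum_two_inputs in record mode and then again in replay mode against the same log yields the same result, even though a fresh live run with changing_source would not.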

slide-3
SLIDE 3


RnR Logs in QuickRec [ISCA’13]

Input Log: Program/OS interactions (captured in OS kernel)
− System call results
− Data copied to application buffers by OS kernel
− Signals, …

Memory Access Interleaving Log (captured using special HW)
− Inter-thread dependences

[Figure: an execution in which two threads read and write X and Y (RD X, WR Y, WR X, RD Y), and the resulting interleaving log]

Chunk-based Recording
− Chunk content recorded as # of instructions in the chunk
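One way to picture chunk-based recording: each log entry names a thread and the number of instructions it ran before the next chunk began, and replay lets each thread run exactly that many instructions in log order. The C sketch below is illustrative only; the chunk layout and replay_chunks are assumptions, not QuickRec's actual log format.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical chunk log entry: a thread and how many instructions it
 * executed before the next chunk (possibly on another thread) started. */
typedef struct {
    int      thread_id;
    unsigned instruction_count;
} chunk;

/* Replays the log by writing, in order, the ID of the thread that must
 * execute each instruction. Returns the total number of instructions,
 * or 0 if the schedule buffer is too small. */
size_t replay_chunks(const chunk *chunk_log, size_t chunk_count,
                     int *schedule, size_t schedule_capacity)
{
    assert(chunk_log != NULL && schedule != NULL);
    size_t written = 0;
    for (size_t i = 0; i < chunk_count; i++) {
        for (unsigned n = 0; n < chunk_log[i].instruction_count; n++) {
            if (written == schedule_capacity)
                return 0;
            schedule[written++] = chunk_log[i].thread_id;
        }
    }
    return written;
}
```

Storing only an instruction count per chunk keeps the log small: the replayer does not need the instructions themselves, only the points at which control passes between threads.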

slide-4
SLIDE 4


Problem: Naïve RnR is not Enough for Debugging

Replay only reproduces the buggy execution
− Not enough! Need to augment the code for diagnosis

Augmented code cannot be replayed using the recorded log:
− Different input events
− Different sequence of instructions

Replay Debugging: Augment the code and still be able to replay the log
− Enables fast and deterministic convergence to the source of the bug

[Figure: a program and its log replayed up to the crash; debug code 1, 2, and 3 are added over successive replays until the bug is found]

slide-5
SLIDE 5


Effective Bug Diagnosis

…needs the ability to
− Write debug code as if part of the same program (may be inlined with main code)
− Access main program state
− Call main program functions
− Output results of debug code
− Have debug-only state in the debug code, e.g.,
  • Local and global variables, heap-allocated objects, shadow data structures

int a = /* program code */;
#ifdef DEBUG
printf("a is %d", a);
#endif

How to enable all of this without breaking replay?

slide-6
SLIDE 6


Contribution: Replay DeBugging (RDB)

  • A methodology for guaranteed deterministic replay in the presence of debug code
→ One can debug a non-deterministic bug deterministically
  • A design combining compiler technology + replay mechanisms
  • Implementation using LLVM and Intel’s Pin
  • Seamless debugging experience for the programmer
− RDB debug code very similar to ordinary debug code

slide-7
SLIDE 7


RDB Approach: Overall View

  • 1. Writing Debug Code
− Programmer adds debug code to program source code

  • 2. Extracting Debug Code (Modified LLVM Compiler)
− Compiler extracts debug code and generates two binaries: one contains the original, unmodified code; the other contains the debug code

  • 3. Executing Debug Code While Replaying
− Replay tool automatically invokes debug code at correct points while replaying the log

slide-8
SLIDE 8


  • 1. Programmer Writes Debug Code

To guarantee replay remains deterministic, debug code:
− Should be marked using special markers
− Can read main program state
− Can invoke main program functions
− Should not write to main program state (directly or indirectly)
− Can have its own state (local, global and heap)
− Uses the same virtual address space as the main code
  • E.g., debug vars can point to main data
− Can use runtime library functions
  • E.g., printf() or malloc() from libc
  • Will have its own instance of runtime libs during replay
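A sketch of what marked debug code following these rules might look like in C. How rdb_begin and rdb_end are defined is not shown in this deck, so here they are hypothetical macros that expand to an ordinary block, which lets the example compile with a stock compiler; under RDB the modified LLVM compiler would extract the region instead. The queue type and its functions are invented for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: stand-ins for the RDB markers so a stock compiler accepts this. */
#define rdb_begin {
#define rdb_end   }

/* Main program state. */
typedef struct {
    int items[8];
    int length;
} queue;

/* Main program function with no side effects; safe to call from debug code. */
int queue_sum(const queue *q)
{
    int sum = 0;
    for (int i = 0; i < q->length; i++)
        sum += q->items[i];
    return sum;
}

/* Debug-only global state: separate from the main program's state. */
int debug_max_sum_seen = 0;

void queue_push(queue *q, int value)
{
    assert(q->length < 8);
    q->items[q->length++] = value;

    rdb_begin
        /* Reads main state and calls a main function, but writes only to
         * debug-only state and to the debug code's own output. */
        int sum = queue_sum(q);
        if (sum > debug_max_sum_seen)
            debug_max_sum_seen = sum;
        printf("queue length %d, sum %d\n", q->length, sum);
    rdb_end
}
```

The debug region calls queue_sum, which only reads the queue. Calling a main program function that mutated the queue from inside the region would be an indirect write to main program state and would violate the rules above.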

slide-9
SLIDE 9


  • 2. Compiling the Augmented Code

LLVM Compiler Flow: C/C++ Code → Clang Front-end → LLVM IR → LLVM IR-Level Transformations and Optimizations → LLVM IR → LLVM CodeGen Backend (x86) → Machine Code

− Front-end creates unoptimized LLVM IR from source code
− Optimizer transforms LLVM IR to optimized form
  • We assume all optimizations are disabled for now
− CodeGen generates machine code
− We modify the last two

slide-10
SLIDE 10


2.1. Generating Initial LLVM IR

Compiler flow stage: Clang Front-end (C/C++ Code → LLVM IR). No changes to the front-end.

Instrumented C Code

void main(void) {
  char c;
  c = getchar();
  rdb_begin
    printf("c is '%c'\n", c);
  rdb_end
}

LLVM IR (produced by the Clang front-end)

@.str = "c is '%c'\n"
void @main() {
  %c = alloca i8
  %_tmp0 = call @getchar()
  store %_tmp0, %c
  call @__rdb_begin()
  %_tmp1 = load %c
  call @printf(@.str, %_tmp1)
  call @__rdb_end()
}

slide-11
SLIDE 11


2.2. Extracting the Debug Code

Compiler flow stage: LLVM IR-Level Transformations (LLVM IR → LLVM IR). Extract the debug code.

LLVM IR (from Front-end)

@.str = "c is '%c'\n"
void @main() {
  %c = alloca i8
  %_tmp0 = call @getchar()
  store %_tmp0, %c
  call @__rdb_begin()
  %_tmp1 = load %c
  call @printf(@.str, %_tmp1)
  call @__rdb_end()
}

Extracted Main Code (LLVM IR)

void @main() {
  %c = alloca i8
  %_tmp0 = call @getchar()
  store %_tmp0, %c
  call @llvm.rdb.location(1)
  call @llvm.rdb.arg(1, 0, %c)
}

Extracted Debug Code (LLVM IR)

@.str = "c is '%c'\n"
void @__rdb_func_1(i8* %arg) {
  %_tmp1 = load %arg
  call @printf(@.str, %_tmp1)
}

Function Descriptors (C++)

FuncID  FuncName
1       __rdb_func_1
2       …
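The function descriptor table could be represented along these lines. The deck labels the descriptors as C++; this sketch uses C for consistency with the deck's other examples. The struct layout, rdb_find_func, and the C rendering of __rdb_func_1 (named rdb_func_1 here, since identifiers starting with two underscores are reserved in C) are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Every extracted debug function takes a pointer to its argument,
 * mirroring "void @__rdb_func_1(i8* %arg)" in the extracted IR. */
typedef void (*rdb_debug_func)(char *arg);

typedef struct {
    int            func_id;
    const char    *func_name;
    rdb_debug_func func;
} rdb_func_descriptor;

/* C rendering of the extracted debug code: load the argument and print it. */
void rdb_func_1(char *arg)
{
    assert(arg != NULL);
    printf("c is '%c'\n", *arg);
}

const rdb_func_descriptor rdb_func_table[] = {
    { 1, "__rdb_func_1", rdb_func_1 },
};

/* Looks up a descriptor by FuncID; returns NULL if there is none. */
const rdb_func_descriptor *rdb_find_func(int func_id)
{
    size_t count = sizeof rdb_func_table / sizeof rdb_func_table[0];
    for (size_t i = 0; i < count; i++)
        if (rdb_func_table[i].func_id == func_id)
            return &rdb_func_table[i];
    return NULL;
}
```

The FuncID passed to llvm.rdb.location in the main code is the key into this table, so the main binary never needs to contain or reference the debug function itself.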

slide-12
SLIDE 12


2.3. Generating Machine Code

Compiler flow stage: LLVM CodeGen Backend (x86) (LLVM IR → Machine Code). Generate machine code.

Extracted Debug Code (LLVM IR) → LLVM CodeGen Backend (x86) → Debug Code (x86 object file)

@.str = "c is '%c'\n"
void @__rdb_func_1(i8* %arg) {
  %_tmp1 = load %arg
  call @printf(@.str, %_tmp1)
}

Extracted Main Code (LLVM IR) → LLVM CodeGen Backend (x86) → Main Code (x86) + Symbols for Location Markers, and Argument Descriptors (C++)

void @main() {
  %c = alloca i8
  %_tmp0 = call @getchar()
  store %_tmp0, %c
  call @llvm.rdb.location(1)
  call @llvm.rdb.arg(1, 0, %c)
}

Argument Descriptors (C++)

FuncID  Position  Class  Info
1       0         Stack  (SP, -20)
2       …         …      …
2       1         …      …
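An argument descriptor such as Stack (SP, -20) tells the replay tool where a debug function's argument lives when the marker is reached. The sketch below reads it as "stack pointer plus offset", which is an assumption about the descriptor's meaning. It uses a byte array as a stand-in for the main program's stack, and the type names, the ARG_STACK class, and rdb_resolve_arg are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ARG_STACK } rdb_arg_class;

/* Hypothetical descriptor: argument number "position" of debug function
 * "func_id" is found at (stack pointer + offset) at the marker. */
typedef struct {
    int           func_id;
    int           position;
    rdb_arg_class arg_class;
    long          offset;   /* e.g. -20 for "Stack (SP, -20)" */
} rdb_arg_descriptor;

/* Register state of the replayed program at the marker (only SP here). */
typedef struct {
    unsigned char *sp;
} rdb_context;

/* Computes the address to pass to the debug function. */
void *rdb_resolve_arg(const rdb_arg_descriptor *desc, const rdb_context *ctx)
{
    assert(desc->arg_class == ARG_STACK);  /* the only class sketched here */
    return ctx->sp + desc->offset;
}
```

Because the debug code shares the main program's virtual address space, the resolved pointer can be handed to the debug function directly, with no copying of main program data.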

slide-13
SLIDE 13


  • 3. Replay Tool Invokes Debug Code

Replay implemented using Intel’s Pin (similar to QuickRec)
− A binary instrumentation infrastructure

Anatomy of Pin
− Program and Pintool in the same address space
− Pintool is use-case specific

Our pintool, RdbTool, does two things:
− Replays the log
− Invokes debugging code

[Figure: one virtual address space containing Pintool Space and Program Space, each with its own code, libraries, heap, static data, and stack]

slide-14
SLIDE 14


  • 3. Replay Tool Invokes Debug Code

To replay, RdbTool
− Instruments system calls to inject program inputs
− Counts # of instructions to enforce the recorded interleaving

To invoke debug code, compile debug code into RdbTool
− Extracted Debug Code (x86) + Function/Arg Descriptors (C++) + RdbTool core logic (C++) → C/C++ Compiler & Linker → RdbTool Binary

RdbTool then
− Sets breakpoints at debug markers
− Finds and invokes debug code using Function and Argument descriptors
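Setting breakpoints at debug markers and invoking the matching debug code amounts to a lookup from the replayed program counter to a debug function. The sketch below simulates that lookup in plain C without Pin; it does not use Pin's API, and the marker addresses, rdb_marker, rdb_on_instruction, and count_invocation are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*rdb_debug_func)(void *arg);

/* Hypothetical marker record: the address of a location marker in the main
 * code and the debug function to run when replay reaches it. */
typedef struct {
    unsigned long  address;
    rdb_debug_func func;
} rdb_marker;

/* Called for every replayed instruction. Invokes the debug function if the
 * program counter is a marker; returns 1 if debug code ran, 0 otherwise. */
int rdb_on_instruction(const rdb_marker *markers, size_t marker_count,
                       unsigned long pc, void *arg)
{
    assert(markers != NULL || marker_count == 0);
    for (size_t i = 0; i < marker_count; i++) {
        if (markers[i].address == pc) {
            markers[i].func(arg);
            return 1;
        }
    }
    return 0;
}

/* Example debug function: counts its invocations in debug-only state. */
int debug_invocations = 0;

void count_invocation(void *arg)
{
    (void)arg;
    debug_invocations++;
}
```

The debug function runs between two replayed instructions of the main program, so the main program's instruction count, and therefore the recorded interleaving, is unaffected.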

slide-15
SLIDE 15


  • 3. Replay Tool Invokes Debug Code

− Loads the main code; links it with runtime libraries
− Loads the RdbTool; links it with separate runtime libraries
− Replays the main code and invokes debug code on hitting a debug marker
− Execution is the same as recorded in the log

[Figure: one virtual address space containing RdbTool Space and Main Program Space, each with its own code, libraries, heap, static data, and stack; RdbTool reads the log, controls replay, and invokes debug functions]

slide-16
SLIDE 16


Problem with Compiler Optimizations

void f() {
  char c = getchar();
  int a = c ? 5 : 6;
  printf("c is %d\n", c);
  rdb_begin
    printf("a is %d\n", a);
  rdb_end
}

Optimizations will be performed after extracting debug code
− May render the debug code invalid
  • E.g., may optimize away state needed by the debug code (above, a is used only by the debug code)

Work in progress…

slide-17
SLIDE 17


Also in the Paper

  • Real example of bug diagnosis with RDB
  • Support for event-driven debugging (watch points)
  • Enforcing read-only access to the main program’s memory
  • Using gdb together with RDB
  • Replay debugging without Pin

slide-18
SLIDE 18


Conclusions

Naïve RnR is not enough for bug diagnosis

Replay Debugging: A methodology for guaranteed deterministic replay in the presence of debug code
  • Seamless debugging experience for programmer
  • Combines compiler and replay technology
  • Proof-of-concept implementation using LLVM and Pin

With RDB, one can diagnose a non-deterministic bug deterministically

slide-19
SLIDE 19


THANK YOU!