 
Prototyping Architectural Support for Program Rollback Using FPGAs
Radu Teodorescu and Josep Torrellas
http://iacoma.cs.uiuc.edu
University of Illinois at Urbana-Champaign

Motivation
• Problem:
  • Software bugs: a major cause of system failure
  • Production software is hard to debug
  • Continuous debugging is needed
• Software-based dynamic monitoring tools
  • Can catch a wide range of bugs
  • Orders-of-magnitude slowdowns

Radu Teodorescu - University of Illinois - Architectural Support for Program Rollback

Motivation
• Alternative solutions
  • Hardware support for debugging
  • Low overhead
  • Existing support is still modest
• Our system:
  • Hardware-assisted, lightweight debugger
  • Monitoring, detection and recovery from bugs in production systems

Contributions
• We implemented a hardware prototype of a debugging-aware processor
• We show that simple changes to a general-purpose processor can provide powerful debugging primitives
• We run experiments on buggy programs
• Implementation technology: FPGA
  • Ideal platform for rapid prototyping
  • Validate design, measure hardware overheads, run realistic experiments

Debugging Production Code
• Applications run in multiple states:
  • Normal
  • Speculative (can be undone)
  • Re-execute
• Transition between states is controlled by software
[Figure: dynamic execution]

Debugging Production Code

Original code:

    num=1;
    ...
    p=m[a[*x]]+&y;
    ...
    num++;

Instrumented code:

    num=1;
    ...
    enter_spec();
    p=m[a[*x]]+&y;
    ...
    if(pstate()==REEXEC)
    {
      info_collect();
    }
    exit_spec(flag);
    num++;

Dynamic execution:

    num=1;                    Normal
    ...
    enter_spec();
    p=m[a[*x]]+&y;            Speculative
    ...
    if(pstate()==REEXEC)
    exit_spec(flag);          Rollback
    p=m[a[*x]]+&y;            Re-execute (replay)
    ...
    if(pstate()==REEXEC)
    {
      info_collect();
    }
    exit_spec(flag);
    num++;                    Normal

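The state transitions driven by the instrumented code can be sketched as a small software model. This is only an illustration: in the prototype the state lives in hardware, and the names `proc_model`, `model_enter_spec` and `model_exit_spec` are hypothetical. The slides do not define the meaning of `flag`; the sketch assumes a nonzero flag requests a rollback and zero requests a commit.

```c
#include <assert.h>

/* Execution states named on the slides. */
typedef enum { NORMAL, SPECULATIVE, REEXEC } pstate_t;

/* Hypothetical software model of the processor's execution state. */
typedef struct {
    pstate_t state;
    int rollbacks;              /* rollbacks performed so far */
} proc_model;

static void model_init(proc_model *p)
{
    p->state = NORMAL;
    p->rollbacks = 0;
}

/* enter_spec(): a checkpoint is taken and execution becomes speculative.
 * The call has no effect unless the processor is in the NORMAL state. */
static void model_enter_spec(proc_model *p)
{
    if (p->state == NORMAL)
        p->state = SPECULATIVE;
}

/* exit_spec(flag): ASSUMPTION: nonzero flag = roll back, zero = commit.
 * Returns 1 when execution must resume from the checkpoint (re-execute),
 * 0 when the section ends and execution continues normally. */
static int model_exit_spec(proc_model *p, int flag)
{
    if (p->state == SPECULATIVE && flag) {
        p->state = REEXEC;
        p->rollbacks++;
        return 1;
    }
    /* Commit, or end of a re-executed section: back to normal. */
    p->state = NORMAL;
    return 0;
}
```

In this model a section is rolled back at most once: the second `exit_spec` call, reached in the re-execute state, always returns to normal execution, which matches the dynamic trace on the slide.
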
System Implementation

Hardware Extensions
• Undo program execution
  • Large code sections
  • Small overhead
  • Software control
• Lightweight checkpointing
• Hardware support needed:
  • Register checkpointing
  • Speculative data cache
[Figure: CPU, data cache and memory, with the checkpointed state marked]

Register Checkpointing
• Needed to allow restoration of processor state
• Beginning of speculative execution
  • Register file is copied into a shadow register file
• End of speculative execution
  • Commit: discard checkpoint
  • Rollback: restore registers & PC from checkpoint

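The checkpoint, commit and rollback steps can be sketched in software as follows. This is a minimal model, not the VHDL implementation: the struct layout, the function names and the register count `NUM_REGS` are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_REGS 16             /* illustrative size only */

/* Hypothetical model: architectural registers plus a shadow copy. */
typedef struct {
    uint32_t regs[NUM_REGS];
    uint32_t pc;
    uint32_t shadow_regs[NUM_REGS];
    uint32_t shadow_pc;
    int has_checkpoint;
} reg_state;

/* Begin speculation: copy the register file and PC into the shadow copy. */
static void reg_checkpoint(reg_state *s)
{
    memcpy(s->shadow_regs, s->regs, sizeof s->regs);
    s->shadow_pc = s->pc;
    s->has_checkpoint = 1;
}

/* Commit: the speculative values become permanent; discard the checkpoint. */
static void reg_commit(reg_state *s)
{
    s->has_checkpoint = 0;
}

/* Rollback: restore registers and PC from the checkpoint.
 * Returns 1 on success, 0 if there is no checkpoint to restore. */
static int reg_rollback(reg_state *s)
{
    if (!s->has_checkpoint)
        return 0;
    memcpy(s->regs, s->shadow_regs, sizeof s->regs);
    s->pc = s->shadow_pc;
    s->has_checkpoint = 0;
    return 1;
}
```
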
Speculative Data Cache
• Holds both speculative and non-speculative data
• Each line has a "speculative" bit
• Cache walk: merging or invalidating lines
• Speculative lines cannot be evicted
[Figure: data cache line layout: SPEC | TAG | DIRTY | DATA; lines A and B in the data cache before and after a rollback]

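The cache walk can be sketched as a loop over all lines. This is a simplified model under stated assumptions: the `cache_line` fields mirror the SPEC/TAG/DIRTY/DATA layout but the struct and function names are hypothetical, "merging" is modeled as clearing the speculative bit on commit, and "invalidating" as dropping speculative lines on rollback.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one data cache line. */
typedef struct {
    int valid;
    int spec;                   /* the "speculative" bit */
    int dirty;
    uint32_t tag;
    uint32_t data;
} cache_line;

/* Commit walk: merge speculative lines into the normal cache state
 * by clearing their speculative bit. Data and dirty bit are kept. */
static void cache_walk_commit(cache_line *c, int n)
{
    for (int i = 0; i < n; i++) {
        if (c[i].valid && c[i].spec)
            c[i].spec = 0;
    }
}

/* Rollback walk: invalidate every speculative line so that the
 * speculative writes are discarded. Returns the number of lines dropped. */
static int cache_walk_rollback(cache_line *c, int n)
{
    int dropped = 0;
    for (int i = 0; i < n; i++) {
        if (c[i].valid && c[i].spec) {
            c[i].valid = 0;
            c[i].spec = 0;
            c[i].dirty = 0;
            dropped++;
        }
    }
    return dropped;
}
```

Non-speculative lines are untouched by either walk, which is what lets one cache hold both kinds of data.
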
Software Control
• Give the compiler control over speculative execution
• Control instructions:
  • Begin speculation
  • End speculation (commit or rollback)
• We use SPARC's special access load
  • LDA [r0] code, r1

Begin Speculative Execution
[Figure: the begin-speculation (BS) instruction moves through the IF, ID, EX, MEM and WB pipeline stages and the pipeline stalls. The checkpointing controller exchanges "begin checkpoint", "checkpoint done" and "release pipeline" signals with the CPU, registers and data cache. State changes from Normal to Speculative.]

Limits
• Size of the speculative window is affected by:
  • Cache size and associativity (cache overflow)
  • I/O operations, which cannot be rolled back
• In both cases exceptions are raised
  • Early commit
  • OS intervention: buffer speculative state or I/O instructions

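Cache overflow follows from the rule that speculative lines cannot be evicted: in a set-associative cache, a miss that maps to a set whose ways all hold speculative lines has no legal victim. A minimal sketch of that check, with a hypothetical `way_state` struct and `pick_victim` function (the prototype's actual replacement policy is not described on the slides):

```c
#include <assert.h>

/* Hypothetical per-way state within one cache set. */
typedef struct {
    int valid;
    int spec;                   /* speculative lines must not be evicted */
} way_state;

/* Choose a way to fill on a miss.
 * Prefers an invalid way, then any non-speculative way.
 * Returns -1 when every way is valid and speculative: this is the
 * overflow case in which an exception would be raised. */
static int pick_victim(const way_state set[], int ways)
{
    for (int i = 0; i < ways; i++) {
        if (!set[i].valid)
            return i;
    }
    for (int i = 0; i < ways; i++) {
        if (!set[i].spec)
            return i;
    }
    return -1;
}
```

A return value of -1 is the point at which the handler would choose between an early commit and OS intervention.
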
Experiments and Results

Processor Prototype
• LEON2: SPARC V8 compliant processor
• Single issue, 5-stage pipeline
• Windowed register file
  • 2-32 sets, 16 registers
• L1 instruction and data caches
  • 1-4 sets, up to 64KB per set
• Synthesizable, open source VHDL code

Experimental Infrastructure
• System on a chip: PCI, Ethernet and serial interfaces
• Development tools
  • RTL simulation: ModelSim
  • Synthesis: Xilinx ISE 6.1
• Development board:
  • Xilinx Virtex II XC2V3000, 64 Mbytes SDRAM
• Embedded Linux

Deployment
[Figure: deployment flow. Labels: processor netlist (JTAG), output terminal (COM), binaries (PCI communication tool)]

Hardware Overhead
• Average overhead: 4.5%
[Figure: Configurable Logic Blocks (CLBs, 0 to 9000) versus data cache size (4KB, 8KB, 16KB, 32KB, 64KB) for three configurations: base, base+reg_ckpt, base+reg_ckpt+spec_cache]

Buggy Applications
• Applications with known bugs
• Manually instrument the code
• Detection window contains:
  • bug location
  • bug manifestation
• Determine if we can roll back the buggy code section
• Test configuration: 32KB data cache, 4KB instruction cache
[Figure: detection window spanning the bug location and the bug manifestation]

Buggy Applications

Application       Bug description                                                        Successful rollback   Dynamic instructions
ncompress-4.2.4   Input file name longer than 1024 bytes corrupts stack return address   Yes                   10653
polymorph-0.4.0   Input file name longer than 2048 bytes corrupts stack return address   No                    103838
tar-1.13.25       Unexpected loop bounds cause heap object overflow                      Yes                   193
man-1.5h1         Wrong bounds checking causes static object corruption                  Yes                   54217
gzip-1.2.4        Input file name longer than 1024 bytes overflows a global variable     Yes                   17535

Conclusions
• We implemented a hardware prototype of a processor with software-controlled speculative execution
• We show that simple changes to a general-purpose processor can provide powerful debugging primitives
• We obtained an estimate of the hardware overhead and ran experiments on buggy programs
• We are looking at the integration of our hardware with compiler and operating system support