Precise Exceptions and Out-of-Order Execution Samira Khan

Multi-Cycle Execution
• Not all instructions take the same amount of time for “execution”
• Idea: Have multiple different functional units that take different numbers of cycles
  • Can be pipelined or not pipelined
  • Can let independent instructions start execution on a different functional unit before a previous long-latency instruction finishes execution

ISSUES IN PIPELINING: MULTI-CYCLE EXECUTE
• Instructions can take different numbers of cycles in the EXECUTE stage
  • Integer ADD versus FP Multiply
    FMUL R4 ← R1, R2    F D E E E E E E E E W
    ADD  R3 ← R1, R2      F D E W
                            F D E W
                              F D E W
    FMUL R2 ← R5, R6            F D E E E E E E E E W
    ADD  R4 ← R5, R6              F D E W
                                    F D E W
• What is wrong with this picture?
  • What if FMUL incurs an exception?
  • Sequential semantics of the ISA NOT preserved!
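The problem in the diagram can be checked with a short timing sketch. This is a minimal Python model, not part of the original slides: it assumes one instruction is fetched per cycle with no stalls, and it runs only the four labeled instructions back to back (the unlabeled rows are left out). The names `LATENCY`, `writeback_cycle`, and `program` are illustrative.

```python
# Execute-stage latencies as drawn on the slide: 8 cycles for FMUL, 1 for ADD.
LATENCY = {"FMUL": 8, "ADD": 1}


def writeback_cycle(index, opcode):
    """Cycle in which instruction number `index` reaches the W stage."""
    fetch = index                      # assumption: one fetch per cycle, no stalls
    decode = fetch + 1
    last_execute = decode + LATENCY[opcode]
    return last_execute + 1


# (opcode, destination register) in program order.
program = [("FMUL", "R4"), ("ADD", "R3"), ("FMUL", "R2"), ("ADD", "R4")]

schedule = [(writeback_cycle(i, op), op, dest) for i, (op, dest) in enumerate(program)]
writeback_order = [f"{op} {dest}" for _, op, dest in sorted(schedule)]

print(writeback_order)
# ['ADD R3', 'ADD R4', 'FMUL R4', 'FMUL R2']
```

Both ADDs update the register file before the older FMUL R4 does. If that FMUL raises an exception at cycle 10, younger instructions have already changed architectural state, which is exactly the violation of sequential semantics the slide points at.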

The Von Neumann Model/Architecture
• Also called stored program computer (instructions in memory). Two key properties:
• Stored program
  • Instructions stored in a linear memory array
  • Memory is unified between instructions and data
    • The interpretation of a stored value depends on the control signals
• Sequential instruction processing
  • One instruction processed (fetched, executed, and completed) at a time
  • Program counter (instruction pointer) identifies the current instruction
  • Program counter is advanced sequentially except for control transfer instructions

HANDLING EXCEPTIONS IN PIPELINING
• Exceptions versus interrupts
• Cause
  • Exceptions: internal to the running thread
  • Interrupts: external to the running thread
• When to handle
  • Exceptions: when detected (and known to be non-speculative)
  • Interrupts: when convenient
    • Except for very high priority ones
      • Power failure
      • Machine check
• Priority: process (exception), depends (interrupt)
• Handling context: process (exception), system (interrupt)

PRECISE EXCEPTIONS/INTERRUPTS
• The architectural state should be consistent when the exception/interrupt is ready to be handled
  1. All previous instructions should be completely retired.
  2. No later instruction should be retired.
• Retire = commit = finish execution and update architectural state

WHY DO WE WANT PRECISE EXCEPTIONS?
• Aid software debugging
• Enable (easy) recovery from exceptions, e.g. page faults
• Enable (easily) restartable processes

ENSURING PRECISE EXCEPTIONS IN PIPELINING
• Idea: Make each operation take the same amount of time
    FMUL R3 ← R1, R2    F D E E E E E E E E W
    ADD  R4 ← R1, R2      F D E E E E E E E E W
                            F D E E E E E E E E W
                              F D E E E E E E E E W
                                F D E E E E E E E E W
                                  F D E E E E E E E E W
                                    F D E E E E E E E E W
• Downside
  • What about memory operations?
  • Each functional unit takes 500 cycles?

SOLUTION: REORDER BUFFER (ROB)
• Idea: Complete instructions out-of-order, but reorder them before making results visible to architectural state
• When an instruction is decoded, it reserves an entry in the ROB
• When an instruction completes, it writes its result into its ROB entry
• When an instruction is the oldest in the ROB and has completed, its result is moved to the register file or memory
[Figure: block diagram with the instruction cache, register file, three functional units, and reorder buffer]
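The three rules above (reserve at decode, write the result on completion, retire only from the oldest end) can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the slides' hardware design: entries only target registers, there is no capacity limit, and the class and method names are illustrative.

```python
from collections import deque


class ReorderBuffer:
    """Toy ROB: entries are kept in program order, oldest at the left."""

    def __init__(self):
        self.entries = deque()
        self.regfile = {}          # architectural register file

    def allocate(self, dest_reg):
        """Called at decode: reserve an entry in program order."""
        entry = {"dest": dest_reg, "value": None, "complete": False}
        self.entries.append(entry)
        return entry

    def complete(self, entry, value):
        """Called when a functional unit finishes, possibly out of order."""
        entry["value"] = value
        entry["complete"] = True

    def retire(self):
        """Move results to the register file, but only from the oldest end."""
        retired = []
        while self.entries and self.entries[0]["complete"]:
            entry = self.entries.popleft()
            self.regfile[entry["dest"]] = entry["value"]
            retired.append(entry["dest"])
        return retired


rob = ReorderBuffer()
fmul = rob.allocate("R4")      # older, long-latency instruction
add = rob.allocate("R3")       # younger, short-latency instruction

rob.complete(add, 1000)        # ADD finishes first
print(rob.retire())            # [] because the older FMUL is still executing
rob.complete(fmul, 101)
print(rob.retire())            # ['R4', 'R3'] in program order
```

The younger ADD has its result ready early, but the register file does not see it until the older FMUL has retired, so architectural state always changes in program order.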

Reorder buffer entries, listed from oldest (top) to youngest (bottom):
  Instr  V  DEST REG  DEST VAL  COMPLETE
  FMUL   1  R4        --        0
  ADD    1  R3        --        0
         1                      0
         1                      0
  FMUL   1                      0
  ADD

REORDER BUFFER: INDEPENDENT OPERATIONS (CYCLE 5)
Reorder buffer entries, listed from oldest (top) to youngest (bottom):
  Instr  V  DEST REG  DEST VAL  COMPLETE
  FMUL   1  R4        --        0
  ADD    1  R3        1000      1
         1                      0
         1                      0
  FMUL   1  R2        --        0
  ADD
Pipeline timing (R = write result to reorder buffer, W = retire):
  Cycle              0  1  2  3  4  5  6  7  8  9 10 11
  FMUL R4 ← R1, R2   F  D  E  E  E  E  E  E  E  E  R  W
  ADD  R3 ← R1, R2      F  D  E  R  W
                           F  D  E  R  W
                              F  D  E  R  W
  FMUL R2 ← R5, R6               F  D  E  E  E  E  E  E  E  E  R  W
  ADD  R4 ← R5, R6                  F  D  E  R  W
                                       F  D  E  R  W

REORDER BUFFER: INDEPENDENT OPERATIONS (CYCLE 5)
Reorder buffer entries, listed from oldest (top) to youngest (bottom):
  Instr  V  DEST REG  DEST VAL  COMPLETE
  FMUL   1  R4        --        0
  ADD    1  R3        1000      1
         1                      0
         1                      0
  FMUL   1  R2        --        0
  ADD    1  R4        --        0

REORDER BUFFER: INDEPENDENT OPERATIONS (CYCLE 11)
Reorder buffer entries, listed from oldest (top) to youngest (bottom):
  Instr  V  DEST REG  DEST VAL  COMPLETE
  FMUL   1  R4        101       0
  ADD    1  R3        1000      1
         1                      0
         1                      0
  FMUL   1  R2        --        0
  ADD    1  R4        --        0

REORDER BUFFER: INDEPENDENT OPERATIONS (CYCLE 12, RETIRE OLDEST)
Reorder buffer entries, listed from oldest (top) to youngest (bottom):
  Instr  V  DEST REG  DEST VAL  COMPLETE
  FMUL   1  R4        101       1
  ADD    1  R3        1000      1
         1                      0
         1                      0
  FMUL   1  R2        --        0
  ADD    1  R4        --        0

REORDER BUFFER: INDEPENDENT OPERATIONS (CYCLE 12, RETIRE OLDEST)
Reorder buffer entries, listed from oldest (top) to youngest (bottom):
  Instr  V  DEST REG  DEST VAL  COMPLETE
  FMUL   0  R4        101       1
  ADD    1  R3        1000      1
         1                      0
         1                      0
  FMUL   1  R2        --        0
  ADD    1  R4        --        0

REORDER BUFFER: INDEPENDENT OPERATIONS (CYCLE 12)
Reorder buffer entries, listed from oldest (top) to youngest (bottom); the first entry has been freed:
  Instr  V  DEST REG  DEST VAL  COMPLETE
         0
  ADD    1  R3        1000      1
         1                      0
         1                      0
  FMUL   1  R2        --        0
  ADD    1  R4        --        0
• What if a later operation needs a value in the reorder buffer?
  • Read reorder buffer in parallel with the register file. How?

REORDER BUFFER: HOW TO ACCESS?
• A register value can be in the register file, reorder buffer, (or bypass paths)
[Figure: instruction cache, register file, and functional units, with the reorder buffer implemented as a content addressable memory (searched with register ID) and a bypass path]
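A software analogue of the content-addressable lookup is a search of the reorder buffer by destination register ID, falling back to the register file when nothing matches. The sketch below is illustrative only: real hardware compares all entries in parallel, whereas this loop walks them one by one, and the function name `read_operand` and the dictionary layout are assumptions. The sample state follows the reorder buffer contents shown on the slides.

```python
def read_operand(reg, rob, regfile):
    """Return (ready, value) for source register `reg`.

    `rob` is a list of entries ordered oldest to youngest. The search runs
    youngest first, because if several in-flight instructions write the same
    register, a newly decoded instruction needs the most recent one.
    """
    for entry in reversed(rob):
        if entry["valid"] and entry["dest"] == reg:
            return entry["complete"], entry["value"]
    return True, regfile[reg]      # no in-flight writer: register file is current


rob = [
    {"valid": True, "dest": "R3", "value": 1000, "complete": True},   # ADD
    {"valid": True, "dest": "R2", "value": None, "complete": False},  # FMUL
    {"valid": True, "dest": "R4", "value": None, "complete": False},  # ADD
]
regfile = {"R1": 1, "R5": 5, "R6": 6}

print(read_operand("R3", rob, regfile))   # (True, 1000): forwarded from the ROB
print(read_operand("R2", rob, regfile))   # (False, None): producer still executing
print(read_operand("R5", rob, regfile))   # (True, 5): read from the register file
```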

Search for Register Value
Register file:
  Reg  VAL  V
  R1   1    1
  R2        0
  R3        0
  R4        0
  R5   5    1
  R6   6    1
  R7   8    1
  R8   8    1
  R9   9    1
  R10  10   1
  R11  11   0
Reorder buffer entries, listed from oldest (top) to youngest (bottom):
  Instr  V  DEST REG  DEST VAL  COMPLETE
         0
  ADD    1  R3        1000      1
         1                      0
         1                      0
         1  R2        --        0
  ADD    1  R4        --        0

SIMPLIFYING REORDER BUFFER ACCESS
• Idea: Use indirection
• Access register file first
  • If register not valid, register file stores the ID of the reorder buffer entry that contains (or will contain) the value of the register
  • Mapping of the register to a ROB entry
• Access reorder buffer next
• What is in a reorder buffer entry?
  V | DestRegID | DestRegVal | StoreAddr | StoreData | BranchTarget | PC/IP | Control/valid bits
• Can it be simplified further?

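Search for Register Value
Register file (TAG = ID of the reorder buffer entry that will produce the value):
  Reg  VAL  TAG  V
  R1   1         1
  R2        5    0
  R3        2    0
  R4        6    0
  R5   5         1
  R6   6         1
  R7   8         1
  R8   8         1
  R9   9         1
  R10  10        1
  R11  11        1
Reorder buffer entries, listed from oldest (top) to youngest (bottom):
  Instr  V  DEST REG  DEST VAL  COMPLETE
         0
  ADD    1  R3        1000      1
         1                      0
         1                      0
         1  R2        --        0
  ADD    1  R4        --        0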
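The indirection scheme replaces the associative search with a single indexed lookup. Here is a minimal Python sketch, assuming tags are 1-based positions in the reorder buffer (which is how the tags 5, 2, and 6 in the table line up with the entries for R2, R3, and R4). The dictionary layout and the names `read_register` and `retire` are hypothetical.

```python
# Each register is either valid (VAL is current) or carries the TAG of the
# reorder buffer entry that holds, or will hold, its value.
regfile = {
    "R1": {"valid": True, "value": 1, "tag": None},
    "R2": {"valid": False, "value": None, "tag": 5},
    "R3": {"valid": False, "value": None, "tag": 2},
    "R4": {"valid": False, "value": None, "tag": 6},
    "R5": {"valid": True, "value": 5, "tag": None},
}

# Reorder buffer indexed by entry ID; only the entries with a destination
# register in the table are modeled here.
rob = {
    2: {"dest": "R3", "value": 1000, "complete": True},
    5: {"dest": "R2", "value": None, "complete": False},
    6: {"dest": "R4", "value": None, "complete": False},
}


def read_register(reg):
    """Return (ready, value): register file first, then one ROB lookup by tag."""
    slot = regfile[reg]
    if slot["valid"]:
        return True, slot["value"]
    entry = rob[slot["tag"]]           # indexed access, no search by register ID
    return entry["complete"], entry["value"]


def retire(tag):
    """Write the entry's value to the register file and free the entry."""
    entry = rob.pop(tag)
    slot = regfile[entry["dest"]]
    slot["value"] = entry["value"]
    # Mark valid only if no younger instruction has re-tagged this register.
    if slot["tag"] == tag:
        slot["valid"] = True
        slot["tag"] = None


print(read_register("R1"))   # (True, 1): valid in the register file
print(read_register("R3"))   # (True, 1000): found through tag 2
print(read_register("R2"))   # (False, None): entry 5 has not completed
```

The tag check in `retire` matters when two in-flight instructions write the same register: the register must stay invalid until the youngest writer retires.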

REORDER BUFFER PROS AND CONS
• Pro
  • Conceptually simple for supporting precise exceptions
• Con
  • Reorder buffer needs to be accessed to get the results that are yet to be written to the register file
  • CAM or indirection → increased latency and complexity

Reorder Buffer in Intel Pentium III
Boggs et al., “The Microarchitecture of the Pentium 4 Processor,” Intel Technology Journal, 2001.

In-Order Pipeline with Reorder Buffer
• Decode (D): Access regfile/ROB, allocate entry in ROB, check if instruction can execute, if so dispatch instruction
• Execute (E): Instructions can complete out-of-order
• Completion (R): Write result to reorder buffer
• Retirement/Commit (W): Check for exceptions; if none, write result to architectural register file or memory; else, flush pipeline and start from exception handler
• In-order dispatch/execution, out-of-order completion, in-order retirement
[Figure: F and D stages feed parallel execution units (integer add, integer mul, FP mul, load/store) with different numbers of E stages, followed by R and W]
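The retirement rule (check for exceptions, commit if none, otherwise flush) is what makes exceptions precise. The sketch below models only that step, under simplifying assumptions: entries write registers only, an exception is a boolean flag recorded at completion, and the function returns the faulting instruction's PC in place of actually jumping to a handler. All names are illustrative.

```python
from collections import deque


def retire(rob, regfile):
    """Commit completed entries in program order.

    Returns the PC of a faulting instruction if one reaches the head of the
    reorder buffer, otherwise None.
    """
    while rob and rob[0]["complete"]:
        entry = rob.popleft()
        if entry["exception"]:
            rob.clear()                # flush every younger instruction
            return entry["pc"]
        regfile[entry["dest"]] = entry["value"]
    return None


rob = deque([
    {"pc": 0, "dest": "R3", "value": 1000, "complete": True, "exception": False},
    {"pc": 4, "dest": "R4", "value": None, "complete": True, "exception": True},
    {"pc": 8, "dest": "R2", "value": 7, "complete": True, "exception": False},
])
regfile = {}

faulting_pc = retire(rob, regfile)
print(faulting_pc, regfile, len(rob))   # 4 {'R3': 1000} 0
```

When the handler starts, the instruction before the faulting one has updated the register file and the one after it has not, even though the younger instruction had already finished executing.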

Out-of-Order Execution (Dynamic Instruction Scheduling)

AN IN-ORDER PIPELINE
[Figure: F and D stages feed parallel execution units (integer add, integer mul, FP mul, and a load/store unit that takes longer on a cache miss), followed by R and W]
• Problem: A true data dependency stalls dispatch of younger instructions into functional (execution) units
• Dispatch: Act of sending an instruction to a functional unit
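The stall can be made concrete with a small dispatch-time calculation. This is a hedged sketch, not a model of any particular machine: it assumes at most one dispatch per cycle, that a result is usable `latency` cycles after its producer dispatches, and that registers with no in-flight producer are ready at cycle 0. The three-instruction program and its latencies are hypothetical.

```python
def dispatch_cycles(program):
    """Return the dispatch cycle of each instruction under in-order dispatch.

    `program` is a list of (dest, sources, latency) tuples in program order.
    """
    ready_at = {}                  # register -> first cycle its new value is usable
    next_slot = 0                  # earliest cycle the next instruction may dispatch
    cycles = []
    for dest, sources, latency in program:
        start = max([next_slot] + [ready_at.get(src, 0) for src in sources])
        cycles.append(start)
        ready_at[dest] = start + latency
        next_slot = start + 1      # in order: nothing younger may go earlier
    return cycles


program = [
    ("R3", ["R1", "R2"], 8),       # long-latency multiply
    ("R3", ["R3", "R1"], 1),       # true dependency on the multiply
    ("R4", ["R6", "R7"], 1),       # independent of both
]

print(dispatch_cycles(program))    # [0, 8, 9]
```

The third instruction depends on nothing in flight and could have dispatched at cycle 1, but it waits until cycle 9 behind the stalled second instruction. Removing that restriction is the motivation for out-of-order execution.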
