1 forwarding idea read wrong value (e.g. from register) correct - PowerPoint PPT Presentation

forwarding idea
    read wrong value (e.g. from register)
    correct value is already computed elsewhere in pipeline
    maybe even after old value was read
    substitute the correct value for the wrong value using MUX

quiz question: forwarding in IRMOVQ
    cycle #            0 1 2 3 4 5 6 7 8
    irmovq $50, %r8    F D E M W
    addq %r11, %r8       F D E M W
output of decode/execute regs (irmovq) (unchanged during execute stage)
input of execute/memory regs (irmovq)
input of decode/execute regs (addq)

forwarding logic
[figure: datapath with fetch/decode, decode/execute and execute/writeback pipeline registers, instruction memory, register file (srcA, srcB, dstE, dstM), PC, ADD units, and forwarding MUXes]

some forwarding paths
    cycle #                 0 1 2 3 4 5 6 7 8
    addq %r8, %r9           F D E M W
    subq %r9, %r11            F D E M W
    mrmovq 4(%r11), %r10        F D E M W
    rmmovq %r9, 8(%r11)           F D E M W
    xorq %r10, %r9                  F D E M W

forwarding in HCL
    register dE {
        valA : 64 = 0;
        dstE : 4 = 0;
    };
    ...
    /* was: d_valA = reg_outputA; */
    d_valA = [
        reg_srcA == e_dstE : e_valE;
        ...
        1 : reg_outputA;
    ];
    d_dstE = ...;
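The mux above can be mimicked in ordinary code to see which value reaches the decode/execute register. The following is a minimal Python sketch, not the course's HCL: the function name, the `REG_NONE` guard, and the example values are assumptions introduced for illustration.

```python
REG_NONE = 0xF  # assumption: Y86-64 encoding for "no register"

def select_d_valA(reg_srcA, reg_outputA, e_dstE, e_valE):
    """Python stand-in for the HCL mux: the first matching case wins."""
    # Assumption: guard against REG_NONE so an instruction without a
    # source register never matches an instruction without a destination.
    if reg_srcA != REG_NONE and reg_srcA == e_dstE:
        return e_valE        # forward the value still in the pipeline
    return reg_outputA       # default case "1 : reg_outputA"

# addq %r8, %r9 is in execute (dstE = register 9) while
# subq %r9, %r11 is in decode (srcA = register 9).
stale = 100   # hypothetical old contents of %r9 in the register file
fresh = 142   # hypothetical ALU result of the addq
print(select_d_valA(9, stale, 9, fresh))   # forwarded value
print(select_d_valA(10, 7, 9, fresh))      # no match: register file value
```

The second call shows the fall-through case: when the source register does not match the destination of the instruction in execute, the register file output is used unchanged.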

unsolved problem
    cycle #                   0 1 2 3 4 5 6 7 8
    mrmovq 0(%rax), %rbx      F D E M W
    subq %rbx, %rcx             F D E M W
    subq %rbx, %rcx (stall)     F F D E M W
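The condition that forces this stall can be written as a small predicate. This is a hedged Python sketch: the function name, the string instruction codes, and the register numbering (%rax = 0, %rcx = 1, %rbx = 3, following the usual Y86-64 encoding) are assumptions for illustration, not part of the slides.

```python
REG_NONE = 0xF                 # assumption: Y86-64 "no register" code
LOADS = {"mrmovq", "popq"}     # instructions whose result comes from memory

def needs_load_use_stall(exec_icode, exec_dstM, dec_srcA, dec_srcB):
    """True when the instruction in decode reads a register that the
    instruction one stage ahead will only obtain from memory."""
    if exec_icode not in LOADS or exec_dstM == REG_NONE:
        return False
    return exec_dstM in (dec_srcA, dec_srcB)

# mrmovq 0(%rax), %rbx followed immediately by subq %rbx, %rcx
print(needs_load_use_stall("mrmovq", 3, 3, 1))
# addq results come from the ALU, so forwarding alone is enough
print(needs_load_use_stall("addq", REG_NONE, 3, 1))
```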

multiple forwarding paths
    cycle #            0 1 2 3 4 5 6 7 8
    addq %r10, %r8     F D E M W
    addq %r11, %r8       F D E M W
    addq %r12, %r8         F D E M W

multiple forwarding HCL
    d_valA = [
        ...
        reg_srcA == e_dstE : e_valE;
        reg_srcA == m_dstE : m_valE;
        ...
        1 : reg_outputA;
    ];
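The order of the cases matters: the execute-stage case is listed first, so the most recently computed value wins when two in-flight instructions write the same register. A minimal Python sketch of that priority follows; the function name and the register contents are hypothetical, and the same selection is applied here to a generic source register rather than only to srcA.

```python
def forward(src, reg_output, e_dstE, e_valE, m_dstE, m_valE):
    """Pick the newest available value for register `src`.
    Cases are checked in the same order as in the HCL mux."""
    if src == e_dstE:
        return e_valE      # newest: instruction currently in execute
    if src == m_dstE:
        return m_valE      # older: instruction currently in memory
    return reg_output      # oldest: what the register file holds

# Hypothetical starting values: %r8 = 1, %r10 = 10, %r11 = 20.
# While the third addq is in decode, the first addq (%r8 = 1 + 10)
# is in memory and the second (%r8 = 11 + 20) is in execute.
r8_in_register_file = 1
m_valE = 11
e_valE = 31
print(forward(8, r8_in_register_file, 8, e_valE, 8, m_valE))
```

Swapping the two cases would hand the third addq the value 11 from the older instruction, which is wrong.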

multiple forwarding paths (2)
    cycle #            0 1 2 3 4 5 6 7 8
    addq %r10, %r8     F D E M W
    addq %r11, %r12      F D E M W
    addq %r12, %r8         F D E M W

after forwarding/prediction
where do we still need to stall?
    memory output needed in fetch: ret followed by anything
    memory output needed in execute: mrmovq or popq + use (in immediately following instruction)

overall CPU
5 stage pipeline
1 instruction completes every cycle, except hazards
    most data hazards: solved by forwarding
    load/use hazard: 1 cycle of stalling
    jXX control hazard: branch prediction + squashing, 2 cycle penalty for misprediction
    ret control hazard: 3 cycles of stalling
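These penalties combine into an average cycles-per-instruction figure. The sketch below is a back-of-the-envelope Python calculation; the function name and the instruction-mix fractions are made up for illustration, and it assumes the three penalties never overlap.

```python
def estimate_cpi(load_use_frac, mispredict_frac, ret_frac):
    """Average cycles per instruction for the 5-stage pipeline.
    Each argument is the fraction of all instructions that triggers
    the corresponding penalty (hypothetical workload numbers)."""
    return (1.0
            + 1 * load_use_frac      # load/use hazard: 1 stall cycle
            + 2 * mispredict_frac    # mispredicted jXX: 2 squashed cycles
            + 3 * ret_frac)          # ret: 3 stall cycles

# Hypothetical mix: 5% load/use, 4% mispredicted jumps, 2% rets
print(round(estimate_cpi(0.05, 0.04, 0.02), 2))  # 1.19
```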

pipelined control costs
how much faster than single-cycle processor? at most five times faster
    depends on hardware details: does added logic make clock cycle slower?
    depends on what programs we run: how many mispredicted jumps? how many rets? how many load/use hazards?

hazards versus dependencies
    dependency: X needs result of instruction Y?
    hazard: will it not work in some pipeline?
        before extra work is done to “resolve” hazards, like forwarding or stalling or branch prediction

ex.: dependencies and hazards (1)
    addq %rax, %rbx
    subq %rax, %rcx
    irmovq $100, %rcx
    addq %rcx, %r10
    addq %rbx, %r10
where are dependencies?
which are hazards in our pipeline?
which are resolved with forwarding?

ex.: dependencies and hazards (2)
    mrmovq 0(%rax), %rbx
    addq %rbx, %rcx
    jne foo
    foo: addq %rcx, %rdx
    mrmovq (%rdx), %rcx
where are dependencies?
which are hazards in our pipeline?
which are resolved with forwarding?

pipeline with different hazards
example: 4-stage pipeline: fetch/decode/execute+memory/writeback
                        // 4 stage   // 5 stage
    addq %rax, %r8      //           // W
    subq %rax, %r9      // W         // M
    xorq %rax, %r10     // EM        // E
    andq %r8, %r11      // D         // D
addq/andq is hazard with 5-stage pipeline
addq/andq is not a hazard with 4-stage pipeline
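Whether a dependency becomes a hazard can be checked mechanically from the distance between the two instructions and the stage layout. The Python sketch below is illustrative only: the helper names and the tuple representation of instructions are hypothetical, and it assumes registers are read in D, written at the end of W, one instruction enters the pipeline per cycle, and nothing (forwarding, stalling) has been added to resolve hazards.

```python
def raw_dependencies(program):
    """program: list of (text, dst, srcs). Returns (producer, consumer, reg)
    for every read of a register written by an earlier instruction."""
    last_writer, deps = {}, []
    for i, (_, dst, srcs) in enumerate(program):
        for reg in srcs:
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        if dst is not None:
            last_writer[dst] = i
    return deps

def is_hazard(producer, consumer, stages):
    """The consumer decodes too early if it follows the producer by no
    more than the number of stages separating D from W."""
    gap = stages.index("W") - stages.index("D")
    return consumer - producer <= gap

program = [
    ("addq %rax, %r8",  "%r8",  ("%rax", "%r8")),
    ("subq %rax, %r9",  "%r9",  ("%rax", "%r9")),
    ("xorq %rax, %r10", "%r10", ("%rax", "%r10")),
    ("andq %r8, %r11",  "%r11", ("%r8", "%r11")),
]
five_stage = ["F", "D", "E", "M", "W"]
four_stage = ["F", "D", "EM", "W"]

for producer, consumer, reg in raw_dependencies(program):
    print(reg, "5-stage hazard:", is_hazard(producer, consumer, five_stage),
          "4-stage hazard:", is_hazard(producer, consumer, four_stage))
```

The dependency on %r8 is the same in both cases; only the pipeline decides whether it is a hazard.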

exercise: different pipeline
split execute into two stages: F/D/E1/E2/M/W
    cycle #                      0  1  2  3  4  5  6  7  8
    addq %rcx, %r9               F  D  E1 E2 M  W
    addq %r9, %rbx                  F  D  E1 E2 M  W
    addq %rax, %r9                     F  D  E1 E2 M  W
    addq %r9, %rbx (stalled)        F  D  D  E1 E2 M  W
    addq %rax, %r9 (stalled)           F  F  D  E1 E2 M  W

exercise: forwarding paths
    cycle #                 0 1 2 3 4 5 6 7 8
    addq %rcx, %r9          F D E M W
    rmmovq %r9, 8(%r8)        F D E M W
    mrmovq 8(%r9), %r11         F D E M W
    pushq %r11                    F D E M W
    popq %r10                       F D E M W

exercise: forwarding paths (alt pipe)
suppose four-stage pipeline: fetch/decode+execute/memory/writeback
    cycle #                 0  1  2  3  4  5  6  7  8
    addq %rcx, %r9          F  DE M  W
    rmmovq %r9, 8(%r8)         F  DE M  W
    mrmovq 8(%r9), %r11           F  DE M  W
    pushq %r11                       F  DE M  W
    popq %r10                           F  DE M  W


pipelined control costs
how much faster than single-cycle processor? at most five times faster
    depends on HW details:
        how expensive is forwarding logic? (new MUXes on critical path)
        how well balanced are the stages?
    depends on what programs we run:
        how many mispredicted jumps?
        how many rets?
        how many load/use hazards?
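A rough model shows how stage balance, added logic and hazards each pull the speedup below five. This Python sketch is an approximation under stated assumptions: the single-cycle clock period is taken as the sum of the stage delays, the pipelined clock period as the slowest stage plus a fixed overhead, and every delay and CPI figure is hypothetical.

```python
def speedup(stage_delays, overhead, cpi):
    """Time per instruction on a single-cycle design divided by the
    time per instruction on the pipelined design."""
    single_cycle_time = sum(stage_delays)          # one long cycle
    pipelined_cycle = max(stage_delays) + overhead  # slowest stage + extras
    return single_cycle_time / (pipelined_cycle * cpi)

# Perfectly balanced stages, free pipeline registers, no hazards
print(speedup([100, 100, 100, 100, 100], 0, 1.0))   # 5.0

# Hypothetical unbalanced stages (picoseconds), 20 ps of register and
# forwarding-MUX overhead, and a CPI of 1.19 from hazards
print(speedup([100, 100, 200, 100, 100], 20, 1.19))
```

The first call is the ideal bound from the slide; the second stays well under it because one slow stage sets the clock for all five.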

HCL2D pipeline registers
    /* Fetch+PC Update */
    register xF {
        pc : 64 = 0;
    };
    /* Decode */
    register fD {
        rA : 4 = REG_NONE;
        rB : 4 = REG_NONE;
    };
    /* Execute */
    register dE {
        valA : 64 = 0;
        valB : 64 = 0;
        dstE : 4 = REG_NONE;
    }
    /* Writeback */
    register eW {
        valE : 64 = 0;
        dstE : 4 = REG_NONE;
    }

HCL2D: Fetch/Decode
unpipelined
    /* Fetch+PC Update */
    pc = F_pc;
    x_pc = pc + 2;
    rA = i10bytes[12..16];
    rB = i10bytes[8..12];
    /* Decode */
    reg_srcA = rA;
    reg_srcB = rB;
    dstE = rB;
    valA = reg_outputA;
    valB = reg_outputB;
pipelined
    /* Fetch+PC Update */
    pc = F_pc;
    x_pc = pc + 2;
    f_rA = i10bytes[12..16];
    f_rB = i10bytes[8..12];
    /* Decode */
    reg_srcA = D_rA;
    reg_srcB = D_rB;
    d_dstE = D_rB;
    d_valA = reg_outputA;
    d_valB = reg_outputB;
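The bit ranges in the fetch stage pick the two register fields out of the second instruction byte. The following Python sketch shows the same extraction, assuming the fetched bytes are treated as one little-endian integer and that a range like [8..12] covers bits 8 up to but not including 12; the function name is hypothetical.

```python
def fetch_fields(instr_bytes, pc):
    """Extract rA and rB the way the HCL bit slices do, and compute the
    next PC for a two-byte instruction."""
    word = int.from_bytes(instr_bytes, "little")
    rB = (word >> 8) & 0xF    # i10bytes[8..12]: low nibble of byte 1
    rA = (word >> 12) & 0xF   # i10bytes[12..16]: high nibble of byte 1
    next_pc = pc + 2          # x_pc = pc + 2
    return rA, rB, next_pc

# addq %r8, %r9 is encoded as the two bytes 0x60 0x89
print(fetch_fields(bytes([0x60, 0x89]), 0))  # (8, 9, 2)
```

In the pipelined version these values are written to f_rA and f_rB and only become visible to decode as D_rA and D_rB one cycle later.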
