CSCI341 Lecture 37, Introduction to Parallelism

PIPELINING
“Exploits potential parallelism among instructions.”
“Instruction-level parallelism”

INSTRUCTION-LEVEL PARALLELISM
• Increase depth of pipeline (greater overlap of instructions)
• Replicate hardware (handle more instructions simultaneously), aka “multiple issue”

MULTIPLE ISSUE
• Instruction execution rate can exceed the clock rate
• CPI can be less than 1

EXAMPLE
A 4 GHz four-way multiple-issue microprocessor...
• Peak rate of 16 billion instructions per second
• Ideal CPI of 0.25 (IPC of 4)
• In a five-stage pipeline, up to 20 instructions in progress at once
(Modern CPUs attempt to issue 3 to 6 instructions per cycle.)
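The figures in this example follow from three inputs: clock rate, issue width, and pipeline depth. A minimal sketch of that arithmetic, assuming an ideal machine with no hazards or stalls (the function and key names are illustrative, not from the lecture):

```python
# Ideal-case arithmetic for a multiple-issue processor.
# Assumption: every issue slot is filled on every cycle (no hazards, no stalls).

def peak_figures(clock_hz, issue_width, pipeline_stages):
    """Return best-case throughput figures for the given machine."""
    return {
        "instructions_per_second": clock_hz * issue_width,
        "ideal_cpi": 1 / issue_width,
        "ideal_ipc": issue_width,
        "instructions_in_flight": issue_width * pipeline_stages,
    }

# The lecture's machine: 4 GHz, four-way issue, five pipeline stages.
figures = peak_figures(clock_hz=4e9, issue_width=4, pipeline_stages=5)
for name, value in figures.items():
    print(f"{name}: {value:g}")
```

These are upper bounds; the challenge slides that follow explain why real programs fall short of them.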

2 IMPORTANT IMPLEMENTATIONS
• Static multiple issue: decisions made at compile time
• Dynamic multiple issue: decisions made during execution

CHALLENGES
• How does the CPU determine how many instructions (and which instructions) can be issued?
• How do we deal with data/control hazards?

SPECULATION
• The compiler / CPU “guesses” about the properties of an instruction
• e.g., branching, storing & loading
• Potential for bad guesses (changing the decision is complex)
• Buffering speculated instructions
• Buffering exceptions

STATIC MULTIPLE ISSUE SYSTEM
Heavy reliance on the compiler.

ISSUE PACKET
• Set of instructions issued in a given clock cycle
• Very Long Instruction Word (VLIW)

CONSIDER...
A two-issue MIPS processor (let’s call it “TIM”).
• One instruction can be an ALU operation or a branch
• The other can be a load/store

TIM
• How many bits of instructions per cycle?
• Instructions are paired and aligned.
• ALU/branch instruction is “first.”
• If one member of the pair can’t be used, replace it with a nop.
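The pairing rules above can be sketched as a small packet builder. This is only an illustration under stated assumptions: `make_packet`, `MEMORY_OPCODES`, and the tuple representation are hypothetical, and the instruction strings are borrowed from the loop example later in the lecture.

```python
# Sketch of a two-slot TIM issue packet: (ALU/branch slot, load/store slot).
# Assumption: instructions are plain strings and lw/sw are the only data transfers.

NOP = "nop"
INSTRUCTION_BITS = 32          # every MIPS instruction is 32 bits wide
MEMORY_OPCODES = {"lw", "sw"}

def opcode(instr):
    return instr.split()[0]

def make_packet(alu_or_branch=None, load_or_store=None):
    """Build one packet, ALU/branch first; an unused slot becomes a nop."""
    if alu_or_branch and opcode(alu_or_branch) in MEMORY_OPCODES:
        raise ValueError("first slot takes an ALU or branch instruction")
    if load_or_store and opcode(load_or_store) not in MEMORY_OPCODES:
        raise ValueError("second slot takes a load or store")
    return (alu_or_branch or NOP, load_or_store or NOP)

packets = [
    make_packet(load_or_store="lw $t0, 0($s1)"),
    make_packet(alu_or_branch="addi $s1, $s1, -4"),
    make_packet("bne $s1, $zero, Loop", "sw $t0, 4($s1)"),
]

# Two 32-bit instructions are fetched per cycle.
bits_per_cycle = len(packets[0]) * INSTRUCTION_BITS
print(packets, bits_per_cycle)
```

Fixing the slot order keeps decoding simple: the hardware always knows which half of the packet goes to which functional unit.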

TIM
Two instructions per stage at a time.

TIM HAZARDS
• Sometimes, it’s the compiler’s full responsibility
• Remove hazards by arranging/scheduling instructions
• Inserting NOPs where necessary, etc.

TIM HAZARDS
• Sometimes, the hardware detects hazards between issue packets
• Generates stalls
• Still relies on the compiler to generate appropriate packets

TIM’S DATAPATH
• 32 more bits from instruction memory
• Two more read “ports” and one more write “port” on the register file
• Extra ALU

NO MAGIC SPEED BOOST
Potential to double performance.
Potential for hazards to impact two instructions.

USE LATENCY (“use” is a noun here)
“Number of clock cycles between a load instruction and an instruction that can use the result of the load without stalling the pipeline.”

MIPS: use latency of one cycle.
TIM: that one cycle potentially impacts two instructions.
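One way to see why the use latency affects two instructions on TIM is to check every instruction in the packet that follows a load. The sketch below assumes packets are `(ALU/branch, load/store)` string pairs and uses a deliberately tiny parser; `parse` and `load_use_hazards` are hypothetical helpers, not part of the lecture.

```python
import re

def parse(instr):
    """Return (opcode, destination register or None, set of source registers)."""
    op, _, rest = instr.partition(" ")
    regs = re.findall(r"\$\w+", rest)
    if op == "nop":
        return op, None, set()
    if op in ("sw", "bne", "beq"):       # these write no register
        return op, None, set(regs)
    return op, regs[0], set(regs[1:])    # first register is the destination

def load_use_hazards(packets, use_latency=1):
    """List (load, consumer) pairs where a consumer issues too soon after a load."""
    hazards = []
    for i, packet in enumerate(packets):
        for instr in packet:
            op, dest, _ = parse(instr)
            if op != "lw":
                continue
            # With a use latency of one cycle, both slots of the next packet are at risk.
            for later in packets[i + 1 : i + 1 + use_latency]:
                for consumer in later:
                    if dest in parse(consumer)[2]:
                        hazards.append((instr, consumer))
    return hazards

naive = [("nop", "lw $t0, 0($s1)"), ("addu $t0, $t0, $s2", "nop")]
spaced = [("nop", "lw $t0, 0($s1)"), ("addi $s1, $s1, -4", "nop"),
          ("addu $t0, $t0, $s2", "nop")]
print(load_use_hazards(naive))
print(load_use_hazards(spaced))
```

The second schedule reports no hazards because an independent instruction fills the packet right after the load.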

TIM really needs to rely on the compiler.

EXAMPLE
Loop: lw   $t0, 0($s1)
      addu $t0, $t0, $s2
      sw   $t0, 0($s1)
      addi $s1, $s1, -4
      bne  $s1, $zero, Loop
How might we schedule this for TIM?

EXAMPLE
      ALU/branch ins.         Data xfer ins.     clock cycle
Loop:                         lw $t0, 0($s1)     1
      addi $s1, $s1, -4                          2
      addu $t0, $t0, $s2                         3
      bne $s1, $zero, Loop    sw $t0, 4($s1)     4
(The sw offset becomes 4 because the addi now executes before the sw.)
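To check that the reordered schedule still computes the same result, both versions can be run through a toy interpreter. This is a sketch under several assumptions: only the five opcodes in the loop are supported, a taken branch always jumps to the first packet, both slots of a packet read the state from before that cycle, and pipeline fill and stalls are ignored. The register and memory values are made up.

```python
import re

def effects(instr, regs, mem):
    """Evaluate one instruction without changing state.
    Returns (register writes, memory writes, branch taken)."""
    op, _, rest = instr.partition(" ")
    args = [a.strip() for a in rest.split(",")]
    if op == "nop":
        return {}, {}, False
    if op in ("lw", "sw"):
        offset, base = re.fullmatch(r"(-?\d+)\((\$\w+)\)", args[1]).groups()
        address = regs[base] + int(offset)
        if op == "lw":
            return {args[0]: mem[address]}, {}, False
        return {}, {address: regs[args[0]]}, False
    if op == "addu":
        return {args[0]: regs[args[1]] + regs[args[2]]}, {}, False
    if op == "addi":
        return {args[0]: regs[args[1]] + int(args[2])}, {}, False
    if op == "bne":
        return {}, {}, regs[args[0]] != regs[args[1]]
    raise ValueError(f"unsupported instruction: {instr}")

def run(packets, regs, mem):
    """Issue one packet per cycle; return (final memory, cycles issued)."""
    regs, mem = dict(regs), dict(mem)
    pc = cycles = 0
    while pc < len(packets):
        results = [effects(i, regs, mem) for i in packets[pc]]  # read old state
        for reg_writes, mem_writes, _ in results:               # then write back
            regs.update(reg_writes)
            mem.update(mem_writes)
        pc = 0 if any(taken for _, _, taken in results) else pc + 1
        cycles += 1
    return mem, cycles

original = ["lw $t0, 0($s1)", "addu $t0, $t0, $s2", "sw $t0, 0($s1)",
            "addi $s1, $s1, -4", "bne $s1, $zero, Loop"]
scheduled = [("nop", "lw $t0, 0($s1)"), ("addi $s1, $s1, -4", "nop"),
             ("addu $t0, $t0, $s2", "nop"),
             ("bne $s1, $zero, Loop", "sw $t0, 4($s1)")]

regs = {"$zero": 0, "$t0": 0, "$s1": 12, "$s2": 10}   # made-up starting state
mem = {4: 1, 8: 2, 12: 3}

mem_seq, instructions = run([(i,) for i in original], regs, mem)
mem_tim, cycles = run(scheduled, regs, mem)
print(mem_seq == mem_tim, f"CPI = {cycles / instructions:.2f}")
```

Under these assumptions the loop issues five instructions every four cycles, a CPI of 0.8 against TIM's ideal of 0.5, because only one of the four packets uses both slots.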

LOOP UNROLLING (compiler technique)
• For loops that access arrays, make multiple copies of the loop body.
• Schedule instructions from different iterations together.
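The idea can be sketched at source level before worrying about MIPS scheduling. The example below assumes the same kind of loop as the lecture's example (add a scalar to every array element, walking from the end) and an array length that is a multiple of four; both function names are illustrative.

```python
def add_scalar_rolled(values, scalar):
    """One element per iteration: load, add, store, decrement, branch."""
    out = list(values)
    i = len(out) - 1
    while i >= 0:
        out[i] += scalar
        i -= 1
    return out

def add_scalar_unrolled4(values, scalar):
    """Four copies of the loop body per iteration.
    Assumption: len(values) is a multiple of 4, so no cleanup loop is needed."""
    if len(values) % 4:
        raise ValueError("length must be a multiple of 4")
    out = list(values)
    i = len(out) - 4
    while i >= 0:
        # Four independent loads; separate temporaries keep the copies independent.
        t0, t1, t2, t3 = out[i + 3], out[i + 2], out[i + 1], out[i]
        # Four independent adds that a scheduler is free to interleave.
        t0, t1, t2, t3 = t0 + scalar, t1 + scalar, t2 + scalar, t3 + scalar
        # Four stores, then a single index update and branch for all four elements.
        out[i + 3], out[i + 2], out[i + 1], out[i] = t0, t1, t2, t3
        i -= 4
    return out

data = [1, 2, 3, 4, 5, 6, 7, 8]
print(add_scalar_rolled(data, 10) == add_scalar_unrolled4(data, 10))
```

Unrolling pays off in two ways: the index update and branch are amortized over four elements, and the independent loads, adds, and stores give the compiler more instructions to pair into issue packets.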

LOOP UNROLLING
Challenge: how does this work? (pp. 396-398)

HOMEWORK
• Reading 31
• Continue Project 8
TIMmeh!
