Processor Architectures

ì ¡ Computer ¡Systems ¡and ¡Networks ¡ ECPE ¡170 ¡– ¡Jeff ¡Shafer ¡– ¡University ¡of ¡the ¡Pacific ¡ Processor ¡ Architectures ¡

Slide 2: Schedule
- Friday, April 13th - Pacific Day - No class
- Exam 3 - Friday, April 20th
  - Caches
  - Virtual Memory
  - Input / Output
  - Operating Systems
  - Compilers & Assemblers
  - Processor Architecture
- Review the lecture notes before the exam (not just the homework!)
- No calculators for this exam
Computer Systems and Networks, Spring 2012

Slide 3: Flynn's Taxonomy

Slide 4: Flynn's Taxonomy
- Many attempts have been made to come up with a way to categorize computer architectures
- Flynn's Taxonomy has been the most enduring of these
  - But it is not perfect!
- Considerations
  - Number of processors?
  - Number of data paths? (or data streams)

Slide 5: Flynn's Taxonomy
- SISD: Single instruction stream, single data stream
  - Classic uniprocessor system (e.g. MARIE)
- SIMD: Single instruction stream, multiple data streams
  - Execute the same instruction on multiple data values
  - Example: Vector processor
- MIMD: Multiple instruction streams, multiple data streams
  - Today's parallel architectures
- MISD: Multiple instruction streams, single data stream
  - Uncommon - used for fault tolerance
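
The four categories above follow mechanically from two counts: how many instruction streams and how many data streams a machine has. A minimal sketch of that mapping, where the function name `flynn_class` and the example stream counts are illustrative assumptions rather than anything from the slides:

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return the Flynn category (SISD, SIMD, MISD, MIMD) for the given stream counts."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    i = "S" if instruction_streams == 1 else "M"   # single or multiple instruction streams
    d = "S" if data_streams == 1 else "M"          # single or multiple data streams
    return f"{i}I{d}D"


# Hypothetical machines; the stream counts are made up for illustration.
for name, streams in [("uniprocessor", (1, 1)),
                      ("vector processor", (1, 8)),
                      ("multi-core CPU", (4, 4))]:
    print(name, "->", flynn_class(*streams))
```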

Slide 6: Instruction-Level Parallelism
- Example program: (imagine it was in assembly)
    ① e = a + b;
    ② f = c + d;
    ③ g = e * h;
- Assume we have a processor with "lots" of ALUs
  - What instructions can be executed in parallel?
  - What instructions cannot be executed in parallel?
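
One way to answer the two questions above is to compare, for every pair of statements, which variables each one reads and writes. The sketch below does this for the three-statement example; the tuple encoding and the helper names `depends_on` and `independent_pairs` are assumptions made for illustration.

```python
# Each statement: (label, variable written, set of variables read)
PROGRAM = [
    (1, "e", {"a", "b"}),   # e = a + b
    (2, "f", {"c", "d"}),   # f = c + d
    (3, "g", {"e", "h"}),   # g = e * h
]


def depends_on(later, earlier):
    """True if `later` must wait for `earlier` because of a data hazard."""
    _, w_later, r_later = later
    _, w_earlier, r_earlier = earlier
    return (w_earlier in r_later       # read-after-write
            or w_later in r_earlier    # write-after-read
            or w_later == w_earlier)   # write-after-write


def independent_pairs(program):
    """List (earlier, later) label pairs with no data dependence between them."""
    pairs = []
    for i, earlier in enumerate(program):
        for later in program[i + 1:]:
            if not depends_on(later, earlier):
                pairs.append((earlier[0], later[0]))
    return pairs


print(independent_pairs(PROGRAM))  # [(1, 2), (2, 3)]
```

Statements 1 and 2 share no variables, so they can run together. Statement 3 reads `e`, which statement 1 writes, so it has to wait for statement 1.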

Slide 7: Instruction-Level Parallelism
- Example program 2: (imagine it was in assembly)
    ① e = a + b;
    ② f = c + d;
    ③ if(e > f)
    ④     a = 15;
    ⑤ else
    ⑥     a = 18;
    ⑦ g = h + 30;
- Assume we have a processor with "lots" of ALUs
  - What instructions can be executed in parallel?
  - What instructions cannot be executed in parallel?
    - If we tried really hard, could we run them in parallel?
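
For the second example the branch adds control dependences on top of the data dependences. A rough sketch that computes the earliest step at which each statement could start, assuming unlimited ALUs and no speculation; the dependence table `DEPS` is hand-written from the program above and the function name `earliest_step` is an assumption:

```python
# label -> labels it must wait for (data or control dependence)
DEPS = {
    1: set(),     # e = a + b
    2: set(),     # f = c + d
    3: {1, 2},    # the comparison reads e and f
    4: {3},       # a = 15 runs only if the branch is taken
    6: {3},       # a = 18 runs only if the branch is not taken
    7: set(),     # g = h + 30 reads only h
}


def earliest_step(deps):
    """Map each label to the earliest step (0-based) at which it can execute."""
    step = {}

    def visit(label):
        if label not in step:
            # One step after the latest dependency; step 0 if there are none.
            step[label] = 1 + max((visit(d) for d in deps[label]), default=-1)
        return step[label]

    for label in deps:
        visit(label)
    return step


print(earliest_step(DEPS))  # {1: 0, 2: 0, 3: 1, 4: 2, 6: 2, 7: 0}
```

Statements 1, 2, and 7 can all start immediately, while 4 and 6 wait for the comparison. Statements 4 and 6 land in the same step, but only one of them executes on any given run.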

Slide 8: Instruction-Level Parallelism
- This is instruction-level parallelism
  - Finding instructions in the same program that can be executed in parallel
  - Different from multi-core parallelism, which executes instructions from different programs in parallel
- You can do this in a single "core" of a CPU
  - Adding more ALUs to the chip is easy
  - Finding the parallelism to exploit is harder…
  - Getting the data to the ALUs is harder…

Slide 9: Instruction-Level Parallelism
- Instruction-level parallelism is good
  - Let's find as much of it as possible and use it to decrease execution time!
- Two competing methods:
  - Superscalar: the hardware finds the parallelism
  - VLIW: the compiler finds the parallelism
- Both designs have multiple execution units (e.g. ALUs) in a single processor core

Slide 10: MIMD - Superscalar
- Superscalar designs - the hardware finds the instruction-level parallelism while the program is running
- Challenges
  - CPU instruction fetch unit must simultaneously retrieve several instructions from memory
  - CPU instruction decoding unit determines which of these instructions can be executed in parallel and combines them accordingly
  - Complicated!

Slide 11: MIMD - VLIW
- Very long instruction word (VLIW) designs - the compiler finds the instruction-level parallelism before the program executes
  - The compiler packs multiple instructions into one long instruction that the hardware executes in parallel
- Arguments:
  - For: Simplifies hardware, plus the compiler can better identify instruction dependencies (it has more time to work)
  - Against: Compilers cannot have a view of the run-time code, and must plan for all possible branches and code paths
- Examples: Intel Itanium, ATI R600-R900 GPUs
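
The packing step can be pictured as a small scheduling problem. The sketch below is a simplified greedy packer, not how any particular VLIW compiler works: the function name `pack_bundles`, the bundle `width`, and the dependence table are all assumptions, and it uses the three-statement program from the earlier slide.

```python
def pack_bundles(deps, width):
    """Greedily place each instruction in the earliest bundle that comes after
    all of its dependencies and still has a free slot.

    `deps` maps label -> set of labels it depends on, listed in program order,
    with every dependency appearing before the instructions that need it.
    """
    bundles = []   # each bundle is a list of labels issued together
    placed = {}    # label -> index of the bundle it was placed in
    for label in deps:
        # Must come strictly after the latest bundle holding a dependency.
        idx = 1 + max((placed[d] for d in deps[label]), default=-1)
        # Skip bundles that are already full.
        while idx < len(bundles) and len(bundles[idx]) >= width:
            idx += 1
        while len(bundles) <= idx:
            bundles.append([])
        bundles[idx].append(label)
        placed[label] = idx
    return bundles


# Example program 1: statement 3 depends on statement 1 (it reads e).
example = {1: set(), 2: set(), 3: {1}}
print(pack_bundles(example, width=2))  # [[1, 2], [3]]
print(pack_bundles(example, width=1))  # [[1], [2], [3]]
```

With two slots per long instruction, statements 1 and 2 share a bundle and statement 3 goes in the next one. With one slot the program degenerates to ordinary sequential issue.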

Slide 12: Instruction-Level Parallelism
- Back to the example program:
    ① e = a + b;
    ② f = c + d;
    ③ if(e > f)
    ④     a = 15;
    ⑤ else
    ⑥     a = 18;
    ⑦ g = h + 30;
- More techniques for ILP
  - Speculative execution (or branch prediction)
    - Guess that e>f, and execute line 4 immediately…
  - Out-of-order execution
    - Execute line 7 before 4-6, since it doesn't depend on them
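
Both techniques must leave the program's results unchanged. The toy model below mimics them in software to make that point: it hoists line 7, executes the guessed side of the branch early, and repairs the guess if it was wrong. The function names and the `predict_taken` parameter are assumptions for illustration; real hardware does this with renamed registers and pipeline flushes, not Python variables.

```python
def run_in_order(a, b, c, d, h):
    """Reference behaviour: execute lines 1-7 strictly in program order."""
    e = a + b
    f = c + d
    if e > f:
        a = 15
    else:
        a = 18
    g = h + 30
    return {"a": a, "e": e, "f": f, "g": g}


def run_speculative(a, b, c, d, h, predict_taken=True):
    """Same program, with line 7 hoisted and the branch outcome guessed."""
    e = a + b                                   # line 1
    f = c + d                                   # line 2
    g = h + 30                                  # line 7: independent of lines 3-6
    guessed_a = 15 if predict_taken else 18     # line 4 or 6, executed on a guess
    actual_taken = e > f                        # line 3 finally resolves the branch
    mispredicted = actual_taken != predict_taken
    if mispredicted:
        guessed_a = 15 if actual_taken else 18  # throw the guess away and redo
    return {"a": guessed_a, "e": e, "f": f, "g": g}, mispredicted


result, mispredicted = run_speculative(1, 2, 10, 20, 5, predict_taken=True)
print(result, "mispredicted:", mispredicted)
```

A correct guess hides the wait for the comparison; a wrong guess costs the extra work of undoing it, but the final values match the in-order run either way.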

Slide 13: Shared Memory Multiprocessors
- Imagine a multi-core CPU. How do different cores (running different programs) communicate with each other?
- One common approach - use main memory!
  - Referred to as symmetric multiprocessing (SMP)
- The processors do not necessarily have to share the same block of physical memory
  - Each processor can have its own memory, but it must share it with the other processors

Slide 14: Shared Memory Multiprocessors
- Shared memory MIMD machines can be divided into two categories based upon how they access memory
  - Uniform memory access (UMA)
  - Non-uniform memory access (NUMA)

Slide 15: Shared Memory Multiprocessors
- MIMD uniform memory access (UMA)
  - All memory accesses take the same amount of time
  - Hard to scale to large numbers of processors!
    - Bus becomes a bottleneck
[Diagram: four processors, each with its own cache, connected by a shared bus to two memory modules]
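
A back-of-the-envelope model shows why the bus limits scaling. This is a deliberately crude sketch: it assumes every processor asks for the same bandwidth and that the shared bus has a fixed capacity, and the numbers and the function name `delivered_bandwidth` are invented for illustration.

```python
def delivered_bandwidth(processors, demand_per_processor, bus_capacity):
    """Total memory bandwidth delivered when all processors share one bus."""
    return min(processors * demand_per_processor, bus_capacity)


# Hypothetical units: each processor wants 2.0, the bus can carry 8.0.
for n in (1, 2, 4, 8, 16):
    total = delivered_bandwidth(n, demand_per_processor=2.0, bus_capacity=8.0)
    print(f"{n:2d} processors: total {total:4.1f}, per processor {total / n:4.2f}")
```

Once the combined demand exceeds the bus capacity, adding processors no longer increases the total, and each processor's share shrinks.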

Slide 16: Shared Memory Multiprocessors
- MIMD nonuniform memory access (NUMA)
  - A processor can access its own memory much more quickly than it can access memory that is elsewhere
  - Each processor has its own memory and cache
  - More scalable / cache coherence challenges!
[Diagram: four processors, each with its own cache and its own memory, connected by a bus]
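
The cost of non-uniform access can be summarized as a weighted average of local and remote access times. A minimal sketch, where the function name `average_access_time` and the latency figures are assumptions chosen only to show the shape of the trade-off:

```python
def average_access_time(local_fraction, local_ns, remote_ns):
    """Average memory access time when `local_fraction` of accesses hit the
    processor's own memory and the rest go to another processor's memory."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Hypothetical latencies: 100 ns local, 300 ns remote.
for fraction in (1.0, 0.5, 0.0):
    print(fraction, average_access_time(fraction, local_ns=100, remote_ns=300))
```

The closer the local fraction is to 1, the closer the machine behaves to its fast local latency, which is why placing data near the processor that uses it matters on NUMA systems.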
