Chapter 9: Alternative Architectures

Quote
"It would appear that we have reached the limit of what is possible to achieve with computer technology, although one should be careful with such statements, as they tend to sound pretty silly in 5 years."
- John von Neumann, 1949

Chapter 9 Objectives
• Learn the properties that often distinguish RISC from CISC architectures.
• Understand how multiprocessor architectures are classified.
• Appreciate the factors that create complexity in multiprocessor systems.
• Become familiar with the ways in which some architectures transcend the traditional von Neumann paradigm.

9.1 Introduction
• We have so far studied only the simplest models of computer systems: classical single-processor von Neumann systems.
• This chapter presents a number of different approaches to computer organization and architecture.
• Some of these approaches are in place in today’s commercial systems. Others may form the basis for the computers of tomorrow.

9.2 RISC Machines
• The underlying philosophy of RISC machines is that a system is better able to manage program execution when the program consists of only a few different instructions that are the same length and require the same number of clock cycles to decode and execute.
• RISC systems access memory only with explicit load and store instructions.
• In CISC systems, many different kinds of instructions access memory, making instruction length variable and fetch-decode-execute time unpredictable.

• The difference between CISC and RISC becomes evident through the basic computer performance equation:
    time/program = time/cycle × cycles/instruction × instructions/program
• RISC systems shorten execution time by reducing the clock cycles per instruction.
• CISC systems improve performance by reducing the number of instructions per program.

• The simple instruction set of RISC machines enables control units to be hardwired for maximum speed.
• The more complex, and variable, instruction set of CISC machines requires microcode-based control units that interpret instructions as they are fetched from memory. This translation takes time.
• With fixed-length instructions, RISC lends itself to pipelining and speculative execution.
Speculative execution: The main idea is to do work before it is known whether that work will be needed at all.

• Consider the following program fragments, both of which multiply 10 by 5:
    CISC:
        mov ax, 10
        mov bx, 5
        mul bx, ax
    RISC:
        mov ax, 0
        mov bx, 10
        mov cx, 5
        Begin: add ax, bx
        loop Begin
• The total clock cycles for the CISC version might be:
    (2 movs × 1 cycle) + (1 mul × 30 cycles) = 32 cycles
• The total clock cycles for the RISC version are:
    (3 movs × 1 cycle) + (5 adds × 1 cycle) + (5 loops × 1 cycle) = 13 cycles
• Because the RISC clock cycle is also shorter, RISC gives us much faster execution speeds.
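The cycle counts for the two fragments can be checked with a few lines of Python. This is a minimal sketch that assumes the per-instruction costs quoted in the example (1 cycle for mov, add, and loop; 30 cycles for mul). The names CYCLES, total_cycles, and risc_product are illustrative and do not come from the slides.

```python
# Per-instruction cycle costs taken from the example above; these are
# illustrative figures, not measurements of any real processor.
CYCLES = {"mov": 1, "add": 1, "loop": 1, "mul": 30}


def total_cycles(executed):
    """Sum the cycles for a dict of {mnemonic: number of times executed}."""
    return sum(CYCLES[op] * count for op, count in executed.items())


# CISC fragment: two movs and one multiply.
cisc_cycles = total_cycles({"mov": 2, "mul": 1})

# RISC fragment: three movs, then the add/loop pair executes 5 times (cx = 5).
risc_cycles = total_cycles({"mov": 3, "add": 5, "loop": 5})


def risc_product(multiplicand=10, count=5):
    """Emulate the RISC loop: repeated addition replaces the multiply."""
    ax, bx, cx = 0, multiplicand, count
    while cx > 0:
        ax += bx  # add ax, bx
        cx -= 1   # loop Begin: decrement cx and branch while it is nonzero
    return ax


print(cisc_cycles, risc_cycles, risc_product())
```

Under these assumed costs, the RISC fragment executes more instructions but takes fewer cycles, which is the trade-off the performance equation describes.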

• Because of their load-store ISAs, RISC architectures require a large number of CPU registers.
• These registers provide fast access to data during sequential program execution.
• They can also be employed to reduce the overhead typically caused by passing parameters to subprograms.
• Instead of pulling parameters off of a stack, the subprogram is directed to use a subset of registers.

• This is how registers can be overlapped in a RISC system.
• The current window pointer (CWP) points to the active register window.
[Figure: overlapping register windows, with one set of registers common to all windows.]
From the programmer's perspective, there are only 32 registers available.

• The save and restore operations allocate registers in a circular fashion.
• If the supply of registers is exhausted, memory takes over, storing the register windows that contain values from the oldest procedure activations.

• It is becoming increasingly difficult to distinguish RISC architectures from CISC architectures.
• Some RISC systems provide more extravagant instruction sets than some CISC systems.
• Many systems combine both approaches. Many systems now employ RISC cores to implement CISC architectures.
• The following two slides summarize the characteristics that traditionally typify the differences between these two architectures.
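The circular allocation performed by save and restore, described earlier in this section, can be sketched as a toy model. The class below is hypothetical: the number of windows, the policy of spilling one whole window on overflow, and all names are assumptions made for illustration, not details taken from the slides or from any specific processor.

```python
class RegisterWindows:
    """Toy model of circular register-window allocation.

    Assumptions (not from the slides): a fixed number of windows, and one
    whole window is written to memory whenever the register file is full.
    """

    def __init__(self, num_windows=4):
        self.num_windows = num_windows
        self.cwp = 0        # current window pointer (active window index)
        self.resident = 1   # windows currently held in registers
        self.spilled = []   # window indices saved to memory, oldest first

    def save(self):
        """Procedure call: advance the CWP, spilling the oldest window if full."""
        if self.resident == self.num_windows:
            # The window after the CWP holds the oldest activation.
            self.spilled.append((self.cwp + 1) % self.num_windows)
        else:
            self.resident += 1
        self.cwp = (self.cwp + 1) % self.num_windows

    def restore(self):
        """Procedure return: step the CWP back, reloading from memory if needed."""
        if self.resident == 1:
            if not self.spilled:
                raise RuntimeError("restore without a matching save")
            self.spilled.pop()  # reload the most recently spilled window
        else:
            self.resident -= 1
        self.cwp = (self.cwp - 1) % self.num_windows


rw = RegisterWindows(num_windows=4)
for _ in range(5):          # five nested calls overflow a four-window file
    rw.save()
print(rw.cwp, rw.spilled)
```

With four windows, the fourth and fifth nested calls each force the oldest window out to memory, and the matching returns bring those windows back in reverse order.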

RISC                                            CISC
– Multiple register sets.                       – Single register set.
– Three operands per instruction.               – One or two register operands per instruction.
– Parameter passing through register windows.   – Parameter passing through memory.
– Single-cycle instructions.                    – Multiple cycle instructions.
– Hardwired control.                            – Microprogrammed control.
– Highly pipelined.                             – Less pipelined.
– Simple instructions, few in number.           – Many complex instructions.
– Fixed length instructions.                    – Variable length instructions.
– Complexity in compiler.                       – Complexity in microcode.
– Only LOAD/STORE instructions access memory.   – Many instructions can access memory.
– Few addressing modes.                         – Many addressing modes.

9.3 Flynn’s Taxonomy
• Many attempts have been made to come up with a way to categorize computer architectures.
• Flynn’s Taxonomy (1972) has been the most enduring of these, despite having some limitations.
• Flynn’s Taxonomy takes into consideration the number of processors and the number of data streams incorporated into an architecture.
• A machine can have one or many processors that operate on one or many data streams.

• The four combinations of multiple processors and multiple data streams are described by Flynn as:
– SISD: Single instruction stream, single data stream. These are classic uniprocessor systems.
– SIMD: Single instruction stream, multiple data streams. Execute the same instruction on multiple data values, as in vector processors and GPUs.
– MIMD: Multiple instruction streams, multiple data streams. These are today’s parallel architectures.
– MISD: Multiple instruction streams, single data stream.
As of 2006, all the top 10 and most of the TOP500 supercomputers are based on a MIMD architecture (Wikipedia).
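The four categories follow mechanically from two counts, which a short sketch makes explicit. The function name flynn_class and its parameters are illustrative assumptions; Flynn's Taxonomy itself defines only the four labels.

```python
def flynn_class(instruction_streams, data_streams):
    """Return the Flynn category for the given stream counts.

    Hypothetical helper for illustration: one stream maps to 'S' (single),
    more than one maps to 'M' (multiple).
    """
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("a machine needs at least one stream of each kind")
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"


# A uniprocessor, a vector unit, a multiprocessor, and the rare MISD case.
for streams in [(1, 1), (1, 8), (4, 4), (4, 1)]:
    print(streams, flynn_class(*streams))
```

The sketch also shows why the taxonomy is coarse: any machine with more than one instruction stream and more than one data stream lands in MIMD, regardless of how its memory or interconnect is organized.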

[Figure: block diagrams of the four organizations: SISD (Single-Instruction Single-Data), SIMD (Single-Instruction Multiple-Data), MIMD (Multiple-Instruction Multiple-Data), and MISD (Multiple-Instruction Single-Data). The MISD panel is annotated "Pipeline" and "Fault-tolerance". PE: processing element; I: instruction; D: data.]

• Flynn’s Taxonomy falls short in a number of ways:
• First, there appear to be very few (if any) applications for MISD machines.
• Second, it assumes that parallelism is homogeneous. This assumption ignores the contribution of specialized processors.
• Third, it provides no straightforward way to distinguish architectures of the MIMD category.
– One idea is to divide these systems into those that share memory and those that don’t, as well as by whether the interconnections are bus-based or switch-based.

• Symmetric multiprocessors (SMP) and massively parallel processors (MPP) are MIMD architectures that differ in how they use memory.
• SMP systems share the same memory and MPP systems do not.
• An easy way to distinguish SMP from MPP is:
    SMP ⇒ fewer processors + shared memory + communication via memory
    MPP ⇒ many processors + distributed memory + communication via network (messages)
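The difference between communicating via memory and communicating via messages can be illustrated with a small sketch. It is only an analogy: Python threads stand in for processors, a lock-protected variable stands in for SMP shared memory, and a queue stands in for the MPP interconnect. All names are assumptions made for illustration.

```python
import queue
import threading

data = list(range(1, 101))
chunks = [data[i::4] for i in range(4)]  # one chunk per simulated processor


def smp_sum(chunks):
    """SMP style: workers update one shared location, guarded by a lock."""
    total = [0]
    lock = threading.Lock()

    def worker(chunk):
        partial = sum(chunk)
        with lock:               # communication happens through shared memory
            total[0] += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]


def mpp_sum(chunks):
    """MPP style: each node keeps private data and sends its result as a message."""
    network = queue.Queue()

    def node(private_chunk):
        network.put(sum(private_chunk))  # communication happens via messages

    threads = [threading.Thread(target=node, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(network.get() for _ in chunks)


print(smp_sum(chunks), mpp_sum(chunks))
```

Both versions compute the same sum. What changes is where coordination lives: in the first it is the lock around shared state, in the second it is the explicit send and receive.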

• Other examples of MIMD architectures are found in distributed computing, where processing takes place collaboratively among networked computers.
– A network of workstations (NOW) uses otherwise idle systems to solve a problem.
– A collection of workstations (COW) is a NOW where one workstation coordinates the actions of the others.
– A dedicated cluster parallel computer (DCPC) is a group of workstations brought together to solve a specific problem.
– A pile of PCs (POPC) is a cluster of (usually) heterogeneous systems that form a dedicated parallel system.
NOWs, COWs, DCPCs, and POPCs are all examples of cluster computing.

• Flynn’s Taxonomy has been expanded to include SPMD (single program, multiple data) architectures.
• Each SPMD processor has its own data set and program memory. Different nodes can execute different instructions within the same program using instructions similar to:
    If myNodeNum = 1 do this, else do that
• Yet another idea missing from Flynn’s Taxonomy is whether the architecture is instruction driven or data driven.
The next slide provides a revised taxonomy.

[Figure: revised taxonomy of computer architectures.]
According to Wikipedia, SPMD is a subcategory of MIMD.

9.4 Parallel and Multiprocessor Architectures
• If we are using an ox to pull out a tree, and the tree is too large, we don't try to grow a bigger ox.
• Instead, we use two oxen.
• Multiprocessor architectures are analogous to the oxen.
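The SPMD idea from Section 9.3, where every node runs the same program but branches on its own node number, can be sketched as follows. This is a minimal illustration in which threads stand in for nodes. The function program, the node numbering, and the sample data are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def program(my_node_num, local_data):
    """The single program every node runs, each on its own data set."""
    if my_node_num == 1:
        # "If myNodeNum = 1 do this ..."
        return ("coordinator", sum(local_data))
    # "... else do that"
    return ("worker", max(local_data))


# Each simulated node owns a separate data set (values are made up).
node_data = {1: [1, 2, 3], 2: [4, 5, 6], 3: [7, 8, 9]}

with ThreadPoolExecutor(max_workers=len(node_data)) as pool:
    futures = {n: pool.submit(program, n, d) for n, d in node_data.items()}
    results = {n: f.result() for n, f in futures.items()}

print(results)
```

Every node executes the same function, yet node 1 follows a different branch from the others, which is how one program yields multiple instruction streams.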
