EECS 252 Graduate Computer Architecture Lec 1 - Introduction David Culler Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~culler http://www-inst.eecs.berkeley.edu/~cs252

Outline
• What is Computer Architecture?
• Computer Instruction Sets – the fundamental abstraction
  – review and set up
• Dramatic Technology Advance
• Beneath the illusion – nothing is as it appears
• Computer Architecture Renaissance
• How would you like your CS252?
1/18/2005 CS252-s05, Lec 01-intro

What is “Computer Architecture”?
Levels of abstraction shown in the diagram:
  Applications
  Operating System
  Compiler / Firmware
  Instruction Set Architecture
  Instr. Set Proc. / I/O system
  Datapath & Control
  Digital Design
  Circuit Design
  Layout & fab
  Semiconductor Materials
[Figures: App photo, Die photo]
• Coordination of many levels of abstraction
• Under a rapidly changing set of forces
• Design, Measurement, and Evaluation

Forces on Computer Architecture
[Diagram: Technology, Programming Languages, Applications, Operating Systems, and History acting on Computer Architecture]
(A = F / M)

The Instruction Set: a Critical Interface
[Diagram: software / instruction set / hardware]
• Properties of a good abstraction
  – Lasts through many generations (portability)
  – Used in many different ways (generality)
  – Provides convenient functionality to higher levels
  – Permits an efficient implementation at lower levels

Instruction Set Architecture
"... the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation."
– Amdahl, Blaauw, and Brooks, 1964
[Diagram: SOFTWARE]
-- Organization of Programmable Storage
-- Data Types & Data Structures: Encodings & Representations
-- Instruction Formats
-- Instruction (or Operation Code) Set
-- Modes of Addressing and Accessing Data Items and Instructions
-- Exceptional Conditions

Computer Organization
[Diagram: Logic Designer's View; ISA Level above FUs & Interconnect]
• Capabilities & Performance Characteristics of Principal Functional Units
  – (e.g., Registers, ALU, Shifters, Logic Units, ...)
• Ways in which these components are interconnected
• Information flows between components
• Logic and means by which such information flow is controlled
• Choreography of FUs to realize the ISA
• Register Transfer Level (RTL) Description

Fundamental Execution Cycle
• Instruction Fetch: Obtain instruction from program storage
• Instruction Decode: Determine required actions and instruction size
• Operand Fetch: Locate and obtain operand data
• Execute: Compute result value or status
• Result Store: Deposit results in storage for later use
• Next Instruction: Determine successor instruction
[Diagram: Processor (regs, F.U.s) connected to Memory (program, Data); the connection is labeled "von Neumann bottleneck"]
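The six steps of the cycle can be sketched as a loop over a toy accumulator machine. This is only an illustration: the opcodes (LOAD, ADD, STORE, HALT), the single accumulator, and the `run` function are hypothetical and are not taken from the slides or from any real ISA.

```python
# Toy machine: one accumulator, a program counter, and a small data memory.
# The opcodes below are hypothetical and exist only for this sketch.

def run(program, memory):
    """Step through the execution cycle until HALT; return the final memory."""
    pc = 0    # program counter
    acc = 0   # accumulator register
    while True:
        instr = program[pc]              # instruction fetch
        op, addr = instr                 # instruction decode
        if op == "HALT":
            return memory
        next_pc = pc + 1                 # next instruction (no branches here)
        if op == "LOAD":
            acc = memory[addr]           # operand fetch
        elif op == "ADD":
            acc = acc + memory[addr]     # operand fetch, then execute
        elif op == "STORE":
            memory[addr] = acc           # result store
        else:
            raise ValueError(f"unknown opcode: {op}")
        pc = next_pc


program = [("LOAD", 0), ("ADD", 1), ("STORE", 2), ("HALT", None)]
print(run(program, [2, 3, 0]))  # [2, 3, 5]
```

Every instruction and every operand crosses the same processor-memory path in this loop, which is the traffic the "von Neumann bottleneck" label refers to.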

Elements of an ISA
• Set of machine-recognized data types
  – bytes, words, integers, floating point, strings, ...
• Operations performed on those data types
  – Add, sub, mul, div, xor, move, ...
• Programmable storage
  – regs, PC, memory
• Methods of identifying and obtaining data referenced by instructions (addressing modes)
  – Literal, reg., absolute, relative, reg + offset, ...
• Format (encoding) of the instructions
  – Op code, operand fields, ...
[Diagram: Current Logical State of the Machine -> Next Logical State of the Machine]

Example: MIPS R3000
Programmable storage
  ° 2^32 x bytes
  ° 31 x 32-bit GPRs (R0=0)
  ° 32 x 32-bit FP regs (paired DP)
  ° HI, LO, PC
[Diagram: registers r0 (= 0), r1, ..., r31, plus PC, lo, hi]
Data types? Format? Addressing Modes?
Arithmetic logical
  Add, AddU, Sub, SubU, And, Or, Xor, Nor, SLT, SLTU,
  AddI, AddIU, SLTI, SLTIU, AndI, OrI, XorI, LUI
  SLL, SRL, SRA, SLLV, SRLV, SRAV
Memory Access
  LB, LBU, LH, LHU, LW, LWL, LWR
  SB, SH, SW, SWL, SWR
Control
  J, JAL, JR, JALR
  BEq, BNE, BLEZ, BGTZ, BLTZ, BGEZ, BLTZAL, BGEZAL
32-bit instructions on word boundary

Evolution of Instruction Sets
Single Accumulator (EDSAC 1950)
Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)
Separation of Programming Model from Implementation
  – High-level Language Based (Stack) (B5000 1963)
  – Concept of a Family (IBM 360 1964)
General Purpose Register Machines
  – Complex Instruction Sets (Vax, Intel 432 1977-80)
  – Load/Store Architecture (CDC 6600, Cray 1 1963-76)
RISC (MIPS, Sparc, HP-PA, IBM RS6000, 1987)
iX86?

Dramatic Technology Advance
• Prehistory: Generations
  – 1st Tubes
  – 2nd Transistors
  – 3rd Integrated Circuits
  – 4th VLSI ...
• Discrete advances in each generation
  – Faster, smaller, more reliable, easier to utilize
• Modern computing: Moore’s Law
  – Continuous advance, fairly homogeneous technology

Moore’s Law
• “Cramming More Components onto Integrated Circuits”
  – Gordon Moore, Electronics, 1965
• # of transistors on a cost-effective integrated circuit doubles every 18 months
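As a rough sketch of what "doubles every 18 months" implies, the projection below compounds a starting transistor count over a number of years. The function names and the 1,000,000-transistor starting point are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def projected_transistors(start_count, years, doubling_period_years=1.5):
    """Transistor count after `years`, doubling once per doubling period."""
    return start_count * 2 ** (years / doubling_period_years)


def annual_growth_factor(doubling_period_years=1.5):
    """Equivalent year-over-year multiplier for a given doubling period."""
    return 2 ** (1 / doubling_period_years)


# Hypothetical starting point: 1,000,000 transistors, projected 3 years out.
print(projected_transistors(1_000_000, 3))   # 4000000.0 (two doublings)
print(round(annual_growth_factor(), 3))      # 1.587, i.e. about 59% per year
```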

Technology Trends: Microprocessor Capacity
[Figure: transistors per chip (log scale, 1,000 to 100,000,000) versus year (1970 to 2000), with a Moore’s Law trend line. Plotted parts: i4004, i8080, i8086, i80286, i80386, i80486, Pentium, and the labeled counts below.]
  Sparc Ultra: 5.2 million
  Pentium Pro: 5.5 million
  PowerPC 620: 6.9 million
  Alpha 21164: 9.3 million
  Alpha 21264: 15 million
  Pentium 4: 55 million
  Itanium II: 241 million
CMOS improvements:
• Die size: 2X every 3 yrs
• Line width: halve / 7 yrs

Memory Capacity (Single Chip DRAM)
[Figure: DRAM size in bits (log scale, 1,000 to 1,000,000,000) versus year (1970 to 2000)]
  year   size (Mb)   cyc time
  1980   0.0625      250 ns
  1983   0.25        220 ns
  1986   1           190 ns
  1989   4           165 ns
  1992   16          145 ns
  1996   64          120 ns
  2000   256         100 ns
  2003   1024        60 ns
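The table makes the gap between capacity growth and speed growth easy to quantify. The sketch below derives average annual factors from the first and last rows only; the helper name `average_annual_factor` is an assumption for illustration, and the endpoints-only method is a simplification, not something the slide prescribes.

```python
# (year, size in Mb, cycle time in ns), copied from the table above.
DRAM = [
    (1980, 0.0625, 250), (1983, 0.25, 220), (1986, 1, 190), (1989, 4, 165),
    (1992, 16, 145), (1996, 64, 120), (2000, 256, 100), (2003, 1024, 60),
]


def average_annual_factor(first_value, last_value, years):
    """Constant yearly multiplier that takes first_value to last_value."""
    return (last_value / first_value) ** (1 / years)


first, last = DRAM[0], DRAM[-1]
span = last[0] - first[0]                       # 23 years
capacity = average_annual_factor(first[1], last[1], span)
cycle = average_annual_factor(first[2], last[2], span)

print(f"capacity:   x{capacity:.2f} per year")  # roughly 1.5x per year
print(f"cycle time: x{cycle:.2f} per year")     # roughly 0.94x per year
```

On these numbers capacity grows by roughly half again each year, while cycle time shrinks by only about 6% per year.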

Technology Trends
• Clock Rate: ~30% per year
• Transistor Density: ~35%
• Chip Area: ~15%
• Transistors per chip: ~55%
• Total Performance Capability: ~100%
• by the time you graduate...
  – 3x clock rate (~10 GHz)
  – 10x transistor count (10 Billion transistors)
  – 30x raw capability
• plus 16x DRAM density,
• 32x disk density (60% per year)
• Network bandwidth, ...
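The "by the time you graduate" multiples follow from compounding the annual rates. A minimal sketch, assuming each rate stays constant and compounds yearly (the slide does not state the time horizon; `years_to_reach` is a hypothetical helper):

```python
import math


def years_to_reach(multiple, annual_rate):
    """Years of compound growth at annual_rate needed to reach `multiple`."""
    return math.log(multiple) / math.log(1 + annual_rate)


# (label, target multiple, annual growth rate), all taken from the bullets above.
trends = [
    ("clock rate", 3, 0.30),
    ("transistors per chip", 10, 0.55),
    ("raw capability", 30, 1.00),
    ("disk density", 32, 0.60),
]
for label, multiple, rate in trends:
    print(f"{label}: {multiple}x takes {years_to_reach(multiple, rate):.1f} years")
```

Under these assumptions the clock, transistor, and capability figures all land between about four and five and a half years, while 32x disk density at 60% per year would take a little over seven.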

Performance Trends
[Figure: performance (log scale, 0.1 to 100) versus year (1965 to 1995) for Supercomputers, Mainframes, Minicomputers, and Microprocessors]

Processor Performance
[Figure: processor performance (0 to 1200) versus year (87 to 97), annotated "1.54X/yr" and "(1.35X before, 1.55X now)". Machines plotted, in the order they appear: Sun-4/260, MIPS M/2000, MIPS M/120, IBM RS/6000, HP 9000/750, DEC AXP/500, IBM POWER 100, DEC Alpha 4/266, DEC Alpha 5/300, DEC Alpha 5/500, DEC Alpha 21164/600]

Definition: Performance
• Performance is in units of things per sec
  – bigger is better
• If we are primarily concerned with response time

$$\text{performance}(x) = \frac{1}{\text{execution\_time}(x)}$$

"X is n times faster than Y" means

$$n = \frac{\text{Performance}(X)}{\text{Performance}(Y)} = \frac{\text{Execution\_time}(Y)}{\text{Execution\_time}(X)}$$
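A small worked example of the definition above, as a sketch. The function names and the 10 s and 15 s timings are hypothetical.

```python
def performance(execution_time):
    """Performance as the reciprocal of execution time (things per second)."""
    return 1.0 / execution_time


def times_faster(time_x, time_y):
    """n such that X is n times faster than Y.

    Performance(X) / Performance(Y) simplifies to
    Execution_time(Y) / Execution_time(X).
    """
    return time_y / time_x


# Hypothetical timings: X finishes in 10 s, Y in 15 s.
print(times_faster(10.0, 15.0))  # 1.5
```

Note that the slower machine's time goes in the numerator: a shorter execution time for X yields n greater than 1.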

Metrics of Performance
[Diagram: system levels on the left, the metric typically quoted at each level on the right]
  Application: Answers per day/month
  Programming Language, Compiler, ISA: (millions) of Instructions per second: MIPS; (millions) of (FP) operations per second: MFLOP/s
  Datapath, Control: Megabytes per second
  Function Units, Transistors, Wires, Pins: Cycles per second (clock rate)

Components of Performance

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

The three factors are the instruction count, the CPI, and the cycle time.

                 Inst Count   CPI   Clock Rate
  Program            X
  Compiler           X        (X)
  Inst. Set          X         X
  Organization                 X        X
  Technology                            X
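The CPU time equation can be exercised directly. This is a sketch with made-up inputs: the instruction count, CPI values, and 1 GHz clock below are hypothetical, and `cpu_time` is an illustrative helper name.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds: instructions x cycles/instruction x seconds/cycle."""
    seconds_per_cycle = 1.0 / clock_rate_hz
    return instruction_count * cpi * seconds_per_cycle


# Hypothetical program: 1e9 instructions on a 1 GHz clock.
baseline = cpu_time(1e9, cpi=2.0, clock_rate_hz=1e9)
# Same program and clock, but a better organization lowers CPI to 1.5.
improved = cpu_time(1e9, cpi=1.5, clock_rate_hz=1e9)

print(f"baseline: {baseline:.2f} s, improved: {improved:.2f} s")
print(f"speedup:  {baseline / improved:.2f}x")
```

Changing only one factor, as here, is the easy case; the table shows that most design decisions move more than one column at a time.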

What’s a Clock Cycle?
[Diagram: latch or register feeding combinational logic]
• Old days: 10 levels of gates
• Today: determined by numerous time-of-flight issues + gate delays
  – clock propagation, wire lengths, drivers

Integrated Approach
What really matters is the functioning of the complete system, i.e. hardware, runtime system, compiler, and operating system.
In networking, this is called the “End to End argument”.
• Computer architecture is not just about transistors, individual instructions, or particular implementations
• Original RISC projects replaced complex instructions with a compiler + simple instructions

How do you turn more stuff into more performance?
• Do more things at once
• Do the things that you do faster
• Beneath the ISA illusion...
