
CS654 Advanced Computer Architecture Lec 2 - Introduction Peter Kemper Adapted from the slides of EECS 252 by Prof. David Patterson Electrical Engineering and Computer Sciences University of California, Berkeley

Outline
• Computer Science at a Crossroads
• Computer Architecture v. Instruction Set Arch.
• What Computer Architecture brings to table
• Technology Trends
1/23/09 CS654 W&M

What Computer Architecture brings to Table
• Other fields often borrow ideas from architecture
• Quantitative Principles of Design
  1. Take Advantage of Parallelism
  2. Principle of Locality
  3. Focus on the Common Case
  4. Amdahl’s Law
  5. The Processor Performance Equation
• Careful, quantitative comparisons
  – Define, quantify, and summarize relative performance
  – Define and quantify relative cost
  – Define and quantify dependability
  – Define and quantify power
• Culture of anticipating and exploiting advances in technology
• Culture of well-defined interfaces that are carefully implemented and thoroughly checked

1) Taking Advantage of Parallelism
• Increasing throughput of server computer via multiple processors or multiple disks
• Detailed HW design
  – Carry lookahead adders use parallelism to speed up computing sums from linear to logarithmic in number of bits per operand
  – Multiple memory banks searched in parallel in set-associative caches
• Pipelining: overlap instruction execution to reduce the total time to complete an instruction sequence.
  – Not every instruction depends on its immediate predecessor ⇒ executing instructions completely/partially in parallel is possible
  – Classic 5-stage pipeline:
    1) Instruction Fetch (Ifetch)
    2) Register Read (Reg)
    3) Execute (ALU)
    4) Data Memory Access (Dmem)
    5) Register Write (Reg)

Pipelined Instruction Execution
Time (clock cycles) runs left to right; instructions are listed in program order.

           Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7
Instr. 1   Ifetch   Reg      ALU      DMem     Reg
Instr. 2            Ifetch   Reg      ALU      DMem     Reg
Instr. 3                     Ifetch   Reg      ALU      DMem     Reg
Instr. 4                              Ifetch   Reg      ALU      DMem

(Instr. 4 completes its register write in the cycle after Cycle 7.)
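The staggered timing in the pipeline diagram can be reproduced with a few lines of Python. This is a minimal sketch assuming an ideal pipeline (one cycle per stage, one instruction issued per cycle, no hazards); the function names `pipeline_schedule` and `total_cycles` are illustrative and not part of the lecture.

```python
STAGES = ("Ifetch", "Reg", "ALU", "DMem", "Reg")


def pipeline_schedule(num_instructions, stages=STAGES):
    """Return one dict per instruction mapping clock cycle (1-based) to stage name.

    Assumes an ideal pipeline: instruction i enters one cycle after
    instruction i-1 and never stalls.
    """
    schedule = []
    for i in range(num_instructions):
        # Instruction i (0-based) occupies stage s during cycle i + 1 + s.
        schedule.append({i + 1 + s: name for s, name in enumerate(stages)})
    return schedule


def total_cycles(num_instructions, num_stages=len(STAGES), pipelined=True):
    """Cycles needed to finish all instructions, with or without overlap."""
    if num_instructions == 0:
        return 0
    if pipelined:
        # The first instruction needs num_stages cycles; each later one adds one.
        return num_stages + num_instructions - 1
    # Without overlap, every instruction runs all stages back to back.
    return num_stages * num_instructions


if __name__ == "__main__":
    for n, row in enumerate(pipeline_schedule(4), start=1):
        print(f"Instr. {n}:", row)
    print("pipelined cycles:  ", total_cycles(4))
    print("unpipelined cycles:", total_cycles(4, pipelined=False))
```

Under these assumptions, four instructions finish in 8 cycles when overlapped versus 20 cycles when executed one after another; the gap widens as the instruction count grows.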

Limits to pipelining
• Hazards prevent next instruction from executing during its designated clock cycle
  – Structural hazards: attempt to use the same hardware to do two different things at once
  – Data hazards: Instruction depends on result of prior instruction still in the pipeline
  – Control hazards: Caused by delay between the fetching of instructions and decisions about changes in control flow (branches and jumps).
[Figure: pipeline diagram of four instructions in program order over time (clock cycles); stages Ifetch, Reg, ALU, DMem, Reg]
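To make the data-hazard case concrete, the sketch below scans a short instruction sequence for read-after-write dependences on a result that may still be in the pipeline. It is an illustration only: the `Instr` record, the register names, and the `window=2` distance (chosen on the assumption of a 5-stage pipeline without forwarding, where the register file is written and then read within the same cycle) are hypothetical and do not come from the slides.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str                 # mnemonic, used only for display
    dest: Optional[str]     # register written, or None
    srcs: Tuple[str, ...]   # registers read


def raw_hazards(program, window=2):
    """Return (producer_index, consumer_index, register) triples.

    A triple is reported when an instruction reads a register whose most
    recent writer is at most `window` instructions earlier, i.e. the
    producer's result may not have been written back yet.
    """
    hazards = []
    for j, consumer in enumerate(program):
        for reg in consumer.srcs:
            # Walk backwards to the nearest earlier writer inside the window.
            for i in range(j - 1, max(-1, j - window - 1), -1):
                if program[i].dest == reg:
                    hazards.append((i, j, reg))
                    break
    return hazards


if __name__ == "__main__":
    program = [
        Instr("add", "r1", ("r2", "r3")),
        Instr("sub", "r4", ("r1", "r5")),  # reads r1 one instruction later
        Instr("and", "r6", ("r1", "r7")),  # reads r1 two instructions later
        Instr("or",  "r8", ("r1", "r9")),  # far enough away under the assumption
    ]
    for producer, consumer, reg in raw_hazards(program):
        print(f"instr {consumer} needs {reg} from instr {producer}")
```

Real pipelines reduce these stalls with forwarding paths, so the window that matters depends on the actual hardware.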

2) The Principle of Locality
• The Principle of Locality:
  – Programs access a relatively small portion of the address space at any instant of time.
• Two Different Types of Locality:
  – Temporal Locality (Locality in Time): If an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse)
  – Spatial Locality (Locality in Space): If an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access)
• For the last 30 years, HW has relied on locality for memory performance
[Figure: processor (P), cache ($), and memory (MEM)]
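The effect of both kinds of locality can be observed with a toy cache model. The following is a rough sketch of a direct-mapped cache that only counts hits; the function name `hit_rate` and the cache parameters (64 blocks of 64 bytes) are assumptions chosen for illustration, not figures from the lecture.

```python
def hit_rate(addresses, num_blocks=64, block_size=64):
    """Hit rate of a toy direct-mapped cache over a sequence of byte addresses."""
    tags = [None] * num_blocks      # block number currently held in each slot
    hits = 0
    total = 0
    for addr in addresses:
        block = addr // block_size  # which memory block the byte lives in
        index = block % num_blocks  # the single slot that block may occupy
        if tags[index] == block:
            hits += 1
        else:
            tags[index] = block     # miss: fetch the block, evicting the old one
        total += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Spatial locality: consecutive 8-byte words share a 64-byte block.
    sequential = range(0, 4096 * 8, 8)
    # No locality: every access lands in a new block mapping to the same slot.
    strided = range(0, 4096 * 64, 4096)
    # Temporal locality: the same 256 bytes are swept ten times.
    repeated = list(range(0, 256, 8)) * 10

    print("sequential:", hit_rate(sequential))
    print("strided:   ", hit_rate(strided))
    print("repeated:  ", hit_rate(repeated))
```

In this model the sequential sweep misses once per block and hits on the remaining seven words, the strided pattern never hits, and the repeated sweep misses only on its first pass.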

Levels of the Memory Hierarchy
(Upper levels are faster; lower levels are larger.)

Level            Capacity          Access Time                Cost
CPU Registers    100s Bytes        300 – 500 ps (0.3-0.5 ns)
L1 and L2 Cache  10s-100s K Bytes  ~1 ns - ~10 ns             $1000s/ GByte
Main Memory      G Bytes           80ns- 200ns                ~ $100/ GByte
Disk             10s T Bytes       10 ms (10,000,000 ns)      ~ $1 / GByte
Tape             infinite          sec-min                    ~$1 / GByte

Staging between levels     Xfer Unit        Size          Managed by
Registers ↔ L1 Cache       Instr. Operands  1-8 bytes     prog./compiler
L1 Cache ↔ L2 Cache        Blocks           32-64 bytes   cache cntl
L2 Cache ↔ Main Memory     Blocks           64-128 bytes  cache cntl
Main Memory ↔ Disk         Pages            4K-8K bytes   OS
Disk ↔ Tape                Files            Mbytes        user/operator

3) Focus on the Common Case
• Common sense guides computer design
  – Since it's engineering, common sense is valuable
• In making a design trade-off, favor the frequent case over the infrequent case
  – E.g., Instruction fetch and decode unit used more frequently than multiplier, so optimize it 1st
  – E.g., If database server has 50 disks / processor, storage dependability dominates system dependability, so optimize it 1st
• Frequent case is often simpler and can be done faster than the infrequent case
  – E.g., overflow is rare when adding 2 numbers, so improve performance by optimizing more common case of no overflow
  – May slow down overflow, but overall performance improved by optimizing for the normal case
• What the frequent case is, and how much performance improves by making that case faster ⇒ Amdahl’s Law

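4) Amdahl’s Law

$$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left[ (1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}} \right]$$

$$\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$

Best you could ever hope to do:

$$\text{Speedup}_{\text{maximum}} = \frac{1}{1 - \text{Fraction}_{\text{enhanced}}}$$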

Amdahl’s Law example
• New CPU 10X faster
• I/O bound server, so 60% time waiting for I/O (only the remaining 40% is enhanced: Fraction_enhanced = 0.4)

$$\text{Speedup}_{\text{overall}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}} = \frac{1}{(1 - 0.4) + \dfrac{0.4}{10}} = \frac{1}{0.64} = 1.56$$

• Apparently, it's human nature to be attracted by 10X faster, vs. keeping in perspective it's just 1.6X faster
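The arithmetic of the example is easy to check in code. This is a small sketch of Amdahl's Law as stated in the lecture; the function names `amdahl_speedup` and `amdahl_limit` are illustrative.

```python
import math


def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when a fraction of execution time is sped up."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must lie in [0, 1]")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    # The unenhanced part is unchanged; the enhanced part shrinks by the speedup.
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def amdahl_limit(fraction_enhanced):
    """Upper bound on speedup as the enhancement becomes infinitely fast."""
    if fraction_enhanced >= 1.0:
        return math.inf
    return 1.0 / (1.0 - fraction_enhanced)


if __name__ == "__main__":
    # The server example: 40% of the time is CPU work, and the CPU gets 10X faster.
    print(f"overall speedup: {amdahl_speedup(0.4, 10):.2f}")
    print(f"best possible:   {amdahl_limit(0.4):.2f}")
```

For the server example this gives roughly 1.56 overall, and no CPU improvement can exceed about 1.67 while 60% of the time is spent waiting for I/O.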

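5) Processor performance equation

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

The three factors are the instruction count, CPI, and cycle time.

What affects each factor:

               Inst Count   CPI   Clock Rate
Program        X
Compiler       X            (X)
Inst. Set.     X            X
Organization                X     X
Technology                        X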
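As a quick illustration of how the three factors combine, the sketch below evaluates the processor performance equation. The function name `cpu_time` and all the numbers are made up for illustration; they are not measurements from the lecture.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds = instructions x cycles/instruction x seconds/cycle."""
    if clock_rate_hz <= 0:
        raise ValueError("clock_rate_hz must be positive")
    seconds_per_cycle = 1.0 / clock_rate_hz
    return instruction_count * cpi * seconds_per_cycle


if __name__ == "__main__":
    # Hypothetical baseline: 1e9 instructions, CPI of 2.0, 1 GHz clock.
    baseline = cpu_time(1e9, 2.0, 1e9)
    # Hypothetical organization change that lowers CPI to 1.5 at the same clock.
    improved = cpu_time(1e9, 1.5, 1e9)
    print(f"baseline: {baseline:.2f} s")
    print(f"improved: {improved:.2f} s")
    print(f"speedup:  {baseline / improved:.2f}")
```

Because the factors multiply, a change that improves one factor while worsening another (for example, fewer instructions at a higher CPI) has to be judged on the product, not on either factor alone.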

At this point …
• Computer Architecture >> instruction sets
• Computer Architecture skill sets are different
  – 5 Quantitative principles of design
  – Quantitative approach to design
  – Solid interfaces that really work
  – Technology tracking and anticipation
• Computer Science at the crossroads from sequential to parallel computing
  – Salvation requires innovation in many fields, including computer architecture
• However for CS654, we have to go through the state of the art first:
  – Material: read Chapter 1, then Appendix A in Hennessy/Patterson
