Computer Architecture: An Introduction (Virendra Singh)



SLIDE 1

CADSL

Computer Architecture

An Introduction

Virendra Singh

Associate Professor Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology Bombay

http://www.ee.iitb.ac.in/~viren/ E-mail: viren@ee.iitb.ac.in

EE-717/453: Advanced Computing for Electrical Engineers

Lecture 14 (05 Sep 2013)

SLIDE 2

Computer Architecture

  • Rely on abstraction layers to manage complexity

[Figure: abstraction stack between Applications and Technology, with Computer Architecture in the middle. Layers shown: Quantum Physics; Transistors & Devices; Logic Gates & Memory; Von Neumann Machine; x86 Machine Primitives; Visual C++; Firefox, MS Excel; Windows 7]

05 Sep 2013 2 EE-717/453@IITB

SLIDE 3

Running Program on Processor

Processor Performance = Time / Program

Architecture

$$\frac{\text{Time}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Time}}{\text{Instruction}}$$

  • Compiler Designer: Instructions/Program (code size)

SLIDE 4

Computer Architecture

  • Instruction Set Architecture (IBM 360)

– “… the attributes of a [computing] system as seen by the programmer. I.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation.” -- Amdahl, Blaauw, & Brooks, 1964

SLIDE 5

Iron Law

  • Instructions/Program

– Instructions executed, not static code size

– Determined by algorithm, compiler, ISA

SLIDE 6

Running Program on Processor

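Processor Performance = Time / Program

Architecture --> Implementation

$$\frac{\text{Time}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Time}}{\text{Cycle}}$$

  • Compiler Designer: Instructions/Program (code size)
  • Processor Designer: Cycles/Instruction (CPI)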

SLIDE 7

Iron Law

  • Instructions/Program

– Instructions executed, not static code size
– Determined by algorithm, compiler, ISA

  • Cycles/Instruction

– Determined by ISA and CPU organization
– Overlap among instructions reduces this term

SLIDE 8

Running Program on Processor

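Processor Performance = Time / Program

Architecture --> Implementation --> Realization

$$\frac{\text{Time}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Time}}{\text{Cycle}}$$

  • Compiler Designer: Instructions/Program (code size)
  • Processor Designer: Cycles/Instruction (CPI)
  • Chip Designer: Time/Cycle (cycle time)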

SLIDE 9

Iron Law

  • Instructions/Program

– Instructions executed, not static code size
– Determined by algorithm, compiler, ISA

  • Cycles/Instruction

– Determined by ISA and CPU organization
– Overlap among instructions reduces this term

  • Time/cycle

– Determined by technology, organization, clever circuit design

SLIDE 10

Computer Architecture’s Changing Definition

  • 1950s to 1960s:

Computer Architecture Course = Computer Arithmetic

  • 1970s to mid 1980s:

Computer Architecture Course = Instruction Set Design, especially ISA appropriate for compilers

  • 1990s onwards:

Computer Architecture Course = Design of CPU (Processor Microarchitecture), memory system, I/O system, Multiprocessors

10 Aug 2013 virendra@Raisoni 10

SLIDE 11

Computer Architecture

  • Exercise in engineering tradeoff analysis

– Find the fastest/cheapest/power-efficient/etc. solution
– Optimization problem with 100s of variables

  • All the variables are changing

– At non-uniform rates
– With inflection points

  • Two high-level effects:

– Technology push
– Application pull

SLIDE 12

Technology

  • Technology advances at an astounding rate

– 19th century: attempts to build mechanical computers
– Early 20th century: mechanical counting systems (cash registers, etc.)
– Mid 20th century: vacuum tubes as switches
– Since: transistors, integrated circuits

  • 1965: Moore’s law [Gordon Moore]

– Predicted doubling of IC capacity every 18 months
– Has held and will continue to hold

  • Drives functionality, performance, cost

– Exponential improvement for 40+ years

SLIDE 13

Technology Push

  • What do these two intervals have in common?

– 1776-2011 (235 years)
– 2011-2013 (2 years)

  • Answer: Equal progress in processor speed!
  • The power of exponential growth!
  • Driven by Moore’s Law
  • Devices per chip double every 18-24 months
  • Computer architects turn additional resources into

– Speed
– Power savings
– Functionality
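The "equal progress" answer follows from the geometric series behind any fixed doubling period: the most recent doubling adds as much as all earlier doublings combined. A minimal sketch in Python, assuming an idealized constant 2-year doubling over the whole interval (the function name `relative_speed` and the constant doubling period are illustrative assumptions, not measured data):

```python
def relative_speed(year, base_year=1776, doubling_years=2.0):
    """Speed relative to base_year, assuming one doubling every doubling_years."""
    return 2.0 ** ((year - base_year) / doubling_years)


# Absolute gain over each interval under the idealized model.
gain_1776_2011 = relative_speed(2011) - relative_speed(1776)
gain_2011_2013 = relative_speed(2013) - relative_speed(2011)

# Ratio of the two gains; close to 1 means "equal progress".
print(gain_2011_2013 / gain_1776_2011)
```

Under this model the ratio is essentially 1 for any long first interval, because the starting speed is negligible next to the final one.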

SLIDE 14

Application Pull

  • Corollary to Moore’s Law: cost halves every two years

“In a decade you can buy a computer for less than its sales tax today.” – Jim Gray

  • Computers cost-effective for

– National security – weapons design
– Enterprise computing – banking
– Departmental computing – computer-aided design
– Personal computer – spreadsheets, email, web
– Mobile computing – GPS, location-aware, ubiquitous

SLIDE 15

Application Pull

  • What about the future?

– E.g. weather forecasting computational demand

  • Must dream up applications that are not cost-effective today

– Virtual reality, telepresence
– Web agents, social networking
– Wireless, location-aware
– Proactive (beyond interactive) w/ sensors
– Recognition/Mining/Synthesis (RMS)
– ???

  • This is your job!

SLIDE 16

Microprocessor Designs

SLIDE 17

Trends

  • Moore’s Law for device integration
  • Chip power consumption
  • Single-thread performance trend

[source: Intel]

SLIDE 18

Performance and Cost

Airplane            Passengers   Range (mi)   Speed (mph)
Boeing 737-100             101          630           598
Boeing 747                 470         4150           610
BAC/Sud Concorde           132         4000          1350
Douglas DC-8-50            146         8720           544

  • Which of these airplanes has the best performance?
  • How much faster is the Concorde vs. the 747?
  • How much bigger is the 747 vs. DC-8?

SLIDE 19

Performance and Cost

  • Which computer is fastest?
  • Not so simple

– Scientific simulation – FP performance
– Program development – Integer performance
– Database workload – Memory, I/O

SLIDE 20

Performance of Computers

  • Want to buy the fastest computer for what you want to do?

– Workload is all-important
– Correct measurement and analysis

  • Want to design the fastest computer for what the customer wants to pay?

– Cost is an important criterion

SLIDE 21

Defining Performance

  • What is important to whom?
  • Computer system user

– Minimize elapsed time for program = time_end – time_start
– Called response time

  • Computer center manager

– Maximize completion rate = #jobs/second
– Called throughput

SLIDE 22

Response Time vs. Throughput

  • Is throughput = 1/av. response time?

– Only if NO overlap
– Otherwise, throughput > 1/av. response time
– E.g. a lunch buffet – assume 5 entrees
– Each person takes 2 minutes/entrée
– Throughput is 1 person every 2 minutes
– BUT time to fill up tray is 10 minutes
– Why, and what would the throughput be otherwise?

  • 5 people simultaneously filling tray (overlap)
  • Without overlap, throughput = 1/10 (1 person every 10 minutes)
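The buffet numbers can be checked with a few lines of Python. This is a minimal sketch under the slide's assumptions (5 entrees, 2 minutes per entree, one person per entree at a time); the function name `buffet_metrics` is illustrative:

```python
def buffet_metrics(num_entrees=5, minutes_per_entree=2.0):
    """Return (response_time, overlapped_throughput, serial_throughput).

    Times are in minutes; throughputs are in people per minute.
    """
    # One person visits every entree in turn, so the tray takes this long to fill.
    response_time = num_entrees * minutes_per_entree
    # With overlap, a new person can start as soon as the first entree is free.
    overlapped_throughput = 1.0 / minutes_per_entree
    # Without overlap, the next person waits for the whole line to clear.
    serial_throughput = 1.0 / response_time
    return response_time, overlapped_throughput, serial_throughput


response, overlapped, serial = buffet_metrics()
print(response, overlapped, serial)
```

With overlap the throughput is `num_entrees` times the non-overlapped value, which is exactly the number of people filling trays at once.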

SLIDE 23

What is Performance for us?

  • For computer architects

– CPU time = time spent running a program

  • Intuitively, bigger should be faster, so:

– Performance = 1/X time, where X is response, CPU execution, etc.

  • Elapsed time = CPU time + I/O wait
  • We will concentrate on CPU time

SLIDE 24

Improve Performance

  • Improve

– (1) response time or
– (2) throughput?

  • Faster CPU [Faster is better – Scale up]

– Helps both 1 and 2

  • Add more CPUs [More is better – Scale out]

– Helps 2 and perhaps 1 due to less queueing

SLIDE 25

Performance Comparison

  • Machine A is n times faster than machine B iff

– perf(A)/perf(B) = time(B)/time(A) = n

  • Machine A is x% faster than machine B iff

– perf(A)/perf(B) = time(B)/time(A) = 1 + x/100

  • E.g. time(A) = 10s, time(B) = 15s

– 15/10 = 1.5 => A is 1.5 times faster than B
– 15/10 = 1.5 => A is 50% faster than B
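Both definitions reduce to the same time ratio, which a short Python sketch makes concrete. The helper names `times_faster` and `percent_faster` are illustrative, not from the slides:

```python
def times_faster(time_a, time_b):
    """n such that A is n times faster than B: n = time(B) / time(A)."""
    return time_b / time_a


def percent_faster(time_a, time_b):
    """x such that A is x% faster than B: time(B) / time(A) = 1 + x/100."""
    return (time_b / time_a - 1.0) * 100.0


# The slide's example: time(A) = 10 s, time(B) = 15 s.
print(times_faster(10.0, 15.0), percent_faster(10.0, 15.0))
```

Note the argument order: the slower machine's time goes in the numerator, so swapping the arguments gives a ratio below 1 and a negative percentage.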

SLIDE 26

Breaking Down Performance

  • A program is broken into instructions

– H/W is aware of instructions, not programs

  • At lower level, H/W breaks instructions into cycles

– Lower level state machines change state every cycle

  • For example:

– 1GHz Snapdragon runs 1000M cycles/sec, 1 cycle = 1ns
– 2.5GHz Core i7 runs 2.5G cycles/sec, 1 cycle = 0.4ns
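Cycle time is the reciprocal of clock rate. As a quick check of the two examples, here is a minimal Python sketch (the helper name `cycle_time_ns` is illustrative):

```python
def cycle_time_ns(clock_hz):
    """Length of one clock cycle in nanoseconds for a clock rate in Hz."""
    return 1e9 / clock_hz


# 1 GHz and 2.5 GHz clocks, as in the examples above.
print(cycle_time_ns(1e9), cycle_time_ns(2.5e9))
```

A 2.5 GHz clock therefore has a 0.4 ns cycle; a 0.25 ns cycle would correspond to 4 GHz.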

SLIDE 27

Other Metrics

  • MIPS and MFLOPS
  • MIPS

$$\text{MIPS} = \frac{\text{instruction count}}{\text{execution time} \times 10^6} = \frac{\text{clock rate}}{\text{CPI} \times 10^6}$$

  • But MIPS has serious shortcomings

SLIDE 28

Problems with MIPS

  • Without FP hardware, an FP op may take 50 single-cycle instructions
  • With FP hardware, only one 2-cycle instruction
  • Thus, adding FP hardware:

– CPI increases (why?): 50/50 => 2/1
– Instructions/program decreases (why?): 50 => 1
– Total execution time decreases: 50 => 2 cycles
– BUT, MIPS gets worse! 50 MIPS => 25 MIPS (instructions per cycle drop from 50/50 to 1/2)
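The paradox is easy to reproduce numerically. The sketch below applies the MIPS definition, instruction count/(execution time x 10^6), to a single FP operation; the 50 MHz clock is an assumption chosen only so that the software case comes out at 50 MIPS, and the function names are illustrative:

```python
CLOCK_HZ = 50e6  # assumed clock rate, not given on the slide


def exec_time_s(instr_count, cpi, clock_hz=CLOCK_HZ):
    """Execution time from the Iron Law: instructions x CPI x seconds/cycle."""
    return instr_count * cpi / clock_hz


def mips(instr_count, cpi, clock_hz=CLOCK_HZ):
    """MIPS = instruction count / (execution time x 10^6)."""
    return instr_count / (exec_time_s(instr_count, cpi, clock_hz) * 1e6)


# One FP operation: 50 single-cycle instructions vs. one 2-cycle instruction.
time_sw, mips_sw = exec_time_s(50, 1.0), mips(50, 1.0)
time_hw, mips_hw = exec_time_s(1, 2.0), mips(1, 2.0)

print(time_sw, mips_sw)
print(time_hw, mips_hw)
```

Execution time falls by a factor of 25 while the MIPS rating halves, so the metric moves in the wrong direction.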

SLIDE 29

Problems with MIPS

  • Ignores program
  • Usually used to quote peak performance

– Ideal conditions => guaranteed not to exceed!

  • When is MIPS ok?

– Same compiler, same ISA
– E.g. same binary running on AMD Phenom, Intel Core i7
– Why? Instr/program is constant and can be ignored

SLIDE 30

Other Metrics

  • MFLOPS

$$\text{MFLOPS} = \frac{\text{FP ops in program}}{\text{execution time} \times 10^6}$$

  • Assuming FP ops independent of compiler and ISA

– Often safe for numeric codes: matrix size determines # of FP ops/program
– However, not always safe: missing instructions (e.g. FP divide), optimizing compilers

  • Relative MIPS and normalized MFLOPS

– Adds to confusion

SLIDE 31

Rules

  • Use ONLY Time
  • Beware when reading, especially if details are omitted

  • Beware of Peak

– “Guaranteed not to exceed”

SLIDE 32

Iron Law Example

  • Machine A: clock 1ns, CPI 2.0, for program x
  • Machine B: clock 2ns, CPI 1.2, for program x
  • Which is faster and how much?

Time/Program = instr/program x cycles/instr x sec/cycle

Time(A) = N x 2.0 x 1 = 2N
Time(B) = N x 1.2 x 2 = 2.4N
Compare: Time(B)/Time(A) = 2.4N/2N = 1.2

  • So, Machine A is 20% faster than Machine B for this program
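The comparison can be reproduced directly from the Iron Law. In this minimal Python sketch the instruction count `N` is an arbitrary placeholder (it cancels in the ratio), and the function name `cpu_time_ns` is illustrative:

```python
def cpu_time_ns(instr_count, cpi, cycle_ns):
    """Iron Law: time = instructions x cycles/instruction x time/cycle."""
    return instr_count * cpi * cycle_ns


N = 1_000_000  # hypothetical instruction count; the ratio does not depend on it

time_a = cpu_time_ns(N, cpi=2.0, cycle_ns=1.0)
time_b = cpu_time_ns(N, cpi=1.2, cycle_ns=2.0)

# time(B)/time(A) > 1 means A is faster.
ratio = time_b / time_a
print(ratio, (ratio - 1.0) * 100.0)
```

Machine B has the better CPI, but its slower clock more than cancels that advantage.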

SLIDE 33

Iron Law Example

  • Keep clock(A) @ 1ns and clock(B) @ 2ns
  • For equal performance, if CPI(B)=1.2, what is CPI(A)?

Time(B)/Time(A) = 1 = (N x 2 x 1.2)/(N x 1 x CPI(A))
CPI(A) = 2.4

SLIDE 34

Iron Law Example

  • Keep CPI(A)=2.0 and CPI(B)=1.2
  • For equal performance, if clock(B)=2ns, what is clock(A)?

Time(B)/Time(A) = 1 = (N x 1.2 x 2)/(N x 2.0 x clock(A))
clock(A) = 1.2ns
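Both break-even questions come from setting CPI x cycle time equal on the two machines, since the instruction count N cancels. A minimal Python sketch of that rearrangement, with illustrative helper names:

```python
def cpi_for_equal_time(cpi_other, clock_other_ns, clock_ns):
    """CPI at which cpi x clock_ns matches the other machine's CPI x clock."""
    return cpi_other * clock_other_ns / clock_ns


def clock_for_equal_time(cpi_other, clock_other_ns, cpi):
    """Cycle time (ns) at which cpi x clock matches the other machine's product."""
    return cpi_other * clock_other_ns / cpi


# Fixed clocks (A: 1 ns, B: 2 ns), CPI(B) = 1.2: solve for CPI(A).
cpi_a = cpi_for_equal_time(cpi_other=1.2, clock_other_ns=2.0, clock_ns=1.0)

# Fixed CPIs (A: 2.0, B: 1.2), clock(B) = 2 ns: solve for clock(A).
clock_a_ns = clock_for_equal_time(cpi_other=1.2, clock_other_ns=2.0, cpi=2.0)

print(cpi_a, clock_a_ns)
```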

SLIDE 35

Which Programs

  • Execution time of what program?
  • Best case – you always run the same set of programs

– Port them and time the whole workload

  • In reality, use benchmarks

– Programs chosen to measure performance
– Predict performance of actual workload
– Saves effort and money
– Representative? Honest? Benchmarketing…

SLIDE 36

How to Average

  • One answer: for total execution time, how much faster is B? 9.1x

             Machine A   Machine B
Program 1            1          10
Program 2         1000         100
Total             1001         110

SLIDE 37

How to Average

  • Another: arithmetic mean (same result)
  • Arithmetic mean of times:

$$\left(\sum_{i=1}^{n} \text{time}(i)\right) \times \frac{1}{n}$$

– AM(A) = 1001/2 = 500.5
– AM(B) = 110/2 = 55
– 500.5/55 = 9.1x

  • Valid only if programs run equally often, so use weighted arithmetic mean:

$$\left(\sum_{i=1}^{n} \text{weight}(i) \times \text{time}(i)\right) \times \frac{1}{n}$$
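The two means can be sketched in Python as follows. The times are those of the two-program example; the weights `[1.5, 0.5]` are hypothetical and chosen to sum to n, an assumption under which equal weights reduce the weighted form to the plain arithmetic mean:

```python
def arithmetic_mean(times):
    """(sum of time(i)) x 1/n."""
    return sum(times) / len(times)


def weighted_arithmetic_mean(times, weights):
    """(sum of weight(i) x time(i)) x 1/n, following the formula above."""
    if len(times) != len(weights):
        raise ValueError("times and weights must have the same length")
    return sum(w * t for w, t in zip(weights, times)) / len(times)


times_a = [1.0, 1000.0]   # Machine A: Program 1, Program 2
times_b = [10.0, 100.0]   # Machine B: Program 1, Program 2

print(arithmetic_mean(times_a) / arithmetic_mean(times_b))

# Hypothetical weights: Program 1 runs three times as often as Program 2.
weights = [1.5, 0.5]
print(weighted_arithmetic_mean(times_a, weights),
      weighted_arithmetic_mean(times_b, weights))
```

Changing the weights changes the ratio between the machines, which is why the workload mix has to be stated alongside any mean.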

SLIDE 38

Other Averages

  • E.g., 30 mph for first 10 miles, then 90 mph for next 10 miles, what is average speed?
  • Average speed = (30+90)/2 WRONG
  • Average speed = total distance / total time = (20 / (10/30 + 10/90)) = 45 mph

SLIDE 39

Harmonic Mean

  • Harmonic mean of rates =

$$\frac{n}{\sum_{i=1}^{n} \frac{1}{\text{rate}(i)}}$$

  • Use HM if forced to start and end with rates (e.g. reporting MIPS or MFLOPS)
  • Why?

– Rate has time in denominator
– Mean should be proportional to inverse of sums of time (not sum of inverses)
– See: J.E. Smith, “Characterizing computer performance with a single number,” CACM Volume 31, Issue 10 (October 1988), pp. 1202-1206.
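The driving example (30 mph and 90 mph over equal distances) is a direct instance of the harmonic mean. A minimal Python sketch, with an illustrative helper name, that compares it with the naive arithmetic mean:

```python
def harmonic_mean(rates):
    """n / (sum of 1/rate(i)); appropriate when each rate covers equal work."""
    return len(rates) / sum(1.0 / r for r in rates)


speeds_mph = [30.0, 90.0]  # equal 10-mile legs

naive = sum(speeds_mph) / len(speeds_mph)  # arithmetic mean, the wrong answer
correct = harmonic_mean(speeds_mph)        # matches total distance / total time

print(naive, correct)
```

The harmonic mean matches total distance over total time only because both legs cover the same distance; for unequal amounts of work per rate a weighted form would be needed.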

SLIDE 40

Dealing with Ratios

  • If we take ratios with respect to machine A

Execution times:

             Machine A   Machine B
Program 1            1          10
Program 2         1000         100
Total             1001         110

Ratios to machine A:

             Machine A   Machine B
Program 1            1          10
Program 2            1         0.1

SLIDE 41

Dealing with Ratios

  • Average for machine A is 1, average for machine B is 5.05
  • If we take ratios with respect to machine B

             Machine A   Machine B
Program 1          0.1           1
Program 2           10           1
Average           5.05           1

  • Can’t both be true!!!
  • Don’t use arithmetic mean on ratios!

SLIDE 42

Geometric Mean

  • Use geometric mean for ratios
  • Geometric mean of ratios =

$$\sqrt[n]{\prod_{i=1}^{n} \text{ratio}(i)}$$

  • Independent of reference machine
  • In the example, GM for machine A is 1, for machine B is also 1

– Normalized with respect to either machine
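The reference-machine problem can be checked numerically. This minimal Python sketch normalizes the example's execution times to each machine in turn and compares the arithmetic and geometric means of the resulting ratios (helper names are illustrative):

```python
import math


def ratios(times, reference_times):
    """Per-program execution time normalized to a reference machine."""
    return [t / r for t, r in zip(times, reference_times)]


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    """n-th root of the product of the values."""
    return math.prod(values) ** (1.0 / len(values))


times_a = [1.0, 1000.0]   # Machine A: Program 1, Program 2
times_b = [10.0, 100.0]   # Machine B: Program 1, Program 2

b_relative_to_a = ratios(times_b, times_a)   # [10, 0.1]
a_relative_to_b = ratios(times_a, times_b)   # [0.1, 10]

# Arithmetic mean: each machine looks about 5x slower than the other.
print(arithmetic_mean(b_relative_to_a), arithmetic_mean(a_relative_to_b))

# Geometric mean: the same answer whichever machine is the reference.
print(geometric_mean(b_relative_to_a), geometric_mean(a_relative_to_b))
```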

SLIDE 43

But…

  • GM of ratios is not proportional to total time
  • AM in example says machine B is 9.1 times faster
  • GM says they are equal
  • If we took total execution time, A and B are equal only if

– Program 1 is run 100 times more often than program 2

  • Generally, GM will mispredict for three or more machines
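The 100:1 condition can be verified by weighting each program's time by how often it runs. A minimal Python sketch, where the run counts are the only inputs beyond the example's times and the function name `total_time` is illustrative:

```python
def total_time(times, run_counts):
    """Total execution time when program i runs run_counts[i] times."""
    return sum(t * c for t, c in zip(times, run_counts))


times_a = [1, 1000]   # Machine A: Program 1, Program 2
times_b = [10, 100]   # Machine B: Program 1, Program 2

# Each program run once: B wins by about 9.1x.
print(total_time(times_a, [1, 1]), total_time(times_b, [1, 1]))

# Program 1 run 100 times per run of Program 2: the totals match.
print(total_time(times_a, [100, 1]), total_time(times_b, [100, 1]))
```

Setting x + 1000 = 10x + 100 for x runs of Program 1 gives x = 100, so the geometric mean's verdict of "equal" corresponds to exactly one workload mix.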

SLIDE 44

Summary

  • Use AM for times
  • Use HM if forced to use rates
  • Use GM if forced to use ratios
  • Best of all, use unnormalized numbers to compute time

SLIDE 45

Benchmarks: SPEC2000

  • System Performance Evaluation Cooperative

– Formed in 80s to combat benchmarketing
– SPEC89, SPEC92, SPEC95, SPEC2000, SPEC2006

  • 12 integer and 14 floating-point programs

– Sun Ultra-5 300MHz reference machine has score of 100
– Report GM of ratios to reference machine

SLIDE 46

Benchmarks: SPEC CINT2000

Benchmark      Description
164.gzip       Compression
175.vpr        FPGA place and route
176.gcc        C compiler
181.mcf        Combinatorial optimization
186.crafty     Chess
197.parser     Word processing, grammatical analysis
252.eon        Visualization (ray tracing)
253.perlbmk    PERL script execution
254.gap        Group theory interpreter
255.vortex     Object-oriented database
256.bzip2      Compression
300.twolf      Place and route simulator

SLIDE 47

Benchmarks: SPEC CFP2000

Benchmark      Description
168.wupwise    Physics/Quantum Chromodynamics
171.swim       Shallow water modeling
172.mgrid      Multi-grid solver: 3D potential field
173.applu      Parabolic/elliptic PDE
177.mesa       3-D graphics library
178.galgel     Computational Fluid Dynamics
179.art        Image Recognition/Neural Networks
183.equake     Seismic Wave Propagation Simulation
187.facerec    Image processing: face recognition
188.ammp       Computational chemistry
189.lucas      Number theory/primality testing
191.fma3d      Finite-element Crash Simulation
200.sixtrack   High energy nuclear physics accelerator design
301.apsi       Meteorology: Pollutant distribution

SLIDE 48

Benchmark Pitfalls

  • Benchmark not representative

– Your workload is I/O bound, SPEC is useless

  • Benchmark is too old

– Benchmarks age poorly; benchmarketing pressure causes vendors to optimize compiler/hardware/software to benchmarks
– Need to be periodically refreshed

SLIDE 49

Instruction Set Design

  • How complex should an instruction be?
  • Church’s thesis: A very primitive computer can compute anything that a fancy computer can compute – you need only arithmetic/logical functions, read and write memory, and data-dependent decisions
  • Therefore, ISA selected for practical reasons:

– Performance and cost, not computability

  • Regularity tends to improve both

– E.g. H/W to handle arbitrary number of operands is complex and slow and UNNECESSARY

28 Aug 2012 EE-717/EE-453@IITB 49