


Caltech CS184b Winter2001 -- DeHon 1

CS184b: Computer Architecture [Single Threaded Architecture: abstractions, quantification, and optimizations]

Day16: March 8, 2001 Review and Retrospection


Today

  • This Quarter

– What is Architecture?
– Why?
– Optimizations w/in Model
– Themes

  • Next Quarter

– beyond a single thread of control

  • Admin: final

CS184 Sequence

  • A - structure and organization

– raw components, building blocks
– design space

  • B - single threaded architecture

– emphasis on abstractions and optimizations including quantification

  • C - multithreaded architecture


“Architecture”

  • “attributes of a system as seen by the programmer”
  • “conceptual structure and functional behavior”
  • Defines the visible interface between the hardware and software
  • Defines the semantics of the program (machine code)


Architecture distinguished from Implementation

  • IA32 architecture vs.

– 80486DX2, AMD K5, Intel Pentium-II-700

  • VAX architectures vs.

– 11/750, 11/780, uVax-II

  • PowerPC vs.

– PPC 601, 604, 630 …

  • Alpha vs.

– EV4, 21164, 21264, …

  • Admits to many different implementations of a single architecture


Value?

  • Abstraction
  • Effort

– human brain time is key bottleneck/scarce resource in exploiting modern computing technology

  • Economics
  • Software Distribution
  • capture and package meaning

– pragmatic consequence of the failure of software engineering


Fixed Points

  • Must “fix” the interface
  • Trick is picking what to expose in the interface and fix, and what to hide

  • What are the “fixed points?”

– how you describe the computation
– primitive operations the machine understands
– primitive data types
– interface to memory, I/O
– interface to system routines?


Abstract Away?

  • Specific sizes

– what fits in on-chip memory
– available memory (to some extent)
– number of peripherals
– 0, 1, infinity

  • Timing

– individual operations
– resources (e.g. memory)


Optimizations

  • Simple Sequential Model
  • Pipeline

– hazards, interlocking

  • Multiple Instructions / Cycle

– out of order completion, issue
– renaming, scoreboarding

  • Branch prediction, predication
  • Speculation
  • Memory Optimization
  • Translation to different underlying org.


Simple Seq. Model

Do one instruction completely; then do next instruction.
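
The sequential model can be sketched as a tiny interpreter. This is only an illustration: the three-operand instruction format `(op, dst, a, b)`, the opcode names, and `run_sequential` are hypothetical and not taken from the course. Each instruction reads its operands, computes, and writes its result before the next one starts.

```python
# Toy sequential machine: every instruction finishes completely
# (read operands, compute, write result) before the next one begins.
# The instruction format and opcode names are hypothetical.

def run_sequential(program, regs):
    """Execute (op, dst, a, b) tuples one at a time, in program order."""
    ops = {
        "add": lambda x, y: x + y,
        "sub": lambda x, y: x - y,
        "mul": lambda x, y: x * y,
    }
    pc = 0
    while pc < len(program):
        op, dst, a, b = program[pc]
        regs[dst] = ops[op](regs[a], regs[b])  # result is visible to the next instruction
        pc += 1
    return regs


program = [
    ("add", "r3", "r1", "r2"),  # r3 = r1 + r2
    ("mul", "r4", "r3", "r3"),  # r4 = r3 * r3, sees the new r3
    ("sub", "r1", "r4", "r2"),  # r1 = r4 - r2
]
result = run_sequential(program, {"r1": 2, "r2": 3, "r3": 0, "r4": 0})
print(result)
```

Any optimized implementation (pipelined, multiple issue, out of order) has to produce the same final register state as this loop.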


Pipeline

  • [todo: draw DP with bypass muxes]


Pipelining

  • Watch Data Hazards: bypass, stall
  • Watch Control Hazards:

– minimize cycle, predict, flush

  • Watch Exceptions: in-order retire to state
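
The effect of bypassing on data-hazard stalls can be estimated with a simplified model. The sketch below assumes an in-order, single-issue pipeline and uses latencies typical of a textbook five-stage design (an assumption, not taken from the slides): with bypassing, an ALU result is usable by the next instruction and a load result one cycle later; without bypassing, a consumer must wait three cycles after the producer's execute stage. The function name and instruction tuples are hypothetical.

```python
def stall_cycles(program, bypass):
    """Count data-hazard stalls for a list of (kind, dst, srcs) instructions.

    kind is 'alu' or 'load'. Latencies are assumed textbook values.
    """
    ready = {}   # register -> earliest execute cycle at which a consumer may use it
    cycle = -1   # execute cycle of the previous instruction
    for kind, dst, srcs in program:
        earliest = cycle + 1                      # in-order, one instruction per cycle
        for s in srcs:
            earliest = max(earliest, ready.get(s, 0))
        cycle = earliest
        if bypass:
            latency = 2 if kind == "load" else 1  # load-use costs one bubble
        else:
            latency = 3                           # wait for register write-back
        ready[dst] = cycle + latency
    return cycle - (len(program) - 1)             # cycles beyond the ideal schedule


program = [
    ("load", "r1", ["r0"]),
    ("alu", "r2", ["r1", "r1"]),  # uses the load result immediately
    ("alu", "r3", ["r2", "r1"]),  # uses the ALU result immediately
    ("alu", "r4", ["r0", "r0"]),  # independent
]
with_bypass = stall_cycles(program, bypass=True)
without_bypass = stall_cycles(program, bypass=False)
print(with_bypass, without_bypass)
```

Under these assumptions, bypassing leaves only the load-use bubble, while the unbypassed pipeline stalls on both dependences.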

ILP (available)

Hennessy and Patterson 4.38


Supporting ILP

[Figure: datapath diagram; block labels: IF, ID, Reorder, Bypass, EX, ALU, MPY, LD/ST, RF]

  • Rename
  • Scoreboard
  • Reorder

ILP Challenges: e.g. Window Size

[Hennessy and Patterson 4.39] There’s quite a bit of non-local parallelism.


Branching

  • Makes stalls expensive

– potential ILP limiter
– e.g.

  • with 7 instructions / branch
  • issue 7 instructions, hit branch, stall for instructions to complete...

  • Fisher: Instructions/mispredict: 40-160

– even with different data sets

  • Predication: avoid losing trace on small, unpredictable branches

– can be better to do both than branch wrong
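
The claim that doing both paths can beat branching wrong can be checked with a rough cost model. The sketch below assumes one instruction per cycle and a fixed misprediction penalty; the function names, path lengths, probabilities, and the 10-cycle penalty are illustrative assumptions, not figures from the course.

```python
def branch_cost(len_then, len_else, p_then, p_mispredict, penalty):
    """Expected cycles when branching: one path runs, plus misprediction cost."""
    expected_path = p_then * len_then + (1 - p_then) * len_else
    return expected_path + p_mispredict * penalty


def predicated_cost(len_then, len_else):
    """Cycles when both paths are issued and a predicate selects the result."""
    return len_then + len_else


def break_even_mispredict(len_then, len_else, p_then, penalty):
    """Misprediction rate above which predication is cheaper than branching."""
    expected_path = p_then * len_then + (1 - p_then) * len_else
    return (predicated_cost(len_then, len_else) - expected_path) / penalty


# Small if/else: 2-instruction and 3-instruction paths, 10-cycle penalty (assumed).
unpredictable = branch_cost(2, 3, 0.5, p_mispredict=0.30, penalty=10)
predictable = branch_cost(2, 3, 0.5, p_mispredict=0.05, penalty=10)
both_paths = predicated_cost(2, 3)
threshold = break_even_mispredict(2, 3, 0.5, penalty=10)
print(unpredictable, predictable, both_paths, threshold)
```

With these numbers the predicated version wins only when the branch mispredicts more than about a quarter of the time, which is why predication targets small, unpredictable branches.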


Two Control Options

  • Local control

– unify choices

  • build all options into spatial compute structure and select operation

  • Instruction selection

– provide a different instruction (instruction sequence) for each option
– selection occurs when choosing which instruction(s) to issue

[Slide annotations: from CS184a; local control = Predication; instruction selection = Branching]


Predication: Quantification


Memory System

  • Motivation for Caching

– fast memories small
– large memories slow
– need large memories
– speed of small w/ capacity/density of large

  • Programs need frequent memory access

– e.g. 20% load operations
– fetch required for every instruction

  • Memory is the performance bottleneck?

– Programs run slow?


Multi-Level Numbers

  • L1, 1ns, 4KB, 10% miss
  • L2, 5ns, 128KB, 1% miss
  • Main, 50ns
  • No Cache CPI=Base+0.3*50=Base+15
  • L1 only CPI=Base+0.3*0.1*50=Base +1.5
  • L2 only CPI=Base+0.3*(0.99*4+0.01*50)≈Base+1.34
  • L1/L2 CPI=Base+(0.3*0.1*5 + 0.01*50)=Base+0.65
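
The four CPI expressions above can be evaluated directly. The sketch below transcribes them as written on the slide, assuming a 1 ns cycle so that latencies in ns equal cycles; the variable names are hypothetical, and reading the `4` in the L2-only case as "L2 latency minus the one cycle already in the base" is an interpretation. Evaluated as written, the L2-only expression comes to about 1.34.

```python
REFS_PER_INSTR = 0.3              # memory references per instruction (from the slide)
L1_MISS, L2_MISS = 0.10, 0.01     # miss rates (from the slide)
L2_CYCLES, MAIN_CYCLES = 5, 50    # latencies, assuming a 1 ns cycle

# CPI added on top of the base CPI, one entry per configuration on the slide.
added_cpi = {
    "no cache": REFS_PER_INSTR * MAIN_CYCLES,
    "L1 only": REFS_PER_INSTR * L1_MISS * MAIN_CYCLES,
    "L2 only": REFS_PER_INSTR
    * ((1 - L2_MISS) * (L2_CYCLES - 1) + L2_MISS * MAIN_CYCLES),
    "L1/L2": REFS_PER_INSTR * L1_MISS * L2_CYCLES + L2_MISS * MAIN_CYCLES,
}

for name, extra in added_cpi.items():
    print(f"{name:8s} CPI = Base + {extra:.2f}")
```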


Themes for Quarter

  • Recurring

– “cached” answers and change
– merit analysis (cost/performance)
– dominant/bottleneck resource requirements
– structure/common case

  • common case fast
  • fast case common
  • correct in every case

– exploit freedom in application
– virtualization


Themes for Quarter

  • New/new focus

– measurement
– abstractions/semantics
– abstractions 0, 1, infinity
– dynamic data/event handling (vs. static)
– binding times

  • compile-time vs. run-time
  • …now load time (JIT), during execution

– predictability (avg. vs. worst case)

  • feedback

– translation


More Themes

  • Primitives
  • Simplicity

– of model
– of implementation


Model and Quantitative

  • Have a model which defines the semantics

– correct behavior/operation

  • Any implementation which provides same semantics is acceptable

  • Creates freedom to optimize and quantify

– make changes / hypothesized optimization
– measure results
– benchmarks relative to model
– simple to change implementation below visible fixed point


Equations for Opt. And Understanding

  • Time = (Instructions)(Cycles/Instruction)(Cycle Time)
  • CPI = 1 + P_stall (Stall Cycles) + P_br-mispredict (Branch Penalty)
  • CPI = Base CPI + (Refs/Instr)(Miss Rate)(Miss Latency)
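
The three equations compose naturally: the pipeline equation gives a base CPI, the memory equation adds miss cost, and the time equation turns the result into seconds. The sketch below is a worked example only; the function names and all input numbers (stall and misprediction probabilities per instruction, penalties, instruction count, cycle time) are hypothetical.

```python
def cpi_pipeline(p_stall, stall_cycles, p_mispredict, branch_penalty):
    """CPI = 1 + P_stall * (stall cycles) + P_br-mispredict * (branch penalty)."""
    return 1 + p_stall * stall_cycles + p_mispredict * branch_penalty


def cpi_memory(base_cpi, refs_per_instr, miss_rate, miss_latency):
    """CPI = base CPI + (refs/instr) * (miss rate) * (miss latency)."""
    return base_cpi + refs_per_instr * miss_rate * miss_latency


def exec_time(instructions, cpi, cycle_time):
    """Time = (instructions) * (cycles/instruction) * (cycle time)."""
    return instructions * cpi * cycle_time


# Hypothetical machine: 20% of instructions stall 1 cycle, 2% are
# mispredicted branches costing 10 cycles, 0.3 refs/instr with 10% misses
# at 50 cycles each, 1e9 instructions on a 1 ns cycle.
base = cpi_pipeline(0.2, 1, 0.02, 10)
total = cpi_memory(base, 0.3, 0.1, 50)
seconds = exec_time(1e9, total, 1e-9)
print(base, total, seconds)
```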


Binding Time

  • Hoist code out of heavy use region if at all possible to do earlier

– loop invariants out of loops
– instruction decoding/interpretation out of commonly run regions of code
– scheduling decisions from runtime

  • to compile time
  • to one-time runtime translation
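
Loop-invariant hoisting is the simplest instance of binding a value earlier. The sketch below is illustrative only (the function names and the scaling example are hypothetical): the first version recomputes a value that never changes inside the loop, the second binds it once before the loop.

```python
import math


def scale_naive(values, angle_deg):
    """Recomputes the invariant factor on every iteration."""
    out = []
    for v in values:
        factor = math.cos(math.radians(angle_deg))  # same result every time
        out.append(v * factor)
    return out


def scale_hoisted(values, angle_deg):
    """Binds the invariant factor once, outside the heavily used region."""
    factor = math.cos(math.radians(angle_deg))
    return [v * factor for v in values]


data = [1.0, 2.0, 3.0]
assert scale_naive(data, 60) == scale_hoisted(data, 60)
print(scale_hoisted(data, 60))
```

Both versions have the same semantics; only the time at which `factor` is computed moves, which is the freedom the slide is pointing at.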

Binding Time Optimization Prospects

  • Translation vs. Emulation

– Ttrun = Ttrans + nTop
– Ttrans > Tem_op > Top

  • If compute long enough

– nTop >> Ttrans
– → amortize out load
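
The amortization argument can be made concrete with a break-even calculation. The slide gives the translated run time (translation cost plus n native operations); the sketch below additionally assumes that pure emulation costs n times the per-operation emulation time, which is an assumption of this example. The function names and the time values are hypothetical, chosen only to satisfy the slide's ordering (translation cost > emulated op > native op).

```python
import math


def emulation_time(n, t_em_op):
    """Assumed model: every operation is emulated, none translated."""
    return n * t_em_op


def translated_time(n, t_trans, t_op):
    """From the slide: one-time translation plus n native operations."""
    return t_trans + n * t_op


def break_even_ops(t_trans, t_em_op, t_op):
    """Smallest n for which translating first is no slower than emulating."""
    return math.ceil(t_trans / (t_em_op - t_op))


T_TRANS, T_EM_OP, T_OP = 1000, 20, 1   # hypothetical time units
n_star = break_even_ops(T_TRANS, T_EM_OP, T_OP)
print(n_star, emulation_time(n_star, T_EM_OP), translated_time(n_star, T_TRANS, T_OP))
```

Below the break-even count emulation is cheaper; far above it the translation cost is a negligible fraction of the run.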


Common Case (Structure)

  • Simple Instructions (fast)
  • Non-conflicting/interlocking Instructions
  • Fast/Small memory

– temporal, spatial locality, TLBs

  • Predictable control flow

– branch predict, exceptions

  • Speculation on probable properties

– trace direction, no aliasing, …

  • Compiled/optimized code

– frequently executed regions


Bottlenecks?

  • ALU/functional units?
  • Feeding data to operators

– bandwidth
– latency

  • Parallelism

– that we can expose cheaply

  • Figuring out where to go next

– accuracy
– decision latency


Freedom in Applications

  • DAG Scheduling of operations

– linearization, trace scheduling
– increase parallelism

  • hide latency, overlap operations

– promote locality

  • Assignment of data to

– registers, addresses, pages
– increase locality, decrease conflicts

  • Assign operations to ALUs

– increase locality, reduce communication
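
The scheduling freedom in a DAG of operations can be seen by comparing a linearized order with an as-soon-as-possible schedule. The sketch below assumes unit-latency operations and an unlimited number of functional units (both simplifying assumptions); `asap_schedule` and the example graph are hypothetical.

```python
def asap_schedule(deps):
    """Map each op to its earliest cycle, given op -> list of predecessor ops.

    Assumes an acyclic graph, unit latency, and unlimited functional units.
    """
    cycle = {}

    def visit(op):
        if op not in cycle:
            # An op can start one cycle after its latest predecessor.
            cycle[op] = 1 + max((visit(d) for d in deps[op]), default=-1)
        return cycle[op]

    for op in deps:
        visit(op)
    return cycle


# g = (a + b) * (c + d), with a..d loaded from memory.
deps = {
    "a": [], "b": [], "c": [], "d": [],
    "e": ["a", "b"],   # e = a + b
    "f": ["c", "d"],   # f = c + d
    "g": ["e", "f"],   # g = e * f
}
schedule = asap_schedule(deps)
parallel_cycles = max(schedule.values()) + 1
sequential_cycles = len(deps)
print(schedule, parallel_cycles, sequential_cycles)
```

Seven operations fit in three cycles here; a real scheduler then has to trade this parallelism against the actual number of ALUs and the locality of the operands.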


0, 1, Infinity

  • Virtual Memory

– abstract out physical capacity

  • Traditional RISC/CISC

– single operator per cycle (model)

  • ILP/EPIC operator exploitation

– arbitrary number of functional units

  • Registers do not have this property


Feedback

  • Discover the common case

– the common case for this application
– ...this run of this application

  • Branch predictability/control flow
  • commonly run pieces of code

– hotspots

  • typical aliasing
  • latencies/capacities...
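
One familiar form of run-time feedback on branch predictability is a two-bit saturating counter. It is used here as a standard illustration rather than as something these slides specify; the function name and the example branch history are hypothetical.

```python
def predict_two_bit(outcomes, state=2):
    """Count correct predictions of a 2-bit saturating counter for one branch.

    States 0-1 predict not taken, states 2-3 predict taken.
    """
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        correct += int(predicted_taken == taken)
        # Feedback: move toward the observed outcome, saturating at 0 and 3.
        state = min(3, state + 1) if taken else max(0, state - 1)
    return correct


# A loop branch: taken 9 times, then falls through once, repeated 3 times.
history = ([True] * 9 + [False]) * 3
hits = predict_two_bit(history)
print(hits, len(history))
```

The counter learns the common case (taken) for this run and mispredicts only the loop exits.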

Computer Architecture Parallel to Parthenon Critique

  • Are we making:

– copies in submicron CMOS
– of copies in early NMOS
– of copies in discrete TTL
– of vacuum tube computers?


Should we still build computers the way we did in 1967?

“Yesterday’s solution becomes today’s historical curiosity.” -- Goldratt

In 1983?


Old vs. New?

  • Sequential ISA
  • virtual memory
  • caches
  • Multiple functional units, ILP
  • Register Renaming
  • date back to 60’s

  • Predication
  • Feedback
  • EPIC/VLIW
  • Speculation
  • Binary Translation
  • last 10 years


EPIC

  • New model

– not strictly sequential instructions
– still have sequential semantics

  • control flow
  • memory access

Compiler

  • Increasing sophistication
  • Increasing reliance upon

– RISC

  • code motion, scheduling, register assignment

– VLIW/EPIC

  • trace scheduling, parallelism and likelihood management

– Speculation

  • more aggressive transformation

– JIT/Binary Translation

  • takes over as means to provide model


CS184 Sequence

  • A - structure and organization

– raw components, building blocks
– design space

  • B - single threaded architecture

– emphasis on abstractions and optimizations including quantification

  • C - multithreaded architecture

Spring Quarter

  • Alternate models

– single threaded, single memory model
– ...was a big limitation for us

  • Greater parallelism (beyond ILP)

– data
– coarse-grained


Next Quarter

  • Multithreaded Abstractions, Optimization, and Structures

– dataflow
– multithreaded
– message passing
– shared memory
– vector/SIMD (could be single threaded)
– multiprocessor interconnect
– defect and fault tolerance (also single thread)


Admin

  • Final

– out Sunday evening
– due Friday (3/16) 5pm
– similar to last time

  • open book, notes…
  • work alone
  • no time restrictions beyond getting it done by Friday 5pm


Big Ideas

  • Architectural abstraction

– define the fixed points
– stable abstraction to programmer
– admit to variety of implementation
– ease adoption/exploitation of new hardware
– reduce human effort


Big Ideas

  • Optimize beneath abstraction

– exploit freedom of implementation
– exploit binding time
– exploit structure and common case

  • Identify bottlenecks
  • Cost/benefits analysis

– quantify tradeoffs and options


End of Line

(MCP)