

SLIDE 1

Lecture 1: Introduction

CprE 585 Advanced Computer Architecture, Fall 2003
Zhao Zhang

Traditional “Computer Architecture”

The term architecture is used here to describe the attributes of a system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flow and controls, the logic design, and the physical implementation.

Gene Amdahl, IBM Journal R&D, April 1964

Contemporary “Computer Architecture”

Instruction set architecture: program-visible instruction set

Instruction format, memory addressing modes, architectural registers

EX: RISC, CISC, VLIW, EPIC

Organization: high-level aspects of a computer’s design

Pipeline stages, instruction scheduling, cache, memory, disks, buses, etc.

Implementations: the specifics of a machine

Logic design, packaging technology

Comprehensive Course Contents

Fundamentals
Processor architecture
Memory hierarchy
I/O systems
Multiprocessors
Multicomputers

Contents of This Course

1. Fundamentals: ISA design principles, evaluation methodology, market factors in computer design

2. Processor architecture: we will focus on ILP techniques of modern superscalar processors
  • Multiple issue
  • Dynamic scheduling
  • Speculative execution
  • Non-blocking loads/stores

3. Memory hierarchy
  • Cache basics
  • Multi-level caches and memory system designs
  • Advanced cache techniques

4. Brief coverage of VLIW and EPIC processors, storage systems, and multiprocessors

5. Selected research topics: multithreaded processors, embedded processors, low-power architectures, etc.

Your Background

Some digital design knowledge
RISC instruction set architecture (MIPS)
Arithmetic design
Control and data path design
Single-cycle processor implementation
Multi-cycle implementation
Pipelined implementation

SLIDE 2

The CPU Performance Equation

CPU time = #Inst × CPI × Clock cycle time
CPI = CPI_ideal + CPI_control hazard + CPI_data hazard

Instruction-level Parallelism (ILP)

LD    F2, 45(R3)
MULTI F0, F2, F4
LD    F6, 34(R2)
SUBD  F8, F6, F2
DIVD  F10, F0, F6
ADD   F6, F8, F2

(Dataflow graph: LD1 and LD2 can start immediately; MULTI and SUBD depend on the loads; DIVD and ADD in turn depend on MULTI and SUBD.)
Given infinite resources, how fast can the processor run the code?
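To make the question concrete, here is a small sketch (mine, not the course's code) that extracts the RAW dependences of the sequence above and computes its dataflow depth, assuming unit latency and infinite resources:

```python
# Sketch: RAW-dependence analysis of the slide's instruction sequence,
# assuming every instruction takes one cycle and resources are infinite.
insts = [
    ("LD",    "F2",  ["R3"]),        # LD    F2, 45(R3)
    ("MULTI", "F0",  ["F2", "F4"]),  # MULTI F0, F2, F4
    ("LD",    "F6",  ["R2"]),        # LD    F6, 34(R2)
    ("SUBD",  "F8",  ["F6", "F2"]),  # SUBD  F8, F6, F2
    ("DIVD",  "F10", ["F0", "F6"]),  # DIVD  F10, F0, F6
    ("ADD",   "F6",  ["F8", "F2"]),  # ADD   F6, F8, F2
]

depth = []
last_writer = {}  # register -> index of its most recent producer
for i, (op, dst, srcs) in enumerate(insts):
    preds = [last_writer[r] for r in srcs if r in last_writer]  # RAW deps
    depth.append(1 + max((depth[p] for p in preds), default=0))
    last_writer[dst] = i

print(depth)       # dataflow level of each instruction: [1, 2, 1, 2, 3, 3]
print(max(depth))  # critical-path length in cycles: 3
```

With unit latencies the critical path (e.g. LD, then MULTI, then DIVD) is three levels deep, so even infinite resources cannot finish the code in fewer than three cycles.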

Multi-issue

(Pipeline diagram: each instruction flows through IF ID EX MEM WB, with stall checks and data forwarding between overlapping instructions; single-issue vs. two-way issue.)
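The CPU performance equation above can be evaluated directly; the sketch below uses made-up numbers (the instruction count, hazard CPIs, and clock rate are assumptions, not course data):

```python
# Sketch of the slide's equations with made-up numbers:
#   CPU time = #Inst x CPI x Clock cycle time
#   CPI = CPI_ideal + CPI_control_hazard + CPI_data_hazard
def cpu_time(n_inst, cpi_ideal, cpi_control, cpi_data, cycle_time_s):
    cpi = cpi_ideal + cpi_control + cpi_data
    return n_inst * cpi * cycle_time_s

# Assumed: 10**9 instructions, single-issue pipeline (CPI_ideal = 1.0),
# 0.3 CPI lost to control hazards, 0.2 to data hazards, 1 GHz clock.
t = cpu_time(1e9, 1.0, 0.3, 0.2, 1e-9)
print(round(t, 2))  # 1.5 (seconds)
```

A two-way issue machine can push CPI_ideal below 1.0, but the hazard terms then tend to grow, which is why the ILP techniques below matter.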

Static and Dynamic Scheduling and VLIW

Static scheduling: instructions execute in program order
Dynamic scheduling: instructions may execute out of order
VLIW: dumb hardware; the compiler determines the scheduling
How many cycles in each case?

LD    F2, 45(R3)
MULTI F0, F2, F4
LD    F6, 34(R2)
SUBD  F8, F6, F2
DIVD  F10, F0, F6
ADD   F6, F8, F2
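One way to see the cycle counts is a toy issue model. The latencies below (LD 2, MULTI 4, SUBD/ADD 2, DIVD 8 cycles) are assumptions for illustration, not the course's numbers, and WAR/WAW hazards are assumed renamed away, as Tomasulo's algorithm does:

```python
# Toy issue model (assumed latencies, not course data); WAR/WAW hazards
# are ignored, as if registers were renamed.
LAT = {"LD": 2, "MULTI": 4, "SUBD": 2, "DIVD": 8, "ADD": 2}
insts = [  # (op, destination, source registers); load addresses omitted
    ("LD", "F2", []), ("MULTI", "F0", ["F2", "F4"]), ("LD", "F6", []),
    ("SUBD", "F8", ["F6", "F2"]), ("DIVD", "F10", ["F0", "F6"]),
    ("ADD", "F6", ["F8", "F2"]),
]

def finish_times(in_order):
    """Cycle at which each result is ready.
    in_order=True: issue one instruction per cycle, stalling until
    operands are ready (static scheduling). in_order=False: start as
    soon as operands are ready (idealized dynamic scheduling)."""
    ready_at, fins, prev_start = {}, [], -1
    for op, dst, srcs in insts:
        operands = max((ready_at.get(r, 0) for r in srcs), default=0)
        start = max(operands, prev_start + 1) if in_order else operands
        prev_start = start
        ready_at[dst] = start + LAT[op]
        fins.append(ready_at[dst])
    return fins

print(finish_times(True))   # in-order:     [2, 6, 5, 7, 14, 9]
print(finish_times(False))  # out-of-order: [2, 6, 2, 4, 14, 6]
```

Out-of-order issue lets the independent SUBD and ADD finish earlier; total time here is bound by the LD, MULTI, DIVD chain either way.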

Branch Prediction and Speculative Execution

BEQ   R8, R0, Skip
LD    F2, 45(R3)
MULTI F0, F2, F4
Skip:
LD    F6, 34(R2)
SUBD  F8, F6, F2
DIVD  F10, F0, F6
ADD   F6, F8, F2

Branch outcomes determine the data dependences
Consider typical integer programs: one branch per seven instructions
How much performance loss?
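As a taste of the prediction hardware covered later, here is a sketch of the classic 2-bit saturating-counter predictor (the starting state and the loop example are my assumptions, not the course's):

```python
# Sketch of a 2-bit saturating-counter branch predictor:
# states 0-1 predict not-taken, states 2-3 predict taken.
class TwoBitPredictor:
    def __init__(self, state=2):  # start in "weakly taken" (an assumption)
        self.state = state

    def predict(self):
        return self.state >= 2    # True means "predict taken"

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

# A loop branch taken 9 times and then not taken once:
# the 2-bit counter mispredicts only the final iteration.
p = TwoBitPredictor()
outcomes = [True] * 9 + [False]
mispredicts = 0
for taken in outcomes:
    if p.predict() != taken:
        mispredicts += 1
    p.update(taken)
print(mispredicts)  # 1
```

The two-bit hysteresis is what keeps a loop-closing branch from being mispredicted twice per loop visit, which a 1-bit predictor would do.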

A typical memory hierarchy today: here we focus on the L1/L2/L3 caches and main memory

What Is a Memory Hierarchy?

Proc/Regs → L1-Cache → L2-Cache → L3-Cache (optional) → Memory → Disk, Tape, etc.
(Faster toward the processor; bigger toward the bottom.)
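One standard way to quantify a hierarchy like this (not shown on the slide) is average memory access time; the sketch below uses assumed hit times and miss rates:

```python
# Sketch: average memory access time (AMAT) through a cache hierarchy.
# All hit times and miss rates below are assumptions for illustration.
def amat(levels, memory_time):
    """levels: list of (hit_time, local_miss_rate) from L1 downward;
    memory_time: access time once every cache level has missed."""
    time, reach = 0.0, 1.0  # reach = fraction of accesses reaching this level
    for hit_time, miss_rate in levels:
        time += reach * hit_time
        reach *= miss_rate
    return time + reach * memory_time

# Assumed: L1 = 1 cycle with 5% misses, L2 = 10 cycles with 20% local
# misses, main memory = 100 cycles.
print(amat([(1, 0.05), (10, 0.20)], 100))  # 2.5 cycles on average
```

Equivalently: 1 + 0.05 × (10 + 0.20 × 100) = 2.5 cycles, even though a raw memory access costs 100.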

SLIDE 3

1980: no cache in µprocessors; 1995: two-level cache on chip (1989: first Intel µprocessor with an on-chip cache)

Why Memory Hierarchy?

(Chart, 1980–2000: processor performance improves ~60%/yr while DRAM improves ~7%/yr, so the processor-memory performance gap grows about 50% per year; labeled “Moore’s Law” in the plot.)
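The slide's growth rates imply the ~50%/yr gap directly:

```python
# Quick arithmetic on the slide's growth rates: processors improve
# ~60%/yr and DRAM ~7%/yr, so the gap between them compounds.
gap_growth = 1.60 / 1.07
print(round(gap_growth, 2))  # 1.5, i.e. the gap grows roughly 50% per year

# Compounded over the 1980-2000 span of the plot:
print(round(gap_growth ** 20))
```

Over two decades that ratio compounds into a gap of several thousand times, which is why caches went from optional in 1980 to multi-level and on-chip by 1995.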

What Else in This Course

VLIW and EPIC processors
Multiprocessors
Storage systems
Selected advanced topics (tentative list):
Simultaneous multithreading processors
Embedded processors
Modeling
Dependability and security
…

Course Schedule by Weeks (Subject to Changes)

Week 1. Introduction; performance evaluation
Week 2. ISA (lab day)
Week 3. Review of MIPS pipeline; Tomasulo algorithm
Week 4. Tomasulo algorithm; Alpha 21264 instruction scheduling
Week 5. Branch prediction and speculative execution
Week 6. Memory load/store unit designs
Week 7. Real examples of superscalar processors
Week 8. Cache fundamentals
Week 9. Cache optimization techniques
Week 10. Virtual memory; exam
Weeks 11–15. Advanced topics, student presentations

Course Projects

You will work in groups of two.

Preliminary project: get warmed up

Verilog Project 1: Dynamic instruction scheduling
Tomasulo algorithm
Alpha 21264 instruction scheduling

Verilog Project 2: Branch prediction and speculative execution
Branch prediction table, branch target buffer
Recovery through reorder buffer

Verilog Project 3: Cache and TLB
Direct-mapped cache
Direct-mapped TLB

Final Project: on selected research topics
Re-evaluate an existing study, or survey a topic
Includes proposal, presentation, and final report

Verilog Code Sketch

module cpu (reset, cycle, clock);  // Tomasulo with MIPS32
  …
  /* stage 1: instruction fetch */
  inst_fetch M1(/* request */ fetch_req, /* ok */ fetch_ok, /* pc */ pc,
                /* inst */ inst, /* reset */ reset, /* branch */ 0,
                /* branch target */ 0, /* clock */ clock);
  /* stage 2: rename, register read, issue */
  rename M2(fetch_req, …);
  /* stage 3: execute */
  adder M3(request, …);  // FU adder with reservation stations
  …
  /* stage 4: write back */
  …
endmodule

Still under construction

Syllabus, Class web site, WebCT

Syllabus
  Course schedule
  Textbook and references
  Projects
  Homework
  Exam
  Grading
On the class web site (linked from my home page)
  Check announcements
  Get papers, etc.
On WebCT
  Check your grades
  Join discussions: Verilog programming, project understanding, course contents, homework problems

SLIDE 4

Major Faces in Today’s Market

Some non-technical background for processor design

Desktop computers

Providing desktop computing for individuals
Optimized for price-performance

Servers

Providing larger-scale and more reliable file and computing service

Designed for performance, availability, and scalability

Embedded computers

Lodged in other devices (networking switches, printers, palm devices, cell phones, etc.)

Emphasizing real-time performance requirements
Emphasizing low-cost and low-power design

Technology Trends

Implementation technologies change dramatically

Integrated circuit logic technology
Semiconductor DRAM
Magnetic disk technology
Network technology

ISA must be stable: software is more expensive than hardware

Cost, Price, and Their Trends

Cost and price may determine whether a computer product succeeds in the market
In many cases cost is the single most important factor in design considerations

Add a new feature or not? Trade off performance against cost and price
Especially true for the desktop and embedded markets

Processor Performance Trends; Processor Price Trend (figures)

Cost of IC

Cost of die = Cost of wafer / (Dies per wafer × Die yield)

Cost of IC = (Cost of die + Cost of testing die + Cost of packaging and final test) / Final test yield
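These cost relations are easy to sketch; the wafer cost, dies per wafer, per-die test and packaging costs, and yields below are placeholders, not real data:

```python
# Sketch of the IC cost relations above; all numbers are hypothetical.
def cost_of_die(wafer_cost, dies_per_wafer, die_yield):
    return wafer_cost / (dies_per_wafer * die_yield)

def cost_of_ic(die_cost, test_cost, package_cost, final_test_yield):
    return (die_cost + test_cost + package_cost) / final_test_yield

# Hypothetical: a $5000 wafer with 1300 die sites at 57% die yield,
# $3 testing, $5 packaging, 95% final test yield.
die = cost_of_die(wafer_cost=5000.0, dies_per_wafer=1300, die_yield=0.57)
ic = cost_of_ic(die, test_cost=3.0, package_cost=5.0, final_test_yield=0.95)
print(round(die, 2), round(ic, 2))  # 6.75 15.52
```

Note how both yields appear in the denominator: halving the die yield roughly doubles the die cost, which is why die cost grows much faster than linearly with die area.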

SLIDE 5

Die Yield

Dies per wafer = π × (d/2)² / Die area − π × d / √(2 × Die area)    (d: wafer diameter)

Die yield = Wafer yield × (1 + D × Die area / α)^(−α)    (D: defects per unit area; α: masking level)

Die Yield

Wafer diameter: 30 cm; defect density: 0.6 per cm²; masking level (α): 4

0.7 cm × 0.7 cm: 0.75
1 cm × 1 cm: 0.57
1.5 cm × 1.5 cm: 0.44
2 cm × 2 cm: 0.35

Processor cost is more than linear in performance! The price increase is even more! (See textbook.)
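A quick sketch (assuming a wafer yield of 1, which the slide does not state) reproduces the first two table entries from the formulas above:

```python
# Check the die-yield formula with the slide's parameters:
# D = 0.6 defects/cm^2, alpha = 4, wafer yield assumed to be 1.0.
import math

def die_yield(die_area_cm2, D=0.6, alpha=4, wafer_yield=1.0):
    return wafer_yield * (1 + D * die_area_cm2 / alpha) ** (-alpha)

def dies_per_wafer(die_area_cm2, wafer_diameter_cm=30):
    r = wafer_diameter_cm / 2
    return (math.pi * r * r / die_area_cm2
            - math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2))

print(round(die_yield(0.7 * 0.7), 2))  # 0.75, matching the table
print(round(die_yield(1.0 * 1.0), 2))  # 0.57, matching the table
print(round(dies_per_wafer(1.0)))      # usable die sites on a 30 cm wafer
```

Because die area appears inside a term raised to the power −α, yield falls off sharply with die size, which is the quantitative basis for the "cost is more than linear in performance" remark above.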

DRAM Price Trend (figure)

End of Lecture

Course Strategies

Learn the fundamentals of computer architecture design
Learn the most important aspects (at this time) of computer architecture: superscalar processors and memory hierarchy
Be exposed to the other topics: storage systems and multiprocessors
Appreciate the merits of computer architecture research
Remember: hot topics tomorrow may be different

Notes

Add slides for multiple issue, branch prediction, load/store units, and memory hierarchy
Add slides for Tomasulo and Alpha 21264-like scheduling
Add slides for course scheduling