

SLIDE 1

Lecture 1: Introduction

CprE 585 Advanced Computer Architecture, Fall 2003
Zhao Zhang

Traditional “Computer Architecture”

The term architecture is used here to describe the attributes of a system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flow and controls, the logic design, and the physical implementation.

Gene Amdahl, IBM Journal R&D, April 1964

Contemporary “Computer Architecture”

Instruction set architecture: program-visible instruction set

Instruction format, memory addressing modes, architectural registers

EX: RISC, CISC, VLIW, EPIC

Organization: high-level aspects of a computer’s design

Pipeline stages, instruction scheduling, cache, memory, disks, buses, etc.

Implementations: the specifics of a machine

Logic design, packaging technology

Comprehensive Course Contents

Fundamentals
Processor architecture
Memory hierarchy
I/O systems
Multiprocessors
Multicomputers

Contents of This Course

1. Fundamentals: ISA design principles, evaluation methodology, market factors in computer design

2. Processor architecture: we will focus on ILP techniques of modern superscalar processors
  • Multiple issue
  • Dynamic scheduling
  • Speculative execution
  • Non-blocking loads/stores

3. Memory hierarchy
  • Cache basics
  • Multi-level caches and memory system designs
  • Advanced cache techniques

4. Brief coverage of VLIW and EPIC processors, storage systems, and multiprocessors

5. Selected research topics: multithreaded processors, embedded processors, low-power architectures, etc.

Your Background

Some digital design knowledge
RISC instruction set architecture (MIPS)
Arithmetic design
Control and data path design
Single-cycle processor implementation
Multi-cycle implementation
Pipelined implementation

SLIDE 2

The CPU Performance Equation

CPU time = #Inst × CPI × Clock cycle time
CPI = CPI_ideal + CPI_control hazard + CPI_data hazard

Instruction-level Parallelism (ILP)

LD    F2, 45(R3)
MULTI F0, F2, F4
LD    F6, 34(R2)
SUBD  F8, F6, F2
DIVD  F10, F0, F6
ADD   F6, F8, F2

(Dataflow graph: LD1 and LD2 can start immediately; MULTI and SUBD depend on the loads; DIVD and ADD in turn depend on MULTI and SUBD.)
Given infinite resources, how fast can the processor run the code?
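To make the question concrete, here is a small sketch (mine, not the course's code) that extracts the RAW dependences of the sequence above and computes its dataflow depth, assuming unit latency and infinite resources:

```python
# Sketch: RAW-dependence analysis of the slide's instruction sequence,
# assuming every instruction takes one cycle and resources are infinite.
insts = [
    ("LD",    "F2",  ["R3"]),        # LD    F2, 45(R3)
    ("MULTI", "F0",  ["F2", "F4"]),  # MULTI F0, F2, F4
    ("LD",    "F6",  ["R2"]),        # LD    F6, 34(R2)
    ("SUBD",  "F8",  ["F6", "F2"]),  # SUBD  F8, F6, F2
    ("DIVD",  "F10", ["F0", "F6"]),  # DIVD  F10, F0, F6
    ("ADD",   "F6",  ["F8", "F2"]),  # ADD   F6, F8, F2
]

depth = []
last_writer = {}  # register -> index of its most recent producer
for i, (op, dst, srcs) in enumerate(insts):
    preds = [last_writer[r] for r in srcs if r in last_writer]  # RAW deps
    depth.append(1 + max((depth[p] for p in preds), default=0))
    last_writer[dst] = i

print(depth)       # dataflow level of each instruction: [1, 2, 1, 2, 3, 3]
print(max(depth))  # critical-path length in cycles: 3
```

With unit latencies the critical path (e.g. LD, then MULTI, then DIVD) is three levels deep, so even infinite resources cannot finish the code in fewer than three cycles.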

Multi-issue

(Pipeline diagram: each instruction flows through IF ID EX MEM WB, with stall checks and data forwarding between overlapping instructions; single-issue vs. two-way issue.)
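The CPU performance equation above can be evaluated directly; the sketch below uses made-up numbers (the instruction count, hazard CPIs, and clock rate are assumptions, not course data):

```python
# Sketch of the slide's equations with made-up numbers:
#   CPU time = #Inst x CPI x Clock cycle time
#   CPI = CPI_ideal + CPI_control_hazard + CPI_data_hazard
def cpu_time(n_inst, cpi_ideal, cpi_control, cpi_data, cycle_time_s):
    cpi = cpi_ideal + cpi_control + cpi_data
    return n_inst * cpi * cycle_time_s

# Assumed: 10**9 instructions, single-issue pipeline (CPI_ideal = 1.0),
# 0.3 CPI lost to control hazards, 0.2 to data hazards, 1 GHz clock.
t = cpu_time(1e9, 1.0, 0.3, 0.2, 1e-9)
print(round(t, 2))  # 1.5 (seconds)
```

A two-way issue machine can push CPI_ideal below 1.0, but the hazard terms then tend to grow, which is why the ILP techniques below matter.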

Static and Dynamic Scheduling and VLIW

Static scheduling: instructions execute in program order
Dynamic scheduling: instructions may execute out of order
VLIW: dumb hardware; the compiler determines the scheduling
How many cycles in each case?

LD    F2, 45(R3)
MULTI F0, F2, F4
LD    F6, 34(R2)
SUBD  F8, F6, F2
DIVD  F10, F0, F6
ADD   F6, F8, F2
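One way to see the cycle counts is a toy issue model. The latencies below (LD 2, MULTI 4, SUBD/ADD 2, DIVD 8 cycles) are assumptions for illustration, not the course's numbers, and WAR/WAW hazards are assumed renamed away, as Tomasulo's algorithm does:

```python
# Toy issue model (assumed latencies, not course data); WAR/WAW hazards
# are ignored, as if registers were renamed.
LAT = {"LD": 2, "MULTI": 4, "SUBD": 2, "DIVD": 8, "ADD": 2}
insts = [  # (op, destination, source registers); load addresses omitted
    ("LD", "F2", []), ("MULTI", "F0", ["F2", "F4"]), ("LD", "F6", []),
    ("SUBD", "F8", ["F6", "F2"]), ("DIVD", "F10", ["F0", "F6"]),
    ("ADD", "F6", ["F8", "F2"]),
]

def finish_times(in_order):
    """Cycle at which each result is ready.
    in_order=True: issue one instruction per cycle, stalling until
    operands are ready (static scheduling). in_order=False: start as
    soon as operands are ready (idealized dynamic scheduling)."""
    ready_at, fins, prev_start = {}, [], -1
    for op, dst, srcs in insts:
        operands = max((ready_at.get(r, 0) for r in srcs), default=0)
        start = max(operands, prev_start + 1) if in_order else operands
        prev_start = start
        ready_at[dst] = start + LAT[op]
        fins.append(ready_at[dst])
    return fins

print(finish_times(True))   # in-order:     [2, 6, 5, 7, 14, 9]
print(finish_times(False))  # out-of-order: [2, 6, 2, 4, 14, 6]
```

Out-of-order issue lets the independent SUBD and ADD finish earlier; total time here is bound by the LD, MULTI, DIVD chain either way.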

Branch Prediction and Speculative Execution

BEQ   R8, R0, Skip
LD    F2, 45(R3)
MULTI F0, F2, F4
Skip:
LD    F6, 34(R2)
SUBD  F8, F6, F2
DIVD  F10, F0, F6
ADD   F6, F8, F2

Branch outcomes determine the data dependences
Consider typical integer programs: one branch per seven instructions
How much performance loss?
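As a taste of the prediction hardware covered later, here is a sketch of the classic 2-bit saturating-counter predictor (the starting state and the loop example are my assumptions, not the course's):

```python
# Sketch of a 2-bit saturating-counter branch predictor:
# states 0-1 predict not-taken, states 2-3 predict taken.
class TwoBitPredictor:
    def __init__(self, state=2):  # start in "weakly taken" (an assumption)
        self.state = state

    def predict(self):
        return self.state >= 2    # True means "predict taken"

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

# A loop branch taken 9 times and then not taken once:
# the 2-bit counter mispredicts only the final iteration.
p = TwoBitPredictor()
outcomes = [True] * 9 + [False]
mispredicts = 0
for taken in outcomes:
    if p.predict() != taken:
        mispredicts += 1
    p.update(taken)
print(mispredicts)  # 1
```

The two-bit hysteresis is what keeps a loop-closing branch from being mispredicted twice per loop visit, which a 1-bit predictor would do.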

A typical memory hierarchy today: here we focus on the L1/L2/L3 caches and main memory

What Is a Memory Hierarchy?

Proc/Regs → L1-Cache → L2-Cache → L3-Cache (optional) → Memory → Disk, Tape, etc.
(Faster toward the processor; bigger toward the bottom.)
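One standard way to quantify a hierarchy like this (not shown on the slide) is average memory access time; the sketch below uses assumed hit times and miss rates:

```python
# Sketch: average memory access time (AMAT) through a cache hierarchy.
# All hit times and miss rates below are assumptions for illustration.
def amat(levels, memory_time):
    """levels: list of (hit_time, local_miss_rate) from L1 downward;
    memory_time: access time once every cache level has missed."""
    time, reach = 0.0, 1.0  # reach = fraction of accesses reaching this level
    for hit_time, miss_rate in levels:
        time += reach * hit_time
        reach *= miss_rate
    return time + reach * memory_time

# Assumed: L1 = 1 cycle with 5% misses, L2 = 10 cycles with 20% local
# misses, main memory = 100 cycles.
print(amat([(1, 0.05), (10, 0.20)], 100))  # 2.5 cycles on average
```

Equivalently: 1 + 0.05 × (10 + 0.20 × 100) = 2.5 cycles, even though a raw memory access costs 100.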

SLIDE 3

1980: no cache in µprocessors; 1995: two-level cache on chip (1989: first Intel µprocessor with an on-chip cache)

Why Memory Hierarchy?

(Chart, 1980–2000: processor performance improves ~60%/yr while DRAM improves ~7%/yr, so the processor-memory performance gap grows about 50% per year; labeled “Moore’s Law” in the plot.)
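The slide's growth rates imply the ~50%/yr gap directly:

```python
# Quick arithmetic on the slide's growth rates: processors improve
# ~60%/yr and DRAM ~7%/yr, so the gap between them compounds.
gap_growth = 1.60 / 1.07
print(round(gap_growth, 2))  # 1.5, i.e. the gap grows roughly 50% per year

# Compounded over the 1980-2000 span of the plot:
print(round(gap_growth ** 20))
```

Over two decades that ratio compounds into a gap of several thousand times, which is why caches went from optional in 1980 to multi-level and on-chip by 1995.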

What Else in This Course

VLIW and EPIC processors
Multiprocessors
Storage systems
Selected advanced topics (tentative list):
Simultaneous multithreading processors
Embedded processors
Modeling
Dependability and security
…

Course Schedule by Weeks (Subject to Changes)

Week 1. Introduction; performance evaluation
Week 2. ISA (lab day)
Week 3. Review of MIPS pipeline; Tomasulo algorithm
Week 4. Tomasulo algorithm; Alpha 21264 instruction scheduling
Week 5. Branch prediction and speculative execution
Week 6. Memory load/store unit designs
Week 7. Real examples of superscalar processors
Week 8. Cache fundamentals
Week 9. Cache optimization techniques
Week 10. Virtual memory; exam
Weeks 11–15. Advanced topics, student presentations

Course Projects

You will work in groups of two.

Preliminary project: get warmed up

Verilog Project 1: Dynamic instruction scheduling
Tomasulo algorithm
Alpha 21264 instruction scheduling

Verilog Project 2: Branch prediction and speculative execution
Branch prediction table, branch target buffer
Recovery through reorder buffer

Verilog Project 3: Cache and TLB
Direct-mapped cache
Direct-mapped TLB

Final Project: on selected research topics
Re-evaluate an existing study, or survey a topic
Includes proposal, presentation, and final report

Verilog Code Sketch

module cpu (reset, cycle, clock);  // Tomasulo with MIPS32
  …
  /* stage 1: instruction fetch */
  inst_fetch M1(/* request */ fetch_req, /* ok */ fetch_ok, /* pc */ pc,
                /* inst */ inst, /* reset */ reset, /* branch */ 0,
                /* branch target */ 0, /* clock */ clock);
  /* stage 2: rename, register read, issue */
  rename M2(fetch_req, …);
  /* stage 3: execute */
  adder M3(request, …);  // FU adder with reservation stations
  …
  /* stage 4: write back */
  …
endmodule

Still under construction

Syllabus, Class web site, WebCT

Syllabus
  Course schedule
  Textbook and references
  Projects
  Homework
  Exam
  Grading
On the class web site (linked from my home page)
  Check announcements
  Get papers, etc.
On WebCT
  Check your grades
  Join discussions: Verilog programming, project understanding, course contents, homework problems

SLIDE 4

Major Faces in Today’s Market

Some non-technical background for processor design

Desktop computers

Providing desktop computing for individuals
Optimized for price-performance

Servers

Providing larger-scale and more reliable file and computing service

Designed for performance, availability, and scalability

Embedded computers

Lodged in other devices (networking switches, printers, palm devices, cell phones, etc.)

Emphasizing real-time performance requirements
Emphasizing low-cost and low-power design

Technology Trends

Implementation technologies change dramatically

Integrated circuit logic technology
Semiconductor DRAM
Magnetic disk technology
Network technology

ISA must be stable: software is more expensive than hardware

Cost, Price, and Their Trends

Cost and price may determine whether a computer product succeeds in the market
In many cases cost is the single most important factor in design considerations

Add a new feature or not? Trade off performance against cost and price
Especially true for the desktop and embedded markets

Processor Performance Trends; Processor Price Trend (figures)

Cost of IC

Cost of die = Cost of wafer / (Dies per wafer × Die yield)

Cost of IC = (Cost of die + Cost of testing die + Cost of packaging and final test) / Final test yield
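These cost relations are easy to sketch; the wafer cost, dies per wafer, per-die test and packaging costs, and yields below are placeholders, not real data:

```python
# Sketch of the IC cost relations above; all numbers are hypothetical.
def cost_of_die(wafer_cost, dies_per_wafer, die_yield):
    return wafer_cost / (dies_per_wafer * die_yield)

def cost_of_ic(die_cost, test_cost, package_cost, final_test_yield):
    return (die_cost + test_cost + package_cost) / final_test_yield

# Hypothetical: a $5000 wafer with 1300 die sites at 57% die yield,
# $3 testing, $5 packaging, 95% final test yield.
die = cost_of_die(wafer_cost=5000.0, dies_per_wafer=1300, die_yield=0.57)
ic = cost_of_ic(die, test_cost=3.0, package_cost=5.0, final_test_yield=0.95)
print(round(die, 2), round(ic, 2))  # 6.75 15.52
```

Note how both yields appear in the denominator: halving the die yield roughly doubles the die cost, which is why die cost grows much faster than linearly with die area.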

SLIDE 5

Die Yield

Dies per wafer = π × (d/2)² / Die area − π × d / √(2 × Die area)    (d: wafer diameter)

Die yield = Wafer yield × (1 + D × Die area / α)^(−α)    (D: defects per unit area; α: masking level)

Die Yield

Wafer diameter: 30 cm; defect density: 0.6 per cm²; masking level (α): 4

0.7 cm × 0.7 cm: 0.75
1 cm × 1 cm: 0.57
1.5 cm × 1.5 cm: 0.44
2 cm × 2 cm: 0.35

Processor cost is more than linear in performance! The price increase is even more! (See textbook.)
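A quick sketch (assuming a wafer yield of 1, which the slide does not state) reproduces the first two table entries from the formulas above:

```python
# Check the die-yield formula with the slide's parameters:
# D = 0.6 defects/cm^2, alpha = 4, wafer yield assumed to be 1.0.
import math

def die_yield(die_area_cm2, D=0.6, alpha=4, wafer_yield=1.0):
    return wafer_yield * (1 + D * die_area_cm2 / alpha) ** (-alpha)

def dies_per_wafer(die_area_cm2, wafer_diameter_cm=30):
    r = wafer_diameter_cm / 2
    return (math.pi * r * r / die_area_cm2
            - math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2))

print(round(die_yield(0.7 * 0.7), 2))  # 0.75, matching the table
print(round(die_yield(1.0 * 1.0), 2))  # 0.57, matching the table
print(round(dies_per_wafer(1.0)))      # usable die sites on a 30 cm wafer
```

Because die area appears inside a term raised to the power −α, yield falls off sharply with die size, which is the quantitative basis for the "cost is more than linear in performance" remark above.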

DRAM Price Trend (figure)

End of Lecture

Course Strategies

Learn the fundamentals of computer architecture design
Learn the most important aspects (at this time) of computer architecture: superscalar processors and memory hierarchy
Be exposed to the other topics: storage systems and multiprocessors
Appreciate the merits of computer architecture research
Remember: hot topics tomorrow may be different

Notes

Add slides for multiple issue, branch prediction, load/store units, and memory hierarchy
Add slides for Tomasulo and Alpha 21264-like scheduling
Add slides for course scheduling