1 CPI Cycles per Instruction Instruction Classes We can have - PDF document

5/3/2002

Performance Introduction
• Many factors impact performance:
  • Technology:
    • basic circuit speed (clock speed, usually in MHz, now in GHz - billions of cycles per second)
    • process technology (# of transistors per chip)
  • Organization:
    • what style of ISA (RISC vs. CISC)
    • what type of memory hierarchy
  • Software: quality of compiler, OS, database, etc.

Metrics
• Raw speed (peak performance -- never attained)
• Execution time (also called response time, i.e. time required to execute a program from beginning to end). Benchmarks:
  • Integer dominated programs (compilers, etc.)
  • Scientific (lots of floating point)
  • Graphics/multimedia
• Throughput (total amount of work in a given time)
  • Good metric for systems managers
  • Databases: keep the most people happy

Execution Time
Performance:
    Performance_A = 1 / ExecutionTime_A
Processor A is faster than Processor B if:
    Performance_A > Performance_B
    ExecutionTime_A < ExecutionTime_B
Relative Performance:
    Performance_A / Performance_B = ExecutionTime_B / ExecutionTime_A

Measuring Execution Time
• Wall clock, response time, elapsed time
• Unix time function:

    [fiji]:~ time someprogram
    346.085u 0.39s 5:48.32 99.4% 5+202k 0+0io 0pf+0w

  ...lists user CPU time, system CPU time, elapsed time, percentage of elapsed time which is CPU time, and other info
• We'll typically use User CPU time to mean CPU execution time, or just execution time

Defining Execution Time
• Execution time = clock cycles x clock cycle time
• Execution time is program dependent
• Clock cycles are program dependent
• Clock cycle time (usually in ns) is dependent on the machine
Since clock cycle time = 1 / (clock cycle rate), an alternate definition is:
    CPU Execution time = CPU clock cycles / clock cycle rate
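The execution time definitions above can be sketched in a few lines of Python. This is only an illustration: the function names, the cycle count, and the two clock rates are hypothetical and not from the slides. The only numbers taken from the slides are the user, system, and elapsed times in the `time` output.

```python
def cpu_execution_time(clock_cycles, clock_rate_hz):
    """CPU execution time = CPU clock cycles / clock cycle rate (seconds)."""
    return clock_cycles / clock_rate_hz


def performance(execution_time):
    """Performance = 1 / ExecutionTime."""
    return 1.0 / execution_time


def relative_performance(exec_time_a, exec_time_b):
    """Performance_A / Performance_B = ExecutionTime_B / ExecutionTime_A."""
    return exec_time_b / exec_time_a


# Hypothetical program needing 2 billion cycles on two hypothetical machines.
cycles = 2_000_000_000
time_a = cpu_execution_time(cycles, 500e6)  # machine A: assumed 500 MHz clock
time_b = cpu_execution_time(cycles, 400e6)  # machine B: assumed 400 MHz clock
a_over_b = relative_performance(time_a, time_b)

# Numbers from the `time` output on the slide: user CPU, system CPU, elapsed.
user_s, system_s = 346.085, 0.39
elapsed_s = 5 * 60 + 48.32
cpu_fraction = (user_s + system_s) / elapsed_s  # share of elapsed time spent on the CPU

print(time_a, time_b, a_over_b, round(cpu_fraction, 4))
```

With these assumed numbers, machine A takes 4 seconds, machine B takes 5 seconds, and A is 1.25 times faster than B. The CPU fraction computed from the `time` output comes out just under 99.5%, which agrees with the 99.4% the shell reports.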

CPI: Cycles per Instruction
• Definition: CPI is the average # of cycles per instruction:
    CPU clock cycles = Number of instructions executed x CPI
    CPU Execution Time = Number of Instructions x CPI x clock cycle time
• CPI in isolation is not a measure of performance (program and compiler dependent)
• Ideally CPI = 1, but this might slow the clock (compromise)
• Can we have CPI < 1?

Instruction Classes
• We can have different CPIs for different classes of instructions (e.g. floating point instructions take more cycles than integer instructions).
    CPU Execution time = Σ (CPI_i x C_i) x clock cycle time
• C_i is the number of instructions in class i that have executed
• Note that minimizing the number of instructions doesn't necessarily improve performance.
• Improving part of the architecture can improve a C_i.

Measuring CPI
• Instruction count: need a simulator or profiler:
  • simulator interprets and counts each instruction
  • profiler uses a sampling technique
• CPU execution time can be measured
• Clock cycle time is given by the processor
• We know the execution time, so we can solve for total cycles
• Knowing total cycles together with the number of instructions executed lets us solve for average CPI

Other Metrics: MIPS
• MIPS = Millions of Instructions Per Second
    MIPS = Instruction count / (Execution Time x 1,000,000)
• MIPS is appealing because it is a rate -- bigger is better
• But MIPS in isolation is no better than CPI -- it's program dependent
• Does not take the instruction set into account:
  • CISC programs typically take fewer instructions than RISC programs, so we can't compare the different ISAs using MIPS

The Trouble with MIPS
• It gives "wrong" results:
  • Machine A with compiler C1 executes program P in 10 seconds, using 100,000,000 instructions (10 MIPS)
  • Machine A with compiler C2 executes program P in 15 seconds, using 180,000,000 instructions (12 MIPS)
  • C1 is clearly better, but it has a lower MIPS rating.
• MIPS doesn't take CPI into account...

Benchmarks
• Benchmark: workload representative of what the computer will be used for.
  • CPU benchmarks: SPEC (SPECint, SPECfp, etc.)
  • Database benchmarks
  • Webserver benchmarks
• Caveats:
  • Compilers optimize specifically for benchmarks
  • Some benchmarks don't test the memory system sufficiently
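The CPI and MIPS formulas above can be checked with a short Python sketch. The instruction counts and execution times for compilers C1 and C2 are the ones given on the slide. Everything else is an assumption made for illustration: the function names, the 100 MHz clock rate (the slides do not state one for Machine A), and the two-class instruction mix.

```python
def cpu_time_by_class(classes, clock_rate_hz):
    """CPU time = sum(CPI_i x C_i) x clock cycle time.

    `classes` is a list of (cpi_i, c_i) pairs; dividing by the clock rate
    is the same as multiplying by the clock cycle time.
    """
    total_cycles = sum(cpi * count for cpi, count in classes)
    return total_cycles / clock_rate_hz


def average_cpi(exec_time_s, clock_rate_hz, instruction_count):
    """Solve for total cycles from the measured time, then divide by instructions."""
    total_cycles = exec_time_s * clock_rate_hz
    return total_cycles / instruction_count


def mips(instruction_count, exec_time_s):
    """MIPS = Instruction count / (Execution Time x 1,000,000)."""
    return instruction_count / (exec_time_s * 1_000_000)


ASSUMED_CLOCK_HZ = 100_000_000  # hypothetical; not given in the slides

# Compiler C1 and C2 figures from the slide.
mips_c1 = mips(100_000_000, 10)
mips_c2 = mips(180_000_000, 15)
cpi_c1 = average_cpi(10, ASSUMED_CLOCK_HZ, 100_000_000)
cpi_c2 = average_cpi(15, ASSUMED_CLOCK_HZ, 180_000_000)

# Hypothetical mix: 50M instructions at CPI 1.0 plus 10M at CPI 2.0.
mix_time = cpu_time_by_class([(1.0, 50_000_000), (2.0, 10_000_000)], ASSUMED_CLOCK_HZ)

print(mips_c1, mips_c2, cpi_c1, round(cpi_c2, 2), mix_time)
```

The sketch reproduces the 10 MIPS and 12 MIPS ratings. Under the assumed clock, the C2 binary also has the lower average CPI, yet it still takes 5 seconds longer, because it executes 80% more instructions. Neither MIPS nor CPI alone reveals that; only execution time does.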

Amdahl's Law
• Amount we can improve performance is limited by the amount that the improved feature is actually used:

    New Execution Time = (Execution Time affected by Improvement / Amount of improvement) + Unaffected Exe time

Example: if loads/stores take up 33% of our Exe time, how much do we need to improve loads/stores to make the program run 1.5 times faster?
Corollary: Make the common case fast!

Example Measurements

    Category                    GCC    SPICE   Ave CPI
    Load/Store                  33%    40%      1.4
    Branches                    16%     8%      1.8
    Jumps                        2%     2%      1.2
    FP Add                       -      5%      2.0
    FP Sub                       -      3%      4.0
    FP Mul                       -      6%      5.0
    FP Div                       -      3%     19.0
    Other (integer ADD, etc)    49%    33%      1.0

• What is the average CPI for gcc? For spice?
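Both questions above can be worked through with a short Python sketch. It rests on two assumptions that the slides do not state outright: the percentages in the table are read as fractions of instructions executed, and the average CPI is taken as the mix-weighted sum of the per-category CPIs. The function names are hypothetical.

```python
def new_execution_time(old_time, affected_fraction, improvement):
    """Amdahl's Law: affected time / improvement + unaffected time."""
    affected = old_time * affected_fraction
    unaffected = old_time - affected
    return affected / improvement + unaffected


def required_improvement(affected_fraction, target_speedup):
    """Improvement factor needed on the affected part, or None if unreachable."""
    unaffected = 1.0 - affected_fraction
    target_time = 1.0 / target_speedup  # as a fraction of the old time
    if target_time <= unaffected:
        return None  # even an infinite improvement cannot reach the target
    return affected_fraction / (target_time - unaffected)


# (category, gcc fraction, spice fraction, average CPI) from the table.
MIX = [
    ("Load/Store", 0.33, 0.40, 1.4),
    ("Branches",   0.16, 0.08, 1.8),
    ("Jumps",      0.02, 0.02, 1.2),
    ("FP Add",     0.00, 0.05, 2.0),
    ("FP Sub",     0.00, 0.03, 4.0),
    ("FP Mul",     0.00, 0.06, 5.0),
    ("FP Div",     0.00, 0.03, 19.0),
    ("Other",      0.49, 0.33, 1.0),
]

gcc_cpi = sum(gcc * cpi for _, gcc, _, cpi in MIX)
spice_cpi = sum(spice * cpi for _, _, spice, cpi in MIX)

needed = required_improvement(0.33, 1.5)  # the loads/stores example
best_case = 1.0 / (1.0 - 0.33)            # speedup limit as improvement grows

print(needed, round(best_case, 2), round(gcc_cpi, 3), round(spice_cpi, 3))
```

Under these assumptions the loads/stores example has no solution: with 67% of the time unaffected, the speedup can never exceed about 1.49, so 1.5 is out of reach. The weighted sums give an average CPI of about 1.264 for gcc and 2.148 for spice.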
