
Lecture 2: Performance Evaluation Methods
Performance definition, benchmark, summarizing performance, Amdahl's law, and CPI

What Does Performance Mean?
Response time
  • A simulation program finishes in 5 minutes
Throughput
  • A web server serves 5 million requests per second
Other metrics
  • MIPS (million instructions per second)
  • MFLOPS
  • Clock frequency

Quantitative Definitions
Use response time or execution time:
  • Performance is 1/(Execution time)
  • Performance is 1/CPI
  • Performance is IPC (instructions per cycle, discussed later)
Use throughput:
  • Performance is 5 million requests per second, 5 simulation programs per hour

Performance Comparison
"X is n times faster than Y":
$$\frac{\text{Execution time}_y}{\text{Execution time}_x} = \frac{\text{Performance}_x}{\text{Performance}_y} = n$$
n: speedup, if we are considering an enhancement, optimization, etc.
Some terms
  • Elapsed time vs. CPU time
  • Improve performance: decrease execution time, increase throughput
  • Improve execution time: decrease execution time
  • Degrade performance: the reverse of the above; brings "negative" speedup (n < 1, a slowdown)

Performance of Computers
Performance is defined for a given program and a given machine. How about the machine alone? We need benchmark programs:
Real applications: scientific programs, compilers, text-processing software, image processing
Modified applications: providing portability and focus
Kernels: good for isolating the performance of individual features
  • Lmbench: measures latency and bandwidth of memory, file system, networking, etc.
Toy benchmarks
Synthetic benchmarks: matching the average execution profile

Benchmark Suite
A benchmark suite is a collection of benchmarks with a variety of applications
  • Alleviates the weakness of a single benchmark
  • More representative for computer designers evaluating their designs
Categories of benchmark suites
  • Desktop benchmarks: CPU, memory, and graphics performance
  • Server benchmarks: throughput-oriented, I/O and OS intensive
  • Embedded benchmarks: measuring the ability to meet deadlines and save power
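The "X is n times faster than Y" relation can be checked numerically. The sketch below is only an illustration: the function names `speedup` and `performance` and the 10 s / 5 s timings are assumptions, not part of the lecture.

```python
def performance(execution_time: float) -> float:
    """Performance defined as 1 / (execution time)."""
    if execution_time <= 0:
        raise ValueError("execution time must be positive")
    return 1.0 / execution_time


def speedup(time_y: float, time_x: float) -> float:
    """Return n such that X is n times faster than Y.

    n = Execution time_y / Execution time_x = Performance_x / Performance_y
    """
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    return time_y / time_x


# Illustrative (assumed) timings: Y takes 10 s, X takes 5 s.
n = speedup(time_y=10.0, time_x=5.0)
print(n)  # 2.0

# The ratio of performances gives the same n (up to rounding).
ratio = performance(5.0) / performance(10.0)
print(abs(ratio - n) < 1e-12)  # True
```

A value of n below 1 means X is slower than Y, which is the "degrade performance" case.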

SPEC CPU Benchmark
SPEC: Standard Performance Evaluation Corporation
CPU-intensive benchmark for evaluating processor performance of workstations
Four generations: SPEC89, SPEC92, SPEC95, and SPEC2000
Two types of programs: INT and FP
Emphasizing memory system performance in SPEC2000

Other SPEC Benchmarks
SPECviewperf and SPECapc: 3D graphics performance
SPEC JVM98: performance of client-side Java virtual machine
SPEC JBB2000: server-client Java application
SPEC WEB99: evaluating WWW servers
SPEC HPC96: parallel and distributed computing

Server Benchmarks
SPEC CPU2000, WEB99, SFS97
TPC: measuring the ability of a system to handle transactions
  • TPC-C: online transaction processing (OLTP) benchmark (for bank systems)
  • TPC-H: ad hoc decision support
  • TPC-R: decision support with standard queries
  • TPC-W: simulating a business-oriented transactional web server

Embedded Benchmark
EEMBC (Embedded Microprocessor Benchmark Consortium) benchmarks
  • Based on kernel performance
  • Five classes: automotive/industrial, consumer, networking, office automation, and telecommunications
Embedded benchmarks are not mature

Summarizing Performance
Given the performance of a set of programs, how do we evaluate the performance of machines?

               A      B     C
P1 (secs)      1      10    20
P2 (secs)      1000   100   20
Total (secs)   1001   110   40

Which computer is the best one?

Metric 1: Arithmetic Mean
Total execution time / (number of programs)
$$\frac{1}{n} \sum_{i=1}^{n} \text{Time}_i$$
  • Simple and intuitive
  • Representative if the user runs the programs an equal number of times
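The arithmetic mean can be applied directly to the three-machine table. This is a minimal sketch; the dictionary layout and the helper name `arithmetic_mean` are illustrative choices, while the timings are the ones in the table.

```python
# Execution times in seconds for programs P1 and P2, taken from the table.
times = {
    "A": [1, 1000],
    "B": [10, 100],
    "C": [20, 20],
}


def arithmetic_mean(values):
    """Total execution time divided by the number of programs."""
    return sum(values) / len(values)


for machine, t in times.items():
    print(machine, "total:", sum(t), "mean:", arithmetic_mean(t))
```

By this metric C has the lowest mean time, although A is by far the fastest on P1. The answer depends on how often each program is actually run, which is what motivates weighting.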

Metric 2: Weighted Arithmetic Mean
Give (different) weights to different programs
$$\sum_{i=1}^{n} \text{Weight}_i \times \text{Time}_i, \qquad \sum_{i=1}^{n} \text{Weight}_i = 1$$
  • Considers the frequencies of programs in the workload

Metric 3: Geometric Means
Based on relative performance to a reference machine
$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$
Relative performance is consistent with different reference machines
$$\frac{\text{Geometric mean}(X_i)}{\text{Geometric mean}(Y_i)} = \text{Geometric mean}\left(\frac{X_i}{Y_i}\right)$$
  • If C is 2x faster than B (using B as the reference), and B is 2x faster than A (A as the reference), then C is 4x faster than A (A as the reference)

Example
Recall the previous example

               A      B     C
P1 (secs)      1      10    20
P2 (secs)      1000   100   20
Total (secs)   1001   110   40

Arithmetic mean: B is 9.1x faster than A, C is 25x faster than A
Geometric mean: A and B are equally fast, and C is only 60% faster than A

Harmonic Mean
Given speedups s_1, s_2, …, s_n, the average speedup by harmonic mean is
$$\frac{n}{\frac{1}{s_1} + \frac{1}{s_2} + \dots + \frac{1}{s_n}}$$
Why not arithmetic mean?

Amdahl's Law
We know about performance: defining, measuring, and summarizing
How do we maximize performance gains from the beginning in our design?
Principle: Make the Common Case Fast!

Amdahl's Law
Predict overall speedup from the "local speedup" of an enhancement, provided the frequency of use of the enhancement is known.
  • "Local speedup" is related to design and optimization objectives, such as doubling the CPU frequency or halving the cache latency
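The geometric-mean comparison in the example, and its independence from the choice of reference machine, can be reproduced with a few lines. This is a sketch under stated assumptions: the helper names are hypothetical, `math.prod` requires Python 3.8 or later, and the harmonic mean uses the standard definition with n in the numerator.

```python
import math

# Execution times in seconds for programs P1 and P2, taken from the example table.
times = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}


def geometric_mean(values):
    """n-th root of the product of n values."""
    return math.prod(values) ** (1.0 / len(values))


def normalized_gm(machine, reference):
    """Geometric mean of the execution time ratios machine/reference."""
    ratios = [t / r for t, r in zip(times[machine], times[reference])]
    return geometric_mean(ratios)


def harmonic_mean(values):
    """Standard harmonic mean: n / (1/s_1 + ... + 1/s_n)."""
    return len(values) / sum(1.0 / v for v in values)


# B relative to A: the ratios 10 and 0.1 cancel, so A and B look equally fast.
print(normalized_gm("B", "A"))
# A's geometric mean time divided by C's: roughly 1.58, i.e. C is about 60% faster.
print(1.0 / normalized_gm("C", "A"))
# Reference independence: C vs A equals (C vs B) * (B vs A).
print(math.isclose(normalized_gm("C", "A"),
                   normalized_gm("C", "B") * normalized_gm("B", "A")))
```

Note that the geometric mean rates A and B as equal even though their total times differ by about 9x, so it summarizes relative performance and not total execution time.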

Amdahl's Law
$$\text{Execution time}_{new} = \text{Execution time}_{old} \times \left[ (1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} \right]$$
$$\text{Speedup}_{overall} = \frac{\text{Execution time}_{old}}{\text{Execution time}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$

Amdahl's Law Application
Objective: improve the performance of a graphics engine
Choice one: Speed up FP square root by 10x
Choice two: Speed up all FP instructions by 1.6x
Assume 20% of instructions are FP square root, and 50% are FP instructions overall
Ask: Which choice is better?
The answer is:
Implication: Optimize for the common case first

CPI and IPC
CPI: Average number of cycles spent on each instruction
$$\text{CPI} = \frac{\text{CPU clock cycles for a program}}{\text{Instruction count}}$$
IPC: Average number of instructions that can be finished in one cycle
$$\text{IPC} = \frac{\text{Instruction count}}{\text{CPU clock cycles for a program}}$$

CPU Time Equation
$$\text{CPU time} = \text{CPU clock cycles} \times \text{cycle time}$$
$$\text{CPU clock cycles} = \text{Instruction count} \times \text{CPI}$$
$$\Rightarrow \text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{cycle time}$$

CPU Time Equation Based on Instruction Types
$$\text{CPU time} = \text{CPU clock cycles} \times \text{Clock cycle time}$$
$$\text{CPU clock cycles} = \sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i$$
$$\Rightarrow \text{CPU time} = \left( \sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i \right) \times \text{Clock cycle time}$$
$$\text{CPI} = \sum_{i=1}^{n} \text{Instruction frequency}_i \times \text{CPI}_i$$

Make Design Choice Using CPU Time Equation

             FP     FPSQR   Other
Frequency    25%    2%      75%
CPI          4.0    20      1.33

Alternative 1: CPI of FPSQR 20 → 2
Alternative 2: CPI of FP 4 → 2.5
Which one is better? Calculate speedups.
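Both design questions above can be worked through numerically. The sketch below is not the lecture's official solution: the function names are hypothetical, and for the CPI table it assumes that the 2% FPSQR instructions are counted inside the 25% FP share (otherwise the frequencies would sum to 102%) and that instruction count and clock cycle time stay unchanged, so speedup is the ratio of CPIs.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup = 1 / ((1 - F) + F / S)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def average_cpi(mix):
    """CPI = sum of (instruction frequency_i * CPI_i) over instruction types."""
    return sum(freq * cpi for freq, cpi in mix)


# Amdahl's law application: graphics engine.
choice_one = amdahl_speedup(0.20, 10.0)  # FP square root, 10x faster
choice_two = amdahl_speedup(0.50, 1.6)   # all FP instructions, 1.6x faster
print(f"choice one: {choice_one:.3f}, choice two: {choice_two:.3f}")

# CPU time equation: FP 25% at CPI 4.0, other 75% at CPI 1.33.
cpi_original = average_cpi([(0.25, 4.0), (0.75, 1.33)])

# Alternative 1: FPSQR (2% of instructions, assumed part of FP) goes from CPI 20 to 2.
cpi_alt1 = cpi_original - 0.02 * (20.0 - 2.0)

# Alternative 2: average CPI of all FP instructions goes from 4.0 to 2.5.
cpi_alt2 = average_cpi([(0.25, 2.5), (0.75, 1.33)])

# Same instruction count and cycle time, so speedup = CPI_old / CPI_new.
speedup_alt1 = cpi_original / cpi_alt1
speedup_alt2 = cpi_original / cpi_alt2
print(f"alternative 1: {speedup_alt1:.3f}, alternative 2: {speedup_alt2:.3f}")
```

Under these assumptions both questions come out the same way: the modest improvement to all FP instructions (about 1.23x overall) edges out the large improvement to FP square root alone (about 1.22x), which is the "common case first" implication.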
