What Does Performance Mean?



  1. Lecture 2: Performance Evaluation
Performance definition, benchmark, summarizing performance, Amdahl’s law, and CPI

What Does Performance Mean?
• Response time
  – A simulation program finishes in 5 minutes
• Throughput
  – A web server serves 5 million requests per second
• Other metrics
  – MIPS (million instructions per second)
  – MFLOPS
  – Clock frequency

Performance of Computers
Performance is defined for a program and a machine.
How to compare computers? Need benchmark programs:
  – Real applications: scientific programs, compilers, text-processing software, image processing
  – Modified applications: providing portability and focus
  – Kernels: good to isolate performance of individual features
    • Lmbench: measures latency and bandwidth of memory, file system, networking, etc.
  – Toy benchmarks
  – Synthetic benchmarks: matching average execution profile

Benchmark Suite
• A benchmark suite is a collection of benchmarks with a variety of applications
  – Alleviates the weakness of a single benchmark
  – More representative for computer designers to evaluate their design
  – Benchmarks test both the computer and the compilers, and the OS in many cases
• Desktop benchmarks: CPU, memory, and graphics performance
• Server benchmarks: throughput-oriented, I/O and OS intensive
• Embedded benchmarks: measuring the ability to meet deadlines and save power

Execution Time
• Processor design is concerned with processor time consumed by program execution. Shorter execution time =>
  – Shorter response time
  – Higher throughput
• Execution time = #inst × CPI × Cycle time
  – What affects #inst, CPI, and cycle time?
  – Almost all designs can be interpreted
• Any other metric is meaningful only if it is consistent with execution time

Performance Comparison
“X is n times faster than Y”:

$$\frac{\text{Execution time}_y}{\text{Execution time}_x} = \frac{\text{Performance}_x}{\text{Performance}_y} = n$$

• n: speedup if we are considering an enhancement, optimization, etc.
• What does “improving” mean?
  – Improve performance: decrease execution time, increase throughput
  – Improve execution time: decrease execution time
  – Degrade performance: the reverse of the above; brings a “negative” speedup, i.e. a speedup below 1
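The execution-time equation and the “X is n times faster than Y” ratio can be tried out with a few lines of Python. This is only a sketch: the function names and the instruction count, CPI, and cycle-time values are made up for illustration and do not come from the lecture.

```python
def execution_time(inst_count, cpi, cycle_time_s):
    """Execution time = #inst x CPI x cycle time, in seconds."""
    return inst_count * cpi * cycle_time_s


def times_faster(time_y, time_x):
    """Return n such that "X is n times faster than Y"."""
    return time_y / time_x


# Hypothetical machines: same program and clock, different CPI.
time_y = execution_time(1_000_000_000, 2.0, 1e-9)  # roughly 2.0 s
time_x = execution_time(1_000_000_000, 1.6, 1e-9)  # roughly 1.6 s

n = times_faster(time_y, time_x)
print(f"X is {n:.2f} times faster than Y")  # prints 1.25
```

A value of n below 1 would mean X is slower than Y, which is the “degrade performance” case.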

  2. Summarizing Performance
Given the performance of a set of programs, how to evaluate the performance of machines?

| | A | B | C |
|---|---|---|---|
| P1 (secs) | 1 | 10 | 20 |
| P2 (secs) | 1000 | 100 | 20 |
| Total (secs) | 1001 | 110 | 40 |

• Which computer is the “best” one?

Arithmetic Mean
• Total execution time / (number of programs)

$$\frac{1}{n}\sum_{i=1}^{n} \text{Time}_i$$

  – Simple and intuitive
  – Representative if the user runs the programs an equal number of times

Weighted Arithmetic Mean
• Give (different) weights to different programs

$$\sum_{i=1}^{n} \text{Weight}_i \times \text{Time}_i, \qquad \sum_{i=1}^{n} \text{Weight}_i = 1$$

  – Considers the frequencies of programs in the workload

Geometric Means
• Based on relative performance to a reference machine

$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$

• Relative performance is consistent with different reference machines

$$\frac{\text{Geometric mean}(X_i)}{\text{Geometric mean}(Y_i)} = \text{Geometric mean}\left(\frac{X_i}{Y_i}\right)$$

  – If C is 2x faster than B (using B as the reference), and B is 2x faster than A (A as the reference), then C is 4x faster than A (A as the reference)

Harmonic Mean
• Given speedups s_1, s_2, …, s_n, the average speedup by harmonic mean is

$$\frac{n}{\frac{1}{s_1} + \frac{1}{s_2} + \dots + \frac{1}{s_n}}$$

Why not arithmetic mean?

Amdahl’s Law
We know about performance: defining, measuring, and summarizing.
How to maximize performance gains from the beginning in our design?
• Principle: Make the Common Case Fast!
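The four means can be compared on the P1/P2 table with a short Python sketch. The function names, the equal weights, and the sample speedups passed to the harmonic mean are illustrative assumptions; only the execution times come from the table.

```python
import math

# Execution times in seconds per machine: [P1, P2], from the table.
times = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def weighted_arithmetic_mean(xs, weights):
    # The weights are expected to sum to 1.
    assert math.isclose(sum(weights), 1.0)
    return sum(w * x for w, x in zip(weights, xs))


def geometric_mean(xs):
    return math.prod(xs) ** (1.0 / len(xs))


def harmonic_mean(xs):
    return len(xs) / sum(1.0 / x for x in xs)


for name, ts in times.items():
    print(name, arithmetic_mean(ts), round(geometric_mean(ts), 2))

# Geometric mean of ratios equals the ratio of geometric means,
# so the ranking does not depend on the reference machine.
ratios_b_over_a = [b / a for a, b in zip(times["A"], times["B"])]
lhs = geometric_mean(times["B"]) / geometric_mean(times["A"])
rhs = geometric_mean(ratios_b_over_a)
print(math.isclose(lhs, rhs))  # True

# Assumed speedups of 2x and 4x, averaged by harmonic mean.
print(round(harmonic_mean([2, 4]), 3))
```

By arithmetic mean C looks best by a wide margin, while by geometric mean A and B tie, which shows why the choice of summary matters.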

  3. Amdahl’s Law
• Predict overall speedup from the “local speedup” of an enhancement, provided the frequency with which the enhancement is used is known.
  – “Local speedup” is related to design and optimization objectives, like doubling the CPU frequency or reducing cache latency by half

Amdahl’s Law

$$\text{Execution time}_{new} = \text{Execution time}_{old} \times \left( (1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} \right)$$

$$\text{Speedup}_{overall} = \frac{\text{Execution time}_{old}}{\text{Execution time}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$

Amdahl’s Law
Assume we need to improve the performance of a graphics engine.
Choice one: Speed up FP square root by 10x
Choice two: Speed up all FP instructions by 1.6x
Assume 20% of instructions are FP square root, and 50% are FP instructions overall.
Which choice is better?
Implication: Optimize for the common case first

Equation Based on Instruction Types

$$\text{CPU time} = \text{CPU Clock Cycles} \times \text{Clock cycle time}$$

$$\text{CPU Clock Cycles} = \sum_{i=1}^{n} IC_i \times CPI_i$$

$$\Rightarrow \text{CPU time} = \left( \sum_{i=1}^{n} IC_i \times CPI_i \right) \times \text{Clock cycle time}$$

$$CPI = \sum_{i=1}^{n} \text{Instruction frequency}_i \times CPI_i$$

Make Design Choice Using CPU Time Equation

| | FP | FPSQR | Other |
|---|---|---|---|
| Frequency | 25% | 2% | 75% |
| CPI | 4.0 | 20 | 1.33 |

Alternative 1: CPI of FPSQR 20 → 2
Alternative 2: CPI of FP 4 → 2.5
Which one is better? Calculate speedups.

SPEC CPU Benchmark
• SPEC: Standard Performance Evaluation Corporation
• CPU-intensive benchmark for evaluating processor performance of workstations
• Four generations: SPEC89, SPEC92, SPEC95, and SPEC2000
• Two types of programs: INT and FP
• Emphasizes memory system performance in SPEC2000
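Both “which one is better?” questions on this page can be worked through numerically. The sketch below rests on two assumptions that the slides do not state outright: the 20% and 50% figures are treated as fractions of execution time (which is what Amdahl’s Law needs), and the 25% FP frequency is taken to already include the 2% FPSQR (otherwise the frequencies would sum to 102%). Function and variable names are illustrative.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup = 1 / ((1 - F) + F / S)."""
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)


# Graphics engine example.
choice_one = amdahl_speedup(0.20, 10.0)  # FP square root 10x, about 1.22
choice_two = amdahl_speedup(0.50, 1.6)   # all FP 1.6x, about 1.23


def average_cpi(mix):
    """CPI = sum of instruction frequency_i x CPI_i."""
    return sum(freq * cpi for freq, cpi in mix)


# CPU time equation example (FP 25% assumed to include FPSQR 2%).
cpi_original = average_cpi([(0.25, 4.0), (0.75, 1.33)])
# Alternative 1: only the 2% FPSQR instructions drop from CPI 20 to 2.
cpi_alt1 = cpi_original - 0.02 * (20.0 - 2.0)
# Alternative 2: all FP instructions drop from CPI 4 to 2.5.
cpi_alt2 = average_cpi([(0.25, 2.5), (0.75, 1.33)])

# Instruction count and clock cycle time are unchanged, so the
# speedup is the ratio of the CPIs.
speedup_alt1 = cpi_original / cpi_alt1  # about 1.22
speedup_alt2 = cpi_original / cpi_alt2  # about 1.23

print(f"Amdahl: choice one {choice_one:.2f}, choice two {choice_two:.2f}")
print(f"CPI:    alt 1 {speedup_alt1:.2f}, alt 2 {speedup_alt2:.2f}")
```

Under these assumptions the broader FP improvement wins narrowly in both examples, in line with “optimize for the common case first”.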

  4. SPEC CPU2000 Profiling
Dynamic instruction mix

| Instruction | Int avg | FP avg |
|---|---|---|
| Load int | 26% | 15% |
| Store int | 10% | 2% |
| Load fp | - | 15% |
| Store fp | - | 7% |
| Add | 19% | 23% |
| All fp inst | - | 41% |
| Cond br. | 12% | 4% |
| All ctrl inst | 16% | 4% |

Other SPEC Benchmarks
• SPECviewperf and SPECapc: 3D graphics performance
• SPEC JVM98: performance of client-side Java virtual machine
• SPEC JBB2000: server-side Java application
• SPEC WEB99: evaluating WWW servers
• SPEC HPC96: parallel and distributed computing

Server Benchmarks
• SPEC CPU2000, WEB99, SFS97
• TPC: measuring the ability of a system to handle transactions
  – TPC-C: online transaction processing (OLTP) benchmark (for bank systems)
  – TPC-H: ad hoc decision-making support
  – TPC-R: decision-making support with standard queries
  – TPC-W: simulating a business-oriented transactional web server

Embedded Benchmark
• EEMBC (Embedded Microprocessor Benchmark Consortium) benchmarks
  – Based on kernel performance
  – Five classes: automotive/industrial, consumer, networking, office automation, and telecommunications
Embedded benchmarks are not mature
