Performance Questions How to characterize the performance of - PowerPoint PPT Presentation

Performance Questions
• How do we characterize the performance of applications and systems?
• What are the user's requirements for performance and cost?
• How should performance be measured?
• How will the system perform with more resources or more workload?

2110412 Parallel Comp Arch
Performance and Benchmarking
Natawut Nupairoj, Ph.D.
Department of Computer Engineering, Chulalongkorn University

Important Keywords
• Peak Performance
  • Theoretical performance.
  • Typically, the peak of a single CPU * n.
• Sustained Performance
  • The maximal achievable performance by running a benchmark.

Performance Metrics
• Indicators of how good the systems are.
• To evaluate correctly, we must consider:
  • What is the metric (or metrics)?
  • What is its definition?
  • How to measure it? Benchmark algorithm?
  • What is the evaluating environment?
    • Configuration.
    • Workload.

Popular Metrics
• Time - Execution Time
• Rate - Throughput and Processing Speed
• Resource - Utilization
• Ratio - Cost Effectiveness
• Reliability - Error Rate
• Availability - Mean Time To Failure (MTTF)

Execution Time
• Aka. wall clock time, elapsed time, delay.
• CPU time + I/O + user + …
• The lower, the better.
• Factors
  • Algorithm.
  • Data structure.
  • Input.
  • Hardware/Software/OS.
  • Language.

Definition of Time

Analysis of Time
• Let's try the "time" command for Unix:
    90.7u 12.9s 2:39 65%
• User time = 90.7 secs
• System time = 12.9 secs
• Elapsed time = 2 mins 39 secs = 159 secs
• (90.7 + 12.9) / 159 = 65%
• Meaning?
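The arithmetic behind the "time" output above can be reproduced in a few lines. This is a minimal sketch; the function name `cpu_utilization` is a hypothetical helper, not part of any tool. The 65% is the share of wall-clock time the process spent on the CPU (user plus system). The remainder is presumably time spent waiting, for example on I/O or on other processes.

```python
def cpu_utilization(user_s, system_s, elapsed_s):
    """Fraction of elapsed (wall-clock) time spent on the CPU."""
    return (user_s + system_s) / elapsed_s


# Values taken from the sample output: 90.7u 12.9s 2:39 65%
elapsed = 2 * 60 + 39                      # 2:39 is 159 seconds
util = cpu_utilization(90.7, 12.9, elapsed)
print(f"CPU time = {90.7 + 12.9:.1f} s of {elapsed} s elapsed ({util:.0%})")
```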

Processing Speed
• How fast can the system execute?
• MIPS, MFLOPS.
• The more, the better.
• Can be very misleading!!!

    k = m + n;        for j=0 to x        for j=0 to x/4
    k = m + n;          k = m + n;          k = m + n;
    ...                                     k = m + n;
    k = m + n;                              k = m + n;
                                            k = m + n;

Throughput
• Number of jobs that can be processed in a unit time.
• Aka. Bandwidth (in communication).
• The more, the better.
• High throughput does not necessarily mean low execution time.
  • Pipeline.
  • Multiple execution units.

Utilization
• The percentage of resources being used
• Ratio of
  • busy time vs. total time
  • sustained speed vs. peak speed
• The more the better?
  • True for the manager
  • But maybe not for the user/customer
• The resource with the highest utilization is the "bottleneck"

Cost Effectiveness
• Peak performance/cost ratio
• Price/performance ratio
  • PCs are much better in this category than supercomputers

Price/Performance Ratio
[Figure: From Tom's Hardware Guide: CPU Chart 2009]

Moore's Law (1965)
[Figure: Kurzweil: The Law of Accelerating Returns]

Performance of Parallel Systems
• Factors
  • Components and architecture.
  • Degree of Parallelism.
  • Overheads.
• Architecture
  • CPU speed.
  • Memory size and speed.
  • Memory hierarchy.

Parallelism and Overheads
• Execution time
    T = Tpar + Tseq + Tcomm
• Tpar - Time spent in Parallel
  • All nodes execute at the same time
  • Computation Time (mostly)
  • Depends on Algorithm
  • Load-imbalance (Degree of Parallelism)

Parallelism and Overheads
• Tseq - Time spent in Sequential
  • Only one node (usually the master) does the job
  • Load / save data from disk
  • Critical sections
  • Usually occurs during the start and end of the program
• Tcomm - Communication overhead
  • Communication between nodes
  • Data movement
  • Synchronization: barrier, lock, and critical region
  • Aggregation: reduction.

Speedup Analysis
• How good the parallel system is, when compared to the sequential system
• Predict the scalability
• Speedup metrics
  • Amdahl's Law
  • Gustafson's Law

Execution Time Components
• Given a program with Workload W:
• Let α be the percentage of the SEQUENTIAL portion in this program
• Parallel portion = 1 - α

    $W = \alpha W + (1 - \alpha) W$

Execution Time Components
• Suppose this program requires T time units on a SINGLE processor:
• T = Tpar + Tseq + Tcomm
• Tpar = (1 - α)T
• Tseq = αT
• For simplicity, ignore Tcomm

    $T = \alpha T + (1 - \alpha) T$

Speedup Formula

    $\text{Speedup} = \dfrac{\text{Sequential execution time}}{\text{Parallel execution time}}$

Amdahl's Law
• Aka. Fixed-Load (Problem) Speedup
• Given workload W, how good is it if we have n processors (ignore communication)?

    $S_n = \dfrac{\text{Time to execute } W \text{ on 1 processor}}{\text{Time to execute } W \text{ on } n \text{ processors}}$

    $T = \alpha T + (1 - \alpha) T$

    $S_n = \dfrac{T}{\alpha T + (1 - \alpha) T / n} = \dfrac{n}{1 + (n - 1)\alpha} \to \dfrac{1}{\alpha}$ as $n \to \infty$

• Very popular (and also pessimistic).

Amdahl's Law (2)
[Figure: Time vs. Number of processors, with segments labeled αT and (1 − α)T]

Impact of Parallel Portion (1 - α)

Example 1
• 95% of a program's execution time occurs inside a loop that can be executed in parallel. What is the maximum speedup we should expect from a parallel version of the program executing on 8 CPUs?

Example 2
• 20% of a program's execution time is spent within inherently sequential code. What is the limit to the speedup achievable by a parallel version of the program?

Limitations of Amdahl's Law
• Ignores Tcomm
  • Overestimates speedup achievable
• Very pessimistic
  • When people have bigger machines, they always run bigger programs
  • Thus, when people have more processors, they usually run bigger workloads
  • More workloads = more parallel portion
  • Workload may not be fixed, but SCALE
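The two Amdahl's Law examples above can be checked numerically. This is a minimal sketch using the fixed-load speedup formula with α as the sequential fraction; the function names `amdahl_speedup` and `amdahl_limit` are hypothetical, and communication overhead is ignored as in the slides.

```python
def amdahl_speedup(alpha, n):
    """Fixed-load speedup on n processors with sequential fraction alpha."""
    return n / (1 + (n - 1) * alpha)


def amdahl_limit(alpha):
    """Upper bound on the speedup as n grows without bound."""
    return 1 / alpha


# Example 1: 95% parallel, so alpha = 0.05, on 8 CPUs.
ex1 = amdahl_speedup(0.05, 8)
# Example 2: 20% inherently sequential, so alpha = 0.20.
ex2 = amdahl_limit(0.20)

print(f"Example 1: speedup on 8 CPUs = {ex1:.2f}")
print(f"Example 2: speedup limit = {ex2:.1f}")
```

Example 1 works out to 8 / 1.35, a little under 6, and Example 2 is bounded by 1 / 0.2 = 5 no matter how many processors are added.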

Problem Size and Amdahl's Law
[Figure: Speedup vs. Processors, with curves for n = 100, n = 1,000, and n = 10,000]

Gustafson's Law
• Aka. Fixed-Time Speedup (or Scaled-Load Speedup).
• Given a workload W, suppose it takes time T to execute W on 1 processor.
• With the same T, how much workload can we run on n processors? Let's call it W'.
• Assume the sequential work remains constant.

    $W = \alpha W + (1 - \alpha) W$

    $W' = \alpha W + (1 - \alpha) n W$

Weather Prediction

Gustafson's Law (2)
• Fixed-Time Speedup

    $S'_n = \dfrac{\text{Workload size that can be executed in time } T \text{ with } n \text{ processors}}{\text{Workload size that can be executed in time } T \text{ with 1 processor}}$

    $S'_n = \dfrac{W'}{W} = \dfrac{\alpha W + (1 - \alpha) n W}{W} = \alpha + (1 - \alpha) n$
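To see how the fixed-time view differs from the fixed-load view, the two formulas can be evaluated side by side. This is a minimal sketch; the function names, the sequential fraction α = 0.1, and the processor counts are illustrative assumptions, not values from the slides.

```python
def amdahl_speedup(alpha, n):
    """Fixed-load speedup: the workload W stays the same as n grows."""
    return 1 / (alpha + (1 - alpha) / n)


def gustafson_speedup(alpha, n):
    """Fixed-time speedup: the parallel part of the workload grows with n."""
    return alpha + (1 - alpha) * n


alpha = 0.1  # assumed sequential fraction, for illustration only
for n in (10, 100, 1000):
    fixed_load = amdahl_speedup(alpha, n)
    fixed_time = gustafson_speedup(alpha, n)
    print(f"n={n:5d}  Amdahl={fixed_load:7.2f}  Gustafson={fixed_time:8.2f}")
```

Under these assumptions the fixed-load speedup stays below 1/α = 10, while the scaled speedup keeps growing roughly linearly with n.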

Gustafson's Law (3)
[Figure: Time vs. Number of processors, with segments labeled αW and (1 − α)nW; processor axis marked X 1 to X 5]

Example 1
• An application running on 10 processors spends 3% of its time in serial code. What is the scaled speedup of the application?

Example 2
• What is the maximum fraction of a program's parallel execution time that can be spent in serial code if it is to achieve a scaled speedup of 7 on 8 processors?

Performance Benchmarking
• Benchmark
  • Measure and predict the performance of a system
  • Reveal the strengths and weaknesses
• Benchmark Suite
  • A set of benchmark programs and testing conditions and procedures
• Benchmark Family
  • A set of benchmark suites
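For the two Gustafson's Law examples above, a worked solution can be sketched directly from the fixed-time speedup formula, taking α to be the serial fraction stated in each example (an assumption about how the examples are meant to be read):

```latex
% Example 1: n = 10, \alpha = 0.03
S'_{10} = \alpha + (1 - \alpha)\,n = 0.03 + 0.97 \times 10 = 9.73

% Example 2: n = 8, S'_8 = 7, solve for \alpha
7 = \alpha + (1 - \alpha) \times 8 = 8 - 7\alpha
\quad\Longrightarrow\quad
\alpha = \frac{1}{7} \approx 0.143
```

So the application in Example 1 has a scaled speedup of about 9.73, and in Example 2 at most about 14% of the parallel execution time can be serial.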

Benchmarks Classification
• By instructions
  • Full application
  • Kernel: a set of frequently used functions
• By workloads
  • Real programs
  • Synthetic programs

Popular Benchmark Suites
• SPEC
• TPC
• LINPACK

SPEC
• By Standard Performance Evaluation Corporation
• Using real applications
• http://www.spec.org
• SPEC CPU2006
  • Measure CPU performance
    • Raw speed of completing a single task
    • Rates of processing many tasks
  • CINT2006 - Integer performance
  • CFP2006 - Floating-point performance

CINT2006

    Benchmark        Language  Application
    400.perlbench    C         PERL Programming Language
    401.bzip2        C         Compression
    403.gcc          C         C Compiler
    429.mcf          C         Combinatorial Optimization
    445.gobmk        C         Artificial Intelligence: go
    456.hmmer        C         Search Gene Sequence
    458.sjeng        C         Artificial Intelligence: chess
    462.libquantum   C         Physics: Quantum Computing
    464.h264ref      C         Video Compression
    471.omnetpp      C++       Discrete Event Simulation
    473.astar        C++       Path-finding Algorithms
    483.xalancbmk    C++       XML Processing
