
17.1

Unit 17

Improving Performance: Caching and Pipelining

17.2

Improving Performance

  • We want to improve the performance of our computation
  • Question: What are we referring to when we say "performance"?
    – __________________
    – __________________
    – __________________
  • We will primarily consider __________ in this discussion

17.3

How Do We Measure Speed?

  • Fundamental measurement: _________
    – Absolute time from __________ to ___________
    – To compare two alternative systems (HW + SW) and their performance, start a timer when you begin a task and stop it when the task ends
    – Do this for both systems and compare the resulting times
  • We call this the __________ of the system, and it works great from the perspective of the _______________ task
    – If system A completes the task in 2 seconds and system B requires 3 seconds, then system A is clearly superior
  • But when we dig deeper and realize that the single, overall task is likely made up of _________ small tasks, we can consider more than just latency

17.4

Performance Depends on Viewpoint?!

  • What's faster to get from point A to point B?
    – A 747 jumbo airliner
    – An F-22 supersonic fighter jet
  • If only _______________ to get from point A to point B, then the ___________
    – This is known as _______________ [units of seconds]
    – Time from the start of an operation until it completes
  • If _______________ to get from point A to point B, the _____ looks much better
    – This is known as _______________ [jobs/second]
  • The overall execution time (latency) may best be improved by _______________ throughput and not the latency of individual tasks
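The tradeoff above can be sketched with a toy calculation. All numbers here are made up for illustration (real flight times and capacities will differ), and return trips are ignored:

```python
def total_time(trip_hours, seats, passengers):
    """Total hours to carry `passengers` people, one full trip at a time."""
    trips = -(-passengers // seats)  # ceiling division
    return trips * trip_hours

fighter_latency = 2    # hours for one A->B trip: low latency, 1 seat (assumed)
airliner_latency = 5   # hours for one A->B trip: 400 seats (assumed)

# One person: the low-latency fighter wins.
print(total_time(fighter_latency, seats=1, passengers=1))      # 2 hours
print(total_time(airliner_latency, seats=400, passengers=1))   # 5 hours

# 400 people: the high-throughput airliner wins.
print(total_time(fighter_latency, seats=1, passengers=400))    # 800 hours
print(total_time(airliner_latency, seats=400, passengers=400)) # 5 hours
```

The same total job (move 400 people) finishes far sooner on the higher-throughput system even though each individual trip is slower.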


17.5

CACHING AND PIPELINING

Improving Latency and Throughput

17.6

Hardware Techniques

  • We can add hardware or reorganize our hardware to improve the throughput and latency of individual tasks, in an effort to reduce the total latency (time) to finish the overall task
  • We will look at two examples:
    – Caching: Improves ______________
    – Pipelining: Improves ______________

17.7

Caching

  • Cache (def.): "to store away in hiding or for future use"
  • Primary idea:
    – The ______________ you access or use something, you expend the ________ amount of time to get it
    – However, if you store it someplace (i.e., in a cache), you can get it more ______________ the next time you need it
    – The next time you need something, check if it is in the cache first
    – If it is in the cache, you can get it quickly; else, go get it, expending the full amount of time (but then __________ it in the cache)
  • Examples:
    – _____________________
    – _____________________
    – _____________________
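The check-the-cache-first idea can be sketched as a simple software cache. The `slow_fetch` function here is a hypothetical stand-in for any expensive access (e.g., main memory, disk, or network):

```python
import time

cache = {}

def slow_fetch(address):
    """Stand-in for an expensive access; the data returned is dummy data."""
    time.sleep(0.01)       # pretend this costs the full access time
    return address * 2

def cached_fetch(address):
    # 1. Check the cache first.
    if address in cache:
        return cache[address]      # hit: returned quickly
    # 2. Miss: pay the full cost, then store the result for next time.
    data = slow_fetch(address)
    cache[address] = data
    return data

cached_fetch(0x400028)  # miss: slow, but fills the cache
cached_fetch(0x400028)  # hit: served directly from the cache
```

The second call skips `slow_fetch` entirely, which is exactly the latency win caching is after.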

17.8

Cache Overview

  • Remember what registers are used for?
    – Quick access to copies of data
    – Only a _______ (32 or 64) so that we can access them really quickly
    – Controlled by the __________________
  • Cache memory is a small-ish (____bytes to a few _____bytes) "_________" memory usually built onto the processor chip
  • Will hold ____________ of the latest data & instructions accessed by the processor
  • Managed by the ____
    – ____________ to the software

[Diagram: processor chip containing registers (s0-sf), PC, and ALUs, plus on-chip cache memory, connected over a bus to main memory (RAM) at addresses 0x400000, 0x400040, …]


17.9

Cache Operation (1)

  • When the processor wants data or instructions, it always _________ in the cache first
  • If it is there, ______ access
  • If not, get it from __________
  • Memory will also supply ______________ data since it is likely to be needed soon
  • Why? Things like ______ & ______ (instructions) are commonly accessed sequentially

[Diagram: (1) Processor requests data @ 0x400028. (2) Cache does not have the data and thus requests the data from memory. (3) Memory responds not only with the desired data but also with surrounding data. (4) Cache forwards the desired data to the processor.]

17.10

Cache Operation (2)

  • When the processor asks for the same data again, or for the next data value in the array (or the next instruction of the code), the cache will likely have it
  • Questions?

[Diagram: (1) Processor requests data @ 0x400028 again. (2) Cache has the data & forwards it quickly. (3) Processor requests data @ 0x400024. (4) Cache also has the nearby data.]

Main point: Caching reduces the latency of memory accesses, which improves overall program performance.
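A minimal sketch of this behavior, assuming a hypothetical 16-byte cache line and dummy memory contents: a miss pulls in the whole surrounding block, so a later access to a nearby address hits:

```python
BLOCK = 16  # bytes per cache line (assumed size for illustration)
memory = {addr: addr & 0xFF for addr in range(0x400000, 0x400100)}  # dummy data
cache = {}  # maps block-aligned base address -> list of bytes
stats = {"hits": 0, "misses": 0}

def read(addr):
    base = addr - (addr % BLOCK)   # block-aligned base address
    if base in cache:
        stats["hits"] += 1
    else:
        stats["misses"] += 1
        # Miss: memory supplies the whole surrounding block, not just one byte.
        cache[base] = [memory[a] for a in range(base, base + BLOCK)]
    return cache[base][addr - base]

read(0x400028)  # miss: fetches the block 0x400020-0x40002F
read(0x400028)  # hit: same address
read(0x400024)  # hit: nearby address, already in the fetched block
print(stats)    # {'hits': 2, 'misses': 1}
```

Only the first access pays the full memory latency; the sequential and repeated accesses that real programs favor are served from the cache.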

17.11

Memory Hierarchy & Caching

  • Use several levels of faster and faster memory to hide the _______ of larger levels

[Diagram: memory hierarchy: Registers, L1 Cache (~1 ns), L2 Cache (~10 ns), Main Memory (~100 ns). Moving up the hierarchy: faster, more expensive, smaller. Moving down: slower, less expensive, larger. Unit of transfer between caches and memory: 8-64 bytes. Unit of transfer between registers and cache: 8 to 64 bits.]
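The payoff of the hierarchy can be estimated with a standard average-memory-access-time calculation, using the rough latencies from the figure. The hit rates below are assumptions chosen for illustration, not measurements:

```python
# Latencies (ns) from the hierarchy figure above.
l1_time, l2_time, mem_time = 1, 10, 100

# Hit rates are assumed values for illustration.
l1_hit, l2_hit = 0.90, 0.95

# AMAT = L1 time + (L1 miss rate) * [L2 time + (L2 miss rate) * memory time]
amat = l1_time + (1 - l1_hit) * (l2_time + (1 - l2_hit) * mem_time)
print(amat)  # ~2.5 ns: far closer to L1 speed than to the 100 ns of main memory
```

Even modest hit rates pull the effective access time close to the fastest level, which is how the hierarchy hides the latency of the larger, slower levels.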

17.12

Pipelining

  • We'll now look at a hardware technique called pipelining to improve _______________
  • The key idea is to __________ the processing of multiple "items" (either data or instructions)


17.13

Example

  • Suppose you are asked to build dedicated hardware to perform some operation on all 100 elements of some arrays
  • Suppose the operation (A[i]+B[i])/4 takes 10 ns to perform
  • How long would it take to process the entire arrays: ______ ns
    – Can we improve?

for(i=0; i < 100; i++)
    C[i] = (A[i] + B[i]) / 4;

[Diagram: a counter (address generator) indexes arrays A, B, and C in memory; the datapath is an adder stage (5 ns) followed by a divide-by-4 stage (5 ns).]

Clock Freq. = 1/__ns = _______ MHz
(longest path from register to register)

17.14

Pipelining Example

  • Pipelining refers to the insertion of registers to split combinational logic into smaller stages that can be overlapped in time (i.e., create an assembly line)

for(i=0; i < 100; i++)
    C[i] = (A[i] + B[i]) / 4;

[Table: pipeline schedule
  Clock Cycle 0: Stage 1 = A[0] + B[0]
  Clock Cycle 1: Stage 1 = A[1] + B[1], Stage 2 = (A[0] + B[0]) / 4
  Clock Cycle 2: Stage 1 = A[2] + B[2], Stage 2 = (A[1] + B[1]) / 4]

Time for the 0th element to complete: __________
Time between each of the remaining 99 elements completing: ________
Total: ______________
Speedup: 1000 / ______ = ____
Clock freq. = _________
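Assuming the blanks follow from the 5 ns stage delays shown, the timing works out as in this sketch:

```python
N = 100          # array elements
stage_delay = 5  # ns per stage after splitting the 10 ns operation in two
stages = 2

# Unpipelined: every element takes the full 10 ns, one after another.
unpipelined = N * 10               # 1000 ns

# Pipelined: the first element needs stages * stage_delay to emerge,
# then one result completes every stage_delay thereafter.
first = stages * stage_delay       # 10 ns for the 0th element
rest = (N - 1) * stage_delay       # 5 ns between each of the remaining 99
pipelined = first + rest           # 505 ns total

print(unpipelined, pipelined, unpipelined / pipelined)  # 1000 505 (~1.98x)
```

The latency of any single element is unchanged (still 10 ns), but overlapping elements nearly doubles throughput, and the clock can now run at 1/5 ns = 200 MHz.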

17.15

Need for Registers

  • Provides separation between combinational functions
    – Without registers, fast signals could "catch up" to data values in the next operation stage
  • Performing an operation yields signals with different paths and delays
  • We don't want signals from two different data values mixing; therefore, we must collect and synchronize the values from the previous operation before passing them on to the next

[Diagram: Signal i takes 5 ns and Signal j takes 2 ns through a stage; clocked registers (CLK) at the stage boundaries resynchronize them.]

17.16

Pipelining Example

  • By adding more pipeline stages we can improve throughput
  • Have we affected the latency of processing individual elements? ____________
  • Questions/Issues?
    – ____________ stage delays
    – ___________ of registers (not free to split stages)
      • This limits how much we can split our logic

for(i=0; i < 100; i++)
    C[i] = (A[i] + B[i]) / 4;

[Diagram: four pipeline stages (Stage 1 through Stage 4), each with a 2.5 ns delay.]

Time for the 0th element to complete: __________
Time between each of the remaining 99 elements completing: ________
Total: ______________
Speedup: 1000 / 257.5 ≈ 4x
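A small helper generalizes the timing to any number of stages, and can also model a made-up per-stage register overhead to illustrate why splitting stages is not free:

```python
def pipeline_time(n_elements, op_delay, stages, reg_delay=0.0):
    """Total ns to process n_elements when an op_delay-ns operation is split
    into `stages` equal stages, each paying `reg_delay` ns of register overhead."""
    cycle = op_delay / stages + reg_delay
    # First result emerges after `stages` cycles; one more finishes per cycle.
    return (stages + n_elements - 1) * cycle

# Ideal (zero-cost registers): matches the slide's numbers.
print(pipeline_time(100, 10, 1))   # 1000.0 ns (unpipelined)
print(pipeline_time(100, 10, 4))   # 257.5 ns (four 2.5 ns stages, ~4x)

# With an assumed 0.5 ns register overhead per stage, returns diminish:
print(pipeline_time(100, 10, 4, reg_delay=0.5))   # 309.0 ns
print(pipeline_time(100, 10, 20, reg_delay=0.5))  # 119.0 ns, nowhere near 20x
```

With overhead, each extra split buys less, which is why the register cost limits how finely we can divide the logic.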


17.17

Non-Pipelined Processors

  • Currently we know our processors execute software 1 instruction at a time
  • The 3 steps/stages of work for each instruction are:
    – ___________
    – ___________
    – ___________

[Diagram: instructions i, i+1, and i+2 each pass through the F, D, E stages one after another over time, with no overlap.]

17.18

Pipelined Processors

  • By breaking our processor hardware for instruction execution into stages, we can overlap these stages of work
  • Latency for a single instruction is the _____________
  • Overall throughput, and thus total latency, are greatly improved

[Diagram: instructions i through i+3 each pass through the F, D, E stages, overlapped one stage apart over time.]

17.19

More and More Stages

  • We can break the basic stages of work into substages to get better performance
  • In doing so our clock period goes ______; frequency goes _____
  • All kinds of interesting issues come up, though, when we overlap instructions; these are discussed in future CENG courses

[Diagram: with 3 stages, instructions i through i+3 each pass through F, D, E (10 ns per stage), overlapped one stage apart; Clock freq. = 1/10ns = 100MHz. With substages, each instruction passes through F1 F2 D1 D2 E1 E2 (5 ns per substage), overlapped one substage apart; Clock freq. = 1/__ns = ___MHz.]
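The same arithmetic from the array example applies to instruction pipelines. This sketch compares the 3-stage, 10 ns design against the 6-substage, 5 ns design for a hypothetical run of 1000 instructions (ignoring the hazards mentioned above):

```python
def run_time(n_instr, n_stages, cycle_ns):
    """Time for n_instr instructions through an n_stages pipeline:
    the first instruction takes n_stages cycles, then one finishes per cycle."""
    return (n_stages + n_instr - 1) * cycle_ns

# 3-stage F/D/E pipeline at 10 ns per stage (100 MHz clock):
print(run_time(1000, 3, 10))  # 10020 ns for 1000 instructions

# Splitting each stage in two (F1 F2 D1 D2 E1 E2) halves the cycle time:
print(run_time(1000, 6, 5))   # 5025 ns: nearly 2x the throughput
```

Each instruction's latency is unchanged (6 x 5 ns = 30 ns vs. 3 x 10 ns = 30 ns), but the faster clock lets one instruction complete every 5 ns instead of every 10 ns.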

17.20

Summary

  • By investing extra hardware we can improve the overall latency of computation
  • Measures of performance:
    – Latency is start-to-finish time
    – Throughput is tasks completed per unit time (a measure of parallelism)
  • Caching reduces latency by holding data we will use in the future in quickly accessible memory
  • Pipelining improves throughput by overlapping the processing of multiple items (i.e., an assembly line)