Five ways not to fool yourself
“A pragmatic implementation of non-blocking linked lists”, Tim Harris, DISC 2001
Five ways not to fool yourself
- 1. Measure as you go
Starting and stopping work
- How much work to do?
– Too little: results dominated by start-up effects. Normalized metrics vary as you vary the duration.
– OK: results not sensitive to the exact choice of settings. Confirm this: double / halve the duration with no change (a sketch of this check follows below).
– Unnecessarily long: deters experimentation, and risks errors from mixing up results from different runs.
[Figure: long runs vs. short runs]
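A minimal sketch of the duration check in C. run_benchmark() here is a stand-in placeholder for the real harness entry point; the 5% tolerance and 10s base duration are arbitrary choices.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Placeholder workload: replace with the real harness entry point.
     * Runs for `seconds` and returns the number of operations completed. */
    static unsigned long run_benchmark(double seconds) {
        struct timespec start, now;
        unsigned long ops = 0;
        clock_gettime(CLOCK_MONOTONIC, &start);
        do {
            ops++; /* stand-in for one real operation */
            clock_gettime(CLOCK_MONOTONIC, &now);
        } while ((now.tv_sec - start.tv_sec) +
                 (now.tv_nsec - start.tv_nsec) / 1e9 < seconds);
        return ops;
    }

    int main(void) {
        double d = 10.0;                            /* base duration (s) */
        double r1 = run_benchmark(d) / d;           /* ops/sec at d      */
        double r2 = run_benchmark(2 * d) / (2 * d); /* ops/sec at 2d     */
        double diff = (r2 - r1) / r1;
        printf("ops/sec: %.0f vs %.0f (%+.1f%% change)\n", r1, r2, 100 * diff);
        /* A large change means start-up effects still dominate: run longer. */
        return (diff > -0.05 && diff < 0.05) ? EXIT_SUCCESS : EXIT_FAILURE;
    }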
Constant load versus constant work
- Constant load: a fixed set of threads is active throughout the measurement interval. Measure the work they do.
- Constant work: a fixed amount of work (e.g., loop iterations). Measure the time taken to perform it. Vary the number of threads.
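The two modes differ only in their inner loop, as the sketch below shows (C11 atomics; do_one_op() is a stand-in for one operation on the structure under test).

    #include <stdatomic.h>
    #include <stdbool.h>

    static _Atomic unsigned long counter;
    static void do_one_op(void) { atomic_fetch_add(&counter, 1); } /* stand-in op */

    static atomic_bool stop;               /* set by the harness to end the interval */
    static _Atomic unsigned long total_ops;

    /* Constant load: the thread stays active for the whole measurement
     * interval; the harness measures how much work was completed. */
    void *constant_load_thread(void *arg) {
        (void)arg;
        unsigned long ops = 0;
        while (!atomic_load_explicit(&stop, memory_order_relaxed)) {
            do_one_op();
            ops++;
        }
        atomic_fetch_add(&total_ops, ops);
        return 0;
    }

    /* Constant work: the thread performs a fixed number of operations;
     * the harness times the whole batch and varies the thread count. */
    void *constant_work_thread(void *arg) {
        unsigned long n = *(unsigned long *)arg; /* fixed per-thread op count */
        for (unsigned long i = 0; i < n; i++)
            do_one_op();
        return 0;
    }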
Plot what you measure, not what you configure
– “Bind threads 1 per socket” → have each thread report where it is running
– “Run for 10s” → record the time at start and end
– “Use 50% reads” → measure #reads / #ops
– “Distribute memory across the machine” → record the actual locations and page sizes used
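For example, each thread can report its own placement and timing with a small helper (Linux-specific: sched_getcpu() is a glibc extension).

    #define _GNU_SOURCE
    #include <sched.h> /* sched_getcpu() (Linux/glibc extension) */
    #include <stdio.h>
    #include <time.h>

    /* Report the h/w thread this software thread is actually running on,
     * plus a monotonic timestamp, instead of trusting the configuration. */
    void report_placement(int thread_id) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        printf("thread %d: cpu %d at %ld.%09lds\n",
               thread_id, sched_getcpu(), (long)ts.tv_sec, ts.tv_nsec);
    }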
Five ways not to fool yourself
- 1. Measure as you go
- 2. Include lightweight sanity checks
Be skeptical about the results
- Is the harness running what you intend it to run?
– Incorrect algorithms are often faster
– Good practice: do not print any output until you have confidence in the result
- Does the data structure pass simple checks?
– Start with N items, insert P, delete M, check that we have N+P-M at the end
– Suppose we are building a balanced binary tree: is it actually balanced at the end?
– Suppose we have a vector of N items and swap pairs of items: do we have N distinct items at the end?
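A minimal sketch of the first check, assuming a set-like interface under test (set_size / set_insert / set_delete are placeholder names).

    #include <assert.h>

    /* Interface of the structure under test: placeholder names. */
    extern int  set_size(void *set);
    extern void set_insert(void *set, long key);
    extern void set_delete(void *set, long key);

    /* Start with N items, insert P fresh keys, delete M of them, and
     * check that exactly N + P - M items remain. */
    void check_size_invariant(void *set, int n, int p, int m) {
        assert(m <= p);
        assert(set_size(set) == n);
        for (int i = 0; i < p; i++) set_insert(set, 1000000L + i); /* keys not already present */
        for (int i = 0; i < m; i++) set_delete(set, 1000000L + i); /* keys known to be present */
        assert(set_size(set) == n + p - m);
    }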
Five ways not to fool yourself
- 1. Measure as you go
- 2. Include lightweight sanity checks
- 3. Understand the simple cases first
[Figure: normalized throughput (y-axis, 0.0 to 0.8) vs. threads (x-axis, 1 to 128). Skip-list, 100% read-only, 2-socket Haswell.]
Normalize to optimized sequential code (and report absolute baseline). Self-relative scaling is almost never a good metric to use.
Why isn’t this a horizontal line?
[Figure: the same experiment replotted. Normalized throughput (y-axis, 0.0 to 1.0) vs. threads (x-axis, 1 to 128). Skip-list, 100% read-only, 2-socket Haswell, with and without Turbo Boost.]
- Fixed: with Turbo Boost disabled, the line is flat. The original curve reflected Turbo Boost raising the clock frequency when only a few cores are active.
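This is worth measuring as you go. A sketch, assuming Linux with the intel_pstate driver (other drivers expose a different knob, e.g. /sys/devices/system/cpu/cpufreq/boost):

    #include <stdio.h>

    /* Log the Turbo Boost setting alongside each run, so that frequency
     * effects can be told apart from algorithmic ones. */
    void log_turbo_state(void) {
        FILE *f = fopen("/sys/devices/system/cpu/intel_pstate/no_turbo", "r");
        int no_turbo;
        if (f != NULL && fscanf(f, "%d", &no_turbo) == 1)
            printf("turbo boost: %s\n", no_turbo ? "disabled" : "enabled");
        else
            printf("turbo boost: unknown\n");
        if (f != NULL)
            fclose(f);
    }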
Five ways not to fool yourself
- 1. Measure as you go
- 2. Include lightweight sanity checks
- 3. Understand the simple cases first
- 4. Look beyond timing
Look beyond timing
- Try to link:
– Performance measurements from an experiment
– Measurements of resource use during the experiment
– Differences between the algorithms being executed
Resource utilization
- Examine the use of significant resources in the machine
– Bandwidth to and from memory
– Bandwidth use on the interconnect
– Instruction execution rate
- Clock frequency and power settings
- Look for evidence of bad behavior
– High page fault rate (i.e., going to disk)
– High TLB miss rate
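Hardware counters need tools such as perf, but fault rates can be checked with plain POSIX getrusage(), as in this sketch:

    #include <stdio.h>
    #include <sys/resource.h>

    /* Snapshot resource usage before the measurement interval... */
    struct rusage snapshot_usage(void) {
        struct rusage r;
        getrusage(RUSAGE_SELF, &r);
        return r;
    }

    /* ...and report the deltas afterwards. Major faults mean the run
     * went to disk; a flood of minor faults suggests paging activity. */
    void report_faults(const struct rusage *before) {
        struct rusage after;
        getrusage(RUSAGE_SELF, &after);
        printf("minor faults: %ld, major faults: %ld\n",
               after.ru_minflt - before->ru_minflt,
               after.ru_majflt - before->ru_majflt);
    }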
Thread placement
- Choice between OS-controlled threading and pinning
- Real workloads run with OS-controlled threading
– …but OS-controlled threading can be sensitive to blocking / wake-up behavior, thread creation order, prior machine state, and so on
- Deliberately explore different pinned placements, and quantify the impact
– Are differences between algorithms consistent across these runs?
- In experiments compare:
– OS-controlled threading (report the OS version)
– Different pinning choices (how many sockets are used, how many cores per socket, in what order are h/w threads used?)
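A sketch of one pinned placement (Linux: pthread_setaffinity_np() is a GNU extension). Binding each thread to a single h/w thread, then repeating the run with different cpu-numbering orders, makes the placement explicit and reproducible.

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>

    /* Pin the calling thread to one h/w thread. Explore several orders
     * of cpu numbers (fill one socket first, round-robin across sockets,
     * etc.) and check whether algorithm differences persist. */
    int pin_self_to_cpu(int cpu) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }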
Memory placement
- How are we distributing memory across sockets?
- How is the load distributed over memory channels?
- How is memory being allocated / deallocated?
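Two contrasting placements worth comparing, sketched with libnuma (link with -lnuma; check numa_available() >= 0 first). As always, verify the resulting placement rather than assuming it.

    #include <stddef.h>
    #include <numa.h> /* libnuma */

    /* All memory on one socket vs. interleaved page-by-page across all
     * sockets. Free either allocation with numa_free(ptr, bytes). */
    void *alloc_on_node0(size_t bytes)    { return numa_alloc_onnode(bytes, 0); }
    void *alloc_interleaved(size_t bytes) { return numa_alloc_interleaved(bytes); }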
Unfairness
- Look across all of the threads: did they complete the same amount of work?
- Trade-offs between unfairness and aggregate throughput
– Unfairness may correlate with better LLC behavior
– Threads running nearby synchronize more quickly, and get to complete more work
- Whether we care about unfairness in itself depends on the workload
– Threads serving different clients: may want even response time
– Threads completing a batch of work: just care about overall completion time
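A simple check, assuming the harness keeps a per-thread ops[] count:

    #include <stdio.h>

    /* Compare each thread's completed work: report min, max, and the
     * max/min ratio across threads (cf. the 45x spread below). */
    void report_fairness(const unsigned long *ops, int nthreads) {
        unsigned long min = ops[0], max = ops[0];
        for (int i = 1; i < nthreads; i++) {
            if (ops[i] < min) min = ops[i];
            if (ops[i] > max) max = ops[i];
        }
        printf("per-thread ops: min %lu, max %lu, spread %.1fx\n",
               min, max, (double)max / (double)min);
    }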
Unfairness: simple test-and-test-and-set lock
- 2-socket Haswell, threads pinned sequentially to cores in both sockets
[Figure: operations per thread, normalized to the main thread, plotted against h/w thread number (0..36). The spread is 45x, not 45%!]
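For reference, a minimal sketch of the lock in question, using C11 atomics: spin reading ("test") until the flag looks free, then attempt the atomic swap ("test-and-set").

    #include <stdatomic.h>
    #include <stdbool.h>

    typedef struct { atomic_bool locked; } tatas_lock_t;

    void tatas_acquire(tatas_lock_t *l) {
        for (;;) {
            /* Test: spin with plain loads while the lock is held, avoiding
             * cache-line ping-pong from repeated atomic writes. */
            while (atomic_load_explicit(&l->locked, memory_order_relaxed))
                ; /* busy wait */
            /* Test-and-set: try to grab the lock; retry on a lost race. */
            if (!atomic_exchange_explicit(&l->locked, true, memory_order_acquire))
                return;
        }
    }

    void tatas_release(tatas_lock_t *l) {
        atomic_store_explicit(&l->locked, false, memory_order_release);
    }

Nothing here enforces fairness: all waiters race on the same cache line, and threads that observe the release sooner (e.g., those on the same socket as the previous holder) win far more often, producing spreads like the one above.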
Five ways not to fool yourself
- 1. Measure as you go
- 2. Include lightweight sanity checks
- 3. Understand the simple cases first
- 4. Look beyond timing
- 5. Move toward production settings
Concluding comments
- We optimize for what we measure, or measure what we optimized
– Why pick specific workloads (read/write mix, key space, …)?
– Does the choice reflect an important workload?
– Are results sensitive to the choice?
- Be careful about averages
– As with fairness over threads, an average over time hides details
– Even if you do not plot all the results, examine trends over time, variability, etc. (see the sketch after this list)
- Be careful about trade-offs
– Is a new system strictly better, or exploring a new point in a trade-off?
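For the point about averages, one low-effort habit is to bin completed operations per second, so warm-up, drift, and variability stay visible even when only summaries are plotted. A sketch (MAX_BINS and the output format are arbitrary):

    #include <stdio.h>

    #define MAX_BINS 600 /* arbitrary cap: 10 minutes of 1-second bins */

    static unsigned long bins[MAX_BINS];

    /* Called with each completed operation's elapsed run time; counts
     * ops per one-second bin instead of keeping a single average. */
    void record_op(double elapsed_seconds) {
        int bin = (int)elapsed_seconds;
        if (bin >= 0 && bin < MAX_BINS)
            bins[bin]++;
    }

    /* Dump the timeline: trends and variance over time become visible. */
    void dump_timeline(int seconds) {
        for (int s = 0; s < seconds && s < MAX_BINS; s++)
            printf("%d %lu\n", s, bins[s]);
    }

In a real multi-threaded harness each thread would keep private bins (merged at the end) so the instrumentation itself does not become a bottleneck.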
Further reading
- Books
– Huff & Geis – “How to Lie with Statistics” – Jain – “The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling” – Tufte – “The Visual Display of Quantitative Information”
- Papers and articles
– Bailey – “Twelve Ways to Fool the Masses”
– Fleming & Wallace – “How Not to Lie with Statistics: The Correct Way to Summarize Benchmark Results”
– Heiser – “Systems Benchmarking Crimes”
– Hoefler & Belli – “Scientific Benchmarking of Parallel Computing Systems”