Duet Benchmarking: Improving Measurement Accuracy in the Cloud

Lubomír Bulej, François Farquet, Vojtěch Horký, Aleksandar Prokopec, Petr Tůma

Software Regression Testing … of Performance

Common Testing Pipeline
[Diagram: Write Code, Write More Code, Write Even More Code → commit → Code Repository → hook → Check Out → Build → Run Tests → Pass or fail verdict]

Project Context
Graal Java JIT+AOT Compiler
• Currently ~5 merge commits per day
• Bare minimum testing JDK 8 + JDK 11
• Running ~60 standard benchmarks
• Minimum warm up time 5 minutes
• Minimum 10 executions
5 x 2 x 60 x 5 x 10 = 30000 minutes = ~21 days
Options: use more machines, skip some commits, skip some benchmarks
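The arithmetic behind the ~21 days figure can be checked with a few lines of Python. This is only a sketch: the numbers come from the slide, the variable names are made up for illustration, and the result is a lower bound that counts nothing beyond the minimum warm up time of each execution.

```python
# Figures taken from the slide; the variable names are illustrative only.
commits_per_day = 5    # ~5 merge commits per day
jdk_versions = 2       # JDK 8 + JDK 11
benchmarks = 60        # ~60 standard benchmarks
warmup_minutes = 5     # minimum warm up time per execution
executions = 10        # minimum number of executions

# Machine time needed to test one day of commits on a single machine.
total_minutes = (commits_per_day * jdk_versions * benchmarks
                 * warmup_minutes * executions)
total_days = total_minutes / (60 * 24)

print(f"{total_minutes} minutes = {total_days:.1f} days of machine time")
```

One day of commits therefore needs roughly three weeks of machine time, which is why the slide lists more machines, fewer commits, or fewer benchmarks as the only ways out.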

Where to Go for More Machines? … to the Cloud!

Cloud Resource Sharing
Amazon Elastic Compute Cloud (EC2) Instance Types
• t3.nano 2 vCPU @ 5% power, 512MB RAM
• t3.medium 2 vCPU @ 20% power, 4GB RAM
• m5.large 2 vCPU 8GB RAM
• ...
Or you can forgo the virtualization
• m5.metal 96 threads 48 cores 384GB RAM
• Likely the same Intel Xeon Platinum 8175M
Envelope estimate (t3.nano instances per m5.metal host)
• CPU 48 cores / 5% = 960 instances
• RAM 384 GB / 512 MB = 768 instances
This might perhaps somewhat disrupt measurements
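The envelope estimate can be reproduced as follows. This is a sketch that follows the slide's simplification of treating each t3.nano as a 5% share of one core; the variable names are illustrative, and integer arithmetic is used to avoid floating-point rounding.

```python
# Host and instance figures from the slide; names are illustrative only.
host_cores = 48
host_ram_mb = 384 * 1024          # 384 GB expressed in MB
nano_cpu_share_percent = 5        # t3.nano baseline share, as used on the slide
nano_ram_mb = 512

# How many t3.nano instances would fit on one host, by CPU and by RAM.
instances_by_cpu = host_cores * 100 // nano_cpu_share_percent
instances_by_ram = host_ram_mb // nano_ram_mb

print(f"CPU bound: {instances_by_cpu} instances")
print(f"RAM bound: {instances_by_ram} instances")
```

Either bound suggests that hundreds of small instances could share a single physical host, which is the source of the interference the following slides measure.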

… Effect of Resource Sharing
99% CI for the mean is ~61% bigger

… Effect of Resource Sharing
99% CI for the mean is ~1800% bigger

Resource Management … Should Be Fair!

Is Resource Management Fair?
Hyperthreading
• Intel says it “maximizes use of execution units”
Bursty processor scheduling
• Amazon says “one CPU credit is equal to 100% utilization for one minute” (in any combination) and “credits are accrued and spent at millisecond resolution”
Memory caches?
Memory bandwidth?
Thermal budget?
Would it be fine if some instances were systematically disadvantaged?

Two Measurements In Parallel
[Plot: two workloads measured in parallel on GitLab CI]
Both workloads fluctuate together
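A small simulation illustrates what "fluctuate together" means. It is a sketch on synthetic data, not the GitLab CI measurements from the slide: it assumes that interference acts as a shared multiplicative factor on both workloads, and every name and constant in it is hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Assumed interference: one slowdown factor per run, shared by both workloads.
shared = [rng.uniform(0.5, 1.5) for _ in range(500)]

# Two workloads run in parallel see the same factor plus small private noise.
left = [100.0 * s * rng.uniform(0.98, 1.02) for s in shared]
right = [120.0 * s * rng.uniform(0.98, 1.02) for s in shared]

# A workload run at some other time sees unrelated interference.
sequential = [120.0 * rng.uniform(0.5, 1.5) * rng.uniform(0.98, 1.02)
              for _ in shared]

print(f"parallel pair correlation:   {pearson(left, right):.3f}")
print(f"sequential pair correlation: {pearson(left, sequential):.3f}")
```

Under these assumptions the parallel pair is almost perfectly correlated while the sequential pair is not, which is the property the ratio-based comparison relies on.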

How To Use This?
Look at ratios instead of absolute values
• Assumes effects are multiplicative
• Ratios are what people want to know: “We want to reliably detect 5% slowdowns ...”
Confidence intervals using bootstrap
Compare with sequential measurements
• Confidence interval width relative to mean
• Not quite apples-to-apples but gives some intuition
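The points above can be sketched in Python. The example uses a plain percentile bootstrap on synthetic data and should not be read as the authors' exact procedure; the function names, the sample size, the 5% slowdown, and the noise model (a shared multiplicative interference factor) are all assumptions made for illustration.

```python
import random
import statistics


def bootstrap_ci(values, confidence=0.99, resamples=2000, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of every resample.
    means = sorted(statistics.fmean(rng.choices(values, k=n))
                   for _ in range(resamples))
    # Cut the same number of resampled means from both tails.
    k = round((1.0 - confidence) / 2.0 * resamples)
    return means[k], means[-k - 1]


def relative_width(values):
    """Confidence interval width relative to the sample mean."""
    lo, hi = bootstrap_ci(values)
    return (hi - lo) / statistics.fmean(values)


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Assumed interference: one multiplicative factor per paired run.
shared = [rng.uniform(0.5, 1.5) for _ in range(200)]
a_times = [100.0 * s * rng.uniform(0.99, 1.01) for s in shared]  # version A
b_times = [105.0 * s * rng.uniform(0.99, 1.01) for s in shared]  # 5% slower B

# Duet view: the per-pair ratio cancels the shared factor.
ratios = [b / a for a, b in zip(a_times, b_times)]

print("99% CI of B/A ratio:", bootstrap_ci(ratios))
print("relative CI width, absolute times:", relative_width(a_times))
print("relative CI width, paired ratios: ", relative_width(ratios))
```

Because the shared factor divides out of each pair, the interval for the ratio is much narrower relative to its mean than the interval for the absolute times, so a 5% slowdown stands out despite the interference.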

How Much More Accurate?
[Chart callouts]
• ScalaBench ~2.3x
• SPEC CPU ~27x
• ScalaBench ~9.1x
• ScalaBench ~12x
• SPEC CPU ~24x

… More Done
• Does duet benchmarking work because of synchronized interference?
• Does duet benchmarking address interference due to resource sharing?
• Does duet benchmarking measure performance differences accurately?
• …

Thank You!
Complete paper at https://arxiv.org/abs/2001.05811
For more information visit http://d3s.mff.cuni.cz
