SLIDE 1

Wake Up and Smell The Coffee

Evaluation Methodology for the 21st Century

Stephen M Blackburn, Kathryn S McKinley, Robin Garner, Chris Hoffmann, Asjad M Khan, Rotem Bentzur, Amer Diwan, Daniel Feinberg, Daniel Frampton, Samuel Z Guyer, Martin Hirzel, Antony Hosking, Maria Jump, Han Lee, J Eliot B Moss, Aashish Phansalkar, Darko Stefanovic, Thomas VanDrunen, Daniel von Dincklage, Ben Wiedermann

SLIDE 2

SLIDE 3

There are lies, damn lies, and benchmarks

“sometimes more than twice as fast”
“our …. is better or almost as good as …. across the board”
“garbage collection degrades performance by 70%”
“speedups of 1.2x to 6.4x on a variety of benchmarks”
“our prototype has usable performance”
“the overhead …. is on average negligible”
“…demonstrating high efficiency and scalability”
“our algorithm is highly efficient”
“can reduce garbage collection time by 50% to 75%”
“speedups…. are very significant (up to 54-fold)”
“speed up by 10-25% in many cases…”
“…about 2x in two cases…”
“…more than 10x in two small benchmarks”
“…improves throughput by up to 41x”

SLIDE 4

The success of most systems innovation hinges on benchmark performance.

Predicate 1. Benchmarks reflect current (and ideally, future) reality.
Predicate 2. Methodology is appropriate.

SLIDE 5

Benchmarks & Reality


  • 1. JVM design & implementation

    – SPECjvm98 is small and jbb is relatively simple

      • Q: What has this done to GC research?
      • Q: What has this done to compiler research?

  • 2. Computer architecture

    – ISCA & MICRO rely on SPEC CPU

      • Q: What does this mean for Java and C# performance on modern architectures?

  • 3. C#

    – Public benchmarks are almost non-existent

      • Q: How has this impacted research?
SLIDE 6

Benchmarks & Methodology

  • We’re not in Kansas anymore!

– JIT compilation, GC, dynamic checks, etc

  • Methodology has not adapted

– Needs to be codified and mandated

“…this sophistication provides a significant challenge to understanding complete system performance, not found in traditional languages such as C or C++” [Hauswirth et al., OOPSLA ’04]

SLIDE 7
  • Comprehensive comparison

– 3 state-of-the-art JVMs
– Best of 5 executions
– 19 benchmarks
– 1 platform

  • 3 students perform the same evaluation…


Benchmarks & Methodology

[Chart: normalized time, System A vs System B vs System C]

SLIDE 8

[Chart: normalized time, System A vs System B vs System C]


Benchmarks & Methodology

[Charts: normalized time for Systems A, B, C (two students' results)]

SLIDE 9
  • Comprehensive comparison

– 3 state-of-the-art JVMs
– Best of 5 executions
– 19 benchmarks
– 1 platform


Benchmarks & Methodology

[Charts: normalized time for Systems A, B, C (three students' results)]


1st iteration

SLIDE 10

[Chart: normalized time, System A vs System B (SPEC _209_db)]


Benchmarks & Methodology

SLIDE 11

[Charts: two evaluations of normalized time, System A vs System B (SPEC _209_db), with conflicting results]


Benchmarks & Methodology


Another evaluation of the same systems, same hardware, same iteration measured….

SLIDE 12

Benchmarks & Methodology

[Chart: normalized time vs heap size (20–120 MB) for System A and System B on SPEC _209_db, alongside the two earlier bar charts]

SLIDE 13


[Charts: normalized time for the 1st, 2nd, and 3rd iterations of JVM A and JVM B across antlr, bloat, chart, eclipse, fop, hsqldb, jython, lusearch, luindex, pmd, and xalan, with min, max, and geomean]

Benchmarks & Methodology

SLIDE 14


[Charts: the same 1st/2nd/3rd-iteration comparison of JVM A and JVM B, one panel per platform]

Benchmarks & Methodology


Platforms: AMD Athlon, SPARC, Pentium M

SLIDE 15

The success of most systems innovation hinges on benchmark performance.

Predicate 1. Benchmarks reflect current (and ideally, future) reality.
Predicate 2. Methodology is appropriate.

SLIDE 16

The success of most systems innovation hinges on benchmark performance.

Predicate 1. Benchmarks reflect current (and ideally, future) reality.
Predicate 2. Methodology is appropriate.

✘ ✘

SLIDE 17

The success of most systems innovation hinges on benchmark performance.

Predicate 1. Benchmarks reflect current (and ideally, future) reality.
Predicate 2. Methodology is appropriate.

✘ ✘ ?

SLIDE 18

Innovation Trap

  • Innovation is gated by benchmarks
  • Poor benchmarking retards innovation

– Reality: inappropriate, unrealistic benchmarks
– Reality: poor methodology

  • Concrete, contemporary instances

– Architectural tuning to managed languages
– Software transactional memory
– C#
– GC avoided in SPEC performance runs

SLIDE 19

How Did This Happen?

  • Researchers depend on SPEC

– Primary purveyor & de facto guardian
– Industry body
– Concerned with product comparison

  • Little involvement from researchers

– Historically C & Fortran benchmarks

  • Did not update/adapt methodology for Java
  • Researchers tend not to create their own suites

– Enormously expensive exercise

SLIDE 20

Enough Whining. How Do We Respond?

  • Critique our benchmarks & methodology

– Not enough to “set the bar high” when reviewing!
– Need appropriate benchmarks & methodology

  • Develop new benchmarks

– NSF review challenged us

  • Maintain and evolve those benchmarks
  • Establish new, appropriate methodologies
  • Attack problem as a community

– Formally (SIGs?) and ad hoc (e.g. DaCapo)

SLIDE 21

The DaCapo Suite: Background & Scope

  • Motivation (mid 2003)

– We wanted to do good Java runtime and compiler research
– An NSF review panel agreed that the existing Java benchmarks were limiting our progress

  • Non-goal: product comparison (SPEC does a fine job)
  • Scope

– Client-side, real-world, measurable Java applications

  • Real world data and coding idioms, manageable dependencies
  • Two-pronged effort

– New candidate benchmarks
– New suite of analyses to characterize candidates

SLIDE 22

The DaCapo Suite: Goals

  • Open source

– Encourage (& leverage) community feedback
– Enable analysis of benchmark sources
– Freely available, avoid intellectual property restrictions

  • Real, non-trivial applications

– Popular, non-contrived, active applications
– Use analysis to ensure non-trivial, good coverage

  • Responsive, not static

– Adapt the suite as circumstances change

  • Easy to use

SLIDE 23

The DaCapo Suite: Today

  • Open source (www.dacapobench.org)
  • Significant community-driven improvements already made
  • 11 real, non-trivial applications

    – Compared to JVM98, JBB2000, on average:

      • 2.5X classes, 4X methods, 3X DIT, 20X LCOM, 2X optimized methods, 5X icache load, 8X ITLB, 3X running time, 10X allocations, 2X live size.

    – Uncovered bugs in product JVMs

  • Responsive, not static

– Have adapted the suite

  • Easy to use

– Single jar file, OS-independent, output validation
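
    (Usage sketch, with an illustrative jar name: each benchmark runs with a single command of the form "java -jar dacapo.jar antlr"; the harness validates the benchmark's output and reports elapsed time.)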

SLIDE 24

Some of our Analyses

SLIDE 25

Broader Impact

  • Just the tip of the iceberg?

– Q: How many good ideas did not see the light of day because of jvm98?

  • A problem unique to Java?

– Q: How has the lack of C# benchmarks impacted research?

  • What’s next?

– Multicore architectures, transactional memory, Fortress, dynamic languages, …

– Q: Can we evaluate TM versus locking?
– Q: Can we evaluate TM implementations? (SPLASH & JBB???)

  • Are we prepared to let major directions in our field unfold at the whim of inadequate methodology?

SLIDE 26

Developing a New Suite

  • Establish a community consortium

– Practical and qualitative reasons
– DaCapo grew to around 12 institutions

  • Scope the project

– What qualities do you most want to expose?

  • Identify realistic candidate benchmarks

– … and iterate.

  • Identify/develop many analyses and metrics

– This is essential

  • Analyze candidates & prune set, engaging the community

    – An iterative process

  • Use PCA (principal component analysis) to verify coverage
SLIDE 27

Conclusions

  • Systems innovation is gated by benchmarks

– Benchmarks & methodology can retard or accelerate innovation, focus or misdirect energy.

  • As a community, we have failed

– We have unrealistic benchmarks and poor methodology

  • We have a unique opportunity

– Transactional memory, multicore performance, dynamic languages, etc…

  • We need to take responsibility for benchmarks & methodology

    – Formally (e.g. SIGPLAN) or via ad hoc consortia (e.g. DaCapo)

SLIDE 28

Acknowledgements

  • Andrew Appel, Randy Chow, Frans Kaashoek and Bill Pugh, who encouraged this project at our three-year ITR review
  • Mark Wegman, who initiated the public availability of Jikes RVM, and the developers of Jikes RVM
  • Fahad Gilani, for writing the original version of the measurement infrastructure for his ANU Masters thesis
  • Kevin Jones and Eric Bodden, for significant feedback and enhancements
  • Vladimir Strigun and Yuri Yudin, for extensive testing and feedback
  • The rest of the DaCapo research consortium, for their long-term assistance and engagement with this project

SLIDE 29


www.dacapobench.org

SLIDE 30


www.dacapobench.org

SLIDE 31

Extra Slides

SLIDE 32

Experimental Design

Best Practices

  • Measuring JVM innovations
  • Measuring JIT innovations
  • Measuring GC innovations
  • Measuring Architecture innovations

SLIDE 33

JVM Innovation

Best Practices

  • Examples:

    – Thread scheduling
    – Performance monitoring

  • Workload triggers differences

    – real workloads & perhaps micro-benchmarks
    – e.g. control frequency of thread switching

  • Measure and report multiple iterations (see the sketch after this list)

    – start-up, steady-state (aka server-mode)
    – do not configure the VM to not optimize!

  • Use a modest heap size, or multiple heap sizes

    – a function of the minimum in which the application will run

  • Use & report multiple architectures
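
A minimal Java sketch of the multiple-iteration practice above. runBenchmark() is a hypothetical stand-in for the workload under test; a real harness would also repeat whole JVM invocations and report means with error bars.

    // Sketch: report start-up (1st iteration) and steady-state (Nth
    // iteration) separately; mixing them hides JIT compilation effects.
    public class IterationTimer {
        static void runBenchmark() { /* hypothetical workload under test */ }

        public static void main(String[] args) {
            final int iterations = 10;
            long[] times = new long[iterations];
            for (int i = 0; i < iterations; i++) {
                long start = System.nanoTime();
                runBenchmark();
                times[i] = System.nanoTime() - start;
            }
            System.out.printf("start-up:     %.1f ms%n", times[0] / 1e6);
            System.out.printf("steady-state: %.1f ms%n", times[iterations - 1] / 1e6);
        }
    }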

SLIDE 34

JIT Innovation

Best Practices

Example: new compiler optimization

– Code quality: Does it improve the application code?
– Compile time: How much does it add to compile time?
– Total time: Compiler and application time together
– Problem: Adaptive compilation responds to compilation load and code quality
– Question: How do we tease all these effects apart?

SLIDE 35

JIT Innovation Cont.

Best Practices

Teasing apart compile time and code quality requires multiple experiments.

Total Time

– Run adaptive system as intended

  • Result: mixture of optimized and unoptimized code
  • First & Nth iterations (startup and steady-state)
  • Set and report heap size as a function of minimum for the application
  • Report: mean and statistical error

Code Quality

OK: Run iterations until performance stabilizes (sketched below), or
Better: Run several iterations, turn off the JIT, measure an iteration with no JIT activity
Best: Replay mix compilation
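
A sketch of the "run until performance stabilizes" option, again assuming a hypothetical runBenchmark() workload: iterate until the last few timings vary by less than a threshold, then report their mean as the steady-state (code quality) number.

    import java.util.ArrayDeque;

    public class SteadyState {
        static final int WINDOW = 3;          // judge stability on the last 3 iterations
        static final double THRESHOLD = 0.02; // 2% coefficient of variation
        static final int MAX_ITERATIONS = 50; // give up eventually

        static void runBenchmark() { /* hypothetical workload under test */ }

        public static void main(String[] args) {
            ArrayDeque<Double> window = new ArrayDeque<>();
            for (int i = 0; i < MAX_ITERATIONS; i++) {
                long start = System.nanoTime();
                runBenchmark();
                window.addLast((System.nanoTime() - start) / 1e6);
                if (window.size() > WINDOW) window.removeFirst();
                if (window.size() == WINDOW && cv(window) < THRESHOLD) break;
            }
            double mean = window.stream()
                .mapToDouble(Double::doubleValue).average().getAsDouble();
            System.out.printf("steady-state mean: %.1f ms%n", mean);
        }

        // coefficient of variation: standard deviation divided by mean
        static double cv(Iterable<Double> xs) {
            double n = 0, sum = 0, sumSq = 0;
            for (double x : xs) { n++; sum += x; sumSq += x * x; }
            double mean = sum / n;
            return Math.sqrt(Math.max(sumSq / n - mean * mean, 0)) / mean;
        }
    }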

Compile time

– Requires the compiler to be deterministic
– Replay mix compilation

SLIDE 36

Replay Compilation

Force Determinism From the JIT

An adaptive compilation profiler and replayer (sketched below).

Profiler

– Profile the JIT on multiple multi-iteration executions; pick best or median
– Record in the profile key optimization inputs and outcomes (e.g. dynamic edge profiles, final optimization levels)

Replayer

– Use the profile to directly compile methods to their final optimization level
– Use the profile as input to optimization (e.g. edge profiles)

Result

– Controlled, deterministic, and repeatable compiler behavior
– Removes the largest source of statistical variance
– Not perfect (e.g. inlining)
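
An abstract sketch of the replay idea, with hypothetical names (the method table and levels are illustrative); the real mechanism lives inside the VM's adaptive system (e.g. Jikes RVM), which compiles each recorded method directly to its recorded level and feeds recorded edge profiles to the optimizer.

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class ReplaySketch {
        // Profiler output: each method's final optimization level, taken
        // from the best (or median) of several profiling runs.
        static Map<String, Integer> recordedProfile() {
            Map<String, Integer> profile = new LinkedHashMap<>();
            profile.put("Benchmark.hotLoop", 2); // illustrative entries
            profile.put("Benchmark.setup", 0);
            return profile;
        }

        // Replayer: compile straight to the recorded level, bypassing the
        // adaptive system's sampling -- the main source of non-determinism.
        public static void main(String[] args) {
            recordedProfile().forEach((method, level) ->
                System.out.println("compile " + method + " at O" + level));
        }
    }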

SLIDE 37

GC Innovation

Best Practices

  • Requires more than one experiment…
  • Explore space–time trade-off

– Use & report a range of heap sizes (see the sketch after this list)
– Express heap size relative to minimum
– VMs should report total memory, not just application memory

  • GC may require substantial meta-data
  • JIT & VM use memory
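
A sketch of the space–time trade-off experiment above, assuming a hypothetical Benchmark main class and a minimum heap size measured beforehand; -Xms/-Xmx are standard JVM flags.

    import java.io.IOException;

    public class HeapSweep {
        static final int MIN_HEAP_MB = 40; // measured minimum (hypothetical)

        public static void main(String[] args)
                throws IOException, InterruptedException {
            // Run the same workload at 1x to 3x the minimum heap; pinning
            // -Xms to -Xmx makes the collector, not the heap growth policy,
            // determine the space-time trade-off.
            for (double factor = 1.0; factor <= 3.0; factor += 0.25) {
                int heap = (int) (MIN_HEAP_MB * factor);
                Process p = new ProcessBuilder("java",
                        "-Xms" + heap + "m", "-Xmx" + heap + "m", "Benchmark")
                    .inheritIO().start();
                p.waitFor();
                System.out.printf("heap %.2fx min (%d MB) done%n", factor, heap);
            }
        }
    }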

SLIDE 38

GC Innovation Cont.

Best Practices

  • Measure time for constant workload

– Throughput experiments don’t hold workload constant

  • Replay: hold compiler activity constant

– Choose best profile

  • This will minimize VM costs and highlight GC costs
  • Ideally: evaluate with adaptive system

– Overcome non-determinism with statistical brute force (see the sketch below)
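
A sketch of the statistical side, with illustrative hard-coded timings: report the mean with a 95% confidence interval so that differences between systems can be judged against run-to-run noise.

    public class ConfidenceInterval {
        public static void main(String[] args) {
            double[] ms = {812, 798, 805, 821, 809, 799, 816, 803}; // illustrative
            double n = ms.length, sum = 0, sumSq = 0;
            for (double t : ms) { sum += t; sumSq += t * t; }
            double mean = sum / n;
            double stdErr = Math.sqrt((sumSq - n * mean * mean) / (n - 1)) / Math.sqrt(n);
            // 1.96 assumes enough trials for a normal approximation;
            // use a Student's t critical value for small n.
            System.out.printf("%.1f ms +/- %.1f ms (95%% CI)%n", mean, 1.96 * stdErr);
        }
    }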

SLIDE 39

Architecture Innovation

Best Practices

  • Requires more than one experiment…
  • Use more than one VM
  • Include GC: set modest heap size or measure multiple heap sizes
  • Include a mix of optimized and unoptimized code

  • Minimize non-determinism

– Replay

  • Good, but not available in product JVMs

– Roll-forward from snapshot

  • For strictly microarchitectural change

– Statistical brute force

  • Intractable given overhead of simulation
