Benchmark Design for Robust Profile-Directed Optimization

  1. Benchmark Design for Robust Profile-Directed Optimization. SPEC Workshop 2007, January 21, 2007. Paul Berube and José Nelson Amaral, University of Alberta. Supported by NSERC, Alberta Ingenuity, and iCore.

  2. In this talk • SPEC: SPEC CPU • PDF: offline, profile-guided optimization • Test: evaluate • Data/Inputs: program input data

  3. PDF in Research • SPEC benchmarks and inputs are used, but the rules are seldom followed exactly – PDF will continue regardless of its admissibility in reported results • Some degree of profiling is taken as a given in much recent compiler and architecture work

  4. An Opportunity to Improve • No PDF for base in CPU2006 – an opportunity to step back and reconsider • The current evaluation methodology for PDF is not rigorous – it is dictated by the inputs and rules provided with SPEC CPU – and it is usually followed when reporting PDF research

  5. Current Methodology – static optimization: flag tuning drives the optimizing compiler; the resulting binary is tested on input.ref, producing peak_static.

  6. Current Methodology – PDF optimization: the instrumenting compiler builds a binary that is trained on input.train to produce a profile; flag tuning and the profile drive the optimizing compiler; the resulting binary is tested on input.ref, producing peak_pdf.

  7. Current Methodology – PDF optimization: same flow as the previous slide, with the selection rule if (peak_pdf > peak_static) peak := peak_pdf;

  8. Current Methodology – PDF optimization: same flow, with the complete selection rule if (peak_pdf > peak_static) peak := peak_pdf; else peak := peak_static;
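
  A minimal runnable sketch of this selection rule, assuming the two scores have already been measured (the numbers in the example call are placeholders, not SPEC results):

      def reported_peak(peak_static, peak_pdf):
          # Report whichever binary scored higher on the single ref input.
          return peak_pdf if peak_pdf > peak_static else peak_static

      print(reported_peak(peak_static=30.1, peak_pdf=31.4))  # -> 31.4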

  9. Current Methodology – PDF optimization (same flow and selection rule as the previous slides): is this comparison sound? Does 1 training input and 1 test input predict PDF performance, whether comparing peak_pdf > peak_static or peak_pdf > other_pdf?

  10. Current Methodology – PDF optimization (same flow): variance between inputs can be larger than reported improvements! Does 1 training input and 1 test input predict PDF performance (peak_pdf > peak_static, peak_pdf > other_pdf)?

  11. [Chart: bzip2 trained on xml, speedup vs. static, across test inputs (combined, compressed, docs, gap, graphic, jpeg, log, mp3, mpeg, pdf, program, random, reuters, source, xml); results vary widely across inputs, from negative values to more than 14%.]

  12. PDF is like Machine Learning • Complex parameter space • Limited observed data (training) • Adjust parameters to match observed data – maximize expected performance

  13. Evaluation of Learning Systems • Must take sensitivity to training and evaluation inputs into account – PDF specializes code according to training data – Changing inputs can greatly alter performance • Performance results must have statistical significance measures – Differentiate between gains/losses and noise

  14. Overfitting • Specializing for the training data too closely • Exploiting particular properties of the training data that do not generalize • Causes: – insufficient quantity of training data – insufficient variation among training data – deficient learning system

  15. Overfitting • Currently: ✗ the compiler is engineered not to overfit the single training input (underfitting) ✗ no clear rules for input selection ✗ some benchmark authors replicate data between train and ref • Overfitting can be rewarded!

  16. Criteria for Evaluation • Predict expected future performance • Measure performance variance • Do not reward overfitting • Same evaluation criteria as ML – Cross-validation addresses these criteria

  17. Cross-Validation • Split a collection of inputs into two or more non-overlapping sets • Train on one set, test on the other set(s) • Repeat, using a different set for training
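
  A minimal sketch of this rotation in Python, using the 9-input example workload and the 3-way split that appear later in the talk; the function and variable names are illustrative, not part of any SPEC tooling:

      def cross_validation_rounds(sets):
          """Yield (train, test) pairs: train on one set, test on the union of the others."""
          for i, train in enumerate(sets):
              test = [x for j, other in enumerate(sets) if j != i for x in other]
              yield train, test

      example_sets = [["jpeg", "xml", "pdf"],      # set A
                      ["mpeg", "html", "source"],  # set B
                      ["text", "doc", "program"]]  # set C

      for train, test in cross_validation_rounds(example_sets):
          print("train:", train, "| test:", test)

  No input ever appears on both the training side and the test side of the same round.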

  18. Leave-one-out Cross-Validation • If little data, reduce the test set to 1 input – Leave N out: only N inputs in the test set
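
  Leave-N-out splits can be generated the same way; a small sketch (the input names come from the example workload, the helper name is illustrative):

      from itertools import combinations

      def leave_n_out(inputs, n=1):
          """Yield (train, test) pairs where each test set holds exactly n inputs."""
          for test in combinations(inputs, n):
              yield [x for x in inputs if x not in test], list(test)

      for train, test in leave_n_out(["jpeg", "xml", "pdf", "mpeg"], n=1):
          print("train:", train, "| test:", test)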

  19. Cross-Validation • The same data is NEVER in both the training and the testing set – Overfitting will not enhance performance • Multiple evaluations allow statistical measures to be calculated on the results – Standard deviation, confidence intervals... • A set of training inputs allows the system to exploit commonalities between inputs

  20. Proposed Methodology • PDFPeak score, distinct from peak – Report with standard deviation • Provide a PDF workload – Inputs used for both training and evaluation, so “medium” sized (~2 min running time) – 9 inputs needed for meaningful statistical measures

  21. Proposed Methodology • Split inputs into 3 sets (at design time) • For each input in each evaluation, calculate speedup compared to (non-PDF) peak • Calculate (over all evaluations) – mean speedup – standard deviation of speedups
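
  A sketch of this scoring step, assuming each evaluation yields per-input run times for the PDF build and for the (non-PDF) peak build; the exact speedup definition and the timing numbers below are illustrative assumptions:

      from statistics import mean, stdev

      def speedup_percent(peak_time, pdf_time):
          # Positive when the PDF-trained binary runs faster than the non-PDF peak binary.
          return (peak_time / pdf_time - 1.0) * 100.0

      # One entry per (evaluation, test input): (non-PDF peak time, PDF time), in seconds.
      measurements = [(120.0, 118.1), (95.0, 96.2), (130.0, 126.0)]  # placeholder timings

      speedups = [speedup_percent(peak, pdf) for peak, pdf in measurements]
      print(f"mean speedup: {mean(speedups):.2f}%  std dev: {stdev(speedups):.2f}%")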

  22. Example PDF Workload (9 inputs): jpeg, mpeg, xml, html, text, doc, pdf, source, program

  23. Example – Split workload: the 9-input PDF workload (jpeg, mpeg, xml, html, text, doc, pdf, source, program) is split into three sets: A = {jpeg, xml, pdf}, B = {mpeg, html, source}, C = {text, doc, program}

  24. Example – Train and Run: the instrumenting compiler builds a binary, which is trained on set A

  25. Example – Train and Run: training on set A produces Profile(A), which is fed to the PDF optimizing compiler

  26. Example – Train and Run: the binary built with Profile(A) is tested on B+C – mpeg 1%, html 5%, text 4%, doc -3%, source 4%, program 2%

  27. Example – Train and Run: training on B produces Profile(B); the resulting binary is tested on A+C – jpeg 4%, xml -1%, text 5%, doc 1%, pdf 4%, program 1% (the slide also repeats the results from the train-on-A run)

  28. Example – Train and Run: training on C produces Profile(C); the resulting binary is tested on A+B – jpeg 5%, xml 2%, mpeg -1%, html 3%, pdf 3%, source 3% (the slide also repeats the results from the two earlier runs)

  29. Example – Evaluate: all 18 results, two per input – doc 1%, -3%; html 3%, 5%; jpeg 5%, 4%; mpeg -1%, 1%; pdf 3%, 4%; program 1%, 2%; source 3%, 4%; text 5%, 4%; xml -1%, 2%. Average: 2.33%

  30. Example – Evaluate: same 18 results. Average: 2.33%, Std. Dev.: 2.30%

  31. Example – Evaluate: same 18 results, Average: 2.33%, Std. Dev.: 2.30%. PDF improves performance by 2.33 ± 2.30%, 17 times out of 25, and by 2.33 ± 4.60%, 19 times out of 20
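
  The summary statistics on these slides can be reproduced from the 18 speedups listed above; the "17 times out of 25" and "19 times out of 20" phrasing matches the roughly 68% and 95% coverage of ±1 and ±2 standard deviations if the speedups are assumed to be approximately normally distributed (an assumption made here, not stated on the slide):

      from statistics import mean, stdev

      # Two results per input: doc, html, jpeg, mpeg, pdf, program, source, text, xml.
      speedups = [-3, 1,  3, 5,  5, 4,  -1, 1,  3, 4,  1, 2,  3, 4,  5, 4,  -1, 2]

      m, s = mean(speedups), stdev(speedups)
      print(f"average: {m:.2f}%  std dev: {s:.2f}%")            # average: 2.33%  std dev: 2.30%
      print(f"{m:.2f} ± {s:.2f}%   {m:.2f} ± {2 * s:.2f}%")     # 2.33 ± 2.30%   2.33 ± 4.60%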

  32. Example – Evaluate: PDF improves performance by 2.33 ± 2.30%, 17 times out of 25, and by 2.33 ± 4.60%, 19 times out of 20. Is peak_pdf > peak_static? Is new_pdf > other_pdf? Both comparisons depend on the mean and variance of both sides!
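
  The slides do not prescribe a specific statistical test; one common way to compare two configurations using the mean and variance of both is Welch's t statistic, sketched here (the first list is the example's speedups, the second is a hypothetical rival made up only to show the call):

      from math import sqrt
      from statistics import mean, variance

      def welch_t(a, b):
          """t statistic for the difference in means of two independent samples."""
          return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

      pdf_speedups   = [-3, 1, 3, 5, 5, 4, -1, 1, 3, 4, 1, 2, 3, 4, 5, 4, -1, 2]   # from the example
      other_speedups = [-2, 0, 2, 3, 4, 2, -2, 1, 2, 3, 0, 1, 2, 3, 4, 3, -2, 1]   # hypothetical

      print(f"t = {welch_t(pdf_speedups, other_speedups):.2f}")
      # For samples of this size, |t| of roughly 2 or more suggests a difference beyond noise.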

  33. Pieces of Effective Evaluation • Workload of inputs • Education about input selection – Rules and guidelines for authors • Adoption of a new methodology for PDF evaluation

  34. Practical Concerns • Benchmark user – Many additional runs, but on smaller inputs – Two additional program compilations • Benchmark author – Most INT benchmarks use multiple data sets, and/or additional data is easily available – The PDF input set could be used for REF

  35. Conclusion • PDF is here: important for compilers and architecture, in research and in practice • The current methodology for PDF evaluation is not reliable • Proposed a methodology for meaningful evaluation

  36. Thanks. Questions?
