

SLIDE 1

PACT 2010

Machine Learning for Performance and Power Modeling/Prediction

Lizy K. John

University of Texas at Austin

SLIDE 2

Simulation Challenges

§ Simulation-based performance models
  – e.g., SimOS, SIMICS, GEM5, SimpleScalar
§ Power modeling
  – e.g., McPAT, CACTI
§ Full-system simulation is prohibitively slow
§ Simulation errors
§ Large gap between what is evaluated pre-silicon and what is run post-silicon

SLIDE 3

Three Examples for Using Machine Learning in Performance/Power Modeling and Prediction

§ Calibration of power models using machine learning
§ Cross-platform performance/power prediction using machine learning
§ Stressmarks and power viruses using machine learning

SLIDE 4

Machine Learning for Model Calibration

Lizy K. John 5/12/17

Example 1 (ISLPED 2015)

SLIDE 5

Machine Learning for Model Calibration

Example 1

SLIDE 6

Non-negative Least-Squares Solver

Correction Factors – Additive and Multiplicative

Example 1
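The calibration step above can be sketched as fitting non-negative correction factors so that modeled per-component power matches measured total power. This is a minimal illustration, not the paper's implementation: the synthetic component estimates, the true correction factors, and the noise level are all hypothetical; appending a column of ones lets the same solver recover an additive correction alongside the multiplicative ones.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical setup: each row of X holds per-component power estimates
# from an analytical model (e.g., core, cache, uncore) for one workload,
# and y holds the corresponding measured total power.
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 5.0, size=(40, 3))         # modeled component power (W)
true_factors = np.array([1.2, 0.8, 1.5])        # illustrative corrections
y = X @ true_factors + rng.normal(0, 0.05, 40)  # measured power with noise

# Append a column of ones so the solver can also learn a non-negative
# additive (constant) correction term.
X_aug = np.hstack([X, np.ones((40, 1))])

factors, residual = nnls(X_aug, y)
print("multiplicative corrections:", factors[:3])
print("additive correction:", factors[3])
```

The non-negativity constraint keeps the corrections physically interpretable: a component's contribution can be scaled up or down, but never made to subtract power.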

SLIDE 7

Training Process

Example 1

SLIDE 8

Training Process

Example 1

SLIDE 9

Machine Learning for Model Calibration

Example 1

SLIDE 10

Calibrated Power

Example 1

SLIDE 11

Machine Learning for Cross-Platform Prediction

Example 2

Motivation: Full-system simulation is too slow, and analytical models are not accurate enough. Bridge the gap between the two using machine learning.

Intuition: Performance on two platforms is correlated. Can machine learning be used to understand that correlation?

SLIDE 12

Machine Learning for Cross-Platform Prediction (DAC 2016, DATE 2017)

Example 2

Constrained LASSO Regression
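A constrained LASSO fit of this kind can be sketched as follows. This is an illustrative stand-in, not the DAC/DATE formulation: the counter features, the sparse true weights, and the choice of `positive=True` (non-negative coefficients) as the constraint are all assumptions; the L1 penalty is what drives irrelevant counters to exactly zero.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Hypothetical data: rows are benchmarks; columns are performance-counter
# features measured on a fast host platform (IPC, miss rates, MPKI, ...).
rng = np.random.default_rng(1)
X_host = rng.uniform(0, 1, size=(60, 8))

# Assume target-platform performance depends (noisily) on a sparse
# subset of the host counters.
w_true = np.array([2.0, 0.0, 0.0, 1.5, 0.0, 0.0, 0.5, 0.0])
y_target = X_host @ w_true + rng.normal(0, 0.05, 60)

# positive=True imposes a non-negativity constraint on the coefficients,
# one simple form of constrained LASSO; alpha controls sparsity.
model = Lasso(alpha=0.01, positive=True).fit(X_host, y_target)
print("coefficients:", model.coef_)
```

The fitted model can then predict target-platform performance or power for benchmarks that were only profiled on the host.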

SLIDE 13

Use Cases for Cross-Platform Prediction

Example 2

§ Slow simulator – no time to run all benchmarks, but fast previous-generation or other-ISA hardware is available: run some benchmarks and use machine learning to predict the rest
§ Limited access to new hardware – make some runs, train on them, and predict the power of other benchmarks
§ Hardware/software co-development – software developers can run code on existing hardware and predict behavior on the new design using the cross-platform model provided by the hardware developer

SLIDE 14

Learning Formulation

SLIDE 15

Training Set – ACM Programming Contest

Example 2

SLIDE 16

Profiling for Training

Example 2

SLIDE 17

Performance Prediction Accuracy

Example 2

SLIDE 18

Power Prediction Accuracy

Example 2

SLIDE 19

Prediction at Fine-grain

Example 2

SLIDE 20

Power Prediction at Fine-grain

Example 2

SLIDE 21

Average Cross-Validation Error: 10-fold cross-validation

Example 2
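The 10-fold cross-validation above works by rotating which tenth of the benchmarks is held out: train on nine folds, measure error on the tenth, and average over the ten rotations. A minimal sketch, with hypothetical counter features and an assumed LASSO predictor as in the cross-platform setup:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

# Hypothetical host-counter features and target-platform measurements;
# the actual features and targets in the study differ.
rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(100, 6))
y = X @ np.array([3.0, 0, 1.0, 0, 0, 0.5]) + rng.normal(0, 0.1, 100)

# 10-fold CV: each fold serves once as the held-out test set, so every
# benchmark is predicted by a model that never saw it during training.
scores = cross_val_score(Lasso(alpha=0.01), X, y,
                         cv=10, scoring="neg_mean_absolute_error")
print(f"average CV MAE: {-scores.mean():.3f}")
```

Averaging over folds gives a less optimistic error estimate than a single train/test split, which matters when the benchmark pool is small.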

SLIDE 22

Non-Linearity of F

Example 2

SLIDE 23

Example 2

SLIDE 24

Example 2

SLIDE 25

Challenges in Cross-Platform Prediction

Example 2

§ Host sensitivity
§ Instrumenting-source method
§ Alignment in the no-source (performance-counter-based) methodology
§ Stochastic dynamic coupling
§ Requires solving a regression during prediction

More work remains to be done, but cross-platform prediction seems feasible.

SLIDE 26

Challenges in Creating Max-power Viruses

Laboratory for Computer Architecture

§ Hand-crafting code snippets for power viruses
  – Very tedious process; complex interactions inside the processor
  – Cannot be sure the result is the maximum-power case
  – Heavily architecture dependent; requires heavy domain knowledge
§ Instead: automatically generate power viruses

Example 3

SLIDE 27

Power Measurement of Viruses on Hardware

§ BurnK7 – 72.1 Watts
§ SPEC CPU2006: 416.gamess and 453.povray consume the highest power, at 63.1 and 59.6 Watts

Example 3

SLIDE 28

Power Proxies and Viruses using Machine Learning

[Diagram: Machine Learning emits abstract workload specs → Code Generator produces Synth 1, Synth 2, … Synth n → Power/Perf Simulator returns power estimates as fitness values 1, 2, … n to the learner; the loop converges on a Power Virus]

Example 3

SLIDE 29

Proxy Workload Generation

Performance/Power Clones of Original Workloads

– Derive proxy applications from a set of workload characterizations
– Proxies convey no proprietary information, but capture the execution behavior of the developer's applications
– Proxy applications have power and performance characteristics similar to the originals

Proxies are miniature and can be run on RTL; power can be modeled on RTL without an OS and without a software stack.

Example 3

SLIDE 30

Automatic Synthetic Benchmark Generation

Example 3

SLIDE 31

Power Virus Generation using Machine Learning

[Diagram: Machine Learning emits abstract workload specs → Code Generator produces Synth 1, Synth 2, … Synth n → Power/Perf Simulator returns power estimates as fitness values 1, 2, … n to the learner; the loop converges on a Power Virus]

Example 3

SLIDE 32

SYMPO and MAMPO Frameworks

§ Automatically search for power viruses using an abstract workload model and machine learning
§ GA: a search heuristic for solving optimization problems
§ Choose a random population, evaluate fitness, apply GA operators to generate the next population
§ Evolve until the required fitness is achieved

Example 3

SLIDE 33

SYMPO Framework – Genetic Algorithm

– Individuals → synthetic workloads
– Fitness function → power on the design under study
– Parameters: mutation rate, reproduction rate, crossover rate

Operators: single-point crossover, single-point mutation

Example 3
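The GA components above (individuals, a fitness function, single-point crossover and mutation, evolve until convergence) can be sketched as a toy loop. This is an illustration only: the bit-count fitness is a stand-in for the power simulator, and the population size, mutation rate, and genome length are arbitrary; in SYMPO the genome encodes abstract workload parameters and fitness is simulated power.

```python
import random

random.seed(0)

GENOME_LEN = 16  # stand-in for the abstract workload parameter vector

def fitness(genome):
    # Toy objective: count of set bits, so the optimum is all ones.
    return sum(genome)

def single_point_crossover(a, b):
    # Splice parent a's prefix onto parent b's suffix at a random point.
    point = random.randrange(1, GENOME_LEN)
    return a[:point] + b[point:]

def single_point_mutation(genome, rate=0.05):
    # Independently flip each bit with a small probability.
    g = genome[:]
    for i in range(GENOME_LEN):
        if random.random() < rate:
            g[i] ^= 1
    return g

# Random initial population.
population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
              for _ in range(30)]

for generation in range(60):
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]          # elitist selection
    children = []
    while len(children) < 20:
        a, b = random.sample(survivors, 2)
        children.append(single_point_mutation(single_point_crossover(a, b)))
    population = survivors + children    # evolve the next population

best = max(population, key=fitness)
print("best fitness:", fitness(best))
```

Swapping the toy fitness for a call into a power/performance simulator turns the same loop into the stressmark search: the fittest surviving individual is the candidate power virus.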

SLIDE 34

SYMPO vs. Mprime on SPARC ISA

Config 1 – 14% more power
Config 2 – 24% more power
Config 3 – 41% more power

Example 3

SLIDE 35

Comparison to SPEC CPU2006 on SPARC ISA

§ SPEC CPU2006 reaches 74.4 Watts, compared to 89.8 Watts for SYMPO

Example 3

SLIDE 36

Comparison to SPEC CPU2006 on Alpha ISA

§ Comparison to SPEC CPU2006: 111.8 Watts compared to 89.2 Watts, where the theoretical maximum is 220 Watts

Example 3

SLIDE 37

Validation on x86 Hardware

§ The auto-generated stressmark (SYMPO) beats the hand-tuned BurnK7

[Chart: power (W) of SYMPO and benchmarks, axis range roughly 48–78 W]

Example 3

SLIDE 38

Summary

Machine learning techniques can be used to improve power modeling and prediction. Cross-platform prediction using machine learning can accurately track performance and power at phase level. Synthetic stressmarks created using genetic algorithms can exceed hand-crafted stressmarks.

SLIDE 39

BPOE 2014

Thank You! Questions?

Laboratory for Computer Architecture (LCA) The University of Texas at Austin lca.ece.utexas.edu

Lizy K. John