Workload Prediction for Adaptive Power Scaling Using Deep Learning

Steve Tarsa, Amit Kumar, & HT Kung
Harvard, Intel Labs MRL
May 29, 2014
ICICDT ’14

In these slides…
Machine learning (ML) is applied to performance counters in order to model workloads and predictively optimize frequency/voltage
• Deep machine learning (ML) methods are popular due to successes in computer vision, natural language processing, etc.
• We demonstrate that ML improves statistical accuracy over techniques like regression in complicated scenarios, for which accurate models are elusive
• At the architecture level, we use ML to capture hidden structure in counter data that corresponds to cross-layer user/OS/chip interactions
Hierarchical sparse coding improves accuracy and look-ahead range for predicting instruction throughput dips, giving more time for chip adjustment
• Multi-layer (i.e. “deep”) ML models first extract canonical features, and then their interrelationships to find high-dimensional patterns over time on little training data
• Our methods rely on pattern matching, and can be implemented in circuitry with simple low-precision inner product computations
• We demonstrate 3x improvement in look-ahead range and a 50% power reduction during throughput dips for web surfing on an ARMv7a/Android Gingerbread device

User-driven workloads, e.g. web surfing, have many opportunities for dynamic power optimization using DVFS, when instruction throughput drops temporarily
[Figure: Instruction Throughput - Android Web Surfing. BBENCH on gem5, single-core ARMv7a.]
Sub-25% instruction throughput characterizes 20% of runtime

But, anticipating dips in CPU activity requires modeling complicated interactions between users, OS/apps, and chip architecture
• User & Workload: e.g. browsing habits, multi-tasking habits
• OS & Application: e.g. process management, web page caching policy
• Chip Architecture: e.g. data or instruction cache configuration
Instead of modeling by hand, machine learning extracts “hidden” structure from raw data, yielding statistical models that predict better and need less training data than standard regression methods

From hardware-counter time series data, we extract common patterns using a clustering algorithm; clusters become atoms in a feature dictionary

Counter Name                  Description
cpu.committedInsts            # committed instructions
cpu.num_fp_register_reads     # times fp registers read
cpu.dtb.read_accesses         DTB read accesses
cpu.dtb.read_hits             DTB read hits
cpu.dtb.read_misses           DTB read misses
cpu.dtb.flush_entries         # entries flushed from DTB
…

[Figure: a data vector and its sparse code, expressed over dictionary atoms d1, d2, d3.]
Expressing raw data in terms of a few prominent features removes noise, and generalizes a few training examples for good statistical accuracy under variation
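
The dictionary idea above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the slide names "a clustering algorithm" without saying which, so k-means is swapped in here; the three-element counter vectors are made-up normalized values; and the function names (`learn_dictionary`, `sparse_code`) are hypothetical. The sparse code keeps only the inner products with the best-matching atoms and zeroes the rest.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    # Scale a vector to unit length so inner products are comparable across atoms.
    norm = dot(v, v) ** 0.5
    return [x / norm for x in v] if norm > 0 else list(v)

def learn_dictionary(vectors, k, iters=20, seed=0):
    # Assumption: plain k-means stands in for the unnamed clustering algorithm.
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(
                range(k),
                key=lambda j: sum((x - c) ** 2 for x, c in zip(v, centers[j])),
            )
            groups[nearest].append(v)
        for j, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    # Cluster centers, normalized, become the dictionary atoms.
    return [unit(c) for c in centers]

def sparse_code(x, atoms, nonzeros=1):
    # Keep the inner products with the `nonzeros` best-matching atoms, zero the rest.
    scores = [dot(x, atom) for atom in atoms]
    ranked = sorted(range(len(atoms)), key=lambda j: abs(scores[j]), reverse=True)
    keep = set(ranked[:nonzeros])
    return [s if j in keep else 0.0 for j, s in enumerate(scores)]

# Made-up counter vectors for two regimes (values are illustrative only).
busy = [[1.0 + 0.01 * i, 0.9, 0.1] for i in range(5)]
idle = [[0.1, 0.05 + 0.01 * i, 0.8] for i in range(5)]
atoms = learn_dictionary(busy + idle, k=2)
code = sparse_code(busy[0], atoms)
```

With `nonzeros=1` each measurement is summarized by a single atom and its match strength, which is the sense in which a few prominent features stand in for the raw counters.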

Deep architectures use multiple layers to first find simple features within short windows, and then find feature interrelationships over larger time scales
[Figure: two-layer pipeline, bottom to top. Measurement vectors x_{t-1} and x_t are each sparse-coded against the Layer 1 feature dictionary. The concatenated sparse feature vector z_{t,1} is sparse-coded against the Layer 2 feature interrelationship dictionary, giving the feature interrelationship vector z_{t,2}. An SVM predictor maps z_{t,2} to the event signal t = {0,1}.]
Our prediction method, hierarchical sparse coding + linear SVM classification, relies on pattern matching, and can be built into circuitry with low-precision inner-product computations
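
A minimal sketch of the two-layer pipeline follows, assuming everything numeric: the dictionaries, the weight vector `w`, and the bias `b` are hand-picked toy values rather than learned ones, and `encode` and `predict_dip` are hypothetical names. The final step is the kind of linear decision function a trained linear SVM produces; no SVM training is shown. Every stage reduces to inner products, which is the property the slide points to for a circuit implementation.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def encode(x, dictionary, nonzeros=1):
    # Sparse code: keep only the strongest inner products with the atoms.
    scores = [dot(x, atom) for atom in dictionary]
    ranked = sorted(range(len(scores)), key=lambda j: abs(scores[j]), reverse=True)
    keep = set(ranked[:nonzeros])
    return [s if j in keep else 0.0 for j, s in enumerate(scores)]

def predict_dip(x_prev, x_now, dict1, dict2, w, b):
    # Layer 1: code each measurement window, then concatenate the codes.
    z1 = encode(x_prev, dict1) + encode(x_now, dict1)
    # Layer 2: code the concatenation to capture how features follow each other.
    z2 = encode(z1, dict2)
    # Linear decision function: 1 means "dip expected", 0 means "no dip".
    return 1 if dot(w, z2) + b > 0 else 0

r = 0.5 ** 0.5
# Toy layer-1 atoms over two made-up counters: "busy" and "idle".
dict1 = [[1.0, 0.0], [0.0, 1.0]]
# Toy layer-2 atoms over concatenated codes: busy->idle, busy->busy, idle->idle.
dict2 = [[r, 0.0, 0.0, r], [r, 0.0, r, 0.0], [0.0, r, 0.0, r]]
# Hypothetical weights: transitions into idle vote for a dip.
w, b = [1.0, -1.0, 1.0], 0.0

falling = predict_dip([0.9, 0.1], [0.2, 0.8], dict1, dict2, w, b)  # busy then idle
steady = predict_dip([0.9, 0.1], [0.8, 0.1], dict1, dict2, w, b)   # busy then busy
```

In this toy setup `falling` is 1 and `steady` is 0, because the second layer matches the busy-to-idle and busy-to-busy atoms respectively.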

Compared to predictions based on regression modeling or heuristics, learned feature-space signatures yield useful predictions with 3x longer look-ahead, giving more time for chip adjustment
[Figure: Prediction accuracy vs. look-ahead (500 us windows), with the highest prediction accuracy and the longest range marked.]
Signatures captured over the longest time scales give stable long-term predictions, with up to 8 ms heads-up.
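
To make "look-ahead" concrete, the sketch below pairs the features observed in one 500 us window with a dip label taken a fixed number of windows later. It is an assumed setup, not the authors' training procedure: the helper name `lookahead_pairs` is hypothetical, and the dip threshold of 0.25 (as a fraction of peak throughput) is borrowed from the "sub-25%" figure on the earlier slide. At 500 us per window, the 8 ms heads-up mentioned above corresponds to 16 windows.

```python
WINDOW_US = 500  # window length from the slide

def lookahead_pairs(throughput, features, lookahead_us, dip_threshold=0.25):
    # Pair features at window t with the dip label `steps` windows later.
    steps = lookahead_us // WINDOW_US
    pairs = []
    for t in range(len(throughput) - steps):
        label = 1 if throughput[t + steps] < dip_threshold else 0
        pairs.append((features[t], label))
    return pairs

# Made-up normalized throughput per window; features are placeholders here.
throughput = [1.0, 0.9, 0.2, 0.1, 0.8]
features = ["f0", "f1", "f2", "f3", "f4"]
pairs = lookahead_pairs(throughput, features, lookahead_us=1000)
```

A longer look-ahead shifts the labels further from the features, so fewer training pairs remain and the predictor must rely on patterns that persist over longer time scales.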

Absent a system model, regression extrapolates observed data to predict future states based on the assumption that counter values change smoothly over time
[Figure: Counter values vs. time, showing past states, the regression fit, and the predicted trend extended into future states.]
This assumption only holds over small time scales and at high sampling rates, meaning that regression-extrapolated predictions are only useful for short ranges
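
The regression baseline can be illustrated with an ordinary least-squares line fitted to recent counter values and extended forward. This is a generic sketch with hypothetical function names (`linear_fit`, `extrapolate`); the slides do not say which regression model was used as the comparison.

```python
def linear_fit(values):
    # Ordinary least-squares line through (0, v0), (1, v1), ...; needs >= 2 points.
    n = len(values)
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in range(n))
    cov = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    slope = cov / var_t
    return slope, mean_v - slope * mean_t

def extrapolate(values, steps_ahead):
    # Extend the fitted trend `steps_ahead` samples past the last observation.
    slope, intercept = linear_fit(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)

ramp = extrapolate([1.0, 2.0, 3.0, 4.0], steps_ahead=2)  # smooth trend continues
flat = extrapolate([1.0, 1.0, 1.0, 1.0], steps_ahead=3)  # predicts "no change"
```

A flat history extrapolates to a flat future, so an abrupt throughput dip a few samples later is missed entirely. This is the smoothness assumption the slide describes breaking down at longer ranges.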

Power savings are subject to a predictor’s false alarms, so we model P_dyn relative to baseline power (i.e. gating efficiency) and the cost of false positive recovery
[Figure, two panels: "Baseline Power Consumption as Gating Efficiency Increases" and "Power Consumption with DVFS, as False Positive Recovery Cost Decreases".]
For a 0.33 gating-efficient design, with a recovery cost of +0.25 additional switching activity, predictive DVFS reduces P_dyn by 50% with 1 ms heads-up for chip adjustment
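
The trade-off between savings and false alarms can be illustrated with a simple weighted average. This is an assumed toy model, not the one behind the slide's figures: it reads "0.33 gating-efficient" as power falling to 0.33 of baseline in a correctly predicted dip window, charges each false alarm 1 + 0.25 of baseline, and leaves all other windows at baseline. The window counts are made up, the function name is hypothetical, and the result is not meant to reproduce the 50% figure.

```python
def relative_dynamic_power(true_pos, false_pos, other, gated_power, recovery_cost):
    # Average power per window, relative to a baseline of 1.0 per window.
    total = true_pos + false_pos + other
    scaled = (
        true_pos * gated_power               # correctly predicted dips run gated
        + false_pos * (1.0 + recovery_cost)  # false alarms pay a recovery penalty
        + other * 1.0                        # everything else runs at baseline
    )
    return scaled / total

# Made-up window counts; 0.33 and 0.25 are the values quoted on the slide.
p_rel = relative_dynamic_power(
    true_pos=60, false_pos=20, other=20, gated_power=0.33, recovery_cost=0.25
)
```

Under these assumptions the savings shrink as false alarms grow, since each one costs more than an unscaled window would have.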

Summary and next steps…
Online deep learning holds promise for chip optimizations, though implementation will come in parts…
• Offline learning may yield good static rules that capture much of the low-hanging fruit
• Architectures for low-power dictionary learning are being explored
• “Small data” deep learning must be better explored, to optimize accuracy under time-biased training data
Instruction throughput prediction for DVFS is a first-step application, and we will explore others that may lead to larger gains
• Past successes: wireless link prediction
• Past failures: branch prediction, cache prefetching (scenarios are easy enough that standard tools perform just as well as ML!)
• Others…?
