A Mixture of Experts Approach for Runtime Mapping in Dynamic - - PowerPoint PPT Presentation



SLIDE 1

A Mixture of Experts Approach for Runtime Mapping in Dynamic Environments

Murali Emani

School of Informatics, University of Edinburgh

SLIDE 2

Modern computing hardware

Diverse. Stochastic. Evolving.

SLIDE 3

Parallelism Mapping

Program → Hardware

Computation Steps

SLIDE 4

Parallelism Mapping

Hardware, Workloads, Software, Data

Program → Hardware

SLIDE 5

Parallelism Mapping

Hardware, Workloads, Software, Data

Program → Hardware

Program performance is sensitive to the environment

SLIDE 6

What exactly is the problem?

Optimal partitioning of the parallel work is not static and is non-trivial

SLIDE 7

What exactly is the problem?

Existing approaches are based on:

  • One-size-fits-all policy

SLIDE 8

What exactly is the problem?

Existing approaches are based on:

  • One-size-fits-all policy

➔ Not suitable for dynamic environments
➔ Hard to extend and update

SLIDE 9

Goals

➔ Determine optimal resources for a parallel program
   Avoid under-subscription / over-subscription
➔ Enable program auto-tuning
   Adapt smartly to varying resources
➔ Program and platform aware
   Generic and portable

SLIDE 10

Where does it fit in the stack?

Application
Runtime
Operating System
Hardware

SLIDE 11

State Space

SLIDE 12

Idea

➔ Identify the best mapping policy in each set

SLIDE 13

Idea

E1 E2 … Ek-1 Ek

➔ Identify the best mapping policy in each set

SLIDE 14

Idea

E1 E2 … Ek-1 Ek
…
E1 E2 … Ek-1 Ek

➔ Collect these policies

SLIDE 15

Idea

E1 E2 … Ek-1 Ek
…
E1 E2 … Ek-1 Ek

➔ Choose the best policy based on the current state

SLIDE 16

Idea

E1 E2 … Ek-1 Ek
…
E1 E2 … Ek-1 Ek

➔ Choose the best policy based on the current state

SLIDE 17

Mixture of Experts based Mapping

➔ Ensemble of experts (mapping policies)
➔ Smart way to select the best expert at runtime
➔ Combine offline prior models with online learning
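The ensemble idea on this slide can be sketched in code. This is a minimal illustration, not the authors' implementation: the policy functions, their names, and the scoring hook are all assumptions.

```python
# Minimal sketch (assumed, not the talk's code): an ensemble of mapping
# policies ("experts"), each proposing a thread count for the current
# environment, plus a selector hook that picks one expert at runtime.

def expert_default(env):
    # One-size-fits-all policy: always use every processor.
    return env["processors"]

def expert_loaded(env):
    # Back off when external workload threads compete for cores.
    return max(1, env["processors"] - env["workload_threads"])

def expert_half(env):
    # Conservative policy: use half the processors.
    return max(1, env["processors"] // 2)

EXPERTS = [expert_default, expert_loaded, expert_half]

def select_expert(env, score):
    # 'score' rates how well an expert suits the current state; in the
    # talk this role is played by learned models rather than a callback.
    return max(EXPERTS, key=lambda e: score(e, env))
```

Each expert is cheap to query; the hard part, as the next slides discuss, is scoring them without actually running the program under every policy.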

SLIDE 18

Mixture of Experts based Mapping

Expert 1 → # threads
Expert 2 → # threads
…
Expert k → # threads

SLIDE 19

Mixture of Experts based Mapping

How to select the best expert? Expensive to evaluate the # threads of all experts

Expert 1 → # threads
Expert 2 → # threads
…
Expert k → # threads

SLIDE 20

Mixture of Experts based Mapping

How to select the best expert? Expensive to evaluate the # threads of all experts

Expert 1 → # threads
Expert 2 → # threads
…
Expert k → # threads

Environment predictor

SLIDE 21

Mixture of Experts based Mapping

How to select the best expert? Expensive to evaluate the # threads of all experts. Environment predictor:

Expert 1 → environment, # threads
Expert 2 → environment, # threads
…
Expert k → environment, # threads

SLIDE 22

Predictive Modelling

Thread predictor: what is the best # threads?
Environment predictor: what should the environment look like?

SLIDE 23

Predictive Modelling

Thread predictor: what is the best # threads?
Environment predictor: what should the environment look like?

Input feature vector = <code, environment>, f = (c, e)
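The feature vector f = (c, e) pairs static code features with dynamic environment features. A minimal sketch, with illustrative field names borrowed from the deck's later feature list:

```python
# Sketch of building f = (c, e): static code features (c) plus dynamic
# environment features (e). The field names are assumptions for
# illustration, following the feature categories shown in the deck.

def make_feature_vector(code, env):
    c = [code["instructions"], code["branches"], code["load_store"]]
    e = [env["workload_threads"], env["processors"],
         env["run_queue_size"], env["cpu_load"]]
    return c + e  # f = (c, e), flattened for a regression model
```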

SLIDE 24

Approach – Machine Learning

SLIDE 25

Approach – Machine Learning

➔ Hand-crafted solutions infeasible

Training data → Data pre-processing → Learning algorithm → Model
New input → Model → prediction

SLIDE 26

Approach – Machine Learning

➔ Hand-crafted solutions infeasible
➔ Train offline, deploy online
➔ Supervised learning, cross-validated
➔ Trained on NAS, evaluated on additional benchmarks

Training data → Data pre-processing → Learning algorithm → Model
New input → Model → prediction

* Training overhead: one-off cost of 9216 experiments

SLIDE 27

Training phase

➔ Various configurations of program pairs and # threads
   9216 experiments; 3 weeks of runs; 1.1 GB of logs
➔ Feature-space dimensionality reduction: information gain
   10 of 154 features kept as a rich subset
➔ Linear regression models
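The training phase above reduces 154 candidate features to 10 and fits linear regression models. A hedged sketch of both steps, with a generic usefulness score standing in for the information-gain computation:

```python
# Sketch of the offline training steps (assumed structure, not the
# paper's pipeline): rank features by a score standing in for
# information gain, keep the top k, then fit a linear model by
# ordinary least squares (single-feature case shown for brevity).

def top_k_features(scores, k):
    # scores: {feature_name: information-gain-like value}
    return sorted(scores, key=scores.get, reverse=True)[:k]

def fit_linear(xs, ys):
    # Ordinary least squares for one feature: y = a*x + b.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx
```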

SLIDE 28

Features

STATIC (code): # instructions, # branches, # load/store
DYNAMIC (environment): # workload threads, # processors, run queue size, CPU load, page free list rate, cached memory

SLIDE 29

How to select the best expert

Online Expert Selector → Select expert 'k'

SLIDE 30

How to select the best expert

Online Expert Selector → Select expert 'k'

Use the 'Environment predictor' as a proxy to select the best mapping policy
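The proxy selection described here can be sketched as follows; the distance metric and the data layout are assumptions for illustration, not the actual selector:

```python
# Hedged sketch of the online expert selector: pick the expert whose
# predicted environment is closest to the observed one, instead of
# paying to run the program with every expert's thread count.

def distance(pred_env, obs_env):
    # Squared error between predicted and observed environment vectors.
    return sum((p - o) ** 2 for p, o in zip(pred_env, obs_env))

def select_best_expert(experts, obs_env):
    # experts: list of (name, predicted_environment) pairs.
    return min(experts, key=lambda ex: distance(ex[1], obs_env))[0]
```

Only the chosen expert's thread predictor is then consulted, which keeps the per-decision cost independent of the number of experts' thread counts actually tried.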

SLIDE 31

All put together...

SLIDE 32

How many experts?

SLIDE 33

How many experts?

  • Open question
SLIDE 34

Started with 4 experts

SLIDE 35

Evaluation

Workloads: small (light), large (heavy)
Hardware: low, high frequency
Platform: 32-core Intel Xeon
Benchmarks: NAS, SpecOMP, Parsec (OpenMP)
Comparison: OpenMP default, Online, Offline, Analytic

Online: "Parcae: A System for Flexible Parallel Execution", A. Raman, A. Zaks, J. W. Lee, and D. I. August, PLDI '12.
Offline: "Smart, Adaptive Mapping of Parallelism in the Presence of External Workload", Murali Krishna Emani, Zheng Wang, and Michael O'Boyle, CGO '13.
Analytic: "Adaptive, Efficient, Parallel Execution of Parallel Programs", S. Sridharan, G. Gupta, and G. S. Sohi, PLDI '14.

SLIDE 36

Results

1.17x over Analytic; 1.26x over Offline; 1.38x over Online

SLIDE 37

Why multiple experts? Why not a single model?

E1 E2 … Ek-1 Ek vs. a single model M

SLIDE 38

Why multiple experts? Why not a single model?

  • Multiple experts outperform a single model

SLIDE 39

Can this approach be used with other optimization techniques?

SLIDE 40

Can this approach be used with other optimization techniques?

Affinity-based scheduling

SLIDE 41

To sum up...

Developed an approach for smart parallelism mapping

➔ Adaptive to dynamic environments
➔ Predictive modelling at its heart
➔ Environment predictor as a proxy to select the best mapping policy

SLIDE 42

What next?

➔ Integrating this concept in CnC
➔ Focus on the tuning component
➔ Runtime and application tuning
➔ Dynamic partitioning of resources to steps

SLIDE 43

Idea

Instances of computations (steps)

Step 1, Step 2, Step 3, Step 4

➔ Varying resource requirements for steps
➔ Mapping depends on when data is ready

SLIDE 44

Take away

➔ One-size-fits-none
➔ A bag of multiple policies is more practical than one
➔ Machine learning can help!

Thank you

Murali Emani, University of Edinburgh
m.k.emani@sms.ed.ac.uk

SLIDE 45

Backup

SLIDE 46

Adaptive Parallelism Mapping

➔ Program performance is sensitive to the environment

Target:
  • Hardware: large number of components; increased chances of failure
  • Inherent behavior: various characteristics; compute/memory/disk bound
  • Software: recurring upgrades; version compatibility
  • Data: varying amounts of I/O; scalability issues
SLIDE 47

SLIDE 48

All experts use the same features; the features vary in importance across experts.

SLIDE 49

SLIDE 50

Evaluation

Workloads: small (light), large (heavy)
Hardware: low, high frequency
Platform: 32-core Intel Xeon; 4 one-socket nodes, 8 cores/socket; Linux 3.7.10 kernel
Compiler: gcc 4.6 "-O3 -fopenmp"
Benchmarks: NAS, SpecOMP, Parsec (OpenMP)
Comparison: OpenMP default, Online, Offline, Analytic

Online: "Parcae: A System for Flexible Parallel Execution", A. Raman, A. Zaks, J. W. Lee, and D. I. August, PLDI '12.
Offline: "Smart, Adaptive Mapping of Parallelism in the Presence of External Workload", Murali Krishna Emani, Zheng Wang, and Michael O'Boyle, CGO '13.
Analytic: "Adaptive, Efficient, Parallel Execution of Parallel Programs", S. Sridharan, G. Gupta, and G. S. Sohi, PLDI '14.

SLIDE 51

Graceful addition of experts

What is the effect of increasing # experts? What about # experts > 4? Needs more analysis.