

slide-1
SLIDE 1

Cuttlefish: Lightweight Primitives for Online Tuning

by Tomer Kaftan (UW), Magdalena Balazinska (UW), Alvin Cheung (UW), Johannes Gehrke (Microsoft)

1

slide-2
SLIDE 2

Data processing workloads today are complicated.

2

slide-3
SLIDE 3

Motivating Workload

3

slide-4
SLIDE 4

Motivating Workload

“A Cuttlefish pretending to be a rock”

3

*Image Sourced from https://www.flickr.com/photos/silkebaron/32001215104
slide-5
SLIDE 5

Motivating Workload

“A Cuttlefish pretending to be a rock”

3

Generate Training Data from:

etc.

*Image Sourced from https://www.flickr.com/photos/silkebaron/32001215104
slide-6
SLIDE 6

Motivating Workload

4

[Diagram: logical plan for the workload. HTML data is parsed with a Regex operator and joined with Images; Filter and Generate Training Labels steps produce labeled images, which are repeatedly fed through a caption-generating model built from Conv (CNN) and RNN operators to produce the output model.]

*caption-generating model portion of the logical plan inspired by: Xu et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. ICML 2015

slide-7
SLIDE 7

Motivating Workload

4


Diverse, sophisticated operators, with multiple implementations!

slide-8
SLIDE 8

Example Operator: Convolution

5

slide-9
SLIDE 9

Example Operator: Convolution

5

Tested 3 convolution algorithms on 8000 Flickr images

Relative throughput normalized against the highest-throughput algorithm

slide-10
SLIDE 10

6

Traditionally: Use a Query Optimizer

slide-11
SLIDE 11

6

Traditionally: Use a Query Optimizer

(Collect Dataset Statistics, Apply Heuristics & Cost Models)

slide-12
SLIDE 12

These work great, BUT…

7

slide-16
SLIDE 16

These work great, BUT…

  • Designing good query optimizers takes time!
  • Requires deep knowledge of the operators and significant development effort.
  • Spark SQL took 2 years to go from heuristics-based optimization to cost-based optimization! [1]
  • Modern data processing applications involve more than just relational operators!

7

[1] http://databricks.com/blog/2017/08/31/cost-based-optimizer-in-apache-spark-2-2.html

slide-17
SLIDE 17

Can we optimize without a full-fledged optimizer?

8

slide-18
SLIDE 18

The workload developer (or the query optimizer) inserts calls to Cuttlefish's API, adding tuners that select among implementations during execution.

Cuttlefish: A Lightweight Primitive for Online Tuning

9


slide-20
SLIDE 20

Cuttlefish: A Lightweight Primitive for Online Tuning

10

The workload developer (or the query optimizer) inserts calls to Cuttlefish's API, adding tuners that select among implementations during execution.

[Diagram: tuner lifecycle (Choose → Execute → Observe). Tuners are inserted into the logical plan: the Conv operator's tuner chooses among Nested Loop, Matrix Multiply, and FFT; the Join tuner chooses between Sort and Hash; the Regex tuner chooses among Libs 1-4.]

slide-25
SLIDE 25

Cuttlefish: A Lightweight Primitive for Online Tuning

10

The user maps tuning rounds to the execution model of each operator:

  • Regex: One round per HTML Doc
  • Convolve: One round per image
  • Parallel Distributed Join: One round per partition
slide-26
SLIDE 26

Cuttlefish

11

I. Problem & Motivation
II. The Cuttlefish API
III. Bandit-based Online Tuning
IV. Distributed Tuning Approach
V. Contextual Tuning
VI. Handling Nonstationary Settings
VII. Other Operators
VIII. Conclusion

slide-27
SLIDE 27

The Cuttlefish Primitive

12

slide-28
SLIDE 28
  • 1. Construct a tuner (from a set of choices)

The Cuttlefish Primitive

12

slide-29
SLIDE 29
  • 1. Construct a tuner (from a set of choices)
  • 2. Tuner.choose (pick one of the choices)

The Cuttlefish Primitive

12

slide-30
SLIDE 30
  • 1. Construct a tuner (from a set of choices)
  • 2. Tuner.choose (pick one of the choices)
  • 3. Tuner.observe (observe a reward for a choice)

The Cuttlefish Primitive

12

slide-31
SLIDE 31
  • 1. Construct a tuner (from a set of choices)
  • 2. Tuner.choose (pick one of the choices)
  • 3. Tuner.observe (observe a reward for a choice)

Cuttlefish tuners maximize the total reward after multiple choose-observe tuning rounds

The Cuttlefish Primitive

12

slide-32
SLIDE 32

Tuning Convolution with Cuttlefish

13

convolve, token = tuner.choose()
tuner.observe(token, reward)

slide-34
SLIDE 34

Tuning Convolution with Cuttlefish

def loopConvolve(image, filters): …
def fftConvolve(image, filters): …
def mmConvolve(image, filters): …

tuner = Tuner([loopConvolve, fftConvolve, mmConvolve])

for image, filters in convolutions:
    convolve, token = tuner.choose()
    start = now()
    result = convolve(image, filters)
    elapsedTime = now() - start
    reward = computeReward(elapsedTime)
    tuner.observe(token, reward)
    output result

13

slide-41
SLIDE 41

Cuttlefish

14

I. Problem & Motivation
II. The Cuttlefish API
III. Bandit-based Online Tuning
IV. Distributed Tuning Approach
V. Contextual Tuning
VI. Handling Nonstationary Settings
VII. Other Operators
VIII. Conclusion

slide-42
SLIDE 42

Approach: Tuning

15

slide-43
SLIDE 43

Multi-armed Bandit Problem

Approach: Tuning

15

slide-44
SLIDE 44
  • K possible choices (called arms)

Multi-armed Bandit Problem

Approach: Tuning

15

slide-45
SLIDE 45
  • K possible choices (called arms)
  • Arms have unknown reward distributions

Multi-armed Bandit Problem

Approach: Tuning

15

slide-46
SLIDE 46
  • K possible choices (called arms)
  • Arms have unknown reward distributions
  • At each round: select an Arm and observe a reward

Multi-armed Bandit Problem

Approach: Tuning

15

slide-47
SLIDE 47
  • K possible choices (called arms)
  • Arms have unknown reward distributions
  • At each round: select an Arm and observe a reward

Multi-armed Bandit Problem

Goal: Maximize Cumulative Reward

(by balancing exploration & exploitation)

Approach: Tuning

15

slide-48
SLIDE 48

Thompson Sampling

16

slide-49
SLIDE 49

Thompson Sampling

16

[Figure: belief distributions about the expected reward of Arms 1-4]

slide-54
SLIDE 54

Thompson Sampling

20

[Figure: samples drawn from each arm's belief distribution; better arms are chosen more often]

slide-55
SLIDE 55

Thompson Sampling

21

slide-56
SLIDE 56

Thompson Sampling

  • Gaussian runtimes with initially unknown means and variances

21

slide-57
SLIDE 57

Thompson Sampling

  • Gaussian runtimes with initially unknown means and variances
  • Belief distributions form t-distributions
  • Depend only on sample mean, variance, count

21

slide-58
SLIDE 58

Thompson Sampling

  • Gaussian runtimes with initially unknown means and variances
  • Belief distributions form t-distributions
  • Depend only on sample mean, variance, count
  • No meta-parameters, yet works well for diverse operators

21

slide-59
SLIDE 59

Thompson Sampling

  • Gaussian runtimes with initially unknown means and variances
  • Belief distributions form t-distributions
  • Depend only on sample mean, variance, count
  • No meta-parameters, yet works well for diverse operators
  • Constant memory overhead, 0.03 ms per tuning round

21
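A minimal Python sketch of this kind of per-arm Thompson sampling, assuming (as above) Gaussian rewards with unknown mean and variance, so each arm's belief about its expected reward is a shifted, scaled Student-t distribution over its running count, mean, and variance. The class and method names are illustrative, not Cuttlefish's actual code.

import math, random
import numpy as np

class GaussianThompsonTuner:
    def __init__(self, num_arms):
        self.n  = [0] * num_arms      # observations per arm
        self.mu = [0.0] * num_arms    # running sample mean of rewards
        self.m2 = [0.0] * num_arms    # running sum of squared deviations (Welford)

    def choose(self):
        # Try every arm a couple of times before trusting its belief distribution.
        unexplored = [i for i, c in enumerate(self.n) if c < 2]
        if unexplored:
            return random.choice(unexplored)
        draws = []
        for i in range(len(self.n)):
            var = self.m2[i] / (self.n[i] - 1)
            # Sample the arm's expected reward from its t-distributed belief.
            draws.append(self.mu[i] + np.random.standard_t(self.n[i] - 1) * math.sqrt(var / self.n[i]))
        return max(range(len(draws)), key=draws.__getitem__)

    def observe(self, arm, reward):
        self.n[arm] += 1
        delta = reward - self.mu[arm]
        self.mu[arm] += delta / self.n[arm]
        self.m2[arm] += delta * (reward - self.mu[arm])

Arms that keep returning higher rewards develop tighter beliefs around those rewards and are sampled (and therefore chosen) more often, which is the behaviour illustrated in the figures above.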

slide-60
SLIDE 60

Convolution Evaluation

22

slide-65
SLIDE 65

Convolution Evaluation

  • Prototype in Apache Spark
  • Tune between three convolution algorithms (Nested Loops, FFT, or Matrix Multiply)
  • Reward: -1*elapsedTime (maximizes throughput)
  • Convolve 8000 Flickr images with sets of filters (~32 GB)
  • Vary number & size of filters
  • Compute intensive (some configs take up to 45 min on a single node)
  • Run on an 8-node (AWS EC2 4-core r3.xlarge) cluster: 32 total cores, ~252 images per core

22
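Given the reward definition above, the computeReward helper used in the earlier code sketch reduces to negating the elapsed time (the helper name comes from the slides; the one-line body is the obvious reading of "-1*elapsedTime"):

def computeReward(elapsedTime):
    # Maximizing reward means minimizing elapsed time, i.e. maximizing throughput.
    return -1 * elapsedTime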

slide-66
SLIDE 66

Convolution Results

23

Relative throughput normalized against the highest-throughput algorithm

slide-69
SLIDE 69

Cuttlefish

24

I. Problem & Motivation
II. The Cuttlefish API
III. Bandit-based Online Tuning
IV. Distributed Tuning Approach
V. Contextual Tuning
VI. Handling Nonstationary Settings
VII. Other Operators
VIII. Conclusion

slide-70
SLIDE 70

Challenges in Distributed Tuning

25

slide-73
SLIDE 73

Challenges in Distributed Tuning

  • 1. Choosing and observing occur throughout a cluster
    • To maximize learning, need to communicate
  • 2. Synchronization & communication overheads
  • 3. Feedback delay
    • How many times is `choose' called before an earlier reward is observed?
    • Fortunately, theoretically sound to have delays

25

slide-74
SLIDE 74

Distributed Tuning Approach

26

slide-75
SLIDE 75

Distributed Tuning Approach

26

[Diagram: Option 1, a single centralized tuner; Machines 1-3 send their choose/observe calls to it.]

slide-76
SLIDE 76

Distributed Tuning Approach

26

[Diagram: Option 1, a centralized tuner serving choose/observe calls from Machines 1-3, versus Option 2, independent tuners on each machine that push local state to and pull global state from a centralized model store.]

slide-78
SLIDE 78

Distributed Tuning Approach

26

Cuttlefish uses independent tuners with a centralized model store. Peer-to-peer sharing is also a possibility, but requires more communication.

slide-79
SLIDE 79

Distributed Tuning Approach

27

[Diagram: each worker thread keeps its own local state plus a cached copy of non-local state; a model store (on the master or a parameter server) holds the local state pushed by every worker.]

slide-83
SLIDE 83

Distributed Tuning Approach

27


  • When choosing: aggregate local & non-local state
  • When observing: update the local state
  • Model store aggregates non-local state
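A rough sketch of the local / non-local bookkeeping described above, assuming each arm's state is a mergeable (count, mean, M2) summary like the one in the Thompson sampling sketch. The ModelStore methods (push, pullOthers) are hypothetical placeholders for whatever the master or parameter server exposes, not Cuttlefish's actual interface.

def merge(a, b):
    # Combine two (count, mean, M2) reward summaries (parallel variance formula).
    n_a, mu_a, m2_a = a
    n_b, mu_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return (0, 0.0, 0.0)
    delta = mu_b - mu_a
    return (n,
            mu_a + delta * n_b / n,
            m2_a + m2_b + delta * delta * n_a * n_b / n)

class DistributedArmState:
    def __init__(self):
        self.local = (0, 0.0, 0.0)       # observations made by this thread
        self.nonLocal = (0, 0.0, 0.0)    # last state pulled from the model store

    def observe(self, reward):
        # Observing only touches local state: no synchronization on the hot path.
        self.local = merge(self.local, (1, reward, 0.0))

    def stateForChoosing(self):
        # Choosing aggregates local and non-local state.
        return merge(self.local, self.nonLocal)

    def communicate(self, store, armId):
        # Periodically push local observations and refresh the global view.
        store.push(armId, self.local)            # hypothetical model-store call
        self.local = (0, 0.0, 0.0)
        self.nonLocal = store.pullOthers(armId)  # hypothetical model-store call

Because the summaries merge associatively, a slightly stale non-local view delays learning rather than corrupting it, which is why the feedback delay noted in the challenges slide is tolerable.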
slide-84
SLIDE 84

Results with Distributed Approach

28

Relative throughput normalized against the highest-throughput algorithm

slide-85
SLIDE 85

Results with Distributed Approach

29

Throughput normalized against an ideal oracle that always picks the fastest option at each round

slide-87
SLIDE 87

Cuttlefish

30

I. Problem & Motivation
II. The Cuttlefish API
III. Bandit-based Online Tuning
IV. Distributed Tuning Approach
V. Contextual Tuning (by learning cost models)
VI. Handling Nonstationary Settings
VII. Other Operators
VIII. Conclusion

slide-88
SLIDE 88

Contextual Tuning

31

slide-91
SLIDE 91

Contextual Tuning

  • Best physical operator for each round may depend on the current (easy to compute) context
    • e.g. convolution performance depends on the image & filter dimensions
  • Users may know important context features
    • e.g. from the asymptotic algorithmic complexity
  • Users can specify context in Tuner.choose

31

slide-92
SLIDE 92

Contextual Tuning Algorithm

32

slide-95
SLIDE 95

Contextual Tuning Algorithm

  • Linear contextual Thompson sampling learns a linear model that maps features to rewards
  • Feature normalization & regularization
    • Increased robustness towards feature choices
  • Effectively learns a cost model

32
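An illustrative sketch of linear contextual Thompson sampling of the kind described above: each choice keeps a ridge-regularized linear model from (normalized) context features to reward, a weight vector is sampled from the posterior, and the choice with the highest predicted reward wins. This is the textbook formulation (Agrawal & Goyal style), not necessarily Cuttlefish's exact algorithm, and the names are illustrative.

import numpy as np

class LinearThompsonArm:
    def __init__(self, dim, lam=1.0, noise=1.0):
        self.A = lam * np.eye(dim)   # regularized precision matrix
        self.b = np.zeros(dim)       # accumulated reward-weighted features
        self.noise = noise

    def sampleExpectedReward(self, x):
        cov = np.linalg.inv(self.A)
        mean = cov @ self.b
        # Sample a weight vector from the posterior, then score this context.
        theta = np.random.multivariate_normal(mean, self.noise ** 2 * cov)
        return float(x @ theta)

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

def chooseContextual(arms, context):
    # context is assumed to already be a normalized (e.g. z-scored) feature vector
    return max(range(len(arms)), key=lambda i: arms[i].sampleExpectedReward(context))

The learned weights amount to a per-implementation cost model over the context features, which is the sense in which contextual tuning "effectively learns a cost model".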

slide-97
SLIDE 97

Tuning Convolution with Cuttlefish

def loopConvolve(image, filters): …
def fftConvolve(image, filters): …
def mmConvolve(image, filters): …
def getDimensions(image, filters): …

tuner = Tuner([loopConvolve, fftConvolve, mmConvolve])

for image, filters in convolutions:
    context = getDimensions(image, filters)
    convolve, token = tuner.choose(context)
    start = now()
    result = convolve(image, filters)
    elapsedTime = now() - start
    reward = computeReward(elapsedTime)
    tuner.observe(token, reward)
    output result

34

slide-98
SLIDE 98

Contextual Convolution Results

35

Throughput normalized against an ideal oracle that always picks the fastest algorithm

slide-99
SLIDE 99

Cuttlefish

36

I. Problem & Motivation
II. The Cuttlefish API
III. Bandit-based Online Tuning
IV. Distributed Tuning Approach
V. Contextual Tuning
VI. Handling Nonstationary Settings
VII. Other Operators
VIII. Conclusion

slide-100
SLIDE 100

Nonstationary Settings

37

slide-104
SLIDE 104

Nonstationary Settings

  • Runtimes may drift over time, or differ across nodes
    • heterogeneous cluster, changing resource availabilities, data properties varying throughout the workload, etc.
    • e.g. web crawl data and images may be stored sorted by website; this could correlate with performance
  • We might not be capturing sufficient context!
  • Standard multi-armed bandit techniques fail
  • Solution: only tune using observations from nodes & times with statistically similar data

37

slide-105
SLIDE 105

Possible Solution

38

[Figure: per-agent (core or machine) observation timelines]

slide-107
SLIDE 107

Possible Solution

39

[Figure: per-agent (core or machine) observations divided into epochs]

Use all epochs that pass a statistical similarity test

slide-111
SLIDE 111

To Lower Overheads

40

[Figure: per-agent observation timelines divided into epochs]

  • Store only one 'aggregated old state' per epoch
  • At epoch end: if the epoch is similar to the old state, merge it into the 'old state'; otherwise, replace the 'old state'
  • Identify (& merge) similar non-local states only at communication rounds, in the centralized model store
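A hedged sketch of the epoch-end logic above. The similarity check here is a crude Welch-style comparison of two (count, mean, M2) summaries against an arbitrary threshold; the statistical test Cuttlefish actually uses may differ, and merge() is the summary-combining helper from the distributed-tuning sketch.

import math

def looksSimilar(old, new, threshold=2.0):
    # Rough Welch-style check on whether two reward summaries have compatible means.
    n1, mu1, m2_1 = old
    n2, mu2, m2_2 = new
    if n1 < 2 or n2 < 2:
        return True
    se = math.sqrt(m2_1 / (n1 - 1) / n1 + m2_2 / (n2 - 1) / n2)
    return se == 0 or abs(mu1 - mu2) / se < threshold

def endOfEpoch(oldState, epochState):
    # If the finished epoch resembles the aggregated old state, fold it in;
    # otherwise discard the old observations and start over from this epoch.
    if looksSimilar(oldState, epochState):
        return merge(oldState, epochState)
    return epochState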

slide-112
SLIDE 112

Nonstationary Results

41

Throughput normalized against an ideal oracle that always picks the fastest algorithm

slide-113
SLIDE 113

Cuttlefish

42

I. Problem & Motivation
II. The Cuttlefish API
III. Bandit-based Online Tuning
IV. Distributed Tuning Approach
V. Contextual Tuning
VI. Handling Nonstationary Settings
VII. Other Operators
VIII. Conclusion

slide-114
SLIDE 114

Regex Operator

43

slide-119
SLIDE 119

Regex Operator

43

  • Tune between four regular expression searching libraries
    • Built-in Java Regex and 3 third-party libraries
  • Search through 256k Common Crawl docs (~30 GB uncompressed)
    • one tuning round per doc
  • Test 8 regexes sourced from the regex-sharing website RegExr
    • Match hyperlinks, trigrams, valid emails, color codes, etc.
  • Multiple orders of magnitude variation in performance
    • The email validation regex with the built-in Java utilities takes 33 μs to process the fastest document, but over 1,000 s for the slowest
  • 8-node (AWS EC2 4-core r3.xlarge) cluster
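The regex operator follows the same choose/observe pattern as the convolution example. A sketch in the deck's own pseudocode style, where the four search functions are placeholder names for the built-in Java engine and the three third-party libraries:

searchFns = [javaRegexSearch, lib1Search, lib2Search, lib3Search]  # placeholder names
tuner = Tuner(searchFns)

for doc in commonCrawlDocs:              # one tuning round per document
    search, token = tuner.choose()
    start = now()
    matches = search(pattern, doc)
    tuner.observe(token, computeReward(now() - start))
    # downstream operators consume `matches`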
slide-120
SLIDE 120

Regex Results

44

Note: Y-axis is Log-scale

slide-121
SLIDE 121

Distributed Parallel Join Operator

45

slide-127
SLIDE 127

Distributed Parallel Join Operator

45

  • Hash-partition relations according to join attributes
  • On each partition, pick a local hash join or a local sort-merge join
  • Rewards capture total join time
    • measured from when the joins begin until the result iterators are fully consumed
  • Set as Spark SQL 2.2's join for all equijoins too large to broadcast
    • No heuristics or cost models in the query optimizer; it falls back on explicit configurations (defaults to a global sort-merge join)
  • Test on the TPC-DS benchmark (scale factor 200)
    • Configure queries to use 512 shuffle / join partitions
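A sketch of the per-partition selection described above, again in the deck's pseudocode style. localHashJoin and localSortMergeJoin stand in for the actual Spark SQL implementations, and the reward is only observed once the result iterator has been fully consumed:

joinTuner = Tuner([localHashJoin, localSortMergeJoin])

def joinPartition(leftRows, rightRows):
    join, token = joinTuner.choose()     # one tuning round per partition
    start = now()
    for row in join(leftRows, rightRows):
        yield row                        # downstream operators consume the iterator
    # reward covers the full span, from join start to full consumption of the results
    joinTuner.observe(token, computeReward(now() - start))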
slide-128
SLIDE 128

Join Results (Query Throughput)

46

slide-130
SLIDE 130

The Cuttlefish join is usually faster or very comparable (the join-throughput graphs are even more dramatic)

Join Results (Query Throughput)

46

But it requires exploration & doesn't always provide 'special ordering' benefits

slide-131
SLIDE 131

Cuttlefish

47

I. Problem & Motivation
II. The Cuttlefish API
III. Bandit-based Online Tuning
IV. Distributed Tuning Approach
V. Contextual Tuning
VI. Handling Nonstationary Settings
VII. Other Operators
VIII. Conclusion

slide-132
SLIDE 132

Cuttlefish

48

  • A simple, flexible API for online tuning
  • Thompson-sampling based tuning algorithms
  • Supports contextual tuning (learns cost models)
  • Distributed learning between workers
  • Adapts to nonstationary workloads
  • Prototyped in Apache Spark & successfully tunes convolution, regex, and join operators

uwdb.io/projects/cuttlefish