SLIDE 1

Carnegie Mellon

Parallel Splash Belief Propagation

Joseph E. Gonzalez Yucheng Low Carlos Guestrin David O’Hallaron

Computers which worked on this project: BigBro1, BigBro2, BigBro3, BigBro4, BigBro5, BigBro6, BiggerBro, BigBroFS Tashish01, Tashi02, Tashi03, Tashi04, Tashi05, Tashi06, …, Tashi30, parallel, gs6167, koobcam (helped with writing)

SLIDE 2

Why talk about parallelism now?

[Figure: "Change in the Foundation of ML" - log clock speed (GHz) vs. release date (1988-2010), contrasting projected future sequential performance with future parallel performance.]

SLIDE 3

Why is this a Problem?

[Figure: ML methods plotted by sophistication vs. parallelism - Nearest Neighbor [Google et al.], Basic Regression [Cheng et al.], Graphical Models [Mendiburu et al.], Support Vector Machines [Graf et al.] - with the goal region marked "Want to be here".]

SLIDE 4

Why is it hard?

  • Algorithmic Efficiency: eliminate wasted computation
  • Parallel Efficiency: expose independent computation
  • Implementation Efficiency: map computation to real hardware

SLIDE 5

The Key Insight

Statistical Structure
  • Graphical Model Structure
  • Graphical Model Parameters

Computational Structure
  • Chains of Computational Dependences
  • Decay of Influence

Parallel Structure
  • Parallel Dynamic Scheduling
  • State Partitioning for Distributed Computation
SLIDE 6


The Result

[Figure: the same sophistication vs. parallelism chart - Splash Belief Propagation (Graphical Models [Gonzalez et al.]) reaches the goal region, beyond Nearest Neighbor [Google et al.], Basic Regression [Cheng et al.], Support Vector Machines [Graf et al.], and Graphical Models [Mendiburu et al.].]

SLIDE 7

Outline

  • Overview
  • Graphical Models: Statistical Structure
  • Inference: Computational Structure
  • τε-Approximate Messages: Statistical Structure
  • Parallel Splash
      • Dynamic Scheduling
      • Partitioning
  • Experimental Results
  • Conclusions

SLIDE 8

Graphical Models and Parallelism

Graphical models provide a common language for general purpose parallel algorithms in machine learning

A parallel inference algorithm would improve:

  • Protein Structure Prediction
  • Computer Vision
  • Movie Recommendation

Inference is a key step in learning graphical models.

SLIDE 9

Overview of Graphical Models

Graphical representation of local statistical dependencies

[Figure: image denoising model. A noisy picture provides observed random variables, which are connected to latent pixel variables (the "true" pixel values); local dependencies between neighboring latent pixels encode continuity assumptions.]

Inference: What is the probability that this pixel is black?

SLIDE 10

Synthetic Noisy Image Problem

  • Overlapping Gaussian noise
  • Assess convergence and accuracy

[Figure: noisy image and predicted image.]

SLIDE 11

Protein Side-Chain Prediction

Model side-chain interactions as a graphical model

Inference: What is the most likely orientation?

SLIDE 12

Protein Side-Chain Prediction

276 protein networks, each with approximately:

  • 700 variables
  • 1600 factors
  • 70 discrete orientations
  • Strong factors

[Figure: example degree distribution over vertex degree.]

SLIDE 13

Markov Logic Networks

Represent logic as a graphical model:

Smokes(A) ⇒ Cancer(A)
Smokes(B) ⇒ Cancer(B)
Friends(A,B) And Smokes(A) ⇒ Smokes(B)

[Figure: grounded network over Cancer(A), Cancer(B), Smokes(A), Smokes(B), Friends(A,B); A: Alice, B: Bob; each variable is True/False.]

Inference: Pr(Cancer(B) = True | Smokes(A) = True & Friends(A,B) = True) = ?

SLIDE 14

Markov Logic Networks

[Figure: the grounded Alice/Bob network from the previous slide.]

UW-Systems Model:

  • 8K binary variables
  • 406K factors
  • Irregular degree distribution: some vertices with high degree

SLIDE 15

Outline

  • Overview
  • Graphical Models: Statistical Structure
  • Inference: Computational Structure
  • τε-Approximate Messages: Statistical Structure
  • Parallel Splash
      • Dynamic Scheduling
      • Partitioning
  • Experimental Results
  • Conclusions

SLIDE 16

The Inference Problem

  • What is the probability that Bob smokes given Alice smokes?
  • What is the best configuration of the protein side-chains?
  • What is the probability that each pixel is black?

NP-Hard in general. Approximate Inference: Belief Propagation.

SLIDE 17

Belief Propagation (BP)

  • Iterative message passing algorithm
  • Naturally parallel algorithm

SLIDE 18

Parallel Synchronous BP

Given the old messages, all new messages can be computed in parallel:

[Figure: CPU 1 … CPU n each read the old messages and compute a block of the new messages.]

Map-Reduce Ready!
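To make the "naturally parallel" structure concrete, here is a minimal sketch of one synchronous update round for a pairwise model; the talk works with general factor graphs, and the names (phi, psi, messages) are illustrative only:

    import numpy as np

    def synchronous_bp_round(edges, phi, psi, messages):
        """One round of synchronous BP on a pairwise model (sketch).

        edges    : list of directed edges (u, v), both directions present
        phi[u]   : unary potential at u, shape (k,)
        psi[(u,v)] : pairwise potential, shape (k, k)
        messages : dict mapping (u, v) -> length-k array (the old messages)
        Every new message depends only on old messages, so each edge could
        be handled by a different CPU (or a map-reduce job).
        """
        new_messages = {}
        for (u, v) in edges:
            # Combine the unary potential at u with all old messages into u,
            # except the one coming back from v.
            belief_u = phi[u].copy()
            for (w, x) in edges:
                if x == u and w != v:
                    belief_u *= messages[(w, u)]
            # Pass through the pairwise potential and renormalize.
            m = psi[(u, v)].T @ belief_u
            new_messages[(u, v)] = m / m.sum()
        return new_messages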

SLIDE 19

Sequential Computational Structure


SLIDE 20

Hidden Sequential Structure


SLIDE 21

Hidden Sequential Structure

[Figure: chain graphical model with evidence at both ends.]

Running Time = (time for a single parallel iteration) × (number of iterations)

SLIDE 22

Optimal Sequential Algorithm

Schedule             Running Time    Processors
Naturally Parallel   2n²/p           p ≤ 2n
Forward-Backward     2n              p = 1
Optimal Parallel     n               p = 2

Gap: the naturally parallel (synchronous) schedule is far from the optimal running time.
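The forward-backward sweep referenced here, sketched for a chain with unary potentials phi and pairwise potentials psi (illustrative names): 2(n-1) message computations in total, but each sweep is inherently sequential.

    import numpy as np

    def forward_backward_chain(phi, psi):
        """Optimal sequential schedule on a chain of n variables (sketch).

        phi : list of n unary potentials, each shape (k,)
        psi : list of n-1 pairwise potentials; psi[i] couples variable i and i+1
        """
        n = len(phi)
        fwd = [None] * n   # fwd[i]: message into i from i-1
        bwd = [None] * n   # bwd[i]: message into i from i+1
        # Forward sweep: left to right.
        msg = np.ones_like(phi[0])
        for i in range(1, n):
            m = psi[i - 1].T @ (phi[i - 1] * msg)
            fwd[i] = m / m.sum()
            msg = fwd[i]
        # Backward sweep: right to left.
        msg = np.ones_like(phi[-1])
        for i in range(n - 2, -1, -1):
            m = psi[i] @ (phi[i + 1] * msg)
            bwd[i] = m / m.sum()
            msg = bwd[i]
        # Beliefs combine both sweeps.
        beliefs = []
        for i in range(n):
            b = phi[i].copy()
            if fwd[i] is not None: b = b * fwd[i]
            if bwd[i] is not None: b = b * bwd[i]
            beliefs.append(b / b.sum())
        return beliefs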

SLIDE 23

Key Computational Structure

Schedule             Running Time    Processors
Naturally Parallel   2n²/p           p ≤ 2n
Optimal Parallel     n               p = 2

The gap: inherent sequential structure requires efficient scheduling.

SLIDE 24

Outline

  • Overview
  • Graphical Models: Statistical Structure
  • Inference: Computational Structure
  • τε-Approximate Messages: Statistical Structure
  • Parallel Splash
      • Dynamic Scheduling
      • Partitioning
  • Experimental Results
  • Conclusions

SLIDE 25

Parallelism by Approximation

τε represents the minimal sequential structure

[Figure: messages along a chain of vertices 1-10, comparing the true messages with the τε-approximation, which only depends on vertices within distance τε.]
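The slide leaves the definition to the figure; a plausible formalization, consistent with how τε and "left-awareness" are used in the proofs on the following slides (a reconstruction, not the paper's verbatim definition), is

    \tau_\epsilon \;=\; \min\Bigl\{\, k \;:\; \max_{v} \bigl\| b_v^{(k)} - b_v \bigr\|_1 \le \epsilon \Bigr\},

where b_v is the true belief at vertex v and b_v^{(k)} is the belief computed using only messages originating within distance k of v.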

SLIDE 26

Tau-Epsilon Structure

Often τε decreases quickly:

[Figure: τε versus message approximation error (log scale) for the Markov Logic Networks and the Protein Networks.]

SLIDE 27

Running Time Lower Bound

Theorem: Using p processors it is not possible to obtain a τε-approximation in time less than the sum of a parallel component and a sequential component.

SLIDE 28

Proof: Running Time Lower Bound

Consider one direction along the chain, using p/2 processors (p ≥ 2):

  • We must make n - τε vertices τε left-aware.
  • A single processor can only make k - τε + 1 vertices left-aware in k iterations.

[Figure: a chain of vertices 1 … n divided into blocks of width τε.]
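Combining the two counting facts gives the shape of the bound; the derivation below is a reconstruction from this proof sketch (the exact constants in the original slide's formula may differ):

    \frac{p}{2}\,(k - \tau_\epsilon + 1) \;\ge\; n - \tau_\epsilon
    \quad\Longrightarrow\quad
    k \;\ge\; \underbrace{\frac{2(n - \tau_\epsilon)}{p}}_{\text{parallel component}}
      \;+\; \underbrace{\tau_\epsilon - 1}_{\text{sequential component}}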

SLIDE 29

Optimal Parallel Scheduling

[Figure: the chain is cut into p contiguous blocks, one per processor (Processor 1, Processor 2, Processor 3, ...); each processor runs the forward-backward sweep on its own block.]

Theorem: Using p processors this algorithm achieves a τε-approximation in the running time derived in the proof on the next two slides.

SLIDE 30

Proof: Optimal Parallel Scheduling

  • After the first iteration, all vertices are left-aware of the left-most vertex on their processor.
  • After exchanging messages and running the next iteration, left-awareness extends across one more block.
  • After k parallel iterations, each vertex is (k - 1)(n/p) left-aware.

SLIDE 31

Proof: Optimal Parallel Scheduling

  • After k parallel iterations each vertex is (k - 1)(n/p) left-aware.
  • All vertices must be made τε left-aware.
  • Each iteration takes O(n/p) time.
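The formulas on this slide are images; filling in the algebra as a reconstruction from the three statements above:

    (k - 1)\,\frac{n}{p} \;\ge\; \tau_\epsilon
    \;\Longrightarrow\;
    k \;\ge\; \frac{\tau_\epsilon\, p}{n} + 1,
    \qquad
    \text{total time} \;=\; k \cdot O\!\left(\frac{n}{p}\right)
    \;=\; O\!\left(\frac{n}{p} + \tau_\epsilon\right),

which matches the lower bound from the previous slides up to constant factors.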

SLIDE 32

Comparing with Synchronous BP

[Figure: the synchronous schedule vs. the optimal schedule across Processors 1-3, and the gap between them.]

SLIDE 33

Outline

  • Overview
  • Graphical Models: Statistical Structure
  • Inference: Computational Structure
  • τε-Approximate Messages: Statistical Structure
  • Parallel Splash
      • Dynamic Scheduling
      • Partitioning
  • Experimental Results
  • Conclusions

SLIDE 34

The Splash Operation

Generalize the optimal chain algorithm to arbitrary cyclic graphs:

1) Grow a BFS spanning tree with fixed size
2) Forward pass, computing all messages at each vertex
3) Backward pass, computing all messages at each vertex
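A minimal sketch of those three steps, assuming an adjacency structure graph and a helper compute_message(u, v) that sends the BP message from u to v (both names are illustrative); the real implementation works on a factor graph and bounds the tree by accumulated work rather than vertex count:

    from collections import deque

    def splash(graph, root, max_size, compute_message):
        """One Splash: bounded BFS tree growth, then two sweeps over the tree."""
        # 1) Grow a breadth-first spanning tree of bounded size around the root.
        tree_order, visited = [root], {root}
        frontier = deque([root])
        while frontier and len(tree_order) < max_size:
            u = frontier.popleft()
            for v in graph[u]:
                if v not in visited and len(tree_order) < max_size:
                    visited.add(v)
                    tree_order.append(v)   # BFS order: root first, leaves last
                    frontier.append(v)
        # 2) Forward pass (here taken leaves-to-root): send all messages
        #    at each vertex, in reverse BFS order.
        for u in reversed(tree_order):
            for v in graph[u]:
                compute_message(u, v)
        # 3) Backward pass (root back out to the leaves).
        for u in tree_order:
            for v in graph[u]:
                compute_message(u, v)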

SLIDE 35

Running Parallel Splashes

[Figure: three CPUs, each with its own local state, running a Splash on its part of the graph.]

  • Partition the graph
  • Schedule Splashes locally
  • Transmit the messages along the boundary of the partition

Key Challenges:
1) How do we schedule Splashes?
2) How do we partition the graph?

SLIDE 36

Where do we Splash?

Assign priorities and use a scheduling queue to select roots.

[Figure: CPU 1 pops roots off its local scheduling queue and runs Splashes in its local state.]

How do we assign priorities?

SLIDE 37

Message Scheduling

Residual Belief Propagation [Elidan et al., UAI 06]:

Assign priorities based on change in inbound messages

[Figure: two example vertices. For the first, all inbound messages changed only slightly; for the second, the inbound messages changed substantially.]

  • Small change: expensive no-op
  • Large change: informative update

SLIDE 38

Problem with Message Scheduling

Small changes in messages do not imply small changes in belief:

[Figure: a small change in every incoming message can still produce a large change in the belief.]

SLIDE 39

Problem with Message Scheduling

Large changes in a single message do not imply large changes in belief:

[Figure: a large change in a single incoming message can still produce only a small change in the belief.]

SLIDE 40

Belief Residual Scheduling

Assign priorities based on the cumulative change in belief:

    r_v  =  Σ (change in the belief of v induced by each new inbound message since v was last updated)

A vertex whose belief has changed substantially since last being updated will likely produce informative new messages.
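A sketch of the bookkeeping this implies, with illustrative helper names (residual dict, a priority queue with push(v, priority=...)): every time a new message arrives at v, the resulting change in v's belief is added to r_v, and the residual is cleared once v is updated as a Splash root.

    def on_message_update(v, old_belief, new_belief, residual, scheduling_queue):
        """Accumulate the belief change at v into its residual r_v."""
        change = sum(abs(a - b) for a, b in zip(new_belief, old_belief))  # L1 change
        residual[v] = residual.get(v, 0.0) + change
        scheduling_queue.push(v, priority=residual[v])   # promote v in the queue

    def on_splash_root(v, residual):
        """Once v has been updated as a Splash root, its accumulated change is spent."""
        residual[v] = 0.0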

SLIDE 41

Message vs. Belief Scheduling

Belief Scheduling improves accuracy and convergence

[Figures: (left) percentage of runs converged within 4 hours for belief residuals vs. message residuals; (right) L1 error in beliefs over time (seconds) for message scheduling vs. belief scheduling.]

SLIDE 42

Splash Pruning

Belief residuals can be used to dynamically reshape and resize Splashes: vertices with low belief residual are excluded from the Splash.
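In code, pruning is just a filter applied while the BFS tree is grown; a sketch, where min_residual is an illustrative threshold:

    def should_include(v, residual, min_residual=1e-3):
        """Skip vertices whose beliefs are already (nearly) converged."""
        return residual.get(v, 0.0) > min_residual

Inside the splash sketch above, the BFS loop would only expand neighbors for which should_include(v, residual) is true, so the tree reshapes itself around the region that is still changing.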

SLIDE 43

Splash Size

Using Splash Pruning, our algorithm is able to dynamically select the optimal splash size.

[Figure: running time (seconds) vs. splash size (messages), with and without pruning.]

SLIDE 44

Example

[Figure: synthetic noisy image, its factor graph, and the per-vertex update counts (many updates vs. few updates).]

The algorithm identifies and focuses on the hidden sequential structure.

SLIDE 45

Parallel Splash Algorithm

  • Partition the factor graph over processors
  • Schedule Splashes locally using belief residuals
  • Transmit messages on the boundary

[Figure: three CPUs connected by a fast reliable network, each holding local state and a scheduling queue and running its own Splashes.]

Theorem: Given a uniform partitioning of the chain graphical model, Parallel Splash runs in time matching the optimal schedule, retaining optimality.

SLIDE 46

Partitioning Objective

The partitioning of the factor graph determines:

  • Storage, Computation, and Communication

Goal: Balance computation and minimize communication.

[Figure: a factor graph cut between CPU 1 and CPU 2; the objective minimizes communication cost subject to an ensure-balance constraint.]

SLIDE 47

The Partitioning Problem

  • Objective: minimize communication while ensuring balance
  • Depends on: the work assigned to each partition and the communication across the cut
  • NP-Hard → use the METIS fast partitioning heuristic

But the update counts, which determine the work, are not known!
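The two quantities being traded off can be written down directly; a sketch of how they would be measured for a candidate partition (the per-vertex work values are exactly the numbers that are unknown ahead of time; all names are illustrative):

    def partition_cost(vertices, edges, part, work, comm):
        """Evaluate a partitioning (sketch).

        part[v]      : processor assigned to vertex v
        work[v]      : update count times per-update cost at v (unknown in advance)
        comm[(u, v)] : cost of sending the (u, v) messages across machines
        """
        per_proc = {}
        for v in vertices:
            per_proc[part[v]] = per_proc.get(part[v], 0.0) + work[v]
        # Imbalance: heaviest processor relative to the average load.
        balance = max(per_proc.values()) / (sum(per_proc.values()) / len(per_proc))
        # Communication: total cost of edges cut by the partition.
        cut_cost = sum(comm[(u, v)] for (u, v) in edges if part[u] != part[v])
        return balance, cut_cost   # want balance near 1 and cut_cost small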

SLIDE 48

Unknown Update Counts

  • Determined by belief scheduling
  • Depends on: graph structure, factors, …
  • Little correlation between past & future update counts

[Figure: a noisy image and its per-vertex update counts.]

Simple Solution: Uninformed Cut

SLIDE 49

Uninformed Cuts

Uninformed cuts give greater work imbalance but lower communication cost.

[Figures: work imbalance and communication cost of the uninformed cut vs. the optimal cut on the Denoise and UW-Systems problems; an illustration shows the uninformed cut giving one partition too much work and the other too little.]

SLIDE 50

Over-Partitioning

Over-cut the graph into k·p partitions and randomly assign the pieces to CPUs:

  • Increases balance
  • Increases communication cost (more boundary)

[Figure: the partition without over-partitioning (one piece per CPU) vs. with over-partitioning at k = 6, where each CPU owns several scattered pieces.]
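Over-partitioning itself is a small amount of code once a (k·p)-way cut is available; a sketch using a hypothetical cut_graph partitioner in place of METIS:

    import random

    def over_partition(graph, p, k, cut_graph):
        """Cut into k*p pieces, then deal the pieces to the p processors at random."""
        piece_of = cut_graph(graph, nparts=k * p)        # e.g. METIS; piece_of[v] in [0, k*p)
        owner_of_piece = [random.randrange(p) for _ in range(k * p)]
        return {v: owner_of_piece[piece_of[v]] for v in graph}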

SLIDE 51

Over-Partitioning Results

Provides a simple method to trade between work balance and communication cost

[Figures: communication cost and work imbalance as a function of the partition factor k; communication cost grows with k while work imbalance shrinks.]

SLIDE 52

CPU Utilization

Over-partitioning improves CPU utilization:

[Figures: number of active CPUs over time (seconds) on the UW-Systems MLN and on Denoise, with no over-partitioning vs. 10x over-partitioning.]

SLIDE 53

Parallel Splash Algorithm

  • Over-partition the factor graph
  • Randomly assign pieces to processors
  • Schedule Splashes locally using belief residuals
  • Transmit messages on the boundary

[Figure: three CPUs connected by a fast reliable network, each holding local state and a scheduling queue and running Splashes on its assigned pieces.]
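Putting the pieces together, each processor's main loop looks roughly like the sketch below, reusing the splash sketch from the earlier slide; the net object, queue API, and convergence test are stand-ins for the MPI machinery in the real C++/MPICH2 implementation:

    def worker_loop(local_graph, messages, queue, residual, net,
                    compute_message, converged, splash_size=50):
        """Main loop run independently by every processor (sketch).

        net.recv_boundary() yields (u, v, msg) triples from other machines;
        net.send_boundary() flushes locally recomputed cross-partition messages.
        """
        while not converged():
            # 1) Fold in messages that arrived from neighboring partitions.
            for (u, v, msg) in net.recv_boundary():
                messages[(u, v)] = msg
                queue.push(v, priority=residual.get(v, 0.0))
            # 2) Pop the root with the highest belief residual and Splash around it.
            root = queue.pop()
            splash(local_graph, root, splash_size, compute_message)
            residual[root] = 0.0
            # 3) Push the messages that now cross the partition boundary.
            net.send_boundary()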

SLIDE 54

Outline

  • Overview
  • Graphical Models: Statistical Structure
  • Inference: Computational Structure
  • τε-Approximate Messages: Statistical Structure
  • Parallel Splash
      • Dynamic Scheduling
      • Partitioning
  • Experimental Results
  • Conclusions

SLIDE 55

Experiments

  • Implemented in C++ using MPICH2 as the message-passing API
  • Ran on an Intel OpenCirrus cluster: 120 processors
      • 15 nodes with 2 x quad-core Intel Xeon processors
      • Gigabit Ethernet switch
  • Tested on Markov Logic Networks obtained from Alchemy [Domingos et al., SSPR 08]
  • Present results on the largest (UW-Systems) and smallest (UW-Languages) MLNs

SLIDE 56

Parallel Performance (Large Graph)

[Figure: speedup vs. number of CPUs (up to 120) on UW-Systems, with no over-partitioning and with 5x over-partitioning, compared against the linear-speedup line.]

UW-Systems:
  • 8K variables, 406K factors
  • Single-processor running time: 1 hour
  • Linear to super-linear speedup up to 120 CPUs (cache efficiency)

SLIDE 57

Parallel Performance (Small Graph)

UW-Languages:
  • 1K variables, 27K factors
  • Single-processor running time: 1.5 minutes
  • Linear to super-linear speedup up to 30 CPUs
  • Network costs quickly dominate the short running time

[Figure: speedup vs. number of CPUs on UW-Languages, with no over-partitioning and with 5x over-partitioning, compared against the linear-speedup line.]

SLIDE 58

Outline

  • Overview
  • Graphical Models: Statistical Structure
  • Inference: Computational Structure
  • τε-Approximate Messages: Statistical Structure
  • Parallel Splash
      • Dynamic Scheduling
      • Partitioning
  • Experimental Results
  • Conclusions

SLIDE 59

Summary

  • Algorithmic Efficiency: Splash structure + belief residual scheduling
  • Parallel Efficiency: independent parallel Splashes
  • Implementation Efficiency: distributed queues, asynchronous communication, over-partitioning

Experimental results on large factor graphs:

  • Linear to super-linear speed-up using up to 120 processors

SLIDE 60

Conclusion

[Figure: the sophistication vs. parallelism chart again; Parallel Splash Belief Propagation sits in the goal region ("We are here").]

SLIDE 61

Questions


SLIDE 62

Protein Results


SLIDE 63

3D Video Task


SLIDE 64

Distributed Parallel Setting

Opportunities:

  • Access to larger systems: 8 CPUs → 1000 CPUs
  • Linear increase in: RAM, cache capacity, and memory bandwidth

Challenges:

  • Distributed state, communication, and load balancing

[Figure: cluster architecture; each node has a CPU, cache, bus, and memory, and the nodes are connected by a fast reliable network.]