SLIDE 1

Sum-Product Networks

CS486/686, University of Waterloo, Lecture 23: July 19, 2017

SLIDE 2

Outline

  • SPNs in more depth

    – Relationship to Bayesian networks
    – Parameter estimation
    – Online and distributed estimation
    – Dynamic SPNs for sequence data

SLIDE 3

SPN → Bayes Net

  • 1. Normalize the SPN
  • 2. Create the bipartite BN structure
  • 3. Construct the conditional distributions

SLIDE 4

Normal SPN

An SPN is said to be normal when:

  • 1. It is complete and decomposable
  • 2. All weights are non-negative and the weights of the edges emanating from each sum node sum to 1
  • 3. Every terminal node in the SPN is a univariate distribution and the size of the scope of each sum node is at least 2
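
To make these conditions concrete, here is a minimal Python sketch (not from the lecture; the Leaf/Sum/Product classes and the is_normal check are illustrative assumptions) that tests all three:

```python
class Leaf:
    """Univariate distribution over a single variable (condition 3)."""
    def __init__(self, var):
        self.scope = {var}
        self.children = []

class Sum:
    """Weighted mixture; children must share the same scope (complete)."""
    def __init__(self, children, weights):
        self.children, self.weights = children, weights
        self.scope = set().union(*(c.scope for c in children))

class Product:
    """Factorization; children must have disjoint scopes (decomposable)."""
    def __init__(self, children):
        self.children = children
        self.scope = set().union(*(c.scope for c in children))

def is_normal(node, tol=1e-9):
    """Recursively check the three normality conditions."""
    if isinstance(node, Leaf):
        return len(node.scope) == 1
    if isinstance(node, Sum):
        ok = all(c.scope == node.scope for c in node.children)   # complete
        ok = ok and all(w >= 0 for w in node.weights)            # non-negative
        ok = ok and abs(sum(node.weights) - 1.0) < tol           # weights sum to 1
        ok = ok and len(node.scope) >= 2                         # scope size >= 2
    else:
        ok = all(a.scope.isdisjoint(b.scope)                     # decomposable
                 for i, a in enumerate(node.children)
                 for b in node.children[i + 1:])
    return ok and all(is_normal(c, tol) for c in node.children)
```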

SLIDE 5

Construct Bipartite Bayes Net

  • 1. Create an observable node for each observable variable
  • 2. Create a hidden node for each sum node
  • 3. For each variable in the scope of a sum node, add a directed edge from the hidden node associated with the sum node to the observable node associated with the variable (a sketch follows below)
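
A minimal sketch of the three steps, reusing the illustrative classes above (node names such as H_s and X_v are my own labels, not the lecture's notation):

```python
def bipartite_bn(spn_root):
    """Return the edge set of the bipartite BN: one hidden node per sum
    node, one observable node per variable (implicit in the X_v labels),
    and an edge H_s -> X_v for every variable v in the scope of sum node s."""
    edges, visited = set(), set()

    def walk(node):
        if id(node) in visited:
            return
        visited.add(id(node))
        if isinstance(node, Sum):                      # steps 2 and 3
            for v in node.scope:
                edges.add((f"H_{id(node)}", f"X_{v}"))
        for child in node.children:
            walk(child)

    walk(spn_root)
    return edges
```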

SLIDE 6

Construct Conditional Distributions

  • 1. Hidden node: its distribution is given by the weights of the corresponding sum node
  • 2. Observable node: construct the conditional distribution in the form of an algebraic decision diagram (ADD):
    a. Extract the sub-SPN of all nodes that contain the variable in their scope
    b. Remove the product nodes
    c. Replace each sum node by its corresponding hidden variable

SLIDE 7

Some Observations

  • Deep SPNs can be converted into shallow BNs.
  • The depth of an SPN is proportional to the height of the highest algebraic decision diagram in the corresponding BN.

SLIDE 8

Conversion Facts

Thm 1: Any complete and decomposable SPN S over variables X1, ..., Xn can be converted into a BN B with ADD representation in time O(n |S|). Furthermore, S and B represent the same distribution, and |B| = O(n |S|).

Thm 2: Given any BN B with ADD representation generated from a complete and decomposable SPN S over variables X1, ..., Xn, the original SPN can be recovered by applying the variable elimination algorithm to B in time O(n |S|).

SLIDE 9

Relationships

Probability distributions:

  • Compact: space is polynomial in # of variables
  • Tractable: inference time is polynomial in # of variables

SPN = BN (equally expressive)
Compact BN ⊃ Compact SPN = Tractable SPN = Tractable BN

SLIDE 10

Parameter Estimation

  • Maximum Likelihood Estimation
  • Online Bayesian Moment Matching

SLIDE 11

Maximum Log-Likelihood

  • Objective: maximize the log-likelihood of the data with respect to the SPN weights
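
The slide's equations did not survive extraction; a standard form of the SPN maximum-likelihood objective (an assumption based on the usual formulation, with weights w and data x_1, ..., x_N) is:

```latex
\max_{w}\; \sum_{n=1}^{N} \log \Pr(x_n \mid w)
\qquad \text{where} \qquad
\Pr(x \mid w) = \frac{S_w(x)}{S_w(\mathbf{1})}
```

Here S_w(x) is the value of the root on input x, and S_w(1), the root value with all indicators set to 1, is the normalization constant (equal to 1 in a normal SPN).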

SLIDE 12

Non-Convex Optimization

s.t. the weights of each sum node are non-negative and sum to 1

  • Approximations:
    – Projected gradient descent (PGD; a sketch follows below)
    – Exponential gradient (EG)
    – Sequential monomial approximation (SMA)
    – Convex-concave procedure (CCCP = EM)
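
As one concrete instance, a minimal sketch (not the lecture's code) of a projected gradient step on a single sum node's weight vector; `grad_loglik` is a hypothetical callback returning the gradient of the log-likelihood:

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1}
    (the sorting-based algorithm of Duchi et al., 2008)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

def pgd_step(w, grad_loglik, lr=0.1):
    """One projected gradient *ascent* step on the log-likelihood,
    followed by projection back onto the probability simplex."""
    return project_to_simplex(w + lr * grad_loglik(w))
```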

SLIDE 13

Summary

Algo        Update           Approximation
PGD         additive         linear
EG          multiplicative   linear
SMA         multiplicative   monomial
CCCP (EM)   multiplicative   concave lower bound

SLIDE 14

Results

SLIDE 15

Scalability

  • Online: process data sequentially, once only
  • Distributed: process subsets of data on different computers
  • Mini-batches: online PGD, online EG, online SMA, online EM
  • Problems: loss of information due to mini-batches, local optima, overfitting
  • Can we do better?

SLIDE 16

Thomas Bayes

SLIDE 17

Bayesian Learning

  • Bayes’ theorem (1764)
  • Broderick et al. (2013): facilitates

    – Online learning (streaming data)
    – Distributed computation

(Figure: data partitioned across core #1, core #2, core #3.)
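
In symbols (standard identities, not recovered from the slide), the posterior supports both modes:

```latex
% Online: the posterior after n-1 points is the prior for point n
P(\theta \mid d_{1:n}) \;\propto\; P(\theta \mid d_{1:n-1})\, P(d_n \mid \theta)

% Distributed: partial posteriors from K data shards recombine by multiplication
P(\theta \mid d_{1:n}) \;\propto\;
\frac{\prod_{k=1}^{K} P(\theta \mid d_{(k)})}{P(\theta)^{K-1}}
```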

SLIDE 18

Exact Bayesian Learning

  • Assume a normal SPN where the weights of each sum node form a discrete distribution.
  • Prior: a product of Dirichlets, one per sum node
  • Likelihood: the SPN's value at the observed data point
  • Posterior: a mixture of products of Dirichlets
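
Reconstructed in the notation of the Bayesian moment-matching formulation these slides follow (the slide's own equations did not extract), with w_s the weight vector of sum node s:

```latex
% Prior: one Dirichlet per sum node s
P(w) = \prod_{s} \mathrm{Dir}(w_s \mid \alpha_s)

% Likelihood of one observation: the root value of the normal SPN
P(x \mid w) = S_w(x)

% Posterior: prior times likelihood, which expands into a mixture of
% products of Dirichlets (one component per induced subnetwork)
P(w \mid x) \;\propto\; \Big(\prod_{s} \mathrm{Dir}(w_s \mid \alpha_s)\Big)\, S_w(x)
```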

SLIDE 19

Karl Pearson

SLIDE 20

Method of Moments (1894)

  • Estimate model parameters by matching a subset of moments (e.g., mean and variance)
  • Performance guarantees
    – Breakthrough: first provably consistent estimation algorithm for several mixture models

  • HMMs: Hsu, Kakade, Zhang (2008)
  • MoGs: Moitra, Valiant (2010), Belkin, Sinha (2010)
  • LDA: Anandkumar, Foster, Hsu, Kakade, Liu (2012)

SLIDE 21

Bayesian Moment Matching for Sum-Product Networks

  • Bayesian Learning + Method of Moments
  • Online, distributed and tractable algorithm for SPNs
  • Approximate the mixture of products of Dirichlets by a single product of Dirichlets that matches first- and second-order moments

SLIDE 22

Moments

  • Moment definition:
  • Dirichlet:
    – Moments:
    – Hyperparameters:
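
The formulas themselves did not survive extraction; the standard quantities, which match the moment-matching recipe on the following slides, are:

```latex
% j-th order moment of w_i under a distribution P
M_j(w_i) = \int w_i^{\,j}\, P(w)\, dw

% Dirichlet(w ; \alpha) moments, with \alpha_0 = \sum_k \alpha_k
M_1(w_i) = \frac{\alpha_i}{\alpha_0}
\qquad
M_2(w_i) = \frac{\alpha_i(\alpha_i + 1)}{\alpha_0(\alpha_0 + 1)}

% Hyperparameters recovered from matched moments
\alpha_0 = \frac{M_1(w_i) - M_2(w_i)}{M_2(w_i) - M_1(w_i)^2}
\qquad
\alpha_i = M_1(w_i)\,\alpha_0
```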
SLIDE 23

Moment Matching
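
The slide's worked example did not survive extraction; the following sketch (illustrative, using the formulas above) projects posterior moments onto a single Dirichlet:

```python
import numpy as np

def match_dirichlet(M1, M2):
    """Return Dirichlet hyperparameters alpha whose first moments equal M1,
    with the concentration alpha_0 chosen so the first coordinate's second
    moment matches M2[0].
    M1, M2: posterior moments E[w_i] and E[w_i^2]."""
    M1, M2 = np.asarray(M1, float), np.asarray(M2, float)
    alpha0 = (M1[0] - M2[0]) / (M2[0] - M1[0] ** 2)   # concentration
    return M1 * alpha0                                # alpha_i = M1_i * alpha_0

# Example: posterior moments of a two-valued weight vector
alpha = match_dirichlet([0.6, 0.4], [0.42, 0.22])     # -> [1.8, 1.2]
```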

SLIDE 24

Recursive moment computation

  • Compute the moments of the posterior after observing a data point by a single recursive pass over the SPN:
    – If the current node is a leaf, return the leaf value
    – Else if the current node is a product node, return the product of its children's values
    – Else if the current node is the sum node whose weight is being updated, return the weighted sum of its children's values with the moment of the target weight folded in
    – Else (any other sum node), return the weighted sum of its children's values

SLIDE 25

Results (benchmarks)

SLIDE 26

Results (Large Datasets)

(Tables report log-likelihood and training time in minutes.)

SLIDE 27

Sequence Data

  • How can we train an SPN with data sequences of varying length?
  • Examples
    – Sentence modeling: sequence of words
    – Activity recognition: sequence of measurements
    – Weather prediction: time-series data
  • Challenge: need a structure that adapts to the length of the sequence while keeping the # of parameters fixed

SLIDE 28

Dynamic SPN

  • Idea: stack template networks with identical structure and parameters (see the sketch below)
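
A toy sketch of the stacking idea, reusing the illustrative Leaf/Sum/Product classes from the earlier sketch (a real DSPN template also has interface nodes, omitted here); the key point is that every slice shares the same weight vector:

```python
shared_w = [0.5, 0.5]          # template parameters, shared by every slice

def template_step(prev_root, t):
    """One slice: a mixture of two univariate leaves over slice t's
    variable, multiplied into the network built so far."""
    obs = Sum([Leaf(f"X{t}"), Leaf(f"X{t}")], shared_w)  # complete: same scope
    return Product([prev_root, obs])                     # decomposable: new var

def build_dspn(T):
    """Unroll the template T times; the parameter count stays fixed."""
    root = Sum([Leaf("X0"), Leaf("X0")], shared_w)       # bottom network
    for t in range(1, T):
        root = template_step(root, t)
    return root                                          # (top network omitted)
```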

SLIDE 29

Definitions

  • Dynamic Sum-Product Network: a bottom network, a stack of template networks, and a top network
  • Bottom network: directed acyclic graph with indicator leaves and roots that interface with the network above
  • Top network: rooted directed acyclic graph with leaves that interface with the network below
  • Template network: directed acyclic graph with roots that interface with the network above, indicator leaves, and additional leaves that interface with the network below

SLIDE 30

Invariance

Let f be a bijective mapping that associates the input interface nodes of a template network with their corresponding output interface nodes.

Invariance: a template network over variables X is invariant when the scope of each input interface node excludes X and, for all pairs of interface nodes i and j, the following properties hold:

  • The scopes of i and j are either identical or disjoint, and scope(i) = scope(j) if and only if scope(f(i)) = scope(f(j))
  • All interior and output sum nodes are complete
  • All interior and output product nodes are decomposable

SLIDE 31

Completeness and Decomposability

Theorem 1: If

  • a. the bottom network is complete and decomposable,
  • b. the scopes of all pairs of output interface nodes of the bottom network are either identical or disjoint,
  • c. the scopes of the output interface nodes of the bottom network can be used to assign scopes to the input interface nodes of the template and top networks in such a way that the template network is invariant and the top network is complete and decomposable,

then the DSPN is complete and decomposable.

SLIDE 32

Structure Learning

Anytime search-and-score framework (see the sketch below)
  • Input: data, variables
  • Output: a DSPN structure
  • Repeat: generate neighbour structures and keep the best-scoring one
  • Until a stopping criterion is met
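
A minimal sketch of the anytime loop under these assumptions (`neighbours` and `score` are hypothetical callbacks, e.g. slide 34's neighbour move and held-out log-likelihood):

```python
def search_and_score(data, init, neighbours, score, max_iters=100):
    """Anytime hill climbing: the best structure found so far is always
    available, so the search can be stopped after any iteration."""
    best, best_score = init, score(init, data)
    for _ in range(max_iters):                  # stopping criterion
        improved = False
        for cand in neighbours(best):           # candidate structures
            s = score(cand, data)
            if s > best_score:                  # greedy improvement
                best, best_score, improved = cand, s, True
        if not improved:                        # local optimum: stop early
            break
    return best
```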

SLIDE 33

Initial Structure

  • Factorized model of univariate distributions

SLIDE 34

Neighbour generation

  • Replace the sub-SPN rooted at a product node by a product of Naïve Bayes models

SLIDE 35

Results

SLIDE 36

Results

SLIDE 37

Conclusion

  • Sum-Product Networks

    – Deep architecture with clear semantics
    – Tractable probabilistic graphical model

  • Future work

– Decision SPNs: M. Melibari and P. Doshi

  • Open problem:

– Thorough comparison of SPNs to other deep networks
