

  1. Sum-Product Networks
     CS486/686, University of Waterloo
     Lecture 23: July 19, 2017

  2. Outline
     • SPNs in more depth
       – Relationship to Bayesian networks
       – Parameter estimation
       – Online and distributed estimation
       – Dynamic SPNs for sequence data

  3. SPN → Bayes Net
     1. Normalize the SPN
     2. Create the structure
     3. Construct the conditional distributions

  4. Normal SPN
     An SPN is said to be normal when:
     1. It is complete and decomposable.
     2. All weights are non-negative and the weights of the edges emanating from each sum node sum to 1.
     3. Every terminal node is a univariate distribution and the scope of each sum node has size at least 2.
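To make the three conditions concrete, here is a minimal Python sketch; the Sum/Product/Leaf classes and the `is_normal` checker are my own toy representation, not from any SPN library:

```python
class Leaf:
    def __init__(self, var):
        self.var = var                  # univariate distribution over one variable
        self.scope = {var}

class Sum:
    def __init__(self, children, weights):
        self.children, self.weights = children, weights
        self.scope = set().union(*(c.scope for c in children))

class Product:
    def __init__(self, children):
        self.children = children
        self.scope = set().union(*(c.scope for c in children))

def is_normal(node):
    """Recursively check the three conditions of a normal SPN."""
    if isinstance(node, Leaf):
        return True                                                   # condition 3: univariate leaf
    if isinstance(node, Sum):
        complete = all(c.scope == node.scope for c in node.children)  # condition 1: completeness
        normalized = (all(w >= 0 for w in node.weights)
                      and abs(sum(node.weights) - 1.0) < 1e-9)        # condition 2: weights sum to 1
        big_enough = len(node.scope) >= 2                             # condition 3: scope size >= 2
        return complete and normalized and big_enough and all(map(is_normal, node.children))
    # product node: children must have pairwise disjoint scopes (decomposability)
    scopes = [c.scope for c in node.children]
    decomposable = all(s.isdisjoint(t) for i, s in enumerate(scopes) for t in scopes[i + 1:])
    return decomposable and all(map(is_normal, node.children))
```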

  5. Construct Bipartite Bayes Net
     1. Create an observable node for each observable variable.
     2. Create a hidden node for each sum node.
     3. For each variable in the scope of a sum node, add a directed edge from the hidden node associated with the sum node to the observable node associated with the variable.

  6. Construct Conditional Distributions
     1. Hidden node $H$: prior $\Pr(H = i) = w_i$, the weight of the $i$-th edge emanating from the corresponding sum node.
     2. Observable node $X$: construct the conditional distribution in the form of an algebraic decision diagram (ADD):
        a. Extract the sub-SPN of all nodes that contain $X$ in their scope.
        b. Remove the product nodes.
        c. Replace each sum node by its corresponding hidden variable.
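A rough sketch of the hidden-node part of the construction, reusing the toy Sum/Product/Leaf classes from the sketch above; the ADD construction for the observable nodes' CPDs (step 2) is omitted:

```python
def spn_to_bipartite_bn(root):
    """One hidden node per sum node, with Pr(H = i) = w_i and H -> X edges."""
    hidden, edges, seen = {}, [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        if isinstance(node, Sum):
            h = "H%d" % len(hidden)
            hidden[h] = list(node.weights)            # prior of H from the sum weights
            edges.extend((h, v) for v in node.scope)  # edge to each variable in scope
        for child in getattr(node, "children", []):
            visit(child)

    visit(root)
    return hidden, edges
```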

  7. Some Observations
     • Deep SPNs can be converted into shallow BNs.
     • The depth of an SPN is proportional to the height of the tallest algebraic decision diagram in the corresponding BN.

  8. Conversion Facts
     Thm 1: Any complete and decomposable SPN $S$ over variables $X_1, \dots, X_n$ can be converted into a BN $B$ with ADD representation in time $O(n|S|)$. Furthermore, $S$ and $B$ represent the same distribution, and $|B| = O(n|S|)$.
     Thm 2: Given a BN $B$ with ADD representation generated from a complete and decomposable SPN $S$ over variables $X_1, \dots, X_n$, the original SPN can be recovered by applying the variable elimination algorithm in time $O(n|S|)$.

  9. Relationships
     Probability distributions:
     • Compact: space is polynomial in the # of variables
     • Tractable: inference time is polynomial in the # of variables
     Relations:
     • SPN = BN
     • Compact BN = Compact SPN = Tractable SPN = Tractable BN

  10. Parameter Estimation
     • Maximum Likelihood Estimation
     • Online Bayesian Moment Matching

  11. Maximum Log-Likelihood
     • Objective: $\max_w \sum_n \log S_w(x^{(n)})$, where $S_w(x)$ is the value of the root node for input $x$ and $w$ is the set of sum-node weights.
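The objective only needs a bottom-up evaluation of the network. A minimal sketch, reusing the toy classes above and assuming each Leaf carries a hypothetical table `pmf` mapping a value of its variable to a probability:

```python
import math

def value(node, x):
    """Evaluate S_w(x) bottom-up; x is a dict mapping variables to values."""
    if isinstance(node, Leaf):
        return node.pmf[x[node.var]]    # univariate leaf distribution (assumed attribute)
    if isinstance(node, Sum):
        return sum(w * value(c, x) for w, c in zip(node.weights, node.children))
    return math.prod(value(c, x) for c in node.children)  # product node

def log_likelihood(root, data):
    # The quantity maximized by MLE: sum_n log S_w(x_n)
    return sum(math.log(value(root, x)) for x in data)
```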

  12. Non-Convex Optimization
     $\max_w \sum_n \log S_w(x^{(n)})$ s.t. $w \geq 0$ and $\sum_j w_{ij} = 1$ for each sum node $i$
     • Approximations:
       – Projected gradient descent (PGD)
       – Exponential gradient (EG)
       – Sequential monomial approximation (SMA)
       – Convex-concave procedure (CCCP = EM)

  13. Summary
     • PGD: additive update, linear approximation
     • EG: multiplicative update, linear approximation
     • SMA: multiplicative update, monomial approximation
     • CCCP (EM): multiplicative update, concave lower bound
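To illustrate the additive vs. multiplicative distinction, here is a hedged sketch of a single weight update for one sum node under PGD and EG; `grad` stands for the gradient of the log-likelihood w.r.t. this node's weights (obtained by backpropagation), and the clip-and-renormalize step is a crude stand-in for the exact simplex projection:

```python
import numpy as np

def pgd_step(w, grad, lr=0.1):
    w = w + lr * grad              # additive update
    w = np.maximum(w, 1e-12)       # keep weights positive ...
    return w / w.sum()             # ... and back on the probability simplex

def eg_step(w, grad, lr=0.1):
    w = w * np.exp(lr * grad)      # multiplicative update
    return w / w.sum()             # renormalize; positivity is automatic
```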

  14. Results

  15. Scalability
     • Online: process data sequentially, once only
     • Distributed: process subsets of data on different computers
     • Mini-batches: online PGD, online EG, online SMA, online EM
     • Problems: loss of information due to mini-batches, local optima, overfitting
     • Can we do better?

  16. Thomas Bayes

  17. Bayesian Learning
     • Bayes' theorem (1764)
     • Broderick et al. (2013): facilitates
       – Online learning (streaming data)
       – Distributed computation (data partitioned across cores; see the equations below)
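A short way to see why Bayes' theorem supports both modes (my rendering, not the slide's exact equations): applying it recursively handles streaming data, and per-core posteriors on independent data shards can be recombined afterwards.

```latex
% Streaming: update the posterior one observation at a time
\Pr(\theta \mid x_{1:n}) \;\propto\; \Pr(\theta \mid x_{1:n-1}) \, \Pr(x_n \mid \theta)

% Distributed: combine posteriors computed independently on data shards D_k
\Pr(\theta \mid D) \;\propto\; \Pr(\theta) \prod_k \frac{\Pr(\theta \mid D_k)}{\Pr(\theta)}
```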

  18. Exact Bayesian Learning
     • Assume a normal SPN where the weights of each sum node form a discrete distribution.
     • Prior: $\Pr(w) = \prod_i \mathrm{Dir}(w_i \mid \alpha_i)$, a product of Dirichlets with one factor per sum node $i$.
     • Likelihood: $\Pr(x \mid w) = S_w(x)$.
     • Posterior: $\Pr(w \mid x) \propto S_w(x) \prod_i \mathrm{Dir}(w_i \mid \alpha_i)$, a mixture of products of Dirichlets whose number of components grows with each observation.
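One way to see why the posterior becomes a mixture (my rendering, with $c$ ranging over the induced trees of the SPN, i.e., the monomials of the network polynomial):

```latex
% The likelihood S_w(x) is a polynomial in the weights, so multiplying it
% into a product-of-Dirichlets prior yields a mixture with one component
% per monomial (induced tree) c:
\Pr(w \mid x) \;\propto\;
\underbrace{\sum_c \Big(\prod_{(i,j) \in c} w_{ij}\Big)}_{S_w(x)}
\;\prod_i \mathrm{Dir}(w_i \mid \alpha_i)
```

Each monomial times the Dirichlet prior is again proportional to a product of Dirichlets (with incremented counts), so the exact posterior has one Dirichlet-product component per induced tree.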

  19. Karl Pearson

  20. Method of Moments (1894)
     • Estimate model parameters by matching a subset of moments (e.g., mean and variance)
     • Performance guarantees
       – Breakthrough: first provably consistent estimation algorithms for several mixture models
         • HMMs: Hsu, Kakade, Zhang (2008)
         • MoGs: Moitra, Valiant (2010); Belkin, Sinha (2010)
         • LDA: Anandkumar, Foster, Hsu, Kakade, Liu (2012)

  21. Bayesian Moment Matching for Sum-Product Networks
     Bayesian Learning + Method of Moments → an online, distributed and tractable learning algorithm for SPNs
     Key idea: approximate the mixture of products of Dirichlets by a single product of Dirichlets that matches the first- and second-order moments.

  22. Moments
     • Moment definition: $M_j = E[w_j] = \int w_j \Pr(w)\, dw$
     • Dirichlet: $\mathrm{Dir}(w \mid \alpha) \propto \prod_i w_i^{\alpha_i - 1}$
       – Moments: $E[w_i] = \frac{\alpha_i}{\sum_j \alpha_j}$ and $E[w_i^2] = \frac{\alpha_i(\alpha_i + 1)}{(\sum_j \alpha_j)(\sum_j \alpha_j + 1)}$
       – Hyperparameters: $\alpha_i = E[w_i]\,\frac{E[w_i] - E[w_i^2]}{E[w_i^2] - E[w_i]^2}$
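These formulas are easy to check numerically; a small sketch (function names are mine):

```python
import numpy as np

def dirichlet_moments(alpha):
    """E[w_i] = a_i / A and E[w_i^2] = a_i (a_i + 1) / (A (A + 1)), A = sum_j a_j."""
    A = alpha.sum()
    m1 = alpha / A
    m2 = alpha * (alpha + 1) / (A * (A + 1))
    return m1, m2

def dirichlet_from_moments(m1, m2, i=0):
    # Invert the moment formulas: A = (m1_i - m2_i) / (m2_i - m1_i^2), alpha = m1 * A
    A = (m1[i] - m2[i]) / (m2[i] - m1[i] ** 2)
    return m1 * A

# Round trip: Dir(2, 3) -> moments -> Dir(2, 3)
assert np.allclose(dirichlet_from_moments(*dirichlet_moments(np.array([2.0, 3.0]))),
                   [2.0, 3.0])
```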

  23. Moment Matching

  24. Recursive moment computation
     Compute the moments of the posterior after observing $x$ in a single bottom-up pass:
     • If the node is a leaf, return its value at $x$.
     • If the node is a product, return the product of its children's values.
     • If the node is a sum, return the weighted sum of its children's values, using the current expected weights and updating the corresponding Dirichlet counts.
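As an illustration, here is a hedged sketch of one Bayesian moment-matching update for a single sum node viewed as a mixture; it assumes the component likelihoods $p_k(x)$ are given, reuses the Dirichlet helpers from the moments sketch above, and simplifies away the recursion over a full SPN:

```python
import numpy as np

def bmm_step(alpha, comp_lik):
    """One observation: prior Dir(alpha) over the sum node's weights,
    comp_lik[k] = Pr(x | child k). The exact posterior is a mixture of
    Dirichlets Dir(alpha + e_k); return a single moment-matched Dirichlet."""
    A = alpha.sum()
    c = comp_lik * alpha / A          # mixture coefficients: Pr(x|k) * E[w_k]
    c = c / c.sum()
    m1 = np.zeros_like(alpha)
    m2 = np.zeros_like(alpha)
    for k in range(len(alpha)):
        a_k = alpha.copy()
        a_k[k] += 1.0                 # component k of the posterior: Dir(alpha + e_k)
        mk1, mk2 = dirichlet_moments(a_k)
        m1 += c[k] * mk1              # moments of a mixture are convex combinations
        m2 += c[k] * mk2
    return dirichlet_from_moments(m1, m2)
```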

  25. Results (benchmarks)

  26. Results (Large Datasets)
     • Log-likelihood
     • Time (minutes)

  27. Sequence Data
     • How can we train an SPN with data sequences of varying length?
     • Examples:
       – Sentence modeling: sequence of words
       – Activity recognition: sequence of measurements
       – Weather prediction: time-series data
     • Challenge: we need a structure that adapts to the length of the sequence while keeping the # of parameters fixed.

  28. Dynamic SPN
     • Idea: stack template networks with identical structure and parameters (a minimal sketch follows).
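A purely conceptual sketch of the stacking; the bottom/template/top objects and their `instantiate`/`outputs` methods are hypothetical, not an existing API:

```python
def build_dspn(bottom, template, top, T):
    """Unroll a DSPN for a length-T sequence: every copy of the template
    shares the same structure and parameters, so the number of parameters
    stays fixed regardless of T."""
    interface = bottom.outputs()                       # output interface nodes
    for _ in range(T):
        interface = template.instantiate(inputs=interface).outputs()
    return top.instantiate(inputs=interface)           # root of the unrolled SPN
```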

  29. Definitions
     • Dynamic Sum-Product Network (DSPN): a bottom network, a stack of template networks, and a top network.
     • Bottom network: a directed acyclic graph with indicator leaves and roots that interface with the network above.
     • Top network: a rooted directed acyclic graph with leaves that interface with the network below.
     • Template network: a directed acyclic graph with roots that interface with the network above, indicator leaves, and additional leaves that interface with the network below.

  30. Invariance
     Let $f$ be a bijective mapping that associates the input interface nodes of a template network with the corresponding output interface nodes.
     Invariance: a template network over $X_1, \dots, X_n$ is invariant when the scope of each input interface node excludes $X_1, \dots, X_n$ and, for all pairs of interface nodes $i$ and $j$, the following properties hold:
     • scope$(i)$ = scope$(j)$ or scope$(i) \cap$ scope$(j) = \emptyset$
     • scope$(i)$ = scope$(j)$ $\Leftrightarrow$ scope$(f(i))$ = scope$(f(j))$
     • All interior and output sum nodes are complete
     • All interior and output product nodes are decomposable

  31. Completeness and Decomposability
     Theorem 1: If
     a. the bottom network is complete and decomposable,
     b. the scopes of all pairs of output interface nodes of the bottom network are either identical or disjoint,
     c. the scopes of the output interface nodes of the bottom network can be used to assign scopes to the input interface nodes of the template and top networks in such a way that the template network is invariant and the top network is complete and decomposable,
     then the DSPN is complete and decomposable.

  32. Structure Learning
     Anytime search-and-score framework (a sketch follows)
     • Input: data, variables
     • Output: template network structure
     • Repeat: generate a neighbouring structure, score it, and keep it if it improves the score
     • Until: a stopping criterion is met
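A minimal sketch of the loop, assuming hypothetical helpers: `score` (e.g., validation log-likelihood after parameter estimation) and `neighbours` (e.g., the product-of-naive-Bayes replacement on the next slide):

```python
import random

def search_and_score(data, initial, score, neighbours, budget=100):
    """Anytime hill climbing over template-network structures."""
    best, best_score = initial, score(initial, data)
    for _ in range(budget):                     # anytime: stop whenever time is up
        cand = random.choice(neighbours(best))  # propose a modified structure
        s = score(cand, data)
        if s > best_score:                      # keep only improvements
            best, best_score = cand, s
    return best
```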

  33. Initial Structure
     • Factorized model of univariate distributions

  34. Neighbour generation
     • Replace the sub-SPN rooted at a product node by a product of naïve Bayes models

  35. Results

  36. Results

  37. Conclusion
     • Sum-Product Networks
       – Deep architecture with clear semantics
       – Tractable probabilistic graphical model
     • Future work
       – Decision SPNs: M. Melibari and P. Doshi
     • Open problem
       – Thorough comparison of SPNs to other deep networks
