Optimal Algorithms for Learning Bayesian Network Structures

  1. Optimal Algorithms for Learning Bayesian Network Structures: Introduction and Heuristic Search
  Changhe Yuan
  UAI 2015 Tutorial, Sunday, July 12th, 8:30–10:20am
  http://auai.org/uai2015/tutorialsDetails.shtml#tutorial_1

  2. About tutorial presenters
  • Dr. Changhe Yuan (Part I) – Associate Professor of Computer Science at Queens College/City University of New York; Director of the Uncertainty Reasoning Laboratory (URL Lab)
  • Dr. James Cussens (Part II) – Senior Lecturer in the Department of Computer Science at the University of York, UK
  • Dr. Brandon Malone (Parts I and II) – Postdoctoral researcher at the Max Planck Institute for Biology of Ageing

  3. Bayesian networks
  • A Bayesian network is a directed acyclic graph (DAG) in which:
  – A set of random variables makes up the nodes of the network.
  – A set of directed links (arrows) connects pairs of nodes.
  – Each node has a conditional probability table (CPT) that quantifies the effects the parents have on the node.
  [Figure: example network with tables P(B), P(E), P(A|B,E), P(R|E), P(N|A)]
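To make the definition concrete, here is a minimal sketch (not from the tutorial) of a Bayesian network as a data structure: a parent map defining the DAG plus one CPT per node, using the burglary/earthquake network from the slide's figure. All probability values are illustrative placeholders.

```python
# Five binary variables: Burglary (B), Earthquake (E), Alarm (A),
# Radio report (R), Neighbor call (N). The DAG is given by each
# node's parent tuple; the numbers below are placeholders only.
parents = {
    "B": (), "E": (),
    "A": ("B", "E"),    # P(A | B, E)
    "R": ("E",),        # P(R | E)
    "N": ("A",),        # P(N | A)
}

# Each CPT maps an assignment of the parents (in the order listed
# above) to P(node = True | parents).
cpt = {
    "B": {(): 0.01},
    "E": {(): 0.02},
    "A": {(True, True): 0.95, (True, False): 0.94,
          (False, True): 0.29, (False, False): 0.001},
    "R": {(True,): 0.90, (False,): 0.01},
    "N": {(True,): 0.70, (False,): 0.05},
}
```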

  4. Learning Bayesian networks
  • Very often we have data sets.
  • From these data we can learn both the structure of a Bayesian network and its numerical parameters.

  5. Major learning approaches
  • Score-based structure learning
  – Find the highest-scoring network structure
  » Optimal algorithms (FOCUS of TUTORIAL)
  » Approximation algorithms
  • Constraint-based structure learning
  – Find a network that best explains the dependencies and independencies in the data
  • Hybrid approaches
  – Integrate constraint- and/or score-based structure learning
  • Bayesian model averaging
  – Average the predictions of all possible structures

  6. Score-based learning
  • Find a Bayesian network that optimizes a given scoring function.
  • Two major issues:
  – How to define a scoring function?
  – How to formulate and solve the optimization problem?

  7. Scoring functions
  • Bayesian Dirichlet family (BD)
  – K2
  • Minimum Description Length (MDL)
  • Factorized Normalized Maximum Likelihood (fNML)
  • Akaike’s Information Criterion (AIC)
  • Mutual information tests (MIT)
  • Etc.

  8. Decomposability
  • All of these scoring functions (e.g., BDeu, MDL, fNML) can be expressed as a sum over the individual variables:
    Score(G) = Σ_i Score(X_i | PA_i)
  • This property is called decomposability and will be quite important for structure learning. [Heckerman 1995, etc.]
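A decomposable score can therefore be computed one variable at a time. A minimal sketch, where local_score is a hypothetical stand-in for a real local scoring function (BDeu, MDL, fNML, ...):

```python
def network_score(structure, local_score):
    """Score of a whole network under a decomposable scoring function.

    structure: dict mapping each variable to its parent set (frozenset).
    local_score: placeholder for a real local score (BDeu, MDL, fNML, ...).
    """
    # Decomposability: the network score is just the sum of local scores.
    return sum(local_score(x, pa) for x, pa in structure.items())
```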

  9. Querying best parents
  • Given a variable X and a candidate set U, we want BestScore(X, U): the best score over all parent sets for X drawn from U.
  • Naive solution: search through all of the subsets of U and find the best.
  • Better solution: propagate optimal scores bottom-up and store them as a hash table.
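A minimal sketch of the propagation idea, again assuming a hypothetical local_score function and treating scores as costs to minimize: BestScore(var, U) is computed for every subset U of the candidate parents, bottom-up over the subset lattice, and stored in a hash table for constant-time queries.

```python
from itertools import combinations

def propagate_best_scores(var, candidates, local_score):
    """Compute BestScore(var, U) for every U of the candidate parents.

    local_score is a placeholder for a real scoring function; scores are
    treated as costs, so lower is better.
    """
    best = {}
    for size in range(len(candidates) + 1):
        for combo in combinations(sorted(candidates), size):
            U = frozenset(combo)
            score = local_score(var, U)     # score of using U itself...
            for x in U:                     # ...or the best proper subset
                score = min(score, best[U - {x}])
            best[U] = score                 # hash table: O(1) lookups later
    return best
```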

  10. Score pruning
  • Theorem: Say PA_i ⊂ PA'_i and Score(X_i | PA_i) ≤ Score(X_i | PA'_i). Then PA'_i is not optimal for X_i. (Scores are treated as costs, so lower is better.)
  • Ways of pruning:
  – Compare Score(X_i | PA_i) and Score(X_i | PA'_i) directly.
  – Use properties of scoring functions without computing scores (e.g., exponential pruning).
  • After pruning, each variable has a list of possibly optimal parent sets (POPS); the scores of all POPS are called local scores. [Teyssier and Koller 2005, de Campos and Ji 2011, Tian 2000]
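A direct (if naive, quadratic-time) application of the theorem, assuming a dict of candidate parent sets and their local scores, with scores again treated as costs:

```python
def possibly_optimal_parent_sets(scores):
    """Prune parent sets dominated by a subset that scores at least as well.

    scores: dict mapping frozenset parent sets to local scores (costs).
    Returns the POPS: sets with no proper subset scoring <= their score.
    """
    return {
        pa: s for pa, s in scores.items()
        if not any(sub < pa and s2 <= s       # sub ⊂ pa dominates pa
                   for sub, s2 in scores.items())
    }
```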

  11. Number of POPS
  [Figure: log-scale chart (1 to 10^10) of the number of parent sets and their scores stored in the full parent graphs (“Full”), the largest layer of the parent graphs in memory-efficient dynamic programming (“Largest Layer”), and the possibly optimal parent sets (“Sparse”)]

  12. Practicalities
  • Empirically, the sparse AD-tree data structure is the best approach for collecting sufficient statistics.
  • A breadth-first score calculation strategy maximizes the efficiency of exponential pruning.
  • Caching significantly reduces runtime.
  • Local score calculations are easily parallelizable (see the sketch below).
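For the last point, a minimal parallelization sketch: the local scores of one variable do not depend on any other variable's scores, so each variable can be scored in its own worker process. local_score is again a hypothetical stand-in for a real scoring function.

```python
from concurrent.futures import ProcessPoolExecutor

def local_score(var, parent_set, data):
    raise NotImplementedError  # stand-in for BDeu, MDL, fNML, ...

def score_one_variable(var, candidate_sets, data):
    # Independent of every other variable, so safe to run in parallel.
    return var, {pa: local_score(var, pa, data) for pa in candidate_sets}

def score_all_variables(candidates, data):
    """candidates: dict mapping each variable to its candidate parent sets."""
    with ProcessPoolExecutor() as pool:
        futures = [pool.submit(score_one_variable, v, sets, data)
                   for v, sets in candidates.items()]
        return dict(f.result() for f in futures)
```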

  13. Graph search formulation
  • Formulate the learning task as a shortest-path problem.
  – The shortest-path solution to the graph search problem corresponds to an optimal Bayesian network. [Yuan, Malone, Wu, IJCAI-11]

  14. Search graph (order graph)
  Formulation:
  • Search space: variable subsets
  • Start node: the empty set ϕ
  • Goal node: the complete set
  • Edges: add one variable
  • Edge cost: BestScore(X, U) for the edge U → U ∪ {X}
  [Figure: the order graph over four variables, from ϕ down to {1,2,3,4}]
  [Yuan, Malone, Wu, IJCAI-11]

  15. Search graph (order graph)
  • Formulation: as on the previous slide.
  • Task: find the shortest path between the start and goal nodes.
  [Figure: the order graph with a highlighted path ϕ → {1} → {1,3} → {1,3,4} → {1,2,3,4}, i.e., the variable ordering 1, 3, 4, 2]
  [Yuan, Malone, Wu, IJCAI-11]
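In code, the order graph never needs to be built explicitly; successors and edge costs can be generated on demand. A minimal sketch, assuming best_score[x][U] is the BestScore(x, U) hash table built by the propagation sketch above:

```python
def successors(U, variables, best_score):
    """Yield (V, edge_cost) for every order-graph edge U -> V = U ∪ {x}.

    The edge cost is BestScore(x, U): the best local score x can achieve
    using parents drawn only from U.
    """
    for x in variables - U:
        yield U | {x}, best_score[x][frozenset(U)]
```

The shortest path from ϕ to the complete set then yields an optimal variable ordering, and the parent sets chosen along its edges form an optimal network.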

  16. A* algorithm
  A* search expands nodes in order of quality: f = g + h, where
  • g(U) = Score(U), the cost of the best path found from ϕ to U
  • h(U) = estimated distance from U to the goal
  Notation in the figures: each node is labeled with its g-cost and h-cost; red shape-outlined nodes are open, nodes with no outline are closed.
  [Figure: the order graph with the start node ϕ expanded (g = 0, h = 10)]
  [Yuan, Malone, Wu, IJCAI-11]

  17.–22. A* algorithm (continued)
  [Figures: step-by-step A* expansion of the order graph, showing the g/h values of open and closed nodes at each step, until the goal node {1,2,3,4} is reached with g = 15, h = 0]
  [Yuan, Malone, Wu, IJCAI-11]
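A minimal A* sketch over the order graph, where successors is assumed to be a one-argument closure (e.g., via functools.partial) over the generator above, and h is any admissible heuristic such as the simple heuristic on the next slide. A tie-breaking counter keeps the heap comparisons well defined when f-values are equal.

```python
import heapq
from itertools import count

def astar(variables, successors, h):
    """Return the cost of the optimal network (shortest path ϕ -> V)."""
    start, goal = frozenset(), frozenset(variables)
    tie = count()
    open_list = [(h(start), next(tie), start)]   # ordered by f = g + h
    g = {start: 0.0}
    closed = set()
    while open_list:
        _, _, U = heapq.heappop(open_list)
        if U == goal:
            return g[U]
        if U in closed:                          # stale queue entry
            continue
        closed.add(U)
        for V, cost in successors(U):
            new_g = g[U] + cost
            if new_g < g.get(V, float("inf")):
                g[V] = new_g
                heapq.heappush(open_list, (new_g + h(V), next(tie), V))
```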

  23. Simple heuristic
  A* search expands nodes in order of quality: f = g + h, with
  • g(U) = Score(U)
  • h(U) = Σ_{X ∈ V \ U} BestScore(X, V \ {X})
  Example: h({1,3}) = BestScore(2, {1,3,4}) + BestScore(4, {1,2,3})
  [Yuan, Malone, Wu, IJCAI-11]
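A minimal sketch of this heuristic, reusing the best_score hash table from the propagation sketch: every variable still outside U gets its best possible parents from among all other variables, ignoring acyclicity, which is why the estimate can only be optimistic.

```python
def simple_h(U, variables, best_score):
    """Relaxed cost-to-go: each X outside U may pick parents from V \\ {X}."""
    return sum(best_score[x][frozenset(variables - {x})]
               for x in variables - U)
```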

  24. Properties of the simple heuristic
  • Theorem: the simple heuristic function h is admissible.
  – Optimistic estimation: it never overestimates the true distance to the goal.
  – Admissibility guarantees the optimality of A*.
  • Theorem: h is also consistent.
  – It satisfies the triangle inequality, yielding a monotonic heuristic.
  – Consistency implies admissibility.
  – Consistency also guarantees that the g-cost of any node is optimal at the time the node is expanded.
  [Yuan, Malone, Wu, IJCAI-11]
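In symbols, with h*(U) the true remaining cost from U and c(U, V) the cost of edge U → V, the two properties read (standard A* definitions, stated here for completeness):

```latex
\text{admissible:} \quad h(U) \le h^{*}(U) \quad \text{for all nodes } U
\qquad
\text{consistent:} \quad h(U) \le c(U, V) + h(V) \quad \text{for every edge } U \to V
```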

  25.–26. BFBnB algorithm
  Breadth-first branch and bound search (BFBnB):
  • Motivation: the order and parent graphs have exponential size.
  • Observation: the order graph has a natural layered structure (layer k contains the subsets of size k).
  • Solution: search one layer at a time.
  [Figure: the order graph partitioned into layers by subset size]
  [Malone, Yuan, Hansen, UAI-11]

  27. BFBnB algorithm (continued)
  [Figure: layer-by-layer expansion of the order graph]
  [Malone, Yuan, Hansen, UAI-11]
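A minimal sketch of the layer-by-layer idea, reusing the successors and h sketches above and pruning with an upper bound assumed to be obtained in advance (e.g., from a greedy search). Only two layers of the order graph are ever in memory at once, which is the point of BFBnB.

```python
def bfbnb(variables, successors, h, upper_bound):
    """Expand the order graph layer by layer; return the optimal cost."""
    goal = frozenset(variables)
    layer = {frozenset(): 0.0}                   # layer k: subsets of size k
    for _ in range(len(variables)):
        next_layer = {}
        for U, g in layer.items():
            for V, cost in successors(U):
                new_g = g + cost
                if new_g + h(V) > upper_bound:   # branch-and-bound pruning
                    continue
                if new_g < next_layer.get(V, float("inf")):
                    next_layer[V] = new_g
        layer = next_layer                       # previous layer can be dropped
    # A valid upper bound never prunes the optimal path, so goal is present.
    return layer.get(goal)
```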
