
Optimal Algorithms for Learning Bayesian Network Structures: Introduction and Heuristic Search. Changhe Yuan. UAI 2015 Tutorial, Sunday, July 12th, 8:30-10:20am. http://auai.org/uai2015/tutorialsDetails.shtml#tutorial_1 1/56

About tutorial presenters • Dr. Changhe Yuan (Part I) – Associate Professor of Computer Science at Queens College/City University of New York – Director of the Uncertainty Reasoning Laboratory (URL Lab). • Dr. James Cussens (Part II) – Senior Lecturer in the Dept of Computer Science at the University of York, UK • Dr. Brandon Malone (Part I and II) – Postdoctoral researcher at the Max Planck Institute for Biology of Ageing 2/56

Bayesian networks • A Bayesian network is a directed acyclic graph (DAG) in which: – A set of random variables makes up the nodes in the network. – A set of directed links or arrows connects pairs of nodes. – Each node has a conditional probability table that quantifies the effects the parents have on the node. [Figure: example network with conditional probability tables P(B), P(E), P(A|B,E), P(R|E), P(N|A)] 3/56
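The definition above implies that the joint distribution factorizes into one conditional probability per node. The following is a minimal sketch of that factorization, assuming the parent sets suggested by the table labels on the slide (B and E have no parents, A has parents B and E, R has parent E, N has parent A). All probability values are made up for illustration and do not come from the tutorial.

```python
from itertools import product

# Parent sets read off the CPT labels P(B), P(E), P(A|B,E), P(R|E), P(N|A).
parents = {"B": (), "E": (), "A": ("B", "E"), "R": ("E",), "N": ("A",)}

# Hypothetical CPT entries: P(X = True | parent values). Numbers are invented.
cpt = {
    "B": {(): 0.01},
    "E": {(): 0.02},
    "A": {(True, True): 0.95, (True, False): 0.94,
          (False, True): 0.29, (False, False): 0.001},
    "R": {(True,): 0.9, (False,): 0.0001},
    "N": {(True,): 0.9, (False,): 0.05},
}


def joint_probability(assignment):
    """Multiply one CPT entry per node: P(x) = prod_i P(x_i | parents(x_i))."""
    p = 1.0
    for var, pa in parents.items():
        p_true = cpt[var][tuple(assignment[q] for q in pa)]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p


# Summing the factorized joint over all 2^5 assignments should give 1.
total = sum(
    joint_probability(dict(zip(parents, values)))
    for values in product([True, False], repeat=len(parents))
)
```

Because every table is normalized per parent configuration, the factorized joint is itself a valid distribution, which is what the sum at the end checks.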

Learning Bayesian networks • Very often we have data sets • We can learn Bayesian networks from these data [Figure: data mapped to a Bayesian network, labeled "structure" and "numerical parameters"] 4/56

Major learning approaches • Score-based structure learning – Find the highest-scoring network structure » Optimal algorithms (FOCUS of TUTORIAL) » Approximation algorithms • Constraint-based structure learning – Find a network that best explains the dependencies and independencies in the data • Hybrid approaches – Integrate constraint- and/or score-based structure learning • Bayesian model averaging – Average the prediction of all possible structures 5/56

Score-based learning • Find a Bayesian network that optimizes a given scoring function • Two major issues – How to define a scoring function? – How to formulate and solve the optimization problem? 6/56

Scoring functions • Bayesian Dirichlet Family (BD) – K2 • Minimum Description Length (MDL) • Factorized Normalized Maximum Likelihood (fNML) • Akaike’s Information Criterion (AIC) • Mutual information tests (MIT) • Etc. 7/56

Decomposability • All of these are expressed as a sum over the individual variables, e.g., BDeu, MDL, and fNML [the formulas shown on the slide are not preserved in this transcript] • This property is called decomposability and will be quite important for structure learning. [Heckerman 1995, etc.] 8/56
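To make decomposability concrete, here is a hedged sketch of a network score computed as a sum of per-variable local scores. It assumes the commonly used MDL form (negative log-likelihood plus a (log N)/2 penalty per free parameter); the slide's own formulas are not available in this transcript, so the function names, the data layout (a list of dicts), and the exact penalty form are assumptions.

```python
import math
from collections import Counter


def mdl_local_score(data, child, parent_set, arity):
    """Assumed MDL cost of `child` given `parent_set` (lower is better)."""
    n = len(data)
    # Counts of (parent configuration, child value) and of parent configuration.
    joint = Counter((tuple(row[p] for p in parent_set), row[child]) for row in data)
    marginal = Counter(tuple(row[p] for p in parent_set) for row in data)
    neg_loglik = -sum(c * math.log(c / marginal[pa]) for (pa, _), c in joint.items())
    # Free parameters: (r_child - 1) for every parent configuration.
    n_params = (arity[child] - 1) * math.prod(arity[p] for p in parent_set)
    return neg_loglik + 0.5 * math.log(n) * n_params


def network_score(data, parents, arity):
    """Decomposability: the total score is a sum over individual variables."""
    return sum(mdl_local_score(data, x, pa, arity) for x, pa in parents.items())


# Tiny made-up data set in which Y always copies X.
data = [{"X": 0, "Y": 0}, {"X": 0, "Y": 0}, {"X": 1, "Y": 1}, {"X": 1, "Y": 1}]
arity = {"X": 2, "Y": 2}
score_empty = network_score(data, {"X": (), "Y": ()}, arity)
score_edge = network_score(data, {"X": (), "Y": ("X",)}, arity)
```

Since each term depends only on one variable and its parents, changing the parents of Y leaves the term for X untouched, which is the property the later search algorithms rely on.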

Querying best parents • e.g., [example query not preserved in this transcript] • Naive solution: Search through all of the subsets and find the best. • Solution: Propagate optimal scores and store them in a hash table. 9/56
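The propagation idea can be sketched as follows, under the assumption that BestScore(X, U) means the minimum local score of X over all parent sets drawn from U. The best score for U is then the better of the score of U itself and the best scores already stored for the subsets of U with one element removed. The function name and the toy scores are hypothetical.

```python
from itertools import combinations


def best_score_table(candidates, local_score):
    """Map every subset U of `candidates` to min over PA subset of U of local_score(PA)."""
    best = {}
    ordered = sorted(candidates)
    # Process subsets by increasing size so smaller subsets are already stored.
    for size in range(len(ordered) + 1):
        for subset in combinations(ordered, size):
            u = frozenset(subset)
            score = local_score(u)
            for y in u:
                # Propagate the optimum from each subset with one element removed.
                score = min(score, best[u - {y}])
            best[u] = score
    return best


# Hypothetical local scores for one variable X with candidate parents {1, 2}.
toy_scores = {
    frozenset(): 10.0,
    frozenset({1}): 8.0,
    frozenset({2}): 9.0,
    frozenset({1, 2}): 9.5,
}
table = best_score_table({1, 2}, toy_scores.__getitem__)
```

After the table is built, each query is a single dictionary (hash table) lookup instead of a search over all subsets.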

Score pruning • Theorem: Say $PA_i \subset PA'_i$ and $\mathrm{Score}(X_i \mid PA_i) < \mathrm{Score}(X_i \mid PA'_i)$ (scores are treated as costs, so lower is better). Then $PA'_i$ is not optimal for $X_i$. • Ways of pruning: – Compare $\mathrm{Score}(X_i \mid PA_i)$ and $\mathrm{Score}(X_i \mid PA'_i)$ – Use properties of scoring functions without computing scores (e.g., exponential pruning) • After pruning, each variable has a list of possibly optimal parent sets (POPS) – The scores of all POPS are called local scores [Table on slide: POPS(X_1 | PA(X_1))] [Teyssier and Koller 2005, de Campos and Ji 2011, Tian 2000] 10/56
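The first way of pruning, comparing scores directly, might look like the sketch below. It assumes scores are costs to be minimized and that a parent set is discarded whenever some proper subset has a strictly lower score. The function name and toy scores are hypothetical, and the quadratic scan is chosen for clarity rather than speed.

```python
def possibly_optimal_parent_sets(local_scores):
    """Drop every parent set that has a proper subset with a strictly lower score."""
    pops = {}
    for pa, score in local_scores.items():
        dominated = any(
            other < pa and other_score < score  # `<` on frozensets is proper subset
            for other, other_score in local_scores.items()
        )
        if not dominated:
            pops[pa] = score
    return pops


# Hypothetical local scores for one variable with candidate parents {1, 2}.
toy_scores = {
    frozenset(): 10.0,
    frozenset({1}): 8.0,
    frozenset({2}): 11.0,      # worse than the empty set, so it is pruned
    frozenset({1, 2}): 9.0,    # worse than {1}, so it is pruned
}
pops = possibly_optimal_parent_sets(toy_scores)
```

Only the surviving entries need to be stored, which is why the sparse representation can be much smaller than the full parent graph.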

Number of POPS [Figure: bar chart on a logarithmic axis from 1.00E+00 to 1.00E+10, axis label "Optimal Parent Sets", series "Full", "Largest Layer", and "Sparse"] The number of parent sets and their scores stored in the full parent graphs ("Full"), the largest layer of the parent graphs in memory-efficient dynamic programming ("Largest Layer"), and the possibly optimal parent sets ("Sparse"). 11/56

Practicalities • Empirically, the sparse AD-tree data structure is the best approach for collecting sufficient statistics. • A breadth-first score calculation strategy maximizes the efficiency of exponential pruning. • Caching significantly reduces runtime. • Local score calculations are easily parallelizable. 12/56

Graph search formulation • Formulate the learning task as a shortest path problem – The shortest path solution to a graph search problem corresponds to an optimal Bayesian network [Yuan, Malone, Wu, IJCAI-11] 13/56

Search graph (Order graph) • Formulation: – Search space: Variable subsets – Start node: Empty set (ϕ) – Goal node: Complete set – Edges: Add variable – Edge cost: BestScore(X, U) for edge $U \to U \cup \{X\}$ [Figure: order graph over variables 1 to 4 with layers ϕ; {1}, {2}, {3}, {4}; {1,2}, {1,3}, {2,3}, {1,4}, {2,4}, {3,4}; {1,2,3}, {1,2,4}, {1,3,4}, {2,3,4}; {1,2,3,4}] [Yuan, Malone, Wu, IJCAI-11] 14/56

Search graph (Order graph) • Formulation: – Search space: Variable subsets – Start node: Empty set (ϕ) – Goal node: Complete set – Edges: Add variable – Edge cost: BestScore(X, U) for edge $U \to U \cup \{X\}$ • Task: find the shortest path between start and goal nodes [Figure: order graph over variables 1 to 4, from ϕ to {1,2,3,4}; the slide also shows the variable ordering 1, 3, 4, 2] [Yuan, Malone, Wu, IJCAI-11] 15/56
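The formulation above can be sketched in a few lines: a node is a variable subset, a successor adds one variable, and the cost of a start-to-goal path is the sum of BestScore terms along a variable ordering. The helper names and the three-variable local scores below are hypothetical and only illustrate the structure.

```python
def make_best_score(local_scores):
    """Return BestScore(x, u): the lowest local score of x with parents inside u."""
    def best_score(x, u):
        return min(s for pa, s in local_scores[x].items() if pa <= u)
    return best_score


def successors(u, variables, best_score):
    """Edges U -> U + {X} of the order graph, each with cost BestScore(X, U)."""
    return [(u | {x}, best_score(x, u)) for x in sorted(variables - u)]


def path_cost(order, best_score):
    """Cost of the path that adds variables in the given order."""
    u, total = frozenset(), 0.0
    for x in order:
        total += best_score(x, u)
        u = u | {x}
    return total


# Hypothetical local scores: variable -> {parent set: score}. The empty parent
# set is always present so BestScore is defined for every node.
LOCAL = {
    1: {frozenset(): 5.0, frozenset({2}): 3.0},
    2: {frozenset(): 4.0, frozenset({1}): 1.0},
    3: {frozenset(): 6.0, frozenset({1, 2}): 2.0},
}
best_score = make_best_score(LOCAL)
first_layer = successors(frozenset(), {1, 2, 3}, best_score)
cost_123 = path_cost((1, 2, 3), best_score)
cost_213 = path_cost((2, 1, 3), best_score)
```

Each path from the empty set to the complete set corresponds to one variable ordering, and choosing the best parents along that ordering yields an acyclic network by construction.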

A* algorithm • A* search: Expands the nodes in the order of quality: f = g + h – g(U) = Score(U) – h(U) = estimated distance to goal • Notation: – g: g-cost – h: h-cost – Red shape-outlined: open nodes – No outline: closed nodes [Figure: order graph with start node ϕ (labels 0 and 10) and goal node {1,2,3,4}] [Yuan, Malone, Wu, IJCAI-11] 16/56

A* algorithm • A* search: Expands the nodes in the order of quality: f = g + h – g(U) = Score(U) – h(U) = estimated distance to goal • Notation: – g: g-cost – h: h-cost – Red shape-outlined: open nodes – No outline: closed nodes [Figure: start node ϕ (labels 0 and 10) expanded to nodes 1, 2, 3, 4, shown with the values 5, 2, 4, 3 and 11, 10, 14, 8; goal node {1,2,3,4}] [Yuan, Malone, Wu, IJCAI-11] 17/56

A* algorithm • A* search: Expands the nodes in the order of quality: f = g + h – g(U) = Score(U) – h(U) = estimated distance to goal • Notation: – g: g-cost – h: h-cost – Red shape-outlined: open nodes – No outline: closed nodes [Figure: as on the previous slide, with second-layer nodes {1,3}, {3,4}, {2,3} added and the labels 4/12, 5/10, 5/11] [Yuan, Malone, Wu, IJCAI-11] 18/56

A* algorithm • A* search: Expands the nodes in the order of quality: f = g + h – g(U) = Score(U) – h(U) = estimated distance to goal • Notation: – g: g-cost – h: h-cost – Red shape-outlined: open nodes – No outline: closed nodes [Figure: second-layer nodes {1,3}, {3,4}, {1,2}, {2,3}, {1,4} with the labels 4/12, 4/10, 5/11, 4/13, 5/12] [Yuan, Malone, Wu, IJCAI-11] 19/56

A* algorithm • A* search: Expands the nodes in the order of quality: f = g + h – g(U) = Score(U) – h(U) = estimated distance to goal • Notation: – g: g-cost – h: h-cost – Red shape-outlined: open nodes – No outline: closed nodes [Figure: as on the previous slide, with third-layer nodes {1,2,3} and {1,3,4} added and the labels 5/10 and 5/13] [Yuan, Malone, Wu, IJCAI-11] 20/56

A* algorithm • A* search: Expands the nodes in the order of quality: f = g + h – g(U) = Score(U) – h(U) = estimated distance to goal • Notation: – g: g-cost – h: h-cost – Red shape-outlined: open nodes – No outline: closed nodes [Figure: as on the previous slide, with the goal node {1,2,3,4} reached and labeled 15/0] [Yuan, Malone, Wu, IJCAI-11] 21/56

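The walkthrough above can be approximated by the following sketch of A* on the order graph. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy local scores, and the use of a zero heuristic in the example call are all hypothetical, and edge costs are assumed to be non-negative.

```python
import heapq
from itertools import count


def make_best_score(local_scores):
    """Return BestScore(x, u): the lowest local score of x with parents inside u."""
    def best_score(x, u):
        return min(s for pa, s in local_scores[x].items() if pa <= u)
    return best_score


def astar_order_graph(variables, best_score, heuristic):
    """Return (cost, variable order) of a shortest path from {} to all variables."""
    start, goal = frozenset(), frozenset(variables)
    tie = count()  # tie-breaker so the heap never compares frozensets
    open_heap = [(heuristic(start), next(tie), start)]
    g = {start: 0.0}
    parent = {start: None}  # node -> (previous node, variable added)
    closed = set()
    while open_heap:
        _, _, u = heapq.heappop(open_heap)
        if u in closed:
            continue  # stale heap entry for an already expanded node
        if u == goal:
            order, node = [], u
            while parent[node] is not None:
                node, x = parent[node]
                order.append(x)
            return g[goal], order[::-1]
        closed.add(u)
        for x in goal - u:
            v = u | {x}
            new_g = g[u] + best_score(x, u)
            if v not in g or new_g < g[v]:
                g[v] = new_g
                parent[v] = (u, x)
                heapq.heappush(open_heap, (new_g + heuristic(v), next(tie), v))
    return None


# Hypothetical local scores: variable -> {parent set: score}.
LOCAL = {
    1: {frozenset(): 5.0, frozenset({2}): 3.0},
    2: {frozenset(): 4.0, frozenset({1}): 1.0},
    3: {frozenset(): 6.0, frozenset({1, 2}): 2.0},
}
best_score = make_best_score(LOCAL)
# h = 0 is trivially admissible here and reduces A* to uniform-cost search.
cost, order = astar_order_graph({1, 2, 3}, best_score, lambda u: 0.0)
```

A tighter admissible heuristic can be passed in place of the zero function; it changes how many nodes are expanded but not the cost of the path returned.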

Simple heuristic • A* search: Expands nodes in order of quality: f = g + h – g(U) = Score(U) – $h(U) = \sum_{X \in V \setminus U} \mathrm{BestScore}(X, V \setminus \{X\})$ [Figure: order graph illustrating h({1,3}), with a small network over variables 1 to 4] [Yuan, Malone, Wu, IJCAI-11] 23/56
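As a small worked example of the simple heuristic, the sketch below lets every variable not yet in U pick its best parents from all other variables, ignoring acyclicity. The helper names and local scores are hypothetical.

```python
def make_best_score(local_scores):
    """Return BestScore(x, u): the lowest local score of x with parents inside u."""
    def best_score(x, u):
        return min(s for pa, s in local_scores[x].items() if pa <= u)
    return best_score


def simple_heuristic(u, variables, best_score):
    """h(U) = sum over X in V \\ U of BestScore(X, V \\ {X})."""
    v = frozenset(variables)
    return sum(best_score(x, v - {x}) for x in v - u)


# Hypothetical local scores: variable -> {parent set: score}.
LOCAL = {
    1: {frozenset(): 5.0, frozenset({2}): 3.0},
    2: {frozenset(): 4.0, frozenset({1}): 1.0},
    3: {frozenset(): 6.0, frozenset({1, 2}): 2.0},
}
best_score = make_best_score(LOCAL)
V = {1, 2, 3}
h_start = simple_heuristic(frozenset(), V, best_score)
h_13 = simple_heuristic(frozenset({1, 3}), V, best_score)
h_goal = simple_heuristic(frozenset(V), V, best_score)
```

In this toy instance the heuristic at the start node is a lower bound on the cost of every full ordering, because each remaining variable is allowed parents that an acyclic network might not permit.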

Properties of the simple heuristic • Theorem: The simple heuristic function h is admissible – Optimistic estimation: never overestimates the true distance – Guarantees the optimality of A* • Theorem: h is also consistent – Satisfies the triangle inequality, yielding a monotonic heuristic – Consistency => admissibility – Guarantees that the g-cost of any node is optimal when the node is expanded [Yuan, Malone, Wu, IJCAI-11] 24/56
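A short derivation sketch of consistency, assuming the simple heuristic sums BestScore(X, V \ {X}) over the variables not yet in U, that BestScore(X, U) is a minimum over parent sets contained in U, and writing c(U, U ∪ {X}) for the edge cost (a symbol introduced here for illustration). For any X not in U:

```latex
\begin{align*}
h(U) - h(U \cup \{X\})
  &= \mathrm{BestScore}(X, V \setminus \{X\}) \\
  &\le \mathrm{BestScore}(X, U)
   = c(U, U \cup \{X\}),
\end{align*}
```

The inequality holds because U is a subset of V \ {X}, so minimizing over the larger candidate set cannot give a higher score. Rearranged, this is the triangle inequality h(U) ≤ c(U, U ∪ {X}) + h(U ∪ {X}), and together with h at the goal being 0 it implies admissibility.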

BFBnB algorithm • Breadth-first branch and bound search (BFBnB): – Motivation: Exponential-size order and parent graphs – Observation: Natural layered structure – Solution: Search one layer at a time [Figure: order graph over variables 1 to 4 with layers ϕ; {1}, {2}, {3}, {4}; {1,2}, {1,3}, {2,3}, {1,4}, {2,4}, {3,4}; {1,2,3}, {1,2,4}, {1,3,4}, {2,3,4}; {1,2,3,4}] [Malone, Yuan, Hansen, UAI-11] 25/56


BFBnB algorithm [Figure: start node ϕ and first-layer nodes 1, 2, 3, 4 of the order graph] [Malone, Yuan, Hansen, UAI-11] 27/56
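The layer-at-a-time idea can be sketched as below. This is a simplified illustration, not the published algorithm: it keeps only the current layer in memory and prunes any node whose g + h exceeds a given upper bound, which is assumed here to come from some previously found network. The function names, the toy scores, and the choice to return only the cost (without reconstructing the network) are assumptions.

```python
def make_best_score(local_scores):
    """Return BestScore(x, u): the lowest local score of x with parents inside u."""
    def best_score(x, u):
        return min(s for pa, s in local_scores[x].items() if pa <= u)
    return best_score


def bfbnb(variables, best_score, heuristic, upper_bound):
    """Return the optimal cost, or None if the upper bound prunes every path."""
    goal = frozenset(variables)
    layer = {frozenset(): 0.0}  # node -> best g-cost found in the current layer
    for _ in range(len(goal)):
        next_layer = {}
        for u, g_u in layer.items():
            for x in goal - u:
                v = u | {x}
                g_v = g_u + best_score(x, u)
                if g_v + heuristic(v) > upper_bound:
                    continue  # bound: this node cannot beat the known solution
                if g_v < next_layer.get(v, float("inf")):
                    next_layer[v] = g_v
        layer = next_layer  # the previous layer is no longer needed
    return layer.get(goal)


# Hypothetical local scores: variable -> {parent set: score}.
LOCAL = {
    1: {frozenset(): 5.0, frozenset({2}): 3.0},
    2: {frozenset(): 4.0, frozenset({1}): 1.0},
    3: {frozenset(): 6.0, frozenset({1, 2}): 2.0},
}
best_score = make_best_score(LOCAL)
V = frozenset({1, 2, 3})


def h(u):
    """Simple heuristic: each remaining variable picks its best parents freely."""
    return sum(best_score(x, V - {x}) for x in V - u)


# Upper bound 9.0 is the cost of the (suboptimal) ordering 2, 1, 3.
optimal = bfbnb(V, best_score, h, upper_bound=9.0)
```

Because all duplicates of a node are generated within the same layer, duplicate detection only needs the layer currently being built, which is what makes the layered structure attractive when the full order graph does not fit in memory.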
