Learning Bayesian networks viewed as an optimization problem


SLIDE 1

Learning Bayesian networks viewed as an optimization problem

Milan Studený

Institute of Information Theory and Automation of the AS CR Prague

COSA Workshop: Combinatorial Optimization, Statistics, and Applications. Munich, Germany, March 15, 2011, 10:45

The presentation is based on joint work with David Haws, Raymond Hemmecke, Silvia Lindner and Jiří Vomlel.

Milan Studený et al. (Prague), Learning BNs as an optimization problem, May 15, 2011

SLIDE 2

Summary of the talk

1. Motivation: learning Bayesian network structure

2. Basic concepts

3. Original research goals: edges of the polytope; polyhedral characterization of the polytope; lattice points in the polytope

4. New research topics: the characteristic imset; plain zero-one encoding of a directed graph

5. Recent findings

6. Conclusions

SLIDE 3

Motivation: learning Bayesian network structure

Bayesian networks are special graphical models widely used both in artificial intelligence and statistics. They are described by acyclic directed graphs whose nodes correspond to variables.

The motivation for our research has been learning Bayesian network (BN) structure from data by a score-and-search method. A quality criterion, also called a score, is a real function Q of the BN structure (typically of a graph G) and of the observed database D. The value Q(G, D) should say how well the BN structure given by G explains the occurrence of the database D. The aim is to maximize G ↦ Q(G, D) given the observed database D. An example of such a criterion is Schwarz's BIC criterion.
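A score-and-search learner can be sketched in a few lines of Python: enumerate candidate DAGs and keep the one with the best score. This is an illustrative sketch only; the toy score below (which merely penalizes edges) is a hypothetical stand-in for a real criterion such as BIC.

```python
from itertools import product

def is_acyclic(nodes, parents):
    # Repeatedly delete nodes whose remaining parents were all deleted (Kahn's idea).
    remaining = set(nodes)
    while True:
        free = [i for i in remaining if not (parents[i] & remaining)]
        if not free:
            return not remaining
        remaining -= set(free)

def all_dags(nodes):
    # Every directed graph over `nodes` as a parent-set map, filtered for acyclicity.
    pairs = [(i, j) for i in nodes for j in nodes if i != j]
    for mask in product([0, 1], repeat=len(pairs)):
        parents = {j: frozenset(i for (i, k), e in zip(pairs, mask) if e and k == j)
                   for j in nodes}
        if is_acyclic(nodes, parents):
            yield parents

def search_best(nodes, score):
    # Exhaustive score-and-search: maximize G -> score(G) over all DAGs.
    return max(all_dags(nodes), key=score)

nodes = ('a', 'b', 'c')
toy_score = lambda parents: -sum(len(parents[i]) for i in nodes)  # hypothetical score
best = search_best(nodes, toy_score)
print(len(list(all_dags(nodes))))  # 25 DAGs over three labeled nodes
```

Exhaustive enumeration is feasible only for very small |N|; the point of the talk is precisely to replace it by linear optimization over a polytope.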

SLIDE 4

Motivation: algebraic approach to learning

• M. Studený (2005). Probabilistic Conditional Independence Structures. Springer-Verlag, London.

The basic idea of the proposed algebraic approach was to represent the BN structure given by an acyclic directed graph G by a certain vector u_G with integer components, called the standard imset (for G). The point is that every reasonable quality criterion Q for learning BN structure then appears to be an affine function of the standard imset. More specifically, one has

    Q(G, D) = s_D^Q − ⟨t_D^Q, u_G⟩,

where s_D^Q ∈ R, t_D^Q is a real vector of the same dimension as u_G, and ⟨·, ·⟩ denotes the scalar product. The vector t_D^Q was called the data vector (relative to Q).

SLIDE 5

Motivation: geometric view and optimization task

• M. Studený, J. Vomlel and R. Hemmecke (2010). A geometric view on learning Bayesian network structures. International Journal of Approximate Reasoning 51:578-586.

The main result of this paper was that the set of standard imsets over a fixed set of variables N is the set of vertices (= extreme points) of a certain polytope P. In particular, the task of maximizing Q over BN structures (= acyclic directed graphs) is equivalent to a linear optimization problem, namely maximizing an affine function over the above-mentioned polytope P. This problem has been treated thoroughly within the linear programming community. Nevertheless, to apply efficient methods of combinatorial optimization in this area one needs to solve some open mathematical problems (of a geometric nature concerning the polytope).

SLIDE 6

Overview of our research goals

• M. Studený and J. Vomlel (2011). On open questions in the geometric approach to structural learning Bayesian nets. To appear in International Journal of Approximate Reasoning, a special issue devoted to WUPES'09.

Specifically, we are interested in:
describing the geometric edges of P,
a polyhedral characterization of the polytope P,
finding all lattice points within the polytope P.

Later, in cooperation with R. Hemmecke, S. Lindner and D. Haws, we extended our interests to alternative BN structure representatives, complexity tasks, and the application to learning restricted Bayesian network structures.

SLIDE 7

Basic concepts: Bayesian network structure

N: a non-empty finite set of variables
X_i, |X_i| ≥ 2: the individual sample spaces (for i ∈ N)
DAGS(N): the collection of all acyclic directed graphs over N

A (discrete) Bayesian network (BN) is a pair (G, P), where G ∈ DAGS(N) and P is a probability distribution on the joint sample space X_N ≡ ∏_{i∈N} X_i which (recursively) factorizes according to G.

Given G ∈ DAGS(N), (the statistical model of) a BN structure is the class of all distributions P on X_N that factorize according to G. Since two different graphs over N may describe the same BN structure, one is interested in describing the BN structure by a unique representative.

A classic graphical representative of this kind is the so-called essential graph.

SLIDE 8

Basic concepts: learning by a score-and-search method

Data are assumed to have the form of a complete database:
x1, . . . , xd: a sequence of elements of X_N of length d ≥ 1, called a database of length d, or a sample of size d
DATA(N, d): the set of all databases over N of length d (provided the individual sample spaces X_i for i ∈ N are fixed)

Definition (quality criterion)

A quality criterion, or score (for learning BN structure), is a real function Q(G, D) on DAGS(N) × DATA(N, d).

The value Q(G, D) should somehow evaluate how well the statistical model given by G fits the database D. Thus, the aim is to maximize the function G ↦ Q(G, D) given the observed database D ∈ DATA(N, d).

SLIDE 9

Basic concepts: imsets

Definition (imset)

An imset u over N is an integer-valued function on P(N) ≡ {A : A ⊆ N}, the power set of N. It can be viewed as a vector with integer components, indexed by subsets of N [= a lattice point in the Euclidean space R^P(N)].

A trivial example of an imset is the zero imset, denoted by 0. Given A ⊆ N, the symbol δ_A denotes the basic imset given by δ_A(B) = 1 if B = A, and δ_A(B) = 0 if B ≠ A, for B ⊆ N. Since {δ_A : A ⊆ N} is a linear basis of R^P(N), any imset can be expressed as a linear combination of these basic imsets (with integer coefficients).

SLIDE 10

Basic concepts: standard imset

Definition (standard imset)

Given G ∈ DAGS(N), the standard imset for G is given by the formula

    u_G = δ_N − δ_∅ + Σ_{i∈N} { δ_{pa_G(i)} − δ_{{i} ∪ pa_G(i)} },

where pa_G(i) = { j ∈ N : j → i in G } denotes the set of parents of i in G.

Note that the terms in the above formula can both sum up and cancel each other. Of course, u_G is a vector of exponential length in |N|. However, it follows from the definition that u_G has at most 2 · |N| non-zero values. In particular, the memory demands for representing standard imsets are polynomial in |N|.
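The formula translates directly into a small sparse implementation (an illustrative sketch, not the authors' code). For the immorality a → c ← b, the sum telescopes down to u_G = δ_∅ − δ_{a} − δ_{b} + δ_{ab}:

```python
def standard_imset(nodes, parents):
    """Sparse standard imset u_G = delta_N - delta_{} + sum_i (delta_pa(i) - delta_pa(i)+{i}),
    stored as {frozenset: nonzero integer coefficient}."""
    u = {}
    def add(s, v):
        s = frozenset(s)
        u[s] = u.get(s, 0) + v
        if u[s] == 0:
            del u[s]          # keep only non-zero entries
    add(nodes, 1)
    add((), -1)
    for i in nodes:
        add(parents[i], 1)
        add(set(parents[i]) | {i}, -1)
    return u

nodes = ('a', 'b', 'c')
# the immorality a -> c <- b: only c has parents
parents = {'a': frozenset(), 'b': frozenset(), 'c': frozenset({'a', 'b'})}
u = standard_imset(nodes, parents)
print(len(u))                     # four non-zero entries, well below 2*|N| = 6
```

The sparse dictionary makes the "at most 2 · |N| non-zero values" bound visible: only the sets appearing in the defining formula can carry non-zero coefficients.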

SLIDE 11

Basic concepts: algebraic approach to learning

Lemma (Studený 2005)

Given G, H ∈ DAGS (N), one has uG = uH iff G and H describe the same BN structure.

Thus, the standard imset is a unique representative of the BN structure.

There are two important technical requirements on quality criteria introduced by researchers in computer science: they should be score equivalent and decomposable.

Theorem (Studený 2005)

Every score equivalent and decomposable criterion Q has the form

    Q(G, D) = s_D^Q − ⟨t_D^Q, u_G⟩

for G ∈ DAGS(N), D ∈ DATA(N, d), d ≥ 1, where s_D^Q ∈ R and the vector t_D^Q ∈ R^P(N) do not depend on G.

SLIDE 12

Basic concepts: geometric view

Definition (standard imset polytope)

Having fixed the set of variables N, let us put S ≡ { u_G : G ∈ DAGS(N) } ⊆ R^P(N) and P ≡ conv(S). The polytope P will be called the standard imset polytope.

Theorem (Studený, Vomlel, Hemmecke 2010)

S is the set of vertices of the integral polytope P.

Example. Distinguished vertices of P are:
the zero imset 0 (= the standard imset for the full graph),
the imset u_∅ ≡ δ_N − Σ_{i∈N} δ_{{i}} + (|N| − 1) · δ_∅ (= the standard imset for the empty graph).

In the case |N| = 3, P is the intersection of two cones, with origins in 0 and u_∅.
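Both distinguished vertices can be checked mechanically from the defining formula of the standard imset: a complete DAG telescopes to the zero imset, and the empty graph gives u_∅. A minimal sketch (the sparse dict encoding is our illustrative choice):

```python
def standard_imset(nodes, parents):
    # u_G = delta_N - delta_{} + sum_i (delta_pa(i) - delta_pa(i)+{i}), kept sparse
    u = {}
    def add(s, v):
        s = frozenset(s)
        u[s] = u.get(s, 0) + v
        if u[s] == 0:
            del u[s]
    add(nodes, 1)
    add((), -1)
    for i in nodes:
        add(parents[i], 1)
        add(set(parents[i]) | {i}, -1)
    return u

nodes = ('a', 'b', 'c')

# a full graph a -> b -> c with a -> c: the standard imset is the zero imset
full = {'a': frozenset(), 'b': frozenset({'a'}), 'c': frozenset({'a', 'b'})}
print(standard_imset(nodes, full))  # {}

# the empty graph: u = delta_N - sum_i delta_{i} + (|N| - 1) * delta_{}
empty = {i: frozenset() for i in nodes}
u0 = standard_imset(nodes, empty)
print(u0[frozenset(nodes)], u0[frozenset()])  # 1 2
```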

SLIDE 13

Edges of the polytope: geometric neighborhood

One possible interpretation of the simplex method is that it is a kind of search method, in which one moves between vertices of a polytope along its edges until an optimal vertex is reached. An analogous idea is at the core of current computer-science optimization techniques, like the GES algorithm. Specifically, a particular concept of inclusion neighborhood has been introduced for (acyclic directed) graphs, and one moves between neighboring graphs (in this sense).

[Inclusion neighbors have a simple graphical interpretation: edge removal/addition.]

Definition (geometric neighbors)

Distinct standard imsets u and v are called geometric neighbors if the segment [u, v] is an edge of the standard imset polytope P. This defines the concept of geometric neighborhood for BN structures.

SLIDE 14

Comparison with the inclusion neighborhood for |N| = 3

Example (geometric neighborhood in the case of three variables)

[Figure: diagram of the geometric neighborhood over three variables A, B, C, showing the full graph G, the empty graph G_∅, and the graphs G_ab, G_ac, G_bc, Ḡ_ab, Ḡ_ac, Ḡ_bc together with their neighborhood relations.]
Observation: for |N| = 3, the inclusion neighborhood is strictly contained in the geometric one. This has a simple but notable consequence from the statistical point of view: the GES algorithm may fail to find the global maximum.

SLIDE 15

The starting analysis of the geometric neighborhood

We proved in 2010 that the inclusion neighborhood is always contained in the geometric one (for any |N|). We also succeeded in computing the geometric neighborhood for |N| = 3, 4, 5. Our computations suggest that, for most standard imsets, there are many more geometric neighbors than inclusion neighbors.

An output of our computations was also an electronic catalogue of the types of geometric neighbors in the case |N| = 4:

http://staff.utia.cas.cz/vomlel/imset/catalogue-diff-imsets-4v.html

The intention has been to find out whether it is possible to interpret geometric neighbors in graphical terms.

SLIDE 16

Recent findings on the geometric neighborhood

Already in 2010, we observed that the geometric neighbors of the full graph (∼ the zero imset 0) coincide with its inclusion neighbors.

[These are the graphs with one missing edge.]

Recently, we have developed some methods to prove or disprove that two given (standard) imsets are geometric neighbors. We have also recognized a further general type of geometric neighbors, which has a graphical interpretation.

[Our preliminary name for it is "immorality rotation".]

Finally, we succeeded in characterizing the geometric neighbors of the empty graph (∼ the imset u_∅). These appear to be the graphs with just one non-initial node: G ∈ DAGS(N) such that ∃! i ∈ N with pa_G(i) ≠ ∅. These recent findings were derived using an alternative algebraic representative of a BN structure, called the characteristic imset.

SLIDE 17

Polyhedral characterization of the polytope

In the classic formulation of a linear optimization problem, the domain is specified in the form of a polyhedron, that is, via finitely many linear inequalities. Therefore, a lot of our effort was devoted to attempts to find the outer description of the (standard imset) polytope P (≡ a characterization of all facets of P).

[Even an implicit facet description can turn out to be useful.]

In 2010, our state of knowledge about this was as follows:

(a result of computations)

    |N|       3    4     5
    vertices  11   185   8782
    facets    13   154   ??

SLIDE 18

Classification of our linear constraints

Nevertheless, we made a detailed analysis in the case |N| = 4. The result was a classification of all the tight linear inequalities we had been aware of:
trivial equality restrictions,
so-called non-specific inequality constraints,
so-called specific inequality constraints.
We have also shown that these are necessary linear constraints on the elements of the polytope P (for any |N|).

The equality restrictions have the form

    Σ_{S⊆N} u(S) = 0   and   Σ_{S⊆N: i∈S} u(S) = 0   for any i ∈ N.

Their number, |N| + 1, is final.
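The equality restrictions are easy to verify on examples. The sketch below (our illustrative encoding) checks both families for a few standard imsets built from the defining formula:

```python
def standard_imset(nodes, parents):
    # sparse u_G = delta_N - delta_{} + sum_i (delta_pa(i) - delta_pa(i)+{i})
    u = {}
    def add(s, v):
        s = frozenset(s)
        u[s] = u.get(s, 0) + v
        if u[s] == 0:
            del u[s]
    add(nodes, 1)
    add((), -1)
    for i in nodes:
        add(parents[i], 1)
        add(set(parents[i]) | {i}, -1)
    return u

def satisfies_equalities(nodes, u):
    # sum_S u(S) = 0 and, for each i, the sum over S containing i of u(S) = 0
    total = sum(u.values()) == 0
    per_node = all(sum(v for S, v in u.items() if i in S) == 0 for i in nodes)
    return total and per_node

nodes = ('a', 'b', 'c')
examples = [
    {'a': frozenset(), 'b': frozenset(), 'c': frozenset({'a', 'b'})},  # immorality
    {'a': frozenset(), 'b': frozenset({'a'}), 'c': frozenset({'b'})},  # chain a->b->c
    {i: frozenset() for i in nodes},                                   # empty graph
]
print(all(satisfies_equalities(nodes, standard_imset(nodes, g)) for g in examples))  # True
```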

SLIDE 19

Non-specific linear constraints

Definition (non-specific inequality)

By a non-specific inequality (for P) we mean the facet-defining inequality for a facet of P containing the zero imset 0.

The point is that the cone generated by the geometric neighbors of 0 was studied earlier, and we knew its facets correspond to the extreme supermodular functions. In particular, each non-specific inequality has the following form:

    Σ_{T⊆N} m(T) · u(T) ≥ 0,

where m : P(N) → Z₊ satisfies m(C ∪ D) + m(C ∩ D) ≥ m(C) + m(D) for C, D ⊆ N, and m(S) = 0 for |S| ≤ 1.

Table: numbers of non-specific inequality constraints for |N| ≤ 5:

    |N|      2   3   4    5
    numbers  1   5   37   117978
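Supermodularity plus vanishing on small sets is mechanical to check by brute force over pairs of subsets. For instance, m(T) = max(0, |C ∩ T| − 1) for a fixed set C qualifies; the second function below does not (an illustrative check for small N only):

```python
from itertools import combinations

def subsets(N):
    N = list(N)
    return [frozenset(c) for r in range(len(N) + 1) for c in combinations(N, r)]

def is_admissible(m, N):
    """m(C u D) + m(C n D) >= m(C) + m(D) for all C, D ⊆ N, and m(S) = 0 for |S| <= 1."""
    subs = subsets(N)
    if any(m(S) != 0 for S in subs if len(S) <= 1):
        return False
    return all(m(C | D) + m(C & D) >= m(C) + m(D) for C in subs for D in subs)

N = frozenset('abcd')
C = frozenset('abc')
m_good = lambda T: max(0, len(C & T) - 1)   # supermodular, zero on singletons
m_bad = lambda T: 1 if len(T) >= 2 else 0   # fails supermodularity
print(is_admissible(m_good, N), is_admissible(m_bad, N))  # True False
```

The failure of m_bad can be seen with C = {a, b}, D = {c, d}: the left-hand side m(C ∪ D) + m(C ∩ D) = 1 + 0 is strictly below m(C) + m(D) = 2.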

SLIDE 20

Specific linear constraints

The remaining, specific constraints are in correspondence with non-empty classes of non-empty subsets of N closed under supersets, that is, classes ∅ ≠ A ⊆ {T ⊆ N : |T| ≥ 1} with S ∈ A, S ⊆ T ⇒ T ∈ A.

Given a class of sets A of this kind, the corresponding specific linear inequality has quite a simple form:

    Σ_{T∈A} u(T) ≤ 1.

Nevertheless, not all of these inequalities are facet-defining for P.

Table: numbers of specific inequality constraints:

    |N|               2   3    4     5
    before reduction  4   18   166   7579
    after reduction   1   8    117   ??
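The first row of the table (4, 18, 166, 7579) can be reproduced by brute force: enumerate every family of non-empty subsets and keep those closed under supersets. This is an illustrative sketch, feasible only for |N| ≤ 3 since the number of candidate families grows doubly exponentially:

```python
from itertools import combinations

def superset_closed_classes(N):
    """All non-empty classes A of non-empty subsets of N with S in A, S ⊆ T => T in A."""
    N = list(N)
    sets = [frozenset(c) for r in range(1, len(N) + 1) for c in combinations(N, r)]
    classes = []
    for mask in range(1, 2 ** len(sets)):            # every non-empty subfamily
        A = {s for k, s in enumerate(sets) if mask >> k & 1}
        if all(t in A for s in A for t in sets if s <= t):
            classes.append(A)
    return classes

print(len(superset_closed_classes('ab')))   # 4
print(len(superset_closed_classes('abc')))  # 18
```

Such classes are determined by the antichain of their minimal elements, which is why the counts match the numbers of non-empty up-sets in the poset of non-empty subsets.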

SLIDE 21

Conjecture about the outer description

The result of our analysis in the case |N| = 4 was also the observation that, in this case, P is not just the intersection of the two cones (with origins in 0 and u_∅): there are 4 more facets of P that contain neither 0 nor u_∅.

Conjecture (stronger version)

The above-mentioned linear constraints together form a necessary and sufficient condition for u ∈ RP(N) to belong to P.

Conjecture (weaker version)

The above constraints together give a necessary and sufficient condition for u ∈ Z^P(N) to belong to P ∩ Z^P(N).

Recent success: the weaker conjecture has been verified for |N| = 5!

Actually, finding a suitable LP relaxation of the integral polytope P would make it possible to apply advanced methods of integer linear programming.

SLIDE 22

Lattice points in the polytope

Already in 2009, Raymond Hemmecke made some computations for |N| ≤ 5 to find out whether there exists a lattice point in the interior of P. The answer was negative. This led him to the hypothesis that P is "thin" in the sense that

    P ∩ Z^P(N) = ext(P) = S.

Theorem

The only lattice points within the polytope P are its vertices.

The original 2009 proof was quite technical, but it was substantially simplified in 2010. The idea is to use an elegant affine transformation!

• M. Studený, R. Hemmecke and S. Lindner (2010). Characteristic imset: a simple algebraic representative of a Bayesian network structure. In Proceedings of the 5th European Workshop PGM, pp. 257-264.

SLIDE 23

Transformation to the characteristic imset

Definition (characteristic imset)

Assume |N| ≥ 2. Given an acyclic directed graph G over N, let u_G be the corresponding standard imset. The characteristic imset for G is given by the formula

    c_G(T) = 1 − Σ_{S: T⊆S⊆N} u_G(S)   for T ⊆ N, |T| ≥ 2.

Clearly, the characteristic imset is obtained from the standard one by an affine transformation. Moreover, this mapping is invertible. In particular, every score equivalent and decomposable criterion is also an affine function of the characteristic imset!
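The affine transformation is a one-liner once the standard imset is at hand. The sketch below (our illustrative encoding) recomputes both for the immorality a → c ← b, where c_G is 0 on {a, b} and 1 on the other sets of size ≥ 2:

```python
from itertools import combinations

def standard_imset(nodes, parents):
    # sparse u_G = delta_N - delta_{} + sum_i (delta_pa(i) - delta_pa(i)+{i})
    u = {}
    def add(s, v):
        s = frozenset(s)
        u[s] = u.get(s, 0) + v
        if u[s] == 0:
            del u[s]
    add(nodes, 1)
    add((), -1)
    for i in nodes:
        add(parents[i], 1)
        add(set(parents[i]) | {i}, -1)
    return u

def characteristic_imset(nodes, u):
    """c_G(T) = 1 - sum of u_G(S) over S with T ⊆ S ⊆ N, for |T| >= 2."""
    targets = [frozenset(c) for r in range(2, len(nodes) + 1)
               for c in combinations(nodes, r)]
    return {T: 1 - sum(v for S, v in u.items() if T <= S) for T in targets}

nodes = ('a', 'b', 'c')
parents = {'a': frozenset(), 'b': frozenset(), 'c': frozenset({'a', 'b'})}
c = characteristic_imset(nodes, standard_imset(nodes, parents))
print(c[frozenset({'a', 'b'})], c[frozenset({'a', 'c'})], c[frozenset(nodes)])  # 0 1 1
```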

SLIDE 24

Characteristic imset

However, the crucial observation about characteristic imsets is as follows:

Theorem

Assume |N| ≥ 2. Given an acyclic directed graph G over N, one has c_G(A) ∈ {0, 1} for any A ⊆ N, |A| ≥ 2.

The above-mentioned affine transformation maps lattice points to lattice points. Since there is no lattice point in the interior of the 0-1 hypercube, there is no lattice point in the interior of the standard imset polytope P!

The characteristic imset is also much closer to the graphical description than the standard imset. There is a simple polynomial algorithm for obtaining the essential graph on the basis of the characteristic imset.
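For small N the 0-1 property can be confirmed exhaustively; the brute-force sketch below (our encoding, not a proof) checks every DAG over three nodes:

```python
from itertools import combinations, product

def is_acyclic(nodes, parents):
    remaining = set(nodes)
    while True:
        free = [i for i in remaining if not (parents[i] & remaining)]
        if not free:
            return not remaining
        remaining -= set(free)

def standard_imset(nodes, parents):
    u = {}
    def add(s, v):
        s = frozenset(s)
        u[s] = u.get(s, 0) + v
        if u[s] == 0:
            del u[s]
    add(nodes, 1)
    add((), -1)
    for i in nodes:
        add(parents[i], 1)
        add(set(parents[i]) | {i}, -1)
    return u

def characteristic_imset(nodes, u):
    targets = [frozenset(c) for r in range(2, len(nodes) + 1)
               for c in combinations(nodes, r)]
    return {T: 1 - sum(v for S, v in u.items() if T <= S) for T in targets}

nodes = ('a', 'b', 'c')
# each node independently picks any parent set among the other nodes
choices = [[frozenset(c) for r in range(len(nodes))
            for c in combinations([j for j in nodes if j != i], r)] for i in nodes]
zero_one = True
n_dags = 0
for combo in product(*choices):
    parents = dict(zip(nodes, combo))
    if is_acyclic(nodes, parents):
        n_dags += 1
        c = characteristic_imset(nodes, standard_imset(nodes, parents))
        zero_one &= all(v in (0, 1) for v in c.values())
print(n_dags, zero_one)  # 25 True
```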

SLIDE 25

Simple zero-one encoding of a directed graph

• T. Jaakkola, D. Sontag, A. Globerson and M. Meila (2010). Learning Bayesian network structure using LP relaxations. In Proceedings of the 13th International Conference on AI and Statistics, pp. 358-365.

They use a simple 0-1 vector η_G to encode a directed graph G over N. The vector has components indexed by pairs (i|B), where i ∈ N and B ⊆ N \ {i}. More specifically: η_G(i|B) = 1 iff B = pa_G(i), and η_G(i|B) = 0 otherwise.

The main difference: different equivalent graphs have different representatives! Their vectors are even longer than ours; they have |N| · 2^{|N|−1} components.
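A brief sketch of this encoding (function and variable names are ours): the dictionary has one key per pair (i|B), with exactly one 1 per node, namely at B = pa_G(i).

```python
from itertools import combinations

def eta(nodes, parents):
    """0-1 vector eta_G with components indexed by pairs (i|B), B ⊆ N \\ {i}:
    eta_G(i|B) = 1 iff B = pa_G(i)."""
    e = {}
    for i in nodes:
        others = [j for j in nodes if j != i]
        for r in range(len(others) + 1):
            for B in combinations(others, r):
                e[(i, frozenset(B))] = int(frozenset(B) == parents[i])
    return e

nodes = ('a', 'b', 'c')
parents = {'a': frozenset(), 'b': frozenset(), 'c': frozenset({'a', 'b'})}
e = eta(nodes, parents)
print(len(e))            # |N| * 2^(|N|-1) = 12 components
print(sum(e.values()))   # exactly one 1 per node: 3
```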

They also turned the BN learning task into a linear optimization problem.

They start with a polyhedral upper approximation of their polytope and combine the LP approach with other methods, like branch-and-bound principle.

SLIDE 26

LP relaxation offered by Jaakkola et al.

Their polyhedron J was given by the following constraints:
simple non-negativity constraints η(i|B) ≥ 0 for every (i|B),
equality constraints Σ_{B⊆N\{j}} η(j|B) = 1 for any j ∈ N,
cluster inequalities, which correspond to sets C ⊆ N, |C| ≥ 2 (called clusters):

    1 ≤ Σ_{i∈C} Σ_{B⊆N\C} η(i|B).

The cluster inequalities encode the acyclicity restrictions on G. The inequality for C means that the induced subgraph G_C has at least one initial node. There may be non-integral vertices of J. An interesting observation (which is not difficult to show) is that the only lattice points in J are the codes of acyclic directed graphs over N.
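In parent-set terms, the cluster inequality for C just says that some i ∈ C has pa_G(i) ∩ C = ∅. A brief illustrative check, including a directed cycle that violates it:

```python
def cluster_inequality_holds(parents, C):
    # at least one node of C has all its parents outside C (an initial node of G_C)
    return sum(1 for i in C if not (parents[i] & C)) >= 1

N = frozenset('abc')
dag = {'a': frozenset(), 'b': frozenset({'a'}), 'c': frozenset({'a', 'b'})}
cycle = {'a': frozenset({'c'}), 'b': frozenset({'a'}), 'c': frozenset({'b'})}

clusters = [frozenset('ab'), frozenset('ac'), frozenset('bc'), N]
print(all(cluster_inequality_holds(dag, C) for C in clusters))   # True
print(cluster_inequality_holds(cycle, N))                        # False
```

Since η_G(i|B) = 1 only for B = pa_G(i), the inner double sum Σ_{i∈C} Σ_{B⊆N\C} η(i|B) counts exactly the nodes checked by this function.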

Thus, their polyhedron is an LP relaxation of the convex hull of the set of codes.

SLIDE 27

Recent findings: transformation to our frameworks

We have observed that the standard imset u_G is an affine function of η_G (which is a many-to-one mapping, of course):

    u_G(T) = δ_N(T) − δ_∅(T) + Σ_{(i|B)} η_G(i|B) · { δ_B(T) − δ_{{i}∪B}(T) }   for T ⊆ N,

and the characteristic imset c_G is even a linear function of it:

    c_G(T) = Σ_{(i|B)} η_G(i|B) · δ[ i ∈ T & T \ {i} ⊆ B ]   for T ⊆ N.

Therefore, we have three ways of algebraic representation of Bayesian networks:

    η_G → u_G ↔ c_G.
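The linear formula for c_G can be cross-checked against the affine route through u_G; since η_G picks exactly B = pa_G(i), the linear formula reduces to counting nodes i ∈ T with T \ {i} ⊆ pa_G(i). A sketch under our encoding:

```python
from itertools import combinations

def standard_imset(nodes, parents):
    u = {}
    def add(s, v):
        s = frozenset(s)
        u[s] = u.get(s, 0) + v
        if u[s] == 0:
            del u[s]
    add(nodes, 1)
    add((), -1)
    for i in nodes:
        add(parents[i], 1)
        add(set(parents[i]) | {i}, -1)
    return u

def targets(nodes):
    return [frozenset(c) for r in range(2, len(nodes) + 1)
            for c in combinations(nodes, r)]

def c_via_u(nodes, parents):
    # affine route: c_G(T) = 1 - sum of u_G(S) over S with T ⊆ S
    u = standard_imset(nodes, parents)
    return {T: 1 - sum(v for S, v in u.items() if T <= S) for T in targets(nodes)}

def c_via_eta(nodes, parents):
    # linear route: eta picks B = pa_G(i), so c_G(T) counts i in T with T\{i} ⊆ pa_G(i)
    return {T: sum(1 for i in T if (T - {i}) <= parents[i]) for T in targets(nodes)}

nodes = ('a', 'b', 'c')
for parents in [{'a': frozenset(), 'b': frozenset(), 'c': frozenset({'a', 'b'})},
                {'a': frozenset(), 'b': frozenset({'a'}), 'c': frozenset({'b'})}]:
    print(c_via_u(nodes, parents) == c_via_eta(nodes, parents))  # prints True twice
```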

Our aim was to transform Jaakkola’s linear constraints to our framework(s) and to compare them with our constraints.

SLIDE 28

Recent findings: inequalities translation

A first finding (already in November 2010) was that the cluster inequalities can easily be transformed to the framework of standard imsets. They appear to correspond to certain non-specific inequalities:

    Σ_{T⊆N} m_C(T) · u(T) ≥ 0,   where m_C(T) = max {0, |C ∩ T| − 1} for T ⊆ N.
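The translated inequality can be checked on small examples: m_C vanishes on singletons and the pairing Σ_T m_C(T) · u_G(T) stays non-negative for standard imsets (an illustrative sketch with our encoding):

```python
def standard_imset(nodes, parents):
    # sparse u_G = delta_N - delta_{} + sum_i (delta_pa(i) - delta_pa(i)+{i})
    u = {}
    def add(s, v):
        s = frozenset(s)
        u[s] = u.get(s, 0) + v
        if u[s] == 0:
            del u[s]
    add(nodes, 1)
    add((), -1)
    for i in nodes:
        add(parents[i], 1)
        add(set(parents[i]) | {i}, -1)
    return u

def m_cluster(C):
    return lambda T: max(0, len(C & T) - 1)

def pairing(m, u):
    # sum over T of m(T) * u(T); u is sparse, so only its support matters
    return sum(m(T) * v for T, v in u.items())

nodes = ('a', 'b', 'c')
graphs = [{'a': frozenset(), 'b': frozenset(), 'c': frozenset({'a', 'b'})},  # immorality
          {'a': frozenset(), 'b': frozenset({'a'}), 'c': frozenset({'b'})},  # chain
          {i: frozenset() for i in nodes}]                                   # empty
clusters = [frozenset({'a', 'b'}), frozenset(nodes)]
ok = all(pairing(m_cluster(C), standard_imset(nodes, g)) >= 0
         for g in graphs for C in clusters)
print(ok)  # True
```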

It was quite a big technical problem to transform Jaakkola's non-negativity and equality constraints (owing to the many-to-one mapping). Nevertheless, very recently we succeeded in confirming our former conjecture that they lead exactly to the specific inequalities.

[Paradox: we reduce the dimension but raise the number of inequalities.]

A consequence of this partial result is that the polyhedron given by (solely) specific inequalities is integral (and bounded).

SLIDE 29

Conclusions

Thus, our polyhedral approximation is tighter than Jaakkola's.

We think we can possibly utilize their observations: to confirm the weaker version of our conjecture (= the only lattice points in our polyhedral approximation are standard imsets), it remains to show that the corresponding linear mapping has the following property: if an integral vector has a non-negative pre-image, then it even has an integral non-negative pre-image. This appears to be related to the unimodularity of the respective matrix. We believe our computations confirmed the unimodularity for |N| = 3, 4, 5, 6.

If the weaker version of our conjecture is confirmed, then the transformation to the 0-1 framework of characteristic imsets can perhaps allow us to use advanced methods of integer programming, like cutting-plane methods and the lift-and-project approach.

Moreover, there is still a chance that the stronger version of the conjecture is true!
