Ehsan Nazerfard, nazerfard@eecs.wsu.edu, October 11, 2011 (PowerPoint presentation)



SLIDE 1

Ehsan Nazerfard nazerfard@eecs.wsu.edu October 11, 2011

SLIDE 2

Outline:

  • Introduction: Graphical Models
  • Tutorial: Bayesian Network Structure Learning Approaches
  • RAI: Recursive Autonomy Identification for Bayesian Network Structure Learning
  • Summary and Further Studies

SLIDE 3

Graphical models come in three main types. They represent a joint probability distribution:

  • Nodes: random variables
  • Edges: statistical dependencies

Bayesian Networks Structure Learning: Introduction | Tutorial | RAI

SLIDE 4

Graphical models come in three main types. They represent a joint probability distribution:

  • Nodes: random variables
  • Edges: direct influence (in directed graphs)

SLIDE 5

Why do we need graphical models?

  • They give an intuitive representation of the relations between variables.
  • They abstract out the conditional independence relations between variables.

Conditional independence

  • "Is A dependent on B, given the value of C?"
  • A ⊥ B | C  ⇒  P(A | B, C) = P(A | C)
  • A ⊥ B | C  ⇒  P(A, B | C) = P(A | C) P(B | C)
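The two identities above can be checked numerically. A minimal sketch, assuming a made-up three-variable distribution constructed so that A ⊥ B | C holds by design (all numbers below are hypothetical):

```python
import numpy as np

# Hypothetical CPTs, chosen so that p(a, b, c) = p(c) p(a|c) p(b|c):
p_c = np.array([0.3, 0.7])
p_a_given_c = np.array([[0.2, 0.8],    # rows index c, columns index a
                        [0.6, 0.4]])
p_b_given_c = np.array([[0.5, 0.5],
                        [0.1, 0.9]])
joint = np.einsum('c,ca,cb->abc', p_c, p_a_given_c, p_b_given_c)

# Check P(A, B | C) = P(A | C) P(B | C) for every (a, b, c):
p_ab_c = joint / joint.sum(axis=(0, 1), keepdims=True)
p_a_c = joint.sum(axis=1, keepdims=True) / joint.sum(axis=(0, 1), keepdims=True)
p_b_c = joint.sum(axis=0, keepdims=True) / joint.sum(axis=(0, 1), keepdims=True)
print(np.allclose(p_ab_c, p_a_c * p_b_c))  # True
```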

SLIDE 6

Given X = (X_1, ..., X_n), a Bayesian network is an annotated DAG that represents a unique JPD over X:

  p(X_1, ..., X_n) = Π_{i=1}^{n} p(X_i | Pa(X_i))

Each node X_i is annotated with a CPT that represents p(X_i | Pa(X_i)).

DAG: Directed Acyclic Graph; JPD: Joint Probability Distribution; CPT: Conditional Probability Table
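The factorization can be demonstrated on a small chain network X1 → X2 → X3. A sketch with hypothetical binary CPTs (all numbers are made up):

```python
import numpy as np

# Hypothetical CPTs for the chain X1 -> X2 -> X3:
p_x1 = np.array([0.6, 0.4])
p_x2_given_x1 = np.array([[0.7, 0.3],   # rows index x1, columns index x2
                          [0.2, 0.8]])
p_x3_given_x2 = np.array([[0.9, 0.1],
                          [0.5, 0.5]])

# The JPD is the product of the per-node CPTs:
# p(x1, x2, x3) = p(x1) p(x2 | x1) p(x3 | x2)
joint = np.einsum('i,ij,jk->ijk', p_x1, p_x2_given_x1, p_x3_given_x2)
print(joint.sum())  # 1.0, so the factorization yields a valid joint distribution
```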

SLIDE 7

Structure Learning

  • Find the structure of a Bayesian network (BN) that best describes the observed data.

Parameter Learning

  • Learn the parameters when the structure is known. Generally, parameter learning is a part of structure learning.

SLIDE 8

Goal

  • Find the network structure of a BN that best describes the observed data.
  • The problem is NP-complete!

Ways around this:

  • Naïve Bayes …
  • Using domain knowledge
  • Assumptions to make the problem tractable
  • Textbooks (generally) assume the network is already known.

SLIDE 9

Two main categories:

1) Score and Search-Based (S&S) approach

  • Learning the network structures

2) Constraint-Based (CB) approach

  • Learning the edges composing a structure
SLIDE 10

Three main issues with the S&S approach:

  1. Search space
  2. Search strategy
  3. Model selection criterion
SLIDE 11

Number of possible DAGs containing n nodes (Robinson's recursion):

  f(n) = Σ_{i=1}^{n} (-1)^{i+1} C(n, i) 2^{i(n-i)} f(n-i),  with f(0) = 1

Curse of dimensionality:

  # of variables   # of possible DAGs
  1                1
  2                3
  3                25
  …                …
  8                783,702,329,343
  9                1,213,442,454,842,881
  10               4,175,098,976,430,598,143
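The recursion can be implemented directly and reproduces the table's values. A short sketch (function name is mine):

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_dags(n):
    """Number of labeled DAGs on n nodes, via Robinson's recursion:
    f(n) = sum_{i=1..n} (-1)^(i+1) * C(n, i) * 2^(i*(n-i)) * f(n-i)."""
    if n == 0:
        return 1
    return sum((-1) ** (i + 1) * comb(n, i) * 2 ** (i * (n - i)) * num_dags(n - i)
               for i in range(1, n + 1))

print(num_dags(3))   # 25
print(num_dags(8))   # 783702329343
```

The super-exponential growth is what makes exhaustive search over structures hopeless beyond a handful of variables.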

SLIDE 12

Any search method from Artificial Intelligence:

  • DFS, BFS, Best-First Search, Simulated Annealing
  • A* and IDA*

How is the neighborhood defined?

  • Current structure + adding, deleting, or reversing an arc
  • No cycle is allowed

K2 algorithm [3]

  • Greedy search (assumes a total ordering of the variables is known)
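The arc-operation neighborhood can be sketched directly. A hypothetical encoding (a DAG as a set of (parent, child) tuples; function names are mine), including the acyclicity check the slide requires:

```python
from itertools import permutations

def creates_cycle(edges, new_edge):
    """Would adding new_edge = (u, v) create a cycle? Yes iff v already reaches u."""
    u, v = new_edge
    stack, seen = [v], {v}
    while stack:
        n = stack.pop()
        if n == u:
            return True
        for a, b in edges:
            if a == n and b not in seen:
                seen.add(b); stack.append(b)
    return False

def neighbors(nodes, edges):
    """All DAGs one arc-operation away: add, delete, or reverse an arc."""
    edges = set(edges)
    out = []
    for u, v in permutations(nodes, 2):
        if (u, v) in edges:
            out.append(edges - {(u, v)})                      # delete the arc
            if not creates_cycle(edges - {(u, v)}, (v, u)):
                out.append(edges - {(u, v)} | {(v, u)})       # reverse the arc
        elif (v, u) not in edges and not creates_cycle(edges, (u, v)):
            out.append(edges | {(u, v)})                      # add a new arc
    return out

out = neighbors(["A", "B"], {("A", "B")})
print(out)  # two neighbors: delete A->B, or reverse it to B->A
```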
SLIDE 13

Scoring function

  • Evaluates how well a given network G matches the data D.
  • The best BN is the one that maximizes the scoring function.
  • Based on ML (maximum likelihood):

      G* = arg max_G p(D | G)

  • Most frequently used: the Bayesian Information Criterion (BIC) [4]

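BIC trades fit against complexity: score = maximized log-likelihood minus (d/2) log N, where d is the number of free parameters and N the number of samples. A minimal sketch for a single discrete variable (the function name and toy data are mine, not from the slides):

```python
import math
from collections import Counter

def bic_score(samples, n_params):
    """BIC = log-likelihood at the ML estimate - (d / 2) * log N,
    for a discrete (multinomial) variable fit by relative frequencies."""
    n = len(samples)
    loglik = sum(c * math.log(c / n) for c in Counter(samples).values())
    return loglik - 0.5 * n_params * math.log(n)

print(bic_score(["a", "a", "b", "b"], n_params=1))
```

For a full network, the log-likelihood decomposes over nodes given their parents, and d counts all CPT entries, so the penalty discourages densely connected structures.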
SLIDE 14

Score and Search-Based approach, pseudo code:

Input: observational data set
Output: the resulting Bayesian network

  1. Generate the initial BN (random or from domain knowledge), evaluate it, and set it as the current network.
  2. Evaluate the neighbors of the current BN.
  3. If the best neighbor score is better than the score of the current BN, set the best-scoring neighbor as the current network and go to step 2.
  4. Else, stop the learning process.
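The pseudo code above is plain greedy hill-climbing. A minimal sketch of the loop itself, with `score` and `neighbors` as placeholders (a real BN learner would pass a network score such as BIC and the arc-operation neighborhood; the integer toy below only exercises the loop):

```python
def hill_climb(score, neighbors, init):
    """Steps 1-4 above: evaluate the current candidate, move to the
    best-scoring neighbor while that improves the score, else stop."""
    current, cur_score = init, score(init)
    while True:
        best, best_score = None, cur_score
        for cand in neighbors(current):
            s = score(cand)
            if s > best_score:
                best, best_score = cand, s
        if best is None:
            return current, cur_score  # local optimum: no neighbor improves
        current, cur_score = best, best_score

# Toy check on integers: maximize -(x - 3)^2 with neighbors x - 1 and x + 1.
print(hill_climb(lambda x: -(x - 3) ** 2, lambda x: [x - 1, x + 1], 0))  # (3, 0)
```

Note the greedy stopping rule in step 4 means the result is only a local optimum, which is why restarts or simulated annealing are often layered on top.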

SLIDE 15

Two main categories:

1) Score and Search-Based (S&S) approach

  • Learning the network structures

2) Constraint-Based (CB) approach

  • Learning the edges composing a structure
SLIDE 16

Learning the edges of a structure:

  • Discover the conditional independence (CI) relations from the data.
  • Infer the structure from the learned relations.

SLIDE 17

"Is A dependent on B, given the value of C?" Examples*:

  • child's genes ⊥ grandparents' genes | parents' genes
  • amount of speeding fine ⊥ type of car | speed
  • lung cancer ⊥ yellow teeth | smoker

* Borrowed from Dr. Zoubin Ghahramani's GM tutorial

SLIDE 18

Example

  • A child's genes and his grandparents' genes
  • A ⊥ D | B

Variable B d-separates A and D.

SLIDE 19

Example: rolling two dice

  • B ⊥ C | ∅ (marginally independent)
  • B ⊥ C | D does not hold: conditioning on the common child D couples B and C

V-Structure
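The d-separation judgments in these examples can be checked mechanically. A small sketch using the moralized-ancestral-graph criterion (the graph encoding and function names are mine):

```python
from itertools import combinations

def ancestors(dag, nodes):
    """All ancestors of `nodes` (inclusive) in a DAG given as {child: set(parents)}."""
    seen, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(dag.get(n, ()))
    return seen

def d_separated(dag, x, y, z):
    """True iff x and y are d-separated given conditioning set z."""
    z = set(z)
    keep = ancestors(dag, {x, y} | z)
    # Moralize the ancestral subgraph: keep parent-child edges, marry co-parents.
    adj = {n: set() for n in keep}
    for child, parents in dag.items():
        if child not in keep:
            continue
        ps = [p for p in parents if p in keep]
        for p in ps:
            adj[p].add(child); adj[child].add(p)
        for p, q in combinations(ps, 2):
            adj[p].add(q); adj[q].add(p)
    # Remove the conditioning nodes, then test undirected reachability x -> y.
    stack, seen = [x], {x} | z
    while stack:
        n = stack.pop()
        if n == y:
            return False
        for m in adj[n]:
            if m not in seen:
                seen.add(m); stack.append(m)
    return True

# The dice v-structure B -> D <- C:
dag = {"B": set(), "C": set(), "D": {"B", "C"}}
print(d_separated(dag, "B", "C", set()))   # True: marginally independent
print(d_separated(dag, "B", "C", {"D"}))   # False: conditioning on D opens the path
```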

SLIDE 20

Example: the dice example, continued

  • C: random numbers in [1, 6]
  • D ⊥ E | C

SLIDE 21

Is B conditionally independent of C, given E?

  • B ⊥ C | E ?

SLIDE 22

No single conditioning variable makes C and D independent, but:

  • C ⊥ D | {B, E}

SLIDE 23

Differences between CB algorithms:

  • completeness and complexity

Algorithms (not limited to):

  • TPDA: Three-Phase Dependency Analysis, 1997
  • SC: Sparse Candidate, 1999
  • IC: Inductive Causation, 2000
  • PC: Peter Spirtes and Clark Glymour, 2000
  • MMHC: Max-Min Hill-Climbing, 2006
  • RAI: Recursive Autonomy Identification, 2009
SLIDE 24

Title: Bayesian Network Structure Learning by Recursive Autonomy Identification
Authors: Raanan Yehezkel and Boaz Lerner
Journal of Machine Learning Research (2009), pp. 1527-1570
SLIDE 25

  • Conditional independence tests
  • Edge direction (orientation rules)
  • Structure decomposition
      - Diminishes the curse-of-dimensionality problem
SLIDE 26

d-separation resolution (X, Y)

  • Size of the smallest conditioning set that d-separates X and Y

d-separation resolution (G)

  • The highest d-separation resolution in the graph
SLIDE 27

Given G, any two non-adjacent nodes in an autonomous sub-structure G_A are d-separated given nodes either included in G_A or its exogenous causes.

Formally: for all non-adjacent X, Y in G_A, there exists S ⊆ {V_A ∪ V_ex} such that X ⊥ Y | S.

SLIDE 28

RAI algorithm, pseudo code:

Input: observational data set
Output: partial DAG representing the Markov equivalence class

Start from a complete undirected graph. Repeat steps 1 to 3 from low to high graph d-separation resolution, until a stopping criterion is met (e.g., the CI-test threshold):

  1. Test CI between nodes, followed by the removal of edges related to independence.
  2. Direct edges according to orientation rules (not always possible).
  3. Decompose the graph into autonomous sub-structures.

For each sub-structure, apply RAI recursively (steps 1 to 3), while increasing the order of CI testing.
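The CI-testing core of step 1, with the order of testing increasing per pass, resembles the skeleton phase shared by PC-style constraint-based learners. A simplified sketch (not the full RAI algorithm: the recursive decomposition and orientation steps are omitted, and `ci_test` is a placeholder oracle):

```python
from itertools import combinations

def skeleton(nodes, ci_test, max_order):
    """Starting from a complete undirected graph, remove an edge (x, y)
    whenever some conditioning set S of the current order, drawn from x's
    other neighbors, renders x and y independent. Orders increase 0..max_order."""
    adj = {n: set(nodes) - {n} for n in nodes}
    sepset = {}
    for order in range(max_order + 1):
        for x in nodes:
            for y in list(adj[x]):
                others = adj[x] - {y}
                for s in combinations(sorted(others), order):
                    if ci_test(x, y, set(s)):
                        adj[x].discard(y); adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(s)
                        break
    return adj, sepset

# Oracle CI test for the chain A -> B -> C: the only independence is A ⊥ C | B.
def chain_ci(x, y, s):
    return frozenset((x, y)) == frozenset(("A", "C")) and "B" in s

adj, sepset = skeleton(["A", "B", "C"], chain_ci, max_order=1)
print(sorted((n, sorted(nbrs)) for n, nbrs in adj.items()))
```

RAI's contribution over this baseline is that decomposing the graph into autonomous sub-structures lets the higher-order (more expensive, less reliable) CI tests run on much smaller candidate sets.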

SLIDE 29