Mind Change Optimal Learning of Bayes Net Structure. Oliver Schulte. PowerPoint presentation.


SLIDE 1

Mind Change Optimal Learning Of Bayes Net Structure

Oliver Schulte

School of Computing Science Simon Fraser University Vancouver, Canada

  • oschulte@cs.sfu.ca

with Wei Luo (SFU, wluoa@cs.sfu.ca) and Russ Greiner (U of Alberta, greiner@cs.ualberta.ca)

SLIDE 2

Mind Change Optimal Learning of Bayes Net Structure 2/19

Outline

  • 1. Brief Intro to Bayes Nets (BNs).
  • 2. Language Learning Model for BN Structure Learning.
  • 3. Mind Change Complexity of BN Learning.
  • 4. Mind Change and Convergence-Time Optimality.
  • 5. NP-hardness of the Optimal Learner.
SLIDE 3

Bayes Nets: Overview

A very widely used graphical formalism for probabilistic reasoning and knowledge representation in AI and machine learning. A Bayes net structure is a directed acyclic graph (DAG): nodes are the variables of interest, and arcs denote direct “influence” or “association”. The structure represents probabilistic conditional dependencies (correlations).

SLIDE 4

Example of Bayes Net Structure

[Figure: a DAG over the variables Season, Sprinkler, Rain, Wet, Slippery]

1. Season depends on Slippery.
2. Sprinkler depends on Rain.
3. Sprinkler does not depend on Rain given Season.
4. Sprinkler depends on Rain given Season, Wet.
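The (in)dependence facts above can be checked mechanically with d-separation. Below is a minimal sketch in Python (my own helper code, not from the slides), using the standard moral-ancestral-graph test; the edge set is the textbook sprinkler network (Season→Sprinkler, Season→Rain, Sprinkler→Wet, Rain→Wet, Wet→Slippery), an assumption, since the slide shows only the node names.

```python
def ancestors(dag, nodes):
    """All ancestors of `nodes` in `dag` (node -> parent set), inclusive."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in dag[stack.pop()]:
            if p not in result:
                result.add(p)
                stack.append(p)
    return result

def d_separated(dag, xs, ys, zs):
    """True iff xs and ys are d-separated given zs.

    Classic reduction: restrict to the ancestral set of xs | ys | zs,
    moralize (marry co-parents), delete zs, and test connectivity.
    """
    keep = ancestors(dag, set(xs) | set(ys) | set(zs))
    adj = {v: set() for v in keep}
    for child in keep:
        ps = sorted(dag[child])            # parents are ancestors, so in keep
        for p in ps:                       # undirected parent-child edges
            adj[child].add(p)
            adj[p].add(child)
        for i in range(len(ps)):           # marry co-parents
            for j in range(i + 1, len(ps)):
                adj[ps[i]].add(ps[j])
                adj[ps[j]].add(ps[i])
    blocked = set(zs)
    frontier = [x for x in xs if x not in blocked]
    seen = set(frontier)
    while frontier:
        v = frontier.pop()
        if v in ys:
            return False                   # connected => dependent
        for w in adj[v]:
            if w not in seen and w not in blocked:
                seen.add(w)
                frontier.append(w)
    return True

# Assumed sprinkler DAG: node -> set of parents.
dag = {"Season": set(), "Sprinkler": {"Season"}, "Rain": {"Season"},
       "Wet": {"Sprinkler", "Rain"}, "Slippery": {"Wet"}}

print(not d_separated(dag, {"Sprinkler"}, {"Rain"}, set()))              # fact 2
print(d_separated(dag, {"Sprinkler"}, {"Rain"}, {"Season"}))             # fact 3
print(not d_separated(dag, {"Sprinkler"}, {"Rain"}, {"Season", "Wet"}))  # fact 4
```

Fact 4 is the interesting case: conditioning on the collider Wet re-opens the path between Sprinkler and Rain.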

SLIDE 5

Graphs entail Dependencies

[Figure: three DAGs over A, B, C. The empty graph entails no dependencies; a single edge between A and B entails Dep(A,B), Dep(A,B|C); a collider A→B←C entails Dep(A,B), Dep(A,B|C), Dep(B,C), Dep(B,C|A), Dep(A,C|B).]

SLIDE 6

Pattern = DAG Equivalence Class

  • Write Dep(G) for the dependencies defined by DAG G.
  • Natural equivalence relation: G ≈ G’ ⇔ Dep(G) = Dep(G’).
  • A partially directed graph, called a pattern, represents the equivalence class for a given DAG G.

[Figure: two equivalent DAGs G and G’ over A, B, C and the shared pattern of their equivalence class.]
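The equivalence test can be made concrete via the Verma–Pearl characterization: two DAGs define the same dependencies iff they share the same skeleton and the same v-structures (colliders with non-adjacent parents). A small sketch (my own code, not from the slides):

```python
def skeleton(dag):
    """Undirected edge set of a DAG given as node -> parent set."""
    return {frozenset((p, c)) for c, ps in dag.items() for p in ps}

def v_structures(dag):
    """Triples (a, c, b) with a -> c <- b and a, b non-adjacent."""
    skel = skeleton(dag)
    vs = set()
    for c, parents in dag.items():
        ps = sorted(parents)
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                if frozenset((ps[i], ps[j])) not in skel:
                    vs.add((ps[i], c, ps[j]))
    return vs

def equivalent(g1, g2):
    """Markov equivalence: same skeleton and same v-structures."""
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)

# The two chains A->B->C and A<-B<-C share the pattern A-B-C ...
g  = {"A": set(), "B": {"A"}, "C": {"B"}}
gp = {"C": set(), "B": {"C"}, "A": {"B"}}
print(equivalent(g, gp))            # True
# ... but the collider A->B<-C has a v-structure, so it is not equivalent.
collider = {"A": set(), "C": set(), "B": {"A", "C"}}
print(equivalent(g, collider))      # False
```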

SLIDE 7

Constraint-Based BN Learning as Language Learning

The Gold paradigm maps onto constraint-based BN learning as follows:

  • index ↔ pattern
  • language ↔ dependency relation
  • string ↔ conditional dependence Dep(X,Y|Z), where Z is a set of variables

Constraint-based approach: learn a BN from (in)dependency information. Spirtes, Glymour, Scheines (2000); Pearl and Verma (2000); Margaritis and Thrun (1999); Cheng and Greiner (2001).

A BN learner maps a sequence of dependencies (repetitions allowed) to a pattern or to ?.

SLIDE 8

Identification with Bounded Mind Changes

  • Learner Ψ changes its mind on text T at stage k+1 iff Ψ(T[k+1]) ≠ Ψ(T[k]) and Ψ(T[k]) ≠ ?.
  • Learner Ψ identifies language collection L with k mind changes iff Ψ identifies L and changes its mind at most k times on any text for a language in L.
  • L is identifiable with k mind changes iff there is a learner Ψ that identifies L with k mind changes.

[Figure: a sample text Dep(A,B), Dep(B,C), Dep(A,C|B), … and the learner’s successive conjectures over A, B, C, starting with ?.]

SLIDE 9

Inclusion Depth and Mind Change Bounds

Proposition (Luo and Schulte 2006). Suppose that L has finite thickness. Then the best mind change bound for L is given by the length of the longest inclusion chain L1 ⊂ L2 ⊂ … ⊂ Lk formed by languages in L.

[Figure: languages {1}, {1,2}, {2}, {2,3}, {2,3,4}; the longest inclusion chain has length 4.]
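For a small finite-thickness collection, the chain length in the proposition can be computed by brute force. A sketch (my own code; the example collection is illustrative, not the one in the figure):

```python
from functools import lru_cache

def longest_inclusion_chain(languages):
    """Length k of a longest chain L1 ⊂ L2 ⊂ ... ⊂ Lk (strict inclusions)."""
    langs = [frozenset(l) for l in languages]

    @lru_cache(maxsize=None)
    def chain_from(i):
        # Longest chain starting at langs[i]; strict subset keeps this acyclic.
        best = 1
        for j in range(len(langs)):
            if langs[i] < langs[j]:
                best = max(best, 1 + chain_from(j))
        return best

    return max(chain_from(i) for i in range(len(langs)))

print(longest_inclusion_chain([{1}, {1, 2}, {1, 2, 3}, {4}]))  # 3
```

By the proposition, the printed value is the best achievable mind change bound for that collection.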

SLIDE 10

Mind Change Complexity of BN Learning

Let LV be the collection of dependency relations definable by Bayes nets with variables V.

Theorem. The longest inclusion chain in LV has length (|V| choose 2) = |V|(|V|−1)/2, the number of edges in a complete graph over V.

SLIDE 11

Maximal Length Inclusion Chain

[Figure: a maximal inclusion chain over A, B, C: the empty graph (no dependencies) ⊂ a single edge A–B (Dep(A,B), Dep(A,B|C)) ⊂ the collider A→B←C (adds Dep(B,C), Dep(B,C|A), Dep(A,C|B)) ⊂ the complete graph (all dependencies).]

SLIDE 12

Mind Change Optimal Learning

  • Learner Ψ is MC-optimal for language collection L iff for every data sequence σ, Ψ identifies L with the best possible mind change bound for the language collection {L : L is in L and consistent with σ}.
  • Proposition. A BN learner Ψ identifying L is MC-optimal iff for all dependency sequences σ, if there is no unique edge-minimal pattern consistent with σ, then Ψ(σ) = ?.

Proof follows from the general characterization of MC-optimality in Luo and Schulte (2005, 2006).

SLIDE 13

Example of Mind Change Optimal Learner

[Figure: the text Dep(A,B), Dep(B,C), Dep(A,C|B), … with the learner’s successive conjectures; after the data shown, two alternative patterns over A, B, C remain consistent, so the MC-optimal learner conjectures ?.]

SLIDE 14

Convergence Time

  • Convergence time = number of observed dependencies before convergence; important to minimize.
  • Def (Gold). Learner Ψ is uniformly faster than learner Φ iff 1. Ψ converges at least as fast as Φ on every text T, and 2. Ψ converges strictly faster on some text T.

Define Ψfast(σ) = G if G is the unique edge-minimal pattern consistent with σ, and ? otherwise.

Proposition. The learner Ψfast is uniformly faster than any other MC-optimal BN learner.
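For three variables, Ψfast can be realized by brute force: enumerate all DAGs, compute each DAG's dependency relation by d-separation, and output the unique edge-minimal consistent pattern if one exists. This is my own illustrative implementation, not the paper's algorithm; the NP-hardness result on the following slides concerns the general-n case.

```python
from itertools import combinations, product

NODES = ("A", "B", "C")

def is_acyclic(dag):
    """Kahn-style check; `dag` maps node -> set of parents."""
    parents = {v: set(ps) for v, ps in dag.items()}
    while parents:
        roots = [v for v, ps in parents.items() if not ps]
        if not roots:
            return False                    # no parentless node left: a cycle
        for r in roots:
            del parents[r]
        for ps in parents.values():
            ps.difference_update(roots)
    return True

def all_dags():
    """Every DAG over NODES; each edge is absent or oriented either way."""
    pairs = list(combinations(NODES, 2))
    for choice in product((0, 1, 2), repeat=len(pairs)):
        dag = {v: set() for v in NODES}
        for (x, y), c in zip(pairs, choice):
            if c == 1:
                dag[y].add(x)               # x -> y
            elif c == 2:
                dag[x].add(y)               # y -> x
        if is_acyclic(dag):
            yield dag

def ancestors(dag, nodes):
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in dag[stack.pop()]:
            if p not in result:
                result.add(p)
                stack.append(p)
    return result

def dependent(dag, x, y, zs):
    """Dep(x, y | zs): x and y d-connected given zs (moral ancestral graph)."""
    keep = ancestors(dag, {x, y} | set(zs))
    adj = {v: set() for v in keep}
    for child in keep:
        ps = sorted(dag[child])
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for a, b in combinations(ps, 2):    # marry co-parents
            adj[a].add(b)
            adj[b].add(a)
    blocked = set(zs)
    seen, stack = {x}, ([x] if x not in blocked else [])
    while stack:
        v = stack.pop()
        if v == y:
            return True
        for w in adj[v]:
            if w not in seen and w not in blocked:
                seen.add(w)
                stack.append(w)
    return False

def dep_relation(dag):
    """All statements Dep(X, Y | Z) entailed by the DAG."""
    deps = set()
    for x, y in combinations(NODES, 2):
        rest = [v for v in NODES if v not in (x, y)]
        for k in range(len(rest) + 1):
            for zs in combinations(rest, k):
                if dependent(dag, x, y, zs):
                    deps.add((x, y, frozenset(zs)))
    return frozenset(deps)

def fast_learner(observed):
    """Unique edge-minimal pattern consistent with `observed`, else '?'.

    Patterns are identified with their dependency relations; Markov
    equivalent DAGs share a skeleton and hence an edge count.
    """
    edges = {}
    for dag in all_dags():
        deps = dep_relation(dag)
        if observed <= deps:
            n = sum(len(ps) for ps in dag.values())
            edges[deps] = min(edges.get(deps, n), n)
    best = min(edges.values())
    winners = [deps for deps, n in edges.items() if n == best]
    return winners[0] if len(winners) == 1 else "?"

# Dep(A,B) alone: the single edge A-B is the unique minimal pattern.
print(fast_learner({("A", "B", frozenset())}) != "?")    # True
# Adding Dep(B,C): chain, fork, and collider patterns tie at two edges.
print(fast_learner({("A", "B", frozenset()),
                    ("B", "C", frozenset())}))           # ?
```

The second call shows why convergence can be slow: several two-edge patterns cover {Dep(A,B), Dep(B,C)}, so the MC-optimal learner must output ? until further dependencies break the tie.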
SLIDE 15

Complexity Analysis

A list of dependencies is compactly represented by a dependency oracle O: the learner queries O with statements such as “Dep(B,C)?” and receives an answer.

[Figure: a dependency list Dep(A,B), Dep(B,C), Dep(A,C|B) behind an oracle; the query “Dep(B,C)?” is answered yes, the query “Dep(A,C)?” is answered ?.]

Unique k O-cover: Given a dependency oracle O and a bound k, is there a DAG G covering the dependencies in O with ≤ k edges such that all other DAGs G’ covering the dependencies in O have more edges than G?

SLIDE 16

NP-hardness result

Theorem. Unique Exact 3-Set-Cover reduces to Unique k O-Cover, so a polynomial-time algorithm for Unique k O-Cover would imply NP = RP.

Basic idea: construct a dependency oracle that forces a tree.

[Figure: a tree with root R, set nodes C1, C2, …, Cp−1, Cp, and element nodes X1, X2, X3, …, Xm−1, Xm.]

  • Universe: X1, …, Xm.
  • Sets: C1, …, Cp.
  • All elements must be dependent on R.

SLIDE 17

Conclusion

  • Constraint-based approach to BN learning analyzed as a language learning problem.
  • Mind change complexity = (n choose 2), where n is the number of variables.
  • Number of edges: a new, intuitive notion of simplicity for a BN, based on learning theory.
  • Computing the unique fastest mind-change-optimal method is NP-hard.

SLIDE 18

Future Work

  • Heuristic implementation of the MC-optimal learner (GES search).
  • Leads to a new BN learning algorithm with good performance.

SLIDE 19

References

  • W. Luo and O. Schulte. Mind change efficient learning. In COLT 2005, pages 398–412.
  • W. Luo and O. Schulte. Mind change efficient learning. Information and Computation, 204:989–1011, 2006.

THE END