SLIDE 1

fNML Criterion for Learning Bayesian Network Structures

Tomi Silander, Teemu Roos, Petri Kontkanen, Petri Myllymäki

Helsinki Institute for Information Technology HIIT, Finland
PGM-08, Hirtshals, September 17-19, 2008

SLIDE 2

  1. Bayesian Networks
  2. Model Selection Scores
  3. New Stuff: fNML Score

SLIDE 3

Bayesian Networks

Conditional independence assumptions.
Factorization of a joint probability distribution.
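
A standard way to write this factorization is sketched below. The variable names $X_i$ and the parent-set symbol $X_{G_i}$ are assumed notation, chosen to match the $D_i$, $D_{G_i}$ indexing used later in the deck; $m$ is the number of nodes.

```latex
P(X_1, \dots, X_m) = \prod_{i=1}^{m} P\bigl(X_i \mid X_{G_i}\bigr)
```

Here $G_i$ stands for the set of parents of node $i$ in the network $G$, so each factor encodes the assumption that $X_i$ is conditionally independent of its other non-descendants given its parents.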

SLIDE 4

Data

NAME       GENDER  PROFESSION  CHILDREN
Teemu      male    researcher  2
Clark      male    reporter
Margrethe  female  queen       2
:          :       :           :


SLIDE 6

Data

NAME       GENDER  PROFESSION  CHILDREN
Teemu      male    researcher  2
Clark      male    reporter
Margrethe  female  queen       2
:          :       :           :

$D$: the whole data table.

SLIDE 7

Data

NAME       GENDER  PROFESSION  CHILDREN
Teemu      male    researcher  2
Clark      male    reporter
Margrethe  female  queen       2
:          :       :           :

$D_i$: the $i$-th column of the data table.

SLIDE 8

  • Bayes (BDe)
  • BIC & AIC
  • MDL

SLIDE 9

Bayesian Score

The state-of-the-art model selection criterion: the Bayesian Dirichlet equivalent (BDe) score.

  • Assumes a Dirichlet prior on the model parameters θ.
  • Evaluates the marginal likelihood of the data given the model.
  • Depends on the hyper-parameter α.
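
In symbols, the marginal likelihood that the BDe score evaluates can be sketched as follows. The slide names only θ and α; writing the network structure as $G$ and the data as $D$ is an assumption borrowed from the later slides.

```latex
P(D \mid G, \alpha) = \int P(D \mid \theta, G)\, P(\theta \mid G, \alpha)\, d\theta
```

With a Dirichlet prior and complete discrete data this integral has a closed form, which is what makes the score practical to evaluate. The result still depends on the choice of α.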

SLIDE 10

BIC & AIC

  • BIC: asymptotic approximation of the marginal likelihood.
  • AIC: asymptotic approximation of the estimated prediction error.
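
For reference, the textbook forms of the two criteria are sketched below as log-scores to be maximized. The notation is assumed rather than taken from the slide: $\hat{\theta}$ is the maximum likelihood parameter estimate, $k$ the number of free parameters of the network $G$, and $n$ the number of data rows.

```latex
\begin{aligned}
\mathrm{BIC}(G) &= \log P(D \mid \hat{\theta}, G) - \frac{k}{2} \log n \\
\mathrm{AIC}(G) &= \log P(D \mid \hat{\theta}, G) - k
\end{aligned}
```

Both penalize the maximized likelihood by model size. Only the BIC penalty grows with the sample size.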

SLIDE 11

MDL

Minimum Description Length (MDL) principle: choose the model that yields the shortest description of the data together with the model.

  • Too simple a model: data long, model short.
  • "Just right": data short, model short.
  • Too complex a model: data short, model long.


SLIDE 14

Flavours of MDL

  1. "Pedestrian": asymptotic two-part code-length, same as BIC.
  2. "Sophisticated": Bayesian marginal likelihood.
  3. "Champions League": modern (minimax regret optimal) code, normalized maximum likelihood (NML).

Problem: NML is computationally very hard.
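
The usual definition of the NML distribution is sketched below, which may help explain the computational problem. The symbols are assumed notation: $\hat{\theta}(D)$ is the maximum likelihood estimate for data $D$ under structure $G$, and the sum ranges over every possible data set $D'$ of the same size as $D$.

```latex
P_{\mathrm{NML}}(D \mid G) =
  \frac{P\bigl(D \mid \hat{\theta}(D), G\bigr)}
       {\sum_{D'} P\bigl(D' \mid \hat{\theta}(D'), G\bigr)}
```

The numerator is cheap to evaluate. The normalizing sum has a number of terms that grows exponentially with the size of the data, so evaluating it directly is infeasible for a general network.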

SLIDE 15

Bayes vs. MDL (minimax regret)

The Bayesian decision principle is minimization of expected loss:

  $\min_A \mathbb{E}_X[\mathrm{loss}(A, X)]$

MDL (especially NML) is based on minimization of worst-case regret:

  $\min_A \max_X [\mathrm{loss}(A, X) - \min_{A'} \mathrm{loss}(A', X)]$

where the bracketed difference is the "regret".
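
The difference between the two principles can be illustrated with a toy example. The loss table, the prior, and the function names below are made up for illustration and do not come from the talk.

```python
def bayes_action(loss, prior):
    """Index of the action with the smallest expected loss under `prior`."""
    expected = [sum(p * l for p, l in zip(prior, row)) for row in loss]
    return min(range(len(loss)), key=expected.__getitem__)


def minimax_regret_action(loss):
    """Index of the action with the smallest worst-case regret."""
    # Best achievable loss for each outcome X (column-wise minimum).
    best = [min(col) for col in zip(*loss)]
    worst_regret = [max(l - b for l, b in zip(row, best)) for row in loss]
    return min(range(len(loss)), key=worst_regret.__getitem__)


# loss[a][x]: loss of action a when outcome x occurs (hypothetical numbers).
loss = [[1.0, 9.0],
        [5.0, 4.0]]
prior = [0.9, 0.1]  # hypothetical prior over the two outcomes

print(bayes_action(loss, prior))     # 0: expected losses are 1.8 and 4.9
print(minimax_regret_action(loss))   # 1: worst-case regrets are 5.0 and 4.0
```

The expected-loss rule depends on the prior, whereas the minimax-regret rule needs no prior at all. That is the sense in which the latter can be more robust when the prior is poorly chosen.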

SLIDE 16

  • fNML = "factorized NML"
  • computation
  • consistency

SLIDE 17

fNML Score

We propose a new MDL score, factorized NML (fNML), which is

  1. easy to compute,
  2. decomposable (allowing fast search),
  3. robust (experimentally).


SLIDE 19

fNML vs. NML: what's new?

NAME       GENDER  PROFESSION  CHILDREN
Teemu      male    researcher  2
Clark      male    reporter
Margrethe  female  queen       2
:          :       :           :

$D$

NML: minimax code applied to the whole data as one block.

SLIDE 20

fNML vs. NML: what's new?

NAME       GENDER  PROFESSION  CHILDREN
Teemu      male    researcher  2
Clark      male    reporter
Margrethe  female  queen       2
:          :       :           :

$D_2$

fNML: minimax code applied column by column.

SLIDE 21

fNML vs. NML: what's new?

NAME       GENDER  PROFESSION  CHILDREN
Teemu      male    researcher  2
Clark      male    reporter
Margrethe  female  queen       2
:          :       :           :

$D_1$

fNML: conditional minimax code when parent(s) exist.

SLIDE 22

fNML vs. NML: what's new?

NAME       GENDER  PROFESSION  CHILDREN
Teemu      male    researcher  2
Clark      male    reporter
Margrethe  female  queen       2
:          :       :           :

$D_3$

fNML: conditional minimax code when parent(s) exist.


SLIDE 24

fNML vs. NML: what's new?

NAME       GENDER  PROFESSION  CHILDREN
Teemu      male    researcher  2
Clark      male    reporter
Margrethe  female  queen       2
:          :       :           :

$D_4$

fNML: conditional minimax code when parent(s) exist.

Each column is encoded using the minimax code for multinomials. Using fast NML algorithms, this takes O(n log n) per column.
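
A minimal sketch of how such a per-column score might be computed. It assumes that the conditional code treats each group of rows sharing a parent configuration as a separate multinomial sample. It also uses a known linear-time recurrence for the multinomial normalizer, not necessarily the O(n log n) algorithm the slide refers to. The function names and data layout are illustrative, not the authors'.

```python
import math
from collections import Counter, defaultdict


def _xlogx_ratio(a, n):
    """a * log(a / n), with the convention 0 * log 0 = 0."""
    return a * math.log(a / n) if a > 0 else 0.0


def multinomial_normalizer(n, k):
    """NML normalizer C(n, k) for n observations of a k-valued variable."""
    if n == 0 or k == 1:
        return 1.0
    # C(n, 2): sum over all splits of n into counts (h, n - h), in log space.
    c_curr = sum(
        math.exp(
            math.lgamma(n + 1) - math.lgamma(h + 1) - math.lgamma(n - h + 1)
            + _xlogx_ratio(h, n) + _xlogx_ratio(n - h, n)
        )
        for h in range(n + 1)
    )
    c_prev = 1.0  # C(n, 1)
    # Recurrence: C(n, j + 2) = C(n, j + 1) + (n / j) * C(n, j).
    for j in range(1, k - 1):
        c_prev, c_curr = c_curr, c_curr + (n / j) * c_prev
    return c_curr


def fnml_local_score(child, parent_configs, r):
    """Log fNML score of one column.

    child          -- list of observed values of the column
    parent_configs -- list of tuples, the parents' values in each row
                      (use empty tuples when the column has no parents)
    r              -- number of distinct values the column can take
    """
    groups = defaultdict(Counter)
    for value, config in zip(child, parent_configs):
        groups[config][value] += 1
    score = 0.0
    for counts in groups.values():
        n_j = sum(counts.values())
        max_loglik = sum(_xlogx_ratio(c, n_j) for c in counts.values())
        score += max_loglik - math.log(multinomial_normalizer(n_j, r))
    return score


# Two rows, binary column, no parents: P_NML = 0.25 / 2.5 = 0.1.
print(fnml_local_score(["a", "b"], [(), ()], r=2))  # about log(0.1)
```

Because the score is a sum over parent configurations of a maximized log-likelihood minus a log-normalizer, it needs only the counts in each group and no prior hyper-parameter.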

SLIDE 25

fNML: Consistency

(Haughton, 1988): Any penalized likelihood score of the form

  $\log P(D \mid \hat{\theta}, G) - \frac{k}{2} a_n$,

where $k$ is the number of free parameters and $a_n$ satisfies $a_n \to \infty$ and $a_n / n \to 0$, is consistent.

Theorem: fNML behaves asymptotically like BIC, i.e., $a_n = \log n$.

Hence, fNML is consistent.


SLIDE 27

Robustness

[Figure: curves labeled "BIC" and "BDe, fNML".]

BDe is optimal when the prior is "correct"; fNML is almost as good.


SLIDE 29

Robustness

[Figure: curve labeled "fNML".]

BDe is much worse when the prior is "incorrect"; fNML is more robust.

SLIDE 30

Questions?

SLIDE 31

Decomposable Scores

Problem: super-exponential search space.

Solution: decomposable scores,

  $\mathrm{SCORE}(G, D) = \sum_{i=1}^{m} S(D_i, D_{G_i})$

For decomposable scores, exact search (global optimum) can be done for about m ≤ 30 nodes (Koivisto & Sood, 2004; Silander and Myllymäki, 2006).
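
Decomposability can be sketched in code as a sum of per-node terms, each of which looks only at one column and the columns of its parents. The example below plugs in a BIC-style local score purely for illustration. The data layout (a list of dict rows), the `graph` mapping from node to parent tuple, and all function names are assumptions, not part of the talk.

```python
import math
from collections import Counter


def bic_local_score(data, child, parents, arity):
    """BIC-style local score S(D_i, D_Gi) for one node given its parents."""
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    # Maximized conditional log-likelihood of the child column.
    loglik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
    # Free parameters: (child states - 1) per parent configuration.
    k = math.prod(arity[p] for p in parents) * (arity[child] - 1)
    return loglik - 0.5 * k * math.log(n)


def decomposable_score(data, graph, arity, local_score=bic_local_score):
    """SCORE(G, D) as a sum of local scores, one per node."""
    return sum(local_score(data, child, parents, arity)
               for child, parents in graph.items())


data = [{"A": 0, "B": 0}, {"A": 1, "B": 1}]   # hypothetical two-row data set
arity = {"A": 2, "B": 2}

empty_graph = {"A": (), "B": ()}
a_to_b = {"A": (), "B": ("A",)}

print(decomposable_score(data, empty_graph, arity))  # -5.0 * log(2)
print(decomposable_score(data, a_to_b, arity))       # -3.5 * log(2)
```

Changing the parents of one node changes only that node's term. This is the property that lets search procedures cache and reuse local scores instead of rescoring the whole network.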