
SLIDE 1

Sub-quadratic mixture models

Sub-quadratic Markov tree mixture models for probability density estimation

Sourour Ammar¹, Ph. Leray¹, L. Wehenkel²

¹ Équipe COnnaissances et Décision, LINA UMR 6241, École Polytechnique de l'Université de Nantes

² Department of EECS and GIGA-Research, University of Liège

COMPSTAT’2010 - Paris - 22-27 August 2010

  • S. Ammar et al.

Sub-quadratic mixture models (1/22)

SLIDE 2

Sub-quadratic mixture models

A simple idea

Proposal: develop density estimation techniques that can scale to very high-dimensional spaces by exploiting the Perturb & Combine idea with probabilistic graphical models.

Outline

Background · Our proposal · Some results · Conclusions and further work


SLIDE 8

Sub-quadratic mixture models · Background · Perturb and combine principle

P&C principle in supervised learning

Principle (Bagging, Random forests, Extremely randomized trees):

[Figure: a learning algorithm (search procedure) is perturbed by randomization into m weak algorithms 1, 2, ..., m; each weak algorithm produces its own prediction; the m predictions are then combined (weighting scheme) into a final prediction.]

How can we apply this idea to density estimation with Bayesian networks (BN)?



SLIDE 12

Sub-quadratic mixture models · Background · Density estimation with BN

Density estimation with BN

[Figure: a Bayesian network over variables A, B, C, D, E is learned, within a constrained structure space S̃, from a dataset D of binary samples over A–E.]


SLIDE 17

Sub-quadratic mixture models · Background · Density estimation with BN

Bayesian averaging

Instead of searching for a single optimal model (structure + parameters):

Assume prior probabilities over the space of structures; determine the posterior probability of each model given a dataset; approximate the target distribution by averaging the different models, weighted by their posterior probabilities.

Caveat: exact Bayesian averaging over a large space of models is not scalable

⇒ it requires strongly constraining the space of structures
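The Bayesian averaging scheme above can be written compactly; the notation below is assumed (it is not spelled out on the slide), with S̃ the constrained structure space:

```latex
P(x \mid D) \;=\; \sum_{S \in \tilde{S}} P(S \mid D)\; P(x \mid S, D)
```

Exact computation requires summing over all of S̃, which is why the space of structures must be strongly constrained.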

SLIDE 18

Sub-quadratic mixture models

Outline

Background · Our proposal · Some results · Conclusions and further work

SLIDE 19

Sub-quadratic mixture models · Proposal

Strategy

Use simple spaces of graphical structures S̃ (e.g. chains, trees, poly-trees, etc.). Do not assume that the target distribution is representable by one of these structures. Rather, assume that the target distribution may be approximated well by a mixture of a reasonable number of (S, θ∗) pairs, S ∈ S̃.
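This strategy amounts to a finite mixture; the formula below is a sketch of the intended model, with symbols taken from the surrounding slides:

```latex
\hat{P}_m(x) \;=\; \sum_{i=1}^{m} \mu_i \, P_{T_i}(x \mid \theta^*_i),
\qquad \mu_i \ge 0, \quad \sum_{i=1}^{m} \mu_i = 1
```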


SLIDE 24

Sub-quadratic mixture models · Proposal

Generic algorithm principle

[Figure: from the dataset D over variables A–E and the structure space S̃, draw m tree structures T1, T2, ..., Tm; learn parameters θ∗1, θ∗2, ..., θ∗m; compute weights µ1, µ2, ..., µm.]

Final model: (θ∗i, µi), i = 1..m
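The final model (θ∗i, µi), i = 1..m, is used by averaging the individual tree densities. A minimal sketch, not the authors' code (function and argument names are illustrative):

```python
# Evaluate the ensemble density as a convex combination of the m
# individual Markov tree densities. Uniform weights (mu_i = 1/m) are
# used when no weights are given.
def mixture_density(x, tree_densities, weights=None):
    m = len(tree_densities)
    if weights is None:
        weights = [1.0 / m] * m
    return sum(w * p(x) for w, p in zip(weights, tree_densities))
```

With Bayesian averaging, the uniform weights would simply be replaced by posterior-proportional ones.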


SLIDE 27

Sub-quadratic mixture models · Proposal

Degrees of freedom

[Figure: the same pipeline as the generic algorithm, from the dataset D and structure space S̃ to the final model (θ∗i, µi), i = 1..m.]

What space S̃? How to generate candidate structures? What weights?

SLIDE 28

Sub-quadratic mixture models · Proposal

Why Markov tree space?

Polytrees, although more expressive, do not yield more accurate ensemble models than undirected trees [Ammar et al. 2008].

Markov trees:

(-) "poor" independence models

But:

(+) Inference and parameter learning are scalable (linear complexity)
(+) Uniform sampling of trees (linear complexity)
(+) Optimal tree structure learning is polynomial (MWST)
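The linear-complexity claims follow from the tree factorization P(x) = P(x_root) · ∏ P(x_i | x_parent(i)): evaluating the density is one pass over the n variables. A minimal sketch under assumed container names (`parent`, `log_cpts` are illustrative, not the authors' API):

```python
import math

# A Markov tree density factorizes as
#   P(x) = P(x_root) * prod_i P(x_i | x_parent(i)),
# so evaluating it is a single pass over the n variables: linear complexity.
def tree_log_density(x, parent, log_cpts):
    """x: dict var -> value; parent: dict var -> parent var (None for the root);
    log_cpts[var](value, parent_value) -> log conditional probability."""
    total = 0.0
    for v, p in parent.items():
        parent_value = x[p] if p is not None else None
        total += log_cpts[v](x[v], parent_value)
    return total
```

For example, with two binary variables A → B, the result is log P(A) + log P(B | A).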

SLIDE 29

Sub-quadratic mixture models · Proposal

How to generate candidate structures?

Random uniform sampling [Ammar et al. 2009]: linear complexity O(n) and good results. Build the optimal tree (MWST) over a bootstrap replica of the original dataset D [Ammar et al. 2009]: quadratic complexity O(n² log(n²)) and better results.

Goal: how can we improve the complexity of our quadratic methods while keeping the same accuracy?
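Uniform sampling of labeled trees (the first option above) is classically done by drawing a random Prüfer sequence and decoding it. A sketch, not the authors' implementation; the heap makes this O(n log n), and linear-time decoders exist:

```python
import heapq
import random

# Draw a uniform random labeled tree on n >= 2 nodes by decoding a random
# Prufer sequence: each of the n^(n-2) labeled trees is equally likely.
def random_tree(n, rng):
    if n == 2:
        return [(0, 1)]
    prufer = [rng.randrange(n) for _ in range(n - 2)]
    degree = [1] * n
    for a in prufer:
        degree[a] += 1
    leaves = [i for i in range(n) if degree[i] == 1]
    heapq.heapify(leaves)
    edges = []
    for a in prufer:
        leaf = heapq.heappop(leaves)   # smallest current leaf
        edges.append((leaf, a))
        degree[a] -= 1
        if degree[a] == 1:
            heapq.heappush(leaves, a)
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges
```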


SLIDE 31

Sub-quadratic mixture models · Proposal

Principle of MWST

Step 1: fill an (n × n) mutual information matrix — complexity O(n²)
Step 2: build the optimal tree (Kruskal) — complexity E log(E), with E = n²
Step 3: learn parameters — linear complexity

⇒ Solution: reduce the first step's complexity by applying the Perturb & Combine principle
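Steps 1–2 can be sketched as follows on a precomputed mutual-information matrix (a hypothetical helper, not the authors' code): Kruskal with union-find greedily keeps the heaviest edges that do not create a cycle, yielding the maximum-weight spanning tree.

```python
# Kruskal's algorithm over the O(n^2) candidate edges of a mutual
# information matrix, heaviest first, with union-find to reject cycles.
def kruskal_max_spanning_tree(mi):
    n = len(mi)
    edges = sorted(((mi[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
                   reverse=True)               # highest MI first
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]      # path halving
            x = parent[x]
        return x
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                           # edge joins two components
            parent[ri] = rj
            tree.append((i, j))
            if len(tree) == n - 1:
                break
    return tree
```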


SLIDE 38

Sub-quadratic mixture models · Proposal

Sub-quadratic search heuristics

Partial mutual information matrix MI_K, with K = the number of MI terms actually computed.

Choosing K = n log(n) gives:

Step 1: n log(n)
Step 2: n log(n) log(n log(n))

⇒ Total complexity = n log(n) log(n log(n)) ≪ quadratic, i.e. quasi-linear
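The partial matrix MI_K can be sketched as follows (assumed names, not the authors' code): instead of filling all n(n−1)/2 entries, score only K = n log(n) randomly chosen distinct pairs; Kruskal is then run on this partial edge set, which may yield a forest rather than a full spanning tree.

```python
import math
import random

# Score only K = n log(n) distinct variable pairs instead of the full
# O(n^2) mutual information matrix. mi_fn(i, j) is a hypothetical
# pairwise MI estimator.
def partial_mi_edges(n, mi_fn, rng):
    k = max(n - 1, int(n * math.log(n)))   # at least n-1 candidate edges
    seen = set()
    edges = []
    while len(edges) < k:
        i, j = rng.randrange(n), rng.randrange(n)
        if i == j:
            continue
        key = (min(i, j), max(i, j))
        if key in seen:                    # pair already scored
            continue
        seen.add(key)
        edges.append((mi_fn(*key), *key))
    return edges
```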


SLIDE 44

Sub-quadratic mixture models · Proposal

General algorithm: Naive Heuristic

[Figure: the dataset D is resampled into m replicas D1, D2, ..., Dm; for each replica a partial mutual information matrix MI1, MI2, ..., MIm is computed; Kruskal is run on each partial matrix, yielding the tree structures S1, S2, ..., Sm.]
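The naive heuristic's outer loop can be sketched as pure orchestration; all helper names here are hypothetical stand-ins for the steps in the figure, not the authors' code:

```python
import random

# Orchestration sketch of the Naive Heuristic: each of the m trees is
# built from its own replica of the data, using a partially filled MI
# matrix and a Kruskal pass.
def naive_heuristic(data, m, score_partial_mi, kruskal, rng):
    """score_partial_mi(replica, rng) -> list of (weight, i, j) edges;
    kruskal(edges) -> tree structure."""
    trees = []
    for _ in range(m):
        replica = [rng.choice(data) for _ in data]   # bootstrap resample
        trees.append(kruskal(score_partial_mi(replica, rng)))
    return trees
```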


SLIDE 55

Sub-quadratic mixture models · Proposal

General algorithm: Inertial Heuristic

[Figure: from the dataset D, a first replica D1 yields a partial mutual information matrix MI1 and, via Kruskal, a tree S1; for each subsequent replica D2, ..., Dm, the MI entries corresponding to the previous tree's edges are kept and only a partial set of new MI terms is computed before running Kruskal again, yielding S2, ..., Sm.]
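The inertial idea can be sketched as follows (assumed names, not the authors' code): each iteration's candidate edge set is the previous tree's n−1 edges plus roughly n log(n) newly sampled pairs, so previously selected edges persist across iterations while the MI matrix is still only partially evaluated.

```python
import math
import random

# Build the candidate edge list for one inertial iteration: keep the
# previous tree's edges ("inertia") and add ~n log(n) fresh random pairs.
# mi_fn(i, j) is a hypothetical pairwise MI estimator.
def inertial_candidates(n, prev_tree, mi_fn, rng):
    cands = {(min(i, j), max(i, j)) for i, j in prev_tree}
    budget = int(n * math.log(n))
    while len(cands) < len(prev_tree) + budget:
        i, j = rng.randrange(n), rng.randrange(n)
        if i != j:
            cands.add((min(i, j), max(i, j)))
    return [(mi_fn(i, j), i, j) for i, j in cands]
```

Running Kruskal on this candidate list then yields the next tree structure.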

SLIDE 56

Sub-quadratic mixture models · Proposal

Experimental protocol

Test problems: 1000 binary variables; 10 target distributions: DAG (1000 vars); 10 repeated experiments, with |D| = 1000.

Algorithm variants: mixtures of Markov trees of growing size (m = 1, 10, 20, ..., 150), with two variants for generating the trees: sub-quadratic heuristics over the initial dataset D, and sub-quadratic heuristics over a bootstrap replica of D. Parameter learning: MAP with uniform priors. Different weighting schemes: uniform weighting (i.e. µi = 1/m) and Bayesian averaging (i.e. µi ∝ BDeu score).

Accuracy evaluation: an approximated asymmetric Kullback-Leibler divergence; accuracy is better when the estimated KL divergence is lower.
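The accuracy criterion, with P the target distribution and P̂_m the mixture estimate (notation assumed), is the asymmetric KL divergence:

```latex
\mathrm{KL}(P \,\|\, \hat{P}_m) \;=\; \sum_{x} P(x)\,\log \frac{P(x)}{\hat{P}_m(x)}
```

With 1000 binary variables the exact sum over all configurations is intractable, which is presumably why the slides speak of an approximated ("approached") divergence, estimated from samples drawn from the target.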


SLIDE 59

Sub-quadratic mixture models · Results

Naive Heuristic results

[Figure: KL divergence vs. number of components (up to 150); legend: naive sub-quadratic mixtures with uniform weights, randomized mixtures.]

Our naive sub-quadratic mixtures clearly outperform the randomized methods with uniform weighting.


SLIDE 61

Sub-quadratic mixture models · Results

Inertial Heuristic results

[Figure: KL divergence vs. number of components (up to 150); legend: randomized mixtures, inertial sub-quadratic mixtures (D), inertial sub-quadratic mixtures (bagging), CL.]

Our inertial methods outperform the randomized ones (linear complexity). Our inertial methods approach the base method CL (quadratic complexity).

SLIDE 62

Sub-quadratic mixture models · Conclusion

Conclusion

Our sub-quadratic (quasi-linear) mixtures outperform the randomized (linear) methods. Our inertial methods approach the base method CL (quadratic). Bagging does not improve the estimation quality.

SLIDE 63

Sub-quadratic mixture models · Conclusion

Further work

Improve our methods: linear approximation of the optimal tree [Chazelle 2000]. Compare with scalable optimal structure learning algorithms (sparse candidate [Friedman et al. 1999], MMHC [Tsamardinos et al. 2006], ...). Consider sequential combination methods (boosting, MCMC).

SLIDE 64

Sub-quadratic mixture models · Conclusion

Bibliography

  • S. Ammar, Ph. Leray, B. Defourny, and L. Wehenkel. 2008. High-dimensional probability density estimation with randomized ensembles of tree structured Bayesian networks. In Proceedings of the Fourth European Workshop on Probabilistic Graphical Models (PGM08), pages 9–16.

  • S. Ammar, Ph. Leray, B. Defourny, and L. Wehenkel. 2009. Probability density estimation by perturbing and combining tree structured Markov networks. In Proceedings of the 10th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU09), pages 156–167.

  • B. Chazelle. 2000. A minimum spanning tree algorithm with inverse-Ackermann type complexity. J. ACM, 47(6):1028–1047.

  • N. Friedman, I. Nachman, and D. Pe'er. 1999. Learning Bayesian network structure from massive datasets: the "sparse candidate" algorithm. In Proc. Fifteenth Conf. on Uncertainty in Artificial Intelligence, pages 206–215.

  • I. Tsamardinos, L. E. Brown, and C. F. Aliferis. 2006. The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 65(1):31–78.
