Analysis of MAP in CRP Normal-Normal model, Łukasz Rajkowski



slide-1
SLIDE 1

Analysis of MAP in CRP Normal-Normal model

Łukasz Rajkowski

Faculty of Mathematics, Informatics and Mechanics, University of Warsaw
l.rajkowski@mimuw.edu.pl

November 28, 2016

Łukasz Rajkowski Analysis of MAP in CRP Normal-Normal model

slide-12
SLIDE 12

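Chinese Restaurant Process

The Chinese Restaurant Process with parameter α can be viewed as a distribution on the space of partitions of a finite set. What is the probability of

{{1, 2, 4, 6}, {3}, {5, 7}} ?

[Figure: customers 1, ..., 7 seated one by one at the tables {1, 2, 4, 6}, {3}, {5, 7}]

P(new table) ∝ α, P(join table) ∝ # sitting there

$$P = \frac{\alpha}{\alpha} \cdot \frac{1}{1+\alpha} \cdot \frac{\alpha}{2+\alpha} \cdot \frac{2}{3+\alpha} \cdot \frac{\alpha}{4+\alpha} \cdot \frac{3}{5+\alpha} \cdot \frac{1}{6+\alpha}$$

Definition: $\mathcal{J} \sim \mathrm{CRP}_n(\alpha)$ means that, for every partition $\mathcal{I}$ of $\{1, \dots, n\}$,

$$P(\mathcal{J} = \mathcal{I}) = \frac{\alpha^{|\mathcal{I}|}}{\alpha^{(n)}} \prod_{J \in \mathcal{I}} (|J| - 1)!,$$

where $\alpha^{(n)} = \alpha(\alpha + 1) \cdots (\alpha + n - 1)$.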
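The seating argument and the closed-form definition can be checked against each other numerically. The following is a minimal sketch using exact fractions; the function names `crp_sequential` and `crp_closed_form` are illustrative and not taken from the talk.

```python
from fractions import Fraction
from math import factorial


def crp_sequential(partition, alpha):
    """Seat customers 1..n one by one and multiply the seating probabilities."""
    n = sum(len(block) for block in partition)
    table_of = {i: t for t, block in enumerate(partition) for i in block}
    counts = [0] * len(partition)          # customers currently at each table
    prob = Fraction(1)
    for i in range(1, n + 1):
        t = table_of[i]
        # weight alpha for opening a new table, else the number already sitting there
        weight = counts[t] if counts[t] > 0 else alpha
        prob *= Fraction(weight) / ((i - 1) + alpha)
        counts[t] += 1
    return prob


def crp_closed_form(partition, alpha):
    """alpha^{#blocks} / alpha^{(n)} * prod over blocks of (|J| - 1)!."""
    n = sum(len(block) for block in partition)
    rising = Fraction(1)
    for k in range(n):
        rising *= alpha + k                # alpha (alpha + 1) ... (alpha + n - 1)
    prod = 1
    for block in partition:
        prod *= factorial(len(block) - 1)
    return Fraction(alpha) ** len(partition) * prod / rising


partition = [{1, 2, 4, 6}, {3}, {5, 7}]
print(crp_sequential(partition, 1), crp_closed_form(partition, 1))  # 1/840 1/840
```

For α = 1 both routes give 6/7! = 1/840, and they agree for any other positive α as well, which is the content of the definition.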

slide-21
SLIDE 21

CRP Normal-Normal model

n observations of dimension d may be modelled as follows:

$$\mathcal{J} \sim \mathrm{CRP}_n(\alpha)$$

$$\theta = (\theta_J)_{J \in \mathcal{J}} \mid \mathcal{J} \overset{iid}{\sim} N(\mu, T)$$

$$x_J = (x_j)_{j \in J} \mid \mathcal{J}, \theta \overset{iid}{\sim} N(\theta_J, \Sigma) \quad \text{for } J \in \mathcal{J}$$

Example: $\mathcal{J} = \{\{1, 2, 4, 6\}, \{3\}, \{5, 7\}\}$

The 'true' partition is not known.

Advantage: no need to define an a priori limit on the number of clusters.

Task: compute the distribution of $\mathcal{J}$ given the observation x.
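To make the generative model concrete, here is a small sampling sketch. It assumes the isotropic special case µ = 0, T = τ²I, Σ = σ²I and uses 0-based indices; the names `sample_crp` and `sample_model` are hypothetical helpers, not part of the talk.

```python
import random


def sample_crp(n, alpha, rng):
    """Sample a partition of {0, ..., n-1} by sequential seating."""
    clusters = []
    for i in range(n):
        # existing tables are weighted by their size, a new table by alpha
        weights = [len(c) for c in clusters] + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(clusters):
            clusters.append([i])
        else:
            clusters[k].append(i)
    return clusters


def sample_model(n, d, alpha, tau, sigma, seed=0):
    """Draw (partition, observations) with mu = 0, T = tau^2 I, Sigma = sigma^2 I."""
    rng = random.Random(seed)
    clusters = sample_crp(n, alpha, rng)
    x = [None] * n
    for cluster in clusters:
        # one mean vector per cluster, drawn from N(0, tau^2 I)
        theta = [rng.gauss(0.0, tau) for _ in range(d)]
        for j in cluster:
            # observations scatter around the cluster mean with variance sigma^2
            x[j] = [rng.gauss(t, sigma) for t in theta]
    return clusters, x


clusters, x = sample_model(n=7, d=2, alpha=1.0, tau=3.0, sigma=0.5)
print(clusters)
```

The number of clusters is not fixed in advance: it is whatever the seating process produces, which is the advantage noted on the slide.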

slide-23
SLIDE 23

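CRP Normal-Normal model

The Posterior

For $\mu = 0$, $T = \tau^2 I$, $\Sigma = \sigma^2 I$:

$$P(\mathcal{J} \mid x) \propto \left(\frac{\alpha}{\tau^{d}}\right)^{|\mathcal{J}|} \prod_{J \in \mathcal{J}} \frac{|J|!}{|J|} \left(\frac{|J|}{\sigma^2} + \frac{1}{\tau^2}\right)^{-d/2} \cdot \exp\left\{ \frac{1}{2\sigma^2} \sum_{J \in \mathcal{J}} \frac{|J| \cdot \|\bar{x}_J\|^2}{1 + \frac{\sigma^2/\tau^2}{|J|}} \right\} =: Q_x(\mathcal{J}),$$

where $\bar{x}_J = \frac{1}{|J|} \sum_{j \in J} x_j$ is the mean of the observations in cluster $J$.

The MAP

The Maximal Posterior Partition (MAP) is defined by

$$\hat{\mathcal{J}}(x) = \operatorname{argmax}_{\mathcal{J}} P(\mathcal{J} \mid x)$$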
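For very small n the MAP partition can be found by brute force. The sketch below assumes the isotropic case µ = 0, T = τ²I, Σ = σ²I and scores a partition by the logarithm of: (α/τ^d) per cluster, times (|J| − 1)!, times (|J|/σ² + 1/τ²)^(−d/2), times exp of |J|·‖x̄_J‖² / (2σ²(1 + σ²/(τ²|J|))). That expression is what integrating out the cluster means gives under these assumptions; the function names are illustrative only, and the enumeration grows like the Bell numbers, so it is not a practical algorithm.

```python
from math import lgamma, log


def log_q(partition, x, alpha, tau2, sigma2):
    """Log of the unnormalised posterior of a partition (blocks of 0-based indices)."""
    d = len(x[0])
    total = 0.0
    for block in partition:
        m = len(block)
        mean = [sum(x[j][k] for j in block) / m for k in range(d)]
        sq_norm = sum(v * v for v in mean)
        total += log(alpha) - 0.5 * d * log(tau2)          # alpha / tau^d per cluster
        total += lgamma(m)                                  # log (m - 1)!
        total -= 0.5 * d * log(m / sigma2 + 1.0 / tau2)
        total += m * sq_norm / (2.0 * sigma2 * (1.0 + sigma2 / (tau2 * m)))
    return total


def set_partitions(items):
    """Yield every partition of a list as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # put `first` into each existing block in turn ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or into a new block of its own
        yield [[first]] + part


def map_partition(x, alpha, tau2, sigma2):
    """Exhaustive search for the partition with the largest posterior score."""
    indices = list(range(len(x)))
    return max(set_partitions(indices), key=lambda p: log_q(p, x, alpha, tau2, sigma2))


x = [[-5.0], [-5.1], [5.0], [5.2]]   # made-up data: two well-separated groups
best = map_partition(x, alpha=1.0, tau2=100.0, sigma2=1.0)
print(sorted(sorted(block) for block in best))
```

On this toy input the search returns the two groups {0, 1} and {2, 3}.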

slide-24
SLIDE 24

MAP properties (in Normal model)

Property 1: $\hat{\mathcal{J}}_{\mathrm{MAP}}(x)$ is a convex partition with respect to x, i.e. the convex hulls of its clusters are disjoint.

[Figures: three example partitions, captioned "Convex and lovely", "Convex but not lovely" and "Not convex and disastrous"]
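As an illustration of Property 1, convexity is easy to test in one dimension, where the convex hull of a cluster is just the interval from its minimum to its maximum. The sketch below assumes that "convex partition" means pairwise disjoint convex hulls of the clusters; `is_convex_partition_1d` is a hypothetical helper name.

```python
def is_convex_partition_1d(partition, x):
    """Check that the cluster hulls [min, max] are pairwise disjoint (d = 1 only)."""
    intervals = sorted(
        (min(x[j] for j in block), max(x[j] for j in block)) for block in partition
    )
    # after sorting by left end, it is enough to compare neighbouring intervals
    return all(
        prev_hi < lo for (_, prev_hi), (lo, _) in zip(intervals, intervals[1:])
    )


x = [0.1, 0.2, 0.9, 1.0]
print(is_convex_partition_1d([[0, 1], [2, 3]], x))  # True
print(is_convex_partition_1d([[0, 2], [1, 3]], x))  # False
```

In higher dimensions the same check needs a test for intersecting convex hulls, which this sketch does not attempt.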

slide-30
SLIDE 30

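MAP properties (in Normal model)

Property 1: $\hat{\mathcal{J}}_{\mathrm{MAP}}(x)$ is a convex partition with respect to x.

Property 2: If $\left(\frac{1}{n} \sum_{i=1}^{n} x_i\right)_{n=1}^{\infty}$ is bounded then the size of the largest cluster in $\hat{\mathcal{J}}_{\mathrm{MAP}}(x)$ is of order n (at least a constant fraction of n for large n).

Property 3: If $(x_n)_{n=1}^{\infty}$ is bounded then the size of the smallest cluster in $\hat{\mathcal{J}}_{\mathrm{MAP}}(x)$ is of order n (at least a constant fraction of n for large n).

Corollary: The number of clusters in $\hat{\mathcal{J}}_{\mathrm{MAP}}(x)$ is bounded.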

slide-32
SLIDE 32

Randomisation of input

$P$ is a probability distribution on $\mathbb{R}^d$, $(X_n)_{n=1}^{\infty} \overset{iid}{\sim} P$, $\hat{\mathcal{J}}_n = \hat{\mathcal{J}}(X_{1:n})$, and $M_n$, $m_n$ are the sizes of the largest and smallest cluster.

1. $\hat{\mathcal{J}}_n$ is convex (w.r.t. $X_{1:n}$).
2. If $E\|X\|^4 < \infty$ then $\liminf M_n / n > 0$ almost surely.
3. If $\operatorname{supp} P$ is bounded then $\liminf m_n / n > 0$ almost surely.

⇒ $(|\hat{\mathcal{J}}_n|)_{n=1}^{\infty}$ is bounded almost surely.

Remark: There exist $P$ such that $E\|X\|^4 < \infty$ and $\liminf m_n = 1$ almost surely.

slide-35
SLIDE 35

Induced partitions

Definition (induced partition): If $\mathcal{A}$ is a partition of $\mathbb{R}^d$ then

$$\mathcal{J}^{\mathcal{A}}_n = \bigl\{ \{i \le n : X_i \in A\} : A \in \mathcal{A} \bigr\}.$$

Example (from the figure): $\mathcal{J}^{\mathcal{A}}_7 = \{\{1\}, \{2, 7\}, \{3, 4, 6\}, \{5\}\}$
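The definition amounts to grouping sample indices by the region their point falls into. A minimal sketch, assuming the partition of the space is given as a labelling function (here called `region_of`, a hypothetical name) and that regions containing no sample point are simply left out:

```python
from math import floor


def induced_partition(points, region_of):
    """Group the indices 1..n by the region label of the corresponding point."""
    blocks = {}
    for i, p in enumerate(points, start=1):
        blocks.setdefault(region_of(p), []).append(i)
    return sorted(blocks.values())


# Made-up one-dimensional data; the regions are the unit intervals [k, k + 1).
points = [0.5, 1.2, 2.1, 2.7, 3.3, 2.9, 1.8]
print(induced_partition(points, floor))  # [[1], [2, 7], [3, 4, 6], [5]]
```

The toy points were chosen so that the output has the same shape as the example partition on the slide; they are not the data behind the original figure.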

slide-37
SLIDE 37

Induced partitions

Proposition:

$$\sqrt[n]{Q_{X_{1:n}}(\mathcal{J}^{\mathcal{A}}_n)} \overset{a.s.}{\approx} \frac{n}{e} \exp\{\Delta(\mathcal{A})\},$$

where

$$\Delta(\mathcal{A}) = \frac{1}{2\sigma^2} \sum_{A \in \mathcal{A}} P(A) \cdot \|E(X \mid A)\|^2 + \sum_{A \in \mathcal{A}} P(A) \ln P(A)$$

Tools: Stirling formula, SLLN

$\Delta(\mathcal{A})$ ∼ the difference between the scaled variance of $E(X \mid \sigma(X^{-1}(\mathcal{A})))$ and the entropy of that CEV.

slide-45
SLIDE 45

MAP revisited

$$\hat{\mathcal{A}}_n = \bigl\{ \operatorname{conv}\{x_j : j \in J\} : J \in \hat{\mathcal{J}}_n \bigr\}$$

These are disjoint sets, but not a partition.

Proposition:

$$\sqrt[n]{Q_{X_{1:n}}(\hat{\mathcal{J}}_n)} \overset{a.s.}{\approx} \frac{n}{e} \exp\bigl\{\Delta(\hat{\mathcal{A}}_n)\bigr\}$$

if $P$ has bounded support and is continuous w.r.t. Lebesgue measure.

Proof:

Stirling: We need a bounded number of components in $\hat{\mathcal{J}}_n$ and their sizes growing infinitely large. This follows from the bounded support of $P$.

SLLN: We need the SLLN to hold uniformly for the class of all convex sets. This holds because the boundary of every convex set can be covered by countably many hyperplanes plus a set of $P$-measure 0 (Pollard, Elker, Stute, 1979).

slide-47
SLIDE 47

MAP limits

General assumption: $P$ has bounded support and is continuous w.r.t. Lebesgue measure.

Preparation:

  • every bounded sequence of compact and convex sets has a subsequence converging in the Hausdorff metric $d_H$, given by $d_H(A, B) = \inf\{\epsilon > 0 : A \subseteq (B)_\epsilon \text{ and } B \subseteq (A)_\epsilon\}$
  • for a bounded sequence of compact and convex sets, convergence in $d_H$ implies convergence in $d_P$, where $d_P(A, B) = P(A \div B)$ (the probability of the symmetric difference)

Definition: A $P$-partition is a family $\mathcal{A}$ of $P$-measurable sets such that $P(A \cap B) = 0$ for distinct $A, B \in \mathcal{A}$ and $P(\bigcup \mathcal{A}) = 1$.

slide-49
SLIDE 49

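Main theorem

Theorem: Every limiting point of $(\hat{\mathcal{A}}_n)_{n=1}^{\infty}$ is a finite, convex $P$-partition that maximises $\Delta$. Moreover, the distance of $(\hat{\mathcal{A}}_n)_{n=1}^{\infty}$ to the set of maximisers of $\Delta$ is decreasing to 0.

Proof:

Why convex? Because convergence in the Hausdorff metric preserves convexity.

Why $P$-partition? Basic set operations are continuous w.r.t. the $d_P$ metric; moreover, the set of sampled points is dense in $\operatorname{supp} P$.

Why maximiser of $\Delta$? Let $\hat{\mathcal{A}}_{n_k} \to \hat{\mathcal{A}}$. Take any other finite, convex $P$-partition $\mathcal{A}$. Then

$$\sqrt[n]{Q_{X_{1:n}}(\mathcal{J}^{\mathcal{A}}_n)} \approx \frac{n}{e} \exp(\Delta(\mathcal{A}))$$

$$\sqrt[n]{Q_{X_{1:n}}(\hat{\mathcal{J}}_n)} \approx \frac{n}{e} \exp(\Delta(\hat{\mathcal{A}}_n)) \approx \frac{n}{e} \exp(\Delta(\hat{\mathcal{A}}))$$

Since $\hat{\mathcal{J}}_n$ maximises $Q_{X_{1:n}}$, the second quantity is at least the first, hence $\Delta(\hat{\mathcal{A}}) \ge \Delta(\mathcal{A})$.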
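To get a feel for what a maximiser of Δ looks like, consider a toy case: P uniform on [-1, 1] and, as a simplifying assumption made here, only partitions into K equal intervals. This restricted family need not contain the true maximiser over all convex P-partitions; the sketch only illustrates how the preferred number of cells depends on σ². The function names are hypothetical.

```python
from math import log


def delta_equal_intervals(K, sigma2):
    """Delta for Uniform[-1, 1] split into K equal intervals."""
    # E(X | A_k) is the midpoint of the k-th interval, and P(A_k) = 1 / K
    mids = [-1.0 + (2 * k + 1) / K for k in range(K)]
    spread = sum(m * m for m in mids) / K          # sum_k P(A_k) * E(X | A_k)^2
    return spread / (2.0 * sigma2) - log(K)        # entropy term is -ln K


def best_K(sigma2, K_max=50):
    """Number of equal intervals with the largest Delta, searched up to K_max."""
    return max(range(1, K_max + 1), key=lambda K: delta_equal_intervals(K, sigma2))


print(best_K(1.0))    # 1
print(best_K(0.01))   # 6
```

With a large within-cluster variance a single cell is preferred, while a small σ² favours a finer, but still finite, split.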

slide-50
SLIDE 50

References

Jeffrey W. Miller and Matthew T. Harrison. Inconsistency of Pitman-Yor process mixtures for the number of components. Journal of Machine Learning Research, 15:3333–3370, 2014.

Łukasz Rajkowski. Analysis of MAP in CRP Normal-Normal model. Available on arXiv.

slide-52
SLIDE 52

Thank you for your attention
