Imprecise probability models for inference in exponential families


SLIDE 1

Imprecise probability models for inference in exponential families

SYSTeMS-dialogue of 14 July 2005

Erik Quaeghebeur, SYSTeMS research group


SLIDE 2

Overview

  • 1. The general idea
  • 2. Specifying the details
  • 3. A useful result
  • 4. Updating
  • 5. History: how this research got started
  • 6. An application: classification
  • 7. Conclusions


SLIDE 3

The general idea

Sampling model f(x | ψ): likelihood function Lx(ψ), sufficient statistic.

Choose some prior C(ψ): obtain a posterior after observing samples.

Obtain the corresponding predictive distribution through

P(x) = ∫_Ψ C·Lx.

Obtain the corresponding linear previsions PC and PP.

Imprecision: take a set of priors and use the lower envelope theorem to obtain coherent lower previsions P̲C and P̲P.


SLIDE 4

Specifying the details: sampling model

Exponential family sampling model Ef(x | ψ): likelihood Lx(ψ), sufficient statistic τ(x) of fixed dimension.

Ef(x | ψ) = a(x) exp(⟨ψ, τ(x)⟩ − b(ψ)).

Multinomial sampling Likelihood function is a multivariate Bernoulli Br(x | θ):

x ∈ {0, 1}^d with Σᵢ xᵢ ≤ 1; τ(x) = x; θ ∈ (0, 1)^d with Σᵢ θᵢ < 1 and θ₀ = 1 − Σᵢ θᵢ; ψ(θ) = (ln(θᵢ/θ₀))ᵢ₌₁,…,d; a = 1; b(ψ(θ)) = −ln(θ₀).


Normal sampling Likelihood is a Normal N(x | µ, λ):

x ∈ ℝ; τ(x) = (x, x²); µ ∈ ℝ, λ ∈ ℝ⁺, σ² = 1/λ; ψ(µ, λ) = (λµ, −λ/2); a = 1/√(2π); b(ψ(µ, λ)) = (λµ² − ln λ)/2.

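As a quick sanity check on these formulas, here is a minimal Python sketch (NumPy and SciPy assumed available) that evaluates the Normal density through its exponential-family form a(x) exp(⟨ψ, τ(x)⟩ − b(ψ)) and compares it with the direct formula:

```python
import numpy as np
from scipy.stats import norm

def ef_normal(x, mu, lam):
    """Normal density via the exponential-family form a(x) exp(<psi, tau(x)> - b(psi))."""
    a = 1.0 / np.sqrt(2.0 * np.pi)          # base measure a(x)
    tau = np.array([x, x**2])               # sufficient statistic tau(x) = (x, x^2)
    psi = np.array([lam * mu, -lam / 2.0])  # natural parameters psi(mu, lam)
    b = (lam * mu**2 - np.log(lam)) / 2.0   # log-partition b(psi)
    return a * np.exp(psi @ tau - b)

x, mu, lam = 1.3, 0.5, 2.0                  # lam = 1 / sigma^2
print(ef_normal(x, mu, lam))                # 0.2975...
print(norm.pdf(x, loc=mu, scale=1.0 / np.sqrt(lam)))  # same value, direct formula
```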

SLIDE 5

Specifying the details: conjugate

Choose a conjugate prior CEf(ψ | n0, y0): easily obtain a posterior CEf(ψ | nk, yk) after observing k samples.

CEf(ψ | n, y) = c(n, y) exp(n [⟨ψ, y⟩ − b(ψ)])


Multinomial sampling The conjugate distribution is a Dirichlet distribution Di(θ | ny, ny₀):

y ∈ (0, 1)^d with Σᵢ yᵢ < 1 and y₀ = 1 − Σᵢ yᵢ; c(n, y) = Γ(n) / (Γ(ny₀) ∏ᵢ Γ(nyᵢ)).


Normal sampling The conjugate distribution is a Normal-gamma distribution N(µ | y₁, nλ)·Ga(λ | (n+3)/2, n[y₂ − y₁²]/2):

y ∈ ℝ × ℝ⁺ with y₂ − y₁² > 0; c(n, y) = (2√n / √(2π)) · (n[y₂ − y₁²]/2)^((n+3)/2) / Γ((n+3)/2).

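To make the (n, y) parametrization concrete, a minimal Python sketch (NumPy and SciPy assumed) that maps (n, y) to standard Dirichlet parameters and checks that c(n, y) is exactly the usual Dirichlet normalization constant:

```python
import numpy as np
from scipy.stats import dirichlet
from scipy.special import gammaln

def alpha_from_ny(n, y):
    """Map (n, y) to standard Dirichlet parameters (n*y_1, ..., n*y_d, n*y_0)."""
    y = np.asarray(y, dtype=float)
    return np.append(n * y, n * (1.0 - y.sum()))

n, y = 4.0, [0.2, 0.3]                     # d = 2, so y_0 = 0.5
alpha = alpha_from_ny(n, y)                # [0.8, 1.2, 2.0]
theta = np.array([0.1, 0.4, 0.5])          # a point in the simplex (theta_1, theta_2, theta_0)
log_c = gammaln(n) - gammaln(alpha).sum()  # ln c(n, y) = ln Gamma(n) - sum_i ln Gamma(n y_i)
# The Dirichlet log-density minus its kernel recovers exactly ln c(n, y):
kernel = np.sum((alpha - 1.0) * np.log(theta))
print(np.isclose(dirichlet.logpdf(theta, alpha) - kernel, log_c))  # True
```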

SLIDE 6

Specifying the details: predictive

Obtain the corresponding predictive distribution through

PEf(x | n, y) = ∫_Ψ CEf(· | n, y)·Lx = c(n, y)·a(x) / c(n + 1, (ny + τ(x))/(n + 1)).


Multinomial sampling The predictive distribution is a Dirichlet-multinomial distribution DiMn(x | ny, ny₀).

Normal sampling The predictive distribution is a Student distribution St(x | y₁, (n+3)/[(n+1)(y₂ − y₁²)], n + 3).

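For the simplest multinomial case (a single Bernoulli attribute, d = 1), the ratio of normalization constants can be computed directly; a small Python sketch (SciPy assumed) verifying that it yields a proper distribution with P(x = 1) = y:

```python
import numpy as np
from scipy.special import gammaln

def log_c(n, y):
    """ln c(n, y) for d = 1: c(n, y) = Gamma(n) / (Gamma(n y) Gamma(n y0))."""
    return gammaln(n) - gammaln(n * y) - gammaln(n * (1.0 - y))

def predictive(x, n, y):
    """P_Ef(x | n, y) = a(x) c(n, y) / c(n+1, (n y + tau(x))/(n+1)); a = 1, tau(x) = x."""
    y_new = (n * y + x) / (n + 1.0)
    return np.exp(log_c(n, y) - log_c(n + 1.0, y_new))

n, y = 2.0, 0.3
print(predictive(1, n, y))                        # 0.3 = y
print(predictive(0, n, y) + predictive(1, n, y))  # 1.0: a proper distribution
```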

SLIDE 7

Specifying the details: linear previsions

Obtain the corresponding linear previsions

PC(f | nk, y) = ∫_Ψ CEf(· | nk, y)·f, f ∈ L(Ψ) ≈ [Ψ → ℝ],

and

PP(f | nk, y) = ∫_X PEf(· | nk, y)·f, f ∈ L(X) ≈ [X → ℝ].

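A linear prevision is simply an expectation. Continuing the Bernoulli example (d = 1), a minimal sketch of PP for a gamble f on X = {0, 1}, using the fact shown above that the Bernoulli predictive puts probability y on x = 1:

```python
def prevision_P(f, n, y):
    """Linear prevision P_P(f | n, y) = sum_x P_Ef(x | n, y) f(x) on X = {0, 1}.
    For Bernoulli sampling P_Ef(1 | n, y) = y, so n does not appear explicitly."""
    return (1.0 - y) * f(0) + y * f(1)

f = lambda x: 3.0 if x == 1 else -1.0  # a gamble: win 3 on x = 1, lose 1 on x = 0
print(prevision_P(f, n=2.0, y=0.3))    # 0.3 * 3 - 0.7 * 1 = 0.2
```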

SLIDE 8

Specifying the details: lower previsions

Imprecision: take a set of priors, one for every y ∈ Y₀, and use the lower envelope theorem to obtain coherent lower previsions

P̲C(· | nk, Yk) = inf_{y∈Yk} PC(· | nk, y)

and

P̲P(· | nk, Yk) = inf_{y∈Yk} PP(· | nk, y).

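Continuing the Bernoulli example, a sketch of the lower envelope over an interval of y values; since the prevision is linear in y, the inf and sup are attained at the interval's endpoints:

```python
def prevision_P(f, n, y):
    """Linear prevision of a gamble f on X = {0, 1} under the Bernoulli predictive."""
    return (1.0 - y) * f(0) + y * f(1)

def lower_prevision_P(f, n, Y):
    """Lower envelope: inf over y in Y (here, a finite set of extreme points)."""
    return min(prevision_P(f, n, y) for y in Y)

f = lambda x: 3.0 if x == 1 else -1.0
Y = [0.1, 0.9]                                      # endpoints of the set of y values
print(lower_prevision_P(f, 2.0, Y))                 # -0.6, attained at y = 0.1
print(-lower_prevision_P(lambda x: -f(x), 2.0, Y))  # conjugate upper prevision: 2.6
```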

SLIDE 9

A useful result

P(τ | ψ) = ∫_X Ef(· | ψ)·τ

Multinomial sampling P(τ | ψ) = θ(ψ). Normal sampling P(τ | ψ) = (µ(ψ), m₂(ψ)).

Considered as a function P(τ | Ψ) of the parameter, this gives

PC(P(τ | Ψ) | nk, yk) = yk,

P̲C(P(τ | Ψ) | nk, Yk) = inf Yk,  P̄C(P(τ | Ψ) | nk, Yk) = sup Yk.

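A quick Monte Carlo illustration of this result for Bernoulli sampling (NumPy assumed), where P(τ | ψ) = θ and the prior on θ reduces to a Beta distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
n, y = 2.0, 0.3                       # conjugate parameters (n, y)
# Prior Di(theta | ny, ny0) reduces to Beta(n*y, n*(1 - y)) for d = 1.
theta = rng.beta(n * y, n * (1.0 - y), size=200_000)
# P(tau | psi) = theta, so P_C(P(tau | Psi) | n, y) should equal y.
print(theta.mean())                   # ~0.3 = y
```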

SLIDE 10

Updating

Initial choice n0 ∈ ℝ⁺ and Y0 ⊂ Y (bounded). Take k samples, keep the sufficient statistic τk. Update the parameters:

nk = n0 + k, Yk = {(n0·y + τk)/(n0 + k) : y ∈ Y0} ⊂ Y.

[Figure: multinomial sampling — after a "1" is observed, the set Yk−1 (nk−1 = 2) in the simplex is pulled toward the corresponding vertex τ(x), giving Yk (nk = 3).]

[Figure: normal sampling — after a sample x is observed, the set Yk−1 (nk−1 = 2) in the (y1, y2)-plane is pulled toward τ(x), giving Yk (nk = 3).]
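A minimal Python sketch of this update for Bernoulli sampling, tracking only the extreme points of Y0 (the update is affine in y, so extreme points map to extreme points):

```python
def update(n0, Y0, tau_k, k):
    """Parameter updating: n_k = n0 + k, Y_k = {(n0*y + tau_k)/(n0 + k) : y in Y0}."""
    nk = n0 + k
    return nk, [(n0 * y + tau_k) / nk for y in Y0]

n0, Y0 = 2.0, [0.1, 0.9]       # initial set Y0 given by its extreme points
x = [1, 0, 1, 1, 0]            # five observed Bernoulli samples
tau_k, k = sum(x), len(x)      # sufficient statistic tau_k = 3, k = 5
nk, Yk = update(n0, Y0, tau_k, k)
print(nk, Yk)                  # 7.0 [0.457..., 0.685...]: Yk shrinks toward tau_k / k
```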

SLIDE 11

History: how this research got started

Using the IDM, we ran into difficult optimization problems. A literature search turned up no solution, but it led to the realization that the idea underlying the IDM for multinomial sampling generalizes to all exponential family sampling models: a common interpretation for the parameters n and y, and easy updating. However, using these models can again lead to difficult optimization problems.


SLIDE 12

An application: classification

Classifier: maps attributes a ∈ A to classes c ∈ C.

Credal classifier: uses P(· | A) on L(C) to create an ordering of the classes, i.e., is P(fc′ − fc′′ | a) > 0?

P(· | A) is created using a class model P on L(C) and attribute models P(· | C) on L(A).

Classically, both models are IDMMs; here, any P̲P(· | nA|C, YA|C) is possible.

Advantages: allows for continuous attributes; straightforward training. Disadvantage: optimization problems are harder to solve.


Example optimization problems:

Multinomial sampling (i.e., multiple discrete attributes)

c′ ≻ c′′ ⟺ inf_{y∈YC} [ yc′ · ∏ᵢ inf_{yAᵢ|c′ ∈ YAᵢ|c′} yaᵢ|c′ − yc′′ · ∏ᵢ sup_{yAᵢ|c′′ ∈ YAᵢ|c′′} yaᵢ|c′′ ] > 0.

Normal sampling (i.e., one normal attribute) Replace the products above by the inf / sup over yA|c ∈ YA|c of

√(nA|c / (nA|c + 1)) · [Γ((nA|c + 4)/2) / Γ((nA|c + 3)/2)] · [nA|c·yA|c,2 − nA|c·y²A|c,1]^((nA|c+3)/2) / [nA|c·yA|c,2 + a² − (nA|c·yA|c,1 + a)²/(nA|c + 1)]^((nA|c+4)/2).

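To make the multinomial criterion concrete, a hypothetical Python sketch (all numbers invented) for interval-valued parameters; it assumes YC is a product of intervals, so the outer inf decouples and every inf/sup is attained at an interval endpoint (for a general YC this gives a conservative bound):

```python
def dominates(y_c1, y_c2, attrs_c1, attrs_c2):
    """c' dominates c'' iff inf [y_c' prod_i inf y_{a_i|c'} - y_c'' prod_i sup y_{a_i|c''}] > 0.
    All parameters are intervals (lo, hi); the infs/sups sit at the endpoints."""
    lower = y_c1[0]
    for lo, hi in attrs_c1:
        lower *= lo                      # inf of each attribute term for c'
    upper = y_c2[1]
    for lo, hi in attrs_c2:
        upper *= hi                      # sup of each attribute term for c''
    return lower - upper > 0

# Invented intervals for the class probabilities and two discrete attributes:
y_c1, y_c2 = (0.5, 0.6), (0.2, 0.3)
attrs_c1 = [(0.7, 0.8), (0.6, 0.7)]      # intervals for y_{a_i | c'}
attrs_c2 = [(0.4, 0.5), (0.5, 0.6)]      # intervals for y_{a_i | c''}
print(dominates(y_c1, y_c2, attrs_c1, attrs_c2))  # True: 0.21 > 0.09
```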

SLIDE 13

Conclusions

We presented two imprecise probability models for inference in exponential families: one for making inferences about the parameter describing the sampling model, the other for making inferences about future samples.

They are applicable to a large range of sampling models, and thus potentially useful for many applications.

However, difficult optimization problems might severely limit their use.


SLIDE 14

Time for questions!
