SLIDE 1

Parameter Estimation in Mixtures of Truncated Exponentials

Helge Langseth¹, Thomas D. Nielsen², Rafael Rumí³, Antonio Salmerón³

  • ¹Dept. of Computer and Information Science, The Norwegian University of Science and Technology, Norway
  • ²Dept. of Computer Science, Aalborg University, Denmark
  • ³Dept. of Statistics and Applied Mathematics, University of Almería, Spain

PGM, September 2008

SLIDE 2

Outline

1. Background
   • Motivation
   • Mixtures of Truncated Exponentials

2. Learning MTEs from data
   • Background
   • Maximum likelihood estimation in MTEs
   • Constrained optimisation and Lagrange multipliers
   • The Newton-Raphson method
   • The initialisation procedure

3. Model selection
   • Locating splitpoints
   • Determining model complexity

4. Conclusions

SLIDE 3


Mixtures of Truncated Exponentials

[Figure: the standard normal density f(z) and the logistic function P(Y = 1|z)]

Z → Y

Calculate P(Y = 1) in Hugin: “Illegal link”

f(z) = (1/√(2π)) · exp(−z²/2)

P(Y = 1|z) = 1/(1 + exp(−z))

SLIDE 4


Mixtures of Truncated Exponentials

[Figure: MTE approximations of the standard normal density and of the logistic function]

Z → Y

Calculate P(Y = 1) with MTEs: P(Y = 1) ≈ 0.4996851

f(z) =
  −0.0172 + 0.931 · exp(1.27z)    if −3 ≤ z < −1
  0.442 − 0.0385 · exp(−1.64z)    if −1 ≤ z < 0
  0.442 − 0.0385 · exp(1.64z)     if 0 ≤ z < 1
  −0.0172 + 0.9314 · exp(−1.27z)  if 1 ≤ z < 3

P(Y = 1|z) =
  0                               if z < −5
  −0.0217 + 0.522 · exp(0.635z)   if −5 ≤ z < 0
  1.0217 − 0.522 · exp(−0.635z)   if 0 ≤ z ≤ 5
  1                               if z > 5
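A quick numerical sanity check of this number, assuming only the piecewise coefficients shown above. The product of two MTE potentials is again an MTE, so the exact computation is closed-form; plain numerical integration suffices for a check:

    import numpy as np

    def f_z(z):
        """MTE approximation of the standard normal density on [-3, 3]."""
        if -3 <= z < -1:
            return -0.0172 + 0.931 * np.exp(1.27 * z)
        if -1 <= z < 0:
            return 0.442 - 0.0385 * np.exp(-1.64 * z)
        if 0 <= z < 1:
            return 0.442 - 0.0385 * np.exp(1.64 * z)
        if 1 <= z <= 3:
            return -0.0172 + 0.9314 * np.exp(-1.27 * z)
        return 0.0

    def p_y1(z):
        """MTE approximation of the logistic link P(Y = 1|z)."""
        if z < -5:
            return 0.0
        if z < 0:
            return -0.0217 + 0.522 * np.exp(0.635 * z)
        if z <= 5:
            return 1.0217 - 0.522 * np.exp(-0.635 * z)
        return 1.0

    # Marginalise Z out numerically: P(Y = 1) = integral of f(z) * P(Y = 1|z) dz.
    zs = np.linspace(-3.0, 3.0, 60001)
    print(np.trapz([f_z(z) * p_y1(z) for z in zs], zs))   # ~0.4997, matching the slide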

SLIDE 5


The MTE model

Definition (Univariate MTE potential over a continuous variable)

Let Z be a continuous variable. A function f : Ω_Z → R₀⁺ is an MTE potential over Z if

1. f(z) = a_0 + Σ_{i=1}^m a_i · exp(b_i · z) for all z ∈ Ω_Z, where the a_i, b_i are real numbers, or

2. there is a partition of Ω_Z into intervals I_1, . . . , I_k such that f is defined as above on each I_j.

Generalization to arbitrary hybrid domains (Moral et al. 2001): the definition transfers to multivariate domains containing both continuous and discrete variables.
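The definition maps directly onto a small data structure. A minimal sketch with illustrative names (MTEPiece and mte_eval are not from the paper); the example instantiates the four-piece normal approximation from the earlier slide:

    import math
    from dataclasses import dataclass

    @dataclass
    class MTEPiece:
        lo: float     # interval treated as half-open [lo, hi)
        hi: float
        a0: float     # constant term
        terms: list   # list of (a_i, b_i) pairs

    def mte_eval(pieces, z):
        """Evaluate f(z) = a0 + sum_i a_i * exp(b_i * z) on the piece containing z."""
        for p in pieces:
            if p.lo <= z < p.hi:
                return p.a0 + sum(a * math.exp(b * z) for a, b in p.terms)
        return 0.0

    normal_mte = [
        MTEPiece(-3, -1, -0.0172, [(0.931, 1.27)]),
        MTEPiece(-1, 0, 0.442, [(-0.0385, -1.64)]),
        MTEPiece(0, 1, 0.442, [(-0.0385, 1.64)]),
        MTEPiece(1, 3, -0.0172, [(0.9314, -1.27)]),
    ]
    print(mte_eval(normal_mte, 0.0))   # ~0.40, close to the N(0,1) density at 0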

SLIDE 6


Outline

1. Background
   • Motivation
   • Mixtures of Truncated Exponentials

2. Learning MTEs from data
   • Background
   • Maximum likelihood estimation in MTEs
   • Constrained optimisation and Lagrange multipliers
   • The Newton-Raphson method
   • The initialisation procedure

3. Model selection
   • Locating splitpoints
   • Determining model complexity

4. Conclusions

SLIDE 7


Learning MTEs from data

[Figure: histogram of the observed data sample]

The MTE learning problem: How to find the MTE distribution that generated this data?

SLIDE 8


Learning MTEs from data

The learning task involves three basic steps:

1. Determine the intervals into which Ω_Z will be partitioned.

2. Determine the number of exponential terms in the mixture for each interval.

3. Estimate the parameters.

Simplifying assumptions: In this work we are concerned with the univariate case. For simplicity we will initially assume that:

  • The intervals into which Ω_Z will be partitioned are known;
  • The number of exponential terms in the mixture for each interval is fixed to 2, giving target density f(z) = k + a · exp(b · z) + c · exp(d · z), written out in the sketch below.
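For later reference, the simplified target density written as a Python helper; a sketch only, with parameter names taken from the formula above:

    import math

    def f_target(z, k, a, b, c, d):
        """Simplified target density on one interval: f(z) = k + a*exp(b*z) + c*exp(d*z)."""
        return k + a * math.exp(b * z) + c * math.exp(d * z)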

SLIDE 10


Learning MTEs from data by Maximum Likelihood

Why learn MTEs using Maximum Likelihood?

  • Well-developed core theory, including good asymptotic properties under regularity conditions.
  • ML parameters give access to a variety of model selection procedures: LRT or BIC for selecting the number of exponential terms; likelihood maximisation for locating split-points.

Problems

  • The likelihood equations cannot be solved analytically.
  • Identifiability of the parameters.

SLIDE 11


Initial observations

We will assume target density

f(z|θ_j) = k_j + a_j · exp(b_j · z) + c_j · exp(d_j · z),  z ∈ I_j

for interval I_j, with θ_j = {k_j, a_j, b_j, c_j, d_j}. Denote by n_j the number of observations in interval I_j and let N = Σ_j n_j. Then the ML solution θ̂_j must satisfy

∫_{z ∈ I_j} f(z|θ̂_j) dz = n_j / N.   (1)

Parameter independence: θ̂_k can be found independently of θ̂_l as long as Equation (1) is satisfied for all θ̂_j. We will therefore look at a single interval I from now on (and drop the index j when appropriate).
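Since f has this simple exponential form, the left-hand side of Equation (1) is available in closed form. A small sketch, with illustrative endpoint names lo and hi:

    import math

    def interval_mass(theta, lo, hi):
        """Closed-form integral of f(z|theta) = k + a*exp(b*z) + c*exp(d*z) over [lo, hi]."""
        k, a, b, c, d = theta
        def exp_int(coef, rate):
            # integral of coef*exp(rate*z) over [lo, hi]; the rate -> 0 limit is coef*(hi - lo)
            return coef * (hi - lo) if rate == 0 else (coef / rate) * (math.exp(rate * hi) - math.exp(rate * lo))
        return k * (hi - lo) + exp_int(a, b) + exp_int(c, d)

    # Equation (1) then reads: interval_mass(theta_hat_j, lo_j, hi_j) == n_j / N.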

SLIDE 12


Constrained optimisation

Maximize

log L(θ|z) = Σ_{i: z_i ∈ I} log L(θ|z_i) = Σ_{i: z_i ∈ I} log f(z_i|θ)

subject to

∫_{z ∈ I} f(z|θ) dz − n/N = 0,
f(e1|θ) ≥ 0,
f(e2|θ) ≥ 0,

where e1 and e2 denote the endpoints of the interval I.

SLIDE 13


Constrained optimisation

Maximize

log L(θ|z) = Σ_{i: z_i ∈ I} log L(θ|z_i) = Σ_{i: z_i ∈ I} log f(z_i|θ)

subject to

∫_{z ∈ I} f(z|θ) dz − n/N = 0,
f(e1|θ) − s1² = 0,
f(e2|θ) − s2² = 0,

where the slack variables s1, s2 turn the nonnegativity inequalities of the previous slide into equality constraints.

SLIDE 14


Constrained optimisation

Maximize

log L(θ|z) = Σ_{i: z_i ∈ I} log L(θ|z_i) = Σ_{i: z_i ∈ I} log f(z_i|θ)

subject to

∫_{z ∈ I} f(z|θ) dz − n/N = 0,
f(e1|θ) − s1² = 0,
f(e2|θ) − s2² = 0.

Notation:

φ = [θᵀ sᵀ]ᵀ,  ψ = [θᵀ sᵀ λᵀ]ᵀ = [φᵀ λᵀ]ᵀ,
g0(φ) = ∫_{z ∈ I} f(z|θ) dz − n/N,
g1(φ) = f(e1|θ) − s1²,  g2(φ) = f(e2|θ) − s2².

Lagrange multipliers: Find the root of ∇_ψ (log L(θ|z) + λᵀ g(φ)) to solve the constrained optimisation problem.
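To make the root-finding target concrete, a generic sketch that forms ∇_ψ (log L(θ|z) + λᵀ g(φ)) by central finite differences. Here loglik and g are assumed callables for log L(θ|z) and the constraint vector (g0, g1, g2); an analytic gradient would be preferable in practice:

    import numpy as np

    def lagrangian_gradient(loglik, g, psi, n_theta, n_s, eps=1e-6):
        """Numerical gradient w.r.t. psi = [theta, s, lambda] of log L + lambda^T g."""
        psi = np.asarray(psi, dtype=float)
        def objective(p):
            theta = p[:n_theta]
            phi = p[:n_theta + n_s]        # phi = [theta, s]
            lam = p[n_theta + n_s:]
            return loglik(theta) + lam @ g(phi)
        grad = np.zeros_like(psi)
        for j in range(len(psi)):
            d = np.zeros_like(psi)
            d[j] = eps
            grad[j] = (objective(psi + d) - objective(psi - d)) / (2 * eps)
        return grad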

SLIDE 15


The Newton-Raphson method

Example: Find x s.t. h(x) = 0.

SLIDE 16


The Newton-Raphson method

Example: Find x s.t. h(x) = 0. Initial “guess”: x = x0; approximate h(x) by its tangent at x0.

SLIDE 17


The Newton-Raphson method

Example: Find x s.t. h(x) = 0. New “guess” x1: the point where the tangent crosses the abscissa.

SLIDE 18


The Newton-Raphson method

Example: Find x s.t. h(x) = 0. Iterate using the general formula x_{t+1} ← x_t − {h′(x_t)}⁻¹ · h(x_t).
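The iteration fits in a few lines. A minimal sketch, with an illustrative test function h(x) = x³ − 2 that is not from the slides:

    def newton_raphson(h, h_prime, x0, tol=1e-12, max_iter=100):
        """Find a root of h via x_{t+1} = x_t - h(x_t) / h'(x_t)."""
        x = x0
        for _ in range(max_iter):
            step = h(x) / h_prime(x)
            x -= step
            if abs(step) < tol:
                break
        return x

    print(newton_raphson(lambda x: x**3 - 2, lambda x: 3 * x**2, x0=1.0))   # ~1.2599 = 2**(1/3)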

SLIDE 19


The Lagrange Multipliers method

Maximise likelihood given constraints: Use the multivariate Newton-Raphson method to solve

A(ψ|z) ≡ ∇_ψ (log L(θ|z) + λᵀ g(φ)) = 0:

ψ_{t+1} ← ψ_t − J(A(ψ_t|z))⁻¹ · A(ψ_t|z).

Initialisation of Newton-Raphson: Choose θ0 “randomly”, giving s0 = [√f(e1|θ0), √f(e2|θ0)]ᵀ; λ0 = [1 1]ᵀ (chosen rather arbitrarily).
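The multivariate step replaces the reciprocal derivative with a Jacobian solve. A generic sketch, assuming A is a callable returning the gradient vector (for instance the finite-difference lagrangian_gradient sketched earlier) and approximating the Jacobian numerically; an analytic Jacobian, where available, is cheaper and more stable:

    import numpy as np

    def jacobian(A, psi, eps=1e-6):
        """Central finite-difference Jacobian of the vector field A at psi."""
        n = len(psi)
        J = np.zeros((n, n))
        for j in range(n):
            d = np.zeros(n)
            d[j] = eps
            J[:, j] = (A(psi + d) - A(psi - d)) / (2 * eps)
        return J

    def newton_system(A, psi0, tol=1e-8, max_iter=100):
        """Solve A(psi) = 0 via psi_{t+1} = psi_t - J(A(psi_t))^{-1} A(psi_t)."""
        psi = np.asarray(psi0, dtype=float)
        for _ in range(max_iter):
            step = np.linalg.solve(jacobian(A, psi), A(psi))
            psi = psi - step
            if np.linalg.norm(step) < tol:
                break
        return psi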

SLIDE 20


Example run, Lagrange multipliers

[Figure: log-likelihood surface over (b0, d0); both axes range from −6 to 6]

Likelihood of the example data D = {z1, . . . , zn}; the value at the point (b0, d0) is given as

max_{k,a,c} Σ_i log (k + a · exp(b0 · z_i) + c · exp(d0 · z_i)).

SLIDE 21


Initialisation of the Newton-Raphson method

Initialisation procedure – Main idea: Instead of maximising over 5 parameters under the constraint ∫_{z ∈ I} f(z|θ) dz = n/N, we iteratively maximise over pairs of parameters. One parameter is varied freely; the other is chosen to make sure that the constraint is fulfilled. A high-dimensional constrained optimisation problem is thus replaced by a series of “unconstrained” optimisation problems, each in one dimension.

SLIDE 22


Initialisation algorithm

Initialisation: Choose some “random” starting values for θ, making sure that ∫_{z ∈ I} f(z|θ) dz = n/N.

[Diagram: the three components of f: the constant k, a · exp(b · z), and c · exp(d · z)]

SLIDE 23


Initialisation algorithm

Maximise over a; compensate using k: k is determined by a, to make sure that ∫_{z ∈ I} f(z|θ) dz = n/N.

a ← arg max_{a′} Σ_{i: z_i ∈ I} log f(z_i | k′ = func(θ, a′), a′, θ).

SLIDE 24


Initialisation algorithm

Maximise over c; compensate using k: k is determined by c, to make sure that ∫_{z ∈ I} f(z|θ) dz = n/N.

c ← arg max_{c′} Σ_{i: z_i ∈ I} log f(z_i | k′ = func(θ, c′), c′, θ).

SLIDE 25


Initialisation algorithm

Maximise over b; compensate using a: a is determined by b, to make sure that ∫_{z ∈ I} f(z|θ) dz = n/N.

b ← arg max_{b′} Σ_{i: z_i ∈ I} log f(z_i | a′ = func(θ, b′), b′, θ).

SLIDE 26


Initialisation algorithm

Maximise over d; compensate using c: c is determined by d, to make sure that ∫_{z ∈ I} f(z|θ) dz = n/N.

d ← arg max_{d′} Σ_{i: z_i ∈ I} log f(z_i | c′ = func(θ, d′), d′, θ).

SLIDE 27


Initialisation algorithm

Check for convergence: At this point all parameters have been updated at least once. Calculate the likelihood and check whether there is a significant improvement. If improved, iterate again; otherwise return.
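Taken together, slides 22–27 amount to a coordinate-style search. The sketch below simplifies one detail for brevity: it always compensates with k, whereas the slides compensate b with a and d with c. A grid search stands in for the one-dimensional maximisations; grids maps each free-parameter index (1: a, 2: b, 3: c, 4: d) to candidate values (rates of exactly 0 excluded), and all names are illustrative:

    import numpy as np

    def k_for_mass(a, b, c, d, lo, hi, target):
        """Choose k so that f integrates to target (= n/N) on [lo, hi]."""
        def exp_int(coef, rate):
            return (coef / rate) * (np.exp(rate * hi) - np.exp(rate * lo))
        return (target - exp_int(a, b) - exp_int(c, d)) / (hi - lo)

    def loglik(theta, z):
        k, a, b, c, d = theta
        f = k + a * np.exp(b * z) + c * np.exp(d * z)
        return np.log(f).sum() if np.all(f > 0) else -np.inf

    def initialise(z, lo, hi, target, theta0, grids, max_sweeps=20):
        # theta0 is assumed to already satisfy the mass constraint (slide 22)
        theta, best = list(theta0), loglik(theta0, z)
        for _ in range(max_sweeps):
            improved = False
            for idx in (1, 3, 2, 4):              # a, c, b, d in turn, as on the slides
                for val in grids[idx]:
                    cand = list(theta)
                    cand[idx] = val
                    cand[0] = k_for_mass(*cand[1:], lo, hi, target)   # compensate with k
                    ll = loglik(cand, z)
                    if ll > best:
                        theta, best, improved = cand, ll, True
            if not improved:                      # no significant improvement: return
                break
        return theta, best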

SLIDE 28


Example run (with initialisation)

[Figure: example run with initialisation: log-likelihood surface over (b0, d0), both axes from −6 to 6]

SLIDE 29


Outline

1. Background
   • Motivation
   • Mixtures of Truncated Exponentials

2. Learning MTEs from data
   • Background
   • Maximum likelihood estimation in MTEs
   • Constrained optimisation and Lagrange multipliers
   • The Newton-Raphson method
   • The initialisation procedure

3. Model selection
   • Locating splitpoints
   • Determining model complexity

4. Conclusions

SLIDE 30


Model selection: Split-point for dataset

New data set: 50 samples from the standard Normal distribution.

[Figure: likelihood of the data using ML estimators for different split-points; split-point location from −2.5 to 2 on the x-axis, log-likelihood from −155 to −120 on the y-axis]

SLIDE 31


Model selection: No. parameters per interval

[Figure: fitted density over the data; x-axis from −2.5 to 2]

Interval [−2.5200, −0.1303):
  • Constant term: L(θ̂_1|z) = −77.641.
  • 1 exponential term: L(θ̂_1|z) = −55.317 ⇒ p = 0.000.
  • 2 exponential terms: L(θ̂_1|z) = −55.314 ⇒ p = 0.996.
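These p-values are consistent with a likelihood-ratio test in which each added exponential term contributes two free parameters (df = 2; an assumption, not stated on the slide). A sketch whose output matches the slide up to rounding of the printed log-likelihoods:

    from scipy.stats import chi2

    def lrt_pvalue(loglik_small, loglik_big, df=2):
        """LRT: 2*(logL_big - logL_small) is compared against a chi-squared(df)."""
        return chi2.sf(2.0 * (loglik_big - loglik_small), df)

    print(lrt_pvalue(-77.641, -55.317))   # constant vs. 1 term:  ~0.000
    print(lrt_pvalue(-55.317, -55.314))   # 1 term vs. 2 terms:   ~0.997 (slide: 0.996)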

SLIDE 32


Model selection: No. parameters per interval

[Figure: fitted density over the data; x-axis from −2.5 to 2]

Interval [−0.1303, 2.2368):
  • Constant term: L(θ̂_2|z) = −77.742.
  • 1 exponential term: L(θ̂_2|z) = −64.490 ⇒ p = 0.000.
  • 2 exponential terms: L(θ̂_2|z) = −64.490 ⇒ p = 1.000.

SLIDE 33


Outline

1. Background
   • Motivation
   • Mixtures of Truncated Exponentials

2. Learning MTEs from data
   • Background
   • Maximum likelihood estimation in MTEs
   • Constrained optimisation and Lagrange multipliers
   • The Newton-Raphson method
   • The initialisation procedure

3. Model selection
   • Locating splitpoints
   • Determining model complexity

4. Conclusions

SLIDE 34


Conclusions

We have described an efficient method for learning ML estimates of univariate MTEs.

  • The ML estimates are fairly robust, and the improvement over the traditional (regression-based) method is substantial.
  • ML estimates can be used for model selection:
      • the number of exponential terms in each interval;
      • the number of split-points, and their location.

Ongoing work: extension to conditional distributions.
  • Learning the parameters of conditional distributions (“solved”).
  • Locating split-points (difficult; some progress has been made).
