 
Parameter Estimation in Mixtures of Truncated Exponentials

Helge Langseth (1), Thomas D. Nielsen (2), Rafael Rumí (3), Antonio Salmerón (3)

(1) Dept. of Computer and Information Science, The Norwegian University of Science and Technology, Norway
(2) Dept. of Computer Science, Aalborg University, Denmark
(3) Dept. of Statistics and Applied Mathematics, University of Almería, Spain

PGM, September 2008

Langseth, Nielsen, Rumí and Salmerón: Parameter estimation in MTEs
Outline

1. Background
   - Motivation
   - Mixtures of Truncated Exponentials
2. Learning MTEs from data
   - Background
   - Maximum likelihood estimation in MTEs
   - Constrained optimisation and Lagrange multipliers
   - The Newton-Raphson method
   - The initialisation procedure
3. Model selection
   - Locating splitpoints
   - Determining model complexity
4. Conclusions
Mixtures of Truncated Exponentials

Density of Z (standard normal):
$$f(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\tfrac{1}{2} z^2\right)$$
[Figure: plot of f(z) for z in [-3, 3]]

Conditional distribution of Y given Z (logistic):
$$P(Y = 1 \mid z) = \frac{1}{1 + \exp(-z)}$$
[Figure: plot of P(Y = 1 | z) for z in [-6, 6]]

Calculate P(Y = 1) in Hugin: "Illegal link"
Mixtures of Truncated Exponentials

MTE approximation of the density of Z:
$$f(z) = \begin{cases}
-0.0172 + 0.931\, e^{1.27 z} & \text{if } -3 \le z < -1 \\
0.442 - 0.0385\, e^{-1.64 z} & \text{if } -1 \le z < 0 \\
0.442 - 0.0385\, e^{1.64 z} & \text{if } 0 \le z < 1 \\
-0.0172 + 0.9314\, e^{-1.27 z} & \text{if } 1 \le z < 3
\end{cases}$$
[Figure: plot of f(z) for z in [-3, 3]]

MTE approximation of the conditional distribution of Y given Z:
$$P(Y = 1 \mid z) = \begin{cases}
0 & \text{if } z < -5 \\
-0.0217 + 0.522\, e^{0.635 z} & \text{if } -5 \le z < 0 \\
1.0217 - 0.522\, e^{-0.635 z} & \text{if } 0 \le z \le 5 \\
1 & \text{if } z > 5
\end{cases}$$
[Figure: plot of P(Y = 1 | z) for z in [-6, 6]]

Calculate P(Y = 1) with MTEs: P(Y = 1) ≈ 0.4996851
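The marginal P(Y = 1) is the integral of f(z) · P(Y = 1 | z) over the support of Z. The sketch below illustrates one way to compute it in closed form, using the fact that a product of two MTE pieces is again a constant plus exponentials. The data layout and the helper names (`F_PIECES`, `P_PIECES`, `multiply`, `integrate_terms`, `prob_y1`) are assumptions made for illustration; only the coefficients come from the slide.

```python
import math

# Each piece is (lo, hi, k, [(a, b), ...]) meaning k + sum(a * exp(b * z)) on [lo, hi).
# Coefficients are copied from the slide; the layout itself is an assumption.
F_PIECES = [
    (-3.0, -1.0, -0.0172, [(0.931, 1.27)]),
    (-1.0, 0.0, 0.442, [(-0.0385, -1.64)]),
    (0.0, 1.0, 0.442, [(-0.0385, 1.64)]),
    (1.0, 3.0, -0.0172, [(0.9314, -1.27)]),
]
P_PIECES = [
    (-5.0, 0.0, -0.0217, [(0.522, 0.635)]),
    (0.0, 5.0, 1.0217, [(-0.522, -0.635)]),
]


def integrate_terms(k, terms, lo, hi):
    """Closed-form integral of k + sum(a * exp(b * z)) over [lo, hi]."""
    total = k * (hi - lo)
    for a, b in terms:
        if b == 0.0:
            total += a * (hi - lo)  # exp(0 * z) is the constant 1
        else:
            total += a / b * (math.exp(b * hi) - math.exp(b * lo))
    return total


def multiply(k1, t1, k2, t2):
    """Product of two MTE pieces, again a constant plus exponential terms."""
    k = k1 * k2
    terms = [(k2 * a, b) for a, b in t1] + [(k1 * a, b) for a, b in t2]
    terms += [(a1 * a2, b1 + b2) for a1, b1 in t1 for a2, b2 in t2]
    return k, terms


def prob_y1():
    """Integrate f(z) * P(Y = 1 | z) over every overlap of the two partitions."""
    total = 0.0
    for flo, fhi, fk, ft in F_PIECES:
        for plo, phi, pk, pt in P_PIECES:
            lo, hi = max(flo, plo), min(fhi, phi)
            if lo < hi:
                k, terms = multiply(fk, ft, pk, pt)
                total += integrate_terms(k, terms, lo, hi)
    return total


p_y1 = prob_y1()
```

Because the slide prints rounded coefficients, `p_y1` should land near the reported value but need not match every digit.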
The MTE model

Definition (Univariate MTE potential over a continuous variable)
Let Z be a continuous variable. A function $f : \Omega_Z \to \mathbb{R}_0^+$ is an MTE potential over Z if

1. $f(z) = a_0 + \sum_{i=1}^{m} a_i \exp(b_i \cdot z)$ for all $z \in \Omega_Z$, where $a_i, b_i$ are real numbers,
2. or there is a partition of $\Omega_Z$ into intervals $I_1, \ldots, I_k$ s.t. f is defined as above on each $I_j$.

Generalization to arbitrary hybrid domains (Moral et al. 2001)
The definition transfers to multivariate domains containing both continuous and discrete variables.
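As a minimal sketch of the definition, the function below evaluates a piecewise univariate MTE potential. The representation (a list of `(lo, hi, a0, terms)` tuples with half-open intervals) and the example coefficients are hypothetical, chosen only to make the definition concrete.

```python
import math


def mte_eval(z, pieces):
    """Evaluate a univariate MTE potential at z.

    pieces: list of (lo, hi, a0, [(a_i, b_i), ...]); on [lo, hi) the potential
    is a0 + sum(a_i * exp(b_i * z)). Outside all intervals it is taken to be 0
    (an assumption of this sketch).
    """
    for lo, hi, a0, terms in pieces:
        if lo <= z < hi:
            return a0 + sum(a * math.exp(b * z) for a, b in terms)
    return 0.0


# Hypothetical two-interval potential with one exponential term per interval.
EXAMPLE = [
    (-1.0, 0.0, 0.5, [(0.25, 1.0)]),
    (0.0, 1.0, 0.5, [(0.25, -1.0)]),
]
value_at_zero = mte_eval(0.0, EXAMPLE)
```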
Learning MTEs from data

[Figure: histogram of sample data on [0, 1]]

The MTE learning problem
How to find the MTE distribution that generated this data?
Learning MTEs from data

The learning task involves three basic steps:

1. Determine the intervals into which $\Omega_Z$ will be partitioned.
2. Determine the number of exponential terms in the mixture for each interval.
3. Estimate the parameters.

Simplifying assumptions
In this work we are concerned with the univariate case. For simplicity we will initially assume that:

- The intervals into which $\Omega_Z$ will be partitioned are known;
- The number of exponential terms in the mixture for each interval is fixed to 2, giving target density $f(z) = k + a \cdot \exp(b \cdot z) + c \cdot \exp(d \cdot z)$.
Learning MTEs from data by Maximum Likelihood

Why learn MTEs using Maximum Likelihood?

- Well-developed core theory, incl. good asymptotic properties under regularity conditions.
- ML parameters give access to a variety of model estimation procedures:
  - LRT or BIC for selecting the no. of exponential terms;
  - Likelihood maximisation to locate split-points.

Problems

- The likelihood equations cannot be solved analytically.
- Identifiability of parameters.
Initial observations

We will assume target density
$$f(z \mid \theta_j) = k_j + a_j \cdot \exp(b_j \cdot z) + c_j \cdot \exp(d_j \cdot z), \quad z \in I_j$$
for interval $I_j$; $\theta_j = \{k_j, a_j, b_j, c_j, d_j\}$.

Denote by $n_j$ the no. of observations from interval $I_j$ and let $N = \sum_j n_j$. Then the ML solution $\hat{\theta}_j$ must satisfy
$$\int_{z \in I_j} f(z \mid \hat{\theta}_j)\, dz = n_j / N. \qquad (1)$$

Parameter independence
$\hat{\theta}_k$ can be found independently of $\hat{\theta}_l$ as long as Equation (1) is satisfied for all $\hat{\theta}_j$. We will therefore look at a single interval $I$ from now on (and drop the index $j$ when appropriate).
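Equation (1) is cheap to check because the two-term target density has a closed-form integral over an interval. The following sketch evaluates that integral and the residual of Equation (1). The function names and the tuple layout `theta = (k, a, b, c, d)` are assumptions for illustration.

```python
import math


def exp_integral(a, b, lo, hi):
    """Integral of a * exp(b * z) over [lo, hi]; for b == 0 the term is constant."""
    if b == 0.0:
        return a * (hi - lo)
    return a / b * (math.exp(b * hi) - math.exp(b * lo))


def mte_mass(theta, lo, hi):
    """Integral of k + a*exp(b*z) + c*exp(d*z) over the interval [lo, hi]."""
    k, a, b, c, d = theta
    return k * (hi - lo) + exp_integral(a, b, lo, hi) + exp_integral(c, d, lo, hi)


def mass_residual(theta, lo, hi, n, N):
    """Left-hand side minus right-hand side of Equation (1); zero when it holds."""
    return mte_mass(theta, lo, hi) - n / N
```

For instance, `mass_residual((0.5, 0.0, 1.0, 0.0, 1.0), 0.0, 1.0, 5, 10)` is zero: the constant density 0.5 on [0, 1] carries exactly the mass 5/10.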
Constrained optimisation

Maximize
$$\log L(\theta \mid \mathbf{z}) = \sum_{i : z_i \in I} \log L(\theta \mid z_i) = \sum_{i : z_i \in I} \log f(z_i \mid \theta)$$

Subject to
$$\int_{z \in I} f(z \mid \theta)\, dz - n/N = 0, \qquad f(e_1 \mid \theta) \ge 0, \qquad f(e_2 \mid \theta) \ge 0.$$

The inequality constraints are rewritten as equalities using squared variables $s_1, s_2$:
$$f(e_1 \mid \theta) - s_1^2 = 0, \qquad f(e_2 \mid \theta) - s_2^2 = 0.$$

Notation:

- $\phi = [\theta^T\ s^T]^T$, $\psi = [\theta^T\ s^T\ \lambda^T]^T = [\phi^T\ \lambda^T]^T$,
- $g_0(\phi) = \int_{z \in I} f(z \mid \theta)\, dz - n/N$,
- $g_1(\phi) = f(e_1 \mid \theta) - s_1^2$; $g_2(\phi) = f(e_2 \mid \theta) - s_2^2$.

Lagrange multipliers
Find the root of $\nabla_\psi \left( \log L(\theta \mid \mathbf{z}) + \lambda^T g(\phi) \right)$ to solve the constrained optimisation problem.
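Written out by block, the root-finding condition splits into three groups of equations. The derivation below is a sketch under the assumption that there is one multiplier $\lambda_m$ per constraint $g_m$; the symbol $\Lambda$ for the Lagrangian is introduced here for convenience and does not appear on the slides.

```latex
% Assumption: one multiplier \lambda_m per constraint g_m; \Lambda is a name
% introduced for this sketch only.
\begin{align*}
\Lambda(\psi) &= \sum_{i:\, z_i \in I} \log f(z_i \mid \theta)
                 + \sum_{m} \lambda_m\, g_m(\phi), \\
\nabla_\theta \Lambda &= \sum_{i:\, z_i \in I}
                 \frac{\nabla_\theta f(z_i \mid \theta)}{f(z_i \mid \theta)}
                 + \sum_{m} \lambda_m \nabla_\theta g_m(\phi) = 0, \\
\frac{\partial \Lambda}{\partial s_j} &= -2\,\lambda_j s_j = 0,
                 \qquad j = 1, 2, \\
\frac{\partial \Lambda}{\partial \lambda_m} &= g_m(\phi) = 0 .
\end{align*}
```

The condition on $s_j$ says that either $\lambda_j = 0$ or $s_j = 0$; in the second case $f(e_j \mid \theta) = 0$, so the corresponding non-negativity constraint is active.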
The Newton-Raphson method

Example: Find x s.t. h(x) = 0.

- Initial "guess": $x = x_0$; approximate h(x) by its tangent at $x_0$.
- New "guess" $x_1$: the point where the tangent crosses the abscissa.
- Iterate using the general formula $x_{t+1} \leftarrow x_t - \{h'(x_t)\}^{-1} \cdot h(x_t)$.
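The iteration can be sketched in a few lines. The function name, the stopping rule (step size below `tol`), and the example function h(x) = x² − 2 are illustrative choices, not taken from the slides.

```python
def newton_raphson(h, h_prime, x0, tol=1e-12, max_iter=50):
    """Find a root of h by iterating x <- x - h(x) / h'(x) from the guess x0."""
    x = x0
    for _ in range(max_iter):
        step = h(x) / h_prime(x)  # raises ZeroDivisionError where the tangent is flat
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")


# Illustrative example: the positive root of h(x) = x^2 - 2, starting from x0 = 1.
root = newton_raphson(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0)
```

Convergence depends on the starting point: a poor initial guess can make the iteration diverge or land on a different root.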
The Lagrange Multipliers method

Maximise likelihood given constraints
Use the multivariate Newton-Raphson method to solve
$$A(\psi \mid \mathbf{z}) \equiv \nabla_\psi \left( \log L(\theta \mid \mathbf{z}) + \lambda^T g(\phi) \right) = 0:$$
$$\psi_{t+1} \leftarrow \psi_t - J\left(A(\psi_t \mid \mathbf{z})\right)^{-1} \cdot A(\psi_t \mid \mathbf{z}).$$

Initialisation of Newton-Raphson:

- Choose $\theta_0$ "randomly", giving $s_0 = \left[ \sqrt{f(e_1 \mid \theta_0)}\ \ \sqrt{f(e_2 \mid \theta_0)} \right]^T$;
- $\lambda_0 = [1\ 1]^T$ (chosen rather arbitrarily).
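To show the update ψ ← ψ − J⁻¹A at work without the full MTE likelihood, the sketch below applies it to a deliberately small stand-in problem: maximise log x + log y subject to x + y − 1 = 0, so that ψ = (x, y, λ). The toy objective, the helper names and the starting point are all assumptions for illustration; this is not the authors' implementation.

```python
def solve_linear(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(M[i]) + [v[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ZeroDivisionError("singular Jacobian")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def newton_system(A, J, psi0, tol=1e-12, max_iter=100):
    """Iterate psi <- psi - J(psi)^{-1} A(psi) until the step is below tol."""
    psi = list(psi0)
    for _ in range(max_iter):
        step = solve_linear(J(psi), A(psi))
        psi = [p - d for p, d in zip(psi, step)]
        if max(abs(d) for d in step) < tol:
            return psi
    raise RuntimeError("Newton-Raphson did not converge")


def A(psi):
    """Gradient of log x + log y + lam * (x + y - 1) with respect to (x, y, lam)."""
    x, y, lam = psi
    return [1.0 / x + lam, 1.0 / y + lam, x + y - 1.0]


def J(psi):
    """Jacobian of A with respect to (x, y, lam)."""
    x, y, _ = psi
    return [
        [-1.0 / x ** 2, 0.0, 1.0],
        [0.0, -1.0 / y ** 2, 1.0],
        [1.0, 1.0, 0.0],
    ]


# Hypothetical starting point; the multiplier starts at 1, as on the slide.
psi_hat = newton_system(A, J, [0.3, 0.6, 1.0])
```

In this toy problem the iteration settles at x = y = 0.5 with λ = −2. For the MTE likelihood, A and J would instead be built from the derivatives of log L and the constraints g, and convergence would depend much more on the initial θ.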