tutorial 2

Tutorial 2, Monday 8th August, 2016. Problem 1. Case for non-IID - PDF document



  1. Tutorial 2, Monday 8th August, 2016

     Problem 1. Case for non-IID dataset: In the class, we discussed the case of Bayesian estimation for a univariate Gaussian from a dataset $\mathcal{D}$ that consisted of IID (independent and identically distributed) observations.

     • Let $\Pr(X) \sim \mathcal{N}(\mu, \sigma^2)$ and let the data $\mathcal{D} = x_1 \ldots x_m$ be IID. Let $\sigma^2$ be known.
     • $\mu_{MLE} = \frac{1}{m} \sum_{i=1}^{m} x_i$ and $\sigma^2_{MLE} = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_{MLE})^2$
     • The conjugate prior is $\Pr(\mu) = \mathcal{N}(\mu_0, \sigma_0^2)$, and the posterior is $\Pr(\mu \mid x_1 \ldots x_m) = \mathcal{N}(\mu_m, \sigma_m^2)$ such that
     • $\mu_m = \frac{\sigma^2}{m\sigma_0^2 + \sigma^2}\,\mu_0 + \frac{m\sigma_0^2}{m\sigma_0^2 + \sigma^2}\,\hat{\mu}_{ML}$ and $\frac{1}{\sigma_m^2} = \frac{1}{\sigma_0^2} + \frac{m}{\sigma^2}$

     Prove the above.

     Answer: We have already done this in the class: https://www.cse.iitb.ac.in/~cs725/notes/lecture-slides/lecture-06-unannotated.pdf

     Now suppose the examples $x_1 \ldots x_m$ in the dataset $\mathcal{D}$ were not necessarily independent, with their possible dependence expressed by a known covariance matrix $\Omega$ but with a common unknown (to be estimated) mean $\mu \in \mathbb{R}$. Let $\mathbf{u} = [1, 1, \ldots, 1]$ be an $m$-dimensional vector of 1's, let $\mathbf{x} = [x_1 \ldots x_m]$, and let

     $$\Pr(x_1 \ldots x_m; \mu, \Omega) = \frac{1}{(2\pi)^{m/2} |\Omega|^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{x} - \mu\mathbf{u})^T \Omega^{-1} (\mathbf{x} - \mu\mathbf{u})\right)$$

     Assume that $\Omega \in \mathbb{R}^{m \times m}$ is positive-definite. Now answer the following questions.

     1. What would be the maximum likelihood estimate for $\mu$?

     Answer: This would correspond to the MLE for a multivariate Gaussian but with a single data point. Additionally, the restriction is that the mean vector is of the form $\mu\mathbf{u}$. We have already seen that maximizing a monotonically increasing transformation of the objective should yield the same point of optimality (and proved the same in this tutorial). So taking logs of the likelihood (and dropping the terms that do not depend on $\mu$) gives us the log-likelihood objective:

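  2. $$\mu_{MLE} = \operatorname*{argmax}_{\mu} \; -\frac{1}{2} (\mathbf{x} - \mu\mathbf{u})^T \Omega^{-1} (\mathbf{x} - \mu\mathbf{u})$$

     Setting the derivative with respect to $\mu$ to 0:

     $$\frac{d}{d\mu}\left[-\frac{1}{2}\left(\mathbf{x}^T \Omega^{-1} \mathbf{x} - 2\mu\, \mathbf{x}^T \Omega^{-1} \mathbf{u} + \mu^2\, \mathbf{u}^T \Omega^{-1} \mathbf{u}\right)\right] = \mathbf{x}^T \Omega^{-1} \mathbf{u} - \mu\, \mathbf{u}^T \Omega^{-1} \mathbf{u} = 0$$

     $$\Rightarrow \mu_{MLE} = \frac{\mathbf{x}^T \Omega^{-1} \mathbf{u}}{\mathbf{u}^T \Omega^{-1} \mathbf{u}}$$

     2. How would you go about doing Bayesian estimation for $\mu$?
     3. What will be an appropriate conjugate prior?
     4. What will the posterior be? And what will be the MAP and Bayes estimates?

     Answers to 2, 3 and 4: As hinted in the class, we will expect the conjugate prior of the mean $\mu$ of the (product of) Gaussian to be Gaussian. Let $\mu \sim \mathcal{N}(\mu_0, \sigma_0^2)$ with a fixed and known $\sigma_0^2$. Then

     $$\mathcal{N}(\mu_m, \sigma_m^2) = \Pr(\mu \mid \mathcal{D}) \propto \Pr(\mathcal{D} \mid \mu) \Pr(\mu) = \frac{1}{(2\pi)^{m/2} |\Omega|^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{x} - \mu\mathbf{u})^T \Omega^{-1} (\mathbf{x} - \mu\mathbf{u})\right) \frac{1}{\sqrt{2\pi\sigma_0^2}} \exp\left(-\frac{(\mu - \mu_0)^2}{2\sigma_0^2}\right)$$

     $$\propto \exp\left(-\frac{1}{2} (\mathbf{x} - \mu\mathbf{u})^T \Omega^{-1} (\mathbf{x} - \mu\mathbf{u}) - \frac{(\mu - \mu_0)^2}{2\sigma_0^2}\right)$$

     Our reference equality (up to a multiplicative constant that does not depend on $\mu$) is:

     $$\exp\left(-\frac{1}{2}\left(\mathbf{x}^T \Omega^{-1} \mathbf{x} - 2\mu\, \mathbf{x}^T \Omega^{-1} \mathbf{u} + \mu^2\, \mathbf{u}^T \Omega^{-1} \mathbf{u}\right) - \frac{(\mu - \mu_0)^2}{2\sigma_0^2}\right) = \exp\left(-\frac{(\mu - \mu_m)^2}{2\sigma_m^2}\right)$$

     Matching coefficients of $\mu^2$, we get

     $$-\frac{\mu^2}{2\sigma_m^2} = -\frac{1}{2}\mu^2\, \mathbf{u}^T \Omega^{-1} \mathbf{u} - \frac{\mu^2}{2\sigma_0^2} \;\Rightarrow\; \frac{1}{\sigma_m^2} = \frac{1}{\sigma_0^2} + \mathbf{u}^T \Omega^{-1} \mathbf{u}$$

     Matching coefficients of $\mu$, we get

     $$\frac{2\mu\mu_m}{2\sigma_m^2} = \mu\left(\mathbf{x}^T \Omega^{-1} \mathbf{u} + \frac{2\mu_0}{2\sigma_0^2}\right) \;\Rightarrow\; \mu_m = \sigma_m^2\left(\mathbf{x}^T \Omega^{-1} \mathbf{u} + \frac{\mu_0}{\sigma_0^2}\right) = \frac{\sigma_0^2\, \mathbf{x}^T \Omega^{-1} \mathbf{u} + \mu_0}{1 + \sigma_0^2\, \mathbf{u}^T \Omega^{-1} \mathbf{u}}$$

     $\mu_m$ will be the MAP estimate of $\mu$. Since the posterior is Gaussian, its mode and mean coincide, so $\mu_m$ is also the Bayes (posterior mean) estimate.

     HOMEWORK: What about the special cases of $\Omega$ being diagonal matrices with the same or different values along the diagonal?

     Problem 2. We discussed at least two settings where maximizing a monotonically increasing function of the objective is somewhat more intuitive than maximizing the original objective. Recall the two settings. Now prove that maximizing the monotonically increasing transformation of the objective gives the same optimality point as does maximizing the original objective.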
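Before moving on to the answer for Problem 2, the closed forms derived for Problem 1 can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names `mean_mle` and `posterior`, the data values, and both covariance matrices are illustrative choices, not part of the tutorial.

```python
import numpy as np

def mean_mle(x, omega):
    """MLE of the common mean: (x^T Omega^{-1} u) / (u^T Omega^{-1} u)."""
    x = np.asarray(x, dtype=float)
    u = np.ones_like(x)
    w = np.linalg.solve(omega, u)          # w = Omega^{-1} u, without forming the inverse
    return (x @ w) / (u @ w)

def posterior(x, omega, mu0, var0):
    """Posterior mean and variance of mu under the prior N(mu0, var0)."""
    x = np.asarray(x, dtype=float)
    u = np.ones_like(x)
    w = np.linalg.solve(omega, u)
    var_m = 1.0 / (1.0 / var0 + u @ w)     # 1/var_m = 1/var0 + u^T Omega^{-1} u
    mu_m = var_m * (x @ w + mu0 / var0)    # mu_m = var_m * (x^T Omega^{-1} u + mu0/var0)
    return mu_m, var_m

# Illustrative values (assumptions, not from the tutorial).
x = [1.0, 2.0, 4.0]
omega_iid = 2.0 * np.eye(3)                # Omega = sigma^2 I with sigma^2 = 2
omega_dep = np.array([[2.0, 1.0, 0.0],
                      [1.0, 2.0, 1.0],
                      [0.0, 1.0, 2.0]])    # positive-definite, neighbours correlated

print(mean_mle(x, omega_iid))              # sample mean when Omega = sigma^2 I
print(posterior(x, omega_iid, 0.0, 1.0))   # should agree with the IID formulas
print(mean_mle(x, omega_dep))              # a weighted mean once observations are dependent
```

With $\Omega = \sigma^2 I$ the two functions should reproduce the IID expressions from the start of Problem 1, which is a useful sanity check on the derivation. Using `np.linalg.solve` rather than an explicit inverse is a common numerical choice and does not change the formulas.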

  3. Answer: We will prove by contradiction. Let $O(\theta)$ be the objective function being maximized. Let $\theta^* = \operatorname*{argmax}_{\theta} O(\theta)$. Let $f(\beta)$ be a monotonically increasing function. Let $\hat{\theta} = \operatorname*{argmax}_{\theta} f(O(\theta))$ such that $\hat{\theta} \neq \theta^*$ and $f(O(\hat{\theta})) > f(O(\theta^*))$. Since $f$ is a monotonically increasing function of its argument, it must be that $O(\hat{\theta}) > O(\theta^*)$, which is a contradiction, since we had $\theta^* = \operatorname*{argmax}_{\theta} O(\theta)$. Thus it must be that either $\hat{\theta} = \theta^*$ or $f(O(\hat{\theta})) = f(O(\theta^*))$.
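The statement can also be illustrated numerically, though this is not a substitute for the proof. The sketch below takes the likelihood of a unit-variance Gaussian as the objective $O(\theta)$ and $f = \log$ as the monotonically increasing transformation, then compares the maximizers over a grid. The data values, the grid, and the helper names `objective` and `argmax_on_grid` are assumptions made for illustration only.

```python
import math

def objective(theta, data):
    """Likelihood of `data` under N(theta, 1); plays the role of O(theta)."""
    return math.prod(
        math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2 * math.pi) for x in data
    )

def argmax_on_grid(fn, grid):
    """Return the grid point with the largest value of fn (the first one on ties)."""
    return max(grid, key=fn)

data = [1.0, 2.0, 3.0]                    # illustrative observations
grid = [i / 100 for i in range(0, 401)]   # candidate theta values in [0, 4], step 0.01

# Maximizer of the original objective O(theta).
theta_star = argmax_on_grid(lambda t: objective(t, data), grid)
# Maximizer of the transformed objective f(O(theta)) with f = log.
theta_hat = argmax_on_grid(lambda t: math.log(objective(t, data)), grid)

print(theta_star, theta_hat)              # the two maximizers coincide on this grid
```

Both searches pick the same grid point, which here is the sample mean of the data, in line with the IID result $\mu_{MLE} = \frac{1}{m}\sum_i x_i$.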
