regression
play

Regression Noise model and likelihood Given a dataset D = { x n , y - PowerPoint PPT Presentation

Linear Regression Regression Noise model and likelihood Given a dataset D = { x n , y n } S n =1 , where x n = { x n, 1 , . . . , x n,S } is S dimensional, fit parameters of a regressor f with added Gaussian noise: 0 , 2


  1. Linear Regression Regression Noise model and likelihood ◮ Given a dataset D = { x n , y n } S n =1 , where x n = { x n, 1 , . . . , x n,S } is S dimensional, fit parameters θ of a regressor f with added Gaussian noise: � � � 0 , σ 2 � p ( ǫ | σ 2 ) = N y n = f ( x n ; θ ) + ǫ n where ǫ . ◮ Equivalent likelihood formulation: N � � � � f ( x n ) , σ 2 � p ( y | X ) = N y n n =1 Christoph Lippert GWAS II: Linear models and Statistics Summer 2011 1

  2. Linear Regression Regression Choosing a regressor ◮ Choose f to be linear: N � � � � x n · θ + c, σ 2 � p ( y | X ) = N y n n =1 ◮ Consider bias free case, c = 0 , otherwise include an additional column of ones in each x n . Christoph Lippert GWAS II: Linear models and Statistics Summer 2011 2

  3. Linear Regression Regression Choosing a regressor ◮ Choose f to be linear: N � � � � x n · θ + c, σ 2 � p ( y | X ) = N y n n =1 ◮ Consider bias free case, c = 0 , otherwise include an additional column of ones in each x n . Equivalent graphical model Christoph Lippert GWAS II: Linear models and Statistics Summer 2011 2

  4. Linear Regression Linear Regression Maximum likelihood ◮ Taking the logarithm, we obtain N � � � � x n · θ , σ 2 � ln p ( y | θ σ 2 ) = ln N y n n =1 N = − N 1 � 2 ln 2 πσ 2 − ( y n − x n · θ ) 2 2 σ 2 n =1 � �� � Sum of squares ◮ The likelihood is maximized when the squared error is minimized. ◮ Least squares and maximum likelihood are equivalent. Christoph Lippert GWAS II: Linear models and Statistics Summer 2011 3

  5. Linear Regression Linear Regression Maximum likelihood ◮ Taking the logarithm, we obtain N � � � � x n · θ , σ 2 � ln p ( y | θ σ 2 ) = ln N y n n =1 N = − N 1 � 2 ln 2 πσ 2 − ( y n − x n · θ ) 2 2 σ 2 n =1 � �� � Sum of squares ◮ The likelihood is maximized when the squared error is minimized. ◮ Least squares and maximum likelihood are equivalent. Christoph Lippert GWAS II: Linear models and Statistics Summer 2011 3

  6. Linear Regression Linear Regression Maximum likelihood ◮ Taking the logarithm, we obtain N � � � � x n · θ , σ 2 � ln p ( y | θ σ 2 ) = ln N y n n =1 N = − N 1 � 2 ln 2 πσ 2 − ( y n − x n · θ ) 2 2 σ 2 n =1 � �� � Sum of squares ◮ The likelihood is maximized when the squared error is minimized. ◮ Least squares and maximum likelihood are equivalent. Christoph Lippert GWAS II: Linear models and Statistics Summer 2011 3

  7. Linear Regression Linear Regression and Least Squares y n y f ( x n , w ) x x n (C.M. Bishop, Pattern Recognition and Machine Learning) N E ( θ ) = 1 � ( y n − x n · θ ) 2 2 n =1 Christoph Lippert GWAS II: Linear models and Statistics Summer 2011 4

  8. Linear Regression Linear Regression and Least Squares ◮ Derivative w.r.t a single weight entry θ i � � N d d − 1 � ln p ( y | θ , σ 2 ) = ( y n − x n · θ ) 2 d θ i d θ i 2 σ 2 n =1 ◮ Set gradient w.r.t. θ to zero Christoph Lippert GWAS II: Linear models and Statistics Summer 2011 5

  9. Linear Regression Linear Regression and Least Squares ◮ Derivative w.r.t a single weight entry θ i � � N d d − 1 � ln p ( y | θ , σ 2 ) = ( y n − x n · θ ) 2 d θ i d θ i 2 σ 2 n =1 ◮ Set gradient w.r.t. θ to zero Christoph Lippert GWAS II: Linear models and Statistics Summer 2011 5

  10. Linear Regression Linear Regression and Least Squares ◮ Derivative w.r.t a single weight entry θ i � � N d d − 1 � ln p ( y | θ , σ 2 ) = ( y n − x n · θ ) 2 d θ i d θ i 2 σ 2 n =1 ◮ Set gradient w.r.t. θ to zero N ∇ θ ln p ( y | θ , σ 2 ) = 1 � ( y n − x n · θ ) x T n = 0 σ 2 n =1 ⇒ θ ML =? = Christoph Lippert GWAS II: Linear models and Statistics Summer 2011 5

  11. Linear Regression Linear Regression and Least Squares ◮ Derivative w.r.t. a single weight entry θ i � � N d d − 1 � ln p ( y | θ , σ 2 ) = ( y n − x n · θ ) 2 d θ i d θ i 2 σ 2 n =1 N = 1 � ( y n − x n · θ ) x i σ 2 n =1 ◮ Set gradient w.r.t. θ to zero N ∇ θ ln p ( y | θ , σ 2 ) = 1 � ( y n − x n · θ ) x T n = 0 σ 2 n =1 ⇒ θ ML = ( X T X ) − 1 X T = y � �� � Pseudo inverse   x 1 , 1 . . . x 1 , S ◮ Here, the matrix X is defined as X = . . . . . . . . .   x N, 1 . . . x N,S Christoph Lippert GWAS II: Linear models and Statistics Summer 2011 6

  12. Linear Regression Linear Regression and Least Squares ◮ Derivative w.r.t. a single weight entry θ i � � N d d − 1 � ln p ( y | θ , σ 2 ) = ( y n − x n · θ ) 2 d θ i d θ i 2 σ 2 n =1 N = 1 � ( y n − x n · θ ) x i σ 2 n =1 ◮ Set gradient w.r.t. θ to zero N ∇ θ ln p ( y | θ , σ 2 ) = 1 � ( y n − x n · θ ) x T n = 0 σ 2 n =1 ⇒ θ ML = ( X T X ) − 1 X T = y � �� � Pseudo inverse   x 1 , 1 . . . x 1 , S ◮ Here, the matrix X is defined as X = . . . . . . . . .   x N, 1 . . . x N,S Christoph Lippert GWAS II: Linear models and Statistics Summer 2011 6

  13. Linear Regression Linear Regression and Least Squares ◮ Derivative w.r.t. a single weight entry θ i � � N d d − 1 � ln p ( y | θ , σ 2 ) = ( y n − x n · θ ) 2 d θ i d θ i 2 σ 2 n =1 N = 1 � ( y n − x n · θ ) x i σ 2 n =1 ◮ Set gradient w.r.t. θ to zero N ∇ θ ln p ( y | θ , σ 2 ) = 1 � ( y n − x n · θ ) x T n = 0 σ 2 n =1 ⇒ θ ML = ( X T X ) − 1 X T = y � �� � Pseudo inverse   x 1 , 1 . . . x 1 , S ◮ Here, the matrix X is defined as X = . . . . . . . . .   x N, 1 . . . x N,S Christoph Lippert GWAS II: Linear models and Statistics Summer 2011 6

  14. Hypothesis Testing Hypothesis Testing Example: ◮ Given a sample D = { x 1 , . . . , x N } . ◮ Test whether H 0 : θ s = 0 (null hypothesis) or H 1 : θ s � = 0 (alternative hypothesis) is true. ◮ To show that θ s � = 0 we can perform a statistical test that tries to reject H 0 . ◮ type 1 error: H 0 is rejected but does hold. ◮ type 2 error: H 0 is accepted but does not hold. Christoph Lippert GWAS II: Linear models and Statistics Summer 2011 7

  15. Hypothesis Testing Hypothesis Testing Example: ◮ Given a sample D = { x 1 , . . . , x N } . ◮ Test whether H 0 : θ s = 0 (null hypothesis) or H 1 : θ s � = 0 (alternative hypothesis) is true. ◮ To show that θ s � = 0 we can perform a statistical test that tries to reject H 0 . ◮ type 1 error: H 0 is rejected but does hold. ◮ type 2 error: H 0 is accepted but does not hold. Christoph Lippert GWAS II: Linear models and Statistics Summer 2011 7

  16. Hypothesis Testing Hypothesis Testing Example: ◮ Given a sample D = { x 1 , . . . , x N } . ◮ Test whether H 0 : θ s = 0 (null hypothesis) or H 1 : θ s � = 0 (alternative hypothesis) is true. ◮ To show that θ s � = 0 we can perform a statistical test that tries to reject H 0 . ◮ type 1 error: H 0 is rejected but does hold. ◮ type 2 error: H 0 is accepted but does not hold. Christoph Lippert GWAS II: Linear models and Statistics Summer 2011 7

  17. Hypothesis Testing Hypothesis Testing Example: ◮ Given a sample D = { x 1 , . . . , x N } . ◮ Test whether H 0 : θ s = 0 (null hypothesis) or H 1 : θ s � = 0 (alternative hypothesis) is true. H 0 holds H 0 doesn’t hold H 0 accepted true negatives false negatives ◮ To show that θ s � = 0 we can type-2 error H 0 rejected false positives true positives perform a statistical test that type-1 error tries to reject H 0 . ◮ type 1 error: H 0 is rejected but does hold. ◮ type 2 error: H 0 is accepted but does not hold. Christoph Lippert GWAS II: Linear models and Statistics Summer 2011 7

  18. Hypothesis Testing Hypothesis Testing Example: ◮ Given a sample D = { x 1 , . . . , x N } . ◮ Test whether H 0 : θ s = 0 (null hypothesis) or H 1 : θ s � = 0 (alternative hypothesis) is true. H 0 holds H 0 doesn’t hold H 0 accepted true negatives false negatives ◮ To show that θ s � = 0 we can type-2 error H 0 rejected false positives true positives perform a statistical test that type-1 error tries to reject H 0 . ◮ type 1 error: H 0 is rejected but does hold. ◮ type 2 error: H 0 is accepted but does not hold. Christoph Lippert GWAS II: Linear models and Statistics Summer 2011 7

  19. Hypothesis Testing Hypothesis Testing ◮ Given a sample D = { x 1 , . . . , x N } . ◮ Test whether H 0 : θ s = 0 (null hypothesis) or H 1 : θ s � = 0 (alternative hypothesis) is true. ◮ The significance level α defines the threshold and the sensitivity of the test. This equals the probability of a type-1 error. ◮ Usually decision is based on a test statistic . ◮ The critical region defines the values of the test statistic that lead to a rejection of the test. Christoph Lippert GWAS II: Linear models and Statistics Summer 2011 8

  20. Hypothesis Testing Hypothesis Testing ◮ Given a sample D = { x 1 , . . . , x N } . ◮ Test whether H 0 : θ s = 0 (null hypothesis) or H 1 : θ s � = 0 (alternative hypothesis) is true. ◮ The significance level α defines the threshold and the sensitivity of the test. This equals the probability of a type-1 error. ◮ Usually decision is based on a test statistic . ◮ The critical region defines the values of the test statistic that lead to a rejection of the test. Christoph Lippert GWAS II: Linear models and Statistics Summer 2011 8

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend