SLIDE 1
Maximum likelihood and EM algorithm (after the Chapter 8)

Pasha Zusmanovich, deCODE Statistics Colloquium March 30, 2007

SLIDE 2
What is likelihood and what is it good for?

Likelihood is just a conditional probability.

Formal definition

Given random events A and B, the likelihood function of A relative to B is the map {set of states of B} → [0, 1], x → Pr(A | B = x). Nothing fancy so far. Consider an example.

SLIDE 3

What is likelihood and what is it good for?

Example: alleles and genotypes

Frequencies of alleles: a: θ, A: 1 − θ ⇒ frequencies of genotypes: aa: θ², aA: 2θ(1 − θ), AA: (1 − θ)². Observed numbers: n_aa, n_aA, n_AA.

The probability that the numbers of genotypes would be exactly (n_aa, n_aA, n_AA) is

f(θ) = ((n_aa + n_aA + n_AA)! / (n_aa! n_aA! n_AA!)) · θ^(2n_aa) · (2θ(1 − θ))^(n_aA) · (1 − θ)^(2n_AA).

f is a likelihood function: {probability of alleles} → {conditional probability of genotypes given the probability of alleles}. This is a model with parameter θ. Question: Which parameter makes the model the “best”? Answer ...

SLIDE 4

What is likelihood and what is it good for?

Example: alleles and genotypes (continued)

Question: Which parameter makes the model the “best”? Answer: the one which makes the observed data most likely, i.e. which maximizes

f(θ) = ((n_aa + n_aA + n_AA)! / (n_aa! n_aA! n_AA!)) · θ^(2n_aa) · (2θ(1 − θ))^(n_aA) · (1 − θ)^(2n_AA)

over θ ∈ [0, 1].

Solution: θ̂ = (2n_aa + n_aA) / (2(n_aa + n_aA + n_AA)). But this is exactly the Hardy–Weinberg equilibrium!
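
This closed form is easy to check numerically. A minimal sketch in Python (not from the slides; the genotype counts 38, 95, 53 are borrowed from the blood-group slide later in the talk, purely for illustration):

```python
import numpy as np

# Illustrative genotype counts (borrowed from the blood-group slide below).
n_aa, n_aA, n_AA = 38, 95, 53
n = n_aa + n_aA + n_AA

# Closed-form MLE from the slide.
theta_hat = (2 * n_aa + n_aA) / (2 * n)

# Numerical check: maximize the multinomial log-likelihood over a fine grid
# (the multinomial coefficient is a constant in theta and can be dropped).
theta = np.linspace(1e-6, 1 - 1e-6, 100_001)
loglik = (2 * n_aa * np.log(theta)
          + n_aA * np.log(2 * theta * (1 - theta))
          + 2 * n_AA * np.log(1 - theta))

print(theta_hat, theta[np.argmax(loglik)])   # both come out near 0.4597
```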

SLIDE 5

What is likelihood and what is it good for?

Another example: linear regression

Fitting a line to a set of points in the plane {(x_1, y_1), . . . , (x_n, y_n)}, assuming the observations are independent and the errors are normally distributed. The model is: Y = β_1 X + β_0 + ε, ε ∼ N(0, σ²). What is the “probability” of observing the data under the given model?

P(Y lies in a δ-neighbourhood of y_i | X = x_i) ≈ density(Y)|_{X=x_i, Y=y_i} · 2δ, so “probability” is replaced by density. If X is fixed, Y − β_1 X − β_0 ∼ N(0, σ²) ⇒ Y ∼ N(β_1 X + β_0, σ²).

SLIDE 6

What is likelihood and what is it good for?

Another example: linear regression (continued)

Maximizing

density(Y)|_{X=x_i, Y=y_i} = ∏_{i=1}^{n} (1 / (√(2π) σ)) · exp(−(β_1 x_i + β_0 − y_i)² / (2σ²))
                           = (1 / (√(2π) σ))^n · exp(−(1 / (2σ²)) ∑_{i=1}^{n} (β_1 x_i + β_0 − y_i)²)

is equivalent to minimizing ∑_{i=1}^{n} (β_1 x_i + β_0 − y_i)². But this is exactly the least squares!
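
A minimal numerical sketch of this equivalence (not from the slides; the data are simulated for illustration): the ordinary least-squares fit and a brute-force maximization of the Gaussian likelihood land on the same coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data for illustration: a noisy line y = 2x + 1.
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.5, size=x.size)

# Least squares, via the normal equations.
X = np.column_stack([x, np.ones_like(x)])
beta1_ls, beta0_ls = np.linalg.lstsq(X, y, rcond=None)[0]

# Maximum likelihood: for fixed sigma, maximizing the Gaussian likelihood is
# the same as minimizing the residual sum of squares, so a grid search for
# the smallest RSS over (beta1, beta0) is a grid search for the MLE.
b1 = np.linspace(1.5, 2.5, 201)
b0 = np.linspace(-0.5, 2.5, 301)
B1, B0 = np.meshgrid(b1, b0, indexing="ij")
rss = ((B1[..., None] * x + B0[..., None] - y) ** 2).sum(axis=-1)
i, j = np.unravel_index(np.argmin(rss), rss.shape)

print(beta1_ls, beta0_ls)   # least-squares estimates
print(b1[i], b0[j])         # maximum-likelihood estimates (agree up to the grid step)
```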

SLIDE 7

What is likelihood and what is it good for?

Refined formal definition

Assuming a random variable X has a density function f(x, θ) parametrized by θ, the likelihood function is: θ → f(x, θ).

“Conceptual” definition

Likelihood is the probability of the observed data under the given model. Thus, the maximum likelihood corresponds to the model (in the given parametrized class of models) which makes the observed data “most likely”. One usually maximizes log f(x, θ) instead of f(x, θ) (the log-likelihood function). OK, since log is monotonic. But ...

SLIDE 8

Why logarithm?

◮ Turns multiplicative things to additive.

In most cases in practice, the likelihood function is a product of several functions. E.g., if X_1, . . . , X_n are independent random variables, then their likelihood function is f(x_1, . . . , x_n, θ) = f(x_1, θ) · · · f(x_n, θ), so the logarithm turns the multiplicative expression into an additive one, which is easier to deal with. (And the logarithm is the only “good” function taking multiplication to addition.)

◮ Diminishes the “long tail”.

A random variable with values in R_+ (say, the results of a measurement) tends to have a distribution skewed to the right, because there is a lower limit but no upper limit. Passing to the log diminishes this skewness.
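
A side benefit of the first point, illustrated with a minimal sketch (not from the slides): the raw product of many densities underflows in floating point, while the sum of their logarithms remains perfectly usable.

```python
import numpy as np

rng = np.random.default_rng(1)

# 10,000 illustrative i.i.d. observations from N(0, 1).
x = rng.normal(size=10_000)

# Per-observation densities under the model N(0, 1).
dens = np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)

likelihood = np.prod(dens)             # underflows to 0.0 in double precision
log_likelihood = np.sum(np.log(dens))  # finite (about -14,000 here)

print(likelihood, log_likelihood)
```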

SLIDE 9

What is likelihood and what is it good for?

Maximum likelihood behaves nicely asymptotically

Taylor series: ℓ(θ) = ℓ(θ̂) + ½ (θ − θ̂)² ℓ′′(θ̂) + . . .; i(θ) = E(−ℓ′′(θ)) – the Fisher information. θ̂ ∼ N(θ_0, i(θ_0)⁻¹) as the number of samples → ∞. This can be used to assess the precision of θ̂.
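
A minimal sketch of how this is used in practice (not from the slides), continuing the genotype example with the same illustrative counts 38, 95, 53: the observed information at θ̂, obtained here by a finite difference, gives an approximate standard error for θ̂.

```python
import numpy as np

n_aa, n_aA, n_AA = 38, 95, 53       # illustrative counts, as above
n = n_aa + n_aA + n_AA

def loglik(theta):
    # Multinomial log-likelihood of the genotype counts (constant dropped).
    return (2 * n_aa * np.log(theta)
            + n_aA * np.log(2 * theta * (1 - theta))
            + 2 * n_AA * np.log(1 - theta))

theta_hat = (2 * n_aa + n_aA) / (2 * n)

# Observed information: minus the second derivative of the log-likelihood at
# theta_hat, approximated by a central finite difference.
h = 1e-4
info = -(loglik(theta_hat + h) - 2 * loglik(theta_hat) + loglik(theta_hat - h)) / h ** 2

se = 1 / np.sqrt(info)              # approximate standard error of theta_hat
print(theta_hat, se)                # about 0.46 +/- 0.026
```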

SLIDE 10

What is likelihood and what is it good for?

Connection with some fancy areas of Mathematics

Back to the alleles-and-genotypes example: a model with inbreeding coefficient λ:

frequencies of alleles: a: θ, A: 1 − θ
frequencies of genotypes: aa: θ² + θ(1 − θ)λ, aA: 2θ(1 − θ)(1 − λ), AA: (1 − θ)² + θ(1 − θ)λ
numbers: 38, 95, 53

(some real blood-group data from the UK, 1947)

The scoring equations are equivalent to:

372θ³λ² − 744θ³λ − 558θ²λ² + 372θ³ + 1131θ²λ + 186θλ² − 573θ² − 668θλ + 201θ + 148λ = 0;
186θ²λ² − 372θ²λ − 186θλ² + 186θ² + 387θλ − 201θ − 148λ + 53 = 0.

Statistics + Algebraic Geometry = Algebraic Statistics.

SLIDE 11

What is likelihood and what is it good for?

Advantages (to summarize)

◮ Agrees with intuition.
◮ Confirmed by other methods.
◮ “Nice” asymptotic behavior.
◮ Very good practical results.
◮ Universal.
◮ Connection with other areas of Mathematics.

Disadvantages

◮ No “theoretical” justification.
◮ Could be bad for small samples.
◮ No way to compare “disjoint” models.
◮ The “Bayesian” issue ...

SLIDE 12

What is likelihood and what is it good for?

“Bayesian” issue:

Pr(data | model) = Pr(model | data) · Pr(data) / Pr(model).

Philosophical mumbo-jumbo:

◮ M. Forster and E. Sober, Why likelihood?, in: The Nature of Scientific Evidence (ed. M. Taper and S. Lele), Univ. of Chicago Press, 2004, 153–165; http://philosophy.wisc.edu/forster/Likelihood/default.htm
◮ B. Fitelson, Likelihoodism, Bayesianism, and relational confirmation, Synthese, to appear; http://fitelson.org/research.htm

SLIDE 13

EM algorithm

Finding the maximum of a likelihood function can be difficult.

Example: alleles and phenotypes

Assume A is dominant, and we observe only phenotypes:

frequencies of alleles: a: θ, A: 1 − θ
frequencies of genotypes: aa: θ², aA: 2θ(1 − θ), AA: (1 − θ)²
numbers of phenotypes: a: 38, A: 148

The scoring equation amounts to 38/θ² − 148/(1 − θ²) = 0, i.e. it is biquadratic. Suppose we don’t know how / don’t want to solve it. What to do? Introduce back the missing numbers n_aA and n_AA (hidden parameters) and iterate.
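
For reference, this particular scoring equation does have an easy closed-form root, which is handy as a target for the iteration on the next slide (a one-line check, not from the slides):

```python
from math import sqrt

# 38/theta**2 - 148/(1 - theta**2) = 0  rearranges to  186 * theta**2 = 38.
theta_direct = sqrt(38 / 186)
print(theta_direct)   # about 0.452 -- the value the EM iteration should approach
```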

SLIDE 14

EM algorithm

Example: alleles and phenotypes (continued)

E Step 1: initial genotype numbers: n_aA = n_AA = 148/2 = 74.00
M Step 2: find the MLE for those numbers: θ = (2 · 38 + 74.00)/(2 · 186) = 0.40
E Step 3: for θ = 0.40, find the genotype frequencies, for aA: 2 · 0.40 · (1 − 0.40) = 0.48 and for AA: (1 − 0.40)² = 0.36, and, for them, the genotype numbers: n_aA = 186 · 0.48 = 89.28, n_AA = 148 − 89.28 = 58.72
M Step 4: find the MLE for those numbers: θ = (2 · 38 + 89.28)/(2 · 186) = 0.44
E Step 5: for θ = 0.44, find the genotype frequencies, for aA: 2 · 0.44 · (1 − 0.44) = 0.49 and for AA: (1 − 0.44)² = 0.31, and the genotype numbers: n_aA = 186 · 0.49 = 91.14, n_AA = 148 − 91.14 = 56.86
M Step 6: find the MLE for those numbers: θ = (2 · 38 + 91.14)/(2 · 186) = 0.44
Stop!
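
A minimal Python sketch of this EM iteration (not from the slides), following the same E and M updates as above but without the intermediate rounding; it converges to θ ≈ 0.452, the root of the scoring equation from the previous slide.

```python
n_a, n_A = 38, 148            # observed phenotype counts: aa vs. (aA or AA)
n = n_a + n_A

n_aA = n_A / 2.0              # E step 1: initial guess for the hidden aA count
theta = 0.0

for _ in range(100):
    # M step: closed-form MLE of theta for the "completed" genotype counts.
    new_theta = (2 * n_a + n_aA) / (2 * n)
    if abs(new_theta - theta) < 1e-10:
        break
    theta = new_theta
    # E step (as on the slide): expected aA count under the current theta;
    # n_AA = n_A - n_aA would follow but is not needed for the next M step.
    n_aA = n * 2 * theta * (1 - theta)

print(theta)                  # about 0.452
```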

SLIDE 15

EM algorithm

Advantages

◮ Reduces an MLE problem to another, more manageable (MLE) problem.
◮ Agrees with results obtained by other means.
◮ Works in practice.

Disadvantages

◮ No theoretical justification.

SLIDE 16

Maximum likelihood and EM algorithm at deCODE

Association studies

nemo by Daníel Gudbjartsson. Typical input data: a list of affected and unaffected individuals, a list of markers (e.g. SNPs), and a list of genotypes (per marker and per individual).

Haplotype inference from genotypes

Maximum parsimony vs. maximum likelihood. Example (0, 1 – homozygote, 2 – heterozygote):

genotypes: 2120, 2102, 1221 ⇐ parsimonious solution: 0100 + 1110, 0100 + 1101, 1011 + 1101
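
A small combinatorial sketch of the phasing problem (not from the slides, and not the nemo implementation): enumerating the haplotype pairs compatible with each genotype string. The parsimonious solution quoted above is the choice that keeps the number of distinct haplotypes small (four across the three genotypes).

```python
from itertools import product

def compatible_pairs(genotype):
    """All unordered haplotype pairs consistent with a 0/1/2 genotype string
    (0/1 = homozygote for that allele, 2 = heterozygote, as on the slide)."""
    het = [i for i, g in enumerate(genotype) if g == "2"]
    pairs = set()
    for bits in product("01", repeat=len(het)):
        h1, h2 = list(genotype), list(genotype)
        for pos, b in zip(het, bits):
            h1[pos] = b
            h2[pos] = "1" if b == "0" else "0"
        pairs.add(frozenset(("".join(h1), "".join(h2))))
    return pairs

for g in ["2120", "2102", "1221"]:
    print(g, sorted(tuple(sorted(p)) for p in compatible_pairs(g)))
```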

SLIDE 17

That’s all.

Slides at http://justpasha.org/tmp/presentation.pdf.