

1. Quantitative Genomics and Genetics
BTRY 4830/6830; PBSB.5201.01
Lecture 7: Maximum likelihood estimators and estimation topics
Jason Mezey (jgm45@cornell.edu)
Feb. 23, 2016 (T) 8:40-9:55

2. Announcements
• Homework #2 is graded (!!); the key will be posted (available in the computer lab)
• Reminder: Homework #3 is due 11:59 PM, Fri. (Homework #4 available ~1 week from today)
• Check out the class site later today (lecture slide updates, videos, updated schedule, homework key, new supplemental material, etc.)

3. Summary of lecture 7
• Last lecture, we discussed estimators and began our discussion of maximum likelihood estimators (MLEs)
• Today, we will continue our discussion of MLEs

4. Conceptual Overview
[Diagram relating: System, Question, Experiment, Sample, Assumptions, Prob. Models, Inference, Statistics]

5. Estimators
[Diagram: Experiment (Ω, F) → Random variable X(ω), ω ∈ Ω, with Pr(X) → Sample of size n, X = x, [X₁ = x₁, ..., Xₙ = xₙ], with sampling distribution Pr([X₁ = x₁, ..., Xₙ = xₙ]) → Estimator $\hat{\theta} = T(x)$, $\hat{\theta} \in \Theta$, with sampling distribution $Pr(T(X) | \theta)$ → Inference]

6. Review of essential concepts I
• System - a process, an object, etc. which we would like to know something about
• Experiment - a manipulation or measurement of a system that produces an outcome we can observe
• Experimental trial - one instance of an experiment
• Sample Space (Ω) - a set comprising all possible outcomes associated with an experiment
• Sigma Algebra (F) - a collection of all events of a sample space
• Probability measure (= function) Pr(F) - maps the sigma algebra to the reals (Axioms of probability!)
• Random variable / vector (X) - a real-valued function on the sample space
• Sampling Distribution - the probability distribution function of the sample (represents the probability of every sample under given assumptions, e.g., i.i.d.)
• Parameterized probability model - a family of probability models indexed by constant(s) θ (= parameters) belonging to a probability model "family"

7. Review of essential concepts II
• Inference - the process of reaching a conclusion about the true probability distribution (from an assumed family of probability distributions indexed by parameters) on the basis of a sample
• Sample - repeated observations of a random variable X, generated by experimental trials (= a random vector!): $\mathbf{x} = [x_1, x_2, ..., x_n]$
• Sampling distribution - the probability distribution on the sample random vector: $Pr([X_1, X_2, ..., X_n])$; we usually assume i.i.d., i.e. $Pr(X_1 = x_1) = Pr(X_2 = x_2) = ... = Pr(X_n = x_n)$ and $Pr(\mathbf{X} = \mathbf{x}) = Pr(X_1 = x_1) Pr(X_2 = x_2) ... Pr(X_n = x_n)$
• Statistic - a function on a sample: $T(\mathbf{x}) = T([x_1, x_2, ..., x_n]) = t$ (a minimal sketch follows this slide)
• Statistic sampling distribution - the probability distribution on the statistic: $Pr(T(\mathbf{X}))$
• Estimator - a statistic defined to return a specific value $T(\mathbf{x}) = \hat{\theta}$ that represents our best evidence for the true value of a parameter
• Estimator probability distribution - the probability distribution of the estimator: $Pr(\hat{\theta})$
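A minimal sketch in Python (all sample values hypothetical) of a statistic as a function on a sample:

```python
import numpy as np

# A statistic is just a function on a sample: T(x) = T([x1, ..., xn]) = t
def T(x):
    return np.mean(x)  # the sample mean is one familiar statistic

x = np.array([1.2, -0.4, 0.7, 1.9, 0.3])  # a hypothetical observed sample
t = T(x)  # the statistic returns a specific value t for this sample
print(t)
```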

8. Review of estimators I
• Estimator - a statistic defined to return a value that represents our best evidence for the true value of a parameter
• In such a case, our statistic is an estimator of the parameter: $T(\mathbf{x}) = \hat{\theta}$
• Note that ANY statistic on a sample can in theory be an estimator (see the sketch after this slide)
• However, we generally define estimators (= statistics) in such a way that they return a reasonable or "good" estimate of the true parameter value under a variety of conditions
• How we assess whether an estimator is "good" depends on our criteria for "good" and our underlying assumptions
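To make the "ANY statistic can be an estimator" point concrete, a hedged sketch (simulated data, assumed normal model) comparing two estimators of μ, only one of which is sensible:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=100)  # i.i.d. sample with true mu = 0

# Both functions below are statistics, so both are (formally) estimators of mu:
theta_hat_good = np.mean(x)  # the sample mean: a reasonable estimator
theta_hat_bad = x[0]         # the first observation: a legal but "bad" estimator

print(theta_hat_good, theta_hat_bad)
```

Both return a number for any sample, but the sample mean concentrates near the truth as n grows, while the first observation does not.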

9. Review of estimators II
• Since our underlying probability model induces a probability distribution on a statistic, and an estimator is just a statistic, there is an underlying probability distribution on an estimator (induced by the distribution on the original random variable): $Pr(T(\mathbf{X} = \mathbf{x})) = Pr(\hat{\theta})$ (see the simulation sketch after this slide)
• Our estimator takes a vector as input (the sample) and may be defined to output a single value or a vector of estimates, i.e. it is a vector-valued function on the sample: $T(\mathbf{X} = \mathbf{x}) = \hat{\theta} = [\hat{\theta}_1, \hat{\theta}_2, ...]$
• We cannot define a statistic that always outputs the true value of the parameter for every possible sample (hence no perfect estimator!)
• There are different ways to define "good" estimators and lots of ways to define "bad" estimators (examples?)
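A sketch (simulated data, assumed normal model) of an estimator's probability distribution: the experiment is repeated many times and a vector-valued estimator T(x) = [μ̂, σ̂²] is applied to each sample, showing that no single sample returns the true values exactly:

```python
import numpy as np

rng = np.random.default_rng(1)

def T(x):
    # a vector-valued estimator: estimates of both parameters at once
    return np.array([np.mean(x), np.mean((x - np.mean(x)) ** 2)])

# Approximate Pr(theta_hat) by applying T to many independent samples of size 10
estimates = np.array([T(rng.normal(0.0, 1.0, size=10)) for _ in range(10_000)])
print(estimates.mean(axis=0))  # centers of the estimator's distribution
print(estimates.std(axis=0))   # spread: individual samples miss the truth
```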

10. Review of maximum likelihood estimators (MLEs)
• We will generally consider maximum likelihood estimators (MLEs), one class of estimators, in this course
• The critical point to remember is that an MLE is just an estimator (a function on a sample!!)
• i.e. it takes a sample in and produces a number as output that is our estimate of the true parameter value
• These estimators also have sampling distributions, just like any other statistic!
• The structure of this particular estimator / statistic is complicated, but just keep this big picture in mind

11. Review of Likelihood
• To introduce MLEs we first need the concept of likelihood
• Recall that a probability distribution (of a r.v. or, for our purposes now, a statistic) has fixed constants in the formula called parameters
• The function therefore takes different inputs of the statistic, where different sample inputs produce different outputs
• For example, for a normally distributed random variable:
$Pr(X = x | \mu, \sigma^2) = f_X(x | \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$
$Pr(\mu, \sigma^2 | X = x) = L(\mu, \sigma^2 | X = x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$
• Likelihood - a probability function which we consider to be a function of the parameters for a fixed sample (a sketch follows this slide)
• Likelihoods have the structure of probability functions but they are NOT probability functions, e.g. they are functions of parameters and they are used for estimation
• Intuitively, a probability function represents the frequency at which a specific realization of T(X) will occur, while a likelihood is our supposition that the true value of the parameter (that determines the probability of the values of X and T(X)!) is a specific value
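A minimal sketch of this distinction, using the normal formula from the slide: evaluated with parameters fixed it is a density in x; evaluated with the observation fixed it is a likelihood in μ (the observed value is hypothetical):

```python
import numpy as np

def normal_density(x, mu, sigma2):
    # f(x | mu, sigma^2) = (1 / sqrt(2*pi*sigma^2)) * exp(-(x - mu)^2 / (2*sigma^2))
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

x_obs = 0.5  # one fixed, hypothetical observation

# As a likelihood: the same formula, now viewed as a function of the parameter mu
for mu in [-1.0, 0.0, 0.5, 1.0]:
    print(mu, normal_density(x_obs, mu, sigma2=1.0))
# largest at mu = 0.5, i.e. the parameter value that best "explains" x_obs
```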

12. Normal model example I
• As an example, for our heights experiment / identity random variable, the (marginal) probability of a single observation $x_i$ in our sample is:
$Pr(X_i = x_i | \mu, \sigma^2) = f_{X_i}(x_i | \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x_i - \mu)^2}{2\sigma^2}}$
• The joint probability distribution of the entire sample of n observations is a multivariate (n-variate) normal distribution
• Note that for an i.i.d. sample, we may use the property of independence, $Pr(\mathbf{X} = \mathbf{x}) = Pr(X_1 = x_1) Pr(X_2 = x_2) ... Pr(X_n = x_n)$, to write the pdf of the entire sample as follows:
$Pr(\mathbf{X} = \mathbf{x} | \mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x_i - \mu)^2}{2\sigma^2}}$
• The likelihood is therefore (a sketch follows this slide):
$L(\mu, \sigma^2 | \mathbf{X} = \mathbf{x}) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x_i - \mu)^2}{2\sigma^2}}$
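A sketch of this i.i.d. likelihood as a literal product over the sample (sample values hypothetical); the log version, a sum, is shown as well since products of many small densities underflow numerically:

```python
import numpy as np

def normal_density(x, mu, sigma2):
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

def likelihood(mu, sigma2, x):
    # L(mu, sigma^2 | x) = prod_i f(x_i | mu, sigma^2), by independence
    return np.prod(normal_density(x, mu, sigma2))

def log_likelihood(mu, sigma2, x):
    # the same information on the log scale: a numerically safer sum
    return np.sum(np.log(normal_density(x, mu, sigma2)))

x = np.array([0.2, -1.1, 0.5, 1.3, -0.3])  # hypothetical sample
print(likelihood(0.0, 1.0, x), log_likelihood(0.0, 1.0, x))
```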

13. Normal model example II
• Let's consider a sample of size n = 10 generated under a standard normal, i.e. $X_i \sim N(\mu = 0, \sigma^2 = 1)$
• So what does the likelihood for this sample "look" like? It is actually a 3-D plot, where the x and y axes are $\mu$ and $\sigma^2$ and the z-axis is the likelihood:
$L(\mu, \sigma^2 | \mathbf{X} = \mathbf{x}) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x_i - \mu)^2}{2\sigma^2}}$
• Since this makes it tough to see what is going on, let's just look at the marginal likelihood for $\sigma^2 = 1$ using the sample above (a sketch of this plot follows)
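A sketch reproducing this marginal view (matplotlib assumed for plotting): fix σ² = 1 and evaluate L(μ, σ² = 1 | x) over a grid of μ values for a simulated standard normal sample of size n = 10:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
x = rng.normal(loc=0.0, scale=1.0, size=10)  # X_i ~ N(mu = 0, sigma^2 = 1)

mu_grid = np.linspace(-2.0, 2.0, 200)
# L(mu, sigma^2 = 1 | x) evaluated at each grid value of mu
lik = [np.prod(np.exp(-(x - mu) ** 2 / 2.0) / np.sqrt(2.0 * np.pi)) for mu in mu_grid]

plt.plot(mu_grid, lik)
plt.xlabel("mu")
plt.ylabel("L(mu, sigma^2 = 1 | x)")
plt.show()  # the curve peaks near the sample mean of x
```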

14. Introduction to MLEs
• A maximum likelihood estimator (MLE) has the following definition:
$MLE(\hat{\theta}) = \hat{\theta} = \text{argmax}_{\theta \in \Theta} \, L(\theta | \mathbf{x})$
• Again, recall that this statistic still takes in a sample and outputs a value that is our estimate (!!)
• Sometimes these estimators have nice forms (equations) that we can write out and sometimes they do not
• For example, the maximum likelihood estimator for our single coin example:
$MLE(\hat{p}) = \frac{1}{n} \sum_{i=1}^{n} x_i$
• And for our heights example (a sketch applying these follows this slide):
$MLE(\hat{\mu}) = \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \quad MLE(\hat{\sigma}^2) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$
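A sketch applying these closed-form MLEs to simulated data (coin flips for p̂; normal "heights" for μ̂ and σ̂², with hypothetical true parameter values):

```python
import numpy as np

rng = np.random.default_rng(3)

# Coin example: MLE(p_hat) = (1/n) * sum(x_i), the sample proportion of heads
flips = rng.binomial(1, 0.7, size=100)  # assumed true p = 0.7
p_hat = np.mean(flips)

# Heights example: MLE(mu_hat) = x_bar; MLE(sigma2_hat) = (1/n) * sum((x_i - x_bar)^2)
heights = rng.normal(loc=170.0, scale=10.0, size=100)  # assumed true mu, sigma
mu_hat = np.mean(heights)
sigma2_hat = np.mean((heights - mu_hat) ** 2)  # note the 1/n, not 1/(n - 1)

print(p_hat, mu_hat, sigma2_hat)
```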

15. Getting to the MLE
• To use a likelihood function to extract the MLE, we have to find the maximum of the likelihood function for our observed sample
• To do this, we take the derivative of the likelihood function and set it equal to zero (why?)
• Note that in practice, before we take the derivative and set the function equal to zero, we often transform the likelihood by the natural log (ln) to produce the log-likelihood: $l(\theta | \mathbf{x}) = \ln[L(\theta | \mathbf{x})]$
• We do this because the likelihood and the log-likelihood have the same maximum and because it is often easier to work with the log-likelihood
• Also note that the domain of the natural log function is limited to $(0, \infty)$, but likelihoods are never negative (consider the structure of probability!)
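When the maximization has no convenient closed form, it can be done numerically. A hedged sketch using scipy (assumed available) to minimize the negative log-likelihood, which is equivalent to maximizing the likelihood, checked against the analytical MLEs from the previous slide:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
x = rng.normal(loc=2.0, scale=1.5, size=200)  # simulated data, parameters assumed

def neg_log_lik(params):
    mu, log_sigma2 = params       # optimize log(sigma^2) so sigma^2 stays positive
    sigma2 = np.exp(log_sigma2)
    # -l(theta | x) = -sum_i ln f(x_i | mu, sigma^2) for the normal model
    return 0.5 * np.sum(np.log(2 * np.pi * sigma2) + (x - mu) ** 2 / sigma2)

res = minimize(neg_log_lik, x0=np.array([0.0, 0.0]))
mu_hat, sigma2_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, sigma2_hat)                          # numerical MLEs
print(np.mean(x), np.mean((x - np.mean(x)) ** 2))  # match the closed forms
```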
