18.05 Exam 2 review problems with solutions
Spring 2014
Jeremy Orloff and Jonathan Bloom

1 Summary

• Data: $x_1, \ldots, x_n$
• Basic statistics: sample mean, sample variance, sample median
• Likelihood, maximum likelihood estimate (MLE)
• Bayesian updating: prior, likelihood, posterior, predictive probability, probability intervals; prior and likelihood can be discrete or continuous
• NHST: $H_0$, $H_A$, significance level, rejection region, power, type 1 and type 2 errors, p values.

2 Basic statistics

Data: $x_1, \ldots, x_n$.

$$\text{sample mean} = \bar{x} = \frac{x_1 + \cdots + x_n}{n}$$

$$\text{sample variance} = s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

sample median = middle value

Example. Data: 1, 2, 3, 6, 8.

$$\bar{x} = 4, \qquad s^2 = \frac{9 + 4 + 1 + 4 + 16}{4} = 8.5, \qquad \text{median} = 3.$$

3 Likelihood

$x$ = data
$\theta$ = parameter of interest or hypotheses of interest

Likelihood:
  $p(x \mid \theta)$ (discrete distribution)
  $f(x \mid \theta)$ (continuous distribution)
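A quick numerical check of the Section 2 example (data 1, 2, 3, 6, 8) is sketched below. Python is an illustrative choice and is not part of the original notes; the variable names are assumptions. The sketch applies the sample mean and sample variance formulas directly and takes the median from the standard library.

```python
from statistics import median

# Data from the basic-statistics example in Section 2.
data = [1, 2, 3, 6, 8]
n = len(data)

# Sample mean: sum of the data divided by n.
x_bar = sum(data) / n

# Sample variance: sum of squared deviations divided by n - 1.
s2 = sum((x - x_bar) ** 2 for x in data) / (n - 1)

# Sample median: the middle value of the sorted data.
med = median(data)

print(x_bar, s2, med)
```

Note the divisor n - 1 in the variance, which matches the definition above.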

Log likelihood: $\ln(p(x \mid \theta))$, $\ln(f(x \mid \theta))$.

Likelihood examples. Find the likelihood function of each of the following.

1. Coin with probability of heads $\theta$. Toss 10 times, get 3 heads.
2. Wait time follows $\exp(\lambda)$. In 5 independent trials wait 3, 5, 4, 5, 2.
3. Usual 5 dice. Two independent rolls, 9, 5. (Likelihood given in a table)
4. Independent $x_1, \ldots, x_n \sim N(\mu, \sigma^2)$
5. $x = 6$ drawn from uniform$(0, \theta)$
6. $x \sim$ uniform$(0, \theta)$

Solutions.

1. Let $x$ be the number of heads in 10 tosses.

$$P(x = 3 \mid \theta) = \binom{10}{3} \theta^3 (1 - \theta)^7.$$

2. $f(\text{data} \mid \lambda) = \lambda^5 e^{-\lambda(3+5+4+5+2)} = \lambda^5 e^{-19\lambda}$

3.

| Hypothesis $\theta$ | Likelihood $P(\text{data} \mid \theta)$ |
| 4-sided | 0 |
| 6-sided | 0 |
| 8-sided | 0 |
| 12-sided | 1/144 |
| 20-sided | 1/400 |

4.

$$f(\text{data} \mid \mu, \sigma) = \left( \frac{1}{\sqrt{2\pi}\,\sigma} \right)^{n} e^{-\frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_n - \mu)^2}{2\sigma^2}}$$

5.

$$f(x = 6 \mid \theta) = \begin{cases} 0 & \text{if } \theta < 6 \\ 1/\theta & \text{if } 6 \le \theta \end{cases}$$

6.

$$f(x \mid \theta) = \begin{cases} 0 & \text{if } \theta < x \text{ or } x < 0 \\ 1/\theta & \text{if } 0 \le x \le \theta \end{cases}$$

3.1 Maximum likelihood estimates (MLE)

Methods for finding the maximum likelihood estimate (MLE).

• Discrete hypotheses: compute each likelihood
• Discrete hypotheses: maximum is obvious
• Continuous parameter: compute derivative (often use log likelihood)
• Continuous parameter: maximum is obvious

Examples. Find the MLE for each of the examples in the previous section.
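One way to sanity-check an MLE found with calculus is to evaluate the log likelihood on a fine grid and pick the grid point with the largest value. The sketch below does this for likelihood examples 1 and 2 (the coin and the waiting times). The function names, grid ranges, and grid size are illustrative assumptions, not part of the notes, and a grid search only approximates the maximizer to within the grid spacing.

```python
import math

def coin_log_lik(theta, heads=3, tosses=10):
    """Log likelihood for example 1: 3 heads in 10 tosses."""
    return (math.log(math.comb(tosses, heads))
            + heads * math.log(theta)
            + (tosses - heads) * math.log(1 - theta))

def exp_log_lik(lam, waits=(3, 5, 4, 5, 2)):
    """Log likelihood for example 2: independent exp(lambda) wait times."""
    return len(waits) * math.log(lam) - lam * sum(waits)

def grid_argmax(f, lo, hi, steps=100_000):
    """Return the interior grid point of (lo, hi) where f is largest."""
    grid = [lo + (hi - lo) * k / steps for k in range(1, steps)]
    return max(grid, key=f)

theta_hat = grid_argmax(coin_log_lik, 0.0, 1.0)
lam_hat = grid_argmax(exp_log_lik, 0.0, 2.0)  # upper limit 2.0 is an assumed search range
print(theta_hat, lam_hat)
```

The endpoints are excluded from the grid because the log likelihood is undefined where the parameter is 0 (or 1 for the coin).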

Solutions.

1. $\ln(f(x = 3 \mid \theta)) = \ln \binom{10}{3} + 3 \ln(\theta) + 7 \ln(1 - \theta)$.

Take the derivative and set to 0:

$$\frac{3}{\theta} - \frac{7}{1 - \theta} = 0 \;\Rightarrow\; \hat{\theta} = \frac{3}{10}.$$

2. $\ln(f(\text{data} \mid \lambda)) = 5 \ln(\lambda) - 19\lambda$.

Take the derivative and set to 0:

$$\frac{5}{\lambda} - 19 = 0 \;\Rightarrow\; \hat{\lambda} = \frac{5}{19}.$$

3. Read directly from the table: MLE = 12-sided die.

4. For the exam do not focus on the calculation here. You should understand the idea that we need to set the partial derivatives with respect to $\mu$ and $\sigma$ to 0 and solve for the critical point $(\hat{\mu}, \hat{\sigma}^2)$.

The result is

$$\hat{\mu} = \bar{x}, \qquad \hat{\sigma}^2 = \frac{\sum (x_i - \hat{\mu})^2}{n}.$$

5. Because of the term $1/\theta$ in the likelihood, the likelihood is at a maximum when $\theta$ is as small as possible. Answer: $\hat{\theta} = 6$.

6. This is identical to problem 5 except the exact value of $x$ is not given. Answer: $\hat{\theta} = x$.

4 Bayesian updating

4.1 Bayesian updating: discrete prior-discrete likelihood.

Jon has 1 four-sided, 2 six-sided, 2 eight-sided, 2 twelve-sided, and 1 twenty-sided dice. He picks one at random and rolls a 7.

1. For each type of die, find the posterior probability Jon chose that type.
2. What are the posterior odds Jon chose the 20-sided die?
3. Compute the prior predictive probability of rolling a 7 on the first roll.
4. Compute the posterior predictive probability of rolling an 8 on the second roll.

Solutions.

1. Make a table. (We include columns to answer question 4.)

| Hypothesis $\theta$ | Prior $P(\theta)$ | Likelihood $f(x_1 = 7 \mid \theta)$ | Unnorm. posterior | Posterior $f(\theta \mid x_1 = 7)$ | Likelihood $P(x_2 = 8 \mid \theta)$ | Unnorm. posterior (for $x_2 = 8$) |
| 4-sided | 1/8 | 0 | 0 | 0 | 0 | 0 |
| 6-sided | 1/4 | 0 | 0 | 0 | 0 | 0 |
| 8-sided | 1/4 | 1/8 | 1/32 | $1/(32c)$ | 1/8 | $1/(256c)$ |
| 12-sided | 1/4 | 1/12 | 1/48 | $1/(48c)$ | 1/12 | $1/(576c)$ |
| 20-sided | 1/8 | 1/20 | 1/160 | $1/(160c)$ | 1/20 | $1/(3200c)$ |
| Total | 1 | | $c = \frac{1}{32} + \frac{1}{48} + \frac{1}{160}$ | 1 | | |

The posterior probabilities are given in the 5th column of the table. The total probability $c = \frac{7}{120}$ is also the answer to problem 3.
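The update table above can be reproduced with exact fractions. The following is a minimal sketch, assuming Python's `fractions` module and hypothetical helper names (`likelihood`, `update`) that do not come from the notes. It computes the posterior after rolling a 7, the normalizing total c, the posterior odds for the 20-sided die, and the posterior predictive probability of an 8 on the second roll.

```python
from fractions import Fraction

# Number of dice of each type (keyed by number of sides), as stated in the problem.
counts = {4: 1, 6: 2, 8: 2, 12: 2, 20: 1}
n_dice = sum(counts.values())
prior = {sides: Fraction(k, n_dice) for sides, k in counts.items()}

def likelihood(roll, sides):
    """Probability of a given roll on a fair die with the given number of sides."""
    return Fraction(1, sides) if 1 <= roll <= sides else Fraction(0)

def update(prior, roll):
    """Return (posterior, total), where total is the sum of the unnormalized posterior."""
    unnorm = {s: p * likelihood(roll, s) for s, p in prior.items()}
    total = sum(unnorm.values())
    posterior = {s: u / total for s, u in unnorm.items()}
    return posterior, total

# First roll is a 7: c is the prior predictive probability of that roll.
posterior, c = update(prior, 7)

# Posterior odds of the 20-sided die.
odds_20 = posterior[20] / (1 - posterior[20])

# Posterior predictive probability of an 8 on the second roll.
_, pred_8 = update(posterior, 8)

print(c, posterior, odds_20, pred_8)
```

Using `Fraction` keeps every entry exact, so the results can be compared directly with the fractions in the table.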

2.

$$\text{Odds}(\text{20-sided} \mid x_1 = 7) = \frac{P(\text{20-sided} \mid x_1 = 7)}{P(\text{not 20-sided} \mid x_1 = 7)} = \frac{1/(160c)}{1/(32c) + 1/(48c)} = \frac{1/160}{5/96} = \frac{96}{800} = \frac{3}{25}.$$

3. $P(x_1 = 7) = c = 7/120$.

4. See the last two columns in the table.

$$P(x_2 = 8 \mid x_1 = 7) = \frac{1}{256c} + \frac{1}{576c} + \frac{1}{3200c} = \frac{49}{480}.$$

4.2 Bayesian updating: conjugate priors.

Beta prior, binomial likelihood

Data: $x \sim \text{binomial}(n, \theta)$. $\theta$ is unknown.
Prior: $f(\theta) \sim \text{beta}(a, b)$
Posterior: $f(\theta \mid x) \sim \text{beta}(a + x, b + n - x)$

1. Suppose $x \sim \text{binomial}(30, \theta)$, $x = 12$. If we have a prior $f(\theta) \sim \text{beta}(1, 1)$ find the posterior for $\theta$.

Beta prior, geometric likelihood

Data: $x$
Prior: $f(\theta) \sim \text{beta}(a, b)$
Posterior: $f(\theta \mid x) \sim \text{beta}(a + x, b + 1)$.

2. Suppose $x \sim \text{geometric}(\theta)$, $x = 6$. If we have a prior $f(\theta) \sim \text{beta}(4, 2)$ find the posterior for $\theta$.

Normal prior, normal likelihood

$$a = \frac{1}{\sigma^2_{\text{prior}}}, \qquad b = \frac{n}{\sigma^2}, \qquad \mu_{\text{post}} = \frac{a \mu_{\text{prior}} + b \bar{x}}{a + b}, \qquad \sigma^2_{\text{post}} = \frac{1}{a + b}.$$

3. In the population IQ is normally distributed: $\theta \sim N(100, 15^2)$. An IQ test finds a person's 'true' IQ + random error $\sim N(0, 10^2)$. Someone takes the test and scores 120. Find the posterior pdf for this person's IQ.

Solutions.

1. $f(\theta) \sim \text{beta}(1, 1)$, $x \sim \text{binom}(30, \theta)$. $x = 12$, so $f(\theta \mid x = 12) \sim \text{beta}(13, 19)$.

2. $f(\theta) \sim \text{beta}(4, 2)$, $x \sim \text{geom}(\theta)$. $x = 6$, so $f(\theta \mid x = 6) \sim \text{beta}(10, 3)$.

3. Prior, $f(\theta) \sim N(100, 15^2)$, $x \sim N(\theta, 10^2)$.

So we have $\mu_{\text{prior}} = 100$, $\sigma^2_{\text{prior}} = 15^2$, $\sigma^2 = 10^2$, $n = 1$, $\bar{x} = x = 120$.

Applying the normal-normal update formulas: $a = \frac{1}{15^2}$, $b = \frac{1}{10^2}$. This gives

$$\mu_{\text{post}} = \frac{100/15^2 + 120/10^2}{1/15^2 + 1/10^2} = 113.8, \qquad \sigma^2_{\text{post}} = \frac{1}{1/15^2 + 1/10^2} = 69.2.$$

Bayesian updating: continuous prior-continuous likelihood

Examples. Update from prior to posterior for each of the following with the given data. Graph the prior and posterior in each case.

1. Romeo is late: likelihood: $x \sim U(0, \theta)$, prior: $U(0, 1)$, data: 0.3, 0.4, 0.4.
2. Waiting times: likelihood: $x \sim \exp(\lambda)$, prior: $\lambda \sim \exp(2)$, data: 1, 2.
3. Waiting times: likelihood: $x \sim \exp(\lambda)$, prior: $\lambda \sim \exp(2)$, data: $x_1, x_2, \ldots, x_n$.

Solutions.

1. In the update table we split the hypotheses into the two different cases $\theta < 0.4$ and $\theta \ge 0.4$:

| hyp. | prior $f(\theta)$ | likelihood $f(\text{data} \mid \theta)$ | unnormalized posterior | posterior $f(\theta \mid \text{data})$ |
| $\theta < 0.4$ | $d\theta$ | 0 | 0 | 0 |
| $\theta \ge 0.4$ | $d\theta$ | $\frac{1}{\theta^3}$ | $\frac{1}{\theta^3}\, d\theta$ | $\frac{1}{T \theta^3}\, d\theta$ |
| Tot. | 1 | | $T$ | 1 |

The total probability

$$T = \int_{0.4}^{1} \frac{d\theta}{\theta^3} \;\Rightarrow\; T = \left. -\frac{1}{2\theta^2} \right|_{0.4}^{1} = \frac{21}{8} = 2.625.$$

We use $1/T$ as a normalizing factor to make the total posterior probability equal to 1.

[Figure: prior and posterior for $\theta$. Prior in red, posterior in cyan.]

2. This follows the same pattern as problem 1.

The likelihood $f(\text{data} \mid \lambda) = \lambda e^{-\lambda \cdot 1} \, \lambda e^{-\lambda \cdot 2} = \lambda^2 e^{-3\lambda}$.

| hyp. | prior $f(\lambda)$ | likelihood $f(\text{data} \mid \lambda)$ | unnormalized posterior | posterior $f(\lambda \mid \text{data})$ |
| $0 < \lambda < \infty$ | $2e^{-2\lambda}\, d\lambda$ | $\lambda^2 e^{-3\lambda}$ | $2\lambda^2 e^{-5\lambda}\, d\lambda$ | $\frac{2}{T} \lambda^2 e^{-5\lambda}\, d\lambda$ |
| Tot. | 1 | | $T$ | 1 |

The total probability (computed using integration by parts)

$$T = \int_{0}^{\infty} 2\lambda^2 e^{-5\lambda}\, d\lambda \;\Rightarrow\; T = \frac{4}{125}.$$

We use $1/T$ as a normalizing factor to make the total posterior probability equal to 1.
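The two normalizing totals above can be cross-checked by numerical integration. The sketch below is an assumption-laden illustration rather than the method used in the notes: it uses a hand-written composite Simpson's rule (the helper name `simpson` is hypothetical) and truncates the infinite integral in problem 2 at an assumed upper limit of 20, beyond which the integrand is negligible.

```python
import math

def simpson(f, a, b, n=10_000):
    """Composite Simpson's rule on [a, b] with n subintervals (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        # Odd-indexed interior points get weight 4, even-indexed get weight 2.
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

# Problem 1: unnormalized posterior 1/theta^3 on [0.4, 1].
T1 = simpson(lambda theta: theta ** -3, 0.4, 1.0)

# Problem 2: unnormalized posterior 2 * lambda^2 * exp(-5 * lambda) on [0, infinity),
# truncated at 20 (an assumed cutoff; the tail beyond it is vanishingly small).
T2 = simpson(lambda lam: 2 * lam ** 2 * math.exp(-5 * lam), 0.0, 20.0)

print(T1, T2)
```

Dividing each unnormalized posterior by its total gives a density that integrates to 1, which is the role of the factor 1/T in the tables.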

[Figure: prior and posterior for $\lambda$. Prior in red, posterior in cyan.]

3. This is nearly identical to problem 2 except the exact values of the data are not given, so we have to work abstractly.

The likelihood $f(\text{data} \mid \lambda) = \lambda^n e^{-\lambda \sum x_i}$.

| hyp. | prior $f(\lambda)$ | likelihood $f(\text{data} \mid \lambda)$ | unnormalized posterior | posterior $f(\lambda \mid \text{data})$ |
| $0 < \lambda < \infty$ | $2e^{-2\lambda}\, d\lambda$ | $\lambda^n e^{-\lambda \sum x_i}$ | $2\lambda^n e^{-\lambda(2 + \sum x_i)}\, d\lambda$ | $\frac{2}{T} \lambda^n e^{-\lambda(2 + \sum x_i)}\, d\lambda$ |
| Tot. | 1 | | $T$ | 1 |

For this problem you should be able to write down the integral for the total probability. We won't ask you to compute something this complicated on the exam.

$$T = \int_{0}^{\infty} 2\lambda^n e^{-\lambda(2 + \sum x_i)}\, d\lambda \;\Rightarrow\; T = \frac{2\, n!}{(2 + \sum x_i)^{n+1}}.$$

We use $1/T$ as a normalizing factor to make the total posterior probability equal to 1.

The plot for problem 2 is one example of what the graphs can look like.

5 Null hypothesis significance testing (NHST)

5.1 NHST: Steps

1. Specify $H_0$ and $H_A$.
2. Choose a significance level $\alpha$.
3. Choose a test statistic and determine the null distribution.
4. Determine how to compute a p value and/or the rejection region.
5. Collect data.
6. Compute p value or check if test statistic is in the rejection region.
7. Reject or fail to reject $H_0$.
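The seven steps above can be traced in a short sketch. It reuses the coin data from the likelihood examples (3 heads in 10 tosses); everything else is an assumption made for illustration and does not come from the notes: the hypotheses $H_0$: $\theta = 0.5$ versus $H_A$: $\theta \ne 0.5$, the significance level $\alpha = 0.05$, and the two-sided p value convention (sum the null probabilities of all outcomes at least as far from the expected count as the observed one).

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of k successes in n independent trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Steps 1-2 (assumed for illustration): H0: theta = 0.5, HA: theta != 0.5, alpha = 0.05.
theta0 = 0.5
alpha = 0.05

# Step 3: test statistic = number of heads in n tosses; null distribution binomial(n, theta0).
n = 10
null_pmf = [binom_pmf(k, n, theta0) for k in range(n + 1)]

# Step 5: data (from the coin likelihood example: 3 heads in 10 tosses).
observed = 3

# Steps 4 and 6: two-sided p value, counting outcomes at least as far from
# the expected count n * theta0 as the observed count.
expected = n * theta0
p_value = sum(
    prob for k, prob in enumerate(null_pmf)
    if abs(k - expected) >= abs(observed - expected)
)

# Step 7: reject H0 when the p value is at most alpha.
reject = p_value <= alpha
print(p_value, reject)
```

With these assumed choices the outcomes counted in the p value are 0, 1, 2, 3, 7, 8, 9, and 10 heads.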
