Introduction to Statistics, 18.05 Spring 2017

Introduction to Statistics 18.05 Spring 2017 T T T H H T H H H T H T H T H T H T H T H T T T H T T T T H H T T H H T H H T H T T H H H H T H T H T T T H T H H H H T T T T H H H T T T H H H H H H H H T T T H T H H T T T H H T H T H H H T T T H H

Three ‘phases’
1. Data collection: informal investigation / observational study / formal experiment
2. Descriptive statistics
3. Inferential statistics (the focus in 18.05)
"To consult a statistician after an experiment is finished is often merely to ask him to conduct a post-mortem examination. He can perhaps say what the experiment died of." R.A. Fisher
March 9, 2017

Is it fair?
T T T H H T H H H T H T H T H T H T H T H T T T H T T T T H H T T H H T H H T H T T H H H H T H T H T T T H T H H H H T T T T H H H T T T H H H H H H H H T T T H T H H T T T H H T H T H H H T T T H H

Is it normal?
Does it have µ = 0? Is it normal? Is it standard normal?
[Figure: density plot of the sample; x axis from −4 to 4, Density axis from 0.00 to 0.20]
Sample mean = 0.38; sample standard deviation = 1.59

What is a statistic?
Definition. A statistic is anything that can be computed from the collected data. That is, a statistic must be observable.
Point statistic: a single value computed from data, e.g. the sample average $\bar{x}_n$ or the sample standard deviation $s_n$.
Interval or range statistic: an interval $[a, b]$ computed from the data. (Just a pair of point statistics.) Often written as $\bar{x} \pm s$.
Important: A statistic is itself a random variable, since a new experiment will produce new data from which to compute it.
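As a rough illustration of point and interval statistics, the sketch below computes a sample average, a sample standard deviation, and the interval formed from them. The data values are invented for illustration and do not come from the slides.

```python
import statistics

# Hypothetical measurements, invented purely for illustration.
data = [2.1, 3.4, 1.9, 4.0, 2.6]

# Point statistics: each is a single number computed from the data alone.
x_bar = statistics.mean(data)    # sample average
s = statistics.stdev(data)       # sample standard deviation (n - 1 denominator)

# Interval statistic: a pair of point statistics, here x_bar ± s.
interval = (x_bar - s, x_bar + s)

print(f"sample average = {x_bar:.3f}")
print(f"sample standard deviation = {s:.3f}")
print(f"interval = [{interval[0]:.3f}, {interval[1]:.3f}]")
```

Rerunning the experiment would give different data and therefore different values, which is the sense in which a statistic is itself a random variable.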

Concept question
You believe that the lifetimes of a certain type of lightbulb follow an exponential distribution with parameter $\lambda$. To test this hypothesis you measure the lifetimes of 5 bulbs and get data $x_1, \ldots, x_5$. Which of the following are statistics?
(a) The sample average $\bar{x} = \frac{x_1 + x_2 + x_3 + x_4 + x_5}{5}$.
(b) The expected value of a sample, namely $1/\lambda$.
(c) The difference between $\bar{x}$ and $1/\lambda$.
1. (a)  2. (b)  3. (c)  4. (a) and (b)  5. (a) and (c)  6. (b) and (c)  7. all three  8. none of them

Notation
Big letters $X$, $Y$, $X_i$ are random variables.
Little letters $x$, $y$, $x_i$ are data (values) generated by the random variables.
Example. Experiment: 10 flips of a coin.
$X_i$ is the random variable for the $i$-th flip: either 0 or 1.
$x_i$ is the actual result (data) from the $i$-th flip, e.g. $x_1, \ldots, x_{10} = 1, 1, 1, 0, 0, 0, 0, 0, 1, 0$.

Reminder of Bayes' theorem
Bayes' theorem is the key to our view of statistics. (Much more next week!)
$$P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}$$
$$P(\text{hypothesis} \mid \text{data}) = \frac{P(\text{data} \mid \text{hypothesis})\, P(\text{hypothesis})}{P(\text{data})}$$

Estimating a parameter
Example. Suppose we want to know the percentage $p$ of people for whom cilantro tastes like soap.
Experiment: Ask $n$ random people to taste cilantro.
Model: $X_i \sim \text{Bernoulli}(p)$ is whether the $i$-th person says it tastes like soap.
Data: $x_1, \ldots, x_n$ are the results of the experiment.
Inference: Estimate $p$ from the data.

Parameters of interest
Example. You ask 100 people to taste cilantro and 55 say it tastes like soap. Use this data to estimate $p$, the fraction of all people for whom it tastes like soap.
So $p$ is the parameter of interest.

Likelihood
For a given value of $p$, the probability of getting 55 ‘successes’ is the binomial probability
$$P(55 \text{ soap} \mid p) = \binom{100}{55} p^{55} (1 - p)^{45}.$$
Definition: The likelihood is
$$P(\text{data} \mid p) = \binom{100}{55} p^{55} (1 - p)^{45}.$$
NOTICE: The likelihood takes the data as fixed and computes the probability of the data for a given $p$.
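To make the idea of "data fixed, p varying" concrete, here is a minimal sketch that evaluates the binomial likelihood for the 55-out-of-100 data at a few values of p. The function name `likelihood` and the particular p values tried are assumptions chosen for illustration.

```python
from math import comb

def likelihood(p, successes=55, n=100):
    """Binomial probability of `successes` out of `n` trials for a given p."""
    return comb(n, successes) * p**successes * (1 - p)**(n - successes)

# The data (55 of 100) stay fixed; only the candidate value of p changes.
for p in (0.3, 0.5, 0.55, 0.7):
    print(f"p = {p:.2f}: likelihood = {likelihood(p):.3e}")
```

Values of p close to the observed fraction make the data more probable than values far from it.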

Maximum likelihood estimate (MLE)
The maximum likelihood estimate (MLE) is a way to estimate the value of a parameter of interest.
The MLE is the value of $p$ that maximizes the likelihood.
Different problems call for different methods of finding the maximum. Here are two (there are others):
1. Calculus: To find the MLE, solve $\frac{d}{dp} P(\text{data} \mid p) = 0$ for $p$. (We should also check that the critical point is a maximum.)
2. Sometimes the derivative is never 0 and the MLE is at an endpoint of the allowable range.
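Besides the two methods listed, a maximum can also be located numerically. The sketch below is a plain grid search, not a method from the slides, applied to the cilantro data (55 of 100); the grid spacing of 0.001 and the names `likelihood`, `grid`, and `p_hat` are assumptions.

```python
from math import comb

def likelihood(p, successes=55, n=100):
    """Binomial likelihood of the cilantro data for a given p."""
    return comb(n, successes) * p**successes * (1 - p)**(n - successes)

# Candidate values 0.001, 0.002, ..., 0.999.
# The endpoints 0 and 1 are left out because the likelihood is 0 there.
grid = [i / 1000 for i in range(1, 1000)]

# The grid point with the largest likelihood approximates the MLE.
p_hat = max(grid, key=likelihood)
print(f"approximate MLE: p = {p_hat}")
```

A grid search only finds the best value among the candidates tried, so its accuracy is limited by the spacing; the calculus approach gives the exact maximizer when it applies.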

Log likelihood
Because the log function turns multiplication into addition, it is often convenient to use the log of the likelihood function:
log likelihood = ln(likelihood) = $\ln(P(\text{data} \mid p))$.
Example.
Likelihood: $P(\text{data} \mid p) = \binom{100}{55} p^{55} (1 - p)^{45}$
Log likelihood: $\ln\binom{100}{55} + 55 \ln(p) + 45 \ln(1 - p)$.
(Note that the first term is just a constant.)
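As a worked step for the cilantro example, the calculus method can be applied to the log likelihood instead of the likelihood. Since ln is increasing, both have their maximum at the same p. The notation $\hat{p}$ for the estimate is an assumption of this sketch.

```latex
\begin{align*}
\frac{d}{dp}\left[\ln\binom{100}{55} + 55\ln(p) + 45\ln(1-p)\right]
  &= \frac{55}{p} - \frac{45}{1-p} = 0 \\
55\,(1-p) &= 45\,p \\
\hat{p} &= \frac{55}{100} = 0.55
\end{align*}
```

The second derivative, $-55/p^2 - 45/(1-p)^2$, is negative for every $p$ in $(0, 1)$, so this critical point is a maximum.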

Board Question: Coins
A coin is taken from a box containing three coins, which give heads with probability $p = 1/3$, $1/2$, and $2/3$. The mystery coin is tossed 80 times, resulting in 49 heads and 31 tails.
(a) What is the likelihood of this data for each type of coin? Which coin gives the maximum likelihood?
(b) Now suppose that we have a single coin with unknown probability $p$ of landing heads. Find the likelihood and log likelihood functions given the same data. What is the maximum likelihood estimate for $p$?
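One way to check an answer to this board question numerically is sketched below. It evaluates the binomial likelihood of 49 heads in 80 tosses for each of the three coins, and for part (b) uses the fact that setting the derivative of $49 \ln(p) + 31 \ln(1-p)$ to zero gives heads/tosses. The variable and function names are assumptions for illustration.

```python
from math import comb

heads, tails = 49, 31
n = heads + tails

def likelihood(p):
    """Binomial likelihood of 49 heads in 80 tosses for a given p."""
    return comb(n, heads) * p**heads * (1 - p)**tails

# Part (a): compare the three candidate coins.
coins = {"1/3": 1 / 3, "1/2": 1 / 2, "2/3": 2 / 3}
for label, p in coins.items():
    print(f"p = {label}: likelihood = {likelihood(p):.3e}")

best = max(coins, key=lambda label: likelihood(coins[label]))
print(f"coin with maximum likelihood: p = {best}")

# Part (b): with p unrestricted, the critical point of the log likelihood
# 49 ln(p) + 31 ln(1 - p) is at heads / n.
p_hat = heads / n
print(f"MLE for unknown p: {p_hat}")
```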

Continuous likelihood
Use the pdf instead of the pmf.
Example. Light bulbs
Lifetime of each bulb $\sim \exp(\lambda)$.
Test 5 bulbs and find lifetimes of $x_1, \ldots, x_5$.
(i) Find the likelihood and log likelihood functions.
(ii) Then find the maximum likelihood estimate (MLE) for $\lambda$.
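A sketch of how parts (i) and (ii) can be worked, assuming the five lifetimes are independent (the slide does not state this explicitly) and using the exponential pdf $f(x \mid \lambda) = \lambda e^{-\lambda x}$:

```latex
\begin{align*}
\text{likelihood: } f(x_1, \ldots, x_5 \mid \lambda)
  &= \prod_{i=1}^{5} \lambda e^{-\lambda x_i}
   = \lambda^{5} e^{-\lambda (x_1 + \cdots + x_5)} \\
\text{log likelihood: } \ln f(x_1, \ldots, x_5 \mid \lambda)
  &= 5 \ln(\lambda) - \lambda \sum_{i=1}^{5} x_i \\
\frac{d}{d\lambda} \ln f
  &= \frac{5}{\lambda} - \sum_{i=1}^{5} x_i = 0
  \quad\Longrightarrow\quad
  \hat{\lambda} = \frac{5}{x_1 + \cdots + x_5} = \frac{1}{\bar{x}}
\end{align*}
```

The second derivative, $-5/\lambda^2$, is negative, so the critical point is a maximum.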

Board Question
Suppose the 5 bulbs are tested and have lifetimes of 2, 3, 1, 3, 4 years respectively. What is the maximum likelihood estimate (MLE) for $\lambda$?
Work from scratch. Do not simply use the formula just given. Set the problem up carefully by defining random variables and densities.
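After working the problem by hand, a result can be sanity-checked numerically. The sketch below assumes independent exponential lifetimes, computes the closed-form value 5 / (sum of lifetimes), and confirms that the log likelihood is lower at nearby values of λ. The names `log_likelihood` and `lam_hat` are assumptions.

```python
from math import log

lifetimes = [2, 3, 1, 3, 4]  # years, from the board question

def log_likelihood(lam):
    """Sum of ln(lam * exp(-lam * x)) over the data, assuming independent bulbs."""
    return sum(log(lam) - lam * x for x in lifetimes)

# Closed form: number of bulbs divided by total lifetime (1 / sample mean).
lam_hat = len(lifetimes) / sum(lifetimes)
print(f"lambda_hat = {lam_hat:.4f} per year")

# Nearby values of lambda should give a smaller log likelihood.
for lam in (0.9 * lam_hat, lam_hat, 1.1 * lam_hat):
    print(f"lambda = {lam:.4f}: log likelihood = {log_likelihood(lam):.4f}")
```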
