Probability and Statistics for Computer Science
Hongye Liu, Teaching Assistant Prof, CS361, UIUC, 10.15.2020

"Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write." H. G. Wells

Credit: Wikipedia
[Figure: histogram of sample_median, with frequency on the vertical axis]

✺ Given the histogram of the bootstrap samples' statistic, we want to get its 95% confidence interval. What is the left-side threshold?
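✺ A minimal sketch in R, assuming sample_median holds the bootstrap sample medians shown in the histogram:

  # 95% bootstrap confidence interval: the left-side threshold is the 2.5% quantile
  quantile(sample_median, probs = c(0.025, 0.975))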
✺ Assuming the hypothesis H0 is true
✺ Define a test statistic:

x = \frac{\text{(sample mean)} - \text{(hypothesized value)}}{\text{standard error}}

✺ Since N > 30, x should come from a standard normal
✺ So, the fraction of "less extreme" samples is:

f = \frac{1}{\sqrt{2\pi}} \int_{-|x|}^{|x|} \exp\left(-\frac{u^2}{2}\right) du
[Figure: standard normal density with the rejection region (2α) shaded in both tails]
✺ It is conventional to report the p-value of a hypothesis test
✺ Since N > 30, x should come from a standard normal
✺ Rejection region (2α). By convention: 2α = 0.05. That is: if p < 0.05, reject H0

p = 1 - f = 1 - \frac{1}{\sqrt{2\pi}} \int_{-|x|}^{|x|} \exp\left(-\frac{u^2}{2}\right) du
✺ H0: Ms. Smith's vote percentage is 55%
✺ The sample mean is 51% and the stderr is 1.44%
✺ The test statistic:

x = \frac{51 - 55}{1.44} = -2.7778

✺ And the p-value for the test is:

p = 1 - \frac{1}{\sqrt{2\pi}} \int_{-2.7778}^{2.7778} \exp\left(-\frac{u^2}{2}\right) du = 0.00547 < 0.05

✺ So we reject the hypothesis
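✺ The same two-sided p-value can be computed with base R's normal CDF; a minimal sketch using the numbers above:

  x <- (51 - 55) / 1.44     # test statistic, -2.7778
  p <- 2 * pnorm(-abs(x))   # equals 1 minus the integral from -|x| to |x|
  p                         # 0.00547 < 0.05, so reject H0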
✺ Q: what distribution should we use to test the …
✺ p-value use in scientific practice
✺ Usually used to reject the null hypothesis that the data is random noise
✺ Common practice: p < 0.05 is considered significant evidence for something interesting
✺ Caution about p-value hacking
✺ Rejecting the null hypothesis doesn't mean the alternative is true
✺ p < 0.05 is arbitrary and often is not enough to control the false-positive phenomenon
✺ The one-tailed p-value should only be considered when the realized sample mean or difference will surely fall on only one side of the distribution.
✺ Sometimes scientists are tempted to use a one-tailed test because it gives a smaller p-value. But this is bad statistics!
✺ If Z_1, Z_2, \ldots, Z_m are independent variables of standard normal distribution, then

X = Z_1^2 + Z_2^2 + \ldots + Z_m^2 = \sum_{i=1}^{m} Z_i^2

has a chi-square distribution with degree of freedom m, X \sim \chi^2(m)

✺ We can test the goodness of fit for a model using a statistic C against this distribution, where

C = \sum_{i=1}^{m} \frac{(f_o(\varepsilon_i) - f_t(\varepsilon_i))^2}{f_t(\varepsilon_i)}

(f_o(\varepsilon_i): observed frequency of outcome \varepsilon_i; f_t(\varepsilon_i): theoretical frequency under the model)
✺ Given the two-way table, test whether the two categories are independent

          Boy   Girl   Total
Grades    117   130    247
Popular    50    91    141
Sports     60    30     90
Total     227   251    478
✺ The theoretical expected values if the two categories were independent:

          Boy         Girl        Total
Grades    117.29916   129.70084   247
Popular    66.96025    74.03975   141
Sports     42.74059    47.25941    90
Total     227         251         478
✺ The degree of freedom for the chi-square statistic here is 2
✺ Because the degree df = n - 1 - p, where n is the number of cells of data and p is the number of unknown parameters
See textbook pp. 171-172
✺ The chi-square statistic: 21.455
✺ p-value: 2.193e-05
✺ It's very unlikely the two categories are independent

chisq.test(data_BG)
    Pearson's Chi-squared test
data:  data_BG
X-squared = 21.455, df = 2, p-value = 2.193e-05
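✺ A minimal sketch of how a table like data_BG can be built and tested in R (the counts are from the slide; the matrix construction is an assumption about how the data was entered):

  data_BG <- matrix(c(117, 130,    # Grades
                       50,  91,    # Popular
                       60,  30),   # Sports
                    nrow = 3, byrow = TRUE,
                    dimnames = list(c("Grades", "Popular", "Sports"), c("Boy", "Girl")))
  chisq.test(data_BG)              # X-squared = 21.455, df = 2, p-value = 2.193e-05
  chisq.test(data_BG)$expected     # reproduces the theoretical expected values table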
✺ The following 2-way table is for a chi-square test:

Class   Male   Female
1st     118      4
2nd     154     13
3rd     387     89
Crew    670      3
✺ Suppose we have a dataset that we know comes from a distribution (e.g. binomial, geometric, or Poisson, etc.)
✺ What is the best estimate of the parameters (θ or θs)?
✺ Examples:
✺ For binomial and geometric distributions, θ = p (probability of success)
✺ For Poisson and exponential distributions, θ = λ (intensity)
✺ For normal distributions, θ could be μ or σ².
✺ Suppose we have data on the number of babies born each hour in a large hospital
✺ We can assume the data comes from a Poisson distribution
✺ What is your best estimate of the intensity λ?

hour:          1    2   ...   N
# of babies:  k_1  k_2  ...  k_N

Credit: David Varodayan
✺ We write the probability of seeing the data D given θ as the likelihood function L(θ) = P(D|θ)
✺ The likelihood function is not a probability distribution over θ
✺ The maximum likelihood estimate (MLE) of θ is the value of θ that maximizes L(θ):

\hat{\theta} = \operatorname{argmax}_{\theta} L(\theta)
✺ Suppose we have a coin with unknown probability of heads θ
✺ We toss it N times and observe k heads
✺ We know that this data comes from a binomial distribution
✺ What is the likelihood function? L(θ) = P(D|θ)

L(\theta) = \binom{N}{k} \theta^k (1 - \theta)^{N-k}

✺ Setting the derivative to zero:

\frac{d}{d\theta} L(\theta) = 0 \Rightarrow k\theta^{k-1}(1 - \theta)^{N-k} = \theta^k (N - k)(1 - \theta)^{N-k-1}

\Rightarrow k(1 - \theta) = (N - k)\theta \Rightarrow k - k\theta = N\theta - k\theta \Rightarrow \hat{\theta} = \frac{k}{N}
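✺ A quick numerical sanity check in R (N = 10 and k = 3 echo the caution example later in the lecture; both are illustrative):

  N <- 10; k <- 3
  L <- function(theta) dbinom(k, size = N, prob = theta)   # binomial likelihood
  optimize(L, interval = c(0, 1), maximum = TRUE)$maximum  # ~0.3 = k/N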
✺ Suppose we have a die with unknown probability θ of rolling a six
✺ We roll it and it comes up six for the first time on roll k
✺ We know that this data comes from a geometric distribution
✺ What is the likelihood function?

L(\theta) = P(D|\theta) = (1 - \theta)^{k-1}\theta

✺ Setting the derivative to zero:

(1 - \theta)^{k-1} = (k - 1)(1 - \theta)^{k-2}\theta \Rightarrow 1 - \theta = (k - 1)\theta \Rightarrow \hat{\theta} = \frac{1}{k}
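✺ The same numerical check works here; note that R's dgeom counts the k − 1 failures before the first success:

  k <- 4                                                   # illustrative: first six on roll 4
  L <- function(theta) dgeom(k - 1, prob = theta)          # geometric likelihood
  optimize(L, interval = c(0, 1), maximum = TRUE)$maximum  # ~0.25 = 1/k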
✺ If the dataset D = {x} comes from IID trials, each x_i is one observed result from an IID trial:

L(\theta) = P(D|\theta) = \prod_i P(x_i|\theta)

✺ Why is the above function defined by the product? (Because the trials are independent)
✺ The likelihood function is hard to differentiate in product form
✺ Clever trick: take the (natural) log
✺ Since log is a strictly increasing function, we can aim to maximize the log-likelihood instead
✺ The log-likelihood function is usually much easier to differentiate:

\log L(\theta) = \log P(D|\theta) = \log \prod_i P(x_i|\theta) = \sum_i \log P(x_i|\theta)
✺ Suppose we have data on the number of babies born each hour in a large hospital
✺ We can assume the data comes from a Poisson distribution with intensity λ
✺ What is the log-likelihood function?

hour:          1    2   ...   N
# of babies:  k_1  k_2  ...  k_N

\log L(\theta) = \sum_{i=1}^{N} (-\theta + k_i \log\theta - \log k_i!)

✺ Setting the derivative to zero:

\frac{d}{d\theta}\log L(\theta) = 0 \Rightarrow \sum_{i=1}^{N} \left(-1 + \frac{k_i}{\theta} - 0\right) = 0 \Rightarrow -N + \frac{\sum_{i=1}^{N} k_i}{\theta} = 0 \Rightarrow \hat{\theta} = \frac{\sum_{i=1}^{N} k_i}{N}
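✺ A quick check of this result in R, using the log-likelihood sum from above (the true λ = 2 and the sample size are illustrative assumptions):

  set.seed(361)                                           # illustrative seed
  k <- rpois(1000, lambda = 2)                            # synthetic hourly baby counts
  mean(k)                                                 # closed-form MLE: sum(k)/N, ~2
  negloglik <- function(theta) -sum(dpois(k, theta, log = TRUE))
  optimize(negloglik, interval = c(0.01, 10))$minimum     # numerical MLE, matches mean(k)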
✺ Suppose we model the dataset D = {x} as coming from a normal distribution
✺ What should be the likelihood function?
✺ The likelihood function of a normal distribution:

L(\theta) = \prod_{i=1}^{N} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)

✺ There are two parameters to estimate: μ and σ
✺ If we fix σ and set θ = μ, maximizing gives \hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i
✺ If we fix μ and set θ = σ, maximizing gives \hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2
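✺ A minimal sketch checking both closed-form results numerically (the data, with true μ = 10 and σ = 3, are simulated for illustration):

  set.seed(361)
  x <- rnorm(500, mean = 10, sd = 3)     # synthetic dataset
  mean(x)                                # MLE of mu
  sqrt(mean((x - mean(x))^2))            # MLE of sigma: divides by N, not N-1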
✺ Maximizing some likelihood or log-likelihood does not guarantee a trustworthy estimate
✺ If there are very few data items, the MLE can be misleading:
✺ If we observe 3 heads in 10 coin tosses, should we accept that p(heads) = 0.3?
✺ If we observe 0 heads in 2 coin tosses, should we accept that p(heads) = 0?
✺ An MLE parameter estimate depends on the data that was observed
✺ We can construct a confidence interval for \hat{\theta} using the parametric bootstrap
✺ Use the distribution with parameter \hat{\theta} to generate a large number of bootstrap samples
✺ From each "synthetic" dataset, re-estimate the parameter using MLE
✺ Use the histogram of these re-estimates to construct a confidence interval
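✺ A minimal sketch of the parametric bootstrap for a Poisson intensity in R (the observed data, the number of replicates B, and the 95% level are illustrative assumptions):

  set.seed(361)
  k <- rpois(50, lambda = 2)                                    # observed data (simulated here)
  lambda_hat <- mean(k)                                         # MLE of the intensity
  B <- 10000                                                    # number of bootstrap samples
  boot_est <- replicate(B, mean(rpois(length(k), lambda_hat)))  # re-estimate by MLE
  quantile(boot_est, c(0.025, 0.975))                           # 95% confidence interval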
✺ Are the average daily body temperatures of the two beavers the same?
✺ We need to model the difference between two sample means

[Figure: body temperature series of beaver 1 vs. beaver 2]

* Assume the daily temperatures at different times are independent.
✺ If X_1 \sim \text{normal}(\mu_1, \sigma_1^2) and X_2 \sim \text{normal}(\mu_2, \sigma_2^2), then**

X_1 + X_2 \sim \text{normal}(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)

✺ This agrees with the linearity of expected value:

E[X_1 + X_2] = E[X_1] + E[X_2]

✺ What about the difference? X_1 - X_2 \sim \; ?

** assuming X_1 and X_2 are independent
✺ By the linearity of expected value and the property of variance:

E[X_1 - X_2] = E[X_1] - E[X_2] = \mu_1 - \mu_2

var[X_1 - X_2] = var[X_1 + (-X_2)] = var[X_1] + var[-X_2] = var[X_1] + var[X_2] = \sigma_1^2 + \sigma_2^2

(using var[c \cdot X_2] = c^2 var[X_2] with c = -1)

✺ So X_1 - X_2 \sim \text{normal}(\mu_1 - \mu_2, \sigma_1^2 + \sigma_2^2)**

** assuming X_1 and X_2 are independent
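✺ A quick simulation check of these identities (all parameter values are illustrative):

  set.seed(361)
  x1 <- rnorm(1e5, mean = 1, sd = 2)   # mu1 = 1, sigma1^2 = 4
  x2 <- rnorm(1e5, mean = 3, sd = 4)   # mu2 = 3, sigma2^2 = 16
  mean(x1 - x2)                        # ~ -2 = mu1 - mu2
  var(x1 - x2)                         # ~ 20 = sigma1^2 + sigma2^2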
✺ Suppose we draw samples from two populations
✺ From a sample {x} of size k_x from the first, we get the sample mean mean({x})
✺ From a sample {y} of size k_y from the second, we get the sample mean mean({y})
✺ Recall the standard error is roughly the standard deviation of the distribution of the sample mean
✺ By the property of the variance of a difference, for D = mean({x}) - mean({y}):

var[D] \doteq \text{stderr}(\{x\})^2 + \text{stderr}(\{y\})^2

std[D] \doteq \sqrt{\frac{\text{stdunbiased}(\{x\})^2}{k_x} + \frac{\text{stdunbiased}(\{y\})^2}{k_y}} = \text{stderr}[D]

✺ Define the test statistic:

g = \frac{\text{mean}(\{x\}) - \text{mean}(\{y\})}{\text{stderr}(D)}

✺ If k_x ≥ 30 and k_y ≥ 30, g should come from a standard normal, and the fraction of "less extreme" samples is the integral of the standard normal density from -|g| to |g|
✺ It is conventional to report the p-value of a hypothesis test
✺ Since k_x, k_y ≥ 30, g should come from a standard normal
✺ Rejection region (2α). By convention: 2α = 0.05. That is: if p < 0.05, reject H0

p = 1 - f = 1 - \frac{1}{\sqrt{2\pi}} \int_{-|g|}^{|g|} \exp\left(-\frac{u^2}{2}\right) du
✺ k_x = 114 and k_y = 100
✺ mean({x}) = 36.86219
✺ mean({y}) = 37.5967
✺ stderr({x}) = stdunbiased({x}) / √114
✺ stderr({y}) = stdunbiased({y}) / √100
✺ stderr(D) = √(stderr({x})² + stderr({y})²)
✺ Hypothesis H0: the mean temperatures of the two beavers are the same
✺ The test statistic:

g = \frac{36.86219 - 37.5967}{0.04821181} = -15.235

✺ And the p-value for the test is:

p = 1 - \frac{1}{\sqrt{2\pi}} \int_{-15.235}^{15.235} \exp\left(-\frac{u^2}{2}\right) du \approx 0

✺ So we can reject the hypothesis that the mean temperatures are the same
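✺ These numbers can be reproduced in R from the built-in beaver1 and beaver2 datasets (treating the temp readings as independent, as the slides assume):

  x <- beaver1$temp; y <- beaver2$temp
  kx <- length(x); ky <- length(y)               # 114 and 100
  stderr_D <- sqrt(sd(x)^2 / kx + sd(y)^2 / ky)  # 0.04821181
  g <- (mean(x) - mean(y)) / stderr_D            # -15.235
  p <- 2 * pnorm(-abs(g))                        # essentially zero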
✺ There are general solutions for either N ≥ 30 or N < 30 if the data sets are random samples from normally distributed data.
✺ The difference between sample means can either be modeled as a t-distribution with degree of freedom (k_x + k_y - 2) when the population standard deviations are the same,
✺ or it can be approximated with a t-distribution with another proper degree of freedom.
✺ There are built-in t-test procedures in Python and R
✺ Hypothesis H0: the mean temperatures of the two beavers are the same
✺ p < 2.2e-16, so we again reject the hypothesis
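✺ The same comparison with R's built-in t-test procedure (Welch's test is R's default and corresponds to the "other proper degree of freedom" case; var.equal = TRUE gives the pooled test with k_x + k_y - 2 degrees of freedom):

  t.test(beaver1$temp, beaver2$temp)                    # Welch two-sample t-test, p < 2.2e-16
  t.test(beaver1$temp, beaver2$temp, var.equal = TRUE)  # pooled test, df = 114 + 100 - 2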