

SLIDE 1

Statistical Inference Review

Gonzalo Mateos

Dept. of ECE and Goergen Institute for Data Science
University of Rochester
gmateosb@ece.rochester.edu
http://www.ece.rochester.edu/~gmateosb/

January 20, 2020

Network Science Analytics Statistical Inference Review 1

SLIDE 2

Statistical inference and models

Statistical inference and models
Point estimates, confidence intervals and hypothesis tests
Tutorial on inference about a mean
Tutorial on linear regression inference
SLIDE 3

Probability and inference

[Diagram: probability theory maps a data-generating process to observed data; inference and data mining go in the reverse direction]

◮ Probability theory is a formalism to work with uncertainty
◮ Given a data-generating process, what are the properties of its outcomes?
◮ Statistical inference deals with the inverse problem
◮ Given outcomes, what can we say about the data-generating process?
SLIDE 4

Statistical inference

◮ Statistical inference refers to the process whereby

⇒ Given observations x = [x1, . . . , xn]T from X1, . . . , Xn ∼ F ⇒ We aim to extract information about the distribution F

◮ Ex: Infer a feature of F such as its mean ◮ Ex: Infer the CDF F itself, or the PDF f = F ′ ◮ Often observations are of the form (yi, xi), i = 1, . . . , n

⇒ Y is the response or outcome. X is the predictor or feature

◮ Q: Relationship between the random variables (RVs) Y and X? ◮ Ex: Learn E

  • Y
  • X = x
  • as a function of x

◮ Ex: Foretelling a yet-to-be observed value y∗ from the input X∗ = x∗

Network Science Analytics Statistical Inference Review 4

slide-5
SLIDE 5

Models

◮ A statistical model specifies a set F of CDFs to which F may belong
◮ A common parametric model is of the form F = {f(x; θ) : θ ∈ Θ}
  ⇒ Parameter(s) θ are unknown, taking values in the parameter space Θ
  ⇒ The space Θ has dim(Θ) < ∞, not growing with the sample size n
◮ Ex: Data come from a Gaussian distribution

  FN = { f(x; µ, σ) = (1/√(2πσ²)) e^(−(x−µ)²/(2σ²)) : µ ∈ R, σ > 0 }

  ⇒ A two-parameter model: θ = [µ, σ]T and Θ = R × R+
◮ A nonparametric model has dim(Θ) = ∞, or dim(Θ) grows with n
◮ Ex: FAll = {all CDFs F}
SLIDE 6

Models and inference tasks

◮ Given independent data x = [x1, . . . , xn]T from X1, . . . , Xn ∼ F
  ⇒ Statistical inference is often conducted in the context of a model
Ex: One-dimensional parametric estimation
◮ Suppose observations are Bernoulli distributed with parameter p
◮ The task is to estimate the parameter p (i.e., the mean)
Ex: Two-dimensional parametric estimation
◮ Suppose the PDF f ∈ FN, i.e., data are Gaussian distributed
◮ The problem is to estimate the parameters µ and σ
◮ May only care about µ, and treat σ as a nuisance parameter
Ex: Nonparametric estimation of the CDF
◮ The goal is to estimate F assuming only F ∈ FAll = {all CDFs F}
SLIDE 7

Regression models

◮ Suppose observations are from (Y1, X1), . . . , (Yn, Xn) ∼ FYX
  ⇒ Goal is to learn the relationship between the RVs Y and X
◮ A typical approach is to model the regression function

  r(x) := E[Y | X = x] = ∫_{−∞}^{∞} y fY|X(y|x) dy

  ⇒ Equivalent to the regression model Y = r(X) + ǫ, E[ǫ] = 0
◮ Ex: Parametric linear regression model

  r ∈ FLin = {r : r(x) = β0 + β1x}

◮ Ex: Nonparametric regression model, assuming only smoothness

  r ∈ FSob = { r : ∫_{−∞}^{∞} (r″(x))² dx < ∞ }
SLIDE 8

Regression, prediction and classification

◮ Given data (y1, x1), . . . , (yn, xn) from (Y1, X1), . . . , (Yn, Xn) ∼ FYX
◮ Ex: xi is the blood pressure of subject i, yi how long she lived
◮ Model the relationship between Y and X via r(x) = E[Y | X = x]
  ⇒ Q: What are classical inference tasks in this context?
Ex: Regression or curve fitting
◮ The problem is to estimate the regression function r ∈ F
Ex: Prediction
◮ The goal is to predict Y∗ for a new patient based on their X∗ = x∗
◮ If a regression estimate r̂ is available, can do y∗ := r̂(x∗)
Ex: Classification
◮ Suppose the RVs Yi are discrete, e.g., live or die encoded as ±1
◮ The prediction problem above is then termed classification
SLIDE 9

Fundamental concepts in inference

Statistical inference and models
Point estimates, confidence intervals and hypothesis tests
Tutorial on inference about a mean
Tutorial on linear regression inference
SLIDE 10

Point estimators

◮ Point estimation refers to making a single “best guess” about F
◮ Ex: Estimate the parameter β in a linear regression model

  FLin = {r : r(x) = βTx}

◮ Def: Given data x = [x1, . . . , xn]T from X1, . . . , Xn ∼ F, a point estimator θ̂ of a parameter θ is some function θ̂ = g(X1, . . . , Xn)
  ⇒ The estimator θ̂ is computed from the data, hence it is a RV
  ⇒ The distribution of θ̂ is called the sampling distribution
◮ The estimate is the specific value of θ̂ for the given data sample x
  ⇒ May write θ̂n to make explicit reference to the sample size
SLIDE 11

Bias, standard error and mean squared error

◮ Def: The bias of an estimator θ̂ is given by bias(θ̂) := E[θ̂] − θ
◮ Def: The standard error is the standard deviation of θ̂

  se = se(θ̂) := √(var[θ̂])

  ⇒ Often, se depends on the unknown F. Can form an estimate ŝe
◮ Def: The mean squared error (MSE) is a measure of the quality of θ̂

  MSE = E[(θ̂ − θ)²]

◮ Expected values are with respect to the data distribution

  f(x1, . . . , xn; θ) = ∏_{i=1}^n f(xi; θ)
SLIDE 12

The bias-variance decomposition of the MSE

Theorem
The MSE = E[(θ̂ − θ)²] can be written as MSE = bias²(θ̂) + var[θ̂]

Proof.
◮ Let θ̄ = E[θ̂]. Then

  E[(θ̂ − θ)²] = E[(θ̂ − θ̄ + θ̄ − θ)²]
              = E[(θ̂ − θ̄)²] + 2(θ̄ − θ)E[θ̂ − θ̄] + (θ̄ − θ)²
              = var[θ̂] + bias²(θ̂)

◮ The last equality follows since E[θ̂ − θ̄] = E[θ̂] − θ̄ = 0
SLIDE 13

Desirable properties of point estimators

◮ Q: Desiderata for an estimator θ̂ of the parameter θ?
◮ Def: An estimator is unbiased if bias(θ̂) = 0, i.e., if E[θ̂] = θ
  ⇒ An unbiased estimator is “on target” on average
◮ Def: An estimator is consistent if θ̂n →p θ, i.e., for any ǫ > 0

  lim_{n→∞} P(|θ̂n − θ| < ǫ) = 1

  ⇒ A consistent estimator converges to θ as we collect more data
◮ Def: An unbiased estimator is asymptotically Normal if

  lim_{n→∞} P((θ̂n − θ)/se ≤ x) = (1/√(2π)) ∫_{−∞}^{x} e^(−u²/2) du

  ⇒ Equivalently, for large enough sample size, θ̂n ∼ N(θ, se²)
SLIDE 14

Coin tossing example

Ex: Consider tossing the same coin n times and record the outcomes
◮ Model observations as X1, . . . , Xn ∼ Ber(p). Estimate of p?
◮ A natural choice is the sample mean estimator

  p̂ = (1/n) Σ_{i=1}^n Xi

◮ Recall that for X ∼ Ber(p), E[X] = p and var[X] = p(1 − p)
◮ The estimator p̂ is unbiased, since

  E[p̂] = E[(1/n) Σ_{i=1}^n Xi] = (1/n) Σ_{i=1}^n E[Xi] = p

  ⇒ Also used that the expected value is a linear operator
SLIDE 15

Coin tossing example (continued)

◮ The standard error is

  se = √(var[(1/n) Σ_{i=1}^n Xi]) = √((1/n²) Σ_{i=1}^n var[Xi]) = √(p(1 − p)/n)

  ⇒ Unknown p. The estimated standard error is ŝe = √(p̂(1 − p̂)/n)
◮ Since p̂n is unbiased, MSE = E[(p̂n − p)²] = p(1 − p)/n → 0
◮ Thus p̂n converges in the mean square sense, hence also p̂n →p p
◮ This establishes that p̂ is a consistent estimator of the parameter p
◮ Also, p̂ is asymptotically Normal by the Central Limit Theorem
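These properties are easy to check numerically. Below is a minimal simulation sketch (not part of the slides; it assumes NumPy is available) that draws many Bernoulli samples of size n and compares the empirical bias and spread of p̂ against the theoretical values 0 and √(p(1 − p)/n):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, trials = 0.3, 1000, 2000

# Each row is one sample of n coin tosses from Ber(p)
X = rng.binomial(1, p, size=(trials, n))
p_hat = X.mean(axis=1)                  # sample mean estimator, one per trial

bias = p_hat.mean() - p                 # should be close to 0 (unbiasedness)
se_empirical = p_hat.std()              # spread of the sampling distribution
se_theory = np.sqrt(p * (1 - p) / n)    # se = sqrt(p(1 - p)/n)

print(round(bias, 4), round(se_empirical, 4), round(se_theory, 4))
```

The empirical standard deviation of p̂ across trials should closely match the theoretical standard error.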
SLIDE 16

Confidence intervals

◮ Set estimates specify regions of Θ where θ is likely to lie
◮ Def: Given i.i.d. data X1, . . . , Xn ∼ F, a 1 − α confidence interval for a parameter θ is an interval Cn = (a, b), where a = a(X1, . . . , Xn) and b = b(X1, . . . , Xn) are functions of the data such that

  P(θ ∈ Cn) = 1 − α, for all θ ∈ Θ

  ⇒ In words, Cn = (a, b) traps θ with probability 1 − α
  ⇒ The interval Cn is computed from the data, hence it is random
◮ We call 1 − α the coverage of the confidence interval
◮ Ex: It is common to report 95% confidence intervals, i.e., α = 0.05
SLIDE 17

Aside on the standard Normal distribution

◮ Let X be a standard Normal RV, i.e., X ∼ N(0, 1), with CDF Φ(x)

  Φ(x) = P(X ≤ x) = (1/√(2π)) ∫_{−∞}^{x} e^(−u²/2) du

[Figure: standard Normal density with tail mass α/2 beyond ±zα/2 and mass 1 − α in between]

◮ Define zα/2 = Φ⁻¹(1 − α/2), i.e., the value such that

  P(X > zα/2) = α/2 and P(−zα/2 < X < zα/2) = 1 − α
SLIDE 18

Normal-based confidence intervals

◮ Nice point estimators θ̂n are Normal as n → ∞, i.e., θ̂n ∼ N(θ, ŝe²)
  ⇒ A useful property in constructing confidence intervals for θ

Theorem
Suppose that θ̂n ∼ N(θ, ŝe²) as n → ∞. Let Φ be the CDF of a standard Normal and define zα/2 = Φ⁻¹(1 − α/2). Consider the interval Cn = (θ̂n − zα/2 ŝe, θ̂n + zα/2 ŝe). Then

  P(θ ∈ Cn) → 1 − α, as n → ∞

◮ These intervals only have approximately (large n) correct coverage
SLIDE 19

Proof

Proof.
◮ Consider the normalized (centered and scaled) RV

  Xn = (θ̂n − θ)/ŝe

◮ By assumption Xn → X ∼ N(0, 1) as n → ∞. Hence,

  P(θ ∈ Cn) = P(θ̂n − zα/2 ŝe < θ < θ̂n + zα/2 ŝe)
            = P(−zα/2 < (θ̂n − θ)/ŝe < zα/2)
            → P(−zα/2 < X < zα/2) = 1 − α

◮ The last equality follows by definition of zα/2
SLIDE 20

Coin tossing example (encore)

Ex: Given observations X1, . . . , Xn ∼ Ber(p). Estimate of p?
◮ We studied properties of the sample mean estimator

  p̂ = (1/n) Σ_{i=1}^n Xi

◮ By the Central Limit Theorem, it follows that

  p̂ ∼ N(p, p̂(1 − p̂)/n) as n → ∞

◮ Therefore, an approximate 1 − α confidence interval for p is

  Cn = (p̂ − zα/2 √(p̂(1 − p̂)/n), p̂ + zα/2 √(p̂(1 − p̂)/n))
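A short simulation (again an illustration, not from the slides; NumPy assumed) can verify that this interval has approximately the advertised coverage: the fraction of trials in which Cn traps the true p should be close to 1 − α:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, trials = 0.3, 500, 2000
z = 1.959964  # z_{alpha/2} for alpha = 0.05

X = rng.binomial(1, p, size=(trials, n))
p_hat = X.mean(axis=1)
se_hat = np.sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error

# Fraction of intervals (p_hat - z*se_hat, p_hat + z*se_hat) that trap p
covered = (p_hat - z * se_hat < p) & (p < p_hat + z * se_hat)
coverage = covered.mean()
print(round(coverage, 3))
```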
SLIDE 21

Hypothesis testing

◮ In hypothesis testing we start with some default theory
◮ Ex: The data come from a zero-mean Gaussian distribution
◮ Q: Do the data provide sufficient evidence to reject the theory?
◮ The hypothesized theory is called the null hypothesis, written as H0
  ⇒ Specify also an alternative hypothesis to the null, H1
◮ Formally, given i.i.d. data x = [x1, . . . , xn]T from X1, . . . , Xn ∼ F:
  (i) Form a test statistic T(x), i.e., a function of the data
  (ii) Define a rejection region R of the form R = {x : T(x) > c}
◮ If the data x ∈ R we reject H0; otherwise we retain (do not reject) H0
◮ The problem is to select the test statistic T and the critical value c
SLIDE 22

Testing if a coin is fair

Ex: Consider tossing the same coin n times and record the outcomes
◮ Model observations as X1, . . . , Xn ∼ Ber(p). Is the coin fair?
◮ Let H0 be the hypothesis that the coin is fair, and H1 the alternative
  ⇒ Can write the hypotheses as H0 : p = 1/2 versus H1 : p ≠ 1/2
◮ Consider the test statistic given by

  T(X1, . . . , Xn) = |p̂n − 1/2| = |(1/n) Σ_{i=1}^n Xi − 1/2|

◮ It seems reasonable to reject H0 if (X1, . . . , Xn) ∈ R, where

  R = {(X1, . . . , Xn) : T(X1, . . . , Xn) > c}

◮ Will soon see this is a Wald test, hence c = zα/2 ŝe. More later
SLIDE 23

Tutorial on inference about a mean

Statistical inference and models
Point estimates, confidence intervals and hypothesis tests
Tutorial on inference about a mean
Tutorial on linear regression inference
SLIDE 24

Inference about a mean

◮ Consider a sample of n i.i.d. observations X1, . . . , Xn ∼ F
◮ Q: How can we perform inference about the mean µ = E[X1]?
  ⇒ A practical and canonical problem in statistical inference
◮ A natural estimator of µ is the sample mean estimator

  µ̂n = (1/n) Σ_{i=1}^n Xi

  ⇒ Well motivated, since by the strong law of large numbers

  lim_{n→∞} µ̂n = µ almost surely

◮ It is a simple example of a method of moments estimator (MME). . .
◮ . . . and also a maximum likelihood estimator (MLE)
SLIDE 25

Moments and sample moments

◮ In parametric inference we wish to estimate θ ∈ Θ ⊆ Rp in

  F = {f(x; θ) : θ ∈ Θ}

◮ For 1 ≤ j ≤ p, define the j-th moment of X ∼ F as

  αj ≡ αj(θ) = E[X^j] = ∫_{−∞}^{∞} x^j f(x; θ) dx

◮ Likewise, the j-th sample moment is an estimate of αj, namely

  α̂j = (1/n) Σ_{i=1}^n Xi^j

  ⇒ The j-th moment αj(θ) depends on the unknown θ
  ⇒ But α̂j does not; it is a function of the data only
SLIDE 26

Method of moments estimator

◮ A first method for parametric estimation is the method of moments
  ⇒ MMEs are not optimal, yet typically easy to compute
◮ Def: The method of moments estimator (MME) θ̂n is the solution to

  α1(θ̂n) = α̂1
  α2(θ̂n) = α̂2
  . . .
  αp(θ̂n) = α̂p

  ⇒ This is a system of p (nonlinear) equations with p unknowns
◮ Ex: Back to estimating a mean µ, p = 1 and µ = θ = α1(θ), so

  µ̂n^MM = α̂1 = (1/n) Σ_{i=1}^n Xi
SLIDE 27

Example: Gaussian data model

Ex: Suppose now X1, . . . , Xn ∼ N(µ, σ²), i.e., the model is F ∈ FN
◮ Q: What is the MME of the parameter vector θ = [µ, σ²]T?
◮ The first p = 2 moments are given by

  α1(θ) = E[X1] = µ,  α2(θ) = E[X1²] = σ² + µ²

◮ The MME θ̂n is the solution to the following system of equations

  µ̂n = (1/n) Σ_{i=1}^n Xi
  σ̂n² + µ̂n² = (1/n) Σ_{i=1}^n Xi²

◮ The solution is

  µ̂n = (1/n) Σ_{i=1}^n Xi,  σ̂n² = (1/n) Σ_{i=1}^n (Xi − µ̂n)²
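Matching the first two sample moments takes only a few lines of code. The sketch below (an illustration, not from the slides; NumPy assumed) recovers µ and σ² from simulated Gaussian data:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n = 1.5, 2.0, 100_000
X = rng.normal(mu, sigma, size=n)

a1 = X.mean()          # first sample moment, estimates alpha_1 = mu
a2 = (X ** 2).mean()   # second sample moment, estimates alpha_2 = sigma^2 + mu^2

mu_mm = a1
var_mm = a2 - a1 ** 2  # algebraically equals (1/n) * sum (X_i - mu_mm)^2

print(round(mu_mm, 2), round(var_mm, 2))
```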
SLIDE 28

Maximum likelihood estimator

◮ Often “the” method for parametric estimation is maximum likelihood
◮ Consider i.i.d. data X1, . . . , Xn from a PDF f(x; θ)
◮ The likelihood function Ln(θ) : Θ → R+ is defined by

  Ln(θ) := ∏_{i=1}^n f(Xi; θ)

  ⇒ Ln(θ) is the joint PDF of the data, treated as a function of θ
  ⇒ The log-likelihood function is ℓn(θ) := log Ln(θ)
◮ Def: The maximum likelihood estimator (MLE) θ̂n is given by

  θ̂n = arg maxθ Ln(θ)

◮ Very useful: The maximizer of Ln(θ) coincides with that of ℓn(θ)
SLIDE 29

Example: Bernoulli data model

◮ Suppose X1, . . . , Xn ∼ Ber(p). MLE of µ = p?
  ⇒ The data PMF is f(x; p) = p^x (1 − p)^(1−x), x ∈ {0, 1}
◮ The likelihood function is (define Sn = Σ_{i=1}^n Xi)

  Ln(p) = ∏_{i=1}^n f(Xi; p) = ∏_{i=1}^n p^Xi (1 − p)^(1−Xi) = p^Sn (1 − p)^(n−Sn)

  ⇒ The log-likelihood is ℓn(p) = Sn log(p) + (n − Sn) log(1 − p)
◮ The MLE p̂n is the solution to the equation

  ∂ℓn(p)/∂p |_(p = p̂n) = Sn/p̂n − (n − Sn)/(1 − p̂n) = 0

◮ The solution is

  µ̂n^ML = p̂n = Sn/n = (1/n) Σ_{i=1}^n Xi
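To see that the closed form really is the maximizer, one can evaluate ℓn(p) on a fine grid and maximize numerically; the argmax should agree with the sample mean. A sketch (not from the slides; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
X = rng.binomial(1, 0.7, size=n)
Sn = X.sum()

# Log-likelihood l_n(p) = Sn log p + (n - Sn) log(1 - p), on a fine grid
p_grid = np.linspace(1e-3, 1 - 1e-3, 100_000)
loglik = Sn * np.log(p_grid) + (n - Sn) * np.log(1 - p_grid)
p_mle = p_grid[np.argmax(loglik)]

print(round(p_mle, 3), round(X.mean(), 3))
```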
SLIDE 30

Example: Gaussian data model

◮ Suppose X1, . . . , Xn ∼ N(µ, 1). MLE of µ?
  ⇒ The data PDF is f(x; µ) = (1/√(2π)) exp(−(x − µ)²/2), x ∈ R
◮ The likelihood function is (up to constants independent of µ)

  Ln(µ) = ∏_{i=1}^n f(Xi; µ) ∝ exp(−Σ_{i=1}^n (Xi − µ)²/2)

  ⇒ The log-likelihood is ℓn(µ) ∝ −Σ_{i=1}^n (Xi − µ)²
◮ The MLE µ̂n is the solution to the equation

  ∂ℓn(µ)/∂µ |_(µ = µ̂n) = 2 Σ_{i=1}^n (Xi − µ̂n) = 0

◮ The solution is, once more, the sample mean estimator

  µ̂n^ML = (1/n) Σ_{i=1}^n Xi
SLIDE 31

Properties of the MLE

◮ MLEs have desirable properties under loose conditions on f(x; θ)
  P1) Consistency: θ̂n →p θ as the sample size n increases
  P2) Equivariance: If θ̂n is the MLE of θ, then g(θ̂n) is the MLE of g(θ)
  P3) Asymptotic Normality: For large n, one has θ̂n ∼ N(θ, ŝe²)
  P4) Efficiency: For large n, θ̂n attains the Cramér-Rao lower bound
◮ Efficiency means no other unbiased estimator has smaller variance
◮ Ex: Can use the MLE to create a confidence interval for µ, i.e.,

  Cn = (µ̂n^ML − zα/2 ŝe, µ̂n^ML + zα/2 ŝe)

  ⇒ By asymptotic Normality, P(µ ∈ Cn) ≈ 1 − α for large n
  ⇒ For the N(µ, 1) model, µ̂n^ML ± zα/2/√n has exact coverage
SLIDE 32

The Wald test

◮ Consider the following hypothesis test regarding the mean µ

  H0 : µ = µ0 versus H1 : µ ≠ µ0

◮ Let µ̂n be the sample mean, with estimated standard error ŝe
◮ Def: Given α ∈ (0, 1), the Wald test rejects H0 when

  T(X1, . . . , Xn) := |(µ̂n − µ0)/ŝe| > zα/2

◮ If H0 is true, (µ̂n − µ0)/ŝe ∼ N(0, 1) by the Central Limit Theorem
  ⇒ The probability of incorrectly rejecting H0 is no more than α
◮ The value of α is called the significance level of the test
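Tying this back to the coin-fairness example: the sketch below (illustrative, not from the slides; NumPy assumed) tosses a coin with true p = 0.6 and applies the Wald test of H0 : p = 1/2 at level α = 0.05:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
z = 1.959964  # z_{alpha/2} for alpha = 0.05

X = rng.binomial(1, 0.6, size=n)   # a biased coin, so H0: p = 1/2 is false
p_hat = X.mean()
se_hat = np.sqrt(p_hat * (1 - p_hat) / n)

T = abs(p_hat - 0.5) / se_hat      # Wald statistic
reject = bool(T > z)
print(reject)
```

With n = 1000 and a true p of 0.6, the test rejects H0 with overwhelming probability.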
SLIDE 33

The p-value

◮ Reporting “reject H0” or “retain H0” is not too informative
  ⇒ Could ask, for each α, whether the test rejects at that level
◮ Let Tobs := T(x) be the test statistic value for the observed sample

[Figure: null distribution of T(X) with two-sided tail areas of mass p/2 beyond ±Tobs]

◮ The probability p := PH0(|T(X)| ≥ Tobs) is called the p-value
  ⇒ It is the smallest level at which we would reject H0
◮ A small p-value (< 0.05) indicates reduced evidence supporting H0
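For the Wald test the two-sided p-value has a closed form, p = 2(1 − Φ(|Tobs|)). A self-contained sketch using only the standard library (the helper names are my own, not from the slides):

```python
import math

def phi(x):
    """Standard Normal CDF Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def wald_p_value(t_obs):
    """Two-sided p-value 2 * (1 - Phi(|T_obs|)) for a Wald statistic."""
    return 2.0 * (1.0 - phi(abs(t_obs)))

# A statistic right at the 5% critical value yields a p-value of 0.05
p = wald_p_value(1.959964)
print(round(p, 3))  # prints 0.05
```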
SLIDE 34

Bayesian inference

◮ Methods discussed so far are termed frequentist, where:
  F1: Probability refers to limiting relative frequencies
  F2: Parameters are fixed, unknown constants
  F3: Statistical procedures offer guarantees on long-run performance
◮ Alternatively, Bayesian inference is based on these postulates:
  B1: Probability describes degree of belief, not limiting frequency
  B2: We can make probability statements about parameters
  B3: A probability distribution for θ is produced to make inferences
◮ Controversial? It inherently embraces a subjective notion of probability
◮ Bayesian methods do not offer long-run performance guarantees
◮ Very useful to combine prior beliefs with data in a principled way
SLIDE 35

The Bayesian method

◮ Bayesian inference is usually carried out in the following way
Step 1: Choose a probability density f(θ), called the prior distribution
◮ The prior expresses our beliefs about θ, before seeing any data
Step 2: Choose a statistical model f(x | θ) (compare with f(x; θ))
◮ Reflects our beliefs about the data-generating process, i.e., X given θ
Step 3: Given data X = [X1, . . . , Xn]T, update our beliefs and calculate the posterior distribution f(θ | X) using Bayes’ rule

  f(θ | X) ∝ ∏_{i=1}^n f(Xi | θ) f(θ) = Ln(θ)f(θ)

  ⇒ Point estimates and confidence intervals are obtained from f(θ | X)
◮ Ex: A maximum a posteriori (MAP) estimator θ̂n = arg maxθ f(θ | X)
SLIDE 36

Example: Gaussian data model and prior

◮ Consider X1, . . . , Xn ∼ N(θ, σ²), where the variance σ² is known
  ⇒ To estimate the mean θ we adopt the prior θ ∼ N(a, b²)
◮ Using Bayes’ rule, can show the posterior is also Gaussian, where

  θ̂n^MAP = E[θ | X] = w (1/n) Σ_{i=1}^n Xi + (1 − w)a, with w = se⁻²/(se⁻² + b⁻²)

  ⇒ A weighted average of the sample mean θ̂n^ML and the prior mean a
  ⇒ Here, se = σ/√n is the standard error of the sample mean
◮ Asymptotics: Note that w → 1 as the sample size n → ∞
  ⇒ For large n the posterior is approximately N(θ̂n^ML, se²)
  ⇒ The same holds if n is fixed but b → ∞, i.e., the prior is uninformative
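The weighted-average structure of the MAP estimator is easy to see in code. The sketch below (illustrative; NumPy assumed) computes w and checks that the estimate is shrunk from the sample mean toward the prior mean a = 0:

```python
import numpy as np

rng = np.random.default_rng(5)
theta_true, sigma, n = 2.0, 1.0, 50
a, b = 0.0, 0.5                      # prior: theta ~ N(a, b^2)

X = rng.normal(theta_true, sigma, size=n)
x_bar = X.mean()

se2 = sigma ** 2 / n                 # se^2 = sigma^2 / n
w = (1 / se2) / (1 / se2 + 1 / b ** 2)
theta_map = w * x_bar + (1 - w) * a  # posterior mean = MAP estimate

# Shrinkage: the MAP estimate lies between the prior mean and the sample mean
print(round(x_bar, 3), round(theta_map, 3))
```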
SLIDE 37

Tutorial on linear regression inference

Statistical inference and models
Point estimates, confidence intervals and hypothesis tests
Tutorial on inference about a mean
Tutorial on linear regression inference
SLIDE 38

Linear regression

◮ Suppose observations are from (Y1, X1), . . . , (Yn, Xn) ∼ FYX
  ⇒ Goal is to learn the relationship between the RVs Y and X
◮ A workhorse approach is to model the regression function

  r(x) = E[Y | X = x] = ∫_{−∞}^{∞} y fY|X(y|x) dy

◮ The simple linear regression model specifies that, given Xi = xi,

  yi = β0 + β1xi + ǫi,  i = 1, . . . , n

◮ The yi’s are modeled as noisy samples of the line r(x) = β0 + β1x
◮ Errors ǫi are i.i.d., with E[ǫi | Xi = xi] = 0 and var[ǫi | Xi = xi] = σ²
◮ With the linear model, regression amounts to parametric inference

  r̂(x) ⇔ [β̂0, β̂1]T
SLIDE 39

Multiple linear regression

◮ More generally, suppose we observe data (y1, x1), . . . , (yn, xn)
  ⇒ Each input xi = [xi1, . . . , xip]T is a p × 1 feature vector
◮ The multiple linear regression model specifies

  yi = Σ_{j=1}^p xijβj + ǫi = βTxi + ǫi,  i = 1, . . . , n

◮ Typically xi1 = 1 for all i, providing an intercept term
◮ Errors ǫi are i.i.d., with E[ǫi | Xi = xi] = 0 and var[ǫi | Xi = xi] = σ²
◮ Can be compactly represented as y = Xβ + ǫ, defining

  y = [y1, . . . , yn]T,  X = [xij] (the n × p matrix with rows xiT),  β = [β1, . . . , βp]T,  ǫ = [ǫ1, . . . , ǫn]T
SLIDE 40

Least-squares estimator

◮ A sound estimate β̂ minimizes the residual sum of squares (RSS)

  RSS(β) = Σ_{i=1}^n (yi − βTxi)² = ‖y − Xβ‖²

  ⇒ Residuals are the distances from the yi to the hyperplane r(x) = βTx
◮ Def: The least-squares estimator (LSE) β̂n is the solution to

  β̂n = arg minβ RSS(β)

◮ Carrying out the optimization yields the LSE

  β̂n = (XTX)⁻¹XTy

  ⇒ Only defined if XTX is invertible ⇔ X has full column rank p
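The closed form can be sanity-checked against a library least-squares solver. A sketch (illustrative; NumPy assumed) that fits a small multiple linear regression both ways:

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 200, 3
beta_true = np.array([1.0, -2.0, 0.5])

# Design matrix with an intercept column x_{i1} = 1
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Normal equations: solve (X^T X) beta = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The library least-squares routine gives the same answer
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))
```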
SLIDE 41

Geometry of the LSE

◮ In least squares we seek the vector ŷ = Xβ̂ ∈ span(X) closest to y

[Figure: y, its orthogonal projection ŷ = Xβ̂ onto span(X), and the residual y − ŷ]

◮ Solution: orthogonal projection of y onto span(X), i.e., with X = UΣVT,

  ŷ = PX(y) = X(XTX)⁻¹XTy = UUTy

◮ The residual y − ŷ lies in the orthogonal complement (span(X))⊥
  ⇒ This way RSS(β̂) = ‖y − ŷ‖² is minimized
SLIDE 42

Properties of the LSE

◮ The LSE β̂n = (XTX)⁻¹XTy is a linear combination of the random y
  P1) Unbiasedness: E[β̂n | X] = β, with var[β̂n | X] = σ²(XTX)⁻¹
  P2) Consistency: β̂n →p β as the sample size n increases
  P3) Asymptotic Normality: For large n, one has β̂n ∼ N(β, σ²(XTX)⁻¹)
  P4) If errors ǫ ∼ N(0, σ²I), then β̂n ∼ N(β, σ²(XTX)⁻¹) exactly; and
      Efficiency: No other unbiased estimator of β has smaller variance
◮ Ex: Can use the LSE to create confidence intervals for each βj, i.e.,

  Cn = (β̂j − zα/2 ŝe(β̂j), β̂j + zα/2 ŝe(β̂j))

  ⇒ By asymptotic (or exact) Normality, P(βj ∈ Cn) ≈ 1 − α
  ⇒ Note that ŝe(β̂j) = σ̂ √([(XTX)⁻¹]jj), where σ̂² = RSS(β̂)/(n − p)
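Putting the pieces together, the sketch below (illustrative; NumPy assumed) computes σ̂² = RSS/(n − p), the per-coefficient standard errors, and the corresponding 95% confidence intervals:

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 500, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.3, 1.2])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

rss = np.sum((y - X @ beta_hat) ** 2)
sigma2_hat = rss / (n - p)                        # hat sigma^2 = RSS / (n - p)
se_hat = np.sqrt(sigma2_hat * np.diag(XtX_inv))   # se(beta_j) per coefficient

z = 1.959964                                      # 95% interval
lower, upper = beta_hat - z * se_hat, beta_hat + z * se_hat
print(np.round(lower, 3), np.round(upper, 3))
```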
SLIDE 43

Hypothesis testing and prediction

Ex: Consider the hypothesis test regarding the parameter βj

  H0 : βj = βj^(0) versus H1 : βj ≠ βj^(0)

◮ By asymptotic (or exact) Normality of the LSE, an α-level test is

  Reject H0 if Tj := |(β̂j − βj^(0))/ŝe(β̂j)| > zα/2

Ex: Can predict an unobserved value Y∗ = y∗ from a given x∗ via

  y∗ = x∗Tβ̂

◮ May define a notion of standard error for y∗, and predictive intervals
  ⇒ These should account for the variability in estimating β and in ǫ∗
SLIDE 44

The LSE as a MLE

◮ Suppose that, conditioned on Xi = xi, the errors ǫi are i.i.d. Normal
  ⇒ The conditional PDF is f(ǫi | xi) = (1/√(2πσ²)) exp(−ǫi²/(2σ²))
◮ Assume σ² is known. The (conditional) likelihood function is

  Ln(β) = ∏_{i=1}^n f(yi | xi; β) ∝ exp(−Σ_{i=1}^n (yi − βTxi)²/(2σ²))

  ⇒ The log-likelihood is ℓn(β) ∝ −RSS(β)
◮ The MLE β̂n^ML maximizes the log-likelihood function, thus

  β̂n^ML = arg maxβ ℓn(β) = arg minβ RSS(β) = β̂n^LS

◮ Take-home: Under a linear-Gaussian model the LSE is also a MLE
SLIDE 45

MAP with Gaussian data model and prior

◮ Consider again Gaussian errors, i.e., f(ǫi | xi) = (1/√(2πσ²)) exp(−ǫi²/(2σ²))
  ⇒ Adopt a Gaussian prior to model the parameters: β ∼ N(0, τ²I)
  ⇒ Variances σ² and τ² are assumed known. Define λ := (σ/τ)²
◮ Bayesian approach: the posterior Fβ|Y,X is Gaussian, with log-density

  log f(β | Y, X) ∝ −Σ_{i=1}^n (yi − βTxi)² − λ Σ_{j=1}^p βj²

◮ The MAP estimator β̂n^MAP := arg maxβ f(β | Y, X) is thus the solution to

  β̂n^MAP = arg minβ RSS(β) + λ‖β‖₂²

◮ Carrying out the optimization yields β̂n^MAP = (XTX + λI)⁻¹XTy
  ⇒ Recover the LSE as λ → 0 ⇔ uninformative prior when τ² → ∞
SLIDE 46

Ridge regression

◮ The non-Bayesian, ℓ2-norm penalized LSE is also known as ridge regression

  β̂^ridge = arg minβ RSS(β) + λ‖β‖₂²

◮ For λ > 0, the ridge estimator is β̂^ridge = (XTX + λI)⁻¹XTy
◮ It differs from the LSE β̂^LS := arg minβ RSS(β):
  ◮ It is biased, and bias(β̂^ridge) increases with λ
  ◮ It is well defined even when X is not of full rank
  ◮ In exchange for bias, it can reduce variance below var[β̂^LS]
◮ Ex: var[β̂^LS] is large when X is nearly rank-deficient and (XTX)⁻¹ is unstable
◮ From the bias-variance MSE decomposition, a fruitful tradeoff may yield

  MSE(β̂^ridge) < MSE(β̂^LS)

  ⇒ The tradeoff depends on λ, chosen subjectively or via cross-validation
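The ridge closed form is one line of linear algebra. The sketch below (illustrative; NumPy assumed) shows that λ = 0 recovers the LSE and that a positive λ shrinks the coefficient vector toward zero:

```python
import numpy as np

rng = np.random.default_rng(8)
n, p = 100, 5
X = rng.normal(size=(n, p))
beta_true = rng.normal(size=p)
y = X @ beta_true + rng.normal(scale=0.5, size=n)

def ridge(X, y, lam):
    """Closed-form ridge estimator (X^T X + lam * I)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

beta_ls = ridge(X, y, 0.0)      # lam = 0 recovers the least-squares estimate
beta_ridge = ridge(X, y, 10.0)  # lam > 0 shrinks the coefficients

print(np.linalg.norm(beta_ridge) < np.linalg.norm(beta_ls))
```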
SLIDE 47

Complexity-penalized LSE

◮ Ridge is an instance of the general class of complexity-penalized LSEs

  β̂^J = arg minβ RSS(β) + λJ(β)

◮ The function J(·) penalizes (i.e., constrains) the parameters in β
◮ A constrained parameter space Θ effects ‘less complex’ models
◮ Tuning λ balances goodness-of-fit and model complexity
◮ Ex: the ℓ1-norm penalized LSE for sparsity, i.e., variable selection
SLIDE 48

Glossary

◮ Statistical inference
◮ Outcome or response
◮ Predictor, feature or regressor
◮ (Non)parametric model
◮ Nuisance parameter
◮ Regression function
◮ Prediction
◮ Classification
◮ Point and set estimation
◮ Estimator and estimate
◮ Standard error
◮ Consistent estimator
◮ Confidence interval
◮ Hypothesis test
◮ Null hypothesis
◮ Test statistic and critical value
◮ Method of moments estimator
◮ Maximum likelihood estimator
◮ Likelihood function
◮ Significance level and p-value
◮ Prior and posterior distribution
◮ Multiple linear regression
◮ Least-squares estimator