II.2 Statistical Inference: Sampling and Estimation


SLIDE 1

A statistical model M is a set of distributions (or regression functions), e.g., all uni-modal, smooth distributions. M is called a parametric model if it can be completely described by a finite number of parameters, e.g., the family of Normal distributions with parameters μ and σ:

M = { f(x; μ, σ) = (1 / (√(2π) σ)) e^{−(x−μ)² / (2σ²)}  |  μ ∈ ℝ, σ > 0 }

October 25, 2011 II.1 IR&DM, WS'11/12

SLIDE 2

Statistical Inference

Given a parametric model M and a sample X1,...,Xn, how do we infer (learn) the parameters of M? For multivariate models with observed variable X and "outcome (response)" variable Y, this is called prediction or regression; for a discrete outcome variable it is also called classification. r(x) = E[Y | X=x] is called the regression function.


SLIDE 3

Idea of Sampling

Distribution X (e.g., a population, objects of interest)

Samples X1,…,Xn drawn from X (e.g., people, objects)

Statistical Inference

What can we say about X based on X1,…,Xn?

Example:

Suppose we want to estimate the average salary of employees in German companies.
Sample 1: we look at the n=200 top-paid CEOs of major banks.
Sample 2: we look at n=100 employees drawn across all kinds of companies.

Distribution parameter vs. sample parameter:
  mean      μ_X    ↔  sample mean      X̄
  variance  σ²_X   ↔  sample variance  S²_X
  size      N      ↔  sample size      n

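The contrast between the two samples can be simulated. A minimal sketch with a synthetic salary population (all numbers hypothetical): the biased sample of top earners overshoots badly, while a small random sample lands near the true mean.

```python
import random

random.seed(42)

# Hypothetical population: 100,000 salaries, log-normally spread,
# with a small group of very highly paid people at the top.
population = [random.lognormvariate(10.5, 0.5) for _ in range(100_000)]
population.sort()

true_mean = sum(population) / len(population)

# Sample 1: only the 200 top-paid people (like sampling bank CEOs).
biased_sample = population[-200:]
biased_mean = sum(biased_sample) / len(biased_sample)

# Sample 2: 100 people drawn uniformly at random from the population.
random_sample = random.sample(population, 100)
random_mean = sum(random_sample) / len(random_sample)

# The biased sample grossly overestimates; the random one is close.
print(f"true mean:   {true_mean:10.0f}")
print(f"biased mean: {biased_mean:10.0f}")
print(f"random mean: {random_mean:10.0f}")
```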

SLIDE 4

Basic Types of Statistical Inference

Given a set of iid. samples X1,...,Xn ~ X of an unknown distribution X,
e.g.: n single-coin-toss experiments X1,...,Xn ~ X: Bernoulli(p)

  • Parameter Estimation

e.g.: - what is the parameter p of X: Bernoulli(p) ?

  • what is E[X], the cdf FX of X, the pdf fX of X, etc.?
  • Confidence Intervals

e.g.: give me all values C=(a,b) such that P(p ∈ C) ≥ 0.95

where a and b are derived from samples X1,...,Xn

  • Hypothesis Testing

e.g.: H0 : p = 1/2 vs. H1 : p ≠ 1/2


SLIDE 5

Statistical Estimators

A point estimator for a parameter θ of a prob. distribution X is a random variable derived from an iid. sample X1,...,Xn.

Examples:
Sample mean:     X̄ := (1/n) Σ_{i=1}^{n} X_i
Sample variance: S²_X := (1/(n−1)) Σ_{i=1}^{n} (X_i − X̄)²

An estimator θ̂_n for parameter θ is unbiased if E[θ̂_n] = θ; otherwise the estimator has bias E[θ̂_n] − θ.

An estimator θ̂_n on a sample of size n is consistent if lim_{n→∞} P[|θ̂_n − θ| ≤ ε] = 1 for any ε > 0.

Sample mean and sample variance are unbiased and consistent estimators of μ_X and σ²_X.

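Unbiasedness and the n−1 divisor can be checked by simulation. A sketch with assumed parameters μ=5, σ=2: averaging each estimator over many repeated samples approximates its expectation.

```python
import random

random.seed(0)
mu, sigma = 5.0, 2.0          # true parameters of X ~ N(5, 4)
n, trials = 10, 20_000        # small samples, many repetitions

mean_estimates, var_unbiased, var_biased = [], [], []
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    mean_estimates.append(xbar)
    var_unbiased.append(ss / (n - 1))   # sample variance S^2
    var_biased.append(ss / n)           # divisor n: biased low

avg = lambda v: sum(v) / len(v)
print(f"E[Xbar]       ≈ {avg(mean_estimates):.3f}  (true mu      = {mu})")
print(f"E[S^2]        ≈ {avg(var_unbiased):.3f}  (true sigma^2 = {sigma**2})")
print(f"E[biased S^2] ≈ {avg(var_biased):.3f}  (too small by factor (n-1)/n)")
```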

SLIDE 6

Estimator Error

Let θ̂_n be an estimator for parameter θ over iid. samples X1,...,Xn. The distribution of θ̂_n is called the sampling distribution. The standard error for θ̂_n is:

se(θ̂_n) = √(Var[θ̂_n])

The mean squared error (MSE) for θ̂_n is:

MSE(θ̂_n) = E[(θ̂_n − θ)²] = bias²(θ̂_n) + Var[θ̂_n]

Theorem: If bias → 0 and se → 0, then the estimator is consistent.

The estimator θ̂_n is asymptotically Normal if (θ̂_n − θ) / se(θ̂_n) converges in distribution to the standard Normal N(0,1).

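The decomposition MSE = bias² + Var can be verified numerically for the biased variance estimator (divisor n) on N(0,1) samples; for the empirical sampling distribution the identity holds exactly, up to floating-point error.

```python
import random

random.seed(1)
sigma2 = 1.0
n, trials = 5, 50_000

# Estimate sigma^2 with the biased divisor n; collect its sampling distribution.
estimates = []
for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    xbar = sum(xs) / n
    estimates.append(sum((x - xbar) ** 2 for x in xs) / n)

m = sum(estimates) / trials
bias = m - sigma2                                   # E[est] - theta
var = sum((e - m) ** 2 for e in estimates) / trials
mse = sum((e - sigma2) ** 2 for e in estimates) / trials

# MSE decomposes into bias^2 + variance.
print(f"bias         = {bias:.4f}   (theory: -sigma^2/n = -0.2)")
print(f"bias^2 + Var = {bias**2 + var:.4f}")
print(f"MSE          = {mse:.4f}")
```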

SLIDE 7

Types of Estimation

  • Nonparametric Estimation
    No assumptions about the model M or the parameters θ of the underlying distribution X.
    "Plug-in estimators" (e.g., histograms) approximate X directly.

  • Parametric Estimation (Inference)
    Requires assumptions about the model M and the parameters θ of the underlying distribution X.
    Analytical or numerical methods for estimating θ:
    Method-of-Moments estimator; Maximum Likelihood estimator and Expectation Maximization (EM)


SLIDE 8

Nonparametric Estimation

The empirical distribution function F̂_n is the cdf that puts probability mass 1/n at each data point X_i:

F̂_n(x) = (1/n) Σ_{i=1}^{n} I(X_i ≤ x)   with   I(X_i ≤ x) = 1 if X_i ≤ x, 0 if X_i > x

A statistical functional ("statistics") T(F) is any function over F, e.g., mean, variance, skewness, median, quantiles, correlation.

The plug-in estimator of θ = T(F) is:  θ̂_n = T(F̂_n)

 Simply use F̂_n instead of F to calculate the statistics T of interest.

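A plug-in estimator in code: build F̂_n from a small sample, then read statistics off F̂_n instead of the unknown F. The sample values here are arbitrary.

```python
# Empirical distribution and plug-in estimators for a small sample.
def ecdf(sample):
    """Return F_hat: x -> (1/n) * #{i : X_i <= x}."""
    n = len(sample)
    return lambda x: sum(1 for xi in sample if xi <= x) / n

sample = [1, 1, 2, 2, 2, 3, 5]
F_hat = ecdf(sample)

# Plug-in estimate of the mean: integrating x dF_hat gives the sample average.
plug_in_mean = sum(sample) / len(sample)

# Plug-in estimate of the median: smallest x with F_hat(x) >= 0.5.
plug_in_median = min(x for x in sample if F_hat(x) >= 0.5)

print(F_hat(2))          # 5 of 7 points are <= 2
print(plug_in_mean)
print(plug_in_median)
```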

SLIDE 9

Histograms as Density Estimators

Instead of the full empirical distribution, often compact data synopses may be used, such as histograms, where X1,...,Xn are grouped into m cells (buckets) c1,...,cm with bucket boundaries lb(c_i) and ub(c_i) s.t.

lb(c1) = −∞, ub(c_m) = ∞, ub(c_i) = lb(c_{i+1}) for 1 ≤ i < m, and

f̂_n(x) = freq_f(c_i) = (1/n) Σ_{v=1}^{n} I(lb(c_i) < X_v ≤ ub(c_i))
F̂_n(x) = freq_F(c_i) = (1/n) Σ_{v=1}^{n} I(X_v ≤ ub(c_i))

Histograms provide a (discontinuous) density estimator.

Example:
X1=1, X2=1, X3=2, X4=2, X5=2, X6=3, …, X20=7

x       : 1     2     3     4     5     6     7
f̂_X(x) : 2/20  3/20  5/20  4/20  3/20  2/20  1/20

μ̂_n = 1·2/20 + 2·3/20 + 3·5/20 + 4·4/20 + 5·3/20 + 6·2/20 + 7·1/20 = 3.65

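The slide's 20-point example can be reproduced from the histogram alone; note the mean estimate uses only the bucket frequencies, not the raw data.

```python
# The slide's 20-point example as a histogram: value -> absolute count.
freq = {1: 2, 2: 3, 3: 5, 4: 4, 5: 3, 6: 2, 7: 1}
n = sum(freq.values())                       # 20 data points

f_hat = {x: c / n for x, c in freq.items()}  # density estimate per bucket

# Plug-in mean from the histogram alone; matches the slide's value 3.65.
mu_hat = sum(x * p for x, p in f_hat.items())
print(mu_hat)

# Histogram-based cdf estimate: F_hat(x) sums the frequencies up to x.
F_hat = lambda x: sum(p for v, p in f_hat.items() if v <= x)
print(F_hat(3))    # (2 + 3 + 5) / 20 = 0.5
```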

SLIDE 10

Parametric Inference (1):

Method of Moments

Suppose parameter θ = (θ1,…,θk) has k components.

j-th moment:  α_j(θ) = E_θ[X^j] = ∫ x^j f_X(x; θ) dx

j-th sample moment:  α̂_j = (1/n) Σ_{i=1}^{n} X_i^j   for 1 ≤ j ≤ k

Estimate parameter θ by the method-of-moments estimator θ̂_n s.t.

α_1(θ̂_n) = α̂_1  and  α_2(θ̂_n) = α̂_2  and … and  α_k(θ̂_n) = α̂_k   (for the first k moments)

 Solve this equation system with k equations and k unknowns.

Method-of-moments estimators are usually consistent and asymptotically Normal, but may be biased.

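A worked method-of-moments example, assuming (hypothetically) a Gamma(k=3, θ=2) population: matching the first two moments gives the closed-form estimates θ̂ = (α̂₂ − α̂₁²)/α̂₁ and k̂ = α̂₁/θ̂.

```python
import random

random.seed(7)

# Hypothetical example: X ~ Gamma(shape k=3, scale theta=2), so
# alpha_1 = E[X] = k*theta = 6 and alpha_2 = E[X^2] = Var + E[X]^2 = 12 + 36 = 48.
k_true, theta_true = 3.0, 2.0
xs = [random.gammavariate(k_true, theta_true) for _ in range(100_000)]

# Sample moments alpha_hat_1 and alpha_hat_2.
n = len(xs)
m1 = sum(xs) / n
m2 = sum(x * x for x in xs) / n

# Solve the two moment equations k*theta = m1 and k*theta^2 + (k*theta)^2 = m2:
theta_hat = (m2 - m1 * m1) / m1     # = Var / mean
k_hat = m1 / theta_hat              # = mean^2 / Var

print(f"k_hat     = {k_hat:.3f}  (true {k_true})")
print(f"theta_hat = {theta_hat:.3f}  (true {theta_true})")
```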

SLIDE 11

Parametric Inference (2):

Maximum Likelihood Estimators (MLE)

Let X1,...,Xn be iid. with pdf f(x;θ). Estimate the parameter θ of a postulated distribution f(x;θ) such that the likelihood that the sample values x1,...,xn are generated by this distribution is maximized.

Maximum likelihood estimation:
Maximize L(x1,...,xn; θ) ≈ P[x1,...,xn originate from f(x;θ)],
usually formulated as L_n(θ) = ∏_i f(X_i; θ),
or (alternatively) maximize l_n(θ) = log L_n(θ).

The value θ̂_n that maximizes L_n(θ) is the MLE of θ.

If analytically intractable, use numerical iteration methods.


SLIDE 12

Simple Example for Maximum Likelihood Estimator

Given:

  • Coin toss experiment (Bernoulli distribution) with unknown parameter p for seeing heads, 1−p for tails
  • Sample (data): h times head within n coin tosses

Want: maximum likelihood estimate of p.

With h = Σ_i X_i, the likelihood is

L(h, n, p) = ∏_{i=1}^{n} f(X_i; p) = p^{Σ_i X_i} (1−p)^{n − Σ_i X_i} = p^h (1−p)^{n−h}

Maximize the log-likelihood function log L(h, n, p) = h log p + (n−h) log(1−p):

∂ log L / ∂p = h/p − (n−h)/(1−p) = 0   ⇒   p̂ = h/n

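The closed-form result p̂ = h/n can be cross-checked by maximizing the log-likelihood numerically over a grid (the data h=7, n=10 is chosen arbitrarily):

```python
import math

# Observed data: h = 7 heads in n = 10 tosses.
n, h = 10, 7

def log_likelihood(p):
    return h * math.log(p) + (n - h) * math.log(1 - p)

# Numerically maximize log L over a fine grid of p values in (0, 1).
grid = [i / 10_000 for i in range(1, 10_000)]
p_mle = max(grid, key=log_likelihood)

print(p_mle)        # 0.7 = h/n, the closed-form MLE
```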

SLIDE 13

MLE for Parameters of Normal Distributions

L(x1,...,xn; μ, σ²) = ∏_{i=1}^{n} (1 / √(2πσ²)) e^{−(x_i − μ)² / (2σ²)}

∂ ln L / ∂μ = (1/σ²) Σ_{i=1}^{n} (x_i − μ) = 0

∂ ln L / ∂σ² = −n/(2σ²) + (1/(2σ⁴)) Σ_{i=1}^{n} (x_i − μ)² = 0

⇒  μ̂ = (1/n) Σ_{i=1}^{n} x_i    and    σ̂² = (1/n) Σ_{i=1}^{n} (x_i − μ̂)²

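A quick numerical check of the closed-form MLE: compute μ̂ and σ̂² from synthetic N(10, 4) data and confirm that no nearby parameter pair scores a higher log-likelihood.

```python
import math
import random

random.seed(3)
xs = [random.gauss(10.0, 2.0) for _ in range(1_000)]
n = len(xs)

# Closed-form MLE from the zero-derivative conditions on the slide:
mu_hat = sum(xs) / n
var_hat = sum((x - mu_hat) ** 2 for x in xs) / n   # note divisor n, not n-1

def log_L(mu, var):
    return sum(-0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
               for x in xs)

# The MLE scores at least as high as any perturbed parameter pair.
best = log_L(mu_hat, var_hat)
for dmu in (-0.1, 0.1):
    for dvar in (-0.3, 0.3):
        assert best >= log_L(mu_hat + dmu, var_hat + dvar)

print(f"mu_hat = {mu_hat:.3f}, var_hat = {var_hat:.3f}")
```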

SLIDE 14

MLE Properties

Maximum Likelihood estimators are consistent, asymptotically Normal, and asymptotically optimal (i.e., efficient) in the following sense:

Consider two estimators U and T which are asymptotically Normal. Let u² and t² denote the variances of the two Normal distributions to which U and T converge in probability. The asymptotic relative efficiency of U to T is ARE(U, T) := t²/u².

Theorem: For an MLE θ̂_n and any other estimator θ̃_n the following inequality holds:

ARE(θ̃_n, θ̂_n) ≤ 1

That is, among all estimators the MLE has the smallest asymptotic variance.


SLIDE 15

Bayesian Viewpoint of Parameter Estimation

  • Assume a prior distribution g(θ) of parameter θ
  • Choose a statistical model (generative model) f(x | θ) that reflects our beliefs about RV X
  • Given RVs X1,...,Xn for the observed data, the posterior distribution is h(θ | x1,...,xn)

For X1 = x1, ..., Xn = xn the likelihood is

L(x1,...,xn | θ) = ∏_{i=1}^{n} f(x_i | θ)

which implies

h(θ | x1,...,xn) = L(x1,...,xn | θ) g(θ) / ∫ L(x1,...,xn | θ') g(θ') dθ'  ~  L(x1,...,xn | θ) g(θ)

(posterior is proportional to likelihood times prior)

MAP estimator (maximum a posteriori): compute the θ that maximizes h(θ | x1,…,xn) given a prior for θ.

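For the coin-toss model with a Beta(a, b) prior on p, the posterior is again a Beta distribution and the MAP estimate has a closed form; a small sketch (the prior choices are hypothetical):

```python
# MAP estimate for the coin-toss example with a Beta(a, b) prior on p.
# Posterior ∝ p^h (1-p)^(n-h) * p^(a-1) (1-p)^(b-1)  ~  Beta(h+a, n-h+b),
# whose mode (the MAP estimate) is (h + a - 1) / (n + a + b - 2).
def map_bernoulli(h, n, a, b):
    return (h + a - 1) / (n + a + b - 2)

h, n = 7, 10

# A uniform prior Beta(1,1) recovers the MLE h/n:
print(map_bernoulli(h, n, 1, 1))    # 0.7

# A prior concentrated around fairness, Beta(10,10), pulls it toward 1/2:
print(map_bernoulli(h, n, 10, 10))  # 16/28 ≈ 0.571
```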

SLIDE 16

Analytically Intractable MLE for Parameters of a Multivariate Normal Mixture

Consider samples from a k-mixture of m-dimensional Normal distributions with the density (e.g., height and weight of males and females):

f(x; μ_1,…,μ_k, Σ_1,…,Σ_k, φ_1,…,φ_k) = Σ_{j=1}^{k} φ_j · (1 / √((2π)^m |Σ_j|)) e^{−(1/2)(x − μ_j)^T Σ_j^{−1} (x − μ_j)}

with expectation vectors μ_j, mixture weights φ_j, and invertible, positive definite, symmetric m×m covariance matrices Σ_j.

Maximize the log-likelihood function:

log L(x_1,…,x_n; θ) := log ∏_{i=1}^{n} P[x_i | θ] = Σ_{i=1}^{n} log Σ_{j=1}^{k} φ_j · n(x_i; μ_j, Σ_j)


SLIDE 17

Expectation-Maximization Method (EM)

Key idea:

When L(X1,...,Xn; θ) (where the X_i and θ are possibly multivariate) is analytically intractable, then

  • introduce latent (i.e., hidden, invisible, missing) random variable(s) Z such that
  • the joint distribution J(X1,...,Xn, Z; θ) of the "complete" data is tractable (often with Z actually being multivariate: Z1,...,Zm), and
  • iteratively derive the expected complete-data likelihood by integrating over J and find the best θ:

θ̂ = arg max_θ E_{Z|X,θ}[J(X1,…,Xn, Z; θ)] = arg max_θ Σ_z J(X1,…,Xn, Z=z; θ) P[Z=z]


SLIDE 18

EM Procedure

Initialization: choose a start estimate θ^(0) (e.g., using the Method-of-Moments estimator).

Iterate (t = 0, 1, …) until convergence:

E step (expectation): estimate the posterior probability of Z, P[Z | X1,…,Xn, θ^(t)], assuming θ were known and equal to the previous estimate θ^(t), and compute E_{Z|X,θ(t)}[log J(X1,…,Xn, Z; θ)] by integrating over values for Z.

M step (maximization, MLE step): estimate θ^(t+1) by maximizing the expected complete-data log-likelihood:

θ^(t+1) = arg max_θ E_{Z|X,θ(t)}[log J(X1,…,Xn, Z; θ)]

 Convergence is guaranteed (because the E step computes a lower bound of the true L function, and the M step yields monotonically non-decreasing likelihood), but may result in a local maximum of the (log-)likelihood function.


SLIDE 19

EM Example for Multivariate Normal Mixture

Expected complete-data log-likelihood:

E_{Z|X,θ}[log J(X1,…,Xn, Z; θ)] = Σ_{i=1}^{n} Σ_{j=1}^{k} Z_ij (log n(x_i; μ_j, Σ_j) + log P[Z_ij = 1])

where Z_ij = 1 if the i-th data point x_i was generated by the j-th component, 0 otherwise.

Expectation step (E step):

h_ij^(t) := P[Z_ij = 1 | x_i, θ^(t)] = φ_j^(t) P[x_i | n(μ_j^(t), Σ_j^(t))] / Σ_{l=1}^{k} φ_l^(t) P[x_i | n(μ_l^(t), Σ_l^(t))]

Maximization step (M step), yielding θ^(t+1):

μ_j^(t+1) := Σ_{i=1}^{n} h_ij x_i / Σ_{i=1}^{n} h_ij

Σ_j^(t+1) := Σ_{i=1}^{n} h_ij (x_i − μ_j)(x_i − μ_j)^T / Σ_{i=1}^{n} h_ij

φ_j^(t+1) := Σ_{i=1}^{n} h_ij / Σ_{j=1}^{k} Σ_{i=1}^{n} h_ij = (1/n) Σ_{i=1}^{n} h_ij


See L. Wasserman, p.121 ff. for k=2, m=1
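The k=2, m=1 case can be sketched directly from the E/M formulas above; the data, start values, and iteration count are all hypothetical choices.

```python
import math
import random

random.seed(5)

# Synthetic 1-D data from a 2-component mixture (k=2, m=1, as in Wasserman):
# 30% of points from N(0, 1), 70% from N(5, 1).
xs = ([random.gauss(0.0, 1.0) for _ in range(300)] +
      [random.gauss(5.0, 1.0) for _ in range(700)])

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Start estimates (a rough guess); phi are the mixture weights.
mu = [min(xs), max(xs)]
var = [1.0, 1.0]
phi = [0.5, 0.5]

for _ in range(50):                       # iterate until (practical) convergence
    # E step: posterior responsibility h_ij of component j for point x_i.
    h = [[phi[j] * normal_pdf(x, mu[j], var[j]) for j in range(2)] for x in xs]
    h = [[hij / sum(row) for hij in row] for row in h]

    # M step: weighted MLE updates for mu_j, var_j, phi_j.
    for j in range(2):
        w = sum(row[j] for row in h)
        mu[j] = sum(row[j] * x for row, x in zip(h, xs)) / w
        var[j] = sum(row[j] * (x - mu[j]) ** 2 for row, x in zip(h, xs)) / w
        phi[j] = w / len(xs)

print(f"mu  ≈ {mu[0]:.2f}, {mu[1]:.2f}")   # close to the true means 0 and 5
print(f"phi ≈ {phi[0]:.2f}, {phi[1]:.2f}") # close to the true weights 0.3, 0.7
```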

SLIDE 20

Confidence Intervals

A confidence interval estimator for parameter θ is an interval [T−a, T+a] around an estimator T such that

P[T − a ≤ θ ≤ T + a] ≥ 1 − α

[T−a, T+a] is the confidence interval and 1−α is the confidence level.

For the distribution of a random variable X, a value x_γ (0 < γ < 1) with

P[X ≤ x_γ] ≥ γ  and  P[X ≥ x_γ] ≥ 1 − γ

is called a γ-quantile; the 0.5-quantile is called the median. For the Normal distribution N(0,1) the γ-quantile is denoted z_γ.

 For a given a or α, find a value z of N(0,1) that yields the [T−a, T+a] confidence interval, or the corresponding γ-quantile for γ = 1−α.
SLIDE 21

Confidence Intervals for Expectations (1)

Let X1, ..., Xn be a sample from a distribution with unknown expectation μ and known variance σ².

For sufficiently large n, the sample mean X̄ is N(μ, σ²/n) distributed and (X̄ − μ)√n / σ is N(0,1) distributed:

P[−z ≤ (X̄ − μ)√n/σ ≤ z] = Φ(z) − Φ(−z) = Φ(z) − (1 − Φ(z)) = 2Φ(z) − 1

P[X̄ − zσ/√n ≤ μ ≤ X̄ + zσ/√n] = 2Φ(z) − 1

For a given confidence interval [X̄ − a, X̄ + a], set z := a√n/σ, then look up Φ(z) to find 1−α.

For a given confidence level 1−α, set z := z_{1−α/2} (the (1−α/2)-quantile of N(0,1)), then a := zσ/√n.

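A sketch of the known-variance interval using Python's statistics.NormalDist for the quantile z_{1−α/2} (the population parameters are hypothetical):

```python
import math
import random
from statistics import NormalDist

random.seed(11)

mu, sigma = 50.0, 8.0            # sigma assumed known, mu to be estimated
n = 64
xs = [random.gauss(mu, sigma) for _ in range(n)]
xbar = sum(xs) / n

# 95% confidence level: alpha = 0.05, z = Phi^{-1}(1 - alpha/2).
alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)
a = z * sigma / math.sqrt(n)

lo, hi = xbar - a, xbar + a
print(f"z = {z:.3f}")                          # about 1.960
print(f"95% CI for mu: [{lo:.2f}, {hi:.2f}]")
```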

SLIDE 22

Confidence Intervals for Expectations (2)

Let X1, ..., Xn be an iid. sample from a distribution X with unknown expectation μ, unknown variance σ², and sample variance S².

For sufficiently large n, the random variable

T := (X̄ − μ)√n / S

has a t distribution (Student distribution) with n−1 degrees of freedom; the t distribution with n degrees of freedom has the density

f_{T,n}(t) = (Γ((n+1)/2) / (Γ(n/2) √(nπ))) (1 + t²/n)^{−(n+1)/2}

with the Gamma function Γ(x) = ∫_0^∞ t^{x−1} e^{−t} dt for x > 0 (with the properties Γ(1) = 1 and Γ(x+1) = x·Γ(x)).

P[X̄ − t_{n−1,1−α/2} S/√n ≤ μ ≤ X̄ + t_{n−1,1−α/2} S/√n] = 1 − α

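The same computation with unknown variance swaps in S and a t quantile; the quantile t_{19, 0.975} ≈ 2.093 is taken from a t table here (scipy.stats.t.ppf(0.975, 19) would compute it):

```python
import math
import random

random.seed(13)

# Small sample, variance unknown: use S and a t quantile instead of sigma and z.
xs = [random.gauss(100.0, 15.0) for _ in range(20)]
n = len(xs)
xbar = sum(xs) / n
s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))   # sample std dev S

# t_{n-1, 1-alpha/2} for n-1 = 19 dof and alpha = 0.05, from a t table.
t = 2.093

a = t * s / math.sqrt(n)
print(f"95% t-based CI for mu: [{xbar - a:.2f}, {xbar + a:.2f}]")
```

Note that the t quantile 2.093 exceeds the Normal quantile 1.960, so the t-based interval is wider, reflecting the extra uncertainty from estimating σ.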

SLIDE 23

Summary of Section II.2

  • Quality measures for statistical estimators
  • Nonparametric vs. parametric estimation
  • Histograms as generic (nonparametric) plug-in estimators
  • Method-of-Moments estimator: a good initial guess, but may be biased
  • Maximum-Likelihood estimator & Expectation Maximization
  • Confidence intervals for parameters


SLIDE 24

Normal Distribution Table


SLIDE 25

Student's t Distribution Table
