1. Chapter II: Basics from Linear Algebra, Probability Theory, and Statistics
Information Retrieval & Data Mining
Universität des Saarlandes, Saarbrücken
Winter Semester 2013/14

2. Chapter II
II.1 Linear Algebra: Vectors, Matrices, Eigenvalues, Eigenvectors, Singular Value Decomposition
II.2 Probability Theory: Events, Probabilities, Random Variables, Distributions, Bounds, Limit Theorems
II.3 Statistical Inference: Parameter Estimation, Confidence Intervals, Hypothesis Testing

3. II.3 Statistical Inference
1. Parameter Estimation
2. Confidence Intervals
3. Hypothesis Testing
Based on LW Chapters 6, 7, 9, 10

4. Statistical Model
• A statistical model M is a set of distributions (or regression functions), e.g., all unimodal smooth distributions
• M is called a parametric model if it can be completely described by a finite number of parameters, e.g., the family of Normal distributions with parameters µ and σ:
$$M = \left\{ f_X(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \;\middle|\; \mu \in \mathbb{R},\ \sigma > 0 \right\}$$

5. Statistical Inference
• Given a parametric model M and a sample X1, …, Xm, how do we infer (learn) the parameters of M?
• For multivariate models with observed variable X and response variable Y, this is called prediction or regression; for a discrete outcome variable it is also called classification

6. Idea of Sampling
Distribution X (population of interest, e.g., people) → Samples X1, …, Xm
Statistical inference: What can we say about X based on X1, …, Xm?
• Example: Suppose we want to estimate the average salary of employees in German companies
• Sample 1: Suppose we look at n = 200 top-paid CEOs of major banks
• Sample 2: Suppose we look at n = 1,000 employees across all sectors
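To make the contrast concrete, here is a minimal simulation sketch (the population model and all numbers are invented for illustration): a selective sample like Sample 1 grossly overestimates the population mean, while a uniform random sample like Sample 2 lands close to it.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population of 1,000,000 salaries (log-normal, since salary
# distributions are typically right-skewed); parameters are made up.
population = rng.lognormal(mean=10.5, sigma=0.5, size=1_000_000)

# Sample 1: the 200 highest-paid individuals (biased, like top CEOs).
sample1 = np.sort(population)[-200:]

# Sample 2: 1,000 individuals drawn uniformly at random (representative).
sample2 = rng.choice(population, size=1_000, replace=False)

print(f"population mean:    {population.mean():,.0f}")
print(f"biased sample mean: {sample1.mean():,.0f}")   # far too high
print(f"random sample mean: {sample2.mean():,.0f}")   # close to the truth
```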

7. Basic Types of Statistical Inference
• Given independent and identically distributed (iid.) samples X1, …, Xn ~ X of an unknown distribution X
  • e.g.: n single-coin-toss experiments X1, …, Xn ~ Bernoulli(p)
• Parameter estimation
  • e.g.: what is the parameter p of Bernoulli(p)? what is E[X], the cdf FX of X, the pdf fX of X, etc.?
• Confidence intervals
  • e.g.: give me all values C = [a, b] such that P[p ∈ C] ≥ 0.95, with interval boundaries a and b derived from samples X1, …, Xn
• Hypothesis testing
  • e.g.: H0: p = 1/2 (i.e., the coin is fair) vs. H1: p ≠ 1/2

8. 1. Parameter Estimation
• A point estimator for a parameter θ of a probability distribution X is a random variable θ̂n derived from an iid. sample X1, …, Xn
• Examples:
  • Sample mean: $$\bar{X}_n := \frac{1}{n} \sum_{i=1}^{n} X_i$$
  • Sample variance: $$S_X^2 := \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2$$
• An estimator θ̂n for parameter θ is unbiased if E[θ̂n] = θ; otherwise the estimator has bias E[θ̂n] − θ
• An estimator on sample size n is consistent if
$$\lim_{n \to \infty} P[\,|\hat{\theta}_n - \theta| < \epsilon\,] = 1 \quad \text{for any } \epsilon > 0$$
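As a sketch, the two estimators translate directly into code (note the n − 1 divisor that makes the sample variance unbiased):

```python
import numpy as np

def sample_mean(x):
    """Point estimator for E[X]: (1/n) * sum of X_i."""
    return np.sum(x) / len(x)

def sample_variance(x):
    """Unbiased point estimator for Var(X): uses the n-1 divisor."""
    n = len(x)
    xbar = sample_mean(x)
    return np.sum((x - xbar) ** 2) / (n - 1)

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(sample_mean(x))       # 5.0
print(sample_variance(x))   # 4.571...
# Equivalent built-ins: np.mean(x) and np.var(x, ddof=1)
```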

9. Estimation Error
• Let θ̂n be an estimator for parameter θ over iid. samples X1, …, Xn
• The distribution of θ̂n is called the sampling distribution
• The standard error of θ̂n is: $$se(\hat{\theta}_n) = \sqrt{Var(\hat{\theta}_n)}$$
• The mean squared error (MSE) of θ̂n is: $$MSE(\hat{\theta}_n) = E[(\hat{\theta}_n - \theta)^2] = bias^2(\hat{\theta}_n) + Var(\hat{\theta}_n)$$
• The estimator θ̂n is asymptotically Normal if (θ̂n − θ)/se converges in distribution to N(0, 1)
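The decomposition MSE = bias² + Var can be checked empirically. The following sketch (assuming a N(0, 1) population and, as the estimator under study, the biased variance estimator with divisor n) approximates bias, variance, and MSE by Monte Carlo simulation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials, true_var = 10, 100_000, 1.0

# Biased variance estimator (divisor n) applied to many iid. samples
# drawn from N(0, 1), whose true variance is 1.
estimates = np.array([
    np.var(rng.normal(0.0, 1.0, size=n))  # ddof=0 -> divisor n
    for _ in range(trials)
])

bias = estimates.mean() - true_var
variance = estimates.var()
mse = np.mean((estimates - true_var) ** 2)

print(f"bias         = {bias:+.4f}")          # about -1/n = -0.1
print(f"bias^2 + var = {bias**2 + variance:.4f}")
print(f"MSE          = {mse:.4f}")            # matches the sum above
```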

10. Types of Estimation
• Non-Parametric Estimation
  • no assumptions about the model M or the parameters θ of the underlying distribution X
  • e.g.: "plug-in estimators" (e.g., histograms) to approximate X
• Parametric Estimation
  • requires assumptions about the model M and the parameters θ of the underlying distribution X
  • analytical or numerical methods for estimating θ: Method of Moments, Maximum Likelihood, Expectation Maximization (EM)

11. Empirical Distribution Function
• The empirical distribution function F̂n is the cdf that puts probability mass 1/n at each data point Xi:
$$\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} I(X_i \le x) \quad \text{with indicator function} \quad I(X_i \le x) = \begin{cases} 1 & : X_i \le x \\ 0 & : X_i > x \end{cases}$$
• A statistical function ("statistic") T(F) is any function over F, e.g., mean, variance, skewness, median, quantiles, correlation
• The plug-in estimator of θ = T(F) is θ̂n = T(F̂n)

12. Histograms as Density Estimators
• Instead of the full empirical distribution, often compact synopses can be used, such as histograms where X1, …, Xn are grouped into m cells (buckets) c1, …, cm with bucket boundaries lb(ci) and ub(ci) such that lb(c1) = −∞, ub(cm) = ∞, ub(ci−1) = lb(ci) for 1 < i ≤ m, and
$$\hat{f}_n(x) = \frac{freq_f(c_i)}{n} = \frac{1}{n} \sum_{j=1}^{n} I(lb(c_i) < X_j \le ub(c_i))$$
$$\hat{F}_n(x) = \frac{freq_F(c_i)}{n} = \frac{1}{n} \sum_{j=1}^{n} I(X_j \le ub(c_i))$$
• Example (n = 20): X1 = X2 = 1; X3 = X4 = X5 = 2; X6 = … = X10 = 3; X11 = … = X14 = 4; X15 = … = X17 = 5; X18 = X19 = 6; X20 = 7. The histogram f̂X(x) has bar heights 2/20, 3/20, 5/20, 4/20, 3/20, 2/20, 1/20 over x = 1, …, 7, and
$$\hat{\mu}_n = \frac{1 \times 2 + 2 \times 3 + \dots + 7 \times 1}{20} = 3.65$$
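The example can be reproduced in a few lines (a sketch with unit-width buckets around the integer values 1–7):

```python
import numpy as np

# The 20 data points from the example above.
data = np.array([1,1, 2,2,2, 3,3,3,3,3, 4,4,4,4, 5,5,5, 6,6, 7])
n = len(data)

# Relative frequency per bucket: f_hat(x) = (1/n) * count in bucket.
values, counts = np.unique(data, return_counts=True)
f_hat = counts / n
print(dict(zip(values.tolist(), f_hat.tolist())))
# {1: 0.1, 2: 0.15, 3: 0.25, 4: 0.2, 5: 0.15, 6: 0.1, 7: 0.05}

# Mean estimated from the histogram synopsis alone:
mu_hat = np.sum(values * f_hat)
print(mu_hat)  # 3.65, as on the slide
```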

13. Method of Moments
• Suppose parameter θ = (θ1, …, θk) has k components
• Compute the j-th moment for 1 ≤ j ≤ k:
$$\alpha_j = \alpha_j(\theta) = E_\theta[X^j] = \int_{-\infty}^{+\infty} x^j f_X(x)\, dx$$
• Compute the j-th sample moment for 1 ≤ j ≤ k:
$$\hat{\alpha}_j = \frac{1}{n} \sum_{i=1}^{n} X_i^j$$
• The method-of-moments estimate of θ is obtained by solving a system of k equations in k unknowns:
$$\alpha_1(\hat{\theta}_n) = \hat{\alpha}_1, \quad \dots, \quad \alpha_k(\hat{\theta}_n) = \hat{\alpha}_k$$

14. Method of Moments (Example)
• Let X1, …, Xn ~ Normal(µ, σ²). Then
$$\alpha_1 = E_\theta[X] = \mu, \qquad \alpha_2 = E_\theta[X^2] = Var(X) + (E[X])^2 = \sigma^2 + \mu^2$$
• By solving the system of 2 equations in 2 unknowns
$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad \hat{\sigma}^2 + \hat{\mu}^2 = \frac{1}{n} \sum_{i=1}^{n} X_i^2$$
we obtain as solutions
$$\hat{\mu} = \bar{X}_n, \qquad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2$$
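In code, the method-of-moments solution for the Normal family amounts to computing the first two sample moments (a sketch):

```python
import numpy as np

def mom_normal(x):
    """Method-of-moments estimates for Normal(mu, sigma^2).

    Solves alpha_1(theta) = a1_hat and alpha_2(theta) = a2_hat:
        mu_hat = a1_hat,  sigma2_hat = a2_hat - a1_hat**2.
    """
    x = np.asarray(x, dtype=float)
    a1 = np.mean(x)            # first sample moment
    a2 = np.mean(x ** 2)       # second sample moment
    return a1, a2 - a1 ** 2    # (mu_hat, sigma2_hat)

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=10_000)
mu_hat, sigma2_hat = mom_normal(x)
print(mu_hat, sigma2_hat)      # close to 5.0 and 4.0
```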

15. Maximum Likelihood
• Let X1, …, Xn be iid. with pdf f(x; θ)
• Estimate the parameter θ of a postulated distribution f(x; θ) such that the likelihood that the sample values x1, …, xn are generated by this distribution is maximized
• Maximize L(x1, …, xn, θ) ≈ P[x1, …, xn originate from f(x; θ)]
• Usually formulated as:
$$\arg\max_\theta L_n[\theta] = \prod_{i=1}^{n} f(X_i; \theta)$$
• The value θ̂ that maximizes Ln[θ] is called the maximum-likelihood estimate (MLE) of θ
• If analytically intractable, the MLE can be determined using numerical iteration methods

16. Maximum Likelihood (Example)
• Let X1, …, Xn ~ Bernoulli(p) (corresponding to n coin tosses)
• Assume that we observed h times head and (n − h) times tail
• Maximum-likelihood estimation of parameter p:
$$L[h, n, p] = \prod_{i=1}^{n} f(X_i; p) = \prod_{i=1}^{n} p^{X_i} (1-p)^{1-X_i} = p^h (1-p)^{n-h}$$
• Maximize the log-likelihood function
$$\log L[h, n, p] = h \log(p) + (n - h) \log(1 - p)$$
$$\frac{\partial \log L}{\partial p} = \frac{h}{p} - \frac{n-h}{1-p} = 0 \;\Rightarrow\; \hat{p} = \frac{h}{n}$$
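The closed-form result p̂ = h/n can be cross-checked by maximizing the log-likelihood numerically, e.g., with scipy's bounded scalar minimizer applied to the negative log-likelihood (a sketch):

```python
import numpy as np
from scipy.optimize import minimize_scalar

h, n = 7, 10  # observed 7 heads in 10 tosses

def neg_log_likelihood(p):
    # -log L[h, n, p] = -(h*log(p) + (n-h)*log(1-p))
    return -(h * np.log(p) + (n - h) * np.log(1 - p))

result = minimize_scalar(neg_log_likelihood,
                         bounds=(1e-9, 1 - 1e-9), method='bounded')
print(result.x)   # approx. 0.7 = h/n, the analytical MLE
```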

17. Maximum Likelihood for Normal Distributions
$$L(x_1, \dots, x_n, \mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x_i - \mu)^2}{2\sigma^2}}$$
$$\frac{\partial \log L}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) = 0$$
$$\frac{\partial \log L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2 = 0$$
$$\Rightarrow\; \hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2$$
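For the Normal family, the MLE thus coincides with the method-of-moments estimate (note the divisor n, not n − 1, in σ̂²). A quick sketch comparing the closed-form solution to scipy's built-in fitting routine, which also computes the MLE:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(loc=3.0, scale=1.5, size=5_000)

# Closed-form MLE from the slide:
mu_hat = np.mean(x)
sigma2_hat = np.mean((x - mu_hat) ** 2)   # divisor n (biased)

# scipy.stats.norm.fit returns the MLE as (loc, scale):
loc, scale = stats.norm.fit(x)
print(mu_hat, np.sqrt(sigma2_hat))
print(loc, scale)                          # same values
```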

18. 2. Confidence Intervals
• Determine an interval estimator T for parameter θ such that
$$P[T - a \le \theta \le T + a] = 1 - \alpha$$
T ± a is the confidence interval and 1 − α the confidence level
• For the distribution of a random variable X, a value xγ (0 < γ < 1) with P[X ≤ xγ] ≥ γ and P[X ≥ xγ] ≥ 1 − γ is called the γ-quantile
  • the 0.5-quantile is known as the median
  • for the standard Normal distribution N(0, 1), the γ-quantile is denoted Φγ
• For a given a or α, find a value z of N(0, 1) that denotes the [T − a, T + a] confidence interval or a corresponding γ-quantile for 1 − α

19. Confidence Intervals for Expectations (I)
• Let X1, …, Xn be a sample from a distribution X with unknown expectation µ and known variance σ²
• For sufficiently large n, the sample mean X̄ is N(µ, σ²/n)-distributed, and
$$P\left[-z \le \frac{(\bar{X} - \mu)\sqrt{n}}{\sigma} \le z\right] = \Phi(z) - \Phi(-z) = \Phi(z) - (1 - \Phi(z)) = 2\Phi(z) - 1$$
$$P\left[\bar{X} - z\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z\,\frac{\sigma}{\sqrt{n}}\right] = 2\Phi(z) - 1$$
$$\Rightarrow\; P\left[\bar{X} - \frac{\Phi_{1-\alpha/2}\,\sigma}{\sqrt{n}} \le \mu \le \bar{X} + \frac{\Phi_{1-\alpha/2}\,\sigma}{\sqrt{n}}\right] = 1 - \alpha$$
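Putting the final formula into code (a sketch; Φ1−α/2 is obtained via scipy's norm.ppf, the inverse of Φ):

```python
import numpy as np
from scipy import stats

def normal_ci(x, sigma, alpha=0.05):
    """Confidence interval for mu with known variance sigma^2."""
    n = len(x)
    xbar = np.mean(x)
    z = stats.norm.ppf(1 - alpha / 2)    # Phi_{1-alpha/2}, e.g. 1.96
    half_width = z * sigma / np.sqrt(n)
    return xbar - half_width, xbar + half_width

rng = np.random.default_rng(3)
x = rng.normal(loc=10.0, scale=2.0, size=400)  # sigma = 2 assumed known
print(normal_ci(x, sigma=2.0))   # 95% CI, roughly 10 +/- 0.196
```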
