SLIDE 1

Chapter II: Basics from Linear Algebra, Probability Theory, and Statistics

Information Retrieval & Data Mining
Universität des Saarlandes, Saarbrücken
Wintersemester 2013/14

SLIDE 2

Chapter II

II.1 Linear Algebra
     Vectors, Matrices, Eigenvalues, Eigenvectors, Singular Value Decomposition

II.2 Probability Theory
     Events, Probabilities, Random Variables, Distributions, Bounds, Limit Theorems

II.3 Statistical Inference
     Parameter Estimation, Confidence Intervals, Hypothesis Testing
SLIDE 3

II.3 Statistical Inference

1. Parameter Estimation
2. Confidence Intervals
3. Hypothesis Testing

Based on LW Chapters 6, 7, 9, 10

SLIDE 4

Statistical Model

  • A statistical model M is a set of distributions (or regression functions), e.g., all unimodal smooth distributions

  • M is called a parametric model if it can be completely described by a finite number of parameters, e.g., the family of Normal distributions with parameters µ and σ:

$$\mathcal{M} = \left\{ f_X(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \;\middle|\; \mu \in \mathbb{R},\ \sigma > 0 \right\}$$

SLIDE 5

Statistical Inference

  • Given a parametric model M and a sample X1, …, Xm, how do we infer (learn) the parameters of M?

  • For multivariate models with observed variable X and response variable Y, this is called prediction or regression; for a discrete outcome variable it is also called classification

SLIDE 6

Idea of Sampling

  • Example: Suppose we want to estimate the average salary of employees in German companies

  • Sample 1: Suppose we look at n = 200 top-paid CEOs of major banks (a heavily biased sample)
  • Sample 2: Suppose we look at n = 1,000 employees across all sectors (a more representative sample)

[Diagram: distribution X (population of interest) → samples X1, …, Xm (e.g., people) → statistical inference: what can we say about X based on X1, …, Xm?]

SLIDE 7

Basic Types of Statistical Inference

  • Given independent and identically distributed (iid.) samples X1, …, Xn ~ X of an unknown distribution X
  • e.g.: n single-coin-toss experiments X1, …, Xn ~ Bernoulli(p)

  • Parameter estimation
  • e.g.: what is the parameter p of Bernoulli(p)? what is E[X], the cdf FX of X, the pdf fX of X, etc.?

  • Confidence intervals
  • e.g.: give me all values C = [a, b] such that P[p ∈ C] ≥ 0.95, with interval boundaries a and b derived from samples X1, …, Xn

  • Hypothesis testing
  • e.g.: H0 : p = 1/2 (i.e., the coin is fair) vs. H1 : p ≠ 1/2

SLIDE 8

  • 1. Parameter Estimation
  • A point estimator for a parameter θ of a probability distribution X is a random variable $\hat{\theta}_n$ derived from an iid. sample X1, …, Xn

  • Examples:
  • Sample mean
$$\bar{X} := \frac{1}{n} \sum_{i=1}^{n} X_i$$
  • Sample variance
$$S_X^2 := \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$$

  • An estimator $\hat{\theta}_n$ for parameter θ is unbiased if $E[\hat{\theta}_n] = \theta$; otherwise the estimator has bias $E[\hat{\theta}_n] - \theta$
  • An estimator on sample size n is consistent if $\lim_{n \to \infty} P[|\hat{\theta}_n - \theta| < \epsilon] = 1$ for any ε > 0
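As a quick illustration (not from the original slides), here is a minimal Python sketch computing both estimators on a synthetic sample; the Normal(10, 2²) choice is purely illustrative:

```python
import random

random.seed(42)
# illustrative iid. sample X_1, ..., X_n ~ Normal(mu=10, sigma=2)
sample = [random.gauss(10.0, 2.0) for _ in range(1000)]

n = len(sample)
x_bar = sum(sample) / n                               # sample mean
s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)  # unbiased sample variance
print(f"sample mean {x_bar:.3f}, sample variance {s2:.3f}")
```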

SLIDE 9

Estimation Error

  • Let $\hat{\theta}_n$ be an estimator for parameter θ over iid. samples X1, …, Xn
  • The distribution of $\hat{\theta}_n$ is called the sampling distribution
  • The standard error for $\hat{\theta}_n$ is
$$se(\hat{\theta}_n) = \sqrt{Var(\hat{\theta}_n)}$$
  • The mean squared error (MSE) for $\hat{\theta}_n$ is
$$MSE(\hat{\theta}_n) = E[(\hat{\theta}_n - \theta)^2] = bias^2(\hat{\theta}_n) + Var(\hat{\theta}_n)$$
  • The estimator is asymptotically Normal if $(\hat{\theta}_n - \theta)/se(\hat{\theta}_n)$ converges in distribution to N(0,1)

SLIDE 10

Types of Estimation

  • Non-Parametric Estimation
  • no assumptions about the model M nor the parameters θ of the underlying distribution X
  • e.g.: "plug-in estimators" (e.g., histograms) to approximate X

  • Parametric Estimation
  • requires assumptions about the model M and the parameters θ of the underlying distribution X
  • analytical or numerical methods for estimating θ:
  • Method of Moments
  • Maximum Likelihood
  • Expectation Maximization (EM)

SLIDE 11

Empirical Distribution Function

  • The empirical distribution function $\hat{F}_n$ is the cdf that puts probability mass 1/n at each data point Xi:
$$\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} I(X_i \le x)$$
with indicator function
$$I(X_i \le x) = \begin{cases} 1 & : X_i \le x \\ 0 & : X_i > x \end{cases}$$

  • A statistical functional ("statistic") T(F) is any function over F, e.g., mean, variance, skewness, median, quantiles, correlation

  • The plug-in estimator of θ = T(F) is $\hat{\theta}_n = T(\hat{F}_n)$
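A minimal sketch of the empirical distribution function and a plug-in estimator (here the median as T(F̂n)); the function names are illustrative, not from the slides:

```python
def ecdf(sample):
    """Return F-hat_n, the cdf putting mass 1/n on each data point."""
    n = len(sample)
    return lambda x: sum(1 for xi in sample if xi <= x) / n

def plugin_median(sample):
    """Plug-in estimate of the median: smallest x with F-hat_n(x) >= 0.5."""
    f = ecdf(sample)
    return min(x for x in sample if f(x) >= 0.5)

data = [3.1, 1.4, 2.7, 5.0, 4.2]
print(ecdf(data)(3.0))      # 0.4, since two of five points are <= 3.0
print(plugin_median(data))  # 3.1
```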

SLIDE 12

Histograms as Density Estimators

  • Instead of the full empirical distribution, compact synopses can often be used, such as histograms, where X1, …, Xn are grouped into m cells (buckets) c1, …, cm with bucket boundaries lb(ci) and ub(ci) such that

$$lb(c_1) = -\infty, \quad ub(c_m) = \infty, \quad ub(c_{i-1}) = lb(c_i) \text{ for } 1 < i \le m$$

$$freq_f(c_i) = \hat{f}_n(x) = \frac{1}{n} \sum_{j=1}^{n} I(lb(c_i) < X_j \le ub(c_i))$$

$$freq_F(c_i) = \hat{F}_n(x) = \frac{1}{n} \sum_{j=1}^{n} I(X_j \le ub(c_i))$$

  • Example:
X1 = X2 = 1
X3 = X4 = X5 = 2
X6 = … = X10 = 3
X11 = … = X14 = 4
X15 = … = X17 = 5
X18 = X19 = 6
X20 = 7

x      :   1     2     3     4     5     6     7
f̂n(x) :  2/20  3/20  5/20  4/20  3/20  2/20  1/20

$$\hat{\mu}_n = 1 \times \tfrac{2}{20} + 2 \times \tfrac{3}{20} + \ldots + 7 \times \tfrac{1}{20} = 3.65$$
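The example can be recomputed with a short sketch (assuming unit-width cells, one per distinct value):

```python
from collections import Counter

# the 20 observations from the example above
sample = [1]*2 + [2]*3 + [3]*5 + [4]*4 + [5]*3 + [6]*2 + [7]*1
n = len(sample)

f_hat = {x: c / n for x, c in sorted(Counter(sample).items())}  # cell frequencies
print(f_hat)  # {1: 0.1, 2: 0.15, 3: 0.25, 4: 0.2, 5: 0.15, 6: 0.1, 7: 0.05}

mu_hat = sum(x * p for x, p in f_hat.items())  # histogram-based mean estimate
print(mu_hat)  # 3.65
```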

SLIDE 13

Method of Moments

  • Suppose parameter θ = (θ1, …, θk) has k components
  • Compute the j-th moment for 1 ≤ j ≤ k:
$$\alpha_j = \alpha_j(\theta) = E_\theta[X^j] = \int_{-\infty}^{+\infty} x^j f_X(x)\, dx$$

  • Compute the j-th sample moment for 1 ≤ j ≤ k:
$$\hat{\alpha}_j = \frac{1}{n} \sum_{i=1}^{n} X_i^j$$

  • The method-of-moments estimate of θ is obtained by solving the system of k equations in k unknowns
$$\alpha_1(\hat{\theta}_n) = \hat{\alpha}_1, \quad \ldots, \quad \alpha_k(\hat{\theta}_n) = \hat{\alpha}_k$$

SLIDE 14

Method of Moments (Example)

  • Let X1, …, Xn ~ Normal(µ, σ²). The first two moments are
$$\alpha_1 = E_\theta[X] = \mu$$
$$\alpha_2 = E_\theta[X^2] = Var(X) + (E[X])^2 = \sigma^2 + \mu^2$$

  • By solving the system of 2 equations in 2 unknowns
$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad \hat{\sigma}^2 + \hat{\mu}^2 = \frac{1}{n} \sum_{i=1}^{n} X_i^2$$
we obtain as solutions
$$\hat{\mu} = \bar{X}_n, \qquad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2$$
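A minimal sketch of these method-of-moments estimates on synthetic data; the true parameters µ = 3, σ = 1.5 are illustrative assumptions:

```python
import random

random.seed(0)
xs = [random.gauss(3.0, 1.5) for _ in range(10_000)]
n = len(xs)

a1 = sum(xs) / n                 # first sample moment
a2 = sum(x * x for x in xs) / n  # second sample moment

mu_hat = a1                      # mu-hat = alpha-hat_1
sigma2_hat = a2 - a1 * a1        # sigma-hat^2 = alpha-hat_2 - alpha-hat_1^2
print(mu_hat, sigma2_hat)        # close to 3.0 and 2.25
```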

SLIDE 15

Maximum Likelihood

  • Let X1, …, Xn be iid. with pdf f(x; θ)
  • Estimate the parameter θ of a postulated distribution f(x; θ) such that the likelihood that the sample values x1, …, xn were generated by the distribution is maximized

  • Maximize L(x1, …, xn, θ) ≈ P[x1, …, xn originate from f(x; θ)]
  • Usually formulated as:
$$\arg\max_{\theta}\; L_n[\theta] = \prod_{i=1}^{n} f(X_i; \theta)$$
  • The value $\hat{\theta}$ that maximizes Ln[θ] is called the maximum-likelihood estimate (MLE) of θ

  • If analytically intractable, the MLE can be determined using numerical iteration methods

SLIDE 16

Maximum Likelihood (Example)

  • Let X1, …, Xn ~ Bernoulli(p) (corresponding to n coin tosses)
  • Assume that we observed heads h times and tails (n − h) times
  • Maximum-likelihood estimation of parameter p:
$$L[h, n, p] = \prod_{i=1}^{n} f(X_i; p) = \prod_{i=1}^{n} p^{X_i} (1-p)^{1-X_i} = p^h\, (1-p)^{(n-h)}$$

  • Maximize the log-likelihood function
$$\log L[h, n, p] = h \log(p) + (n-h) \log(1-p)$$
$$\frac{\partial \log L}{\partial p} = \frac{h}{p} - \frac{n-h}{1-p} = 0 \;\Rightarrow\; \hat{p} = \frac{h}{n}$$
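A minimal sketch checking the closed-form MLE p̂ = h/n numerically against the log-likelihood on a grid; the counts h = 58, n = 100 are illustrative:

```python
import math

h, n = 58, 100  # illustrative: 58 heads in 100 tosses

def log_likelihood(p):
    return h * math.log(p) + (n - h) * math.log(1 - p)

# grid search confirms the analytical maximizer h/n
p_grid = [i / 1000 for i in range(1, 1000)]
p_numeric = max(p_grid, key=log_likelihood)
print(h / n, p_numeric)  # both 0.58
```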

SLIDE 17

Maximum Likelihood for Normal Distributions

$$L(x_1, \ldots, x_n, \mu, \sigma^2) = \left( \frac{1}{\sqrt{2\pi}\,\sigma} \right)^n \prod_{i=1}^{n} e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}$$

Setting the partial derivatives of the log-likelihood to zero:

$$\frac{\partial \log L}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) = 0$$

$$\frac{\partial \log L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2 = 0$$

$$\Rightarrow\; \hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2$$

SLIDE 18

  • 2. Confidence Intervals
  • Determine an interval estimator T for parameter θ such that
$$P[T - a \le \theta \le T + a] = 1 - \alpha$$
T ± a is the confidence interval and 1 − α the confidence level

  • For the distribution of a random variable X, a value xγ (0 < γ < 1) with P[X ≤ xγ] ≥ γ and P[X ≥ xγ] ≥ 1 − γ is called a γ-quantile
  • the 0.5-quantile is known as the median
  • for the standard Normal distribution N(0,1), the γ-quantile is denoted Φγ

  • For a given a or α, find a value z of N(0,1) that determines the [T − a, T + a] confidence interval or the corresponding γ-quantile for 1 − α

SLIDE 19

Confidence Intervals for Expectations (I)

  • Let X1, …, Xn be a sample from a distribution X with unknown expectation µ and known variance σ²

  • For sufficiently large n, the sample mean $\bar{X}$ is N(µ, σ²/n) distributed and

$$P\left[-z \le \frac{(\bar{X}-\mu)\sqrt{n}}{\sigma} \le z\right] = \Phi(z) - \Phi(-z) = \Phi(z) - (1 - \Phi(z)) = 2\,\Phi(z) - 1 = P\left[\bar{X} - \frac{z\,\sigma}{\sqrt{n}} \le \mu \le \bar{X} + \frac{z\,\sigma}{\sqrt{n}}\right]$$

$$\Rightarrow\; P\left[\bar{X} - \frac{\Phi_{1-\alpha/2}\,\sigma}{\sqrt{n}} \le \mu \le \bar{X} + \frac{\Phi_{1-\alpha/2}\,\sigma}{\sqrt{n}}\right] = 1 - \alpha$$

SLIDE 20

Confidence Intervals for Expectations (I) (cont’d)

  • For a confidence interval $[\bar{X} - a, \bar{X} + a]$, compute $z = \frac{a\sqrt{n}}{\sigma}$ and look up Φ(z) to determine 1 − α = 2 Φ(z) − 1

  • For a confidence level 1 − α, set $z = \Phi_{1-\alpha/2}$ (i.e., z is the (1 − α/2)-quantile of N(0,1)); then $a = \frac{z\,\sigma}{\sqrt{n}}$ determines the confidence interval

$$P\left[\bar{X} - \frac{\Phi_{1-\alpha/2}\,\sigma}{\sqrt{n}} \le \mu \le \bar{X} + \frac{\Phi_{1-\alpha/2}\,\sigma}{\sqrt{n}}\right] = 1 - \alpha$$

SLIDE 21

Confidence Intervals for Expectations (I) (Example)

  • Based on a random sample of n = 100 queries, we observe an average response time of $\bar{X} = 64$. We further know that the standard deviation is σ = 4

  • Q: What is the confidence of the interval 64 ± 0.5?
$$a = 0.5, \quad z = \frac{0.5\,\sqrt{100}}{4} = 1.25, \quad \Phi(1.25) = 0.89435$$
$$1 - \tfrac{\alpha}{2} = 0.89435 \;\Rightarrow\; 1 - \alpha = 0.7887$$
A: 78.87%

  • Q: What is the 99% confidence interval?
$$1 - \alpha = 0.99, \quad \alpha = 0.01, \quad a = \frac{\Phi_{0.995} \times 4}{\sqrt{100}} \approx \frac{2.5758 \times 4}{10} \approx 1.032$$
A: 64 ± 1.032
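Both answers can be reproduced with a short sketch using only the standard Normal cdf (via math.erf); the 0.995-quantile 2.5758 is a standard table value:

```python
import math

def phi(z):
    """Cdf of the standard Normal distribution N(0, 1)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, x_bar, sigma = 100, 64.0, 4.0

# Q1: confidence level of the interval 64 +/- 0.5
a = 0.5
z = a * math.sqrt(n) / sigma  # 1.25
print(2 * phi(z) - 1)         # ~0.7887

# Q2: half-width of the 99% confidence interval (z = Phi_{0.995} = 2.5758)
print(2.5758 * sigma / math.sqrt(n))  # ~1.03
```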

SLIDE 22

Confidence Intervals for Expectations (II)

  • Let X1, …, Xn be an iid. sample from a distribution X with unknown expectation µ, unknown variance σ², but known sample variance S²

  • For sufficiently large n, the random variable
$$T = \frac{(\bar{X} - \mu)\sqrt{n}}{S}$$
has a Student's t distribution with (n − 1) degrees of freedom:
$$f_{T,n}(t) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\Gamma\left(\frac{n}{2}\right)\sqrt{n\,\pi}} \left(1 + \frac{t^2}{n}\right)^{-\frac{n+1}{2}}$$
with the Gamma function
$$\Gamma(x) = \int_0^{\infty} e^{-t}\, t^{x-1}\, dt \quad \text{for } x > 0$$

SLIDE 23

Confidence Intervals for Expectations (II) (cont’d)

  • For a confidence interval $[\bar{X} - a, \bar{X} + a]$, compute $t = \frac{a\sqrt{n}}{S}$ and look up $f_{T,n-1}(t)$ to determine 1 − α

  • For a confidence level 1 − α, set $t = t_{n-1,1-\alpha/2}$ (i.e., t is the (1 − α/2)-quantile of the t distribution with n − 1 degrees of freedom); then $a = \frac{t\,S}{\sqrt{n}}$ determines the confidence interval

$$P\left[\bar{X} - t_{n-1,1-\alpha/2} \frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{n-1,1-\alpha/2} \frac{S}{\sqrt{n}}\right] = 1 - \alpha$$
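A minimal sketch of the t-based interval; it assumes SciPy is available for the t quantile, and the sample values are illustrative:

```python
import math
from scipy.stats import t

def t_confidence_interval(sample, level=0.95):
    n = len(sample)
    x_bar = sum(sample) / n
    s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)  # sample variance S^2
    q = t.ppf(1 - (1 - level) / 2, df=n - 1)              # t_{n-1, 1-alpha/2}
    a = q * math.sqrt(s2 / n)
    return x_bar - a, x_bar + a

print(t_confidence_interval([12.1, 11.4, 13.0, 12.7, 11.9]))
```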

SLIDE 24

  • 3. Hypothesis Testing
  • Suppose we throw a coin n times and want to know whether the coin is fair, i.e., P(H) = P(T)

  • Let X1, …, Xn ~ Bernoulli(p) be the iid. coin flips, so that the coin is fair if p = 0.5

  • Let the null hypothesis H0 be "the coin is fair"
  • The alternative hypothesis H1 is then "the coin is not fair"
  • Intuitively, if $|\bar{X} - 0.5|$ is large, we should reject H0

SLIDE 25

Hypothesis Testing Terminology

  • θ = θ0 is called a simple hypothesis
  • θ > θ0 or θ < θ0 is called a compound hypothesis
  • H0 : θ = θ0 vs. H1 : θ ≠ θ0 is called a two-sided test
  • H0 : θ ≤ θ0 vs. H1 : θ > θ0 and H0 : θ ≥ θ0 vs. H1 : θ < θ0 are called one-sided tests

  • Rejection region R: if X ∈ R, reject H0; otherwise retain H0
  • The rejection region is typically defined using a test statistic T and a critical value c:
$$R = \{ X : T(X) > c \}$$

SLIDE 26

p-Values

  • The p-value is the probability of observing values of the test statistic at least as extreme as the one observed, assuming that H0 holds
  • It is not the probability that H0 holds
  • The smaller the p-value, the stronger the evidence against H0, i.e., if we observe a small enough p-value, we can reject H0

  • How small the p-value needs to be depends on the application
  • Typical p-value scale:
  • < 0.01: very strong evidence against H0
  • 0.01 – 0.05: strong evidence against H0
  • 0.05 – 0.10: weak evidence against H0
  • > 0.1: little or no evidence against H0

SLIDE 27

Types of Errors & Statistical Significance

  • Hypothesis tests are often performed at a level of significance α
  • means that H0 is rejected if the p-value is less than α
  • reported as "the result is statistically significant at the α level"
  • specifying p-values is more informative

  • Don't confuse statistical significance with practical significance, e.g.:
"blue hyperlinks increase click rate by 0.0001% over black ones"
"fuel consumption is reduced by 0.0001 l/km by the new part"

             Retain H0        Reject H0
H0 true      OK               Type I Error
H1 true      Type II Error    OK

SLIDE 28

The Wald Test

  • Two-sided test for H0 : θ = θ0 vs. H1 : θ ≠ θ0
  • Test statistic
$$W = \frac{\hat{\theta} - \theta_0}{\hat{se}}$$
with sample estimate $\hat{\theta}$ and $\hat{se} = se(\hat{\theta}) = \sqrt{Var(\hat{\theta})}$

  • Under H0, W converges in distribution to N(0, 1)
  • If w is the observed value of the Wald statistic, the p-value is 2 Φ(−|w|)

SLIDE 29

The Wald Test (Example)

  • We can use the Wald test to test if our coin is fair
  • Suppose the observed sample mean is 0.6 and the observed standard error is 0.049

  • We obtain as test statistic value w = (0.6 − 0.5) / 0.049 ≈ 2.04
  • The p-value is therefore 2 Φ(−|2.04|) = 2 × (1 − 0.97882) ≈ 0.042 (i.e., a fair coin would lead to such an extreme value w only with probability 0.042), which gives us strong evidence to reject the null hypothesis H0
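A minimal sketch of the two-sided Wald test with the example's numbers:

```python
import math

def phi(z):
    """Cdf of the standard Normal distribution N(0, 1)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def wald_test(theta_hat, theta_0, se):
    w = (theta_hat - theta_0) / se  # Wald statistic
    return w, 2 * phi(-abs(w))      # two-sided p-value

w, p = wald_test(0.6, 0.5, 0.049)
print(w, p)  # ~2.04, ~0.041 (0.042 with table-rounded Phi values)
```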

SLIDE 30

Pearson's χ² Test for Multinomial Data

  • Let (X1, …, Xk) ~ Multinomial(n, p); the MLE of p is (X1/n, X2/n, …, Xk/n)

  • Let p0 = (p01, p02, …, p0k) and suppose we want to test H0 : p = p0 vs. H1 : p ≠ p0

  • Pearson's χ² statistic is
$$T = \sum_{j=1}^{k} \frac{(X_j - n\,p_{0j})^2}{n\,p_{0j}} = \sum_{j=1}^{k} \frac{(X_j - E_j)^2}{E_j}$$
with expected value Ej = E[Xj] = n p0j of Xj under H0

  • The p-value is $P(\chi^2_{k-1} > t)$, where t is the observed value of the test statistic and there are (k − 1) degrees of freedom

SLIDE 31

Pearson's χ² Test for Multinomial Data (Example)

  • We can use Pearson's χ² test to test whether a die is fair
  • Suppose after 1,000 throws of the die, we observed
① × 173, ② × 167, ③ × 167, ④ × 176, ⑤ × 167, ⑥ × 150
⇒ p̂ = (0.173, 0.167, 0.167, 0.176, 0.167, 0.150) (based on the MLE)

  • p0 = (0.167, 0.167, 0.167, 0.167, 0.167, 0.167)
  • T = 2.43 ⇒ the p-value is 0.80, giving us no evidence to reject H0
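The die example in a short sketch; it assumes SciPy, whose chisquare defaults to uniform expected counts, matching p0 here:

```python
from scipy.stats import chisquare

observed = [173, 167, 167, 176, 167, 150]  # counts per face after 1,000 throws
t_stat, p_value = chisquare(observed)      # expected: 1000/6 per face under H0
print(t_stat, p_value)                     # ~2.43, ~0.79
```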
SLIDE 32

Pearson's χ² Test of Independence

  • Pearson's χ² test can also be used to test whether two random variables X and Y are independent

  • Let X1, …, Xn and Y1, …, Yn be the two samples
  • Divide the outcomes into r (for X) and c (for Y) disjoint intervals
  • Populate an r-by-c table O with frequencies, so that Olk tells how many (Xi, Yi) pairs fall into the l-th and k-th interval, respectively

  • Assuming independence (H0), the expected value of Olk is
$$E_{lk} = \frac{\left(\sum_{i=1}^{c} O_{li}\right) \left(\sum_{j=1}^{r} O_{jk}\right)}{\sum_{j=1}^{r} \sum_{i=1}^{c} O_{ji}}$$
i.e., row total × column total divided by the total number of observations

SLIDE 33

Pearson's χ² Test of Independence (cont'd)

  • The value of the test statistic is
$$\chi^2 = \sum_{l=1}^{r} \sum_{k=1}^{c} \frac{(O_{lk} - E_{lk})^2}{E_{lk}}$$
  • There are (r − 1)(c − 1) degrees of freedom
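A minimal sketch of the independence test on a contingency table; it assumes SciPy, and the 2-by-2 counts are purely illustrative (correction=False yields the plain statistic above, without Yates continuity correction):

```python
from scipy.stats import chi2_contingency

O = [[30, 10],  # rows: intervals of X, columns: intervals of Y
     [20, 40]]
chi2, p_value, dof, expected = chi2_contingency(O, correction=False)
print(chi2, p_value, dof)  # dof = (r-1)(c-1) = 1
print(expected)            # E_lk = row total * column total / n
```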

SLIDE 34

Summary of II.3

  • Statistical inference draws conclusions about a population based on a sample
  • Empirical distribution function and histograms as non-parametric estimation methods
  • Method of moments and maximum likelihood as parametric estimation methods
  • Confidence intervals
  • Wald test and Pearson's χ² test for hypothesis testing

SLIDE 35

Normal Distribution Table


SLIDE 36

χ² Distribution Table


SLIDE 37

Student’s t Distribution Table
