

SLIDE 1

Review of Estimation Theory

Berlin 2003

References:

  • 1. X. Huang et al., Spoken Language Processing, Chapter 3
SLIDE 2

Introduction

  • Estimation theory is the most important theory and method in statistical inference

  • Statistical inference
    – Data generated in accordance with some unknown probability distribution must be analyzed
    – Some type of inference about the unknown distribution must be made, e.g., about the characteristics (parameters) of the distribution generating the experimental data, such as the mean and variance

    The data are generated according to a distribution $g(\mathbf{x} \mid \Phi)$, where $\Phi$ denotes the parameters of the distribution:

    $\mathbf{X} = \{X_1, X_2, \ldots, X_n\}$ : the vector of random variables

    $\mathbf{x} = \{x_1, x_2, \ldots, x_n\}$ : the vector of sample values

    $\theta(\mathbf{X})$ : the estimator (a function of the random variables)

    $\theta(\mathbf{x})$ : the estimate (the same function evaluated at the sample values)
SLIDE 3

Introduction

  • Three common estimators (estimation methods)
    – Minimum mean square error estimator
      • Estimate the random variable itself
      • Function approximation, curve fitting, …
    – Maximum likelihood estimator
      • Estimate the parameters of the distribution of the random variables
    – Bayes’ estimator
      • Estimate the parameters of the distribution of the random variables

SLIDE 4

Minimum Mean Square Error Estimation and Least Square Error Estimation

  • There are two random variables $X$ and $Y$. When observing the value of $X$, we want to find a transform $\hat{Y} = g(X; \Phi)$ ($\Phi$: the parameter vector of the function $g$) to predict the value of $Y$
    – Minimum Mean Square Error (MMSE) Estimation
    – Least Square Error (LSE) Estimation
  • By the law of large numbers, when the joint probability is uniform or the number of samples approaches infinity, MMSE and LSE are equivalent

    $\hat{Y} = g(X; \Phi)$

    If the joint distribution $f_{X,Y}(X, Y)$ is known:
    $\Phi_{MMSE} = \arg\min_{\Phi} E\left[\left(Y - g(X; \Phi)\right)^2\right]$

    When $n$ sample pairs $(x_i, y_i)$ are observed:
    $\Phi_{LSE} = \arg\min_{\Phi} \sum_{i=1}^{n} \left(y_i - g(x_i; \Phi)\right)^2$

    With the joint pdf $f_{X,Y}$ available, the MMSE predictor of $Y$ itself is the conditional expectation $\hat{Y} = E[Y \mid X]$

SLIDE 5

Minimum Mean Square Error Estimation and Least Square Error Estimation

  • Constant functions: $g(X) = c$
    – MMSE: $\nabla_c E\left[(Y - c)^2\right] = 0 \;\therefore\; c_{MMSE} = E[Y]$ (the mean)
    – LSE: $\nabla_c \sum_{i=1}^{n} (y_i - c)^2 = 0 \;\therefore\; c_{LSE} = \frac{1}{n}\sum_{i=1}^{n} y_i$ (the sample mean)
  • Linear functions: $g(X) = aX + b$
    – MMSE: $\nabla_a E\left[\left(Y - (aX + b)\right)^2\right] = 0, \quad \nabla_b E\left[\left(Y - (aX + b)\right)^2\right] = 0$

      $\Rightarrow\; aE[X^2] + bE[X] - E[XY] = 0, \quad aE[X] + b - E[Y] = 0$

      $\Rightarrow\; a = \frac{\mathrm{cov}(X, Y)}{\mathrm{Var}(X)} = \rho_{XY}\,\frac{\sigma_Y}{\sigma_X}, \quad b = E[Y] - aE[X]$
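As a quick numerical check of these closed-form solutions, here is a minimal NumPy sketch (the data and variable names are illustrative, not from the slides) that fits the constant and linear predictors to synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 10_000)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, 10_000)  # Y = 2X + 1 + noise

# Constant predictor g(X) = c: the LSE solution is the sample mean of y
c_lse = y.mean()

# Linear predictor g(X) = aX + b: a = cov(X, Y)/Var(X), b = E[Y] - a*E[X]
a_hat = np.cov(x, y, bias=True)[0, 1] / x.var()
b_hat = y.mean() - a_hat * x.mean()
print(c_lse, a_hat, b_hat)  # a_hat ≈ 2, b_hat ≈ 1
```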

SLIDE 6

Minimum Mean Square Error Estimation and Least Square Error Estimation

  • Linear functions
    – LSE
  • Suppose that the $\mathbf{x}_i$ are $d$-dimensional vectors and the $y_i$ are scalars

    $\mathbf{Y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
    X = \begin{pmatrix} 1 & x_1^1 & \cdots & x_1^d \\ 1 & x_2^1 & \cdots & x_2^d \\ \vdots & \vdots & & \vdots \\ 1 & x_n^1 & \cdots & x_n^d \end{pmatrix}, \quad
    A = \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_d \end{pmatrix}, \quad
    \hat{\mathbf{Y}} = XA$

    $e = \sum_{i=1}^{n} \left(y_i - A^t \mathbf{x}_i\right)^2 = \left(\mathbf{Y} - XA\right)^t \left(\mathbf{Y} - XA\right)$

    $\nabla_A\, e = -2 \sum_{i=1}^{n} \left(y_i - A^t \mathbf{x}_i\right) \mathbf{x}_i = -2 X^t \left(\mathbf{Y} - XA\right) = 0
    \;\Rightarrow\; X^t X A = X^t \mathbf{Y}
    \;\Rightarrow\; A = \left(X^t X\right)^{-1} X^t \mathbf{Y}$
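The normal-equation solution translates directly into a few lines of NumPy; a minimal sketch on synthetic data follows (names and values are illustrative). In practice `np.linalg.lstsq` is numerically preferable to forming $X^t X$ explicitly:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 3
x = rng.normal(size=(n, d))
a_true = np.array([0.5, -1.0, 2.0, 0.3])            # [a0, a1, ..., ad]
y = a_true[0] + x @ a_true[1:] + rng.normal(0.0, 0.1, n)

X = np.hstack([np.ones((n, 1)), x])                 # prepend the column of 1s
A = np.linalg.solve(X.T @ X, X.T @ y)               # normal equations XᵗXA = XᵗY
A_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)     # numerically safer equivalent
print(np.allclose(A, A_lstsq))                      # True
```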

SLIDE 7

Maximum Likelihood Estimation (MLE/ML)

  • ML is the most widely used parametric estimation method
  • A set of random samples is drawn independently according to a distribution with pdf $p(x \mid \Phi)$
    – Given a sequence of random samples $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, its likelihood is defined as $p_n(\mathbf{x} \mid \Phi)$, the joint pdf of $(x_1, x_2, \ldots, x_n)$
    – The maximum likelihood estimator of $\Phi$ is denoted $\Phi_{ML}$
    – Since the logarithm is a monotonically increasing function, the parameter set that maximizes the log-likelihood also maximizes the likelihood. The log-likelihood can be expressed as follows:

    $\mathbf{X} = \{X_1, X_2, \ldots, X_n\}, \quad \mathbf{x} = (x_1, x_2, \ldots, x_n)$

    $p_n(\mathbf{x} \mid \Phi) = \prod_{k=1}^{n} p(x_k \mid \Phi) \quad (\because X_1, X_2, \ldots, X_n \text{ are i.i.d.})$

    $\Phi_{ML} = \arg\max_{\Phi} p_n(\mathbf{x} \mid \Phi) = \arg\max_{\Phi} \prod_{k=1}^{n} p(x_k \mid \Phi)$

    $l(\Phi) = \log p_n(\mathbf{x} \mid \Phi) = \sum_{k=1}^{n} \log p(x_k \mid \Phi)$
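As an illustration of maximizing the log-likelihood, the sketch below (NumPy; parameter values are illustrative) evaluates $l(\Phi)$ for a Gaussian over a grid of candidate means and confirms the maximizer sits at the sample mean:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(5.0, 2.0, 1000)          # samples with true mu = 5, sigma = 2

def log_likelihood(mu, sigma2, x):
    # l(Φ) = Σ_k log p(x_k|Φ) for the Gaussian pdf, written out explicitly
    return -0.5 * len(x) * np.log(2 * np.pi * sigma2) \
           - ((x - mu) ** 2).sum() / (2 * sigma2)

mus = np.linspace(3.0, 7.0, 401)
best = mus[np.argmax([log_likelihood(m, 4.0, x) for m in mus])]
print(best, x.mean())                   # grid maximizer ≈ sample mean
```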

SLIDE 8

Maximum Likelihood Estimation (MLE/ML)

  • If $p_n(\mathbf{x} \mid \Phi)$ is a differentiable function of $\Phi$, then $\Phi_{ML}$ can be attained by taking the partial derivatives with respect to $\Phi$ and setting them to zero
    – Let $\Phi = (\Phi_1, \Phi_2, \ldots, \Phi_M)^t$ be an $M$-component parameter vector
  • Example: $p(x \mid \Phi)$ is a univariate Gaussian pdf with the parameter set $\Phi = (\mu, \sigma^2)$

    $\nabla_{\Phi}\, l(\Phi) = \nabla_{\Phi} \sum_{k=1}^{n} \log p(x_k \mid \Phi) = \begin{pmatrix} \partial l / \partial \Phi_1 \\ \vdots \\ \partial l / \partial \Phi_M \end{pmatrix} = \mathbf{0}$

    $p(x \mid \Phi) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$

    $\log p_n(\mathbf{x} \mid \Phi) = \sum_{k=1}^{n} \log p(x_k \mid \Phi) = -\frac{n}{2} \log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2} \sum_{k=1}^{n} \left(x_k - \mu\right)^2$

SLIDE 9

Maximum Likelihood Estimation (MLE/ML)

  • Example: univariate Gaussian pdf (cont.)
    – Take the partial derivatives of the above expression and set them to zero
    – The maximum likelihood estimates for $\mu$ and $\sigma^2$ are given below
  • The maximum likelihood estimates of the mean and variance are just the sample mean and sample variance

    $\frac{\partial \log p_n(\mathbf{x} \mid \Phi)}{\partial \mu} = \frac{1}{\sigma^2} \sum_{k=1}^{n} \left(x_k - \mu\right) = 0$

    $\frac{\partial \log p_n(\mathbf{x} \mid \Phi)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{k=1}^{n} \left(x_k - \mu\right)^2 = 0$

    $\mu_{ML} = \frac{1}{n} \sum_{k=1}^{n} x_k = E[x_k], \qquad \sigma^2_{ML} = \frac{1}{n} \sum_{k=1}^{n} \left(x_k - \mu_{ML}\right)^2 = E\left[\left(x_k - \mu_{ML}\right)^2\right]$

    ($\Phi$ itself is fixed but unknown)
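A minimal sketch of these two estimates in NumPy (sample data illustrative); note the $1/n$ normalization of the ML variance, not the unbiased $1/(n-1)$:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(-1.0, 3.0, 5000)

mu_ml = x.mean()                     # (1/n) Σ x_k
var_ml = ((x - mu_ml) ** 2).mean()   # (1/n) Σ (x_k - μ_ML)²
print(mu_ml, var_ml)                 # ≈ -1.0 and ≈ 9.0
```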

SLIDE 10

Maximum Likelihood Estimation (MLE/ML)

  • Example: multivariate Gaussian pdf (cont.)
    – The maximum likelihood estimates for $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are given below
  • The maximum likelihood estimates of the mean vector and covariance matrix are just the sample mean vector and sample covariance matrix
  • In fact, $\Phi_{MLE}$ itself also has a Gaussian distribution

    $p(\mathbf{x} \mid \Phi) = \frac{1}{(2\pi)^{d/2}\, |\boldsymbol{\Sigma}|^{1/2}} \exp\left(-\frac{1}{2} \left(\mathbf{x} - \boldsymbol{\mu}\right)^t \boldsymbol{\Sigma}^{-1} \left(\mathbf{x} - \boldsymbol{\mu}\right)\right)$

    $\hat{\boldsymbol{\mu}}_{MLE} = \frac{1}{n} \sum_{k=1}^{n} \mathbf{x}_k$

    $\hat{\boldsymbol{\Sigma}}_{MLE} = \frac{1}{n} \sum_{k=1}^{n} \left(\mathbf{x}_k - \hat{\boldsymbol{\mu}}_{MLE}\right) \left(\mathbf{x}_k - \hat{\boldsymbol{\mu}}_{MLE}\right)^t = E\left[\left(\mathbf{x}_k - \hat{\boldsymbol{\mu}}_{MLE}\right) \left(\mathbf{x}_k - \hat{\boldsymbol{\mu}}_{MLE}\right)^t\right]$
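The multivariate case is the same computation with an outer product; a minimal NumPy sketch (parameters illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.multivariate_normal([0.0, 2.0], [[2.0, 0.5], [0.5, 1.0]], size=5000)

mu_mle = X.mean(axis=0)                  # (1/n) Σ x_k  (sample mean vector)
D = X - mu_mle
sigma_mle = (D.T @ D) / len(X)           # (1/n) Σ (x_k - μ̂)(x_k - μ̂)ᵗ
print(mu_mle)
print(sigma_mle)                         # ≈ the true mean and covariance
# np.cov(X.T, bias=True) gives the same (1/n-normalized) matrix
```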

SLIDE 11

Bayesian Estimation

  • Bayesian estimation has a different philosophy than maximum likelihood (ML) estimation
    – ML assumes the parameter set $\Phi$ is fixed but unknown (a non-informative, uniform prior)
    – Bayesian estimation assumes the parameter set $\Phi$ is itself a random variable with a prior distribution $p(\Phi)$
    – Given a sequence of random samples $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, which are i.i.d. with joint pdf $p(\mathbf{x} \mid \Phi)$, the posterior distribution of $\Phi$ follows from Bayes’ rule:

    $p(\Phi \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \Phi)\, p(\Phi)}{p(\mathbf{x})} \propto p(\mathbf{x} \mid \Phi)\, p(\Phi)$

SLIDE 12

Bayesian Estimation

  • $p(\Phi \mid \mathbf{x})$ : the posterior probability, the distribution of $\Phi$ after we have observed the values of the random variables
  • $p(\Phi)$ : the prior, the distribution of $\Phi$ before we observe the values of the random variables. A conjugate prior of the random variables (or vectors) is a prior distribution for the parameters $\Phi$ of their density function (e.g., $p(x \mid \Phi)$) such that the resulting posterior belongs to the same family as the prior

  • The joint pdf / likelihood function (a Gaussian with unknown mean $\Phi$ and known variance $\sigma^2$):

    $p_n(\mathbf{x} \mid \Phi) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{n} \left(x_i - \Phi\right)^2\right) \propto \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{n} \left(x_i - \Phi\right)^2\right)$

  • The prior is also a Gaussian distribution:

    $p(\Phi) = \frac{1}{\sqrt{2\pi}\,\nu} \exp\left(-\frac{(\Phi - \mu)^2}{2\nu^2}\right) \propto \exp\left(-\frac{(\Phi - \mu)^2}{2\nu^2}\right)$

SLIDE 13

Maximum a Posterior Probability (MAP)

  • MAP, which chooses the estimate that maximizes the posterior probability $p(\Phi \mid \mathbf{x})$, is the most common Bayesian estimator
    – For example, the conjugate prior for the mean of a Gaussian pdf is also a Gaussian pdf
  • Suppose, as in the previous example, that $\mathbf{X} = \{X_1, X_2, \ldots, X_n\}$ is drawn from a Gaussian whose mean $\Phi$ is unknown and whose variance $\sigma^2$ is known, while the conjugate prior (a Gaussian) has mean $\mu$ and variance $\nu^2$
  • The MAP estimate is:

    $\Phi_{MAP} = \arg\max_{\Phi} p(\Phi \mid \mathbf{x}) = \arg\max_{\Phi} p(\mathbf{x} \mid \Phi)\, p(\Phi)$

    $\Phi_{MAP} = \arg\max_{\Phi} \left[\log p(\mathbf{x} \mid \Phi) + \log p(\Phi)\right]$

    $\frac{\partial}{\partial \Phi} \left[\log p(\mathbf{x} \mid \Phi) + \log p(\Phi)\right] \Big|_{\Phi = \Phi_{MAP}} = 0$

    $\Phi_{MAP} = \frac{n \nu^2\, \bar{x}_n + \sigma^2 \mu}{n \nu^2 + \sigma^2}$

    where $n$ is the number of training samples and $\bar{x}_n$ is the sample mean
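A minimal sketch of this closed-form MAP estimate (NumPy; all parameter values are illustrative), showing the shrinkage of the sample mean toward the prior mean:

```python
import numpy as np

rng = np.random.default_rng(5)
sigma2 = 4.0                                 # known data variance σ²
mu0, nu2 = 0.0, 1.0                          # prior mean μ and prior variance ν²
x = rng.normal(3.0, np.sqrt(sigma2), 25)     # few samples, true mean 3.0

n, xbar = len(x), x.mean()
phi_map = (n * nu2 * xbar + sigma2 * mu0) / (n * nu2 + sigma2)
print(xbar, phi_map)  # the MAP estimate is pulled from x̄ toward the prior mean μ
# As n grows, the data term dominates and Φ_MAP approaches x̄ (the ML estimate).
```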

SLIDE 14

Bayes’ Decision Theory

  • Decision-making based on both the posterior knowledge obtained from specific observation data and prior knowledge of the categories
    – Prior class probabilities: $P(\omega_i)$, for every class $i$
    – Class-conditional probabilities (likelihoods): $P(x \mid \omega_i)$, for every class $i$

    $k = \arg\max_{i} P\left(\omega_i \mid x\right) = \arg\max_{i} \frac{P\left(x \mid \omega_i\right) P\left(\omega_i\right)}{P(x)} = \arg\max_{i} \frac{P\left(x \mid \omega_i\right) P\left(\omega_i\right)}{\sum_{j} P\left(x \mid \omega_j\right) P\left(\omega_j\right)}$

    $\therefore\; k = \arg\max_{i} P\left(x \mid \omega_i\right) P\left(\omega_i\right)$
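A minimal sketch of this decision rule for two one-dimensional Gaussian classes (SciPy for the pdf; all parameters are illustrative):

```python
import numpy as np
from scipy.stats import norm

priors = np.array([0.6, 0.4])                        # P(ω_1), P(ω_2)
means = np.array([0.0, 2.0])
stds = np.array([1.0, 1.0])

def classify(x):
    # class-conditional likelihoods P(x|ω_i) weighted by the priors P(ω_i)
    scores = norm.pdf(x, means, stds) * priors
    return int(np.argmax(scores)) + 1                # class label 1 or 2

print(classify(0.3), classify(1.8))                  # → 1, 2
```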

SLIDE 15

Bayes’ Decision Theory

  • Bayes’ decision rule is designed to minimize the overall risk involved in making a decision
    – The expected loss (conditional risk) when making decision $\delta_i$
    – The overall risk (Bayes’ risk)

    $R\left(\delta_i \mid x\right) = \sum_{j} l\left(\delta_i, \omega_j\right) P\left(\omega_j \mid x\right), \quad \text{where the loss function } l\left(\delta_i, \omega_j\right) = \begin{cases} 0, & i = j \\ 1, & i \neq j \end{cases}$

    ($\omega_j$: the class $x$ might belong to; $\delta_i$: a decision)

    $R = \int_{-\infty}^{\infty} R\left(\delta(x) \mid x\right) p(x)\, dx, \quad \delta(x): \text{the selected decision for a sample } x$

SLIDE 16

Bayes’ Decision Theory

  • Minimize the overall risk (classification error) by computing the conditional risks $R\left(\delta_i \mid x\right)$ and selecting the decision $\delta_i$ for which the conditional risk is minimum, i.e., for which $P\left(\omega_i \mid x\right)$ is maximum
    – This is called the minimum-error-rate decision rule, which minimizes the classification error rate

    $R\left(\delta_i \mid x\right) = \sum_{j} l\left(\delta_i, \omega_j\right) P\left(\omega_j \mid x\right) = \sum_{j \neq i} P\left(\omega_j \mid x\right) = 1 - P\left(\omega_i \mid x\right)$

    $\delta(x) = \arg\max_{i} P\left(\omega_i \mid x\right) = \arg\max_{i} P\left(x \mid \omega_i\right) P\left(\omega_i\right)$ (the decision that should be made)

SLIDE 17

Bayes’ Decision Theory

  • Two-class pattern classification
    – Likelihood ratio or log-likelihood ratio:

    Likelihood ratio test:
    $l(x) = \frac{P\left(x \mid \omega_1\right)}{P\left(x \mid \omega_2\right)} \;\underset{\omega_2}{\overset{\omega_1}{\gtrless}}\; \frac{P\left(\omega_2\right)}{P\left(\omega_1\right)}$

    Log-likelihood ratio test:
    $\log l(x) = \log P\left(x \mid \omega_1\right) - \log P\left(x \mid \omega_2\right) \;\underset{\omega_2}{\overset{\omega_1}{\gtrless}}\; \log P\left(\omega_2\right) - \log P\left(\omega_1\right)$

    Bayes’ classifier:
    $P\left(x \mid \omega_1\right) P\left(\omega_1\right) \;\underset{\omega_2}{\overset{\omega_1}{\gtrless}}\; P\left(x \mid \omega_2\right) P\left(\omega_2\right)$

    Discriminant functions:
    $d_1(x) = P\left(\omega_1 \mid x\right) \cong P\left(x \mid \omega_1\right) P\left(\omega_1\right), \quad d_2(x) = P\left(\omega_2 \mid x\right) \cong P\left(x \mid \omega_2\right) P\left(\omega_2\right)$

    Classification error ($x$ falls in $R_2$ but the true class is $\omega_1$, or vice versa):
    $p(\text{error}) = P\left(x \in R_2, \omega_1\right) + P\left(x \in R_1, \omega_2\right) = \int_{R_2} P\left(x \mid \omega_1\right) P\left(\omega_1\right) dx + \int_{R_1} P\left(x \mid \omega_2\right) P\left(\omega_2\right) dx$
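A minimal sketch tying the pieces together (SciPy/NumPy; all parameters illustrative): the log-likelihood ratio test for two Gaussian classes, with a Monte Carlo estimate of the resulting classification error:

```python
import numpy as np
from scipy.stats import norm

p1, p2 = 0.5, 0.5                        # priors P(ω_1), P(ω_2)
m1, m2, s = -1.0, 1.0, 1.0               # class-conditional Gaussian parameters

def decide(x):
    # log l(x) = log P(x|ω_1) - log P(x|ω_2), compared against log P(ω_2)/P(ω_1)
    llr = norm.logpdf(x, m1, s) - norm.logpdf(x, m2, s)
    return np.where(llr > np.log(p2 / p1), 1, 2)

rng = np.random.default_rng(6)
labels = rng.choice([1, 2], size=100_000, p=[p1, p2])
x = np.where(labels == 1, rng.normal(m1, s, labels.size),
                          rng.normal(m2, s, labels.size))
print((decide(x) != labels).mean())      # Monte Carlo estimate of p(error) ≈ 0.159
```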
