Lecture 3: Estimation and model validation (Magnus Wiktorsson)


  1. Lecture 3: Estimation and model validation (Magnus Wiktorsson)

  2. Maximum likelihood, recap
  ▶ Let $x^{(N)} = (x_1, \ldots, x_N)$ be a sample from some parametric class of models with known density $f_{X^{(N)}}(x_1, \ldots, x_N; \theta) = L(x^{(N)}; \theta)$, where $\theta \in \Theta$ is some unknown parameter vector.
  ▶ The Maximum Likelihood estimator (MLE) is defined as $\hat\theta_{MLE} = \arg\max_{\theta \in \Theta} L(x^{(N)}; \theta)$.
  ▶ Taking the logarithm does not change the maximizing argument, so this is equivalently written as $\hat\theta_{MLE} = \arg\max_{\theta \in \Theta} \ell(x^{(N)}; \theta)$, with $\ell(\theta) = \log L(x^{(N)}; \theta)$.
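A minimal numerical illustration of this definition (my own sketch, not from the slides): the MLE of an exponential rate $\theta$, found by minimizing the negative log-likelihood with scipy and checked against the closed-form answer $1/\bar{x}$.

```python
# Minimal numerical MLE sketch: fit the rate of an exponential sample
# by minimizing the negative log-likelihood l(theta).
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.exponential(scale=1 / 2.5, size=1000)  # true rate theta = 2.5

def neg_loglik(theta):
    # -l(theta) for f(x; theta) = theta * exp(-theta * x)
    return -np.sum(np.log(theta) - theta * x)

res = minimize_scalar(neg_loglik, bounds=(1e-6, 100), method="bounded")
print(res.x)          # numerical MLE, close to 2.5
print(1 / x.mean())   # closed-form MLE for comparison
```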

  3. Theorem (Cramér-Rao)
  ▶ Let $T(X_1, \ldots, X_N)$ be an unbiased estimator of $\theta$. It then holds that $V(T(X^{(N)})) \geq I_F^N(\theta)^{-1}$, where $I_F^N(\theta)^{-1} = \left(-E\left[\nabla_\theta \nabla_\theta \log L(x^{(N)}; \theta)\right]\right)^{-1} = \left(E\left[\left(\nabla_\theta \log L(x^{(N)}; \theta)\right)^2\right]\right)^{-1}$, and the MLE attains this lower bound asymptotically.
  ▶ The asymptotic distribution for the MLE is given by $\sqrt{N}\,(\hat\theta - \theta) \xrightarrow{d} N(0, I_F(\theta)^{-1})$.
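A sketch of how this is used in practice (my own example, continuing the exponential sample above): the observed information, i.e. the second derivative of the negative log-likelihood at the MLE, approximated by finite differences and turned into an asymptotic standard error and Wald-type confidence interval.

```python
# Observed information and asymptotic CI for the exponential-rate MLE.
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1 / 2.5, size=1000)
theta_hat = 1 / x.mean()                        # closed-form MLE

def neg_loglik(theta):
    return -np.sum(np.log(theta) - theta * x)

h = 1e-4                                        # finite-difference step
obs_info = (neg_loglik(theta_hat + h) - 2 * neg_loglik(theta_hat)
            + neg_loglik(theta_hat - h)) / h**2  # approx. N * I_F(theta_hat)
se = 1 / np.sqrt(obs_info)                      # asymptotic standard error
print(theta_hat - 1.96 * se, theta_hat + 1.96 * se)  # approx. 95% CI
```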

  4. Misspecified models
  What happens if the model is wrong? We look at two simple cases:
  ▶ The model is too simple
  ▶ The model is too complex

  5. Too simple
  ▶ Assume that the data is generated by $Y = [X\; Z]\binom{\theta}{\beta} + \epsilon$,
  ▶ while the model is given by $Y = X\theta + \epsilon$.
  ▶ What happens? Bias!

  6. Proof, model is too simple
  The estimate is given (in matrix notation) by OLS: $\hat\theta_{OLS} = (X^T X)^{-1} X^T Y$. Plug the expression for $Y$ into that equation:
  $\hat\theta_{OLS} = (X^T X)^{-1} X^T \left([X\; Z]\binom{\theta}{\beta} + \epsilon\right) = \theta + (X^T X)^{-1} X^T Z \beta + (X^T X)^{-1} X^T \epsilon = \theta + \text{bias} + \text{noise}$
  Interpretation of the bias?
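A simulation sketch of this result (my own, with made-up coefficients): when $Z$ is correlated with $X$, the OLS estimate on the too-simple model is shifted by exactly the $(X^T X)^{-1} X^T Z \beta$ term.

```python
# Omitted-variable bias: OLS on y = x*theta when the data also contains z.
import numpy as np

rng = np.random.default_rng(2)
n, theta, beta = 10_000, 1.0, 0.5
x = rng.normal(size=n)
z = 0.8 * x + rng.normal(size=n)        # Z correlated with X => bias
y = x * theta + z * beta + rng.normal(size=n)

theta_hat = (x @ y) / (x @ x)           # OLS on the too-simple model
bias = (x @ z) / (x @ x) * beta         # the (X'X)^{-1} X'Z beta term
print(theta_hat, theta + bias)          # both approx. 1.4: estimate = theta + bias
```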

  7. Model is too complex
  ▶ Assume that the data is generated by $Y = X\theta + \epsilon$,
  ▶ while the model is given by $Y = [X\; Z]\binom{\theta}{\beta} + \epsilon$.
  ▶ What happens? No bias, but potentially poor efficiency.

  8. Proof, model is too complex
  ▶ Estimates are given by
  $\binom{\hat\theta}{\hat\beta} = \begin{bmatrix} X^T X & X^T Z \\ Z^T X & Z^T Z \end{bmatrix}^{-1} \left( \begin{bmatrix} X^T X \\ Z^T X \end{bmatrix} \theta + \begin{bmatrix} X^T \epsilon \\ Z^T \epsilon \end{bmatrix} \right) = \binom{\theta}{0} + \begin{bmatrix} X^T X & X^T Z \\ Z^T X & Z^T Z \end{bmatrix}^{-1} \begin{bmatrix} X^T \epsilon \\ Z^T \epsilon \end{bmatrix}$
  ▶ It then follows that $\hat\theta$ is unbiased and $E[\hat\beta] = 0$, and
  $V\binom{\hat\theta}{\hat\beta} = V(\epsilon_1) \begin{bmatrix} X^T X & X^T Z \\ Z^T X & Z^T Z \end{bmatrix}^{-1}$.
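A simulation sketch of the efficiency loss (my own, with made-up coefficients): both models estimate $\theta$ without bias, but the overparametrized model has a visibly larger variance when $Z$ is nearly collinear with $X$.

```python
# Variance inflation from an irrelevant, nearly collinear regressor.
import numpy as np

rng = np.random.default_rng(3)
reps, n, theta = 2000, 200, 1.0
est_small, est_big = [], []
for _ in range(reps):
    x = rng.normal(size=n)
    z = 0.9 * x + 0.1 * rng.normal(size=n)    # nearly collinear with X
    y = x * theta + rng.normal(size=n)        # true model has no Z
    est_small.append((x @ y) / (x @ x))       # correct model
    W = np.column_stack([x, z])               # overparametrized design [X Z]
    est_big.append(np.linalg.lstsq(W, y, rcond=None)[0][0])
print(np.mean(est_small), np.mean(est_big))   # both approx. 1 (unbiased)
print(np.var(est_small), np.var(est_big))     # much larger variance for the big model
```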

  9. Examination of the data
  Before starting to do any estimation we should carefully look at the dataset (a minimal screening sketch follows below).
  ▶ Is the data correct? Most orders never result in a trade...
  ▶ Does the data contain outliers?
  ▶ Missing values?
  ▶ Do we have measurements of all relevant explanatory variables?
  ▶ Timing errors?
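A minimal screening sketch along the lines of this checklist (the tiny dataset and column names are made up for illustration; in practice the DataFrame would come from the real data source):

```python
# Quick data screening with pandas: missing values, outliers, timing errors.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "timestamp": [1.0, 2.0, 2.5, 2.4, 3.0],             # note the out-of-order entry
    "price":     [100.1, 100.2, np.nan, 250.0, 100.3],  # a NaN and an outlier
})
print(df.isna().sum())                          # missing values per column
print(df.describe())                            # ranges reveal gross errors/outliers
print(df["timestamp"].is_monotonic_increasing)  # False: a timing error
```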

  10. Model validation
  There are two types of validation.
  Absolute: Are the model assumptions fulfilled?
  Relative: Is the estimated model good enough compared to some other model?
  Both can still be wrong...


  11. Absolute tests
  We have some external knowledge of the data, e.g. the underlying physics (gray-box models).
  ▶ Look at whether the estimated parameters make sense:
  ▶ Are effects going in the right directions?
  ▶ Do the parameters have reasonable values?

  12. Residuals
  The residuals $\{e\}$ should be i.i.d. Why? This implies:
  ▶ No auto-dependence: $\mathrm{Cov}(f(e_n), g(e_{n+k})) = 0$, $\forall k \neq 0$, $\forall f, g$ such that $E[f(e)^2] < \infty$, $E[g(e)^2] < \infty$.
  ▶ No cross-dependence: $\mathrm{Cov}(f(e_n), g(u_{n+k})) = 0$, $\forall k \in \mathbb{Z}$, $\forall f, g$ such that $E[f(e)^2] < \infty$, $E[g(u)^2] < \infty$, where $u$ is some external signal used as explanatory variable.
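A crude empirical check of these two conditions (my own sketch, with $f = g =$ identity): sample correlations between residuals at different lags, and between residuals and an external signal $u$.

```python
# Empirical auto- and cross-correlation of residuals at a few lags.
import numpy as np

def corr_at_lag(a, b, k):
    # sample correlation between a_n and b_{n+k}
    a, b = np.asarray(a), np.asarray(b)
    return np.corrcoef(a[:len(a) - k], b[k:])[0, 1]

rng = np.random.default_rng(4)
e = rng.normal(size=500)     # residuals (i.i.d. here by construction)
u = rng.normal(size=500)     # external explanatory signal
print([round(corr_at_lag(e, e, k), 3) for k in range(1, 4)])  # auto-dependence, all near 0
print([round(corr_at_lag(e, u, k), 3) for k in range(0, 3)])  # cross-dependence, all near 0
```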

  13. Normalized prediction errors
  Residuals are usually normalized prediction errors:
  $e_n = \frac{y_n - E[Y_n \mid \mathcal{F}_{n-1}]}{\sqrt{V(Y_n \mid \mathcal{F}_{n-1})}}.$
  This can in many cases also be generalized to SDE models.
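A sketch for the AR(1) case (my own example): here $E[Y_n \mid \mathcal{F}_{n-1}] = a\,y_{n-1}$ and $V(Y_n \mid \mathcal{F}_{n-1}) = \sigma^2$, so the normalized prediction errors are just the rescaled one-step residuals.

```python
# Normalized one-step prediction errors for a fitted AR(1) model.
import numpy as np

rng = np.random.default_rng(5)
a_true, sigma = 0.7, 1.0
y = np.zeros(1000)
for n in range(1, len(y)):
    y[n] = a_true * y[n - 1] + sigma * rng.normal()

a_hat = (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1])   # least-squares estimate of a
resid = y[1:] - a_hat * y[:-1]                 # one-step prediction errors
e = resid / resid.std(ddof=1)                  # normalized prediction errors
print(a_hat, e.mean(), e.std(ddof=1))          # approx. 0.7, 0, 1
```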

  14. Formal tests
  ▶ Test for dependence in residuals (Box-Ljung): $T = N(N+2) \sum_{k=1}^{p} \frac{\hat\gamma(k)^2}{N-k}$. Reject if $T > \chi^2_{1-\alpha,\, p}$.
  ▶ Sign test on residuals: # of positive $\in \mathrm{Bin}(N, 1/2)$.
  ▶ Number of changes of sign (Wald-Wolfowitz runs test).
  ▶ Resimulate the model from the residuals. Can it reproduce the data?
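A sketch of the first two tests using library implementations (assuming reasonably recent statsmodels and scipy versions; the simulated residuals are i.i.d. by construction, so neither test should reject):

```python
# Box-Ljung and sign test applied to residuals e.
import numpy as np
from scipy.stats import binomtest
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(6)
e = rng.normal(size=500)                         # residuals under H0

print(acorr_ljungbox(e, lags=[10]))              # Box-Ljung statistic T and p-value
n_pos = int(np.sum(e > 0))                       # # of positive ~ Bin(N, 1/2) under H0
print(binomtest(n_pos, n=len(e), p=0.5).pvalue)  # sign-test p-value
```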

  15. Scatterplots of residuals
  ▶ $e_n$ vs $e_{n-1}$ (remaining auto-dependence, autocorrelation).
  ▶ $e_n$ vs $\hat y_{n|n-1} = E[y_n \mid \mathcal{F}_{n-1}]$ (prediction-error dependence).
  ▶ $e_n$ vs $u_n$ (external dependence).
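A plotting sketch for these three panels (my own example, using a well specified AR(1) fit, so all three plots should look structureless):

```python
# The three diagnostic scatterplots for residuals of an AR(1) fit.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
a = 0.7
y = np.zeros(500)
u = rng.normal(size=500)                       # stand-in external signal
for n in range(1, len(y)):
    y[n] = a * y[n - 1] + rng.normal()
y_pred = a * y[:-1]                            # one-step predictions E[y_n | F_{n-1}]
e = y[1:] - y_pred                             # residuals

fig, ax = plt.subplots(1, 3, figsize=(12, 4))
ax[0].scatter(e[:-1], e[1:], s=5); ax[0].set_title("e(n-1) vs e(n)")
ax[1].scatter(y_pred, e, s=5);     ax[1].set_title("y(n|n-1) vs e(n)")
ax[2].scatter(u[1:], e, s=5);      ax[2].set_title("u(n) vs e(n)")
plt.show()
```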

  16. A good example (a well estimated AR(1) process)
  [Figure: four diagnostic panels: $e_{n-1}$ vs $e_n$, $e_n$ vs $\hat y_{n|n-1}$, SACF against lag, and a normal probability plot of the residuals.]

  17. An example of wrong order (an AR(2) process estimated with an AR(1) model)
  [Figure: the same four diagnostic panels: $e_{n-1}$ vs $e_n$, $e_n$ vs $\hat y_{n|n-1}$, SACF against lag, and a normal probability plot.]

  18. An example of wrong model structure (a non-linear model estimated with an AR(1) model)
  [Figure: the same four diagnostic panels: $e_{n-1}$ vs $e_n$, $e_n$ vs $\hat y_{n|n-1}$, SACF against lag, and a normal probability plot.]

  19. Overfitting
  Overfitting gives residuals that look good. Therefore it is important to also test predictions out of sample (a minimal sketch follows below).
  ▶ Split the data into an estimation set and a validation set.
  ▶ Cross-validation.
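A minimal hold-out sketch (my own example): estimate an AR(1) on the first part of the series and compare in-sample and out-of-sample one-step mean squared prediction errors; the 70/30 split is an arbitrary illustrative choice.

```python
# Estimation/validation split for an AR(1) model.
import numpy as np

rng = np.random.default_rng(8)
y = np.zeros(1000)
for n in range(1, len(y)):
    y[n] = 0.7 * y[n - 1] + rng.normal()

split = int(0.7 * len(y))                      # estimation / validation split
y_est, y_val = y[:split], y[split - 1:]        # keep one overlap point for the lag
a_hat = (y_est[:-1] @ y_est[1:]) / (y_est[:-1] @ y_est[:-1])
mse_in = np.mean((y_est[1:] - a_hat * y_est[:-1]) ** 2)
mse_out = np.mean((y_val[1:] - a_hat * y_val[:-1]) ** 2)
print(mse_in, mse_out)                         # comparable values: no overfitting here
```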

  20. Example overfitting (ARMA(1,1) data fitted with an ARMA(3,3) model)
  [Figure: SACF and $e_{n-1}$ vs $e_n$ scatterplots, shown both in sample and out of sample.]

  21. Relative model validation
  Test if a larger model is necessary. Hypothesis test: Wald, LM or LR.
  Wald: $H_0: \theta' = \theta'_0$ vs $H_1: \theta'$ free. Reject $H_0$ if $\theta'_0$ falls outside $\hat\theta' \pm \lambda_{\alpha/2}\, d(\hat\theta')$, where $d(\hat\theta)$ is the standard error obtained from the estimated information $\hat I$.
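A sketch of the scalar Wald test in code (the numbers are made up; $\lambda_{\alpha/2}$ is the standard Gaussian quantile):

```python
# Scalar Wald test for H0: theta' = theta'_0 against H1: theta' free.
from scipy.stats import norm

theta_hat, se, theta0, alpha = 0.23, 0.08, 0.0, 0.05  # made-up numbers
lam = norm.ppf(1 - alpha / 2)                         # lambda_{alpha/2} approx. 1.96
reject = abs(theta_hat - theta0) > lam * se
print(reject)                                         # True: theta0 lies outside the interval
```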

  22. LR for Gaussian models
  Let $Q(n)$ be the sum of squared residuals for an estimated model with $n$ parameters from $N$ observations. Test $n_1$ vs $n_2$ parameters; then for true order $n_0 \leq n_1 < n_2$:
  i) $\frac{Q(n_2)}{\sigma^2} \in \chi^2(N - n_2)$.
  ii) $\frac{Q(n_1) - Q(n_2)}{\sigma^2} \in \chi^2(n_2 - n_1)$.
  iii) $Q(n_2)$ and $Q(n_1) - Q(n_2)$ are independent.
  iv) $\eta = \frac{(Q(n_1) - Q(n_2)) / (n_2 - n_1)}{Q(n_2) / (N - n_2)} \in F(n_2 - n_1, N - n_2)$.
  If $\eta$ is large, pick model 2; else pick model 1. This is an exact test for AR models (a numeric sketch follows below).
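A numeric sketch of the F-test (my own implementation of the least-squares AR fits; for simplicity each fit uses its own effective sample of $N - n$ residuals):

```python
# F-test eta for nested AR(n1) vs AR(n2) least-squares fits.
import numpy as np
from scipy.stats import f as f_dist

def ar_rss(y, p):
    # least-squares AR(p) fit; returns the sum of squared residuals Q(p)
    Y = y[p:]
    X = np.column_stack([y[p - k: len(y) - k] for k in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return np.sum((Y - X @ coef) ** 2)

rng = np.random.default_rng(9)
y = np.zeros(1000)
for n in range(1, len(y)):                      # AR(1) data, so true order n0 = 1
    y[n] = 0.7 * y[n - 1] + rng.normal()

N, n1, n2 = len(y), 1, 3
Q1, Q2 = ar_rss(y, n1), ar_rss(y, n2)
eta = ((Q1 - Q2) / (n2 - n1)) / (Q2 / (N - n2))
print(eta, f_dist.ppf(0.95, n2 - n1, N - n2))   # keep model 1 if eta is below the quantile
```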
