Advanced Section #2 Model Selection & Information Criteria


SLIDE 1

Advanced Section #2 Model Selection & Information Criteria

Akaike Information Criterion

Marios Mattheakis and Pavlos Protopapas

CS109A Introduction to Data Science
Pavlos Protopapas and Kevin Rader

SLIDE 2

CS109A, PROTOPAPAS, RADER

Outline

  • Maximum Likelihood Estimation (MLE): fit a distribution
      • Exponential distribution
      • Normal (Linear Regression Model)
  • Model Selection & Information Criteria
      • KL divergence
      • MLE justification through KL divergence
      • Model Comparison
      • Akaike Information Criterion (AIC)

SLIDE 3

Maximum Likelihood Estimation (MLE) & Parametric Models

SLIDE 4


Maximum Likelihood Estimation (MLE)

Fit your data with a parametric distribution q(y|θ). θ=(θ1, … , θk) is a parameter set to be estimated.

SLIDE 5


Maximum Likelihood Estimation (MLE)

Fit your data with a parametric distribution q(y|θ). θ=(θ1, … , θk) is a parameter set to be estimated.

SLIDE 6


Maximize the Likelihood L


One could scan over all parameter values until the maximum of L is found, but this is far too time-consuming an approach.
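The brute-force scan can be sketched in code. A minimal sketch, assuming (for illustration only) an exponential model with rate λ and synthetic data; neither appears on this slide:

```python
import numpy as np

# Illustrative sketch: brute-force scan of the likelihood over a parameter
# grid, here for an exponential model q(y | lam) = lam * exp(-lam * y).
rng = np.random.default_rng(0)
y = rng.exponential(scale=1 / 2.0, size=500)  # synthetic data, true rate 2.0

def log_likelihood(lam, y):
    # log L(lam) = n log(lam) - lam * sum(y_i)
    return y.size * np.log(lam) - lam * y.sum()

grid = np.linspace(0.1, 5.0, 491)  # candidate rates to scan
best = grid[np.argmax([log_likelihood(lam, y) for lam in grid])]
print(best)  # near the true rate, but the cost grows with grid resolution
```

The scan works here only because there is a single parameter; with k parameters the grid grows exponentially, which is why a formal method is needed.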

SLIDE 7


Maximum Likelihood Estimation (MLE)

A formal and efficient method is given by MLE. Observations: y = (y1, …, yn).


It is easier and numerically more stable to work with the log-likelihood.

SLIDE 8


Maximum Likelihood Estimation (MLE)


It is easier and numerically more stable to work with the log-likelihood.
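Why the log matters numerically can be seen in a small sketch (an illustrative assumption, not from the slides): the raw likelihood, a product of many densities below one, underflows in floating point, while the log-likelihood stays finite.

```python
import numpy as np

# The raw likelihood is a product of many densities < 1 and underflows;
# summing log-densities avoids this.
rng = np.random.default_rng(1)
y = rng.exponential(scale=1.0, size=2000)
lam = 1.0
densities = lam * np.exp(-lam * y)          # q(y_i | lam) for each observation

raw_likelihood = np.prod(densities)         # underflows to 0.0 in float64
log_likelihood = np.sum(np.log(densities))  # finite, roughly -sum(y_i)

print(raw_likelihood, log_likelihood)
```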

SLIDE 9


Exponential distribution: A simple and useful example

A one-parameter distribution: the rate parameter λ.
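The MLE for λ has a closed form; a sketch of the derivation, following the log-likelihood recipe above:

```latex
% Exponential model: q(y \mid \lambda) = \lambda e^{-\lambda y}, \; y \ge 0
\ell(\lambda) = \sum_{i=1}^{n} \log\!\bigl(\lambda e^{-\lambda y_i}\bigr)
             = n\log\lambda - \lambda\sum_{i=1}^{n} y_i
% Setting the derivative to zero:
\frac{d\ell}{d\lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} y_i = 0
\quad\Longrightarrow\quad
\hat{\lambda}_{\mathrm{MLE}} = \frac{1}{\bar{y}}
```

The estimate is simply the reciprocal of the sample mean, which matches the intuition that the rate is inversely proportional to the average waiting time.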

SLIDE 10


Linear Regression Model with Gaussian error

SLIDE 11


Linear Regression Model through MLE


Loss Function

SLIDE 12


Linear Regression Model: Standard Formulas


Minimizing the loss essentially maximizes the likelihood, and we obtain the standard formulas.
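The equivalence can be checked numerically in a short sketch (the synthetic data and coefficient values are illustrative assumptions): maximizing the Gaussian likelihood of y = β0 + β1·x + ε reduces to least squares, and the standard closed-form estimates recover the true coefficients.

```python
import numpy as np

# Gaussian MLE for simple linear regression reduces to least squares;
# the standard closed-form OLS estimates follow.
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=200)
y = 1.5 + 0.8 * x + rng.normal(0.0, 0.5, size=200)  # true beta0=1.5, beta1=0.8

beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()
print(beta0, beta1)  # close to the true 1.5 and 0.8
```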

SLIDE 13

Model Selection & Information Theory: Akaike Information Criterion

SLIDE 14


Kullback-Leibler (KL) divergence (or relative entropy)

How well do we fit the data? What additional uncertainty have we introduced?

SLIDE 15


KL divergence

The KL divergence measures the “distance” between two distributions and is a non-negative quantity.


By Jensen’s inequality for a convex function g(z), 𝔼[g(z)] ≥ g(𝔼[z]), the KL divergence is non-negative. Note, however, that the KL divergence is a non-symmetric quantity.
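Both properties can be checked numerically for discrete distributions (the probability vectors below are illustrative assumptions):

```python
import numpy as np

# KL divergence between two discrete distributions, illustrating
# non-negativity and asymmetry.
def kl(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

p = [0.1, 0.4, 0.5]
q = [0.3, 0.3, 0.4]

print(kl(p, q))  # >= 0
print(kl(q, p))  # >= 0, but generally != kl(p, q): KL is not symmetric
print(kl(p, p))  # exactly 0: no divergence of a distribution from itself
```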

SLIDE 16


MLE justification through KL divergence

Empirical distribution


Minimizing the KL divergence (against the empirical distribution) is equivalent to maximizing the likelihood.

log-likelihood
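In symbols, with the empirical distribution p̂ placing mass 1/n on each observation, a sketch of the argument is:

```latex
D_{\mathrm{KL}}\!\left(\hat{p}\,\|\,q_\theta\right)
= \underbrace{\sum_{i=1}^{n} \hat{p}(y_i)\,\log \hat{p}(y_i)}_{\text{independent of }\theta}
  \;-\; \frac{1}{n}\sum_{i=1}^{n}\log q(y_i \mid \theta)
\quad\Longrightarrow\quad
\arg\min_{\theta} D_{\mathrm{KL}}\!\left(\hat{p}\,\|\,q_\theta\right)
= \arg\max_{\theta} \ell(\theta)
```

The first term does not depend on θ, so minimizing the divergence and maximizing the log-likelihood select the same parameters.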

SLIDE 17


Model Comparison

Consider two model distributions.


By using the empirical distribution, p is eliminated.

SLIDE 18


Akaike Information Criterion (AIC)

AIC is a trade-off between the number of parameters k and the error that is introduced (overfitting). AIC is an asymptotic approximation of the KL divergence. The data are used twice: first for the MLE and second for the KL-divergence estimation. AIC estimates the optimal number of parameters k.
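The criterion itself is one line of code: AIC = 2k − 2 log L̂, where L̂ is the maximized likelihood. A minimal sketch (the log-likelihood values below are made-up illustrations):

```python
# AIC = 2k - 2 log(L_hat); lower is better. k counts the fitted parameters.
def aic(max_log_likelihood, k):
    return 2 * k - 2 * max_log_likelihood

# Two hypothetical fits of the same data:
print(aic(-120.0, k=2))  # 244.0
print(aic(-119.5, k=5))  # 249.0: a tiny likelihood gain does not
                         # justify three extra parameters
```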

SLIDE 19


Polynomial Regression Model Example

Suppose a polynomial regression model


Which k is optimal? For k smaller than the optimal value: underfitting. For k larger than the optimal value: overfitting.
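The degree selection can be sketched numerically (the cubic ground truth and noise level are illustrative assumptions; for Gaussian errors the maximized log-likelihood is −(n/2)[log(2π·RSS/n) + 1]):

```python
import numpy as np

# Select the polynomial degree by AIC. For Gaussian errors the maximized
# log-likelihood depends only on RSS; k counts coefficients plus sigma^2.
rng = np.random.default_rng(3)
x = np.linspace(-3, 3, 120)
y = 1.0 - 2.0 * x + 0.5 * x**3 + rng.normal(0.0, 1.0, size=x.size)  # cubic truth

def aic_for_degree(d):
    coeffs = np.polyfit(x, y, d)
    rss = np.sum((y - np.polyval(coeffs, x)) ** 2)
    n = x.size
    max_loglik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
    k = d + 2  # d+1 polynomial coefficients plus the noise variance
    return 2 * k - 2 * max_loglik

best_degree = min(range(1, 9), key=aic_for_degree)
print(best_degree)  # small degrees underfit; large degrees pay the 2k penalty
```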

SLIDE 20


Minimizing real and empirical KL-divergence


Suppose there are many models, indexed by j. Work with the j-th model, which has kj parameters.

SLIDE 21



Numerical verification of AIC

SLIDE 22


Akaike Information Criterion (AIC): Proof

Asymptotic expansion around the true (ideal) MLE θ0.

SLIDE 23


Akaike Information Criterion (AIC): Proof

SLIDE 24


Akaike Information Criterion (AIC): Proof


In the limit of a correct model:

SLIDE 25


Review

  • Maximum Likelihood Estimation (MLE)
    1. A powerful method to estimate the ideal fitting parameters of a model.
    2. The exponential distribution, a simple but useful example.
    3. The Linear Regression Model as a special paradigm of MLE implementation.

  • Model Selection & Information Criteria
    1. The KL divergence quantifies the “distance” between the fitting model and the “real” distribution.
    2. The KL divergence justifies MLE and is used for model comparison.
    3. AIC estimates the number of model parameters and protects from overfitting.

SLIDE 26


Thank you

Office hours are: Monday 6-7:30 (Marios) Tuesday 6:30-8 (Trevor)
