

SLIDE 1

DM825 Introduction to Machine Learning Lecture 4

Model Assessment Generalized Linear Models

Marco Chiarandini

Department of Mathematics & Computer Science University of Southern Denmark

SLIDE 2

Error Estimation Methods Generalized Linear Models

Outline

  • 1. Error Estimation Methods
  • 2. Generalized Linear Models


SLIDE 3

Outline

  • 1. Error Estimation Methods
  • 2. Generalized Linear Models


SLIDE 4

Loss Function in Classification

G ∈ {1, . . . , k}, with p_k(x) = \Pr(G = k \mid X = x) the probability modeled and \hat{G}(x) = \arg\max_k \hat{p}_k(x) the prediction.

0–1 loss:

L(G, \hat{G}(x)) = I(G \neq \hat{G}(x))

entropy (deviance) loss:

L(G, \hat{p}(x)) = -2 \sum_{k=1}^{K} I(G = k) \log \hat{p}_k(x) = -2 \log \hat{p}_G(x)
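As a concrete sketch, both losses can be computed from a vector of modeled class probabilities; the function names and the probability vector below are illustrative, not from the lecture:

```python
import numpy as np

def zero_one_loss(g, g_hat):
    """0-1 loss: 1 when the predicted class differs from the true one."""
    return float(g != g_hat)

def deviance_loss(g, p_hat):
    """Entropy (deviance) loss -2 log p_hat_G(x), classes indexed from 0."""
    return -2.0 * np.log(p_hat[g])

p_hat = np.array([0.7, 0.2, 0.1])  # modeled probabilities p_hat_k(x) for k = 0, 1, 2
g_hat = int(np.argmax(p_hat))      # predicted class: argmax_k p_hat_k(x)
```

For the true class 0 the 0–1 loss is 0, and the deviance loss is −2 log 0.7.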

SLIDE 5

Akaike Information Criterion

AIC = \log p(D \mid \theta) - p

where the likelihood is evaluated at the maximum-likelihood fit and p is the number of adjustable parameters: the penalty adjusts the maximum likelihood to account for the different complexities of the models. Choose the model with the largest AIC; it is computed on the training set only.
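A minimal sketch of model choice by AIC, under the slide's convention that the criterion is a penalized log-likelihood to be maximized; the candidate models and their likelihood values are made up for illustration:

```python
def aic(log_likelihood, n_params):
    """AIC as on the slide: log-likelihood minus the number of parameters p;
    the model with the largest value is preferred."""
    return log_likelihood - n_params

# hypothetical candidates: (max log-likelihood on the training set, number of parameters)
models = {"linear": (-120.0, 2), "cubic": (-118.5, 4)}
best = max(models, key=lambda name: aic(*models[name]))
```

Here the cubic model's better fit does not compensate for its two extra parameters, so `best` is the linear model.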

SLIDE 6

Methods to Estimate Error Curves

Model selection: estimate the performance of different models in order to choose the best one.
Model assessment: having selected a final model, estimate its prediction error on new data.

If there is plenty of data, divide it randomly and use:
  • 50% for training
  • 25% for model selection (validation)
  • 25% for assessment
If there is less data:
  • cross validation
  • bootstrap method
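The 50/25/25 split can be sketched as follows (the seed and the data are arbitrary):

```python
import random

def split_data(data, seed=0):
    """Random split: 50% training, 25% model selection (validation), 25% assessment."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    return shuffled[: n // 2], shuffled[n // 2 : 3 * n // 4], shuffled[3 * n // 4 :]

train, validation, assessment = split_data(list(range(100)))
```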

SLIDE 7

Cross Validation

k-fold cross validation: split the data into k parts of m/k elements each; leave one part out and use the rest of the data to train the model (if k = m this is leave-one-out). The held-out part serves as extra sample to estimate the error Err = E[L(Y, \hat{h}(X))], where (X, Y) is drawn from the joint distribution.

for i from 1 to k do
    take out the ith part
    fit the model on the other k − 1 parts
    calculate the prediction error when predicting the ith part

Let \varphi : \{1, \ldots, m\} \to \{1, \ldots, k\} be the randomized assignment of observations to parts, and \hat{h}^{-i}(x) the function fitted on the data with the ith part removed. Then

CV = \frac{1}{m} \sum_{i=1}^{m} L\big(y_i, \hat{h}^{-\varphi(i)}(x_i)\big)

Typical choices are k = 5, 10. Search for the \hat{\theta} that minimizes CV.
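The procedure above can be sketched directly; the mean-predictor model and squared-error loss below are stand-ins for illustration, not part of the lecture:

```python
import numpy as np

def cross_validation(x, y, fit, loss, k=5, seed=0):
    """k-fold CV: phi assigns each of the m observations to one of k parts at
    random; h^{-phi(i)} is fitted with part phi(i) held out, and
    CV = (1/m) sum_i L(y_i, h^{-phi(i)}(x_i))."""
    m = len(y)
    rng = np.random.default_rng(seed)
    phi = rng.permutation(np.arange(m) % k)       # randomized part assignment
    losses = []
    for part in range(k):
        held = phi == part
        h = fit(x[~held], y[~held])               # fit on the other k-1 parts
        losses.extend(loss(y[held], h(x[held])))  # predict the held-out part
    return float(np.mean(losses))

# toy setup: predict with the training mean, squared-error loss
x = np.arange(20, dtype=float)
y = 2.0 * x + 1.0
fit = lambda xs, ys: (lambda xq: np.full(len(xq), ys.mean()))
loss = lambda yt, yp: list((yt - yp) ** 2)
cv = cross_validation(x, y, fit, loss, k=5)
```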

SLIDE 8

Bootstrap Method

Training set z = (z_1, z_2, \ldots, z_m) with z_i = (x_i, y_i). Randomly draw data sets of size m with replacement:

repeat
    draw a data set (with replacement)
    fit the model
until B times (e.g. B = 100)
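A sketch of the resampling loop (the data and the value of B are illustrative):

```python
import random

def bootstrap_samples(z, B=100, seed=0):
    """Draw B data sets of size m from z, with replacement (z_i = (x_i, y_i))."""
    rng = random.Random(seed)
    m = len(z)
    return [[z[rng.randrange(m)] for _ in range(m)] for _ in range(B)]

z = [(float(i), 2.0 * i) for i in range(10)]
samples = bootstrap_samples(z, B=100)
```

A model would then be fitted to each of the B samples.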

SLIDE 9

We can estimate any aspect of the distribution of S(z), for example its variance:

\widehat{\mathrm{Var}}[S(z)] = \frac{1}{B-1} \sum_{b=1}^{B} \big( S(z^{*b}) - \bar{S}^* \big)^2

\widehat{\mathrm{Err}}_{boot} = \frac{1}{B} \frac{1}{m} \sum_{b=1}^{B} \sum_{i=1}^{m} L\big(y_i, \hat{h}^{*b}(x_i)\big)

where \hat{h}^{*b}(x_i) is the predicted value at x_i of the model fitted on the bth bootstrap sample. There are observations in common between the training and test sets, which biases the estimate. To avoid this:

\widehat{\mathrm{Err}}_{boot}^{(1)} = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{|C^{-i}|} \sum_{b \in C^{-i}} L\big(y_i, \hat{h}^{*b}(x_i)\big)

where C^{-i} is the set of indices of the bootstrap samples b that do not contain observation i.
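The corrected (leave-one-out bootstrap) estimate can be sketched as below; the mean predictor and squared-error loss are stand-ins, and the rare observations that happen to appear in every bootstrap sample are simply skipped:

```python
import numpy as np

def loo_bootstrap_error(x, y, fit, loss, B=50, seed=0):
    """For each observation i, average the loss of the models fitted on the
    bootstrap samples b in C^{-i} (those that do not contain i), then
    average over the observations."""
    rng = np.random.default_rng(seed)
    m = len(y)
    per_obs = [[] for _ in range(m)]
    for _ in range(B):
        idx = rng.integers(0, m, size=m)           # one bootstrap sample
        h = fit(x[idx], y[idx])
        outside = np.setdiff1d(np.arange(m), idx)  # observations not in this sample
        for i in outside:
            per_obs[i].append(loss(y[i], h(x[i])))
    # skip observations contained in every bootstrap sample (rare for moderate B)
    kept = [np.mean(ls) for ls in per_obs if ls]
    return float(np.mean(kept))

# toy setup: mean predictor, squared-error loss
x = np.arange(15, dtype=float)
y = x ** 2
fit = lambda xs, ys: (lambda xq: ys.mean())
loss = lambda yt, yp: (yt - yp) ** 2
err = loo_bootstrap_error(x, y, fit, loss, B=50)
```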

SLIDE 10

Outline

  • 1. Error Estimation Methods
  • 2. Generalized Linear Models


SLIDE 11

Exponential Family of Distributions

We have seen:
  • regression: y | x; θ ∼ N(µ, σ²)
  • classification: y | x; θ ∼ Bern(µ)
Both can be shown to belong to the same framework: the GLM.

Exponential family distribution:

p(y \mid \eta) = c(y)\, g(\eta) \exp\{\eta^T u(y)\} = b(y) \exp\{\eta^T T(y) - a(\eta)\}

  • y scalar or vector, discrete or continuous
  • \eta the canonical (natural) parameters
  • u(y) a function of y (the sufficient statistic)
  • g(\eta) ensures the distribution is normalized: g(\eta) \int c(y) \exp\{\eta^T u(y)\}\, dy = 1

The two notations correspond via c(y) = b(y), u(y) = T(y), g(\eta) = \exp(-a(\eta)).

SLIDE 12

Exponential Family of Distributions

Gaussian distribution

The Gaussian distribution with σ² = 1 as an exponential family distribution:

p(y \mid \mu) = \frac{1}{\sqrt{2\pi}} \exp\Big\{-\tfrac{1}{2}(y-\mu)^2\Big\} = \frac{1}{\sqrt{2\pi}} \exp\Big\{-\tfrac{1}{2}y^2\Big\} \exp\Big\{\mu y - \tfrac{1}{2}\mu^2\Big\}

so that

\eta = \mu, \quad u(y) = y, \quad c(y) = \frac{1}{\sqrt{2\pi}} \exp\Big\{-\tfrac{1}{2}y^2\Big\}, \quad g(\eta) = \exp\Big\{-\tfrac{1}{2}\eta^2\Big\}
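The algebra above can be checked numerically: assembling c(y), g(η) and exp{η u(y)} must reproduce the N(µ, 1) density (the function names are ours, not from the slides):

```python
import math

def gauss_pdf(y, mu):
    # N(mu, 1) density, written directly
    return math.exp(-0.5 * (y - mu) ** 2) / math.sqrt(2 * math.pi)

def gauss_expfam(y, mu):
    # the same density assembled from the exponential-family pieces on the slide
    eta = mu                                        # natural parameter
    u = y                                           # sufficient statistic u(y) = y
    c = math.exp(-0.5 * y ** 2) / math.sqrt(2 * math.pi)
    g = math.exp(-0.5 * eta ** 2)                   # normalizer g(eta)
    return c * g * math.exp(eta * u)
```

The two forms agree for any y and µ, since −y²/2 − µ²/2 + µy = −(y − µ)²/2.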
SLIDE 13

Exponential Family of Distributions

Gaussian distribution

The general Gaussian distribution as an exponential family distribution:

p(y \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big\{-\frac{1}{2\sigma^2}(y-\mu)^2\Big\} = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big\{-\frac{1}{2\sigma^2}y^2\Big\} \exp\Big\{\frac{\mu}{\sigma^2} y - \frac{1}{2\sigma^2}\mu^2\Big\}

so that

\eta = \begin{pmatrix} \mu/\sigma^2 \\ -1/(2\sigma^2) \end{pmatrix}, \quad u(y) = \begin{pmatrix} y \\ y^2 \end{pmatrix}, \quad c(y) = \frac{1}{\sqrt{2\pi}}, \quad g(\eta) = \sqrt{-2\eta_2}\, \exp\left(\frac{\eta_1^2}{4\eta_2}\right)
SLIDE 14

Exponential Family of Distributions

Bernoulli distribution

The Bernoulli distribution as an exponential family distribution:

p(y \mid \mu) = \mathrm{Bern}(y \mid \mu) = \mu^y (1-\mu)^{1-y} = \exp\{y \log \mu + (1-y) \log(1-\mu)\} = (1-\mu) \exp\Big\{ \log\Big(\frac{\mu}{1-\mu}\Big)\, y \Big\}

\eta = \log \frac{\mu}{1-\mu}   (the link function)
\mu = \sigma(\eta) = \frac{1}{1+\exp(-\eta)}   (the response function, the logistic sigmoid)

Using 1 - \mu = 1 - \sigma(\eta) = \sigma(-\eta):

p(y \mid \eta) = \sigma(-\eta) \exp(\eta y), \quad u(y) = y, \quad c(y) = 1, \quad g(\eta) = \sigma(-\eta)
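The same kind of numerical check for the Bernoulli case: σ(−η) exp(ηy) must reproduce µ^y (1−µ)^{1−y} (function names are ours):

```python
import math

def sigmoid(eta):
    """Logistic sigmoid, the canonical response function."""
    return 1.0 / (1.0 + math.exp(-eta))

def bern_pmf(y, mu):
    # Bern(y | mu) written directly
    return mu ** y * (1 - mu) ** (1 - y)

def bern_expfam(y, mu):
    # the same pmf in exponential-family form: p(y | eta) = sigma(-eta) exp(eta y)
    eta = math.log(mu / (1 - mu))   # canonical link
    return sigmoid(-eta) * math.exp(eta * y)
```

σ(−η) = 1 − µ and exp(ηy) = (µ/(1−µ))^y, so the product is µ^y (1−µ)^{1−y}.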

SLIDE 15

Exponential Family of Distributions

Multinomial distribution

y ∈ {1, 2, . . . , k} modeled as a multinomial variable, y | θ ∼ Multinomial(µ), represented by the one-of-k vector y = (y_1, \ldots, y_k). The parameters satisfy \sum_{j=1}^{k} \mu_j = 1, so µ_1, . . . , µ_{k−1} are the independent parameters, with p(y = j \mid \mu) = \mu_j and p(y = k \mid \mu) = \mu_k = 1 - \sum_{j=1}^{k-1} \mu_j.

p(y \mid \mu) = \prod_{j=1}^{k} \mu_j^{y_j} = \exp\Big\{ \sum_{j=1}^{k} y_j \ln \mu_j \Big\}

p(y \mid \eta) = \exp(\eta^T y), \quad \eta_j = \ln \mu_j, \quad \eta = (\eta_1, \ldots, \eta_k)

u(y) = y, \quad c(y) = 1, \quad g(\eta) = 1

SLIDE 16

Removing the constraint \sum_{j=1}^{k} \mu_j = 1 by expressing µ_k and y_k through the first k − 1 components:

\exp\Big\{ \sum_{j=1}^{k} y_j \ln \mu_j \Big\}
= \exp\Big\{ \sum_{j=1}^{k-1} y_j \ln \mu_j + \Big(1 - \sum_{j=1}^{k-1} y_j\Big) \ln\Big(1 - \sum_{j=1}^{k-1} \mu_j\Big) \Big\}
= \exp\Big\{ \sum_{j=1}^{k-1} y_j \ln \frac{\mu_j}{1 - \sum_{l=1}^{k-1} \mu_l} + \ln\Big(1 - \sum_{j=1}^{k-1} \mu_j\Big) \Big\}

Setting \ln \dfrac{\mu_j}{1 - \sum_{l=1}^{k-1} \mu_l} = \eta_j and solving for µ_j gives

\mu_j = \frac{\exp(\eta_j)}{1 + \sum_{l=1}^{k-1} \exp(\eta_l)}   the softmax function

p(y \mid \eta) = \frac{\exp(\eta^T y)}{1 + \sum_{j=1}^{k-1} \exp(\eta_j)}, \quad u(y) = y, \quad c(y) = 1, \quad g(\eta) = \frac{1}{1 + \sum_{j=1}^{k-1} \exp(\eta_j)}
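A direct transcription of the map from the k − 1 natural parameters back to the µ_j (a sketch; the input values are arbitrary):

```python
import math

def softmax_mu(eta):
    """Recover mu_1, ..., mu_{k-1} (and mu_k) from the k-1 natural parameters:
    mu_j = exp(eta_j) / (1 + sum_l exp(eta_l)), mu_k = 1 / (1 + sum_l exp(eta_l))."""
    denom = 1.0 + sum(math.exp(e) for e in eta)
    mu = [math.exp(e) / denom for e in eta]
    mu.append(1.0 / denom)   # mu_k = 1 - sum of the others
    return mu

mu = softmax_mu([0.5, -1.0])   # k = 3 classes
```

The components are positive and sum to one, as a probability vector must.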

SLIDE 17

Exponential Family of Distributions

Other distributions in the family: Poisson (for counting problems), gamma and exponential (for continuous nonnegative random variables, such as time intervals), beta and Dirichlet (for distributions over probabilities).

SLIDE 18

Maximum Likelihood

Estimate the parameter η of a general exponential family distribution from training data X = (x_1, \ldots, x_m):

p(X \mid \eta) = \Big( \prod_{i=1}^{m} c(x_i) \Big)\, g(\eta)^m \exp\Big\{ \eta^T \sum_{i=1}^{m} u(x_i) \Big\}

Setting the gradient of the log-likelihood to zero gives the maximum likelihood condition

-\nabla \log g(\eta_{ML}) = \frac{1}{m} \sum_{i=1}^{m} u(x_i)
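For the Bernoulli distribution this condition can be solved in closed form: with g(η) = σ(−η), we get −d/dη log g(η) = σ(η), so σ(η_ML) equals the sample mean ȳ and η_ML = log(ȳ/(1−ȳ)). A sketch (the data are made up):

```python
import math

def bernoulli_eta_ml(ys):
    """Solve -d/d_eta log g(eta) = mean(u(y)) for the Bernoulli case:
    sigma(eta_ML) = ybar, i.e. eta_ML = log(ybar / (1 - ybar))."""
    ybar = sum(ys) / len(ys)
    return math.log(ybar / (1 - ybar))

ys = [1, 0, 1, 1]                           # ybar = 0.75
eta_ml = bernoulli_eta_ml(ys)
mu_ml = 1.0 / (1.0 + math.exp(-eta_ml))     # back through the response function
```

Mapping η_ML back through the sigmoid recovers the sample mean, as expected.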

SLIDE 19

Conjugate Priors

We seek a prior that is conjugate to the likelihood function, so that the posterior has the same functional form as the prior:

p(\eta \mid \chi, \nu) = f(\chi, \nu)\, g(\eta)^{\nu} \exp\{\nu\, \eta^T \chi\}

SLIDE 20

Constructing GLM

Consider a classification or regression problem on (y, x): predict y as a function of x (e.g., predict the number of page views of a web site from features such as time of the day, advertising, etc.).

Assumptions:
  • 1. y | x; θ ∼ ExpFam(η)
  • 2. given x, predict the expected value of u(y): if u(y) = y, then h(x) = E[y | x]
  • 3. η and the input x are related linearly (linear predictor): η = θ^T x (componentwise, η_i = θ_i^T x)

SLIDE 21

Ordinary Least Squares

y | x; θ ∼ N(µ, σ²)

h_θ(x) = E[y | x; θ]   (assumption 2)
       = µ             (because the distribution is normal)
       = η             (assumption 1 + what was shown before)
       = θ^T x         (assumption 3)
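Under these assumptions, fitting reduces to ordinary least squares; a sketch on toy data using the normal-equations solver (the data are made up):

```python
import numpy as np

# OLS as a GLM: h_theta(x) = E[y | x] = mu = eta = theta^T x
X = np.column_stack([np.ones(5), np.arange(5.0)])   # intercept + one feature
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])             # exactly 1 + 2x
theta, *_ = np.linalg.lstsq(X, y, rcond=None)       # least-squares solution
h = X @ theta                                       # fitted values h_theta(x)
```

On this noiseless data the solver recovers θ = (1, 2) and the fitted values equal y.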

SLIDE 22

Logistic Regression

y | x; θ ∼ Bern(µ)

h_θ(x) = E[y | x; θ]                        (assumption 2)
       = µ                                  (because the distribution is Bernoulli)
       = \frac{1}{1+\exp(-\eta)}            (assumption 1 + what was shown before)
       = \frac{1}{1+\exp(-\theta^T x)}      (assumption 3)

This also answers the question of why the logistic sigmoid function was chosen: it is the canonical response function of the Bernoulli distribution.

g(\eta) = E[u(y); \eta]   canonical response function
g^{-1}                    canonical link function

SLIDE 23

Multinomial Regression

y ∈ {1, 2, . . . , k} modeled as a multinomial variable: y | x; θ ∼ Multinomial(µ), with \sum_{j=1}^{k} \mu_j = 1, independent parameters µ_1, . . . , µ_{k−1}, p(y = j \mid \mu) = \mu_j and p(y = k \mid \mu) = \mu_k = 1 - \sum_{j=1}^{k-1} \mu_j.

p(y \mid \mu) = \prod_{j=1}^{k} \mu_j^{y_j}, \quad y = (y_1, \ldots, y_k), \qquad p(y \mid \eta) = \frac{\exp(\eta^T y)}{1 + \sum_{j=1}^{k-1} \exp(\eta_j)}

h_θ(x) = E[u(y) | x; θ] = E[y | x; θ]   (assumption 2)

= \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_{k-1} \\ \mu_k \end{pmatrix}
= \begin{pmatrix} \dfrac{\exp(\eta_1)}{1 + \sum_{l=1}^{k-1} \exp(\eta_l)} \\ \vdots \\ \dfrac{\exp(\eta_{k-1})}{1 + \sum_{l=1}^{k-1} \exp(\eta_l)} \\ \dfrac{1}{1 + \sum_{l=1}^{k-1} \exp(\eta_l)} \end{pmatrix}   (because multinomial; assumption 1 + what was shown before)

Estimate η by \eta_j = \theta_j^T x   (assumption 3).
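The hypothesis h_θ(x) can be sketched directly from the softmax expression; the parameter matrix below is arbitrary:

```python
import numpy as np

def h_theta(x, Theta):
    """Multinomial GLM hypothesis: eta_j = theta_j^T x for j = 1..k-1, then
    mu_j = exp(eta_j) / (1 + sum_l exp(eta_l)) and mu_k = 1 / (1 + sum_l exp(eta_l))."""
    eta = Theta @ x                         # (k-1,) natural parameters
    denom = 1.0 + np.exp(eta).sum()
    return np.append(np.exp(eta) / denom, 1.0 / denom)

Theta = np.array([[0.5, -0.2], [1.0, 0.3]])   # hypothetical parameters, k = 3
mu = h_theta(np.array([1.0, 2.0]), Theta)
```

The output is a probability vector over the k classes.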

SLIDE 24

The parameters θ are then estimated via the log-likelihood ℓ(θ), maximized e.g. by gradient ascent.
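A minimal sketch for the logistic-regression case: gradient ascent on ℓ(θ), whose gradient under the canonical link is X^T(y − σ(Xθ)). The data, learning rate, and step count are illustrative:

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Maximize the Bernoulli log-likelihood l(theta) by gradient ascent."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        mu = 1.0 / (1.0 + np.exp(-X @ theta))   # current predictions sigma(X theta)
        theta += lr * X.T @ (y - mu)            # ascend the log-likelihood gradient
    return theta

# toy data: class 1 iff the feature is positive
X = np.column_stack([np.ones(6), np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
theta = fit_logistic(X, y)
preds = (1.0 / (1.0 + np.exp(-X @ theta)) > 0.5).astype(float)
```

On this separable toy data the fitted model classifies all six points correctly.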