Multivariate GLMs
Author: Nicholas Reich, transcribed by Kate Hoff Shutta and Herb Susmann
Course: Categorical Data Analysis (BIOSTATS 743)
Made available under the Creative Commons Attribution-ShareAlike 4.0 International License.

Overview: Models for Multinomial Responses
Note: This lecture focuses mainly on the Baseline Category Logit Model (see Agresti Ch. 8), but for the exam we are responsible for reading Chapter 8 of the text and being familiar with all types of models for multinomial responses introduced there.
◮ GLMs for Nominal Responses
    ◮ Baseline Category Logit Model (Multinomial Logit Model)
    ◮ Multinomial Probit Model
◮ GLMs for Ordinal Responses
    ◮ Cumulative Logit Model
    ◮ Cumulative Link Models
        ◮ Cumulative Probit Model
        ◮ Cumulative Log-Log Model
    ◮ Adjacent-Categories Logit Models
    ◮ Continuation-Ratio Logit Models
◮ Discrete-Choice Models
    ◮ Conditional Logit Models (and relationship to Multinomial Logit Model)
    ◮ Multinomial Probit Discrete-Choice Models
    ◮ Extension to Nested Logit and Mixed Logit Models
    ◮ Extension to Discrete Choice Model with Ordered Categories

Baseline Category Logit Model
The Baseline Category Logit (BCL) model is appropriate for modeling nominal response data as a function of one or more categorical or quantitative covariates.
◮ Example: Modeling a voter's choice of candidate as a function of voter age (quantitative), gender (categorical nominal), race (categorical nominal), and socioeconomic status (categorical ordinal).
◮ Example: Modeling transcription factor binding to a promoter region as a function of transcription factor abundance (quantitative), affinity for the binding site (quantitative), and primary immune response activation status (categorical binary).
◮ Non-Example: Modeling consumer choice of soda size as a function of air temperature (quantitative) and time of day (quantitative). Soda size is a categorical ordinal variable, so although the BCL model will technically work, it ignores the ordering information that the data contain.

BCL Model Formulation
Consider the set of $J$ possible values of a categorical response variable, $\{C_1, C_2, \ldots, C_J\}$, and the vector of $P$ covariates $\vec{X} = (X_1, X_2, \ldots, X_P)$.
Goal: For a particular vector of covariates $\vec{x}_i = (x_{i1}, x_{i2}, \ldots, x_{iP})$, predict $Y_i$, the category to which the observation with covariates $\vec{x}_i$ belongs. (Note that $Y_i \in \{C_1, \ldots, C_J\}$.)
Intermediate Goal: For all $j \in \{1, \ldots, J\}$, use training data to fit
$$\pi_j(\vec{x}_i) = P(Y_i = C_j \mid \vec{x}_i)$$
under the constraint that $\sum_{j=1}^{J} \pi_j(\vec{x}_i) = 1$.
Conditional on the observed covariates and the estimates for the functions $\pi_j$, $Y_i$ is Multinomial:
$$Y_i \mid \vec{x}_i \sim \text{Multinomial}\left(1, \{\pi_1(\vec{x}_i), \ldots, \pi_J(\vec{x}_i)\}\right)$$
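To make the Multinomial(1, π) statement concrete, here is a minimal sketch that draws a category for one observation once its probabilities are known. The category labels, the probability values, and the helper name `draw_category` are hypothetical and chosen only for illustration.

```python
import random

def draw_category(categories, probs, rng):
    """Draw one label with the given probabilities (a Multinomial(1, probs) draw)."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return rng.choices(categories, weights=probs, k=1)[0]

categories = ["C1", "C2", "C3"]   # hypothetical category labels
probs = [0.2, 0.5, 0.3]           # hypothetical values of pi_j(x_i)
rng = random.Random(0)            # seeded generator so runs are repeatable

draws = [draw_category(categories, probs, rng) for _ in range(1000)]
shares = {c: draws.count(c) / len(draws) for c in categories}
print(shares)                     # empirical shares should be near probs
```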

Overview of Modeling Process
◮ Choose one of the $J$ categories as a baseline. Without loss of generality, use $C_J$ (since the $C_j$ are nominal and ordering is irrelevant).
◮ Let $\boldsymbol{\beta}_j = (\beta_{j1}, \ldots, \beta_{jP})$ be the category-specific coefficients of the covariates $\vec{x}_i$ for a particular category $C_j$. (Note that the dimensions of $\boldsymbol{\beta}_j$ are $P \times 1$.)
◮ Recall that $\vec{x}_i = (x_{i1}, x_{i2}, \ldots, x_{iP})$ is $P \times 1$.
◮ We can now calculate the following scalar quantity, a log probability ratio that is modeled as a linear function of the covariates $\vec{x}_i$:
$$\log\left(\frac{\pi_j(\vec{x}_i)}{\pi_J(\vec{x}_i)}\right) = \alpha_j + \boldsymbol{\beta}_j^T \vec{x}_i$$

Overview of Modeling Process, continued
◮ Specifying the probabilities $\pi_j$ relative to the reference category $\pi_J$ also specifies the log probability ratio for any two categories $\pi_a, \pi_b$, $a \neq b$, since
$$\log\left(\frac{\pi_a(\vec{x}_i)}{\pi_J(\vec{x}_i)}\right) - \log\left(\frac{\pi_b(\vec{x}_i)}{\pi_J(\vec{x}_i)}\right) = \log\left(\frac{\pi_a(\vec{x}_i)}{\pi_b(\vec{x}_i)}\right)$$
◮ Note that we only need to model $(J-1)$ of the probabilities $\pi_j$, since the constraint $\sum_{j=1}^{J} \pi_j(\vec{x}_i) = 1$ determines the $J$th probability once the other $(J-1)$ are known.
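The two steps above (baseline logits, then logits for an arbitrary pair of categories) can be sketched as follows. The coefficient values and the function names `baseline_logits` and `pairwise_logit` are assumptions made for illustration, with $J = 3$ categories and $P = 2$ covariates.

```python
def baseline_logits(alphas, betas, x):
    """Return log(pi_j / pi_J) = alpha_j + beta_j . x for j = 1, ..., J-1."""
    return [a + sum(b_k * x_k for b_k, x_k in zip(b, x))
            for a, b in zip(alphas, betas)]

def pairwise_logit(logits, a, b):
    """Return log(pi_a / pi_b) for two non-baseline categories (0-based indices)."""
    return logits[a] - logits[b]

# Hypothetical coefficients: one (alpha_j, beta_j) pair per non-baseline category
alphas = [0.5, -1.0]
betas = [[1.0, -0.5], [0.2, 0.3]]
x = [2.0, 1.0]                      # covariates for one observation

logits = baseline_logits(alphas, betas, x)
print(logits)                       # log(pi_1/pi_3) and log(pi_2/pi_3)
print(pairwise_logit(logits, 0, 1)) # log(pi_1/pi_2), no refitting needed
```

The pairwise value is a plain difference of baseline logits, which is why the choice of baseline category does not restrict which comparisons are available.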

Formulation of the BCL Model as a Multivariate GLM
Response Vector: $\vec{y}_i = (y_{i1}, y_{i2}, \ldots, y_{i(J-1)})$, where $y_{ij} = 1$ if $Y_i = C_j$ and $0$ otherwise
Expected Response Vector: $E[\vec{y}_i] = \vec{\mu}_i$
Argument to Link Function:
$$\vec{\mu}_i = (\mu_{i1}, \mu_{i2}, \ldots, \mu_{i(J-1)}) = (\pi_1(\vec{x}_i), \pi_2(\vec{x}_i), \ldots, \pi_{J-1}(\vec{x}_i))$$
Link Function:
$$g(\vec{\mu}_i) = \left(\log\frac{\pi_1(\vec{x}_i)}{\pi_J(\vec{x}_i)}, \log\frac{\pi_2(\vec{x}_i)}{\pi_J(\vec{x}_i)}, \ldots, \log\frac{\pi_{J-1}(\vec{x}_i)}{\pi_J(\vec{x}_i)}\right)^T = X_i \boldsymbol{\beta}$$
where $X_i$ and $\boldsymbol{\beta}$ are defined on the next slide.

Formulation of the BCL Model as a Multivariate GLM
Matrix of Covariates: $X_i$ is a $(J-1) \times (P+1)(J-1)$ matrix (recall that $P$ is the number of covariates) constructed from blocks of the form $(1, x_{i1}, x_{i2}, \ldots, x_{iP})$. Each block has $P+1$ entries: a 1 for the intercept followed by the $P$ covariates.
$$X_i = \begin{pmatrix}
1 & x_{i1} & \cdots & x_{iP} & 0 & 0 & \cdots & 0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 & 1 & x_{i1} & \cdots & x_{iP} & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & & & & & & & & \ddots & & & & \vdots \\
0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & \cdots & 1 & x_{i1} & \cdots & x_{iP}
\end{pmatrix}$$
Vector of Parameters: $\boldsymbol{\beta}$ is a column vector with dimension $(P+1)(J-1) \times 1$, containing the category-specific coefficients $\alpha_j$ and $\beta_{jk}$ for $j \in \{1, \ldots, J-1\}$ and $k \in \{1, \ldots, P\}$:
$$\boldsymbol{\beta} = (\alpha_1, \beta_{11}, \ldots, \beta_{1P}, \alpha_2, \beta_{21}, \ldots, \beta_{2P}, \ldots, \alpha_{J-1}, \beta_{(J-1)1}, \ldots, \beta_{(J-1)P})^T$$

Multivariate GLM: The Mechanics of Prediction
◮ $X_i$ is $(J-1) \times (P+1)(J-1)$ and $\boldsymbol{\beta}$ is $(P+1)(J-1) \times 1$, counting one intercept plus $P$ covariate coefficients per non-reference category
◮ $g(\vec{\mu}_i) = X_i \boldsymbol{\beta}$ is a $(J-1) \times 1$ column vector
Let $X_i^{(j)}$ refer to the $j$th row vector of $X_i$. Then the dot product of $X_i^{(j)}$ with the parameter vector $\boldsymbol{\beta}$ is the predicted log probability ratio for observation $i$ and non-reference category $C_j$:
$$\log\left(\frac{\pi_j(\vec{x}_i)}{\pi_J(\vec{x}_i)}\right) = X_i^{(j)} \cdot \boldsymbol{\beta}$$

Multivariate GLM: Example of the Mechanics of Prediction
Suppose we wish to calculate the first component of $g(\vec{\mu}_i)$, the log probability ratio for category $C_1$. The first row vector of $X_i$ is:
$$X_i^{(1)} = (1, x_{i1}, x_{i2}, \ldots, x_{iP}, 0, 0, 0, \ldots, 0)$$
The column vector of parameters $\boldsymbol{\beta}$ is the same for all $i$:
$$\boldsymbol{\beta} = (\alpha_1, \beta_{11}, \ldots, \beta_{1P}, \alpha_2, \beta_{21}, \ldots, \beta_{2P}, \ldots, \alpha_{J-1}, \beta_{(J-1)1}, \ldots, \beta_{(J-1)P})^T$$
Their dot product gives the predicted log probability ratio:
$$\begin{aligned}
\log\left(\frac{\pi_1(\vec{x}_i)}{\pi_J(\vec{x}_i)}\right) &= X_i^{(1)} \cdot \boldsymbol{\beta} \\
&= 1 \cdot \alpha_1 + x_{i1}\beta_{11} + \cdots + x_{iP}\beta_{1P} + 0 \cdot \alpha_2 + 0 \cdot \beta_{21} + \cdots + 0 \cdot \beta_{2P} + \cdots + 0 \cdot \alpha_{J-1} + 0 \cdot \beta_{(J-1)1} + \cdots + 0 \cdot \beta_{(J-1)P} \\
&= \alpha_1 + x_{i1}\beta_{11} + \cdots + x_{iP}\beta_{1P}
\end{aligned}$$
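The block structure and the row-by-row dot products can be checked with a small sketch. It assumes each block is an intercept followed by the $P$ covariates, so every row has $(P+1)(J-1)$ entries; the function names and the numeric values are hypothetical.

```python
def design_matrix(x, J):
    """Build X_i: row j holds the block (1, x_1, ..., x_P) in block position j, zeros elsewhere."""
    block = [1.0] + list(x)
    width = len(block)                        # P + 1 entries per block
    rows = []
    for j in range(J - 1):
        row = [0.0] * (width * (J - 1))
        row[j * width:(j + 1) * width] = block
        rows.append(row)
    return rows

def stack_parameters(alphas, betas):
    """Stack (alpha_1, beta_11..beta_1P, alpha_2, ...) into one parameter vector."""
    stacked = []
    for a, b in zip(alphas, betas):
        stacked.append(a)
        stacked.extend(b)
    return stacked

def linear_predictor(X, beta):
    """Return X_i beta: one log probability ratio per non-baseline category."""
    return [sum(r * b for r, b in zip(row, beta)) for row in X]

# Hypothetical example with J = 3 categories and P = 2 covariates
X = design_matrix([2.0, 1.0], J=3)
beta = stack_parameters([0.5, -1.0], [[1.0, -0.5], [0.2, 0.3]])
eta = linear_predictor(X, beta)
print(X[0])   # first row: block for category 1, then zeros
print(eta)    # equals alpha_j + beta_j . x for each j
```

The zeros contribute nothing to each dot product, so row $j$ picks out exactly the coefficients belonging to category $C_j$.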

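Response Probabilities
Note the following relationship. Writing $X_i^{(j)} \boldsymbol{\beta} = \alpha_j + \boldsymbol{\beta}_j^T \vec{x}_i$ for the $j$th component of $X_i \boldsymbol{\beta}$,
$$\log\left(\frac{\pi_j(\vec{x}_i)}{\pi_J(\vec{x}_i)}\right) = X_i^{(j)} \boldsymbol{\beta} \implies \pi_j(\vec{x}_i) = \frac{\exp(X_i^{(j)} \boldsymbol{\beta})}{1 + \sum_{n=1}^{J-1} \exp(X_i^{(n)} \boldsymbol{\beta})}, \quad j = 1, \ldots, J-1$$
and, from the sum-to-one constraint, the baseline probability is
$$\pi_J(\vec{x}_i) = \frac{1}{1 + \sum_{n=1}^{J-1} \exp(X_i^{(n)} \boldsymbol{\beta})}$$
The argument of the log function here is sometimes referred to as the "relative risk" in the public health setting.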
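As a rough sketch of this inversion, the function below turns the $J-1$ baseline logits into all $J$ response probabilities. The name `response_probabilities` and the input values are hypothetical; subtracting the largest exponent before exponentiating is a numerical-stability choice added here and leaves the result unchanged.

```python
import math

def response_probabilities(logits):
    """Map log(pi_j / pi_J), j = 1..J-1, to the full vector [pi_1, ..., pi_J]."""
    shift = max([0.0] + list(logits))          # the baseline logit is 0 by definition
    exps = [math.exp(l - shift) for l in logits]
    base = math.exp(-shift)                    # corresponds to the '1' in the denominator
    total = base + sum(exps)
    return [e / total for e in exps] + [base / total]

logits = [2.0, -0.3]                           # hypothetical values of X_i beta
probs = response_probabilities(logits)
print(probs)
print(sum(probs))                              # should be 1 up to rounding
print(math.log(probs[0] / probs[-1]))          # recovers the first logit
```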

Response Probabilities
Plotting the $\pi_j(\vec{x}_i)$ as a function of one covariate $x_{ik}$ gives a useful picture of how these probabilities compare with one another. For subject $i$, the plot shows the category-specific response probabilities across values of the $k$th covariate, with all other covariates held constant.
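The values behind such a plot can be computed as in the sketch below, which varies one covariate over a grid while the others stay fixed. The coefficients, the grid, and the function names are hypothetical; the resulting rows can be handed to any plotting library.

```python
import math

def response_probabilities(logits):
    """Map the J-1 baseline logits to [pi_1, ..., pi_J]."""
    shift = max([0.0] + list(logits))
    exps = [math.exp(l - shift) for l in logits]
    base = math.exp(-shift)
    total = base + sum(exps)
    return [e / total for e in exps] + [base / total]

def probability_curves(alphas, betas, x_fixed, k, grid):
    """For each grid value v, set covariate k to v and return [pi_1, ..., pi_J]."""
    curves = []
    for v in grid:
        x = list(x_fixed)
        x[k] = v                                # vary only the k-th covariate
        logits = [a + sum(b_m * x_m for b_m, x_m in zip(b, x))
                  for a, b in zip(alphas, betas)]
        curves.append(response_probabilities(logits))
    return curves

# Hypothetical fitted coefficients for J = 3 categories and P = 2 covariates
alphas = [0.5, -1.0]
betas = [[1.0, -0.5], [0.2, 0.3]]
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
curves = probability_curves(alphas, betas, x_fixed=[0.0, 1.0], k=0, grid=grid)
for v, row in zip(grid, curves):
    print(v, [round(p, 3) for p in row])
```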

Using $\chi^2$ or $G^2$ as a Model Check
When all predictors in a model are categorical and the training data can be represented in a contingency table that is not sparse, the $\chi^2$ or $G^2$ goodness-of-fit tests used earlier in the semester can be used to assess whether the fitted BCL model is appropriate. (Generate an "expected" contingency table from the predicted results; the "residuals" are then the differences between the expected and observed counts.)
If some predictors are not categorical or the contingency table is sparse, these statistics are "valid only for comparing nested models differing by relatively few terms" (A. Agresti, Categorical Data Analysis, p. 294). This means that they cannot validly be used as an overall model check, but they can be used to compare the fit of full vs. reduced models if the full model has only "relatively few" more covariates than the reduced one(s).
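For reference, both statistics can be computed from flattened observed and expected cell counts as sketched below. The counts are made up for illustration, the function names are hypothetical, and the degrees of freedom (which depend on the table and the fitted model) are deliberately left out.

```python
import math

def pearson_chi2(observed, expected):
    """Pearson statistic: sum over cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def deviance_g2(observed, expected):
    """Likelihood-ratio statistic: 2 * sum of O * log(O / E); empty cells contribute 0."""
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)

# Hypothetical flattened contingency-table cells
observed = [10, 20, 30]
expected = [15.0, 15.0, 30.0]     # e.g. fitted counts from a model

x2 = pearson_chi2(observed, expected)
g2 = deviance_g2(observed, expected)
print(x2, g2)
```

When the fitted counts match the observed counts exactly, both statistics are zero; larger values indicate a larger discrepancy between model and data.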

Example: Using Symptoms to Classify Disease (Reich Lab Research)
Motivating Question: Confirmatory clinical tests are expensive and take time, meaning they are not a reasonable diagnostic option in many public health settings. Can we instead design a model that uses routinely observable symptoms to classify sick individuals accurately? (Adapted from work in progress by Brown et al.)
Categories:
◮ $C_1$: Dengue
◮ $C_2$: Zika
◮ $C_3$: Flu
◮ $C_4$: Chikungunya
◮ $C_5$: Other
◮ $C_6$: No Diagnosis
Covariates (a few of many in the actual model):
◮ Age
◮ Headache
◮ Rash
◮ Conjunctivitis
◮ ...
