Conditional likelihood models for distributional regression analysis

Conditional likelihood models for distributional regression analysis
Philippe Van Kerm
University of Luxembourg and LISER
2020 Swiss Stata Conference, November 19, 2020

Conditional likelihood models in a nutshell
• Fit a parametric distribution function $f_\theta(y)$ ...
  • $\theta$ is a small vector of parameters (typically, say, 2–4 parameters)
  • e.g., a (log-)normal, a gamma, a beta distribution, etc.
• ... conditioning on a vector of covariates, $f_{\theta(X)}(y)$ ...
• ... by specifying a parametric relationship between $X$ and $\theta$
  • For example, $\theta(X) = X\beta$ (or $\theta(X) = \exp(X\beta)$ if $\theta(X)$ must be $> 0$)
[Figure: density of income; x-axis: Income (0 to 15000), y-axis: Density (0 to .0005)]

Conditional likelihood models in a nutshell (continued)
[Figure: density of income, panel "Mother has low education"; x-axis: Income (0 to 15000), y-axis: Density (0 to .0005)]

Conditional likelihood models in a nutshell (continued)
[Figure: density of income, panel "Mother has high education"; x-axis: Income (0 to 15000), y-axis: Density (0 to .0005)]
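The idea of letting the parameters depend on covariates can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a log-normal density is used, its location is linear in the covariates and its scale is log-linear (so that it stays positive), and the coefficient values and the low/high education dummy are hypothetical, not estimates from the talk.

```python
import math

def linear_index(x, beta):
    """Compute the linear index x * beta for one observation."""
    return sum(xj * bj for xj, bj in zip(x, beta))

def lognormal_pdf(y, mu, sigma):
    """Density of a log-normal distribution with parameters mu and sigma."""
    z = (math.log(y) - mu) / sigma
    return math.exp(-0.5 * z * z) / (y * sigma * math.sqrt(2.0 * math.pi))

def conditional_density(y, x, beta_mu, beta_sigma):
    """f_{theta(x)}(y) with mu(x) = x*beta_mu and sigma(x) = exp(x*beta_sigma)."""
    mu = linear_index(x, beta_mu)
    sigma = math.exp(linear_index(x, beta_sigma))  # exp keeps sigma > 0
    return lognormal_pdf(y, mu, sigma)

# Hypothetical coefficients: [constant, high-education dummy]
BETA_MU = [8.0, 0.4]
BETA_SIGMA = [-0.5, -0.2]

LOW_EDUCATION = [1.0, 0.0]
HIGH_EDUCATION = [1.0, 1.0]

for income in (2000.0, 5000.0, 10000.0):
    print(income,
          conditional_density(income, LOW_EDUCATION, BETA_MU, BETA_SIGMA),
          conditional_density(income, HIGH_EDUCATION, BETA_MU, BETA_SIGMA))
```

Each covariate profile gets its own full density, which is what the two education panels illustrate.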

Uses of conditional likelihood models
• Functional outcomes (Biewen and Jenkins, 2005)
• Quantile regression... without running quantile regression (Noufaily and Jones, 2013)
• Censored data (Jenkins et al., 2011)
• Endogenous selection (Van Kerm, 2013)
• Instrumental variables (Briseño Sanchez et al., 2020)
• Marginalisation and counterfactual distributions (Van Kerm et al., 2017)
[Figure: density of income; x-axis: Income (0 to 15000), y-axis: Density (0 to .0005)]

Array of models for conditional distributions $F_X$
Many models and estimators are available, more or less parametrically restricted, e.g.,
• quantile regression (Koenker and Bassett, 1978)
• distribution regression (Foresi and Peracchi, 1995, Chernozhukov et al., 2013, Van Kerm, 2016)
• duration models (Donald et al., 2000, Royston, 2001)
• conditional likelihood models (Biewen and Jenkins, 2005, Van Kerm et al., 2017)

1 Quantile regression
2 Distribution regression
3 Conditional likelihood models

Linear quantile regression model
Assume a particular (linear) relationship between the conditional quantile and $x$:
$$Q_\tau(y \mid x) = x\beta_\tau$$
(or equivalently $y_i = x_i\beta_\tau + u_i$ where $F^{-1}_{u_i \mid x_i}(\tau) = 0$).
Estimator (Koenker and Bassett, 1978):
$$\hat\beta_\tau = \arg\min_\beta \sum_i \rho_\tau(y_i - x_i\beta)$$
Estimate of the conditional quantile (given the linear model):
$$\hat Q_\tau(y \mid x) = x\hat\beta_\tau$$
$\hat\beta_\tau$ can be interpreted as the marginal change in the $\tau$-th conditional quantile for a marginal change in $x$.
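The minimisation above can be illustrated with a small Python sketch. It assumes that $\rho_\tau$ is the usual check (pinball) function, $\rho_\tau(u) = u\,(\tau - 1(u < 0))$, and it covers only the intercept-only case, where the minimiser is a sample quantile. The brute-force search over sample values is for illustration and is not how quantile regression software solves the problem.

```python
def check_loss(u, tau):
    """Check (pinball) function rho_tau(u) = u * (tau - 1(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def objective(b, ys, tau):
    """Sum of check losses for a constant-only model with intercept b."""
    return sum(check_loss(y - b, tau) for y in ys)

def intercept_only_quantile(ys, tau):
    """Minimise the objective over the observed values (a minimiser lies among them)."""
    return min(sorted(ys), key=lambda b: objective(b, ys, tau))

incomes = [1.0, 2.0, 3.0, 4.0, 100.0]
print(intercept_only_quantile(incomes, 0.5))   # sample median
print(intercept_only_quantile(incomes, 0.25))  # lower quartile
```

With covariates, the same objective is minimised over the full coefficient vector, typically by linear programming.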

Recovering $\upsilon(F_x)$
Estimation of $\hat Q_\tau(y \mid x)$ for a continuum of $\tau$ in $(0, 1)$ provides a model for the entire conditional quantile function of $Y$ given $X$ (the quantile ‘process’; see Blaise Melly’s presentation and qrprocess for a fast implementation).
After estimation of the quantile process over $(0, 1)$, estimation of the distributional statistic conditional on $X$ is relatively easy by simulation:
• a set of predicted conditional quantile values $\{x_i\hat\beta_\tau\}_{\tau \in (0,1)}$ is a pseudo-random draw from $F_{x_i}$ (if the grid for $\tau$ is equally spaced) (Autor et al., 2005)
• so, a simple estimator for $\upsilon$ from unit-record data can be used to estimate $\upsilon(F_{x_i})$
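The simulation step can be sketched as follows. Everything here is an assumption made for illustration: the coefficient process `beta_tau` is a made-up linear function of the quantile level, the grid uses midpoints of equal-width cells, and the Gini coefficient stands in for the statistic of interest.

```python
def pseudo_sample(quantile_fn, n_grid=1000):
    """Evaluate a conditional quantile function on an equally spaced grid in (0, 1)."""
    taus = [(k + 0.5) / n_grid for k in range(n_grid)]
    return [quantile_fn(tau) for tau in taus]

def gini(values):
    """Gini coefficient computed from unit-record data."""
    ys = sorted(values)
    n = len(ys)
    weighted = sum((rank + 1) * y for rank, y in enumerate(ys))
    return 2.0 * weighted / (n * sum(ys)) - (n + 1.0) / n

def beta_tau(tau):
    """Hypothetical quantile-regression coefficients [constant, slope] at level tau."""
    return [1000.0 + 4000.0 * tau, 500.0 + 1500.0 * tau]

def predicted_quantile(x, tau):
    """Predicted conditional quantile x * beta_tau."""
    return sum(xj * bj for xj, bj in zip(x, beta_tau(tau)))

x_i = [1.0, 1.0]  # hypothetical covariate profile
draws = pseudo_sample(lambda tau: predicted_quantile(x_i, tau))
print(gini(draws))  # estimate of the Gini coefficient of F given x_i
```

Any other estimator written for unit-record data (a mean, a poverty rate, a quantile ratio) can replace `gini` without changing the rest.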

Disadvantage?
Linearity of the model $Q_\tau(y \mid x) = x\beta_\tau$ may be problematic in some situations:
• discontinuities (e.g., minimum wage)
• quantile crossing within the support of $X$ (a simple solution is re-arrangement of the quantile predictions (Chernozhukov et al., 2009))
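Re-arrangement amounts to sorting the predictions for one covariate profile across the grid of quantile levels. A minimal sketch, with made-up predicted values:

```python
def has_crossing(quantiles):
    """True if predictions, ordered by increasing tau, are not monotone."""
    return any(later < earlier for earlier, later in zip(quantiles, quantiles[1:]))

def rearrange(quantiles):
    """Monotone re-arrangement: sort the predicted quantiles in increasing order."""
    return sorted(quantiles)

# Hypothetical predictions at tau = 0.2, 0.4, 0.6, 0.8 for one covariate profile
predicted = [1000.0, 1400.0, 1300.0, 2000.0]
print(has_crossing(predicted))             # True
print(rearrange(predicted))                # [1000.0, 1300.0, 1400.0, 2000.0]
print(has_crossing(rearrange(predicted)))  # False
```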

‘Distribution regression’
$F_x(y) = \Pr\{y_i \le y \mid x\}$ is a binary choice model once $y$ is fixed (the dependent variable is $1(y_i \le y)$).
Estimate $F_x(y)$ on a grid of values for $y$ spanning the domain of definition of $Y$ by running repeated standard binary choice models, e.g., a logit:
$$F_x(y) = \Pr\{y_i \le y \mid x\} = \Lambda(x\beta_y) = \frac{\exp(x\beta_y)}{1 + \exp(x\beta_y)}$$
or a probit, $F_x(y) = \Phi(x\beta_y)$, or else ...

‘Distribution regression’
• Estimate the distributional process by repeating estimation at different values of $y$; this makes few assumptions about the overall shape of the distribution
• Discontinuities are handled without difficulty
• Estimation of these models is well known and straightforward ( probit , logit )
• Faster to run than quantile regression
• Evidence that it provides a better fit to conditional quantile processes than quantile regression (Rothe and Wied, 2013, Van Kerm et al., 2017)

Disadvantage
Drawback: the conditional statistic $\upsilon(F_x)$ is often less easy to recover from the $\hat F_x$ predictions than with quantile regression
• invert the predicted $F_x$ to obtain predicted quantiles
• proceed as with quantiles predicted from quantile regression (see above)
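The inversion step can be sketched as below, taking the predicted quantile as the smallest grid value whose predicted CDF reaches the target level. The grid and predicted probabilities are hypothetical. Forcing monotonicity with a running maximum is a simplifying assumption of this sketch, since separately estimated binary models need not give a non-decreasing CDF.

```python
from bisect import bisect_left

def monotone_cdf(cdf_values):
    """Clip predictions to [0, 1] and force them to be non-decreasing in y."""
    out, running = [], 0.0
    for p in cdf_values:
        running = max(running, min(max(p, 0.0), 1.0))
        out.append(running)
    return out

def invert_cdf(y_grid, cdf_values, tau):
    """Smallest grid value y with predicted F_x(y) >= tau (last grid point if none)."""
    cdf = monotone_cdf(cdf_values)
    idx = bisect_left(cdf, tau)
    return y_grid[min(idx, len(y_grid) - 1)]

# Hypothetical grid of incomes and predicted CDF values for one covariate profile
y_grid = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]
cdf_hat = [0.10, 0.50, 0.45, 0.80, 1.00]

print(invert_cdf(y_grid, cdf_hat, 0.5))  # predicted median
print(invert_cdf(y_grid, cdf_hat, 0.6))
```

A finer grid for $y$ gives a finer approximation of the quantile function.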

Conditional likelihood models
Assume that the conditional distribution has a particular parametric form: e.g., (log-)normal (2 parameters, quite restrictive), Gamma (2 parameters), Singh-Maddala (3 parameters), Dagum (3 parameters), GB2 (4 parameters), ... or any other distribution that is likely to fit the data at hand (think domain of definition, fatness of tails, modality).
Let the parameters (say vector $\theta$) depend on $x$ in a particular fashion, typically linearly (up to some transformation satisfying the range of variation of the parameters), e.g., $\theta_{1X} = \exp(x\beta_1)$, $\theta_{2X} = \exp(x\beta_2)$ and $\theta_{3X} = x\beta_3$.
This gives a fully specified parametric model which can be estimated using maximum likelihood ($\Rightarrow$ inference is straightforward).
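The log-likelihood of such a model can be sketched as follows. This is a sketch under stated assumptions, not the code behind the talk: the Singh-Maddala distribution is written with CDF $F(y) = 1 - [1 + (y/b)^a]^{-q}$, all three parameters are taken to be log-linear in the covariates, and the data and coefficient values are invented. In practice the function would be passed to a numerical optimiser.

```python
import math

def linear_index(x, beta):
    """Compute the linear index x * beta for one observation."""
    return sum(xj * bj for xj, bj in zip(x, beta))

def sm_cdf(y, a, b, q):
    """Singh-Maddala CDF: 1 - (1 + (y/b)^a)^(-q)."""
    return 1.0 - (1.0 + (y / b) ** a) ** (-q)

def sm_logpdf(y, a, b, q):
    """Log of the Singh-Maddala density, the derivative of sm_cdf."""
    return (math.log(a) + math.log(q) - math.log(b)
            + (a - 1.0) * (math.log(y) - math.log(b))
            - (q + 1.0) * math.log1p((y / b) ** a))

def loglik(ys, xs, beta_a, beta_b, beta_q):
    """Sample log-likelihood with a, b, q all equal to exp(x * beta)."""
    total = 0.0
    for y, x in zip(ys, xs):
        a = math.exp(linear_index(x, beta_a))
        b = math.exp(linear_index(x, beta_b))
        q = math.exp(linear_index(x, beta_q))
        total += sm_logpdf(y, a, b, q)
    return total

# Invented data: incomes and [constant, dummy] covariates
ys = [1800.0, 2500.0, 4200.0, 6100.0]
xs = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
print(loglik(ys, xs, [1.0, 0.1], [8.0, 0.3], [0.2, 0.0]))
```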

Functionals derived from conditional likelihood models
• With parameter estimates $\hat\theta_X$, we can recover conditional quantiles, CDF, PDF and all sorts of functionals $\upsilon(F_x)$ (means, dispersion measures, etc.), often from closed-form expressions
• Typically much less computationally expensive than estimating full quantile/distributional processes
• Price to pay is stronger parametric assumptions! Look at goodness-of-fit statistics (KS, KL) of the predicted distribution; a contrast with a non-parametric fit is also useful (see Rothe and Wied, 2013)
• User-written commands in Stata do these estimations for many models (Stephen Jenkins, Nick Cox and colleagues): smfit , dagumfit , gb2fit , lognfit , paretofit , fiskfit , gammafit , betafit , gevfit , invgammafit , weibullfit ; it is also relatively easy to program new distributions
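As an illustration of such closed-form expressions, the sketch below computes quantiles and the mean of a Singh-Maddala distribution from its three parameters. It assumes the parameterisation $F(y) = 1 - [1 + (y/b)^a]^{-q}$, and the parameter values are invented rather than estimates from the talk.

```python
import math

def sm_cdf(y, a, b, q):
    """Singh-Maddala CDF: 1 - (1 + (y/b)^a)^(-q)."""
    return 1.0 - (1.0 + (y / b) ** a) ** (-q)

def sm_quantile(p, a, b, q):
    """Closed-form quantile: b * ((1 - p)^(-1/q) - 1)^(1/a)."""
    return b * ((1.0 - p) ** (-1.0 / q) - 1.0) ** (1.0 / a)

def sm_mean(a, b, q):
    """Closed-form mean, which exists only when q > 1/a."""
    if q <= 1.0 / a:
        raise ValueError("the mean does not exist for q <= 1/a")
    return b * math.gamma(1.0 + 1.0 / a) * math.gamma(q - 1.0 / a) / math.gamma(q)

# Invented parameter values for one covariate profile
a_hat, b_hat, q_hat = 2.5, 3000.0, 1.2
print(sm_quantile(0.5, a_hat, b_hat, q_hat))  # conditional median
print(sm_mean(a_hat, b_hat, q_hat))           # conditional mean
```

Because each functional is a smooth function of the parameters, standard errors can be obtained by the delta method once the model is fitted.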

Likelihood framework makes several important extensions easy
• Censoring (e.g., top-coding in income data, minimum wage)
  • Involves a minor modification to the likelihood contribution for censored observations ($1 - F(y)$ instead of $f(y)$)
• Endogenous selection
  • Standard selection model à la Heckman (joint normal) is (relatively) easily extended to other distributional assumptions in the likelihood framework using copula-based representations (Van Kerm, 2013)
• Multivariate distributions
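The censoring modification can be sketched as follows for right-censored (top-coded) observations. The sketch assumes a Singh-Maddala distribution with CDF $F(y) = 1 - [1 + (y/b)^a]^{-q}$ and no covariates. The data, the top-code and the parameter values are invented.

```python
import math

def sm_logpdf(y, a, b, q):
    """Log density of the Singh-Maddala distribution."""
    return (math.log(a) + math.log(q) - math.log(b)
            + (a - 1.0) * (math.log(y) - math.log(b))
            - (q + 1.0) * math.log1p((y / b) ** a))

def sm_logsurvival(y, a, b, q):
    """log(1 - F(y)) = -q * log(1 + (y/b)^a)."""
    return -q * math.log1p((y / b) ** a)

def censored_loglik(ys, is_censored, a, b, q):
    """Use log f(y) for exact observations and log(1 - F(y)) for top-coded ones."""
    total = 0.0
    for y, censored in zip(ys, is_censored):
        if censored:
            total += sm_logsurvival(y, a, b, q)
        else:
            total += sm_logpdf(y, a, b, q)
    return total

# Invented data: the last two incomes are top-coded at 10000
ys = [1800.0, 2500.0, 4200.0, 10000.0, 10000.0]
flags = [False, False, False, True, True]
print(censored_loglik(ys, flags, 2.5, 3000.0, 1.2))
```

Left-censoring would use $\log F(y)$ for the censored observations instead.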

Example: Modelling income with a Singh-Maddala distribution
Household income in Luxembourg, by educational achievement of father and mother (cf. inequality of opportunity analysis)
The 3-parameter Singh-Maddala distribution often provides a good fit to income distributions
• Constrained version of the 4-parameter GB2; similar to a Dagum distribution
• Stephen Jenkins’ smfit
• (Using here home-brewed smfit2 , log-linear in covariates)
• Closed-form expressions available for PDF, CDF, percentiles, mode, Gini coefficient, etc. (see help smfit )
[Figure: density of income; x-axis: Income (0 to 15000), y-axis: Density (0 to .0005)]

Fitting a model with no covariates
• Recover functionals with closed-form expressions: nlcom
