Approximate Bayesian logistic regression via penalized likelihood

Approximate Bayesian logistic regression via penalized likelihood estimation with data augmentation
Andrea Discacciati, Nicola Orsini
Unit of Biostatistics and Unit of Nutritional Epidemiology, Institute of Environmental Medicine, Karolinska Institutet
http://www.imm.ki.se/biostatistics/
andrea.discacciati@ki.se
2014 Italian Stata Users Group meeting, 13th November 2014
Sections: Introduction; Methods and formulas; The penlogit command; Example; Conclusions
Running footer: Andrea Discacciati, Karolinska Institutet. Approximate Bayesian logistic regression via PLE with DA (slide 1 of 24)

Background (slide 2 of 24)
• Bayesian analyses are uncommon in epidemiological research
• Partly because of the absence of Bayesian methods from most basic courses in statistics...
• ...but also because of the misconception that they are computationally difficult and require specialized software
• However, approximate Bayesian analyses can be carried out using standard software for frequentist analyses (e.g. Stata)
• This can be done through penalized likelihood estimation, which in turn can be implemented via data augmentation

Aims of this presentation (slide 3 of 24)
• Introduce penalized likelihood (PL) estimation in the context of logistic regression
• Present a new Stata command (penlogit) that fits penalized logistic regression via data augmentation
• Show a practical example of a Bayesian analysis using penlogit

How to fit a Bayesian model (slide 4 of 24)
A partial list (in order of increasing "exactness"):
• Monte Carlo sensitivity analysis
• Inverse-variance weighting (information-weighted averaging)
• Penalized likelihood
• Posterior sampling (e.g. Markov chain Monte Carlo, MCMC)

Penalized log-likelihood (slide 5 of 24)
• A penalized log-likelihood (PLL) is a log-likelihood with a penalty function added to it

PLL for a logistic regression model:
$$\ln\left[L(\beta; x)\right] + P(\beta) = \sum_i \left\{ \ln\left[\operatorname{expit}\left(x_i^T \beta\right)\right] y_i + \ln\left[1 - \operatorname{expit}\left(x_i^T \beta\right)\right] (n_i - y_i) \right\} + P(\beta)$$

• $\beta = \{\beta_1, \ldots, \beta_p\}$ is the vector of unknown regression coefficients
• $\ln(L(\beta; x))$ is the log-likelihood of a standard logistic regression
• $P(\beta)$ is the penalty term
• The penalty $P(\beta)$ pulls or shrinks the final estimates away from the ML estimates, toward $m = \{m_1, \ldots, m_p\}$

Bayesian perspective (slide 6 of 24)
Link between PLL and Bayesian framework: we add the logarithm of the prior density function $f(\beta)$ as the penalty term $P(\beta)$ in the log-likelihood
• A prior for a parameter $\beta_i$ is a probability distribution that reflects one's uncertainty about $\beta_i$ before the data under analysis are taken into account
• Two extreme cases: priors with $+\infty$ variance and priors with 0 variance

Normal priors (slide 7 of 24)
• Normal priors for $\beta_i$ (ln(OR)): $\beta_i \sim N(m_i, v_i)$
• These priors are symmetric and unimodal
• $m_i$ = mean = median = mode
• Amount of background information controlled by the variance $v_i$
• Equivalently, these are log-normal priors on the OR scale ($\exp(\beta_i)$)

Penalty function:
$$P(\tilde{\beta}) = -\frac{1}{2} \sum_{j=1}^{q} \frac{1}{v_j} (\beta_j - m_j)^2$$
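As a minimal sketch of the normal penalty above, the following Python snippet evaluates $P$ and checks that it differs from the log of the normal prior density only by a constant that does not depend on $\beta$. The function names `normal_penalty` and `normal_log_density` are illustrative and are not part of penlogit; the prior N(ln 4, 0.5) is used only as an example value.

```python
import math

def normal_penalty(beta, m, v):
    """Penalty -1/2 * sum_j (beta_j - m_j)^2 / v_j for independent normal priors."""
    return -0.5 * sum((b - mj) ** 2 / vj for b, mj, vj in zip(beta, m, v))

def normal_log_density(beta, m, v):
    """Log of the product of N(m_j, v_j) densities evaluated at beta."""
    return sum(-0.5 * math.log(2 * math.pi * vj) - 0.5 * (b - mj) ** 2 / vj
               for b, mj, vj in zip(beta, m, v))

# Illustrative values: a single coefficient with prior N(ln 4, 0.5)
m, v = [math.log(4)], [0.5]
for beta in ([0.0], [1.0], [3.0]):
    gap = normal_log_density(beta, m, v) - normal_penalty(beta, m, v)
    # The gap is the same for every beta: it is the normalizing constant
    print(beta[0], round(normal_penalty(beta, m, v), 4), round(gap, 4))
```

Because the gap is constant in $\beta$, maximizing the log-likelihood plus this penalty gives the same estimate as maximizing the log-likelihood plus the log prior density.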

Generalized log-F priors (slide 8 of 24)
• Characterized by 4 parameters: $\beta_i \sim \text{log-F}(m_i, df_{1,i}, df_{2,i}, s_i)$
• These priors are unimodal (mode $m_i$), but can be skewed (by increasing the difference between $df_{1,i}$ and $df_{2,i}$)
• Log-F priors are more flexible than normal priors and are useful, for example, when prior information is directional
[Figure: prior densities plotted against β, horizontal axis from −6 to 6]

Posterior distribution (slide 9 of 24)
Posterior distribution and PLL: the PLL is, apart from an additive constant, equal to the logarithm of the posterior distribution of $\beta$ given the data
• In terms of PL: $PL(\beta; x) \propto f(\beta \mid x) = k \times L(\beta; x) \times \prod_j f_j(\beta_j)$
• The maximum PL estimate of $\beta$ ($\beta_{\text{post}}$) is the maximum a posteriori estimate
• The $100(1 - \alpha)\%$ Wald confidence limits are the approximate posterior limits, i.e. the $\alpha/2$ and $(1 - \alpha/2)$ quantiles of the posterior distribution
• If the profile PLL of $\beta_i$ is not closely quadratic, it is better to use penalized profile-likelihood limits to approximate posterior limits

Data-augmentation priors (DAPs) (slide 10 of 24)
• An algebraically equivalent way of maximizing the PLL is to use DAPs
• Prior distributions on the parameters are represented by prior data records created ad hoc
• Prior data records generate a penalty function that imposes the desired priors on the model parameters
• Estimation is carried out using standard ML machinery on the augmented dataset (i.e. original and DAP records)
Advantage of PL estimation via DAPs: by translating prior distributions to equivalent data, DAPs are one way of understanding the logical strength of the imposed priors
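To make the idea of a prior data record concrete, here is a small sketch for one special case that is assumed here for illustration: a log-F prior with equal degrees of freedom (df1 = df2 = df), mode 0 and scale 1. Its density is proportional to $e^{df\,\beta/2} / (1 + e^{\beta})^{df}$, which is also the likelihood contribution of a single binomial record with df/2 successes and df/2 failures, covariate value 1 and no intercept. The function names are hypothetical, and the snippet does not claim to describe how penlogit builds its records internally.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def logf_log_kernel(beta, df):
    """Log of the log-F(df, df) density at beta, up to an additive constant."""
    return 0.5 * df * beta - df * math.log1p(math.exp(beta))

def prior_record_loglik(beta, successes, failures):
    """Binomial log-likelihood of one record with x = 1 and no intercept."""
    p = expit(beta)
    return successes * math.log(p) + failures * math.log(1.0 - p)

df = 2.0  # assumed degrees of freedom, chosen only for illustration
diffs = []
for beta in (-2.0, -0.5, 0.0, 1.0, 3.0):
    # Same function of beta, so the difference is zero up to rounding
    diff = logf_log_kernel(beta, df) - prior_record_loglik(beta, df / 2, df / 2)
    diffs.append(diff)
    print(f"beta={beta:+.1f}  kernel - record log-lik = {diff:.2e}")
```

Appending such a record to the dataset and running ordinary ML therefore adds the log prior density to the log-likelihood, which is the penalty described above.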

penlogit: a brief overview (slide 11 of 24)
Description: penlogit provides estimates for the penalized logistic model, whose PLL was defined on slide 5 (Penalized log-likelihood), using data augmentation priors
• Specify a binary outcome and one or more covariates
• Priors can be imposed using the nprior and lfprior options
• Penalized profile-likelihood limits can be obtained with the ppl option
• net install penlogit, from(http://www.imm.ki.se/biostatistics/stata/)

The data (slide 12 of 24)
• Data from a study of obstetric care and neonatal death (n = 2992)
• The full dataset includes a total of 14 covariates
• Univariate analysis: hydramnios during pregnancy as the exposure

                        Hydramnios
                       X = 1    X = 0    Total
    Deaths (Y = 1)         1       16       17
    Survivals (Y = 0)      9    2,966    2,975
    Total                 10    2,982    2,992

• Sparse data (only one exposed case)

Frequentist analysis (slide 13 of 24)
• No explicit prior on $\beta_{\text{hydram}}$
• This corresponds to an implicit prior $N(0, +\infty)$
• This prior gives equal odds on OR = $10^{-100}$, OR = 1 or OR = $10^{100}$

    Logistic regression                               Number of obs   =       2992
    ------------------------------------------------------------------------------
           death |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          hydram |   3.025156   1.083489     2.79   0.005     .9015571    5.148755
    ------------------------------------------------------------------------------
           death |      Coef.   Std. Err.     [95% PLL Conf. Int.]
    -------------+-----------------------------------------------
          hydram |   3.025156   1.199495     .0819808    4.783916

• OR = 20.6 (95% profile-likelihood C.I.: 1.08, 119)
• Profile-likelihood function for $\beta_{\text{hydram}}$ is strongly asymmetrical
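With a single binary covariate the logistic model is saturated, so the ML coefficient is the log of the crude odds ratio from the 2x2 table and its Wald standard error is the square root of the sum of the reciprocal cell counts. The short Python sketch below recomputes the Wald row of the output from the four cell counts; the variable names are illustrative, and the profile-likelihood limits are not reproduced here.

```python
import math

# Cell counts from the 2x2 table (hydramnios by neonatal death)
a, b = 1, 16      # deaths:    exposed (X = 1), unexposed (X = 0)
c, d = 9, 2966    # survivors: exposed (X = 1), unexposed (X = 0)

log_or = math.log((a * d) / (b * c))     # ML estimate of beta_hydram
se = math.sqrt(1/a + 1/b + 1/c + 1/d)    # Wald standard error of the log OR
z = 1.959964                             # 97.5th percentile of N(0, 1)
lower, upper = log_or - z * se, log_or + z * se

print(f"coef = {log_or:.6f}, se = {se:.6f}")
print(f"95% Wald limits: ({lower:.6f}, {upper:.6f})")
print(f"OR = {math.exp(log_or):.1f}")
```

The dominant term in the standard error is 1/a = 1 from the single exposed case, which is why the estimate is so imprecise.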

Specifying the prior for $\beta_{\text{hydram}}$ (slide 14 of 24)
• Normal prior on $\beta_{\text{hydram}}$
• Prior information was expressed in terms of 95% prior limits on the OR scale: (1, 16)
• Under normality, it is easy to calculate the corresponding hyperparameters $m_{\text{hydram}}$ and $v_{\text{hydram}}$ that yield those 95% prior limits
• $\beta_{\text{hydram}} \sim N(\ln(4), 0.5)$
• Semi-Bayes analysis because we do not impose a prior on the intercept $\beta_0$
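The hyperparameters follow from the prior limits: the mean is the midpoint of the limits on the log scale, and the standard deviation is their width divided by 2 × 1.96. The Python sketch below does this calculation and then maximizes the penalized log-likelihood for the 2x2 data directly with Newton-Raphson, leaving the intercept unpenalized. This is an illustration under stated assumptions, not penlogit itself: it maximizes the PLL directly rather than through data augmentation, all names are hypothetical, and its printed result is not taken from the presentation.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

# Normal prior on beta_hydram from 95% prior limits (1, 16) on the OR scale
lo, hi = math.log(1.0), math.log(16.0)
m = (lo + hi) / 2.0                 # prior mean: ln(4)
v = ((hi - lo) / (2 * 1.96)) ** 2   # prior variance: about 0.5

# Grouped data: (x, deaths, total) for exposed and unexposed
data = [(1, 1, 10), (0, 16, 2982)]

def score_and_information(b0, b1):
    """Gradient and negative Hessian of the penalized log-likelihood."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y, n in data:
        p = expit(b0 + b1 * x)
        w = n * p * (1.0 - p)
        g0 += y - n * p
        g1 += x * (y - n * p)
        h00 += w
        h01 += x * w
        h11 += x * x * w
    g1 -= (b1 - m) / v      # derivative of the normal penalty (slope only)
    h11 += 1.0 / v          # curvature contributed by the prior
    return (g0, g1), (h00, h01, h11)

# Newton-Raphson, starting from the intercept-only fit
b0, b1 = math.log(17 / 2975), 0.0
for _ in range(50):
    (g0, g1), (h00, h01, h11) = score_and_information(b0, b1)
    det = h00 * h11 - h01 * h01
    step0 = (h11 * g0 - h01 * g1) / det   # first row of H^{-1} g
    step1 = (h00 * g1 - h01 * g0) / det   # second row of H^{-1} g
    b0, b1 = b0 + step0, b1 + step1
    if max(abs(step0), abs(step1)) < 1e-10:
        break

print(f"m = {m:.4f}, v = {v:.4f}")
print(f"posterior mode of beta_hydram = {b1:.4f}, OR = {math.exp(b1):.2f}")
```

The resulting estimate lies between the prior mean ln(4) and the ML estimate, which is the shrinkage behaviour the penalty is designed to produce.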
