
Boosting a Generalized Poisson Hurdle Model
Vera Hofer, University of Graz
Paris, 23/08/2010

Ensemble Techniques
◮ Aim at improving the predictive performance of fitting techniques by constructing multiple function predictions from the data by means of a “weak” base procedure, and then using a convex combination of them for the final aggregated prediction
◮ Random forests, boosting and bagging are the most famous ensemble techniques
◮ Originally designed for classification
◮ Gradient descent approximation in function space (Breiman, 1998, 1999) is an easy tool to use boosting in regression

Usual Regression
Let $Y \in \mathbb{R}$ be a random variable and $x \in \mathbb{R}^p$ a vector of predictor values.
Let $f$ be a regression function such that $\hat{Y} = f(x)$.
Let $L(Y, f(x))$ be the loss function that measures goodness of fit. For example, $L(Y, f(x)) = (Y - f(x))^2$, known as the $L_2$-loss.
The regression function $f$ is found by minimizing the expected loss:
\[ f(x) = \arg\min_{F} E_{Y|x}\big( L(Y, F(x)) \mid x \big) \]

Boosting
Boosting attempts to find a regression function $f$ of the form
\[ f(x) = \sum_{i=0}^{m} f_i(x) \]
by minimizing the expected loss using gradient descent techniques, i.e. following the steepest descent of the loss function with respect to $f$ in a forward stagewise manner.
The $f_i$ are simple functions of $x$ (“base learners”).
The choice of the loss function and of the type of base learner yields a variety of different boosted regression models.

Gradient Descent
Start with an initial function $f_0(x)$.
In step $m \ge 1$, the current argument $f_{m-1}$ is changed in the direction of the negative gradient of the expected loss
\[ U_m(x) = -\frac{\partial}{\partial f} E_{Y|x}\big( L(Y, f(x)) \mid x \big) \Big|_{f = f_{m-1}(x)} = E_{Y|x}\big( -\nabla L(Y, f) \big) \Big|_{f = f_{m-1}(x)} \]
such that
\[ f_m = f_{m-1} + \nu U_m , \]
where $\nabla L$ is the gradient of the loss function with respect to $f$, and $\nu$ is the shrinkage parameter.

Sample Version of Gradient Descent
$f_0$ is traditionally chosen as $f_0 = \arg\min_c \sum_{i=1}^{N} L(y_i, c)$.
The conditional mean of the negative gradient is found from regression:
− The negative gradient of the loss function, $V_i = -\nabla L(y_i, f_{m-1}(x_i))$, is evaluated at the given sample.
− This “pseudo-response” is fitted to the predictors $x_i$ by the “base learner” $u_m$ to get the direction $\hat{U}_m(x) = u_m(x)$.
− The regression function then becomes $f_m = f_{m-1} + \nu u_m$.
− The process is iterated until $m = M$.
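The loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the talk: it takes the $L_2$-loss scaled by 1/2, so that the negative gradient is simply the residual $y_i - f_{m-1}(x_i)$, and it uses componentwise linear least squares without intercept as base learner. The names `cls_base_learner` and `l2_boost`, as well as the simulated data, are hypothetical.

```python
import random


def cls_base_learner(X, v):
    """Componentwise linear least squares (assumed base learner).

    Fits v on each single column of X without intercept and returns the
    column index and slope with the smallest residual sum of squares.
    """
    best = None
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        sxx = sum(c * c for c in col)
        if sxx == 0.0:
            continue  # a constant-zero column carries no information
        beta = sum(c * vi for c, vi in zip(col, v)) / sxx
        rss = sum((vi - beta * c) ** 2 for c, vi in zip(col, v))
        if best is None or rss < best[2]:
            best = (j, beta, rss)
    if best is None:
        raise ValueError("all predictor columns are zero")
    return best[0], best[1]


def l2_boost(X, y, M=200, nu=0.1):
    """Boosting with the loss (y - f)^2 / 2, whose negative gradient is y - f."""
    n = len(y)
    f0 = sum(y) / n                    # arg min_c of sum_i L(y_i, c)
    coef = [0.0] * len(X[0])
    fit = [f0] * n
    for _ in range(M):
        v = [yi - fi for yi, fi in zip(y, fit)]   # pseudo-response V_i
        j, beta = cls_base_learner(X, v)          # direction u_m
        coef[j] += nu * beta                      # f_m = f_{m-1} + nu * u_m
        fit = [fi + nu * beta * row[j] for fi, row in zip(fit, X)]
    return f0, coef


# Simulated data (hypothetical): y depends on columns 0 and 2 only.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [1.0 + 2.0 * r[0] - 1.0 * r[2] + random.gauss(0, 0.1) for r in X]
f0, coef = l2_boost(X, y, M=300, nu=0.1)
```

Because only one coefficient is updated per step, stopping early (small $M$) leaves uninformative predictors at or near zero.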

Tuning Parameters
$M$ can be determined by cross-validation.
$\nu$ is of minor importance as long as it is not too large. Typically, $\nu = 0.1$. Smaller values of $\nu$ favor better test error but need a larger number of iterations.
As “base learner”, simple models such as regression trees or componentwise linear least squares (CLLS) are used. CLLS is very fast to compute, whereas trees can cope with nonlinear structures.

Count Data Regression
Common models: Poisson, negative binomial.
Alternative model: the generalised Poisson distribution (Consul and Jain, 1970; Consul, 1979).
To address overdispersion caused by an excess of zeros, zero-inflated models were introduced (Johnson and Kotz, 1969; Mullahy, 1986; Lambert, 1992).
− Derived from mixing a count distribution and a point mass at zero.
− Problem: different sources of zeros impede interpretation.
Alternative model: hurdle models consist of a hurdle component to account for zeros and a zero-truncated count component to account for non-zeros. The zero-truncated component may follow any zero-truncated count distribution.

Generalized Poisson Distribution of Y
Probability mass function $p(y \mid \mu, \phi)$ with mean $\mu$ and dispersion parameter $\phi$:
\[ p(y \mid \mu, \phi) = \frac{\mu \, W^{y-1} \, \phi^{-y} \, e^{-W/\phi}}{y!} \]
where $W = \mu + (\phi - 1) y$ and $\mu > 0$.
Assume $\phi > 1$. Otherwise $\phi$ must be restricted to guarantee that $p(y \mid \mu, \phi) \ge 0$.
$\phi > 1$ indicates overdispersion, whereas $\phi < 1$ indicates underdispersion. For $\phi = 1$ the GP reduces to the Poisson distribution.
Mean and variance of the GP are:
\[ E(Y) = \mu, \qquad Var(Y) = \phi^2 \mu \]
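As a quick numerical check of the formulas above, the following sketch evaluates the GP probabilities on a truncated support and compares the empirical total, mean and variance with $1$, $\mu$ and $\phi^2 \mu$. The function names `gp_logpmf` and `gp_pmf`, the parameter values and the truncation point 400 are choices made for this illustration only.

```python
import math


def gp_logpmf(y, mu, phi):
    """log p(y | mu, phi) with W = mu + (phi - 1) * y; assumes mu > 0, phi >= 1."""
    w = mu + (phi - 1.0) * y
    return (math.log(mu) + (y - 1) * math.log(w) - y * math.log(phi)
            - w / phi - math.lgamma(y + 1))


def gp_pmf(y, mu, phi):
    return math.exp(gp_logpmf(y, mu, phi))


mu, phi = 3.0, 1.5
ys = range(0, 400)                     # truncated support; the tail is negligible here
probs = [gp_pmf(y, mu, phi) for y in ys]

total = sum(probs)                                          # should be close to 1
mean = sum(y * p for y, p in zip(ys, probs))                # should be close to mu
var = sum((y - mean) ** 2 * p for y, p in zip(ys, probs))   # close to phi^2 * mu

# For phi = 1 the GP reduces to the Poisson distribution.
poisson_2 = math.exp(-3.0) * 3.0 ** 2 / 2.0
gp_2 = gp_pmf(2, 3.0, 1.0)
```

Working on the log scale with `math.lgamma` avoids overflow in $y!$ and $W^{y-1}$ for large counts.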

Generalized Poisson Hurdle Distribution (1)
Two-component model: a hurdle component to model zeros versus nonzeros, and a zero-truncated count component to account for the nonzeros.
The hurdle at zero is assumed to be a Bernoulli variable $B(\omega, 1)$ where $\omega = P(Y_0 = 0)$.
The zero-truncated component is $Y_T \sim GP_T(\mu, \phi, p)$ with probability mass function
\[ p_T(y \mid \mu, \phi) = \frac{p(y \mid \mu, \phi)}{1 - p(0 \mid \mu, \phi)} = \frac{p(y \mid \mu, \phi)}{1 - e^{-\mu/\phi}} , \]
where $p(y \mid \mu, \phi)$ is the GP probability mass function.

Generalized Poisson Hurdle Distribution (2)
Probability mass function of a generalised Poisson hurdle distribution (GPH):
\[ p_H(y \mid \mu, \phi, \omega) = 1_{(y=0)} \cdot \omega + 1_{(y>0)} \cdot (1 - \omega) \frac{p(y \mid \mu, \phi)}{1 - e^{-\mu/\phi}} \]
Mean and variance of the GPH are
\[ E(Z) = \frac{(1 - \omega)\mu}{1 - e^{-\mu/\phi}} \]
\[ Var(Z) = \frac{\phi^2 \mu (1 - \omega)}{1 - e^{-\mu/\phi}} + \frac{\mu^2 (1 - \omega)(\omega - e^{-\mu/\phi})}{(1 - e^{-\mu/\phi})^2} \]
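The closed-form mean and variance can be checked numerically against the probability function. The sketch below does this for one arbitrary parameter setting; the names `gp_pmf` and `gph_pmf`, the parameter values and the truncation point 800 are assumptions of this illustration.

```python
import math


def gp_pmf(y, mu, phi):
    """Generalized Poisson probability with W = mu + (phi - 1) * y."""
    w = mu + (phi - 1.0) * y
    return math.exp(math.log(mu) + (y - 1) * math.log(w) - y * math.log(phi)
                    - w / phi - math.lgamma(y + 1))


def gph_pmf(y, mu, phi, omega):
    """Hurdle version: mass omega at zero, rescaled zero-truncated GP above zero."""
    if y == 0:
        return omega
    return (1.0 - omega) * gp_pmf(y, mu, phi) / (1.0 - math.exp(-mu / phi))


mu, phi, omega = 2.0, 1.8, 0.4
ys = range(0, 800)                      # truncated support; the tail is negligible here
probs = [gph_pmf(y, mu, phi, omega) for y in ys]

total = sum(probs)
mean_num = sum(y * p for y, p in zip(ys, probs))
var_num = sum((y - mean_num) ** 2 * p for y, p in zip(ys, probs))

# Closed forms from the slide.
p0 = math.exp(-mu / phi)
mean_formula = (1.0 - omega) * mu / (1.0 - p0)
var_formula = (phi ** 2 * mu * (1.0 - omega) / (1.0 - p0)
               + mu ** 2 * (1.0 - omega) * (omega - p0) / (1.0 - p0) ** 2)
```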

Regression Model
\[ Y_i \overset{iid}{\sim} GPH(\mu_i, \phi_i, \omega_i) \]
\[ \log(\mu_i) = g(x_i) \]
\[ \log(\phi_i - 1) = h(x_i) \]
\[ \log\left( \frac{\omega_i}{1 - \omega_i} \right) = l(x_i) \]
where $x_i = (x_{i1}, \ldots, x_{ip})$ is a vector of predictor values.

Loss Function
The negative log-likelihood serves as the loss function for determining the predictors $g$, $h$, and $l$:
\[ L(Y, g, h, l) = -1_{(Y=0)} \left[ -\log\left(1 + e^{-l}\right) \right] - 1_{(Y>0)} \left[ -\log\left(1 + e^{l}\right) + g + (Y - 1)\log\left(e^g + e^h Y\right) - \log(Y!) - Y \log\left(1 + e^h\right) - \frac{e^g + e^h Y}{1 + e^h} - \log\left(1 - \exp\left(-\frac{e^g}{1 + e^h}\right)\right) \right] \]
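A minimal sketch of this loss in Python follows, assuming the links $\mu = e^g$, $\phi = 1 + e^h$ and $\omega = 1/(1 + e^{-l})$ from the regression model. The function `gph_loss` is written on the predictor scale; `gph_loss_direct` is a hypothetical helper that recomputes the same quantity from $(\mu, \phi, \omega)$ and serves only as a cross-check.

```python
import math


def gph_loss(y, g, h, l):
    """Negative log-likelihood of one observation on the predictor scale."""
    if y == 0:
        return math.log1p(math.exp(-l))             # -log(omega)
    eg, eh = math.exp(g), math.exp(h)
    w = eg + eh * y                                  # W = mu + (phi - 1) * y
    a = eg / (1.0 + eh)                              # mu / phi
    loglik = (-math.log1p(math.exp(l))               # log(1 - omega)
              + g + (y - 1) * math.log(w) - math.lgamma(y + 1)
              - y * math.log1p(eh) - w / (1.0 + eh)
              - math.log(-math.expm1(-a)))           # -log(1 - exp(-mu/phi))
    return -loglik


def gph_loss_direct(y, g, h, l):
    """Same loss computed from (mu, phi, omega); used here only as a check."""
    mu, phi = math.exp(g), 1.0 + math.exp(h)
    omega = 1.0 / (1.0 + math.exp(-l))
    if y == 0:
        return -math.log(omega)
    w = mu + (phi - 1.0) * y
    log_p = (math.log(mu) + (y - 1) * math.log(w) - y * math.log(phi)
             - w / phi - math.lgamma(y + 1))
    return -(math.log(1.0 - omega) + log_p - math.log(1.0 - math.exp(-mu / phi)))


g, h, l = 0.3, -0.2, 0.5
diffs = [abs(gph_loss(y, g, h, l) - gph_loss_direct(y, g, h, l)) for y in range(6)]
```

Using `math.log1p` and `math.expm1` keeps the evaluation stable when $e^{l}$ or $\mu/\phi$ is small.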

Boosting Generalized Poisson Hurdle Model (1)
Common boosting methods are based on a loss function that involves only one ensemble. Thus, they can only be applied when a regression function is fitted for a single parameter.
The GPH model requires estimating a regression function for each of the three parameters. When using ensemble techniques, three ensembles must be fitted simultaneously.
The loss function of the GPH model depends on three inter-related regression functions, $g$, $h$, and $l$. Thus, the gradient of the GPH boost is a three-component vector.

Boosting Generalized Poisson Hurdle Model (2)
At any step $m > 0$ the pseudo-responses $(V_i^g, V_i^h, V_i^l)$ of the three ensembles are obtained as the negative gradient of the loss function evaluated at the current values $(g_{m-1}, h_{m-1}, l_{m-1})$ of $g$, $h$ and $l$:
\[ (V_i^g, V_i^h, V_i^l) = \left( -\frac{\partial L}{\partial g}, -\frac{\partial L}{\partial h}, -\frac{\partial L}{\partial l} \right) \Bigg|_{(y_i, g_{m-1}, h_{m-1}, l_{m-1})} \]
where
\[ -\frac{\partial L}{\partial g} = 1_{(y>0)} \left[ 1 + \frac{(y - 1) e^g}{e^g + y e^h} - \frac{e^g}{1 + e^h} - \frac{\frac{e^g}{1 + e^h} \exp\left(-\frac{e^g}{1 + e^h}\right)}{1 - \exp\left(-\frac{e^g}{1 + e^h}\right)} \right] \]

Boosting Generalized Poisson Hurdle Model (3)
\[ -\frac{\partial L}{\partial h} = 1_{(y>0)} \left[ \frac{y (y - 1) e^h}{e^g + y e^h} - \frac{y e^h}{1 + e^h} - \frac{e^h (y - e^g)}{(1 + e^h)^2} + \frac{\frac{e^{g+h}}{(1 + e^h)^2} \exp\left(-\frac{e^g}{1 + e^h}\right)}{1 - \exp\left(-\frac{e^g}{1 + e^h}\right)} \right] \]
\[ -\frac{\partial L}{\partial l} = 1_{(y=0)} \left( \frac{1}{1 + e^{l}} \right) - 1_{(y>0)} \left( \frac{1}{1 + e^{-l}} \right) \]
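The three pseudo-responses can be sketched in Python and verified against central finite differences of the loss. This is an illustration only: the function names are hypothetical, and the $l$-component is written as obtained by differentiating the loss, i.e. $1/(1 + e^{l})$ for $y = 0$ and $-1/(1 + e^{-l})$ for $y > 0$.

```python
import math


def gph_loss(y, g, h, l):
    """Negative GPH log-likelihood with mu = e^g, phi = 1 + e^h, logit(omega) = l."""
    if y == 0:
        return math.log1p(math.exp(-l))
    eg, eh = math.exp(g), math.exp(h)
    w, a = eg + eh * y, eg / (1.0 + eh)
    return -(-math.log1p(math.exp(l)) + g + (y - 1) * math.log(w)
             - math.lgamma(y + 1) - y * math.log1p(eh) - w / (1.0 + eh)
             - math.log(-math.expm1(-a)))


def gph_neg_gradient(y, g, h, l):
    """Pseudo-responses (V^g, V^h, V^l) = -(dL/dg, dL/dh, dL/dl)."""
    if y == 0:
        return 0.0, 0.0, 1.0 / (1.0 + math.exp(l))
    eg, eh = math.exp(g), math.exp(h)
    w = eg + y * eh
    a = eg / (1.0 + eh)                        # mu / phi
    r = math.exp(-a) / (1.0 - math.exp(-a))    # exp(-a) / (1 - exp(-a))
    vg = 1.0 + (y - 1) * eg / w - a - a * r
    vh = (y * (y - 1) * eh / w - y * eh / (1.0 + eh)
          - eh * (y - eg) / (1.0 + eh) ** 2
          + eg * eh / (1.0 + eh) ** 2 * r)
    vl = -1.0 / (1.0 + math.exp(-l))
    return vg, vh, vl


def numeric_neg_gradient(y, g, h, l, eps=1e-6):
    """Central finite differences of the loss, negated."""
    dg = (gph_loss(y, g + eps, h, l) - gph_loss(y, g - eps, h, l)) / (2 * eps)
    dh = (gph_loss(y, g, h + eps, l) - gph_loss(y, g, h - eps, l)) / (2 * eps)
    dl = (gph_loss(y, g, h, l + eps) - gph_loss(y, g, h, l - eps)) / (2 * eps)
    return -dg, -dh, -dl


g, h, l = 0.3, -0.2, 0.5
max_err = max(
    abs(a_ - n_)
    for y in range(6)
    for a_, n_ in zip(gph_neg_gradient(y, g, h, l), numeric_neg_gradient(y, g, h, l))
)
```

For $y = 0$ only the $l$-component is non-zero, so zero observations carry information about the hurdle ensemble alone.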

Multivariate Componentwise Least Squares (1)
The three pseudo-responses are estimated by multivariate componentwise least squares (MCLLS).
The method assumes that all three ensembles have the same predictors.
In each boosting step only one predictor variable is selected, in the sense of Wilks’ lambda.
− Let $X^{(j)}$ be the $j$-th column of the design matrix, and let $V$ be the matrix with $i$-th row $(V_i^g, V_i^h, V_i^l)$.
− The “base learner” has the form $u_m(x) = \beta^{(s)} x^{(s)}$, where
\[ \beta^{(j)} = \left( \beta_g^{(j)}, \beta_h^{(j)}, \beta_l^{(j)} \right) = \| X^{(j)} \|^{-2} (X^{(j)})^t V \]

Multivariate Componentwise Least Squares (2)
The predictor index $s$ is chosen as
\[ s = \arg\min_{1 \le j \le p} \frac{\det\left( V^t V - (\beta^{(j)})^t (X^{(j)})^t V \right)}{\det\left( V^t V - n \bar{V}^t \bar{V} \right)} \]
where $\bar{V}$ is the mean gradient, and $n$ stands for the sample size.
This yields the coefficient $\beta_g^{(s)}$ for the $\mu$-ensemble $g$, $\beta_h^{(s)}$ for the $\phi$-ensemble $h$, and $\beta_l^{(s)}$ for the $\omega$-ensemble $l$.
Then the three ensembles are updated as
\[ g_m(x) = g_{m-1}(x) + \nu \beta_g^{(s_m)} x^{(s_m)} , \]
\[ h_m(x) = h_{m-1}(x) + \nu \beta_h^{(s_m)} x^{(s_m)} , \]
\[ l_m(x) = l_{m-1}(x) + \nu \beta_l^{(s_m)} x^{(s_m)} , \]
where $s_m$ is the predictor selected in step $m$.
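One MCLLS selection and update step might look as follows in plain Python. This is a sketch under assumptions rather than the author's code: `det3`, `mclls_select` and `mclls_update` are hypothetical names, and the pseudo-response matrix `V` is simulated so that the third predictor drives all three components.

```python
import random


def det3(a):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def mclls_select(X, V):
    """Return (s, beta, ratio): the column minimizing the determinant ratio."""
    n, p = len(X), len(X[0])
    vtv = [[sum(V[i][a] * V[i][b] for i in range(n)) for b in range(3)]
           for a in range(3)]
    vbar = [sum(V[i][a] for i in range(n)) / n for a in range(3)]
    total = det3([[vtv[a][b] - n * vbar[a] * vbar[b] for b in range(3)]
                  for a in range(3)])
    best = None
    for j in range(p):
        xtv = [sum(X[i][j] * V[i][a] for i in range(n)) for a in range(3)]
        sxx = sum(X[i][j] ** 2 for i in range(n))
        beta = [c / sxx for c in xtv]              # ||X_j||^-2 X_j^t V
        resid = [[vtv[a][b] - beta[a] * xtv[b] for b in range(3)]
                 for a in range(3)]                # V^t V - beta^t X_j^t V
        ratio = det3(resid) / total
        if best is None or ratio < best[2]:
            best = (j, beta, ratio)
    return best


def mclls_update(X, fits, s, beta, nu=0.1):
    """Update the rows (g_i, h_i, l_i) with the selected predictor s."""
    return [[f[a] + nu * beta[a] * row[s] for a in range(3)]
            for f, row in zip(fits, X)]


# Simulated pseudo-responses (hypothetical): column 2 drives all three components.
random.seed(1)
X = [[random.gauss(0, 1) for _ in range(4)] for _ in range(100)]
V = [[1.0 * r[2] + random.gauss(0, 0.1),
      -0.5 * r[2] + random.gauss(0, 0.1),
      2.0 * r[2] + random.gauss(0, 0.1)] for r in X]

s, beta, ratio = mclls_select(X, V)
fits = mclls_update(X, [[0.0, 0.0, 0.0] for _ in X], s, beta, nu=0.1)
```

The denominator does not depend on $j$, so it only rescales the criterion; it is kept here so that the returned ratio matches the expression above.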
