mboost - Componentwise Boosting for Generalised Regression Models
Thomas Kneib & Torsten Hothorn
Department of Statistics, Ludwig-Maximilians-University Munich
13.8.2008

Boosting in a Nutshell
- Boosting is a simple but versatile iterative stepwise gradient descent algorithm.
- Versatility: Estimation problems are described in terms of a loss function ρ (e.g. the
negative log-likelihood).
- Simplicity: Estimation reduces to iteratively fitting base-learners (e.g. regression trees) to residuals.
- Componentwise boosting yields
  – a structured model fit (interpretable results),
  – model choice and variable selection.
- Example: Estimation of a generalised linear model
  $E(y \mid \eta) = h(\eta), \qquad \eta = \beta_0 + x_1\beta_1 + \ldots + x_p\beta_p.$
- Employ the negative log-likelihood as the loss function ρ.
- Componentwise boosting algorithm:
  (i) Initialise the parameters (e.g. $\hat\beta_j \equiv 0$); set $m = 0$.
  (ii) Compute the negative gradients ('residuals')
       $u_i = -\left.\frac{\partial}{\partial \eta}\rho(y_i, \eta)\right|_{\eta = \hat\eta^{[m-1]}}, \qquad i = 1, \ldots, n.$
  (iii) Fit least-squares base-learners for all parameters, yielding
       $b_j = (X_j' X_j)^{-1} X_j' u,$
       and find the best-fitting one:
       $j^* = \operatorname*{argmin}_{1 \le j \le p} \sum_{i=1}^{n} (u_i - x_{ij} b_j)^2.$
  (iv) Update the estimates via
       $\hat\beta_{j^*}^{[m]} = \hat\beta_{j^*}^{[m-1]} + \nu b_{j^*}$ and $\hat\beta_j^{[m]} = \hat\beta_j^{[m-1]}$ for all $j \neq j^*$.
  (v) If $m < m_{\mathrm{stop}}$, increase $m$ by 1 and go back to step (ii).
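- For illustration, a minimal R sketch of this algorithm for the L2 loss, where the negative gradient equals the ordinary residuals; the function cwboost and all variable names are illustrative, not part of mboost:

  ## Componentwise boosting for the L2 loss (illustrative sketch).
  cwboost <- function(X, y, nu = 0.1, mstop = 100) {
    p    <- ncol(X)
    beta <- rep(0, p)                 # (i) initialise all coefficients at zero
    eta  <- rep(0, nrow(X))           #     current additive predictor
    for (m in seq_len(mstop)) {       # (v) iterate until mstop is reached
      u <- y - eta                    # (ii) negative gradients = residuals
      b <- colSums(X * u) / colSums(X^2)          # (iii) componentwise LS fits
      rss <- sapply(seq_len(p), function(j) sum((u - X[, j] * b[j])^2))
      jstar <- which.min(rss)                     #      best-fitting component
      beta[jstar] <- beta[jstar] + nu * b[jstar]  # (iv) weak update with step nu
      eta <- eta + nu * X[, jstar] * b[jstar]
    }
    beta
  }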
- The reduction factor ν turns the base-learner into a weak learning procedure, avoiding overly large steps along the gradient in the boosting algorithm.
- The componentwise strategy yields a structured model fit, since each update affects only a single regression coefficient.
- The most crucial point is determining the optimal stopping iteration $m_{\mathrm{stop}}$.
- The most common strategies are AIC reduction and cross-validation (see the sketch below).
- When the algorithm is stopped early, redundant covariate effects will never have been selected as the best-fitting component ⇒ these drop out of the model completely.
- Componentwise boosting with early stopping implements model choice and variable
selection.
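- Both stopping strategies are available in mboost; a hedged usage sketch (the data set dat and response y are placeholders, and the default Gaussian family is assumed for the corrected AIC):

  library(mboost)
  mod <- glmboost(y ~ ., data = dat, control = boost_control(mstop = 1000))
  ## AIC-based stopping:
  aic <- AIC(mod, method = "corrected")
  mstop(aic)              # iteration minimising the corrected AIC
  ## Cross-validation of the empirical risk:
  cvr <- cvrisk(mod)
  mod[mstop(cvr)]         # set the model to the optimal stopping iteration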
mboost
- mboost implements a variety of base-learners and boosting algorithms for generalised
regression models.
- Examples of loss functions: L2, L1, exponential family log-likelihoods, Huber, etc.
- Three model types:
  – glmboost for models with a linear predictor.
  – blackboost for prediction-oriented black-box models.
  – gamboost for models with additive predictors.
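- A hedged sketch of the three interfaces (formula, data set and family are placeholders):

  library(mboost)
  ## Linear predictor; the family object encodes the loss function
  ## (e.g. Gaussian() for L2, Laplace() for L1, Huber(), Binomial()):
  m1 <- glmboost(y ~ x1 + x2, data = dat, family = Binomial())
  ## Prediction-oriented black-box model based on trees:
  m2 <- blackboost(y ~ x1 + x2, data = dat, family = Binomial())
  ## Additive predictor with smooth effects:
  m3 <- gamboost(y ~ x1 + x2, data = dat, family = Binomial())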
- Various base-learning procedures:
  – bbs: penalised B-splines for univariate smoothing and varying coefficients.
  – bspatial: penalised tensor-product splines for spatial effects and interaction surfaces.
  – brandom: ridge regression for random intercepts and slopes.
  – btree: tree stumps for one or two variables.
  – further univariate smoothing base-learners: bss, bns.
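- These base-learners can be combined freely in a single formula; an illustrative sketch with placeholder variable names:

  library(mboost)
  m <- gamboost(y ~ bbs(x1)              # penalised B-spline of x1
                + bbs(x1, by = z)        # varying coefficient z * f(x1)
                + bspatial(lon, lat)     # bivariate tensor-product surface
                + brandom(id)            # ridge-penalised random intercept
                + btree(x2),             # tree stumps in x2
                data = dat)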
Penalised Least Squares Base-Learners
- Several of mboost's base-learning procedures are based on penalised least-squares fits.
- Characterised by the hat matrix
  $S_\lambda = X(X'X + \lambda K)^{-1} X'$
  with smoothing parameter $\lambda$ and penalty matrix $K$.
- Crucial: Choose the smoothing parameter appropriately.
- To avoid biased selection towards more flexible effects, all base-learners should be assigned comparable degrees of freedom
  $\mathrm{df}(\lambda) = \mathrm{trace}(X(X'X + \lambda K)^{-1} X') = \mathrm{trace}(S_\lambda).$
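- The degrees of freedom can be computed directly from the hat matrix; a small R sketch (the design matrix X, penalty matrix K, and the target value are illustrative):

  ## df(lambda) = trace(S_lambda) for a penalised least-squares base-learner
  df_lambda <- function(X, K, lambda) {
    S <- X %*% solve(crossprod(X) + lambda * K, t(X))
    sum(diag(S))
  }
  ## Choose lambda so that df(lambda) matches a target, e.g. 4 degrees of freedom:
  ## uniroot(function(l) df_lambda(X, K, l) - 4, c(1e-8, 1e8))$root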
- In many cases, a reparameterisation is required to achieve suitable values for the
degrees of freedom.
- Example: With penalised spline smoothing and a second-derivative penalty, the linear effect remains unpenalised ⇒ $\mathrm{df}(\lambda) \ge 2$.
- Decompose f(x) into a linear component and the deviation from the linear
component.
- Assign separate base-learners (with df = 1) to the linear effect and the deviation.
- Additional advantage: This makes it possible to decide whether a non-linear effect is required at all (see the sketch below).
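- In mboost, this decomposition can be written down directly; a hedged sketch (x, y and dat are placeholders):

  library(mboost)
  m <- gamboost(y ~ bols(x, intercept = FALSE)     # linear component, df = 1
                + bbs(x, center = TRUE, df = 1),   # centered deviation from linearity
                data = dat)
  ## If the centered spline base-learner is never selected,
  ## a purely linear effect of x is sufficient.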
Forest Health Example: Geoadditive Regression
- Aim of the study: Identify factors influencing the health status of trees.
- Database: Yearly visual forest health inventories carried out from 1983 to 2004 in a
northern Bavarian forest district.
- 83 observation plots of beeches within a 15 km × 10 km area.
- Response: binary defoliation indicator $y_{it}$ of plot i in year t (1 = defoliation higher than 25%).
- Spatially structured longitudinal data.
- Covariates:
  Continuous:
    – average age of trees at the observation plot
    – elevation above sea level in meters
    – inclination of slope in percent
    – depth of soil layer in centimeters
    – pH-value in 0–2 cm depth
    – density of forest canopy in percent
  Categorical:
    – thickness of humus layer in 5 ordered categories
    – base saturation in 4 ordered categories
  Binary:
    – type of stand
    – application of fertilisation
- Specification of a logit model
  $P(y_{it} = 1) = \frac{\exp(\eta_{it})}{1 + \exp(\eta_{it})}$
  with geoadditive predictor $\eta_{it}$.
- All continuous covariates are included with penalised spline base-learners decomposed into a linear component and the orthogonal deviation, i.e. $g(x) = x\beta + g_{\mathrm{centered}}(x)$.
- An interaction effect between age and calendar time is included in addition (centered
around the constant effect).
- The spatial effect is included both as a plot-specific random intercept and a bivariate
surface of the coordinates (centered around the constant effect).
- Categorical and binary covariates are included as least-squares base-learners (a model sketch follows below).
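- A hedged sketch of how such a model could be specified in mboost; all variable names are placeholders (not the original data set), and argument details may differ between mboost versions:

  library(mboost)
  m <- gamboost(defol ~
                  bols(age)    + bbs(age, center = TRUE, df = 1)
                + bols(canopy) + bbs(canopy, center = TRUE, df = 1)
                + bols(soil)   + bbs(soil, center = TRUE, df = 1)
                + bspatial(age, time, center = TRUE, df = 1)  # age-time interaction
                + bspatial(lon, lat, center = TRUE, df = 1)   # spatial surface
                + brandom(plot)                               # plot-specific random intercept
                + bols(stand) + bols(fert)                    # binary covariates
                + bols(humus) + bols(basesat),                # ordinal covariates
                data = forest, family = Binomial())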
- Results:
  – No effects of pH-value, inclination of slope, and elevation above sea level.
  – Parametric effects for type of stand, fertilisation, thickness of humus layer, and base saturation.
  – Nonparametric effects for canopy density and soil depth.
  – Both a spatially structured effect (surface) and an unstructured effect (random effect), with a clear domination of the latter.
  – An interaction effect between age and calendar time.
[Figure: estimated nonparametric effects of canopy density and depth of soil layer, together with the correlated spatial effect and the uncorrelated random effect.]
[Figure: interaction surface of calendar year and age of the tree.]
Summary
- Boosting provides both a structured model fit and a means of model choice and variable selection in generalised regression models.
- Simple approach based on iteratively fitting base-learners to negative gradients.
- Flexible class of base-learners based on penalised least squares.
- Implemented in the R package mboost (Hothorn & Bühlmann, with contributions by Kneib & Schmid).
- References:
  – Kneib, T., Hothorn, T. and Tutz, G. (2008): Model Choice and Variable Selection in Geoadditive Regression. To appear in Biometrics.
  – Bühlmann, P. and Hothorn, T. (2007): Boosting Algorithms: Regularization, Prediction and Model Fitting. Statistical Science, 22, 477–505.
- Find out more:
http://www.stat.uni-muenchen.de/~kneib