
Statistical Modelling, Helen Ogden & Antony Overstall, University of Southampton



1. Statistical Modelling
Helen Ogden & Antony Overstall, University of Southampton, © 2019
(Chapters 1–2 closely based on original notes by Anthony Davison, Jon Forster & Dave Woods)
APTS: Statistical Modelling April 2019 – slide 0

2. Statistical Modelling
1. Model Selection
2. Beyond the Generalised Linear Model
3. Non-linear models
APTS: Statistical Modelling April 2019 – slide 0

3. 1. Model Selection
APTS: Statistical Modelling April 2019 – slide 0

4. Overview
1. Basic ideas
2. Linear model
3. Bayesian inference
APTS: Statistical Modelling April 2019 – slide 1

5. Basic Ideas
Contents: Why model?; Criteria for model selection; Motivation; Setting; Logistic regression; Nodal involvement; Kullback–Leibler discrepancy; Log likelihood; Wrong model; Out-of-sample prediction; Information criteria; Nodal involvement; Theoretical aspects; Properties of AIC, NIC, BIC
APTS: Statistical Modelling April 2019 – slide 2

6. Why model?
George E. P. Box (1919–2013): 'All models are wrong, but some models are useful.'
– Some reasons we construct models:
  – to simplify reality (efficient representation);
  – to gain understanding;
  – to compare scientific, economic, ... theories;
  – to predict future events/data;
  – to control a process.
– We (statisticians!) rarely believe in our models, but regard them as temporary constructs subject to improvement.
– Often we have several and must decide which is preferable, if any.
APTS: Statistical Modelling April 2019 – slide 3

7. Criteria for model selection
– Substantive knowledge, from prior studies, theoretical arguments, dimensional or other general considerations (often qualitative)
– Sensitivity to failure of assumptions (prefer models that are robustly valid)
– Quality of fit: residuals, graphical assessment (informal), or goodness-of-fit tests (formal)
– Prior knowledge in the Bayesian sense (quantitative)
– Generalisability of conclusions and/or predictions: the same or similar models give good fits for many different datasets
– ... but often we have just one dataset ...
APTS: Statistical Modelling April 2019 – slide 4

8. Motivation
Even after applying these criteria (but also before!) we may compare many models:
– in linear regression with p covariates there are 2^p possible combinations of covariates (each in/out), before allowing for transformations, etc.; if p = 20 then we have a problem;
– choice of bandwidth h > 0 in smoothing problems;
– the number of different clusterings of n individuals is a Bell number (starting from n = 1): 1, 2, 5, 15, 52, 203, 877, 4140, 21147, 115975, ...;
– we may want to assess which among 5 × 10^5 SNPs on the genome may influence reaction to a new drug;
– ...
For reasons of economy we seek 'simple' models; a sketch of how fast these counts grow follows below.
APTS: Statistical Modelling April 2019 – slide 5
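The following Python sketch (illustrative only; the original slides contain no code) computes the 2^p subset count and the Bell numbers quoted above, using the Bell-triangle recurrence:

```python
def bell_numbers(n):
    """Bell numbers B_1..B_n (partitions of a set of size k) via the Bell triangle."""
    bells, row = [], [1]
    for _ in range(n):
        bells.append(row[-1])        # B_k is the last entry of row k
        nxt = [row[-1]]              # each new row starts with the previous row's last entry
        for v in row:
            nxt.append(nxt[-1] + v)  # ... and accumulates along the previous row
        row = nxt
    return bells

print(2 ** 20)           # 1048576 covariate subsets when p = 20
print(bell_numbers(10))  # [1, 2, 5, 15, 52, 203, 877, 4140, 21147, 115975]
```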

9. Albert Einstein (1879–1955)
[Photograph of Albert Einstein]
'Everything should be made as simple as possible, but no simpler.'
APTS: Statistical Modelling April 2019 – slide 6

10. William of Occam (?1288–?1348)
[Portrait of William of Occam]
Occam's razor: 'Entia non sunt multiplicanda sine necessitate': entities should not be multiplied beyond necessity.
APTS: Statistical Modelling April 2019 – slide 7

11. Setting
– To focus and simplify discussion we will consider parametric models, but the ideas generalise to semi-parametric and non-parametric settings.
– We shall take generalised linear models (GLMs) as examples of moderately complex parametric models.
  – The normal linear model has three key aspects:
    ◃ structure for covariates: linear predictor η = xᵀβ;
    ◃ response distribution: y ∼ N(µ, σ²); and
    ◃ the relation η = µ between µ = E(y) and η.
  – The GLM extends the last two aspects:
    ◃ y has density
        f(y; θ, φ) = exp[ {yθ − b(θ)}/φ + c(y; φ) ],
      where θ depends on η and the dispersion parameter φ is often known; and
    ◃ η = g(µ), where g is a monotone link function.
APTS: Statistical Modelling April 2019 – slide 8
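As a quick numerical check of this exponential-family form, here is a hypothetical Python snippet (not part of the original slides) verifying that the Bernoulli distribution fits it with θ the canonical parameter, φ = 1, b(θ) = log(1 + e^θ) and c(y; φ) = 0:

```python
import numpy as np

def glm_logpdf(y, theta, b, c, phi=1.0):
    """log f(y; theta, phi) = {y*theta - b(theta)}/phi + c(y; phi)."""
    return (y * theta - b(theta)) / phi + c(y, phi)

# Bernoulli pieces: b(theta) = log(1 + e^theta), c(y; phi) = 0, phi = 1.
b = lambda th: np.log1p(np.exp(th))
c = lambda y, phi: 0.0

theta = 0.7
pi = np.exp(theta) / (1 + np.exp(theta))            # mu = b'(theta)
for y in (0.0, 1.0):
    direct = y * np.log(pi) + (1 - y) * np.log(1 - pi)
    assert np.isclose(glm_logpdf(y, theta, b, c), direct)
print("Bernoulli log-density matches the GLM form")
```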

12. Logistic regression
– Commonest choice of link function for binary responses:
    Pr(Y = 1) = π = exp(xᵀβ) / {1 + exp(xᵀβ)},   Pr(Y = 0) = 1 / {1 + exp(xᵀβ)},
  giving a linear model for the log odds of 'success':
    log{Pr(Y = 1)/Pr(Y = 0)} = log{π/(1 − π)} = xᵀβ.
– The log likelihood for β based on independent responses y_1, ..., y_n with covariate vectors x_1, ..., x_n is
    ℓ(β) = Σ_j y_j x_jᵀβ − Σ_j log{1 + exp(x_jᵀβ)}.
– A good fit gives a small deviance D = 2{ℓ(β̃) − ℓ(β̂)}, where β̂ is the MLE under the fitted model and β̃ is the unrestricted MLE.
APTS: Statistical Modelling April 2019 – slide 9
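A minimal NumPy/SciPy sketch of this log likelihood and deviance, assuming ungrouped 0/1 responses so that the unrestricted (saturated) log likelihood ℓ(β̃) is zero; the helper names are illustrative, not from the notes:

```python
import numpy as np
from scipy.optimize import minimize

def loglik(beta, X, y):
    """ell(beta) = sum_j y_j x_j' beta - sum_j log{1 + exp(x_j' beta)}."""
    eta = X @ beta
    return y @ eta - np.logaddexp(0.0, eta).sum()   # stable log(1 + e^eta)

def deviance(X, y):
    """D = 2{ell(beta-tilde) - ell(beta-hat)}; ell(beta-tilde) = 0 for 0/1 data."""
    fit = minimize(lambda b: -loglik(b, X, y), np.zeros(X.shape[1]), method="BFGS")
    return 2.0 * (0.0 - (-fit.fun))                 # -fit.fun is ell(beta-hat)

# Toy check: intercept-only model for 3 successes in 5 trials.
X = np.ones((5, 1))
y = np.array([1.0, 1.0, 1.0, 0.0, 0.0])
print(deviance(X, y))   # approx 6.73
```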

13. Nodal involvement data
Table 1: Data on nodal involvement: 53 patients with prostate cancer, grouped by five binary covariates (age, stage, grade, xray, acid); of the m patients with each covariate combination, r have nodal involvement.

 m  r   age  stage  grade  xray  acid
 6  5    0     1      1     1     1
 6  1    0     0      0     0     1
 4  0    1     1      1     0     0
 4  2    1     1      0     0     1
 4  0    0     0      0     0     0
 3  2    0     1      1     0     1
 3  1    1     1      0     0     0
 3  0    1     0      0     0     1
 3  0    1     0      0     0     0
 2  0    1     0      0     1     0
 2  1    0     1      0     0     1
 2  1    0     0      1     0     0
 1  1    1     1      1     1     1
 ...
 1  1    0     0      1     0     1
 1  0    0     0      0     1     1
 1  0    0     0      0     1     0
APTS: Statistical Modelling April 2019 – slide 10

14. Nodal involvement deviances
Deviances D for the 32 logistic regression models for the nodal involvement data. + denotes a term included in the model.

age  st  gr  xr  ac   df      D
                      52   40.71
 +                    51   39.32
     +                51   33.01
         +            51   35.13
             +        51   31.39
                 +    51   33.17
 +   +                50   30.90
 +       +            50   34.54
 +           +        50   30.48
 +               +    50   32.67
     +   +            50   31.00
     +       +        50   24.92
     +           +    50   26.37
         +   +        50   27.91
         +       +    50   26.72
             +   +    50   25.25
 +   +   +            49   29.76
 +   +       +        49   23.67
 +   +           +    49   25.54
 +       +   +        49   27.50
 +       +       +    49   26.70
 +           +   +    49   24.92
     +   +   +        49   23.98
     +   +       +    49   23.62
     +       +   +    49   19.64
         +   +   +    49   21.28
 +   +   +   +        48   23.12
 +   +   +       +    48   23.38
 +   +       +   +    48   19.22
 +       +   +   +    48   21.27
     +   +   +   +    48   18.22
 +   +   +   +   +    47   18.07
APTS: Statistical Modelling April 2019 – slide 11
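A table like this could be reproduced along the following lines with statsmodels, assuming (hypothetically) the patient-level nodal involvement data sit in a file nodal.csv with one row per patient, 0/1 columns age, stage, grade, xray, acid, and a 0/1 response r:

```python
from itertools import combinations

import numpy as np
import pandas as pd
import statsmodels.api as sm

nodal = pd.read_csv("nodal.csv")   # hypothetical file: 53 rows, binary columns
covariates = ["age", "stage", "grade", "xray", "acid"]
y = nodal["r"].to_numpy()
n = len(y)

for k in range(len(covariates) + 1):
    for terms in combinations(covariates, k):
        # design matrix: intercept plus the chosen subset of binary covariates
        X = np.column_stack([np.ones(n)] + [nodal[t].to_numpy() for t in terms])
        fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()
        label = " + ".join(terms) if terms else "constant"
        print(f"{label:35s} df = {int(fit.df_resid):2d}   D = {fit.deviance:5.2f}")
```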
