the two cultures: a discussion


the two cultures: a discussion. Katrin Newger. Supervisor: Christoph Jansen M.Sc. and Dipl.-Math. Georg Schollmeyer. June 27, 2015. Department of Statistics, LMU Munich.


  1. the two cultures: a discussion Katrin Newger Supervisor: Christoph Jansen M.Sc. and Dipl.-Math. Georg Schollmeyer June 27, 2015 Department of Statistics, LMU Munich

  2. table of contents 1. The Two Cultures 2. Breiman’s Argument 3. Discussion 4. Personal Impressions and Conclusion

  3. the two cultures

  4. nature

  5. data model Assumptions: ∙ Stochastic model ∙ Distribution of residuals ∙ Further model-specific assumptions

  6. algorithmic model Goal: Function f(x) that minimizes loss L(Y, f(x))
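
The goal on slide 6 can be sketched as empirical risk minimization: from a set of candidate functions, pick the one with the smallest average loss on the observed data. The sketch below assumes squared-error loss and a tiny, hypothetical family of lines through the origin; the names `squared_loss`, `empirical_risk`, and `candidates`, as well as the toy data, are illustrative and not taken from the presentation.

```python
from statistics import mean


def squared_loss(y, y_hat):
    """Squared-error loss L(y, f(x)); an assumed choice of loss."""
    return (y - y_hat) ** 2


def empirical_risk(f, xs, ys, loss=squared_loss):
    """Average loss of predictor f over the observed pairs (x, y)."""
    return mean(loss(y, f(x)) for x, y in zip(xs, ys))


# Hypothetical toy data, roughly following y = 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 1.9, 4.2, 5.8, 8.1]

# Hypothetical candidate set: lines through the origin with different slopes.
candidates = {slope: (lambda x, s=slope: s * x) for slope in (1.0, 1.5, 2.0, 2.5)}

# Evaluate every candidate and keep the one with the smallest empirical risk.
risks = {slope: empirical_risk(f, xs, ys) for slope, f in candidates.items()}
best_slope = min(risks, key=risks.get)
print(best_slope, risks[best_slope])
```

Real algorithmic models search far richer function classes, but the selection criterion is the same: the loss on data, with no assumption about how the data were generated beyond the sample itself.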

  7. examples for algorithmic models Methods: ∙ Support vector machines ∙ Random forests ∙ Artificial neural networks ∙ …

  8. breiman’s argument

  9. the data model: too simple a picture ∙ Critical model assumptions ∙ Conclusions about model, not about nature ∙ Wrong model → wrong conclusions about nature ∙ Algorithmic models only assume i.i.d. variables

  10. the model’s fit (1/3) “A few decades ago (…) the belief in data models was such that even simple precautions such as residual analysis or goodness-of-fit tests were not used” (Breiman 2001, p. 199)

  11. the model’s fit (2/3) ∙ Necessity of checking the model’s fit ∙ Discussion of the fit is superficial ∙ Most popular: goodness-of-fit tests, residual analysis

  12. the model’s fit (3/3) Goodness-of-Fit Tests ∙ Not useful if the direction of the alternative is not precisely defined ∙ Extreme discrepancy from the data is needed to reject Residual Analysis ∙ For more than four dimensions: interactions between variables → manipulation of residual plots Algorithmic modeling: cross-validation is standard procedure
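
Cross-validation, named on slide 12 as the standard procedure in algorithmic modeling, can be sketched as follows. This is a minimal k-fold version assuming squared-error loss, contiguous unshuffled folds, and a toy model (a least-squares line through the origin); the function names and the data are hypothetical and only serve the illustration.

```python
from statistics import mean


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_slope(xs, ys):
    """Least-squares slope for a line through the origin (toy model)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def cross_validated_mse(xs, ys, k=3):
    """Average held-out squared error over k folds."""
    fold_errors = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        # Fit only on the observations outside the current fold.
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        slope = fit_slope(train_x, train_y)
        # Score only on the held-out observations.
        fold_errors.append(mean((ys[i] - slope * xs[i]) ** 2 for i in test_idx))
    return mean(fold_errors)


# Hypothetical toy data, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
cv_mse = cross_validated_mse(xs, ys, k=3)
print(cv_mse)
```

In practice the observations would usually be shuffled before splitting; the contiguous folds here only keep the sketch short.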

  13. multiplicity of models ∙ No single model is able to trump the others ∙ Further problem: variable selection based on model ∙ Different models → different assumptions → different conclusions ∙ Algorithmic modeling: only i.i.d. assumption

  14. inference ∙ Testing at the 5% level is arbitrary (“suspect way to arrive at conclusions”, Breiman 2001, p. 203) ∙ Common assumption: n → ∞ never fulfilled ∙ Algorithmic modeling: no inference

  15. curse of dimensionality ∙ Originally: n ≫ p ↔ nowadays: p ≫ n ∙ Data models become too complex ∙ Common procedure: reducing dimensionality (e.g. principal component analysis) → loss of information ∙ Algorithmic modeling: the more variables the more information
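
The "loss of information" from dimensionality reduction on slide 15 can be made concrete as the share of total variance that the retained principal components fail to capture. The sketch below assumes only two variables, so the eigenvalues of the 2×2 sample covariance matrix have a closed form; the function names and the data are hypothetical.

```python
from math import sqrt
from statistics import mean


def covariance_matrix_2d(xs, ys):
    """Entries (var_x, cov_xy, var_y) of the sample covariance matrix."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    var_x = sum((x - mx) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - my) ** 2 for y in ys) / (n - 1)
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return var_x, cov_xy, var_y


def explained_variance_first_component(xs, ys):
    """Share of total variance captured by the first principal component."""
    var_x, cov_xy, var_y = covariance_matrix_2d(xs, ys)
    trace = var_x + var_y
    # Largest eigenvalue of a symmetric 2x2 matrix in closed form.
    gap = sqrt((var_x - var_y) ** 2 + 4 * cov_xy ** 2)
    largest = (trace + gap) / 2
    return largest / trace


# Hypothetical, strongly correlated toy data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.1, 3.9, 5.2]
share = explained_variance_first_component(xs, ys)
information_lost = 1 - share  # variance discarded when keeping one component
print(share, information_lost)
```

For strongly correlated variables almost nothing is lost; the loss grows as the discarded components carry more of the variance.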

  16. prediction ∙ Prediction is more important than interpretation, always ∙ If prediction is bad, how can interpretation be good? ∙ Breiman’s experience: algorithmic models are best predictors

  17. breiman’s conclusion ∙ It is everyone’s own choice which model is best: “The best solution could be an algorithmic model, or maybe a data model, or maybe a combination” (Breiman 2001, p. 206) ∙ Openness for new methods

  18. discussion

  19. bias–variance trade-off “[The Bias] has to be lurking somewhere inside the theory” (Brad Efron, in Breiman 2001, p. 219) ∙ In algorithmic modeling, small variance at cost of bias? ∙ Breiman avoids answer
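
The trade-off on slide 19 is usually stated through the decomposition of the expected squared prediction error at a fixed point x. The notation is an assumption made here, not taken from the slides: Y = f(x) + ε with E[ε] = 0 and Var(ε) = σ², the fitted function is written \hat f, and the expectation runs over training samples and the noise, which is taken to be independent of \hat f.

```latex
\mathbb{E}\big[(Y - \hat f(x))^2\big]
  = \underbrace{\sigma^2}_{\text{irreducible error}}
  + \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{squared bias}}
  + \underbrace{\mathbb{E}\big[(\hat f(x) - \mathbb{E}[\hat f(x)])^2\big]}_{\text{variance}}
```

A method that lowers the variance term leaves the total error unchanged or worse if the squared-bias term rises by at least as much, which is the concern behind Efron's remark.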

  20. multiplicity of models ∙ Does not concern prediction ∙ Occurs just as well in algorithmic models ∙ Main difference between models: distribution ∙ Breiman manipulates reader

  21. model assumptions ∙ Why not use known information (e.g. distribution)? ∙ Critical i.i.d. assumption in data models and algorithmic models ∙ Alternatives if i.i.d. assumption is violated?

  22. prediction versus interpretability ∙ Rivaling abilities of models ∙ Often interpretation required ∙ Prediction sometimes indirectly related to data “The whole point of science is to open up black boxes, understand their insides, and build better boxes for the purposes of mankind” (Brad Efron, in Breiman 2001, p. 219)

  23. personal impressions and conclusion

  24. references Leo Breiman. Statistical Modeling: The Two Cultures. Statistical Science 16 (3), 2001: 199–231. T. Hastie, R. Tibshirani and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Heidelberg: Springer, 2009.

  25. questions and discussion
