A Comparison of Covariate-based Prediction Methods for FIFA World Cups


  1. A Comparison of Covariate-based Prediction Methods for FIFA World Cups. A. Groll, Faculty of Statistics, TU Dortmund University (joint work with J. Abedieh, C. Ley, A. Mayr, T. Kneib, G. Schauberger, G. Tutz & H. Van Eetvelde). Zurich R User Group Meetup, October 25th, 2018, University of Zurich. A. Groll (TU Dortmund) Predicting International Soccer Tournaments 1 / 38

  2. Who will celebrate? Sources: youtube.com, EMAJ Magazine, youfrisky.com, Bailiwick Express

  3. Who will cry? Sources: youtube.com, pinterest, BBC, Daily Mail

  4. Theoretical Background

  5. Part I: Regression-based Methods

  6. Model for international soccer tournaments
     $y_{ijk} \mid x_{ik}, x_{jk} \sim \mathrm{Pois}(\lambda_{ijk})$, $i, j \in \{1, \ldots, n\}$, $i \neq j$
     $\lambda_{ijk} = \exp\big(\beta_0 + (x_{ik} - x_{jk})^\top \beta\big)$
     $n$: number of teams
     $y_{ijk}$: number of goals scored by team $i$ against opponent $j$ at tournament $k$
     $x_{ik}, x_{jk}$: covariate vectors of team $i$ and opponent $j$, varying over tournaments
     $\beta$: parameter vector of covariate effects
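The model on this slide can be illustrated with a minimal Python sketch (standard library only). All numbers below are hypothetical, and turning the two expected goal counts into win/draw/loss probabilities additionally assumes that both scores are independent given the covariates, which is an assumption of this sketch rather than something stated on the slide.

```python
import math

def expected_goals(beta0, beta, x_team, x_opponent):
    """lambda_ijk = exp(beta0 + (x_ik - x_jk)^T beta)."""
    linear = beta0 + sum(b * (xi - xj) for b, xi, xj in zip(beta, x_team, x_opponent))
    return math.exp(linear)

def poisson_pmf(y, lam):
    """Probability of exactly y goals under Pois(lam)."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

def outcome_probabilities(lam_i, lam_j, max_goals=15):
    """Win/draw/loss probabilities for team i.

    Assumes independent scores (sketch assumption) and truncates
    each score at max_goals.
    """
    win = draw = loss = 0.0
    for a in range(max_goals + 1):
        for b in range(max_goals + 1):
            p = poisson_pmf(a, lam_i) * poisson_pmf(b, lam_j)
            if a > b:
                win += p
            elif a == b:
                draw += p
            else:
                loss += p
    return win, draw, loss

# Hypothetical coefficients and covariates (not estimates from the talk).
beta0 = 0.1
beta = [0.02, -0.5]
x_i = [10.0, 0.2]   # covariates of team i
x_j = [0.0, 0.4]    # covariates of opponent j

lam_ij = expected_goals(beta0, beta, x_i, x_j)  # goals of i against j
lam_ji = expected_goals(beta0, beta, x_j, x_i)  # goals of j against i
win, draw, loss = outcome_probabilities(lam_ij, lam_ji)
print(f"lambda_ij={lam_ij:.3f}, lambda_ji={lam_ji:.3f}")
print(f"P(win)={win:.3f}, P(draw)={draw:.3f}, P(loss)={loss:.3f}")
```

Because only covariate differences enter the linear predictor, swapping the two teams flips the sign of the covariate term while the intercept stays the same.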

  9. Regularized estimation
     Maximize the penalized log-likelihood
     $l_p(\beta_0, \beta) = l(\beta_0, \beta) - \lambda J(\beta) = l(\beta_0, \beta) - \lambda \sum_{i=1}^{p} |\beta_i|$,
     with lasso penalty term (Tibshirani, 1996):
     $J(\beta) = \sum_{i=1}^{p} |\beta_i|$.
     Here $\lambda$ is the penalty (tuning) parameter, not the Poisson rate $\lambda_{ijk}$ of the model.
     The model can be estimated with the R package glmnet (Friedman et al., 2010).
     Versions used for: EURO 2012 (Groll and Abedieh, 2013); World Cup 2014 (Groll et al., 2015); EURO 2016 (Groll et al., 2018)
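To make the objective concrete, the following Python sketch evaluates the penalized Poisson log-likelihood for a given parameter vector. It is an illustration only: the data and coefficients are hypothetical, the function names are made up for this example, and glmnet's own scaling of the objective and its optimization routine are not reproduced here.

```python
import math

def poisson_loglik(beta0, beta, X_diff, y):
    """Poisson log-likelihood l(beta0, beta).

    X_diff holds the covariate differences x_ik - x_jk, one row per
    observation; y holds the corresponding goal counts.
    """
    total = 0.0
    for x, goals in zip(X_diff, y):
        eta = beta0 + sum(b * v for b, v in zip(beta, x))
        total += goals * eta - math.exp(eta) - math.lgamma(goals + 1)
    return total

def lasso_penalty(beta):
    """J(beta) = sum_i |beta_i|; the intercept beta0 is not penalized."""
    return sum(abs(b) for b in beta)

def penalized_loglik(beta0, beta, X_diff, y, lam):
    """l_p(beta0, beta) = l(beta0, beta) - lam * J(beta)."""
    return poisson_loglik(beta0, beta, X_diff, y) - lam * lasso_penalty(beta)

# Hypothetical toy data: three observations, two covariates.
X_diff = [[1.0, -0.5], [-1.0, 0.5], [0.3, 0.0]]
y = [2, 0, 1]
beta = [0.4, -0.2]

unpenalized = penalized_loglik(0.1, beta, X_diff, y, lam=0.0)
penalized = penalized_loglik(0.1, beta, X_diff, y, lam=2.0)
print(unpenalized, penalized)
```

A larger penalty parameter lowers the objective for any non-zero coefficient, which is what pushes some coefficients to exactly zero when the objective is maximized.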

  10. Part II: Ranking Methods

  11. Independent Poisson ranking model
     $Y_{ijm} \sim \mathrm{Pois}(\lambda_{ijm})$,
     $\lambda_{ijm} = \exp\big(\beta_0 + (r_i - r_j) + h \cdot I(\text{team } i \text{ playing at home})\big)$
     $n$: number of teams
     $M$: number of matches
     $y_{ijm}$: number of goals scored by team $i$ against opponent $j$ in match $m$
     $r_i, r_j$: strength / ability parameters of team $i$ and team $j$
     $h$: home effect; added if team $i$ plays at home

  13. Independent Poisson ranking model
     Likelihood function:
     $L = \prod_{m=1}^{M} \left( \frac{\lambda_{ijm}^{y_{ijm}}}{y_{ijm}!} \exp(-\lambda_{ijm}) \cdot \frac{\lambda_{jim}^{y_{jim}}}{y_{jim}!} \exp(-\lambda_{jim}) \right)^{w_{\text{type},m} \cdot w_{\text{time},m}}$,
     with weights
     $w_{\text{time},m}(t_m) = \left(\frac{1}{2}\right)^{t_m / \text{Half period}}$ and $w_{\text{type},m} \in \{1, 2, 3, 4\}$ (depending on the type of match).
     Different extensions exist, for example bivariate Poisson models. Ley et al. (2018) show that the bivariate Poisson model with a Half period of 3 years is best for prediction.
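The weighting scheme can be sketched in Python as follows. The logarithm of the likelihood above is a weighted sum of the per-match Poisson log-probabilities, which is what the code evaluates. Team names, ratings, the home effect, and the match records are hypothetical, and the dictionary layout is just one convenient choice for this sketch.

```python
import math

def time_weight(t_m, half_period):
    """w_time,m = (1/2)^(t_m / half_period): a match played one
    half period ago counts half as much as a match played today."""
    return 0.5 ** (t_m / half_period)

def rate(beta0, r_team, r_opp, h, at_home):
    """lambda = exp(beta0 + (r_team - r_opp) + h * I(team plays at home))."""
    return math.exp(beta0 + (r_team - r_opp) + (h if at_home else 0.0))

def log_pois(y, lam):
    """Log-probability of y goals under Pois(lam)."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)

def weighted_loglik(matches, ratings, beta0, h, half_period):
    """log L = sum_m w_type,m * w_time,m * (log P(y_ijm) + log P(y_jim))."""
    total = 0.0
    for m in matches:
        r_i, r_j = ratings[m["team_i"]], ratings[m["team_j"]]
        lam_ij = rate(beta0, r_i, r_j, h, m["i_home"])
        lam_ji = rate(beta0, r_j, r_i, h, m["j_home"])
        w = m["w_type"] * time_weight(m["years_ago"], half_period)
        total += w * (log_pois(m["y_ij"], lam_ij) + log_pois(m["y_ji"], lam_ji))
    return total

# Hypothetical data: two teams, two matches.
ratings = {"A": 0.3, "B": -0.1}
matches = [
    {"team_i": "A", "team_j": "B", "y_ij": 2, "y_ji": 1,
     "i_home": True, "j_home": False, "w_type": 4, "years_ago": 0.0},
    {"team_i": "B", "team_j": "A", "y_ij": 0, "y_ji": 0,
     "i_home": False, "j_home": False, "w_type": 1, "years_ago": 3.0},
]
ll = weighted_loglik(matches, ratings, beta0=0.2, h=0.25, half_period=3.0)
print(ll)
```

In a ranking application, the ratings, intercept, and home effect would be chosen to maximize this weighted log-likelihood; the sketch only evaluates it for fixed values.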

  14. Part III: Random Forests

  21. Random Forests
     ● introduced by Breiman (2001)
     ● principle: aggregation of a (large) number of classification / regression trees ⇒ can be used for both classification and regression purposes
     ● final predictions: single-tree predictions are aggregated, either by majority vote (classification) or by averaging (regression)
     ● the feature space is partitioned recursively; each partition has its own prediction
     ● find the split with the strongest difference between the two new partitions w.r.t. some criterion
     ● observations within the same partition should be as similar as possible, observations from different partitions very different (w.r.t. the response variable)
     ● a single tree is usually pruned (this lowers variance but increases bias)
     ● a tree can be visualized as a dendrogram
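The split search described in these bullets can be sketched in a few lines of Python. The sketch uses the within-partition sum of squared errors as the splitting criterion, which is an assumption made for illustration; the slides leave the criterion open ("some criterion"), and other tree implementations use different ones. The data are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y):
    """Find the threshold t for the split 'x <= t' vs. 'x > t' that
    minimizes the total within-partition SSE of the response y.

    Returns (threshold, total_sse), or None if x is constant.
    """
    best = None
    # The largest value is skipped: it would leave the right side empty.
    for threshold in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best

# Hypothetical data: difference in ranking position and goals scored.
rank_diff = [-30, -20, -16, -5, 0, 10, 25]
goals = [3, 2, 3, 1, 1, 0, 1]

threshold, total = best_split(rank_diff, goals)
print(threshold, round(total, 4))  # -16 1.4167
```

Applying the same search recursively inside each resulting partition yields a full regression tree; each final partition then predicts the mean response of its observations.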

  22. Dendrogram of regression tree
     [Figure: dendrogram. Node 1 splits on Rank (p < 0.001) at −15: Rank ≤ −15 leads to terminal Node 2 (n = 139); Rank > −15 leads to Node 3, which splits on Oddset (p = 0.003) at −0.003 into terminal Node 4 (Oddset ≤ −0.003, n = 213) and terminal Node 5 (Oddset > −0.003, n = 160). Each terminal node shows the distribution of the number of goals on a 0 to 8 axis.]
     Exemplary regression tree for FIFA World Cup 2002 – 2014 data using the function ctree from the R package party (Hothorn et al., 2006). Response: number of goals; predictors: only FIFA Rank and Oddset are used.

  24. Random Forests
     ● repeatedly grow different regression trees
     ● main goal: decrease the variance of the aggregated prediction ⇒ this requires decreasing the correlation between the single trees
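The idea of growing many different trees and averaging them can be sketched in Python as below. To keep the example short, each "tree" is a one-split stump grown on a bootstrap sample with one randomly chosen feature; real random forests grow deeper trees and draw a random subset of candidate features at every split. The data, function names, and tuning values are hypothetical.

```python
import random

def fit_stump(rows, targets, feature):
    """One-split regression tree on a single feature, chosen by
    minimal within-partition sum of squared errors."""
    def sse(vals):
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals)

    overall = sum(targets) / len(targets)
    best = None
    for thr in sorted({r[feature] for r in rows})[:-1]:
        left = [t for r, t in zip(rows, targets) if r[feature] <= thr]
        right = [t for r, t in zip(rows, targets) if r[feature] > thr]
        score = sse(left) + sse(right)
        if best is None or score < best[0]:
            best = (score, thr, sum(left) / len(left), sum(right) / len(right))
    if best is None:  # feature is constant in this bootstrap sample
        return lambda row: overall
    _, thr, left_mean, right_mean = best
    return lambda row: left_mean if row[feature] <= thr else right_mean

def fit_forest(rows, targets, n_trees=200, seed=1):
    """Grow n_trees stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        feature = rng.randrange(n_features)          # random feature choice
        trees.append(fit_stump([rows[i] for i in idx],
                               [targets[i] for i in idx], feature))
    return trees

def predict(trees, row):
    """Regression forest prediction: average of the single-tree predictions."""
    return sum(tree(row) for tree in trees) / len(trees)

# Hypothetical data: [rank difference, odds difference] -> average goals.
rows = [[-30, -0.20], [-18, -0.10], [-6, -0.02],
        [4, 0.01], [15, 0.08], [28, 0.15]]
targets = [2.8, 2.1, 1.7, 1.2, 0.9, 0.4]

forest = fit_forest(rows, targets)
strong = predict(forest, [-30, -0.20])
weak = predict(forest, [28, 0.15])
print(round(strong, 2), round(weak, 2))
```

Resampling the data and randomizing the feature choice make the single trees differ from each other, which lowers their correlation and hence the variance of the averaged prediction.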
