SLIDE 1

Combining Boosting with Trees for the KDD Cup 2009

dmclab@i6.informatik.rwth-aachen.de
June 28, 2009
Human Language Technology and Pattern Recognition, Lehrstuhl für Informatik 6
Computer Science Department, RWTH Aachen University, Germany

RWTH Report: KDD 2009 1 KDD 2009 June 28, 2009

SLIDE 2

Outline

◮ Task Description
◮ Preprocessing
    Missing Values
    Feature Generation & Selection
◮ Classification
    Boosted Decision Stumps
    Logistic Model Tree
◮ Combinations
    AUC-based optimizations
◮ Conclusions

SLIDE 3

The Small Challenge

◮ RWTH Aachen Data Mining Lab
    Organized since 2004
    This year: first participation in the KDD Cup, with eight students
◮ Slow track with small data set
    230 features (190 numerical and 40 categorical)
    32 days duration
◮ Best submission without unscrambling
    Ranked 35th in the final evaluation

SLIDE 4

Preprocessing: Missing Values

◮ Missing Value Ratio (MVR) for feature f: average number of samples per class and feature where f is not missing
◮ Missing-value handling chosen per feature, depending on its MVR:

    MVR of feature   Method
    High             SVM
    Moderate         Regression Tree
    Low              FIML
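The slide's MVR definition is terse; the sketch below shows one plausible reading (fraction of non-missing entries per class, averaged over classes), with `missing_value_ratio` as a hypothetical helper, not the authors' code:

```python
import numpy as np

def missing_value_ratio(X, y):
    """One plausible reading of the slide's MVR: for each feature,
    the fraction of non-missing entries within each class,
    averaged over the classes (hypothetical helper)."""
    per_class = [np.mean(~np.isnan(X[y == c]), axis=0) for c in np.unique(y)]
    return np.mean(per_class, axis=0)

# Toy data: feature 0 is missing in half of each class, feature 1 never.
X = np.array([[1.0, 1.0], [np.nan, 2.0], [3.0, 3.0], [np.nan, 4.0]])
y = np.array([0, 0, 1, 1])
mvr = missing_value_ratio(X, y)  # -> array([0.5, 1.0])
```

A high MVR then means the feature is mostly observed, a low MVR that it is mostly missing.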

SLIDE 5

Preprocessing: Features

◮ Generation of binary features
◮ Feature Selection: ranking based on information gain and likelihood ratio
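An information-gain ranking of the generated binary features can be sketched as follows (entropy in bits; `information_gain` is an illustrative helper, not the authors' implementation):

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a label vector, in bits."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, y):
    """H(y) minus the weighted entropy of y within each feature value."""
    gain = entropy(y)
    for v in np.unique(feature):
        mask = feature == v
        gain -= mask.mean() * entropy(y[mask])
    return gain

y = np.array([1, 1, 0, 0])
perfect = np.array([1, 1, 0, 0])   # predicts y exactly -> gain 1 bit
useless = np.array([1, 0, 1, 0])   # independent of y   -> gain 0 bits
ranked = sorted([("perfect", information_gain(perfect, y)),
                 ("useless", information_gain(useless, y))],
                key=lambda kv: kv[1], reverse=True)
```

Features are then kept in decreasing order of gain; the likelihood-ratio ranking mentioned on the slide would replace the scoring function.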

SLIDE 6

Boosted Decision Stumps

◮ AdaBoost with one-level decision trees as weak learners
    Implemented in BoosTexter by Schapire and Singer [2000]
◮ Linear complexity in training: O(CN)
    C: number of classes
    N: number of training instances
◮ Best performance as a single classifier
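The original system used BoosTexter; as an analogous sketch, boosting one-level trees can be reproduced with scikit-learn (synthetic data, not the KDD Cup set):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the KDD Cup data.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Decision stumps (max_depth=1) as weak learners, boosted by AdaBoost.
stumps = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                            n_estimators=100, random_state=0)
stumps.fit(X, y)
scores = stumps.predict_proba(X)[:, 1]  # per-sample confidence scores
```

The per-sample scores are what the later combination step consumes.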

SLIDE 7

Logistic Model Tree

F_c^d: linear regression of observation vector x in node d for class c

β_i^d: regression coefficient for the i-th component of x in node d
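From these definitions, the node model is presumably the standard Logistic Model Tree form (Landwehr et al.), where the per-class linear regressions are turned into posteriors by a multinomial logistic transform. A hedged reconstruction, with the intercept β_0^d added and class indices on the coefficients suppressed as on the slide:

```latex
F^{d}_{c}(x) = \beta^{d}_{0} + \sum_{i} \beta^{d}_{i}\, x_{i},
\qquad
p(c \mid x) = \frac{\exp\!\big(F^{d}_{c}(x)\big)}{\sum_{c'} \exp\!\big(F^{d}_{c'}(x)\big)}
```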

SLIDE 8

AUC Split Criterion

◮ Introduced by Ferri et al. [2003] for trees with two classes (+) and (−)
◮ Each labeling of the leaves corresponds to one point in the ROC space
◮ Local Positive Accuracy of leaf l:

    LPA(l) = N_l^+ / (N_l^+ + N_l^-)

    N_l^c: number of training samples in leaf l assigned to class c

◮ Select the split point resulting in the largest AUC
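As an illustrative sketch (not the authors' implementation), choosing a numeric split by the AUC of the induced two-leaf labeling could look like this:

```python
import numpy as np
from scipy.stats import rankdata

def auc(scores, y):
    """Rank-based AUC (Mann-Whitney statistic); average ranks handle ties."""
    ranks = rankdata(scores)
    n_pos = int(y.sum())
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def leaf_scores(x, y, t):
    """Score each sample by the local positive accuracy
    N_l^+ / (N_l^+ + N_l^-) of the leaf (x <= t or x > t) it falls in."""
    scores = np.empty(len(x))
    for mask in (x <= t, x > t):
        if mask.any():
            scores[mask] = y[mask].mean()
    return scores

def best_split(x, y):
    """Select the threshold whose two-leaf labeling has the largest AUC."""
    return max(np.unique(x)[:-1], key=lambda t: auc(leaf_scores(x, y, t), y))

x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([0, 0, 0, 1, 1, 1])
split = best_split(x, y)  # -> 3, which perfectly separates (+) from (-)
```

Each candidate threshold yields one ROC point per leaf labeling; the threshold at 3 achieves AUC 1.0 here.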

SLIDE 9

Combinations

◮ Stacking: predictions of boosted decision stumps as features for the Logistic Model Tree
◮ Linear combinations of predictions, optimized on the AUC
    Weighted scores
    Weighted voting
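A minimal sketch of the weighted-scores variant, with a single weight chosen by grid search on the AUC (names and data are illustrative; the slide does not specify the optimizer):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def best_weight(s1, s2, y, grid=np.linspace(0.0, 1.0, 101)):
    """Weight w maximizing the AUC of the combined score w*s1 + (1-w)*s2."""
    return max(grid, key=lambda w: roc_auc_score(y, w * s1 + (1 - w) * s2))

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 500)
s1 = y + rng.normal(0.0, 0.8, 500)  # an informative classifier's scores
s2 = rng.normal(0.0, 1.0, 500)      # an uninformative one
w = best_weight(s1, s2, y)          # weight placed on the informative model
combined_auc = roc_auc_score(y, w * s1 + (1 - w) * s2)
```

Since the grid contains w = 1, the combined AUC can never fall below the best single model's AUC on the tuning data; weighted voting would apply the same search to thresholded votes instead of raw scores.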

SLIDE 10

Results

◮ AUC scores of single classifiers under cross-validation:

    Classifier               Appetency  Churn   Up-selling  Score
    Boosted decision stumps  0.8172     0.7254  0.8488      0.7971
    Logistic Model Tree      0.8176     0.7161  0.8450      0.7929
    Multilayer perceptron    0.8175     0.7049  0.7741      0.7655

◮ Combination of LMT, MLP and boosted decision stumps:

    Combination method  Appetency  Churn   Up-selling  Score
    Weighted scores     0.8256     0.7306  0.8493      0.8018
    Weighted votes      0.8225     0.7331  0.8515      0.8023

SLIDE 11

Conclusions

◮ Best performance: boosted decision stumps and Logistic Model Tree
◮ Combinations and stacking
◮ AUC-optimized combinations
◮ Results in the KDD Cup:

    Rank  Method          Appetency  Churn   Up-selling  Score
    35    LMT + AUCsplit  0.8268     0.7359  0.8615      0.8080
    36    Weighted votes  0.8204     0.7398  0.8621      0.8074

SLIDE 12

Thank you for your attention

Patrick Doetsch

patrick.doetsch@rwth-aachen.de
http://www-i6.informatik.rwth-aachen.de/

SLIDE 13

References

  • C. Ferri, P. A. Flach, and J. Hernández-Orallo. Improving the AUC of probabilistic estimation trees. In Proceedings of the 14th European Conference on Machine Learning, pages 121–132. Springer, 2003.

  • R. E. Schapire and Y. Singer. BoosTexter: A boosting-based system for text categorization. Machine Learning, 39(2/3):135–168, 2000.
