SLIDE 1

XGBOOST: A SCALABLE TREE BOOSTING SYSTEM

Advisor: Jia-Ling Koh
Speaker: Yin-Hsiang Liao
2018/04/17, from KDD 2016

SLIDE 2

Outline

• Introduction
• Method
• Experiment
• Conclusion

SLIDE 3

Introduction

• Regression tree: CART (splits chosen with the Gini index).
• Boosting: an ensemble method; an iterative procedure that adaptively changes the distribution of training examples. Example: AdaBoost.

SLIDE 4

Introduction

The most important factor behind XGBoost: scalability. It handles billions of examples.

SLIDE 5

Introduction

A practical choice:
• 17 out of 29 winning solutions on Kaggle in 2015 used XGBoost.
• All top-10 teams in KDDCup 2015 used XGBoost.
• T-Brain: used by the top-3 teams.
• Applications: ad click-through-rate prediction, malware classification, customer behavior prediction, etc.

SLIDE 6

Method

Tree ensemble model. The prediction is the sum of $K$ trees:
$\hat{y}_i = \phi(\mathbf{x}_i) = \sum_{k=1}^{K} f_k(\mathbf{x}_i), \quad f_k \in \mathcal{F}$
where each $f_k$ corresponds to a tree structure $q$ (mapping an example to a leaf index) and the leaf weights $w \in \mathbb{R}^T$ of a tree, i.e. $f(\mathbf{x}) = w_{q(\mathbf{x})}$.
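A minimal sketch of this prediction rule in Python (the Tree class below is a hypothetical illustration, not the library's data structure):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Tree:
        """A regression tree: internal nodes split on a feature; leaves hold weights."""
        feature: Optional[int] = None     # feature index to split on (None for a leaf)
        threshold: float = 0.0            # split threshold
        left: Optional["Tree"] = None
        right: Optional["Tree"] = None
        weight: float = 0.0               # leaf weight w_{q(x)}

        def predict(self, x):
            if self.feature is None:      # leaf: return its weight
                return self.weight
            child = self.left if x[self.feature] < self.threshold else self.right
            return child.predict(x)

    def ensemble_predict(trees, x):
        """y_hat = sum over k of f_k(x): add up the leaf weights from all K trees."""
        return sum(t.predict(x) for t in trees)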

SLIDE 7

Method

Regularized objective function:
$\mathcal{L}(\phi) = \sum_i l(\hat{y}_i, y_i) + \sum_k \Omega(f_k), \qquad \Omega(f) = \gamma T + \tfrac{1}{2} \lambda \lVert w \rVert^2$
where $l$ is a differentiable convex loss function and $\Omega$ penalizes model complexity: $T$ is the number of leaves, $w$ the weights on the leaves.

Objective function
SLIDE 8

Method

Gradient tree boosting: the model is trained in an additive manner, since the ensemble includes functions as parameters and the usual optimization methods in Euclidean space cannot be applied directly.

Objective function
SLIDE 9

Method

Additive training (Boosting)
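In formulas (as in the paper), step $t$ adds the tree $f_t$ that most improves the objective:
$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(\mathbf{x}_i), \qquad \mathcal{L}^{(t)} = \sum_i l\big(y_i,\ \hat{y}_i^{(t-1)} + f_t(\mathbf{x}_i)\big) + \Omega(f_t)$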

Objective function
SLIDE 10

Method

Taylor expansion:
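Expanding the loss to second order around $\hat{y}_i^{(t-1)}$ (as in the paper):
$\mathcal{L}^{(t)} \simeq \sum_i \big[ l(y_i, \hat{y}_i^{(t-1)}) + g_i f_t(\mathbf{x}_i) + \tfrac{1}{2} h_i f_t^2(\mathbf{x}_i) \big] + \Omega(f_t)$
where $g_i$ and $h_i$ are the first- and second-order gradients of the loss with respect to $\hat{y}_i^{(t-1)}$.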

Objective function
SLIDE 11

Method

Grouping the expanded objective by leaves (formula below):
$I_j = \{\, i \mid q(\mathbf{x}_i) = j \,\}$ : the instance set of leaf $j$ (the $\mathbf{x}_i$ that land in leaf $j$)
$T$ : the number of leaves
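Dropping the constant terms and summing leaf by leaf:
$\tilde{\mathcal{L}}^{(t)} = \sum_{j=1}^{T} \Big[ \big(\textstyle\sum_{i \in I_j} g_i\big) w_j + \tfrac{1}{2} \big(\textstyle\sum_{i \in I_j} h_i + \lambda\big) w_j^2 \Big] + \gamma T$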

Objective function
SLIDE 12

Method

For a fixed tree q, the optimal weight is:
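Setting the derivative with respect to each $w_j$ to zero:
$w_j^{*} = -\dfrac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$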

Objective function
SLIDE 13

Method

For a fixed tree structure $q$, the optimal weight of each leaf, and the corresponding optimal objective value (a quality score for the structure $q$), are:
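$w_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}, \qquad \tilde{\mathcal{L}}^{(t)}(q) = -\frac{1}{2} \sum_{j=1}^{T} \frac{\big(\sum_{i \in I_j} g_i\big)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T$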

Objective function
SLIDE 14

Method

Now, whenever the tree structure is known, we can compute its optimal value. The problem becomes: which tree is best? Since we cannot enumerate all structures, we grow the tree greedily, scoring each candidate split by the loss reduction of turning a parent leaf into a left subtree and a right subtree (formula below). The larger the reduction the better; it might even be negative. Greedy strategy.
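The loss reduction of a split, as given in the paper (left-subtree, right-subtree, and parent terms, with $I = I_L \cup I_R$):
$\mathcal{L}_{split} = \frac{1}{2}\left[ \frac{\big(\sum_{i \in I_L} g_i\big)^2}{\sum_{i \in I_L} h_i + \lambda} + \frac{\big(\sum_{i \in I_R} g_i\big)^2}{\sum_{i \in I_R} h_i + \lambda} - \frac{\big(\sum_{i \in I} g_i\big)^2}{\sum_{i \in I} h_i + \lambda} \right] - \gamma$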

Objective function
SLIDE 15

Method

Preventing overfitting further:

• Shrinkage: scale each newly added tree by a factor $\eta$ (see below).
• Subsampling (column): each tree is grown on a random subset of the features.
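$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + \eta\, f_t(\mathbf{x}_i)$ : the learning rate $\eta$ reduces the influence of each individual tree and leaves room for future trees to improve the model.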

Objective function
SLIDE 16

Method

• Basic exact greedy algorithm
• Approximate algorithm: global and local variants

Split Finding
SLIDE 17

Method

Basic Exact Greedy Algorithm:

Split Finding

When to stop?
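A sketch of exact greedy split finding for one node, in Python: scan every feature in sorted order and keep the split with the largest gain (the function and argument names are illustrative, not the library's implementation; g and h are the gradient statistics from slide 10, and the gain is the formula from slide 14):

    import numpy as np

    def best_split(X, g, h, lam=1.0, gamma=0.0):
        """Enumerate every split on every feature; return (gain, feature, threshold)."""
        n, m = X.shape
        G, H = g.sum(), h.sum()                        # parent statistics
        parent_score = G * G / (H + lam)
        best = (-np.inf, None, None)
        for k in range(m):
            order = np.argsort(X[:, k])                # sort instances by feature k
            GL = HL = 0.0
            for idx in range(n - 1):
                i = order[idx]
                GL += g[i]; HL += h[i]                 # grow the left statistics
                if X[order[idx], k] == X[order[idx + 1], k]:
                    continue                           # can only split between distinct values
                GR, HR = G - GL, H - HL
                gain = 0.5 * (GL * GL / (HL + lam)
                              + GR * GR / (HR + lam)
                              - parent_score) - gamma
                if gain > best[0]:
                    thr = 0.5 * (X[order[idx], k] + X[order[idx + 1], k])
                    best = (gain, k, thr)
        return best

    # One natural stopping rule: stop splitting when the best gain is <= 0,
    # since gamma makes unhelpful splits have negative loss reduction.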

SLIDE 18

Method

The exact greedy algorithm is good, since it enumerates all possible splits, but when the data cannot fit in memory, thrashing slows the whole system down. Hence the approximations:

Split Finding
SLIDE 19

Method

Local vs. global proposals:
• Global: candidate split points are proposed once, before building the tree; fewer proposal steps, but more candidate points are needed for the same accuracy.
• Local: candidates are re-proposed after each split, refining them as the tree grows.

Split Finding
SLIDE 20

Method

Weighted quantile sketch: candidate split points are chosen so that each interval carries the same total second-order weight $h_i$, i.e., the same "impact" on the objective function.
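This works because the second-order objective can be rewritten as a squared loss weighted by $h_i$. The rank function from the paper, over the pairs $D_k = \{(x_{ik}, h_i)\}$ of feature values and second-order gradients:
$r_k(z) = \frac{1}{\sum_{(x,h) \in D_k} h} \sum_{(x,h) \in D_k,\ x < z} h$
Candidates $\{s_{k1}, \dots, s_{kl}\}$ must satisfy $|r_k(s_{k,j}) - r_k(s_{k,j+1})| < \epsilon$, giving roughly $1/\epsilon$ candidate points per feature.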

Split Finding
SLIDE 21

Method

Sparsity-aware split finding. Possible reasons for sparsity:
• missing values
• frequent zero entries
• artifacts of feature engineering (like one-hot encoding)
Solution: a default direction in each node for the non-present entries.

Split Finding
SLIDE 22

Method

Split Finding

Sort criteria: entries with missing values go last, and only the non-missing entries are enumerated. Learn the best default direction for each feature by trying both directions and keeping the one with the higher gain.

SLIDE 23

Method

Non-presence is treated as a missing value, and the algorithm only visits the present entries. About 50x faster than the naive version on the Allstate dataset.
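A sketch of the default-direction idea for a single sparse feature column, in Python (the names and the score helper are illustrative; values and rows hold only the present entries):

    import numpy as np

    def score(GL, HL, GR, HR, lam=1.0):
        """Sum of the left and right leaf quality scores."""
        return GL * GL / (HL + lam) + GR * GR / (HR + lam)

    def best_split_sparse(values, rows, g, h, lam=1.0):
        G, H = g.sum(), h.sum()                    # totals over all instances
        Gp, Hp = g[rows].sum(), h[rows].sum()      # totals over present entries
        Gm, Hm = G - Gp, H - Hp                    # totals over missing entries
        best, best_dir = -np.inf, None
        GL = HL = 0.0
        for pos in np.argsort(values)[:-1]:        # scan present entries in value order
            GL += g[rows[pos]]; HL += h[rows[pos]]
            # missing entries sent right: left side holds only present entries seen so far
            s_right = score(GL, HL, G - GL, H - HL, lam)
            # missing entries sent left: add the missing totals to the left side
            s_left = score(GL + Gm, HL + Hm, Gp - GL, Hp - HL, lam)
            if max(s_right, s_left) > best:
                best = max(s_right, s_left)
                best_dir = "right" if s_right >= s_left else "left"
        return best, best_dir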

Split Finding
SLIDE 24

Method

The most time-consuming part of tree learning is sorting the data. XGBoost sorts only once, before training, and stores the result in an in-memory unit: the block.

System Design
SLIDE 25

Method

Each block stores the data in compressed sparse column (CSC) format, with each column sorted by feature value. Different blocks can be distributed across machines, or stored on disk in the out-of-core setting.
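A small illustration of the per-column layout using SciPy's CSC container (a simplification of the actual block structure): the sort is done once, and every later split search just scans each column linearly.

    import numpy as np
    from scipy.sparse import csc_matrix

    X = csc_matrix(np.array([[1.0, 0.0],
                             [3.0, 2.0],
                             [2.0, 0.0]]))

    # For each column: its present (row, value) pairs, pre-sorted by value.
    sorted_columns = []
    for k in range(X.shape[1]):
        start, end = X.indptr[k], X.indptr[k + 1]
        rows, vals = X.indices[start:end], X.data[start:end]
        order = np.argsort(vals)
        sorted_columns.append((rows[order], vals[order]))

    print(sorted_columns[0])   # rows [0, 2, 1], values [1., 2., 3.]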

System Design
SLIDE 26

Method

The block structure helps split finding, but it causes non-contiguous memory access: gradient statistics are fetched by row index in the order given by the sorted feature values. Solution: allocate an internal buffer in each thread, prefetch the gradient statistics into it, and accumulate in a mini-batch manner (cache-aware access).

System Design
SLIDE 27

Method

Block size matters (it bounds the maximum number of examples per block). Blocks that are too small result in a small workload for each thread and inefficient parallelization; blocks that are too large lead to cache misses. Balance! (The paper settles on 2^16 examples per block.)

System Design
SLIDE 28

Method

Out-of-core computation:
• Block compression: each column is compressed on disk and decompressed on the fly by an independent thread. Ex: [0, 2, 2, 0, 1, 2]
• Block sharding: the data is sharded across multiple disks, and a prefetch thread is assigned to each disk.

System Design
SLIDE 29

Experiment

The open-source package: github.com/dmlc/xgboost
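A minimal usage sketch with the Python package (the hyperparameter values here are arbitrary illustrations):

    import numpy as np
    import xgboost as xgb

    # Toy regression data.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

    dtrain = xgb.DMatrix(X, label=y)           # XGBoost's internal data structure
    params = {
        "objective": "reg:squarederror",       # differentiable convex loss l
        "eta": 0.3,                            # shrinkage (learning rate)
        "lambda": 1.0,                         # L2 penalty on leaf weights
        "gamma": 0.0,                          # penalty per leaf
        "colsample_bytree": 0.8,               # column subsampling
    }
    bst = xgb.train(params, dtrain, num_boost_round=20)
    preds = bst.predict(xgb.DMatrix(X))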

SLIDE 30

Experiment

Classification: R's GBM expands only one branch of a tree, while the other two systems (XGBoost and scikit-learn) expand the full tree.

SLIDE 31

Experiment

Learning to rank: compared against pGBRT, the best previously published system for this task. Note that pGBRT only supports the approximate algorithm.

SLIDE 32

Experiment

Out-of-core experiment: block compression gives about a 3x speedup, and sharding onto two disks gives a further 2x speedup.

SLIDE 33

Conclusion

The most important feature: scalability! Lessons from building XGBoost: sparsity-aware split finding, the weighted quantile sketch, cache-aware access, and parallelization.


Fin.