Statistics and learning: Big Data - Learning Decision Trees and an Introduction to Boosting

SLIDE 1

Statistics and learning: Big Data

Learning Decision Trees and an Introduction to Boosting

Sébastien Gadat

Toulouse School of Economics

February 2017

SLIDE 2

Keywords

◮ Decision trees
◮ Divide and Conquer
◮ Impurity measure, Gini index, Information gain
◮ Pruning and overfitting
◮ CART and C4.5

Contents of this class:

◮ The general idea of learning decision trees
◮ Regression trees
◮ Classification trees
◮ Boosting and trees
◮ Random Forests and trees

SLIDE 3

Introductory example

       Alt  Bar  F/S  Hun  Pat   Pri  Rai  Res  Typ      Dur  Wai
x1     Y    N    N    Y    0.38  $$$  N    Y    French    8   Y
x2     Y    N    N    Y    0.83  $    N    N    Thai     41   N
x3     N    Y    N    N    0.12  $    N    N    Burger    4   Y
x4     Y    N    Y    Y    0.75  $    Y    N    Thai     12   Y
x5     Y    N    Y    N    0.91  $$$  N    Y    French   75   N
x6     N    Y    N    Y    0.34  $$   Y    Y    Italian   8   Y
x7     N    Y    N    N    0.09  $    Y    N    Burger    7   N
x8     N    N    N    Y    0.15  $$   Y    Y    Thai     10   Y
x9     N    Y    Y    N    0.84  $    Y    N    Burger   80   N
x10    Y    Y    Y    Y    0.78  $$$  N    Y    Italian  25   N
x11    N    N    N    N    0.05  $    N    N    Thai      3   N
x12    Y    Y    Y    Y    0.89  $    N    N    Burger   38   Y

Please describe this dataset without any calculation.

SLIDE 4

Introductory example

(Same dataset as on Slide 3.)

Why is Pat a better indicator than Typ?
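A quick way to see it is to compare the entropy reduction (information gain) of the two attributes. Below is a minimal base-R sketch, with Pat discretized into the intervals used on the next slides; the helper names entropy and gain are mine, not from the course.

wai <- c("Y","N","Y","Y","N","Y","N","Y","N","N","N","Y")             # target: wait or not
pat <- c(0.38,0.83,0.12,0.75,0.91,0.34,0.09,0.15,0.84,0.78,0.05,0.89)
typ <- c("French","Thai","Burger","Thai","French","Italian",
         "Burger","Thai","Burger","Italian","Thai","Burger")

entropy <- function(y) {                         # Shannon entropy of a label vector
  p <- table(y) / length(y)
  -sum(p * log2(p))
}
gain <- function(f, y) {                         # entropy reduction when splitting y by factor f
  entropy(y) - sum(sapply(split(y, f), function(s) length(s) / length(y) * entropy(s)))
}

pat.bin <- cut(pat, breaks = c(0, 0.1, 0.5, 1))  # the Pat intervals of the next slides
gain(pat.bin, wai)      # large: the Pat intervals are close to pure in Y/N
gain(factor(typ), wai)  # zero: every restaurant type contains as many Y as N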

SLIDES 5-8

Deciding to wait... or not

[Figure, built up over four slides: a hand-made decision tree on the 12 examples.
 Root: all examples {1, ..., 12}, split on Pat:
   Pat in [0; 0.1]   -> examples {7, 11}              -> leaf "No"
   Pat in [0.1; 0.5] -> examples {8, 1, 3, 6}         -> leaf "Yes"
   Pat in [0.5; 1]   -> examples {10, 2, 5, 9, 4, 12}, split again on Dur:
     Dur < 40 -> examples {10, 4, 12} -> leaf "Yes"
     Dur > 40 -> examples {2, 5, 9}   -> leaf "No"]

SLIDE 9

The general idea of learning decision trees

Decision trees

Ingredients:

◮ Nodes

Each node contains a test on the features which partitions the data.

◮ Edges

The outcome of a node’s test leads to one of its child edges.

◮ Leaves

A terminal node, or leaf, holds a decision value for the output variable.

SLIDE 10

The general idea of learning decision trees

Decision trees

We will look at binary trees (⇒ binary tests) and single-variable tests.

◮ Binary attribute: node = attribute
◮ Continuous attribute: node = (attribute, threshold)

SLIDE 11

The general idea of learning decision trees

Decision trees

How does one build a good decision tree? For a regression problem? For a classification problem?

SLIDE 12

The general idea of learning decision trees

A little more formally

A tree with $M$ leaves describes a covering set of $M$ hypercubes $R_m$ in $X$. Each $R_m$ holds a decision value $\hat y_m$:

$$\hat f(x) = \sum_{m=1}^{M} \hat y_m \, \mathbf{1}_{R_m}(x)$$

Notation: $N_m = |\{x_i \in R_m\}| = \sum_{i=1}^{q} \mathbf{1}_{R_m}(x_i)$

SLIDE 13

The general idea of learning decision trees

The general idea: divide and conquer

Input: example set T, attributes x1, ..., xp

FormTree(T)
  1. Find the best split (j, s) over T               // Which criterion?
  2. If (j, s) = ∅:
       node = FormLeaf(T)                            // Which value for the leaf?
  3. Else:
       node = (j, s)
       split T according to (j, s) into (T1, T2)
       append FormTree(T1) to node                   // Recursive call
       append FormTree(T2) to node
  4. Return node
SLIDE 14

The general idea of learning decision trees

The general idea: divide and conquer


Remark

This is a greedy algorithm, performing local search.
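To make the divide-and-conquer idea concrete, here is a minimal R sketch of such a greedy grower for a regression problem (squared-error criterion, numeric features). The names form_tree, best_split and the stopping size min_size are mine, not part of the course material; the real packages add the stopping and pruning rules discussed on the next slides.

best_split <- function(X, y) {                        # exhaustive search for (j, s)
  best <- NULL
  for (j in seq_len(ncol(X))) {
    for (s in unique(X[[j]])) {
      left <- X[[j]] <= s
      if (!any(left) || all(left)) next               # reject degenerate splits
      cost <- sum((y[left]  - mean(y[left]))^2) +     # N1 * Q1 + N2 * Q2
              sum((y[!left] - mean(y[!left]))^2)
      if (is.null(best) || cost < best$cost) best <- list(j = j, s = s, cost = cost)
    }
  }
  best
}

form_tree <- function(X, y, min_size = 5) {
  split <- if (nrow(X) > min_size) best_split(X, y) else NULL
  if (is.null(split))                                 # FormLeaf: a constant decision value
    return(list(leaf = TRUE, value = mean(y)))
  left <- X[[split$j]] <= split$s                     # node = (j, s)
  list(leaf = FALSE, j = split$j, s = split$s,
       left  = form_tree(X[left,  , drop = FALSE], y[left],  min_size),   # recursive calls
       right = form_tree(X[!left, , drop = FALSE], y[!left], min_size))
}

# Example: a small regression tree for mpg against the other mtcars variables.
fit <- form_tree(mtcars[, -1], mtcars$mpg)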

SLIDE 15

The general idea of learning decision trees

The R point of view

Two packages for tree-based methods: tree and rpart.
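A hedged first contact with both packages, on a built-in dataset (iris) rather than the course data:

library(tree)
library(rpart)
fit.tree  <- tree(Species ~ ., data = iris)      # the 'tree' package
fit.rpart <- rpart(Species ~ ., data = iris)     # the 'rpart' package (CART)
plot(fit.tree);  text(fit.tree)
plot(fit.rpart); text(fit.rpart)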

SLIDES 16-19

Regression trees

Regression trees – criterion

We want to fit a tree to the data $\{(x_i, y_i)\}_{i=1..q}$ with $y_i \in \mathbb{R}$. Criterion? Sum of squares:

$$\sum_{i=1}^{q} \left( y_i - \hat f(x_i) \right)^2$$

Inside region $R_m$, the best $\hat y_m$ is the region average:

$$\hat y_m = \frac{1}{N_m} \sum_{x_i \in R_m} y_i = \overline{Y}_{R_m}$$

Node impurity measure:

$$Q_m = \frac{1}{N_m} \sum_{x_i \in R_m} (y_i - \hat y_m)^2$$
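A one-line check, not spelled out on the slide, of why the region average is the best constant prediction for the squared-error criterion:

$$\frac{\partial}{\partial c} \sum_{x_i \in R_m} (y_i - c)^2 = -2 \sum_{x_i \in R_m} (y_i - c) = 0
\quad\Longleftrightarrow\quad
c = \frac{1}{N_m} \sum_{x_i \in R_m} y_i = \overline{Y}_{R_m}.$$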

SLIDES 20-21

Regression trees

Regression trees – criterion

Best partition: hard to find. But locally, best split? Solve $\operatorname{argmin}_{j,s} C(j, s)$, where

$$C(j, s) = \min_{\hat y_1} \sum_{x_i \in R_1(j,s)} (y_i - \hat y_1)^2 \;+\; \min_{\hat y_2} \sum_{x_i \in R_2(j,s)} (y_i - \hat y_2)^2$$

$$\phantom{C(j, s)} = \sum_{x_i \in R_1(j,s)} \left( y_i - \overline{Y}_{R_1(j,s)} \right)^2 \;+\; \sum_{x_i \in R_2(j,s)} \left( y_i - \overline{Y}_{R_2(j,s)} \right)^2 \;=\; N_1 Q_1 + N_2 Q_2$$

SLIDE 22

Regression trees

Overgrowing the tree?

◮ Too small: rough average.
◮ Too large: overfitting.

(Same dataset as on Slide 3, without the Wai column.)

SLIDE 23

Regression trees

Overgrowing the tree?

Stopping criterion?

◮ Stop if $\min_{j,s} C(j, s) > \kappa$? Not good, because a good split might be hidden in deeper nodes.
◮ Stop if $N_m < n$? Good to avoid overspecialization.
◮ Prune the tree after growing: cost-complexity pruning.

Cost-complexity criterion:

$$C_\alpha = \sum_{m=1}^{M} N_m Q_m + \alpha M$$

Once a tree is grown, prune it to minimize $C_\alpha$.

◮ Each α corresponds to a unique cost-complexity optimal tree.
◮ Pruning method: weakest-link pruning, left to your curiosity.
◮ Best α? Through cross-validation.
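A hedged illustration with rpart on a built-in dataset (mtcars), not the course data; rpart's complexity parameter cp plays the role of α, up to an internal rescaling:

library(rpart)
big <- rpart(mpg ~ ., data = mtcars,
             control = rpart.control(cp = 0, minsplit = 2))    # grow a deliberately large tree
printcp(big)                                  # cross-validated error along the pruning sequence
best.cp <- big$cptable[which.min(big$cptable[, "xerror"]), "CP"]
pruned  <- prune(big, cp = best.cp)           # weakest-link pruning at the selected complexity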

SLIDES 24-25

Regression trees

Regression trees in a nutshell

◮ Constant values on the leaves.
◮ Growing phase: greedy splits that minimize the squared-error impurity measure.
◮ Pruning phase: weakest-link pruning that minimizes the cost-complexity criterion.

Further reading on regression trees:

◮ MARS: Multivariate Adaptive Regression Splines. Linear functions on the leaves.
◮ PRIM: Patient Rule Induction Method. Focuses on extrema rather than averages.

SLIDE 26

Regression trees

A bit of R before classification tasks

Let’s load the “Optical Recognition of Handwritten Digits” database.

> optical <- read.csv("optdigits.tra", sep=",", header=FALSE)
> colnames(optical)[65] <- "class"
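A quick sanity check (not on the slides) that the import went as expected:

> dim(optical)           # number of observations x 65 columns (64 pixel features + class)
> table(optical$class)   # how many examples of each digit 0-9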

SLIDES 27-28

Classification trees

Classification trees

Suppose $y_i \in \{\text{True}; \text{False}\}$. Let's fit a tree to $\{(x_i, y_i)\}_{i=1..q}$. Best $\hat y_m$ in node $m$?

Proportion of class $k$ observations in node $m$:

$$\hat p_{mk} = \frac{1}{N_m} \sum_{x_i \in R_m} \mathbf{1}(y_i = k)$$

Class of node $m$: $\hat y_m = \operatorname{argmax}_k \hat p_{mk}$

SLIDES 29-31

Classification trees

Classification trees

Node impurity measure?

Misclassification error:
$$Q_m = \frac{1}{N_m} \sum_{x_i \in R_m} \mathbf{1}(y_i \neq \hat y_m) = 1 - \hat p_{m \hat y_m}$$

Gini index (CART):
$$Q_m = \sum_{k \neq k'} \hat p_{mk} \, \hat p_{mk'} = \sum_{k=1}^{K} \hat p_{mk} (1 - \hat p_{mk})$$

Information or deviance (C4.5):
$$Q_m = - \sum_{k=1}^{K} \hat p_{mk} \log \hat p_{mk}$$

Splitting criterion? Minimize $N_1 Q_1 + N_2 Q_2$.

Pruning? Cost-complexity criterion (often using the misclassification error):

$$C_\alpha = \sum_{m=1}^{M} N_m Q_m + \alpha M$$
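For intuition, a small R sketch (the function names are mine) that evaluates the three impurity measures on the vector of class proportions of a node:

misclass <- function(p) 1 - max(p)                   # misclassification error
gini     <- function(p) sum(p * (1 - p))             # Gini index
info     <- function(p) -sum(p[p > 0] * log(p[p > 0]))  # information / deviance
p <- c(0.7, 0.2, 0.1)                                # class proportions p_mk in one node
c(misclass(p), gini(p), info(p))                     # all three vanish for a pure node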

SLIDES 32-33

Classification trees

Classification trees in a nutshell

◮ Class values on the leaves.
◮ Growing phase: greedy splits that maximize the Gini index reduction (CART) or the information gain (C4.5).
◮ Pruning phase: weakest-link pruning that minimizes the cost-complexity criterion.

Further reading on classification trees:

◮ EC4.5 and YaDT: implementation improvements for C4.5.
◮ C5.0: C4.5 with additional features.
◮ Loss matrix.
◮ Handling missing values.

SLIDE 34

Classification trees

A bit of R

> help(tree)
> optical.tree <- tree(factor(class) ~ ., optical, split="deviance")   # C4.5-style criterion
> optical.tree.gini <- tree(factor(class) ~ ., optical, split="gini")  # CART-style criterion
> plot(optical.tree); text(optical.tree)
> help(prune.tree)
> optical.tree.pruned <- prune.tree(optical.tree, method="misclass", k=10)  # k is the cost-complexity penalty
> help(cv.tree)
> optical.tree.cv <- cv.tree(optical.tree, , prune.misclass)   # cross-validate the pruning sequence
> plot(optical.tree.cv)

SLIDE 35

Classification trees

Why should you use Decision Trees?

Advantages

◮ Easy to read and interpret.
◮ Learning the tree has complexity linear in p.
◮ Can be rather efficient on well pre-processed data (in conjunction with PCA for instance).

However

◮ No margin or performance guarantees.
◮ Lack of smoothness in the regression case.
◮ Strong assumption that the data can fit in hypercubes.
◮ Strong sensitivity to the data set.

But...

◮ Can be compensated by ensemble methods such as Boosting or Bagging.
◮ Very efficient extension with Random Forests.

SLIDE 36

Boosting and trees

Boosting and trees

Motivation

"AdaBoost with trees is the best off-the-shelf classifier in the world." (Breiman, 1998) Not so true today, but still accurate enough.

SLIDE 37

Boosting and trees

What is Boosting?

Key idea

Boosting is a procedure that combines several “weak” classifiers into a powerful “committee”. It belongs to the committee-based or ensemble methods literature in Machine Learning. The most popular boosting algorithm (Freund & Schapire, 1997): AdaBoost.M1.

Warning

For this part, we take a very practical approach. For a more thorough and rigorous presentation, see (for instance) the reference below.

  • R. E. Schapire. The boosting approach to machine learning: An overview. Nonlinear Estimation and Classification, 2002.
SLIDE 38

Boosting and trees

The main picture

Weak classifiers

h(x) = y is said to be a weak (or a PAC-weak) classifier if it performs better than random guessing on the training data.

AdaBoost

AdaBoost constructs a strong classifier as a linear combination of weak classifiers $h_t(x)$:

$$f(x) = \sum_{t=1}^{T} \alpha_t h_t(x)$$

SLIDES 39-45

Boosting and trees

The AdaBoost algorithm

Given $\{(x_i, y_i)\}$, $x_i \in X$, $y_i \in \{-1; 1\}$.

Initialize weights $D_1(i) = 1/q$.

For $t = 1$ to $T$:

◮ Find $h_t = \operatorname{argmin}_{h \in H} \sum_{i=1}^{q} D_t(i) \, \mathbf{1}(y_i \neq h(x_i))$
◮ If $\epsilon_t = \sum_{i=1}^{q} D_t(i) \, \mathbf{1}(y_i \neq h_t(x_i)) \geq 1/2$, then stop
◮ Set $\alpha_t = \frac{1}{2} \log \frac{1 - \epsilon_t}{\epsilon_t}$
◮ Update
$$D_{t+1}(i) = \frac{D_t(i) \, e^{-\alpha_t y_i h_t(x_i)}}{Z_t}$$
  where $Z_t$ is a normalisation factor.

Return the classifier $H(x) = \operatorname{sign}\left( \sum_{t=1}^{T} \alpha_t h_t(x) \right)$.
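One possible R sketch of this algorithm, using depth-one rpart trees (stumps) as the weak learners; the function names, the default number of rounds and the choice of stumps are mine, not from the course.

library(rpart)

adaboost <- function(X, y, n_rounds = 50) {       # y must be coded in {-1, +1}
  q <- nrow(X); D <- rep(1 / q, q)                # D_1(i) = 1/q
  models <- list(); alpha <- numeric(0)
  for (t in seq_len(n_rounds)) {
    fit <- rpart(y ~ ., data = data.frame(X, y = factor(y)), weights = D, method = "class",
                 control = rpart.control(maxdepth = 1, cp = 0, minsplit = 2))
    h <- ifelse(predict(fit, data.frame(X), type = "class") == "1", 1, -1)
    eps <- sum(D * (h != y))                      # weighted training error epsilon_t
    if (eps == 0 || eps >= 0.5) break             # stop, as in the algorithm above
    a <- 0.5 * log((1 - eps) / eps)               # alpha_t
    D <- D * exp(-a * y * h); D <- D / sum(D)     # reweight, then renormalise (Z_t)
    models[[t]] <- fit; alpha[t] <- a
  }
  list(models = models, alpha = alpha)
}

predict_adaboost <- function(fit, X) {            # H(x) = sign(sum_t alpha_t h_t(x))
  f <- Reduce(`+`, lapply(seq_along(fit$models), function(t) {
    h <- ifelse(predict(fit$models[[t]], data.frame(X), type = "class") == "1", 1, -1)
    fit$alpha[t] * h
  }))
  sign(f)
}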

SLIDE 46

Boosting and trees

Iterative reweighting

$$D_{t+1}(i) = \frac{D_t(i) \, e^{-\alpha_t y_i h_t(x_i)}}{Z_t}$$

◮ Increase the weight of incorrectly classified samples
◮ Decrease the weight of correctly classified samples
◮ Memory effect: a sample misclassified several times has a large $D(i)$
◮ $h_t$ focusses on samples that were misclassified by $h_0, \ldots, h_{t-1}$

SLIDE 47

Boosting and trees

Properties

$$\frac{1}{q} \sum_{i=1}^{q} \mathbf{1}\left( H(x_i) \neq y_i \right) \;\leq\; \prod_{t=1}^{T} Z_t$$

◮ To minimize the training error, minimize this upper bound at each step t.
→ This is where $\alpha_t = \frac{1}{2} \log \frac{1 - \epsilon_t}{\epsilon_t}$ comes from.
◮ This is equivalent to maximizing the margin!
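The step from the bound to the choice of $\alpha_t$ is a standard computation that the slide leaves implicit:

$$Z_t = \sum_{i=1}^{q} D_t(i) \, e^{-\alpha_t y_i h_t(x_i)} = (1 - \epsilon_t) \, e^{-\alpha_t} + \epsilon_t \, e^{\alpha_t},
\qquad
\frac{\partial Z_t}{\partial \alpha_t} = 0 \;\Longleftrightarrow\; \alpha_t = \frac{1}{2} \log \frac{1 - \epsilon_t}{\epsilon_t}.$$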

SLIDE 48

Boosting and trees

AdaBoost is not Boosting

Many variants of AdaBoost:

◮ Binary classification: AdaBoost.M1, AdaBoost.M2, ...
◮ Multiclass: AdaBoost.MH
◮ Regression: AdaBoost.R
◮ Online, ...

And other Boosting algorithms.

SLIDE 49

Boosting and trees

Why should you use Boosting?

AdaBoost is a meta-algorithm: it “boosts” a weak classification algorithm into a committee that is a strong classifier.

◮ AdaBoost maximizes the margin
◮ Very simple to implement
◮ Can be seen as a feature selection algorithm
◮ In practice, AdaBoost often avoids overfitting.

SLIDE 50

Boosting and trees

AdaBoost with trees

Your turn to play: will you be able to implement AdaBoost with trees in R?

SLIDE 51

Random Forests and trees

Random Forests and trees

Motivation:

◮ Aggregation for stabilizing tree inference.
◮ Introduce independence between trees to make the aggregation step robust.
◮ Simple remark: the variance of an average of B i.i.d. random variables is $\sigma^2 / B$. When the variables are correlated with coefficient $\rho$, the variance of the average becomes $\rho \sigma^2 + \frac{(1 - \rho) \sigma^2}{B}$ (see the computation below).
◮ Idea of bagging: sub-sample the training set to obtain an aggregation with $\rho$ small and B large.
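For completeness, the variance computation behind that remark (standard, not detailed on the slide), with $\mathrm{Var}(X_b) = \sigma^2$ and $\mathrm{Corr}(X_b, X_{b'}) = \rho$ for $b \neq b'$:

$$\mathrm{Var}\left( \frac{1}{B} \sum_{b=1}^{B} X_b \right)
= \frac{1}{B^2} \left( B \sigma^2 + B (B - 1) \rho \sigma^2 \right)
= \rho \sigma^2 + \frac{(1 - \rho) \sigma^2}{B}.$$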

SLIDE 52

Random Forests and trees

Random Forests and trees

To avoid pruning trees and bypass over-fitting, a common way is to randomly subsample the training set and the set of variables, and then average. This leads to the so-called Random Forest algorithm.

Algorithm 1: Random forest
  Input: training set D, number of bags B, integer m
  For b = 1, ..., B:
    Sample a bootstrap training set Db among the n observations
    Sample a subset of m variables among the p variables
    Compute a classification tree Tb
  Output: prediction with the average decision rule $B^{-1} \sum_{b=1}^{B} T_b$

SLIDE 53

Random Forests and trees

Random Forests and trees

Important parameters/features for the algorithm:

◮ m: the number of variables that are sampled to build each individual tree. If p is the total number of variables, it should be chosen like $m \simeq \sqrt{p}$.
◮ Important remark: it is possible with RF to produce a selection of the good variables. Important variables are the ones that are the most selected at the nodes, over the whole population of trees.

Everything is possible with the Random Forest package of Breiman... (see the sketch below).
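A hedged sketch with the randomForest R package (Breiman and Cutler's implementation, ported to R by Liaw and Wiener), applied to the optical digits data loaded earlier:

library(randomForest)
optical.rf <- randomForest(factor(class) ~ ., data = optical,
                           ntree = 500,              # B: number of bagged trees
                           mtry  = floor(sqrt(64)),  # m ~ sqrt(p) candidate variables per split
                           importance = TRUE)
optical.rf                  # out-of-bag error estimate and confusion matrix
varImpPlot(optical.rf)      # variable importance: which pixels matter most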
