CS109A Introduction to Data Science
Pavlos Protopapas and Kevin Rader
Advanced Section #7: Decision trees and Ensemble methods
Camilo Fosco

Outline: Decision trees, Metrics, Tree-building algorithms, Ensemble methods
The backbone of most techniques
The Gini index of a subset $S$ is

$\text{Gini}(S) = 1 - \sum_{i=1}^{K} p_i^2$

where $K$ is the number of classes and $p_i$ is the proportion of elements of class $i$ in subset $S$.
With 3 green and 4 black balls (7 total):

Gini = P(pick green)P(label black) + P(pick black)P(label green)
     = 1 − [P(pick green)P(label green) + P(pick black)P(label black)]
     = 1 − (3/7 · 3/7 + 4/7 · 4/7) ≈ 0.4898

For a pure node: Gini = 1 − (1 · 1 + 0 · 0) = 0.
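The computation above can be sketched in a few lines of Python (an illustrative helper, not course code):

```python
def gini(proportions):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    return 1 - sum(p ** 2 for p in proportions)

# Mixed node: 3 green and 4 black balls out of 7.
mixed = gini([3 / 7, 4 / 7])   # = 24/49, approximately 0.4898
# Pure node: every element belongs to a single class.
pure = gini([1.0, 0.0])        # = 0.0
```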
The entropy of a subset $S$ is $H(S) = -\sum_{i=1}^{K} p_i \log_2 p_i$. A split point divides the parent subset $S$ into children $S_j$, and the information gain of the split is

$IG(S) = H(\text{parent}) - \sum_{j} \frac{|S_j|}{|S|} H(S_j)$

i.e. the entropy of the parent minus the weighted sum of the entropies of the children.
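A minimal sketch of the entropy and information-gain computation (hypothetical helper names, operating on lists of class counts):

```python
import math

def entropy(counts):
    """Shannon entropy -sum p_i log2 p_i over a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# A perfect split of a balanced parent recovers the full 1 bit of entropy.
gain = information_gain([5, 5], [[5, 0], [0, 5]])   # = 1.0
```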
For regression trees, splits are scored by the squared error

$\sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2$

where $\hat{y}_i$ is the leaf's prediction for point $i$.
Candidate splits on 1,000 points (R = red, G = green counts):

Split                 Left child          Right child
Phone_bill > 100      649 (550R, 99G)     351 (50R, 301G)
Marital_status = 1    638 (510R, 128G)    362 (51R, 311G)
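Using the counts from the table, we can check which split yields the lower weighted Gini impurity (a sketch; the R/G counts are taken directly from the table):

```python
def gini(counts):
    """Gini impurity of a node given its class counts."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def weighted_gini(left, right):
    """Size-weighted Gini impurity of the two children of a split."""
    n = sum(left) + sum(right)
    return sum(left) / n * gini(left) + sum(right) / n * gini(right)

phone = weighted_gini([550, 99], [50, 301])       # Phone_bill > 100
marital = weighted_gini([510, 128], [51, 311])    # Marital_status = 1
# phone < marital, so the Phone_bill split is the better of the two here.
```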
Bagging fits trees $T_1, \ldots, T_B$, where $T_b$ is the tree trained on the $b$-th bootstrap sample.

Assemblers 2: Age of weak learners
The ensemble combines its $L$ weak learners $h_1, \ldots, h_L$:

Regression (average): $\hat{y} = \frac{1}{L} \sum_{l=1}^{L} h_l(x)$

Classification (majority vote): $\hat{y} = \operatorname{mode}\left\{ h_1(x), \ldots, h_L(x) \right\}$
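The two aggregation rules can be sketched as follows (illustrative helper names):

```python
from collections import Counter

def aggregate_classification(predictions):
    """Majority vote over the weak learners' class predictions."""
    return Counter(predictions).most_common(1)[0][0]

def aggregate_regression(predictions):
    """Average of the weak learners' numeric predictions."""
    return sum(predictions) / len(predictions)

label = aggregate_classification(["green", "black", "green"])   # "green"
value = aggregate_regression([1.0, 2.0, 3.0])                   # 2.0
```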
*Buja and Stuetzle, 2006, section 7: drawing a sample of size $n/2$ without replacement behaves approximately like drawing a sample of size $n$ with replacement, where $n$ is the number of observations.
Boosting builds the model stage-wise, starting from a constant:

$F_0(x) = \beta_0$

$F_m(x) = F_{m-1}(x) + \beta_m h_m(x)$

Each new term is chosen so that the loss decreases:

$L\big(y, F_{m-1}(x) + \beta_m h_m(x)\big) < L\big(y, F_{m-1}(x)\big)$
The negative gradient gives a quantity $r$ that we can plug in here, and we can train a weak learner to predict $r$ from $x$!
The gradient boosting algorithm:

1. Initialize $F_0(x) = \beta_0$.
2. At each step $m$, compute the pseudo-residuals
   $r_{im} = -\dfrac{\partial L(y_i, F_{m-1}(x_i))}{\partial F_{m-1}(x_i)}$
3. Fit a weak learner $h_m$ to the pseudo-residuals and update
   $F_m(x) = F_{m-1}(x) + \beta_m h_m(x)$
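The gradient boosting loop can be sketched for squared loss with depth-1 stumps as weak learners (a toy implementation under those assumptions; a fixed learning rate stands in for the step size $\beta_m$):

```python
def fit_stump(x, r):
    """Fit a depth-1 regression stump to pseudo-residuals r by scanning
    thresholds and predicting the mean residual on each side."""
    best = None
    for t in sorted(set(x))[:-1]:          # each split leaves both sides non-empty
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((ri - lm) ** 2 for ri in left) + sum((ri - rm) ** 2 for ri in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm

def gradient_boost(x, y, rounds=30, lr=0.5):
    preds = [sum(y) / len(y)] * len(y)                 # F_0: the mean of y
    for _ in range(rounds):
        resid = [yi - pi for yi, pi in zip(y, preds)]  # pseudo-residuals for squared loss
        h = fit_stump(x, resid)                        # weak learner fit to residuals
        preds = [pi + lr * h(xi) for pi, xi in zip(preds, x)]
    return preds

x = [1, 2, 3, 4]
y = [1.0, 1.0, 3.0, 3.0]
preds = gradient_boost(x, y)     # approaches y round by round
```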
For the squared loss, the pseudo-residual is just the ordinary residual: since only the $i$-th term depends on $F(x_i)$,

$-\dfrac{\partial}{\partial F(x_i)} \frac{1}{2} \sum_j \left( y_j - F(x_j) \right)^2 = -\dfrac{\partial}{\partial F(x_i)} \frac{1}{2} \left( y_i - F(x_i) \right)^2 = y_i - F(x_i)$
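A quick numeric check (central finite differences) that the negative gradient of the squared loss is indeed the residual:

```python
def loss(y, F):
    """Squared loss 1/2 (y - F)^2."""
    return 0.5 * (y - F) ** 2

y, F, eps = 3.0, 1.2, 1e-6
# Central finite difference approximates dL/dF at the current F.
numeric_grad = (loss(y, F + eps) - loss(y, F - eps)) / (2 * eps)
residual = y - F
# -numeric_grad matches the residual y - F (about 1.8 here)
```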
When the weak learners are trees, the update adds one weight per leaf region:

$F_m(x) = F_{m-1}(x) + \sum_{j=1}^{J_m} \gamma_{jm} \mathbf{1}_{R_{jm}}(x)$

$\gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L\big(y_i, F_{m-1}(x_i) + \gamma\big)$

where $J_m$ is the number of leaves and the $R_{jm}$ are the disjoint regions partitioned by the tree.
http://arogozhnikov.github.io/2016/06/24/gradient_boosting_explained.html
Random Forests, of course.
Build a bootstrap dataset $(x_1^*, y_1^*), \ldots, (x_n^*, y_n^*)$ by sampling the original training set with replacement.
Kaggle killers.
This is boosting with the exponential loss: the sample weights start at $w_i^{(1)} = 1$ and at step $m$ become $w_i^{(m)} = e^{-y_i F_{m-1}(x_i)}$.
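A sketch of those weights in code (labels in $\{-1, +1\}$; the example scores are made up for illustration):

```python
import math

def adaboost_weights(y, F):
    """Exponential-loss weights w_i = exp(-y_i * F(x_i)) for labels in {-1, +1}."""
    return [math.exp(-yi * Fi) for yi, Fi in zip(y, F)]

y = [+1, -1, +1]
F = [2.0, -1.5, -0.5]          # current ensemble scores F_{m-1}(x_i)
w = adaboost_weights(y, F)
# Confidently correct points get small weights; the misclassified
# third point (y = +1 but F < 0) gets a weight above 1.
```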
XGBoost's key innovations:
- A sparsity-aware split finding algorithm that exploits sparsity patterns in the data.
- A weighted quantile sketch for approximate split finding on weighted data.
- Block structure in its system design: the block structure enables the data layout to be reused.
- Out-of-core computation for huge datasets that do not fit into memory.
The regularization term penalizes tree complexity:

$\Omega(h) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2$

where $T$ is the number of leaves and the $w_j$ are the leaf weights (the prediction of each leaf).
A second-order Taylor expansion of the loss around $F_{m-1}(x_i)$ gives the objective

$\text{obj}^{(m)} \approx \sum_{i=1}^{n} \left[ g_i\, h_m(x_i) + \frac{1}{2}\, k_i\, h_m(x_i)^2 \right] + \Omega(h_m)$

where $g_i$ is the first-order gradient of the loss w.r.t. $F(x)$ and $k_i$ is the second-order gradient of the loss w.r.t. $F(x)$, both evaluated at $F_{m-1}(x_i)$.
Define $I_j = \{i \mid q(x_i) = j\}$ as the set of training points assigned to leaf $j$, where $q(x)$ maps each point to its leaf index.
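With $I_j$ defined, each leaf's optimal weight has a closed form, $w_j^* = -G_j / (K_j + \lambda)$, where $G_j$ sums the first-order gradients and $K_j$ the second-order gradients over $I_j$. A small numeric sketch:

```python
def optimal_leaf_weight(g, k, lam):
    """Closed-form best weight for one leaf: -sum(g) / (sum(k) + lambda)."""
    return -sum(g) / (sum(k) + lam)

# For squared loss, g_i = F(x_i) - y_i and k_i = 1, so with lam = 0
# the optimal weight is just the mean residual of the leaf's points.
g = [-1.0, -2.0, -3.0]       # first-order gradients of the leaf's points
k = [1.0, 1.0, 1.0]          # second-order gradients (1 for squared loss)
w_unreg = optimal_leaf_weight(g, k, lam=0.0)   # 2.0 (the mean residual)
w_reg = optimal_leaf_weight(g, k, lam=1.0)     # 1.5: regularization shrinks it
```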