Optimal Sparse Decision Trees: Xiyang Hu, Cynthia Rudin, Margo Seltzer (PowerPoint presentation)

SLIDE 1

Optimal Sparse Decision Trees

Xiyang Hu, Cynthia Rudin, Margo Seltzer

Carnegie Mellon University, Duke University, University of British Columbia

SLIDE 4

Decision Trees

Should I click on the link in this email?

Do I recognize the from address?

Do the contents seem odd?

Can I see the URL for the link?

SLIDE 5

Decision Trees

Should I click on the link in this email?

Can I see the URL for the link?

Do the contents seem odd?

Do I recognize the from address?

SLIDE 6

Why not just find the Best Tree?

SLIDE 15

Could we Effectively Search that Space?

Optimal!
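To give a sense of how large "that space" is, here is a rough count under a simplified model of our own (an assumption, not taken from the slides): M binary features, any feature may be tested at any internal node, each leaf predicts one of two labels, and depth is at most D. A tree is then either a labeled leaf (2 choices) or a split on one of M features with two subtrees of depth at most D - 1, giving T(0) = 2 and T(D) = 2 + M·T(D-1)². The function name `count_trees` is hypothetical, and many of the counted trees compute the same classifier, so this counts syntactic trees only.

```python
def count_trees(num_features: int, max_depth: int) -> int:
    """Count syntactically distinct labeled binary trees of depth <= max_depth.

    Simplified model (an assumption for illustration): every internal node
    tests one of `num_features` binary features (repeats allowed) and every
    leaf predicts one of two labels.
    """
    count = 2  # depth 0: a single leaf predicting label 0 or label 1
    for _ in range(max_depth):
        # Either a leaf, or a split on any feature with an arbitrary
        # depth-limited subtree on each side.
        count = 2 + num_features * count * count
    return count


for depth in range(4):
    print(depth, count_trees(10, depth))
```

With only 10 features, depth 3 already gives 3,112,401,642 trees, and the count roughly squares with each extra level, which is why the search relies on pruning rather than enumeration.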

SLIDE 18

The Optimization Problem

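$$\hat{L}\big(\text{tree}, \{(x_i, y_i)\}_i\big) = \underbrace{\frac{1}{n}\sum_{i=1}^{n} \mathbb{1}\big[\text{tree}(x_i) \neq y_i\big]}_{\text{Misclassification error}} + \underbrace{C \cdot (\#\text{leaves in tree})}_{\text{Sparsity}}$$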
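The objective can be evaluated directly for any given tree. The sketch below assumes binary features and a hypothetical nested-tuple encoding of trees, `("leaf", label)` or `("split", feature, subtree_if_0, subtree_if_1)`; the function names and the toy data are ours, not from the OSDT code. C is the per-leaf penalty from the formula above.

```python
def predict(tree, x):
    """Follow the splits of `tree` for one binary feature vector `x`."""
    while tree[0] == "split":
        _, feature, if_zero, if_one = tree
        tree = if_one if x[feature] else if_zero
    return tree[1]


def count_leaves(tree):
    """Number of leaves in the (hypothetically encoded) tree."""
    if tree[0] == "leaf":
        return 1
    return count_leaves(tree[2]) + count_leaves(tree[3])


def objective(tree, X, y, C):
    """Misclassification error (fraction of n samples) plus C per leaf."""
    n = len(y)
    errors = sum(predict(tree, xi) != yi for xi, yi in zip(X, y))
    return errors / n + C * count_leaves(tree)


# Hypothetical toy data: two binary features, four samples.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 1]
single_leaf = ("leaf", 1)
stump = ("split", 0, ("leaf", 0), ("leaf", 1))
full = ("split", 0, ("split", 1, ("leaf", 0), ("leaf", 1)), ("leaf", 1))
for name, tree in [("single leaf", single_leaf), ("stump", stump), ("full", full)]:
    print(name, objective(tree, X, y, C=0.1))
```

With C = 0.1 the single leaf scores about 0.35 (one error, one leaf), the stump about 0.45 (still one error, but two leaves), and the full tree about 0.3 (no errors, three leaves): a split only pays for itself when it removes enough errors to offset the extra C.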

SLIDE 19

Optimal Sparse Decision Tree

(Broward County Recidivism Data)

Figure: the optimal sparse tree for this data set, with splits on prior offenses > 3, age < 26, prior offenses > 1 and any juvenile crimes; its leaves predict Arrest or No Arrest.

SLIDE 21

Optimal Sparse Decision Trees

  • Branch and Bound
  • Good Scheduling Order
  • Strong Bounds
  • Incremental Computation

FAST

SLIDE 22

Optimal Sparse Decision Trees

  • Branch and Bound
  • Good Scheduling Order
  • Strong Bounds
  • Incremental Computation

Accurate
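Branch and bound with a scheduling order can be sketched generically: a priority queue ordered by lower bound decides what to explore next (best-first, one common choice of scheduling order), and any branch whose lower bound cannot beat the best complete solution found so far is pruned. This is a hypothetical, generic sketch on a toy assignment problem (`best_first_branch_and_bound`, `COSTS` and the toy helpers are ours), not OSDT's implementation; OSDT applies these ideas to trees with its own queue ordering and bounds.

```python
import heapq
import itertools


def best_first_branch_and_bound(root, children, lower_bound, is_complete, objective):
    """Generic best-first branch and bound.

    `lower_bound(node)` must never exceed the objective of any complete
    solution reachable from `node`; otherwise the optimum may be pruned.
    """
    best_value, best_solution = float("inf"), None
    tie = itertools.count()  # tie-breaker so heapq never has to compare nodes
    queue = [(lower_bound(root), next(tie), root)]
    while queue:
        bound, _, node = heapq.heappop(queue)
        if bound >= best_value:
            # Queue is ordered by bound, so nothing left can beat the incumbent.
            break
        if is_complete(node):
            value = objective(node)
            if value < best_value:
                best_value, best_solution = value, node
            continue
        for child in children(node):
            child_bound = lower_bound(child)
            if child_bound < best_value:  # prune branches that cannot improve
                heapq.heappush(queue, (child_bound, next(tie), child))
    return best_solution, best_value


# Hypothetical toy problem: pick one option per slot, no two adjacent slots
# may pick the same option, minimize the total cost.
COSTS = [[3, 1, 4], [1, 5, 9], [2, 6, 5]]


def toy_children(prefix):
    return [prefix + (o,) for o in range(len(COSTS[len(prefix)]))
            if not prefix or o != prefix[-1]]


def toy_lower_bound(prefix):
    # Cost so far plus the cheapest option in every remaining slot; this
    # ignores the adjacency rule, so it never overestimates.
    spent = sum(COSTS[i][o] for i, o in enumerate(prefix))
    return spent + sum(min(row) for row in COSTS[len(prefix):])


solution, value = best_first_branch_and_bound(
    root=(),
    children=toy_children,
    lower_bound=toy_lower_bound,
    is_complete=lambda prefix: len(prefix) == len(COSTS),
    objective=toy_lower_bound,  # exact cost once every slot is filled
)
print(solution, value)
```

The search stops as soon as the smallest remaining lower bound is no better than the incumbent, so the tighter the bounds, the earlier it can certify optimality.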

SLIDE 25

Bounding the Search Space

Lower Bound on Node Support

Theorem: For an optimal tree, the support of each node (the fraction of training samples that reach it) must be above 2C.

Figure: a tree that splits on prior offenses > 3, then age > 70, then prior offenses > 2. The prior offenses > 2 split is crossed out: its node support is insufficient to produce an optimal solution.
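Taking the slide's theorem as given, the node support check is a simple filter over candidate trees. The sketch uses a hypothetical nested-tuple encoding over binary features, `("leaf", label)` or `("split", feature, subtree_if_0, subtree_if_1)`, and illustrative function names. Because splitting a leaf later never changes the support of nodes already in the tree, a tree that fails the check can be discarded together with every tree grown from it.

```python
def node_supports(tree, X, rows=None):
    """Yield the normalized support of every node, root first.

    Support = fraction of all training samples that reach the node.
    """
    rows = list(range(len(X))) if rows is None else rows
    yield len(rows) / len(X)
    if tree[0] == "split":
        _, feature, if_zero, if_one = tree
        yield from node_supports(if_zero, X, [r for r in rows if not X[r][feature]])
        yield from node_supports(if_one, X, [r for r in rows if X[r][feature]])


def violates_support_bound(tree, X, C):
    """True if some node has support <= 2C, so (per the slide) not optimal."""
    return any(support <= 2 * C for support in node_supports(tree, X))


# Hypothetical data: feature 1 is set for only 1 of 10 samples, much like a
# split such as "Age > 70" that isolates very few people.
X = [[0, 0]] * 9 + [[0, 1]]
tree = ("split", 1, ("leaf", 0), ("leaf", 1))
print(list(node_supports(tree, X)))            # [1.0, 0.9, 0.1]
print(violates_support_bound(tree, X, C=0.1))   # True: 0.1 <= 0.2
print(violates_support_bound(tree, X, C=0.04))  # False: 0.1 > 0.08
```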

SLIDE 28

Bounding the Search Space

Lower Bound on Classification Accuracy

Theorem: Each leaf of an optimal tree correctly classifies at least a fraction C of the data.

Figure: a tree that splits on prior offenses > 3 and then felony > 5, with Predict Arrest leaves. A leaf below the felony > 5 split is crossed out because it doesn't classify at least Cn points correctly.
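The accuracy bound can be checked in the same way, again taking the slide's statement as given: count how many training samples each leaf classifies correctly and flag the tree if any leaf gets fewer than C·n right. The tree encoding (`("leaf", label)` or `("split", feature, subtree_if_0, subtree_if_1)` over binary features), the function names and the data are hypothetical.

```python
def leaf_correct_counts(tree, X, y, rows=None):
    """Number of training samples each leaf classifies correctly, left to right."""
    rows = list(range(len(y))) if rows is None else rows
    if tree[0] == "leaf":
        return [sum(1 for r in rows if y[r] == tree[1])]
    _, feature, if_zero, if_one = tree
    zero_rows = [r for r in rows if not X[r][feature]]
    one_rows = [r for r in rows if X[r][feature]]
    return (leaf_correct_counts(if_zero, X, y, zero_rows)
            + leaf_correct_counts(if_one, X, y, one_rows))


def violates_accuracy_bound(tree, X, y, C):
    """True if some leaf classifies fewer than C * n samples correctly."""
    n = len(y)
    return any(correct < C * n for correct in leaf_correct_counts(tree, X, y))


X = [[0], [0], [0], [1]]
y = [0, 0, 0, 1]
tree = ("split", 0, ("leaf", 0), ("leaf", 1))
print(leaf_correct_counts(tree, X, y))              # [3, 1]
print(violates_accuracy_bound(tree, X, y, C=0.3))   # True: 1 < 1.2
print(violates_accuracy_bound(tree, X, y, C=0.25))  # False: 1 >= 1.0
```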

SLIDE 29

Bounding the Search Space

Permutation Bound

Theorem: If two trees have the same leaves, up to a permutation, all of their child trees will be the same, so one of them can be pruned.

Figure: two depth-2 trees with the same four leaves. One splits first on prior offenses > 3 and then on age > 18 in each branch; the other splits first on age > 18 and then on prior offenses > 3 in each branch. Their leaves predict Arrest or No Arrest.
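One way to apply the permutation bound is to give each tree a canonical key, the set of its leaves, where a leaf is identified by the (feature, value) tests on its path, and to skip any tree whose key was already seen. The sketch assumes a hypothetical nested-tuple encoding over binary features; including the leaf label in the key is a conservative choice of ours. The two example trees mirror the figure, with illustrative labels.

```python
def leaf_set(tree, path=frozenset()):
    """Canonical form of a tree: the set of its leaves, each leaf being the
    set of (feature, value) tests on its path together with its label."""
    if tree[0] == "leaf":
        return frozenset({(path, tree[1])})
    _, feature, if_zero, if_one = tree
    return (leaf_set(if_zero, path | {(feature, 0)})
            | leaf_set(if_one, path | {(feature, 1)}))


def drop_permuted_duplicates(trees):
    """Keep only the first tree for each distinct set of leaves."""
    seen, unique = set(), []
    for tree in trees:
        key = leaf_set(tree)
        if key not in seen:
            seen.add(key)
            unique.append(tree)
    return unique


# Hypothetical features: 0 = "prior offenses > 3", 1 = "age > 18".
# LABEL[(a, b)] is an illustrative label for the region feature0 = a, feature1 = b.
LABEL = {(0, 0): 1, (0, 1): 0, (1, 0): 1, (1, 1): 1}


def leaf(a, b):
    return ("leaf", LABEL[(a, b)])


tree_a = ("split", 0, ("split", 1, leaf(0, 0), leaf(0, 1)),
                      ("split", 1, leaf(1, 0), leaf(1, 1)))
tree_b = ("split", 1, ("split", 0, leaf(0, 0), leaf(1, 0)),
                      ("split", 0, leaf(0, 1), leaf(1, 1)))
print(leaf_set(tree_a) == leaf_set(tree_b))             # True
print(len(drop_permuted_duplicates([tree_a, tree_b])))  # 1
```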

SLIDE 34

Bounding the Search Space

  • Other bounds enable even more pruning

– Equivalent points bound: samples with the same features but different labels will produce misclassifications regardless of the model.

– Bound on the number of leaves: the regularization parameter C bounds the number of leaves.
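Both bounds are cheap to compute. In the sketch below (hypothetical function names and data), the equivalent points bound groups samples by identical feature vectors: every tree sends a group to the same leaf, so at best its majority label is right. The leaf-count bound is our reading of the slide: any tree with H leaves has objective at least C·H, so a tree that can still beat the best objective found so far has at most that objective divided by C leaves.

```python
import math
from collections import Counter, defaultdict


def equivalent_points_bound(X, y):
    """Fraction of samples that every tree must misclassify."""
    groups = defaultdict(Counter)
    for xi, yi in zip(X, y):
        groups[tuple(xi)][yi] += 1
    # Within each group of identical feature vectors, only the majority
    # label can be predicted correctly; the rest are unavoidable errors.
    unavoidable = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return unavoidable / len(y)


def max_useful_leaves(best_objective, C):
    """Objective >= C * (#leaves), so more leaves than this cannot help."""
    return math.floor(best_objective / C)


X = [[0, 1], [0, 1], [0, 1], [1, 0]]
y = [1, 0, 1, 0]
print(equivalent_points_bound(X, y))   # 0.25: one [0, 1] sample is always wrong
print(max_useful_leaves(0.35, C=0.1))  # 3
```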

SLIDE 35

Optimal Sparse Decision Trees

https://github.com/xiyanghu/OSDT

  • Open Source
  • FAST
  • Accurate
  • Interpretable