Minimal Cost Complexity Pruning of Meta-Classifiers, Andreas L. Prodromidis and Salvatore J. Stolfo (PowerPoint PPT Presentation)



SLIDE 1

Minimal Cost Complexity Pruning of Meta-Classifiers

Andreas L. Prodromidis Salvatore J. Stolfo

Department of Computer Science Columbia University

SLIDE 2

Combining multiple models

[Diagram: a training data set feeds several learning algorithms, each producing a base classifier (Classifier-1, Classifier-2, Classifier-3); meta-learning then combines the base classifiers into a Meta-Classifier.]

SLIDE 3

Meta-learning

[Diagram: training data D1 and D2 feed learning algorithms L1 and L2, producing classifiers C1 = L1(D1) and C2 = L2(D2). Their predictions C1(D) and C2(D) on validation data D form the meta-level training data, from which the meta-learning algorithm ML builds the meta-classifier MC = ML(C1, C2).]


SLIDE 4

The validation set yields the meta-level training set, as in stacking (Wolpert-92)

A meta-learning training set example

CID     InputType  Amt    …  True Class
54341   Swipe      19.72  …  Legitimate
54432   KeyIn      88.19  …  Fraudulent
54101   Phone      11.99  …  Legitimate
…       …          …      …  …

Classifier-1  Classifier-2  Classifier-3  …  True Class
Legitimate    Legitimate    Legitimate    …  Legitimate
Legitimate    Fraudulent    Legitimate    …  Fraudulent
Fraudulent    Fraudulent    Legitimate    …  Legitimate
…             …             …             …  …
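The table above can be produced mechanically. A minimal Python sketch, with two hypothetical threshold classifiers standing in for the learned base classifiers (all names here are illustrative, not from the paper):

```python
# Build a stacking (Wolpert-92) meta-level training set: each
# validation example becomes one row of base-classifier predictions
# plus the true class, as in the table above.

def build_meta_training_set(base_classifiers, validation_data):
    """base_classifiers: functions mapping an example to a label;
    validation_data: list of (example, true_label) pairs."""
    rows = []
    for example, true_label in validation_data:
        predictions = [clf(example) for clf in base_classifiers]
        rows.append((predictions, true_label))
    return rows

# Hypothetical threshold classifiers over a single Amt feature:
c1 = lambda amt: "Fraudulent" if amt > 80 else "Legitimate"
c2 = lambda amt: "Fraudulent" if amt > 50 else "Legitimate"

validation = [(19.72, "Legitimate"), (88.19, "Fraudulent")]
rows = build_meta_training_set([c1, c2], validation)
# rows[1] -> (["Fraudulent", "Fraudulent"], "Fraudulent")
```

Each row pairs the base classifiers' predictions with the true class, which is exactly what the meta-learning algorithm trains on.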

SLIDE 5

Meta-Classifying

[Diagram: an unclassified test example x is fed to the base classifiers C1, C2, C3; the meta-classifier MC combines their predictions into the final prediction MC(C1(x), C2(x), C3(x)).]
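A minimal sketch of this prediction flow, with a majority vote standing in for the learned meta-classifier MC (the vote rule and the constant lambda classifiers are illustrative assumptions, not the paper's method):

```python
# Meta-classifying an unseen example x: collect the base predictions
# C1(x), ..., Ck(x) and feed them to the meta-classifier MC.
from collections import Counter

def meta_classify(mc, base_classifiers, x):
    predictions = [clf(x) for clf in base_classifiers]
    return mc(predictions)

# A majority vote stands in for a learned MC in this sketch.
def majority_vote(predictions):
    return Counter(predictions).most_common(1)[0][0]

c1 = lambda x: "Legitimate"
c2 = lambda x: "Fraudulent"
c3 = lambda x: "Fraudulent"

label = meta_classify(majority_vote, [c1, c2, c3], x={"Amt": 88.19})
# label -> "Fraudulent"
```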

SLIDE 6

Decision tree meta-classifier

[Diagram: a decision-tree meta-classifier whose internal nodes test the predictions of base classifiers (Ripper, CART, Bayes, ID3 classifiers) and whose leaves output Fraud or Legitimate.]

SLIDE 7

Efficiency

  • Compute base classifiers in parallel
  • Compute “small” meta-classifiers

– to reduce memory requirements
– to produce fast classifications

  • Pre-training pruning

– filter before meta-learning (NIT’98, KDD’98-DDM)

  • Post-training pruning

– discard after meta-learning (Prodromidis-et-al-98)

SLIDE 8

A graphical description

1. Map an arbitrary meta-classifier to a decision tree representation
2. Prune the decision tree model
3. Map the pruned decision tree back to the original meta-classifier representation

(Mapping via modeling of the meta-classifier’s behavior)
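The first mapping step can be sketched as follows: treat the meta-classifier as a black box and record its outputs, so a decision tree can be trained to mimic its behavior. The opaque `mc` rule and all names below are hypothetical; a CART-style learner (not shown) would then be fit on the recorded rows, pruned, and mapped back:

```python
# Model the meta-classifier's behavior: label each base-prediction
# vector with the meta-classifier's OWN output, not the true class.
# A decision tree trained on these rows approximates the MC.

def model_meta_classifier(meta_classifier, meta_level_inputs):
    """meta_level_inputs: list of base-prediction vectors."""
    return [(preds, meta_classifier(preds)) for preds in meta_level_inputs]

# A hypothetical opaque MC: defer to classifier 2 unless all agree.
mc = lambda preds: preds[0] if len(set(preds)) == 1 else preds[1]

meta_inputs = [["Legitimate", "Fraudulent", "Legitimate"],
               ["Legitimate", "Legitimate", "Legitimate"]]
tree_training_data = model_meta_classifier(mc, meta_inputs)
# tree_training_data[0] -> (["Legitimate", "Fraudulent", "Legitimate"], "Fraudulent")
```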

SLIDE 9

Post-training pruning

[Diagram: a meta-classifier combining a subset of the base classifiers, e.g. Classifier-1, Classifier-3, Classifier-5, Classifier-6, Classifier-7.]

  • Minimal cost complexity pruning (Breiman-et-al-84)

– R(T): misclassification cost of a decision tree T
– C(T): complexity of the tree (= number of terminal nodes)
– α: complexity parameter

  • Seek to minimize Rα(T) = R(T) + α · C(T)
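The objective can be computed directly. A sketch with toy numbers (the costs and α below are illustrative, not from the paper):

```python
# Cost-complexity objective R_alpha(T) = R(T) + alpha * C(T):
# misclassification cost plus alpha times the number of terminal nodes.

def cost_complexity(misclassification_cost, num_terminal_nodes, alpha):
    return misclassification_cost + alpha * num_terminal_nodes

# Larger alpha penalises larger trees: the pruned 3-leaf tree wins
# over the full 7-leaf tree once alpha outweighs its extra error.
full_tree   = cost_complexity(0.10, 7, alpha=0.02)   # 0.10 + 0.14 = 0.24
pruned_tree = cost_complexity(0.14, 3, alpha=0.02)   # 0.14 + 0.06 = 0.20
```

At α = 0 the full tree always wins; as α grows, progressively smaller subtrees minimize Rα(T).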
SLIDE 10

Decision tree model (unpruned)

[Diagram: unpruned decision tree; node complexities 7.84, 0.5, 0.92, 3.52, 3.99, 3.61, 2.8, 5.0, 1.7, 10.5.]

Rα(T) = R(T) + α· C(T)

SLIDE 11

Decision tree model (pruned)

[Diagram: pruned decision tree; remaining node complexities 7.84, 3.99, 3.61, 5.0, 10.5.]

Rα(T) = R(T) + α· C(T)
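The pruning step itself follows the weakest-link scheme of Breiman et al. (84). A minimal sketch under assumed toy costs (the Node class and all numbers are illustrative):

```python
# Weakest-link pruning: for each internal node t, compute
# g(t) = (R(t) - R(T_t)) / (C(T_t) - 1), where R(t) is the node's
# cost if collapsed to a leaf, R(T_t) the cost of its subtree, and
# C(T_t) its leaf count. The node with the smallest g(t) is the
# weakest link: collapsing it pays off for any alpha above g(t).

class Node:
    def __init__(self, error, left=None, right=None):
        self.error = error          # cost at this node if it were a leaf
        self.left, self.right = left, right

    def is_leaf(self):
        return self.left is None and self.right is None

def subtree_error(n):               # R(T_t): summed cost of the leaves
    return n.error if n.is_leaf() else subtree_error(n.left) + subtree_error(n.right)

def num_leaves(n):                  # C(T_t)
    return 1 if n.is_leaf() else num_leaves(n.left) + num_leaves(n.right)

def weakest_link(n):
    """Return (g, node) for the internal node with smallest g(t)."""
    if n.is_leaf():
        return (float("inf"), None)
    g = (n.error - subtree_error(n)) / (num_leaves(n) - 1)
    return min([(g, n), weakest_link(n.left), weakest_link(n.right)],
               key=lambda pair: pair[0])

# Toy tree: the root alone would cost 0.30; its two leaves cost 0.10 + 0.15.
tree = Node(0.30, Node(0.10), Node(0.15))
g, node = weakest_link(tree)        # g = (0.30 - 0.25) / (2 - 1) ≈ 0.05
if g <= 0.06:                       # prune whenever alpha exceeds g(t)
    node.left = node.right = None   # collapse the weakest link to a leaf
```

Repeating this collapse for increasing α yields the nested sequence of subtrees from which the pruned tree on this slide is chosen.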

SLIDE 12

Decision tree modeling of meta-classifiers

[Diagram: the classifiers’ predictions on the meta-level training data, labeled with the meta-classifier’s outputs, form decision tree training data; a decision tree learning algorithm (e.g. CART) then produces a decision tree meta-classifier over the same classifiers.]

SLIDE 13

Final pruned meta-classifier

[Diagram: the original meta-learning algorithm is re-applied to the meta-level training data of the classifiers retained by the pruned decision tree, yielding the final pruned meta-classifier.]

SLIDE 14

Credit Card Fraud detection

  • Chase Credit Card data

– 500,000 transaction records
– 30 attributes (numerical, categorical) in 137 bytes per record
– 20% fraud, 80% non-fraud

  • First Union Credit Card data

– 500,000 transaction records
– 28 attributes (numerical, categorical) in 137 bytes per record
– 15% fraud, 85% non-fraud

  • Attributes

– Hashed credit card account number, date, time, type of entry of transaction, type of merchant, amount, validity codes, past payment information, account information, confidential fields, etc.
– The fraud label

SLIDE 15

Experimental setting

  • Divide the data sets into 12 subsets across 6 sites
  • Five learning algorithms

– Naïve Bayes, C4.5, CART, ID3, Ripper

  • Exchange classifiers

– 10 local, 50 remote per site

  • Meta-learn only the remote classifiers
SLIDE 16

Meta-learning results

Chase data (maximum savings: $1,470K)

Type of classification model        Size  Accuracy  TP-FP  Savings
Best over a single subset           1     88.5%     0.551  $812K
Best over largest possible subset   1     88.8%     0.568  $840K
Meta-classifier                     50    89.6%     0.621  $818K
Chase's COTS system                       85.7%     0.523  $682K

First Union data (maximum savings: $1,085K)

Type of classification model        Size  Accuracy  TP-FP  Savings
Best over a single subset           1     95.2%     0.749  $806K
Best over largest possible subset   1     95.3%     0.787  $828K
Meta-classifier                     50    96.5%     0.831  $944K

SLIDE 17

Pruning results

SLIDE 18

More information

  • About the paper

– http://www.cs.columbia.edu/~andreas

  • About the JAM project

– http://www.cs.columbia.edu/~sal/JAM/PROJECT

  • E-mail contact

– andreas@cs.columbia.edu