Minimal Cost Complexity Pruning of Meta-Classifiers


  1. Minimal Cost Complexity Pruning of Meta-Classifiers
     Andreas L. Prodromidis, Salvatore J. Stolfo
     Department of Computer Science, Columbia University

  2. Combining multiple models
     [Diagram: several learning algorithms each train a classifier (Classifier-1, Classifier-2, Classifier-3) from the training data set; a meta-learning algorithm then combines them into a meta-classifier.]

  3. Meta-learning
     [Diagram: each learning algorithm L_i is trained on its own training data D_i, producing classifier C_i = L_i(D_i). Each C_i then classifies a shared validation set D; the predictions C_i(D) constitute the meta-level training data, from which the meta-learning algorithm ML builds the meta-classifier MC = ML(C_1, C_2).]

  4. A meta-learning training set example

     Validation set:

     CID   | Input Type | Amt   | … | True Class
     ------|------------|-------|---|-----------
     54341 | Swipe      | 19.72 | … | Legitimate
     54432 | KeyIn      | 88.19 | … | Fraudulent
     54101 | Phone      | 11.99 | … | Legitimate
     …     | …          | …     | … | …

     Meta-level training set - Stacking (Wolpert-92):

     Classifier-1 | Classifier-2 | Classifier-3 | … | True Class
     -------------|--------------|--------------|---|-----------
     Legitimate   | Legitimate   | Legitimate   | … | Legitimate
     Legitimate   | Fraudulent   | Legitimate   | … | Fraudulent
     Fraudulent   | Fraudulent   | Legitimate   | … | Legitimate
     …            | …            | …            | … | …
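The stacking construction on this slide can be sketched as follows. This is a minimal illustration using scikit-learn learners in place of the JAM system's; the data set, learners, and variable names are all illustrative assumptions, not the authors' implementation.

```python
# Sketch of building a Wolpert-style stacking (meta-level) training set:
# base classifiers are trained, then their predictions on a shared
# validation set become the meta-level attributes, paired with the
# true class of each validation record.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Base classifiers C_i = L_i(D_i); here both see the same D for brevity.
base = [GaussianNB().fit(X_train, y_train),
        DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)]

# Meta-level training set: one column of predictions per base classifier.
meta_X = np.column_stack([c.predict(X_val) for c in base])
meta_y = y_val
print(meta_X.shape)  # one row per validation record, one column per base classifier
```

Note that the meta-level attributes are class labels, not raw transaction fields, which is what makes the meta-level table on this slide so compact.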

  5. Meta-Classifying
     [Diagram: an unclassified test record x is fed to each base classifier C_1, C_2, C_3; their predictions C_1(x), C_2(x), C_3(x) are passed to the meta-classifier MC, whose output MC(C_1(x), C_2(x), C_3(x)) is the final prediction.]
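A minimal sketch of this meta-classifying step, again with scikit-learn learners standing in for the paper's base and meta learners; the helper `meta_classify` is hypothetical, introduced only to show the composition MC(C_1(x), C_2(x), ...).

```python
# Sketch of meta-classifying: at test time, a record x is routed through
# every base classifier, and the vector of their predictions is what the
# meta-classifier MC actually classifies.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

base = [GaussianNB().fit(X_tr, y_tr),
        DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)]
meta_X = np.column_stack([c.predict(X_val) for c in base])
MC = LogisticRegression().fit(meta_X, y_val)  # stand-in meta-classifier

def meta_classify(x):
    # MC(C_1(x), ..., C_k(x)): feed the base predictions into MC.
    preds = np.array([[c.predict(x.reshape(1, -1))[0] for c in base]])
    return MC.predict(preds)[0]

label = meta_classify(X_val[0])
```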

  6. Decision tree meta-classifier
     [Diagram: a decision tree whose internal nodes test base-classifier predictions, e.g. "CART classifier: Fraud?" and "Bayes classifier: Legitimate?", with Ripper, ID3, and CART classifiers appearing at other nodes.]

  7. Efficiency
     • Compute base classifiers in parallel
     • Compute "small" meta-classifiers
       – to reduce memory requirements
       – to produce fast classifications
     • Pre-training pruning – filter before meta-learning (NIT'98, KDD'98-DDM)
     • Post-training pruning – discard after meta-learning (Prodromidis-et-al-98)

  8. A graphical description
     • Map an arbitrary meta-classifier to a decision tree representation
     • Prune the decision tree model
     • Map the pruned decision tree back to the original meta-classifier representation
     (Mapping via modeling of the meta-classifier's behavior)

  9. Post-training pruning
     [Diagram: a pruned ensemble retaining a subset of the base classifiers, e.g. Classifier-1, Classifier-3, Classifier-5, Classifier-6, Classifier-7.]
     • Minimal cost complexity pruning (Breiman-et-al-84)
       – R(T): misclassification cost of a decision tree T
       – C(T): complexity of tree (= number of terminal nodes)
       – α: complexity parameter
     • Seek to minimize R_α(T), where R_α(T) = R(T) + α · C(T)
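Minimal cost complexity pruning for ordinary decision trees is available in scikit-learn; the sketch below uses scikit-learn's API (its parameter names, not the paper's code) to show how the critical α values trade misclassification cost against leaf count C(T).

```python
# Sketch of minimal cost complexity pruning (Breiman et al.) via
# scikit-learn: cost_complexity_pruning_path enumerates the critical
# alpha values, and refitting with ccp_alpha=a yields the subtree
# minimizing R_a(T) = R(T) + a * C(T).
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, random_state=1)
tree = DecisionTreeClassifier(random_state=1).fit(X, y)
path = tree.cost_complexity_pruning_path(X, y)

# Larger alpha penalizes complexity (leaf count) more heavily, so the
# pruned tree shrinks as alpha grows.
for a in path.ccp_alphas[::max(1, len(path.ccp_alphas) // 4)]:
    pruned = DecisionTreeClassifier(random_state=1, ccp_alpha=a).fit(X, y)
    print(a, pruned.get_n_leaves())
```

In practice α is chosen by cross-validation: pick the α whose pruned tree has the lowest held-out misclassification cost.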

  10. Decision tree model (unpruned)
      [Figure: the full decision tree, with nodes annotated by complexity values 0.5, 0.92, 1.7, 2.8, 3.52, 3.61, 3.99, 5.0, 7.84, 10.5; R_α(T) = R(T) + α · C(T).]

  11. Decision tree model (pruned)
      [Figure: the pruned decision tree, retaining only the nodes with complexity values 3.61, 3.99, 5.0, 7.84, 10.5; R_α(T) = R(T) + α · C(T).]

  12. Decision tree modeling of meta-classifiers
      [Diagram: the base classifiers' predictions on the meta-level training data, together with the meta-classifier's own predictions, are fed to a decision tree learning algorithm (e.g. CART), which produces a decision tree model of the meta-classifier.]
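The modeling step on this slide can be sketched as distilling the meta-classifier's behavior into a CART-style tree: the tree is trained on the meta-level attributes with the meta-classifier's *own predictions* as targets, not the true class. Learners and names below are illustrative scikit-learn stand-ins, not the JAM implementation.

```python
# Sketch: fit a decision tree that mimics an arbitrary meta-classifier
# over the meta-level training data, yielding a prunable tree model of
# the meta-classifier's behavior.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

base = [GaussianNB().fit(X_tr, y_tr),
        LogisticRegression().fit(X_tr, y_tr)]
meta_X = np.column_stack([c.predict(X_val) for c in base])
MC = LogisticRegression().fit(meta_X, y_val)  # an arbitrary meta-classifier

# Decision tree model of MC: the targets are MC's outputs, not the truth.
tree_model = DecisionTreeClassifier(random_state=0).fit(meta_X, MC.predict(meta_X))
agreement = (tree_model.predict(meta_X) == MC.predict(meta_X)).mean()
```

Because the meta-level inputs here are just tuples of base-classifier labels, an unpruned tree reproduces the meta-classifier's behavior exactly on the meta-level training data, and that tree is what minimal cost complexity pruning is then applied to.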

  13. Final pruned meta-classifier
      [Diagram: the pruned decision tree model determines which base classifiers to retain; the original meta-learning algorithm is applied over the retained classifiers and the meta-level training data to produce the final pruned meta-classifier.]

  14. Credit Card Fraud detection
      • Chase Credit Card data
        – 500,000 transaction records
        – 30 attributes (numerical, categorical) in 137 bytes per record
        – 20% fraud, 80% non-fraud
      • First Union Credit Card data
        – 500,000 transaction records
        – 28 attributes (numerical, categorical) in 137 bytes per record
        – 15% fraud, 85% non-fraud
      • Attributes
        – Hashed credit card account number, date, time, type of entry of transaction, type of merchant, amount, validity codes, past payment information, account information, confidential fields, etc.
        – The fraud label

  15. Experimental setting
      • Divide the data sets into 12 subsets across 6 sites
      • Five learning algorithms
        – Naïve Bayes, C4.5, CART, ID3, Ripper
      • Exchange classifiers
        – 10 local, 50 remote per site
      • Meta-learn only the remote classifiers

  16. Meta-learning results

      Chase data (maximum savings: $1,470K):

      Type of model                     | Size | Accuracy | TP-FP | Savings
      ----------------------------------|------|----------|-------|--------
      Best over a single subset         | 1    | 88.5%    | 0.551 | $812K
      Best over largest possible subset | 1    | 88.8%    | 0.568 | $840K
      Meta-classifier                   | 50   | 89.6%    | 0.621 | $818K
      Chase's COTS system               | --   | 85.7%    | 0.523 | $682K

      First Union data (maximum savings: $1,085K):

      Type of model                     | Size | Accuracy | TP-FP | Savings
      ----------------------------------|------|----------|-------|--------
      Best over a single subset         | 1    | 95.2%    | 0.749 | $806K
      Best over largest possible subset | 1    | 95.3%    | 0.787 | $828K
      Meta-classifier                   | 50   | 96.5%    | 0.831 | $944K

  17. Pruning results

  18. More information
      • About the paper – http://www.cs.columbia.edu/~andreas
      • About the JAM project – http://www.cs.columbia.edu/~sal/JAM/PROJECT
      • E-mail contact – andreas@cs.columbia.edu
