Classifier Combination
Kuncheva Ch. 3
Classifiers
Are functions that map feature vectors to classes.
Any single classifier selected will define this function subject to certain biases due to the space of models defined, the training algorithm used, and the training data used.
In classifier selection, classifiers are assigned to produce classes for a region in feature space.
In a classifier cascade, progressively more specialized classifiers refine intermediate classification decisions, e.g. to classify low-confidence or rejected inputs.
The statistical reason for combining classifiers. D is the best classifier for the problem; the outer curve shows the space of all classifiers; the shaded area is the space of classifiers with good performance on the data set.
D1-D5: classifiers produced with zero resubstitution error for different feature subsets (e.g. for 1-NN or a decision tree)
Aggregating the results of local search may improve on (local) error minima.
The computational reason for combining classifiers. D is the best classifier for the problem; the closed shape shows the space of all classifiers; the dashed lines are hypothetical trajectories of the classifiers during training.
D1-D4: classifiers trained using hill climbing (e.g. gradient descent) or random search
Combination allows decision boundaries that are not expressible in the chosen parameter space to be represented.
The representational reason for combining classifiers. D is the best classifier for the problem; the closed shape shows the chosen space of classifiers.
D1-D4: four linear classifiers (e.g. SVM with fixed kernel)
Type 1: Abstract Level
Chosen class label for each base classifier
Type 2: Rank Level
List of ranked class labels for each base classifier
Type 3: Measurement Level
Real values (e.g. [0,1]) for each class (discriminant function outputs)
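To make the three levels concrete, here is a minimal sketch (illustrative values, not from the slides) deriving all three output types from one base classifier's measurement-level outputs:

```python
import numpy as np

# Type 3 (measurement level): one support value per class in [0, 1],
# e.g. from a hypothetical 4-class base classifier.
supports = np.array([0.10, 0.55, 0.90, 0.30])

# Type 2 (rank level): class labels ordered from most to least supported.
ranks = np.argsort(-supports)      # -> [2, 1, 3, 0]

# Type 1 (abstract level): the single chosen class label.
label = int(np.argmax(supports))   # -> 2

print(supports, ranks, label)
```

Each level discards information from the one below it: ranks drop the support magnitudes, and the abstract label drops the ordering.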
Dm(x) = C(B(x)), where B : Rⁿ → R^(|Ω|·k) maps an input to the intermediate feature space (the |Ω| support values from each of the k base classifiers) and C : R^(|Ω|·k) → Ω is the combiner.
Intermediate Feature Space
Combination example: voting
Combination example: weighted voting (e.g. Borda count)
Combination example: min, max, product rules
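A minimal sketch of these fixed combination rules (function and variable names are illustrative, not from the slides), operating on the intermediate feature space, where each row of P holds one base classifier's support values for the |Ω| classes:

```python
import numpy as np

def combine(P, rule="majority"):
    """Combine an L x n_classes matrix P of class supports.

    Each of the L rows holds one base classifier's discriminant
    values for the classes. Returns the winning class index.
    """
    L, n_classes = P.shape
    if rule == "majority":                 # abstract-level voting
        votes = np.bincount(P.argmax(axis=1), minlength=n_classes)
        return int(votes.argmax())
    if rule == "borda":                    # rank-level weighted voting
        # Each classifier gives n_classes-1 points to its top class,
        # n_classes-2 to the next, ..., 0 to its last-ranked class.
        points = np.zeros(n_classes)
        for row in P:
            order = np.argsort(row)        # ascending: worst class first
            points[order] += np.arange(n_classes)
        return int(points.argmax())
    if rule in ("min", "max", "product"):  # measurement-level rules
        op = {"min": np.min, "max": np.max, "product": np.prod}[rule]
        return int(op(P, axis=0).argmax())
    raise ValueError(rule)

P = np.array([[0.6, 0.3, 0.1],
              [0.4, 0.5, 0.1],
              [0.2, 0.5, 0.3]])
for r in ("majority", "borda", "min", "max", "product"):
    print(r, combine(P, r))
```

Note that the rules need not agree: on this P, max picks class 0 while the other rules pick class 1.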
Approaches to building classifier ensembles.
[Figure (a): a classifier cascade. An image passes through AdaBoost stages 1, 2, 3 in sequence; each stage asks "pos?", and a negative answer at any stage rejects the image immediately.]
*McCane, Novins and Albert, “Optimizing Cascade Classifiers” (unpublished, 2005)
Here a set of classifiers is obtained using AdaBoost, then partitioned using dynamic programming to produce a cascade of binary classifiers (detectors), as in the Viola-Jones face detector (2001, 2004).
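A sketch of cascade evaluation under these assumptions (the stage functions and thresholds below are hypothetical placeholders, not the McCane et al. construction):

```python
def cascade_predict(x, stages, thresholds):
    """Run x through a list of binary detectors in sequence.

    Each stage returns a confidence score; if the score falls below
    that stage's threshold, the input is rejected immediately as
    negative. Only inputs accepted by every stage are positive.
    """
    for stage, thresh in zip(stages, thresholds):
        if stage(x) < thresh:
            return "negative"   # early exit: most inputs stop here cheaply
    return "positive"

# Hypothetical stages: cheap and permissive first, strict last.
stages = [lambda x: x[0], lambda x: x[0] * x[1], lambda x: min(x)]
thresholds = [0.1, 0.2, 0.5]
print(cascade_predict((0.9, 0.8), stages, thresholds))   # -> "positive"
```

The design payoff is computational: the common negative case is resolved by the first (cheapest) stages, so the expensive specialized stages run only on the few inputs that survive.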
Error-Correcting Output Codes (ECOC): a classifier ensemble comprised of binary classifiers, each distinguishing between a subset of the class labels (dichotomizers).
Classify by Hamming distance from the classifiers' outputs to the bit string ('code word') for each class.
[Code matrix: rows v1-v4 are the class code words; columns D1-D7 are the dichotomizers; binary entries.]
μ_j(x) = Σ_{i=1}^{L} |s_i − C(j, i)|

the Hamming distance between the ensemble's binary outputs (s_1, ..., s_L) and the code word for class ω_j, used as that class's 'support'/discriminant value; the class at minimum distance is chosen.

Example: for outputs (s_1, ..., s_7) = (0, 1, 1, 0, 1, 0, 1), the Hamming distances are: class 1: 5; class 2: 3; class 3: 1; class 4: 5 → choose class 3.
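A minimal sketch of ECOC decoding. Since the slide's own code matrix did not survive extraction, this uses the standard exhaustive code for four classes (7 dichotomizers), so the distances differ from the worked example above:

```python
import numpy as np

# Exhaustive ECOC code matrix for 4 classes: row j is the code word
# for class j; column i is the target output of dichotomizer D_i.
C = np.array([[1, 1, 1, 1, 1, 1, 1],
              [0, 0, 0, 0, 1, 1, 1],
              [0, 0, 1, 1, 0, 0, 1],
              [0, 1, 0, 1, 0, 1, 0]])

def ecoc_decode(s, C):
    """mu_j = sum_i |s_i - C[j, i]|: Hamming distance from the ensemble
    output s to each class's code word; return distances and the argmin."""
    mu = np.abs(C - np.asarray(s)).sum(axis=1)
    return mu, int(mu.argmin())

s = (0, 0, 1, 0, 1, 1, 1)        # illustrative dichotomizer outputs
mu, label = ecoc_decode(s, C)
print(mu, label)                 # distances [3 1 3 5] -> class index 1
```

Because the code words are far apart in Hamming distance, a few dichotomizer errors still leave the correct class nearest, which is the error-correcting property the name refers to.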
Stacked generalization: train the base classifiers using cross-fold validation, then train the combiner on all N points, using as input features the class labels output by the base classifiers for each fold (train/test partition).
Standard four-fold cross-validation set-up.
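A sketch of this set-up using scikit-learn (an assumption; the slides name no library or data set), with two base classifiers and a four-fold split as in the figure:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=400, random_state=0)
base = [DecisionTreeClassifier(random_state=0), KNeighborsClassifier()]

# Out-of-fold predictions: every point is labeled by base classifiers
# that never saw it during training, per the four-fold set-up above.
Z = np.column_stack([cross_val_predict(clf, X, y, cv=4) for clf in base])

# The combiner is trained on all N points in the intermediate space.
combiner = LogisticRegression().fit(Z, y)

# At test time, base classifiers (refit on all data) feed the combiner.
for clf in base:
    clf.fit(X, y)
Z_new = np.column_stack([clf.predict(X[:5]) for clf in base])
print(combiner.predict(Z_new))
```

Training the combiner on out-of-fold outputs, rather than resubstitution outputs, keeps it from simply learning to trust overfit base classifiers.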
Idea of using classifier outputs as input features; classifier cascade architectures.
Early history:
Dasarathy and Sheela (1975): classifier selection using two classifiers
Rastrigin and Erenstein (1981, in Russian): dynamic classifier selection
Barabash (1983): theoretical results on majority vote classifier combination