SLIDE 1

Data Dependence in Combining Classifiers

Mohamed Kamel, Nayer Wanas

Pattern Analysis and Machine Intelligence Lab, University of Waterloo, Canada

SLIDE 2

Outline

- Introduction
- Data Dependence
  - Implicit Dependence
  - Explicit Dependence
- Feature Based Architecture
  - Training Algorithm
- Results
- Conclusions

SLIDE 3

Introduction

Pattern Recognition Systems
- Best possible classification rates.
- Increase efficiency and accuracy.

Multiple Classifier Systems
- Evidence of improving performance.
- The problem is decomposed naturally when using various sensors.
- Avoid making commitments to arbitrary initial conditions or parameters.

SLIDE 4

Categorization of MCS

- Architecture
- Input/Output Mapping
- Representation
- Types of classifiers

SLIDE 5

Categorization of MCS (cont'd…)

Architecture
- Parallel [Dasarathy, 94]
- Serial [Dasarathy, 94]

[Figure: a parallel topology, in which Classifiers 1..N each process the input and feed a FUSION block that produces the output, and a serial topology, in which Classifiers 1..N are applied in sequence between input and output.]

SLIDE 6

Categorization of MCS (cont'd…)

Architectures [Lam, 00]

Conditional Topology
- Once a classifier is unable to classify the output, the following classifier is deployed.

Hierarchical Topology
- Classifiers applied in succession.
- Classifiers with various levels of generalization.

Hybrid Topology
- The choice of the classifier to use is based on the input pattern (selection).

Multiple (Parallel) Topology

SLIDE 7

Categorization of MCS (cont'd…)

Input/Output Mapping

Linear Mapping
- Sum Rule
- Weighted Average [Hashem 97]

Non-linear Mapping
- Maximum
- Product
- Hierarchical Mixture of Experts [Jordan and Jacobs 94]
- Stacked Generalization [Wolpert 92]

SLIDE 8

Categorization of MCS (cont'd…)

Representation

Similar representations
- Classifiers need to be different.

Different representations
- Use of different sensors.
- Different features extracted from the same data set [Ho, 98; Skurichina & Duin, 02]

SLIDE 9

Categorization of MCS (cont'd…)

Types of Classifiers

Specialized classifiers
- Encourage specialization in areas of the feature space.
- All classifiers must contribute to achieve a final decision.
- Hierarchical Mixture of Experts [Jordan and Jacobs 94]
- Cooperative Modular Neural Networks [Auda and Kamel 98]

Ensemble of classifiers
- Set of redundant classifiers.

Competitive versus cooperative [Sharkey, 1999]

SLIDE 10

Categorization of MCS (cont'd…)

Data Dependence
- Classifiers are inherently dependent on the data.
- Describes how the final aggregation uses the information present in the input pattern.
- Describes the relationship between the final output Q(x) and the pattern under classification, x.

SLIDE 11

Data Dependence

- Data Independent
- Implicitly Dependent
- Explicitly Dependent

SLIDE 12

Data Independence

Rely solely on the output of the classifiers to determine the final classification output.

- Q(x) is the final class assigned to pattern x.
- C_j is a vector composed of the outputs of the various classifiers in the ensemble, {c_1j, c_2j, ..., c_Nj}, for a given class y_j.
- c_ij is the confidence classifier i has in pattern x belonging to class y_j.
- The mapping F_j can be linear or non-linear.

Q(x) = arg max_{∀j} F_j(C_j(x))

SLIDE 13

Data Independence (cont'd…)

Simple voting techniques are data independent:
- Average
- Maximum
- Majority

These are susceptible to incorrect estimates of the confidence, as the sketch below illustrates.
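
A minimal sketch of these data-independent rules; the function names and the toy confidence matrix are illustrative, not from the slides:

```python
import numpy as np

def combine_data_independent(c: np.ndarray, rule: str = "average") -> int:
    """Pick Q(x) = arg max_j F_j(C_j(x)) for a confidence matrix c[i, j]."""
    if rule == "average":        # F_j: mean confidence in class j
        scores = c.mean(axis=0)
    elif rule == "maximum":      # F_j: highest confidence any classifier gives j
        scores = c.max(axis=0)
    elif rule == "majority":     # F_j: number of classifiers voting for j
        scores = np.bincount(c.argmax(axis=1), minlength=c.shape[1])
    else:
        raise ValueError(f"unknown rule: {rule}")
    return int(np.argmax(scores))

# Three classifiers, two classes: one overconfident classifier (last row)
# flips the average -- the susceptibility noted above.
c = np.array([[0.60, 0.40],
              [0.55, 0.45],
              [0.05, 0.95]])
print(combine_data_independent(c, "majority"))  # 0 (two of three vote class 0)
print(combine_data_independent(c, "average"))   # 1 (skewed by the bad estimate)
```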

SLIDE 14

Implicit Data Dependence

Train the combiner on the global performance of the data.

- W(C(x)) is the weighting matrix composed of elements w_ij.
- w_ij is the weight assigned to class j in classifier i.

Q(x) = arg max_{∀j} F_j(W(C(x)), C_j(x))
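
A minimal sketch of an implicitly data-dependent combiner. The slides do not fix how W is trained, so the per-class validation accuracy used here is an assumption:

```python
import numpy as np

def fit_global_weights(val_conf: np.ndarray, val_labels: np.ndarray) -> np.ndarray:
    """w[i, j]: accuracy of classifier i on validation patterns of class j.
    val_conf has shape (n_patterns, n_classifiers, n_classes)."""
    n_classifiers, n_classes = val_conf.shape[1], val_conf.shape[2]
    preds = val_conf.argmax(axis=2)              # (n_patterns, n_classifiers)
    w = np.zeros((n_classifiers, n_classes))
    for j in range(n_classes):
        mask = val_labels == j
        if mask.any():                           # fraction of class-j patterns
            w[:, j] = (preds[mask] == j).mean(axis=0)   # classifier i got right
    return w

def combine_implicit(c: np.ndarray, w: np.ndarray) -> int:
    """Q(x) = arg max_j F_j(W(C(x)), C_j(x)); F is a weighted sum here."""
    return int(np.argmax((w * c).sum(axis=0)))
```

Because w is fit once on global performance, every test pattern is combined with the same weights; this is exactly the blind spot to local superiority noted on the next slide.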

SLIDE 15

Implicit Data Dependence (cont'd…)

Implicitly data dependent approaches include:
- Weighted average [Hashem 97]
- Fuzzy Measures [Gader et al 96]
- Belief theory [Xu et al 92]
- Behavior Knowledge Space (BKS) [Huang et al 95]
- Decision Templates [Kuncheva et al 01]
- Modular approaches [Auda and Kamel 98]
- Stacked Generalization [Wolpert 92]
- Boosting [Schapire 90]

These approaches lack consideration for the local superiority of classifiers.

SLIDE 16

Explicit Data Dependence

Classifier selection or combining is performed based on the sub-space to which the input pattern belongs. The final classification is dependent on the pattern being classified.

Q(x) = arg max_{∀j} F_j(W(x), C_j(x))
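
A minimal sketch of explicit dependence in the spirit of DCS_LA [Woods et al., 97], used here for weighting rather than pure selection; the k-nearest-neighbour estimate of local accuracy and all names are assumptions:

```python
import numpy as np

def combine_explicit(x, c, val_x, val_labels, val_preds, k=5):
    """Q(x) = arg max_j F_j(W(x), C_j(x)) with W(x) built from local accuracy.
    c:         (n_classifiers, n_classes) confidences for pattern x
    val_preds: (n_patterns, n_classifiers) validation predictions."""
    dist = np.linalg.norm(val_x - x, axis=1)     # distance from x to each
    nn = np.argsort(dist)[:k]                    # validation point; keep k nearest
    local_acc = (val_preds[nn] == val_labels[nn, None]).mean(axis=0)
    return int(np.argmax((local_acc[:, None] * c).sum(axis=0)))
```

Selecting the single classifier with the highest local accuracy instead of weighting recovers classifier selection, the variant the literature emphasizes.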

SLIDE 17

Explicit Data Dependence (cont'd…)

Explicitly data dependent approaches include:
- Dynamic Classifier Selection (DCS)
  - DCS with Local Accuracy (DCS_LA) [Woods et al., 97]
  - DCS based on Multiple Classifier Behavior (DCS_MCB) [Giacinto and Roli, 01]
- Hierarchical Mixture of Experts [Jordan and Jacobs 94]
- Feature-based approach [Wanas et al., 99]

Weights demonstrate dependence on the input pattern. Intuitively, this should perform better than the other methods.

SLIDE 18

Feature Based Architectures

A methodology to incorporate multiple classifiers in a dynamically adapting system. The aggregation adapts to the behavior of the ensemble:
- Detectors generate weights for each classifier that reflect the degree of confidence in each classifier for a given input.
- A trained aggregation learns to combine the different decisions.

SLIDE 19

Feature Based Architectures (cont'd…)

Architecture I

N. Wanas, M. Kamel, G. Auda, and F. Karray, "Feature Based Decision Aggregation in Modular Neural Network Classifiers", Pattern Recognition Letters, 20(11-13), 1353-1359, 1999.

SLIDE 20

Feature Based Architectures (cont'd…)

Classifiers
- Each individual classifier, C_i, produces some output representing its interpretation of the input x.
- Utilizes sub-optimal classifiers.
- The collection of classifier outputs for class y_j is represented as C_j(x).

Detector
- Detector D_l is a classifier that uses input features to extract useful information for aggregation.
- Doesn't aim to solve the classification problem.
- Detector output d_lg(x) is the probability that the input pattern x is categorized to group g.
- The output of all the detectors is represented by D(x).

SLIDE 21

Feature Based Architectures (cont'd…)

Aggregation
- Fusion layer for all the classifiers.
- Trained to adapt to the behavior of the various modules.
- Explicitly data dependent.

Weights are dependent on the input pattern being classified.

Q(x) = arg max_{∀j} F_j(D(x), C_j(x))
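
A minimal sketch of Architecture I's aggregation. The slides leave the detector model open, so the softmax-over-a-linear-map detector is an assumption, and each detector output is collapsed to a single per-classifier weight rather than the per-group probabilities d_lg(x):

```python
import numpy as np

def detector(x: np.ndarray, V: np.ndarray) -> np.ndarray:
    """D(x): one weight per classifier from the raw features, summing to 1.
    V: (n_classifiers, n_features), an assumed trainable linear map."""
    z = V @ x
    e = np.exp(z - z.max())                      # numerically stable softmax
    return e / e.sum()

def combine_arch1(x: np.ndarray, c: np.ndarray, V: np.ndarray) -> int:
    """Q(x) = arg max_j F_j(D(x), C_j(x)); F weights classifier i by d_i(x)."""
    d = detector(x, V)
    return int(np.argmax((d[:, None] * c).sum(axis=0)))
```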

SLIDE 22

Feature Based Architectures (cont'd…)

Architecture II

SLIDE 23

Feature Based Architectures (cont'd…)

Classifiers
- Each individual classifier, C_i, produces some output representing its interpretation of the input x.
- Utilizes sub-optimal classifiers.
- The collection of classifier outputs for class y_j is represented as C_j(x).

Detector
- Appends the input to the output of the classifier ensemble.
- Produces a weighting factor, w_ij, for each class in a classifier output.
- The dependence of the weights on both the classifier output and the input pattern is represented by W(x, C_j(x)).

SLIDE 24

Feature Based Architectures (cont'd…)

Aggregation
- Fusion layer for all the classifiers.
- Trained to adapt to the behavior of the various modules.
- Combines implicit and explicit data dependence.

Weights are dependent on the input pattern and the performance of the classifiers.

Q(x) = arg max_{∀j} F_j(W(x, C_j(x)), C_j(x))
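
A minimal sketch of Architecture II's weighting, assuming a single sigmoid layer over the concatenation of x with the ensemble outputs; the detector form and parameter names are illustrative:

```python
import numpy as np

def detector_weights(x: np.ndarray, c: np.ndarray, U: np.ndarray) -> np.ndarray:
    """W(x, C(x)): one weight w_ij per (classifier, class) pair.
    U: (n_classifiers * n_classes, n_features + n_classifiers * n_classes)."""
    feats = np.concatenate([x, c.ravel()])       # append input to ensemble output
    w = 1.0 / (1.0 + np.exp(-(U @ feats)))       # sigmoid keeps w_ij in (0, 1)
    return w.reshape(c.shape)

def combine_arch2(x: np.ndarray, c: np.ndarray, U: np.ndarray) -> int:
    """Q(x) = arg max_j F_j(W(x, C_j(x)), C_j(x)); F is a weighted sum."""
    return int(np.argmax((detector_weights(x, c, U) * c).sum(axis=0)))
```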

SLIDE 25

Results

- Five one-hidden-layer BP classifiers.
- Training used partially disjoint data sets.
- No optimization is performed for the trained networks.
- The parameters of all the networks are maintained for all the classifiers that are trained.

Three data sets:
- 20 Class Gaussian
- Satimages
- Clouds data

SLIDE 26

Results (cont'd…)

Classification error (%, mean ± std):

Method          20 Class       Clouds         Satimages
Singlenet       13.82 ± 1.16   10.92 ± 0.08   14.06 ± 1.33
Oracle           7.29 ± 1.06    7.41 ± 0.16    7.20 ± 0.36

Data Independent Approaches
Maximum         12.92 ± 0.35   10.68 ± 0.04   13.61 ± 0.21
Majority        13.13 ± 0.36   10.71 ± 0.02   13.40 ± 0.16
Average         12.83 ± 0.26   10.66 ± 0.04   13.23 ± 0.22
Borda           13.04 ± 0.30   10.71 ± 0.02   13.77 ± 0.20

Implicitly Data Dependent Approaches
Weighted Avg.   12.57 ± 0.20   10.59 ± 0.05   13.14 ± 0.21
Bayesian        12.48 ± 0.21   10.71 ± 0.02   13.51 ± 0.16
Fuzzy Integral  12.95 ± 0.34   10.67 ± 0.05   13.71 ± 0.19

Explicitly Data Dependent
Feature-based    8.64 ± 0.60   10.28 ± 0.10   12.48 ± 0.19

SLIDE 27

Training

Training each component independently:
- Optimizes individual components, which may not lead to overall improvement.
- Collinearity: high correlation between classifiers.
- Components may be under-trained or over-trained.

SLIDE 28

Training (cont'd…)

Adaptive training:
- Selective: reduces correlation between components.
- Focused: re-training focuses on misclassified patterns.
- Efficient: controls the duration of training.

SLIDE 29

Adaptive Training: Main Loop

- Increase diversity among the ensemble.
- Incremental learning.
- Evaluation of training to determine the re-training set.

SLIDE 30

Adaptive Training: Training

- Save a classifier if it performs well on the evaluation set.
- Determine when to terminate training for each module.

SLIDE 31

Adaptive Training: Evaluation

- Train the aggregation modules.
- Evaluate the training sets for each classifier.
- Compose new training data.

SLIDE 32

Adaptive Training: Data Selection

New training data are composed by concatenating, for each classifier i (see the sketch below):
- Error_i: the misclassified entries of the training data for classifier i.
- Correct_i: a random selection, at ratio P, of the correctly classified entries of the training data for classifier i.
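
A minimal sketch of this data-selection step; the default value of P and the helper names are illustrative, since the slides do not fix them:

```python
import numpy as np

def compose_retraining_set(X, y, preds_i, P=0.25, rng=None):
    """Concatenate Error_i (all misclassified patterns of classifier i) with
    a random ratio P of Correct_i (its correctly classified patterns)."""
    rng = np.random.default_rng() if rng is None else rng
    wrong = preds_i != y
    err_idx = np.flatnonzero(wrong)                       # Error_i: keep all
    corr_idx = np.flatnonzero(~wrong)                     # Correct_i: sample
    keep = rng.choice(corr_idx, size=int(P * corr_idx.size), replace=False)
    idx = np.concatenate([err_idx, keep])
    return X[idx], y[idx]                                 # focused re-training data
```

Keeping every error while subsampling the correct patterns is what makes the re-training focused without letting a classifier forget what it already handles.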


SLIDE 33

Results

- Five one-hidden-layer BP classifiers.
- Training used partially disjoint data sets.
- No optimization is performed for the trained networks.
- The parameters of all the networks are maintained for all the classifiers that are trained.

Three data sets:
- 20 Class Gaussian
- Satimages
- Clouds data

SLIDE 34

Results (cont'd…)

Classification error (%, mean ± std):

Method            20 Class       Clouds         Satimages
Singlenet         13.82 ± 1.16   10.92 ± 0.08   14.06 ± 1.33

Normal Training
Feature Based      8.64 ± 0.60   10.28 ± 0.10   12.48 ± 0.19
Oracle             7.29 ± 1.06    7.41 ± 0.16    7.20 ± 0.36
Best Classifier   14.03 ± 0.64   11.00 ± 0.09   14.72 ± 0.43

Ensemble Trained Adaptively using WA as the evaluation function
Feature Based      8.62 ± 0.25   10.24 ± 0.17   12.40 ± 0.12
Oracle             6.79 ± 2.30    5.73 ± 0.11    5.58 ± 0.17
Best Classifier   14.75 ± 1.06   12.03 ± 0.52   17.13 ± 1.03

Feature Based Architecture Trained Adaptively
Feature Based      8.01 ± 0.19   10.06 ± 0.13   12.33 ± 0.14
Oracle             5.42 ± 1.30    5.43 ± 0.11    5.48 ± 0.18
Best Classifier   14.80 ± 1.32   11.97 ± 0.59   16.96 ± 0.87

SLIDE 35

Conclusions

Categorization of various combining approaches based on data dependence:
- Independent: vulnerable to incorrect confidence estimates.
- Implicitly dependent: doesn't take into account the local superiority of classifiers.
- Explicitly dependent: the literature focuses on selection rather than combining.

SLIDE 36

Conclusions (cont'd…)

Feature-based approach:
- Combines implicit and explicit data dependence.
- Uses an evolving training algorithm to enhance diversity amongst classifiers.
- Reduces harmful correlation.
- Determines the duration of training.
- Improves classification accuracy.

SLIDE 37

References

[Kittler et al., 98] J. Kittler, M. Hatef, R. Duin, and J. Matas, "On Combining Classifiers", IEEE Trans. PAMI, 20(3), 226-239, 1998.
[Dasarathy, 94] B. Dasarathy, "Decision Fusion", IEEE Computer Society Press, 1994.
[Lam, 00] L. Lam, "Classifier Combinations: Implementations and Theoretical Issues", MCS 2000, LNCS 1857, 77-86, 2000.
[Hashem, 97] S. Hashem, "Algorithms for Optimal Linear Combination of Neural Networks", Int. Conf. on Neural Networks, Vol. 1, 242-247, 1997.
[Jordan and Jacobs, 94] M. Jordan and R. Jacobs, "Hierarchical Mixture of Experts and the EM Algorithm", Neural Computation, 181-214, 1994.
[Wolpert, 92] D. Wolpert, "Stacked Generalization", Neural Networks, Vol. 5, 241-259, 1992.
[Auda and Kamel, 98] G. Auda and M. Kamel, "Modular Neural Network Classifiers: A Comparative Study", J. Int. Rob. Sys., Vol. 21, 117-129, 1998.
[Gader et al., 96] P. Gader, M. Mohamed, and J. Keller, "Fusion of Handwritten Word Classifiers", Patt. Reco. Lett., 17(6), 577-584, 1996.
[Xu et al., 92] L. Xu, A. Krzyzak, and C. Suen, "Methods of Combining Multiple Classifiers and their Applications to Handwriting Recognition", IEEE Sys. Man and Cyb., 22(3), 418-435, 1992.
[Kuncheva et al., 01] L. Kuncheva, J. Bezdek, and R. Duin, "Decision Templates for Multiple Classifier Fusion: An Experimental Comparison", Patt. Reco., Vol. 34, 299-314, 2001.
[Huang et al., 95] Y. Huang, K. Liu, and C. Suen, "The Combination of Multiple Classifiers by a Neural Network Approach", J. Patt. Reco. and Art. Int., Vol. 9, 579-597, 1995.
[Schapire, 90] R. Schapire, "The Strength of Weak Learnability", Mach. Lear., Vol. 5, 197-227, 1990.
[Giacinto and Roli, 01] G. Giacinto and F. Roli, "Dynamic Classifier Selection based on Multiple Classifier Behaviour", Patt. Reco., Vol. 34, 1879-1881, 2001.
[Wanas et al., 99] N. Wanas, M. Kamel, G. Auda, and F. Karray, "Feature Based Decision Aggregation in Modular Neural Network Classifiers", Patt. Reco. Lett., 20(11-13), 1353-1359, 1999.

SLIDE 38

http://pami.uwaterloo.ca
Email: mkamel@uwaterloo.ca