Multi-label classification using rule-based classifier systems - Shabnam Nazmi (PhD candidate) - PowerPoint PPT Presentation



slide-1
SLIDE 1

Multi-label classification using rule-based classifier systems

Shabnam Nazmi (PhD candidate), Department of Electrical and Computer Engineering, North Carolina A&T State University. Advisor: Dr. A. Homaifar

slide-2
SLIDE 2
Outline

  • Motivation
  • Introduction
  • Multi-label classification overview
  • Confidence level in prediction
  • Multi-label classification using learning classifier systems (LCSs)
  • Simulation results
  • Conclusion and future works

slide-3
SLIDE 3

Motivation

  • Data-driven techniques are ubiquitous in many applications such as classification, estimation, and modeling
  • In some classification applications, samples in the data set belong to more than one class simultaneously
  • Multi-label classification methods that solve the problem as a single task are at an advantage
  • The level of confidence in the labels assigned to the samples is vital for training an accurate machine
  • When modeling a dynamical system, the overlap among adjacent sub-models can be handled using multi-label data with appropriate confidence levels


slide-4
SLIDE 4

Introduction

[Figures: an example of multi-class classification and an example of multi-label classification]

slide-5
SLIDE 5

Introduction

  • In contrast to simple binary classification, each instance of the data set belongs to one of $N > 2$ different classes
  • The goal is to construct a function which, given a new data point, will correctly predict the class to which the new point belongs
  • One-vs-all: trains $N$ binary classifiers, one for each class
  • One-vs-one: trains $N(N-1)/2$ binary classifiers, one to distinguish each pair of classes (see the sketch below)
  • Decision trees, naïve Bayes, neural networks, …

[Figures: multi-class classification vs. multi-label classification]
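Not from the slides: a minimal scikit-learn sketch of the two decomposition strategies; the base estimator and data set are arbitrary illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

X, y = load_iris(return_X_y=True)          # N = 3 classes

# One-vs-all: one binary model per class
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
# One-vs-one: one binary model per pair of classes, N(N-1)/2 in total
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(ovr.estimators_))                # 3
print(len(ovo.estimators_))                # 3 = 3*2/2 pairs
```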


slide-6
SLIDE 6

Introduction

  • In contrast to conventional (single-label) classification, the setting of multi-label classification (MLC) allows an instance to belong to several classes simultaneously
  • Multi-label classification tasks are ubiquitous in real-world problems
  • Text categorization: each document may belong to several predefined topics
  • Bioinformatics: one protein may have many effects on a cell when predicting its functional classes

[Figures: multi-class classification vs. multi-label classification]


slide-7
SLIDE 7

Definitions

  • Notation:

$D$: multi-label data set, with examples $(x_i, Y_i)$ where $Y_i \subseteq Y$

$Y = \{y_1, y_2, \dots, y_l\}$: the finite set of labels

$h: X \to 2^Y$: the learned hypothesis

  • Label cardinality of $D$: the average number of labels of the examples in $D$
  • Label density of $D$: the average number of labels of the examples in $D$ divided by $|Y|$
  • Hamming loss:

$HL(h, D) = \frac{1}{|D|} \sum_{i=1}^{|D|} \frac{|h(x_i) \,\Delta\, Y_i|}{|Y|}$, where $\Delta$ denotes the symmetric difference

  • Ranking loss (for a label-scoring function $f$): the average fraction of incorrectly ordered label pairs,

$RL(f) = \frac{1}{|D|} \sum_{i=1}^{|D|} \frac{|\{(y_a, y_b) \in Y_i \times \bar{Y}_i : f(x_i, y_a) \le f(x_i, y_b)\}|}{|Y_i| \, |\bar{Y}_i|}$
|𝐸| 𝑗=1


slide-8
SLIDE 8

MLC methods

  • Problem transformation methods
  • Algorithm adaptation methods


slide-9
SLIDE 9

MLC methods

  • Problem transformation methods
  • Select family: discards ML data or selects one of the multiple labels for each instance
  • It discards a lot of the information content of the original dataset
  • Label power set method: considers each distinct set of labels as a single label
  • It may lead to a large number of classes with few examples per class
  • Binary relevance: learns $|Y|$ binary classifiers, one for each different label (sketched in code below)
  • The most common problem transformation method
  • Ranking by pairwise comparison: generates $\binom{|Y|}{2}$ binary label data sets
  • Outputs a ranking of labels based on votes from the binary classifiers
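A minimal binary relevance sketch, assuming a 0/1 label matrix and any scikit-learn base classifier; the class name and toy data are my own.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

class BinaryRelevance:
    """Train one independent binary classifier per label."""
    def __init__(self, base):
        self.base = base
    def fit(self, X, Y):                      # Y: (n_samples, n_labels) 0/1 matrix
        self.models_ = [clone(self.base).fit(X, Y[:, j]) for j in range(Y.shape[1])]
        return self
    def predict(self, X):
        return np.column_stack([m.predict(X) for m in self.models_])

# Fabricated multi-label data: 2 attributes, 3 labels
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
Y = np.column_stack([X[:, 0] > 0, X[:, 1] > 0, X.sum(axis=1) > 0]).astype(int)
br = BinaryRelevance(LogisticRegression()).fit(X, Y)
print(br.predict(X[:3]))
```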


slide-10
SLIDE 10

MLC methods

  • Problem transformation methods (continued)
  • Random $k$-labelsets (RAkEL): breaks the initial set of labels into small random, disjoint or overlapping, subsets
  • Improves label power set results, but is still challenged by domains with a large number of labels and instances


slide-11
SLIDE 11

MLC methods

  • Algorithm adaptation methods
  • Decision trees: C4.5 was adapted to learn ML data
  • Produces ML models that are understandable by humans
  • Probabilistic methods: proposed for text classification; a generative model is trained according to which each label generates different words
  • The ML document is generated by a mixture of the word distributions of its labels, trained using EM
  • Neural networks: the back-propagation algorithm is adapted by introducing a new error function similar to ranking loss
  • Lazy methods: the $k$-nearest neighbors algorithm is used to maximize the posterior probability of labels assigned to new instances (a simplified sketch follows)
  • Outputs a ranking function for the probability of each label
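A deliberately simplified lazy-method sketch in the spirit of ML-kNN: the real algorithm maximizes posterior probabilities with priors estimated from the training data, while this version merely scores each label by its frequency among the k nearest neighbors (all data fabricated).

```python
import numpy as np

def knn_label_scores(X_train, Y_train, x, k=5):
    """Score each label by its frequency among the k nearest neighbors;
    the scores can be thresholded or used directly as a label ranking."""
    dist = np.linalg.norm(X_train - x, axis=1)     # Euclidean distances
    nearest = np.argsort(dist)[:k]                 # indices of the k neighbors
    return Y_train[nearest].mean(axis=0)

# Fabricated data: 50 instances, 2 attributes, 3 labels
rng = np.random.default_rng(1)
X_train = rng.normal(size=(50, 2))
Y_train = (rng.random((50, 3)) > 0.6).astype(int)
print(knn_label_scores(X_train, Y_train, x=np.zeros(2)))
```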


slide-12
SLIDE 12

MLC methods

  • Algorithm adaptation methods
  • Support vector machines: the one-versus-one strategy is used to partition a dataset with $|Y|$ labels into $\binom{|Y|}{2}$ double-label subsets
  • Assumes double-label instances are located in the marginal region between positive and negative instances
  • Associative classification methods: construct classification rule sets using association rule mining
  • MMAC learns an initial set of rules, removes the examples associated with this rule set, and recursively learns a new rule set from the remaining examples until no further frequent items are left


slide-13
SLIDE 13

Confidence in prediction

  • The AdaBoost algorithm has been extended to generate a confidence degree for the predictions of "weak" hypotheses
  • Confidence scores give a measure of the reliability of each prediction
  • Classification methods such as probabilistic approaches and logistic regression output a value as the probability of a label being true
  • The idea of confidence in prediction can be extended to one step prior to training: account for confidence levels in the training data provided by the expert
  • The hypothesis then learns confidence levels and outputs a confidence degree along with its predicted labels for new instances


slide-16
SLIDE 16

Notations

  • π‘Œ denotes the instance space and 𝑍 = {𝑧1, 𝑧2, … , 𝑧𝑙} is the

finite set of class labels

  • Each instance 𝑦 ∈ π‘Œ is associated with a subset of labels

𝑧 βŠ‚ 𝑍

  • 𝐸 is the set of data

𝐸 = { 𝑦1, πœ‡1, 𝐷1 , 𝑦2, πœ‡2, 𝐷2 , … π‘¦π‘œ, πœ‡π‘œ, π·π‘œ }

  • πœ‡π‘— is the binary relevance vector of labels for instance 𝑦𝑗

πœ‡π‘—,π‘˜ = {1: π‘§π‘˜ ∈ 𝑧, 0: π‘§π‘˜ βˆ‰ 𝑧|βˆ€π‘— ∈ 1, π‘œ , π‘˜ ∈ [1, 𝑙]}

  • 𝐼: π‘Œ β†’ (𝑍, 𝐷), outputs a set of predicted labels (𝑍

) along with a vector of confidence level (𝑋)of the hypothesis in each

  • f the labels
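One direct way to hold these triples in code, as a minimal sketch; the field names are my own, and the sample values echo the four-label example shown later on slide 21.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Example:
    x: np.ndarray        # attribute vector, x in X
    lam: np.ndarray      # binary relevance vector: lam[j] = 1 iff label y_j applies
    conf: np.ndarray     # expert confidence level C for each label, in [0, 1]

# One sample from a hypothetical 4-label problem: labels 2 and 3 apply;
# the expert is fully sure about label 2 and 90% sure about label 3
ex = Example(x=np.array([0.1, -0.3]),
             lam=np.array([0, 1, 1, 0]),
             conf=np.array([0.0, 1.0, 0.9, 0.0]))
```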


slide-17
SLIDE 17

LCS structure

  • A strength-based Michigan-style classifier system has been used to extract knowledge from ML data
  • Michigan-style classifier systems are rule-based, supervised learning systems with a fixed rule length
  • A genetic algorithm acts as the driving force that helps evolve useful rules
  • The classification model consists of a population of rules in the form "IF condition THEN action"
  • Originally structured for learning binary classification problems
  • The isolated structure of the action part of the classifiers allows further modification to adapt to more general classification problems, namely multi-class and multi-label


slide-18
SLIDE 18

LCS structure

[Diagram: LCS training loop - Data set → Training instance → population [P] → match set [M] → action set [A] and not-action set [A′], with Covering, CR (conflict resolution), Genetic algorithm, and Update rule parameters operating on these sets; the trained population is the Model]

Data set: a set of triples in the form (sample, label, confidence level)
Training instance: an individual randomly drawn from the data set

slide-19
SLIDE 19

LCS structure

[P]: population of rules/classifiers
Classifier parameters:

  • Condition
  • Action
  • Strength ($S$)
  • Confidence estimate $W = (w_1, w_2, \dots, w_l)$
  • Confidence error ($\varepsilon$)
slide-20
SLIDE 20

LCS structure

Condition:

  • For binary-valued attributes, composed of $\{0, 1, \#\}$
  • For real-valued attributes, takes the form of an ordered list of (center, spread) pairs $(c_i, s_i)$
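A minimal matching sketch for real-valued conditions, assuming one (center, spread) interval per attribute; a fully general attribute, the analogue of '#', can be encoded with an infinite spread.

```python
import numpy as np

def matches(centers, spreads, x):
    """True if every attribute x_i falls inside (c_i - s_i, c_i + s_i)."""
    centers, spreads, x = map(np.asarray, (centers, spreads, x))
    return bool(np.all(np.abs(x - centers) < spreads))

print(matches([0.0, 0.2], [0.3, 0.1], [0.1, 0.25]))  # True
print(matches([0.0, 0.2], [0.3, 0.1], [0.5, 0.25]))  # False: |0.5 - 0.0| >= 0.3
```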

slide-21
SLIDE 21

LCS structure

Action: an ordered list of $\{0, 1\}$

  • Example: labels for a sample drawn from a four-class data set: "0110"
  • Confidence level for this label set: $C = [0, 1, 0.9, 0]$

slide-22
SLIDE 22

LCS structure

[M]: classifiers matching the provided instance, i.e., $c_i - s_i < x_i < c_i + s_i$ for every attribute
Covering: creates a matching classifier if [M] is empty
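A hedged covering sketch: the slides do not give the initialization constants, so the base spread s0, the uniform draw, and the initial strength below are illustrative assumptions.

```python
import numpy as np

def cover(x, labels, conf, s0=0.2, rng=np.random.default_rng()):
    """Create a new rule matching instance x: centers sit on the instance,
    spreads are drawn around an assumed base spread s0."""
    x = np.asarray(x, float)
    return {
        "centers": x.copy(),
        "spreads": rng.uniform(0.5 * s0, 1.5 * s0, size=x.shape),
        "action": np.asarray(labels),          # the instance's label set
        "W": np.asarray(conf, float),          # confidence estimate seeded from C
        "S": 1.0,                              # initial strength (assumed)
        "eps": 0.0,                            # initial confidence error
    }

rule = cover(x=[0.1, -0.3], labels=[0, 1, 1, 0], conf=[0, 1, 0.9, 0])
```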

slide-23
SLIDE 23

LCS structure

CR: conflict resolution

  • Uses bidding to identify the classifier that gets to classify the instance: $B = S \cdot \mu \cdot e^{-\alpha \varepsilon}$
  • $\mu$ is a function of the specificity or generality of the classifier
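The bid can be computed directly from the reconstructed formula; alpha and the specificity values below are assumed for illustration.

```python
import numpy as np

def bid(strength, specificity, eps, alpha=1.0):
    """Bid of a matching classifier: higher strength and specificity raise it,
    a larger confidence error eps lowers it, per B = S * mu * exp(-alpha * eps)."""
    return strength * specificity * np.exp(-alpha * eps)

# The matching classifier with the highest bid wins conflict resolution
rules = [{"S": 1.2, "mu": 0.8, "eps": 0.05}, {"S": 0.9, "mu": 1.0, "eps": 0.30}]
winner = max(rules, key=lambda r: bid(r["S"], r["mu"], r["eps"]))
```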

slide-24
SLIDE 24

LCS structure

[A]: classifiers having the same action as the winning classifier
[A′]: [M] − [A]
Genetic algorithm: randomly picks two classifiers from [A] and creates two offspring (a crossover sketch follows)

  • Offspring are inserted back into [P]
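A possible one-point crossover on the rule conditions; the slides do not specify the GA operators, so this is an illustrative sketch only.

```python
import numpy as np

def crossover(p1, p2, rng=np.random.default_rng()):
    """One-point crossover on the (center, spread) condition lists of two
    parents drawn from [A] (assumes conditions with at least two attributes)."""
    cut = int(rng.integers(1, len(p1["centers"])))
    def child(a, b):
        return {**a,  # other parameters (action, W, S, eps) copied from parent a
                "centers": np.concatenate([a["centers"][:cut], b["centers"][cut:]]),
                "spreads": np.concatenate([a["spreads"][:cut], b["spreads"][cut:]])}
    return child(p1, p2), child(p2, p1)
```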
slide-25
SLIDE 25

LCS structure

Genetic algorithm: simultaneously favors classifiers with a higher fitness value and a lower confidence estimate error

slide-26
SLIDE 26

LCS structure

Taxes are deducted from classifiers in both sets

Confidence error: $\varepsilon_i = \lVert W_i - C \rVert_1$

  • Delta rule update scheme: $W_i \leftarrow W_i + \beta \, (C - W_i)$
  • Fitness- and error-proportionate resource sharing scheme:

$R_i = \dfrac{S_i \, e^{-\alpha \varepsilon_i}}{\sum_{j \in [A]} S_j \, e^{-\alpha \varepsilon_j}} \, R_0$
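Putting the two update rules together in one sketch; R0, alpha, and beta are assumed constants, not values from the slides.

```python
import numpy as np

def update_action_set(action_set, C, R0=1.0, alpha=1.0, beta=0.1):
    """Update rules in [A] after one training instance with confidence vector C.
    R0 (reward budget), alpha (error scaling), beta (learning rate) assumed."""
    C = np.asarray(C, float)
    for r in action_set:
        r["W"] = r["W"] + beta * (C - r["W"])        # delta rule on the estimate
        r["eps"] = np.abs(r["W"] - C).sum()          # L1 confidence error
    share = np.array([r["S"] * np.exp(-alpha * r["eps"]) for r in action_set])
    for r, w in zip(action_set, share / share.sum()):
        r["S"] += w * R0                             # proportionate reward share
```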

slide-27
SLIDE 27

LCS structure

Model: the population of trained classifiers (rules) that collectively solve the classification problem, after a sufficient number of training iterations

slide-28
SLIDE 28

Performance measures

  • Hamming loss is employed as a measure of accuracy and plotted against training iterations
  • The average confidence estimate error of the population is plotted against training iterations
  • In the test stage:
  • The prediction of the model is generated based on the votes from the classifiers that match the instance
  • The confidence level of the classification is reported as the weighted average of the confidence estimates of the classifiers that match the instance (see the sketch below)
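A test-stage sketch combining both bullets: a strength-weighted vote over the match set, plus the weighted-average confidence (the 0.5 vote cutoff is an assumed choice).

```python
import numpy as np

def predict(match_set, threshold=0.5):
    """Return the predicted relevance vector and the weighted-average
    confidence level, both weighted by classifier strength."""
    S = np.array([r["S"] for r in match_set])
    actions = np.vstack([r["action"] for r in match_set])
    W = np.vstack([r["W"] for r in match_set])
    votes = S @ actions / S.sum()          # per-label weighted vote in [0, 1]
    confidence = S @ W / S.sum()           # weighted-average confidence estimate
    return (votes >= threshold).astype(int), confidence
```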


slide-29
SLIDE 29

Simulation results

  • Artificial binary-valued data set: five attributes and two classes
  • Artificial real-valued data set: four attributes and two classes; attribute range is $(-0.5, 0.5)$


slide-30
SLIDE 30

Simulation results

  • Iris data: a three-class data set with 50 samples per class
  • All data are used for training
  • Results averaged over 10 runs

Method               Accuracy (%)
OVO SVM              97.33
MLP                  99.48
Logistic Regression  98
Random Forest        100
LCS                  98
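For reference, the four baseline rows can be reproduced with scikit-learn along these lines (training accuracy on all data, per the slide's protocol); exact numbers depend on hyper-parameters and random seeds, so they need not match the table.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
models = {
    "OVO SVM": SVC(decision_function_shape="ovo"),
    "MLP": MLPClassifier(max_iter=2000),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(),
}
for name, model in models.items():
    # training accuracy on all 150 samples, as in the slide's protocol
    print(name, model.fit(X, y).score(X, y))
```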

slide-31
SLIDE 31

Conclusion and future work

✓ A strength-based learning classifier system is employed to design an embedded MLC algorithm
✓ The classifier structure is adapted to handle confidence levels in the labels provided in the training set
✓ The model is tested on one real-world data set and two artificial data sets, and results are provided

  • Appropriate performance measures for test accuracy need to be implemented
  • The MLC method discussed here will be extended to the accuracy-based classifier system (UCS)


slide-32
SLIDE 32


Thank you for your attention! Your questions are welcome and feedback is appreciated!