

  1. Multi-label classification using rule-based classifier systems
  Shabnam Nazmi (PhD candidate)
  Department of Electrical and Computer Engineering, North Carolina A&T State University
  Advisor: Dr. A. Homaifar

  2. Outline
  • Motivation
  • Introduction
  • Multi-label classification overview
  • Confidence level in prediction
  • Multi-label classification using learning classifier systems (LCSs)
  • Simulation results
  • Conclusion and future work

  3. Motivation
  • Data-driven techniques are ubiquitous in applications such as classification, estimation, and modeling
  • In some classification applications, samples in the data set belong to more than one class simultaneously
  • Multi-label classification methods that solve the task as a single problem are at an advantage
  • The level of confidence in the labels assigned to the samples is vital for training an accurate machine
  • When modeling a dynamical system, the overlap among adjacent sub-models can be handled using multi-label data with appropriate confidence levels

  4. Introduction: multi-class classification vs. multi-label classification

  5. Introduction: multi-class classification
  • In contrast to simple binary classification, each instance of the data set belongs to one of N > 2 different classes
  • The goal is to construct a function which, given a new data point, correctly predicts the class to which the new point belongs
  • One-vs-all: trains N binary classifiers, one for each class
  • One-vs-one: trains N(N − 1)/2 classifiers to distinguish each pair of classes (see the sketch below)
  • Base learners: decision trees, naïve Bayes, neural networks, …
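The classifier counts above can be checked with a small sketch; scikit-learn and the logistic-regression base learner are illustrative choices, not part of the original slides:

```python
# Minimal sketch comparing the two decomposition strategies on synthetic data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=6, n_classes=4, random_state=0)

ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(ovr.estimators_))  # N = 4 binary classifiers
print(len(ovo.estimators_))  # N(N - 1)/2 = 6 pairwise classifiers
```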

  6. Introduction: multi-label classification
  • In contrast to conventional (single-label) classification, the setting of multi-label classification (MLC) allows an instance to belong to several classes simultaneously
  • Multi-label classification tasks are ubiquitous in real-world problems
    • Text categorization: each document may belong to several predefined topics
    • Bioinformatics: one protein may have many effects on a cell when predicting its functional classes

  7. Definitions
  • Notation: D is a multi-label data set; H: X → Y_i, Y_i ⊆ Y, is the hypothesis; Y = {y_1, y_2, …, y_k} is the label set
  • Label cardinality of D: the average number of labels of the examples in D
  • Label density of D: the average number of labels of the examples in D, divided by |Y|
  • Hamming loss: HL(H, D) = (1/|D|) Σ_{i=1}^{|D|} |Y_i Δ Z_i| / |Y|, where Z_i = H(x_i) and Δ denotes the symmetric difference
  • Ranking loss: RL(f) = (1/|D|) Σ_{i=1}^{|D|} |{(y_a, y_b) ∈ Y_i × Ȳ_i : f(x_i, y_a) ≤ f(x_i, y_b)}| / (|Y_i| |Ȳ_i|), the average fraction of label pairs ordered incorrectly by the ranking function f
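For concreteness, here is a minimal sketch of the data-set measures above, assuming labels are stored as a binary matrix with one row per example; the function names are ours, not from the slides:

```python
import numpy as np

def label_cardinality(Y):
    """Average number of labels per example (Y is an n x k binary matrix)."""
    return Y.sum(axis=1).mean()

def label_density(Y):
    """Label cardinality normalized by the number of labels |Y|."""
    return label_cardinality(Y) / Y.shape[1]

def hamming_loss(Y_true, Y_pred):
    """Mean size of the symmetric difference between true and predicted
    label sets, normalized by |Y| (the bitwise disagreement rate)."""
    return float(np.mean(Y_true != Y_pred))

Y_true = np.array([[1, 0, 1], [0, 1, 0]])
Y_pred = np.array([[1, 1, 1], [0, 1, 0]])
print(label_cardinality(Y_true))     # 1.5
print(label_density(Y_true))         # 0.5
print(hamming_loss(Y_true, Y_pred))  # 1/6 ≈ 0.167
```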

  8. MLC methods
  • Problem transformation methods
  • Algorithm adaptation methods

  9. MLC methods
  • Problem transformation methods
    • Select family: discards ML data or selects one of the multiple labels for each instance
      • Discards much of the information content of the original data set
    • Label power set: considers each distinct set of labels as a single label
      • May lead to a large number of classes with few examples per class
    • Binary relevance: learns |Y| binary classifiers, one for each label (a sketch follows below)
      • The most common problem transformation method
    • Ranking by pairwise comparison: generates C(|Y|, 2) binary label data sets
      • Outputs a ranking of labels based on the votes of the binary classifiers
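A minimal binary-relevance sketch, assuming a scikit-learn-style base learner (the logistic-regression default is illustrative); it assumes every label column contains both classes:

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

class BinaryRelevance:
    """Train |Y| independent binary classifiers, one per label column."""

    def __init__(self, base=None):
        self.base = base or LogisticRegression(max_iter=1000)

    def fit(self, X, Y):  # Y: n x k binary label matrix
        self.models_ = [clone(self.base).fit(X, Y[:, j])
                        for j in range(Y.shape[1])]
        return self

    def predict(self, X):
        # Stack the per-label binary predictions back into a label matrix.
        return np.column_stack([m.predict(X) for m in self.models_])
```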

  10. MLC methods (cont.)
  • Problem transformation methods
    • Random k-labelsets: breaks the initial set of labels into small random subsets, disjoint or overlapping, and learns a label power set classifier on each (a subset-generation sketch follows below)
      • Improves on label power set results, but is still challenged by domains with a large number of labels and instances
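A sketch of the random k-labelsets idea: draw small label subsets and reduce each to a label power set problem; the function names are assumptions, not from the slides:

```python
import random

def random_labelsets(n_labels, subset_size, n_subsets, seed=0):
    """Draw (possibly overlapping) random label subsets of a fixed size."""
    rng = random.Random(seed)
    return [tuple(sorted(rng.sample(range(n_labels), subset_size)))
            for _ in range(n_subsets)]

def powerset_targets(Y, subset):
    """Map each example's restriction to `subset` onto one power set class."""
    return [tuple(row[j] for j in subset) for row in Y]

print(random_labelsets(n_labels=6, subset_size=3, n_subsets=4))
```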

  11. MLC methods
  • Algorithm adaptation methods
    • Decision trees: C4.5 was adapted to learn ML data
      • Produces ML models that are understandable by humans
    • Probabilistic methods: proposed for text classification; a generative model is trained according to which each label generates different words
      • The ML document is generated by a mixture of the word distributions of its labels, fitted using EM
    • Neural networks: the back-propagation algorithm is adapted by introducing a new error function similar to the ranking loss
    • Lazy methods: the k-nearest-neighbors algorithm is used to maximize the posterior probability of the labels assigned to new instances (a simplified sketch follows below)
      • Outputs a ranking function for the probability of each label
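A simplified nearest-neighbor sketch in the spirit of the lazy method above: each label is scored by the fraction of neighbors that carry it (the actual ML-kNN algorithm refines this with Bayesian posteriors estimated from the training data):

```python
import numpy as np

def knn_label_scores(X_train, Y_train, x, k=5):
    """Score each label by its frequency among the k nearest neighbors."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return Y_train[nearest].mean(axis=0)  # per-label score in [0, 1]

# Labels whose score exceeds 0.5 would be predicted as relevant,
# and the scores themselves induce the label ranking.
```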

  12. MLC methods
  • Algorithm adaptation methods
    • Support vector machines: the one-versus-one strategy is used to partition a data set with |Y| labels into C(|Y|, 2) two-label subsets (see the enumeration sketch below)
      • Assumes double-label instances are located in the marginal region between positive and negative instances
    • Associative classification methods: construct classification rule sets using association rule mining
      • MMAC learns an initial set of rules, removes the examples associated with this rule set, and recursively learns a new rule set from the remaining examples until no frequent items are left
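The C(|Y|, 2) count can be made concrete by enumerating the pairs; four labels yield six two-label subsets:

```python
from itertools import combinations

labels = ["y1", "y2", "y3", "y4"]
pairs = list(combinations(labels, 2))  # all two-label subsets
print(len(pairs))  # C(4, 2) = 6
print(pairs[:3])   # [('y1', 'y2'), ('y1', 'y3'), ('y1', 'y4')]
```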

  13. Confidence in prediction
  • The AdaBoost algorithm has been extended to generate a confidence degree for the predictions of "weak" hypotheses
  • Confidence scores indicate the reliability of each prediction
  • Classification methods such as probabilistic approaches and logistic regression output a value as the probability of a label being true
  • The idea of confidence in prediction can be extended to one step prior to training

  14. Confidence in prediction (cont.)
  • Incorporate confidence levels in the training data, provided by the expert

  15. Confidence in prediction (cont.)
  • The hypothesis then learns these confidence levels and outputs a confidence degree along with its predicted labels for new instances

  16. Notation
  • X denotes the instance space and Y = {y_1, y_2, …, y_k} is the finite set of class labels
  • Each instance x ∈ X is associated with a subset of labels y ⊆ Y
  • D is the data set: D = {(x_1, λ_1, C_1), (x_2, λ_2, C_2), …, (x_n, λ_n, C_n)}
  • λ_i is the binary relevance vector of labels for instance x_i: λ_{i,j} = 1 if y_j ∈ y and 0 otherwise, for all i ∈ [1, n], j ∈ [1, k]
  • H: X → (Ŷ, W) outputs a set of predicted labels (Ŷ) along with a vector (W) of the hypothesis' confidence in each of the labels (a data-layout sketch follows below)
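A sketch of the data layout this notation implies; the class and field names are ours:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class MLExample:
    x: np.ndarray           # feature vector, x ∈ X
    relevance: np.ndarray   # λ_i ∈ {0, 1}^k, binary relevance vector
    confidence: np.ndarray  # C_i ∈ [0, 1]^k, expert-provided confidences

# One training triple (x_i, λ_i, C_i) for a three-label problem:
D = [MLExample(x=np.array([0.2, 0.7]),
               relevance=np.array([1, 0, 1]),
               confidence=np.array([0.9, 0.0, 0.6]))]
```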

  17. LCS structure
  • A strength-based Michigan-style classifier system is used to extract knowledge from ML data
  • Michigan-style classifier systems are rule-based, supervised learning systems with a fixed rule length
  • A genetic algorithm acts as the driving force that helps evolve useful rules
  • The classification model consists of a population of rules of the form "IF condition THEN action"
  • Originally structured for learning binary classification problems
  • The isolated structure of the action part of the classifiers allows further modifications to adapt to more general classification problems, namely multi-class and multi-label

  18. LCS structure
  [Diagram: training loop — data set → training instance → covering → population [P] → match set [M] → action set [A] → update rule parameters, with the genetic algorithm acting on the population and the model as output]
  • Data set: a set of triples of the form (sample, label, confidence level)
  • Training instance: an individual drawn at random from the data set (a covering sketch follows below)
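A sketch of the covering step from the diagram: when no rule in [P] matches the current training instance, a new rule is generated from it with don't-care symbols inserted at random positions; p_hash is an assumed parameter name, not from the slides:

```python
import random

def cover(x_bits, label_vector, p_hash=0.33, rng=None):
    """Create a rule matching `x_bits`, generalized with '#' symbols."""
    rng = rng or random.Random()
    condition = ['#' if rng.random() < p_hash else b for b in x_bits]
    return {"condition": condition, "action": list(label_vector)}

print(cover("1011", [1, 0, 1], rng=random.Random(0)))
```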

  19. LCS structure
  [Diagram repeated from the previous slide]
  • [P]: population of rules/classifiers
  • Classifier parameters:
    • Condition
    • Action
    • Strength (S)
    • Confidence estimate W = (w_1, w_2, …, w_k)
    • Confidence error (ε)
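The parameters above suggest a record like the following; field names and defaults are illustrative, not taken from the original implementation:

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Rule:
    condition: list                  # e.g. ['1', '#', '0'] or (c_i, s_i) pairs
    action: np.ndarray               # predicted label vector
    strength: float = 10.0           # S, updated during credit assignment
    confidence: np.ndarray = field(  # W = (w_1, ..., w_k)
        default_factory=lambda: np.zeros(0))
    conf_error: float = 0.0          # confidence error ε
```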

  20. LCS structure
  [Diagram repeated from the previous slide]
  • Condition:
    • For binary-valued attributes, composed of {0, 1, #}
    • For real-valued attributes, takes the form of an ordered list of center-spread pairs (c_i, s_i)
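Matching under the two condition encodings can be sketched as follows; the function names are ours:

```python
import numpy as np

def matches_ternary(condition, x_bits):
    """'#' is a don't-care; every other position must equal the input bit."""
    return all(c == '#' or c == b for c, b in zip(condition, x_bits))

def matches_interval(centers, spreads, x):
    """Each attribute must fall inside [c_i - s_i, c_i + s_i]."""
    return bool(np.all(np.abs(x - centers) <= spreads))

print(matches_ternary('1#0', '110'))  # True
print(matches_interval(np.array([0.5, 0.1]), np.array([0.2, 0.05]),
                       np.array([0.6, 0.12])))  # True
```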
