Multi-label classification using rule-based classifier systems - Shabnam Nazmi (PhD candidate) - PowerPoint PPT Presentation



slide-1
SLIDE 1

Multi-label classification using rule-based classifier systems

Shabnam Nazmi (PhD candidate), Department of Electrical and Computer Engineering, North Carolina A&T State University. Advisor: Dr. A. Homaifar

slide-2
SLIDE 2
Outline

  • Motivation
  • Introduction
  • Multi-label classification overview
  • Confidence level in prediction
  • Multi-label classification using learning classifier systems (LCSs)
  • Simulation results
  • Conclusion and future works

slide-3
SLIDE 3

Motivation

  • Data-driven techniques are ubiquitous in many applications such as classification, estimation, and modeling
  • In some classification applications, samples in the data set belong to more than one class simultaneously
  • Multi-label classification methods that solve the problem as a single task are at an advantage
  • The level of confidence in the labels assigned to the samples is vital for training an accurate machine
  • When modeling a dynamical system, the overlap among adjacent sub-models can be handled using multi-label data with appropriate confidence levels


slide-4
SLIDE 4

Introduction

[Figures: an example of multi-class classification and an example of multi-label classification]

slide-5
SLIDE 5

Introduction

  • In contrast to simple binary classification, each instance of the data set belongs to one of $N > 2$ different classes
  • The goal is to construct a function which, given a new data point, will correctly predict the class to which the new point belongs
  • One-vs-all: trains $N$ binary classifiers, one for each class
  • One-vs-one: trains $N(N-1)/2$ binary classifiers, one to distinguish each pair of classes (see the sketch below)
  • Decision trees, naïve Bayes, neural networks, …

[Figures: multi-class classification vs. multi-label classification]
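Not from the slides: a minimal scikit-learn sketch of the two decomposition strategies; the base estimator and data set are arbitrary illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

X, y = load_iris(return_X_y=True)          # N = 3 classes

# One-vs-all: one binary model per class
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
# One-vs-one: one binary model per pair of classes, N(N-1)/2 in total
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(ovr.estimators_))                # 3
print(len(ovo.estimators_))                # 3 = 3*2/2 pairs
```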


slide-6
SLIDE 6

Introduction

  • In contrast to conventional (single-label) classification, the setting of multi-label classification (MLC) allows an instance to belong to several classes simultaneously
  • Multi-label classification tasks are ubiquitous in real-world problems
  • Text categorization: each document may belong to several predefined topics
  • Bioinformatics: one protein may have many effects on a cell when predicting its functional classes

[Figures: multi-class classification vs. multi-label classification]


slide-7
SLIDE 7

Definitions

  • Notation:

$D$: multi-label data set, with examples $(x_i, Y_i)$ where $Y_i \subseteq Y$

$Y = \{y_1, y_2, \dots, y_l\}$: the finite set of labels

$h: X \to 2^Y$: the learned hypothesis

  • Label cardinality of $D$: the average number of labels of the examples in $D$
  • Label density of $D$: the average number of labels of the examples in $D$ divided by $|Y|$
  • Hamming loss:

$HL(h, D) = \frac{1}{|D|} \sum_{i=1}^{|D|} \frac{|h(x_i) \,\Delta\, Y_i|}{|Y|}$, where $\Delta$ denotes the symmetric difference

  • Ranking loss (for a label-scoring function $f$): the average fraction of incorrectly ordered label pairs,

$RL(f) = \frac{1}{|D|} \sum_{i=1}^{|D|} \frac{|\{(y_a, y_b) \in Y_i \times \bar{Y}_i : f(x_i, y_a) \le f(x_i, y_b)\}|}{|Y_i| \, |\bar{Y}_i|}$
|𝐸| 𝑗=1


slide-8
SLIDE 8

MLC methods

  • Problem transformation methods
  • Algorithm adaptation methods


slide-9
SLIDE 9

MLC methods

  • Problem transformation methods
  • Select family: discards ML data or selects one of the multiple labels for each instance
  • It discards a lot of the information content of the original dataset
  • Label power set method: considers each distinct set of labels as a single label
  • It may lead to a large number of classes with few examples per class
  • Binary relevance: learns $|Y|$ binary classifiers, one for each different label (sketched in code below)
  • The most common problem transformation method
  • Ranking by pairwise comparison: generates $\binom{|Y|}{2}$ binary label data sets
  • Outputs a ranking of labels based on votes from the binary classifiers
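A minimal binary relevance sketch, assuming a 0/1 label matrix and any scikit-learn base classifier; the class name and toy data are my own.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

class BinaryRelevance:
    """Train one independent binary classifier per label."""
    def __init__(self, base):
        self.base = base
    def fit(self, X, Y):                      # Y: (n_samples, n_labels) 0/1 matrix
        self.models_ = [clone(self.base).fit(X, Y[:, j]) for j in range(Y.shape[1])]
        return self
    def predict(self, X):
        return np.column_stack([m.predict(X) for m in self.models_])

# Fabricated multi-label data: 2 attributes, 3 labels
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
Y = np.column_stack([X[:, 0] > 0, X[:, 1] > 0, X.sum(axis=1) > 0]).astype(int)
br = BinaryRelevance(LogisticRegression()).fit(X, Y)
print(br.predict(X[:3]))
```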


slide-10
SLIDE 10

MLC methods

  • Problem transformation methods (continued)
  • Random $k$-labelsets (RAkEL): breaks the initial set of labels into small random, disjoint or overlapping, subsets
  • Improves label power set results, but is still challenged by domains with a large number of labels and instances


slide-11
SLIDE 11

MLC methods

  • Algorithm adaptation methods
  • Decision trees: C4.5 was adapted to learn ML data
  • Produces ML models that are understandable by humans
  • Probabilistic methods: proposed for text classification; a generative model is trained according to which each label generates different words
  • The ML document is generated by a mixture of the word distributions of its labels, trained using EM
  • Neural networks: the back-propagation algorithm is adapted by introducing a new error function similar to ranking loss
  • Lazy methods: the $k$-nearest neighbors algorithm is used to maximize the posterior probability of labels assigned to new instances (a simplified sketch follows)
  • Outputs a ranking function for the probability of each label
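A deliberately simplified lazy-method sketch in the spirit of ML-kNN: the real algorithm maximizes posterior probabilities with priors estimated from the training data, while this version merely scores each label by its frequency among the k nearest neighbors (all data fabricated).

```python
import numpy as np

def knn_label_scores(X_train, Y_train, x, k=5):
    """Score each label by its frequency among the k nearest neighbors;
    the scores can be thresholded or used directly as a label ranking."""
    dist = np.linalg.norm(X_train - x, axis=1)     # Euclidean distances
    nearest = np.argsort(dist)[:k]                 # indices of the k neighbors
    return Y_train[nearest].mean(axis=0)

# Fabricated data: 50 instances, 2 attributes, 3 labels
rng = np.random.default_rng(1)
X_train = rng.normal(size=(50, 2))
Y_train = (rng.random((50, 3)) > 0.6).astype(int)
print(knn_label_scores(X_train, Y_train, x=np.zeros(2)))
```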


slide-12
SLIDE 12

MLC methods

  • Algorithm adaptation methods
  • Support vector machines: the one-versus-one strategy is used to partition a dataset with $|Y|$ labels into $\binom{|Y|}{2}$ double-label subsets
  • Assumes double-label instances are located in the marginal region between positive and negative instances
  • Associative classification methods: construct classification rule sets using association rule mining
  • MMAC learns an initial set of rules, removes the examples associated with this rule set, and recursively learns a new rule set from the remaining examples until no further frequent items are left


slide-13
SLIDE 13

Confidence in prediction

  • The AdaBoost algorithm has been extended to generate a confidence degree for the predictions of "weak" hypotheses
  • Confidence scores give a measure of the reliability of each prediction
  • Classification methods such as probabilistic approaches and logistic regression output a value as the probability of a label being true
  • The idea of confidence in prediction can be extended to one step prior to training: account for confidence levels in the training data provided by the expert
  • The hypothesis then learns confidence levels and outputs a confidence degree along with its predicted labels for new instances


slide-16
SLIDE 16

Notations

  • π‘Œ denotes the instance space and 𝑍 = {𝑧1, 𝑧2, … , 𝑧𝑙} is the

finite set of class labels

  • Each instance 𝑦 ∈ π‘Œ is associated with a subset of labels

𝑧 βŠ‚ 𝑍

  • 𝐸 is the set of data

𝐸 = { 𝑦1, πœ‡1, 𝐷1 , 𝑦2, πœ‡2, 𝐷2 , … π‘¦π‘œ, πœ‡π‘œ, π·π‘œ }

  • πœ‡π‘— is the binary relevance vector of labels for instance 𝑦𝑗

πœ‡π‘—,π‘˜ = {1: π‘§π‘˜ ∈ 𝑧, 0: π‘§π‘˜ βˆ‰ 𝑧|βˆ€π‘— ∈ 1, π‘œ , π‘˜ ∈ [1, 𝑙]}

  • 𝐼: π‘Œ β†’ (𝑍, 𝐷), outputs a set of predicted labels (𝑍

) along with a vector of confidence level (𝑋)of the hypothesis in each

  • f the labels
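One direct way to hold these triples in code, as a minimal sketch; the field names are my own, and the sample values echo the four-label example shown later on slide 21.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Example:
    x: np.ndarray        # attribute vector, x in X
    lam: np.ndarray      # binary relevance vector: lam[j] = 1 iff label y_j applies
    conf: np.ndarray     # expert confidence level C for each label, in [0, 1]

# One sample from a hypothetical 4-label problem: labels 2 and 3 apply;
# the expert is fully sure about label 2 and 90% sure about label 3
ex = Example(x=np.array([0.1, -0.3]),
             lam=np.array([0, 1, 1, 0]),
             conf=np.array([0.0, 1.0, 0.9, 0.0]))
```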


slide-17
SLIDE 17

LCS structure

  • A strength-based Michigan-style classifier system has been used to extract knowledge from ML data
  • Michigan-style classifier systems are rule-based, supervised learning systems with a fixed rule length
  • A genetic algorithm acts as the driving force that helps evolve useful rules
  • The classification model consists of a population of rules in the form "IF condition THEN action"
  • Originally structured for learning binary classification problems
  • The isolated structure of the action part of the classifiers allows further modification to adapt to more general classification problems, namely multi-class and multi-label


slide-18
SLIDE 18

LCS structure

[Diagram: LCS training loop - Data set → Training instance → population [P] → match set [M] → action set [A] and not-action set [A′], with Covering, CR (conflict resolution), Genetic algorithm, and Update rule parameters operating on these sets; the trained population is the Model]

Data set: a set of triples in the form (sample, label, confidence level)
Training instance: an individual randomly drawn from the data set

slide-19
SLIDE 19

LCS structure

[P]: population of rules/classifiers
Classifier parameters:

  • Condition
  • Action
  • Strength ($S$)
  • Confidence estimate $W = (w_1, w_2, \dots, w_l)$
  • Confidence error ($\varepsilon$)
slide-20
SLIDE 20

LCS structure

Condition:

  • For binary-valued attributes, composed of $\{0, 1, \#\}$
  • For real-valued attributes, takes the form of an ordered list of (center, spread) pairs $(c_i, s_i)$
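A minimal matching sketch for real-valued conditions, assuming one (center, spread) interval per attribute; a fully general attribute, the analogue of '#', can be encoded with an infinite spread.

```python
import numpy as np

def matches(centers, spreads, x):
    """True if every attribute x_i falls inside (c_i - s_i, c_i + s_i)."""
    centers, spreads, x = map(np.asarray, (centers, spreads, x))
    return bool(np.all(np.abs(x - centers) < spreads))

print(matches([0.0, 0.2], [0.3, 0.1], [0.1, 0.25]))  # True
print(matches([0.0, 0.2], [0.3, 0.1], [0.5, 0.25]))  # False: |0.5 - 0.0| >= 0.3
```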

slide-21
SLIDE 21

LCS structure

Action: an ordered list of $\{0, 1\}$

  • Example: labels for a sample drawn from a four-class data set: "0110"
  • Confidence level for this label set: $C = [0, 1, 0.9, 0]$

slide-22
SLIDE 22

LCS structure

[M]: classifiers matching the provided instance, i.e., $c_i - s_i < x_i < c_i + s_i$ for every attribute
Covering: creates a matching classifier if [M] is empty
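A hedged covering sketch: the slides do not give the initialization constants, so the base spread s0, the uniform draw, and the initial strength below are illustrative assumptions.

```python
import numpy as np

def cover(x, labels, conf, s0=0.2, rng=np.random.default_rng()):
    """Create a new rule matching instance x: centers sit on the instance,
    spreads are drawn around an assumed base spread s0."""
    x = np.asarray(x, float)
    return {
        "centers": x.copy(),
        "spreads": rng.uniform(0.5 * s0, 1.5 * s0, size=x.shape),
        "action": np.asarray(labels),          # the instance's label set
        "W": np.asarray(conf, float),          # confidence estimate seeded from C
        "S": 1.0,                              # initial strength (assumed)
        "eps": 0.0,                            # initial confidence error
    }

rule = cover(x=[0.1, -0.3], labels=[0, 1, 1, 0], conf=[0, 1, 0.9, 0])
```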

slide-23
SLIDE 23

LCS structure

CR: conflict resolution

  • Uses bidding to identify the classifier that gets to classify the instance: $B = S \cdot \mu \cdot e^{-\alpha \varepsilon}$
  • $\mu$ is a function of the specificity or generality of the classifier
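The bid can be computed directly from the reconstructed formula; alpha and the specificity values below are assumed for illustration.

```python
import numpy as np

def bid(strength, specificity, eps, alpha=1.0):
    """Bid of a matching classifier: higher strength and specificity raise it,
    a larger confidence error eps lowers it, per B = S * mu * exp(-alpha * eps)."""
    return strength * specificity * np.exp(-alpha * eps)

# The matching classifier with the highest bid wins conflict resolution
rules = [{"S": 1.2, "mu": 0.8, "eps": 0.05}, {"S": 0.9, "mu": 1.0, "eps": 0.30}]
winner = max(rules, key=lambda r: bid(r["S"], r["mu"], r["eps"]))
```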

slide-24
SLIDE 24

LCS structure

[A]: classifiers having the same action as the winning classifier
[A′]: [M] − [A]
Genetic algorithm: randomly picks two classifiers from [A] and creates two offspring (a crossover sketch follows)

  • Offspring are inserted back into [P]
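A possible one-point crossover on the rule conditions; the slides do not specify the GA operators, so this is an illustrative sketch only.

```python
import numpy as np

def crossover(p1, p2, rng=np.random.default_rng()):
    """One-point crossover on the (center, spread) condition lists of two
    parents drawn from [A] (assumes conditions with at least two attributes)."""
    cut = int(rng.integers(1, len(p1["centers"])))
    def child(a, b):
        return {**a,  # other parameters (action, W, S, eps) copied from parent a
                "centers": np.concatenate([a["centers"][:cut], b["centers"][cut:]]),
                "spreads": np.concatenate([a["spreads"][:cut], b["spreads"][cut:]])}
    return child(p1, p2), child(p2, p1)
```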
slide-25
SLIDE 25

LCS structure

Genetic algorithm: simultaneously favors classifiers with a higher fitness value and a lower confidence estimate error

slide-26
SLIDE 26

LCS structure

Taxes are deducted from classifiers in both sets

Confidence error: $\varepsilon_i = \lVert W_i - C \rVert_1$

  • Delta rule update scheme: $W_i \leftarrow W_i + \beta \, (C - W_i)$
  • Fitness- and error-proportionate resource sharing scheme:

$R_i = \dfrac{S_i \, e^{-\alpha \varepsilon_i}}{\sum_{j \in [A]} S_j \, e^{-\alpha \varepsilon_j}} \, R_0$
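Putting the two update rules together in one sketch; R0, alpha, and beta are assumed constants, not values from the slides.

```python
import numpy as np

def update_action_set(action_set, C, R0=1.0, alpha=1.0, beta=0.1):
    """Update rules in [A] after one training instance with confidence vector C.
    R0 (reward budget), alpha (error scaling), beta (learning rate) assumed."""
    C = np.asarray(C, float)
    for r in action_set:
        r["W"] = r["W"] + beta * (C - r["W"])        # delta rule on the estimate
        r["eps"] = np.abs(r["W"] - C).sum()          # L1 confidence error
    share = np.array([r["S"] * np.exp(-alpha * r["eps"]) for r in action_set])
    for r, w in zip(action_set, share / share.sum()):
        r["S"] += w * R0                             # proportionate reward share
```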

slide-27
SLIDE 27

LCS structure

Model: the population of trained classifiers (rules) that collectively solve the classification problem, after a sufficient number of training iterations

slide-28
SLIDE 28

Performance measures

  • Hamming loss is employed as a measure of accuracy and plotted against training iterations
  • The average confidence estimate error of the population is plotted against training iterations
  • In the test stage:
  • The prediction of the model is generated based on the votes from the classifiers that match the instance
  • The confidence level of the classification is reported as the weighted average of the confidence estimates of the classifiers that match the instance (see the sketch below)
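A test-stage sketch combining both bullets: a strength-weighted vote over the match set, plus the weighted-average confidence (the 0.5 vote cutoff is an assumed choice).

```python
import numpy as np

def predict(match_set, threshold=0.5):
    """Return the predicted relevance vector and the weighted-average
    confidence level, both weighted by classifier strength."""
    S = np.array([r["S"] for r in match_set])
    actions = np.vstack([r["action"] for r in match_set])
    W = np.vstack([r["W"] for r in match_set])
    votes = S @ actions / S.sum()          # per-label weighted vote in [0, 1]
    confidence = S @ W / S.sum()           # weighted-average confidence estimate
    return (votes >= threshold).astype(int), confidence
```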


slide-29
SLIDE 29

Simulation results

  • Artificial binary-valued data set: five attributes and two classes
  • Artificial real-valued data set: four attributes and two classes; attribute range is $(-0.5, 0.5)$


slide-30
SLIDE 30

Simulation results

  • Iris data: a three-class data set with 50 samples per class
  • All data are used for training
  • Results averaged over 10 runs

Method               Accuracy (%)
OVO SVM              97.33
MLP                  99.48
Logistic Regression  98
Random Forest        100
LCS                  98
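For reference, the four baseline rows can be reproduced with scikit-learn along these lines (training accuracy on all data, per the slide's protocol); exact numbers depend on hyper-parameters and random seeds, so they need not match the table.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
models = {
    "OVO SVM": SVC(decision_function_shape="ovo"),
    "MLP": MLPClassifier(max_iter=2000),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(),
}
for name, model in models.items():
    # training accuracy on all 150 samples, as in the slide's protocol
    print(name, model.fit(X, y).score(X, y))
```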

slide-31
SLIDE 31

Conclusion and future work

✓ A strength-based learning classifier system is employed to design an embedded MLC algorithm
✓ The classifier structure is adapted to handle confidence levels in the labels provided in the training set
✓ The model is tested on one real-world data set and two artificial data sets, and results are provided

  • Appropriate performance measures for test accuracy need to be implemented
  • The MLC method discussed here will be extended to the accuracy-based classifier system (UCS)


slide-32
SLIDE 32


Thank you for your attention! Your questions are welcome and feedback is appreciated!