Data Mining using Ant Colony Optimization

Thanks to: Johannes Singler, Bryan Atkinson

Presentation Outline

  • Introduction to Data Mining
  • Rule Induction for Classification
  • AntMiner

– Overview: Input/Output
– Rule Construction
– Quality Measurement
– Pheromone: Initial/Updating
– Experiments/Results
– Performance/Complexity

  • Swarm-based Genetic Programming

– Introduction to GP, Symbolic Regression
– Crossover problems
– Ant Colony Crossover
– Experiments and Results

Introduction

  • Data Mining tries to find:

– hidden knowledge
– unexpected patterns
– new rules
…in large databases.

  • “Discovery of useful summaries of data”
  • It is a key element of a much more elaborate process:

Knowledge Discovery in Databases (KDD)


Goals of Rule Induction

  • A stage of Data Mining: Rule Induction
  • Find rules to describe data in some way

– Not only accurate…
– …but also comprehensible for a human user…
– …to support decision making.

Focus in this Talk

  • Rule Induction for Classification using ACO

– Given: training set (instances/cases to classify)
– Goal: to come up with (preferably simple) rules to classify data

  • Algorithm by Parpinelli, Lopes and Freitas:

AntMiner

  • ACO + Genetic Programming

– Symbolic regression

Rule Induction

  • Possible outputs for Rule Induction:

– decision trees
– (ordered) decision lists [here]
– …

if <attribute1>=<value1> and <attribute2>=<value2> and …
then <class>=<class1>
else if …


AntMiner Input

  • Training set / test set
  • Attribute / value pairs
  • Given classes / classification

AntMiner Output

  • Ordered decision list

– Ordered list of IF-THEN rules of the form IF <condition> THEN <class>

  • <condition> = <term1> AND <term2> AND…

– <term> = <attribute> '=' <value>

– Plus a default rule (majority value).
– The first rule that matches "fires".

  • Only discrete attributes supported so far.

– Continuous values must be discretized beforehand.

  • This is a quite limited version of a decision list.

Prerequisites for an ACO (Review)

  • Problem-dependent heuristic function (η) for measuring the quality of items that could be added to the partial solution so far.

  • Pheromone updating rule (τ)
  • Probabilistic transition rule based on η and τ
  • Difference from most ACO algorithms mentioned in class: AntMiner does not use a graph representation of the problem.


AntMiner Algorithm: Top-Level

  • Pseudo-Code for finding one rule set:

trainingSet = {all training cases}
discoveredRuleList = []
WHILE (|trainingSet| still too big)
    Initialize pheromone (equally distributed)
    Ants try to find a good classification rule by the ACO heuristic
    Add best rule found to discoveredRuleList
    Remove correctly covered examples from trainingSet
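As a rough Python sketch of this loop, with illustrative helper names (find_rule, correctly_classifies) that are not from the paper:

    # Hypothetical sketch of AntMiner's top-level loop; max_uncovered
    # is the "training set still too big" threshold.
    def ant_miner(training_set, max_uncovered=10):
        discovered_rules = []
        uncovered = list(training_set)
        while len(uncovered) > max_uncovered:
            rule = find_rule(uncovered)        # ACO search for one rule
            discovered_rules.append(rule)
            # Drop the cases the new rule classifies correctly.
            uncovered = [case for case in uncovered
                         if not correctly_classifies(rule, case)]
        return discovered_rules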

AntMiner Algorithm: Mid-Level

  • Pseudo-Code for finding one rule:

Repeat
    Start new ant with empty rule (antecedent)
    Construct rule by adding one term at a time,
        choosing the rule consequent afterwards
    Prune rule
    Increase pheromone on the trail the ant used,
        according to the quality of the rule
Until (maximum number z of ants exceeded)
   or (no improvement any more during the last k iterations)

  • Effectively only a population of one ant is working at a time.
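A matching Python sketch of the per-rule search; construct_rule, prune, rule_quality, update_pheromone, and all_terms are assumed helpers standing in for the steps above:

    # Hypothetical sketch of finding one rule; z = max_ants,
    # k = convergence (iterations without improvement).
    def find_rule(cases, max_ants=3000, convergence=10):
        terms = all_terms(cases)              # every (attribute, value) pair
        pheromone = init_pheromone(terms)     # equal initial levels
        best_rule, best_q, stagnant = None, -1.0, 0
        for _ in range(max_ants):
            rule = construct_rule(cases, pheromone)  # one term at a time
            rule = prune(rule, cases)
            q = rule_quality(rule, cases)
            update_pheromone(pheromone, rule.terms, q)
            if q > best_q:
                best_rule, best_q, stagnant = rule, q, 0
            else:
                stagnant += 1
            if stagnant >= convergence:       # no improvement for k ants
                break
        return best_rule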

AntMiner Algorithm: Bottom-Level

  • Repeat as long as possible:

– Add one condition to the rule.

  • Use a probabilistic approach based on pheromone concentration and the heuristic.

  • Do not use attributes twice.
  • The resulting rule must cover at least a minimum number of cases.
  • After having finished the antecedent, calculate the resulting class.


Rule Construction

  • Probability of adding <Ai> = <Vij>:

    P_ij(t) = η_ij · τ_ij(t) / Σ_i Σ_j (η_ij · τ_ij(t))

  • where

– Ai is the i-th attribute
– Vij is the j-th possible value of the i-th attribute
– η is the heuristic function, τ the pheromone trail
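A small Python sketch of this transition rule, with eta and tau as dicts keyed by (attribute, value) terms; all names are illustrative:

    import random

    # Pick the next term with probability proportional to eta * tau,
    # skipping attributes the partial rule already uses.
    def choose_term(terms, eta, tau, used_attributes):
        candidates = [t for t in terms if t[0] not in used_attributes]
        weights = [eta[t] * tau[t] for t in candidates]
        return random.choices(candidates, weights=weights, k=1)[0]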

Heuristic Function (η)

  • Analogous to:

– Proximity function in TSP
– Colouring matrix in the graph colouring problem

  • Uses information theory (entropy).

– Split the instances using the rule.
– Quality corresponds to the entropy of the remaining “buckets”; the less, the better.

    H(W | A_i = V_ij) = − Σ_{w=1}^{k} P(w | A_i = V_ij) · log2 P(w | A_i = V_ij)

  where k is the number of classes, and

    η_ij = (log2 k − H(W | A_i = V_ij)) / Σ_i Σ_j (log2 k − H(W | A_i = V_ij))

Information Heuristic Example

For T (temperature): high = T>80, mild = 70<T≤80, cold = 0<T≤70 (for later).
P(play | outlook=sunny) = 2/14 = 0.143, P(don't play | outlook=sunny) = 3/14 = 0.214
H(W, outlook=sunny) = −0.143·log2(0.143) − 0.214·log2(0.214) = 0.877
η = log2 k − H(W, outlook=sunny) = 1 − 0.877 = 0.123


Information Heuristic Example (continued)

For H (humidity): high = H>85, normal = 0<H≤85 (for later).
P(play | outlook=overcast) = 4/14 = 0.286, P(don't play | outlook=overcast) = 0/14 = 0
H(W, outlook=overcast) = −0.286·log2(0.286) = 0.516
η = log2 k − H(W, outlook=overcast) = 1 − 0.516 = 0.484
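A Python sketch of the term heuristic; it assumes the class probabilities are taken over the cases that match the term, while the worked examples above divide by the whole training set, so the normalization detail should be checked against [1]:

    import math

    # Entropy-based heuristic for one term A_i = V_ij. 'labels' are the
    # class labels of the matching cases; k is the number of classes.
    def term_heuristic(labels, k):
        n = len(labels)
        entropy = 0.0
        for c in set(labels):
            p = labels.count(c) / n
            entropy -= p * math.log2(p)     # H(W | A_i = V_ij)
        return math.log2(k) - entropy       # larger = better term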

Quality Function

  • Measuring the classification quality of a rule / several rules:

– For one rule: sensitivity · specificity

    Q = (TP / (TP + FN)) · (TN / (FP + TN))

  where T=true, F=false, P=positive, N=negative.
– The bigger the value of Q, the better.

  • Measuring the simplicity of a rule set:

– number of rules · average number of terms per rule
– The less, the simpler, thus the better.
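A sketch in Python, with cases as (features, label) pairs and an assumed covers() predicate testing whether the antecedent matches:

    # Q = sensitivity * specificity for a single rule.
    def rule_quality(rule, cases):
        tp = fp = tn = fn = 0
        for features, label in cases:
            predicted = covers(rule, features)     # antecedent matches?
            actual = (label == rule.consequent)
            if predicted and actual:
                tp += 1
            elif predicted:
                fp += 1
            elif actual:
                fn += 1
            else:
                tn += 1
        sens = tp / (tp + fn) if tp + fn else 0.0  # TP / (TP + FN)
        spec = tn / (fp + tn) if fp + tn else 0.0  # TN / (FP + TN)
        return sens * spec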

Rule Pruning

  • Iteratively remove one term at a time from the rule while this process improves the classification accuracy of the rule.

– The majority class might change.
– If ambiguous, remove the term that improves the accuracy the most.
– Simplicity improves anyway.
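A greedy sketch of this pruning loop; rule.without() and majority_class() are assumed helpers:

    # Repeatedly drop the single term whose removal improves Q the most,
    # re-deriving the consequent (majority class) after each removal.
    def prune(rule, cases):
        while len(rule.terms) > 1:
            base_q = rule_quality(rule, cases)
            scored = []
            for term in rule.terms:
                cand = rule.without(term)          # copy minus one term
                cand.consequent = majority_class(cand, cases)
                scored.append((rule_quality(cand, cases), cand))
            best_q, best = max(scored, key=lambda s: s[0])
            if best_q <= base_q:                   # no removal helps
                break
            rule = best
        return rule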


Pheromone

  • Initial pheromone value:

    τ_ij(t=0) = 1 / (Σ_{i=1}^{a} b_i)

  where a is the total number of attributes and b_i is the number of possible values of attribute A_i.

Pheromone Updating (τ)

  • Values before the update (1).
  • First increase the pheromone of used terms according to rule quality (2):

    τ_ij(t+1) = τ_ij(t) · (1 + Q)

  • Then normalize the pheromone level of all terms → pheromone evaporation (3).
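Both steps as a Python sketch over a dict of pheromone levels keyed by (attribute, value) terms; names are illustrative:

    # Initialization: equal level 1 / (total number of terms),
    # i.e. 1 / (sum of b_i over all attributes).
    def init_pheromone(terms):
        tau0 = 1.0 / len(terms)
        return {term: tau0 for term in terms}

    # Update: reinforce the used terms by rule quality Q, then divide
    # every entry by the new total (normalization = evaporation).
    def update_pheromone(tau, used_terms, q):
        for term in used_terms:
            tau[term] *= 1.0 + q
        total = sum(tau.values())
        for term in tau:
            tau[term] /= total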

Using the Discovered Rules

  • Apply the rules in the order they were discovered.
  • The first rule that covers a case is applied.
  • If no rule covers the case, apply the default result (majority value).
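A sketch of rule application, with a case's features as a dict and rule.terms as (attribute, value) pairs:

    # First matching rule fires; the default (majority) class is the
    # fallback when no rule covers the case.
    def classify(decision_list, features, default_class):
        for rule in decision_list:
            if all(features.get(a) == v for a, v in rule.terms):
                return rule.consequent
        return default_class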

Possible Discretization of Continuous Attributes

  • Use C4.5-Disc
  • Quick overview:

– Extract a reduced data set that contains only the attribute to discretize and the desired classification.
– From that, build a decision tree using the C4.5 algorithm (another rule induction algorithm).
– Result: a decision tree with binary decisions (x ≤ a → go left; x > a → go right).
– Each path corresponds to the definition of a categorical interval.

AntMiner’s Parameters

  • Number of ants (3000 used in the experiments). Also limits the maximum number of rules found for a classification. Not necessarily exhausted, because the algorithm might converge earlier.
  • Minimum number of cases per rule (10). Each rule must cover at least this many cases. Avoids overfitting.
  • Maximum number of uncovered cases in the training set (10). The algorithm stops when fewer instances than this remain uncovered.
  • Number of rules to test for the convergence of the ants (10). The algorithm waits this long for an improvement.

Sample Run Start

  • Deciding whether to play outside

– Attributes: outlook, temperature, humidity, windy, play
– Classes: play (yes), do not play (no)
– sunny,hot,high,FALSE,no (1)
– sunny,hot,high,TRUE,no (2)
– overcast,hot,normal,FALSE,yes (3)
– rainy,mild,high,FALSE,yes (4)
– rainy,cool,normal,FALSE,yes (5)
– rainy,cool,normal,TRUE,no (6)
– overcast,cool,normal,TRUE,yes (7)
– sunny,mild,high,FALSE,no (8)
– sunny,cool,normal,FALSE,yes (9)
– rainy,mild,normal,FALSE,yes (10)
– sunny,mild,normal,TRUE,yes (11)
– overcast,mild,high,TRUE,yes (12)
– overcast,hot,normal,FALSE,yes (13)
– rainy,mild,high,TRUE,no (14)

  • Sample run for finding one rule set.
  • Start: I = {all}, R = {}
  • Ant 1: probabilistically chooses outlook=overcast (then play=yes).
  • Ant 1: chooses values for the other attributes…
  • Ant 1: finishes because all attributes are used.
  • Ant 1: the last three conditions are pruned away.
  • I = {1,2,4,5,6,8,9,10,11,14}, R = {outlook=overcast → yes}
  • Ant 2: chooses outlook=rainy (then play=yes).
  • The rule is not good enough (3:2).
  • Ant 2: chooses windy=true (then play=no).
  • Ant 2 finishes because otherwise the covered set would be too small.
  • No pruning is possible either.

Sample Run Result

  • Possible result (not the simplest):

– outlook=overcast → play=yes
– outlook=rainy, windy=false → play=yes
– outlook=sunny, humidity=normal → play=yes
– otherwise → play=no

Comparison to CN2 Algorithm

  • Uses beam search (limited breadth-first search with beam width b).
  • Adds all possible terms to the current partial rules, evaluates them, and retains only the b best ones.

  • No feedback for constructing new rules.
  • Output format is the same (ordered rule list).
  • Uses entropy heuristic as well.

Experiment Setup

  • Data set dimensions, roughly: 100…1000 cases, 9…34 attributes, 2…6 classes

  • Tests were run using a 10-fold cross-validation procedure:

– Divide the data into 10 partitions.
– For each partition:
  • Treat it as the test data and use the other 90% as the training data.
  • Measure the performance.
– Take the average value.

  • This helps to achieve statistically significant results (a sketch of the procedure follows).
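A plain Python sketch of the procedure, where train_and_score is an assumed callback that trains on one split and returns a performance score:

    import random

    def cross_validate(data, train_and_score, folds=10):
        data = data[:]                      # shuffle a copy
        random.shuffle(data)
        scores = []
        for f in range(folds):
            test = data[f::folds]           # roughly 10% of the cases
            train = [c for i, c in enumerate(data) if i % folds != f]
            scores.append(train_and_score(train, test))
        return sum(scores) / folds          # average performance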

Data Sets

Performance Results

  • No particular parameter optimization for either algorithm.
  • Same computation time.

Extensions to the Algorithm

  • By Galea [3].
  • Deterministic rule with probability q, as in ACS-TSP:

– With probability q, choose probabilistically (considering pheromone trail and heuristic function).
– Otherwise, deterministically choose the term with the maximum probability.
– Improves the results slightly.

  • Extension for fuzzy rules also possible.

Comparative Results

Side-by-side Comparison

Effects of Rule Pruning


Generated Rules

Terms per Rule

Algorithm Complexity

  • Introducing a number of variables:

– n: number of cases
– a: number of attributes
– v: number of values per attribute; considered small, O(1)
– k: number of conditions per inspected rule while evaluating and pruning
– z: number of ants
– r: number of discovered rules


Complexity Comparison

  • Ant-Miner, average case: O(r·z·[k·a + n·k³] + a·n)
  • Ant-Miner, worst case, k = O(a): O(r·z·a³·n)
  • CN2: O(a·(n + log a))

Further Experiments

  • Further experiments by the authors of AntMiner show that ACO really helps:

– The use of pheromone trails improves the average solution.
– The use of rule pruning improves simplicity without harming quality.

References

  • [1] Data Mining with an Ant Colony Optimization Algorithm. Parpinelli, Lopes, Freitas.
  • [2] An Ant Colony Based System for Data Mining: Applications to Medical Data. Parpinelli, Lopes, Freitas, 2001.
  • [3] Applying Swarm Intelligence to Rule Induction. Michelle Galea, 2002.
  • [4] The CN2 Induction Algorithm. Clark, Niblett, 1988.
  • [5] Data Mining. Adriaans, Zantinge. Addison-Wesley, 1996.
  • [6] Learning Fuzzy Rules Using Ant Colony Optimization Algorithms. Casillas, Cordón, Herrera, 2000.
  • [7] Bryan Atkinson, Honours Project Report: http://www.scs.carleton.ca/~arpwhite/documents/honoursProjects/bryan-atkinson-winter-2006.pdf


Ant-based Programming

  • Genetic Programming has been successful at inducing program descriptions.
  • Problems with scaling:

– Diversity
– Retaining useful fragments:
  • Avoiding disruption of higher-order functions

  • Can ACO help?

– Maybe: learn useful associations, avoid disruption.

Genetic Programming

  • Programs represented in tree structure
  • Learning through:

– Population-based, evolutionary search
– Genetic operators: crossover, mutation

  • Requires specification of:

– Functions (F): internal nodes
– Terminals (T): leaf nodes

  • Symbolic Regression:

– F = {+, -, /, *, sin, cos, exp}
– T = {integers in range (-5, 5), X}

Symbolic Regression

Find the function that best fits a number of sample points. Goodness of fit is determined by hits: cases where the candidate function is within a threshold distance.

    f(k) = h(k) − (1 / max(h(k), 1)) · Σ_{i=1}^{size(D)} e(k,i)

    e(k,i) = | v(k, x(i)) − y(i) |

    v(k,x) = value of the k-th program for input x

    h(k) = Σ_{i=1}^{size(D)} hits(k,i)

    hits(k,i) = 0 if e(k,i) ≥ 1, 1 otherwise
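A Python sketch of this fitness as reconstructed above; program is a callable candidate, D a list of (x, y) samples, and the hit threshold of 1 follows the formula as read here:

    # f(k) = h(k) - (1 / max(h(k), 1)) * sum of errors
    def fitness(program, D, threshold=1.0):
        errors = [abs(program(x) - y) for x, y in D]      # e(k, i)
        hits = sum(1 for e in errors if e < threshold)    # h(k)
        return hits - sum(errors) / max(hits, 1)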

Symbolic Regression Example

3x + sin(x)

Shown mathematically and as a GP tree (figure not reproduced).

Crossover

Problem: crossover can easily disrupt useful couplings such as *-X (tree diagrams not reproduced).

Adapting Crossover with ACO

  • Use context-aware crossover.
  • Basic crossover chooses a node randomly, i.e. context-unaware.
  • Adapt crossover to remember useful function couplings.

– Not automatically defined functions (ADFs)


Function Coupling Matrix (C)

Function   +     *     sin   cos   X
+          0.1   0.1   1.0   1.0   0.1
*          0.1   0.1   0.1   0.1   1.0
sin        1.0   1.0   0.1   1.0   0.1
cos        0.1   1.0   0.1   0.1   0.1

Important couplings have high values; e.g. sin-x

Swarm-based GP (SB-GP)

Three modifications to GP:

1. Initialization of the coupling matrix C.
2. Crossover using the coupling matrix.
3. Pheromone update based upon program fitness.

Pheromone Initialization

  • For all function and terminal couplings (i, j):

– Initialize the pheromone τ_i,j to the initial value τ0.

  • τ0 is a system parameter.

Ant Colony (AC) Crossover

Choose a random branch B from the root to a leaf in program tree P_n.
For every edge (i, j) in B, the probability of choosing node i (the parent of child node j) as the root of subtree S_n is:

    p(i, n) = (τ_max(n) − τ_min(n) + τ_i,j(n)) / T(n)

Choose a random branch B from the root to a leaf in program tree P_m.
For every edge (i, j) in B, the probability of choosing node i as the root of subtree S_m is:

    p(i, m) = (τ_max(m) − τ_min(m) + τ_i,j(m)) / T(m)

where T(k) is given by:

AC Crossover Continued

    T(k) = Σ_{(i,j)∈E(k)} (τ_max(k) − τ_min(k) + τ_i,j(k))

and

    τ_i,j(k) = C(V(k,i), V(k,j))
    τ_max(k) = max_{(i,j)∈E(k)} τ_i,j(k)
    τ_min(k) = min_{(i,j)∈E(k)} τ_i,j(k)
    E(k) = { edges in the k-th program subtree }
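A Python sketch of the per-branch choice; branch_edges are the (parent, child) pairs on B, and tau(i, j) looks up the coupling-matrix entry C(V(k,i), V(k,j)). As a simplification, this sketch takes τ_max/τ_min over the branch rather than over E(k):

    import random

    def choose_subtree_root(branch_edges, tau):
        values = [tau(i, j) for i, j in branch_edges]
        t_max, t_min = max(values), min(values)
        weights = [t_max - t_min + v for v in values]   # as in p(i, n)
        parent, _child = random.choices(branch_edges,
                                        weights=weights, k=1)[0]
        return parent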

AC Crossover Example


Experimental Parameters

Parameter                             Value
Initial Pheromone                     10^-5
Evaporation rate p                    0.9
Best k programs used for evaluation   30
Max Program Depth                     15
Min Program Depth                     4
Tournament Size                       7
Crossover probability                 0.9
Mutation Probability                  0.01
Number of Generations (default)       50

Functions and Results

F1: cos(X^2) + sin(X^2) + X^2
F2: cos(X^2) + sin(X^2) + X^2 + cos(X) + sin(X)
F3: sin(X)·X^4 + sin(X)·X^3 + sin(X)·X^2 + sin(X)·X

Test   GP Mean   GP STD   SB-GP Mean   SB-GP STD   P Value   Population Size
F1     6.6       2.01     4.35         0.75        0.0001    1500
F1     24.73     18.9     6.18         2.89        0.0043    500
F1     37.7      19.8     24.4         22.1        0.1745    200
F2     27.8      19.6     11.6         4.69        0.0043    800
F3     43.1      14.5     28.2         16.8        0.0488    500

F3: Function Couplings


Conclusions

  • Statistically significant improvement in performance

  • Useful couplings learnt
  • Number of successful trials increased
  • Couplings can saturate:

– Use an ACS-style q mechanism to choose randomly some of the time.