Data Mining using Ant Colony Optimization

  1. Data Mining using Ant Colony Optimization
     Thanks to: Johannes Singler, Bryan Atkinson

     Presentation Outline
     • Introduction to Data Mining
     • Rule Induction for Classification
     • AntMiner
       – Overview: Input/Output
       – Rule Construction
       – Quality Measurement
       – Pheromone: Initial/Updating
       – Experiments/Results
       – Performance/Complexity
     • Swarm-based Genetic Programming
       – Introduction to GP, Symbolic Regression
       – Crossover problems
       – Ant Colony Crossover
       – Experiments and Results

     Introduction
     • Data Mining tries to find
       – hidden knowledge,
       – unexpected patterns, and
       – new rules
       in large databases.
     • "Discovery of useful summaries of data"
     • It is a key element of a much more elaborate process: Knowledge Discovery in Databases (KDD)

  2. Goals of Rule Induction
     • Stage of Data Mining: Rule Induction
     • Find rules that describe the data in some way
       – not only accurately…
       – …but also comprehensibly for a human user…
       – …to support decision making

     Focus in this Talk
     • Rule Induction for Classification using ACO
       – Given: a training set (instances/cases to classify)
       – Goal: come up with (preferably simple) rules to classify the data
     • Algorithm by Parpinelli, Lopes and Freitas: AntMiner
     • ACO + Genetic Programming
       – Symbolic regression

     Rule Induction
     • Possible outputs of Rule Induction:
       – decision trees
       – (ordered) decision lists, e.g.
           if <attribute1>=<value1> and <attribute2>=<value2> and …
           then <class>=<class1>
           else if …
       – …

  3. AntMiner Input
     • Training set / test set
     • Attribute/value pairs
     • Given classes / classification

     AntMiner Output
     • Ordered decision list
       – An ordered list of IF-THEN rules of the form IF <condition> THEN <class>
         • <condition> = <term1> AND <term2> AND …
         • <term> = <attribute> '=' <value>
       – Plus a default rule (majority class)
       – The first rule that matches "fires".
     • Only discrete attributes are supported so far.
       – Continuous values must be discretized beforehand.
     • This is a rather limited form of a decision list (a small code sketch follows after this slide group).

     Prerequisites for an ACO (Review)
     • A problem-dependent heuristic function (η) that measures the quality of items that could be added to the partial solution built so far.
     • A pheromone updating rule (τ).
     • A probabilistic transition rule based on η and τ.
     • Difference to most ACO algorithms mentioned in class: AntMiner does not use a graph representation of the problem.
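To make the output format concrete, here is a minimal Python sketch of such an ordered decision list. The names (Rule, covers, classify) are illustrative, not part of AntMiner itself.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    """One IF-THEN rule: a conjunction of <attribute>=<value> terms plus a class."""
    terms: dict            # antecedent, e.g. {"outlook": "overcast"}
    predicted_class: str   # consequent

    def covers(self, case: dict) -> bool:
        # A case is covered when every term of the antecedent matches it.
        return all(case.get(attr) == val for attr, val in self.terms.items())

def classify(decision_list, default_class, case):
    """Apply the rules in discovery order; the first covering rule fires."""
    for rule in decision_list:
        if rule.covers(case):
            return rule.predicted_class
    return default_class   # default rule: majority class

# Example with the weather data used later in the talk:
rules = [Rule({"outlook": "overcast"}, "yes")]
print(classify(rules, "no", {"outlook": "overcast", "windy": "FALSE"}))  # -> yes
```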

  4. AntMiner Algorithm: Top Level
     • Pseudo-code for finding one rule set (a Python sketch follows after these slides):

         trainingSet = {all training cases}
         discoveredRuleList = []
         WHILE (|trainingSet| still too big)
           Initialize pheromone (equally distributed)
           Ants try to find a good classification rule using the ACO heuristic
           Add the best rule found to discoveredRuleList
           Remove the correctly covered examples from trainingSet

     AntMiner Algorithm: Mid Level
     • Pseudo-code for finding one rule:

         REPEAT
           Start a new ant with an empty rule (antecedent)
           Construct the rule by adding one term at a time,
             then choose the rule consequent
           Prune the rule
           Increase the pheromone on the trail the ant used,
             according to the quality of the rule
         UNTIL (maximum number z of ants exceeded)
            OR (no improvement during the last k iterations)

     • In effect, the population consists of only one ant working at a time.

     AntMiner Algorithm: Bottom Level
     • Repeat as long as possible:
       – Add one term (condition) to the rule.
         • Choose it probabilistically, based on the pheromone concentration and the heuristic.
         • Never use an attribute twice.
         • The resulting rule must still cover at least a minimum number of cases.
     • After the antecedent is finished, determine the resulting class (consequent).
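A hedged Python sketch of the top-level loop follows. The helper functions (init_pheromone, construct_rule, prune_rule, rule_quality, update_pheromone, correctly_covered) are assumed here; some of them are sketched later in this document, the rest stand in for the steps named in the pseudo-code.

```python
def ant_miner(training_set, attribute_values,
              max_uncovered=10, n_ants=3000, n_rules_converg=10):
    """Sketch of AntMiner's outer loop; helper functions are assumed, not given."""
    discovered_rules = []
    while len(training_set) > max_uncovered:
        pheromone = init_pheromone(attribute_values)        # uniform start value
        best_rule, best_quality, stagnation = None, -1.0, 0
        for _ in range(n_ants):
            rule = construct_rule(training_set, pheromone)   # one ant builds a rule
            rule = prune_rule(rule, training_set)
            q = rule_quality(rule, training_set)             # sensitivity * specificity
            update_pheromone(pheromone, rule.terms.items(), q)  # reinforce, then normalize
            if q > best_quality:
                best_rule, best_quality, stagnation = rule, q, 0
            else:
                stagnation += 1
            if stagnation >= n_rules_converg:                # ants have converged
                break
        discovered_rules.append(best_rule)
        training_set = [c for c in training_set              # keep only cases the new
                        if not correctly_covered(best_rule, c)]  # rule does not explain
    return discovered_rules
```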

  5. Rule Construction
     • Probability of adding the term <A_i> = <V_ij> to the current rule:

         P_{ij}(t) = \frac{\eta_{ij}\,\tau_{ij}(t)}{\sum_{i}\sum_{j} \eta_{ij}\,\tau_{ij}(t)}

       (normalized over all terms the ant may still use), where
       – A_i is the i-th attribute,
       – V_ij is the j-th possible value of the i-th attribute,
       – η is the heuristic function and τ is the pheromone trail.

     Heuristic Function (η)
     • Analogous to:
       – the proximity function in the TSP,
       – the colouring matrix in the graph colouring problem.
     • Uses information theory (entropy):
       – Split the instances using the term.
       – Quality corresponds to the entropy of the remaining "buckets"; the less, the better.

         H(W \mid A_i = V_{ij}) = -\sum_{w=1}^{k} P(w \mid A_i = V_{ij}) \,\log_2 P(w \mid A_i = V_{ij})

         \eta_{ij} = \frac{\log_2 k - H(W \mid A_i = V_{ij})}{\sum_{i}\sum_{j} \left( \log_2 k - H(W \mid A_i = V_{ij}) \right)}

       where k is the number of classes.

     Information Heuristic Example
     (Temperature is discretized for later use: high = T > 80, mild = 70 < T ≤ 80, cool = 0 < T ≤ 70.)
     Of the 5 cases with outlook=sunny, 2 are "play" and 3 are "don't play":
       P(play | outlook=sunny) = 2/5 = 0.4,  P(don't play | outlook=sunny) = 3/5 = 0.6
       H(W | outlook=sunny) = −0.4·log2(0.4) − 0.6·log2(0.6) ≈ 0.971
       η = log2 k − H(W | outlook=sunny) = 1 − 0.971 ≈ 0.029  (before normalization)
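A small Python sketch of this entropy-based heuristic; the function name and the case representation (dicts with a "class" key) are assumptions for illustration.

```python
import math
from collections import Counter

def heuristic(cases, attribute, value, num_classes):
    """log2(k) - H(W | attribute = value), before normalization (sketch)."""
    covered = [c for c in cases if c[attribute] == value]
    if not covered:
        return 0.0
    counts = Counter(c["class"] for c in covered)
    entropy = -sum((n / len(covered)) * math.log2(n / len(covered))
                   for n in counts.values())
    return math.log2(num_classes) - entropy

# Weather data, outlook=sunny: 2 of the 5 covered cases are "yes", 3 are "no",
# so H ≈ 0.971 and the heuristic value is 1 - 0.971 ≈ 0.029.
```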

  6. Information Heuristic Example
     (Humidity is discretized for later use: high = H > 85, normal = 0 < H ≤ 85.)
     All 4 cases with outlook=overcast belong to "play":
       P(play | outlook=overcast) = 4/4 = 1,  P(don't play | outlook=overcast) = 0/4 = 0
       H(W | outlook=overcast) = −1·log2(1) = 0
       η = log2 k − H(W | outlook=overcast) = 1 − 0 = 1  (before normalization)

     Quality Function
     • Measures the classification quality of a rule (or of several rules).
       – For one rule: sensitivity · specificity (a code sketch follows after this slide group):

           Q = \frac{TP}{TP + FN} \cdot \frac{TN}{FP + TN}

         where T = true, F = false, P = positive, N = negative.
       – The bigger the value of Q, the better.
     • Measuring the simplicity of a rule list:
       – number of rules · average number of terms per rule
       – The smaller this value, the simpler, and thus the better.

     Rule Pruning
     • Iteratively remove one term at a time from the rule as long as this improves the classification accuracy of the rule.
       – The majority class (consequent) might change as a result.
       – If several removals are possible, remove the term that improves the accuracy the most.
       – Simplicity improves in any case.
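Returning to the quality function, here is a minimal Python sketch, assuming the Rule representation sketched earlier.

```python
def rule_quality(rule, cases):
    """Sensitivity * specificity of a rule over a case set (sketch)."""
    tp = fp = tn = fn = 0
    for c in cases:
        covered = rule.covers(c)
        same_class = (c["class"] == rule.predicted_class)
        if covered and same_class:
            tp += 1            # covered and correctly classified
        elif covered:
            fp += 1            # covered but wrong class
        elif same_class:
            fn += 1            # belongs to the class but not covered
        else:
            tn += 1            # neither covered nor of the class
    if tp + fn == 0 or fp + tn == 0:
        return 0.0             # degenerate case: avoid division by zero
    return (tp / (tp + fn)) * (tn / (fp + tn))
```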

  7. Pheromone
     • Initial pheromone value (uniform over all terms):

         \tau_{ij}(t = 0) = \frac{1}{\sum_{i=1}^{a} b_i}

       where a is the total number of attributes and b_i is the number of possible values of A_i.

     Pheromone Updating (τ)
     • (1) Start from the current pheromone values.
     • (2) First increase the pheromone of the terms used by the ant, according to the rule quality (a code sketch follows after this slide group):

         \tau_{ij}(t + 1) = \tau_{ij}(t)\,(1 + Q)

     • (3) Then normalize the pheromone levels of all terms → this acts as pheromone evaporation.

     Using the Discovered Rules
     • Apply the rules in the order they were discovered.
     • The first rule that covers a case is applied.
     • If no rule covers the case, apply the default rule (majority class).
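A Python sketch of both pheromone steps; the data structure (a dict mapping attribute/value pairs to pheromone levels) is an assumption made for illustration.

```python
def init_pheromone(attribute_values):
    """Uniform start value: 1 / (total number of attribute-value terms)."""
    total_terms = sum(len(vals) for vals in attribute_values.values())
    return {(attr, val): 1.0 / total_terms
            for attr, vals in attribute_values.items()
            for val in vals}

def update_pheromone(pheromone, used_terms, quality):
    """Reinforce the terms the ant used, then normalize (evaporation)."""
    for term in used_terms:
        pheromone[term] *= (1.0 + quality)
    total = sum(pheromone.values())
    for term in pheromone:
        pheromone[term] /= total

# Example: two attributes with 3 and 2 values give 5 terms, each starting at 0.2.
ph = init_pheromone({"outlook": ["sunny", "overcast", "rainy"],
                     "windy": ["TRUE", "FALSE"]})
update_pheromone(ph, [("outlook", "overcast")], quality=0.9)
```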

  8. Possible Discretization of Continuous Attributes
     • Use C4.5-Disc. Quick overview:
       – Extract a reduced data set that contains only the attribute to be discretized and the desired classification.
       – From that, build a decision tree using the C4.5 algorithm (another rule induction algorithm).
       – Result: a decision tree with binary decisions (x ≤ a → go left; x > a → go right).
       – Each path through the tree corresponds to the definition of one categorical interval.

     AntMiner's Parameters
     • Number of ants (3000 in the experiments). This also limits the maximum number of rules found for a classification; it is not necessarily exhausted, because the algorithm may converge earlier.
     • Minimum number of cases per rule (10). Each rule must cover at least this many cases; this avoids overfitting.
     • Maximum number of uncovered cases in the training set (10). The algorithm stops once fewer cases than this remain uncovered.
     • Number of rules used to test for convergence of the ants (10). The algorithm waits this many iterations for an improvement.

     Sample Run Start
     • Task: deciding whether to play outside.
       – Attributes: outlook, temperature, humidity, windy, play
       – Classes: play (yes), do not play (no)
       – Training cases (outlook, temperature, humidity, windy, play):
           sunny,hot,high,FALSE,no (1)
           sunny,hot,high,TRUE,no (2)
           overcast,hot,normal,FALSE,yes (3)
           rainy,mild,high,FALSE,yes (4)
           rainy,cool,normal,FALSE,yes (5)
           rainy,cool,normal,TRUE,no (6)
           overcast,cool,normal,TRUE,yes (7)
           sunny,mild,high,FALSE,no (8)
           sunny,cool,normal,FALSE,yes (9)
           rainy,mild,normal,FALSE,yes (10)
           sunny,mild,normal,TRUE,yes (11)
           overcast,mild,high,TRUE,yes (12)
           overcast,hot,normal,FALSE,yes (13)
           rainy,mild,high,TRUE,no (14)
     • Sample run for finding one rule set:
       – Start: I = {all}, R = {}
       – Ant 1: probabilistically chooses outlook=overcast (then play=yes).
       – Ant 1: chooses values for the other attributes…
       – Ant 1: finishes because all attributes are used.
       – Ant 1: the last three conditions are pruned away.
       – Now I = {1,2,4,5,6,8,9,10,11,14}, R = {outlook=overcast → yes}.
       – Ant 2: chooses outlook=rainy (then play=yes).
         • The rule is not good enough (3:2).
       – Ant 2: chooses windy=true (then play=no).
       – Ant 2 finishes because otherwise the covered set would be too small.
       – No pruning is possible either.
       – …
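The "probabilistically chooses" steps above correspond to the transition rule P_ij ∝ η_ij · τ_ij from the Rule Construction slide. Below is a hedged roulette-wheel sketch in Python; all names and the dict-based term representation are illustrative assumptions.

```python
import random

def choose_term(pheromone, heuristic_values, used_attributes):
    """Pick the next <attribute>=<value> term with probability ∝ η_ij · τ_ij (sketch)."""
    candidates = [(attr, val) for (attr, val) in pheromone
                  if attr not in used_attributes]          # never reuse an attribute
    weights = [heuristic_values[term] * pheromone[term] for term in candidates]
    total = sum(weights)
    if total == 0.0:
        return random.choice(candidates)                   # fall back to a uniform pick
    threshold, acc = random.uniform(0.0, total), 0.0
    for term, weight in zip(candidates, weights):
        acc += weight
        if acc >= threshold:
            return term
    return candidates[-1]
```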

  9. Sample Run Result
     • Possible result (not the simplest one):
       – outlook=overcast → play=yes
         outlook=rainy, windy=false → play=yes
         outlook=sunny, humidity=normal → play=yes
         otherwise → play=no

     Comparison to the CN2 Algorithm
     • CN2 uses beam search (a limited breadth-first search with beam width b).
     • It adds all possible terms to the current partial rules, evaluates them, and retains only the b best ones.
     • There is no feedback for constructing new rules.
     • The output format is the same (ordered rule list).
     • It uses an entropy heuristic as well.

     Experiment Setup
     • Data set dimensions roughly: 100…1000 cases, 9…34 attributes, 2…6 classes.
     • Tests were run using a 10-fold cross-validation procedure:
       – Divide the data into 10 partitions.
       – For each partition:
         • Treat it as the test data and use the other 90% as the training data.
         • Measure the performance.
       – Take the average value.
     • This helps to achieve statistically significant results.
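A short Python sketch of the 10-fold cross-validation procedure described above; build_classifier is a placeholder for training AntMiner on the 90% split.

```python
import random

def cross_validate(cases, build_classifier, k=10):
    """k-fold cross-validation: average accuracy over k train/test splits (sketch)."""
    shuffled = cases[:]
    random.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]      # k roughly equal partitions
    accuracies = []
    for i in range(k):
        test = folds[i]
        train = [c for j, fold in enumerate(folds) if j != i for c in fold]
        classify = build_classifier(train)          # e.g. AntMiner's discovered rule list
        correct = sum(classify(c) == c["class"] for c in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k                      # average over the folds
```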
