A Distributed-Population Genetic Algorithm for Discovering Interesting Prediction Rules

  1. A Distributed-Population Genetic Algorithm for Discovering Interesting Prediction Rules
     Edgar Noda¹, Alex A. Freitas², Akebo Yamakami¹
     ¹ School of Electrical and Computer Engineering (FEEC), State University of Campinas (Unicamp), Brazil
     ² Computing Laboratory, University of Kent at Canterbury, UK

  2. Introduction
    ▪ Data Mining
      – Extraction of knowledge from data.
      – Data mining tasks:
        • Classification.
          – A single goal attribute to be predicted.
        • Dependence modeling.
          – A generalization of classification in which there is more than one possible goal attribute.
    ▪ Prediction rules have the form:
      – IF conditions on the values of predicting attributes are true THEN predict a value for some goal attribute.

  3. Discovered Knowledge
    ▪ Desirable properties:
      – In principle, three properties.
      – 1. Predictive accuracy.
        • The most emphasized property in the literature.
        • Discovered knowledge should have high predictive accuracy.
      – 2. Comprehensibility.
        • High-level rules.
        • The output of rule discovery algorithms tends to be more comprehensible than the output of other kinds of algorithms.

  4. Discovered Knowledge
    ▪ Desirable properties:
      – 3. Interestingness.
        • Discovered knowledge should be interesting to the user.
        • Of the three desirable properties, interestingness seems to be the most difficult to quantify and to achieve.
        • By "interesting" we mean that discovered knowledge should be novel or surprising to the user.
        • The notion of interestingness goes beyond the notions of predictive accuracy and comprehensibility.

  5. Motivation for using a Genetic Algorithm (GA) in rule discovery
    ▪ Genetic Algorithm
      – A GA is essentially a search algorithm inspired by the principle of natural selection.
      – In general, GAs tend to cope better with attribute interaction than greedy rule induction algorithms.
      – GAs perform a global search.
      – GAs use stochastic search operators, which makes them more robust and less sensitive to noise.
      – The execution of a GA can be regarded as a parallel search engine acting upon a population of candidate rules.

  6. Motivation for using a Genetic Algorithm (GA) in rule discovery
    ▪ Distributed Genetic Algorithm (DGA)
      – The basic idea is to partition the population into several small, semi-isolated subpopulations.
      – Each subpopulation is associated with an independent GA, possibly exploring a different promising region of the search space.
      – Occasionally, the subpopulations interact by exchanging a few individuals, simulating a seasonal migratory process.
      – The newly injected genetic material should ensure that good genetic material is shared from time to time.
      – This approach also helps reduce premature convergence and restricts the occurrence of "illegal mating".
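As a hedged illustration of the distributed-population idea on this slide (not the authors' implementation), the sketch below runs several independent subpopulations ("islands") and exchanges individuals every `migrate_every` generations. It assumes a toy bit-string problem (OneMax) instead of rule discovery, a ring migration topology and a replace-the-worst policy; the names `island_ga`, `next_generation` and `one_max` are hypothetical.

```python
import random
from typing import Callable, List

Genome = List[int]


def one_max(genome: Genome) -> int:
    """Toy fitness used only for illustration: the number of 1-bits."""
    return sum(genome)


def next_generation(pop: List[Genome], fitness: Callable[[Genome], int],
                    rng: random.Random, p_mut: float) -> List[Genome]:
    """One generation of a plain GA running on a single island."""
    def select() -> Genome:
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    children = []
    for _ in range(len(pop)):
        p1, p2 = select(), select()
        # Uniform crossover, then independent bit-flip mutation.
        child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]
        child = [1 - bit if rng.random() < p_mut else bit for bit in child]
        children.append(child)
    return children


def island_ga(n_islands: int = 4, pop_size: int = 20, length: int = 30,
              generations: int = 60, migrate_every: int = 10,
              seed: int = 0) -> List[List[Genome]]:
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
               for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        # Each island evolves independently between migrations.
        islands = [next_generation(pop, one_max, rng, 1.0 / length) for pop in islands]
        if gen % migrate_every == 0:
            # Ring topology (assumed): island i receives a copy of the best
            # individual of island i-1, which replaces island i's worst one.
            bests = [max(pop, key=one_max) for pop in islands]
            for i, pop in enumerate(islands):
                worst = min(range(len(pop)), key=lambda k: one_max(pop[k]))
                pop[worst] = list(bests[(i - 1) % n_islands])
    return islands
```

Keeping migration rare (large `migrate_every`) preserves the semi-isolation described above, while each exchange still lets good material spread between subpopulations.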

  7. GA-Nuggets
    ▪ Overview
      – Designed for the dependence modeling task.
      – Individual encoding:
        • Genotype: fixed-length individual.
        • Phenotype: rules with a variable number of attributes.
      – Fitness function.
        • Two parts:
          – Degree of interestingness.
            » Objective (information-theoretic) measure.
            » Antecedent and consequent interestingness.
          – Predictive accuracy.
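The fixed-length genotype versus variable-length phenotype distinction can be sketched as follows. This assumes a hypothetical encoding with one gene per attribute, each carrying an `active` flag and a value; only active genes become rule conditions, and the goal attribute is excluded from the antecedent. The classes `Gene` and `Individual` and the helper functions are illustrative, not the GA-Nuggets data structures.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Gene:
    active: bool  # hypothetical flag: does this attribute appear in the antecedent?
    value: str    # value tested when the gene is active


@dataclass
class Individual:
    genes: List[Gene]  # fixed length: one gene per attribute in the data set
    goal_attr: str     # attribute predicted by the consequent
    goal_value: str    # predicted value


def antecedent(ind: Individual, attr_names: List[str]) -> Dict[str, str]:
    """Decode the fixed-length genotype into a variable number of conditions.

    Inactive genes, and the gene of the goal attribute itself, are skipped.
    """
    return {name: g.value for name, g in zip(attr_names, ind.genes)
            if g.active and name != ind.goal_attr}


def rule_text(ind: Individual, attr_names: List[str]) -> str:
    """Render the phenotype as an IF-THEN prediction rule."""
    conds = antecedent(ind, attr_names)
    lhs = " AND ".join(f"{a} = {v}" for a, v in conds.items()) or "TRUE"
    return f"IF {lhs} THEN {ind.goal_attr} = {ind.goal_value}"


def covers(ind: Individual, attr_names: List[str], example: Dict[str, str]) -> bool:
    """True if every condition in the antecedent matches the example."""
    return all(example.get(a) == v for a, v in antecedent(ind, attr_names).items())
```

For example, with attributes `["hair", "feathers", "legs", "type"]` (illustrative values) and genes active at positions 0, 2 and 3, an individual predicting `type = mammal` decodes to `IF hair = yes AND legs = 4 THEN type = mammal`: the genotype still has four genes, but the rule has two conditions.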

  8. GA-Nuggets
    ▪ The fitness function:
      $$\text{Fitness} = \frac{w_1 \cdot \frac{\text{AntInt} + \text{ConsInt}}{2} + w_2 \cdot \text{PredAcc}}{w_1 + w_2}$$
      – AntInt: antecedent degree of interestingness.
      – ConsInt: consequent degree of interestingness.
      – PredAcc: predictive accuracy.
      – $w_1$ and $w_2$ are user-defined weights.
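The fitness on this slide is a weighted average: the mean of the antecedent and consequent interestingness is weighted by w1, predictive accuracy by w2, and the sum is normalized by w1 + w2. A minimal sketch, assuming AntInt, ConsInt and PredAcc are already computed and lie in [0, 1]; the function name `ga_nuggets_fitness` is hypothetical.

```python
def ga_nuggets_fitness(ant_int: float, cons_int: float, pred_acc: float,
                       w1: float, w2: float) -> float:
    """Weighted average of rule interestingness and predictive accuracy.

    Assumes ant_int, cons_int and pred_acc are normalized to [0, 1], so the
    result also lies in [0, 1].
    """
    if w1 < 0 or w2 < 0 or w1 + w2 == 0:
        raise ValueError("weights must be non-negative and not both zero")
    interestingness = (ant_int + cons_int) / 2.0  # mean of the two parts
    return (w1 * interestingness + w2 * pred_acc) / (w1 + w2)
```

With AntInt = 0.6, ConsInt = 0.4, PredAcc = 0.8, w1 = 1 and w2 = 2, the interestingness term is 0.5 and the fitness is (0.5 + 1.6) / 3 = 0.7; raising w2 relative to w1 shifts the search toward accurate but possibly less surprising rules.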

  9. GA-Nuggets
    ▪ Selection method:
      – Tournament selection (tournament size 2).
    ▪ Genetic operators:
      – Uniform crossover.
      – Mutation.
      – Condition insertion / removal operators.
        • Control the size of the discovered prediction rules.
      – Consequent formation.
      – All operators guarantee that the genetic material remains valid.
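A hedged sketch of the selection method and the antecedent-related operators listed on this slide, assuming a hypothetical genome of `(active, value_index)` pairs, one per attribute. Tournament size 2 and uniform crossover follow the slide; the specific mutation rule (resample a value index inside the attribute's domain) and the rule that removal never empties the antecedent are assumptions chosen so that every operator keeps the genome valid. Consequent formation is not shown.

```python
import random
from typing import Callable, List, Sequence, Tuple

Gene = Tuple[bool, int]  # hypothetical encoding: (active flag, index into the attribute's domain)
Genome = List[Gene]


def tournament(pop: Sequence[Genome], fitness: Callable[[Genome], float],
               rng: random.Random, size: int = 2) -> Genome:
    """Tournament selection: the fittest of `size` randomly drawn individuals wins."""
    return max(rng.sample(list(pop), size), key=fitness)


def uniform_crossover(a: Genome, b: Genome, rng: random.Random) -> Tuple[Genome, Genome]:
    """Swap each gene position between the two parents with probability 0.5."""
    c1, c2 = list(a), list(b)
    for i in range(len(a)):
        if rng.random() < 0.5:
            c1[i], c2[i] = b[i], a[i]
    return c1, c2


def mutate(g: Genome, domain_sizes: Sequence[int], rng: random.Random, p: float) -> Genome:
    """Resample a value index with probability p; values stay inside their domain."""
    return [(act, rng.randrange(n)) if rng.random() < p else (act, val)
            for (act, val), n in zip(g, domain_sizes)]


def insert_condition(g: Genome, rng: random.Random) -> Genome:
    """Switch one inactive gene on, making the rule one condition longer."""
    off = [i for i, (act, _) in enumerate(g) if not act]
    child = list(g)
    if off:
        i = rng.choice(off)
        child[i] = (True, child[i][1])
    return child


def remove_condition(g: Genome, rng: random.Random) -> Genome:
    """Switch one active gene off, but never leave the antecedent empty."""
    on = [i for i, (act, _) in enumerate(g) if act]
    child = list(g)
    if len(on) > 1:
        i = rng.choice(on)
        child[i] = (False, child[i][1])
    return child
```

Because crossover only swaps whole genes by position and mutation draws from each attribute's own domain, offspring never contain out-of-range values; the insertion and removal operators are the only ones that change the number of rule conditions.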

  10. DGA-Nuggets
    ▪ Fitness, selection and genetic operators:
      – The same as in the single-population version.
    ▪ Subpopulations:
      – Each subpopulation has its own fitness function (each searches for rules predicting a different goal attribute).
      – Number of subpopulations = number of possible goal attributes.
    ▪ Migration policy:
      – Migration takes place every m generations.
      – Each subpopulation sends a best individual, chosen according to the "foreign" fitness, i.e., the fitness function of the receiving subpopulation.
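The "foreign fitness" migration step can be sketched as below. This is a hedged reading of the slide, not the authors' code: it assumes every subpopulation sends one migrant to every other subpopulation, that the migrant is the individual scoring best under the receiver's fitness function, and that migrants replace the receiver's worst individuals. The name `migrate` is hypothetical, and individuals are treated as opaque objects.

```python
import copy
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def migrate(subpops: List[List[T]], fitness_fns: Sequence[Callable[[T], float]]) -> None:
    """In-place migration step for one round.

    subpops[j] is evolved with fitness_fns[j] (one goal attribute per
    subpopulation). Source i sends to destination j a copy of its individual
    with the highest fitness_fns[j] value, the "foreign" fitness.
    """
    n = len(subpops)
    # Pick all migrants first, so that individuals arriving in this round
    # are not forwarded again in the same round.
    incoming: List[List[T]] = [[] for _ in range(n)]
    for src in range(n):
        for dst in range(n):
            if src != dst:
                best = max(subpops[src], key=fitness_fns[dst])
                incoming[dst].append(copy.deepcopy(best))
    # Each destination replaces its worst individuals, judged by its own fitness.
    for dst in range(n):
        f, pop = fitness_fns[dst], subpops[dst]
        worst = sorted(range(len(pop)), key=lambda k: f(pop[k]))[:len(incoming[dst])]
        for k, migrant in zip(worst, incoming[dst]):
            pop[k] = migrant
```

In a generational loop this would be invoked as `if gen % m == 0: migrate(subpops, fitness_fns)`. Selecting migrants by the receiver's fitness means each subpopulation imports rules that are already good for its own goal attribute, rather than rules that merely scored well at home.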

  11. Computational Results
    ▪ Datasets
      – Obtained from the UCI repository of machine learning databases (http://www.ics.uci.edu/AI/Machine-Learning.html). The data sets used are Zoo, Car Evaluation, Auto Imports and Nursery:
        • Zoo: 101 instances and 18 attributes.
        • Car Evaluation: 1728 instances and 6 attributes.
        • Auto-imports 85M: 205 instances and 26 categorical attributes.
        • Nursery school: 12960 instances and 9 attributes.

  12. Computational Results
    ▪ Summary of results
      – Predictive accuracy.
        • DGA-Nuggets obtained somewhat better results than single-population GA-Nuggets.
        • DGA-Nuggets significantly outperformed single-population GA-Nuggets in six cases, whereas GA-Nuggets found rules with significantly higher predictive accuracy in only one case.
      – Degree of interestingness.
        • DGA-Nuggets obtained considerably better results than single-population GA-Nuggets.
        • Considering all the rules discovered in all four data sets, DGA-Nuggets outperformed GA-Nuggets in 22 out of 44 cases, whereas the reverse was true in just five out of 44 cases. In the remaining cases the difference between the two algorithms was not statistically significant.

  13. Discussion [slide text not recoverable]

  14. Future Work [slide text not recoverable]