SLIDE 1

A Distributed-Population Genetic Algorithm for Discovering Interesting Prediction Rules

Edgar Noda (1), Alex A. Freitas (2), Akebo Yamakami (1)

(1) School of Electrical and Computer Engineering (FEEC), State University of Campinas (Unicamp), Brazil

(2) Computing Laboratory, University of Kent at Canterbury, UK

SLIDE 2

Introduction

! Data Mining

– Extraction of knowledge from data.
– Data mining tasks:
  • Classification.
    – One goal attribute to predict.
  • Dependence modeling.
    – A generalization of classification: more than one possible goal attribute.

! Prediction rule form.

– IF (conditions on the values of predicting attributes are true) THEN (predict a value for some goal attribute).
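
The IF-THEN rule form can be illustrated with a minimal sketch. The `Rule` class, its method names, and the attribute names below are hypothetical and chosen only for illustration; the slides do not prescribe a data structure.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """IF every antecedent condition holds THEN predict goal_attr = goal_value."""
    antecedent: dict  # predicting attribute -> required value (hypothetical layout)
    goal_attr: str
    goal_value: str

    def covers(self, record: dict) -> bool:
        # The IF part: all conditions on predicting attributes must be true.
        return all(record.get(a) == v for a, v in self.antecedent.items())

    def is_correct(self, record: dict) -> bool:
        # The THEN part: the predicted goal value matches the record.
        return self.covers(record) and record.get(self.goal_attr) == self.goal_value


# Illustrative attribute names only.
rule = Rule({"feathers": "yes", "eggs": "yes"}, "airborne", "yes")
record = {"feathers": "yes", "eggs": "yes", "airborne": "yes", "legs": "2"}
```

In dependence modeling, `goal_attr` may differ from rule to rule, whereas in classification it would be the same for every rule.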

SLIDE 3

Discovered Knowledge

! Desirable properties:

– In principle, three properties.
– 1. Predictive accuracy.
  • Most emphasized in the literature.
  • Discovered knowledge should have high predictive accuracy.
– 2. Comprehensibility.
  • High-level rules.
  • The output of rule discovery algorithms tends to be more comprehensible than the output of other kinds of algorithms.

SLIDE 4

Discovered Knowledge

! Desirable properties:

– 3. Interestingness.
  • Discovered knowledge should be interesting to the user.
  • Of the three desirable properties, interestingness seems to be the most difficult to quantify and to achieve.
  • By "interesting" we mean that discovered knowledge should be novel or surprising to the user.
  • The notion of interestingness goes beyond the notions of predictive accuracy and comprehensibility.

SLIDE 5

Motivation for using a Genetic Algorithm (GA) in rule discovery

! Genetic Algorithm.

– A GA is essentially a search algorithm inspired by the principle of natural selection.
– In general, GAs tend to cope better with attribute interaction problems than greedy rule induction algorithms.
– GAs perform a global search.
– GAs use stochastic search operators, which helps make them more robust and less sensitive to noise.
– The execution of a GA can be regarded as a parallel search engine acting upon a population of candidate rules.

SLIDE 6

Motivation for using a Genetic Algorithm (GA) in rule discovery

! Distributed Genetic Algorithm (DGA).

– The basic idea is to partition the population into several small, semi-isolated subpopulations.
– Each subpopulation is associated with an independent GA, possibly exploring a different promising region of the search space.
– Occasionally, subpopulations interact by exchanging a few individuals, simulating a seasonal migratory process.
– The newly injected genetic material hopefully ensures that good genetic material is shared from time to time.
– This approach also helps mitigate the premature convergence problem and restricts the occurrence of “illegal mating”.

SLIDE 7

GA-Nuggets

! Overview.

– Designed for the dependence modeling task.
– Individual encoding:
  • Genotype: fixed-length individual.
  • Phenotype: rules with a variable number of attributes.
– Fitness function.
  • Two parts:
    – Degree of interestingness.
      » Objective (information-theoretic) measure.
      » Antecedent and consequent interestingness.
    – Predictive accuracy.
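
One way a fixed-length genotype can yield a variable-length rule is sketched below. This is an assumption for illustration, not a description confirmed by the slides: one gene per predicting attribute, with `None` marking an attribute that does not appear in the rule. The attribute names and the `decode` helper are hypothetical.

```python
# Hypothetical predicting attributes, one gene position each.
ATTRIBUTES = ["buying", "maint", "doors", "safety"]


def decode(genotype, attributes=ATTRIBUTES):
    """Map a fixed-length genotype to a variable-length rule antecedent.

    Assumption: a gene holding None means "attribute not used in the rule".
    """
    if len(genotype) != len(attributes):
        raise ValueError("genotype must have one gene per attribute")
    return {attr: value for attr, value in zip(attributes, genotype)
            if value is not None}


genotype = ["low", None, None, "high"]  # always four genes
antecedent = decode(genotype)           # only two active conditions
```

Under this encoding, an operator that switches a gene between `None` and a concrete value changes the length of the decoded rule while the genotype length stays fixed.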

SLIDE 8

GA-Nuggets

! The fitness function:

– AntInt: antecedent degree of interestingness.
– ConsInt: consequent degree of interestingness.
– PredAcc: predictive accuracy.
– w1 and w2 are user-defined weights.

$$
\mathit{Fitness} = \frac{w_1 \cdot \dfrac{\mathit{AntInt} + \mathit{ConsInt}}{2} + w_2 \cdot \mathit{PredAcc}}{w_1 + w_2}
$$
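
A minimal sketch of this fitness computation, reading it as a weighted average of the mean of AntInt and ConsInt (weight w1) and PredAcc (weight w2). The function name and the input validation are assumptions; how AntInt, ConsInt and PredAcc themselves are computed is not shown on the slide and is not attempted here.

```python
def fitness(ant_int, cons_int, pred_acc, w1, w2):
    """Weighted average of rule interestingness and predictive accuracy.

    ant_int, cons_int, pred_acc are assumed to be precomputed scores;
    w1 and w2 are the user-defined weights.
    """
    if w1 < 0 or w2 < 0 or w1 + w2 == 0:
        raise ValueError("weights must be non-negative and not both zero")
    interestingness = (ant_int + cons_int) / 2
    return (w1 * interestingness + w2 * pred_acc) / (w1 + w2)


# Example with arbitrary illustrative values.
score = fitness(ant_int=0.6, cons_int=0.8, pred_acc=0.9, w1=1, w2=2)
```

Because the weights are normalized by w1 + w2, only their ratio matters: doubling both leaves the fitness unchanged.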

SLIDE 9

GA-Nuggets

! Selection method:

– Tournament selection (factor:2).

! Genetic operators:

– Uniform crossover.
– Mutation.
– Condition insertion / removal operators.
  • Influence the size of the discovered prediction rule.
– Consequent formation.
– All operators guarantee the maintenance of valid genetic material.
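
Tournament selection and uniform crossover are standard GA operators; the sketch below shows generic textbook versions, not the exact GA-Nuggets implementation. It assumes that "factor: 2" means a tournament of two individuals, that genotypes are lists, and that `swap_prob` (a hypothetical parameter) is the per-gene swap probability.

```python
import random


def tournament_select(population, fitness_of, size=2, rng=random):
    """Pick `size` distinct individuals at random and return the fittest."""
    contenders = rng.sample(population, size)
    return max(contenders, key=fitness_of)


def uniform_crossover(parent_a, parent_b, swap_prob=0.5, rng=random):
    """Swap each gene position between two copies with probability swap_prob."""
    child_a, child_b = list(parent_a), list(parent_b)
    for i in range(len(child_a)):
        if rng.random() < swap_prob:
            child_a[i], child_b[i] = child_b[i], child_a[i]
    return child_a, child_b
```

Swapping whole genes position by position keeps every child gene a value that was legal for that attribute in one of the parents, which is one simple way an operator can preserve valid genetic material.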

SLIDE 10

DGA-Nuggets

! Fitness, selection and genetic operators.

– The same as in the single population version.

! Subpopulations

– A specific fitness function in each subpopulation (each searches for rules predicting a different goal attribute).
– Number of subpopulations = number of possible goal attributes.

! Migration policy.

– Migration takes place every m generations.
– Each subpopulation sends out a best individual, chosen according to the “foreign” fitness.
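
The migration policy can be sketched as follows. Several details are assumptions not stated on the slide: that "foreign" fitness means the fitness function of the receiving subpopulation, that every subpopulation sends one migrant to every other subpopulation, and that migrants replace the receiver's worst individuals. The function names are hypothetical.

```python
def migrate(subpops, fitness_fns):
    """subpops: goal attribute -> list of individuals.
    fitness_fns: goal attribute -> fitness function of that subpopulation."""
    incoming = {goal: [] for goal in subpops}
    # Collect all migrants first, so selection uses pre-migration populations.
    for src, individuals in subpops.items():
        for dst in subpops:
            if dst != src:
                # Assumption: judge the sender's individuals by the receiver's fitness.
                best = max(individuals, key=fitness_fns[dst])
                incoming[dst].append(list(best))
    # Assumption: migrants replace the worst individuals of the receiver.
    for dst, migrants in incoming.items():
        ranked = sorted(subpops[dst], key=fitness_fns[dst], reverse=True)
        keep = ranked[: max(0, len(ranked) - len(migrants))]
        subpops[dst] = keep + migrants
    return subpops


def migrate_if_due(generation, m, subpops, fitness_fns):
    """Run migration only every m generations."""
    if generation > 0 and generation % m == 0:
        return migrate(subpops, fitness_fns)
    return subpops
```

Selecting migrants by the receiver's fitness means each subpopulation exports the individual most likely to be useful at the destination rather than the one that is best for its own goal attribute.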

SLIDE 11

Computational Results

! Datasets.

– Obtained from the UCI repository of machine learning databases (http://www.ics.uci.edu/AI/Machine-Learning.html). The data sets used are Zoo, Car Evaluation, Auto Imports and Nursery.
  • Zoo: 101 instances and 18 attributes.
  • Car Evaluation: 1728 instances and 6 attributes.
  • Auto-imports 85M: 205 instances and 26 categorical attributes.
  • Nursery school: 12960 instances and 9 attributes.

SLIDE 12

Computational Results

! Summary of results.

– Predictive accuracy.
  • DGA-Nuggets obtained somewhat better results than single-population GA-Nuggets.
  • In one case, GA-Nuggets found rules with significantly higher predictive accuracy. DGA-Nuggets significantly outperformed the single-population GA in six cases.
– Degree of interestingness.
  • DGA-Nuggets obtained considerably better results than single-population GA-Nuggets.
  • DGA-Nuggets outperformed the latter in 22 out of 44 cases (considering all the discovered rules in all four data sets), whereas the reverse was true in just five out of 44 cases. In the other cases the difference between the two algorithms was not statistically significant.

SLIDE 13

Discussion

SLIDE 14

Future Works
