Application of Multi-Objective Metaheuristic Algorithms in Data Mining



SLIDE 1

UKKDD'07

Application of Multi-Objective Metaheuristic Algorithms in Data Mining

Presented by:
Dr Beatriz de la Iglesia
University of East Anglia
Norwich, Norfolk, UK
email: bli@cmp.uea.ac.uk

SLIDE 2

Overview

  • Why use Multi-Objective (MO) algorithms?
  • An introduction to MO optimisation
  • MO algorithms for classification
  • Conclusions

SLIDE 3

Data Mining

Data Mining is a step in the KDD process. It consists of the application of particular data mining algorithms to extract higher-level information, in the form of a model or a set of patterns, from a large dataset.

SLIDE 4

Model selection

  • Many models can fit the same data.
  • Data mining is concerned with the improvement (optimisation) of the model to obtain the best prediction or description of the data, depending on the objectives of the KDD process.
SLIDE 5

Marketing example

The goal is to predict whether a customer will buy a product given their sex, country and age (classification).

Sex | Country | Age | Buy? (Goal/class)
M   | France  | 25  | Yes
M   | England | 21  | Yes
F   | France  | 23  | Yes
F   | England | 34  | Yes
F   | France  | 30  | No
M   | Germany | 21  | No
M   | Germany | 20  | No
F   | Germany | 18  | No
F   | France  | 34  | No
M   | France  | 55  | No

Freitas and Lavington (1998), Data Mining with EAs, CEC99.

SLIDE 6

Different models

[Figure: two models fitted to the marketing data. Left, a decision tree: the root branches on country (Germany, England, France), with a further test age <= 25 vs age > 25; internal branching nodes lead to yes/no leaf nodes. Right, a neural network with sex, country and age in the input layer, a hidden layer, and Buy?-yes / Buy?-no in the output layer.]

SLIDE 7

Optimising the model/patterns

  • Data Mining is an optimisation process.
  • We search for the best model or patterns according to some evaluation criteria.
  • This normally requires adjusting parameters of the algorithm.

SLIDE 8

Generalisation

  • Models should not only fit the data used to build them (the train set) but also the real-world process that is generating the data.
  • Only then may we get a model that will generalise to other samples from the real-world process.
  • We use an independent sample (the test set), drawn from the real-world data, to test the performance of the model on new data.
  • The test set must not be compromised when building the model. A validation set should be used for any testing of the model at the intermediary stages.

SLIDE 9

Model selection criteria

  • Most selections involve more than one criterion, and there are often conflicts or trade-offs between different criteria.
  • E.g. a couple are buying a house:
    - She wants a very modern house with "wow" factor and gadgets.
    - He wants a house with many rooms for family growth.
    - They both want to find the house of their dreams as cheaply as possible.

SLIDE 10

Multi-objective problem

"wow" factor | Rooms | Cost
5            | 10    | £1,000,000
7            | 5     | £900,000
3            | 4     | £900,000
10           | 3     | £550,000
3            | 4     | £500,000
4            | 5     | £500,000
5            | 4     | £300,000
4            | 2     | £250,000
SLIDE 11

Multi-Objective Data Mining

  • In data mining there are also many conflicting criteria for model evaluation.
  • E.g.:
    - Decision trees and neural nets may be evaluated by their complexity and their generalisation error.
    - Association rules may be evaluated by their support and confidence.
    - Clustering solutions may be evaluated by entropy and purity, or other measures of clustering quality.
SLIDE 12

Multi-Objective Optimisation

  • Given two solutions with different objective values, it is not always possible to state categorically that one solution is better than the other.
  • Multi-objective algorithms must find the set of all such trade-off solutions.
  • The user can then select a solution according to preference.

SLIDE 13

MO Optimisation

  • Given a problem with n objectives f1, …, fn, each of which is to be maximised, solution a dominates solution b if

    fi(a) ≥ fi(b) ∀i ∈ {1, …, n}, and ∃j ∈ {1, …, n} such that fj(a) > fj(b).

  • Given a set of solutions, S, a solution a ∈ S is non-dominated if there is no solution s ∈ S that dominates a.
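As a sketch, the dominance test for maximised objectives translates directly into code; the function name here is illustrative, not from the slides:

```python
def dominates(a, b):
    """True if solution a dominates solution b.

    a and b are sequences of objective values f1..fn, all maximised:
    a must be >= b in every objective and strictly > in at least one.
    """
    return (all(ai >= bi for ai, bi in zip(a, b)) and
            any(ai > bi for ai, bi in zip(a, b)))

print(dominates((3, 5), (2, 4)))  # True: >= everywhere, > somewhere
print(dominates((3, 5), (5, 3)))  # False: a trade-off, the pair is incomparable
```

Note that neither `dominates((3, 5), (5, 3))` nor `dominates((5, 3), (3, 5))` holds: such incomparable pairs are exactly the trade-off solutions a MO algorithm must collect.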

SLIDE 14

Pareto Front

[Figure: the Pareto front for a two-objective problem.]

SLIDE 15

MO algorithms

  • Should approximate the Pareto front.
  • Should provide a good spread of solutions along the Pareto front.
  • Evolutionary Algorithms (EAs) are well suited to MO optimisation as they deal with a population of solutions.
  • Many alternative EA approaches, including:
    - Aggregating functions
    - Lexicographical ranking
    - Pareto dominance (e.g. PAES, NSGA-II, SPEA2)
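A naive way to obtain the trade-off set that Pareto-dominance methods maintain is to filter a population for non-dominated solutions. A minimal O(n²) sketch (names illustrative), using the house-buying table from the earlier slide with cost negated so that every objective is maximised:

```python
def dominates(a, b):
    """a dominates b: >= in every (maximised) objective, > in at least one."""
    return (all(x >= y for x, y in zip(a, b)) and
            any(x > y for x, y in zip(a, b)))

def non_dominated(solutions):
    """Naive O(n^2) filter returning the non-dominated subset of a population."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t is not s)]

# Objectives: ("wow" factor, rooms, -cost), all maximised.
houses = [(5, 10, -1_000_000), (7, 5, -900_000), (3, 4, -900_000),
          (10, 3, -550_000), (3, 4, -500_000), (4, 5, -500_000),
          (5, 4, -300_000), (4, 2, -250_000)]
front = non_dominated(houses)
# Only the £900,000 and £500,000 houses with (wow=3, rooms=4) drop out:
# each is dominated by an equal-or-cheaper house that is at least as good
# in every objective.
print(front)
```

Real MO EAs such as NSGA-II avoid this quadratic filter with faster non-dominated sorting, but the result on a single population is the same set.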

SLIDE 16

Classification

  • In classification, a model is sought which can assign a class to each instance in the database. It relies on historical labelled data.
  • Nugget discovery, or partial classification, seeks to find patterns that represent a "strong" description of a predefined class.
  • Particularly relevant for "minority classes".

SLIDE 17

A nugget

  • Let Q be a finite set of attributes, each with an associated domain.
  • A record specifies a value for each attribute in Q.
  • A tabular database, D, is a finite set of records.
  • Partial classification rules take the form antecedent → consequent, where the antecedent and consequent are predicates used to define subsets of records from the database D, and the rule underlines an association between these subsets.

SLIDE 18

Attribute Tests

  • In nugget discovery, the antecedent is often constrained to be a conjunction of Attribute Tests (ATs).
  • Numerical attributes are tested against a simple value, a binary partition or a range of values (e.g. age ≥ 25).
  • Categorical attributes are tested against a simple value, a subset of values or an inequality test (e.g. colour ≠ blue).
  • The consequent represents the class assignment.

SLIDE 19

Strength of a rule

  • If a is the antecedent predicate and b the consequent predicate, define

    A = {t ∈ D | a(t)}    B = {t ∈ D | b(t)}    C = {t ∈ D | a(t) ∧ b(t)}

    and write a = |A|, b = |B|, c = |C|. The strength of the rule r can then be expressed by

    conf(r) = |C| / |A| = c / a
    cov(r)  = |C| / |B| = c / b

  • A strong rule is one that meets certain confidence and coverage thresholds.

SLIDE 20

Marketing example

Sex | Country | Age | Buy? (Goal/class)
M   | France  | 25  | Yes
M   | England | 21  | Yes
F   | France  | 23  | Yes
F   | England | 34  | Yes
F   | France  | 30  | No
M   | Germany | 21  | No
M   | Germany | 20  | No
F   | Germany | 18  | No
F   | France  | 34  | No
M   | France  | 55  | No

If Country = England then "YES"
  Conf(r) = 2/2 = 1      Cov(r) = 2/4 = 0.5

If (Country = France AND Age ≤ 25) OR (Country = England) then "YES"
  Conf(r) = 4/4 = 1      Cov(r) = 4/4 = 1

If Country = France then "YES"
  Conf(r) = 2/5 = 0.4    Cov(r) = 2/4 = 0.5
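The quoted confidence and coverage values can be reproduced directly from the table. A small sketch, where the tuple layout and function name are assumptions of this example:

```python
# Marketing table: (sex, country, age, buy).
data = [
    ("M", "France", 25, "Yes"), ("M", "England", 21, "Yes"),
    ("F", "France", 23, "Yes"), ("F", "England", 34, "Yes"),
    ("F", "France", 30, "No"),  ("M", "Germany", 21, "No"),
    ("M", "Germany", 20, "No"), ("F", "Germany", 18, "No"),
    ("F", "France", 34, "No"),  ("M", "France", 55, "No"),
]

def strength(antecedent, target="Yes"):
    """Return (confidence, coverage) of 'if antecedent then target' over data."""
    a = [r for r in data if antecedent(r)]   # records matching the antecedent
    b = [r for r in data if r[3] == target]  # records in the target class
    c = [r for r in a if r[3] == target]     # records matching both
    return len(c) / len(a), len(c) / len(b)  # conf = c/a, cov = c/b

print(strength(lambda r: r[1] == "England"))  # (1.0, 0.5)
print(strength(lambda r: r[1] == "France"))   # (0.4, 0.5)
print(strength(lambda r: (r[1] == "France" and r[2] <= 25)
                         or r[1] == "England"))  # (1.0, 1.0)
```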

SLIDE 21

Multi-Objective problem

  • Each measure of the strength of a rule represents a different objective function to be optimised.
  • This allows for the application of Multi-Objective metaheuristics (MOMH).
  • Each Pareto-optimal solution represents a different compromise between the objectives.

SLIDE 22

Partial Classification with MOMH

[Figure: confidence (70%–100%) plotted against coverage (0%–100%) for the MOMH approach and for an ARA; rules for the "<=50K" class of the UCI Adult dataset.]

SLIDE 23

Partial classification rules

IF CapitalLoss <= 2206 AND CapitalGain <= 6849 AND Relationship != Husband
THEN Salary <= 50K
  Confidence = 0.92   Coverage = 0.69

IF CapitalGain <= 6723 AND Age <= 21 AND HoursPerWeek <= 59 AND MaritalStat != Married_civ_spouse
THEN Salary <= 50K
  Confidence = 1      Coverage = 0.109341

IF CapitalGain <= 41310
THEN Salary <= 50K
  Confidence = 0.75   Coverage = 1

SLIDE 24

Rule Trees for binary classification

  • Rule trees are more expressive than simple rules and more compact than sets of such rules.

[Figure: example rule tree combining ATs such as Age = 28, Hobby = Chess, Attendance = 90%, Job = Engineer, Degree = Physics and Degree = Maths with AND/OR interior nodes.]

  • Leaf nodes contain attribute tests (ATs).
  • Interior nodes contain the boolean operators AND and OR.
  • A binary tree representation means simpler genetic operators.
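A rule tree with ATs at the leaves and AND/OR at interior nodes can be classified against a record by a simple recursion. A hedged sketch; the tree, record fields and representation below are illustrative, not the slide's figure or the authors' implementation:

```python
def evaluate(node, record):
    """Recursively evaluate a binary rule tree against a record (a dict).

    Leaves are attribute tests (callables); interior nodes are
    ("AND" | "OR", left_subtree, right_subtree) triples.
    """
    if callable(node):                  # leaf: apply the attribute test
        return node(record)
    op, left, right = node              # interior node
    if op == "AND":
        return evaluate(left, record) and evaluate(right, record)
    return evaluate(left, record) or evaluate(right, record)  # "OR"

# (Job = Engineer AND Degree = Physics) OR Hobby = Chess
tree = ("OR",
        ("AND", lambda r: r["job"] == "Engineer",
                lambda r: r["degree"] == "Physics"),
        lambda r: r["hobby"] == "Chess")

print(evaluate(tree, {"job": "Engineer", "degree": "Physics", "hobby": "Golf"}))  # True
print(evaluate(tree, {"job": "Teacher", "degree": "Maths", "hobby": "Golf"}))     # False
```

Because every interior node has exactly two children, genetic operators such as subtree crossover only ever swap well-formed boolean subexpressions, which is the simplification the slide alludes to.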

SLIDE 25

Objectives for MO approach

  • Minimising misclassification costs:
    - simple error rate;
    - balanced error rate;
    - any other measure of overall misclassification cost.
  • Minimising rule complexity:
    - to encourage the production of rules that are easily understood by the client;
    - to reduce the chance of overfitting the data.

SLIDE 26

Train/Validation/Test

[Figure: error rate (13%–21%) plotted against number of ATs (5–20) for the training and validation sets.]

SLIDE 27

Rules for Adult ">50k"

Simple error rate:

Confusion matrix    Predicted +   Predicted -
Actual +                1990          1710
Actual -                 628         10732

Balanced error rate:

Confusion matrix    Predicted +   Predicted -
Actual +                3151           549
Actual -                2729          8631

[Figure: rule tree over the ATs Cap.Gain ≥ 4687, Age ≥ 30, Edu.Years ≥ 13, Edu.Years ≥ 9 and Mar.status = CivilianSpouse, with AND/OR interior nodes.]
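Reading the matrices with rows as the actual class and columns as the predicted class (an assumption of this sketch, consistent with the 3,700 actual positives in both matrices), the two error measures can be checked directly:

```python
def simple_error(tp, fn, fp, tn):
    """Fraction of all records that are misclassified."""
    return (fp + fn) / (tp + fn + fp + tn)

def balanced_error(tp, fn, fp, tn):
    """Mean of the per-class error rates: the minority class counts equally."""
    return 0.5 * (fn / (tp + fn) + fp / (fp + tn))

# Tree optimised for simple error rate:   TP=1990, FN=1710, FP=628,  TN=10732
# Tree optimised for balanced error rate: TP=3151, FN=549,  FP=2729, TN=8631
for tp, fn, fp, tn in [(1990, 1710, 628, 10732), (3151, 549, 2729, 8631)]:
    print(round(simple_error(tp, fn, fp, tn), 3),
          round(balanced_error(tp, fn, fp, tn), 3))
```

The two objectives pull in different directions: the second tree accepts a worse simple error rate in exchange for a much lower balanced error rate on the minority ">50k" class.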

SLIDE 28

Rules for Adult ">50k"

FN is 10 times the cost of a FP.

Confusion matrix    Predicted +   Predicted -
Actual +                3560           140
Actual -                4836          6524

[Figure: rule tree over the ATs Cap.Gain ≥ 4687, Age ≥ 28, Edu.Years ≥ 13, Cap.Loss ≥ 2231 and Mar.status = CivilianSpouse, with OR/AND interior nodes.]

SLIDE 29

Rule sets

If cap. gain ≥ 5178 then salary > 50K
If mar. status = civilian spouse and cap. loss ≥ 1816 and cap. loss ≤ 2001 then salary > 50K
If mar. status = civilian spouse and edu. years ≥ 13 then salary > 50K
Otherwise salary ≤ 50K

SLIDE 30

Result evaluation

  • MOMHs used to extract rule trees produce complete, simple and understandable class descriptions.
  • Competitive in terms of classification performance with other, more sophisticated classifiers.
  • A very flexible approach, providing the user with a number of models.
  • Can operate with different measures of complexity and different error measures.
  • Can present knowledge in different formats: rule set, rule tree.

SLIDE 31

Conclusions

  • The application of metaheuristics to data mining has produced efficient and effective algorithms, scalable to large databases.
  • A MO approach has many advantages for both partial classification and binary classification.
  • Algorithms for partial classification have compared well with All Rules Algorithms, and algorithms for rule trees compare well with standard classification algorithms.
  • The approach is very flexible: we can change the rule representation, AT types, objectives, etc.
  • We are continuing research in this area.

SLIDE 32

Questions?