SLIDE 1

Behaviour of FeRaNGA - Feature Ranking process using Inductive Modelling

Aleš Pilný

pilnya1@fel.cvut.cz

Pavel Kordík, Miroslav Šnorek

kordikp@fel.cvut.cz, snorek@fel.cvut.cz
http://cig.felk.cvut.cz

Computational Intelligence Group
Department of Computer Science and Engineering
Faculty of Electrical Engineering
Czech Technical University in Prague

ICANN 2008

SLIDE 2

ICANN 2008 Aleš Pilný, pilnya1@fel.cvut.cz, http://cig.felk.cvut.cz

Overview of Feature Ranking and Selection

How important is each feature?

Feature Ranking

  • 1. P-length
  • 2. P-width
  • 3. S-length
  • 4. S-width

Diagram labels: Feature Ranking (ranks, knowledge); Feature Selection (reduction of dimensionality)

SLIDE 3

The FAKE-GAME Tool overview

  • Extension of MIA GMDH

SLIDE 4

Feature Ranking (FR) in FAKE-GAME

  • The FAKE-GAME tool creates the GAME network using a Niching Genetic Algorithm (NGA).
  • The importance of each feature can be obtained as a side effect of the NGA, by computing how often the feature is used during the network building process.
  • This approach also selects the important features, because redundant and irrelevant features are ignored.
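
The idea of reading importance off feature utilization can be sketched roughly as follows. This is only an illustration, not the FAKE-GAME implementation: the `population` structure (each individual reduced to the list of feature indices it reads) and the function names are assumptions made for the example.

```python
from collections import Counter


def feature_significance(population, n_features):
    """Return, for each feature, its share of all feature uses in the population.

    `population` is a hypothetical stand-in for the NGA population: every
    individual is represented only by the list of feature indices it reads.
    """
    counts = Counter(f for genes in population for f in genes)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * n_features
    return [counts[i] / total for i in range(n_features)]


def rank_features(significance):
    """Feature indices ordered from most to least significant (ties by index)."""
    return sorted(range(len(significance)), key=lambda i: (-significance[i], i))


# Made-up population of five individuals over five features (indices 0-4).
population = [[0, 1], [0, 2], [0, 1], [1, 3], [0]]
significance = feature_significance(population, n_features=5)
ranking = rank_features(significance)
```

A feature that no individual uses (index 4 here) receives zero significance, which matches the remark that unused features are effectively deselected.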

SLIDE 5

Feature Ranking utilizing information from Niching Genetic Algorithm - FeRaNGA

  • Novel approach to feature ranking
  • The ranking is easily extracted from the proportional significance of features
  • How?

– NGA = GA + domains (location of multiple solutions)
– We used the Deterministic Crowding method to promote the formation and maintenance of stable subpopulations in the GA.
– Significance is estimated by monitoring which genes exist in the population (which features are used by genes in the NGA).
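
For readers unfamiliar with Deterministic Crowding, one generation can be sketched as below. This is a generic textbook-style sketch on bit strings, not the NGA used in FAKE-GAME; the bit-string encoding, the one-point crossover, the mutation rate and `toy_fitness` are all assumptions chosen for illustration.

```python
import random


def hamming(a, b):
    """Number of positions in which two bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def crossover(p1, p2, rng):
    """One-point crossover producing two children."""
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def mutate(ind, rng, rate=0.05):
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in ind]


def deterministic_crowding_step(population, fitness, rng):
    """One generation of deterministic crowding (population size must be even)."""
    pop = population[:]
    rng.shuffle(pop)
    next_pop = []
    for p1, p2 in zip(pop[0::2], pop[1::2]):
        c1, c2 = crossover(p1, p2, rng)
        c1, c2 = mutate(c1, rng), mutate(c2, rng)
        # Pair each child with the parent it resembles most.
        if hamming(p1, c1) + hamming(p2, c2) > hamming(p1, c2) + hamming(p2, c1):
            c1, c2 = c2, c1
        # A child replaces its paired parent only if it is at least as fit.
        next_pop.append(c1 if fitness(c1) >= fitness(p1) else p1)
        next_pop.append(c2 if fitness(c2) >= fitness(p2) else p2)
    return next_pop


def toy_fitness(ind):
    """Hypothetical fitness: number of ones among the first four bits."""
    return sum(ind[:4])


rng = random.Random(0)
population = [[rng.randint(0, 1) for _ in range(8)] for _ in range(10)]
best_before = max(map(toy_fitness, population))
for _ in range(20):
    population = deterministic_crowding_step(population, toy_fitness, rng)
best_after = max(map(toy_fitness, population))
```

Because children compete only against the parent they resemble, different regions of the search space can keep their own subpopulations instead of being overrun by a single best solution.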

SLIDE 6

FeRaNGA

  • NGA random initialization → FeRaNGA results are unstable

How to solve it?

SLIDE 8

FeRaNGA-n

  • NGA random initialization → FeRaNGA results are unstable
  • All ranks are computed from an ensemble of n GAME models, as medians of the estimated significances

FAKE-GAME models:
Model 0: 1 2 3 5 4
Model 1: 1 3 2 4 5
Model 2: 1 2 3 4 5

Figure labels: Correct ranks, FeRaNGA-3 (sequences shown: 1 2 3 5 4 and 1 2 3 4 5)
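
The median step can be illustrated with the three model orderings listed above. The sketch below is an approximation under a stated assumption: the slide takes medians of estimated significances, which are not given here, so the rank position of each feature is used as a stand-in. The function name `median_ranking` is hypothetical.

```python
from statistics import median


def median_ranking(orderings):
    """Combine several feature orderings via the median position of each feature."""
    features = sorted(orderings[0])
    positions = {f: [order.index(f) + 1 for order in orderings] for f in features}
    medians = {f: median(p) for f, p in positions.items()}
    # Sort by median position; ties are broken by feature number.
    return sorted(features, key=lambda f: (medians[f], f))


models = [
    [1, 2, 3, 5, 4],  # Model 0
    [1, 3, 2, 4, 5],  # Model 1
    [1, 2, 3, 4, 5],  # Model 2
]
combined = median_ranking(models)
print(combined)  # [1, 2, 3, 4, 5]
```

Each single model disagrees with at least one other on some position, but the per-feature median smooths out these individual swaps.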

SLIDE 9

Experiments

1. Influence of NGA configuration on ranks
2. Dependency of accuracy on the number of models for the FeRaNGA-n method
3. Changes of ranks between layers

Three kinds of experiments on two artificial data sets.

SLIDE 10

The data sets used in experiments

  • Gaussian multivariate data set

– two clusters of points generated from two different 10-dimensional Gaussian distributions
– features 1-10 are equally relevant, 11-20 are irrelevant, 21-50 are highly redundant with the first ten features

  • Uniform Hypercube data set

– two clusters of points generated from two different 10-dimensional hypercubes [0; 1]¹⁰, with uniform distribution
– features 1-10 have decreasing relevance, 11-20 are irrelevant, 21-50 are redundant
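
A data set with the same 50-feature layout as the Gaussian set can be generated along the following lines. This is a rough sketch, not the generator used in the experiments: the class means (`shift`), the unit variances, and the way redundant features are built (noisy copies of relevant ones) are assumptions.

```python
import random


def make_gaussian_like(n_per_class, rng, shift=1.0, noise=0.1):
    """Return (rows, labels); each row has 50 features in the layout described above."""
    rows, labels = [], []
    for label, mean in ((0, -shift), (1, shift)):
        for _ in range(n_per_class):
            # Features 1-10: relevant, class-dependent mean.
            relevant = [rng.gauss(mean, 1.0) for _ in range(10)]
            # Features 11-20: irrelevant, pure noise.
            irrelevant = [rng.gauss(0.0, 1.0) for _ in range(10)]
            # Features 21-50: redundant, noisy copies of the relevant features.
            redundant = [relevant[i % 10] + rng.gauss(0.0, noise) for i in range(30)]
            rows.append(relevant + irrelevant + redundant)
            labels.append(label)
    return rows, labels


rng = random.Random(42)
rows, labels = make_gaussian_like(100, rng)
```

A hypercube-style variant could follow the same layout with `rng.uniform(0.0, 1.0)` draws and a class separation that shrinks from feature 1 to feature 10, though the exact construction is not specified on the slide.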

SLIDE 11

1. Influence of NGA configuration on FeRaNGA-7 results (on the Gaussian data set)

Default configuration of NGA: 30 individuals and 15 epochs.

Ranks are computed as medians over all layers of medians.

SLIDE 14

1. Influence of NGA configuration on FeRaNGA-7 results (on the Gaussian data set)

  • Default configuration of NGA (30 individuals and 15 epochs): 9 correct ranks in the first two layers
  • Configuration of NGA with 75 individuals and 75 epochs: 10 correct ranks
  • Configuration of NGA with 150 individuals and 150 epochs: all features have correct ranks

Figure annotations on the default configuration: "Incorrect features!", "Redundant features"
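
One plausible way to count "correct ranks" on the Gaussian data set is sketched below. It assumes that, because features 1-10 are equally relevant, a rank counts as correct whenever one of the first ten positions holds a relevant feature; the slides do not spell out the exact criterion. The ranking used here is made up and is not a result from the experiments.

```python
def count_correct(ranking, relevant, top_k=None):
    """Count how many of the first `top_k` ranked features are relevant."""
    top_k = len(relevant) if top_k is None else top_k
    return sum(1 for f in ranking[:top_k] if f in relevant)


relevant = set(range(1, 11))  # features 1-10
# Made-up ranking in which redundant feature 24 displaces a relevant one.
ranking = [3, 1, 7, 2, 10, 5, 24, 4, 9, 6, 8, 31]
correct = count_correct(ranking, relevant)
print(correct)  # 9
```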

SLIDE 15

Dependency of accuracy on the number of models for the FeRaNGA-n method (on the Hypercube data set)

First ten ranks from the first layers of FeRaNGA-7 on the Hypercube data set.

  • Ranks computed from a higher number of models depend on the significance of features from the previous models.

SLIDE 19

Dependency of accuracy on the number of models for the FeRaNGA-n method (on the Hypercube data set)

First ten ranks from the first layers of FeRaNGA-7 on the Hypercube data set.

  • For NGA configuration 75, the ranks are correct with 5, 6 and 7 models.
  • A growing number of models and a stronger NGA configuration improve accuracy.
  • With NGA configuration 150, all feature ranks are correct.
SLIDE 23

Changes of ranks between layers (on the Hypercube data set)

Changes of ranks between the first two layers for 14 GAME models (cfg. 150)

  • In all cases the relevant features lose part of their importance.
  • The average change for one relevant feature is -0.3. The gain for one redundant feature is 0.09 and the gain for one irrelevant feature is 0.07 (the numbers are relative to the number of features).
  • In the first layer only a few of the most important features are ranked, and in every following layer these important features lose importance in favour of redundant and irrelevant features.
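
The per-group comparison between two layers can be sketched as follows. The significance values are made up to show the direction of the effect only; they do not reproduce the figures above, and the slides do not state the exact normalisation used. The group index ranges follow the data set layout (10 relevant, 10 irrelevant, 30 redundant features); the function name is hypothetical.

```python
def mean_change_by_group(layer1, layer2, groups):
    """Average change in significance from layer 1 to layer 2 for each feature group."""
    result = {}
    for name, indices in groups.items():
        deltas = [layer2[i] - layer1[i] for i in indices]
        result[name] = sum(deltas) / len(deltas)
    return result


groups = {
    "relevant": range(0, 10),
    "irrelevant": range(10, 20),
    "redundant": range(20, 50),
}
# Made-up significances: layer 1 uses only relevant features,
# layer 2 spreads some importance over the remaining ones.
layer1 = [0.10] * 10 + [0.00] * 40
layer2 = [0.06] * 10 + [0.01] * 40
changes = mean_change_by_group(layer1, layer2, groups)
```

With these toy values the relevant group shows a negative mean change while the irrelevant and redundant groups show small positive ones, which is the pattern described in the bullets above.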

SLIDE 24

Conclusion

  • A stronger NGA configuration gives better results, but a higher number of epochs and individuals slows down the learning process.
  • Accuracy increases with a growing number of models.
  • The power of FeRaNGA-n is in the first layer, where only a few important features are ranked and redundant and irrelevant features are unused.

SLIDE 25

Questions

Thank you for your attention.

Any questions?

pilnya1@fel.cvut.cz

SLIDE 26

Thank you.