SLIDE 1

Detecting multivariate outliers using projection pursuit with particle swarm optimization

Anne Ruiz-Gazen, Alain Berro, Souad Larabi Marie-Sainte

University of Toulouse 1 - Capitole (TSE - IRIT - IMT)

COMPSTAT, Paris, September 2010

  • A. Ruiz-Gazen (University of Toulouse)

EPP using PSO COMPSTAT 2010 1 / 35

SLIDE 4

Introduction

What is Exploratory Projection Pursuit? Search for “interesting” linear low-dimensional projections of high-dimensional multivariate data.

Interesting structures:

  • outliers,
  • clusters, . . .

Two ingredients:

  • projection interestingness: projection index I,
  • optimization of the index: algorithm.

SLIDE 6

Introduction

EPP is usually known by statisticians but not used! Well-known statistical software packages do NOT offer PP procedures (only some routines in Fortran, S-Plus, Matlab and GGobi).

Recent applications in the domain of anomaly detection in hyperspectral imagery (Achard et al., 2004, Malpica et al., 2008, Smetek and Bauer, 2008).

SLIDE 7

Introduction

Mathematically:

  • denote by $X$ the $n \times p$ data matrix and by $X_i$ a $p \times 1$ observation; the variables are continuous,
  • the data are centered and scaled (divided by the standard deviation, or made spherical),
  • consider one-dimensional projections from $\mathbb{R}^p$ to $\mathbb{R}$: $z = X\alpha$, where $\alpha$ is a $p$-dimensional projection vector with $\alpha'\alpha = 1$ and $z$ is an $n$-dimensional vector holding the coordinates of the projected observations,
  • define a projection index function $I : \alpha \mapsto I(\alpha)$,
  • find projection vectors $\alpha$ solving $\max_{\{\alpha \in \mathbb{R}^p \mid \alpha'\alpha = 1\}} I(\alpha)$.
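
The notation above can be illustrated with a minimal sketch in plain Python. The helper names (`standardize`, `unit`, `project`) and the simulated data are assumptions made for illustration only; scaling here divides each column by its standard deviation, which is one of the two options mentioned on the slide.

```python
import math
import random

def standardize(rows):
    """Center each column and divide it by its standard deviation."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]

def unit(alpha):
    """Rescale alpha so that alpha' alpha = 1."""
    norm = math.sqrt(sum(a * a for a in alpha))
    return [a / norm for a in alpha]

def project(rows, alpha):
    """Return z = X alpha: one projected coordinate per observation."""
    return [sum(x * a for x, a in zip(r, alpha)) for r in rows]

# Hypothetical data: n = 50 observations, p = 3 variables.
rng = random.Random(0)
X = standardize([[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)])
alpha = unit([1.0, 2.0, 2.0])
z = project(X, alpha)
```

Any projection index I can then be written as a function of `z` alone.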

SLIDE 12

Contents

1 PP indices for multivariate outliers detection: a review
  • First proposals
  • Other proposals

2 Optimization procedures for PP: a review
  • Strategy for the first proposals
  • Other strategies

3 New optimization proposals
  • A new strategy
  • Heuristics optimization algorithms: Genetic Algorithm, Particle Swarm Optimization algorithm, Tribes

4 Illustration with EPP-Lab

5 Conclusion and perspectives

SLIDE 14

PP indices for multivariate outliers detection: a review / First proposals

First proposals

Definition of an “interesting” projection discussed in the founding papers on PP (Friedman and Tukey, 1974, Huber, 1985, Jones and Sibson, 1987, and Friedman, 1987).

  • Several arguments: “gaussianity is uninteresting”.
  • Any measure of departure from normality = a PP index.
  • Objective more general than looking for projections that reveal outlying observations.
  • However, several indices are very sensitive to departure from normality in the tails of the distribution and tend to reveal outliers first.

SLIDE 17

PP indices for multivariate outliers detection: a review / First proposals

First proposals

  • Friedman-Tukey (1974):

$$I_{FT}(\alpha) = \frac{1}{n^2 h^2} \sum_{i=1}^{n} \sum_{j=1}^{n} K\!\left(\frac{\alpha'(X_i - X_j)}{h}\right)$$

    with $K(u) = \frac{35}{32}(1 - u^2)^3 \, I_{\{|u| \le 1\}}$ and $h = 3.12\, N^{-1/6}$ (Klinke, 1997).

  • Friedman (1987): index based on the $L^2$ distance between the projected data distribution and the Gaussian distribution (using expansions based on Legendre polynomials).

  • Kurtosis: $I_{kurt}(\alpha) = \sum_{i=1}^{n} (\alpha'X_i)^4$ (Huber, 1985, Peña and Prieto, 2001).
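
As a rough illustration, the Friedman-Tukey and kurtosis indices above can be evaluated on a vector of projected coordinates $z_i = \alpha'X_i$. This is only a sketch: the function names are hypothetical, $N$ in the bandwidth is assumed to be the sample size $n$, and the two toy samples are made-up values rather than standardized data.

```python
def triweight(u):
    """K(u) = 35/32 (1 - u^2)^3 on |u| <= 1, and 0 elsewhere."""
    return 35.0 / 32.0 * (1.0 - u * u) ** 3 if abs(u) <= 1.0 else 0.0

def friedman_tukey_index(z):
    """Double kernel sum over all pairs of projected points."""
    n = len(z)
    h = 3.12 * n ** (-1.0 / 6.0)   # assumption: N is the sample size n
    total = sum(triweight((zi - zj) / h) for zi in z for zj in z)
    return total / (n * n * h * h)

def kurtosis_index(z):
    """Sum of the fourth powers of the projected coordinates."""
    return sum(v ** 4 for v in z)

# Toy projected coordinates, without and with one extreme value.
clean = [-1.5, -0.5, 0.0, 0.5, 1.5]
with_outlier = [-1.5, -0.5, 0.0, 0.5, 6.0]
```

On these toy values the kurtosis index is far larger for `with_outlier` than for `clean`, which is the tail sensitivity that makes it useful for outlier detection. The double sum costs O(n²) kernel evaluations per direction.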

SLIDE 21

PP indices for multivariate outliers detection: a review / Other proposals

Other indices

  • Measure of outlyingness (Stahel-Donoho): for each observation $i = 1, \ldots, n$,

$$I_i(\alpha) = \frac{|\alpha'X_i - \mathrm{med}_j(\alpha'X_j)|}{\mathrm{mad}_j(\alpha'X_j)}$$

    where “med” = median and “mad” = median absolute deviation of the projected data from the median.

  • Dispersion-based indices: robust dispersion estimator (Li and Chen, 1985, Croux and Ruiz-Gazen, 2005), defines a robust principal component analysis.
  • Juan and Prieto (2001): for concentrated contamination patterns.
  • Hall and Kay (2005): nonparametric atypicality index.
  • Indices adapted to time series, . . .

Many complementary definitions of indices but . . . the main problem with PP: the pursuit is computationally intensive.
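
For one fixed direction, the outlyingness measure above reduces to a few lines. The sketch below is an illustration under assumptions: the function name and the toy coordinates are hypothetical, and no guard is included for the degenerate case where the mad is zero.

```python
import statistics

def outlyingness(z):
    """Return |z_i - med(z)| / mad(z) for each projected observation z_i."""
    med = statistics.median(z)
    mad = statistics.median([abs(v - med) for v in z])
    return [abs(v - med) / mad for v in z]

# Toy projected coordinates; the last one is far from the bulk.
z = [0.1, -0.4, 0.3, 0.0, -0.2, 8.0]
scores = outlyingness(z)
```

The classical Stahel-Donoho outlyingness of an observation is the supremum of this quantity over directions, so in practice the function would be called for many candidate vectors α.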

SLIDE 24

Optimization procedures for PP: a review / Strategy for the first proposals

Strategy for the first proposals

Usually a complex structure needs several one-dimensional projections to be revealed ⇒ several interesting optima of the projection index.

The usual strategy:

  • Global optimization algorithm ⇒ one optimum projection.
  • Remove the structure found from the data set.
  • Iterate the procedure.

Example: for the kurtosis index, Peña and Prieto (2001) use a modified version of Newton’s optimization method and remove the structure by projecting the data onto the space orthogonal to the projection found. The procedure is iterated p times (the number of dimensions).
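
The structure-removal step in the example can be sketched as a projection onto the orthogonal complement of the direction found. This is a simplified illustration, not the actual procedure of Peña and Prieto (2001); the function name and the toy data are assumptions.

```python
def remove_direction(rows, alpha):
    """Project each observation onto the space orthogonal to the unit vector alpha."""
    deflated = []
    for r in rows:
        coord = sum(x * a for x, a in zip(r, alpha))        # alpha' X_i
        deflated.append([x - coord * a for x, a in zip(r, alpha)])
    return deflated

# Toy data in two dimensions; the direction found is the first axis.
X = [[3.0, 1.0], [-2.0, 0.5], [0.0, -1.0]]
alpha = [1.0, 0.0]
X_deflated = remove_direction(X, alpha)
```

After this step every observation has a zero coordinate along `alpha`, so the next optimization cannot return the same projection.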

SLIDE 25

Optimization procedures for PP: a review / Strategy for the first proposals

Strategy for the first proposals

Limitations:

  • Global optimization based on repeated local optimization is usually quite costly.
  • Local optimization algorithms rely on regularity conditions on the projection index.
  • Structure removal may miss some interesting projections (Huber, 1985, Friedman, 1987).

SLIDE 26

Optimization procedures for PP: a review / Other strategies

Other strategies

Finite number of projection vectors: replace the maximization problem

$$\max_{\{\alpha \in \mathbb{R}^p \mid \alpha'\alpha = 1\}} I(\alpha)$$

by

$$\max_{\{\alpha \in A \mid \alpha'\alpha = 1\}} I(\alpha)$$

where $A$ contains a finite number of directions, and calculate $I(\alpha)$ for all $\alpha \in A$.

Limitations: this strategy may miss interesting projections by not exploring the space of solutions enough.
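
A minimal sketch of the finite-set strategy follows. How the set A is built is not specified on the slide; drawing random unit vectors is an assumption made here for illustration, as are the function names, the simulated data and the choice of the kurtosis index.

```python
import math
import random

def kurtosis_index(rows, alpha):
    """Sum over observations of (alpha' X_i)^4."""
    return sum(sum(x * a for x, a in zip(r, alpha)) ** 4 for r in rows)

def random_unit_vector(p, rng):
    """Draw a direction uniformly on the unit sphere of R^p."""
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def best_in_finite_set(rows, index, directions):
    """Evaluate the index on every candidate direction and keep the maximizer."""
    return max(directions, key=lambda a: index(rows, a))

rng = random.Random(1)
X = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(100)]   # hypothetical data
A = [random_unit_vector(4, rng) for _ in range(200)]                # hypothetical finite set
alpha_star = best_in_finite_set(X, kurtosis_index, A)
```

The result can only be as good as the best element of `A`, which is the limitation stated above.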

SLIDE 29

New optimization proposals / A new strategy

A new strategy

Find several local optima directly:

  • Run local optimization algorithms several times.
  • Use heuristic algorithms for local optimization (no need for regularity conditions and better exploration of the space of solutions).
  • No need for global optimization and structure removal.

Difficulty: it is not easy to know the extent to which a new view reflects a similar or a different structure compared with the previous views (Friedman, in the discussion of Jones and Sibson, 1987) ⇒ exploratory tools are needed to analyze the different projections.
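
One very simple way to screen the directions returned by several runs is to compare them through the absolute value of their inner product, since α and −α give the same view up to sign. The sketch below is only an assumption about what such a tool could look like; the 0.95 threshold and the function names are hypothetical and are not taken from the talk.

```python
def abs_cosine(a, b):
    """|a'b| for two unit vectors; equals 1 when they define the same view."""
    return abs(sum(x * y for x, y in zip(a, b)))

def distinct_directions(directions, threshold=0.95):
    """Keep a direction only if it is not nearly collinear with one already kept."""
    kept = []
    for d in directions:
        if all(abs_cosine(d, k) < threshold for k in kept):
            kept.append(d)
    return kept

# Toy output of four runs: the second run repeats the first up to sign.
runs = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
views = distinct_directions(runs)
```

Such a filter only removes near-duplicates; it does not tell whether two different directions reveal the same structure, which is the difficulty raised above.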

SLIDE 31

New optimization proposals / Heuristics optimization algorithms

Heuristics optimization algorithms

Two families of heuristic optimization methods (Gilli and Winker, 2008):

  • the trajectory methods (e.g. simulated annealing or Tabu search), which consider one single solution at a time,
  • the population-based methods (e.g. genetic algorithms or Particle Swarm Optimization), which update a whole set of solutions simultaneously.

Focus on this second family of methods (exploration of the whole search space sometimes more efficient).

Main characteristics:

  • Heuristic optimization methods can tackle optimization problems that are not tractable with classical optimization tools.
  • They usually mimic some behavior found in nature.

Implementation of GA, PSO and Tribes (adaptive PSO algorithm).

SLIDE 32

New optimization proposals / Heuristics optimization algorithms

GA

Flowchart: Initialization → Evaluation → Selection → Crossover and Mutation → Stop? (yes: End; no: iterate again)

Model:

  • An individual represents a projection vector.
  • The fitness function is the projection index.

Steps:

  • Random initialization.
  • Tournament selection with 3 participants.
  • Probability of mutation = 0.05, probability of crossover = 0.65.
  • Stopping criterion: number of iterations.
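
The GA loop above can be sketched as follows. Only the tournament size (3), the mutation probability (0.05), the crossover probability (0.65) and the iteration-count stopping rule come from the slide. The crossover operator (a random convex blend of two parents), the mutation operator (Gaussian perturbation of a coordinate), the renormalization to unit length, the population size and the toy index are all assumptions made for illustration; this is not the EPP-Lab implementation.

```python
import math
import random

def unit(v):
    """Rescale v to unit length so that it is a valid projection vector."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def genetic_pursuit(index, p, pop_size=30, iterations=50, seed=0,
                    p_cross=0.65, p_mut=0.05):
    rng = random.Random(seed)
    # Random initialization: each individual is a unit projection vector.
    pop = [unit([rng.gauss(0, 1) for _ in range(p)]) for _ in range(pop_size)]
    best = max(pop, key=index)
    for _ in range(iterations):
        fitness = [index(a) for a in pop]            # evaluation

        def tournament():
            # Selection: best of 3 individuals drawn at random.
            contenders = rng.sample(range(pop_size), 3)
            return pop[max(contenders, key=lambda i: fitness[i])]

        children = []
        while len(children) < pop_size:
            a, b = tournament(), tournament()
            if rng.random() < p_cross:               # assumed blend crossover
                w = rng.random()
                child = [w * x + (1 - w) * y for x, y in zip(a, b)]
            else:
                child = list(a)
            # Assumed mutation: perturb each coordinate with probability p_mut.
            child = [c + rng.gauss(0, 0.3) if rng.random() < p_mut else c
                     for c in child]
            children.append(unit(child))
        pop = children
        champion = max(pop, key=index)
        if index(champion) > index(best):
            best = champion
    return best

def toy_index(alpha):
    """Hypothetical index, maximal when alpha is +/- the first axis."""
    return alpha[0] ** 2

best = genetic_pursuit(toy_index, p=3)
```

Keeping track of `best` across generations is a design choice of this sketch, so that the returned vector is never worse than the best individual of the initial population.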

SLIDE 39

New optimization proposals / Heuristics optimization algorithms

PSO

Particle Swarm Optimization: Kennedy and Eberhart (1995)

  • Stochastic method
  • Biological inspiration (fish schooling and bird flocking):
      • Each bird seems to move randomly.
      • The communication between birds is limited.
      • However, a swarm is able to find food.

SLIDE 40

New optimization proposals / Heuristics optimization algorithms

PSO: strategy of displacement of a particle

  • Generation of a swarm of particles.
  • A fitness is associated with each particle.
  • Particles move according to their own experience and that of the swarm.
  • Convergence made possible by the cooperation between particles.

SLIDE 41

New optimization proposals / Heuristics optimization algorithms

PSO Algorithm

Flowchart: Initialization → Update velocity → Update position → Update memory → Stop? (yes: End; no: iterate again)

Model:

  • A particle represents a projection vector.
  • The fitness function is the projection index.

Parameters of the particle $i$:

  • $\vec{x}_i$: position,
  • $\vec{v}_i$: velocity,
  • $\overrightarrow{pbest}_i$: best solution of the particle,
  • $\overrightarrow{gbest}$: best solution of the swarm.

SLIDE 42

New optimization proposals / Heuristics optimization algorithms

PSO Algorithm

  • Initialization: random initialization.
  • Update velocity, with the velocity equation:

$$\vec{v}_i \leftarrow \omega \cdot \vec{v}_i + c_1 \cdot \vec{r}_1 \otimes (\overrightarrow{pbest}_i - \vec{x}_i) + c_2 \cdot \vec{r}_2 \otimes (\overrightarrow{gbest} - \vec{x}_i)$$

    where $\omega$, $c_1$, $c_2$ are parameters and $\vec{r}_1$, $\vec{r}_2$ are random values.

SLIDE 44

New optimization proposals / Heuristics optimization algorithms

PSO Algorithm

  • Update position, with the position equation: $\vec{x}_i \leftarrow \vec{x}_i + \vec{v}_i$.
  • Update memory: $I$ is the projection index; update $\overrightarrow{pbest}_i$ and $\overrightarrow{gbest}$.
  • Stopping criterion: number of iterations.
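
Put together, the velocity update, position update and memory update give the following sketch. The talk does not give values for ω, c1 and c2, nor does it say how the constraint α′α = 1 is enforced: the values 0.72 and 1.49 are common textbook choices used here as assumptions, renormalizing each position after the move is also an assumption, and the toy index is hypothetical.

```python
import math
import random

def unit(v):
    """Rescale v to unit length so that it is a valid projection vector."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def pso_pursuit(index, p, n_particles=20, iterations=100, seed=0,
                omega=0.72, c1=1.49, c2=1.49):
    rng = random.Random(seed)
    # Random initialization of positions; velocities start at zero.
    x = [unit([rng.gauss(0, 1) for _ in range(p)]) for _ in range(n_particles)]
    v = [[0.0] * p for _ in range(n_particles)]
    pbest = [list(xi) for xi in x]
    pbest_val = [index(xi) for xi in x]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = list(pbest[g]), pbest_val[g]
    for _ in range(iterations):                      # stop: number of iterations
        for i in range(n_particles):
            for d in range(p):
                # Velocity equation, with fresh random values per coordinate.
                r1, r2 = rng.random(), rng.random()
                v[i][d] = (omega * v[i][d]
                           + c1 * r1 * (pbest[i][d] - x[i][d])
                           + c2 * r2 * (gbest[d] - x[i][d]))
            # Position equation, then renormalization (assumption).
            x[i] = unit([x[i][d] + v[i][d] for d in range(p)])
            # Memory update: pbest_i and gbest.
            val = index(x[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = list(x[i]), val
                if val > gbest_val:
                    gbest, gbest_val = list(x[i]), val
    return gbest, gbest_val

def toy_index(alpha):
    """Hypothetical index, maximal when alpha is +/- the first axis."""
    return alpha[0] ** 2

gbest, gbest_val = pso_pursuit(toy_index, 3)
```

Because `gbest` is only replaced by strictly better positions, the returned value is at least the best index value in the initial swarm.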

SLIDE 48

New optimization proposals / Heuristics optimization algorithms

Tribes method

GA and PSO are compared in Berro et al. (2010). Results are quite similar, but both have several parameters to tune.

Tribes (Clerc, 2006) is the first parameter-free particle swarm optimization algorithm. It is an adaptive algorithm.

Principle:

  • Swarm divided into “tribes”.
  • At the beginning, the swarm is composed of only one particle.
  • According to the tribes’ behaviors, particles are added to or removed from the tribes.
  • According to the performance of the particles, their displacement strategies are adapted.

SLIDE 49

New optimization proposals / Heuristics optimization algorithms

The Tribes method

Tribes has been compared with the usual PSO in Larabi et al. (2010). Tribes is more interesting in the EPP context for two reasons:

  • No parameter to set except the stopping criterion.
  • It converges very quickly to local optima.

SLIDE 52

Illustration with EPP-Lab

The interface EPP-Lab

Implemented in Java (A. Berro, S. Larabi, E. Chabbert, I. Griffith).

Contents:

  • Indices for outliers detection: Friedman-Tukey, Friedman and kurtosis.
  • Optimization algorithms: GA, PSO and Tribes.

EPP-Lab in action:

  • First step in the analysis: the pursuit (may be time consuming, but no need for the statistician to be in front of the computer, unlike GGobi).
  • Second step: from the projections saved in the first step, analysis of the results (immediate).

SLIDE 54

Illustration with EPP-Lab

The interface EPP-Lab in action

The file don3.txt is 200 × 8: the first 190 observations follow a $N(0, I_8)$ distribution and the last 10 follow a $N((10, 0, \ldots, 0)', I_8)$ distribution. Observations 191 to 200 are outlying.
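
A data set with the same design can be simulated as below. This is not the actual don3.txt file, only a hypothetical reconstruction from the description above; the function name and the seed are assumptions.

```python
import random

def simulate_contaminated(n_clean=190, n_out=10, p=8, shift=10.0, seed=0):
    """Simulate n_clean N(0, I_p) rows followed by n_out rows shifted on the first axis."""
    rng = random.Random(seed)
    rows = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n_clean)]
    for _ in range(n_out):
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        row[0] += shift            # mean vector (10, 0, ..., 0)'
        rows.append(row)
    return rows

X = simulate_contaminated()
first_coord = [r[0] for r in X]
```

By construction the outliers differ from the bulk only along the first variable, so a pursuit that works well should return a direction close to the first axis.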

SLIDE 63

Conclusion and perspectives

Conclusion

  • Heuristic algorithms are interesting in the context of EPP, with a new strategy based on local optimization.
  • Possibility of using parameter-free algorithms such as Tribes.

Perspectives

  • Interface EPP-Lab to improve:
      • study and implement several stopping criteria,
      • implement the Stahel-Donoho index,
      • implement robust selection.
  • Develop and implement statistical tools to summarize the different projections (clustering of variables, principal components analysis, sum of projectors, . . . ).
  • Introduce multiobjective optimization in order to deal with spatial data sets.
  • . . .

SLIDE 64

Conclusion and perspectives

Bibliography I

  • Berro, A., Larabi Marie-Sainte, S. and Ruiz-Gazen, A. Genetic Algorithms and Particle Swarm Optimization for Exploratory Projection Pursuit. Annals of Mathematics and Artificial Intelligence, to appear, 2010.
  • Larabi Marie-Sainte, S., Berro, A. and Ruiz-Gazen, A. An Efficient Optimization Method for Revealing Local Optima of Projection Pursuit Indices. Ants 2010, Seventh International Conference on Swarm Intelligence, 2010.
  • Caussinus, H. and Ruiz-Gazen, A. Exploratory Projection Pursuit. In Data Analysis (Digital Signal and Image Processing series), Ed. G. Govaert, ISTE, 2009.

Thank you for your attention!