Identification of local multivariate outliers

  1. Identification of local multivariate outliers Anne Ruiz-Gazen and Christine Thomas-Agnan Gremaq, TSE and IMT Toulouse, France (in collab. with Peter Filzmoser) SSIAB - Avignon - 11/05/12 A. Ruiz-Gazen & C. Thomas-Agnan (TSE) Local multivariate outliers SSIAB - Avignon - 11/05/12 1 / 24

  4. Introduction. In robust statistics, an observation is considered outlying if it differs from the main bulk of the data set: $F_\varepsilon = (1 - \varepsilon) F + \varepsilon G$. In the case of continuous attributes, the main bulk of the data set is assumed to follow an elliptical distribution $F$ (e.g. Gaussian) and the outlying observations to follow a distribution $G$ (e.g. a point mass). Objective: identify/detect gross errors and atypical observations, taking into account the multivariate and the spatial nature of the data.
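
The mixture $F_\varepsilon = (1 - \varepsilon) F + \varepsilon G$ can be simulated directly. The following is a minimal sketch, assuming that $F$ is a standard Gaussian and $G$ a point mass at 10. The function name `sample_contaminated` and all parameter values are illustrative choices, not taken from the presentation.

```python
import random


def sample_contaminated(n, eps, outlier_value=10.0, seed=0):
    """Draw n values from F_eps = (1 - eps) * F + eps * G.

    Assumptions for illustration only: F is the standard Gaussian N(0, 1)
    and G is a point mass at `outlier_value`.
    """
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        if rng.random() < eps:
            sample.append(outlier_value)        # observation drawn from G
        else:
            sample.append(rng.gauss(0.0, 1.0))  # observation drawn from F
    return sample


data = sample_contaminated(n=1000, eps=0.05)
n_outliers = sum(1 for v in data if v == 10.0)
print(f"{n_outliers} of {len(data)} observations come from G")
```

With `eps = 0.05`, roughly 5% of the sample is expected to come from $G$; the exact count depends on the seed.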

  6. Introduction. [Figure: map of the Kola project area covering parts of Norway, Finland and Russia, between the Barents Sea and the White Sea, with a 0 to 200 km scale bar. Legend: mine, in production; mine, closed down; important mineral occurrence, not developed; smelter, production of mineral concentrate; city, town, settlement; project boundary.] The Kola project: concentration measures for more than 50 chemical elements in four layers and 617 observations. Data available in the R package mvoutlier by M. Gschwandtner and P. Filzmoser.

  7. Outline:
     1. Detection of outliers in a non-spatial context: detection of univariate outliers; detection of multivariate outliers.
     2. Spatial outliers: global and local outliers; identification of univariate spatial outliers.
     3. Identification of multivariate spatial outliers: variocloud of pairwise Mahalanobis distances; toy example; quantile geographical-variate plot.

  12. Detection of outliers in a non-spatial context: detection of univariate outliers. Let us consider an $n \times p$ data set $x$ with $n$ observations $x_i$ and $p$ variables. In one dimension ($p = 1$), the detection of outliers is often based on $\frac{|x_i - \bar{x}|}{\sigma_x}$ (Grubbs, 1969). Problem of masking effect: outliers may spoil the empirical mean and standard deviation estimators in such a way that the outliers are not detected. Robust version: $\bar{x}$ and $\sigma_x$ are replaced by robust estimators such as the median and the MAD.
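
The masking effect and its robust remedy can be illustrated on a small made-up sample. This is a sketch under stated assumptions: the data, the cutoff of 2.5 and the consistency factor 1.4826 (which makes the MAD comparable to a standard deviation under a Gaussian model) are illustrative and not taken from the presentation.

```python
import statistics


def classical_scores(x):
    """|x_i - mean| / sd, using the empirical mean and standard deviation."""
    mean = statistics.fmean(x)
    sd = statistics.stdev(x)
    return [abs(v - mean) / sd for v in x]


def robust_scores(x):
    """|x_i - median| / MAD, the robust counterpart of the score above."""
    med = statistics.median(x)
    # Assumption: 1.4826 rescales the MAD to match sigma for Gaussian data.
    mad = 1.4826 * statistics.median([abs(v - med) for v in x])
    return [abs(v - med) / mad for v in x]


# Made-up sample: eight values near 10 plus two gross errors.
x = [9.8, 10.1, 9.9, 10.2, 10.0, 9.7, 10.3, 10.0, 50.0, 52.0]
cutoff = 2.5  # illustrative threshold, not from the slides

classical_flags = [s > cutoff for s in classical_scores(x)]
robust_flags = [s > cutoff for s in robust_scores(x)]

print("classical:", classical_flags)
print("robust:   ", robust_flags)
```

In this sample the two gross errors inflate the mean and the standard deviation enough that neither exceeds the cutoff with the classical score, whereas the median/MAD version flags exactly those two observations.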
