Sensitivity Analysis of Feature Selection for Object Based Urban Classification


  1. Sensitivity Analysis of Feature Selection for Object Based Urban Classification. Stefanos Georganos a, Michal Shimoni b, Taïs Grippa a, Sabine Vanhuysse a, Moritz Lennert a, Éléonore Wolff a. a Department of Geosciences, Environment and Society (DGES), Institute for Environmental Management and Land-Use Planning (IGEAT), Université libre de Bruxelles, 1050 Bruxelles, Belgium. b Signal and Image Center, Royal Military Academy, 1000 Bruxelles, Belgium. IV JIAAIS Workshop, UFAL-Maceió, Brazil; 10-15 November 2017

  2. Introduction  Significant increase in the acquisition of very-high-resolution (VHR) satellite data from Earth Observation (EO) missions.  Land Use / Land Cover (LULC) maps can be produced at unprecedented resolutions in urban areas.  Significant change in the image processing approach: from pixel-based to object-based classification.

  3. Introduction  Object-based Image Analysis (OBIA) principles: three steps, namely feature extraction, segmentation and classification.  Several features can be computed, of geometrical, textural, spectral and contextual nature.

  4. Pleiades image - Ouagadougou  Capital of Burkina Faso.  Area: 625 km²  Image size: 12 × 10⁶ pixels  Spatial resolution: 0.5 m  Spectral resolution: 4 bands (RGB + NIR)  Tri-stereo option: DEM

  5. Feature extraction  Optical: VNIR (4 bands)  Morphological statistics: area, perimeter, compactness (circle and square), fractal dimension  Spectral statistics: optical indices (NDVI, NDWI); measures: min, max, range, mean, stddev, variance, sum, 1st quantile, median, 3rd quantile, coefficient of variation  DEM: height (nDSM)  Total: 169 features!
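The spectral indices and per-object statistics listed above can be sketched in a few lines of Python. This is a minimal illustration with NumPy, not the actual GRASS GIS processing chain; the band arrays and value ranges are invented for the example.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)

def ndwi(green, nir):
    """Normalized Difference Water Index: (Green - NIR) / (Green + NIR)."""
    return (green - nir) / (green + nir)

def object_stats(values):
    """Aggregate per-object measures over the pixels of one segment."""
    values = np.asarray(values, dtype=float)
    return {
        "min": values.min(), "max": values.max(),
        "range": values.max() - values.min(),
        "mean": values.mean(), "stddev": values.std(),
        "median": np.median(values),
        "q1": np.percentile(values, 25), "q3": np.percentile(values, 75),
    }
```

In the OBIA chain, statistics such as these would be computed per segment for each band and index, which is how the feature count grows to 169.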

  6. i.segment cost (figure slide)

  7. Contextual framework  The total number of features that can be computed often exceeds several hundred.  Highly dimensional datasets can have important effects on a classifier's performance: reduced accuracy due to the Hughes effect; increased training time; overfitting from noisy/irrelevant features; complex models that are difficult to interpret and transfer.  In large-scale urban analyses, other effects that have not been systematically examined in the literature are storage space and the processing time needed to compute all these features in an OBIA framework.  Feature Selection (FS) algorithms serve to reduce the number of predictors for each classifier.

  8. Aim and Objectives  To investigate the effect of Feature Selection techniques of various types and complexities on several state-of-the-art classifiers, in terms of classification accuracy and number of features.  To propose a new metric for model selection that quantifies model parsimony, storage requirements, processing time and prediction accuracy.

  9. Methodology – Object Based Image Analysis (OBIA)  Semi-automated processing chain for OBIA LULC mapping: Jupyter notebook, GRASS GIS functions, Python, R.

  10. Methodology – Classification Algorithms  Support Vector Machines (SVM)  Random Forest (RF)  K-Nearest Neighbors (KNN)  Recursive Partitioning (RPART)  Extreme Gradient Boosting (Xgboost)
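To make one of the listed algorithms concrete, here is a from-scratch sketch of a k-nearest-neighbor classifier. It is illustrative only; in a processing chain like the one above, these classifiers would come from standard ML libraries, and the toy data below is invented.

```python
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training samples
    (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

With 169 candidate features, each extra feature adds a dimension to this distance computation, which is one way to see why distance-based classifiers such as KNN are sensitive to irrelevant features.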

  11. Classifiers  Linear Support Vector Machines (SVM), adapted from Burges (1998); Random Forest (RF), adapted from Belgiu et al. (2016). (figure slide)

  12. Extreme Gradient Boosting (Xgboost)  For a given dataset with n examples and m features, D = {(xᵢ, yᵢ)}, xᵢ ∈ ℝᵐ, yᵢ ∈ ℝ, the tree ensemble model uses K additive functions to predict the output: ŷᵢ = φ(xᵢ) = Σₖ₌₁ᴷ fₖ(xᵢ), fₖ ∈ ℱ, where ℱ = {f(x) = w_q(x)} (q : ℝᵐ → T, w ∈ ℝᵀ) is the space of regression trees.  A structure score evaluates each candidate tree: the smaller the score, the better the structure.
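The additive model above (prediction = sum of K tree outputs) can be illustrated with a minimal sketch; this is not the actual XGBoost implementation, and the stump splits and leaf weights are invented for the example.

```python
def stump(feature_idx, threshold, left, right):
    """Return a one-split regression tree f: x -> leaf weight."""
    return lambda x: left if x[feature_idx] < threshold else right

# An ensemble of K = 2 additive tree functions f_k.
trees = [
    stump(0, 0.5, -0.1, 0.2),
    stump(1, 1.0, 0.3, -0.05),
]

def predict(x, trees):
    """y_hat = phi(x) = sum over the K additive tree functions f_k(x)."""
    return sum(f(x) for f in trees)
```

In boosting, each new tree is fitted to improve the ensemble's loss, and a structure score of this kind is what guides the choice of split.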

  13. Methodology – Feature Selection Algorithms  Filters: Correlation-Based Feature Selection (CFS). Merit of a subset S of k features: Merit_S = k·r̄_cf / √(k + k(k−1)·r̄_ff), where r̄_cf is the mean feature-class correlation and r̄_ff the mean feature-feature correlation. Subset ranking that maximizes correlation with the dependent variable while minimizing correlation among the independent variables.  Embedded: Mean Decrease in Accuracy (MDA). For each tree, the out-of-bag (OOB) accuracy is compared before and after permuting a feature; the importance of a feature is the mean decrease in accuracy over all trees. Built-in feature evaluation in decision trees that performs FS while training.  Wrappers: Recursive Feature Elimination (RFE); Variable Selection Using Random Forest (VSURF). Computationally intensive methods that build several models to evaluate features.
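The CFS merit score above is easy to compute once the mean correlations are known. A minimal sketch, with illustrative correlation values (not from the study):

```python
import math

def cfs_merit(k, r_cf, r_ff):
    """CFS merit of a k-feature subset: k*r_cf / sqrt(k + k*(k-1)*r_ff),
    where r_cf is the mean feature-class correlation and r_ff the mean
    feature-feature correlation."""
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)
```

The denominator grows with inter-feature redundancy r_ff, so a subset of mutually correlated features scores lower than an equally predictive but less redundant one, which is exactly the behavior CFS exploits.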

  14. Methodology – Training and Test sets

  LULC class                 | Training set size | Test set size
  Buildings                  | 157               | 157
  Swimming Pools             | 68                | 69
  Asphalt road               | 56                | 56
  Brown/Red Bare Soil        | 90                | 91
  White/Grey Bare Soil       | 72                | 72
  Trees                      | 77                | 77
  Mixed Bare Soil/Vegetation | 76                | 75
  Dry Vegetation             | 70                | 70
  Other Vegetation           | 139               | 141
  Inland Waters              | 75                | 75
  Shadow                     | 58                | 59

  15. Methodology – Classification Optimization Score (COS)  Classification Optimization Score (COS): COS = (1 + a²) · (NF_n · OA_n) / (a² · NF_n + OA_n)  where NF_n is the normalized value of the number of features of a classification model, OA_n is the normalized overall classification accuracy across all classifiers and feature selection methods, and a is the weight factor.
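The COS formula above is an F-measure-style combination of the two normalized terms. A minimal sketch; the example values, and the assumption that NF_n is normalized so that more parsimonious models score higher, are illustrative:

```python
def cos_score(nf_n, oa_n, a=1.0):
    """Classification Optimization Score: F-measure-style combination of
    the normalized feature-count term nf_n and the normalized overall
    accuracy oa_n, weighted by a."""
    return (1 + a**2) * (nf_n * oa_n) / (a**2 * nf_n + oa_n)
```

With a = 1 this reduces to the harmonic mean of the two terms, so a model must do well on both parsimony and accuracy to score highly.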

  16. Results – Support Vector Machine (figure slide)

  17. Results – Extreme Gradient Boosting (figure slide)

  18. Results – Random Forest (figure slide)

  19. Results – Recursive Partitioning (figure slide)

  20. Results – k Nearest Neighbors (figure slide)

  21. Results – Overall Accuracy (number of selected features in parentheses)

  Algorithm | CFS        | MDA        | RFE        | VSURF      | All Features
  Xgboost   | 79.2 (42)  | 79.1 (136) | 79.8* (23) | 79.5* (31) | 77.8 (169)
  RF        | 78.5 (117) | 78.6 (84)  | 78.9 (22)  | 78.1 (38)  | 77.7 (169)
  SVM       | 80.1* (37) | 78.8 (111) | 79.2 (32)  | 79.7* (75) | 78.1 (169)
  KNN       | 78.2* (51) | 76.2* (75) | 78.0* (42) | 77.9* (20) | 74.0 (169)
  Rpart     | 69.5 (53)  | 69.4 (123) | 69.6 (34)  | 70.1 (49)  | 69.4 (169)

  22. Results – Classification Optimization Score

  Classification Model | COS   | Overall Accuracy | Number of Features
  RFE_XGBOOST          | 0.982 | 0.798            | 23
  CFS_SVM              | 0.980 | 0.801            | 38
  CFS_SVM              | 0.975 | 0.800            | 37
  CFS_SVM              | 0.975 | 0.801            | 39
  RFE_XGBOOST          | 0.972 | 0.792            | 22
  VSURF_XGBOOST        | 0.971 | 0.795            | 31
  RFE_XGBOOST          | 0.970 | 0.792            | 25
  CFS_SVM              | 0.970 | 0.797            | 36
  CFS_SVM              | 0.970 | 0.799            | 40
  VSURF_SVM            | 0.969 | 0.789            | 20

  23. Results – Classification Optimization Score  The number of input features in a classifier can serve as a robust surrogate for computational time, model complexity, data storage and processing.

  24. Results – Classification  Classification results from the highest-ranked model of the COS metric (Xgboost, 23 features) for different regions in Ouagadougou: a) industrial, b) planned residential and c) unplanned residential neighborhoods. (figure slide)

  25. Conclusions  The selection of an appropriate FS method can have a crucial impact on the performance of an ML classifier.  The added value of using a wide set of features might not manifest if no FS, or an inappropriate one, is performed, as in the case of SVM, KNN and Xgboost.  MDA has difficulty detecting discriminant features in heavily redundant datasets.  CFS was the best all-around FS method.

  26. Discussion  RFE and VSURF performed better for Classification and Regression Tree (CART) based classifiers.  In terms of pure classification accuracy, SVM and Xgboost were the best performing classifiers.  Using the proposed COS index allowed parsimonious model selection.  The COS metric is a more intuitive evaluation measure than other, solely accuracy-based metrics such as Overall Accuracy or the Kappa index.  It is of particular benefit in large-scale urban classifications, where the computational burden and complexity of the models are critical factors.

  27. Thank you!  REACT Project: http://www.react.ulb.be  MAUPP Project: http://www.MAUPP.ulb.be
