

  1. 7th International Conference on Advanced Data Mining and Applications
An Algorithm for Sample and Data Dimensionality Reduction Using Fast Simulated Annealing
Szymon Łukasik, Piotr Kulczycki
Department of Automatic Control and IT, Cracow University of Technology; Systems Research Institute, Polish Academy of Sciences

  2. Motivation
• It is estimated ("How Much Information?" project, Univ. of California, Berkeley) that 1 million terabytes of data is generated annually worldwide, with 99.997% of it available only in digital form.
• It is commonly agreed that our ability to analyze new data grows at a much slower pace than our capacity to collect and store it.
• When examining huge data samples one faces both technical difficulties and the methodological obstacles of high-dimensional data analysis, known as the "curse of dimensionality".

  3. Curse of dimensionality - example
Source: K. Beyer et al., "When Is 'Nearest Neighbor' Meaningful?", Proc. ICDT, 1999.
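The distance-concentration effect described by Beyer et al. can be reproduced in a few lines. The snippet below is an illustrative sketch (not taken from the slides): it shows how the relative contrast between the nearest and the farthest neighbor of a query point collapses as the number of dimensions grows.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points = 1000
for dim in (2, 10, 100, 1000):
    data = rng.random((n_points, dim))                  # uniform random points
    dist = np.linalg.norm(data[1:] - data[0], axis=1)   # distances to one query point
    contrast = (dist.max() - dist.min()) / dist.min()   # relative nearest/farthest contrast
    print(f"dim={dim:5d}  relative contrast={contrast:.3f}")
```

As the dimension grows the printed contrast tends towards zero, which is why nearest-neighbor queries lose their meaning in high-dimensional spaces.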

  4. Scope of our research
• We have developed a universal unsupervised data dimensionality reduction technique, in some aspects similar to Principal Component Analysis (it is linear) and Multidimensional Scaling (it is distance-preserving). Moreover, we try to reduce the data sample length at the same time.
• Establishing the exact form of the transformation matrix is treated as a continuous optimization problem and solved by Parallel Fast Simulated Annealing.
• The algorithm is intended to be used in conjunction with various data mining procedures, e.g. outlier detection, cluster analysis, and classification.

  5. General description of the algorithm
• Data dimensionality reduction is realized via the linear transformation X = B V, where V denotes the initial data set (of size o × n), B the transformation matrix (O × o), and X the transformed data matrix (O × n).
• The transformation matrix is obtained using Parallel FSA. The cost function h(B) which is minimized is given by the raw Stress

$$h(B) = \sum_{j=1}^{n} \sum_{k=j+1}^{n} \bigl( \lVert x_j(B) - x_k(B) \rVert - \lVert v_j - v_k \rVert \bigr)^2,$$

with B being a solution of the optimization problem, and v_j, v_k, x_j(B), x_k(B) representing data instances in the initial and the reduced feature space, respectively.
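To make the notation concrete, the following sketch (Python/NumPy, assuming Euclidean distances and the column-per-sample layout described above; not the authors' implementation) evaluates the raw Stress for a candidate transformation matrix B.

```python
import numpy as np

def raw_stress(B, V):
    """Raw Stress h(B): sum of squared differences between pairwise
    distances in the reduced space X = B V and in the initial space V."""
    X = B @ V                                   # transformed data, shape (O, n)
    n = V.shape[1]
    h = 0.0
    for j in range(n):
        for k in range(j + 1, n):
            d_reduced = np.linalg.norm(X[:, j] - X[:, k])
            d_initial = np.linalg.norm(V[:, j] - V[:, k])
            h += (d_reduced - d_initial) ** 2
    return h

# example: project a random 5-dimensional sample of 50 points down to 2 dimensions
V = np.random.rand(5, 50)
B = np.random.rand(2, 5)
print(raw_stress(B, V))
```

Note that a single evaluation already costs on the order of n² · o operations, which is the computational burden mentioned in the conclusions.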

  6. FSA neighbor generation strategy
[Figure: two scatter plots of generated neighbors in the (a1, a2) plane, both axes spanning -20 to 20.]
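The plots illustrate how candidate solutions are perturbed. One common realization of FSA neighbor generation, following the n-dimensional Cauchy scheme of Nam et al. [3], draws heavy-tailed steps whose scale shrinks with the temperature; the sketch below is such a generic version, not necessarily the exact variant used here.

```python
import numpy as np

def fsa_neighbor(current, temperature, rng=np.random.default_rng()):
    """Perturb the current solution with an isotropic multivariate Cauchy step.

    A multivariate Cauchy variable can be sampled as a Gaussian vector divided
    by the absolute value of an independent standard normal; scaling the step
    by the temperature gives wide exploration early on and local moves later."""
    direction = rng.standard_normal(current.shape)
    radius = abs(rng.standard_normal())
    return current + temperature * direction / radius
```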

  7. FSA temperature and termination criterion
• The initial temperature U(0) is determined through a set of pilot runs consisting of l_Q positive (i.e. cost-increasing) transitions from the starting solution. It is supposed to guarantee a predetermined initial acceptance probability Q(0) of worse solutions, resulting from the Metropolis rule.
• The initial solution is obtained using the feature selection algorithm introduced by Pal & Mitra in 2004. It is based on feature space clustering, with similar features forming distinctive clusters; the maximal information compression index is used as the measure of similarity. The partition itself is performed using the k-nearest neighbor rule (here k = o − O is used).
• The termination criterion is either executing an assumed number of iterations or fulfilling a customized condition based on the estimator of the global minimum employing order statistics, proposed recently for a class of stochastic random search algorithms by Bartkute and Sakalauskas (2009).
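A common way to realize the pilot-run calibration described in the first bullet (an assumption on our part; the slide does not spell out the formula) is to collect l_Q cost-increasing transitions and invert the Metropolis acceptance rule so that their average acceptance probability equals Q(0):

```python
import numpy as np

def initial_temperature(start, cost, neighbor, l_Q=100, Q0=0.9):
    """Estimate U(0) from l_Q positive (cost-increasing) pilot transitions.

    Under the Metropolis rule a worse solution is accepted with probability
    exp(-delta_h / U); choosing U(0) = -mean(delta_h) / ln(Q0) makes the mean
    acceptance probability of the collected worsening moves roughly Q0.
    `neighbor` is any callable returning a random neighbor of its argument."""
    h_start = cost(start)
    deltas = []
    while len(deltas) < l_Q:
        delta = cost(neighbor(start)) - h_start
        if delta > 0:                       # keep only transitions to worse solutions
            deltas.append(delta)
    return -np.mean(deltas) / np.log(Q0)
```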

  8. FSA parallelization
[Diagram: starting from the current global solution, each of the n_cores cores generates its own neighbor (Neighbor 1 … Neighbor n_cores) and performs an FSA step on it (Current 1 … Current n_cores); the new global current solution is then either the best improving candidate or a random non-improving one.]
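Read as a procedure, the scheme amounts to the following synchronization step (a sketch only; accepting the random non-improving candidate with the Metropolis probability is our assumption, since the slide only says "random non-improving solution"):

```python
import numpy as np

def parallel_fsa_step(current, cost, neighbor, temperature, n_cores,
                      rng=np.random.default_rng()):
    """One global update of the parallel FSA scheme sketched above.

    In the actual algorithm each candidate is generated and evaluated on a
    separate core; the list comprehensions stand in for that parallel part."""
    candidates = [neighbor(current) for _ in range(n_cores)]
    costs = [cost(c) for c in candidates]
    h_current = cost(current)

    improving = [i for i, h in enumerate(costs) if h < h_current]
    if improving:
        best = min(improving, key=lambda i: costs[i])   # best improving neighbor wins
        return candidates[best]

    i = int(rng.integers(n_cores))                      # random non-improving neighbor
    if rng.random() < np.exp(-(costs[i] - h_current) / temperature):
        return candidates[i]                            # Metropolis acceptance (assumption)
    return current
```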

  9. Sample size reduction
• For each sample element u_i a positive weight q_i is assigned. It incorporates information about the relative deformation of the element's distances to other sample points. Data elements with higher weights can then be treated as more adequate. The weights are normalized to fulfill Σ_j q_j = 1.
• Consequently, the weights can be used to improve the performance of data mining procedures, e.g. by introducing them into the definition of classic data mining algorithms (such as k-means or k-nearest neighbors).
• Alternatively, one can use the weights to eliminate some data elements from the sample. This is performed by removing from the sample the data elements whose weights fulfill the condition q_i < P, where P ∈ [0, 1], and then normalizing the remaining weights. In this way one achieves simultaneous dimensionality and sample length reduction, with P serving as a data compression ratio.
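The slide does not give the exact weight formula, only that q_i reflects the relative deformation of element i's distances after the transformation. The sketch below therefore uses one plausible choice (the mean relative change of an element's distances to all other points), followed by the normalization and the P-thresholding step described above; the names and the deformation measure are illustrative assumptions.

```python
import numpy as np

def sample_weights_and_reduction(V, X, P=0.1):
    """Illustrative weights q_i and P-thresholded sample reduction.

    V : (o, n) initial data, X : (O, n) reduced data, columns are samples.
    The deformation measure below is an assumption; the slides only state
    that weights reflect relative distance deformation."""
    n = V.shape[1]
    deformation = np.zeros(n)
    for i in range(n):
        d_init = np.linalg.norm(V - V[:, [i]], axis=0)   # distances in initial space
        d_red = np.linalg.norm(X - X[:, [i]], axis=0)    # distances in reduced space
        others = np.arange(n) != i
        deformation[i] = np.mean(np.abs(d_red[others] - d_init[others])
                                 / np.maximum(d_init[others], 1e-12))
    q = 1.0 / (1.0 + deformation)      # smaller deformation -> higher weight
    q /= q.sum()                       # normalize so that sum_j q_j = 1
    keep = q >= P                      # remove elements with q_i < P
    q_kept = q[keep] / q[keep].sum()   # renormalize the remaining weights
    return q, keep, q_kept
```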

  10. Experimental evaluation
• We have examined the performance of the algorithm by measuring the accuracy of outlier detection J_o (for artificially generated datasets), clustering J_c and classification J_k (for selected benchmark instances from the UCI ML repository).
• Outlier detection was performed using nonparametric statistical kernel density estimation. Using randomly generated datasets allowed us to designate the actual outliers.
• Clustering accuracy was measured by the Rand index (in reference to class labels); clustering itself was implemented via the classic k-means algorithm.
• Classification accuracy (for the nearest-neighbor classifier) was measured by the average classification correctness obtained during a 5-fold cross-validation procedure.
• Each test consisted of 30 runs; we report the mean and standard deviation of the above-mentioned indices. We compared our approach to PCA and Evolutionary Algorithms-based Feature Selection (by Saxena et al.).
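For reference, the Rand index used as the clustering accuracy measure compares every pair of samples in the two labelings; a minimal sketch:

```python
from itertools import combinations

def rand_index(labels_true, labels_pred):
    """Fraction of sample pairs on which both labelings agree: the pair is
    either grouped together in both, or separated in both."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum((labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
                for i, j in pairs)
    return agree / len(pairs)
```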

  11. Example: seeds dataset (7D → 2D)
[Figure: 2D scatter plots of the reduced seeds dataset, one obtained with our approach and one with PCA.]

  12. More details – classification

                              glass     wine      WBC       vehicle    seeds
                              9D → 4D   13D → 5D  9D → 4D   18D → 10D  7D → 2D
  Initial data (no reduction)
  J_k^INIT                    71.90     74.57     95.88     63.37      90.23
  ±σ(J_k^INIT)                ±8.10     ±5.29     ±1.35     ±3.34      ±2.85
  Our approach (P = 0.1)
  J_k^RED                     70.48     78.00     95.95     63.96      89.76
  ±σ(J_k^RED)                 ±7.02     ±4.86     ±1.43     ±2.66      ±3.18
  PCA
  J_k^RED                     58.33     72.00     95.29     62.24      83.09
  ±σ(J_k^RED)                 ±6.37     ±7.22     ±2.06     ±3.84      ±7.31
  EA-based Feature Selection
  J_k^RED                     64.80     72.82     95.10     60.86      not tested
  ±σ(J_k^RED)                 ±4.43     ±1.02     ±0.80     ±1.51      –

  13. More details – cluster analysis

                              glass     wine      WBC       vehicle    seeds
                              9D → 4D   13D → 5D  9D → 4D   18D → 10D  7D → 2D
  Initial data (no reduction)
  J_c^INIT                    68.23     93.48     66.23     64.18      91.06
  Our approach (P = 0.2)
  J_c^RED                     68.43     92.81     66.29     64.62      89.59
  ±σ(J_c^RED)                 ±0.62     ±0.76     ±0.62     ±0.24      ±1.57
  PCA
  J_c^RED                     67.71     92.64     66.16     64.16      88.95

  14. Conclusion
• The algorithm was tested on numerous instances of outlier detection, cluster analysis and classification problems and was found to offer promising performance. It provides accurate distance preservation while also allowing out-of-sample extension.
• Drawbacks? It is not designed for huge datasets (due to the significant computational cost of evaluating the cost function) and should not be used when only a single data analysis task needs to be performed.
• What can be done in the future? We observed that taking into account the topological deformation of the dataset in the reduced feature space (via the proposed weighting scheme) brings positive results in various data mining procedures; this can easily be extended to other DR techniques. The proposed approach could make algorithms that are particularly prone to the 'curse of dimensionality' practically usable (we have examined this in the case of KDE).

  15. Thank you for your attention!

  16. Short bibliography
1. H. Szu, R. Hartley: "Fast simulated annealing", Physics Letters A, vol. 122/3-4, 1987.
2. L. Ingber: "Adaptive simulated annealing (ASA): Lessons learned", Control and Cybernetics, vol. 25/1, 1996.
3. D. Nam, J.-S. Lee, C. H. Park: "N-dimensional Cauchy neighbor generation for the fast simulated annealing", IEICE Transactions on Information and Systems, vol. E87-D/11, 2004.
4. S.K. Pal, P. Mitra: "Pattern Recognition Algorithms for Data Mining", Chapman and Hall, 2004.
5. V. Bartkute, L. Sakalauskas: "Statistical Inferences for Termination of Markov Type Random Search Algorithms", Journal of Optimization Theory and Applications, vol. 141/3, 2009.
6. P. Kulczycki: "Kernel Estimators in Industrial Applications", in: Soft Computing Applications in Industry, B. Prasad (ed.), Springer-Verlag, 2008.
7. A. Saxena, N.R. Pal, M. Vora: "Evolutionary methods for unsupervised feature selection using Sammon's stress function", Fuzzy Information and Engineering, vol. 2, 2010.
