Dimension Reduction Techniques
Presented by Jie (Jerry) Yu
Outline
- Problem Modeling
- Review of PCA and MDS
- Isomap
- Locally Linear Embedding (LLE)
- Charting
Background
Advances in data collection and storage capacities have led to information overload in many fields.
Traditional statistical methods often break down because of the increase in the number of variables in each observation, that is, the dimension of the data.
One of the most challenging problems is to reduce the dimension of the original data.
Problem Modeling
- Original high-dimensional data: X = (x_1, …, x_p)^T, a p-dimensional multivariate random vector.
- Underlying/intrinsic low-dimensional data: Y = (y_1, …, y_k)^T, a k-dimensional multivariate random vector with k ≪ p.
- The mean and covariance: µ = E(X) = (µ_1, …, µ_p)^T and Σ_x = E{(X − µ)(X − µ)^T}.
- Problems: 1) find the appropriate mapping that best captures the most important features in low dimension, and 2) find the appropriate k that best describes the data in low dimension.
State-of-the-art Techniques
Dimension reduction techniques can be categorized into two major classes: linear and non-linear.
Non-linear methods: Multidimensional Scaling (MDS), Principal Curves, Self-Organizing Maps (SOM), Neural Networks, Isomap, Locally Linear Embedding (LLE), and Charting.
Linear methods: Principal Component Analysis (PCA), Factor Analysis, Projection Pursuit, and Independent Component Analysis (ICA).
Principal Component Analysis (PCA)
In essence, PCA tries to reduce the data dimension by finding a few orthogonal linear combinations (Principal Components, PCs) of the original variables with the largest variance.
Denote a linear projection as W = [w_1, …, w_k], with y_i = w_i^T X. Then

W = argmax Σ_{i=1..k} var{y_i} = argmax Σ_{i=1..k} var{w_i^T X}

It can also be rewritten as

W = argmax tr(W^T Σ_x W)
PCA
Σ_x can be decomposed by eigendecomposition as

Σ_x = U Λ U^T

where Λ = diag(λ_1, …, λ_p) is the diagonal matrix of eigenvalues sorted in descending order, and U is the orthogonal matrix containing the corresponding eigenvectors.
It can be proven that the optimal projection matrix W consists of the first k eigenvectors (columns) of U.
PCA
Property 1: the subspace spanned by the first k eigenvectors has the smallest mean squared deviation from X among all subspaces of dimension k.
Property 2: the total variance is equal to the sum of the eigenvalues of the original covariance matrix.
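The eigendecomposition recipe above can be sketched in a few lines of NumPy (a minimal illustration, not the presenter's code; the function name `pca` and the use of the sample covariance are my own choices):

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the top-k principal components.

    Follows the slides: eigendecompose the sample covariance, sort the
    eigenvalues in descending order, keep the first k eigenvectors as W.
    """
    Xc = X - X.mean(axis=0)                 # center the data
    Sigma = Xc.T @ Xc / (len(X) - 1)        # sample covariance (p x p)
    vals, vecs = np.linalg.eigh(Sigma)      # eigh returns ascending order
    W = vecs[:, ::-1][:, :k]                # reorder to descending, keep k
    return Xc @ W, W                        # scores Y and projection W
```

`eigh` is appropriate because Σ_x is symmetric; the returned columns of W are orthonormal, so W^T W = I_k.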
Multidimensional Scaling (MDS)
Multidimensional Scaling (MDS) produces a low-dimensional representation of the data such that the distances in the new space reflect the proximities of the data in the original space.
Denote the symmetric proximity matrix as Δ = {δ_ij, i, j = 1, …, n}. MDS tries to find the mapping such that the distances d_ij = d(y_i, y_j) in the lower-dimensional space are as close as possible to a function f(δ_ij) of the corresponding proximities.
MDS
Mapping cost function:

Σ_{i,j} [d_ij − f(δ_ij)]² / scale_factor

The scale_factor is often based on Σ_{i,j} d_ij² or Σ_{i,j} f(δ_ij)².
Problem: find the optimal mapping that minimizes the cost function.
If the proximity is a distance measure (e.g., the L₂ or L₁ distance), we call it metric MDS.
If the proximity uses only ordinal information about the data, it is called non-metric MDS.
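In the metric case with Euclidean proximities and f taken as the identity, the classical MDS solution has a closed form, sketched below (a minimal illustration, not the presenter's code; it double-centers the squared distances and embeds via the top-k eigenvectors):

```python
import numpy as np

def classical_mds(D, k):
    """Embed points into k dimensions from a pairwise distance matrix D.

    Double-centering -0.5 * J D^2 J recovers a Gram (inner-product)
    matrix B; its top-k eigenpairs give coordinates whose Euclidean
    distances approximate the entries of D.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n         # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                 # Gram matrix
    vals, vecs = np.linalg.eigh(B)              # ascending order
    vals, vecs = vals[::-1][:k], vecs[:, ::-1][:, :k]
    return vecs * np.sqrt(np.maximum(vals, 0))  # coordinates (n x k)
```

When D comes from points that truly lie in a k-dimensional Euclidean space, the embedding reproduces D exactly (up to rotation and reflection).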
Isomap
Disadvantages of PCA and MDS: 1) both methods often fail to discover complicated nonlinear structure, and 2) both have difficulty detecting the intrinsic dimension of the data.
Goal: combine the major algorithmic features of PCA and MDS (computational efficiency, global optimality, and guaranteed asymptotic convergence) with the flexibility to learn nonlinear manifolds.
Idea: introduce a geodesic distance that better describes the relations between data points.
Isomap
Illustration: points far apart on the underlying manifold, as measured by their geodesic distance, may appear deceptively close in the high-dimensional input space.
The Swiss Roll data set
Isomap
In this approach the intrinsic geometry of the data is preserved by capturing the manifold distances between all data points.
For neighboring points (within ε, or among the k nearest neighbors), the Euclidean distance provides a good approximation to the geodesic distance.
For faraway points, the geodesic distance can be approximated by adding up a sequence of "short hops" between neighboring points (computed with Floyd's algorithm).
Isomap Algorithm
Step 1: determine which points are neighbors on the manifold, based on the input distances d_X(i, j).
Step 2: estimate the geodesic distances d_G(i, j) between all pairs of points on the manifold M by computing their shortest-path distances in the neighborhood graph.
Step 3: apply MDS (or PCA) to the matrix of graph distances D_G = {d_G(i, j)}.
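The three steps can be sketched as follows (a hedged illustration, not the original implementation; it builds a k-nearest-neighbor graph, uses SciPy's Floyd–Warshall shortest paths for the geodesic distances, and finishes with classical MDS):

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def isomap(X, n_neighbors, k):
    """Isomap sketch: neighbor graph -> geodesics -> classical MDS."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    # Step 1: keep edges only between each point and its nearest neighbors.
    G = np.zeros((n, n))                       # zero = no edge (dense format)
    for i in range(n):
        nbrs = np.argsort(D[i])[1:n_neighbors + 1]
        G[i, nbrs] = D[i, nbrs]
        G[nbrs, i] = D[i, nbrs]
    # Step 2: geodesic distances = shortest paths in the graph (Floyd-Warshall).
    Dg = shortest_path(G, method="FW", directed=False)
    # Step 3: classical MDS on the geodesic distance matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (Dg ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    vals, vecs = vals[::-1][:k], vecs[:, ::-1][:, :k]
    return vecs * np.sqrt(np.maximum(vals, 0))
```

Note that the neighborhood graph must be connected, otherwise some geodesic distances are infinite.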
The Swiss Roll Problem
Detect Intrinsic Dimension
- The intrinsic dimensionality of the data can be estimated from the rate at which the residual variance decreases as the dimensionality of Y increases.
- The residual variance is defined as

1 − R(D_M, D_y)²

where R(·,·) is the linear correlation coefficient, D_M is the estimated (geodesic) distance matrix in the original space, and D_y is the distance matrix in the projected space.
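This diagnostic is easy to compute; the sketch below (my own helper, with `D_geo` playing the role of D_M) evaluates 1 − R² over the unique point pairs:

```python
import numpy as np

def residual_variance(D_geo, Y):
    """Residual variance 1 - R(D_M, D_y)^2 between the input-space
    (geodesic) distances and the Euclidean distances in embedding Y."""
    D_y = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
    iu = np.triu_indices_from(D_geo, k=1)      # unique (i < j) pairs
    r = np.corrcoef(D_geo[iu], D_y[iu])[0, 1]  # linear correlation R
    return 1.0 - r ** 2
```

Plotting this value for increasing embedding dimensionality and looking for the point where the decrease levels off gives the intrinsic-dimension estimate.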
Theoretical Analysis
The main contribution of Isomap is substituting the geodesic distance for the Euclidean distance, which may better capture the nonlinear structure of a manifold.
Given sufficient data, Isomap is guaranteed asymptotically to recover the true dimensionality and geometric structure of a nonlinear manifold.
Experiments
Experiment 1: Facial Images
Experiment 2: The hand-written 2’s
Locally Linear Embedding (LLE)
MDS and its variant Isomap try to preserve pairwise distances between data points.
Locally Linear Embedding (LLE) is an unsupervised learning algorithm that recovers global nonlinear structure from locally linear fits.
Assumption: each data point and its neighbors lie on, or close to, a locally linear patch of the manifold.
Local Linearity
LLE
Idea: the local geometry is characterized by linear coefficients that reconstruct each data point from its neighbors.
The reconstruction cost is defined as

ε(W) = Σ_i | x_i − Σ_j w_ij x_j |²

with two constraints: 1) each data point is reconstructed only from its neighbors, not from faraway points, and 2) the rows of the weight matrix sum to one.
Linear reconstruction
LLE
The weight matrix for any data point is invariant to rotations, rescalings, and translations.
Although the global manifold may be nonlinear, for each locally linear neighborhood there exists a linear mapping (consisting of a translation, rotation, and rescaling) that projects the neighborhood to the low-dimensional space.
The same weights that reconstruct the ith data point in D dimensions should also reconstruct its embedding on the manifold in d dimensions.
LLE
W is obtained by minimizing the reconstruction cost function in the original space.
To find the optimal global mapping to the lower-dimensional space, define an embedding cost function:

Φ(Y) = Σ_i | y_i − Σ_j w_ij y_j |²

Because W is now fixed, the problem becomes finding the optimal projection (X → Y) that minimizes the embedding cost.
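The two stages (solve for W, then for Y) can be sketched as follows (a hedged illustration; the trace-based regularization of the local Gram matrix is my own choice, not from the slides):

```python
import numpy as np

def lle(X, n_neighbors, d, reg=1e-3):
    """Locally Linear Embedding sketch.

    Stage 1: weights w_ij minimizing |x_i - sum_j w_ij x_j|^2 over each
             point's neighbors, with each row summing to one.
    Stage 2: embedding Y from the bottom nonzero eigenvectors of
             M = (I - W)^T (I - W), which minimizes the embedding cost.
    """
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D[i])[1:n_neighbors + 1]
        Z = X[nbrs] - X[i]                          # neighborhood shifted to x_i
        C = Z @ Z.T                                 # local Gram matrix
        C += reg * np.trace(C) * np.eye(len(nbrs))  # regularize (assumption)
        w = np.linalg.solve(C, np.ones(len(nbrs)))
        W[i, nbrs] = w / w.sum()                    # enforce sum-to-one
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:d + 1]                         # skip the constant eigenvector
```

The bottom eigenvector of M is the constant vector (eigenvalue ~0, because rows of W sum to one), so it is discarded.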
Theoretical analysis:
1) There is only one free parameter, K, and the transformation is deterministic.
2) LLE is guaranteed to converge to the global optimum given sufficient data points.
3) LLE does not have to be rerun to compute higher-dimensional embeddings.
4) The intrinsic dimension d can be estimated by analyzing a reciprocal cost function that reconstructs X from Y.
Experiment 1 Facial Images
Experiment 2: Arranging words in semantic space
Charting
Charting is the problem of assigning a low-dimensional coordinate system to data points in a high-dimensional sample space.
Assume that the data lie on or near a low-dimensional manifold in the sample space, and that there exists a one-to-one smooth nonlinear transform between the manifold and a low-dimensional vector space.
Goal: find a mapping, expressed as a kernel-based mixture of linear projections, that minimizes information loss about the density and relative locations of the sample points.
Local Linear Scale and Intrinsic Dimensionality
Local linear scale (r): at some scale r, the mapping from a neighborhood on M (the original space) to the lower-dimensional space is linear.
Consider a ball of radius r centered on a data point and containing n(r) data points. The count n(r) grows as r^d only at the locally linear scale.
Local Linear Scale and Intrinsic Dimensionality
There are two other factors that may affect the data distribution at different scales: isotropic noise (at smaller scales) and embedding curvature (at larger scales).
Define c(r) = log r / log n(r). At the noise scale, c(r) = 1/D < 1/d; at the locally linear scale, c(r) = 1/d; at the curvature scale, c(r) < 1/d.
Local Linear Scale and Intrinsic Dimensionality
Gradually increase r; when c(r) first peaks (at 1/d), we obtain one observation of both r and d.
Averaging over all data points, we can estimate r and d.
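One way to operationalize this (a rough sketch; the radius grid and the slope fit are my own choices) is to estimate d as the growth rate of log n(r) versus log r, which is the reciprocal of c(r) at the locally linear scale:

```python
import numpy as np

def intrinsic_dim(X, radii):
    """Estimate intrinsic dimension d from the growth of n(r):
    at the locally linear scale n(r) ~ r^d, so d is the slope of
    log n(r) versus log r (i.e., 1/c(r) in the text)."""
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    # average neighbor count n(r) over all data points, for each radius
    counts = [np.mean((D <= r).sum(axis=1)) for r in radii]
    slope, _ = np.polyfit(np.log(radii), np.log(counts), 1)
    return slope
```

The radii should be chosen inside the locally linear regime: too small and the noise scale dominates, too large and curvature flattens the slope.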
Charting the data
Model: each chart is modeled as a component of a Gaussian Mixture Model (GMM).
Goal: find a soft partition of the data into locally linear low-dimensional neighborhoods.
Problem: one data point may belong to several neighboring charts, so the estimation of each local Gaussian should take into account information from the neighboring charts.
Charting the data
Co-locality is defined to estimate how close two charts are:

m_i(µ_j) = N(µ_j ; µ_i, σ²)

Each data point is associated with a Gaussian neighborhood, with µ_i = x_i.
The covariance of each neighborhood is estimated as

Σ_i = [ Σ_j m_i(µ_j) ( (x_j − µ_i)(x_j − µ_i)^T + (µ_j − µ_i)(µ_j − µ_i)^T ) ] / Σ_j m_i(µ_j)

This step brings non-local information about the manifold's shape into the local description of each neighborhood, ensuring that adjoining neighborhoods have similar covariances and small angles between their respective subspaces.
Connecting the charts
- To minimize the information loss in the connection, the data points projected into the local subspace associated with each neighborhood should have 1) minimal loss of local variance and 2) maximal agreement between the projections of nearby points into nearby neighborhoods.
- The first criterion is met by applying PCA to each chart to obtain a local low-dimensional coordinate system. Each original data point then has a copy (a projected low-dimensional sample) in each local coordinate system.
- The second criterion is met by projecting each local coordinate system into a global coordinate system with minimal disagreement among the projected copies of each data point in the global space.
Connecting the charts
Each data point x_i is projected into the neighboring local coordinate systems j:

u_ji = l_j(x_i)

Each copy of the data point in a local coordinate system is then projected into the global coordinate system:

y_i = Σ_j G_j [u_ji ; 1] p(j | x_i)

where G_j is the projection from the jth chart to the global space.
Minimizing the disagreement is modeled as a weighted least-squared-distance problem:

[G_1, …, G_K] = argmin_{G_1,…,G_K} Σ_{j≠k} Σ_i p(j | x_i) p(k | x_i) || G_j [u_ji ; 1] − G_k [u_ki ; 1] ||²