Lecture 13: Even more dimension reduction techniques



  1. Lecture 13: Even more dimension reduction techniques
     Felix Held, Mathematical Sciences
     MSA220/MVE440 Statistical Learning for Big Data
     10th May 2019

  2. Recap: kernel PCA
     Given a set of $p$-dimensional feature vectors $\mathbf{x}_1, \dots, \mathbf{x}_n$ and a kernel $k(\mathbf{x}, \mathbf{x}')$, form the Gram matrix $\mathbf{K} = (k(\mathbf{x}_j, \mathbf{x}_k))_{jk}$ and
     ▶ Solve the eigenvalue problem $\mathbf{K}\mathbf{a}_j = \lambda_j n \mathbf{a}_j$ for $\lambda_j$ and $\mathbf{a}_j$
     ▶ Scale $\mathbf{a}_j$ such that $\mathbf{a}_j^T \mathbf{K} \mathbf{a}_j = 1$
     The projection of a feature vector $\mathbf{x}$ onto the $j$-th principal component in the implicit feature space of $\boldsymbol{\phi}(\mathbf{x})$ is
     $$\theta_j(\mathbf{x}) = \sum_{m=1}^{n} a_{jm}\, k(\mathbf{x}, \mathbf{x}_m)$$
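A minimal numpy sketch of this recipe, using an RBF kernel purely for illustration (the names below, such as `rbf_kernel`, `gamma` and `n_components`, are not from the slides):

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Gram matrix of the (illustrative) RBF kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def kernel_pca_scores(X, n_components=2, gamma=1.0):
    """Project the training points onto the leading kernel principal components."""
    K = rbf_kernel(X, X, gamma)                  # Gram matrix K (assumed centred; see next slide)
    mu, A = np.linalg.eigh(K)                    # K a_j = mu_j a_j with mu_j = n * lambda_j
    order = np.argsort(mu)[::-1][:n_components]  # keep the largest eigenvalues
    mu, A = mu[order], A[:, order]
    A = A / np.sqrt(mu)                          # rescale so that a_j^T K a_j = 1
    return K @ A                                 # theta_j(x_i) = sum_m a_jm k(x_i, x_m)
```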

  3. Centring and kernel PCA
     ▶ The derivation assumed that the implicitly defined feature vectors $\boldsymbol{\phi}(\mathbf{x}_m)$ were centred. What if they are not?
     ▶ In the derivation we look at scalar products $\boldsymbol{\phi}(\mathbf{x}_j)^T \boldsymbol{\phi}(\mathbf{x}_m)$. Centring in the implicit space leads to
     $$\Big(\boldsymbol{\phi}(\mathbf{x}_j) - \tfrac{1}{n}\sum_{k=1}^{n}\boldsymbol{\phi}(\mathbf{x}_k)\Big)^T \Big(\boldsymbol{\phi}(\mathbf{x}_m) - \tfrac{1}{n}\sum_{k=1}^{n}\boldsymbol{\phi}(\mathbf{x}_k)\Big) = K_{jm} - \frac{1}{n}\sum_{k=1}^{n} K_{kj} - \frac{1}{n}\sum_{k=1}^{n} K_{km} + \frac{1}{n^2}\sum_{k=1}^{n}\sum_{l=1}^{n} K_{kl}$$
     ▶ Using the centring matrix $\mathbf{J} = \mathbf{I}_n - \frac{1}{n}\mathbf{1}\mathbf{1}^T$, centring in the implicit space is equivalent to transforming $\mathbf{K}$ as $\mathbf{K}' = \mathbf{J}\mathbf{K}\mathbf{J}$
     ▶ The algorithm is the same, apart from using $\mathbf{K}'$ instead of $\mathbf{K}$.
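As a sketch, the centring step amounts to one extra matrix product before the eigendecomposition; it would slot into the kernel PCA sketch above as follows:

```python
import numpy as np

def centre_gram_matrix(K):
    """Double-centre a Gram matrix: K' = J K J with J = I_n - (1/n) 1 1^T."""
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centring matrix J
    return J @ K @ J

# In the sketch above one would compute K_prime = centre_gram_matrix(K)
# and run the remaining steps unchanged on K_prime instead of K.
```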

  4. Dimension reduction while preserving distances

  5. Preserving distance
     Like in cartography, the goal of dimension reduction can be subject to different sub-criteria, e.g. PCA preserves the directions of largest variance.
     What if we want to preserve the distances while reducing the dimension?
     For given vectors $\mathbf{x}_1, \dots, \mathbf{x}_n \in \mathbb{R}^p$ we want to find $\mathbf{y}_1, \dots, \mathbf{y}_n \in \mathbb{R}^m$ where $m < p$ such that
     $$\|\mathbf{x}_j - \mathbf{x}_k\|_2 \approx \|\mathbf{y}_j - \mathbf{y}_k\|_2$$

  6. Distance matrices and the linear kernel
     Given a data matrix $\mathbf{X} \in \mathbb{R}^{n \times p}$, note that
     $$\mathbf{X}\mathbf{X}^T = \begin{pmatrix} \mathbf{x}_1^T\mathbf{x}_1 & \cdots & \mathbf{x}_1^T\mathbf{x}_n \\ \vdots & \ddots & \vdots \\ \mathbf{x}_n^T\mathbf{x}_1 & \cdots & \mathbf{x}_n^T\mathbf{x}_n \end{pmatrix} = \mathbf{K}$$
     which is also the Gram matrix $\mathbf{K}$ of the linear kernel.
     Let $\mathbf{D} = (\|\mathbf{x}_k - \mathbf{x}_l\|_2)_{kl}$ be the distance matrix in the Euclidean norm. Note that
     $$\|\mathbf{x}_k - \mathbf{x}_l\|_2^2 = \mathbf{x}_k^T\mathbf{x}_k - 2\mathbf{x}_k^T\mathbf{x}_l + \mathbf{x}_l^T\mathbf{x}_l$$
     and (with element-wise exponentiation)
     $$-\tfrac{1}{2}\mathbf{D}^{(2)} = \mathbf{X}\mathbf{X}^T - \tfrac{1}{2}\mathbf{1}\,\mathrm{diag}(\mathbf{X}\mathbf{X}^T)^T - \tfrac{1}{2}\mathrm{diag}(\mathbf{X}\mathbf{X}^T)\,\mathbf{1}^T.$$
     Through calculation it can be shown that, with $\mathbf{J} = \mathbf{I}_n - \frac{1}{n}\mathbf{1}\mathbf{1}^T$,
     $$\mathbf{K} = \mathbf{J}\big(-\tfrac{1}{2}\mathbf{D}^{(2)}\big)\mathbf{J}$$
     (for column-centred $\mathbf{X}$).
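These identities are easy to verify numerically; below is a small check on randomly generated data (all variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))                          # data matrix X in R^{n x p}
n = X.shape[0]

K = X @ X.T                                          # Gram matrix of the linear kernel
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
D2 = D ** 2                                          # element-wise squared distances D^(2)

ones = np.ones((n, 1))
d = np.diag(K).reshape(-1, 1)
# -1/2 D^(2) = X X^T - 1/2 1 diag(XX^T)^T - 1/2 diag(XX^T) 1^T
assert np.allclose(-0.5 * D2, K - 0.5 * ones @ d.T - 0.5 * d @ ones.T)

# double-centring -1/2 D^(2) recovers the Gram matrix of the centred data
J = np.eye(n) - np.ones((n, n)) / n
Xc = X - X.mean(axis=0)
assert np.allclose(J @ (-0.5 * D2) @ J, Xc @ Xc.T)
```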

  7. Finding an exact embedding
     ▶ It can be shown that if $\mathbf{K}$ is positive semi-definite then there exists an exact embedding in $m = \mathrm{rank}(\mathbf{K}) \le \mathrm{rank}(\mathbf{X}) \le \min(n, p)$ dimensions.
     1. Perform PCA on $\mathbf{K} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^T$
     2. If $m = \mathrm{rank}(\mathbf{K})$, set $\mathbf{Y} = (\sqrt{\lambda_1}\,\mathbf{u}_1, \dots, \sqrt{\lambda_m}\,\mathbf{u}_m) \in \mathbb{R}^{n \times m}$
     3. The rows of $\mathbf{Y}$ are the sought-after embedding, i.e. for $\mathbf{y}_k = \mathbf{Y}_{k\cdot}$ it holds that $\|\mathbf{x}_j - \mathbf{x}_k\|_2 = \|\mathbf{y}_j - \mathbf{y}_k\|_2$
     ▶ Note: This is not guaranteed to lead to dimension reduction, i.e. $m = p$ is possible. However, usually the internal structure of the data is lower-dimensional and $m < p$.
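A compact sketch of these steps, starting from a Euclidean distance matrix and using the double-centring relation from the previous slide (function and variable names are illustrative):

```python
import numpy as np

def exact_embedding(D, tol=1e-10):
    """Exact distance-preserving embedding from a Euclidean distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    K = J @ (-0.5 * D ** 2) @ J                  # doubly centred Gram matrix
    lam, U = np.linalg.eigh(K)                   # eigendecomposition K = U Lambda U^T
    order = np.argsort(lam)[::-1]
    lam, U = lam[order], U[:, order]
    m = int(np.sum(lam > tol))                   # m = rank(K)
    return U[:, :m] * np.sqrt(lam[:m])           # Y in R^{n x m}; rows are the embedding

# sanity check: pairwise distances are reproduced exactly
rng = np.random.default_rng(1)
X = rng.normal(size=(8, 5))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Y = exact_embedding(D)
assert np.allclose(D, np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1))
```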

  8. Multi-dimensional scaling
     ▶ Keeping only the first $r < m$ components of $\mathbf{y}_k$ is known as classical scaling or multi-dimensional scaling (MDS) and minimizes the so-called stress or strain
     $$\mathrm{S}(\mathbf{D}, \mathbf{Y}) = \Big(\sum_{j \ne k} \big(D_{jk} - \|\mathbf{y}_j - \mathbf{y}_k\|_2\big)^2\Big)^{1/2}$$
     ▶ The results also hold for general distance matrices $\mathbf{D}$ as long as $\lambda_1, \dots, \lambda_m > 0$ for $m = \mathrm{rank}(\mathbf{K})$. This is called metric MDS.
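Truncating the embedding and evaluating the stress could look as follows (a sketch; `classical_mds` and `stress` are hypothetical helper names, not from the slides):

```python
import numpy as np

def classical_mds(D, r):
    """Keep only the first r components of the exact embedding (classical scaling / MDS)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    K = J @ (-0.5 * D ** 2) @ J
    lam, U = np.linalg.eigh(K)
    order = np.argsort(lam)[::-1][:r]
    return U[:, order] * np.sqrt(np.clip(lam[order], 0.0, None))

def stress(D, Y):
    """Stress/strain: root of the sum of squared distance discrepancies over pairs j != k."""
    D_hat = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    off_diag = ~np.eye(D.shape[0], dtype=bool)
    return np.sqrt(np.sum((D - D_hat)[off_diag] ** 2))

# e.g. stress(D, classical_mds(D, r=2)) for the distance matrix D from the previous sketch
```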

  9. Lower-dimensional data in a high-dimensional space
