Lecture 6: Clustering


  1. Lecture 6: Clustering
Felix Held, Mathematical Sciences
MSA220/MVE440 Statistical Learning for Big Data
5th April 2019

  2. Projects
  ▶ Focus on challenging the algorithms and their assumptions (all groups)
  ▶ Keep your presentations short (∼10 min)
  ▶ Send in your presentation and code by 10.00 on Friday
  ▶ There are 30 groups across 3 rooms, i.e. not every group might get to present (it is not to your disadvantage if you cannot present because there is not enough time)
  ▶ We will group similar topics to allow for better discussion

  3. Importance of standardisation (I)
The overall issue: Subjectivity vs Objectivity
(Co-)variance is scale dependent: If we have a sample (size $n$) of variables $x$ and $y$, then their empirical covariance is
$$s_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$
If $x$ is scaled by a factor $c$, i.e. $z = c \cdot x$, then
$$s_{zy} = \frac{1}{n-1} \sum_{i=1}^{n} (z_i - \bar{z})(y_i - \bar{y}) = \frac{1}{n-1} \sum_{i=1}^{n} (c \cdot x_i - c \cdot \bar{x})(y_i - \bar{y}) = c \cdot s_{xy}$$
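This identity is easy to verify numerically. A minimal NumPy sketch (the variable names and simulated data are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)             # sample of variable x
y = 2.0 * x + rng.normal(size=n)   # a correlated variable y
c = 10.0                           # arbitrary scaling factor
z = c * x                          # z = c * x

s_xy = np.cov(x, y)[0, 1]          # empirical covariance (np.cov divides by n - 1)
s_zy = np.cov(z, y)[0, 1]          # covariance after scaling x by c

print(np.isclose(s_zy, c * s_xy))  # True: s_zy = c * s_xy
```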

  4. Importance of standardisation (II)
(Co-)variance is scale dependent: $s_{zy} = c \cdot s_{xy}$ where $z = c \cdot x$
  ▶ By scaling variables we can therefore make them as large/influential or small/insignificant as we want, which is a very subjective process
  ▶ By standardising variables we can get rid of scaling and reach an objective point-of-view
  ▶ Do we get rid of information?
  ▶ The typical range of a variable is compressed, but if most samples for a variable fall into that range, then it is not very informative after all
  ▶ Real data is not a perfect Gaussian point cloud and therefore there will still be dominating directions after standardisation
  ▶ Outliers will still be outliers
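In practice, standardising just centres each variable and divides by its standard deviation. A minimal sketch (the helper name `standardise` is my own, not from the slides):

```python
import numpy as np

def standardise(X):
    """Centre each column and scale it to unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

rng = np.random.default_rng(1)
# Three variables on wildly different scales
X = rng.normal(size=(50, 3)) * np.array([1.0, 1000.0, 0.01])
Z = standardise(X)
print(Z.std(axis=0, ddof=1))  # all ones: the arbitrary scales are gone
```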

  5. Importance of standardisation (III)
UCI Wine dataset (Three different types of wine with $p = 13$ characteristics)
[Figure: scatter plots of the wine data, Raw (left) vs Centred + Standardised (right). Top row: Proline against Alcohol. Bottom row: the first two principal components, PC2 against PC1.]
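The effect shown in the figure can be reproduced with scikit-learn, which ships a copy of the UCI Wine data. A sketch assuming `sklearn` is available:

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)   # n = 178 wines, p = 13 characteristics

# PCA on the raw data: Proline (values in the hundreds) dominates PC1
pca_raw = PCA(n_components=2).fit(X)
print(pca_raw.explained_variance_ratio_)

# PCA after centring + standardising: variables contribute on equal footing
Z = StandardScaler().fit_transform(X)
pca_std = PCA(n_components=2).fit(Z)
print(pca_std.explained_variance_ratio_)
```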

  6. Class-related dimension reduction

  7. Better data projection for classification?
Idea: Find directions along which projections result in minimal within-class scatter and maximal between-class separation.
[Figure: the data projected onto the first principal component (PC1) and onto the first discriminant (LD1), with the LDA decision boundary shown.]

  8. Classification and principal components
Note: The principal component directions do not take class-labels into account. Classification after projection on these directions can be problematic.
In LDA the covariance matrix of the features within each class is $\hat{\boldsymbol{\Sigma}}$. Now we will consider the within-class scatter matrix $\hat{\boldsymbol{\Sigma}}_W = (n - K)\hat{\boldsymbol{\Sigma}}$. In addition define
$$\bar{\mathbf{x}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i$$
and the between-class scatter matrix
$$\hat{\boldsymbol{\Sigma}}_B = \sum_{k=1}^{K} n_k (\bar{\mathbf{x}}_k - \bar{\mathbf{x}})(\bar{\mathbf{x}}_k - \bar{\mathbf{x}})^T$$
where $K$ is the number of classes and $n_k$ the number of samples in class $k$.
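These two definitions translate directly to NumPy. A sketch (the helper name `scatter_matrices` is my own):

```python
import numpy as np

def scatter_matrices(X, labels):
    """Within-class scatter S_W = (n - K) * pooled covariance and
    between-class scatter S_B = sum_k n_k (mean_k - mean)(mean_k - mean)^T."""
    n, p = X.shape
    grand_mean = X.mean(axis=0)
    S_W = np.zeros((p, p))
    S_B = np.zeros((p, p))
    for k in np.unique(labels):
        Xk = X[labels == k]                 # the n_k samples of class k
        mk = Xk.mean(axis=0)
        S_W += (Xk - mk).T @ (Xk - mk)      # adds up to (n - K) * Sigma_hat
        d = (mk - grand_mean)[:, None]
        S_B += Xk.shape[0] * (d @ d.T)
    return S_W, S_B
```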

  9. Fisher's Problem
Recall: The variance of projected data is a quadratic form in the projection direction. For a direction $\mathbf{a}$ the within-class scatter along $\mathbf{a}$ is $W(\mathbf{a}) = \mathbf{a}^T \hat{\boldsymbol{\Sigma}}_W \mathbf{a}$. In analogy, the variance between class centres along $\mathbf{a}$ is calculated as $B(\mathbf{a}) = \mathbf{a}^T \hat{\boldsymbol{\Sigma}}_B \mathbf{a}$.
The goal is to maximize variance between class centres while simultaneously minimizing variance within each class.
Optimization goal: Maximize over $\mathbf{a}$
$$\frac{\mathbf{a}^T \hat{\boldsymbol{\Sigma}}_B \mathbf{a}}{\mathbf{a}^T \hat{\boldsymbol{\Sigma}}_W \mathbf{a}} \quad \text{subject to} \quad \|\mathbf{a}\| = 1$$
which is a more general form of a Rayleigh Quotient and is called Fisher's problem.
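Numerically this is a generalized symmetric eigenvalue problem, $\hat{\boldsymbol{\Sigma}}_B \mathbf{a} = \lambda \hat{\boldsymbol{\Sigma}}_W \mathbf{a}$, which `scipy.linalg.eigh` solves directly. A sketch assuming $\hat{\boldsymbol{\Sigma}}_W$ is positive definite (function names are my own):

```python
import numpy as np
from scipy.linalg import eigh

def rayleigh(a, S_W, S_B):
    """The Rayleigh quotient from Fisher's problem."""
    return (a @ S_B @ a) / (a @ S_W @ a)

def fisher_directions(S_W, S_B):
    """Directions maximizing the Rayleigh quotient, best first."""
    evals, evecs = eigh(S_B, S_W)     # generalized problem S_B a = lambda S_W a
    order = np.argsort(evals)[::-1]   # eigh returns ascending eigenvalues
    return evals[order], evecs[:, order]
```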

  10. Solving Fisher's Problem
Computation of solutions:
1. Compute the eigen-decomposition (the matrix is real and symmetric)
$$\hat{\boldsymbol{\Sigma}}_W^{-1/2} \hat{\boldsymbol{\Sigma}}_B \hat{\boldsymbol{\Sigma}}_W^{-1/2} = \mathbf{V} \mathbf{D} \mathbf{V}^T$$
where $\mathbf{V} \in \mathbb{R}^{p \times p}$ is orthogonal and $\mathbf{D} \in \mathbb{R}^{p \times p}$ is diagonal.
2. Set $\mathbf{A} = \hat{\boldsymbol{\Sigma}}_W^{-1/2} \mathbf{V}$. The columns of $\mathbf{A}$ solve Fisher's problem (as with PCA, the $k$-th solution maximizes Fisher's problem on the orthogonal complement of the first $k - 1$ solutions).
Note: There are at most $K - 1$ solutions $\mathbf{a}_k$ to Fisher's problem (because $\hat{\boldsymbol{\Sigma}}_B$ has rank $\leq K - 1$).
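The two steps translate to NumPy as follows. A sketch (the inverse square root is built from the eigen-decomposition of $\hat{\boldsymbol{\Sigma}}_W$, which is assumed positive definite; `solve_fisher` is my own helper name):

```python
import numpy as np

def solve_fisher(S_W, S_B, K):
    """Two-step solution from the slide: eigen-decompose
    S_W^{-1/2} S_B S_W^{-1/2} = V D V^T and set A = S_W^{-1/2} V."""
    # Symmetric inverse square root of S_W
    w, U = np.linalg.eigh(S_W)
    S_W_inv_sqrt = U @ np.diag(1.0 / np.sqrt(w)) @ U.T

    # Step 1: eigen-decomposition of the real, symmetric matrix
    M = S_W_inv_sqrt @ S_B @ S_W_inv_sqrt
    d, V = np.linalg.eigh(M)            # ascending eigenvalues
    order = np.argsort(d)[::-1]         # largest Rayleigh quotient first

    # Step 2: columns of A solve Fisher's problem; at most K - 1 eigenvalues
    # are non-zero because rank(S_B) <= K - 1
    A = S_W_inv_sqrt @ V[:, order]
    return d[order][:K - 1], A[:, :K - 1]
```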

  11. Discriminant Variables and Reduced-rank LDA
  ▶ The vectors $\mathbf{a}_k$ determined by solving Fisher's problem can be used like PCA, but are aware of class labels and give the optimal separation of projected class centroids
  ▶ Projecting the data onto the $k$-th solution gives the $k$-th discriminant variable $\mathbf{a}_k^T \mathbf{x}$
  ▶ Using only the first $m < K - 1$ discriminant variables is called reduced-rank LDA (see the usage sketch below)
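Putting the pieces together on the wine data, a usage sketch reusing the hypothetical helpers `scatter_matrices` and `solve_fisher` from the earlier sketches:

```python
from sklearn.datasets import load_wine

X, y = load_wine(return_X_y=True)   # K = 3 classes of wine
S_W, S_B = scatter_matrices(X, y)
_, A = solve_fisher(S_W, S_B, K=3)

# The k-th discriminant variable is a_k^T x. Projecting onto all K - 1 = 2
# columns gives the full projection; keeping only m < K - 1 of them
# would be reduced-rank LDA.
Z = X @ A                           # n x (K - 1) discriminant variables
```

scikit-learn's `LinearDiscriminantAnalysis(n_components=m)` provides essentially the same projection (up to scaling of the components) via its `transform` method.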
