
Data Mining Techniques CS 6220 - Section 3 - Fall 2016, Lecture 12



  1. Data Mining Techniques, CS 6220 - Section 3 - Fall 2016, Lecture 12. Jan-Willem van de Meent (credit: Yijun Zhao, Percy Liang)

  2. DIMENSIONALITY REDUCTION. Borrowing from: Percy Liang (Stanford)

  3. Linear Dimensionality Reduction. Idea: project a high-dimensional vector onto a lower-dimensional space, e.g. $x \in \mathbb{R}^{361}$ is mapped to $z = U^\top x$ with $z \in \mathbb{R}^{10}$.

  4-7. Problem Setup. Given $n$ data points in $d$ dimensions, $x_1, \ldots, x_n \in \mathbb{R}^d$, collected as the columns of $X = (x_1 \cdots x_n) \in \mathbb{R}^{d \times n}$. We want to reduce the dimensionality from $d$ to $k$. Choose $k$ directions $u_1, \ldots, u_k$ and stack them as $U = (u_1 \cdots u_k) \in \mathbb{R}^{d \times k}$. For each $u_j$, compute the "similarity" $z_j = u_j^\top x$, and project $x$ down to $z = (z_1, \ldots, z_k)^\top = U^\top x$. How do we choose $U$?
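A minimal numpy sketch of this setup. The data matrix, the dimensions, and the directions $U$ are made-up placeholders (random data, a random orthonormal $U$), not anything from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, k = 100, 361, 10          # n points in d dimensions, reduced to k
X = rng.normal(size=(d, n))     # columns are the data points x_1, ..., x_n

# Any d x k matrix with orthonormal columns can serve as U;
# here we take one from a QR decomposition of a random matrix.
U, _ = np.linalg.qr(rng.normal(size=(d, k)))

Z = U.T @ X                     # z_i = U^T x_i, computed for all columns at once
print(Z.shape)                  # (k, n) = (10, 100)
```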

  8. Principal Component Analysis. As in the example above, $x \in \mathbb{R}^{361}$, $z = U^\top x$, $z \in \mathbb{R}^{10}$. How do we choose $U$? Two objectives: 1. Minimize the reconstruction error. 2. Maximize the projected variance.

  9-12. PCA Objective 1: Reconstruction Error. $U$ serves two functions. Encode: $z = U^\top x$, with $z_j = u_j^\top x$. Decode: $\tilde{x} = Uz = \sum_{j=1}^k z_j u_j$. We want the reconstruction error $\|x - \tilde{x}\|$ to be small. Objective: minimize the total squared reconstruction error, $\min_{U \in \mathbb{R}^{d \times k}} \sum_{i=1}^n \|x_i - U U^\top x_i\|^2$.
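A short sketch of the encode/decode view and the total squared reconstruction error, again on placeholder random data and a random orthonormal $U$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 361, 100, 10
X = rng.normal(size=(d, n))                      # placeholder data, columns x_i
U, _ = np.linalg.qr(rng.normal(size=(d, k)))     # placeholder orthonormal directions

Z = U.T @ X                                      # encode: z_i = U^T x_i
X_tilde = U @ Z                                  # decode: x~_i = U z_i = U U^T x_i

# Total squared reconstruction error: sum_i ||x_i - U U^T x_i||^2
recon_error = np.sum((X - X_tilde) ** 2)
print(recon_error)
```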

  13-17. PCA Objective 2: Projected Variance. Empirical distribution: uniform over $x_1, \ldots, x_n$. Expectation (think sum over data points): $\hat{E}[f(x)] = \frac{1}{n} \sum_{i=1}^n f(x_i)$. Variance (think sum of squares if centered): $\widehat{\mathrm{var}}[f(x)] + (\hat{E}[f(x)])^2 = \hat{E}[f(x)^2] = \frac{1}{n} \sum_{i=1}^n f(x_i)^2$. Assume the data is centered, $\hat{E}[x] = 0$ (what is $\hat{E}[U^\top x]$ then?). Objective: maximize the variance of the projected data, $\max_{U \in \mathbb{R}^{d \times k},\, U^\top U = I} \hat{E}[\|U^\top x\|^2]$.
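A sketch of centering the data and evaluating the projected variance $\hat{E}[\|U^\top x\|^2]$, using the same kind of placeholder $X$ and $U$ as above:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 361, 100, 10
X = rng.normal(size=(d, n))                      # placeholder data, columns x_i
X = X - X.mean(axis=1, keepdims=True)            # center: now E_hat[x] = 0
U, _ = np.linalg.qr(rng.normal(size=(d, k)))     # placeholder orthonormal U

# Projected variance E_hat[ ||U^T x||^2 ] = (1/n) sum_i ||U^T x_i||^2
Z = U.T @ X
projected_variance = np.sum(Z ** 2) / n
print(projected_variance)
```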

  18-20. Equivalence of the two objectives. Key intuition: variance of data (fixed) = captured variance (want large) + reconstruction error (want small). Pythagorean decomposition: $x = U U^\top x + (I - U U^\top) x$, with $\|x\|^2 = \|U U^\top x\|^2 + \|(I - U U^\top) x\|^2$. Take expectations, and note that the rotation $U$ does not affect length: $\hat{E}[\|x\|^2] = \hat{E}[\|U^\top x\|^2] + \hat{E}[\|x - U U^\top x\|^2]$. Minimizing the reconstruction error is therefore equivalent to maximizing the captured variance.
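The decomposition can be checked numerically: for any orthonormal $U$, total variance splits exactly into captured variance plus reconstruction error. This is a sketch on placeholder random data:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 361, 100, 10
X = rng.normal(size=(d, n))
X = X - X.mean(axis=1, keepdims=True)            # centered data
U, _ = np.linalg.qr(rng.normal(size=(d, k)))     # any orthonormal U

total    = np.sum(X ** 2) / n                    # E_hat[ ||x||^2 ]
captured = np.sum((U.T @ X) ** 2) / n            # E_hat[ ||U^T x||^2 ]
residual = np.sum((X - U @ (U.T @ X)) ** 2) / n  # E_hat[ ||x - U U^T x||^2 ]

print(np.isclose(total, captured + residual))    # True for any orthonormal U
```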

  21-27. Finding one principal component. Input data: $X = (x_1 \cdots x_n)$. Objective: maximize the variance of the projected data,
  $\max_{\|u\|=1} \hat{E}[(u^\top x)^2] = \max_{\|u\|=1} \frac{1}{n} \sum_{i=1}^n (u^\top x_i)^2 = \max_{\|u\|=1} \frac{1}{n} \|u^\top X\|^2 = \max_{\|u\|=1} u^\top \left(\tfrac{1}{n} X X^\top\right) u$,
  which is the largest eigenvalue of $C \stackrel{\text{def}}{=} \frac{1}{n} X X^\top$ ($C$ is the covariance matrix of the data).
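A small sketch of this result: form the covariance matrix of (placeholder, randomly generated) centered data and take the top eigenvector as the first principal component. The data and its dimensions are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 500
X = rng.normal(size=(d, n))
X = X - X.mean(axis=1, keepdims=True)            # PCA assumes centered data

C = X @ X.T / n                                  # covariance matrix C = (1/n) X X^T

# The first principal component is the eigenvector of C with the largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(C)             # eigh: ascending eigenvalues for symmetric C
u1 = eigvecs[:, -1]

# The maximized projected variance u^T C u equals the largest eigenvalue.
print(eigvals[-1], u1 @ C @ u1)
```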

  28-30. How many components? • Similar to the question of "How many clusters?" • The magnitude of the eigenvalues indicates the fraction of variance captured. • [Figure: eigenvalues $\lambda_i$ on a face image dataset, dropping from 1353.2 to 287.1 over the first several components.] • Eigenvalues typically drop off sharply, so we don't need that many components. • Of course, variance isn't everything...
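A sketch of turning the eigenvalue spectrum into a "fraction of variance captured by the top $k$ components" curve, on placeholder random data (a real dataset such as face images would show a much sharper drop-off):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 500
X = rng.normal(size=(d, n))
X = X - X.mean(axis=1, keepdims=True)

C = X @ X.T / n
eigvals = np.linalg.eigvalsh(C)[::-1]            # eigenvalues, largest first

# Fraction of the total variance captured by the top-k components.
frac = np.cumsum(eigvals) / np.sum(eigvals)
for k in (1, 2, 5, 10):
    print(k, frac[k - 1])
```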

  31-34. Computing PCA. Method 1: eigendecomposition. $U$ consists of the eigenvectors of the covariance matrix $C = \frac{1}{n} X X^\top$. Computing $C$ already takes $O(nd^2)$ time (very expensive). Method 2: singular value decomposition (SVD). Find $X = U_{d \times d}\, \Sigma_{d \times n}\, V^\top_{n \times n}$ where $U^\top U = I_{d \times d}$, $V^\top V = I_{n \times n}$, and $\Sigma$ is diagonal. Computing the top $k$ singular vectors takes only $O(ndk)$.
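A sketch comparing the two methods on placeholder random data: the left singular vectors of $X$ are the same directions as the eigenvectors of $C$ (up to sign), and the singular values satisfy $\sigma_j^2 / n = \lambda_j$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 50, 200, 5
X = rng.normal(size=(d, n))
X = X - X.mean(axis=1, keepdims=True)

# Method 1: eigendecomposition of the covariance matrix (forming C costs O(n d^2)).
C = X @ X.T / n
eigvals, eigvecs = np.linalg.eigh(C)
U_eig = eigvecs[:, ::-1][:, :k]                  # top-k eigenvectors, largest first

# Method 2: SVD of X directly; no need to form C.
U_svd, S, Vt = np.linalg.svd(X, full_matrices=False)
U_svd = U_svd[:, :k]

# The two methods agree up to the sign of each column, and sigma_j^2 / n = lambda_j.
print(np.allclose(np.abs(U_eig.T @ U_svd), np.eye(k), atol=1e-6))
print(np.allclose(S[:k] ** 2 / n, eigvals[::-1][:k]))
```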
