Data Mining Techniques
CS 6220 - Section 3 - Fall 2016
Lecture 12
Jan-Willem van de Meent (credit: Yijun Zhao, Percy Liang)
Dimensionality Reduction

Borrowing from: Percy Liang (Stanford)
Linear Dimensionality Reduction

Idea: project a high-dimensional vector onto a lower-dimensional space,
e.g. x ∈ R^361, z = U^T x, z ∈ R^10.

Given n data points in d dimensions: x_1, ..., x_n ∈ R^d, collected as X = (x_1 ··· x_n) ∈ R^{d×n}.
Want to reduce the dimensionality from d to k.
Choose k directions u_1, ..., u_k, collected as U = (u_1 ··· u_k) ∈ R^{d×k}.
For each u_j, compute the "similarity" z_j = u_j^T x.
Project x down to z = (z_1, ..., z_k)^T = U^T x.
How do we choose U?
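A minimal sketch of this projection in Python/NumPy. The matrix U here is a random orthonormal basis, purely to illustrate the shapes; choosing a good U is the subject of the rest of the lecture, and d = 361, k = 10 mirror the example above.

    import numpy as np

    rng = np.random.default_rng(0)
    d, k, n = 361, 10, 100                         # original dim, reduced dim, number of points

    X = rng.normal(size=(d, n))                    # data matrix, one column per data point
    U, _ = np.linalg.qr(rng.normal(size=(d, k)))   # some orthonormal directions (not yet chosen by PCA)

    Z = U.T @ X                                    # z_i = U^T x_i for every point
    print(Z.shape)                                 # (10, 100)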
Two Objectives

U serves two functions:
- encode: z = U^T x, i.e. z_j = u_j^T x
- reconstruct: x̃ = U z = Σ_{j=1}^k z_j u_j

Want the reconstruction error ||x - x̃|| to be small.

Objective: minimize the total squared reconstruction error

    min_{U ∈ R^{d×k}}  Σ_{i=1}^n ||x_i - U U^T x_i||²
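A small sketch of evaluating this objective for a candidate U (here again a random orthonormal basis, just for illustration; PCA will choose the U that makes this number as small as possible):

    import numpy as np

    rng = np.random.default_rng(0)
    d, k, n = 20, 3, 500
    X = rng.normal(size=(d, n))
    U, _ = np.linalg.qr(rng.normal(size=(d, k)))   # any orthonormal U; PCA picks the best one

    X_hat = U @ (U.T @ X)                          # reconstruct every point from its code z = U^T x
    recon_error = np.sum((X - X_hat) ** 2)         # total squared reconstruction error
    print(recon_error)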
Empirical distribution: uniform over x_1, ..., x_n.

Expectation (think: sum over data points):

    Ê[f(x)] = (1/n) Σ_{i=1}^n f(x_i)

Variance (think: sum of squares, if centered):

    var[f(x)] + (Ê[f(x)])² = Ê[f(x)²] = (1/n) Σ_{i=1}^n f(x_i)²

Assume the data is centered: Ê[x] = 0.
(What is Ê[U^T x] then? Also 0, since Ê[U^T x] = U^T Ê[x].)

Objective: maximize the variance of the projected data

    max_{U ∈ R^{d×k}, U^T U = I}  Ê[||U^T x||²]
Key intuition:

    variance of data  =  captured variance  +  reconstruction error
        (fixed)            (want large)          (want small)

Pythagorean decomposition: x = U U^T x + (I - U U^T) x, where the two terms are orthogonal,
with lengths ||U U^T x||, ||(I - U U^T) x||, and hypotenuse ||x||.

Take expectations; note that the rotation U does not affect length:

    Ê[||x||²] = Ê[||U^T x||²] + Ê[||x - U U^T x||²]

Minimizing reconstruction error ⟺ maximizing captured variance.
Principal Component Analysis (PCA) / Basic Principles

Input data: X = (x_1 ... x_n).

Objective: maximize the variance of the projection onto a single direction u:

    max_{||u||=1} Ê[(u^T x)²]
      = max_{||u||=1} (1/n) Σ_{i=1}^n (u^T x_i)²
      = max_{||u||=1} (1/n) ||u^T X||²
      = max_{||u||=1} u^T ( (1/n) X X^T ) u
      = largest eigenvalue of C := (1/n) X X^T

(C is the covariance matrix of the data.)
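A minimal sketch of this computation on synthetic data, assuming NumPy; the variance of the projection onto the top eigenvector should match the largest eigenvalue of C:

    import numpy as np

    rng = np.random.default_rng(2)
    d, n = 5, 2000
    X = rng.normal(size=(d, d)) @ rng.normal(size=(d, n))   # correlated synthetic data
    X = X - X.mean(axis=1, keepdims=True)                    # center

    C = (X @ X.T) / n                          # covariance matrix C = (1/n) X X^T
    eigvals, eigvecs = np.linalg.eigh(C)       # symmetric eigendecomposition, eigenvalues ascending
    u = eigvecs[:, -1]                         # top principal direction

    print(np.mean((u @ X) ** 2), eigvals[-1])  # projected variance ~= largest eigenvalue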
[Figure: scree plot of the covariance eigenvalues λ_i against the component index i.]
Method 1: eigendecomposition.
U contains the eigenvectors of the covariance matrix C = (1/n) X X^T.
Computing C already takes O(n d²) time (very expensive).

Method 2: singular value decomposition (SVD).
Find X = U_{d×d} Σ_{d×n} V^T_{n×n}, where U^T U = I_{d×d}, V^T V = I_{n×n}, and Σ is diagonal.
Computing only the top k singular vectors takes O(n d k) time.

Relationship between the eigendecomposition and the SVD:
the left singular vectors of X are the principal components, since C = (1/n) X X^T = U (Σ Σ^T / n) U^T.
Eigen-faces

    X_{d×n} ≈ U_{d×k} Z_{k×n}

where the columns of X are (vectorized) face images.

Idea: z_i is a more "meaningful" representation of the i-th face than x_i.
Can use z_i for nearest-neighbor classification.
Much faster: O(dk + nk) time per query instead of O(dn), when n, d ≫ k.
Why is there no time saving for a linear classifier?
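A sketch of nearest-neighbor classification in the reduced space; the face data and identity labels here are random placeholders, just to show the mechanics and the O(dk + nk) query cost:

    import numpy as np

    rng = np.random.default_rng(4)
    d, n, k = 361, 300, 10
    X = rng.normal(size=(d, n))                    # stand-in for n vectorized face images
    labels = rng.integers(0, 5, size=n)            # hypothetical identity labels
    mean = X.mean(axis=1, keepdims=True)

    U, _, _ = np.linalg.svd(X - mean, full_matrices=False)
    U = U[:, :k]                                   # top-k "eigen-faces"
    Z = U.T @ (X - mean)                           # k-dimensional codes of the training faces

    def classify(x_new):
        """1-nearest neighbor in eigen-face space: O(dk) to project, O(nk) to compare."""
        z_new = U.T @ (x_new - mean[:, 0])
        i = np.argmin(np.sum((Z - z_new[:, None]) ** 2, axis=0))
        return labels[i]

    print(classify(X[:, 0]) == labels[0])          # True: nearest neighbor of a training face is itself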
Latent Semantic Analysis

    X_{d×n} ≈ U_{d×k} Z_{k×n}

Here X is a word-by-document count matrix, e.g.

    stocks:    2  ···  0
    chairman:  4  ···  1
    the:       8  ···  7
       ⋮
    wins:      0  ···  2
    game:      1  ···  3

and U is a dense d×k matrix of word weights, with Z = (z_1 ... z_n) the document codes.

How do we measure the similarity between two documents?
z_1^T z_2 is probably better than x_1^T x_2.

Applications: information retrieval.
Note: no computational savings here; the original x is already sparse.
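A toy sketch in NumPy (the counts below are made up, loosely following the table above) showing a truncated SVD of the count matrix and a document similarity computed in the latent space:

    import numpy as np

    # toy word-by-document count matrix (rows: stocks, chairman, the, wins, game)
    X = np.array([[2., 3., 0., 0.],
                  [4., 2., 1., 0.],
                  [8., 6., 7., 5.],
                  [0., 1., 0., 2.],
                  [1., 0., 2., 3.]])

    k = 2
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    U = U[:, :k]                                   # latent "topic" directions over words
    Z = U.T @ X                                    # k-dimensional code for each document

    # similarity between document 0 and document 3: raw counts vs. latent space
    print(X[:, 0] @ X[:, 3], Z[:, 0] @ Z[:, 3])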
Network Anomaly Detection

x_{ji} = amount of traffic on link j in the network during time interval i.

Model assumption: total traffic is the sum of flows along a few "paths".
Apply PCA: each principal component intuitively represents a "path".
Flag an anomaly when the traffic deviates from the first few principal components.

[Figure: traffic over time with normal and anomalous intervals marked.]
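A sketch of this idea on simulated traffic, assuming NumPy; the "paths", flows, and the injected anomaly are all synthetic, and the threshold rule is just one simple choice:

    import numpy as np

    rng = np.random.default_rng(5)
    links, T, k = 30, 500, 3
    paths = rng.uniform(0, 1, size=(links, k))               # each "path" loads a subset of links
    flows = rng.gamma(2.0, 1.0, size=(k, T))                 # traffic along each path over time
    X = paths @ flows + 0.1 * rng.normal(size=(links, T))    # observed link traffic
    X[:, 400] += 5 * rng.uniform(size=links)                 # inject an anomaly at interval 400

    Xc = X - X.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    U = U[:, :k]                                             # first few principal components ("paths")

    residual = np.sum((Xc - U @ (U.T @ Xc)) ** 2, axis=0)    # per-interval reconstruction error
    threshold = residual.mean() + 5 * residual.std()
    print(np.where(residual > threshold)[0])                 # interval 400 should be flagged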
Multi-task Learning

One step of the procedure: given n linear classifiers (weight vectors) x_1, ..., x_n,
run PCA to identify their shared structure:

    X = (x_1 ... x_n) ≈ U Z

Each principal component is an "eigen-classifier".

The other step of the procedure: retrain the classifiers, regularizing them towards the subspace U.
PCA summary:
- Objectives: maximize captured variance / minimize reconstruction error
- Algorithms: eigendecomposition of the covariance matrix or SVD (better time complexity)
- Applications: eigen-faces, latent semantic analysis, anomaly detection, etc.
Probabilistic PCA

Generative model [Tipping and Bishop, 1999]: for each data point i = 1, ..., n:
- draw the latent vector: z_i ∼ N(0, I_{k×k})
- create the data point: x_i ∼ N(U z_i, σ² I_{d×d})

PCA finds the U that maximizes the likelihood of the data: max_U p(X | U).

Advantages:
- handles missing data (important for collaborative filtering)
- extends to factor analysis (replace σ² I_{d×d} with an arbitrary diagonal matrix)
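A small sketch of sampling from this generative model and checking that the principal subspace of the samples lines up with span(U); the loading matrix and noise level are arbitrary choices:

    import numpy as np

    rng = np.random.default_rng(6)
    d, k, n, sigma = 10, 2, 1000, 0.1
    U = rng.normal(size=(d, k))                         # "true" loading matrix (arbitrary)

    Z = rng.normal(size=(k, n))                         # z_i ~ N(0, I_k)
    X = U @ Z + sigma * rng.normal(size=(d, n))         # x_i ~ N(U z_i, sigma^2 I_d)

    # The top-k principal subspace of X should be close to span(U) when sigma is small
    U_hat, _, _ = np.linalg.svd(X - X.mean(axis=1, keepdims=True), full_matrices=False)
    P = U_hat[:, :k] @ U_hat[:, :k].T                   # projector onto the estimated subspace
    print(np.linalg.norm(U - P @ U) / np.linalg.norm(U))  # small relative error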
Limitations of Linearity

[Figure: a dataset where PCA is effective next to one where PCA is ineffective.]

The problem is that the PCA subspace is linear:

    S = {x = U z : z ∈ R^k}

In this example: S = {(x_1, x_2) : x_2 = (u_2 / u_1) x_1}.

[Figure: the broken (linear) solution next to the desired (curved) solution.]

We want the desired solution: S = {(x_1, x_2) : x_2 = (u_2 / u_1) x_1²}.

We can get it with S = {φ(x) = U z}, using the feature map φ(x) = (x_1², x_2)^T.

Linear dimensionality reduction in φ(x) space ⇔ nonlinear dimensionality reduction in x space.

Idea: use kernels.
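A sketch of this trick on synthetic points near a parabola, assuming NumPy; after mapping with φ(x) = (x_1², x_2), the top principal direction recovers the quadratic relationship (the 0.5 below is just the coefficient used to generate the data):

    import numpy as np

    rng = np.random.default_rng(7)
    n = 500
    x1 = rng.uniform(-2, 2, size=n)
    x2 = 0.5 * x1 ** 2 + 0.05 * rng.normal(size=n)     # points near the parabola x2 = 0.5 * x1^2

    Phi = np.vstack([x1 ** 2, x2])                      # feature map phi(x) = (x1^2, x2)
    Phi = Phi - Phi.mean(axis=1, keepdims=True)

    U, S, _ = np.linalg.svd(Phi, full_matrices=False)
    u = U[:, 0]                                         # top principal direction in feature space
    print(u[1] / u[0])                                  # ~= 0.5, recovering x2 ~ 0.5 * x1^2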
Kernel PCA

Representer theorem: the principal direction can be written as a linear combination of the
data points, u = X α = Σ_{i=1}^n α_i x_i (since u is an eigenvector of X X^T: X X^T u = λ u).

Kernel function: k(x_1, x_2) such that K, the kernel matrix formed by K_ij = k(x_i, x_j),
is positive semi-definite.

Rewriting the PCA objective in terms of α:

    max_{||u||=1} u^T X X^T u
      = max_{α^T X^T X α = 1} α^T (X^T X)(X^T X) α
      = max_{α^T K α = 1} α^T K² α

Direct method: the kernel PCA objective max_{α^T K α = 1} α^T K² α leads to the kernel PCA
eigenvalue problem X^T X α = λ′ α, i.e. K α = λ′ α.

Modular method (if you don't want to think about kernels): find vectors x′_1, ..., x′_n such that

    x′_i^T x′_j = K_ij = φ(x_i)^T φ(x_j)

and then run ordinary PCA on the x′_i.
Key: use any vectors that preserve the inner products.
One possibility is the Cholesky decomposition K = X′^T X′.
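A sketch of the modular method, assuming NumPy and an RBF kernel: factor K into X′^T X′ (here via the eigendecomposition, which also tolerates a merely positive semi-definite K; Cholesky works when K is strictly positive definite) and run ordinary PCA on the columns of X′:

    import numpy as np

    rng = np.random.default_rng(8)
    n = 200
    X = rng.normal(size=(2, n))                         # original points, one per column

    def rbf(a, b, gamma=1.0):
        return np.exp(-gamma * np.sum((a - b) ** 2))

    # kernel matrix K_ij = k(x_i, x_j)
    K = np.array([[rbf(X[:, i], X[:, j]) for j in range(n)] for i in range(n)])

    # factor K = X'^T X' so that the columns of X' preserve the kernel inner products
    w, V = np.linalg.eigh(K)
    w = np.clip(w, 0.0, None)                           # guard against tiny negative eigenvalues
    Xp = (V * np.sqrt(w)).T                             # columns x'_i, with X'^T X' = K

    # ordinary PCA on the x'_i gives the kernel PCA embedding
    Xp = Xp - Xp.mean(axis=1, keepdims=True)            # center in feature space
    U, S, _ = np.linalg.svd(Xp, full_matrices=False)
    Z = U[:, :2].T @ Xp                                 # 2-dimensional kernel PCA codes
    print(Z.shape)                                      # (2, 200)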
Canonical Correlation Analysis (CCA)

Often, each data point consists of two views:
- x: pixels (or other visual features); y: text around the image
- x: signal at time t; y: signal at time t + 1
- x: features of a word/object, etc.; y: features of the context in which it appears

Goal: reduce the dimensionality of the two views jointly.
Setup: input data (x_1, y_1), ..., (x_n, y_n) (collected into matrices X and Y).
Goal: find a pair of projections (u, v).

[Figure: reducing the two views independently vs. jointly; x and y are paired by brightness.]

Definitions (empirical, for centered data):
- variance:     var(u^T x) = u^T X X^T u
- covariance:   cov(u^T x, v^T y) = u^T X Y^T v
- correlation:  corr(u^T x, v^T y) = cov(u^T x, v^T y) / ( √var(u^T x) · √var(v^T y) )

Objective: maximize the correlation between the projected views

    max_{u,v} corr(u^T x, v^T y)
From PCA to CCA

PCA on each view separately: no covariance term

    max_{u,v}  u^T X X^T u / (u^T u)  +  v^T Y Y^T v / (v^T v)

PCA on the concatenation (X^T, Y^T)^T: includes a covariance term

    max_{u,v}  ( u^T X X^T u + 2 u^T X Y^T v + v^T Y Y^T v ) / ( u^T u + v^T v )

Maximum covariance: drop the variance terms

    max_{u,v}  u^T X Y^T v / ( √(u^T u) · √(v^T v) )

Maximum correlation (CCA): divide out the variance terms

    max_{u,v}  u^T X Y^T v / ( √(u^T X X^T u) · √(v^T Y Y^T v) )
Extreme examples of degeneracy:

[Figure: one pair of views with correlation 1 and one pair with correlation 0.]

Problem: if X or Y has rank n, then any (u, v) is optimal (correlation 1) with u = X^{†T} Y v,
so CCA is meaningless!

Solution: regularization (interpolate between maximum covariance and maximum correlation)

    max_{u,v}  u^T X Y^T v / ( √(u^T (X X^T + λ I) u) · √(v^T (Y Y^T + λ I) v) )
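A sketch of regularized CCA on synthetic two-view data, assuming NumPy and SciPy: the objective above can be solved as a generalized eigenvalue problem over the stacked vector (u, v); setting λ = 0 recovers plain CCA, and larger λ moves towards maximum covariance.

    import numpy as np
    from scipy.linalg import eigh

    rng = np.random.default_rng(9)
    n, dx, dy, lam = 500, 5, 4, 0.1
    Z = rng.normal(size=(2, n))                                   # shared latent signal
    X = rng.normal(size=(dx, 2)) @ Z + 0.5 * rng.normal(size=(dx, n))
    Y = rng.normal(size=(dy, 2)) @ Z + 0.5 * rng.normal(size=(dy, n))
    X = X - X.mean(axis=1, keepdims=True)
    Y = Y - Y.mean(axis=1, keepdims=True)

    # Regularized CCA as a generalized eigenvalue problem A w = rho B w, with w = (u, v)
    Cxy = X @ Y.T
    A = np.block([[np.zeros((dx, dx)), Cxy],
                  [Cxy.T, np.zeros((dy, dy))]])
    B = np.block([[X @ X.T + lam * np.eye(dx), np.zeros((dx, dy))],
                  [np.zeros((dy, dx)), Y @ Y.T + lam * np.eye(dy)]])

    vals, vecs = eigh(A, B)                      # symmetric-definite generalized eigenproblem
    u, v = vecs[:dx, -1], vecs[dx:, -1]          # top eigenvector gives the best (u, v)

    corr = (u @ Cxy @ v) / np.sqrt((u @ X @ X.T @ u) * (v @ Y @ Y.T @ v))
    print(corr)                                  # high, since the two views share a strong signal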
Canonical Correlation Forests

[Figure: decision surfaces for (a) a single CART (unpruned), (b) a random forest with 200 trees, (c) a single CCT (unpruned), (d) a CCF with 200 trees.]

Example: a random-forest variant that uses CCA to determine the axes for its splits.
Summary: Dimensionality Reduction

- Framework: z = U^T x, x ≈ U z
- Criteria for choosing U:
  - PCA: maximize the projected variance
  - CCA: maximize the projected correlation
- Algorithm: generalized eigenvalue problem
- Extensions:
  - nonlinear, using kernels (within the same linear framework)
  - probabilistic, sparse, robust (harder optimization)