Dimensionality Reduction: Machine Learning 10-601B, Seyoung Kim

Dimensionality Reduction
Machine Learning 10-601B
Seyoung Kim

Text document retrieval/labelling
• Represent each document by a high-dimensional vector in the space of words
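One common way to build such a vector is a word-count (bag-of-words) representation. The sketch below is only an illustration of the idea: the two documents, the whitespace tokenizer, and the helper names `build_vocabulary` and `to_count_vector` are assumptions, not part of the lecture.

```python
from collections import Counter

def build_vocabulary(documents):
    """Map each distinct lowercase token to a column index, in sorted order."""
    tokens = {tok for doc in documents for tok in doc.lower().split()}
    return {tok: i for i, tok in enumerate(sorted(tokens))}

def to_count_vector(document, vocabulary):
    """Return the document as a list of word counts, one entry per vocabulary word."""
    counts = Counter(document.lower().split())
    # Dicts keep insertion order, so entries follow the vocabulary's column order.
    return [counts.get(tok, 0) for tok in vocabulary]

# Two made-up documents; a real lexicon would have millions of entries.
docs = ["the cat sat on the mat", "the dog chased the cat"]
vocab = build_vocabulary(docs)
vectors = [to_count_vector(d, vocab) for d in docs]

for doc, vec in zip(docs, vectors):
    print(vec, "<-", doc)
```

Each vector has one dimension per distinct word, so the dimensionality grows with the lexicon even though each document uses only a few words.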

Image retrieval/labelling

Dimensionality Bottlenecks
• Data dimension
  – Input variables X: High
    • 1-5M lexicon tokens in text documents
    • $1024^2$ pixels of a projected image on an IR camera sensor
    • $N^2$ expansion factor to account for all pairwise correlations
    • 1,000,000 genetic variants in a human's genome
• Information dimension: Low
  – Number of free parameters describing probability densities
    • Unsupervised learning p(X)
    • Supervised learning p(Y|X): the prediction of Y depends on the "information dimension" of X

Intuition: how does your brain store these pictures?

Brain Representation
• Every pixel?
• Or perceptually meaningful structure?
  – Up-down pose
  – Left-right pose
  – Lighting direction
So, your brain successfully reduced the high-dimensional inputs to an intrinsically 3-dimensional manifold!

Principal Component Analysis
• Areas of variance in data are where items can be best discriminated and key underlying phenomena are observed
• If two items or dimensions are highly correlated or dependent
  – They are likely to represent highly related phenomena
  – We want to combine related variables, and focus on uncorrelated or independent ones, especially those along which the observations have high variance
• We look for the phenomena underlying the observed covariance/co-dependence in a set of variables
• These phenomena are called "principal components"

An example:

Principal Component Analysis
• The new variables/dimensions
  – Are uncorrelated with one another
    • Orthogonal in original dimension space
  – Capture as much of the original variance in the data as possible
  – Are called Principal Components
  – Are linear combinations of the original ones
• Orthogonal directions of greatest variance in data
• Projections along PC1 discriminate the data most along any one axis
[Figure: axes Original Variable A and Original Variable B, with directions PC 1 and PC 2]
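These properties can be checked numerically on a small two-variable example. The sketch below assumes made-up values for Variable A and Variable B. It uses the closed-form angle of the principal axis of a 2x2 covariance matrix, which is a convenience for the two-dimensional case and not a general PCA routine.

```python
import math

# Made-up 2-D measurements (Variable A, Variable B); the numbers are illustrative only.
a_vals = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
b_vals = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Average product of deviations from the means (1/n convention)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

s_aa = covariance(a_vals, a_vals)
s_ab = covariance(a_vals, b_vals)
s_bb = covariance(b_vals, b_vals)

# For a 2x2 covariance matrix, the direction of greatest variance makes this
# angle with the A axis; PC 2 is the same direction rotated by 90 degrees.
theta = 0.5 * math.atan2(2.0 * s_ab, s_aa - s_bb)
pc1 = (math.cos(theta), math.sin(theta))
pc2 = (-math.sin(theta), math.cos(theta))

def project(direction):
    """Coordinate of every centered point along a unit direction."""
    ma, mb = mean(a_vals), mean(b_vals)
    return [(a - ma) * direction[0] + (b - mb) * direction[1]
            for a, b in zip(a_vals, b_vals)]

z1, z2 = project(pc1), project(pc2)

print("pc1 . pc2 (orthogonality):", pc1[0] * pc2[0] + pc1[1] * pc2[1])
print("cov(z1, z2) (uncorrelated):", covariance(z1, z2))
print("variance of A, B:", s_aa, s_bb)
print("variance along PC 1, PC 2:", covariance(z1, z1), covariance(z2, z2))
```

The new coordinates `z1` and `z2` are linear combinations of the centered originals. The total variance is unchanged, and the variance along PC 1 is at least as large as the variance along either original axis.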

Principal Component Analysis
• First principal component is the direction of greatest variability (covariance) in the data
• Second is the next orthogonal (uncorrelated) direction of greatest variability
  – So first remove all the variability along the first component, and then find the next direction of greatest variability
• And so on …
[Figure: axes Original Variable A and Original Variable B, with directions PC 1 and PC 2]
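The "remove, then find the next direction" procedure can be sketched as follows. The sketch assumes power iteration as the method for finding the direction of greatest variability (the slides do not name one), and the data and helper names are made up for illustration.

```python
import math

def covariance_matrix(rows):
    """S = (1/n) X^T X for rows that are already centered."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(p)]
            for i in range(p)]

def mat_vec(S, u):
    return [sum(s_ij * u_j for s_ij, u_j in zip(row, u)) for row in S]

def top_direction(S, iterations=500):
    """Power iteration: apply S and renormalize until the vector settles on
    the direction of greatest variance."""
    u = [1.0] * len(S)
    for _ in range(iterations):
        v = mat_vec(S, u)
        norm = math.sqrt(sum(x * x for x in v))
        u = [x / norm for x in v]
    return u

def remove_direction(rows, u):
    """Subtract from each row its component along the unit vector u."""
    deflated = []
    for r in rows:
        score = sum(r_i * u_i for r_i, u_i in zip(r, u))
        deflated.append([r_i - score * u_i for r_i, u_i in zip(r, u)])
    return deflated

def variance_along(rows, u):
    """Average squared projection of the rows onto u."""
    return sum(sum(r_i * u_i for r_i, u_i in zip(r, u)) ** 2
               for r in rows) / len(rows)

# Made-up centered data: each point appears with its mirror image, so every
# column has zero mean.
half = [[3.0, 2.0, 0.0], [1.0, -1.0, 1.0], [0.0, 0.5, -0.5]]
rows = half + [[-x for x in r] for r in half]

pc1 = top_direction(covariance_matrix(rows))
residual = remove_direction(rows, pc1)      # variability along PC 1 removed
pc2 = top_direction(covariance_matrix(residual))

print("variance along PC 1:", variance_along(rows, pc1))
print("variance along PC 2:", variance_along(rows, pc2))
print("pc1 . pc2:", sum(x * y for x, y in zip(pc1, pc2)))
```

Because the residual data has no component left along PC 1, the second direction comes out orthogonal to the first. Repeating the same two steps would give the third component, and so on.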

Eigen/diagonal Decomposition
• Let $S$ be a square matrix (assumed diagonalizable, i.e. with a full set of linearly independent eigenvectors)
• Theorem: there exists an eigen decomposition $S = U \Lambda U^{-1}$, where $\Lambda$ is diagonal (cf. matrix diagonalization theorem)
  – Unique for distinct eigenvalues
• Columns of $U$ are eigenvectors of $S$
• Diagonal elements of $\Lambda$ are eigenvalues of $S$
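A small worked check of the decomposition, under stated assumptions: the 2x2 matrix is made up, and the helper `eig_2x2` only handles the easy case of real, distinct eigenvalues with a nonzero top-right entry. The symbol names `U` and `Lam` follow the usual convention of eigenvectors as columns and eigenvalues on the diagonal.

```python
import math

def eig_2x2(S):
    """Eigenvalues and eigenvectors of a 2x2 matrix, assuming real, distinct
    eigenvalues and a nonzero top-right entry (enough for this illustration)."""
    (a, b), (c, d) = S
    half_trace = (a + d) / 2.0
    root = math.sqrt(half_trace ** 2 - (a * d - b * c))
    eigenvalues = [half_trace + root, half_trace - root]
    # The first row of (S - lam * I) v = 0 is solved by v = (b, lam - a).
    eigenvectors = [(b, lam - a) for lam in eigenvalues]
    return eigenvalues, eigenvectors

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

S = [[4.0, 1.0], [2.0, 3.0]]            # made-up example matrix
(lam1, lam2), (v1, v2) = eig_2x2(S)

U = [[v1[0], v2[0]], [v1[1], v2[1]]]    # eigenvectors as columns
Lam = [[lam1, 0.0], [0.0, lam2]]        # eigenvalues on the diagonal

rebuilt = mat_mul(mat_mul(U, Lam), inverse_2x2(U))
print("eigenvalues:", lam1, lam2)
print("U Lam U^-1 :", rebuilt)
```

Multiplying the three factors back together recovers the original matrix up to rounding. In practice a numerical library routine would normally be used instead of hand-written 2x2 formulas.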

Eigenvalues & Eigenvectors
• For symmetric matrices, eigenvectors for distinct eigenvalues are orthogonal:
  $S v_1 = \lambda_1 v_1$, $S v_2 = \lambda_2 v_2$, and $\lambda_1 \neq \lambda_2 \Rightarrow v_1 \cdot v_2 = 0$
• All eigenvalues of a real symmetric matrix are real.
• All eigenvalues of a positive semidefinite matrix are non-negative
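The orthogonality and non-negativity claims each follow in a few lines. A short derivation, using the slide's symbols and assuming a real symmetric $S$:

```latex
\begin{aligned}
\lambda_1 \,(v_1 \cdot v_2)
  &= (S v_1)^T v_2
   = v_1^T S^T v_2
   = v_1^T S v_2
   && \text{(symmetry: } S^T = S\text{)} \\
  &= v_1^T (\lambda_2 v_2)
   = \lambda_2 \,(v_1 \cdot v_2) \\
\Rightarrow\quad (\lambda_1 - \lambda_2)\,(v_1 \cdot v_2) &= 0
  \quad\Rightarrow\quad v_1 \cdot v_2 = 0
  && \text{(since } \lambda_1 \neq \lambda_2\text{)} \\[6pt]
\text{If } S \text{ is positive semidefinite and } v^T v = 1:\quad
\lambda &= v^T (\lambda v) = v^T S v \;\ge\; 0 .
\end{aligned}
```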

Computing the Components
• Projection of vector $x$ onto an axis (dimension) $u$ is $u^T x$
• Assume $X$ is a normalized (zero-mean columns) $n \times p$ data matrix for $n$ samples and $p$ features. The direction of greatest variability is that in which the average square of the projection is greatest:
    Maximize $\frac{1}{n} u^T X^T X u$ s.t. $u^T u = 1$
• Construct the Lagrangian: $\frac{1}{n} u^T X^T X u + \lambda (1 - u^T u)$
• Set the vector of partial derivatives with respect to $u$ to zero (dropping the common factor of 2): $\frac{1}{n} X^T X u - \lambda u = 0$
• Or equivalently $S u - \lambda u = 0$, where $S = \frac{1}{n} X^T X$ is the covariance matrix
• As $u \neq 0$, $u$ must be an eigenvector of $S$ with eigenvalue $\lambda$
  – At such a $u$ the objective equals $u^T S u = \lambda$, so the maximum is attained at the largest eigenvalue: $\lambda$ is the principal eigenvalue of the covariance matrix $S$
  – The eigenvalue denotes the amount of variability captured along that dimension
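The optimization problem can be checked directly on a small case. The sketch below assumes made-up two-feature data. It searches over unit vectors by brute force, which is only a sanity check on the derivation and not how PCA is computed in practice, then compares the best objective value with the largest eigenvalue of $S$.

```python
import math

# Made-up raw data: n = 8 samples (rows), p = 2 features (columns).
raw = [[1.0, 2.0], [2.0, 2.5], [3.0, 4.5], [4.0, 4.0],
       [5.0, 6.5], [6.0, 6.0], [7.0, 8.5], [8.0, 8.0]]
n = len(raw)

# Normalize by centering each column so that (1/n) X^T X is the covariance S.
means = [sum(row[j] for row in raw) / n for j in range(2)]
X = [[row[j] - means[j] for j in range(2)] for row in raw]
S = [[sum(r[i] * r[j] for r in X) / n for j in range(2)] for i in range(2)]

def objective(u):
    """(1/n) u^T X^T X u: the average squared projection of the rows onto u."""
    return sum((r[0] * u[0] + r[1] * u[1]) ** 2 for r in X) / n

# Brute force over unit vectors u = (cos t, sin t); u^T u = 1 holds by construction.
steps = 3600
candidates = [(math.cos(math.pi * k / steps), math.sin(math.pi * k / steps))
              for k in range(steps)]
best_u = max(candidates, key=objective)

# Closed-form largest eigenvalue of the symmetric 2x2 matrix S.
half_trace = (S[0][0] + S[1][1]) / 2.0
lam_max = half_trace + math.sqrt(((S[0][0] - S[1][1]) / 2.0) ** 2 + S[0][1] ** 2)

# Stationarity condition from the Lagrangian: S u - lam u should be close to zero.
Su = [S[0][0] * best_u[0] + S[0][1] * best_u[1],
      S[1][0] * best_u[0] + S[1][1] * best_u[1]]
residual = [Su[0] - lam_max * best_u[0], Su[1] - lam_max * best_u[1]]

print("best objective found :", objective(best_u))
print("largest eigenvalue   :", lam_max)
print("S u - lambda u       :", residual)
```

Up to the resolution of the angle grid, the best direction satisfies $S u = \lambda u$ and its objective value matches the largest eigenvalue, which is the variance captured along that direction.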
