

SLIDE 1

Principal Components Analysis (PCA)

Prof. Mike Hughes

Tufts COMP 135: Introduction to Machine Learning
https://www.cs.tufts.edu/comp/135/2020f/

Many ideas/slides attributable to: Liping Liu (Tufts), Emily Fox (UW), Matt Gormley (CMU)
SLIDE 2

What will we learn?

[Diagram: course overview contrasting Supervised Learning, Unsupervised Learning, and Reinforcement Learning, each described by its data examples $\{x_n\}_{n=1}^N$, a task summary, and a performance measure.]

SLIDE 3

Task: Embedding

[Diagram: embedding shown as an unsupervised learning task, with data points mapped into a low-dimensional space with axes $x_1$ and $x_2$.]

SLIDE 4

Unit Objectives: Dim. Reduction/Embedding

  • Goals of dimensionality reduction
    • Reduce feature vector size (keep signal, discard noise)
    • “Interpret” features: visualize/explore/understand
  • Common approaches
    • Principal Component Analysis (PCA)
    • word2vec and other neural embeddings
  • Evaluation Metrics
    • Storage size
    • Reconstruction error
    • “Interpretability”

SLIDE 5

Example: 2D visualization of movies

[Figure: movies plotted in a learned 2D embedding space.]

SLIDE 6

Example: Genes vs. geography

[Figure: a 2D projection of genetic data that mirrors geography (Nature, 2008).]

SLIDE 7

Centering the Data

Goal: each feature’s mean = 0.0
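Centering is a one-line operation in practice. Below is a minimal sketch (the array names and data are illustrative, not from the slides):

```python
import numpy as np

# Illustrative data: N=100 examples, F=5 features
x_NF = np.random.default_rng(0).normal(loc=3.0, size=(100, 5))

m_F = x_NF.mean(axis=0)       # empirical mean of each feature, shape (F,)
xc_NF = x_NF - m_F            # centered data: every column now has mean ~0.0

assert np.allclose(xc_NF.mean(axis=0), 0.0)
```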

SLIDE 8

Constant Reconstruction model

Parameters: m, an F-dim vector
Training problem: Minimize reconstruction error

$$\min_{m \in \mathbb{R}^F} \sum_{n=1}^N (x_n - m)^T (x_n - m)$$

This is squared error between two vectors.

Optimal parameters:

$$m^* = \text{mean}(x_1, \ldots, x_N), \qquad \hat{x}_i = m$$

Think of the mean vector as the optimal “reconstruction” of a dataset if you must use a single vector.
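A quick numerical check of this claim (a sketch with made-up data; perturbing the mean in any direction can only increase the error):

```python
import numpy as np

x_NF = np.random.default_rng(0).normal(size=(200, 3))

def recon_error(m_F):
    """Total squared error when every example is reconstructed as m."""
    return np.sum((x_NF - m_F) ** 2)

m_star = x_NF.mean(axis=0)
for _ in range(5):
    other = m_star + np.random.default_rng().normal(scale=0.1, size=3)
    assert recon_error(m_star) <= recon_error(other)
```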

SLIDE 9

Mean reconstruction

[Figure: original examples next to their reconstructions; using only the mean vector, every example is reconstructed identically.]

SLIDE 10

Linear Reconstruction and Principal Component Analysis

SLIDE 11

Linear Projection to 1D

[Figure: 2D data points projected onto a single direction, yielding 1D embeddings.]

SLIDE 12

Reconstruction from 1D to 2D

[Figure: the 1D embeddings mapped back into the original 2D space along the projection direction.]

SLIDE 13

2D Orthogonal Basis

[Figure: two orthogonal basis directions in 2D.]

If we project into 2 dims (the same as F), we can perfectly reconstruct.

SLIDE 14

Which 1D projection is best?

Idea: Minimize reconstruction error

SLIDE 15

Linear Reconstruction Model with 1 component

$$\hat{x}_i = w z_i + m$$

Here $\hat{x}_i$ is the reconstructed high-dim. data (F x 1), $z_i$ is the low-dim. embedding or “score” (1 x 1), $w$ is the weight vector (F x 1), and $m$ is the “mean” vector (F x 1).
SLIDE 16

Linear Reconstruction Model with 1 component

$$\hat{x}_i = w z_i + m$$

Problem: “Over-parameterized”. Too many possible solutions! Suppose we have an alternate model with weights w’ and embedding z’. We would get equivalent reconstructions if we set:

  • w’ = w * 2
  • z’ = z / 2

Solution: Constrain the magnitude of w so that w is a unit vector. We care about direction, not scale:

$$\sum_{f=1}^F w_f^2 = 1$$

w is a vector on the unit circle; its magnitude is always 1.

SLIDE 17

Linear Reconstruction Model with 1 component

$$\hat{x}_i = w z_i + m$$

w is a vector on the unit circle; its magnitude is always 1.

Given fixed weights w and a specific x, what is the optimal scalar z value? Minimize reconstruction error:

$$\min_{z \in \mathbb{R}} \; (x - (wz + m))^T (x - (wz + m))$$

(x is F x 1, z is 1 x 1, w is F x 1, m is F x 1.)

The exact analytical solution (take the gradient, set it to zero, solve for z) gives:

$$z = w^T (x - m)$$

This is the projection of feature vector x onto vector w after “centering” (removing the mean).
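As a sanity check, here is a short sketch of this computation for one example (names and data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
F = 4
x_F, m_F = rng.normal(size=F), rng.normal(size=F)
w_F = rng.normal(size=F)
w_F /= np.linalg.norm(w_F)        # enforce the unit-norm constraint on w

z = w_F @ (x_F - m_F)             # optimal score: z = w^T (x - m)
xhat_F = w_F * z + m_F            # reconstruction: x_hat = w z + m

# Any other score gives a worse reconstruction error:
err = lambda z_: np.sum((x_F - (w_F * z_ + m_F)) ** 2)
assert err(z) <= err(z + 0.5) and err(z) <= err(z - 0.5)
```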

SLIDE 18

Linear Reconstruction Model with K components

$$\hat{x}_i = W z_i + m$$

($x_i$ is F x 1 high-dim. data, $z_i$ is a K x 1 low-dim. vector, $W$ is F x K weights, $m$ is the F x 1 mean of the data.)

$$W = \begin{bmatrix} | & | & & | \\ w_1 & w_2 & \cdots & w_K \\ | & | & & | \end{bmatrix}$$

Each of the K weight vectors $w_k$ is one “component”. Our goal is to find the K weight vectors that best reconstruct our training dataset:

$$\min_{W \in \mathbb{R}^{F \times K}} \sum_{n=1}^N \sum_{f=1}^F \left( x_{nf} - \hat{x}_{nf}(W) \right)^2$$

Solving this squared-error reconstruction objective is known as principal components analysis (PCA).

SLIDE 19

Linear Reconstruction Model with K components

$$\hat{x}_i = W z_i + m$$

We will require that:

  • (1) All weight vectors are unit vectors
    • This fixes the scale and avoids several W with the same error
  • (2) Component directions are orthogonal (perpendicular)
    • Avoids information redundancy in W’s components

$$w_k^T w_k = 1 \;\rightarrow\; \sum_{f=1}^F W_{fk}^2 = 1$$

$$w_j^T w_k = 0 \;\rightarrow\; \sum_{f=1}^F W_{fj} W_{fk} = 0 \quad \forall j \neq k$$

Weights that satisfy (1) and (2) form an “orthonormal basis”.
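These two constraints together say the columns of W satisfy $W^T W = I_K$. A minimal check (using a QR factorization just to manufacture an example orthonormal basis):

```python
import numpy as np

F, K = 5, 3
# QR factorization of a random matrix yields K orthonormal columns
W_FK, _ = np.linalg.qr(np.random.default_rng(0).normal(size=(F, K)))

# Unit-norm columns and mutual orthogonality, i.e. W^T W = I_K
assert np.allclose(W_FK.T @ W_FK, np.eye(K))
```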
SLIDE 20

View: PCA as Matrix Factorization

$$X \approx Z W^T$$

(After centering, the N x F data matrix X is approximated by the product of the N x K embedding matrix Z and the transposed basis $W^T$.)

View: Encoding and Decoding

  • encode (“transform”): $z_n = W^T (x_n - m)$
  • decode (“reconstruct”): $\hat{x}_n = W z_n + m$, with $\hat{x}_n \approx x_n$
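A round-trip sketch of the encode/decode view (shapes and names are illustrative; the stand-in basis comes from a QR factorization as above):

```python
import numpy as np

rng = np.random.default_rng(0)
N, F, K = 100, 5, 2
x_NF = rng.normal(size=(N, F))
m_F = x_NF.mean(axis=0)
W_FK, _ = np.linalg.qr(rng.normal(size=(F, K)))   # stand-in orthonormal basis

z_NK = (x_NF - m_F) @ W_FK           # encode:  z_n = W^T (x_n - m)
xhat_NF = z_NK @ W_FK.T + m_F        # decode:  x_hat_n = W z_n + m
print(np.mean((x_NF - xhat_NF) ** 2))  # reconstruction error (> 0 when K < F)
```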
SLIDE 21

Principal Component Analysis: Transformation step

What happens when you call pca.transform(x_QF)?

Input:

  • X : query data, Q x F
    • Q examples of high-dim. feature vectors
  • Trained PCA parameters (contained inside pca)
    • m : mean vector, size F
    • W : learned basis of weight vectors, F x K

Output:

  • Z : projections, Q x K
    • Each row Z[q] is a low-dim. “embedding” of X[q]

$$z_n = W^T (x_n - m)$$
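With scikit-learn this looks roughly like the following sketch; the manual computation at the end uses the fitted attributes `pca.mean_` and `pca.components_` (which stores $W^T$, shape K x F):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
x_NF = rng.normal(size=(100, 6))   # training data, N x F
x_QF = rng.normal(size=(10, 6))    # query data, Q x F

pca = PCA(n_components=2).fit(x_NF)
z_QK = pca.transform(x_QF)         # each row is W^T (x_q - m)

# Same computation done by hand with the fitted parameters:
z_manual = (x_QF - pca.mean_) @ pca.components_.T
assert np.allclose(z_QK, z_manual)
```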
SLIDE 22

Example: PCA on Faces

[Figure: original face images and their reconstructions.]

k=F: If we use all possible components, we perfectly reconstruct the original data.

SLIDE 23

Principal Component Analysis: Training step

What happens when we call pca.fit(x_NF)?

Input:

  • X : training data, N x F
    • N examples of high-dim. feature vectors
  • K : int, number of components
    • Satisfies 1 <= K <= F

Output: Trained parameters for PCA

  • m : mean vector, size F
  • W : learned basis of weight vectors, F x K
    • One F-dim. unit vector (magnitude 1) for each component
    • Each of the K vectors is orthogonal to every other

$$\min_{m \in \mathbb{R}^F,\; W \in \mathbb{R}^{F \times K}} \sum_{n=1}^N \sum_{f=1}^F \left( x_{nf} - \hat{x}_{nf}(m, W) \right)^2 \quad \text{subject to: } W^T W = I_K \;\;\text{(orthonormal constraint)}$$

SLIDE 24

[Figure: refresher on eigenvectors. Source: https://textbooks.math.gatech.edu/ila/eigenvectors.html]

SLIDE 25

The weight component vectors are the eigenvectors of the covariance matrix of the centered dataset:

$$S = \frac{1}{N} \sum_{n=1}^N (x_n - m)(x_n - m)^T$$

Every principal component vector $w_k$ satisfies this equation:

$$S w_k = \lambda_k w_k$$

When we fit K principal components to a dataset, the optimal ones (those that minimize reconstruction error) are the eigenvectors with the K largest eigenvalues. We can use standard linear-algebra libraries to compute the eigenvalues/vectors!
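Putting the last few slides together, here is a minimal from-scratch fit, assuming the eigendecomposition recipe above (an illustration only, not sklearn's exact implementation):

```python
import numpy as np

def fit_pca(x_NF, K):
    """Fit PCA by eigendecomposition of the covariance of centered data."""
    N = x_NF.shape[0]
    m_F = x_NF.mean(axis=0)                 # mean vector m
    xc_NF = x_NF - m_F                      # center the data
    S_FF = (xc_NF.T @ xc_NF) / N            # covariance matrix S
    lam_F, V_FF = np.linalg.eigh(S_FF)      # eigh: S is symmetric
    order = np.argsort(lam_F)[::-1]         # sort eigenvalues, largest first
    W_FK = V_FF[:, order[:K]]               # top-K eigenvectors as columns of W
    return m_F, W_FK, lam_F[order]

m_F, W_FK, lam_F = fit_pca(np.random.default_rng(0).normal(size=(200, 5)), K=2)
```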
SLIDE 26

PCA Principles

  • Minimize reconstruction error
    • Should be able to recreate x from z
  • Equivalent to maximizing variance
    • Want reconstructions to retain maximum information

SLIDE 27

PCA: How to Select K?

  • 1) Use downstream supervised task metric
    • e.g., regression error
  • 2) Use memory constraints of task
    • Can’t store more than 50 dims for 1M examples? Take K=50
  • 3) Plot cumulative “variance explained”
    • Take the K that seems to capture most or all of the variance

SLIDE 28

Empirical Variance of Data X

Assume we’ve computed the empirical mean vector:

$$m \triangleq \frac{1}{N} \sum_{n=1}^N x_n$$

Empirical variance is defined as the averaged squared error from the empirical mean:

$$\text{Var}[X] = \frac{1}{N} \sum_{n=1}^N \sum_{f=1}^F (x_{nf} - m_f)^2 = \frac{1}{N} \sum_{n=1}^N (x_n - m)^T (x_n - m)$$

SLIDE 29

Empirical Variance of reconstructions

For centered data, each reconstruction is $\hat{x}_n = z_{n1} w_1 + \cdots + z_{nK} w_K$, so its empirical variance is:

$$\frac{1}{N} \sum_{n=1}^N \hat{x}_n^T \hat{x}_n = \frac{1}{N} \sum_{n=1}^N (z_{n1} w_1 + \cdots + z_{nK} w_K)^T (z_{n1} w_1 + \cdots + z_{nK} w_K) = \frac{1}{N} \sum_{n=1}^N \sum_{k=1}^K z_{nk}^2 = \sum_{k=1}^K \lambda_k$$

The cross terms vanish because the $w_k$ are orthonormal, and the average squared score along component k equals its eigenvalue $\lambda_k$. Just sum up the top K eigenvalues!

SLIDE 30

Proportion of Variance Explained by first K components

$$\text{PVE}(K) = \frac{\sum_{k=1}^K \lambda_k}{\sum_{f=1}^F \lambda_f}$$

Goal: Want a K value where the proportion of variance explained is large. This indicates good reconstruction ability on our training set.
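In code, the PVE curve is one cumulative sum over the sorted eigenvalues. A sketch, using a hypothetical 90% threshold to pick K (sklearn exposes the same per-component quantity as `explained_variance_ratio_`):

```python
import numpy as np

lam_F = np.array([4.0, 2.5, 1.0, 0.3, 0.2])   # example eigenvalues, sorted descending

pve_F = np.cumsum(lam_F) / lam_F.sum()        # PVE(K) for K = 1, ..., F
K = int(np.searchsorted(pve_F, 0.90) + 1)     # smallest K with PVE(K) >= 0.90
print(K, pve_F)                               # -> 3 [0.5 0.8125 0.9375 0.975 1.]
```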

SLIDE 31

Variance explained curve

[Figure: PVE(K) plotted as a function of K.]

$$\text{PVE}(K) = \frac{\sum_{k=1}^K \lambda_k}{\sum_{f=1}^F \lambda_f}$$

SLIDE 32

PCA Summary

PRO

  • Usually fast to train, fast to test
    • Slowest step: finding K eigenvectors of an F x F matrix
  • Nested model
    • PCA with K=5 overlaps with PCA with K=4

CON

  • Sensitive to rescaling of input data features
  • Learned basis known only up to a sign flip (+/-)
  • Not often best for supervised tasks

SLIDE 33

PCA: Best Practices

  • If features all have different units
    • Try rescaling all features to lie within (-1, +1) or to have variance 1
  • If features have the same units, this may not be necessary
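One way to apply this advice with scikit-learn (a sketch; the scale factors below are invented to mimic features measured in different units):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Features with wildly different scales, as if measured in different units
x_NF = np.random.default_rng(0).normal(size=(100, 4)) * [1.0, 10.0, 0.1, 100.0]

# Standardize each feature to variance 1, then fit PCA
pipeline = make_pipeline(StandardScaler(), PCA(n_components=2))
z_NK = pipeline.fit_transform(x_NF)
```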

SLIDE 34

Unit Objectives: Dim. Reduction/Embedding

  • Goals of dimensionality reduction
    • Reduce feature vector size (keep signal, discard noise)
    • “Interpret” features: visualize/explore/understand
  • Common approaches
    • Principal Component Analysis (PCA)
    • word2vec and other non-linear embeddings
  • Evaluation Metrics
    • Storage size
    • Reconstruction error
    • “Interpretability”