Principal Components Analysis (PCA)
Tufts COMP 135: Introduction to Machine Learning https://www.cs.tufts.edu/comp/135/2020f/
Many ideas/slides attributable to: Liping Liu (Tufts), Emily Fox (UW), Matt Gormley (CMU)
Prof. Mike Hughes
Recap: the three types of machine learning tasks are Supervised Learning, Unsupervised Learning, and Reinforcement Learning. Each task is specified by its data examples $\{x_n\}_{n=1}^N$, a task summary, and a performance measure. PCA solves an unsupervised learning task: learning a low-dimensional embedding of the data.
[Figure: example PCA embedding, from Nature, 2008]
Parameters: m, an F-dimensional vector.
Training problem: minimize reconstruction error.
$$\min_{m \in \mathbb{R}^F} \; \sum_{n=1}^N (x_n - m)^T (x_n - m)$$

The term $(x_n - m)^T (x_n - m)$ is the squared error between the two vectors $x_n$ and $m$.

Optimal parameters: the empirical mean, $m^* = \frac{1}{N} \sum_{n=1}^N x_n$.

Think of the mean vector as the optimal "reconstruction" of a dataset if you must use a single vector.
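To make this concrete, here is a minimal numpy sketch (the toy data and names like x_NF are illustrative, following the slides' array-naming convention) showing that the empirical mean minimizes this error:

```python
import numpy as np

# Toy data: N=100 examples, F=3 features (illustrative values only)
rng = np.random.default_rng(0)
x_NF = rng.normal(loc=[2.0, -1.0, 0.5], size=(100, 3))

def reconstruction_error(m_F, x_NF):
    """Sum over examples of the squared error (x_n - m)^T (x_n - m)."""
    diff_NF = x_NF - m_F
    return np.sum(diff_NF * diff_NF)

m_F = np.mean(x_NF, axis=0)  # the optimal single-vector summary

# Any other vector gives a strictly larger reconstruction error
print(reconstruction_error(m_F, x_NF))
print(reconstruction_error(m_F + 0.1, x_NF))  # larger
```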
[Figure: reconstructed data]
If we could project into 2 dimensions (the same as F), we could perfectly reconstruct the data.
Idea: Minimize reconstruction error
One-component model: reconstruct each example as

$$\hat{x} = m + z \, w$$

where $x$ (F x 1) is the high-dimensional data, $z$ (1 x 1) is the low-dimensional embedding, $w$ (F x 1) is the weights vector, and $m$ (F x 1) is the "mean" vector.
Problem: this model is "over-parameterized"; there are too many possible solutions! Suppose we have an alternate model with weights $w'$ and embedding $z'$. We would get equivalent reconstructions if we set $w' = \alpha w$ and $z' = z / \alpha$ for any nonzero scalar $\alpha$.

Solution: constrain the magnitude of $w$ so that $w$ is a unit vector. We care about direction, not scale.
$$\sum_{f=1}^F w_f^2 = 1$$

$w$ is a vector on the unit circle: its magnitude is always 1.
Given fixed weights $w$ and a specific $x$, what is the optimal scalar $z$ value? Minimize reconstruction error:

$$\min_{z \in \mathbb{R}} \; \big(x - (w z + m)\big)^T \big(x - (w z + m)\big)$$

where $x$ is F x 1, $z$ is 1 x 1, and $w$ and $m$ are each F x 1. The exact analytical solution (take the gradient, set it to zero, solve for $z$) gives:

$$z = w^T (x - m)$$

This is the projection of feature vector $x$ onto vector $w$ after "centering" (removing the mean).
K-component model: reconstruct each example as

$$\hat{x} = m + W z$$

where $x$ (F x 1) is the high-dimensional data, $z$ (K x 1) is the low-dimensional vector, $W$ (F x K) is the weights, and $m$ (F x 1) is the mean of the data.

$$W = \begin{bmatrix} | & | & & | \\ w_1 & w_2 & \cdots & w_K \\ | & | & & | \end{bmatrix}$$

Each of the K weight vectors $w_k$ is one "component". Our goal is to find the K weight vectors that best reconstruct our training dataset.
$$\min_{W \in \mathbb{R}^{F \times K}} \; \sum_{n=1}^N \sum_{f=1}^F \big(x_{nf} - \hat{x}_{nf}(W)\big)^2$$

Solving this squared-error reconstruction objective is known as principal components analysis (PCA).
We will require that:

(1) each component has unit norm: $w_k^T w_k = 1$, i.e. $\sum_{f=1}^F W_{fk}^2 = 1$;

(2) distinct components are orthogonal: $w_j^T w_k = 0$, i.e. $\sum_{f=1}^F W_{fj} W_{fk} = 0$, for all $j \neq k$.

Weights that satisfy (1) and (2) form an "orthonormal basis".
<latexit sha1_base64="tIxlaGYQtu7rthENmyWEIoXRgJI=">AB+XicbVDLSgMxFL1TX7W+Rl26CRbBVZmpgi6LblxWsA9oS8mkt21sJjMmUIZ+iduXCji1j9x59+YtrPQ1gOBwzn3ck9OEAujed9O7m19Y3Nrfx2YWd3b/APTyq6yhRDGsEpFqBlSj4BJrhuBzVghDQOBjWB0O/MbY1SaR/LBTGLshHQgeZ8zaqzUd12P1JUCPJI2hKfyKjrFr2SNwdZJX5GipCh2nW/2r2IJSFKwTVuV7semkVBnOBE4L7URjTNmIDrBlqaQh6k46Tz4lZ1bpERvBPmnIXP29kdJQ60kY2MmQmqFe9mbif14rMf3rTsplnBiUbHGonwhiIjKrgfS4QmbExBLKFLdZCRtSRZmxZRVsCf7yl1dJvVzyL0rl+8ti5SarIw8ncArn4MVOAOqlADBmN4hld4c1LnxXl3PhajOSfbOY/cD5/AKEPkwA=</latexit>21
Mike Hughes - Tufts COMP 135 - Fall 2020
Encode ("transform"): input $x_n$, output $z_n = W^T (x_n - m)$.

Decode ("reconstruct"): input $z_n$, output $\hat{x}_n = W z_n + m$.
Transformation step: what happens when you call pca.transform(x_QF)?
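Presumably the transform step just centers the data and projects it onto the learned components; this sketch mirrors scikit-learn's PCA API, which the slide's pca.transform call suggests (the data here is synthetic):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
x_NF = rng.normal(size=(200, 5))

pca = PCA(n_components=2)
pca.fit(x_NF)

x_QF = rng.normal(size=(10, 5))  # Q new query examples
z_QK = pca.transform(x_QF)

# Same computation by hand: center, then project onto the components
z_manual_QK = (x_QF - pca.mean_) @ pca.components_.T
print(np.allclose(z_QK, z_manual_QK))  # True
```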
If K = F (we use all possible components), we can perfectly reconstruct the original data.
Training step: what happens when we call pca.fit(x_NF)?

Input: a training dataset x_NF of N examples with F features each.
Output: trained parameters for PCA (the mean vector m and the weights W).
$$\min_{m \in \mathbb{R}^F, \; W \in \mathbb{R}^{F \times K}} \; \sum_{n=1}^N \sum_{f=1}^F \big(x_{nf} - \hat{x}_{nf}(m, W)\big)^2 \quad \text{subject to } W^T W = I_K$$

Here $W^T W = I_K$ is the orthonormal constraint.
[Figure: review of eigenvectors. Source: https://textbooks.math.gatech.edu/ila/eigenvectors.html]
Every principal component vector $w_k$ satisfies this eigenvector equation:

$$\left( \frac{1}{N} \sum_{n=1}^N (x_n - m)(x_n - m)^T \right) w_k = \lambda_k w_k$$

When we fit K principal components to a dataset, the optimal ones (those that minimize reconstruction error) are the eigenvectors with the K largest eigenvalues $\lambda_k$. We can use standard linalg libraries to compute the eigenvalues/eigenvectors!
Example: take K = 50 components.
Assume we've computed the empirical mean vector:

$$m \triangleq \frac{1}{N} \sum_{n=1}^N x_n$$

Empirical variance is defined as the averaged squared error from the empirical mean:

$$\text{var} \triangleq \frac{1}{N} \sum_{n=1}^N \sum_{f=1}^F (x_{nf} - m_f)^2$$
For centered data ($m = 0$), write each example in the component basis as $x_n = z_{n1} w_1 + \ldots + z_{nK} w_K$. Then the variance is:

$$\frac{1}{N} \sum_{n=1}^N x_n^T x_n = \frac{1}{N} \sum_{n=1}^N (z_{n1} w_1 + \ldots + z_{nK} w_K)^T (z_{n1} w_1 + \ldots + z_{nK} w_K)$$

Because the $w_k$ are orthonormal, all cross terms vanish:

$$= \frac{1}{N} \sum_{n=1}^N \sum_{k=1}^K z_{nk}^2 = \sum_{k=1}^K \lambda_k$$

Just sum up the top K eigenvalues!
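We can check this identity numerically; a sketch with synthetic, centered data:

```python
import numpy as np

rng = np.random.default_rng(0)
x_NF = rng.normal(size=(1000, 6))
x_NF = x_NF - x_NF.mean(axis=0)  # center, so m = 0

cov_FF = (x_NF.T @ x_NF) / x_NF.shape[0]
eigvals_F, eigvecs_FF = np.linalg.eigh(cov_FF)
order = np.argsort(eigvals_F)[::-1]  # sort eigenvalues descending

K = 2
W_FK = eigvecs_FF[:, order[:K]]
z_NK = x_NF @ W_FK                   # embeddings for the top-K components

# (1/N) sum_n sum_k z_nk^2 equals the sum of the top-K eigenvalues
var_captured = np.mean(np.sum(z_NK ** 2, axis=1))
print(np.isclose(var_captured, eigvals_F[order[:K]].sum()))  # True
```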
Goal: we want a K value where the proportion of variance explained (PVE) is large. A large PVE indicates good reconstruction ability on our training set.

$$\text{PVE}(K) = \frac{\sum_{k=1}^K \lambda_k}{\sum_{f=1}^F \lambda_f}$$
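A sketch of computing PVE(K) for all K at once; with scikit-learn the per-component ratios $\lambda_k / \sum_f \lambda_f$ are exposed as explained_variance_ratio_ (the 90% threshold below is just an example, not from the slides):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
x_NF = rng.normal(size=(300, 8))

pca = PCA()  # keep all F components
pca.fit(x_NF)

# PVE(K) for every K at once: cumulative sum of per-component variance ratios
pve_K = np.cumsum(pca.explained_variance_ratio_)

# Smallest K whose PVE reaches a target, e.g. 90% (threshold is illustrative)
K = int(np.searchsorted(pve_K, 0.90)) + 1
print(pve_K)
print(K)
```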
[Slide: pros and cons of PCA]