
CS 103: Representation Learning, Information Theory and Control



  1. CS 103: Representation Learning, Information Theory and Control Lecture 8, Mar 1, 2019

  2. Recap
  Group nuisances:
  - Group convolutions
  - Canonical reference frames
  - SIFT descriptors
  General nuisances:
  - Minimal information in the activations ⇒ invariance to nuisances
  - Information Bottleneck
  - The IB loss can be upper-bounded by introducing an auxiliary variable
  - Aside: the Variational Auto-Encoder can be seen as a particular case
  - Aside: disentanglement in VAEs
  How does this relate to standard deep learning?

  3. The Kolmogorov Structure of a Task
  How can we define the structure of a task? Define the Kolmogorov Structure Function:
  S_𝒟(t) = min_{K(M) ≤ t} L(𝒟; M)
  At first, increasing the complexity of the model leads to big gains in accuracy: we are learning the structure of the problem. After learning all the structure, we can only memorize: an inefficient asymptotic phase, where the tangent of the curve is 1 (we need to store 1 bit in the model to decrease the loss by 1 bit).
  [Figure: training loss vs. Kolmogorov complexity of the model; the knee of the curve, where the optimal structure has been captured, marks the Kolmogorov minimal sufficient statistic.]
  Kolmogorov's Structure Functions and Model Selection, Vereshchagin and Vitanyi, 2002
  Information Complexity of Tasks, their Structure and their Distance, Achille et al., 2018
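Since K(M) is uncomputable, the structure function can only be illustrated by proxy. Below is a minimal toy sketch (not from the slides) using polynomial degree as a stand-in for model complexity: the best achievable loss drops sharply while structure is being learned, then flattens once only noise remains. All names and the cubic "task" here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "task": noisy samples from a cubic. Model class M = polynomials,
# with degree playing the role of the (uncomputable) complexity budget t.
x = np.linspace(-1.0, 1.0, 200)
y = 1.0 - 2.0 * x + 0.5 * x**3 + 0.1 * rng.standard_normal(x.size)

def structure_function(max_degree):
    """Best achievable squared loss L(D; M) for each complexity budget."""
    losses = []
    for d in range(max_degree + 1):
        coeffs = np.polyfit(x, y, d)          # least-squares fit of degree d
        resid = y - np.polyval(coeffs, x)
        losses.append(float(np.mean(resid**2)))
    return losses

losses = structure_function(8)
# Big gains up to the true degree (3), then a flat memorization phase.
print([round(l, 4) for l in losses])
```

The knee near degree 3 is the analogue of the minimal sufficient statistic: beyond it, extra complexity only fits noise.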

  4. Optimizing using Deep Neural Networks
  How do we find the optimal solution?
  S_𝒟(t) = min_{K(M) ≤ t} L(𝒟; M)
  Corresponding Lagrangian: ℒ(M) = L(𝒟; M) + λ K(M)
  Let w be the parameters of the model. Use the bound K(M) ≤ KL(q(w|𝒟) ‖ p(w)):
  ℒ(M) = L(𝒟; M) + λ KL(q(w|𝒟) ‖ p(w))
  This loss can be implemented using a DNN and the local reparametrization trick.*
  * Variational Dropout and the Local Reparameterization Trick, Kingma et al., 2015
  Information Complexity of Tasks, their Structure and their Distance, Achille et al., 2018
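For a diagonal Gaussian posterior q(w|𝒟) = N(μ, diag(σ²)) and a standard normal prior p(w), the KL term above has a closed form, so the Lagrangian is easy to sketch. This is a minimal NumPy illustration of the objective only (the actual training with the local reparametrization trick is omitted; `nll` stands in for L(𝒟; M)):

```python
import numpy as np

def kl_diag_gauss_std_normal(mu, log_sigma2):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ),
    the upper bound on K(M) used on the slide."""
    return 0.5 * np.sum(np.exp(log_sigma2) + mu**2 - 1.0 - log_sigma2)

def lagrangian(nll, mu, log_sigma2, lam):
    """L(D; M) + lambda * KL(q(w|D) || p(w))."""
    return nll + lam * kl_diag_gauss_std_normal(mu, log_sigma2)

# Sanity check: when q equals the prior, the complexity penalty vanishes.
print(kl_diag_gauss_std_normal(np.zeros(4), np.zeros(4)))  # -> 0.0
```

In practice μ and log σ² are trained jointly with the network by sampling w = μ + σ·ε, ε ∼ N(0, I), per the Kingma et al. reference.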

  5. Let’s rewrite it using Information Theory
  We used an upper bound; what is the best value it can assume?
  ℒ(M) = 𝔼_{w∼q(w|𝒟)}[H_{p,q}(𝒟 | w)] + λ KL(q(w|𝒟) ‖ p(w))
  Recall that I(w; 𝒟) ≤ 𝔼_𝒟[KL(q(w|𝒟) ‖ p(w))], with equality when p(w) is the marginal q(w) = 𝔼_𝒟[q(w|𝒟)].
  Hence, in expectation over the datasets, the best loss function to use to recover the task structure is:
  ℒ(M) = 𝔼_𝒟[H(𝒟 | w)] + λ I(w; 𝒟)
  This is an IB Lagrangian for the weights.
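The inequality on this slide follows from a standard one-line identity (added here as background; it matches the slide's notation):

```latex
\mathbb{E}_{\mathcal{D}}\!\left[\mathrm{KL}\big(q(w \mid \mathcal{D}) \,\|\, p(w)\big)\right]
  = I(w;\mathcal{D}) + \mathrm{KL}\big(q(w) \,\|\, p(w)\big)
  \;\ge\; I(w;\mathcal{D}),
\qquad q(w) := \mathbb{E}_{\mathcal{D}}\big[q(w \mid \mathcal{D})\big].
```

Since the second KL term is nonnegative and vanishes exactly when p(w) = q(w), the bound is tight for the marginal prior, which is why I(w; 𝒟) is the best value the penalty can assume in expectation.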

  6. A new Information Bottleneck
  Weights IB — dataset 𝒟 (drawn from the real distribution p(y|x)) → weights w; the I(𝒟; w) term controls overfitting:
  min_w ℒ = H_{p,q}(y | x, w) + β I(𝒟; w)
  Activations IB — data x, label y, activations z with representation q(z|x); the I(z; x) term controls invariance:
  min_{q(z|x)} ℒ = H_{p,q}(y | z) + β I(z; x)
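The activations-IB objective is usually made tractable by replacing I(z; x) with a variational upper bound, KL(q(z|x) ‖ N(0, I)), averaged over the batch. Below is a minimal NumPy sketch of that surrogate loss (my own illustration, not code from the lecture; the function name and shapes are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

def vib_loss(logits, labels, mu_z, log_sigma2_z, beta):
    """Activations-IB surrogate: H_{p,q}(y|z) + beta * KL(q(z|x) || N(0, I)),
    where the KL term is a standard variational upper bound on I(z; x)."""
    # Cross-entropy H_{p,q}(y|z) from a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    ce = -np.mean(log_probs[np.arange(len(labels)), labels])
    # Per-example KL of the Gaussian encoder, averaged over the batch.
    kl = 0.5 * np.mean(
        np.sum(np.exp(log_sigma2_z) + mu_z**2 - 1.0 - log_sigma2_z, axis=1)
    )
    return ce + beta * kl

# Dummy batch: 8 examples, 3 classes, 2-dimensional bottleneck z.
logits = rng.standard_normal((8, 3))
labels = rng.integers(0, 3, size=8)
mu_z = rng.standard_normal((8, 2))
log_sigma2_z = np.zeros((8, 2))
print(vib_loss(logits, labels, mu_z, log_sigma2_z, beta=1e-3))
```

Note the structural symmetry with the weights IB: the same KL machinery penalizes information, once in w and once in z.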

  7. The PAC-Bayes generalization bound
  PAC-Bayes bound (Catoni, 2007; McAllester, 2013).
  Corollary: minimizing the IB Lagrangian for the weights minimizes an upper bound on the test error.
  This gives non-vacuous generalization bounds! (Dziugaite and Roy, 2017)
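The transcript cites the bound without stating it. One standard McAllester-style form, added here as background (exact constants and log factors vary across references), is: with probability at least 1 − δ over the draw of n training samples, simultaneously for every posterior Q over weights,

```latex
\mathbb{E}_{w \sim Q}\big[L_{\mathrm{test}}(w)\big]
  \;\le\;
\mathbb{E}_{w \sim Q}\big[\hat{L}_{\mathrm{train}}(w)\big]
  + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{n}{\delta}}{2(n-1)}}.
```

With Q = q(w|𝒟) and prior P = p(w), the KL term is exactly the complexity penalty in the IB Lagrangian for the weights, which is why minimizing that Lagrangian tightens this test-error bound.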

