
CS 103: Representation Learning, Information Theory and Control



  1. CS 103: Representation Learning, Information Theory and Control Lecture 5, Feb 8, 2019

  2. Representation Learning and Information Bottleneck

  3. Desiderata for representations
     An optimal representation z of the data x for the task y is a stochastic function z ∼ p(z|x) that is:
     - Sufficient: I(z; y) = I(x; y)
     - Minimal: I(x; z) is minimal among sufficient representations
     - Invariant to nuisances: if n ⫫ y, then I(n; z) = 0
     - Maximally disentangled: the total correlation TC(z) = KL( p(z) ‖ ∏_i p(z_i) ) is minimized
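The disentanglement term has a closed form when z is Gaussian: TC(z) = ½(Σ_i log Σ_ii − log det Σ), which is zero exactly when the covariance is diagonal. A minimal numeric sketch (the Gaussian model and function name are illustrative, not from the slides):

```python
import numpy as np

def gaussian_total_correlation(cov):
    """TC(z) = KL(p(z) || prod_i p(z_i)) for z ~ N(0, cov).

    Closed form: 0.5 * (sum_i log cov_ii - log det cov).
    TC is zero iff cov is diagonal, i.e. z is fully disentangled.
    """
    cov = np.asarray(cov, dtype=float)
    sign, logdet = np.linalg.slogdet(cov)
    assert sign > 0, "covariance must be positive definite"
    return 0.5 * (np.sum(np.log(np.diag(cov))) - logdet)

# Independent components: TC ≈ 0 (up to float rounding).
print(gaussian_total_correlation(np.diag([1.0, 2.0, 3.0])))
# Correlated components: TC > 0.
print(gaussian_total_correlation([[1.0, 0.8], [0.8, 1.0]]))
```

Minimizing this quantity is what "maximally disentangled" asks for in the Gaussian case.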

  4. Information Bottleneck Lagrangian
     Minimal sufficient representations for deep learning. A minimal sufficient representation is the solution to:
         minimize over p(z|x):  I(x; z)   subject to  H(y|z) = H(y|x)
     Information Bottleneck Lagrangian:
         L = H_{p,q}(y|z) + β I(z; x)
     where the first term is the cross-entropy and the second acts as a regularizer. The trade-off between sufficiency and minimality is regulated by the parameter β.
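Since I(z; x) is intractable in general, implementations typically minimize a variational upper bound instead. A sketch under the common assumption of a diagonal-Gaussian encoder p(z|x) = N(μ(x), diag(σ(x)²)) with a standard-normal variational marginal, so that I(z; x) ≤ E_x KL(N(μ, σ²) ‖ N(0, I)); this specific bound is a standard variational choice, not spelled out on the slide:

```python
import numpy as np

def ib_lagrangian(logits, labels, mu, sigma, beta):
    """L = H_{p,q}(y|z) + beta * KL bound on I(z; x).

    logits: (N, C) classifier outputs from sampled z; labels: (N,) ints;
    mu, sigma: (N, D) diagonal-Gaussian encoder parameters.
    Names and the Gaussian choice are illustrative assumptions.
    """
    # Cross-entropy of the softmax classifier head.
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(labels)), labels].mean()
    # KL( N(mu, sigma^2) || N(0, 1) ) summed over dims, averaged over batch.
    kl = 0.5 * (mu**2 + sigma**2 - 2.0 * np.log(sigma) - 1.0).sum(axis=1).mean()
    return ce + beta * kl
```

Sweeping β traces out the sufficiency/minimality trade-off: β = 0 recovers plain cross-entropy training, large β squeezes the representation harder.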

  5. Invariant if and only if minimal
     We only need to enforce minimality (easy) to gain invariance (difficult).
     Proposition (Achille and Soatto, 2017). Let z be a sufficient representation and n a nuisance. Then
         I(z; n) ≤ I(z; x) − I(x; y)
     where I(z; n) measures invariance, I(z; x) measures minimality, and I(x; y) is a constant of the task. Moreover, there exists a nuisance n for which equality holds. A representation is maximally insensitive to all nuisances iff it is minimal.
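The bound can be checked numerically on a toy discrete task. In this illustrative setup (mine, not the lecture's) x = (y, n) with y and n independent uniform bits, and we compare the non-minimal sufficient representation z = x against the minimal one z = y:

```python
import numpy as np

def mutual_information(joint):
    """I(A; B) in bits from a joint probability table p(a, b)."""
    joint = np.asarray(joint, dtype=float)
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float((joint[mask] * np.log2(joint[mask] / (pa @ pb)[mask])).sum())

# z = x (sufficient but not minimal): x = (y, n) takes 4 equiprobable values.
p_zn_full = np.array([[0.25, 0.0],   # p(z, n): z carries n exactly
                      [0.0, 0.25],
                      [0.25, 0.0],
                      [0.0, 0.25]])
p_zx_full = np.eye(4) * 0.25         # p(z, x): z = x
p_xy = np.array([[0.25, 0.0],        # p(x, y): y is the first bit of x
                 [0.25, 0.0],
                 [0.0, 0.25],
                 [0.0, 0.25]])

i_zn = mutual_information(p_zn_full)  # 1 bit of nuisance leaks into z
i_zx = mutual_information(p_zx_full)  # 2 bits: z is maximally non-minimal
i_xy = mutual_information(p_xy)       # 1 bit of task information
assert i_zn <= i_zx - i_xy + 1e-9     # bound holds, here with equality
```

For the minimal sufficient z = y both sides collapse to zero (I(z; n) = 0, I(z; x) = I(x; y) = 1 bit): shrinking I(z; x) toward I(x; y) is exactly what buys invariance.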

  6. Corollary: ways of enforcing invariance
     The standard architecture alone already promotes invariant representations.
     Regularization by architecture: reducing dimension (max-pooling) or adding noise (dropout) increases minimality and invariance.
     1. Only nuisance information I(x; n) is dropped in a bottleneck; task information I(x; y) is preserved (sufficiency).
     2. Increasingly more minimal implies increasingly more invariant to nuisances.
     3. The classifier cannot overfit to nuisances.
     Stacking layers: stacking multiple layers makes the representation increasingly minimal.
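The two architectural mechanisms named above can be sketched in a few lines (toy numpy versions, not the lecture's code): max-pooling shrinks the number of activations any information must pass through, and dropout corrupts what remains with multiplicative noise, both capping I(x; z).

```python
import numpy as np

rng = np.random.default_rng(0)

def max_pool_1d(x, k=2):
    """Dimensionality bottleneck: keep only the max of each window of k activations."""
    return x[: len(x) // k * k].reshape(-1, k).max(axis=1)

def dropout(x, p=0.5, rng=rng):
    """Noise bottleneck: multiplicative Bernoulli noise lowers I(x; z)."""
    return x * (rng.random(x.shape) >= p) / (1.0 - p)

x = np.arange(8.0)
z = max_pool_1d(x)   # 8 activations -> 4: fewer bits can get through
z = dropout(z)       # noise destroys part of what survived the pooling
```

Neither operation knows what the task is; minimality is enforced blindly, and by the proposition above that is enough to gain invariance.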

  7. Information Dropout: a Variational Bottleneck
     Creating a soft bottleneck with controlled noise:
         L = H_{p,q}(y|z) + β I(z; x) = H_{p,q}(y|z) − β log α(x)
     where the second term acts as the bottleneck: the representation is computed with multiplicative log-normal noise ε, log ε ∼ N(0, α(x)²), so a larger α(x) discards more nuisance information I(x; n) while preserving task information I(x; y).
     Achille and Soatto, "Information Dropout: Learning Optimal Representations Through Noisy Computation", PAMI 2018 (arXiv 2016)
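A minimal sketch of such a layer, assuming α(x) is produced by a small learned head and passed in here directly (the function name and interface are mine, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

def information_dropout(f_x, log_alpha, rng=rng):
    """Soft bottleneck: z = f(x) * eps with log eps ~ N(0, alpha(x)^2).

    f_x: deterministic activations f(x); log_alpha: log of the learned,
    input-dependent noise scale alpha(x), same shape as f_x.
    Returns the noisy representation z and the per-unit bottleneck
    penalty -log alpha(x); the training loss adds beta * penalty.
    """
    alpha = np.exp(log_alpha)
    eps = np.exp(alpha * rng.standard_normal(f_x.shape))  # log-normal noise
    penalty = -log_alpha  # larger alpha => more noise => smaller I(z; x)
    return f_x * eps, penalty
```

Because α depends on x, the network can inject heavy noise on nuisance-dominated units and almost none on task-relevant ones, which is what produces the layer-by-layer filtering shown on the next slide.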

  8. Learning invariant representations (Achille and Soatto, 2017)
     Deeper layers filter increasingly more nuisances, and a stronger bottleneck means more filtering: only the informative part of the image is retained; other information is discarded.
     Achille and Soatto, "Information Dropout: Learning Optimal Representations Through Noisy Computation", PAMI 2018 (arXiv 2016)
