CS 103: Representation Learning, Information Theory and Control


SLIDE 1

CS 103: Representation Learning, Information Theory and Control

Lecture 6, Feb 15, 2019

SLIDE 2

VAEs and disentanglement

Assuming a factorized prior for z, a β-VAE optimizes both for the IB Lagrangian and for disentanglement.

Achille and Soatto, "Information Dropout: Learning Optimal Representations Through Noisy Computation", PAMI 2018 (arXiv 2016)

A β-VAE minimizes the loss function (assuming a factorized prior, the KL term enforces minimality, and the total correlation term TC(z) enforces disentanglement):

L = H_{p,q}(x|z) + β E_x[KL(q(z|x) ∥ p(z))] = H_{p,q}(x|z) + β {I(z; x) + TC(z)}
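The first form of the loss can be sketched in a few lines of NumPy, assuming a diagonal-Gaussian encoder q(z|x) = N(μ(x), diag(exp(logvar(x)))) and a standard-normal factorized prior p(z) = N(0, I); the function names are ours, not from the slides:

```python
import numpy as np

def kl_to_std_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def beta_vae_loss(recon_nll, mu, logvar, beta=4.0):
    """L = H_{p,q}(x|z) + beta * E_x[ KL(q(z|x) || p(z)) ], averaged over a batch."""
    return float(np.mean(recon_nll + beta * kl_to_std_normal(mu, logvar)))
```

With μ = 0 and logvar = 0 the KL term vanishes and the loss reduces to the reconstruction term; β = 1 recovers the plain VAE.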

SLIDE 3

Learning disentangled representations

(Higgins et al., 2017, Burgess et al., 2017)

Start with a very high β and slowly decrease it during training.
Beginning: very strict bottleneck, only the most important factor is encoded.
End: very large bottleneck, all remaining factors are encoded.
Think of it as a non-linear PCA, where training time disentangles the factors.
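The annealing described above can be written as a simple schedule; a minimal sketch, where the start/end values and the geometric decay are our assumptions (the slides only say "very high" decreasing slowly):

```python
def beta_schedule(step, total_steps, beta_start=100.0, beta_end=1.0):
    """Geometric decay of beta from a very strict to a loose bottleneck."""
    t = min(max(step / total_steps, 0.0), 1.0)  # training progress in [0, 1]
    return beta_start * (beta_end / beta_start) ** t
```

Early steps see a large β (only the dominant factor survives the bottleneck); by the end β is small and the remaining factors are encoded.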

[Figure: traversals of individual components of the representation z for different image seeds]

SLIDE 4

Learning disentangled representations

(Higgins et al., 2017, Burgess et al., 2017)

[Figure: traversals of individual components of the representation z for different image seeds]

Higgins et al., "β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework", 2017; Burgess et al., "Understanding Disentangling in β-VAE", 2017

Each component of the learned representation corresponds to a different semantic factor.

Pictures courtesy of Higgins et al., Burgess et al.

SLIDE 5

Multiple Objects

Attend, Infer, Repeat (Eslami et al.)

Multi-Entity VAE (Nash et al.)

SLIDE 6

Is the representation “semantic” and domain invariant?

Achille et al., Life-Long Disentangled Representation Learning with Cross-Domain Latent Homologies, 2018

SLIDE 7

The standard architecture alone already promotes invariant representations


Corollary: Ways of enforcing invariance

1. Regularization by architecture: reducing dimension (max-pooling) or adding noise (dropout) increases minimality and invariance.
2. Stacking layers: stacking multiple layers makes the representation increasingly minimal, and increasingly more minimal implies increasingly more invariant to nuisances.
3. Sufficiency: only nuisance information I(x; n) is dropped in the bottleneck, while task information I(x; y) is preserved, so the classifier cannot overfit to nuisances.
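The claim that adding noise increases minimality can be illustrated on the simplest possible case, a scalar Gaussian channel, where the mutual information has a closed form; this toy example is our own, not from the slides:

```python
import numpy as np

def gaussian_channel_mi(signal_var, noise_var):
    """I(x; z) in nats for z = x + n, with x ~ N(0, signal_var), n ~ N(0, noise_var)."""
    return 0.5 * np.log(1.0 + signal_var / noise_var)
```

Increasing the noise variance strictly decreases I(x; z): the representation z becomes more minimal, hence more invariant to anything in x beyond the signal.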

SLIDE 8

Creating a soft bottleneck with controlled noise


Information Dropout: a Variational Bottleneck

[Figure: a bottleneck layer in the network separates task information I(x; y) from nuisance information I(x; n)]

Multiplicative noise ε ~ log N(0, τ(x))

Achille and Soatto, "Information Dropout: Learning Optimal Representations Through Noisy Computation", PAMI 2018 (arXiv 2016)

L = H_{p,q}(y|x) + E_x[KL(p(z|x) ∥ q(z))] = H_{p,q}(y|x) + E_x[−log |Σ(x)|]

(the second term is the average log-variance of the noise)
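The forward pass with multiplicative log-normal noise can be sketched as follows; `log_alpha` (the learned log-scale of the noise) and the function name are our own labels for illustration:

```python
import numpy as np

def information_dropout(h, log_alpha, rng):
    """Multiply activations h by noise eps ~ logNormal(0, alpha^2).

    alpha -> 0 recovers a deterministic layer; a larger alpha injects
    more noise, i.e. a tighter information bottleneck on this layer.
    """
    alpha = np.exp(log_alpha)
    eps = np.exp(alpha * rng.standard_normal(h.shape))
    return h * eps
```

Unlike standard dropout, the noise level is an input-dependent quantity the network can learn, so the bottleneck tightens exactly where the activations carry nuisance information.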

SLIDE 9

Learning invariant representations

Deeper layers filter increasingly more nuisances; a stronger bottleneck means more filtering.

[Figure: feature maps keep only the informative part of the image; other information is discarded]

(Achille and Soatto, 2017)

Achille and Soatto, "Information Dropout: Learning Optimal Representations Through Noisy Computation", PAMI 2018 (arXiv 2016)

SLIDE 10

The catch

[Figure: the chain x → z → y, where an image x (24,576 bits) is mapped to its binary index z in the training set (16 bits, e.g. 0000000000000000, 0000000000000001, …) and then to its label y (4 bits)]

What if we just represent an image by its index in the training set (or by a unique hash)? It is a sufficient representation and it is close to minimal.
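The degenerate "representation" described above is easy to write down; a hypothetical sketch using a hash as the unique index (the function name and 16-bit code size are our illustrative choices):

```python
import hashlib

def index_representation(image_bytes, n_bits=16):
    """Map an image to a (near-unique) n-bit code via a hash.

    Sufficient on the training set (the index identifies the image, hence
    its label) and nearly minimal (16 bits), yet it generalizes to nothing:
    the code shares no structure with unseen images.
    """
    digest = hashlib.sha256(image_bytes).digest()
    return int.from_bytes(digest[:2], "big") % (1 << n_bits)
```

This is exactly why sufficiency plus minimality on the training data alone cannot be the whole story, which motivates the next slide.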

SLIDE 11

This Information Bottleneck is wishful thinking

The IB is a statement of desire for future data we do not have:

min_{q(z|x)} L = H_{p,q}(y|z) + β I(z; x)

What we have is the data collected in the past. What is the best way to use the past data in view of future tasks?

SLIDE 12

[Figure: training data {…} with labels (car, horse, deer, …) determine the weights; at test time we want an invariant representation]