SLIDE 1

The committee machine: Computational to statistical gaps in learning a two-layers neural network

Benjamin Aubin, Antoine Maillard, Jean Barbier, Nicolas Macris, Florent Krzakala & Lenka Zdeborová

Benjamin Aubin - Institut de Physique Théorique, NeurIPS 2018

SLIDES 2-9

« Can we efficiently learn a teacher network from a limited number of samples? »

๏ Teacher: draws $n$ i.i.d. samples $(X_i)_{i=1}^n$ with $p$ features each and produces outputs $Y_i^\star$ through a two-layers network: first-layer weights $W^\star \in \mathbb{R}^{p \times K}$, activation $f^{(1)}$ on the $K$ hidden units, and a fixed second layer $W^{(2)}$ with output activation $f^{(2)}$.

๏ Student: a network with the same architecture (same $f^{(1)}$, $f^{(2)}$, same fixed $W^{(2)}$) that must infer the unknown weights $W$ from the pairs $(X_i, Y_i^\star)_{i=1}^n$.

[Figure: teacher-student diagram; the teacher maps $X_i \mapsto Y_i^\star$ through $W^\star$, and the student estimates $W$ from the same data. A data-generation sketch follows below.]

✓ Committee machine: second layer fixed [Schwarze'93]
✓ i.i.d. samples
✓ Learning task possible?
✓ Computational complexity?
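To make the setup concrete, here is a minimal data-generation sketch for the case studied in the plots later in the deck ($f^{(1)} = f^{(2)} = \mathrm{sign}$, Gaussian inputs and teacher weights, second layer fixed to all ones); the function name and interface are illustrative assumptions, not the paper's code.

```python
import numpy as np

def teacher_committee(n, p, K, seed=0):
    """Sketch of the teacher: sign activations, second layer fixed to ones.

    Assumes Gaussian i.i.d. inputs and teacher weights; use an odd K so
    the vote of the hidden units cannot tie.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))        # n samples, p features
    W_star = rng.standard_normal((p, K))   # teacher weights W* in R^{p x K}
    hidden = np.sign(X @ W_star)           # f^(1) = sign, K hidden units
    Y = np.sign(hidden.sum(axis=1))        # f^(2) = sign of the fixed vote
    return X, Y, W_star

# Example: n = alpha * K * p samples for alpha = 4, K = 3, p = 100.
X, Y, W_star = teacher_committee(n=4 * 3 * 100, p=100, K=3)
```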

SLIDES 10-12

Motivation

➡ Traditional approach
๏ Worst-case scenario / PAC bounds: VC dimension & Rademacher complexity
๏ Numerical experiments

➡ Complementary approach
✓ Revisit the statistical-physics typical-case scenario [Sompolinsky'92, Mezard'87]: i.i.d. data coming from a probabilistic model
✓ Theoretical understanding of the generalization performance
✓ Regime: $p \to \infty$, $n/p = \Theta(1)$

SLIDES 13-18

Main result (1) - Generalization error

๏ Information-theoretically optimal generalization error (Bayes-optimal case), which becomes explicit in the limit:

$$\epsilon_g^{(p)} \equiv \tfrac{1}{2}\,\mathbb{E}_{X,W^\star}\!\left[\big(\mathbb{E}_{W|X}\!\left[Y(XW)\right] - Y^\star(XW^\star)\big)^2\right] \;\xrightarrow[p\to\infty]{}\; \epsilon_g(q^*)$$

๏ $q^*$: obtained by extremizing the variational formulation of this mutual information:

$$\lim_{p\to\infty} \frac{1}{p} I(W; \mathbf{Y} | \mathbf{X}) = -\sup_{r \in S_K^+}\,\inf_{q \in S_K^+} \left\{ \psi_{P_0}(r) + \alpha\,\Psi_{\rm out}(q) - \tfrac{1}{2}\,{\rm Tr}(rq) \right\} + {\rm cst}$$

The heuristic replica expression for the mutual information has been well known in statistical physics since the 80's.

✓ Main contribution: rigorous proof by adaptive (Guerra) interpolation
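To spell out the step connecting the two bullets above: $q^*$ is the $q$-component of the saddle point, and stationarity of the bracket in the sup-inf gives coupled fixed-point equations. Schematically (exact factors and conventions as in the paper):

$$q = 2\,\nabla_r\,\psi_{P_0}(r), \qquad r = 2\,\alpha\,\nabla_q\,\Psi_{\rm out}(q).$$

These are the same critical points that govern the state evolution of the AMP algorithm on the next slides.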

SLIDES 19-22

Main result (2) - Message Passing Algorithm

๏ Traditional approach:
  • Minimize a loss function. Not optimal for a limited number of samples.

๏ Approximate Message Passing (AMP) algorithm:
  • Expansion of the BP equations on a factor graph. Closed set of iterative equations. Estimates the marginal probabilities $m_j(w_j)$.

[Figure: factor graph representation of the committee machine, with messages $m_{j \to i}(w_j)$ between the output factors $P_{\rm out}(Y_i | X_i W)$, the weight variables $w_j$, and the prior factors $P_0(w_j)$.]

✓ Conjectured to be optimal among polynomial algorithms
✓ Can be tracked rigorously (state evolution given by the critical points of the replica mutual information) [Montanari-Bayati '10]
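The committee-machine AMP itself is given in the paper and the repository linked at the end. As a hedged illustration of the structural ingredients named above (a closed set of iterative equations: a denoising step plus an Onsager correction), here is textbook AMP for the simpler sparse linear model $y = Ax + \text{noise}$, not the paper's algorithm:

```python
import numpy as np

def soft_threshold(x, t):
    """Denoiser eta(x; t) = sign(x) * max(|x| - t, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def amp_sparse_linear(A, y, theta=1.0, iters=30):
    """Textbook AMP for y = A x + noise, A_ij ~ N(0, 1/n), x sparse.

    Illustrative only (not the committee-machine AMP): shows the shared
    structure of a denoising step plus an Onsager correction term.
    """
    n, p = A.shape
    x = np.zeros(p)
    z = y.copy()
    for _ in range(iters):
        pseudo = x + A.T @ z                    # effective AWGN observation of x
        x_new = soft_threshold(pseudo, theta)   # denoise with the sparse prior
        # Onsager term: (p/n) * z * average derivative of the denoiser.
        z = y - A @ x_new + (p / n) * z * np.mean(np.abs(pseudo) > theta)
        x = x_new
    return x
```

Roughly speaking, the committee-machine version replaces this scalar denoiser by $K$-dimensional updates derived from the prior $P_0$ and the channel $P_{\rm out}$.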

SLIDES 23-32

Large number of hidden units, $K = \Theta_p(1)$ - Gaussian weights, sign activation

[Figure, built up across slides 24-32: generalization error $\epsilon_g(\alpha)$ (from 0.0 to 0.5) versus $\alpha$ = (# of samples)/(# hidden units × input size) (from 2 to 14). Curves: Bayes-optimal $\epsilon_g(\alpha)$ and AMP $\epsilon_g(\alpha)$. Annotations revealed in order: the non-specialized hidden-units phase, a discontinuous specialization transition, the specialized hidden-units phase, and finally the computational gap between the AMP and Bayes-optimal curves.]
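A side note for intuition on why the asymptotic error is a function of a single overlap $q^*$, as in $\epsilon_g(q^*)$ above: in the simplest $K = 1$, sign-activation case, a student and teacher whose pre-activations are jointly Gaussian with correlation $q$ disagree in sign with probability $\arccos(q)/\pi$. A minimal Monte Carlo check of that classical identity (illustrative, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
q = 0.7                                   # overlap between teacher and student
u = rng.standard_normal(1_000_000)        # teacher pre-activation
v = q * u + np.sqrt(1 - q**2) * rng.standard_normal(1_000_000)  # student
mc = np.mean(np.sign(u) != np.sign(v))    # empirical sign-disagreement rate
print(mc, np.arccos(q) / np.pi)           # both approximately 0.253
```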

Poster #111

TO KNOW MORE: https://github.com/benjaminaubin/TheCommitteeMachine