The committee machine: Computational to statistical gaps in learning a two-layers neural network. Benjamin Aubin, Antoine Maillard, Jean Barbier, Nicolas Macris, Florent Krzakala & Lenka Zdeborová. Benjamin Aubin - Institut de Physique Théorique.


  1. The committee machine: Computational to statistical gaps in learning a two-layers neural network. Benjamin Aubin, Antoine Maillard, Jean Barbier, Nicolas Macris, Florent Krzakala & Lenka Zdeborová. Benjamin Aubin - Institut de Physique Théorique, NeurIPS 2018.

  2–9. « Can we efficiently learn a teacher network from a limited number of samples? »
 ๏ Teacher: a two-layer network with p input features, K hidden units and a single output. It generates n samples (X_i, Y_i*)_{i=1}^n with Y_i* = f^(2)( f^(1)( X_i W* ) ), where f^(1) acts componentwise on the K hidden units, the second-layer map f^(2) is fixed, and W* ∈ R^{p×K} (see the data-generation sketch below).
 ✓ Committee machine: the second layer is fixed [Schwarze’93]; only the first-layer weights are learned.
 ✓ The samples are i.i.d.
 ๏ Student: a network with the same architecture (f^(1), f^(2), fixed second layer W^(2)) observes the samples (X_i, Y_i)_{i=1}^n and tries to recover W*.
 ✓ Is the learning task possible?
 ✓ At what computational complexity?
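
To make the teacher-student setup above concrete, here is a minimal data-generation sketch, assuming sign activations for both layers and i.i.d. Gaussian inputs and teacher weights; the function name and the 1/sqrt(p) scaling are illustrative choices, not taken from the paper.

```python
# Minimal sketch of the teacher committee machine, assuming sign activations
# on both layers and i.i.d. Gaussian inputs/teacher weights; names and scaling
# are illustrative, not taken from the paper's code.
import numpy as np

def sample_teacher_data(n, p, K, rng):
    """Draw n i.i.d. samples (X_i, Y_i) from a sign committee-machine teacher."""
    W_star = rng.standard_normal((p, K))          # teacher first-layer weights W* in R^{p x K}
    X = rng.standard_normal((n, p)) / np.sqrt(p)  # i.i.d. Gaussian inputs, scaled so X_i W* = O(1)
    hidden = np.sign(X @ W_star)                  # first layer f^(1) = sign, K hidden units
    Y = np.sign(hidden.sum(axis=1))               # fixed second layer: majority vote of the hidden units
    return X, Y, W_star

rng = np.random.default_rng(0)
X, Y, W_star = sample_teacher_data(n=2000, p=500, K=3, rng=rng)
# A student with the same architecture only sees (X, Y) and tries to recover W*.
```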

  10–12. Motivation
 ➡ Traditional approach:
 ๏ Worst-case scenario / PAC bounds: VC dimension & Rademacher complexity.
 ๏ Numerical experiments.
 ➡ Complementary approach:
 ✓ Revisit the statistical-physics typical-case scenario [Sompolinsky’92, Mézard’87]: i.i.d. data coming from a probabilistic model.
 ✓ Theoretical understanding of the generalization performance.
 ✓ Regime: p → ∞ with n/p = Θ(1).

  13–15. Main result (1) - Generalization error
 ๏ Information-theoretically optimal generalization error (Bayes-optimal case):
   \[ \epsilon_g^{(p)} \;\equiv\; \tfrac{1}{2}\, \mathbb{E}_{X, W^\star}\!\left[ \left( \mathbb{E}_{W \mid X, Y}\big[\, Y(XW) \,\big] - Y^\star(XW^\star) \right)^{\!2} \right] \;\xrightarrow[p \to \infty]{}\; \epsilon_g(q^\star) \]
 ๏ The limiting error \epsilon_g(q^\star) is EXPLICIT (a Monte Carlo sketch of the error on fresh data follows below).
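
As a complement, the generalization error above can be estimated empirically for any estimator Ŵ of the teacher weights by Monte Carlo on fresh inputs. The sketch below assumes the sign activations of the earlier example and is a plug-in estimate, not the paper's closed-form expression for ε_g(q*).

```python
# Sketch: empirical generalization error of an estimator W_hat of the teacher
# weights, evaluated on fresh Gaussian inputs. Plug-in analogue of the
# Bayes-optimal error above, not the closed-form epsilon_g(q*).
import numpy as np

def committee_output(X, W):
    """Two-layer committee machine with sign activations (as in the sketch above)."""
    return np.sign(np.sign(X @ W).sum(axis=1))

def generalization_error(W_hat, W_star, n_test, rng):
    p = W_star.shape[0]
    X_new = rng.standard_normal((n_test, p)) / np.sqrt(p)   # fresh test inputs
    Y_star = committee_output(X_new, W_star)                 # teacher labels
    Y_hat = committee_output(X_new, W_hat)                   # student predictions
    return 0.5 * np.mean((Y_hat - Y_star) ** 2)              # 1/2 E[(Y_hat - Y*)^2]

rng = np.random.default_rng(1)
# e.g. generalization_error(W_hat, W_star, n_test=10_000, rng=rng)
```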

  16–18. Main result (1) - Generalization error (continued)
 ๏ q^\star extremizes the variational formulation of the mutual information (with α = n/p):
   \[ \lim_{p \to \infty} \frac{1}{p}\, I(W; Y \mid X) \;=\; -\sup_{q \in S_K^+} \inf_{r \in S_K^+} \left\{ \psi_{P_0}(r) + \alpha\, \Psi_{\mathrm{out}}(q) - \tfrac{1}{2}\, \mathrm{Tr}(rq) \right\} + \mathrm{cst} \]
 ๏ This heuristic replica formula for the mutual information has been well known in statistical physics since the 1980s (a toy sketch of the sup-inf structure follows below).
 ✓ Main contribution: a rigorous proof by adaptive (Guerra) interpolation.
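
Purely to illustrate the structure of this extremization (a sup over q followed by an inf over r, both ranging over K×K positive semidefinite matrices), here is a toy numerical sketch. The potentials psi_P0 and Psi_out below are simple quadratic stand-ins, not the paper's prior and channel potentials; only the nested optimization structure is meant to carry over.

```python
# Structural sketch of the sup-inf in the replica formula: extremize over
# K x K positive semidefinite matrices q and r. The potentials below are
# TOY stand-ins, not the paper's actual psi_P0 / Psi_out.
import numpy as np
from scipy.optimize import minimize

K, alpha = 2, 1.5

def psi_P0(r):          # placeholder prior potential (toy quadratic)
    return 0.25 * np.trace(r @ r)

def Psi_out(q):         # placeholder channel potential (toy linear)
    return 0.5 * np.trace(q)

def to_psd(x):          # parametrize a PSD matrix as L L^T from a flat vector
    L = x.reshape(K, K)
    return L @ L.T

def inner_inf(q):       # inf over r of the bracketed functional, at fixed q
    obj = lambda x: psi_P0(to_psd(x)) + alpha * Psi_out(q) - 0.5 * np.trace(to_psd(x) @ q)
    return minimize(obj, x0=np.eye(K).ravel(), method="Nelder-Mead").fun

def outer_objective(x): # minimize the negative to realize the sup over q
    return -inner_inf(to_psd(x))

res = minimize(outer_objective, x0=np.eye(K).ravel(), method="Nelder-Mead")
q_star = to_psd(res.x)  # extremizer q*
# Under the formula above, lim (1/p) I(W; Y | X) = res.fun + cst,
# since res.fun = -sup_q inf_r {...}.
```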

  19–22. Main result (2) - Message Passing Algorithm
 ๏ Traditional approach:
 ‣ Minimize a loss function; not optimal for a limited number of samples.
 ๏ Approximate Message Passing (AMP) algorithm:
 ‣ Expansion of the belief-propagation equations on the factor graph of the committee machine (variable nodes w_j with prior P_0(w_j), factor nodes P_out(Y_i | X_i W)), giving a closed set of iterative equations; the messages m_{j→i}(w_j) estimate the marginals m_j(w_j) (a simplified sketch follows below).
 ✓ Conjectured to be optimal among polynomial-time algorithms.
 ✓ Can be tracked rigorously: its state evolution is given by the critical points of the replica mutual information [Montanari-Bayati ’10].
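
For illustration, here is a hedged sketch of an AMP loop in the simplest K = 1 case (a single perceptron with sign output and a Gaussian prior on the weights). The paper's committee-machine AMP uses K-dimensional, matrix-valued quantities; this scalar version only conveys the structure of the iterative equations and the Onsager correction.

```python
# Hedged sketch of a generalized AMP loop for a K = 1 sign perceptron with
# standard Gaussian prior on the weights. A simplification for illustration,
# NOT the paper's committee-machine AMP.
import numpy as np
from scipy.stats import norm

def amp_sign_perceptron(X, Y, n_iter=50, damping=0.5):
    n, p = X.shape
    X2 = X ** 2
    w_hat = np.zeros(p)          # posterior-mean estimate of the weights
    v_w = np.ones(p)             # posterior-variance estimate
    g = np.zeros(n)              # output-channel scores from the previous iteration
    for _ in range(n_iter):
        # Output side: Gaussian approximation of the pre-activations z_i = X_i . w
        V = X2 @ v_w                          # variance of z_i
        omega = X @ w_hat - V * g             # mean with Onsager correction
        u = Y * omega / np.sqrt(V)
        h = norm.pdf(u) / np.clip(norm.cdf(u), 1e-12, None)
        g_new = Y * h / np.sqrt(V)            # g_out for the noiseless sign channel
        dg = -(h ** 2 + u * h) / V            # d g_out / d omega
        g = damping * g_new + (1 - damping) * g
        # Input side: denoise the weights under the Gaussian prior N(0, 1)
        Sigma = 1.0 / (X2.T @ (-dg))          # pseudo-observation variances
        R = w_hat + Sigma * (X.T @ g)         # pseudo-observations
        w_hat = damping * (R / (1.0 + Sigma)) + (1 - damping) * w_hat
        v_w = Sigma / (1.0 + Sigma)
    return w_hat

# Usage, e.g. on data generated as in the first sketch with K = 1:
# w_amp = amp_sign_perceptron(X, Y)
```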

  23. Gaussian weights - sign activation. Large number of hidden units, K = Θ_p(1).
