Probabilistic symmetry and invariant neural networks


  1. Probabilistic symmetry and invariant neural networks. Benjamin Bloem-Reddy, University of Oxford. Work with Yee Whye Teh. 14 January 2019, UBC Computer Science.

  2. Outline (B. Bloem-Reddy, 2 / 27)
     • Symmetry in neural networks
     • Permutation-invariant neural networks
     • Symmetry in probability and statistics
     • Exchangeable sequences
     • Permutation-invariant neural networks as exchangeable probability models
     • Symmetry in neural networks as probabilistic symmetry

  3. Deep learning and statistics
     • Deep neural networks have been applied successfully in a range of settings.
     • Effort is under way to improve performance in data-poor and semi-/unsupervised domains.
     • Focus here: symmetry.
     • The study of symmetry in probability and statistics has a long history.

  4. Symmetric neural networks
     A standard fully connected layer computes
     f_{ℓ,i} = σ( ∑_{j=1}^{n} w^{(ℓ)}_{i,j} f_{ℓ−1,j} ).
     For input X and output Y, model Y = h(X), where h ∈ H is a neural network.
     If X and Y are assumed to satisfy a symmetry property, how is H restricted?
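The fully connected layer update on this slide can be sketched in NumPy; the names `layer`, `f_prev`, and the particular weights are illustrative, not from the talk.

```python
import numpy as np

def layer(f_prev, W, sigma=np.tanh):
    """One fully connected layer: f_{l,i} = sigma(sum_j W[i, j] * f_{l-1, j})."""
    return sigma(W @ f_prev)

rng = np.random.default_rng(0)
f0 = rng.normal(size=4)       # input features f_{l-1}
W1 = rng.normal(size=(3, 4))  # weights w^{(l)}_{i,j}
f1 = layer(f0, W1)
print(f1.shape)  # (3,)
```

Restricting H to symmetric functions amounts to constraining the weight matrices W, as the next slides make concrete.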

  5. Symmetric neural networks
     Convolutional neural networks encode translation invariance. (Illustration from medium.freecodecamp.org.)

  6. Why symmetry? Encoding symmetry in network architecture is a Good Thing∗.
     Stabler training and better generalization through
     • reduction in dimension of parameter space through weight-tying; and
     • capturing structure at multiple scales via pooling.
     Historical note: Interest in invariant neural networks goes back at least to Minsky and Papert [MP88]; extended by Shawe-Taylor and Wood [Sha89; WS96]. More recent work by a host of others.

  7. Neural networks for permutation-invariant data [Zah+17]
     Consider a sequence X_n := (X_1, ..., X_n), X_i ∈ X.
     Permutation invariance: Y = h(X_n) = h(π · X_n) for all π ∈ S_n.

  8. Neural networks for permutation-invariant data [Zah+17]
     Consider a sequence X_n := (X_1, ..., X_n), X_i ∈ X.
     Permutation invariance: Y = h(X_n) = h(π · X_n) for all π ∈ S_n.
     Sum-pooling architecture: Y = h(X_n) ↦ Y = h̃( ∑_{i=1}^{n} φ(X_i) ).
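The sum-pooling construction Y = h̃(∑ φ(X_i)) of [Zah+17] can be checked numerically. The particular `phi` and pooling function below are illustrative choices, not the talk's; any per-element map followed by a symmetric pooling gives a permutation-invariant output.

```python
import numpy as np

def phi(x):
    # per-element embedding phi: X -> R^2 (illustrative choice)
    return np.stack([x, x**2])

def invariant_h(xs, h_tilde=np.sum):
    # Y = h~( sum_i phi(X_i) ): sum-pooling makes the output
    # invariant to any permutation of the inputs.
    return h_tilde(sum(phi(x) for x in xs))

xs = [1.0, 2.0, 3.0, 4.0]
perm = [2, 0, 3, 1]
y1 = invariant_h(xs)
y2 = invariant_h([xs[i] for i in perm])
assert np.isclose(y1, y2)  # permutation invariance
```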

  9. Neural networks for permutation-invariant data [Zah+17]
     Equivariance: Y_n = h(X_n) such that h(π · X_n) = π · h(X_n) for all π ∈ S_n.

  10. Neural networks for permutation-invariant data [Zah+17]
     Equivariance: Y_n = h(X_n) such that h(π · X_n) = π · h(X_n) for all π ∈ S_n.
     Weight-tied equivariant layer: [h(X_n)]_i = σ( ∑_{j=1}^{n} w_{i,j} X_j ) ↦ [h(X_n)]_i = σ( w_0 X_i + w_1 ∑_{j=1}^{n} X_j ).
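The weight-tied layer σ(w_0 X_i + w_1 ∑_j X_j) is easy to verify as permutation-equivariant; the weight values below are arbitrary illustrative choices.

```python
import numpy as np

def equivariant_layer(x, w0=0.5, w1=-0.2, sigma=np.tanh):
    # [h(X_n)]_i = sigma(w0 * X_i + w1 * sum_j X_j); the sum term is
    # permutation-invariant, so the whole map is equivariant.
    return sigma(w0 * x + w1 * x.sum())

x = np.array([1.0, -2.0, 3.0, 0.5])
perm = np.array([3, 1, 0, 2])
lhs = equivariant_layer(x[perm])  # h(pi . X_n)
rhs = equivariant_layer(x)[perm]  # pi . h(X_n)
assert np.allclose(lhs, rhs)      # equivariance
```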

  11. Neural networks for permutation-invariant data . . .

  12. ⟨⟨ Deep learning hat, off; statistics hat, on ⟩⟩
     You could probably make some money making decent hats. Note to students: these were the first Google Image results for "deep learning hat" and "statistics hat".

  13. Statistical models and symmetry
     Consider a sequence X_n := (X_1, ..., X_n), X_i ∈ X.
     A statistical model of X_n is a family of probability distributions on X^n: P = { P_θ : θ ∈ Ω }.
     If X_n is assumed to satisfy a symmetry property, how is P restricted?

  14. Exchangeable sequences
     A distribution P on X^n is exchangeable if P(X_1, ..., X_n) = P(X_{π(1)}, ..., X_{π(n)}) for all π ∈ S_n.
     X_N is infinitely exchangeable if this is true for all prefixes X_n ⊂ X_N, n ∈ N.
     de Finetti's theorem: X_N exchangeable ⇐⇒ X_i | Q ~iid Q for some random distribution Q.
     Implication for Bayesian inference: our models for X_N need only consist of i.i.d. distributions on X.
     Analogous theorems exist for other symmetries. The book by Kallenberg [Kal05] collects many of them. Some other accessible references: [Dia88; OR15].
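The de Finetti construction can be simulated directly: draw a random directing measure Q, then sample i.i.d. from it. The choice Q = Bernoulli(θ) with θ ~ Uniform[0, 1] below is one illustrative example, not the talk's; the resulting sequence is exchangeable but not i.i.d.

```python
import numpy as np

def sample_exchangeable(n, rng):
    # de Finetti: first draw the random directing measure Q, then
    # X_i | Q ~iid Q. Here Q = Bernoulli(theta), theta ~ Unif[0, 1].
    theta = rng.uniform()
    return rng.uniform(size=n) < theta

rng = np.random.default_rng(0)
x = sample_exchangeable(5, rng)
# marginally, any permutation of x has the same distribution
```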

  15. Finite exchangeable sequences
     de Finetti's theorem may fail for finite exchangeable sequences. What else can we say?
     The empirical measure of X_n is M_{X_n}(•) := ∑_{i=1}^{n} δ_{X_i}(•).

  16. Finite exchangeable sequences
     The empirical measure is a sufficient statistic: P is exchangeable iff
     P(X_n ∈ • | M_{X_n} = m) = U_m(•),
     where U_m is the uniform distribution on all sequences (x_1, ..., x_n) with empirical measure m.
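For finite X, the empirical measure is just the multiset of observed values, and U_m is a uniformly random ordering of that multiset. A small sketch, with illustrative names:

```python
import random
from collections import Counter

def empirical_measure(xs):
    # M_{X_n}: the multiset of values, forgetting order.
    return Counter(xs)

def sample_from_Um(m, rng):
    # U_m: uniform over all sequences with empirical measure m,
    # i.e. a uniformly random ordering of the multiset.
    seq = [x for x, c in m.items() for _ in range(c)]
    rng.shuffle(seq)
    return seq

rng = random.Random(0)
m = empirical_measure(["a", "b", "b", "c"])
s = sample_from_Um(m, rng)
assert Counter(s) == m  # conditioning on M preserves it
```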

  17. Finite exchangeable sequences
     The empirical measure is a sufficient statistic: P is exchangeable iff
     P(X_n ∈ • | M_{X_n} = m) = U_m(•),
     where U_m is the uniform distribution on all sequences (x_1, ..., x_n) with empirical measure m.
     Consider Y such that (π · X_n, Y) =_d (X_n, Y). The empirical measure is an adequate statistic for any such Y:
     P(Y ∈ • | X_n = x_n) = P(Y ∈ • | M_{X_n} = M_{x_n}).
     M_{X_n} contains all the information in X_n that is relevant for predicting Y.

  18. A useful theorem
     Theorem (Invariant representation; B-R, Teh). Suppose X_n is an exchangeable sequence. Then (π · X_n, Y) =_d (X_n, Y) for all π ∈ S_n if and only if there is a measurable function h̃ : [0, 1] × M(X) → Y such that
     (X_n, Y) =_a.s. (X_n, h̃(η, M_{X_n})), where η ∼ Unif[0, 1], η ⊥⊥ X_n.

  19. A useful theorem
     Theorem (Invariant representation; B-R, Teh). Suppose X_n is an exchangeable sequence. Then (π · X_n, Y) =_d (X_n, Y) for all π ∈ S_n if and only if there is a measurable function h̃ : [0, 1] × M(X) → Y such that
     (X_n, Y) =_a.s. (X_n, h̃(η, M_{X_n})), where η ∼ Unif[0, 1], η ⊥⊥ X_n.
     Deterministic invariance [Zah+17] ↦ stochastic invariance [B-R, Teh]:
     Y = h̃( ∑_{i=1}^{n} φ(X_i) ) ↦ Y = h̃( η, ∑_{i=1}^{n} δ_{X_i} ).
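The stochastic version keeps the deterministic invariant architecture but feeds it an extra noise coordinate η ~ Unif[0, 1] that is independent of X_n. A minimal sketch; the particular form of h̃ below is an illustrative choice, not from the paper.

```python
import numpy as np

def stochastic_invariant_h(xs, eta):
    # Y = h~(eta, sum_i delta_{X_i}): X_n enters only through an
    # order-free summary of the empirical measure; eta supplies the
    # randomness, independent of X_n. tanh-of-sum is illustrative.
    pooled = np.sum(xs)
    return np.tanh(pooled) + eta

rng = np.random.default_rng(0)
eta = rng.uniform()
xs = [0.3, -1.2, 0.7]
y1 = stochastic_invariant_h(xs, eta)
y2 = stochastic_invariant_h([xs[2], xs[0], xs[1]], eta)
assert np.isclose(y1, y2)  # invariant for each fixed eta
```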

  20. Another useful theorem
     Theorem (Equivariant representation; B-R, Teh). Suppose X_n is an exchangeable sequence and Y_i ⊥⊥_{X_n} (Y_n \ Y_i). Then (π · X_n, π · Y_n) =_d (X_n, Y_n) for all π ∈ S_n if and only if there is a measurable function h̃ : [0, 1] × X × M(X) → Y such that
     (X_n, Y_n) =_a.s. ( X_n, (h̃(η_i, X_i, M_{X_n}))_{i ∈ [n]} ), where η_i ~iid Unif[0, 1], (η_i)_{i ∈ [n]} ⊥⊥ X_n.

  21. Another useful theorem
     Theorem (Equivariant representation; B-R, Teh). Suppose X_n is an exchangeable sequence and Y_i ⊥⊥_{X_n} (Y_n \ Y_i). Then (π · X_n, π · Y_n) =_d (X_n, Y_n) for all π ∈ S_n if and only if there is a measurable function h̃ : [0, 1] × X × M(X) → Y such that
     (X_n, Y_n) =_a.s. ( X_n, (h̃(η_i, X_i, M_{X_n}))_{i ∈ [n]} ), where η_i ~iid Unif[0, 1], (η_i)_{i ∈ [n]} ⊥⊥ X_n.
     Deterministic equivariance [Zah+17] ↦ stochastic equivariance [B-R, Teh]:
     Y_i = σ( w_0 X_i + w_1 ∑_{j=1}^{n} X_j ) ↦ Y_i = h̃( η_i, X_i, ∑_{j=1}^{n} δ_{X_j} ).
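The stochastic equivariant form can likewise be checked numerically: attach independent noise η_i to each element, and permute inputs and noise together. The additive-noise h̃ below is an illustrative choice, not from the paper.

```python
import numpy as np

def stochastic_equivariant_layer(x, etas, w0=0.5, w1=-0.2):
    # Y_i = h~(eta_i, X_i, sum_j delta_{X_j}): here h~ adds per-element
    # noise eta_i to the deterministic equivariant layer (illustrative).
    return np.tanh(w0 * x + w1 * x.sum()) + etas

rng = np.random.default_rng(0)
x = np.array([1.0, -2.0, 3.0, 0.5])
etas = rng.uniform(size=4)
perm = np.array([3, 1, 0, 2])
# permuting the inputs AND their noise permutes the outputs
lhs = stochastic_equivariant_layer(x[perm], etas[perm])
rhs = stochastic_equivariant_layer(x, etas)[perm]
assert np.allclose(lhs, rhs)
```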

  22. Outline
     • Symmetry in neural networks
     • Permutation-invariant neural networks
     • Symmetry in probability and statistics
     • Exchangeable sequences
     • Permutation-invariant neural networks as exchangeable probability models
     • Symmetry in neural networks as probabilistic symmetry

  23. A bit of group theory
     For a group G acting on a set X:
     • The orbit of any x ∈ X is the subset of X generated by applying G to x: G · x = { g · x : g ∈ G }.
     • A maximal invariant statistic M : X → S (i) is constant on each orbit, i.e., M(g · x) = M(x) for all g ∈ G and x ∈ X; and (ii) takes a different value on each orbit, i.e., M(x_1) = M(x_2) implies x_1 = g · x_2 for some g ∈ G.
     • A maximal equivariant τ : X → G satisfies τ(g · x) = g · τ(x) for all g ∈ G, x ∈ X.
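For S_n acting on tuples by permuting coordinates, these notions are concrete: the orbit of x is the set of its rearrangements, and sorting is a maximal invariant. A small sketch with illustrative names:

```python
from itertools import permutations

def orbit(x):
    # Orbit of x under S_n acting by permuting coordinates.
    return {tuple(p) for p in permutations(x)}

def maximal_invariant(x):
    # Sorting is constant on each orbit, and equal sorted values
    # imply the tuples are rearrangements of each other.
    return tuple(sorted(x))

x1 = (3, 1, 2)
x2 = (2, 3, 1)
assert x2 in orbit(x1)
assert maximal_invariant(x1) == maximal_invariant(x2)
assert maximal_invariant((3, 1, 2)) != maximal_invariant((3, 1, 1))
```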

  24. A general invariance theorem
     Theorem (B-R, Teh). Let G be a compact group and assume that g · X =_d X for all g ∈ G. Let M : X → S be a maximal invariant. Then (g · X, Y) =_d (X, Y) for all g ∈ G if and only if there exists a measurable function h̃ : [0, 1] × S → Y such that
     (X, Y) =_a.s. ( X, h̃(η, M(X)) ), with η ∼ Unif[0, 1] and η ⊥⊥ X.

  25. Proof by picture
     P(g · X, Y) = P(X, Y) for all g ∈ G.
     [Graphical model with nodes X and Y.]

  26. Proof by picture
     P(g · X, M(g · X), Y) = P(X, M(X), Y) for all g ∈ G ⇒ Y ⊥⊥_{M(X)} X.
     [Graphical model with nodes X, M(X), and Y.]

  27. A general equivariance theorem
     Theorem (Kallenberg; B-R, Teh). Let G be a compact group and assume that g · X =_d X for all g ∈ G. Assume that a maximal equivariant τ : X → G exists. Then (g · X, g · Y) =_d (X, Y) for all g ∈ G if and only if there exists a measurable function h̃ : [0, 1] × X → Y such that
     (X, Y) =_a.s. ( X, h̃(η, X) ), with η ∼ Unif[0, 1] and η ⊥⊥ X,
     where h̃ is equivariant: h̃(η, g · X) =_a.s. g · h̃(η, X), g ∈ G.

  28. Proof by picture
     P(g · X, g · Y) = P(X, Y) for all g ∈ G.
     [Graphical model with nodes X and Y.]
