

SLIDE 1

Invariant neural networks and probabilistic symmetry

Benjamin Bloem-Reddy, University of Oxford Work with Yee Whye Teh 5 October 2018, OxWaSP Workshop

SLIDE 2

Deep learning and statistics

  • Deep neural networks have been applied successfully in a range of settings.
  • Effort under way to improve performance in data-poor and semi-/unsupervised domains.

  • Focus on symmetry.
  • The study of symmetry in probability and statistics has a long history.
  • B. Bloem-Reddy

2 / 20

SLIDE 3

Symmetric neural networks

fℓ,i = σ( ∑_{j=1}^n w^{(ℓ)}_{i,j} fℓ−1,j )

For input X and output Y, model Y = h(X), where h ∈ H is a neural network. If X and Y are assumed to satisfy a symmetry property, how is H restricted?
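The slide's generic layer can be sketched in a few lines of numpy. The dimensions, weights, and choice σ = tanh below are illustrative assumptions; the point is that an unrestricted h ∈ H has no built-in symmetry.

```python
import numpy as np

# Generic fully-connected layer from the slide:
# f_{l,i} = sigma( sum_j w^(l)_{i,j} f_{l-1,j} ), with sigma = tanh here.
rng = np.random.default_rng(6)

def dense_layer(W, f_prev):
    return np.tanh(W @ f_prev)

f0 = rng.normal(size=5)          # input X
W1 = rng.normal(size=(4, 5))     # unrestricted weights
f1 = dense_layer(W1, f0)
assert f1.shape == (4,)

# With unrestricted weights, permuting the input changes the output:
# a generic h in H respects no symmetry of X.
perm = np.array([1, 0, 2, 3, 4])
assert not np.allclose(dense_layer(W1, f0[perm]), f1)
```

Restricting H to symmetric networks, as in the following slides, is exactly a constraint on the weight matrices.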


SLIDE 4

Symmetric neural networks

Convolutional neural networks encode translation invariance:

Illustration from medium.freecodecamp.org

  • B. Bloem-Reddy

4 / 20

SLIDE 5

Why symmetry?

Encoding symmetry in network architecture is a Good Thing∗, i.e., it results in more stable training and better generalization through

  • reduction in dimension of parameter space through weight-tying; and
  • capturing structure at multiple scales via pooling.

∗ Oft-stated “fact”. Mostly supported by heuristics and intuition, some empirical evidence, loose connections to learning theory and what we “know” about high-dimensional data analysis. Some PAC theory to this end [Sha91; Sha95]; I haven’t found anything else.


SLIDE 6

Neural networks for permutation-invariant data [Zah+17]

Consider a sequence X[n] := (X1, . . . , Xn), Xi ∈ X.

Invariance: Y = h(X[n]) = h(π · X[n]) for all π ∈ Sn.

[Figure: inputs X1, X2, X3, X4 feeding into a single output Y.]


SLIDE 7

Neural networks for permutation-invariant data [Zah+17]

Consider a sequence X[n] := (X1, . . . , Xn), Xi ∈ X.

Invariance: Y = h(X[n]) = h(π · X[n]) for all π ∈ Sn.

[Figure: the same network, with weights tied across the inputs Xi.]

Y = h(X[n]) → Y = h̃( ∑_{i=1}^n φ(Xi) )
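The sum-pooled form of [Zah+17] can be checked directly. The weight shapes and the choice φ = tanh of a linear map below are illustrative assumptions, not from the talk:

```python
import numpy as np

# Deep Sets invariant model: Y = h_tilde( sum_i phi(X_i) ).
rng = np.random.default_rng(0)
W_phi = rng.normal(size=(3, 5))   # per-element embedding phi: R^3 -> R^5
W_out = rng.normal(size=(5,))     # readout h_tilde: R^5 -> R

def phi(x):
    return np.tanh(x @ W_phi)

def invariant_net(X):
    # Sum-pooling over elements makes the output permutation invariant.
    pooled = phi(X).sum(axis=0)
    return float(W_out @ pooled)

X = rng.normal(size=(4, 3))       # a set of n = 4 elements of R^3
perm = rng.permutation(4)
assert np.isclose(invariant_net(X), invariant_net(X[perm]))
```

Because addition is commutative, the pooled representation, and hence Y, is the same for every ordering of the inputs.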


SLIDE 8

Neural networks for permutation-invariant data [Zah+17]

Consider a sequence X[n] := (X1, . . . , Xn), Xi ∈ X.

Equivariance: Y[n] = h(X[n]) such that h(π · X[n]) = π · h(X[n]) for all π ∈ Sn.

[Figure: inputs X1, . . . , X4 mapped to outputs Y1, . . . , Y4.]


SLIDE 9

Neural networks for permutation-invariant data [Zah+17]

Consider a sequence X[n] := (X1, . . . , Xn), Xi ∈ X.

Equivariance: Y[n] = h(X[n]) such that h(π · X[n]) = π · h(X[n]) for all π ∈ Sn.

[Figure: the same network, before and after weight-tying.]

[h(X[n])]i = σ( ∑_{j=1}^n wi,j Xj ) → [h(X[n])]i = σ( w0 Xi + w1 ∑_{j=1}^n Xj )
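The weight-tied layer on the right is easy to verify numerically. The scalar weights and σ = tanh below are illustrative assumptions:

```python
import numpy as np

# Permutation-equivariant layer of [Zah+17]:
# [h(X)]_i = sigma( w0 * X_i + w1 * sum_j X_j ).
w0, w1 = 0.7, -0.2

def equivariant_layer(X):
    # Weight-tying: all positions share (w0, w1); the only cross-talk
    # between elements is the permutation-invariant sum.
    return np.tanh(w0 * X + w1 * X.sum(axis=0, keepdims=True))

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 3))
perm = rng.permutation(4)
# Permuting the input permutes the output the same way: h(pi·X) = pi·h(X).
assert np.allclose(equivariant_layer(X[perm]), equivariant_layer(X)[perm])
```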


SLIDE 10

Neural networks for permutation-invariant data

. . .


SLIDE 11

⟨⟨Deep learning hat, off; statistics hat, on⟩⟩

Note to students: These were the first Google Image results for ”deep learning hat” and ”statistics hat”. You could probably make some money making decent hats.


SLIDE 12

Statistical models and symmetry

Consider a sequence X[n] := (X1, . . . , Xn), Xi ∈ X. A statistical model of X[n] is a family of probability distributions on Xⁿ:

P = {Pθ : θ ∈ Ω}.

If X is assumed to satisfy a symmetry property, how is P restricted?


SLIDE 13

Exchangeable sequences

A distribution P on Xⁿ is exchangeable if P(X1, . . . , Xn) = P(Xπ(1), . . . , Xπ(n)) for all π ∈ Sn. XN is infinitely exchangeable if this is true for all prefixes X[n] ⊂ XN, n ∈ N.

de Finetti’s theorem: XN is infinitely exchangeable ⇐⇒ Xi | Q ∼iid Q for some random Q.

Our models for XN need only consist of i.i.d. distributions on X. Analogous theorems hold for other symmetries; the book by Kallenberg [Kal05] collects many of them. Some other accessible references: [Dia88; OR15].
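de Finetti's theorem is exactly a sampling recipe: draw a random distribution Q, then draw the Xi i.i.d. from it. The sketch below uses a Beta prior over a Bernoulli parameter as an illustrative assumption, and checks that order-equivalent patterns are (approximately) equally likely:

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(2)

def sample_exchangeable(n):
    q = rng.beta(2.0, 2.0)                          # random Q = Bernoulli(q)
    return tuple(int(u < q) for u in rng.random(n)) # X_i | q iid ~ Q

counts = Counter(sample_exchangeable(3) for _ in range(200_000))
# (1,0,0), (0,1,0), (0,0,1) share an empirical measure, so an exchangeable
# law must give them equal probability.
a, b, c = counts[(1, 0, 0)], counts[(0, 1, 0)], counts[(0, 0, 1)]
assert max(a, b, c) - min(a, b, c) < 0.1 * a
```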


SLIDE 14

Finite exchangeable sequences

de Finetti’s theorem may fail for finite exchangeable sequences. What else can we say?

The empirical measure of X[n] is MX[n]( • ) := ∑_{i=1}^n δXi( • ).
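For discrete data the empirical measure is exactly a multiset of the observed values; Python's `Counter` is a convenient concrete representation:

```python
from collections import Counter

x = ("a", "b", "a", "c")
x_perm = ("c", "a", "a", "b")      # a permutation of x
M = Counter(x)                     # M_x = 2*delta_a + delta_b + delta_c
assert M == Counter(x_perm)        # the empirical measure forgets order
assert M == Counter({"a": 2, "b": 1, "c": 1})
```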


SLIDE 15

Finite exchangeable sequences

The empirical measure is sufficient:

P(X[n] ∈ • | MX[n] = m) = Um( • ),

where Um is the uniform distribution on all sequences (x1, . . . , xn) with empirical measure m.


SLIDE 16

Finite exchangeable sequences

The empirical measure is sufficient:

P(X[n] ∈ • | MX[n] = m) = Um( • ),

where Um is the uniform distribution on all sequences (x1, . . . , xn) with empirical measure m.

The empirical measure is adequate for any Y such that (π · X[n], Y) =d (X[n], Y):

P(Y ∈ • | X[n] = x[n]) = P(Y ∈ • | MX[n] = Mx[n]).

MX[n] contains all information in X[n] that is relevant for predicting Y.


SLIDE 17

A useful theorem

Suppose X[n] is an exchangeable sequence.

Invariance theorem: (π · X[n], Y) =d (X[n], Y) for all π ∈ Sn if and only if (X[n], Y) = (X[n], h̃(η, MX[n])) a.s., with h̃ a measurable function and η ∼ Unif[0, 1], η ⊥⊥ X[n].


SLIDE 18

A useful theorem

Suppose X[n] is an exchangeable sequence.

Invariance theorem: (π · X[n], Y) =d (X[n], Y) for all π ∈ Sn if and only if (X[n], Y) = (X[n], h̃(η, MX[n])) a.s., with h̃ a measurable function and η ∼ Unif[0, 1], η ⊥⊥ X[n].

Deterministic invariance [Zah+17] → stochastic invariance [this work]

[Figure: the pooled network, now with an additional noise input η feeding into Y.]

Y = h̃( ∑_{i=1}^n φ(Xi) ) → Y = h̃( η, ∑_{i=1}^n δXi )
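The stochastic form is any function of the pair (noise, empirical measure). The particular h̃ below, a noisy mean of the empirical measure, is an illustrative assumption:

```python
import numpy as np
from collections import Counter

# Stochastic invariance: Y = h_tilde(eta, M_X), eta ~ Unif[0,1], eta ⊥ X.
def h_tilde(eta, M):
    mean = sum(v * c for v, c in M.items()) / sum(M.values())
    return mean + (eta - 0.5)      # depends on X only through M_X

rng = np.random.default_rng(3)
eta = rng.random()
x = (1.0, 2.0, 2.0, 5.0)
x_perm = (5.0, 2.0, 1.0, 2.0)      # a reordering of x
# Conditional on eta, the output is identical across reorderings, so
# (pi·X, Y) has the same joint distribution as (X, Y).
assert h_tilde(eta, Counter(x)) == h_tilde(eta, Counter(x_perm))
```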


SLIDE 19

Another useful theorem

Equivariance theorem: (π · X[n], π · Y[n]) =d (X[n], Y[n]) for all π ∈ Sn if and only if (X[n], Y[n]) = (X[n], (h̃(ηi, Xi, MX[n]))i∈[n]) a.s., with h̃ a measurable function and i.i.d. ηi ∼ Unif[0, 1], ηi ⊥⊥ X[n].


SLIDE 20

Another useful theorem

Equivariance theorem: (π · X[n], π · Y[n]) =d (X[n], Y[n]) for all π ∈ Sn if and only if (X[n], Y[n]) = (X[n], (h̃(ηi, Xi, MX[n]))i∈[n]) a.s., with h̃ a measurable function and i.i.d. ηi ∼ Unif[0, 1], ηi ⊥⊥ X[n].

Deterministic equivariance [Zah+17] → stochastic equivariance [this work]

[Figure: the equivariant network, now with per-position noise inputs η1, . . . , η4.]

Yi = σ( w0 Xi + w1 ∑_{j=1}^n Xj ) → Yi = h̃( ηi, Xi, ∑_{j=1}^n δXj ) = σ( w0 Xi + w1 ∫_X x ∑_{j=1}^n δXj(dx) )
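A per-position noise channel preserves equivariance as long as the noise is permuted along with the inputs. The weights, σ = tanh, and the noise scale below are illustrative assumptions:

```python
import numpy as np

# Stochastic equivariance: Y_i = h_tilde(eta_i, X_i, M_X), iid eta_i.
w0, w1 = 0.5, 0.1

def stochastic_equivariant(X, eta):
    pooled = X.sum(axis=0, keepdims=True)   # = integral of x against M_X
    return np.tanh(w0 * X + w1 * pooled) + 0.01 * eta[:, None]

rng = np.random.default_rng(4)
X = rng.normal(size=(4, 3))
eta = rng.random(4)
perm = rng.permutation(4)
# Permuting (X, eta) jointly permutes the output, matching the theorem.
assert np.allclose(stochastic_equivariant(X[perm], eta[perm]),
                   stochastic_equivariant(X, eta)[perm])
```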


SLIDE 21

Some answers

  • Sufficiency/adequacy provides the magic.
  • Similar results for exchangeable graphs/arrays/tensors and some other related structures.
  • Framework is general enough that it catches a lot of existing work as special cases.
  • Suggests some new (stochastic) network architectures.

SLIDE 22

Many questions

  • For group symmetries that don’t involve permutations, what are the analogous results? Equivariance is especially difficult.
  • There are models with sufficient statistics that don’t have group symmetry (though they typically have a set of symmetry transformations). What are the analogous results? Are they useful?
  • There is evidence that adding noise during training has beneficial effects; in this context it amounts to the difference between deterministic invariance and distributional invariance. Can we prove anything rigorous in these settings?
  • Relatedly, can we put the “fact” (encoding symmetry in neural networks is a Good Thing) on rigorous footing?

SLIDE 23

Thank you.


SLIDE 24

[Aus13] Tim Austin. “Exchangeable random arrays”. Lecture notes for IISc. 2013. url: http://www.math.ucla.edu/~tim/ExchnotesforIISc.pdf.

[Coh+18] Taco S. Cohen et al. “Spherical CNNs”. In: ICLR. 2018. url: https://openreview.net/pdf?id=Hkbd5xZRb.

[CW16] Taco Cohen and Max Welling. “Group Equivariant Convolutional Networks”. In: Proceedings of The 33rd International Conference on Machine Learning. Ed. by Maria Florina Balcan and Kilian Q. Weinberger. Vol. 48. Proceedings of Machine Learning Research. New York, New York, USA: PMLR, 2016, pp. 2990–2999. url: http://proceedings.mlr.press/v48/cohenc16.html.

[Dia88] P. Diaconis. “Sufficiency as statistical symmetry”. In: Proceedings of the AMS Centennial Symposium. Ed. by F. Browder. American Mathematical Society, 1988, pp. 15–26.

[GD14] Robert Gens and Pedro M. Domingos. “Deep Symmetry Networks”. In: Advances in Neural Information Processing Systems 27. Ed. by Z. Ghahramani et al. Curran Associates, Inc., 2014, pp. 2537–2545. url: http://papers.nips.cc/paper/5424-deep-symmetry-networks.pdf.

[Har+18] Jason Hartford et al. “Deep Models of Interactions Across Sets”. In: Proceedings of the 35th International Conference on Machine Learning. Ed. by Jennifer Dy and Andreas Krause. Vol. 80. Proceedings of Machine Learning Research. PMLR, 2018, pp. 1914–1923.

[Her+18] Roei Herzig et al. “Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction”. Feb. 2018. eprint: 1802.05451. url: https://arxiv.org/abs/1802.05451.

[Kal05] Olav Kallenberg. Probabilistic Symmetries and Invariance Principles. Springer, 2005.

SLIDE 25

[KT18] Risi Kondor and Shubhendu Trivedi. “On the Generalization of Equivariance and Convolution in Neural Networks to the Action of Compact Groups”. In: Proceedings of the 35th International Conference on Machine Learning. Ed. by Jennifer Dy and Andreas Krause. Vol. 80. Proceedings of Machine Learning Research. Stockholmsmässan, Stockholm, Sweden: PMLR, 2018, pp. 2747–2755.

[OR15] Peter Orbanz and Daniel M. Roy. “Bayesian Models of Graphs, Arrays and Other Exchangeable Random Structures”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 37.2 (Feb. 2015), pp. 437–461.

[RSP17] Siamak Ravanbakhsh, Jeff Schneider, and Barnabás Póczos. “Equivariance Through Parameter-Sharing”. In: Proceedings of the 34th International Conference on Machine Learning. Ed. by Doina Precup and Yee Whye Teh. Vol. 70. Proceedings of Machine Learning Research. PMLR, 2017, pp. 2892–2901. url: http://proceedings.mlr.press/v70/ravanbakhsh17a.html.

[Sha89] John Shawe-Taylor. “Building symmetries into feedforward networks”. In: 1989 First IEE International Conference on Artificial Neural Networks (Conf. Publ. No. 313). Oct. 1989, pp. 158–162.

[Sha91] John Shawe-Taylor. “Threshold Network Learning in the Presence of Equivalences”. In: Advances in Neural Information Processing Systems 4. Ed. by J. E. Moody, S. J. Hanson, and R. P. Lippmann. Morgan-Kaufmann, 1991, pp. 879–886. url: http://papers.nips.cc/paper/510-threshold-network-learning-in-the-presence-of-equivalences.pdf.

[Sha95] John Shawe-Taylor. “Sample Sizes for Threshold Networks with Equivalences”. In: Information and Computation 118.1 (1995), pp. 65–72. url: http://www.sciencedirect.com/science/article/pii/S0890540185710528.

[WS96] Jeffrey Wood and John Shawe-Taylor. “Representation theory and invariant neural networks”. In: Discrete Applied Mathematics 69.1 (1996), pp. 33–60.

SLIDE 26

[Zah+17] Manzil Zaheer et al. “Deep Sets”. In: Advances in Neural Information Processing Systems 30. Ed. by I. Guyon et al. Curran Associates, Inc., 2017, pp. 3391–3401.

SLIDE 27

Symmetric neural networks

Recent work generalizes the idea to other symmetries and data:

  • Affine transformations (translation, rotation, scaling, shear) [GD14]
  • Discrete translations, reflections, rotations [CW16]
  • Continuous rotations in three dimensions [Coh+18]
  • Permutations of sequences [Zah+17] and arrays [Har+18; Her+18]
  • Fairly general permutation group symmetries [RSP17]
  • Compact groups [KT18]
  • Discrete groups, finite linear groups [Sha89; WS96]

SLIDE 28

A useful tool: noise outsourcing (e.g., [Aus13])

If X and Y are random variables in “nice” (e.g., Borel) spaces X and Y, then there are a random variable η ∼ Unif[0, 1] and a measurable function h : [0, 1] × X → Y such that η ⊥⊥ X and (X, Y) = (X, h(η, X)) a.s.

Can show that if S(X) is adequate for Y, then (X, Y) = (X, h̃(η, S(X))) a.s.
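Noise outsourcing can be made concrete with the inverse-CDF construction: if Y | X = x has a known conditional distribution, its conditional quantile function serves as h. The Gaussian conditional Y | X = x ∼ N(x, 1) below is an illustrative assumption:

```python
import numpy as np
from statistics import NormalDist

# h(eta, x) = conditional quantile of Y | X = x, evaluated at eta ~ Unif[0,1].
def h(eta, x):
    return NormalDist(mu=x, sigma=1.0).inv_cdf(eta)

rng = np.random.default_rng(5)
x = 2.0
ys = np.array([h(eta, x) for eta in rng.random(100_000)])
# Pushing independent uniform noise through h recovers the target
# conditional law N(x, 1).
assert abs(ys.mean() - x) < 0.02
assert abs(ys.std() - 1.0) < 0.02
```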
