SLIDE 1

Invariances in Gaussian processes

And how to learn them

ST John

PROWLER.io

SLIDE 2

Outline

1. What are invariances?
2. Why do we want to make use of them?
3. How can we construct invariant GPs?
4. Where invariant GPs are actually crucial
5. How can we figure out what invariances to employ?

SLIDE 3

What are invariances?

Function does not change under some transformation, i.e. f(σ(x)) = f(x) for all transformations σ we consider.


Can be discrete or continuous

  • Translation
  • Rotation
  • Reflection
  • Permutation
SLIDE 4

Invariance under discrete translation


Periodic functions

SLIDE 5

Invariance under discrete translation

Density(x) = Density(x + translation)

[Figure: 2D grid (x: 1–5, y: 1–4) with two translated points, (2, 3) and (3, 4), marked as having equal density]

SLIDE 6

Density of water molecules as a function of (x, y) position in the plane. One sixth of the plane already predicts the function value everywhere.

Invariance under discrete rotation

SLIDE 7

Invariance under reflection

Solar elevation measured as a function of azimuth (for different days)


Left half already predicts right half

SLIDE 8

Invariance under permutation


[100, 200, 1, 1, 1]

SLIDE 9

Invariance under permutation


[1, 200, 1, 100, 1]

SLIDE 10


Invariance under permutation


f(100, 200, 1, 1, 1) = f(1, 200, 1, 100, 1)

Different inputs but same function value
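To make this concrete, here is a minimal Python sketch (my illustration, not from the slides): a symmetric function such as a sum gives the same value for every ordering of its inputs, while a generic order-dependent function does not.

```python
import itertools

import numpy as np

def symmetric_f(x):
    """Permutation-invariant: the output does not depend on input order."""
    return np.sum(x) + np.max(x)

def generic_f(x):
    """Not invariant: the output depends on where each value sits."""
    return float(np.dot(x, np.arange(1, len(x) + 1)))

x = np.array([100, 200, 1, 1, 1])
for perm in itertools.permutations(x):
    assert symmetric_f(np.array(perm)) == symmetric_f(x)  # same value for all 120 orderings

print(generic_f(np.array([100, 200, 1, 1, 1])))  # 512.0
print(generic_f(np.array([1, 200, 1, 100, 1])))  # 809.0 -- order matters here
```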

SLIDE 11

Invariance under permutation

E( [atoms ordered 1, 3, 2] ) = E( [atoms ordered 3, 1, 2] )

SLIDE 12

Discrete symmetries

SLIDE 13

Invariance under continuous transformations

Translation, rotation

SLIDE 14

Example: image classification

Class label as a function of image pixel matrix

Label( [image of a cat] ) = “cat”

SLIDE 18

Example: image classification

Class label as a function of image pixel matrix

Label( [image of the digit 8] ) = “8”

SLIDE 19

Example: molecular energy

E( [molecule] ) = E( [transformed copy of the molecule] )

SLIDE 20

Approximately invariant…

SLIDE 21

Approximately invariant…

[Figure: a rotated “6” looks like a “9”]

SLIDE 22
2. Why do we want to use invariances?

  • Incorporate prior knowledge about the behaviour of a system
  • Physical symmetries, e.g. modelling total energy (and gradients, i.e. forces) of a set of atoms
  • Helps generalisation
  • Improved accuracy vs number of training points

SLIDE 23

Toy example

SLIDE 27

Constructing invariant GPs

We want a prior over functions that obey the chosen symmetry.

Symmetrise the function. We can do this by
  a) an appropriate mapping to an invariant space
  b) a sum over transformations
(both sketched below).
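The formulas on this slide did not survive extraction; a plausible reconstruction for a finite symmetry group G, with g an arbitrary (GP-distributed) function and φ an invariant feature map, is:

```latex
\text{(a) mapping: } f(\mathbf{x}) = g\big(\varphi(\mathbf{x})\big)
  \quad\text{with } \varphi(\sigma(\mathbf{x})) = \varphi(\mathbf{x}) \;\;\forall \sigma \in G,
\qquad
\text{(b) sum: } f(\mathbf{x}) = \sum_{\sigma \in G} g\big(\sigma(\mathbf{x})\big).
```

Either way, f(σ(x)) = f(x) for every σ in G, so a GP prior over g induces a prior over invariant functions f.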

SLIDE 28

Permutation-invariant GPs: mapping construction

SLIDE 29

Permutation-invariant GPs: sum construction

SLIDE 30

Invariant sum kernel
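The slide's kernel formula was lost in extraction. If f(x) = Σ_σ g(σ(x)) with g ~ GP(0, k), then f is itself a GP with the double-sum kernel k_inv(x, x') = Σ_{σ,σ'} k(σ(x), σ'(x')). Below is a minimal numpy sketch of that construction (an illustration; the RBF base kernel and the two-element permutation group are assumed choices, not the slide's code).

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    """Squared-exponential base kernel between two points."""
    return float(np.exp(-0.5 * np.sum((a - b) ** 2) / lengthscale**2))

def invariant_sum_kernel(x, x2, transforms, base_kernel=rbf):
    """Double sum of the base kernel over all pairs of transformed inputs."""
    return sum(base_kernel(t1(x), t2(x2)) for t1 in transforms for t2 in transforms)

# The symmetry group: the two permutations of a 2D input.
perms_2d = [lambda v: v, lambda v: v[::-1]]

x = np.array([0.3, 1.7])
y = np.array([-0.5, 2.0])
# The kernel cannot tell an input apart from its permuted version:
assert np.isclose(invariant_sum_kernel(x, y, perms_2d),
                  invariant_sum_kernel(x[::-1], y, perms_2d))
```

Because every transformation of x appears in the sum, the kernel, and hence every GP sample drawn from it, treats x and its permuted version identically.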

SLIDE 31

Samples from the prior

SLIDE 32

How can we generalise this?

SLIDE 33

Symmetry group


Transformations can be composed. The set of all compositions of transformations forms a group; this group corresponds to the symmetries.

SLIDE 34

Orbit of x: all points reachable by transformations

Example: Permutation in 2D
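In symbols (the slide's own formula was lost; the orbit notation A(·) below is mine):

```latex
A(\mathbf{x}) = \{\sigma(\mathbf{x}) : \sigma \in G\},
\qquad
A\big((x_1, x_2)\big) = \{(x_1, x_2),\, (x_2, x_1)\} \quad \text{for 2D permutations}.
```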

SLIDE 35

Examples of orbits: permutation invariance


Orbit size = 2

SLIDE 36

Examples of orbits: six-fold rotation invariance


Orbit size = 6

SLIDE 37

Examples of orbits: permutation and six-fold rotation

Orbit size = 12

SLIDE 38

Examples of orbits: continuous rotation symmetry

Uncountably infinite

SLIDE 39

Orbit of a periodic function in 1D

Countably infinite

SLIDE 40

Constructing invariant GPs: sum revisited

SLIDE 41

Applications

SLIDE 42

Molecular modelling

Time evolution of the configuration (positions of all atoms) of a system of atoms/molecules.
We need the Potential Energy Surface (PES)!
Gradients = forces (easy with GPs).

SLIDE 43

Potential Energy Surface

SLIDE 44

Modelling Potential Energy Surface

  • Approximate as a sum over k-mers (many-body expansion)
  • Invariance to rotation/translation of the local environment/k-mer
  • Invariance under permutation of equivalent atoms

SLIDE 45

Modelling Potential Energy Surface

Many-body expansion, sum over k-mers:
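The expansion itself did not survive extraction; its standard form (with x_i the position of atom i, and the series truncated at small k) is roughly:

```latex
E(\mathbf{x}_1, \dots, \mathbf{x}_N)
\;\approx\; \sum_{i} E^{(1)}(\mathbf{x}_i)
\;+\; \sum_{i<j} E^{(2)}(\mathbf{x}_i, \mathbf{x}_j)
\;+\; \sum_{i<j<l} E^{(3)}(\mathbf{x}_i, \mathbf{x}_j, \mathbf{x}_l)
\;+\; \dots
```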

SLIDE 46

Modelling Potential Energy Surface

Invariance to rotation/translation of local environment/k-mer:

Map to interatomic distances
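A small numpy sketch of this mapping (illustrative, not the slide's code): describing a k-mer by its pairwise interatomic distances removes the dependence on where the k-mer sits and how it is oriented.

```python
import numpy as np

def interatomic_distances(positions):
    """Map atom coordinates of shape (k, 3) to the vector of pairwise distances,
    which is unchanged when the whole k-mer is rotated or translated."""
    diffs = positions[:, None, :] - positions[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    iu = np.triu_indices(len(positions), k=1)
    return dists[iu]

kmer = np.random.randn(3, 3)                       # a 3-mer in 3D
angle = 0.7
rotation = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                     [np.sin(angle),  np.cos(angle), 0.0],
                     [0.0, 0.0, 1.0]])             # rotation about the z-axis
translation = np.array([1.0, -2.0, 0.5])
transformed = kmer @ rotation.T + translation
assert np.allclose(interatomic_distances(kmer), interatomic_distances(transformed))
```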

SLIDE 47

Modelling Potential Energy Surface

Invariance under permutation of equivalent atoms: sum over them!

SLIDE 48

How can we find out if an invariance is helpful?

  • As usual (like another kernel hyperparameter): marginal likelihood
  • Unlike the “regular” likelihood (equivalent to training-set RMSE):
    • Less overfitting
    • Related to generalisation

SLIDE 49

Marginal likelihood and generalisation

Measures how well part of the training set predicts the remaining training points, i.e. how accurately the model generalises during inference; similar to cross-validation, but differentiable.
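One way to see this is the standard sequential decomposition of the log marginal likelihood (not shown on the slide, but it underlies the statement):

```latex
\log p(\mathbf{y} \mid X)
= \sum_{n=1}^{N} \log p\big(y_n \mid y_1, \dots, y_{n-1}, X\big)
```

Each term scores how well the model, conditioned on part of the training data, predicts the next point, which is why a high marginal likelihood behaves like a differentiable analogue of cross-validation.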

SLIDE 50


Marginal likelihood

SLIDE 51

Summary: we have seen…

How to constrain GPs to give invariant functions
When invariance improves a model's generalisation
When invariance increases the marginal likelihood
That invariances exist in real-world problems

SLIDE 52

Questions?

Next up: how to learn invariances…

SLIDE 53

Snowflake prior

SLIDE 54

Why not just data augmentation?

Data augmentation is used in deep learning… but invariances in the prior are better:
1. Cubic scaling with the number of data points (for augmentation) vs linear scaling with invariances in the prior
2. Data augmentation gives the same predictive mean, but not the same predictive variance
3. Invariances in the GP prior give us invariant samples

SLIDE 55

Learning Invariances

with the Marginal Likelihood

Mark van der Wilk

PROWLER.io

SLIDE 56

We discussed…

1. How to constrain GPs to give invariant functions
2. When invariance increases the marginal likelihood
3. When invariance improves a model's generalisation
4. That invariances exist in real-world problems

SLIDE 57

From known invariances to learning them

We previously saw that known invariances were useful for modelling.

  • How do we exploit invariances in a problem if we don't know them a priori?
  • Can we learn a useful invariance from the data?
SLIDE 58

Model selection

  • Invariances in a GP are expressed in the kernel
  • We use the marginal likelihood to select models
  • Parameterising the orbit is all that is left
SLIDE 59

Parameterising orbits is hard

Strict invariance requires f(σ(x)) = f(x) for all σ in the symmetry group G, which we can obtain using the construction f(x) = Σ_{σ ∈ G} g(σ(x)).

I don't know how to parameterise orbits!

SLIDE 60

From orbits to distributions

  • We sum over an arbitrary set of points
  • Take the infinite limit
  • Find the resulting kernel (see the sketch below)

I do know how to parameterise distributions!
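A rough numpy sketch of this idea (my own illustration, with a simple Gaussian "jitter" distribution p_θ(x_a | x) standing in for a learned augmentation distribution): the resulting kernel is the expectation of a base kernel over augmented inputs, k(x, x') = E[k_base(x_a, x_b)] with x_a ~ p_θ(·|x) and x_b ~ p_θ(·|x'), which we can estimate by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(a, b, lengthscale=1.0):
    """Squared-exponential base kernel, applied row-wise to two sample arrays."""
    return np.exp(-0.5 * np.sum((a - b) ** 2, axis=-1) / lengthscale**2)

def sample_augmentation(x, scale, num_samples):
    """Samples from an illustrative augmentation distribution p_theta(x_a | x):
    Gaussian jitter around x, with the scale playing the role of theta."""
    return x + scale * rng.standard_normal((num_samples, x.shape[-1]))

def insensitive_kernel_mc(x, x2, scale=0.3, num_samples=1000):
    """Monte Carlo estimate of k(x, x') = E[k_base(x_a, x_b)],
    with x_a ~ p_theta(.|x) and x_b ~ p_theta(.|x')."""
    xa = sample_augmentation(x, scale, num_samples)
    xb = sample_augmentation(x2, scale, num_samples)
    return float(np.mean(rbf(xa, xb)))

print(insensitive_kernel_mc(np.array([0.0, 1.0]), np.array([0.2, 0.9])))
```

The parameters of p_θ (here just the jitter scale) become kernel hyperparameters, so they can be learned by optimising an approximation to the marginal likelihood.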

SLIDE 61

Insensitivity

  • We lose exact invariance… but this may be a blessing!

SLIDE 63

What we will do

  • Parameterise a distribution that describes the insensitivity
  • Use this distribution to define a kernel
  • Find invariance in the kernel by optimising the hyperparameters
SLIDE 64

Obstacles to inference

1. For large datasets, the matrix operations on K_ff become infeasible (O(N³) time complexity)
2. We may have non-Gaussian likelihoods (classification!)
3. We can't even evaluate the kernel!

SLIDE 65

Variational inference

1. For large datasets, the matrix operations on K_ff become infeasible (O(N³) time complexity)
2. We may have non-Gaussian likelihoods (classification!)
3. We can't even evaluate the kernel! (it is still needed for K_uu and k_un)

SLIDE 66

Interdomain inducing variables

  • Variational posterior is constructed by conditioning
  • Gaussian conditioning requires covariances

cov(u_m, f(x_n)) and cov(u_m, u_m′)

SLIDE 70

Unbiased estimation of the kernel

Unbiased estimates of μ_n, μ_n², and σ_n² give an unbiased estimate of the ELBO!

SLIDE 71

Unbiased estimation of the kernel

SLIDE 72

Unbiased estimation of the kernel

(We only need to sample one set from p_θ(x_a | x), see paper for details)

SLIDE 73

What we did

  • Parameterise a distribution that describes the insensitivity
  • Use this distribution to define a kernel
  • Approximate the marginal likelihood using the variational evidence lower bound (ELBO)
  • Find an unbiased ELBO approximation, using unbiased estimates of the kernel
  • Optimise the hyperparameters, using the gradients of the ELBO
SLIDE 74

Results

  • A single model tunes itself automatically to multiple datasets
  • Fire off the optimisation and watch it go

[Results shown on MNIST and Rotated MNIST]

SLIDE 75

Conclusions & outlook

  • We can parameterise invariant kernels
  • We can learn the parameters with a marginal likelihood approximation
  • Learned invariances improve generalisation