slide-1
SLIDE 1

Bayesian Learning from Sequential Data using Gaussian Processes with Signature Covariances

Csaba Toth Joint work with Harald Oberhauser Mathematical Institute, University of Oxford International Conference on Machine Learning, July 2020

slide-2
SLIDE 2

Overview


slide-9
SLIDE 9

Overview

Purpose of this work

  • 1. Define a Gaussian process (GP) [6] over sequences/time series
    ◮ To model functions of sequences {Seq(R^d) → R}: (f_x)_{x ∈ Seq(R^d)} ∼ GP(m(·), k(·, ·))
    ◮ Find a suitable covariance kernel k : Seq(R^d) × Seq(R^d) → R
    ◮ Seq(R^d) := {(x_{t_1}, . . . , x_{t_L}) | (t_i, x_{t_i}) ∈ R_+ × R^d, L ∈ N}

  • 2. Develop an efficient inference framework
    ◮ Standard challenges: intractable posteriors, O(N^3) scaling in the training data
    ◮ Additional challenge: potentially very high-dimensional inputs (long sequences)
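The O(N^3) scaling mentioned above comes from factorising the N × N Gram matrix in exact GP regression. A minimal numpy sketch of that baseline, not code from the paper: `rbf` is a placeholder base kernel on vectors (the talk's point is to replace it with a covariance on Seq(R^d)), and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(u, v, lengthscale=1.0):
    # placeholder base kernel on R^d; any positive-definite kernel works here
    d = u - v
    return np.exp(-0.5 * (d @ d) / lengthscale**2)

def gram(xs, ys, k):
    # Gram matrix K[i, j] = k(xs[i], ys[j])
    return np.array([[k(x, y) for y in ys] for x in xs])

def gp_posterior_mean(K, y, K_star, noise=0.1):
    # exact GP regression mean: the Cholesky factorisation of the
    # N x N matrix below is the O(N^3) step the slides refer to
    N = len(y)
    L = np.linalg.cholesky(K + noise**2 * np.eye(N))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return K_star @ alpha

X = rng.normal(size=(5, 3))   # 5 inputs; the talk replaces vectors with sequences
y = rng.normal(size=5)
K = gram(X, X, rbf)
mean = gp_posterior_mean(K, y, K, noise=0.1)
```

With many or very long sequences, neither the cubic solve nor a naive sequence kernel is affordable, which motivates the sparse inference scheme described later.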


slide-14
SLIDE 14

Overview

Suitable feature map? Signatures from stochastic analysis [2]!

Can be used to transform vector-kernels into sequence-kernels
  ◮ κ : R^d × R^d → R a kernel for vector-valued data
  ◮ [4] used signatures to define, for x, y ∈ Seq(R^d), the kernel

      k(x, y) = Σ_{m=0}^{M} σ_m^2 Σ_{1 ≤ i_1 < ··· < i_m ≤ L_x} Σ_{1 ≤ j_1 < ··· < j_m ≤ L_y} c(i) c(j) Π_{l=1}^{m} Δ_{i_l, j_l} κ(x_{i_l}, y_{j_l})

    for some explicitly given constants c(i_1, . . . , i_m), c(j_1, . . . , j_m), where

      Δ_{i,j} κ(x_i, y_j) = κ(x_{i+1}, y_{j+1}) − κ(x_i, y_{j+1}) − κ(x_{i+1}, y_j) + κ(x_i, y_j)

  ◮ Strong theoretical properties!
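For short sequences the truncated kernel above can be evaluated by brute force. The numpy sketch below is an illustration, not the paper's algorithm: the weights σ_m^2 and constants c(·) are set to 1 for readability, and the sums over ordered index tuples are enumerated explicitly, which is feasible only for tiny L_x, L_y.

```python
import numpy as np
from itertools import combinations

def delta(kappa, x, y, i, j):
    # second-order increment Delta_{i,j} kappa(x_i, y_j) from the slide
    return (kappa(x[i + 1], y[j + 1]) - kappa(x[i], y[j + 1])
            - kappa(x[i + 1], y[j]) + kappa(x[i], y[j]))

def sig_kernel(x, y, kappa, M=2):
    # brute-force truncated signature kernel with sigma_m = c(.) = 1;
    # x, y are lists of vectors, summed over increments 0..L-1
    Lx, Ly = len(x) - 1, len(y) - 1
    val = 1.0                           # m = 0 term
    for m in range(1, M + 1):
        s = 0.0
        for I in combinations(range(Lx), m):       # 1 <= i_1 < ... < i_m <= L_x
            for J in combinations(range(Ly), m):   # 1 <= j_1 < ... < j_m <= L_y
                p = 1.0
                for i, j in zip(I, J):
                    p *= delta(kappa, x, y, i, j)
                s += p
        val += s
    return val
```

A handy sanity check: with the linear kernel κ(u, v) = ⟨u, v⟩, each Δ_{i,j}κ is the inner product of increments, so at M = 1 the kernel reduces to 1 plus the inner product of the two total increments (last point minus first point).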


slide-19
SLIDE 19

Overview

Our contributions
  ◮ Bringing GPs and signatures together (+ analysis)
  ◮ Developing a tractable, efficient inference scheme

  • 1. Sparse VI [3]: non-conjugacy, large N ∈ N
  • 2. Inter-domain inducing points: long sequences (sup_{x ∈ X} L_x large)

  ◮ GPflow implementation, thorough experimental evaluation
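To make the sparse-VI idea concrete, here is a numpy sketch of a collapsed sparse-GP lower bound for the conjugate Gaussian-likelihood case. This is a stand-in, not the paper's scheme: the paper uses the uncollapsed variational bound of [3] to also handle non-conjugate likelihoods, and its inter-domain inducing points live in the signature feature domain rather than being pseudo-inputs Z as below. All names are illustrative.

```python
import numpy as np

def rbf_gram(A, B, ell=1.0):
    # RBF Gram matrix as a stand-in covariance
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / ell**2)

def collapsed_elbo(X, y, Z, noise=0.1, ell=1.0):
    # Collapsed sparse-GP bound with M inducing points Z.
    # For clarity we form the N x N matrix Qnn explicitly; a practical
    # implementation avoids this to get O(N M^2) cost instead of O(N^3).
    N, M = len(X), len(Z)
    Kmm = rbf_gram(Z, Z, ell) + 1e-8 * np.eye(M)
    Kmn = rbf_gram(Z, X, ell)
    L = np.linalg.cholesky(Kmm)
    A = np.linalg.solve(L, Kmn)                    # Qnn = A.T @ A = Knm Kmm^{-1} Kmn
    Qnn = A.T @ A
    S = Qnn + noise**2 * np.eye(N)
    Ls = np.linalg.cholesky(S)
    alpha = np.linalg.solve(Ls, y)
    logdet = 2 * np.log(np.diag(Ls)).sum()
    log_marg = -0.5 * (N * np.log(2 * np.pi) + logdet + alpha @ alpha)
    # penalty for the covariance mass the inducing points fail to explain
    trace_term = (np.trace(rbf_gram(X, X, ell)) - np.trace(Qnn)) / (2 * noise**2)
    return log_marg - trace_term

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))
y = rng.normal(size=6)
elbo_full = collapsed_elbo(X, y, X)        # inducing points = data: bound is (nearly) tight
elbo_sparse = collapsed_elbo(X, y, X[:3])  # fewer inducing points: a looser lower bound
```

The bound never exceeds the exact log marginal likelihood, and placing an inducing point at every data point recovers it (up to jitter), which is the property sparse methods trade away for scalability.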

slide-20
SLIDE 20

Signatures


slide-26
SLIDE 26

Signatures

What are signatures?

Signatures are defined on continuous-time objects, paths
  ◮ Paths(R^d) = { x ∈ C([0, T], R^d) | x_0 = 0, ‖x‖_bv < +∞ }
  ◮ Φ_m(x) = ∫_{0 < t_1 < ··· < t_m < T} ẋ_{t_1} ⊗ ··· ⊗ ẋ_{t_m} dt_1 ··· dt_m
  ◮ Φ_m(x) ∈ (R^d)^{⊗m} is what is known as a tensor of degree m ∈ N
  ◮ Φ(x) = (Φ_m(x))_{m ≥ 0} is an infinite collection of tensors with increasing degrees
  ◮ A generalization of polynomials for vector-valued data to paths (and sequences!)
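For piecewise-linear paths (a sequence joined by linear interpolation), the first two iterated integrals have closed forms as sums over the increments δ_k = x_{k+1} − x_k. A minimal numpy sketch (function name illustrative):

```python
import numpy as np

def signature_levels_1_2(x):
    # Levels 1 and 2 of the signature of the piecewise-linear path through
    # the points x[0], ..., x[L]. For such paths the iterated integrals reduce to:
    #   Phi_1 = sum_k delta_k                       (the total increment)
    #   Phi_2 = sum_{k<l} delta_k (x) delta_l  +  (1/2) sum_k delta_k (x) delta_k
    x = np.asarray(x, dtype=float)
    deltas = np.diff(x, axis=0)
    phi1 = deltas.sum(axis=0)
    d = x.shape[1]
    phi2 = np.zeros((d, d))
    running = np.zeros(d)          # partial sum of increments seen so far
    for dk in deltas:
        phi2 += np.outer(running, dk) + 0.5 * np.outer(dk, dk)
        running += dk
    return phi1, phi2
```

Two identities make good sanity checks: Φ_1 is exactly the last point minus the first, and the shuffle relation gives Φ_2 + Φ_2ᵀ = Φ_1 ⊗ Φ_1, so only the antisymmetric part of Φ_2 (the Lévy area) carries genuinely new information about the path's order.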