Bayesian Learning from Sequential Data using Gaussian Processes with Signature Covariances

Csaba Toth, joint work with Harald Oberhauser
Mathematical Institute, University of Oxford
International Conference on Machine Learning, July 2020
Purpose of this work

◮ To model functions of sequences {Seq(Rd) → R}: (fx)x∈Seq(Rd) ∼ GP(m(·), k(·, ·))
◮ Find a suitable covariance kernel k : Seq(Rd) × Seq(Rd) → R
◮ Seq(Rd) := {(xt1, . . . , xtL) | (ti, xti) ∈ R+ × Rd, L ∈ N}
◮ Standard challenges: intractable posteriors, O(N³) scaling in the number of training sequences N
◮ Additional challenge: potentially very high-dimensional inputs (long sequences)
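The O(N³) scaling mentioned above comes from factorising the N × N Gram matrix in exact GP regression. A minimal generic sketch of where that cost arises (using a plain RBF kernel on vectors as a stand-in, not the signature kernel of this talk; all names are illustrative):

```python
import numpy as np

def rbf(X, Y, lengthscale=1.0):
    # Squared-exponential kernel on vectors (a stand-in for a sequence kernel).
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior_mean(X, y, Xs, noise=1e-2):
    # Exact GP regression: the Cholesky factorisation of the N x N Gram
    # matrix is the O(N^3) bottleneck that motivates sparse approximations.
    K = rbf(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return rbf(Xs, X) @ alpha

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = np.sin(X[:, 0]) + 0.01 * rng.normal(size=50)
mu = gp_posterior_mean(X, y, X)  # posterior mean at training inputs is close to y
```

Sparse variational schemes (as used in this work) replace the N × N factorisation with one over a small set of inducing points, bringing the cost down to O(NM² ) for M ≪ N inducing points.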
Suitable feature map? Signatures from stochastic analysis [2]!

Can be used to transform vector-kernels into sequence-kernels

◮ κ : Rd × Rd → R a kernel for vector-valued data
◮ [4] used signatures to define, for x, y ∈ Seq(Rd), the kernel

    k(x, y) = Σ_{m=0}^{M} σm² Σ_{1≤i1<···<im<Lx} Σ_{1≤j1<···<jm<Ly} c(i) c(j) Π_{l=1}^{m} ∆il,jl κ(xil, yjl)

  for some explicitly given constants c(i) = c(i1, . . . , im), c(j) = c(j1, . . . , jm), where

    ∆i,j κ(xi, yj) = κ(xi+1, yj+1) − κ(xi, yj+1) − κ(xi+1, yj) + κ(xi, yj)

◮ Strong theoretical properties!
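A brute-force sketch of this construction, with all constants c(i), c(j) set to 1 for simplicity (the actual weights of [4] and the efficient low-rank algorithm are omitted). It enumerates the ordered index tuples directly, so it is only feasible for short sequences, but it makes the structure of the sum explicit:

```python
import itertools
import numpy as np

def delta(kappa, x, y, i, j):
    # Second-order finite difference Delta_{i,j} kappa(x_i, y_j)
    # of the static kernel kappa, as in the formula above.
    return (kappa(x[i + 1], y[j + 1]) - kappa(x[i], y[j + 1])
            - kappa(x[i + 1], y[j]) + kappa(x[i], y[j]))

def seq_kernel(x, y, kappa, M=2, sigma=None):
    # Brute-force truncated sequence kernel (all c(i) = c(j) = 1).
    Lx, Ly = len(x) - 1, len(y) - 1
    sigma = sigma or [1.0] * (M + 1)
    k = sigma[0] ** 2  # m = 0 term
    for m in range(1, M + 1):
        s = 0.0
        for ii in itertools.combinations(range(Lx), m):   # i1 < ... < im
            for jj in itertools.combinations(range(Ly), m):  # j1 < ... < jm
                p = 1.0
                for i, j in zip(ii, jj):
                    p *= delta(kappa, x, y, i, j)
                s += p
        k += sigma[m] ** 2 * s
    return k

lin = lambda a, b: float(np.dot(a, b))  # linear static kernel
x = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
print(seq_kernel(x, x, lin, M=2))  # -> 4.0
```

With the linear kernel, ∆i,j κ(xi, yj) reduces to the inner product of increments, so for this two-step path the m = 0, 1, 2 terms contribute 1, 2 and 1 respectively.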
Our contributions

◮ Bringing GPs and signatures together (+analysis)
◮ Developing a tractable, efficient inference scheme
◮ GPflow implementation, thorough experimental evaluation
What are signatures?

Signatures are defined on continuous-time objects, paths

◮ Paths(Rd) := {x : [0, T] → Rd | x smooth enough that ẋ exists}
◮ The degree-m signature feature

    Φm(x) = ∫_{0≤t1<···<tm≤T} ẋt1 ⊗ · · · ⊗ ẋtm dt1 . . . dtm

◮ Φm(x) ∈ (Rd)⊗m is what is known as a tensor of degree m ∈ N
◮ Φ(x) = (Φm(x))m≥0 is an infinite collection of tensors with increasing degrees
◮ A generalization of polynomials for vector-valued data to paths (and sequences!)
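For piecewise-linear paths (which is how sequences enter the picture), the first two signature levels have simple closed forms: Φ1 is the total increment, and Φ2 collects outer products of increments. A small sketch (function names are our own):

```python
import numpy as np

def sig_level_1(path):
    # Phi_1(x) = x_T - x_0: the total increment.
    return path[-1] - path[0]

def sig_level_2(path):
    # For a piecewise-linear path with increments d_i, the iterated integral
    # gives Phi_2(x) = sum_{i<j} outer(d_i, d_j) + 1/2 sum_i outer(d_i, d_i).
    d = np.diff(path, axis=0)
    phi2 = np.zeros((path.shape[1], path.shape[1]))
    for i in range(len(d)):
        phi2 += 0.5 * np.outer(d[i], d[i])
        for j in range(i + 1, len(d)):
            phi2 += np.outer(d[i], d[j])
    return phi2

# A single straight segment from 0 to v = (1, 2): Phi_2 = 1/2 * outer(v, v).
line = np.array([[0.0, 0.0], [1.0, 2.0]])
print(sig_level_2(line))  # -> [[0.5, 1.], [1., 2.]]
```

A useful sanity check is the level-2 shuffle identity Φ2(x) + Φ2(x)ᵀ = Φ1(x) ⊗ Φ1(x), which holds for any path; the antisymmetric part of Φ2 is the Lévy area, the first genuinely "sequential" feature.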