Reduced-Set Models for Improving the Training and Execution Speed of Kernel Methods — PowerPoint PPT Presentation



slide-1
SLIDE 1

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Reduced-Set Models for Improving the Training and Execution Speed of Kernel Methods

Hassan A. Kingravi

IVALab

1

slide-5
SLIDE 5

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Outline

1

Introduction Kernel Methods Speeding Up Training Speeding Up Testing Research Objectives

2

RSKPCA: Basics Reduced-Set Models for the Batch Case RSKPCA Results

3

RSKPCA: Applications RSKPCA Applications Gaussian Process Regression Diffusion Maps RSKPCA Applications: Results

4

GP-MRAC Reduced-Set Models for the Online Case GP-MRAC Results

5

Conclusion Conclusions and Future Work

6

Questions

5

slide-6
SLIDE 6

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

General Questions

Observations
- Nonparametric methods are powerful because they use all possible data.
- Nonparametric methods are slow because they use all possible data.

Questions
- For a given class of nonparametric methods (kernel methods), is all the data necessary?
- How can we intelligently discard data?
- How do we prove these procedures are well founded, in a deterministic fashion?

6

slide-7
SLIDE 7

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Kernel Methods

Kernel methods (machines) are a class of machine learning algorithms used to convert linear algorithms into nonlinear algorithms through the use of a feature map.

Linear Learning Algorithms
- Classification: linear perceptron, linear support vector machine (SVM).
- Regression: Bayesian linear regression.
- Dimensionality Reduction: principal component analysis (PCA).

7

slide-8
SLIDE 8

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Linear Algorithm Example

Linear algorithms are limited if the data has nonlinear structure in the input space. The example tries to separate the data using a linear perceptron; there is no feasible solution in the input space.

Figure: (a) Original data. (b) Perceptron solution (in green).

8

slide-12
SLIDE 12

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Feature Maps

Feature maps offer a possible solution: map the data using a feature map $\psi$ into a higher-dimensional feature space $\mathcal{H}$. If $(x, y) \mapsto (x^2, y^2, \sqrt{2}\,xy)$, the data become linearly separable.

Figure: (a) Original data. (b) Mapped data. (c) Perceptron solution (in green).

Question How are feature maps generated?

9

slide-14
SLIDE 14

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Kernels and Feature Maps

Issues with Feature Maps — difficult to design by hand; many problems require arbitrary degrees of freedom.

Solution: Kernelization
- If $k : \Omega \times \Omega \to \mathbb{R}$ is a positive definite symmetric kernel function, then
$$k(x, y) = \langle \psi(x), \psi(y) \rangle_{\mathcal{H}}. \qquad (1)$$
- The map is obtained from the operator $\mathcal{K} : L^2(\Omega) \to L^2(\Omega)$,
$$(\mathcal{K} f)(x) := \int_\Omega k(x, y)\, f(y)\, dy. \qquad (2)$$

10

slide-15
SLIDE 15

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Kernels and Feature Maps

Issues with Feature Maps — difficult to design by hand; many problems require arbitrary degrees of freedom.

Solution: Kernelization
- Mercer's theorem: the eigendecomposition of the operator, $(\lambda_\iota, \phi_\iota)_{\iota=1}^{N}$, gives an orthonormal basis (ONB) of $L^2(\Omega)$.
- Kernel: $k(x, y) = \sum_{\iota=1}^{N} \lambda_\iota\, \phi_\iota(x)\, \phi_\iota(y)$, with $N \in \mathbb{N} \cup \{\infty\}$.
- Feature map:
$$\psi(x) := \big( \sqrt{\lambda_1}\, \phi_1(x),\; \sqrt{\lambda_2}\, \phi_2(x),\; \dots \big), \qquad k(x, y) = \langle \psi(x), \psi(y) \rangle_{\mathcal{H}}.$$

10

slide-17
SLIDE 17

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Eigendecomposition in practice

Empirical Procedure for the Eigendecomposition of the Kernel
Given a dataset $X = \{x_i\}_{i=1}^{n}$, the integral operator is approximated by the Gram matrix $K_{ij} = k(x_i, x_j)$. The feature map is learned via the eigendecomposition $K = U \Lambda U^T$. This eigendecomposition is the heart of the Kernel PCA procedure, and applies to many other methods: we examine Gaussian process regression and diffusion maps.

Issues
- With $n$ points, $O(n^3)$ training complexity.
- With $n$ points, $O(nr)$ testing (mapping) complexity.
- With $n$ points, $O(n^2)$ space complexity.
- Not scalable in either training or testing.
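A minimal numpy sketch of this empirical procedure (illustrative, not the thesis code), assuming a Gaussian kernel with a placeholder bandwidth `sigma`; kernel-matrix centering is omitted for brevity:

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gram matrix K_ij = k(a_i, b_j) for a Gaussian kernel."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def kernel_eigenembedding(X, n_components=2, sigma=1.0):
    """Empirical eigendecomposition K = U Lambda U^T; O(n^3) in the number of points."""
    K = gaussian_kernel(X, X, sigma)        # O(n^2) storage
    lam, U = np.linalg.eigh(K)              # eigenvalues in ascending order
    lam, U = lam[::-1], U[:, ::-1]          # reorder to descending
    # embed the training points along the leading eigenvectors (KPCA-style projection)
    return U[:, :n_components] * np.sqrt(np.maximum(lam[:n_components], 0.0))
```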

11

slide-18
SLIDE 18

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Speeding Up Training

Matrix Sparsification [GOLUB:1996] — compute K; if the matrix is sparse, use techniques such as Jacobi, Arnoldi, Hebbian, etc., to get a low-rank approximation.
Issues: very accurate, but requires the kernel matrix, is iterative, and is unsuitable for dense matrices.

Random Projections [AILON:2006, LIBERTY:2009] — compute K, project onto a lower-dimensional subspace using a random matrix, compute an ONB using the SVD, and use it to get a low-rank approximation.
Issues: very accurate approximations in practice, but one still needs to compute the kernel matrix, and the method is superlinear in n.

12

slide-19
SLIDE 19

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Speeding Up Training

Sampling-based approaches [HAR-PELED:2006, BOUTSIDIS:2009, TALWALKAR:2010] — compute K, sample r columns using different schemes, and compute a low-rank approximation.
Issues: accurate approximations in practice, but one still needs to compute the kernel matrix.

Nyström Method (CUR Decomposition) [DRINEAS:2005, ZHANG:2009] — sample r points from the dataset and use them to compute a low-rank approximation. Approaches include uniform random sampling and k-means.
Pros: avoids computing the full kernel matrix; tries to approximate the eigenfunctions of the kernel. A study [TALWALKAR:2010] has shown the effectiveness of the method on a wide variety of real-world learning problems.
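A hedged numpy sketch of the Nyström idea, assuming a Gaussian kernel and uniform random landmark sampling (k-means centers could be substituted); it is illustrative rather than the implementation from the cited works:

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def nystrom_approximation(X, r, sigma=1.0, seed=0):
    """Rank-r Nystrom approximation K ~ K_nr pinv(K_rr) K_nr^T from r landmarks."""
    rng = np.random.default_rng(seed)
    landmarks = X[rng.choice(len(X), size=r, replace=False)]
    K_nr = gaussian_kernel(X, landmarks, sigma)           # n x r cross-kernel
    K_rr = gaussian_kernel(landmarks, landmarks, sigma)   # r x r kernel on landmarks
    return K_nr @ np.linalg.pinv(K_rr) @ K_nr.T           # rank-r approximation of the n x n Gram matrix
```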

12

slide-20
SLIDE 20

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Speeding Up Testing

Reduced-Set Selection [SCHÖLKOPF:1999] — find a reduced set of expansion vectors from the original dataset to approximate the kernel map using preimage computations.
Reduced-Set Construction [SCHÖLKOPF:1999] — find a reduced set of expansion vectors from the input space to approximate the kernel map using preimage computations.
Kernel Matching Pursuit [VINCENT:2002] — approximate the kernel map using an overcomplete dictionary of basis functions.
Kernel Map Compression [ARIF:2010] — use Generalized Radial Basis Functions to approximate the kernel map.

Issues: the kernel map needs to be available, which means the kernel matrix and its eigendecomposition already need to be computed.

13

slide-21
SLIDE 21

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Questions Driving Research

Batch Case: Issues — in existing methods, speeding up training does not speed up testing, and speeding up testing does not speed up training. Can both be done at the same time, in a principled manner, using reduced-set models?

Online Case: Issues — kernel methods are untenable in the online setting if used as designed. Can reduced-set models be used for an interesting online application (adaptive control)?

14

slide-22
SLIDE 22

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Outline

1

Introduction Kernel Methods Speeding Up Training Speeding Up Testing Research Objectives

2

RSKPCA: Basics Reduced-Set Models for the Batch Case RSKPCA Results

3

RSKPCA: Applications RSKPCA Applications Gaussian Process Regression Diffusion Maps RSKPCA Applications: Results

4

GP-MRAC Reduced-Set Models for the Online Case GP-MRAC Results

5

Conclusion Conclusions and Future Work

6

Questions

15

slide-24
SLIDE 24

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Kernel PCA

Main Issue — kernel machines are powerful, but the requirement that all of the data be incorporated into the machine is far too stringent. This is the main factor limiting the use of kernel machines in industrial or large-scale applications.

Our Contributions
- Show that the action of the kernel on the probability measure determines the quality of the operator (kernel machine) approximation.
- Show a deep connection between density estimation and operator approximation: once a compact model of the density is computed, the original data can be discarded.
- Outline a general class of minimizers for operator approximation, which points to clues regarding the choice of optimizers.

16

slide-25
SLIDE 25

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Kernel PCA

Ideal Operator — integral operator:
$$(\mathcal{K} f)(x) := \int_\Omega k(x, y)\, f(y)\, p(y)\, dy = \Big(\underbrace{\int_\Omega \psi(x) \otimes \psi(x)\, d\upsilon(x)}_{C_{\mathcal{H}}}\Big) f \qquad (2,\,3)$$

Eigenproblem:
$$\int_{\mathcal{D}} k(x, y)\, p(x)\, \phi_\iota(x)\, dx = \lambda_\iota\, \phi_\iota(y) \qquad (4)$$

Discretization:
$$\underbrace{p(x) \approx \frac{1}{n}\sum_{i=1}^{n} \delta(x_i, x)}_{\text{Empirical density}} \;\Longleftrightarrow\; \underbrace{\frac{1}{n}\sum_{i=1}^{n} \psi(x_i) \otimes \psi(x_i)}_{\text{Empirical covariance } \widehat{C}_{\mathcal{H}}} \qquad (5)$$

Discrete eigenproblem: $K \phi_i = \lambda_i \phi_i$. (6)

17

slide-26
SLIDE 26

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Quantized KPCA

Observation: the empirical operator can be replaced with a quantized version.

Quantized operator:
$$\widehat{C}_{\mathcal{H}} = \frac{1}{n}\sum_{i=1}^{n} \psi(c_{\vartheta(i)}) \otimes \psi(c_{\vartheta(i)}), \qquad (7)$$
where $\vartheta : \{1, \dots, n\} \to \{1, \dots, m\}$ is the data-to-center mapping.

Discretization:
$$\underbrace{\hat p(x) \approx \frac{1}{n}\sum_{i=1}^{m} w_i\, \delta(c_i, x)}_{\text{Empirical density}} \;\Longleftrightarrow\; \underbrace{\frac{1}{n}\sum_{i=1}^{n} \psi(c_{\vartheta(i)}) \otimes \psi(c_{\vartheta(i)})}_{\text{Quantized operator } \widehat{C}_{\mathcal{H}}} \qquad (8)$$

Discrete eigenproblem: $W^{1/2} K^C W^{1/2}\, \tilde\phi_i = \lambda_i\, \tilde\phi_i$. (9)

Our Approach — if the empirical density $p(x)$ can be compressed to a Reduced-Set Density Estimate (RSDE) $\hat p(x)$ with minimal effect on the eigenvalues and eigenfunctions, there are potentially large savings in computation.

18

slide-27
SLIDE 27

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Reduced-Set KPCA: Algorithm

Algorithm 1 Reduced-Set KPCA

1. Apply a reduced-set density estimator to $X$ to compute $C = \{c_1, \dots, c_m\}$ and $w = \{w_1, \dots, w_m\}$.
2. Create the diagonal matrix $W = \mathrm{diag}(\sqrt{w_1}, \dots, \sqrt{w_m})$.
3. Compute the weighted kernel matrix $\widetilde{K} \in \mathbb{R}^{m \times m}$, $\widetilde{K} := W K^C W$, where $K^C_{ij} := k(c_i, c_j)$.
4. Perform the eigenvector decomposition $\widetilde{K} = \widetilde{U} \Lambda \widetilde{U}^T$.
5. Reweight the eigenvectors: $U = W^{1/2}\, \widetilde{U}$.

Complexity — Training: $O(m^3)$, Testing: $O(mr)$, Space: $O(m^2)$.
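A minimal numpy sketch of Algorithm 1, assuming a Gaussian kernel and that some reduced-set density estimator has already produced `centers` and `weights` (both hypothetical names):

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def rskpca(centers, weights, sigma=1.0):
    """Reduced-set KPCA: eigendecomposition of the weighted m x m kernel matrix."""
    W = np.diag(np.sqrt(weights))             # W = diag(sqrt(w_1), ..., sqrt(w_m))
    K_C = gaussian_kernel(centers, centers, sigma)
    K_tilde = W @ K_C @ W                     # weighted kernel matrix
    lam, U_tilde = np.linalg.eigh(K_tilde)    # O(m^3) instead of O(n^3)
    U = W @ U_tilde                           # reweight the eigenvectors
    return lam[::-1], U[:, ::-1]              # sorted by decreasing eigenvalue
```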

19

slide-28
SLIDE 28

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

RSKPCA: When is Quantization Appropriate?

When densities can be compressed.

Smoothing kernel — Kernel Density Estimate:
$$(\mathcal{K} I_\Omega)(x) \approx \frac{1}{n}\sum_{i=1}^{n} k(x_i, x). \qquad (10)$$
Depending on the structure of the kernel, the KDE can be highly redundant; e.g., a Gaussian kernel with a large bandwidth.

RSDE — in this case, one can compute a density estimator of smaller cardinality. Reduced-Set Density Estimate:
$$\hat p(x) = \frac{1}{n}\sum_{i=1}^{m} w_i\, k(c_i, x), \qquad m \ll n. \qquad (11)$$
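A small numpy sketch contrasting (10) and (11), assuming a Gaussian smoothing kernel and that `C` and `w` come from some reduced-set density estimator; the 1/n scaling follows the slide:

```python
import numpy as np

def gauss(Q, P, sigma=1.0):
    """Gaussian kernel evaluations, shape (len(Q), len(P))."""
    d2 = ((Q[:, None, :] - P[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def kde(Q, X, sigma=1.0):
    """Full kernel density estimate, eq. (10): n kernel evaluations per query."""
    return gauss(Q, X, sigma).mean(axis=1)

def rsde(Q, C, w, n, sigma=1.0):
    """Reduced-set density estimate, eq. (11): m << n weighted evaluations per query."""
    return gauss(Q, C, sigma) @ w / n
```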

20

slide-29
SLIDE 29

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Nyström KPCA

(Diagram: Data → Selection Procedure → Centers + Data → Eigendecomposition of the low-rank matrix E × K_m → U coefficients.)

Figure: Use a subset of the data in conjunction with the full set for approximations.

21

slide-30
SLIDE 30

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Reduced-Set Model KPCA

(Diagram: Data (kernel density) → Reduced-Set Density Estimator → RSDE → Eigendecomposition of the weighted matrix K̃ → Ũ coefficients.)

Figure: A philosophically different approach: use the RSDE as the generator for the approximations.

22

slide-31
SLIDE 31

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

RSKPCA: Approximation Error Analysis

Need to ensure both the density and the eigenfunctions are approximated well.

Kernel definition:
$$k(x, y) = \varphi\!\left(\frac{\|x - y\|^p}{\sigma^p}\right) \qquad (12)$$

Maximum Mean Discrepancy (MMD):
$$\mathrm{MMD}_b(X, Y)^2 := \Big\| \frac{1}{n}\sum_{i=1}^{n} \psi(x_i) - \frac{1}{n}\sum_{i=1}^{n} \psi(y_i) \Big\|_{\mathcal{H}}^2, \qquad (13)$$

Extrapolation operator: let $\psi(x_i) = k_{x_i} := k(\cdot, x_i)$, and define operators $C_{\mathcal{H}}, \widehat{C}_{\mathcal{H}} : \mathcal{H} \to \mathcal{H}$ via tensor products,
$$C_{\mathcal{H}} := \frac{1}{n}\sum_{i=1}^{n} \psi(x_i) \otimes \psi(x_i). \qquad (14)$$

Measure the operator approximation error in terms of the Hilbert–Schmidt norm $\|\cdot\|_{HS}$.
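A short numpy sketch of the biased empirical MMD² of (13), expanded with the kernel trick; it assumes equal-weight embeddings for both sets (in the reduced-set case the center terms would carry the weights w_j):

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    """Biased MMD^2 = ||mean_i psi(x_i) - mean_j psi(y_j)||_H^2 via kernel evaluations."""
    kxx = gaussian_kernel(X, X, sigma).mean()
    kyy = gaussian_kernel(Y, Y, sigma).mean()
    kxy = gaussian_kernel(X, Y, sigma).mean()
    return kxx + kyy - 2.0 * kxy
```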

23

slide-32
SLIDE 32

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

RSKPCA: Approximation Error Analysis (Shadow Density)

We previously presented a simple shadow-density algorithm that allows computation of closed-form bounds in terms of a parameter $\ell$.

Theorem (MMD Worst-Case Bound (Shadow)). Let $n$ be the number of samples, $X$ be defined as above, $C$ be the quantized dataset, and let $k$ satisfy (12). Then
$$\mathrm{MMD}_b(X, C) \le \sqrt{2\left(\kappa - \varphi\!\left(\tfrac{1}{\ell^p}\right)\right)}. \qquad (15)$$

Theorem (Hilbert–Schmidt Worst-Case Bound (Shadow)). Let $C_{\mathcal{H}}$ and $\widehat{C}_{\mathcal{H}}$ be defined using extrapolation operators with $X$ and $C$, respectively. Then
$$\big\| C_{\mathcal{H}} - \widehat{C}_{\mathcal{H}} \big\|_{HS} \le 2\kappa \sqrt{2\left(\kappa - \varphi\!\left(\tfrac{1}{\ell^p}\right)\right)}. \qquad (16)$$

24

slide-33
SLIDE 33

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

RSKPCA: Approximation Error Analysis (Generic)

Theorem (MMD Worst-Case Bound (Generic)). Let $n$ be the number of samples, and let $X$ and $C$ be defined as above. Then
$$\mathrm{MMD}_b(X, C) \le \frac{1}{n} \sum_{j=1}^{m} \sqrt{2\, w_j \left(\kappa - k(x_{\max_j}, c_j)\right)}, \qquad (17)$$
where $\kappa$ is the maximum value of the kernel, $x_{\max_j} = \arg\min_{x_i \in S_j} k(x_i, c_j)$, and $S_j$ represents the set of points assigned to center $c_j$.

Theorem (Hilbert–Schmidt Worst-Case Bound (Generic)). Let $C_{\mathcal{H}}$ and $\widehat{C}_{\mathcal{H}}$ be defined using tensor products in $\mathcal{H}$ with $X$ and $C$, as above. Then
$$\big\| C_{\mathcal{H}} - \widehat{C}_{\mathcal{H}} \big\|_{HS} \le \frac{1}{n} \sum_{j=1}^{m} w_j \sqrt{2\left(\kappa - k(x_{\max_j}, c_j)\right)}. \qquad (18)$$

25

slide-34
SLIDE 34

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

RSKPCA Eigenembedding Results: german

[Kingravi, Vela, Gray – SIAM 2013]

Validation Procedure 1
- Goal: compare RSKPCA with Nyström and density-weighted Nyström.
- Run the KPCA procedure: take a dataset and learn the eigenfunctions via eigendecomposition of the kernel matrix.
- Project the data onto the new eigenspace.
- Do the same for uniform KPCA, RSKPCA, Nyström, and weighted Nyström. Compare embeddings and eigenvalues.

Validation Procedure 2
- Same as Procedure 1, but this time classify the data in the new eigenspace using k-nearest neighbors, with different center-selection schemes.

26

slide-35
SLIDE 35

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

RSKPCA Eigenembedding Results: german

[Kingravi, Vela, Gray – SIAM 2013]

Figure: RSKPCA eigenembedding results on the german dataset: (a) training speedup, (b) testing speedup, (c) eigenvalue deviation, (d) embedding accuracy, comparing the shadows, nystrom, wnystrom, and unif schemes.

27

slide-36
SLIDE 36

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

RSKPCA Center-Picking Results: usps

[Kingravi, Vela, Gray – SIAM 2013]

Figure: RSKPCA center-picking results on the usps dataset: (a) training speedup, (b) testing speedup, (c) total speedup, (d) classification accuracy, comparing the none, shadows, herding, paring, and kmeans center-selection schemes.

28

slide-37
SLIDE 37

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

RSKPCA Conclusions

Summary of Contributions and Take-Home Message Showed that for KPCA, operator approximation can be arrived at via density estimation. Proved bounds that provide clues to optimize approximation appropriately. End result is a simple and practical method, which speeds up training and testing simultaneously.

29

slide-38
SLIDE 38

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Outline

1

Introduction Kernel Methods Speeding Up Training Speeding Up Testing Research Objectives

2

RSKPCA: Basics Reduced-Set Models for the Batch Case RSKPCA Results

3

RSKPCA: Applications RSKPCA Applications Gaussian Process Regression Diffusion Maps RSKPCA Applications: Results

4

GP-MRAC Reduced-Set Models for the Online Case GP-MRAC Results

5

Conclusion Conclusions and Future Work

6

Questions

30

slide-40
SLIDE 40

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

RSKPCA Applications

Main Issue — other kernel algorithms can be reformulated either as KPCA on modified kernel matrices, or they require the feature map induced by the integral operator. They still suffer from O(n³) training and O(n) mapping complexity.

Our Contributions — modify the work of the previous section to create approximation algorithms for kernel methods in two domains:

1. Regression: Gaussian process regression.

2. Clustering: diffusion maps.

Prove bounds on the approximation error, and show that the bounds relate to the operator approximation error.

31

slide-41
SLIDE 41

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Gaussian Process Regression

GPR Outline — a method to perform Bayesian linear regression in the feature space $\mathcal{H}$. If a zero-mean prior is put on the space, the covariance between points is given by the kernel matrix $K$, and the observations $y$ are corrupted by i.i.d. $\mathcal{N}(0, \omega^2)$ noise, then it can be shown that the posterior distribution (conditioned on the observations) at a point $x_*$ is given by
$$\mathcal{N}\!\left( k_{x_*}^T (K + \omega^2 I)^{-1} y,\;\; k(x_*, x_*) - k_{x_*}^T (K + \omega^2 I)^{-1} k_{x_*} \right). \qquad (19)$$
Storage and inversion of $K + \omega^2 I$ is the main bottleneck in the computations. Define $h(x_*) := k_{x_*}^T (K + \omega^2 I)^{-1} y$.

Our solution: create a compact kernel matrix in a manner similar to RSKPCA, and compute a reduced-set approximation to the map.

32

slide-42
SLIDE 42

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Reduced-Set Gaussian Process Regression

Algorithm 2 Reduced-Set Gaussian Process Regression

Input: initial data $Z = X \times Y = \{(x_1, y_1), \dots, (x_n, y_n)\}$, noise parameter $\omega^2$.
Procedure:
1) Apply RSKPCA to get $(\widetilde{K} + \omega^2 I)^{-1}$, $W = \{w_1, \dots, w_m\}$, $C = \{c_1, \dots, c_m\}$, and $D = \{d_1, \dots, d_m\}$.
2) Compute the coefficients $\alpha = U(\Lambda + \omega^2 I)^{-1} U^T d$, where $d \in \mathbb{R}^m$ is a column-vector representation of $D$.
Output: compute the approximate GP mean as
$$\hat h(x) := \sum_{i=1}^{m} \alpha_i\, k(c_i, x), \qquad (20)$$
and the approximate posterior variance as
$$\hat\nu(x) := k(x, x) - k_x^T\, U(\Lambda + \omega^2 I)^{-1} U^T k_x. \qquad (21)$$
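A rough numpy sketch of step 2 and the outputs (20)-(21), assuming the reduced-set quantities (centers, compressed targets `d`, and the eigendecomposition `U`, `lam` of the weighted kernel matrix) are already available; how `d` is formed is not specified on this slide, so the interface is hypothetical:

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def rs_gpr_fit(U, lam, d, omega2):
    """Step 2: alpha = U (Lambda + omega^2 I)^{-1} U^T d."""
    return U @ ((U.T @ d) / (lam + omega2))

def rs_gpr_predict(X_star, centers, alpha, U, lam, omega2, sigma=1.0):
    """Approximate GP mean (20) and posterior variance (21) at the query points."""
    k_x = gaussian_kernel(X_star, centers, sigma)            # (q, m)
    mean = k_x @ alpha
    proj = k_x @ U                                            # (q, m)
    # k(x, x) = 1 for a Gaussian kernel
    var = 1.0 - np.einsum('qm,m,qm->q', proj, 1.0 / (lam + omega2), proj)
    return mean, var
```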

33

slide-44
SLIDE 44

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Gaussian Process Regression

The following theorem shows the efficacy of the GP regression approximation algorithm.

Theorem. Let $h$ and $\hat h$ be defined as above, $\sup_{x' \in \Omega} k(x', x) \le \kappa$, and $\|y\|_\infty \le M$. Then
$$\big| h(x) - \hat h(x) \big| \le \frac{\kappa M}{n \omega^4}\left( \Upsilon^2 + 4\kappa^2 n\, \Upsilon + \sqrt{n}\,\omega^2\, \Upsilon \right), \qquad (22)$$
where
$$\Upsilon := \sum_{j=1}^{m} w_j \sqrt{2\left(\kappa - k(x'_j, c_j)\right)}, \qquad (23)$$
$k(x'_j, c_j)$ represents the smallest value that can occur over all possible values of $x_j$ in the center sets $S_j$, and $\omega^2$ is the variance of the i.i.d. noise in the observations $y_i$.

NOTE: $\Upsilon$ is the operator approximation error!

34

slide-45
SLIDE 45

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Diffusion Maps

Diffusion Maps — a manifold learning algorithm whose goal is to use the matrix
$$A(x, y) = \frac{k_\zeta(x, y)}{p_\zeta(x)\, p_\zeta(y)}, \qquad (24)$$
to learn the diffusion distance
$$D_t(x, y)^2 := \| M_t f(x \mid \cdot) - M_t f(y \mid \cdot) \|^2, \qquad \text{where } M_t f(x \mid y) = \sum_{\iota \ge 0} \lambda_\iota^t\, \psi_\iota(x)\, \phi_\iota(y),$$
the $\phi_\iota(x)$ are eigenfunctions of $A$, and $\psi_\iota(x) = \phi_\iota(x)/p_\zeta(x)$.

Issue: suffers from $O(n^3)$ training complexity.

35

slide-46
SLIDE 46

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Diffusion Maps

(Diffusion-map definitions (24) as on the previous slide.)

Solution: create a compact kernel matrix in a manner similar to RSKPCA, and compute a reduced-set approximation to the map.

35

slide-47
SLIDE 47

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Diffusion Maps: Definitions

Feature Map — consider the symmetric kernel in the computations, and the modified feature map
$$\chi(x) := \frac{\psi(x)}{\sqrt{p_\zeta(x)}}. \qquad (25)$$

Covariance Operator — given by
$$D^{(n)}_{\mathcal{H}_\zeta} := \frac{1}{n}\sum_{i=1}^{n} \hat\chi(x_i) \otimes \hat\chi(x_i), \qquad (26)$$
an empirical approximation of an ideal covariance operator $D_{\mathcal{H}_\zeta}$, where
$$\hat\chi(x) := \frac{\psi(x)}{\sum_{\iota=1}^{n} k_\zeta(x, x_\iota)}. \qquad (27)$$

36

slide-48
SLIDE 48

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Diffusion Maps: Reduced-Set Approximation

Algorithm 3 Reduced-Set Diffusion Map

Input: dataset $X = \{x_1, \dots, x_n\}$, rank $k + 1$, bandwidth $\zeta$, and number of centers $m$.
Procedure:
1) Compute the RSDE $C = \{c_1, \dots, c_m\}$ and $w = \{w_1, \dots, w_m\}$.
2) Compute the normalized weighted kernel matrix $\widetilde{A} \in \mathbb{R}^{m \times m}$,
$$\widetilde{A}_{ij} := \frac{k(c_i, c_j)}{\left(\sum_{\iota=1}^{m} \sqrt{w_\iota}\, k(c_\iota, c_i)\right)\left(\sum_{\iota=1}^{m} \sqrt{w_\iota}\, k(c_\iota, c_j)\right)}.$$
3) Perform the eigenvector decomposition $\widetilde{A}\, \tilde\phi_i = \lambda_i\, \tilde\phi_i$, and reweight to get the eigenvectors $\hat\phi_i = W^{-1/2}\, \tilde\phi_i$. Finally, compute
$$\hat\psi_i(j) = \frac{\hat\phi_i(j)}{\sum_{\iota=1}^{m} w_\iota\, k(c_\iota, c_j)}.$$
Output: compute the diffusion embedding
$$\widehat\Psi_t(x_j) := \begin{bmatrix} \lambda_1^t\, \hat\psi_1(j) & \cdots & \lambda_k^t\, \hat\psi_k(j) \end{bmatrix}. \qquad (28)$$

37
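A rough numpy sketch of Algorithm 3 evaluated at the centers; the `deg` normalizer and the dropping of the first (trivial) eigenpair are assumptions of this sketch rather than details given on the slide:

```python
import numpy as np

def gaussian_kernel(A, B, zeta=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * zeta ** 2))

def rs_diffusion_map(centers, weights, t=1, k=2, zeta=1.0):
    """Reduced-set diffusion embedding of the m centers (sketch of Algorithm 3)."""
    K = gaussian_kernel(centers, centers, zeta)        # m x m kernel on the centers
    deg = K @ np.sqrt(weights)                          # assumed weighted normalizer per center
    A = K / np.outer(deg, deg)                          # normalized weighted kernel matrix
    lam, phi_t = np.linalg.eigh(A)
    lam, phi_t = lam[::-1], phi_t[:, ::-1]              # sort eigenpairs in decreasing order
    phi = phi_t / np.sqrt(weights)[:, None]             # reweight: phi = W^{-1/2} phi_tilde
    psi = phi / (K @ weights)[:, None]                  # psi_i(j) = phi_i(j) / sum_l w_l k(c_l, c_j)
    return psi[:, 1:k + 1] * lam[1:k + 1] ** t          # rows are the embeddings Psi_t(c_j)
```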

slide-49
SLIDE 49

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Diffusion Maps: MMD Error

Theorem. The rank-1 vectors associated with the diffusion operator are
$$\hat\chi(x_i) := \frac{\psi(x_i)}{\big\langle \sum_{j=1}^{n} \psi(x_j),\; \psi(x_i) \big\rangle_{\mathcal{H}}}.$$
Then the distance in MMD between these vectors and the reduced-set approximations is
$$\mathrm{MMD}_{\text{diffusion}}(X, C) \le \frac{\sqrt{\kappa}}{n} \sum_{j=1}^{m} \sum_{x_i \in S_j} \frac{\rho_i + \rho_i^+}{\rho_i\, \rho_i^+}, \qquad (29)$$
where
$$\rho_i := C_\zeta \sum_{j=1}^{m} w_j\, k_\zeta(c_i, c_j), \qquad \rho_i^+ := C_\zeta \sum_{j=1}^{m} w_j\, k_+(c_i, c_j), \qquad (30)$$
and
$$k_\zeta(x, y) := \exp\!\big(-\|x - y\|^2\big), \qquad k_+(x, y) := \exp\!\big(-\|x - y\|^2\big)\exp\!\big(-\|c_i - c_j\|\big). \qquad (31)$$

38

slide-50
SLIDE 50

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Diffusion Map: Hilbert-Schmidt Error

Theorem. Let the covariance operator $D^{(n)}_{\mathcal{H}_\zeta}$ associated to $M_s$ be defined by (26), and let the reduced-set covariance operator $\widehat D^{(n)}_{\mathcal{H}_\zeta}$ be defined similarly via Algorithm 3. Then
$$\big\| D^{(n)}_{\mathcal{H}_\zeta} - \widehat D^{(n)}_{\mathcal{H}_\zeta} \big\|_{HS} \le \frac{1}{n} \sum_{j=1}^{m} \sum_{x_i \in S_j} \left( \frac{\kappa\,(\rho_i + \rho_i^+)}{\rho_i\, \rho_i^+} + \frac{\kappa\, \rho_i^-(\rho_i + \rho_i^+) - 2\rho_i^+}{\rho_i\, \rho_i^+\, \rho_i^-} \right). \qquad (32)$$

Note: the upper-bound maximizers can be different! The culprit is the nonconstant divisor.

39

slide-51
SLIDE 51

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

GPR Results: concrete

[Kingravi, Zhang, Vela, Gray – NC 2014]

Figure: GPR results on the concrete dataset: (a) RMSE error, (b) training speedup, (c) testing speedup, (d) retention, comparing the baseline, reduced-set, and Nyström methods.

40

slide-52
SLIDE 52

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Diffusion Map Results: swissroll

[Kingravi, Zhang, Vela, Gray – NC 2014]

Figure: Diffusion map results on the swissroll dataset: (a) full embedding at t = 1, (b) reduced-set embedding at t = 1, (c) full embedding at t = 5, (d) reduced-set embedding at t = 5.

41

slide-53
SLIDE 53

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

RSKPCA Applications Conclusions

Summary of Contributions and Take-Home Message Showed how RSKPCA can be extended for regression and clustering. Proved bounds that provide clues to optimize approximation appropriately. Showed once again that operator approximation completely dependent on learning appropriate density estimate.

42

slide-54
SLIDE 54

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Outline

1

Introduction Kernel Methods Speeding Up Training Speeding Up Testing Research Objectives

2

RSKPCA: Basics Reduced-Set Models for the Batch Case RSKPCA Results

3

RSKPCA: Applications RSKPCA Applications Gaussian Process Regression Diffusion Maps RSKPCA Applications: Results

4

GP-MRAC Reduced-Set Models for the Online Case GP-MRAC Results

5

Conclusion Conclusions and Future Work

6

Questions

43

slide-55
SLIDE 55

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Reduced-Set Models for the Online Case: Neuroadaptive control

Adaptive Control — make an (uncertain) dynamical system behave like an idealized one driven by a known reference.

Plant: $\dot x = f(x, u)$. (33) Here $x \in \mathbb{R}^n$ are the system states, $u \in \mathbb{R}^n$ the control inputs, the system is controllable, and $f$ is such that a unique solution exists.

Model-Reference Adaptive Control (MRAC) — design a control law $u(t)$ such that the plant tracks the reference model $\dot x_{rm} = f_{rm}(x_{rm}, r)$. (34) Here $x_{rm} \in \mathbb{R}^n$ are the reference-model states, $r \in \mathbb{R}^n$ are exogenous inputs, and $f_{rm} \in C^1$ and BIBO stable.

44


slide-56
SLIDE 56

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Approximate Model Inversion

Plant: $\dot x_1(t) = x_2(t)$, $\dot x_2(t) = f(x(t)) + b(x(t))\, u(t)$. (35)
Reference: $\dot x_{1_{rm}} = x_{2_{rm}}$, $\dot x_{2_{rm}} = f_{rm}(x_{rm}, r)$. (36)
Design a pseudo-control $\nu \in \mathbb{R}^n$ such that $x \to x_{rm}$.
Dynamic Inversion: $u = \hat b^{-1}(x)\,(\nu - \hat f(x))$. (37)
Modeling Error: $\dot x_2 = \nu(\bar x) + \Delta(\bar x)$. (38)
Tracking Error: $e(t) = x_{rm}(t) - x(t)$. (39)
Pseudocontrol: $\nu = \nu_{rm} + \nu_{pd} - \nu_{ad}$. (40)

45

slide-57
SLIDE 57

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Approximate Model Inversion

Plant (35), reference model (36), dynamic inversion (37), and modeling error (38) as on the previous slide.

Tracking Error Dynamics:
$$\dot e = A e + B\,[\nu_{ad}(\bar x) - \Delta(\bar x)], \qquad (39)$$
where
$$A = \begin{bmatrix} 0 & I \\ -K_1 & -K_2 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 \\ I \end{bmatrix}.$$
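A small Python sketch of one control step combining (37)-(40), with hypothetical callables `f_hat`, `b_hat`, and `nu_ad` and placeholder gains `K1`, `K2`; it assumes the error state splits evenly into position and rate components:

```python
import numpy as np

def ami_control(x, x_rm, nu_rm, f_hat, b_hat, nu_ad, K1, K2):
    """One step of approximate-model-inversion MRAC (sketch of eqs. 37-40)."""
    e = x_rm - x                                    # tracking error, eq. (39)
    half = len(e) // 2
    nu_pd = K1 @ e[:half] + K2 @ e[half:]           # PD term on position / rate errors
    nu = nu_rm + nu_pd - nu_ad(x)                   # pseudo-control, eq. (40)
    u = np.linalg.solve(b_hat(x), nu - f_hat(x))    # dynamic inversion, eq. (37)
    return u, e
```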

45

slide-58
SLIDE 58

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

RBF Networks

Neuroadaptive control — model $\Delta$ as $\Delta(x, u) = W^{*T} \varphi(x, u)$, with predefined centers. Weight law: $\dot{\hat W}(t) = \Gamma\, \varphi(x(t), u(t))\, e^T(t) P B$. Minimizes the instantaneous error.

(Network diagram labels: $(x, u) \in \mathbb{R}^3$, $\varphi(x, u) \in \mathbb{R}^2$, $W \in \mathbb{R}^{2 \times 3}$, $\Delta(x, u) \in \mathbb{R}$.)

46

slide-59
SLIDE 59

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

RBF Networks

(Model and weight law as on the previous slide.)

Required: Persistency of Excitation — a bounded signal $x(t)$ is persistently exciting if for all $t > t_0$ there is a $T > 0$ such that
$$\int_{t}^{t+T} x(\tau)\, x^T(\tau)\, d\tau > \gamma I, \qquad \gamma \in \mathbb{R}^+. \qquad (40)$$
If $\varphi(x, u)$ is PE, the law guarantees uniform ultimate boundedness around 0. Hard to check — usually assumed!

46

slide-60
SLIDE 60

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Previous Solutions for Neuroadaptive Control

'Classical laws': σ-mod, e-mod, projection [TAO:2003] — simple modifications of the weight law guarantee bounded weights even if the signal is not PE. ♠ ♣
Q-mod [VOLYANSKYY:2009] — the integral of the tracking error drives the weights to a hyper-surface containing the weights (*). ♠
Instantaneous error tracking [NARDI:2000] — move the centers to minimize the error. ♣
Concurrent Learning [CHOWDHARY:2010] — record states when they were exciting; the augmented law then removes the need for PE. ♠

Limitations: ♠ needs domain knowledge to place centers; ♣ doesn't learn the model error.

47

slide-61
SLIDE 61

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Our (Previous) Solution

[Kingravi, Chowdhary, Vela, Johnson – TNN 2012]

Observation: RBFs are kernel machines — the previous example maps $\mathbb{R}^2 \to \mathbb{R}$; a hand-crafted feature map!

Idea
- Treat this as a machine learning problem: the goal is to learn the model error.
- Exploit the RKHS structure to pick the centers appropriately.
- Use recorded data (concurrent learning) to guarantee boundedness of the weights (BKR-CL algorithm):
$$\dot W = -\Gamma_W \Big( \varphi_k(\bar x)\, e^T P B - \sum_{j=1}^{p} \varphi_k(\bar x_j)\, \epsilon_{k_j}^T \Big). \qquad (41)$$

BKR-CL Algorithm — removes the need for knowledge of the domain.

48

slide-62
SLIDE 62

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Our (Previous) Solution

[Kingravi, Chowdhary, Vela, Johnson – TNN 2012]

(Observation, idea, and update law (41) as on the previous slide.)

BKR-CL Algorithm — still needs to record states when persistently exciting (the data stack). Can be slow due to the SVD.

48

slide-63
SLIDE 63

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

New Approach

Observation: model the error as the realization of a stochastic process.
- An inherent way to handle noise.
- If computed appropriately, removes the need to record data.

Solution: Gaussian process regression — a natural RKHS extension of the previous work. We employ a variety of online GP regression algorithms to learn the modeling error. This allows for budgeted inference and dispenses with the CL data stack.

49

slide-64
SLIDE 64

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

New Approach

Observation: model the error as the realization of a stochastic process (as on the previous slide).

Solution: Gaussian process regression — the uncertainty is modeled as
$$\Delta(\bar x(t)) \sim \mathcal{GP}\big(m(\bar x(t)),\; k(\bar x(t), \bar x(t'))\big). \qquad (42)$$
The adaptive signal is the process
$$\bar\nu_{ad}(z) \sim \mathcal{GP}\big(\hat m_\sigma(z),\; k(z, z')\big). \qquad (43)$$
In practice, set $\nu_{ad}(z) = \hat m_\sigma(z)$.

49

slide-65
SLIDE 65

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

GP-MRAC Algorithm

Algorithm 4 The Gaussian Process – Model Reference Adaptive Control (GP-MRAC) algorithm

while new measurements are available do
  obtain measurement z(t)
  if k(z_i, z(t)) has information then add z(t) to the basis vector set end if
  if the maximum number of stored points (p_max) is reached then remove the least informative basis vector end if
  predict the posterior mean m̂_{t+1}
  predict the posterior covariance Σ̂_{t+1}
  set the output of the adaptive element to m̂_{t+1}
  calculate the pseudo-control ν
end while
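A toy Python sketch of the Algorithm 4 loop, using a simple novelty test and oldest-point removal as stand-ins for the slide's "has information" and "least informative" criteria (the actual GP-MRAC work uses principled budgeted online GP updates):

```python
import numpy as np

def gauss(z, Z, sigma=1.0):
    """Gaussian kernel between one point z and the rows of Z."""
    return np.exp(-((z - Z) ** 2).sum(-1) / (2 * sigma ** 2))

class BudgetedGPElement:
    """Toy budgeted online GP adaptive element (sketch of Algorithm 4)."""
    def __init__(self, p_max=50, sigma=1.0, omega2=0.01, tol=0.99):
        self.Z, self.y = [], []
        self.p_max, self.sigma, self.omega2, self.tol = p_max, sigma, omega2, tol

    def update(self, z, delta_obs):
        # add z only if it is poorly represented by the current basis vectors
        if not self.Z or gauss(z, np.array(self.Z), self.sigma).max() < self.tol:
            self.Z.append(z); self.y.append(delta_obs)
        if len(self.Z) > self.p_max:                 # budget reached: drop a basis vector
            self.Z.pop(0); self.y.pop(0)

    def nu_ad(self, z):
        """Posterior mean m_hat at z: k_z^T (K + omega^2 I)^{-1} y."""
        Zm, ym = np.array(self.Z), np.array(self.y)
        K = np.exp(-((Zm[:, None] - Zm[None, :]) ** 2).sum(-1) / (2 * self.sigma ** 2))
        alpha = np.linalg.solve(K + self.omega2 * np.eye(len(Zm)), ym)
        return gauss(z, Zm, self.sigma) @ alpha
```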

50

slide-66
SLIDE 66

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Stability Result

Theorem (Global Approximation Theorem). Let $h$ and $\hat h$ be the outputs of the GPs learned from the full data and from the approximation algorithm, respectively, let $\sup_{x' \in \Omega} k(x', x) \le \kappa$, and let $\|y\|_\infty \le M$. Then
$$\epsilon^\sigma_m(\bar x) \le \frac{2\kappa^2 M \sqrt{k_{\max}}}{\omega^4} + \frac{\kappa\, k_{\max}\, M}{\omega^2}, \qquad (44)$$
where $k_{\max} := \max_i \big\| \psi(\bar x_i) - \psi(c_{\vartheta(i)}) \big\|_{\mathcal{H}}$ is the greatest kernel approximation error.

Theorem. Consider the system and the control law above, and assume that the uncertainty $\Delta(\bar x)$ is representable by a Gaussian process. Assume that $\epsilon^\sigma_m(\bar x_0)$ is bounded for the initial state $\bar x_0$. Then the adaptive signal $\nu_{ad}(\bar x) = \hat m_\sigma(z)$ guarantees that the system is mean-square uniformly ultimately bounded a.s.

51

slide-67
SLIDE 67

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

GP-MRAC Results: Tracking

[Chowdhary, Kingravi, How, Vela – TNN 2014 (submitted)]

Wing-rock — consider the nonlinear system
$$\dot\theta = p, \qquad (45)$$
$$\dot p = L_{\delta_a} \delta_a + \Delta(x), \qquad (46)$$
where
$$\Delta(x) = W_0^* + W_1^* \theta + W_2^* p + W_3^* |\theta|\, p + W_4^* |p|\, p + W_5^* \theta^3 \qquad (47)$$
and $L_{\delta_a} = 3$. We wish to control this system and learn the model error.
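A small Python sketch that simulates the wing-rock dynamics (45)-(47) with forward-Euler integration; the `W_star` values are placeholders, not the parameters used in the cited experiments:

```python
import numpy as np

# Placeholder ideal weights for the uncertainty Delta(x) in eq. (47).
W_star = np.array([0.2, 0.2, 0.6, -0.6, 0.4, 0.1])

def delta(theta, p):
    """Wing-rock uncertainty Delta(x), eq. (47)."""
    feats = np.array([1.0, theta, p, np.abs(theta) * p, np.abs(p) * p, theta ** 3])
    return W_star @ feats

def wingrock_step(theta, p, delta_a, dt=0.01, L_da=3.0):
    """One forward-Euler step of eqs. (45)-(46)."""
    theta_next = theta + dt * p
    p_next = p + dt * (L_da * delta_a + delta(theta, p))
    return theta_next, p_next
```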

52

slide-68
SLIDE 68

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

GP-MRAC Results: Tracking

[Chowdhary, Kingravi, How, Vela – TNN 2014 (submitted)]

Figure: Comparison of tracking error (position error e in deg and angular-rate error ė in deg/s over time, for the proj, OP, and KL schemes) when using GP-regression-based MRAC and RBFN-MRAC with the projection operator and uniformly distributed centers over their respective domains: (a) within domain, (b) outside domain.

53

slide-69
SLIDE 69

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

GP-MRAC Results: Fourier Transform Energy

[Chowdhary, Kingravi, How, Vela – TNN 2014 (submitted)]

Figure: Energy of the spectra of the error between the adaptive element output and the actual uncertainty, (a) within domain and (b) outside domain, for the proj, OP, and KL schemes. This figure quantifies the greater number of oscillations while tracking in RBF-MRAC.

54

slide-70
SLIDE 70

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

GP-MRAC Results: Long Term Learning

[Chowdhary, Kingravi, How, Vela – TNN 2014 (submitted)]

Figure: Comparison of uncertainty tracking (ν_ad vs. ∆ over time) after the models are learned and the weights are frozen, (a) within domain and (b) outside domain, for the proj, OP, and KL schemes. As can be seen, the locality of the proj-operator and OP controllers precludes true learning over the domain.

55

slide-71
SLIDE 71

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

GP-MRAC Conclusions

Summary of Contributions and Take-Home Message Showed that kernel methods can be used in challenging online problem (nonlinear adaptive control). Proved (stochastic) stability results, ensuring controllers well-founded. Showed that unlike previous work, GP-MRAC actually learns the model error, without resorting to any PE condition, making controller useful for many real-world scenarios.

56

slide-72
SLIDE 72

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Outline

1

Introduction Kernel Methods Speeding Up Training Speeding Up Testing Research Objectives

2

RSKPCA: Basics Reduced-Set Models for the Batch Case RSKPCA Results

3

RSKPCA: Applications RSKPCA Applications Gaussian Process Regression Diffusion Maps RSKPCA Applications: Results

4

GP-MRAC Reduced-Set Models for the Online Case GP-MRAC Results

5

Conclusion Conclusions and Future Work

6

Questions

57

slide-73
SLIDE 73

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Conclusions and Future Work

Summary of Contributions
- Demonstrated that a large class of kernel methods can be sped up in both the training and testing phases in a holistic manner.
- Highlighted the deep connection between density estimation and operator approximation.
- Demonstrated that in most cases it is not necessary to build models that retain the entire dataset.
- Used ideas from RKHS theory and GP regression to create new controllers for adaptive control that truly learn the model error and uncertainty.

Possible Extensions
- Apply the methods to large-scale applications (millions of points) on clusters.
- Design approximations for vector-valued RKHSs, and apply them to learning the vector fields associated with nonlinear dynamics.
- GP-MRAC has been implemented on quadrotor aircraft; the next step is to apply it to fixed-wing aircraft, where the dynamics are more complicated.
- Explore reduction/projection methods different from the RSKPCA algorithm.

58

slide-74
SLIDE 74

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Completed, Related, and Planned Publications

Journals

1. Girish Chowdhary, Hassan A. Kingravi, Jonathan How, and Patricio A. Vela, "Bayesian Nonparametric Adaptive Control using Gaussian Processes" (submitted), IEEE Transactions on Neural Networks and Learning Systems, 2013.
2. Hassan A. Kingravi, Girish Chowdhary, Patricio A. Vela, and Eric Johnson, "Reproducing Kernel Hilbert Space Approach for the Online Update of Radial Bases in Neuro-Adaptive Control," IEEE Transactions on Neural Networks and Learning Systems, 2012.

Conferences

1. Hassan A. Kingravi, Patricio A. Vela, and Alexander Gray, "Reduced set KPCA for improving the training and execution speed of kernel machines," SIAM International Conference on Data Mining, 2013.
2. Girish Chowdhary, Hassan A. Kingravi, Jonathan How, and Patricio A. Vela, "Nonparametric adaptive control of time-varying systems using Gaussian processes," American Control Conference, 2013.
3. Girish Chowdhary, Hassan A. Kingravi, Jonathan How, and Patricio A. Vela, "Nonparametric adaptive control using Gaussian processes," IEEE Conference on Decision and Control, 2013.
4. Girish Chowdhary, Hassan A. Kingravi, Jonathan How, and Patricio A. Vela, "Nonparametric adaptive control using adaptive elements" (invited paper), AIAA Guidance, Navigation and Control Conference, 2012.
5. Hassan A. Kingravi, Girish Chowdhary, Patricio A. Vela, and Eric Johnson, "A Reproducing Kernel Hilbert Space Approach for the Online Update of Radial Bases in Neuro-Adaptive Control," IEEE Conference on Decision and Control, 2012.

Journals to be Submitted

1. Hassan A. Kingravi, Patricio A. Vela, and Alexander Gray, "Reduced-set models for improving the training and execution speed of kernel machines," Neural Computation, 2014.

59

slide-75
SLIDE 75

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Outline

1

Introduction Kernel Methods Speeding Up Training Speeding Up Testing Research Objectives

2

RSKPCA: Basics Reduced-Set Models for the Batch Case RSKPCA Results

3

RSKPCA: Applications RSKPCA Applications Gaussian Process Regression Diffusion Maps RSKPCA Applications: Results

4

GP-MRAC Reduced-Set Models for the Online Case GP-MRAC Results

5

Conclusion Conclusions and Future Work

6

Questions

60

slide-76
SLIDE 76

Introduction RSKPCA: Basics RSKPCA: Applications GP-MRAC Conclusion Questions

Questions?

61