CS480/680 Lecture 12 (June 17, 2019): Gaussian Processes


SLIDE 1

CS480/680 Lecture 12: June 17, 2019

Gaussian Processes
[B] Section 6.4, [M] Chap. 15, [HTF] Sec. 8.3

CS480/680 Spring 2019 Pascal Poupart 1 University of Waterloo

SLIDE 2

Gaussian Process Regression

  • Idea: distribution over functions

CS480/680 Spring 2019 Pascal Poupart 2 University of Waterloo

SLIDE 3

Bayesian Linear Regression

  • Setting: f(𝒙) = 𝒘ᡀφ(𝒙) and y = f(𝒙) + ε
  • Weight space view:
    – Prior: Pr(𝒘)
    – Posterior: Pr(𝒘|𝑿, 𝒚) ∝ Pr(𝒘) Pr(𝒚|𝒘, 𝑿)

  (Annotations: 𝒘 is unknown; ε ~ N(0, σ²); the prior, likelihood and posterior are all Gaussian.)

CS480/680 Spring 2019 Pascal Poupart 3 University of Waterloo

SLIDE 4

Bayesian Linear Regression

  • Setting: f(𝒙) = 𝒘ᡀφ(𝒙) and y = f(𝒙) + ε
  • Function space view:
    – Prior: Pr(f(𝒙*)) = ∫ Pr(f | 𝒘, 𝒙*) Pr(𝒘) d𝒘
    – Posterior: Pr(f(𝒙*) | 𝑿, 𝒚) = ∫ Pr(f | 𝒘, 𝒙*) Pr(𝒘 | 𝑿, 𝒚) d𝒘

  (Annotations: 𝒘 is unknown; ε ~ N(0, σ²); Pr(𝒘) and Pr(𝒘|𝑿, 𝒚) are Gaussian; Pr(f | 𝒘, 𝒙*) is deterministic; the resulting prior and posterior over f(𝒙*) are Gaussian.)

CS480/680 Spring 2019 Pascal Poupart 4 University of Waterloo
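To make the function space view concrete, here is a minimal numerical sketch (the polynomial basis, prior precision α, and query point are made-up illustrations, not from the slides): sampling weights 𝒘 from the prior and pushing them through f(𝒙) = 𝒘ᡀφ(𝒙) induces a Gaussian over the function value at any query point 𝒙*, with variance φ(𝒙*)ᡀφ(𝒙*)/α.

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(x):
    """Polynomial basis functions (assumed for illustration)."""
    return np.array([1.0, x, x**2])

alpha = 2.0                      # assumed prior precision: w ~ N(0, alpha^-1 I)
n_samples = 100000
W = rng.normal(0.0, np.sqrt(1.0 / alpha), size=(n_samples, 3))

x_star = 0.7
f_star = W @ phi(x_star)         # samples from Pr(f(x*)) = ∫ Pr(f | w, x*) Pr(w) dw

# The induced distribution is Gaussian with mean 0 and variance phi(x*)^T phi(x*) / alpha
print(f_star.mean(), f_star.var())
print(phi(x_star) @ phi(x_star) / alpha)
```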

SLIDE 5

Gaussian Process

  • According to the function space view, there is a Gaussian over f(𝒙*) for every 𝒙*. Those Gaussians are correlated through 𝒘.
  • What is the general form of Pr(f) (i.e., a distribution over functions)?
  • Answer: a Gaussian Process (an infinite-dimensional Gaussian distribution)

CS480/680 Spring 2019 Pascal Poupart 5 University of Waterloo

SLIDE 6

Gaussian Process

  • Distribution over functions:
    f(𝒙) ~ GP(m(𝒙), k(𝒙, 𝒙′))   βˆ€ 𝒙, 𝒙′
  • where m(𝒙) = E[f(𝒙)] is the mean function
    and k(𝒙, 𝒙′) = E[(f(𝒙) − m(𝒙))(f(𝒙′) − m(𝒙′))] is the kernel (covariance) function

CS480/680 Spring 2019 Pascal Poupart 6 University of Waterloo
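In practice a GP is used through its finite-dimensional marginals: at any finite set of inputs, the function values are jointly Gaussian with mean m(x_i) and covariance k(x_i, x_j). A short sketch (the zero mean and squared-exponential kernel are assumptions for illustration):

```python
import numpy as np

def m(x):
    """Mean function m(x) = E[f(x)] (zero here, a common choice)."""
    return 0.0

def k(x, xp, sigma=1.0):
    """Gaussian (squared-exponential) kernel k(x, x') = exp(-(x - x')^2 / (2 sigma^2))."""
    return np.exp(-(x - xp) ** 2 / (2.0 * sigma ** 2))

# Finite marginal of the GP at a set of inputs: f(X) ~ N(mean_vec, K)
X = np.linspace(0.0, 5.0, 6)
mean_vec = np.array([m(x) for x in X])
K = np.array([[k(xi, xj) for xj in X] for xi in X])

rng = np.random.default_rng(1)
f_sample = rng.multivariate_normal(mean_vec, K)   # one function evaluated at X
print(f_sample)
```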

SLIDE 7

Mean function m(𝒙)

  • Compute the mean function m(𝒙) as follows:
  • Let f(𝒙) = φ(𝒙)ᡀ𝒘 with 𝒘 ~ N(𝟎, α⁻¹𝑰)
  • Then m(𝒙) = E[f(𝒙)] = E[𝒘]ᡀφ(𝒙) = 0

CS480/680 Spring 2019 Pascal Poupart 7 University of Waterloo

SLIDE 8

Kernel covariance function k(𝒙, 𝒙′)

  • Compute the kernel covariance k(𝒙, 𝒙′) as follows:
  • k(𝒙, 𝒙′) = E[f(𝒙) f(𝒙′)]
             = φ(𝒙)ᡀ E[𝒘𝒘ᡀ] φ(𝒙′)
             = φ(𝒙)ᡀ (α⁻¹𝑰) φ(𝒙′)
             = φ(𝒙)ᡀφ(𝒙′) / α
  • In some cases we can use domain knowledge to specify k directly.

CS480/680 Spring 2019 Pascal Poupart 8 University of Waterloo
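A quick Monte Carlo sanity check of the identity k(𝒙, 𝒙′) = φ(𝒙)ᡀφ(𝒙′)/α (the basis functions, α, and evaluation points below are made up for illustration): under 𝒘 ~ N(𝟎, α⁻¹𝑰), the empirical covariance of f(𝒙) = φ(𝒙)ᡀ𝒘 agrees with the kernel value.

```python
import numpy as np

rng = np.random.default_rng(2)

def phi(x):
    # assumed basis: [1, x, sin(x)]
    return np.array([1.0, x, np.sin(x)])

alpha = 4.0
x, xp = 0.5, 1.5

W = rng.normal(0.0, np.sqrt(1.0 / alpha), size=(200000, 3))
f_x, f_xp = W @ phi(x), W @ phi(xp)

print(np.mean(f_x * f_xp))            # empirical E[f(x) f(x')]
print(phi(x) @ phi(xp) / alpha)       # kernel k(x, x') = phi(x)^T phi(x') / alpha
```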

SLIDE 9

Examples

  • Sampled functions from a Gaussian Process

  Gaussian kernel: k(𝒙, 𝒙′) = exp(−‖𝒙 − 𝒙′‖² / (2σ²))
  Exponential kernel (Brownian motion): k(𝒙, 𝒙′) = exp(−θ|𝒙 − 𝒙′|)

CS480/680 Spring 2019 Pascal Poupart 9 University of Waterloo
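The sampled functions on this slide can be reproduced by drawing from N(0, K) on a dense grid, where K is built from either kernel (the grid, length scales, and jitter below are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.linspace(0.0, 1.0, 200)

def gaussian_kernel(X, sigma=0.1):
    d = X[:, None] - X[None, :]
    return np.exp(-d ** 2 / (2.0 * sigma ** 2))

def exponential_kernel(X, theta=1.0):
    # exponential kernel (related to Brownian motion): rough, non-smooth samples
    d = np.abs(X[:, None] - X[None, :])
    return np.exp(-theta * d)

for K in (gaussian_kernel(X), exponential_kernel(X)):
    K = K + 1e-8 * np.eye(len(X))                               # jitter for numerical stability
    f = rng.multivariate_normal(np.zeros(len(X)), K, size=3)    # 3 sample functions
    print(f.shape)                                              # (3, 200); plot each row against X
```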

SLIDE 10

Gaussian Process Regression

  • Gaussian Process Regression corresponds to

kernelized Bayesian Linear Regression

  • Bayesian Linear Regression:
    – Weight space view
    – Goal: Pr(𝒘|𝑿, 𝒚) (posterior over 𝒘)
    – Complexity: cubic in # of basis functions

  • Gaussian Process Regression:
    – Function space view
    – Goal: Pr(f|𝑿, 𝒚) (posterior over f)
    – Complexity: cubic in # of training points

CS480/680 Spring 2019 Pascal Poupart 10 University of Waterloo

SLIDE 11

Recap: Bayesian Linear Regression

  • Prior: Pr(𝒘) = N(𝟎, 𝚺)
  • Likelihood: Pr(𝒚|𝑿, 𝒘) = N(π’˜α΅€πš½, σ²𝑰)
  • Posterior: Pr(𝒘|𝑿, 𝒚) = N(𝒘̄, 𝑨⁻¹)
    where 𝒘̄ = Οƒβ»Β²π‘¨β»ΒΉπš½π’š and 𝑨 = Οƒβ»Β²πš½πš½α΅€ + 𝚺⁻¹
  • Prediction:
    Pr(y*|𝒙*, 𝑿, 𝒚) = N(σ⁻²φ(𝒙*)α΅€π‘¨β»ΒΉπš½π’š, σ² + φ(𝒙*)ᡀ𝑨⁻¹φ(𝒙*))
  • Complexity: inversion of 𝑨 is cubic in the # of basis functions

CS480/680 Spring 2019 Pascal Poupart 11 University of Waterloo
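These formulas map directly onto a few lines of numpy. The sketch below (with a made-up basis, toy data, and hyperparameters) computes 𝑨, the posterior mean 𝒘̄, and the predictive mean and variance at a new input:

```python
import numpy as np

rng = np.random.default_rng(4)

def phi(x):
    return np.array([1.0, x, x**2])            # assumed basis functions

# toy data (made up)
X = np.array([-1.0, -0.3, 0.4, 1.2, 2.0])
y = 0.5 * X**2 - X + rng.normal(0.0, 0.1, size=X.shape)

sigma2 = 0.1**2                                # noise variance sigma^2
Sigma_prior = np.eye(3)                        # prior covariance Sigma
Phi = np.stack([phi(x) for x in X], axis=1)    # M x N matrix of basis functions

# Posterior: Pr(w | X, y) = N(w_bar, A^-1)
A = Phi @ Phi.T / sigma2 + np.linalg.inv(Sigma_prior)
A_inv = np.linalg.inv(A)
w_bar = A_inv @ Phi @ y / sigma2

# Prediction at x*: N(sigma^-2 phi(x*)^T A^-1 Phi y,  sigma^2 + phi(x*)^T A^-1 phi(x*))
x_star = 1.0
p = phi(x_star)
pred_mean = p @ A_inv @ Phi @ y / sigma2
pred_var = sigma2 + p @ A_inv @ p
print(w_bar, pred_mean, pred_var)
```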

SLIDE 12

Gaussian Process Regression

  • Prior: Pr(f(β‹…)) = N(m(β‹…), k(β‹…,β‹…))
  • Likelihood: Pr(𝒚|𝑿, f) = N(f(𝑿), σ²𝑰)
  • Posterior: Pr(f(β‹…)|𝑿, 𝒚) = N(f̄(β‹…), k′(β‹…,β‹…))
    where f̄(β‹…) = k(β‹…, 𝑿)(𝑲 + σ²𝑰)β»ΒΉπ’š
    and k′(β‹…,β‹…) = k(β‹…,β‹…) + σ²𝑰 − k(β‹…, 𝑿)(𝑲 + σ²𝑰)⁻¹k(𝑿, β‹…)
  • Prediction: Pr(y*|𝒙*, 𝑿, 𝒚) = N(f̄(𝒙*), k′(𝒙*, 𝒙*))
  • Complexity: inversion of 𝑲 + σ²𝑰 is cubic in the # of training points

CS480/680 Spring 2019 Pascal Poupart 12 University of Waterloo
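The same prediction written against the Gram matrix, as a minimal sketch with a toy kernel and data set (all values assumed for illustration): the posterior mean is k(𝒙*, 𝑿)(𝑲 + σ²𝑰)β»ΒΉπ’š, and the cost is dominated by inverting the N x N matrix 𝑲 + σ²𝑰.

```python
import numpy as np

rng = np.random.default_rng(5)

def k(a, b, ell=0.5):
    """Gaussian kernel k(x, x') = exp(-(x - x')^2 / (2 ell^2))."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * ell ** 2))

# toy data (made up)
X = np.array([-1.0, -0.3, 0.4, 1.2, 2.0])
y = np.sin(2.0 * X) + rng.normal(0.0, 0.1, size=X.shape)
sigma2 = 0.1**2

K = k(X, X)                                        # N x N Gram matrix K = k(X, X)
M = np.linalg.inv(K + sigma2 * np.eye(len(X)))     # cubic in the number of training points

X_star = np.linspace(-1.5, 2.5, 5)
K_star = k(X_star, X)                              # k(x*, X)

post_mean = K_star @ M @ y                                                     # f_bar(x*)
post_cov = k(X_star, X_star) + sigma2 * np.eye(len(X_star)) - K_star @ M @ K_star.T
print(post_mean)
print(np.diag(post_cov))                           # predictive variances at x*
```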

SLIDE 13

Infinite Neural Networks

  • Recall: neural networks with a single hidden layer

(that contains sufficiently many hidden units) can approximate any function arbitrarily closely

  • Neal 94: The limit of an infinite single hidden layer

neural network is a Gaussian Process

CS480/680 Spring 2019 Pascal Poupart 13 University of Waterloo

SLIDE 14

Bayesian Neural Networks

  • Consider a neural network with K hidden units and a single identity output unit y₁:
    y₁ = f(𝒙; 𝒘) = βˆ‘_{k=1}^K w_{1k} h(βˆ‘_j w_{kj} x_j + w_{k0}) + w_{10}
  • Bayesian learning: express a prior over the weights
    – Weight space view: Pr(w_{1k}) with E[w_{1k}] = 0 and Var(w_{1k}) = α/K βˆ€k; Pr(w_{kj}) with E[w_{kj}] = 0 and Var(w_{kj}) = σ² βˆ€k, j
    – Function space view: when K β†’ ∞, by the central limit theorem, an infinite sum of i.i.d. (independent and identically distributed) variables yields a Gaussian:
      Pr(f(𝒙)) = N(f(𝒙) | 0, α E[h(𝒙) h(𝒙′)] + σ²)

CS480/680 Spring 2019 Pascal Poupart 14 University of Waterloo
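An empirical check of the function space claim (the activation, widths, and weight variances below are assumptions, not values from the slides): for a wide single-hidden-layer network with i.i.d. zero-mean weights, the covariance of the outputs across random draws of the weights approaches α E[h(𝒙) h(𝒙′)] + σ², where σ² is the variance of the output bias w_{10}.

```python
import numpy as np

rng = np.random.default_rng(6)

K = 2000                 # hidden units (large so the CLT kicks in)
alpha = 1.0              # Var(w_1k) = alpha / K for the output weights (assumed)
sigma2_b = 0.5           # Var(w_10), variance of the output bias (assumed)
sigma2_w = 1.0           # variance of input-to-hidden weights and biases (assumed)

x, xp = 0.3, -0.8        # two scalar inputs
n_nets = 5000            # number of independent networks (draws of all weights)

def hidden(xs, w_in, b_in):
    """Hidden-unit activations h(x) = tanh(w x + b) for each of the K units."""
    return np.tanh(np.outer(xs, w_in) + b_in)    # shape (len(xs), K)

f_x, f_xp = np.empty(n_nets), np.empty(n_nets)
for i in range(n_nets):
    w_in = rng.normal(0.0, np.sqrt(sigma2_w), K)
    b_in = rng.normal(0.0, np.sqrt(sigma2_w), K)
    w_out = rng.normal(0.0, np.sqrt(alpha / K), K)
    b_out = rng.normal(0.0, np.sqrt(sigma2_b))
    h = hidden(np.array([x, xp]), w_in, b_in)
    f_x[i], f_xp[i] = h @ w_out + b_out

# Empirical covariance of the network outputs vs. alpha E[h(x) h(x')] + Var(w_10)
w_in = rng.normal(0.0, np.sqrt(sigma2_w), 200000)
b_in = rng.normal(0.0, np.sqrt(sigma2_w), 200000)
Ehh = np.mean(np.tanh(w_in * x + b_in) * np.tanh(w_in * xp + b_in))
print(np.mean(f_x * f_xp), alpha * Ehh + sigma2_b)
```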

SLIDE 15

Mean Derivation

  • Calculation of the mean function:
    E[f(𝒙)] = βˆ‘_{k=1}^K E[w_{1k} h(𝒙)] + E[w_{10}]
            = βˆ‘_{k=1}^K E[w_{1k}] E[h(𝒙)] + E[w_{10}]
            = βˆ‘_{k=1}^K 0 · E[h(𝒙)] + 0
            = 0

CS480/680 Spring 2019 Pascal Poupart 15 University of Waterloo

SLIDE 16

Covariance Derivation

  • Cov(f(𝒙), f(𝒙′))
    = E[f(𝒙) f(𝒙′)] − E[f(𝒙)] E[f(𝒙′)]
    = E[f(𝒙) f(𝒙′)]
    = E[(βˆ‘_k w_{1k} h_k(𝒙) + w_{10})(βˆ‘_k w_{1k} h_k(𝒙′) + w_{10})]
    = βˆ‘_{k=1}^K E[w_{1k} h_k(𝒙) w_{1k} h_k(𝒙′)] + E[w_{10} w_{10}]
    = βˆ‘_{k=1}^K E[w_{1k}²] E[h_k(𝒙) h_k(𝒙′)] + E[w_{10}²]
    = βˆ‘_{k=1}^K Var(w_{1k}) E[h(𝒙) h(𝒙′)] + Var(w_{10})
    = βˆ‘_{k=1}^K (α/K) E[h(𝒙) h(𝒙′)] + σ²
    = α E[h(𝒙) h(𝒙′)] + σ²

CS480/680 Spring 2019 Pascal Poupart 16 University of Waterloo

SLIDE 17

Bayesian Neural Networks

  • When the # of hidden units K β†’ ∞, the Bayesian neural net is equivalent to a Gaussian Process:
    Pr(f(β‹…)) = GP(f(β‹…) | 0, α E[h(β‹…) h(⋅′)] + σ²)
  • Note: this works for
    – Any activation function h
    – Any i.i.d. prior over the weights with mean 0

CS480/680 Spring 2019 Pascal Poupart 17 University of Waterloo

SLIDE 18

Case Study: AIBO Gait Optimization

CS480/680 Spring 2019 Pascal Poupart 18 University of Waterloo

SLIDE 19

Gait Optimization

  • Problem: find best parameter setting of the gait

controller to maximize walking speed

– Why?: Fast robots have a better chance of winning in robotic soccer

  • Solutions:
    – Stochastic hill climbing
    – Gaussian Processes

  • Lizotte, Wang, Bowling, Schuurmans (2007). Automatic Gait Optimization with Gaussian Processes. International Joint Conference on Artificial Intelligence (IJCAI).

CS480/680 Spring 2019 Pascal Poupart 19 University of Waterloo

SLIDE 20

Search Problem

  • Let 𝒙 ∈ ℝ¹⁡ be a vector of 15 parameters that defines a controller for the gait
  • Let f: 𝒙 β†’ ℝ be a mapping from controller parameters to gait speed
  • Problem: find the parameters 𝒙* that yield the highest speed:
    𝒙* ← argmax_𝒙 f(𝒙)
    But f is unknown…

CS480/680 Spring 2019 Pascal Poupart 20 University of Waterloo

SLIDE 21

Approach

  • Picture

CS480/680 Spring 2019 Pascal Poupart 21 University of Waterloo

SLIDE 22

Approach

  • Initialize f(β‹…) ~ GP(m(β‹…), k(β‹…,β‹…))
  • Repeat:
    – Select new 𝒙: 𝒙_new ← argmax_𝒙 [m(𝒙) − max_{𝒙′∈𝑿} f(𝒙′)] / σ(𝒙), where σ(𝒙) = √k(𝒙, 𝒙)

    – Evaluate f(𝒙_new) by observing the speed of the robot with its parameters set to 𝒙_new
    – Update the Gaussian process:
      • 𝑿 ← 𝑿 βˆͺ {𝒙_new} and 𝒚 ← 𝒚 βˆͺ {f(𝒙_new)}
      • m(β‹…) ← k(β‹…, 𝑿)(𝑲 + σ²𝑰)β»ΒΉπ’š
      • k(β‹…,β‹…) ← k(β‹…,β‹…) + σ²𝑰 − k(β‹…, 𝑿)(𝑲 + σ²𝑰)⁻¹k(𝑿, β‹…)

CS480/680 Spring 2019 Pascal Poupart 22 University of Waterloo
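A sketch of this loop in numpy, on a 1-D stand-in for the 15-D controller space (the kernel, the simulated speed function, and the acquisition details are assumptions, not the actual AIBO setup): after each evaluation the GP posterior mean and variance are recomputed, and the next parameter vector is the candidate whose posterior most favours improving on the best speed observed so far.

```python
import numpy as np

rng = np.random.default_rng(7)

def k(A, B, ell=0.3):
    d = A[:, None] - B[None, :]
    return np.exp(-d ** 2 / (2.0 * ell ** 2))

def speed(x):
    """Hypothetical stand-in for the robot evaluation (unknown to the optimizer)."""
    return np.exp(-8.0 * (x - 0.62) ** 2) + rng.normal(0.0, 0.01)

sigma2 = 0.01 ** 2
candidates = np.linspace(0.0, 1.0, 200)      # 1-D grid instead of the 15-D controller space
X = list(rng.uniform(0.0, 1.0, 2))           # a couple of random initial evaluations
y = [speed(x) for x in X]

for _ in range(15):
    Xa, ya = np.array(X), np.array(y)
    M = np.linalg.inv(k(Xa, Xa) + sigma2 * np.eye(len(Xa)))
    Ks = k(candidates, Xa)
    mean = Ks @ M @ ya                                              # posterior mean m(x)
    var = np.clip(1.0 + sigma2 - np.sum(Ks @ M * Ks, axis=1), 1e-12, None)
    score = (mean - ya.max()) / np.sqrt(var)    # improvement over best observed, in std units
    x_new = candidates[np.argmax(score)]
    X.append(x_new)
    y.append(speed(x_new))

print(max(y), X[int(np.argmax(y))])
```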

SLIDE 23

Results

Gaussian kernel: k(𝒙, 𝒙′) = σ² exp(−½ (𝒙 − 𝒙′)ᡀ(𝒙 − 𝒙′))

CS480/680 Spring 2019 Pascal Poupart 23 University of Waterloo