CS480/680 Lecture 12 (June 17, 2019): Gaussian Processes


SLIDE 1

CS480/680 Lecture 12: June 17, 2019

Gaussian Processes
[B] Section 6.4, [M] Chap. 15, [HTF] Sec. 8.3

CS480/680 Spring 2019 Pascal Poupart 1 University of Waterloo

SLIDE 2

Gaussian Process Regression

  • Idea: distribution over functions

CS480/680 Spring 2019 Pascal Poupart 2 University of Waterloo

SLIDE 3

Bayesian Linear Regression

  • Setting: f(𝒙) = 𝒘ᡀφ(𝒙) and y = f(𝒙) + ε
  • Weight space view:
    – Prior: Pr(𝒘)
    – Posterior: Pr(𝒘|𝑿, 𝒚) ∝ Pr(𝒘) Pr(𝒚|𝒘, 𝑿)

  (Annotations: 𝒘 is unknown; ε ~ N(0, σ²); the prior, likelihood and posterior are all Gaussian.)

CS480/680 Spring 2019 Pascal Poupart 3 University of Waterloo

SLIDE 4

Bayesian Linear Regression

  • Setting: f(𝒙) = 𝒘ᡀφ(𝒙) and y = f(𝒙) + ε
  • Function space view:
    – Prior: Pr(f(𝒙*)) = ∫ Pr(f | 𝒘, 𝒙*) Pr(𝒘) d𝒘
    – Posterior: Pr(f(𝒙*) | 𝑿, 𝒚) = ∫ Pr(f | 𝒘, 𝒙*) Pr(𝒘 | 𝑿, 𝒚) d𝒘

  (Annotations: 𝒘 is unknown; ε ~ N(0, σ²); Pr(𝒘) and Pr(𝒘|𝑿, 𝒚) are Gaussian; Pr(f | 𝒘, 𝒙*) is deterministic; the resulting prior and posterior over f(𝒙*) are Gaussian.)

CS480/680 Spring 2019 Pascal Poupart 4 University of Waterloo
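To make the function space view concrete, here is a minimal numerical sketch (the polynomial basis, prior precision α, and query point are made-up illustrations, not from the slides): sampling weights 𝒘 from the prior and pushing them through f(𝒙) = 𝒘ᡀφ(𝒙) induces a Gaussian over the function value at any query point 𝒙*, with variance φ(𝒙*)ᡀφ(𝒙*)/α.

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(x):
    """Polynomial basis functions (assumed for illustration)."""
    return np.array([1.0, x, x**2])

alpha = 2.0                      # assumed prior precision: w ~ N(0, alpha^-1 I)
n_samples = 100000
W = rng.normal(0.0, np.sqrt(1.0 / alpha), size=(n_samples, 3))

x_star = 0.7
f_star = W @ phi(x_star)         # samples from Pr(f(x*)) = ∫ Pr(f | w, x*) Pr(w) dw

# The induced distribution is Gaussian with mean 0 and variance phi(x*)^T phi(x*) / alpha
print(f_star.mean(), f_star.var())
print(phi(x_star) @ phi(x_star) / alpha)
```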

SLIDE 5

Gaussian Process

  • According to the function space view, there is a Gaussian over f(𝒙*) for every 𝒙*. Those Gaussians are correlated through 𝒘.
  • What is the general form of Pr(f) (i.e., a distribution over functions)?
  • Answer: a Gaussian Process (an infinite-dimensional Gaussian distribution)

CS480/680 Spring 2019 Pascal Poupart 5 University of Waterloo

SLIDE 6

Gaussian Process

  • Distribution over functions:
    f(𝒙) ~ GP(m(𝒙), k(𝒙, 𝒙′))   βˆ€ 𝒙, 𝒙′
  • where m(𝒙) = E[f(𝒙)] is the mean function
    and k(𝒙, 𝒙′) = E[(f(𝒙) − m(𝒙))(f(𝒙′) − m(𝒙′))] is the kernel (covariance) function

CS480/680 Spring 2019 Pascal Poupart 6 University of Waterloo
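In practice a GP is used through its finite-dimensional marginals: at any finite set of inputs, the function values are jointly Gaussian with mean m(x_i) and covariance k(x_i, x_j). A short sketch (the zero mean and squared-exponential kernel are assumptions for illustration):

```python
import numpy as np

def m(x):
    """Mean function m(x) = E[f(x)] (zero here, a common choice)."""
    return 0.0

def k(x, xp, sigma=1.0):
    """Gaussian (squared-exponential) kernel k(x, x') = exp(-(x - x')^2 / (2 sigma^2))."""
    return np.exp(-(x - xp) ** 2 / (2.0 * sigma ** 2))

# Finite marginal of the GP at a set of inputs: f(X) ~ N(mean_vec, K)
X = np.linspace(0.0, 5.0, 6)
mean_vec = np.array([m(x) for x in X])
K = np.array([[k(xi, xj) for xj in X] for xi in X])

rng = np.random.default_rng(1)
f_sample = rng.multivariate_normal(mean_vec, K)   # one function evaluated at X
print(f_sample)
```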

SLIDE 7

Mean function m(𝒙)

  • Compute the mean function m(𝒙) as follows:
  • Let f(𝒙) = φ(𝒙)ᡀ𝒘 with 𝒘 ~ N(𝟎, α⁻¹𝑰)
  • Then m(𝒙) = E[f(𝒙)] = E[𝒘]ᡀφ(𝒙) = 0

CS480/680 Spring 2019 Pascal Poupart 7 University of Waterloo

SLIDE 8

Kernel covariance function k(𝒙, 𝒙′)

  • Compute the kernel covariance k(𝒙, 𝒙′) as follows:
  • k(𝒙, 𝒙′) = E[f(𝒙) f(𝒙′)]
             = φ(𝒙)ᡀ E[𝒘𝒘ᡀ] φ(𝒙′)
             = φ(𝒙)ᡀ (α⁻¹𝑰) φ(𝒙′)
             = φ(𝒙)ᡀφ(𝒙′) / α
  • In some cases we can use domain knowledge to specify k directly.

CS480/680 Spring 2019 Pascal Poupart 8 University of Waterloo
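A quick Monte Carlo sanity check of the identity k(𝒙, 𝒙′) = φ(𝒙)ᡀφ(𝒙′)/α (the basis functions, α, and evaluation points below are made up for illustration): under 𝒘 ~ N(𝟎, α⁻¹𝑰), the empirical covariance of f(𝒙) = φ(𝒙)ᡀ𝒘 agrees with the kernel value.

```python
import numpy as np

rng = np.random.default_rng(2)

def phi(x):
    # assumed basis: [1, x, sin(x)]
    return np.array([1.0, x, np.sin(x)])

alpha = 4.0
x, xp = 0.5, 1.5

W = rng.normal(0.0, np.sqrt(1.0 / alpha), size=(200000, 3))
f_x, f_xp = W @ phi(x), W @ phi(xp)

print(np.mean(f_x * f_xp))            # empirical E[f(x) f(x')]
print(phi(x) @ phi(xp) / alpha)       # kernel k(x, x') = phi(x)^T phi(x') / alpha
```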

SLIDE 9

Examples

  • Sampled functions from a Gaussian Process

  Gaussian kernel: k(𝒙, 𝒙′) = exp(−‖𝒙 − 𝒙′‖² / (2σ²))
  Exponential kernel (Brownian motion): k(𝒙, 𝒙′) = exp(−θ|𝒙 − 𝒙′|)

CS480/680 Spring 2019 Pascal Poupart 9 University of Waterloo
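The sampled functions on this slide can be reproduced by drawing from N(0, K) on a dense grid, where K is built from either kernel (the grid, length scales, and jitter below are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.linspace(0.0, 1.0, 200)

def gaussian_kernel(X, sigma=0.1):
    d = X[:, None] - X[None, :]
    return np.exp(-d ** 2 / (2.0 * sigma ** 2))

def exponential_kernel(X, theta=1.0):
    # exponential kernel (related to Brownian motion): rough, non-smooth samples
    d = np.abs(X[:, None] - X[None, :])
    return np.exp(-theta * d)

for K in (gaussian_kernel(X), exponential_kernel(X)):
    K = K + 1e-8 * np.eye(len(X))                               # jitter for numerical stability
    f = rng.multivariate_normal(np.zeros(len(X)), K, size=3)    # 3 sample functions
    print(f.shape)                                              # (3, 200); plot each row against X
```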

SLIDE 10

Gaussian Process Regression

  • Gaussian Process Regression corresponds to

kernelized Bayesian Linear Regression

  • Bayesian Linear Regression:
    – Weight space view
    – Goal: Pr(𝒘|𝑿, 𝒚) (posterior over 𝒘)
    – Complexity: cubic in # of basis functions

  • Gaussian Process Regression:
    – Function space view
    – Goal: Pr(f|𝑿, 𝒚) (posterior over f)
    – Complexity: cubic in # of training points

CS480/680 Spring 2019 Pascal Poupart 10 University of Waterloo

SLIDE 11

Recap: Bayesian Linear Regression

  • Prior: Pr(𝒘) = N(𝟎, 𝚺)
  • Likelihood: Pr(𝒚|𝑿, 𝒘) = N(π’˜α΅€πš½, σ²𝑰)
  • Posterior: Pr(𝒘|𝑿, 𝒚) = N(𝒘̄, 𝑨⁻¹)
    where 𝒘̄ = Οƒβ»Β²π‘¨β»ΒΉπš½π’š and 𝑨 = Οƒβ»Β²πš½πš½α΅€ + 𝚺⁻¹
  • Prediction:
    Pr(y*|𝒙*, 𝑿, 𝒚) = N(σ⁻²φ(𝒙*)α΅€π‘¨β»ΒΉπš½π’š, σ² + φ(𝒙*)ᡀ𝑨⁻¹φ(𝒙*))
  • Complexity: inversion of 𝑨 is cubic in the # of basis functions

CS480/680 Spring 2019 Pascal Poupart 11 University of Waterloo
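These formulas map directly onto a few lines of numpy. The sketch below (with a made-up basis, toy data, and hyperparameters) computes 𝑨, the posterior mean 𝒘̄, and the predictive mean and variance at a new input:

```python
import numpy as np

rng = np.random.default_rng(4)

def phi(x):
    return np.array([1.0, x, x**2])            # assumed basis functions

# toy data (made up)
X = np.array([-1.0, -0.3, 0.4, 1.2, 2.0])
y = 0.5 * X**2 - X + rng.normal(0.0, 0.1, size=X.shape)

sigma2 = 0.1**2                                # noise variance sigma^2
Sigma_prior = np.eye(3)                        # prior covariance Sigma
Phi = np.stack([phi(x) for x in X], axis=1)    # M x N matrix of basis functions

# Posterior: Pr(w | X, y) = N(w_bar, A^-1)
A = Phi @ Phi.T / sigma2 + np.linalg.inv(Sigma_prior)
A_inv = np.linalg.inv(A)
w_bar = A_inv @ Phi @ y / sigma2

# Prediction at x*: N(sigma^-2 phi(x*)^T A^-1 Phi y,  sigma^2 + phi(x*)^T A^-1 phi(x*))
x_star = 1.0
p = phi(x_star)
pred_mean = p @ A_inv @ Phi @ y / sigma2
pred_var = sigma2 + p @ A_inv @ p
print(w_bar, pred_mean, pred_var)
```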

SLIDE 12

Gaussian Process Regression

  • Prior: Pr(f(β‹…)) = N(m(β‹…), k(β‹…,β‹…))
  • Likelihood: Pr(𝒚|𝑿, f) = N(f(𝑿), σ²𝑰)
  • Posterior: Pr(f(β‹…)|𝑿, 𝒚) = N(f̄(β‹…), k′(β‹…,β‹…))
    where f̄(β‹…) = k(β‹…, 𝑿)(𝑲 + σ²𝑰)β»ΒΉπ’š
    and k′(β‹…,β‹…) = k(β‹…,β‹…) + σ²𝑰 − k(β‹…, 𝑿)(𝑲 + σ²𝑰)⁻¹k(𝑿, β‹…)
  • Prediction: Pr(y*|𝒙*, 𝑿, 𝒚) = N(f̄(𝒙*), k′(𝒙*, 𝒙*))
  • Complexity: inversion of 𝑲 + σ²𝑰 is cubic in the # of training points

CS480/680 Spring 2019 Pascal Poupart 12 University of Waterloo
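The same prediction written against the Gram matrix, as a minimal sketch with a toy kernel and data set (all values assumed for illustration): the posterior mean is k(𝒙*, 𝑿)(𝑲 + σ²𝑰)β»ΒΉπ’š, and the cost is dominated by inverting the N x N matrix 𝑲 + σ²𝑰.

```python
import numpy as np

rng = np.random.default_rng(5)

def k(a, b, ell=0.5):
    """Gaussian kernel k(x, x') = exp(-(x - x')^2 / (2 ell^2))."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * ell ** 2))

# toy data (made up)
X = np.array([-1.0, -0.3, 0.4, 1.2, 2.0])
y = np.sin(2.0 * X) + rng.normal(0.0, 0.1, size=X.shape)
sigma2 = 0.1**2

K = k(X, X)                                        # N x N Gram matrix K = k(X, X)
M = np.linalg.inv(K + sigma2 * np.eye(len(X)))     # cubic in the number of training points

X_star = np.linspace(-1.5, 2.5, 5)
K_star = k(X_star, X)                              # k(x*, X)

post_mean = K_star @ M @ y                                                     # f_bar(x*)
post_cov = k(X_star, X_star) + sigma2 * np.eye(len(X_star)) - K_star @ M @ K_star.T
print(post_mean)
print(np.diag(post_cov))                           # predictive variances at x*
```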

SLIDE 13

Infinite Neural Networks

  • Recall: neural networks with a single hidden layer

(that contains sufficiently many hidden units) can approximate any function arbitrarily closely

  • Neal 94: The limit of an infinite single hidden layer

neural network is a Gaussian Process

CS480/680 Spring 2019 Pascal Poupart 13 University of Waterloo

SLIDE 14

Bayesian Neural Networks

  • Consider a neural network with K hidden units and a single identity output unit y₁:
    y₁ = f(𝒙; 𝒘) = βˆ‘_{k=1}^K w_{1k} h(βˆ‘_j w_{kj} x_j + w_{k0}) + w_{10}
  • Bayesian learning: express a prior over the weights
    – Weight space view: Pr(w_{1k}) with E[w_{1k}] = 0 and Var(w_{1k}) = α/K βˆ€k; Pr(w_{kj}) with E[w_{kj}] = 0 and Var(w_{kj}) = σ² βˆ€k, j
    – Function space view: when K β†’ ∞, by the central limit theorem, an infinite sum of i.i.d. (independent and identically distributed) variables yields a Gaussian:
      Pr(f(𝒙)) = N(f(𝒙) | 0, α E[h(𝒙) h(𝒙′)] + σ²)

CS480/680 Spring 2019 Pascal Poupart 14 University of Waterloo
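An empirical check of the function space claim (the activation, widths, and weight variances below are assumptions, not values from the slides): for a wide single-hidden-layer network with i.i.d. zero-mean weights, the covariance of the outputs across random draws of the weights approaches α E[h(𝒙) h(𝒙′)] + σ², where σ² is the variance of the output bias w_{10}.

```python
import numpy as np

rng = np.random.default_rng(6)

K = 2000                 # hidden units (large so the CLT kicks in)
alpha = 1.0              # Var(w_1k) = alpha / K for the output weights (assumed)
sigma2_b = 0.5           # Var(w_10), variance of the output bias (assumed)
sigma2_w = 1.0           # variance of input-to-hidden weights and biases (assumed)

x, xp = 0.3, -0.8        # two scalar inputs
n_nets = 5000            # number of independent networks (draws of all weights)

def hidden(xs, w_in, b_in):
    """Hidden-unit activations h(x) = tanh(w x + b) for each of the K units."""
    return np.tanh(np.outer(xs, w_in) + b_in)    # shape (len(xs), K)

f_x, f_xp = np.empty(n_nets), np.empty(n_nets)
for i in range(n_nets):
    w_in = rng.normal(0.0, np.sqrt(sigma2_w), K)
    b_in = rng.normal(0.0, np.sqrt(sigma2_w), K)
    w_out = rng.normal(0.0, np.sqrt(alpha / K), K)
    b_out = rng.normal(0.0, np.sqrt(sigma2_b))
    h = hidden(np.array([x, xp]), w_in, b_in)
    f_x[i], f_xp[i] = h @ w_out + b_out

# Empirical covariance of the network outputs vs. alpha E[h(x) h(x')] + Var(w_10)
w_in = rng.normal(0.0, np.sqrt(sigma2_w), 200000)
b_in = rng.normal(0.0, np.sqrt(sigma2_w), 200000)
Ehh = np.mean(np.tanh(w_in * x + b_in) * np.tanh(w_in * xp + b_in))
print(np.mean(f_x * f_xp), alpha * Ehh + sigma2_b)
```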

SLIDE 15

Mean Derivation

  • Calculation of the mean function:
    E[f(𝒙)] = βˆ‘_{k=1}^K E[w_{1k} h(𝒙)] + E[w_{10}]
            = βˆ‘_{k=1}^K E[w_{1k}] E[h(𝒙)] + E[w_{10}]
            = βˆ‘_{k=1}^K 0 · E[h(𝒙)] + 0
            = 0

CS480/680 Spring 2019 Pascal Poupart 15 University of Waterloo

SLIDE 16

Covariance Derivation

  • Cov(f(𝒙), f(𝒙′))
    = E[f(𝒙) f(𝒙′)] − E[f(𝒙)] E[f(𝒙′)]
    = E[f(𝒙) f(𝒙′)]
    = E[(βˆ‘_k w_{1k} h_k(𝒙) + w_{10})(βˆ‘_k w_{1k} h_k(𝒙′) + w_{10})]
    = βˆ‘_{k=1}^K E[w_{1k} h_k(𝒙) w_{1k} h_k(𝒙′)] + E[w_{10} w_{10}]
    = βˆ‘_{k=1}^K E[w_{1k}²] E[h_k(𝒙) h_k(𝒙′)] + E[w_{10}²]
    = βˆ‘_{k=1}^K Var(w_{1k}) E[h(𝒙) h(𝒙′)] + Var(w_{10})
    = βˆ‘_{k=1}^K (α/K) E[h(𝒙) h(𝒙′)] + σ²
    = α E[h(𝒙) h(𝒙′)] + σ²

CS480/680 Spring 2019 Pascal Poupart 16 University of Waterloo

SLIDE 17

Bayesian Neural Networks

  • When the # of hidden units K β†’ ∞, the Bayesian neural net is equivalent to a Gaussian Process:
    Pr(f(β‹…)) = GP(f(β‹…) | 0, α E[h(β‹…) h(⋅′)] + σ²)
  • Note: this works for
    – Any activation function h
    – Any i.i.d. prior over the weights with mean 0

CS480/680 Spring 2019 Pascal Poupart 17 University of Waterloo

SLIDE 18

Case Study: AIBO Gait Optimization

CS480/680 Spring 2019 Pascal Poupart 18 University of Waterloo

SLIDE 19

Gait Optimization

  • Problem: find best parameter setting of the gait

controller to maximize walking speed

– Why?: Fast robots have a better chance of winning in robotic soccer

  • Solutions:
    – Stochastic hill climbing
    – Gaussian Processes

  • Lizotte, Wang, Bowling, Schuurmans (2007). Automatic Gait Optimization with Gaussian Processes. International Joint Conference on Artificial Intelligence (IJCAI).

CS480/680 Spring 2019 Pascal Poupart 19 University of Waterloo

SLIDE 20

Search Problem

  • Let 𝒙 ∈ ℝ¹⁡ be a vector of 15 parameters that defines a controller for the gait
  • Let f: 𝒙 β†’ ℝ be a mapping from controller parameters to gait speed
  • Problem: find the parameters 𝒙* that yield the highest speed:
    𝒙* ← argmax_𝒙 f(𝒙)
    But f is unknown…

CS480/680 Spring 2019 Pascal Poupart 20 University of Waterloo

SLIDE 21

Approach

  • Picture

CS480/680 Spring 2019 Pascal Poupart 21 University of Waterloo

SLIDE 22

Approach

  • Initialize f(β‹…) ~ GP(m(β‹…), k(β‹…,β‹…))
  • Repeat:
    – Select new 𝒙: 𝒙_new ← argmax_𝒙 [m(𝒙) − max_{𝒙′∈𝑿} f(𝒙′)] / σ(𝒙), where σ(𝒙) = √k(𝒙, 𝒙)

    – Evaluate f(𝒙_new) by observing the speed of the robot with its parameters set to 𝒙_new
    – Update the Gaussian process:
      • 𝑿 ← 𝑿 βˆͺ {𝒙_new} and 𝒚 ← 𝒚 βˆͺ {f(𝒙_new)}
      • m(β‹…) ← k(β‹…, 𝑿)(𝑲 + σ²𝑰)β»ΒΉπ’š
      • k(β‹…,β‹…) ← k(β‹…,β‹…) + σ²𝑰 − k(β‹…, 𝑿)(𝑲 + σ²𝑰)⁻¹k(𝑿, β‹…)

CS480/680 Spring 2019 Pascal Poupart 22 University of Waterloo
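A sketch of this loop in numpy, on a 1-D stand-in for the 15-D controller space (the kernel, the simulated speed function, and the acquisition details are assumptions, not the actual AIBO setup): after each evaluation the GP posterior mean and variance are recomputed, and the next parameter vector is the candidate whose posterior most favours improving on the best speed observed so far.

```python
import numpy as np

rng = np.random.default_rng(7)

def k(A, B, ell=0.3):
    d = A[:, None] - B[None, :]
    return np.exp(-d ** 2 / (2.0 * ell ** 2))

def speed(x):
    """Hypothetical stand-in for the robot evaluation (unknown to the optimizer)."""
    return np.exp(-8.0 * (x - 0.62) ** 2) + rng.normal(0.0, 0.01)

sigma2 = 0.01 ** 2
candidates = np.linspace(0.0, 1.0, 200)      # 1-D grid instead of the 15-D controller space
X = list(rng.uniform(0.0, 1.0, 2))           # a couple of random initial evaluations
y = [speed(x) for x in X]

for _ in range(15):
    Xa, ya = np.array(X), np.array(y)
    M = np.linalg.inv(k(Xa, Xa) + sigma2 * np.eye(len(Xa)))
    Ks = k(candidates, Xa)
    mean = Ks @ M @ ya                                              # posterior mean m(x)
    var = np.clip(1.0 + sigma2 - np.sum(Ks @ M * Ks, axis=1), 1e-12, None)
    score = (mean - ya.max()) / np.sqrt(var)    # improvement over best observed, in std units
    x_new = candidates[np.argmax(score)]
    X.append(x_new)
    y.append(speed(x_new))

print(max(y), X[int(np.argmax(y))])
```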

SLIDE 23

Results

Gaussian kernel: k(𝒙, 𝒙′) = σ² exp(−½ (𝒙 − 𝒙′)ᡀ(𝒙 − 𝒙′))

CS480/680 Spring 2019 Pascal Poupart 23 University of Waterloo