

SLIDE 1

Active Learning and

Optimized Information Gathering

Lecture 6 – Gaussian Process Optimization

CS 101.2 Andreas Krause

SLIDE 2

Announcements

Homework 1: out tomorrow

Due Thu Jan 29

Project

Proposal due Tue Jan 27

Office hours

Come to office hours before your presentation!
Andreas: Friday 1:30-3pm, 260 Jorgensen
Ryan: Wednesday 4:00-6:00pm, 109 Moore

SLIDE 3

Course outline

1. Online decision making
2. Statistical active learning
3. Combinatorial approaches

SLIDE 4

Recap Bandit problems

K arms with mean payoffs p1, p2, …, pK
εn-greedy and UCB1 have regret O(K log T)
What about infinitely many arms (K = ∞)?
Have to make assumptions!

SLIDE 5

Bandits = Noisy function optimization

We are given black box access to a function f
f(x) = mean payoff for arm x
Evaluating f is very expensive
Want to (quickly) find x* = argmaxx f(x)
Observations: y = f(x) + noise

SLIDE 6

Bandits with ∞-many arms

Can only hope to perform well if we make some assumptions:
Linear: f(x) = wᵀx
Lipschitz-continuous (bounded slope)

SLIDE 7

Regret depends on complexity

Bandit linear optimization over Rn:
“strong” assumptions, regret O(n T^(2/3))

Bandit problems for optimizing Lipschitz functions:
“weak” assumptions, regret O(C(n) T^(n/(n+1))). Curse of dimensionality!

Today: Flexible (Bayesian) approach for encoding assumptions about function complexity

SLIDE 8

What if we believe the function looks like:

Want a flexible way to encode assumptions about functions!
Piece-wise linear? Analytic? (∞-differentiable)

SLIDE 9

Bayesian inference

Two Bernoulli variables: A(larm), B(urglar)
P(B=1) = 0.1; P(A=1 | B=1) = 0.9; P(A=1 | B=0) = 0.1
What is P(B | A)?
P(B): “prior”; P(A | B): “likelihood”; P(B | A): “posterior”
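For concreteness, Bayes’ rule works out the answer the slide asks for:
P(B=1 | A=1) = P(A=1 | B=1) P(B=1) / [P(A=1 | B=1) P(B=1) + P(A=1 | B=0) P(B=0)]
             = (0.9 · 0.1) / (0.9 · 0.1 + 0.1 · 0.9) = 0.5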

SLIDE 10

A Bayesian approach

Bayesian models for functions:
Prior P(f)
Likelihood P(data | f)
Posterior P(f | data)
Uff… Why is this useful?

SLIDE 11

Probability of data

P(y1,…,yk) = ∫ P(f, y1,…,yk) df
Can compute P(y’ | y1,…,yk) = P(y’, y1,…,yk) / P(y1,…,yk)

SLIDE 12

Regression with uncertainty about predictions!

[Figure: regression fit with confidence bands around the observations]

SLIDE 13

How can we do this?

Want to compute P(y’ | y1,…,yk)

P(y1,…,yk) = ∫ P(f, y1,…,yk) df

Horribly complicated integral??
Will see how we can compute this (more or less) efficiently, in closed form!
… if P(f) is a Gaussian Process

SLIDE 14

Gaussian distribution

p(y) = 1/(σ√(2π)) · exp(-(y - µ)²/(2σ²))
µ = mean, σ = standard deviation

SLIDE 15

Bivariate Gaussian distribution

[Figures: density plots and contour plots of two bivariate Gaussian distributions]

SLIDE 16

Multivariate Gaussian distribution

Joint distribution over n random variables P(Y1,…,Yn)
σjk = E[(Yj - µj)(Yk - µk)]
Yj and Yk independent ⇔ σjk = 0 (true for jointly Gaussian variables)

SLIDE 17

Marginalization

Suppose (Y1,…,Yn) ~ N(µ, Σ). What is P(Y1)?
More generally: let A = {i1,…,ik} ⊆ {1,…,n} and write YA = (Yi1,…,Yik)
Then YA ~ N(µA, ΣAA)

SLIDE 18

Conditioning

Suppose (Y1,…,Yn) ~ N(µ, Σ), decomposed as (YA, YB)
What is P(YA | YB)?
P(YA = yA | YB = yB) = N(yA; µA|B, ΣA|B), where
µA|B = µA + ΣAB ΣBB⁻¹ (yB - µB)
ΣA|B = ΣAA - ΣAB ΣBB⁻¹ ΣBA
Computable using linear algebra!
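These two formulas are just a few matrix operations. A minimal NumPy sketch (my own illustration, not code from the lecture; the example numbers are made up):

```python
import numpy as np

def condition_gaussian(mu_A, mu_B, S_AA, S_AB, S_BB, y_B):
    """Mean and covariance of Y_A given Y_B = y_B (the slide's formulas)."""
    solve = np.linalg.solve                  # avoids forming an explicit inverse
    mu_cond = mu_A + S_AB @ solve(S_BB, y_B - mu_B)   # mu_{A|B}
    S_cond = S_AA - S_AB @ solve(S_BB, S_AB.T)        # Sigma_{A|B}
    return mu_cond, S_cond

# Bivariate example in the spirit of the next slide: observe Y1 = 0.75
mu = np.array([1.0, 1.0])
Sigma = np.array([[0.25, 0.15],
                  [0.15, 0.25]])
m, S = condition_gaussian(mu[1:], mu[:1], Sigma[1:, 1:], Sigma[1:, :1],
                          Sigma[:1, :1], np.array([0.75]))
print(m, S)  # posterior mean and variance of Y2
```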

SLIDE 19

Conditioning

[Figure: bivariate Gaussian density and contours; fixing Y1 = 0.75 gives the conditional P(Y2 | Y1 = 0.75)]

SLIDE 20

High dimensional Gaussians

Gaussian → bivariate Gaussian → multivariate Gaussian →
Gaussian Process = “∞-variate Gaussian”

SLIDE 21

Gaussian process

A Gaussian Process (GP) is an (infinite) set of random variables, indexed by some set V
i.e., for each x ∈ V there is a RV Yx
Let A = {x1,…,xk} ⊆ V, |A| < ∞
Then YA ~ N(µA, ΣAA), where µA = (µ(x1),…,µ(xk)) and (ΣAA)jk = K(xj, xk)
K: V × V → R is called the kernel (covariance) function
µ: V → R is called the mean function

SLIDE 22

Visualizing GPs

Typically, we only care about the “marginals”, i.e.,
P(yx) = N(yx; µ(x), K(x,x)) for each x ∈ V

SLIDE 23

Mean functions

Can encode prior knowledge
Typically, one simply assumes

µ(x) = 0

Will do that here to simplify notation

SLIDE 24

Kernel functions

K must be symmetric: K(x,x’) = K(x’,x) for all x, x’
K must be positive definite: for all A, ΣAA is a positive definite matrix
The kernel function K encodes assumptions about correlation!

SLIDE 25

Kernel functions: Examples

Squared exponential kernel: K(x,x’) = exp(-(x-x’)²/h²)

[Figures: samples from P(f) for bandwidths h = 0.1 and h = 0.3, and K(x,x’) as a function of the distance |x-x’|]
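To see the bandwidth effect concretely, here is a short NumPy sketch (my own, not the course demo) that draws samples f ~ P(f) from a zero-mean GP with this kernel:

```python
import numpy as np

def k_se(x, xp, h):
    """Squared exponential kernel K(x,x') = exp(-(x-x')^2 / h^2)."""
    return np.exp(-((x[:, None] - xp[None, :]) ** 2) / h ** 2)

xs = np.linspace(0, 1, 200)                       # finite grid A subset of V
for h in (0.1, 0.3):
    K = k_se(xs, xs, h) + 1e-8 * np.eye(len(xs))  # Sigma_AA (+ jitter for stability)
    # Zero mean function; each row of f is one sampled function evaluated on xs
    f = np.random.multivariate_normal(np.zeros(len(xs)), K, size=3)
    print(f"h={h}: samples of shape {f.shape}")   # smaller h => wigglier samples
```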

SLIDE 26

Kernel functions: Examples

Exponential kernel: K(x,x’) = exp(-|x-x’|/h)

[Figures: samples from P(f) for bandwidths h = 1 and h = 0.3, and K(x,x’) as a function of the distance |x-x’|]

SLIDE 27

Kernel functions: Examples

Linear kernel: K(x,x’) = xᵀx’
Corresponds to linear regression!


SLIDE 28

Kernel functions: Examples

Linear kernel with features: K(x,x’) = Φ(x)ᵀΦ(x’)

[Figures: samples from P(f) for Φ(x) = [0, x, x²] and for Φ(x) = sin(x)]

SLIDE 29

Kernel functions: Examples

White noise: K(x,x) = 1; K(x,x’) = 0 for x’ ≠ x


SLIDE 30

Constructing kernels from kernels

If K1(x,x’) and K2(x,x’) are kernel functions, then
α K1(x,x’) + β K2(x,x’) is a kernel for α, β > 0
K1(x,x’) · K2(x,x’) is a kernel
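A quick numerical illustration of these closure rules (my own sketch; the kernels and weights are arbitrary choices): the combined kernel matrices stay positive semi-definite.

```python
import numpy as np

k1 = lambda x, y: np.exp(-(x - y) ** 2 / 0.3 ** 2)     # squared exponential
k2 = lambda x, y: x * y                                # linear kernel (1-d)
k_sum  = lambda x, y: 2.0 * k1(x, y) + 0.5 * k2(x, y)  # alpha, beta > 0
k_prod = lambda x, y: k1(x, y) * k2(x, y)              # product of kernels

xs = np.linspace(0.1, 1, 50)
for k in (k_sum, k_prod):
    K = k(xs[:, None], xs[None, :])                    # kernel matrix on xs
    print(np.linalg.eigvalsh(K).min() >= -1e-8)        # True: valid (PSD) kernel
```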

SLIDE 31

GP Regression

Suppose we know the kernel function K
Get data (x1,y1),…,(xn,yn)
Want to predict y’ = f(x’) for some new x’

SLIDE 32

Linear prediction

Posterior mean: µx’|D = Σx’,D ΣD,D⁻¹ yD
Hence µx’|D = ∑i=1..n wi yi: the prediction depends linearly on the observations yi!
For a fixed data set D = {(x1,y1),…,(xn,yn)}, can precompute the weights wi
Like linear regression, but the number of parameters wi grows with the training data

“Nonparametric regression”: can fit any data set!! ☺
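A hedged NumPy sketch of this computation (my own; the data, the kernel, and the small jitter term standing in for observation noise are illustrative assumptions):

```python
import numpy as np

def k_se(a, b, h=0.3):
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / h ** 2)

X = np.array([0.1, 0.4, 0.6, 0.9])            # observed inputs x_1..x_n
y = np.sin(2 * np.pi * X)                     # observed values y_1..y_n
x_new = np.linspace(0, 1, 5)                  # prediction points x'

K_DD = k_se(X, X) + 1e-6 * np.eye(len(X))     # Sigma_{D,D} (+ jitter as "noise")
W = np.linalg.solve(K_DD, k_se(X, x_new))     # weights w_i, precomputable per D
mu = W.T @ y                                  # posterior mean: linear in the y_i!
var = k_se(x_new, x_new) - k_se(x_new, X) @ W # posterior covariance
print(mu, np.diag(var))                       # predictions with uncertainty
```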

SLIDE 33

Learning parameters

Example: K(x,x’) = exp(-(x-x’)²/h²). Need to specify h!
In general, the kernel function has parameters θ; want to learn θ from data
[Figures: fits with h too small (“overfit”), h too large (“underfit”), and h “just right”]

SLIDE 34

Learning parameters

Pick the parameters θ that make the data most likely!
log P(y | θ) is differentiable if K(x,x’) is!

Can do gradient descent, conjugate gradient, etc.

Tends to work well (not over- or underfit) in practice!
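As an illustration (my own sketch, using the standard zero-mean GP marginal likelihood log P(y | θ) = -½ yᵀK⁻¹y - ½ log|K| - (n/2) log 2π; the data and the SciPy optimizer choice are assumptions, since the slides do not specify an implementation):

```python
import numpy as np
from scipy.optimize import minimize_scalar

X = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
y = np.sin(2 * np.pi * X)

def neg_log_marginal(log_h):
    h = np.exp(log_h)                              # optimize on a log scale
    K = np.exp(-((X[:, None] - X[None, :]) ** 2) / h ** 2)
    K += 1e-6 * np.eye(len(X))                     # jitter / tiny noise variance
    _, logdet = np.linalg.slogdet(K)               # log |K|
    alpha = np.linalg.solve(K, y)                  # K^-1 y
    return 0.5 * y @ alpha + 0.5 * logdet + 0.5 * len(X) * np.log(2 * np.pi)

res = minimize_scalar(neg_log_marginal, bounds=(-4.0, 1.0), method="bounded")
print("learned bandwidth h:", np.exp(res.x))
```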

SLIDE 35

Matlab demo

[Rasmussen & Williams, Gaussian Processes for Machine Learning] http://www.gaussianprocess.org/gpml/

SLIDE 36

Gaussian process

A Gaussian Process (GP) is an (infinite) set of random variables, indexed by some set V
i.e., for each x ∈ V there is a RV Yx
Let A = {x1,…,xk} ⊆ V, |A| < ∞
Then YA ~ N(µA, ΣAA), where µA = (µ(x1),…,µ(xk)) and (ΣAA)jk = K(xj, xk)
K: V × V → R is called the kernel (covariance) function
µ: V → R is called the mean function

SLIDE 37

GPs over other sets

A GP is a collection of random variables, indexed by a set V
So far: have seen GPs over V = R
Can define GPs over:
Text (strings)
Graphs
Sets
…

Only need to choose appropriate kernel function

SLIDE 38

Example: Using GPs to model spatial phenomena
SLIDE 39

Other extensions (won’t cover here)

GPs for classification

Nonparametric generalization of logistic regression
Like SVMs (but give confidence on predicted labels!)

GPs for modeling non-Gaussian phenomena

Model count data over space, …

Active set methods for fast inference
…
Still an active research area in machine learning

SLIDE 40

Bandits = Noisy function optimization

We are given black box access to a function f
Evaluating f is very expensive
Want to (quickly) find x* = argmaxx f(x)
Idea: Assume f is a sample from a Gaussian Process!
Gaussian Process optimization (a.k.a. response surface optimization)
Observations: y = f(x) + noise

SLIDE 41

Upper confidence bound approach

UCB(x | D) = µ(x | D) + 2σ(x | D)
Pick the point x* = argmaxx∈V UCB(x | D)
[Figure: posterior mean with confidence band; the next sample is taken where the upper band is highest]
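A minimal sketch of this rule on a finite grid of arms (my own illustration; the objective f_true, the noise level, and the grid V are made-up assumptions):

```python
import numpy as np

def k_se(a, b, h=0.2):
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / h ** 2)

f_true = lambda x: -(x - 0.6) ** 2               # unknown payoff (illustrative)
V = np.linspace(0, 1, 101)                       # finite grid of candidate arms
X, y = [0.5], [f_true(0.5)]                      # one seed observation

for _ in range(10):
    Xa, ya = np.array(X), np.array(y)
    K = k_se(Xa, Xa) + 1e-4 * np.eye(len(Xa))    # Sigma_{D,D} (+ noise term)
    k_star = k_se(Xa, V)
    mu = k_star.T @ np.linalg.solve(K, ya)       # posterior mean mu(x | D)
    var = 1.0 - np.sum(k_star * np.linalg.solve(K, k_star), axis=0)
    ucb = mu + 2.0 * np.sqrt(np.maximum(var, 0)) # UCB(x | D) = mu + 2*sigma
    x_next = float(V[np.argmax(ucb)])            # pick the argmax of UCB
    X.append(x_next)
    y.append(f_true(x_next) + 0.01 * np.random.randn())

print("best observed x:", X[int(np.argmax(y))])
```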

SLIDE 42

Matlab demo

SLIDE 43

Properties

Implicitly trades off exploration and exploitation
Exploits prior knowledge about the function
Can converge to the optimal solution very quickly! ☺
Seems to work well in many applications
Can perform poorly if our prior assumptions are wrong

SLIDE 44

What you need to know

GPs =
Nonparametric generalization of linear regression
Flexible ways to encode prior assumptions about mean payoffs

Definition of GPs
Properties of multivariate Gaussians (marginalization, conditioning)

Gaussian Process optimization
Combination of regression and optimization
Use confidence bands for selecting samples