SLIDE 1

Multiple-output Gaussian processes

Mauricio A. Álvarez

Department of Computer Science, The University of Sheffield.

SLIDES 2–3

Sensor Network

[Figure: map of sensor locations along the South Coast of England.]

SLIDE 4

Jura Data Set

[Figure: maps of lead, pH level, and copper measurements over a region of the Swiss Jura.]

SLIDE 5

Contents

- Dependencies between processes: Intrinsic Coregionalization Model; Semiparametric Latent Factor Model; Linear Model of Coregionalization; Process convolutions
- Covariance fitting and Prediction: Cokriging
- Extensions: Computational complexity; Variations of LMC; Variations of PC
- Summary

SLIDES 6–16

Single-output Gaussian process

f(x) ∼ GP(0, k(x, x′))

D = {(xi, f(xi)) | i = 1, . . . , N}

f = [f(x1), . . . , f(xN)]⊤ ∼ N(0, K), where K ∈ R^{N×N} has entries k(xi, xj).

For prediction: p(f(x∗) | f)

SLIDES 17–27

Single-output Gaussian process

f(x) ∼ GP(0, k(x, x′)),  y(xi) = f(xi) + εi,  εi ∼ N(0, σ²)

D = {(xi, y(xi)) | i = 1, . . . , N}

y = [y(x1), . . . , y(xN)]⊤ ∼ N(0, K + σ²I)

For prediction: p(f(x∗) | y)
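
To make the prediction step concrete, here is a minimal NumPy sketch of single-output GP regression. The squared-exponential kernel, its hyperparameters, and the toy data are illustrative assumptions, not anything fixed by the slides.

```python
import numpy as np

def rbf(X1, X2, lengthscale=0.2, variance=1.0):
    # Squared-exponential kernel k(x, x') = v * exp(-||x - x'||^2 / (2 l^2))
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (20, 1))                 # N = 20 training inputs
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(20)

sigma2 = 0.01                                  # noise variance sigma^2
K = rbf(X, X)                                  # K with entries k(x_i, x_j)
L = np.linalg.cholesky(K + sigma2 * np.eye(len(X)))

Xs = np.linspace(0, 1, 100)[:, None]           # test inputs x*
Ks = rbf(Xs, X)                                # cross-covariances k(x*, x_i)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
mean = Ks @ alpha                              # E[f(x*) | y]
v = np.linalg.solve(L, Ks.T)
var = rbf(Xs, Xs).diagonal() - np.sum(v**2, axis=0)  # var[f(x*) | y]
```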

SLIDES 28–38

Multiple-output Gaussian process

f1(x) ∼ GP(0, k1(x, x′)),  f2(x) ∼ GP(0, k2(x, x′))

D1 = {(xi,1, f1(xi,1)) | i = 1, . . . , N1},  D2 = {(xi,2, f2(xi,2)) | i = 1, . . . , N2}

f1 ∼ N(0, K1),  f2 ∼ N(0, K2)

Stacking the outputs into f = [f1⊤ f2⊤]⊤,

f ∼ N(0, Kf,f),  with Kf,f = [K1, 0; 0, K2].

SLIDES 39–46

Multiple-output Gaussian process

f1(x) ∼ GP(0, k1(x, x′)),  f2(x) ∼ GP(0, k2(x, x′))

D1 = {(xi,1, y1(xi,1)) | i = 1, . . . , N1},  D2 = {(xi,2, y2(xi,2)) | i = 1, . . . , N2}

y1 ∼ N(0, K1 + σ1²I),  y2 ∼ N(0, K2 + σ2²I)

Stacking the outputs into y = [y1⊤ y2⊤]⊤,

y ∼ N(0, Kf,f + Σ),  with Kf,f = [K1, 0; 0, K2] and Σ = [σ1²I, 0; 0, σ2²I].

SLIDES 47–51

Kernels for multiple outputs

f1(x) ∼ GP(0, k1(x, x′)),  f2(x) ∼ GP(0, k2(x, x′))

D1 = {(xi,1, f1(xi,1)) | i = 1, . . . , N1},  D2 = {(xi,2, f2(xi,2)) | i = 1, . . . , N2}

Kf,f = [K1, ?; ?, K2]

Goal: build a cross-covariance function cov[f1(x), f2(x′)] for the off-diagonal blocks such that Kf,f is positive semi-definite.

SLIDES 52–55

Different input configurations of the data

Isotopic data: sample sites are shared between the inputs for f1(x) and f2(x).

D1 = {(xi, f1(xi))}_{i=1}^{N},  D2 = {(xi, f2(xi))}_{i=1}^{N}

Heterotopic data: sample sites may be different.

D1 = {(xi,1, f1(xi,1))}_{i=1}^{N1},  D2 = {(xi,2, f2(xi,2))}_{i=1}^{N2}

SLIDE 56

Contents

- Dependencies between processes: Intrinsic Coregionalization Model; Semiparametric Latent Factor Model; Linear Model of Coregionalization; Process convolutions
- Covariance fitting and Prediction: Cokriging
- Extensions: Computational complexity; Variations of LMC; Variations of PC
- Summary

SLIDE 57

Intrinsic coregionalization model (ICM): two outputs

Consider two outputs f1(x) and f2(x) with x ∈ R^p.

We assume the following generative model for the outputs:

1. Sample from a GP u(x) ∼ GP(0, k(x, x′)) to obtain u1(x).
2. Obtain f1(x) and f2(x) by linearly transforming u1(x):

   f1(x) = a^1_1 u1(x)
   f2(x) = a^1_2 u1(x)
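
The two-step generative model above is easy to simulate. A short sketch, assuming an RBF covariance for u(x) and illustrative values for the weights a^1_1 and a^1_2:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.linspace(0, 1, 200)[:, None]

def rbf(X1, X2, lengthscale=0.1):
    # Assumed covariance for u(x): k(x, x') = exp(-(x - x')^2 / (2 l^2))
    return np.exp(-0.5 * (X1 - X2.T)**2 / lengthscale**2)

K = rbf(X, X) + 1e-8 * np.eye(len(X))          # jitter for a stable Cholesky
u1 = np.linalg.cholesky(K) @ rng.standard_normal(len(X))  # step 1: one draw of u1(x)

a11, a12 = 1.0, -0.5                           # illustrative weights a^1_1, a^1_2
f1, f2 = a11 * u1, a12 * u1                    # step 2: both outputs scale the same draw
```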

SLIDES 58–63

ICM: samples

[Figures: sample paths of u1(x) and the corresponding outputs f1(x) and f2(x), for several draws.]

SLIDE 64

ICM: covariance (I)

For a fixed value of x, we can group f1(x) and f2(x) in a vector f(x) = [f1(x) f2(x)]⊤. We refer to this vector as a vector-valued function.

The covariance for f(x) is computed as

cov(f(x), f(x′)) = E{f(x)[f(x′)]⊤} − E{f(x)}[E{f(x′)}]⊤.

We compute first the term E{f(x)[f(x′)]⊤}:

E{[f1(x); f2(x)][f1(x′) f2(x′)]} = [E{f1(x)f1(x′)}, E{f1(x)f2(x′)}; E{f2(x)f1(x′)}, E{f2(x)f2(x′)}]

SLIDE 65

ICM: covariance (II)

We compute the expected values as

E{f1(x)f1(x′)} = E{a^1_1 u1(x) a^1_1 u1(x′)} = (a^1_1)² E{u1(x)u1(x′)}

E{f1(x)f2(x′)} = E{a^1_1 u1(x) a^1_2 u1(x′)} = a^1_1 a^1_2 E{u1(x)u1(x′)}

E{f2(x)f2(x′)} = E{a^1_2 u1(x) a^1_2 u1(x′)} = (a^1_2)² E{u1(x)u1(x′)}

The term E{f(x)[f(x′)]⊤} follows as

E{f(x)[f(x′)]⊤} = [(a^1_1)², a^1_1 a^1_2; a^1_1 a^1_2, (a^1_2)²] E{u1(x)u1(x′)}

The term E{f(x)} is computed as

E{[f1(x); f2(x)]} = [E{f1(x)}; E{f2(x)}] = [E{a^1_1 u1(x)}; E{a^1_2 u1(x)}] = [a^1_1; a^1_2] E{u1(x)}

SLIDE 66

ICM: covariance (III)

Putting the terms together, the covariance for f(x) follows as

cov(f(x), f(x′)) = [(a^1_1)², a^1_1 a^1_2; a^1_1 a^1_2, (a^1_2)²] E{u1(x)u1(x′)} − [a^1_1; a^1_2][a^1_1 a^1_2] E{u1(x)} E{u1(x′)}

Defining a = [a^1_1 a^1_2]⊤,

cov(f(x), f(x′)) = aa⊤ E{u1(x)u1(x′)} − aa⊤ E{u1(x)} E{u1(x′)}
                 = aa⊤ (E{u1(x)u1(x′)} − E{u1(x)} E{u1(x′)})
                 = aa⊤ k(x, x′)

We define B = aa⊤, leading to

cov(f(x), f(x′)) = B k(x, x′) = [b11, b12; b21, b22] k(x, x′)

Notice that B has rank one.
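
The rank-one structure of B = aa⊤ can be checked directly (weight values are illustrative):

```python
import numpy as np

a = np.array([1.0, -0.5])          # a = [a^1_1, a^1_2]
B = np.outer(a, a)                 # B = a a^T
print(np.linalg.matrix_rank(B))    # prints 1: one latent sample gives a rank-one B
```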

SLIDE 67

ICM: two outputs and two latent samples

We can introduce a bit more complexity in the previous model as follows.

Consider again two outputs f1(x) and f2(x) with x ∈ R^p.

We assume the following generative model for the outputs:

1. Sample twice from a GP u(x) ∼ GP(0, k(x, x′)) to obtain u1(x) and u2(x).
2. Obtain f1(x) and f2(x) by adding scaled versions of u1(x) and u2(x):

   f1(x) = a^1_1 u1(x) + a^2_1 u2(x)
   f2(x) = a^1_2 u1(x) + a^2_2 u2(x)

Notice that u1(x) and u2(x) are independent, although they share the same covariance k(x, x′).

SLIDES 68–73

ICM: samples

[Figures: independent draws of u1(x) and u2(x) and the resulting outputs f1(x), f2(x) for the ICM with two latent samples, for several draws.]

SLIDE 74

ICM: covariance

The vector-valued function f(x) can be written as

f(x) = a^1 u1(x) + a^2 u2(x),

where a^1 = [a^1_1 a^1_2]⊤ and a^2 = [a^2_1 a^2_2]⊤.

The covariance for f(x) is computed as

cov(f(x), f(x′)) = a^1(a^1)⊤ cov(u1(x), u1(x′)) + a^2(a^2)⊤ cov(u2(x), u2(x′))
                 = a^1(a^1)⊤ k(x, x′) + a^2(a^2)⊤ k(x, x′)
                 = (a^1(a^1)⊤ + a^2(a^2)⊤) k(x, x′)

We define B = a^1(a^1)⊤ + a^2(a^2)⊤, leading to

cov(f(x), f(x′)) = B k(x, x′) = [b11, b12; b21, b22] k(x, x′)

Notice that B has rank two.

SLIDES 75–81

ICM: observed data

[Figures: observed data for the two outputs f1(x) and f2(x).]

D1 = {(xi, f1(xi)) | i = 1, . . . , N},  D2 = {(xi, f2(xi)) | i = 1, . . . , N}

Stacking the outputs,

[f1⊤ f2⊤]⊤ = [f1(x1), . . . , f1(xN), f2(x1), . . . , f2(xN)]⊤ ∼ N(0, [b11 K, b12 K; b21 K, b22 K]) = N(0, B ⊗ K)

The matrix K ∈ R^{N×N} has elements k(xi, xj).

The Kronecker product between matrices C ∈ R^{c1×c2} and G ∈ R^{g1×g2}, with entries ci,j, is

C ⊗ G = [c1,1 G, · · ·, c1,c2 G; . . .; cc1,1 G, · · ·, cc1,c2 G]
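
A sketch of assembling the joint ICM covariance B ⊗ K in NumPy; the kernel, its lengthscale, and the columns of A are illustrative assumptions. Note that np.kron(B, K) produces exactly the block layout above, with block (d, d′) equal to b_{dd′} K.

```python
import numpy as np

def rbf(X1, X2, lengthscale=0.1):
    return np.exp(-0.5 * (X1 - X2.T)**2 / lengthscale**2)

X = np.linspace(0, 1, 50)[:, None]
K = rbf(X, X)                        # K with elements k(x_i, x_j)

A = np.array([[1.0, 0.3],            # columns are a^1 and a^2
              [-0.5, 0.8]])
B = A @ A.T                          # rank-two coregionalization matrix

Kff = np.kron(B, K)                  # joint covariance of [f1; f2], shape (2N, 2N)

# One joint draw: the first N entries are f1, the last N are f2
L = np.linalg.cholesky(Kff + 1e-8 * np.eye(2 * len(X)))
f = L @ np.random.default_rng(2).standard_normal(2 * len(X))
f1, f2 = f[:len(X)], f[len(X):]
```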

SLIDE 82

ICM: general case

Consider a set of functions {fd(x)}_{d=1}^{D}.

In the ICM,

fd(x) = Σ_{i=1}^{R} a^i_d ui(x),

where the functions ui(x) are GPs sampled independently, and share the same covariance function k(x, x′).

For f(x) = [f1(x) · · · fD(x)]⊤, the covariance cov[f(x), f(x′)] is given as

cov[f(x), f(x′)] = AA⊤ k(x, x′) = B k(x, x′),  where A = [a^1 a^2 · · · a^R].

The rank of B ∈ R^{D×D} is given by R.

SLIDE 83

ICM: autokrigeability

If the outputs are considered to be noise-free, prediction using the ICM under an isotopic data case is equivalent to independent prediction over each output.

This circumstance is also known as autokrigeability.

SLIDE 84

Contents

- Dependencies between processes: Intrinsic Coregionalization Model; Semiparametric Latent Factor Model; Linear Model of Coregionalization; Process convolutions
- Covariance fitting and Prediction: Cokriging
- Extensions: Computational complexity; Variations of LMC; Variations of PC
- Summary

SLIDE 85

Semiparametric Latent Factor Model (SLFM)

The ICM uses R samples ui(x) from u(x) with the same covariance function.

The SLFM uses Q samples from processes uq(x) with different covariance functions.

The SLFM with Q = 1 is the same as the ICM with R = 1.

Consider two outputs f1(x) and f2(x) with x ∈ R^p. Suppose we have Q = 2.

We assume the following generative model for the outputs:

1. Sample from a GP GP(0, k1(x, x′)) to obtain u1(x).
2. Sample from a GP GP(0, k2(x, x′)) to obtain u2(x).
3. Obtain f1(x) and f2(x) by adding scaled versions of u1(x) and u2(x):

   f1(x) = a1,1 u1(x) + a1,2 u2(x)
   f2(x) = a2,1 u1(x) + a2,2 u2(x)

SLIDES 86–91

SLFM: samples

[Figures: draws of u1(x) and u2(x) from GPs with different covariances, and the resulting outputs f1(x) and f2(x), for several draws.]

SLIDE 92

SLFM: covariance

The vector-valued function f(x) can be written as

f(x) = a1 u1(x) + a2 u2(x),

where a1 = [a1,1 a2,1]⊤ and a2 = [a1,2 a2,2]⊤.

The covariance for f(x) is computed as

cov(f(x), f(x′)) = a1(a1)⊤ cov(u1(x), u1(x′)) + a2(a2)⊤ cov(u2(x), u2(x′))
                 = a1(a1)⊤ k1(x, x′) + a2(a2)⊤ k2(x, x′)

We define B1 = a1(a1)⊤ and B2 = a2(a2)⊤, leading to

cov(f(x), f(x′)) = B1 k1(x, x′) + B2 k2(x, x′)

Notice that B1 and B2 have rank one.

SLIDES 93–97

SLFM: observed data

[Figures: observed data for the two outputs.]

D1 = {(xi, f1(xi)) | i = 1, . . . , N},  D2 = {(xi, f2(xi)) | i = 1, . . . , N}

Stacking the outputs,

[f1⊤ f2⊤]⊤ = [f1(x1), . . . , f1(xN), f2(x1), . . . , f2(xN)]⊤ ∼ N(0, B1 ⊗ K1 + B2 ⊗ K2)

The matrix K1 ∈ R^{N×N} has elements k1(xi, xj). The matrix K2 ∈ R^{N×N} has elements k2(xi, xj).
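
The SLFM analogue of the ICM sketch replaces the single Kronecker term with a sum; lengthscales and weights here are illustrative assumptions:

```python
import numpy as np

def rbf(X1, X2, lengthscale):
    return np.exp(-0.5 * (X1 - X2.T)**2 / lengthscale**2)

X = np.linspace(0, 1, 50)[:, None]
K1 = rbf(X, X, lengthscale=0.3)      # slowly varying latent process u1
K2 = rbf(X, X, lengthscale=0.05)     # quickly varying latent process u2

a1 = np.array([1.0, -0.5])           # a1 = [a_{1,1}, a_{2,1}]
a2 = np.array([0.3, 0.8])            # a2 = [a_{1,2}, a_{2,2}]
B1, B2 = np.outer(a1, a1), np.outer(a2, a2)   # rank-one coregionalization matrices

Kff = np.kron(B1, K1) + np.kron(B2, K2)       # joint SLFM covariance, shape (2N, 2N)
```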

SLIDE 98

SLFM: general case

Consider a set of functions {fd(x)}_{d=1}^{D}.

In the SLFM,

fd(x) = Σ_{q=1}^{Q} ad,q uq(x),

where the functions uq(x) are GPs with covariance functions kq(x, x′).

For f(x) = [f1(x) · · · fD(x)]⊤, the covariance cov[f(x), f(x′)] is given as

cov[f(x), f(x′)] = Σ_{q=1}^{Q} Aq Aq⊤ kq(x, x′) = Σ_{q=1}^{Q} Bq kq(x, x′),  where Aq = aq.

The rank of each Bq ∈ R^{D×D} is one.

SLIDE 99

Contents

- Dependencies between processes: Intrinsic Coregionalization Model; Semiparametric Latent Factor Model; Linear Model of Coregionalization; Process convolutions
- Covariance fitting and Prediction: Cokriging
- Extensions: Computational complexity; Variations of LMC; Variations of PC
- Summary

SLIDE 100

Linear model of coregionalization (LMC)

The LMC generalizes the ICM and the SLFM, allowing several independent samples from GPs with different covariances.

Consider a set of functions {fd(x)}_{d=1}^{D}.

In the LMC,

fd(x) = Σ_{q=1}^{Q} Σ_{i=1}^{Rq} a^i_{d,q} u^i_q(x),

where the functions u^i_q(x) are GPs with zero means and covariance functions

cov[u^i_q(x), u^{i′}_{q′}(x′)] = kq(x, x′) if i = i′ and q = q′, and zero otherwise.

SLIDE 101

LMC: interpretation

In the LMC,

fd(x) = Σ_{q=1}^{Q} Σ_{i=1}^{Rq} a^i_{d,q} u^i_q(x).

There are Q groups of samples. For each group, there are Rq samples obtained independently from the same GP with covariance kq(x, x′).

SLIDES 102–106

LMC: example

The LMC corresponds to the sum of Q ICMs.

Suppose we have D = 2, Q = 2 and Rq = 2. According to the LMC,

f1(x) = a^1_{1,1} u^1_1(x) + a^2_{1,1} u^2_1(x) + a^1_{1,2} u^1_2(x) + a^2_{1,2} u^2_2(x),
f2(x) = a^1_{2,1} u^1_1(x) + a^2_{2,1} u^2_1(x) + a^1_{2,2} u^1_2(x) + a^2_{2,2} u^2_2(x).

SLIDE 107

LMC: covariance for f(x)

For f(x) = [f1(x) · · · fD(x)]⊤, the covariance cov[f(x), f(x′)] is given as

cov[f(x), f(x′)] = Σ_{q=1}^{Q} Aq Aq⊤ kq(x, x′) = Σ_{q=1}^{Q} Bq kq(x, x′),

where Aq = [a^1_q a^2_q · · · a^{Rq}_q].

The rank of each Bq is Rq. The matrices Bq are known as the coregionalization matrices.

SLIDES 108–112

LMC: observed data

D1 = {(xi, f1(xi)) | i = 1, . . . , N},  D2 = {(xi, f2(xi)) | i = 1, . . . , N}

Stacking the outputs,

[f1⊤ f2⊤]⊤ = [f1(x1), . . . , f1(xN), f2(x1), . . . , f2(xN)]⊤ ∼ N(0, Σ_{q=1}^{Q} Bq ⊗ Kq)

The matrix Kq ∈ R^{N×N} has elements kq(xi, xj). The matrix Bq ∈ R^{D×D} has elements b^q_{ij}.
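
For general Q the joint covariance is a loop over coregionalization terms. A sketch, where the function name, the kernel list, and the A_q matrices are caller-supplied assumptions:

```python
import numpy as np

def lmc_covariance(X, kernels, A_list):
    """Joint LMC covariance sum_q kron(B_q, K_q) for isotopic inputs X.

    kernels: list of Q functions k_q(X, X) -> (N, N) array.
    A_list:  list of Q arrays A_q of shape (D, R_q); B_q = A_q A_q^T.
    """
    N = len(X)
    D = A_list[0].shape[0]
    Kff = np.zeros((D * N, D * N))
    for kq, Aq in zip(kernels, A_list):
        Bq = Aq @ Aq.T               # coregionalization matrix of rank R_q
        Kff += np.kron(Bq, kq(X, X))
    return Kff
```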

SLIDE 113

Contents

- Dependencies between processes: Intrinsic Coregionalization Model; Semiparametric Latent Factor Model; Linear Model of Coregionalization; Process convolutions
- Covariance fitting and Prediction: Cokriging
- Extensions: Computational complexity; Variations of LMC; Variations of PC
- Summary

SLIDE 114

Moving average function

Consider again a set of D functions {fd(x)}_{d=1}^{D}.

Each function can be expressed through a convolution integral between a kernel Gd(x) and a function u(x),

fd(x) = ∫_X Gd(x − z) u(z) dz = Gd(x) ∗ u(x).

For the integral to exist, it is assumed that the kernel Gd(x) is a continuous function with compact support or square-integrable.

The kernel Gd(x) is also known as the moving average function or the smoothing kernel.

In Dependent Gaussian processes (DGP), the latent function u(x) is white Gaussian noise (WGN).

SLIDES 115–117

A pictorial representation

[Figure: the latent function u(x) is passed through the smoothing kernels G1(x) and G2(x) to produce the output functions f1(x) and f2(x).]

u(x): latent function. G1(x), G2(x): smoothing kernels. f1(x), f2(x): output functions.

SLIDES 118–122

Cross-covariance between fd(x) and fd′(x′)

The cross-covariance between fd(x) and fd′(x′), cov[fd(x), fd′(x′)], is

E[∫_X Gd(x − z) u(z) dz ∫_X Gd′(x′ − z′) u(z′) dz′] − E[∫_X Gd(x − z) u(z) dz] E[∫_X Gd′(x′ − z′) u(z′) dz′]

= ∫_X ∫_X Gd(x − z) Gd′(x′ − z′) E[u(z)u(z′)] dz′ dz − ∫_X Gd(x − z) E[u(z)] dz ∫_X Gd′(x′ − z′) E[u(z′)] dz′

= ∫_X ∫_X Gd(x − z) Gd′(x′ − z′) {E[u(z)u(z′)] − E[u(z)] E[u(z′)]} dz dz′

= ∫_X ∫_X Gd(x − z) Gd′(x′ − z′) k(z, z′) dz dz′

In the DGP, k(z, z′) = σ²δ(z − z′).

SLIDE 123

Example of cov[fd(x), fd′(x′)] (I)

With white-noise u(x), the cross-covariance between fd(x) and fd′(x′) is

cov[fd(x), fd′(x′)] = σ² ∫_X Gd(x − z) Gd′(x′ − z) dz.

Example. Assume that the smoothing kernels follow a Gaussian form,

Gd(x − z) = (Sd |Pd|^{1/2} / (2π)^{p/2}) exp(−(1/2)(x − z)⊤ Pd (x − z)).

We use the identity for the product of two Gaussians,

N(x | µ1, P1^{−1}) N(x | µ2, P2^{−1}) = N(µ1 | µ2, P1^{−1} + P2^{−1}) N(x | µc, Pc^{−1}),

where µc = (P1 + P2)^{−1}(P1 µ1 + P2 µ2) and Pc^{−1} = (P1 + P2)^{−1}.

SLIDE 124

Example of cov[fd(x), fd′(x′)] (II)

The cross-covariance between fd(x) and fd′(x′) is

cov[fd(x), fd′(x′)] = σ² ∫_X Gd(x − z) Gd′(x′ − z) dz
                    = (σ² Sd Sd′ / ((2π)^{p/2} |Peqv|^{1/2})) exp(−(1/2)(x − x′)⊤ Peqv^{−1} (x − x′)),

where Peqv = Pd^{−1} + Pd′^{−1}.

Exercise. Show how to obtain the expression above.
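
The closed form above translates directly into code. `pc_cross_cov` and its test values are hypothetical names and numbers chosen for illustration:

```python
import numpy as np

def pc_cross_cov(x, xp, Sd, Sdp, Pd, Pdp, sigma2=1.0):
    """cov[f_d(x), f_d'(x')] for Gaussian smoothing kernels and white-noise u.

    Pd, Pdp are the precision matrices of the kernels (assumed symmetric PD).
    """
    p = len(x)
    Peqv = np.linalg.inv(Pd) + np.linalg.inv(Pdp)   # P_eqv = Pd^-1 + Pd'^-1
    diff = np.asarray(x) - np.asarray(xp)
    quad = diff @ np.linalg.solve(Peqv, diff)       # (x - x')^T Peqv^-1 (x - x')
    norm = (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Peqv))
    return sigma2 * Sd * Sdp * np.exp(-0.5 * quad) / norm

# Example in p = 2 with isotropic kernels of different widths
P1 = np.eye(2) / 0.1**2
P2 = np.eye(2) / 0.3**2
print(pc_cross_cov([0.2, 0.1], [0.25, 0.1], Sd=1.0, Sdp=0.7, Pd=P1, Pdp=P2))
```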

SLIDES 125–126

PC: samples

[Figures: sample paths of f1(x) and f2(x) drawn from the process-convolution model.]

SLIDES 127–131

PC: observed data

[Figures: observed data for the two outputs.]

D1 = {(xi, f1(xi)) | i = 1, . . . , N},  D2 = {(xi, f2(xi)) | i = 1, . . . , N}

Stacking the outputs,

[f1⊤ f2⊤]⊤ = [f1(x1), . . . , f1(xN), f2(x1), . . . , f2(xN)]⊤ ∼ N(0, [Kf1,f1, Kf1,f2; Kf2,f1, Kf2,f2])

The matrix Kfd,fd ∈ R^{N×N} has elements cov[fd(x), fd(x′)]. The matrix Kfd,fd′ ∈ R^{N×N} has elements cov[fd(x), fd′(x′)].

SLIDE 132

Beyond u(x) as a white Gaussian noise

Consider again a set of D functions {fd(x)}_{d=1}^{D}.

Each function can be expressed through a convolution integral between a kernel Gd(x) and a function u(x),

fd(x) = ∫_X Gd(x − z) u(z) dz = Gd(x) ∗ u(x).

Assume now that u(x) is a GP with zero mean and covariance k(x, x′).

The cross-covariance is now given as

cov[fd(x), fd′(x′)] = ∫_X ∫_X Gd(x − z) Gd′(x′ − z′) k(z, z′) dz dz′

SLIDES 133–135

A process u(x) with covariance k(x, x′)

The cross-covariance is

cov[fd(x), fd′(x′)] = ∫_X ∫_X Gd(x − z) Gd′(x′ − z′) k(z, z′) dz dz′

Example. Assume that the smoothing kernels and the covariance for u(x) follow a Gaussian form,

Gd(x − z) = (Sd |Pd|^{1/2} / (2π)^{p/2}) exp(−(1/2)(x − z)⊤ Pd (x − z)),

k(z, z′) = (|Λ|^{1/2} / (2π)^{p/2}) exp(−(1/2)(z − z′)⊤ Λ (z − z′)).

Using again the identity for the product of two Gaussians, we get

cov[fd(x), fd′(x′)] = (Sd Sd′ / ((2π)^{p/2} |Peqv|^{1/2})) exp(−(1/2)(x − x′)⊤ Peqv^{−1} (x − x′)),

where Peqv = Pd^{−1} + Pd′^{−1} + Λ^{−1}.

SLIDES 136–137

More general process convolutions

We can include more latent processes u1(x), u2(x), . . . , uQ(x):

fd(x) = Σ_{q=1}^{Q} Σ_{i=1}^{Rq} ∫_X G^i_{d,q}(x − z) u^i_q(z) dz,

where cov[u^i_q(z), u^{i′}_{q′}(z′)] = kq(z, z′) δi,i′ δq,q′.

A general expression for cov[fd(x), fd′(x′)] follows as

cov[fd(x), fd′(x′)] = Σ_{q=1}^{Q} Σ_{i=1}^{Rq} ∫_X G^i_{d,q}(x − z) ∫_X G^i_{d′,q}(x′ − z′) kq(z, z′) dz′ dz.

SLIDES 138–142

Starting with the general expression we had before ...

Assume we have D outputs, {fd(x)}_{d=1}^{D}. The covariance between fd(x) and fd′(x′) follows

kfd,fd′(x, x′) = Σ_{q=1}^{Q} Σ_{i=1}^{Rq} ∫_X G^i_{d,q}(x − z) ∫_X G^i_{d′,q}(x′ − z′) kq(z, z′) dz′ dz.

Some particular cases:

Intrinsic Coregionalization Model [Goovaerts, 1997] or Multi-task Gaussian Processes [Bonilla et al., 2008]:

G^i_{d,q}(x − z) = a^i_{d,q} δ(x − z),  Q = 1, Rq > 1,

kfd,fd′(x, x′) = Σ_{i=1}^{R1} a^i_{d,1} a^i_{d′,1} k1(x, x′).

SLIDES 143–144

Starting with the general expression we had before ...

Intrinsic Coregionalization Model: Kf,f = B ⊗ K

[Figures: samples of f1(x) and f2(x) from the ICM with Rq = 1 (B of rank 1) and with Rq = 2 (B of rank 2).]

SLIDES 145–148

Starting with the general expression we had before ...

Assume we have D outputs, {fd(x)}_{d=1}^{D}. The covariance between fd(x) and fd′(x′) follows [Higdon, 2002, Boyle and Frean, 2005, Álvarez et al., 2012]

kfd,fd′(x, x′) = Σ_{q=1}^{Q} Σ_{i=1}^{Rq} ∫_X G^i_{d,q}(x − z) ∫_X G^i_{d′,q}(x′ − z′) kq(z, z′) dz′ dz.

Some particular cases:

Semiparametric Latent Factor Model [Teh et al., 2005]:

G^i_{d,q}(x − z) = a^i_{d,q} δ(x − z),  Rq = 1, Q > 1,

kfd,fd′(x, x′) = Σ_{q=1}^{Q} a^1_{d,q} a^1_{d′,q} kq(x, x′).

SLIDES 149–150

Starting with the general expression we had before ...

Semiparametric Latent Factor Model: Kf,f = Σ_{q=1}^{Q} aq aq⊤ ⊗ Kq

[Figures: samples of f1(x) and f2(x) from the LMC with Rq = 1 and Q = 2.]

SLIDES 151–154

Starting with the general expression we had before ...

Assume we have D outputs, {fd(x)}_{d=1}^{D}. The covariance between fd(x) and fd′(x′) follows [Higdon, 2002, Boyle and Frean, 2005, Álvarez et al., 2012]

kfd,fd′(x, x′) = Σ_{q=1}^{Q} Σ_{i=1}^{Rq} ∫_X G^i_{d,q}(x − z) ∫_X G^i_{d′,q}(x′ − z′) kq(z, z′) dz′ dz.

Some particular cases:

Linear Model of Coregionalization [Journel and Huijbregts, 1978, Goovaerts, 1997, Wackernagel, 2003]:

G^i_{d,q}(x − z) = a^i_{d,q} δ(x − z),  Rq > 1, Q > 1,

kfd,fd′(x, x′) = Σ_{q=1}^{Q} Σ_{i=1}^{Rq} a^i_{d,q} a^i_{d′,q} kq(x, x′).

SLIDES 155–156

Starting with the general expression we had before ...

Linear Model of Coregionalization: Kf,f = Σ_{q=1}^{Q} Bq ⊗ Kq

[Figures: samples of f1(x) and f2(x) from the LMC with Rq = 2 and Q = 2.]

SLIDES 157–160

Starting with the general expression we had before ...

Assume we have D outputs, {fd(x)}_{d=1}^{D}. The covariance between fd(x) and fd′(x′) follows [Higdon, 2002, Boyle and Frean, 2005, Álvarez et al., 2012]

kfd,fd′(x, x′) = Σ_{q=1}^{Q} Σ_{i=1}^{Rq} ∫_X G^i_{d,q}(x − z) ∫_X G^i_{d′,q}(x′ − z′) kq(z, z′) dz′ dz.

Some particular cases:

Dependent GPs [Higdon, 2002, Boyle and Frean, 2005]:

Q = 1, Rq = 1, k1(z, z′) = σ²δ(z − z′),

kfd,fd′(x, x′) = σ² ∫_X Gd(x − z) Gd′(x′ − z) dz.

SLIDES 161–162

Starting with the general expression we had before ...

Comparison

[Figures: samples of f1(x) and f2(x) under the ICM, the LMC, and the process convolution (PC), side by side.]

SLIDE 163

Kernels for vector-valued functions

Mauricio A. Álvarez, Lorenzo Rosasco, and Neil D. Lawrence. Kernels for Vector-Valued Functions: A Review. Foundations and Trends in Machine Learning, Vol. 4, No. 3 (2011), pages 195–266. DOI: 10.1561/2200000036.

SLIDE 164

Contents

- Dependencies between processes: Intrinsic Coregionalization Model; Semiparametric Latent Factor Model; Linear Model of Coregionalization; Process convolutions
- Covariance fitting and Prediction: Cokriging
- Extensions: Computational complexity; Variations of LMC; Variations of PC
- Summary

SLIDES 165–167

Gaussian process priors for vector-valued functions

We saw a series of models for the set of outputs {fd(x)}_{d=1}^{D} that led to a valid covariance function for the vector f(x).

For a finite number of inputs, X = {xn}_{n=1}^{N}, the prior distribution over the vector f = [f1⊤, . . . , fD⊤]⊤ is given as

f ∼ N(0, Kf,f),  with Kf,f = [Kf1,f1, Kf1,f2, · · ·, Kf1,fD; Kf2,f1, Kf2,f2, · · ·, Kf2,fD; . . .; KfD,f1, KfD,f2, · · ·, KfD,fD].

SLIDES 168–170

Noisy observations

In practice, we usually have access to noisy observations, so we model the outputs {yd(x)}_{d=1}^{D} using

yd(x) = fd(x) + εd(x),

where {εd(x)}_{d=1}^{D} are independent white Gaussian noise processes with variance σd².

The marginal likelihood is given as

p(y | X, θ) = N(y | 0, Kf,f + Σ),

where y = [y1⊤, y2⊤, . . . , yD⊤]⊤, the vector θ refers to the hyperparameters, and Σ = diag(σ1², . . . , σD²) ⊗ IN.

SLIDE 171

Hyperparameter Learning

Let D = {xn, yn}_{n=1}^{N} represent the data, and θ represent the hyperparameters of the covariance function.

The marginal likelihood for the outputs can be written as

p(y | X, θ) = N(y | 0, Kf,f + Σ),

where Kf,f ∈ R^{ND×ND} with each element given by cov[fd(xn), fd′(xn′)].

The matrix Σ represents the covariance associated with the independent noise processes.

Hyperparameters are estimated by maximizing the logarithm of the marginal likelihood.
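
As a sketch, the objective being maximized can be written as a small function and handed to any numerical optimizer (gradients are omitted here):

```python
import numpy as np

def log_marginal_likelihood(y, Kff, Sigma):
    """log N(y | 0, Kff + Sigma), evaluated via a Cholesky factorization."""
    n = len(y)
    L = np.linalg.cholesky(Kff + Sigma)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))      # equals 0.5 * log det(Kff + Sigma)
            - 0.5 * n * np.log(2 * np.pi))
```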

SLIDE 172

Predictive distribution

Prediction for a set of test inputs X∗ is done using standard Gaussian process regression techniques.

The predictive distribution is given by

p(y∗ | y, X, θ) = N(y∗ | µ∗, Ky∗,y∗),

with

µ∗ = Kf∗,f (Kf,f + Σ)^{−1} y,
Ky∗,y∗ = Kf∗,f∗ − Kf∗,f (Kf,f + Σ)^{−1} Kf∗,f⊤ + Σ∗.
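
A direct transcription of these two formulas, with assumed array layouts for the covariance blocks:

```python
import numpy as np

def predict(y, Kff, Sigma, Ksf, Kss, Sigma_star):
    """Multi-output GP predictive mean and covariance.

    Kff: (DN, DN) training covariance; Ksf: (M, DN) cross-covariance K_{f*,f};
    Kss: (M, M) test covariance K_{f*,f*}; Sigma_star: (M, M) test noise.
    """
    L = np.linalg.cholesky(Kff + Sigma)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ksf @ alpha                       # K_{f*,f} (K_{f,f} + Sigma)^-1 y
    V = np.linalg.solve(L, Ksf.T)
    cov = Kss - V.T @ V + Sigma_star       # predictive covariance
    return mu, cov
```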

SLIDE 173

Can you prove autokrigeability?

The predictive distribution is given by

p(y∗ | y, X, θ) = N(y∗ | µ∗, Ky∗,y∗),

with

µ∗ = Kf∗,f (Kf,f + Σ)^{−1} y,
Ky∗,y∗ = Kf∗,f∗ − Kf∗,f (Kf,f + Σ)^{−1} Kf∗,f⊤ + Σ∗.

Exercise: Prove that if the outputs are considered to be noise-free, prediction using the ICM under an isotopic data case is equivalent to independent prediction over each output.

SLIDE 174

Contents

- Dependencies between processes: Intrinsic Coregionalization Model; Semiparametric Latent Factor Model; Linear Model of Coregionalization; Process convolutions
- Covariance fitting and Prediction: Cokriging
- Extensions: Computational complexity; Variations of LMC; Variations of PC
- Summary

SLIDE 175

The cokriging estimator

In geostatistics, the framework that allows for optimal predictions in the multivariate case is known by the general name of cokriging [Goovaerts, 1997].

In general, the output value for fd evaluated at x∗ is estimated as

f̂d(x∗) − µd(x∗) = Σ_{s=1}^{D} Σ_{αs=1}^{ns(x∗)} λαs(x∗)[fs(xαs) − µs(xαs)],

where λαs(x∗) are the weights assigned to the output data fs(xαs), µs(xαs) are the expected values of fs(xαs), and ns(x∗) ≤ N.

Cokriging estimators need to be unbiased, E[fd(x∗) − f̂d(x∗)] = 0, and to minimize the error variance

σ²_E(x∗) = var[fd(x∗) − f̂d(x∗)].

SLIDE 176

Cokriging assumes a model for fd

Cokriging estimators differ in the form they assume for fd(x).

In general, each output function is decomposed into a residual Rd(x) and a trend µd(x),

fd(x) = Rd(x) + µd(x), ∀d.

Residuals are assumed to be Gaussian processes with zero mean.

The covariance for the residuals is denoted as kd,d(x, x′) and the cross-covariance between residuals as kd,d′(x, x′).

SLIDE 177

Simple cokriging

The simple cokriging estimator is given as

f̂d(x∗) − µd = Σ_{s=1}^{D} Σ_{αs=1}^{ns(x∗)} λαs(x∗)[fs(xαs) − µs].

It can be shown that this is an unbiased estimator.

The coefficients λαs(x∗) can be obtained by minimizing the variance σ²_E(x∗), leading to

[λ1(x∗); . . . ; λD(x∗)] = [K1,1, · · ·, K1,D; . . .; KD,1, · · ·, KD,D]^{−1} [k1,1; . . . ; kD,1],

where Kd,d′ = [kd,d′(xαd, xβd′)] and kd,1 = [kd,1(xαd, x∗)].

The predictor is then f̂d(x∗) = λ⊤f.
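
In matrix form the system above is a single linear solve. A sketch, where the block layout of the arrays is an assumption:

```python
import numpy as np

def simple_cokriging_weights(K_joint, k_target):
    """Solve the simple cokriging system for the weights lambda.

    K_joint:  joint residual covariance across all outputs, stacked in blocks
              [K_{1,1} ... K_{1,D}; ...; K_{D,1} ... K_{D,D}].
    k_target: stacked covariances [k_{1,1}; ...; k_{D,1}] between the data
              sites and f_d(x*).
    """
    return np.linalg.solve(K_joint, k_target)

# The predictor is then lam @ (f - mu) + mu_d, matching the slide.
```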

SLIDES 178–179

Contents

- Dependencies between processes: Intrinsic Coregionalization Model; Semiparametric Latent Factor Model; Linear Model of Coregionalization; Process convolutions
- Covariance fitting and Prediction: Cokriging
- Extensions: Computational complexity; Variations of LMC; Variations of PC
- Summary

SLIDE 180

Efficient approximations (I)

Learning θ through marginal likelihood maximization involves the inversion of the matrix Kf,f + Σ.

The inversion of this matrix scales as O(D³N³).

If only a small number K < N of values of u(x) is known, then the set of outputs is uniquely determined.

SLIDES 181–182

Efficient approximations (II)

Sample from p(u):

fd(x) = ∫_X Gd(x − z) u(z) dz

Sample from p(u | u), with u the vector of values of u(z) at a set of inducing points:

fd(x) ≈ ∫_X Gd(x − z) E[u(z) | u] dz

SLIDE 183

Efficient approximations

[Figure.]

SLIDE 184

Contents

- Dependencies between processes: Intrinsic Coregionalization Model; Semiparametric Latent Factor Model; Linear Model of Coregionalization; Process convolutions
- Covariance fitting and Prediction: Cokriging
- Extensions: Computational complexity; Variations of LMC; Variations of PC
- Summary

SLIDE 185

Cross-coregionalization matrices

In the LMC,

fd(x) = Σ_{q=1}^{Q} Σ_{i=1}^{Rq} a^i_{d,q} u^i_q(x).

If the basic processes u^i_q(x) are instead assumed to be nonorthogonal [Guzmán et al., 2002], we obtain the following covariance function:

cov[f(x), f(x′)] = Σ_{q=1}^{Q} Σ_{q′=1}^{Q} Bq,q′ kq,q′(x, x′),

where the Bq,q′ are cross-coregionalization matrices.

SLIDE 186

Non-stationary LMC

We can write the vector-valued function f(x) as

f(x) = A u(x),

where A = [a1 · · · aQ] and u(x) = [u1(x) · · · uQ(x)]⊤.

A non-stationary version allows A to change with x [Gelfand et al., 2004, Wilson et al., 2012]:

f(x) = A(x) u(x).

SLIDE 187

Contents

- Dependencies between processes: Intrinsic Coregionalization Model; Semiparametric Latent Factor Model; Linear Model of Coregionalization; Process convolutions
- Covariance fitting and Prediction: Cokriging
- Extensions: Computational complexity; Variations of LMC; Variations of PC
- Summary

SLIDE 188

Extensions [Calder and Cressie, 2007]

A more general form:

fd(x) = ∫ Gd(x, z) u(z) dz,   fd(x) = Σj Gd(x, zj) u(zj)

Non-stationary models:

fd(x) = ∫ G_{d,θ(x)}(x, z) u(z) dz,   fd(x) = ∫ Gd(x, z) u_{θ(z)}(x) dz

SLIDE 189

Latent force models [Álvarez et al., 2009]

Mechanistically inspired kernel smoothing functions:

Gd(t, t′) ∝ exp[−Dq(t − t′)]   (first-order ODE)

Gd(t, t′) ∝ exp[−αq(t − t′)] sin[ωq(t − t′)]   (second-order ODE)

Gd(x, x′) ∝ exp[−Σi (xi − x′i)² / (4C)]   (PDE)

SLIDE 190

Contents

- Dependencies between processes: Intrinsic Coregionalization Model; Semiparametric Latent Factor Model; Linear Model of Coregionalization; Process convolutions
- Covariance fitting and Prediction: Cokriging
- Extensions: Computational complexity; Variations of LMC; Variations of PC
- Summary

SLIDE 191

Summary

We can do multi-task learning or transfer learning with GPs.

There are different ways to build meaningful cross-covariance functions.

Once the covariance is defined, we can do all the things we know how to do with a single-output GP.

Cokriging is just prediction with GPs (with a quadratic loss function).

There are several extensions of the LMC and of PCs.

Current research: spectral representations for the joint covariance function.

SLIDE 192

References

Mauricio A. Álvarez, David Luengo, and Neil D. Lawrence. Latent force models. In David van Dyk and Max Welling, editors, Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, pages 9–16, Clearwater Beach, Florida, 16–18 April 2009. JMLR W&CP 5.

Mauricio A. Álvarez, Lorenzo Rosasco, and Neil D. Lawrence. Kernels for vector-valued functions: a review. Foundations and Trends in Machine Learning, 4(3):195–266, 2012.

Edwin V. Bonilla, Kian Ming Chai, and Christopher K. I. Williams. Multi-task Gaussian process prediction. In John C. Platt, Daphne Koller, Yoram Singer, and Sam Roweis, editors, NIPS, volume 20, Cambridge, MA, 2008. MIT Press.

Phillip Boyle and Marcus Frean. Dependent Gaussian processes. In Lawrence Saul, Yair Weiss, and Léon Bottou, editors, NIPS, volume 17, pages 217–224, Cambridge, MA, 2005. MIT Press.

Catherine A. Calder and Noel Cressie. Some topics in convolution-based spatial modeling. In Proceedings of the 56th Session of the International Statistics Institute, August 2007.

Alan E. Gelfand, Alexandra M. Schmidt, Sudipto Banerjee, and C. F. Sirmans. Nonstationary multivariate process modeling through spatially varying coregionalization. TEST, 13(2):263–312, 2004.

Pierre Goovaerts. Geostatistics for Natural Resources Evaluation. Oxford University Press, USA, 1997.

J. A. Vargas Guzmán, A. W. Warrick, and D. E. Myers. Coregionalization by linear combination of nonorthogonal components. Mathematical Geology, 34(4):405–419, 2002.

David M. Higdon. Space and space-time modelling using process convolutions. In C. Anderson, V. Barnett, P. Chatwin, and A. El-Shaarawi, editors, Quantitative Methods for Current Environmental Issues, pages 37–56. Springer-Verlag, 2002.

Andre G. Journel and Charles J. Huijbregts. Mining Geostatistics. Academic Press, London, 1978. ISBN 0-12391-050-1.

Yee Whye Teh, Matthias Seeger, and Michael I. Jordan. Semiparametric latent factor models. In Robert G. Cowell and Zoubin Ghahramani, editors, AISTATS 10, pages 333–340, Barbados, 6–8 January 2005. Society for Artificial Intelligence and Statistics.

Hans Wackernagel. Multivariate Geostatistics. Springer-Verlag, Heidelberg/New York, 2003.

Andrew Gordon Wilson, David A. Knowles, and Zoubin Ghahramani. Gaussian process regression networks. In Proceedings of the 29th International Conference on Machine Learning, ICML '12, pages 1139–1146, 2012.