SLIDE 1

Analysing Gene Expression Data Using Gaussian Processes

Lorenz Wernisch

School of Crystallography, Birkbeck College, London

SLIDE 2

Overview

• Gene regulatory networks, microarrays
• Time-series analysis by linear regression
• Bayesian inference, Occam's razor
• Extension to nonlinear models: Gaussian processes
• Applications
• Filtering with Gaussian processes

SLIDE 3

Gene Regulatory Networks

[Diagram: gene network with activating (+) and repressing (−) edges]

• Gene expression levels depend on external stimuli and on the activity of other genes (transcription factors)
• Microarrays measure the mRNA levels of genes
• Construction of gene networks from microarray data

SLIDE 4
  • A. thaliana: APRR family

[Plot: log expression levels vs. time/hrs (26 to 74) for APRR9, APRR7, APRR5, APRR3, TOC1/APRR1]

• Time series of A. thaliana under constant light
• 13 time points, every 4 hours from 26 to 74 hrs
• Data by Kieron Edwards and Andrew Millar
• APRR family: possible modulators for light sensitivity of the main circadian clock

SLIDE 5

Networks from time-series data

[Diagram: dependency graph of genes A, B, C unrolled over time slices t = 0, 1, 2, …]

• A static graph representing dependencies between genes has cycles
• Cycles unrolled in time yield an acyclic graph
• The network topology is repeated over time slices

SLIDE 6

Linear time-series model

$$x_t = \Phi x_{t-1} + \mu + w_t$$

• $x_t$ is the $N$-vector of RNA levels at time $t$ (of $N$ genes)
• $w_t$ is the $N$-vector of biological noise added at $t$
• $\mu$ is the $N$-vector of constant trend, ie constitutive expression

If there is no constant trend ($\mu = 0$), $\Phi$ can be estimated by standard regression:

$$\Phi' = (X_{t-1} X_{t-1}')^{-1} X_{t-1} X_t'$$

where $X_t$ and $X_{t-1}$ are $N \times (T-1)$ matrices with the time vectors $x_2, \ldots, x_T$ and $x_1, \ldots, x_{T-1}$ as columns
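A minimal numpy sketch of this regression step; the function name `estimate_phi` and the one-column-per-time-point layout are assumptions for illustration:

```python
import numpy as np

def estimate_phi(X):
    """Least-squares estimate of Phi in x_t = Phi x_{t-1} + w_t (mu = 0).

    X: (N, T) array of expression levels, one column per time point.
    Implements Phi' = (X_{t-1} X_{t-1}')^{-1} X_{t-1} X_t' from the slide.
    """
    X_prev = X[:, :-1]  # columns x_1, ..., x_{T-1}
    X_next = X[:, 1:]   # columns x_2, ..., x_T
    # Solve the normal equations rather than forming the inverse explicitly
    Phi_t = np.linalg.solve(X_prev @ X_prev.T, X_prev @ X_next.T)
    return Phi_t.T      # transpose Phi' to obtain Phi itself
```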

SLIDE 7

Estimating matrix for APRR family

Estimation by standard (least squares) regression:

        APRR9   APRR7   APRR5   APRR3   TOC1
APRR9   −0.59   −0.06    0.78    0.39   0.48
APRR7    0.56    0.35    0.34    0.29   0.21
APRR5   −0.80    0.15   −0.26    0.46   0.43
APRR3   −0.34   −0.94   −0.12   −0.13   0.05
TOC1    −0.11   −0.05    0.66    0.46   0.30

• Problem: each gene is connected to every other gene
• One could test for significance of the nonzero parameters, but this raises the usual problems of significance tests: significance levels, multiple testing, …

SLIDE 8

Bayesian models are simple

[Plot: data probability over data space for a simple and a complex model]

• Automatic complexity control, Occam's razor: a complex model covers many data sets, so each gets small probability; a simple model covers few data sets, so each gets large probability [MacKay, Neal]
• Automatic relevance determination: assume a Gaussian distribution for each matrix entry $a_{ij}$ with variance $\sigma^2_{ij}$ as a free parameter, integrate out the $a_{ij}$, and maximize $P(D \mid \text{model}, \{\sigma^2_{ij}\})$ [RVMs, Tipping]

SLIDE 9

Linear regression framework

$$t = \Phi w + \epsilon$$

Probability of the data given the parameters (likelihood):

$$p(t \mid w, \sigma^2) = \frac{1}{(2\pi)^{N/2} \sigma^N} \exp\left(-\frac{|t - \Phi w|^2}{2\sigma^2}\right)$$

Gaussian prior on the coefficients (weights) $w$:

$$p(w \mid \alpha) = (2\pi)^{-M/2} \prod_{m=1}^{M} \alpha_m^{1/2} \exp\left(-\frac{\alpha_m w_m^2}{2}\right)$$

where $\alpha_m$ is the precision (the inverse variance $1/\sigma_m^2$)
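A small numpy sketch of these two densities on the log scale (function names are illustrative):

```python
import numpy as np

def log_likelihood(t, Phi, w, sigma2):
    """log p(t | w, sigma^2) for the model t = Phi w + eps, eps ~ N(0, sigma^2 I)."""
    N = len(t)
    r = t - Phi @ w                 # residual vector
    return -0.5 * (N * np.log(2 * np.pi * sigma2) + r @ r / sigma2)

def log_prior(w, alpha):
    """log p(w | alpha): independent zero-mean Gaussians with precisions alpha_m."""
    return 0.5 * np.sum(np.log(alpha / (2 * np.pi)) - alpha * w**2)
```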

SLIDE 10

Maximum likelihood type II

Integrating out w:

$$p(t \mid \alpha, \sigma^2) = \frac{1}{(2\pi)^{N/2} |C|^{1/2}} \exp\left(-\tfrac{1}{2}\, t' C^{-1} t\right), \qquad C = \sigma^2 I + \Phi A^{-1} \Phi', \quad A = \mathrm{diag}(\alpha_1, \ldots, \alpha_M)$$

• Maximum likelihood estimation of the hyperparameters $\alpha$ by maximizing $p(t \mid \alpha, \sigma^2)$ (type II ML) brings Occam's razor to bear
• Tipping et al. suggest analytical solutions for iterative optimization, optimizing each $\alpha_i$ in turn
• Direct maximization, eg by conjugate gradients, seems to be at least as efficient
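A minimal numpy/scipy sketch of type II ML; it optimizes all hyperparameters jointly by conjugate gradients with numerical gradients, rather than the per-$\alpha_i$ analytical updates of Tipping et al., and the function names are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_marginal(log_params, Phi, t):
    """-log p(t | alpha, sigma^2) with C = sigma^2 I + Phi A^{-1} Phi',
    A = diag(alpha). Hyperparameters live on the log scale to stay positive."""
    N, M = Phi.shape
    alpha = np.exp(log_params[:M])                   # precisions alpha_1..alpha_M
    sigma2 = np.exp(log_params[M])                   # noise variance
    C = sigma2 * np.eye(N) + (Phi / alpha) @ Phi.T   # Phi A^{-1} Phi' via broadcasting
    _, logdet = np.linalg.slogdet(C)
    return 0.5 * (logdet + t @ np.linalg.solve(C, t) + N * np.log(2 * np.pi))

def fit_type2_ml(Phi, t):
    """Maximize the marginal likelihood over (alpha, sigma^2)."""
    res = minimize(neg_log_marginal, np.zeros(Phi.shape[1] + 1),
                   args=(Phi, t), method="CG")
    return np.exp(res.x[:-1]), np.exp(res.x[-1])     # alphas, sigma^2
```

As individual $\alpha_m$ grow very large during the optimization, the corresponding weights are pinned to zero, which is what produces the sparse estimates on the next slide.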

SLIDE 11

Sparse Bayesian estimates for APRR net

        APRR9   APRR7   APRR5   APRR3   TOC1
APRR9   −0.11    0.27   −0.90   −0.01
APRR7    0.00    0.28    0.00   −0.80
APRR5    0.28    0.39    0.00    0.00
APRR3    0.00    0.41    0.59    0.00
TOC1     0.00    0.37    0.52    0.00

Far fewer nonzero entries than in standard regression!

SLIDE 12

Reconstruction of APRR traces

[Two plots: measured and reconstructed log expression levels vs. time/hrs (26 to 74) for APRR9, APRR7, APRR5, APRR3, TOC1/APRR1]

Running the estimated dynamics from the initial conditions with zero process noise: good agreement

SLIDE 13

Sparse Bayesian estimates for LHY/TOC1 net

        LHY     TOC1    GI      PIF3
LHY      0.66    0.80   −0.78    0.00
TOC1    −0.34   −0.19    0.58   −0.10
GI       0.00   −0.87    0.65    0.00
PIF3     0.00    0.00    0.22   −0.14

• LHY is in negative feedback with TOC1
• A second negative feedback loop involves GI
• PIF3 just added for good measure

SLIDE 14

Reconstruction of LHY/TOC1 traces

[Two plots: measured and reconstructed log expression levels vs. time/hrs (26 to 74) for LHY, TOC1/APRR1, GI, PIF3]

Running the estimated dynamics from the initial conditions with zero process noise

SLIDE 15

Nonlinear dependencies

[3D plot: level of gene C as a nonlinear function of the levels of genes A and B]

• So far we assumed linear dependencies of the level of gene A on other gene levels
• Genes often operate as switches and complex gates with nonlinear interactions (eg exclusive or)
• Need to go beyond linear models: Gaussian processes (GP)

SLIDE 16

Gaussian process

• Input values: $d$-dimensional $x = (x_1, \ldots, x_N)$, $x_i \in \mathbb{R}^d$
• Target values: $t = (t_1, \ldots, t_N)$, $t_i \in \mathbb{R}$
• The joint distribution of the outputs $t$ is multivariate Gaussian $N(0, K)$
• Covariance matrix $K$:

$$K_{pq} = \beta_0 + C_L(x_p, x_q) + C_G(x_p, x_q) + \sigma^2_\epsilon I(p = q)$$

• $\beta_0$: overall constant
• $\sigma^2_\epsilon$: noise term along the diagonal of $K$
• $I(\cdot)$: indicator function

SLIDE 17

Covariance components

Linear covariance part

$$C_L(x_p, x_q) = x_p' B^{-1} x_q$$

with linear relevance parameters $B = \mathrm{diag}(\beta_1, \ldots, \beta_d)$

Squared exponential (Gaussian) covariance part

$$C_G(x_p, x_q) = \alpha_0 \exp\left(-\tfrac{1}{2}(x_p - x_q)' A^{-1} (x_p - x_q)\right)$$

with nonlinear relevance parameters $A = \mathrm{diag}(\alpha_1, \ldots, \alpha_d)$ and scale parameter $\alpha_0$
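A sketch of the full covariance matrix $K$ from the last two slides in numpy; `gp_covariance` is an illustrative name:

```python
import numpy as np

def gp_covariance(X, beta0, beta, alpha0, alpha, sigma2_eps):
    """K_pq = beta0 + C_L(x_p, x_q) + C_G(x_p, x_q) + sigma^2_eps I(p = q).

    X: (N, d) inputs; beta, alpha: (d,) relevance parameters."""
    # Linear part: x_p' B^{-1} x_q with B = diag(beta)
    CL = (X / beta) @ X.T
    # Squared exponential part with A = diag(alpha) and scale alpha0
    diff = X[:, None, :] - X[None, :, :]                          # (N, N, d)
    CG = alpha0 * np.exp(-0.5 * np.sum(diff**2 / alpha, axis=-1))
    return beta0 + CL + CG + sigma2_eps * np.eye(len(X))
```

With diagonal $B$ and $A$, every input dimension carries its own linear and nonlinear relevance parameter, which is what the application slides below report.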

SLIDE 18

Compare with linear regression

Compare the linear covariance part plus noise:

$$C_L(x_p, x_q) + \sigma^2_\epsilon I = x_p' B^{-1} x_q + \sigma^2_\epsilon I$$

with the covariance matrix of a linear regression with the weights integrated out (see above):

$$C = \Phi A^{-1} \Phi' + \sigma^2_\epsilon I$$

The two are the same if

$$B = \mathrm{diag}(\alpha_1, \ldots, \alpha_p) = \mathrm{diag}(1/\sigma_1^2, \ldots, 1/\sigma_p^2)$$

and the rows of $\Phi$ are the input vectors $x_i$

SLIDE 19

Training of GP

Covariance parameters $\theta_{MAP}$ maximizing the posterior probability:

$$P(\theta \mid t, x) \propto P(t \mid x, \theta) P(\theta)$$

with

$$\log P(t \mid x, \theta) = -\tfrac{1}{2}\, t' K(x, \theta)^{-1} t - \tfrac{1}{2} \log|K(x, \theta)| - \tfrac{n}{2} \log 2\pi$$

Lognormal prior $P(\theta)$ with fixed $a$ and $b$:

$$\log \theta \sim N(a, b)$$

Optimization with conjugate gradients (using derivatives)
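A sketch of the MAP objective, reusing the hypothetical `gp_covariance` above; scipy's conjugate-gradient routine stands in for the derivative-based optimizer mentioned on the slide:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_posterior(log_theta, X, t, a, b, cov_fn):
    """-log P(theta | t, x) up to a constant: GP marginal likelihood
    plus the lognormal prior, i.e. log theta ~ N(a, b^2) elementwise."""
    # cov_fn is assumed to unpack theta into (beta0, beta, alpha0, alpha, sigma2_eps)
    K = cov_fn(X, np.exp(log_theta))
    n = len(t)
    _, logdet = np.linalg.slogdet(K)
    nll = 0.5 * (t @ np.linalg.solve(K, t) + logdet + n * np.log(2 * np.pi))
    nlp = np.sum((log_theta - a)**2) / (2 * b**2)    # prior on log theta
    return nll + nlp

# theta_MAP via e.g.:
# res = minimize(neg_log_posterior, theta0, args=(X, t, a, b, cov_fn), method="CG")
```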

SLIDE 20

Conditional mean and variance

For a new input point $x_*$, extend the covariance matrix:

$$\tilde{K} = \begin{pmatrix} K & k(x_*) \\ k(x_*)' & k(x_*, x_*) \end{pmatrix}$$

where

$$k(x_*) = \big(\beta_0 + C_L(x_*, x_q) + C_G(x_*, x_q)\big)_{q=1}^{N}$$

$$k(x_*, x_*) = \beta_0 + x_*' B^{-1} x_* + \alpha_0 + \sigma^2_\epsilon$$

Then $f(x_*)$ is Gaussian $N(\mu(x_*), \sigma^2(x_*))$ with

$$\mu(x_*) = k(x_*)' K^{-1} t, \qquad \sigma^2(x_*) = k(x_*, x_*) - k(x_*)' K^{-1} k(x_*)$$
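The predictive equations in a few lines of numpy; `gp_predict` is an illustrative name that takes the covariance pieces as precomputed arguments:

```python
import numpy as np

def gp_predict(K, k_star, k_star_star, t):
    """mu(x*) = k(x*)' K^{-1} t and
    sigma^2(x*) = k(x*, x*) - k(x*)' K^{-1} k(x*)."""
    v = np.linalg.solve(K, k_star)          # K^{-1} k(x*); K is symmetric
    return v @ t, k_star_star - k_star @ v  # predictive mean and variance
```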

SLIDE 21

GP on simulated static data

[3D scatter plot: 30 data points over inputs x, y, z]

Relevance parameters for $x_1, x_2, x_3$: nonlinear 0.21 ($x_1$), linear 0.35 ($x_2$); estimated sd 0.92

30 data points with $f(x_1, x_2, x_3) = 5 \sin(0.7 x_1) + 0.5 x_2 + \epsilon$ where $\epsilon \sim N(0, 1)$
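A sketch of how such a data set could be generated; the uniform input range is an assumption, as the slide gives only the target function:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-10, 10, size=(30, 3))          # 30 points, 3 inputs
t = 5 * np.sin(0.7 * X[:, 0]) + 0.5 * X[:, 1] \
    + rng.normal(0, 1, size=30)                 # eps ~ N(0, 1)
# x_3 (X[:, 2]) is irrelevant: its relevance parameters should drop to ~0
```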

SLIDE 22

GP on simulated time-series data

[Plot: trajectories of the 3 variables over time]

Artificial network of 3 variables connected by nonlinear relationships

$$x_{t+1} = 0.35\, x_t + 5 \sin(0.3\, y_t) + \epsilon_1$$
$$y_{t+1} = 0.4\, y_t + 5 \cos(0.3\, z_t) + \epsilon_2$$
$$z_{t+1} = 0.4\, z_t + 0.1\, y_t^2 - 2 + \epsilon_3$$

Stable cycling is easy to achieve with nonlinear networks
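A minimal simulation of this artificial network; the noise scale and zero initial conditions are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 20                                   # number of time steps, as in the plot
x, y, z = np.zeros(T), np.zeros(T), np.zeros(T)
for t in range(T - 1):
    e1, e2, e3 = rng.normal(0, 0.5, 3)   # assumed noise scale
    x[t + 1] = 0.35 * x[t] + 5 * np.sin(0.3 * y[t]) + e1
    y[t + 1] = 0.4 * y[t] + 5 * np.cos(0.3 * z[t]) + e2
    z[t + 1] = 0.4 * z[t] + 0.1 * y[t]**2 - 2 + e3
```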

SLIDE 23

GP on time-series

[Two 3D plots: fitted GP surface for variable 1 over its inputs]

Variable 1: the linear and nonlinear relevance parameters for input 3 are both 0

SLIDE 24

GP on time-series

[Two 3D plots: fitted GP surface for variable 2 over its inputs]

Variable 2: the linear and nonlinear relevance parameters for input 1 are both 0

SLIDE 25

GP on time-series

[Two 3D plots: fitted GP surface for variable 3 over its inputs]

Variable 3: the linear and nonlinear relevance parameters for input 1 are both 0

SLIDE 26

Gene network: LHY dependency

[3D plot: fitted dependency of LHY on TOC1 and GI]

• Nonlinear relevance: 0.01, 0.01, 0.73, 0.01
• Linear relevance: 0.81, 1.13, 0.45, 0.00
• Estimated sd: 0.18
• No dependency of LHY on PIF3
• Nonlinear dependency of LHY on TOC1 and GI (inputs LHY and PIF3 set to 0 for the plot)

SLIDE 27

Gene network: GI dependency

[3D plot: fitted dependency of GI on TOC1 and on itself]

• Nonlinear relevance: 0.01, 0.00, 0.78, 0.00
• Linear relevance: 0.00, 0.82, 0.17, 0.00
• Estimated sd: 0.30
• No dependency of GI on LHY or PIF3
• Linear (negative) dependency of GI on TOC1, nonlinear (positive) dependency of GI on itself

SLIDE 28

Light input pathway

[Plot: expression levels vs. time/hrs (26 to 74) for LHY, TOC1, GI, PHYA, PHYB, PHYC, PHYD, PHYE, CRY1, CRY2]

• Entrainment of the 24h rhythm via light input
• Phytochromes (phy): red, IR; cryptochromes (cry): blue, UV
• Cycling even under constant light conditions (Cry2, PhyA, PhyB)
• Bidirectional links from the central clock?

SLIDE 29

Light input pathway

[Network diagram: TOC1, LHY, GI, PHYA, CRY1, CRY2, PHYE, PHYD, PHYC, PHYB]

Chain of Phy and Cry regulation

SLIDE 30

Light input and PRR pathway

[Network diagram: GI, LHY, PRR7, TOC1, CRY1, PHYC, PRR9, ELF3, PRR5, TIC, LUX, ELF4, CRY2, PHYD, PHYE, PHYA, PRR3, PHYB]

SLIDE 31

State space model

$$x_t = f(x_{t-1}) + \epsilon_1$$
$$y_t = C x_t + \epsilon_2$$

• If the vector $y$ represents the observable variables (genes), use $C = (0, I)$
• $f(x) = (f_1(x), \ldots, f_d(x))$ is a vector of $d$ parallel GPs, each trained independently
• Extended Kalman filtering with GPs: modify the predictive mean and variance
• Iterate with ML type II estimation of the relevance parameters: an ARD-EM algorithm for GPs

SLIDE 32

Extended Kalman filter

$$P(x_i) = N(x_i \mid x_p, V_p), \qquad x_p = \tilde{\mu}(m_{i-1}, P_{i-1}), \quad V_p = \Sigma(m_{i-1}, P_{i-1}) + Q$$

$$P(x_i \mid t_i) = N(x_i \mid m_i, P_i), \qquad m_i = x_p + K(t_i - C x_p), \quad P_i = (I - KC) V_p$$

$$K = V_p C' (C V_p C' + R)^{-1}$$

Need to calculate the mean $\tilde{\mu}(u, S)$ and covariance $V_p = \Sigma(u, S)$ of the parallel GPs for an uncertain input $u \sim N(u, S)$ (similar to J. Quiñonero-Candela, A. Girard, and C. E. Rasmussen, 2003)
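A minimal numpy sketch of the measurement-update step above; the prediction step needs the GP moments $\tilde{\mu}$ and $\Sigma$ discussed on the next slide, and `ekf_update` is an illustrative name:

```python
import numpy as np

def ekf_update(x_p, V_p, t_i, C, R):
    """Kalman measurement update:
    m_i = x_p + K (t_i - C x_p),  P_i = (I - K C) V_p,
    with gain K = V_p C' (C V_p C' + R)^{-1}."""
    S = C @ V_p @ C.T + R               # innovation covariance
    K = np.linalg.solve(S, C @ V_p).T   # K = V_p C' S^{-1} (S, V_p symmetric)
    m_i = x_p + K @ (t_i - C @ x_p)
    P_i = (np.eye(len(x_p)) - K @ C) @ V_p
    return m_i, P_i
```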

SLIDE 33

Uncertain input for parallel GPs

• With the covariances $C_G$ and $C_L$, the mean and covariance are exact; eg

$$\int C_G(x_*, x_j)\, p_G(x_* \mid u, S)\, dx_*$$

is a combination of two Gaussians

• The variance of $f(x_*)$ is

$$E_{x_*}\big(\Sigma(x_*)\big) + \mathrm{var}_{x_*}\big(\tilde{\mu}(x_*)\big)$$

• $\Sigma(x_*)$ is composed of the covariances of each GP
• $\mathrm{var}_{x_*}(\tilde{\mu}(x_*))$ involves covariances across GPs

(solution along the lines of Quiñonero-Candela et al.)
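The slide computes these moments in closed form; as a sanity check, the same decomposition can be approximated by Monte Carlo (a sketch with illustrative names, returning only the per-GP variances rather than the full cross-GP covariance):

```python
import numpy as np

def predictive_moments_mc(mu_fn, var_fn, u, S, n=10000, seed=0):
    """Monte Carlo estimate of E[f(x*)] and of
    var f(x*) = E_{x*}[Sigma(x*)] + var_{x*}[mu(x*)] for x* ~ N(u, S)."""
    xs = np.random.default_rng(seed).multivariate_normal(u, S, size=n)
    mus = np.array([mu_fn(x) for x in xs])   # per-sample GP means
    vs = np.array([var_fn(x) for x in xs])   # per-sample GP variances
    return mus.mean(0), vs.mean(0) + mus.var(0)
```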

SLIDE 34

Reconstruction of hidden variable

[Two plots: trajectories of the 3 variables over time; the left panel shows the GP-EM reconstruction]

The 3rd variable (green) is treated as a hidden variable in the GP-EM reconstruction on the left-hand side

SLIDE 35

Conclusion

• Complexity control (Occam's razor) by Bayesian estimation of hyperparameters
• MAP estimation of hyperparameters (maximum likelihood type II) works fine
• Gaussian processes integrate linear and nonlinear components
• Downside: the setting of the prior parameters ($a$ and $b$ above) is critical, particularly the noise parameter in the case of noisy data
• GP-EM is possible but tricky due to the presence of many local optima
