Simulation of molecular regulatory networks with graphical models - - PowerPoint PPT Presentation

simulation of molecular regulatory networks with
SMART_READER_LITE
LIVE PREVIEW

Simulation of molecular regulatory networks with graphical models - - PowerPoint PPT Presentation

Simulation of molecular regulatory networks with graphical models Inma Tur 1 Robert Castelo 1 Alberto Roverato 2 inma.tur@upf.edu robert.castelo@upf.edu alberto.roverato@unibo.it 1 Universitat Pompeu Fabra, Barcelona, Spain 2 Universit` a di


slide-1
SLIDE 1

Simulation of molecular regulatory networks with graphical models

Inma Tur1

inma.tur@upf.edu

Alberto Roverato2

alberto.roverato@unibo.it

Robert Castelo1

robert.castelo@upf.edu

1Universitat Pompeu Fabra, Barcelona, Spain 2Universit`

a di Bologna, Bologna, Italy

useR! 2013 - UCLM, Albacete, Spain - July 10-12 2013

Inma Tur, Alberto Roverato, Robert Castelo Simulation of molecular regulatory networks with graphical models 1 / 27

slide-2
SLIDE 2

Motivation - Genomics data

High-throughput genomics technologies produce high-dimensional (p ≫ n) multivariate data sets of continuous and discrete random variables. Gene expression data Genetical genomics data Network of molecular regulatory interactions

Inma Tur, Alberto Roverato, Robert Castelo Simulation of molecular regulatory networks with graphical models 2 / 27

slide-3
SLIDE 3

Motivation - Graphical Markov Models (GMM)

Gaussian GMMs XV ∼ Np (µ, Σ)

1 2 3 4

> library(qpgraph) > set.seed(12345) > gmm <- rUGgmm(dRegularGraphParam()) > round(solve(gmm$sigma), digits=1) 1 2 3 4 1 9.5 -3.4 -7.2 0.0 2 -3.4 5.9 0.0 -2.3 3 -7.2 0.0 8.2 0.9 4 0.0 -2.3 0.9 2.3

Homogeneous Mixed GMMs XV ∼ Np (µ(i), Σ(i)) with Σ(i) ≡ Σ

I1 Y1 Y2 Y3

> library(qpgraph) > set.seed(12345) > gmm <- rHMgmm(dRegularMarkedGraphParam()) > round(solve(gmm$sigma), digits=1) Y1 Y2 Y3 Y1 11.0 0.0 -7.2 Y2 0.0 1.2 -1.6 Y3 -7.2 -1.6 8.2 > gmm$mean() Y1 Y2 Y3 1 0.4720734 0.9669291 0.7242007 2 1.4720734 1.9669291 1.7934027

Inma Tur, Alberto Roverato, Robert Castelo Simulation of molecular regulatory networks with graphical models 3 / 27

slide-4
SLIDE 4

Motivation - Graphical Markov Model (GMM)

Testing GMM estimation procedures on simulated data is a fundamental step to verify properties such as correctness or asymptotic behavior. The R/Bioconductor package qpgraph implements algorithms to simulate Gaussian GMMs, homogeneous mixed GMMs and data from them. Simulating Gaussian GMMs requires simulating covariance matrices whose inverse matches a zero-pattern defined by missing edges in a graph. Simulating homogeneous mixed GMMs requires simulating conditional covariance matrices whose inverse matches a zero-pattern defined by a graph, and conditional mean vectors satisfying additive effects. The rest of this talk is based on the vignette entitled ” Simulating molecular regulatory networks using qpgraph”from the qpgraph package.

Inma Tur, Alberto Roverato, Robert Castelo Simulation of molecular regulatory networks with graphical models 4 / 27

slide-5
SLIDE 5

Outline

1

Simulation of graphs for GMMs

2

Simulation of undirected Gaussian Graphical Markov Models

3

Simulation of Homogeneous Mixed Graphical Markov Models

4

Simulation of eQTL models of experimental crosses

5

Conclusions

Inma Tur, Alberto Roverato, Robert Castelo Simulation of molecular regulatory networks with graphical models 5 / 27

slide-6
SLIDE 6

Outline

1

Simulation of graphs for GMMs

2

Simulation of undirected Gaussian Graphical Markov Models

3

Simulation of Homogeneous Mixed Graphical Markov Models

4

Simulation of eQTL models of experimental crosses

5

Conclusions

Inma Tur, Alberto Roverato, Robert Castelo Simulation of molecular regulatory networks with graphical models 6 / 27

slide-7
SLIDE 7

Simulation of graphs for GMMs

Consider undirected labeled graphs G = (V , E) where V = {1, . . . , p} is the vertex set indexing a vector of random variables (r.v.’s) XV = {X1, . . . , Xp} that belong to some probability distribution P. E ⊆ (V × V ) is the edge set, such that (i, j) ∈ E ⇒ Xi⊥ ⊥Xj|XV \{Xi, Xj} , holds in P (pairwise Markov property w.r.t G). Let {A, B, S} ⊆ V , S separates A from B in G (A⊥GB|S) if every path between A and B intersects S. G should be such that A⊥GB|S ⇒ XA⊥ ⊥XB|XS, holds in P (global Markov property w.r.t G). In the context of GMMs we want to simulate graphs in which we can control separation and sparseness.

Inma Tur, Alberto Roverato, Robert Castelo Simulation of molecular regulatory networks with graphical models 7 / 27

slide-8
SLIDE 8

Simulation of graphs for GMMs

qpgraph simulates undirected graphs according to

1

the type of graph pure single type of vertices marked two subsets of vertices (associated to discrete and continuous random variables)

2

the model to simulate the random graph Erd¨

  • s-R´

enyi edges occur with equal probability, or graphs are chosen uniformly at random with given number of vertices & edges. d-regular vertices have a constant degree d (Harary, 1969). It follows that graph density is a linear function of d: D = d/(p − 1). This is implemented in the function rgraphBAM(n, param, ...) which returns

  • bjects of the class graphBAM defined in the graph package.

Inma Tur, Alberto Roverato, Robert Castelo Simulation of molecular regulatory networks with graphical models 8 / 27

slide-9
SLIDE 9

Simulation of graphs for GMMs

Input parameters in rgraphBAM() are defined by S4 classes

graphParam

p labels

markedGraphParam

pI Ilabels pY Ylabels

erGraphParam

m [# edges] prob [Pr(edge)]

erMarkedGraphParam dRegularMarkedGraphParam dRegularGraphParam

d [vertex degree] exclude

Inma Tur, Alberto Roverato, Robert Castelo Simulation of molecular regulatory networks with graphical models 9 / 27

slide-10
SLIDE 10

Simulation of graphs for GMMs

d-regular graphs are simulated with the Steger and Wormald (1999) algorithm

> library(qpgraph) > set.seed(1234) > g <- rgraphBAM(dRegularMarkedGraphParam(pI=2, pY=10, d=3)) > plot(g, lwd=3)

I1 I2 Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8 Y9 Y10

plot() is overloaded to use the functionality from the Rgraphviz package.

Inma Tur, Alberto Roverato, Robert Castelo Simulation of molecular regulatory networks with graphical models 10 / 27

slide-11
SLIDE 11

Outline

1

Simulation of graphs for GMMs

2

Simulation of undirected Gaussian Graphical Markov Models

3

Simulation of Homogeneous Mixed Graphical Markov Models

4

Simulation of eQTL models of experimental crosses

5

Conclusions

Inma Tur, Alberto Roverato, Robert Castelo Simulation of molecular regulatory networks with graphical models 11 / 27

slide-12
SLIDE 12

Simulation of undirected Gaussian GMMs

Undirected Gaussian GMMs are multivariate models on continuous r.v.’s XV = {X1, . . . , Xp} determined by an undirected graph G = (V , E) with V = {1, . . . , p} and E ⊆ V × V such that XV ∼ Np (µ, Σ) where {Σ−1}ij = 0 for i = j and (i, j) ∈ G Therefore, to simulate an undirected Gaussian GMM we need to build a matrix Σ such that

1

Σ is positive definite (Σ ∈ S +),

2

the off-diagonal cells of the scaled Σ corresponding to the present edges in G match a given marginal correlation ρ,

3

the zero pattern of Σ−1 matches the missing edges in G. This is not straightforward since setting directly off-diagonal cells to zero in some initial Γ ∈ S + will not typically lead to a positive definite matrix.

Inma Tur, Alberto Roverato, Robert Castelo Simulation of molecular regulatory networks with graphical models 12 / 27

slide-13
SLIDE 13

Simulation of undirected Gaussian GMMs

Let ΓG be an incomplete matrix with elements {γij} for i = j or (i, j) ∈ G.

1 2 3 4

ΓG =     γ11 γ12 γ13 ∗ γ21 γ22 ∗ γ24 γ31 ∗ γ33 γ34 ∗ γ42 γ43 γ44     Γ is a positive completion of ΓG if Γ∈S + and {Γ−1}ij =0 for i =j, (i, j)∈G. Draw ΓG from a Wishart distributionWp(Λ, p); Λ=∆R∆, ∆=diag({

  • 1/p}p)

and R = {Rij}p×p where Rij = 1 for i = j and Rij = ρ for i = j. It is required that Λ ∈ S + and this happens if and only if −1/(p − 1) < ρ < 1. Finally, to obtain Σ ≡ Γ from ΓG, qpgraph uses the regression algorithm by Hastie, Tibshirani and Friedman (2009, pg. 634) as matrix completion algorithm. See functions qpRndWishart(), qpHTF() and qpG2Sigma() for further details.

Inma Tur, Alberto Roverato, Robert Castelo Simulation of molecular regulatory networks with graphical models 13 / 27

slide-14
SLIDE 14

Simulation of undirected Gaussian GMMs

rUGgmm() simulates undirected Gaussian GMMs taking as input either a

graphParam or a graphBAM object. It returns an S4 object of class UGgmm.

> set.seed(12345) > gmm <- rUGgmm(n=1, g=dRegularGraphParam(p=4, d=2), rho=0.75) > class(gmm) [1] "UGgmm" attr(,"package") [1] "qpgraph" > names(gmm) ## it behaves pretty much like a 'list' object [1] "X" "p" "g" "mean" "sigma" > round(solve(gmm$sigma), digits=1) 1 2 3 4 1 20.9 -6.6 -13.6 0.0 2

  • 6.6 10.6

0.0 -4.3 3 -13.6 0.0 12.7 0.9 4 0.0 -4.3 0.9 4.3 > gmm2 <- rUGgmm(n=1, g=gmm$g) ## fix 'g' simulate covariance only > round(solve(gmm2$sigma), digits=1) 1 2 3 4 1 7.2 -1.3 -3.2 0.0 2 -1.3 1.2 0.0 -0.5 3 -3.2 0.0 3.4 -0.5 4 0.0 -0.5 -0.5 1.5

1 2 3 4

Inma Tur, Alberto Roverato, Robert Castelo Simulation of molecular regulatory networks with graphical models 14 / 27

slide-15
SLIDE 15

Simulation of undirected Gaussian GMMs

rmvnorm() from the mvtnorm package is overloaded in qpgraph to simulate

multivariate normal observations from the undirected GMM:

> set.seed(123) > X <- rmvnorm(n=10000, gmm) > head(X, n=3) 1 2 3 4 [1,] 0.4212627 0.28868036 0.8864354 0.22768491 [2,] 0.7696843 0.98995123 0.8572206 -0.09639081 [3,] 0.1257251 0.04640156 0.5221513 0.23059947 > round(solve(cov(X)), digits=1) 1 2 3 4 1 21.2 -6.7 -13.7 0.0 2

  • 6.7 10.6

0.0 -4.3 3 -13.7 0.0 12.7 0.9 4 0.0 -4.3 0.9 4.2 > mean(cor(X)[upper.tri(cor(X)) & as(gmm$g, "matrix")]) [1] 0.8765055

As the number of present edges grow, their mean marginal correlation approaches the given one.

Inma Tur, Alberto Roverato, Robert Castelo Simulation of molecular regulatory networks with graphical models 15 / 27

slide-16
SLIDE 16

Outline

1

Simulation of graphs for GMMs

2

Simulation of undirected Gaussian Graphical Markov Models

3

Simulation of Homogeneous Mixed Graphical Markov Models

4

Simulation of eQTL models of experimental crosses

5

Conclusions

Inma Tur, Alberto Roverato, Robert Castelo Simulation of molecular regulatory networks with graphical models 16 / 27

slide-17
SLIDE 17

Simulation of Homogeneous Mixed GMMs

Simplifying assumptions (in the context of genetical genomics data):

1

Discrete genotypes affect gene expression and not the other way around.

2

Joint distribution of X is a conditional Gaussian distribution XV ∼ NpY (µ(i), Σ(i)) with i ∈ I.

3

Genotype alleles affect only mean expression levels

  • f genes and not the correlations between them,

i.e., Σ(i) ≡ Σ is constant throughout i ∈ I.

4

Currently, discrete r.v.’s are simulated as being marginally independent between them.

5

Currently, every continuous r.v. cannot depend on more than one discrete r.v., where µ(i) = Σ × h(i) and h(i) = {h1(i), . . . , hp(i)} contain the mixed linear parameters that define the additive effects. I1 Y1 Y2 Y3

Inma Tur, Alberto Roverato, Robert Castelo Simulation of molecular regulatory networks with graphical models 17 / 27

slide-18
SLIDE 18

Simulation of Homogeneous Mixed GMMs

rHMgmm() simulates homogeneous mixed GMMs taking as input either a

markedGraphParam or graphBAM object. It returns an S4 object of class

  • HMgmm. Here we simulate one involving 250 joint discrete levels:

> set.seed(12345) > gmm <- rHMgmm(n=1, dRegularMarkedGraphParam(pI=50, pY=100, d=2), a=2) > names(gmm) [1] "X" "I" "Y" "p" "pI" "pY" "g" "mean" "sigma" [10] "a" "eta2" > print(object.size(gmm), units="Kb") 90.1 Kb

rcmvnorm() simulates data from a homogeneous mixed GMM using rmvnorm()

for the continuous observations (mvtnorm package):

> set.seed(123) > X <- rcmvnorm(n=1000, gmm) > Ymix <- gmm$Y[which(rowSums(as(gmm$g, "matrix")[gmm$Y, gmm$I]) == 1)] > Imix <- gmm$I[apply(as(gmm$g, "matrix")[Ymix, gmm$I] == 1, 1, which)] > smean <- sapply(1:length(Ymix), function(i, Y, I) tapply(Y[, i], I[, i], mean), + X[, Ymix], X[, Imix]) > mean(abs(smean[1, ] - smean[2, ])) ## sample additive effects approach the given one [1] 2.028512

Inma Tur, Alberto Roverato, Robert Castelo Simulation of molecular regulatory networks with graphical models 18 / 27

slide-19
SLIDE 19

Outline

1

Simulation of graphs for GMMs

2

Simulation of undirected Gaussian Graphical Markov Models

3

Simulation of Homogeneous Mixed Graphical Markov Models

4

Simulation of eQTL models of experimental crosses

5

Conclusions

Inma Tur, Alberto Roverato, Robert Castelo Simulation of molecular regulatory networks with graphical models 19 / 27

slide-20
SLIDE 20

Simulation of eQTL models of experimental crosses

S4 class eQTLcross: a genetic map (map class in qtl package) a homogeneous mixed GMM (HMgmm)

reQTLcross simulates n eQTLcross models:

> reQTLcross(n=2, network=eQTLcrossParam()) [[1]] eQTL backcross model with 20 markers, 20 genes, 20 eQTL and 20 gene-gene expression associations. [[2]] eQTL backcross model with 20 markers, 20 genes, 20 eQTL and 20 gene-gene expression associations.

120 100 80 60 40 20 Chromosome Location (cM) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 X Genetic map

Inma Tur, Alberto Roverato, Robert Castelo Simulation of molecular regulatory networks with graphical models 20 / 27

slide-21
SLIDE 21

Simulation of eQTL models of experimental crosses

eQTL model with cis-associations only

> data(map10, package="qtl") ## load example genetic map from the 'qtl' package > (eqtl <- reQTLcross(eQTLcrossParam(map=map10, genes=100))) eQTL backcross model with 187 markers, 100 genes, 100 eQTL and 100 gene-gene expression associations. > eqtl$model Homogeneous mixed graphical Markov model with 100 discrete and 100 continuous r.v., and 200 edges. > plot(eqtl, main="eQTL model with cis-associations only", cex.lab=1.5, cex.main=1.5)

eQTL model with cis−associations only eQTL location Gene location

1 2 3 4 5 6 8 9 11 13 15 17 19 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 X

Inma Tur, Alberto Roverato, Robert Castelo Simulation of molecular regulatory networks with graphical models 21 / 27

slide-22
SLIDE 22

Simulation of eQTL models of experimental crosses

eQTL model with cis and trans-associations

> set.seed(123) > (eqtl <- reQTLcross(eQTLcrossParam(map=map10, genes=100, cis=0.5, trans=c(5,5)), a=5)) eQTL backcross model with 187 markers, 100 genes, 60 eQTL and 100 gene-gene expression associations. > head(ciseQTL(eqtl), n=3) chrom location QTL gene a 1 1 0.00000 QTL1 g1 5 2 1 78.15385 QTL2 g5 5 3 1 97.69231 QTL3 g6 5 > head(transeQTL(eqtl), n=10) chrom location QTL gene a 1 6 65.62500 QTL17 g21 5 2 6 65.62500 QTL17 g40 5 3 6 65.62500 QTL17 g41 5 4 6 65.62500 QTL17 g59 5 5 6 65.62500 QTL17 g97 5 6 7 63.42857 QTL19 g10 5 7 7 63.42857 QTL19 g22 5 8 7 63.42857 QTL19 g34 5 9 7 63.42857 QTL19 g42 5 10 7 63.42857 QTL19 g79 5 > plot(eqtl, cex.lab=1.5, cex.main=1.5, + main="eQTL model with trans-eQTL")

eQTL model with trans−eQTL eQTL location Gene location

1 2 3 4 5 6 8 9 11 13 15 17 19 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 X

Inma Tur, Alberto Roverato, Robert Castelo Simulation of molecular regulatory networks with graphical models 22 / 27

slide-23
SLIDE 23

Simulation of eQTL models of experimental crosses

sim.cross() from package qtl is overloaded to simulate genotypes from

experimental crosses and place eQTL using an input homogeneous mixed GMM:

> set.seed(123) > cross <- sim.cross(map10, eqtl) > eQTLgene <- "g5" ## take gene 'g5' with a cis-eQTL > connectedGenes <- graph::edges(eqtl$model$g)[[eQTLgene]] > (connectedGenes <- connectedGenes[connectedGenes %in% eqtl$model$Y]) [1] "g15" "g22" > out.mr <- qtl::scanone(cross, method="mr", pheno.col=c(eQTLgene, connectedGenes))

20 40 60 80 100 120 5 10 15

Map position (cM) LOD scores

g5 g15 g22 QTL2

g15 g22 g5 QTL2

Inma Tur, Alberto Roverato, Robert Castelo Simulation of molecular regulatory networks with graphical models 23 / 27

slide-24
SLIDE 24

Outline

1

Simulation of graphs for GMMs

2

Simulation of undirected Gaussian Graphical Markov Models

3

Simulation of Homogeneous Mixed Graphical Markov Models

4

Simulation of eQTL models of experimental crosses

5

Conclusions

Inma Tur, Alberto Roverato, Robert Castelo Simulation of molecular regulatory networks with graphical models 24 / 27

slide-25
SLIDE 25

Conclusions

The R/Bioconductor package qpgraph enables simulating Gaussian GMMs, homogeneous mixed GMMs and data from them. The implemented algorithms can simulate GMMs with given marginal correlations and additive effects. This provides flexibility when simulating data to verify correctness or asymptotic behavior of GMM estimation algorithms. The use of S4 classes hides algorithmic complexity to the non-expert user and provides a seamless integration with other packages such as graph, Rgraphviz, mvtnorm and qtl. By overloading the function sim.cross() from qtl, qpgraph allows one to easily simulate genetical genomics data from experimental crosses.

Inma Tur, Alberto Roverato, Robert Castelo Simulation of molecular regulatory networks with graphical models 25 / 27

slide-26
SLIDE 26

Session information

> toLatex(sessionInfo())

R version 3.0.0 (2013-04-03), x86_64-unknown-linux-gnu Locale: LC_CTYPE=en_US.UTF8, LC_NUMERIC=C, LC_TIME=en_US.UTF8, LC_COLLATE=en_US.UTF8, LC_MONETARY=en_US.UTF8, LC_MESSAGES=en_US.UTF8, LC_PAPER=C, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.UTF8, LC_IDENTIFICATION=C Base packages: base, datasets, graphics, grDevices, methods, stats, utils Other packages: colorout 1.0-0, qpgraph 1.16.2, setwidth 1.0-3, vimcom 0.9-8 Loaded via a namespace (and not attached): annotate 1.38.0, AnnotationDbi 1.22.6, Biobase 2.20.1, BiocGenerics 0.6.0, DBI 0.2-7, genefilter 1.42.0, GGBase 3.22.0, graph 1.38.2, grid 3.0.0, IRanges 1.18.1, lattice 0.20-15, Matrix 1.0-12, mvtnorm 0.9-9995, parallel 3.0.0, qtl 1.27-10, Rgraphviz 2.4.1, RSQLite 0.11.4, snpStats 1.10.0, splines 3.0.0, stats4 3.0.0, survival 2.37-4, tools 3.0.0, XML 3.98-1.1, xtable 1.7-1

Inma Tur, Alberto Roverato, Robert Castelo Simulation of molecular regulatory networks with graphical models 26 / 27

slide-27
SLIDE 27

Acknowledgements and advertisements

Funding: FPI predoctoral fellowship [BES-2009-024901] to I. Tur Spanish MINECO project grant [TIN2011-22826] to R. Castelo The qpgraph package is available at http://www.bioconductor.org. Follow news and bugfixes about qpgraph in Twitter at @robertclab.

Thanks for listening!

Inma Tur, Alberto Roverato, Robert Castelo Simulation of molecular regulatory networks with graphical models 27 / 27