
RJaCGH, a package for analysis of CGH arrays with Reversible Jump MCMC

RJaCGH, a package for analysis of CGH arrays with Reversible Jump MCMC. National Spanish Cancer Center. 1.- CGH Arrays: Biological problem: Changes in the number of DNA copies are associated with cancer activity.


  1. RJaCGH, a package for analysis of CGH arrays with Reversible Jump MCMC
     National Spanish Cancer Center
     Oscar Rueda, omrueda@cnio.es
     Ramon Diaz-Uriarte, rdiaz@ligarto.org

     1.- CGH Arrays:
     Biological problem: Changes in the number of DNA copies are associated with cancer activity.
     Microarray technology: Comparative Genomic Hybridization (CGH)
     ● Test DNA sample (Cancer) labeled in red
     ● Reference DNA sample (Control) labeled in green
     ● Samples are hybridized and superimposed.
     ● The intensity of color is measured in log scale
     ● y = log (intensity of Test / intensity of reference)
     http://en.wikipedia.org/wiki/Image:Microarray-schema.gif

     2.- Methods for the Analysis of CGH Arrays:
     Hypothesis testing based:
     • Circular Binary Segmentation (Olshen et al., 2004)
     • CGH-Explorer (Lingjaerde et al., 2004)
     • aCGH-Smooth (Jong et al., 2004)
     • SW-Array (Price et al., 2005)
     • CLAC (Wang et al., 2005)
     • Wavelets (Hsu et al., 2005)

     2.- Methods for the Analysis of CGH Arrays (II):
     Copy number estimation based:
     • Hidden Markov Models (Fridlyand et al., 2003; Guha et al., 2005; Marioni et al., 2006)
     • Quantile smoothing (Eilers et al., 2004)
     • GLAD (Hupé et al., 2004)
     • Picard et al. (2005)
     • CGHMIX (Bröet and Richardson, 2006)
     • Bayes Regression (Wen et al., 2006)
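As a small worked illustration of the log ratio y from slide 1, the sketch below computes the value one would expect in an idealized setting. The assumptions are not from the slides: a base-2 logarithm (as used later in the deck for y_t), a reference sample with two copies, and a test sample with no noise or contamination. The function name `expected_log2_ratio` is hypothetical.

```python
import math


def expected_log2_ratio(test_copies, reference_copies=2):
    """Idealized log2 ratio for a region with `test_copies` DNA copies.

    Assumes intensity is proportional to copy number and that the
    reference sample carries `reference_copies` copies (2 by assumption).
    """
    return math.log2(test_copies / reference_copies)


# No change gives 0, a loss gives a negative value, a gain a positive one.
for copies in (1, 2, 3, 4):
    print(copies, round(expected_log2_ratio(copies), 3))
```

Real arrays add Gaussian-like noise around these values, which is what the mixture and hidden Markov models in the following slides are meant to handle.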

  2. 3.- Drawbacks of the current methods for the Analysis of CGH Arrays:
     • Most of them don't have a biological background.
     • Some of them don't have a statistical model behind them.
     • Some of them have one, but add a post-processing step that invalidates the statistics.
     • Most of them don't take into account the distance between genes.
     • Most of them have a lot of parameters to tune, with no intuitive interpretation.

     4.- RJaCGH. Motivation:
     There is a finite number of different copy gains / losses. -> Finite Mixture Model.
     We don't measure that number directly; instead we observe it with Gaussian noise. -> Finite Mixture Model with Gaussian Distributions.
     The state of every gene influences the state of its neighbours. -> Hidden Markov Model with Gaussian Distributions.
     This influence must be bigger the closer the genes are. -> Non Homogeneous Hidden Markov Model with Gaussian Distributions.
     The model uncertainty must be taken into account. -> NH HMM with Gaussian Distributions with Bayesian Model Averaging: RJaCGH.

     5.- RJaCGH. Main features:
     • Non Homogeneous Hidden Markov Model with unknown number of states.
     • Bayesian Inference through Markov Chain Monte Carlo Simulation.
     • Automatic selection of the number of states through Reversible Jump MCMC.
     • Classification of states takes into account model uncertainty:
       • AIC or BIC are not good methods for choosing the number of hidden states.
       • Not a "purist" Bayesian analysis: the hidden state sequence is obtained via a point estimator of means, variances and transition matrix.
       • Bayesian Model Averaging: $P(S_i = r \mid X_i = x) = \sum_{k} P(K = k)\, P(S_i = r \mid X_i = x, K = k)$

     6.- RJaCGH. The statistical model:
     k = number of different copy numbers
     $s_t$ = true copy number of gene t
     $y_t$ = log2 ratio of gene t
     $x_t$ = distance between gene t and its predecessor
     $y_t \mid s_t = i \sim N(\mu_i, \sigma_i^2)$
     $p(s_t = j \mid s_{t-1} = i, x_t = x) = Q_{i,j,x}$
     $Q_{i,j,x} = \dfrac{\exp(-\beta_{i,j} + \beta_{i,j}\, x)}{\sum_{p=1}^{k} \exp(-\beta_{i,p} + \beta_{i,p}\, x)}$
     $\beta = \begin{pmatrix} 0 & \beta_{1,2} & \cdots & \beta_{1,k} \\ \beta_{2,1} & 0 & \cdots & \beta_{2,k} \\ \vdots & \vdots & \ddots & \vdots \\ \beta_{k,1} & \beta_{k,2} & \cdots & 0 \end{pmatrix}, \quad \beta_{i,j} \ge 0$
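The distance-dependent transition probabilities of slide 6 can be sketched as follows. This is a minimal illustration, assuming the form Q(i, j, x) = exp(-β_ij + β_ij·x) / Σ_p exp(-β_ip + β_ip·x) with a zero diagonal and non-negative off-diagonal β, and assuming that the distance x has been rescaled to lie between 0 and 1 (an assumption, not stated on the slide). The function name `transition_matrix` is hypothetical and is not part of the RJaCGH package.

```python
import math


def transition_matrix(beta, x):
    """Row-normalized transition matrix for a given distance x.

    beta: k x k list of lists, zero diagonal, non-negative elsewhere.
    x:    distance to the previous gene, assumed rescaled to [0, 1].
    """
    k = len(beta)
    q = []
    for i in range(k):
        # Unnormalized weight of moving from state i to each state j.
        weights = [math.exp(-beta[i][j] + beta[i][j] * x) for j in range(k)]
        total = sum(weights)
        q.append([w / total for w in weights])
    return q


beta = [[0.0, 2.0], [1.0, 0.0]]
print(transition_matrix(beta, 0.0))  # adjacent genes: staying in the same state is favoured
print(transition_matrix(beta, 1.0))  # maximal distance: every row becomes uniform
```

Under these assumptions the influence of a neighbour fades with distance: at x = 1 every exponent is zero, so the previous state carries no information about the next one.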

  3. 7.- RJaCGH. The Bayesian model:
     $p(k)$ ≡ Prior over the number of hidden states. By default, it is a uniform distribution.
     $p(\theta_k \mid k)$ ≡ Prior over the HMM conditioned on k:
     $\mu \sim N(\text{mu.alfa}, \text{mu.beta})$. By default, mu.alfa = median(y), mu.beta = range(y).
     $\sigma^2 \sim IG(ka, g)$. By default, ka = 2, g = range²(y) / 50.
     $\beta \sim \Gamma(1, 1)$
     $L(y; k, \theta_k)$ ≡ Likelihood of the model
     $p(k)\, p(\theta_k \mid k)\, L(y; k, \theta_k)$ ≡ Joint distribution

     8.- RJaCGH. The MCMC simulation:
     Each sweep consists of three steps:
     1.- Update model:
     ● In turn, a Metropolis-Hastings step for means, variances and transition matrix.
     ● The hidden state sequence is not part of the state space of the sampler.
     ● The dimensionality of that space is reduced.
     2.- Update number of hidden states: attempt birth / death move.
     3.- Update number of hidden states: attempt split / combine move.

     9.- RJaCGH. The RJ moves:
     Birth move:
     A new state is sampled from the priors and accepted with probability prob.birth = min(1, p), where
     $p = \dfrac{P(k = r+1)\, L(y; r+1, \theta_{r+1})\, P_{death}(r+1)}{P(k = r)\, L(y; r, \theta_r)\, P_{birth}(r)}$
     Split move:
     A state $i_0$ is split into two states $i_1$ and $i_2$, and the split is accepted with probability prob.split = min(1, p).
     $\mu_{i_1} = \mu_{i_0} - \epsilon_\mu$, $\mu_{i_2} = \mu_{i_0} + \epsilon_\mu$, with $\epsilon_\mu \sim N(0, \tau_\mu)$
     $\sigma^2_{i_1} = \sigma^2_{i_0}\, \epsilon_\sigma$, $\sigma^2_{i_2} = \sigma^2_{i_0}\, (1 - \epsilon_\sigma)$, with $\epsilon_\sigma \sim \mathrm{Beta}(2, 2)$
     Split column $i_0$: $\beta_{i,i_1} = \beta_{i,i_0}\, \epsilon_\beta$, $\beta_{i,i_2} = \beta_{i,i_0} / \epsilon_\beta$, with $\epsilon_\beta \sim \ln N(0, \tau_\beta)$, for $i \ne i_0$
     Split row $i_0$: $\beta_{i_1,j} = \beta_{i_0,j}\, U_j$, $\beta_{i_2,j} = \beta_{i_0,j}\, (1 - U_j)$, with $U_j \sim \mathrm{Beta}(2, 2)$, for $j \ne i_0$
     $\beta_{i_1,i_2} \sim \Gamma(1, 1)$
     $p = \dfrac{P(k = r+1)\, P(\theta_{r+1})\, L(y; \theta_{r+1})\, (r+1)}{P(k = r)\, P(\theta_r)\, L(y; \theta_r)} \cdot \dfrac{J_{split}}{2\, p(\epsilon_\mu)\, p(\epsilon_\sigma) \prod P(\epsilon_\beta) \prod P(U_j)}$, where $J_{split}$ is the Jacobian of the split transformation.
     Death and combine moves are the symmetric ones, and their acceptance probabilities are the inverse of the birth and split ones.

     10.- RJaCGH. The package:
     Main function:
     RJaCGH(y, Chrom = NULL, Pos = NULL, model = "genome", burnin = 0, TOT = 1000, k.max = 6, stat = NULL, mu.alfa = NULL, mu.beta = NULL, ka = NULL, g = NULL, prob.k = NULL, jump.parameters = list(), start.k = NULL, RJ = TRUE)
     The object returned can be of several classes:
     ● RJaCGH.array: if y was a matrix or data frame of arrays.
     ● RJaCGH.genome: if we fit the same model to the whole genome.
     ● RJaCGH.Chrom: if we fit a different model to each chromosome.
     ● RJaCGH: a fit to a sequence without chromosome index.

     library(RJaCGH)
     # Proposal tuning parameters (vectors of length 6, matching the default k.max = 6)
     jp <- list(sigma.tau.mu = rep(0.05, 6), sigma.tau.sigma.2 = rep(0.01, 6),
                sigma.tau.beta = rep(0.5, 6), tau.split.mu = 0.5, tau.split.beta = 0.5)
     # Fit the same model to the whole genome
     fit <- RJaCGH(y = gm01523$LogRatio, Pos = gm01523$PosBase,
                   Chrom = gm01523$Chromosome, model = "genome", burnin = 50000,
                   TOT = 100000, jump.parameters = jp)

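The birth move and the mean / variance part of the split move from slide 9 can be sketched numerically as below. This is an illustration under stated assumptions, not the package's implementation: the acceptance ratio is evaluated on the log scale to avoid underflow, τ_μ is treated as a standard deviation (the slide does not say whether it is a variance), and the function names `birth_acceptance` and `split_mean_and_variance` are hypothetical.

```python
import math
import random


def birth_acceptance(log_prior_new, log_lik_new, log_p_death_new,
                     log_prior_old, log_lik_old, log_p_birth_old):
    """min(1, p) for a birth move from r to r + 1 states, with p as on slide 9.

    All arguments are natural logarithms of the corresponding terms.
    """
    log_p = ((log_prior_new + log_lik_new + log_p_death_new)
             - (log_prior_old + log_lik_old + log_p_birth_old))
    # Return 1 directly when p >= 1 so that exp() cannot overflow.
    return 1.0 if log_p >= 0.0 else math.exp(log_p)


def split_mean_and_variance(mu0, var0, tau_mu, rng):
    """Propose the two means and variances that replace state i0."""
    eps_mu = rng.gauss(0.0, tau_mu)      # tau_mu used as a standard deviation (assumption)
    eps_sigma = rng.betavariate(2, 2)    # Beta(2, 2) draw in (0, 1)
    means = (mu0 - eps_mu, mu0 + eps_mu)
    variances = (var0 * eps_sigma, var0 * (1.0 - eps_sigma))
    return means, variances


rng = random.Random(1)
print(birth_acceptance(0.0, -120.0, 0.0, 0.0, -118.0, 0.0))  # new model fits worse: p < 1
print(split_mean_and_variance(0.5, 0.04, 0.5, rng))
```

By construction the two proposed means are symmetric around the original mean and the two proposed variances add up to the original variance; the combine move reverses this mapping.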
  4. 11.- RJaCGH. The package:
     The objects returned are lists.
     If fit has class 'RJaCGH' we can access its elements:
     fit$k : models visited
     fit[[1]] : model with 1 hidden state
     fit[[1]]$mu : means visited by the sampler.
     fit[[1]]$sigma.2 : variances visited by the sampler.
     fit[[1]]$beta : betas visited by the sampler.
     fit[[r]] : model with r hidden states.
     If fit has class 'RJaCGH.genome', it's the same as before.
     If fit has class 'RJaCGH.Chrom' it's a list with sublists as before:
     fit[[1]] : model for the first chromosome
     fit[[1]]$k, fit[[1]][[1]]$mu, etc.
     If fit has class 'RJaCGH.array' it's again a list with sublists:
     fit[[1]] : first array
     fit[[1]][[1]] : first array, first chromosome (if model = "Chrom")

     12.- RJaCGH. The package. Methods:
     Summary:
     summary(fit) -> summary of the fit. Point estimator (mean, median or mode) of means, variances and transition matrix.
     States:
     states(fit) -> sequence of hidden states. Not a part of the model; computed via a point estimator of the means, variances and transition matrix and the backward filtering probabilities. Not computed by Viterbi.
     Model averaging:
     model.averaging(fit) -> sequence of hidden states computed via a call to states for every model fit, weighted by the posterior probability of that number of states.
     Plot:
     plot(fit) -> plot fitted model. Plot a single chromosome, the whole genome, Bayesian model averaging of several arrays, or the region of common gains / losses of several arrays.

     13.- RJaCGH. The package. Examples:
     Data from cell line GM05296 from Snijders et al. (2001).

     13.- RJaCGH. The package. Examples (II):
     96.82% correct classification, but only 14 transdimensional moves.
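The weighting described for model.averaging(fit) on slide 12 can be illustrated for a single gene as follows. This is a sketch of the averaging formula only, not of the package's code: the function name `model_average`, the state labels "Loss" / "Normal" / "Gain" and all numbers are made up for illustration, and it assumes that states from models with different numbers of hidden states have already been mapped to comparable labels.

```python
def model_average(posterior_k, state_probs):
    """Average per-model state probabilities over the number of states.

    posterior_k: {k: posterior probability of the model with k states}
    state_probs: {k: {label: P(state = label | data, K = k)}}
    """
    total = sum(posterior_k.values())  # renormalize in case weights do not sum to 1
    averaged = {}
    for k, weight in posterior_k.items():
        for label, prob in state_probs[k].items():
            averaged[label] = averaged.get(label, 0.0) + (weight / total) * prob
    return averaged


# Hypothetical values for one gene.
posterior_k = {2: 0.25, 3: 0.75}
state_probs = {
    2: {"Normal": 0.6, "Gain": 0.4},
    3: {"Loss": 0.0, "Normal": 0.2, "Gain": 0.8},
}
print(model_average(posterior_k, state_probs))
```

With these made-up inputs the averaged probability of a gain is 0.25 × 0.4 + 0.75 × 0.8 = 0.7, so the call reflects both models rather than only the most probable one.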
