Bayesian Two-way Clustering for Gene Expression Data


  1. Bayesian Two-way Clustering for Gene Expression Data
     Graeme Ambler and Peter Green, University of Bristol, 12 July 2003

     Motivation: obvious potential for Bayesian and empirical Bayes (EB) methods in gene expression analysis. Can they be made to work?
     • BGX project, BBSRC funded, with Sylvia Richardson, Clare Marshall, Alex Lewin and Anne-Mette Hein (Imperial), in collaboration with Helen Causton and Tim Aitman and colleagues (CSC/IC Microarray Centre)
     • Model-based, flexible approach to gene expression analysis

     Gene expression using Affymetrix chips
     [Figure: zoomed image of a hybridised array and a hybridised spot (scale labels 1.28 cm, 20 µm). A single-stranded, labeled RNA sample binds to oligonucleotide elements. Each element holds millions of copies of a specific oligonucleotide sequence, and the array carries approx. ½ million different complementary oligonucleotides. Expressed genes and non-expressed genes are marked. Slide courtesy of Affymetrix.]

     Plan
     • Variation and uncertainty in gene expression
     • Hierarchical models
     • Simultaneous inference
     • Common framework, including clustering
     • Initial experiments with layer models

     Variation and uncertainty
     [Figure: image of hybridised array]
     Gene expression data (e.g. Affymetrix) is the result of multiple sources of variability:
     • condition/treatment
     • biological
     • array manufacture
     • imaging
     • technical

     Hierarchical models
     Variables at several levels allow modelling of complex systems:
     • within/between array variation
     • gene-specific variability

  2. Bayesian hierarchical models
     • Should avoid a plug-in approach: all sources of variation should be assimilated
     • Propagates uncertainty
     • ‘Borrows strength’: shares out information according to the structures connecting variables
     • Avoids over-optimistic inference, especially when there is uncertainty at more than one level

     The Bayes orthodoxy
     One of the most important benefits of the Bayesian approach has nothing much to do with having real quantitative prior information; it has more to do with the principle.

     Gene expression is a hierarchical process
     • Substantive question
     • Experimental design
     • Sample preparation
     • Array design & manufacture
     • Gene expression matrix
     • Probe level data
     • Image level data

     Bayes in hierarchical models
     • The arrows represent (top-down) model specification, not the order in which operations are performed
     • Once specified, model unknowns should be estimated simultaneously
     • (We cannot yet claim all of this is practical in gene expression.)

     Hierarchical clustering of samples
     [Figure: hierarchical clustering of a subset of 1161 gene expression profiles, obtained in 60 different samples. Red: more mRNA in the sample compared to a reference; green: less mRNA. Ross et al, Nature Genetics, 2000]
     The gene expression profiles cluster according to the tissue of origin of the samples.

     Additive models for (log-) gene expression
     The simplest model, gene + sample:
     $y_{gs} = \alpha_g + \beta_s + \varepsilon_{gs}$, with g = gene and s = sample/condition.
     Under standard conditions, the (least-squares) estimates of the gene effects are
     $\hat\alpha_g = \bar y_{g\cdot} - \bar y_{\cdot\cdot}$,
     where $\bar y_{g\cdot}$ is the mean for gene g over all samples and $\bar y_{\cdot\cdot}$ is the grand mean.
     The model generates the method, and in this case it performs a simple form of normalisation.
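As a concrete reading of "the model generates the method", the following minimal Python sketch (standard library only) fits the gene + sample model by least squares. It expects a complete, balanced matrix `y[g][s]`. It assumes the identifiability constraint $\sum_g \alpha_g = 0$ as one common meaning of "standard conditions". Under that constraint, $\hat\alpha_g = \bar y_{g\cdot} - \bar y_{\cdot\cdot}$ as on the slide, and $\hat\beta_s = \bar y_{\cdot s}$. The function name `additive_fit` is hypothetical.

```python
from statistics import fmean


def additive_fit(y):
    """Least-squares fit of y[g][s] = alpha_g + beta_s + eps_gs.

    Assumes a complete, balanced matrix (every gene measured in every
    sample) and the constraint sum_g alpha_g = 0.
    Returns (alpha_hat, beta_hat, residuals).
    """
    n_genes, n_samples = len(y), len(y[0])
    grand_mean = fmean(v for row in y for v in row)
    # Gene effects: row mean minus grand mean, as on the slide
    alpha = [fmean(row) - grand_mean for row in y]
    # Sample effects: column means (a consequence of sum_g alpha_g = 0)
    beta = [fmean(y[g][s] for g in range(n_genes)) for s in range(n_samples)]
    # Removing beta_s takes out any shift shared by all genes in sample s
    resid = [[y[g][s] - alpha[g] - beta[s] for s in range(n_samples)]
             for g in range(n_genes)]
    return alpha, beta, resid


# Toy 2 x 2 matrix (made-up numbers): the additive model fits it exactly
alpha_hat, beta_hat, residuals = additive_fit([[1.0, 2.0], [3.0, 4.0]])
print(alpha_hat, beta_hat)  # [-1.0, 1.0] [2.0, 3.0]
```

Subtracting each $\hat\beta_s$ removes a shift common to all genes in sample s. That is one plausible reading of the slide's remark that the fit performs a simple form of normalisation.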

  3. Non-model-based clustering
     • Many clustering algorithms have been developed and used for exploratory purposes
     • They rely on a measure of ‘distance’ (dissimilarity) between gene or sample profiles, e.g. Euclidean
     • Hierarchical clustering proceeds in an agglomerative manner: single profiles are joined to form groups using the distance metric, recursively
     • Good visual tool, but many arbitrary choices: care in interpretation!

     Model-based clustering
     • Build the cluster structure into the model, rather than estimating gene effects (say) first and post-processing to seek clusters
     • The Bayesian setting allows use of real prior information where it exists (biological understanding of pathways, etc., previous experiments, …)

     A common framework for specifying gene expression models
     For ease of exposition, consider only the gene expression matrix $y_{gs}$ (g = gene, s = sample/condition) with no structure to samples (although incorporating experimental structure is a key goal for later).

     Clustering via additive model (single sample first!)
     $y_g = \alpha_g + \varepsilon_g$, g = gene
     $y_g = \alpha_g + \gamma_{T_g} + \varepsilon_g$, $T_g$ = unknown cluster to which gene g belongs
     This is a mixture model.

     Clustering via additive model (multiple samples)
     $y_{gs} = \alpha_g + \beta_s + \varepsilon_{gs}$, g = gene, s = sample/condition
     $y_{gs} = \alpha_g + \beta_s + \gamma_{T_g s} + \varepsilon_{gs}$, $T_g$ = cluster to which gene g belongs
     This gives clustering of gene profiles.

     Clustering via additive model
     $y_{gs} = \alpha_g + \beta_s + \gamma_{T_g s} + \varepsilon_{gs}$, $T_g$ = unknown cluster to which gene g belongs
     $y_{gs} = \alpha_g + \beta_s + \delta_{g U_s} + \varepsilon_{gs}$, $U_s$ = cluster to which sample s belongs
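The slides note that clustering genes via the additive model is a mixture model, and that such models are fitted by MCMC, but they do not spell out the updates. One standard ingredient is a Gibbs step that draws each label $T_g$ from its full conditional:

$p(T_g = h \mid \cdots) \propto w_h \prod_s \phi\big((y_{gs} - \alpha_g - \beta_s - \gamma_{hs})/\sigma\big)$.

The sketch below is a hypothetical illustration, not taken from the talk. It assumes Gaussian errors with known $\sigma$, positive prior cluster weights $w_h$, and current values for all other parameters; the function names are also hypothetical. For the single-sample model, pass one-element lists and set the sample effect to 0.

```python
import math
import random


def allocation_probs(y_g, alpha_g, beta, gamma, weights, sigma):
    """Full conditional p(T_g = h | rest) for one gene's profile y_g.

    Assumed model: y_gs = alpha_g + beta_s + gamma[h][s] + eps_gs,
    eps_gs ~ N(0, sigma^2) independently, prior P(T_g = h) = weights[h] > 0.
    """
    log_terms = []
    for w_h, gamma_h in zip(weights, gamma):
        # Gaussian log-likelihood under cluster h, up to a constant shared by all h
        sse = sum((y - alpha_g - b - c) ** 2
                  for y, b, c in zip(y_g, beta, gamma_h))
        log_terms.append(math.log(w_h) - sse / (2.0 * sigma ** 2))
    # Log-sum-exp normalisation avoids underflow for long profiles
    top = max(log_terms)
    unnorm = [math.exp(t - top) for t in log_terms]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def sample_allocation(probs, rng=None):
    """Gibbs update: draw a new cluster label from the full conditional."""
    if rng is None:
        rng = random.Random()
    return rng.choices(range(len(probs)), weights=probs)[0]


# Made-up profile over 3 samples that sits on cluster 1's mean profile
p = allocation_probs([2.0, 2.0, 2.0], 0.0, [0.0, 0.0, 0.0],
                     [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]], [0.5, 0.5], 0.5)
new_label = sample_allocation(p, random.Random(1))
```

In a full sampler, this step would alternate with updates of $\alpha_g$, $\beta_s$, $\gamma$, the weights and $\sigma$. Those updates depend on priors the slides do not give.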

  4. Two-way clustering via additive model
     $y_{gs} = \alpha_g + \beta_s + \gamma_{T_g s} + \varepsilon_{gs}$
     $y_{gs} = \alpha_g + \beta_s + \delta_{g U_s} + \varepsilon_{gs}$
     $y_{gs} = \alpha_g + \beta_s + \gamma_{T_g s} + \delta_{g U_s} + \varepsilon_{gs}$
     or $y_{gs} = \alpha_g + \beta_s + \gamma_{T_g U_s} + \varepsilon_{gs}$

     Lazzeroni and Owen ‘Plaid’ model
     $y_{gs} = \alpha_g + \beta_s + \gamma_{T_g s} + \varepsilon_{gs}$
     Now write $\rho_{gh} = 1$ if and only if $T_g = h$, 0 otherwise:
     $y_{gs} = \alpha_g + \beta_s + \sum_h \rho_{gh}\gamma^{(h)}_s + \varepsilon_{gs}$
     h denotes a ‘cluster’, ‘block’ or ‘layer’, and now we allow them to overlap. (continued over)

     ‘Plaid’ model
     $y_{gs} = \alpha_g + \beta_s + \sum_h \rho_{gh}\gamma^{(h)}_s + \varepsilon_{gs}$
     $y_{gs} = \sum_h \rho_{gh}\kappa_{sh}\gamma^{(h)}_{gs} + \varepsilon_{gs}$
     h denotes a ‘cluster’, ‘block’ or ‘layer’ (pathway?); $\rho_{gh}$ = 0 or 1 and $\kappa_{sh}$ = 0 or 1, with
     $\gamma^{(h)}_{gs} = \mu^{(h)}$, or $\mu^{(h)} + \alpha^{(h)}_g$, or $\mu^{(h)} + \beta^{(h)}_s$, or $\mu^{(h)} + \alpha^{(h)}_g + \beta^{(h)}_s$
     [Figure: genes × samples matrix in which the layers overlap (after re-ordering genes and samples)]

     MacKay and Miskin model
     Instead of $y_{gs} = \sum_h \rho_{gh}\kappa_{sh}\gamma^{(h)}_{gs} + \varepsilon_{gs}$, where h denotes a ‘cluster’, ‘block’ or ‘layer’ and $\rho_{gh}$ = 0 or 1, $\kappa_{sh}$ = 0 or 1, MacKay and Miskin take simply
     $y_{gs} = \sum_h a^{(h)}_s b^{(h)}_g + \varepsilon_{gs}$
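To make the layer notation concrete, the sketch below evaluates the mean structure (the noise-free part) of the plaid model and of the MacKay and Miskin model for small matrices. It uses the richest plaid variant, $\gamma^{(h)}_{gs} = \mu^{(h)} + \alpha^{(h)}_g + \beta^{(h)}_s$. Setting the $\alpha^{(h)}$ or $\beta^{(h)}$ entries to zero gives the other three variants. The sketch covers evaluation only, since the slides do not say how either model is fitted. The function names and index layout are hypothetical.

```python
def plaid_mean(rho, kappa, mu, alpha, beta):
    """Mean of y_gs under the plaid model:
    sum_h rho[g][h] * kappa[s][h] * (mu[h] + alpha[h][g] + beta[h][s]).

    rho[g][h] and kappa[s][h] are 0/1 memberships of gene g / sample s in layer h.
    """
    n_genes, n_samples, n_layers = len(rho), len(kappa), len(mu)
    return [[sum(rho[g][h] * kappa[s][h] * (mu[h] + alpha[h][g] + beta[h][s])
                 for h in range(n_layers))
             for s in range(n_samples)]
            for g in range(n_genes)]


def mackay_miskin_mean(a, b):
    """Mean of y_gs under the MacKay and Miskin model: sum_h a[h][s] * b[h][g]."""
    n_layers, n_samples, n_genes = len(a), len(a[0]), len(b[0])
    return [[sum(a[h][s] * b[h][g] for h in range(n_layers))
             for s in range(n_samples)]
            for g in range(n_genes)]


# Two overlapping layers on a 3 x 3 genes-by-samples grid (made-up values):
# layer 0 covers genes {0, 1} x samples {0, 1}; layer 1 covers genes {1, 2} x samples {1, 2}
rho = [[1, 0], [1, 1], [0, 1]]
kappa = [[1, 0], [1, 1], [0, 1]]
zeros = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
m = plaid_mean(rho, kappa, mu=[1.0, 10.0], alpha=zeros, beta=zeros)
# m == [[1.0, 1.0, 0.0], [1.0, 11.0, 10.0], [0.0, 10.0, 10.0]]; cell (1, 1) gets both layers
```

Suppose `rho` is one-hot (each gene in exactly one layer), every sample is in every layer, `mu` and `alpha` are zero, and `beta[h][s]` holds $\gamma^{(h)}_s$. The plaid mean then reduces to $\gamma_{T_g s}$, which is the rewriting step on the ‘Plaid’ slide. In `mackay_miskin_mean` the memberships disappear and each layer is a rank-one product $a^{(h)}_s b^{(h)}_g$. The mean matrix therefore has rank at most the number of layers.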

  5. Markov chain Monte Carlo (MCMC) computation
     • Fitting Bayesian models is hugely facilitated by the advent of these simulation methods
     • They produce a large sample of values of all unknowns, approximately from the posterior given the data
     • Easy to set up for hierarchical models
     • BUT can be slow to run (for many variables!) and can fail to converge reliably

     Simultaneous inference
     • An important example of the flexibility of MCMC computation in a Bayesian model: inference about several unknowns at once
     • e.g. not only ‘which gene has the biggest estimated differential effect?’, but also ‘how probable is it that this gene has the biggest differential effect?’

     Contact details
     http://www.stats.bris.ac.uk/BGX
     Graeme.Ambler@bristol.ac.uk
     P.J.Green@bristol.ac.uk
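The second question on the ‘Simultaneous inference’ slide can be answered directly from MCMC output. In each iteration, note which gene has the largest sampled effect, then report the fraction of iterations each gene wins. The sketch assumes `draws` holds per-iteration samples of the differential effects from the joint posterior, after burn-in. The draws in the example are made-up toy numbers, not output from the BGX models, and the function names are hypothetical.

```python
from collections import Counter


def prob_biggest(draws):
    """Estimate P(gene g has the biggest effect | data) from joint posterior draws.

    draws[t][g] is the sampled effect of gene g at MCMC iteration t.
    """
    n_genes = len(draws[0])
    # Which gene is largest in each iteration (ties go to the lowest index)
    winners = Counter(max(range(n_genes), key=lambda g: d[g]) for d in draws)
    return [winners[g] / len(draws) for g in range(n_genes)]


def biggest_by_posterior_mean(draws):
    """First question: the gene with the largest estimated (posterior mean) effect."""
    n_genes = len(draws[0])
    means = [sum(d[g] for d in draws) / len(draws) for g in range(n_genes)]
    return max(range(n_genes), key=lambda g: means[g])


toy_draws = [[1.0, 3.0, 2.0], [4.0, 1.0, 2.0], [0.0, 5.0, 1.0], [1.0, 2.0, 9.0]]
print(prob_biggest(toy_draws))               # [0.25, 0.5, 0.25]
print(biggest_by_posterior_mean(toy_draws))  # 2
```

With these toy draws, the gene with the largest posterior mean (index 2) is not the gene most likely to be the largest (index 1). A single extreme draw inflates gene 2's mean, which shows the two questions genuinely differ.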
