Bayesian Two-way Clustering for Gene Expression Data

Graeme Ambler and Peter Green
University of Bristol
12 July 2003

Motivation
Obvious potential for Bayesian and empirical Bayes (EB) methods in gene expression analysis: can they be made to work?
BGX project, BBSRC funded, with Sylvia Richardson, Clare Marshall, Alex Lewin and Anne-Mette Hein (Imperial), in collaboration with Helen Causton and Tim Aitman and colleagues (CSC/IC Microarray Centre)
Model-based, flexible approach to gene expression analysis

Plan
• Variation and uncertainty in gene expression
• Hierarchical models
• Simultaneous inference
• Common framework, including clustering
• Initial experiments with layer models

Gene expression using Affymetrix chips
[Figure, slide courtesy of Affymetrix: image of a hybridised array (1.28 cm) with a zoom on one hybridised spot (oligonucleotide element, 20 µm). Labels: single-stranded, labeled RNA sample; millions of copies of a specific oligonucleotide sequence element; approx. ½ million different complementary oligonucleotides; expressed genes; non-expressed genes.]

Variation and uncertainty
Gene expression data (e.g. Affymetrix) is the result of multiple sources of variability:
• condition/treatment
• biological
• array manufacture
• imaging
• technical
• within/between array variation
• gene-specific variability

Hierarchical models
Variables at several levels: allows modelling of complex systems

Bayesian hierarchical models
One of the most important benefits of the Bayesian approach has nothing much to do with having real quantitative prior information
• it has more to do with the structures connecting variables
• especially when there is uncertainty at more than one level

The Bayes orthodoxy
• Should avoid a plug-in approach: all sources of variation should be assimilated
• Propagates uncertainty
• 'Borrows strength': shares out information according to principle
• Avoids over-optimistic inference

Gene expression is a hierarchical process
• Substantive question
• Experimental design
• Sample preparation
• Array design & manufacture
• Gene expression matrix
• Probe level data
• Image level data

Bayes in hierarchical models
• The arrows represent (top down) model specification, not the order in which operations are performed
• Once specified, model unknowns should be estimated simultaneously
• (We cannot yet claim all of this is practical in gene expression)

Hierarchical clustering of samples
The gene expression profiles cluster according to tissue of origin of the samples.
[Figure: a subset of 1161 gene expression profiles, obtained in 60 different samples. Red: more mRNA, green: less mRNA in the sample compared to a reference. Ross et al, Nature Genetics, 2000]

Additive models for (log-) gene expression
The simplest model: gene + sample
$y_{gs} = \alpha_g + \beta_s + \varepsilon_{gs}$, where g = gene, s = sample/condition
Under standard conditions, the (least-squares) estimates of gene effects are
$\hat{\alpha}_g = \bar{y}_{g\cdot} - \bar{y}_{\cdot\cdot}$
The model generates the method, and in this case performs a simple form of normalisation
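The least-squares estimates for the gene + sample model can be sketched in a few lines. This is only an illustration, not code from the talk: it assumes a complete genes-by-samples matrix of log expression values and the identifiability constraint that the gene effects sum to zero, so that the sample effects carry the overall level. The function name `fit_additive` and the simulated data are hypothetical.

```python
import numpy as np

def fit_additive(y):
    """Least-squares fit of y[g, s] = alpha[g] + beta[s] + error.

    Assumes a complete genes-by-samples matrix and the constraint
    sum(alpha) = 0, so that beta[s] carries the overall level.
    """
    y = np.asarray(y, dtype=float)
    grand_mean = y.mean()
    alpha_hat = y.mean(axis=1) - grand_mean   # gene effects: row mean minus grand mean
    beta_hat = y.mean(axis=0)                 # sample effects: column means
    residuals = y - alpha_hat[:, None] - beta_hat[None, :]
    return alpha_hat, beta_hat, residuals

# Simulated data (hypothetical values): 5 genes, 3 samples.
rng = np.random.default_rng(0)
alpha_true = np.array([-1.0, 0.0, 1.0, 2.0, -2.0])   # sums to zero
beta_true = np.array([7.0, 7.5, 8.0])
y = alpha_true[:, None] + beta_true[None, :] + rng.normal(0, 0.1, (5, 3))
alpha_hat, beta_hat, residuals = fit_additive(y)
```

Subtracting `beta_hat` from each column centres every sample at the same level, which is the simple form of normalisation the slide refers to.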

Non-model-based clustering
• Many clustering algorithms have been developed and used for exploratory purposes
• They rely on a measure of 'distance' (dissimilarity) between gene or sample profiles, e.g. Euclidean
• Hierarchical clustering proceeds in an agglomerative manner: single profiles are joined to form groups using the distance metric, recursively
• Good visual tool, but many arbitrary choices: care in interpretation!

Model-based clustering
• Build the cluster structure into the model, rather than estimating gene effects (say) first, and post-processing to seek clusters
• Bayesian setting allows use of real prior information where it exists (biological understanding of pathways, etc, previous experiments, …)

A common framework for specifying gene expression models
For ease of exposition, consider only the gene expression matrix $y_{gs}$ (g = gene, s = sample/condition), with no structure to samples (although incorporating experimental structure is a key goal for later)

Clustering via additive model (single sample first!)
$y_g = \alpha + \varepsilon_g$, where g = gene
$y_g = \alpha + \gamma_{T_g} + \varepsilon_g$, where $T_g$ = unknown cluster to which gene g belongs
This is a mixture model

Clustering via additive model (multiple samples)
$y_{gs} = \alpha_g + \beta_s + \varepsilon_{gs}$, where g = gene, s = sample/condition
$y_{gs} = \alpha_g + \beta_s + \gamma_{T_g s} + \varepsilon_{gs}$, where $T_g$ = unknown cluster to which gene g belongs
This gives clustering of gene profiles

Clustering via additive model
$y_{gs} = \alpha_g + \beta_s + \gamma_{T_g s} + \varepsilon_{gs}$, where $T_g$ = cluster to which gene g belongs
$y_{gs} = \alpha_g + \beta_s + \delta_{g U_s} + \varepsilon_{gs}$, where $U_s$ = cluster to which sample s belongs
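The single-sample mixture model, with an unknown cluster label T_g for each gene, can be fitted by a small Gibbs sampler. The sketch below is a minimal illustration under assumptions that are not in the slides: the noise standard deviation `sigma` is known, the cluster levels (alpha + gamma_k, written `means[k]` here) have independent N(0, tau^2) priors, and each gene is a priori equally likely to belong to any of K clusters. The function name `gibbs_mixture` and the simulated data are hypothetical.

```python
import numpy as np

def gibbs_mixture(y, K=2, sigma=1.0, tau=10.0, n_iter=500, seed=0):
    """Gibbs sampler for y[g] = means[T[g]] + N(0, sigma^2) noise.

    Assumptions (for illustration only): sigma known, means[k] ~ N(0, tau^2),
    and equal prior probability 1/K for each cluster label.
    """
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    G = len(y)
    means = rng.choice(y, size=K, replace=False)  # start at K data points
    labels_trace = np.empty((n_iter, G), dtype=int)
    means_trace = np.empty((n_iter, K))
    for it in range(n_iter):
        # Sample each label T[g] given the current cluster levels.
        logp = -0.5 * ((y[:, None] - means[None, :]) / sigma) ** 2
        p = np.exp(logp - logp.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        u = rng.random(G)
        labels = (p.cumsum(axis=1) < u[:, None]).sum(axis=1)
        labels = np.minimum(labels, K - 1)  # guard against rounding error
        # Sample each cluster level given its current members (normal-normal update).
        for k in range(K):
            members = y[labels == k]
            precision = len(members) / sigma**2 + 1.0 / tau**2
            post_mean = (members.sum() / sigma**2) / precision
            means[k] = rng.normal(post_mean, np.sqrt(1.0 / precision))
        labels_trace[it] = labels
        means_trace[it] = means
    return labels_trace, means_trace

# Simulated single-sample data (hypothetical): two groups of 50 genes.
rng = np.random.default_rng(42)
y = np.concatenate([rng.normal(-3, 1, 50), rng.normal(3, 1, 50)])
labels_trace, means_trace = gibbs_mixture(y, K=2)
# Sort within each draw so the summary does not depend on label order.
post_means = np.sort(means_trace[250:], axis=1).mean(axis=0)
```

Because the labels and the cluster levels are sampled together, uncertainty about which cluster a gene belongs to is carried into the estimates of the cluster levels, rather than fixing a clustering first and post-processing.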

Two-way clustering via additive model
$y_{gs} = \alpha_g + \beta_s + \gamma_{T_g s} + \varepsilon_{gs}$
$y_{gs} = \alpha_g + \beta_s + \delta_{g U_s} + \varepsilon_{gs}$
$y_{gs} = \alpha_g + \beta_s + \gamma_{T_g s} + \delta_{g U_s} + \varepsilon_{gs}$
or
$y_{gs} = \alpha_g + \beta_s + \gamma_{T_g U_s} + \varepsilon_{gs}$

Lazzeroni and Owen 'Plaid' model
$y_{gs} = \alpha_g + \beta_s + \gamma_{T_g s} + \varepsilon_{gs}$
Now write $\rho_{gh} = 1$ if and only if $T_g = h$, 0 otherwise:
$y_{gs} = \alpha_g + \beta_s + \sum_h \rho_{gh} \gamma_s^{(h)} + \varepsilon_{gs}$
h denotes a 'cluster', 'block' or 'layer', and now we allow them to overlap
(continued over)

'Plaid' model
$y_{gs} = \alpha_g + \beta_s + \sum_h \rho_{gh} \gamma_s^{(h)} + \varepsilon_{gs}$
$y_{gs} = \sum_h \rho_{gh} \kappa_{sh} \gamma_{gs}^{(h)} + \varepsilon_{gs}$
h denotes a 'cluster', 'block' or 'layer' (pathway?)
$\rho_{gh}$ = 0 or 1 and $\kappa_{sh}$ = 0 or 1
$\gamma_{gs}^{(h)} = \mu^{(h)} + \alpha_g^{(h)} + \beta_s^{(h)}$

[Figure: genes-by-samples matrix in which the layers overlap (after re-ordering genes and samples)]

MacKay and Miskin model
[Figure: genes-by-samples matrix]
Instead of
$y_{gs} = \sum_h \rho_{gh} \kappa_{sh} \gamma_{gs}^{(h)} + \varepsilon_{gs}$
where h denotes a 'cluster', 'block' or 'layer', $\rho_{gh}$ = 0 or 1 and $\kappa_{sh}$ = 0 or 1, MacKay and Miskin take simply
$y_{gs} = \sum_h a_s^{(h)} b_g^{(h)} + \varepsilon_{gs}$
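To make the layer notation concrete, the sketch below builds the mean of the plaid model from 0/1 membership indicators and shows that a cell lying in two layers receives the sum of both layer effects. It also builds the bilinear mean of the MacKay and Miskin form. This is an illustration only: the function names, array shapes and numerical values are hypothetical, and no fitting is attempted.

```python
import numpy as np

def plaid_mean(rho, kappa, mu, alpha, beta):
    """Mean of the plaid model: sum_h rho[g,h] * kappa[s,h] * gamma[g,s,h],
    with gamma[g,s,h] = mu[h] + alpha[g,h] + beta[s,h].

    rho: (G, H) 0/1 gene memberships; kappa: (S, H) 0/1 sample memberships;
    mu: (H,); alpha: (G, H); beta: (S, H).
    """
    rho = np.asarray(rho, dtype=float)
    kappa = np.asarray(kappa, dtype=float)
    mu, alpha, beta = (np.asarray(a, dtype=float) for a in (mu, alpha, beta))
    gamma = mu[None, None, :] + alpha[:, None, :] + beta[None, :, :]  # (G, S, H)
    return np.einsum("gh,sh,gsh->gs", rho, kappa, gamma)

def bilinear_mean(a, b):
    """Mean of the bilinear form: sum_h a[s,h] * b[g,h], returned as (G, S)."""
    return np.asarray(b, dtype=float) @ np.asarray(a, dtype=float).T

# Two overlapping layers on 4 genes and 3 samples (hypothetical values).
rho = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])    # gene 1 is in both layers
kappa = np.array([[1, 0], [1, 1], [0, 1]])          # sample 1 is in both layers
mu = np.array([2.0, -1.0])
alpha = np.zeros((4, 2))
beta = np.zeros((3, 2))
mean = plaid_mean(rho, kappa, mu, alpha, beta)
```

In this example the cell for gene 1 and sample 1 gets 2.0 + (-1.0) = 1.0, because it lies in both layers. When each layer effect is a constant mu^(h), the plaid mean is a special case of the bilinear form, with b = rho * mu and a = kappa.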

Markov chain Monte Carlo (MCMC) computation
• Fitting of Bayesian models hugely facilitated by advent of these simulation methods
• Produce a large sample of values of all unknowns, ≈ from posterior given data
• Easy to set up for hierarchical models
• BUT can be slow to run (for many variables!)
• and can fail to converge reliably

Simultaneous inference
• An important example of the flexibility of MCMC computation in a Bayesian model: inference about several unknowns at once.
• e.g. not only 'which gene has the biggest estimated differential effect?', but also 'how probable is it that this gene has the biggest differential effect?'

Contact details
http://www.stats.bris.ac.uk/BGX
Graeme.Ambler@bristol.ac.uk
P.J.Green@bristol.ac.uk
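The simultaneous-inference question, 'how probable is it that this gene has the biggest differential effect?', is straightforward to answer from MCMC output: count how often each gene is the largest across the posterior draws. The sketch below assumes the draws are already available as a draws-by-genes array. The function name `prob_largest` is hypothetical, and the draws are simulated stand-ins rather than output from a real sampler.

```python
import numpy as np

def prob_largest(draws):
    """Estimate, for each gene, the posterior probability that its effect
    is the largest, from an (n_draws, n_genes) array of posterior draws."""
    draws = np.asarray(draws, dtype=float)
    n_draws, n_genes = draws.shape
    winners = draws.argmax(axis=1)            # index of the largest effect in each draw
    return np.bincount(winners, minlength=n_genes) / n_draws

# Stand-in for MCMC output (simulated, not real posterior draws): 4 genes.
rng = np.random.default_rng(1)
fake_draws = rng.normal(loc=[0.0, 0.5, 2.0, 1.8], scale=0.5, size=(4000, 4))
p_top = prob_largest(fake_draws)
```

Here the gene with the largest average effect is not certain to be the largest: a gene with a similar effect takes a substantial share of the probability, which a plug-in ranking of point estimates would hide.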

