Bayesian Decomposition Expression to Pathways, Michael Ochs


SLIDE 1

Bioinformatics Group Fox Chase Cancer Center

Bayesian Decomposition Expression to Pathways

Michael Ochs

SLIDE 2

Cancer Biology

Cancer is Many Diseases but with a Single Theme

  • a cell becomes immortal
  • a cell becomes mobile

Signalling and Metabolic Pathways Hold the Key

www.biosource.com

SLIDE 3

Signalling Pathways

Downward, Nature, 411, 759, 2001

[Figure labels: Stimulus, Signal Transduction, Transcription, mRNA]

SLIDE 4

Identifying Pathways

Interacting Pathways Lead to Confusion if All Genes Need to Lie in a Single Cluster

[Figure label: mRNA]

www.promega.com

SLIDE 5

Bayesian Decomposition

  • Data Mining/Pattern Recognition Algorithm

– Unsupervised Method
– Create Multiple, Overlapping “Clusters”

  • Each Gene can be in Multiple Patterns
  • Get to Pathways: Key for Cancer Development
  • Methodology

– Markov Chain Monte Carlo Algorithm
– Simulated Annealing
– Integration of Prior Knowledge

SLIDE 6

BD: Matrix Decomposition

Data X (gene 1 to gene N by Exp 1 to Exp M) = Distribution of Patterns (gene 1 to gene N by pattern 1 to pattern k) × Patterns of Behavior (pattern 1 to pattern k by Exp 1 to Exp M)

The behavior of one gene can be explained as a mixture of patterns with different behaviors.
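
As a rough illustration of the decomposition on this slide, the sketch below builds a small data matrix from a distribution matrix and a pattern matrix and checks that one gene's row is a weighted mixture of the pattern rows. The names A and P follow the labels used on the later slides; the matrix sizes and the random values are assumptions chosen only for the example, and no noise is included.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_patterns, n_experiments = 6, 2, 5  # illustrative sizes only

# A: how strongly each gene participates in each pattern (non-negative).
A = rng.uniform(0.0, 1.0, size=(n_genes, n_patterns))
# P: behavior of each pattern across the experiments (non-negative).
P = rng.uniform(0.0, 2.0, size=(n_patterns, n_experiments))

# Noise-free data: X = A P, genes by experiments.
X = A @ P

# The row for one gene is a mixture of the pattern rows,
# weighted by that gene's entries in A.
gene = 0
mixture = sum(A[gene, k] * P[k] for k in range(n_patterns))
```

Because a gene can have nonzero weight in several columns of A, it can belong to several patterns at once, which is what allows overlapping "clusters".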

SLIDE 7

BD: Domains

[Figure: the A Atomic Domain and the P Atomic Domain are each linked by convolution to the A and P matrices of the Model Domain, which together correspond to Data X in the Data Domain]

SLIDE 8

BD: Markov Chain MC

Cloud in N-Dimensional Space: Probability Density for the Model Results from Atomic Domain Prior, Model Functions (Prior), and the Likelihood

Based on the Massive Inference Sampler from Maximum Entropy Data Consultants
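
The idea of sampling a posterior built from a prior and a likelihood can be sketched with a plain Metropolis sampler. This is not the Massive Inference sampler and does not use atomic domains: the exponential prior on positive entries, the Gaussian likelihood with a single noise level sigma, and the function names log_posterior and metropolis are all assumptions made for illustration.

```python
import numpy as np

def log_posterior(A, P, X, sigma, rate=1.0):
    """Unnormalized log posterior: Gaussian likelihood plus exponential prior."""
    if (A < 0).any() or (P < 0).any():
        return -np.inf  # positivity constraint: negative entries are excluded
    resid = (X - A @ P) / sigma
    log_like = -0.5 * np.sum(resid ** 2)
    log_prior = -rate * (A.sum() + P.sum())  # stand-in prior (assumption)
    return log_like + log_prior

def metropolis(X, sigma, k, n_steps=5000, step=0.1, seed=0):
    """Sample reconstructions A @ P from the posterior; first half is burn-in."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    A = rng.uniform(0.1, 1.0, size=(n, k))
    P = rng.uniform(0.1, 1.0, size=(k, m))
    current = log_posterior(A, P, X, sigma)
    samples = []
    for t in range(n_steps):
        A_new, P_new = A.copy(), P.copy()
        # Perturb one randomly chosen element of A or of P (symmetric proposal).
        if rng.random() < 0.5:
            i, j = rng.integers(n), rng.integers(k)
            A_new[i, j] += rng.normal(0.0, step)
        else:
            i, j = rng.integers(k), rng.integers(m)
            P_new[i, j] += rng.normal(0.0, step)
        proposed = log_posterior(A_new, P_new, X, sigma)
        # Metropolis rule: accept with probability min(1, posterior ratio).
        if np.log(rng.random()) < proposed - current:
            A, P, current = A_new, P_new, proposed
        if t >= n_steps // 2:
            samples.append(A @ P)
    return np.array(samples)
```

Averaging the returned samples along the first axis gives a mean reconstruction, and their standard deviation gives a per-element uncertainty.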

SLIDE 9

BD Requirements

  • Data Points > (A + P) Points
  • Atomic Domains (Sibisi and Skilling, J R Stat Soc B, 59, 217, 1997)

– Positive Additive Distributions
– Infinitely Divisible Process

  • Model Domains

– Linked to Atomic Domains by Model Function
– Correlations between Parameters are Introduced by Model Functions (Atomic > Model)
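
The first requirement can be checked with simple arithmetic. The sketch below assumes that "(A + P) Points" means the number of entries in the A matrix plus the number of entries in the P matrix; the function name enough_data is hypothetical. The example call uses the filtered data size and the seven patterns reported on later slides.

```python
def enough_data(n_genes, n_experiments, n_patterns):
    """Compare the number of data points with the number of A and P entries."""
    data_points = n_genes * n_experiments
    # A is genes x patterns, P is patterns x experiments (assumed reading).
    model_points = n_patterns * (n_genes + n_experiments)
    return data_points, model_points, data_points > model_points

# 764 genes, 228 experiments, 7 patterns
result = enough_data(764, 228, 7)
print(result)  # (174192, 6944, True)
```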

SLIDE 10

BD Features

  • Basis Vectors (Patterns) are Nonorthogonal

– Physically Meaningful if Good Model
– Artifacts Removed if Do Not Fit Model

  • Noise is Treated

– Noise is Integral Part of Fitting Process
– Artifacts Often Appear in Residuals (i.e. noise)

  • Markov Chain Sampling Yields

– Mean of Probable Distributions and Patterns
– Uncertainties for Distributions and Patterns

SLIDE 11

BD: Gene Expression

[Figure: the A Atomic Domain maps to the distribution matrix (gene 1 to gene N by pattern 1 to pattern k); the P Atomic Domain maps to the pattern matrix (pattern 1 to pattern k by Exp 1 to Exp M)]

SLIDE 12

Rosetta Data Set

  • Filtering

– Eliminate Genes

  • >25% Data Missing in Ratios or Uncertainties
  • < 2 Experiments with 3 Fold Change

– Eliminate Experiments

  • < 2 Genes Changing by 3 Fold
  • Uncertainties

– Used Values from Rosetta Error Model
– Missing Data: Log Ratio = 1, Log Unc = 100
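
The gene and experiment filters might be sketched as below. Several details are assumptions rather than facts from the slide: the input is taken to be log10 ratios with NaN marking missing values, genes are filtered first and experiments are then counted over the surviving genes, and missing uncertainties are not handled. The function name filter_expression is hypothetical.

```python
import numpy as np

def filter_expression(log10_ratio, min_hits=2, fold=3.0, max_missing=0.25):
    """Return boolean masks (keep_genes, keep_experiments).

    log10_ratio: genes x experiments array, NaN marks a missing value.
    """
    missing = np.isnan(log10_ratio)
    # |log10 ratio| >= log10(3) means at least a 3-fold change up or down.
    changed = np.abs(np.nan_to_num(log10_ratio, nan=0.0)) >= np.log10(fold)

    # Drop genes with more than 25% missing data or fewer than 2 changes.
    keep_genes = (missing.mean(axis=1) <= max_missing) & (
        changed.sum(axis=1) >= min_hits
    )
    # Drop experiments in which fewer than 2 surviving genes change.
    keep_experiments = changed[keep_genes].sum(axis=0) >= min_hits
    return keep_genes, keep_experiments
```

The filtered matrix is then `log10_ratio[keep_genes][:, keep_experiments]`.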

SLIDE 13

Analysis

  • Analyzed Full Experimental Data with PCA

– Estimate of Dimensionality of Data

  • Bayesian Decomposition

– Filtered Data: 764 Genes, 228 Experiments
– Ran Multiple Seeds and Multiple Numbers of Patterns
– Focus on Dimensions Suggested by PCA

  • Data Driven

– Let Analysis Determine Where to Look
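
Using PCA to estimate dimensionality can be sketched as follows: compute the eigenvalues of the covariance of the centered data, then count how many components are needed to reach a chosen share of the total variance. The 90% default, the choice to center each experiment across genes, and the function names are assumptions for illustration; the slides do not state the criterion that was used.

```python
import numpy as np

def pca_eigenvalues(X):
    """Eigenvalues of the covariance of X (rows are genes), largest first."""
    Xc = X - X.mean(axis=0, keepdims=True)  # center each experiment column
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    return s ** 2 / (X.shape[0] - 1)

def n_components_for(eigvals, fraction=0.9):
    """Smallest number of components whose eigenvalues reach `fraction`."""
    explained = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(explained, fraction) + 1)
```

The resulting count is only a starting point for the number of patterns, in line with running the decomposition at several pattern numbers.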

SLIDE 14

PCA Results

[Plot: Score (EigenValue) against Principal Component]

SLIDE 15

Bayesian Decomposition

  • Distributions

– Assignment of Genes to Patterns

  • Patterns

– Each Pattern Defines Behavior Across Experiments

  • Experimental Patterns

– Experiments explained by a single pattern
– Correlations between experiments

  • Genes in Patterns

– Identify biological processes
– Identify correlations in genes
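
One plausible way to express how much of an experiment or a gene is explained by each pattern is to normalize each pattern's contribution to the fitted signal. This normalization is an assumption and is not necessarily how the percentages in this presentation were computed; the function names are hypothetical, and A and P are the distribution and pattern matrices.

```python
import numpy as np

def experiment_fractions(A, P):
    """Share of each experiment's fitted signal carried by each pattern.

    Returns a patterns x experiments array whose columns sum to 1
    (assumes every experiment has a nonzero fitted signal).
    """
    contrib = A.sum(axis=0)[:, None] * P  # pattern weight summed over genes
    return contrib / contrib.sum(axis=0, keepdims=True)

def gene_fractions(A, P):
    """Share of each gene's fitted signal carried by each pattern.

    Returns a genes x patterns array whose rows sum to 1
    (assumes every gene has a nonzero fitted signal).
    """
    contrib = A * P.sum(axis=1)[None, :]  # pattern level summed over experiments
    return contrib / contrib.sum(axis=1, keepdims=True)
```

An experiment with one entry of `experiment_fractions` near 1 would be an experiment explained by a single pattern.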

SLIDE 16

Experiments High in One Pattern

  • Pattern 1

– YHR034C 56%

  • Pattern 2

– rpd3 89%

  • Pattern 3

– ssn6 (cyc8) 76%
– YER024W 56%
– tup1 54%

  • Pattern 5

– YJL107C 53%
– yap3 51%

SLIDE 17

Genes in Patterns

(Proteome Database Cellular Role)

  • Pattern 1

– 403 Genes
– 22/36 AA metabolism
– 9 additional metabolism

  • Pattern 2

– 410 Genes
– 7/27 metabolism
– 7/27 DNA/RNA processing
– 6 transport

  • Pattern 3

– 390 Genes
– 13/26 metabolism
– 6 transport, 4 Pol II

  • Pattern 4

– 276 Genes, 30/50 unknown

  • Pattern 5

– 355 Genes
– 14/37 carbohydrate metabolism
– 7/37 cell stress
– 6 transport

  • Pattern 6

– 297 Genes, 30/50 unknown

  • Pattern 7

– 223 Genes
– 13/23 mating response
– 5/23 meiosis

Pattern labels: AA Pattern (Pattern 1), Metabolic Pattern (Pattern 3), Carbo Pattern (Pattern 5), Mating Pattern (Pattern 7)

SLIDE 18

Metabolic Patterns

  • Patterns 1 and 5

– yap3 98%
– YJL107C 98%
– YHR034C 98%
– FR901,228 98%

  • Patterns 1, 3, and 5

– ssn6 100%
– swi6 99%
– yap3 98%
– YJL107C 98%
– YHR034C 98%
– FR901,228 98%

SLIDE 19

Metabolic Pattern

[Bar chart: Behavior Explained by Pattern, 0% to 80%. Experiments: ssn6 (haploid), tup1 (haploid), yer024w. Patterns: AA, Metab, Carbo, Mating.]

SLIDE 20

Sterile Family Proteins

[Bar chart: Behavior Explained by Pattern, 0% to 45%. Experiments: ste11 (haploid), ste12 (haploid), ste18 (haploid), ste2 (haploid), ste20 (**11), ste24 (haploid), ste4 (haploid), ste5 (haploid), ste7 (haploid). Patterns: AA, Patt 2, Metab, Patt 4, Carbo, Patt 6, Mating.]

SLIDE 21

Ste2

[Bar chart: Behavior Explained by Pattern, 0% to 45%. Experiments: ste2 (haploid), yil117c (haploid). Patterns: AA, Patt 2, Metab, Patt 4, Carbo, Patt 6, Mating.]

YIL117C is prm5, a pheromone-regulated protein of unknown function

SLIDE 22

Mating Pattern

[Bar chart: Behavior Explained by Pattern, 0% to 40%. Experiment labels as extracted: dig1, dig2 (haploid) dig1, dig2 dig1 dig2 ste20 (**11) fus3 (haploid). Patterns: AA, Metab, Carbo, Mating.]

SLIDE 23

Mating Pathway

Posas, et al, Curr Opin Microbiology, 1, 175, 1998

SLIDE 24

Mating Pattern

[Bar chart: Behavior Explained by Pattern, 0% to 50%. Experiments: fus3 (haploid), fus3, kss1 (haploid), ste11 (haploid), ste7 (haploid), ste5 (haploid), ste12 (haploid). Patterns: AA, Metab, Carbo, Mating, 6.]

SLIDE 25

Correlations (Fus3/Kss1)

[Chart: Ratio of Expression, 0.5 to 3, in the fus3 (haploid) and fus3, kss1 (haploid) experiments. Genes: YAL012W, YAL034W, YAL066W, YAR009C, YAR047C, YAR070C, YBL005W, YBL043W, YBL049W, YBL098W, YBL101W, YBR012C, YBR012W, YBR040W.]

SLIDE 26

Mating Pattern Correlations

[Chart: Ratio of Expression, 0.5 to 3, in the fus3 (haploid) and fus3, kss1 (haploid) experiments. Gene labels as extracted: YAL018C YMR082C MEI5 YFR057W STE2 ** PES4 FIG1 ** YAL066W SPO19 YNL018C YNL028W YOR235W AGA2 ** YMR082C FUS1 YOR376W YMR082C SST2 ** BAR1 ** YMR082C YOL131W PRR2 AGA1 ** YER181C YPL280W YLR042C HOP2 TEC1]

SLIDE 27

Pattern 6

  • Mating Pattern -> Pattern 6 for Ste11, Ste7, Ste5, and Ste12
  • PSI-BLAST and SMART pick up 5 matches among unknown ORFs to transposon and retroposon proteins

SLIDE 28

Conclusions

  • Life is Very Complex

– Multiple Pathways and Interactions for Each Protein with Transcription/Translation
– Natural Stochastic Variations

  • Analysis Tools Must

– Isolate Areas of Interest without Loss of Knowledge Discovery
– Incorporate Maximal Prior Knowledge to Reduce “Search Space”

SLIDE 29

Bayesian Decomposition

  • Ability to Identify Overlapping Coexpression Can Lead to Identification of Pathways Affected in an Experiment
  • Capability to Encode Prior Knowledge Can Improve Results

– Ochs et al, J Magn Res, 137, 161, 1999
– Ochs et al, Magn Res Med, in press

SLIDE 30

Future Directions

  • Application to Clinical Trials

– Study of GIST Tumor Response to Gleevec
– Study of Drug Response of HOSE Lines

  • Bayesian Decomposition Development

– Incorporation of Time Domain Modelling
– Incorporation of Knowledge of Coexpression

SLIDE 31

Credits (Definitely Due)

  • Fox Chase

– Frank Manion
– Thomas Moloshok
– Jeffrey Grant
– Yue Zhang
– Ghislain Bidaut
– Burt Eisenberg
– Andy Godwin

  • City of Hope

– Bob Klevecz

  • Johns Hopkins

– Giovanni Parmigiani

  • Fox Chase (NMR)

– Truman Brown (Columbia)