Bioinformatics Group Fox Chase Cancer Center
Bayesian Decomposition: Expression to Pathways
Michael Ochs
Cancer Biology
Cancer is Many Diseases but with a Single Theme
- a cell becomes immortal
- a cell becomes mobile
Signalling and Metabolic Pathways Hold the Key
www.biosource.com
Signalling Pathways
Downward, Nature, 411, 759, 2001
[Figure labels: Stimulus, Signal Transduction, Transcription, mRNA]
Identifying Pathways
Interacting Pathways Lead to Confusion if All Genes Need to Lie in a Single Cluster
[Figure label: mRNA]
www.promega.com
Bayesian Decomposition
- Data Mining/Pattern Recognition Algorithm
  – Unsupervised Method
  – Create Multiple, Overlapping “Clusters”
- Each Gene can be in Multiple Patterns
- Get to Pathways: Key for Cancer Development
- Methodology
  – Markov Chain Monte Carlo Algorithm
  – Simulated Annealing
  – Integration of Prior Knowledge
BD: Matrix Decomposition
[Figure: Data X (gene 1 ... gene N by Exp 1 ... Exp M) = Distribution of Patterns (gene 1 ... gene N by pattern 1 ... pattern k) x Patterns of Behavior (pattern 1 ... pattern k by Exp 1 ... Exp M)]
The behavior of one gene can be explained as a mixture of patterns with different behaviors.
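The decomposition on this slide can be sketched numerically. The snippet below is only an illustration with made-up toy sizes and random nonnegative entries; the names `A` (distribution of patterns) and `P` (patterns of behavior) follow the labels used on the following slides, and nothing here reproduces the Bayesian Decomposition software itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumed for illustration): N genes, k patterns, M experiments.
n_genes, n_patterns, n_experiments = 6, 2, 4

# A: distribution of patterns, one row per gene, one column per pattern.
A = rng.gamma(shape=1.0, scale=1.0, size=(n_genes, n_patterns))
# P: patterns of behavior, one row per pattern, one column per experiment.
P = rng.gamma(shape=1.0, scale=1.0, size=(n_patterns, n_experiments))

# Noise-free data matrix: genes by experiments.
X = A @ P

# The behavior of one gene is a weighted mixture of the pattern rows of P.
gene = 0
mixture = sum(A[gene, p] * P[p] for p in range(n_patterns))
print(np.allclose(X[gene], mixture))
```

Because a gene's row of `A` can have several nonzero entries, the same gene can take part in more than one pattern, which is the point of the overlapping "clusters" described earlier.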
BD: Domains
[Figure: A Atomic Domain and P Atomic Domain, each linked by convolution to the A, P Model Domain, which is linked to Data X in the Data Domain]
BD: Markov Chain Monte Carlo
Cloud in N-Dimensional Space: the Probability Density for the Model Results from the Atomic Domain Prior, the Model Functions (Prior), and the Likelihood
Based on the Massive Inference Sampler from Maximum Entropy Data Consultants
BD Requirements
- Data Points > (A + P) Points
- Atomic Domains (Sibisi and Skilling, J R Stat Soc B, 59, 217, 1997)
  – Positive Additive Distributions
  – Infinitely Divisible Process
- Model Domains
  – Linked to Atomic Domains by Model Function
  – Correlations between Parameters are Introduced by Model Functions (Atomic -> Model)
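The first requirement can be checked with simple counting. The sketch below assumes that "A points" means the N x k entries of the distribution matrix and "P points" means the k x M entries of the pattern matrix; that reading is an interpretation, not something the slide states. The sizes used (764 genes, 228 experiments, 7 patterns) are the ones reported for the filtered yeast data in this deck, and the function name is hypothetical.

```python
def count_points(n_genes, n_experiments, n_patterns):
    """Return (data points, model points) under the assumed reading
    data = N * M and model = k * (N + M)."""
    data_points = n_genes * n_experiments
    model_points = n_patterns * (n_genes + n_experiments)
    return data_points, model_points

data_points, model_points = count_points(764, 228, 7)
print(data_points, model_points, data_points > model_points)
```

Under this reading the data comfortably outnumber the model parameters for the pattern counts explored in the talk.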
BD Features
- Basis Vectors (Patterns) are Nonorthogonal
  – Physically Meaningful if Good Model
  – Artifacts Removed if Do Not Fit Model
- Noise is Treated
  – Noise is Integral Part of Fitting Process
  – Artifacts Often Appear in Residuals (i.e. noise)
- Markov Chain Sampling Yields
  – Mean of Probable Distributions and Patterns
  – Uncertainties for Distributions and Patterns
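To make the last bullet concrete, here is a minimal random-walk Metropolis sketch, not the Massive Inference sampler used by Bayesian Decomposition. It assumes two known, nonorthogonal patterns, Gaussian noise with a known `sigma`, and an exponential prior that enforces positivity; all values are invented for illustration. The chain yields a mean and an uncertainty for one gene's mixture coefficients.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two known, nonorthogonal patterns across 8 experiments (toy values).
P = np.array([[1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0]])
true_a = np.array([2.0, 1.0])   # assumed "true" mixture for one gene
sigma = 0.2                     # assumed known noise level
x = true_a @ P + rng.normal(0.0, sigma, size=P.shape[1])

def log_posterior(a):
    """Log of prior times likelihood, up to an additive constant."""
    if np.any(a < 0):
        return -np.inf                      # positivity constraint
    resid = x - a @ P
    log_like = -0.5 * np.sum((resid / sigma) ** 2)
    log_prior = -np.sum(a)                  # exponential prior, rate 1
    return log_like + log_prior

a = np.array([1.0, 1.0])
logp = log_posterior(a)
samples = []
for step in range(20000):
    proposal = a + rng.normal(0.0, 0.1, size=a.shape)
    logp_new = log_posterior(proposal)
    # Metropolis rule: accept with probability min(1, posterior ratio).
    if np.log(rng.uniform()) < logp_new - logp:
        a, logp = proposal, logp_new
    if step >= 2000:                        # discard burn-in
        samples.append(a.copy())

samples = np.array(samples)
post_mean = samples.mean(axis=0)   # mean of the probable coefficients
post_std = samples.std(axis=0)     # their uncertainty
print(post_mean, post_std)
```

The noise level enters the likelihood directly, so the spread of the samples reflects how well the data constrain each coefficient.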
BD: Gene Expression
[Figure: A Atomic Domain (gene 1 ... gene N by pattern 1 ... pattern k) and P Atomic Domain (pattern 1 ... pattern k by Exp 1 ... Exp M)]
Rosetta Data Set
- Filtering
  – Eliminate Genes
    - > 25% Data Missing in Ratios or Uncertainties
    - < 2 Experiments with 3 Fold Change
  – Eliminate Experiments
    - < 2 Genes Changing by 3 Fold
- Uncertainties
  – Used Values from Rosetta Error Model
  – Missing Data: Log Ratio = 1, Log Unc = 100
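The filtering rules above can be sketched as follows. Several details are assumptions rather than facts from the slide: the data are taken to be log10 ratios with `NaN` marking missing values, the experiment filter is applied after the gene filter, and the fill values for missing data are taken literally from the slide (1 and 100). The function name and parameters are hypothetical.

```python
import numpy as np

def filter_compendium(log_ratio, log_unc, fold=3.0, max_missing=0.25,
                      min_count=2, fill_ratio=1.0, fill_unc=100.0):
    """Filter genes (rows) and experiments (columns), then fill gaps.

    Assumes log10 ratios, with NaN marking missing values.
    """
    threshold = np.log10(fold)
    missing = np.isnan(log_ratio) | np.isnan(log_unc)
    changed = np.abs(np.nan_to_num(log_ratio, nan=0.0)) >= threshold

    # Drop genes with > 25% missing data or < 2 experiments at 3-fold change.
    keep_genes = (missing.mean(axis=1) <= max_missing) & \
                 (changed.sum(axis=1) >= min_count)
    # Drop experiments with < 2 (kept) genes changing by 3-fold.
    keep_exps = changed[keep_genes].sum(axis=0) >= min_count

    ratio = log_ratio[np.ix_(keep_genes, keep_exps)].copy()
    unc = log_unc[np.ix_(keep_genes, keep_exps)].copy()

    # Give remaining missing points a large uncertainty so they carry
    # little weight in the fit.
    gap = np.isnan(ratio) | np.isnan(unc)
    ratio[gap] = fill_ratio
    unc[gap] = fill_unc
    return ratio, unc, keep_genes, keep_exps

# Toy data: 5 genes by 4 experiments (invented values).
toy_ratio = np.array([[0.6, -0.7, 0.5, 0.1],
                      [0.5, 0.6, np.nan, 0.0],
                      [0.1, 0.0, 0.2, 0.9],
                      [np.nan, np.nan, 0.8, 0.9],
                      [0.0, 0.5, 0.7, 0.0]])
toy_unc = np.full(toy_ratio.shape, 0.1)
ratio, unc, keep_genes, keep_exps = filter_compendium(toy_ratio, toy_unc)
print(ratio.shape, keep_genes, keep_exps)
```

Assigning a very large uncertainty to missing points is what lets them stay in the matrix without influencing the likelihood.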
Analysis
- Analyzed Full Experimental Data with PCA
  – Estimate of Dimensionality of Data
- Bayesian Decomposition
  – Filtered Data: 764 Genes, 228 Experiments
  – Ran Multiple Seeds, Multiple Pattern Number
  – Focus on Dimensions Suggested by PCA
- Data Driven
  – Let Analysis Determine Where to Look
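A rough sketch of using PCA to estimate dimensionality is shown below. It runs on synthetic data built from three hidden patterns; the 1% explained-variance cutoff is an arbitrary choice made for this illustration and is not taken from the talk.

```python
import numpy as np

def pca_eigenvalues(X):
    """Eigenvalues of the covariance of column-centered X, in descending order."""
    centered = X - X.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    return singular_values ** 2 / (X.shape[0] - 1)

rng = np.random.default_rng(2)

# Synthetic data: 200 genes, 30 experiments, 3 hidden patterns plus noise.
A = rng.gamma(2.0, 1.0, size=(200, 3))
P = rng.gamma(2.0, 1.0, size=(3, 30))
X = A @ P + rng.normal(0.0, 0.05, size=(200, 30))

eigvals = pca_eigenvalues(X)
explained = eigvals / eigvals.sum()
# Count components above the (assumed) 1% explained-variance cutoff.
n_dims = int(np.sum(explained > 0.01))
print(n_dims)
```

The count obtained this way only suggests which pattern numbers are worth trying; the decomposition is then run for several values around it.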
PCA Results
[Figure: Score (Eigenvalue) versus Principal Component]
Bayesian Decomposition
- Distributions
  – Assignment of Genes to Patterns
- Patterns
  – Each Pattern Defines Behavior Across Experiments
- Experimental Patterns
  – Experiments explained by a single pattern
  – Correlations between experiments
- Genes in Patterns
  – Identify biological processes
  – Identify correlations in genes
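One possible way to find experiments explained by a single pattern is to express each pattern's contribution to an experiment as a share of the total. The talk does not state how its percentages were computed, so the normalization below is an assumption, and the matrix values and the 50% cutoff are invented for illustration.

```python
import numpy as np

def pattern_shares(P):
    """Fraction of each experiment (column) attributed to each pattern (row)."""
    return P / P.sum(axis=0, keepdims=True)

# Toy pattern matrix: 3 patterns by 3 experiments (invented values).
P = np.array([[8.0, 1.0, 0.1],
              [1.0, 1.0, 0.6],
              [1.0, 1.0, 0.3]])

shares = pattern_shares(P)
dominant = shares.argmax(axis=0)           # strongest pattern per experiment
single = shares.max(axis=0) > 0.5          # assumed cutoff for "one pattern"
print(np.round(shares, 2), dominant, single)
```

Experiments flagged this way point to the perturbations that most cleanly isolate one pattern.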
Experiments High in One Pattern
- Pattern 1
  – YHR034C 56%
- Pattern 2
  – rpd3 89%
- Pattern 3
  – ssn6 (cyc8) 76%
  – YER024W 56%
  – tup1 54%
- Pattern 5
  – YJL107C 53%
  – yap3 51%
Genes in Patterns
(Proteome Database Cellular Role)
- Pattern 1
  – 403 Genes
  – 22/36 AA metabolism
  – 9 additional metabolism
- Pattern 2
  – 410 Genes
  – 7/27 metabolism
  – 7/27 DNA/RNA processing
  – 6 transport
- Pattern 3
  – 390 Genes
  – 13/26 metabolism
  – 6 transport, 4 Pol II
- Pattern 4
  – 276 Genes, 30/50 unknown
- Pattern 5
  – 355 Genes
  – 14/37 carbohydrate metabolism
  – 7/37 cell stress
  – 6 transport
- Pattern 6
  – 297 Genes, 30/50 unknown
- Pattern 7
  – 223 Genes
  – 13/23 mating response
  – 5/23 meiosis
[Slide labels: AA Pattern (Pattern 1), Metabolic Pattern (Pattern 3), Carbo Pattern (Pattern 5), Mating Pattern (Pattern 7)]
Metabolic Patterns
- Patterns 1 and 5
  – yap3 98%
  – YJL107C 98%
  – YHR034C 98%
  – FR901,228 98%
- Patterns 1, 3, and 5
  – ssn6 100%
  – swi6 99%
  – yap3 98%
  – YJL107C 98%
  – YHR034C 98%
  – FR901,228 98%
Metabolic Pattern
[Figure: Behavior Explained by Pattern (0% to 80%) for ssn6 (haploid), tup1 (haploid), and yer024w; series: AA, Metab, Carbo, Mating]
Sterile Family Proteins
[Figure: Behavior Explained by Pattern (0% to 45%) for ste11 (haploid), ste12 (haploid), ste18 (haploid), ste2 (haploid), ste20 (**11), ste24 (haploid), ste4 (haploid), ste5 (haploid), and ste7 (haploid); series: AA, Patt 2, Metab, Patt 4, Carbo, Patt 6, Mating]
Ste2
[Figure: Behavior Explained by Pattern (0% to 45%) for ste2 (haploid) and yil117c (haploid); series: AA, Patt 2, Metab, Patt 4, Carbo, Patt 6, Mating]
YIL117C is prm5, a pheromone-regulated protein of unknown function.
Mating Pattern
[Figure: Behavior Explained by Pattern (0% to 40%) for dig1, dig2 (haploid); dig1, dig2; dig1; dig2; ste20 (**11); fus3 (haploid); series: AA, Metab, Carbo, Mating]
Mating Pathway
Posas, et al, Curr Opin Microbiology, 1, 175, 1998
Mating Pattern
[Figure: Behavior Explained by Pattern (0% to 50%) for fus3 (haploid); fus3, kss1 (haploid); ste11 (haploid); ste7 (haploid); ste5 (haploid); ste12 (haploid); series: AA, Metab, Carbo, Mating, 6]
Correlations (Fus3/Kss1)
[Figure: Ratio of Expression (0.5 to 3) for fus3 (haploid) and fus3, kss1 (haploid); genes shown: YAL012W, YAL034W, YAL066W, YAR009C, YAR047C, YAR070C, YBL005W, YBL043W, YBL049W, YBL098W, YBL101W, YBR012C, YBR012W, YBR040W]
Mating Pattern Correlations
[Figure: Ratio of Expression (0.5 to 3) for fus3 (haploid) and fus3, kss1 (haploid); genes shown, with ** marks as on the slide: YAL018C, YMR082C, MEI5, YFR057W, STE2 **, PES4, FIG1 **, YAL066W, SPO19, YNL018C, YNL028W, YOR235W, AGA2 **, YMR082C, FUS1, YOR376W, YMR082C, SST2 **, BAR1 **, YMR082C, YOL131W, PRR2, AGA1 **, YER181C, YPL280W, YLR042C, HOP2, TEC1]
Pattern 6
- Mating Pattern -> Pattern 6 for Ste11, Ste7, Ste5, and Ste12
- PSI-BLAST and SMART pick up 5 matches among unknown ORFs to transposon and retroposon proteins
Conclusions
- Life is Very Complex
  – Multiple Pathways and Interactions for Each Protein with Transcription/Translation
  – Natural Stochastic Variations
- Analysis Tools Must
  – Isolate Areas of Interest without Loss of Knowledge Discovery
  – Incorporate Maximal Prior Knowledge to Reduce “Search Space”
Bayesian Decomposition
- Ability to Identify Overlapping Coexpression Can Lead to Identification of Pathways Affected in an Experiment
- Capability to Encode Prior Knowledge Can Improve Results
  – Ochs et al, J Magn Res, 137, 161, 1999
  – Ochs et al, Magn Res Med, in press
Future Directions
- Application to Clinical Trials
  – Study of GIST Tumor Response to Gleevec
  – Study of Drug Response of HOSE Lines
- Bayesian Decomposition Development
  – Incorporation of Time Domain Modelling
  – Incorporation of Knowledge of Coexpression
Credits (Definitely Due)
- Fox Chase
  – Frank Manion
  – Thomas Moloshok
  – Jeffrey Grant
  – Yue Zhang
  – Ghislain Bidaut
  – Burt Eisenberg
  – Andy Godwin
- City of Hope
  – Bob Klevecz
- Johns Hopkins
  – Giovanni Parmigiani
- Fox Chase (NMR)