Machine Reading for Cancer Panomics
Hoifung Poon
Overview
… ATTCGGATATTTAAGGC …
… ATTCGGGTATTTAAGCC …
…… …… Disease Genes Drug Targets ……
High-Throughput Data
Grounded Semantic Parsing
[Figure: Before Treatment, 15 Weeks, 23 Weeks]
Targeted Experiments Discovery
High-Throughput Experiments Discovery
… ATTCGGATATTTAAGGC …
… ATTCGGGTATTTAAGCC …
Healthy / Disease (e.g., Alzheimer's, Cancer)
“A Decade Later, Genetic Maps Yield Few New Cures” New York Times, June 2010.
Human genome: 3 billion base pairs
Potential variations: > 10 million variants
Combinations: > 10^1000000 (a 1 followed by one million zeros)
Machine learning problem
Atomic features: > 10 million
Feature combinations: too many to enumerate
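To get a feel for this scale, the following is a minimal sketch that assumes each variant is simply present or absent and independent of the others (a simplification not stated on the slide). Under that assumption, n variants give 2^n combinations; the function name is illustrative only.

```python
import math


def log10_combinations(n_variants: int) -> float:
    """Return log10 of 2**n_variants, the count of present/absent combinations."""
    return n_variants * math.log10(2)


# Assumption: 10 million independent binary variants.
exponent = log10_combinations(10_000_000)
print(f"2^10,000,000 is roughly 10^{exponent:,.0f}")
```

The resulting exponent is well above 1,000,000, which is consistent with the lower bound quoted above.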
Discovery
High-Throughput Experiments
Hundreds of mutations
Most are “passenger”, not driver
Can we identify likely drivers?
Normal cells: … ATTCGGATATTTAAGGC …
Tumor cells: … ATTCGGGTATTTAAGCC …
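As a toy illustration of comparing normal and tumor sequence, the sketch below lists the positions where the two fragments shown in these slides differ. It assumes the fragments are already aligned and of equal length, and that the first fragment is the normal one (inferred from the slide layout, not stated). Real pipelines rely on read alignment and variant callers; `find_mismatches` is a hypothetical helper.

```python
from typing import List, Tuple


def find_mismatches(normal: str, tumor: str) -> List[Tuple[int, str, str]]:
    """Return (0-based position, normal base, tumor base) for each difference."""
    if len(normal) != len(tumor):
        raise ValueError("sequences must be pre-aligned and of equal length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(normal, tumor))
        if a != b
    ]


normal = "ATTCGGATATTTAAGGC"  # assumed to be the normal-cell fragment
tumor = "ATTCGGGTATTTAAGCC"   # assumed to be the tumor-cell fragment
mismatches = find_mismatches(normal, tumor)
print(mismatches)  # [(6, 'A', 'G'), (15, 'G', 'C')]
```

A comparison like this only finds candidate mutations; it does not say which of them are drivers.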
Hanahan & Weinberg [Cell 2011]
Subtypes with alternative pathway profile
Compensatory pathways can be activated
Gene A: DNA → mRNA → Protein → Active Protein (Transcription, Translation, Activation)
Genes A, B, C: each DNA → mRNA → Protein → Active Protein, with a Transcription Factor and a Protein Kinase
24 million abstracts
Two new abstracts every minute
Adds over one million every year
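The per-minute and per-year figures can be checked against each other with simple arithmetic. This is a minimal sketch assuming a constant rate and a 365-day year; the names are illustrative.

```python
MINUTES_PER_YEAR = 60 * 24 * 365  # assumes a 365-day year


def abstracts_per_year(per_minute: float) -> float:
    """Convert a constant per-minute rate into a yearly total."""
    return per_minute * MINUTES_PER_YEAR


yearly = abstracts_per_year(2)
print(yearly)  # 1051200
```

Two abstracts per minute works out to just over one million per year, so the two figures agree.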
PMID: 123 … VDR+ binds to SMAD3 to form …
PMID: 456 … JUN expression is induced by SMAD3/4 …
……
[Figure: nested event annotation. Entities: PROTEIN, PROTEIN, PROTEIN, CELL. Events: REGULATION, REGULATION, REGULATION. Argument roles: Theme, Cause, Site.]
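One way to picture such nested annotations is as a small tree in which an event's arguments can be entities or other events. The sketch below is only an illustrative data structure, not the annotation format used by the shared task; the class names, the trigger words, and the outer event with the placeholder `PROTEIN_X` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass(frozen=True)
class Entity:
    type: str  # e.g. "PROTEIN" or "CELL"
    text: str


@dataclass
class Event:
    type: str     # e.g. "REGULATION"
    trigger: str  # word that signals the event in the sentence
    # Role name ("Theme", "Cause", "Site") -> entity or nested event.
    args: Dict[str, Union[Entity, "Event"]] = field(default_factory=dict)


def depth(node: Union[Entity, Event]) -> int:
    """Nesting depth: 0 for an entity, 1 + the deepest argument for an event."""
    if isinstance(node, Entity):
        return 0
    return 1 + max((depth(arg) for arg in node.args.values()), default=0)


# Inner event loosely based on "JUN expression is induced by SMAD3/4".
inner = Event(
    "REGULATION",
    "induced",
    {"Theme": Entity("PROTEIN", "JUN"), "Cause": Entity("PROTEIN", "SMAD3/4")},
)
# Hypothetical outer event that takes the inner event as its Theme.
outer = Event(
    "REGULATION",
    "blocks",
    {"Theme": inner, "Cause": Entity("PROTEIN", "PROTEIN_X")},
)
print(depth(inner), depth(outer))  # 1 2
```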
TP53 inhibits BCL2.
Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.
BCL2 transcription is suppressed by P53 expression.
The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …
……
GENIA (BioNLP Shared Task 2009-2013)
1999 abstracts
MeSH: human, blood cell, transcription factor
Challenge for “supervised” machine learning
Can we breach this bottleneck?
Similar context
Probably similar meaning
Annotation as latent variables
Unsupervised semantic parsing
Poon & Domingos, “Unsupervised Semantic Parsing”. EMNLP 2009. Best Paper Award.
inhibits, down-regulates, suppresses, inhibition, …
Theme: BCL2, BCL-2 proteins, B-cell CLL/Lymphoma 2, ……
Cause: TP53, Tumor suppressor P53, ……
Many KBs available
Gene/Protein: GenBank, UniProt, …
Pathways: NCI, Reactome, KEGG, BioCarta, …
Annotation as latent variables
Grounded semantic parsing
HGNC
ID      Symbol   Alias
990     BCL2     B-cell CLL/Lymphoma 2, …
11998   TP53     Tumor suppressor P53, …
…       …        …
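Grounding a text mention to a KB entry can be pictured, in its simplest form, as a case-insensitive lookup over symbols and aliases. The sketch below uses only the two HGNC rows shown above, with their alias lists truncated as on the slide; `build_lexicon` and `ground` are hypothetical helpers, and exact matching is an assumption made for brevity.

```python
from typing import Dict, List, Optional, Tuple

# Rows copied from the HGNC excerpt above (alias lists are truncated there too).
HGNC_ROWS: List[Tuple[int, str, List[str]]] = [
    (990, "BCL2", ["B-cell CLL/Lymphoma 2"]),
    (11998, "TP53", ["Tumor suppressor P53"]),
]


def build_lexicon(rows: List[Tuple[int, str, List[str]]]) -> Dict[str, int]:
    """Map each lower-cased symbol or alias to its HGNC ID."""
    lexicon: Dict[str, int] = {}
    for hgnc_id, symbol, aliases in rows:
        for name in [symbol, *aliases]:
            lexicon[name.lower()] = hgnc_id
    return lexicon


def ground(mention: str, lexicon: Dict[str, int]) -> Optional[int]:
    """Return the HGNC ID for an exact (case-insensitive) match, else None."""
    return lexicon.get(mention.strip().lower())


lexicon = build_lexicon(HGNC_ROWS)
print(ground("Tumor suppressor P53", lexicon))  # 11998
print(ground("BCL-2 proteins", lexicon))        # None: not an exact alias here
```

Exact lookup misses variants such as "BCL-2 proteins", which is one reason a learned, grounded parser is attractive.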
NCI-PID Pathway KB
Regulation   Theme   Cause
Positive     A2M     FOXO1
Positive     ABCB1   TP53
Negative     BCL2    TP53
…            …       …
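The basic idea of using a pathway KB as indirect supervision can be sketched with plain distant supervision: any sentence that mentions both genes of a KB entry is labelled with that entry's regulation type. This is deliberately the naive baseline with exact symbol matching, not the grounded semantic parsing approach of the talk; `distant_labels` is a hypothetical helper and the KB holds only the three rows shown above.

```python
import re
from typing import Dict, List, Tuple

# (theme, cause) -> regulation, copied from the NCI-PID excerpt above.
PATHWAY_KB: Dict[Tuple[str, str], str] = {
    ("A2M", "FOXO1"): "Positive",
    ("ABCB1", "TP53"): "Positive",
    ("BCL2", "TP53"): "Negative",
}


def distant_labels(
    sentences: List[str], kb: Dict[Tuple[str, str], str]
) -> List[Tuple[str, str, str, str]]:
    """Label sentences that contain both gene symbols of a KB entry."""
    labelled = []
    for sentence in sentences:
        tokens = set(re.findall(r"[A-Za-z0-9]+", sentence.upper()))
        for (theme, cause), regulation in kb.items():
            if theme in tokens and cause in tokens:
                labelled.append((sentence, theme, cause, regulation))
    return labelled


sentences = [
    "TP53 inhibits BCL2.",
    "BCL2 transcription is suppressed by P53 expression.",
]
labels = distant_labels(sentences, PATHWAY_KB)
print(labels)
```

Only the first sentence is labelled, because the second writes "P53" rather than the symbol "TP53"; handling such variation is where alias grounding comes in.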
Poon, “Grounded Unsupervised Semantic Parsing”. ACL 2013.
Supervised Unsupervised
Generalize distant supervision:
Prior: Favor semantic parse grounded in KB
Outperformed the majority of participants in
Parikh, Poon, Toutanova. In Progress.
Poon et al., “Literome: PubMed-Scale Genomic Knowledge Base in the Cloud”, Bioinformatics 2014.
Preliminary pass:
2 million instances
13,000 genes, 870,000 unique regulations
Applications:
UCSC Genome Browser, MSR Interactions Track
Expression profile modeling
Validate de novo pathway prediction
Etc.
Poon, Toutanova, Quirk, “Distant Supervision for Cancer Pathway Extraction from Text”. PSB 2015. To appear.
Deep Model Big Data Rich Knowledge Hypotheses Experiments
Extract richer knowledge:
Cell type, experimental condition, …
Hedging, negation, …
Formulate coherent models:
Supporting evidence, contradiction, …
Intellectual gaps, hypotheses, …
Integrate with data & experiments:
Cancer panomics
Driver genes / pathways
Single-drug response
Drug combo prioritization
42-million program
Reading, Assembly, Explanation
Domain: Cancer signaling pathways
We are in
PI: Andrey Rzhetsky
Co-PI with James Evans, Ross King
Berkeley AMP Lab, OHSU, Microsoft Research
Precision medicine is the future
Cancer systems modeling
Extract pathways from PubMed
Literome: KB for genomic medicine
U. Chicago: Andrey Rzhetsky, Kevin White
OHSU: Brian Druker, Jeff Tyner
Berkeley AMP Lab: David Patterson
U. Wisconsin: Anthony Gitter
Microsoft Research: Chris Quirk, Kristina