
Machine Reading for Cancer Panomics, by Hoifung Poon



  1. Machine Reading for Cancer Panomics. Hoifung Poon.

  2. Overview. Diagram labels: DNA sequences (… ATTCGGATATTTAAGGC … vs. … ATTCGGGTATTTAAGCC …), Disease Genes, Drug Targets, High-Throughput Data, KB, Cancer Systems Modeling.

  3. Overview. Diagram labels: DNA sequences (… ATTCGGATATTTAAGGC … vs. … ATTCGGGTATTTAAGCC …), Disease Genes, Drug Targets, High-Throughput Data, KB, Extract Pathways from PubMed, Grounded Semantic Parsing.

  4. Precision Medicine

  5. Vemurafenib on BRAF-V600 Melanoma. Images: before treatment; 15 weeks.

  6. Vemurafenib on BRAF-V600 Melanoma. Images: before treatment; 15 weeks; 23 weeks.

  7. (No text on this slide.)

  8. Traditional Biology. Targeted Experiments; Discovery; One hypothesis.

  9. Genomics. (DNA sequence graphics.) High-Throughput Experiments; Discovery; Many hypotheses.

  10. Genome-Wide Association Studies (GWAS). Compare DNA sequences from Disease (e.g., Alzheimer, Cancer) and Healthy groups. 2000: “Genetic diagnosis of diseases would be accomplished in 10 years and that treatments would start to roll out perhaps five years after that.” 2010: “A Decade Later, Genetic Maps Yield Few New Cures”. New York Times, June 2010.

  11. Key Challenges. Human genome: 3 billion base pairs. Potential variations: > 10 million variants. Combinations: > 10^1000000 (a 1 followed by 1 million zeros). As a machine learning problem: atomic features > 10 million; feature combinations too many to enumerate.
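
The combination count on slide 11 can be checked with a few lines of arithmetic. This is a minimal sketch under one stated assumption that is not on the slide: each of the 10 million variants is treated as simply present or absent, so the number of combinations is 2 to the power of the variant count.

```python
import math

NUM_VARIANTS = 10_000_000  # "> 10 million variants" from slide 11

# Assumption (not from the slide): each variant is either present or absent,
# so there are 2**NUM_VARIANTS possible combinations.
# Work in log space: log10(2**n) = n * log10(2).
exponent = NUM_VARIANTS * math.log10(2)

print(f"2^{NUM_VARIANTS:,} is about 10^{exponent:,.0f}")
```

Under this assumption the exponent is roughly 3 million, which is consistent with the slide's lower bound of 10^1000000.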

  12. Genomics. (DNA sequence graphics.) High-Throughput Experiments; Discovery. How to scale discovery?

  13. Cancer. Tumor cells vs. normal cells (DNA sequence graphics). Hundreds of mutations. Most are “passengers”, not drivers. Can we identify likely drivers?

  14. Panomics. (DNA sequence graphic.) Genome, Transcriptome, Epigenome, …

  15. Pathway Knowledge. Genes work synergistically in pathways.

  16. Why Is It Hard to Identify Drivers? Complex diseases perturb multiple pathways. Hanahan & Weinberg [Cell 2011].

  17. Why Does Cancer Come Back? Subtypes with alternative pathway profiles. Compensatory pathways can be activated. Diagram labels: EphA2, EphB2, Ovarian Cancer.

  18. Why Does Cancer Come Back? Subtypes with alternative pathway profiles. Compensatory pathways can be activated. Diagram labels: EphA2, EphB2, an X mark, Ovarian Cancer.

  19. Cancer Systems Modeling. Gene A: DNA → (Transcription) → mRNA → (Translation) → Protein → (Activation) → Protein Active. Other labels: Functional activity, Mutation effect, Drug Target, DNA sequence graphic.

  20. Knowledge → Model. Genes A, B, and C, each with DNA, mRNA, Protein, Protein Active. Link labels: Transcription Factor (between Gene A and Gene B), Protein Kinase (between Gene B and Gene C).

  21. Knowledge → Model? Genes A, B, and C, each with DNA, mRNA, Protein, Protein Active. Link labels: Transcription Factor (between Gene A and Gene B), Protein Kinase (between Gene B and Gene C).

  22. Knowledge → Model? Genes A, B, and C, each with DNA, mRNA, Protein, Protein Active. Link labels: Transcription Factor (between Gene A and Gene B), Protein Kinase (between Gene B and Gene C).

  23. Knowledge → Model! Genes A, B, and C, each with DNA, mRNA, Protein, Protein Active. Link labels: Transcription Factor (between Gene A and Gene B), Protein Kinase (between Gene B and Gene C).

  24. Approach: Graph HMM. Genes A, B, and C, each with DNA, mRNA, Protein, Protein Active. Link labels: Transcription Factor (between Gene A and Gene B), Protein Kinase (between Gene B and Gene C).

  25. Extract Pathways from PubMed. Diagram labels: DNA sequences (… ATTCGGATATTTAAGGC … vs. … ATTCGGGTATTTAAGCC …), Disease Genes, Drug Targets, High-Throughput Data, KB.

  26. PubMed. 24 million abstracts. Two new abstracts every minute. Adds over one million every year.
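
The two growth figures on slide 26 agree with each other, as a quick calculation shows. This sketch assumes a constant rate and a 365-day year, neither of which the slide states.

```python
ABSTRACTS_PER_MINUTE = 2  # "two new abstracts every minute" from slide 26

# Assumption: the rate is constant and the year has 365 days.
MINUTES_PER_YEAR = 60 * 24 * 365

abstracts_per_year = ABSTRACTS_PER_MINUTE * MINUTES_PER_YEAR
print(f"{abstracts_per_year:,} abstracts per year")
```

That comes to 1,051,200 abstracts per year, matching the slide's "over one million every year".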

  27. Machine Reading. PMID: 123: “… VDR+ binds to SMAD3 to form …”. PMID: 456: “… JUN expression is induced by SMAD3/4 …”. …

  28. Machine Reading. Example: “Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1 ...”

  29. Machine Reading. Example: “Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1 ...” Entities: p70(S6)-kinase (PROTEIN), gp41 (PROTEIN), IL-10 (PROTEIN), human monocyte (CELL).

  30. Machine Reading. Example: “Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1 ...” Events: Involvement (REGULATION) with Cause and Theme arguments; up-regulation and activation (both REGULATION) with Site, Theme, and Cause arguments. Entities: p70(S6)-kinase (PROTEIN), gp41 (PROTEIN), IL-10 (PROTEIN), human monocyte (CELL).

  31. Machine Reading. Example: “Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1 ...” Semantic Parsing. Events: Involvement (REGULATION) with Cause and Theme arguments; up-regulation and activation (both REGULATION) with Site, Theme, and Cause arguments. Entities: p70(S6)-kinase (PROTEIN), gp41 (PROTEIN), IL-10 (PROTEIN), human monocyte (CELL).
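
The nested event structure in the Machine Reading example can be represented as a small tree. The sketch below is one plausible reading of the flattened diagram, not a confirmed transcription: the class names `Entity` and `Event` are hypothetical, and the assignment of arguments to triggers (which Theme, Cause, and Site belongs to which event) is an assumption based on the sentence itself.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Entity:
    text: str
    type: str  # e.g. PROTEIN or CELL, as labeled on the slide


@dataclass
class Event:
    trigger: str
    type: str  # all three events on the slide are REGULATION
    args: Dict[str, Union["Event", Entity]] = field(default_factory=dict)


# Assumed argument assignment (the slide text does not preserve the edges).
activation = Event("activation", "REGULATION",
                   {"Theme": Entity("p70(S6)-kinase", "PROTEIN")})
up_regulation = Event("up-regulation", "REGULATION", {
    "Theme": Entity("IL-10", "PROTEIN"),
    "Cause": Entity("gp41", "PROTEIN"),
    "Site": Entity("human monocyte", "CELL"),
})
involvement = Event("Involvement", "REGULATION",
                    {"Theme": up_regulation, "Cause": activation})


def depth(node: Union[Event, Entity]) -> int:
    """Nesting depth: entities are 0, an event is 1 + its deepest argument."""
    if isinstance(node, Entity):
        return 0
    return 1 + max((depth(arg) for arg in node.args.values()), default=0)
```

Here `depth(involvement)` is 2, since the top-level event takes other events as arguments. That nesting is what separates this task from flat pairwise relation extraction.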

  32. Long Tail of Variations. “TP53 inhibits BCL2.” “Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.” “BCL2 transcription is suppressed by P53 expression.” “The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …” …

  33. Bottleneck: Annotated Examples. GENIA (BioNLP Shared Task 2009-2013): 1999 abstracts; MeSH: human, blood cell, transcription factor. Challenge for “supervised” machine learning. Can we breach this bottleneck?

  34. Free Lunch #1: Distributional Similarity. Similar context → probably similar meaning. Annotation as latent variables: textual expression → recursive clusters. Unsupervised semantic parsing. Poon & Domingos, “Unsupervised Semantic Parsing”. EMNLP 2009. Best Paper Award.

  35. Recursive Clustering. “TP53 inhibits BCL2.” “Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.” “BCL2 transcription is suppressed by P53 expression.” “The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …” …

  36. Recursive Clustering. “TP53 inhibits BCL2.” “Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.” “BCL2 transcription is suppressed by P53 expression.” “The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …” …

  37. Recursive Clustering. “TP53 inhibits BCL2.” “Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.” “BCL2 transcription is suppressed by P53 expression.” “The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …” …

  38. Recursive Clustering. “TP53 inhibits BCL2.” “Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.” “BCL2 transcription is suppressed by P53 expression.” “The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …” … Clusters: predicate {inhibits, down-regulates, suppresses, inhibition, …}; Theme {BCL2, BCL-2 proteins, B-cell CLL/Lymphoma 2, …}; Cause {TP53, Tumor suppressor P53, …}.
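
The distributional idea behind the clusters above (expressions that occur with the same arguments probably mean the same thing) can be illustrated with a toy grouping. This is a deliberately simplified sketch and not the method of the cited paper. The function name `cluster_by_context` is hypothetical, the (Cause, Theme) pairs are assumed to be already normalized to gene symbols, and the passive form in the third sentence is assumed to be mapped to "suppresses".

```python
from collections import defaultdict

# (predicate, (cause, theme)) observations, assumed to be pre-normalized.
# The first four come from the slide's sentences; "binds" is taken from
# the PMID: 123 fragment on slide 27.
observations = [
    ("inhibits", ("TP53", "BCL2")),
    ("down-regulates", ("TP53", "BCL2")),
    ("suppresses", ("TP53", "BCL2")),
    ("inhibition", ("TP53", "BCL2")),
    ("binds", ("VDR", "SMAD3")),
]


def cluster_by_context(observations):
    """Greedily group predicates that share at least one argument pair."""
    contexts = defaultdict(set)
    for predicate, args in observations:
        contexts[predicate].add(args)

    clusters = []  # list of (member predicates, union of their contexts)
    for predicate, ctx in contexts.items():
        for members, shared in clusters:
            if shared & ctx:  # overlapping context: same cluster
                members.add(predicate)
                shared |= ctx
                break
        else:
            clusters.append(({predicate}, set(ctx)))
    return [sorted(members) for members, _ in clusters]


clusters = cluster_by_context(observations)
```

On this toy input the four inhibition-like predicates land in one cluster and "binds" in another. A single greedy pass like this does not merge clusters transitively and ignores syntax, so it only conveys the intuition.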

  39. Free Lunch #2: Existing KBs. Many KBs available. Gene/Protein: GenBank, UniProt, … Pathways: NCI, Reactome, KEGG, BioCarta, … Annotation as latent variables: textual expression → table, column, join, … Grounded semantic parsing.

  40. Entity Extraction. HGNC table (ID, Symbol, Alias): (990, BCL2, “B-cell CLL/Lymphoma 2, …”); (11998, TP53, “Tumor suppressor P53, …”); …

  41. Entity Extraction. HGNC table (ID, Symbol, Alias): (990, BCL2, “B-cell CLL/Lymphoma 2, …”); (11998, TP53, “Tumor suppressor P53, …”); … Sentences: “TP53 inhibits BCL2.” “Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.” “BCL2 transcription is suppressed by P53 expression.” “The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …” …
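
Matching sentence text against the HGNC table can be sketched as a case-insensitive dictionary lookup. This is a minimal illustration, not the system described in the talk. The helper names `build_lexicon` and `extract_entities` are hypothetical, and only the symbols and aliases visible on the slide are included; the aliases elided with "…" are left out, so mentions such as "BCL-2" or a bare "P53" are not recognized here.

```python
import re

# (HGNC ID, symbol, aliases) rows as shown on the slide; elided aliases omitted.
HGNC_ROWS = [
    (990, "BCL2", ["B-cell CLL/Lymphoma 2"]),
    (11998, "TP53", ["Tumor suppressor P53"]),
]


def build_lexicon(rows):
    """Map each lowercased symbol or alias to its (ID, symbol) pair."""
    lexicon = {}
    for hgnc_id, symbol, aliases in rows:
        for name in [symbol, *aliases]:
            lexicon[name.lower()] = (hgnc_id, symbol)
    return lexicon


def extract_entities(sentence, lexicon):
    """Return (mention, HGNC ID, symbol) for each known name in the sentence."""
    # Longest names first so a full alias wins over a shorter name inside it.
    names = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(n) for n in names) + r")(?!\w)",
        re.IGNORECASE,
    )
    return [(m.group(1), *lexicon[m.group(1).lower()])
            for m in pattern.finditer(sentence)]


lexicon = build_lexicon(HGNC_ROWS)
mentions = extract_entities(
    "The inhibition of B-cell CLL/Lymphoma 2 expression by TP53", lexicon)
```

For that sentence the lookup finds the alias "B-cell CLL/Lymphoma 2" (grounded to BCL2, ID 990) and the symbol TP53 (ID 11998). Exact string matching like this is brittle against the long tail of variations, which is part of the motivation for treating annotation as latent variables.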

  42. Relation Extraction. Pathway KB (NCI-PID) table (Regulation, Theme, Cause): (Positive, A2M, FOXO1); (Positive, ABCB1, TP53); (Negative, BCL2, TP53); … Sentences: “TP53 inhibits BCL2.” “Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.” “BCL2 transcription is suppressed by P53 expression.” “The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …” …

  43. Relation Extraction. Grounded Learning. Pathway KB (NCI-PID) table (Regulation, Theme, Cause): (Positive, A2M, FOXO1); (Positive, ABCB1, TP53); (Negative, BCL2, TP53); … Sentences: “TP53 inhibits BCL2.” “Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.” “BCL2 transcription is suppressed by P53 expression.” “The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …” …

  44. Question Answering w.r.t. KB. Accuracy by system: ZC07 84.6 (supervised); FUBL 82.8 (supervised); GUSP 83.5 (unsupervised). Poon, “Grounded Unsupervised Semantic Parsing”. ACL 2013.

  45. Pathway Extraction. Generalize distant supervision: nested events in the KB likely occur in the semantic parse of some sentence. Prior: favor semantic parses grounded in the KB. Outperformed the majority of participants in the original GENIA Event Shared Task. Parikh, Poon, Toutanova. In progress.
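
The basic form of distant supervision that slide 45 generalizes can be sketched as follows: a sentence that mentions both arguments of a KB relation is treated as a candidate expression of that relation. This is a flat, pairwise simplification and not the nested-event approach described in the talk. The function name `distant_labels` is hypothetical, and the entity sets per sentence are assumed to come from an upstream entity extraction step (including the assumption that "P53" resolves to TP53).

```python
# (Regulation, Theme, Cause) rows from the NCI-PID table on slide 42.
KB = [
    ("Positive", "A2M", "FOXO1"),
    ("Positive", "ABCB1", "TP53"),
    ("Negative", "BCL2", "TP53"),
]

# (sentence, normalized entities found in it); the entity sets are assumed.
SENTENCES = [
    ("TP53 inhibits BCL2.", {"TP53", "BCL2"}),
    ("BCL2 transcription is suppressed by P53 expression.", {"BCL2", "TP53"}),
    ("VDR+ binds to SMAD3", {"VDR", "SMAD3"}),  # fragment from slide 27
]


def distant_labels(kb, sentences):
    """Pair each sentence with every KB relation whose arguments it mentions."""
    labeled = []
    for text, entities in sentences:
        for regulation, theme, cause in kb:
            if theme in entities and cause in entities:
                labeled.append((text, regulation, theme, cause))
    return labeled


labeled = distant_labels(KB, SENTENCES)
```

Both TP53/BCL2 sentences are paired with the Negative regulation row, and the VDR fragment gets no label. Labels produced this way are noisy, which is one reason to use the KB as a prior over semantic parses rather than as hard supervision.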
