cancer panomics

Cancer Panomics (Hoifung Poon) - PowerPoint PPT Presentation



  1. Semantic Parsing for Cancer Panomics. Hoifung Poon

  2. Overview [diagram labels: High-Throughput Data (… ATTCGG A TATTTAAG G C …, … ATTCGGGTATTTAAGCC …), KB, Disease Genes, Drug Targets]

  3. Overview [same diagram, with the added label: Infer cancer driver mutations]

  4. Overview [same diagram, with the added labels: Grounded Unsupervised Semantic Parsing; Extract Pathways from Pubmed]

  5. Collaborators: David Heckerman, Kristina Toutanova, Chris Quirk, Lucy Vanderwende, Tony Gitter, Ankur Parikh

  6. Precision Medicine

  7. Vemurafenib on BRAF-V600 Melanoma [images: Before Treatment; 15 Weeks]

  8. Vemurafenib on BRAF-V600 Melanoma [images: Before Treatment; 15 Weeks; 23 Weeks]

  9. (No text on this slide.)

  10. Traditional Biology: Targeted Experiments, Discovery, One hypothesis

  11. Genomics: High-Throughput Experiments, Discovery, Many hypotheses [sequence reads: … ATTCGG A TATTTAAG G C …, … ATTCGGGTATTTAAGCC …]

  12. Genome-Wide Association Studies (GWAS)
      - Disease (e.g., Alzheimer, Cancer): … ATTCGG A TATTTAAG G C …
      - Healthy: … ATTCGGGTATTTAAGCC …
      - 2000: “Genetic diagnosis of diseases would be accomplished in 10 years and that treatments would start to roll out perhaps five years after that.”
      - 2010: “A Decade Later, Genetic Maps Yield Few New Cures” (New York Times, June 2010)

  13. Key Challenges
      - Human genome: 3 billion base pairs
      - Potential variations: > 10 million mutations
      - Combination: > 10^1000000 (1 million zeros)
      - Machine learning problem
        - Atomic features: > 10 million
        - Feature combination: too many to enumerate
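
The size of the combination space can be checked with a short calculation. This is a rough sketch that assumes each of the 10 million candidate variants is simply present or absent, giving 2^n joint patterns; that binary model and the function name are illustrative assumptions, not something stated on the slide.

```python
import math

def log10_num_combinations(num_variants: int) -> float:
    """Return log10 of 2**num_variants, the number of present/absent patterns."""
    return num_variants * math.log10(2)

# Assumed count of candidate variants, taken from the slide's "> 10 million".
exponent = log10_num_combinations(10_000_000)
print(f"2^10,000,000 is about 10^{exponent:,.0f}")
```

Under this assumption the count has about three million digits, which is consistent with the slide's lower bound of 10^1000000.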

  14. Genomics: High-Throughput Experiments, Discovery. How to Scale Discovery? [sequence reads: … ATTCGG A TATTTAAG G C …, … ATTCGGGTATTTAAGCC …]

  15. Cancer
      - Tumor cells: … ATTCGG A TATTTAAG G C …
      - Normal cells: … ATTCGGGTATTTAAGCC …
      - Hundreds of mutations
      - Most are “passenger”, not driver
      - Can we identify likely drivers?

  16. Panomics: Genome (… ATTCGG A TATTTAAG G C …), Transcriptome, Epigenome, ……

  17. Pathway Knowledge: Genes work synergistically in pathways

  18. Why Hard to Identify Drivers?
      - Complex diseases
      - Synergistic perturbation of multiple pathways
      - Cancer: 6 to 8 “hallmarks”
        - Promote growth
        - Avoid suicide
        - Evade immune attack
        - Induce blood vessels
        - Invade neighboring tissues
        - …

  19. Hanahan & Weinberg [Cell 2011]

  20. Why Cancer Comes Back?
      - Subtypes with alternative pathway profile
      - Compensatory pathways can be activated
      - [diagram labels: EphA2, EphB2, Ovarian Cancer]

  21. Why Cancer Comes Back?
      - Subtypes with alternative pathway profile
      - Compensatory pathways can be activated
      - [diagram labels: EphA2, EphB2, X, Ovarian Cancer]

  22. A Grammar of Cancer?
      - Cancer → Anti-Apoptosis & ProGrowth & …
      - Anti-Apoptosis → Deactivate TP53
      - Anti-Apoptosis → Activate BCL-2
      - …
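
One way to make the grammar idea concrete is to store the productions as data and test whether a set of observed events derives "Cancer". This is a minimal sketch under assumptions: "&" is read as conjunction, repeated left-hand sides as alternatives, and any symbol without a production (including ProGrowth, whose expansion the slide elides) is treated as a directly observable placeholder. The function name `derives` is hypothetical.

```python
# Productions copied from the slide; the trailing "..." items are left out.
GRAMMAR = {
    "Cancer": [["Anti-Apoptosis", "ProGrowth"]],
    "Anti-Apoptosis": [["Deactivate TP53"], ["Activate BCL-2"]],
}

def derives(symbol, observed):
    """Return True if `symbol` can be derived from the observed events.

    Symbols with no production are assumed to be observable terminals.
    """
    if symbol not in GRAMMAR:
        return symbol in observed
    # Any alternative may fire; within one alternative every part must hold.
    return any(all(derives(part, observed) for part in rhs)
               for rhs in GRAMMAR[symbol])

print(derives("Cancer", {"Activate BCL-2", "ProGrowth"}))
print(derives("Cancer", {"Activate BCL-2"}))
```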

  23. Infer Cancer Driver Mutations
      - Gene A: DNA → (Transcription) → mRNA → (Translation) → Protein → (Activation) → Active Protein
      - What’s the level of activity?
      - Is change caused by mutation? (… ATTCGG A TATTTAAG G C …)

  24. Pathway Knowledge [diagram: Gene A, Gene B, and Gene C, each with the chain DNA, mRNA, Protein, Active Protein; labels: Transcription Factor, Protein Kinase]

  25. Pathway Knowledge [same diagram, marked “?”]

  26. Pathway Knowledge [same diagram, marked “?”]

  27. Pathway Knowledge [same diagram, marked “!”]

  28. Approach: Graph HMM [same diagram]

  29. Extract Pathways from Pubmed [diagram labels: High-Throughput Data (… ATTCGG A TATTTAAG G C …, … ATTCGGGTATTTAAGCC …), KB, Disease Genes, Drug Targets]

  30. PubMed
      - 22 million abstracts
      - Two new abstracts every minute
      - Adds 2000-4000 every day

  31. Extract Pathways from Pubmed
      - PMID: 123 … VDR+ binds to SMAD3 to form …
      - PMID: 456 … JUN expression is induced by SMAD3/4 …
      - ……

  32. Extract Complex Knowledge
      - Sentence: “Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1 ...”
      - Nodes: Involvement, up-regulation, activation, p70(S6)-kinase, IL-10, gp41, human monocyte

  33. Extract Complex Knowledge [same sentence, nodes typed]
      - REGULATION: Involvement, up-regulation, activation
      - PROTEIN: p70(S6)-kinase, IL-10, gp41
      - CELL: human monocyte

  34. Extract Complex Knowledge [same sentence, edges labeled]
      - Involvement: Theme = up-regulation, Cause = activation
      - up-regulation: Theme = IL-10, Cause = gp41, Site = human monocyte
      - activation: Theme = p70(S6)-kinase

  35. Extract Complex Knowledge [same typed, labeled graph, captioned “Semantic Parsing”]

  36. Bottleneck: Annotated Examples
      - GENIA (BioNLP Shared Task 2009-2013)
      - 1999 abstracts
      - MeSH: human, blood cell, transcription factor
      - Can we breach the annotation bottleneck?

  37. Free Lunch #1: Distributional Similarity
      - Similar context → probably similar meaning
      - Annotation as latent variables: textual expression → recursive clusters
      - Unsupervised semantic parsing
      - Poon & Domingos, “Unsupervised Semantic Parsing”. EMNLP-2009 (Best Paper Award).

  38. Problem Formulation [labels: Dependency tree; Semantic parse; Probability; Parsing; Learning; Prior: Favor fewer parameters]

  39. Free Lunch #2: Existing KBs
      - Many KBs available
        - Gene/Protein: GenBank, UniProt, …
        - Pathways: NCI, Reactome, KEGG, BioCarta, …
      - Annotation as latent variables: textual expression → table, column, join, …
      - Grounded unsupervised semantic parsing
      - Poon, “Grounded Unsupervised Semantic Parsing”. ACL-13.

  40. Natural-Language Interface to Database
      - Question: Get flight from Toronto to San Diego stopping at DTW
      - Query:
          SELECT flight.flight_id
          FROM flight, city, city city2, flight_stop, airport_service, airport_service as2
          WHERE flight.from_airport = airport_service.airport_code
            AND flight.to_airport = as2.airport_code
            AND airport_service.city_code = city.city_code
            AND as2.city_code = city2.city_code
            AND city.city_name = 'toronto'
            AND city2.city_name = 'san diego'
            AND flight_stop.flight_id = flight.flight_id
            AND flight_stop.stop_airport = 'dtw'
      - Answers
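
To see what this query returns, it can be run against a toy in-memory database. The sketch below assumes a simplified schema with only the columns the query touches, made-up rows, and invented city and airport codes; none of that comes from the slide. It uses the alias `city2` for the second copy of `city`, which the WHERE clause requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema and data, just enough to exercise the query.
conn.executescript("""
CREATE TABLE city (city_code TEXT, city_name TEXT);
CREATE TABLE airport_service (city_code TEXT, airport_code TEXT);
CREATE TABLE flight (flight_id INTEGER, from_airport TEXT, to_airport TEXT);
CREATE TABLE flight_stop (flight_id INTEGER, stop_airport TEXT);
INSERT INTO city VALUES ('TOR', 'toronto'), ('SAN', 'san diego');
INSERT INTO airport_service VALUES ('TOR', 'yyz'), ('SAN', 'san');
INSERT INTO flight VALUES (1, 'yyz', 'san'), (2, 'yyz', 'san');
INSERT INTO flight_stop VALUES (1, 'dtw'), (2, 'ord');
""")

QUERY = """
SELECT flight.flight_id
FROM flight, city, city city2, flight_stop,
     airport_service, airport_service as2
WHERE flight.from_airport = airport_service.airport_code
  AND flight.to_airport = as2.airport_code
  AND airport_service.city_code = city.city_code
  AND as2.city_code = city2.city_code
  AND city.city_name = 'toronto'
  AND city2.city_name = 'san diego'
  AND flight_stop.flight_id = flight.flight_id
  AND flight_stop.stop_airport = 'dtw'
"""

# Only flight 1 stops at 'dtw' in the toy data.
rows = conn.execute(QUERY).fetchall()
print(rows)
```

The query needs two copies each of `city` and `airport_service` because the origin and destination are resolved through the same tables, which is part of why a short question maps to a long join.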

  41. Clusters → KB Elements
      - Entity: Table, Column, Cell
      - Relation: Relational join
      - Priors:
        - Favor lexical similarity
        - Favor short relational joins

  42. GUSP: Key Ideas
      - Leverage target database
      - Bootstrap learning with lexical prior
      - Example table JOB with columns Job ID, Company, System and rows (001, IBM, Unix), (002, Roche, IBM), (003, Microsoft, Windows), ……
      - Prior: Favor Unix → System
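
The lexical prior on this slide can be illustrated with a lookup from cell values to the columns that contain them. This is only a sketch: exact string matching stands in for whatever similarity measure GUSP actually uses, and the function names are hypothetical. The rows are the ones shown on the slide.

```python
from collections import defaultdict

# Rows of the JOB table as shown on the slide.
JOB_ROWS = [
    {"Job ID": "001", "Company": "IBM", "System": "Unix"},
    {"Job ID": "002", "Company": "Roche", "System": "IBM"},
    {"Job ID": "003", "Company": "Microsoft", "System": "Windows"},
]

def build_value_index(rows):
    """Map each lowercased cell value to the set of columns it occurs in."""
    index = defaultdict(set)
    for row in rows:
        for column, value in row.items():
            index[value.lower()].add(column)
    return index

def candidate_columns(token, index):
    """Columns an exact-match lexical prior would favor for this token."""
    return sorted(index.get(token.lower(), set()))

index = build_value_index(JOB_ROWS)
print(candidate_columns("Unix", index))
print(candidate_columns("IBM", index))
```

"Unix" maps to a single column, while "IBM" appears under both Company and System, so a prior like this can only bootstrap learning; the remaining ambiguity has to be resolved from context.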

  43. GUSP: Key Ideas
      - Leverage target database
      - [tables: Flight (Flight ID, From Airport, ……); Airport (Airport Code, Airport Name, ……); linked by a Foreign Key]

  44. GUSP: Key Ideas
      - Leverage target database
      - [tables: Flight, Airport]

  45. GUSP: Key Ideas
      - Leverage target database
      - [tables: Airline, Days, Fare, Flight, Airport]

  46. GUSP: Key Ideas
      - Leverage target database
      - [tables: Airline, Days, Fare, Flight, Airport; words “flight” and “BWI” marked “?”]

  47. GUSP: Key Ideas
      - Leverage target database
      - Leverage schema to guide learning
      - Prior: Favor shorter join
      - [tables: Airline, Days, Fare, Flight, Airport; words “flight” and “BWI”]
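
A prior that favors shorter joins needs some notion of join length. One simple reading, sketched below, is the number of foreign-key hops between two tables, found by breadth-first search over the schema graph. Only the Flight-Airport link appears on the slides; every other edge here, and the function name, are assumptions made for illustration.

```python
from collections import deque

# Hypothetical foreign-key graph over the tables named on the slides.
SCHEMA_EDGES = {
    "Flight": ["Airport", "Airline", "Days", "Fare"],
    "Airport": ["Flight", "Fare"],
    "Airline": ["Flight"],
    "Days": ["Flight"],
    "Fare": ["Flight", "Airport"],
}

def shortest_join_path(start, goal, edges):
    """Breadth-first search for the join path with the fewest hops."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in edges.get(path[-1], []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # the two tables are not connected

print(shortest_join_path("Flight", "Airport", SCHEMA_EDGES))
print(shortest_join_path("Airline", "Airport", SCHEMA_EDGES))
```

With this graph, linking “flight” to “BWI” through the direct Flight-Airport key is preferred over the longer route through Fare.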

  48. Free Lunch #3: Dependency Parses
      - Start from syntactic parse
      - Rich resources and available parsers
      - Intractable structure learning
      - Tree HMM
      - Exact inference is linear-time
      - Need to handle syntax-semantics mismatch
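
The linear-time claim for tree HMMs can be illustrated with a single upward (leaf-to-root) pass that visits each node once. The sketch below is a generic tree HMM with two hidden states and made-up parameters, not the model from the talk; all names and numbers are assumptions. A brute-force sum over every state assignment is included only to check the result.

```python
from itertools import product

STATES = [0, 1]
PRIOR = [0.6, 0.4]                                   # P(root state)
TRANS = [[0.7, 0.3], [0.2, 0.8]]                     # P(child state | parent state)
EMIT = [{"a": 0.9, "b": 0.1}, {"a": 0.3, "b": 0.7}]  # P(word | state)

# Toy tree: node -> children, plus the observed word at each node.
CHILDREN = {0: [1, 2], 1: [3], 2: [], 3: []}
WORDS = {0: "a", 1: "b", 2: "a", 3: "b"}

def upward(node):
    """Return beta[s] = P(words in the subtree of node | state of node = s)."""
    beta = [EMIT[s][WORDS[node]] for s in STATES]
    for child in CHILDREN[node]:
        child_beta = upward(child)
        for s in STATES:
            beta[s] *= sum(TRANS[s][t] * child_beta[t] for t in STATES)
    return beta

def likelihood():
    """Probability of all observed words, from one pass over the tree."""
    root_beta = upward(0)
    return sum(PRIOR[s] * root_beta[s] for s in STATES)

def brute_force():
    """Same quantity by summing over every joint state assignment."""
    nodes = sorted(CHILDREN)
    total = 0.0
    for assignment in product(STATES, repeat=len(nodes)):
        p = PRIOR[assignment[0]]
        for n in nodes:
            p *= EMIT[assignment[n]][WORDS[n]]
            for c in CHILDREN[n]:
                p *= TRANS[assignment[n]][assignment[c]]
        total += p
    return total

print(likelihood(), brute_force())
```

The upward pass does a fixed amount of work per edge (quadratic in the number of states), so its cost grows linearly with the size of the tree, whereas the brute-force check grows exponentially.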

  49. Syntax-Semantics Mismatch [dependency tree over the words: get, flight, from, toronto, to, san, diego, stopping, at, dtw]

  50. Syntax-Semantics Mismatch [same dependency tree]

  51. Syntax-Semantics Mismatch [same dependency tree]
