
Medical Text Data Sendong (Stan) Zhao+, Meng Jiang*, Ming Liu+ - PowerPoint PPT Presentation

CausalTriad: Toward Pseudo Causal Relation Discovery and Hypotheses Generation from Medical Text Data Sendong (Stan) Zhao+, Meng Jiang*, Ming Liu+, Bing Qin+, Ting Liu+ +Harbin Institute of Technology, China *University of Notre


  1. CausalTriad: Toward Pseudo Causal Relation Discovery and Hypotheses Generation from Medical Text Data • Sendong (Stan) Zhao+, Meng Jiang*, Ming Liu+, Bing Qin+, Ting Liu+ • +Harbin Institute of Technology, China • *University of Notre Dame, USA

  2. Pseudo Causal Relation • Gold standard ⁃ Randomized controlled experiments ⁃ Too costly • Observational data ⁃ Structured data, e.g., EHR ⁃ Unstructured data (text data), e.g., medical literature, patient reports • Pseudo causal relation ⁃ Semantic-level causal relations ⁃ Verified true causal knowledge ⁃ Or, have not been identified previously ⁃ Or, no evidence to support them

  3. Previous Studies • Extract causal relations from single sentences • While causal relations usually span multiple sentences • Use only textual information and ignore structural information • While causal relations naturally have an attached network structure • Only extraction rather than inference • While causality itself is a basic logical rule

  4. Causation Transitivity • Preserving transitivity is a basic desideratum for an adequate analysis of causation -- L. A. Paul and Ned Hall, “Causation: A User’s Guide” • (Figure: diagram over nodes A, B, and C.)

  5. Causation Transitivity in Medical Text • Obesity usually increases the risk of diabetes. • People with diabetes have more sugar in blood, called hyperglycemia. • Metformin has become a mainstay of type 2 diabetes management and is now the recommended first-line drug for treating the disease. • (Figure: network over Obesity, Diabetes, Hyperglycemia, and Metformin, with edges labeled "cause" and "?".)
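A minimal sketch of how a transitivity rule can turn known causal edges into new hypotheses. It is an illustration only, not the CausalTriad model: the function name `transitive_hypotheses` and the toy edge list are assumptions, and the edges are taken as given rather than extracted from text.

```python
from itertools import product

def transitive_hypotheses(known_edges):
    """Return (a, c) pairs implied by a -> b and b -> c that are not already known."""
    known = set(known_edges)
    hypotheses = set()
    for (a, b), (b2, c) in product(known, repeat=2):
        # Chain two edges through a shared middle node; skip self-loops and known edges.
        if b == b2 and a != c and (a, c) not in known:
            hypotheses.add((a, c))
    return hypotheses

# Toy input (hypothetical): directed cause -> effect edges.
edges = [("obesity", "diabetes"), ("diabetes", "hyperglycemia")]
hypotheses = transitive_hypotheses(edges)
print(hypotheses)
```

Each returned pair is only a candidate; a hard rule like this treats every chain as valid, which is why a scoring model is still needed to accept or reject the hypothesis.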

  6. Motivation • Jointly utilize ⁃ Textual information (context and co-occurrence) ⁃ Structural information (causation transitivity rule) • Through inference to ⁃ Discover causal relations in text ⁃ Generate new causal relation hypotheses

  7. Problem Definition • Problem: Causal Relation Discovery from Triad Structures • Medical Cause-Effect Candidates Network $G = (V, E)$, $E \subseteq V \times V$ • Triad Structure ⁃ Each triangle in the network ⁃ Basic unit
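As a rough illustration of the "each triangle is a basic unit" idea, the sketch below enumerates triangles in a small candidate network. Treating the candidate edges as undirected, the function name `find_triads`, and the toy edges are all assumptions made for the example.

```python
from itertools import combinations

def find_triads(edges):
    """Enumerate triangles (as sorted node triples) in an undirected candidate network."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    triads = set()
    # Brute force over node triples; fine for a small example, cubic in general.
    for u, v, w in combinations(sorted(adj), 3):
        if v in adj[u] and w in adj[u] and w in adj[v]:
            triads.add((u, v, w))
    return triads

# Toy candidate network (hypothetical).
candidate_edges = [
    ("diabetes", "obesity"),
    ("diabetes", "hyperglycemia"),
    ("obesity", "hyperglycemia"),
    ("diabetes", "metformin"),
]
triads = find_triads(candidate_edges)
print(triads)
```

Here "metformin" has only one neighbor, so it belongs to no triangle and only the three mutually connected entities form a triad.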

  8. Our method • Causal Relation Candidates Matching • 3 Clues for Causal Discovery ⁃ Causal Association ⁃ Contextual Information ⁃ Causal Transitivity Rules • Factor Graph Model

  9. Causal Relation Candidates Matching • Medical Dictionary ⁃ Dryad data package ⁃ TCMonline and TCMID • For every n consecutive sentences • Match medical entities • Pair the matched entities with one another • Every two pairs with a shared entity generate a triad structure • E.g., $(e_i, e_k)$ and $(e_i, e_j)$ generate a triad structure $(e_k, e_i, e_j)$
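The matching steps on this slide can be sketched roughly as follows. This is a simplified illustration under stated assumptions: dictionary lookup is done by naive lowercase substring matching, and the names `match_candidates` and `triads_from_pairs` as well as the tiny dictionary are hypothetical, not taken from the authors' code.

```python
from itertools import combinations

def match_candidates(sentences, dictionary, n=2):
    """Slide a window of n consecutive sentences and pair the dictionary entities found in it."""
    pairs = set()
    for start in range(max(1, len(sentences) - n + 1)):
        window = " ".join(sentences[start:start + n]).lower()
        # Naive substring matching (assumption); a real matcher would tokenize first.
        found = sorted(term for term in dictionary if term in window)
        pairs.update(combinations(found, 2))
    return pairs

def triads_from_pairs(pairs):
    """Two pairs that share exactly one entity generate a triad of three entities."""
    triads = set()
    for p, q in combinations(sorted(pairs), 2):
        if len(set(p) & set(q)) == 1:
            triads.add(tuple(sorted(set(p) | set(q))))
    return triads

sentences = [
    "Obesity usually increases the risk of diabetes.",
    "People with diabetes have more sugar in blood called hyperglycemia.",
]
dictionary = {"obesity", "diabetes", "hyperglycemia"}  # toy dictionary (hypothetical)

pairs = match_candidates(sentences, dictionary, n=1)
triads = triads_from_pairs(pairs)
print(pairs, triads)
```

With `n=1` each sentence yields one pair, and the shared entity "diabetes" links the two pairs into a single triad, so the obesity-hyperglycemia edge becomes a candidate even though the two terms never share a sentence.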

  10. Our method • Causal Relation Candidates Matching • 3 Clues for Causal Discovery ⁃ Causal Association ⁃ Contextual Information ⁃ Causal Transitivity Rules • Factor Graph Model

  11. 3 Clues for Causal Discovery • Causal Association ⁃ Frequently co-occurring entities are more likely to be causally related [Do and Roth 2013] ⁃ $e_i$ is a possible cause of entity $e_j$ if $e_j$ happens more frequently with $e_i$ than by itself [Suppes 1970] • Contextual Information ⁃ Causal relations in the text tend to share special contexts ⁃ Like domain-related words, causal triggers, connectives, etc. • Causation Transitivity Rule
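The probability-raising idea cited from Suppes can be checked on co-occurrence counts, as in the hedged sketch below. It tests only whether the effect is more frequent in windows that contain the candidate cause than overall; the function name `raises_probability` and the toy windows are assumptions for illustration.

```python
def raises_probability(windows, cause, effect):
    """True if P(effect | cause) > P(effect), estimated from entity sets per text window."""
    total = len(windows)
    with_cause = [w for w in windows if cause in w]
    if total == 0 or not with_cause:
        return False
    p_effect = sum(effect in w for w in windows) / total
    p_effect_given_cause = sum(effect in w for w in with_cause) / len(with_cause)
    return p_effect_given_cause > p_effect

# Each window is the set of entities matched in it (toy data, hypothetical).
windows = [
    {"obesity", "diabetes"},
    {"obesity", "diabetes"},
    {"diabetes"},
    {"fever"},
]
print(raises_probability(windows, "obesity", "diabetes"))
print(raises_probability(windows, "fever", "diabetes"))
```

Co-occurrence alone cannot establish direction or rule out confounding, which is consistent with the deck combining this clue with context and transitivity rather than using it on its own.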

  12. Causal Association • Modeling causal association: $CA(e_i, e_j) = I(e_i, e_j) \times D(e_i, e_j) \times \max(u_i, u_j)$ ⁃ Larger mutual information: $I(e_i, e_j) = \log \frac{P(e_i, e_j)}{P(e_i)\,P(e_j)}$ ⁃ Reward pairs that co-occur closer together, while penalizing those that are further apart in the text: $D(e_i, e_j) = -\log \frac{|sent(e_i) - sent(e_j)| + 1}{2 \times WS}$ ⁃ Model the frequency of co-occurrence of two medical entities, $\max(u_i, u_j)$: $u_i = \frac{P(e_i, e_j)}{\max_k P(e_i, e_k) - P(e_i, e_j) + \epsilon}$, $u_j = \frac{P(e_i, e_j)}{\max_k P(e_k, e_j) - P(e_i, e_j) + \epsilon}$
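A small numeric sketch of the association score, assuming it is the product of pointwise mutual information, a sentence-distance term, and max(u_i, u_j) as on this slide. The function name, parameter names, and input values are hypothetical; the probabilities are passed in directly rather than estimated from a corpus.

```python
import math

def causal_association(p_joint, p_i, p_j, max_p_i, max_p_j,
                       sent_i, sent_j, window_size, eps=1e-6):
    """Score = PMI x distance term x max(u_i, u_j), under the assumptions stated above."""
    # Pointwise mutual information of the two entities.
    pmi = math.log(p_joint / (p_i * p_j))
    # Distance term: larger when the two mentions are in nearby sentences.
    dist = -math.log((abs(sent_i - sent_j) + 1) / (2 * window_size))
    # How close this pair's co-occurrence is to each entity's strongest co-occurrence.
    u_i = p_joint / (max_p_i - p_joint + eps)
    u_j = p_joint / (max_p_j - p_joint + eps)
    return pmi * dist * max(u_i, u_j)

# Same probabilities, different sentence distances (toy numbers).
near = causal_association(0.1, 0.2, 0.2, 0.3, 0.3, sent_i=4, sent_j=4, window_size=3)
far = causal_association(0.1, 0.2, 0.2, 0.3, 0.3, sent_i=4, sent_j=6, window_size=3)
print(near, far)
```

With everything else fixed, the pair mentioned in the same sentence scores higher than the pair two sentences apart, which is the behavior the distance term is meant to give.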

  13. Contextual Information (1) • Encode Synthetic Context

  14. Contextual Information (2) • Encode context based on pre-trained word2vec Word Embedding • Three ways

  15. Causation Transitivity Rules • Angle rules and triadic rule

  16. Integrate 3 Clues • Combining evidence from both textual supports and structural inferences, the above three clues are better equipped to discover causal relations. • They are complementary in several ways: ⁃ Causal association gives preference to frequently co-occurring causal pairs. ⁃ Causal transitivity rules are designed to identify causal relations that have little textual support but follow the transitivity rule, and to generate new causal hypotheses. ⁃ Incorporating contextual information from the text can potentially eliminate frequently co-occurring medical entities that are not causally related.

  17. Our method • Causal Relation Candidates Matching • 3 Clues for Causal Discovery ⁃ Causal Association ⁃ Contextual Information ⁃ Causal Transitivity Rules • Factor Graph Model

  18. CausalTriad: Factor Graph for Each Triad Structure

  19. Experiments • Data collection ⁃ TCM consists of the abstracts of 106,151 papers. ⁃ HealthBoards consists of post messages on health and medical issues such as diseases, symptoms, medicines, and side effects.

  20. Experimental Results • Generating new causal relation hypotheses

  21. Experimental Results • Different types of causal relations ⁃ DISEASE – cause – SYMPTOM ⁃ FORMULA – against – DISEASE ⁃ HERB – against – DISEASE ⁃ FORMULA – relieve – SYMPTOM ⁃ HERB – relieve – SYMPTOM ⁃ DISEASE – bring – DISEASE ⁃ DRUG – against – DISEASE ⁃ DISEASE – cause – SYMPTOM

  22. Experimental Results • Patterns of causal reasoning rules

  23. Experimental Results • Causal relation extraction

  24. Experimental Results • Extracting causal relations from single sentences and multiple sentences • Extracting implicit causal relations

  25. Influence Factors • Influence from the size of labeled training data

  26. Influence Factors • Influence from the number of bootstrapping rounds and window size

  27. Conclusions • We propose CausalTriad to incorporate both textual and structural clues for causal relation discovery from text. • Experimental results on two datasets demonstrate that: ⁃ CausalTriad is effective for discovering explicit and implicit causal relations from both single sentences and multiple sentences. ⁃ CausalTriad can generate new causal relation hypotheses through inference.

  28. Sendong (Stan) Zhao Meng Jiang Ming Liu Bing Qin Ting Liu Thank You! Any comments and suggestions? Homepage: http://ir.hit.edu.cn/~sdzhao/ Email: zhaosendong@gmail.com
