Medical Text Data, Sendong (Stan) Zhao+, Meng Jiang*, Ming Liu+ - PowerPoint PPT Presentation



slide-1
SLIDE 1

CausalTriad: Toward Pseudo Causal Relation Discovery and Hypotheses Generation from Medical Text Data

Sendong (Stan) Zhao+, Meng Jiang*, Ming Liu+, Bing Qin+, Ting Liu+

+Harbin Institute of Technology, China *University of Notre Dame, USA

slide-2
SLIDE 2

Pseudo Causal Relation

  • Gold standard
    • Randomized controlled experiments, which are too costly
  • Observational data
    • Structured data, e.g., EHR
    • Unstructured data (text data), e.g., medical literature, patient reports
  • Pseudo causal relation
    • Semantic-level causal relations
    • May be verified, true causal knowledge
    • Or may not have been identified previously
    • Or may have no evidence to support them
slide-3
SLIDE 3

Previous Studies

  • Extract causal relations only from single sentences, while causal relations usually span multiple sentences
  • Use only textual information and ignore structural information, while causal relations naturally have an attached network structure
  • Perform only extraction rather than inference, while causality itself follows basic logical rules
slide-4
SLIDE 4

Causation Transitivity

  • "Preserving transitivity is a basic desideratum for an adequate analysis of causation"
    • L. A. Paul and Ned Hall, “Causation: A User’s Guide”

[Figure: A → B → … → C, hence A → C]

slide-5
SLIDE 5

Causation Transitivity in Medical Text

Obesity usually increases the risk of diabetes. People with diabetes have excess sugar in the blood, a condition called hyperglycemia. Metformin has become a mainstay of type 2 diabetes management and is now the recommended first-line drug for treating the disease.

[Figure: Obesity, Diabetes, Hyperglycemia, and Metformin, connected by two edges labeled "cause" and two edges marked "?"]
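The transitivity rule behind this example can be sketched as a one-step inference over known cause edges: whenever a → b and b → c are both present but a → c is not, a → c becomes a candidate hypothesis. This is a minimal illustration, assuming edges are (cause, effect) string pairs of a single relation type and treating the two cause relations stated in the example text as known; the function name `transitive_hypotheses` is hypothetical, and this is not CausalTriad's actual inference procedure.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]  # (cause, effect)


def transitive_hypotheses(edges: Iterable[Edge]) -> Set[Edge]:
    """Return (a, c) pairs implied by a -> b and b -> c that are not already known.

    Hypothetical helper: applies the transitivity rule once (a single triad step).
    Calling it repeatedly on the growing edge set follows longer chains.
    """
    known = set(edges)
    hypotheses: Set[Edge] = set()
    for a, b in known:
        for b2, c in known:
            # Chain a -> b -> c through the shared middle entity b.
            if b == b2 and a != c and (a, c) not in known:
                hypotheses.add((a, c))
    return hypotheses


# Cause relations stated in the example text (illustrative only).
cause_edges = {("Obesity", "Diabetes"), ("Diabetes", "Hyperglycemia")}
print(transitive_hypotheses(cause_edges))  # {('Obesity', 'Hyperglycemia')}
```

Only edges of one relation type are chained here; relations such as a drug treating a disease would need their own rules, and every generated pair is a pseudo causal hypothesis, not verified knowledge.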

slide-6
SLIDE 6

Motivation

  • Jointly utilize
    • Textual information (context and co-occurrence)
    • Structural information (causation transitivity rule)
  • Through inference, to
    • Discover causal relations in text
    • Generate new causal relation hypotheses
slide-7
SLIDE 7

Problem Definition

  • Problem: Causal Relation Discovery from Triad Structures
  • Medical Cause-Effect Candidates Network

$G = (V, E),\ E \subseteq V \times V$

  • Triad Structure
    • Each triangle in the network
    • The basic unit of analysis
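Finding every triad in the candidate network amounts to enumerating its triangles. A minimal sketch, assuming the candidate edges are treated as undirected and that entities are strings; `find_triads` and the example edges are hypothetical, not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def find_triads(edges: Iterable[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Enumerate every triangle in an undirected candidate network G = (V, E).

    Assumption: candidate edges are undirected; each triangle is returned once,
    as a sorted tuple of its three entities.
    """
    adj: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    triangles = set()
    for u in list(adj):
        # Any two neighbours of u that are also adjacent to each other close a triangle.
        for v, w in combinations(sorted(adj[u]), 2):
            if w in adj[v]:
                triangles.add(tuple(sorted((u, v, w))))
    return sorted(triangles)


edges = [("Obesity", "Diabetes"), ("Diabetes", "Hyperglycemia"),
         ("Obesity", "Hyperglycemia"), ("Diabetes", "Metformin")]
print(find_triads(edges))  # [('Diabetes', 'Hyperglycemia', 'Obesity')]
```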
slide-8
SLIDE 8

Our method

  • Causal Relation Candidates Matching
  • 3 Clues for Causal Discovery
    • Causal Association
    • Contextual Information
    • Causal Transitivity Rules
  • Factor Graph Model
slide-9
SLIDE 9

Causal Relation Candidates Matching

  • Medical Dictionary
    • Dryad data package
    • TCMonline and TCMID
  • For every n consecutive sentences
    • Match medical entities
    • Pair the matched entities with one another
    • Every two pairs that share an entity generate a triad structure
    • E.g., $(e_i, e_k)$ and $(e_i, e_j)$ generate the triad structure $(e_k, e_i, e_j)$
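The matching steps above can be sketched as a sliding window over sentences. This is a minimal illustration, assuming naive case-insensitive substring matching against a set of lower-case dictionary terms and a default window of n = 2 sentences; `match_triads` is hypothetical, and a real system would need proper tokenization and longest-match rules. Each triad is written as (e_k, e_i, e_j) with the shared entity in the middle, following the example on the slide.

```python
from itertools import combinations
from typing import List, Set, Tuple


def match_triads(sentences: List[str], dictionary: Set[str], n: int = 2) -> Set[Tuple[str, str, str]]:
    """Generate (left, shared, right) triads from every window of n consecutive sentences."""
    triads: Set[Tuple[str, str, str]] = set()
    for start in range(max(len(sentences) - n + 1, 1)):
        window = " ".join(sentences[start:start + n]).lower()
        # Naive matching: any dictionary term appearing as a substring of the window.
        entities = sorted(term for term in dictionary if term in window)
        pairs = list(combinations(entities, 2))
        # Every two pairs that share exactly one entity form a triad.
        for p, q in combinations(pairs, 2):
            shared = set(p) & set(q)
            if len(shared) != 1:
                continue
            (center,) = shared
            (left,) = set(p) - shared
            (right,) = set(q) - shared
            left, right = sorted((left, right))  # canonical order for the two ends
            triads.add((left, center, right))
    return triads


sentences = [
    "Obesity usually increases the risk of diabetes.",
    "People with diabetes have excess sugar in the blood, called hyperglycemia.",
    "Metformin is the recommended first-line drug for treating type 2 diabetes.",
]
terms = {"obesity", "diabetes", "hyperglycemia", "metformin"}
for triad in sorted(match_triads(sentences, terms)):
    print(triad)
```

The sentences are abridged from the example on slide 5. Because every window of three entities yields three triads (one per shared entity), the number of candidates grows quickly with n.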

slide-10
SLIDE 10

Our method

  • Causal Relation Candidates Matching
  • 3 Clues for Causal Discovery
    • Causal Association
    • Contextual Information
    • Causal Transitivity Rules
  • Factor Graph Model
slide-11
SLIDE 11

3 Clues for Causal Discovery

  • Causal Association
    • Frequently co-occurring entities are more likely to be causally related [Do and Roth 2013]
    • Entity $e_i$ is a possible cause of entity $e_j$ if $e_j$ occurs more frequently with $e_i$ than by itself [Suppes 1970]
  • Contextual Information
    • Causal relations in text tend to share particular contexts
    • E.g., domain-related words, causal triggers, and connectives
  • Causation Transitivity Rule
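The Suppes criterion quoted above can be written as P(e_j | e_i) > P(e_j). A minimal sketch, assuming each text window has already been reduced to the set of entities it mentions; the windows below are toy data, not drawn from the paper's datasets, and `is_possible_cause` is a hypothetical name.

```python
from typing import Iterable, Set


def is_possible_cause(windows: Iterable[Set[str]], cause: str, effect: str) -> bool:
    """Suppes-style check: does `effect` occur more often with `cause` than overall?

    Estimates P(effect | cause) and P(effect) from co-occurrence in text windows.
    Suppes also requires the cause to precede the effect in time, which plain
    co-occurrence counts cannot capture, so this test on its own is symmetric.
    """
    windows = list(windows)
    n_total = len(windows)
    n_cause = sum(1 for w in windows if cause in w)
    if n_total == 0 or n_cause == 0:
        return False
    n_effect = sum(1 for w in windows if effect in w)
    n_both = sum(1 for w in windows if cause in w and effect in w)
    return n_both / n_cause > n_effect / n_total


windows = [{"obesity", "diabetes"}, {"obesity", "diabetes"}, {"obesity"}, {"metformin"}]
print(is_possible_cause(windows, "obesity", "diabetes"))    # True: 2/3 > 2/4
print(is_possible_cause(windows, "metformin", "diabetes"))  # False: 0/1 > 2/4 fails
```

The symmetry is a reason this clue alone cannot fix the direction of a relation; direction has to come from other evidence such as context.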
slide-12
SLIDE 12

Causal Association

  • Modeling causal association

$$CA(e_i, e_j) = I(e_i, e_j) \times D(e_i, e_j) \times \max(u_i, u_j)$$

  • Larger mutual information

$$I(e_i, e_j) = \log \frac{P(e_i, e_j)}{P(e_i)\,P(e_j)}$$

  • Award pairs that co-occur closer together, while penalizing those that are further apart in the text

$$D(e_i, e_j) = -\log \frac{|\mathrm{sent}(e_i) - \mathrm{sent}(e_j)| + 1}{2 \times WS}$$

  • Model the frequency of co-occurrence of two medical entities, $\max(u_i, u_j)$

$$u_i = \frac{P(e_i, e_j)}{\max_k P(e_i, e_k) - P(e_i, e_j) + \epsilon}, \qquad u_j = \frac{P(e_i, e_j)}{\max_k P(e_k, e_j) - P(e_i, e_j) + \epsilon}$$
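The three factors can be computed directly from corpus statistics. The sketch below is one reading of the formulas above, assuming: the probabilities are already estimated (toy numbers here, not from the paper), sent(e) is the index of the sentence where entity e appears, each entity has a single position, WS is the window size in sentences, and ε = 0.01. All function names are hypothetical.

```python
import math
from typing import Dict, Tuple

Pair = Tuple[str, str]


def pmi(p_ij: float, p_i: float, p_j: float) -> float:
    """I(e_i, e_j) = log(P(e_i, e_j) / (P(e_i) P(e_j)))."""
    return math.log(p_ij / (p_i * p_j))


def distance_score(sent_i: int, sent_j: int, ws: int) -> float:
    """D(e_i, e_j) = -log((|sent(e_i) - sent(e_j)| + 1) / (2 * WS)); larger when closer."""
    return -math.log((abs(sent_i - sent_j) + 1) / (2 * ws))


def max_u(pair_probs: Dict[Pair, float], ei: str, ej: str, eps: float = 0.01) -> float:
    """max(u_i, u_j): compares P(e_i, e_j) with the strongest competing pair on each side."""
    p_ij = pair_probs[(ei, ej)]
    best_i = max(p for (a, _), p in pair_probs.items() if a == ei)  # max_k P(e_i, e_k)
    best_j = max(p for (_, b), p in pair_probs.items() if b == ej)  # max_k P(e_k, e_j)
    u_i = p_ij / (best_i - p_ij + eps)
    u_j = p_ij / (best_j - p_ij + eps)
    return max(u_i, u_j)


def causal_association(pair_probs: Dict[Pair, float], marginals: Dict[str, float],
                       sent_index: Dict[str, int], ei: str, ej: str,
                       ws: int, eps: float = 0.01) -> float:
    """CA(e_i, e_j) = I(e_i, e_j) * D(e_i, e_j) * max(u_i, u_j)."""
    return (pmi(pair_probs[(ei, ej)], marginals[ei], marginals[ej])
            * distance_score(sent_index[ei], sent_index[ej], ws)
            * max_u(pair_probs, ei, ej, eps))


# Toy statistics (hypothetical).
pair_probs = {("obesity", "diabetes"): 0.10, ("obesity", "hyperglycemia"): 0.05,
              ("diabetes", "hyperglycemia"): 0.08}
marginals = {"obesity": 0.20, "diabetes": 0.25, "hyperglycemia": 0.20}
sent_index = {"obesity": 0, "diabetes": 0, "hyperglycemia": 1}
print(causal_association(pair_probs, marginals, sent_index, "obesity", "diabetes", ws=3))
```

Under the window-size reading of WS, two entities in the same window are at most WS − 1 sentences apart, so the fraction inside the log is at most 1/2 and D stays positive. PMI, however, can be negative, which makes CA negative for pairs that co-occur less often than chance.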

slide-13
SLIDE 13

Contextual Information (1)

  • Encode Synthetic Context
slide-14
SLIDE 14

Contextual Information (2)

  • Encode context based on pre-trained word2vec word embeddings
  • Three ways
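The slide does not list the three encodings. One common, simple option is to average the embeddings of the words between the two entity mentions; the sketch below assumes that option and uses a toy two-dimensional embedding table in place of real pre-trained word2vec vectors. The function names are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def between_tokens(tokens: Sequence[str], i: int, j: int) -> List[str]:
    """Tokens strictly between two entity positions i and j (in either order)."""
    lo, hi = sorted((i, j))
    return list(tokens[lo + 1:hi])


def average_embedding(tokens: Sequence[str], embeddings: Dict[str, Vector], dim: int) -> Vector:
    """Mean of the vectors of in-vocabulary tokens; zero vector if none are known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


tokens = "obesity usually increases the risk of diabetes".split()
toy_embeddings = {"increases": [1.0, 0.0], "risk": [0.0, 1.0]}  # stand-in for word2vec vectors
context = between_tokens(tokens, 0, tokens.index("diabetes"))
print(average_embedding(context, toy_embeddings, dim=2))  # [0.5, 0.5]
```

Averaging discards word order; the resulting vector would serve as one contextual feature for the candidate pair.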
slide-15
SLIDE 15

Causation Transitivity Rules

  • Angle rules and the triadic rule
slide-16
SLIDE 16

Integrate 3 Clues

  • By combining evidence from textual support and structural inference, the three clues together are better equipped to discover causal relations.
  • They are complementary in several ways:
    • Causal association gives preference to frequently co-occurring causal pairs.
    • Causal transitivity rules are designed to identify causal relations that have little textual support but follow the transitivity rule, and to generate new causal hypotheses.
    • Incorporating contextual information from the text can help filter out frequently co-occurring medical entity pairs that are not causally related.

slide-17
SLIDE 17

Our method

  • Causal Relation Candidates Matching
  • 3 Clues for Causal Discovery
    • Causal Association
    • Contextual Information
    • Causal Transitivity Rules
  • Factor Graph Model
slide-18
SLIDE 18

CausalTriad: Factor Graph for Each Triad Structure
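The slide shows only the title. As a rough illustration of what a factor graph over one triad can look like, the sketch below assumes one binary variable per edge of a triad (a, b, c), unary factors carrying textual evidence (for example, a causal association or context score), and one triadic factor encoding the transitivity rule. The weights and the brute-force MAP search are assumptions for illustration, not CausalTriad's actual potentials or learning procedure; `triad_map` is hypothetical.

```python
import math
from itertools import product
from typing import Dict, Tuple

Labels = Tuple[int, int, int]  # (y_ab, y_bc, y_ac): 1 = causal, 0 = not causal


def triad_map(unary: Dict[str, float], transitivity_weight: float = 1.0) -> Tuple[Labels, float]:
    """Brute-force MAP labeling of the three edges of a triad (a, b, c).

    unary[edge] is an assumed log-potential for labeling that edge causal;
    label 0 scores 0. The triadic factor rewards closing a -> b -> c with
    a -> c and penalizes leaving the chain open.
    """
    best_labels: Labels = (0, 0, 0)
    best_score = -math.inf
    for y_ab, y_bc, y_ac in product((0, 1), repeat=3):
        score = unary["ab"] * y_ab + unary["bc"] * y_bc + unary["ac"] * y_ac
        if y_ab and y_bc:  # triadic (transitivity) factor
            score += transitivity_weight if y_ac else -transitivity_weight
        if score > best_score:
            best_labels, best_score = (y_ab, y_bc, y_ac), score
    return best_labels, best_score


# Strong evidence for a -> b and b -> c, weak textual evidence against a -> c.
evidence = {"ab": 2.0, "bc": 1.5, "ac": -0.5}
print(triad_map(evidence))                           # ((1, 1, 1), 4.0)
print(triad_map(evidence, transitivity_weight=0.0))  # ((1, 1, 0), 3.5)
```

With the triadic factor, the weakly supported a → c edge is labeled causal, which is how the structural clue can turn into a new hypothesis; without it, the textual evidence alone leaves the edge open. In practice, weights of this kind would be learned, and exhaustive enumeration only works because a triad has just eight joint labelings.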

slide-19
SLIDE 19

Experiments

  • Data collection
  • TCM consists of the abstracts of 106,151 papers.
  • HealthBoards consists of messages posted about health and medical issues such as diseases, symptoms, medicines, and side effects.

slide-20
SLIDE 20
Experimental Results

  • Generating new causal relation hypotheses

slide-21
SLIDE 21
Experimental Results

  • Different types of causal relations
    • DISEASE–cause–SYMPTOM
    • FORMULA–against–DISEASE
    • HERB–against–DISEASE
    • FORMULA–relieve–SYMPTOM
    • HERB–relieve–SYMPTOM
    • DISEASE–bring–DISEASE
    • DRUG–against–DISEASE
    • DISEASE–cause–SYMPTOM

slide-22
SLIDE 22
Experimental Results

  • Patterns of causal reasoning rules

slide-23
SLIDE 23
Experimental Results

  • Causal relation extraction

slide-24
SLIDE 24
Experimental Results

  • Extracting causal relations from single sentences and from multiple sentences
  • Extracting implicit causal relations

slide-25
SLIDE 25

Influence Factors

  • Influence of the size of the labeled training data
slide-26
SLIDE 26

Influence Factors

  • Influence of the number of bootstrapping rounds and the window size
slide-27
SLIDE 27

Conclusions

  • We propose CausalTriad, which incorporates both textual and structural clues for causal relation discovery from text.
  • Experimental results on two datasets demonstrate that:
    • CausalTriad is effective for discovering explicit and implicit causal relations from both single sentences and multiple sentences.
    • CausalTriad can generate new causal relation hypotheses through inference.
slide-28
SLIDE 28

Thank You!

Any comments and suggestions?

Homepage: http://ir.hit.edu.cn/~sdzhao/ Email: zhaosendong@gmail.com

Sendong (Stan) Zhao Meng Jiang Ming Liu Ting Liu Bing Qin