Machine learning vs. knowledge-based approaches to ADR identification

SLIDE 1

Machine learning vs. knowledge-based approaches to ADR identification

November 2017

SLIDE 2

Topics

  • Short about us
  • Identifying ADRs
  • Machine learning for semantic relations identification
  • Results
  • Challenges
SLIDE 3

SHORT ABOUT US

Focus on text analytics for pharmaceuticals since 1998:

  • Voice of the patient
  • Electronic health records
  • Other text sources: scientific literature, FDA, patents, business opportunities

SLIDE 4

SHORT ABOUT US

Advanced Databases Group, Universidad Carlos III de Madrid

§ Research lines:

  • Natural language processing
  • Accessibility

§ Resources produced:

  • Drug-drug interaction collection (DDI Corpus)
  • DINTO ontology
SLIDE 5

Our goal at the TAC ADR (adverse drug reaction) track

“Combine knowledge-based and machine learning-based approaches to improve ADR identification”

SLIDE 6

Identifying ADRs

SLIDE 7

TOPIC EXTRACTION

NLP and resource-based approach

  • SIDER
  • UMLS
  • Training corpus

SLIDE 8

TOPIC EXTRACTION

NLP and resource-based approach (SIDER, UMLS, training corpus)

Dictionary           # entries
Adverse Reactions       21,826
Factor                      41
Severity                   158
Animal                      27
DrugClass                  101
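To make the resource-based step concrete, here is a minimal sketch of typed dictionary lookup over a tokenized sentence. It assumes greedy longest-match tagging, which the slides do not specify, and the tiny dictionaries below are hypothetical stand-ins for the real ones (21,826 adverse reaction entries, and so on).

```python
# Hypothetical miniature dictionaries keyed by entity type; the real
# dictionaries were built from resources such as SIDER, UMLS and the training corpus.
DICTIONARIES = {
    "AdverseReaction": {"nausea", "rash", "hepatotoxicity", "skin cancer"},
    "Severity": {"severe", "mild", "fatal"},
    "Factor": {"rarely", "frequently"},
    "Animal": {"rats", "mice"},
    "DrugClass": {"nsaids", "beta blockers"},
}

# Longest dictionary entry, in tokens, bounds the lookup window.
MAX_LEN = max(len(term.split()) for terms in DICTIONARIES.values() for term in terms)


def tag(tokens):
    """Greedy longest-match lookup; returns (start, end, type) token spans."""
    lowered = [t.lower() for t in tokens]
    spans = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate first so "skin cancer" wins over shorter entries.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            candidate = " ".join(lowered[i:i + n])
            for label, terms in DICTIONARIES.items():
                if candidate in terms:  # first dictionary that contains the term wins
                    match = (i, i + n, label)
                    break
            if match:
                break
        if match:
            spans.append(match)
            i = match[1]  # continue after the matched span
        else:
            i += 1
    return spans


print(tag("Severe rash was observed in rats".split()))
# [(0, 1, 'Severity'), (1, 2, 'AdverseReaction'), (5, 6, 'Animal')]
```

Longest-match is a common choice for dictionary taggers because it prefers the multiword entry over its parts; it does not handle separated (discontinuous) multiword mentions.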

SLIDE 9

TOPIC EXTRACTION

NLP and resource-based approach

  • And some rules to identify negation
  • The MeaningCloud Insights Engine API supports this rule syntax
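The negation rules themselves are written in MeaningCloud's rule syntax, which is not reproduced here. As a rough illustration of what such a rule does, the sketch below swaps in a simple NegEx-style check: a mention counts as negated if a trigger word appears within a few tokens before it. The trigger list and the window size are hypothetical.

```python
# Hypothetical negation triggers (not the deck's actual rules).
NEGATION_TRIGGERS = {"no", "not", "without", "absence", "denied"}
WINDOW = 3  # assumed: a trigger affects mentions starting up to 3 tokens later


def is_negated(tokens, mention_start):
    """True if a negation trigger occurs within WINDOW tokens before the mention."""
    lo = max(0, mention_start - WINDOW)
    return any(tok.lower() in NEGATION_TRIGGERS for tok in tokens[lo:mention_start])


tokens = "There was no evidence of hepatotoxicity".split()
print(is_negated(tokens, 5))  # True: "no" occurs 3 tokens before "hepatotoxicity"
```

A fixed window is easy to reason about but misses scope cues such as "but" or sentence-internal clauses, which is one reason negation remains a challenge.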
SLIDE 10

Machine learning for semantic relations identification

SLIDE 11

Machine learning for semantic relations identification

Representing the context of an ADR mention through a set of features:

  • M1TXT, M2TXT, BWTXT: the text of both mentions and the text between them.
  • C1BOW, C2BOW: bag-of-words of both mentions.
  • C1POS, C2POS: part of speech of both mentions.
  • PB1POS, PA1POS, PB2POS, PA2POS, PWPOS: the PoS tags of the two tokens before/after each mention and between both mentions.
  • WA1TXT, WB2TXT, WA2TXT, WB1TXT: the two tokens after/before each mention.
  • LA1LEM, LB2LEM, LA2LEM, LB1LEM: the lemmas of the two tokens after/before both mentions.
  • LWLEM: the lemmas between the two mentions.
  • NTOKB: the number of tokens between the two mentions.

Instances: ADRMention–Other pairs, where Other is Severity, DrugClass, Negation, Animal or Factor.
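The feature list can be read as a function from a sentence and a pair of mention spans to a feature dictionary. The sketch below is one possible encoding, assuming tokenization, PoS tags and lemmas are already available, that mention 1 ends before mention 2 starts, and that multi-token values are joined with spaces; none of these encoding details come from the slides.

```python
def span_text(seq, start, end):
    """Join seq[start:end] with spaces, clipping out-of-range indices to the sentence."""
    return " ".join(seq[max(0, start):max(0, end)])


def pair_features(tokens, pos, lemmas, m1, m2):
    """Feature dict for a mention pair.

    m1 and m2 are (start, end) token spans, end exclusive; this sketch
    assumes m1 ends before m2 starts.
    """
    (s1, e1), (s2, e2) = m1, m2
    feats = {
        # Text of both mentions and of the tokens between them.
        "M1TXT": span_text(tokens, s1, e1),
        "M2TXT": span_text(tokens, s2, e2),
        "BWTXT": span_text(tokens, e1, s2),
        # PoS of each mention, of the two tokens before/after each, and in between.
        "C1POS": span_text(pos, s1, e1),
        "C2POS": span_text(pos, s2, e2),
        "PB1POS": span_text(pos, s1 - 2, s1),
        "PA1POS": span_text(pos, e1, e1 + 2),
        "PB2POS": span_text(pos, s2 - 2, s2),
        "PA2POS": span_text(pos, e2, e2 + 2),
        "PWPOS": span_text(pos, e1, s2),
        # Surface tokens and lemmas of the two tokens before/after each mention.
        "WB1TXT": span_text(tokens, s1 - 2, s1),
        "WA1TXT": span_text(tokens, e1, e1 + 2),
        "WB2TXT": span_text(tokens, s2 - 2, s2),
        "WA2TXT": span_text(tokens, e2, e2 + 2),
        "LB1LEM": span_text(lemmas, s1 - 2, s1),
        "LA1LEM": span_text(lemmas, e1, e1 + 2),
        "LB2LEM": span_text(lemmas, s2 - 2, s2),
        "LA2LEM": span_text(lemmas, e2, e2 + 2),
        # Lemmas between the mentions and the token distance between them.
        "LWLEM": span_text(lemmas, e1, s2),
        "NTOKB": s2 - e1,
    }
    # Bag-of-words of each mention: one binary feature per lowercased word.
    for word in tokens[s1:e1]:
        feats["C1BOW=" + word.lower()] = 1
    for word in tokens[s2:e2]:
        feats["C2BOW=" + word.lower()] = 1
    return feats


# Hypothetical pair: ADR mention "Rash" and Severity mention "severe".
tokens = ["Rash", ",", "sometimes", "severe", ",", "was", "reported"]
pos = ["NN", ",", "RB", "JJ", ",", "VBD", "VBN"]
lemmas = ["rash", ",", "sometimes", "severe", ",", "be", "report"]
features = pair_features(tokens, pos, lemmas, (0, 1), (3, 4))
print(features["BWTXT"], features["NTOKB"])
```

Returning a flat dict keeps string-valued and numeric features side by side, which suits a one-hot vectorizer in front of a linear classifier.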

SLIDE 12

Machine learning for semantic relations identification

And the algorithm?

§ SVM (support vector machines), using scikit-learn in Python
§ Specifically, the SVC implementation:

  • Default parameter values
  • Linear kernel (overriding SVC's default RBF kernel)

But, no deep learning?! Of course we tried it (CNN), but not in the official runs.
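The slide fixes only the classifier: scikit-learn's SVC with default parameters and a linear kernel. The sketch below assumes the pair features arrive as Python dicts and are one-hot encoded with scikit-learn's DictVectorizer before reaching SVC; the toy training pairs are hypothetical, and the labels reuse the relation names from the results table in the Challenges section.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy (ADR mention, other mention) pairs using a few of the slide's features.
X_train = [
    {"M2TXT": "not", "BWTXT": "", "NTOKB": 0},
    {"M2TXT": "no", "BWTXT": "of", "NTOKB": 1},
    {"M2TXT": "severe", "BWTXT": "", "NTOKB": 0},
    {"M2TXT": "fatal", "BWTXT": "sometimes", "NTOKB": 1},
    {"M2TXT": "rats", "BWTXT": "observed in", "NTOKB": 2},
    {"M2TXT": "mice", "BWTXT": "seen in", "NTOKB": 2},
]
y_train = ["Negated", "Negated", "Effect", "Effect", "Hypothetical", "Hypothetical"]

# DictVectorizer one-hot encodes string values (e.g. "M2TXT=severe") and keeps
# numeric values; SVC uses default parameters except for the linear kernel.
model = make_pipeline(DictVectorizer(), SVC(kernel="linear"))
model.fit(X_train, y_train)

print(model.predict([{"M2TXT": "severe", "BWTXT": "", "NTOKB": 0}]))
```

Features unseen at training time (for example a new M2TXT value) are silently ignored by DictVectorizer at prediction time, which is one way dictionary coverage feeds into recall.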

SLIDE 13

Results

SLIDE 14

Results

  • Task 1. ADR and related entities
  • Task 2. Relations between ADRs and entities

Low precision!! Oh, oh!!

SLIDE 15

Results

  • Task 3. Positive ADRs
  • Task 4. Normalization through MedDRA

Pretty good!! Only a few negated mentions? Using dictionaries with semantic information produces nice results.
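A minimal sketch of how Tasks 3 and 4 can reuse earlier outputs, assuming a positive ADR is a mention that is neither negated nor hypothetical, and that normalization is an exact, case-insensitive dictionary lookup. The lookup entries and the codes are hypothetical placeholders, not real MedDRA codes.

```python
# Hypothetical term -> (preferred term, code) lookup; codes are placeholders.
MEDDRA_LOOKUP = {
    "rash": ("Rash", "PT-PLACEHOLDER-1"),
    "skin rash": ("Rash", "PT-PLACEHOLDER-1"),
    "nausea": ("Nausea", "PT-PLACEHOLDER-2"),
}


def normalize(mentions):
    """mentions: (text, negated, hypothetical) triples.

    Returns ({preferred term: code}, {unmatched mention texts}).
    """
    normalized, unmatched = {}, set()
    for text, negated, hypothetical in mentions:
        if negated or hypothetical:  # Task 3: keep only positive ADRs
            continue
        hit = MEDDRA_LOOKUP.get(text.lower())
        if hit is None:
            unmatched.add(text.lower())  # ADR with no code in the dictionary
        else:
            term, code = hit
            normalized[term] = code  # duplicates collapse onto one preferred term
    return normalized, unmatched


mentions = [
    ("Skin rash", False, False),
    ("rash", False, False),
    ("nausea", True, False),
    ("alopecia", False, False),
]
print(normalize(mentions))
```

Mentions that land in `unmatched` are exactly the "ADRs with no MedDRA code" case listed among the recall challenges.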

SLIDE 16

Challenges

SLIDE 17

Challenges

  • Negation identification requires more effort (not only in the ADR field).

Some weird things were found in the test set, e.g.: "The most frequently observed malignancies other than non-melanoma skin cancer …"

  • CNNs and the use of syntactic features improve results.

Negation?

Class           P      R      F1
Other          0.71   0.81   0.76
Negated        0.72   0.40   0.51
Hypothetical   0.75   0.75   0.75
Effect         0.76   0.61   0.68
Avg / total    0.73   0.73   0.73
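The per-class F1 scores in this table follow from precision and recall as their harmonic mean, F1 = 2PR / (P + R). The check below recomputes them from the reported P and R; it shows why Negated scores lowest: its recall of 0.40 pulls F1 down despite a precision comparable to the other classes. The "Avg / total" row is left out because it depends on per-class support, which the slide does not give.

```python
# Precision, recall and reported F1 copied from the table.
REPORTED = {
    "Other": (0.71, 0.81, 0.76),
    "Negated": (0.72, 0.40, 0.51),
    "Hypothetical": (0.75, 0.75, 0.75),
    "Effect": (0.76, 0.61, 0.68),
}


def f1(p, r):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


for label, (p, r, reported) in REPORTED.items():
    print(f"{label:13s} computed F1 = {f1(p, r):.2f}  reported = {reported:.2f}")
```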

SLIDE 18

Challenges

  • Recall must be improved:
      • separated multiword mentions
      • ADRs with no MedDRA code: are the lexical resources enough?
  • How should errors be approached when applying deep learning?
  • Is the accuracy enough for practical applications? What does the FDA think?
SLIDE 19

Thanks

QUESTIONS?

SLIDE 20

LabDA Resources: DDI Corpus (drug-drug interactions)

  • 1,025 annotated documents, 18,502 entities and 5,028 DDIs (annotated by pharmacology experts)
  • MEDLINE and DrugBank texts
  • Annotation guidelines and inter-annotator agreement
  • Available at labda.inf.uc3m.es
  • Used in the DDIExtraction 2011 and DDIExtraction 2013 (SemEval) shared tasks

SLIDE 21

LabDA Resources: DINTO Ontology

  • Knowledge about drugs and their interactions (11,555 DDIs and 8,786 pharmacological entities)
  • Available at the OBO Foundry
  • Application to information extraction and prediction

SLIDE 22

Automating the extraction of meaning from any information source.

MeaningCloud LLC

Address: 35-37 36th Street, Astoria, NY 11106
Contact info: jmartinez@meaningcloud.com
Telephone: +1 (646) 403-3104

meaningcloud.com