Thomas Wood NLP/data science consultant Past projects Boehringer - - PowerPoint PPT Presentation



SLIDE 1

Thomas Wood NLP/data science consultant

SLIDE 2

Past projects

  • Boehringer Ingelheim - pharma
  • CV-Library:
      ○ Predict industries/salaries from CV - word2vec + CNN
      ○ Predict search terms from CV - LSTM
  • Forensic stylometry demo
      ○ Identifying author
  • Chatbots
      ○ Intelligent home etc.
      ○ Question answering about products
  • Document clustering, classification, trend detection, sentiment analysis
  • Cambridge Masters: anaphora resolution (e.g. the non-referential "it" in "it’s raining")
SLIDE 3

Boehringer Ingelheim

  • Before running a clinical trial, a pharma company writes a 200-page PDF called a protocol.
  • I developed an ML model which extracts important data from the protocol: type of treatment, toxicity, number of subjects, etc.

SLIDE 4

Boehringer Ingelheim (2)

  • The company has factories all over the world. Most medicines go through multiple facilities and countries before going to market.
  • When a manufacturing defect occurs, a factory worker records it in free text in the local language, e.g. "temperature deviation of 5 degrees due to crack in vial, probably occurring in transit".
  • I ran unsupervised topic detection on the unstructured text data to identify the commonest problems in various categories of products.
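
The slide does not say which topic-detection method was used. As a rough sketch only, assuming the reports have already been translated to English and tagged with a product category, the commonest content words per category could be surfaced like this. Counting terms is a much simpler stand-in for a real topic model, and the stopword list, function name, and sample reports are all invented for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stopword list; a real pipeline would use a fuller one per language.
STOPWORDS = {"a", "an", "the", "of", "in", "to", "due", "by", "and", "was", "on", "at"}

def top_terms_by_category(reports, n=3):
    """Return the n terms that occur in the most reports, for each category.

    reports: iterable of (category, free_text) pairs.
    """
    doc_freq = defaultdict(Counter)
    for category, text in reports:
        # Use a set so each term is counted at most once per report.
        tokens = set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS
        doc_freq[category].update(tokens)
    return {
        category: [term for term, _ in counts.most_common(n)]
        for category, counts in doc_freq.items()
    }

# Invented example reports, for illustration only.
reports = [
    ("vials", "Temperature deviation of 5 degrees due to crack in vial"),
    ("vials", "Crack in vial found on arrival"),
    ("vials", "Crack detected at inspection"),
    ("tablets", "Tablet coating discoloured"),
    ("tablets", "Coating uneven on batch"),
]
result = top_terms_by_category(reports, n=1)
```

With these sample reports, "crack" is the top term for vials and "coating" for tablets. A production system would more likely use a proper topic model or clustering over document embeddings.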

SLIDE 5

CV-Library

  • Upload CV
  • Goes through word2vec
  • Recommends industry
  • Use TensorFlow NMT to recommend search term
      ○ Repurposed a Vietnamese translation model
  • Trained on 12 million CVs
  • Deployed on GCP
  • 7% increase in signups - £££

When you upload a CV, it gets converted to TXT and passed through a deep NN ... Then some fields which the candidate previously had to fill out get autofilled! Result: more engagement, fewer dropouts. This was 2.5 years ago, before ELMo/BERT.
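
To make the "CV text, then embeddings, then industry" flow concrete, here is a minimal sketch. It is not the deployed model: the slides mention word2vec with a CNN, whereas this stand-in simply averages word vectors and picks the nearest industry by cosine similarity. The three-dimensional vectors, the industry prototypes, and all function names are hypothetical.

```python
import math

# Toy 3-dimensional "embeddings", invented for illustration. Real word2vec
# vectors are learned from a corpus and usually have 100+ dimensions.
WORD_VECTORS = {
    "python": [0.9, 0.1, 0.0],
    "software": [0.8, 0.2, 0.1],
    "nurse": [0.0, 0.9, 0.1],
    "patients": [0.1, 0.8, 0.2],
    "audit": [0.1, 0.1, 0.9],
    "accounts": [0.0, 0.2, 0.8],
}

# Hypothetical industry prototypes in the same vector space.
INDUSTRY_VECTORS = {
    "IT": [1.0, 0.0, 0.0],
    "Healthcare": [0.0, 1.0, 0.0],
    "Finance": [0.0, 0.0, 1.0],
}

def embed(text):
    """Average the vectors of known words; unknown words are skipped."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return None
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recommend_industry(cv_text):
    """Return the industry whose prototype is closest to the CV, or None."""
    cv_vector = embed(cv_text)
    if cv_vector is None:
        return None
    return max(INDUSTRY_VECTORS, key=lambda name: cosine(cv_vector, INDUSTRY_VECTORS[name]))
```

Calling `recommend_industry("Registered nurse caring for patients")` returns `"Healthcare"` with these toy vectors. A CV with no known words returns `None`, so the caller can leave the field for the candidate to fill in.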

SLIDE 6

Chatbots - Artificial Solutions

  • Worked on building chatbots for mobile and web
  • Shell, AT&T, IKEA, Samsung, HTC, Rightmove
  • Integrated smart home with voice commands
      ○ "turn on the coffee machine every Tuesday when I open the downstairs front door"
SLIDE 7

Forensic stylometry

  • https://www.fastdatascience.com/author-prediction-demo
  • Oxford University workshop on NLP every summer
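
The slide does not describe how the linked demo works. One common way to approach author identification is to compare how often each text uses ordinary function words, and the sketch below does that with cosine similarity. The short word list, the function names, and the sample texts are assumptions made for illustration, not the demo's actual method.

```python
import math
import re
from collections import Counter

# A small hypothetical set of function words; real systems use many more.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "i", "it", "but"]

def profile(text):
    """Relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def predict_author(unknown_text, known_texts):
    """Return the author whose function-word profile is most similar.

    known_texts: dict mapping author name to a sample of their writing.
    """
    target = profile(unknown_text)
    return max(known_texts, key=lambda author: cosine(target, profile(known_texts[author])))

# Invented writing samples, far too short for real use.
known = {
    "A": "the cat sat in the corner of the room and the dog slept",
    "B": "i think that i know it but i doubt it",
}
guess = predict_author("the end of the road and the start of the path", known)
```

Here `guess` is `"A"`, because the unknown text shares A's heavy use of "the", "of" and "and". Reliable attribution needs far longer samples and a larger feature set.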
SLIDE 8

Document analysis, trend detection

  • Developed NLP pipeline for English and German at Pattern Science AG, near Frankfurt

  • Used for document classification
  • Trend detection
  • Emerging topics
SLIDE 9

Masters Cambridge

  • Unsupervised learning for identifying pleonastic pronouns
      ○ It seemed that things would never get any better
      ○ It surprised me to hear him say that

  • Download available
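
For readers unfamiliar with the term, a pleonastic "it" is one that refers to nothing, as in the two examples above. The Masters work used unsupervised learning. The sketch below is not that method, only a hand-written pattern baseline to show what is being detected. The patterns and the function name are hypothetical and very incomplete.

```python
import re

# Hand-written patterns: a hypothetical, deliberately small list.
PLEONASTIC_PATTERNS = [
    # Weather and ambient conditions: "it is raining", "it's cold"
    re.compile(r"\bit(?:'s|’s| is| was)\s+(?:raining|snowing|sunny|cold|hot)\b", re.I),
    # Raising verbs followed by a clause: "it seemed that ..."
    re.compile(r"\bit\s+(?:seems|seemed|appears|appeared)\s+that\b", re.I),
    # Extraposed subject: "it surprised me to hear ..."
    re.compile(r"\bit\s+(?:surprised|surprises|amazed|bothered)\s+\w+\s+(?:to|that)\b", re.I),
]

def is_pleonastic(sentence):
    """Return True if the sentence matches any known pleonastic-'it' pattern."""
    return any(pattern.search(sentence) for pattern in PLEONASTIC_PATTERNS)
```

Both example sentences from the slide match, while a referential use such as "I bought a book and it was expensive" does not. Fixed patterns like these miss many cases, which is one motivation for a learned approach.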