Text Mining in Clinical Domain: Dealing with Noise Author: Hoang - PowerPoint PPT Presentation

Text Mining in Clinical Domain: Dealing with Noise • Authors: Hoang Nguyen, Jon Patrick • Source: KDD'16 • Advisor: Jia-Ling Koh • Speaker: Avon Yu • Date: 2018/12/4

Outline • Introduction • Method • Experiment • Conclusion

Introduction • Motivation • High level of noise in clinical corpora. • Unknown words (e.g. misspellings, acronyms, abbreviations) • Non-words (clinical scores and measures, e.g. BP140/65, HR 72…) • Ungrammatical sentences • Labelled data is costly and still often contains errors and inconsistencies. • Imbalanced data distribution.

Introduction • Goal • Introduce a general clinical data mining architecture capable of addressing these challenges using: • Pre-processing system (proof-reading) • Interactive model development • Active learning

Introduction • Framework [figure]

Outline • Introduction • Method • Experiment • Conclusion

Method • Standardisation • Ring-fencing tokeniser • A Finite State Recognizer (FSR) uses training examples to recognize token patterns constituting a score or measurement that requires standardisation.
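As a rough illustration of what standardising a score or measurement might look like, the sketch below rewrites the two examples from the motivation slide (BP140/65, HR 72) into one canonical spelling. It uses hand-written regular expressions as a stand-in for the trained FSR, so the patterns, the function name `standardise_measures` and the output format are all assumptions, not the authors' recogniser.

```python
import re

# Hand-written patterns standing in for what a trained recogniser would learn.
# The BP/HR labels and the canonical output format are illustrative assumptions.
BP_PATTERN = re.compile(r"\bBP\s*:?\s*(\d{2,3})\s*/\s*(\d{2,3})\b", re.IGNORECASE)
HR_PATTERN = re.compile(r"\bHR\s*:?\s*(\d{2,3})\b", re.IGNORECASE)


def standardise_measures(text: str) -> str:
    """Rewrite blood-pressure and heart-rate mentions into one canonical form."""
    text = BP_PATTERN.sub(r"BP \1/\2", text)  # e.g. "BP140/65" becomes "BP 140/65"
    text = HR_PATTERN.sub(r"HR \1", text)     # e.g. "hr:72" becomes "HR 72"
    return text
```

Once every variant maps to the same token sequence, downstream features see one form instead of many spellings of the same measurement.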

Method • Normalisation & Clinical Concepts Recognition • The Lexicon Management System (LMS) stores the accumulated lexical knowledge and contains categorizations of spelling errors, acronyms and non-word tokens. • Dictionary for English and medical terms
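A lexicon-driven normalisation step could be sketched as a lookup that replaces categorised tokens with a normal form and flags anything unrecognised for review. The entries, categories and the names `LEXICON`, `KNOWN_WORDS` and `normalise` below are hypothetical; the slides do not describe the LMS data model.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical lexicon entries: token -> (category, normalised form).
LEXICON: Dict[str, Tuple[str, str]] = {
    "pt": ("abbreviation", "patient"),
    "carcinma": ("misspelling", "carcinoma"),
    "mri": ("acronym", "magnetic resonance imaging"),
}

# Hypothetical stand-in for the English and medical dictionaries.
KNOWN_WORDS: Set[str] = {"patient", "has", "carcinoma", "the", "shows"}


def normalise(tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Return (normalised tokens, tokens found in neither lexicon nor dictionary)."""
    normalised: List[str] = []
    unknown: List[str] = []
    for token in tokens:
        key = token.lower()
        if key in LEXICON:
            normalised.append(LEXICON[key][1])  # replace with the stored normal form
        else:
            normalised.append(token)
            if key not in KNOWN_WORDS:
                unknown.append(token)           # candidate for manual categorisation
    return normalised, unknown
```

Tokens returned in the second list are the ones a curator would categorise and add to the lexicon, which is how the lexical knowledge would accumulate over time.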

Method • Iterative Model Development • The model is evaluated and the algorithm is revised in a feedback process to produce a more accurate result.

Method • Iterative Model Development • Feature selection: • Bag of words (BOW) • Proof reading • Ring-fencing • Lemma • Medical term and gazetteer • Bag of tags (BOT) • Context feature • Negation and modality

Method • Iterative Model Development • The new model is delivered to the Visual Annotator (VA), where manual correction is performed with the support of an annotation validation tool.

Method • Active Learning • The learner queries the most informative instances to retrain the model instead of making a random selection.

Method • Active Learning • Pool-based active learning
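The pool-based setting can be sketched as a loop: train on a small labelled seed, ask a query strategy for the most informative unlabelled instance, obtain its label from the annotator, and retrain. The function and parameter names (`active_learning_loop`, `oracle`, `fit`, `query`, `seed`, `budget`) are assumptions made for this sketch, and the learner and strategy are passed in as callables because the slides do not fix them here.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")
M = TypeVar("M")


def active_learning_loop(
    pool: Sequence[X],
    oracle: Callable[[X], Y],
    fit: Callable[[List[Tuple[X, Y]]], M],
    query: Callable[[M, List[X]], int],
    seed: int,
    budget: int,
) -> Tuple[M, List[Tuple[X, Y]]]:
    """Label the first `seed` pool items, then up to `budget` more chosen by `query`."""
    unlabeled = list(pool)
    labeled = [(x, oracle(x)) for x in unlabeled[:seed]]
    unlabeled = unlabeled[seed:]
    model = fit(labeled)
    for _ in range(budget):
        if not unlabeled:
            break
        i = query(model, unlabeled)     # index of the most informative instance
        x = unlabeled.pop(i)
        labeled.append((x, oracle(x)))  # ask the annotator for its label
        model = fit(labeled)            # retrain on the enlarged training set
    return model, labeled
```

Swapping the `query` callable is all it takes to compare strategies under the same labelling budget.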

Method • Active Learning • Simple AL • Data within the margin is less imbalanced than the entire data.
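In the active learning literature, "Simple" usually refers to the simple-margin strategy for SVMs, which queries the unlabelled instance closest to the current separating hyperplane. The sketch below assumes that reading and a linear decision function with weights `w` and bias `b`; both names, and the helper `decision_value`, are illustrative.

```python
from typing import List, Sequence


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed score of x under a linear model; its magnitude grows with distance from the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def simple_margin_query(w: Sequence[float], b: float, pool: List[Sequence[float]]) -> int:
    """Return the index of the pool instance with the smallest absolute decision value."""
    return min(range(len(pool)), key=lambda i: abs(decision_value(w, b, pool[i])))
```

Because queries concentrate near the decision boundary, the labelled set is drawn from the region the slide describes as less imbalanced than the data as a whole.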

Method • Active Learning • Self Confident • Chooses the next example to be labeled so that, when it is added to the training data, the future generalization error probability is minimized • log-loss function:

Method • Active Learning • Kernel Farthest-First (KFF) • The most informative instance is the farthest instance in the unseen pool from the current training set
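One way to read "farthest from the current training set" is the pool instance whose nearest training instance is as far away as possible, with distance measured in the feature space induced by a kernel. The sketch below assumes that reading and an RBF kernel; `rbf_kernel`, `kernel_distance`, `kff_query` and the default `gamma` are illustrative choices, not taken from the slides.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def rbf_kernel(x: Vector, y: Vector, gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel; gamma is an assumed default."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def kernel_distance(x: Vector, y: Vector, kernel: Kernel) -> float:
    """Distance in the kernel-induced space: sqrt(K(x,x) + K(y,y) - 2 K(x,y))."""
    squared = kernel(x, x) + kernel(y, y) - 2.0 * kernel(x, y)
    return math.sqrt(max(squared, 0.0))  # clamp tiny negative rounding errors


def kff_query(train: List[Vector], pool: List[Vector], kernel: Kernel = rbf_kernel) -> int:
    """Return the index of the pool instance farthest from its nearest training instance."""
    def distance_to_train(x: Vector) -> float:
        return min(kernel_distance(x, t, kernel) for t in train)

    return max(range(len(pool)), key=lambda i: distance_to_train(pool[i]))
```

Unlike a margin-based query, this rule ignores the current model entirely, which makes it an exploration strategy.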

Method • Active Learning • Balanced Exploration and Exploitation (Balance-EE) • A combination of Simple and KFF • The probability p for exploration will be updated as:
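The combination can be sketched as a random choice at each query step: explore with probability p, otherwise exploit. The sketch assumes KFF plays the exploration role and Simple the exploitation role, and it takes p as a fixed input because the update rule is not reproduced in this transcript. The names `balance_ee_query`, `explore` and `exploit` are hypothetical.

```python
import random
from typing import Callable, Optional


def balance_ee_query(
    explore: Callable[[], int],
    exploit: Callable[[], int],
    p: float,
    rng: Optional[random.Random] = None,
) -> int:
    """Run the exploration strategy with probability p, otherwise the exploitation strategy.

    p is held fixed here; how it is updated between queries is not part of this sketch.
    """
    rng = rng or random.Random()
    if rng.random() < p:
        return explore()  # assumed role of KFF
    return exploit()      # assumed role of Simple
```

Passing the two strategies as zero-argument callables keeps the sketch independent of how each one computes its pool index.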

Experiment • Dataset: • All reports provided in a year's data collection by the imaging services in Australia. • A sample of 16,472 reports was drawn from Lake Imaging and assigned to cancer (4,784 reports) or non-cancer (11,688 reports) classes by the cancer registry

Experiment • Descriptor (De) • morphology, topography, cell type .. • Entity (En) • subject of the report • Linguistic (Li) • lexical polarity, normality and modifier • Radiologist's coding (Ra) • cancer stage, TNM • Structure (St) • heading tags


Experiment • The evaluation of the reportability classifier presented here was executed independently at the Cancer Registry. • The final version is built on two ML algorithms: Conditional Random Fields (CRFs) and SVMs. • 'Sensitivity' is the recall of the positive class (reportable). • 'Specificity' is the recall of the negative class (non-reportable).
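The two definitions can be made concrete with a small helper that counts outcomes per class. The function name, the label strings and the toy data in the usage note are assumptions for illustration only; they are not results from the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    positive: str = "reportable",
) -> Tuple[float, float]:
    """Return (recall of the positive class, recall of the negative class)."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # reportable, predicted reportable
            else:
                fn += 1  # reportable, missed
        else:
            if pred == positive:
                fp += 1  # non-reportable, flagged by mistake
            else:
                tn += 1  # non-reportable, correctly filtered out
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

For a made-up set of 4 reportable and 6 non-reportable reports with one error in each class, the helper returns a sensitivity of 3/4 and a specificity of 5/6.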

Conclusion • Presents a general system for text mining in the clinical domain, with a focus on dealing with multiple frequent kinds of noise. • Can dramatically reduce human effort in identifying relevant reports from the large imaging pool for further investigation of cancer. • The classifier is built on a large real-world dataset and can achieve high performance in filtering relevant reports.
