Text Mining in Clinical Domain: Dealing with Noise
Author: Hoang Nguyen, Jon Patrick Source: KDD’16 Advisor: Jia-Ling Koh Speaker: Avon Yu
Date: 2018/12/4
Outline: Introduction, Method, Experiment, Conclusion
inconsistencies.
which has the potential to address these challenges using:
recognize token patterns constituting a score or measurement that requires standardisation.
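A minimal sketch of rule-based recognition and standardisation of measurements and scores, assuming two hypothetical regular expressions: `LENGTH` for millimetre/centimetre lengths and `SCORE` for Gleason-style "3 + 4" scores. The slides do not give the actual patterns; `standardise` and its canonical output forms are illustrative choices only.

```python
import re

# Hypothetical patterns for illustration; the slides do not list the real rules.
# A length: integer or decimal value followed by "mm" or "cm", e.g. "5mm", "2.5 cm".
LENGTH = re.compile(r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>mm|cm)\b", re.IGNORECASE)
# A Gleason-style score written as two digits joined by "+", e.g. "3 + 4".
SCORE = re.compile(r"\b(?P<a>\d)\s*\+\s*(?P<b>\d)\b")


def _length_to_mm(m: re.Match) -> str:
    """Normalise every recognised length to millimetres."""
    value = float(m.group("value"))
    if m.group("unit").lower() == "cm":
        value *= 10
    return f"{value:g} mm"


def _score_with_total(m: re.Match) -> str:
    """Remove spacing variants and append the summed score."""
    a, b = int(m.group("a")), int(m.group("b"))
    return f"{a}+{b}={a + b}"


def standardise(text: str) -> str:
    """Rewrite recognised lengths and scores into one canonical form."""
    return SCORE.sub(_score_with_total, LENGTH.sub(_length_to_mm, text))
```

For example, `standardise("Lesion 2.5cm, Gleason 3 + 4")` yields `"Lesion 25 mm, Gleason 3+4=7"`. Keeping each pattern as its own compiled regex makes it easy to add further measurement types without touching the others.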
accumulated lexical knowledge and contains categorizations of spelling errors, acronyms and non-word tokens.
dictionary of English and medical terms
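One way to picture how such a lexicon is used: each token is looked up in turn against the dictionary, an acronym list and a table of known misspellings, and unseen alphabetic tokens are matched to the nearest dictionary entry. The sketch below assumes tiny hypothetical word lists (`DICTIONARY`, `ACRONYMS`, `KNOWN_MISSPELLINGS`) and uses Python's standard `difflib.get_close_matches` as a stand-in for whatever spelling-correction method the authors actually used.

```python
import difflib
from typing import Tuple

# Hypothetical miniature resources; a real lexicon would be far larger.
DICTIONARY = {"lesion", "liver", "metastasis", "no", "evidence", "of"}
ACRONYMS = {"ct": "computed tomography", "mri": "magnetic resonance imaging"}
KNOWN_MISSPELLINGS = {"metastisis": "metastasis"}


def classify_token(token: str) -> Tuple[str, str]:
    """Return (category, normalised form) for a single token."""
    t = token.lower()
    if t in DICTIONARY:
        return "word", t
    if t in ACRONYMS:
        return "acronym", ACRONYMS[t]
    if t in KNOWN_MISSPELLINGS:
        return "spelling_error", KNOWN_MISSPELLINGS[t]
    if not t.isalpha():
        return "non_word", t  # numbers, codes, punctuation runs, etc.
    # Unseen alphabetic token: suggest the closest dictionary word, if close enough.
    candidates = difflib.get_close_matches(t, DICTIONARY, n=1, cutoff=0.8)
    if candidates:
        return "spelling_error", candidates[0]
    return "unknown", t
```

Tokens that end up as `unknown` are natural candidates for the feedback and manual-correction steps, after which they can be added to the lexicon.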
a feedback process to produce a more accurate result.
perform manual correction with the support of an annotation validation tool.
retrain the model instead of making a random selection.
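A minimal sketch of pool-based active learning with uncertainty sampling, one common query strategy; the slides do not say this is the exact strategy the authors used. `train` and `oracle` are hypothetical callables: `train` returns a model mapping an item to P(positive), and `oracle` stands in for the human annotator who labels the queried items.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

# A "model" here is any function mapping an item to its predicted P(positive).
Model = Callable[[Hashable], float]


def select_most_uncertain(pool: Sequence[Hashable], model: Model, k: int) -> List[Hashable]:
    """Uncertainty sampling: the k items whose P(positive) is closest to 0.5."""
    return sorted(pool, key=lambda x: abs(model(x) - 0.5))[:k]


def active_learning(labeled: List[Tuple[Hashable, int]],
                    pool: List[Hashable],
                    train: Callable[[List[Tuple[Hashable, int]]], Model],
                    oracle: Callable[[Hashable], int],
                    k: int,
                    rounds: int) -> Model:
    """Query k informative items per round, label them, and retrain."""
    for _ in range(rounds):
        model = train(labeled)                    # retrain on all labels so far
        for item in select_most_uncertain(pool, model, k):
            labeled.append((item, oracle(item)))  # e.g. a human annotator
            pool.remove(item)                     # item leaves the unlabeled pool
    return train(labeled)
```

In practice `train` would fit a real classifier and `oracle` would be a person reviewing reports; querying the items the current model is least sure about is what distinguishes this from random selection.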
Data within the margin are less imbalanced than the entire dataset.
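For a linear SVM with weight vector `w` and bias `b`, a point lies strictly inside the margin when |w·x + b| < 1. The sketch below, assuming plain Python lists for vectors and ±1 labels, selects those points and computes the positive-class fraction so it can be compared with that of the whole set. The toy data are made up for illustration and say nothing about the paper's corpus.

```python
from typing import List, Sequence


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed SVM score f(x) = w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def within_margin(w: Sequence[float], b: float, X: Sequence[Sequence[float]]) -> List[int]:
    """Indices of points strictly inside the margin, i.e. |f(x)| < 1."""
    return [i for i, x in enumerate(X) if abs(decision_value(w, b, x)) < 1.0]


def positive_fraction(labels: Sequence[int]) -> float:
    """Share of +1 labels (0.0 for an empty list)."""
    return sum(1 for y in labels if y == 1) / len(labels) if labels else 0.0


# Toy example (made-up data): the separating hyperplane is x0 = 0.
w, b = [1.0, 0.0], 0.0
X = [[-3.0, 0.0], [-2.5, 0.0], [-2.0, 0.0], [-0.5, 0.0], [0.5, 0.0], [3.0, 0.0]]
y = [-1, -1, -1, -1, 1, 1]
inside = within_margin(w, b, X)
print(inside, positive_fraction(y), positive_fraction([y[i] for i in inside]))
```

Here only the points at -0.5 and 0.5 fall inside the margin, so the positive fraction rises from 1/3 overall to 1/2 within the margin.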
it is added to the training data, the future generalization error probability is minimized.
in the unseen pool from the current training set
imaging services in Australia.
and assigned to cancer (4784 reports) or non-cancer (11 688 reports) classes by the cancer registry
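These class counts imply a moderately imbalanced dataset. A quick check of the arithmetic: only the two counts come from the slides, and the derived figures are simple calculations on them.

```python
cancer, non_cancer = 4784, 11688

total = cancer + non_cancer             # 16472 reports in total
cancer_share = cancer / total           # ≈ 0.290 of reports are cancer
imbalance_ratio = non_cancer / cancer   # ≈ 2.44 non-cancer reports per cancer report
majority_baseline = non_cancer / total  # ≈ 0.710: accuracy of always predicting "non-cancer"

print(f"total={total} cancer_share={cancer_share:.3f} "
      f"ratio={imbalance_ratio:.2f} baseline={majority_baseline:.3f}")
```

A trivial classifier that never flags cancer would already reach about 71% accuracy, which illustrates why per-class measures are more informative than raw accuracy on data like this.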
The evaluation of the reportability classifier presented here was carried out independently at the Cancer Registry. The final version is built on two ML algorithms: Conditional Random Fields (CRFs) and SVMs. ‘Sensitivity’ equals the recall of the positive (reportable) class; ‘specificity’ is the recall of the negative (non-reportable) class.
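Given those definitions, both measures follow from the four confusion-matrix counts: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). A minimal sketch, assuming string labels `"reportable"` / `"non-reportable"` and made-up predictions:

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[str], y_pred: Sequence[str],
                            positive: str = "reportable") -> Tuple[float, float]:
    """Recall of the positive class (sensitivity) and of the negative class (specificity)."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


# Made-up labels: 2 reportable and 3 non-reportable reports.
y_true = ["reportable", "reportable", "non-reportable", "non-reportable", "non-reportable"]
y_pred = ["reportable", "non-reportable", "non-reportable", "reportable", "non-reportable"]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

With these toy labels one of two reportable reports is found (sensitivity 0.5) and two of three non-reportable reports are correctly rejected (specificity 2/3).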
domain with a focus on dealing with multiple frequent kinds of noise.
relevant reports from the large imaging pool for further investigation of cancer.
can achieve high performance in filtering relevant reports.