SLIDE 1

Context Material Methods Results and Discussion Conclusion

Machine learning-based detection of chemical risk

N Grabar1, O Wandji Tchami1, L Maxim2

1 CNRS UMR8163 STL, Université Lille 3, France

2 Institut des Sciences de la Communication, CNRS UPS3088, France

MIE, Istanbul, September 2014

1/13 Machine learning-based detection of chemical risk N Grabar, O Wandji Tchami, L Maxim

SLIDE 2

Plan

1 Context
2 Material
3 Methods
4 Results and Discussion
5 Conclusion and Future work

SLIDE 3

Context

Chemical risk:

arises when chemical substances are dangerous for human or animal health or for the environment

Bisphenol A, phthalates: endocrine disrupters

at certain doses, they can interfere with the endocrine (hormone) system in mammals

A great number of severe disorders:

sexual development problems (feminization of males or masculinizing effects on females)
breast cancer, prostate cancer, thyroid and other cancers
brain development problems and deformations of the body

Controversial topics

SLIDE 4

Context

Dangerous substances: authorization for marketing is required

European Food Safety Authority (EFSA):

analysis of a great amount of literature to provide decision-makers with scientifically based arguments on the possibility and appropriateness of marketing products and goods

→ Propose an automatic approach for analyzing the literature and detecting sentences related to chemical risk

SLIDE 5

Material

Processed corpus:

literature on Bisphenol A-related experiments and results

over 80,000 word occurrences

typical statements from scientific and institutional literature used to support decisions on chemical risk management

Linguistic resources:

negation: no, not, neither, lack, absent, missing
uncertainty: possible, hypothetical, should, can, may, usually
limitations: only, shortcoming, small, insufficient
approximation: approximately, commonly, estimated
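The linguistic resources can be pictured as a small lexicon that maps cue words to four semantic tags. The following is a minimal sketch, assuming case-insensitive exact matching on tokens; the slides do not say how the annotation matches words, and `SEMANTIC_LEXICON` and `semantic_tags` are hypothetical names.

```python
# Hypothetical lexicon built from the cue words listed on the slide.
SEMANTIC_LEXICON = {
    "negation": {"no", "not", "neither", "lack", "absent", "missing"},
    "uncertainty": {"possible", "hypothetical", "should", "can", "may", "usually"},
    "limitations": {"only", "shortcoming", "small", "insufficient"},
    "approximation": {"approximately", "commonly", "estimated"},
}


def semantic_tags(tokens):
    """Return the semantic tag of every token that matches a cue word.

    Matching is exact and case-insensitive (an assumption of this sketch).
    """
    tags = []
    for token in tokens:
        word = token.lower()
        for tag, cues in SEMANTIC_LEXICON.items():
            if word in cues:
                tags.append(tag)
    return tags


sentence = "It is possible that the sample was too small".split()
print(semantic_tags(sentence))  # ['uncertainty', 'limitations']
```

Exact matching misses inflected forms such as "shortcomings", which is one reason to match on lemmas once the text has been lemmatized.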

SLIDE 6

Material

Classification:

types of chemical risk factors and uncertainties:
causal relationship between the chemicals and the induced risk
laboratory procedures, human factors, animals tested, significance of results, form of reporting, natural variability, control of confounders, exposure, dosage, assumptions, performance of the measurement and analytical method

Reference data:

manual annotation by a chemical risk specialist
425 segments are assigned to 55 classes of risk factors
the classes do not overlap
the reference data are a subset of the whole corpus

SLIDE 7

Methods: supervised categorization

Pre-processing and Annotation:

pre-processing with the Ogmios platform: tokenization, POS tagging and lemmatization by the Genia tagger
annotation with the linguistic resources

Supervised categorization:

categories to be recognized:

detect sentences concerned with chemical risk
detect to which classes of chemical risk the sentences belong

datasets with equal numbers of positive and negative examples
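A dataset with equal numbers of positive and negative examples can be built for each category. The sketch below assumes that the negatives are drawn at random from the sentences of the other categories; the slides do not say how negatives were chosen, and `balanced_dataset` is a hypothetical helper.

```python
import random


def balanced_dataset(sentences, labels, target, seed=0):
    """Build a two-category dataset for one target category.

    Positives: the sentences labelled `target`.
    Negatives: an equal number of other sentences drawn at random without
    replacement (random undersampling is an assumption of this sketch).
    Returns a shuffled list of (sentence, 1 or 0) pairs.
    """
    positives = [s for s, y in zip(sentences, labels) if y == target]
    others = [s for s, y in zip(sentences, labels) if y != target]
    rng = random.Random(seed)
    k = min(len(positives), len(others))  # size of each half
    negatives = rng.sample(others, k)
    data = [(s, 1) for s in positives[:k]] + [(s, 0) for s in negatives]
    rng.shuffle(data)
    return data


sentences = ["s1", "s2", "s3", "s4", "s5"]
labels = ["A", "B", "B", "A", "C"]
print(balanced_dataset(sentences, labels, target="A"))
```

With both halves the same size, putting every sentence into one default category scores 50%, the baseline used in the evaluation.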

Features used:

forms: uncertain, risks
lemmas: uncertain, risk
lf: uncertain/uncertain, risks/risk
tag: adj, noun
lft: uncertain/uncertain/adj, risks/risk/noun
stag: uncertainty, negation, limitations, approximation
all: combination of all the features
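The feature types can be read as different views of the same tagged tokens. A minimal sketch, assuming each token arrives as a (form, lemma, POS tag) triple from the tagger and that semantic tags come from a lemma-to-tag dictionary; `extract_features` is a hypothetical function, and its `kind` values simply follow the slide's labels.

```python
from collections import Counter


def extract_features(tokens, kind, cues=None):
    """Count features of one type (or all types) for a sentence.

    tokens: list of (form, lemma, pos) triples.
    kind:   'form', 'lemma', 'lf', 'tag', 'lft', 'stag' or 'all'.
    cues:   hypothetical mapping lemma -> semantic tag, used for 'stag'.
    """
    cues = cues or {}
    feats = Counter()
    for form, lemma, pos in tokens:
        if kind in ("form", "all"):
            feats["form=" + form] += 1
        if kind in ("lemma", "all"):
            feats["lemma=" + lemma] += 1
        if kind in ("lf", "all"):
            feats[f"lf={form}/{lemma}"] += 1
        if kind in ("tag", "all"):
            feats["tag=" + pos] += 1
        if kind in ("lft", "all"):
            feats[f"lft={form}/{lemma}/{pos}"] += 1
        if kind in ("stag", "all") and lemma in cues:
            feats["stag=" + cues[lemma]] += 1
    return feats


tokens = [("uncertain", "uncertain", "adj"), ("risks", "risk", "noun")]
print(dict(extract_features(tokens, "lft")))
# {'lft=uncertain/uncertain/adj': 1, 'lft=risks/risk/noun': 1}
```

Prefixing each feature with its type keeps, for example, the form "risk" and the lemma "risk" apart when all the features are combined.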

SLIDE 8

Methods: supervised categorization

Feature weighting:

freq: raw frequency of the features
norm: frequency normalized by the document length
tfidf: frequency weighted by term frequency × inverse document frequency
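The three weighting schemes can be sketched over per-document feature counts, where a document is one classified segment. Assumptions: norm divides each count by the total number of features in the segment, and tfidf multiplies the raw count by the natural-log idf log(N/df); the slides do not give the exact variants, and `weight` is a hypothetical function.

```python
import math
from collections import Counter


def weight(docs, scheme):
    """Weight feature counts for a list of documents.

    docs:   list of Counter objects (raw feature counts per document)
    scheme: 'freq' (raw counts), 'norm' (counts / document length)
            or 'tfidf' (counts * log(N / df)).
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(doc.keys())  # document frequency: one count per document
    weighted = []
    for doc in docs:
        length = sum(doc.values())
        if scheme == "freq":
            w = {f: float(c) for f, c in doc.items()}
        elif scheme == "norm":
            w = {f: c / length for f, c in doc.items()}
        elif scheme == "tfidf":
            w = {f: c * math.log(n_docs / df[f]) for f, c in doc.items()}
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        weighted.append(w)
    return weighted


docs = [Counter({"risk": 2, "may": 1}), Counter({"risk": 1})]
# 'risk' occurs in every document, so its tf-idf weight is 0.0
print(weight(docs, "tfidf"))
```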

Baseline:

assignment of all sentences to the default category: with two balanced categories, this gives 50% performance

Evaluation:

cross-validation: three-fold cross-validation
measures: precision, recall, F-measure
gain: real improvement of the performance P by comparison with the baseline BL:

$$\mathrm{gain} = \frac{P - BL}{1 - BL}$$
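The evaluation protocol can be sketched with two small helpers: a three-fold split of the examples and the precision, recall and F-measure of the positive category. This is a minimal illustration, assuming the balanced F-measure (F1) and round-robin fold assignment; `k_fold_splits` and `precision_recall_f` are hypothetical names, not the tools used in the study.

```python
def k_fold_splits(n_items, k=3):
    """Yield (train, test) index lists for k-fold cross-validation.

    Item i is assigned to fold i % k; shuffling the items beforehand,
    as is usual, is left to the caller.
    """
    for fold in range(k):
        test = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test


def precision_recall_f(gold, predicted, positive=1):
    """Precision, recall and balanced F-measure (F1) for the positive category."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


for train, test in k_fold_splits(6):
    print(train, test)
print(precision_recall_f([1, 1, 0, 0], [1, 0, 1, 0]))  # (0.5, 0.5, 0.5)
```

Each item lands in exactly one test fold, so averaging the three fold scores uses every example once for testing.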

SLIDE 9

Results and Discussion

[Figure: performance (0.2 to 1) of each feature set (all, form, lemm, lf, lft, stag, tag) under the freq, norm and tfidf weightings]

little impact of the features and of their weighting
semantic tags (stag): positive effect

SLIDE 10

Results and Discussion

[Figure, three panels: performance of each feature set (all, form, lemm, lf, lft, stag, tag) under the freq, norm and tfidf weightings, for the classes Natural unexplained variability, Choice of uncertainty factors and Results reporting]

Natural unexplained variability: "...inter-individual differences occur in expression of the isoenzymes responsible for the detoxification of BPA"
Choice of uncertainty factors: "The use of the standard uncertainty factor (UF) of 10 to take into account interspecies differences is therefore considered quite conservative"
Results reporting: "From the study description, although not clearly stated, it can be inferred that the BPA dose level was 40 microg/kg b.w./day"

SLIDE 11

Results and Discussion

[Figure, three panels: performance of each feature set (all, form, lemm, lf, lft, stag, tag) under the freq, norm and tfidf weightings, for the classes Significance of results, Control of confounders and Assumptions]

Significance of results: "Based on the re-analysis the Panel considered that no conclusion can be drawn from this study on the effect of BPA on learning and memory behaviour due to large variability in the data"
Control of confounders: "In consideration of the shortcomings in the design of both studies, in particular the uncertainty regarding the lactational as well as in utero exposure of the offspring to BPA..."
Assumptions: "For this reason it has been hypothesised that circulating UGTs may substantially contribute to detoxification of xenobiotics in the foetus"

SLIDE 12

Results and Discussion

Gain

Class                              Baseline   F-measure   Gain
Natural unexplained variability    0.50       0.95        0.90
Choice of uncertainty factors      0.50       0.93        0.86
Results reporting                  0.50       0.82        0.64
Significance of the results        0.50       0.72        0.44
Control of confounders             0.50       0.60        0.20
Assumptions                        0.50       0.67        0.34

no direct relation between the number of sentences and performance
semantic annotation is exploited by the automatic categorization
lexical specificity of some classes
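The Gain column can be recomputed from the Baseline and F-measure columns with the gain formula from the Methods, which shows that the F-measure plays the role of the performance P. A short check; `gain` and `F_MEASURES` are names introduced here for illustration, and the values are copied from the table.

```python
def gain(performance, baseline):
    """Share of the possible improvement over the baseline actually achieved."""
    return (performance - baseline) / (1 - baseline)


# F-measures from the table; the baseline is 0.50 for every class.
F_MEASURES = {
    "Natural unexplained variability": 0.95,
    "Choice of uncertainty factors": 0.93,
    "Results reporting": 0.82,
    "Significance of the results": 0.72,
    "Control of confounders": 0.60,
    "Assumptions": 0.67,
}

for name, f in F_MEASURES.items():
    print(f"{name}: {gain(f, 0.50):.2f}")
```

Each printed value matches the Gain column (0.90, 0.86, 0.64, 0.44, 0.20, 0.34).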

SLIDE 13

Conclusion and Future work

Extraction of sentences concerned with chemical risk at two levels:

chemical risk
classes of chemical risk

Some classes contain a small number of sentences:

not processed individually → oversampling
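The slides do not say which oversampling method is planned. As one simple possibility, random oversampling grows a small class by duplicating randomly chosen examples; the sketch below is only that illustration, and `random_oversample` is a hypothetical helper.

```python
import random


def random_oversample(examples, target_size, seed=0):
    """Grow a small class to `target_size` by sampling with replacement.

    All original examples are kept; the extra ones are random duplicates.
    A class already at or above `target_size` is returned unchanged.
    """
    if not examples:
        raise ValueError("cannot oversample an empty class")
    rng = random.Random(seed)
    n_extra = max(0, target_size - len(examples))
    extra = [rng.choice(examples) for _ in range(n_extra)]
    return list(examples) + extra


small_class = ["sentence A", "sentence B", "sentence C"]
print(len(random_oversample(small_class, 10)))  # 10
```

Duplicated sentences should be added only to the training folds (or kept in the same fold as their original); otherwise copies of test sentences leak into training and inflate the scores.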

Performance:

0.60-0.72 for classes that are difficult to detect
0.82-0.95 for classes that show lexical and semantic specificities

Future work:

building the dedicated lexicon
application of over-sampling algorithms
use of other methods (topic modeling, information retrieval)
larger corpus, other substances
evaluation by experts working in environmental agencies
