Machine learning-based detection of chemical risk



  1. Machine learning-based detection of chemical risk
     N Grabar (1), O Wandji Tchami (1), L Maxim (2)
     (1) CNRS UMR8163 STL, Université Lille 3, France
     (2) Institut des Sciences de la Communication, CNRS UPS3088, France
     MIE, Istanbul, September 2014

  2. Plan
     1. Context
     2. Material
     3. Methods
     4. Results and Discussion
     5. Conclusion and Future work

  3. Context
     - Chemical risk: arises when chemical substances are dangerous for human or animal health or for the environment
     - Bisphenol A, phthalates: endocrine disrupters that, at certain doses, can interfere with the endocrine (or hormone) system in mammals
     - Great number of severe disorders:
       - sexual development problems (feminizing of males or masculine effects on females)
       - breast cancer, prostate cancer, thyroid and other cancers
       - brain development problems and deformations of the body
     - Controversial topics

  4. Context
     - Dangerous substances: authorization for marketing is required
     - European Food Safety Authority (EFSA): analysis of a great amount of literature to provide scientifically based arguments for decision-makers on the possibility and appropriateness of marketing products and goods
     - → Propose an automatic approach for analyzing the literature in order to detect sentences related to chemical risk

  5. Material
     - Processed corpus: literature on Bisphenol A-related experiments and results
       - over 80,000 word occurrences
       - typical statements from the scientific and institutional literature used to support decisions on the management of chemical risk
     - Linguistic resources:
       - negation: no, not, neither, lack, absent, missing
       - uncertainty: possible, hypothetical, should, can, may, usually
       - limitations: only, shortcoming, small, insufficient
       - approximation: approximately, commonly, estimated
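The linguistic resources on slide 5 can be pictured as a small lexicon that maps marker words to semantic tags. The sketch below is only an illustration: the marker lists are copied from the slide, but the lowercase exact-match strategy and the function name `semantic_tags` are assumptions, not the authors' annotation pipeline.

```python
import re

# Marker lists copied from slide 5; the matching strategy is an assumption.
LEXICON = {
    "negation": ["no", "not", "neither", "lack", "absent", "missing"],
    "uncertainty": ["possible", "hypothetical", "should", "can", "may", "usually"],
    "limitations": ["only", "shortcoming", "small", "insufficient"],
    "approximation": ["approximately", "commonly", "estimated"],
}

# Invert the lexicon: marker word -> semantic tag.
MARKER_TO_TAG = {word: tag for tag, words in LEXICON.items() for word in words}


def semantic_tags(sentence):
    """Return (token, tag) pairs for every lexicon marker found in the sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [(tok, MARKER_TO_TAG[tok]) for tok in tokens if tok in MARKER_TO_TAG]


print(semantic_tags("No conclusion can be drawn; the sample was small."))
```

Because this sketch matches surface forms only, an inflected form such as "shortcomings" would be missed; matching on lemmas would be needed to catch it.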

  6. Material
     - Classification: types of chemical risk factors and uncertainties
       - causal relationship between the chemicals and the induced risk
       - laboratory procedures, human factors, animals tested, significance of results, form of reporting, natural variability, control of confounders, exposure, dosage, assumptions, performance of the measurement and analytical method
     - Reference data: manual annotation by a specialist of chemical risk
       - 425 segments are assigned to 55 classes of risk factors
       - classes do not overlap
       - the reference data are a subset of the whole corpus

  7. Methods: supervised categorization
     - Pre-processing and annotation:
       - pre-processing with the Ogmios platform
       - tokenization, POS-tagging and lemmatization by the Genia tagger
       - annotation with the linguistic resources
     - Supervised categorization:
       - categories to be recognized:
         - detect sentences concerned with chemical risk
         - detect to which classes of chemical risk the sentences belong
       - datasets with equal numbers of positive and negative examples
     - Features used:
       - forms: uncertain, risks
       - lemmas: uncertain, risk
       - lf (form/lemma): uncertain/uncertain, risks/risk
       - tag (part of speech): adj, noun
       - lft (form/lemma/tag): uncertain/uncertain/adj, risks/risk/noun
       - stag (semantic tag): uncertainty, negation, limitations, approximation
       - all: combination of all the features
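The feature sets listed on slide 7 can be illustrated with a short sketch. It assumes each token has already been analysed into a (form, lemma, POS tag, semantic tag) tuple; the function name `build_features`, the `kind=value` feature naming, and the semantic tag given to "uncertain" in the example are all hypothetical choices made for this illustration.

```python
from collections import Counter

# One extractor per feature set from slide 7.
# A token is a (form, lemma, pos, stag) tuple; stag may be None.
EXTRACTORS = {
    "form": lambda t: [t[0]],
    "lemma": lambda t: [t[1]],
    "lf": lambda t: [f"{t[0]}/{t[1]}"],
    "tag": lambda t: [t[2]],
    "lft": lambda t: [f"{t[0]}/{t[1]}/{t[2]}"],
    "stag": lambda t: [t[3]] if t[3] else [],
}


def build_features(tokens, kind):
    """Count the features of one sentence for a feature set, or for 'all' of them."""
    kinds = list(EXTRACTORS) if kind == "all" else [kind]
    counts = Counter()
    for tok in tokens:
        for k in kinds:
            for value in EXTRACTORS[k](tok):
                counts[f"{k}={value}"] += 1
    return counts


# Example tokens taken from the slide; the semantic tag is a hypothetical annotation.
tokens = [
    ("uncertain", "uncertain", "adj", "uncertainty"),
    ("risks", "risk", "noun", None),
]
print(build_features(tokens, "lft"))
```

Prefixing every feature with its kind keeps, for instance, the form "uncertain" and the lemma "uncertain" apart when all feature sets are combined.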

  8. Methods: supervised categorization
     - Feature weighting:
       - freq: raw frequency of features
       - norm: frequency normalized by the document length
       - tfidf: frequency weighted by term frequency × inverse document frequency
     - Baseline: assignment of all sentences to the default category; with a two-category test this gives 50% performance
     - Evaluation:
       - three-fold cross-validation
       - precision, recall, F-measure
       - gain: real improvement of the performance P over the baseline BL, $\mathrm{gain} = \frac{P - BL}{1 - BL}$
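The three weighting schemes on slide 8 can be sketched as follows. The slides do not say which tf-idf variant was used, so the unsmoothed form tf × log(N / df) below is an assumption, as is the function name `weight`.

```python
import math
from collections import Counter


def weight(docs, scheme):
    """docs: list of token lists. Returns one {feature: weight} dict per document."""
    counts = [Counter(doc) for doc in docs]
    if scheme == "freq":
        # Raw frequency of each feature.
        return [dict(c) for c in counts]
    if scheme == "norm":
        # Frequency divided by the document length.
        return [{f: n / len(doc) for f, n in c.items()} for c, doc in zip(counts, docs)]
    if scheme == "tfidf":
        # Assumed variant: tf * log(N / df), without smoothing.
        n_docs = len(docs)
        df = Counter(f for c in counts for f in c)  # number of documents containing f
        return [{f: n * math.log(n_docs / df[f]) for f, n in c.items()} for c in counts]
    raise ValueError(f"unknown scheme: {scheme}")


docs = [["risk", "risk", "may"], ["risk", "small"]]
for scheme in ("freq", "norm", "tfidf"):
    print(scheme, weight(docs, scheme)[0])
```

With this variant a feature that occurs in every document, such as "risk" in the toy data, receives a tf-idf weight of zero.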

  9. Results and Discussion
     [Figure: performance (0 to 1) for each feature set (all, form, lemm, lf, lft, stag, tag) under the freq, norm and tfidf weightings]
     - little impact of the features and their weighting
     - semantic tags: positive effect

  10. Results and Discussion
      [Figure, three panels: performance (0 to 1) for each feature set (all, form, lemm, lf, lft, stag, tag) under the freq, norm and tfidf weightings, one panel per class]
      Example sentence for each class:
      - Natural unexplained variability: "...inter-individual differences occur in expression of the isoenzymes responsible for the detoxification of BPA"
      - Choice of uncertainty factors: "The use of the standard uncertainty factor (UF) of 10 to take into account interspecies differences is therefore considered quite conservative"
      - Results reporting: "From the study description, although not clearly stated, it can be inferred that the BPA dose level was 40 microg/kg b.w./day"

  11. Results and Discussion
      [Figure, three panels: performance (0 to 1) for each feature set (all, form, lemm, lf, lft, stag, tag) under the freq, norm and tfidf weightings, one panel per class]
      Example sentence for each class:
      - Significance of results: "Based on the re-analysis the Panel considered that no conclusion can be drawn from this study on the effect of BPA on learning and memory behaviour due to large variability in the data"
      - Control of confounders: "In consideration of the shortcomings in the design of both studies, in particular the uncertainty regarding the lactational as well as in utero exposure of the offspring to BPA..."
      - Assumptions: "For this reason it has been hypothesised that circulating UGTs may substantially contribute to detoxification of xenobiotics in the foetus"

  12. Results and Discussion: Gain

      | Class                          | Baseline | F-measure | Gain |
      |--------------------------------|----------|-----------|------|
      | Natural unexplained variability | 0.50     | 0.95      | 0.90 |
      | Choice of uncertainty factors  | 0.50     | 0.93      | 0.86 |
      | Results reporting              | 0.50     | 0.82      | 0.64 |
      | Significance of the results    | 0.50     | 0.72      | 0.44 |
      | Control of confounders         | 0.50     | 0.60      | 0.20 |
      | Assumptions                    | 0.50     | 0.67      | 0.34 |

      - no direct relation between the number of sentences and performance
      - semantic annotation exploited by automatic categorization
      - lexical specificity of some classes
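The Gain column can be recomputed from the F-measure and the baseline. The sketch below reads the gain defined on slide 8 as (P − BL) / (1 − BL), an interpretation that reproduces every value reported on slide 12; the function name `gain` is only illustrative.

```python
def gain(performance, baseline=0.5):
    """Improvement over the baseline, relative to the maximum possible improvement."""
    return (performance - baseline) / (1 - baseline)


# F-measures reported on slide 12.
f_measures = {
    "Natural unexplained variability": 0.95,
    "Choice of uncertainty factors": 0.93,
    "Results reporting": 0.82,
    "Significance of the results": 0.72,
    "Control of confounders": 0.60,
    "Assumptions": 0.67,
}

for name, f in f_measures.items():
    print(f"{name}: {gain(f):.2f}")
```

A gain of 0 means the classifier does no better than the 50% baseline, and a gain of 1 would mean perfect performance.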

  13. Conclusion and Future work
      - Extraction of sentences concerned by chemical risk at two levels:
        - chemical risk
        - classes of chemical risk
      - Some classes contain a small number of sentences and are not processed individually → oversampling
      - Performance:
        - 0.60-0.70 for classes that are difficult to detect
        - 0.82-0.95 for classes that show lexical and semantic specificities
      - Future work:
        - building the dedicated lexicon
        - application of over-sampling algorithms
        - use of other methods (topic modeling, information retrieval)
        - larger corpus, other substances
        - evaluation by experts working in environmental agencies
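Oversampling is mentioned only as future work, and the slides do not name an algorithm. As a rough illustration of the idea, the sketch below uses the simplest variant, random duplication of minority-class examples until all classes are the same size; the function name `oversample` and the toy data are hypothetical.

```python
import random


def oversample(examples, seed=0):
    """examples: list of (sentence, label) pairs.

    Randomly duplicates examples of the smaller classes until every class
    has as many examples as the largest one.
    """
    if not examples:
        return []
    rng = random.Random(seed)
    by_label = {}
    for sentence, label in examples:
        by_label.setdefault(label, []).append((sentence, label))
    target = max(len(items) for items in by_label.values())
    balanced = []
    for items in by_label.values():
        balanced.extend(items)
        # Draw the missing examples with replacement from the same class.
        balanced.extend(rng.choices(items, k=target - len(items)))
    return balanced


# Toy data: three sentences in one class, one in the other.
data = [("s1", "other"), ("s2", "other"), ("s3", "other"), ("s4", "assumptions")]
print(len(oversample(data)))
```

Duplicated sentences should stay within the training folds during cross-validation; otherwise copies of a test sentence would leak into training and inflate the scores.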
