Natural language analysis to detect Parkinson’s disease
Paula Andrea Pérez-Toro (1), Juan Camilo Vásquez-Correa (1, 2), Martin Strauss (2), Juan Rafael Orozco-Arroyave (1, 2), and Elmar Nöth (2)
(1) Faculty of Engineering, University of Antioquia, Medellin, Colombia
(2) Pattern Recognition Lab, Friedrich-Alexander University of Erlangen-Nürnberg
September 30, 2019

Introduction: Parkinson’s Disease (PD)
• Second most common neurodegenerative disorder worldwide.
• 6,000,000 Parkinson’s patients around the world.
• Neurologists evaluate PD according to the MDS-UPDRS-III scale (Goetz et al. 2008).
Motor impairments
• Bradykinesia
• Rigidity
• Resting tremor
• Micrographia
• Dysarthria
J. C. Vásquez-Correa | TSD 2019, Ljubljana, Slovenia, September 30, 2019

Introduction: Parkinson’s Disease (PD)
Non-motor symptoms
• Sleep disturbances.
• Depression.
• Cognitive impairments.
• Communication disorders.

Introduction: Parkinson’s Disease (PD)
Communication and language impairments
• Deficits in grammar production.
• Less use of action verbs.
• Low information content.
• Simple syntax.
• Differences in sentence length, number of propositions, and grammatical complexity.

Introduction: Hypothesis and Aims
Hypothesis: We believe that NLP methods can capture the effect of the language impairments that affect the communication capabilities of PD patients, and can detect the presence of the disease.
Aims:
• To model components related to communication deficits in PD using verbal information.
• To analyze the suitability of NLP methods to discriminate between PD patients and Healthy Control (HC) subjects.

Database
Table: General information of the subjects. Time since diagnosis, age, and education are given in years.

                              PD patients              HC subjects
Gender [F/M]                  25/25                    25/25
Age [F/M]                     60.7(7.3)/61.3(11.7)     61.4(7.1)/60.5(11.6)
Education [F/M]               11.5(4.1)/10.9(4.5)      11.5(5.2)/10.6(4.4)
Time since diagnosis [F/M]    12.6(11.5)/8.7(5.8)      -
MDS-UPDRS-III [F/M]           37.6(14.0)/37.8(22.1)    -

• The task consisted of asking the participants to talk about their daily routines.
• Average duration of the monologues: 48 ± 29 seconds for the patients and 45 ± 24 seconds for the healthy subjects.

Methods: Methodology
[Figure: block diagram of the methodology. Training: training and development sets → text pre-processing (noisy entities removal, lexicon normalization; output: cleaned text) → feature extraction (BoW, TFIDF, W2V) → classifier training (classification). Test: test sample → text pre-processing → feature extraction → class prediction.]

Methods: Pre-processing
The data is cleaned and standardized, making it noise-free and ready for analysis.
[Figure: pre-processing blocks: noise removal, lexicon normalization, tokenization.]
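
The three pre-processing blocks can be illustrated with a minimal sketch. The slides do not say which rules were applied, so everything here is an assumption: noise removal is taken to mean dropping punctuation and digits, and lexicon normalization is reduced to lowercasing plus Unicode NFC normalization. The function name `preprocess` and the example sentence are hypothetical.

```python
import re
import unicodedata

def preprocess(text):
    """Return a list of cleaned tokens for one transcript (illustrative rules only)."""
    # Lexicon normalization (assumed): one canonical Unicode form, lowercase.
    text = unicodedata.normalize("NFC", text).lower()
    # Noise removal (assumed): punctuation, digits and underscores become spaces.
    text = re.sub(r"[^\w\s]|\d|_", " ", text)
    # Tokenization: split on whitespace.
    return text.split()

tokens = preprocess("Me levanto a las 6, ¡desayuno y camino!")
print(tokens)
```

A real pipeline would probably also handle fillers or transcription marks, which this sketch leaves out.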

Methods: Bag of Words (BoW)
Collection of words into a feature vector.
1. The sentences are represented as a collection of words.
2. Vocabulary → 1182 words.
3. The words of the transcripts are counted and stored as the feature vector.
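
As a rough illustration of the three steps, the sketch below builds a vocabulary from tokenized transcripts and counts word occurrences per transcript. The toy transcripts and the helper names `build_vocabulary` and `bow_vector` are hypothetical; the real vocabulary on the slide has 1182 words.

```python
from collections import Counter

def build_vocabulary(tokenized_docs):
    """Map every distinct word to a fixed column index (alphabetical order)."""
    words = sorted({w for doc in tokenized_docs for w in doc})
    return {w: i for i, w in enumerate(words)}

def bow_vector(tokens, vocab):
    """Count how often each vocabulary word occurs in one transcript."""
    counts = Counter(tokens)
    # Dicts keep insertion order, so columns follow the vocabulary indices.
    return [counts.get(w, 0) for w in vocab]

# Toy transcripts, already tokenized (hypothetical data).
docs = [["me", "levanto", "y", "camino"], ["leo", "y", "leo"]]
vocab = build_vocabulary(docs)
vectors = [bow_vector(doc, vocab) for doc in docs]
print(vocab)
print(vectors)
```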

Methods: Term Frequency-Inverse Document Frequency (TF-IDF)
• TF: the frequency of a specific word within a document.
• IDF: the inverse of the frequency of occurrence of the word across the collection of documents, so words that appear in many documents receive a lower weight.
• TF-IDF features aim to model the vocabulary of the patients and the relevance of the words they use in their transcripts.
• The TF-IDF weight W_{i,j} of the term i in the document j is given by:

$$ W_{i,j} = TF_{i,j} \log\left(\frac{N}{df_i}\right) $$

TF_{i,j}: the number of occurrences of the term i in the document j.
df_i: the number of documents containing i.
N: the total number of documents.
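
A small sketch of the weight defined on this slide, TF times log(N / df), may help. The slide does not state the base of the logarithm, so the natural logarithm is an assumption, and the function name `tfidf` and the toy transcripts are hypothetical.

```python
import math
from collections import Counter

def tfidf(tokenized_docs):
    """Return one {term: weight} dict per document, with weight = TF * log(N / df)."""
    n_docs = len(tokenized_docs)
    # df[w]: number of documents that contain the term w at least once.
    df = Counter(w for doc in tokenized_docs for w in set(doc))
    weights = []
    for doc in tokenized_docs:
        tf = Counter(doc)  # raw number of occurrences in this document
        weights.append({w: tf[w] * math.log(n_docs / df[w]) for w in tf})
    return weights

docs = [["me", "levanto", "y", "camino"], ["leo", "y", "leo"]]
weights = tfidf(docs)
print(weights[1])
```

With this definition a term that occurs in every document, such as "y" in the toy data, gets a weight of zero, which is how the IDF factor suppresses uninformative words.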

Methods: Word2Vec (W2V)
• A neural network with one hidden layer.
• Input → one-hot encoding representation of the words.
• Activations of the hidden layer are the “word vectors”.
[Figure: network with an input layer (x_1, ..., x_V), a hidden layer (h_1, ..., h_N), and an output layer (y_1, ..., y_V), connected by the weight matrices W_{V×N} = {w_ki} and W'_{V×N} = {w'_ij}.]
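
Why the hidden activations act as word vectors can be seen in a tiny sketch: multiplying a one-hot input by the input weight matrix simply selects one row of that matrix. The matrix values below are made up, and the sizes (V = 4 words, N = 3 hidden units) are chosen only for illustration.

```python
# Hypothetical input weight matrix W with V = 4 rows (words) and N = 3 columns.
W = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
]
V, N = len(W), len(W[0])

k = 2                                        # index of the input word
x = [1.0 if i == k else 0.0 for i in range(V)]  # one-hot encoding

# Hidden activations h = x^T W, computed as an explicit matrix product.
h = [sum(x[i] * W[i][j] for i in range(V)) for j in range(N)]

print(h)  # identical to W[k], the vector of word k
```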

Methods: Word2Vec (W2V)
• The model was trained with a continuous bag of words (CBOW) architecture.
• Trained using the Spanish WikiCorpus, which contains 120 million words.
• The model considered a window size of 7 words to model the temporal context.
• Dimension of the word vectors was set to 100.
• Statistical functionals were computed over the word vectors of each participant’s transcript: average, standard deviation, skewness, and kurtosis.
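
The last bullet can be sketched as follows: the word vectors of one transcript are stacked into a matrix and summarized per dimension. Several details are assumptions rather than facts from the slides: the four functionals are concatenated into one vector, skewness and kurtosis use population moments (kurtosis is not excess kurtosis), and random numbers stand in for real word vectors. The function name `functionals` is hypothetical.

```python
import numpy as np

def functionals(word_vectors):
    """Summarize an (n_words, dim) matrix as mean, std, skewness and kurtosis per dimension."""
    X = np.asarray(word_vectors, dtype=float)
    mean = X.mean(axis=0)
    centered = X - mean
    m2 = (centered ** 2).mean(axis=0)  # population variance
    m3 = (centered ** 3).mean(axis=0)
    m4 = (centered ** 4).mean(axis=0)
    std = np.sqrt(m2)
    skewness = m3 / m2 ** 1.5          # undefined if a dimension is constant
    kurtosis = m4 / m2 ** 2            # Pearson definition, not excess kurtosis
    return np.concatenate([mean, std, skewness, kurtosis])

# Stand-in for the 100-dimensional vectors of a 60-word transcript.
rng = np.random.default_rng(0)
transcript_vectors = rng.normal(size=(60, 100))
features = functionals(transcript_vectors)
print(features.shape)
```

Under these assumptions each transcript is described by a fixed-length vector regardless of how many words it contains, which is what a classifier needs.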

Methods: Classification
• Two classifiers are considered: a soft margin Support Vector Machine (SVM) with Gaussian kernel, and a Random Forest (RF).
• Validation: a ten-fold cross-validation scheme was implemented.
• An early fusion strategy was implemented to combine the different feature sets.
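
One possible way to set up this experiment with scikit-learn is sketched below. It is not the authors' code: the data are synthetic, the hyperparameters are library defaults, and the stratified folds and feature standardization are assumptions. Early fusion is implemented as a plain concatenation of the feature matrices before classification.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-ins for two feature sets of 50 PD (label 1) and 50 HC (label 0) subjects.
rng = np.random.default_rng(0)
y = np.array([1] * 50 + [0] * 50)
bow = rng.poisson(1.0, size=(100, 30)).astype(float)
w2v = rng.normal(size=(100, 40)) + 0.5 * y[:, None]

# Early fusion: concatenate the feature sets column-wise.
fused = np.hstack([bow, w2v])

# Ten-fold cross-validation (stratification is an assumption).
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
models = {
    "RBF-SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale")),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
}
scores = {
    name: cross_val_score(model, fused, y, cv=cv, scoring="accuracy")
    for name, model in models.items()
}
for name, fold_scores in scores.items():
    print(name, round(fold_scores.mean(), 3))
```

In the SVC, the parameter C controls the soft margin and gamma the width of the Gaussian kernel; in practice both would be tuned inside the cross-validation loop.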

Results
[Figure: word cloud representation. A) PD patient. B) HC subject.]

Results
Table: Classification results.

            RBF-SVM                            RF
Features    Acc(%)  Sens(%)  Spe(%)  AUC       Acc(%)  Sens(%)  Spe(%)  AUC
BoW         62.0    70.0     54.0    0.60      70.0    74.0     66.0    0.76
TF-IDF      58.0    58.0     56.0    0.60      67.0    68.0     66.0    0.71
W2V         72.0    92.0     52.0    0.66      67.0    74.0     60.0    0.71
Fusion      60.0    62.0     58.0    0.62      66.0    68.0     64.0    0.71

Notes: Acc: accuracy. Sens: sensitivity. Spe: specificity. AUC: area under the ROC curve.
• PD patients are better discriminated than HC subjects in most cases: sensitivity is higher than specificity.
• The fusion strategy did not improve the results, which suggests that the considered features are not complementary.
• Further research is required to find an optimal strategy to merge such information.
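
The relation between the three percentage columns can be checked with a short worked example. The counts below are not reported in the slides; they are the ones implied by the W2V result with the RBF-SVM (sensitivity 92.0%, specificity 52.0%) under the assumption that the rates were computed over all 50 PD patients and 50 HC subjects.

```python
def metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of PD patients detected
    specificity = tn / (tn + fp)   # fraction of HC subjects recognized
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity

# Implied counts: 46 of 50 PD patients and 26 of 50 HC subjects classified correctly.
acc, sens, spec = metrics(tp=46, fn=4, tn=26, fp=24)
print(acc, sens, spec)
```

Because both groups have the same size, the accuracy is the average of sensitivity and specificity, which matches the 72.0% in the table.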

Results
[Figure: ROC curves for the different feature sets (axes: False Positive vs. True Positive). BoW: AUC = 0.7588. TF-IDF: AUC = 0.7098. W2V: AUC = 0.7018. Fusion: AUC = 0.7084.]

Results
[Figure: scores obtained for the BoW feature set.]

Conclusion
• Several NLP techniques were considered in this paper to discriminate between HC subjects and PD patients.
• The proposed approach allows the study of different communication disorders that cannot be observed in motor activities.
• PD patients mainly perform passive activities such as reading, thinking, and taking their medication, while HC subjects perform more active ones.
• The results suggest that there is information that reflects language impairments in PD patients.

Conclusion
• Limitation: the task performed by the participants might not properly reflect the communication deficits of PD patients; it may instead reflect the difference between the daily routines of the patients and those of the HC subjects.
• Our team is currently collecting more recordings to evaluate the suitability of other tasks.
• Further experiments will explore more robust word embedding methods such as ELMo or BERT to improve the performance of the system.
• Fusion of acoustic and language information will be implemented.
• Specific non-motor impairments of PD patients, such as depression and anxiety, will be evaluated in further experiments.

Thank you for your attention. Questions?
Camilo Vasquez
Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
juan.vasquez@fau.de
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under Marie Sklodowska-Curie grant agreement No 766287. This project was also funded by CODI at UdeA, grant # PRG2017-15530.

References
Goetz, C.G. et al. (2008). “Movement Disorder Society-sponsored revision of the Unified Parkinson’s Disease Rating Scale (MDS-UPDRS): Scale presentation and clinimetric testing results”. In: Movement Disorders 23.15, pp. 2129–2170.
