 
Transfer learning in convolutional neural networks for the diagnosis and monitoring of Parkinson’s disease using speech signals in three different languages Cristian David Rios Urrego BSc. student in Electronic Engineering Advisor: Prof. Juan Rafael Orozco Arroyave Ph.D. Co-Advisor: MSc. Juan Camilo Vazquez Correa GITA research group, University of Antioquia. cdavid.rios@udea.edu.co June 5, 2019 1 / 29
Outline Introduction Overview Hypothesis and objectives Databases Methodology Data pre-processing Segmentation Short-time Fourier transform Convolutional Neural Network Convolution stage Pooling stage Transfer learning Experiments and Results Conclusions Conclusions and Future work 2 / 29
Introduction 3 / 29
Context: Parkinson’s disease Parkinson’s disease (PD) is a neurodegenerative disorder caused by the loss of dopaminergic neurons and characterized by symptoms such as resting tremor, bradykinesia, rigidity and gait alterations. 4 / 29
Context: Parkinson’s disease Speech symptoms ◮ Low voice volume ◮ Reduction of prosodic pitch ◮ Monotonous speech ◮ Voice tremor ◮ Imprecise articulation 4 / 29
Context: Parkinson’s disease Computational tools ◮ Early detection ◮ Diagnostic support ◮ Neurological state monitoring 4 / 29
Hypothesis and objectives Hypothesis It is possible to improve the classification of patients with Parkinson’s disease versus healthy controls by applying transfer learning between models trained on monolingual data. 5 / 29
Hypothesis and objectives Objectives General objective To implement and evaluate transfer learning in convolutional neural networks (CNNs) for three different languages in order to support the diagnosis and monitoring of patients with PD. 5 / 29
Hypothesis and objectives Objectives Specific objectives
1. To implement algorithms for the pre-processing and segmentation of voice signals in order to extract onset and offset transitions.
2. To design and train CNNs with a ResNet topology for the different languages, using time-frequency representations of the transitions as input.
3. To apply transfer learning to the trained CNN models for the evaluation and monitoring of PD patients.
4. To evaluate and compare the performance of the CNNs trained on each language with that of the CNNs obtained through transfer learning.
5 / 29
Databases 6 / 29
Databases Spanish
Tasks:
◮ 10 sentences
◮ Rapid repetition of diadochokinetic syllables (DDKs)
◮ Read text with 36 words
◮ Monologue
Table: Information of the speakers in PC-GITA. µ: mean, σ: standard deviation.
|                            | PD patients (male) | PD patients (female) | Healthy controls (male) | Healthy controls (female) |
|----------------------------|--------------------|----------------------|-------------------------|---------------------------|
| Number of subjects         | 25                 | 25                   | 25                      | 25                        |
| Age (µ ± σ)                | 61.3 ± 11.4        | 60.7 ± 7.3           | 60.5 ± 11.6             | 61.4 ± 7.0                |
| Range of age               | 33–81              | 49–75                | 31–89                   | 49–76                     |
| Disease duration (µ ± σ)   | 8.7 ± 5.8          | 12.6 ± 11.6          | –                       | –                         |
| MDS-UPDRS-III (µ ± σ)      | 37.8 ± 22.1        | 37.6 ± 14.1          | –                       | –                         |
| Range of MDS-UPDRS-III     | 6–93               | 19–71                | –                       | –                         |
7 / 29
Databases German
Tasks:
◮ 5 sentences
◮ Rapid repetition of the syllables /pa-ta-ka/
◮ Read text with 81 words
◮ Monologue
Table: Information of the speakers in the German database. µ: mean, σ: standard deviation.
|                            | PD patients (male) | PD patients (female) | Healthy controls (male) | Healthy controls (female) |
|----------------------------|--------------------|----------------------|-------------------------|---------------------------|
| Number of subjects         | 47                 | 41                   | 44                      | 44                        |
| Age (µ ± σ)                | 66.7 ± 8.4         | 66.2 ± 9.7           | 63.8 ± 12.7             | 62.6 ± 15.2               |
| Range of age               | 44–82              | 42–84                | 26–83                   | 28–85                     |
| Disease duration (µ ± σ)   | 7.0 ± 5.5          | 7.1 ± 6.2            | –                       | –                         |
| MDS-UPDRS-III (µ ± σ)      | 22.1 ± 9.9         | 23.3 ± 12.0          | –                       | –                         |
| Range of MDS-UPDRS-III     | 5–43               | 6 – 5                | –                       | –                         |
7 / 29
Databases Czech
Tasks:
◮ Rapid repetition of the syllables /pa-ta-ka/
◮ Read text with 80 words
◮ Monologue
Table: Information of the speakers in the Czech database. µ: mean, σ: standard deviation.
|                            | PD patients (male) | PD patients (female) | Healthy controls (male) | Healthy controls (female) |
|----------------------------|--------------------|----------------------|-------------------------|---------------------------|
| Number of subjects         | 30                 | 20                   | 30                      | 19                        |
| Age (µ ± σ)                | 65.3 ± 9.6         | 60.1 ± 8.7           | 60.3 ± 11.5             | 63.5 ± 11.1               |
| Range of age               | 43–82              | 41–72                | 41–77                   | 40–79                     |
| Disease duration (µ ± σ)   | 6.7 ± 4.5          | 6.8 ± 5.2            | –                       | –                         |
| MDS-UPDRS-III (µ ± σ)      | 21.4 ± 11.5        | 18.1 ± 9.7           | –                       | –                         |
| Range of MDS-UPDRS-III     | 4–54               | 6–38                 | –                       | –                         |
7 / 29
Methodology 8 / 29
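Methodology
[Figure: a base model is trained on the base language with four convolutional blocks whose feature-map sizes are Conv 1: 4×80×41 and 4×40×20; Conv 2: 8×40×20 and 8×20×10; Conv 3: 16×20×10 and 16×10×5; Conv 4: 32×10×5 and 32×5×2; followed by fully connected layers of 128 and 64 units. The parameters of this pre-trained CNN model are transferred to a new model trained on the target language.]
Figure: Transfer learning strategy proposed in this study to classify PD from speech with utterances from different languages. 9 / 29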
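The feature-map sizes in the figure are consistent with four blocks that each double the number of channels and then halve the spatial size. The sketch below only traces those shapes; it assumes a single-channel 80×41 input, "same"-padded convolutions and 2×2 max pooling with stride 2 (none of these is stated on the slide), and it does not model residual connections. The function names are hypothetical.

```python
# Hypothetical shape trace of the base model's convolutional blocks.
# Assumptions (not stated on the slides): single-channel 80 x 41 input,
# "same"-padded convolutions that keep height/width, and 2 x 2 max
# pooling with stride 2 that floors odd sizes.

def conv_same(shape, out_channels):
    """A 'same'-padded convolution changes only the channel count."""
    _, height, width = shape
    return (out_channels, height, width)


def max_pool_2x2(shape):
    """2 x 2 pooling with stride 2 halves height and width (floor division)."""
    channels, height, width = shape
    return (channels, height // 2, width // 2)


def trace_base_model(input_hw=(80, 41), channels=(4, 8, 16, 32)):
    """Return (layer name, output shape) pairs for each conv/pool step."""
    shape = (1, *input_hw)
    trace = []
    for block, out_channels in enumerate(channels, start=1):
        shape = conv_same(shape, out_channels)
        trace.append((f"conv{block}", shape))
        shape = max_pool_2x2(shape)
        trace.append((f"pool{block}", shape))
    # Flattened size fed to the fully connected layers (128 and 64 units in the figure).
    trace.append(("flatten", (shape[0] * shape[1] * shape[2],)))
    return trace


for name, shape in trace_base_model():
    print(name, shape)
```

Under these assumptions the trace reproduces every size in the figure (4×80×41 → 4×40×20 → … → 32×5×2), giving a 320-dimensional vector before the 128-unit layer.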
Data pre-processing 10 / 29
Segmentation Voiced and unvoiced segments are identified by the presence or absence of the fundamental frequency of the voice (pitch) in short-duration frames. Figure: Voiced/unvoiced segments. Figure taken from Arias-Vergara et al. 2018. 11 / 29
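Assuming a pitch tracker that returns one F0 value per frame, with 0 marking frames where no pitch was detected (a common convention, not specified on the slide), voiced segments are runs of consecutive frames with F0 > 0. A minimal sketch with a hypothetical function name:

```python
def voiced_segments(f0):
    """Group consecutive voiced frames (F0 > 0) into [start, end) frame-index pairs.

    Assumes a pitch tracker that reports F0 = 0 for unvoiced frames.
    """
    segments = []
    start = None
    for i, value in enumerate(f0):
        if value > 0 and start is None:
            start = i                      # first voiced frame of a segment
        elif value <= 0 and start is not None:
            segments.append((start, i))    # segment ends before frame i
            start = None
    if start is not None:                  # signal ends while still voiced
        segments.append((start, len(f0)))
    return segments


f0_contour = [0, 0, 120.0, 125.0, 130.0, 0, 0, 110.0, 0]
print(voiced_segments(f0_contour))  # [(2, 5), (7, 8)]
```

Frame indices rather than seconds are returned so the result does not depend on floating-point rounding; multiplying by the frame shift converts them to time.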
Segmentation Onset (unvoiced-to-voiced) and offset (voiced-to-unvoiced) transitions are considered to model the difficulties PD patients have in starting and stopping a movement, such as the vibration of the vocal folds. Figure: Onset/offset transitions. Figure taken from Arias-Vergara et al. 2018. 11 / 29
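Given a per-frame voiced/unvoiced mask, onsets are the frames where voicing starts and offsets the frames where it stops; a fixed-length window of the signal is then cut around each boundary. The sketch below assumes 80 ms on each side of the boundary (160 ms in total) and discards windows that would run past the signal edges; both choices, and the function names, are illustrative assumptions rather than the exact settings of this work.

```python
def find_transitions(voiced):
    """Return frame indices of onsets (unvoiced->voiced) and offsets (voiced->unvoiced)."""
    onsets, offsets = [], []
    for i in range(1, len(voiced)):
        if voiced[i] and not voiced[i - 1]:
            onsets.append(i)
        elif voiced[i - 1] and not voiced[i]:
            offsets.append(i)
    return onsets, offsets


def cut_windows(signal, boundary_frames, frame_shift_s, fs, half_width_s=0.08):
    """Cut signal[c - h : c + h] around each boundary; skip windows that fall off the signal."""
    half = round(half_width_s * fs)                  # half-window length in samples
    windows = []
    for frame in boundary_frames:
        center = round(frame * frame_shift_s * fs)   # boundary position in samples
        start, end = center - half, center + half
        if start >= 0 and end <= len(signal):
            windows.append(signal[start:end])
    return windows


voiced = [False, False, True, True, True, False, False, True, False]
print(find_transitions(voiced))  # ([2, 7], [5, 8])

signal = list(range(1000))       # stand-in for 1 s of audio at fs = 1000 Hz
windows = cut_windows(signal, [20, 50], frame_shift_s=0.01, fs=1000)
print([(w[0], len(w)) for w in windows])  # [(120, 160), (420, 160)]
```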
Time-frequency representations The short-time Fourier transform (STFT) is a Fourier-related transform used to determine the frequency content (Ω) of local sections of a signal as it changes over time.
$$X_m(\Omega) = \sum_{n=-\infty}^{\infty} x(n)\, f(n-m)\, e^{-j\Omega n} \tag{1}$$
where x(n) is the signal to be transformed and f(n) is the window function, commonly a Blackman, Hamming or Hanning window. 12 / 29
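Eq. (1) can be evaluated directly for a finite signal: because the window f(n − m) is non-zero only for m ≤ n < m + N, the infinite sum reduces to N terms per frame. The sketch below does this for a Hamming window at the frequencies Ω_k = 2πk/N; the window length, hop size and test tone are arbitrary choices for illustration, and a practical implementation would use an FFT instead of this direct sum.

```python
import cmath
import math


def hamming(N):
    """Hamming window f(n), n = 0..N-1."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def stft(x, N, hop):
    """Evaluate Eq. (1) at Omega_k = 2*pi*k/N (k = 0..N/2) for frames m = 0, hop, 2*hop, ...

    The window f(n - m) is non-zero only for m <= n < m + N, so the infinite
    sum over n reduces to those N terms.
    """
    f = hamming(N)
    frames = []
    for m in range(0, len(x) - N + 1, hop):
        row = []
        for k in range(N // 2 + 1):
            omega = 2 * math.pi * k / N
            row.append(sum(x[n] * f[n - m] * cmath.exp(-1j * omega * n)
                           for n in range(m, m + N)))
        frames.append(row)
    return frames


fs = 8000
x = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(256)]  # 1 kHz tone
X = stft(x, N=64, hop=32)
magnitudes = [abs(v) for v in X[0]]
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
print(len(X), peak_bin, peak_bin * fs / 64)  # 7 8 1000.0
```

Keeping the absolute index n in the exponential matches Eq. (1) exactly; a frame-relative DFT would only change the phase, not the magnitude |X_m(Ω)| that a spectrogram displays.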
Time-frequency representations
[Figure: Mel-scale time-frequency representations (0–0.16 s) for a male healthy control (age 54) and a male PD patient (age 48; MDS-UPDRS: 9). Axes: time (s) vs. frequency (number of Mel-scale filters); colour bar: amplitude.]
12 / 29
Convolutional Neural Network 13 / 29
Convolutional neural network A CNN typically consists of three stages: a convolution stage, in which several convolutions are computed in parallel to produce a set of linear activations; a pooling stage, which further modifies the output of the layer; and a classification stage. [Figure: input layer, alternating convolution and pooling layers, and a fully connected layer.] Figure: Typical structure of a CNN. 14 / 29
Convolution stage The convolution operation has the effect of filtering the input image with a trainable kernel.
$$s(t) = \sum_{a=-\infty}^{\infty} x(a)\, w(t-a) \tag{2}$$
where the first argument x is the input, the second argument w is the kernel, and the output s is the feature map. Figure: Example of a 2-D convolution. Figure taken from Goodfellow, Bengio, and Courville 2016. 15 / 29
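For finite sequences, Eq. (2) reduces to a double loop in which each product x(a)·w(b) contributes to output index t = a + b. The sketch also includes a 2-D "valid" cross-correlation, the operation most deep-learning libraries implement under the name convolution (the kernel is not flipped), as noted by Goodfellow, Bengio, and Courville. Function names are illustrative.

```python
def convolve1d(x, w):
    """Discrete convolution s(t) = sum_a x(a) w(t - a) for finite x and w (full output)."""
    s = [0.0] * (len(x) + len(w) - 1)
    for a, xa in enumerate(x):
        for b, wb in enumerate(w):     # b = t - a, so this term lands at t = a + b
            s[a + b] += xa * wb
    return s


def cross_correlate2d(image, kernel):
    """'Valid' 2-D cross-correlation S(i, j) = sum_m sum_n I(i+m, j+n) K(m, n)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + m][j + n] * kernel[m][n]
                 for m in range(kh) for n in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


print(convolve1d([1, 2, 3], [0, 1, 0.5]))  # [0.0, 1.0, 2.5, 4.0, 1.5]
print(cross_correlate2d([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 0], [0, -1]]))  # [[-4, -4], [-4, -4]]
```

In a CNN the kernel entries are the trainable weights, and each output map of the 2-D operation is one feature map.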
Pooling stage The main function of pooling is to reduce the spatial dimensions of the input layer by replacing the outputs in each neighbourhood with a statistical summary of those outputs.
Input layer (4×4): [12 20 30 0; 8 12 2 1; 34 70 37 4; 122 100 25 12] → pooling → output layer (2×2): [20 30; 122 37]
Figure: Pooling layer using the max pooling method. 16 / 29
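The values in the figure are reproduced by taking the maximum over non-overlapping 2×2 windows (window size 2, stride 2), which is assumed here; the function name is illustrative.

```python
def max_pool2d(x, size=2, stride=2):
    """Max pooling: each output is the maximum of a size x size window of the input."""
    out_h = (len(x) - size) // stride + 1
    out_w = (len(x[0]) - size) // stride + 1
    return [[max(x[i * stride + a][j * stride + b]
                 for a in range(size) for b in range(size))
             for j in range(out_w)]
            for i in range(out_h)]


layer = [[12, 20, 30, 0],
         [8, 12, 2, 1],
         [34, 70, 37, 4],
         [122, 100, 25, 12]]
print(max_pool2d(layer))  # [[20, 30], [122, 37]]
```

Replacing max with a mean over the same windows gives average pooling, another common summary statistic.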
Transfer learning 17 / 29
Transfer learning The core idea of transfer learning is to reuse the experience gained on one task to improve the learning of new models. Transfer learning can exploit the knowledge (features and weights) of previously trained models to train new models, and can even address problems for which only small amounts of data are available. 18 / 29
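In parameter-based transfer learning, the new model's layers are initialised with the weights of the pre-trained model instead of random values and are then either kept frozen or fine-tuned on the new data. The slides do not say which layers are transferred or whether any are frozen, so the sketch below leaves both as options; the layer names "conv1" and "out" and the helper functions are hypothetical, and weights are plain nested lists to stay framework-independent.

```python
import copy


def shape_of(tensor):
    """Shape of a nested-list tensor, e.g. [[1, 2, 3], [4, 5, 6]] -> (2, 3)."""
    shape = []
    while isinstance(tensor, list):
        shape.append(len(tensor))
        tensor = tensor[0] if tensor else None
    return tuple(shape)


def transfer_parameters(base_params, new_params, layers, freeze=False):
    """Initialise the listed layers of a new model with a pre-trained model's weights.

    Returns the new parameter dict and the set of layer names to keep frozen
    (empty when freeze=False, i.e. every layer is fine-tuned on the target data).
    """
    params = copy.deepcopy(new_params)
    frozen = set()
    for name in layers:
        if shape_of(base_params[name]) != shape_of(params[name]):
            raise ValueError(f"shape mismatch in layer {name!r}")
        params[name] = copy.deepcopy(base_params[name])  # copy, so the base model is not aliased
        if freeze:
            frozen.add(name)
    return params, frozen


base = {"conv1": [[0.3, -0.1], [0.2, 0.5]], "out": [[0.9, 0.1]]}  # trained on the base language
new = {"conv1": [[0.0, 0.0], [0.0, 0.0]], "out": [[0.0, 0.0]]}    # freshly initialised
params, frozen = transfer_parameters(base, new, ["conv1"], freeze=True)
print(params["conv1"], params["out"], frozen)
# [[0.3, -0.1], [0.2, 0.5]] [[0.0, 0.0]] {'conv1'}
```

The shape check matters because transfer only works when the source and target architectures agree on the transferred layers.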
Transfer learning Unlike traditional learning, which is isolated and based exclusively on specific tasks, datasets and separately trained models, transfer learning takes advantage of the knowledge of previously created models. [Figure: in traditional learning, Model 1 is trained on Dataset 1 and Model 2 on Dataset 2 independently; in transfer learning, knowledge from Model 1 is passed to Model 2.] Figure: Comparison between traditional learning and transfer learning. 18 / 29
Experiments and Results 19 / 29
Results: Validation and Regularization
◮ 10-fold cross-validation strategy, speaker independent.
◮ Regularization:
a) L2 regularization.
b) Dropout.
c) Early stopping.
[Figure: training and validation error vs. number of epochs, with the best model marked.]
Figure: Early stopping. 20 / 29
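Two of the ideas on this slide can be sketched in a few lines. Speaker independence means all recordings of a speaker fall in the same fold, so no speaker appears in both training and test sets; early stopping keeps the model from the epoch with the lowest validation error and stops after a number of epochs without improvement. The round-robin speaker assignment, the patience value and the function names below are illustrative assumptions; a real setup would shuffle speakers and may also balance PD and HC per fold.

```python
def speaker_independent_folds(speaker_ids, k=10):
    """Split recording indices into k (train, test) folds so that every recording
    of a given speaker lands in the same fold (speaker-independent CV).

    Speakers are assigned round-robin after sorting; no shuffling or class
    stratification is done in this sketch.
    """
    speakers = sorted(set(speaker_ids))
    fold_of = {spk: i % k for i, spk in enumerate(speakers)}
    folds = []
    for f in range(k):
        test = [i for i, spk in enumerate(speaker_ids) if fold_of[spk] == f]
        train = [i for i, spk in enumerate(speaker_ids) if fold_of[spk] != f]
        folds.append((train, test))
    return folds


def early_stopping_epoch(val_errors, patience=3):
    """Return the epoch with the lowest validation error seen before training stops,
    where training stops after `patience` epochs without improvement."""
    best_epoch, best_error = 0, float("inf")
    for epoch, error in enumerate(val_errors):
        if error < best_error:
            best_epoch, best_error = epoch, error
        elif epoch - best_epoch >= patience:
            break
    return best_epoch


ids = ["s1", "s1", "s2", "s3", "s3", "s4"]
for train, test in speaker_independent_folds(ids, k=2):
    print(train, test)
# [2, 5] [0, 1, 3, 4]
# [0, 1, 3, 4] [2, 5]
print(early_stopping_epoch([0.9, 0.7, 0.6, 0.65, 0.66, 0.7, 0.5]))  # 2
```

Note that the later improvement at epoch 6 is never reached: with a patience of 3, training stops at epoch 5 and the epoch-2 model is kept.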