SLIDE 1 Convolutional Neural Network to Model Articulation Impairments in Patients with Parkinson’s Disease
Juan Camilo Vásquez-Correa1,2, Juan Rafael Orozco-Arroyave1,2, Elmar Nöth2
1GITA research group, University of Antioquia UdeA. 2Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg.
jcamilo.vasquez@udea.edu.co
18th INTERSPEECH, 2017
November 9, 2017
SLIDE 2
Outline
◮ Introduction
◮ Methods
◮ Experimental framework
◮ Results
◮ Conclusion
SLIDE 4
Introduction: Parkinson’s Disease
◮ Second most prevalent neurodegenerative disorder worldwide.
◮ Patients develop several motor and non-motor impairments (O. Hornykiewicz, 1998).
◮ Speech impairments are among the earliest manifestations.
SLIDE 5
Introduction: Speech impairments
Speech impairments in PD patients are known as hypokinetic dysarthria, which affects:
◮ Phonation
◮ Articulation
◮ Prosody
◮ Intelligibility
[Audio examples: /pa-ta-ka/ recordings]
SLIDE 6
Introduction: Imprecise articulation
◮ One of the most deviant speech dimensions in PD.
◮ Reduced velocity of lip, tongue, and jaw movements.
◮ The literature strongly indicates that imprecise consonants are caused by the reduced range of movement of the articulators.
[Figure: /pa/, /ta/, /ka/ examples]
SLIDE 7
Introduction: Hypothesis
PD patients have difficulty starting and stopping vocal fold vibration, and these difficulties can be observed in speech signals by modeling the transitions between voiced and unvoiced sounds.
[Figure: voiced and unvoiced segments of a speech signal, marking an onset transition (unvoiced to voiced) and an offset transition (voiced to unvoiced)]
SLIDE 8
Introduction: Aims
◮ To model the time-frequency (TF) information provided by the onset and offset transitions, using the short-time Fourier transform (STFT) and the continuous wavelet transform (CWT).
◮ To learn features from the time-frequency representations with a convolutional neural network (CNN).
◮ Why TF representations and feature learning? Both have been successfully used in several paralinguistic tasks, such as emotion recognition, deception detection, and depression detection.
SLIDE 10
Methods
Transition detection → Time-frequency representations → Convolutional neural network
SLIDE 11 Methods: Transition detection
[Figure: onset and offset transitions]
Onset and offset transitions are detected according to the presence of the fundamental frequency (F0): an onset is the change from an unvoiced segment (no F0) to a voiced segment (F0 present), and an offset is the change from voiced to unvoiced.
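The detection rule above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a frame-level F0 contour has already been computed (with 0 marking unvoiced frames), a 10 ms frame shift, and 80 ms of context kept on each side of every border; the function name `find_transitions` and these parameter values are hypothetical.

```python
import numpy as np


def find_transitions(f0, frame_shift_s=0.01, context_s=0.08):
    """Locate onset and offset transitions in a frame-level F0 contour.

    f0: 1-D sequence of F0 values in Hz, 0 for unvoiced frames (assumed convention).
    Returns (onsets, offsets): lists of (start_frame, end_frame) windows, each
    spanning context_s seconds on both sides of an unvoiced/voiced border.
    """
    voiced = np.asarray(f0) > 0
    # +1 marks unvoiced -> voiced (onset), -1 marks voiced -> unvoiced (offset)
    change = np.diff(voiced.astype(int))
    onset_borders = np.where(change == 1)[0] + 1    # index of first voiced frame
    offset_borders = np.where(change == -1)[0] + 1  # index of first unvoiced frame
    half = int(round(context_s / frame_shift_s))    # frames kept on each side
    n = len(voiced)

    def windows(borders):
        # keep only borders that have a full context window on both sides
        return [(int(b) - half, int(b) + half) for b in borders
                if int(b) - half >= 0 and int(b) + half <= n]

    return windows(onset_borders), windows(offset_borders)


# 20 unvoiced, 30 voiced, 20 unvoiced frames (10 ms each)
onsets, offsets = find_transitions([0.0] * 20 + [120.0] * 30 + [0.0] * 20)
print(onsets, offsets)  # [(12, 28)] [(42, 58)]
```

The frame windows would then be mapped to sample indices (frame index × frame shift × sampling rate) to cut the waveform segment that is passed to the time-frequency analysis; the F0 tracker itself is outside the scope of this sketch.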
SLIDE 12 Methods: Time-frequency representation
[Figure: STFT of an onset transition for a PD patient (left) and an HC subject (right); axes: time (s) and frequency (Hz, up to 4000)]
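An STFT representation of a segmented transition could be computed along these lines. This NumPy-only sketch is not the authors' configuration: the Hann window, 16 ms window length, 4 ms hop, 256-point FFT, log compression, and the name `stft_log_magnitude` are all assumptions.

```python
import numpy as np


def stft_log_magnitude(x, fs, win_s=0.016, hop_s=0.004, n_fft=256):
    """Log-magnitude STFT of a short transition segment with a Hann window.

    Returns an array of shape (n_fft // 2 + 1, n_frames): frequency bins x frames.
    """
    x = np.asarray(x, dtype=float)
    win = int(round(win_s * fs))
    hop = int(round(hop_s * fs))
    if win > n_fft or win > len(x):
        raise ValueError("window must fit in both the FFT size and the segment")
    window = np.hanning(win)
    n_frames = 1 + (len(x) - win) // hop
    frames = np.stack([x[i * hop:i * hop + win] * window for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))  # zero-pads each frame to n_fft
    return np.log(spec + 1e-10).T                       # small offset avoids log(0)


fs = 16000
t = np.arange(int(round(0.16 * fs))) / fs            # 160 ms segment
S = stft_log_magnitude(np.sin(2 * np.pi * 500 * t), fs)
print(S.shape)                                        # (129, 37)
print(int(np.argmax(S.mean(axis=1))) * fs / 256)     # 500.0 (bin spacing fs / n_fft)
```

The resulting matrix (frequency bins × frames) is the kind of 2-D image a CNN can take as input; shorter windows trade frequency resolution for time resolution, which matters for transitions that last only a few tens of milliseconds.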
SLIDE 13 Methods: Time-frequency representation
[Figure: CWT of an onset transition for a PD patient (left) and an HC subject (right); axes: time (s) and scale]
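A comparable sketch for the CWT follows. The slides do not name the mother wavelet, so a complex Morlet wavelet (w0 = 6) is used here purely as a stand-in, implemented as direct convolution in NumPy; the function name `morlet_cwt` is hypothetical. Scales are in samples, and for this wavelet a scale s corresponds roughly to a frequency of w0·fs/(2π·s).

```python
import numpy as np


def morlet_cwt(x, scales, w0=6.0):
    """CWT magnitude with a complex Morlet wavelet, computed by direct convolution.

    scales are in samples; returns |coefficients| with shape (len(scales), len(x)).
    """
    x = np.asarray(x, dtype=float)
    out = np.empty((len(scales), len(x)))
    for k, s in enumerate(scales):
        # sample the wavelet over +/- 4 scale units, capped so it fits in the signal
        half = min(int(4 * s), (len(x) - 1) // 2)
        t = np.arange(-half, half + 1) / s
        psi = np.pi ** -0.25 * np.exp(1j * w0 * t) * np.exp(-0.5 * t ** 2) / np.sqrt(s)
        # correlating with psi = convolving with its conjugated, time-reversed copy
        coef = np.convolve(x, np.conj(psi[::-1]), mode="same")
        out[k] = np.abs(coef)
    return out


fs = 16000
x = np.sin(2 * np.pi * 1000 * np.arange(int(round(0.16 * fs))) / fs)  # 1 kHz tone
scales = np.arange(4, 64)
C = morlet_cwt(x, scales)
best = int(scales[np.argmax(C.mean(axis=1))])
print(best, round(6.0 * fs / (2 * np.pi * best)))  # scale with most energy, approx. Hz
```

Unlike the STFT, the CWT uses short wavelets at small scales (high frequencies) and long wavelets at large scales, so the time-frequency resolution adapts across the scale axis shown in the figure.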
SLIDE 14 Methods: Convolutional neural network
Architecture: input layer (TF representation) → convolution layer I (feature maps 1) → max-pooling layer 1 → convolution layer II (feature maps 2) → max-pooling layer 2 → fully connected MLP → output (PD vs. HC)
The CNN learns high-level representations from the low-level input, i.e., the time-frequency representation of each transition.
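The layer sequence can be illustrated with a plain NumPy forward pass. Everything beyond the layer order is hypothetical: the filter counts (8 and 16), 3×3 kernels, 2×2 pooling, ReLU activations, 32-unit hidden layer, and sigmoid output are illustrative choices, the weights are random, and training is not shown (in practice a deep learning framework would be used).

```python
import numpy as np


def conv2d(x, w, b):
    """'Valid' 2-D cross-correlation, as used by most CNN libraries.

    x: (C_in, H, W); w: (C_out, C_in, kh, kw); b: (C_out,) -> (C_out, H-kh+1, W-kw+1)
    """
    c_out, _, kh, kw = w.shape
    _, H, W = x.shape
    out = np.empty((c_out, H - kh + 1, W - kw + 1))
    for i in range(H - kh + 1):
        for j in range(W - kw + 1):
            patch = x[:, i:i + kh, j:j + kw]
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2])) + b
    return out


def max_pool(x, p=2):
    """Non-overlapping p x p max pooling; rows/columns that do not fit are dropped."""
    c, H, W = x.shape
    x = x[:, :H // p * p, :W // p * p].reshape(c, H // p, p, W // p, p)
    return x.max(axis=(2, 4))


def relu(x):
    return np.maximum(x, 0.0)


def init_params(in_shape, k1=8, k2=16, ksize=3, hidden=32, seed=0):
    """Random (untrained) weights: conv(k1) -> pool -> conv(k2) -> pool -> MLP -> 1."""
    rng = np.random.default_rng(seed)
    H, W = in_shape
    h = ((H - ksize + 1) // 2 - ksize + 1) // 2   # height after the second pooling
    w = ((W - ksize + 1) // 2 - ksize + 1) // 2   # width after the second pooling
    return {
        "w1": rng.normal(0, 0.1, (k1, 1, ksize, ksize)), "b1": np.zeros(k1),
        "w2": rng.normal(0, 0.1, (k2, k1, ksize, ksize)), "b2": np.zeros(k2),
        "w3": rng.normal(0, 0.1, (hidden, k2 * h * w)), "b3": np.zeros(hidden),
        "w4": rng.normal(0, 0.1, (1, hidden)), "b4": np.zeros(1),
    }


def forward(tf_image, p):
    """tf_image: 2-D time-frequency representation. Returns a score in (0, 1) for PD."""
    x = np.asarray(tf_image, dtype=float)[np.newaxis]   # add a channel axis
    x = max_pool(relu(conv2d(x, p["w1"], p["b1"])))     # convolution I + max-pool 1
    x = max_pool(relu(conv2d(x, p["w2"], p["b2"])))     # convolution II + max-pool 2
    h = relu(p["w3"] @ x.ravel() + p["b3"])             # fully connected hidden layer
    z = (p["w4"] @ h + p["b4"])[0]                      # single output unit
    return float(1.0 / (1.0 + np.exp(-z)))              # sigmoid: PD vs. HC


params = init_params((64, 40))
score = forward(np.random.default_rng(1).normal(size=(64, 40)), params)
print(0.0 < score < 1.0)  # True
```

For a 64 × 40 input, the feature maps shrink to 62 × 38, 31 × 19, 29 × 17, and finally 14 × 8 per filter before the MLP; the pooling layers are what let the fully connected part stay small.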
SLIDE 16
Data
◮ Three databases with recordings in three languages: Spanish, German, and Czech.
◮ Diadochokinetic exercises, isolated sentences, read texts, and monologues.
SLIDE 17
Data
Language | Description
Spanish | 50 patients and 50 healthy controls. Balanced in age (60 years old) and gender. Patients in the middle stage of the disease.
German | 88 patients and 88 healthy controls. Balanced in age (64 years old). Patients in the early and middle stages of the disease.
Czech | 20 patients and 15 healthy controls. All male speakers. Patients diagnosed at the time of the recording session.
Table: Databases
SLIDE 18 Experiments and validation
◮ Classification of PD patients vs. HC subjects within the same language.
◮ 10-fold cross-validation: 8 folds for training, 1 to optimize hyper-parameters, and 1 for testing.
◮ Cross-language classification: one language is used for training and validation, and a different language is used for testing.
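One way to rotate the 8/1/1 folds is sketched below. Only the 8/1/1 proportions come from the slides; the cyclic choice of the validation fold, the speaker-independent and class-stratified assignment of speakers to folds, and the helper names are assumptions.

```python
import random
from collections import Counter


def ten_fold_splits(n_folds=10):
    """Yield (train_folds, val_fold, test_fold) for an 8/1/1 rotation over 10 folds.

    Assumption: the validation fold is the fold after the test fold, cyclically.
    """
    for test in range(n_folds):
        val = (test + 1) % n_folds
        train = [f for f in range(n_folds) if f not in (test, val)]
        yield train, val, test


def assign_speakers_to_folds(speakers, labels, n_folds=10, seed=0):
    """Speaker-independent, class-stratified fold assignment (assumed, not from the slides)."""
    rng = random.Random(seed)
    folds = {}
    for cls in sorted(set(labels)):
        group = [s for s, y in zip(speakers, labels) if y == cls]
        rng.shuffle(group)
        for i, s in enumerate(group):
            folds[s] = i % n_folds   # round-robin keeps classes balanced per fold
    return folds


speakers = [f"PD{i}" for i in range(50)] + [f"HC{i}" for i in range(50)]
labels = ["PD"] * 50 + ["HC"] * 50
folds = assign_speakers_to_folds(speakers, labels)
print(Counter(folds.values())[0])  # 10 speakers per fold (5 PD + 5 HC)
print(next(ten_fold_splits()))     # ([2, 3, 4, 5, 6, 7, 8, 9], 1, 0)
```

Keeping every recording of a speaker in one fold avoids testing on a voice the model has already seen. For the cross-language setting, the same rotation would presumably be applied to the training language only, with all speakers of the other language forming the test set.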
SLIDE 19 Experiments and validation
◮ Results are compared with previous work¹, which used a support vector machine (SVM) classifier (the baseline).
¹ Juan Camilo Vásquez-Correa et al. "Effect of acoustic conditions on algorithms to detect Parkinson's disease from speech". In: International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 5065–5069.
SLIDE 21
Results: same language for train and test
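Accuracy (%)
Language | TFR | Onset | Offset | Onset+Offset
Spanish | CNN-STFT | 85.3 | 81.6 | 85.9
Spanish | CNN-CWT | 84.2 | 81.8 | 85.2
Spanish | Baseline | 69.3 | 69.6 | 71.6
German | CNN-STFT | 70.3 | 68.0 | 75.0
German | CNN-CWT | 68.0 | 66.9 | 70.5
German | Baseline | 72.7 | 70.9 | 74.0
Czech | CNN-STFT | 77.9 | 80.4 | 84.4
Czech | CNN-CWT | 89.2 | 87.7 | 89.4
Czech | Baseline | 75.3 | 74.4 | 78.8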
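The gains over the baseline can be checked with a few lines of arithmetic. The values below are transcribed from the table above; `best_cnn_gain` is a hypothetical helper that picks the better of the two CNNs for a given column (0 = Onset, 1 = Offset, 2 = Onset+Offset) and reports its margin over the baseline in percentage points.

```python
# Accuracies (%) transcribed from the same-language table: (Onset, Offset, Onset+Offset)
RESULTS = {
    "Spanish": {"CNN-STFT": (85.3, 81.6, 85.9), "CNN-CWT": (84.2, 81.8, 85.2),
                "Baseline": (69.3, 69.6, 71.6)},
    "German": {"CNN-STFT": (70.3, 68.0, 75.0), "CNN-CWT": (68.0, 66.9, 70.5),
               "Baseline": (72.7, 70.9, 74.0)},
    "Czech": {"CNN-STFT": (77.9, 80.4, 84.4), "CNN-CWT": (89.2, 87.7, 89.4),
              "Baseline": (75.3, 74.4, 78.8)},
}


def best_cnn_gain(lang, col=2):
    """Return (best CNN, its accuracy minus the baseline's) for one column, in points."""
    r = RESULTS[lang]
    best = max(("CNN-STFT", "CNN-CWT"), key=lambda name: r[name][col])
    return best, round(r[best][col] - r["Baseline"][col], 1)


for lang in RESULTS:
    print(lang, best_cnn_gain(lang))
# Spanish ('CNN-STFT', 14.3)
# German ('CNN-STFT', 1.0)
# Czech ('CNN-CWT', 10.6)
print(best_cnn_gain("German", col=0))  # ('CNN-STFT', -2.4)
```

The combined Onset+Offset setting is where the best CNN exceeds the baseline in all three languages; for German onsets or offsets alone, the baseline remains ahead.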
SLIDE 24
Results: same language for train and test
[Figure: output of the CNN after the last max-pooling layer for a PD patient (left) and an HC speaker (right); axes: time (ms) and frequency (Hz); colour scale from low to high energy]
SLIDE 25
Results: same language for train and test
Accuracy (%)
Speech task | Spanish | German | Czech
Read text | 85.0 | 70.3 | 88.5
Monologue | 85.6 | 70.3 | 89.1
/pa-ta-ka/ | 85.4 | 70.7 | 89.2
SLIDE 26 Results: different language for train and test
Accuracy (%)
Train lang. | Test lang. | TFR | Onset | Offset | Onset+Offset
Spanish | German | CNN-STFT | 51.7 | 50.2 | 54.7
Spanish | German | Baseline | 53.7 | 55.0 | 54.1
Spanish | Czech | CNN-CWT | 55.2 | 55.4 | 57.9
Spanish | Czech | Baseline | 60.3 | 57.4 | 60.4
German | Spanish | CNN-STFT | 58.0 | 55.7 | 55.8
German | Spanish | Baseline | 53.5 | 53.5 | 53.6
German | Czech | CNN-STFT | 53.0 | 52.4 | 53.0
German | Czech | Baseline | 50.9 | 51.7 | 52.6
Czech | Spanish | CNN-CWT | 53.8 | 56.3 | 56.7
Czech | Spanish | Baseline | 53.4 | 51.6 | 52.4
Czech | German | CNN-STFT | 54.0 | 51.8 | 54.0
Czech | German | Baseline | 51.2 | 51.0 | 50.7
SLIDE 30
Conclusion
◮ A deep learning approach is proposed to model the articulation impairments of PD patients.
◮ Voiced-unvoiced transitions are modeled with CNNs using the STFT and the CWT.
SLIDE 31
Conclusion
◮ The proposed method is able to classify PD patients and HC subjects, and it improves on the baseline when the same language is used for training and testing (for the best CNN configuration in each language).
◮ Additional approaches are needed when the training and test languages differ.
◮ Recurrent neural networks and other architectures may be considered to assess co-articulation.
◮ Deep learning models trained with phonation, articulation, and prosody information could be developed to evaluate specific speech impairments.