Convolutional Neural Network to Model Articulation Impairments in Patients with Parkinson’s Disease

SLIDE 1

Convolutional Neural Network to Model Articulation Impairments in Patients with Parkinson’s Disease

Juan Camilo Vásquez-Correa (1,2), Juan Rafael Orozco-Arroyave (1,2), Elmar Nöth (2)

(1) GITA research group, University of Antioquia UdeA.
(2) Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg.

jcamilo.vasquez@udea.edu.co

18th INTERSPEECH, 2017

November 9, 2017

SLIDE 2

Outline

◮ Introduction
◮ Methods
◮ Experimental framework
◮ Results
◮ Conclusion

SLIDE 4

Introduction: Parkinson’s Disease

◮ Second most prevalent neurodegenerative disorder worldwide.

◮ Patients develop several motor and non-motor impairments (O. Hornykiewicz 1998).

◮ Speech impairments are one of the earliest manifestations.

SLIDE 5

Introduction: Speech impairments

Speech impairments in PD patients: hypokinetic dysarthria

◮ Phonation
◮ Articulation
◮ Prosody
◮ Intelligibility

Figure: two panels, each labeled /pa-ta-ka/.

SLIDE 6

Introduction: Imprecise articulation

◮ One of the most deviant speech dimensions in PD.

◮ Reduced velocity of lip, tongue, and jaw movements.

◮ The literature strongly indicates that imprecise consonants are caused by the reduced range of movement of the articulators.

Figure: segments labeled /pa/, /ta/, /ka/.

SLIDE 7

Introduction: Hypothesis

PD patients have difficulty starting and stopping vocal fold vibration, and these difficulties can be observed in speech signals by modeling the transitions between voiced and unvoiced sounds.

Figure: speech signal segmented into voiced and unvoiced regions. An onset transition goes from an unvoiced to a voiced region; an offset transition goes from a voiced to an unvoiced region.

SLIDE 8

Introduction: Aims

◮ To model the time-frequency (TF) information provided by the onset and offset transitions, using the short-time Fourier transform (STFT) and the continuous wavelet transform (CWT).

◮ To “learn” features from the time-frequency representations with a convolutional neural network (CNN).

◮ Why TF and feature learning? Both have been used successfully in several paralinguistic tasks: emotion, deception, depression, and others.

SLIDE 10

Methods

◮ Transitions detection
◮ Time-frequency representations
◮ Convolutional neural network

SLIDE 11

Methods: Transitions detection

Figure: examples of an onset transition and an offset transition.

Onset and offset transitions are detected according to the presence of the fundamental frequency.
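The detection rule above can be sketched in a few lines, assuming an F0 contour is already available with one value per frame and a value of 0 for unvoiced frames. The function names, the toy contour, and the `half_width` parameter are hypothetical; the slides do not specify the pitch tracker or the segment length.

```python
import numpy as np


def find_transitions(f0):
    """Return frame indices of onsets and offsets in an F0 contour.

    f0 holds one value per frame and is 0 where no fundamental frequency
    was found (unvoiced frame). An onset is the first voiced frame after an
    unvoiced one; an offset is the first unvoiced frame after a voiced one.
    """
    voiced = np.asarray(f0, dtype=float) > 0
    change = np.diff(voiced.astype(int))
    onsets = np.flatnonzero(change == 1) + 1
    offsets = np.flatnonzero(change == -1) + 1
    return onsets, offsets


def cut_segment(signal, center, half_width):
    """Cut samples around a transition, clipped to the signal limits."""
    start = max(center - half_width, 0)
    stop = min(center + half_width, len(signal))
    return signal[start:stop]


# Toy F0 contour in Hz (hypothetical values), one value per frame.
f0 = [0, 0, 120, 125, 0, 0, 110, 0]
onsets, offsets = find_transitions(f0)
print(onsets, offsets)  # [2 6] [4 7]
```

In practice the frame indices would be converted to sample positions before calling `cut_segment` on the waveform.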

SLIDE 12

Methods: Time-frequency representation

Figure: STFT of an onset for a PD patient (left) and a HC subject (right). Axes: time (0 to 0.14 s) and frequency (up to 4000 Hz).
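A minimal sketch of how such a time-frequency image could be computed for one transition segment, using only NumPy. The frame length, hop size, Hann window, and log scaling are assumptions made for illustration; only the 4000 Hz upper limit is taken from the figure axis. The function name and the synthetic test tone are hypothetical.

```python
import numpy as np


def stft_log_magnitude(x, fs, frame_len=256, hop=64, fmax=4000.0):
    """Log-magnitude STFT of x, restricted to frequencies up to fmax.

    Returns (freqs, image) where image has shape (n_freqs, n_frames).
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop:i * hop + frame_len] * window for i in range(n_frames)]
    )
    spectrum = np.fft.rfft(frames, axis=1)
    # Bin center frequencies in Hz.
    freqs = np.arange(frame_len // 2 + 1) * fs / frame_len
    keep = freqs <= fmax
    log_mag = 20.0 * np.log10(np.abs(spectrum[:, keep]) + 1e-10)
    return freqs[keep], log_mag.T


# Synthetic stand-in for a transition segment: 2560 samples
# (160 ms at 16 kHz) of a 1 kHz tone.
fs = 16000
t = np.arange(2560) / fs
segment = np.sin(2 * np.pi * 1000 * t)
freqs, image = stft_log_magnitude(segment, fs)
print(image.shape)  # (65, 37)
```

The resulting 2-D array is the kind of input a CNN can consume directly, one image per detected transition.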

SLIDE 13

Methods: Time-frequency representation

Figure: CWT of an onset for a PD patient (left) and a HC subject (right). Axes: time (0 to 0.14 s) and scale (up to 500).
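For comparison with the STFT, a scalogram can be sketched as below. The slides do not name the mother wavelet, so the complex Morlet wavelet, its parameter `w0`, the truncation at four standard deviations, and the scale range are all assumptions chosen for illustration.

```python
import numpy as np


def morlet_cwt(x, scales, w0=6.0):
    """Magnitude of a CWT computed with a complex Morlet wavelet.

    Returns an array of shape (len(scales), len(x)). Scales are in samples.
    """
    x = np.asarray(x, dtype=float)
    out = np.empty((len(scales), len(x)))
    for i, s in enumerate(scales):
        half = int(np.ceil(4 * s))  # truncate the Gaussian envelope
        t = np.arange(-half, half + 1) / s
        wavelet = np.exp(1j * w0 * t) * np.exp(-0.5 * t ** 2) / np.sqrt(s)
        full = np.convolve(x, wavelet, mode="full")
        # Keep the part of the full convolution aligned with x.
        out[i] = np.abs(full[half:half + len(x)])
    return out


# Synthetic stand-in for a transition segment: a 500 Hz tone at 16 kHz.
fs = 16000
t = np.arange(2560) / fs
segment = np.sin(2 * np.pi * 500 * t)
scales = np.arange(1, 65)
scalogram = morlet_cwt(segment, scales)
print(scalogram.shape)  # (64, 2560)
```

With this wavelet, a scale `s` responds most strongly to frequencies near `w0 * fs / (2 * pi * s)`, so larger scales correspond to lower frequencies.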

SLIDE 14

Methods: Convolutional neural network

Figure: CNN architecture. In order: input layer, convolution layer 1 (feature maps 1), max-pooling layer 1, convolution layer 2 (feature maps 2), max-pooling layer 2, fully connected MLP, and the PD vs. HC decision.

The CNN learns high-level representations from the low-level raw data.
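The layer order in the figure can be illustrated with a NumPy-only forward pass. This is a sketch under stated assumptions, not the authors' network: the numbers of feature maps, the 3x3 kernels, the ReLU activations, the hidden layer size, and the random weights are all hypothetical, and no training is shown.

```python
import numpy as np


def conv2d_valid(x, kernels):
    """Valid 2-D cross-correlation. x: (C_in, H, W); kernels: (C_out, C_in, kH, kW)."""
    c_out, _, kh, kw = kernels.shape
    _, h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    out = np.zeros((c_out, out_h, out_w))
    for i in range(kh):
        for j in range(kw):
            patch = x[:, i:i + out_h, j:j + out_w]
            out += np.einsum("oc,chw->ohw", kernels[:, :, i, j], patch)
    return out


def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling; odd trailing rows/columns are dropped."""
    c, h, w = x.shape
    x = x[:, :h - h % 2, :w - w % 2]
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def relu(x):
    return np.maximum(x, 0.0)


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def init_params(input_shape, rng, n_maps1=4, n_maps2=8, n_hidden=16):
    """Random weights; every size here is an illustrative assumption."""
    h, w = input_shape
    h1, w1 = (h - 2) // 2, (w - 2) // 2      # after 3x3 conv + 2x2 pool
    h2, w2 = (h1 - 2) // 2, (w1 - 2) // 2    # after second conv + pool
    n_flat = n_maps2 * h2 * w2
    return {
        "k1": 0.1 * rng.standard_normal((n_maps1, 1, 3, 3)),
        "k2": 0.1 * rng.standard_normal((n_maps2, n_maps1, 3, 3)),
        "w_fc": 0.1 * rng.standard_normal((n_hidden, n_flat)),
        "b_fc": np.zeros(n_hidden),
        "w_out": 0.1 * rng.standard_normal((2, n_hidden)),
        "b_out": np.zeros(2),
    }


def cnn_forward(tf_image, params):
    """Map one time-frequency image to [P(PD), P(HC)]."""
    x = tf_image[np.newaxis, :, :]  # single input channel
    x = max_pool_2x2(relu(conv2d_valid(x, params["k1"])))
    x = max_pool_2x2(relu(conv2d_valid(x, params["k2"])))
    hidden = relu(params["w_fc"] @ x.ravel() + params["b_fc"])
    return softmax(params["w_out"] @ hidden + params["b_out"])


rng = np.random.default_rng(0)
params = init_params((65, 37), rng)
probs = cnn_forward(rng.standard_normal((65, 37)), params)
print(probs.shape)  # (2,)
```

The two outputs sum to one and play the role of the PD vs. HC decision; a real system would learn the weights from labeled transition images.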

SLIDE 16

Data

◮ Three databases with recordings in three languages: Spanish, German, and Czech.

◮ Diadochokinetic exercises, isolated sentences, read texts, and monologues.

SLIDE 17

Data

Language   Description
Spanish    50 patients and 50 healthy controls. Balanced in age (60 years old) and gender. Patients in the middle stage of the disease.
German     88 patients and 88 healthy controls. Balanced in age (64 years old). Patients in the low and middle stages of the disease.
Czech      20 patients and 15 healthy controls. All male speakers. Patients diagnosed during the recording session.

Table: Databases

SLIDE 18

Experiments and validation

◮ Classification of PD patients vs. HC subjects in the same language.

  ◮ 10-fold cross-validation: 8 folds for training, 1 to optimize the hyper-parameters, and 1 for testing.

◮ Cross-language classification.

  ◮ One language is used for training and validation, and another language is used for testing.
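Both validation schemes can be sketched as index bookkeeping. The rotation rule (the fold after the test fold serves as the development fold), the random seed, and the function names are assumptions; the slides state only the 8/1/1 proportion and the train-on-one, test-on-another design.

```python
import numpy as np
from itertools import permutations


def train_dev_test_folds(n_items, n_folds=10, seed=0):
    """Yield (train, dev, test) index arrays for each of n_folds rounds.

    One fold is held out for testing, the next one for tuning
    hyper-parameters, and the remaining folds are used for training.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_items), n_folds)
    for k in range(n_folds):
        dev_k = (k + 1) % n_folds
        train = np.concatenate(
            [folds[i] for i in range(n_folds) if i not in (k, dev_k)]
        )
        yield train, folds[dev_k], folds[k]


def cross_language_pairs(languages):
    """All ordered (train language, test language) pairs."""
    return list(permutations(languages, 2))


# Example with 100 items, such as the speakers of one database.
for train, dev, test in train_dev_test_folds(100):
    assert len(train) == 80 and len(dev) == 10 and len(test) == 10

print(len(cross_language_pairs(["Spanish", "German", "Czech"])))  # 6
```

When the items are speakers rather than recordings, no speaker appears in more than one of the three sets within a round, which avoids leaking speaker identity into the test fold.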

SLIDE 19

Experiments and validation

◮ Results are compared with those of a previous study [1].

◮ Baseline: support vector machine.

[1] Juan Camilo Vásquez-Correa et al. “Effect of acoustic conditions on algorithms to detect Parkinson’s disease from speech”. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2017, pp. 5065–5069.

SLIDE 21

Results: same language for train and test

Language   TFR        Onset   Offset   Onset+Offset
Spanish    CNN-STFT   85.3    81.6     85.9
           CNN-CWT    84.2    81.8     85.2
           Baseline   69.3    69.6     71.6
German     CNN-STFT   70.3    68.0     75.0
           CNN-CWT    68.0    66.9     70.5
           Baseline   72.7    70.9     74.0
Czech      CNN-STFT   77.9    80.4     84.4
           CNN-CWT    89.2    87.7     89.4
           Baseline   75.3    74.4     78.8

SLIDE 24

Results: same language for train and test

Figure: Output of the CNN after the last max-pool layer for a PD patient (left) and a HC speaker (right). Axes: time (ms) and frequency (Hz); the color scale runs from low energy to high energy.

SLIDE 25

Results: same language for train and test

Speech task   Spanish   German   Czech
Read text     85.0      70.3     88.5
Monologue     85.6      70.3     89.1
/pa-ta-ka/    85.4      70.7     89.2

SLIDE 26

Results: different language for train and test

Train language   Test language   TFR        Onset   Offset   Onset+Offset
Spanish          German          CNN-STFT   51.7    50.2     54.7
                 German          Baseline   53.7    55.0     54.1
                 Czech           CNN-CWT    55.2    55.4     57.9
                 Czech           Baseline   60.3    57.4     60.4
German           Spanish         CNN-STFT   58.0    55.7     55.8
                 Spanish         Baseline   53.5    53.5     53.6
                 Czech           CNN-STFT   53.0    52.4     53.0
                 Czech           Baseline   50.9    51.7     52.6
Czech            Spanish         CNN-CWT    53.8    56.3     56.7
                 Spanish         Baseline   53.4    51.6     52.4
                 German          CNN-STFT   54.0    51.8     54.0
                 German          Baseline   51.2    51.0     50.7

SLIDE 30

Conclusion

◮ A deep learning approach is proposed to model articulation impairments of PD patients.

◮ Voiced-unvoiced transitions are modeled with CNNs using STFT and CWT.

SLIDE 31

Conclusion

◮ The proposed method is able to classify PD patients and HC subjects, and it improves on the baseline when the same language is used for training and testing.

◮ Additional approaches are needed when the training and test languages differ.

◮ Recurrent neural networks and other architectures may be considered to assess co-articulation.

◮ Deep learning approaches trained with phonation, articulation, and prosody information may be used to evaluate specific speech impairments.
