SLIDE 1

Combining different modalities in classifying phonological categories

Shunan Zhao¹ and Frank Rudzicz¹,²

¹University of Toronto  ²Toronto Rehabilitation Institute

SLIDE 2

Introduction

— Imagined speech: “hearing” one’s own voice silently to oneself, without the intentional movement of any extremities such as lips, tongue, or hands (from Wikipedia).

— Uses:
¡ Clinical tool to assist those with severe paralysis.
¡ “Synthetic telepathy” for the military (Bogue, 2010).
¡ General-purpose communication.


SLIDE 3

Previous Approaches


— Previous approaches to imagined speech classification:
¡ Invasive and partially-invasive methods (Blakely et al., 2008; Bartels et al., 2008; Kellis et al., 2010; Pasley et al., 2012).
¡ EEG (Suppes et al., 1997; Brigham and Kumar, 2010; Callan et al., 2000; D’Zmura et al., 2009; DaSalla et al., 2009).

— We are interested in discovering solutions that can be applied more generally and that relate acoustics to speech production.

SLIDE 4

Our Approach


— We collect audio, facial (from the Kinect), and EEG data of vocalized and imagined speech.

— This allows us to relate the acoustics with internal speech production and speech articulation.

SLIDE 5

Participants


— 12 participants (mean age = 27.4, σ = 5, range = 14) were recruited from the University of Toronto campus.

— All participants were right-handed, had some post-secondary education, and had no history of neurological conditions or substance abuse.

— 10 participants identified North American English as their native language and 2 spoke North American English at a fluent level.

SLIDE 6

Recording


— A Microsoft Kinect camera was used to record facial information (6 animation units) and audio, while EEG was recorded using a 64-channel cap.

SLIDE 7

Task


— Participants performed the following task:

1. Rest state (5 sec.): Participants were instructed to clear their minds.
2. Stimulus state: A prompt appeared on the screen and was played over the computer’s speakers. Participants were instructed to move their articulators into position to begin pronouncing the prompt.
3. Imagined state (5 sec.): Participants imagined speaking the prompt without moving.
4. Speaking state: Participants spoke the prompt aloud.

SLIDE 8

Animation Units


— Upper Lip Raiser
— Jaw Lowerer
— Lip Stretcher
— Brow Lowerer
— Lip Corner Depressor
— Outer Brow Raiser

SLIDE 9

Different States


[Figure: EEG power over time (ms) for the rest, stimulus, imagined, and speaking states.]

SLIDE 10

Prompts


— We used 7 phonemic/syllabic prompts:
¡ /iy/, /uw/, /piy/, /tiy/, /diy/, /m/, /n/

— And 4 words from Kent’s list of phonetically-similar pairs (Kent et al., 1989):
¡ pat, pot, knew, gnaw

— Each prompt was presented 12 times, for a total of 132 trials per person.

— The phonemic prompts were presented first, followed by the 4 “Kent” words. Within each section, the trials were randomly permuted.

SLIDE 11

Pre-processing


— Pre-processing for the EEG data was done using EEGLAB (Delorme and Makeig, 2004), and ocular artifacts were removed using BSS (Gomez-Herrero et al., 2006).

— The data were band-pass filtered between 1 and 50 Hz, and mean values were subtracted from each channel.

— We applied a small Laplacian filter to each channel, using the neighbourhood of adjacent channels.
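Below is a minimal NumPy/SciPy sketch of the band-pass, mean-subtraction, and small-Laplacian steps. The slides use EEGLAB, so this is only an illustration; the sampling rate, data shapes, and neighbourhood map here are assumed values, not taken from the study.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_1_50(eeg, fs, low=1.0, high=50.0, order=4):
    """Band-pass filter each channel between 1 and 50 Hz (eeg: channels x samples)."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, eeg, axis=1)

def small_laplacian(eeg, neighbours):
    """Subtract the mean of each channel's adjacent channels (small Laplacian)."""
    out = eeg.copy()
    for ch, nbrs in neighbours.items():
        out[ch] = eeg[ch] - eeg[nbrs].mean(axis=0)
    return out

# Illustrative usage with made-up values (fs, montage, and neighbour map are assumptions).
fs = 1000                                  # assumed sampling rate in Hz
eeg = np.random.randn(62, 5 * fs)          # 62 channels x 5 s of data
eeg = bandpass_1_50(eeg, fs)
eeg -= eeg.mean(axis=1, keepdims=True)     # subtract each channel's mean
eeg = small_laplacian(eeg, {0: [1, 2], 1: [0, 2], 2: [0, 1]})  # toy neighbour map
```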

SLIDE 12

Features


— For the EEG and audio data, we window the data to approximately 10% of the segment, with a 50% overlap between consecutive windows.
¡ For each window, we compute various statistical measures, spectral entropy, energy, kurtosis, and skewness. We also compute the first and second derivatives of the above features.
¡ This gives us 65,835 EEG features (over 62 channels) and 1197 acoustic features.

— For the facial data, we compute a subset of the above features.

— We perform feature selection by ranking features by their Pearson correlations with the given classes, for each task independently (a code sketch of the windowing and ranking follows this list).
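The following is a minimal sketch of per-window feature extraction and Pearson-based ranking. It assumes one particular set of window statistics (mean, standard deviation, energy, kurtosis, skewness, spectral entropy); the slide’s feature list is broader, so the resulting counts will not match the 65,835/1197 figures.

```python
import numpy as np
from scipy.stats import kurtosis, skew, pearsonr

def spectral_entropy(x):
    """Shannon entropy of the normalized power spectrum of a 1-D window."""
    psd = np.abs(np.fft.rfft(x)) ** 2
    psd = psd / psd.sum()
    return float(-np.sum(psd * np.log2(psd + 1e-12)))

def window_features(signal, n_windows=10, overlap=0.5):
    """Split a 1-D signal into ~10% windows with 50% overlap and compute
    per-window statistics plus their first and second differences."""
    win = len(signal) // n_windows
    step = max(1, int(win * (1 - overlap)))
    rows = []
    for start in range(0, len(signal) - win + 1, step):
        w = signal[start:start + win]
        rows.append([w.mean(), w.std(), np.sum(w ** 2),
                     kurtosis(w), skew(w), spectral_entropy(w)])
    F = np.asarray(rows)                    # windows x statistics
    d1 = np.diff(F, axis=0)                 # "first derivative" across windows
    d2 = np.diff(F, n=2, axis=0)            # "second derivative" across windows
    return np.concatenate([F.ravel(), d1.ravel(), d2.ravel()])

def rank_by_pearson(X, y, k=100):
    """Rank columns of X (trials x features) by |Pearson r| with the labels y."""
    r = np.array([abs(pearsonr(X[:, j], y)[0]) for j in range(X.shape[1])])
    return np.argsort(r)[::-1][:k]          # indices of the k best features
```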

SLIDE 13


Most informative electrode positions

• We computed the Pearson correlations between all features in the audio and each of the 62 channels.

• The 10 channels with the highest absolute correlations are circled in red in the image on the right.

• This seems to confirm the involvement of the motor cortex in the planning of speech articulation (Pulvermüller et al., 2005).
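A hedged sketch of how such a channel ranking could be computed: score each EEG channel by its strongest absolute Pearson correlation with any acoustic feature and keep the top 10. How the per-channel correlations are aggregated (the maximum here) is my assumption, not stated on the slide.

```python
import numpy as np
from scipy.stats import pearsonr

def top_channels(eeg_feats, audio_feats, n_top=10):
    """eeg_feats: trials x channels x features_per_channel
    audio_feats: trials x acoustic_features
    Return indices of the n_top channels whose features correlate most
    strongly (in absolute value) with any acoustic feature."""
    n_trials, n_channels, n_per_ch = eeg_feats.shape
    scores = np.zeros(n_channels)
    for ch in range(n_channels):
        best = 0.0
        for i in range(n_per_ch):
            for j in range(audio_feats.shape[1]):
                r, _ = pearsonr(eeg_feats[:, ch, i], audio_feats[:, j])
                best = max(best, abs(r))
        scores[ch] = best
    return np.argsort(scores)[::-1][:n_top]
```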

SLIDE 14

Experiments


— We use subject-independent leave-one-out cross-validation for our experiments.

— We use three classifiers (a sketch of the SVM baselines follows this list):
¡ A deep-belief network (DBN) with one hidden layer whose size is 25% of the input size. We use up to 10 iterations of pre-training, a learning rate of 0.1, and a dropout rate of 0.5.
¡ An SVM with a quadratic kernel (SVM-quad).
¡ An SVM with a radial basis function kernel (SVM-rbf).
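Below is a scikit-learn sketch of the two SVM baselines under leave-one-subject-out cross-validation; the DBN is not reproduced here. The use of scikit-learn and the toy data shapes are my assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# Quadratic-kernel and RBF-kernel SVMs, with feature standardization.
svm_quad = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=2))
svm_rbf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Toy data standing in for the selected features and labels (shapes assumed).
rng = np.random.default_rng(0)
X = rng.standard_normal((1584, 50))        # 12 participants x 132 trials, 50 features
y = rng.integers(0, 2, 1584)               # binary task labels
groups = np.repeat(np.arange(12), 132)     # participant id per trial

# Subject-independent evaluation: each fold holds out all trials of one participant.
for name, clf in [("SVM-quad", svm_quad), ("SVM-rbf", svm_rbf)]:
    acc = cross_val_score(clf, X, y, cv=LeaveOneGroupOut(), groups=groups)
    print(name, acc.mean())
```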

SLIDE 15

Classification of Phonological Categories


— We classify between various phonological categories.

— We consider 5 binary classification tasks (an illustrative label mapping follows this list):
¡ Vowel-only vs. consonant (C/V)
¡ Presence of nasal (±Nasal)
¡ Presence of bilabial (±Bilab.)
¡ Presence of high-front vowel (±/iy/)
¡ Presence of high-back vowel (±/uw/)

— We use six different feature sets: EEG-only, facial features (FAC)-only, audio (AUD)-only, EEG and facial features (EEG+FAC), EEG and audio features (EEG+AUD), and all modalities.

SLIDE 16

Results


[Figure: classification accuracy (%) per subject for the DBN, SVM-quad, and SVM-rbf classifiers on the (non-)/uw/ and C/V tasks.]

SLIDE 17

Classification of Mental State


— As a second experiment, we classify the different states of each trial in three binary tasks:
¡ Stimulus vs. speaking (ST/SP)
¡ Rest vs. imagined (R/I)
¡ Stimulus vs. imagined (ST/I)

— We use the same classifiers as before with the same hyper-parameters.

— To improve performance, we concatenate the band-pass filtered data from 6/8 participants and perform ICA (see the sketch below).
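A minimal sketch of that step using scikit-learn’s FastICA. The slides do not name the ICA implementation or the number of components, so both are assumptions here.

```python
import numpy as np
from sklearn.decomposition import FastICA

def concat_and_ica(recordings, n_components=20):
    """recordings: list of band-pass filtered (channels x samples) arrays,
    one per participant. Concatenate along time and unmix with ICA."""
    data = np.concatenate(recordings, axis=1)      # channels x total_samples
    ica = FastICA(n_components=n_components, random_state=0)
    sources = ica.fit_transform(data.T).T          # components x total_samples
    return sources, ica
```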

SLIDE 18

Classification Results


SLIDE 19

Conclusions and Future Work


— We present the first classification of phonological categories combining acoustic, facial, and EEG data, using relatively inexpensive equipment.

— We plan on making the data publicly available in the near future.

— Future work will involve methods to reconstruct acoustic features from the EEG.