

SLIDE 1

Wavelet-Based Time-Frequency Representations for Automatic Recognition of Emotions from Speech

J. C. Vásquez-Correa¹,²*, T. Arias-Vergara¹, J. R. Orozco-Arroyave¹,², J. F. Vargas-Bonilla¹, E. Nöth²

¹Department of Electronics and Telecommunication Engineering, University of Antioquia UdeA.
²Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg.

*jcamilo.vasquez@udea.edu.co

SLIDE 2

Outline

◮ Introduction
◮ Methodology
◮ Data
◮ Experiments and Results
◮ Conclusion

SLIDE 3

Introduction: Emotions

SLIDE 4

Introduction: Emotion recognition

Recognition of emotion from speech:

◮ Call centers
◮ Emergency services
◮ Depression treatment
◮ Intelligent vehicles
◮ Public surveillance

SLIDE 5

Introduction: Non-stationary analysis

SLIDE 6

Introduction: Non-stationary analysis

◮ Time–Frequency Analysis

Wavelet Transform
Wigner–Ville distribution
Modulation Spectra

SLIDE 7

Introduction: Proposal

Features based on the energy content of three Wavelet–based TF representations for the classification of emotions from speech.

◮ Continuous Wavelet transform (CWT)
◮ Bionic Wavelet transform (BWT)
◮ Synchro–squeezing Wavelet transform (SSWT)
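As a rough illustration of how such a representation is obtained, the sketch below computes a CWT with PyWavelets. The Morlet mother wavelet and the scale grid are assumptions; the slides do not specify either choice. The BWT and SSWT need dedicated implementations (the ssqueezepy package, for instance, provides a synchrosqueezed CWT).

```python
# Minimal CWT sketch with PyWavelets. Wavelet ('morl') and scale grid
# are assumptions, not the authors' stated configuration.
import numpy as np
import pywt

fs = 16000                               # sampling rate of e.g. the Berlin corpus
t = np.arange(0, 0.120, 1.0 / fs)        # a 120 ms analysis segment
x = np.sin(2 * np.pi * 200 * t)          # toy stand-in for a voiced speech segment

scales = np.arange(1, 128)
coeffs, freqs = pywt.cwt(x, scales, 'morl', sampling_period=1.0 / fs)
# coeffs: (n_scales, n_samples) TF matrix; freqs: frequency (Hz) of each scale.
print(coeffs.shape, freqs.min(), freqs.max())
```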

SLIDE 8

Methodology

SLIDE 9

Methodology: segmentation

Two types of sounds:

◮ Voiced
◮ Unvoiced
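A minimal voiced/unvoiced split can be sketched as below. The slides do not name the detector used, so the pyin pitch tracker from librosa is an assumption standing in for it, and "speech.wav" is a hypothetical input file.

```python
# Hedged sketch: voiced/unvoiced segmentation via the pyin pitch tracker.
import numpy as np
import librosa

y, sr = librosa.load("speech.wav", sr=16000)      # hypothetical input file
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=60, fmax=400, sr=sr)                  # typical speech F0 range

hop = 512                                         # pyin default hop (frame_length // 4)
idx = np.flatnonzero(voiced_flag)                 # indices of voiced frames
runs = np.split(idx, np.flatnonzero(np.diff(idx) > 1) + 1)
voiced_spans = [(r[0] * hop, (r[-1] + 1) * hop) for r in runs if r.size]
# Samples outside voiced_spans are treated as unvoiced (or silence).
print(f"{len(voiced_spans)} voiced segments")
```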

SLIDE 10

Methodology: Wavelet Transforms

[Figure: a 120 ms speech segment (top) and its three TF representations: CWT (scale vs. time), SSWT (frequency, 7.6 Hz–8 kHz, vs. time), and BWT (frequency, 7.6 Hz–8 kHz, vs. time).]

CWT: continuous wavelet transform. BWT: bionic wavelet transform. SSWT: synchro-squeezed wavelet transform.

SLIDE 11

Methodology: feature extraction

[Figure: a speech frame and its wavelet-based TF representation, split into frequency bands with edges at 100, 200, 300, 510, 920, 1720, 3150, and 8000 Hz; the log-energy of each band is taken as a feature.]

The log-energy of the i-th frequency band $f_i$ is

$$E[i] = \log\left(\frac{1}{N_{f_i}} \sum_{u_k=1}^{N} \left|\mathrm{WT}(u_k, f_i)\right|^2\right) \qquad (1)$$

where the sum runs over the $N$ time samples $u_k$ of the frame and $N_{f_i}$ is the number of TF bins in band $f_i$.
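Eq. (1) is straightforward to state in code. The sketch below assumes the TF representation is already available as a (frequency bins × samples) array; the band edges mirror the axis labels on the slide.

```python
# Eq. (1) in numpy: log-energy per frequency band of a TF representation WT
# with shape (n_freq_bins, n_samples) and bin frequencies in `freqs` (Hz).
import numpy as np

def band_log_energies(WT, freqs, band_edges):
    """E[i] = log of the mean of |WT(u_k, f_i)|^2 over the bins of band i."""
    E = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        band = np.abs(WT[(freqs >= lo) & (freqs < hi)]) ** 2  # |WT(u_k, f_i)|^2
        E.append(np.log(band.mean() + 1e-12))                 # (1/N) sum, then log
    return np.array(E)

band_edges = [100, 200, 300, 510, 920, 1720, 3150, 8000]      # edges from the slide
```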

SLIDE 12

Methodology: feature extraction

Descriptors (16 × 2)      Statistic functions (12)
ZCR                       mean
RMS Energy                standard deviation
F0                        kurtosis, skewness
HNR                       max, min, relative position, range
MFCC 1–12                 slope, offset, MSE of linear regression
∆ of all of the above

Table: Features implemented using openEAR¹

¹Florian Eyben, Martin Wöllmer, and Björn Schuller. "OpenSmile: the Munich versatile and fast open-source audio feature extractor". In: 18th ACM International Conference on Multimedia. ACM, 2010, pp. 1459–1462.
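The table amounts to: extract frame-level contours (ZCR, RMS energy, F0, HNR, MFCC 1–12, plus their deltas) and summarize each contour with 12 functionals. A hedged numpy/scipy re-implementation of the functionals is sketched below; the authors used openEAR, and counting both the relative position of the max and of the min to reach 12 values is an assumption.

```python
# Hedged re-implementation of the 12 statistic functions from the table.
import numpy as np
from scipy.stats import kurtosis, skew

def functionals(contour):
    """Summarize one frame-level descriptor contour with 12 statistics."""
    c = np.asarray(contour, dtype=float)
    t = np.arange(c.size)
    slope, offset = np.polyfit(t, c, 1)            # linear regression over time
    mse = np.mean((c - (slope * t + offset)) ** 2) # MSE of the linear fit
    return np.array([
        c.mean(), c.std(),                         # mean, standard deviation
        kurtosis(c), skew(c),                      # kurtosis, skewness
        c.max(), c.min(),                          # max, min
        c.argmax() / c.size, c.argmin() / c.size,  # relative positions (assumed)
        c.max() - c.min(),                         # range
        slope, offset, mse,                        # linear-regression terms
    ])
```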

SLIDE 13

Methodology: classification

[Diagram: features are extracted from the train set, modeled with a GMM-UBM, and classified with an SVM to predict the emotion.]

SLIDE 14

Methodology: classification

◮ The scores of the voiced and unvoiced SVMs are fused and used as new features for a second SVM.
◮ Leave-one-speaker-out cross-validation is performed.
◮ UAR is the performance measure.

[Diagram: voiced and unvoiced features are turned into GMM supervectors and classified by separate SVMs; their distances to the hyperplane feed a fusion SVM that outputs the emotion.]
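The evaluation protocol from the bullets above can be sketched with scikit-learn. The GMM-UBM supervector extraction and the second-stage fusion SVM are omitted, so this is only a skeleton of the cross-validation and scoring; X, y, and speakers are placeholders.

```python
# Leave-one-speaker-out cross-validation scored with UAR
# (unweighted average recall = macro-averaged recall).
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 384))           # placeholder feature vectors
y = rng.integers(0, 2, 200)               # e.g. high vs. low arousal labels
speakers = rng.integers(0, 10, 200)       # speaker id per recording

uars = []
for tr, te in LeaveOneGroupOut().split(X, y, groups=speakers):
    clf = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
    clf.fit(X[tr], y[tr])
    uars.append(recall_score(y[te], clf.predict(X[te]), average='macro'))

print(f"UAR: {100 * np.mean(uars):.0f} +/- {100 * np.std(uars):.0f}")
```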

SLIDE 15

Data

Table: Databases used in this study

Database     # Rec.  # Speak.  Fs (Hz)  Type    Emotions
Berlin       534     10        16000    Acted   Fear, Disgust, Happiness, Neutral, Boredom, Sadness, Anger
IEMOCAP      10039   10        16000    Acted   Fear, Disgust, Happiness, Anger, Surprise, Excitation, Frustration, Sadness, Neutral
SAVEE        480     4         44100    Acted   Anger, Happiness, Disgust, Fear, Neutral, Sadness, Surprise
enterface05  1317    44        44100    Evoked  Fear, Disgust, Happiness, Anger, Surprise, Sadness

SLIDE 16

Experiments and Results: high vs. low arousal

[Diagram: emotions laid out on the arousal-valence plane (high/low arousal × positive/negative valence): anger, fear, disgust, stress, sadness, boredom, calm, relaxed, interest, surprise, happiness, neutral.]

SLIDE 17

Experiments and Results: high vs. low arousal

Table: Detection of high vs. low arousal emotions. V: voiced, U: unvoiced.

Features  Segm.   Berlin   SAVEE   enterface05  IEMOCAP
CWT       V       96 ± 6   83 ± 9  81 ± 2       74 ± 4
          U       89 ± 9   80 ± 8  80 ± 1       75 ± 3
          Fusion  93 ± 8   87 ± 7  81 ± 3       76 ± 3
BWT       V       96 ± 6   82 ± 8  82 ± 2       74 ± 4
          U       90 ± 9   80 ± 7  80 ± 2       75 ± 3
          Fusion  94 ± 7   85 ± 7  82 ± 2       76 ± 4
SSWT      V       96 ± 6   84 ± 8  81 ± 2       76 ± 5
          U       89 ± 8   80 ± 7  80 ± 1       76 ± 3
          Fusion  95 ± 6   82 ± 6  80 ± 3       77 ± 4
OpenEAR   –       97 ± 3   83 ± 9  81 ± 2       76 ± 4

SLIDE 20

Experiments and Results: positive vs. negative

[Diagram: emotions on the arousal-valence plane, as in Slide 16.]

SLIDE 21

Experiments and Results: positive vs. negative

Table: Detection of positive vs. negative valence emotions. V: voiced, U: unvoiced.

Features  Segm.   Berlin   SAVEE   enterface05  IEMOCAP
CWT       V       80 ± 4   64 ± 5  75 ± 2       55 ± 4
          U       76 ± 5   64 ± 3  73 ± 3       58 ± 2
          Fusion  78 ± 4   67 ± 4  74 ± 2       58 ± 5
BWT       V       80 ± 4   64 ± 6  74 ± 2       55 ± 4
          U       76 ± 7   64 ± 5  74 ± 3       58 ± 2
          Fusion  78 ± 6   65 ± 6  74 ± 4       58 ± 3
SSWT      V       82 ± 5   64 ± 5  76 ± 3       56 ± 4
          U       77 ± 6   63 ± 3  74 ± 3       58 ± 2
          Fusion  79 ± 4   65 ± 5  74 ± 4       60 ± 3
OpenEAR   –       87 ± 2   72 ± 6  81 ± 4       59 ± 3

SLIDE 24

Experiments and Results: multiple emotions

[Diagram: arousal-valence plane with the emotions considered in the multi-class task: anger, fear, disgust, sadness, boredom, relaxed, surprise, happiness, neutral.]

SLIDE 25

Experiments and Results: multiple emotions

Table: Classification of multiple emotions. V: voiced, U: unvoiced.

Features  Segm.   Berlin   SAVEE    enterface05  IEMOCAP
CWT       V       61 ± 8   41 ± 13  48 ± 5       47 ± 6
          U       55 ± 7   39 ± 6   46 ± 4       51 ± 4
          Fusion  67 ± 7   44 ± 9   51 ± 6       56 ± 5
BWT       V       64 ± 9   41 ± 15  48 ± 4       47 ± 5
          U       56 ± 7   40 ± 4   45 ± 4       51 ± 4
          Fusion  66 ± 7   47 ± 10  50 ± 4       55 ± 6
SSWT      V       64 ± 8   43 ± 11  48 ± 4       49 ± 5
          U       55 ± 8   40 ± 6   46 ± 4       52 ± 3
          Fusion  69 ± 8   45 ± 12  49 ± 6       58 ± 4
OpenEAR   –       80 ± 8   49 ± 17  63 ± 7       57 ± 3

SLIDE 27

Conclusion

◮ This study evaluates different wavelet-based TF representations (CWT, BWT, SSWT) to model emotional speech.
◮ Among the three TF-based transformations, the SSWT provides the best results.
◮ In most cases the highest UARs are obtained with the features extracted from voiced segments.
◮ The fusion scheme proves useful to combine the information provided by both kinds of segments.
◮ The results with the proposed approach are better than those obtained with openEAR when classifying high vs. low arousal emotions.
◮ Further experiments will consider other descriptors extracted from the TF representations to improve the results in the other classification tasks.

SLIDE 33

Questions

Thanks!

jcamilo.vasquez@udea.edu.co

SLIDE 34

Wavelet-Based Time-Frequency Representations for Automatic Recognition of Emotions from Speech

J. C. Vásquez-Correa¹,²*, T. Arias-Vergara¹, J. R. Orozco-Arroyave¹,², J. F. Vargas-Bonilla¹, E. Nöth²

¹Department of Electronics and Telecommunication Engineering, University of Antioquia UdeA.
²Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg.

*jcamilo.vasquez@udea.edu.co
