

SLIDE 1

Emotion Recognition in Speech under Environmental Noise Conditions using Wavelet Decomposition

J.C. Vásquez-Correa1, N. García1, J.R. Orozco-Arroyave1,2, J.D. Arias-Londoño1, J.F. Vargas-Bonilla1, Elmar Nöth2

1Faculty of Engineering, University of Antioquia UdeA, Medellín, Colombia.
2Pattern Recognition Lab., Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany

jesus.vargas@udea.edu.co

1 / 25

SLIDE 2

Introduction: Emotion recognition

Recognition of emotion in speech:

◮ Call centers
◮ Emergency services
◮ Psychological therapy
◮ Intelligent vehicles
◮ Public surveillance

2 / 25

SLIDE 3

Introduction: Fear-type emotions

3 / 25

SLIDE 4

Introduction: Challenges

◮ Naturalness of databases (Acted, Natural, Evoked)
◮ Large set of features
◮ Acoustic conditions (Telephone, Background noise)

4 / 25

SLIDE 5

Introduction: Previous Work (2-class)

◮ Emotion recognition under AWGN noise
◮ Emotion recognition under GSM and wired-line telephone channels

Condition      Original  Affected  KLT    logMMSE
AWGN SNR=3dB   76.9%     71.3%     78.1%  74.7%
AWGN SNR=10dB  76.9%     74.7%     80.1%  76.7%
GSM channel    76.9%     77.8%     62.9%  70.6%
Wired-line     76.9%     65.2%     59.0%  75.1%

Table: Emotion recognition accuracy on the Berlin database (2-class)

5 / 25

SLIDE 6

Methodology

A new characterization approach based on the wavelet packet transform for the recognition of emotions in speech, evaluated in non-controlled noise conditions. Features:

◮ Log-energy
◮ Log-energy entropy
◮ MFCC
◮ Lempel-Ziv complexity
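As a minimal illustrative sketch (not the authors' implementation; the mother wavelet is an assumption here, a Haar filter is used for brevity), the packet decomposition and two of the features listed above, log-energy and log-energy entropy, can be computed per sub-band like this:

```python
import math

def haar_step(x):
    """One Haar analysis step: approximation and detail coefficients."""
    a = [(x[2*i] + x[2*i+1]) / math.sqrt(2) for i in range(len(x) // 2)]
    d = [(x[2*i] - x[2*i+1]) / math.sqrt(2) for i in range(len(x) // 2)]
    return a, d

def wavelet_packet(x, levels):
    """Full wavelet packet tree: both branches of every node are split,
    returning the list of leaf-node coefficient lists (2**levels leaves)."""
    nodes = [list(x)]
    for _ in range(levels):
        nxt = []
        for node in nodes:
            a, d = haar_step(node)
            nxt.extend([a, d])
        nodes = nxt
    return nodes

def log_energy(coeffs, eps=1e-12):
    """Log of the total coefficient energy in one sub-band."""
    return math.log(sum(c * c for c in coeffs) + eps)

def log_energy_entropy(coeffs, eps=1e-12):
    """Shannon entropy of the normalized coefficient energies."""
    total = sum(c * c for c in coeffs) + eps
    p = [(c * c + eps) / total for c in coeffs]
    return -sum(pi * math.log(pi) for pi in p)

signal = [math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
leaves = wavelet_packet(signal, levels=3)   # 2**3 = 8 sub-bands
feats = [(log_energy(c), log_energy_entropy(c)) for c in leaves]
```

Because the Haar filter pair is orthonormal, the decomposition conserves energy (Parseval), so the log-energies of the leaves partition the energy of the frame.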

6 / 25

SLIDE 7

Methodology: Characterization

Wavelet packet decomposition, voiced segments:
[Tree diagram: x[n] decomposed into nodes W1,1, W2,3, W3,7, W3,6, W2,2, W3,5, W3,4, W1,0, W2,1, W3,3, W4,7, W4,6, W3,2, W2,0, W3,1, W4,3, W5,6, W5,5, W4,2, W3,0, W4,1, W4,0]

Wavelet packet decomposition, unvoiced segments:
[Tree diagram: x[n] decomposed into nodes W1,1, W2,3, W3,7, W3,6, W2,2, W3,5, W3,4, W1,0, W2,1, W3,3, W3,2, W2,0, W3,1, W4,3, W4,2, W3,0, W4,1, W5,3, W5,2, W6,5, W6,4, W4,0, W5,1, W6,3, W6,2, W5,0]
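The remaining feature named on the previous slide, Lempel-Ziv complexity, counts the number of new phrases found while scanning a symbolized signal. A minimal sketch of the classic LZ76 phrase-counting definition (the binarization rule shown, thresholding at the median, is an assumption for illustration, not taken from the paper):

```python
def lz_complexity(seq):
    """LZ76 complexity: count the phrases found while scanning `seq`
    left to right; a phrase ends when it no longer occurs earlier."""
    i, c, n = 0, 0, len(seq)
    while i < n:
        l = 1
        # extend the candidate phrase while it still appears in the
        # history (overlap with the phrase itself is allowed)
        while i + l <= n and seq[i:i + l] in seq[:i + l - 1]:
            l += 1
        c += 1
        i += l
    return c

def binarize(x):
    """Assumed symbolization: 1 above the median, 0 otherwise."""
    m = sorted(x)[len(x) // 2]
    return "".join("1" if v > m else "0" for v in x)

print(lz_complexity("0001101001000101"))  # 6 phrases: 0|001|10|100|1000|101
```

Applied to the coefficients of a wavelet sub-band, `lz_complexity(binarize(coeffs))` yields one scalar per node, like the energy-based features.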

7 / 25

SLIDE 8

Databases

Database                   # recordings  Speakers  Fs (Hz)  Naturalness  Emotions
Berlin                     534           12        16000    Acted        Hot anger, Boredom, Disgust, Anxiety/Fear, Happiness, Sadness, Neutral
enterface05 (audio-video)  1317          44        44100    Evoked       Hot anger, Happiness, Disgust, Anxiety/Fear, Sadness, Surprise

8 / 25

SLIDE 9

Experiments

Experiment   Berlin DB                      enterface05 DB
Multi-class  Anger, Disgust, Fear, Neutral  Anger, Disgust, Fear
2-class      (Anger, disgust, fear)         (Anger, disgust, fear, sadness)
             vs Neutral                     vs (Happiness, Surprise)

Table: Experiments performed

9 / 25

SLIDE 10

Methodology: Classification

10 / 25

SLIDE 11

Results: Original signals

Segments             # feat.  Class. task  Berlin DB    enterface05 DB
Voiced               120      multi-class  80.0 ± 11.6  57.7 ± 6.8
                              2-class      89.9 ± 7.8   65.1 ± 4.6
Unvoiced             120      multi-class  62.5 ± 5.0   55.4 ± 6.8
                              2-class      82.5 ± 8.6   64.6 ± 6.0
Fusion                        multi-class  74.7 ± 11.9  61.6 ± 4.5
                              2-class      94.6 ± 5.1   69.2 ± 1.5
All signal,          384      multi-class  84.3 ± 6.6   66.6 ± 4.2
OpenEAR [Eyben2012]           2-class      94.9 ± 4.1   68.6 ± 4.8

Table: Accuracy (%) for original non-affected speech signals (previous work on Berlin, 2-class: 76.9%)

11 / 25




SLIDE 16

Experiments: Environments

◮ Original non-affected speech signals
◮ Cafeteria babble noise
◮ Street noise
◮ KLT algorithm (speech enhancement)
◮ LogMMSE algorithm (speech enhancement)

The SNR evaluated ranges from -3 dB to 6 dB.
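A hypothetical sketch (not taken from the paper) of how a noise recording can be mixed into clean speech at a prescribed SNR, using the standard power-ratio definition of SNR in dB:

```python
import math

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) equals
    `snr_db`, then add it sample-by-sample to `speech`."""
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(v * v for v in noise) / len(noise)
    # target noise power, then the amplitude gain that achieves it
    gain = math.sqrt(p_speech / (10 ** (snr_db / 10)) / p_noise)
    return [s + gain * v for s, v in zip(speech, noise)]

# Example: a sinusoid standing in for speech, corrupted at -3 dB
# (noise stronger than speech), the worst condition evaluated here.
speech = [math.sin(2 * math.pi * 0.010 * n) for n in range(1000)]
noise = [math.cos(2 * math.pi * 0.173 * n) for n in range(1000)]
noisy = mix_at_snr(speech, noise, -3.0)
```

Because only the noise is rescaled, the speech component is left untouched, so recognition accuracy at each SNR point reflects the noise level alone.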

16 / 25

SLIDE 17

Results: Affected signals, 2-class (OpenEAR)

[Figure: Accuracy (%) vs. SNR (dB), 2-class task, panels for the Berlin and enterface05 databases; curves for Original, Noisy Cafeteria, Noisy Street, KLT Cafeteria, KLT Street, LogMMSE Cafeteria, and LogMMSE Street conditions.]

17 / 25

SLIDE 18

Results: Affected signals, M-class (OpenEAR)

[Figure: Accuracy (%) vs. SNR (dB), multi-class task, panels for the Berlin and enterface05 databases; curves for Original, Noisy Cafeteria, Noisy Street, KLT Cafeteria, KLT Street, LogMMSE Cafeteria, and LogMMSE Street conditions.]

18 / 25

SLIDE 19

Databases

Database                   # recordings  Speakers  Fs (Hz)  Naturalness
Berlin                     534           12        16000    Acted
enterface05 (audio-video)  1317          44        44100    Evoked

Segments  Class. task  enterface05 logMMSE  Difference
OpenEAR   multi-class  66.9 ± 4.2           +0.3
          2-class      68.8 ± 3.1           +0.2

19 / 25

SLIDE 20

Results: Affected signals, 2-class (WPT)

[Figure: Accuracy (%) vs. SNR (dB), 2-class task with the WPT features, panels for the Berlin and enterface05 databases; curves for Original, Noisy Cafeteria, Noisy Street, KLT Cafeteria, KLT Street, LogMMSE Cafeteria, and LogMMSE Street conditions.]

20 / 25


SLIDE 22

Conclusion I

1. A different feature-extraction scheme based on the WPT is presented; it highlights the low-frequency zone of the speech signal. Its performance is acceptable for the 2-class problem when compared with a well-established scheme such as OpenEAR.

2. The use of the WPT in low-frequency bands must be evaluated more deeply in order to improve performance on the multi-class problem.

3. Other features calculated from the wavelet decompositions must be considered, especially for unvoiced segments.

22 / 25

SLIDE 23

Conclusion II

4. The new methodology seems to be more robust against non-controlled conditions. Although the logMMSE algorithm outperforms KLT, its speech-enhancement performance is still not good enough. The degradation produced by cafeteria babble noise is more critical than that produced by street noise.

5. Evaluation of non-additive environmental noise must be addressed in the future.

23 / 25

SLIDE 24

Questions

Thanks! Q?

jesus.vargas@udea.edu.co

24 / 25
