Nonlinear Aspects of Speech Production: Modulations and Energy Operators - PowerPoint PPT Presentation

SLIDE 1

Computer Vision, Speech Communication & Signal Processing Group, Intelligent Robotics and Automation Laboratory

National Technical University of Athens, Greece (NTUA)

Robot Perception and Interaction Unit,

Athena Research and Innovation Center (Athena RIC)

Nonlinear Aspects of Speech Production: Modulations and Energy Operators

Petros Maragos


Summer School on Speech Signal Processing (S4P) DA-IICT, Gandhinagar, India, 9-11 Sept. 2018

SLIDE 2

Outline

 Nonlinear Speech Processing
 Modulations
 Energy Operators
 AM-FM Speech Model, Demodulation Algorithms
 Applications to Speech Recognition
 Applications to Music Recognition
 Application to Audio Summarization
 Application to Distant Speech Recognition
 Applications of Spatio-Temporal Modulations to Image and Video Processing

SLIDE 3

Linear Acoustics Approximation

Physics of speech airflow → Linear models of speech production

SLIDE 4

Physics of Speech Airflow

  • airflow variables: $\rho$ = air density; $p$ = pressure; $\mathbf{u}$ = 3D air particle velocity

  • governing equations:

    mass conservation (continuity eqn): $\dfrac{\partial \rho}{\partial t} + \nabla\cdot(\rho\,\mathbf{u}) = 0$

    momentum conservation (Navier-Stokes eqn): $\rho\left(\dfrac{\partial \mathbf{u}}{\partial t} + (\mathbf{u}\cdot\nabla)\mathbf{u}\right) = -\nabla p + \rho\,\mathbf{g} + \mu\nabla^2\mathbf{u} + \dfrac{\mu}{3}\nabla(\nabla\cdot\mathbf{u})$

    state equation: $p\,\rho^{-1.4} = \text{const.}$

  • time-varying boundary conditions
SLIDE 5

Nonlinear Speech Processing

  • Modulations
  • Turbulence

– Fractals
– Chaos

SLIDE 6

Evidence for Speech Modulations

  • separated & unstable airflow
  • vortices
  • oscillators with time-varying elements
  • energy pulses (Teager)
SLIDE 7

Time-varying Oscillators → AM-FM

Simple second-order oscillators with time-varying elements produce modulations:

  • If mass or compliance is time-varying → FM
    [Van der Pol, Proc. IRE 1930]

  • If damping is time-varying → AM
    [Van der Pol, IEE J. London 1946]

SLIDE 8

AM-FM Speech Model, Energy Demodulation Algorithms

SLIDE 9

AM-FM Speech Modulation Model

[ Maragos, Kaiser & Quatieri, IEEE T-SP Oct.1993 ]

  • One single resonance as damped AM-FM:
    $s(t) = A\,e^{-\sigma t}\cos\!\big(\omega_c t + q(t)\big)$
    with instantaneous frequency $\omega(t) = 2\pi f(t) = \dot\phi(t) = \omega_c + \dot q(t)$
  • If due to a 2nd-order LTI system: $A(t)$, $\omega(t) = \omega_c$ constant
  • Speech signal as multi-component AM-FM:
    $\mathrm{Speech}(t) = \sum_k a_k(t)\cos\big(\phi_k(t)\big)$

SLIDE 10

AM-FM Demodulation Problem

Given $x(t) = a(t)\cos(\phi(t))$, estimate $a(t)$, $\phi(t)$.

  • Variational approach
  • Hilbert Transform: $\hat x(t) = x(t) * \dfrac{1}{\pi t}$, analytic signal $x(t) + j\,\hat x(t)$;
    $a = \sqrt{x^2 + \hat x^2}, \qquad \omega = \dfrac{d}{dt}\arctan\!\left(\dfrac{\hat x}{x}\right)$
  • Energy Operators

SLIDE 11

Energy Tracking in Oscillators

  • harmonic oscillator: mass $m$, spring constant $k$, displacement $x(t)$
  • motion equation: $m\ddot x + kx = 0$
  • response: $x(t) = A\cos(\omega t + \theta), \quad \omega = \sqrt{k/m}$
  • energy: $E = \tfrac{1}{2}m\dot x^2 + \tfrac{1}{2}k x^2 = \tfrac{1}{2}m\,\omega^2 A^2 = \text{constant}$
  • energy tracking: $\Psi(x) \equiv \dot x^2 - x\,\ddot x = \omega^2 A^2 = 2E/m$

SLIDE 12

1D Energy Operators

(Teager; Kaiser, ICASSP 1990)

  • Continuous-time signals $x(t)$:
    $\Psi_c[x(t)] = \dot x^2(t) - x(t)\,\ddot x(t)$
    property: $\Psi_c\big[A e^{rt}\cos(\omega_c t + \theta)\big] = A^2 e^{2rt}\,\omega_c^2$

  • Discrete-time signals $x(n)$:
    $\Psi_d[x(n)] = x^2(n) - x(n-1)\,x(n+1)$
    property: $\Psi_d\big[A\,r^n \cos(\Omega_c n + \theta)\big] = A^2 r^{2n}\sin^2(\Omega_c)$

  • Discretize derivatives [Maragos, Kaiser & Quatieri, T-SP Apr. 1993]

  • Special case of quadratic operators [Atlas & Fang, T-SP 1995]
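The discrete operator above is two multiplies and a subtraction per sample, which makes it trivial to vectorize. A minimal NumPy sketch, verifying the stated property $\Psi_d[A\cos(\Omega_c n + \theta)] = A^2\sin^2(\Omega_c)$ on a pure cosine:

```python
import numpy as np

def teager_kaiser(x):
    """Discrete Teager-Kaiser energy: Psi[x](n) = x(n)^2 - x(n-1)*x(n+1)."""
    x = np.asarray(x, dtype=float)
    psi = np.empty_like(x)
    psi[1:-1] = x[1:-1]**2 - x[:-2] * x[2:]
    psi[0], psi[-1] = psi[1], psi[-2]   # replicate at the boundaries
    return psi

# For x(n) = A cos(Omega*n + theta), Psi equals A^2 sin^2(Omega) exactly.
A, Omega = 2.0, 0.3
n = np.arange(400)
x = A * np.cos(Omega * n + 0.5)
psi = teager_kaiser(x)
print(np.allclose(psi[1:-1], A**2 * np.sin(Omega)**2))  # True
```

The identity holds exactly (not just approximately) for a constant-amplitude, constant-frequency discrete cosine, which is what makes the operator attractive for instantaneous energy tracking.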

SLIDE 13

Energy Separation Algorithm (ESA)

(Maragos, Kaiser & Quatieri, IEEE T-SP Oct. 1993)

  • Cosine: $x(t) = A\cos(\omega_c t + \theta)$:
    $\Psi[x(t)] = A^2\omega_c^2, \qquad \Psi[\dot x(t)] = A^2\omega_c^4$

  • AM-FM signal: $x(t) = a(t)\cos\!\big(\textstyle\int_0^t \omega(\tau)\,d\tau\big)$, assuming $a(t)$, $\omega(t)$ do not vary too fast or too much w.r.t. the carrier:
    $\Psi[x(t)] \approx a^2(t)\,\omega^2(t), \qquad \Psi[\dot x(t)] \approx a^2(t)\,\omega^4(t)$

  • ESA estimates:
    $\omega(t) \approx \sqrt{\dfrac{\Psi[\dot x(t)]}{\Psi[x(t)]}}, \qquad |a(t)| \approx \dfrac{\Psi[x(t)]}{\sqrt{\Psi[\dot x(t)]}}$

SLIDE 14

Discrete ESA (DESA-2)

  • AM-FM signal: $x[n] = a[n]\cos\!\big(\textstyle\int_0^{n}\Omega(m)\,dm\big)$

  • Energy tracking:
    $\Psi\big(x[n]\big) \approx a^2[n]\,\sin^2\!\Omega[n]$
    $\Psi\big(x[n+1]-x[n-1]\big) \approx 4\,a^2[n]\,\sin^4\!\Omega[n]$

  • DESA-2:
    $\Omega[n] \approx \arcsin\!\sqrt{\dfrac{\Psi\big(x[n+1]-x[n-1]\big)}{4\,\Psi\big(x[n]\big)}}, \qquad |a[n]| \approx \dfrac{2\,\Psi\big(x[n]\big)}{\sqrt{\Psi\big(x[n+1]-x[n-1]\big)}}$
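The two DESA-2 quotients can be sketched directly in NumPy; the only subtlety is aligning the energy of the symmetric difference $x[n+1]-x[n-1]$ with the energy of $x[n]$ (each Teager-Kaiser evaluation trims one sample at each end):

```python
import numpy as np

def desa2(x):
    """DESA-2 demodulation sketch: estimate instantaneous amplitude and
    frequency from Teager-Kaiser energies of x[n] and x[n+1]-x[n-1]."""
    x = np.asarray(x, dtype=float)
    psi = lambda s: s[1:-1]**2 - s[:-2] * s[2:]        # TKEO, valid samples only
    y = x[2:] - x[:-2]                                  # symmetric difference
    px = psi(x)[1:-1]                                   # align with psi(y)
    py = psi(y)
    omega = np.arcsin(np.sqrt(np.clip(py / (4 * px), 0.0, 1.0)))
    amp = 2 * px / np.sqrt(py)
    return amp, omega

# Sanity check on a pure cosine: estimates should recover A and Omega.
A, Omega = 1.5, 0.4
n = np.arange(500)
amp, om = desa2(A * np.cos(Omega * n))
print(np.allclose(amp, A), np.allclose(om, Omega))
```

For a constant-amplitude cosine the estimates are exact; for genuine AM-FM signals they hold approximately under the slow-modulation assumption stated above.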

SLIDE 15

ESA Applied to Synthetic AM-FM

[Figure: ESA on a synthetic AM-FM signal. Panels vs. sample index (0-400): the AM-FM signal, its square-root Teager energy, the estimated amplitude envelope, the estimated instantaneous frequency (/π), and the amplitude and frequency estimation errors, which stay on the order of 10⁻³ and 10⁻⁴π respectively.]

SLIDE 16

ESA Applied to Speech Resonance

[Figure: ESA applied to a bandpass-filtered speech resonance. Panels: the speech signal and its spectrum (dB, 1-6 kHz), the bandpass speech signal, its square-root Teager energy, the estimated amplitude envelope, and the estimated instantaneous frequency (about 2800-3800 Hz), all vs. time (msec).]

SLIDE 17

ESA in Noise and BP Filtering

(Bovik, Maragos & Quatieri, IEEE T-SP Dec. 1993)

  • Noisy AM-FM signal: $x(t) = \underbrace{a(t)\cos\!\big(\textstyle\int_0^t \omega(\tau)\,d\tau\big)}_{\text{signal}} + n(t)$
  • Noise: wide-sense stationary, Gaussian, zero-mean, power spectrum $N(\xi)$
  • Bandpass filter: $y(t) = x(t)*g(t)$, frequency response $G(\xi)$
  • Passband SNR: $\mathrm{SNR}(t) = a^2(t)\Big/ 2\!\int_{\text{passband}} N(\xi)\,d\xi$
  • ESA amplitude/frequency estimates $\hat a(t),\ \hat\omega(t)$: normalized mean-square errors (approximately)
    $E\big[(\hat\omega(t)-\omega(t))^2/\omega^2(t)\big] \approx \dfrac{1}{4\,\mathrm{SNR}(t)}\Big[1 + \dfrac{2}{\mathrm{SNR}(t)}\Big]$
    $E\big[(\hat a(t)-a(t))^2/a^2(t)\big] \approx \dfrac{1}{4\,\mathrm{SNR}(t)}\Big[1 + \dfrac{10}{\mathrm{SNR}(t)} + \dfrac{2}{\mathrm{SNR}^2(t)}\Big]$
    i.e., both error variances decay with increasing passband SNR.

SLIDE 18

Multiband Demodulation and F/B Tracking

[Diagram: Multiband Demodulation. The speech signal is split by bandpass filters with center frequencies $f_1, f_2, \ldots, f_N$; each band signal $x(t, f_k)$ is demodulated by the ESA into an instantaneous amplitude $a(t, f_k)$ and instantaneous frequency $f(t, f_k)$; from these, formant frequency tracks $F(t,f)$ and bandwidth tracks $B(t,f)$ are computed.]

[ A. Potamianos & P. Maragos, JASA 1996 ]

SLIDE 19

Frequency and Bandwidth Estimates

  • Center frequency estimates:
    $F_w = \dfrac{\int_T f(t)\,a^2(t)\,dt}{\int_T a^2(t)\,dt}, \qquad F_u = \dfrac{1}{T}\int_T f(t)\,dt$

  • Bandwidth estimates:
    $B_w^2 = \dfrac{\int_T \big[(\dot a(t)/2\pi)^2 + (f(t)-F_w)^2\,a^2(t)\big]\,dt}{\int_T a^2(t)\,dt}, \qquad B_u^2 = \dfrac{1}{T}\int_T \big(f(t)-F_u\big)^2\,dt$
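Given the per-band instantaneous amplitude and frequency tracks from the multiband ESA, the four short-time estimates above reduce to weighted and unweighted moments over a frame. A discrete sketch (the function name and the uniform-sampling assumption are illustrative, not from the source):

```python
import numpy as np

def freq_bandwidth_estimates(a, f, fs):
    """Short-time weighted/unweighted frequency and bandwidth estimates from
    instantaneous amplitude a[n] and frequency f[n] (Hz) over one frame,
    assuming uniform sampling at fs so the dt factors cancel in the ratios."""
    a = np.asarray(a, float); f = np.asarray(f, float)
    a2 = a**2
    Fw = np.sum(f * a2) / np.sum(a2)                  # amplitude-weighted mean freq
    da = np.gradient(a) * fs                          # approximate da/dt
    Bw = np.sqrt(np.sum((da / (2 * np.pi))**2 + (f - Fw)**2 * a2) / np.sum(a2))
    Fu = np.mean(f)                                   # unweighted mean frequency
    Bu = np.sqrt(np.mean((f - Fu)**2))                # unweighted bandwidth
    return Fw, Bw, Fu, Bu

# Constant-amplitude, constant-frequency frame: both means are f0, bandwidths 0.
Fw, Bw, Fu, Bu = freq_bandwidth_estimates(np.ones(100), np.full(100, 500.0), 8000)
print(round(Fw), round(Fu), round(Bw, 6), round(Bu, 6))  # 500 500 0.0 0.0
```

The weighted forms downweight frames where the amplitude (and hence the ESA frequency estimate's reliability) is low, which is why they are preferred for formant tracking.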

SLIDE 20

Speech Pyknogram

[ A. Potamianos & P. Maragos, JASA 1996 ]

SLIDE 21

Smooth Energy Operators and Tracking

  • Teager-Kaiser Energy Operator (TKEO): $\Psi(x) = \dot x^2 - x\,\ddot x$
  • AM-FM signals: $\Psi\big[a(t)\cos(\phi(t))\big] \approx a^2(t)\,\dot\phi^2(t)$
  • Regularized or Gabor TKEO: $\Psi_g(x) = \Psi(x * g)$, where $g(t)$ is the Gabor filter's impulse response
  • Wideband signals (sums of non-stationary sinusoids): simultaneous narrowband component separation, energy tracking, and denoising
  • 2D Gabor TKEO for images

Refs: Dimitriadis & Maragos, Speech Com 2006; Kokkinos, Evangelopoulos & Maragos, T-PAMI 2009.

SLIDE 22

1/f Speech Modulation Model

  • Model a resonance of a random speech phoneme as a phase-modulated 1/f signal:
    $S(t) = A\cos\big(\omega_c t + P(t)\big)$
  • Nonlinear phase signal $P(t)$ modeled as a 1/f random process.
  • Useful model for the broad resonances often observed in fricative voiced or unvoiced sounds, probably caused by nonlinear phenomena during speech production.

[ Dimakis & Maragos, IEEE T-SP 2005 ]

SLIDE 23

Other Works in AM-FM and/or Energy Operators

 Higher-Order EO [PM & A. Potamianos, IEEE SPL 1995]; Iterative ESA [H. Hanson, PM & A. Potamianos, T-SAP 1994]; Speech Emotion Classification [Chaspari et al., EUSIPCO 2014]
 Energy Demodulation of Multi-component AM-FM [B. Santhanam & PM, IEEE SPL 1996; T-COM 2000]; ED for Large Frequency Deviations & Wideband Signals [Santhanam, SPL 2004]
 Kumaresan & Rao [JASA 1999]: Envelope and positive IF estimation (pole-zero modeling of analytic signal), speech applications
 P. Doerschuk, S. Lu, W. C. Pai [T-SP 1996, T-SP 2000]: AM-FM model, Kalman filtering
 T. Quatieri et al.: AM-FM auditory separation [T-SAP 1997]; FM-AM transduction [T-SP 1999]
 J. Hansen et al.: Vocal fold pathology [T-BE 1998]; nonlinear features, speech classification under stress [T-SAP 2001], [Springer 2007 LNAI 4343]
 H. Patil et al.: TEO-MFCC, voice biometrics [ICONIP 2004, PReMI 2007, ICASSP 2010]; spoofed speech detection [Interspeech 2018]
 A. Boudraa et al. [JOSA 2007, JASA 2008]: Cross-TEO, generalized HOEO
 Y. Stylianou et al. [T-ASLP 2011]: AM-FM decomposition, sinusoidal model
 L. Atlas et al.: Quadratic energy operators, modulation spectrum [JASP 2003]
 N. Huang et al. [Proc. R. Soc. Lond. A 1998]: EMD - Hilbert spectrum
 Monogenic signal (2D generalized analytic signal, Riesz transform) [Felsberg & Sommer, T-SP 2001]

SLIDE 24

Applications of AM-FM Modulations and Energy Operators in Speech Recognition

SLIDE 25

Properties Related to the Time Duration of the Energy Averaging Window

  • D. Dimitriadis, A. Potamianos, and P. Maragos, "A Comparison of the Squared Energy and Teager-Kaiser Operators for Short-Time Energy Estimation in Noise," IEEE Transactions on Signal Processing, July 2009.

SLIDE 26

Signal and Noise Models

  • Clean AM-FM signal: $x(t) = a(t)\cos\big(\phi_x(t)\big)$
  • Noise, sinusoidal approximation: $n(t) = \sum_{i=1}^{K} b_i \cos(\omega_i t + \theta_i)$
  • Teager-Kaiser energy of the noise: the per-component energies $\sum_{i} b_i^2\,\omega_i^2$ plus cross terms $b_i b_j$ oscillating at the sum and difference frequencies $\omega_i \pm \omega_j$

(Refs: Deng, Droppo & Acero, IEEE T-SAP 2004; Seltzer, Droppo & Acero, Eurospeech 2003)

SLIDE 27

Noisy Signal Energy Estimation

  • Teager-Kaiser energy:
    $\Psi\big[x(t)+n(t)\big] = \Psi[x(t)] + \Psi[n(t)] + \underbrace{2\,\dot x(t)\dot n(t) - x(t)\ddot n(t) - n(t)\ddot x(t)}_{\text{cross terms}}$

  • Squared-amplitude energy:
    $S\big[x(t)+n(t)\big] = x^2(t) + n^2(t) + \underbrace{2\,x(t)\,n(t)}_{\text{cross terms}}$

SLIDE 28

Normalized Energy Deviations in Steady State

  • Teager-Kaiser energy normalized deviation ($p = 1$) and squared-amplitude energy normalized deviation ($p = 0$), averaged over the window $T$:
    $D_p \approx \dfrac{\sum_{i=1}^{K} b_i^2\,\omega_i^{2p}}{a^2(t)\,\omega_x^{2p}(t)}$

$T$ is the length of the averaging time window; the estimate is steady-state (long-term) when $T > 50$-$100$ msec.

SLIDE 29

Energy Deviation Terms

Deviation = Steady state (long-term) + Lowpass transient (medium-term) + Highpass transient (short-term)

SLIDE 30

Experiments with Sinusoids

  • Signal is a sinusoid at 100, 150, 200 Hz (constant amplitude, random phase)
  • Noise is white Gaussian band-passed in [100--200] Hz
  • Log RMS normalized energy deviation shown for SEO (red) and TEO (blue)
  • x-axis is duration of averaging window (short-, medium-, long-term)

[Figure: log RMS normalized energy deviation vs. analysis window length (5-500 ms) for sines at 100, 150, and 200 Hz; one panel per sine, each showing the D_TEO and D_SEO curves.]

SLIDE 31

Main Results

  • TEO is always better than SEO for short averaging windows; this matters even more for the low-frequency filters.
  • For long- and mid-term averaging, TEO is better than SEO when the spectral content of the noise lies at lower frequencies than the signal's.

SLIDE 32

Applying Energy Operators to Signal Derivatives

  • $\upsilon$-th order signal derivatives: $x^{(\upsilon)}(t) \approx a(t)\,\omega_x^{\upsilon}(t)\cos\!\big(\phi_x(t) + \upsilon\pi/2\big)$

  • Teager-Kaiser energy deviation:
    $D_T \approx \dfrac{\sum_{i=1}^{K} b_i^2\,\omega_i^{2(\upsilon+1)}}{a^2(t)\,\omega_x^{2(\upsilon+1)}(t)}$

  • Squared-amplitude energy deviation:
    $D_S \approx \dfrac{\sum_{i=1}^{K} b_i^2\,\omega_i^{2\upsilon}}{a^2(t)\,\omega_x^{2\upsilon}(t)}$

SLIDE 33

Results on Noisy Speech Signals (Short/Medium-term, T = 30 ms)

  • Signals are 1000 instances of /aa/ and /sh/ from the TIMIT database, plus noise
  • Noise is babble (left) or white (right); average global SNR = 5 dB
  • Mean log-distortion difference as a function of frequency: when < 0, TEO is better
SLIDE 34

Main Results

  • In general (for the discrete TEO):
  • TEO better than SEO for the first few filters (short/mid-term averaging)
  • TEO better than SEO for fricative sounds
  • TEO better than SEO for lowpass noise
  • SEO better than TEO for the last few filters (for the discrete approximation of TEO)
SLIDE 35

Feature Extraction

  • D. Dimitriadis, J. Segura, L. Garcia, A. Potamianos, P. Maragos and V. Pitsikalis, "Advanced front-end for robust speech recognition in extremely adverse environments", Proc. Interspeech 2007.
  • D. Dimitriadis, P. Maragos and A. Potamianos, "Robust AM-FM Features for Speech Recognition", IEEE Signal Processing Letters, 2005.
  • D. Dimitriadis, P. Maragos and A. Potamianos, "Auditory Teager Energy Cepstrum Coefficients for Robust Speech Recognition", Proc. Interspeech 2005.

SLIDE 36

Energy and Modulation Features

  • Energy-Related Features – TECC
  • Inst. Frequency-Related Feature Sets
  • IF-Mean, IF-Var
  • FMP
  • Inst. Amplitude-Related Features
  • IA-Mean, IA-Var
  • BandW-Mean, BandW-Var
  • ΔBandW-Mean, ΔBandW-Var
SLIDE 37

Advanced Front-End

[Diagram: speech signals $s_i(t)$ pass through M-array processing and three parallel feature branches: Modulations-Energy (multiband filtering, nonlinear processing, demodulation → IA-Mean, IF-Mean, FMP, TECC), Dynamics-Fractals (embedding, geometrical filtering, fractal dimensions → FDCD, MFD), and Visual (face detection/tracking, active appearance model, mouth R.O.I. features); followed by VAD (LTSD/LTED flags), speaker normalization, feature transformation/selection, and fusion with the MFCC feature stream.]

SLIDE 38

Modulation - Teager-Energy Acoustic Features (Overview)

[Diagram: speech $s(t)$ → regularization + multiband filtering → bandpass components $s_n(t)$ → nonlinear processing with the TKEO, $\Psi[x(t)] = \dot x^2(t) - x(t)\,\ddot x(t)$ → band energies $E_n(t)$ → demodulation → $A_n(t), F_n(t)$ → statistical processing and V.A.D. → robust feature transformation/selection.]

Energy features: Teager Energy Cepstrum Coefficients (TECC). AM-FM modulation features: Mean Inst. Amplitude (IA-Mean), Mean Inst. Frequency (IF-Mean), Frequency Modulation Percentage (FMP).

SLIDE 39

Filterbank Design (I)

SLIDE 40

Mel-spaced Gabor Filterbank for Feature Extraction

(filters are normalized to have constant energy)

Filterbank Design (II)

SLIDE 41

Teager-Energy Cepstral Coefficients (TECC)

  • TECC extraction algorithm:
  • Filter speech through the filterbank: $X(\omega)\,G_j(\omega)$
  • Estimate the mean energy per band: $E_j = \int_{B_j} |X(\omega)|^2\,|G_j(\omega)|^2\,d\omega$
  • Take the log of the mean energies: $\log E_j$
  • Truncate the cepstrum: $\mathrm{DCT}\big(\log E\big)$
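The four steps above have the same shape as MFCC extraction, with the bandpass filterbank and the Teager-Kaiser energy in place of the mel triangles and squared amplitude. A minimal sketch for one frame (the bandwidth constant, filter length, and filter count are illustrative assumptions, not the tuned values from the papers):

```python
import numpy as np

def tecc_frame(frame, fs, n_filters=12, n_ceps=8):
    """Sketch of TECC extraction for one frame: mel-spaced Gabor filtering ->
    Teager-Kaiser energy -> time average -> log -> truncated DCT."""
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    imel = lambda m: 700 * (10**(m / 2595) - 1)
    centers = imel(np.linspace(mel(100), mel(0.45 * fs), n_filters))
    taps = np.arange(-50, 51) / fs
    log_energies = []
    for fc in centers:
        # Gabor bandpass filter applied in the time domain (illustrative bandwidth)
        g = np.exp(-(0.5 * fc * taps)**2) * np.cos(2 * np.pi * fc * taps)
        y = np.convolve(frame, g, mode='same')
        psi = y[1:-1]**2 - y[:-2] * y[2:]          # Teager-Kaiser energy
        log_energies.append(np.log(np.mean(np.abs(psi)) + 1e-12))
    E = np.array(log_energies)
    # DCT-II of the log band energies, truncated to n_ceps coefficients
    k = np.arange(n_filters)
    C = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * k + 1) / (2 * n_filters))
    return C @ E

fs = 8000
t = np.arange(0, 0.03, 1 / fs)                      # 30 ms frame
cc = tecc_frame(np.cos(2 * np.pi * 500 * t), fs)
print(cc.shape)  # (8,)
```

Swapping `np.abs(psi)` for `y**2` turns this into the squared-amplitude (SEO) variant, which is the comparison studied in the slides above.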

SLIDE 42

FM-Based Feature Extraction

  • Weighted mean instantaneous frequency and bandwidth estimates:
    $F_w = \dfrac{\int_T f(t)\,a^2(t)\,dt}{\int_T a^2(t)\,dt}, \qquad B_w^2 = \dfrac{\int_T \big[(\dot a(t)/2\pi)^2 + (f(t)-F_w)^2\,a^2(t)\big]\,dt}{\int_T a^2(t)\,dt}$

  • Unweighted mean instantaneous frequency and bandwidth estimates:
    $F_u = \dfrac{1}{T}\int_T f(t)\,dt, \qquad B_u^2 = \dfrac{1}{T}\int_T \big(f(t)-F_u\big)^2\,dt$

SLIDE 43

FM-Based Feature Extraction: FMPs and IFMs

  • FMP features: Frequency Modulation Percentages, $\mathrm{Coeff}_i = B_{w,i} / F_{w,i}$ per filter $i$.
  • IFM features: Instantaneous Frequency Mean values, $\mathrm{Coeff}_i = F_{w,i}$.
  • Concatenated as a 2nd data stream to MFCCs or TECCs.

SLIDE 44

Feature Combination

  • Hybrid feature vector with separate streams for cepstral and modulation features:
  • Stream s1 (cepstral, 1st order, 39 coefficients): MFCCs, PLPs, TECCs, ...
  • Stream s2 (modulation, 2nd order, 18 coefficients): any of IA-Mean, IA-Var, IF-Mean, IF-Var, FMP, ...
  • First and second time derivatives are also included

SLIDE 45

Aurora-3 Spanish: Spectral “Fingerprint”

Quiet: 12 dB Low noise: 9 dB High noise: 5 dB

SLIDE 46

HAFE and ETSI AFE Comparison in Noisy Conditions (Additive Noise)

Word Accuracy for the HIWIRE DB:

| Front-end | Clean | 10 dB SNR | 5 dB SNR | Average | % error reduct. over baseline | % error reduct. over ETSI AFE |
| BASELINE | 91.4% | 46.5% | 24.7% | 54.2% | 0.0% | - |
| ETSI AFE standard | 89.0% | 71.1% | 58.0% | 72.7% | 40.4% | 0.0% |
| HAFE | 93.9% | 81.1% | 61.8% | 78.9% | 54.0% | 22.8% |

Word Accuracy for the AURORA 3 Spanish Task:

| Front-end | WM | HM | Average | % error reduct. over baseline | % error reduct. over ETSI AFE |
| BASELINE | 93.7% | 65.2% | 79.5% | 0.0% | - |
| ETSI AFE standard | 96.6% | 90.8% | 93.7% | 69.3% | 0.0% |
| HAFE | 97.4% | 92.7% | 95.1% | 75.9% | 21.4% |

HAFE = TECC & FMP & CMS & Wiener & FD & PEQ

  • D. Dimitriadis, J. C. Segura, L. Garcia, A. Potamianos, P. Maragos, and V. Pitsikalis, "Advanced front-end for robust speech recognition in extremely adverse environments," Proc. Interspeech 2007.

SLIDE 47

Aurora 3 - Spanish Task

 Connected digits, Fs: 8 kHz
 2 feature vectors:
  • MFCC or TECC + C0
  • FMP (modulation features) or MFD (fractal features)
 Plus Wiener Filtering (WF), Cepstral Mean Subtraction (CMS), Parameter Equalization (PEQ), regression coefficients, Frame Dropping (FD)
 PEQ statistics calculated only on high-noise data
 All-pair, unweighted grammar (or word-pair grammar)
 Performance criterion: word (digit) accuracy rates

Aurora-3, Spanish Task: Correct Word Accuracies (%)

| Features | WM | MM | HM |
| MFCC+c0+D+DD+CMS (Baseline - HTK) | 93.68 | 92.73 | 65.18 |
| MFCC (HAFE) | 96.93 | 92.98 | 91.46 |
| TECC (HAFE) | 96.90 | 92.56 | 91.82 |
| TECC+FMP (HAFE) | 97.39 | 93.64 | 92.72 |
| MFCC+MFD (HAFE) | 96.96 | 92.67 | 92.42 |

SLIDE 48

Investigating Filterbank Configurations and Energy Computations

  • D. Dimitriadis, P. Maragos, and A. Potamianos, “On the Effects of Filterbank Design and

Energy Computation on Robust Speech Recognition”, IEEE Transactions on Audio, Speech and Language Processing, Aug. 2011.

SLIDE 49

Energy Deviation (on TIMIT + Noise)

Mel-spaced Gammatone filterbanks with 50% overlap (Top: 25, Bottom: 100 filters)

SLIDE 50

Cepstral Coefficient Deviations

 Energy estimation deviations propagate to cepstral coefficient deviations:
  $\Delta C(i) = \sum_{j=1}^{J} W_{ij}\,\log\big(1 + D_j\big)$
  where $i$ is the cepstral coefficient index, $j$ the energy coefficient index, $W_{ij}$ the DCT coefficients, and $D_j$ the energy deviation at filter $j$.

(Refs: Deng et al., T-SAP 2004; Moreno, CMU 1996; Raj, Gouvea, Moreno & Stern, ICSLP 1996)

SLIDE 51

Word Accuracies for Aurora-3 Spanish Task (High-Mismatch): MTE vs MSE-based Features

SLIDE 52

Dominant Speech Modulations and Audio Summarization

  • A. Zlatintsi, P. Maragos, A. Potamianos and G. Evangelopoulos, “A Saliency Based

Approach to Audio Event Detection and Summarization”, Proc. EUSIPCO 2012.

SLIDE 53

Movie Video Event Detection and Summarization

Video streams

Aural

  • Waveform Modulation

features (energy, amplitude, frequency)

Visual

  • Image spatiotemporal

attention features (color, motion, orientation)

Textual

  • Subtitles transcript
  • Audio segmentation
  • Part-of-speech analysis

Intra-stream fusion

Modality saliencies

  • Normalization
  • Mapping
  • Linear, non-linear

Inter-stream fusion

Multimodal Saliency

  • Normalization
  • Synchronization

Salient event detection

1D curve features

  • Local maxima
  • Salient segments

Video abstraction and summarization

Key-frame selection Salient frame duration

  • Threshold
  • User-defined

Skim rendering

  • Post-processing
  • Overlap-add

[Pipeline: audio, images, and subtitle text → processing and feature extraction → multicue fusion → multimodal fusion → saliency detection]

SLIDE 54

Audio Summarization System Overview

  • Input: audio stream from movies (COGNIMUSE Database)
  • Approach #1: modulation features (energy, amplitude, frequency); learning via KNN classification
  • Approach #2: Teager energies, roughness, loudness → 1D binary curve → salient segments
  • Salient segment selection: thresholding or user-defined (in our case: manual); normalization; dynamic adaptation

[A. Zlatintsi, E. Iosif, P. Maragos and A. Potamianos, "Audio Salient Event Detection And Summarization Using Audio And Text Modalities", EUSIPCO 2015] [P. Koutras, A. Zlatintsi, E. Iosif, A. Katsamanis, P. Maragos and A. Potamianos, "Predicting Audio-visual Salient Events based on A-V-T Modalities For Movie Summarization", ICIP 2015]

SLIDE 55

Audio Analysis I (Feature extraction)

 Audio AM-FM model: $s[n] = \sum_{k=1}^{K} A_k[n]\,\cos\!\big(\textstyle\int_0^n \Omega_k(m)\,dm\big)$
 Modulation bands: $K$ Gabor filters $h_k$ → narrowband components
 Nonlinear energy tracking: Teager-Kaiser energy operator + ESA demodulation
 Dominant modulation features (per frame of $N$ samples):
  $\mathrm{MTE} = \dfrac{1}{N}\sum_{n=1}^{N} \max_{1\le k\le K} \Psi\big[(s*h_k)[n]\big], \qquad i = \arg\max_k \mathrm{MTE}[\,\cdot\,; k]$
  $\mathrm{MIA} = \dfrac{1}{N}\sum_{n=1}^{N} A_i[n], \qquad \mathrm{MIF} = \dfrac{1}{N}\sum_{n=1}^{N} \Omega_i[n]$

  • G. Evangelopoulos and P. Maragos, "Multiband modulation energy tracking for noisy speech detection," IEEE Trans. Audio Speech Language Processing, 2006.
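The dominant-band selection step can be sketched compactly: filter the frame into bands, track the Teager-Kaiser energy in each, and keep the band whose mean energy is largest (the filter bandwidth constant and center frequencies below are illustrative assumptions):

```python
import numpy as np

def dominant_band_features(s, fs, centers):
    """Sketch of dominant modulation-band selection: Gabor-bandpass the frame
    at the given center frequencies (Hz), track the Teager-Kaiser energy per
    band, return the max mean energy (MTE-style) and the dominant band index."""
    taps = np.arange(-100, 101) / fs
    psi = lambda y: y[1:-1]**2 - y[:-2] * y[2:]
    mean_energies = []
    for fc in centers:
        g = np.exp(-(0.4 * fc * taps)**2) * np.cos(2 * np.pi * fc * taps)
        y = np.convolve(s, g, mode='same')
        mean_energies.append(np.mean(np.abs(psi(y))))
    i = int(np.argmax(mean_energies))      # dominant modulation band
    return mean_energies[i], i

fs = 8000
n = np.arange(800)
s = np.cos(2 * np.pi * 1000 / fs * n)      # pure tone at 1 kHz
mte, band = dominant_band_features(s, fs, centers=[250, 500, 1000, 2000])
print(band)  # 2
```

The MIA and MIF features would then come from ESA-demodulating only the winning band, as in the equations above.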

SLIDE 56

Audio Analysis II (Fusion and Saliency)

 Audio saliency cues, extracted through nonlinear operators, convey information on:
  • excitation level
  • frequency content
  • source energy tracking
 3D feature vector formation: $\mathbf{F}_a = [\mathrm{MTE},\ \mathrm{MIA},\ \mathrm{MIF}]$
 Monomodal saliency curve: continuous-valued indicator of salient events, in [0, 1]

  • G. Evangelopoulos, K. Rapantzikos, P. Maragos, Y. Avrithis, and A. Potamianos, "Audiovisual attention modeling and salient event detection," in Multimodal Processing and Interaction: Audio, Video, Text, P. Maragos, A. Potamianos, and P. Gros, Eds., Springer, 2008.

50% saliency-based raw audio summarization

SLIDE 57

Monomodal Fusion I (Event detection)

 Nine fusion schemes (low-level, memoryless): $S_A = \text{fusion}(S_1, S_2, S_3)$

 Linear (equal weights): $S_{\mathrm{LIN}} = w_1 S_1 + w_2 S_2 + w_3 S_3$
 Variance-based (adaptive weights): $S_{\mathrm{VAR}} = \sum_i w_i S_i$ with $w_i \propto 1/\mathrm{var}(S_i)$, or $w_i \propto \big[\log \mathrm{var}(S_i)\big]^{-1}$
 Nonlinear:
  • MIN: $S_{\mathrm{MIN}} = \min\{S_1, S_2, S_3\}$
  • MAX: $S_{\mathrm{MAX}} = \max\{S_1, S_2, S_3\}$
  • Weighted MIN: $S_{\mathrm{MIVA}} = \min\{w_1 S_1, w_2 S_2, w_3 S_3\}\big/\max\{w_1, w_2, w_3\}$

SLIDE 58

Monomodal Fusion II - Normalization

 Normalization intervals

 Global linear normalization (GL)  Scene-based linear normalization (SC)  Shot-based linear normalization (SH)

 Dynamic Adaptation levels

i.e., weight updating with respect to Global or Local windows

Inverse Variance & Weighted Min fusion can be computed at e.g.,  Global level (VA-GL)  Scene level (VA-SC)  Shot level (VA-SH)

[Figure: example saliency curves, LOR with GL-N and GLA with VA-GL-F.]

SLIDE 59

Summarization Algorithm

  • Median-filter the saliency curve (window length 2M + 1)
  • Threshold selection
  • Segment selection
  • Reject segments shorter than N frames (morphological opening)
  • Join segments less than K frames apart (morphological closing)
  • Render: linear overlap-add

[Figure: saliency curve, median filtering, and selected segments for the x5, x3, x2 skimming rates.]
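The reject/join steps are a 1D morphological opening and closing on the thresholded saliency mask; a minimal sketch (structuring-element sizes derived from `n_min`/`k_gap` are illustrative):

```python
import numpy as np

def select_segments(saliency, thresh, n_min, k_gap):
    """Sketch of segment selection: threshold the (median-filtered) saliency
    curve, reject runs shorter than about n_min frames (1D opening), then
    join runs less than about k_gap frames apart (1D closing)."""
    mask = saliency > thresh
    erode = lambda m, k: np.array([m[max(0, i - k):i + k + 1].all() for i in range(len(m))])
    dilate = lambda m, k: np.array([m[max(0, i - k):i + k + 1].any() for i in range(len(m))])
    mask = dilate(erode(mask, n_min // 2), n_min // 2)   # opening: drop short runs
    mask = erode(dilate(mask, k_gap // 2), k_gap // 2)   # closing: bridge short gaps
    return mask

# One isolated spike (rejected) and two runs separated by a 2-frame gap (joined).
s = np.array([0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0], float)
print(select_segments(s, 0.5, n_min=4, k_gap=4).astype(int))
```

With real saliency curves the same two operations give the "reject shorter than N, join closer than K" behavior described above, with N and K controlling the minimum skim-segment length and the tolerated gap.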

SLIDE 60

Objective Evaluation

 Audio from Academy Award-winning movies (MovSum database)
  • ca. 30 min duration segments (on average 13 scenes/movie, 560 shots/movie)
  • GLA, "Gladiator": DreamWorks SKG, 2000
  • CHI, "Chicago": Miramax Films, 2002
  • LOR, "Lord Of the Rings III: The Return of the King": New Line Cinema, 2003
  • CRA, "Crash": Bob Yari Productions, 2005
  • DEP, "Departed": Warner Bros. Pictures, 2006
  • FNE, "Finding Nemo": Walt Disney Pictures, 2003
 Skimming rates: c = 20%, 33%, 50% (x5, x3, x2 real-time summaries)
 Correspondence with manually labelled saliency

[Figure: labeled saliency vs. the x2 'MIVA-SH-F' summary.]

SLIDE 61

Results for System #1

 Results in terms of frame-level precision

[Figure: precision vs. percentage of summarization (global normalization).]

SLIDE 62

Audio Summarizer

 Choose segments that are both salient and meaningful: perform boundary correction
 Reconstruction opening: keep the connected components of the reference X that intersect the marker M
 VAD-like algorithms could provide automatic segmentation

[P. Maragos, "Morphological Filtering for Image Enhancement and Feature Detection," in The Image and Video Processing Handbook, Elsevier Acad. Press, 2005]

SLIDE 63

Demo: Audio Summary Example

 Audio extracted from a documentary
 Duration of original segment: 3 min, including speech (narration), music, and diverse "bang" sounds
 Summary x3: duration 1.02 min
 Boundaries corrected with respect to speech

SLIDE 64

Demo I: Movie Summarization (System #1)

LOR VA-SH-F, rate: x5 (6:50 min from 37:33 min) Inform: 78.7 % Enjoy: 80.9 %

SLIDE 65

AM-FM Modulation Features for Music Analysis & Classification

Refs:

  • A. Zlatintsi and P. Maragos, “Comparison of Different Representations Based on

Nonlinear Features for Music Genre Classification”, Proc. EUSIPCO 2014.

  • A. Zlatintsi and P. Maragos, “AM-FM Modulation Features for Music Instrument Signal

Analysis and Recognition”, Proc. EUSIPCO 2012.

SLIDE 66

Motivation and Methodology

 Existence of modulations in music (e.g., vibrato, tremolo)
 Claims that music is mimetic of nature, human emotions, and properties of certain objects; and that nature contains structures (e.g., mountains, coastlines, the structures of plants) that can be described by fractals
 The methodology's success in speech recognition, musical instrument classification, and audio saliency & event detection
 Parallel evolution of speech and music

SLIDE 67

Experimental Evaluation: Gabor Filterbanks

 Baseline Gabor filterbank
  • 12 bandpass mel-spaced filters
  • bandwidth overlap equal to 50%
 "Music" filterbank
  • center frequencies fc of each filter determined by the frequencies of the music tones
  • 1) 89 filters starting at C2 = 65.4 Hz; 2) 101 filters starting at C1 = 32.7 Hz
  • bandwidth: b1i = [fi-1, fi+1] for center frequency fi

SLIDE 68

Experimental Evaluation: Proposed Features and Feature Representations (FR)

 Feature sets:
  • FR1: Baseline Gabor filterbank; short-time analysis, 30 ms frames with 50% overlap (+ Δs)
  • FR2: "Music" Gabor filterbank; short-time analysis, 30 ms frames with 50% overlap (+ Δs), followed by PCA for dimensionality reduction
 Database for experimentation:
  • GTZAN Database incl. 10 musical genres: blues, classical, country, disco, hiphop, jazz, metal, pop, reggae and rock
  • 1000 excerpts, 100 excerpts/genre, 30 seconds each

SLIDE 69

Experimental Evaluation: Different Genres

Conclusions:
 Best recognition: classical (96%), pop (94.7%), jazz (94%), metal (92.7%)
 Worst recognition: rock, reggae & disco
 Better classification for all genres and almost all proposed feature sets compared to MFCC

SLIDE 70

Experimental Evaluation: Different Instruments

MFCCΔ vs. AMFM39: mean accuracy per instrument (N=5, M=3; w1=1.0, w2=0.5)

Conclusions:
  • Better recognition for all (#12) instruments except bass, saxophone and oboe
  • Better discrimination between bass and tenor trombone, as well as between bass and clarinet

SLIDE 71

Multi-Microphone Energy Tracking for Robust Distant Speech Recognition

References:

  • I. Rodomagoulakis and P. Maragos, “On the Improvement of Modulation Features Using

Multi-Microphone Energy Tracking for Robust Distant Speech Recognition”, Proc. EUSIPCO 2017.

  • I. Rodomagoulakis, G. Potamianos, and P. Maragos, “Advances in Large Vocabulary

Continuous Speech Recognition in Greek: Modeling and Nonlinear Features”, Proc. EUSIPCO 2013.

SLIDE 72

Interspeech 2018 Tutorial: Multimodal Speech & Audio Processing in Audio-Visual Human-Robot Interaction

Distant Speech Recognition in Voice-enabled Interfaces

Challenges: noise, other speech, distant microphones, reverberation

https://dirha.fbk.eu/

SLIDE 73

Near- vs. Far-Field Speech

| F | OH | R | Y | AE | N | IY |

SLIDE 74

Smart Home Voice Interface

"Sweet home, listen! Turn on the lights in the living room!"

 Main technologies:
  • Voice Activity Detection
  • Acoustic Event Detection
  • Speaker Localization
  • Speech Enhancement
  • Keyword Spotting
  • Far-field command recognition

SLIDE 75

DIRHA demo (“spitaki mou”)

  • I. Rodomagoulakis, A. Katsamanis, G. Potamianos, P. Giannoulis, A. Tsiami, P. Maragos, “Room-

localized spoken command recognition in multi-room, multi-microphone environments”, Computer Speech & Language, 2017.

  • A. Tsiami, I. Rodomagoulakis, P. Giannoulis, A. Katsamanis, G. Potamianos and P. Maragos,

“ATHENA: A Greek Multi-Sensory Database for Home Automation Control”, Proc. Interspeech 2014.

https://www.youtube.com/watch?v=zf5wSKv9wKs

SLIDE 76

AM-FM features for Distant Speech Recognition

 Features

  • Mean Instantaneous Amplitudes (MIA)
  • Mean Instantaneous Frequencies (MIF)
  • Frequency Modulation Percentages (FMP)
  • Mean Instantaneous Weighted Frequencies (Fw)

 Single-channel DSR

  • Fusion schemes with MFCC

 Multichannel Multiband Demodulation (MMD)

  • Improved estimation of the instantaneous amplitude and frequency modulations

 Multichannel DSR using MMD  Experiments on challenging multichannel DSR databases

  • Baseline HMM-GMM recognizer
  • Ongoing work on DNN-based recognition
SLIDE 77

Fusion of MIA-MIF with MFCCs (Single-channel DSR)

[Diagram: two feature extractors (FE1: MFCCs, FE2: MIAs-MIFs) feed HMMs; fusion is done early (feature concatenation), intermediate (multi-stream HMMs), or late (lattice / N-best list combination).]

Word Error Rate (%):

| condition | MFCCs | MIAs-MIFs | early | intermediate | late |
| clean | 22.27 | 15.80 | 15.85 | 16.56 | 21.18 |
| reverb1 | 43.27 | 40.86 | 40.08 | 41.11 | 50.44 |
| reverbR | 45.44 | 43.58 | 42.23 | 44.52 | 55.12 |

[ I. Rodomagoulakis, G. Potamianos and P. Maragos, EUSIPCO 2013 ]

SLIDE 78

Multichannel Estimation of Noisy Speech Energy

 Microphone array recordings: clean speech plus noise, $z_n = s_n + v_n$, $n = 1, \ldots, N$ mics
 Bandlimited components: $z_{n,l} = z_n * h_l$, $l = 1, \ldots, L$ frequency bands
 Correlation between recordings from adjacent microphones $n, \ell$:
  • Cross-Teager energy [1]: $\Psi(z_n, z_\ell) = \dot z_n\,\dot z_\ell - \tfrac{1}{2}\big(z_n\,\ddot z_\ell + z_\ell\,\ddot z_n\big)$
 Noise is an additive error on averaging [2]: $\mathcal{E}\{\Psi(z_n, z_\ell)\} = \mathcal{E}\{\Psi(\text{speech})\} + \text{error}$
  • low cross energy → low error
 Tracking the minimum cross energy per band $l$ over adjacent microphone pairs

[1] P. Maragos & A. Potamianos, IEEE SPL 1995. [2] S. Lefkimmiatis, P. Maragos & A. Katsamanis, ICASSP 2008.
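A discrete sketch of the cross-Teager energy (using a symmetric discrete analogue of the continuous form; for two identical channels it must reduce to the ordinary Teager-Kaiser energy, which the example checks):

```python
import numpy as np

def cross_teager(x, y):
    """Cross-Teager-Kaiser energy sketch. Continuous form:
    Psi_c(x, y) = x'y' - (x*y'' + y*x'')/2; here a symmetric discrete
    analogue: Psi_c[n] = x[n]y[n] - (x[n-1]y[n+1] + x[n+1]y[n-1])/2."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return x[1:-1] * y[1:-1] - 0.5 * (x[:-2] * y[2:] + x[2:] * y[:-2])

# For x = y the cross energy reduces to the ordinary Teager-Kaiser energy.
n = np.arange(300)
x = np.cos(0.2 * n)
print(np.allclose(cross_teager(x, x), x[1:-1]**2 - x[:-2] * x[2:]))  # True
```

Applied to bandpass components from two adjacent microphones, a low cross energy flags a band/pair where the common (speech) component dominates, which is the selection criterion used above.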
SLIDE 79

Multichannel, Multiband Demodulation (MMD)

 Per band $l$: select the adjacent-microphone pair with minimum cross-Teager energy.
 Gabor-ESA: the ESA quotients are formed from cross-Teager energies of the Gabor-filtered signals $z_n * h_l$, $z_\ell * h_l$ and of their derivatives.
 Result: improved estimates of the per-band instantaneous amplitude and frequency modulation signals.

[ I. Rodomagoulakis & P. Maragos, EUSIPCO 2017 ]

SLIDE 80

Single- vs Multi-channel Demodulation

TIMIT database (100 examples/phoneme)

Simulations of small & medium room acoustics

  • Image Source Method + white noise ([-15…20] dB)
  • Speaker’s moving in spiral trajectory 3m away from 3-mic

linear array 

Demodulation error in estimating 𝜕 𝑢

  • Ground-truth from 𝑡 𝑢
  • Single-channel estimation from 𝑧 𝑢
  • Multi-channel estimation from 𝑧 𝑢 , 𝑛 1,2,3
  • Average RMS error across bands

Comparison

  • Relative reduction (%) of RMS error, reported per phoneme class: vowels, nasals, plosives, fricatives
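The evaluation metrics above can be sketched as follows; the function names are illustrative, not taken from the cited paper.

```python
import numpy as np

def avg_rms_error(est, ref):
    """Average across bands of the RMS error between estimated and
    ground-truth tracks (arrays shaped bands x time)."""
    est, ref = np.asarray(est, float), np.asarray(ref, float)
    return float(np.mean(np.sqrt(np.mean((est - ref) ** 2, axis=1))))

def relative_reduction(err_single, err_multi):
    """Relative reduction (%) of the multichannel error vs. single-channel."""
    return 100.0 * (err_single - err_multi) / err_single
```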

slide-81
SLIDE 81


DSR Experiments on Simulated and Real Data

  • DIRHA-English corpus
  • Simulations of real-life scenarios of speech-based domestic control
  • Kitchen-livingroom space with 21 condenser microphones arranged in distributed arrays
  • 15 hours of simulated multichannel training material: convolution of studio recordings with the apartment’s RIRs, mixed with typical domestic background noise
  • 1000 utterances each of simulated (dirha-sim) and real (dirha-real) speech
  • Experimental framework
  • BeamformIt tool for state-of-the-art delay-and-sum beamforming
  • MMD: 12 Gabor filters with 70% overlap, minimum cross-energy Ψ̂_ℓ
  • Kaldi baseline HMM-GMM recognizer with LDA, MLLT and fMLLR transformations

slide-82
SLIDE 82


slide-83
SLIDE 83

MODULATIONS FOR IMAGE & VIDEO PROCESSING

slide-84
SLIDE 84

AM-FM Image Modulations and Image Segmentation

Ref: I. Kokkinos, G. Evangelopoulos & P. Maragos, “Texture Analysis & Segmentation Using Modulation Features, Generative Models, and Weighted Curve Evolution”, IEEE T-PAMI, Jan. 2009.
slide-85
SLIDE 85

AM-FM Texture Model

 Locally narrowband image texture (Bovik et al. 1992, Havlicek et al. 2000):
   f(x, y) = a(x, y) cos( φ(x, y) ),   ω(x, y) = ∇φ(x, y),   ω₁ = ∂φ/∂x,  ω₂ = ∂φ/∂y
 Analogies between AM-FM and Y. Meyer’s oscillating functions for texture
 Inst. amplitude & frequency estimation (Maragos & Bovik, JOSA 1995):
  Multiband Gabor filtering
  2D Energy Operator: Ψ(f) = ‖∇f‖² − f ∇²f
  Demodulation via the Energy Separation Algorithm (ESA):
   |ω₁| ≈ sqrt( Ψ(∂f/∂x) / Ψ(f) ),   |ω₂| ≈ sqrt( Ψ(∂f/∂y) / Ψ(f) ),
   |a| ≈ Ψ(f) / sqrt( Ψ(∂f/∂x) + Ψ(∂f/∂y) )
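A minimal numpy sketch of the 2D energy operator and ESA above, using central differences for the spatial derivatives (the slides instead use Gabor derivative filters). Note that central differences estimate sin(ω) rather than ω, so the frequency readings are accurate only for small ω.

```python
import numpy as np

def energy2d(f):
    """2D Teager energy via central differences: Psi(f) = |grad f|^2 - f * lap f."""
    fy, fx = np.gradient(f)          # derivatives along rows (y) and columns (x)
    fyy, _ = np.gradient(fy)
    _, fxx = np.gradient(fx)
    return fx ** 2 + fy ** 2 - f * (fxx + fyy)

def esa2d(f):
    """2D ESA: per-pixel instantaneous amplitude and frequency magnitudes."""
    fy, fx = np.gradient(f)
    pf, pfx, pfy = energy2d(f), energy2d(fx), energy2d(fy)
    w1 = np.sqrt(np.abs(pfx / pf))                  # |omega_1| (column direction)
    w2 = np.sqrt(np.abs(pfy / pf))                  # |omega_2| (row direction)
    amp = pf / np.sqrt(np.abs(pfx + pfy))           # |a(x, y)|
    return amp, w1, w2
```

On a pure 2D cosine a·cos(ω₁x + ω₂y), the discrete energy is the constant a²(sin²ω₁ + sin²ω₂) away from the image borders, and the ESA recovers a, sin ω₁, sin ω₂ exactly.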

slide-86
SLIDE 86

Modulation Features for Texture Analysis

 Channel responses: f_k(x, y) = (I ∗ h_k)(x, y), k = 1, …, K
 Dominant Components Analysis (DCA) chooses at each pixel the most prominent among the K channels by maximizing a criterion Γ_k:
  • Amplitude-DCA: Γ_k(x, y) = |f_k(x, y)| / max_ω |H_k(ω)|
  • Teager Energy-DCA: Γ_k(x, y) = Ψ[f_k](x, y)
  • Dominant channel: j = argmax_{1 ≤ k ≤ K} Γ_k,  then  a(x, y) = a_j(x, y),  ω(x, y) = ω_j(x, y)
 Using a single channel amounts to locally modeling the texture with a Gabor-like ‘texton’ whose characteristics are described by the DCA components.
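An illustrative sketch of Energy-DCA channel selection on a synthetic two-texture image. The Gaussian frequency masks below are a simplified stand-in for the Gabor filterbank, and the texture amplitudes and frequencies are arbitrary choices for the demo, not values from the paper.

```python
import numpy as np

def teager2d(f):
    # 2D Teager energy via central differences
    fy, fx = np.gradient(f)
    fyy, _ = np.gradient(fy)
    _, fxx = np.gradient(fx)
    return fx ** 2 + fy ** 2 - f * (fxx + fyy)

def gabor_like_channel(img, w0, sigma=0.12):
    """Bandpass the image around column-frequency w0 (rad/sample) with an
    even Gaussian frequency mask, a Gabor-like magnitude response."""
    F = np.fft.fft2(img)
    wy = 2 * np.pi * np.fft.fftfreq(img.shape[0])[:, None]
    wx = 2 * np.pi * np.fft.fftfreq(img.shape[1])[None, :]
    H = np.exp(-((np.abs(wx) - w0) ** 2 + wy ** 2) / (2 * sigma ** 2))
    return np.real(np.fft.ifft2(F * H))

# Two-texture image: low-frequency stripes on the left, high-frequency on the right
j = np.arange(64)
row = np.where(j < 32, 2.0 * np.cos(0.3 * j), 1.0 * np.cos(0.9 * j))
img = np.tile(row, (64, 1))

channels = [gabor_like_channel(img, w) for w in (0.3, 0.9)]
energies = np.stack([teager2d(c) for c in channels])   # K x H x W
dominant = np.argmax(energies, axis=0)                 # E-DCA channel index per pixel
```

Away from the texture boundary, the left stripes select the ω ≈ 0.3 channel and the right stripes the ω ≈ 0.9 channel, which is the per-pixel "dominant component" idea.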

slide-87
SLIDE 87

Modulation Feature Extraction Examples

(Figure: synthetic AM-FM image with known modulation parameters; the real amplitude a(x, y) and frequency components ω₁(x, y), ω₂(x, y) are compared against the A-DCA and E-DCA estimates of amplitude and frequency magnitude.)

slide-88
SLIDE 88

Unsupervised Variational Texture Segmentation

 Active Contours without Edges; statistical approach to Snakes
 Functional expressing the segmentation cost (Region Competition), for boundaries C = {C₁, …, C_M} and region models α_i:
   J[C, {α_i}] = Σ_{i=1}^{M} [ (μ/2) ∮_{C_i} ds − ∫_{R_i} log P(I; α_i) dx dy ]
 Euler-Lagrange equations give the curve evolution on the boundary between regions R_i and R_j:
   ∂C/∂t = [ −μκ + log( P(I; α_i) / P(I; α_j) ) ] N
 Level-set implementation & edge-based terms (Geodesic Active Regions):
   ∂C/∂t = (1 − λ)[ g(I) κ N − (∇g · N) N ] + λ log( P(I; α_i) / P(I; α_j) ) N

Refs: Zhu & Yuille, T-PAMI 1996; Paragios & Deriche, IJCV 2002; Vese & Chan, T-IP 2001; Yezzi, Tsai & Willsky, ICCV 1999

slide-89
SLIDE 89

2D Gabor ESA

 2D energy operator with Gabor bandpass filtering: f(x, y) = (I ∗ h)(x, y)
 Gabor Energy Operator: differential operators are replaced by derivatives of the Gabor filter h:
   ∂f/∂x = I ∗ h_x,   ∂f/∂y = I ∗ h_y,   ∇²f = I ∗ (h_xx + h_yy)
   Ψ(f) = ‖∇f‖² − f ∇²f = ‖I ∗ ∇h‖² − (I ∗ h)( I ∗ ∇²h )
 Estimation of inst. amplitude and frequency by ESA
 2D Gabor ESA needs seven Gabor differential formulae (h and its partial derivatives h_x, h_y, h_xx, h_yy, h_xy, …)

slide-90
SLIDE 90

Regularized ESA

 Reduce complexity of applying Gabor ESA to all filters
 Bandpass image: f_k(x, y) = (I ∗ h_k)(x, y)
 Regularized Energy Operator (REO), with G a 2D Gaussian:
   Ψ_G(f_k) = ‖f_k ∗ ∇G‖² − (f_k ∗ G)( f_k ∗ ∇²G )
 REO needs three convolutions of f_k with derivatives of the Gaussian (G_x, G_y, ∇²G)
 Apply Regularized ESA to each channel, e.g. |ω₁| ≈ sqrt( Ψ_G(∂f_k/∂x) / Ψ_G(f_k) )
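A sketch of the regularized energy operator as reconstructed above, applying the Gaussian-derivative convolutions in the frequency domain with the analytic Gaussian response. This assumes periodic boundary handling; a practical implementation would convolve each channel with spatial Gaussian-derivative kernels instead.

```python
import numpy as np

def reo(f, sigma=1.0):
    """Regularized 2D energy operator:
    Psi_G(f) = |f * grad G|^2 - (f * G)(f * lap G),
    with the Gaussian G and its derivatives applied via the FFT."""
    F = np.fft.fft2(f)
    wy = 2 * np.pi * np.fft.fftfreq(f.shape[0])[:, None]
    wx = 2 * np.pi * np.fft.fftfreq(f.shape[1])[None, :]
    G = np.exp(-sigma ** 2 * (wx ** 2 + wy ** 2) / 2.0)  # Gaussian frequency response
    conv = lambda H: np.real(np.fft.ifft2(F * H))
    f_g = conv(G)                          # f * G
    f_gx = conv(1j * wx * G)               # f * G_x
    f_gy = conv(1j * wy * G)               # f * G_y
    f_lap = conv(-(wx ** 2 + wy ** 2) * G) # f * lap(G)
    return f_gx ** 2 + f_gy ** 2 - f_g * f_lap
```

On a pure cosine a·cos(ω₁x + ω₂y) at exact DFT bin frequencies, the output is the constant a²(ω₁² + ω₂²)·exp(−σ²(ω₁² + ω₂²)): the Teager energy of the Gaussian-smoothed oscillation.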

slide-91
SLIDE 91

Model-based Cue Probabilities

(Figure: input intensity image with the derived model-based cue probability maps P(texture) and P(edge).)

slide-92
SLIDE 92

Cue Integration for Region Competition

 How can we introduce the confidence measures into the evolution?
 Modified Region Competition with probability assignments to features F_c, where w_c is the weight of cue c and w_e the edge weight:
   ∂C/∂t = Σ_c w_c log( P(F_c; α_{c,i}) / P(F_c; α_{c,j}) ) N + w_e [ g κ N − (∇g · N) N ]

slide-93
SLIDE 93

Features and Segmentation

Features for segmentation: intensity, amplitude, frequency magnitude, frequency orientation:
  F = [I, a, |ω|]ᵀ  or  F = [I, a, |ω|, ∠ω]ᵀ

Segmentation results & comparisons: baseline RC-GAR with diffusion features vs. RC-GAR and weighted RC-GAR with the modulation features I, a, |ω|, ∠ω.

slide-94
SLIDE 94
  • Unsupervised Segmentation with Weighted Curve Evolution
slide-95
SLIDE 95

Spatio-Temporal Modulations and Video Action Recognition

  • C. Georgakis, P. Maragos, G. Evangelopoulos, and D. Dimitriadis, Proc. ICIP 2012.
slide-96
SLIDE 96

Overview of the DCA3D Detector

slide-97
SLIDE 97

Example

Spatial Energy‐based DCA emphasizes the prominent texture variations and meaningful object boundary information

Figure 2. Spatial Energy‐based DCA on a wideband image of complex structure. (a) Original color image, (b) Bandpass image values from the dominant components, (c) Energy values corresponding to max‐energy dominant channels

slide-98
SLIDE 98

Example

slide-99
SLIDE 99


Conclusions

 AM and FM are fundamental phenomena in sound (speech, music, general audio) and in other oscillatory signals (e.g. image textures, or space-time patterns in videos).
 Energy operators are related to physics, are very simple and fast to compute, have excellent time resolution, and can efficiently demodulate AM-FM signals.
 Applications in speech, music, and image/video processing & recognition.
 Open analytic problems: optimality of EO, variational approaches.

For more information, demos, and current results: http://cvsp.cs.ntua.gr and http://robotics.ntua.gr