Nonlinear Aspects of Speech Production: Fractals and Chaotic Dynamics

Computer Vision, Speech Communication & Signal Processing Group, Intelligent Robotics and Automation Laboratory, National Technical University of Athens, Greece (NTUA)
Robot Perception and Interaction Unit, Athena Research and Innovation Center (Athena RIC)
Nonlinear Aspects of Speech Production: Fractals and Chaotic Dynamics
Petros Maragos
Summer School on Speech Signal Processing (S4P), DA-IICT, Gandhinagar, India, 9-11 Sept. 2018

Outline
• Nonlinear Speech Processing
• Turbulence: Fractals, Chaotic Dynamics
• Multiscale Fractal Dimensions of Speech Sounds
• Fractal Modulations for Fricative Sounds
• Chaotic Dynamics of Speech Sounds
• Algorithms for Speech Fractal & Chaos Analysis
• Application to Speech Recognition
• Application to Music Recognition

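Linear Source-Filter Model
[Figure: block diagram (Rabiner & Schafer, 1978). An impulse train generator, controlled by the pitch period, drives the glottal pulse model G(z) with gain A_V; a random noise generator has gain A_N. A voiced/unvoiced switch selects the excitation u(n), which passes through the vocal tract model V(z), set by the vocal tract parameters, and the radiation model R(z) to give the speech signal s(n).]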

Nonlinear Fluid Dynamics of the Vocal Tract (Kaiser, 1993)

Physics of Speech Airflow
• airflow variables: $\rho$ = air density; $p$ = pressure; $\mathbf{u}$ = 3D air particle velocity
• governing equations:
  mass conservation (continuity eqn): $\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{u}) = 0$
  momentum conservation (Navier-Stokes eqn): $\rho \left( \frac{\partial \mathbf{u}}{\partial t} + \mathbf{u} \cdot \nabla \mathbf{u} \right) = -\nabla p + \rho \mathbf{g} + \mu \nabla^2 \mathbf{u} + \frac{1}{3} \mu \nabla (\nabla \cdot \mathbf{u})$
  state equation: $p \, \rho^{-1.4} = \text{const.}$
• time-varying boundary conditions

Speech Aerodynamics
• Reynolds number: $Re = \frac{\rho \, (\text{velocity scale}) \, (\text{length scale})}{\mu}$
• low viscosity $\mu$ $\Rightarrow$ high Re $\Rightarrow$ inertia forces $\gg$ viscous forces
• "aerodynamic" phenomena (Re >> 1): air jet, rotational motion, separated airflow, boundary layers, vortices, turbulence
• experimental & theoretical evidence for nonlinear phenomena: Teager (1970s–1980s), Kaiser (1983 – ), Thomas (1986), McGowan (1988), Barney, Shadle & Davis (1999), ...
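As a rough worked example of the Reynolds number formula above, the sketch below evaluates Re for an air jet. The air density and viscosity are approximate room-temperature values; the velocity scale (10 m/s) and length scale (1 cm) are illustrative assumptions, not figures from the slides.

```python
def reynolds_number(density, velocity_scale, length_scale, viscosity):
    """Re = density * velocity scale * length scale / dynamic viscosity."""
    if viscosity <= 0:
        raise ValueError("viscosity must be positive")
    return density * velocity_scale * length_scale / viscosity


if __name__ == "__main__":
    # Approximate properties of air at room temperature.
    rho_air = 1.2    # kg/m^3
    mu_air = 1.8e-5  # Pa*s (dynamic viscosity)
    # Hypothetical scales chosen only for illustration.
    jet_velocity = 10.0  # m/s
    constriction = 0.01  # m
    re = reynolds_number(rho_air, jet_velocity, constriction, mu_air)
    print(f"Re is about {re:.0f}, so inertia forces dominate viscous forces")
```

With scales of this order Re is in the thousands, which is consistent with the slide's point that speech airflow falls in the "aerodynamic" regime (Re >> 1).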

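Vortices
• vorticity: $\boldsymbol{\omega} = \nabla \times \mathbf{u}$
• a VORTEX is a flow region of similar $\boldsymbol{\omega}$
• a vortex can be generated by:
  – velocity gradients in boundary layers
  – separated air flow
  – curved geometry of vocal tract
• dynamics of vortex propagation: $\frac{\partial \boldsymbol{\omega}}{\partial t} + \mathbf{u} \cdot \nabla \boldsymbol{\omega} = \boldsymbol{\omega} \cdot \nabla \mathbf{u} + \nu \nabla^2 \boldsymbol{\omega}$
  $\boldsymbol{\omega} \cdot \nabla \mathbf{u}$: vorticity twisting & stretching
  $\nu \nabla^2 \boldsymbol{\omega}$: diffusion of vorticity ($\nu = \mu / \rho$, kinematic viscosity)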

Nonlinear Speech Processing
• Modulations
• Turbulence
  – Fractals
  – Chaos

Turbulence
• flow state with broad-spectrum, rapidly varying (in space and time) velocity and vorticity
• transition to turbulence is easier for higher-Re flows
• eddies: vortices of a characteristic size $\lambda$
• Energy Cascade Theory (Richardson, 1922): multiscale hierarchy of eddies
• 5/3 spectral law (Kolmogorov, 1941): $S(k, r) \propto r^{2/3} k^{-5/3}$
  $k = 2\pi / \lambda$: wavenumber
  $r$: energy dissipation rate
  $S(k, r)$: velocity wavenumber spectrum

Turbulence, Fractals and Chaos
• fractal geometry quantifies multiscale structures in turbulence
• Kolmogorov's 5/3 law $\Rightarrow$ $\mathrm{Var}[u(x + \Delta x) - u(x)] \propto |\Delta x|^{2/3}$
• we use fractal dimension to quantify the "amount" of turbulence in speech
• chaos $\leftrightarrow$ turbulence

Multiscale Fractal Dimension of Speech Sounds
[Figure: speech signals /iy/, /f/ and /v/ (amplitude vs. time, 0 to 30 millisec) and, below each, its fractal dimension (1 to 2) vs. scale (0 to 4 millisec)] [P. Maragos & A. Potamianos, JASA 1999]

Speech Attractors
[Figure: waveforms X(t) vs. time and reconstructed attractors for /ao/ (D_E = 6, #1846), /iy/ (D_E = 5, #1068), /k/ (D_E = 6, #816) and /s/ (D_E = 5, #829)] [Pitsikalis & Maragos, Speech Commun 2009]

Multiscale Fractal Dimensions for Speech Sounds
Refs:
• P. Maragos and A. Potamianos, "Fractal Dimensions of Speech Sounds: Computation and Application to Automatic Speech Recognition", Journal of the Acoustical Society of America, March 1999.
• P. Maragos, "Fractal Signal Analysis Using Mathematical Morphology", in Advances in Electronics and Electron Physics, vol. 88, Academic Press, 1994.

FRACTALS: Definitions
• Mandelbrot's definition: a set $S$ is fractal $\Leftrightarrow$ $D_H(S) > D_T(S)$ ($D_H$ = Hausdorff dim, $D_T$ = topological dim)
• Examples:
  $S$ = fractal dust: $D_T = 0 < D_H \le 1$
  $S$ = fractal curve: $D_T = 1 < D_H \le 2$
  $S$ = fractal surface: $D_T = 2 < D_H \le 3$
• Signals: a function $f: \mathbb{R}^\nu \to \mathbb{R}$ is a fractal if its graph $\mathrm{Gr}(f)$ is a fractal set in $\mathbb{R}^{\nu + 1}$
  $f$ is continuous $\Rightarrow$ $\nu = D_T \le D_H[\mathrm{Gr}(f)] \le \nu + 1$

"Fractal" Dimensions (of Sets in $\mathbb{R}^\nu$)
• $D_H$ = Hausdorff dimension
• $D_{MB}$ = Minkowski-Bouligand dimension
• $D_{BC}$ = box counting dimension
• $D_S$ = similarity dimension
$0 \le D_T \le D_H \le D_{MB} = D_{BC} \le \nu$
$D_H \le D_S$

Morphological Measurement of Fractal Dimension
• Minkowski cover of curve $G$: $C_B(r) = \bigcup_{z \in G} (rB + z) = G \oplus rB$
• Fractal (Minkowski-Bouligand) dimension, $D \in [1, 2]$: $D = \lim_{r \to 0} \frac{\log [A_B(r) / r^2]}{\log (1/r)}$
  $A_B(r) = \text{area}[C_B(r)]$; length of $G$ $= \lim_{r \to 0} \frac{A_B(r)}{2r}$
• Least-squares line fit to the data $(\log (1/r), \log [A_B(r) / r^2])$ $\Rightarrow$ slope $\approx D$
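The cover-area measurement and line fit above can be sketched for a sampled 1D signal as follows. This is a minimal illustration, not the authors' implementation: it assumes the structuring element B is a flat horizontal segment (so the cover is the band between a running maximum and a running minimum), measures the radius r in samples, and uses hypothetical function names throughout.

```python
import math
import random


def flat_dilation(signal, radius):
    """Running maximum over 2*radius + 1 samples (window truncated at the ends)."""
    return [max(signal[max(0, i - radius):i + radius + 1])
            for i in range(len(signal))]


def flat_erosion(signal, radius):
    """Running minimum over 2*radius + 1 samples (window truncated at the ends)."""
    return [min(signal[max(0, i - radius):i + radius + 1])
            for i in range(len(signal))]


def least_squares_slope(xs, ys):
    """Slope of the least-squares line through the points (xs[i], ys[i])."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def minkowski_dimension(signal, max_scale=10):
    """Estimate D as the slope of log(A(r)/r^2) against log(1/r), r = 1..max_scale.

    Assumes the signal is not constant, so every cover area is positive.
    """
    log_inv_r, log_area = [], []
    for r in range(1, max_scale + 1):
        upper = flat_dilation(signal, r)
        lower = flat_erosion(signal, r)
        area = sum(u - l for u, l in zip(upper, lower))  # cover area A(r)
        log_inv_r.append(math.log(1.0 / r))
        log_area.append(math.log(area / r ** 2))
    return least_squares_slope(log_inv_r, log_area)


if __name__ == "__main__":
    random.seed(0)
    ramp = [0.5 * n for n in range(1000)]                      # smooth line
    noise = [random.uniform(-1.0, 1.0) for _ in range(2000)]   # rough signal
    print("ramp :", minkowski_dimension(ramp))
    print("noise:", minkowski_dimension(noise))
```

A straight line should come out close to 1 and a noise-like signal noticeably higher, bounded by 2. With only a few small integer scales the estimate for noise stays below the limiting value, which is one reason to examine the dimension scale by scale.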

Morphological (Flat & Weighted) Filters
Dilation (max-plus convolution): $(f \oplus g)(x) = \max_y [f(y) + g(x - y)]$
Erosion (min-plus correlation): $(f \ominus g)(x) = \min_y [f(y) - g(y - x)]$
Opening: $f \circ g = (f \ominus g) \oplus g$
Closing: $f \bullet g = (f \oplus g) \ominus g$
[Figure: original signal and its erosion, dilation, opening and closing by flat (pulse) and parabolic structuring elements (SE), plotted against sample index]
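The four operators above translate almost directly into code. The sketch below is one possible discrete version, under these assumptions: the structuring function g is stored as a list over offsets -r..r, windows are truncated at the signal ends, and the helper names (including parabolic_se and its curvature parameter) are hypothetical.

```python
def dilate(f, g):
    """Max-plus convolution: (f dilated by g)(x) = max_y [f(y) + g(x - y)].

    g holds the structuring function on offsets -r..r, so g[k + r] = g(k).
    """
    r = len(g) // 2
    n = len(f)
    return [max(f[y] + g[x - y + r]
                for y in range(max(0, x - r), min(n, x + r + 1)))
            for x in range(n)]


def erode(f, g):
    """Min-plus correlation: (f eroded by g)(x) = min_y [f(y) - g(y - x)]."""
    r = len(g) // 2
    n = len(f)
    return [min(f[y] - g[y - x + r]
                for y in range(max(0, x - r), min(n, x + r + 1)))
            for x in range(n)]


def opening(f, g):
    """Erosion followed by dilation; removes peaks narrower than g."""
    return dilate(erode(f, g), g)


def closing(f, g):
    """Dilation followed by erosion; fills valleys narrower than g."""
    return erode(dilate(f, g), g)


def flat_se(radius):
    """Flat structuring element: g(k) = 0 for |k| <= radius."""
    return [0] * (2 * radius + 1)


def parabolic_se(radius, curvature):
    """Parabolic structuring function: g(k) = -curvature * k^2 for |k| <= radius."""
    return [-curvature * k * k for k in range(-radius, radius + 1)]


if __name__ == "__main__":
    pulse = [0, 0, 5, 0, 0]
    print("dilation:", dilate(pulse, flat_se(1)))
    print("erosion :", erode(pulse, flat_se(1)))
    print("opening :", opening(pulse, flat_se(1)))
    print("closing :", closing(pulse, parabolic_se(1, 0.5)))
```

With a flat g the operators reduce to running maxima and minima; a parabolic g weights neighbours by their distance, which is the "weighted" case named in the slide title.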

Minkowski Fractal Dimension of 1D Curve and Morphological Algorithm for 1D Signals

Short-Time (ST) Speech & Fractal Dimension
[Figure: speech signal /soothing/ with its short-time zero-crossings, MS amplitude and fractal dimension (1 to 2), plotted against time (0 to 1 sec)]

Multiscale Speech Fractal Dimension
• short-time speech signal: $S(t)$, $0 \le t \le T$
• signal graph: $G = \{(t, S(t)) \in \mathbb{R}^2 : 0 \le t \le T\}$
• fractal $\Rightarrow$ constant power law: $\text{area}[G \oplus \varepsilon B] \approx C \varepsilon^{2 - D}$, as $\varepsilon \to 0$
• variable power law: $\text{area}[G \oplus \varepsilon B] \approx C(\varepsilon) \, \varepsilon^{2 - D(\varepsilon)}$
• multiscale fractal "dimension" (speech fractogram): $MFD(t, \varepsilon) = D(\varepsilon)$ of the short-time speech segment around time $t$
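One way to read the variable power law above is as a local slope: instead of fitting a single line over all scales, fit a line over a few neighbouring scales and report its slope as the dimension at that scale. The sketch below does this for one short-time frame. It is an illustration under assumptions, not the published algorithm: scales are in samples, B is a flat horizontal segment, the cover is grown by repeated 3-sample max/min passes, and the half_window parameter and function names are hypothetical.

```python
import math
import random


def cover_areas(frame, max_scale):
    """Cover areas A(s) for s = 1..max_scale, grown one sample at a time.

    Applying a 3-sample running max (min) s times equals a running max (min)
    over 2*s + 1 samples, so each scale reuses the previous result.
    """
    n = len(frame)
    upper, lower = list(frame), list(frame)
    areas = []
    for _ in range(max_scale):
        upper = [max(upper[max(0, i - 1):i + 2]) for i in range(n)]
        lower = [min(lower[max(0, i - 1):i + 2]) for i in range(n)]
        areas.append(sum(u - l for u, l in zip(upper, lower)))
    return areas


def local_slope(xs, ys):
    """Least-squares slope through a short run of points."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)


def multiscale_fractal_dimension(frame, max_scale=10, half_window=2):
    """Return [D(1), ..., D(max_scale)] for one short-time frame.

    D(s) is the slope of log(A/s^2) against log(1/s), fitted over the scales
    within half_window of s. Assumes a non-constant frame (positive areas).
    """
    areas = cover_areas(frame, max_scale)
    if min(areas) <= 0:
        raise ValueError("frame must not be constant")
    log_inv = [math.log(1.0 / s) for s in range(1, max_scale + 1)]
    log_area = [math.log(a / s ** 2)
                for s, a in zip(range(1, max_scale + 1), areas)]
    profile = []
    for k in range(max_scale):
        lo, hi = max(0, k - half_window), min(max_scale, k + half_window + 1)
        profile.append(local_slope(log_inv[lo:hi], log_area[lo:hi]))
    return profile


if __name__ == "__main__":
    random.seed(1)
    tone = [math.sin(2 * math.pi * n / 200) for n in range(2000)]
    hiss = [random.uniform(-1.0, 1.0) for _ in range(2000)]
    print("tone:", [round(d, 2) for d in multiscale_fractal_dimension(tone)])
    print("hiss:", [round(d, 2) for d in multiscale_fractal_dimension(hiss)])
```

Running this over successive frames and stacking the profiles would give a time-by-scale array in the spirit of the fractogram MFD(t, ε). The synthetic tone and hiss here merely stand in for a smooth and a noise-like speech segment.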

