Speech Signal Representations Part 1: Digital Signal Processing

Hsin-min Wang

References:
1. X. Huang et al., Spoken Language Processing, Chapters 5-6
2. J. R. Deller et al., Discrete-Time Processing of Speech Signals, Chapters 4-6
3. J. W. Picone, "Signal modeling techniques in speech recognition," Proceedings of the IEEE, September 1993, pp. 1215-1247

Introduction
• Current speech recognition systems are mainly composed of:
  − A front-end feature extractor (feature extraction module)
    • Discovers salient characteristics suited for classification
    • Based on scientific and/or heuristic knowledge about the patterns to recognize
  − A back-end classifier (classification module)
    • Sets class boundaries accurately in the feature space
    • Statistically designed according to Bayes' decision theory

Analog Signal to Digital Signal
• Analog signal $x_a(t)$ → discrete-time signal (or digital signal) $x[n]$
• Digital signal: a discrete-time signal with discrete amplitude
• Sampling at $t = nT$:
$$x[n] = x_a(nT), \quad T: \text{sampling period}$$
$$F_s = \frac{1}{T}: \text{sampling rate}$$
• Example: sampling period = 125 μs ⇒ sampling rate = 8 kHz
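The sampling relation $x[n] = x_a(nT)$ can be sketched in a few lines. This is a minimal illustration, not part of the slides: the 1 kHz cosine used as the analog signal and the names `x_a` and `sample` are assumptions chosen for the example. Only the 125 μs period comes from the slide.

```python
import math

T = 125e-6        # sampling period in seconds (value from the slide)
Fs = 1.0 / T      # sampling rate in Hz, about 8000

def x_a(t: float) -> float:
    """Hypothetical analog signal: a 1 kHz cosine (illustrative assumption)."""
    return math.cos(2 * math.pi * 1000.0 * t)

def sample(n: int) -> float:
    """Discrete-time sample x[n] = x_a(nT)."""
    return x_a(n * T)

# At 8 kHz, one cycle of a 1 kHz tone spans 8 samples.
x = [sample(n) for n in range(8)]
```

With these values, `x[0]` is the peak of the cosine and `x[4]` is its trough, half a cycle later.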

Two Main Approaches to Digital Signal Processing
• Filtering
  − Signal in $x[n]$ → Filter → signal out $y[n]$
  − Amplifies or attenuates some frequency components of $x[n]$
• Parameter Extraction
  − Signal in $x[n]$ → Parameter Extraction → parameters out, a sequence of $L$ vectors with $m$ components each:
$$\begin{bmatrix} c_{11} \\ c_{12} \\ \vdots \\ c_{1m} \end{bmatrix}, \begin{bmatrix} c_{21} \\ c_{22} \\ \vdots \\ c_{2m} \end{bmatrix}, \ldots, \begin{bmatrix} c_{L1} \\ c_{L2} \\ \vdots \\ c_{Lm} \end{bmatrix}$$
  − e.g.:
    1. Spectrum estimation
    2. Parameters for recognition

5.1 Digital Signals and Systems

Sinusoidal Signals
$$x[n] = A\cos(\omega n + \phi)$$
  − $A$: amplitude
  − $\omega$: angular frequency, $\omega = 2\pi f$
  − $\phi$: phase
  − $f$: normalized frequency, $0 \le f \le 1$
• Example: $x[n] = A\cos\left(\omega n - \frac{\pi}{2}\right)$ with period $T = 25$ samples, so frequency $f = 1/25 = 0.04$
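A short sketch of the example sinusoid, assuming NumPy is available. The amplitude value and the number of samples generated are illustrative choices. The frequency $f = 0.04$ and the phase $-\pi/2$ are taken from the slide.

```python
import numpy as np

A = 1.0                  # amplitude (illustrative value)
f = 0.04                 # normalized frequency in cycles per sample (from the slide)
omega = 2 * np.pi * f    # angular frequency in radians per sample
phi = -np.pi / 2         # phase (from the slide example)

n = np.arange(100)
x = A * np.cos(omega * n + phi)

# One cycle spans 1/f samples.
period = round(1 / f)
```

Shifting the sequence by `period` samples reproduces it, which is what periodicity means for a discrete-time signal.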

Sinusoidal Signals – periodic vs. non-periodic
• $x[n]$ is periodic with period $N$ if and only if $x[n] = x[n + N]$
$$A\cos(\omega(n + N) + \phi) = A\cos(\omega n + \phi)$$
$$\Rightarrow \omega N = 2\pi k \text{ for some integer } k, \text{ i.e. } \omega = \frac{2\pi k}{N}$$
• $A\cos(\omega n + \phi)$ is therefore not periodic for every value of $\omega$, only for those where $\omega / 2\pi$ is a rational number
• Examples
  − $x_1[n] = \cos(\pi n/4)$ is periodic with period $N = 8$
  − $x_2[n] = \cos(3\pi n/8)$ is periodic with period $N = 16$
  − $x_3[n] = \cos(n)$ is not periodic

Sinusoidal Signals – periodic vs. non-periodic (cont.)
• $x_1[n] = \cos(\pi n/4)$
$$\cos\left(\frac{\pi}{4}(n + N_1)\right) = \cos\left(\frac{\pi}{4}n + \frac{\pi}{4}N_1\right)$$
$$\Rightarrow \frac{\pi}{4}N_1 = 2\pi k \Rightarrow N_1 = 8k \quad (\text{both } N_1 \text{ and } k \text{ are integers})$$
$$\therefore \text{period } N_1 = 8$$
• $x_2[n] = \cos(3\pi n/8)$
$$\cos\left(\frac{3\pi}{8}(n + N_2)\right) = \cos\left(\frac{3\pi}{8}n + \frac{3\pi}{8}N_2\right)$$
$$\Rightarrow \frac{3\pi}{8}N_2 = 2\pi k \Rightarrow N_2 = \frac{16}{3}k \quad (\text{both } N_2 \text{ and } k \text{ are integers})$$
$$\therefore \text{period } N_2 = 16 \quad (k = 3)$$
• $x_3[n] = \cos(n)$
$$\cos(n + N_3) = \cos(n) \Rightarrow N_3 = 2\pi k$$
  − No nonzero integers $N_3$ and $k$ satisfy this equation, because $\pi$ is irrational
  − ⇒ non-periodic
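The period condition $\omega N = 2\pi k$ can be checked mechanically. The sketch below is an illustration under stated assumptions: the helper names `period_of_rational_omega` and `brute_force_period` are hypothetical, and the brute-force search is bounded and tolerance-based, so for $\cos(n)$ it only fails to find a period rather than proving that none exists.

```python
from fractions import Fraction
from math import cos, gcd, pi, sin

def period_of_rational_omega(ratio: Fraction) -> int:
    """Smallest N > 0 with omega * N = 2*pi*k, where omega = ratio * pi."""
    p, q = ratio.numerator, ratio.denominator   # omega = (p/q) * pi, in lowest terms
    # N = 2*q*k / p must be an integer; the smallest such N is 2q / gcd(2q, p).
    return (2 * q) // gcd(2 * q, p)

def brute_force_period(omega: float, max_n: int = 1000, tol: float = 1e-9):
    """Search N = 1..max_n for omega * N equal to a multiple of 2*pi (within tol)."""
    for N in range(1, max_n + 1):
        if abs(cos(omega * N) - 1.0) < tol and abs(sin(omega * N)) < tol:
            return N
    return None   # no period found within the search bound

N1 = period_of_rational_omega(Fraction(1, 4))   # omega = pi/4
N2 = period_of_rational_omega(Fraction(3, 8))   # omega = 3*pi/8
N3 = brute_force_period(1.0)                    # omega = 1, i.e. cos(n)
```

The closed-form helper covers only frequencies that are rational multiples of $\pi$, which are exactly the periodic cases.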

Sinusoidal Signals – complex exponential expression
• A complex number $z$ can be expressed in Cartesian form
$$z = x + jy, \quad j = \sqrt{-1}$$
• The complex number can also be expressed in polar form
$$z = Ae^{j\phi}, \text{ where } A \text{ is the amplitude and } \phi \text{ is the phase}$$
• The two forms are related through Euler's formula
$$e^{j\phi} = \cos\phi + j\sin\phi, \quad x = A\cos(\phi), \quad y = A\sin(\phi)$$
• A sinusoidal signal can be expressed as the real part of the corresponding complex exponential
$$x[n] = A\cos(\omega n + \phi) = \mathrm{Re}\left\{Ae^{j(\omega n + \phi)}\right\}$$
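As a quick numerical check of the Cartesian and polar forms and of the real-part identity, here is a sketch using only the Python standard library. The values of `A`, `phi`, and `omega` are arbitrary illustrative choices.

```python
import cmath
import math

A, phi = 2.0, math.pi / 3      # illustrative amplitude and phase
omega = 0.3                    # illustrative angular frequency (radians per sample)

# Polar form A * e^{j phi} converted to Cartesian form x + jy.
z = cmath.rect(A, phi)
x, y = z.real, z.imag

# The sinusoid as the real part of a complex exponential, and computed directly.
from_exponential = [(A * cmath.exp(1j * (omega * n + phi))).real for n in range(10)]
direct = [A * math.cos(omega * n + phi) for n in range(10)]
```

The two lists agree to floating-point precision, and `x`, `y` equal `A*cos(phi)` and `A*sin(phi)`.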

Sinusoidal Signals – sum of two signals
• The sum of two complex exponential signals with the same frequency
$$A_0 e^{j(\omega n + \phi_0)} + A_1 e^{j(\omega n + \phi_1)} = e^{j\omega n}\left(A_0 e^{j\phi_0} + A_1 e^{j\phi_1}\right) = e^{j\omega n} A e^{j\phi} = A e^{j(\omega n + \phi)}$$
  where $A$, $A_0$ and $A_1$ are real numbers
  − Taking the real part:
$$A_0\cos(\omega n + \phi_0) + A_1\cos(\omega n + \phi_1) = A\cos(\omega n + \phi)$$
• The sum of $N$ sinusoids of the same frequency is another sinusoid of the same frequency
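The derivation amounts to adding the complex amplitudes (phasors) $A_i e^{j\phi_i}$. A minimal sketch, assuming NumPy; the function name `add_sinusoids` and the numeric values are illustrative, not from the slides.

```python
import numpy as np

def add_sinusoids(amps, phases):
    """Return (A, phi) such that sum_i A_i cos(wn + phi_i) = A cos(wn + phi)."""
    phasor = sum(a * np.exp(1j * p) for a, p in zip(amps, phases))
    return abs(phasor), np.angle(phasor)

# Two sinusoids of the same frequency: cos(wn) and 2 cos(wn + pi/2).
A, phi = add_sinusoids([1.0, 2.0], [0.0, np.pi / 2])

omega = 0.2 * np.pi
n = np.arange(50)
lhs = 1.0 * np.cos(omega * n) + 2.0 * np.cos(omega * n + np.pi / 2)
rhs = A * np.cos(omega * n + phi)
```

Here the combined phasor is $1 + 2j$, so the resulting amplitude is $\sqrt{5}$. The same function handles any number of same-frequency sinusoids.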

Some Digital Signals

Some Digital Signals – (cont.)
• Any sequence $x[n]$ can be represented as a sum of shifted and scaled unit impulse sequences (signals)
$$x[n] = \sum_{k=-\infty}^{\infty} x[k]\,\delta[n - k]$$
  − $x[k]$: scale (weight)
  − $\delta[n - k]$: time-shifted unit impulse sequence
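The impulse decomposition can be verified on a finite sequence. In this sketch the sequence values, the time axis, and the helper name `delta` are illustrative assumptions.

```python
import numpy as np

def delta(n):
    """Unit impulse evaluated on an array of integer time indices."""
    return (np.asarray(n) == 0).astype(float)

x = np.array([3.0, 2.0, 1.0])    # illustrative finite sequence x[0], x[1], x[2]
n = np.arange(-2, 6)             # time axis on which x[n] is rebuilt

# x[n] = sum_k x[k] * delta[n - k]; outside k = 0..2 the sequence is zero.
rebuilt = sum(x[k] * delta(n - k) for k in range(len(x)))
```

Each term places one weighted impulse at time `k`, and the sum restores the sequence, with zeros elsewhere on the axis.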

Digital Systems
• A digital system $T$ is a system that, given an input signal $x[n]$, generates an output signal $y[n]$
$$y[n] = T\{x[n]\}$$
• Properties of digital systems
  − Linear: $T\{a x_1[n] + b x_2[n]\} = a T\{x_1[n]\} + b T\{x_2[n]\}$
    • A linear combination of inputs maps to the same linear combination of outputs
  − Time-invariant: $y[n - n_0] = T\{x[n - n_0]\}$
    • A time shift in the input by $n_0$ samples gives a shift in the output by $n_0$ samples
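Both properties can be probed numerically on finite sequences. The sketch below is illustrative only: the three example systems and all function names are assumptions, and a passing check on one pair of inputs can refute a property but does not prove it in general.

```python
import numpy as np

def shift(x, n0):
    """Delay a finite sequence by n0 >= 0 samples (zero-filled, same length)."""
    return np.concatenate([np.zeros(n0), x[:len(x) - n0]])

def moving_average(x):
    """y[n] = (x[n] + x[n-1]) / 2, with x[-1] taken as 0."""
    return 0.5 * (x + shift(x, 1))

def square(x):
    """y[n] = x[n]^2, which is time-invariant but not linear."""
    return x ** 2

def ramp_gain(x):
    """y[n] = n * x[n], which is linear but not time-invariant."""
    return np.arange(len(x)) * x

def is_linear(T, x1, x2, a=2.0, b=-3.0):
    """Compare T{a x1 + b x2} with a T{x1} + b T{x2} on one pair of inputs."""
    return bool(np.allclose(T(a * x1 + b * x2), a * T(x1) + b * T(x2)))

def is_time_invariant(T, x, n0=2):
    """Compare T{x[n - n0]} with y[n - n0] on one input."""
    return bool(np.allclose(T(shift(x, n0)), shift(T(x), n0)))

x1 = np.array([1.0, 2.0, -1.0, 0.5, 3.0, 0.25, -2.0, 1.0])
x2 = np.array([0.5, -1.0, 2.0, 1.5, -0.5, 1.0, 3.0, -2.0])

results = {
    T.__name__: (is_linear(T, x1, x2), is_time_invariant(T, x1))
    for T in (moving_average, square, ramp_gain)
}
```

Only the moving average passes both checks, which makes it the one LTI system among the three.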

LTI Systems
• Linear time-invariant (LTI): the system output can be expressed as a convolution of the input $x[n]$ and the impulse response $h[n]$
• Impulse response: unit impulse $\delta[n]$ → Digital System → $h[n]$
$$\delta[n] \xrightarrow{T} h[n], \qquad \delta[n - k] \xrightarrow{T} h[n - k] \quad (\text{time-invariant})$$
• Derivation
$$x[n] = \sum_{k=-\infty}^{\infty} x[k]\,\delta[n - k] \quad (\text{scaled, time-shifted unit impulse sequences})$$
$$\Rightarrow T\{x[n]\} = T\left\{\sum_{k=-\infty}^{\infty} x[k]\,\delta[n - k]\right\}$$
$$= \sum_{k=-\infty}^{\infty} x[k]\,T\{\delta[n - k]\} \quad (\text{linear})$$
$$= \sum_{k=-\infty}^{\infty} x[k]\,h[n - k] \quad (\text{time-invariant})$$
$$= x[n] * h[n] \quad (\text{convolution})$$

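LTI Systems (cont.)
Example (values read from the slide figure):
• Impulse response ($\delta[n]$ → LTI → $h[n]$), length $M = 3$: $h[0] = 1$, $h[1] = 3$, $h[2] = -2$
• Input, length $L = 3$: $x[n] = 3\delta[n] + 2\delta[n - 1] + \delta[n - 2]$
• Each scaled, shifted impulse produces a scaled, shifted impulse response:
  − $3\delta[n]$ → $3h[n]$: values 3, 9, −6 at $n = 0, 1, 2$
  − $2\delta[n - 1]$ → $2h[n - 1]$: values 2, 6, −4 at $n = 1, 2, 3$
  − $\delta[n - 2]$ → $h[n - 2]$: values 1, 3, −2 at $n = 2, 3, 4$
• Sum up: $y[n]$ has values 3, 11, 1, −1, −2 at $n = 0, \ldots, 4$, so its length is $L + M - 1 = 5$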

LTI Systems - convolution
• Reflect $h[k]$ about the origin (→ $h[-k]$)
• Slide ($h[-k]$ → $h[-k + n]$ or $h[-(k - n)]$), multiply with $x[k]$
• Sum up

$$y[n] = x[n] * h[n] = \sum_{k=-\infty}^{\infty} x[k]\,h[n - k]$$
Graphical example (values read from the slide figure), with $x[k]$ = 3, 2, 1 and $h[k]$ = 1, 3, −2 at $k = 0, 1, 2$: reflect $h[k]$ to get $h[-k]$, slide it to $h[-k + n]$, multiply with $x[k]$, and sum up.
• $n = 0$: $h[-k]$, $y[0] = 3$
• $n = 1$: $h[-k + 1]$, $y[1] = 11$
• $n = 2$: $h[-k + 2]$, $y[2] = 1$
• $n = 3$: $h[-k + 3]$, $y[3] = -1$
• $n = 4$: $h[-k + 4]$, $y[4] = -2$
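Both views of convolution, summing scaled and shifted copies of $h[n]$ and the reflect-slide-multiply-sum procedure, can be sketched as follows. The function names are hypothetical, and the sequences $x$ = 3, 2, 1 and $h$ = 1, 3, −2 are an assumption based on reading the values off the slide figure. NumPy's `np.convolve` serves as a reference.

```python
import numpy as np

x = np.array([3, 2, 1])     # input, length L = 3 (assumed from the slide figure)
h = np.array([1, 3, -2])    # impulse response, length M = 3 (assumed from the slide figure)

def convolve_superposition(x, h):
    """Sum of scaled, shifted copies of h: each x[k] contributes x[k] * h[n - k]."""
    y = np.zeros(len(x) + len(h) - 1, dtype=np.result_type(x, h))
    for k, xk in enumerate(x):
        y[k:k + len(h)] += xk * h
    return y

def convolve_reflect_slide(x, h):
    """For each n, reflect h, slide it to position n, multiply with x[k], and sum."""
    L, M = len(x), len(h)
    y = np.zeros(L + M - 1, dtype=np.result_type(x, h))
    for n in range(L + M - 1):
        for k in range(L):
            if 0 <= n - k < M:          # h[n - k] is nonzero only inside its support
                y[n] += x[k] * h[n - k]
    return y

y_superposition = convolve_superposition(x, h)
y_reflect_slide = convolve_reflect_slide(x, h)
y_reference = np.convolve(x, h)
```

All three computations give the same output of length $L + M - 1 = 5$.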

LTI Systems – convolution (cont.)
• Convolution is commutative and distributive
  − Commutation: cascaded systems $h_1[n]$, $h_2[n]$ can be applied in either order
$$x[n] * h_1[n] * h_2[n] = x[n] * h_2[n] * h_1[n]$$
  − Distribution: parallel systems $h_1[n]$ and $h_2[n]$ are equivalent to the single system $h_1[n] + h_2[n]$
$$x[n] * (h_1[n] + h_2[n]) = x[n] * h_1[n] + x[n] * h_2[n]$$
• The output can be written with either sequence shifted
$$y[n] = x[n] * h[n] = h[n] * x[n]$$
$$= \sum_{k=-\infty}^{\infty} x[k]\,h[n - k] = \sum_{k=-\infty}^{\infty} h[k]\,x[n - k]$$
  − An impulse response has finite duration
    » Finite-Impulse Response (FIR)
  − An impulse response has infinite duration
    » Infinite-Impulse Response (IIR)
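The commutative and distributive properties can be checked numerically with `np.convolve`. The three finite sequences below are illustrative values, not taken from the slides.

```python
import numpy as np

x = np.array([3.0, 2.0, 1.0])
h1 = np.array([1.0, 3.0, -2.0])
h2 = np.array([0.5, -1.0, 4.0])   # same length as h1 so that h1 + h2 adds elementwise

# Commutation: the order of cascaded systems does not matter.
cascade_12 = np.convolve(np.convolve(x, h1), h2)
cascade_21 = np.convolve(np.convolve(x, h2), h1)

# Distribution: two parallel systems equal one system with the summed response.
parallel = np.convolve(x, h1) + np.convolve(x, h2)
summed = np.convolve(x, h1 + h2)
```

Because all three sequences are finite, these are FIR examples. An IIR response cannot be stored as a finite array and would need a recursive implementation instead.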

5.2 Continuous-Frequency Transforms

Discrete-Time Fourier Transform (DTFT)
• Input $x[n] = e^{j\omega_0 n}$ → $h[n]$ → $y[n] = ?$
$$y[n] = \sum_{k=-\infty}^{\infty} h[k]\,e^{j\omega_0 (n - k)} = e^{j\omega_0 n} \sum_{k=-\infty}^{\infty} h[k]\,e^{-j\omega_0 k} = e^{j\omega_0 n}\,H(e^{j\omega_0})$$
• When the input is a complex exponential, the output is another complex exponential of the same frequency, with its amplitude multiplied by the complex quantity $H(e^{j\omega_0})$
• The discrete-time Fourier transform of $h[n]$:
$$H(e^{j\omega}) = \sum_{n=-\infty}^{\infty} h[n]\,e^{-j\omega n}$$
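For a finite impulse response the DTFT sum has finitely many terms, so the eigenfunction property can be checked directly. This is a sketch under assumptions: the function name `dtft`, the filter taps, and $\omega_0$ are illustrative, and because the input exponential is truncated to start at $n = 0$, the first $M - 1$ output samples are start-up transients and are excluded from the comparison.

```python
import numpy as np

def dtft(h, omega):
    """H(e^{j omega}) = sum_n h[n] e^{-j omega n} for a finite h[0..M-1]."""
    n = np.arange(len(h))
    return np.sum(h * np.exp(-1j * omega * n))

h = np.array([1.0, 3.0, -2.0])       # illustrative FIR impulse response
omega0 = 0.25 * np.pi                # illustrative input frequency
n = np.arange(40)
x = np.exp(1j * omega0 * n)          # complex exponential, truncated to start at n = 0

y = np.convolve(x, h)[:len(n)]       # filter output on the same time axis
steady = slice(len(h) - 1, len(n))   # skip the start-up samples caused by truncation
expected = dtft(h, omega0) * x       # e^{j omega0 n} * H(e^{j omega0})
```

Past the start-up region, `y` matches `expected` sample by sample: the filter only scales and phase-shifts the exponential.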

Discrete-Time Fourier Transform (cont.)
• The discrete-time Fourier transform of $h[n]$, $H(e^{j\omega})$, is a periodic function of $\omega$ with period $2\pi$
  − One period can fully describe it, typically $-\pi < \omega < \pi$
  − $H(e^{j\omega})$ is a complex function of $\omega$; it can be expressed as
$$\text{Cartesian form: } H(e^{j\omega}) = H_r(e^{j\omega}) + jH_i(e^{j\omega}) \quad (\text{real part, imaginary part})$$
$$\text{Polar form: } H(e^{j\omega}) = \left|H(e^{j\omega})\right| e^{j\angle H(e^{j\omega})} \quad (\text{magnitude, phase})$$
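The periodicity and the two representations can be illustrated on a finite impulse response. As before, the helper `dtft`, the filter taps, and the evaluation frequency are illustrative assumptions.

```python
import numpy as np

def dtft(h, omega):
    """H(e^{j omega}) = sum_n h[n] e^{-j omega n} for a finite h[0..M-1]."""
    n = np.arange(len(h))
    return np.sum(h * np.exp(-1j * omega * n))

h = np.array([1.0, 3.0, -2.0])           # illustrative FIR impulse response
omega = 0.3 * np.pi                      # illustrative frequency

H = dtft(h, omega)
H_shifted = dtft(h, omega + 2 * np.pi)   # same value one period later

H_r, H_i = H.real, H.imag                # Cartesian form: real and imaginary parts
magnitude, phase = np.abs(H), np.angle(H)  # polar form: magnitude and phase
```

Rebuilding `H` from either pair, `H_r + 1j * H_i` or `magnitude * np.exp(1j * phase)`, gives back the same complex value.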
