3. Feature Extraction: 3.1 Feature Extraction from Speech or Other Types of Audio (PowerPoint Presentation)


SLIDE 1

3. Feature Extraction

SLIDE 2

3.1 Feature Extraction from Speech

… or other types of audio like music

See Schukat-Talamazzini Chapter 3

SLIDE 3

Goal of Feature Extraction

  • Capture essential information about speech
  • Be robust against background noise
  • Steps:
    – Sampling and quantization
    – Short time analysis
    – Transform to frequency space
    – Filtering
    – Optimize class separability

SLIDE 4

Overview Feature Extraction

  • Convert the continuous speech signal into a sequence of feature vectors
  • Each window yields one vector
  • The following slides give the details of this procedure

(Figure from: HTK manual)

SLIDE 5

Sampling and Quantization

  • Measure the signal periodically and store each measurement in a variable
  • Sampling interval: T (i.e. a sampling rate of 1/T)
  • Quantization: use B bits to represent the signal, giving $2^B$ possible values
  • $f_n$: sampled values of the signal, numbered by the index n

What happens when you store a signal in a computer?
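A minimal NumPy sketch of sampling and B-bit quantization. It assumes the analog signal is scaled to [-1, 1] and uses plain rounding to signed integers; `quantize` is a hypothetical helper, not a library function.

```python
import numpy as np

def quantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Map samples in [-1, 1] to signed integers using `bits` bits (2**bits levels)."""
    max_int = 2 ** (bits - 1) - 1          # e.g. 32767 for 16 bits
    min_int = -(2 ** (bits - 1))           # e.g. -32768 for 16 bits
    q = np.round(x * max_int)              # scale, then round to the nearest level
    return np.clip(q, min_int, max_int).astype(np.int32)

# Sample a 440 Hz tone at 16 kHz, i.e. sampling interval T = 1/16000 s
fs = 16000
n = np.arange(fs // 100)                    # sample indices for 10 ms
f = 0.5 * np.sin(2 * np.pi * 440 * n / fs)  # f_n: sampled values
f_q = quantize(f, bits=16)                  # what is actually stored
```

Values outside [-1, 1] are clipped to the extreme levels, which is one simple way to model overload.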

SLIDE 6

Sampling Theorem

  • Reconstruction of the original signal is only possible if the signal's highest frequency is limited
  • Let $f_G$ be the frequency limit; then the sampling interval must satisfy $T \le \frac{1}{2 f_G}$
  • Else: spectral aliasing, that is, frequencies will be confused
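A small NumPy check of spectral aliasing, with illustrative numbers chosen here (not from the slide): at a sampling rate of 8 kHz, $f_G$ may be at most 4 kHz, and a 7 kHz cosine then yields exactly the same samples as a 1 kHz cosine.

```python
import numpy as np

fs = 8000                      # sampling rate 1/T in Hz
n = np.arange(32)              # sample indices
t = n / fs                     # sampling instants t = nT

low = np.cos(2 * np.pi * 1000 * t)    # 1 kHz: below fs/2 = 4 kHz
high = np.cos(2 * np.pi * 7000 * t)   # 7 kHz: violates T <= 1/(2 f_G)

# cos(2*pi*7000*n/8000) = cos(2*pi*n - 2*pi*1000*n/8000) = cos(2*pi*1000*n/8000),
# so after sampling the two tones cannot be told apart.
aliased = np.allclose(low, high)
print(aliased)   # True
```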

SLIDE 7

Pre-emphasis

  • Corrects for the filtering effect of the lips
  • Boosts higher frequencies
  • Iterative scheme: $f'_n = f_n - \alpha f_{n-1}$
  • Typical value: $\alpha = 0.95$

What does it do for $\alpha = 1$?
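A sketch of the pre-emphasis step in NumPy; `pre_emphasis` is a hypothetical name, and keeping the first sample unchanged is an assumption (implementations differ at the boundary).

```python
import numpy as np

def pre_emphasis(f: np.ndarray, alpha: float = 0.95) -> np.ndarray:
    """Return f'_n = f_n - alpha * f_{n-1}; the first sample has no predecessor and is kept."""
    out = np.empty_like(f, dtype=float)
    out[0] = f[0]
    out[1:] = f[1:] - alpha * f[:-1]
    return out

emphasized = pre_emphasis(np.sin(2 * np.pi * 100 * np.arange(400) / 16000))
```

In the frequency domain this filter has gain $|1 - \alpha e^{-i\omega}|$, which rises from $1 - \alpha$ at $\omega = 0$ to $1 + \alpha$ at $\omega = \pi$; this is the boost of higher frequencies named above.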

SLIDE 8

From Signal to Spectrum: Fourier Transform

  • Definition:

$F_m(\omega) = \sum_{n} f_{n+m}\, w_n\, e^{-i \omega n}$

  • $w_n$: window function
  • $\omega$: frequency times $2\pi$ (angular frequency)
  • $i$: imaginary unit
  • The window cuts the sum down to a finite number of terms
  • Complex exponentials are easier to handle than cos or sin functions

SLIDE 9

Example: putting a rectangular window on a speech signal

(Figure labels: frame shift, typically 10 ms; frame width, typically 25 ms)
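The framing shown in the figure can be sketched as follows, assuming 16 kHz audio, the typical 25 ms width and 10 ms shift, and that samples at the end which do not fill a whole frame are dropped. Cutting out the frames is the same as multiplying with a rectangular window. `frame_signal` is a hypothetical helper.

```python
import numpy as np

def frame_signal(f: np.ndarray, fs: int, width_ms: float = 25.0,
                 shift_ms: float = 10.0) -> np.ndarray:
    """Cut f into overlapping frames; returns shape (num_frames, frame_len)."""
    frame_len = int(round(fs * width_ms / 1000))    # 400 samples at 16 kHz
    frame_shift = int(round(fs * shift_ms / 1000))  # 160 samples at 16 kHz
    if len(f) < frame_len:
        return np.empty((0, frame_len))
    num_frames = 1 + (len(f) - frame_len) // frame_shift
    # Row m holds samples f[m * shift : m * shift + frame_len]
    idx = frame_shift * np.arange(num_frames)[:, None] + np.arange(frame_len)[None, :]
    return f[idx]

frames = frame_signal(np.random.randn(16000), fs=16000)   # 1 s of noise
print(frames.shape)   # (98, 400)
```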

SLIDE 10

Fourier Transform in Practice

  • Use the “Fast Fourier Transform” (FFT)
  • The classic (radix-2) FFT requires the number of samples N to be a power of 2 (e.g. N=256)
  • Code is widely available
  • Complexity $O(N \log N)$
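To illustrate where the power-of-2 requirement and the $N \log N$ cost come from, here is a textbook recursive radix-2 (Cooley-Tukey) FFT; it is a teaching sketch, not a replacement for an optimized library. NumPy's `np.fft.fft` is used only to check the result, and it also accepts lengths that are not powers of 2.

```python
import numpy as np

def fft_radix2(x: np.ndarray) -> np.ndarray:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of 2."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 0 or (N & (N - 1)) != 0:
        raise ValueError("length must be a power of 2")
    if N == 1:
        return x
    even = fft_radix2(x[0::2])     # DFT of the even-indexed samples (size N/2)
    odd = fft_radix2(x[1::2])      # DFT of the odd-indexed samples (size N/2)
    twiddle = np.exp(-2j * np.pi * np.arange(N // 2) / N) * odd
    # Butterfly: O(N) work per level, log2(N) levels -> O(N log N) in total
    return np.concatenate([even + twiddle, even - twiddle])

x = np.random.default_rng(0).standard_normal(256)
X = fft_radix2(x)
```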

SLIDE 11

Established Window Functions

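  • Used to get sharper spectral peaks
  • Rectangular window: $w^R_n = 1$
  • Generalized Hamming window: $w^H_n = \alpha - (1 - \alpha) \cos\left(\frac{2\pi n}{N - 1}\right)$ ($\alpha = 0.54$: standard Hamming window)
  • Gauss window: $w^G_n = e^{-0.5 \left(\frac{n - N/2}{N/3}\right)^2}$
  • Parabola window: $w^P_n = 4\, \frac{n}{N} \left(1 - \frac{n}{N}\right)$
  • All defined for $n = 0, \ldots, N-1$; window functions vanish outside this interval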
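The four windows in NumPy, following the formulas as reconstructed above; the Gauss width $N/3$ comes from that reconstruction and should be treated as an assumption. With $\alpha = 0.54$ the generalized Hamming window coincides with `np.hamming`, and with $\alpha = 0.5$ with `np.hanning`.

```python
import numpy as np

def rectangular(N):
    return np.ones(N)

def generalized_hamming(N, alpha=0.54):
    n = np.arange(N)
    return alpha - (1 - alpha) * np.cos(2 * np.pi * n / (N - 1))

def gauss(N):
    n = np.arange(N)
    return np.exp(-0.5 * ((n - N / 2) / (N / 3)) ** 2)

def parabola(N):
    n = np.arange(N)
    return 4 * (n / N) * (1 - n / N)

# Windows are applied by elementwise multiplication with a frame of N samples
windows = {name: fn(400) for name, fn in
           [("rect", rectangular), ("hamming", generalized_hamming),
            ("gauss", gauss), ("parabola", parabola)]}
```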

SLIDE 12

Rewrite of Fourier Transform

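  • Definition: $F_m(\omega) = \sum_{n} f_{n+m}\, w_n\, e^{-i \omega n}$
  • Window functions vanish outside the interval $n = 0, \ldots, N-1$
  • Define $\omega_k = \frac{2\pi k}{N}$, $k = 0, \ldots, N-1$:

$F_m(\omega_k) = \sum_{n=0}^{N-1} f_{n+m}\, w_n\, e^{-2\pi i k n / N}$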
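A direct, O(N²) evaluation of the rewritten sum, assuming frame m starts at sample m as in $F_m(\omega_k) = \sum_{n=0}^{N-1} f_{n+m} w_n e^{-2\pi i k n/N}$. Comparing it with `np.fft.fft` of the windowed segment shows that the FFT computes exactly this quantity; `short_time_dft` is a hypothetical name.

```python
import numpy as np

def short_time_dft(f: np.ndarray, w: np.ndarray, m: int) -> np.ndarray:
    """F_m(omega_k) = sum_{n=0}^{N-1} f_{n+m} w_n exp(-2*pi*i*k*n/N), k = 0..N-1."""
    N = len(w)
    n = np.arange(N)
    k = n[:, None]                               # one row per frequency index k
    segment = f[m:m + N] * w                     # f_{n+m} * w_n
    return (segment[None, :] * np.exp(-2j * np.pi * k * n / N)).sum(axis=1)

rng = np.random.default_rng(0)
f = rng.standard_normal(1000)
w = np.hamming(256)
F = short_time_dft(f, w, m=160)
```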

SLIDE 13

Example for ö

(Figure: short-time spectrum and smoothed spectrum of the vowel ö; horizontal axes: frequency in Hz)

How can you best look at multiple spectra at the same time?

SLIDE 14

Spectrogram

  • Calculate a spectrum for every point in time (every frame)
  • Code the local intensity as color or grey scale

(Figure: spectrogram; horizontal axis: time)
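A minimal spectrogram sketch in NumPy: Hamming-windowed 25 ms frames every 10 ms, a 512-point FFT per frame, and the log magnitude in dB as the intensity to be coded as grey level. The parameter values are typical choices assumed here; the result can be displayed, e.g., with matplotlib's `imshow` on the transposed array.

```python
import numpy as np

def spectrogram(f, fs, width_ms=25.0, shift_ms=10.0, nfft=512):
    """Log-magnitude spectrogram in dB; rows = frames (time), columns = frequency bins."""
    frame_len = int(round(fs * width_ms / 1000))
    frame_shift = int(round(fs * shift_ms / 1000))
    num_frames = 1 + (len(f) - frame_len) // frame_shift
    window = np.hamming(frame_len)
    frames = np.stack([f[m * frame_shift: m * frame_shift + frame_len] * window
                       for m in range(num_frames)])
    spectrum = np.abs(np.fft.rfft(frames, n=nfft, axis=1))   # zero-pads 400 -> 512 samples
    return 20 * np.log10(spectrum + 1e-10)                    # small floor avoids log(0)

fs = 16000
t = np.arange(fs) / fs
S = spectrogram(np.sin(2 * np.pi * 1000 * t), fs)   # 1 s of a 1 kHz tone
```

With 512 points at 16 kHz each bin is 31.25 Hz wide, so the 1 kHz tone shows up as a horizontal line at bin 32.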

SLIDE 15

Spectrogram

http://www.wilhelm-kurz-software.de/dynaplot/applicationnotes/spectrogram.htm

(Example utterance: “To return to the main menu, press the star key.”)

SLIDE 16

Use praat to generate a Spectrogram

  • Praat: software for doing phonetics by computer
  • Written by Paul Boersma and David Weenink
  • Quite powerful: spectrograms, formants, pitch, …
  • Download: http://www.fon.hum.uva.nl/praat/

SLIDE 18

Use praat to generate a Spectrogram

a demo

SLIDE 19

Smoothing the Spectrum: filter bank

  • Idea: imitate the ear
  • Average over neighboring frequencies
  • Scale the frequencies according to the Mel or the Bark scale
  • → Reduction from 256 Fourier coefficients to 24 outputs of a filter bank

SLIDE 20

Example of a Filterbank

SLIDE 21

Filterbank

  • Spacing of center frequencies:
    – According to the mel scale: $\mathrm{Mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$
  • Low frequency cut-off:
    – E.g. 300 Hz (for telephone speech)
  • High frequency cut-off:
    – E.g. 3400 Hz (for telephone speech)
  • Different settings for, e.g., a headset connected to a PC

How can you adjust to different vocal tracts?
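A sketch of a triangular mel filterbank using the formula above, with the telephone-speech cut-offs of 300 Hz and 3400 Hz from the slide. The 8 kHz sampling rate, the 512-point FFT and the triangular filter shape are assumptions for illustration, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel) / 2595.0) - 1.0)

def mel_filterbank(num_filters=24, nfft=512, fs=8000, f_low=300.0, f_high=3400.0):
    """Triangular filters with center frequencies equally spaced on the mel scale.
    Returns a (num_filters, nfft // 2 + 1) matrix to multiply with |FFT| frames."""
    # num_filters + 2 points: lower edge, the centers, upper edge
    mel_points = np.linspace(hz_to_mel(f_low), hz_to_mel(f_high), num_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bin_freqs = np.arange(nfft // 2 + 1) * fs / nfft       # frequency of each FFT bin
    fb = np.zeros((num_filters, nfft // 2 + 1))
    for j in range(num_filters):
        left, center, right = hz_points[j], hz_points[j + 1], hz_points[j + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fb[j] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

fb = mel_filterbank()   # filter outputs: magnitude_frames @ fb.T
```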

SLIDE 22

Vocal Tract Length Normalization

  • Idea:
    – The average position of the formants depends on the length of the vocal tract
    – → vary the positions of the filter-bank center frequencies
  • A kind of speaker adaptation

SLIDE 23

Vocal Tract Length Normalization: Frequency Warping

  • Translation table for frequencies
  • Keep minimum and maximum frequency unchanged

(Figure: warping functions for warping factors from min = 0.8 to max = 1.2)
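One common way to realize such a translation table is a piecewise-linear warping function: frequencies are scaled by the warping factor up to a breakpoint, and a second linear segment then maps back so that the minimum and maximum frequency stay fixed. The breakpoint rule below (`cut`) is an assumption, not taken from the slide; `warp` is a hypothetical helper whose output would replace the filterbank's center frequencies.

```python
import numpy as np

def warp(f, alpha, f_min, f_max, cut=0.8):
    """Piecewise-linear frequency warping with warp(f_min) = f_min and warp(f_max) = f_max."""
    f = np.asarray(f, dtype=float)
    # Breakpoint chosen so that the scaled segment never overshoots f_max (assumption)
    f0 = f_min + cut * (f_max - f_min) * min(1.0, 1.0 / alpha)
    low = f_min + alpha * (f - f_min)                  # scaled part below the breakpoint
    y0 = f_min + alpha * (f0 - f_min)                  # warped value at the breakpoint
    high = y0 + (f_max - y0) * (f - f0) / (f_max - f0) # straight line back to (f_max, f_max)
    return np.where(f <= f0, low, high)

centers = np.linspace(300.0, 3400.0, 24)
warped_centers = warp(centers, alpha=1.1, f_min=300.0, f_max=3400.0)
```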

SLIDE 24

Training the Warping Factor

  • Issue: how to choose the warping factor for a specific speaker
  • Slow version:
    – Use 11 different warping factors
    – Do speech recognition with all of them
    – Pick the best one
  • Oldest approach
  • Not very efficient
  • Improvement: 10% fewer recognition errors
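The slow version is a plain grid search. In this sketch the 11 factors are assumed to be evenly spaced between 0.8 and 1.2 (the range on the previous slide), and `score_fn` is a hypothetical stand-in for running the recognizer with warped features and returning its score.

```python
import numpy as np

def pick_warping_factor(score_fn, alphas=np.linspace(0.8, 1.2, 11)):
    """Try every warping factor and return the one with the best score.
    score_fn(alpha) stands in for 'recognize with features warped by alpha and
    return e.g. the log-likelihood of the best hypothesis' (hypothetical)."""
    scores = np.array([score_fn(a) for a in alphas])
    return float(alphas[np.argmax(scores)])

# Toy stand-in for a recognizer score, peaked at alpha = 1.04 (purely illustrative)
best = pick_warping_factor(lambda a: -(a - 1.04) ** 2)
```

The cost is 11 full recognition passes per speaker, which is why this approach is slow.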

SLIDE 25

From Spectrum to Cepstrum

  • Name: swapping of letters (spectrum → cepstrum)
  • Useful as a preparation to remove channel distortions
  • Cepstral mean subtraction (CMS): method to remove channel distortions

What are examples of channel distortions?
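A minimal sketch of cepstral mean subtraction: a stationary channel acts (approximately) as a constant additive offset on every cepstral vector, so subtracting the per-utterance mean removes it. The random "cepstra" and channel offset below are synthetic data made up for the demonstration.

```python
import numpy as np

def cepstral_mean_subtraction(cepstra: np.ndarray) -> np.ndarray:
    """cepstra: (num_frames, num_coeffs). Subtract the per-utterance mean of each coefficient."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
clean = rng.standard_normal((200, 20))   # cepstra of one utterance
channel = rng.standard_normal(20)        # stationary channel: constant cepstral offset
distorted = clean + channel              # same utterance through another channel
same = np.allclose(cepstral_mean_subtraction(clean),
                   cepstral_mean_subtraction(distorted))
print(same)   # True
```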

SLIDE 26

Definition “Cepstrum”

Signal → Fourier Transform → Spectrum → log → Discrete Cosine Transform → Cepstrum

SLIDE 27

Math for Cepstrum

  • $e_n$: original signal (e.g. excitation from the glottis)
  • $f_n$: measured signal
  • $h_n$: impulse response of the channel (e.g. vocal tract, telephone, room acoustics)

$f_n = \sum_m h_m\, e_{n-m}$ (convolution)

SLIDE 28

Math for Cepstrum

  • Apply the Fourier transform $\mathcal{F}$
  • Use the convolution theorem

$\mathcal{F}\{f_n\} = \mathcal{F}\left\{\sum_m h_m\, e_{n-m}\right\}$

$\mathcal{F}\{f_n\} = \mathcal{F}\{h_n\} \cdot \mathcal{F}\{e_n\}$
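A numerical check of the convolution theorem with NumPy, using random synthetic signals as $e_n$ and $h_n$. Zero-padding both signals to the length of the linear convolution is needed because the DFT implements circular convolution.

```python
import numpy as np

rng = np.random.default_rng(1)
e = rng.standard_normal(64)      # excitation e_n
h = rng.standard_normal(16)      # channel impulse response h_n
f = np.convolve(h, e)            # f_n = sum_m h_m e_{n-m}, length 64 + 16 - 1

L = len(f)                       # DFT length >= linear convolution length
lhs = np.fft.fft(f, L)
rhs = np.fft.fft(h, L) * np.fft.fft(e, L)
print(np.allclose(lhs, rhs))     # True
```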

SLIDE 29

Math for Cepstrum

  • Apply the logarithm
  • Impulse response and excitation are now separated (additive terms)
  • If $h_n$ is stationary, its contribution can now be removed

$\log(\mathcal{F}\{f_n\}) = \log(\mathcal{F}\{h_n\}) + \log(\mathcal{F}\{e_n\})$
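The same idea after the logarithm, sketched with log magnitudes (the usual practical choice, which avoids the phase ambiguity of a complex logarithm): the channel's log spectrum is a separate additive term, so if it is known or estimated it can simply be subtracted. The 3-tap channel and the random excitation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
e = rng.standard_normal(64)               # excitation
h = np.array([1.0, 0.5, 0.25])            # short, fixed channel (no zeros on the unit circle)
L = len(e) + len(h) - 1
F_f = np.fft.fft(np.convolve(h, e), L)
F_h, F_e = np.fft.fft(h, L), np.fft.fft(e, L)

# log|F{f}| = log|F{h}| + log|F{e}|: the channel becomes an additive term
log_f = np.log(np.abs(F_f))
print(np.allclose(log_f, np.log(np.abs(F_h)) + np.log(np.abs(F_e))))   # True

# Subtracting the channel's log spectrum leaves the excitation's log spectrum
e_log_est = log_f - np.log(np.abs(F_h))
```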

SLIDE 30

Cepstrum: do discrete cosine transform after log

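  • Discrete cosine transform:

$c_m(l) = \sqrt{\frac{2}{N}} \sum_{n=1}^{N} \log\big(F_m(n)\big) \cos\left(\frac{\pi l}{N}\left(n - \frac{1}{2}\right)\right), \quad l = 1, 2, \ldots$

where $F_m(n)$ is the n-th of the N spectral (filterbank) values of frame m.

You do not need to remember this formula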
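The DCT formula in NumPy, assuming the inputs are the N positive filterbank outputs of each frame and that only the lowest `num_ceps` coefficients (l = 1, ..., 20) are kept. A flat log spectrum gives all-zero coefficients, because the cosine basis vectors for l ≥ 1 sum to zero; the function name is hypothetical.

```python
import numpy as np

def cepstrum_from_filterbank(fbank: np.ndarray, num_ceps: int = 20) -> np.ndarray:
    """c(l) = sqrt(2/N) * sum_{n=1}^{N} log(F(n)) * cos(pi*l/N * (n - 1/2)), l = 1..num_ceps.
    fbank: (num_frames, N) positive filterbank outputs."""
    N = fbank.shape[-1]
    n = np.arange(1, N + 1)                        # filter index n = 1..N
    l_idx = np.arange(1, num_ceps + 1)[:, None]    # cepstral index l = 1..num_ceps
    basis = np.cos(np.pi * l_idx / N * (n - 0.5))  # shape (num_ceps, N)
    return np.sqrt(2.0 / N) * np.log(fbank) @ basis.T

c = cepstrum_from_filterbank(np.full((3, 24), 5.0))   # flat spectrum, 3 frames
```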

SLIDE 31

Dynamic Features

  • Spectrum captures local aspects of speech
    – Window size 25 ms
  • Capture slow changes in spectrum
    – Other name: delta features

SLIDE 32

Dynamic Features

  • Capture slow changes in spectrum

SLIDE 33

Dynamic Features

  • Calculate first and second derivatives
  • Naïve approach to the first derivative:
    – Continuous function: $\frac{df(t)}{dt} \approx \frac{f(t + \Delta t) - f(t - \Delta t)}{2 \Delta t}$
    – Time-discrete sampling: $\frac{df(t_m)}{dt} \approx \frac{f(t_{m+1}) - f(t_{m-1})}{2}$

$t_m$: m-th sample of the signal
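The naive central difference applied to a sequence of feature vectors; applying it twice gives a naive second derivative. Repeating the first and last frame at the edges is an assumption, and `central_difference` is a hypothetical name.

```python
import numpy as np

def central_difference(x: np.ndarray) -> np.ndarray:
    """Naive first derivative along time: (x[m+1] - x[m-1]) / 2.
    x: (num_frames, dim); the first/last frame is repeated at the edges."""
    padded = np.concatenate([x[:1], x, x[-1:]], axis=0)
    return (padded[2:] - padded[:-2]) / 2.0

m = np.arange(10, dtype=float)[:, None]   # frame index (toy 1-D feature track)
feat = m ** 2
delta = central_difference(feat)          # ~ first derivative 2m (exact away from the edges)
delta2 = central_difference(delta)        # ~ second derivative 2
```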

SLIDE 34

Difference/Regression

(Figure: i-th component of the feature vector sampled at frames m-3, m-2, m-1, m, m+1, m+2, m+3, with the regression curve and the line through the extremes)

SLIDE 35

Regression Formula

$\frac{df(t_m)}{dt} \approx \frac{\sum_{i=1}^{M} i \,\big(f(t_{m+i}) - f(t_{m-i})\big)}{2 \sum_{i=1}^{M} i^2}$

Can you make it agree with $\frac{df(t)}{dt} \approx \frac{f(t + \Delta t) - f(t - \Delta t)}{2 \Delta t}$?
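The regression formula in NumPy, as a sketch; edge frames are repeated, which is one common convention but an assumption here. With M = 1 it reduces to the naive central difference with $\Delta t$ equal to one frame, and for a linear ramp it returns the exact slope.

```python
import numpy as np

def regression_delta(x: np.ndarray, M: int = 2) -> np.ndarray:
    """d_m = sum_{i=1}^{M} i (x_{m+i} - x_{m-i}) / (2 sum_{i=1}^{M} i^2); x: (num_frames, dim).
    Frames beyond the edges are replaced by the first/last frame (assumption)."""
    T = len(x)
    padded = np.concatenate([np.repeat(x[:1], M, axis=0), x, np.repeat(x[-1:], M, axis=0)])
    denom = 2 * sum(i * i for i in range(1, M + 1))
    # Frame m sits at index m + M in `padded`
    num = sum(i * (padded[M + i:M + i + T] - padded[M - i:M - i + T]) for i in range(1, M + 1))
    return num / denom

ramp = 3.0 * np.arange(20, dtype=float)[:, None]
d = regression_delta(ramp, M=2)            # interior frames give the slope 3
```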

SLIDE 36

Dynamic Features

  • Invented by Furui (1981)
  • Standard in any modern ASR system
  • Alternative:
    – Linear mapping of neighboring feature vectors
  • Issue:
    – Dimension of feature vectors

SLIDE 37

Linear Discriminant Analysis

  • Method to decrease the size of the feature vector
  • Maximize the separability of class regions
  • Linear transform of feature vectors
  • More: later in the lecture
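As a preview, here is the simplest two-class case (Fisher's linear discriminant), which projects feature vectors onto one direction that maximizes the ratio of between-class to within-class scatter; for C classes LDA yields up to C-1 directions. The synthetic 2-D data and the helper name are assumptions for illustration.

```python
import numpy as np

def fisher_lda_direction(x0: np.ndarray, x1: np.ndarray) -> np.ndarray:
    """Two-class Fisher LDA: w proportional to S_W^{-1} (mu1 - mu0).
    x0, x1: (num_samples, dim) feature vectors of the two classes."""
    mu0, mu1 = x0.mean(axis=0), x1.mean(axis=0)
    s_w = (x0 - mu0).T @ (x0 - mu0) + (x1 - mu1).T @ (x1 - mu1)   # within-class scatter
    w = np.linalg.solve(s_w, mu1 - mu0)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
# Classes differ along dimension 0; dimension 1 has large variance but no class information
x0 = rng.normal([0.0, 0.0], [1.0, 5.0], size=(500, 2))
x1 = rng.normal([3.0, 0.0], [1.0, 5.0], size=(500, 2))
w = fisher_lda_direction(x0, x1)
projected = np.vstack([x0, x1]) @ w        # 2-D features reduced to 1-D
```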

SLIDE 38

Complete Pipeline for Mel-Frequency Cepstral Coefficients (MFCC)

Signal
→ Sampling (16 kHz; 16-bit quantization)
→ Pre-emphasis
→ Windowing (window size: 25 ms)
→ Fast Fourier Transform (512 Fourier coefficients)
→ Absolute value
→ Mel-scaled filterbank (typically 24 filterbank values)
→ log
→ Discrete Cosine Transform (keep only the 20 lowest cepstra)
→ Dynamic features (1st and 2nd derivative; 60-dimensional vector)
→ Linear Discriminant Analysis
→ Feature vectors
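The whole pipeline can be sketched end to end in NumPy with the typical values from the slide (16 kHz, 25 ms windows, 512-point FFT, 24 filters, 20 cepstra, first and second derivatives giving 60 dimensions). Several details are assumptions: a 10 ms frame shift, a Hamming window, α = 0.95, a filterbank spanning 0 Hz to half the sampling rate, regression deltas with M = 2, and the omission of LDA (which needs labeled training data). It is a sketch, not the exact front end of any particular toolkit.

```python
import numpy as np

def mfcc_pipeline(signal, fs=16000, alpha=0.95, width_ms=25.0, shift_ms=10.0,
                  nfft=512, num_filters=24, num_ceps=20, M=2):
    """Sketch of the MFCC pipeline up to the dynamic features (LDA omitted).
    Returns an array of shape (num_frames, 3 * num_ceps)."""
    # Pre-emphasis: f'_n = f_n - alpha * f_{n-1} (first sample kept)
    f = np.append(signal[0], signal[1:] - alpha * signal[:-1])

    # Windowing: Hamming frames of width_ms every shift_ms
    flen, fshift = int(fs * width_ms / 1000), int(fs * shift_ms / 1000)
    num_frames = 1 + (len(f) - flen) // fshift
    idx = fshift * np.arange(num_frames)[:, None] + np.arange(flen)[None, :]
    frames = f[idx] * np.hamming(flen)

    # FFT (zero-padded to nfft points) and absolute value
    mag = np.abs(np.fft.rfft(frames, n=nfft, axis=1))     # (num_frames, nfft // 2 + 1)

    # Mel-scaled triangular filterbank from 0 Hz to fs / 2 (assumed range)
    def mel(hz):
        return 2595.0 * np.log10(1.0 + hz / 700.0)

    def inv_mel(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    edges = inv_mel(np.linspace(mel(0.0), mel(fs / 2), num_filters + 2))
    bins = np.arange(nfft // 2 + 1) * fs / nfft
    fb = np.array([np.maximum(0.0, np.minimum((bins - lo) / (c - lo), (hi - bins) / (hi - c)))
                   for lo, c, hi in zip(edges[:-2], edges[1:-1], edges[2:])])

    # log of the filterbank outputs (tiny floor avoids log(0))
    log_fb = np.log(mag @ fb.T + 1e-10)                    # (num_frames, num_filters)

    # DCT, keeping only the num_ceps lowest cepstra c_1 .. c_num_ceps
    n = np.arange(1, num_filters + 1)
    l_idx = np.arange(1, num_ceps + 1)[:, None]
    ceps = np.sqrt(2.0 / num_filters) * log_fb @ np.cos(np.pi * l_idx / num_filters * (n - 0.5)).T

    # Dynamic features: regression deltas over +-M frames (edge frames repeated)
    def delta(x):
        T = len(x)
        p = np.concatenate([np.repeat(x[:1], M, axis=0), x, np.repeat(x[-1:], M, axis=0)])
        num = sum(i * (p[M + i:M + i + T] - p[M - i:M - i + T]) for i in range(1, M + 1))
        return num / (2 * sum(i * i for i in range(1, M + 1)))

    d1 = delta(ceps)
    d2 = delta(d1)
    return np.hstack([ceps, d1, d2])       # 3 * num_ceps = 60 dimensions

features = mfcc_pipeline(np.random.default_rng(0).standard_normal(16000))  # 1 s of noise
```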

SLIDE 39

Alternative Feature Extraction Methods

  • LP-Cepstrum (LP = linear prediction)
    – Derived from speech coding
    – No longer much in use
  • PLP (= perceptual linear prediction)
    – Popular for certain applications
    – Claim: more noise robust than MFCCs
    – Main change: use $|\cdot|^{1/3}$ instead of log in MFCC

SLIDE 40

Summary

  • Classical “plain vanilla” feature extraction: Mel-Frequency Cepstral Coefficients
  • Main deficiency: not very noise robust
  • Used in:
    – Speech recognition
    – Speaker recognition
    – Music genre classification

SLIDE 41

3.2 Feature Extraction from Image Processing

SLIDE 42

Overview

  • Feature types:
    – Color
    – Texture
    – Edge

SLIDE 43

Image

SLIDE 44

Physics

  • It’s all electromagnetic (EM) radiation
  • Different colors correspond to radiation of different wavelengths
  • The intensity of each wavelength is specified by its amplitude
  • We perceive EM radiation within the 400-700 nm range, a tiny piece of the spectrum between infrared and ultraviolet

SLIDE 45

Visible Light

SLIDE 46

Color and Wavelength

Most light we see is not just a single wavelength, but a combination of many wavelengths (see below). This profile is often referred to as a spectrum, or spectral power distribution.

SLIDE 47

Image Representation (RGB)

SLIDE 48

Image Representation (Channels)

SLIDE 49

Image Representation

Each pixel stores a color triple (r, g, b); the image is C pixels wide and R pixels long.

SLIDE 50

Color Histogram

  • Calculate the percentage of each color present in the image
  • Deficiency: loss of regional information
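A sketch of a global color histogram for an RGB image stored as an (R, C, 3) `uint8` array; quantizing each channel into 4 levels (64 bins) is an arbitrary choice made here, and `color_histogram` is a hypothetical name. The mirrored image gets the same histogram, which illustrates the loss of regional information.

```python
import numpy as np

def color_histogram(image: np.ndarray, bins_per_channel: int = 4) -> np.ndarray:
    """image: (R, C, 3) uint8 RGB array. Quantize each channel into bins_per_channel
    levels and return the fraction of pixels in each of bins_per_channel**3 color bins."""
    q = (image.astype(np.int64) * bins_per_channel) // 256     # per-channel bin 0..b-1
    codes = (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]
    counts = np.bincount(codes.ravel(), minlength=bins_per_channel ** 3)
    return counts / codes.size

img = np.zeros((10, 20, 3), dtype=np.uint8)
img[:, :10] = [255, 0, 0]                 # left half red, right half black
h = color_histogram(img)
h_mirrored = color_histogram(img[:, ::-1])
```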

SLIDE 51

Localized Features

Compute a color histogram for each region of the image
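Localized features can be sketched by splitting the image into a grid and concatenating one histogram per cell; the 2×2 grid and the 64-bin quantization are assumptions, and both helpers are hypothetical. Unlike the global histogram, the result changes when the image is mirrored.

```python
import numpy as np

def color_histogram(image, bins_per_channel=4):
    """Fraction of pixels per quantized RGB color bin (bins_per_channel**3 bins)."""
    q = (image.astype(np.int64) * bins_per_channel) // 256
    codes = (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]
    return np.bincount(codes.ravel(), minlength=bins_per_channel ** 3) / codes.size

def localized_histograms(image, grid=(2, 2), bins_per_channel=4):
    """Split the image into a grid of regions and concatenate one color histogram per region."""
    R, C = image.shape[:2]
    row_edges = np.linspace(0, R, grid[0] + 1).astype(int)
    col_edges = np.linspace(0, C, grid[1] + 1).astype(int)
    feats = [color_histogram(image[r0:r1, c0:c1], bins_per_channel)
             for r0, r1 in zip(row_edges[:-1], row_edges[1:])
             for c0, c1 in zip(col_edges[:-1], col_edges[1:])]
    return np.concatenate(feats)

img = np.zeros((10, 20, 3), dtype=np.uint8)
img[:, :10] = [255, 0, 0]                  # left half red, right half black
f1 = localized_histograms(img)
f2 = localized_histograms(img[:, ::-1])    # mirrored image
```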

SLIDE 52

Edge Detection: Sobel Operator

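$G_x = \begin{pmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{pmatrix}, \quad G_y = \begin{pmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{pmatrix}$

$|G| = \sqrt{G_x^2 + G_y^2}, \quad \theta = \arctan(G_y / G_x)$

Apply the matrices $G_x$ and $G_y$ to every image region (each 3×3 neighborhood)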
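A direct NumPy sketch of the Sobel operator on a grey-level image. It slides $G_x$ and $G_y$ over every 3×3 region without padding (so the output shrinks by 2 pixels in each direction) and uses correlation rather than flipped convolution, which only changes the sign convention. `np.arctan2` serves as the quadrant-aware version of $\arctan(G_y/G_x)$.

```python
import numpy as np

GX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
GY = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)

def sobel(image: np.ndarray):
    """Apply G_x and G_y to every 3x3 region of a grey-level image (no padding).
    Returns gradient magnitude |G| and direction theta."""
    R, C = image.shape
    gx = np.zeros((R - 2, C - 2))
    gy = np.zeros((R - 2, C - 2))
    for dr in range(3):
        for dc in range(3):
            patch = image[dr:dr + R - 2, dc:dc + C - 2]   # all regions shifted by (dr, dc)
            gx += GX[dr, dc] * patch
            gy += GY[dr, dc] * patch
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    theta = np.arctan2(gy, gx)
    return magnitude, theta

img = np.zeros((5, 6))
img[:, 3:] = 1.0                 # vertical step edge
mag, theta = sobel(img)
```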

SLIDE 53

Texture Image Examples

  • From the VisTex Texture Database

SLIDE 54

Gabor filters

  • Gaussian window modulated with a complex sinusoid
  • Visual cortical cells have band-pass responses very similar to Gabor filters

(Figure: Gabor filters at different scales and spatial frequencies; top row shows anti-symmetric (or odd) filters, bottom row the symmetric (or even) filters)

$G(x, y) = K \exp\big(j(\omega_x x + \omega_y y)\big) \exp\left(-\frac{x^2}{2\sigma_x^2}\right) \exp\left(-\frac{y^2}{2\sigma_y^2}\right)$

You do not need to remember this formula
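A sketch of a single Gabor kernel following the formula above; the parameter names and default values (sigma_x, sigma_y, omega_x, omega_y, K, grid size) are assumptions, and `gabor_kernel` is a hypothetical name. Its real part is a symmetric (even) filter and its imaginary part an anti-symmetric (odd) filter, matching the two rows described in the figure.

```python
import numpy as np

def gabor_kernel(size=31, sigma_x=4.0, sigma_y=4.0, omega_x=0.5, omega_y=0.0, K=1.0):
    """G(x, y) = K exp(j(omega_x x + omega_y y)) exp(-x^2/(2 sigma_x^2)) exp(-y^2/(2 sigma_y^2)),
    sampled on an odd size x size grid centred at (0, 0)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    gaussian = K * np.exp(-x ** 2 / (2 * sigma_x ** 2)) * np.exp(-y ** 2 / (2 * sigma_y ** 2))
    return gaussian * np.exp(1j * (omega_x * x + omega_y * y))

g = gabor_kernel()
even_filter, odd_filter = g.real, g.imag   # cosine-phase and sine-phase parts
```

A filter bank at different scales and spatial frequencies is obtained by varying the sigmas and omegas.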

SLIDE 55

Summary

  • Main features for image recognition:
    – Color
    – Edges
    – Texture