3. Feature Extraction


  1. 3. Feature Extraction

  2. 3.1 Feature Extraction from Speech … or other types of audio like music. See Schukat-Talamazzini, Chapter 3.

  3. Goal of Feature Extraction • Capture essential information about speech • Be robust against background noise • Steps: • Sampling and quantization • Short-time analysis • Transform to frequency space • Filtering • Optimize class separability

  4. Overview Feature Extraction • Convert the continuous speech signal into a sequence of vectors • Each window gives one vector • The following slides give the details of this procedure • Figure from: HTK manual

  5. Sampling and Quantization • What happens when you store a signal in a computer? • Measure the signal periodically and store it in a variable • Sampling period: $T$ (sampling rate $1/T$) • Quantization: use $B$ bits to represent the signal → $2^B$ possible values • $f_n$: sampled values of the signal, numbered using index $n$

  6. Sampling Theorem • Reconstruction of the original signal is only possible if the signal's highest frequency is limited • Let $f_G$ be the frequency limit: $f_G \le \frac{1}{2T}$ • Else: spectral aliasing, that is, frequencies will be confused
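
The aliasing effect named on the sampling theorem slide can be checked numerically. The sketch below is an illustration only: the 8 kHz sampling rate and the two tone frequencies are assumed values, and `sample_cosine` is a hypothetical helper. It samples one tone below and one tone above the limit $1/(2T)$ and compares the stored values.

```python
import math

def sample_cosine(freq_hz, sample_rate_hz, num_samples):
    """Measure cos(2*pi*freq*t) every T = 1/sample_rate seconds."""
    T = 1.0 / sample_rate_hz
    return [math.cos(2.0 * math.pi * freq_hz * n * T) for n in range(num_samples)]

sample_rate = 8000.0  # assumed sampling rate 1/T, so the limit 1/(2T) is 4 kHz
below_limit = sample_cosine(1000.0, sample_rate, 16)  # 1 kHz, below the limit
above_limit = sample_cosine(7000.0, sample_rate, 16)  # 7 kHz, above the limit

# Largest difference between the two sampled sequences.
max_diff = max(abs(a - b) for a, b in zip(below_limit, above_limit))
print(max_diff)
```

The printed difference is at the level of floating-point rounding: once sampled, the 7 kHz tone cannot be told apart from the 1 kHz tone, which is what "frequencies will be confused" means.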

  7. Pre-emphasis • Corrects for the filtering of the lips • Boosts higher frequencies • Iterative scheme: $f'_n = f_n - \alpha f_{n-1}$ • Typical value: $\alpha = 0.95$ • What does it do for $\alpha = 1$?
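
A minimal sketch of the pre-emphasis scheme $f'_n = f_n - \alpha f_{n-1}$ follows. The function name `pre_emphasis`, the parameter name `alpha`, and the choice to pass the first sample through unchanged are assumptions made for this illustration; the slide does not specify how the first sample is handled.

```python
def pre_emphasis(signal, alpha=0.95):
    """Apply f'_n = f_n - alpha * f_{n-1} to a list of samples.

    Assumption: the first sample has no predecessor and is kept as is.
    """
    if not signal:
        return []
    out = [signal[0]]
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out

constant = [1.0, 1.0, 1.0, 1.0]
typical = pre_emphasis(constant, alpha=0.95)     # slow (constant) part is strongly damped
difference = pre_emphasis(constant, alpha=1.0)   # alpha = 1: plain first difference
```

With `alpha=1.0` the scheme reduces to a first difference, so a constant signal maps to zero after the first sample.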

  8. From Signal to Spectrum: Fourier Transform • Definition: $F^{(m)}(e^{i\omega}) = \sum_{n} f_n \, w_{m-n} \, e^{-i\omega n}$ • $w_n$: window function • $\omega$: frequency times $2\pi$ • $i$: imaginary unit • The window cuts the sum to a finite number of values • Complex exponentials are easier than cos or sin functions

  9. Example: applying a rectangular window to a speech signal • Frame shift: typ. 10 ms • Frame width: typ. 25 ms

  10. Fourier Transform in Practice • Use the “Fast Fourier Transform” (FFT) • Requires the number of samples $N$ to be a power of 2 (e.g. $N = 256$) • Code available • Complexity: $N \log(N)$

  11. Established Window Functions ($n = 0 \dots N-1$) • Use to get sharper peaks • Rectangular window: $w^R_n = 1$ • Generalized Hamming window: $w^H_n = \alpha - (1 - \alpha) \cos\left(\frac{2\pi n}{N-1}\right)$ ($\alpha = 0.54$: standard Hamming window) • Gauss window: $w^G_n = \exp\left(-0.5 \left(\frac{n - N/2}{\sigma N/2}\right)^2\right)$ • Parabola window: $w^P_n = 4 \, \frac{n}{N} \left(1 - \frac{n}{N}\right)$ • Window functions vanish outside this interval

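  12. Rewrite of Fourier Transform • Definition: $F^{(m)}(e^{i\omega}) = \sum_{n} f_n \, w_{m-n} \, e^{-i\omega n}$ • Window functions vanish outside the interval $n = 0 \dots N-1$ • Define $\omega = \frac{2\pi \nu}{N}$ and count samples from the start of frame $m$: $F^{(m)}_\nu = \sum_{n=0}^{N-1} f_{m+n} \, w_n \, e^{-2\pi i \nu n / N}$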
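
The windowing and transform steps can be sketched in a few lines. This is an illustration under stated assumptions, not the lecture's code: the helper names (`hamming`, `fft`, `short_time_spectra`), the 1 kHz test tone, and the frame settings (256 samples wide, shifted by 160 samples at an assumed 16 kHz sampling rate) are chosen for this example. The recursive radix-2 FFT also shows why the number of samples must be a power of 2.

```python
import cmath
import math

def hamming(N, alpha=0.54):
    """Generalized Hamming window for n = 0..N-1 (alpha = 0.54: standard Hamming)."""
    return [alpha - (1.0 - alpha) * math.cos(2.0 * math.pi * n / (N - 1))
            for n in range(N)]

def fft(x):
    """Recursive radix-2 FFT; the length of x must be a power of 2."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    if N == 0 or N % 2 != 0:
        raise ValueError("length must be a power of 2")
    even = fft(x[0::2])  # transform of the even-indexed samples
    odd = fft(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * N
    for k in range(N // 2):
        twiddle = cmath.exp(-2j * math.pi * k / N) * odd[k]
        out[k] = even[k] + twiddle
        out[k + N // 2] = even[k] - twiddle
    return out

def short_time_spectra(signal, frame_width, frame_shift):
    """Magnitude spectrum of every windowed frame: one vector per frame."""
    window = hamming(frame_width)
    spectra = []
    for start in range(0, len(signal) - frame_width + 1, frame_shift):
        frame = [signal[start + n] * window[n] for n in range(frame_width)]
        spectra.append([abs(c) for c in fft(frame)])
    return spectra

# Hypothetical test signal: 0.1 s of a 1 kHz sine sampled at 16 kHz.
sample_rate = 16000
signal = [math.sin(2.0 * math.pi * 1000.0 * n / sample_rate) for n in range(1600)]
spectra = short_time_spectra(signal, frame_width=256, frame_shift=160)

# Strongest frequency bin in the lower half of the first frame's spectrum.
peak_bin = max(range(129), key=lambda k: spectra[0][k])
peak_hz = peak_bin * sample_rate / 256
```

Stacking the vectors in `spectra` along the time axis and coding each magnitude as a grey value gives a spectrogram. A 25 ms window at 16 kHz has 400 samples, which is not a power of 2; padding the frame with zeros up to 512 samples is a common workaround and is not shown here.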

  13. Example for “ö” • Figure: short-time spectrum and smoothed spectrum, each plotted over frequency (Hz) • How can you best look at multiple spectra at the same time?

  14. Spectrogram • Calculate a spectrum for every point in time • Code the local intensity as color or grey scale • Horizontal axis: time

  15. Spectrogram • "To return to the main menu, press the star key" • Source: http://www.wilhelm-kurz-software.de/dynaplot/applicationnotes/spectrogram.htm

  16. Use Praat to generate a Spectrogram • Praat: software for doing phonetics by computer • Written by Paul Boersma and David Weenink • Quite powerful: spectrograms, formants, pitch, … • Download: http://www.fon.hum.uva.nl/praat/

  18. Use Praat to generate a Spectrogram: a demo

  19. Smoothing the Spectrum: Filter Bank • Idea: imitate the ear • Average over neighboring frequencies • Scale the frequencies according to the Mel or the Bark scale → Reduction from 256 Fourier coefficients to 24 outputs of a filter bank

  20. Example of a Filterbank

  21. Filterbank • Spacing of center frequencies: – According to the mel scale: $f_{\mathrm{Mel}}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$, with $f$ in Hz • Low-frequency cut-off: – E.g. 300 Hz (for telephone speech) • High-frequency cut-off: – E.g. 3400 Hz (for telephone speech) • Different settings for e.g. a headset connected to a PC • How can you adjust to different vocal tracts?
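
The mel formula and the spacing of center frequencies can be tried out directly. In this sketch the function names and the layout of the centers (equally spaced on the mel scale strictly between the two cut-offs) are assumptions for illustration. The cut-offs of 300 Hz and 3400 Hz are the telephone-speech values from the slide, and 24 is the number of filter bank outputs mentioned earlier.

```python
import math

def hz_to_mel(f_hz):
    """Mel scale: 2595 * log10(1 + f / 700), with f in Hz."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def filterbank_centers(low_hz, high_hz, num_filters):
    """Center frequencies in Hz, equally spaced on the mel scale.

    Assumption: the centers lie strictly between the two cut-off frequencies.
    """
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (num_filters + 1)
    return [mel_to_hz(low_mel + (i + 1) * step) for i in range(num_filters)]

centers = filterbank_centers(300.0, 3400.0, 24)
spacings = [b - a for a, b in zip(centers, centers[1:])]
```

The spacing between neighboring centers grows with frequency, so low frequencies are resolved more finely than high ones.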

  22. Vocal Tract Length Normalization • Idea: • Average position of formants depends on the length of the vocal tract → vary the position of the frequencies of the filter bank • A kind of speaker adaptation

  23. Vocal Tract Length Normalization: Frequency Warping • Translation table for frequencies • Keep minimum and maximum frequency unchanged • Warping factor from $\alpha_{\min} = 0.8$ to $\alpha_{\max} = 1.2$

  24. Training the Warping Factor • Issue: how to scale for a specific speaker • Slow version: • Use 11 different warping factors • Do speech recognition with all of them • Pick the best one • Oldest approach • Not very efficient • Improvement: 10% fewer recognition errors

  25. From Spectrum to Cepstrum • Name: swapping of letters (spectrum/cepstrum) • Useful as a preparation to remove channel distortions • What are examples of channel distortions? • Cepstral mean subtraction (CMS): method to remove channel distortions

  26. Definition “Cepstrum”: Signal → Fourier Transform → Spectrum → log → Discrete Cosine Transform → Cepstrum

  27. Math for Cepstrum • $e_n$: original signal (e.g. excitation from the glottis) • $f_n$: measured signal • $h_n$: impulse response of channel (e.g. vocal tract, telephone, room acoustics) • $f_n = \sum_{m} h_m \, e_{n-m}$

  28. Math for Cepstrum • Apply Fourier transform: $\mathcal{F}\{f_n\} = \mathcal{F}\left\{\sum_{m} h_m \, e_{n-m}\right\}$ • Use convolution theorem: $\mathcal{F}\{f_n\} = \mathcal{F}\{h_n\} \cdot \mathcal{F}\{e_n\}$

  29. Math for Cepstrum • Apply logarithm: $\log(\mathcal{F}\{f_n\}) = \log(\mathcal{F}\{h_n\}) + \log(\mathcal{F}\{e_n\})$ • Impulse response and excitation are now separated • If it is stationary, the part due to the impulse response $h_n$ can now be removed
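
Because the logarithm turns the channel into an additive term, a stationary channel contributes the same offset to every frame, and subtracting the mean over time removes it. The sketch below illustrates this idea behind cepstral mean subtraction. The function name and the toy feature values are made up for the example, and the "channel" is modeled simply as a constant offset per feature dimension.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-dimension mean over all frames from every feature vector."""
    num_frames = len(frames)
    dim = len(frames[0])
    mean = [sum(frame[d] for frame in frames) / num_frames for d in range(dim)]
    return [[frame[d] - mean[d] for d in range(dim)] for frame in frames]

# Toy log-domain features: 3 frames, 2 dimensions (made-up numbers).
clean = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]

# Stationary channel: the same additive offset in every frame.
channel = [0.5, -1.5]
distorted = [[x + c for x, c in zip(frame, channel)] for frame in clean]

normalized_clean = cepstral_mean_subtraction(clean)
normalized_distorted = cepstral_mean_subtraction(distorted)
```

After mean subtraction the clean and the distorted features coincide, so the constant channel term no longer affects them. The same argument holds after the discrete cosine transform, since that transform is linear.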

  30. Cepstrum: do discrete cosine transform after log • Discrete cosine transform: $c^{(m)}_l = \sqrt{\frac{2}{N}} \sum_{n=1}^{N} \log\left(F^{(m)}_n\right) \cos\left(\frac{\pi \, l \, (n - 1/2)}{N}\right), \quad l = 1, 2, \dots$ • You do not need to remember this formula
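
The discrete cosine transform step can be sketched as follows. The function name `cepstrum`, the $\sqrt{2/N}$ normalization (the convention used in the HTK manual), and the toy inputs are assumptions for this illustration. The input is one frame of log filter bank outputs, and the output holds the coefficients for $l = 1, 2, \dots$

```python
import math

def cepstrum(log_filterbank, num_cepstra):
    """DCT of the log filter bank outputs of one frame.

    c_l = sqrt(2/N) * sum_{n=1..N} log_filterbank[n-1] * cos(pi * l * (n - 1/2) / N)
    for l = 1..num_cepstra. The sqrt(2/N) factor is an assumed normalization.
    """
    N = len(log_filterbank)
    scale = math.sqrt(2.0 / N)
    return [
        scale * sum(log_filterbank[n - 1] * math.cos(math.pi * l * (n - 0.5) / N)
                    for n in range(1, N + 1))
        for l in range(1, num_cepstra + 1)
    ]

# A flat log spectrum has no structure: all coefficients with l >= 1 vanish.
flat = cepstrum([3.0] * 24, 20)

# A log spectrum shaped like the l = 1 cosine excites only the first coefficient.
ripple = cepstrum([math.cos(math.pi * (n - 0.5) / 24) for n in range(1, 25)], 3)
```

Keeping only the lowest coefficients therefore keeps the smooth envelope of the log spectrum and drops fine detail.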

  31. Dynamic Features • Spectrum captures local aspects of speech • Window size: 25 ms • Capture slow changes in spectrum • Other name: delta features

  32. Dynamic Features • Capture slow changes in spectrum

  33. Dynamic Features • Calculate first and second derivatives • Naïve approach to first derivative – Continuous function: $\frac{df(t)}{dt} \approx \frac{f(t + \Delta t) - f(t - \Delta t)}{2 \Delta t}$ – Time-discrete sampling: $\frac{df(t_m)}{dt} \approx \frac{f(t_{m+1}) - f(t_{m-1})}{2 \Delta t}$ • $t_m$: time of the $m$-th sample of the signal

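  34. Difference/Regression • Figure: $i$-th component of the feature vector plotted over samples $m-3 \dots m+3$, comparing the line through the extremes with the regression curve

  35. Regression Formula • $\frac{df(t_m)}{dt} \approx \frac{\sum_{i=1}^{M} i \, \left(f(t_{m+i}) - f(t_{m-i})\right)}{2 \sum_{i=1}^{M} i^2}$ • Can you make it agree with $\frac{df(t)}{dt} \approx \frac{f(t + \Delta t) - f(t - \Delta t)}{2 \Delta t}$?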
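
The regression formula for delta features can be sketched for one feature component as below. The function name `delta`, the default `M=2`, and the edge handling (frames beyond the ends are replaced by the first or last frame) are assumptions for this illustration. Time is measured in frames, so $\Delta t = 1$.

```python
def delta(features, M=2):
    """Regression-based time derivative of one feature component.

    d_m = sum_{i=1..M} i * (f[m+i] - f[m-i]) / (2 * sum_{i=1..M} i^2)
    Assumption: indices outside the sequence are clamped to the first/last frame.
    """
    T = len(features)
    denom = 2.0 * sum(i * i for i in range(1, M + 1))
    out = []
    for m in range(T):
        num = 0.0
        for i in range(1, M + 1):
            ahead = features[min(m + i, T - 1)]
            behind = features[max(m - i, 0)]
            num += i * (ahead - behind)
        out.append(num / denom)
    return out

ramp = [3.0 * m for m in range(10)]       # constant slope of 3 per frame
slopes = delta(ramp, M=2)

squares = [1.0, 4.0, 9.0, 16.0, 25.0]
central = delta(squares, M=1)             # M = 1: plain central difference
```

For `M=1` the formula reduces to `(f[m+1] - f[m-1]) / 2`, which is the naive central difference with a time step of one frame. Applying `delta` to its own output gives the second derivative.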

  36. Dynamic Features • Invented by Furui, 1981 • Standard in any modern ASR system • Alternative: • Linear mapping of neighboring feature vectors • Issue: • Dimension of feature vectors

  37. Linear Discriminant Analysis • Method to decrease the size of the feature vector • Maximize separability of class regions • Linear transform of feature vectors • More: later in the lecture

  38. Complete Pipeline for Mel-Frequency Cepstral Coefficients (MFCC), with typical values: Signal → Sampling (16 kHz; 16-bit quantization) → Pre-emphasis → Windowing (window size: 25 ms) → Fast Fourier Transform (512 Fourier coefficients) → Absolute Value → Mel-scaled Filterbank (24 filterbank values) → log → Discrete Cosine Transform (keep only the 20 lowest cepstra) → Dynamic Features (1st and 2nd derivative; 60-dimensional vector) → Linear Discriminant Analysis → Feature Vectors

  39. Alternative Feature Extraction Methods • LP-Cepstrum (LP = linear prediction) • Derived from speech coding • No longer much in use • PLP (= perceptual linear prediction) • Popular for certain applications • Claim: more noise robust than MFCCs • Main change: use $|\cdot|^{1/3}$ instead of log in MFCC

  40. Summary • Classical “plain vanilla” feature extraction: Mel-Frequency Cepstral Coefficients • Main deficiency: not very noise robust • Used in • Speech recognition • Speaker recognition • Music genre classification

  41. 3.2 Feature Extraction from Image Processing

  42. Overview • Feature types: • Color • Texture • Edge

  43. Image

  44. Physics • It’s all electromagnetic (EM) radiation • Different colors correspond to radiation of different wavelengths • Intensity of each wavelength specified by amplitude • We perceive EM radiation within the 400-700 nm range, a tiny piece of the spectrum between infrared and ultraviolet

  45. Visible Light

  46. Color and Wavelength • Most light we see is not just a single wavelength, but a combination of many wavelengths (see below). This profile is often referred to as a spectrum, or spectral power distribution.

  47. Image Representation (RGB)

  48. Image Representation (Channels)

  49. Image Representation • $C$ pixels wide • $R$ pixels long • Each pixel: $(r, g, b)$

  50. Color Histogram • Calculate the percentage of each color present in the image • Deficiency: loss of regional information
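
A color histogram over $(r, g, b)$ pixels can be sketched as below. The function name, the quantization into 4 bins per channel, and the tiny example image are assumptions for this illustration. An $R \times C$ image is assumed to be flattened into a plain list of pixels first.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Fraction of pixels falling into each quantized (r, g, b) bin.

    pixels: list of (r, g, b) tuples with values 0..255.
    Assumption: bins_per_channel divides 256 evenly.
    """
    width = 256 // bins_per_channel
    counts = {}
    for r, g, b in pixels:
        key = (r // width, g // width, b // width)
        counts[key] = counts.get(key, 0) + 1
    total = len(pixels)
    return {key: count / total for key, count in counts.items()}

# Made-up 4-pixel image: three reddish pixels and one blue pixel.
image = [(255, 0, 0), (255, 0, 0), (250, 10, 5), (0, 0, 255)]
histogram = color_histogram(image)

# Rearranging the pixels changes the picture but not the histogram.
rearranged = color_histogram(list(reversed(image)))
```

The rearranged image yields the same histogram, which illustrates the deficiency named on the slide: the histogram records how much of each color is present but not where it occurs.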
