3. Feature Extraction

3.1 Feature Extraction from Speech
… or other types of audio, like music
See Schukat-Talamazzini, Chapter 3

Goal of Feature Extraction
• Capture essential information about speech
• Be robust against background noise
• Steps:
  – Sampling and quantization
  – Short time analysis
  – Transform to frequency space
  – Filtering
  – Optimize class separability

Overview Feature Extraction
• Convert the continuous speech signal into a sequence of vectors
• Each window gives one vector
• The following slides give the details of this procedure
Figure from: HTK manual

Sampling and Quantization
What happens when you store a signal in a computer?
• Measure the signal periodically and store each value in a variable
• Sampling period: $T$ (sampling rate: $1/T$)
• Quantization: use $B$ bits to represent the signal ⇒ $2^B$ possible values
• $f_n$: sampled values of the signal, numbered using index $n$
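
The two steps on this slide can be sketched in a few lines of NumPy. This is only an illustration: the function name `sample_and_quantize`, the 440 Hz test tone, the 16 kHz rate, and the uniform quantizer over the range [-1, 1) are assumptions, not part of the slides.

```python
import numpy as np

def sample_and_quantize(signal_fn, duration_s, T, B):
    """Sample signal_fn every T seconds and quantize each sample to B bits.

    Assumption: signal_fn returns values in the range [-1, 1].
    """
    n = np.arange(int(round(duration_s / T)))   # sample index n
    f_n = signal_fn(n * T)                      # sampled values f_n
    levels = 2 ** B                             # 2^B possible values
    # Map [-1, 1) onto the integer codes 0 .. 2^B - 1.
    codes = np.floor((f_n + 1.0) / 2.0 * levels).astype(int)
    codes = np.clip(codes, 0, levels - 1)
    # Reconstruct each code at the centre of its quantization step.
    f_q = (codes + 0.5) / levels * 2.0 - 1.0
    return n, codes, f_q

n, codes, f_q = sample_and_quantize(
    lambda t: np.sin(2 * np.pi * 440.0 * t), duration_s=0.01, T=1 / 16000, B=8
)
```

With B = 8 the reconstruction error stays within half a quantization step, which is 1/256 for this value range.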

Sampling Theorem
• Reconstruction of the original signal is only possible if the signal's highest frequency is limited
• Let $f_G$ be the frequency limit: $f_G < \frac{1}{2T}$
• Else: spectral aliasing, that is, frequencies will be confused
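
A small numerical illustration of aliasing, assuming an 8 kHz sampling rate (so the limit 1/(2T) is 4000 Hz) and two test tones chosen for this example: a 7000 Hz cosine sampled at this rate produces exactly the same samples as a 1000 Hz cosine.

```python
import numpy as np

T = 1 / 8000.0          # sampling period; the frequency limit 1/(2T) is 4000 Hz
n = np.arange(64)       # sample indices

high = np.cos(2 * np.pi * 7000.0 * n * T)   # 7000 Hz lies above the limit
low = np.cos(2 * np.pi * 1000.0 * n * T)    # 1000 Hz lies below the limit

# After sampling, the two tones can no longer be told apart.
aliased = np.allclose(high, low)
print(aliased)
```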

Pre-emphasis
• Correct for the filtering of the lips
• Boosts higher frequencies
• Iterative scheme: $f'_n = f_n - \alpha f_{n-1}$
• Typical value: $\alpha = 0.95$
What does it do for $\alpha = 1$?
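
A minimal sketch of the pre-emphasis filter, assuming the coefficient is called `alpha` and that the first sample is passed through unchanged (the slide does not say how the first sample is handled).

```python
import numpy as np

def pre_emphasis(f, alpha=0.95):
    """Return f'_n = f_n - alpha * f_{n-1}; the first sample is kept as it is."""
    f = np.asarray(f, dtype=float)
    out = f.copy()
    out[1:] = f[1:] - alpha * f[:-1]
    return out

# With alpha = 1 the filter is a plain first difference,
# so a constant signal vanishes after the first sample.
print(pre_emphasis(np.ones(5), alpha=1.0))
```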

From Signal to Spectrum: Fourier Transform
• Definition: $F^{(m)}(e^{i\omega}) = \sum_{n=-\infty}^{\infty} f_n \, w_{n-m} \, e^{-i\omega n}$
  – $w_n$: window function
  – $\omega$: frequency times $2\pi$
  – $i$: imaginary unit
• The window cuts the sum down to a finite number of terms
• Complex exponentials are easier to handle than cos or sin functions

Example: applying a rectangular window to a speech signal
[Figure: speech signal with overlapping analysis frames]
• Frame shift: typ. 10 ms
• Frame width: typ. 25 ms
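
Cutting a signal into overlapping frames with the typical values from this slide can be sketched as follows. The function name `frame_signal`, the 16 kHz sampling rate, and the choice to drop an incomplete last frame are assumptions made for this example.

```python
import numpy as np

def frame_signal(f, sample_rate, width_s=0.025, shift_s=0.010):
    """Cut f into overlapping frames; returns shape (num_frames, frame_len)."""
    f = np.asarray(f, dtype=float)
    frame_len = int(round(width_s * sample_rate))     # 25 ms -> 400 samples at 16 kHz
    frame_shift = int(round(shift_s * sample_rate))   # 10 ms -> 160 samples at 16 kHz
    if len(f) < frame_len:
        return np.empty((0, frame_len))
    # Only complete frames are kept; trailing samples are dropped.
    num_frames = 1 + (len(f) - frame_len) // frame_shift
    starts = frame_shift * np.arange(num_frames)
    return np.stack([f[s:s + frame_len] for s in starts])

frames = frame_signal(np.arange(16000), sample_rate=16000)
print(frames.shape)   # one second of signal gives 98 frames of 400 samples
```

Taking the frames unchanged corresponds to a rectangular window; other windows multiply each frame by a weight vector.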

Fourier Transform in Practice
• Use the “Fast Fourier Transform” (FFT)
• The basic (radix-2) algorithm requires the number of samples $N$ to be a power of 2 (e.g. $N = 256$)
• Code available
• Complexity: $N \log(N)$

Established Window Functions
• Use to get sharper peaks
• Defined for $n = 0 \ldots N-1$; window functions vanish outside this interval
• Rectangular window: $w_n^R = 1$
• Generalized Hamming window: $w_n^H = \beta - (1-\beta)\cos\left(\frac{2\pi n}{N-1}\right)$ ($\beta = 0.54$: standard Hamming window)
• Gauss window: $w_n^G = \exp\left(-0.5\left(3\,\frac{n - N/2}{N/2}\right)^2\right)$
• Parabola window: $w_n^P = 4\,\frac{n}{N}\left(1 - \frac{n}{N}\right)$
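
The four windows can be written down directly in NumPy. The formulas in the comments are one reading of this slide, whose layout did not survive extraction. In particular the Gauss window (factor 3, centred at N/2), the N-1 denominator of the Hamming window, and the parameter name `beta` are assumptions.

```python
import numpy as np

def rectangular(N):
    # w_n = 1
    return np.ones(N)

def generalized_hamming(N, beta=0.54):
    # w_n = beta - (1 - beta) * cos(2 pi n / (N - 1))
    n = np.arange(N)
    return beta - (1.0 - beta) * np.cos(2.0 * np.pi * n / (N - 1))

def gauss(N):
    # w_n = exp(-0.5 * (3 * (n - N/2) / (N/2))^2)   (assumed form)
    n = np.arange(N)
    return np.exp(-0.5 * (3.0 * (n - N / 2.0) / (N / 2.0)) ** 2)

def parabola(N):
    # w_n = 4 * (n / N) * (1 - n / N)
    n = np.arange(N)
    return 4.0 * (n / N) * (1.0 - n / N)

# With beta = 0.54 this matches NumPy's built-in Hamming window.
print(np.allclose(generalized_hamming(256), np.hamming(256)))
```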

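Rewrite of Fourier Transform
• Definition: $F^{(m)}(e^{i\omega}) = \sum_{n=-\infty}^{\infty} f_n \, w_{n-m} \, e^{-i\omega n}$
• Window functions vanish outside the interval $n = 0 \ldots N-1$
• Define $\omega_\nu = \frac{2\pi\nu}{N}$
• Shifting the summation index by $m$ and dropping the phase factor $e^{-i\omega_\nu m}$, which does not change the magnitude:
  $F_\nu^{(m)} = \sum_{n=0}^{N-1} f_{m+n} \, w_n \, e^{-2\pi i \nu n / N}$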
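
The finite sum over one window is what an FFT routine computes. The sketch below evaluates the sum directly, at a cost of N² operations, and compares it with `np.fft.fft`, which needs on the order of N log N. The reading F_nu^(m) = sum over n of f_{m+n} w_n exp(-2 pi i nu n / N), the random test signal, and the frame start m = 100 are assumptions for this example.

```python
import numpy as np

def short_time_dft(f, m, w):
    """Direct evaluation of sum_{n=0}^{N-1} f[m+n] * w[n] * exp(-2 pi i nu n / N)."""
    N = len(w)
    n = np.arange(N)
    nu = n.reshape(-1, 1)                 # one row per frequency index nu
    frame = f[m:m + N] * w                # windowed frame starting at sample m
    return np.exp(-2j * np.pi * nu * n / N) @ frame

rng = np.random.default_rng(0)
f = rng.standard_normal(1000)
w = np.hamming(256)                       # N = 256, a power of 2

direct = short_time_dft(f, m=100, w=w)
fast = np.fft.fft(f[100:356] * w)
print(np.allclose(direct, fast))
```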

Example for “ö”
[Figure: short time spectrum and smoothed spectrum, each plotted over frequency (Hz)]
How can you best look at multiple spectra at the same time?

Spectrogram
• Calculate a spectrum for each point in time
• Code the local intensity as color or grey scale
[Figure: spectrogram with time axis]
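
A spectrogram is a stack of short time spectra, one per frame. The sketch below assumes a Hamming window, 25 ms frames with 10 ms shift, zero-padding to 512 points, and intensity in dB; none of these choices is prescribed by the slide, and the 1000 Hz test tone is made up for the example.

```python
import numpy as np

def spectrogram(f, sample_rate, width_s=0.025, shift_s=0.010, n_fft=512):
    """Return log-magnitude spectra of shape (num_frames, n_fft // 2 + 1)."""
    frame_len = int(round(width_s * sample_rate))
    frame_shift = int(round(shift_s * sample_rate))
    window = np.hamming(frame_len)
    starts = range(0, len(f) - frame_len + 1, frame_shift)
    frames = np.stack([f[s:s + frame_len] * window for s in starts])
    # rfft zero-pads each frame to n_fft samples and keeps non-negative frequencies.
    magnitude = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    return 20.0 * np.log10(magnitude + 1e-10)   # intensity in dB

sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000.0 * t)           # one second of a 1000 Hz tone

S = spectrogram(tone, sample_rate)
peak_bin = int(np.argmax(S.mean(axis=0)))
peak_hz = peak_bin * sample_rate / 512
```

Each row of `S` is one time step and each column one frequency bin, so the array can be passed to any image plotting routine to obtain the color or grey scale picture.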

Spectrogram
[Figure: spectrogram of the utterance "To return to the main menu, press the star key".]
Source: http://www.wilhelm-kurz-software.de/dynaplot/applicationnotes/spectrogram.htm

Use Praat to generate a Spectrogram
• Praat: software for doing phonetics by computer
• Written by: Paul Boersma and David Weenink
• Quite powerful: spectrograms, formants, pitch, …
• Download: http://www.fon.hum.uva.nl/praat/

Use Praat to generate a Spectrogram: a demo

Smoothing the Spectrum: filter bank
• Idea: imitate the ear
• Average over neighboring frequencies
• Scale the frequencies according to the Mel or the Bark scale
⇒ Reduction from 256 Fourier coefficients to 24 outputs of a filter bank

Example of a Filterbank

Filterbank
• Spacing of center frequencies:
  – According to the mel scale: $\mathrm{Mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$ ($f$ in Hz)
• Low frequency cut-off:
  – E.g. 300 Hz (for telephone speech)
• High frequency cut-off:
  – E.g. 3400 Hz (for telephone speech)
• Different settings for e.g. a headset connected to a PC
How can you adjust to different vocal tracts?
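
A sketch of a mel-spaced filter bank using the formula and the telephone cut-offs from this slide. The triangular filter shape, the 8 kHz sampling rate, the 512-point FFT, and all function names are assumptions; the slides only fix the mel formula, the cut-off frequencies, and the number of outputs.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(num_filters=24, n_fft=512, sample_rate=8000,
                   f_low=300.0, f_high=3400.0):
    """Triangular filters (assumed shape) with centres equally spaced in mel.

    Returns a matrix of shape (num_filters, n_fft // 2 + 1).
    """
    # num_filters + 2 edge points: each filter spans three consecutive points.
    edges_hz = mel_to_hz(
        np.linspace(hz_to_mel(f_low), hz_to_mel(f_high), num_filters + 2)
    )
    bin_hz = np.arange(n_fft // 2 + 1) * sample_rate / n_fft
    bank = np.zeros((num_filters, len(bin_hz)))
    for j in range(num_filters):
        left, centre, right = edges_hz[j], edges_hz[j + 1], edges_hz[j + 2]
        rising = (bin_hz - left) / (centre - left)
        falling = (right - bin_hz) / (right - centre)
        bank[j] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

bank = mel_filterbank()
# Usage: filterbank_values = magnitude_spectrum @ bank.T
```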

Vocal Tract Length Normalization
• Idea:
  – Average position of formants depends on the length of the vocal tract
  ⇒ Vary the position of the filter bank frequencies
• A kind of speaker adaptation

Vocal Tract Length Normalization: Frequency Warping
• Translation table for frequencies
• Keep minimum and maximum frequency unchanged
• Warping factor ranges from a minimum of 0.8 to a maximum of 1.2

Training the Warping Factor
• Issue: how to scale for a specific speaker
• Slow version:
  – Use 11 different warping factors
  – Do speech recognition with all of them
  – Pick the best one
• Oldest approach
• Not very efficient
• Improvement: 10% fewer recognition errors
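
The warping and the slow search over 11 factors can be sketched as below. Everything specific here is an assumption: the piecewise-linear warping curve with a knee at 75% of the band, the function names, and `toy_score`, which stands in for a full recognition run that the slides do not specify.

```python
import numpy as np

def warp_frequency(f, factor, f_min=300.0, f_max=3400.0, knee=0.75):
    """Piecewise-linear warp (assumed shape) that keeps f_min and f_max fixed."""
    f_knee = f_min + knee * (f_max - f_min)
    x = [f_min, f_knee, f_max]
    y = [f_min, f_min + factor * (f_knee - f_min), f_max]
    return np.interp(f, x, y)

def pick_warping_factor(score_fn, factors):
    """Try every factor and return the one with the highest score."""
    return float(max(factors, key=score_fn))

factors = np.linspace(0.8, 1.2, 11)      # 11 factors between 0.8 and 1.2

# Hypothetical stand-in for "do speech recognition and measure how well it worked".
def toy_score(factor):
    return -abs(factor - 1.08)

best = pick_warping_factor(toy_score, factors)
```

In a real system the score would come from running the recognizer once per factor, which is why this approach is slow.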

From Spectrum to Cepstrum
• Name: swapping of letters (spectrum/cepstrum)
• Useful as a preparation to remove channel distortions
• Cepstral mean subtraction (CMS): method to remove channel distortions
What are examples of channel distortions?

Definition “Cepstrum”
Signal → Fourier Transform → Spectrum → log → Discrete Cosine Transform → Cepstrum

Math for Cepstrum
• $e_n$: original signal (e.g. excitation from the glottis)
• $f_n$: measured signal
• $h_n$: impulse response of the channel (e.g. vocal tract, telephone, room acoustics)
• Convolution: $f_m = \sum_n h_{m-n} \, e_n$

Math for Cepstrum
• Apply Fourier transform: $\mathcal{F}\{f_m\} = \mathcal{F}\left\{\sum_n h_{m-n} \, e_n\right\}$
• Use convolution theorem: $\mathcal{F}\{f_n\} = \mathcal{F}\{h_n\} \cdot \mathcal{F}\{e_n\}$

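Math for Cepstrum
• Apply logarithm (to the magnitudes): $\log|\mathcal{F}\{f_n\}| = \log|\mathcal{F}\{h_n\}| + \log|\mathcal{F}\{e_n\}|$
• Impulse response and excitation are now separated
• If the impulse response $h_n$ is stationary, its contribution is a constant offset that can now be removed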
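
Why a stationary channel can be removed after the logarithm is easy to check numerically: the channel becomes an additive constant per frequency, and subtracting the mean over time cancels it. The random magnitudes and array sizes below are made-up test data, and `mean_subtract` is a name chosen for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
num_frames, num_bins = 50, 24

# Magnitude spectra of the excitation, one row per frame (made-up data).
excitation = rng.uniform(0.5, 2.0, size=(num_frames, num_bins))
# Magnitude response of a stationary channel: the same for every frame.
channel = rng.uniform(0.1, 10.0, size=num_bins)

def mean_subtract(log_spec):
    """Subtract the mean over time from every frame."""
    return log_spec - log_spec.mean(axis=0, keepdims=True)

clean = mean_subtract(np.log(excitation))
distorted = mean_subtract(np.log(excitation * channel))

# The channel has no effect on the mean-subtracted log spectra.
print(np.allclose(clean, distorted))
```

Because the discrete cosine transform is linear, the same cancellation holds when the mean is subtracted from the cepstral coefficients instead, which is what cepstral mean subtraction does.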

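Cepstrum: do discrete cosine transform after log
• Discrete cosine transform: $c_n^{(m)} = \sqrt{\frac{2}{N}} \sum_{l=1}^{N} \log\left(F_l^{(m)}\right) \cos\left(\frac{\pi n \,(l - 1/2)}{N}\right), \quad n = 1, 2, \ldots$
  – $F_l^{(m)}$: output of filter bank channel $l$ for frame $m$; here $N$ is the number of channels
You do not need to remember this formula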
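
A sketch of the cosine transform applied to log filter bank outputs. The prefactor sqrt(2/N) follows a common convention and is an assumption, since the normalization on the slide did not survive extraction; the function name `cepstrum` and the default of 20 coefficients are also choices made for this example.

```python
import numpy as np

def cepstrum(filterbank_values, num_cepstra=20):
    """c_n = sqrt(2/N) * sum_{l=1}^{N} log(F_l) * cos(pi * n * (l - 1/2) / N).

    filterbank_values: positive values, shape (N,) or (num_frames, N).
    Returns the coefficients n = 1 .. num_cepstra.
    """
    F = np.asarray(filterbank_values, dtype=float)
    N = F.shape[-1]
    l = np.arange(1, N + 1)
    n = np.arange(1, num_cepstra + 1).reshape(-1, 1)
    # Assumed normalization sqrt(2/N); basis has shape (num_cepstra, N).
    basis = np.sqrt(2.0 / N) * np.cos(np.pi * n * (l - 0.5) / N)
    return np.log(F) @ basis.T

flat = np.full(24, 3.0)          # a completely flat spectrum
c = cepstrum(flat)
print(np.allclose(c, 0.0))       # a flat spectrum has no cepstral structure
```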

Dynamic Features
• Spectrum captures local aspects of speech
  – Window size 25 ms
• Capture slow changes in spectrum
• Other name: delta features

Dynamic Features
• Capture slow changes in spectrum

Dynamic Features
• Calculate first and second derivatives
• Naïve approach to first derivative
  – Continuous function: $\frac{df(t)}{dt} \approx \frac{f(t+\Delta t) - f(t-\Delta t)}{2\Delta t}$
  – Time discrete sampling: $\frac{df(t_m)}{dt} \approx \frac{f(t_{m+1}) - f(t_{m-1})}{2\Delta t}$
  – $t_m$: time of the $m$-th sample of the signal

Difference/Regression
[Figure: i-th component of the feature vector plotted over the samples m-3 … m+3, comparing the line through the extremes with the regression curve]

Regression Formula
$\frac{df(t_m)}{dt} \approx \frac{\sum_{i=1}^{M} i \left( f(t_{m+i}) - f(t_{m-i}) \right)}{2 \sum_{i=1}^{M} i^2}$
Can you make it agree with $\frac{df(t)}{dt} \approx \frac{f(t+\Delta t) - f(t-\Delta t)}{2\Delta t}$?
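
The regression formula translates into a short loop over the neighbor distance i. In this sketch the time step is one frame, and the first and last frame are repeated at the edges; the edge handling and the function name `delta` are assumptions not stated on the slides.

```python
import numpy as np

def delta(features, M=2):
    """Regression slope over +/-M frames; features has shape (num_frames, dim)."""
    x = np.asarray(features, dtype=float)
    T = len(x)
    # Repeat the first and last frame M times so every frame has M neighbors.
    padded = np.pad(x, ((M, M), (0, 0)), mode="edge")
    numerator = np.zeros_like(x)
    for i in range(1, M + 1):
        # f(t_{m+i}) - f(t_{m-i}) for all frames m at once.
        numerator += i * (padded[M + i:M + i + T] - padded[M - i:M - i + T])
    return numerator / (2.0 * sum(i * i for i in range(1, M + 1)))

rng = np.random.default_rng(0)
x = rng.standard_normal((10, 3))

d1 = delta(x, M=1)
central = (x[2:] - x[:-2]) / 2.0
print(np.allclose(d1[1:-1], central))   # M = 1 is the simple central difference
```

Second derivatives can be obtained by applying `delta` to its own output. Stacking 20 static coefficients with their first and second derivatives gives the 60-dimensional vector of the complete pipeline.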

Dynamic Features
• Invented by Furui 1981
• Standard in any modern ASR system
• Alternative:
  – Linear mapping of neighboring feature vectors
• Issue:
  – Dimension of feature vectors

Linear Discriminant Analysis
• Method to decrease the size of the feature vector
• Maximize separability of class regions
• Linear transform of feature vectors
• More: later in the lecture

Complete Pipeline for Mel-Frequency Cepstral Coefficients (MFCC)
Steps from signal to feature vectors, with typical values:
• Sampling: 16 kHz; 16 bit quantization
• Pre-emphasis
• Windowing: window size 25 ms
• Fast Fourier Transform: 512 Fourier coefficients
• Absolute value
• Mel-scaled filterbank: 24 filterbank values
• log
• Discrete Cosine Transform: keep only the 20 lowest cepstra
• Dynamic features (1st and 2nd derivative): 60-dimensional vector
• Linear Discriminant Analysis

Alternative Feature Extraction Methods
• LP-Cepstrum (LP = linear prediction)
  – Derived from speech coding
  – No longer much in use
• PLP (= perceptual linear prediction)
  – Popular for certain applications
  – Claim: more noise robust than MFCCs
  – Main change: use $|\cdot|^{1/3}$ instead of log in MFCC

Summary
• Classical “plain vanilla” feature extraction: Mel-Frequency Cepstral Coefficients
• Main deficiency: not very noise robust
• Used in
  – Speech recognition
  – Speaker recognition
  – Music genre classification

3.2 Feature Extraction from Image Processing

Overview
• Feature types:
  – Color
  – Texture
  – Edge

Image

Physics
• It’s all electromagnetic (EM) radiation
• Different colors correspond to radiation of different wavelengths
• Intensity of each wavelength is specified by its amplitude
• We perceive EM radiation within the 400–700 nm range, a tiny piece of the spectrum between infrared and ultraviolet

Visible Light

Color and Wavelength
Most light we see is not just a single wavelength, but a combination of many wavelengths. This profile is often referred to as a spectrum, or spectral power distribution.

Image Representation (RGB)

Image Representation (Channels)

Image Representation
• Image is C pixels wide and R pixels long
• Each pixel holds an (r,g,b) value

Color Histogram
• Calculate the percentage of each color present in the image
• Deficiency: loss of regional information
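
A color histogram over an R × C × 3 image array can be sketched as follows. The quantization into 4 levels per channel (64 colors in total), the 8-bit value range, and the function name `color_histogram` are assumptions made for this example.

```python
import numpy as np

def color_histogram(image, bins_per_channel=4):
    """image: uint8 array of shape (R, C, 3). Returns fractions that sum to 1."""
    img = np.asarray(image)
    # Quantize each channel from 0..255 down to bins_per_channel levels.
    q = (img.astype(np.int64) * bins_per_channel) // 256
    # Combine the three channel levels into one bin index per pixel.
    index = (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]
    counts = np.bincount(index.ravel(), minlength=bins_per_channel ** 3)
    return counts / counts.sum()

red, blue = (255, 0, 0), (0, 0, 255)
top_red = np.array([[red, red], [blue, blue]], dtype=np.uint8)
top_blue = np.array([[blue, blue], [red, red]], dtype=np.uint8)

h1 = color_histogram(top_red)
h2 = color_histogram(top_blue)
# Both images give the same histogram although the regions are swapped.
print(np.array_equal(h1, h2))
```

The two test images differ only in where the colors are located, which illustrates the loss of regional information.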
