E9 205 Machine Learning for Signal Processing: Deep Learning for Audio - PowerPoint PPT Presentation



SLIDE 1

E9 205 Machine Learning for Signal Processing

20-11-2019

Deep Learning for Audio and Vision

SLIDE 2

Speech Recognition

Noisy channel model of automatic speech recognition systems

Courtesy – Google Images

SLIDE 3

Short-term spectra are integrated in mel frequency bands, followed by log compression and DCT – mel frequency cepstral coefficients (MFCC) [Davis and Mermelstein, 1979].

Signal Modeling

Short-term spectrum (25 ms) → mel integration + log + DCT

SLIDE 4

Mel Frequency Cepstral Coefficients

MFCC processing is repeated for every short-term frame, yielding a sequence of feature vectors; typically 25 ms frames with a 10 ms hop in time.
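As a concrete illustration, the pipeline above (framing, windowing, mel integration, log compression, DCT) can be sketched in NumPy. The choices below (512-point FFT, 26 mel filters, 13 kept coefficients) are common defaults, not values fixed by the slides:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters with centers spaced uniformly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for j in range(l, c):
            fb[i - 1, j] = (j - l) / max(c - l, 1)
        for j in range(c, r):
            fb[i - 1, j] = (r - j) / max(r - c, 1)
    return fb

def mfcc(signal, sr=16000, frame_ms=25, hop_ms=10, n_filters=26, n_ceps=13):
    frame_len = int(sr * frame_ms / 1000)    # 400 samples at 16 kHz
    hop = int(sr * hop_ms / 1000)            # 160 samples
    n_fft = 512
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    fb = mel_filterbank(n_filters, n_fft, sr)
    n = np.arange(n_filters)
    feats = []
    for t in range(n_frames):
        frame = signal[t * hop: t * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame, n_fft)) ** 2
        logmel = np.log(fb @ power + 1e-10)  # mel integration + log
        # DCT-II decorrelates the log-mel energies; keep the first n_ceps.
        ceps = np.array([np.sum(logmel * np.cos(np.pi * k * (n + 0.5) / n_filters))
                         for k in range(n_ceps)])
        feats.append(ceps)
    return np.array(feats)                   # shape: (n_frames, n_ceps)
```

With a one-second 16 kHz signal this yields 98 frames of 13 coefficients each, matching the 25 ms / 10 ms framing on the slide.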

SLIDE 5
  • Map the features to phone classes, using phone-labelled data.

Speech Recognition

Example: the phone sequence /w/ /ʌ/ /n/ (the word "one"), modeled as triphone classes.

  • Classical machine learning: train a classifier on speech training data that maps each feature vector to its target phoneme class.
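This frame-to-phone mapping can be illustrated with a minimal softmax classifier trained by gradient descent. The data here are synthetic stand-ins for MFCC frames and phone labels, not real speech:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: 13-dim "MFCC frames", 4 phone classes (hypothetical data).
n_frames, n_dims, n_phones = 600, 13, 4
labels = rng.integers(0, n_phones, n_frames)
means = rng.normal(0, 3, (n_phones, n_dims))       # one mean per phone class
X = means[labels] + rng.normal(0, 1, (n_frames, n_dims))

# Softmax (multinomial logistic) classifier, cross-entropy loss.
W = np.zeros((n_dims, n_phones))
b = np.zeros(n_phones)
onehot = np.eye(n_phones)[labels]
for _ in range(200):
    logits = X @ W + b
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    grad = (p - onehot) / n_frames                 # d(loss)/d(logits)
    W -= 0.5 * (X.T @ grad)
    b -= 0.5 * grad.sum(axis=0)

acc = np.mean((X @ W + b).argmax(axis=1) == labels)
print(f"frame accuracy: {acc:.2f}")
```

On this well-separated toy data the classifier reaches high frame accuracy; real phone classification works frame by frame in the same way, only with harder, overlapping classes.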

SLIDE 6

Back to Speech Recognition

Mapping Speech Features to Phonemes

SLIDE 7

Back to Speech Recognition

Mapping Speech Features to Phonemes to Words

Phonemes → language model [dictionary of words, pronunciation model, word syntax] → decoded text

SLIDE 8

State of Progress

2018: 5.3% word error rate

Claims of human parity using BLSTM-based models!

SLIDE 9

Moving to End-to-End

End-to-end: audio features → text output
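One widely used way for an end-to-end model to emit text directly from audio features is CTC decoding (not named on the slide; added here only as an illustration). A sketch of the greedy collapsing rule, using made-up per-frame posteriors:

```python
import numpy as np

BLANK = 0  # index of the CTC blank symbol (an assumption for this sketch)

def ctc_greedy_decode(log_probs, id_to_char):
    """Greedy CTC decoding: take the best symbol per frame, collapse
    repeats, then drop blanks (the standard CTC collapsing rule)."""
    best = log_probs.argmax(axis=1)          # (T,) best symbol per frame
    out, prev = [], None
    for s in best:
        if s != prev and s != BLANK:
            out.append(id_to_char[s])
        prev = s
    return "".join(out)

# Toy log-posteriors over {blank, 'c', 'a', 't'} for 7 frames.
chars = {1: "c", 2: "a", 3: "t"}
probs = np.log(np.array([
    [0.1, 0.8, 0.05, 0.05],   # c
    [0.1, 0.8, 0.05, 0.05],   # c (repeat, collapsed)
    [0.9, 0.05, 0.03, 0.02],  # blank
    [0.1, 0.05, 0.8, 0.05],   # a
    [0.9, 0.05, 0.03, 0.02],  # blank
    [0.1, 0.05, 0.05, 0.8],   # t
    [0.1, 0.05, 0.05, 0.8],   # t (repeat, collapsed)
]))
print(ctc_greedy_decode(probs, chars))  # → cat
```

The blank symbol is what lets the network output text shorter than the frame sequence without an explicit frame-to-character alignment.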

SLIDE 10

Image Processing

SLIDE 11

Visual Geometry Group (VGG) Network

SLIDE 12

ImageNet Task

Roughly 1000 images in each of 1000 categories; in all, about 1.2 million training images, 50,000 validation images, and 150,000 test images. ImageNet consists of variable-resolution images, so the images are down-sampled to a fixed resolution of 224×224.

SLIDE 13

Can we go deeper?

SLIDE 14

SLIDE 15

Residual Blocks
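A residual block computes y = relu(F(x) + x), where F is a small stack of weight layers and the identity shortcut carries the input past them. A minimal fully-connected sketch (the convolutions, batch norm, and exact activation placement of ResNet are omitted):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """y = relu(F(x) + x): the block learns a residual F(x) on top of an
    identity skip connection, so very deep stacks remain trainable."""
    h = relu(x @ W1)        # first weight layer + nonlinearity
    f = h @ W2              # second weight layer (the residual branch)
    return relu(f + x)      # add the identity shortcut, then activate

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(4, d))

# With zero weights the residual branch vanishes and the block reduces to
# (a rectified) identity -- the property that eases optimization of depth.
W_zero = np.zeros((d, d))
y = residual_block(x, W_zero, W_zero)
assert np.allclose(y, relu(x))
```

Because the shortcut makes the identity mapping trivially representable, adding more blocks cannot easily hurt the training fit, which is what allows the very deep networks on the following slides.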

SLIDE 16

Deep Networks with Residual Blocks

SLIDE 17

Deep Networks with Residual Blocks

SLIDE 18

Results with ResNet

SLIDE 19

Image Segmentation

SLIDE 20

SLIDE 21

The Problem of Segmentation

SLIDE 22

SegNet Architecture

SLIDE 23

Results from SegNet

SLIDE 24

SLIDE 25

U-Net

SLIDE 26

Summary of the Course

SLIDE 27

Distribution Pie Chart

Generative Modeling and Dimensionality Reduction (55%) vs. Discriminative Modeling (45%)

SLIDE 28

Generative Modeling and Dimensionality Reduction

Share per topic: 15%, 15%, 8%, 31%, 15%, 15%.

Topics: Feature Processing, PCA/LDA, Gaussian and GMM, NMF, Linear and Logistic Regression, kernel methods.

SLIDE 29

Discriminative Modeling

Share per topic: 11%, 17%, 6%, 6%, 6%, 6%, 11%, 11%, 11%, 17%.

Topics: SVM, Neural Networks, Improving Learning, Improving Generalization, Deep Networks, Conv. Networks, RNNs, Understanding DNNs, Deep Generative Modeling, Applications.

SLIDE 30

When we started …

SLIDE 31

Dates of Various Rituals

❖ 5 assignments spread over 3 months (roughly one assignment every two weeks).
❖ September 1st week: project topic announcements.
❖ September 3rd week: 1st midterm.
❖ September 4th week: project topic and team finalization and proposal submission [1- and 2-person teams].
❖ October 1st week: project proposal.
❖ October 3rd week: 2nd midterm.
❖ November 1st week: project midterm presentations.
❖ December 1st week: final exams.
❖ December 2nd week: project final presentations.

SLIDE 32

Content Delivery

In class and beyond class: theory and mathematical foundation, intuition and analysis, implementation and understanding.