E9 205 Machine Learning for Signal Processing: Deep Learning for Audio - PowerPoint PPT Presentation



SLIDE 1

E9 205 Machine Learning for Signal Processing

20-11-2019

Deep Learning for Audio and Vision

SLIDE 2

Speech Recognition

Noisy channel model of automatic speech recognition systems

Courtesy – Google Images

SLIDE 3

Short-term spectra are integrated in mel frequency bands, followed by log compression and DCT – mel frequency cepstral coefficients (MFCC) [Davis and Mermelstein, 1979].

Signal Modeling

Short-term spectrum (25 ms) → mel integration + log + DCT

SLIDE 4

Mel Frequency Cepstral Coefficients

MFCC processing is repeated for every short-term frame, yielding a sequence of feature vectors; typically 25 ms frames with a 10 ms hop in time.
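As a concrete illustration, the pipeline above (framing, windowing, mel integration, log compression, DCT) can be sketched in NumPy. The choices below (512-point FFT, 26 mel filters, 13 kept coefficients) are common defaults, not values fixed by the slides:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters with centers spaced uniformly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for j in range(l, c):
            fb[i - 1, j] = (j - l) / max(c - l, 1)
        for j in range(c, r):
            fb[i - 1, j] = (r - j) / max(r - c, 1)
    return fb

def mfcc(signal, sr=16000, frame_ms=25, hop_ms=10, n_filters=26, n_ceps=13):
    frame_len = int(sr * frame_ms / 1000)    # 400 samples at 16 kHz
    hop = int(sr * hop_ms / 1000)            # 160 samples
    n_fft = 512
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    fb = mel_filterbank(n_filters, n_fft, sr)
    n = np.arange(n_filters)
    feats = []
    for t in range(n_frames):
        frame = signal[t * hop: t * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame, n_fft)) ** 2
        logmel = np.log(fb @ power + 1e-10)  # mel integration + log
        # DCT-II decorrelates the log-mel energies; keep the first n_ceps.
        ceps = np.array([np.sum(logmel * np.cos(np.pi * k * (n + 0.5) / n_filters))
                         for k in range(n_ceps)])
        feats.append(ceps)
    return np.array(feats)                   # shape: (n_frames, n_ceps)
```

With a one-second 16 kHz signal this yields 98 frames of 13 coefficients each, matching the 25 ms / 10 ms framing on the slide.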

SLIDE 5
  • Map the features to phone classes, using phone-labelled data.

Speech Recognition

Example: the phone sequence /w/ /ʌ/ /n/ (the word "one"), modeled as triphone classes.

  • Classical machine learning: train a classifier on speech training data that maps each feature vector to its target phoneme class.
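This frame-to-phone mapping can be illustrated with a minimal softmax classifier trained by gradient descent. The data here are synthetic stand-ins for MFCC frames and phone labels, not real speech:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: 13-dim "MFCC frames", 4 phone classes (hypothetical data).
n_frames, n_dims, n_phones = 600, 13, 4
labels = rng.integers(0, n_phones, n_frames)
means = rng.normal(0, 3, (n_phones, n_dims))       # one mean per phone class
X = means[labels] + rng.normal(0, 1, (n_frames, n_dims))

# Softmax (multinomial logistic) classifier, cross-entropy loss.
W = np.zeros((n_dims, n_phones))
b = np.zeros(n_phones)
onehot = np.eye(n_phones)[labels]
for _ in range(200):
    logits = X @ W + b
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    grad = (p - onehot) / n_frames                 # d(loss)/d(logits)
    W -= 0.5 * (X.T @ grad)
    b -= 0.5 * grad.sum(axis=0)

acc = np.mean((X @ W + b).argmax(axis=1) == labels)
print(f"frame accuracy: {acc:.2f}")
```

On this well-separated toy data the classifier reaches high frame accuracy; real phone classification works frame by frame in the same way, only with harder, overlapping classes.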

SLIDE 6

Back to Speech Recognition

Mapping Speech Features to Phonemes

SLIDE 7

Back to Speech Recognition

Mapping Speech Features to Phonemes to Words

Phonemes → language model [dictionary of words, pronunciation model, word syntax] → decoded text

SLIDE 8

State of Progress

2018: 5.3% word error rate

Claims of human parity using BLSTM-based models!

SLIDE 9

Moving to End-to-End

End-to-end: audio features → text output
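One widely used way for an end-to-end model to emit text directly from audio features is CTC decoding (not named on the slide; added here only as an illustration). A sketch of the greedy collapsing rule, using made-up per-frame posteriors:

```python
import numpy as np

BLANK = 0  # index of the CTC blank symbol (an assumption for this sketch)

def ctc_greedy_decode(log_probs, id_to_char):
    """Greedy CTC decoding: take the best symbol per frame, collapse
    repeats, then drop blanks (the standard CTC collapsing rule)."""
    best = log_probs.argmax(axis=1)          # (T,) best symbol per frame
    out, prev = [], None
    for s in best:
        if s != prev and s != BLANK:
            out.append(id_to_char[s])
        prev = s
    return "".join(out)

# Toy log-posteriors over {blank, 'c', 'a', 't'} for 7 frames.
chars = {1: "c", 2: "a", 3: "t"}
probs = np.log(np.array([
    [0.1, 0.8, 0.05, 0.05],   # c
    [0.1, 0.8, 0.05, 0.05],   # c (repeat, collapsed)
    [0.9, 0.05, 0.03, 0.02],  # blank
    [0.1, 0.05, 0.8, 0.05],   # a
    [0.9, 0.05, 0.03, 0.02],  # blank
    [0.1, 0.05, 0.05, 0.8],   # t
    [0.1, 0.05, 0.05, 0.8],   # t (repeat, collapsed)
]))
print(ctc_greedy_decode(probs, chars))  # → cat
```

The blank symbol is what lets the network output text shorter than the frame sequence without an explicit frame-to-character alignment.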

SLIDE 10

Image Processing

SLIDE 11

Visual Geometry Group (VGG) Network

SLIDE 12

ImageNet Task

Roughly 1000 images in each of 1000 categories; in all, about 1.2 million training images, 50,000 validation images, and 150,000 test images. ImageNet consists of variable-resolution images, so the images are down-sampled to a fixed resolution of 224×224.

SLIDE 13

Can we go deeper?

SLIDE 14

SLIDE 15

Residual Blocks
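A residual block computes y = relu(F(x) + x), where F is a small stack of weight layers and the identity shortcut carries the input past them. A minimal fully-connected sketch (the convolutions, batch norm, and exact activation placement of ResNet are omitted):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """y = relu(F(x) + x): the block learns a residual F(x) on top of an
    identity skip connection, so very deep stacks remain trainable."""
    h = relu(x @ W1)        # first weight layer + nonlinearity
    f = h @ W2              # second weight layer (the residual branch)
    return relu(f + x)      # add the identity shortcut, then activate

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(4, d))

# With zero weights the residual branch vanishes and the block reduces to
# (a rectified) identity -- the property that eases optimization of depth.
W_zero = np.zeros((d, d))
y = residual_block(x, W_zero, W_zero)
assert np.allclose(y, relu(x))
```

Because the shortcut makes the identity mapping trivially representable, adding more blocks cannot easily hurt the training fit, which is what allows the very deep networks on the following slides.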

SLIDE 16

Deep Networks with Residual Blocks

SLIDE 17

Deep Networks with Residual Blocks

SLIDE 18

Results with ResNet

SLIDE 19

Image Segmentation

SLIDE 20

SLIDE 21

The Problem of Segmentation

SLIDE 22

SegNet Architecture

SLIDE 23

Results from SegNet

SLIDE 24

SLIDE 25

U-Net

SLIDE 26

Summary of the Course

SLIDE 27

Distribution Pie Chart

Generative Modeling and Dimensionality Reduction (55%) vs. Discriminative Modeling (45%)

SLIDE 28

Generative Modeling and Dimensionality Reduction

Share per topic: 15%, 15%, 8%, 31%, 15%, 15%.

Topics: Feature Processing, PCA/LDA, Gaussian and GMM, NMF, Linear and Logistic Regression, kernel methods.

SLIDE 29

Discriminative Modeling

Share per topic: 11%, 17%, 6%, 6%, 6%, 6%, 11%, 11%, 11%, 17%.

Topics: SVM, Neural Networks, Improving Learning, Improving Generalization, Deep Networks, Conv. Networks, RNNs, Understanding DNNs, Deep Generative Modeling, Applications.

SLIDE 30

When we started …

SLIDE 31

Dates of Various Rituals

❖ 5 assignments spread over 3 months (roughly one assignment every two weeks).
❖ September 1st week: project topic announcements.
❖ September 3rd week: 1st midterm.
❖ September 4th week: project topic and team finalization and proposal submission [1- and 2-person teams].
❖ October 1st week: project proposal.
❖ October 3rd week: 2nd midterm.
❖ November 1st week: project midterm presentations.
❖ December 1st week: final exams.
❖ December 2nd week: project final presentations.

SLIDE 32

Content Delivery

In class and beyond class: theory and mathematical foundation, intuition and analysis, implementation and understanding.