Deep Convolution Neural Networks for Dialect Classification of Spectrogram Images



SLIDE 1

www.intelligentvoice.com

Deep Convolution Neural Networks for Dialect Classification of Spectrogram Images

Nigel Cannings

Chase Information Technology Services Limited

SLIDE 2

Convolution Networks: Brief History

  • Inspired by receptive fields in the visual cortex
  • Notable implementations:
    • Fukushima’s NeoCognitron (1980)
    • Explicit parallel implementations (1988)
    • LeCun’s LeNet-5 (1998)
    • Ciresan’s GPU implementation (2011)
    • GoogLeNet (2014)

Fukushima, Kunihiko, ‘Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position,’ Biological Cybernetics 36 (4): 193-202, 1980

LeNet 5 (1998), image source: http://yann.lecun.com/exdb/lenet/

SLIDE 3

Deep Learning

  • Sigmoidal activation functions have now been largely replaced with rectified linear units (ReLU)
  • The ‘vanishing error’ problem (Hochreiter, 1991) is largely avoided with ReLU, whose gradient is 1 for positive inputs
  • Now we can do ‘deep’ learning, i.e. train networks with more than 2 hidden layers
  • This discovery and GPU computing have resulted in much recent activity in the neural network community
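
The contrast between sigmoid and ReLU gradients can be illustrated with a toy calculation. This is a minimal sketch, not taken from the slides: it assumes a chain of units with all weights fixed at 1 and the same pre-activation value at every layer (both hypothetical simplifications), so that only the activation function's contribution to the back-propagated gradient is visible.

```python
import numpy as np

def sigmoid_grad(x):
    """Derivative of the logistic sigmoid; it never exceeds 0.25."""
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def backprop_gain(activation_grad, depth, pre_activation=0.5):
    """Product of activation derivatives along a chain of `depth` units.

    Assumption for illustration only: every weight is 1 and every unit
    sees the same pre-activation value.
    """
    return activation_grad(pre_activation) ** depth

for depth in (2, 5, 10, 20):
    print(depth, backprop_gain(sigmoid_grad, depth), backprop_gain(relu_grad, depth))
```

With the sigmoid the factor shrinks geometrically with depth, while with ReLU it stays at 1 for active units. Units with negative input still pass no gradient, so ReLU reduces the problem rather than removing every source of it.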

SLIDE 4

GoogLeNet

  • State-of-the-art winner of the ImageNet 2014 competition: classifying 1.2M images into 1K classes
  • Convolution neural network inspired by LeCun’s LeNet-5
  • Has 9 ‘Inception’ modules, each with multiple convolution sizes and pooling
  • Stochastic Gradient Descent is used to train the network, with ‘dropout’, which helps prevent overfitting

Szegedy et al., ‘Going deeper with convolutions,’ arXiv, 2014
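
The two training ingredients named above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the presenter's code: it uses the common "inverted" dropout formulation (survivors are rescaled at training time), and the function names, drop probability and learning rate are hypothetical.

```python
import numpy as np

def dropout(activations, drop_prob, rng, training=True):
    """Inverted dropout: zero each activation with probability `drop_prob`
    and rescale the survivors so the expected value is unchanged."""
    if not training or drop_prob == 0.0:
        return activations
    keep_prob = 1.0 - drop_prob
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

def sgd_step(weights, grad, learning_rate):
    """One plain stochastic gradient descent update."""
    return weights - learning_rate * grad

rng = np.random.default_rng(0)
hidden = np.ones((4, 8))
print(dropout(hidden, drop_prob=0.4, rng=rng))
```

Because a different random subset of units is silenced on every mini-batch, the network cannot rely on any single unit, which is the usual explanation for why dropout reduces overfitting.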

SLIDE 5

GoogLeNet Structure

The topology is built from ‘Inception’ modules, each consisting of:

  • Convolutions: filters for extracting features; filter size tends to be small in the early layers, bigger in later layers
  • Pooling: dimensionality reduction
  • Softmax loss for predicting classes at 3 progressive stages of the network
  • Other: concatenations for combining convolutions

‘Rinse and Repeat’ 9 times
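
The parallel-branch structure of one Inception module can be sketched as follows. This is a simplified NumPy illustration with random weights, assuming the four-branch layout described in the GoogLeNet paper (1x1, 1x1 then 3x3, 1x1 then 5x5, and 3x3 max-pool then 1x1, concatenated along the channel axis). The channel counts and helper names are hypothetical and much smaller than in the real network.

```python
import numpy as np

def conv2d_same(x, w):
    """Naive stride-1 'same' convolution. x: (C_in, H, W); w: (C_out, C_in, k, k), k odd."""
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, h, wd = x.shape
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Accumulate the contribution of kernel tap (i, j) over all input channels.
            out += np.einsum('oc,chw->ohw', w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool3_same(x):
    """3x3 max pooling with stride 1, padded so the spatial size is kept."""
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)), constant_values=-np.inf)
    _, h, wd = x.shape
    windows = [xp[:, i:i + h, j:j + wd] for i in range(3) for j in range(3)]
    return np.max(windows, axis=0)

def inception(x, rng, n1=8, n3_reduce=4, n3=8, n5_reduce=2, n5=4, n_pool=4):
    """One Inception-style module with random (untrained) weights."""
    c = x.shape[0]

    def weights(c_out, c_in, k):
        return rng.normal(scale=0.1, size=(c_out, c_in, k, k))

    branch1 = relu(conv2d_same(x, weights(n1, c, 1)))
    branch3 = relu(conv2d_same(relu(conv2d_same(x, weights(n3_reduce, c, 1))),
                               weights(n3, n3_reduce, 3)))
    branch5 = relu(conv2d_same(relu(conv2d_same(x, weights(n5_reduce, c, 1))),
                               weights(n5, n5_reduce, 5)))
    branch_pool = relu(conv2d_same(max_pool3_same(x), weights(n_pool, c, 1)))
    # Concatenate the branch outputs along the channel axis.
    return np.concatenate([branch1, branch3, branch5, branch_pool], axis=0)

rng = np.random.default_rng(0)
features = inception(rng.normal(size=(3, 16, 16)), rng)
print(features.shape)
```

Every branch keeps the spatial size, so the outputs can be stacked channel-wise; the output channel count is the sum of the four branch widths.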

SLIDE 6

NIST LRE Competition

  • 6 language clusters, 20 dialects:
    • Arabic (Egyptian, Iraqi, Levantine, Maghrebi, Modern Standard)
    • Chinese (Cantonese, Mandarin, Min, Wu)
    • English (British, General American, Indian)
    • French (West African, Haitian Creole)
    • Iberian (Caribbean Spanish, European Spanish, Latin American Spanish, Brazilian Portuguese)
    • Slavic (Polish, Russian)
  • 500+ hours of speech data
  • Data set very unbalanced

2015 NIST Language Recognition Evaluation, http://www.nist.gov/itl/iad/lre15.cfm

SLIDE 7

Spectrogram Convolution Network

  • Based on NVIDIA’s DIGITS implementation of GoogLeNet
  • Converted speech to 256x256 pixel spectrograms
  • Tried different spectral representations and coding…

[Diagram labels: SoX, Python, MATLAB RASTA, RASTA 12]
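
The speech-to-image step can be sketched roughly as below. The slides name SoX, Python and MATLAB RASTA but do not give the actual processing, so everything here is an assumption for illustration: a plain log-magnitude short-time Fourier transform, hypothetical frame and hop sizes, nearest-neighbour resampling to 256x256, and an 8 kHz test tone standing in for speech.

```python
import numpy as np

def log_spectrogram(signal, frame_len=512, hop=128):
    """Log-magnitude STFT with a Hann window. Returns an array of shape (freq, time)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return 20.0 * np.log10(magnitude + 1e-10).T

def to_image(spec, size=256):
    """Resample to size x size by nearest neighbour and scale to 8-bit grey levels."""
    rows = np.linspace(0, spec.shape[0] - 1, size).round().astype(int)
    cols = np.linspace(0, spec.shape[1] - 1, size).round().astype(int)
    resized = spec[np.ix_(rows, cols)]
    lo, hi = resized.min(), resized.max()
    scaled = (resized - lo) / (hi - lo) if hi > lo else np.zeros_like(resized)
    # Flip vertically so that low frequencies sit at the bottom of the image.
    return (255 * scaled).astype(np.uint8)[::-1]

sample_rate = 8000                      # assumed sample rate
t = np.arange(2 * sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)      # synthetic stand-in for a speech clip
image = to_image(log_spectrogram(tone))
print(image.shape, image.dtype)
```

Each resulting array can be written out as a greyscale image and fed to an image classifier in the same way as a photograph.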

SLIDE 8

GoogLeNet Processing

SLIDE 9

GoogLeNet Processing

SLIDE 10

GoogLeNet Processing

Database:

  • 501248 spectrograms for training
  • 24352 spectrograms for validation
  • 51501 spectrograms for testing

SLIDE 11

GoogLeNet Processing

Database:

  • 501248 spectrograms for training
  • 24352 spectrograms for validation
  • 51501 spectrograms for testing

Apply convolutions to extract primitives such as edges

SLIDE 12

GoogLeNet Processing

Database:

  • 501248 spectrograms for training
  • 24352 spectrograms for validation
  • 51501 spectrograms for testing

Apply convolutions to extract primitives such as edges

Object parts extracted

SLIDE 13

GoogLeNet Processing

Database:

  • 501248 spectrograms for training
  • 24352 spectrograms for validation
  • 51501 spectrograms for testing

Apply convolutions to extract primitives such as edges

Object parts extracted

Full spectral features, e.g. phones, words

SLIDE 14

GoogLeNet Processing

Database:

  • 501248 spectrograms for training
  • 24352 spectrograms for validation
  • 51501 spectrograms for testing

Apply convolutions to extract primitives such as edges

Object parts extracted

Full spectral features, e.g. phones, words

Refinement of accuracy

SLIDE 15

GoogLeNet Processing

Database:

  • 501248 spectrograms for training
  • 24352 spectrograms for validation
  • 51501 spectrograms for testing

Apply convolutions to extract primitives such as edges

Object parts extracted

Full spectral features, e.g. phones, words

Refinement of accuracy

Dialect Classification: Loss1, Loss2, Loss3
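
The three losses can be combined into a single training objective along the following lines. This sketch assumes the convention of the GoogLeNet paper, where the two intermediate (auxiliary) softmax classifiers are added to the final one with a weight of 0.3 during training. Treating Loss1 and Loss2 as the auxiliary heads and Loss3 as the final classifier is an assumption about the slide, and the function names are hypothetical.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy of softmax(logits) against integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def combined_training_loss(aux1_logits, aux2_logits, main_logits, labels, aux_weight=0.3):
    """Final-classifier loss plus down-weighted auxiliary losses."""
    loss1 = softmax_cross_entropy(aux1_logits, labels)
    loss2 = softmax_cross_entropy(aux2_logits, labels)
    loss3 = softmax_cross_entropy(main_logits, labels)
    return loss3 + aux_weight * (loss1 + loss2)

rng = np.random.default_rng(0)
labels = rng.integers(0, 20, size=32)                       # 20 dialect classes
heads = [rng.normal(size=(32, 20)) for _ in range(3)]       # random stand-in logits
print(combined_training_loss(*heads, labels))
```

The auxiliary heads inject gradient into the middle of the network during training; at test time only the final classifier's prediction would normally be used.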

SLIDE 16

Preliminary Results

[Bar chart of results per dialect, axis ticks 20 to 100. Dialects shown: Arabic-Levantine, French-Haitian, Slavic-Polish, Chinese-Wu, French-West_African, English-American, Arabic-Iraqi, Chinese-Mandarin, Arabic-Maghrebi, Slavic-Russian, Spanish-Caribbean, English-British, Arabic-Egyptian, Chinese-Cantonese, Arabic-Modern_Standard, Chinese-Min_Dong, Spanish-European, Spanish-…, Portuguese-Brazilian, English-South_Asian_(Indian)]

  • Accuracy: 83.99% (Top-1), 98.89% (Top-5)

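
Top-1 and Top-5 accuracy are computed from the classifier's per-class scores: a sample counts as correct under Top-k if its true label is among the k highest-scoring classes. A minimal sketch, with a hypothetical function name and toy scores:

```python
import numpy as np

def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores.

    scores: (n_samples, n_classes); labels: (n_samples,) integer class ids.
    """
    top_k = np.argsort(scores, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return hits.mean()

scores = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.3, 0.5]])
labels = np.array([1, 1, 0])
print(top_k_accuracy(scores, labels, k=1), top_k_accuracy(scores, labels, k=2))
```

With 20 dialect classes, Top-5 is a much more forgiving measure than Top-1, which is why the two figures differ so widely.
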
SLIDE 17

Still to be investigated…

  • Many of the image transformations (scaling, cropping, rotating) commonly used in image classification to balance data and improve generalisation are not appropriate for spectrograms
  • Dynamic frequency warping techniques to balance the data sets and improve generalisation
  • Taxonomy of languages: investigation of the similarity of classification results across dialects
  • David Cameron – Arabic?
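
One simple form of frequency warping for spectrogram augmentation is sketched below. The slides do not say which warping technique is intended, so this is an assumption: a linear stretch or compression of the frequency axis by a factor `alpha`, similar in spirit to vocal tract length perturbation. The function name and the choice of linear interpolation are hypothetical.

```python
import numpy as np

def warp_frequency(spec, alpha):
    """Linearly stretch (alpha > 1) or compress (alpha < 1) the frequency axis.

    spec: array of shape (freq, time). Output bin f takes its value from
    source position f / alpha, so a peak at bin b moves to about b * alpha.
    """
    n_freq = spec.shape[0]
    bins = np.arange(n_freq)
    source = np.clip(bins / alpha, 0, n_freq - 1)
    warped = np.empty_like(spec, dtype=float)
    for t in range(spec.shape[1]):
        warped[:, t] = np.interp(source, bins, spec[:, t])
    return warped

spec = np.zeros((64, 5))
spec[10, :] = 1.0                        # a single spectral peak at bin 10
print(warp_frequency(spec, 1.1).argmax(axis=0))
```

Applying several small random values of `alpha` to clips from under-represented dialects would generate extra training images that stay plausible as speech, unlike rotations or arbitrary crops.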

SLIDE 18

Questions

Thank you