DCASE 2016 CONVOLUTIONAL NEURAL NETWORKS FOR ACOUSTIC SCENE CLASSIFICATION

slide-1
SLIDE 1

DCASE 2016

CONVOLUTIONAL NEURAL NETWORKS FOR ACOUSTIC SCENE CLASSIFICATION

Michele Valenti¹ (valenti.michele.w@gmail.com), Aleksandr Diment², Giambattista Parascandolo², Stefano Squartini¹, Tuomas Virtanen²

¹ Università Politecnica delle Marche, Italy
² Tampere University of Technology, Finland

slide-3
SLIDE 3

Outline

  • Introduction
  • Our system
  • Training modes
  • Results
  • Challenge ranking
slide-4
SLIDE 4

Introduction

What is “acoustic scene classification”?

slide-5
SLIDE 5

Introduction

What is “acoustic scene classification”?

Audio → scene label, for example: Forest path, Car, Home

slide-6
SLIDE 6

Our system

Overview

Audio → Feature extraction → Sequence splitting → CNN → Scores averaging → Label

slide-7
SLIDE 7

Our system

Features

Raw audio → Log-mel spectrogram
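The two ingredients of a log-mel representation, mel-spaced frequency bands and log compression, can be illustrated with a minimal pure-Python sketch. The mel formula below is one common convention, and the band count, frequency range and function names are illustrative assumptions rather than values taken from the slides. The short-time Fourier transform and the filterbank weighting are left out.

```python
import math

def hz_to_mel(f_hz):
    """Convert Hz to mel (one common convention; an assumption here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(n_mels, f_min, f_max):
    """Return n_mels + 2 frequencies (Hz) equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]

def log_compress(band_energies, eps=1e-10):
    """Log-compress mel band energies; eps avoids log(0)."""
    return [math.log(e + eps) for e in band_energies]

# Hypothetical settings: 60 bands between 0 Hz and 22050 Hz.
edges = mel_band_edges(60, 0.0, 22050.0)
# Hypothetical band energies for one frame.
frame = log_compress([1.0, 0.5, 0.0])
```

The band edges grow further apart in Hz as frequency increases, which is what gives the mel scale finer resolution at low frequencies.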

slide-8
SLIDE 8

Our system

Sequence splitting

Raw audio segment → Log-mel spectrogram → Sequence
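Sequence splitting can be sketched as cutting the spectrogram into fixed-length runs of frames. The sketch below assumes non-overlapping sequences and drops any trailing remainder; the slides do not state either choice, and the function name is hypothetical.

```python
def split_into_sequences(spectrogram, seq_len):
    """Split a list of frames into consecutive, non-overlapping sequences.

    Trailing frames that do not fill a whole sequence are dropped
    (an assumption; the slides do not say how the remainder is handled).
    """
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    n_full = len(spectrogram) // seq_len
    return [spectrogram[i * seq_len:(i + 1) * seq_len] for i in range(n_full)]

# Hypothetical example: 10 frames of 3 mel bands each, sequences of 4 frames.
frames = [[float(t)] * 3 for t in range(10)]
sequences = split_into_sequences(frames, 4)
```

Each resulting sequence is then treated as one input example for the network.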

slide-9
SLIDE 9

Our system

Convolutional neural network

Input: Sequence

slide-10
SLIDE 10

Our system

Convolutional neural network

Sequence → Feature maps (128)

slide-11
SLIDE 11

Our system

Convolutional neural network

Sequence → Feature maps (128)

Batch normalization

slide-12
SLIDE 12

Our system

Convolutional neural network

Sequence → Feature maps (128) → Subsampled feature maps (128)

slide-13
SLIDE 13

Our system

Convolutional neural network

Sequence → Feature maps (128) → Subsampled feature maps (128) → New feature maps (256)

slide-14
SLIDE 14

Our system

Convolutional neural network

Sequence → Feature maps (128) → Subsampled feature maps (128) → New feature maps (256)

Time shrinking

slide-15
SLIDE 15

Our system

Convolutional neural network

Sequence → Feature maps (128) → Subsampled feature maps (128) → New feature maps (256)

Flattening

slide-16
SLIDE 16

Our system

Convolutional neural network

Sequence → Feature maps (128) → Subsampled feature maps (128) → New feature maps (256)

Fully-connected softmax layer
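To make the layer sequence concrete, the following sketch traces tensor shapes through the stages named on these slides. Only the map counts (128, 128, 256) come from the slides. The input size, the "same" padding, the 5x5 pooling and the reading of "time shrinking" as pooling over the whole time axis are assumptions made for illustration, and the 15 output classes are taken from the class list in the Results slides.

```python
def conv_same(shape, n_maps):
    """Convolution with 'same' padding (assumed): only the map count changes."""
    _, freq, time = shape
    return (n_maps, freq, time)

def max_pool(shape, pool_freq, pool_time):
    """Non-overlapping max pooling: divides each axis by the pool size."""
    maps, freq, time = shape
    return (maps, freq // pool_freq, time // pool_time)

def time_shrink(shape):
    """Pool over the whole time axis, leaving a single time step (assumed reading)."""
    maps, freq, _ = shape
    return (maps, freq, 1)

def flatten(shape):
    """Number of values after flattening a (maps, freq, time) tensor."""
    maps, freq, time = shape
    return maps * freq * time

# Hypothetical input: 1 channel, 60 mel bands, 150 frames.
x = (1, 60, 150)
x = conv_same(x, 128)    # feature maps (128)
x = max_pool(x, 5, 5)    # subsampled feature maps (128); 5x5 pool is assumed
x = conv_same(x, 256)    # new feature maps (256)
x = time_shrink(x)       # time shrinking
n_features = flatten(x)  # flattening
n_classes = 15           # one softmax output per scene class
softmax_params = n_features * n_classes + n_classes  # weights plus biases
```

Under these assumptions the softmax layer sees 3072 inputs; different kernel, padding or pooling choices would change that figure.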

slide-17
SLIDE 17

Our system

Convolutional neural network

Sequence → Feature maps (128) → Subsampled feature maps (128) → New feature maps (256)

slide-18
SLIDE 18

Our system

Scores averaging

Class prediction scores

slide-19
SLIDE 19

Our system

Scores averaging

Class prediction scores → (1/N) Σ → argmax → File’s class
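The averaging step can be sketched as follows: the per-sequence class scores of one file are averaged, and the class with the highest mean score becomes the file's label. The function name and the example scores are hypothetical.

```python
def classify_file(sequence_scores, class_names):
    """Average per-sequence class scores and return the top class."""
    n_seq = len(sequence_scores)
    n_cls = len(class_names)
    # Mean score per class over all sequences of the file.
    mean = [sum(s[c] for s in sequence_scores) / n_seq for c in range(n_cls)]
    # argmax over the averaged scores.
    best = max(range(n_cls), key=lambda c: mean[c])
    return class_names[best], mean

# Hypothetical softmax outputs for three sequences of one file.
classes = ["Forest path", "Car", "Home"]
scores = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.7, 0.2, 0.1]]
label, mean_scores = classify_file(scores, classes)
```

In this example the second sequence on its own would vote for "Car", but the averaged scores still favour "Forest path".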

slide-20
SLIDE 20

Training

slide-21
SLIDE 21

Training

Cross-validation setup

Folds 1 to 4, each split into Training + validation and Test

slide-22
SLIDE 22

Training

Fold n: Training + validation, Test
Non-full training: Training + validation is split into Training and Validation

slide-23
SLIDE 23

Training

Fold n: Training + validation, Test
Non-full training: Training + validation is split into Training and Validation

slide-24
SLIDE 24

Training

Fold n: Training + validation, Test
Non-full training: Training + validation is split into Training and Validation

[Plot: Training and Validation accuracies over epochs]

slide-25
SLIDE 25

Training

Fold n: Training + validation, Test
Non-full training: Training + validation is split into Training and Validation

[Plot: Training and Validation accuracies over epochs, with the convergence time marked]

slide-26
SLIDE 26

Training

Fold n: Training + validation, Test
Non-full training: Training + validation is split into Training and Validation
Next: the whole Training + validation part is used as Training

slide-27
SLIDE 27

Training

Fold n: Training + validation, Test
Non-full training: Training + validation is split into Training and Validation
Full training: the whole Training + validation part is used as Training
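One way to read the two training modes is sketched below: a non-full run on the Training split yields a validation accuracy curve, the convergence time is read off that curve, and full training then reuses that epoch budget on the merged Training + validation data. Taking the epoch with the best validation accuracy as the convergence time is an assumption, and all names and numbers are hypothetical.

```python
def convergence_epoch(val_accuracies):
    """Return the 1-based epoch with the highest validation accuracy.

    Using the best-validation epoch as the 'convergence time' is an
    assumption; the slides only name the concept.
    """
    best = max(range(len(val_accuracies)), key=lambda i: val_accuracies[i])
    return best + 1

def full_training_plan(train_items, val_items, val_accuracies):
    """Merge training and validation data and fix the epoch budget."""
    return {
        "data": list(train_items) + list(val_items),
        "epochs": convergence_epoch(val_accuracies),
    }

# Hypothetical validation curve from a non-full training run.
val_curve = [0.52, 0.61, 0.68, 0.71, 0.70, 0.69]
plan = full_training_plan(["a", "b", "c"], ["d"], val_curve)
```

Full training has no held-out validation data left to monitor, which is why the number of epochs has to be fixed in advance.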

slide-28
SLIDE 28

Results

Folds 1 to 4, each split into Training + validation and Test

Evaluated on: Test data

slide-29
SLIDE 29

Results

Sequence length

[Plot: Accuracy (%), axis from 65 to 80, versus Sequence length (s) at 0.5, 1.5, 3, 5, 10 and 30 s, for Non-full training and Full training]

slide-32
SLIDE 32

Results

Class accuracies

Class               Accuracy (%)
Beach               75.6
Bus                 76.9
Café/Restaurant     74.4
Car                 91.0
City center         93.6
Forest path         96.2
Grocery store       88.5
Home                80.8
Library             66.6
Metro station       96.2
Office              97.4
Park                59.0
Residential area    73.1
Train               46.2
Tram                78.2
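As a quick check on the table, the sketch below computes the unweighted mean of the fifteen class accuracies and picks out the two lowest-scoring classes. The mean is plain arithmetic on the listed values and need not equal the overall accuracies reported elsewhere in the deck; the variable names are illustrative.

```python
# Class accuracies (%) as listed in the table.
class_accuracy = {
    "Beach": 75.6, "Bus": 76.9, "Café/Restaurant": 74.4, "Car": 91.0,
    "City center": 93.6, "Forest path": 96.2, "Grocery store": 88.5,
    "Home": 80.8, "Library": 66.6, "Metro station": 96.2, "Office": 97.4,
    "Park": 59.0, "Residential area": 73.1, "Train": 46.2, "Tram": 78.2,
}

# Unweighted mean over classes.
mean_accuracy = sum(class_accuracy.values()) / len(class_accuracy)

# Classes sorted from lowest to highest accuracy.
ranked = sorted(class_accuracy, key=class_accuracy.get)
hardest = ranked[:2]
```

The two hardest classes by this ranking are Train and Park.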

slide-33
SLIDE 33

Results

Class accuracies

Class               Accuracy (%)
Beach               75.6
Bus                 76.9
Café/Restaurant     74.4
Car                 91.0
City center         93.6
Forest path         96.2
Grocery store       88.5
Home                80.8
Library             66.6
Metro station       96.2
Office              97.4
Park                59.0
Residential area    73.1
Train               46.2
Tram                78.2

Highlighted: 34.6% Residential area, 29.5% Bus

slide-34
SLIDE 34

Results

Other classifiers

System                     Sequence length (s)   Accuracy (%)        Accuracy (%)
                                                 Non-full training   Full training
Baseline GMM (MFCC)        -                     72.6 (single value listed)
Two-layer CNN (MFCC)       5                     67.7                72.6
Two-layer MLP (log-mel)    -                     66.6                69.3
One-layer CNN (log-mel)    3                     70.3                74.8
Two-layer CNN (log-mel)    3                     75.9                79.0
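The effect of full training can be read off the table with a little arithmetic. The sketch below computes the accuracy gain of full over non-full training for each system that lists both values, and finds the system with the highest full-training accuracy. The baseline GMM is left out because it lists a single value; the variable names are illustrative.

```python
# (non-full training, full training) accuracies in %, from the table.
results = {
    "Two-layer CNN (MFCC)": (67.7, 72.6),
    "Two-layer MLP (log-mel)": (66.6, 69.3),
    "One-layer CNN (log-mel)": (70.3, 74.8),
    "Two-layer CNN (log-mel)": (75.9, 79.0),
}

# Gain in percentage points, rounded to one decimal place.
gains = {
    name: round(full - non_full, 1)
    for name, (non_full, full) in results.items()
}

# System with the highest full-training accuracy.
best_system = max(results, key=lambda name: results[name][1])
```

Every system in the table gains from full training, by between 2.7 and 4.9 percentage points.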

slide-35
SLIDE 35

Challenge ranking

Final training

Extended training set: Training + validation + test
Evaluation set: Secret challenge data

slide-36
SLIDE 36

Challenge ranking

Final training

Extended training set: Training + validation + test, split into New training and New validation
Evaluation set: Secret challenge data

slide-37
SLIDE 37

Challenge ranking

Final training

Extended training set: Training + validation + test, split into New training and New validation
Evaluation set: Secret challenge data

Convergence: 400 epochs

slide-38
SLIDE 38

Challenge ranking

Final training

Extended training set: Training + validation + test
Evaluation set: Secret challenge data

Final training for 400 epochs

slide-39
SLIDE 39

Challenge ranking

[Bar chart of the challenge ranking, axis from 10 to 100. Bar values: 89.7, 88.7, 87.7, 87.2, 86.4, 86.4, 86.2, 85.9, 85.6, 85.4, 84.6, 84.1, 77.2, 62.8]

slide-41
SLIDE 41

Results

Feature comparison

System                     Sequence length (s)   Accuracy (%)        Accuracy (%)
                                                 Non-full training   Full training
Two-layer CNN (MFCC)       5                     67.7                72.6
Two-layer CNN (log-mel)    5                     74.1                78.3