Learning Sound Event Classifiers from Web Audio with Noisy Labels
Eduardo Fonseca1, Manoj Plakal2, Daniel P. W. Ellis2, Frederic Font1, Xavier Favory1, and Xavier Serra1
Label noise in sound event classification
Labels that fail
1. FSDnoisy18k: a dataset to foster label noise research
2. CNN baseline system
3. Evaluation of noise-robust loss functions
⇀ Audio content & metadata (tags)
⇀ 20 classes (labels)
Vocabulary (20 classes): Acoustic guitar / Bass guitar / Clapping / Coin (dropping) / Crash cymbal / Dishes, pots, and pans / Engine / Fart / Fire / Fireworks / Glass / Hi-hat / Piano / Rain / Slam / Squeak / Tearing / Walk, footsteps / Wind / Writing
Observed label from the vocabulary
True label from the vocabulary
Observed label from the vocabulary
Missing labels: male speech / laughter / children shouting / chirp, tweet / chatter
True label from the vocabulary
Observed label from the vocabulary
True label: electronic music
True label from the vocabulary
⇀ model-agnostic / minimal intervention / efficient
[Equation annotations: target labels / predictions]
⇀ beneficial for clean data
⇀ detrimental for noisy data
⇀ dynamically update target labels based on model’s current state
⇀ updated target label: convex combination of the target labels and the model’s predictions
[Equation annotations: target labels / predictions / updated target labels]
Scott E. Reed, Honglak Lee, Dragomir Anguelov, Christian Szegedy, Dumitru Erhan, Andrew Rabinovich, Training Deep Neural Networks on Noisy Labels with Bootstrapping. In ICLR 2015
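The target update on this slide can be sketched in NumPy. This is a minimal sketch of the soft bootstrapping variant, assuming the formulation in Reed et al.; the function names, the mixing weight `beta`, and the example values are illustrative assumptions and do not come from the slides.

```python
import numpy as np


def soft_bootstrap_targets(y_onehot, probs, beta):
    """Convex combination of observed labels and current predictions.

    beta is a hypothetical mixing weight in [0, 1]; beta = 1 keeps the
    observed labels unchanged.
    """
    return beta * y_onehot + (1.0 - beta) * probs


def soft_bootstrap_loss(y_onehot, probs, beta, eps=1e-12):
    """Cross-entropy computed against the updated (soft) targets."""
    targets = soft_bootstrap_targets(y_onehot, probs, beta)
    # Clip to avoid log(0) when a predicted probability is exactly zero.
    return -np.sum(targets * np.log(np.clip(probs, eps, 1.0)), axis=-1)


# Hypothetical 3-class example: the observed label is class 0,
# but the model currently favours class 1.
y = np.array([[1.0, 0.0, 0.0]])
p = np.array([[0.2, 0.7, 0.1]])

updated = soft_bootstrap_targets(y, p, beta=0.5)
loss = soft_bootstrap_loss(y, p, beta=0.5)
```

With `beta = 1` the loss reduces to plain CCE on the observed labels; smaller values place more trust in the model's current predictions.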
⇀ CCE: sensitive to noisy labels (weighting)
⇀ Mean Absolute Error (MAE):
  ■ avoid weighting
  ■ difficult convergence
⇀ negative Box-Cox transformation of softmax predictions
⇀ q = 1 → ℒq = MAE ; q → 0 → ℒq = CCE
Zhilu Zhang and Mert Sabuncu, Generalized cross entropy loss for training deep neural networks with noisy labels. In NeurIPS 2018
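The two limiting cases of ℒq can be checked numerically. This is a minimal sketch, assuming the definition from Zhang and Sabuncu, ℒq = (1 − p_y^q) / q, where p_y is the softmax output for the labeled class. The function names and example values are illustrative assumptions. Here MAE is computed as the sum of absolute differences over classes, so with one-hot targets ℒq at q = 1 matches MAE up to a constant factor of 2.

```python
import numpy as np


def lq_loss(y_onehot, probs, q, eps=1e-12):
    """Negative Box-Cox transformation of the labeled-class probability."""
    p_true = np.clip(np.sum(y_onehot * probs, axis=-1), eps, 1.0)
    return (1.0 - p_true ** q) / q


def cce_loss(y_onehot, probs, eps=1e-12):
    """Categorical cross-entropy for one-hot targets."""
    p_true = np.clip(np.sum(y_onehot * probs, axis=-1), eps, 1.0)
    return -np.log(p_true)


def mae_loss(y_onehot, probs):
    """Sum of absolute differences between targets and predictions."""
    return np.sum(np.abs(y_onehot - probs), axis=-1)


# Hypothetical 3-class example with the labeled class at index 1.
y = np.array([[0.0, 1.0, 0.0]])
p = np.array([[0.2, 0.7, 0.1]])

lq_at_1 = lq_loss(y, p, q=1.0)     # compare with 0.5 * mae_loss(y, p)
lq_mid = lq_loss(y, p, q=0.5)      # intermediate setting
lq_near_0 = lq_loss(y, p, q=1e-6)  # compare with cce_loss(y, p)
```

Intermediate values of `q` interpolate between the two behaviours, which is the trade-off the slide describes: CCE converges easily but is sensitive to noisy labels, while MAE avoids the weighting but is harder to optimise.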
⇀ 20 classes / 18k clips / 42.5 h / singly-labeled data
⇀ small amount of manually-labeled data and a large amount of noisy data
⇀ label noise characterization
⇀ large amount of Freesound audio & tags feasible for training sound recognizers
⇀ noise-robust loss functions: efficient way to improve performance in presence of noisy labels
⇀ ℒq is top-performing loss
http://www.eduardofonseca.net/FSDnoisy18k/
https://zenodo.org/record/2529934
https://github.com/edufonseca/icassp19
⇀ classes described with tags also used for other audio materials
  ■ Bass guitar, Crash cymbal, Engine, ...
⇀ field-recordings: several sound sources expected
⇀ pairs of related classes:
  ■ Squeak & Slam / Wind & Rain