Learning Sound Event Classifiers from Web Audio with Noisy Labels


1. Learning Sound Event Classifiers from Web Audio with Noisy Labels
Eduardo Fonseca¹, Manoj Plakal², Daniel P. W. Ellis², Frederic Font¹, Xavier Favory¹, and Xavier Serra¹

2. Label noise in sound event classification
● Labels that fail to properly represent the acoustic content of an audio clip
● Why is label noise relevant?
● Effects of label noise: performance decrease / increased complexity

3. How to mitigate label noise?

4. How to mitigate label noise?

5. How to mitigate label noise? ⇀ automatic approaches

6. Our contributions
1. FSDnoisy18k: a dataset to foster label noise research

7. Our contributions
1. FSDnoisy18k: a dataset to foster label noise research
2. CNN baseline system
3. Evaluation of noise-robust loss functions

8. FSDnoisy18k
● 20 classes
● 18k audio clips
● 42.5 hours of audio

9. FSDnoisy18k: creation
● Freesound ⇀ audio content & metadata (tags)
● AudioSet Ontology ⇀ 20 classes (labels)

10. FSDnoisy18k: creation
● Freesound ⇀ audio content & metadata (tags)
● AudioSet Ontology ⇀ 20 classes (labels)

11. FSDnoisy18k: creation
● Freesound ⇀ audio content & metadata (tags)
● AudioSet Ontology ⇀ 20 classes (labels)

12. Types of label noise
● singly-labeled data

13. Types of label noise
● singly-labeled data

14. Types of label noise
● singly-labeled data
● in-vocabulary (IV): events that are part of the target class set (closed-set)
● out-of-vocabulary (OOV): events not covered by the class set (open-set)

15. Examples: clip #1
Observed label from the vocabulary: Acoustic guitar / Bass guitar / Clapping / Coin (dropping) / Crash cymbal / Dishes, pots, and pans / Engine / Fart / Fire / Fireworks / Glass / Hi-hat / Piano / Rain / Slam / Squeak / Tearing / Walk, footsteps / Wind / Writing

16. Examples: clip #1
True label from the vocabulary: Acoustic guitar / Bass guitar / Clapping / Coin (dropping) / Crash cymbal / Dishes, pots, and pans / Engine / Fart / Fire / Fireworks / Glass / Hi-hat / Piano / Rain / Slam / Squeak / Tearing / Walk, footsteps / Wind / Writing

17. Examples: clip #2
Observed label from the vocabulary: Acoustic guitar / Bass guitar / Clapping / Coin (dropping) / Crash cymbal / Dishes, pots, and pans / Engine / Fart / Fire / Fireworks / Glass / Hi-hat / Piano / Rain / Slam / Squeak / Tearing / Walk, footsteps / Wind / Writing

18. Examples: clip #2
True label from the vocabulary: Acoustic guitar / Bass guitar / Clapping / Coin (dropping) / Crash cymbal / Dishes, pots, and pans / Engine / Fart / Fire / Fireworks / Glass / Hi-hat / Piano / Rain / Slam / Squeak / Tearing / Walk, footsteps / Wind / Writing
Missing labels: male speech / laughter / children shouting / chirp, tweet / chatter

19. Examples: clip #3
Observed label from the vocabulary: Acoustic guitar / Bass guitar / Clapping / Coin (dropping) / Crash cymbal / Dishes, pots, and pans / Engine / Fart / Fire / Fireworks / Glass / Hi-hat / Piano / Rain / Slam / Squeak / Tearing / Walk, footsteps / Wind / Writing

20. Examples: clip #3
True label from the vocabulary: Acoustic guitar / Bass guitar / Clapping / Coin (dropping) / Crash cymbal / Dishes, pots, and pans / Engine / Fart / Fire / Fireworks / Glass / Hi-hat / Piano / Rain / Slam / Squeak / Tearing / Walk, footsteps / Wind / Writing
True label (not in the vocabulary): electronic music

21. Label noise distribution in FSDnoisy18k
● most frequent type of label noise: OOV
● * some clips are incorrectly labeled, but still acoustically similar to their labeled class

22. FSDnoisy18k
● 20 classes / 18k clips / 42.5 h
● singly-labeled data
● variable clip duration: 300 ms – 30 s
● proportion train_noisy / train_clean = 90% / 10%
● the types and amount of label noise vary per class
● expandable
● http://www.eduardofonseca.net/FSDnoisy18k/

23. CNN baseline system
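The transcript carries no architectural details for this slide. As a rough, illustrative sketch only — the input shape (log-mel patches), layer sizes, and optimizer below are assumptions, not the authors' configuration (their actual Keras code is at https://github.com/edufonseca/icassp19):

import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 20  # FSDnoisy18k vocabulary size

def build_baseline_cnn(input_shape=(100, 96, 1)):
    """Small CNN over log-mel spectrogram patches (frames x mel bands x 1)."""
    inputs = tf.keras.Input(shape=input_shape)
    x = layers.Conv2D(32, 3, padding='same', activation='relu')(inputs)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(64, 3, padding='same', activation='relu')(x)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(128, 3, padding='same', activation='relu')(x)
    x = layers.GlobalAveragePooling2D()(x)  # pool over time-frequency
    outputs = layers.Dense(NUM_CLASSES, activation='softmax')(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model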

24. Noise-robust loss functions
● Why? ⇀ model-agnostic / minimal intervention / efficient

25. Noise-robust loss functions
● Why? ⇀ model-agnostic / minimal intervention / efficient
● Default loss function in the multi-class setting: Categorical Cross-Entropy (CCE)
  ⇀ $\mathcal{L}_{\mathrm{CCE}} = -\sum_{k} y_k \log \hat{y}_k$, with predictions $\hat{y}_k$ and target labels $y_k$
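As a concrete reference point for the noise-robust variants on the following slides, a minimal numpy version of CCE (the function and variable names are mine, not from the slides):

import numpy as np

def cce_loss(y_true, y_pred, eps=1e-7):
    """Categorical cross-entropy for one-hot targets and softmax predictions."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return np.mean(-np.sum(y_true * np.log(y_pred), axis=-1))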

26. Noise-robust loss functions
● Why? ⇀ model-agnostic / minimal intervention / efficient
● Default loss function in the multi-class setting: Categorical Cross-Entropy (CCE)
● CCE is sensitive to label noise: it places emphasis on difficult examples (implicit weighting)
  ⇀ beneficial for clean data
  ⇀ detrimental for noisy data
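To make the "weighting" remark concrete — this step follows the analysis in Zhang & Sabuncu (2018), not anything shown on the slide — consider a one-hot label on class $j$. The parameter gradients of CCE and MAE differ only in a per-example weight:

\[
\frac{\partial \mathcal{L}_{\mathrm{CCE}}}{\partial \theta} = -\frac{1}{\hat{y}_j}\,\nabla_\theta \hat{y}_j,
\qquad
\frac{\partial \mathcal{L}_{\mathrm{MAE}}}{\partial \theta} = -2\,\nabla_\theta \hat{y}_j .
\]

CCE scales each example's gradient by $1/\hat{y}_j$, so clips the model scores poorly on — which is exactly where mislabeled clips concentrate — dominate training; MAE weights all examples equally, which is robust to noise but slows convergence.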

27. Noise-robust loss functions
● Soft bootstrapping
  ⇀ dynamically update the target labels based on the model's current state
  ⇀ updated target label: a convex combination of the target labels and the predictions,
    $\tilde{y}_k = \beta\, y_k + (1 - \beta)\, \hat{y}_k$, used in place of $y_k$ in the CCE loss
Scott E. Reed, Honglak Lee, Dragomir Anguelov, Christian Szegedy, Dumitru Erhan, and Andrew Rabinovich, "Training Deep Neural Networks on Noisy Labels with Bootstrapping," in ICLR 2015.
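A minimal numpy sketch of the soft bootstrapping loss, following Reed et al. (2015); $\beta = 0.95$ is the value suggested in that paper for the soft variant, and the function and variable names are mine:

import numpy as np

def soft_bootstrap_loss(y_true, y_pred, beta=0.95, eps=1e-7):
    """Soft bootstrapping (Reed et al., 2015): cross-entropy against targets
    that blend the given labels with the model's current predictions."""
    y_pred = np.clip(y_pred, eps, 1.0)
    y_soft = beta * y_true + (1.0 - beta) * y_pred  # updated target labels
    return np.mean(-np.sum(y_soft * np.log(y_pred), axis=-1))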

28. Noise-robust loss functions
● ℒ_q loss intuition
  ⇀ CCE: sensitive to noisy labels (implicit weighting)
  ⇀ Mean Absolute Error (MAE):
    ■ avoids the weighting
    ■ but convergence is difficult
Zhilu Zhang and Mert Sabuncu, "Generalized cross entropy loss for training deep neural networks with noisy labels," in NeurIPS 2018.

29. Noise-robust loss functions
● ℒ_q loss intuition
  ⇀ CCE: sensitive to noisy labels (implicit weighting)
  ⇀ Mean Absolute Error (MAE):
    ■ avoids the weighting
    ■ but convergence is difficult
● ℒ_q loss is a generalization of CCE and MAE
  ⇀ negative Box-Cox transformation of the softmax predictions
  ⇀ q = 1 → ℒ_q = MAE; q → 0 → ℒ_q = CCE
Zhilu Zhang and Mert Sabuncu, "Generalized cross entropy loss for training deep neural networks with noisy labels," in NeurIPS 2018.
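For reference, the formula from the cited paper (not reproduced in this transcript): for an example labeled with class $j$,

\[
\mathcal{L}_q = \frac{1 - \hat{y}_j^{\,q}}{q}, \qquad q \in (0, 1],
\]

the negative Box-Cox transformation of $\hat{y}_j$; the limit $q \to 0$ recovers CCE, and $q = 1$ gives MAE up to a factor of 2. A minimal numpy sketch — $q = 0.7$ is the default suggested by Zhang & Sabuncu, and the function and variable names are mine:

import numpy as np

def lq_loss(y_true, y_pred, q=0.7, eps=1e-7):
    """L_q loss (generalized cross entropy, Zhang & Sabuncu, 2018).

    y_true: one-hot labels, shape (batch, num_classes)
    y_pred: softmax outputs, shape (batch, num_classes)
    """
    y_pred = np.clip(y_pred, eps, 1.0)
    p_true = np.sum(y_true * y_pred, axis=-1)  # probability of the labeled class
    return np.mean((1.0 - p_true ** q) / q)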

30. Experiments
● supervision by user-provided tags can be useful for sound event classification
● ℒ_q works well for sound classification tasks with OOV (and some IV) label noise

31. Experiments
● boost from using ℒ_q on the noisy set: 1.9% (little engineering effort)
● boost from adding curated data to the noisy set: 5.1% (significant manual effort)

32. Summary & takeaways
● FSDnoisy18k
  ⇀ open dataset for the investigation of label noise
  ⇀ 20 classes / 18k clips / 42.5 h / singly-labeled data
  ⇀ a small amount of manually-labeled data and a large amount of noisy data
  ⇀ label noise characterization
● CNN baseline system
  ⇀ training sound recognizers on large amounts of Freesound audio & tags is feasible
● Noise-robust loss functions
  ⇀ efficient way to improve performance in the presence of noisy labels
  ⇀ ℒ_q is the top-performing loss

33. If you are interested in label noise...

34. Learning Sound Event Classifiers from Web Audio with Noisy Labels
Thank you!
http://www.eduardofonseca.net/FSDnoisy18k/
https://zenodo.org/record/2529934
https://github.com/edufonseca/icassp19
Eduardo Fonseca¹, Manoj Plakal², Daniel P. W. Ellis², Frederic Font¹, Xavier Favory¹, and Xavier Serra¹

35. Why this vocabulary?
● data availability
● classes "suitable" for the study of label noise
  ⇀ classes described with tags that are also used for other audio materials
    ■ Bass guitar, Crash cymbal, Engine, ...
  ⇀ field recordings: several sound sources expected, but only the most predominant one(s) tagged
    ■ Rain, Fireworks, Slam, Fire, ...
  ⇀ pairs of related classes
    ■ Squeak & Slam / Wind & Rain
