Machine Listening in Complex Environments
Some challenges in understanding musical and environmental sounds
Mathieu Lagrange
June 25, 2013
Music Human Perception Melody Enhancement Scattering Scene Synthesis
Outline
Motivation
Let humans access audio data in a way that makes sense for them
Means
Explore different means of representing sound to quantify the notion of resemblance between sounds as experienced by humans:
in musical corpora
for environmental sounds
Listening in a Complex Environment
semantic representations:
human perception processes: is there a need for segregating elements of interest?
mathematical representation: what models can be relevant for representing complex scenes in a generic way?
computational issues: is it meaningful to evaluate computational systems using artificial data?
Music Information Retrieval (MIR)
As in every multimedia retrieval task, the main issue is to bridge the semantic gap. Depending on the data at hand, the difficulty of the task ranges from impossible to barely doable:
1 raw data (signal)
2 metadata (tags: genre)
3 user ratings (likes)
Content-based Similarity in Music
[Figure labels: Fingerprint, Cover, Similarity; Signal, Chords, Tag, User]
Can we break the glass ceiling in MIR?
One way is to exploit an important property: music is usually polyphonic.
Let us focus on what is important in this polyphony, at a low auditory level [Mesgarani et al., Nature'12].
Since multi-track recordings are usually not available, one has to resort to approximate solutions, based on enhancing the sources of interest within mix-down versions.
The expected performance gain is far from being achieved with state-of-the-art approaches, though.
We do segregate, right?
In front of a complex scene, we are remarkably efficient at focusing on a source of interest. But are we actually performing the segregation at a low level (i.e. at the spectrogram level)?
Apparently, we do ;-)
Mesgarani et al. [Nature'12]
Cortical activity was recorded from human subjects implanted with customized high-density multi-electrode arrays as part of their clinical work-up for epilepsy surgery.
Subjects listened to speech samples from a corpus commonly used in multi-talker communication research. A typical sentence was "ready tiger go to red two now", where "tiger" is the call sign and "red two" is the colour-number combination.
The method of stimulus reconstruction was used to estimate the speech spectrogram represented by the population neural responses.
Why is source separation for music analysis not trivial?
The enhancement process must operate with limited prior knowledge about the properties of the specific parts to be enhanced.
Distortions inevitably remain and propagate to the subsequent feature extraction and classification stages.
To reduce their impact, one needs to either
design features that are robust, or
use standard features and estimate whether they are relevant or not by considering a notion of uncertainty.
Uncertainty can be
binary (missing feature theory [Eggink2003])
Gaussian [Droppo2005].
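The difference between the two kinds of uncertainty can be illustrated on a single diagonal Gaussian. The sketch below is not the code distributed with the talk; the function names and the toy values are assumptions made for illustration. It uses the usual formulations: binary uncertainty drops (marginalizes) the dimensions flagged as unreliable, whereas Gaussian uncertainty adds a per-dimension feature variance to the model variance.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def log_lik_binary(x, mean, var, reliable):
    """Binary uncertainty: dimensions flagged unreliable are marginalized out."""
    kept = [(xi, m, v) for xi, m, v, r in zip(x, mean, var, reliable) if r]
    if not kept:
        return 0.0  # nothing reliable: the frame carries no evidence
    xs, ms, vs = zip(*kept)
    return log_gauss_diag(xs, ms, vs)


def log_lik_gaussian(x, mean, var, feat_var):
    """Gaussian uncertainty: the feature variance inflates the model variance."""
    return log_gauss_diag(x, mean, [v + u for v, u in zip(var, feat_var)])


# Toy frame whose second coefficient is assumed to be distorted by the enhancement.
x, mean, var = [0.0, 5.0], [0.0, 0.0], [1.0, 1.0]
print(log_gauss_diag(x, mean, var))                 # uncertainty ignored
print(log_lik_binary(x, mean, var, [True, False]))  # hard decision per dimension
print(log_lik_gaussian(x, mean, var, [0.0, 25.0]))  # soft decision per dimension
```

With a zero feature variance the Gaussian variant reduces to the plain likelihood, and a very large variance makes a dimension nearly uninformative, so the binary case can be seen as the two extremes of the Gaussian one.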
Contributions
1 Promote the use of Gaussian uncertainty instead of binary uncertainty for robust classification in the field of MIR.
2 Use a procedure for Gaussian uncertainty estimation that is fully automatic and that can be derived for any feature.
3 Learn classifiers directly from noisy data with Gaussian uncertainty.
4 Matlab source code for GMM decoding and learning is available at http://bass-db.gforge.inria.fr/amulet.
The standard MFCC / GMM classification scheme.
[Diagram]
Learning: mixture -> MFCC computation -> model learning -> GMM
Decoding: mixture -> MFCC computation -> likelihood computation (with the learned GMM) -> likelihood
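The decoding branch of this scheme can be sketched as follows, assuming the MFCC frames are already computed and the GMM parameters already learned (learning by EM is not shown). The class labels, toy parameters and function names are hypothetical and only serve the illustration.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_lik(x, weights, means, variances):
    """Log-likelihood of one frame under a diagonal GMM (log-sum-exp over components)."""
    terms = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def classify(frames, models):
    """Pick the class whose GMM gives the highest summed frame log-likelihood."""
    scores = {
        label: sum(gmm_log_lik(f, *gmm) for f in frames)
        for label, gmm in models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical 2-D "MFCC" frames and two toy class models: (weights, means, variances).
models = {
    "class_a": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "class_b": ([1.0], [[5.0, 5.0]], [[1.0, 1.0]]),
}
frames = [[0.2, 0.1], [0.9, 1.1]]
label, scores = classify(frames, models)
print(label, scores)
```

Summing log-likelihoods over frames treats the frames of a segment as independent, which is the usual simplification in this kind of bag-of-frames classifier.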
Considering melody enhancement as a pre-processing step
[Diagram]
Learning: mixture -> melody enhancement -> MFCC computation -> model learning -> GMM
Decoding: mixture -> melody enhancement -> MFCC computation -> likelihood computation (with the learned GMM) -> likelihood
Proposed Approach with melody enhancement and Gaussian uncertainty
[Diagram]
Learning: mixture -> melody enhancement -> MFCC computation -> model learning -> GMM
Decoding: mixture -> melody enhancement -> MFCC computation -> likelihood computation (with the learned GMM) -> likelihood
Results
Accuracy (%) per 10 sec. singing segment

input        Fold 1  Fold 2  Fold 3  Fold 4  Total
mix          51      53      55      38      49
v-sep        60      63      53      43      55
v-sep-uncrt  71      72      84      83      77
Future work
larger scale experiments
current experiments on orca whale calls
What is a good representation of sounds?
Seek invariance
in time
in frequency
Seek compactness
Goals
Need a way to find an alternative to the accumulation of heterogeneous features.
Need to model the spectral and temporal characteristics of sound that are perceptually meaningful.
The scattering in a nutshell [Anden11]
The scattering has relevant properties
local time-shift invariance while keeping the amplitude modulation
a logarithmic filter bank allows us to stabilize the high frequencies
The scattering operator
at order 1, the Cosine Log Scattering (CLS) is equivalent to the MFCCs
at order 2, it nicely captures temporal modulations
higher orders do not capture a lot of energy, but may be interesting to study
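For reference, the first orders of a scattering transform are usually written as below. The notation is an assumption, since the slides do not define it: $\psi_{\lambda}$ denotes the band-pass wavelets of the logarithmic filter bank, $\phi$ the low-pass averaging window that provides local time-shift invariance, and $\ast$ convolution.

```latex
\begin{align}
S_0 x(t) &= (x \ast \phi)(t) \\
S_1 x(t, \lambda_1) &= \big( |x \ast \psi_{\lambda_1}| \ast \phi \big)(t) \\
S_2 x(t, \lambda_1, \lambda_2) &= \big( \big| |x \ast \psi_{\lambda_1}| \ast \psi_{\lambda_2} \big| \ast \phi \big)(t)
\end{align}
```

Order 1 is an averaged wavelet spectrum, comparable to the mel-band energies underlying the MFCCs. Order 2 recovers part of the amplitude modulation of each band that the averaging by $\phi$ removes at order 1.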
MDS visualization on the Houix Database
Human similarity judgment (MAP=94%)
Dynamic Time Warping (DTW) (MAP=55.4%)
l1-norm of order 2 CLS Coefficients (MAP=56.7%)
Cosine distance of order 2 Separable Scattering Coefficients (MAP=59%)
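The slides do not say how the MAP figures were computed. A minimal sketch of the common definition of mean average precision for retrieval is given below, under the assumption that each sound carries a class label and that, for each query, the other sounds are ranked by increasing distance. The function names and the toy distance matrix are hypothetical.

```python
def average_precision(ranked_relevance):
    """AP of one ranked list of booleans (True = same class as the query)."""
    hits, total = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank  # precision at this relevant rank
    return total / hits if hits else 0.0


def mean_average_precision(dist, labels):
    """MAP over all queries, ranking the other items by increasing distance."""
    aps = []
    for q in range(len(labels)):
        others = [i for i in range(len(labels)) if i != q]
        others.sort(key=lambda i: dist[q][i])
        aps.append(average_precision([labels[i] == labels[q] for i in others]))
    return sum(aps) / len(aps)


# Toy symmetric distance matrix where same-class items are always closest.
labels = [0, 0, 1, 1]
dist = [
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 4.5, 5.5],
    [4.0, 4.5, 0.0, 1.0],
    [5.0, 5.5, 1.0, 0.0],
]
print(mean_average_precision(dist, labels))
```

Any of the distances compared on this slide (DTW, l1-norm, cosine distance) could fill the `dist` matrix; only the ranking it induces matters for the score.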
Take home message on the scattering
The scattering operator
has strong mathematical grounding
shows strong resemblances with models derived from neurophysiological studies (Torsten Dau, Shihab Shamma)
Challenges for retrieval purposes
dimensionality
larger time scales: model long-term complex modulations due to articulation
separability (as by segregation)
Synthesizing Environmental Scenes
Automatically analyzing environmental scenes has interesting applications and is useful for research purposes. That said, evaluation data is still relatively scarce and does not offer any kind of flexibility in terms of controlling important characteristics such as
diversity of sources
degree of overlap
background level
...
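To make these three control parameters concrete, here is a small sketch of how a synthetic scene could be specified. It is not the synthesizer presented in the talk: every function name, the uniform onset placement and the event-to-background ratio in dB are assumptions made for illustration.

```python
import random


def schedule_events(event_classes, n_events, scene_dur, event_dur, seed=0):
    """Draw (onset, offset, label) triples with onsets uniform over the scene."""
    rng = random.Random(seed)
    events = []
    for _ in range(n_events):
        onset = rng.uniform(0.0, scene_dur - event_dur)
        events.append((onset, onset + event_dur, rng.choice(event_classes)))
    return sorted(events)


def overlap_ratio(events):
    """Share of the event-active time where two or more events sound together."""
    bounds = sorted({t for on, off, _ in events for t in (on, off)})
    active = overlapped = 0.0
    for start, end in zip(bounds, bounds[1:]):
        mid = 0.5 * (start + end)
        count = sum(1 for on, off, _ in events if on <= mid < off)
        if count >= 1:
            active += end - start
        if count >= 2:
            overlapped += end - start
    return overlapped / active if active else 0.0


def background_gain(event_rms, background_rms, ebr_db):
    """Gain for the background so that the event/background ratio equals ebr_db."""
    return event_rms / (background_rms * 10.0 ** (ebr_db / 20.0))


# Diversity of sources = number of classes; overlap grows with n_events.
scene = schedule_events(["car", "bird", "voice"], n_events=6, scene_dur=30.0, event_dur=4.0)
print(scene)
print(overlap_ratio(scene))
print(background_gain(event_rms=0.1, background_rms=0.05, ebr_db=6.0))
```

Fixing the seed makes a scene reproducible, which matters when the same synthetic material is used to compare several systems.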
Proposal: build a synthesizer
Design choices
open source
communicate with freesound.org
full webaudio interface
matlab frontend
Usage
First use for the evaluation of machine audition systems during the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events.
Good results so far!
and beyond
Numerous potential research applications have been raised:
"Paris by ear, Paris by heart": reverse engineering the mental representation of the city
Sound and well-being in urban areas
...
Bioacoustics?
Thank you !!
People
Carlo Baugé
Joakim Anden
Stéphane Mallat