SLIDE 1

07/09/09

Automatic Transcription and Separation of the Main Melody from Polyphonic Music Signals

Jean-Louis Durrieu, PhD candidate, TSI Department, Telecom ParisTech

http://perso.telecom-paristech.fr/durrieu/en/

SLIDE 2

page 2 direction ou services

Automatic Transcription and Separation of the Main Melody from Polyphonic Music Signals

1. Introduction
2. Signal Models
3. Transcription of the Melody
4. “Solo/Accompaniment” Separation

SLIDE 3

Automatic Transcription and Separation of the Main Melody from Polyphonic Music Signals

1. Introduction
2. Signal Models
3. Transcription of the Melody
4. “Solo/Accompaniment” Separation

SLIDE 4

Introduction

Blind Audio Source Separation (BASS) for music and Music Information Retrieval (MIR) → inter-related fields

Polyphonic music recordings: a BASS/MIR hybrid approach to main melody transcription/separation

Applications

SLIDE 5

Introduction: link between BASS/MIR

BASS Approaches

  • Based on models
  • Data-driven
  • “Low-level” (signal level)

MIR Approaches

  • Perceptually motivated
  • Knowledge-driven
  • “High-level” (semantic level)

Separated Musical Sources

  • Transcription
  • Indexing

“Breaking” music into “atomic” elements

SLIDE 6

Introduction: Bridging BASS/MIR “gap”

Improving BASS with MIR, and MIR with BASS.

Hybrid approaches, e.g. for the transcription/separation of 2 instruments:

  • E. Vincent, “Musical Source Separation Using Time-Frequency Source Priors”, IEEE Transactions on Audio, Speech and Language Processing, vol. 14, no. 1.
  • Singing voice signals?

SLIDE 7

Introduction: Main Melody Transcription, Main Instrument Separation

Definitions:

  • [MIREX] “Audio Melody Extraction”: extract the main melody from polyphonic audio signals.
  • [Paiva2007]: “[The Main] Melody is the dominant individual pitched line in a musical ensemble”.

Addressing 2 tasks:

  • Main Melody Transcription: identify and transcribe the sequence of fundamental frequencies played by the main instrument in a polyphonic music signal (mono or stereo),
  • Main Instrument/Accompaniment Separation: separate the instrument playing the main melody from the other accompaniment instruments.

SLIDE 8

Introduction: Applications

Transcribed Melody

  • Indexing large music databases,
  • Musical transcription into a “human-readable” score,
  • ...

Separating the Main Instrument from the Accompaniment:

  • Generating accompaniments for solo performers,
  • Pre-processing for MIR applications (chord detection, instrument classification, etc.),
  • ...

SLIDE 9

Introduction: Presentation Outline

Signal Models

Source/Filter model for the main instrument, NMF for the other instruments; estimation algorithm for the corresponding parameters,

Melody transcription

Viterbi smoothing of the melody sequence,

Main Instrument/Accompaniment Separation (also referred to as Solo/Accompaniment Separation)

Wiener filters to estimate the separated sources,

Conclusion/Discussions

SLIDE 10

Introduction: System Outline (ICASSP09)

SLIDE 11

Introduction: Contributors at Telecom ParisTech

Supervisors:

  • Bertrand DAVID,
  • Gaël RICHARD.

Team members:

  • Nancy BERTIN,
  • Cédric FEVOTTE,
  • Alexey OZEROV,
  • And all the other Audiosig project team members...

SLIDE 12

Automatic Transcription and Separation of the Main Melody from Polyphonic Music Signals

1. Introduction
2. Signal Models
3. Transcription of the Melody
4. “Solo/Accompaniment” Separation

SLIDE 13

Signal Models

Audio signals:

  • Time-Frequency Representation,
  • Statistical modeling.

Mixture model Source/Filter model for the main instrument

  • Motivations
  • Characterizing the main melody instrument

NMF-based model for the accompaniment

  • Decomposition on limited dictionary
  • Link between NMF and our framework

Parameter estimation

  • NMF-like algorithm: multiplicative gradient approach

SLIDE 14

Signal Models: Time-Frequency Representation

Digital audio: waveform Time-frequency

representation:

  • Evolution of

frequency content,

  • Human auditory

system.

Short-Time Fourier

Transform (STFT):

Time (s) Frequency (Hz)
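
As an illustration, the STFT and the power spectrogram on which the following models operate can be computed with NumPy. This is a minimal sketch: the Hann window, the 2048-sample frames and the 256-sample hop are assumptions, not the analysis parameters used in this work.

```python
import numpy as np

def stft(x, n_fft=2048, hop=256):
    """Short-Time Fourier Transform with a Hann analysis window.

    Returns a complex array of shape (n_fft // 2 + 1, n_frames):
    rows are frequency bins, columns are time frames.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[n * hop:n * hop + n_fft] * window
                       for n in range(n_frames)], axis=1)
    return np.fft.rfft(frames, axis=0)   # one spectrum per column (frame)

# Usage: one second of a 440 Hz sine sampled at 16 kHz
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440.0 * t)
X = stft(x)
power = np.abs(X) ** 2                   # power spectrogram |X|^2
peak_bin = power.mean(axis=1).argmax()   # strongest frequency bin on average
print(X.shape, peak_bin * fs / 2048)     # within one bin (7.8 Hz) of 440 Hz
```

The frequency resolution is fs / n_fft (here 7.8 Hz per bin); longer frames give finer frequency resolution at the cost of time resolution.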

SLIDE 15

Signal Models: Complex Proper Gaussians

Model for complex spectrum: Independence across time and frequencies: For stationary processes, power

spectrum density (PSD) of

Variance/PSD matrix s.t.
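
For reference, the model can be written with an assumed notation (not taken from the slides): X_{fn} is the STFT coefficient at frequency bin f and frame n, and S_{fn} its variance, i.e. the corresponding entry of the variance/PSD matrix. The density of a centered complex proper Gaussian and, by independence across time and frequency, the log-likelihood of the whole STFT are:

```latex
X_{fn} \sim \mathcal{N}_c(0, S_{fn}), \qquad
p(X_{fn}) = \frac{1}{\pi S_{fn}} \exp\left( -\frac{|X_{fn}|^2}{S_{fn}} \right)

\log p(\mathbf{X}) = \sum_{f,n} \left( -\log \pi - \log S_{fn} - \frac{|X_{fn}|^2}{S_{fn}} \right)
```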

SLIDE 16

Signal Models: Mixture Model

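Mixture = Solo + Accompaniment

(Solo: V, “voice”; Accompaniment: M, “music”)

Each signal is modeled as centered Gaussian, with its own variance.

Independence between V and M: the variance of the mixture is the sum of the variances of V and M.

Variance of V: Source/Filter model; variance of M: NMF decomposition of the power spectrum.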
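
The additivity of the variances under independence can be checked numerically. This sketch draws centered complex proper Gaussian samples for V and M with arbitrary variances (2.0 and 0.5, chosen only for illustration) and compares the empirical variance of the mixture with their sum.

```python
import numpy as np

def complex_gaussian(variance, size, rng):
    """Centered complex proper Gaussian samples with E|z|^2 = variance."""
    scale = np.sqrt(variance / 2.0)   # half of the variance on each of the real and imaginary parts
    return scale * (rng.standard_normal(size) + 1j * rng.standard_normal(size))

rng = np.random.default_rng(0)
n = 200_000
S_V, S_M = 2.0, 0.5                   # solo and accompaniment variances (arbitrary values)
V = complex_gaussian(S_V, n, rng)
M = complex_gaussian(S_M, n, rng)
X = V + M                             # mixture = solo + accompaniment

empirical = np.mean(np.abs(X) ** 2)
print(empirical)                      # close to S_V + S_M = 2.5
```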

SLIDE 18

Signal Models: Source/Filter Model for the Main Instrument

Motivations:

  • The singing voice is often the main instrument,
  • The source/filter model is widely used and suitable for a wide range of other instruments,
  • It models the pitched aspects (source) separately from the timbre aspects (filter).

Human vocal tract (from Wikipedia)

SLIDE 19

Signal Models: Source/Filter Principle

(Figure: (vocal tract) filter and (glottal) source spectra, frequency in Hz)
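
The principle can be illustrated on power spectra: the voiced spectrum is the product of the source spectrum (harmonics at multiples of the pitch) and the filter response (smooth envelope). A toy sketch, assuming an integer pitch in Hz, a line-spectrum source with 1/h² power decay and a single Gaussian-shaped resonance as the filter; these are illustrative choices, not the glottal and vocal tract models of the presentation.

```python
import numpy as np

freqs = np.arange(0, 4001)  # frequency axis in Hz, one bin per Hz (index == frequency)

def harmonic_source(f0, freqs):
    """Toy voiced source: line spectrum at integer multiples of f0 (f0 in whole Hz),
    with power decaying as 1/h^2 for harmonic number h."""
    spectrum = np.zeros(len(freqs))
    harmonics = np.arange(f0, freqs[-1] + 1, f0)
    spectrum[harmonics] = 1.0 / (harmonics / f0) ** 2
    return spectrum

def smooth_filter(freqs, center=800.0, bandwidth=300.0):
    """Toy filter: a floor plus one smooth resonance (a crude 'formant')."""
    return 0.1 + np.exp(-0.5 * ((freqs - center) / bandwidth) ** 2)

filt = smooth_filter(freqs)
for f0 in (200, 250):
    voiced = harmonic_source(f0, freqs) * filt   # product of source and filter power spectra
    # Harmonic positions follow the pitch (source); their weights follow the filter
    print(f0, np.flatnonzero(voiced)[:4])
```

Changing the pitch moves the harmonics while the same filter keeps shaping their amplitudes, which is what allows pitch and timbre to be modeled separately.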

SLIDE 20

Signal Models: Source/Filter Variability

(Figure: spectrogram of a vocal signal by Tamy, from the MTG MASS database; frequency (Hz) vs. time (s))

SLIDE 21

Signal Models: Source/Filter Variability

Human singer:

  • Independent evolution of pitches and filters (vowels),
  • Continuous pitch variations,
  • Limited set of vowels (smooth filters),
  • Unvoiced parts...

Proposed Model for the Main Instrument:

  • Discrete range of possible fundamental frequencies for the voiced source component, log-spaced with 96 per octave,
  • Limited number of “smooth” filters,
  • Unvoiced source component integrated later in the estimation process.
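
The discrete, log-spaced set of candidate fundamental frequencies can be generated as below. The 96 steps per octave come from the slide; the 100 Hz lower bound and the 3-octave range are assumptions for the example.

```python
import numpy as np

def f0_grid(f0_min=100.0, n_octaves=3, steps_per_octave=96):
    """Log-spaced candidate F0s in Hz: f0_min * 2**(k / steps_per_octave)."""
    k = np.arange(n_octaves * steps_per_octave + 1)
    return f0_min * 2.0 ** (k / steps_per_octave)

grid = f0_grid()
print(len(grid), grid[0], grid[96], grid[-1])   # 289 values; the F0 doubles every 96 steps
step_cents = 1200 * np.log2(grid[1] / grid[0])
print(step_cents)                                # about 12.5 cents, i.e. 1/8 of a semitone
```

Because neighbouring candidates share a constant ratio, the pitch resolution (in cents) is the same in every octave.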

SLIDE 22

Signal Models: Source Component (1/2)

Voiced source component:

  • KLGLOTT88 (glottal source) model [Klatt90]: dictionary of spectral combs, one per “note”,
  • For each frequency and pitch: power spectrum of the corresponding comb,
  • For each pitch and frame: activation coefficients,
  • Nonnegative combination of the elements of the dictionary.

Unvoiced source:

  • An “unvoiced” element is included in the dictionary,
  • Its activation coefficient is estimated only after the filter part.
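
The construction of the source dictionary and its nonnegative combination can be sketched as follows. Everything here is an assumption for illustration: `comb_dictionary` is a hypothetical helper, a plain 1/h² harmonic comb replaces the KLGLOTT88 glottal model, the unvoiced element is taken to be a flat column of ones, and the sampling rate, frame size and pitch range are arbitrary.

```python
import numpy as np

def comb_dictionary(f0s, fs=16000, n_fft=2048, add_unvoiced=True):
    """Hypothetical source dictionary W_F0, shape (n_fft // 2 + 1, len(f0s) [+ 1]).

    Column u is a harmonic comb for f0s[u]: power 1/h**2 at the STFT bin closest
    to each harmonic h * f0 below Nyquist (a stand-in for the KLGLOTT88 model).
    If add_unvoiced is True, a flat column of ones is appended as the
    "unvoiced" element (an assumption about its shape).
    """
    n_bins = n_fft // 2 + 1
    W = np.zeros((n_bins, len(f0s) + int(add_unvoiced)))
    for u, f0 in enumerate(f0s):
        h = np.arange(1, int((fs / 2) // f0) + 1)           # harmonic numbers below Nyquist
        bins = np.round(h * f0 * n_fft / fs).astype(int)   # nearest STFT bin of each harmonic
        W[bins, u] = 1.0 / h ** 2
    if add_unvoiced:
        W[:, -1] = 1.0
    return W

# Log-spaced pitch grid: 96 steps per octave from 100 Hz over 3 octaves (assumed range)
f0s = 100.0 * 2.0 ** (np.arange(3 * 96 + 1) / 96)
W_F0 = comb_dictionary(f0s)

# Nonnegative activations H_F0 (pitch x frame): the 200 Hz comb (index 96)
# is active in all 4 frames with a decreasing amplitude
n_frames = 4
H_F0 = np.zeros((W_F0.shape[1], n_frames))
H_F0[96, :] = [1.0, 0.8, 0.6, 0.4]
S_source = W_F0 @ H_F0       # source power spectrum, shape (n_bins, n_frames)
print(W_F0.shape, S_source.shape)
```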

SLIDE 23

Signal Models: Source Component (2/2)

(Figure: source dictionary and activations, indexed by f0 number; axes in frequency (Hz) and time (s))

SLIDE 24

Signal Models: Filter Component (1/2)

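Filter component:

  • Dictionary of filters,
  • For each frequency and filter number: frequency response,
  • For each filter and frame: activation,
  • Combination: nonnegative combination of the filters, weighted by their activations.

Filter smoothness:

  • Each filter is decomposed on a spectral dictionary of smooth “atomic” elements, with their own activations,
  • That is to say, each filter is a nonnegative combination of smooth atoms, and is therefore smooth itself.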
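
Smoothness by construction can be sketched as follows. The choice of overlapping Hann windows as smooth atoms, the hypothetical helper `hann_atoms`, the sizes and the random activations are all assumptions made for this example.

```python
import numpy as np

def hann_atoms(n_bins, n_atoms):
    """Hypothetical smooth atoms W_Gamma, shape (n_bins, n_atoms):
    overlapping Hann windows (50% overlap) spread along the frequency axis."""
    width = 2 * n_bins // (n_atoms + 1)      # window length giving about 50% overlap
    hop = width // 2
    window = np.hanning(width)
    W = np.zeros((n_bins, n_atoms))
    for p in range(n_atoms):
        start = p * hop
        stop = min(start + width, n_bins)
        W[start:stop, p] = window[:stop - start]
    return W

n_bins, n_atoms, n_filters, n_frames = 1025, 30, 10, 4
rng = np.random.default_rng(0)

W_Gamma = hann_atoms(n_bins, n_atoms)
H_Gamma = rng.random((n_atoms, n_filters))   # nonnegative weights of the atoms (random here)
W_Phi = W_Gamma @ H_Gamma                    # filter dictionary: each column is smooth

H_Phi = rng.random((n_filters, n_frames))    # filter activations (filter x frame)
S_filter = W_Phi @ H_Phi                     # filter part of the model, (n_bins, n_frames)
print(W_Phi.shape, S_filter.shape)
```

Since every filter is a nonnegative mix of a few wide, smooth atoms, it cannot follow individual harmonics, which leaves the pitch information to the source part.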

SLIDE 25

Signal Models: Filter Component (2/2)

(Figure: filter dictionary, smooth atoms (index p) and their activations; axes in frequency (Hz) and time (s))

SLIDE 26

Signal Models: Source/Filter Summary

Source contribution: Filter contribution: Main Instrument contribution to the mixture power

spectrum:

Parameters:

  • Fixed parameters: dictionaries and
  • To estimate:
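
With an assumed notation (W_{F0}: dictionary of spectral combs, H_{F0}: its activations; W_Γ: dictionary of smooth atoms, H_Γ: decomposition of the filters on the atoms, H_Φ: filter activations; ⊙: element-wise product), the main-instrument power spectrum can be written as follows. This is a reconstruction under that notation:

```latex
\mathbf{S}_V
= \underbrace{(\mathbf{W}_\Gamma \mathbf{H}_\Gamma \mathbf{H}_\Phi)}_{\text{filter contribution}}
\odot
\underbrace{(\mathbf{W}_{F_0} \mathbf{H}_{F_0})}_{\text{source contribution}},
\qquad
\text{fixed: } \mathbf{W}_{F_0}, \mathbf{W}_\Gamma,
\qquad
\text{to estimate: } \mathbf{H}_{F_0}, \mathbf{H}_\Gamma, \mathbf{H}_\Phi
```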

SLIDE 28

Signal Models: Accompaniment (1/2)

Accompaniment/Background Music component:

  • Power spectrum dictionary,
  • Activation matrix,
  • Nonnegative combination of the elements of the dictionary.

Equivalence [Fevotte09] between:

  • Maximum Likelihood (ML) estimation of the dictionary and of the activations, the accompaniment being centered Gaussian with a variance given by their product,
  • NMF minimizing the Itakura-Saito divergence between the observed power spectrum and the matrix product of the dictionary and the activations.
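
The equivalence can be checked numerically. For independent complex Gaussian coefficients X_{fn} with variances S_{fn}, the negative log-likelihood equals the Itakura-Saito divergence between the observed power |X_{fn}|² and S_{fn}, plus terms that do not depend on S; comparing two candidate variance matrices therefore gives the same difference under both criteria. In this sketch, random arrays stand in for an STFT and for two candidate NMF approximations (assumed data).

```python
import numpy as np

def is_divergence(P, S):
    """Itakura-Saito divergence d_IS(P | S) = P/S - log(P/S) - 1, summed over entries."""
    ratio = P / S
    return np.sum(ratio - np.log(ratio) - 1.0)

def neg_log_likelihood(X, S):
    """-log p(X) for independent X[f, n] ~ N_c(0, S[f, n])."""
    return np.sum(np.log(np.pi * S) + np.abs(X) ** 2 / S)

rng = np.random.default_rng(1)
X = rng.standard_normal((64, 20)) + 1j * rng.standard_normal((64, 20))  # stand-in STFT
P = np.abs(X) ** 2                                                       # observed power

# Two candidate variance models (e.g. two different NMF approximations)
S1 = rng.random((64, 20)) + 0.5
S2 = rng.random((64, 20)) + 0.5

# Differences are equal: minimizing one criterion is minimizing the other
d_nll = neg_log_likelihood(X, S1) - neg_log_likelihood(X, S2)
d_is = is_divergence(P, S1) - is_divergence(P, S2)
print(d_nll, d_is)
```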

SLIDE 29

Signal Models: Accompaniment (2/2)

(Figure: accompaniment dictionary and activations, indexed by element r; axes in frequency (Hz) and time (s))

SLIDE 30

Signal Models: Mixture model summary

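Mixture variance/PSD matrix:

  • Main Instrument: source/filter model (element-wise product of the source and filter parts),
  • Accompaniment: NMF decomposition (dictionary times activations),
  • Mixture: sum of the main instrument and accompaniment variances.

Parameters:

  • Fixed Parameters: the source dictionary (spectral combs) and the dictionary of smooth filter atoms,
  • To be estimated: all activation coefficients and the accompaniment dictionary.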
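
Putting the pieces together, the mixture variance can be assembled from the parameters. A sketch with a hypothetical helper `mixture_variance` and assumed toy dimensions. The parameter names (W_F0, H_F0, W_Gamma, H_Gamma, H_Phi, W_M, H_M) are illustrative labels for the dictionaries and activations described above, and random nonnegative values stand in for estimated ones.

```python
import numpy as np

def mixture_variance(W_F0, H_F0, W_Gamma, H_Gamma, H_Phi, W_M, H_M):
    """Hypothetical helper returning (S_V, S_M, S_X), all of shape (F, N).

    S_V: main instrument, element-wise product of the filter part
         (W_Gamma @ H_Gamma @ H_Phi) and the source part (W_F0 @ H_F0);
    S_M: accompaniment, NMF product W_M @ H_M;
    S_X: mixture, S_V + S_M (V and M independent).
    """
    S_V = (W_Gamma @ H_Gamma @ H_Phi) * (W_F0 @ H_F0)
    S_M = W_M @ H_M
    return S_V, S_M, S_V + S_M

# Assumed toy sizes: F bins, N frames, U pitches, P atoms, K filters, R accompaniment elements
F, N, U, P, K, R = 513, 50, 289, 30, 10, 40
rng = np.random.default_rng(0)
shapes = {"W_F0": (F, U), "H_F0": (U, N), "W_Gamma": (F, P), "H_Gamma": (P, K),
          "H_Phi": (K, N), "W_M": (F, R), "H_M": (R, N)}
params = {name: rng.random(shape) for name, shape in shapes.items()}
S_V, S_M, S_X = mixture_variance(**params)
print(S_X.shape)
```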

SLIDE 31

Signal Models: Parameter Estimation

Maximum Likelihood (ML) estimation:

  • Log-likelihood of the observations (the mixture STFT),
  • With the parameterized variance of the mixture model.

NMF-inspired algorithm:

  • Itakura-Saito divergence between the observed power spectrum and the parameterized variance,
  • Multiplicative updates for parameter estimation.
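
A sketch of the multiplicative updates, restricted to the accompaniment part alone (plain IS-NMF, power spectrogram ≈ W @ H) to stay short; the full model would update the source/filter parameters in the same multiplicative fashion. The rule below is the standard heuristic IS-NMF update (ratio of the negative and positive parts of the gradient); the rank, number of iterations, initialization and data are arbitrary choices.

```python
import numpy as np

def is_divergence(V, V_hat):
    """Itakura-Saito divergence d_IS(V | V_hat), summed over all bins."""
    ratio = V / V_hat
    return np.sum(ratio - np.log(ratio) - 1.0)

def is_nmf(V, rank, n_iter=200, seed=0):
    """IS-NMF of a positive power spectrogram V (F x N): V ~ W @ H.

    Each update multiplies a parameter by the ratio of the negative and positive
    parts of the gradient of the IS divergence, so W and H stay nonnegative.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, rank)) + 0.1
    H = rng.random((rank, N)) + 0.1
    costs = []
    for _ in range(n_iter):
        V_hat = W @ H
        H *= (W.T @ (V * V_hat ** -2)) / (W.T @ (V_hat ** -1))
        V_hat = W @ H
        W *= ((V * V_hat ** -2) @ H.T) / ((V_hat ** -1) @ H.T)
        costs.append(is_divergence(V, W @ H))
    return W, H, costs

# Usage: a positive random "spectrogram" with low-rank structure (stand-in data)
rng = np.random.default_rng(42)
V = rng.random((64, 8)) @ rng.random((8, 100)) + 1e-3
W, H, costs = is_nmf(V, rank=8)
print(costs[0], costs[-1])   # the divergence decreases over the iterations
```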

SLIDE 32

Automatic Transcription and Separation of the Main Melody from Polyphonic Music Signals

1. Introduction
2. Signal Models
3. Transcription of the Melody
4. “Solo/Accompaniment” Separation

SLIDE 33

Transcription of the Melody

Application definition and scope Model to estimate a smooth melody Dynamic Programming (Viterbi algorithm)

SLIDE 34

Transcription: Definition and scope

Definition:

  • “[The Main] Melody is the dominant individual pitched line in a musical ensemble.” [Paiva2007],
  • Transcribe the fundamental frequencies played by the predominant instrument in a polyphonic music recording.

Scope:

  • “Low-level” transcription: sequence of pitches,
  • Various genres and musical ensembles (MIREX 2004 and 2005 databases),
  • Participation in an international evaluation campaign: MIREX 2008 (“audio melody extraction”).

SLIDE 35

Transcription: Modeling a Smooth Melody

Assumptions on the melody line:

  • Smooth,
  • Predominant in terms of energy,
  • Realistic melody line: trade-off between the smoothness and the energy of the line.

Hidden Markov Model (HMM) for the sequence of pitches.

SLIDE 36

Transcription: Melody Tracking

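Maximum Likelihood estimation:

  • The melody is assumed to be predominant in terms of energy,
  • Likelihood of the sequence of pitches: combines the energy of the selected pitches with the transition probabilities between consecutive pitches, the latter chosen to favor smooth lines.

Viterbi Tracking algorithm

  • Dynamic Programming,
  • Modified to deal with silences in the main melody.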
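
A minimal Viterbi sketch for this trade-off, assuming a matrix of pitch energies (pitch × frame, e.g. source activations) as observations and a transition probability that decays exponentially with the size of the pitch jump. The per-frame energy normalization, the `jump_penalty` parameter and the transition shape are assumptions, and the silence handling mentioned above is left out.

```python
import numpy as np

def viterbi_melody(energy, jump_penalty=0.1, eps=1e-12):
    """Most likely sequence of pitch indices for an energy matrix (n_pitches x n_frames).

    Path score = sum over frames of log(normalized energy of the chosen pitch)
               + sum of log transition probabilities, which decay exponentially
                 with the pitch jump |u - u'| (assumed smoothness prior).
    Silences in the melody are not handled in this sketch.
    """
    n_pitches, n_frames = energy.shape
    # Observation log-likelihoods: energies normalized per frame
    obs = np.log(energy / (energy.sum(axis=0, keepdims=True) + eps) + eps)
    # Transition log-probabilities, each row (previous pitch) normalized to sum to 1
    u = np.arange(n_pitches)
    log_trans = -jump_penalty * np.abs(u[:, None] - u[None, :])
    log_trans -= np.log(np.exp(log_trans).sum(axis=1, keepdims=True))

    delta = obs[:, 0].copy()                          # best score ending in each pitch
    back = np.zeros((n_pitches, n_frames), dtype=int)
    for n in range(1, n_frames):
        scores = delta[:, None] + log_trans           # scores[previous, current]
        back[:, n] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + obs[:, n]

    path = np.empty(n_frames, dtype=int)
    path[-1] = delta.argmax()
    for n in range(n_frames - 1, 0, -1):              # backtracking
        path[n - 1] = back[path[n], n]
    return path

# Usage: a steady melody at pitch index 10, plus a louder isolated
# note at index 40 in frame 3 (e.g. from an accompaniment instrument)
energy = np.full((50, 6), 0.01)
energy[10, :] = 1.0
energy[40, 3] = 1.5
print(viterbi_melody(energy, jump_penalty=0.2))   # stays on index 10
print(energy.argmax(axis=0))                      # frame-wise maximum jumps to 40
```

The frame-wise maximum follows the louder isolated note, whereas the smoothness prior makes the two large jumps too costly, so the tracked line stays on the melody.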

SLIDE 37

Transcription: Melody Tracking

(Figure: estimated melody line over time (s))

SLIDE 38

Transcription: Results

SLIDE 39

Automatic Transcription and Separation of the Main Melody from Polyphonic Music Signals

1. Introduction
2. Signal Models
3. Transcription of the Melody
4. “Solo/Accompaniment” Separation

SLIDE 40

Solo/Accompaniment Separation

Definition Estimation of the separated signals Results Applications and Extensions

SLIDE 41

Solo/Accompaniment Separation

Definition:

  • “Solo”: the track played by the main instrument, carrying the main melody,
  • “Accompaniment”: the remaining background instruments,
  • Separate these two contributions and obtain their images (their contributions to the mixture).

MIR-aided approach:

  • First step: melody tracking,
  • Second step: re-estimation of the parameters, knowing the melody,
  • (Third step: re-estimation including the unvoiced parts)
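
Once the variances are estimated, the separated signals are obtained by Wiener filtering of the mixture STFT, as listed in the presentation outline. A minimal sketch, assuming the solo and accompaniment variance matrices S_V and S_M are already available (random stand-ins here); the inverse STFT back to waveforms is omitted.

```python
import numpy as np

def wiener_separate(X, S_V, S_M, eps=1e-12):
    """Wiener filtering of the mixture STFT X (complex, F x N).

    Under the Gaussian model, the posterior mean of the solo given the mixture is
    S_V / (S_V + S_M) * X; the accompaniment gets the complementary mask.
    """
    mask_V = S_V / (S_V + S_M + eps)
    V_hat = mask_V * X
    M_hat = (1.0 - mask_V) * X
    return V_hat, M_hat

rng = np.random.default_rng(0)
F, N = 513, 40
X = rng.standard_normal((F, N)) + 1j * rng.standard_normal((F, N))  # stand-in mixture STFT
S_V = rng.random((F, N))                                             # stand-in solo variance
S_M = rng.random((F, N))                                             # stand-in accompaniment variance
V_hat, M_hat = wiener_separate(X, S_V, S_M)
print(np.allclose(V_hat + M_hat, X))   # the two estimates add back up to the mixture
```

The masks are real and lie between 0 and 1, so both estimates keep the phase of the mixture.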

SLIDE 42

Solo/Accompaniment Separation

SLIDE 43

Solo/Accompaniment Separation


SLIDE 44

Solo/Accompaniment Separation: Results

ICASSP 2009:

  • + 8 dB SDR for the estimated singing voice,
  • + 2 dB SDR for the accompaniment extraction.
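
The results are given in decibels of SDR (signal-to-distortion ratio). As a rough illustration of what the quantity measures, a plain SDR compares the energy of the reference signal with that of the estimation error. This is a simplified definition: evaluation campaigns such as SiSEC use the BSS Eval criteria, which also tolerate an allowed distortion (e.g. a filtered or rescaled version) of the reference, so this sketch will not reproduce the reported figures.

```python
import numpy as np

def sdr_db(reference, estimate):
    """Simplified SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2)."""
    error = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))

rng = np.random.default_rng(0)
s = rng.standard_normal(16000)                   # reference source (stand-in)
s_hat = s + 0.1 * rng.standard_normal(16000)     # estimate with error power ~1/100 of the signal
print(sdr_db(s, s_hat))                          # close to 20 dB
print(sdr_db(s, 0.5 * s))                        # 10*log10(4), about 6.02 dB
```

The second call shows a limit of the simplified version: a mere rescaling of the reference is counted as distortion, whereas BSS Eval would not penalize it.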

SiSEC “Professionally produced music recordings” (http://sisec.wiki.irisa.fr/)

  • Interesting result: on the excerpt by “Tamy” (flute+guitar), the best results were obtained by the algorithms that first estimate the melody.

Some sound examples on:

  • http://perso.enst.fr/durrieu/en/results_en.html
  • http://perso.enst.fr/durrieu/en/icassp09/

Some other sounds here...

SLIDE 45

Solo/Accompaniment Separation: Applications/Extensions

MIR applications (MIREX 2008):

  • Pre-processing for multipitch estimation,
  • Accompaniment enhancement for chord detection.

Other potential extensions:

  • Stereophonic signals: submission to EUSIPCO 2009,
  • Enhancing the discrimination of the main instrument with classification methods,
  • Adding constraints (priors) on the parameters, to avoid the several steps currently needed to achieve separation.

SLIDE 46

Conclusions/Discussions

Conclusions:

  • Hybrid BASS/MIR framework,
  • State-of-the-art results for “audio melody transcription” (MIREX08) and “solo/accompaniment” separation (SiSEC),
  • Techniques suitable for other applications: multipitch estimation, background music enhancement, indexing, etc.

Extensions:

  • Better formalism for multichannel signals,
  • Transcription into musical notes/musical score.

SLIDE 47

Any questions?