Machine listening for birds: analysis techniques matched to the characteristics of bird vocalisations

SLIDE 1

Multiple birdsong tracking Representing fine modulations

Machine listening for birds: analysis techniques matched to the characteristics of bird vocalisations

Dan Stowell and Mark D Plumbley

Centre for Digital Music School of Elec Eng & Computer Science Queen Mary, University of London

June 2013, Listening in the Wild

dan.stowell@eecs.qmul.ac.uk Analysis techniques matched to bird vocalisations 1

SLIDE 2

Motivation

“Cocktail party” problems...

SLIDE 3

Motivation

Photo: Shutterstock / Romeo Mikulic

SLIDE 4

Motivation

We often have audio with multiple birds, and would like to perform automatic tasks (recognition, tracking, counting...).

Existing computational methods don’t quite fit the characteristics of bird vocalisations:

  1. Multiple “speakers”, and discontinuous utterances: problematic for methods adapted from speech recognition
  2. Birds often use very rapid modulations, yet typical signal representations (spectrograms, MFCCs, LPC) do not capture them

SLIDE 5

Outline

  1. Syllable-to-syllable tracking of multiple birds
  2. Representing the fine detail of bird vocalisations

SLIDE 6

Multiple birdsong tracking

Chiffchaff (Phylloscopus collybita)

SLIDE 7

Automatic Speech Recognition

Hidden Markov Model:

[Figure: hidden Markov model unrolled along a time axis, with time steps t1 to t4 and variables x1 to x4 and y1 to y4.]

SLIDE 8

Intermittent polyphonic sources

SLIDE 10

Modelling an intermittent source

Markov renewal process (“MRP”):

$$P(\tau_{n+1} \le t,\ X_{n+1} = j \mid (X_1, T_1), \ldots, (X_n = i, T_n)) = P(\tau_{n+1} \le t,\ X_{n+1} = j \mid X_n = i)$$

where $\tau_{n+1}$ is the time difference $T_{n+1} - T_n$.
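As a minimal sketch of what this definition means in practice, the following Python samples an event sequence in which both the next state and the waiting time depend only on the current state. The two-state transition table, the exponential gap distribution and all names (`sample_mrp`, `trans`, `gap_mean`) are illustrative assumptions, not taken from the talk.

```python
import random

def sample_mrp(trans, gap_mean, start_state, n_events, seed=0):
    """Sample (time, state) events from a toy Markov renewal process.

    trans[i][j]     : probability of moving from state i to state j
    gap_mean[(i,j)] : mean waiting time for that move (exponential gaps
                      are an assumption made for this sketch)
    """
    rng = random.Random(seed)
    t, x = 0.0, start_state
    events = [(t, x)]
    for _ in range(n_events - 1):
        states = list(trans[x])
        # The next state depends only on the current state x ...
        nxt = rng.choices(states, weights=[trans[x][s] for s in states])[0]
        # ... and so does the gap tau = T_{n+1} - T_n (jointly with nxt).
        tau = rng.expovariate(1.0 / gap_mean[(x, nxt)])
        t += tau
        x = nxt
        events.append((t, x))
    return events

# Hypothetical two-syllable "song" in which A and B mostly alternate.
trans = {"A": {"A": 0.2, "B": 0.8}, "B": {"A": 0.7, "B": 0.3}}
gap_mean = {("A", "A"): 0.5, ("A", "B"): 0.2, ("B", "A"): 0.3, ("B", "B"): 0.6}
events = sample_mrp(trans, gap_mean, "A", 10)
```

Unlike a frame-based HMM, nothing is emitted between events, which is what makes this kind of model a natural fit for intermittent sources.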

SLIDE 11

Multiple MRPs

Problem sketch: assume multiple MRPs, plus potential “clutter”. Given transition probabilities, find the most likely set of paths. (Max 1 path per node)

SLIDE 12

Flow networks, and minimum cost flow

[Figure: flow network with source s, sink t and one node V1, V2, V3 per observation X1, X2, X3. Arcs are labelled a_b(X_i), a_d(X_i), a_c(X_i) and a_t(X_i, X_j, T_j − T_i).]

Convert likelihood expression to flow “costs”:

$$a_b(X) = -\log p_b(X)$$
$$a_d(X) = -\log p_d(X)$$
$$a_t(X, X', \tau) = -\log f_X(X', \tau)$$
$$a_c(X) = \log p_c(X)$$
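The conversion from likelihoods to arc costs can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation: it reads the subscripts b, d, c and t as birth, death, clutter and transition (an assumption), uses constant `p_b`, `p_d`, `p_c` instead of functions of X, splits each observation into an in-node and an out-node so that a_c can sit on an arc, and solves the problem with a plain successive-shortest-path routine using Bellman-Ford. The function names and the toy transition density are hypothetical.

```python
import math

def most_likely_paths(obs, p_b, p_d, p_c, f_trans):
    """obs: list of (time, state) sorted by time. Returns paths as lists of indices."""
    n = len(obs)
    S, T = 2 * n, 2 * n + 1                      # source and sink node ids
    graph = [[] for _ in range(2 * n + 2)]

    def add_arc(u, v, cost):
        # Arc record: [head, residual capacity, cost, index of reverse arc, is_original]
        graph[u].append([v, 1, cost, len(graph[v]), True])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1, False])

    for i, (t_i, x_i) in enumerate(obs):
        u, v = 2 * i, 2 * i + 1                  # in-node and out-node of observation i
        add_arc(S, u, -math.log(p_b))            # a_b
        add_arc(u, v, math.log(p_c))             # a_c (negative: rewards using the node)
        add_arc(v, T, -math.log(p_d))            # a_d
        for j in range(i + 1, n):
            t_j, x_j = obs[j]
            f = f_trans(x_i, x_j, t_j - t_i)
            if f > 0:
                add_arc(v, 2 * j, -math.log(f))  # a_t

    while True:
        # Bellman-Ford shortest path in the residual graph (handles negative costs).
        dist = [math.inf] * len(graph)
        prev = [None] * len(graph)
        dist[S] = 0.0
        for _ in range(len(graph) - 1):
            changed = False
            for u in range(len(graph)):
                if dist[u] == math.inf:
                    continue
                for k, (v, cap, cost, _rev, _orig) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v] - 1e-12:
                        dist[v], prev[v], changed = dist[u] + cost, (u, k), True
            if not changed:
                break
        if dist[T] >= 0:                         # one more path would not lower the cost
            break
        v = T
        while v != S:                            # push one unit of flow along the path
            u, k = prev[v]
            arc = graph[u][k]
            arc[1] -= 1
            graph[v][arc[3]][1] += 1
            v = u

    paths = []
    for arc in graph[S]:
        if arc[4] and arc[1] == 0:               # saturated source arc: a path starts here
            node, path = arc[0], []
            while node != T:
                path.append(node // 2)
                node = next(a[0] for a in graph[node + 1] if a[4] and a[1] == 0)
            paths.append(path)
    return paths

def f_trans(x, x_next, tau):
    # Hypothetical density: same-state repeats about 1 s apart are most likely.
    gauss = math.exp(-0.5 * ((tau - 1.0) / 0.1) ** 2) / (0.1 * math.sqrt(2 * math.pi))
    return (0.9 if x == x_next else 0.1) * gauss

obs = [(0.0, "a"), (0.1, "b"), (1.0, "a"), (1.1, "b"), (2.0, "a"), (2.1, "b"), (2.6, "c")]
paths = most_likely_paths(obs, p_b=0.1, p_d=0.1, p_c=0.1, f_trans=f_trans)
```

In this toy scene the two interleaved sequences come out as separate paths and the stray final observation is used by no path, i.e. it is left to be explained as clutter. The number of paths is not fixed in advance: flow is added only while each extra path lowers the total cost.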

SLIDE 15

Minimum cost flow

Minimum cost flow algorithms can therefore solve this problem:

◮ Optimal minimum-cost flow: Edmonds-Karp algorithm, asymptotic time complexity O(|V||A|^2).
◮ Or use inexact (greedy) algorithm: O(|V||A|) or lower.

SLIDE 16

Synthetic example

[Figure: synthetic example. Panels: generator: locked; generator: coherent; generator: segregated; clean signal; signal in noise; inferred (coherent); inferred (segregated). Several panels carry an LR annotation (values shown: 6.33e+18, 1.45e+21, 1.42e+12, 4.55e+17, 3.11e+16).]

SLIDE 17

Birdsong experiment

25 European recordings of Chiffchaff (source: Xeno-canto)
Mixtures of 2–5 recordings, 5-fold cross-validation
Can it cluster the “syllables” in the same way as the source audio?

SLIDE 18

Data preparation

Syllables detected by spectrogram cross-correlation.
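A minimal sketch of template detection by spectrogram cross-correlation is given below. It assumes the spectrogram is already available as a list of frames (each a list of bin magnitudes), slides the template along the time axis only, and uses a normalised correlation with a fixed threshold. The function name `xcorr_detect`, the threshold value and the toy data are assumptions for illustration, not details from the talk.

```python
import math

def xcorr_detect(spec, template, threshold=0.8):
    """Return (hits, scores): frame offsets where the template matches, and all scores."""
    def flatten_centred(frames):
        vals = [v for frame in frames for v in frame]
        mean = sum(vals) / len(vals)
        return [v - mean for v in vals]

    tmpl = flatten_centred(template)
    tnorm = math.sqrt(sum(v * v for v in tmpl))
    w = len(template)
    scores = []
    for start in range(len(spec) - w + 1):
        patch = flatten_centred(spec[start:start + w])
        pnorm = math.sqrt(sum(v * v for v in patch))
        denom = tnorm * pnorm
        # Normalised correlation in [-1, 1]; flat patches score 0.
        scores.append(sum(a * b for a, b in zip(tmpl, patch)) / denom if denom > 0 else 0.0)
    # Keep offsets that exceed the threshold and are the best score within one template length.
    hits = [i for i, s in enumerate(scores)
            if s >= threshold and s == max(scores[max(0, i - w + 1):i + w])]
    return hits, scores

# Toy data: a 3-frame upward sweep placed at frames 4 and 12 of a silent recording.
template = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
spec = [[0.0] * 4 for _ in range(20)]
for start in (4, 12):
    for k, frame in enumerate(template):
        spec[start + k] = list(frame)
hits, scores = xcorr_detect(spec, template)
```

Each detected offset would then become one "syllable" event (a time plus some state description) for the tracking stage.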

[Figure: two spectrogram panels, time (s) against frequency (Hz): “Template” and “XC25760-dn.xcor”.]

SLIDE 19

Results

[Figure: Ftrans (0.0 to 1.0) against number of signals in mixture (1 to 5). Series: ideal recovery, trained on test data; ideal recovery; ideal recovery plus synthetic noise; recovery from audio; recovery from audio (greedy); recovery from audio (baseline).]

Means and standard errors are shown (5-fold cross-validation)

SLIDE 21

Representing fine modulations

Many (song)birds use very rapid frequency modulation (FM)

◮ Songbirds can perceive fine detail of FM (Dooling et al. 2002, Lohr et al. 2006)
◮ FM detail can affect behavioural responses (Trillo et al. 2005, de Kort et al. 2009)

Yet... Standard representations assume local stationarity (i.e. signal parameters unchanging) at fine timescales.

◮ Fourier transform magnitudes (spectrograms, MFCCs)
◮ Linear prediction (LPC)

Detail at < 20 ms likely to be smeared or discarded.
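To put rough numbers on the smearing, the short calculation below relates the analysis window to the frequency range a chirp sweeps through inside it. The sample rate, window length and chirp rate are hypothetical values chosen for illustration; they do not come from the talk.

```python
def stft_smear(sample_rate_hz, window_samples, chirp_rate_hz_per_s):
    """Return (window duration in s, bin spacing in Hz, bins swept within one window)."""
    window_s = window_samples / sample_rate_hz
    bin_hz = sample_rate_hz / window_samples
    sweep_hz = abs(chirp_rate_hz_per_s) * window_s   # frequency change inside the window
    return window_s, bin_hz, sweep_hz / bin_hz

# Hypothetical setting: 44.1 kHz audio, 1024-sample window, 100 kHz/s chirp.
window_s, bin_hz, bins_crossed = stft_smear(44100, 1024, 100_000)
```

With these assumed values the window lasts about 23 ms, the bins are about 43 Hz apart, and the chirp crosses roughly 54 bins within a single frame, so the frame records a broad smear in place of a sharp, moving peak.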

SLIDE 23

Method: Matching Pursuit using Gabor dictionary, single-scale
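For readers unfamiliar with the method, here is a minimal matching pursuit sketch over a small single-scale Gabor dictionary. The dictionary grid (four time centres, four frequencies, one Gaussian width), the zero-phase real atoms and the synthetic test signal are all assumptions made for illustration; the talk does not specify these details.

```python
import math

def gabor_atom(n, centre, freq, sigma):
    """Unit-norm real Gabor atom: Gaussian envelope times a cosine (freq in cycles/sample)."""
    atom = [math.exp(-0.5 * ((t - centre) / sigma) ** 2)
            * math.cos(2 * math.pi * freq * (t - centre))
            for t in range(n)]
    norm = math.sqrt(sum(a * a for a in atom))
    return [a / norm for a in atom]

def matching_pursuit(signal, dictionary, n_iter):
    """Greedily pick the atom most correlated with the residual, subtract it, repeat."""
    residual = list(signal)
    picks = []
    for _ in range(n_iter):
        best_key, best_coef = None, 0.0
        for key, atom in dictionary.items():
            coef = sum(r * a for r, a in zip(residual, atom))
            if abs(coef) > abs(best_coef):
                best_key, best_coef = key, coef
        if best_key is None:          # residual is orthogonal to every atom
            break
        atom = dictionary[best_key]
        residual = [r - best_coef * a for r, a in zip(residual, atom)]
        picks.append((best_key, best_coef))
    return picks, residual

# Single scale: every atom shares the same Gaussian width SIGMA.
N, SIGMA = 256, 8.0
dictionary = {(c, f): gabor_atom(N, c, f, SIGMA)
              for c in (32, 96, 160, 224) for f in (0.05, 0.1, 0.15, 0.2)}
# Synthetic signal built from two dictionary atoms with amplitudes 2 and 1.
signal = [2.0 * a + 1.0 * b
          for a, b in zip(dictionary[(32, 0.1)], dictionary[(160, 0.2)])]
picks, residual = matching_pursuit(signal, dictionary, n_iter=2)
```

Because the two atoms barely overlap in time, the pursuit recovers them in order of amplitude. With overlapping atoms the greedy choice is no longer guaranteed to return the generating set.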

SLIDE 24

Distribution derivative method (DDM)

With Muševič: DDM, related to spectrogram “reassignment”, recovering modulation information as well as fine frequency detail

[Figure: two panels, time (s, about 1.94 to 2.12) against frequency (Hz, 2000 to 8000): “spectrogram” and “DDM spectrogram (freq. polynomial superimposed)”.]

SLIDE 25

DDM spectrogram improves tracking

With Muševič: Reassigned spectrogram (with chirp info) can improve segregation (Stowell et al. 2013, ICASSP)

[Figure: Ftrans (0.0 to 1.0) against number of signals in mixture (1 to 5). Series: ideal recovery; recovery from audio (+fwise); recovery from audio; recovery from audio (baseline).]

SLIDE 26

Comparison of FM analysis techniques

Many modern techniques exist that can capture rapid modulations:

◮ Spectrogram reassignment and similar (e.g. ICASSP 2013 with Muševič)
◮ Chirplets (see Stowell and Plumbley EUSIPCO 2012)
◮ Sparse representations using e.g. chirplet dictionary, or dictionary learning

Do they yield any strong signals of species identity?
We can use a classification experiment to investigate.

SLIDE 27

Preliminary results

Data: 762 recordings over 84 species (Animal Sound Archive), of which 45 recordings over 5 Phylloscopus species

Method: feature selection, information gain for species classification

Phylloscopus Passerines

Spectral statistics (median, max, range) strongest for discrimination
FM statistics (median, upper percentiles) strongest

Chirplet detection outperformed sparse representations
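The information-gain criterion used for feature selection can be sketched as below: the gain of a (discretised) feature is the drop in entropy of the species labels once the feature value is known. The labels, bin names and the two toy features are hypothetical and stand in for real summary statistics; the discretisation step itself is assumed to have been done already.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(feature_bins, labels):
    """H(labels) - H(labels | feature), for an already-discretised feature."""
    total = len(labels)
    by_bin = {}
    for b, y in zip(feature_bins, labels):
        by_bin.setdefault(b, []).append(y)
    conditional = sum(len(ys) / total * entropy(ys) for ys in by_bin.values())
    return entropy(labels) - conditional

# Hypothetical toy data: one feature separates the species, the other does not.
species = ["sp1", "sp1", "sp2", "sp2"]
fm_bin = ["low", "low", "high", "high"]
loud_bin = ["quiet", "loud", "quiet", "loud"]
ig_fm = information_gain(fm_bin, species)
ig_loud = information_gain(loud_bin, species)
```

Ranking features by this score is what allows statements such as "FM statistics strongest": a feature with higher gain tells us more about species identity on its own.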

SLIDE 28

Conclusions

Machine listening methods adapted to bird vocalisations:

  1. Multiple tracking with a Markov renewal process model
     ◮ Novel formulation for multiple intermittent tracking
     ◮ Parses a scene with an unknown number of sources
     ◮ Applications in source separation, population estimation, etc.
  2. Capturing detail of fine modulations
     ◮ An important feature which need not be obscured in analysis
     ◮ FM detail improves MMRP tracking, and species classification
     ◮ Potential data source for study of acoustic adaptation etc.

Future work:

◮ Combining recognition and tracking
◮ Scaling up (large data sizes, large number of species, ...)
◮ Applications
