Machine listening for birds: analysis techniques matched to the characteristics of bird vocalisations

  1. Machine listening for birds: analysis techniques matched to the characteristics of bird vocalisations. Dan Stowell and Mark D Plumbley, Centre for Digital Music, School of Elec Eng & Computer Science, Queen Mary, University of London. June 2013, Listening in the Wild. Sections: Multiple birdsong tracking; Representing fine modulations. dan.stowell@eecs.qmul.ac.uk, Analysis techniques matched to bird vocalisations, 1

  2. Motivation: “Cocktail party” problems...

  3. Motivation (photo: Shutterstock / Romeo Mikulic)

  4. Motivation. We often have audio with multiple birds and would like to perform automatic tasks (recognition, tracking, counting...). Existing computational methods don’t quite fit the characteristics of bird vocalisations:
     1. Multiple “speakers” and discontinuous utterances, which are problematic for methods adapted from speech recognition.
     2. Birds often use very rapid modulations, yet typical signal representations (spectrograms, MFCCs, LPC) do not capture them.

  5. Outline: (1) syllable-to-syllable tracking of multiple birds; (2) representing the fine detail of bird vocalisations.

  6. Multiple birdsong tracking: Chiffchaff (Phylloscopus collybita)

  7. Automatic Speech Recognition: Hidden Markov Model. [Diagram: observations $y_1, \ldots, y_4$ and states $x_1, \ldots, x_4$ at times $t_1, \ldots, t_4$ along a time axis.]

  8. Intermittent polyphonic sources

  10. Modelling an intermittent source. Markov renewal process (“MRP”):
  $$P(\tau_{n+1} \le t,\ X_{n+1} = j \mid (X_1, T_1), \ldots, (X_n = i, T_n)) = P(\tau_{n+1} \le t,\ X_{n+1} = j \mid X_n = i)$$
  where $\tau_{n+1} = T_{n+1} - T_n$ is the time difference between consecutive events.
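To make the definition concrete, here is a minimal Python sketch that samples one intermittent source as a Markov renewal process. The two-type transition matrix, the exponential intervals and the function name `simulate_mrp` are illustrative assumptions, not the model used in the talk; the property it shows is that the next type $X_{n+1}$ and the gap $\tau_{n+1}$ depend only on the current type $X_n$, never on earlier history.

```python
import numpy as np

def simulate_mrp(trans, mean_gap, n_events, start_state=0, seed=0):
    """Sample (type, time) pairs from a simple Markov renewal process.

    trans[i, j]:    probability that an event of type i is followed by type j
    mean_gap[i, j]: mean of the (assumed exponential) interval tau for an i -> j step
    """
    rng = np.random.default_rng(seed)
    states = [start_state]
    times = [0.0]
    for _ in range(n_events - 1):
        i = states[-1]
        # Next type depends only on the current type (Markov property).
        j = int(rng.choice(len(trans), p=trans[i]))
        # Interval tau_{n+1} = T_{n+1} - T_n depends only on the current and next type.
        tau = rng.exponential(mean_gap[i, j])
        states.append(j)
        times.append(times[-1] + tau)
    return states, times

# Hypothetical source with two syllable types.
trans = np.array([[0.2, 0.8],
                  [0.6, 0.4]])
mean_gap = np.array([[0.10, 0.15],
                     [0.25, 0.30]])  # seconds
states, times = simulate_mrp(trans, mean_gap, n_events=10)
```

Several such sources merged on one time axis, plus random clutter events, give the kind of intermittent polyphonic input the following slides try to untangle.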

  11. Multiple MRPs. Problem sketch: assume multiple MRPs, plus potential “clutter”. Given the transition probabilities, find the most likely set of paths (at most one path per node).

  12. Flow networks, and minimum cost flow. [Diagram: a flow network from source $s$ to sink $t$ through observation nodes $V_1$, $V_2$, $V_3$. Each node $V_i$ has arcs labelled $a_b(X_i)$, $a_d(X_i)$ and $a_c(X_i)$; transition arcs $a_t(X_1, X_2, T_2 - T_1)$, $a_t(X_1, X_3, T_3 - T_1)$ and $a_t(X_2, X_3, T_3 - T_2)$ link earlier nodes to later ones.]
  Convert the likelihood expression to flow “costs”:
  $$\begin{aligned} a_b(X) &= -\log p_b(X) \\ a_d(X) &= -\log p_d(X) \\ a_t(X, X', \tau) &= -\log f_X(X', \tau) \\ a_c(X) &= \log p_c(X) \end{aligned}$$
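A small numeric check of why this conversion works. It assumes, for illustration only, that the likelihood of a hypothesis is a product over paths of a birth probability, transition densities and a death probability, times a clutter probability $p_c$ for every observation left off all paths (reading the subscripts b, d, t, c as birth, death, transition and clutter). Under that assumption the total arc cost equals the negative log-likelihood plus the constant $\sum_X \log p_c(X)$, which is why $a_c$ carries a positive sign: routing an observation onto a path removes its clutter term. All probability values below are made up.

```python
import math

# Hypothetical per-observation probabilities for three observations 0, 1, 2.
p_b = {0: 0.5, 1: 0.4, 2: 0.1}   # birth: a path starts here
p_d = {0: 0.1, 1: 0.3, 2: 0.6}   # death: a path ends here
p_c = {0: 0.2, 1: 0.3, 2: 0.25}  # clutter: observation belongs to no path
# Transition density f_X(X', tau), already evaluated at each observed gap tau.
f = {(0, 1): 0.7, (0, 2): 0.2, (1, 2): 0.5}

def log_likelihood(paths):
    """Log-likelihood of a set of disjoint paths; uncovered observations are clutter."""
    on_path = {x for path in paths for x in path}
    ll = sum(math.log(p_c[x]) for x in p_c if x not in on_path)
    for path in paths:
        ll += math.log(p_b[path[0]]) + math.log(p_d[path[-1]])
        ll += sum(math.log(f[(a, b)]) for a, b in zip(path, path[1:]))
    return ll

def flow_cost(paths):
    """Total cost of sending one unit of flow along each path s -> ... -> t."""
    cost = 0.0
    for path in paths:
        cost += -math.log(p_b[path[0]])                                    # a_b
        cost += -math.log(p_d[path[-1]])                                   # a_d
        cost += sum(-math.log(f[(a, b)]) for a, b in zip(path, path[1:]))  # a_t
        cost += sum(math.log(p_c[x]) for x in path)                        # a_c
    return cost

const = sum(math.log(p) for p in p_c.values())
for paths in ([], [[0, 1, 2]], [[0, 2]], [[0], [1, 2]]):
    # cost = -log L + sum_X log p_c(X): minimising cost maximises the likelihood.
    assert math.isclose(flow_cost(paths), -log_likelihood(paths) + const, abs_tol=1e-9)
```

Because the offset is the same for every hypothesis, the minimum-cost set of paths is also the maximum-likelihood one under this assumed model.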

  15. Minimum cost flow. Minimum cost flow algorithms can therefore solve this problem:
     ◮ Optimal minimum-cost flow: Edmonds-Karp algorithm, asymptotic time complexity $O(|V||A|^2)$.
     ◮ Or use an inexact (greedy) algorithm: $O(|V||A|)$ or lower.
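The greedy alternative can be sketched as follows: repeatedly find the cheapest source-to-sink path through the still-unused observations and accept it while its total cost is negative, i.e. while adding that path still improves the likelihood. Because transition arcs only point forward in time, the graph is a DAG and each cheapest path takes one dynamic-programming pass over the nodes and arcs. This is one plausible greedy variant under those assumptions, not necessarily the algorithm used in the talk; the name `greedy_paths` and the cost values are hypothetical.

```python
import math

def greedy_paths(n, a_b, a_d, a_c, a_t):
    """Greedy multi-path extraction on a time-ordered DAG.

    n:      number of observations, indexed in time order 0..n-1
    a_b[i]: birth cost (arc s -> i)
    a_d[i]: death cost (arc i -> t)
    a_c[i]: cost of routing flow through observation i
    a_t:    dict {(i, j): cost} of transition arcs, assumed to have i < j
    """
    preds = {j: [] for j in range(n)}
    for i, j in a_t:
        preds[j].append(i)
    used, paths = set(), []
    while True:
        # best[j]: cheapest cost of a path s -> ... -> j (including a_c[j]).
        best, prev = [math.inf] * n, [None] * n
        for j in range(n):
            if j in used:
                continue
            best[j] = a_b[j] + a_c[j]
            for i in preds[j]:
                if i in used:
                    continue
                cand = best[i] + a_t[(i, j)] + a_c[j]
                if cand < best[j]:
                    best[j], prev[j] = cand, i
        # Close each candidate path with its death arc and take the cheapest.
        totals = [best[j] + a_d[j] for j in range(n)]
        end = min(range(n), key=lambda j: totals[j], default=None)
        if end is None or totals[end] >= 0:
            break  # no remaining path lowers the total cost
        path = [end]
        while prev[path[-1]] is not None:
            path.append(prev[path[-1]])
        path.reverse()
        paths.append(path)
        used.update(path)
    return paths

# Two interleaved hypothetical streams: 0 -> 2 and 1 -> 3.
a_b = [0.5, 0.5, 3.0, 3.0]
a_d = [3.0, 3.0, 0.5, 0.5]
a_c = [-2.0, -2.0, -2.0, -2.0]
a_t = {(0, 2): 0.3, (1, 3): 0.3, (0, 1): 4.0, (2, 3): 4.0, (0, 3): 4.0, (1, 2): 4.0}
print(greedy_paths(4, a_b, a_d, a_c, a_t))  # [[0, 2], [1, 3]]
```

Each accepted path is final, so an early choice is never revised; that is what makes the method inexact, unlike an optimal min-cost-flow solver that can reroute flow.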

  16. Multiple birdsong tracking: synthetic example. [Figure: synthetic examples from three generators (locked, coherent, segregated). For each generator, panels show the clean signal, the signal in noise, and the results labelled “inferred (coherent)” and “inferred (segregated)”, each annotated with an LR value.]

  17. Birdsong experiment. 25 European recordings of Chiffchaff (source: Xeno Canto); mixtures of 2–5 recordings; 5-fold crossvalidation. Can it cluster the “syllables” in the same way as the source audio?

  18. Data preparation. Syllables detected by spectrogram cross-correlation. [Figure: a short template spectrogram (about 3.1 to 7.4 kHz, 0 to 0.17 s) and a 25 s recording spectrogram (0 to 10 kHz); panel labels include “Template” and “XC25760-dn.xcor”.]
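Template matching of this kind can be sketched with NumPy alone: compute a magnitude spectrogram, slide a template spectrogram along the time axis, score each offset with normalised cross-correlation, and keep well-separated peaks above a threshold. The frame sizes, the 0.9 threshold and the function names are assumptions for illustration; the slide does not give the detector's settings.

```python
import numpy as np

def magnitude_spectrogram(x, n_fft=512, hop=256):
    """|STFT| with a Hann window; rows are frequency bins, columns are frames."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[k * hop:k * hop + n_fft] * window for k in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def xcorr_detect(spec, template, threshold=0.9, min_gap=None):
    """Return frame offsets where the template matches the spectrogram."""
    n_bins, width = template.shape
    assert spec.shape[0] == n_bins, "template and spectrogram need the same bins"
    t = template - template.mean()
    t_norm = np.sqrt((t * t).sum())
    scores = np.zeros(spec.shape[1] - width + 1)
    for k in range(len(scores)):
        w = spec[:, k:k + width]
        w = w - w.mean()
        # Normalised cross-correlation: close to 1.0 for a perfect (affine) match.
        scores[k] = (w * t).sum() / (np.sqrt((w * w).sum()) * t_norm + 1e-12)
    min_gap = width if min_gap is None else min_gap
    # Greedy non-maximum suppression: strongest peaks first, none closer than min_gap.
    picks = []
    for k in np.argsort(scores)[::-1]:
        if scores[k] < threshold:
            break
        if all(abs(int(k) - p) >= min_gap for p in picks):
            picks.append(int(k))
    return sorted(picks)

# Synthetic check: a random 16-bin, 8-frame template pasted at frames 10 and 40.
rng = np.random.default_rng(1)
template = rng.random((16, 8))
spec = np.zeros((16, 60))
spec[:, 10:18] = template
spec[:, 40:48] = template
print(xcorr_detect(spec, template))  # [10, 40]
```

On real recordings `spec` would come from `magnitude_spectrogram` and the template from an example syllable cut from the same kind of spectrogram.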

  19. Results. [Figure: $F_{\text{trans}}$ (0 to 1) against the number of signals in the mixture (1 to 5) for six conditions: ideal recovery, trained on test data; ideal recovery; ideal recovery plus synthetic noise; recovery from audio; recovery from audio (greedy); recovery from audio (baseline).] Means and standard errors are shown (5-fold crossvalidation).
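The slide reports $F_{\text{trans}}$ without defining it. A natural reading, assumed for this sketch, is an F-measure over transitions: each stream contributes the ordered pairs of consecutive syllables it links, and the inferred pairs are scored against the ground-truth pairs. The definition and the function names `transitions` and `f_trans` are assumptions, not taken from the talk.

```python
def transitions(streams):
    """Set of (previous, next) syllable-index pairs linked within each stream."""
    return {(a, b) for s in streams for a, b in zip(s, s[1:])}

def f_trans(reference, inferred):
    """F-measure of inferred transitions against reference transitions."""
    ref_t, inf_t = transitions(reference), transitions(inferred)
    hits = len(ref_t & inf_t)
    if hits == 0:
        return 0.0
    precision = hits / len(inf_t)
    recall = hits / len(ref_t)
    return 2 * precision * recall / (precision + recall)

reference = [[0, 2, 4], [1, 3]]  # two birds, syllables indexed in time order
inferred = [[0, 2], [1, 3, 4]]   # syllable 4 attached to the wrong bird
print(f_trans(reference, inferred))
```

Here two of the three inferred links are correct and two of the three true links are found, so precision and recall are both 2/3 and so is the F-measure.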

  21. Representing fine modulations. Many (song)birds use very rapid frequency modulation (FM):
     ◮ Songbirds can perceive fine detail of FM (Dooling et al. 2002, Lohr et al. 2006).
     ◮ FM detail can affect behavioural responses (Trillo et al. 2005, de Kort et al. 2009).
   Yet standard representations assume local stationarity (i.e. signal parameters that do not change) at fine timescales:
     ◮ Fourier transform magnitudes (spectrograms, MFCCs)
     ◮ Linear prediction (LPC)
   Detail at timescales below 20 ms is therefore likely to be smeared or discarded.
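A quick way to see the smearing is to compare a steady tone with a rapid linear chirp inside one 20 ms analysis frame. The tone's energy stays in a few FFT bins, while the chirp's is spread over dozens, so the magnitude spectrum records a broad band rather than the fast trajectory inside the frame. The sample rate, the frequencies and the 90% energy criterion are arbitrary choices for illustration.

```python
import numpy as np

fs = 44100                  # sample rate (Hz), assumed
n = fs // 50                # one 20 ms frame = 882 samples
t = np.arange(n) / fs
window = np.hanning(n)

def bins_for_energy(x, fraction=0.9):
    """Smallest number of FFT bins holding `fraction` of the windowed frame's energy."""
    energy = np.abs(np.fft.rfft(x * window)) ** 2
    cumulative = np.cumsum(np.sort(energy)[::-1])
    return int(np.searchsorted(cumulative, fraction * cumulative[-1]) + 1)

# Stationary signal: a 4 kHz tone.
tone = np.sin(2 * np.pi * 4000 * t)
# Rapid FM: a linear sweep from 2 kHz to 6 kHz within the same 20 ms frame.
f0, f1, T = 2000.0, 6000.0, n / fs
chirp = np.sin(2 * np.pi * (f0 * t + (f1 - f0) * t ** 2 / (2 * T)))

print(bins_for_energy(tone), bins_for_energy(chirp))
```

Both frames have a similar total energy; only the magnitude spectrum of the chirp is broad, and nothing in it records the direction or shape of the sweep.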