DRUM TRANSCRIPTION VIA JOINT BEAT AND DRUM MODELING USING CONVOLUTIONAL RNNs


  1. DRUM TRANSCRIPTION VIA JOINT BEAT AND DRUM MODELING USING CONVOLUTIONAL RNNs
     Richard Vogl 1,2, Matthias Dorfer 2, Gerhard Widmer 2, Peter Knees 1
     richard.vogl@tuwien.ac.at, matthias.dorfer@jku.at, gerhard.widmer@jku.at, peter.knees@tuwien.ac.at

  2. WHAT IS DRUM TRANSCRIPTION?
     Input: Western popular music containing drums
     Output: symbolic representation of notes played by drum instruments

  3. WHAT IS DRUM TRANSCRIPTION?
     Focus on the three major drum instruments:
     ‣ bass or kick drum (KD)
     ‣ snare drum (SD)
     ‣ hi-hat (HH)
     [Figure labels: SD, HH, KD]
     Reasons:
     ‣ Dominant instruments: most onsets
     ‣ Common subset for public datasets

  7. SYSTEM OVERVIEW
     [Figure: processing pipeline. audio → signal preprocessing → spectrogram (f [Hz] over t [s]) → NN (feature extraction, event detection) → activation functions (over t [s]) → peak picking (classification) → events. NN training feeds into the NN block.]
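
The last stage of the pipeline above turns per-instrument activation functions into discrete events. The following is a minimal sketch of one possible peak picker, not the one used in the presentation: the function name `pick_peaks`, the threshold of 0.5, the frame rate of 100 frames per second, and the toy activation values are all assumptions made for illustration.

```python
import numpy as np

def pick_peaks(activation, threshold=0.5, fps=100.0):
    """Return times (in seconds) of local maxima at or above `threshold`.

    `threshold` and `fps` are illustrative assumptions, not values from the slides.
    """
    act = np.asarray(activation, dtype=float)
    # Pad with -inf so the first and last frames can be compared with neighbours.
    padded = np.concatenate(([-np.inf], act, [-np.inf]))
    is_peak = (
        (act >= threshold)
        & (act > padded[:-2])   # strictly larger than the previous frame
        & (act >= padded[2:])   # at least as large as the next frame
    )
    return np.flatnonzero(is_peak) / fps

# Toy activation functions for the three instrument classes (made-up numbers).
activations = {
    "KD": [0.0, 0.1, 0.9, 0.2, 0.0, 0.0, 0.8, 0.1],
    "SD": [0.0, 0.0, 0.1, 0.0, 0.7, 0.3, 0.0, 0.0],
    "HH": [0.6, 0.1, 0.6, 0.1, 0.6, 0.1, 0.6, 0.1],
}
events = {name: pick_peaks(act).tolist() for name, act in activations.items()}
print(events)
```

A fixed threshold is the simplest choice; a real system would likely tune it, or use an adaptive one, on validation data.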

  17. ISSUES OF CURRENT SYSTEMS
      Performance is not satisfactory on real music
      Do not produce additional information for transcripts (drum onset detection vs. drum transcription):
      ‣ bar lines
      ‣ tempo
      ‣ meter
      ‣ dynamics / accents
      ‣ stroke / playing technique
      Only three instrument classes
      etc.

  21. ADDITIONAL INFORMATION FOR TRANSCRIPTS ✔
      Use beat and downbeat tracking to get:
      ‣ bar lines
      ‣ tempo
      ‣ meter
      [Figure: HH, SD, and KD onsets over time t, with beats labeled 2 3 4 1 2 3 4 1]
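
How beat and downbeat positions yield bar lines, tempo, and meter can be sketched as follows. This is only an illustration under stated assumptions: the helper name `describe_beats` and the beat times are hypothetical, the beat numbers are the ones shown in the figure (2 3 4 1 2 3 4 1), and the tempo estimate assumes a roughly constant tempo.

```python
import numpy as np

def describe_beats(beat_times, beat_numbers):
    """Derive tempo, bar lines, and beats per bar from tracked beats.

    beat_times:   beat positions in seconds (assumed to come from a beat tracker)
    beat_numbers: position of each beat inside its bar (1 marks a downbeat)
    """
    beat_times = np.asarray(beat_times, dtype=float)
    beat_numbers = np.asarray(beat_numbers, dtype=int)
    # Tempo: 60 seconds divided by the median inter-beat interval.
    tempo_bpm = 60.0 / np.median(np.diff(beat_times))
    # Bar lines: the times of all beats labeled 1 (downbeats).
    bar_lines = beat_times[beat_numbers == 1]
    # Meter numerator: the highest beat number observed.
    beats_per_bar = int(beat_numbers.max())
    return tempo_bpm, bar_lines, beats_per_bar

beat_times = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]   # hypothetical, in seconds
beat_numbers = [2, 3, 4, 1, 2, 3, 4, 1]                  # as labeled in the figure
tempo_bpm, bar_lines, beats_per_bar = describe_beats(beat_times, beat_numbers)
print(tempo_bpm, bar_lines, beats_per_bar)
```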

  22. IMPROVE PERFORMANCE
      Three components to reach this goal:
      1. Leverage beat information
      2. Better model for drum detection
      3. Dataset with real music for training

  26. 1. LEVERAGE BEAT INFORMATION
      [Figure: HH, SD, and KD onsets over time t, with beats labeled 2 3 4 1 2 3 4 1]
      Beats are highly correlated with drum patterns
      Assume that prior knowledge of beats is helpful for drum transcription (drum hit locations / repetitive patterns)
      Use multi-task learning for beats and drums

  33. MULTI-TASK LEARNING
      [Figure: network input (f [Hz] over t [s]) and output (over t [s])]
      Three experiments:
      ‣ Training on drum targets (DT)
      ‣ Training on drum targets with annotated beats as additional input features (BF)
      ‣ Training on drum and beat targets as multi-task problem (MT)
      Expected increase in performance for BF compared to DT
      Expected increase in performance for MT compared to DT
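
The three experiments differ only in what the network receives as input and what it is trained to predict. The sketch below lays out the array shapes for DT, BF, and MT. Everything numeric is an assumption made for illustration: the frame rate, the number of frequency bins, the random placeholder spectrogram, the onset and beat times, and the choice of two beat-related channels (beat and downbeat).

```python
import numpy as np

N_FRAMES, N_BINS, FPS = 400, 84, 100   # assumed sizes, not from the slides

def onset_targets(times, n_frames=N_FRAMES, fps=FPS):
    """Binary frame-level target: 1.0 at the frame nearest to each time."""
    target = np.zeros(n_frames, dtype=np.float32)
    idx = np.round(np.asarray(times, dtype=float) * fps).astype(int)
    target[idx[(idx >= 0) & (idx < n_frames)]] = 1.0
    return target

rng = np.random.default_rng(0)
spectrogram = rng.random((N_FRAMES, N_BINS)).astype(np.float32)  # placeholder features

# Hypothetical annotations in seconds.
kd, sd, hh = [0.0, 1.0, 2.0, 3.0], [0.5, 1.5, 2.5, 3.5], np.arange(0, 4, 0.25)
beat_times, downbeat_times = np.arange(0, 4, 0.5), [0.0, 2.0]

drums = np.stack([onset_targets(t) for t in (kd, sd, hh)], axis=1)          # (frames, 3)
beats = np.stack([onset_targets(beat_times),
                  onset_targets(downbeat_times)], axis=1)                    # (frames, 2)

# (network input, training target) for each experiment.
setups = {
    "DT": (spectrogram, drums),                                   # drums only
    "BF": (np.concatenate([spectrogram, beats], axis=1), drums),  # beats as extra input features
    "MT": (spectrogram, np.concatenate([drums, beats], axis=1)),  # beats as extra targets
}
for name, (x, y) in setups.items():
    print(name, x.shape, y.shape)
```

Note that BF needs beat annotations at prediction time, whereas MT only needs them during training.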

  38. 2. NETWORK MODELS — BASELINE MODELS
      Recurrent neural networks
      ‣ Recurrent connections act as memory
      ‣ Processing of sequential data
      ‣ Work well for drum detection and beat tracking [Böck et al. ISMIR’16]
      RNN with label time shift (tsRNN)
      ‣ State-of-the-art baseline [Vogl et al. ICASSP’17]
      Bidirectional recurrent NN (BDRNN) [Vogl et al. ISMIR’16] [Southall et al. ISMIR’16]
      ‣ Similar performance to tsRNN
      [Figure: RNN train data sample]

  41. 2. NETWORK MODELS — NEW FOR DT
      Convolutional NN (CNN)
      ‣ Convolutions capture local correlations
      ‣ Acoustic modeling of drum sounds
      Convolutional BDRNN (CRNN)
      ‣ “Best of both worlds”
      ‣ Low-level CNN for acoustic modeling
      ‣ Higher-level RNN for repetitive pattern modeling
      [Figure: CRNN train data sample]

  42. NETWORK MODELS
      Model       Frames   Context   Conv. Layers   Rec. Layers   Dense Layers
      BDRNN (S)   100      —         —              2x50 GRU      —
      BDRNN (L)   400      —         —              3x30 GRU      —
      CNN (S)     —        9         (*)            —             2x256
      CNN (L)     —        25        (*)            —             2x256
      CRNN (S)    100      9         (*)            2x50 GRU      —
      CRNN (L)    400      13        (*)            3x60 GRU      —
      (*) 2 x 32 3x3 filt., 3x3 max pooling, 2 x 64 3x3 filt., 3x3 max pooling, all w/ batch norm.
      tsRNN: state-of-the-art baseline [Vogl et al. ICASSP’17]
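
For experimentation, the model table can be captured as plain configuration data. The sketch below is one hypothetical encoding: the class name `ModelConfig`, its field names, and the reading of entries such as "2x50 GRU" as (layers, units per layer) are assumptions; the numbers themselves are copied from the slide.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Convolutional stack as listed on the slide, shared by the CNN and CRNN models.
CONV_STACK = ("2 x 32 3x3 filt.", "3x3 max pooling",
              "2 x 64 3x3 filt.", "3x3 max pooling", "batch norm.")

@dataclass(frozen=True)
class ModelConfig:
    name: str
    frames: Optional[int]             # "Frames" column
    context: Optional[int]            # "Context" column
    gru: Optional[Tuple[int, int]]    # "Rec. Layers", read as (layers, units): an assumption
    dense: Optional[Tuple[int, int]]  # "Dense Layers", read as (layers, units): an assumption

    @property
    def has_conv(self) -> bool:
        # In the table, exactly the models with a context value use the conv stack.
        return self.context is not None

MODELS = [
    ModelConfig("BDRNN (S)", 100, None, (2, 50), None),
    ModelConfig("BDRNN (L)", 400, None, (3, 30), None),
    ModelConfig("CNN (S)", None, 9, None, (2, 256)),
    ModelConfig("CNN (L)", None, 25, None, (2, 256)),
    ModelConfig("CRNN (S)", 100, 9, (2, 50), None),
    ModelConfig("CRNN (L)", 400, 13, (3, 60), None),
]

conv_models = [m.name for m in MODELS if m.has_conv]
recurrent_models = [m.name for m in MODELS if m.gru is not None]
print(conv_models)
print(recurrent_models)
```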

  46. CLASSIC DATASETS (ONLY DRUMS)
      IDMT-SMT-Drums [Dittmar and Gärtner 2014]
      ‣ Solo drum tracks, recorded, synthesized, and sampled
      ‣ 95 tracks, total: 24 min, onsets: 8004 + training samples
      ENST-Drums [Gillet and Richard 2006]
      ‣ Recordings, three drummers on different drum kits, optional accompaniment
      ‣ 64 tracks, total: 1 h, onsets: 22391 + training samples
