TOWARDS MULTI-INSTRUMENT DRUM TRANSCRIPTION

  1. TOWARDS MULTI-INSTRUMENT DRUM TRANSCRIPTION
     Richard Vogl (1, 2), Gerhard Widmer (2), Peter Knees (1)
     richard.vogl@tuwien.ac.at, gerhard.widmer@jku.at, peter.knees@tuwien.ac.at

  2. WHAT IS DRUM TRANSCRIPTION?
     Input: popular music containing drums
     Output: symbolic representation of notes played by drum instruments

  3. STATE OF THE ART
     Current state-of-the-art systems:
     ‣ End-to-end / activation-function-based approaches
     ‣ NN-based approaches and NMF approaches
     [Figure: spectrogram and activation functions for hi-hat, snare, and bass over t [ms]]
     Overview article: Wu, C.-W., Dittmar, C., Southall, C., Vogl, R., Widmer, G., Hockman, J., Müller, M., Lerch, A.: “A Review of Automatic Drum Transcription,” IEEE/ACM TASLP, vol. 26, no. 9, Sept. 2018.

  12. FOCUS OF THIS WORK
      State-of-the-art (SotA) works focus on bass drum (BD), snare drum (SD), and hi-hat (HH):
      ‣ They make up the majority of notes in datasets
      ‣ Beat-defining / most important
      ‣ Well-separated spectral energy distribution
      Other instruments are important! → Increase the number of instruments for drum transcription
      [Figure: drum kit with SD, HH, and BD marked; panels for bass drum, snare drum, and hi-hat]

  13. SYSTEM OVERVIEW
      Pipeline: audio → signal preprocessing → NN (feature extraction, event detection) → peak picking (classification) → events
      The NN is obtained by NN training on the train data.
      [Figure: waveform (A over t [s]) → spectrogram (f [Hz] over t [s]) → activation functions → detected peaks; the last two are shown for hi-hat, snare, and bass]
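
The last stage of the pipeline, peak picking on an activation function, can be sketched as follows. This is a minimal illustration, not the peak picker used by the authors: the threshold, the frame rate `fps`, and the minimum gap between peaks are hypothetical parameters chosen for the example.

```python
def pick_peaks(activation, threshold=0.5, fps=100, min_gap=2):
    """Return onset times in seconds for one instrument's activation function.

    A frame counts as a peak if it reaches `threshold` and is the maximum of
    its 3-frame neighbourhood. `threshold`, `fps`, and `min_gap` are
    illustrative assumptions, not values taken from the slides.
    """
    peaks = []
    for n, value in enumerate(activation):
        lo, hi = max(0, n - 1), min(len(activation), n + 2)
        is_local_max = value == max(activation[lo:hi])
        if value >= threshold and is_local_max:
            # Skip frames that follow the previous peak too closely (plateaus).
            if not peaks or n - peaks[-1] >= min_gap:
                peaks.append(n)
    return [p / fps for p in peaks]


# Toy activation function for a single instrument (e.g. snare).
snare_activation = [0.0, 0.1, 0.9, 0.2, 0.0, 0.3, 0.8, 0.7, 0.0]
snare_onsets = pick_peaks(snare_activation)
```

Running the same function once per instrument activation yields one onset list per drum instrument, which together form the symbolic output.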

  16. NETWORK ARCHITECTURES
      Convolutional NN (CNN)
      ‣ Convolutions capture local correlations
      ‣ Acoustic modeling of drum sounds
      Convolutional RNN (CRNN)
      ‣ “Best of both worlds”
      ‣ Low-level CNN for acoustic modeling
      ‣ Higher-level RNN for repetitive pattern modeling
      [Figure: train data sample for the CRNN]

  17. NETWORK ARCHITECTURES
      Training setup: early stopping, batch normalization, L2 norm, dropout (30%), ADAM optimizer
      Layer stacks, input to output (BD GRU = bidirectional GRU):
      CNN                               CRNN
      2 x conv: 32 x 3x3 (batch norm)   2 x conv: 32 x 3x3 (batch norm)
      max pool: 1x3                     max pool: 1x3
      2 x conv: 64 x 3x3 (batch norm)   2 x conv: 64 x 3x3 (batch norm)
      max pool: 1x3                     max pool: 1x3
      2 x dense: 256                    3 x RNN: 50 BD GRU
      Summary:
             frames   context   conv. layers       rec. layers     dense layers
      CNN    -        25        see layer stacks   -               2 x 256
      CRNN   400      13        see layer stacks   3 x 50 BD GRU   -
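
To see how the shared convolutional stack changes the input size, the shapes can be traced by hand. The sketch below rests on several assumptions that the slide does not state: "same" padding for the 3x3 convolutions, the 1x3 pooling acting on the frequency axis only, and a hypothetical input of 84 frequency bins.

```python
def conv_same(shape, channels):
    """3x3 convolution; 'same' padding is assumed, so time and frequency sizes are kept."""
    frames, bins, _ = shape
    return (frames, bins, channels)


def max_pool_1x3(shape):
    """1x3 max pooling, assumed to shrink only the frequency axis by a factor of 3."""
    frames, bins, channels = shape
    return (frames, bins // 3, channels)


def trace_conv_stack(frames, bins):
    """Trace (frames, bins, channels) through 2 x conv 32, pool, 2 x conv 64, pool."""
    shape = (frames, bins, 1)
    for channels in (32, 64):
        shape = conv_same(shape, channels)
        shape = conv_same(shape, channels)
        shape = max_pool_1x3(shape)
    return shape


# CNN: 25 context frames; 84 bins is a hypothetical input size.
cnn_shape = trace_conv_stack(frames=25, bins=84)
# CRNN: 13 context frames per time step (same hypothetical bin count).
crnn_shape = trace_conv_stack(frames=13, bins=84)
```

Under these assumptions the time axis is left untouched by the convolutional stack, so the dense layers (CNN) or recurrent layers (CRNN) still see every context frame.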

  23. DATASETS
      ENST-Drums [Gillet and Richard 2006]
      ‣ Recordings, three drummers / drum kits
      ‣ 64 tracks, total duration: 1h
      MDB Drums [Southall et al. 2017]
      ‣ Drum annotations for a MedleyDB subset
      ‣ 23 tracks, total duration: 20m
      RBMA13-Drums [Vogl et al. 2017]
      ‣ Music from the 2013 Red Bull Music Academy, different styles
      ‣ 27 tracks, total duration: 1h 43m

  26. DATASETS
      Instrument classes for the 3-, 8-, and 18-class settings (a coarse label is printed once per group of 18-class rows, as on the slide):
      number of classes
      3    8    18    instrument name
      BD   BD   BD    bass drum
      SD   SD   SD    snare drum
                SS    side stick
                CLP   hand clap
                HT    high tom
           TT   MT    mid tom
                LT    low tom
                CHH   closed hi-hat
      HH   HH   PHH   pedal hi-hat
                OHH   open hi-hat
                TB    tambourine
           RD   RD    ride cymbal
                RB    ride bell
           BE   CB    cowbell
                CRC   crash cymbal
           CY   SPC   splash cymbal
                CHC   Chinese cymbal
           CL   CL    clave/sticks
      [Figure: relative frequency of instrument onsets for the 3-, 8-, and 18-class settings]
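
The class table can be read as a mapping from the 18 fine-grained labels to the coarser label sets. The sketch below is one plausible reading, not the authors' code: it assumes that side stick and hand clap fold into SD and that tambourine folds into HH, which the flattened slide text does not make explicit, and it assumes that the 3-class setting simply drops every onset outside BD, SD, and HH.

```python
# 18-class label -> 8-class label.
TO_8_CLASSES = {
    "BD": "BD",
    "SD": "SD", "SS": "SD", "CLP": "SD",                 # SS/CLP -> SD is an assumption
    "HT": "TT", "MT": "TT", "LT": "TT",
    "CHH": "HH", "PHH": "HH", "OHH": "HH", "TB": "HH",   # TB -> HH is an assumption
    "RD": "RD",
    "RB": "BE", "CB": "BE",
    "CRC": "CY", "SPC": "CY", "CHC": "CY",
    "CL": "CL",
}
THREE_CLASSES = {"BD", "SD", "HH"}


def reduce_labels(events, n_classes):
    """Map (time, 18-class label) events to the 18-, 8-, or 3-class setting.

    For 3 classes, onsets outside BD/SD/HH are dropped (an assumption).
    """
    if n_classes not in (3, 8, 18):
        raise ValueError("n_classes must be 3, 8, or 18")
    if n_classes == 18:
        return list(events)
    reduced = []
    for time, label in events:
        coarse = TO_8_CLASSES[label]
        if n_classes == 8 or coarse in THREE_CLASSES:
            reduced.append((time, coarse))
    return reduced


events = [(0.0, "CHH"), (0.5, "SS"), (1.0, "CRC")]
eight = reduce_labels(events, 8)
three = reduce_labels(events, 3)
```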

  33. SYNTHETIC DATASET (new)
      Synthetic dataset from MIDI songs
      ‣ Mix of different genres, full songs
      ‣ Optional accompaniment
      ‣ Diverse drum sounds (57 different drum kits, acoustic and electronic)
      ‣ Varying quality, no vocals!
      ‣ 4197 tracks, total duration: 259h

  38. SYNTHETIC DATASET
      Follows the same relative instrument distribution:
      − Same bias for instruments, same problems during training
      + Datasets are representative samples
      [Figure: relative frequency of instrument onsets for the 3-, 8-, and 18-class settings]

  42. BALANCING OF SYNTHETIC DATASET
      ‣ Swap instruments for individual tracks
      ‣ Artificial balancing of instrument distribution
      [Figure: relative frequency of instrument onsets for the 3-, 8-, and 18-class settings]
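
The effect of swapping instruments on the onset distribution can be illustrated on symbolic data. This is a toy sketch under stated assumptions: tracks are represented as lists of hypothetical (time, label) pairs, and the choice of which instruments to swap in which track is made by hand here, since the slide does not describe the selection strategy. For a synthetic dataset, the swap would presumably be applied to the MIDI data before rendering so that audio and annotations stay in agreement.

```python
from collections import Counter


def relative_frequencies(tracks):
    """Relative frequency of onsets per instrument label over all tracks."""
    counts = Counter(label for track in tracks for _, label in track)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def swap_instruments(track, a, b):
    """Return a copy of the track with instrument labels a and b exchanged."""
    swap = {a: b, b: a}
    return [(time, swap.get(label, label)) for time, label in track]


# Two toy tracks dominated by closed hi-hat (CHH), with few cowbell (CB) onsets.
tracks = [
    [(0.0, "CHH"), (0.5, "CHH"), (1.0, "CHH"), (1.5, "CB")],
    [(0.0, "CHH"), (0.5, "CHH"), (1.0, "CHH"), (1.5, "CB")],
]
before = relative_frequencies(tracks)

# Swap the frequent and the rare instrument in the second track only.
balanced = [tracks[0], swap_instruments(tracks[1], "CHH", "CB")]
after = relative_frequencies(balanced)
```

In this toy case the share of cowbell onsets rises from 25% to 50% while the total number of onsets stays the same.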
