TOWARDS MULTI-INSTRUMENT DRUM TRANSCRIPTION

Richard Vogl 1,2, Gerhard Widmer 2, Peter Knees 1
richard.vogl@tuwien.ac.at, gerhard.widmer@jku.at, peter.knees@tuwien.ac.at

WHAT IS DRUM TRANSCRIPTION?
Input: popular music containing drums
Output: symbolic representation of notes played by drum instruments

STATE OF THE ART
Current state-of-the-art systems:
‣ End-to-end / activation-function-based approaches
‣ NN-based approaches and NMF approaches
[Figure: spectrogram and the resulting activation functions for hi-hat, snare, and bass over time t [ms]]
Overview article: Wu, C.-W., Dittmar, C., Southall, C., Vogl, R., Widmer, G., Hockman, J., Müller, M., Lerch, A.: “An Overview of Automatic Drum Transcription,” IEEE TASLP, vol. 26, no. 9, Sept. 2018.

FOCUS OF THIS WORK
State-of-the-art (SotA) works focus on bass drum (BD), snare drum (SD), and hi-hat (HH):
‣ Make up the majority of notes in datasets
‣ Beat-defining / most important
‣ Well-separated spectral energy distribution
[Figure: instruments labeled SD, HH, and BD; panels: bass drum, snare drum, hi-hat]
Other instruments are important! → Increase the number of instruments for drum transcription

SYSTEM OVERVIEW
‣ Signal preprocessing: audio waveform → feature extraction → spectrogram
‣ Event detection: spectrogram → NN (obtained by NN training on train data) → activation functions
‣ Classification: activation functions → peak picking → detected peaks (events)
[Figure: waveform, spectrogram (f [Hz] over t [s]), activation functions, and detected peaks for hi-hat, snare, and bass over t [s]]
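The last stage of the pipeline, peak picking on the per-instrument activation functions, can be sketched as follows. This is a minimal sketch and not the authors' implementation: the threshold, the neighbourhood size `window`, the frame rate of 100 frames per second, and the names `pick_peaks` and `transcribe` are assumptions chosen for illustration.

```python
def pick_peaks(activation, threshold=0.5, window=2, frame_rate=100.0):
    """Return onset times in seconds for one activation function.

    A frame counts as a peak if it reaches `threshold` and is the maximum
    of its neighbourhood of +/- `window` frames (all values are assumptions).
    """
    peak_frames = []
    for n, value in enumerate(activation):
        if value < threshold:
            continue
        lo = max(0, n - window)
        hi = min(len(activation), n + window + 1)
        if value < max(activation[lo:hi]):
            continue  # a larger value exists nearby
        if peak_frames and n - peak_frames[-1] <= window:
            continue  # plateau or too close to the previous peak
        peak_frames.append(n)
    return [n / frame_rate for n in peak_frames]


def transcribe(activations, **kwargs):
    """Apply peak picking to every instrument's activation function."""
    return {name: pick_peaks(act, **kwargs) for name, act in activations.items()}


# Toy activation functions (made-up numbers, one value per frame).
example_activations = {
    "bass": [0.0, 0.1, 0.9, 0.2, 0.0, 0.0, 0.7, 0.8, 0.1, 0.0],
    "snare": [0.0, 0.0, 0.0, 0.0, 0.6, 0.1, 0.0, 0.0, 0.0, 0.0],
    "hi-hat": [0.2] * 10,
}
example_events = transcribe(example_activations)
```

With these toy values, `example_events` holds two bass onsets, one snare onset, and no hi-hat onset, since the hi-hat activation never reaches the threshold.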

NETWORK ARCHITECTURES
Convolutional NN (CNN)
‣ Convolutions capture local correlations
‣ Acoustic modeling of drum sounds
Convolutional RNN (CRNN)
‣ “Best of both worlds”
‣ Low-level CNN for acoustic modeling
‣ Higher-level RNN for repetitive pattern modeling
[Figure: CRNN train data sample]

NETWORK ARCHITECTURES
Layer stacks:

CNN                                CRNN
2 x conv: 32 x 3x3 (batch norm)    2 x conv: 32 x 3x3 (batch norm)
max pool: 1x3                      max pool: 1x3
2 x conv: 64 x 3x3 (batch norm)    2 x conv: 64 x 3x3 (batch norm)
max pool: 1x3                      max pool: 1x3
2 x dense: 256                     3 x RNN: 50 BD GRU

Training:
‣ Early stopping
‣ Batch normalization
‣ L2 norm
‣ Dropout (30%)
‣ ADAM optimizer

        frames   context   conv. layers       rec. layers      dense layers
CNN     -        25        see layer stacks   -                2x256
CRNN    400      13        see layer stacks   3 x 50 BD GRU    -

(BD GRU: bidirectional GRU)
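As a worked example of what the shared convolutional stack implies, the number of trainable convolution weights can be counted directly from the layer sizes on the slide. This is a rough sketch under stated assumptions: a single-channel spectrogram input, one bias per filter, and no batch-norm parameters counted; none of these details are given on the slide, and the names `CONV_STACK` and `conv_weight_count` are hypothetical.

```python
# Convolutional stack shared by CNN and CRNN, as listed on the slide:
# 2 x conv 32 (3x3), max pool 1x3, 2 x conv 64 (3x3), max pool 1x3.
CONV_STACK = [
    ("conv", 32), ("conv", 32), ("pool", None),
    ("conv", 64), ("conv", 64), ("pool", None),
]


def conv_weight_count(stack, in_channels=1, kernel=(3, 3)):
    """Count kernel weights plus biases of all conv layers in `stack`.

    Assumes a single-channel input and one bias per filter; pooling
    layers have no weights and batch-norm parameters are ignored.
    """
    total = 0
    channels = in_channels
    for kind, filters in stack:
        if kind != "conv":
            continue
        total += filters * (kernel[0] * kernel[1] * channels + 1)
        channels = filters
    return total


n_conv_weights = conv_weight_count(CONV_STACK)  # 320 + 9248 + 18496 + 36928
```

Under these assumptions the four convolution layers contribute 64992 weights in total; the dense or recurrent layers on top depend on the input size, which the slide does not state.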

DATASETS
ENST-Drums [Gillet and Richard 2006]
‣ Recordings, three drummers / drum kits
‣ 64 tracks, total duration: 1h
MDB Drums [Southall et al. 2017]
‣ Drum annotations for a MedleyDB subset
‣ 23 tracks, total duration: 20m
RBMA13-Drums [Vogl et al. 2017]
‣ Music from the 2013 Red Bull Music Academy, different styles
‣ 27 tracks, total duration: 1h 43m

DATASETS
Instrument labels by number of classes:

3     8     18    instrument name
BD    BD    BD    bass drum
SD    SD    SD    snare drum
            SS    side stick
            CLP   hand clap
            HT    high tom
      TT    MT    mid tom
            LT    low tom
            CHH   closed hi-hat
HH    HH    PHH   pedal hi-hat
            OHH   open hi-hat
            TB    tambourine
      RD    RD    ride cymbal
            RB    ride bell
      BE    CB    cowbell
            CRC   crash cymbal
      CY    SPC   splash cymbal
            CHC   Chinese cymbal
      CL    CL    clave/sticks

[Figure: relative frequency of instrument onsets for 3, 8, and 18 classes]
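The reduction from 18 labels to 8 or 3 classes can be written as a lookup table. The sketch below is an illustration only: the grouping is read off the row order of the table and is an assumption wherever the flattened layout is ambiguous (for example side stick and hand clap to SD, tambourine to HH), dropping instruments that have no 3-class counterpart is likewise an assumption, and the names `TO_8`, `THREE_CLASSES`, and `reduce_labels` are hypothetical.

```python
# Assumed mapping from the 18 fine-grained labels to the 8-class labels.
TO_8 = {
    "BD": "BD",
    "SD": "SD", "SS": "SD", "CLP": "SD",
    "HT": "TT", "MT": "TT", "LT": "TT",
    "CHH": "HH", "PHH": "HH", "OHH": "HH", "TB": "HH",
    "RD": "RD",
    "RB": "BE", "CB": "BE",
    "CRC": "CY", "SPC": "CY", "CHC": "CY",
    "CL": "CL",
}
THREE_CLASSES = {"BD", "SD", "HH"}


def reduce_labels(notes, n_classes):
    """Map (time, label) notes with 18-class labels to 18, 8, or 3 classes.

    For 3 classes, notes outside BD/SD/HH are dropped (an assumption).
    """
    if n_classes not in (3, 8, 18):
        raise ValueError("n_classes must be 3, 8, or 18")
    if n_classes == 18:
        return list(notes)
    reduced = []
    for time, label in notes:
        coarse = TO_8[label]
        if n_classes == 3 and coarse not in THREE_CLASSES:
            continue
        reduced.append((time, coarse))
    return reduced


example_notes = [(0.0, "SS"), (0.5, "MT"), (1.0, "CB"), (1.5, "OHH")]
notes_8 = reduce_labels(example_notes, 8)
notes_3 = reduce_labels(example_notes, 3)
```

Keeping the annotations at the 18-label level and reducing them on demand lets the same dataset serve all three settings.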

SYNTHETIC DATASET (NEW!)
Synthetic dataset from MIDI songs
‣ Mix of different genres, full songs
‣ Optional accompaniment
‣ Diverse drum sounds (57 different drum kits, acoustic and electronic)
‣ Varying quality, no vocals!
‣ 4197 tracks, total duration: 259h

SYNTHETIC DATASET
Follows the same relative instrument distribution
− Same bias for instruments, same problems during training
+ Datasets are representative samples
[Figure: relative frequency of instrument onsets for 3, 8, and 18 classes]

BALANCING OF SYNTHETIC DATASET
‣ Swap instruments for individual tracks
‣ Artificial balancing of instrument distribution
[Figure: relative frequency of instrument onsets for 3, 8, and 18 classes]
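One way to picture balancing by swapping instruments is a greedy pass over the tracks that exchanges the currently most frequent and least frequent instrument within a track whenever this reduces the overall imbalance. The slides do not describe the actual procedure, so this is only a sketch under that assumption; the greedy rule, the imbalance measure (largest minus smallest onset count), and all function names are hypothetical.

```python
from collections import Counter


def global_counts(tracks):
    """Count onsets per instrument label over all tracks."""
    total = Counter()
    for notes in tracks:
        total.update(label for _, label in notes)
    return total


def swap_instruments(notes, a, b):
    """Return a copy of one track with labels `a` and `b` exchanged."""
    swap = {a: b, b: a}
    return [(time, swap.get(label, label)) for time, label in notes]


def imbalance(counts, labels):
    """Difference between the largest and smallest onset count."""
    values = [counts.get(label, 0) for label in labels]
    return max(values) - min(values)


def balance(tracks, labels):
    """Greedily swap frequent and rare instruments track by track."""
    tracks = [list(notes) for notes in tracks]
    for i, notes in enumerate(tracks):
        counts = global_counts(tracks)
        frequent = max(labels, key=lambda label: counts.get(label, 0))
        rare = min(labels, key=lambda label: counts.get(label, 0))
        candidate = swap_instruments(notes, frequent, rare)
        trial = tracks[:i] + [candidate] + tracks[i + 1:]
        # Keep the swap only if it makes the distribution more even.
        if imbalance(global_counts(trial), labels) < imbalance(counts, labels):
            tracks[i] = candidate
    return tracks


# Four toy tracks, each with four bass drum notes and one cowbell note.
one_track = [(0.0, "BD"), (1.0, "BD"), (2.0, "BD"), (3.0, "BD"), (0.5, "CB")]
example_tracks = [list(one_track) for _ in range(4)]
balanced_tracks = balance(example_tracks, ["BD", "CB"])
balanced_counts = global_counts(balanced_tracks)
```

In the toy example, swapping the two instruments in two of the four tracks moves the counts from 16 bass drum and 4 cowbell onsets to 10 each, while onset times and the total number of notes stay unchanged.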
