DRUM TRANSCRIPTION VIA JOINT BEAT AND DRUM MODELING USING - PowerPoint PPT Presentation

WHY IS CONTEXT RELEVANT?
Instruments from the same class often sound quite different (♫ snare drums)
Different instruments can sound similar (♫ crash vs. splash)
When humans transcribe drums:
‣ Function in a track is equally important (snare drum vs. backbeat)
‣ Inaudible onsets will be filled in if expected
Music Language Model

BASS DRUM OR LOW TOM?
♫ 1: bass drum
♫ 2: floor tom
♫ 3: ?
♫ 3 with context: bass drum

DATASETS
IDMT-SMT-Drums [Dittmar and Gärtner 2014] ♫
‣ Solo drum tracks, recorded, synthesized, and sampled
‣ 95 tracks, total: 24 min, onsets: 8004
‣ Evaluation set: SMT (simple!)
ENST-Drums [Gillet and Richard 2006] ♫ ♫
‣ Recordings, three drummers on different drum kits, optional accompaniment
‣ 64 tracks, total: 1 h, onsets: 22391
‣ Evaluation sets: ENST solo (harder!), ENST acc. (difficult!)

NETWORK MODELS

Architecture  Frames  Context  Conv. Layers  Rec. Layers  Dense Layers
RNN (S)       100     -        -             2x50 GRU     -
RNN (L)       400     -        -             3x30 GRU     -
CNN (S)       -       9        conv stack    -            2x256
CNN (L)       -       25       conv stack    -            2x256
CRNN (S)      100     9        conv stack    2x50 GRU     -
CRNN (L)      400     13       conv stack    3x60 GRU     -

Conv stack: 2 x 32 3x3 filt., 3x3 max pooling, 2 x 64 3x3 filt., 3x3 max pooling, all w/ batch norm.
Baseline: tsRNN [Vogl et al. ICASSP'17]
Training: early stopping, dropout, batch normalization, ADAM optimizer, L2 norm

RESULTS
[Bar chart: F-measure [%] (axis range 60 to 100) on SMT, ENST solo, and ENST with accompaniment for tsRNN, RNN (S), RNN (L), CNN (S), CNN (L), CRNN (S), and CRNN (L)]
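The F-measure plotted on this slide is the usual onset-detection score. As a rough illustration of how such a score can be computed for one instrument, here is a minimal sketch. The function name `onset_f_measure`, the greedy matching strategy, and the 20 ms tolerance are assumptions made for this example and are not taken from the slides.

```python
def onset_f_measure(reference, detected, tolerance=0.02):
    """Return (precision, recall, f_measure) for one instrument.

    `reference` and `detected` are lists of onset times in seconds.
    `tolerance` (assumed: 20 ms) is the maximum distance at which a
    detection may be matched to a reference onset.
    """
    reference = sorted(reference)
    detected = sorted(detected)
    i = j = true_positives = 0
    # Greedy one-to-one matching over the two sorted lists.
    while i < len(reference) and j < len(detected):
        diff = detected[j] - reference[i]
        if abs(diff) <= tolerance:
            true_positives += 1   # matched pair
            i += 1
            j += 1
        elif diff < 0:
            j += 1                # detection too early: false positive
        else:
            i += 1                # reference onset missed: false negative
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    ref = [0.5, 1.0, 1.5, 2.0]
    det = [0.51, 1.0, 1.7, 2.01]
    print(onset_f_measure(ref, det))
```

In practice an established evaluation library is usually preferable to a hand-written matcher; the sketch only shows what the number on the axis means.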

HOW DOES IT SOUND?
‣ "Punk" (MedleyDB): hi-hat, snare, bass ♫ ♫ ♫
‣ "Hendrix" (MedleyDB): hi-hat, snare, bass ♫ ♫ ♫
‣ "Alexa, play some music…": hi-hat, snare, bass ♫ ♫ ♫

PART 1: AUTOMATIC DRUM TRANSCRIPTION
Task Definition, Problem Modeling, Architectures
PART 2: MULTI-TASK LEARNING
Metadata for Transcripts

LIMITATIONS OF CURRENT SYSTEMS
Do not produce additional information for transcripts (drum onset detection vs. drum transcription):
‣ bar lines
‣ tempo
‣ meter
‣ dynamics / accents
‣ stroke / playing technique
Only three instrument classes
Richard Vogl, Gerhard Widmer, and Peter Knees, "Towards multi-instrument drum transcription," in Proc. 21st Intl. Conf. on Digital Audio Effects (DAFx18), Aveiro, Portugal, Sep. 2018.

ADDITIONAL INFORMATION FOR TRANSCRIPTS
[Figure: HH, SD, and BD onsets over time t, with beats numbered 1 2 3 4 | 1 2 3 4 and the meter 4/4 ✔]
Use beat and downbeat tracking to get:
‣ bar lines
‣ tempo
‣ meter
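To make the step from beat and downbeat tracking to bar lines, tempo, and meter concrete, here is a minimal sketch. It assumes the tracker output is a list of `(time_in_seconds, position_in_bar)` pairs in which position 1 marks a downbeat. That input format and the function name `describe_beats` are assumptions for this example, not something stated on the slide. Only the number of beats per bar is recovered; the note value of the beat (the lower number of the time signature) cannot be read off beat positions alone.

```python
from statistics import median


def describe_beats(beats):
    """Derive bar lines, tempo, and beats per bar from tracked beats.

    `beats` is a list of (time_in_seconds, position_in_bar) pairs,
    sorted by time, where position 1 marks a downbeat (assumed format).
    """
    times = [t for t, _ in beats]
    # Every downbeat starts a new bar, so its time is a bar line.
    bar_lines = [t for t, pos in beats if pos == 1]
    # Tempo from the median inter-beat interval (robust to outliers).
    intervals = [b - a for a, b in zip(times, times[1:])]
    tempo_bpm = 60.0 / median(intervals) if intervals else None
    # The highest beat position seen is the number of beats per bar.
    beats_per_bar = max((pos for _, pos in beats), default=None)
    return bar_lines, tempo_bpm, beats_per_bar


if __name__ == "__main__":
    # Two bars of four beats at 0.5 s spacing.
    tracked = [(i * 0.5, i % 4 + 1) for i in range(8)]
    print(describe_beats(tracked))
```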

LEVERAGE BEAT INFORMATION
[Figure: HH, SD, and BD onsets over time t, with beats numbered 2 3 4 1 2 3 4 1]
Beats are highly correlated with drum patterns (drum onset locations / repetitive patterns)
Assume that prior knowledge of beats is helpful for drum transcription
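One way to let a network use beat information together with drum onsets is to train it on both as frame-wise targets. The sketch below builds such targets from annotated event times. The function name `frame_targets`, the label names, and the frame rate of 100 frames per second are assumptions for illustration only; the slide does not specify how the targets are encoded.

```python
def frame_targets(events, n_frames, fps=100):
    """Turn event times into frame-wise binary target rows.

    `events` maps a label (e.g. "BD", "SD", "HH", "beat", "downbeat")
    to a list of times in seconds. `fps` is an assumed frame rate.
    Returns a dict mapping each label to a list of 0/1 of length n_frames.
    """
    targets = {}
    for label, times in events.items():
        row = [0] * n_frames
        for t in times:
            idx = int(round(t * fps))   # nearest frame index
            if 0 <= idx < n_frames:     # ignore events outside the excerpt
                row[idx] = 1
        targets[label] = row
    return targets


if __name__ == "__main__":
    annotations = {
        "BD": [0.0, 1.0],
        "SD": [0.5],
        "HH": [0.0, 0.25, 0.5, 0.75, 1.0],
        "beat": [0.0, 0.5, 1.0],
        "downbeat": [0.0],
    }
    targets = frame_targets(annotations, n_frames=101)
    print({label: sum(row) for label, row in targets.items()})
```

Stacking the drum rows and the beat rows into one target matrix gives a single network one output per row, which is one possible reading of joint beat and drum modeling.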

DRUM TRANSCRIPTION VIA JOINT BEAT AND DRUM MODELING USING CONVOLUTIONAL RNNs
Richard Vogl, richard.vogl@tuwien.ac.at, ifs.tuwien.ac.at/~vogl
21st Vienna Deep Learning Meetup, 15th of October 2018
Institute of Computational Perception
