SLIDE 1

a software framework for Musical Data Augmentation

Brian McFee*, Eric J. Humphrey, Juan P. Bello

SLIDE 2

Modeling music is hard!

❏ Musical concepts are necessarily complex
❏ Complex concepts require big models
❏ Big models need big data!
❏ … but good data is hard to find

https://commons.wikimedia.org/wiki/File:Music_Class_at_St_Elizabeths_Orphanage_New_Orleans_1940.jpg
SLIDE 3

http://photos.jdhancock.com/photo/2012-09-28-001422-big-data.html

SLIDE 4

Data augmentation

Training data

dog

Machine learning

https://commons.wikimedia.org/wiki/File:Horizontal_milling_machine--Cincinnati--early_1900s--001.png
SLIDE 5

Data augmentation

Training data

[Diagram: training images deformed by desaturate, over-expose, and rotate; each augmented copy keeps the label "dog"]

Machine learning


Note: test data remains unchanged

SLIDE 6

Deforming inputs and outputs

Training data

time-stretch pitch-shift add noise

Machine learning


Note: test data remains unchanged

SLIDE 7

Deforming inputs and outputs

Training data

time-stretch pitch-shift add noise

Machine learning


Some deformations may change labels!

C:maj → D:maj
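The label change above can be made concrete: shifting the audio up two semitones means the chord annotation's root must be transposed the same way. A minimal pure-Python sketch of that idea (illustrative names only, not muda's implementation):

```python
# Transpose a "root:quality" chord label to track a pitch-shift deformation.
# Sketch only: muda's real chord deformer handles full chord syntax.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F",
                 "F#", "G", "G#", "A", "A#", "B"]

def shift_chord(label: str, n_semitones: int) -> str:
    """Shift the root of a chord label by n semitones, keeping the quality."""
    root, quality = label.split(":")
    idx = (PITCH_CLASSES.index(root) + n_semitones) % 12
    return f"{PITCH_CLASSES[idx]}:{quality}"

print(shift_chord("C:maj", 2))  # → D:maj
```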

SLIDE 8

The big idea

Musical data augmentation applies to both input (audio) and output (annotations)

SLIDE 9

https://www.flickr.com/photos/shreveportbossier/6015498526

… but how will we keep everything contained?

SLIDE 10

JAMS

❏ A simple container for all annotations
❏ A structure to store (meta)data
❏ But v0.1 lacked a unified, cross-task interface

JSON Annotated Music Specification [Humphrey et al., ISMIR 2014]

SLIDE 11

Pump up the JAMS: v0.2.0

❏ Unified annotation interface
❏ DataFrame backing for easy manipulation
❏ Query engine to filter annotations by type (chord, tag, beat, etc.)
❏ Per-task schema and validation
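The query idea can be illustrated with a toy container: annotations carry a namespace, and a search method filters on it. This is a pure-Python sketch of the concept, not the actual jams API; class names here are hypothetical.

```python
# Toy sketch of namespace-based annotation filtering, in the spirit of
# JAMS v0.2's query engine (hypothetical classes, not the jams library).
from dataclasses import dataclass, field

@dataclass
class Annotation:
    namespace: str          # e.g. "chord", "beat", "tag_open"
    data: list = field(default_factory=list)

@dataclass
class Jam:
    annotations: list = field(default_factory=list)

    def search(self, namespace: str) -> list:
        """Return all annotations whose namespace matches."""
        return [a for a in self.annotations if a.namespace == namespace]

jam = Jam([Annotation("chord"), Annotation("beat"), Annotation("chord")])
print(len(jam.search("chord")))  # → 2
```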

SLIDE 12

Musical data augmentation

In [1]: import muda

SLIDE 13

Deformer architecture

transform(input JAMS J_orig):
  1. For each state S:
     a. J := copy J_orig
     b. modify J.audio by S
     c. modify J.metadata by S
     d. deform each annotation by S
     e. append S to J.history
     f. yield J

[Diagram: one input JAMS → Deformation object → many output JAMS]
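The transform loop can be sketched as a Python generator. This is a simplified skeleton under assumed names (Deformer, deform_audio, etc.), not muda's actual base class, and the deformation bodies are stubbed out:

```python
# Skeleton of the one-to-many transform loop: each state yields one
# deformed copy of the input JAMS. Illustrative sketch only.
import copy

class Deformer:
    def states(self, jam):
        # Subclasses yield one state (here, a dict) per deformation setting.
        raise NotImplementedError

    def transform(self, jam_orig):
        for state in self.states(jam_orig):
            jam = copy.deepcopy(jam_orig)        # a. copy the input JAMS
            self.deform_audio(jam, state)        # b. modify audio
            self.deform_metadata(jam, state)     # c. modify metadata
            self.deform_annotations(jam, state)  # d. deform annotations
            jam.setdefault("history", []).append(state)  # e. record provenance
            yield jam                            # f. emit one deformed JAMS

class PitchShift(Deformer):
    def __init__(self, semitones):
        self.semitones = semitones

    def states(self, jam):
        return ({"n_semitones": n} for n in self.semitones)

    def deform_audio(self, jam, state):
        pass  # real code would pitch-shift the stored audio buffer

    def deform_metadata(self, jam, state):
        pass

    def deform_annotations(self, jam, state):
        pass

jam = {"audio": None, "annotations": []}
out = list(PitchShift([-1, 0, 1]).transform(jam))
print(len(out))  # → 3
```

Because each output carries its own copy of the history, the original JAMS is never mutated, matching the provenance guarantee described on the later slides.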

SLIDE 14

Deformer architecture

❏ State encapsulates a deformation's parameters
❏ Iterating over states implements a 1-to-many mapping
❏ Examples:
   ❏ pitch_shift ∊ [-2, -1, 0, 1, 2]
   ❏ time_stretch ∊ [0.8, 1.0, 1.25]
   ❏ background noise ∊ sample library

SLIDE 15

Deformer architecture

❏ Audio is temporarily stored within the JAMS object
❏ All deformations depend on the state S
❏ All steps are optional

SLIDE 16

Deformer architecture

❏ Each deformer knows how to handle different annotation types, e.g.:
   ❏ PitchShift.deform_chord()
   ❏ PitchShift.deform_pitch_hz()
   ❏ TimeStretch.deform_tempo()
   ❏ TimeStretch.deform_all()
❏ JAMS makes it trivial to filter annotations by type
❏ Multiple deformations may apply to a single annotation
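One way such per-type handling can be realized is namespace-based dispatch: look up a deform_&lt;namespace&gt;() method and fall back to deform_all(). A hypothetical sketch of that pattern (not necessarily muda's actual mechanism):

```python
# Namespace dispatch sketch: tempo annotations scale with the stretch
# rate, everything else falls through to a generic handler.
class TimeStretch:
    def __init__(self, rate):
        self.rate = rate  # rate > 1 speeds the audio up

    def deform_tempo(self, ann):
        # BPM scales directly with the stretch rate.
        return {"namespace": ann["namespace"],
                "value": ann["value"] * self.rate}

    def deform_all(self, ann):
        # Generic fallback; a real deformer would rescale event times.
        return dict(ann)

    def deform(self, ann):
        handler = getattr(self, "deform_" + ann["namespace"], self.deform_all)
        return handler(ann)

ts = TimeStretch(rate=0.8)
print(ts.deform({"namespace": "tempo", "value": 120.0})["value"])  # → 96.0
```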

SLIDE 17

Deformer architecture

❏ This provides data provenance
❏ All deformations are fully reproducible
❏ The constructed JAMS contains all state and object parameters


SLIDE 19

Deformation pipelines

for new_jam in jam_pipe(original_jam):
    process(new_jam)

[Diagram: pitch shifts p ∊ {+0, +1, -1} combined with stretch rates r ∊ {1.0, 0.8, 1.25} fan one input JAMS out to nine outputs]
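The nine outputs correspond to the Cartesian product of the two deformers' state sets (3 pitch shifts × 3 stretch rates). A stdlib sketch of that fan-out, assuming independent stages (illustrative, not muda's Pipeline class):

```python
# A pipeline chains deformers; every output of one stage feeds the next,
# so the overall state space is the Cartesian product of the stages.
from itertools import product

pitch_shifts = [0, 1, -1]          # p, in semitones
stretch_rates = [1.0, 0.8, 1.25]   # r

states = list(product(pitch_shifts, stretch_rates))
print(len(states))  # → 9
```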

SLIDE 20

Example application

instrument recognition in mixtures

https://commons.wikimedia.org/wiki/File:Instruments_on_stage.jpg
SLIDE 21

Data: MedleyDB

❏ 122 tracks/stems, mixed instruments [Bittner et al., ISMIR 2014]
❏ 75 unique artist identifiers
❏ We model the top 15 instrument classes
❏ Time-varying instrument activation labels

http://medleydb.weebly.com/

SLIDE 22

Convolutional model

❏ Input

a. ~1sec log-CQT patches b. 36 bins per octave c. 6 octaves (C2-C8)

❏ Convolutional layers

a. 24x ReLU, 3x2 max-pool b. 48x ReLU, 1x2 max-pool

❏ Dense layers

a. 96d ReLU, dropout=0.5 b. 15d sigmoid, ℓ2 penalty

[Architecture diagram: 216×44 log-CQT input patch → 24 ReLU filters with 3×2 max-pool → 48 ReLU filters with 1×2 max-pool → 96-d ReLU dense → 15-d sigmoid output (instrument classes); ~1.7 million parameters]

SLIDE 23

Experiment

❏ Five augmentation conditions:

N      Baseline
P      pitch shift [±1 semitone]
PT     + time-stretch [√2, 1/√2]
PTB    ++ background noise [3x noise]
PTBC   +++ dynamic range compression [2x]

❏ 1 input ⇒ up to 108 outputs
❏ 15x (artist-conditional) 4:1 shuffle-splits
❏ Predict instrument activity on 1sec clips

How does training with data augmentation impact model stability?

Note: test data remains unchanged

SLIDE 24

Results across all categories

❏ Pitch-shift improves model stability
❏ Additional transformations don't seem to help (on average)
❏ But is this the whole story?

[Plot: label-ranking average precision per augmentation condition]

SLIDE 25

Results by category

❏ All augmentations help for most classes
❏ synthesizer may be ill-defined
❏ Time-stretch can hurt high-vibrato instruments

[Plot: change in F1-score per instrument class, relative to the baseline (no augmentation)]

SLIDE 26

Conclusions

❏ We developed a general framework for musical data augmentation
❏ Training with augmented data can improve model stability
❏ Care must be taken in selecting deformations
❏ Implementation is available at https://github.com/bmcfee/muda (soon: pip install muda)

SLIDE 27

Thanks!

brian.mcfee@nyu.edu
https://bmcfee.github.io
https://github.com/bmcfee/muda