CNN and Musical Applications Juhan Nam Motivation Sensory data - PowerPoint PPT Presentation

GCT634/AI613: Musical Applications of Machine Learning (Fall 2020) CNN and Musical Applications Juhan Nam

Motivation ● Sensory data (image or audio) have high-dimensionality Image: 256 x 256 pixels (commonly used size after crop and resize) ○ The average image resolution on ImageNet is 469x387 pixels) ■ Audio: 128 mel bins x 128 frames (commonly used 3 sec mel-spectrogram) ○ 44,100 or 22050 samples/sec ■ ● The fully-connected layer requires a large size of weight If the hidden layer size is 256 for 256x256 images, the number of ○ parameters is 256 x 256 x 3 (RGB) x 256 (hidden layer size) = 50M! ● Can we reduce the number of parameters?

Locality and Translation Invariance ● Locality: the objects of our interest tend to have a local spatial support Important parts of the object structures are locally correlated ○ ● Translation invariance: object appearance is independent of location

Incorporating Locality ● Change the fully -connected layer to a locally -connected layer Each hidden unit is connected to a local area ( receptive field) ○ Different hidden units connected to different locations ( feature map) ○ Source: NIPS 2017 Tutorial, Deep Learning: Practice and Trend

Incorporating Translation Invariance ● Make the hidden units connected to different locations have the same weights ( weight sharing ) Convolutional layer: locally-connected layer with weight sharing ○ The weight are invariant to the location and the output is equivalent ○ Source: NIPS 2017 Tutorial, Deep Learning: Practice and Trend

Convolutional Neural Network (CNN) ● Consists of convolution layer and subsampling (or pooling ) layer Local filters (or weight ) are convolved with the input or hidden layers and ○ return feature maps The feature maps are sub-sampled (or pooled) to reduce the dimensionality ○ C3: f. maps 16@10x10 C1: feature maps S4: f. maps 16@5x5 INPUT 6@28x28 32x32 S2: f. maps C5: layer OUTPUT F6: layer 6@14x14 120 10 84 Gaussian connections Full connection Subsampling Subsampling Full connection Convolutions Convolutions LeNet-5 ( LeCun 98)

Convolutional Neural Network (CNN) ● The breakthrough in image classification (2012) CNN with more convolution and max-pooling layers ○ ReLU (fast and non-saturated), dropout (regularization) ○ Trained with 2 GPUs “directly” on 1.2M images during one week ○ ImageNet challenge: top-5% error 15.3% (>10% lower than the second) ○ ImageNet classification with deep convolutional neural networks, Alex Krizhevsky, Ilya Sutskever, Geoffrey E Hinton, 2012

ImageNet Challenge ● CNN models have been deeper and deeper �� Deep Learning Breakthrough �� Surpass human recognition ��

Convolution Mechanics ● Image input and feature Map 3D tensor: width(W), height (H) and depth (channel) ○ Channel (C): R, G, B ○ The input data become a 4D tensor when a batch or mini-batch is used à N (example) x C x W x H Height Height Hidden Width unit Width Filter Channel Channel Filter must have the same This channel corresponds 2D convolution (or Depth) depth as the input has to the number of filters!

Convolution Mechanics ● Stride: sliding with hopping (equivalent to the hop size in STFT) ● Padding: adjust the feature map size by zero-padding to the border of the input No padding No padding Pad size=1 Pad size=1 No striding Stride size=2 Stride size=2 No Striding (Filter size: 3 x 3) Source: https://github.com/vdumoulin/conv_arithmetic

Sub-Sampling (or Pooling) ● Down-size the feature map by summarizing the local features ● Types Max-pooling: most popular choice ○ Average pooling, standard deviation pooling, L^p (power-average) pooling ○ 1 5 2 4 2 x 2 max pooling 2 3 9 1 5 9 Stride with 2 5 3 3 4 8 4 7 8 2 2

�� CNN Architecture for Image Classification ● ResNet (deep and high performance) ● GooLeNet (efficiency) ● VGGNet (flexibility) 34-layer residual image ○ ○ ○ ○ Add skip connections between conv blocks: better gradient flow 1x1 filter: reduce the depth (significantly reduce parameters) Inception module: multiple parallel convolution layers Small filter size (3x3) 7x7 conv, 64, /2 pool, /2 3x3 conv, 64 3x3 conv, 64 3x3 conv, 64 3x3 conv, 64 3x3 conv, 64 �� 3x3 conv, 64 � �� 3x3 conv, 128, /2 � � � � � 3x3 conv, 128 �� 3x3 conv, 128 3x3 conv, 128 3x3 conv, 128 3x3 conv, 128 3x3 conv, 128 3x3 conv, 128 ResNet 3x3 conv, 256, /2 3x3 conv, 256 One block of VGGNet 3x3 conv, 256 �� 3x3 conv, 256 3x3 conv, 256 3x3 conv, 256 3x3 conv, 256 3x3 conv, 256 3x3 conv, 256 �� 3x3 conv, 256 3x3 conv, 256 �� 3x3 conv, 256 3x3 conv, 512, /2 �� 3x3 conv, 512 Depth:256 �� 3x3 conv, 512 3x3 conv, 512 3x3 conv, 512 1x1 Filter (64) 3x3 conv, 512 avg pool Depth:64 �� fc 1000 ��

Classification-based MIR Tasks Using CNN “soft rock” ● Semantic-Level (long segment) Music genre/mood classification and auto-tagging ○ Music recommendation ○ “piano” “singing voice” ● Event-Level (note, beat or phrase) Onset Detection ○ Musical instrument recognition ○ Singing voice detection ○ pitch contour (The output is usually predicted in frame-level) (quantized) ● Frame-Level (single audio frame) Pitch estimation ○ Multiple F0 estimation ○

CNN and Musical Applications Juhan Nam Motivation Sensory data - PowerPoint PPT Presentation

GCT634/AI613: Musical Applications of Machine Learning (Fall 2020) CNN and Musical Applications Juhan Nam Motivation Sensory data (image or audio) have high-dimensionality Image: 256 x 256 pixels (commonly used size after crop and resize)

A Musical Journey A Musical Journey A Musical Journey A Musical Journey A Musical Journey A

CS7015 (Deep Learning) : Lecture 12 Object Detection: R-CNN, Fast R-CNN, Faster R-CNN, You Only

Object Detection using R-CNN Experiments CS381V: Visual Recognition, Spring 2016 William Xie

Memorial Day Choral Festival May 24-27, 2019 A Musical Tribute To Americas Veterans A Musical

Musical Instruments They sound different, even on the same note They require energy to

RNN and Musical Applications Juhan Nam Motivation When the output is sequential, e.g., pitch

Decay vertex ID using CNN for p K+ Aaron Higuera University of Houston CNN Tools on

CNN Ba CNN Based ed Pi Pipeline peline for or Op Optical ical Fl Flow ow Tal Schuster,

CENG5030 Part 2-1: Introduction to Convolutional Nueral Network Bei Yu (Latest update: March 4,

Nue Energy Reconstruction with CNN Lars Hertel, Ilsoo Seong, Jianming Bian 2018/08/20 Intro.

Musical Interfaces and Sequencers Graduate School of Culture Technology, KAIST Juhan Nam Musical

Musical Theatre Song: A Comprehensive Course In Selection, Preparation, And Presentation For The

Fundamentals of Musical Acoustics Graduate School of Culture Technology, KAIST Juhan Nam

Science, Policy & Musical Science, Policy & Musical Chairs Chairs Millie Baird Millie

The environment for musical learning A creative, confident learner

The Vision Mission Statement: Mission Statement: Musical- online Musical- online

Evidential Clustering: a Review of Some New Developments Thierry Denux Universit de

Reform! Research on learning shows: the entire standards-based Some benefits of

CSCE 478/878 Lecture 8: Instance-Based Learning Stephen D. Scott (Adapted from Tom Mitchells

Text processing Format Text File IASP 321 IASP 221 Dr. John Yoon Text Processing Commands

An Introduction to Hilbert Space Embedding of Probability Measures Krikamol Muandet Max Planck

Quickest Quickest Exam 1 Extra Credit: Exam 1 Extra Credit: either either show up and watch

Parallelization and Parallelization and Proling Proling Programming for Statistical

Many-Core Scheduling of Data Parallel Applications using SMT Solvers Pranav Tendulkar Peter

CNN and Musical Applications Juhan Nam Motivation Sensory data - PowerPoint PPT Presentation

GCT634/AI613: Musical Applications of Machine Learning (Fall 2020) CNN and Musical Applications Juhan Nam Motivation Sensory data (image or audio) have high-dimensionality Image: 256 x 256 pixels (commonly used size after crop and resize)

A Musical Journey A Musical Journey A Musical Journey A Musical Journey A Musical Journey A

CS7015 (Deep Learning) : Lecture 12 Object Detection: R-CNN, Fast R-CNN, Faster R-CNN, You Only

Object Detection using R-CNN Experiments CS381V: Visual Recognition, Spring 2016 William Xie

Memorial Day Choral Festival May 24-27, 2019 A Musical Tribute To Americas Veterans A Musical

Musical Instruments They sound different, even on the same note They require energy to

RNN and Musical Applications Juhan Nam Motivation When the output is sequential, e.g., pitch

Decay vertex ID using CNN for p K+ Aaron Higuera University of Houston CNN Tools on

CNN Ba CNN Based ed Pi Pipeline peline for or Op Optical ical Fl Flow ow Tal Schuster,

CENG5030 Part 2-1: Introduction to Convolutional Nueral Network Bei Yu (Latest update: March 4,

Nue Energy Reconstruction with CNN Lars Hertel, Ilsoo Seong, Jianming Bian 2018/08/20 Intro.

Musical Interfaces and Sequencers Graduate School of Culture Technology, KAIST Juhan Nam Musical

Musical Theatre Song: A Comprehensive Course In Selection, Preparation, And Presentation For The

Fundamentals of Musical Acoustics Graduate School of Culture Technology, KAIST Juhan Nam

Science, Policy &amp; Musical Science, Policy &amp; Musical Chairs Chairs Millie Baird Millie

The environment for musical learning A creative, confident learner

The Vision Mission Statement: Mission Statement: Musical- online Musical- online

Evidential Clustering: a Review of Some New Developments Thierry Denux Universit de

Reform! Research on learning shows: the entire standards-based Some benefits of

CSCE 478/878 Lecture 8: Instance-Based Learning Stephen D. Scott (Adapted from Tom Mitchells

Text processing Format Text File IASP 321 IASP 221 Dr. John Yoon Text Processing Commands

An Introduction to Hilbert Space Embedding of Probability Measures Krikamol Muandet Max Planck

Quickest Quickest Exam 1 Extra Credit: Exam 1 Extra Credit: either either show up and watch

Parallelization and Parallelization and Proling Proling Programming for Statistical

Many-Core Scheduling of Data Parallel Applications using SMT Solvers Pranav Tendulkar Peter

Science, Policy & Musical Science, Policy & Musical Chairs Chairs Millie Baird Millie