SLIDE 1

GCT634/AI613: Musical Applications of Machine Learning (Fall 2020)

Deep Learning: Intro

Juhan Nam

SLIDE 2

Review of Traditional Machine Learning

  • The traditional machine learning pipeline

[Pipeline diagram: Frame-level Features → Temporal Summary → Unsupervised Learning → Classifier]

SLIDE 3

Review of Traditional Machine Learning

  • The traditional machine learning pipeline

[Pipeline diagram: Frame-level Features → Temporal Summary → Unsupervised Learning → Classifier, instantiated as: MFCC for the frame-level features (DFT → Abs (magnitude) → Mel Filterbank → Log compression → DCT), Temporal Pooling for the temporal summary, K-means for the unsupervised learning, and Logistic Regression (a linear classifier with a non-linear transform) for the classifier]

SLIDE 4

Review of Traditional Machine Learning

  • Each module can be replaced with a chain of a linear transform and a non-linear function

[Diagram: the same pipeline (MFCC, Temporal Pooling, K-means, Logistic Regression), with each module redrawn as Linear Transform → Non-linear function blocks, ending in Temporal Pooling and a Linear Classifier]

SLIDE 5

Review of Traditional Machine Learning

  • The entire set of modules can be replaced with one long chain of linear transforms and non-linear functions (i.e., a deep neural network)
    ○ In the traditional machine learning pipeline, each module is locally optimized

[Diagram: the traditional pipeline (MFCC, Temporal Pooling, K-means, Logistic Regression) redrawn as one long chain of Linear Transform → Non-linear function blocks followed by Temporal Pooling and a Linear Classifier, labeled "Deep Neural Network"]

SLIDE 6

Deep Learning

  • The entire stack of blocks (or layers) is optimized in an end-to-end manner
    ○ The parameters (or weights) in all layers are learned to minimize the loss function of the classifier
    ○ The loss is back-propagated through all layers (from right to left) as a gradient with respect to each parameter ("error back-propagation")
  • Therefore, we "learn features" instead of designing or engineering them

[Diagram: the long chain of Linear Transform → Non-linear function blocks, Temporal Pooling, and a Linear Classifier, labeled "Deep Neural Network"]

SLIDE 7

Deep Learning: Building Models

  • There are many choices of basic building blocks (or layers); see the sketch below
  • Connectivity patterns (parametric)
    ○ Fully-connected (i.e., linear transform)
    ○ Convolutional (note that the STFT is a convolutional operation)
    ○ Skip / Residual
    ○ Recurrent
  • Non-linearity functions (non-parametric)
    ○ Sigmoid
    ○ Tanh
    ○ Rectified Linear Unit (ReLU) and variations
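The slides do not prescribe a software framework; as a minimal sketch of how these building blocks look in code, here is a PyTorch version (all layer sizes are arbitrary placeholders):

```python
import torch
import torch.nn as nn

# Parametric connectivity patterns (these layers have learnable weights)
fully_connected = nn.Linear(in_features=40, out_features=128)                  # linear transform
convolutional = nn.Conv1d(in_channels=1, out_channels=16, kernel_size=1024)    # framed filtering, like the STFT
recurrent = nn.RNN(input_size=40, hidden_size=64, batch_first=True)

class ResidualBlock(nn.Module):
    """Skip / residual connectivity: output = input + F(input)."""
    def __init__(self, dim):
        super().__init__()
        self.fc = nn.Linear(dim, dim)

    def forward(self, x):
        return x + torch.relu(self.fc(x))

# Non-parametric non-linearity functions (no weights, applied element-wise)
sigmoid, tanh, relu = nn.Sigmoid(), nn.Tanh(), nn.ReLU()

# Example: pass a batch of 8 feature vectors through two of the blocks
x = torch.randn(8, 40)
out = relu(fully_connected(x))          # -> shape (8, 128)
res = ResidualBlock(40)(x)              # -> shape (8, 40)
```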

SLIDE 8

Deep Learning: Building Models

  • We "design" a deep neural network architecture depending on the nature of the data and the task
    ○ Modular synth as a "musical analogy"

(Images from the Arturia Modular V manual)

[Manual excerpts (image text): the oscillator bank (one driver and three slave oscillators, which can also serve as LFOs) and the portamento/glide and monophonic-mode controls, shown only as an analogy for patching modules together into an architecture]
SLIDE 9

Deep Learning: Training Models

  • Loss functions
    ○ Cross entropy (logistic loss)
    ○ Hinge loss
    ○ Maximum likelihood
    ○ L2 (root mean square error) and L1
    ○ Adversarial
    ○ Variational
  • Optimizers
    ○ SGD
    ○ Momentum
    ○ RMSProp
    ○ Adagrad
    ○ Adam
  • Hyperparameters (initialization, regularization, model search)
    ○ Weight initialization
    ○ L1 and L2 regularization (weight decay)
    ○ Dropout
    ○ Learning rate
    ○ Layer size
    ○ Batch size
    ○ Data augmentation
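As a minimal sketch (not from the slides) of how these training choices fit together in PyTorch, with a hypothetical model, batch, and label set used only for illustration:

```python
import torch
import torch.nn as nn

# Hypothetical model and batch, used only for illustration
model = nn.Sequential(nn.Linear(40, 128), nn.ReLU(), nn.Linear(128, 10))
features = torch.randn(32, 40)              # a batch of 32 feature vectors
labels = torch.randint(0, 10, (32,))        # 10 classes

loss_fn = nn.CrossEntropyLoss()             # loss function choice
optimizer = torch.optim.Adam(model.parameters(),
                             lr=1e-3,            # learning rate (hyperparameter)
                             weight_decay=1e-4)  # L2 regularization (weight decay)

logits = model(features)                    # feedforward pass
loss = loss_fn(logits, labels)              # compute the loss
optimizer.zero_grad()
loss.backward()                             # error back-propagation
optimizer.step()                            # gradient-based weight update
```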

SLIDE 10

Multi-Layer Perceptron (MLP)

  • Neural networks that consist of fully-connected layers and non-linear functions
    ○ Also called Feedforward Neural Network or Deep Feedforward Network
    ○ A long history: perceptron (Rosenblatt, 1962), back-propagation (Rumelhart, 1986), deep belief networks (Hinton and Salakhutdinov, 2006)

[Diagram: input layer $x$, hidden layers $h^{(1)}, h^{(2)}, h^{(3)}$, output layer $\hat{y}$]

Forward computation, where $g(\cdot)$ is the element-wise non-linear function:
$z^{(1)} = W^{(1)} x + b^{(1)}$,  $h^{(1)} = g(z^{(1)})$
$z^{(2)} = W^{(2)} h^{(1)} + b^{(2)}$,  $h^{(2)} = g(z^{(2)})$
$z^{(3)} = W^{(3)} h^{(2)} + b^{(3)}$,  $h^{(3)} = g(z^{(3)})$
$\hat{y} = W^{(4)} h^{(3)} + b^{(4)}$
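A minimal NumPy sketch of this forward computation (layer sizes and the input are arbitrary placeholders; sigmoid stands in for $g$):

```python
import numpy as np

rng = np.random.default_rng(0)

def g(z):
    """Element-wise non-linear function (sigmoid used as a stand-in)."""
    return 1.0 / (1.0 + np.exp(-z))

# Layer sizes: 40-dimensional input, three hidden layers, one output unit
sizes = [40, 64, 64, 64, 1]
W = [rng.standard_normal((m, n)) * 0.1 for n, m in zip(sizes[:-1], sizes[1:])]
b = [np.zeros(m) for m in sizes[1:]]

x = rng.standard_normal(40)          # one input vector

h = x
for l in range(3):                   # z^(l) = W^(l) h^(l-1) + b^(l),  h^(l) = g(z^(l))
    z = W[l] @ h + b[l]
    h = g(z)
y_hat = W[3] @ h + b[3]              # output layer: y_hat = W^(4) h^(3) + b^(4)
print(y_hat)
```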

SLIDE 11

Deep Feedforward Network

  • It is argued that the first breakthrough of deep learning came from the deep feedforward network (2011)
    ○ The state-of-the-art acoustic model in speech recognition was the GMM-HMM
    ○ Replace the GMM module with a deep feedforward network (up to 5 layers)
    ○ Initialize the weight matrices using an unsupervised learning algorithm
      ■ Deep belief network: greedy layer-wise pre-training using restricted Boltzmann machines

Context-Dependent Pre-trained Deep Neural Networks for Large Vocabulary Speech Recognition, George Dahl, Dong Yu, Li Deng, Alex Acero, 2012

SLIDE 12

Non-linear Functions

  • There are several choices of non-linear functions (or activation functions); see the sketch below
    ○ ReLU is the default choice in modern deep learning: fast and effective
    ○ There are also other choices such as the Exponential Linear Unit (ELU) and Maxout
    ○ Note that this is an element-wise operation in the neural network

[Plots of the four activation functions over roughly $-10 \le x \le 10$]

Sigmoid: $\sigma(x) = \dfrac{1}{1 + e^{-x}}$
Tanh: $\dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
ReLU: $\max(0, x)$
Leaky ReLU: $\max(0.1x, x)$
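A minimal NumPy sketch of these four functions; all of them operate element-wise on an array:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))   # equivalent to np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.1):
    return np.maximum(slope * x, x)

x = np.linspace(-10.0, 10.0, 5)
print(sigmoid(x), tanh(x), relu(x), leaky_relu(x), sep="\n")
```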
SLIDE 13

Why the Nonlinear Function in the Hidden Layer?

  • It captures high-order interactions between the input elements
    ○ This enables finding non-linear boundaries between different classes
    ○ Taylor series of a nonlinear function $g(z)$
      ■ Non-zero coefficients for high-order polynomials: $g(z) = a_0 + a_1 z + a_2 z^2 + a_3 z^3 + \cdots$
      ■ The high-order terms contain interactions between all input elements:
        $z = w_1 x_1 + w_2 x_2 + b$
        $z^2 = w_1^2 x_1^2 + 2 w_1 w_2 x_1 x_2 + w_2^2 x_2^2 + 2 w_1 x_1 b + 2 w_2 x_2 b + b^2$

[Figure over the $(x_1, x_2)$ plane: the linear decision boundary $z = 0$ versus the curved boundary $a_0 + a_1 z + a_2 z^2 = 0$]
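A small SymPy check (not from the slides) of the expansion above: squaring the pre-activation produces the cross term $2 w_1 w_2 x_1 x_2$, i.e., an interaction between the two input elements.

```python
import sympy as sp

w1, w2, x1, x2, b = sp.symbols("w1 w2 x1 x2 b")
z = w1 * x1 + w2 * x2 + b

expanded = sp.expand(z**2)
print(expanded)                           # contains 2*w1*w2*x1*x2 among its terms
print(expanded.coeff(w1 * w2 * x1 * x2))  # -> 2
```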

SLIDE 14

Why the Nonlinear Function in the Hidden Layer?

  • What if the nonlinear functions are absent?
    ○ A product of linear transforms is just another linear transform (see the sketch below)
    ○ Geometrically, a linear transformation only does scaling, shearing, and rotation

source: http://www.ams.org/publicoutreach/feature-column/fcarc-svd
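A minimal NumPy sketch of the first point: stacking two linear layers without a non-linearity collapses into a single linear transform.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((64, 40))
W2 = rng.standard_normal((10, 64))
x = rng.standard_normal(40)

two_layers = W2 @ (W1 @ x)       # "deep" network without non-linearities
one_layer = (W2 @ W1) @ x        # equivalent single linear transform

print(np.allclose(two_layers, one_layer))  # True
```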

SLIDE 15

Why the Nonlinear Function in the Hidden Layer?

  • The non-linear function warps the input space so that data from different classes become more linearly separable

[Figure: a linear classifier applied in the original NN input space versus in the NN hidden-layer space]

source: http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/

SLIDE 16

Training Deep Neural Network

  • Gradient descent learning
    ○ Need to compute the gradient for all parameters in all layers
    ○ Compute them via error back-propagation from the top layer

Weight update rule: $W^{(l)}_{\mathrm{new}} = W^{(l)}_{\mathrm{old}} - \mu \, \dfrac{\partial \ell}{\partial W^{(l)}_{\mathrm{old}}}$

[Network diagram: $x \rightarrow h^{(1)} \rightarrow h^{(2)} \rightarrow h^{(3)} \rightarrow \hat{y}$ with weights $W^{(1)}, \ldots, W^{(4)}$ and loss $\ell(y, \hat{y})$]

SLIDE 17

Training Deep Neural Network

  • Step 1) Initialize all weights to random numbers
  • Step 2) Feedforward computation
    ○ Feed the input $x$, compute $h^{(1)}$ up through $\hat{y}$, and keep all of the intermediate values

$z^{(1)} = W^{(1)} x + b^{(1)}$,  $h^{(1)} = g(z^{(1)})$
$z^{(2)} = W^{(2)} h^{(1)} + b^{(2)}$,  $h^{(2)} = g(z^{(2)})$
$z^{(3)} = W^{(3)} h^{(2)} + b^{(3)}$,  $h^{(3)} = g(z^{(3)})$
$\hat{y} = W^{(4)} h^{(3)} + b^{(4)}$

[Network diagram: $x \rightarrow h^{(1)} \rightarrow h^{(2)} \rightarrow h^{(3)} \rightarrow \hat{y}$ with weights $W^{(1)}, \ldots, W^{(4)}$ and loss $\ell(y, \hat{y})$]

SLIDE 18

Training Deep Neural Network

  • Step 3) Compute the loss $\ell(y, \hat{y})$ using the prediction $\hat{y}$ and the ground truth $y$
    ○ For simplicity, let's assume a squared-error (L2) loss
      ■ $\ell(y, \hat{y}) = (y - \hat{y})^2 = \big(y - (W^{(4)} h^{(3)} + b^{(4)})\big)^2$
      ■ $\dfrac{\partial \ell}{\partial W^{(4)}_{j}} = 2\big(y - (W^{(4)} h^{(3)} + b^{(4)})\big)\big(-h^{(3)}_{j}\big)$
      ■ $\dfrac{\partial \ell}{\partial h^{(3)}_{j}} = 2\big(y - (W^{(4)} h^{(3)} + b^{(4)})\big)\big(-W^{(4)}_{j}\big)$

We can compute these because $h^{(3)}$ was computed in the forward pass and $W^{(4)}$ and $b^{(4)}$ were already initialized. We need the gradient with respect to the hidden-layer units for the lower layer!

[Network diagram: $x \rightarrow h^{(1)} \rightarrow h^{(2)} \rightarrow h^{(3)} \rightarrow \hat{y}$ with weights $W^{(1)}, \ldots, W^{(4)}$ and loss $\ell(y, \hat{y})$]
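A small NumPy check (not from the slides) that the Step 3 formula for $\partial \ell / \partial W^{(4)}_{j}$ matches a finite-difference estimate, assuming a scalar output and arbitrary toy values:

```python
import numpy as np

rng = np.random.default_rng(0)
h3 = rng.standard_normal(8)       # hidden activations from the forward pass
W4 = rng.standard_normal(8)       # top-layer weights (scalar output)
b4 = 0.5
y = 1.0                           # ground truth

def loss(W):
    y_hat = W @ h3 + b4
    return (y - y_hat) ** 2

# Analytic gradient from the slide: 2 (y - y_hat) (-h3_j)
y_hat = W4 @ h3 + b4
grad_analytic = 2.0 * (y - y_hat) * (-h3)

# Central finite-difference estimate
eps = 1e-6
grad_numeric = np.zeros_like(W4)
for j in range(len(W4)):
    W_plus = W4.copy();  W_plus[j] += eps
    W_minus = W4.copy(); W_minus[j] -= eps
    grad_numeric[j] = (loss(W_plus) - loss(W_minus)) / (2 * eps)

print(np.allclose(grad_analytic, grad_numeric, atol=1e-5))  # True
```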

SLIDE 19

Training Deep Neural Network

  • Step 4) Back-propagate the loss to the lower layers
    ○ Let's assume the sigmoid function for the non-linearity
      ■ $h^{(3)}_{j} = g(z^{(3)}_{j}) = \dfrac{1}{1 + e^{-z^{(3)}_{j}}}$
      ■ $\dfrac{\partial \ell}{\partial z^{(3)}_{j}} = \dfrac{\partial \ell}{\partial h^{(3)}_{j}} \cdot \dfrac{\partial h^{(3)}_{j}}{\partial z^{(3)}_{j}} = \dfrac{\partial \ell}{\partial h^{(3)}_{j}} \cdot \dfrac{1}{1 + e^{-z^{(3)}_{j}}} \cdot \dfrac{e^{-z^{(3)}_{j}}}{1 + e^{-z^{(3)}_{j}}}$

We know $\partial \ell / \partial h^{(3)}_{j}$ from the upper layer (the previous slide), and we can compute the sigmoid derivative using $z^{(3)}$ from the forward pass.

[Network diagram: $x \rightarrow h^{(1)} \rightarrow h^{(2)} \rightarrow h^{(3)} \rightarrow \hat{y}$ with weights $W^{(1)}, \ldots, W^{(4)}$ and loss $\ell(y, \hat{y})$]
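A small NumPy check (not from the slides) that the sigmoid derivative used in Step 4, $\frac{1}{1+e^{-z}} \cdot \frac{e^{-z}}{1+e^{-z}}$, matches a finite-difference estimate:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5, 5, 11)

# Derivative as written on the slide: sigma(z) * (1 - sigma(z))
d_analytic = (1.0 / (1.0 + np.exp(-z))) * (np.exp(-z) / (1.0 + np.exp(-z)))

# Central finite-difference estimate
eps = 1e-6
d_numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)

print(np.allclose(d_analytic, d_numeric, atol=1e-6))  # True
```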

SLIDE 20

Training Deep Neural Network

  • Step 5) Back-propagate the loss to the lower layers (continued)
    ○ $\dfrac{\partial \ell}{\partial W^{(3)}_{j,k}} = \dfrac{\partial \ell}{\partial z^{(3)}_{j}} \cdot \dfrac{\partial z^{(3)}_{j}}{\partial W^{(3)}_{j,k}} = \dfrac{\partial \ell}{\partial z^{(3)}_{j}} \cdot h^{(2)}_{k}$
    ○ $\dfrac{\partial \ell}{\partial h^{(2)}_{k}} = \dfrac{\partial \ell}{\partial z^{(3)}_{j}} \cdot \dfrac{\partial z^{(3)}_{j}}{\partial h^{(2)}_{k}} = \dfrac{\partial \ell}{\partial z^{(3)}_{j}} \cdot W^{(3)}_{j,k}$ (summed over $j$ when $h^{(2)}_{k}$ feeds several units in layer 3)

We know $\partial \ell / \partial z^{(3)}_{j}$ from the upper layer (the previous slide); the other factors come from the local layer.

[Network diagram: $x \rightarrow h^{(1)} \rightarrow h^{(2)} \rightarrow h^{(3)} \rightarrow \hat{y}$ with weights $W^{(1)}, \ldots, W^{(4)}$ and loss $\ell(y, \hat{y})$]

SLIDE 21

Training Deep Neural Network

  • Step 6) Repeat Step 4 for the 2nd layer
    ○ $\dfrac{\partial \ell}{\partial z^{(2)}_{k}} = \dfrac{\partial \ell}{\partial h^{(2)}_{k}} \cdot \dfrac{\partial h^{(2)}_{k}}{\partial z^{(2)}_{k}} = \dfrac{\partial \ell}{\partial h^{(2)}_{k}} \cdot \dfrac{1}{1 + e^{-z^{(2)}_{k}}} \cdot \dfrac{e^{-z^{(2)}_{k}}}{1 + e^{-z^{(2)}_{k}}}$

We know $\partial \ell / \partial h^{(2)}_{k}$ from the upper layer (the previous slide).

[Network diagram: $x \rightarrow h^{(1)} \rightarrow h^{(2)} \rightarrow h^{(3)} \rightarrow \hat{y}$ with weights $W^{(1)}, \ldots, W^{(4)}$ and loss $\ell(y, \hat{y})$]

SLIDE 22

Training Deep Neural Network

  • Step 7) Repeat Step 5 for the 2nd layer
    ○ $\dfrac{\partial \ell}{\partial W^{(2)}_{k,m}} = \dfrac{\partial \ell}{\partial z^{(2)}_{k}} \cdot \dfrac{\partial z^{(2)}_{k}}{\partial W^{(2)}_{k,m}} = \dfrac{\partial \ell}{\partial z^{(2)}_{k}} \cdot h^{(1)}_{m}$
    ○ $\dfrac{\partial \ell}{\partial h^{(1)}_{m}} = \dfrac{\partial \ell}{\partial z^{(2)}_{k}} \cdot \dfrac{\partial z^{(2)}_{k}}{\partial h^{(1)}_{m}} = \dfrac{\partial \ell}{\partial z^{(2)}_{k}} \cdot W^{(2)}_{k,m}$

We know $\partial \ell / \partial z^{(2)}_{k}$ from the upper layer (the previous slide); the other factors come from the local layer.

[Network diagram: $x \rightarrow h^{(1)} \rightarrow h^{(2)} \rightarrow h^{(3)} \rightarrow \hat{y}$ with weights $W^{(1)}, \ldots, W^{(4)}$ and loss $\ell(y, \hat{y})$]

SLIDE 23

Training Deep Neural Network

  • Step 8) Repeat the two previous steps for the 1st layer
    ○ Done with computing one iteration of the gradient
    ○ Update the weights using the gradient: $W^{(l)}_{\mathrm{new}} = W^{(l)}_{\mathrm{old}} - \mu \, \dfrac{\partial \ell}{\partial W^{(l)}_{\mathrm{old}}}$ (a NumPy sketch of Steps 1 through 8 follows below)

[Network diagram: $x \rightarrow h^{(1)} \rightarrow h^{(2)} \rightarrow h^{(3)} \rightarrow \hat{y}$ with weights $W^{(1)}, \ldots, W^{(4)}$ and loss $\ell(y, \hat{y})$]
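A minimal NumPy sketch (not from the slides) of Steps 1 through 8 for the 4-layer sigmoid MLP with a squared-error loss: one forward pass, one backward pass, and one weight update. The layer sizes, learning rate, and toy input/target are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Step 1) initialize all weights to random numbers
sizes = [40, 64, 64, 64, 1]                                    # input, h1, h2, h3, output
W = [rng.standard_normal((m, n)) * 0.1 for n, m in zip(sizes[:-1], sizes[1:])]
b = [np.zeros(m) for m in sizes[1:]]
mu = 0.01                                                      # learning rate

x = rng.standard_normal(40)                                    # toy input
y = np.array([1.0])                                            # toy ground truth

# Step 2) feedforward computation, keeping the intermediate values
z, h = [], [x]
for l in range(3):
    z.append(W[l] @ h[-1] + b[l])
    h.append(sigmoid(z[-1]))
y_hat = W[3] @ h[-1] + b[3]

# Step 3) loss and gradients of the top layer
loss = np.sum((y - y_hat) ** 2)
d_yhat = -2.0 * (y - y_hat)                                    # d loss / d y_hat
dW, db = [None] * 4, [None] * 4
dW[3] = np.outer(d_yhat, h[3])
db[3] = d_yhat
d_h = W[3].T @ d_yhat                                          # d loss / d h^(3)

# Steps 4-8) back-propagate through layers 3, 2, 1
for l in [2, 1, 0]:
    d_z = d_h * sigmoid(z[l]) * (1.0 - sigmoid(z[l]))          # sigmoid derivative
    dW[l] = np.outer(d_z, h[l])
    db[l] = d_z
    d_h = W[l].T @ d_z                                         # gradient for the layer below

# Weight update: W_new = W_old - mu * d loss / d W_old
for l in range(4):
    W[l] -= mu * dW[l]
    b[l] -= mu * db[l]

print("loss:", loss)
```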

SLIDE 24

Training Deep Neural Network

  • Step 9) Keep repeating the feedforward and backward passes
    ○ Updating the weights with the gradients: $W^{(l)}_{\mathrm{new}} = W^{(l)}_{\mathrm{old}} - \mu \, \dfrac{\partial \ell}{\partial W^{(l)}_{\mathrm{old}}}$

[Figure: the parameters moving step by step across the loss surface from their initial point toward the optimum]

SLIDE 25

Training Deep Neural Network

  • Step 10) Monitor both the training loss and the validation loss, and stop the iterations when the validation loss no longer decreases (early stopping); see the sketch below
    ○ Note that the training set is used for both the feedforward and backward passes, whereas the validation set is used only for the feedforward pass
    ○ Early stopping is a regularization method

[Figure: training and validation loss curves over epochs; the weight updates come from the training set, and early stopping occurs where the validation loss starts to rise (overfitting starts)]
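A minimal sketch (not from the slides) of the early-stopping logic in Step 10. The train_one_epoch and validation_loss functions here are stand-ins that simulate loss curves; in practice they would run the feedforward/backward passes on the training set and a feedforward-only pass on the validation set.

```python
import math

def train_one_epoch(epoch):
    # Stand-in: training loss keeps decreasing
    return math.exp(-0.1 * epoch)

def validation_loss(epoch):
    # Stand-in: validation loss decreases, then rises once overfitting starts
    return math.exp(-0.1 * epoch) + 0.002 * epoch ** 1.5

best_val, best_epoch, patience, wait = float("inf"), 0, 5, 0

for epoch in range(200):
    train_loss = train_one_epoch(epoch)       # forward + backward passes on the training set
    val_loss = validation_loss(epoch)         # forward pass only on the validation set

    if val_loss < best_val:
        best_val, best_epoch, wait = val_loss, epoch, 0
        # in practice: save a checkpoint of the weights here
    else:
        wait += 1
        if wait >= patience:                  # validation loss has stopped improving
            print(f"early stopping at epoch {epoch}, best epoch was {best_epoch}")
            break
```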

SLIDE 26

Wrap up: Training Deep Neural Networks

  • Generalization
    ○ Any "differentiable" module can be used in the neural network as a layer (see the sketch below)
    ○ Gradient-based learning will train the entire network

[Diagram: a parametric module (connectivity patterns: fully-connected, convolutional, ...) maps input $x$ and weights $w$ to output $y$ in the forward pass, and in the backward pass turns $\partial \ell / \partial y$ into $\partial \ell / \partial x$ (passed to the layer below) and $\partial \ell / \partial w$ (used for the weight update); a non-parametric module (activation functions; pooling: max, average, ...) has only the forward map $x \rightarrow y$ and the backward map $\partial \ell / \partial y \rightarrow \partial \ell / \partial x$]
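A minimal sketch (not from the slides) of this forward/backward module interface, with one parametric module (fully-connected) and one non-parametric module (ReLU); grad_out plays the role of $\partial \ell / \partial y$.

```python
import numpy as np

class Linear:
    """Parametric module: y = W x + b, with backward producing dW, db, and dx."""
    def __init__(self, n_in, n_out, rng):
        self.W = rng.standard_normal((n_out, n_in)) * 0.1
        self.b = np.zeros(n_out)

    def forward(self, x):
        self.x = x                              # cache the input for the backward pass
        return self.W @ x + self.b

    def backward(self, grad_out):               # grad_out = d loss / d y
        self.dW = np.outer(grad_out, self.x)    # gradient for the weight update
        self.db = grad_out
        return self.W.T @ grad_out              # d loss / d x, passed to the layer below

class ReLU:
    """Non-parametric module: element-wise max(0, x), no weight gradients."""
    def forward(self, x):
        self.x = x
        return np.maximum(0.0, x)

    def backward(self, grad_out):
        return grad_out * (self.x > 0)

# Chain the modules: forward left to right, backward right to left
rng = np.random.default_rng(0)
layers = [Linear(40, 64, rng), ReLU(), Linear(64, 1, rng)]

out = rng.standard_normal(40)
for layer in layers:
    out = layer.forward(out)

grad = np.ones_like(out)                        # stand-in for d loss / d output
for layer in reversed(layers):
    grad = layer.backward(grad)
```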

SLIDE 27

MLP Demo and Visualization

  • https://playground.tensorflow.org/