  1. GCT634/AI613: Musical Applications of Machine Learning (Fall 2020) Deep Learning: Intro Juhan Nam

  2. Review of Traditional Machine Learning
  ● The traditional machine learning pipeline: Frame-level Features → Unsupervised Learning → Temporal Summary → Classifier

  3. Review of Traditional Machine Learning
  ● The traditional machine learning pipeline, with example modules (a feature-extraction sketch follows):
  ○ Frame-level features (MFCC): DFT → Abs (magnitude) → Mel filterbank → Log (non-linear compression) → DCT (linear transform)
  ○ Unsupervised learning: K-means
  ○ Temporal summary: pooling
  ○ Classifier: logistic regression
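
A minimal sketch of the frame-level feature chain above (DFT → Abs → Mel filterbank → Log → DCT), assuming librosa, NumPy, and SciPy are available; the file path, frame sizes, and number of coefficients are illustrative choices, not values from the slides.

```python
import numpy as np
import librosa
from scipy.fftpack import dct

y, sr = librosa.load("example.wav", sr=22050)              # hypothetical audio file

S = librosa.stft(y, n_fft=1024, hop_length=512)            # DFT per frame
mag = np.abs(S)                                             # Abs (magnitude)
mel_fb = librosa.filters.mel(sr=sr, n_fft=1024, n_mels=40)  # Mel filterbank matrix
mel_spec = mel_fb @ mag                                     # apply the filterbank
log_mel = np.log(mel_spec + 1e-6)                           # Log (non-linear compression)
mfcc = dct(log_mel, axis=0, type=2, norm="ortho")[:13]      # DCT (linear transform), keep 13 coefficients
```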

  4. Review of Traditional Machine Learning
  ● Each module can be replaced with a chain of a linear transform and a non-linear function
  ○ DFT → Abs → Mel filterbank → Log → DCT maps onto Linear transform → Non-linear function → Linear transform → Non-linear function → Linear transform
  ○ The remaining modules (unsupervised learning, temporal pooling, classifier) can likewise be expressed as linear transforms and non-linear functions

  5. Review of Traditional Machine Learning
  ● The entire pipeline can be replaced with a single long chain of linear transforms and non-linear functions (i.e., a deep neural network)
  ○ In the traditional machine learning pipeline, each module is optimized locally
  ○ The full chain (Linear transform → Non-linear function → ... → Temporal pooling → Classifier) is a deep neural network

  6. Deep Learning
  ● All of the blocks (or layers) are optimized in an end-to-end manner
  ○ The parameters (or weights) in all layers are learned to minimize the loss function of the classifier
  ○ The loss is back-propagated through all layers (from the output back to the input) as a gradient with regard to each parameter ("error back-propagation")
  ● Therefore, we "learn features" instead of designing or engineering them (a training sketch follows)
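
A minimal end-to-end training sketch. PyTorch is my assumed framework here (the slides do not name one), and the model, loss, and optimizer settings are illustrative placeholders.

```python
import torch
import torch.nn as nn

model = nn.Sequential(               # a stand-in for the long chain of layers
    nn.Linear(40, 128), nn.ReLU(),
    nn.Linear(128, 10),              # 10-class classifier output
)
loss_fn = nn.CrossEntropyLoss()      # loss function of the classifier
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

x = torch.randn(32, 40)              # a batch of dummy input features
y = torch.randint(0, 10, (32,))      # dummy class labels

logits = model(x)                    # forward pass through all layers
loss = loss_fn(logits, y)
loss.backward()                      # error back-propagation: gradient for every parameter
optimizer.step()                     # update all layers jointly (end-to-end)
optimizer.zero_grad()
```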

  7. Deep Learning: Building Models
  ● There are many choices of basic building blocks (or layers); see the sketch after this list
  ● Connectivity patterns (parametric)
  ○ Fully-connected (i.e., linear transform)
  ○ Convolutional (note that the STFT is a convolutional operation)
  ○ Skip / residual
  ○ Recurrent
  ● Non-linearity functions (non-parametric)
  ○ Sigmoid
  ○ Tanh
  ○ Rectified Linear Unit (ReLU) and variations
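
A sketch of the building blocks listed above, again assuming PyTorch; all layer sizes are arbitrary placeholders.

```python
import torch
import torch.nn as nn

fc   = nn.Linear(128, 64)                     # fully-connected (linear transform)
conv = nn.Conv1d(1, 16, kernel_size=9)        # convolutional layer
rnn  = nn.GRU(input_size=64, hidden_size=64)  # recurrent layer

relu    = nn.ReLU()                           # non-parametric non-linearities
sigmoid = nn.Sigmoid()
tanh    = nn.Tanh()

class ResidualBlock(nn.Module):
    """Skip / residual connection: add the block's input back to its output."""
    def __init__(self, dim):
        super().__init__()
        self.fc = nn.Linear(dim, dim)

    def forward(self, x):
        return x + torch.relu(self.fc(x))
```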

  8. Deep Learning: Building Models
  ● We "design" a deep neural network architecture depending on the nature of the data and the task
  ○ Modular synth as a "musical analogy": modules are patched together into a signal chain, much like layers in a network
  (Images from the Arturia Modular V manual)

  9. Deep Learning: Training Models
  ● Loss functions
  ○ Cross entropy (logistic loss)
  ○ Hinge loss
  ○ Maximum likelihood
  ○ L2 (root mean square) and L1
  ○ Adversarial
  ○ Variational
  ● Optimizers
  ○ SGD
  ○ Momentum
  ○ RMSProp
  ○ Adagrad
  ○ Adam
  ● Hyperparameters (initialization, regularization, model search)
  ○ Weight initialization
  ○ L1 and L2 (weight decay)
  ○ Dropout
  ○ Learning rate
  ○ Layer size
  ○ Batch size
  ○ Data augmentation
  (A configuration sketch follows this list.)
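
A sketch of how these choices typically appear in code, assuming PyTorch; every hyperparameter value below is illustrative, not a course recommendation.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(40, 256), nn.ReLU(),
    nn.Dropout(p=0.5),                      # dropout (regularization)
    nn.Linear(256, 10),
)

loss_fn = nn.CrossEntropyLoss()             # cross entropy (logistic loss)

optimizer = torch.optim.Adam(               # Adam optimizer
    model.parameters(),
    lr=1e-3,                                # learning rate
    weight_decay=1e-4,                      # L2 regularization (weight decay)
)

batch_size = 32                             # batch size is set in the data loader
# loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)
```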

  10. Multi-Layer Perceptron (MLP)
  ● Neural networks that consist of fully-connected layers and non-linear functions
  ○ Also called Feedforward Neural Network or Deep Feedforward Network
  ○ A long history: perceptron (Rosenblatt, 1962), back-propagation (Rumelhart, 1986), deep belief networks (Hinton and Salakhutdinov, 2006)
  ● Forward pass from the input layer $x$ through hidden layers $h^{(1)}, h^{(2)}, h^{(3)}$ to the output layer $z$ (a code sketch follows):
  ○ $a^{(1)} = W^{(1)} x + b^{(1)}$, $h^{(1)} = g(a^{(1)})$
  ○ $a^{(2)} = W^{(2)} h^{(1)} + b^{(2)}$, $h^{(2)} = g(a^{(2)})$
  ○ $a^{(3)} = W^{(3)} h^{(2)} + b^{(3)}$, $h^{(3)} = g(a^{(3)})$
  ○ $z = W^{(4)} h^{(3)} + b^{(4)}$
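
A direct transcription of the equations above into PyTorch (my assumed framework); the layer sizes are arbitrary placeholders.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, in_dim=40, hidden=128, out_dim=10):
        super().__init__()
        self.W1 = nn.Linear(in_dim, hidden)   # a1 = W1 x  + b1
        self.W2 = nn.Linear(hidden, hidden)   # a2 = W2 h1 + b2
        self.W3 = nn.Linear(hidden, hidden)   # a3 = W3 h2 + b3
        self.W4 = nn.Linear(hidden, out_dim)  # z  = W4 h3 + b4
        self.g = nn.ReLU()                    # element-wise non-linearity g

    def forward(self, x):
        h1 = self.g(self.W1(x))
        h2 = self.g(self.W2(h1))
        h3 = self.g(self.W3(h2))
        z = self.W4(h3)                       # output layer (logits)
        return z

z = MLP()(torch.randn(8, 40))                 # batch of 8 inputs -> (8, 10) logits
```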

  11. Deep Feedforward Network
  ● It is argued that the first breakthrough of deep learning came from the deep feedforward network (2011)
  ○ The state-of-the-art acoustic model in speech recognition was the GMM-HMM
  ○ The GMM module was replaced with a deep feedforward network (up to 5 layers)
  ○ The weight matrices were initialized using an unsupervised learning algorithm
  ■ Deep belief network: greedy layer-wise pre-training using restricted Boltzmann machines
  Context-Dependent Pre-trained Deep Neural Networks for Large Vocabulary Speech Recognition, George Dahl, Dong Yu, Li Deng, Alex Acero, 2012

  12. Non-linear Functions
  ● There are several choices of non-linear functions (or activation functions)
  ○ ReLU is the default choice in modern deep learning: fast and effective
  ○ There are also other choices such as the Exponential Linear Unit (ELU) and Maxout
  ○ Note that this is an element-wise operation in the neural network (see the sketch below)
  ● Sigmoid: $\sigma(x) = \frac{1}{1 + e^{-x}}$
  ● Tanh: $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
  ● ReLU: $\max(0, x)$
  ● Leaky ReLU: $\max(0.1x, x)$
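
The four activation functions above written as element-wise NumPy operations; NumPy is my choice here for clarity, and deep learning frameworks provide the same functions built in.

```python
import numpy as np

def sigmoid(x):     return 1.0 / (1.0 + np.exp(-x))
def tanh(x):        return np.tanh(x)
def relu(x):        return np.maximum(0.0, x)
def leaky_relu(x):  return np.maximum(0.1 * x, x)

a = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(a))      # [0.  0.  0.  1.5] -- applied independently to each element
```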

  13. Why the Nonlinear Function in the Hidden Layer?
  ● The non-linear function captures high-order interactions between input elements
  ○ This enables finding non-linear boundaries between different classes
  ○ Taylor series of a nonlinear function $g(a)$: $g(a) = b_0 + b_1 a + b_2 a^2 + b_3 a^3 + \cdots$
  ■ The non-zero coefficients of the high-order terms introduce interactions between all input elements
  ○ For example, with $a = w_1 x_1 + w_2 x_2 + c$:
  $a^2 = w_1^2 x_1^2 + 2 w_1 w_2 x_1 x_2 + w_2^2 x_2^2 + 2 w_1 x_1 c + 2 w_2 x_2 c + c^2$
  ○ (Figure: decision boundary $a = 0$ for the linear case vs. $b_0 + b_1 a + b_2 a^2 = 0$ for the nonlinear case)

  14. Why the Nonlinear Function in the Hidden Layer?
  ● What if the nonlinear functions are absent?
  ○ A product of linear transforms is just another linear transform (see the check below)
  ○ Geometrically, a linear transformation only does scaling, shearing, and rotation
  source: http://www.ams.org/publicoutreach/feature-column/fcarc-svd
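
A quick NumPy check of the claim: stacking linear layers without a non-linearity collapses into a single linear transform (the matrices below are random and the bias terms are omitted for brevity).

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))
W2 = rng.standard_normal((5, 4))
W3 = rng.standard_normal((2, 5))
x = rng.standard_normal(3)

deep_linear = W3 @ (W2 @ (W1 @ x))      # three "layers" with no non-linearity
single      = (W3 @ W2 @ W1) @ x        # one equivalent linear transform

print(np.allclose(deep_linear, single))  # True
```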
