
slide-1
SLIDE 1
CSI5180. Machine Learning for Bioinformatics Applications

Deep learning — practical issues

by

Marcel Turcotte

Version November 19, 2019

slide-3
SLIDE 3

Preamble 2/31


Preamble

slide-4
SLIDE 4

Preamble

Preamble 3/31

Deep learning — practical issues

In this last lecture on deep learning, we consider practical issues that arise when using existing tools and libraries.

General objective:

Discuss the pitfalls, limitations, and practical considerations when using deep learning algorithms.

slide-5
SLIDE 5

Learning objectives

Preamble 4/31

  • Discuss the pitfalls, limitations, and practical considerations when using deep learning algorithms.
  • Explain what a dropout layer is.
  • Discuss further mechanisms to regularize deep networks.

Reading:

Christof Angermueller, Tanel Pärnamaa, Leopold Parts, and Oliver Stegle. Deep learning for computational biology. Mol Syst Biol, 12(7):878, 2016.

slide-6
SLIDE 6

Plan

Preamble 5/31

  • 1. Preamble
  • 2. As mentioned previously
  • 3. Regularization
  • 4. Hyperparameters
  • 5. Keras
  • 6. Further considerations
  • 7. Prologue
slide-7
SLIDE 7

As mentioned previously 6/31

As mentioned previously

slide-8
SLIDE 8

Overview

As mentioned previously 7/31

Source: [1] Box 1

slide-12
SLIDE 12

Summary

As mentioned previously 8/31

In a dense layer, all the neurons are connected to all the neurons of the previous layer.

The number of parameters grows very quickly with each additional dense layer, making it impractical to create deep networks this way.

Local connectivity. In a convolutional layer, each neuron is connected to a small number of neurons from the previous layer. This small rectangular region is called the receptive field.

Parameter sharing. All the neurons in a given feature map of a convolutional layer share the same kernel (filter).
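To make the parameter-count argument concrete, here is a minimal sketch (not from the slides) comparing a single Dense layer with a single Conv1D layer in Keras, assuming TensorFlow 2 and a one-hot encoded sequence of length 1,000 with 4 channels; the layer sizes are illustrative only.

from tensorflow import keras
from tensorflow.keras.layers import Dense, Conv1D, Flatten

# Fully connected: every unit is connected to every input position and channel.
dense_model = keras.models.Sequential([
    keras.layers.Input(shape=(1000, 4)),
    Flatten(),
    Dense(64, activation="relu"),       # 4000 * 64 + 64 = 256,064 parameters
])
dense_model.summary()

# Convolutional: each unit sees only a window of width 26, and the weights are shared.
conv_model = keras.models.Sequential([
    keras.layers.Input(shape=(1000, 4)),
    Conv1D(64, 26, activation="relu"),  # 64 * (26 * 4 + 1) = 6,720 parameters
])
conv_model.summary()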

slide-13
SLIDE 13

Convolutional layer (Conv1D)

As mentioned previously 9/31

Source: [1] Figure 2B

slide-17
SLIDE 17

Convolutional layer

As mentioned previously 10/31

Contrary to Dense layers, Conv1D layers preserve the identity of the monomers (nucleotides or amino acids), which are seen as channels.

Convolutional neural networks are able to detect patterns irrespective of their location in the input.

Pooling makes the network less sensitive to small translations.

In bioinformatics, CNNs are ideally suited to detect local (sequence) motifs, independent of their position within the input (sequence). They are also the most prevalent architecture.
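As an illustration of these points, the following is a minimal Conv1D sketch (not from the slides), assuming TensorFlow 2 and one-hot encoded DNA sequences of length 200; the filter sizes and the binary output are hypothetical.

from tensorflow import keras
from tensorflow.keras.layers import Conv1D, GlobalMaxPooling1D, Dense

# The 4 nucleotides are the input channels; each Conv1D filter acts as a motif
# detector that can fire at any position, and global max pooling keeps only the
# strongest match per filter, making the prediction position independent.
model = keras.models.Sequential([
    keras.layers.Input(shape=(200, 4)),
    Conv1D(32, 12, activation="relu"),
    GlobalMaxPooling1D(),
    Dense(1, activation="sigmoid"),     # e.g. motif present / absent
])
model.compile(loss="binary_crossentropy", optimizer="adam")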

slide-19
SLIDE 19

Summary

As mentioned previously 11/31

Recurrent networks (RNN) and Long Short-Term Memory (LSTM) can process input sequences of varying length.

The literature suggests that RNNs are more difficult to train than other architectures.
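A minimal sketch (not from the slides) of how variable-length sequences are commonly handled in Keras, assuming TensorFlow 2: sequences are zero-padded to a common length and a Masking layer tells the LSTM to skip the padded time steps. The layer sizes are illustrative.

from tensorflow import keras
from tensorflow.keras.layers import Masking, LSTM, Dense

model = keras.models.Sequential([
    keras.layers.Input(shape=(None, 4)),  # variable-length one-hot sequences
    Masking(mask_value=0.0),              # ignore zero-padded time steps
    LSTM(32),
    Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam")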

slide-20
SLIDE 20

Regularization 12/31

Regularization

slide-27
SLIDE 27

Dropout

Regularization 13/31

Hinton and colleagues describe dropout layers as “preventing co-adaptation” of units. During training, each input unit in a dropout layer has probability p of being ignored (set to 0).

According to [3] §11:

20–30% is a typical value of p for convolutional networks, whereas 40–50% is a typical value of p for recurrent networks.

Dropout layers can make the network converge more slowly; however, the resulting network is expected to make fewer generalization errors. https://keras.io/layers/core/

model = keras.models.Sequential([..., Dropout(0.5), ...])
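A slightly fuller sketch of the same idea (assumed, not from the slides): a small fully connected classifier with a Dropout layer after each hidden layer, using TensorFlow 2's Keras. Dropout is active during training only; it is disabled automatically at inference time.

from tensorflow import keras
from tensorflow.keras.layers import Dense, Dropout

model = keras.models.Sequential([
    keras.layers.Input(shape=(100,)),   # hypothetical input size
    Dense(128, activation="relu"),
    Dropout(0.5),                       # each input unit is dropped with probability 0.5
    Dense(64, activation="relu"),
    Dropout(0.5),
    Dense(1, activation="sigmoid"),
])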

slide-28
SLIDE 28

Dropout

Regularization 14/31

Source: [1] Figure 5F

slide-29
SLIDE 29

Regularizers

Regularization 15/31

Regularizers apply penalties on layer parameters during optimization. https://keras.io/regularizers/

# other import directives are here
from keras import regularizers

model = Sequential()
model.add(Dense(32, input_shape=(16,)))
model.add(Dense(64, input_dim=64, kernel_regularizer=regularizers.l2(0.01)))

Available penalties

keras.regularizers.l1(0.)
keras.regularizers.l2(0.)
keras.regularizers.l1_l2(l1=0.01, l2=0.01)

slide-30
SLIDE 30

Early stopping

Regularization 16/31

Source: [1] Figure 5E
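Early stopping is typically implemented in Keras with a callback. A minimal sketch (not from the slides), assuming TensorFlow 2, an already compiled model, and hypothetical arrays X_train, y_train, X_valid, y_valid:

from tensorflow import keras

# Stop when the validation loss has not improved for 10 epochs
# and restore the best weights seen so far.
early_stopping = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=10, restore_best_weights=True)

history = model.fit(X_train, y_train,
                    validation_data=(X_valid, y_valid),
                    epochs=1000,                # large; early stopping decides when to stop
                    callbacks=[early_stopping])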

slide-31
SLIDE 31

Hyperparameters 17/31

Hyperparameters

slide-37
SLIDE 37

Optimizers

Hyperparameters 18/31

An optimizer should be fast and should ideally guide the solution towards a “good” local optimum (or, better, a global optimum).

Momentum

Momentum methods keep track of previous gradients and use this information to update the weights:

m ← βm − η∇θJ(θ)
θ ← θ + m

Momentum methods can escape plateaus more effectively. Variants include Nesterov Accelerated Gradient, AdaGrad, RMSProp, Adam, and Nadam. Adam is a good default choice.
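A minimal sketch (not from the slides) of how these optimizers are selected in Keras, assuming TensorFlow 2 and an already defined model; the learning rates are illustrative defaults.

from tensorflow import keras

# SGD with Nesterov momentum; the momentum argument corresponds to β in the update rule above.
optimizer = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=True)

# Adam is usually a safe default choice.
# optimizer = keras.optimizers.Adam(learning_rate=0.001)

model.compile(loss="binary_crossentropy", optimizer=optimizer, metrics=["accuracy"])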

slide-38
SLIDE 38

Loss function

Hyperparameters 19/31

Regression

mean_squared_error (MSE) or mean_absolute_error (MAE)

Classification

Binary classification: binary_crossentropy
Multiclass classification: categorical_crossentropy

https://keras.io/losses/

from keras import losses

model.compile(loss=losses.mean_squared_error, optimizer='sgd')
slide-39
SLIDE 39

Output layer activation

Hyperparameters 20/31

Regression [3] Table 10.1:

ReLU/softplus (if positive outputs)
logistic/tanh (if bounded outputs)

Classification

Binary classification: logistic
Multiclass classification: softmax

https://keras.io/activations/

model = keras.models.Sequential([..., Dense(64, activation="relu"), ...])
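To tie the output activation to the loss function of the previous slide, here is a minimal sketch (not from the slides), assuming TensorFlow 2 and a hypothetical 100-dimensional input:

from tensorflow import keras
from tensorflow.keras.layers import Dense

# Binary classification: a single logistic (sigmoid) output unit + binary cross-entropy.
binary_model = keras.models.Sequential([
    keras.layers.Input(shape=(100,)),
    Dense(64, activation="relu"),
    Dense(1, activation="sigmoid"),
])
binary_model.compile(loss="binary_crossentropy", optimizer="adam")

# Multiclass classification: one softmax unit per class + categorical cross-entropy.
multiclass_model = keras.models.Sequential([
    keras.layers.Input(shape=(100,)),
    Dense(64, activation="relu"),
    Dense(10, activation="softmax"),
])
multiclass_model.compile(loss="categorical_crossentropy", optimizer="adam")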

slide-40
SLIDE 40

Hyperparameters

Hyperparameters 21/31

Source: [1] Table 2

slide-41
SLIDE 41

Keras 22/31

Keras

slide-42
SLIDE 42

Keras

Keras 23/31

model = keras.models.Sequential([
    Conv2D(64, 7, ..., input_shape=[28, 28, 1]),
    MaxPooling2D(2),
    Conv2D(128, 3, activation="relu", padding="same"),
    Conv2D(128, 3, activation="relu", padding="same"),
    MaxPooling2D(2),
    Conv2D(256, 3, activation="relu", padding="same"),
    Conv2D(256, 3, activation="relu", padding="same"),
    MaxPooling2D(2),
    Flatten(),
    Dense(128, activation="relu"),
    Dropout(0.5),
    Dense(64, activation="relu"),
    Dropout(0.5),
    Dense(10, activation="softmax")
])

Source: [3] §14
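Compiling and training this model might look as follows; this is a sketch, not from the slides, and assumes X_train and y_train are hypothetical arrays of 28×28×1 grayscale images with integer class labels (e.g. Fashion-MNIST, as used in [3] §14).

model.compile(loss="sparse_categorical_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=10, validation_split=0.1)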

slide-43
SLIDE 43

Further considerations 24/31

Further considerations

slide-50
SLIDE 50

Further considerations

Further considerations 25/31

We obviously barely scratched the surface of deep learning. Here are some important concepts that we did not consider:

  • The vanishing and exploding gradient problems; see BatchNormalization (a sketch follows this list).
  • Weight initialization.
  • Data augmentation.
  • Understanding what the network has learnt:
    Shrikumar, A., Greenside, P., and Kundaje, A. Learning Important Features Through Propagating Activation Differences. arXiv.org cs.CV, 2017. [DeepLIFT]
  • Attention layers.
  • Multi-task learning (not multi-class, not multi-label).
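The sketch referred to in the first item above (assumed, not from the slides): inserting BatchNormalization layers in a Keras model, which standardizes each layer's inputs over the mini-batch and helps mitigate vanishing/exploding gradients. Layer sizes are illustrative.

from tensorflow import keras
from tensorflow.keras.layers import Dense, BatchNormalization, Activation

model = keras.models.Sequential([
    keras.layers.Input(shape=(100,)),
    Dense(128, use_bias=False),     # bias is redundant when followed by BatchNormalization
    BatchNormalization(),
    Activation("relu"),
    Dense(64, use_bias=False),
    BatchNormalization(),
    Activation("relu"),
    Dense(1, activation="sigmoid"),
])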

slide-51
SLIDE 51

Of deep neural networks

Further considerations 26/31

They see the world as a hierarchy of concepts, effectively bypassing the need to create features by hand (feature engineering).

“Deep neural networks can help circumventing the manual extraction of features by learning them from data.” [1]

Transfer learning is possibly unique to deep learning; there are hundreds of papers in bioinformatics alone.
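A minimal transfer-learning sketch (not from the slides, and an image example rather than a bioinformatics one), assuming TensorFlow 2: reuse a network pre-trained on a large dataset, freeze its weights, and train only a small task-specific head.

from tensorflow import keras
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense

base = keras.applications.ResNet50(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False              # freeze the pre-trained layers

model = keras.models.Sequential([
    base,
    GlobalAveragePooling2D(),
    Dense(1, activation="sigmoid"), # new binary classification head
])
model.compile(loss="binary_crossentropy", optimizer="adam")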

slide-52
SLIDE 52

Prologue 27/31

Prologue

slide-57
SLIDE 57

Summary

Prologue 28/31

  • Deep networks consisting only of dense layers become computationally intractable, as the number of parameters grows very quickly with each additional layer.
  • Convolutional layers considerably reduce the number of parameters, since each unit is connected to a limited number of neurons from the previous layer, its receptive field.
  • CNNs are able to detect patterns in a position-independent manner.
  • RNNs and LSTMs handle sequence information, where the input sequences can be of different lengths. They can detect patterns along the sequence.
  • Dropout layers are an effective regularization mechanism.

slide-58
SLIDE 58

Next module

Prologue 29/31

Concept- and rule-based

slide-59
SLIDE 59

References

Prologue 30/31

[1] Christof Angermueller, Tanel Pärnamaa, Leopold Parts, and Oliver Stegle. Deep learning for computational biology. Mol Syst Biol, 12(7):878, 2016.
[2] François Chollet. Deep Learning with Python. Manning Publications, 2017.
[3] Aurélien Géron. Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow. O'Reilly Media, 2nd edition, 2019.
[4] Andriy Burkov. The Hundred-Page Machine Learning Book. Andriy Burkov, 2019.

slide-60
SLIDE 60

Prologue 31/31

Marcel Turcotte

Marcel.Turcotte@uOttawa.ca School of Electrical Engineering and Computer Science (EECS) University of Ottawa