Lecture 4: Artificial Neural Networks
Rui Xia
Text Mining Group, Nanjing University of Science & Technology
rxia@njust.edu.cn
Brief History
- Rosenblatt (1958) created the perceptron, an algorithm for pattern recognition.
- Neural network research stagnated after the work of Minsky and Papert (1969), who identified two key issues with the computational machines that processed neural networks:
  – Basic perceptrons were incapable of processing the exclusive-or circuit.
  – Computers didn't have enough processing power to effectively handle the work required by large neural networks.
- A key trigger for the renewed interest in neural networks and learning was Paul Werbos's (1975) back-propagation algorithm.
- Both shallow and deep ANN architectures (e.g., recurrent nets) have been explored for many years.
- In 2006, Hinton and Salakhutdinov showed how a many-layered feedforward neural network could be effectively pre-trained one layer at a time.
- Advances in hardware enabled the renewed interest after 2009.
- Industrial applications of deep learning to large-scale speech recognition started around 2010.
- Significant additional impact on image and object recognition was felt from 2011 to 2012.
- Deep learning approaches have achieved very high performance across many different natural language processing tasks since 2013.
- To date, deep learning architectures such as CNNs, RNNs, LSTMs, and GANs have been applied in many fields, where they have produced results comparable to, and in some cases superior to, those of human experts.
Inspired by Biological Neural Networks
Multi-layer Neural Networks
3-layer Forward Neural Networks

- ANN Structure
- Hypothesis

$$\hat{y}_k = \sigma(\beta_k + \theta_k), \qquad \beta_k = \sum_{h=1}^{q} v_{hk}\, b_h$$

$$b_h = \sigma(\alpha_h + \gamma_h), \qquad \alpha_h = \sum_{j=1}^{d} w_{jh}\, x_j$$

where $x_j$ ($j = 1, \dots, d$) are the inputs, $b_h$ ($h = 1, \dots, q$) are the hidden-unit outputs, $\hat{y}_k$ ($k = 1, \dots, m$) are the network outputs, and $\sigma(\cdot)$ is the sigmoid activation function.
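To make the hypothesis concrete, here is a minimal NumPy sketch of the forward pass; the function and variable names are illustrative choices, not from the lecture:

```python
import numpy as np

def sigmoid(z):
    """Logistic activation: sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W, gamma, V, theta):
    """Forward pass of the 3-layer network.

    x:     input vector, shape (d,)
    W:     input-to-hidden weights, shape (d, q)
    gamma: hidden biases, shape (q,)
    V:     hidden-to-output weights, shape (q, m)
    theta: output biases, shape (m,)
    Returns (b, y_hat): hidden activations and network outputs.
    """
    alpha = x @ W               # alpha_h = sum_j w_jh * x_j
    b = sigmoid(alpha + gamma)  # b_h = sigma(alpha_h + gamma_h)
    beta = b @ V                # beta_k = sum_h v_hk * b_h
    y_hat = sigmoid(beta + theta)
    return b, y_hat
```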
Learning Algorithm

- Training set

$$D = \left\{ (x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(n)}, y^{(n)}) \right\}, \qquad x^{(l)} \in \mathbb{R}^d, \; y^{(l)} \in \mathbb{R}^m$$

- Cost function (squared error on training example $l$)

$$E^{(l)} = \frac{1}{2} \sum_{k=1}^{m} \left( \hat{y}_k^{(l)} - y_k^{(l)} \right)^2$$

- Parameters

$$w \in \mathbb{R}^{d \times q}, \quad \gamma \in \mathbb{R}^{q}, \quad v \in \mathbb{R}^{q \times m}, \quad \theta \in \mathbb{R}^{m}$$

- Gradients to calculate

$$\frac{\partial E^{(l)}}{\partial w_{jh}}, \quad \frac{\partial E^{(l)}}{\partial \gamma_h}, \quad \frac{\partial E^{(l)}}{\partial v_{hk}}, \quad \frac{\partial E^{(l)}}{\partial \theta_k}$$
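A matching sketch of the per-example cost, using the same illustrative names as above:

```python
def cost(y_hat, y):
    """Squared-error cost: E = 0.5 * sum_k (y_hat_k - y_k)^2."""
    return 0.5 * np.sum((y_hat - y) ** 2)
```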
Gradient Calculation

- First, the gradient with respect to $v_{hk}$:

$$\frac{\partial E^{(l)}}{\partial v_{hk}} = \frac{\partial E^{(l)}}{\partial \hat{y}_k^{(l)}} \cdot \frac{\partial \hat{y}_k^{(l)}}{\partial (\beta_k + \theta_k)} \cdot \frac{\partial (\beta_k + \theta_k)}{\partial v_{hk}}$$

where

$$\frac{\partial E^{(l)}}{\partial \hat{y}_k^{(l)}} = \hat{y}_k^{(l)} - y_k^{(l)}$$

$$\frac{\partial \hat{y}_k^{(l)}}{\partial (\beta_k + \theta_k)} = \sigma'(\beta_k + \theta_k) = \sigma(\beta_k + \theta_k) \cdot \left( 1 - \sigma(\beta_k + \theta_k) \right) = \hat{y}_k^{(l)} \cdot \left( 1 - \hat{y}_k^{(l)} \right)$$

$$\frac{\partial (\beta_k + \theta_k)}{\partial v_{hk}} = b_h$$
Define the output-layer error

$$\mathrm{error}_k^{\text{OutputLayer}} = \frac{\partial E^{(l)}}{\partial (\beta_k + \theta_k)} = \frac{\partial E^{(l)}}{\partial \hat{y}_k^{(l)}} \cdot \frac{\partial \hat{y}_k^{(l)}}{\partial (\beta_k + \theta_k)} = \left( \hat{y}_k^{(l)} - y_k^{(l)} \right) \cdot \hat{y}_k^{(l)} \cdot \left( 1 - \hat{y}_k^{(l)} \right)$$

Then

$$\frac{\partial E^{(l)}}{\partial v_{hk}} = \mathrm{error}_k^{\text{OutputLayer}} \cdot b_h$$

- Second, the gradient with respect to $\theta_k$:

$$\frac{\partial E^{(l)}}{\partial \theta_k} = \frac{\partial E^{(l)}}{\partial \hat{y}_k^{(l)}} \cdot \frac{\partial \hat{y}_k^{(l)}}{\partial (\beta_k + \theta_k)} \cdot \frac{\partial (\beta_k + \theta_k)}{\partial \theta_k} = \mathrm{error}_k^{\text{OutputLayer}} \cdot 1$$
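The output-layer error and the two gradients above take only a few lines of NumPy. This is again an illustrative sketch, reusing the forward function from earlier:

```python
def output_layer_grads(x, y, W, gamma, V, theta):
    """Gradients of E w.r.t. the output-layer parameters V and theta."""
    b, y_hat = forward(x, W, gamma, V, theta)
    # error_k = (y_hat_k - y_k) * y_hat_k * (1 - y_hat_k)
    err_out = (y_hat - y) * y_hat * (1.0 - y_hat)  # shape (m,)
    dV = np.outer(b, err_out)                      # dE/dv_hk = error_k * b_h
    dtheta = err_out                               # dE/dtheta_k = error_k * 1
    return err_out, dV, dtheta
```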
- Third, the gradient with respect to $w_{jh}$:

$$\frac{\partial E^{(l)}}{\partial w_{jh}} = \sum_{k=1}^{m} \frac{\partial E^{(l)}}{\partial (\beta_k + \theta_k)} \cdot \frac{\partial (\beta_k + \theta_k)}{\partial b_h} \cdot \frac{\partial b_h}{\partial (\alpha_h + \gamma_h)} \cdot \frac{\partial (\alpha_h + \gamma_h)}{\partial w_{jh}}$$

where

$$\frac{\partial E^{(l)}}{\partial (\beta_k + \theta_k)} = \mathrm{error}_k^{\text{OutputLayer}}$$

$$\frac{\partial (\beta_k + \theta_k)}{\partial b_h} = v_{hk}$$

$$\frac{\partial b_h}{\partial (\alpha_h + \gamma_h)} = \sigma'(\alpha_h + \gamma_h) = \sigma(\alpha_h + \gamma_h) \cdot \left( 1 - \sigma(\alpha_h + \gamma_h) \right) = b_h \cdot (1 - b_h)$$

$$\frac{\partial (\alpha_h + \gamma_h)}{\partial w_{jh}} = x_j^{(l)}$$
Define the hidden-layer error

$$\mathrm{error}_h^{\text{HiddenLayer}} = \frac{\partial E^{(l)}}{\partial (\alpha_h + \gamma_h)} = \sum_{k=1}^{m} \frac{\partial E^{(l)}}{\partial (\beta_k + \theta_k)} \cdot \frac{\partial (\beta_k + \theta_k)}{\partial b_h} \cdot \frac{\partial b_h}{\partial (\alpha_h + \gamma_h)} = \sum_{k=1}^{m} \mathrm{error}_k^{\text{OutputLayer}} \cdot v_{hk} \cdot b_h \cdot (1 - b_h)$$

Then

$$\frac{\partial E^{(l)}}{\partial w_{jh}} = \mathrm{error}_h^{\text{HiddenLayer}} \cdot x_j^{(l)}$$
- Finally, the gradient with respect to $\gamma_h$:

$$\frac{\partial E^{(l)}}{\partial \gamma_h} = \sum_{k=1}^{m} \frac{\partial E^{(l)}}{\partial (\beta_k + \theta_k)} \cdot \frac{\partial (\beta_k + \theta_k)}{\partial b_h} \cdot \frac{\partial b_h}{\partial (\alpha_h + \gamma_h)} \cdot \frac{\partial (\alpha_h + \gamma_h)}{\partial \gamma_h} = \mathrm{error}_h^{\text{HiddenLayer}} \cdot 1$$
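The hidden-layer error and the remaining two gradients follow the same pattern. Another illustrative sketch, consistent with the earlier ones; err_out is the output-layer error computed above:

```python
def hidden_layer_grads(x, b, err_out, V):
    """Gradients of E w.r.t. the hidden-layer parameters W and gamma."""
    # error_h = sum_k error_k * v_hk * b_h * (1 - b_h)
    err_hid = (V @ err_out) * b * (1.0 - b)  # shape (q,)
    dW = np.outer(x, err_hid)                # dE/dw_jh = error_h * x_j
    dgamma = err_hid                         # dE/dgamma_h = error_h * 1
    return dW, dgamma
```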
Back-Propagation Algorithm

- Algorithm flowchart

Input: training set $D = \{(x^{(l)}, y^{(l)})\}_{l=1}^{n}$, learning rate $\eta$
Steps:
1: initialize all parameters within (0, 1)
2: repeat:
3:   for all $(x^{(l)}, y^{(l)}) \in D$ do:
4:     calculate $\hat{y}^{(l)}$
5:     calculate $\mathrm{error}^{\text{OutputLayer}}$
6:     calculate $\mathrm{error}^{\text{HiddenLayer}}$
7:     update $v$, $\theta$, $w$ and $\gamma$
8:   end for
9: until the stopping condition is reached
Output: trained ANN

- Weight updating

$$v_{hk} := v_{hk} - \eta \cdot \frac{\partial E^{(l)}}{\partial v_{hk}}, \qquad \theta_k := \theta_k - \eta \cdot \frac{\partial E^{(l)}}{\partial \theta_k}$$

$$w_{jh} := w_{jh} - \eta \cdot \frac{\partial E^{(l)}}{\partial w_{jh}}, \qquad \gamma_h := \gamma_h - \eta \cdot \frac{\partial E^{(l)}}{\partial \gamma_h}$$

where $\eta$ is the learning rate.
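Putting the pieces together, a per-example (stochastic) training loop might look as follows. This is a minimal sketch reusing the functions defined above; the fixed epoch count stands in for the unspecified stopping condition and is an assumption of this sketch:

```python
def train(X, Y, q, eta=0.5, epochs=1000, seed=0):
    """Train a 3-layer network with back-propagation, one sample at a time.

    X: inputs, shape (n, d); Y: targets, shape (n, m); q: hidden units.
    """
    rng = np.random.default_rng(seed)
    d, m = X.shape[1], Y.shape[1]
    # Step 1: initialize all parameters within (0, 1)
    W, gamma = rng.random((d, q)), rng.random(q)
    V, theta = rng.random((q, m)), rng.random(m)
    for _ in range(epochs):  # assumption: fixed epochs stand in for "repeat ... until"
        for x, y in zip(X, Y):
            b, y_hat = forward(x, W, gamma, V, theta)                           # step 4
            err_out, dV, dtheta = output_layer_grads(x, y, W, gamma, V, theta)  # step 5
            dW, dgamma = hidden_layer_grads(x, b, err_out, V)                   # step 6
            V, theta = V - eta * dV, theta - eta * dtheta                       # step 7
            W, gamma = W - eta * dW, gamma - eta * dgamma
    return W, gamma, V, theta
```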
Practice #1: 3-layer Forward NN with BP

- Given the following training data (see the link below):
- Implement a 3-layer forward neural network with back-propagation and report the 5-fold cross-validation performance (code it yourself; don't use TensorFlow). A cross-validation sketch follows below.
- Compare it with logistic regression and softmax regression.

http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=DeepLearning&doc=exercises/ex4/ex4.html
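As a possible starting point for the cross-validation part of the exercise, here is a sketch of a 5-fold split reusing the train and forward functions above; it assumes one-hot encoded labels, which is an assumption of this sketch:

```python
def five_fold_accuracy(X, Y, q, eta=0.5, epochs=1000):
    """Average accuracy over 5 folds (Y assumed one-hot encoded)."""
    n = X.shape[0]
    idx = np.random.default_rng(0).permutation(n)  # shuffle before splitting
    folds = np.array_split(idx, 5)
    accs = []
    for i in range(5):
        test = folds[i]
        tr = np.concatenate([folds[j] for j in range(5) if j != i])
        W, gamma, V, theta = train(X[tr], Y[tr], q, eta, epochs)
        preds = np.array([forward(x, W, gamma, V, theta)[1] for x in X[test]])
        accs.append(np.mean(preds.argmax(1) == Y[test].argmax(1)))
    return float(np.mean(accs))
```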
Practice #2: Multi-layer Forward NN with BP

- Given the following training data (same dataset as above):
- Implement a multi-layer forward neural network with back-propagation and report the 5-fold cross-validation performance (code it yourself);
- Do it again, this time using TensorFlow (a starting-point sketch follows below);
- Tune the model using different numbers of hidden layers and hidden nodes, different activation functions, different cost functions, and different learning rates.

http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=DeepLearning&doc=exercises/ex4/ex4.html
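One possible TensorFlow starting point, sketched with the tf.keras API; the layer sizes, activation, loss, and learning-rate defaults here are placeholders, and they are exactly the knobs the exercise asks you to tune:

```python
import tensorflow as tf

def build_model(d, m, hidden=(10,), activation="sigmoid", lr=0.5):
    """A configurable feed-forward network for the tuning exercise."""
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.InputLayer(input_shape=(d,)))
    for units in hidden:  # vary depth and width here
        model.add(tf.keras.layers.Dense(units, activation=activation))
    model.add(tf.keras.layers.Dense(m, activation="softmax"))
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=lr),
                  loss="mean_squared_error",  # or "categorical_crossentropy"
                  metrics=["accuracy"])
    return model

# Usage (illustrative): X_train has d features, Y_train is one-hot with m classes.
# model = build_model(d=2, m=2, hidden=(16, 16), activation="relu", lr=0.1)
# model.fit(X_train, Y_train, epochs=100, verbose=0)
```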
Questions?