SLIDE 1

Lecture 4 Artificial Neural Networks

Rui Xia Text Mining Group Nanjing University of Science & Technology rxia@njust.edu.cn

SLIDE 2

Brief History

  • Rosenblatt (1958) created the perceptron, an algorithm for pattern recognition.
  • Neural network research stagnated after the work of Minsky and Papert (1969), who identified two key issues with the computational machines that processed neural networks:
    – Basic perceptrons were incapable of processing the exclusive-or circuit.
    – Computers didn't have enough processing power to effectively handle the work required by large neural networks.

  • A key trigger for the renewed interest in neural networks and learning was Paul Werbos's (1975) back-propagation algorithm.
  • Both shallow and deep ANN architectures (e.g., recurrent nets) have been explored for many years.

Machine Learning, NJUST, 2018 2

SLIDE 3

Brief History

  • In 2006, Hinton and Salakhutdinov showed how a many-layered feedforward neural network could be effectively pre-trained one layer at a time.
  • Advances in hardware enabled the renewed interest after 2009.
  • Industrial applications of deep learning to large-scale speech recognition started around 2010.
  • Significant additional impacts in image and object recognition were felt from 2011–2012.
  • Deep learning approaches have achieved very high performance across many different natural language processing tasks since 2013.
  • To date, deep learning architectures such as CNNs, RNNs, LSTMs, and GANs have been applied to many fields, where they have produced results comparable to, and in some cases superior to, human experts.


SLIDE 4


Inspired by Neural Networks

SLIDE 5

Multi-layer Neural Networks


SLIDE 6


3-layer Forward Neural Networks

  • ANN Structure
  • Hypothesis

$$\hat{y}_k = \sigma(\beta_k + \theta_k), \qquad \beta_k = \sum_{h=1}^{q} w_{hk} b_h$$

$$b_h = \sigma(\alpha_h + \gamma_h), \qquad \alpha_h = \sum_{j=1}^{d} v_{jh} x_j$$

where $x_j$ ($j = 1, \ldots, d$) are the inputs, $b_h$ ($h = 1, \ldots, q$) are the hidden-layer outputs, $\hat{y}_k$ ($k = 1, \ldots, m$) are the network outputs, and $\sigma(\cdot)$ is the sigmoid activation function.
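As a concrete illustration, the hypothesis can be sketched in NumPy (a minimal sketch; the layer sizes d, q, m and the random initialization are illustrative assumptions, not values from the slides):

```python
import numpy as np

def sigmoid(x):
    # Logistic activation used throughout the slides.
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, v, gamma, w, theta):
    """One forward pass of the 3-layer network.
    x: (d,) input; v: (d, q); gamma: (q,); w: (q, m); theta: (m,)."""
    alpha = x @ v                   # alpha_h = sum_j v_jh * x_j
    b = sigmoid(alpha + gamma)      # hidden-layer outputs b_h
    beta = b @ w                    # beta_k = sum_h w_hk * b_h
    y_hat = sigmoid(beta + theta)   # network outputs y_hat_k
    return b, y_hat

# Illustrative dimensions and random parameters.
d, q, m = 4, 5, 3
rng = np.random.default_rng(0)
x = rng.random(d)
v, gamma = rng.random((d, q)), rng.random(q)
w, theta = rng.random((q, m)), rng.random(m)
b, y_hat = forward(x, v, gamma, w, theta)
```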

SLIDE 7

Learning algorithm

  • Training set: $D = \{(\boldsymbol{x}^{(1)}, \boldsymbol{y}^{(1)}), (\boldsymbol{x}^{(2)}, \boldsymbol{y}^{(2)}), \ldots, (\boldsymbol{x}^{(n)}, \boldsymbol{y}^{(n)})\}$, where $\boldsymbol{x}^{(l)} \in \mathbb{R}^{d}$, $\boldsymbol{y}^{(l)} \in \mathbb{R}^{m}$
  • Cost function: $E^{(l)} = \frac{1}{2} \sum_{k=1}^{m} \left( \hat{y}_k^{(l)} - y_k^{(l)} \right)^2$
  • Parameters: $v \in \mathbb{R}^{d \times q}$, $\gamma \in \mathbb{R}^{q}$, $w \in \mathbb{R}^{q \times m}$, $\theta \in \mathbb{R}^{m}$
  • Gradients to calculate: $\dfrac{\partial E^{(l)}}{\partial v_{jh}}$, $\dfrac{\partial E^{(l)}}{\partial \gamma_h}$, $\dfrac{\partial E^{(l)}}{\partial w_{hk}}$, $\dfrac{\partial E^{(l)}}{\partial \theta_k}$
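The squared-error cost for a single example can be computed directly (a short sketch; the sample vectors are made up for illustration):

```python
import numpy as np

def squared_error(y_hat, y):
    # E^(l) = 1/2 * sum_k (y_hat_k - y_k)^2
    return 0.5 * np.sum((y_hat - y) ** 2)

y_hat = np.array([0.8, 0.2, 0.6])
y = np.array([1.0, 0.0, 1.0])
E = squared_error(y_hat, y)  # 0.5 * (0.04 + 0.04 + 0.16) = 0.12
```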

SLIDE 8

Gradient Calculation

  • Firstly, the gradient with respect to $w_{hk}$:

$$\frac{\partial E^{(l)}}{\partial w_{hk}} = \frac{\partial E^{(l)}}{\partial \hat{y}_k^{(l)}} \cdot \frac{\partial \hat{y}_k^{(l)}}{\partial (\beta_k + \theta_k)} \cdot \frac{\partial (\beta_k + \theta_k)}{\partial w_{hk}}$$

where

$$\frac{\partial E^{(l)}}{\partial \hat{y}_k^{(l)}} = \hat{y}_k^{(l)} - y_k^{(l)}$$

$$\frac{\partial \hat{y}_k^{(l)}}{\partial (\beta_k + \theta_k)} = \sigma'(\beta_k + \theta_k) = \sigma(\beta_k + \theta_k) \cdot \left(1 - \sigma(\beta_k + \theta_k)\right) = \hat{y}_k^{(l)} \cdot \left(1 - \hat{y}_k^{(l)}\right)$$

$$\frac{\partial (\beta_k + \theta_k)}{\partial w_{hk}} = b_h$$
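The sigmoid-derivative identity used in this derivation, $\sigma'(x) = \sigma(x)(1 - \sigma(x))$, can be checked numerically (a quick sketch, not part of the original slides; the grid of test points is arbitrary):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Compare the closed-form derivative against a central finite difference.
x = np.linspace(-4.0, 4.0, 9)
analytic = sigmoid(x) * (1.0 - sigmoid(x))
eps = 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2.0 * eps)
max_err = np.max(np.abs(analytic - numeric))
```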

SLIDE 9


Gradient Calculation

  • Secondly, the gradient with respect to $\theta_k$:

$$\frac{\partial E^{(l)}}{\partial \theta_k} = \frac{\partial E^{(l)}}{\partial \hat{y}_k^{(l)}} \cdot \frac{\partial \hat{y}_k^{(l)}}{\partial (\beta_k + \theta_k)} \cdot \frac{\partial (\beta_k + \theta_k)}{\partial \theta_k} = \mathit{error}_k^{\mathit{OutputLayer}} \cdot 1$$

Define:

$$\mathit{error}_k^{\mathit{OutputLayer}} = \frac{\partial E^{(l)}}{\partial (\beta_k + \theta_k)} = \frac{\partial E^{(l)}}{\partial \hat{y}_k^{(l)}} \cdot \frac{\partial \hat{y}_k^{(l)}}{\partial (\beta_k + \theta_k)} = \left(\hat{y}_k^{(l)} - y_k^{(l)}\right) \cdot \hat{y}_k^{(l)} \cdot \left(1 - \hat{y}_k^{(l)}\right)$$

Then:

$$\frac{\partial E^{(l)}}{\partial w_{hk}} = \mathit{error}_k^{\mathit{OutputLayer}} \cdot b_h$$
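The output-layer error term and the two gradients it yields can be sketched as follows (a minimal sketch; the sample vectors are illustrative, continuing the notation above):

```python
import numpy as np

# Illustrative values: hidden outputs b (q=2), predictions y_hat and targets y (m=3).
b = np.array([0.5, 0.9])
y_hat = np.array([0.8, 0.2, 0.6])
y = np.array([1.0, 0.0, 1.0])

# error_k^OutputLayer = (y_hat_k - y_k) * y_hat_k * (1 - y_hat_k)
error_out = (y_hat - y) * y_hat * (1.0 - y_hat)

grad_theta = error_out               # dE/dtheta_k = error_out_k * 1
grad_w = np.outer(b, error_out)      # dE/dw_hk = error_out_k * b_h, shape (q, m)
```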

SLIDE 10


Gradient Calculation

  • Thirdly, the gradient with respect to $v_{jh}$:

$$\frac{\partial E^{(l)}}{\partial v_{jh}} = \sum_{k=1}^{m} \frac{\partial E^{(l)}}{\partial (\beta_k + \theta_k)} \cdot \frac{\partial (\beta_k + \theta_k)}{\partial b_h} \cdot \frac{\partial b_h}{\partial (\alpha_h + \gamma_h)} \cdot \frac{\partial (\alpha_h + \gamma_h)}{\partial v_{jh}}$$

where

$$\frac{\partial E^{(l)}}{\partial (\beta_k + \theta_k)} = \mathit{error}_k^{\mathit{OutputLayer}}$$

$$\frac{\partial (\beta_k + \theta_k)}{\partial b_h} = w_{hk}$$

$$\frac{\partial b_h}{\partial (\alpha_h + \gamma_h)} = \sigma'(\alpha_h + \gamma_h) = \sigma(\alpha_h + \gamma_h) \cdot \left(1 - \sigma(\alpha_h + \gamma_h)\right) = b_h \cdot (1 - b_h)$$

$$\frac{\partial (\alpha_h + \gamma_h)}{\partial v_{jh}} = x_j^{(l)}$$
SLIDE 11


Gradient Calculation

$$\begin{aligned}
\frac{\partial E^{(l)}}{\partial (\alpha_h + \gamma_h)}
&= \sum_{k=1}^{m} \frac{\partial E^{(l)}}{\partial (\beta_k + \theta_k)} \cdot \frac{\partial (\beta_k + \theta_k)}{\partial b_h} \cdot \frac{\partial b_h}{\partial (\alpha_h + \gamma_h)} \\
&= \sum_{k=1}^{m} \mathit{error}_k^{\mathit{OutputLayer}} \cdot w_{hk} \cdot \sigma'(\alpha_h + \gamma_h) \\
&= \sum_{k=1}^{m} \mathit{error}_k^{\mathit{OutputLayer}} \cdot w_{hk} \cdot b_h \cdot (1 - b_h)
\end{aligned}$$

Define:

$$\mathit{error}_h^{\mathit{HiddenLayer}} = \frac{\partial E^{(l)}}{\partial (\alpha_h + \gamma_h)}$$

Then:

$$\frac{\partial E^{(l)}}{\partial v_{jh}} = \mathit{error}_h^{\mathit{HiddenLayer}} \cdot x_j^{(l)}$$
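Backing up one layer, the hidden-layer error and the gradients it yields can be sketched as follows (a minimal sketch; the vectors and weights are illustrative, continuing the notation above):

```python
import numpy as np

# Illustrative values: input x (d=2), hidden outputs b (q=2),
# hidden-to-output weights w (q=2, m=3), output-layer error (m=3).
x = np.array([0.3, 0.7])
b = np.array([0.5, 0.9])
w = np.array([[0.1, 0.4, 0.2],
              [0.3, 0.5, 0.6]])
error_out = np.array([-0.032, 0.032, -0.096])

# error_h^HiddenLayer = (sum_k error_out_k * w_hk) * b_h * (1 - b_h)
error_hidden = (w @ error_out) * b * (1.0 - b)

grad_gamma = error_hidden            # dE/dgamma_h = error_hidden_h * 1
grad_v = np.outer(x, error_hidden)   # dE/dv_jh = error_hidden_h * x_j, shape (d, q)
```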

SLIDE 12


Gradient Calculation

  • Finally, the gradient with respect to $\gamma_h$:

$$\frac{\partial E^{(l)}}{\partial \gamma_h} = \sum_{k=1}^{m} \frac{\partial E^{(l)}}{\partial (\beta_k + \theta_k)} \cdot \frac{\partial (\beta_k + \theta_k)}{\partial b_h} \cdot \frac{\partial b_h}{\partial (\alpha_h + \gamma_h)} \cdot \frac{\partial (\alpha_h + \gamma_h)}{\partial \gamma_h} = \mathit{error}_h^{\mathit{HiddenLayer}} \cdot 1$$
SLIDE 13


Back-Propagation algorithm

Algorithm flowchart:

Input: training set $\mathcal{D} = \{(\boldsymbol{x}^{(l)}, \boldsymbol{y}^{(l)})\}_{l=1}^{n}$, learning rate $\eta$
Steps:
1: initialize all parameters within (0, 1)
2: repeat:
3:   for all $(\boldsymbol{x}^{(l)}, \boldsymbol{y}^{(l)}) \in \mathcal{D}$ do:
4:     calculate $\hat{\boldsymbol{y}}^{(l)}$
5:     calculate $\mathit{error}_k^{\mathit{OutputLayer}}$
6:     calculate $\mathit{error}_h^{\mathit{HiddenLayer}}$
7:     update $w$, $\theta$, $v$, and $\gamma$
8:   end for
9: until the stop condition is reached
Output: trained ANN

Weight updating:

$$w_{hk} := w_{hk} - \eta \cdot \frac{\partial E^{(l)}}{\partial w_{hk}}, \qquad \theta_k := \theta_k - \eta \cdot \frac{\partial E^{(l)}}{\partial \theta_k}$$

$$v_{jh} := v_{jh} - \eta \cdot \frac{\partial E^{(l)}}{\partial v_{jh}}, \qquad \gamma_h := \gamma_h - \eta \cdot \frac{\partial E^{(l)}}{\partial \gamma_h}$$

where $\eta$ is the learning rate.
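Putting the pieces together, the whole algorithm can be sketched end-to-end (a minimal sketch on a made-up XOR-style dataset; the layer sizes, epoch budget, and learning rate are illustrative assumptions, not values from the slides):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(42)
d, q, m = 2, 4, 1
# Initialize all parameters within (0, 1), as in the flowchart.
v, gamma = rng.random((d, q)), rng.random(q)
w, theta = rng.random((q, m)), rng.random(m)

# XOR-style toy data (illustrative).
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([[0.], [1.], [1.], [0.]])

eta = 0.5
for epoch in range(5000):                    # stop condition: fixed epoch budget
    for x, y in zip(X, Y):
        b = sigmoid(x @ v + gamma)           # hidden-layer outputs
        y_hat = sigmoid(b @ w + theta)       # network outputs
        error_out = (y_hat - y) * y_hat * (1 - y_hat)
        error_hidden = (w @ error_out) * b * (1 - b)
        # Gradient-descent weight updates
        w -= eta * np.outer(b, error_out)
        theta -= eta * error_out
        v -= eta * np.outer(x, error_hidden)
        gamma -= eta * error_hidden

preds = sigmoid(sigmoid(X @ v + gamma) @ w + theta)
```

With enough epochs and a suitable initialization, a network of this size typically learns the XOR mapping, though convergence depends on the random seed and learning rate.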

SLIDE 14

Practice: 3-layer Forward NN with BP

  • Given the following training data:
  • Implement a 3-layer Forward Neural Network with Back-Propagation and report the 5-fold cross-validation performance (code it yourself; don't use Tensorflow);
  • Compare it with logistic regression and softmax regression.

http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=DeepLearning&doc=exercises/ex4/ex4.html


SLIDE 15

Practice #2: 3-layer Forward NN with BP

  • Given the following training data:
  • Implement a multi-layer Forward Neural Network with Back-Propagation and report the 5-fold cross-validation performance (code it yourself);
  • Do it again, this time using Tensorflow;
  • Tune the model using different numbers of hidden layers and hidden nodes, different activation functions, different cost functions, and different learning rates.

http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=DeepLearning&doc=exercises/ex4/ex4.html


SLIDE 16

Questions?
