

SLIDE 1

Presentation about Deep Learning

  • -- Zhongwu Xie
SLIDE 2

Contents

1. Brief introduction to deep learning.
2. Brief introduction to backpropagation.
3. Brief introduction to convolutional neural networks.

SLIDE 3

Deep learning

SLIDE 4

I . Introduction to Deep Learning

Deep learning is a particular kind of machine learning that achieves great power and flexibility by learning to represent the world as a nested hierarchy of concepts, with each concept defined in relation to simpler concepts, and more abstract representations computed in terms of less abstract ones. -- Ian Goodfellow

SLIDE 5

I . Introduction to Deep Learning

The plot on the left is a Venn diagram showing that deep learning is a kind of representation learning, which is in turn a kind of machine learning. The plot on the right shows that a deep learning model is built from multiple layers.

SLIDE 6
I . What is Deep Learning

  • Data: $(x_j, y_j)$, $1 \le j \le n$
  • Model: ANN
  • Criterion:
  • Cost function: $L(y, f(x))$
  • Empirical risk minimization: $S(\theta) = \frac{1}{n}\sum_{j=1}^{n} L\big(y_j, f(x_j, \theta)\big)$
  • Regularization: $\|w\|_1$, $\|w\|_2$, Early Stopping, Dropout
  • Objective function: $\min_\theta\ S(\theta) + \lambda \cdot (\text{regularization function})$
  • Algorithm: backpropagation (BP) + gradient descent

Learning is cast as optimization.
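To make "learning is cast as optimization" concrete, here is a minimal NumPy sketch (illustrative only, not from the slides; the data and names are made up) that minimizes the regularized empirical risk of a single sigmoid neuron by gradient descent:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def empirical_risk(w, b, X, y):
    """S(theta): mean cross-entropy L(y_j, f(x_j, theta)) over the n pairs."""
    a = sigmoid(X @ w + b)
    return -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                   # data x_j, 1 <= j <= n
y = (X[:, 0] + X[:, 1] > 0).astype(float)       # labels y_j
w, b = np.zeros(3), 0.0                         # parameters theta
alpha, lam = 0.5, 1e-3                          # step size, regularization weight

for _ in range(200):
    a = sigmoid(X @ w + b)
    dw = X.T @ (a - y) / len(y) + 2 * lam * w   # gradient of risk + L2 penalty
    db = np.mean(a - y)
    w, b = w - alpha * dw, b - alpha * db       # gradient-descent update

print(empirical_risk(w, b, X, y))               # the risk decreases as we optimize
```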

SLIDE 7

II . Why should we learn Deep Learning?

  • -- Efficiency
  • Speech Recognition
  • -- The phoneme error rate on TIMIT: HMM-GMM systems in the 1990s: about 26%; Restricted Boltzmann machines (RBMs) in 2009: 20.7%; LSTM-RNNs in 2013: 17.7%.
  • Computer Vision
  • -- The top-5 error of the ILSVRC 2017 classification task was 2.251%, while human error is about 5.1%.
  • Natural Language Processing
  • -- Language models (n-gram), machine translation.
  • Recommender Systems
  • -- Recommending ads, social-network news feeds, movies, jokes, or advice from experts, etc.

Famous instances: self-driving cars, AlphaGo.

SLIDE 8

Backward propagation

SLIDE 9

I . Introduction to Notation

(Figure: a small network with layer 0 (the inputs $x_1, x_2, x_3$), layer 1, and layer 2.)

$w^{[m]}_{kl}$ is the weight from the $k$th neuron in the $(m-1)$th layer to the $l$th neuron in the $m$th layer, e.g. $w^{[2]}_{43}$.

For a single neuron: $z = w^{T}x + b$, $a = g(z)$, and the prediction is $\hat{y} = a$.

SLIDE 10

I . Introduction to Forward propagation and Notation

For the network above (inputs $x_1, x_2, x_3$, four hidden neurons, prediction $\hat{y} = a$), layer 1 computes

$z^{[1]}_1 = w^{[1]\,T}_{1} x + b^{[1]}_1, \qquad a^{[1]}_1 = \sigma(z^{[1]}_1)$
$z^{[1]}_2 = w^{[1]\,T}_{2} x + b^{[1]}_2, \qquad a^{[1]}_2 = \sigma(z^{[1]}_2)$
$z^{[1]}_3 = w^{[1]\,T}_{3} x + b^{[1]}_3, \qquad a^{[1]}_3 = \sigma(z^{[1]}_3)$
$z^{[1]}_4 = w^{[1]\,T}_{4} x + b^{[1]}_4, \qquad a^{[1]}_4 = \sigma(z^{[1]}_4)$

or, stacked into vectors,

$$z^{[1]} =
\begin{bmatrix}
w^{[1]}_{11} & w^{[1]}_{12} & w^{[1]}_{13} & w^{[1]}_{14}\\
w^{[1]}_{21} & w^{[1]}_{22} & w^{[1]}_{23} & w^{[1]}_{24}\\
w^{[1]}_{31} & w^{[1]}_{32} & w^{[1]}_{33} & w^{[1]}_{34}
\end{bmatrix}^{T}
\begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix}
+
\begin{bmatrix} b^{[1]}_1\\ b^{[1]}_2\\ b^{[1]}_3\\ b^{[1]}_4 \end{bmatrix}
=
\begin{bmatrix}
\sum_{l=1}^{3} w^{[1]}_{l1} x_l + b^{[1]}_1\\
\sum_{l=1}^{3} w^{[1]}_{l2} x_l + b^{[1]}_2\\
\sum_{l=1}^{3} w^{[1]}_{l3} x_l + b^{[1]}_3\\
\sum_{l=1}^{3} w^{[1]}_{l4} x_l + b^{[1]}_4
\end{bmatrix}
=
\begin{bmatrix} z^{[1]}_1\\ z^{[1]}_2\\ z^{[1]}_3\\ z^{[1]}_4 \end{bmatrix}
= W^{[1]\,T} x + b^{[1]},$$

$$a^{[1]} = \begin{bmatrix} a^{[1]}_1\\ a^{[1]}_2\\ a^{[1]}_3\\ a^{[1]}_4 \end{bmatrix} = \sigma\!\left(z^{[1]}\right), \qquad \text{where } \sigma \text{ is the sigmoid function.}$$

Cost function: $L(a, y)$. The gradients we will need are
$$dW^{[1]} = \frac{\partial L(a, y)}{\partial W^{[1]}}, \qquad db^{[1]} = \frac{\partial L(a, y)}{\partial b^{[1]}}.$$

SLIDE 11

II . Backward propagation.

𝑦 π‘₯ 𝑐

  • --the chain rule

𝑨 = π‘₯π‘ˆπ‘¦ + 𝑐 𝑏 = 𝜏 (z) 𝑀 𝑏, 𝑧 If 𝑦 = 𝑔 π‘₯ , 𝑧 = 𝑔 𝑦 , 𝑨 = 𝑔(𝑧) So,

πœ–π‘¨ πœ–π‘₯ = πœ–π‘¨ πœ–π‘§ πœ–π‘§ πœ–π‘¦ πœ–π‘¦ πœ–π‘₯

  • --the functions of neural network are same as the above function , so we

can use the chain rule to the gradient of the neural network.
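As a quick illustration (not from the slides), the chain rule can be checked numerically for a chain of three sigmoids by comparing it with a finite-difference estimate:

```python
import numpy as np

def f(t):                      # sigmoid, used as the function in the chain
    return 1.0 / (1.0 + np.exp(-t))

def df(t):                     # its derivative, f'(t) = f(t)(1 - f(t))
    return f(t) * (1.0 - f(t))

w = 0.7
x = f(w)                       # x = f(w)
y = f(x)                       # y = f(x)
analytic = df(y) * df(x) * df(w)   # dz/dw = dz/dy * dy/dx * dx/dw, with z = f(y)

eps = 1e-6
numeric = (f(f(f(w + eps))) - f(f(f(w - eps)))) / (2 * eps)
print(analytic, numeric)       # the two values agree closely
```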

SLIDE 12

II . Backward propagation.

𝑦 π‘₯[1] 𝑐[1]

  • --the chain rule

𝑨[1] = π‘₯[1]𝑦 + 𝑐[1] 𝑏[1] = 𝜏(𝑨[1]) 𝑀 𝑏[2], 𝑧 𝑒𝑏[2] =

πœ–π‘€(𝑏,𝑧) πœ–π‘[2] =βˆ’ 𝑧 𝑏 + 1βˆ’π‘§ 1βˆ’π‘

𝑒𝑨[2] =

πœ–π‘€(𝑏,𝑧) πœ–π‘¨[2] = πœ–π‘€(𝑏,𝑧) πœ–π‘[2] Γ— πœ–π‘[2] πœ–π‘¨[2] = 𝑏[2] βˆ’ 𝑧

𝑒π‘₯[2] =

πœ–π‘€(𝑏,𝑧) πœ–π‘₯[2] = πœ–π‘€(𝑏,𝑧) 𝑏[2]

Γ—

πœ–π‘[2] πœ–π‘¨[2] Γ— πœ–π‘¨[2] πœ–π‘₯[2] = 𝑒𝑨[2]𝑏 1 π‘ˆ

𝑒𝑐[2] = πœ–π‘€(𝑏, 𝑧) πœ–π‘[2] = πœ–π‘€(𝑏, 𝑧) 𝑏[2] Γ— πœ–π‘[2] πœ–π‘¨[2] Γ— πœ–π‘¨[2] πœ–π‘[2] = 𝑒𝑨[2] 𝑒𝑨[1] = πœ–π‘€(𝑏, 𝑧) πœ–π‘¨[1] = πœ–π‘€(𝑏, 𝑧) 𝑏[2] Γ— πœ–π‘[2] πœ–π‘¨[2] Γ— πœ–π‘¨[2] πœ–π‘[1] Γ— πœ–π‘[1] πœ–π‘¨[1] = π‘₯ 2 π‘ˆπ‘’π‘¨[2]* πœβ€²(𝑨[1]) 𝑨[2] = π‘₯[2]𝑏[1] + 𝑐[2] 𝑏[2] = 𝜏(𝑨[2])

π‘₯[2] 𝑐[2]

𝑒π‘₯[1] =

πœ–π‘€(𝑏,𝑧) πœ–π‘₯[1] = πœ–π‘€(𝑏,𝑧) πœ–π‘[2] Γ— πœ–π‘[2] πœ–π‘¨[2] Γ— πœ–π‘¨[2] πœ–π‘[1] Γ— πœ–π‘[1] πœ–π‘¨[1] Γ— πœ–π‘¨[1] πœ–π‘₯[1]=𝑒𝑨[1]π‘¦π‘ˆ

𝑒𝑐[1] = πœ–π‘€(𝑏, 𝑧) πœ–π‘[1] = πœ–π‘€(𝑏, 𝑧) 𝑏[2] Γ— πœ–π‘[2] πœ–π‘¨[2] Γ— πœ–π‘¨[2] πœ–π‘[1] Γ— πœ–π‘[1] πœ–π‘¨[1] Γ— πœ–π‘¨[1] πœ–π‘[1] =𝑒𝑨[1] 𝑀 𝑏, 𝑧 = βˆ’[π‘§π‘šπ‘π‘•π‘ + 1 βˆ’ 𝑧 log 1 βˆ’ 𝑏 ]
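The same equations as a NumPy sketch for one training pair (x, y), following this slide's layout $z^{[l]} = W^{[l]}a^{[l-1]} + b^{[l]}$; the values are random and purely illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x, y = rng.normal(size=(3, 1)), np.array([[1.0]])
W1, b1 = rng.normal(size=(4, 3)), np.zeros((4, 1))
W2, b2 = rng.normal(size=(1, 4)), np.zeros((1, 1))

# forward pass
z1 = W1 @ x + b1;  a1 = sigmoid(z1)
z2 = W2 @ a1 + b2; a2 = sigmoid(z2)

# backward pass (the equations above)
dz2 = a2 - y                        # dz[2] = a[2] - y
dW2 = dz2 @ a1.T                    # dW[2] = dz[2] a[1]^T
db2 = dz2                           # db[2] = dz[2]
dz1 = (W2.T @ dz2) * a1 * (1 - a1)  # dz[1] = W[2]^T dz[2] * sigma'(z[1])
dW1 = dz1 @ x.T                     # dW[1] = dz[1] x^T
db1 = dz1                           # db[1] = dz[1]
print(dW1.shape, dW2.shape)         # (4, 3) (1, 4), matching W1 and W2
```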

SLIDE 13

II . Summary : The Backpropagation

(Figure: perturbing one weight $w^{[m]}_{kl}$ changes the activation $a^{[m]}_{l}$, then the activations along every path to the output layer, and finally the cost by $\Delta C$.)

$$\frac{\partial C}{\partial w^{[m]}_{kl}} = \sum_{n, p, q, \ldots, r} \frac{\partial C}{\partial a^{[L]}_{n}}\, \frac{\partial a^{[L]}_{n}}{\partial a^{[L-1]}_{p}}\, \frac{\partial a^{[L-1]}_{p}}{\partial a^{[L-2]}_{q}} \cdots \frac{\partial a^{[m+1]}_{r}}{\partial a^{[m]}_{l}}\, \frac{\partial a^{[m]}_{l}}{\partial w^{[m]}_{kl}}$$

$$\Delta C \approx \sum_{n, p, q, \ldots, r} \frac{\partial C}{\partial a^{[L]}_{n}}\, \frac{\partial a^{[L]}_{n}}{\partial a^{[L-1]}_{p}}\, \frac{\partial a^{[L-1]}_{p}}{\partial a^{[L-2]}_{q}} \cdots \frac{\partial a^{[m+1]}_{r}}{\partial a^{[m]}_{l}}\, \frac{\partial a^{[m]}_{l}}{\partial w^{[m]}_{kl}}\, \Delta w^{[m]}_{kl}$$

where the sum runs over all paths of neurons $n, p, q, \ldots, r$ from the output layer $L$ back to layer $m + 1$.

The backpropagation algorithm is a clever way of keeping track of small perturbations to the weights (and biases) as they propagate through the network, reach the output, and then affect the cost.

  • --Michael Nielsen
SLIDE 14

II . Summary : The Backpropagation algorithm

1. Input $x$: set the activation $a^{[0]} = x$ for the input layer.
2. Feedforward: for each $l = 1, 2, \ldots, L$ compute $z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}$ and $a^{[l]} = \sigma(z^{[l]})$.
3. Output error $dz^{[L]}$: $dz^{[L]} = a^{[L]} - y$.
4. Backpropagate the error: for each $l = L-1, L-2, \ldots, 1$ compute $dz^{[l]} = (W^{[l+1]})^{T} dz^{[l+1]} * \sigma'(z^{[l]})$.
5. Output: the gradient of the cost function is given by $dW^{[l]} = \frac{\partial L(a, y)}{\partial W^{[l]}} = dz^{[l]} (a^{[l-1]})^{T}$ and $db^{[l]} = \frac{\partial L(a, y)}{\partial b^{[l]}} = dz^{[l]}$.

Update the weights and biases with learning rate $\alpha$:
$$W^{[l]} := W^{[l]} - \alpha\, dW^{[l]}, \qquad b^{[l]} := b^{[l]} - \alpha\, db^{[l]}$$

SLIDE 15

Convolutional Neural Networks

SLIDE 16

1 . Types of layers in a convolutional network.

  • -Convolution
  • -Pooling
  • -Fully connected
SLIDE 17

2.1 Convolution in Neural Network

A 6Γ—6 image whose left half is 10 and right half is 0, convolved (*) with a 3Γ—3 vertical-edge filter, gives a 4Γ—4 output whose two middle columns are 30 and whose outer columns are 0: the filter responds exactly where the vertical edge is.

    10 10 10  0  0  0                          0 30 30  0
    10 10 10  0  0  0        1  0 -1           0 30 30  0
    10 10 10  0  0  0   *    1  0 -1     =     0 30 30  0
    10 10 10  0  0  0        1  0 -1           0 30 30  0
    10 10 10  0  0  0
    10 10 10  0  0  0
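The same example computed explicitly. This is a sketch using a plain "valid" sliding-window sum (a cross-correlation, as is conventional in CNN layers), not code from the presentation:

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation: slide the kernel and sum the products."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.array([[10, 10, 10, 0, 0, 0]] * 6, dtype=float)   # 6x6, bright left half
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)                 # vertical-edge filter
print(conv2d(image, kernel))   # 4x4 output; the two middle columns are 30
```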

SLIDE 18

2.2 Multiple filters

A 6Γ—6Γ—3 input convolved with one 3Γ—3Γ—3 filter gives a 4Γ—4 output; with two such filters the two outputs are stacked into a 4Γ—4Γ—2 volume:

    6 Γ— 6 Γ— 3  *  3 Γ— 3 Γ— 3 (two filters)  =  4 Γ— 4 Γ— 2

Why convolutions?

  • --Parameter sharing (the rough count after this list illustrates the saving)
  • --Sparsity of connections
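A rough, illustrative count of the saving for the 6Γ—6Γ—3 to 4Γ—4Γ—2 example above: two 3Γ—3Γ—3 filters versus a fully connected layer between the same two volumes.

```python
# Two 3x3x3 filters, each with one bias, versus connecting every input value
# (6*6*3 of them) to every output value (4*4*2 of them).
conv_params = 2 * (3 * 3 * 3 + 1)       # 56 parameters, shared across positions
fc_params = (6 * 6 * 3) * (4 * 4 * 2)   # 3456 weights, before any biases
print(conv_params, fc_params)
```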
SLIDE 19

3 . Pooling layers

  • Max pooling

Example: a 4Γ—4 input, max-pooled with a 2Γ—2 filter and stride 2:

    1 3 2 1
    2 9 1 1         9 2
    1 3 2 3   β†’     6 3
    5 6 1 2

Hyperparameters: f (filter size), s (stride), max or average pooling.

  • Removes redundant information from the convolutional layer.
  • --With less spatial information you gain computational performance.
  • --Less spatial information also means fewer parameters, so less chance to overfit.

  • --You get some translation invariance
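A small NumPy sketch of this 2Γ—2, stride-2 max pooling on the 4Γ—4 example above (illustrative, not from the slides):

```python
import numpy as np

def max_pool(x, f=2, s=2):
    """Max pooling with filter size f and stride s (the layer's hyperparameters)."""
    H, W = x.shape
    out = np.zeros((H // s, W // s))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = x[i * s:i * s + f, j * s:j * s + f].max()
    return out

x = np.array([[1, 3, 2, 1],
              [2, 9, 1, 1],
              [1, 3, 2, 3],
              [5, 6, 1, 2]], dtype=float)
print(max_pool(x))   # [[9. 2.] [6. 3.]]
```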
SLIDE 20

3 . Fully connected layer

The convolutional layers extract features from the image; the fully connected layer then generalizes from these features to the output space.

[LeCun et al., 1998. Gradient-based learning applied to document recognition.]

SLIDE 21

4 . Classic networks---AlexNet

227Γ—227Γ—3 β†’ CONV 11Γ—11, s = 4 β†’ 55Γ—55Γ—96
β†’ MAX-POOL 3Γ—3, s = 2 β†’ 27Γ—27Γ—96
β†’ CONV 5Γ—5, same β†’ 27Γ—27Γ—256
β†’ MAX-POOL 3Γ—3, s = 2 β†’ 13Γ—13Γ—256
β†’ CONV 3Γ—3, same β†’ 13Γ—13Γ—384
β†’ CONV 3Γ—3, same β†’ 13Γ—13Γ—384
β†’ CONV 3Γ—3, same β†’ 13Γ—13Γ—256
β†’ MAX-POOL 3Γ—3, s = 2 β†’ 6Γ—6Γ—256 = 9216
β†’ FC 4096 β†’ FC 4096 β†’ Softmax 1000

Parameters in the fully connected layers: 9216Γ—4096 + 4096Γ—4096 + 4096Γ—1000 β‰ˆ 58.6 million, the bulk of AlexNet's roughly 60 million parameters.
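For reference, the count above (weights only, ignoring the bias terms) can be checked in one line:

```python
fc = 9216 * 4096 + 4096 * 4096 + 4096 * 1000   # the three fully connected layers
print(fc)                                      # 58,621,952 weights
```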

SLIDE 22

Thank you