Neural Networks, Chapter 11 in ESL II (STK-IN4300 Statistical Learning Methods in Data Science)

SLIDE 1

Neural Networks, Chapter 11 in ESL II

STK-IN4300 Statistical Learning Methods in Data Science
Odd Kolbjørnsen, oddkol@math.uio.no

SLIDE 2

Learning today: Neural nets

  • Projection pursuit
    – What is it?
    – How to solve it: stagewise fitting
  • Neural nets
    – What is it?
    – Graphical display
    – Connection to projection pursuit
    – How to solve it: backpropagation
    – Stochastic gradient descent
    – Deep and wide
    – CNN
  • Example

SLIDE 3
Neural network

  • Used for prediction
  • Universal approximation
    – with enough data and the correct algorithm you will get it right eventually…
  • Used for both «regression type» and «classification type» problems
  • Many versions and forms; currently deep learning is a hot topic
  • Often portrayed as fully automatic, but tailoring might help
  • Performs highly advanced analysis
  • Can create utterly complex models which are hard to decipher and hard to use for knowledge transfer
  • The network provides good predictions, but are they right for the right reasons?


Constructed example from: Ribeiro et al. (2016), "Why Should I Trust You?": Explaining the Predictions of Any Classifier

SLIDE 4

In neural nets, training is based on minimizing a loss function over the training set.

General form:

$$M(Z, \hat{g}(Y)) = \sum_{j=1}^{O} M\big(z_j, \hat{g}(y_j)\big)$$

  • The target might be multi-dimensional: $z_j = (z_{j1}, \dots, z_{jL})^T$

  • Continuous response («regression type»): squared error

$$M(Z, \hat{g}(Y)) = \sum_{j=1}^{O} \sum_{l=1}^{L} \big(z_{jl} - \hat{g}_l(y_j)\big)^2$$

  • Discrete response (K classes), with indicator coding $z_{jl} = 1$ if $z_j = l$ and $z_{jl} = 0$ otherwise, and $\hat{g}_l(y_j) \approx \mathrm{Prob}(z_{jl} = 1)$:
    – Squared error (common): $M(Z, \hat{g}(Y)) = \sum_{j=1}^{O} \sum_{l=1}^{L} \big(z_{jl} - \hat{g}_l(y_j)\big)^2$
    – Cross-entropy (or deviance): $M(Z, \hat{g}(Y)) = -\sum_{j=1}^{O} \sum_{l=1}^{L} z_{jl} \log \hat{g}_l(y_j)$

Neural nets are defined by a specific form of the model $\hat{g}(Y)$.
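To make the two loss functions concrete, here is a minimal NumPy sketch (my own illustration, not from the slides; the array names `z` and `g_hat` are assumptions): it evaluates the squared error and the cross-entropy for O records with L outputs, using one-hot coded targets.

```python
import numpy as np

def squared_error(z, g_hat):
    """M(Z, g(Y)) = sum_j sum_l (z_jl - g_l(y_j))^2 for (O, L) arrays."""
    return np.sum((z - g_hat) ** 2)

def cross_entropy(z, g_hat):
    """M(Z, g(Y)) = -sum_j sum_l z_jl * log g_l(y_j), with z one-hot coded."""
    return -np.sum(z * np.log(g_hat))

# Tiny example: O = 2 records, L = 3 classes.
z = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])          # one-hot targets, z_jl = 1 iff z_j = l
g_hat = np.array([[0.7, 0.2, 0.1],
                  [0.3, 0.6, 0.1]])      # predicted class probabilities g_l(y_j)

print(squared_error(z, g_hat))           # summed over all records and outputs
print(cross_entropy(z, g_hat))           # -log(0.7) - log(0.6)
```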

SLIDE 5

Projection pursuit regression

$$g(Y) = \sum_{n=1}^{N} h_n(x_n^T Y)$$

  • $h_n$: unknown ridge functions ($\mathbb{R} \to \mathbb{R}$)
  • $x_n$: unknown weight vector (size p×1), a unit vector
  • $Y$: features = explanatory variables (size p×1)
  • Derived feature number n: $W_n = x_n^T Y$ (size 1×1)

Ridge functions are constant along directions orthogonal to the directional unit vector $x_n$.

Friedman and Tukey (1974); Friedman and Stuetzle (1981)

SLIDE 6

Fitting projection pursuit: N = 1

  • The N = 1 model (a single ridge term) is known as the single index model in econometrics:
    – $g(Y) = h(x^T Y) = h(W)$, with $W = x^T Y$
  • If $x$ is known, fitting $\hat{h}(w)$ is just a 1D smoothing problem
    – Smoothing spline, local linear (or polynomial) regression, kernel smoothing, k-nearest neighbours, …
  • If $h(\cdot)$ is known, fitting $\hat{x}$ is obtained by a quasi-Newton search
    – Linearize around the current estimate $x_{\text{old}}$:
      $$h(x^T y_j) \approx h(x_{\text{old}}^T y_j) + h'(x_{\text{old}}^T y_j)\,(x - x_{\text{old}})^T y_j$$
    – Minimize the objective function with this approximation inserted:
      $$\sum_{j=1}^{O} \big(z_j - h(x^T y_j)\big)^2 \approx \sum_{j=1}^{O} \Big(z_j - h(x_{\text{old}}^T y_j) - h'(x_{\text{old}}^T y_j)(x - x_{\text{old}})^T y_j\Big)^2 = \sum_{j=1}^{O} h'(x_{\text{old}}^T y_j)^2 \left[\frac{z_j - h(x_{\text{old}}^T y_j)}{h'(x_{\text{old}}^T y_j)} + x_{\text{old}}^T y_j - x^T y_j\right]^2$$
    – Solve for $x$ using weighted regression with weights $h'(x_{\text{old}}^T y_j)^2$
  • Iterate between the two steps until convergence (a small code sketch follows below)
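As an illustration of the alternating scheme for N = 1, the following sketch (not the lecture's code; the function name `fit_single_term` and the cubic-polynomial stand-in for the 1D smoother are my own choices) alternates a 1D fit of $\hat{h}$ with the weighted least-squares update of $x$ derived above.

```python
import numpy as np

def fit_single_term(Y, z, n_iter=10, degree=3):
    """One projection pursuit term g(y) = h(x^T y): alternate
    (1) a 1D smoother for h given x and (2) a weighted LS update of x given h."""
    O, p = Y.shape
    x = np.ones(p) / np.sqrt(p)                  # initial unit direction
    for _ in range(n_iter):
        w = Y @ x                                # derived feature W_j = x^T y_j
        coef = np.polyfit(w, z, degree)          # cubic fit as a stand-in smoother
        h, dh = np.poly1d(coef), np.poly1d(np.polyder(coef))
        # weighted least-squares step for x (see the derivation above)
        weights = dh(w) ** 2 + 1e-8
        target = w + (z - h(w)) / (dh(w) + 1e-8)
        WY = Y * weights[:, None]
        x = np.linalg.solve(Y.T @ WY, Y.T @ (weights * target))
        x = x / np.linalg.norm(x)                # keep x a unit vector
    return x, h

# Example on simulated data: z = sin(x_true^T y) + noise
rng = np.random.default_rng(0)
Y = rng.normal(size=(200, 3))
x_true = np.array([0.8, 0.6, 0.0])
z = np.sin(Y @ x_true) + 0.1 * rng.normal(size=200)
x_hat, h_hat = fit_single_term(Y, z)
print(x_hat)                                     # estimated direction (identified only up to sign)
```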

SLIDE 7

Fitting projection pursuit, N > 1

Model: $g(Y) = \sum_{n=1}^{N} h_n(x_n^T Y)$

  • Stage-wise (greedy) fitting:
    – Set $z_{j,1} = z_j$
    – For $n = 1, \dots, N$:
      • Assume there is just one function to match (as on the previous page)
      • Minimize the loss for the current residuals $z_{j,n}$ to obtain $h_n(\cdot)$ and $x_n$:
        $$[\hat{h}_n(\cdot), \hat{x}_n] = \operatorname*{argmin}_{h_n(\cdot),\, x_n} \sum_{j=1}^{O} \big(z_{j,n} - h_n(x_n^T y_j)\big)^2$$
      • Store $\hat{h}_n(\cdot)$ and $\hat{x}_n$
      • Subtract the estimate from the data: $z_{j,n+1} = z_{j,n} - \hat{h}_n(\hat{x}_n^T y_j)$
    – Final prediction:
      $$\hat{g}(Y) = \sum_{n=1}^{N} \hat{h}_n(\hat{x}_n^T Y)$$
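A sketch of the stage-wise outer loop, assuming the single-term fitter `fit_single_term` from the previous sketch is available; the helper names are illustrative, not lecture code.

```python
import numpy as np
# assumes fit_single_term(Y, z) from the N = 1 sketch above

def fit_ppr(Y, z, n_terms=2):
    """Greedy stage-wise projection pursuit: fit one ridge term at a time
    to the current residuals and subtract its contribution."""
    residual = z.copy()
    terms = []
    for _ in range(n_terms):
        x_hat, h_hat = fit_single_term(Y, residual)   # minimize loss for current residuals
        terms.append((x_hat, h_hat))                  # store h_n and x_n
        residual = residual - h_hat(Y @ x_hat)        # z_{j,n+1} = z_{j,n} - h_n(x_n^T y_j)
    return terms

def predict_ppr(terms, Y):
    """Final prediction: g(Y) = sum_n h_n(x_n^T Y)."""
    return sum(h(Y @ x) for x, h in terms)
```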

SLIDE 8

Implementation details

  1. Need a smoothing method with efficient evaluation of $h(w)$ and $h'(w)$
     – Local regression or smoothing splines
  2. $h_n(w)$ from previous steps can be readjusted using a backfitting procedure (Chapter 9), but it is unclear whether this improves the performance
     – Set $s_j = z_j - \hat{g}(y_j) + \hat{h}_n(\hat{x}_n^T y_j)$
     – Re-estimate $h_n(\cdot)$ from $s_j$ (and center the result)
     – Do this repeatedly for $n = 1, \dots, N, 1, \dots, N, \dots$
  3. It is not common to readjust $\hat{x}_n$, as this is computationally demanding
  4. Stopping criterion for the number of terms to include:
     – Stop when the model does not improve appreciably
     – Use cross validation to determine $N$

SLIDE 9

Example

  • Training data: 1000 samples
  • Two terms

SLIDE 10

Neural network

  • Simplified model of a nerve system

Perceptron (diagram: inputs → weights → net input → activation function → output):

  • Inputs $y_0, y_1, \dots, y_q$ with weights $\beta_0, \beta_1, \dots, \beta_q$, where $y_0 = 1$
  • Net input: $w = \sum_{j=0}^{q} \beta_j y_j$
  • Output: $\tau(w) = \tau\Big(\sum_{j=0}^{q} \beta_j y_j\Big)$

SLIDE 11

Activation functions

  • Initially: the binary step function was used
  • Next: sigmoid = logistic = soft step
  • Now: there is a «rag bag» of alternatives, some more suited than others for specific tasks
    – ArcTan
    – Rectified linear (ReLU)
    – Gaussian (NB: not monotone, gives different behavior)

Illustrations from: https://en.wikipedia.org/wiki/Activation_function
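A small illustration (my own, not from the slides) of a few of the activation functions listed above, and of the perceptron from the previous slide as a weighted sum passed through $\tau$:

```python
import numpy as np

# A few of the activation functions mentioned above
def step(w):     return np.where(w >= 0.0, 1.0, 0.0)   # binary step
def sigmoid(w):  return 1.0 / (1.0 + np.exp(-w))        # logistic / soft step
def relu(w):     return np.maximum(0.0, w)              # rectified linear
def gaussian(w): return np.exp(-w ** 2)                 # not monotone

def perceptron(y, beta, tau=sigmoid):
    """Perceptron output tau(sum_j beta_j * y_j), with y_0 = 1 for the bias."""
    y_aug = np.concatenate(([1.0], y))                  # prepend y_0 = 1
    w = beta @ y_aug                                    # net input
    return tau(w)

beta = np.array([-0.5, 1.0, 2.0])                       # beta_0 (bias), beta_1, beta_2
print(perceptron(np.array([0.3, 0.1]), beta))           # sigmoid of 0.0 -> 0.5
```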

SLIDE 12

Single layer feed-forward neural nets

$$g(Y) = \sum_{n=1}^{N} \gamma_n\, \tau(\beta_n^T Y + \beta_{0,n})$$

  • $\tau$: activation function ($\mathbb{R} \to \mathbb{R}$), e.g. the sigmoid
  • $\beta_n$: unknown weight vector (size p×1), not a unit vector
  • $\beta_{0,n}$: «bias» or «shift»
  • $Y$: features = explanatory variables (size p×1)
  • Derived feature number n: $a_n = \beta_n^T Y + \beta_{0,n}$ (size 1×1)

Connection to projection pursuit: write $x_n = \beta_n / t_n$ with scale $t_n = \|\beta_n\|$, so that $x_n$ is a unit vector. Then

$$\tau(\beta_n^T Y + \beta_{0,n}) = \tau(t_n \cdot x_n^T Y + \beta_{0,n}) = \tau(t_n \cdot W_n + \beta_{0,n}),$$

i.e. a sigmoid applied to the scaled «PP feature» $W_n = x_n^T Y$ (plot of the sigmoid $\tau(s \cdot w)$ for scales s = 0.5, 1, 10 not reproduced).

SLIDE 13

Graphical display of a single hidden layer feed-forward neural network

Note! With respect to the model definition, feed-forward means:

  • Connections in the graph are directional
  • The direction goes from input to output

We will, however, traverse the graph in the opposite direction as well…

(Diagram: input layer → hidden layer → output layer; weights $\beta$ (or W1) from input to hidden, $\gamma$ (or W2) from hidden to output, activation $\tau(\cdot)$ in the hidden nodes.)

$$g_l(Y) = \sum_{n=1}^{N} \gamma_{l,n}\, \tau(\beta_n^T Y + \beta_{0,n})$$
SLIDE 14

The output layer is often «different»

Hidden layer: $a_n = \tau(\beta_{0,n} + \beta_n^T Y)$, for $n = 1, \dots, N$
Output layer: $U_l = \gamma_{0,l} + \gamma_l^T a$, for $l = 1, \dots, L$

Some alternatives for $g_l(\cdot)$:

  • Transform $\tau(U_l)$: same as the «hidden» layers
  • Identity $U_l$: common in the regression setting
    $$g_l(Y) = U_l = \gamma_{0,l} + \sum_{n=1}^{N} \gamma_{l,n}\, \tau(\beta_n^T Y + \beta_{0,n})$$
  • Joint transform $h_l(U)$: common for classification, e.g. softmax
    $$g_l(Y) = \frac{\exp(U_l)}{\sum_{k=1}^{L} \exp(U_k)} = \frac{\exp(\gamma_{0,l} + \gamma_l^T a)}{\sum_{k=1}^{L} \exp(\gamma_{0,k} + \gamma_k^T a)}$$

(Diagram: $Y \to a \to U$ and $Z$, with identity or softmax in the output layer.)
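As a sketch of how the pieces fit together (illustrative NumPy, not the lecture's code), the forward pass for one hidden layer with a sigmoid activation and either an identity or a softmax output could look like this; the shapes follow the notation above.

```python
import numpy as np

def forward(Y, beta0, beta, gamma0, gamma, output="identity"):
    """Single hidden layer network.
    Y: (O, q) inputs; beta0: (N,), beta: (N, q); gamma0: (L,), gamma: (L, N)."""
    a = 1.0 / (1.0 + np.exp(-(Y @ beta.T + beta0)))   # hidden: a_n = tau(beta_{0,n} + beta_n^T y)
    U = a @ gamma.T + gamma0                          # output: U_l = gamma_{0,l} + gamma_l^T a
    if output == "identity":                          # common in the regression setting
        return U
    expU = np.exp(U - U.max(axis=1, keepdims=True))   # softmax, shifted for numerical stability
    return expU / expU.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
O, q, N, L = 5, 4, 3, 2
Y = rng.normal(size=(O, q))
params = [rng.normal(size=s) * 0.1 for s in [(N,), (N, q), (L,), (L, N)]]
print(forward(Y, *params, output="softmax"))          # each row sums to 1
```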

SLIDE 15

Comparison of projection pursuit (PP) and neural nets (NN)

NN: $g(Y) = \sum_{n=1}^{N_{NN}} \gamma_n\, \tau(\beta_n^T Y + \beta_{0,n})$
PP: $g(Y) = \sum_{n=1}^{N_{PP}} h_n(x_n^T Y)$

Term by term: $h_n(x_n^T Y)$ vs $\gamma_n\, \tau(t_n \cdot x_n^T Y + \beta_{0,n})$, with $t_n = \|\beta_n\|$

  • The flexibility of $h_n$ is much larger than what is obtained with $t_n$ and $\beta_{0,n}$, which are the additional parameters of neural nets
  • There are usually fewer terms in PP than in NN, i.e. $N_{PP} \ll N_{NN}$
  • Both methods are powerful for regression and classification
  • Effective in problems with a high signal-to-noise ratio
  • Suited for prediction without interpretation
  • Identifiability of the weights is an open question and creates problems for interpretation
  • The fitting procedures are different

SLIDE 16

Fitting neural networks

$$S(\theta) = M(Z, \hat{g}(Y)) = \sum_{j=1}^{O} \sum_{l=1}^{L} \big(z_{jl} - \hat{g}_l(y_j)\big)^2 = \sum_{j=1}^{O} S_j(\theta), \qquad S_j(\theta) = \sum_{l=1}^{L} \big(z_{jl} - \hat{g}_l(y_j)\big)^2$$

  • $\theta$: statistical slang for the full set of parameters. Here: $\{\beta_{0,n}, \beta_n\}$ with $(q+1)N$ parameters and $\{\gamma_{0,l}, \gamma_l\}$ with $(N+1)L$ parameters
  • $S_j(\theta)$: contribution of the j'th data record (quadratic loss, L output variables)

The "standard" approach:

  • Minimize the loss
  • Use steepest descent to solve this minimization problem
  • The key to success is the efficient way of computing the gradient

SLIDE 17

Steepest descent

  • Minimize $S(\theta)$ with respect to $\theta$:
    – Initialize: $\theta^{(0)}$
    – Iterate: $\theta_k^{(s+1)} = \theta_k^{(s)} - \delta_s \left.\dfrac{\partial S(\theta)}{\partial \theta_k}\right|_{\theta = \theta^{(s)}}$, where $\delta_s$ is the learning rate
  • Since $S(\theta) = \sum_{j=1}^{O} S_j(\theta)$, the gradient decomposes as
    $$\frac{\partial S(\theta)}{\partial \theta_k} = \sum_{j=1}^{O} \frac{\partial S_j(\theta)}{\partial \theta_k},$$
    so we compute $\partial S_j(\theta)/\partial \theta_k$ per data record (easily aggregated, also from parallel computation)

SLIDE 18

Squared error loss

With $g_l(Y) = h_l(U)$ in the output layer, the gradients of $S_j(\theta)$ are

$$\frac{\partial S_j(\theta)}{\partial \gamma_{l,n}} = -2\big(z_{j,l} - g_l(y_j)\big)\, h_l'(\gamma_l^T A_j)\, A_{n,j} = \varepsilon_{l,j} \cdot A_{n,j}$$

$$\frac{\partial S_j(\theta)}{\partial \beta_{n,m}} = -\sum_{l=1}^{L} 2\big(z_{jl} - g_l(y_j)\big)\, h_l'(\gamma_l^T A_j)\, \gamma_{l,n}\, \tau'(\beta_n^T y_j)\, y_{j,m} = t_{n,j} \cdot y_{j,m}$$

Backpropagation equation (hidden layer errors from output layer errors):

$$t_{n,j} = \tau'(\beta_n^T y_j) \sum_{l=1}^{L} \gamma_{l,n}\, \varepsilon_{l,j}$$

SLIDE 19

Backpropagation (delta rule)

  • At the top level, compute:
    $$\varepsilon_{l,j} = -2\big(z_{j,l} - g_l(y_j)\big)\, h_l'(\gamma_l^T A_j), \quad \forall (j, l)$$
  • At the hidden level, compute:
    $$t_{n,j} = \tau'(\beta_n^T y_j) \sum_{l=1}^{L} \gamma_{l,n}\, \varepsilon_{l,j}, \quad \forall (j, n)$$
  • Evaluate:
    $$\frac{\partial S_j(\theta)}{\partial \gamma_{l,n}} = \varepsilon_{l,j}\, A_{n,j} \quad \text{and} \quad \frac{\partial S_j(\theta)}{\partial \beta_{n,m}} = t_{n,j}\, y_{j,m}$$
  • Update (with fixed learning rate $\delta_s$):
    $$\gamma_{l,n}^{(s+1)} = \gamma_{l,n}^{(s)} - \delta_s \sum_{j=1}^{O} \left.\frac{\partial S_j}{\partial \gamma_{l,n}}\right|_{\theta=\theta^{(s)}}, \qquad \beta_{n,m}^{(s+1)} = \beta_{n,m}^{(s)} - \delta_s \sum_{j=1}^{O} \left.\frac{\partial S_j}{\partial \beta_{n,m}}\right|_{\theta=\theta^{(s)}}$$
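Written out for one hidden layer with sigmoid activation, squared error, and identity output (so $h_l' \equiv 1$), the delta rule might be implemented as in this sketch (my own illustration; the name `backprop_step` is assumed and bias terms are omitted for brevity):

```python
import numpy as np

def sigmoid(w):
    return 1.0 / (1.0 + np.exp(-w))

def backprop_step(Y, Z, beta, gamma, delta=0.1):
    """One steepest-descent step with the delta rule.
    Y: (O, q) inputs, Z: (O, L) targets, beta: (N, q), gamma: (L, N).
    Squared error, sigmoid hidden units, identity output (h_l' = 1); biases omitted."""
    A = sigmoid(Y @ beta.T)                    # hidden activations A_{n,j}, shape (O, N)
    G = A @ gamma.T                            # outputs g_l(y_j), shape (O, L)

    eps = -2.0 * (Z - G)                       # top level: eps_{l,j}, shape (O, L)
    t = (eps @ gamma) * A * (1.0 - A)          # hidden level: t_{n,j}, using tau' = A(1 - A)

    grad_gamma = eps.T @ A                     # sum_j eps_{l,j} * A_{n,j}, shape (L, N)
    grad_beta = t.T @ Y                        # sum_j t_{n,j} * y_{j,m}, shape (N, q)

    return beta - delta * grad_beta, gamma - delta * grad_gamma

# Toy usage: a few gradient steps on random data
rng = np.random.default_rng(1)
Y = rng.normal(size=(50, 4))
Z = rng.normal(size=(50, 2))
beta = rng.uniform(-0.7, 0.7, size=(3, 4))
gamma = rng.uniform(-0.7, 0.7, size=(2, 3))
for _ in range(100):
    beta, gamma = backprop_step(Y, Z, beta, gamma, delta=0.01)
```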

SLIDE 20

Stochastic gradient descent

  • The equations above update with all data at the same time
  • The form invites updating the estimate using fractions of the data:
    – Perform a random partition of the training data into batches $\{C_k\}_{k=1}^{\#\text{Batches}}$
    – For each batch, use the data in that batch to update the parameters
    – Repeat
  • One iteration is one update of the parameters (using one batch)
  • One epoch is one scan through all the data (using all batches in the partition)

Full data update:
$$\gamma_{l,n}^{(s+1)} = \gamma_{l,n}^{(s)} - \delta_s \sum_{j=1}^{O} \left.\frac{\partial S_j}{\partial \gamma_{l,n}}\right|_{\theta=\theta^{(s)}}, \qquad \beta_{n,m}^{(s+1)} = \beta_{n,m}^{(s)} - \delta_s \sum_{j=1}^{O} \left.\frac{\partial S_j}{\partial \beta_{n,m}}\right|_{\theta=\theta^{(s)}}$$

Batch update:
$$\gamma_{l,n}^{(s+1)} = \gamma_{l,n}^{(s)} - \delta_s \sum_{j \in C_k} \left.\frac{\partial S_j}{\partial \gamma_{l,n}}\right|_{\theta=\theta^{(s)}}, \qquad \beta_{n,m}^{(s+1)} = \beta_{n,m}^{(s)} - \delta_s \sum_{j \in C_k} \left.\frac{\partial S_j}{\partial \beta_{n,m}}\right|_{\theta=\theta^{(s)}}$$
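Only the set of records entering the gradient sum changes. A sketch of the partition-and-cycle loop, assuming `backprop_step` and the data from the previous sketch:

```python
import numpy as np
# assumes Y, Z, beta, gamma and backprop_step(...) from the backpropagation sketch above

rng = np.random.default_rng(2)
batch_size, n_epochs, delta = 10, 5, 0.01

for epoch in range(n_epochs):                       # one epoch = one scan through all data
    order = rng.permutation(len(Y))                 # random partition into batches C_k
    for start in range(0, len(Y), batch_size):
        C_k = order[start:start + batch_size]       # indices of batch C_k
        # one iteration = one parameter update using only this batch
        beta, gamma = backprop_step(Y[C_k], Z[C_k], beta, gamma, delta=delta)
```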

SLIDE 21

Online learning (extreme case: batch size = 1)

  • Learning based on one data point at a time
  • You might re-iterate (for several epochs) when completed, or, if you have an abundance of data, just take on new data as they come along (hence the name)
  • For convergence: $\delta_s \to 0$ with $\sum_s \delta_s = \infty$ and $\sum_s \delta_s^2 < \infty$, e.g. $\delta_s = 1/s$

$$\gamma_{l,n}^{(s)} = \gamma_{l,n}^{(s-1)} - \delta_s \left.\frac{\partial S_j}{\partial \gamma_{l,n}}\right|_{\theta=\theta^{(s-1)}}, \qquad \beta_{n,m}^{(s)} = \beta_{n,m}^{(s-1)} - \delta_s \left.\frac{\partial S_j}{\partial \beta_{n,m}}\right|_{\theta=\theta^{(s-1)}}$$

SLIDE 22

Other optimization methods can be used

  • Still use backpropagation to get the derivatives
  • Conjugate gradient
    – Method for minimizing a quadratic form
    – Needs «restarts» for nonlinear problems
  • Variable metric methods
    – E.g. quasi-Newton methods

SLIDE 23

Neural networks

  • Deep neural nets are currently a «hot topic»
  • Deep means many hidden layers
  • Multilayer feed-forward characteristics:
    – The network is arranged in layers:
      • the first layer takes the input
      • the last layer gives the output
      • intermediate layers are hidden layers (no connection to the world outside)
    – A node in one layer is connected to every node in the next layer (fully connected)
    – There are no connections among nodes in the same layer
  • Other types:
    – Self-organizing map (SOM): output is not defined (unsupervised)
    – Recurrent neural network (RNN): many forms
    – Hopfield networks (RNN with symmetric connections)
    – Boltzmann machine networks (Markov random fields)
    – Convolutional neural nets (locally connected)

SLIDE 24

Graphical display of a feed-forward neural network

(Diagram: input nodes $Y_1, \dots, Y_q$; hidden layers 1 through R with nodes $a_1^{[s]}, \dots, a_{r_s}^{[s]}$, weights $\beta_1, \dots, \beta_R$ and activations $\tau_1(\cdot), \dots, \tau_R(\cdot)$; output nodes $Z_1, \dots, Z_L$ with weights $\gamma$ and output functions $h_l(\cdot)$.)

  • Depth of the NN = the number of hidden layers (R)
  • Width of the NN may vary from layer to layer

SLIDE 25

Graphical display of a feed-forward neural network, matrix notation

(Diagram: input $\boldsymbol{y}$ → hidden layers $\mathcal{A}_1, \mathcal{A}_2, \dots, \mathcal{A}_R$ → output $\boldsymbol{z}$, with weight matrices $\boldsymbol{X}_1, \dots, \boldsymbol{X}_R$, activations $\tau_1(\cdot), \dots, \tau_R(\cdot)$ and output function $h(\cdot)$. Depth of the NN = R hidden layers.)

Matrix notation:
  • Size of $\boldsymbol{X}_1$: $(q+1) \times r_1$
  • Size of $\boldsymbol{X}_s$: $(r_{s-1}+1) \times r_s$ (the +1 accounts for the bias)
  • Layer update: $\mathcal{A}_s = \tau_s(\boldsymbol{X}_s^T \mathcal{A}_{s-1})$, with a constant 1 appended to $\mathcal{A}_{s-1}$ for the bias
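In this matrix notation the whole forward pass is a short loop over layers. A minimal sketch (illustrative, not the lecture's code; it assumes the bias is handled by appending a constant 1 to the previous layer, as indicated by the $(r_{s-1}+1) \times r_s$ weight sizes):

```python
import numpy as np

def forward_deep(y, X_list, tau=np.tanh):
    """Feed-forward pass A_s = tau(X_s^T A_{s-1}), with a 1 appended for the bias.
    X_list[s] has shape (r_{s-1} + 1, r_s); the input y plays the role of A_0.
    (The final layer would typically use the identity or softmax instead of tau.)"""
    A = y
    for X in X_list:
        A = tau(X.T @ np.concatenate(([1.0], A)))   # bias element first, then previous layer
    return A

# Example: q = 4 inputs, hidden widths (5, 3), final layer of width 2
rng = np.random.default_rng(3)
widths = [4, 5, 3, 2]
X_list = [0.1 * rng.normal(size=(r_prev + 1, r)) for r_prev, r in zip(widths[:-1], widths[1:])]
print(forward_deep(rng.normal(size=4), X_list))
```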

SLIDE 26

Nested definition (do not try to write this in closed form…)

$$g_l(Y) = \gamma_{0,l} + \sum_{n=1}^{r_R} \gamma_{n,l}\, a_n^{[R]}, \qquad l = 1, \dots, L$$

$$a_j^{[R]} = \tau_R\!\left(\beta_j^{[R]T} a^{[R-1]} + \beta_{0,j}^{[R]}\right), \qquad j = 1, \dots, r_R$$

$$a_j^{[R-1]} = \tau_{R-1}\!\left(\beta_j^{[R-1]T} a^{[R-2]} + \beta_{0,j}^{[R-1]}\right), \qquad j = 1, \dots, r_{R-1}$$

$$\vdots$$

$$a_j^{[1]} = \tau_1\!\left(\beta_j^{[1]T} Y + \beta_{0,j}^{[1]}\right), \qquad j = 1, \dots, r_1$$

  • $L$: number of outputs; $q$: number of inputs; $r_s$: width of layer $s$
  • Each $\beta_j^{[s]}$ has length $r_{s-1}$ (with $r_0 = q$); each bias $\beta_{0,j}^{[s]}$ has size 1×1
  • In practice: use computational graphs

SLIDE 27

Training neural networks

  • Backpropagation can still be used

Traversing the graph backwards (record number $j$ suppressed):

$$\varepsilon_l^{[\text{Top}]} = -2\big(z_l - g_l(y)\big)\, h_l'(\gamma_l^T A^{[R]})$$

$$\varepsilon_n^{[R]} = \tau_R'\!\left(\beta_n^{[R]T} A^{[R-1]}\right) \sum_{l=1}^{L} \gamma_{l,n}\, \varepsilon_l^{[\text{Top}]}, \qquad n = 1, \dots, r_R$$

$$\varepsilon_m^{[R-1]} = \tau_{R-1}'\!\left(\beta_m^{[R-1]T} A^{[R-2]}\right) \sum_{n=1}^{r_R} \beta_{n,m}^{[R]}\, \varepsilon_n^{[R]}, \qquad m = 1, \dots, r_{R-1}$$

In practice: use computational graphs.

SLIDE 28

Scaling of input & starting values

  • Standardize the input variables to avoid numerical scaling issues
    – Mean 0
    – Standard deviation 1
  • Choose weights
    – Close to zero (the model is then almost linear)
    – Do not choose exactly zero (then it does not get started)
    – Too large values generally give bad results
    – Common to use several random starting points
  • Weights, rule of thumb (with standardized input):
    – The book suggests weights from a uniform distribution on $[-0.7, 0.7]$
    – Also common (ReLU): weights from $N(0,\, 2/\#\text{input features})$

Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun (2015). Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.
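Both rules of thumb are straightforward to write down. A small sketch (the shapes and variable names are illustrative; the normal rule is read with $2/\#\text{input features}$ as the variance):

```python
import numpy as np

rng = np.random.default_rng(4)
n_in, n_hidden = 20, 10          # number of (standardized) input features, hidden units

# Book's suggestion: small uniform weights on [-0.7, 0.7]
beta_uniform = rng.uniform(-0.7, 0.7, size=(n_hidden, n_in))

# Common for ReLU (He et al. 2015): N(0, 2 / #input features),
# i.e. standard deviation sqrt(2 / n_in)
beta_he = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_hidden, n_in))
```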

SLIDE 29

Avoiding overfitting

  • Early stopping
    – Since the starting regime is close to linear, we will usually end up with something close to linear if we stop early
    – Use a validation set to select when to stop
  • Regularization by weight decay
    – Minimize $S(\theta) + \mu K(\theta)$, with a penalty term e.g. $K(\theta) = \sum \beta_{n,m}^2 + \sum \gamma_{l,n}^2$
    – Use cross validation to select $\mu$
  • Dropout
    – During training, set node values to 0 with probability $1-q$ (i.e. keep them with probability $q$)
    – During evaluation, the weights are scaled to $\beta_{n,m} \cdot q$

Srivastava et al. (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research 15, 1929-1958.
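A sketch of how the two regularizers enter training (illustrative, not lecture code; `grad` in `weight_decay_gradient` stands for the backpropagated gradient from the earlier slides, and q is the keep probability):

```python
import numpy as np

rng = np.random.default_rng(5)

def weight_decay_gradient(grad, weights, mu):
    """Gradient of S(theta) + mu * K(theta), with K = sum of squared weights."""
    return grad + 2.0 * mu * weights

def dropout_train(a, q):
    """During training: set each hidden activation to 0 with probability 1 - q."""
    mask = rng.random(a.shape) < q          # keep with probability q
    return a * mask

def dropout_eval_weights(gamma, q):
    """During evaluation: keep all units, but scale the outgoing weights by q."""
    return gamma * q

a = rng.normal(size=(32, 10))               # hidden activations for one batch
print(dropout_train(a, q=0.8))
```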

SLIDE 30

Effect of weight decay

(Figure: panels showing the predictions and the fitted weights under weight decay.)

SLIDE 31

Multiple minima

  • $S(\theta)$ is not convex and has many local minima
  • Try many starting configurations
    – Choose the solution with the lowest penalized error
    – Or use the average prediction of the collection of networks
  • NB: do not average the weights, as these are not well ordered
  • Use bagging, i.e. average predictions of networks trained on random perturbations of the training data

SLIDE 32

Number of hidden units and layers

  • Number of units
    – From 5 to 100 units is common
    – Increase with the number of inputs and the amount of training data
    – Better to have too many than too few
    – Start with a large number and use weight decay (regularization)
  • Number of hidden layers
    – Guided by background knowledge
    – Models hierarchical features at different levels of resolution
    – Trial and error
    – Trend in CNNs: smaller filters, more layers

SLIDE 33

What is deep learning?

  • There is no universally agreed-upon threshold of depth dividing shallow learning from deep learning, but most researchers in the field agree that deep learning involves multiple nonlinear layers (CAP > 2), i.e. 3 layers or more
  • «Deep learning»: Hinton et al. 2006 (3 layers)
  • «Very deep learning»: Simonyan et al. 2014 (16+ layers)
  • «Extremely deep»: He et al. 2016 (50 to 1000 layers)
  • Schmidhuber 2015 considers more than 10 layers to be very deep learning

SLIDE 34

Example: simulated data

  • Signal-to-noise ratio: $\mathrm{Var}(g(Y)) / \mathrm{Var}(\zeta_1) \approx 4$
  • Training data size: 100 samples
  • Test data size: 10 000 samples
  • Weight decay = 0.0005

(Figure: prediction performance, with linear regression shown for comparison.)

SLIDE 35

Weight decay vs number of hidden units

(Figure: test error for weight decay = 0.0005 and varying numbers of hidden units.)

Both approaches give good results: selecting a large number of hidden units and optimizing the weight decay, or fixing the weight decay and optimizing the number of hidden units.

SLIDE 36

Examples of simulated data

  • Signal-to-noise ratio: $\mathrm{Var}(g(Y)) / \mathrm{Var}(\zeta) \approx 4$
  • Training data size: 100 samples
  • Test data size: 10 000 samples

NN does not always work: here all cases are worse than just using the mean, and it gets even worse as the number of units increases.

SLIDE 37

Alternative models for neural networks

«Hand crafting» the NN might help, by reducing the number of parameters to be estimated:

  • Setting weights to zero (localize)
  • Setting weights equal (convolutional NN)

SLIDE 38

Neural net reduction in the number of parameters (time series example)

Layer i → layer i+1, both of width 100, with $A = X y$ and $X$ of size 100 × 100:

  • Fully connected: # parameters = (100+1)×100 = 10 100 (including bias)
  • Localized (±4): # parameters = (9+1)×100 = 1 000
  • Convolutional (±4): # parameters = (9+1)×1 = 10 per feature; a CNN uses multiple features, so 10 features => # = 100 parameters (reproduced in the small calculation below)
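The counts in the comparison above are easy to reproduce; a small illustrative calculation (not lecture code, and the function name is assumed):

```python
def n_params(width_in=100, width_out=100, window=4, n_features=10):
    """Parameter counts for a 100 -> 100 layer (time series example above)."""
    fully_connected = (width_in + 1) * width_out            # (100+1)*100 = 10100, incl. bias
    localized = (2 * window + 1 + 1) * width_out            # (9+1)*100  = 1000
    conv_per_feature = (2 * window + 1 + 1) * 1             # (9+1)*1    = 10, weights shared
    conv_total = conv_per_feature * n_features              # 10 features -> 100 parameters
    return fully_connected, localized, conv_per_feature, conv_total

print(n_params())   # (10100, 1000, 10, 100)
```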

SLIDE 39

Video: https://www.youtube.com/watch?v=bNb2fEVKeEo

Series: https://www.youtube.com/watch?v=vT1JzLTH4G4&list=PL3FW7Lu3i5JvHM8ljYj-zLfQRF3EO8sYv

CNN design choices, per CONV layer: number of filters? size of filter? subsample (stride)? padding? pooling layer? activation function? number of layers?

SLIDE 40

Learning today: Neural nets

  • Projection pursuit
    – What is it?
    – How to solve it: stagewise fitting
  • Neural nets
    – What is it?
    – Graphical display
    – Connection to projection pursuit
    – How to solve it: backpropagation
    – Stochastic gradient search
    – Deep, wide, convolutional