Neural Networks, Chapter 11 in ESL II: STK-IN4300 Statistical Learning Methods in Data Science


  1. Neural Networks, Chapter 11 in ESL II. STK-IN4300 Statistical Learning Methods in Data Science. Odd Kolbjørnsen, oddkol@math.uio.no

  2. Learning today: Neural nets
  • Projection pursuit
    – What is it?
    – How to solve it: stagewise fitting
  • Neural nets
    – What is it?
    – Graphical display
    – Connection to projection pursuit
    – How to solve it: backpropagation
    – Stochastic gradient descent
    – Deep and wide
    – CNN
  • Example

  3. Neural network
  • Used for prediction
  • Universal approximation: with enough data and the correct algorithm you will get it right eventually…
  • Used for both «regression type» and «classification type» problems
  • Many versions and forms; currently deep learning is a hot topic
  • Often portrayed as fully automatic, but tailoring might help
  • Performs highly advanced analysis
  • Can create utterly complex models which are hard to decipher and hard to use for knowledge transfer
  • The network provides good predictions, but are they right for the right reasons? Constructed example from: Ribeiro et al. (2016), “Why Should I Trust You?” Explaining the Predictions of Any Classifier

  4. In neural nets, training is based on minimization of a loss function over the training set. Neural nets are defined by a specific form of the model $f(X)$.
  • General form: $L(Y, \hat{f}(X)) = \sum_{i=1}^{N} L(y_i, \hat{f}(x_i))$. The target may be multi-dimensional: $y_i = (y_{i1}, \dots, y_{iK})^T$.
  • Continuous response («regression type»), squared error (common): $L(Y, \hat{f}(X)) = \sum_{i=1}^{N} \sum_{k=1}^{K} (y_{ik} - \hat{f}_k(x_i))^2$
  • Discrete response ($K$ classes), with $y_{ik} = 1$ if $y_i = k$ and $y_{ik} = 0$ if $y_i \neq k$:
    – Squared error: $L(Y, \hat{f}(X)) = \sum_{i=1}^{N} \sum_{k=1}^{K} (y_{ik} - \hat{f}_k(x_i))^2$, where $\hat{f}_k(x_i) \approx \mathrm{Prob}(y_{ik} = 1)$
    – Cross-entropy (or deviance): $L(Y, \hat{f}(X)) = -\sum_{i=1}^{N} \sum_{k=1}^{K} y_{ik} \log \hat{f}_k(x_i)$
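  As a concrete illustration (a minimal sketch, not from the slides), the two losses in NumPy, assuming the class targets are coded as an N × K indicator matrix Y and the fitted values as an N × K matrix F:

```python
import numpy as np

def squared_error(Y, F):
    # L(Y, f_hat) = sum_i sum_k (y_ik - f_hat_k(x_i))^2
    return np.sum((Y - F) ** 2)

def cross_entropy(Y, F, eps=1e-12):
    # L(Y, f_hat) = -sum_i sum_k y_ik * log f_hat_k(x_i); eps guards log(0)
    return -np.sum(Y * np.log(F + eps))
```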

  5. Projection pursuit regression
  $f(X) = \sum_{m=1}^{M} g_m(\omega_m^T X)$, with unknown ridge functions $g_m: \mathbb{R} \to \mathbb{R}$. Friedman and Tukey (1974); Friedman and Stuetzle (1981).
  • Derived feature (number $m$): $V_m = \omega_m^T X$ (size $1 \times 1$), where $X$ are the explanatory variables (size $p \times 1$) and $\omega_m$ is an unknown unit weight vector (size $p \times 1$).
  • Ridge functions are constant along directions orthogonal to the directional unit vector $\omega_m$.

  6. Fitting projection pursuit: M = 1
  • The $M = 1$ model is known as the single index model in econometrics: $f(X) = g(\omega^T X) = g(V)$, $V = \omega^T X$.
  • If $\omega$ is known, fitting $\hat{g}(v)$ is just a 1D smoothing problem: smoothing spline, local linear (or polynomial) regression, kernel smoothing, k-nearest neighbours…
  • If $g(\cdot)$ is known, $\hat{\omega}$ is obtained by a quasi-Newton search. Linearize around the current estimate $\omega_{\text{old}}$:
    $g(\omega^T x_i) \approx g(\omega_{\text{old}}^T x_i) + g'(\omega_{\text{old}}^T x_i)(\omega - \omega_{\text{old}})^T x_i$
  • Minimize the objective function with the approximation inserted:
    $\sum_{i=1}^{N} (y_i - g(\omega^T x_i))^2 \approx \sum_{i=1}^{N} (y_i - g(\omega_{\text{old}}^T x_i) - g'(\omega_{\text{old}}^T x_i)(\omega - \omega_{\text{old}})^T x_i)^2 = \sum_{i=1}^{N} g'(\omega_{\text{old}}^T x_i)^2 \left( \omega_{\text{old}}^T x_i + \frac{y_i - g(\omega_{\text{old}}^T x_i)}{g'(\omega_{\text{old}}^T x_i)} - \omega^T x_i \right)^2$
  • Solve for $\omega$ using weighted regression with weight $g'(\omega_{\text{old}}^T x_i)^2$.
  • Iterate the two steps until convergence.
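  The weighted least squares step can be written out directly. Below is a minimal sketch (my own, not from the slides) of one quasi-Newton update of $\omega$, assuming X is the N × p data matrix, g and g_prime are callables for the current smoother and its derivative, and $g'(\omega_{\text{old}}^T x_i) \neq 0$ for all $i$:

```python
import numpy as np

def update_omega(X, y, omega_old, g, g_prime):
    """One quasi-Newton step for omega: weighted least squares with
    weights g'(v_i)^2 and working response v_i + (y_i - g(v_i)) / g'(v_i)."""
    v = X @ omega_old                        # current projections v_i
    w = g_prime(v) ** 2                      # regression weights
    target = v + (y - g(v)) / g_prime(v)     # linearized response
    # Weighted least squares: solve (X^T W X) omega = X^T W target
    WX = X * w[:, None]
    omega = np.linalg.solve(X.T @ WX, X.T @ (w * target))
    return omega / np.linalg.norm(omega)     # renormalize to a unit vector
```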

  7. Fitting projection pursuit, M > 1
  Model: $f(X) = \sum_{m=1}^{M} g_m(\omega_m^T X)$. Stage-wise (greedy) fitting (see the sketch after this list):
  • Set $y_{i,1} = y_i$.
  • For $m = 1, \dots, M$:
    – Assume there is just one function to match (as on the previous page): minimize the loss with respect to the current residuals $y_{i,m}$ to obtain $\hat{g}_m(\cdot)$ and $\hat{\omega}_m$:
      $[\hat{g}_m(\cdot), \hat{\omega}_m] = \operatorname{argmin}_{g_m(\cdot), \omega_m} \sum_{i=1}^{N} (y_{i,m} - g_m(\omega_m^T x_i))^2$
    – Store $\hat{g}_m(\cdot)$ and $\hat{\omega}_m$.
    – Subtract the estimate from the data: $y_{i,m+1} = y_{i,m} - \hat{g}_m(\hat{\omega}_m^T x_i)$.
  • Final prediction: $\hat{f}(X) = \sum_{m=1}^{M} \hat{g}_m(\hat{\omega}_m^T X)$.
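  A hedged sketch of the stagewise loop, assuming a helper fit_single_term(X, r) that fits one ridge term, e.g. by alternating the 1D smoother and the omega update above (both helper names are mine, not from the slides):

```python
def fit_ppr_stagewise(X, y, M, fit_single_term):
    """Greedy stagewise fit: each term is fitted to the current residuals."""
    residual = y.copy()
    terms = []
    for m in range(M):
        g_hat, omega_hat = fit_single_term(X, residual)
        terms.append((g_hat, omega_hat))
        residual = residual - g_hat(X @ omega_hat)  # subtract fitted term
    return terms

def predict_ppr(terms, X):
    # f_hat(X) = sum_m g_hat_m(omega_hat_m^T X)
    return sum(g(X @ w) for g, w in terms)
```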

  8. Implementation details
  1. Need a smoothing method with efficient evaluation of $g(v)$ and $g'(v)$: local regression or smoothing splines.
  2. The $g_m(v)$ from previous steps can be readjusted using a backfitting procedure (Chapter 9), but it is unclear if this improves the performance (a sketch follows below):
    1. Set $r_i = y_i - \hat{f}(x_i) + \hat{g}_m(\hat{\omega}_m^T x_i)$.
    2. Re-estimate $g_m(\cdot)$ from $r_i$ (and center the result).
    3. Do this repeatedly for $m = 1, \dots, M, 1, \dots, M, \dots$
  3. It is not common to readjust $\hat{\omega}_m$, as this is computationally demanding.
  4. Stopping criterion for the number of terms to include:
    1. Stop when the model does not improve appreciably, or
    2. use cross-validation to determine $M$.
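  The backfitting readjustment in point 2 could look like the following sketch, assuming a smoother smooth(v, r) that returns a fitted callable; predict_ppr is the helper from the previous sketch:

```python
def backfit_terms(X, y, terms, smooth, n_sweeps=3):
    """Cycle over the terms, form the partial residual for term m,
    and re-estimate g_m with the 1D smoother; omega_m is kept fixed."""
    for _ in range(n_sweeps):
        for m, (g, omega) in enumerate(terms):
            fitted = predict_ppr(terms, X)
            r = y - fitted + g(X @ omega)   # partial residual for term m
            g_new = smooth(X @ omega, r)
            terms[m] = (g_new, omega)       # centering omitted in this sketch
    return terms
```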

  9. Example
  • Training data: 1000 observations
  • Two terms in the fitted model
  [Plots of the fitted terms are shown on the slide and not reproduced here.]

  10. Neural network
  • Simplified model of a nerve system.
  • Perceptron: inputs $x_0 = 1, x_1, \dots, x_p$; weights $\alpha_0, \alpha_1, \dots, \alpha_p$; net input $v = \sum_{j=0}^{p} \alpha_j x_j$; activation function $\sigma(v)$; output $\sigma\!\left( \sum_{j=0}^{p} \alpha_j x_j \right)$.
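  For concreteness, a one-line forward pass of the perceptron (a sketch, assuming NumPy arrays; sigma is any activation function):

```python
import numpy as np

def perceptron(x, alpha, sigma):
    """x has length p; alpha has length p + 1, where alpha[0] is the
    weight on the constant input x_0 = 1."""
    v = alpha[0] + x @ alpha[1:]   # net input v = sum_j alpha_j x_j
    return sigma(v)                # activation
```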

  11. Activation functions $\sigma(v)$
  • Initially: the binary step function was used (not continuous).
  • Next: sigmoid = logistic = soft step (smooth).
  • Now: there is a «rag bag» of alternatives, some more suited than others for specific tasks:
    – ArcTan (smooth)
    – Rectified linear, ReLU (and variants) (continuous)
    – Gaussian (NB: not monotone, gives different behaviour)
  Illustrations from: https://en.wikipedia.org/wiki/Activation_function
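  The listed activations are all one-liners in NumPy (standard definitions, given here for illustration rather than taken from the slides):

```python
import numpy as np

step    = lambda v: np.where(v > 0, 1.0, 0.0)   # binary step (not continuous)
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))    # logistic / soft step
arctan  = lambda v: np.arctan(v)                # smooth
relu    = lambda v: np.maximum(0.0, v)          # rectified linear
gauss   = lambda v: np.exp(-v ** 2)             # Gaussian, not monotone
```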

  12. Single layer feed-forward neural nets
  $f(X) = \sum_{m=1}^{M} \beta_m \sigma(\alpha_m^T X + \alpha_0)$, with activation function $\sigma: \mathbb{R} \to \mathbb{R}$ (e.g. the sigmoid).
  • Derived feature (number $m$): $Z_m = \alpha_m^T X + \alpha_0$ (size $1 \times 1$); $\alpha_0$ is the «bias» or «shift»; $\alpha_m$ is an unknown weight vector (size $p \times 1$), not a unit vector.
  • Connection to the projection pursuit feature: with the unit vector $\omega_m = \alpha_m / \|\alpha_m\|$ (the «PP feature») and scale $s_m = \|\alpha_m\|$,
    $\sigma(\alpha_m^T X + \alpha_0) = \sigma(s_m \cdot \omega_m^T X + \alpha_0) = \sigma(s_m \cdot V_m + \alpha_0)$.
  [The slide illustrates $\sigma(s \cdot v)$ for scales s = 0.5, 1, and 10.]
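  A minimal sketch of the model as written above, assuming the $\alpha_m$ are stacked as columns of a p × M matrix and one bias per hidden unit (the slide writes a single $\alpha_0$; a vector of biases is the common generalization):

```python
import numpy as np

def single_layer_nn(x, alpha0, A, beta,
                    sigma=lambda v: 1.0 / (1.0 + np.exp(-v))):
    # f(X) = sum_m beta_m * sigma(alpha_m^T X + alpha_0m)
    # A: p x M matrix with the alpha_m as columns; alpha0, beta: length M
    v = x @ A + alpha0             # derived features, one per hidden unit
    return float(beta @ sigma(v))  # weighted sum of activated features
```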

  13. Graphical display of a single hidden layer feed-forward neural network
  Output: $f_k(X) = \sum_{m=1}^{M} \beta_{k,m} \sigma(\alpha_m^T X + \alpha_0)$
  • The graph has three layers: input, hidden (with activation $\sigma(\cdot)$), and output. The weights $\alpha$ (or $W_1$) connect input to hidden, and $\beta$ (or $W_2$) connect hidden to output.
  • Feed-forward means: connections in the graph are directional, and the direction goes from input to output.
  • Note: with respect to the model definition we go from input to output, but when fitting we will traverse the graph in the opposite direction as well.

  14. Output layer is often «different»
  • Hidden layer: $Z_m = \sigma(\alpha_{0,m} + \alpha_m^T X)$, $m = 1, \dots, M$.
  • Output layer: $T_k = \beta_{0,k} + \beta_k^T Z$, $k = 1, \dots, K$.
  Some alternatives for $g_k(T)$:
  • Transform $\sigma(T_k)$: same as the «hidden» layers.
  • Identity $T_k$: common in a regression setting:
    $f_k(X) = T_k = \beta_{0,k} + \sum_{m=1}^{M} \beta_{k,m} \sigma(\alpha_{0,m} + \alpha_m^T X)$
  • Joint transform $g_k(T)$: common for classification, e.g. softmax:
    $f_k(X) = \frac{\exp(T_k)}{\sum_{l=1}^{K} \exp(T_l)} = \frac{\exp(\beta_{0,k} + \beta_k^T Z)}{\sum_{l=1}^{K} \exp(\beta_{0,l} + \beta_l^T Z)}$
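  Putting the two layers together with the identity and softmax output alternatives; a sketch under the same stacking assumptions as before, with the softmax shifted by its maximum for numerical stability:

```python
import numpy as np

def nn_forward(x, A0, A, B0, B, output="softmax"):
    """Hidden layer Z_m = sigma(alpha_0m + alpha_m^T X): A is p x M, A0 length M.
    Output layer T_k = beta_0k + beta_k^T Z: B is M x K, B0 length K."""
    Z = 1.0 / (1.0 + np.exp(-(x @ A + A0)))  # sigmoid hidden units
    T = Z @ B + B0                           # output-layer linear combinations
    if output == "identity":                 # regression setting
        return T
    e = np.exp(T - T.max())                  # softmax, shifted for stability
    return e / e.sum()
```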

  15. Comparison of projection pursuit (PP) and neural nets (NN)
  $f(X) = \sum_{m=1}^{M_{PP}} g_m(\omega_m^T X)$  vs  $f(X) = \sum_{m=1}^{M_{NN}} \beta_m \sigma(\alpha_m^T X + \alpha_0)$,
  i.e. term by term: $g_m(\omega_m^T X)$ vs $\beta_m \sigma(s_m \cdot \omega_m^T X + \alpha_0)$ with $s_m = \|\alpha_m\|$.
  • The flexibility of $g_m$ is much larger than what is obtained with $s_m$ and $\alpha_0$, which are the additional parameters of neural nets.
  • There are usually fewer terms in PP than in NN, i.e. $M_{PP} \ll M_{NN}$.
  • Both methods are powerful for regression and classification.
  • Effective in problems with a high signal-to-noise ratio.
  • Suited for prediction without interpretation.
  • Identifiability of the weights is an open question and creates problems in interpretations.
  • The fitting procedures are different.
