Backpropagation




  1. Tufts COMP 135: Introduction to Machine Learning. https://www.cs.tufts.edu/comp/135/2020f/ Backpropagation. Many slides attributable to: Prof. Mike Hughes, Erik Sudderth (UCI), Emily Fox (UW), Finale Doshi-Velez (Harvard), James, Witten, Hastie, Tibshirani (ISL/ESL books)

  2. Objectives Today (day 11): Backpropagation • Review: Multi-layer perceptrons • MLPs can learn feature representations • Activation functions • Training via gradient descent • Back-propagation = gradient descent + chain rule

  3. What will we learn? [Diagram: supervised learning maps training data, label pairs {x_n, y_n}_{n=1}^N through a task and a performance measure to evaluation and prediction of a label y for new data x; unsupervised learning and reinforcement learning are also shown.]

  4. Task: Binary Classification. A supervised learning task in which y is a binary variable (red or blue). [Scatter plot of the two classes over features x_1 and x_2.]

  5. Feature Transform Pipeline. Data, label pairs {x_n, y_n}_{n=1}^N are mapped to feature, label pairs {φ(x_n), y_n}_{n=1}^N; the task and performance measure then operate on the transformed features φ(x) and labels y.
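As a concrete reading of this pipeline, here is a tiny sketch in which each raw input x_n is replaced by transformed features φ(x_n) before training. The quadratic transform below is only one hypothetical choice; the slide leaves φ unspecified.

    import numpy as np

    def phi(x):
        # Hypothetical hand-designed transform: append squared features.
        return np.concatenate([x, x ** 2])

    X = np.array([[1.0, 2.0], [3.0, 4.0]])       # raw inputs x_n
    Phi_X = np.array([phi(x_n) for x_n in X])    # features phi(x_n), paired with the labels y_n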

  6. MLP: Multi-Layer Perceptron. A neural network with 1 or more hidden layers followed by 1 output layer.

  7. Neural nets with many layers
     f_1 = dot(w_1, x) + b_1
     f_2 = dot(w_2, act(f_1)) + b_2
     f_3 = dot(w_3, act(f_2)) + b_3
     Input: data x. Output: f_3. Here dot = matrix multiply, act = activation function.
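A minimal NumPy sketch of this forward pass, to make the three equations concrete. The ReLU choice for act and the array shapes are assumptions for illustration; the slide does not fix them.

    import numpy as np

    def act(f):
        # Hidden-layer activation; ReLU is one common choice.
        return np.maximum(0.0, f)

    def forward(x, weights, biases):
        # weights = [w1, w2, w3], biases = [b1, b2, b3]
        # f1 = dot(w1, x) + b1; f2 = dot(w2, act(f1)) + b2; f3 = dot(w3, act(f2)) + b3
        f = np.dot(weights[0], x) + biases[0]
        for w, b in zip(weights[1:], biases[1:]):
            f = np.dot(w, act(f)) + b
        return f  # the output f3 (no output activation applied here)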

  8. Each Layer Extracts “Higher Level” Features

  9. Multi-layer perceptron (MLP). You can define an MLP by specifying:
     - Number of hidden layers (L-1) and size of each layer: hidden_layer_sizes = [A, B, C, D, …]
     - Hidden layer activation function (ReLU, etc.)
     - Output layer activation function
     f_1 ∈ R^A, f_1 = dot(w_1, x) + b_1
     f_2 ∈ R^B, f_2 = dot(w_2, act(f_1)) + b_2
     f_3 ∈ R^C, f_3 = dot(w_3, act(f_2)) + b_3
     f_L ∈ R^1, f_L = dot(w_L, act(f_{L-1})) + b_L

  10. How to train Neural Nets? Just like logistic regression: 1. Set up a loss function 2. Apply gradient descent!
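A generic sketch of the second step, assuming some loss(w) and its gradient grad_loss(w) are available as functions; computing that gradient for a multi-layer network is exactly what the backpropagation slides below address.

    import numpy as np

    def gradient_descent(loss, grad_loss, w_init, step_size=0.01, n_steps=1000):
        w = np.asarray(w_init, dtype=float)
        for _ in range(n_steps):
            w = w - step_size * grad_loss(w)  # step opposite the gradient
        return w, loss(w)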

  11. Output as a function of the weights: f_3(f_2(f_1(x, w_1), w_2), w_3). Input data x flows through f_1(x, w_1), then f_2(·, w_2), then f_3(·, w_3).

  12. Minimizing loss for multi-layer functions
      min_{w_1, w_2, w_3} sum_{n=1}^{N} loss(y_n, f_3(f_2(f_1(x_n, w_1), w_2), w_3))
      Loss can be: • Squared error for regression problems • Log loss for binary classification problems • … many others possible! Can try to find the best possible weights with gradient descent…. But how do we compute gradients?
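A sketch of this objective in code, using squared error as the per-example loss. Here predict(x_n) stands in for the full network output f_3(f_2(f_1(x_n, w_1), w_2), w_3); the function names are illustrative, not from a particular library.

    def total_loss(predict, X, y):
        # Sum the per-example squared-error loss over the training set n = 1..N.
        return sum((y_n - predict(x_n)) ** 2 for x_n, y_n in zip(X, y))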

  13. Big idea: NN as a Computation Graph. Each node represents one scalar result produced by our NN: • Node 1: Input x • Nodes 2 and 3: Hidden layer 1 • Nodes 4 and 5: Hidden layer 2 • Node 6: Output. Each edge represents one scalar weight that is a parameter of our NN. To keep this simple, we omit bias parameters.
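A minimal sketch of backpropagation on this small graph (scalar input, two hidden layers of two units each, one output, no biases): the forward pass computes each node's value, and the backward pass applies the chain rule layer by layer to get the gradient of the output with respect to every edge weight. The sigmoid activation and the weight shapes are assumptions made for illustration.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward_backward(x, W1, W2, w3):
        # W1: (2,) weights from input (node 1) to hidden layer 1 (nodes 2-3)
        # W2: (2, 2) weights from hidden layer 1 to hidden layer 2 (nodes 4-5)
        # w3: (2,) weights from hidden layer 2 to the output (node 6)
        h1 = sigmoid(W1 * x)          # hidden layer 1 node values
        h2 = sigmoid(W2 @ h1)         # hidden layer 2 node values
        out = w3 @ h2                 # output node value

        # Backward pass: chain rule from the output back toward the input.
        d_out = 1.0                              # d out / d out
        d_w3 = d_out * h2                        # gradient w.r.t. output weights
        d_h2 = d_out * w3                        # propagate to hidden layer 2
        d_z2 = d_h2 * h2 * (1 - h2)              # through the sigmoid
        d_W2 = np.outer(d_z2, h1)                # gradient w.r.t. W2
        d_h1 = W2.T @ d_z2                       # propagate to hidden layer 1
        d_z1 = d_h1 * h1 * (1 - h1)              # through the sigmoid
        d_W1 = d_z1 * x                          # gradient w.r.t. W1
        return out, (d_W1, d_W2, d_w3)

    # Example call with arbitrary weights:
    # forward_backward(2.0, np.array([0.1, -0.3]), np.eye(2), np.array([1.0, 0.5]))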
