SLIDE 1

Machine Learning: Chenhao Tan

University of Colorado Boulder

LECTURE 7 Slides adapted from Jordan Boyd-Graber, Chris Ketelsen

SLIDE 2

Final projects

  • WSDM Cup
  • SemEval 2018

SLIDE 3

Overview

  • Forward propagation recap
  • Back propagation
      • Chain rule
      • Back propagation
      • Full algorithm

SLIDE 4

Forward propagation recap

Outline

  • Forward propagation recap
  • Back propagation
      • Chain rule
      • Back propagation
      • Full algorithm

SLIDE 5

Forward propagation recap

Forward propagation algorithm

Store the biases for layer l in b^l and the weight matrix in W^l.

[Figure: feed-forward network with inputs x1, x2, . . . , xd and layer parameters (W^1, b^1), (W^2, b^2), (W^3, b^3)]

SLIDE 6

Forward propagation recap

Forward propagation algorithm

Suppose your network has L layers. To make a prediction based on test point x:

1: Initialize a^0 = x
2: for l = 1 to L do
3:     z^l = W^l a^{l−1} + b^l
4:     a^l = g(z^l)
5: end for
6: The prediction ŷ is simply a^L
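
To make the loop concrete, here is a minimal NumPy sketch of this forward pass (the sigmoid activation and the function/variable names are illustrative assumptions, not part of the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases, g=sigmoid):
    """Forward propagation: returns activations a^0..a^L and pre-activations z^1..z^L."""
    a = x
    activations, zs = [a], []
    for W, b in zip(weights, biases):
        z = W @ a + b          # z^l = W^l a^{l-1} + b^l
        a = g(z)               # a^l = g(z^l)
        zs.append(z)
        activations.append(a)
    return activations, zs     # the prediction y_hat is activations[-1]
```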

SLIDE 7

Forward propagation recap

Neural networks in a nutshell

  • Training data S_train = {(x, y)}
  • Network architecture (model)

ŷ = f_w(x), with parameters W^l, b^l, l = 1, . . . , L

  • Loss function (objective function)

L(y, ŷ)

  • How do we learn the parameters?

SLIDE 8

Forward propagation recap

Neural networks in a nutshell

  • Training data S_train = {(x, y)}
  • Network architecture (model)

ŷ = f_w(x), with parameters W^l, b^l, l = 1, . . . , L

  • Loss function (objective function)

L(y, ŷ)

  • How do we learn the parameters?

Stochastic gradient descent: W^l ← W^l − η ∂L(y, ŷ)/∂W^l
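
As a one-line illustration of this update rule, a minimal NumPy sketch (the shapes, learning rate, and placeholder gradient are assumptions; in practice the gradient comes from back propagation, derived in the rest of the lecture):

```python
import numpy as np

eta = 0.1                              # learning rate
W_l = np.zeros((3, 4))                 # some layer's weight matrix W^l
grad_W_l = 0.01 * np.ones((3, 4))      # placeholder for dL/dW^l

W_l -= eta * grad_W_l                  # W^l <- W^l - eta * dL/dW^l
```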

SLIDE 9

Forward propagation recap

Challenge

  • Challenge: How the heck do we compute derivatives of the loss function with respect to the weights and biases?

  • Solution: Back Propagation

SLIDE 10

Back propagation

Outline

  • Forward propagation recap
  • Back propagation
      • Chain rule
      • Back propagation
      • Full algorithm

SLIDE 11

Back propagation | Chain rule

The Chain Rule

The chain rule allows us to take derivatives of nested functions. There are two forms of the Chain Rule.

SLIDE 12

Back propagation | Chain rule

The Chain Rule

The chain rule allows us to take derivatives of nested functions. There are two forms of the Chain Rule.

Baby Chain Rule: (d/dx) f(g(x)) = f′(g(x)) g′(x) = (df/dg)(dg/dx)

SLIDE 13

Back propagation | Chain rule

The Chain Rule

The chain rule allows us to take derivatives of nested functions. There are two forms of the Chain Rule.

Baby Chain Rule: (d/dx) f(g(x)) = f′(g(x)) g′(x) = (df/dg)(dg/dx)

Example: (d/dx) sin(x²) = cos(x²) · 2x
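
As a quick numerical sanity check of this example, a tiny Python sketch (the test point, step size, and helper names are arbitrary choices, not from the slides):

```python
import numpy as np

f = lambda x: np.sin(x ** 2)                # the nested function sin(x^2)
df = lambda x: np.cos(x ** 2) * 2 * x       # its derivative from the chain rule

x, h = 1.3, 1e-6
numeric = (f(x + h) - f(x - h)) / (2 * h)   # central finite difference
print(numeric, df(x))                       # the two values agree closely
```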

SLIDE 14

Back propagation | Chain rule

The Chain Rule

Full-Grown Adult Chain Rule:

[Figure: computation graph in which inputs u and v feed r(u, v) and s(u, v); these feed x(r, s), y(r, s), and z(r, s), which in turn feed f(x, y, z)]

SLIDE 15

Back propagation | Chain rule

The Chain Rule

Full-Grown Adult Chain Rule (for the computation graph above): the derivative of f with respect to x is ∂f/∂x; similarly, ∂f/∂y and ∂f/∂z.

SLIDE 16

Back propagation | Chain rule

The Chain Rule

What is the derivative of f with respect to r?

SLIDE 17

Back propagation | Chain rule

The Chain Rule

What is the derivative of f with respect to r?

∂f/∂r = (∂f/∂x)(∂x/∂r) + (∂f/∂y)(∂y/∂r) + (∂f/∂z)(∂z/∂r)

SLIDE 18

Back propagation | Chain rule

The Chain Rule

What is the derivative of f with respect to s?

SLIDE 19

Back propagation | Chain rule

The Chain Rule

What is the derivative of f with respect to s?

∂f/∂s = (∂f/∂x)(∂x/∂s) + (∂f/∂y)(∂y/∂s) + (∂f/∂z)(∂z/∂s)

SLIDE 20

Back propagation | Chain rule

The Chain Rule

What is the derivative of f with respect to s?

Example: Let f = xyz, x = r, y = rs, and z = s. Find ∂f/∂s.

∂f/∂s = (∂f/∂x)(∂x/∂s) + (∂f/∂y)(∂y/∂s) + (∂f/∂z)(∂z/∂s)

SLIDE 21

Back propagation | Chain rule

The Chain Rule

What is the derivative of f with respect to s?

Example: Let f = xyz, x = r, y = rs, and z = s. Find ∂f/∂s.

∂f/∂s = yz · 0 + xz · r + xy · 1

SLIDE 22

Back propagation | Chain rule

The Chain Rule

What is the derivative of f with respect to s?

Example: Let f = xyz, x = r, y = rs, and z = s. Find ∂f/∂s.

∂f/∂s = rs² · 0 + rs · r + r²s · 1

SLIDE 23

Back propagation | Chain rule

The Chain Rule

What is the derivative of f with respect to s?

Example: Let f = xyz, x = r, y = rs, and z = s. Find ∂f/∂s.

∂f/∂s = 2r²s

SLIDE 24

Back propagation | Chain rule

The Chain Rule

What is the derivative of f with respect to s?

Example: Let f = xyz, x = r, y = rs, and z = s. Find ∂f/∂s.

Checking directly: f(r, s) = r · rs · s = r²s² ⇒ ∂f/∂s = 2r²s
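
To double-check the multivariable example symbolically, a small SymPy sketch (using SymPy here is an illustrative choice, not part of the lecture):

```python
import sympy as sp

r, s = sp.symbols('r s')
x, y, z = r, r * s, s          # x = r, y = rs, z = s
f = x * y * z                  # f = xyz = r^2 s^2

# Chain rule by hand: df/ds = yz * dx/ds + xz * dy/ds + xy * dz/ds
chain = y * z * 0 + x * z * r + x * y * 1

print(sp.simplify(chain))      # 2*r**2*s
print(sp.diff(f, s))           # 2*r**2*s, the same answer computed directly
```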

SLIDE 25

Back propagation | Chain rule

The Chain Rule

What is the derivative of f with respect to u?

SLIDE 26

Back propagation | Chain rule

The Chain Rule

What is the derivative of f with respect to u?

∂f/∂u = (∂f/∂r)(∂r/∂u) + (∂f/∂s)(∂s/∂u)

SLIDE 27

Back propagation | Chain rule

The Chain Rule

What is the derivative of f with respect to u?

Crux: if you know the derivative of the objective w.r.t. an intermediate value in the chain, you can eliminate everything in between.

SLIDE 28

Back propagation | Chain rule

The Chain Rule

What is the derivative of f with respect to u?

Crux: if you know the derivative of the objective w.r.t. an intermediate value in the chain, you can eliminate everything in between. This is the cornerstone of the Back Propagation algorithm.

SLIDE 29

Back propagation | Back propagation

Back Propagation

[Figure: feed-forward network with inputs x1, x2, . . . , xd and layer parameters (W^1, b^1), (W^2, b^2), (W^3, b^3), (W^4, b^4)]

SLIDE 30

Back propagation | Back propagation

Back Propagation

For the derivation, we’ll consider a simplified network:

a^0 → (W^1) → z^1 | a^1 → (W^2) → z^2 | a^2 → L(y, a^2)

We want to use back propagation to compute the partial derivatives of L w.r.t. the weights and biases, ∂L/∂w^l_{ij} (and ∂L/∂b^l_j), for l = 1, 2.

SLIDE 31

Back propagation | Back propagation

Back Propagation

For the derivation, we’ll consider a simplified network:

a^0 → (W^1) → z^1 | a^1 → (W^2) → z^2 | a^2 → L(y, a^2)

We need to choose an intermediate term that lives on the nodes and that we can easily take derivatives with respect to. We could choose the a’s, but we’ll choose the z’s because the math is easier.

SLIDE 32

Back propagation | Back propagation

Back Propagation

For the derivation, we’ll consider a simplified network:

a^0 → (W^1) → z^1 | a^1 → (W^2) → z^2 | a^2 → L(y, a^2)

Define the derivative w.r.t. the z’s by δ:

δ^l_j = ∂L/∂z^l_j

Note that δ^l has the same size as z^l and a^l.

SLIDE 33

Back propagation | Back propagation

Back Propagation

For the derivation, we’ll consider a simplified network:

a^0 → (W^1) → z^1 | a^1 → (W^2) → z^2 | a^2 → L(y, a^2)

Let’s compute δ^L for the output layer L:

δ^L_j = ∂L/∂z^L_j = (∂L/∂a^L_j) (da^L_j/dz^L_j)

SLIDE 34

Back propagation | Back propagation

Back Propagation

δ^L_j = ∂L/∂z^L_j = (∂L/∂a^L_j) (da^L_j/dz^L_j)

We know that a^L_j = g(z^L_j), so da^L_j/dz^L_j = g′(z^L_j). Therefore

δ^L_j = (∂L/∂a^L_j) g′(z^L_j)

Note: the first term is the jth entry of the gradient of L w.r.t. the a^L’s, (∇_{a^L} L)_j.

SLIDE 35

Back propagation | Back propagation

Back Propagation

δ^L_j = (∇_{a^L} L)_j g′(z^L_j)

We can combine all of these into a vector operation:

δ^L = ∇_{a^L} L ⊙ g′(z^L)

where g′(z^L) is the derivative of the activation function applied elementwise to z^L, and the symbol ⊙ indicates element-wise multiplication of vectors.

SLIDE 36

Back propagation | Back propagation

Back Propagation

δ^L_j = (∇_{a^L} L)_j g′(z^L_j)

We can combine all of these into a vector operation:

δ^L = ∇_{a^L} L ⊙ g′(z^L)

where g′(z^L) is the derivative of the activation function applied elementwise to z^L, and the symbol ⊙ indicates element-wise multiplication of vectors.

Notice that computing δ^L requires knowing the activations. This means that before we can compute derivatives for SGD through back propagation, we first run forward propagation through the network.

SLIDE 37

Back propagation | Back propagation

Back Propagation

Example: Suppose we’re in a regression setting and choose a sigmoid activation function:

L = (1/2) Σ_j (y_j − a^L_j)²,  with g(z) = σ(z), the sigmoid.

Then

∂L/∂a^L_j = (a^L_j − y_j),  da^L_j/dz^L_j = σ′(z^L_j) = σ(z^L_j)(1 − σ(z^L_j))

So δ^L = (a^L − y) ⊙ σ(z^L) ⊙ (1 − σ(z^L))
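
A small NumPy sketch of δ^L for this squared-loss/sigmoid example (the toy values and array names are placeholders for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy output layer with 3 units (values chosen arbitrarily)
z_L = np.array([0.2, -1.0, 0.5])    # pre-activations z^L
a_L = sigmoid(z_L)                   # activations a^L = sigma(z^L)
y   = np.array([0.0, 1.0, 1.0])     # targets

# delta^L = (a^L - y) ⊙ sigma(z^L) ⊙ (1 - sigma(z^L))
delta_L = (a_L - y) * sigmoid(z_L) * (1 - sigmoid(z_L))
print(delta_L)
```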

SLIDE 38

Back propagation | Back propagation

Back Propagation

OK, great! Now we can (easily-ish) compute the δ’s for the output layer. But really we’re after the partials w.r.t. the weights and biases.

a^0 → (W^1) → z^1 | a^1 → (W^2) → z^2 | a^2 → L(y, a^2)

SLIDE 39

Back propagation | Back propagation

Back Propagation

OK, great! Now we can (easily-ish) compute the δ’s for the output layer. But really we’re after the partials w.r.t. the weights and biases.

a^0 → (W^1) → z^1 | a^1 → (W^2) → z^2 | a^2 → L(y, a^2)

Question: What do you notice?

SLIDE 40

Back propagation | Back propagation

Back Propagation

We want to find the derivative of L w.r.t. the weights and biases.

a^0 → (W^1) → z^1 | a^1 → (W^2) → z^2 | a^2 → L(y, a^2)

Every weight connected to a node in layer L depends on a single δ^L_j.

SLIDE 41

Back propagation | Back propagation

Back Propagation

a^0 → (W^1) → z^1 | a^1 → (W^2) → z^2 | a^2 → L(y, a^2)

So we have

∂L/∂w^L_{jk} = (∂L/∂z^L_j) (∂z^L_j/∂w^L_{jk}) = δ^L_j (∂z^L_j/∂w^L_{jk})

We need to compute ∂z^L_j/∂w^L_{jk}. Recall z^L = W^L a^{L−1} + b^L; taking the jth entry of this vector,

z^L_j = Σ_i w^L_{ji} a^{L−1}_i + b^L_j

SLIDE 42

Back propagation | Back propagation

Back Propagation

a^0 → (W^1) → z^1 | a^1 → (W^2) → z^2 | a^2 → L(y, a^2)

So we have

∂L/∂w^L_{jk} = (∂L/∂z^L_j) (∂z^L_j/∂w^L_{jk}) = δ^L_j (∂z^L_j/∂w^L_{jk})

Taking the derivative of z^L_j = Σ_i w^L_{ji} a^{L−1}_i + b^L_j w.r.t. w^L_{jk} gives

∂z^L_j/∂w^L_{jk} = a^{L−1}_k  ⇒  ∂L/∂w^L_{jk} = a^{L−1}_k δ^L_j

SLIDE 43

Back propagation | Back propagation

Back Propagation

a^0 → (W^1) → z^1 | a^1 → (W^2) → z^2 | a^2 → L(y, a^2)

So we have

∂L/∂w^L_{jk} = a^{L−1}_k δ^L_j

This is an easy expression for the derivative w.r.t. every weight leading into layer L.

SLIDE 44

Back propagation | Back propagation

Back Propagation

Let’s make the notation a little more practical.

W^2 = [ w^2_11  w^2_12  w^2_13 ]
      [ w^2_21  w^2_22  w^2_23 ]

SLIDE 45

Back propagation | Back propagation

Back Propagation

Let’s make the notation a little more practical.

W^2 = [ w^2_11  w^2_12  w^2_13 ]
      [ w^2_21  w^2_22  w^2_23 ]

∂L/∂W^2 = [ ∂L/∂w^2_11  ∂L/∂w^2_12  ∂L/∂w^2_13 ]
          [ ∂L/∂w^2_21  ∂L/∂w^2_22  ∂L/∂w^2_23 ]

        = [ δ^2_1 a^1_1  δ^2_1 a^1_2  δ^2_1 a^1_3 ]
          [ δ^2_2 a^1_1  δ^2_2 a^1_2  δ^2_2 a^1_3 ]

Now we can write this as an outer product of δ^2 and a^1:

∂L/∂W^2 = δ^2 (a^1)^T

(Exercise for yourself: ∂L/∂b^2.)
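
A short NumPy sketch of this outer-product form, reusing the 2×3 shape of the example (the numeric values are placeholders):

```python
import numpy as np

delta2 = np.array([0.1, -0.3])          # delta^2, one entry per unit in layer 2
a1     = np.array([0.5, 0.2, 0.9])      # activations a^1 from the previous layer

dW2 = np.outer(delta2, a1)              # dL/dW^2 = delta^2 (a^1)^T, shape (2, 3)
db2 = delta2                            # dL/db^2 = delta^2 (the exercise above)
print(dW2)
```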

SLIDE 46

Back propagation | Back propagation

Intermediate summary

For a given training example x, perform forward propagation to get z^l and a^l on each layer. Then, to get the partial derivatives for the output layer (W^2 here, or W^L in general):

  • 1. Compute δ^L = ∇_{a^L} L ⊙ g′(z^L)
  • 2. Compute ∂L/∂W^L = δ^L (a^{L−1})^T and ∂L/∂b^L = δ^L

OK, that wasn’t so bad! We found very simple expressions for the derivatives with respect to the weights in the last hidden layer!

SLIDE 47

Back propagation | Back propagation

Intermediate summary

For a given training example x, perform forward propagation to get z^l and a^l on each layer. Then, to get the partial derivatives for the output layer (W^2 here, or W^L in general):

  • 1. Compute δ^L = ∇_{a^L} L ⊙ g′(z^L)
  • 2. Compute ∂L/∂W^L = δ^L (a^{L−1})^T and ∂L/∂b^L = δ^L

OK, that wasn’t so bad! We found very simple expressions for the derivatives with respect to the weights in the last hidden layer!

Problem: How do we do the other layers?

SLIDE 48

Back propagation | Back propagation

Intermediate summary

For a given training example x, perform forward propagation to get z^l and a^l on each layer. Then, to get the partial derivatives for the output layer (W^2 here, or W^L in general):

  • 1. Compute δ^L = ∇_{a^L} L ⊙ g′(z^L)
  • 2. Compute ∂L/∂W^L = δ^L (a^{L−1})^T and ∂L/∂b^L = δ^L

OK, that wasn’t so bad! We found very simple expressions for the derivatives with respect to the weights in the last hidden layer!

Problem: How do we do the other layers? Since the formulas were so nice once we knew the adjacent δ^l, it sure would be nice if we could easily compute the δ^l’s on earlier layers.

SLIDE 49

Back propagation | Back propagation

Back Propagation

But the relationship between L and z^1 is really complicated because of multiple passes through the activation functions.

SLIDE 50

Back propagation | Back propagation

Back Propagation

But the relationship between L and z^1 is really complicated because of multiple passes through the activation functions. It is OK! Back propagation comes to the rescue!

SLIDE 51

Back propagation | Back propagation

Back Propagation

But the relationship between L and z^1 is really complicated because of multiple passes through the activation functions. It is OK! Back propagation comes to the rescue! Notice that δ^1 depends on δ^2.

a^0 → (W^1) → z^1 | a^1 → (W^2) → z^2 | a^2 → L(y, a^2)

SLIDE 52

Back propagation | Back propagation

Back Propagation

Notice that δ^1 depends on δ^2.

a^0 → (W^1) → z^1 | a^1 → (W^2) → z^2 | a^2 → L(y, a^2)

By the adult chain rule,

∂L/∂z^{l−1}_k = Σ_j (∂L/∂z^l_j) (∂z^l_j/∂z^{l−1}_k)

SLIDE 53

Back propagation | Back propagation

Back Propagation

Notice that δ^1 depends on δ^2.

a^0 → (W^1) → z^1 | a^1 → (W^2) → z^2 | a^2 → L(y, a^2)

By the adult chain rule,

δ^{l−1}_k = ∂L/∂z^{l−1}_k = Σ_j (∂L/∂z^l_j) (∂z^l_j/∂z^{l−1}_k) = Σ_j δ^l_j (∂z^l_j/∂z^{l−1}_k)

SLIDE 54

Back propagation | Back propagation

Back Propagation

Notice that δ^1 depends on δ^2.

a^0 → (W^1) → z^1 | a^1 → (W^2) → z^2 | a^2 → L(y, a^2)

By the adult chain rule, for example,

δ^1_2 = ∂L/∂z^1_2 = δ^2_1 (∂z^2_1/∂z^1_2) + δ^2_2 (∂z^2_2/∂z^1_2)

SLIDE 55

Back propagation | Back propagation

Back Propagation

δ^1_2 = ∂L/∂z^1_2 = δ^2_1 (∂z^2_1/∂z^1_2) + δ^2_2 (∂z^2_2/∂z^1_2)

SLIDE 56

Back propagation | Back propagation

Back Propagation

δ^1_2 = ∂L/∂z^1_2 = δ^2_1 (∂z^2_1/∂z^1_2) + δ^2_2 (∂z^2_2/∂z^1_2)

Recall that z^2 = W^2 a^1 + b^2, so it follows that

z^2_i = w^2_i1 a^1_1 + w^2_i2 a^1_2 + w^2_i3 a^1_3 + b^2_i

Since a^1_2 = g(z^1_2), taking the derivative gives ∂z^2_i/∂z^1_2 = w^2_i2 g′(z^1_2). Plugging in gives

δ^1_2 = ∂L/∂z^1_2 = δ^2_1 w^2_12 g′(z^1_2) + δ^2_2 w^2_22 g′(z^1_2)

SLIDE 57

Back propagation | Back propagation

Back Propagation

If we do this for each of the 3 δ^1_i’s, something nice happens (exercise: work out δ^1_1 and δ^1_3 for yourself):

δ^1_1 = δ^2_1 w^2_11 g′(z^1_1) + δ^2_2 w^2_21 g′(z^1_1)
δ^1_2 = δ^2_1 w^2_12 g′(z^1_2) + δ^2_2 w^2_22 g′(z^1_2)
δ^1_3 = δ^2_1 w^2_13 g′(z^1_3) + δ^2_2 w^2_23 g′(z^1_3)

SLIDE 58

Back propagation | Back propagation

Back Propagation

If we do this for each of the 3 δ^1_i’s, something nice happens (exercise: work out δ^1_1 and δ^1_3 for yourself):

δ^1_1 = δ^2_1 w^2_11 g′(z^1_1) + δ^2_2 w^2_21 g′(z^1_1)
δ^1_2 = δ^2_1 w^2_12 g′(z^1_2) + δ^2_2 w^2_22 g′(z^1_2)
δ^1_3 = δ^2_1 w^2_13 g′(z^1_3) + δ^2_2 w^2_23 g′(z^1_3)

Notice that each row of the system gets multiplied by g′(z^1_i), so let’s factor those out.

SLIDE 59

Back propagation | Back propagation

Back Propagation

If we do this for each of the 3 δ^1_i’s, something nice happens:

δ^1_1 = (δ^2_1 w^2_11 + δ^2_2 w^2_21) · g′(z^1_1)
δ^1_2 = (δ^2_1 w^2_12 + δ^2_2 w^2_22) · g′(z^1_2)
δ^1_3 = (δ^2_1 w^2_13 + δ^2_2 w^2_23) · g′(z^1_3)

SLIDE 60

Back propagation | Back propagation

Back Propagation

If we do this for each of the 3 δ^1_i’s, something nice happens:

δ^1_1 = (δ^2_1 w^2_11 + δ^2_2 w^2_21) · g′(z^1_1)
δ^1_2 = (δ^2_1 w^2_12 + δ^2_2 w^2_22) · g′(z^1_2)
δ^1_3 = (δ^2_1 w^2_13 + δ^2_2 w^2_23) · g′(z^1_3)

Remember

δ^2 = [ δ^2_1 ]          W^2 = [ w^2_11  w^2_12  w^2_13 ]
      [ δ^2_2 ]                [ w^2_21  w^2_22  w^2_23 ]

Do you see δ^2 and W^2 lurking anywhere in the above system?

SLIDE 61

Back propagation | Back propagation

Back Propagation

If we do this for each of the 3 δ^1_i’s, something nice happens:

δ^1_1 = (δ^2_1 w^2_11 + δ^2_2 w^2_21) · g′(z^1_1)
δ^1_2 = (δ^2_1 w^2_12 + δ^2_2 w^2_22) · g′(z^1_2)
δ^1_3 = (δ^2_1 w^2_13 + δ^2_2 w^2_23) · g′(z^1_3)

Does this help?

(W^2)^T = [ w^2_11  w^2_21 ]          δ^2 = [ δ^2_1 ]
          [ w^2_12  w^2_22 ]                [ δ^2_2 ]
          [ w^2_13  w^2_23 ]

SLIDE 62

Back propagation | Back propagation

Back Propagation

If we do this for each of the 3 δ^1_i’s, something nice happens:

δ^1_1 = (δ^2_1 w^2_11 + δ^2_2 w^2_21) · g′(z^1_1)
δ^1_2 = (δ^2_1 w^2_12 + δ^2_2 w^2_22) · g′(z^1_2)
δ^1_3 = (δ^2_1 w^2_13 + δ^2_2 w^2_23) · g′(z^1_3)

In vector form:

δ^1 = (W^2)^T δ^2 ⊙ g′(z^1)
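
A minimal NumPy sketch of this vector form, reusing the 2×3 shapes from the example (the sigmoid choice for g and the numeric values are assumptions for illustration):

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
g_prime = lambda z: sigmoid(z) * (1 - sigmoid(z))   # g'(z) for a sigmoid activation

W2     = np.array([[0.1, -0.2, 0.4],
                   [0.3,  0.5, -0.1]])              # W^2, shape (2, 3)
delta2 = np.array([0.1, -0.3])                      # delta^2, shape (2,)
z1     = np.array([0.2, -1.0, 0.5])                 # pre-activations z^1, shape (3,)

delta1 = (W2.T @ delta2) * g_prime(z1)              # delta^1 = (W^2)^T delta^2 ⊙ g'(z^1)
print(delta1)                                       # shape (3,), one entry per unit in layer 1
```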

SLIDE 63

Back propagation | Back propagation

Back Propagation

OK, great! We can easily compute δ^1 from δ^2. Then we can compute derivatives of L w.r.t. the weights W^1 and biases b^1 exactly the way we did for W^2 and b^2:

  • 1. Compute δ^1 = (W^2)^T δ^2 ⊙ g′(z^1)
  • 2. Compute ∂L/∂W^1 = δ^1 (a^0)^T and ∂L/∂b^1 = δ^1

SLIDE 64

Back propagation | Back propagation

Back Propagation

OK, great! We can easily compute δ^1 from δ^2. Then we can compute derivatives of L w.r.t. the weights W^1 and biases b^1 exactly the way we did for W^2 and b^2:

  • 1. Compute δ^1 = (W^2)^T δ^2 ⊙ g′(z^1)
  • 2. Compute ∂L/∂W^1 = δ^1 (a^0)^T and ∂L/∂b^1 = δ^1

We’ve worked this out for a simple network with one hidden layer. Nothing we’ve done assumed anything about the number of layers, so we can apply the same procedure recursively with any number of layers.

SLIDE 65

Back propagation | Full algorithm

Back Propagation

δ^L = ∇_{a^L} L ⊙ g′(z^L)                 # compute δ's on the output layer
for ℓ = L, . . . , 1:
    ∂L/∂W^ℓ = δ^ℓ (a^{ℓ−1})^T             # compute weight derivatives
    ∂L/∂b^ℓ = δ^ℓ                          # compute bias derivatives
    δ^{ℓ−1} = (W^ℓ)^T δ^ℓ ⊙ g′(z^{ℓ−1})   # back-prop δ's to the previous layer

(After this, we are ready to do an SGD update on the weights and biases.)
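
Putting the pieces together, here is a minimal NumPy sketch of one forward plus backward pass (the sigmoid activation and squared loss follow the earlier regression example; the function and variable names are my own choices, not part of the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1 - s)

def backprop(x, y, weights, biases):
    """One forward + backward pass; returns dL/dW^l and dL/db^l for every layer."""
    # Forward propagation: store a^0..a^L and z^1..z^L
    a, activations, zs = x, [x], []
    for W, b in zip(weights, biases):
        z = W @ a + b
        a = sigmoid(z)
        zs.append(z)
        activations.append(a)

    # delta^L for the squared loss with sigmoid output (the slide's regression example)
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])

    grads_W = [None] * len(weights)
    grads_b = [None] * len(biases)
    for l in range(len(weights) - 1, -1, -1):          # l = L-1, ..., 0 (0-indexed layers)
        grads_W[l] = np.outer(delta, activations[l])   # dL/dW^l = delta^l (a^{l-1})^T
        grads_b[l] = delta                             # dL/db^l = delta^l
        if l > 0:
            delta = (weights[l].T @ delta) * sigmoid_prime(zs[l - 1])  # back-prop delta
    return grads_W, grads_b
```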

SLIDE 66

Back propagation | Full algorithm

Training a Feed-Forward Neural Network

Given an initial guess for the weights and biases, loop over each training example in random order (see the sketch after this list):

  • 1. Forward Propagate to get activations on each layer
  • 2. Back Propagate to get derivatives
  • 3. Update weights and biases via Stochastic Gradient Descent
  • 4. Rinse and Repeat
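
A minimal training-loop sketch in the same NumPy style (it assumes the hypothetical backprop helper from the previous sketch; the layer sizes, learning rate, epoch count, and toy data are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny network: 4 inputs -> 3 hidden -> 1 output (sizes chosen only for illustration)
sizes = [4, 3, 1]
weights = [rng.normal(scale=0.1, size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases  = [np.zeros(m) for m in sizes[1:]]

# Toy training set S_train = {(x, y)}
S_train = [(rng.normal(size=4), rng.uniform(size=1)) for _ in range(50)]

eta, epochs = 0.1, 100
for _ in range(epochs):
    rng.shuffle(S_train)                                 # random order over training examples
    for x, y in S_train:
        # 1. forward propagate and 2. back propagate (backprop is defined in the sketch above)
        grads_W, grads_b = backprop(x, y, weights, biases)
        # 3. stochastic gradient descent update: W^l <- W^l - eta * dL/dW^l
        for l in range(len(weights)):
            weights[l] -= eta * grads_W[l]
            biases[l]  -= eta * grads_b[l]
```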

SLIDE 67

Back propagation | Full algorithm

Training a Feed-Forward Neural Network

Remaining Questions:

  • 1. Can I batch this?
  • 2. When do we stop?
  • 3. How do we initialize weights and biases?
