Computation Graphs

Philipp Koehn 29 September 2020


Neural Network Cartoon

  • A common way to illustrate a neural network

[Figure: a network with input layer x, hidden layer h, and output layer y]


Neural Network Math

  • Hidden layer

h = sigmoid(W1x + b1)

  • Final layer

y = sigmoid(W2h + b2)
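As a quick illustration, here is a minimal sketch of these two equations in PyTorch (the toolkit used later in these slides), with the weight values printed on the following slides; the slides round intermediate values:

import torch

W1 = torch.tensor([[3.7, 3.7], [2.9, 2.9]])
b1 = torch.tensor([-1.5, -4.6])
W2 = torch.tensor([4.5, -5.2])
b2 = torch.tensor([-2.0])

x = torch.tensor([1.0, 0.0])
h = torch.sigmoid(W1.mv(x) + b1)          # hidden layer
y = torch.sigmoid(torch.dot(W2, h) + b2)  # final layer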


Computation Graph

[Figure: computation graph: x → prod(W1) → sum(b1) → sigmoid → prod(W2) → sum(b2) → sigmoid]


Simple Neural Network

[Figure: a simple example network with hidden-layer weights 3.7, 3.7 and 2.9, 2.9, hidden biases −1.5 and −4.6, output weights 4.5 and −5.2, and output bias −2.0]


Computation Graph

[Figure: the computation graph annotated with the parameter values: W1 = (3.7, 3.7; 2.9, 2.9), b1 = (−1.5, −4.6), W2 = (4.5, −5.2), b2 = (−2.0)]


Processing Input

[Figure: the computation graph annotated with the parameter values above and the node values below]

Forward computation for input x = (1.0, 0.0):

prod: W1 x = (3.7, 2.9)
sum: W1 x + b1 = (2.2, −1.6)
sigmoid: h = (.900, .168)
prod: W2 h = 3.18
sum: W2 h + b2 = 1.18
sigmoid: y = .765


Error Function

  • For training, we need a measure of how well we are doing

⇒ error function, also known as objective function, loss, gain, or cost

  • For instance, the L2 norm

error = ½ (t − y)²
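For example, with the output y = .765 computed earlier and target t = 1, the error is ½ (1 − .765)² = .0276 (the backward-pass slide shows .0277, computed before rounding y).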


Gradient Descent

  • We view the error as a function of the trainable parameters

error(λ)

  • We want to optimize error(λ) by moving λ towards its optimum

[Figure: three plots of error(λ), marking the current λ and the optimal λ, with gradients of 1, 2, and 0.2 at the current point]

  • Why not just set it to its optimum?

– we are updating based on one training example; we do not want to overfit to it
– we are also changing all the other parameters, so the curve will look different
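As a minimal sketch of the update rule (the one-dimensional error function and its optimum are made up for illustration):

def error(lam):
    return (lam - 3.0) ** 2          # hypothetical error curve, optimum at 3

def gradient(lam):
    return 2 * (lam - 3.0)           # its derivative

mu = 0.1                             # learning rate
lam = 0.0                            # current lambda
for step in range(100):
    lam = lam - mu * gradient(lam)   # move against the gradient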


Calculus Refresher: Chain Rule

  • Formula for computing the derivative of a composition of two or more functions

– functions f and g
– the composition f ◦ g maps x to f(g(x))

  • Chain rule

(f ◦ g)′ = (f′ ◦ g) · g′

or

F′(x) = f′(g(x)) g′(x)

  • Leibniz's notation

dz/dx = dz/dy · dy/dx

if z = f(y) and y = g(x), then

dz/dx = dz/dy · dy/dx = f′(y) g′(x) = f′(g(x)) g′(x)
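As a quick worked example: for z = sin(x²), take f(y) = sin(y) and g(x) = x², so

dz/dx = f′(g(x)) g′(x) = cos(x²) · 2x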


Final Layer Update

  • Linear combination of weights

s = Σk wk hk

  • Activation function

y = sigmoid(s)

  • Error (L2 norm)

E = ½ (t − y)²

  • Derivative of error with regard to one weight wk

dE/dwk = dE/dy · dy/ds · ds/dwk
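Putting the three factors together (writing the error derivative as t − y, following the slides' sign convention):

dE/dwk = (t − y) · σ(s)(1 − σ(s)) · hk

since dE/dy = t − y, dy/ds = σ(s)(1 − σ(s)), and ds/dwk = hk.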


Error Computation in Computation Graph

[Figure: computation graph extended by the error computation: x → prod(W1) → sum(b1) → sigmoid → prod(W2) → sum(b2) → sigmoid → L2 (with target t)]


Error Propagation in Computation Graph

[Figure: chain of nodes A → B → ··· → E]

  • Compute derivative at node A:

dE/dA = dE/dB · dB/dA

  • Assume that we already computed dE/dB (backward pass through graph)
  • So now we only have to get the formula for dB/dA
  • For instance, B is a square node

– forward computation: B = A²
– backward computation: dB/dA = dA²/dA = 2A
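This square-node example can be checked directly with PyTorch autograd (the toolkit introduced later in these slides):

import torch

A = torch.tensor(3.0, requires_grad=True)
B = A ** 2        # forward computation: B = A^2
B.backward()      # backward computation
print(A.grad)     # tensor(6.) = 2A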


Derivatives for Each Node

[Figure: the computation graph, with the local derivative formula at each node]

dL2/dsigmoid: do/di = d/di ½(t − i)² = t − i

dsigmoid/dsum: do/di = d/di σ(i) = σ(i)(1 − σ(i))

dsum/dprod: do/di1 = d/di1 (i1 + i2) = 1, do/di2 = 1

dprod/dW: do/di1 = d/di1 (i1 · i2) = i2, do/di2 = i1
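A minimal Python sketch collecting these local derivatives (the names are illustrative, not from any toolkit; the L2 derivative keeps the slides' sign convention):

import math

def sigmoid(i):
    return 1 / (1 + math.exp(-i))

# local derivative of each node type with respect to its input(s)
local_derivative = {
    "L2":      lambda i1, i2: i2 - i1,                   # i2 = target t
    "sigmoid": lambda i: sigmoid(i) * (1 - sigmoid(i)),
    "sum":     lambda i1, i2: (1, 1),
    "prod":    lambda i1, i2: (i2, i1),
}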

Backward Pass: Derivative Computation

[Figure: the computation graph with local derivatives at each node (L2: i2 − i1; sigmoid: σ′(i); sum: 1, 1; prod: i2, i1), annotated with the forward values above and the backward values below]

Backward computation for the example (t = 1.0, y = .765):

error: E = ½ (t − y)² = .0277
derivative of L2: t − y = .235
derivative of output sigmoid: σ′ = .180; chained: .180 × .235 = .0424
gradient for b2 (and the prod node): .0424
gradient for W2: .0424 · h = (.0382, .00712)
gradient for h: .0424 · W2 = (.191, −.220)
chained through the hidden sigmoid: (.0171, −.0308)
gradients for b1 and W1 (input x = (1.0, 0.0)): (.0171, −.0308)
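To make the chain of numbers traceable, here is a small plain-Python sketch that reproduces the backward values up to rounding, starting from the rounded forward values on the slide:

t, y = 1.0, 0.765
h = (0.900, 0.168)
W2 = (4.5, -5.2)

d_error = t - y                                  # .235 (slides' sign convention)
d_sigmoid = y * (1 - y)                          # sigma'(i), about .180
d_sum = d_sigmoid * d_error                      # about .0424, gradient for b2
grad_W2 = tuple(d_sum * hk for hk in h)          # about (.0382, .00712)
grad_h = tuple(d_sum * wk for wk in W2)          # about (.191, -.220)
grad_b1 = tuple(g * hk * (1 - hk)
                for g, hk in zip(grad_h, h))     # about (.0171, -.0308)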


Gradients for Parameter Update

[Figure: the computation graph, highlighting the gradients used for the parameter update: gradient for W1 and b1: (.0171, −.0308); gradient for W2: (.0382, .00712); gradient for b2: .0424]


Parameter Update

[Figure: each parameter is updated by subtracting the learning rate µ times its gradient: W1 ← W1 − µ · (.0171, −.0308), b1 ← b1 − µ · (.0171, −.0308), W2 ← W2 − µ · (.0382, .00712), b2 ← b2 − µ · .0424]


toolkits


Explosion of Deep Learning Toolkits

  • University of Montreal: Theano (early, now defunct)
  • Google: TensorFlow
  • Facebook: Torch, PyTorch
  • Microsoft: CNTK
  • Amazon: MXNet
  • CMU: DyNet
  • AMU/Edinburgh/Microsoft: Marian
  • ... and many more


Toolkits

  • Machine learning architectures built around computation graphs are very powerful

– define a computation graph
– provide data and a training strategy (e.g., batching)
– the toolkit does the rest
– seamless support of GPUs (as sketched below)
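For instance, GPU support in PyTorch (introduced below) usually amounts to placing tensors on a device; a minimal sketch:

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
W = torch.tensor([[3., 4.], [2., 3.]], requires_grad=True, device=device)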


Example: PyTorch

  • Installation


pip install torch

  • Usage


import torch


Some Data Types

  • The PyTorch data type for parameter vectors, matrices, etc. is called torch.tensor


W = torch.tensor([[3,4],[2,3]], requires_grad=True, dtype=torch.float)
b = torch.tensor([-2,-4], requires_grad=True, dtype=torch.float)
W2 = torch.tensor([5,-5], requires_grad=True, dtype=torch.float)
b2 = torch.tensor([-2], requires_grad=True, dtype=torch.float)

  • Definition of variables includes

– specification of their basic data type (float)
– indication to compute gradients (requires_grad=True)

  • Input and output


x = torch.tensor([1,0], dtype=torch.float)
t = torch.tensor([1], dtype=torch.float)


Computation Graph

  • Computation graph


s = W.mv(x) + b
h = torch.nn.Sigmoid()(s)
z = torch.dot(W2, h) + b2
y = torch.nn.Sigmoid()(z)
error = 1/2 * (t - y) ** 2

  • Note

– PyTorch sigmoid function: torch.nn.Sigmoid()
– multiplication between matrix W and vector x is mv
– multiplication between two vectors W2 and h is torch.dot


Backward Computation

  • Here it is:


error.backward()

  • No need to derive gradients — all is done automatically
  • We can look up computed gradients


>>> W2.grad
tensor([-0.0360, -0.0059])

  • Note

– when you run this code multiple times, the gradients accumulate
– reset them with, e.g., W2.grad.data.zero_()
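A small self-contained demonstration of this accumulation behavior:

import torch

w = torch.tensor(2.0, requires_grad=True)
(w ** 2).backward()
print(w.grad)          # tensor(4.)
(w ** 2).backward()
print(w.grad)          # tensor(8.) -- the gradients were added up
w.grad.data.zero_()    # reset before the next update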


Training Data

  • Our training set consists of the four examples of the binary XOR operation

x  y  x ⊕ y
0  0    0
0  1    1
1  0    1
1  1    0

  • Placed into array


training_data = [
    [ torch.tensor([0.,0.]), torch.tensor([0.]) ],
    [ torch.tensor([1.,0.]), torch.tensor([1.]) ],
    [ torch.tensor([0.,1.]), torch.tensor([1.]) ],
    [ torch.tensor([1.,1.]), torch.tensor([0.]) ]
]


Training Loop: Forward


mu = 0.1
for epoch in range(1000):
    total_error = 0
    for item in training_data:
        x = item[0]
        t = item[1]
        # forward computation
        s = W.mv(x) + b
        h = torch.nn.Sigmoid()(s)
        z = torch.dot(W2, h) + b2
        y = torch.nn.Sigmoid()(z)
        error = 1/2 * (t - y) ** 2
        total_error = total_error + error


Training Loop: Backward and Updates


        # backward computation
        error.backward()

        # weight updates
        W.data = W - mu * W.grad.data
        b.data = b - mu * b.grad.data
        W2.data = W2 - mu * W2.grad.data
        b2.data = b2 - mu * b2.grad.data
        W.grad.data.zero_()
        b.grad.data.zero_()
        W2.grad.data.zero_()
        b2.grad.data.zero_()
    print("error: ", total_error/4)


Batch Training

  • We computed gradients for each training example and updated the model immediately
  • More common: process examples in batches, update after the batch is processed
  • Instead of

error.backward()

  • run back-propagation on the accumulated error

total_error.backward()


Training Data Batch


x = torch.tensor([ [0.,0.], [1.,0.], [0.,1.], [1.,1.] ])
t = torch.tensor([ 0., 1., 1., 0. ])

  • Change to computation graph (input now a matrix, output a vector)


s = x.mm(W) + b
h = torch.nn.Sigmoid()(s)
z = h.mv(W2) + b2
y = torch.nn.Sigmoid()(z)

  • Convert error vector into single number


error = 1/2 * (t - y) ** 2
mean_error = error.mean()
mean_error.backward()


Parameter Updates (Optimizer)

  • Our code has explicit parameter update computations


# weight updates
W.data = W - mu * W.grad.data
b.data = b - mu * b.grad.data
W2.data = W2 - mu * W2.grad.data
b2.data = b2 - mu * b2.grad.data

  • But fancier optimizers are typically used (Adam, etc.)
  • This requires a more complex implementation (a sketch follows below)
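As a sketch of what this looks like with a built-in optimizer (here Adam; the learning rate is illustrative), the manual updates above would be replaced by:

optimizer = torch.optim.Adam([W, b, W2, b2], lr=0.01)

# inside the training loop, instead of the manual updates and gradient resets:
optimizer.zero_grad()
error.backward()
optimizer.step()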


torch.nn.Module

  • The neural network model is defined as a class derived from torch.nn.Module


class ExampleNet(torch.nn.Module):
    def __init__(self):
        super(ExampleNet, self).__init__()
        self.layer1 = torch.nn.Linear(2,2)
        self.layer2 = torch.nn.Linear(2,1)
        self.layer1.weight = torch.nn.Parameter(torch.tensor([[3.,2.],[4.,3.]]))
        self.layer1.bias = torch.nn.Parameter(torch.tensor([-2.,-4.]))
        self.layer2.weight = torch.nn.Parameter(torch.tensor([[5.,-5.]]))
        self.layer2.bias = torch.nn.Parameter(torch.tensor([-2.]))

    def forward(self, x):
        s = self.layer1(x)
        h = torch.nn.Sigmoid()(s)
        z = self.layer2(h)
        y = torch.nn.Sigmoid()(z)
        return y


Optimizer Definition

  • Instantiation of neural network object


net = ExampleNet()

  • Optimizer definition


optimizer = torch.optim.SGD(net.parameters(), lr=0.1)


Training Loop


for iteration in range(1000):
    optimizer.zero_grad()
    out = net.forward( x )
    error = 1/2 * (t - out) ** 2
    mean_error = error.mean()
    print("error: ", mean_error.data)
    mean_error.backward()
    optimizer.step()


code available on web page for textbook http://www.statmt.org/nmt-book/
