Neural Networks, Computation Graphs
CMSC 470
Marine Carpuat
Binary Classification with a Multi-layer Perceptron
[Figure: an MLP over bag-of-words features for the example sentence "A site, located in Maizuru, Kyoto": φ“A” = 1, φ“site” = 1, φ“,” = 2, φ“located” = 1, φ“in” = 1, φ“Maizuru” = 1, φ“Kyoto” = 1, φ“priest” = 0, φ“black” = 0.]
Example: binary classification with a NN
[Figure: a binary classification problem that is not linearly separable in the input space: φ0(x1) = {-1, 1} (X), φ0(x2) = {1, 1} (O), φ0(x3) = {-1, -1} (O), φ0(x4) = {1, -1} (X). A hidden layer with weights {1, 1} (bias -1) for φ1[0] and {-1, -1} (bias -1) for φ1[1] maps the points to φ1(x1) = {-1, -1}, φ1(x2) = {1, -1}, φ1(x3) = {-1, 1}, φ1(x4) = {-1, -1}, where a single output node with weights {1, 1} and bias 1 computes φ2[0] = y and separates the two classes.]
Example: the Final Net
[Figure: the final net. Inputs φ0[0], φ0[1] plus a bias input of 1 feed two tanh hidden nodes (weights {1, 1} with bias -1, and {-1, -1} with bias -1); the hidden outputs φ1[0], φ1[1] plus a bias of 1 feed a tanh output node that computes φ2[0].]
Replace “sign” with smoother non-linear function (e.g. tanh, sigmoid)
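A numpy sketch of this final net, using the weights as I read them off the figure (an assumption); X corresponds to a negative output and O to a positive one:

    import numpy as np

    W1 = np.array([[1.0, 1.0],     # first hidden node
                   [-1.0, -1.0]])  # second hidden node
    b1 = np.array([-1.0, -1.0])
    W2 = np.array([[1.0, 1.0]])    # output node
    b2 = np.array([1.0])

    def predict(phi0):
        phi1 = np.tanh(W1 @ phi0 + b1)      # hidden layer
        return np.tanh(W2 @ phi1 + b2)[0]   # output phi2[0]

    # The four points from the example: the sign of the output matches the class.
    for phi0, label in [([-1, 1], "X"), ([1, 1], "O"), ([-1, -1], "O"), ([1, -1], "X")]:
        print(label, predict(np.array(phi0, dtype=float)))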
Multi-layer Perceptrons are a kind of “Neural Network” (NN)
[Figure: the same multi-layer perceptron as before, annotated with the terminology below.]
- Input (aka features)
- Output
- Nodes (aka neurons)
- Layers
- Hidden layers
- Activation function (non-linear)
Neural Networks as Computation Graphs
Example & figures by Philipp Koehn
Computation Graphs Make Prediction Easy: Forward Propagation
Neural Networks as Computation Graphs
- Decomposes computation into simple operations over matrices and vectors
- Forward propagation algorithm (sketched below)
- Produces the network output given an input
- By traversing the computation graph in topological order
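A sketch of what this looks like in code, using a hypothetical graph representation (not any particular library's API): each node names an operation and the nodes it reads from, and evaluating in topological order guarantees every input is already computed.

    import numpy as np

    # Parameters of the small net from the earlier slides.
    W1, b1 = np.array([[1., 1.], [-1., -1.]]), np.array([-1., -1.])
    W2, b2 = np.array([[1., 1.]]), np.array([1.])

    # Hypothetical node format: (name, operation, names of inputs).
    # The list is already in topological order.
    graph = [
        ("a1", lambda v: W1 @ v["x"] + b1, ["x"]),
        ("h",  lambda v: np.tanh(v["a1"]), ["a1"]),
        ("a2", lambda v: W2 @ v["h"] + b2, ["h"]),
        ("y",  lambda v: np.tanh(v["a2"]), ["a2"]),
    ]

    def forward(x):
        values = {"x": x}
        for name, op, _ in graph:      # traverse in topological order
            values[name] = op(values)  # all inputs are already in `values`
        return values["y"]

    print(forward(np.array([1., 1.])))  # positive output: class O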
Neural Networks for Multiclass Classification
Multiclass Classification
- The softmax function: the exact same function as in multiclass logistic regression
P(y \mid x) = \frac{\exp(\mathbf{w} \cdot \phi(x, y))}{\sum_{y'} \exp(\mathbf{w} \cdot \phi(x, y'))}

Numerator: score of the current class. Denominator: sum over all classes.
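A minimal numpy sketch of this function (the max-subtraction is a standard numerical-stability trick, not something the slide specifies):

    import numpy as np

    def softmax(scores):
        # Subtracting the max does not change the result but avoids overflow in exp.
        z = np.exp(scores - np.max(scores))
        return z / z.sum()  # normalize so the probabilities sum to 1

    print(softmax(np.array([2.0, 1.0, 0.1])))  # [0.659 0.242 0.099]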
Example: A feedforward Neural Network for 3-way Classification
[Figure: a feedforward network whose hidden layer uses the sigmoid function and whose output layer uses the softmax function (as in multiclass logistic regression). From Eisenstein, p. 66.]
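A numpy sketch of such a network; the dimensions and random weights are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    D, H, K = 4, 5, 3  # input dim, hidden width, 3 classes (assumed sizes)
    W1, b1 = rng.normal(size=(H, D)), np.zeros(H)
    W2, b2 = rng.normal(size=(K, H)), np.zeros(K)

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def softmax(s):
        z = np.exp(s - np.max(s))
        return z / z.sum()

    def predict_proba(x):
        h = sigmoid(W1 @ x + b1)     # hidden layer: sigmoid
        return softmax(W2 @ h + b2)  # output layer: softmax, as in multiclass logistic regression

    p = predict_proba(rng.normal(size=D))
    print(p, p.sum())  # three probabilities that sum to 1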
Designing Neural Networks: Activation functions
- The hidden layer can be viewed as a set of hidden features
- The output of the hidden layer indicates the extent to which each hidden feature is “activated” by a given input
- The activation function is a non-linear function that determines the range of hidden feature values (see the quick check below)
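A quick numerical check of the ranges that common activations allow (ReLU is added here as another common choice, not from the slide):

    import numpy as np

    a = np.linspace(-10, 10, 1001)  # a sweep of pre-activation values
    print(np.tanh(a).min(), np.tanh(a).max())                          # tanh keeps values in (-1, 1)
    print((1 / (1 + np.exp(-a))).min(), (1 / (1 + np.exp(-a))).max())  # sigmoid: (0, 1)
    print(np.maximum(0, a).min(), np.maximum(0, a).max())              # ReLU: [0, inf)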
Designing Neural Networks: Network structure
- 2 key decisions:
- Width (number of nodes per layer)
- Depth (number of hidden layers)
- More parameters mean the network can learn more complex functions of the input (counted in the sketch below)
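As a back-of-the-envelope sketch (fully connected layers assumed), the parameter count follows directly from width and depth:

    def n_params(input_dim, width, depth, n_classes):
        # Dimensions of successive fully connected layers.
        dims = [input_dim] + [width] * depth + [n_classes]
        # Each layer contributes a weight matrix (fan_in * fan_out) plus a bias vector (fan_out).
        return sum(d_in * d_out + d_out for d_in, d_out in zip(dims, dims[1:]))

    print(n_params(100, 64, 1, 3))  # 6659
    print(n_params(100, 64, 2, 3))  # 10819: each extra hidden layer adds 64*64 + 64 parameters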
Neural Networks so far
- Powerful non-linear models for classification
- Predictions are made as a sequence of simple operations
- matrix-vector operations
- non-linear activation functions
- Choices in network structure
- Width and depth
- Choice of activation function
- Feedforward networks (no loop)
- Next: how to train?
Training Neural Networks
How do we estimate the parameters of a neural net (aka “train” it)?
For training, we need:
- Data: (a large number of) examples paired with their correct class (x, y)
- A loss/error function: quantifies how bad our prediction y is compared to the truth t
- Let’s use the squared error: error(y, t) = ½ (t − y)²
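The SGD update on the next slide needs the derivative of this loss with respect to the prediction, a one-line chain-rule step:

    \mathrm{error}(y, t) = \tfrac{1}{2}(t - y)^2
    \qquad\Rightarrow\qquad
    \frac{\partial\,\mathrm{error}}{\partial y} = -(t - y) = y - t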
Stochastic Gradient Descent
- We view the error as a function of the trainable parameters, on a given dataset
- We want to find parameters that minimize the error
w = 0
for I iterations:
    for each labeled pair (x, y) in the data:
        w = w − μ ∂error(w, x, y) / ∂w

Start with some initial parameter values. Go through the training data, one example at a time. Take a step down the gradient.
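A minimal numpy sketch of this loop for a linear model under the squared error above; the learning rate and toy data are illustrative assumptions:

    import numpy as np

    def sgd(data, n_iterations=100, mu=0.1):
        w = np.zeros(2)               # start with some initial parameter values
        for _ in range(n_iterations):
            for x, t in data:         # one example at a time
                y = w @ x             # linear model prediction
                grad = (y - t) * x    # d/dw of 0.5 * (t - y)^2
                w = w - mu * grad     # step down the gradient
        return w

    data = [(np.array([1.0, 1.0]), 3.0), (np.array([1.0, -1.0]), -1.0)]
    print(sgd(data))  # converges toward w = [1, 2]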
Computation Graphs Make Training Easy: Computing Error
Computation Graphs Make Training Easy: Computing Gradients
Computation Graphs Make Training Easy: Given forward pass + derivatives for each node
Computation Graphs Make Training Easy: Updating Parameters
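A sketch of what these slides walk through, on a graph small enough to differentiate by hand: store intermediate values on the forward pass, then multiply local derivatives in reverse topological order (the chain rule):

    import numpy as np

    # Tiny graph: y = tanh(w * x + b), error = 0.5 * (t - y)^2
    x, t = 1.5, 1.0   # one training example
    w, b = 0.5, -0.2  # current parameter values

    # Forward pass: compute and keep every intermediate value.
    a = w * x + b
    y = np.tanh(a)
    error = 0.5 * (t - y) ** 2

    # Backward pass: local derivatives, in reverse topological order.
    d_y = -(t - y)            # d error / d y
    d_a = d_y * (1 - y ** 2)  # d error / d a, since tanh'(a) = 1 - tanh(a)^2
    d_w = d_a * x             # d error / d w
    d_b = d_a                 # d error / d b

    print(d_w, d_b)           # the gradients an SGD update would use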
Computation Graph: A Powerful Abstraction
- To build a system, we only need to:
- Define network structure
- Define loss
- Provide data
- (and set a few more hyperparameters to control training)
- Given network structure
- Prediction is done by forward pass through graph (forward propagation)
- Training is done by backward pass through graph (back propagation)
- Based on simple matrix-vector operations
- Forms the basis of neural network libraries
- TensorFlow, PyTorch, MXNet, etc. (sketched below)
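A minimal PyTorch sketch of that workflow; the architecture, data, and hyperparameters here are illustrative assumptions:

    import torch
    import torch.nn as nn

    # 1. Define network structure.
    net = nn.Sequential(nn.Linear(2, 8), nn.Tanh(), nn.Linear(8, 2))

    # 2. Define loss.
    loss_fn = nn.CrossEntropyLoss()

    # 3. Provide data (the XOR-style toy problem from the earlier slides).
    X = torch.tensor([[-1., 1.], [1., 1.], [-1., -1.], [1., -1.]])
    y = torch.tensor([0, 1, 1, 0])  # 0 = class X, 1 = class O

    # 4. Set a few more hyperparameters to control training.
    optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

    for _ in range(500):
        optimizer.zero_grad()
        loss = loss_fn(net(X), y)  # forward pass through the graph
        loss.backward()            # backward pass (back propagation)
        optimizer.step()           # gradient descent update

    print(net(X).argmax(dim=1))    # expect tensor([0, 1, 1, 0])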
Neural Networks
- Powerful non-linear models for classification
- Predictions are made as a sequence of simple operations
- matrix-vector operations
- non-linear activation functions
- Choices in network structure
- Width and depth
- Choice of activation function
- Feedforward networks (no loop)
- Training with the back-propagation algorithm
- Requires defining a loss/error function
- Gradient descent + chain rule
- Easy to implement on top of computation graphs