Neural Networks Part 2
Yingyu Liang (yliang@cs.wisc.edu)
Computer Sciences Department, University of Wisconsin, Madison
[Based on slides from Jerry Zhu, Mohit Gupta]
Limited power of a single neuron
• Perceptron: $a = g\left(\sum_d w_d x_d\right)$
• Activation function $g$: linear, step, sigmoid
• Decision boundary is linear even for a nonlinear $g$
• XOR problem: no single neuron can represent XOR (it is not linearly separable)
[Diagram: inputs $1, x_1, \dots, x_D$ weighted by $w_0, w_1, \dots, w_D$ feed one unit computing $a = g\left(\sum_d x_d w_d\right)$]
slide 3
Limited power of a single neuron
• XOR problem
• Wait! If one can represent AND, OR, and NOT, one can represent any logic circuit (including XOR) by connecting them
• Question: how? (see the sketch below)
slide 4
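One possible answer, as a minimal sketch: wire step-activation perceptrons for OR, AND, and NOT together so that XOR(x1, x2) = AND(OR(x1, x2), NOT(AND(x1, x2))). The specific weights below are hand-picked assumptions; any weights realizing the same truth tables would work.

```python
import numpy as np

def perceptron(w, b):
    """One unit with a step activation: returns 1 iff w . x + b > 0."""
    return lambda x: int(np.dot(w, x) + b > 0)

# Hand-picked weights (assumptions, not from the slides).
AND = perceptron(np.array([1.0, 1.0]), -1.5)   # fires only if both inputs are 1
OR  = perceptron(np.array([1.0, 1.0]), -0.5)   # fires if at least one input is 1
NOT = perceptron(np.array([-1.0]), 0.5)        # fires if the input is 0

def XOR(x1, x2):
    # XOR(x1, x2) = AND( OR(x1, x2), NOT(AND(x1, x2)) )
    o = OR(np.array([x1, x2]))
    n = NOT(np.array([AND(np.array([x1, x2]))]))
    return AND(np.array([o, n]))

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "->", XOR(x1, x2))   # prints 0, 1, 1, 0
```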
Multi-layer neural networks
• Standard way to connect Perceptrons
• Example: 1 hidden layer, 1 output layer
• Hidden units: $a_j^{(2)} = g\left(\sum_d w_{jd}^{(2)} x_d\right)$ for $j = 1, 2, 3$
• Output unit: $a = g\left(\sum_j w_j^{(3)} a_j^{(2)}\right)$
[Diagram: Layer 1 (input) $x_1, x_2$; Layer 2 (hidden) three units connected to the inputs by weights $w_{jd}^{(2)}$; Layer 3 (output) one unit connected to the hidden units by weights $w_j^{(3)}$]
slide 9
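A minimal forward pass for this 2-3-1 network, as a sketch; the sigmoid activation and the specific weight values are assumptions for illustration only.

```python
import numpy as np

def g(z):
    """Sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-z))

# Assumed weights, chosen arbitrarily for illustration.
W2 = np.array([[0.5, -0.3],      # w_{jd}^{(2)}: 3 hidden units x 2 inputs
               [0.8,  0.1],
               [-0.2, 0.7]])
w3 = np.array([0.4, -0.6, 0.9])  # w_j^{(3)}: output weights over 3 hidden units

x = np.array([1.0, 2.0])         # one input point (x_1, x_2)

a2 = g(W2 @ x)   # hidden activations a_j^{(2)} = g(sum_d w_{jd}^{(2)} x_d)
a = g(w3 @ a2)   # output a = g(sum_j w_j^{(3)} a_j^{(2)})
print(a2, a)
```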
Neural net for $K$-way classification
• Use $K$ output units
• Training: encode a label $y$ by an indicator vector
  ▪ class 1 = (1, 0, 0, …, 0), class 2 = (0, 1, 0, …, 0), etc.
• Test: choose the class corresponding to the largest output unit
[Diagram: the hidden activations feed $K$ output units $a_k = g\left(\sum_j w_{kj}^{(3)} a_j^{(2)}\right)$, $k = 1, \dots, K$, through weights $w_{kj}^{(3)}$]
slide 11
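A small sketch of both conventions; the class count and the output values below are made up for illustration.

```python
import numpy as np

K = 3  # number of classes (assumed)

def one_hot(label, K):
    """Encode a 0-indexed class label as an indicator vector."""
    y = np.zeros(K)
    y[label] = 1.0
    return y

print(one_hot(1, K))            # class 2 -> [0. 1. 0.]

# At test time: pick the class whose output unit is largest.
a = np.array([0.2, 0.7, 0.4])   # assumed network outputs a_1..a_K
predicted_class = np.argmax(a)  # -> 1, i.e. class 2
```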
The (unlimited) power of neural networks
• In theory we don't need too many layers:
  ▪ a 1-hidden-layer net with enough hidden units can represent any continuous function of the inputs with arbitrary accuracy
  ▪ a 2-hidden-layer net can even represent discontinuous functions
slide 14
Learning in neural networks
• Again we will minimize the error ($K$ outputs):
  $E = \sum_{x \in D} E_x$, where $E_x = \|y - a\|^2 = \sum_{k=1}^{K} (a_k - y_k)^2$
• $x$: one training point in the training set $D$
• $a_k$: the $k$-th output for the training point $x$
• $y_k$: the $k$-th element of the label indicator vector for $x$
[Diagram: the outputs $a_1, \dots, a_K$ are compared against the indicator vector $y = (1, 0, \dots, 0)$]
• Our variables are all the weights $w$ on all the edges
  ▪ Apparent difficulty: we don't know the "correct" outputs of the hidden units
  ▪ It turns out to be OK: we can still do gradient descent (see the sketch below); the trick we need is the chain rule
  ▪ The resulting algorithm is known as back-propagation
slide 17
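A minimal sketch of that gradient-descent loop over all the weights; the learning rate, step count, and the grad_E callback (which back-propagation will supply) are assumptions, not part of the slides.

```python
import numpy as np

def gradient_descent(w, grad_E, alpha=0.1, steps=1000):
    """Minimize E over all edge weights w (flattened into one vector).

    grad_E(w) returns dE/dw at the current weights; for a neural net,
    back-propagation computes it via the chain rule (next slides).
    """
    for _ in range(steps):
        w = w - alpha * grad_E(w)   # step against the gradient
    return w

# Tiny sanity check on a function with a known gradient: E(w) = ||w||^2.
w_opt = gradient_descent(np.array([3.0, -2.0]), grad_E=lambda w: 2 * w)
print(w_opt)   # approaches (0, 0)
```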
Gradient (on one data point)
[Diagram: a 4-layer network; the Layer (4) output unit computes $a_1 = g\left(z_1^{(4)}\right)$ from $z_1^{(4)} = w_{11}^{(4)} a_1^{(3)} + w_{12}^{(4)} a_2^{(3)}$, and the error is $E_x = \|y - a\|^2$]
• We want to compute $\frac{\partial E_x}{\partial w_{11}^{(4)}}$
• By the chain rule:
  $\frac{\partial E_x}{\partial w_{11}^{(4)}} = \frac{\partial E_x}{\partial a_1} \cdot \frac{\partial a_1}{\partial z_1^{(4)}} \cdot \frac{\partial z_1^{(4)}}{\partial w_{11}^{(4)}}$
• The three factors:
  $\frac{\partial E_x}{\partial a_1} = 2(a_1 - y_1)$, $\quad \frac{\partial a_1}{\partial z_1^{(4)}} = g'\left(z_1^{(4)}\right)$, $\quad \frac{\partial z_1^{(4)}}{\partial w_{11}^{(4)}} = a_1^{(3)}$
• Putting them together:
  $\frac{\partial E_x}{\partial w_{11}^{(4)}} = 2(a_1 - y_1)\, g'\left(z_1^{(4)}\right) a_1^{(3)}$
• For the sigmoid, $g'(z) = g(z)\left(1 - g(z)\right)$, so
  $\frac{\partial E_x}{\partial w_{11}^{(4)}} = 2(a_1 - y_1)\, g\left(z_1^{(4)}\right)\left(1 - g\left(z_1^{(4)}\right)\right) a_1^{(3)} = 2(a_1 - y_1)\, a_1 (1 - a_1)\, a_1^{(3)}$
• All of these quantities can be computed from the network activations
slide 27
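A minimal numeric check of the final formula, as a sketch; the activation is assumed sigmoid, and the layer-3 activations, weights, and label are made-up values for illustration.

```python
import numpy as np

def g(z):
    """Sigmoid; note g'(z) = g(z) * (1 - g(z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Assumed values, for illustration only.
a3 = np.array([0.6, 0.3])   # layer-3 activations a_1^{(3)}, a_2^{(3)}
w4 = np.array([0.5, -0.4])  # weights w_11^{(4)}, w_12^{(4)}
y1 = 1.0                    # target for output unit 1

def E_x(w):
    # Only output unit 1 depends on w_11^{(4)}, so we model its term of ||y - a||^2.
    z1 = w @ a3             # z_1^{(4)} = w_11 a_1^{(3)} + w_12 a_2^{(3)}
    a1 = g(z1)              # a_1 = g(z_1^{(4)})
    return (a1 - y1) ** 2

# Gradient from the derived formula: 2(a_1 - y_1) a_1 (1 - a_1) a_1^{(3)}
a1 = g(w4 @ a3)
grad_formula = 2 * (a1 - y1) * a1 * (1 - a1) * a3[0]

# Finite-difference check of dE_x / dw_11^{(4)}
eps = 1e-6
grad_numeric = (E_x(w4 + np.array([eps, 0.0])) -
                E_x(w4 - np.array([eps, 0.0]))) / (2 * eps)

print(grad_formula, grad_numeric)  # should agree to ~6 decimal places
```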