Neural Networks: What can a network represent
Deep Learning, Fall 2020
Recap: Neural networks have taken over AI
– Tasks that are made possible by NNs, aka deep learning
– Tasks that were once assumed to be purely in the human domain of expertise
– Functions that take an input and produce an output
– What’s in these functions?
[Figure: voice signal → N.Net → transcription; image → N.Net → text caption; game state → N.Net → next move]
Neural networks are built from computational models of neurons called perceptrons
– “Fires” if the weighted sum of inputs exceeds a threshold
– Electrical engineers will call this a threshold gate
[Figure: a perceptron over inputs x1 … xN]
A perceptron applies an activation function to the weighted combination of inputs (and threshold)
– We will hear more about activations later
Common activations applied to the weighted combination $z = \sum_i w_i x_i + b$:
– sigmoid: $\frac{1}{1 + \exp(-z)}$
– tanh: $\tanh(z)$
– softplus: $\log(1 + e^z)$
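A minimal sketch in Python of a single perceptron with these activations (helper names are illustrative, not from the slides):

```python
import numpy as np

# A single "soft" perceptron: z is the weighted combination of inputs
# (plus a bias/threshold term b); an activation f is applied to z.

def weighted_sum(x, w, b):
    return np.dot(w, x) + b

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softplus(z):
    return np.log1p(np.exp(z))  # log(1 + e^z)

x = np.array([1.0, 0.5, -0.2])   # inputs
w = np.array([0.4, -0.3, 0.9])   # weights
b = -0.1                         # bias (negative threshold)
z = weighted_sum(x, w, b)
print(sigmoid(z), np.tanh(z), softplus(z))
```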
– Perceptrons “feed” other perceptrons
– We will give the “formal” definition of a layer later
In a network with input source nodes and output sink nodes, “depth” is the length of the longest path from a source to a sink
– A “source” node in a directed graph is a node that has only outgoing edges
– A “sink” node is a node that has only incoming edges
– The input nodes are “sources”; the output nodes are “sinks”
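As a small illustration, depth can be computed as the longest source-to-sink path in the network’s directed acyclic graph (the graph below is a made-up example):

```python
from functools import lru_cache

edges = {            # adjacency list: node -> nodes it feeds
    "x": ["h1", "h2"],
    "h1": ["h3"],
    "h2": ["h3", "y"],
    "h3": ["y"],
    "y": [],
}

@lru_cache(maxsize=None)
def longest_path_from(node):
    outs = edges[node]
    if not outs:                      # a "sink": only incoming edges
        return 0
    return 1 + max(longest_path_from(n) for n in outs)

# "sources" are nodes with only outgoing edges
sources = set(edges) - {v for outs in edges.values() for v in outs}
depth = max(longest_path_from(s) for s in sources)
print(depth)  # depth of this example network: 3
```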
[Figure: a layered network; input: black, layer 1: red, layer 2: green, layer 3: yellow, layer 4: blue]
– Can have multiple outputs for a single input
– What kinds of input/output relationships can it model?
[Figure: a network of threshold-gate perceptrons computing a Boolean function of inputs X, Y, Z, A]
[Figure: perceptrons as Boolean gates: X AND Y (weights 1, 1; threshold 2), X OR Y (weights 1, 1; threshold 1), NOT X (weight -1; threshold 0)]
Values in the circles are thresholds; values on edges are weights
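A sketch of these gates as threshold perceptrons, with the weights and thresholds read off the figure above:

```python
# Boolean gates as threshold perceptrons.

def perceptron(inputs, weights, threshold):
    """Fires (1) iff the weighted sum of inputs meets the threshold."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)

def AND(x, y): return perceptron([x, y], [1, 1], 2)   # both must be 1
def OR(x, y):  return perceptron([x, y], [1, 1], 1)   # at least one 1
def NOT(x):    return perceptron([x], [-1], 0)        # fires iff x == 0

for x in (0, 1):
    for y in (0, 1):
        print(x, y, AND(x, y), OR(x, y), NOT(x))
```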
Generalized AND: weights of 1 on X1 … XL, -1 on XL+1 … XN, threshold L
– Will fire only if X1 … XL are all 1 and XL+1 … XN are all 0
Generalized OR: the same weights, threshold L-N+1
– Will fire only if any of X1 … XL are 1 or any of XL+1 … XN are 0
Generalized majority gates:
– With weights of 1 and threshold K: will fire only if at least K inputs are 1
– With weights of 1 on X1 … XL, -1 on XL+1 … XN and threshold L-N+K: will fire only if the total number of X1 … XL that are 1 and XL+1 … XN that are 0 is at least K
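A sketch of these generalized gates (a hedged reconstruction of the figures; the function names are mine):

```python
# Generalized threshold gates: +1 weights on X1..XL, -1 on XL+1..XN.

def threshold_gate(xs, L, threshold):
    s = sum(xs[:L]) - sum(xs[L:])   # weighted sum with +1/-1 weights
    return int(s >= threshold)

def generalized_and(xs, L):
    # fires only if X1..XL are all 1 and XL+1..XN are all 0
    return threshold_gate(xs, L, L)

def generalized_or(xs, L):
    # fires only if any of X1..XL is 1 or any of XL+1..XN is 0
    return threshold_gate(xs, L, L - len(xs) + 1)

def at_least_K(xs, L, K):
    # fires only if at least K of the conditions
    # (X1..XL being 1, XL+1..XN being 0) hold
    return threshold_gate(xs, L, L - len(xs) + K)

print(generalized_and([1, 1, 0, 0], L=2))  # 1
print(generalized_or([0, 0, 1, 1], L=2))   # 0
print(at_least_K([1, 0, 0, 1], L=2, K=2))  # 1: X1=1 and X3=0 hold
```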
[Figure: a single perceptron over X and Y with its weights and threshold marked “?”: can it compute XOR?]
[Figure: XOR with one hidden layer: hidden unit 1 computes X OR Y (weights 1, 1; threshold 1), hidden unit 2 computes X AND Y (weights 1, 1; threshold 2), and the output unit fires if h1 - h2 ≥ 1]
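A sketch of this XOR network, assuming the weights and thresholds read off the figure above:

```python
# XOR from threshold units: OR but not AND.

def fires(weighted_sum, threshold):
    return int(weighted_sum >= threshold)

def xor(x, y):
    h1 = fires(x + y, 1)        # X OR Y
    h2 = fires(x + y, 2)        # X AND Y
    return fires(h1 - h2, 1)    # fires iff OR holds but AND does not

for x in (0, 1):
    for y in (0, 1):
        print(x, y, xor(x, y))  # outputs 0, 1, 1, 0
```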
[Figure: an alternative XOR network with thresholds 0.5 and 1.5; thanks to Gerald Friedland]
MLPs can compute any Boolean function
– Since they can emulate individual gates
– Any function over any number of inputs and any number of outputs
[Truth table over inputs X1 … X5 and output Y]
The truth table shows all input combinations for which the output is 1
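These slides build toward the standard one-hidden-layer construction: one hidden “row detector” per input combination with output 1, OR-ed at the output. A sketch (the table rows below are made up for illustration):

```python
# One-hidden-layer network from a truth table (DNF construction).

rows_with_output_1 = [            # each row: values of X1..X5 where Y = 1
    (0, 0, 1, 1, 0),
    (1, 1, 0, 0, 1),
    (1, 0, 1, 1, 1),
]

def row_detector(xs, row):
    # Generalized AND: +1 weight where the row has 1, -1 where it has 0;
    # fires only on exactly this input pattern.
    L = sum(row)
    s = sum(+x if r == 1 else -x for x, r in zip(xs, row))
    return int(s >= L)

def f(xs):
    # The output unit ORs the hidden detectors.
    return int(sum(row_detector(xs, r) for r in rows_with_output_1) >= 1)

print(f((0, 0, 1, 1, 0)))  # 1: matches a row of the table
print(f((0, 0, 0, 0, 0)))  # 0
```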
But what is the largest number of perceptrons required in the single hidden layer for an N-input-variable function?
This is a “Karnaugh map”: it represents a truth table as a grid; filled boxes represent input combinations for which the output is 1, and blank boxes those for which it is 0
Adjacent boxes can be “grouped” to reduce the complexity of the DNF formula for the table
[Karnaugh map with rows and columns indexed 00, 01, 11, 10]
Basic DNF formula will require 7 terms
– Find groups
– Express as reduced DNF
– The Boolean network for this function needs only 3 hidden units
Reducing the DNF reduces the size of the one-hidden-layer network
The worst case is a “checkerboard” Karnaugh map, i.e. the XOR of all inputs: no boxes can be grouped, so the DNF for N variables has $2^{N-1}$ terms, and a one-hidden-layer network needs $2^{N-1}$ hidden perceptrons
[Karnaugh maps for four variables (WX × YZ) and six variables (WX × YZ × UV); red = 0, white = 1]
The XOR of W, X, Y, Z computed by a deep network: 9 perceptrons
The XOR of U, V, W, X, Y, Z computed by a deep network: 15 perceptrons
More generally, the XOR of N variables will require 3(N-1) perceptrons!!
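A sketch of the deep construction: chaining (or treeing) two-input XOR subnetworks, each built from 3 perceptrons, gives 3(N-1) perceptrons for the N-variable XOR:

```python
# XOR of N variables from 2-input XOR subnetworks (3 perceptrons each).

def fires(s, t):
    return int(s >= t)

def xor2(x, y):                 # 3 perceptrons per pairwise XOR
    return fires(fires(x + y, 1) - fires(x + y, 2), 1)

def xor_n(xs):
    acc = xs[0]
    for x in xs[1:]:            # N-1 pairwise XORs -> 3(N-1) perceptrons
        acc = xor2(acc, x)
    return acc

print(xor_n([1, 0, 1, 1, 0, 1]))  # parity of the inputs: 0
```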
[Figure: the N-variable XOR computed as a tree of pairwise XOR subnetworks]
Depth matters: a network with fewer than the minimum required number of layers requires exponentially many neurons
– Because the output can be shown to be the XOR of all the outputs of the (K-1)-th hidden layer
– I.e. reducing the number of layers below the minimum will result in an exponentially sized network to express the function fully
– A network with fewer than the minimum required number of neurons cannot model the function
The actual number of parameters in a network is the number of connections
– In this example there are 30
– Shallower implementations of the same function may require an exponential number of weights
[Figure: a deep XOR network over inputs X1 … X5]
[Figure: the “size” of a network, over inputs X1 … X5 with hidden neurons a, b, c, d, e, f]
– Furst, Saxe and Sipser, “Parity, Circuits, and the Polynomial-Time Hierarchy”, Mathematical Systems Theory 1984
– Alternately stated: parity cannot be computed by constant-depth, polynomial-size circuits of unbounded fan-in elements
The network size/depth tradeoff
– Shannon (1949): for every $n$, there is a Boolean function of $n$ variables that requires at least $2^n/n$ Boolean gates
– More correctly, for large $n$, almost all $n$-input Boolean functions need more than $2^n/n$ Boolean gates
– If any Boolean function over $n$ inputs could be computed using a circuit of size that is polynomial in $n$, P = NP!
An MLP is a universal Boolean machine, provided:
– It is sufficiently wide
– It is sufficiently deep
– Depth can be traded off for (sometimes) exponential growth of the width of the network
Optimal depth and width depend on the complexity of the Boolean function
– Complexity: the minimal number of terms in a DNF formula to represent it
An MLP with a single hidden layer is a universal Boolean machine
– But a single-layer network may require an exponentially large number of perceptrons
Deeper networks may require far fewer neurons than shallower networks to express the same function
– Could be exponentially smaller
Caveat: networks of perceptrons are threshold circuits (TC), not simply Boolean circuits
– Specifically composed of threshold gates
– E.g. “at least K inputs are 1” is a single TC gate, but an exponential-size AC (AND/OR/NOT) circuit
– For fixed depth, Boolean circuits ⊂ threshold circuits (strict subset)
– A depth-2 TC parity circuit can be composed with $\mathcal{O}(n^2)$ weights
– But a network of depth $\log(n)$ requires only $\mathcal{O}(n)$ weights
– More generally, for large $n$, for most Boolean functions, a threshold circuit that is polynomial in $n$ at optimal depth may become exponentially large at lesser depths
Other analyses consider arithmetic circuits
– Circuits which compute polynomials over any field
MLPs as classifiers: inputs are real-valued vectors, e.g. 784 dimensions (MNIST)
– A single perceptron is a linear classifier
[Figure: a perceptron's decision boundary $w_1 x_1 + w_2 x_2 = T$ in the $(x_1, x_2)$ plane; the XOR labeling of the points (0,0), (0,1), (1,0), (1,1) is not linearly separable]
Perceptrons can now be composed into “networks” to compute arbitrary classification “boundaries”
[Figure: an AND of five half-plane perceptrons y1 … y5 captures a pentagonal decision region]
Composing more complex decision regions:
– “OR” two polygons
– A third layer is required
[Figure: a network over x1, x2 with two AND units, each capturing a polygon, combined by an OR unit at the output]
In fact, arbitrary decision regions can be composed
– With only one hidden layer!
– How?
[Figure: a square region from 4 half-plane perceptrons y1 … y4; the summed response is 4 inside the square and at most 2 almost everywhere outside]
[Figure: a pentagon from 5 half-plane perceptrons y1 … y5; the summed response is 5 inside, 4 adjacent to the edges, and 3 or less further out]
[Figure: a hexagon from 6 half-plane perceptrons y1 … y6; the summed response is 6 inside, 5 adjacent to the edges, and smaller further out]
As the number of sides N increases, the polygon approaches a circle and the response outside flattens out
– Value of the sum at the output unit, as a function of distance from center, as N increases: N in the cylinder, N/2 outside
[Figure: the response profile over the input plane, with levels N and N/2]
– Very large number of neurons
– Sum is N inside the circle, N/2 almost everywhere outside
– The circle can be at any location
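A numerical sketch of this construction (the center, radius, and N below are illustrative): N line perceptrons tangent to a circle, each firing on its inner side, are summed; the sum is N inside the circle and roughly N/2 far outside.

```python
import numpy as np

N = 512
center, radius = np.array([0.0, 0.0]), 1.0
angles = np.linspace(0, 2 * np.pi, N, endpoint=False)
normals = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # unit normals

def line_sum(p):
    # unit j fires iff p is on the inner side of the tangent line:
    # normals[j] . p <= normals[j] . (center + radius * normals[j])
    biases = normals @ center + radius
    return int(np.sum(normals @ p <= biases))

print(line_sum(np.array([0.0, 0.0])))   # ~N inside the circle
print(line_sum(np.array([10.0, 0.0])))  # ~N/2 far outside
```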
– Very large number of neurons
– With a bias of -N/2, the sum is N/2 inside the circle and 0 almost everywhere outside
– The circle can be at any location
[Figure: the output unit computes $\bigl(\sum_{j=1}^{N} z_j\bigr) - \frac{N}{2} \geq 0\,?$ over the line-perceptron outputs $z_j$]
Summing two such circle subnetworks: the output is high inside either circle, and 0 almost everywhere outside
[Figure: two circle subnetworks; the output unit thresholds $\sum_{j=1}^{2N} z_j$ in the same way]
– More accurate approximation with a greater number of smaller circles
– Can achieve arbitrary precision
[Figure: K circle subnetworks tiling an arbitrary decision region in the $(x_1, x_2)$ plane; the output unit thresholds $\sum_{j=1}^{KN} z_j$]
Analogous depth analyses exist for arithmetic circuits
– Circuits which compute polynomials over any field
– The majority of functions are very high (possibly infinite) order polynomials, and so benefit from depth
– But the analysis only considers two-input units; generalized by Mhaskar et al. to all functions that can be expressed as a binary tree
– Depth/size analyses of arithmetic circuits are still a research problem
Sum-product networks: Olivier Delalleau and Yoshua Bengio, “Shallow vs. Deep Sum-Product Networks” (2011)
– For networks where layers alternately perform either sums or products, a deep network may be exponentially smaller than a shallow one expressing the same function
E.g. a network capturing one such decision boundary requires:
– 16 neurons in hidden layer 1
– 40 in hidden layer 2
– 57 total neurons, including the output neuron
A more complex pattern requires a larger shallow network:
– 64 neurons in layer 1
– 544 in layer 2
– But only 190 neurons with 2-gate XOR (a deeper composition)
The difference between deep and shallow nets increases with increasing pattern complexity and input dimension
Note that the shallow network here was quadratic in the number of lines
– Not exponential
– Even though the pattern is an XOR
– Why?
Because there are only two fully independent features
– The pattern is exponential in the dimension of the input (two)!
More generally, for K mutually intersecting hyperplanes in D dimensions, we will need on the order of $K^D$ perceptrons in a single hidden layer (one per region the hyperplanes carve out)
– Increasing input dimensions can increase the worst-case size of the shallower network exponentially, but not the XOR net
To summarize:
– Even a network with a single hidden layer is a universal Boolean machine
– Even a network with a single hidden layer is a universal classifier
– Deeper networks may require far fewer neurons than shallower networks to express the same function
– Could be exponentially smaller
– Deeper networks are more expressive
MLPs as universal approximators: a pair of threshold units can generate a “square pulse” over an input
– Output is 1 only if the input lies between T1 and T2
– T1 and T2 can be arbitrarily specified
[Figure: subtracting a unit that fires for x ≥ T2 from one that fires for x ≥ T1 yields a pulse of height 1 on (T1, T2); a weighted sum of such pulses approximates f(x)]
An MLP with one hidden layer can thus approximate any function f(x)
– To arbitrary precision
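A sketch of the pulse construction (the target function and pulse widths below are illustrative):

```python
import numpy as np

# Approximate f(x) by a sum of "square pulses", each built from a pair
# of threshold units (fires for T1 <= x < T2), scaled by f's value.

def pulse(x, t1, t2):
    return (x >= t1).astype(float) - (x >= t2).astype(float)

f = np.sin                                   # illustrative target function
edges = np.linspace(0, np.pi, 51)            # 50 pulses over [0, pi]

def approx_f(x):
    total = np.zeros_like(x)
    for t1, t2 in zip(edges[:-1], edges[1:]):
        total += f(0.5 * (t1 + t2)) * pulse(x, t1, t2)
    return total

x = np.linspace(0, np.pi, 1000)
print(np.max(np.abs(approx_f(x) - f(x))))    # shrinks as pulses narrow
```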
[Figure: the approximation $\sum_i h_i \cdot \mathrm{pulse}_i(x)$ of f(x), where $h_i$ is the height of pulse $i$; narrower pulses give a finer approximation]
This generalizes to functions of any number of dimensions!
– Even with only one hidden layer
– To arbitrary precision
– The MLP is a universal approximator!
For real-valued outputs the output unit may simply be linear
– I.e. does not have an additional “activation”
[Figure: an output neuron over x1 … xN with a sigmoid or tanh activation]
With a threshold or sigmoid (or any other) output activation, the network output can span the entire range of the output activation
– I.e. all values the activation function of the output neuron can produce
The MLP is a Universal Approximator for the entire class of functions (maps) it represents!
A one-hidden-layer MLP is a universal function approximator
– Can approximate any function to arbitrary precision
– But may require infinitely many neurons in the layer
Deeper networks may require far fewer neurons for the same approximation error
– Can be exponentially fewer than the 1-hidden-layer network
– The network is a generic map
A network can represent a given function only if it has sufficient capacity
– I.e. it is sufficiently broad and deep to represent the function
A network with 16 or more neurons in the first layer is capable of representing the figure to the right perfectly
A network with fewer than 16 neurons in the first layer cannot represent this pattern exactly
– With caveats…
A 2-layer network with 16 neurons in the first layer cannot represent the pattern with fewer than 40 neurons in the second layer
Why? We will revisit this idea shortly
The pattern of outputs within any colored region is identical, so subsequent layers do not obtain enough information to partition them
– This effect arises because we use the threshold activation: it gates information in the input from later layers
Continuous activation functions result in graded outputs at the layer
– The gradation provides information to subsequent layers, to capture information “missed” by the lower layer (i.e. it “passes” information to subsequent layers)
– Activations with more gradation (e.g. RELU) pass more information
The “capacity” of a network has several definitions:
– Information or storage capacity: how many patterns can it remember
– VC dimension
– From our perspective: the largest number of disconnected convex regions it can represent
A network cannot exactly model a function that requires a greater minimal number of convex hulls than the capacity of the network
– But can approximate it with error
– Koiran and Sontag (1998): for “linear” or threshold units, the VC dimension is proportional to the number of weights; for some other units it can be proportional to the square of the number of weights
– Bartlett, Harvey, Liaw, Mehrabian, “Nearly-tight VC-dimension bounds for piecewise linear neural networks” (2017): for every $W$ and $L$ (subject to mild conditions), there exists a RELU network with at most $L$ layers and $W$ weights whose VC dimension is at least $\frac{WL}{C} \log\frac{W}{L}$ for a constant $C$
– Friedland and Krell, “A Capacity Scaling Law for Artificial Neural Networks” (2017): bounds capacity in terms of the overall number of hidden neurons and the weights per neuron
A one-hidden-layer MLP is a universal function approximator
– But could be exponentially or even infinitely wide in its input size
Deeper networks may require far fewer neurons
– Deeper networks are more expressive
– More graded activation functions result in more expressive networks
A different kind of unit: the radial basis function (RBF) unit, whose response depends on the distance of the input from a “center”
– The “center” is the parameter specifying the unit
– The most common activation is the exponent, e.g. $\exp(-\beta \, \lVert \mathbf{x} - \boldsymbol{\mu} \rVert^2)$ for center $\boldsymbol{\mu}$
– But other similar activations may also be used
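A sketch of a single RBF unit (the Gaussian form and names are illustrative):

```python
import numpy as np

# A radial basis function unit: response depends on the distance of the
# input from the unit's "center", passed through a Gaussian activation.

def rbf_unit(x, center, bandwidth):
    d2 = np.sum((x - center) ** 2)
    return np.exp(-d2 / (2 * bandwidth ** 2))   # the "exponent" activation

center = np.array([0.0, 0.0])
print(rbf_unit(np.array([0.0, 0.0]), center, 1.0))  # 1.0 at the center
print(rbf_unit(np.array([3.0, 0.0]), center, 1.0))  # ~0 far from it
```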
An RBF network can capture a localized (e.g. circular) decision region with a single unit with appropriate choice of bandwidth (or activation function)
– As opposed to the many units required by the linear-perceptron construction
To summarize: a one-hidden-layer MLP is a universal function approximator
– But could be exponentially or even infinitely wide in its input size
– Deeper networks may require far fewer neurons
– Deeper networks are more expressive
These are all functions:
– E.g. a function that takes an image as input and outputs a text caption
– E.g. a function that takes speech input and outputs the labels of all phonemes in it
– Etc…