Multi-Layer Networks

Multi-Layer Networks, M. Soleymani, Deep Learning, Sharif University of Technology, Spring 2019. Most slides have been adapted from: Bhiksha Raj, 11-785, CMU 2019; Fei-Fei Li's lectures, cs231n, Stanford 2017; and some from Hinton, NN for


  1. Perceptron Algorithm • δ is the best-case margin and R is the length of the longest input vector; these are the quantities in the classic convergence bound of at most (R/δ)² updates.

  2. Adjusting weights • Gradient step for a training pair (x⁽ⁱ⁾, y⁽ⁱ⁾): w⁽ᵏ⁺¹⁾ = w⁽ᵏ⁾ − η ∇E_i(w⁽ᵏ⁾) – Perceptron: if sign(wᵀx⁽ⁱ⁾) ≠ y⁽ⁱ⁾ then Δw = y⁽ⁱ⁾ x⁽ⁱ⁾, else Δw = 0 – ADALINE: Δw = η (y⁽ⁱ⁾ − wᵀx⁽ⁱ⁾) x⁽ⁱ⁾, which minimizes E_i(w) = (y⁽ⁱ⁾ − wᵀx⁽ⁱ⁾)² • Known as the Widrow-Hoff, LMS, or delta rule
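A minimal NumPy sketch of these two update rules (function and variable names are illustrative, not from the slides; labels are assumed to be ±1):

```python
import numpy as np

def perceptron_update(w, x, y, lr=1.0):
    """Perceptron rule: update only when the example is misclassified."""
    if np.sign(w @ x) != y:          # label y is assumed to be -1 or +1
        w = w + lr * y * x           # move the weights toward the example
    return w

def adaline_update(w, x, y, lr=0.1):
    """ADALINE (Widrow-Hoff / LMS / delta) rule: always corrects,
    in proportion to the real-valued error y - w.x."""
    error = y - w @ x
    return w + lr * error * x
```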

  3. How to learn the weights: multi-class example

  4. How to learn the weights: multi-class example • If correct: no change • If wrong: lower the score of the wrong answer (subtract the input from the weight vector of the wrong class) and raise the score of the target (add the input to the weight vector of the target class)
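The same multi-class update, sketched in NumPy (the row-per-class weight matrix W is an assumed representation, not stated on the slides):

```python
import numpy as np

def multiclass_perceptron_update(W, x, target):
    """W holds one weight row (template) per class.
    If the highest-scoring class is wrong, subtract x from the wrong row
    and add x to the target row; if it is correct, leave W unchanged."""
    predicted = int(np.argmax(W @ x))
    if predicted != target:
        W[predicted] -= x   # lower the score of the wrong answer
        W[target] += x      # raise the score of the target class
    return W
```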


  10. Single-layer networks as template matching • The weights for each class act as a template (sometimes called a prototype) for that class – the winner is the class whose template is most similar to the input. • The ways in which handwritten digits vary are much too complicated to be captured by simple template matches of whole shapes. • To capture all the allowable variations of a digit we need to learn the features it is composed of.

  11. The history of perceptrons • They were popularised by Frank Rosenblatt in the early 1960s. – They appeared to have a very powerful learning algorithm. – Lots of grand claims were made for what they could learn to do. • In 1969, Minsky and Papert published a book called "Perceptrons" that analyzed what they could do and showed their limitations. – Many people thought these limitations applied to all neural network models.

  12. What binary threshold neurons cannot do • A binary threshold output unit cannot even tell whether two single-bit features are the same! • Geometric view: the positive and negative cases cannot be separated by a hyperplane.

  13. What binary threshold neurons cannot do • Positive cases (same): (1,1) → 1; (0,0) → 1 • Negative cases (different): (1,0) → 0; (0,1) → 0 • The four input-output pairs give four inequalities that cannot all be satisfied: w₁ + w₂ ≥ θ, 0 ≥ θ, w₁ < θ, w₂ < θ – adding the last two gives w₁ + w₂ < 2θ ≤ θ (since θ ≤ 0), contradicting the first.
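A brute-force check of the same impossibility, sketched in Python (the search grid is an illustrative assumption; the inequalities above are the exact argument):

```python
import itertools
import numpy as np

# Target: output 1 when the two bits are the same, 0 otherwise.
cases = {(1, 1): 1, (0, 0): 1, (1, 0): 0, (0, 1): 0}
grid = np.linspace(-2.0, 2.0, 41)

solutions = [
    (w1, w2, theta)
    for w1, w2, theta in itertools.product(grid, repeat=3)
    if all(int(w1 * a + w2 * b >= theta) == y for (a, b), y in cases.items())
]
print(solutions)   # [] -- no weights and threshold satisfy all four cases
```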

  14. Discriminating simple patterns under translation with wrap-around • Suppose we just use pixels as the features. • A binary decision unit cannot discriminate patterns that have the same number of on-pixels – if the patterns can translate with wrap-around!

  15. Sketch of a proof • For pattern A, use training cases in all possible translations. – Each pixel is activated by 4 different translations of pattern A, so the total input received by the decision unit over all these cases is four times the sum of all the weights. • For pattern B, the same holds: each pixel is activated by 4 different translations of pattern B, so the total input over all its cases is also four times the sum of all the weights. • But to discriminate correctly, every single case of pattern A must provide more input to the decision unit than every single case of pattern B. • This is impossible when the sums over cases are the same.

  16. Networks with hidden units • Networks without hidden units are very limited in the input-output mappings they can learn to model. – More layers of linear units do not help: the result is still linear. – Fixed output non-linearities are not enough. • We need multiple layers of adaptive, non-linear hidden units. But how can we train such nets?

  17. The multi-layer perceptron • A network of perceptrons – generally "layered"

  18. Feed-forward neural networks • Also called the Multi-Layer Perceptron (MLP)

  19. MLP with a single hidden layer • Two-layer MLP (the number of layers of adaptive weights is counted) • Hidden units: z_j = φ( Σ_{i=0..d} w[1]_{ji} x_i ); outputs: p_k(x) = ω( Σ_{j=0..M} w[2]_{kj} z_j ), so p_k(x) = ω( Σ_{j=0..M} w[2]_{kj} φ( Σ_{i=0..d} w[1]_{ji} x_i ) ) • Bias units: x_0 = 1 and z_0 = 1 • Indices: i = 0, …, d (inputs), j = 1, …, M (hidden units), k = 1, …, K (outputs)
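A NumPy sketch of this forward computation (the choices of φ = tanh, identity ω, and the layer sizes are illustrative assumptions, not from the slides):

```python
import numpy as np

def mlp_forward(x, W1, W2, phi=np.tanh, omega=lambda a: a):
    """Two-layer MLP: p(x) = omega(W2 @ phi(W1 @ [1; x])), with bias units prepended."""
    x = np.concatenate(([1.0], x))   # x_0 = 1 (input bias unit)
    z = phi(W1 @ x)                  # hidden activations z_1 .. z_M
    z = np.concatenate(([1.0], z))   # z_0 = 1 (hidden bias unit)
    return omega(W2 @ z)             # outputs p_1(x) .. p_K(x)

# Illustrative sizes: d = 3 inputs, M = 4 hidden units, K = 2 outputs.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 4))     # shape (M, d + 1)
W2 = rng.standard_normal((2, 5))     # shape (K, M + 1)
print(mlp_forward(np.array([0.5, -1.0, 2.0]), W1, W2))
```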

  20. Beyond linear models • Linear: f = W x • One hidden layer: f = W₂ φ(W₁ x)

  21. Beyond linear models • f = W x • f = W₂ φ(W₁ x) • f = W₃ φ(W₂ φ(W₁ x))

  22. Defining "depth" • What is a "deep" network?

  23. Deep structures • In any directed network of computational elements with input source nodes and output sink nodes, "depth" is the length of the longest path from a source to a sink • Left: depth = 2; right: depth = 3 • "Deep": depth > 2

  24. The multi-layer perceptron • Inputs are real or Boolean stimuli • Outputs are real or Boolean values – can have multiple outputs for a single input • What can this network compute? – What kinds of input/output relationships can it model?

  25. MLPs approximate functions • MLPs can compose Boolean functions • MLPs can compose real-valued functions • What are the limitations?

  26. Multi-layer perceptrons as universal Boolean functions

  27. The perceptron as a Boolean gate • A perceptron can model any simple binary Boolean gate (the figure gives weight and threshold settings realizing such gates over inputs X and Y)

  28. Perceptron as a Boolean gate • The universal AND gate: fires only if X₁ … X_L are all 1 and X_{L+1} … X_N are all 0 – weights +1 on the first L inputs, −1 on the rest, threshold L – AND over any number of inputs, any subset of which may be negated

  29. Perceptron as a Boolean gate • The universal OR gate: fires if any of X₁ … X_L is 1 or any of X_{L+1} … X_N is 0 – weights +1 on the first L inputs, −1 on the rest, threshold L − N + 1 – OR over any number of inputs, any subset of which may be negated

  30. Perceptron as a Boolean gate • Generalized majority gate: fires only if at least K inputs are 1 – all weights 1, threshold K – fire if at least K inputs are of the desired polarity

  31. Perceptron as a Boolean gate • Generalized majority gate with negations: fires only if the number of X₁ … X_L that are 1 plus the number of X_{L+1} … X_N that are 0 is at least K – weights +1 on the first L inputs, −1 on the rest, threshold L − N + K
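These gate constructions can be checked with a one-line threshold unit in Python (inputs are assumed to be 0/1; the weight and threshold settings follow the slides):

```python
import numpy as np

def threshold_unit(x, w, theta):
    """A perceptron gate: fires (returns 1) iff w . x >= theta."""
    return int(np.dot(w, x) >= theta)

x = np.array([1, 1, 0, 1])                        # four 0/1 inputs

print(threshold_unit(x, np.ones(4), 4))           # AND of all inputs -> 0
print(threshold_unit(x, np.ones(4), 1))           # OR of all inputs  -> 1
print(threshold_unit(x, np.ones(4), 3))           # "at least K=3 inputs are 1" -> 1

# AND with a negated input: fire iff x1 = x2 = 1 and x3 = 0
# (weights +1, +1, -1 and threshold L = 2, as on the slide).
print(threshold_unit(np.array([1, 1, 0]), np.array([1, 1, -1]), 2))   # -> 1
```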

  32. The perceptron is not enough • A single perceptron cannot compute an XOR of X and Y

  33. Multi-layer perceptron XOR • An XOR takes three perceptrons – a hidden layer of two units plus one output unit

  34. Multi-layer perceptron XOR • With 2 neurons – 5 weights and two thresholds (weights 1 and 1 into both units, −2 from the hidden unit to the output, thresholds 1.5 and 0.5)
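A quick check of this two-neuron construction, using the weights and thresholds read off the slide:

```python
def fires(total, theta):
    """Threshold activation: 1 iff the weighted input reaches the threshold."""
    return int(total >= theta)

def xor_mlp(x, y):
    h = fires(x + y, 1.5)               # hidden unit: fires only for (1, 1)
    return fires(x + y - 2 * h, 0.5)    # output: OR of x, y minus twice their AND

for x in (0, 1):
    for y in (0, 1):
        print(x, y, "->", xor_mlp(x, y))   # 0 0 -> 0, 0 1 -> 1, 1 0 -> 1, 1 1 -> 0
```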

  35. Multi-layer perceptron • MLPs can compute more complex Boolean functions • MLPs can compute any Boolean function – since they can emulate individual gates • MLPs are universal Boolean functions

  36. MLP as Boolean functions • MLPs are universal Boolean functions – any function over any number of inputs and any number of outputs • But how many "layers" will they need?

  37. How many layers for a Boolean MLP? • A Boolean function is just a truth table; the table lists every input combination for which the output is 1:
      X1 X2 X3 X4 X5 | Y
       0  0  1  1  0 | 1
       0  1  0  1  1 | 1
       0  1  1  0  0 | 1
       1  0  0  0  1 | 1
       1  0  1  1  1 | 1
       1  1  0  0  1 | 1

  38. How many layers for a Boolean MLP? • Expressed in disjunctive normal form (one AND term per row of the truth table with output 1): y = ¬X₁¬X₂X₃X₄¬X₅ + ¬X₁X₂¬X₃X₄X₅ + ¬X₁X₂X₃¬X₄¬X₅ + X₁¬X₂¬X₃¬X₄X₅ + X₁¬X₂X₃X₄X₅ + X₁X₂¬X₃¬X₄X₅

  39. How many layers for a Boolean MLP? • The DNF maps directly onto a network over the inputs X₁ … X₅: each term becomes one hidden AND unit, and the output unit ORs them together


  46. How many layers for a Boolean MLP? • Any truth table can be expressed in this manner! • A one-hidden-layer MLP is a universal Boolean function • But what is the largest number of perceptrons required in the single hidden layer for an N-input-variable function?
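A sketch of this construction in Python: one hidden AND unit per positive row of the truth table, followed by an OR output unit (a direct, deliberately unoptimized illustration):

```python
import itertools

def dnf_network(positive_rows, n):
    """One hidden AND unit per truth-table row with output 1, ORed at the output.
    Each hidden unit uses weight +1 where the row has a 1, -1 where it has a 0,
    and threshold = number of 1s in the row, so it fires only on that exact row."""
    def net(x):
        hidden = [
            int(sum((1 if r[i] else -1) * x[i] for i in range(n)) >= sum(r))
            for r in positive_rows
        ]
        return int(sum(hidden) >= 1)        # output unit: OR of the hidden units
    return net

# Example: 3-input parity; its positive rows are those with an odd number of 1s,
# which is the 2^(N-1) worst case discussed on the next slides.
rows = [r for r in itertools.product((0, 1), repeat=3) if sum(r) % 2 == 1]
parity = dnf_network(rows, 3)
for x in itertools.product((0, 1), repeat=3):
    assert parity(x) == sum(x) % 2
print("one-hidden-layer parity network verified")
```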

  47. Worst case • Which truth tables cannot be simplified to fewer terms? • Largest width needed for a one-hidden-layer Boolean network on N inputs – worst case: 2^(N−1) hidden units • Example: the parity function X ⊕ Y ⊕ Z ⊕ W, whose Karnaugh map is a checkerboard of alternating 0s and 1s, so no adjacent cells can be grouped

  48. Boolean functions • Input: N Boolean variables • How many neurons does a one-hidden-layer MLP require? • A more compact representation of a Boolean function is the Karnaugh map – it represents the truth table as a grid – grouping adjacent boxes reduces the complexity of the Disjunctive Normal Form (DNF) formula

  49. How many neurons in the hidden layer? • Grouping adjacent 1-cells of the Karnaugh map merges minterms, so the reduced DNF has fewer terms – and since the network needs one hidden unit per term, it needs fewer hidden units (the slide works through the groupings for a 4-variable map over W, X, Y, Z)

  50. Width of a deep MLP • (Figure: Karnaugh maps over the variables W, X, Y, Z and U, V, comparing possible groupings)

  51. Using a deep network: parity function on N inputs • Simple MLP with one hidden layer: 2^(N−1) hidden units, (N + 2)·2^(N−1) + 1 weights and biases

  52. Using a deep network: parity function on N inputs • Simple MLP with one hidden layer: 2^(N−1) hidden units, (N + 2)·2^(N−1) + 1 weights and biases • Computing f = X₁ ⊕ X₂ ⊕ ⋯ ⊕ X_N as a cascade of pairwise XORs needs only 3(N − 1) nodes and 9(N − 1) weights and biases • The actual number of parameters in a network is the number that really matters in software or hardware implementations

  53. A better architecture • Only requires 2·log₂(N) layers • f = X₁ ⊕ X₂ ⊕ X₃ ⊕ X₄ ⊕ X₅ ⊕ X₆ ⊕ X₇ ⊕ X₈, computed as a balanced tree of pairwise XORs: first the pairs (X₁, X₂), (X₃, X₄), (X₅, X₆), (X₇, X₈), then pairs of the results, and so on
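A sketch of this tree layout, reusing the two-neuron XOR from earlier (a minimal illustration, assuming the number of inputs is a power of two):

```python
def xor_gate(x, y):
    """The two-unit threshold XOR used earlier in the lecture."""
    h = int(x + y >= 1.5)
    return int(x + y - 2 * h >= 0.5)

def parity_tree(bits):
    """Pair the inputs, XOR each pair, and repeat on the results:
    O(log N) XOR stages, i.e. depth proportional to log N.
    Assumes the number of inputs is a power of two."""
    while len(bits) > 1:
        bits = [xor_gate(bits[i], bits[i + 1]) for i in range(0, len(bits), 2)]
    return bits[0]

print(parity_tree([1, 0, 1, 1, 0, 1, 0, 1]))   # -> 1 (five 1s, odd parity)
```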

  54. The challenge of depth • Using only K hidden layers will require O(2^(CN)) neurons in the K-th layer, where C = 2^(−K/2) – because the output can be shown to be the XOR of all the outputs of the (K−1)-th hidden layer – i.e., reducing the number of layers below the minimum results in an exponentially large network to express the function fully – a network with fewer than the minimum required number of neurons cannot model the function

  55. Caveat 1: Not all Boolean functions… • Not all Boolean circuits have such a clear depth-vs-size tradeoff • Shannon's theorem: for N > 2, there is a Boolean function of N variables that requires at least 2^N / N gates – more precisely, for large N, almost all N-input Boolean functions need more than 2^N / N gates, regardless of depth • Note: if all Boolean functions over N inputs could be computed using a circuit of size polynomial in N, then P = NP!

  56. Caveat 2 • We used a simple "Boolean circuit" analogy for the explanation • We actually have a threshold circuit (TC), not just a Boolean circuit (AC) – specifically, one composed of threshold gates • Threshold gates are more versatile than Boolean gates (they can compute the majority function) • E.g., "at least K inputs are 1" is a single TC gate, but requires an exponential-size AC • For fixed depth, Boolean circuits ⊂ threshold circuits (a strict subset) – a depth-2 TC parity circuit can be composed with O(n²) weights • but a network of depth log(n) requires only O(n) weights • Other formal analyses typically view neural networks as arithmetic circuits – circuits that compute polynomials over any field • So let's consider functions over the field of reals

  57. Summary: wide vs. deep networks • An MLP with a single hidden layer is a universal Boolean function • However, a single-hidden-layer network may need a number of hidden units exponential in the number of inputs • Deeper networks may require far fewer neurons than shallower networks to express the same function – possibly exponentially fewer • The optimal width and depth depend on the number of variables and the complexity of the Boolean function – complexity: the minimal number of terms in a DNF formula representing it

  58. MLPs as universal classifiers

  59. The MLP as a classifier • MLP as a function over real inputs (e.g., 784-dimensional MNIST digit images) • MLP as a function that finds a complex "decision boundary" over a space of reals

  60. A perceptron on reals • A perceptron operates on real-valued vectors x₁, …, x_N: it fires when Σᵢ wᵢxᵢ ≥ T • The decision boundary w₁x₁ + w₂x₂ = T is a line (a hyperplane in general) – this is a linear classifier

  61. Boolean functions with a real perceptron • Boolean perceptrons are also linear classifiers over the unit square with corners (0,0), (0,1), (1,0), (1,1) – purple regions are 1

  62. Composing complicated "decision" boundaries • Perceptrons can now be composed into "networks" to compute arbitrary classification "boundaries" • Build a network of units with a single output that fires if the input is in the coloured area

  63. Booleans over the reals • The network must fire if the input is in the coloured area of the (x₁, x₂) plane
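One way to sketch this composition in code: each hidden perceptron fires on one side of a line, and the output unit ANDs them, so the network fires exactly inside the convex region the lines bound (the pentagon below is an illustrative choice, not the exact figure on the slides):

```python
import numpy as np

def in_region(p, half_planes):
    """Hidden layer: one perceptron per half-plane, firing when w . p + b >= 0.
    Output unit: an AND gate that fires only if every hidden unit fires."""
    hidden = [int(np.dot(w, p) + b >= 0) for w, b in half_planes]
    return int(sum(hidden) >= len(half_planes))

# Five half-planes whose intersection is (roughly) a regular pentagon at the origin.
angles = 2 * np.pi * np.arange(5) / 5
pentagon = [(-np.array([np.cos(a), np.sin(a)]), 1.0) for a in angles]

print(in_region(np.array([0.0, 0.0]), pentagon))   # 1: inside the pentagon
print(in_region(np.array([2.0, 2.0]), pentagon))   # 0: outside
```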

