
Perceptrons

Sven Koenig, USC

Russell and Norvig, 3rd Edition, Sections 18.7.1-18.7.4

These slides are new and can contain mistakes and typos. Please report them to Sven (skoenig@usc.edu).

Perceptrons

  • We now study how to acquire knowledge with machine learning.


Inductive Learning for Classification

  • Labeled examples

        Feature_1  Feature_2  Class
        true       true       true
        true       false      false
        false      true       false

  • Unlabeled examples

        Feature_1  Feature_2  Class
        false      false      ?

Learn f(Feature_1, Feature_2) = Class from f(true, true) = true, f(true, false) = false, and f(false, true) = false. The function needs to be consistent with all labeled examples and should make the fewest mistakes on the unlabeled examples.
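As a small illustration (not from the slides; the Python encoding is mine), the hypothesis f = Feature_1 AND Feature_2 is consistent with all three labeled examples and classifies the unlabeled example as false:

```python
# Labeled examples as (Feature_1, Feature_2, Class) triples.
labeled = [
    (True, True, True),
    (True, False, False),
    (False, True, False),
]

def f(feature_1, feature_2):
    """One hypothesis that is consistent with the labeled examples."""
    return feature_1 and feature_2

# Consistency: f reproduces the class of every labeled example.
assert all(f(f1, f2) == c for f1, f2, c in labeled)

# Classify the unlabeled example.
print(f(False, False))  # False
```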

Example: Perceptron Learning

[Figure: a perceptron. Inputs x1, x2, … (each 0-1) are multiplied by weights w1, w2, …; the output is g(w1 x1 + w2 x2 + …), where the activation function g is a threshold function that outputs 1 exactly when its argument exceeds the threshold.]

  • Objective: Learn the weights for a given perceptron.
  • From now on: binary (feature and class) values only (0=false, 1=true).

[Figure: a neuron and the corresponding perceptron with its threshold.]


Example: Perceptron Learning

[Figure: three perceptrons over 0-1 inputs. AND: inputs x1 and x2, weights 1.0 and 1.0, threshold = 1.5. OR: inputs x1 and x2, weights 1.0 and 1.0, threshold = 0.5. NOT: input x1, weight -1.0, threshold = -0.5.]
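These constructions are easy to verify in code. A minimal sketch (the function name and the strict > comparison at the threshold are my choices, not the slides'):

```python
def perceptron(weights, threshold, inputs):
    """Output g(w1*x1 + w2*x2 + ...) with a step activation:
    1 exactly when the weighted sum exceeds the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0

# The three gates from the figure, checked on all 0-1 inputs.
for x1 in (0, 1):
    for x2 in (0, 1):
        assert perceptron([1.0, 1.0], 1.5, [x1, x2]) == (x1 and x2)  # AND
        assert perceptron([1.0, 1.0], 0.5, [x1, x2]) == (x1 or x2)   # OR
    assert perceptron([-1.0], -0.5, [x1]) == 1 - x1                  # NOT
```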

Example: Perceptron Learning

  • Labeled examples

        Feature_1  Feature_2  Class
        true       true       true
        true       false      false
        false      true       false

  • Unlabeled examples (note: classification is very fast)

        Feature_1  Feature_2  Class
        false      false      ?     (guess: false)

[Figure: the AND perceptron applied to this task: inputs Feature_1 and Feature_2 (each 0-1), weights 1.0 and 1.0, threshold = 1.5, output Class.]


Example: Perceptron Learning

  • Can perceptrons represent all Boolean functions?

f(Feature_1, …, Feature_n) ≡ some propositional sentence

Example: Perceptron Learning

  • Can perceptrons represent all Boolean functions?

f(Feature_1, …, Feature_n) ≡ some propositional sentence

  • Linear separability
  • We need to find a hyperplane in the n-dimensional feature space that separates the labeled examples with class true from the labeled examples with class false.
  • This hyperplane determines the weights and threshold of the perceptron, which can then be used to classify the unlabeled examples.


Example: Perceptron Learning

  • Can perceptrons represent all Boolean functions?

f(Feature_1, …, Feature_n) ≡ some propositional sentence

  • Linear separability
  • w1 x1 + w2 x2 = threshold
  • w1 x1 = threshold - w2 x2
  • x1 = (threshold / w1) – (w2 / w1) x2 = (1.5 / 1) – (1 / 1) x2 = 1.5 – x2

[Figure: the AND examples plotted in the x1-x2 plane. The separating line x1 = 1.5 - x2 crosses both axes at 1.5 and puts the single class-true example (1,1) on one side and the class-false examples on the other. Next to the plot: the AND perceptron with inputs x1 and x2 (each 0-1), weights 1.0 and 1.0, and threshold = 1.5.]

Example: Perceptron Learning

  • Can perceptrons represent all Boolean functions?

f(Feature_1, …, Feature_n) ≡ some propositional sentence

  • Linear separability
  • w1 x1 + w2 x2 = threshold
  • w1 x1 = threshold - w2 x2
  • x1 = (threshold / w1) – (w2 / w1) x2 = (1.5 / 1) – (1 / 1) x2 = 1.5 – x2

[Figure: the XOR examples plotted in the x1-x2 plane. The class-true examples (0,1) and (1,0) and the class-false examples (0,0) and (1,1) cannot be separated by any straight line, hence the "?": no weights and threshold work.]


Example: Perceptron Learning

  • Can perceptrons represent all Boolean functions? – no!

f(Feature_1, …, Feature_n) ≡ some propositional sentence

  • An XOR cannot be represented with a single perceptron!
  • This does not mean that single perceptrons should not be used. They will make some mistakes for some Boolean functions (that is, they might not be able to classify all labeled examples correctly), but they often work well, that is, make few mistakes on the labeled and unlabeled examples. Of course, you only want to use them if they do not make too many mistakes on the labeled examples.

Example: Perceptron Learning

  • The threshold can be expressed as a weight.
  • This way, a learning algorithm only needs to learn weights instead of the threshold and the weights. (The new threshold is always zero.)

[Figure: the AND perceptron (inputs x1 and x2, weights 1.0 and 1.0, threshold = 1.5) rewritten with an extra input that is always 1 and has weight -1.5; the rewritten perceptron has threshold = 0.]
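A small sketch of this rewriting (same perceptron function and assumptions as in the earlier snippet): an always-1 input with weight -threshold leaves every output unchanged while the new threshold is 0.

```python
def perceptron(weights, threshold, inputs):
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0

and_weights, and_threshold = [1.0, 1.0], 1.5

# Fold the threshold into a weight on an extra input that is always 1.
augmented_weights = and_weights + [-and_threshold]  # [1.0, 1.0, -1.5]

for x1 in (0, 1):
    for x2 in (0, 1):
        assert (perceptron(and_weights, and_threshold, [x1, x2])
                == perceptron(augmented_weights, 0.0, [x1, x2, 1]))
```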


Example: Perceptron Learning

        Feature f1  Feature f2  …  Class
        Example 1 (l=1):  f11  f12  …  c1
        Example 2 (l=2):  f21  f22  …  c2
        Example 3 (l=3):  f31  f32  …  c3
        …                 …    …    …  …

[Figure: a perceptron with inputs f1 = x1, f2 = x2, … (each 0-1) plus an input that is always 1, weights w1, w2, …, and threshold = 0. Here flj denotes the value of feature j for Example l.]

  • Learn the weights w1, w2, … so that the resulting perceptron is consistent with all labeled examples.

Gradient Descent

  • Finding a local minimum of a differentiable function f(x1, x2, …, xn) with gradient descent

[Figure: a plot of f(x1, x2, …, xn).]


Gradient Descent

  • Finding a local minimum of a differentiable function f(x1, x2, …, xn) with gradient descent
  • Initialize x1, x2, …, xn with random values
  • Repeat until local minimum reached
  • Update x1, x2, …, xn to correspond to taking a small step against the gradient of f(x1, x2, …, xn) at point (x1, x2, …, xn), where the gradient is (d f(x1, x2, …, xn) / d x1, d f(x1, x2, …, xn) / d x2, …, d f(x1, x2, …, xn) / d xn).

Gradient Descent

  • Finding a local minimum of a differentiable function f(x1, x2, …, xn) with gradient descent (for a small positive learning rate α)

  • Initialize x1, x2, …, xn with random values
  • Repeat until local minimum reached
  • For all xi in parallel
  • xi := xi – α d f(x1, x2, …, xn) / d xi
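As a concrete illustration (the toy function f(x) = (x - 3)^2, its derivative 2(x - 3), and all constants are my choices, not from the slides), the loop looks like this:

```python
import random

alpha = 0.1                      # small positive learning rate
x = random.uniform(-10.0, 10.0)  # initialize with a random value

for _ in range(1000):            # "repeat until local minimum reached"
    gradient = 2 * (x - 3)       # d f(x) / d x for f(x) = (x - 3)**2
    x -= alpha * gradient        # small step against the gradient

print(x)  # very close to 3, the minimum of f
```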


Gradient Descent

  • Finding a local minimum of a differentiable function f(x1, x2, …, xn) with an approximation of gradient descent (for a small positive learning rate α)

  • Initialize x1, x2, …, xn with random values
  • Repeat until local minimum reached
  • For all xi (one after the other, rather than in parallel)
  • xi := xi – α d f(x1, x2, …, xn) / d xi

Example: Perceptron Learning

  • We use the number of misclassified labeled examples as the error and learn the weights w1, w2, … with gradient descent (for a small positive learning rate α) to correspond to a (local) minimum of the error function, that is, so that the resulting perceptron is consistent with all labeled examples:
  • Minimize Error := 0.5 Σl |ol - cl| - no: not differentiable at x=0
  • Minimize Error := 0.5 Σl (ol - cl)^2
  • for ol = g(Σj wj flj), where g() is the activation function.
  • The 0.5 is for beauty reasons only (see the slide after the next one).

[Figure: plots of x^2 and |x| against x; |x| has a kink at x=0 while x^2 is smooth everywhere.]


Example: Perceptron Learning

  • Learn the weights w1, w2, … with gradient descent (for a small positive learning rate α) so that the resulting perceptron is consistent with all labeled examples:
  • Threshold function: no, not differentiable at x=0. Its slope (= 0) does not give gradient descent an indication whether to increase or decrease x to find a local minimum, and the output is either 0 or 1.
  • Sigmoid function: g(x) = 1 / (1 + e^(-x)) with g'(x) = e^(-x) / (1 + e^(-x))^2 = g(x) (1 - g(x)). Its slope (> 0) gives gradient descent an indication to decrease x to find a local minimum, and the output is any real value in the range (0,1).

[Figure: plots of the threshold function and the sigmoid function g(x); both rise from 0 to 1.]
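A quick numeric sanity check (not from the slides) of the identity g'(x) = g(x) (1 - g(x)), comparing it against a central finite difference:

```python
import math

def g(x):
    """Sigmoid function g(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

h = 1e-6
for x in (-2.0, 0.0, 0.5, 3.0):
    analytic = g(x) * (1.0 - g(x))             # g'(x) via the identity
    numeric = (g(x + h) - g(x - h)) / (2 * h)  # finite-difference g'(x)
    assert abs(analytic - numeric) < 1e-6
```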

Derivatives: Chain Rule

  • Quick reminder of the chain rule (since we need it on the next slide):

d f(g(x)) / d x = f’(g(x)) g’(x)

  • For example, d (2x)^2 / d x = 2(2x) · 2 = 8x by applying the chain rule, since
  • f(x) = x^2 and g(x) = 2x
  • f'(x) = 2x and g'(x) = 2
  • f(g(x)) = (2x)^2
  • For example, d (e^(2x))^2 / d x = 2(e^(2x)) · e^(2x) · 2 = 4 e^(4x) by applying the chain rule twice in a row
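Both examples can be sanity-checked with finite differences; a small sketch (test points and tolerances are my choices):

```python
import math

h = 1e-6
for x in (0.3, 1.0):
    # d (2x)^2 / d x should be 8x.
    numeric = ((2*(x + h))**2 - (2*(x - h))**2) / (2*h)
    assert abs(numeric - 8*x) < 1e-5

    # d (e^(2x))^2 / d x should be 4 e^(4x).
    numeric = (math.exp(2*(x + h))**2 - math.exp(2*(x - h))**2) / (2*h)
    assert abs(numeric - 4*math.exp(4*x)) < 1e-3
```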


Example: Perceptron Learning

  • Learn the weights w1, w2, … with gradient descent (for a small positive learning rate α) so that the resulting perceptron is consistent with all labeled examples:
  • Initialize all weights wj with random values
  • Repeat until local minimum reached (each pass through this loop is called one epoch)
  • Let ol be the output of the perceptron for Example l for the current weights
  • For all weights wj in parallel
  • wj := wj - α d Error(w1, w2, …) / d wj
  • Where
  • d Error(w1, w2, …) / d wj = d 0.5 Σl (ol - cl)^2 / d wj = d 0.5 Σl (g(Σj wj flj) - cl)^2 / d wj = Σl ((g(Σj wj flj) - cl) g'(Σj wj flj) flj) = Σl ((ol - cl) g'(Σj wj flj) flj)
  • The factor 2 from differentiating the square cancels the 0.5: this is the beauty reason!

Example: Perceptron Learning

  • Learn the weights w1, w2, … with an approximation of gradient descent (for a small positive learning rate α) so that the resulting perceptron is consistent with all labeled examples. Each labeled example is considered individually, one after the other:

  • Initialize all weights wj with random values
  • Repeat until local minimum reached
  • For all labeled examples l
  • Let ol be the output of the perceptron for Example l for the current weights
  • For all weights wj
  • wj := wj - α d Error(w1, w2, …) / d wj
  • Where
  • d Error(w1, w2, …) / d wj = (ol - cl) g'(Σj wj flj) flj

(One pass through all labeled examples is called one epoch.)
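Putting this together, here is a minimal sketch of the per-example update rule with a sigmoid activation, run on the AND data from the next slide (initial weights are random, so the learned weights vary from run to run; α = 0.01 and 100,000 epochs follow the slide's example):

```python
import math
import random

def g(x):
    """Sigmoid activation g(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

# AND over f1 and f2; f3 is always 1 (the threshold folded into w3).
examples = [  # ([f_l1, f_l2, f_l3], c_l)
    ([0, 0, 1], 0),
    ([0, 1, 1], 0),
    ([1, 0, 1], 0),
    ([1, 1, 1], 1),
]

alpha = 0.01
w = [random.uniform(-1.0, 1.0) for _ in range(3)]

for epoch in range(100_000):
    for features, c in examples:       # one pass = one epoch
        s = sum(wj * fj for wj, fj in zip(w, features))
        o = g(s)                       # output o_l for the current weights
        # w_j := w_j - alpha * (o_l - c_l) * g'(s) * f_lj,
        # with g'(s) = g(s) * (1 - g(s)) = o * (1 - o).
        for j in range(3):
            w[j] -= alpha * (o - c) * o * (1 - o) * features[j]

for features, c in examples:
    o = g(sum(wj * fj for wj, fj in zip(w, features)))
    print(features, c, round(o, 2))    # outputs approach 0, 0, 0, 1
```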


Example: Perceptron Learning

  • Example: Learn the weights w1, w2, … with an approximation of gradient descent (for α = 0.01) so that the resulting perceptron is consistent with an AND [see the handout for details]

        Feature f1  Feature f2  Feature f3  Class
        Example 1 (l=1):   0   0   1   0
        Example 2 (l=2):   0   1   1   0
        Example 3 (l=3):   1   0   1   0
        Example 4 (l=4):   1   1   1   1

             Epoch 0   Epoch 1   Epoch 2   Epoch 100   Epoch 100,000
        w1    1.10      1.10      1.10      1.12        5.47
        w2   -2.10     -2.10     -2.10     -1.97        5.47
        w3    0.30      0.30      0.30      0.16       -8.30
        o1    0.57      0.57      0.57      0.54        0.00
        o2    0.14      0.14      0.14      0.14        0.06
        o3    0.80      0.80      0.80      0.78        0.06
        o4    0.33      0.33      0.33      0.33        0.93

Since the output is now any real value in the range (0,1), we consider a value less than 0.5 to be 0 and a value greater than 0.5 to be 1. So we indeed learned an AND!

Example: Perceptron Learning

  • Example: Learn the weights w1, w2, … with an approximation of gradient descent (for α = 0.01) so that the resulting perceptron is consistent with an AND [see the handout for details]

  • Result:

[Figure: the learned perceptron in two equivalent forms. Left: inputs x1 and x2, weights 5.47 and 5.47, threshold = 8.3, output x1 AND x2. Right: the same with an extra always-1 input with weight -8.3 and threshold = 0.]
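These numbers can be checked directly: with the sigmoid activation, the learned weights reproduce the epoch-100,000 outputs from the previous slide (0.00, 0.06, 0.06, 0.93), and reading outputs above 0.5 as 1 gives exactly x1 AND x2:

```python
import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

# Learned weights 5.47 and 5.47; always-1 input with weight -8.30.
for x1 in (0, 1):
    for x2 in (0, 1):
        o = g(5.47 * x1 + 5.47 * x2 - 8.30)
        print(x1, x2, round(o, 2), int(o > 0.5))  # last column: x1 AND x2
```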
