

SLIDE 1

CS440/ECE448 Lecture 22: Linear Classifiers

Mark Hasegawa-Johnson, 3/2020. Including slides by Svetlana Lazebnik, 10/2016. License: CC-BY 4.0.

SLIDE 2

Linear Classifiers

  • Classifiers
  • Perceptron
  • Linear classifiers in general
  • Logistic regression
SLIDE 3

Classifiers example: dogs versus cats

Can you write a program that can tell which ones are dogs, and which ones are cats?

Dog images: composite by Djmirko et al. (YellowLabradorLooking_new.jpg, Golden_Retriever_Sammy.jpg, Cockerpoo.jpg, Longhaired_yorkie.jpg, Boxer_female_brown.jpg, Milù_050.JPG, Beagle1.jpg, Basset_Hound_600.jpg, Newfoundland_dog_Smoky.jpg), CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=10793219. Cat images: composite by Alvesgaspar, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=17960205.

SLIDE 4

Classifiers example: dogs versus cats

Can you write a program that can tell which ones are dogs, and which ones are cats? Idea #1: Cats are smaller than dogs. Our robot will pick up the animal and weigh it. If it weighs more than 20 pounds, call it a dog. Otherwise, call it a cat.

SLIDE 5

Classifiers example: dogs versus cats

Can you write a program that can tell which ones are dogs, and which ones are cats? Oops.

CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=55084303

SLIDE 6

Classifiers example: dogs versus cats

Can you write a program that can tell which ones are dogs, and which ones are cats? Idea #2: Dogs are tame, cats are wild. We’ll try the following experiment: 40 different people call the animal’s name. Count how many times the animal comes when called. If the animal comes when called more than 20 times out of 40, it’s a dog. If not, it’s a cat.

SLIDE 7

Classifiers example: dogs versus cats

Can you write a program that can tell which ones are dogs, and which ones are cats? Oops.

By Smok Bazyli - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=16864492

SLIDE 8

Classifiers example: dogs versus cats

Can you write a program that can tell which ones are dogs, and which ones are cats? Idea #3: Let y₁ = # times the animal comes when called (out of 40), and y₂ = weight of the animal, in pounds. If 0.5y₁ + 0.5y₂ > 20, call it a dog. Otherwise, call it a cat. This is called a “linear classifier” because 0.5y₁ + 0.5y₂ = 20 is the equation for a line.
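To make Idea #3 concrete, here is a minimal Python sketch of the rule (the function name and the two example animals are made up for illustration):

```python
def classify(y1, y2):
    """y1: # of times the animal comes when called (out of 40).
    y2: weight of the animal, in pounds."""
    return "dog" if 0.5 * y1 + 0.5 * y2 > 20 else "cat"

print(classify(y1=35, y2=30))  # obedient 30-pound animal -> "dog"
print(classify(y1=2, y2=8))    # aloof 8-pound animal -> "cat"
```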

SLIDE 9

Linear Classifiers

  • Classifiers
  • Perceptron
  • Linear classifiers in general
  • Logistic regression
SLIDE 10

The Giant Squid Axon

  • 1909: Williams discovers that the giant squid has a giant neuron (axon 1mm thick).
  • 1939: Young finds a giant synapse (fig. shown: Llinás, 1999, via Wikipedia). Hodgkin & Huxley put in voltage clamps.
  • 1952: Hodgkin & Huxley publish an electrical current model for the generation of binary action potentials from real-valued inputs.

SLIDE 11

Perceptron

  • 1959: Rosenblatt is granted a patent for the “perceptron,” an electrical circuit model of a neuron.

SLIDE 12

Perceptron

[Figure: perceptron diagram — inputs y₁, y₂, y₃, …, y_D feed through weights w₁, w₂, w₃, …, w_D to produce the output.]

Output: sgn(w·y⃗ + b). We can incorporate the bias as a component of the weight vector by always including a feature with value set to 1.

Perceptron model: action potential = signum(affine function of the features):

z∗ = sgn(w₁y₁ + w₂y₂ + … + w_Dy_D + b) = sgn(wᵀy⃗),

where w = [w₁, …, w_D, b]ᵀ and y⃗ = [y₁, …, y_D, 1]ᵀ.
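In code, the decision rule is a single dot product over the augmented feature vector. A minimal sketch, assuming NumPy (the function name is hypothetical):

```python
import numpy as np

def perceptron_predict(w, y):
    """z* = sgn(w^T y), with y = [y_1, ..., y_D, 1] and w = [w_1, ..., w_D, b]."""
    return 1 if np.dot(w, y) > 0 else -1

w = np.array([0.5, 0.5, -20.0])  # the weights from Idea #3
y = np.array([35.0, 30.0, 1.0])  # [y1, y2, 1]
print(perceptron_predict(w, y))  # 1, i.e. "dog"
```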

SLIDE 13

Perceptron

Rosenblatt’s big innovation: the perceptron learns from examples.

  • Initialize weights randomly.
  • Cycle through training examples in multiple passes (epochs).
  • For each training example:
    • If classified correctly, do nothing.
    • If classified incorrectly, update weights.

By Elizabeth Goodspeed - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=40188333

SLIDE 14

Perceptron

For each training instance y⃗ with ground truth label z ∈ {−1, 1}:

  • Classify with current weights: z∗ = sgn(wᵀy⃗).
  • Update weights:
    • If z = z∗, then do nothing.
    • If z ≠ z∗, then set w = w + ηz y⃗ (a code sketch of the full loop follows below).
  • η (eta) is a “learning rate.” More about that later.
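Putting slides 13 and 14 together, here is a minimal sketch of the whole training loop, assuming NumPy (the function name, epoch count, and random initialization scheme are illustrative choices, not prescribed by the slides):

```python
import numpy as np

def train_perceptron(Y, z, eta=1.0, epochs=10):
    """Y: n-by-(D+1) matrix of augmented feature vectors [y_1, ..., y_D, 1].
    z: n ground-truth labels, each in {-1, +1}."""
    rng = np.random.default_rng(0)
    w = rng.normal(size=Y.shape[1])            # initialize weights randomly
    for _ in range(epochs):                    # multiple passes (epochs)
        for y_i, z_i in zip(Y, z):
            z_star = 1 if w @ y_i > 0 else -1  # classify with current weights
            if z_star != z_i:                  # misclassified:
                w = w + eta * z_i * y_i        #   w = w + eta * z * y
    return w
```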
SLIDE 15

Perceptron training example: dogs vs. cats

  • Let’s start with the rule “if it comes when called by at least 20 different people (out of 40), it’s a dog.”
  • So if y₁ = # times it comes when called, then the rule is: if y₁ − 20 > 0, call it a dog. In other words, z∗ = sgn(wᵀy⃗), where w = [1, 0, −20] and y⃗ = [y₁, y₂, 1].

[Figure: the (y₁, y₂) plane split by the boundary for w = [1, 0, −20]; sgn(wᵀy⃗) = 1 on one side, sgn(wᵀy⃗) = −1 on the other.]

SLIDE 16

Perceptron training example: dogs vs. cats

  • The Presa Canario gets misclassified as a cat (z = 1, but z∗ = −1) because it only obeys its trainer (y₁ = 1), and nobody else. But we notice that the Presa Canario, though it rarely comes when called, is very large (y₂ = 100 pounds), so we have y⃗ = [y₁, y₂, 1] = [1, 100, 1].

[Figure: the boundary for w = [1, 0, −20] in the (y₁, y₂) plane, with the misclassified Presa Canario.]

SLIDE 17

Perceptron training example: dogs vs. cats

  • The Presa Canario gets misclassified as a cat (z = 1, but z∗ = −1) because it only obeys its trainer (y₁ = 1), and nobody else. But we notice that the Presa Canario, though it rarely comes when called, is very large (y₂ = 100 pounds), so we have y⃗ = [y₁, y₂, 1] = [1, 100, 1].
  • So we update: w = w + z y⃗ = [1, 0, −20] + [1, 100, 1] = [2, 100, −19].

[Figure: the updated boundary for w = [2, 100, −19] in the (y₁, y₂) plane.]

SLIDE 18

Perceptron training example: dogs vs. cats

  • The Maltese, though it’s small (y₂ = 10 pounds), is very tame (y₁ = 40): y⃗ = [y₁, y₂, 1] = [40, 10, 1].
  • But it’s correctly classified! z∗ = sgn(wᵀy⃗) = sgn(2×40 + 100×10 − 19) = +1, which is equal to z = 1.
  • So the w vector is unchanged.

[Figure: the boundary for w = [2, 100, −19] in the (y₁, y₂) plane, with the correctly classified Maltese.]

SLIDE 19

Perceptron training example: dogs vs. cats

  • The Maine Coon cat is big (y₂ = 20 pounds: y⃗ = [0, 20, 1]), so it gets misclassified as a dog (true label is z = −1 = “cat,” but the classifier thinks z∗ = 1 = “dog”).

[Figure: the boundary for w = [2, 100, −19] in the (y₁, y₂) plane, with the misclassified Maine Coon.]

SLIDE 20

Perceptron training example: dogs vs. cats

  • The Maine Coon cat is big (y₂ = 20 pounds: y⃗ = [0, 20, 1]), so it gets misclassified as a dog (true label is z = −1 = “cat,” but the classifier thinks z∗ = 1 = “dog”).
  • So we update: w = w + z y⃗ = [2, 100, −19] + (−1)×[0, 20, 1] = [2, 80, −20]. (These updates are checked in the code sketch below.)

[Figure: the updated boundary for w = [2, 80, −20] in the (y₁, y₂) plane.]
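The three steps above are easy to check numerically. A short sketch reproducing slides 15–20 with η = 1 (the helper function is hypothetical; the feature vectors are the ones from the slides):

```python
import numpy as np

def step(w, y, z):
    """One perceptron update with eta = 1."""
    z_star = 1 if w @ y > 0 else -1
    return w if z_star == z else w + z * y

w = np.array([1.0, 0.0, -20.0])              # initial rule: y1 - 20 > 0
w = step(w, np.array([1.0, 100.0, 1.0]), 1)  # Presa Canario: misclassified
print(w)                                     # [  2. 100. -19.]
w = step(w, np.array([40.0, 10.0, 1.0]), 1)  # Maltese: correct, no change
print(w)                                     # [  2. 100. -19.]
w = step(w, np.array([0.0, 20.0, 1.0]), -1)  # Maine Coon: misclassified
print(w)                                     # [  2.  80. -20.]
```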

SLIDE 21

Perceptron: Proof of Convergence

  • Definition: linearly separable. A dataset is linearly separable if and only if there exists a vector, w, such that the ground truth label of each token is given by z = sgn(wᵀy⃗).
  • Theorem (proved in the next few slides): If the data are linearly separable, then the perceptron learning algorithm converges to a correct solution, even with a learning rate of η = 1.

SLIDE 22

Perceptron: Proof of Convergence

Suppose the data are linearly separable. For example, suppose red dots are the class z = 1, and blue dots are the class z = −1:

[Figure: separable red and blue dots in the (y₁, y₂) plane.]

SLIDE 23

Perceptron: Proof of Convergence

Instead of plotting y⃗, plot z y⃗. The red dots are unchanged; the blue dots are multiplied by −1.

  • Since the original data were linearly separable, the new data are all in the same half of the feature space.

[Figure: the transformed dots in the (zy₁, zy₂) plane.]

SLIDE 24

Perceptron: Proof of Convergence

Suppose we start out with some initial guess, w, that makes some mistakes. In other words, sgn(wᵀ(z y⃗)) = −1 for some of the tokens.

[Figure: the (zy₁, zy₂) plane with the guess w and a misclassified token marked “Oops! An error.”]

SLIDE 25

Perceptron: Proof of Convergence

In that case, w will be updated by adding z y⃗ to it.

[Figure: the old w, the added vector z y⃗, and the new w in the (zy₁, zy₂) plane.]

SLIDE 26

Perceptron: Proof of Convergence

If there is any w such that sgn(wᵀ(z y⃗)) = 1 for all tokens, then this procedure will eventually find it.

  • If the data are linearly separable, the perceptron algorithm converges to a correct solution, even with η = 1.

[Figure: the new w in the (zy₁, zy₂) plane, now classifying all tokens correctly.]

SLIDE 27

What about non-separable data?

  • If the data are NOT linearly separable, then the perceptron with η = 1 doesn’t converge.
  • In fact, that’s what η is for. Remember that w = w + ηz y⃗.
  • We can force the perceptron to stop wiggling around by forcing η (and therefore ηz y⃗) to get gradually smaller and smaller.
  • This works: for the n-th training token, set η = 1/n (sketched in code below).
  • Notice: Σ_{n=1}^∞ 1/n is infinite. Nevertheless, η = 1/n works, because the z y⃗ tokens are not all in the same direction.
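A minimal sketch of that decaying schedule, layered on the earlier training loop (NumPy assumed; counting tokens across epochs is one reasonable reading of “the n-th training token”):

```python
import numpy as np

def train_perceptron_decay(Y, z, epochs=10):
    """Perceptron with eta = 1/n on the n-th training token, so the
    updates shrink even when the data are not linearly separable."""
    w = np.zeros(Y.shape[1])
    n = 0                                      # token counter across epochs
    for _ in range(epochs):
        for y_i, z_i in zip(Y, z):
            n += 1
            z_star = 1 if w @ y_i > 0 else -1
            if z_star != z_i:
                w = w + (1.0 / n) * z_i * y_i  # eta = 1/n
    return w
```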

SLIDE 28

Linear Classifiers

  • Classifiers
  • Perceptron
  • Linear classifiers in general
  • Logistic regression
SLIDE 29

Linear Classifiers in General

The function c + Σ_{d=1}^D w_d y_d is an affine function of the features y_d. That means that its contours are all straight lines. Here is an example of such a function, plotted as variations of color in a two-dimensional space, y₁ by y₂:

[Figure: color plot of an affine function of (y₁, y₂); its contours are straight lines.]

SLIDE 30

Linear Classifiers in General

Consider the classifier

Z∗ = 1 if c + Σ_{d=1}^D w_d y_d > 0
Z∗ = 0 if c + Σ_{d=1}^D w_d y_d < 0

This is called a “linear classifier” because the boundary between the two classes is a line. Here is an example of such a classifier, with its boundary plotted as a line in the two-dimensional space y₁ by y₂:

[Figure: a line dividing the (y₁, y₂) plane into a region where Z∗ = 1 and a region where Z∗ = 0.]

SLIDE 31

Linear Classifiers in General

Consider the classifier

Z∗ = argmax_d ( c_d + Σ_j w_{d,j} y_j )

  • This is called a “multi-class linear classifier” (a code sketch follows below).
  • The regions Z∗ = 0, Z∗ = 1, Z∗ = 2, etc. are called “Voronoi regions.”
  • They are regions with piece-wise linear boundaries. Here is an example from Wikipedia of Voronoi regions plotted in the two-dimensional space y₁ by y₂:

[Figure: Voronoi regions Z∗ = 0 through Z∗ = 7 in the (y₁, y₂) plane, from Wikipedia.]
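The argmax rule is one matrix-vector product per classification. A minimal sketch, assuming NumPy (the weight matrix W and offsets c are made-up numbers for three classes):

```python
import numpy as np

def multiclass_predict(W, c, y):
    """Z* = argmax_d (c_d + sum_j W[d, j] * y[j])."""
    return int(np.argmax(c + W @ y))

W = np.array([[ 1.0,  0.0],   # one row of weights per class (hypothetical)
              [ 0.0,  1.0],
              [-1.0, -1.0]])
c = np.array([0.0, 0.0, 0.5])
print(multiclass_predict(W, c, np.array([2.0, 1.0])))  # prints 0
```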

SLIDE 32

Linear Classifiers in General

When the features are binary (y_d ∈ {0, 1}), many (but not all!) binary functions can be re-written as linear functions. For example, the function Z∗ = (y₁ ∨ y₂) can be re-written as

Z∗ = 1 if y₁ + y₂ − 0.5 > 0.

Similarly, the function Z∗ = (y₁ ∧ y₂) can be re-written as

Z∗ = 1 if y₁ + y₂ − 1.5 > 0.

[Figure: the OR and AND decision boundaries in the (y₁, y₂) unit square; both checked in the sketch below.]
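Both claims can be verified over all four binary inputs; a short sketch:

```python
from itertools import product

for y1, y2 in product([0, 1], repeat=2):
    linear_or  = y1 + y2 - 0.5 > 0            # linear version of OR
    linear_and = y1 + y2 - 1.5 > 0            # linear version of AND
    assert linear_or  == bool(y1 or y2)
    assert linear_and == bool(y1 and y2)
print("Both linear classifiers match their logical functions.")
```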

SLIDE 33

Linear Classifiers in General

  • Not all logical functions can be written as linear classifiers!
  • Minsky and Papert wrote a book called Perceptrons in 1969. Although the book said many other things, the only thing most people remembered about the book was that: “A linear classifier cannot learn an XOR function.”
  • Because of that statement, most people gave up working on neural networks from about 1969 to about 2006.
  • Minsky and Papert also proved that a two-layer neural net can learn an XOR function. But most people didn’t notice.

[Figure: the four XOR points in the (y₁, y₂) plane; no single line separates the two classes.]

SLIDE 34

Linear Classifiers

Classification:

Z∗ = argmax_d ( c_d + Σ_{j=1}^D w_{d,j} y_j )

  • where y_j are the features (binary, integer, or real), w_{d,j} are the feature weights, and c_d is the offset for the d-th class.

SLIDE 35

Linear Classifiers

  • Classifiers
  • Perceptron
  • Linear classifiers in general
  • Logistic regression
SLIDE 36

Differentiable Perceptron

  • Also known as a “one-layer feedforward neural network,” also known as “logistic regression.” It has been re-invented many times by many different people.
  • Basic idea: replace the non-differentiable decision function z∗ = sgn(wᵀy⃗) with a differentiable decision function:

z∗ = tanh(wᵀy⃗) = (1 − e^(−2wᵀy⃗)) / (1 + e^(−2wᵀy⃗))

SLIDE 37

Why?

SLIDE 38

More about perceptron learning

Let’s re-write the training data in a different way. Suppose we have n training vectors, y⃗₁ through y⃗ₙ, where y⃗ᵢ = [yᵢ₁, …, yᵢD, 1]ᵀ. Each one has an associated ground-truth reference label zᵢ ∈ {−1, 1}. The perceptron computes a classifier output zᵢ∗ = sgn(wᵀy⃗ᵢ), which is also ∈ {−1, 1}. The LOSS FUNCTION (a.k.a. the error rate on the training corpus) is

M(w) = (1/4) Σ_{i=1}^n (zᵢ − zᵢ∗)²

(Each misclassified token contributes (±2)² = 4 to the sum, so the factor of 1/4 turns M(w) into a count of training errors.)

SLIDE 39

More about perceptron learning

M(w) = (1/4) Σ_{i=1}^n (zᵢ − zᵢ∗)²

The perceptron learning algorithm tries to minimize the loss function using the following strategy:

  • If zᵢ = zᵢ∗, then do nothing.
  • If zᵢ ≠ zᵢ∗, then set w = w + ηzᵢ y⃗ᵢ.

SLIDE 40

Why is the perceptron so weird?

  • If zᵢ = zᵢ∗, then do nothing.
  • If zᵢ ≠ zᵢ∗, then set w = w + ηzᵢ y⃗ᵢ.

… that seems really weird. Why not just use gradient descent, i.e., why not just set w = w − η∇_w M? Answer: because zᵢ∗ = sgn(wᵀy⃗ᵢ) is not differentiable.

[Figure: basic gradient descent — the loss function M(w) plotted against the coefficient w.]

SLIDE 41

Fixing the perceptron

Let’s make M(w) differentiable. First, we make z∗(w) differentiable. Instead of zᵢ∗ = sgn(wᵀy⃗ᵢ), we’ll use zᵢ∗ = tanh(wᵀy⃗ᵢ). That’s pronounced “tanch,” it means “hyperbolic tangent,” and it looks like this:

zᵢ∗ = tanh(wᵀy⃗ᵢ) = (e^(wᵀy⃗ᵢ) − e^(−wᵀy⃗ᵢ)) / (e^(wᵀy⃗ᵢ) + e^(−wᵀy⃗ᵢ)) = (1 − e^(−2wᵀy⃗ᵢ)) / (1 + e^(−2wᵀy⃗ᵢ))

SLIDE 42

Fixing the perceptron

Let’s make M(w) differentiable. First, we make z∗(w) differentiable. Instead of zᵢ∗ = sgn(wᵀy⃗ᵢ), we’ll use zᵢ∗ = tanh(wᵀy⃗ᵢ). That’s pronounced “tanch,” it means “hyperbolic tangent,” and it looks like this:

zᵢ∗ = tanh(wᵀy⃗ᵢ) = (e^(wᵀy⃗ᵢ) − e^(−wᵀy⃗ᵢ)) / (e^(wᵀy⃗ᵢ) + e^(−wᵀy⃗ᵢ)) = (1 − e^(−2wᵀy⃗ᵢ)) / (1 + e^(−2wᵀy⃗ᵢ))

Its derivative is

∂zᵢ∗/∂(wᵀy⃗ᵢ) = ∂tanh(wᵀy⃗ᵢ)/∂(wᵀy⃗ᵢ) = 1 − tanh²(wᵀy⃗ᵢ) = 1 − zᵢ∗²
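The identity tanh′ = 1 − tanh² is easy to sanity-check numerically with a finite difference; a sketch (the test point a = 0.7 is arbitrary):

```python
import math

a = 0.7                                   # a sample value of w^T y
z_star = math.tanh(a)
analytic = 1 - z_star ** 2                # 1 - tanh^2(a)
h = 1e-6
numeric = (math.tanh(a + h) - math.tanh(a - h)) / (2 * h)
print(abs(analytic - numeric) < 1e-9)     # True
```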

SLIDE 43

Fixing the perceptron

Now, we just differentiate M(w). Remember that M(w) = (1/4) Σ_{i=1}^n (zᵢ − zᵢ∗)². Its derivative is:

∇_w M = −(1/2) Σ_{i=1}^n (zᵢ − zᵢ∗) ∇_w zᵢ∗
      = −(1/2) Σ_{i=1}^n (zᵢ − zᵢ∗)(1 − zᵢ∗²) ∇_w(wᵀy⃗ᵢ)
      = −Σ_{i=1}^n ((zᵢ − zᵢ∗)/2)(1 − zᵢ∗²) y⃗ᵢ

(A code sketch of this gradient follows below.)

[Figure: basic gradient descent — the loss function M(w) plotted against the coefficient w.]
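The last line of the derivation translates directly into NumPy. A minimal sketch (the function name is hypothetical; Y is the n-by-(D+1) matrix of augmented feature vectors):

```python
import numpy as np

def loss_and_grad(w, Y, z):
    """M(w) = (1/4) sum_i (z_i - z*_i)^2 with z*_i = tanh(w^T y_i),
    plus its gradient with respect to w, as derived on the slide."""
    z_star = np.tanh(Y @ w)                            # all z*_i at once
    M = 0.25 * np.sum((z - z_star) ** 2)
    grad = -Y.T @ (((z - z_star) / 2) * (1 - z_star ** 2))
    return M, grad
```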

SLIDE 44

Comparing logistic regression vs. the perceptron

Logistic regression:

w = w − η∇_w M = w + η Σ_{i=1}^n ((zᵢ − zᵢ∗)/2)(1 − zᵢ∗²) y⃗ᵢ

In other words:

  • If zᵢ = zᵢ∗, then do nothing.
  • If zᵢ ≠ zᵢ∗, then set w = w + η((zᵢ − zᵢ∗)/2)(1 − zᵢ∗²) y⃗ᵢ.

Perceptron:

  • If zᵢ = zᵢ∗, then do nothing.
  • If zᵢ ≠ zᵢ∗, then set w = w + ηzᵢ y⃗ᵢ.

(Both per-token updates are sketched in code below.)
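A sketch of the two per-token updates side by side, assuming NumPy (the function names are hypothetical):

```python
import numpy as np

def perceptron_update(w, y, z, eta=1.0):
    z_star = 1.0 if w @ y > 0 else -1.0       # z* = sgn(w^T y)
    if z_star != z:                           # update only on errors
        w = w + eta * z * y
    return w

def logistic_update(w, y, z, eta=1.0):
    z_star = np.tanh(w @ y)                   # differentiable z*
    # one-token piece of -eta * grad M; near zero when tanh already matches z
    return w + eta * ((z - z_star) / 2) * (1 - z_star ** 2) * y
```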

SLIDE 45

Conclusions

  • Perceptron and logistic regression are similar in most ways:
    • They both implement linear classification rules.
    • They can both be initialized using random weights, or using all-zero weights, or by setting the weight vector equal to the average of the z = +1 class, or any other reasonable initialization.
    • They can both be trained one training token at a time. They only change when the classifier output is different from the ground truth label, i.e., zᵢ ≠ zᵢ∗.
    • They both use a “learning rate,” η, which should start at η ≈ 1 and gradually decay toward zero as you see more and more data.
  • They differ only in the way the weight vector, w, is updated:
    • Perceptron just adds ηzᵢ y⃗ᵢ.
    • Logistic regression adds −η∇_w M = η((zᵢ − zᵢ∗)/2)(1 − zᵢ∗²) y⃗ᵢ.