Learning: Nearest Neighbor, Perceptrons & Neural Nets
Artificial Intelligence
CSPP 56553
February 4, 2004
Nearest Neighbor Example II
- Credit Rating:
  – Classifier: Good / Poor
  – Features:
    - L = # late payments/yr
    - R = Income/Expenses

  Name  L   R     G/P
  A     0   1.20  G
  B     25  0.40  P
  C     5   0.70  G
  D     20  0.80  P
  E     30  0.85  P
  F     11  1.20  G
  G     7   1.15  G
  H     15  0.80  P
Nearest Neighbor Example II
[Scatter plot: the eight instances A–H from the table above, plotted in the feature plane with L (0–30) on one axis and R on the other]
Nearest Neighbor Example II
[Same scatter plot with three new applicants I, J, K added]

  Name  L   R     G/P
  I     6   1.15  G
  J     22  0.45  P
  K     15  1.20  ??

- Distance measure: sqrt((L1 − L2)^2 + (sqrt(10) · (R1 − R2))^2)
  – Scaled distance (see the sketch below)
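A minimal Python sketch (not part of the original slides) of 1-nearest-neighbor classification with the scaled distance above, using the A–H training instances; the names `scaled_distance` and `classify_nn` are illustrative.

```python
# Illustrative sketch: 1-nearest-neighbor with the scaled distance above.
import math

# Training instances: (L = late payments/yr, R = income/expenses) -> Good/Poor
train = {
    "A": (0, 1.20, "G"), "B": (25, 0.40, "P"), "C": (5, 0.70, "G"),
    "D": (20, 0.80, "P"), "E": (30, 0.85, "P"), "F": (11, 1.20, "G"),
    "G": (7, 1.15, "G"), "H": (15, 0.80, "P"),
}

def scaled_distance(p, q):
    """sqrt((L1-L2)^2 + (sqrt(10)*(R1-R2))^2): R is rescaled so both features matter."""
    (l1, r1), (l2, r2) = p, q
    return math.sqrt((l1 - l2) ** 2 + (math.sqrt(10) * (r1 - r2)) ** 2)

def classify_nn(query):
    """Return the label of the closest training instance."""
    nearest = min(train.values(), key=lambda rec: scaled_distance(query, rec[:2]))
    return nearest[2]

# New applicants I, J, K from the slide
for name, feats in {"I": (6, 1.15), "J": (22, 0.45), "K": (15, 1.2)}.items():
    print(name, classify_nn(feats))
```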
Nearest Neighbor: Issues
- Prediction can be expensive if there are many features
- Affected by classification, feature noise
– One entry can change prediction
- Definition of distance metric
– How to combine different features
- Different types, ranges of values
- Sensitive to feature selection
Efficient Implementations
- Classification cost:
– Find nearest neighbor: O(n)
- Compute distance between the unknown instance and all training instances
- Compare distances
– Problematic for large data sets
- Alternative:
– Use tree-structured search (k-d trees) to reduce to O(log n)
Efficient Implementation: K-D Trees
- Divide instances into sets based on features
– Binary branching: e.g., feature > value
  - 2^d leaves; with n = 2^d instances, path depth d = O(log n)
– To split cases into sets:
  - If there is one element in the set, stop
  - Otherwise pick a feature to split on
    » Find the average position of the two middle objects on that dimension
    » Split remaining objects based on that average position
    » Recursively split the subsets (see the sketch below)
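A short sketch of the construction procedure above, assuming points are plain feature tuples and that features are cycled through by depth; the names `KDNode` and `build_kdtree` are illustrative, not from the slides.

```python
# Illustrative k-d tree construction: pick a feature, split at the average
# position of the two middle objects, recurse on each half.
class KDNode:
    def __init__(self, feature=None, threshold=None, left=None, right=None, point=None):
        self.feature = feature      # index of the splitting feature
        self.threshold = threshold  # split value ("> threshold" goes right)
        self.left, self.right = left, right
        self.point = point          # stored instance at a leaf

def build_kdtree(points, depth=0):
    """points: list of feature tuples. Returns a KDNode."""
    if len(points) == 1:                      # one element: stop
        return KDNode(point=points[0])
    feature = depth % len(points[0])          # e.g., cycle through features by depth
    pts = sorted(points, key=lambda p: p[feature])
    mid = len(pts) // 2
    # average position of the two middle objects on that dimension
    threshold = (pts[mid - 1][feature] + pts[mid][feature]) / 2.0
    left = [p for p in pts if p[feature] <= threshold]
    right = [p for p in pts if p[feature] > threshold]
    if not left or not right:                 # ties: fall back to an index split
        left, right = pts[:mid], pts[mid:]
    return KDNode(feature, threshold,
                  build_kdtree(left, depth + 1),
                  build_kdtree(right, depth + 1))
```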
K-D Trees: Classification
[k-d tree diagram for the credit data: the root splits on R > 0.825; lower levels split on L > 17.5, L > 9, R > 0.6, R > 0.75, R > 1.025, and R > 1.175; leaves are labeled Good or Poor]
Efficient Implementation: Parallel Hardware
- Classification cost:
– # distance computations
- Constant time if O(n) processors are available
– Cost of finding closest
- Compute pairwise minimum, successively
- O(log n) time
Nearest Neighbor: Analysis
- Issue:
– What features should we use?
- E.g. Credit rating: Many possible features
– Tax bracket, debt burden, retirement savings, etc.
  – Nearest neighbor uses ALL features
  – Irrelevant feature(s) could mislead
- Fundamental problem with nearest neighbor
Nearest Neighbor: Advantages
- Fast training:
– Just record feature vector - output value set
- Can model wide variety of functions
– Complex decision boundaries
  – Weak inductive bias
- Very generally applicable
Summary: Nearest Neighbor
- Nearest neighbor:
– Training: record input vectors + output values
  – Prediction: closest training instance to new data
- Efficient implementations
- Pros: fast training, very general, little bias
- Cons: distance metric (scaling), sensitivity to noise & extraneous features
Learning: Perceptrons
Artificial Intelligence CSPP 56553 February 4, 2004
Agenda
- Neural Networks:
– Biological analogy
- Perceptrons: Single layer networks
- Perceptron training
- Perceptron convergence theorem
- Perceptron limitations
- Conclusions
Neurons: The Concept
[Diagram of a neuron: dendrites, cell body, nucleus, axon]
- Neurons:
  – Receive inputs from other neurons (via synapses)
  – When input exceeds threshold, “fires”
  – Sends output along axon to other neurons
- Brain: 10^11 neurons, 10^16 synapses
Artificial Neural Nets
- Simulated Neuron:
– Node connected to other nodes via links
- Links play the role of axon + synapse + dendrite
- Links associated with weight (like synapse)
– Multiplied by output of node
– Node combines input via activation function
- E.g. sum of weighted inputs passed through a threshold
- Simpler than real neuronal processes
Artificial Neural Net
[Diagram of an artificial neuron: inputs x multiplied by weights w, summed, then passed through a threshold]
Perceptrons
- Single neuron-like element
– Binary inputs
  – Binary outputs
- Weighted sum of inputs > threshold
Perceptron Structure
[Diagram: inputs x0 = 1, x1, x2, x3, …, xn with weights w0, w1, w2, w3, …, wn feeding a single summing/threshold unit with output y]

y = 1 if Σ_{i=0}^{n} w_i · x_i > 0, and 0 otherwise

- x0 = 1, so w0 compensates for the threshold
Perceptron Example
- Logical-OR: Linearly separable
– 00: 0; 01: 1; 10: 1; 11: 1
[Plot: the four OR inputs in the (x1, x2) plane; the three positive points (+) are separated from the single negative point (–) by a straight line]
Perceptron Convergence Procedure
- Straight-forward training procedure
– Learns linearly separable functions
- Until the perceptron yields the correct output for all training examples:
  – If the perceptron is correct, do nothing
  – If the perceptron is wrong:
    - If it incorrectly says “yes”, subtract the input vector from the weight vector
    - Otherwise, add the input vector to the weight vector
Perceptron Convergence Example
- LOGICAL-OR:
- Samples:

  Sample  x0  x1  x2  Desired output
  1       1   0   0   0
  2       1   0   1   1
  3       1   1   0   1
  4       1   1   1   1

- Initial: w = (0,0,0); after S2, w = w + s2 = (1,0,1)
- Pass 2: S1: w = w − s1 = (0,0,1); S3: w = w + s3 = (1,1,1)
- Pass 3: S1: w = w − s1 = (0,1,1) (this trace is replayed in the sketch below)
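A sketch of the convergence procedure applied to the LOGICAL-OR samples above; it assumes the “fire iff Σ w_i x_i > 0” convention used in the trace, with x0 = 1 standing in for the threshold.

```python
# Sketch of the convergence procedure on the LOGICAL-OR samples above.
samples = [((1, 0, 0), 0),   # S1
           ((1, 0, 1), 1),   # S2
           ((1, 1, 0), 1),   # S3
           ((1, 1, 1), 1)]   # S4

w = [0, 0, 0]
for p in range(10):                       # a few passes is enough here
    errors = 0
    for x, desired in samples:
        out = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
        if out == desired:
            continue                      # correct: do nothing
        errors += 1
        sign = 1 if desired == 1 else -1  # wrong "no": add x; wrong "yes": subtract x
        w = [wi + sign * xi for wi, xi in zip(w, x)]
        print(f"pass {p + 1}: w = {w}")
    if errors == 0:
        break
print("final weights:", w)
```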
Perceptron Convergence Theorem
- If there exists a weight vector v such that v · x > δ for every positive example x (i.e., the data are linearly separable with margin δ), then perceptron training will find a separating weight vector
- Proof sketch (taking ||v|| = 1 and ||x|| ≤ 1 for all examples):
  – After k updates the weight vector is w = x_1 + x_2 + … + x_k, so v · w > kδ
  – Each update increases ||w||^2 by at most ||x||^2 (an update is only made when w · x ≤ 0), so ||w + x||^2 ≤ ||w||^2 + ||x||^2 and after k updates ||w||^2 ≤ k ||x||^2
  – Since v · w / ||w|| ≤ 1, we get kδ / (√k ||x||) ≤ 1
- Converges in k ≤ O((1/δ)^2) steps
Perceptron Learning
- Perceptrons learn linear decision boundaries
- E.g., + / – points separable by a line in the (x1, x2) plane, but not XOR:

[Plot: a linearly separable set of + and – points in the (x1, x2) plane, next to XOR, which no single line separates]

- Why no weights w1, w2 can compute XOR (inputs in {−1, 1}):

  x1   x2   required
  −1   −1   w1·x1 + w2·x2 < 0   (output 0)
   1   −1   w1·x1 + w2·x2 > 0   (output 1)  => with the first row, implies w1 > 0
  −1    1   w1·x1 + w2·x2 > 0   (output 1)  => with the first row, implies w2 > 0
   1    1   then w1·x1 + w2·x2 > 0, but XOR(1,1) should be 0 – contradiction
Perceptron Example
- Digit recognition
– Assume display = 8 lightable bars
  – Inputs: on/off + threshold
  – 65 steps to recognize “8”
Perceptron Summary
- Motivated by neuron activation
- Simple training procedure
- Guaranteed to converge
– IF linearly separable
Neural Nets
- Multi-layer perceptrons
– Inputs: real-valued
  – Intermediate “hidden” nodes
  – Output(s): one (or more) discrete-valued

[Network diagram: inputs X1–X4 feed hidden nodes, which feed outputs Y1 and Y2]
Neural Nets
- Pro: More general than perceptrons
– Not restricted to linear discriminants
  – Multiple outputs: one classification each
- Con: No simple, guaranteed training procedure
  – Use greedy, hill-climbing procedure to train
  – “Gradient descent”, “Backpropagation”
Solving the XOR Problem
[Diagram: inputs x1, x2 feed two hidden threshold units via weights w11, w21, w12, w22 (bias weights w01, w02 on constant −1 inputs); the hidden outputs feed one output unit y via weights w13, w23 (bias weight w03 on a constant −1 input)]

- Network topology: 2 hidden nodes, 1 output
- Desired behavior:

  x1  x2  o1  o2  y
  0   0   0   0   0
  1   0   0   1   1
  0   1   0   1   1
  1   1   1   1   0

- Weights: w11 = w12 = 1; w21 = w22 = 1; w01 = 3/2; w02 = 1/2; w03 = 1/2; w13 = −1; w23 = 1 (checked in the sketch below)
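A quick check (illustrative, not from the slides) that these weights reproduce the desired behavior, using step-threshold units.

```python
# Check that the weights above implement XOR with step-threshold units.
def step(z):
    return 1 if z > 0 else 0

w11 = w12 = w21 = w22 = 1
w01, w02, w03 = 1.5, 0.5, 0.5
w13, w23 = -1, 1

def net(x1, x2):
    o1 = step(w11 * x1 + w21 * x2 - w01)   # hidden node 1: fires only for (1,1)
    o2 = step(w12 * x1 + w22 * x2 - w02)   # hidden node 2: fires if either input is 1
    y = step(w13 * o1 + w23 * o2 - w03)    # output: o2 AND NOT o1 == XOR
    return o1, o2, y

for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(x1, x2, *net(x1, x2))
```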
Neural Net Applications
- Speech recognition
- Handwriting recognition
- NETtalk: Letter-to-sound rules
- ALVINN: Autonomous driving
ALVINN
- Driving as a neural network
- Inputs:
– Image pixel intensities
- I.e. lane lines
- 5 Hidden nodes
- Outputs:
– Steering actions
- E.g. turn left/right; how far
- Training:
– Observe human behavior: sample images, steering
Backpropagation
- Greedy, Hill-climbing procedure
– Weights are the parameters to change
  – Original hill-climbing changes one parameter per step
    - Slow
  – If the function is smooth, change all parameters per step
- Gradient descent
– Backpropagation: Computes current output, works backward to correct error
Producing a Smooth Function
- Key problem:
– Pure step threshold is discontinuous
- Not differentiable
- Solution:
– Sigmoid (squashed ‘s’ function): Logistic fn
z = Σ_{i=1}^{n} w_i x_i

s(z) = 1 / (1 + e^(−z))
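A minimal sketch of the logistic activation, assuming z is the weighted sum of a unit's inputs.

```python
# The logistic ("sigmoid") activation: a smooth, differentiable stand-in
# for the step threshold.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def unit_output(weights, inputs):
    z = sum(w * x for w, x in zip(weights, inputs))  # weighted sum of inputs
    return sigmoid(z)
```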
Neural Net Training
- Goal:
– Determine how to change weights to get correct output
- Large change in weight to produce large reduction in error
- Approach:
- Compute actual output: o
- Compare to desired output: d
- Determine effect of each weight w on error = d-o
- Adjust weights
Neural Net Example
[Diagram: the 2-2-1 network – inputs x1, x2 with weights w11, w21, w12, w22 and bias weights w01, w02 (on constant −1 inputs) into hidden sums z1, z2 with outputs y1, y2; weights w13, w23 and bias weight w03 into output sum z3 with output y3]

E = (1/2) Σ_i (y_i* − F(x_i, w))^2

- x_i: ith sample input vector
- w: weight vector
- y_i*: desired output for the ith sample
- E: sum-of-squares error over the training samples

Full expression of the output in terms of inputs and weights:

y3 = F(x, w) = s( w13 · s(w11·x1 + w21·x2 − w01) + w23 · s(w12·x1 + w22·x2 − w02) − w03 )
               (the inner sums are z1 and z2; the outer sum is z3)

- From 6.034 notes, Lozano-Pérez
Gradient Descent
- Error: sum-of-squares error over the inputs, with the current weights
- Compute the rate of change of the error with respect to each weight
  – Which weights have the greatest effect on the error?
  – Effectively, partial derivatives of the error wrt the weights
    - These, in turn, depend on other weights => chain rule
Gradient Descent
- E = G(w)
– Error as function of weights
- Find the rate of change of the error
  – Follow the steepest rate of change
  – Change weights s.t. the error is minimized

[Plot: error G(w) as a function of a weight w, with slope dG/dw, local minima, and successive points w0, w1]

- MIT AI lecture notes, Lozano-Pérez 2000
Gradient of Error
y3 = F(x, w) = s( w13 · s(w11·x1 + w21·x2 − w01) + w23 · s(w12·x1 + w22·x2 − w02) − w03 )
               (inner sums z1, z2; outer sum z3)

E = (1/2) Σ_i (y_i* − F(x_i, w))^2

∂E/∂w_j = −(y_i* − y3) · ∂y3/∂w_j

∂y3/∂w13 = ∂s(z3)/∂w13 = (∂s(z3)/∂z3) · (∂z3/∂w13) = (∂s(z3)/∂z3) · y1

∂y3/∂w11 = (∂s(z3)/∂z3) · (∂z3/∂w11) = (∂s(z3)/∂z3) · w13 · (∂s(z1)/∂z1) · (∂z1/∂w11)
         = (∂s(z3)/∂z3) · w13 · (∂s(z1)/∂z1) · x1

[Diagram: the same 2-2-1 network, labeled as before]

Note: derivative of the sigmoid: ds(z1)/dz1 = s(z1)(1 − s(z1))

- From 6.034 notes, Lozano-Pérez
From Effect to Update
- Gradient computation:
– How each weight contributes to performance
- To train:
– Need to determine how to CHANGE each weight based on its contribution to performance
  – Need to determine how MUCH change to make per iteration
    - Rate parameter ‘r’
      – Large enough to learn quickly
      – Small enough to reach, but not overshoot, target values
Backpropagation Procedure
- Pick rate parameter ‘r’
- Until performance is good enough,
– Do the forward computation to calculate the output
  – Compute β for the output node: β_z = d_z − o_z (desired minus actual output)
  – Compute β for all other nodes: β_j = Σ_k w_{j→k} · o_k · (1 − o_k) · β_k
  – Compute the change for every weight: Δw_{i→j} = r · o_i · o_j · (1 − o_j) · β_j
    (o_i is the output of node i; w_{i→j} is the weight on the link from node i to node j)
Backprop Example
[Diagram: the 2-2-1 network – x1, x2 into hidden units (outputs y1, y2) via w11, w21, w12, w22 with bias weights w01, w02; hidden outputs into the output unit (y3) via w13, w23 with bias weight w03; bias inputs are constant −1]

Forward prop: compute the z_i and y_i given the x_k and w_l

β3 = y3* − y3
β1 = y3 (1 − y3) w13 β3
β2 = y3 (1 − y3) w23 β3

Weight updates (bias inputs are −1, hence the (−1) factors):

w03 = w03 + r (−1) y3 (1 − y3) β3
w02 = w02 + r (−1) y2 (1 − y2) β2
w01 = w01 + r (−1) y1 (1 − y1) β1
w13 = w13 + r y1 y3 (1 − y3) β3
w23 = w23 + r y2 y3 (1 − y3) β3
w11 = w11 + r x1 y1 (1 − y1) β1
w21 = w21 + r x2 y1 (1 − y1) β1
w12 = w12 + r x1 y2 (1 − y2) β2
w22 = w22 + r x2 y2 (1 − y2) β2

(these updates are implemented in the sketch below)
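A sketch (not the original notes' code) of these updates for the 2-2-1 network, with biases treated as weights on a constant −1 input; the random initialization, rate r = 0.5, and epoch count are arbitrary choices, and training XOR this way can be slow or stall in a local minimum.

```python
# Sketch of backprop for the 2-2-1 network above, following the update
# equations on this slide.  Biases w01, w02, w03 multiply a constant -1 input.
import math, random

def s(z):                        # sigmoid activation
    return 1.0 / (1.0 + math.exp(-z))

random.seed(0)
w = {k: random.uniform(-0.5, 0.5)
     for k in ("w11", "w21", "w01", "w12", "w22", "w02", "w13", "w23", "w03")}
r = 0.5                          # rate parameter (arbitrary choice)

def forward(x1, x2):
    y1 = s(w["w11"] * x1 + w["w21"] * x2 - w["w01"])
    y2 = s(w["w12"] * x1 + w["w22"] * x2 - w["w02"])
    y3 = s(w["w13"] * y1 + w["w23"] * y2 - w["w03"])
    return y1, y2, y3

def backprop(x1, x2, target):
    y1, y2, y3 = forward(x1, x2)
    beta3 = target - y3                          # output node
    beta1 = y3 * (1 - y3) * w["w13"] * beta3     # hidden node 1
    beta2 = y3 * (1 - y3) * w["w23"] * beta3     # hidden node 2
    # weight updates: delta(w_ij) = r * o_i * o_j * (1 - o_j) * beta_j
    w["w13"] += r * y1 * y3 * (1 - y3) * beta3
    w["w23"] += r * y2 * y3 * (1 - y3) * beta3
    w["w03"] += r * (-1) * y3 * (1 - y3) * beta3
    w["w11"] += r * x1 * y1 * (1 - y1) * beta1
    w["w21"] += r * x2 * y1 * (1 - y1) * beta1
    w["w01"] += r * (-1) * y1 * (1 - y1) * beta1
    w["w12"] += r * x1 * y2 * (1 - y2) * beta2
    w["w22"] += r * x2 * y2 * (1 - y2) * beta2
    w["w02"] += r * (-1) * y2 * (1 - y2) * beta2

# Online training on XOR (may need many epochs, and may stall in a local minimum)
xor = [(0, 0, 0), (1, 0, 1), (0, 1, 1), (1, 1, 0)]
for epoch in range(5000):
    for x1, x2, t in xor:
        backprop(x1, x2, t)
for x1, x2, t in xor:
    print(x1, x2, round(forward(x1, x2)[2], 2), "target", t)
```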
Backpropagation Observations
- Procedure is (relatively) efficient
– All computations are local
- Use inputs and outputs of current node
- What is “good enough”?
– Rarely reach target (0 or 1) outputs
- Typically, train until within 0.1 of target
Neural Net Summary
- Training:
– Backpropagation procedure
- Gradient descent strategy (usual problems)
- Prediction:
– Compute outputs based on input vector & weights
- Pros: Very general, Fast prediction
- Cons: Training can be VERY slow (1000’s of epochs); overfitting
Training Strategies
- Online training:
– Update weights after each sample
- Offline (batch) training:
– Compute error over all samples
- Then update weights
- Online training is “noisy”
  – Sensitive to individual instances
  – However, may escape local minima (both loop structures are sketched below)
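A schematic contrast of the two update loops, assuming some per-sample gradient routine (here the hypothetical `gradient_for_sample`, standing in for the backprop updates above).

```python
# Online vs. batch updates, given a per-sample gradient routine.
def train_online(weights, samples, rate, gradient_for_sample):
    for x, target in samples:                       # update after each sample
        g = gradient_for_sample(weights, x, target)
        weights = [w - rate * gi for w, gi in zip(weights, g)]
    return weights

def train_batch(weights, samples, rate, gradient_for_sample):
    total = [0.0] * len(weights)                    # accumulate gradient over all samples
    for x, target in samples:
        g = gradient_for_sample(weights, x, target)
        total = [t + gi for t, gi in zip(total, g)]
    return [w - rate * t for w, t in zip(weights, total)]  # then update once
```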
Training Strategy
- To avoid overfitting:
– Split data into: training, validation, & test
- Also, avoid excess weights (less than # samples)
- Initialize with small random weights
– Small changes have noticeable effect
- Use offline training
– Train until the validation-set error reaches its minimum
- Evaluate on test set
– No more weight changes
Classification
- Neural networks are best for classification tasks
  – Single output -> binary classifier
  – Multiple outputs -> multiway classification
- Applied successfully to learning pronunciation
– Sigmoid pushes to binary classification
- Not good for regression
Neural Net Example
- NETtalk: Letter-to-sound by net
- Inputs:
– Need context to pronounce
- 7-letter window: predict sound of middle letter
- 29 possible characters: alphabet + space + comma + period
  – 7 × 29 = 203 inputs
- 80 Hidden nodes
- Output: Generate 60 phones
– Nodes map to 26 units: 21 articulatory, 5 stress/sil
- Vector quantization of acoustic space
Neural Net Example: NETtalk
- Learning to talk:
– 5 iterations / 1024 training words: bound/stress
  – 10 iterations: intelligible
  – 400 new test words: 80% correct
- Not as good as DecTalk, but automatic
Neural Net Conclusions
- Simulation based on neurons in brain
- Perceptrons (single neuron)
– Guaranteed to find linear discriminant
- IF one exists -> the problem: XOR has none
- Neural nets (Multi-layer perceptrons)
– Very general – Backpropagation training procedure
- Gradient descent - local min, overfitting issues