CS 188: Artificial Intelligence
Lecture 21: Perceptrons
Pieter Abbeel, UC Berkeley (many slides adapted from Dan Klein)
  1. Outline
     § Generative vs. Discriminative
     § Binary Linear Classifiers
     § Perceptron
     § Multi-class Linear Classifiers
     § Multi-class Perceptron
     § Fixing the Perceptron: MIRA
     § Support Vector Machines*

     Classification: Feature Vectors
     § An input is represented as a vector of feature values, e.g.:
       § Spam filtering: "Hello, Do you want free printr or cartriges? Why pay more when you can get them ABSOLUTELY FREE! Just ..." → # free : 2, YOUR_NAME : 0, MISSPELLED : 2, FROM_FRIEND : 0, ... → SPAM (+)
       § Digit recognition: a pixel image → PIXEL-7,12 : 1, PIXEL-7,13 : 0, ..., NUM_LOOPS : 1, ... → "2"

     Generative vs. Discriminative
     § Generative classifiers:
       § E.g. naïve Bayes
       § A causal model with evidence variables
       § Query the model for causes given evidence
     § Discriminative classifiers:
       § No causal model, no Bayes rule, often no probabilities at all!
       § Try to predict the label Y directly from X
       § Robust, accurate with varied features
       § Loosely: mistake driven rather than model driven

     Some (Simplified) Biology
     § Very loose inspiration: human neurons
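
Implementation note. As a concrete picture of the feature-vector representation above, here is a minimal Python sketch of the featurization step. The feature names (# free, MISSPELLED, FROM_FRIEND) come from the slide; the extraction rules and the known_typos list are illustrative assumptions, not the course's actual featurizer, and the YOUR_NAME feature is omitted for brevity.

from collections import Counter

def extract_features(text, sender_is_friend=False, known_typos=("printr", "cartriges")):
    # Lowercase, strip simple punctuation, and count words.
    words = [w.strip("!?.,") for w in text.lower().split()]
    counts = Counter(words)
    return {
        "# free":      counts["free"],                       # occurrences of "free"
        "MISSPELLED":  sum(counts[t] for t in known_typos),  # known misspellings seen
        "FROM_FRIEND": 1 if sender_is_friend else 0,         # sender metadata
    }

email = ("Hello, Do you want free printr or cartriges? "
         "Why pay more when you can get them ABSOLUTELY FREE! Just ...")
print(extract_features(email))
# {'# free': 2, 'MISSPELLED': 2, 'FROM_FRIEND': 0}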

  2. Classification: Weights
     § Inputs are feature values
     § Each feature has a weight
     § The sum is the activation: activation_w(x) = Σ_i w_i · f_i(x) = w · f(x)
     § If the activation is:
       § Positive → output +1
       § Negative → output -1
     § (Figure: features f_1, f_2, f_3 are multiplied by weights w_1, w_2, w_3, summed by Σ, and passed through a >0? threshold.)

     Linear Classifiers
     § Binary case: compare features to a weight vector
     § Learning: figure out the weight vector from examples
     § Example weight vector: # free : 4, YOUR_NAME : -1, MISSPELLED : 1, FROM_FRIEND : -3, ...
     § Example inputs: (# free : 2, YOUR_NAME : 0, MISSPELLED : 2, FROM_FRIEND : 0, ...) and (# free : 0, YOUR_NAME : 1, MISSPELLED : 1, FROM_FRIEND : 1, ...)
     § A positive dot product means the positive class

     Linear Classifiers Mini Exercise
     § Feature vectors: (# free : 2, YOUR_NAME : 0), (# free : 4, YOUR_NAME : 1), (# free : 1, YOUR_NAME : 1); weight vector w = (-1, 2)
     § 1. Draw the feature vectors and the weight vector w
     § 2. Which feature vectors are classified as +? As -?
     § 3. Draw the line separating feature vectors classified + from those classified -.

     Linear Classifiers Mini Exercise 2: Bias Term
     § The same feature vectors with a bias feature added: (Bias : 1, # free : 2, YOUR_NAME : 0), (Bias : 1, # free : 4, YOUR_NAME : 1), (Bias : 1, # free : 1, YOUR_NAME : 1); weight vector w = (-1, -3, 2)
     § Repeat questions 1-3 above.

     Binary Decision Rule
     § In the space of feature vectors:
       § Examples are points
       § Any weight vector is a hyperplane
       § One side corresponds to Y = +1, the other to Y = -1
     § Example: w = (BIAS : -3, free : 4, money : 2, ...); in the (free, money) plane the line w · f(x) = 0 separates +1 = SPAM from -1 = HAM

     Outline (recap)
     § Next up is the perceptron: how to find the weight vector w from data.
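
Implementation note. A minimal sketch of the binary decision rule, using sparse dicts for the feature and weight vectors as in the slide's example; the weight and feature values are the ones shown above. Zero activation is mapped to +1 here, matching the perceptron convention on the next slide.

def dot(w, f):
    # w . f(x) for sparse dict features
    return sum(w.get(k, 0) * v for k, v in f.items())

def classify(w, f):
    return +1 if dot(w, f) >= 0 else -1    # positive activation: positive class

w = {"# free": 4, "YOUR_NAME": -1, "MISSPELLED": 1, "FROM_FRIEND": -3}

f_spam = {"# free": 2, "YOUR_NAME": 0, "MISSPELLED": 2, "FROM_FRIEND": 0}
f_ham  = {"# free": 0, "YOUR_NAME": 1, "MISSPELLED": 1, "FROM_FRIEND": 1}

print(classify(w, f_spam))   # +1: activation is 4*2 + 1*2 = 10
print(classify(w, f_ham))    # -1: activation is -1 + 1 - 3 = -3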

  3. Binary Perceptron Update
     § Start with zero weights
     § For each training instance:
       § Classify with current weights: y = +1 if w · f(x) ≥ 0, else y = -1
       § If correct (i.e., y = y*), no change!
       § If wrong: adjust the weight vector by adding or subtracting the feature vector (subtract if y* is -1), i.e. w ← w + y* · f(x)
     [demo]

     Multiclass Decision Rule
     § If we have multiple classes:
       § A weight vector w_y for each class y
       § Score (activation) of a class y: w_y · f(x)
       § Prediction: the highest score wins, y = argmax_y w_y · f(x)
     § Binary = multiclass where the negative class has weight zero

     Exercise: Which Category is Chosen?
     § "win the vote" → f = (BIAS : 1, win : 1, game : 0, vote : 1, the : 1, ...)
     § Class weight vectors:
       § w_1 = (BIAS : -2, win : 4, game : 4, vote : 0, the : 0, ...)
       § w_2 = (BIAS : 1, win : 2, game : 0, vote : 4, the : 0, ...)
       § w_3 = (BIAS : 2, win : 0, game : 2, vote : 0, the : 0, ...)

     Exercise: Multiclass Linear Classifier for 2 Classes vs. Binary Linear Classifier
     § Consider the multiclass linear classifier for two classes with w_1 = (-1, 1) and w_2 = (2, 2)
     § Is there an equivalent binary linear classifier, i.e., one that classifies all points x = (x_1, x_2) the same way?
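
Implementation note. The binary update rule above as a runnable sketch: dense list features for brevity, a made-up toy dataset, and an arbitrary pass count (on separable data the weights stop changing once the perceptron converges).

def perceptron_train(data, num_features, passes=10):
    w = [0.0] * num_features                       # start with zero weights
    for _ in range(passes):
        for f, y_star in data:
            activation = sum(wi * fi for wi, fi in zip(w, f))
            y = +1 if activation >= 0 else -1      # classify with current weights
            if y != y_star:                        # wrong: w <- w + y* . f(x)
                w = [wi + y_star * fi for wi, fi in zip(w, f)]
    return w

# made-up toy data: (feature list, true label); the first feature is a bias
data = [([1, 2, 0], +1), ([1, 0, 3], -1), ([1, 3, 1], +1)]
print(perceptron_train(data, num_features=3))   # converges to [0.0, 3.0, -2.0]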

  4. Learning: Multiclass Perceptron
     § Start with zero weights
     § Pick up training instances one by one
     § Classify with current weights: y = argmax_y w_y · f(x)
     § If correct, no change!
     § If wrong: lower the score of the wrong answer, raise the score of the right answer:
       § w_y ← w_y - f(x)
       § w_y* ← w_y* + f(x)

     Example
     § Training instances: "win the vote", "win the election", "win the game"
     § (The slide's per-class weight tables over BIAS, win, game, vote, the are filled in step by step during lecture.)

     Examples: Perceptron
     § Separable case (figure: the perceptron converges to a separating line)
     § Non-separable case (figure: on non-separable data the weights keep thrashing)

     Properties of Perceptrons
     § Separability: some parameters get the training set perfectly correct
     § Convergence: if the training data are separable, the perceptron will eventually converge (binary case)
     § Mistake Bound: the maximum number of mistakes (binary case) is related to the margin, or degree of separability
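
Implementation note. A sketch of the multiclass perceptron on the slide's three training sentences. The class labels "politics" and "sports" are assumed for illustration (the slide does not name them), and ties in the argmax are broken by class order.

from collections import defaultdict

def score(w_y, f):
    # w_y . f(x) for sparse dict features
    return sum(w_y[k] * v for k, v in f.items())

def features(sentence):
    # bag of words plus a bias feature
    f = {"BIAS": 1}
    for word in sentence.split():
        f[word] = f.get(word, 0) + 1
    return f

def multiclass_perceptron(data, classes, passes=5):
    w = {y: defaultdict(float) for y in classes}             # start with zero weights
    for _ in range(passes):
        for f, y_star in data:
            y = max(classes, key=lambda c: score(w[c], f))   # highest score wins
            if y != y_star:                                  # wrong answer:
                for k, v in f.items():
                    w[y][k] -= v                             # lower its score
                    w[y_star][k] += v                        # raise the right one's
    return w

data = [(features("win the vote"), "politics"),
        (features("win the election"), "politics"),
        (features("win the game"), "sports")]
w = multiclass_perceptron(data, ["politics", "sports"])
print(dict(w["sports"]))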

  5. Problems with the Perceptron
     § Noise: if the data isn't separable, the weights might thrash
       § Averaging weight vectors over time can help (averaged perceptron)
     § Mediocre generalization: finds a "barely" separating solution
     § Overtraining: test / held-out accuracy usually rises, then falls
       § Overtraining is a kind of overfitting

     Fixing the Perceptron
     § Idea: adjust the weight update to mitigate these effects
     § MIRA*: choose an update size that fixes the current mistake...
     § ...but minimizes the change to w
     (* Margin Infused Relaxed Algorithm)

     Minimum Correcting Update
     § On a mistake (predicted y, true class y*), update w_y* ← w_y* + τ f and w_y ← w_y - τ f with the smallest τ that makes the right answer win by 1:
       τ = ((w_y - w_y*) · f + 1) / (2 f · f)
     § The minimum is not at τ = 0, or we would not have made an error, so the minimum is where equality holds
     § The +1 helps to generalize (it demands a margin, not a bare tie)

     Maximum Step Size
     § In practice, it's also bad to make updates that are too large
       § The example may be labeled incorrectly
       § You may not have enough features
     § Solution: cap the maximum possible value of τ with some constant C:
       τ = min(C, ((w_y - w_y*) · f + 1) / (2 f · f))
     § Corresponds to an optimization that assumes non-separable data
     § Usually converges faster than the perceptron
     § Usually better, especially on noisy data

     Linear Separators
     § Which of these linear separators is optimal? (figure: several candidate separating lines through the same data)
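
Implementation note. A sketch of the capped MIRA update as reconstructed above, in the same sparse-dict style as the earlier sketches; the cap C = 0.01 and the example labels are arbitrary illustrative choices.

from collections import defaultdict

def score(w_y, f):
    return sum(w_y[k] * v for k, v in f.items())   # w_y . f(x), sparse dicts

def mira_update(w, f, y, y_star, C=0.01):
    """One capped MIRA update after predicting y when the truth was y_star."""
    if y == y_star:
        return                                     # correct: no change
    f_dot_f = sum(v * v for v in f.values())
    gap = score(w[y], f) - score(w[y_star], f)     # (w_y - w_y*) . f
    tau = min(C, (gap + 1.0) / (2.0 * f_dot_f))    # minimum correcting step, capped
    for k, v in f.items():
        w[y_star][k] += tau * v                    # raise the right answer's score
        w[y][k]      -= tau * v                    # lower the wrong one's

# toy usage: zero weights, one mistake; tau = min(0.01, 1/(2*4)) = 0.01
w = {c: defaultdict(float) for c in ("politics", "sports")}
mira_update(w, {"BIAS": 1, "win": 1, "the": 1, "game": 1}, y="politics", y_star="sports")
print(dict(w["sports"]))   # every active feature moved up by 0.01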

  6. Mini-Exercise
     § Give an example dataset that would be overfit by an SVM, by MIRA, and by running the perceptron until convergence
     § Could running the perceptron for fewer steps lead to better generalization?

     Support Vector Machines
     § Maximizing the margin: good according to intuition, theory, and practice
     § Only the support vectors matter; other training examples are ignorable
     § Support vector machines (SVMs) find the separator with the maximum margin
     § Basically, SVMs are MIRA where you optimize over all examples at once
     § (Figure: MIRA's separator vs. the SVM's max-margin separator)

     Classification: Comparison
     § Naïve Bayes:
       § Builds a model of the training data
       § Gives prediction probabilities
       § Strong assumptions about feature independence
       § One pass through the data (counting)
     § Perceptrons / MIRA:
       § Make fewer assumptions about the data
       § Mistake-driven learning
       § Multiple passes through the data (prediction)
       § Often more accurate

     Extension: Web Search
     § Information retrieval: given information needs, produce information
       § Includes, e.g., web search, question answering, and classic IR
     § Web search is not exactly classification, but rather ranking (e.g., x = "Apple Computers")

     Feature-Based Ranking
     § For a query x = "Apple Computers", each candidate page y gets its own feature vector f(x, y)

     Perceptron for Ranking
     § Inputs x; candidates y
     § Many feature vectors f(x, y), but one weight vector w
     § Prediction: y = argmax_y w · f(x, y)
     § Update (if wrong): w ← w + f(x, y*) - f(x, y)
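
Implementation note. A sketch of the ranking perceptron above: one weight vector scores every candidate's feature vector, and a mistake triggers w ← w + f(x, y*) - f(x, y). The candidate pages and feature values are made up for illustration.

def dot(w, f):
    return sum(w.get(k, 0.0) * v for k, v in f.items())

def rank_predict(w, candidates):
    # candidates maps each candidate y to its feature vector f(x, y)
    return max(candidates, key=lambda y: dot(w, candidates[y]))

def rank_update(w, candidates, y_star):
    y = rank_predict(w, candidates)
    if y != y_star:                              # wrong: w <- w + f(x,y*) - f(x,y)
        for k, v in candidates[y_star].items():
            w[k] = w.get(k, 0.0) + v             # pull toward the correct candidate
        for k, v in candidates[y].items():
            w[k] = w.get(k, 0.0) - v             # push away from the predicted one
    return w

# toy usage: with zero weights the tie goes to the first candidate, which is wrong
w = {}
candidates = {"apple.com": {"title_match": 1, "popularity": 3},
              "fruit.com": {"title_match": 1, "popularity": 1}}
w = rank_update(w, candidates, y_star="fruit.com")
print(w)   # {'title_match': 0.0, 'popularity': -2.0}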

  7. Pacman Apprenticeship!
     § Examples are states s
     § Candidates are pairs (s, a)
     § "Correct" actions: those taken by an expert, a*
     § Features defined over (s, a) pairs: f(s, a)
     § Score of a q-state (s, a) given by: w · f(s, a)
     § How is this VERY different from reinforcement learning?
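
Implementation note. The apprenticeship setup is the ranking perceptron with candidates (s, a) and the expert's action as the correct answer. Here is a sketch with a hypothetical feature function f(s, a) standing in for real Pacman state features; the state, actions, and features are made up.

def dot(w, f):
    return sum(w.get(k, 0.0) * v for k, v in f.items())

def apprenticeship_update(w, s, legal_actions, a_expert, f):
    # pick the action the current weights score highest: argmax_a w . f(s, a)
    a = max(legal_actions, key=lambda act: dot(w, f(s, act)))
    if a != a_expert:                    # disagree with the expert: perceptron update
        for k, v in f(s, a_expert).items():
            w[k] = w.get(k, 0.0) + v     # raise the expert action's score
        for k, v in f(s, a).items():
            w[k] = w.get(k, 0.0) - v     # lower the chosen action's score
    return w

# hypothetical feature function over (s, a) pairs
def f(s, a):
    return {"closer_to_food": 1 if a == "North" else 0,
            "closer_to_ghost": 1 if a == "East" else 0}

# with zero weights both actions tie, so the first listed ("East") is chosen; wrong
w = apprenticeship_update({}, s="toy-state", legal_actions=["East", "North"],
                          a_expert="North", f=f)
print(w)   # {'closer_to_food': 1.0, 'closer_to_ghost': -1.0}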
