Computational Learning Theory: Agnostic Learning Machine Learning - PowerPoint PPT Presentation

Computational Learning Theory: Agnostic Learning Machine Learning 1 Slides based on material from Dan Roth, Avrim Blum, Tom Mitchell and others

This lecture: Computational Learning Theory • The Theory of Generalization • Probably Approximately Correct (PAC) learning • Positive and negative learnability results • Agnostic Learning • Shattering and the VC dimension 2

This lecture: Computational Learning Theory • The Theory of Generalization • Probably Approximately Correct (PAC) learning • Positive and negative learnability results • Agnostic Learning • Shattering and the VC dimension 3

So far we have seen… • The general setting for batch learning • PAC learning and Occam’s Razor – How good will a classifier that is consistent on a training set be? • Assumptions so far: 1. Training and test examples come from the same distribution 2. The hypothesis space is finite. 3. For any concept, there is some function in the hypothesis space that is consistent with the training set 4

So far we have seen… • The general setting for batch learning • PAC learning and Occam’s Razor – How good will a classifier that is consistent on a training set be? • Assumptions so far: 1. Training and test examples come from the same distribution 2. The hypothesis space is finite. 3. For any concept, there is some function in the hypothesis space that is consistent with the training set Is the second assumption reasonable? 5

So far we have seen… • The general setting for batch learning • PAC learning and Occam’s Razor – How good will a classifier that is consistent on a training set be? • Assumptions so far: 1. Training and test examples come from the same distribution 2. The hypothesis space is finite. 3. For any concept, there is some function in the hypothesis space that is consistent with the training set Let’s look at the last assumption. Is it reasonable? 6

What is agnostic learning? • So far, we have assumed that the learning algorithm could find the true concept • What if: We are trying to learn a concept f using hypotheses in H, but f Ï H – That is C is not a subset of H – This setting is called agnostic learning – Can we say something about sample complexity? More realistic setting than before 7

What is agnostic learning? • So far, we have assumed that the learning algorithm could find the true concept H C • What if: We are trying to learn a concept f using hypotheses in H, but f Ï H – That is C is not a subset of H – This setting is called agnostic learning – Can we say something about sample complexity? More realistic setting than before 8

What is agnostic learning? • So far, we have assumed that the learning algorithm could find the true concept H • What if: We are trying to learn a concept f using hypotheses in H, but f Ï H C – That is C is not a subset of H – This setting is called agnostic learning – Can we say something about sample complexity? More realistic setting than before 9

Learn a concept f using Agnostic Learning hypotheses in H, but f Ï H Are we guaranteed that training error will be zero? – No . There may be no consistent hypothesis in the hypothesis space! We can find a classifier ℎ ∈ 𝐼 that has low training error 𝑔 𝑦 ≠ ℎ 𝑦 : 𝑦 ∈ 𝑇 err ! ℎ = 𝑛 This is the fraction of training examples that are misclassified 10

Learn a concept f using Agnostic Learning hypotheses in H, but f Ï H Are we guaranteed that training error will be zero? – No . There may be no consistent hypothesis in the hypothesis space! We can find a classifier ℎ ∈ 𝐼 that has low training error 𝑔 𝑦 ≠ ℎ 𝑦 : 𝑦 ∈ 𝑇 err ! ℎ = 𝑛 This is the fraction of training examples that are misclassified 11

Learn a concept f using Agnostic Learning hypotheses in H, but f Ï H We can find a classifier ℎ ∈ 𝐼 that has low training error 𝑔 𝑦 ≠ ℎ 𝑦 : 𝑦 ∈ 𝑇 err ! ℎ = 𝑛 What we want : A guarantee that a hypothesis with small training error will have a good accuracy on unseen examples err " ℎ = Pr #~" 𝑔 𝑦 ≠ ℎ 𝑦 12

We will use Tail bounds for analysis How far can a random variable get from its mean? 13

We will use Tail bounds for analysis How far can a random variable get from its mean? Tails of these distributions 14

Bounding probabilities Law of large numbers: As we collect more samples, the empirical average converges to the true expectation – Suppose we have an unknown coin and we want to estimate its bias (i.e. probability of heads) – Toss the coin 𝑛 times number of heads → P heads 𝑛 As 𝑛 increases, we get a better estimate of P heads What can we say about the gap between these two terms? 15

Bounding probabilities Markov’s inequality: Bounds the probability that a non-negative • random variable exceeds a fixed value 𝑄 𝑌 ≥ 𝑏 ≤ 𝐹 𝑌 𝑏 Chebyshev’s inequality: Bounds the probability that a random • variable differs from its expected value by more than a fixed number of standard deviations 𝑄 𝑌 − 𝜈 ≥ 𝑙𝜏 ≤ 1 𝑙 ! What we want: To bound sums of random variables – Why? Because the training error depends on the number of errors on the training set 16

̅ Hoeffding’s inequality Upper bounds on how much the sum of a set of random variables differs from its expected value 𝑞 + 𝜗 ≤ 𝑓 !"#$ % 𝑄 𝑞 > 17

̅ Hoeffding’s inequality Upper bounds on how much the sum of a set of random variables differs from its expected value 𝑞 + 𝜗 ≤ 𝑓 !"#$ % 𝑄 𝑞 > True mean (Eg. For a coin toss, the probability of seeing heads) 18

̅ Hoeffding’s inequality Upper bounds on how much the sum of a set of random variables differs from its expected value 𝑞 + 𝜗 ≤ 𝑓 !"#$ % 𝑄 𝑞 > Empirical mean, computed True mean (Eg. For a coin toss, the probability of seeing heads) over 𝑛 independent trials 19

̅ Hoeffding’s inequality Upper bounds on how much the sum of a set of random variables differs from its expected value 𝑞 + 𝜗 ≤ 𝑓 !"#$ % 𝑄 𝑞 > Empirical mean, computed True mean (Eg. For a coin toss, the probability of seeing heads) over 𝑛 independent trials The probability that the true mean will be more than 𝜗 away from the empirical mean, computed over 𝑛 trials 20

̅ Hoeffding’s inequality Upper bounds on how much the sum of a set of random variables differs from its expected value 𝑞 + 𝜗 ≤ 𝑓 !"#$ % 𝑄 𝑞 > Empirical mean, computed True mean (Eg. For a coin toss, the probability of seeing heads) over 𝑛 independent trials What this tells us: The empirical mean will not be too far from the expected mean if there are many samples. And, it quantifies the convergence rate as well. 21

Back to agnostic learning Suppose we consider the true error (a.k.a generalization error) 𝐹𝑠𝑠 𝐸 (ℎ) to be a random variable 22

Back to agnostic learning Suppose we consider the true error (a.k.a generalization error) 𝐹𝑠𝑠 𝐸 (ℎ) to be a random variable The training error over 𝑛 examples 𝐹𝑠𝑠 𝑇 (ℎ) is the empirical estimate of this true error 23

Back to agnostic learning Suppose we consider the true error (a.k.a generalization error) 𝐹𝑠𝑠 𝐸 (ℎ) to be a random variable The training error over 𝑛 examples 𝐹𝑠𝑠 𝑇 (ℎ) is the empirical estimate of this true error We can ask: What is the probability that the true error is more than 𝜗 away from the empirical error? 24

Back to agnostic learning Suppose we consider the true error (a.k.a generalization error) 𝐹𝑠𝑠 𝐸 (ℎ) to be a random variable The training error over 𝑛 examples 𝐹𝑠𝑠 𝑇 (ℎ) is the empirical estimate of this true error Let’s apply Hoeffding’s inequality 25

Back to agnostic learning Suppose we consider the true error (a.k.a generalization error) 𝐹𝑠𝑠 𝐸 (ℎ) to be a random variable The training error over 𝑛 examples 𝐹𝑠𝑠 𝑇 (ℎ) is the empirical estimate of this true error Let’s apply Hoeffding’s inequality " ℎ + 𝜗 ≤ 𝑓 #$%& ! 𝑄 𝐹𝑠𝑠 ! ℎ > 𝐹𝑠𝑠 26

Back to agnostic learning Suppose we consider the true error (a.k.a generalization error) 𝐹𝑠𝑠 𝐸 (ℎ) to be a random variable The training error over 𝑛 examples 𝐹𝑠𝑠 𝑇 (ℎ) is the empirical estimate of this true error Let’s apply Hoeffding’s inequality " ℎ + 𝜗 ≤ 𝑓 #$%& ! 𝑄 𝐹𝑠𝑠 ! ℎ > 𝐹𝑠𝑠 𝑔 𝑦 ≠ ℎ 𝑦 , 𝑦 ∈ 𝑇 𝐹𝑠𝑠 & ℎ = 𝐹𝑠𝑠 # ℎ = 𝑄𝑠 $~# 𝑔 𝑦 ≠ ℎ 𝑦 𝑛 27

Agnostic learning The probability that a single hypothesis ℎ has a training error that is more than 𝜗 away from the true error is bounded above $ ℎ + 𝜗 ≤ 𝑓 %&'( ! 𝑄 𝐹𝑠𝑠 # ℎ > 𝐹𝑠𝑠 28

Agnostic learning The probability that a single hypothesis ℎ has a training error that is more than 𝜗 away from the true error is bounded above $ ℎ + 𝜗 ≤ 𝑓 %&'( ! 𝑄 𝐹𝑠𝑠 # ℎ > 𝐹𝑠𝑠 The learning algorithm looks for the best one of the |𝐼| possible hypotheses 29

Computational Learning Theory: Agnostic Learning Machine Learning - PowerPoint PPT Presentation

Computational Learning Theory: Agnostic Learning Machine Learning 1 Slides based on material from Dan Roth, Avrim Blum, Tom Mitchell and others This lecture: Computational Learning Theory The Theory of Generalization Probably

LANGUAGE-AGNOSTIC INJECTION LANGUAGE-AGNOSTIC INJECTION DETECTION DETECTION Lars Hermerschmidt,

MANA for MPI MPI-Agnostic Network-Agnostic Transparent Checkpointing Rohan Garg, *Gregory Price,

Pool-based Agnostic Pool-based Agnostic Experiment Design Experiment Design in Linear

Bayesian Model-Agnostic Meta-Learning Taesup Kim* (presenter), Jaesik Yoon* Ousmane Dia,

CS485/685 Lecture 16: March 1, 2012 Agnostic Learning [BDSS] Chapters 2, 3 CS485/685 (c) 2012 P.

Illustrating Agnostic Learning We want a classifier to distinguish between cats and dogs Image 1

Task-Agnostic Dynamics Priors for Deep Reinforcement Learning Yilun Du 1 , Karthik Narasimhan 2

Computational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 7:

Category & Progression Specific Programming Model for Industry Agnostic Incubators Sean

Why Im NOT Why Im NOT Jewish/ Christian Atheist Agnostic Hindu Muslim Buddhist

Why Im NOT Why Im NOT Jewish/ Christian Atheist Agnostic Hindu Muslim Buddhist

Why Im NOT Why Im NOT Jewish/ Christian Atheist Agnostic Hindu Muslim Buddhist

Why Im NOT Why Im NOT Jewish/ Christian Atheist Agnostic Hindu Muslim Buddhist

Why Im NOT Why Im NOT Jewish/ Christian Atheist Agnostic Hindu Muslim Buddhist

Why Im NOT Why Im NOT Jewish/ Christian Atheist Agnostic Hindu Muslim Buddhist

Why Im NOT Why Im NOT Jewish/ Christian Atheist Agnostic Hindu Muslim Buddhist

Generalization Bounds and Stability Lorenzo Rosasco Tomaso Poggio 9.520 Class 6 February, 23

Bootstrapping and Learning PDFA in Data Streams Borja Balle , Jorge Castro, Ricard Gavald` a

Flavor Physics: Past, Present, Future Indirect Searches for NP at the Time of LHC GGI, Florence,

Foundation of Cryptography (0368-4162-01), Lecture 3 Hardcore Predicates for Any One-way Function

CSCE 478/878 Lecture 3: Computational Learning Theory Examines the worst-case minimum and

Computational Learning Theory 1 / 22 Decidability Computation Decidability which

Algorithms at Scale (Week 2) Puzzle of the Day: A bag contains a collection of blue and red

Learning sums of ridge functions in high dimension: a nonlinear compressed sensing model Massimo

Computational Learning Theory: Agnostic Learning Machine Learning - PowerPoint PPT Presentation

Computational Learning Theory: Agnostic Learning Machine Learning 1 Slides based on material from Dan Roth, Avrim Blum, Tom Mitchell and others This lecture: Computational Learning Theory The Theory of Generalization Probably

LANGUAGE-AGNOSTIC INJECTION LANGUAGE-AGNOSTIC INJECTION DETECTION DETECTION Lars Hermerschmidt,

MANA for MPI MPI-Agnostic Network-Agnostic Transparent Checkpointing Rohan Garg, *Gregory Price,

Pool-based Agnostic Pool-based Agnostic Experiment Design Experiment Design in Linear

Bayesian Model-Agnostic Meta-Learning Taesup Kim* (presenter), Jaesik Yoon* Ousmane Dia,

CS485/685 Lecture 16: March 1, 2012 Agnostic Learning [BDSS] Chapters 2, 3 CS485/685 (c) 2012 P.

Illustrating Agnostic Learning We want a classifier to distinguish between cats and dogs Image 1

Task-Agnostic Dynamics Priors for Deep Reinforcement Learning Yilun Du 1 , Karthik Narasimhan 2

Computational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 7:

Category &amp; Progression Specific Programming Model for Industry Agnostic Incubators Sean

Why Im NOT Why Im NOT Jewish/ Christian Atheist Agnostic Hindu Muslim Buddhist

Why Im NOT Why Im NOT Jewish/ Christian Atheist Agnostic Hindu Muslim Buddhist

Why Im NOT Why Im NOT Jewish/ Christian Atheist Agnostic Hindu Muslim Buddhist

Why Im NOT Why Im NOT Jewish/ Christian Atheist Agnostic Hindu Muslim Buddhist

Why Im NOT Why Im NOT Jewish/ Christian Atheist Agnostic Hindu Muslim Buddhist

Why Im NOT Why Im NOT Jewish/ Christian Atheist Agnostic Hindu Muslim Buddhist

Why Im NOT Why Im NOT Jewish/ Christian Atheist Agnostic Hindu Muslim Buddhist

Generalization Bounds and Stability Lorenzo Rosasco Tomaso Poggio 9.520 Class 6 February, 23

Bootstrapping and Learning PDFA in Data Streams Borja Balle , Jorge Castro, Ricard Gavald` a

Flavor Physics: Past, Present, Future Indirect Searches for NP at the Time of LHC GGI, Florence,

Foundation of Cryptography (0368-4162-01), Lecture 3 Hardcore Predicates for Any One-way Function

CSCE 478/878 Lecture 3: Computational Learning Theory Examines the worst-case minimum and

Computational Learning Theory 1 / 22 Decidability Computation Decidability which

Algorithms at Scale (Week 2) Puzzle of the Day: A bag contains a collection of blue and red

Learning sums of ridge functions in high dimension: a nonlinear compressed sensing model Massimo

Category & Progression Specific Programming Model for Industry Agnostic Incubators Sean