CS485/685 Lecture 16: March 1, 2012
Agnostic Learning [BDSS] Chapters 2, 3
(c) 2012 P. Poupart

  1. Agnostic PAC Learning
     • Definition: A learner that doesn't assume that $\mathcal{H}$ contains an error-free hypothesis, and that simply finds the hypothesis with minimum training error, is often called an agnostic learner.

  2. Agnostic PAC Learnability
     • A hypothesis class $\mathcal{H}$ is agnostic PAC learnable if for any $\epsilon > 0$, $\delta \in (0,1)$, there exists an $N(\epsilon, \delta)$ and a learning algorithm such that for any distribution $D$ and $N$ i.i.d. samples it returns $h_S \in \mathcal{H}$ such that with probability $1 - \delta$:
       $L_D(h_S) \le \min_{h \in \mathcal{H}} L_D(h) + \epsilon$

     $\epsilon$-representative
     • Definition: A training set $S$ is called $\epsilon$-representative if $\forall h \in \mathcal{H},\ |L_S(h) - L_D(h)| \le \epsilon/2$.
     • Lemma: Assume that a training set $S$ is $\epsilon$-representative. Then any output $h_S$ of an empirical risk minimizing algorithm satisfies
       $L_D(h_S) \le \min_{h \in \mathcal{H}} L_D(h) + \epsilon$
     • Proof: For any $h \in \mathcal{H}$,
       $L_D(h_S) \le L_S(h_S) + \epsilon/2 \le L_S(h) + \epsilon/2 \le L_D(h) + \epsilon/2 + \epsilon/2 = L_D(h) + \epsilon$
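The empirical risk minimization rule the lemma speaks about can be sketched in a few lines. This is my own illustration, not from the slides: the threshold class `H`, the target threshold 0.42, and the 10% label noise are all hypothetical choices made so that no hypothesis is error-free, i.e. the setting is genuinely agnostic.

```python
# A minimal sketch of empirical risk minimization (ERM) over a finite
# hypothesis class. All concrete choices below are illustrative.
import random

def empirical_risk(h, sample):
    """L_S(h): fraction of training examples that h misclassifies."""
    return sum(h(x) != y for x, y in sample) / len(sample)

def erm(H, sample):
    """Return a hypothesis in H with minimum training error."""
    return min(H, key=lambda h: empirical_risk(h, sample))

# Hypothetical finite class: 11 threshold functions on [0, 1].
H = [lambda x, t=t: x >= t for t in [i / 10 for i in range(11)]]

# Labels flipped with probability 0.1, so no h in H is error-free:
# this is exactly the agnostic setting.
random.seed(0)
xs = [random.random() for _ in range(200)]
sample = [(x, (x >= 0.42) != (random.random() < 0.1)) for x in xs]

h_S = erm(H, sample)
print("training error of ERM output:", empirical_risk(h_S, sample))
```

By construction $L_S(h_S) \le L_S(h)$ for every $h$ in the class; the lemma then bounds the true risk of $h_S$ whenever the training set is $\epsilon$-representative.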

  3. Uniform Convergence
     • Definition: A hypothesis class $\mathcal{H}$ has the uniform convergence property if there exists a function $N^{UC}: (0,1)^2 \to \mathbb{N}$ such that for every probability distribution $D$, if $S$ is a sample of $N \ge N^{UC}(\epsilon, \delta)$ examples drawn i.i.d. according to $D$, then with probability at least $1 - \delta$, $S$ is $\epsilon$-representative.
     • Corollary 2: If a class $\mathcal{H}$ has the uniform convergence property with a function $N^{UC}$, then the class is agnostically PAC learnable with sample complexity $N(\epsilon, \delta) \le N^{UC}(\epsilon, \delta)$. Furthermore, an empirical risk minimization algorithm is a successful agnostic PAC learner for $\mathcal{H}$.

  4. Uniform Convergence
     • To show that uniform convergence holds, show that:
       1. $|L_S(h) - L_D(h)|$ is likely to be small for any fixed hypothesis (chosen before seeing the data).
       2. Think of $L_S(h)$ as a random variable with mean $L_D(h)$. Then the distribution of $L_S(h)$ is concentrated around its mean for all $h \in \mathcal{H}$.

     Measure Concentration
     • Let $\theta_1, \dots, \theta_N$ be random variables with mean $\mu$. Then as $N \to \infty$, $\frac{1}{N}\sum_{i=1}^N \theta_i \to \mu$.
     • Use measure concentration inequalities to quantify the deviation of $\frac{1}{N}\sum_{i=1}^N \theta_i$ from $\mu$ for finite $N$.
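The convergence of the sample mean to $\mu$ is easy to see empirically. A quick simulation (my own illustration, not from the slides) with i.i.d. Uniform(0, 1) variables, whose true mean is $\mu = 0.5$:

```python
# The deviation |sample mean - mu| shrinks as N grows, illustrating the
# measure-concentration phenomenon the slide describes.
import random

random.seed(1)
MU = 0.5  # true mean of Uniform(0, 1)

def sample_mean(N):
    return sum(random.random() for _ in range(N)) / N

for N in [10, 1000, 100000]:
    print(N, abs(sample_mean(N) - MU))
```

The inequalities on the next slides quantify exactly how fast this deviation shrinks.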

  5. Markov's Inequality
     • Markov's inequality: for a nonnegative random variable $Z$ and any $a > 0$,
       $\Pr(Z \ge a) \le \frac{E[Z]}{a}$
     • Derivation:
       $E[Z] = \int_0^\infty z \Pr(z)\, dz \ge \int_a^\infty z \Pr(z)\, dz \ge \int_a^\infty a \Pr(z)\, dz = a \Pr(Z \ge a)$

     Chebyshev's Inequality
     • Bound the deviation from the mean on both sides:
       $\Pr(|Z - E[Z]| \ge a) = \Pr\!\left((Z - E[Z])^2 \ge a^2\right) \le \frac{Var[Z]}{a^2}$
     • Since $Var\!\left[\frac{1}{N}\sum_i \theta_i\right] = \frac{Var[\theta]}{N}$ for i.i.d. $\theta_i$'s, then
       $\Pr\!\left(\left|\frac{1}{N}\sum_i \theta_i - \mu\right| \ge a\right) \le \frac{Var[\theta]}{N a^2}$
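Markov's inequality can be checked numerically. The following sketch (my own, not from the slides) uses $Z \sim \text{Exponential}(1)$, which is nonnegative with $E[Z] = 1$, and compares the empirical tail probability against the bound $E[Z]/a$:

```python
# Empirical check of Markov's inequality: Pr(Z >= a) <= E[Z] / a
# for a nonnegative random variable Z, here Z ~ Exponential(1).
import random

random.seed(2)
zs = [random.expovariate(1.0) for _ in range(100000)]
ez = sum(zs) / len(zs)  # empirical estimate of E[Z] (close to 1)

for a in [1.0, 2.0, 4.0]:
    pr = sum(z >= a for z in zs) / len(zs)
    print(a, pr, ez / a)  # empirical Pr(Z >= a) vs Markov bound E[Z]/a
```

For the exponential the true tail is $e^{-a}$, so the Markov bound holds but is quite loose; Chebyshev tightens it by applying Markov to the squared deviation $(Z - E[Z])^2$.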

  6. Chebyshev's Inequality
     • Lemma: Let $\theta_1, \dots, \theta_N$ be i.i.d. with $E[\theta_i] = \mu$ and $Var[\theta_i] \le 1$ for all $i$. Then for any $\delta \in (0,1)$, with probability $1 - \delta$ we have
       $\left|\frac{1}{N}\sum_i \theta_i - \mu\right| \le \sqrt{\frac{1}{\delta N}}$
     • Proof: Let $\delta = \Pr\!\left(\left|\frac{1}{N}\sum_i \theta_i - \mu\right| \ge a\right)$. By Chebyshev's inequality, $\delta \le \frac{Var[\theta]}{N a^2} \le \frac{1}{N a^2}$. Solving $\delta = \frac{1}{N a^2}$ for $a$ gives $a = \sqrt{\frac{1}{\delta N}}$.

     Hoeffding's Inequality
     • Tighter bound than Chebyshev's inequality.
     • Let $\theta_1, \dots, \theta_N$ be i.i.d. variables with mean $\mu$.
     • Assume that $\Pr(a \le \theta_i \le b) = 1$.
     • Then $\Pr\!\left(\left|\frac{1}{N}\sum_i \theta_i - \mu\right| > \epsilon\right) \le 2 \exp\!\left(\frac{-2N\epsilon^2}{(b-a)^2}\right)$
     • Hence, for variables bounded in $[0,1]$, $\Pr\!\left(\left|\frac{1}{N}\sum_i \theta_i - \mu\right| > \epsilon\right) \le 2 e^{-2N\epsilon^2}$
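The two tail bounds can be compared directly. A small sketch (my own, not from the slides) for $[0,1]$-valued variables, where $b - a = 1$ and the variance is at most $1/4$ (an assumption I add; it holds for any $[0,1]$-bounded variable):

```python
# Chebyshev vs Hoeffding bounds on Pr(|sample mean - mu| >= eps) for
# N i.i.d. [0,1]-valued variables. Hoeffding decays exponentially in N,
# Chebyshev only as 1/N, so Hoeffding wins for large N.
import math

def chebyshev_bound(N, eps, var=0.25):
    # Pr(|mean - mu| >= eps) <= Var[theta] / (N eps^2)
    return var / (N * eps * eps)

def hoeffding_bound(N, eps):
    # Pr(|mean - mu| > eps) <= 2 exp(-2 N eps^2)  (b - a = 1)
    return 2 * math.exp(-2 * N * eps * eps)

for N in [100, 1000, 10000]:
    print(N, chebyshev_bound(N, 0.1), hoeffding_bound(N, 0.1))
```

At small $N$ the two bounds are comparable, but the exponential decay of Hoeffding quickly dominates, which is what makes the finite-class sample complexity on the next slide logarithmic in $|\mathcal{H}|$.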

  7. Agnostic PAC Learnability
     • Theorem: Let $\mathcal{H}$ be finite, $\delta \in (0,1)$, $\epsilon > 0$ and $N \ge \frac{2 \log(2|\mathcal{H}|/\delta)}{\epsilon^2}$. Then with probability at least $1 - \delta$ we have
       $L_D(h_S) \le \min_{h \in \mathcal{H}} L_D(h) + \epsilon$
     • Proof: From Corollary 2, it suffices to show that
       $\Pr\!\left(\exists h \in \mathcal{H},\ |L_S(h) - L_D(h)| > \frac{\epsilon}{2}\right) \le \delta$
       Using the union bound and Hoeffding's inequality (losses bounded in $[0,1]$):
       $\Pr\!\left(\exists h \in \mathcal{H},\ |L_S(h) - L_D(h)| > \frac{\epsilon}{2}\right) \le \sum_{h \in \mathcal{H}} \Pr\!\left(|L_S(h) - L_D(h)| > \frac{\epsilon}{2}\right) \le 2|\mathcal{H}|\, e^{-2N(\epsilon/2)^2} = 2|\mathcal{H}|\, e^{-N\epsilon^2/2} \le \delta$
       since $N \ge \frac{2 \log(2|\mathcal{H}|/\delta)}{\epsilon^2}$.
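The theorem's sample-size condition is easy to evaluate numerically. A small sketch (my own, not from the slides) that computes $N(\epsilon, \delta) = \lceil 2 \log(2|\mathcal{H}|/\delta) / \epsilon^2 \rceil$ for a finite class:

```python
# Sample complexity for agnostic PAC learning a finite hypothesis class,
# using the bound N >= 2 log(2|H|/delta) / eps^2 (ceiling taken to get
# an integer number of examples).
import math

def sample_complexity(H_size, eps, delta):
    return math.ceil(2 * math.log(2 * H_size / delta) / (eps * eps))

# E.g. |H| = 1000, eps = 0.05, delta = 0.01:
print(sample_complexity(1000, 0.05, 0.01))
```

Note the dependence on $|\mathcal{H}|$ is only logarithmic (doubling the class size adds a constant number of samples), while the dependence on $\epsilon$ is quadratic.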
