  1. Sample Complexity Bounds for Active Learning
     Paper by Sanjoy Dasgupta
     Presenter: Peter Sadowski

  2. Passive PAC Learning Complexity
     - Based on VC dimension.
     - To get error < ǫ with probability ≥ 1 − δ: num samples ≥ Õ( (1/ǫ) · VC(H) · log(1/δ) ).
     - Is there some equivalent for active learning?
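
(Not from the slides: as a rough illustration of the bound, a back-of-the-envelope sample-size estimate in Python. The Õ hides constants and log factors, so the numbers are only order-of-magnitude; the function name is my own.)

```python
import math

def passive_sample_estimate(vc_dim, eps, delta):
    """Order-of-magnitude passive PAC sample size: (1/eps) * VC(H) * log(1/delta).
    The O-tilde in the slide's bound hides constants and log factors."""
    return math.ceil(vc_dim * math.log(1.0 / delta) / eps)

# Example: thresholds on the line (VC dimension 1), eps = 0.01, delta = 0.05
print(passive_sample_estimate(1, 0.01, 0.05))  # roughly 300 labeled examples
```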

  3. Example: Reals in 1-D
     - P = underlying distribution of points; H = space of possible hypotheses.
     - H = { h_w : w ∈ ℝ }, where h_w(x) = 1 if x ≥ w and 0 if x < w.
     - O(1/ǫ) random labeled examples needed from P to get error rate < ǫ.

  4. Example: Reals in 1-D
     - h_w(x) = 1 if x ≥ w, 0 if x < w.
     - Passive learning: O(1/ǫ) random labeled examples needed from P to get error rate < ǫ.
     - Active learning (binary search): O(log(1/ǫ)) label queries needed to get error < ǫ.
     - Active learning gives us an exponential improvement! (See the sketch below.)
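
(Not from the slides: a minimal sketch of the binary-search active learner for 1-D thresholds. It queries labels only at the midpoint of the current uncertainty interval, so about log₂ of the pool size label queries suffice; the threshold 0.37 and the uniform pool below are made-up illustration values.)

```python
import random

def active_learn_threshold(pool, label):
    """Binary search for the threshold w over a sorted pool of unlabeled points,
    using only O(log |pool|) calls to the label oracle."""
    pts = sorted(pool)
    lo, hi = 0, len(pts)          # invariant: first positively labeled point lies in pts[lo:hi]
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if label(pts[mid]) == 1:  # h_w(x) = 1 means x >= w
            hi = mid
        else:
            lo = mid + 1
    w_hat = pts[lo] if lo < len(pts) else float("inf")
    return w_hat, queries

# Made-up example: true threshold 0.37, pool of 10,000 unlabeled uniform draws.
random.seed(0)
pool = [random.random() for _ in range(10_000)]
w_hat, n_queries = active_learn_threshold(pool, lambda x: int(x >= 0.37))
print(w_hat, n_queries)          # w_hat near 0.37 after about 14 label queries
```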

  5. Example 2: Points on a Circle
     - P = some density on the circle perimeter.
     - H = linear separators in R^2.
     [Figure: three example separators h1, h2, h3.]

  6. Example 2: Points on a Circle
     - Worst case: a small ǫ-slice of the circle is labeled differently.
     - Passive learning: O(1/ǫ).
     - Active learning: O(1/ǫ). No improvement!

  7. Active Learning Abstracted
     - Goal: narrow down the version space (the hypotheses consistent with the known labels).
     - Idea: think of hypotheses as points; observing the label of x cuts the version space into the part consistent with x = 1 and the part consistent with x = 0.
     [Figure: version space before and after observing x.]

  8. Shrinking the Version Space
     - Define the distance between hypotheses: d(h, h') = P{ x : h(x) ≠ h'(x) }.
     - Ignore distances less than ǫ: let Q = H × H and Q_ǫ = { (h, h') ∈ Q : d(h, h') > ǫ }.
     [Figure: a good cut.]
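
(Not from the slides: a small sketch of these definitions for a finite hypothesis set, where hypotheses are 0/1-valued callables, d(h, h') is approximated by the disagreement rate on an unlabeled sample from P, and Q_ǫ keeps only the pairs that remain significantly different. The function names are my own.)

```python
from itertools import combinations

def disagreement(h, h_prime, unlabeled):
    """Empirical estimate of d(h, h') = P{ x : h(x) != h'(x) } from an unlabeled sample."""
    return sum(h(x) != h_prime(x) for x in unlabeled) / len(unlabeled)

def eps_edges(hypotheses, unlabeled, eps):
    """Q_eps: unordered hypothesis pairs (as index pairs) whose estimated distance exceeds eps."""
    return {
        (i, j)
        for i, j in combinations(range(len(hypotheses)), 2)
        if disagreement(hypotheses[i], hypotheses[j], unlabeled) > eps
    }
```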

  9. Quick Example
     - What is the best cut? Q_ǫ = { (h, h') ∈ Q : d(h, h') > ǫ }

  10. Quick Example
     - Cutting edges shrinks the version space.
     - After this cut, we have a solution: the hypotheses left differ only insignificantly (by less than ǫ).

  11. Quantifying the “Usefulness” of Points
     - A point x ∈ X is said to ρ-split Q_ǫ if querying its label reduces the number of edges by at least a fraction ρ > 0, whichever label is returned.
     [Figure: example points that ¼-split, 1-split, and ¾-split an edge set.]
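
(Not from the slides: continuing the same sketch, the worst-case split fraction of a candidate query x. Whichever label comes back, an edge survives only if both endpoints predict that label, so x ρ-splits Q_ǫ when even the less informative label removes at least a fraction ρ of the edges.)

```python
def split_fraction(x, edges, hypotheses):
    """Worst-case fraction of Q_eps edges removed by querying the label of x.
    An edge (h, h') survives the observed label y only if h(x) == y == h'(x)."""
    if not edges:
        return 1.0
    surviving = {
        y: sum(hypotheses[i](x) == y and hypotheses[j](x) == y for i, j in edges)
        for y in (0, 1)
    }
    return 1.0 - max(surviving.values()) / len(edges)
```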

  12. Quantifying the Difficulty of Problems
     - Definition: a subset S of hypotheses is (ρ, ǫ, τ)-splittable if P{ x : x ρ-splits Q_ǫ } ≥ τ.
     - “At least a fraction τ of samples are ρ-useful in splitting S.”
     - Small ρ ⇒ smaller splits; small ǫ ⇒ small error; small τ ⇒ many samples needed to get a good split.
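
(Not from the slides: splittability can then be checked empirically by reusing split_fraction from the sketch above: draw points from P and measure what fraction of them ρ-split Q_ǫ.)

```python
def estimate_tau(sample, edges, hypotheses, rho):
    """Empirical estimate of P{ x : x rho-splits Q_eps } from a sample drawn from P."""
    useful = sum(split_fraction(x, edges, hypotheses) >= rho for x in sample)
    return useful / len(sample)

# S is (rho, eps, tau)-splittable when this estimate is at least tau (up to sampling error).
```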

  13. Lower Bound Result
     - Suppose that for some hypothesis space H there are hypotheses h_0, h_1, ..., h_N with d(h_0, h_i) > ǫ for each i, and the “disagree sets” { x : h_0(x) ≠ h_i(x) } are disjoint.
     - Then, for any τ and any ρ > 1/N, Q is not (ρ, ǫ, τ)-splittable.

  14. An Interesting Result
     - There is a constant c > 0 such that for any dimension d ≥ 2, if (1) H is the class of homogeneous linear separators in R^d, and (2) P is the uniform distribution over the surface of the unit sphere, then H is (1/4, ǫ, cǫ)-splittable for all ǫ > 0.
     - ⇒ For any h ∈ H and any ǫ ≤ 1/(32π√d), B(h, 4ǫ) is (1/4, ǫ, cǫ/√d)-splittable.

  15. Conclusions
     - Active learning is not always much better than passive learning.
     - “Splittability” plays the role for active learning that the VC dimension plays for passive learning.
     - This framework can be used to derive bounds for specific problems.
