Uniform Convergence - Sample Complexity
SLIDE 1
SLIDE 2
What’s Learning?
Two types of learning: What's a rectangle?
- "A rectangle is any quadrilateral with four right angles."
- Here are many random examples of rectangles, and here are many random examples of shapes that are not rectangles. Make your own rule that best conforms with the examples - Statistical Learning.
SLIDE 3
Statistical Learning – Learning From Examples
- We want to estimate the working temperature range of an
iPhone.
- We could study the physics and chemistry that affect the performance of the phone - too hard.
- We could sample temperatures in [-100C, +100C] and check if the iPhone works at each of these temperatures.
- We could sample users' iPhones for failures vs. temperature.
- How many samples do we need?
- How good is the result?
[Figure: the interval [-100C, +100C] with the learned working range marked by endpoints a and b]
SLIDE 4
Learning From Examples
- We get n random training examples from distribution D. We choose a rule [a, b] that conforms with the examples.
- We use this rule to decide on the next example.
- If the next example is drawn from D, what is the probability that we are wrong?
- Let [c, d] be the correct rule.
- Let ∆ = ([a, b] − [c, d]) ∪ ([c, d] − [a, b])
- We are wrong only on examples in ∆.
SLIDE 5
What’s the probability that we are wrong?
- We are wrong only on examples in ∆.
- The probability that we are wrong is the probability of getting a query from ∆.
- If Prob(sample from ∆) ≤ ǫ we don’t care.
- If Prob(sample from ∆) ≥ ǫ then the probability that n training samples all missed ∆ is bounded by (1 − ǫ)^n ≤ δ, for n ≥ (1/ǫ) log(1/δ).
- Thus, with n ≥ (1/ǫ) log(1/δ) training samples, with probability 1 − δ, we choose a rule (interval) that gives the correct answer for queries from D with probability ≥ 1 − ǫ.
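A quick numeric sanity check of this bound (a minimal Python sketch; the function name and the example values of ǫ, δ are illustrative, not from the slides):

import math

def sample_bound(eps, delta):
    """Smallest integer n with n >= (1/eps) * ln(1/delta)."""
    return math.ceil(math.log(1.0 / delta) / eps)

for eps, delta in [(0.1, 0.05), (0.05, 0.01), (0.01, 0.001)]:
    n = sample_bound(eps, delta)
    # probability that all n samples missed a region of probability eps
    miss_prob = (1.0 - eps) ** n
    print(f"eps={eps}, delta={delta}: n={n}, (1-eps)^n={miss_prob:.4g}")

In each case (1 − ǫ)^n comes out below δ, as the union-bound argument above predicts.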
SLIDE 6
Learning a Binary Classifier
- An unknown probability distribution D on a domain U
- An unknown correct classification - a partition c of U into In and Out sets
- Input:
- Concept class C – a collection of possible classification rules
(partitions of U).
- A training set {(xi, c(xi)) | i = 1, . . . , m}, where x1, . . . , xm are
sampled from D.
- Goal: With probability 1 − δ the algorithm generates a good
classifier. A classifier is good if the probability that it errs on an item generated from D is ≤ opt(C) + ǫ, where opt(C) is the error probability of the best classifier in C.
SLIDE 7
Learning a Binary Classifier
- Out and In items, and a concept class C of possible classification rules
SLIDE 8
When does the sample identify the correct rule? - The realizable case
- The realizable case - the correct classification c ∈ C.
- For any h ∈ C let ∆(c, h) be the set of items on which the two classifiers differ: ∆(c, h) = {x ∈ U | h(x) ≠ c(x)}
- Algorithm: choose h∗ ∈ C that agrees with all the training set
(there must be at least one).
- If the sample (training set) intersects every set in
{∆(c, h) | Pr(∆(c, h)) ≥ ǫ}, then Pr(∆(c, h∗)) ≤ ǫ.
SLIDE 9
Learning a Binary Classifier
- Red and blue items, possible classification rules, and the sample items
SLIDE 10
When does the sample identify the correct rule? The unrealizable (agnostic) case
- The unrealizable case - c may not be in C.
- For any h ∈ C, let ∆(c, h) be the set of items on which the two classifiers differ: ∆(c, h) = {x ∈ U | h(x) ≠ c(x)}
- For the training set {(xi, c(xi)) | i = 1, . . . , m}, let ˜Pr(∆(c, h)) = (1/m) Σ_{i=1..m} 1[h(xi) ≠ c(xi)]
- Algorithm: choose h∗ = arg min_{h∈C} ˜Pr(∆(c, h)).
- If for every set ∆(c, h), |Pr(∆(c, h)) − ˜Pr(∆(c, h))| ≤ ǫ, then Pr(∆(c, h∗)) ≤ opt(C) + 2ǫ, where opt(C) is the error probability of the best classifier in C.
SLIDE 11
If for every set ∆(c, h), |Pr(∆(c, h)) − ˜Pr(∆(c, h))| ≤ ǫ, then Pr(∆(c, h∗)) ≤ opt(C) + 2ǫ, where opt(C) is the error probability of the best classifier in C.
Proof. Let h̄ be the best classifier in C. Since the algorithm chose h∗, ˜Pr(∆(c, h∗)) ≤ ˜Pr(∆(c, h̄)). Thus,
Pr(∆(c, h∗)) − opt(C) ≤ ˜Pr(∆(c, h∗)) − opt(C) + ǫ ≤ ˜Pr(∆(c, h̄)) − opt(C) + ǫ ≤ 2ǫ.
SLIDE 12
Detection vs. Estimation
- Input:
- Concept class C – a collection of possible classification rules
(partitions of U).
- A training set {(xi, c(xi)) | i = 1, . . . , m}, where x1, . . . , xm are
sampled from D.
- For any h ∈ C, let ∆(c, h) be the set of items on which the two classifiers differ: ∆(c, h) = {x ∈ U | h(x) ≠ c(x)}
- For the realizable case we need a training set (sample) that
with probability 1 − δ intersects every set in {∆(c, h) | Pr(∆(c, h)) ≥ ǫ} (ǫ-net)
- For the unrealizable case we need a training set that with probability 1 − δ estimates, within additive error ǫ, the probability of every set ∆(c, h) = {x ∈ U | h(x) ≠ c(x)} (ǫ-sample).
SLIDE 13
Uniform Convergence Sets
Given a collection R of sets in a universe X, under what conditions does a finite sample N from an arbitrary distribution D over X satisfy, with probability 1 − δ:
1. ∀r ∈ R, Pr_D(r) ≥ ǫ ⇒ r ∩ N ≠ ∅ (ǫ-net)
2. ∀r ∈ R, |Pr_D(r) − |N ∩ r| / |N|| ≤ ǫ (ǫ-sample)
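To make the two conditions concrete, here is a small Python sketch (purely illustrative; in the lecture D is unknown and arbitrary, here it is a known toy distribution) that checks whether a sample N is an ǫ-net and an ǫ-sample for a finite collection of sets R:

import random

def is_eps_net(sets, prob, sample, eps):
    # every set r with prob(r) >= eps must intersect the sample
    return all(any(x in r for x in sample) for r in sets if prob(r) >= eps)

def is_eps_sample(sets, prob, sample, eps):
    # every set's empirical frequency must be within eps of its true probability
    n = len(sample)
    return all(abs(prob(r) - sum(x in r for x in sample) / n) <= eps for r in sets)

# Toy example: uniform distribution on {0,...,99}, R = all intervals of length 10.
R = [set(range(i, i + 10)) for i in range(0, 91)]
prob = lambda r: len(r) / 100.0

random.seed(0)
N = [random.randrange(100) for _ in range(60)]
print("eps-net    (eps=0.1):", is_eps_net(R, prob, N, 0.1))
print("eps-sample (eps=0.1):", is_eps_sample(R, prob, N, 0.1))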
SLIDE 14
Learnability - Uniform Convergence
Theorem In the realizable case, any concept class C can be learned with m = (1/ǫ)(ln |C| + ln(1/δ)) samples.
Proof. We need a sample that intersects every set in the family of sets {∆(c, c′) | Pr(∆(c, c′)) ≥ ǫ}. There are at most |C| such sets, and the probability that a sample is chosen inside such a set is ≥ ǫ. The probability that m random samples did not intersect with at least one of the sets is bounded by |C|(1 − ǫ)^m ≤ |C|e^{−ǫm} ≤ |C|e^{−(ln |C| + ln(1/δ))} ≤ δ.
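A one-line illustration of this sample size (the helper name and the example values are made up for illustration):

import math

def realizable_sample_size(num_concepts, eps, delta):
    # m = (1/eps) * (ln|C| + ln(1/delta)) from the theorem above, rounded up
    return math.ceil((math.log(num_concepts) + math.log(1.0 / delta)) / eps)

print(realizable_sample_size(10**6, 0.05, 0.01))  # about 369 samples for |C| = 10^6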
SLIDE 15
How Good is this Bound?
- Assume that we want to estimate the working temperature range of an iPhone.
- We sample temperatures in [-100C, +100C] and check if the iPhone works at each of these temperatures.
[Figure: the interval [-100C, +100C] with the working range marked by endpoints a and b]
SLIDE 16
Learning an Interval
- A distribution D is defined on a universe that is an interval [A, B].
- The true classification rule is defined by a sub-interval
[a, b] ⊆ [A, B].
- The concept class C is the collection of all intervals, C = {[c, d] | [c, d] ⊆ [A, B]}.
Theorem There is a learning algorithm that, given a sample from D of size m = (2/ǫ) ln(2/δ), with probability 1 − δ returns a classification rule (interval) [x, y] that is correct with probability ≥ 1 − ǫ.
Note that the sample size is independent of the size of the concept class |C|, which is infinite.
SLIDE 17
Learning an Interval
- If the classification error is ≥ ε then the sample missed at least one of the intervals [a, a'] or [b', b], each of probability ≥ ε/2.
[Figure: the interval [A, B] with the true rule [a, b], the learned rule [x, y], and the sub-intervals [a, a'] and [b', b], each of probability ε/2]
- Each sample excludes many possible intervals. The union bound sums over overlapping hypotheses. We need a better characterization of a concept class's complexity!
SLIDE 18
Proof. Algorithm: Choose the smallest interval [x, y] that includes all the "In" sample points.
- Clearly a ≤ x < y ≤ b, and the algorithm can only err by classifying "In" points as "Out" points.
- Fix a < a′ and b′ < b such that Pr([a, a′]) = ǫ/2 and Pr([b′, b]) = ǫ/2.
- If the probability of error when using the classification [x, y] is
≥ ǫ then either a′ ≤ x or y ≤ b′ or both.
- The probability that the sample of size m = (2/ǫ) ln(2/δ) did not intersect with one of these intervals is bounded by 2(1 − ǫ/2)^m ≤ e^{−ǫm/2 + ln 2} ≤ δ.
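A minimal simulation of this algorithm and bound (an illustrative sketch: the uniform distribution on [A, B] and all constants are assumptions, not from the slides):

import math
import random

def learn_interval(sample):
    # smallest interval [x, y] containing all points labeled "In"
    ins = [t for t, label in sample if label]
    return (min(ins), max(ins)) if ins else None

A, B, a, b = -100.0, 100.0, 0.0, 35.0            # assumed universe and true rule
eps, delta = 0.05, 0.05
m = math.ceil((2 / eps) * math.log(2 / delta))    # m = (2/eps) ln(2/delta)

random.seed(1)
sample = [(t, a <= t <= b) for t in (random.uniform(A, B) for _ in range(m))]
rule = learn_interval(sample)
if rule is None:
    print("no 'In' points in the sample")
else:
    x, y = rule
    # under the uniform distribution the error region [a, x) U (y, b] has probability:
    error = ((x - a) + (b - y)) / (B - A)
    print(f"m={m}, learned [{x:.2f}, {y:.2f}], error ≈ {error:.4f} (target eps={eps})")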
SLIDE 19
- The union bound is far too loose for our applications. It sums over overlapping hypotheses.
- Each sample excludes many possible intervals.
- We need a better characterization of a concept class's complexity!
SLIDE 20
Probably Approximately Correct Learning (PAC Learning)
- The goal is to learn a concept (hypothesis) from a pre-defined
concept class. (An interval, a rectangle, a k-CNF boolean formula, etc.)
- There is an unknown distribution D on input instances.
- Correctness of the algorithm is measured with respect to the
distribution D.
- The goal: a polynomial time (and number of samples) algorithm that with probability 1 − δ computes a hypothesis of the target concept that is correct (on each instance) with probability 1 − ǫ.
SLIDE 21
Formal Definition
- We have a unit cost function Oracle(c, D) that produces a
pair (x, c(x)), where x is distributed according to D, and c(x) is the value of the concept c at x. Successive calls are independent.
- A concept class C over input set X is PAC learnable if there is
an algorithm L with the following properties: For every concept c ∈ C, every distribution D on X, and every 0 ≤ ǫ, δ ≤ 1/2,
- Given a function Oracle(c, D), ǫ and δ, with probability 1 − δ the algorithm outputs a hypothesis h ∈ C such that Pr_D(h(x) ≠ c(x)) ≤ ǫ.
- The concept class C is efficiently PAC learnable if the algorithm runs in time polynomial in the size of the problem, 1/ǫ and 1/δ.
————
So far we have shown that the concept class "intervals on the line" is efficiently PAC learnable.
SLIDE 22
Learning Axis-Aligned Rectangle
- Concept class: all axis aligned rectangles.
- Given m samples {xi, yi, class}, i = 1, . . . , m.
- Let R′ be the smallest rectangle that contains all the positive examples; let A(R′) denote the algorithm that outputs R′.
- Let R be the correct concept. W.l.o.g. Pr(R) > ǫ.
- Define 4 strips r1, r2, r3, r4 along the sides of R, each with probability ǫ/4.
- If the error of A(R′) is ≥ ǫ, then there is an i ∈ {1, 2, 3, 4} such that no training example fell in ri (otherwise R′ would intersect every strip and the error region R − R′ would have probability ≤ ǫ). Hence Pr(error of A(R′) ≥ ǫ) ≤ 4(1 − ǫ/4)^m.
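A sketch of the rectangle algorithm (the distribution and concept here are illustrative assumptions): R′ is the smallest axis-aligned rectangle containing all positive examples, so it can only err on positive points just outside R′.

import random

def tightest_rectangle(samples):
    # smallest axis-aligned rectangle containing all positively labeled points
    pos = [(x, y) for x, y, label in samples if label]
    if not pos:
        return None
    xs = [p[0] for p in pos]
    ys = [p[1] for p in pos]
    return (min(xs), max(xs), min(ys), max(ys))

# Assumed example: D uniform on the unit square, true concept R = [0.2, 0.7] x [0.3, 0.9].
def true_label(x, y):
    return 0.2 <= x <= 0.7 and 0.3 <= y <= 0.9

random.seed(2)
samples = [(x, y, true_label(x, y))
           for x, y in ((random.random(), random.random()) for _ in range(500))]
print(tightest_rectangle(samples))   # approaches (0.2, 0.7, 0.3, 0.9) from inside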
SLIDE 23
Learning Axis-Aligned Rectangle - More than One Solution
- Concept class: all axis aligned rectangles.
- Given m samples {xi, yi, class}, i = 1, . . . , m.
- Let R′ be the smallest rectangle that contains all the positive
examples.
- Let R′′ be the largest rectangle that contains no negative examples.
- Let R be the correct concept.
R′ ⊆ R ⊆ R′′
- Define 4 strips (inside R for R′, outside R for R′′), each with probability ǫ/4, along the sides of R: r1, r2, r3, r4.
Pr(error of A(R′) ≥ ǫ) ≤ 4(1 − ǫ/4)^m
SLIDE 24
Learning Boolean Conjunctions
- A Boolean literal is either x or x̄.
- A conjunction is of the form xi ∧ xj ∧ x̄k ∧ . . .
- C is the set of conjunctions of up to 2n literals.
- The input space is {0, 1}^n.
Theorem The class of conjunctions of Boolean literals is efficiently PAC learnable.
SLIDE 25
Proof
- Start with the hypothesis h = x1 ∧ x̄1 ∧ . . . ∧ xn ∧ x̄n.
- Ignore negative examples generated by Oracle(c, D).
- For a positive example (a1, . . . , an), if ai = 1 remove x̄i, otherwise remove xi from h.
Lemma At any step of the algorithm the current hypothesis never errs on a negative example. It may err on positive examples by not removing enough literals from h.
Proof. Initially the hypothesis has no satisfying assignment. It has a satisfying assignment only when no literal and its complement are both left in the hypothesis. A literal is removed only when it contradicts a positive example and thus cannot be in c. Hence literals of c are never removed. A negative example must contradict a literal in c, and thus is not satisfied by h.
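A sketch of this elimination algorithm (the representation of literals as (index, sign) pairs is an implementation choice, not from the slides):

def learn_conjunction(n, positive_examples):
    # start from all 2n literals; (i, True) stands for x_i, (i, False) for its negation
    h = {(i, s) for i in range(n) for s in (True, False)}
    for a in positive_examples:          # negative examples are ignored
        for i, bit in enumerate(a):
            # a positive example with a_i = 1 contradicts the negated literal, and vice versa
            h.discard((i, not bool(bit)))
    return h

def accepts(h, a):
    # the conjunction accepts a iff every remaining literal is satisfied
    return all(bool(a[i]) == s for i, s in h)

# Toy target c = x0 AND (not x2) over n = 3 variables.
positives = [(1, 0, 0), (1, 1, 0)]
h = learn_conjunction(3, positives)
print(sorted(h))                                        # remaining literals: x0 and the negation of x2
print(accepts(h, (1, 1, 0)), accepts(h, (0, 1, 0)))     # True False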
SLIDE 26
Analysis
- The learned hypothesis h can only err by rejecting a positive example (it rejects an input unless a similar positive example appeared in the training set).
- If h errs on a positive example then h has a literal that is not in c.
- Let z be a literal in h and not in c. Let p(z) = Pr_{a∼D}(c(a) = 1 and z = 0 in a).
- A literal z is "bad" if p(z) > ǫ/(2n).
- Let m ≥ (2n/ǫ)(ln(2n) + ln(1/δ)). The probability that after m samples there is any bad literal in the hypothesis is bounded by 2n(1 − ǫ/(2n))^m ≤ δ.
SLIDE 27
Two fundamental questions:
- What concept classes are PAC-learnable with a given number of training (random) examples?
- What concept classes are efficiently learnable (in polynomial time)?
There is a complete (and beautiful) characterization for the first question, and a not very satisfying answer for the second one. Some examples:
- Efficiently PAC learnable: intervals in R, rectangles in R^2, disjunctions of up to n variables, 3-CNF formulas, ...
- PAC learnable, but not in polynomial time (unless P = NP):
DNF formula, finite automata, ...
- Not PAC learnable: convex bodies in R^2, {sin(hx) | 0 ≤ h ≤ π}, ...
SLIDE 28
Uniform Convergence [Vapnik – Chervonenkis 1971]
Definition A set of functions F has the uniform convergence property with respect to a domain Z if there is a function mF(ǫ, δ) such that
- for any ǫ, δ > 0, m(ǫ, δ) < ∞
- for any distribution D on Z, and a sample z1, . . . , zm of size m = m_F(ǫ, δ),
Pr( sup_{f∈F} |(1/m) Σ_{i=1..m} f(zi) − E_D[f]| ≤ ǫ ) ≥ 1 − δ.
Let f_E(z) = 1_{z∈E}; then E[f_E(z)] = Pr(E).
SLIDE 29
Uniform Convergence and Learning
Definition A set of functions F has the uniform convergence property with respect to a domain Z if there is a function mF(ǫ, δ) such that
- for any ǫ, δ > 0, m(ǫ, δ) < ∞
- for any distribution D on Z, and a sample z1, . . . , zm of size m = m_F(ǫ, δ),
Pr( sup_{f∈F} |(1/m) Σ_{i=1..m} f(zi) − E_D[f]| ≤ ǫ ) ≥ 1 − δ.
- Let FH = {fh | h ∈ H}, where fh is the loss function for
hypothesis h.
- FH has the uniform convergence property ⇒ an ERM
(Empirical Risk Minimization) algorithm ”learns” H.
- The sample complexity of learning H is bounded by m_{F_H}(ǫ, δ).
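A minimal ERM sketch for a finite hypothesis class (all names, the threshold class, and the 0/1 loss are illustrative assumptions): pick the hypothesis with the smallest empirical loss; uniform convergence of F_H is exactly what guarantees that this empirical minimizer is close to the best hypothesis in H.

import random

def erm(hypotheses, training_set, loss):
    # return the hypothesis minimizing the average loss over the training set
    def empirical_risk(h):
        return sum(loss(h, x, y) for x, y in training_set) / len(training_set)
    return min(hypotheses, key=empirical_risk)

# Toy example: H = 101 threshold classifiers on [0, 1], 0/1 loss, noiseless labels.
hypotheses = [lambda x, t=t / 100: x >= t for t in range(101)]
zero_one = lambda h, x, y: int(h(x) != y)

random.seed(3)
train = [(x, x >= 0.42) for x in (random.random() for _ in range(200))]
best = erm(hypotheses, train, zero_one)
print(sum(zero_one(best, x, y) for x, y in train))   # empirical errors of the chosen hypothesis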
SLIDE 30
Uniform Convergence - 1971, PAC Learning - 1984
Definition A set of functions F has the uniform convergence property with respect to a domain Z if there is a function mF(ǫ, δ) such that
- for any ǫ, δ > 0, m(ǫ, δ) < ∞
- for any distribution D on Z, and a sample z1, . . . , zm of size m = m_F(ǫ, δ),
Pr( sup_{f∈F} |(1/m) Σ_{i=1..m} f(zi) − E_D[f]| ≤ ǫ ) ≥ 1 − δ.
- Let FH = {fh | h ∈ H}, where fh is the loss function for
hypothesis h.
- F_H has the uniform convergence property ⇒ an ERM (Empirical Risk Minimization) algorithm "learns" H. The class is efficiently PAC learnable if there is a polynomial time ǫ, δ-approximation for the ERM minimization.
- The sample complexity of learning H is bounded by m_{F_H}(ǫ, δ).
SLIDE 31
Uniform Convergence
Definition A set of functions F has the uniform convergence property with respect to a domain Z if there is a function mF(ǫ, δ) such that
- for any ǫ, δ > 0, m(ǫ, δ) < ∞
- for any distribution D on Z, and a sample z1, . . . , zm of size
m = mF(ǫ, δ), Pr(sup
f ∈F
| 1 m
m
- i=1
f (zi) − ED[f ]| ≤ ǫ) ≥ 1 − δ. VC-dimension and Rademacher complexity are the two major techniques to
- prove that a set of functions F has the uniform convergence property
- characterize the function m_F(ǫ, δ)
SLIDE 32
Some Background
- Let f_x(z) = 1_{z≤x} (the indicator function of the event (−∞, x]).
- F_m(x) = (1/m) Σ_{i=1..m} f_x(zi) (the empirical distribution function).
- Strong Law of Large Numbers: for a given x, F_m(x) →_{a.s.} F(x) = Pr(z ≤ x).
- Glivenko-Cantelli Theorem: sup_{x∈R} |F_m(x) − F(x)| →_{a.s.} 0.
- Dvoretzky-Kiefer-Wolfowitz Inequality: Pr(sup_{x∈R} |F_m(x) − F(x)| ≥ ǫ) ≤ 2e^{−2mǫ²}.
- VC-dimension characterizes the uniform convergence property
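A short illustration of the empirical distribution function and the DKW bound (the standard normal distribution and all constants here are assumptions made for the example):

import math
import random

def empirical_cdf(sample):
    # F_m(x) = fraction of sample points <= x
    s = sorted(sample)
    m = len(s)
    return lambda x: sum(z <= x for z in s) / m

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

random.seed(4)
m = 2000
sample = [random.gauss(0.0, 1.0) for _ in range(m)]
Fm = empirical_cdf(sample)

# approximate sup_x |F_m(x) - F(x)| on a grid; DKW: Pr(sup >= eps) <= 2 exp(-2 m eps^2)
grid = [i / 100.0 for i in range(-400, 401)]
sup_dev = max(abs(Fm(x) - std_normal_cdf(x)) for x in grid)
eps = 0.05
print(f"sup deviation ≈ {sup_dev:.4f}; DKW bound for eps={eps}: {2 * math.exp(-2 * m * eps**2):.2e}")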