SLIDE 1

Learning From Data Lecture 3 Is Learning Feasible?

Outside the Data Probability to the Rescue Learning vs. Verification Selection Bias - A Cartoon

  • M. Magdon-Ismail

CSCI 4100/6100

slide-2
SLIDE 2

Recap: The Perceptron Learning Algorithm

[figure: Age-vs-Income data before and after one PLA update; a misclassified x∗ with y∗ = −1 rotates w away from x∗, one with y∗ = +1 rotates w toward x∗]

PLA update on a misclassified point (x∗, y∗):   w(t + 1) = w(t) + y∗x∗

PLA finds a linear separator in finite time.

  • What if data is not linearly separable?
  • We want g ≈ f

– Separating the data amounts to “memorizing the data”: g ≈ f only on D. – What we actually want is g ≈ f outside the data.

© AML Creator: Malik Magdon-Ismail (Is Learning Feasible, 2/27)

SLIDE 3

Outside the Data Set

[figure: training examples labeled f = −1 and f = +1, and a new input whose value f = ? must be predicted]

SLIDE 4

Outside the Data Set

[figure: training examples labeled f = −1 and f = +1, and a new input whose value f = ? must be predicted]

  • Did you say f = +1? (f is measuring symmetry.)
  • Did you say f = −1? (f only cares about the top left pixel.)

Who is correct? – we cannot rule out either possibility.

SLIDE 5

Outside the Data Set

[figure: training examples labeled f = −1 and f = +1, and a new input whose value f = ? must be predicted]

  • An easy visual learning problem just got very messy.

For every f that fits the data and is “+1” on the new point, there is one that is “−1”. Since f is unknown, it can take on any value outside the data, no matter how large the data.

  • This is called No Free Lunch (NFL).

You cannot know anything for sure about f outside the data without making assumptions.

  • What now!

Is there any hope to know anything about f outside the data set without making assumptions about f?

Yes, if we are willing to give up the “for sure”.

SLIDE 6

Can we infer something outside the data using only D?

SLIDE 7

Population Mean from Sample Mean

[figure: a BIN of red and green marbles and a SAMPLE drawn from it; µ = probability to pick a red marble, ν = fraction of red marbles in the sample]

The BIN Model

  • Bin with red and green marbles.
  • Pick a sample of N marbles independently.
  • µ: probability to pick a red marble.

  • ν: fraction of red marbles in the sample.

SAMPLE ↔ the data set ↔ ν;   BIN ↔ outside the data ↔ µ

Can we say anything about µ (outside the data) after observing ν (the data)?

ANSWER: No. It is possible for the sample to be all green marbles and the bin to be mostly red.

Then why do we trust polling (e.g., to predict the outcome of a presidential election)?

ANSWER: The bad case is possible, but not probable.
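The bin story is easy to simulate. A minimal sketch (pure Python; the values of µ, N, and the trial count are illustrative choices, not from the slides) showing that an all-green sample from a mostly-red bin is possible but wildly improbable:

```python
import random

def sample_nu(mu, N, rng):
    """Draw N marbles independently; return nu, the fraction of red ones."""
    return sum(rng.random() < mu for _ in range(N)) / N

rng = random.Random(0)
mu = 0.6           # bin is mostly red (in practice this is the unknown quantity)
N = 100
trials = 10_000

# How often does the "bad case" (an all-green sample, nu = 0) actually occur?
all_green = sum(sample_nu(mu, N, rng) == 0 for _ in range(trials))
print(all_green, "all-green samples out of", trials)
```

The bad case has probability (1 − µ)^N = 0.4^100 ≈ 10^−40 here, so it never shows up in any feasible number of trials.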

SLIDE 8

Probability to the Rescue: Hoeffding’s Inequality

Hoeffding (and Chernoff) proved that, most of the time, ν cannot be too far from µ:

P[|ν − µ| > ǫ] ≤ 2e^{−2ǫ²N}, for any ǫ > 0.

Equivalently,

P[|ν − µ| ≤ ǫ] ≥ 1 − 2e^{−2ǫ²N}, for any ǫ > 0.

box it and memorize it

We get to select any ǫ we want.

newsflash: ν ≈ µ ⟹ µ ≈ ν. The claim “µ ≈ ν” is probably approximately correct (PAC learning).

SLIDE 9

Probability to the Rescue: Hoeffding’s Inequality

P[|ν − µ| > ǫ] ≤ 2e^{−2ǫ²N} and P[|ν − µ| ≤ ǫ] ≥ 1 − 2e^{−2ǫ²N}, for any ǫ > 0.

box it and memorize it

Example: N = 1,000; draw a sample and observe ν.

  • 99% of the time: µ − 0.05 ≤ ν ≤ µ + 0.05 (ǫ = 0.05)
  • 99.9999996% of the time: µ − 0.10 ≤ ν ≤ µ + 0.10 (ǫ = 0.10)

What does this mean? If I repeatedly pick a sample of size 1,000, observe ν, and claim that µ ∈ [ν − 0.05, ν + 0.05] (an error bar of ±0.05), I will be right 99% of the time. On any particular sample you may be wrong, but not often.

We learned something. From ν, we reached outside the data to µ.
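A Monte Carlo sketch of these numbers (pure Python; the trial count and the choice µ = 0.5 are illustrative assumptions). It estimates P[|ν − µ| > ǫ] for N = 1,000 and ǫ = 0.05 and compares it to the Hoeffding bound 2e^{−2ǫ²N} = 2e^{−5} ≈ 0.0135:

```python
import math
import random

def hoeffding_check(mu, N, eps, trials, seed=0):
    """Estimate P[|nu - mu| > eps] empirically; return it with the Hoeffding bound."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        nu = sum(rng.random() < mu for _ in range(N)) / N  # one sample of N marbles
        if abs(nu - mu) > eps:
            bad += 1
    bound = 2 * math.exp(-2 * eps**2 * N)
    return bad / trials, bound

freq, bound = hoeffding_check(mu=0.5, N=1000, eps=0.05, trials=2000)
print(freq, "<=", bound)   # the empirical frequency stays below the bound
```

The bound is loose for µ = 0.5 (the true deviation probability is far smaller than 2e^{−5}), but Hoeffding holds for every µ and every N, which is exactly why it is useful when µ is unknown.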

SLIDE 10

How Did Probability Rescue Us?

  • Key ingredient: the samples must be independent.

If the sample is constructed in some arbitrary fashion, then indeed we cannot say anything. Even with independence, ν can take on arbitrary values; but some values are way more likely than others. This is what allows us to learn something – it is likely that ν ≈ µ.

  • The bound 2e^{−2ǫ²N} does not depend on µ or on the size of the bin.

The bin can be infinite. It is fortunate that the bound does not depend on µ, because µ is unknown; and we mean unknown.

  • The key player in the bound 2e^{−2ǫ²N} is N.

If N → ∞, then µ ≈ ν with very, very, very . . . high probability, but not for sure. Can you live with a 10^{−100} probability of error?

We should probably have said “independence to the rescue”

SLIDE 11

Relating the Bin to Learning

Target Function f (UNKNOWN)    Fixed hypothesis h (KNOWN)

[figure: two Age-vs-Income plots, one for the unknown target f and one for the fixed hypothesis h]

In learning, the unknown is an entire function f; in the bin it was a single number µ.

SLIDE 12

Relating the Bin to Learning - The Error Function

Target Function f    Fixed hypothesis h

[figure: Age-vs-Income plots for f and h; regions where they agree are green, where they disagree red]

green: h(x) = f(x);   red: h(x) ≠ f(x)

E(h) = Px[h(x) ≠ f(x)]   (the “size” of the red region)

P(x) is UNKNOWN

SLIDE 13

Relating the Bin to Learning - The Error Function

Target Function f    Fixed hypothesis h

[figure: Age-vs-Income plots; the input space X plays the role of the bin]

green “marble”: h(x) = f(x);   red “marble”: h(x) ≠ f(x)

BIN: X.   Eout(h) = Px[h(x) ≠ f(x)]   (out-of-sample error)

UNKNOWN

SLIDE 14

Relating the Bin to Learning - the Data

Target Function f    Fixed hypothesis h

[figure: Age-vs-Income plots with the data set D shown as points]

SLIDE 15

Relating the Bin to Learning - the Data

Target Function f    Fixed hypothesis h

[figure: Age-vs-Income plots with the data points colored by agreement with h]

green data: h(xn) = f(xn);   red data: h(xn) ≠ f(xn)

Ein(h) = fraction of red (misclassified) data   (in-sample error)

KNOWN!

SLIDE 16

Relating the Bin to Learning

[figure: the learning setup side by side with the bin; µ = probability to pick a red marble, ν = fraction of red marbles in the sample]

Unknown f and P(x); fixed h.

Learning ↔ Bin Model
  • input space X ↔ the bin
  • x for which h(x) = f(x) ↔ green marble
  • x for which h(x) ≠ f(x) ↔ red marble
  • P(x) ↔ randomly picking a marble
  • data set D ↔ sample of N marbles
  • out-of-sample error Eout(h) = Px[h(x) ≠ f(x)] ↔ µ = probability of picking a red marble
  • in-sample error Ein(h) = (1/N) Σn=1..N [[h(xn) ≠ f(xn)]] ↔ ν = fraction of red marbles in the sample
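The dictionary above can be exercised on a toy case. In this sketch everything is an illustrative assumption (a uniform P(x) on [0, 1] and threshold functions f and h that disagree exactly on an interval of length 0.2, so Eout(h) = 0.2); the in-sample error, computed as the fraction of “red” points, lands nearby:

```python
import random

# Illustrative setup: x uniform on [0, 1]; f and h are thresholds that
# disagree exactly on [0.3, 0.5), so Eout(h) = Px[h(x) != f(x)] = 0.2.
f = lambda x: 1 if x >= 0.3 else -1
h = lambda x: 1 if x >= 0.5 else -1
E_out = 0.2

rng = random.Random(0)
N = 1000
data = [rng.random() for _ in range(N)]   # the data set D: a sample of N "marbles"

# Ein(h) = (1/N) * #{n : h(x_n) != f(x_n)}, the fraction of red marbles in D
E_in = sum(h(x) != f(x) for x in data) / N
print(E_in)   # close to Eout = 0.2, as Hoeffding predicts
```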

SLIDE 17

Hoeffding says that Ein(h) ≈ Eout(h)

P[|Ein(h) − Eout(h)| > ǫ] ≤ 2e^{−2ǫ²N}, for any ǫ > 0.

P[|Ein(h) − Eout(h)| ≤ ǫ] ≥ 1 − 2e^{−2ǫ²N}, for any ǫ > 0.

Ein is random, but known; Eout is fixed, but unknown.

  • If Ein ≈ 0, then Eout ≈ 0 (with high probability), i.e. Px[h(x) ≠ f(x)] ≈ 0.

We have learned something about the entire f: f ≈ h over X (outside D).

  • If Ein ≫ 0, we’re out of luck.

But we have still learned something about the entire f: f ≉ h. It is just not very useful.

Questions: Suppose that Ein ≈ 1, have we learned something about the entire f that is useful? What is the worst Ein for inferring about f?

SLIDE 18

That’s Verification, not Real Learning

The entire previous argument assumed a FIXED h and then came the data.

  • Given h ∈ H, a sample can verify whether or not it is good (w.r.t. f):

If Ein is small, h is good, with high confidence. If Ein is large, h is bad, with high confidence.

We have no control over Ein. It is what it is.

  • In learning, you actually try to fit the data, as with the perceptron model

g results from searching an entire hypothesis set H for a hypothesis with small Ein.

Verification ↔ Real Learning:
  • Fixed single hypothesis h ↔ Fixed hypothesis set H
  • h to be certified ↔ g to be certified
  • h does not depend on D ↔ g results after searching H to fit D
  • No control over Ein ↔ Pick the best Ein

Verification: we can say something outside the data about h.

Learning: can we say something outside the data about g?

SLIDE 19

Real Learning – Finite Learning Model

[figure: M hypotheses h1, h2, h3, . . . , hM, each shown on an Age-vs-Income plot with its out-of-sample error Eout(h1), Eout(h2), Eout(h3), . . . , Eout(hM)]

↓ D

On the data set D (9 points):

Ein(h1) = 2/9,   Ein(h2) = 0,   Ein(h3) = 5/9,   . . . ,   Ein(hM) = 6/9

Pick the hypothesis with minimum Ein; will Eout be small?
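The danger in this question shows up in a simulation (a sketch; the sizes N = 9 and M = 1000 are illustrative choices). Every “hypothesis” below predicts labels at random, so each one has Eout = 1/2, yet the minimum Ein over the set looks excellent:

```python
import random

rng = random.Random(0)
N, M = 9, 1000   # 9 data points, 1000 candidate hypotheses

# True labels on the data; each "hypothesis" guesses at random,
# so every single one has Eout = 1/2 -- none has learned anything.
y = [rng.choice([-1, 1]) for _ in range(N)]

def e_in():
    preds = [rng.choice([-1, 1]) for _ in range(N)]
    return sum(p != t for p, t in zip(preds, y)) / N

best = min(e_in() for _ in range(M))
print("minimum Ein over", M, "hypotheses:", best)   # small, typically 0/9
```

Picking the minimum-Ein hypothesis therefore says little by itself: the single-hypothesis Hoeffding bound no longer applies to the one selected after the search.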

SLIDE 20

Selecting the Best Coin

  • 1. Everyone take out a coin.
  • 2. Each of you toss your coin 5 times and count the number of heads.
  • 3. Who got the smallest number of heads (probably 0)?
  • 4. Can I have that coin please?

SLIDE 21

Is this a Freak Coin?

Do we expect P[heads] ≈ 0? Let’s toss this coin (it has never come up heads).

Heads: you give me $2; tails: I give you $1. Who wants this bet?

(we’re gonna play this game 100 times)

SLIDE 22

Selection Bias

Coin tossing example:

  • If we toss one coin N = 5 times and get no heads, it’s very surprising:

P = 1/2^N

We expect it is biased: P[heads] ≈ 0.

  • Toss 70 coins (5 times each) and find one with no heads. Is it surprising?

P = 1 − (1 − 1/2^N)^70 ≈ 0.89

Do we expect P[heads] ≈ 0 for the selected coin? This is similar to the “birthday problem”: among 30 people, two will likely share the same birthday.

  • This is called selection bias.

Selection bias is a very serious trap: consider, for example, iterated medical screening.

If we select an h ∈ H with smallest Ein, can we expect Eout to be small?

Search Causes Selection Bias
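The two probabilities in the coin example can be computed directly (N = 5 tosses per coin and 70 coins, as on the previous slides):

```python
# Selection bias in the coin experiment: N tosses per fair coin.
N = 5
p_one = 1 / 2**N                      # one coin shows no heads: 1/32, surprising
p_any = 1 - (1 - 1 / 2**N) ** 70      # at least one of 70 coins shows no heads
print(p_one)            # 0.03125
print(round(p_any, 2))  # 0.89 -- hardly surprising once you search 70 coins
```

Searching over 70 coins makes the “freak” outcome the expected one; this is exactly what searching a hypothesis set for small Ein does.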

SLIDE 23

Jelly Beans Cause Acne?

SLIDE 24

Jelly Beans Cause Acne?

SLIDE 25

Jelly Beans Cause Acne?

SLIDE 26

Jelly Beans Cause Acne?

SLIDE 27

[final cartoon panel: the news headline]