Learning from Positive Examples
Christos Tzamos (UW-Madison)

Typical Classification Task
Features: the text of an email.
Positive examples (valid email):
● “Hi Mike, do you want to come over for dinner tomorrow?”
● “Your Amazon.com order has shipped!”
Negative examples (invalid email, spam):
● “Dear Sir, I am a Nigerian Prince…”
● “Congrats! You won 1,000,000!!”

Classification - Formulation
1. Unknown set $S \subseteq \mathbb{R}^d$ of positive examples (target concept).
2. Points $x_1, \dots, x_n$ in $\mathbb{R}^d$ are drawn from a distribution $D$ (examples).
3. The examples are labeled positive if they are in $S$ and negative otherwise.
Goal: Find a set $S'$ that agrees with the set $S$ on the label of a random example with high probability (> 99%).
How many examples are needed?
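The goal above can be checked empirically. The following is a minimal sketch, assuming a hypothetical one-dimensional instance that is not from the talk (D = N(0, 1), S = [-1, 1], and a candidate S' = [-1, 1.02]); it estimates the probability that S and S' give a random example the same label.

```python
import random

def agreement(in_s, in_s_prime, draw, n=20000, seed=0):
    """Monte Carlo estimate of Pr_{x ~ D}[S and S' give x the same label]."""
    rng = random.Random(seed)
    same = 0
    for _ in range(n):
        x = draw(rng)
        same += in_s(x) == in_s_prime(x)
    return same / n

# Hypothetical instance (an assumption for illustration only).
draw = lambda rng: rng.gauss(0.0, 1.0)        # D = N(0, 1)
in_s = lambda x: -1.0 <= x <= 1.0             # target concept S
in_s_prime = lambda x: -1.0 <= x <= 1.02      # candidate S'

rate = agreement(in_s, in_s_prime, draw)
print(f"estimated agreement: {rate:.4f}")
```

The two sets differ only on (1, 1.02], which has small mass under N(0, 1), so this candidate should meet the 99% target.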

Complexity of Concepts
The samples needed depend on how complex the concept is.
● Arbitrary distribution of samples: Vapnik–Chervonenkis (VC) dimension. VC dimension $k$ → $O(k)$ samples suffice.
● Gaussian distribution of samples: Gaussian surface area. Gaussian surface area $\gamma$ → $\exp(\gamma^2)$ samples suffice.

Learning with positive examples
Learning from both positive and negative examples is well understood. In many situations, though, only positive examples are provided.
E.g., when a child learns to speak:
● “Mary had a little lamb”
● “Twinkle twinkle little star”
● “What does the fox say?”
No negative examples are given:
● “Fox say what does”
● “akjda! Fefj dooraboo”

Can we learn from positive examples?
Generally no! We need to know what examples are excluded.

Two approaches for learning
1. Assume data points are drawn from a structured distribution (e.g. Gaussian): “Learning Geometric Concepts from Positive Examples” (joint work with Contonis and Zampetakis)
2. Assume an oracle that can check the validity of examples (during training): “Actively Avoiding Nonsense in Generative Models” (joint work with Hanneke, Kalai and Kamath, COLT 2018)

Learning from Normally Distributed Examples

Model
● Points $x_1, \dots, x_n$ in $\mathbb{R}^d$ are drawn from a normal distribution $N(\mu, \Sigma)$ with unknown parameters.
● Only samples that fall into a set $S$ are given.
● Assumption: at least 1% of the total samples are kept.
● Goal: Find $\mu$, $\Sigma$, and $S$.
● Example: $S$ is a union of 3 intervals in 1-d.
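The model can be simulated directly. This is a minimal sketch, assuming hypothetical values that are not from the talk: a one-dimensional N(0, 1) and a set S made of three arbitrary intervals. The learner would see only `kept`, never the discarded draws or the parameters.

```python
import random

# Hypothetical truncation set S: a union of 3 intervals in 1-d (an assumption).
INTERVALS = [(-2.0, -1.0), (-0.25, 0.5), (1.5, 3.0)]

def in_s(x):
    """Membership test for S."""
    return any(a <= x <= b for a, b in INTERVALS)

def draw_positive_samples(mu, sigma, n, seed=0):
    """Draw n points from N(mu, sigma^2) and keep only those that fall in S."""
    rng = random.Random(seed)
    kept = [x for x in (rng.gauss(mu, sigma) for _ in range(n)) if in_s(x)]
    return kept, len(kept) / n

kept, kept_fraction = draw_positive_samples(0.0, 1.0, 20000)
print(f"kept {len(kept)} of 20000 draws ({kept_fraction:.1%})")
```

For these particular intervals roughly half of the draws survive, so the assumption that at least 1% of the samples are kept holds comfortably.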

Main Structural Theorem
● Suppose the set $S$ has low complexity (Gaussian surface area at most $\gamma$).
● Consider the moments $E[x], E[x^2], \dots, E[x^k]$ of the positive samples for $k = \Theta(\gamma^2)$.
Structural Theorem [Contonis, T, Zampetakis 2018]: For any $\mu'$, $\Sigma'$, and set $S'$ with Gaussian surface area at most $\gamma$ that match all $k = \Theta(\gamma^2)$ moments,
● $S$ agrees with $S'$ almost everywhere, and
● the distribution $N(\mu', \Sigma')$ is almost identical to $N(\mu, \Sigma)$.
Moreover, one can identify $\mu'$, $\Sigma'$, and $S'$ computationally efficiently.

Ideas behind algorithm
● The moments of the positive samples are (proportional to) $E[x \, 1_S(x)], E[x^2 \, 1_S(x)], \dots, E[x^k \, 1_S(x)]$ for random $x$ drawn from $N(\mu, \Sigma)$.
● The function $1_S(x)$ can be written as a sum $\sum_i c_i H_i(x)$, where $H_i(x)$ is the degree-$i$ Hermite polynomial.
● Hermite polynomials form an orthonormal basis, similar to the Fourier transform.
● Knowing the first $k$ moments, we can find the top $k$ Hermite coefficients, which give a low-degree approximation of the function $1_S(x)$.
● For $k = \Theta(\gamma^2)$, the approximation is very accurate.
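The link between moments and Hermite coefficients can be illustrated in one dimension. This is a simplified sketch and not the algorithm from the talk: it assumes the Gaussian is known to be N(0, 1), uses a hypothetical set S of three intervals, and computes the moments E[x^j 1_S(x)] by numerical integration rather than from samples. Each coefficient of the normalized Hermite polynomial He_n / sqrt(n!) is then a fixed linear combination of those moments.

```python
import math

# Hypothetical target set S (an assumption for illustration only).
INTERVALS = [(-2.0, -1.0), (-0.25, 0.5), (1.5, 3.0)]

def truncated_moment(j, intervals=INTERVALS, steps=2000):
    """E[x^j * 1_S(x)] for x ~ N(0, 1), via the midpoint rule on each interval."""
    total = 0.0
    for a, b in intervals:
        h = (b - a) / steps
        for i in range(steps):
            x = a + (i + 0.5) * h
            total += x ** j * math.exp(-x * x / 2.0) * h
    return total / math.sqrt(2.0 * math.pi)

def hermite_polys(k):
    """Coefficient lists (lowest degree first) of He_0 .. He_k,
    using He_{n+1}(x) = x * He_n(x) - n * He_{n-1}(x)."""
    polys = [[1.0], [0.0, 1.0]]
    for n in range(1, k):
        nxt = [0.0] + polys[n]                 # multiply He_n by x
        for j, a in enumerate(polys[n - 1]):
            nxt[j] -= n * a                    # subtract n * He_{n-1}
        polys.append(nxt)
    return polys[: k + 1]

def hermite_coefficients(k):
    """Coefficients c_n = E[1_S(x) * He_n(x)] / sqrt(n!), built from moments only."""
    moments = [truncated_moment(j) for j in range(k + 1)]
    coeffs = []
    for n, poly in enumerate(hermite_polys(k)):
        raw = sum(a * moments[j] for j, a in enumerate(poly))
        coeffs.append(raw / math.sqrt(math.factorial(n)))
    return moments, coeffs

def squared_errors(k):
    """E[(1_S - degree-n approximation)^2] for n = 0..k, by Parseval:
    E[1_S^2] = E[1_S] = moment 0, minus the sum of squared coefficients."""
    moments, coeffs = hermite_coefficients(k)
    errors, captured = [], 0.0
    for c in coeffs:
        captured += c * c
        errors.append(moments[0] - captured)
    return errors

moments, coeffs = hermite_coefficients(8)
errors = squared_errors(8)
for n, err in enumerate(errors):
    print(f"degree {n}: squared L2 error {err:.4f}")
```

The printed errors can only shrink as the degree grows, which is the sense in which more moments give a more accurate low-degree approximation of the indicator of S.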

Corollaries
● $d^{O(\gamma^2)}$ samples suffice to learn a concept with Gaussian surface area $\gamma$. Need to estimate accurately all $d^{O(\gamma^2)}$ high-dimensional moments.
● Intersection of $k$ halfspaces: $d^{O(\log k)}$
● Degree-$p$ polynomial-threshold functions: $d^{O(p^2)}$
● Convex sets: $d^{O(\sqrt{d})}$

Learning with access to a Validity Oracle

Setting
● Sample access to an unknown distribution $p$ supported on an unknown set.
● Can query an oracle whether an example $x$ is in $\mathrm{supp}(p)$.
● A family $Q$ of probability distributions with varying supports.
Assuming a $q^*$ in $Q$ exists such that
$\Pr_{x \sim q^*}[x \notin \mathrm{supp}(p)] \le \alpha$ and $\Pr_{x \sim p}[x \notin \mathrm{supp}(q^*)] \le \beta$,
find a $q$ such that
$\Pr_{x \sim q}[x \notin \mathrm{supp}(p)] \le \alpha + \varepsilon$ and $\Pr_{x \sim p}[x \notin \mathrm{supp}(q)] \le \beta + \varepsilon$.

Generative Model - Neural Net
“Many governments recognize the military housing of the [[Civil Liberalization and Infantry Resolution 265 National Party in Hungary]], that is sympathetic to be to the [[Punjab Resolution]] (PJS) [http://www.humah.yahoo.com/guardian.cfm/7754800786d17551963s89.htm Official economics Adjoint for the Nazism, Montgomery was swear to advance to the resources for those Socialism's rule, was starting to signing a major tripad of aid exile.]]”
-- Char-RNN trained on Wikipedia (Karpathy)

[Figure: the distribution q* surrounded by regions labeled “NONSENSE!”]

Example: Rectangle Learning
Consider again the problem instance where $Q$ is the class of all uniform distributions over rectangles $[a,b] \times [c,d]$.

Draw many samples from p

For any quadruple of points:
● Choose $q \in Q$ specified by their bounding box.
● Draw many samples from $q$ to estimate validity by querying the oracle $\mathrm{supp}(p)$.
★ Can learn using $O(1/\varepsilon^2)$ samples from $p$ and $O(1/\varepsilon^5)$ queries to $\mathrm{supp}(p)$.
In $d$ dimensions, uses $O(d/\varepsilon^2)$ samples and $O(1/\varepsilon^{2d+1})$ queries.
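The quadruple enumeration can be sketched as follows. This is a brute-force illustration under stated assumptions, not the tuned procedure from the talk: the support of p is a hypothetical union of a large rectangle `MAIN` and a small far-away rectangle `OUTLIER`, and the sample count, query count, and invalidity threshold `alpha` are arbitrary choices. Among boxes whose estimated invalidity is at most `alpha`, it keeps the one that misses the fewest samples from p.

```python
import itertools
import random

# Hypothetical support of p (an assumption): 90% of the mass on MAIN, 10% on OUTLIER.
MAIN = (0.0, 1.0, 0.0, 1.0)        # [a, b] x [c, d]
OUTLIER = (3.0, 3.2, 3.0, 3.2)

def inside(box, pt):
    a, b, c, d = box
    return a <= pt[0] <= b and c <= pt[1] <= d

def oracle(pt):
    """Validity oracle: is pt in supp(p)?"""
    return inside(MAIN, pt) or inside(OUTLIER, pt)

def sample_box(box, rng):
    """One draw from the uniform distribution over a rectangle."""
    a, b, c, d = box
    return (rng.uniform(a, b), rng.uniform(c, d))

def sample_p(rng):
    return sample_box(MAIN if rng.random() < 0.9 else OUTLIER, rng)

def learn_rectangle(n_samples=14, n_queries=200, alpha=0.05, seed=0):
    rng = random.Random(seed)
    pts = [sample_p(rng) for _ in range(n_samples)]          # samples from p
    best, best_loss = None, float("inf")
    for quad in itertools.combinations(pts, 4):
        xs = [pt[0] for pt in quad]
        ys = [pt[1] for pt in quad]
        box = (min(xs), max(xs), min(ys), max(ys))           # bounding box -> q
        # Estimate Pr_{x ~ q}[x not in supp(p)] with oracle queries.
        bad = sum(not oracle(sample_box(box, rng)) for _ in range(n_queries))
        if bad / n_queries > alpha:
            continue                                         # q produces too much nonsense
        # Empirical Pr_{x ~ p}[x not in supp(q)].
        loss = sum(not inside(box, pt) for pt in pts) / n_samples
        if loss < best_loss:
            best, best_loss = box, loss
    return best, best_loss

box, loss = learn_rectangle()
print("chosen box:", box, "empirical loss:", loss)
```

Any box that spans both rectangles covers mostly invalid area and is rejected by the oracle check, so the chosen box stays inside the large rectangle. The loop visits every quadruple, which is why the cost grows quickly with the number of samples.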

Curse of dimensionality
(The previous algorithm is tight…)
Theorem: To find a $d$-dimensional box $q$ in $Q$ such that
$\Pr_{x \sim p}[x \notin \mathrm{supp}(q)] \le \Pr_{x \sim p}[x \notin \mathrm{supp}(q^*)] + \varepsilon$ and $\Pr_{x \sim q}[x \notin \mathrm{supp}(p)] \le \varepsilon$,
one needs to make $\exp(d)$ queries to the $\mathrm{supp}(p)$ oracle.
The lower bound requires $q$ in $Q$ (proper learning)! We show that if $q$ is not required to be in $Q$, it is possible to learn efficiently.

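Main Result
Theorem [Hanneke, Kalai, Kamath, T, COLT’18]: For any class of distributions $Q$, one can find a $q$ such that
$\Pr_{x \sim p}[x \notin \mathrm{supp}(q)] \le \Pr_{x \sim p}[x \notin \mathrm{supp}(q^*)] + \varepsilon$ and $\Pr_{x \sim q}[x \notin \mathrm{supp}(p)] \le \varepsilon$
using only $\mathrm{poly}(\text{VC-dim}(Q), \varepsilon^{-1})$ samples from $p$ and queries to $\mathrm{supp}(p)$.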

Example
Samples: 3, 5, 13, 89
● Odd numbers? Query 13, 15, 21 → ✓, ✗, ✗
● Prime numbers? Query 5, 7, 13 → ✓, ✗, ✓
● …
● Fibonacci numbers? Query 8, 13, 21 → ✗, ✓, ✗
Answer: Prime ∧ Fibonacci
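The oracle answers in this example can be reproduced with a few lines. This is a small sketch in which the hidden valid set is taken to be the prime Fibonacci numbers, as the slide's conclusion suggests; the helper names are illustrative and the Fibonacci test relies on the standard fact that n is a Fibonacci number exactly when 5n² + 4 or 5n² − 4 is a perfect square.

```python
import math

def is_prime(n):
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))

def is_perfect_square(m):
    return m >= 0 and math.isqrt(m) ** 2 == m

def is_fibonacci(n):
    # n is a Fibonacci number iff 5n^2 + 4 or 5n^2 - 4 is a perfect square.
    return is_perfect_square(5 * n * n + 4) or is_perfect_square(5 * n * n - 4)

def oracle(n):
    """Validity oracle for the hidden concept: prime AND Fibonacci."""
    return is_prime(n) and is_fibonacci(n)

samples = [3, 5, 13, 89]
queries = {
    "odd": [13, 15, 21],
    "prime": [5, 7, 13],
    "fibonacci": [8, 13, 21],
}
answers = {name: [oracle(x) for x in xs] for name, xs in queries.items()}
for name, result in answers.items():
    print(name, result)
```

Each single hypothesis (odd, prime, Fibonacci) is consistent with the positive samples but gets rejected on at least one query, which is what pushes the learner toward the intersection.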

Why does this work?
[Figure: the space is split into a valid subspace and a nonsense subspace; the supports of q and q’ are drawn with regions labeled “large” and “small intersections”.]

Summary
Learning from positive examples
● Not possible without assumptions
● Proposed a framework for learning when samples are normally distributed
● Alternatively, possible to learn if one can query an oracle for validity
Further work
● Learning the Gaussian parameters requires only $O(d^2)$ samples for any concept class with validity oracle [Daskalakis, Gouleakis, T, Zampetakis, FOCS 2018]
Thank You!

Based on joint work with V Contonis (UW-Madison), C Daskalakis (MIT), T Gouleakis (MIT), S Hanneke (TTIC), A Kalai (MSR), G Kamath (U Waterloo), M Zampetakis (MIT)
