A Formal Proof of PAC Learnability for Decision Stumps
Joseph Tassarotti Boston College Jean-Baptiste Tristan Oracle Labs Koundinya Vajjha University of Pittsburgh
Rigour in Machine Learning

John had proved on paper that an ML algorithm they had developed at Oracle was fair/non-discriminatory. Given the importance and subtlety of this code, he wanted to have a machine-checked proof, and started to wonder what that would take.

We are starting to see machine learning systems that (on paper) are proven to provide certain guarantees about things like privacy, fairness, or robustness. Given the importance of these properties, we should strive to give machine-checked proofs that they hold.
Motivation: ML systems with provable guarantees
Challenges:

- Formalizing these proofs requires quite a bit of mathematical prerequisites.
- We need a way to formally express the learning algorithm.
- Unlike for a randomized algorithm such as quicksort, we need to support probability with continuous numbers and distributions.
- The literature is not always as rigorous/detailed as we'd like: technical conditions on lemmas are omitted, and serious details are skipped.
Case-study: generalization bound for stumps

What we did: gave a machine-checked proof of a generalization bound (PAC learnability) for the decision stump learning problem. This theorem is often the "motivating example" used in textbooks on computational learning theory.
Stump Learning

Goal is to learn to distinguish two classes of items: blue o's and red x's. An unknown boundary value, represented by the dashed line, separates the classes.
Stump Learning
A reasonable thing to do is to take the largest training example that’s a circle, and use its value as the boundary.
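As a sketch, this rule can be written in a few lines of Python (the boundary c = 0.7, the uniform sampling distribution, and the sample sizes are all hypothetical illustrations, not part of the formal development):

```python
import random

def learn_stump(samples):
    """Return the stump whose boundary is the largest training example
    labeled 1 (a circle); default to 0.0 if no circle was seen."""
    positives = [x for (x, label) in samples if label == 1]
    b = max(positives) if positives else 0.0
    return lambda x: 1 if x <= b else 0

# Hypothetical target concept: boundary c = 0.7 over Uniform(0, 1) data.
c = 0.7
target = lambda x: 1 if x <= c else 0

random.seed(0)
train = [(x, target(x)) for x in (random.random() for _ in range(1000))]
h = learn_stump(train)

# The learned boundary sits just below c, so h errs only on a thin
# interval between the boundary and c; empirically the error is tiny.
test = [random.random() for _ in range(10000)]
err = sum(h(x) != target(x) for x in test) / len(test)
print(err < 0.05)  # True
```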
Stump Learning

This turns out to work well: we can prove that given enough training examples, the classifier we obtain this way can be made to have arbitrarily small error, with high probability.

Formally, fix a space X and define the concept class C as {λx.✶(x ≤ c) ∣ c ∈ X}. Each element of C is a red-blue labelling as above.
Stump Learning

Theorem (Informal). There exists a learning function (algorithm) A and a sample complexity function m such that for any distribution µ over X, c ∈ C, ε ∈ (0, 1), and δ ∈ (0, 1), when running the learning function A on n ≥ m(ε, δ) i.i.d. samples from µ labeled by c, A returns a hypothesis h ∈ C such that, with probability at least 1 − δ,

µ({x ∈ X ∣ h(x) ≠ c(x)}) ≤ ε
Sketch of Proof

Because a decision stump is entirely determined by the boundary value it uses for decisions, we will refer to a stump and its boundary value interchangeably. Note also that the error probability of a hypothesis is the measure of the interval between it and the target:

µ{x ∶ h(x) ≠ c(x)} = µ(h, c]
Sketch of Proof

Write the labeled training samples as (X1, l1), ..., (Xn, ln). The algorithm returns the hypothesis h = λx.✶(x ≤ max{Xi ∣ li = 1}). If µ(0, c] < ε, then since h ≤ c, the error µ(h, c] is already below ε. So the error is bounded with probability 1. So assume µ(0, c] ≥ ε.
Sketch of Proof

Pick θ ≤ c such that the interval I = [θ, c] has measure exactly ε. If at least one training sample is drawn from I, then θ ≤ h ≤ c, so we have µ(h, c] ≤ µ[θ, c] = ε. So the error is bounded by ε whenever some sample is drawn from I. The probability that a single sample misses I is at most 1 − ε, so the probability that all n samples miss I is at most (1 − ε)ⁿ. Choose an appropriate m such that for n ≥ m we get the result.
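The last step can be made concrete: since (1 − ε)ⁿ ≤ e^(−εn), taking m = ⌈ln(1/δ)/ε⌉ suffices. A quick check in Python (the specific ε and δ values are just examples):

```python
import math

def sample_complexity(eps, delta):
    # (1 - eps)**n <= exp(-eps * n), so exp(-eps * m) <= delta
    # as soon as m >= ln(1/delta) / eps.
    return math.ceil(math.log(1.0 / delta) / eps)

m = sample_complexity(0.1, 0.05)
print(m)                       # 30
print((1 - 0.1) ** m <= 0.05)  # True: failure probability below delta
```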
Problem!

Such a θ may not exist! Indeed, take µ to be the Bernoulli distribution with p = .5, c = .5, and ε = .25. Then the desired θ does not exist. Since PAC learning is distribution-free, we need to account for all distributions µ, including those with a discrete component.
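A toy check of this counterexample (the grid of candidate θ values is our own illustrative choice):

```python
def mu_closed_interval(lo, hi):
    """Measure of [lo, hi] under Bernoulli(0.5): mass 0.5 at 0 and at 1."""
    return sum(0.5 for point in (0.0, 1.0) if lo <= point <= hi)

c, eps = 0.5, 0.25
# mu[theta, c] can only be 0.5 (when theta <= 0) or 0 (when theta > 0),
# so no theta gives an interval [theta, c] of measure exactly eps = 0.25.
values = {mu_closed_interval(theta, c) for theta in [0.0, 0.1, 0.25, 0.4, 0.5]}
print(sorted(values))  # [0.0, 0.5]
print(eps in values)   # False
```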
Fixing the proof

The point θ we are looking for should be defined as θ = sup{x ∈ X ∣ µ[x, c] ≥ ε}, and we not only need to prove that θ satisfies µ[θ, c] ≥ ε but also that µ(θ, c] ≤ ε. Using this, we fix the proof. And this proof has now been formalized.
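Returning to the Bernoulli counterexample, the corrected θ behaves as required (a toy grid approximation of the supremum; the grid is our own choice):

```python
def mu_closed(lo, hi):      # measure of [lo, hi] under Bernoulli(0.5)
    return sum(0.5 for p in (0.0, 1.0) if lo <= p <= hi)

def mu_half_open(lo, hi):   # measure of (lo, hi]
    return sum(0.5 for p in (0.0, 1.0) if lo < p <= hi)

c, eps = 0.5, 0.25
grid = [i / 1000 for i in range(501)]  # candidate x values in [0, c]
theta = max(x for x in grid if mu_closed(x, c) >= eps)
print(theta)                          # 0.0
print(mu_closed(theta, c) >= eps)     # True:  mu[theta, c] >= eps
print(mu_half_open(theta, c) <= eps)  # True:  mu(theta, c] <= eps
```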
Proofs in Textbooks?

The one correct proof still omits the hardest part! ("Not hard to see...")
It Gets Worse

For stumps, the theorem is true even if many proofs are wrong. Many other results in these books are wrong as stated. Problem: they omit "technical" conditions about measurability.

Challenge: can formalization automate these tedious details? Or provide a more "intuitive" interface that is checked?
Expressing the algorithm

A major challenge is formally describing the learning algorithm. Three "stages": sample training data, learn a hypothesis, and score it on test data.
Enter Giry Monad

- The Giry monad gives a principled way to structure informal arguments in probability theory.
- It also provides a semantic foundation for probabilistic programming languages.
What is it?

- Let Meas denote the category of measurable spaces and measurable maps.
- For M ∈ Meas, let P(M) denote the set of probability measures on M.
- P(M) is equipped with the maps τf ∶ P(M) → R which are given by τf(ν) = ∫_M f dν for measurable f.
- If f is the indicator function of a measurable set A, then τA(ν) = ν(A).
What is it?
which is the smallest topology on P(M) which makes the maps {τf ∶ P(M) → R} (for measurable f ), continuous.
Borel σ-algebra of P(M) as the smallest σ-algebra generated by the functions {τf }. So we get that P(M) ∈ Meas.
pushforward map P(f ) ∶ P(M) → P(M) given by P(f )ν = ν ◦ f −1.
category endofunctors of Meas.
19
Bind and Return

Fix an arbitrary M ∈ Meas. We define the natural transformation η ∶ 1 → P componentwise as

η_M ∶ M ⟶ P(M), x ↦ (A ↦ ✶(x ∈ A))

That is, η_M(x) is the Dirac measure at x.
Bind and Return

We also find the following definition for the bind operator:

≫= ∶ P(M) → (M → P(N)) → P(N)
(ρ ≫= g)(f) = ∫_M (λm. ∫_N f d(g(m))) dρ

That is, (ρ ≫= g) is "simply computing the distribution that results from applying g while marginalizing over ρ". Note that bind only composes with measurable g, which is why we can't make it part of the monad typeclass in Lean.
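For intuition, here is a toy discrete analogue in Python (a sketch only: the real construction works over arbitrary measurable spaces, not finite dictionaries from outcomes to probabilities):

```python
def ret(x):
    """eta(x): the Dirac measure at x."""
    return {x: 1.0}

def bind(rho, g):
    """(rho >>= g): push each outcome m of rho through the kernel g,
    then marginalize (sum out) m."""
    out = {}
    for m, p in rho.items():
        for n, q in g(m).items():
            out[n] = out.get(n, 0.0) + p * q
    return out

coin = {0: 0.5, 1: 0.5}
# Flip a coin; on heads (1) flip again, on tails return 0.
dist = bind(coin, lambda b: coin if b == 1 else ret(0))
print(dist)  # {0: 0.75, 1: 0.25}
```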
Giry Monad for Learning Algorithms

We express learning algorithms in Lean using the Giry monad:

(µ ≫= λx. f x) ∶= x ← µ; f(x)

"Sample from µ as x, continue by f(x)"
Giry Monad for Learning Algorithms

training ← sample(n, µ);
c ← learn(training);
test ← sample(m, µ);
return score(c, test)

We can reason about the separate stages to derive a bound on the overall algorithm.
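At the sampler level, the four stages might look like the following Python sketch (the target boundary, the uniform distribution, the sample sizes, and the scoring rule are all made up for illustration):

```python
import random

random.seed(1)
target = lambda x: 1 if x <= 0.7 else 0   # hypothetical concept c

def sample(n):                 # n i.i.d. draws from mu = Uniform(0, 1)
    return [random.random() for _ in range(n)]

def learn(training):           # stump: boundary = largest positive example
    pos = [x for x in training if target(x) == 1]
    b = max(pos) if pos else 0.0
    return lambda x: 1 if x <= b else 0

def score(h, test):            # fraction of test points misclassified
    return sum(h(x) != target(x) for x in test) / len(test)

training = sample(1000)        # training <- sample(n, mu)
h = learn(training)            # c <- learn(training)
test = sample(1000)            # test <- sample(m, mu)
print(score(h, test) < 0.05)   # return score(c, test); small w.h.p.
```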
Giry Monad for Probabilistic Constructions

The monad also lets us build compound probabilistic programs, e.g. draw a normal value x, and depending on it a normal value y with variance x:

x ← Normal(0, 1);
y ← Normal(0, x);
return (x, y)
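A sampler-level sketch of this dependent program (one caveat, our own reading: Normal(0, x) needs a nonnegative variance, so this toy sampler clamps with abs(x), since x can be negative here):

```python
import random

random.seed(0)

def program():
    x = random.gauss(0.0, 1.0)            # x <- Normal(0, 1)
    y = random.gauss(0.0, abs(x) ** 0.5)  # y <- Normal(0, variance |x|)
    return (x, y)                         # return (x, y)

pairs = [program() for _ in range(3)]
print(len(pairs))  # 3
```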
Giry Monad for Stump Learning

Theorem. Let H = {λx.✶(x ≤ c) ∣ c ∈ R+} be the class of decision stumps. There exists a measurable function A ∶ Π n, (R+ × {0, 1})ⁿ → H, called the learning function, and a sample complexity function m ∶ (0, 1)² → N such that for any probability measure µ on the measurable space (R+, B(R+)), any (ε, δ) ∈ (0, 1)², and any n ≥ m(ε, δ),

A∗(c∗(µⁿ)){h ∈ H ∣ µ{x ∈ R+ ∣ h(x) ≠ c(x)} ≤ ε} ≥ 1 − δ
Giry Monad for Stump Learning

Here, for µ a measure, f∗(µ)(A) = µ(f⁻¹(A)) is the pushforward, and

µ¹ = µ
µⁿ = v ← µⁿ⁻¹ ; ω ← µ ; ret(ω, v)

is the n-fold product measure.
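Read as a sampler, the recursion above becomes (with Uniform(0, 1) standing in for µ and tuples for the product; both are our own illustrative choices):

```python
import random

random.seed(0)
mu = random.random           # stand-in base distribution

def mu_n(n):
    """Sample from the n-fold product mu^n via the slide's recursion."""
    if n == 1:
        return (mu(),)       # mu^1 = mu
    v = mu_n(n - 1)          # v <- mu^(n-1)
    w = mu()                 # omega <- mu
    return (w,) + v          # ret (omega, v)

draw = mu_n(4)
print(len(draw))  # 4
```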
Giry Monad for Probabilistic Programs

A probabilistic program takes an input in A and returns a probability distribution over B; i.e., it is a measurable arrow A → PB. (These are nothing but the Kleisli arrows of the Giry monad.) Composing such arrows lets us build up larger probabilistic programs.
Giry Monad for Probabilistic Programs

However, not everything is so nice. The category Meas is not cartesian closed: in general there is no measurable space structure on the function space α → β which makes the evaluation maps measurable. This means the Giry monad cannot eat it!
In Conclusion

Formalizing even this "motivating example" of computational learning theory required real work, and uncovered genuine gaps and errors in textbook proofs.