PAC Learning + Midterm Review Matt Gormley Lecture 15 March 7, - PowerPoint PPT Presentation

10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University PAC Learning + Midterm Review Matt Gormley Lecture 15 March 7, 2018 1

ML Big Picture Learning Paradigms: Problem Formulation: Vision, Robotics, Medicine, What is the structure of our output prediction? What data is available and NLP, Speech, Computer when? What form of prediction? boolean Binary Classification • supervised learning categorical Multiclass Classification • unsupervised learning ordinal Ordinal Classification Application Areas • semi-supervised learning real Regression • reinforcement learning Key challenges? • active learning ordering Ranking • imitation learning multiple discrete Structured Prediction • domain adaptation • multiple continuous (e.g. dynamical systems) online learning Search • density estimation both discrete & (e.g. mixed graphical models) • recommender systems cont. • feature learning • manifold learning • dimensionality reduction Facets of Building ML Big Ideas in ML: • ensemble learning Systems: Which are the ideas driving • distant supervision How to build systems that are development of the field? • hyperparameter optimization robust, efficient, adaptive, • inductive bias effective? • Theoretical Foundations: generalization / overfitting 1. Data prep • What principles guide learning? 2. Model selection bias-variance decomposition • 3. Training (optimization / q probabilistic generative vs. discriminative search) • deep nets, graphical models q information theoretic 4. Hyperparameter tuning on • PAC learning q evolutionary search validation data • distant rewards 5. (Blind) Assessment on test q ML as optimization 2 data

LEARNING THEORY 3

Questions For Today 1. Given a classifier with zero training error, what can we say about generalization error? (Sample Complexity, Realizable Case) 2. Given a classifier with low training error, what can we say about generalization error? (Sample Complexity, Agnostic Case) 3. Is there a theoretical justification for regularization to avoid overfitting? (Structural Risk Minimization) 4

PAC/SLT models for Supervised Learning PAC / SLT Model Data Distribution D on X Source Expert / Oracle Learning Algorithm Labeled Examples (x 1 ,c*(x 1 )),…, ( x m ,c*(x m )) c* : X ! Y Alg.outputs h : X ! Y x 1 > 5 + + - + - + +1 x 6 > 2 - - - - -1 +1 6 Slide from Nina Balcan

Two Types of Error True Error (aka. expected risk ) Train Error (aka. empirical risk ) 7

PAC / SLT Model 8

Three Hypotheses of Interest 9

PAC LEARNING 10

Probably Approximately Correct (PAC) Learning Whiteboard: – PAC Criterion – Meaning of “Probably Approximately Correct” – PAC Learnable – Consistent Learner – Sample Complexity 11

Generalization and Overfitting Whiteboard: – Realizable vs. Agnostic Cases – Finite vs. Infinite Hypothesis Spaces 12

PAC Learning 13

SAMPLE COMPLEXITY RESULTS 14

Sample Complexity Results We’ll start with the Four Cases we care about… finite case… Realizable Agnostic 15

Sample Complexity Results Four Cases we care about… Realizable Agnostic 16

Example: Conjunctions In-Class Quiz: Suppose H = class of conjunctions over x in {0,1} M If M = 10, ! = 0.1, δ = 0.01, how many examples suffice? Realizable Agnostic 17

1. Bound is inversely linear in 1. Bound is inversely quadratic in Sample Complexity Results epsilon (e.g. halving the error epsilon (e.g. halving the error requires double the examples) requires 4x the examples) 2. Bound is only logarithmic in 2. Bound is only logarithmic in |H| (e.g. quadrupling the |H| (i.e. same as Realizable hypothesis space only requires case) double the examples) Four Cases we care about… Realizable Agnostic 19

Generalization and Overfitting Whiteboard: – Sample Complexity Bounds (Agnostic Case) – Corollary (Agnostic Case) – Empirical Risk Minimization – Structural Risk Minimization – Motivation for Regularization 22

Sample Complexity Results Four Cases we care about… Realizable Agnostic We need a new definition of “complexity” for a Hypothesis space for these results (see VC Dimension ) 23

VC DIMENSION 25

What if H is infinite? + + - + E.g., linear separators in R d - + - - - - - + E.g., thresholds on the real line w - - + E.g., intervals on the real line a b 26

Shattering, VC-dimension Definition : H[S] – the set of splittings of dataset S using concepts from H. H shatters S if | H S | = 2 |𝑇| . A set of points S is shattered by H is there are hypotheses in H that split S in all of the 2 |𝑇| possible ways; i.e., all possible ways of classifying points in S are achievable using concepts in H. Definition : VC-dimension (Vapnik-Chervonenkis dimension) The VC-dimension of a hypothesis space H is the cardinality of the largest set S that can be shattered by H. If arbitrarily large finite sets can be shattered by H, then VCdim(H) = ∞ 27

Shattering, VC-dimension Definition : VC-dimension (Vapnik-Chervonenkis dimension) The VC-dimension of a hypothesis space H is the cardinality of the largest set S that can be shattered by H. If arbitrarily large finite sets can be shattered by H, then VCdim(H) = ∞ To show that VC-dimension is d: – there exists a set of d points that can be shattered – there is no set of d+1 points that can be shattered. Fact : If H is finite, then VCdim (|H|) . (H) ≤ log 28

Shattering, VC-dimension If the VC-dimension is d, that means there exists a set of d points that can be shattered, but there is no set of d+1 points that can be shattered. - + E.g., H= Thresholds on the real line w VCdim H = 1 + - - - + E.g., H= Intervals on the real line VCdim H = 2 + - + 29

Shattering, VC-dimension If the VC-dimension is d, that means there exists a set of d points that can be shattered, but there is no set of d+1 points that can be shattered. VCdim H = 2k E.g., H= Union of k intervals on the real line + - - + + - A sample of size 2k shatters VCdim H ≥ 2k (treat each pair of points as a separate case of intervals) VCdim H < 2k + 1 + - + - + … 30

Shattering, VC-dimension E.g., H= linear separators in R 2 VCdim H ≥ 3 31

Shattering, VC-dimension E.g., H= linear separators in R 2 VCdim H < 4 Case 1: one point inside the triangle formed by the others. Cannot label inside point as positive and outside points as negative. Case 2: all points on the boundary (convex hull). Cannot label two diagonally as positive and other two as negative. Fact: VCdim of linear separators in R d is d+1 32

Questions For Today 1. Given a classifier with zero training error, what can we say about generalization error? (Sample Complexity, Realizable Case) 2. Given a classifier with low training error, what can we say about generalization error? (Sample Complexity, Agnostic Case) 3. Is there a theoretical justification for regularization to avoid overfitting? (Structural Risk Minimization) 39

Learning Theory Objectives You should be able to… • Identify the properties of a learning setting and assumptions required to ensure low generalization error • Distinguish true error, train error, test error • Define PAC and explain what it means to be approximately correct and what occurs with high probability • Apply sample complexity bounds to real-world learning examples • Distinguish between a large sample and a finite sample analysis • Theoretically motivate regularization 40

Outline • Midterm Exam Logistics • Sample Questions • Classification and Regression: The Big Picture • Q&A 41

MIDTERM EXAM LOGISTICS 42

Midterm Exam • Time / Location – Time: Evening Exam Thu, March 22 at 6:30pm – 8:30pm – Room : We will contact each student individually with your room assignment . The rooms are not based on section. – Seats: There will be assigned seats . Please arrive early. – Please watch Piazza carefully for announcements regarding room / seat assignments. • Logistics – Format of questions: • Multiple choice • True / False (with justification) • Derivations • Short answers • Interpreting figures • Implementing algorithms on paper – No electronic devices – You are allowed to bring one 8½ x 11 sheet of notes (front and back) 43

Midterm Exam • How to Prepare – Attend the midterm review lecture (right now!) – Review prior year’s exam and solutions (we’ll post them) – Review this year’s homework problems – Consider whether you have achieved the “learning objectives” for each lecture / section 44

PAC Learning + Midterm Review Matt Gormley Lecture 15 March 7, - PowerPoint PPT Presentation

10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University PAC Learning + Midterm Review Matt Gormley Lecture 15 March 7, 2018 1 ML Big Picture Learning Paradigms: Problem

Guiding Financial Controls and Practices for PACs and PAC Treasurers PAC Treasurers Workshop

Midterm Introduction to Web Design Midterm exam on Tuesday, October 22 Midterm Introduction to

NAPSLO PAC Contributions How contributing to the NAPSLO PAC will benefit you, your company and the

WELCOME June 2011 PAC Presentation Opening Remarks Introductions June 2011 PAC

AAOS Orthopaedic PAC The Orthopaedic PAC is the only national political action committee

LArIAT Fermilab PAC Meeting November 11, 2016 Jen Raaf PAC Charge Fermilab PAC Meeting, J.

61A Lecture 11 Friday, September 21 Midterm 1 Recap 2 Midterm 1 Recap The exam was more

Midterm 2 Review. Midterm format Modular Arithmetic Inverses and GCD Midterm Topics: Notes 6-14.

CS 401 Midterm review Xiaorui Sun 1 Midterm Exam Midterm exam via gradescope : October 16

The PAC Learning Framework Guoqing Zheng January 20, 2015 Guoqing Zheng The PAC Learning

HERITAGE SQUARE CONSIDERATIONS Public Process Project Advisory Committee Meetings: PAC Meeting

Interferometric Sensor (MAGIS-100) PAC Meeting Jason Hogan on behalf of the MAGIS

Midterm Solutions David M. Rocke BIM 105, Fall 2018 David M. Rocke Midterm Solutions November

Announcements Midterm 2 is Thursday The midterm will cover everything since the first midterm up

CSE 115 Introduction to Computer Science I Midterm Midterm will be returned no later than

PAC Learning prof. dr Arno Siebes Algorithmic Data Analysis Group Department of Information and

V Vacuum Electron Device El t D i Limitations for High-Power RF Sources Heinz Bohlen,

Inferring and Executing Programs for Visual Reasoning Justin Johnson , Bharath Hariharan, Laurens

The Big Picture Related Work Practical Considerations The Boundaries of Our Methods Code

Magistrates Survey Results 19/02/2019 1 The Survey The Survey: Issued in December 2017

Introduction to Quantum Computing Kitty Yeung, Ph.D. in Applied Physics Creative Technologist +

Learning Learning is the ability to improve ones behavior based on experience. The range

CPSC 121: Models of Computation Unit 9 Mathematical Induction Based on slides by Patrice

Machine Learning CMSC 422 M ARINE C ARPUAT marine@cs.umd.edu What is this course about?