PAC Learning (Learning Theory), Matt Gormley

10-601 Introduction to Machine Learning
Machine Learning Department, School of Computer Science, Carnegie Mellon University

PAC Learning (Learning Theory)
Matt Gormley
Lecture 28, May 1, 2016

Readings: Murphy --; Bishop --; HTF --; Mitchell 7

Reminders
• Homework 9: Applications of ML
  – Release: Mon, Apr. 24
  – Due: Wed, May 3 at 11:59pm

Outline
• Statistical Learning Theory
  – True Error vs. Train Error
  – Function Approximation View (aka. PAC/SLT Model)
  – Three Hypotheses of Interest
• Probably Approximately Correct (PAC) Learning
  – PAC Criterion
  – PAC Learnable
  – Consistent Learner
  – Sample Complexity
• Generalization and Overfitting
  – Realizable vs. Agnostic Cases
  – Finite vs. Infinite Hypothesis Spaces
  – VC Dimension
  – Sample Complexity Bounds
  – Empirical Risk Minimization
  – Structural Risk Minimization
• Excess Risk

LEARNING THEORY

Questions For Today
1. Given a classifier with zero training error, what can we say about generalization error? (Sample Complexity, Realizable Case)
2. Given a classifier with low training error, what can we say about generalization error? (Sample Complexity, Agnostic Case)
3. Is there a theoretical justification for regularization to avoid overfitting? (Structural Risk Minimization)

Statistical Learning Theory
Whiteboard:
  – Function Approximation View (aka. PAC/SLT Model)
  – True Error vs. Train Error
  – Three Hypotheses of Interest

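PAC / SLT Model
PAC/SLT models for Supervised Learning

[Figure: a data source draws examples from a distribution D on X; an expert / oracle labels them with the target concept c* : X → Y; the learning algorithm receives the labeled examples (x_1, c*(x_1)), …, (x_m, c*(x_m)) and outputs a hypothesis h : X → Y. The example hypothesis shown is a small decision tree that tests x_1 > 5 and x_6 > 2, with leaves labeled +1 and -1.]

Slide from Nina Balcan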

PAC / SLT Model

Two Types of Error
• True Error (aka. expected risk)
• Train Error (aka. empirical risk)
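The distinction between the two errors can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the target concept `target`, the hypothesis `hypothesis`, and the uniform distribution on [0, 1] are all hypothetical choices made for illustration and do not come from the lecture.

```python
import random

def target(x):
    """Hypothetical target concept c*: positive iff x > 0.5."""
    return x > 0.5

def hypothesis(x):
    """Hypothetical learned hypothesis h: positive iff x > 0.6."""
    return x > 0.6

def error_rate(h, c, xs):
    """Fraction of points in xs on which h disagrees with c."""
    return sum(h(x) != c(x) for x in xs) / len(xs)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Train error (empirical risk): disagreement rate on a small training sample.
train_xs = [rng.random() for _ in range(20)]
train_error = error_rate(hypothesis, target, train_xs)

# True error (expected risk): under Uniform[0, 1], h and c* disagree exactly
# on the interval (0.5, 0.6], so the true error is 0.1. A large fresh sample
# gives a Monte Carlo estimate of that value.
fresh_xs = [rng.random() for _ in range(100_000)]
true_error_estimate = error_rate(hypothesis, target, fresh_xs)

print(f"train error on 20 points: {train_error:.3f}")
print(f"estimated true error:     {true_error_estimate:.3f}")
```

With only 20 training points the train error can land noticeably above or below 0.1, which is the gap that sample complexity bounds are meant to control.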

Three Hypotheses of Interest
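The body of this slide did not survive extraction. As a hedged reconstruction, the three hypotheses usually singled out in this setting are the true function, the expected risk minimizer, and the empirical risk minimizer. The notation below (R for true error, \hat{R} for train error, H for the hypothesis space, m training examples drawn from D) is an assumption chosen to match the PAC / SLT model described earlier.

```latex
\begin{align*}
R(h) &= \Pr_{x \sim D}\bigl[\, c^*(x) \neq h(x) \,\bigr]
  && \text{(true error / expected risk)} \\
\hat{R}(h) &= \frac{1}{m} \sum_{i=1}^{m} \mathbb{1}\bigl[\, c^*(x_i) \neq h(x_i) \,\bigr]
  && \text{(train error / empirical risk)} \\[4pt]
c^* &: X \to Y
  && \text{(true function, the target concept)} \\
h^* &= \arg\min_{h \in H} R(h)
  && \text{(expected risk minimizer)} \\
\hat{h} &= \arg\min_{h \in H} \hat{R}(h)
  && \text{(empirical risk minimizer)}
\end{align*}
```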

PAC LEARNING

Probably Approximately Correct (PAC) Learning
Whiteboard:
  – PAC Criterion
  – Meaning of “Probably Approximately Correct”
  – PAC Learnable
  – Consistent Learner
  – Sample Complexity
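The whiteboard content is not part of the extracted slides. One common way to state the PAC criterion, given here as a sketch rather than as the lecture's exact wording, uses R for true error and \hat{R} for train error (both symbols are assumptions about notation):

```latex
\[
\Pr\Bigl( \forall h \in H :\ \bigl| R(h) - \hat{R}(h) \bigr| \le \epsilon \Bigr) \;\ge\; 1 - \delta
\]
```

Read this way, "approximately correct" refers to the tolerance \epsilon on the gap between true and train error, and "probably" refers to the confidence 1 - \delta. The sample complexity is then the smallest number of training examples m for which the criterion holds.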

PAC Learning

SAMPLE COMPLEXITY RESULTS

Sample Complexity Results
Four Cases we care about…
• Realizable
• Agnostic
We’ll start with the finite case…

Generalization and Overfitting
Whiteboard:
  – Realizable vs. Agnostic Cases
  – Finite vs. Infinite Hypothesis Spaces
  – Sample Complexity Bounds (Finite Case)

Sample Complexity Results
Four Cases we care about…
• Realizable
• Agnostic

Example: Conjunctions
In-Class Quiz: Suppose H = class of conjunctions over x in {0,1}^M. If M = 10, ε = 0.1, δ = 0.01, how many examples suffice?
• Realizable
• Agnostic
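The quiz can be worked numerically with the standard finite-hypothesis-space bounds. This is a sketch under assumptions: the realizable bound is taken as m ≥ (1/ε)[ln|H| + ln(1/δ)], the agnostic bound as m ≥ (1/(2ε²))[ln|H| + ln(2/δ)], and |H| is taken as 3^M (each variable appears positively, negated, or not at all). The function names are hypothetical, and the lecture's own counting of |H| may differ slightly.

```python
import math

def m_realizable(h_size, eps, delta):
    """Examples sufficient in the realizable case with a finite hypothesis space.

    Assumed bound: m >= (1/eps) * (ln|H| + ln(1/delta)).
    """
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / eps)

def m_agnostic(h_size, eps, delta):
    """Examples sufficient in the agnostic case with a finite hypothesis space.

    Assumed bound: m >= (1/(2*eps^2)) * (ln|H| + ln(2/delta)).
    """
    return math.ceil((math.log(h_size) + math.log(2.0 / delta)) / (2.0 * eps ** 2))

M = 10
h_size = 3 ** M          # assumption: each variable is positive, negated, or absent
eps, delta = 0.1, 0.01

print(m_realizable(h_size, eps, delta))  # 156
print(m_agnostic(h_size, eps, delta))    # 815
```

The agnostic requirement is several times larger because the bound scales with 1/ε² rather than 1/ε.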

Sample Complexity Results
Four Cases we care about…
• Realizable
• Agnostic
We need a new definition of “complexity” for a Hypothesis space for these results (see VC Dimension)

VC DIMENSION

What if H is infinite?
• E.g., linear separators in R^d
• E.g., thresholds on the real line (threshold w)
• E.g., intervals on the real line (endpoints a, b)

Shattering, VC-dimension
Definition: H[S] is the set of splittings of dataset S using concepts from H. H shatters S if |H[S]| = 2^{|S|}.
A set of points S is shattered by H if there are hypotheses in H that split S in all of the 2^{|S|} possible ways; i.e., all possible ways of classifying points in S are achievable using concepts in H.
Definition: VC-dimension (Vapnik-Chervonenkis dimension). The VC-dimension of a hypothesis space H is the cardinality of the largest set S that can be shattered by H. If arbitrarily large finite sets can be shattered by H, then VCdim(H) = ∞.

Shattering, VC-dimension
Definition: VC-dimension (Vapnik-Chervonenkis dimension). The VC-dimension of a hypothesis space H is the cardinality of the largest set S that can be shattered by H. If arbitrarily large finite sets can be shattered by H, then VCdim(H) = ∞.
To show that the VC-dimension is d:
  – there exists a set of d points that can be shattered
  – there is no set of d+1 points that can be shattered.
Fact: If H is finite, then VCdim(H) ≤ log_2(|H|).

Shattering, VC-dimension
If the VC-dimension is d, that means there exists a set of d points that can be shattered, but there is no set of d+1 points that can be shattered.
• E.g., H = thresholds on the real line (threshold w): VCdim(H) = 1
• E.g., H = intervals on the real line: VCdim(H) = 2 (on three points, the labeling + - + cannot be realized)

Shattering, VC-dimension
If the VC-dimension is d, that means there exists a set of d points that can be shattered, but there is no set of d+1 points that can be shattered.
E.g., H = union of k intervals on the real line: VCdim(H) = 2k
• VCdim(H) ≥ 2k: a sample of size 2k can be shattered (treat each pair of points as a separate case of intervals).
• VCdim(H) < 2k + 1: on 2k + 1 points, the alternating labeling + - + - + … cannot be realized.
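The claims for thresholds, intervals, and unions of k intervals can be checked by brute force. This is a minimal sketch, assuming thresholds label a point positive iff it lies at or above w. It relies on the fact that, for these classes on the real line, whether a labeling is achievable depends only on the left-to-right order of the distinct points, so it is enough to enumerate label sequences. All function names are hypothetical.

```python
from itertools import product

def threshold_realizable(labels):
    """Labels of points sorted left to right; threshold h_w(x) = + iff x >= w.

    Achievable iff no positive point sits to the left of a negative point.
    """
    return all(not (left and not right) for left, right in zip(labels, labels[1:]))

def k_intervals_realizable(labels, k):
    """Achievable by a union of k intervals iff the positives form at most k runs."""
    runs = sum(
        1 for i, lab in enumerate(labels)
        if lab and (i == 0 or not labels[i - 1])
    )
    return runs <= k

def shatters(n, realizable):
    """True if every one of the 2^n labelings of n sorted points is achievable."""
    return all(realizable(labels) for labels in product([False, True], repeat=n))

def vc_dim(realizable, max_n=8):
    """Largest n <= max_n such that n points on the line can be shattered."""
    d = 0
    for n in range(1, max_n + 1):
        if not shatters(n, realizable):
            break  # if no n-point set is shattered, no larger set is either
        d = n
    return d

print(vc_dim(threshold_realizable))                         # 1
print(vc_dim(lambda labels: k_intervals_realizable(labels, 1)))  # 2
print(vc_dim(lambda labels: k_intervals_realizable(labels, 3)))  # 6
```

The search is exponential in n, so it only serves as a sanity check for small cases; `max_n` must exceed the expected VC-dimension for the answer to be meaningful.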

Shattering, VC-dimension
E.g., H = linear separators in R^2: VCdim(H) ≥ 3

Shattering, VC-dimension
E.g., H = linear separators in R^2: VCdim(H) < 4
• Case 1: one point lies inside the triangle formed by the others. Cannot label the inside point as positive and the outside points as negative.
• Case 2: all points lie on the boundary (convex hull). Cannot label two diagonally opposite points as positive and the other two as negative.
Fact: the VCdim of linear separators in R^d is d+1.

SAMPLE COMPLEXITY RESULTS

Sample Complexity Results
Four Cases we care about…
• Realizable
• Agnostic
We need a new definition of “complexity” for a Hypothesis space for these results (see VC Dimension)

Generalization and Overfitting
Whiteboard:
  – Sample Complexity Bounds (Infinite Case)
  – Empirical Risk Minimization
  – Structural Risk Minimization
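The whiteboard derivations are not in the extracted slides. For orientation, the standard VC-dimension-based sample complexity bounds for an infinite hypothesis space take roughly the following form. They are stated in big-O notation because the constants vary between textbooks, and this is a sketch of the usual results rather than the lecture's exact statement.

```latex
\begin{align*}
\text{Realizable:} \quad
  m &= O\!\left( \frac{1}{\epsilon} \Bigl[ \mathrm{VCdim}(H) \log\frac{1}{\epsilon} + \log\frac{1}{\delta} \Bigr] \right) \\[4pt]
\text{Agnostic:} \quad
  m &= O\!\left( \frac{1}{\epsilon^{2}} \Bigl[ \mathrm{VCdim}(H) + \log\frac{1}{\delta} \Bigr] \right)
\end{align*}
```

In both cases VCdim(H) plays the role that ln|H| plays in the finite case.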

EXCESS RISK

Excess Risk

Excess Risk Results

Questions For Today
1. Given a classifier with zero training error, what can we say about generalization error? (Sample Complexity, Realizable Case)
2. Given a classifier with low training error, what can we say about generalization error? (Sample Complexity, Agnostic Case)
3. Is there a theoretical justification for regularization to avoid overfitting? (Structural Risk Minimization)


