Learning theory Lecture 10 David Sontag New York - PowerPoint PPT Presentation

Learning theory
Lecture 10
David Sontag, New York University
Slides adapted from Carlos Guestrin & Luke Zettlemoyer

What about continuous hypothesis spaces?
• Continuous hypothesis space:
  – |H| = ∞
  – Infinite variance???
• Only care about the maximum number of points that can be classified exactly!

How many points can a linear boundary classify exactly? (1-D)
• 2 points: Yes!!
• 3 points: No… etc. (8 labelings total)

Shattering and Vapnik–Chervonenkis Dimension
A set of points is shattered by a hypothesis space H iff:
  – For all ways of splitting the examples into positive and negative subsets
  – There exists some consistent hypothesis h
The VC Dimension of H over input space X:
  – The size of the largest finite subset of X shattered by H
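
The definition above can be checked by brute force on small point sets. The following is a minimal sketch, assuming the hypothesis space is 1-D linear classifiers of the form sign(w·x + b) and that the points are distinct; the helper names `linear_1d_labelings` and `is_shattered` are illustrative and do not come from the slides.

```python
from itertools import product

def linear_1d_labelings(points):
    """Collect every labeling of distinct 1-D points achievable by sign(w*x + b)."""
    xs = sorted(points)
    # Candidate thresholds: below all points, between each adjacent pair, above all points.
    cuts = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1.0]
    labelings = set()
    for t in cuts:
        for sign in (+1, -1):  # the positive side may be to the right or to the left
            labelings.add(tuple(sign if x > t else -sign for x in points))
    return labelings

def is_shattered(points, achievable):
    """Shattered means every one of the 2^n labelings is achievable."""
    return all(lab in achievable for lab in product((-1, +1), repeat=len(points)))

two = [0.0, 1.0]
three = [0.0, 1.0, 2.0]
print(is_shattered(two, linear_1d_labelings(two)))      # True
print(is_shattered(three, linear_1d_labelings(three)))  # False
```

Three points fail because the labeling (+, −, +) would need two boundaries, so under these assumptions the VC dimension in 1-D is 2.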

How many points can a linear boundary classify exactly? (2-D)
• 3 points: Yes!!
• 4 points: No… etc.
[Figure from Chris Burges]

How many points can a linear boundary classify exactly? (d-D)
• A linear classifier $\sum_{j=1}^{d} w_j x_j + b$ can represent all assignments of possible labels to some set of d+1 points
  – But not d+2!!
  – Thus, the VC-dimension of d-dimensional linear classifiers is d+1
  – Bias term b required
  – Rule of thumb: the number of parameters in a model often matches the max number of points that can be shattered
• Question: Can we get a bound for error as a function of the number of points that can be completely labeled?
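
The lower-bound half of the d+1 claim can be made concrete. This is a minimal sketch, assuming one particular point set (the origin plus the d standard basis vectors, chosen here for illustration and not taken from the slides); for that set a separating (w, b) can be written down in closed form for every labeling.

```python
from itertools import product

def separator_for(labels):
    """Return (w, b) with w.x_i + b = y_i on the origin and the d basis vectors."""
    y0, rest = labels[0], labels[1:]
    b = y0                          # at the origin, w.x + b reduces to b
    w = [yj - y0 for yj in rest]    # at e_j, w.x + b reduces to w_j + b
    return w, b

def predict(w, b, x):
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score > 0 else -1

d = 3
# Hypothetical point set: origin followed by e_1, ..., e_d.
points = [[0] * d] + [[1 if i == j else 0 for i in range(d)] for j in range(d)]

shattered = all(
    all(predict(*separator_for(labels), x) == y for x, y in zip(points, labels))
    for labels in product((-1, 1), repeat=d + 1)
)
print(shattered)  # True
```

The construction relies on the bias term: without b, the origin would always receive the same label. The sketch only shows that some d+1 points are shattered; it does not show that every set of d+2 points fails.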

PAC bound using VC dimension
• VC dimension: number of training points that can be classified exactly (shattered) by hypothesis space H!!!
  – Measures relevant size of hypothesis space
• Same bias / variance tradeoff as always
  – Now, just a function of VC(H)
• Note: all of this theory is for binary classification
  – Can be generalized to multi-class and also regression
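
The bound itself did not survive in the slide text. As an illustration only, the sketch below evaluates the complexity term of one commonly quoted form of Vapnik's bound (the form given in Burges's SVM tutorial): with probability at least 1 − δ, true error ≤ training error + sqrt((VC(H)·(ln(2m/VC(H)) + 1) + ln(4/δ)) / m), where m is the number of training points. Using this particular form is an assumption, not necessarily the one shown in the lecture, and the function name `vc_confidence` is hypothetical.

```python
import math

def vc_confidence(vc_dim, m, delta):
    """Complexity term of the assumed Vapnik bound (see the lead-in for the formula)."""
    return math.sqrt(
        (vc_dim * (math.log(2 * m / vc_dim) + 1) + math.log(4 / delta)) / m
    )

# Hypothetical setting: linear classifiers on d = 10 features, so VC(H) = d + 1 = 11.
for m in (100, 1000, 10000):
    print(m, round(vc_confidence(vc_dim=11, m=m, delta=0.05), 3))
```

The term shrinks as m grows and grows with VC(H), which is the bias / variance tradeoff expressed as a function of VC(H).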

What is the VC-dimension of rectangle classifiers?
• First, show that there are 4 points that can be shattered:
• Then, show that no set of 5 points can be shattered:
[Figures from Anand Bhaskar, Ilya Sukhar]
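
Both steps can be explored by brute force. This is a minimal sketch, assuming the rectangles are axis-aligned and label the points inside them positive (the slide text does not say so explicitly); the diamond-shaped point set and the function names are illustrative.

```python
from itertools import product

def rectangle_realizes(points, labels):
    """True if some axis-aligned rectangle contains exactly the positive points."""
    pos = [p for p, y in zip(points, labels) if y == 1]
    if not pos:
        return True  # a rectangle placed away from all points labels everything negative
    x_lo, x_hi = min(p[0] for p in pos), max(p[0] for p in pos)
    y_lo, y_hi = min(p[1] for p in pos), max(p[1] for p in pos)
    # The bounding box of the positives is the smallest candidate rectangle,
    # so the labeling is realizable iff that box excludes every negative point.
    return not any(
        x_lo <= p[0] <= x_hi and y_lo <= p[1] <= y_hi
        for p, y in zip(points, labels) if y == -1
    )

def is_shattered(points):
    return all(rectangle_realizes(points, labels)
               for labels in product((-1, 1), repeat=len(points)))

diamond = [(0, 1), (1, 0), (0, -1), (-1, 0)]
print(is_shattered(diamond))             # True
print(is_shattered(diamond + [(0, 0)]))  # False
```

The second check covers only one 5-point set. The general argument picks the leftmost, rightmost, topmost, and bottommost points of any 5 points; the remaining point lies inside their bounding box, so labeling those extremes positive and the remaining point negative is impossible.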

Generalization bounds using VC dimension
• Linear classifiers:
  – VC(H) = d+1, for d features plus constant term b
• Classifiers using Gaussian Kernel
  – VC(H) = ∞
[Annotation on the kernel formula: Euclidean distance, squared]
[Figure from Chris Burges] [Figure from mblondel.org]

Gap tolerant classifiers
• Suppose data lies in $\mathbb{R}^d$ in a ball of diameter D
• Consider a hypothesis class H of linear classifiers that can only classify point sets with margin at least M
• What is the largest set of points that H can shatter?
  VC dimension = $\min\left(d, \frac{D^2}{M^2}\right)$
• Cannot shatter these points: [figure: points annotated "< M"]
• $M = 2\gamma = 2 \cdot \frac{1}{\|w\|}$
• SVM attempts to minimize $\|w\|^2$, which minimizes VC-dimension!!!
[Figure from Chris Burges: example with D = 2, M = 3/2; regions labeled Φ = 1 (Y = +1), Φ = −1 (Y = −1), and Φ = 0 (Y = 0)]

Gap tolerant classifiers
• Suppose data lies in $\mathbb{R}^d$ in a ball of diameter D
• Consider a hypothesis class H of linear classifiers that can only classify point sets with margin at least M
• What is the largest set of points that H can shatter?
  VC dimension = $\min\left(d, \frac{D^2}{M^2}\right)$
• What is R = D/2 for the Gaussian kernel?
  $R = \max_x \|\phi(x)\| = \max_x \sqrt{\phi(x) \cdot \phi(x)} = \max_x \sqrt{K(x, x)} = 1$ !!!
• What is $\|w\|^2$?
  $\|w\|^2 = \left(\frac{2}{M}\right)^2$
  $\|w\|_2^2 = \left\| \sum_i \alpha_i y_i \phi(x_i) \right\|_2^2 = \sum_i \sum_j \alpha_i \alpha_j y_i y_j K(x_i, x_j)$
[Figure from Chris Burges: example with D = 2, M = 3/2; regions labeled Φ = 1 (Y = +1), Φ = −1 (Y = −1), and Φ = 0 (Y = 0)]
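
The two kernel identities on this slide can be evaluated numerically. This is a minimal sketch, assuming the Gaussian kernel is parameterized as K(x, z) = exp(−‖x − z‖² / (2σ²)); the points, labels, and dual coefficients α below are made-up values for illustration and do not come from a trained SVM.

```python
import math

def gaussian_kernel(x, z, sigma=1.0):
    """Assumed parameterization: exp(-||x - z||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def w_norm_squared(alphas, ys, xs, kernel):
    """||w||^2 = sum_i sum_j alpha_i alpha_j y_i y_j K(x_i, x_j)."""
    return sum(
        ai * aj * yi * yj * kernel(xi, xj)
        for ai, yi, xi in zip(alphas, ys, xs)
        for aj, yj, xj in zip(alphas, ys, xs)
    )

xs = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]   # hypothetical support vectors
ys = [1, -1, 1]
alphas = [0.5, 0.8, 0.3]                    # hypothetical dual coefficients

R = max(math.sqrt(gaussian_kernel(x, x)) for x in xs)  # K(x, x) = 1 for every x
w2 = w_norm_squared(alphas, ys, xs, gaussian_kernel)
M = 2.0 / math.sqrt(w2)                                # from ||w||^2 = (2 / M)^2
D = 2.0 * R
print(R, w2, (D / M) ** 2)
```

Because R = 1 gives D = 2, the ratio D²/M² equals ‖w‖² here, which makes explicit why minimizing ‖w‖² shrinks the D²/M² term.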

What you need to know
• Finite hypothesis space
  – Derive results
  – Counting number of hypotheses
• Complexity of the classifier depends on number of points that can be classified exactly
  – Finite case – number of hypotheses considered
  – Infinite case – VC dimension
  – VC dimension of gap tolerant classifiers to justify SVM
• Bias-Variance tradeoff in learning theory
