
SLIDE 1

Statistics and learning

Support Vector Machines

Sébastien Gadat

Toulouse School of Economics

February 2017

SLIDE 2

Linearly separable data

Intuition: how would you separate the white points from the black ones?

SLIDE 3

Separation hyperplane

SLIDE 6

Separation hyperplane

[Figure: separating hyperplane with normal vector β and margin boundaries M− and M+]

Any separation hyperplane can be written (β, β0) such that:

∀i = 1..N, βᵀxi + β0 ≥ 0 if yi = +1
∀i = 1..N, βᵀxi + β0 ≤ 0 if yi = −1

This can be written: ∀i = 1..N, yi(βᵀxi + β0) ≥ 0

SLIDE 7

Separation hyperplane

But... yi(βᵀxi + β0) is the signed distance between point i and the hyperplane (β, β0) (when ‖β‖ = 1).

Margin of a separating hyperplane: min_i yi(βᵀxi + β0)?

SLIDE 8

Separation hyperplane

Optimal separating hyperplane

Maximize the margin between the hyperplane and the data:

max_{β,β0} M   such that   ∀i = 1..N, yi(βᵀxi + β0) ≥ M and ‖β‖ = 1

SLIDE 9

Separation hyperplane

Let's get rid of ‖β‖ = 1:

∀i = 1..N, (1/‖β‖) yi(βᵀxi + β0) ≥ M   ⇒   ∀i = 1..N, yi(βᵀxi + β0) ≥ M‖β‖

SLIDE 10

Separation hyperplane

∀i = 1..N, yi(βᵀxi + β0) ≥ M‖β‖

If (β, β0) satisfies this constraint, then ∀α > 0, (αβ, αβ0) does too. Let's choose to have

∀i = 1..N, yi(βᵀxi + β0) ≥ 1,

then we need to set ‖β‖ = 1/M.

SLIDE 11

Separation hyperplane

Now M = 1/‖β‖. Geometrical interpretation?

So max_{β,β0} M   ⇔   min_{β,β0} ‖β‖   ⇔   min_{β,β0} ‖β‖²

SLIDE 12

Separation hyperplane

Optimal separating hyperplane (continued)

min_{β,β0} (1/2)‖β‖²   such that   ∀i = 1..N, yi(βᵀxi + β0) ≥ 1

Maximize the margin M = 1/‖β‖ between the hyperplane and the data.

SLIDE 13

Optimal separating hyperplane

min_{β,β0} (1/2)‖β‖²   such that   ∀i = 1..N, yi(βᵀxi + β0) ≥ 1

It's a QP (quadratic programming) problem!

SLIDE 14

Optimal separating hyperplane

min_{β,β0} (1/2)‖β‖²   such that   ∀i = 1..N, yi(βᵀxi + β0) ≥ 1

It's a QP problem! Lagrangian:

L_P(β, β0, α) = (1/2)‖β‖² − Σ_{i=1}^N αi [yi(βᵀxi + β0) − 1]

SLIDE 15

Optimal separating hyperplane

min_{β,β0} (1/2)‖β‖²   such that   ∀i = 1..N, yi(βᵀxi + β0) ≥ 1

It's a QP problem!

L_P(β, β0, α) = (1/2)‖β‖² − Σ_{i=1}^N αi [yi(βᵀxi + β0) − 1]

KKT conditions:

∂L_P/∂β = 0  ⇒  β = Σ_{i=1}^N αi yi xi
∂L_P/∂β0 = 0  ⇒  0 = Σ_{i=1}^N αi yi
∀i = 1..N, αi [yi(βᵀxi + β0) − 1] = 0
∀i = 1..N, αi ≥ 0

SLIDE 16

Optimal separating hyperplane

min_{β,β0} (1/2)‖β‖²   such that   ∀i = 1..N, yi(βᵀxi + β0) ≥ 1

It's a QP problem!

∀i = 1..N, αi [yi(βᵀxi + β0) − 1] = 0

Two possibilities:

◮ αi > 0, then yi(βᵀxi + β0) = 1: xi is on the margin's boundary
◮ αi = 0, then xi is anywhere on the boundary or further... but does not participate in β.

β = Σ_{i=1}^N αi yi xi

The xi for which αi > 0 are called Support Vectors.

SLIDE 17

Optimal separating hyperplane

min_{β,β0} (1/2)‖β‖²   such that   ∀i = 1..N, yi(βᵀxi + β0) ≥ 1

It's a QP problem! Dual problem:

max_{α ∈ (R+)^N} L_D(α) = Σ_{i=1}^N αi − (1/2) Σ_{i=1}^N Σ_{j=1}^N αi αj yi yj xiᵀxj

such that Σ_{i=1}^N αi yi = 0

Solving the dual problem is a maximization in R^N, rather than a (constrained) minimization in R^n. Usual algorithm: SMO = Sequential Minimal Optimization.
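
As an illustration (not part of the original slides), here is a minimal sketch of this dual QP solved with a generic solver, assuming the cvxopt package; the toy data and variable names (X, y, alphas) are purely illustrative.

```python
import numpy as np
from cvxopt import matrix, solvers  # generic QP solver, assumed available

solvers.options["show_progress"] = False

# Toy linearly separable data: rows of X are the xi, y holds the labels +/-1
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
N = len(y)

# Dual: max sum_i alpha_i - 1/2 alpha^T Q alpha, with Q_ij = yi yj xi^T xj,
# subject to alpha_i >= 0 and sum_i alpha_i yi = 0.
# cvxopt minimizes 1/2 a^T P a + q^T a  s.t.  G a <= h, A a = b.
Q = (y[:, None] * X) @ (y[:, None] * X).T
P = matrix(Q + 1e-8 * np.eye(N))   # tiny ridge for numerical stability
q = matrix(-np.ones(N))
G = matrix(-np.eye(N))             # -alpha_i <= 0  <=>  alpha_i >= 0
h = matrix(np.zeros(N))
A = matrix(y.reshape(1, -1))       # sum_i alpha_i yi = 0
b = matrix(0.0)

sol = solvers.qp(P, q, G, h, A, b)
alphas = np.array(sol["x"]).ravel()
print("alphas:", np.round(alphas, 3))  # nonzero entries mark the support vectors
```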

SLIDE 18

Optimal separating hyperplane

min_{β,β0} (1/2)‖β‖²   such that   ∀i = 1..N, yi(βᵀxi + β0) ≥ 1

It's a QP problem! And β0? Solve αi [yi(βᵀxi + β0) − 1] = 0 for any i such that αi > 0.

SLIDE 19

Optimal separating hyperplane

Overall: β = Σ_{i=1}^N αi yi xi, with αi > 0 only for the support vectors xi.

Prediction: f(x) = sign(βᵀx + β0) = sign(Σ_{i=1}^N αi yi xiᵀx + β0)
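
Continuing the hedged sketch above (reusing the illustrative X, y and alphas), β and β0 can be recovered and the prediction rule reproduced as follows:

```python
import numpy as np

# beta from the KKT condition beta = sum_i alpha_i yi xi,
# beta_0 from any support vector (alpha_i > 0), where yi (beta^T xi + beta_0) = 1.
sv = alphas > 1e-6                        # boolean mask of the support vectors
beta = ((alphas * y)[:, None] * X).sum(axis=0)
i0 = np.argmax(sv)                        # index of one support vector
beta0 = y[i0] - beta @ X[i0]              # since yi is +/-1, 1/yi = yi

def f(x):
    """Prediction rule f(x) = sign(beta^T x + beta_0)."""
    return np.sign(beta @ x + beta0)

print(f(np.array([2.5, 2.0])), f(np.array([-1.5, -1.0])))  # expected: 1.0, -1.0
```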

SLIDE 20

Non-linearly separable data?

SLIDE 23

Non-linearly separable data?

Slack variables ξ = (ξ1, . . . , ξN):

yi(βᵀxi + β0) ≥ M − ξi   or   yi(βᵀxi + β0) ≥ M(1 − ξi)

and ξi ≥ 0 and Σ_{i=1}^N ξi ≤ K

SLIDE 24

Non-linearly separable data?

yi(βᵀxi + β0) ≥ M(1 − ξi)   ⇒   misclassification if ξi ≥ 1

Σ_{i=1}^N ξi ≤ K   ⇒   at most K misclassifications

SLIDE 25

Non-linearly separable data?

Optimal separating hyperplane

min_{β,β0} ‖β‖   such that   ∀i = 1..N, yi(βᵀxi + β0) ≥ 1 − ξi,  ξi ≥ 0,  Σ_{i=1}^N ξi ≤ K

SLIDE 26

Non-linearly separable data?

Optimal separating hyperplane

min_{β,β0} (1/2)‖β‖² + C Σ_{i=1}^N ξi   such that   ∀i = 1..N, yi(βᵀxi + β0) ≥ 1 − ξi,  ξi ≥ 0
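
A hedged sketch of this soft-margin formulation in practice, assuming scikit-learn is available; C plays exactly the role of the penalty above, and the toy arrays are illustrative only.

```python
import numpy as np
from sklearn.svm import SVC  # solves the soft-margin dual problem

# Toy data that is NOT linearly separable (one label flipped on purpose)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0], [2.0, 1.5]])
y = np.array([1, 1, -1, -1, -1])

# Small C: wide margin, tolerates slack; large C: penalizes slack heavily
for C in (0.1, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C}: #support vectors = {clf.support_.size}, "
          f"train accuracy = {clf.score(X, y):.2f}")
```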

SLIDE 27

Optimal separating hyperplane

Again a QP problem.

L_P = (1/2)‖β‖² + C Σ_{i=1}^N ξi − Σ_{i=1}^N αi [yi(βᵀxi + β0) − (1 − ξi)] − Σ_{i=1}^N μi ξi

KKT conditions:

∂L_P/∂β = 0  ⇒  β = Σ_{i=1}^N αi yi xi
∂L_P/∂β0 = 0  ⇒  0 = Σ_{i=1}^N αi yi
∂L_P/∂ξi = 0  ⇒  αi = C − μi
∀i = 1..N, αi [yi(βᵀxi + β0) − (1 − ξi)] = 0
∀i = 1..N, μi ξi = 0
∀i = 1..N, αi ≥ 0, μi ≥ 0

SLIDE 28

Optimal separating hyperplane

Dual problem:

max_{α ∈ (R+)^N} L_D(α) = Σ_{i=1}^N αi − (1/2) Σ_{i=1}^N Σ_{j=1}^N αi αj yi yj xiᵀxj

such that Σ_{i=1}^N αi yi = 0 and 0 ≤ αi ≤ C
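
Compared with the hard-margin dual sketched earlier, only the box constraint changes. In the illustrative cvxopt sketch above, this amounts to stacking 0 ≤ αi and αi ≤ C (C is a hyperparameter you choose):

```python
import numpy as np
from cvxopt import matrix

C = 1.0   # soft-margin penalty (illustrative value)
N = 4     # number of training points, as in the hard-margin sketch above

# Box constraint 0 <= alpha_i <= C, written as  -alpha_i <= 0  and  alpha_i <= C
G = matrix(np.vstack([-np.eye(N), np.eye(N)]))
h = matrix(np.hstack([np.zeros(N), C * np.ones(N)]))
# P, q, A, b are unchanged; call solvers.qp(P, q, G, h, A, b) as before.
```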

SLIDE 29

Optimal separating hyperplane

αi [yi(βᵀxi + β0) − (1 − ξi)] = 0   and   β = Σ_{i=1}^N αi yi xi

Again:

◮ αi > 0, then yi(βᵀxi + β0) = 1 − ξi: xi is a support vector. Among these:
  ◮ ξi = 0, then 0 ≤ αi ≤ C
  ◮ ξi > 0, then αi = C (because μi = 0, because μi ξi = 0)
◮ αi = 0, then xi does not participate in β.

SLIDE 30

Optimal separating hyperplane

Overall: β = Σ_{i=1}^N αi yi xi, with αi > 0 only for the support vectors xi.

Prediction: f(x) = sign(βᵀx + β0) = sign(Σ_{i=1}^N αi yi xiᵀx + β0)

SLIDE 31

Non-linear SVMs?

Key remark

h : X → H, x ↦ h(x) is a mapping to a p-dimensional Euclidean space (p ≫ n, possibly infinite).

SVM classifier in H: f(x) = sign(Σ_{i=1}^N αi yi ⟨h(xi), h(x)⟩ + β0).

Suppose K(x, x′) = ⟨h(x), h(x′)⟩. Then: f(x) = sign(Σ_{i=1}^N αi yi K(xi, x) + β0).

SLIDE 32

Kernels

Kernel

K(x, y) = ⟨h(x), h(y)⟩ is called a kernel function.

SLIDE 33

Kernels

Kernel

K(x, y) = ⟨h(x), h(y)⟩ is called a kernel function.

Example: X = R², H = R³, h(x) = (x1², √2 x1 x2, x2²)ᵀ, so that K(x, y) = h(x)ᵀh(y) = ⟨x, y⟩².
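
A quick numerical check of this example (illustrative only): h(x)ᵀh(y) should coincide with ⟨x, y⟩².

```python
import numpy as np

def h(x):
    """Explicit feature map R^2 -> R^3 from the slide."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(h(x) @ h(y))    # 1.0  (= (1*3 + 2*(-1))^2)
print((x @ y) ** 2)   # 1.0  -> same value, without ever building h explicitly
```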

SLIDE 34

Kernels

Kernel

K(x, y) = ⟨h(x), h(y)⟩ is called a kernel function.

What if we knew that K(·, ·) is a kernel, without explicitly building h? The SVM would be a linear classifier in H, but we would never have to compute h(x) for training or prediction! This is called the kernel trick.

SLIDE 35

Kernels

Kernel

K(x, y) = ⟨h(x), h(y)⟩ is called a kernel function.

Under what conditions is K(·, ·) an acceptable kernel? Answer: if it is an inner product on a (separable) Hilbert space. More generally, we are interested in positive definite kernels:

Positive Definite Kernels

K(·, ·) is a positive definite kernel on X if ∀n ∈ N, ∀x ∈ Xⁿ and ∀c ∈ Rⁿ,

Σ_{i,j=1}^n ci cj K(xi, xj) ≥ 0
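
This condition says that every Gram matrix [K(xi, xj)] must be positive semidefinite. A hedged numerical sanity check, using the degree-2 kernel from the example above on illustrative random data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))     # 20 random points in R^2

# Gram matrix of K(x, y) = <x, y>^2 (the explicit-feature-map example above)
K = (X @ X.T) ** 2
eigs = np.linalg.eigvalsh(K)
print(eigs.min() >= -1e-10)      # True: the Gram matrix is positive semidefinite
```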

SLIDE 36

Kernels

Kernel

K(x, y) = ⟨h(x), h(y)⟩ is called a kernel function.

Mercer's condition

Given K(x, y), if for all g such that ∫ g(x)² dx < ∞,

∫∫ K(x, y) g(x) g(y) dx dy ≥ 0,

then there exists a mapping h(·) such that K(x, y) = ⟨h(x), h(y)⟩.

SLIDE 37

Kernels

Kernel

K(x, y) = ⟨h(x), h(y)⟩ is called a kernel function.

Examples of kernels:

◮ polynomial: K(x, y) = (1 + ⟨x, y⟩)^d
◮ radial basis: K(x, y) = e^(−γ‖x−y‖²) (very often used in Rⁿ)
◮ sigmoid: K(x, y) = tanh(κ1⟨x, y⟩ + κ2)
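
The same three kernels written as plain NumPy functions (a sketch for illustration; γ, d, κ1, κ2 are hyperparameters to choose):

```python
import numpy as np

def polynomial_kernel(x, y, d=3):
    return (1.0 + x @ y) ** d

def rbf_kernel(x, y, gamma=0.5):
    return np.exp(-gamma * np.sum((x - y) ** 2))

def sigmoid_kernel(x, y, kappa1=1.0, kappa2=0.0):
    return np.tanh(kappa1 * (x @ y) + kappa2)

x, y = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(polynomial_kernel(x, y), rbf_kernel(x, y), sigmoid_kernel(x, y))
```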

SLIDE 38

Kernels

Kernel

K(x, y) = ⟨h(x), h(y)⟩ is called a kernel function.

What do you think: is it good or bad to send all data points into a feature space with p ≫ n?

SLIDE 39

SVM and kernels for classification

min_{β,β0} (1/2)‖β‖² + C Σ_{i=1}^N ξi   such that   ∀i = 1..N, yi(βᵀh(xi) + β0) ≥ 1 − ξi,  ξi ≥ 0

SLIDE 40

SVM and kernels for classification

min_{β,β0} (1/2)‖β‖² + C Σ_{i=1}^N ξi   such that   ∀i = 1..N, yi(βᵀh(xi) + β0) ≥ 1 − ξi,  ξi ≥ 0

Dual problem:

max_{α ∈ (R+)^N} L_D(α) = Σ_{i=1}^N αi − (1/2) Σ_{i=1}^N Σ_{j=1}^N αi αj yi yj ⟨h(xi), h(xj)⟩

such that Σ_{i=1}^N αi yi = 0 and 0 ≤ αi ≤ C

SLIDE 41

SVM and kernels for classification

min_{β,β0} (1/2)‖β‖² + C Σ_{i=1}^N ξi   such that   ∀i = 1..N, yi(βᵀh(xi) + β0) ≥ 1 − ξi,  ξi ≥ 0

Dual problem:

max_{α ∈ (R+)^N} L_D(α) = Σ_{i=1}^N αi − (1/2) Σ_{i=1}^N Σ_{j=1}^N αi αj yi yj K(xi, xj)

such that Σ_{i=1}^N αi yi = 0 and 0 ≤ αi ≤ C

SLIDE 42

SVM and kernels for classification

Overall: β = Σ_{i=1}^N αi yi h(xi), with αi > 0 only for the support vectors xi.

Prediction: f(x) = sign(βᵀh(x) + β0) = sign(Σ_{i=1}^N αi yi K(xi, x) + β0)
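
A hedged end-to-end sketch with scikit-learn, whose SVC solves this kernelized soft-margin dual (the dataset and parameter values are illustrative):

```python
import numpy as np
from sklearn.svm import SVC

# Toy data that is not linearly separable: class +1 inside a ring of class -1
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(np.linalg.norm(X, axis=1) < 1.0, 1, -1)

clf = SVC(kernel="rbf", gamma=1.0, C=10.0).fit(X, y)
print("support vectors:", clf.support_.size, "/", len(X))
print("train accuracy:", clf.score(X, y))
print("prediction at origin:", clf.predict([[0.0, 0.0]]))  # expected: [1]
```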

SLIDE 43

Why would you use SVMs?

◮ With kernels, they send the data into a higher-dimensional (sometimes infinite-dimensional) feature space, where the data is separable / linearly interpolable.
◮ They produce a sparse predictor (many coefficients are zero).
◮ They automatically maximize the margin (and thus control the generalization error?).
◮ They perform very well on complex, non-linearly separable / fittable data.

SLIDE 44

SVM for regression

Now we don't want to separate, but to fit. Contradictory goals?

◮ Fit the data: minimize Σ_{i=1}^N V(yi − f(xi)), where V is a loss function.
◮ Keep large margins: minimize ‖β‖.

SLIDE 45

SVM for regression

Now we don't want to separate, but to fit. Contradictory goals?

◮ Fit the data: minimize Σ_{i=1}^N V(yi − f(xi)), where V is a loss function.
◮ Keep large margins: minimize ‖β‖.

Support Vector Regression

min_{β,β0} (1/2)‖β‖² + C Σ_{i=1}^N V(yi − (βᵀxi + β0))

SLIDE 46

Loss functions

◮ ε-insensitive: V(z) = 0 if |z| ≤ ε, |z| − ε otherwise
◮ Laplacian: V(z) = |z|
◮ Gaussian: V(z) = (1/2) z²
◮ Huber's robust loss: V(z) = z²/(2σ) if |z| ≤ σ, |z| − σ/2 otherwise
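
The four losses written out as NumPy functions (a sketch; ε and σ are illustrative hyperparameter values):

```python
import numpy as np

def eps_insensitive(z, eps=0.1):
    return np.maximum(np.abs(z) - eps, 0.0)

def laplacian(z):
    return np.abs(z)

def gaussian(z):
    return 0.5 * z ** 2

def huber(z, sigma=1.0):
    return np.where(np.abs(z) <= sigma, z ** 2 / (2 * sigma), np.abs(z) - sigma / 2)

z = np.linspace(-2, 2, 5)
print(eps_insensitive(z), laplacian(z), gaussian(z), huber(z), sep="\n")
```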

SLIDE 47

ε-SVR

min_{β,β0} (λ/2)‖β‖² + C Σ_{i=1}^N (ξi + ξi*)

subject to:
  yi − ⟨β, xi⟩ − β0 ≤ ε + ξi
  ⟨β, xi⟩ + β0 − yi ≤ ε + ξi*
  ξi, ξi* ≥ 0

SLIDE 48

ε-SVR

min_{β,β0} (λ/2)‖β‖² + C Σ_{i=1}^N (ξi + ξi*)

subject to:
  yi − ⟨β, xi⟩ − β0 ≤ ε + ξi
  ⟨β, xi⟩ + β0 − yi ≤ ε + ξi*
  ξi, ξi* ≥ 0

As previously, this is a QP problem.

L_P = (λ/2)‖β‖² + C Σ_{i=1}^N (ξi + ξi*) − Σ_{i=1}^N αi (ε + ξi − yi + ⟨β, xi⟩ + β0) − Σ_{i=1}^N αi* (ε + ξi* + yi − ⟨β, xi⟩ − β0) − Σ_{i=1}^N (ηi ξi + ηi* ξi*)

SLIDE 49

ε-SVR cont'd

L_D = −(1/2) Σ_{i=1}^N Σ_{j=1}^N (αi − αi*)(αj − αj*) ⟨xi, xj⟩ − ε Σ_{i=1}^N (αi + αi*) + Σ_{i=1}^N yi (αi − αi*)

Dual optimization problem:

max_α L_D   subject to   Σ_{i=1}^N (αi − αi*) = 0,   αi, αi* ∈ [0, C]

SLIDE 50

ε-SVR, support vectors

KKT conditions:

αi (ε + ξi − yi + ⟨β, xi⟩ + β0) = 0
αi* (ε + ξi* + yi − ⟨β, xi⟩ − β0) = 0
(C − αi) ξi = 0
(C − αi*) ξi* = 0

◮ if αi^(*) = 0, then ξi^(*) = 0: points inside the ε-insensitivity "tube" don't participate in β
◮ if αi^(*) > 0, then
  ◮ if ξi^(*) = 0, then xi is exactly on the border of the "tube", αi^(*) ∈ [0, C]
  ◮ if ξi^(*) > 0, then αi^(*) = C: outliers are support vectors.

SLIDE 51

SVR prediction

f(x) = Σ_{i=1}^N (αi − αi*) ⟨xi, x⟩ + β0
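
A hedged sketch with scikit-learn's SVR, which implements this ε-insensitive formulation (the data and hyperparameters are illustrative):

```python
import numpy as np
from sklearn.svm import SVR

# Noisy linear data; SVR with a linear kernel should recover roughly y = 2x + 1
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = 2.0 * X.ravel() + 1.0 + rng.normal(scale=0.2, size=100)

reg = SVR(kernel="linear", C=10.0, epsilon=0.1).fit(X, y)
print("support vectors:", reg.support_.size, "/", len(X))  # points on or outside the tube
print("prediction at x=1:", reg.predict([[1.0]]))          # expected: about 3
```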

SLIDE 52

Kernels and SVR?

Just as you would expect! Left to you as an exercise.

SLIDE 53

Why would you use SVMs?

◮ With kernels, they send the data into a higher-dimensional (sometimes infinite-dimensional) feature space, where the data is separable / linearly interpolable.
◮ They produce a sparse predictor (many coefficients are zero).
◮ They automatically maximize the margin (and thus control the generalization error?).
◮ They perform very well on complex, non-linearly separable / fittable data.

SLIDE 54

Further reading / tutorials

A Tutorial on Support Vector Machines for Pattern Recognition.
C. J. C. Burges, Data Mining and Knowledge Discovery, 2, 121–167 (1998).

A Tutorial on Support Vector Regression.
A. J. Smola and B. Schölkopf, Statistics and Computing, 14(3), 199–222 (2004).