Multi-class Support Vector Machine
Rizal Zaini Ahmad Fathony
November 10, 2016
University of Illinois at Chicago
Introduction

The Support Vector Machine (SVM) is a classification algorithm developed from a geometric view of the classification problem: among the hyperplanes that separate two classes, it selects the one with the largest margin to the nearest training points.
Standard SVM Formulation

Given training examples (x_i, y_i), i = 1, …, n, with labels y_i ∈ {−1, +1}, the hard-margin SVM finds the maximum-margin separating hyperplane:

    min_{w,b}  (1/2)‖w‖²
    subject to  y_i(w·x_i + b) ≥ 1,  i = 1, …, n.

When the classes are not linearly separable, slack variables ξ_i relax the constraints, giving the soft-margin SVM:

    min_{w,b,ξ}  (1/2)‖w‖² + C Σ_{i=1}^n ξ_i
    subject to  y_i(w·x_i + b) ≥ 1 − ξ_i,  ξ_i ≥ 0,  i = 1, …, n.

Eliminating ξ yields the equivalent unconstrained hinge-loss form:

    min_{w,b}  (1/2)‖w‖² + C Σ_{i=1}^n [1 − y_i(w·x_i + b)]₊ ,

where [z]₊ = max(z, 0).
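The unconstrained hinge-loss form can be minimized directly by subgradient descent. A minimal sketch (illustrative only, not the solver used in the talk; all names and the toy data are mine):

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Soft-margin linear SVM via subgradient descent on
    (1/2)||w||^2 + C * sum_i [1 - y_i(w.x_i + b)]_+ ; labels y in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                       # examples violating the margin
        grad_w = w - C * (y[viol] @ X[viol])     # subgradient of the objective in w
        grad_b = -C * y[viol].sum()              # subgradient in b
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# toy linearly separable data
X = np.array([[2.0, 2.0], [1.5, 2.5], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1, 1, -1, -1])
w, b = train_linear_svm(X, y)
pred = np.sign(X @ w + b)
```

On this separable toy set the learned hyperplane classifies all four training points correctly.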
Multi-class SVM Formulations

The binary SVM does not extend directly to problems with k > 2 classes. Two families of extensions are considered: reductions to binary subproblems (one-versus-one and one-versus-all) and all-in-one machines that train all k class potentials in a single optimization problem.
One Versus One

One-versus-one trains a binary SVM for every pair of classes, k(k−1)/2 classifiers in total, each using only the examples of its two classes. At prediction time every pairwise classifier casts a vote for one of its two classes, and the class with the most votes is predicted.
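The voting scheme can be sketched as follows. The pairwise learner here is a simple hinge-subgradient stand-in for a binary SVM, and all function names and the toy data are illustrative assumptions:

```python
import numpy as np
from itertools import combinations

def train_binary(X, y, lr=0.05, epochs=300):
    """Stand-in linear binary learner (hinge subgradient), y in {-1, +1}.
    The bias is absorbed as an extra all-ones feature."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        viol = y * (Xb @ w) < 1                      # margin violators
        w -= lr * (0.01 * w - (y[viol] @ Xb[viol]) / len(X))
    return w

def ovo_fit(X, y, k):
    """One classifier per pair of classes: k*(k-1)/2 in total."""
    models = {}
    for a, b in combinations(range(k), 2):
        mask = (y == a) | (y == b)
        ya = np.where(y[mask] == a, 1, -1)           # class a -> +1, class b -> -1
        models[(a, b)] = train_binary(X[mask], ya)
    return models

def ovo_predict(models, X, k):
    """Each pairwise classifier votes; the most-voted class wins."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    votes = np.zeros((len(X), k), dtype=int)
    for (a, b), w in models.items():
        winner = np.where(Xb @ w > 0, a, b)
        votes[np.arange(len(X)), winner] += 1
    return votes.argmax(axis=1)

# toy 3-class data: three well-separated clusters
X = np.array([[0., 4.], [0., 5.], [4., 0.], [5., 0.], [-4., -4.], [-5., -5.]])
y = np.array([0, 0, 1, 1, 2, 2])
models = ovo_fit(X, y, 3)
```

Note that a pairwise classifier also scores points from classes it never saw; such votes are simply outvoted by the relevant pairs.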
One Versus All

One-versus-all trains k binary SVMs, where classifier f_a separates class a from the union of all other classes. The prediction is the class with the highest score:

    ŷ = argmax_{a ∈ [1,k]} f_a(x).
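The argmax scheme can be sketched in the same style; again the binary learner is a hinge-subgradient stand-in and all names are illustrative:

```python
import numpy as np

def train_binary(X, y, lr=0.05, epochs=300):
    # stand-in linear binary learner (hinge subgradient), y in {-1, +1}
    Xb = np.hstack([X, np.ones((len(X), 1))])    # absorb the bias term
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        viol = y * (Xb @ w) < 1
        w -= lr * (0.01 * w - (y[viol] @ Xb[viol]) / len(X))
    return w

def ova_fit(X, y, k):
    # one scorer f_a per class: class a (+1) versus the rest (-1)
    return [train_binary(X, np.where(y == a, 1, -1)) for a in range(k)]

def ova_predict(models, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    scores = np.column_stack([Xb @ w for w in models])   # n x k score matrix
    return scores.argmax(axis=1)                         # y_hat = argmax_a f_a(x)

X = np.array([[0., 4.], [0., 5.], [4., 0.], [5., 0.], [-4., -4.], [-5., -5.]])
y = np.array([0, 0, 1, 1, 2, 2])
models = ova_fit(X, y, 3)
```

Unlike one-versus-one, each of the k scorers sees all n examples, with a heavily imbalanced ±1 relabeling.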
All-in-one Machine Formulations

Instead of combining independently trained binary classifiers, all-in-one machines learn the k class potentials f_1, …, f_k jointly, with a single regularized objective over all n training examples.
Weston and Watkins (WW) Formulation

The WW machine penalizes every class whose score comes within the margin of the correct class score:

    min_{w,b,ξ}  (1/2) Σ_{j=1}^k ‖w_j‖² + C Σ_{i=1}^n Σ_{j≠y_i} ξ_{i,j}
    subject to  (w_{y_i}·x_i + b_{y_i}) − (w_j·x_i + b_j) ≥ 1 − ξ_{i,j},  ξ_{i,j} ≥ 0,  for all j ≠ y_i.

Writing f_j(x) = w_j·x + b_j, the corresponding surrogate loss is Σ_{j≠y} [1 − (f_y(x) − f_j(x))]₊ : one hinge term per incorrect class.
Crammer and Singer (CS) Formulation

The CS machine penalizes only the single worst margin violation for each example:

    min_{w,b,ξ}  (1/2) Σ_{j=1}^k ‖w_j‖² + C Σ_{i=1}^n ξ_i
    subject to  (w_{y_i}·x_i + b_{y_i}) − (w_j·x_i + b_j) ≥ 1 − ξ_i,  ξ_i ≥ 0,  for all j ≠ y_i.

Its surrogate loss is [1 − (f_y(x) − max_{j≠y} f_j(x))]₊ = max_{j≠y} [1 − (f_y(x) − f_j(x))]₊.
Lee, Lin, and Wahba (LLW) Formulation

The LLW machine penalizes the scores of the incorrect classes directly, under a sum-to-zero constraint on the k potentials:

    min_{w,b,ξ}  (1/2) Σ_{j=1}^k ‖w_j‖² + C Σ_{i=1}^n Σ_{j≠y_i} ξ_{i,j}
    subject to  w_j·x_i + b_j ≤ −1 + ξ_{i,j},  ξ_{i,j} ≥ 0,  for all j ≠ y_i,
                Σ_{j=1}^k f_j(x) = 0.

Its surrogate loss is Σ_{j≠y} [1 + f_j(x)]₊ , which depends only on the scores of the incorrect classes.
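The three surrogate losses can be compared side by side on a single score vector. A small sketch (function names and the example scores are mine):

```python
import numpy as np

def ww_loss(f, y):
    # Weston-Watkins: sum over j != y of [1 - (f_y - f_j)]_+
    return np.maximum(1.0 - (f[y] - np.delete(f, y)), 0).sum()

def cs_loss(f, y):
    # Crammer-Singer: only the largest violation, [1 - (f_y - max_{j!=y} f_j)]_+
    return max(0.0, 1.0 - (f[y] - np.delete(f, y).max()))

def llw_loss(f, y):
    # Lee-Lin-Wahba: sum over j != y of [1 + f_j]_+ (f should sum to zero)
    return np.maximum(1.0 + np.delete(f, y), 0).sum()

f = np.array([1.5, 0.5, -2.0])   # k = 3 scores, summing to zero
```

With y = 0 the correct class leads by exactly the margin, so WW and CS vanish, while LLW still charges 1.5 for the positive score f_2 = 0.5; with y = 1 all three are positive.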
Fisher Consistency in Binary Classification

A surrogate loss is Fisher consistent if the minimizer of its expected value over all measurable functions implements the Bayes rule. In binary classification the Bayes rule predicts sign(P(Y = 1 | x) − 1/2). Lin¹ showed that the binary hinge loss is Fisher consistent: the minimizer of E[[1 − Y f(X)]₊ | X = x] is f*(x) = sign(P(Y = 1 | x) − 1/2).

¹ Lin, Y. Support vector machines and the Bayes rule in classification.
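Lin's result can be checked numerically by minimizing the conditional hinge risk over a grid; the value P(Y = 1 | x) = 0.7 is my example:

```python
import numpy as np

def binary_hinge_risk(f, p):
    # E[[1 - Y f]_+ | X = x] = p*[1 - f]_+ + (1 - p)*[1 + f]_+
    return p * max(0.0, 1 - f) + (1 - p) * max(0.0, 1 + f)

p = 0.7                                   # assumed P(Y = 1 | x)
grid = np.linspace(-2, 2, 401)
f_star = grid[np.argmin([binary_hinge_risk(f, p) for f in grid])]
# the grid minimizer sits at +1, matching the Bayes rule sign(p - 1/2)
```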
Fisher Consistency in Multi-class Classification

Let P_j(x) = P(Y = j | X = x). A multi-class surrogate loss is Fisher consistent if the population minimizer f* = [f*_1(x), ···, f*_k(x)]ᵀ of the expected loss, typically subject to the sum-to-zero constraint Σ_{j=1}^k f_j(x) = 0, recovers the Bayes rule:

    ŷ(x) = argmax_j f*_j(x) = argmax_j P_j(x).
Fisher Consistency of the All-in-One Machine SVMs
Inconsistency of the Naive Formulation

The naive formulation's conditional expected risk is Σ_{l=1}^k P_l(x) [1 − f_l(x)]₊ . Its minimizer subject to Σ_{j=1}^k f_j(x) = 0 satisfies:

    f*_j(x) = −(k − 1) if j = argmin_j P_j(x), and 1 otherwise.

The minimizer therefore identifies only the least probable class: the remaining k − 1 coordinates all tie at 1, so argmax_j f*_j(x) does not recover argmax_j P_j(x). The naive formulation is not Fisher consistent.
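The degeneracy can be exhibited numerically. Under P = (0.5, 0.3, 0.2) (my example), the conditional risk prefers a score vector that ties the top two classes over one that singles out the most probable class:

```python
import numpy as np

def naive_risk(f, P):
    # conditional risk of the naive loss: sum_l P_l * [1 - f_l]_+
    return float((P * np.maximum(1 - f, 0)).sum())

P = np.array([0.5, 0.3, 0.2])              # class 1 is most probable
tied = np.array([1.0, 1.0, -2.0])          # shape of the true minimizer: top two tie
revealing = np.array([2.0, -1.0, -1.0])    # would identify class 1 uniquely
# both sum to zero, but the tied vector attains strictly lower risk
```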
Consistency of the LLW Formulation

The conditional expected risk of the LLW loss is

    E[Σ_{j≠Y} [1 + f_j(X)]₊ | X = x] = Σ_{l=1}^k P_l(x) Σ_{j≠l} [1 + f_j(x)]₊ .

Its minimizer subject to Σ_{j=1}^k f_j(x) = 0 satisfies:

    f*_j(x) = k − 1 if j = argmax_j P_j(x), and −1 otherwise.

The unique largest coordinate of f* is exactly the most probable class, so argmax_j f*_j(x) = argmax_j P_j(x): the LLW formulation is Fisher consistent.
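A brute-force search over sum-to-zero score vectors confirms the LLW minimizer; P = (0.5, 0.3, 0.2) is my example:

```python
import numpy as np

def llw_risk(f, P):
    # conditional risk: sum_l P_l * sum_{j != l} [1 + f_j]_+
    return sum(P[l] * np.maximum(1 + np.delete(f, l), 0).sum() for l in range(len(P)))

P = np.array([0.5, 0.3, 0.2])
grid = np.linspace(-3, 3, 61)                 # step 0.1
best, best_risk = None, np.inf
for f1 in grid:
    for f2 in grid:
        f = np.array([f1, f2, -f1 - f2])      # enforce the sum-to-zero constraint
        r = llw_risk(f, P)
        if r < best_risk:
            best, best_risk = f, r
# the grid minimizer is (k-1, -1, -1) = (2, -1, -1): argmax = most probable class
```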
Inconsistency of the WW Formulation

The conditional expected risk of the WW loss is

    E[Σ_{j≠Y} [1 − (f_Y(X) − f_j(X))]₊ | X = x] = Σ_{l=1}^k P_l(x) Σ_{j≠l} [1 − (f_l(x) − f_j(x))]₊ .

Consider k = 3 with 1/2 > P_1 > P_2 > P_3. Depending on the values of the P_l, the minimizer f* = (f*_1, f*_2, f*_3) of the conditional risk is:

  • any f* satisfying f*_1 ≥ f*_2 ≥ f*_3 and f*_1 − f*_3 = 1; or
  • any f* satisfying f*_1 ≥ f*_2 ≥ f*_3, f*_1 = f*_2, and f*_2 − f*_3 = 1; or
  • any f* satisfying f*_1 ≥ f*_2 ≥ f*_3, f*_2 = f*_3, and f*_1 − f*_2 = 1.

In the second case f*_1 = f*_2, so the argmax of f* is not unique and need not be the most probable class. When no class probability exceeds 1/2, the WW formulation is therefore not Fisher consistent.
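The WW failure mode can also be exhibited numerically. With P = (0.4, 0.35, 0.25) (my example; no class above 1/2), a score vector that ties the top two classes attains lower conditional risk than one that separates them:

```python
import numpy as np

def ww_risk(f, P):
    # conditional risk: sum_l P_l * sum_{j != l} [1 - (f_l - f_j)]_+
    return sum(P[l] * np.maximum(1 - (f[l] - np.delete(f, l)), 0).sum()
               for l in range(len(P)))

P = np.array([0.4, 0.35, 0.25])
tied = np.array([1/3, 1/3, -2/3])          # f_1 = f_2: argmax is ambiguous
separated = np.array([2/3, -1/3, -1/3])    # would uniquely pick class 1
# risk(tied) = 1.75 < risk(separated) = 1.80
```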
Inconsistency of the CS Formulation

Let g(f(x), l) denote the vector of margins (f_l(x) − f_j(x))_{j≠l}. The conditional expected risk of the CS loss is Σ_{l=1}^k P_l(x) [1 − min g(f(x), l)]₊ . Its minimizer subject to Σ_{j=1}^k f_j(x) = 0 satisfies the following properties:

  • if max_j P_j(x) > 1/2, then argmax_j f*_j = argmax_j P_j and min g(f*, argmax_j f*_j) = 1;
  • if max_j P_j(x) < 1/2, then f* = 0.

Consistency thus cannot be guaranteed when no class is dominant: if max_j P_j(x) < 1/2 for a given x, then f*(x) = 0 and the prediction is an arbitrary tie among all k classes.
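The two regimes of the CS minimizer can be probed directly; the probability vectors are my examples:

```python
import numpy as np

def cs_risk(f, P):
    # conditional risk: sum_l P_l * [1 - min_{j != l} (f_l - f_j)]_+
    return sum(P[l] * max(0.0, 1 - (f[l] - np.delete(f, l)).min())
               for l in range(len(P)))

zero = np.zeros(3)
sep = np.array([2/3, -1/3, -1/3])            # separates class 1 with margin 1
P_dominant = np.array([0.6, 0.25, 0.15])     # max P > 1/2: separating wins
P_no_dominant = np.array([0.4, 0.35, 0.25])  # max P < 1/2: f = 0 wins
```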
Modification of the Inconsistent Formulations

The inconsistent formulations can be repaired by adding constraints that exclude their degenerate minimizers.

Modification of the Naive Formulation

Adding the lower-bound constraints f_l(x) ≥ −1/(k−1) for all l ∈ [1, k] to the naive formulation (together with Σ_{j=1}^k f_j(x) = 0) restores consistency: the minimizer becomes

    f*_j(x) = 1 if j = argmax_j P_j(x), and −1/(k−1) otherwise,

which uniquely identifies the most probable class.
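A grid check confirms the modified naive minimizer; P = (0.5, 0.3, 0.2) is my example:

```python
import numpy as np

def naive_risk(f, P):
    # conditional risk of the naive loss: sum_l P_l * [1 - f_l]_+
    return float((P * np.maximum(1 - f, 0)).sum())

P = np.array([0.5, 0.3, 0.2])
k = len(P)
lb = -1.0 / (k - 1)                          # added lower bound f_l >= -1/(k-1)
grid = np.linspace(lb, 1.5, 41)              # step 0.05
best, best_risk = None, np.inf
for f1 in grid:
    for f2 in grid:
        f3 = -f1 - f2                        # sum-to-zero constraint
        if f3 < lb - 1e-9:                   # enforce the new lower bound on f3
            continue
        f = np.array([f1, f2, f3])
        r = naive_risk(f, P)
        if r < best_risk:
            best, best_risk = f, r
# constrained grid minimizer: (1, -1/2, -1/2), uniquely marking class 1
```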
Modification of the WW Formulation

The WW formulation is modified in the same spirit: additional constraints on the k potentials f_1, …, f_k, imposed across the n training examples, eliminate the tied minimizers.
Modification of the CS Formulation

The CS formulation is modified analogously, so that its population minimizer identifies the most probable class even when max_j P_j(x) < 1/2.
Experiments
Artificial Benchmark Problem

An artificial multi-class benchmark is constructed from points generated along the curve (cos(t·π/10), sin(t·π/10)) with t ∈ [0, 20], i.e. a full circle traversed once.

[Figures: the generated data and the decision regions learned by each formulation on the benchmark.]
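The generating curve can be sampled as follows; the class-assignment rule (equal arcs) and all parameter names are my assumptions, since the slides only specify the curve:

```python
import numpy as np

def make_circle_data(n_per_class=50, k=3, noise=0.1, seed=0):
    """Sample points near the circle (cos(t*pi/10), sin(t*pi/10)), t in [0, 20),
    assigning labels by k contiguous arc segments (an assumed labeling rule)."""
    rng = np.random.default_rng(seed)
    t = rng.uniform(0, 20, size=n_per_class * k)
    X = np.column_stack([np.cos(t * np.pi / 10), np.sin(t * np.pi / 10)])
    X += rng.normal(scale=noise, size=X.shape)      # perturb off the circle
    y = (t / (20 / k)).astype(int) % k              # k equal arcs -> k classes
    return X, y
```

With noise = 0 every sample lies exactly on the unit circle.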
The benchmark exposes the CS degeneracy: recall that, subject to Σ_{j=1}^k f_j(x) = 0, the CS minimizer satisfies argmax_j f*_j = argmax_j P_j with min g(f*, argmax_j f*_j) = 1 when max_j P_j(x) > 1/2, but collapses to f*(x) = 0 wherever max_j P_j(x) < 1/2.
Empirical Comparison

Results on benchmark datasets (mean ± standard deviation, in %):

Dataset     | OVA            | WW             | CS             | LLW
------------|----------------|----------------|----------------|----------------
Covertype   | 50.59 (±5.49)  | 70.55 (±0.09)  | 45.73 (±5.88)  | 21.87 (±23.19)
Letter      | 63.69 (±0.48)  | 69.39 (±0.63)  | 76.59 (±0.61)  | 12.78 (±0.40)
News-20     | 85.36 (±0.32)  | 85.13 (±0.15)  | 85.17 (±0.32)  | 86.71 (±0.39)
Sector      | 94.53 (±0.22)  | 94.10 (±0.33)  | 94.80 (±0.29)  | 94.82 (±0.28)
Usps        | 94.50 (±0.39)  | 94.46 (±0.57)  | 95.26 (±0.46)  | 78.18 (±5.27)
Abalone     | 18.95 (±0.86)  | 21.70 (±1.30)  | 14.12 (±1.64)  | 16.56 (±1.17)
Car         | 71.69 (±1.73)  | 73.76 (±1.68)  | 73.15 (±2.02)  | 65.34 (±12.17)
Glass       | 56.98 (±6.44)  | 61.93 (±6.63)  | 61.93 (±6.04)  | 46.78 (±6.77)
Iris        | 91.11 (±4.85)  | 95.88 (±1.71)  | 91.76 (±7.18)  | 74.65 (±7.52)
—           | 95.98 (±0.60)  | 96.03 (±0.37)  | 96.42 (±0.37)  | 73.56 (±2.11)
Page Blocks | 70.44 (±21.20) | 91.14 (±5.41)  | 94.20 (±2.34)  | 93.22 (±1.02)
Sat         | 75.04 (±0.96)  | 77.40 (±3.00)  | 66.87 (±9.90)  | 51.47 (±9.01)
Segment     | 92.54 (±0.75)  | 92.43 (±2.13)  | 92.43 (±2.13)  | 74.50 (±1.32)
Soy Bean    | 90.65 (±3.03)  | 87.75 (±3.16)  | 83.49 (±5.80)  | 77.95 (±9.97)
Vehicle     | 52.02 (±11.98) | 72.75 (±4.13)  | 72.75 (±4.13)  | 63.21 (±10.63)
Red wine    | 53.38 (±2.63)  | 58.37 (±1.69)  | 55.61 (±2.47)  | 57.26 (±2.02)
White wine  | 50.73 (±1.27)  | 51.78 (±1.24)  | 50.85 (±1.12)  | 46.44 (±1.74)

Notably, LLW, the only Fisher-consistent all-in-one formulation, often performs worst (e.g. Letter, Covertype, Usps), while the inconsistent WW and CS formulations are competitive across most datasets.
Conclusions

• The binary SVM extends to multi-class classification through binary reductions (one-versus-one, one-versus-all) and through all-in-one machines (WW, CS, LLW).
• Among the all-in-one formulations, only LLW is Fisher consistent²; the naive, WW, and CS formulations are not, although they can be modified to restore consistency.
• Empirically³, Fisher consistency does not translate into practical performance: LLW is often the worst performer, while the inconsistent WW and CS formulations remain competitive.

² Liu, Y. Fisher consistency of multicategory support vector machines. In International Conference on Artificial Intelligence and Statistics.
³ Dogan, U. et al. A Unified View on Multi-class Support Vector Classification.