Linear classifiers
CE-717: Machine Learning
Sharif University of Technology
M. Soleymani
Fall 2018

Topics
• Linear classifiers
• Perceptron (SVM will be covered in later lectures)
• Fisher linear discriminant
• Multi-class classification

Classification
Why linear classifiers?
• Linear classifiers are relatively easy to compute.
• In the absence of information suggesting otherwise, linear classifiers are an attractive initial candidate.
• Even when they are not optimal, we can benefit from their simplicity.
• The orientation of the decision surface $g(\boldsymbol{x}) = \boldsymbol{w}^T\boldsymbol{x} + w_0 = 0$ is determined by the normal vector $\boldsymbol{w} = [w_1, \dots, w_d]$.
• $w_0$ determines the location of the surface.
• The normal distance from the origin to the decision surface is $|w_0| / \|\boldsymbol{w}\|$ (see the sketch below).
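To make these quantities concrete, here is a minimal NumPy sketch (an illustration, not from the slides; the particular w and w0 are arbitrary assumptions):

```python
import numpy as np

# Hypothetical linear discriminant g(x) = w^T x + w0 (values chosen for illustration).
w = np.array([2.0, 1.0])    # normal vector: fixes the orientation of the surface
w0 = -4.0                   # bias: fixes the location of the surface

def g(x):
    """Linear discriminant; g(x) = 0 exactly on the decision surface."""
    return w @ x + w0

# Normal distance from the origin to the surface g(x) = 0:
origin_dist = abs(w0) / np.linalg.norm(w)

# Signed distance of an arbitrary point from the surface (positive on w's side):
x = np.array([3.0, 1.0])
signed_dist = g(x) / np.linalg.norm(w)
print(origin_dist, signed_dist)
```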
A linear classifier on non-linear features yields a non-linear decision boundary in the original input space; e.g., the circular boundary
$x_1^2 + x_2^2 - 1 = 0$
is linear in the transformed features
$\boldsymbol{\phi}(\boldsymbol{x}) = [x_1, x_2, x_1^2, x_2^2, x_1 x_2]$.
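A minimal sketch of classifying with such a feature map (my illustration; the map phi and the weights below are assumptions chosen to reproduce the circle boundary):

```python
import numpy as np

def phi(x):
    """Quadratic feature map: lifts x = (x1, x2) to a space where the circle is linear."""
    x1, x2 = x
    return np.array([x1, x2, x1**2, x2**2, x1 * x2])

# Weights chosen so that w^T phi(x) + w0 = x1^2 + x2^2 - 1.
w = np.array([0.0, 0.0, 1.0, 1.0, 0.0])
w0 = -1.0

def predict(x):
    # A linear rule in feature space gives a non-linear boundary in input space.
    return 1 if w @ phi(x) + w0 > 0 else -1

print(predict([0.2, 0.3]))   # inside the circle  -> -1
print(predict([1.5, 0.0]))   # outside the circle -> +1
```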
• Select how to measure the prediction loss, based on the training set $\mathcal{D} = \{(\boldsymbol{x}^{(i)}, y^{(i)})\}_{i=1}^{N}$.
• Solve the resulting optimization problem to find the parameters:
• Find the optimal $\hat{\boldsymbol{w}} = \arg\min_{\boldsymbol{w}} J(\boldsymbol{w})$.
• We will investigate several cost functions for the classification problem (a generic sketch of this recipe follows).
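As a generic sketch of this recipe (an illustration under assumed choices, not the course's code; the squared loss below is just one possible plug-in):

```python
import numpy as np

def train(X, y, loss_grad, lr=0.01, epochs=100):
    """Empirical loss minimization: sum the per-sample loss gradient over the
    training set {(x_i, y_i)} and descend on the parameters w."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = sum(loss_grad(w, x, t) for x, t in zip(X, y))
        w -= lr * grad
    return w

# One assumed choice of loss: squared error, with gradient of (w^T x - t)^2 w.r.t. w.
squared_loss_grad = lambda w, x, t: 2 * (w @ x - t) * x
```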
• Least squares loss penalizes "too correct" predictions (points that lie a long way on the correct side of the decision boundary).
• Least squares loss also lacks robustness to noise (e.g., outliers can shift the boundary substantially).
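For reference, a minimal least-squares classifier sketch (my illustration): fit w on ±1 targets via the normal equations and classify by the sign of the output. Outliers still contribute large squared error even when classified correctly, which is exactly the robustness problem noted above.

```python
import numpy as np

def fit_least_squares(X, y):
    """Least-squares classification: treat labels y in {-1, +1} as regression
    targets and solve min_w ||X_b w - y||^2 (normal equations via lstsq)."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend a bias feature
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict(w, X):
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return np.sign(Xb @ w)
```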
Perceptron
• If the training data are linearly separable, the single-sample perceptron is guaranteed to converge to a separating hyperplane in a finite number of steps.
Perceptron criterion, summed over the set $\mathcal{M}$ of misclassified samples:
$J_P(\boldsymbol{w}) = \sum_{(\boldsymbol{x}, y) \in \mathcal{M}} -y \, \boldsymbol{w}^T \boldsymbol{x}$
$\nabla_{\boldsymbol{w}} J_P(\boldsymbol{w}) = \sum_{(\boldsymbol{x}, y) \in \mathcal{M}} -y \, \boldsymbol{x}$
Gradient-descent update: $\boldsymbol{w} \leftarrow \boldsymbol{w} + \eta \sum_{(\boldsymbol{x}, y) \in \mathcal{M}} y \, \boldsymbol{x}$ (see the sketch below).
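A minimal single-sample perceptron sketch consistent with the update rule above (the learning rate, epoch cap, and stopping test are my assumptions):

```python
import numpy as np

def perceptron(X, y, lr=1.0, max_epochs=1000):
    """Single-sample perceptron: for each misclassified (x, y), update
    w <- w + lr * y * x. If the data (with bias feature) are linearly
    separable, this stops after finitely many corrections."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # absorb the bias into w
    w = np.zeros(Xb.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x, t in zip(Xb, y):          # labels t are in {-1, +1}
            if t * (w @ x) <= 0:         # misclassified (or on the boundary)
                w += lr * t * x
                errors += 1
        if errors == 0:                  # a separating hyperplane was found
            break
    return w
```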
Fisher linear discriminant
• Finds linear combinations of features with large ratios of between-class to within-class scatter.
• Predicts the class of an observation $\boldsymbol{x}$ by first projecting it onto the learned direction $\boldsymbol{w}$ and then thresholding the resulting scalar.
• $C = 2$ classes
• Training samples $\{(\boldsymbol{x}^{(i)}, y^{(i)})\}_{i=1}^{N}$
• Goal: find the best direction $\boldsymbol{w}$ that we hope will enable accurate classification of the projections $\boldsymbol{w}^T \boldsymbol{x}$.
• Maximizes the separation of the projected class means.
• It does not consider the variances of the classes in the projected direction.
$\mu_1 = \boldsymbol{w}^T \boldsymbol{m}_1, \qquad \boldsymbol{m}_1 = \frac{1}{N_1} \sum_{\boldsymbol{x}^{(i)} \in \mathcal{C}_1} \boldsymbol{x}^{(i)}$
$\mu_2 = \boldsymbol{w}^T \boldsymbol{m}_2, \qquad \boldsymbol{m}_2 = \frac{1}{N_2} \sum_{\boldsymbol{x}^{(i)} \in \mathcal{C}_2} \boldsymbol{x}^{(i)}$
The within-class scatter of the projected samples:
$s_1^2 = \sum_{\boldsymbol{x}^{(i)} \in \mathcal{C}_1} (\boldsymbol{w}^T \boldsymbol{x}^{(i)} - \mu_1)^2$
$s_2^2 = \sum_{\boldsymbol{x}^{(i)} \in \mathcal{C}_2} (\boldsymbol{w}^T \boldsymbol{x}^{(i)} - \mu_2)^2$
Fisher criterion: maximize
$J(\boldsymbol{w}) = \frac{(\mu_1 - \mu_2)^2}{s_1^2 + s_2^2} = \frac{\boldsymbol{w}^T \boldsymbol{S}_B \boldsymbol{w}}{\boldsymbol{w}^T \boldsymbol{S}_W \boldsymbol{w}}$
where the between-class and within-class scatter matrices are
$\boldsymbol{S}_B = (\boldsymbol{m}_1 - \boldsymbol{m}_2)(\boldsymbol{m}_1 - \boldsymbol{m}_2)^T$
$\boldsymbol{S}_W = \sum_{k=1}^{2} \sum_{\boldsymbol{x}^{(i)} \in \mathcal{C}_k} (\boldsymbol{x}^{(i)} - \boldsymbol{m}_k)(\boldsymbol{x}^{(i)} - \boldsymbol{m}_k)^T$
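A small sketch of evaluating this criterion (my illustration; class samples are passed as row matrices X1, X2):

```python
import numpy as np

def fisher_criterion(w, X1, X2):
    """J(w) = (w^T S_B w) / (w^T S_W w) for two classes with samples
    in the rows of X1 and X2."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    d = (m1 - m2).reshape(-1, 1)
    S_B = d @ d.T                                   # between-class scatter
    S_W = sum((X - m).T @ (X - m)                   # within-class scatter
              for X, m in ((X1, m1), (X2, m2)))
    return float(w @ S_B @ w) / float(w @ S_W @ w)
```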
Maximizing $J(\boldsymbol{w})$ leads to the generalized eigenvalue problem $\boldsymbol{S}_B \boldsymbol{w} = \lambda \boldsymbol{S}_W \boldsymbol{w}$, i.e., $\boldsymbol{S}_W^{-1} \boldsymbol{S}_B \boldsymbol{w} = \lambda \boldsymbol{w}$. Since $\boldsymbol{S}_B \boldsymbol{w}$ always points in the direction of $\boldsymbol{m}_1 - \boldsymbol{m}_2$, the solution is
$\boldsymbol{w} \propto \boldsymbol{S}_W^{-1} (\boldsymbol{m}_1 - \boldsymbol{m}_2)$
• Using a threshold on $\boldsymbol{w}^T \boldsymbol{x}$, we can classify $\boldsymbol{x}$ (see the sketch below).
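A minimal sketch of this closed-form solution (illustration only; the pseudo-inverse and the midpoint threshold are my assumptions, not prescribed by the slides):

```python
import numpy as np

def fisher_direction(X1, X2):
    """Fisher solution w ∝ S_W^{-1} (m1 - m2) for two classes."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S_W = sum((X - m).T @ (X - m) for X, m in ((X1, m1), (X2, m2)))
    return np.linalg.pinv(S_W) @ (m1 - m2)   # pinv guards against singular S_W

def classify(x, w, m1, m2):
    # One simple (assumed) threshold: the midpoint of the projected class means.
    threshold = w @ (m1 + m2) / 2
    return 1 if w @ x > threshold else 2
```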
Multi-class classification
• Totally linearly separable: each class can be separated from the union of all the other classes by a single hyperplane (one-vs-rest).
• Pairwise linearly separable: every pair of classes can be separated by a hyperplane (one-vs-one); a sketch of the one-vs-rest construction follows.
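A sketch of the one-vs-rest construction behind these notions (my illustration; the least-squares trainer is just one assumed plug-in binary fitter):

```python
import numpy as np

def one_vs_rest_train(X, y, n_classes, fit):
    """One-vs-rest: train one binary linear classifier per class,
    treating that class as +1 and all the others as -1."""
    return [fit(X, np.where(y == k, 1.0, -1.0)) for k in range(n_classes)]

def one_vs_rest_predict(W, x):
    # Assign x to the class whose discriminant w_k^T x is largest.
    return int(np.argmax([w @ x for w in W]))

# Any binary trainer works as 'fit'; here, an assumed least-squares fit
# (X is expected to already include a bias column if one is desired):
fit_ls = lambda X, t: np.linalg.lstsq(X, t, rcond=None)[0]
```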
Discriminant functions
• Many classification methods are based on discriminant functions $g_i(\boldsymbol{x})$, one per class.
• The surfaces separating the resulting decision regions are called decision boundaries or decision surfaces.
• Boundary between regions $\mathcal{R}_i$ and $\mathcal{R}_j$, separating samples of these two categories: $g_i(\boldsymbol{x}) = g_j(\boldsymbol{x})$
• For two classes, a single discriminant $g(\boldsymbol{x})$ suffices:
$g_1(\boldsymbol{x}) = g(\boldsymbol{x})$
$g_2(\boldsymbol{x}) = -g(\boldsymbol{x})$
• A discriminant function $g_i(\boldsymbol{x})$ is found for each class $i$.
• $\boldsymbol{x}$ is assigned to class $\hat{y} = \arg\max_{i = 1, \dots, C} g_i(\boldsymbol{x})$, i.e., to class $i$ when $g_i(\boldsymbol{x}) > g_j(\boldsymbol{x})$ for all $j \neq i$ (see the sketch below).
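As a tiny sketch of this rule (my illustration; the linear form of the per-class discriminants is an assumption here):

```python
import numpy as np

def decide(x, W, b):
    """Argmax decision rule over per-class discriminants g_i(x) = w_i^T x + b_i.
    W holds one weight row per class; b holds the per-class biases."""
    g = W @ x + b            # vector of values g_1(x), ..., g_C(x)
    return int(np.argmax(g))
```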
• Boundary between the contiguous regions $\mathcal{R}_i$ and $\mathcal{R}_j$: the set of $\boldsymbol{x}$ with $g_i(\boldsymbol{x}) = g_j(\boldsymbol{x})$
$\hat{y} = \arg\max_{i} g_i(\boldsymbol{x})$