Linear classifiers
CE-717: Machine Learning
Sharif University of Technology
M. Soleymani
Fall 2019

Topics
} Linear classifiers
} Perceptron
} SVM (will be covered in later lectures)
} Fisher linear discriminant
} Multi-class classification

Classification
} Linear classifiers are relatively easy to compute.
} In the absence of information suggesting otherwise, linear classifiers are an attractive first candidate.
} Even when they are not optimal, their simplicity makes them a useful baseline.
} Linear decision boundary: g(x) = w^T x + w0 = 0
} The orientation of the decision surface is determined by the normal vector w = [w1, ..., wd]
} w0 determines the location of the surface
} The normal distance from the origin to the decision surface is |w0| / ||w||
} Select how to measure the prediction loss
} Based on the training set D = {(x^(i), y^(i))}_{i=1}^n, define a cost function J(w)
} Solve the resulting optimization problem to find the parameters:
} Find optimal w* = argmin_w J(w)
} We will investigate several cost functions for the classification problem
} Least squares loss penalizes "too correct" predictions (points that lie a long way on the correct side of the decision boundary)
} Least squares loss also lacks robustness to noise and outliers
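As a concrete illustration of classification by least squares, here is a minimal sketch (the toy data, seed, and variable names are assumptions, not from the slides): a linear classifier is fit by minimizing squared error against +/-1 labels in closed form, and a new point is classified by the sign of w^T x + w0.

```python
# Minimal sketch (assumed toy setup): least-squares fit of a linear classifier
# on +/-1 labels, then classify by the sign of w^T x + w0.
import numpy as np

rng = np.random.default_rng(0)

# Two well-separated Gaussian classes in 2-D, labels y in {-1, +1}
X = np.vstack([rng.normal(-2, 1, size=(50, 2)),
               rng.normal(+2, 1, size=(50, 2))])
y = np.concatenate([-np.ones(50), np.ones(50)])

# Append a constant 1 feature so w0 is learned jointly with w
Xb = np.hstack([X, np.ones((100, 1))])

# Closed-form least-squares solution of min_w ||Xb w - y||^2
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

pred = np.sign(Xb @ w)
accuracy = (pred == y).mean()
```

Adding a single far-away but correctly labeled point to one class visibly tilts this boundary, which is exactly the "too correct" penalty and outlier sensitivity described above.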
} If the training data are linearly separable, the single-sample perceptron is guaranteed to converge to a separating hyperplane in a finite number of steps
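The single-sample rule above can be sketched as follows (the toy data and learning-rate value are assumptions for illustration): whenever a sample is misclassified, i.e. y * w^T x <= 0, the weights are nudged toward it by w <- w + eta * y * x.

```python
# Minimal sketch (assumed toy setup): single-sample perceptron updates.
import numpy as np

def perceptron(X, y, eta=1.0, max_epochs=100):
    """X: (n, d) inputs with a bias feature appended; y: labels in {-1, +1}."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:   # misclassified (or on the boundary)
                w += eta * yi * xi   # single-sample update
                mistakes += 1
        if mistakes == 0:            # converged: all samples correct
            break
    return w

# Linearly separable toy data (last feature is the constant bias term)
X = np.array([[2.0, 1.0, 1.0], [1.0, 2.0, 1.0],
              [-1.0, -2.0, 1.0], [-2.0, -1.0, 1.0]])
y = np.array([1, 1, -1, -1])
w = perceptron(X, y)
```

On separable data the loop exits once an epoch passes with no mistakes; on non-separable data it simply stops after max_epochs, which is why a cap is needed in practice.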
} Perceptron criterion, summed over the set M of misclassified samples:
J_p(w) = - sum_{(x,y) in M} y w^T x
} Its gradient:
grad J_p(w) = - sum_{(x,y) in M} y x
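Batch gradient descent on this criterion can be sketched as follows (the toy data, step size, and iteration cap are assumptions): each step adds eta times the sum of y*x over the currently misclassified set, the negative of the gradient above.

```python
# Minimal sketch (assumed toy setup): batch gradient descent on the
# perceptron criterion J_p(w) = -sum_{(x,y) in M} y w^T x.
import numpy as np

def batch_perceptron(X, y, eta=0.5, max_iters=200):
    w = np.zeros(X.shape[1])
    for _ in range(max_iters):
        margins = y * (X @ w)
        M = margins <= 0                        # misclassified (or boundary) samples
        if not M.any():
            break                               # J_p(w) = 0: all samples correct
        grad = -(y[M, None] * X[M]).sum(axis=0) # -sum_M y x
        w -= eta * grad                         # w <- w + eta * sum_M y x
    return w

X = np.array([[2.0, 1.0, 1.0], [1.0, 2.0, 1.0],
              [-1.0, -2.0, 1.0], [-2.0, -1.0, 1.0]])
y = np.array([1, 1, -1, -1])
w = batch_perceptron(X, y)
```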
} Finds linear combinations of features with large ratios of between-class to within-class scatter
} Predicts the class of an observation x by first projecting it onto the discriminant direction and then classifying it in this one-dimensional space
} Two classes (K = 2)
} Project each sample onto a line: z = w^T x
} Goal: find the best direction w that enables accurate classification of the projected samples
} A first idea: maximize the separation of the projected class means
} m_k = (1/n_k) sum_{x in C_k} x is the mean of class k; its projection is m~_k = w^T m_k
} This criterion does not consider the variances of the classes in the projected direction
} Fisher criterion:
J(w) = (w^T S_B w) / (w^T S_W w)
} S_B = (m1 - m2)(m1 - m2)^T: between-class scatter matrix
} S_W = sum_k sum_{x in C_k} (x - m_k)(x - m_k)^T: within-class scatter matrix
} Setting the derivative of J(w) to zero:
(w^T S_W w) S_B w = (w^T S_B w) S_W w
} Since S_B w = (m1 - m2)(m1 - m2)^T w is always in the direction of m1 - m2, and the scale of w is irrelevant, the solution is
w ∝ S_W^{-1} (m1 - m2)
} Using a threshold on w^T x, we can classify x
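The closed-form Fisher direction can be sketched as follows (the toy data, seed, and midpoint threshold are assumptions; the slides leave the threshold choice open): compute the class means and within-class scatter, solve for w, then threshold the projection w^T x.

```python
# Minimal sketch (assumed toy setup): two-class Fisher discriminant
# w ∝ S_W^{-1} (m1 - m2), classifying by thresholding w^T x at the
# midpoint of the projected class means (an assumed, simple choice).
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal([-2, 0], 1.0, size=(60, 2))   # class 1 samples
X2 = rng.normal([+2, 0], 1.0, size=(60, 2))   # class 2 samples

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)

# Within-class scatter: sum of the per-class scatter matrices
Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

# Fisher direction (defined only up to scale)
w = np.linalg.solve(Sw, m1 - m2)

# Threshold at the midpoint of the projected means
threshold = 0.5 * (w @ m1 + w @ m2)
pred1 = (X1 @ w > threshold)    # True -> classified as class 1
pred2 = (X2 @ w > threshold)
accuracy = (pred1.sum() + (~pred2).sum()) / 120
```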
} A function g_i(x) for each class i is found
} x is assigned to the class with the largest discriminant value:
y^ = argmax_{i=1,...,K} g_i(x)
} Totally linearly separable: each class can be separated from all the others by a single hyperplane (one-vs-rest)
} Pairwise linearly separable: each pair of classes can be separated by a hyperplane (one-vs-one)
} A function g_i(x) for each class i is found
} x is assigned to class i if g_i(x) > g_j(x) for all j ≠ i
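One way to obtain such K linear discriminants is to fit each g_i by least squares against a one-hot target, then predict by argmax; this is a minimal sketch (the one-hot least-squares fit, toy clusters, and seed are assumptions, one common construction rather than the slides' prescribed method).

```python
# Minimal sketch (assumed setup): K linear discriminants g_i(x) = w_i^T x + w_i0,
# each fit by least squares against a one-hot target; predict argmax_i g_i(x).
import numpy as np

rng = np.random.default_rng(2)
K = 3
centers = np.array([[0, 4], [-4, -2], [4, -2]])
X = np.vstack([rng.normal(c, 0.8, size=(40, 2)) for c in centers])
y = np.repeat(np.arange(K), 40)

Xb = np.hstack([X, np.ones((len(X), 1))])   # bias feature
Y = np.eye(K)[y]                            # one-hot targets, one column per class

W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)  # column i holds [w_i; w_i0]

pred = np.argmax(Xb @ W, axis=1)            # g_i(x) = (Xb W)[:, i]
accuracy = (pred == y).mean()
```

Because a single argmax over all K discriminants decides the class, this scheme avoids the ambiguous regions that arise when independent one-vs-rest or one-vs-one decisions disagree.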
} Boundary between the adjacent decision regions R_i and R_j, separating samples of these two categories:
g_i(x) = g_j(x)
} Many classification methods are based on discriminant functions
} The boundaries between regions assigned to different classes are called decision boundaries or decision surfaces
} For two classes, a single discriminant g(x) suffices:
} g_1(x) = g(x)
} g_2(x) = -g(x)
} Boundary of the contiguous regions R_i and R_j: {x | g_i(x) = g_j(x)}
} Decision regions of a linear machine are convex: if y1 and y2 both lie in region R_i, then
g_i(y1) ≥ g_j(y1) and g_i(y2) ≥ g_j(y2) for all j ≠ i
⟹ beta g_i(y1) + (1 - beta) g_i(y2) ≥ beta g_j(y1) + (1 - beta) g_j(y2) for any 0 ≤ beta ≤ 1
⟹ g_i(beta y1 + (1 - beta) y2) ≥ g_j(beta y1 + (1 - beta) y2), since each g_i is linear
} Hence beta y1 + (1 - beta) y2 also lies in R_i
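The argument above can be checked numerically; this sketch (random discriminants, seed, and sample counts are all assumptions) draws random linear discriminants and verifies that whenever two points fall in the same region, so does every tested convex combination.

```python
# Minimal sketch (assumed setup): numerically checking convexity of the
# decision regions of linear discriminants g_i(x) = w_i^T x + w_i0.
import numpy as np

rng = np.random.default_rng(3)
K, d = 4, 3
W = rng.normal(size=(K, d))   # row i is w_i
b = rng.normal(size=K)        # offsets w_i0

def assigned_class(x):
    return int(np.argmax(W @ x + b))

convex_holds = True
for _ in range(1000):
    y1, y2 = rng.normal(size=d), rng.normal(size=d)
    if assigned_class(y1) == assigned_class(y2):
        i = assigned_class(y1)
        beta = rng.uniform()
        # Convexity predicts the combination stays in region R_i
        if assigned_class(beta * y1 + (1 - beta) * y2) != i:
            convex_holds = False
```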