Linear classifiers
CE-717: Machine Learning
Sharif University of Technology
M. Soleymani
Fall 2016

Topics:
- Discriminant functions
- Linear classifiers
- Perceptron
- SVM (will be covered in later lectures)
- Fisher's linear discriminant
- Multi-class classification
Many classification methods are based on discriminant functions: one function per class, with a sample assigned to the class whose discriminant is largest. The boundaries between the resulting decision regions are called decision boundaries or decision surfaces.
Boundary of the regions $\mathcal{R}_j$ and $\mathcal{R}_k$, separating samples of these two categories:
$g_j(\boldsymbol{x}) = g_k(\boldsymbol{x})$
Two-category case: a single discriminant suffices:
$g_1(\boldsymbol{x}) = g(\boldsymbol{x})$
$g_2(\boldsymbol{x}) = -g(\boldsymbol{x})$
Decide class 1 if $g(\boldsymbol{x}) > 0$, otherwise class 2.
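As a minimal sketch of this two-category rule (the weight values below are a toy example of my own, not from the slides):

```python
import numpy as np

def g(x, w, w0):
    """Linear discriminant g(x) = w^T x + w0."""
    return w @ x + w0

def classify(x, w, w0):
    """Decide class 1 if g(x) > 0, otherwise class 2."""
    return 1 if g(x, w, w0) > 0 else 2

# Toy boundary: x1 + x2 - 1 = 0
w, w0 = np.array([1.0, 1.0]), -1.0
print(classify(np.array([2.0, 2.0]), w, w0))  # g = 3 > 0   -> prints 1
print(classify(np.array([0.0, 0.0]), w, w0))  # g = -1 <= 0 -> prints 2
```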
Even when they are not optimal, linear classifiers are attractive for their simplicity: they are relatively easy to compute, and in the absence of information suggesting otherwise, a linear classifier is a sensible initial candidate.
A linear discriminant $g(\boldsymbol{x}) = \boldsymbol{w}^T\boldsymbol{x} + w_0$ defines a hyperplane decision surface $g(\boldsymbol{x}) = 0$:
The orientation of the surface is determined by the normal vector $\boldsymbol{w} = (w_1, \dots, w_d)^T$; the bias $w_0$ determines the location of the surface.
The normal distance from the origin to the decision surface is $|w_0| / \lVert\boldsymbol{w}\rVert$.
Any point $\boldsymbol{x}$ lies at signed distance $g(\boldsymbol{x}) / \lVert\boldsymbol{w}\rVert$ from the surface; $\boldsymbol{x}_\perp$ denotes its orthogonal projection onto the surface.
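This geometry can be checked numerically; a small sketch (the specific $\boldsymbol{w}$, $w_0$ are an assumed example):

```python
import numpy as np

def signed_distance(x, w, w0):
    """Signed normal distance from x to the surface w^T x + w0 = 0,
    i.e. g(x) / ||w||; positive on the side the normal w points toward."""
    return (w @ x + w0) / np.linalg.norm(w)

def project_onto_surface(x, w, w0):
    """x_perp: the orthogonal projection of x onto the decision surface."""
    return x - signed_distance(x, w, w0) * w / np.linalg.norm(w)

w, w0 = np.array([3.0, 4.0]), -5.0      # ||w|| = 5
print(abs(w0) / np.linalg.norm(w))      # distance from origin: 1.0

x = np.array([3.0, 4.0])
d = signed_distance(x, w, w0)           # (9 + 16 - 5) / 5 = 4.0
x_perp = project_onto_surface(x, w, w0)
print(w @ x_perp + w0)                  # ~0: x_perp lies on the surface
```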
Generalized linear discriminants: a quadratic boundary (e.g. one of the form $x_1^2 + x_2^2 + \dots = 0$) becomes linear in the expanded feature space $\boldsymbol{\phi}(\boldsymbol{x}) = [1, x_1, x_2, x_1^2, x_2^2, x_1 x_2]^T$.
Learning a linear classifier:
Select how to measure the prediction loss.
Based on the training set $\mathcal{D} = \{(\boldsymbol{x}^{(i)}, y^{(i)})\}_{i=1}^{n}$, a cost function $J(\boldsymbol{w})$ is defined.
Solve the resulting optimization problem to find the parameters: find the optimal $\boldsymbol{w}$, e.g. $\hat{\boldsymbol{w}} = \operatorname*{argmin}_{\boldsymbol{w}} J(\boldsymbol{w})$.
We will investigate several cost functions for the classification problem.
Least squares loss penalizes 'too correct' predictions (points that lie a long way on the correct side of the decision boundary). Least squares loss also lacks robustness to noise and outliers.
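To make these drawbacks concrete, here is a hedged sketch of a least-squares linear classifier on ±1 labels (the data and helper names are mine, not from the slides); adding a single far-away but correctly labeled point still moves the fitted solution, illustrating the sensitivity of the squared loss:

```python
import numpy as np

def fit_least_squares(X, y):
    """Minimize the sum of squared errors ||X_aug w - y||^2 over w,
    with a bias feature appended; labels y are +1 / -1."""
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    return w

def predict(X, w):
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.sign(X_aug @ w)

# Separable toy data: class -1 near the origin, class +1 near (3, 3)
X = np.array([[0., 0.], [1., 0.], [0., 1.], [3., 3.], [4., 3.], [3., 4.]])
y = np.array([-1., -1., -1., 1., 1., 1.])
w = fit_least_squares(X, y)           # classifies all six points correctly

# One extra, very 'correct' point far on the positive side still changes
# the fitted boundary: the squared loss keeps penalizing it.
X2 = np.vstack([X, [100., 100.]])
y2 = np.append(y, 1.)
w2 = fit_least_squares(X2, y2)        # w2 differs from w
```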
If the training data are linearly separable, the single-sample perceptron is guaranteed to converge to a separating solution in a finite number of updates (the perceptron convergence theorem).
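A minimal sketch of the single-sample perceptron (function and variable names are my own; the update $\boldsymbol{w} \leftarrow \boldsymbol{w} + y_i\boldsymbol{x}_i$ on misclassified samples is the standard rule):

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """Single-sample (online) perceptron for labels +1 / -1.
    Returns augmented weights [w, w0]; on linearly separable data
    it stops after a pass with no mistakes."""
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(X_aug.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X_aug, y):
            if yi * (w @ xi) <= 0:   # misclassified (or on the boundary)
                w += yi * xi         # single-sample update
                mistakes += 1
        if mistakes == 0:            # converged: all samples correct
            break
    return w

# Linearly separable toy data (AND-like labeling)
X = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
y = np.array([-1., -1., -1., 1.])
w = perceptron(X, y)
X_aug = np.hstack([X, np.ones((4, 1))])
print(np.sign(X_aug @ w))            # matches y after convergence
```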
Fisher's linear discriminant finds linear combinations of features with large ratios of between-class to within-class scatter. It predicts the class of an observation $\boldsymbol{x}$ by first projecting it onto the learned direction and then thresholding the projection.
Two-class case ($C = 2$): given training samples $\{\boldsymbol{x}^{(i)}\}_{i=1}^{N}$, the goal is to find the best direction $\boldsymbol{w}$ that we hope will enable accurate classification of the projected data.
Maximizing the separation of the projected class means alone is not sufficient: it does not consider the variances of the classes in the projected direction.
Projected class means:
$\mu_1' = \boldsymbol{w}^T\boldsymbol{\mu}_1$, where $\boldsymbol{\mu}_1 = \frac{1}{N_1}\sum_{\boldsymbol{x}^{(i)} \in \mathcal{C}_1} \boldsymbol{x}^{(i)}$
$\mu_2' = \boldsymbol{w}^T\boldsymbol{\mu}_2$, where $\boldsymbol{\mu}_2 = \frac{1}{N_2}\sum_{\boldsymbol{x}^{(i)} \in \mathcal{C}_2} \boldsymbol{x}^{(i)}$
Class scatter matrices and projected within-class scatters:
$\boldsymbol{S}_1 = \sum_{\boldsymbol{x}^{(i)} \in \mathcal{C}_1} (\boldsymbol{x}^{(i)} - \boldsymbol{\mu}_1)(\boldsymbol{x}^{(i)} - \boldsymbol{\mu}_1)^T$
$\boldsymbol{S}_2 = \sum_{\boldsymbol{x}^{(i)} \in \mathcal{C}_2} (\boldsymbol{x}^{(i)} - \boldsymbol{\mu}_2)(\boldsymbol{x}^{(i)} - \boldsymbol{\mu}_2)^T$
$s_1'^2 = \sum_{\boldsymbol{x}^{(i)} \in \mathcal{C}_1} (\boldsymbol{w}^T\boldsymbol{x}^{(i)} - \mu_1')^2$
$s_2'^2 = \sum_{\boldsymbol{x}^{(i)} \in \mathcal{C}_2} (\boldsymbol{w}^T\boldsymbol{x}^{(i)} - \mu_2')^2$
Fisher criterion:
$J(\boldsymbol{w}) = \frac{(\mu_1' - \mu_2')^2}{s_1'^2 + s_2'^2} = \frac{\boldsymbol{w}^T \boldsymbol{S}_B \boldsymbol{w}}{\boldsymbol{w}^T \boldsymbol{S}_W \boldsymbol{w}}$
with the within-class and between-class scatter matrices
$\boldsymbol{S}_W = \sum_{\boldsymbol{x}^{(i)} \in \mathcal{C}_1} (\boldsymbol{x}^{(i)} - \boldsymbol{\mu}_1)(\boldsymbol{x}^{(i)} - \boldsymbol{\mu}_1)^T + \sum_{\boldsymbol{x}^{(i)} \in \mathcal{C}_2} (\boldsymbol{x}^{(i)} - \boldsymbol{\mu}_2)(\boldsymbol{x}^{(i)} - \boldsymbol{\mu}_2)^T$
$\boldsymbol{S}_B = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)^T$
Maximizing $J(\boldsymbol{w})$ yields (up to a scale factor)
$\boldsymbol{w} = \boldsymbol{S}_W^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)$
This is the leading generalized eigenvector of $\boldsymbol{S}_W^{-1}\boldsymbol{S}_B$; since $\boldsymbol{S}_B\boldsymbol{w}$ always points along $\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2$, the eigenproblem reduces to the closed form above.
Using a threshold on $\boldsymbol{w}^T\boldsymbol{x}$, we can classify $\boldsymbol{x}$.
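The closed-form Fisher direction can be sketched in a few lines (the synthetic Gaussian data and all names below are my own assumptions, not from the slides):

```python
import numpy as np

def fisher_direction(X1, X2):
    """Fisher's discriminant direction w = S_W^{-1} (mu1 - mu2)."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - mu1).T @ (X1 - mu1)   # class-1 scatter matrix
    S2 = (X2 - mu2).T @ (X2 - mu2)   # class-2 scatter matrix
    S_W = S1 + S2                    # within-class scatter
    return np.linalg.solve(S_W, mu1 - mu2)

# Two synthetic Gaussian classes
rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 0.5, size=(50, 2))
X2 = rng.normal([2.0, 2.0], 0.5, size=(50, 2))
w = fisher_direction(X1, X2)

# Classify by thresholding w^T x; the midpoint of the projected class
# means is one simple (assumed) choice of threshold.
thresh = w @ (X1.mean(axis=0) + X2.mean(axis=0)) / 2
acc1 = np.mean(X1 @ w > thresh)      # class-1 points project above
acc2 = np.mean(X2 @ w < thresh)      # class-2 points project below
```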
Multi-class classification: a discriminant function $g_j(\boldsymbol{x})$ is found for each class $j$, and a sample is assigned to the class with the largest discriminant:
$\hat{y} = \operatorname*{argmax}_{j=1,\dots,C} g_j(\boldsymbol{x})$
i.e., decide class $j$ if $g_j(\boldsymbol{x}) > g_k(\boldsymbol{x})$ for all $k \neq j$.
Totally linearly separable: each class can be separated from all the other classes by a single hyperplane (one-vs-rest).
Pairwise linearly separable: every pair of classes can be separated by a hyperplane (one-vs-one).
Boundary of the contiguous regions $\mathcal{R}_j$ and $\mathcal{R}_k$: the set of points where $g_j(\boldsymbol{x}) = g_k(\boldsymbol{x})$.
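A small sketch of the max-discriminant decision rule (the three linear discriminants below are an arbitrary assumed example):

```python
import numpy as np

def classify_multiclass(x, W, b):
    """Assign x to the class with the largest linear discriminant
    g_j(x) = W[j] @ x + b[j]; W is (C, d), b is (C,)."""
    return int(np.argmax(W @ x + b))

# Three hypothetical classes in 2-D
W = np.array([[ 1.0, 0.0],
              [-1.0, 0.0],
              [ 0.0, 1.0]])
b = np.zeros(3)
print(classify_multiclass(np.array([2.0, 0.5]), W, b))  # g = [2, -2, 0.5] -> prints 0
```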
Per-sample residual used in the least-squares formulation: $e^{(i)} = \boldsymbol{w}^T\boldsymbol{x}^{(i)} - y^{(i)}$