CS 6316 Machine Learning
Linear Predictors
Yangfeng Ji
Department of Computer Science University of Virginia
Overview
1. Review: Linear Functions
2. Perceptron
3. Logistic Regression
4. Linear Regression

Review: Linear Functions
[Figure: the input plane with axes x1 and x2, showing the example point x = (1, 1)]
$\mathcal{H}_{\text{half}} = \{\operatorname{sign}(\langle \mathbf{w}, \mathbf{x} \rangle) : \mathbf{w} \in \mathbb{R}^d\}$
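A minimal Python sketch of a single halfspace predictor $\operatorname{sign}(\langle \mathbf{w}, \mathbf{x} \rangle)$ follows, assuming NumPy arrays for $\mathbf{w}$ and $\mathbf{x}$. The weight vector is a hypothetical example and does not come from the slides. The point x = (1, 1) mirrors the example point in the figure. As an assumed convention, a point that lies exactly on the boundary receives the label +1.

```python
import numpy as np

def halfspace_predict(w: np.ndarray, x: np.ndarray) -> int:
    """Predict a label in {-1, +1} with the halfspace rule sign(<w, x>).

    Assumption: points exactly on the boundary (<w, x> = 0) get label +1.
    """
    return 1 if np.dot(w, x) >= 0 else -1

# Hypothetical weight vector, for illustration only.
w = np.array([1.0, -2.0])
x = np.array([1.0, 1.0])
print(np.dot(w, x), halfspace_predict(w, x))  # -1.0 -1
```

Each choice of $\mathbf{w}$ gives one member of $\mathcal{H}_{\text{half}}$. Changing `w` moves the decision boundary $\langle \mathbf{w}, \mathbf{x} \rangle = 0$.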
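[Figure: a linear decision boundary in the (x1, x2) plane; the region with $\langle \mathbf{w}, \mathbf{x} \rangle > 0$ is labeled + and the region with $\langle \mathbf{w}, \mathbf{x} \rangle < 0$ is labeled −]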
1: Input: $S = \{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_m, y_m)\}$
2: Initialize $\mathbf{w}^{(0)} = (0, \ldots, 0)$
3: for $t = 1, 2, \cdots, T$ do
4–7: …
8: end for
9: Output: $\mathbf{w}^{(T)}$
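The loop body of the Perceptron pseudocode is not shown here. The sketch below therefore assumes the standard Perceptron mistake-driven update: whenever $y_i \langle \mathbf{w}, \mathbf{x}_i \rangle \le 0$, set $\mathbf{w} \leftarrow \mathbf{w} + y_i \mathbf{x}_i$. It is a hedged illustration, not the slide's exact lines 4–7. In this sketch, each outer iteration sweeps over $S$ once and stops early once no mistakes remain. The function name `perceptron`, the parameter `max_epochs`, and the toy data are all hypothetical.

```python
import numpy as np

def perceptron(X: np.ndarray, y: np.ndarray, max_epochs: int = 100) -> np.ndarray:
    """Batch Perceptron sketch (assumed standard update, not the slide's exact body).

    X: (m, d) array of inputs; y: (m,) array of labels in {-1, +1}.
    Returns the weights after a mistake-free sweep or after max_epochs sweeps.
    """
    m, d = X.shape
    w = np.zeros(d)                          # w^(0) = (0, ..., 0)
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(m):
            if y[i] * np.dot(w, X[i]) <= 0:  # misclassified or on the boundary
                w = w + y[i] * X[i]          # move w toward y_i x_i
                mistakes += 1
        if mistakes == 0:                    # every example has y_i <w, x_i> > 0
            break
    return w

# Toy separable data (hypothetical); the constant last feature acts as a bias.
X = np.array([[2.0, 1.0, 1.0], [1.0, 3.0, 1.0], [-1.0, -2.0, 1.0], [-2.0, -1.0, 1.0]])
y = np.array([1, 1, -1, -1])
w = perceptron(X, y)
print(w, np.sign(X @ w))  # the predicted signs match y on this data
```

The update is applied only on mistakes. For this reason, separable data eventually produces a sweep with no updates, and the loop then exits.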
… $\{(\mathbf{x}_i, y_i)\}_{i=1}^{m}$ is separable. Let …
$\mathcal{H}_{\text{LR}} = \{\sigma(\langle \mathbf{w}, \mathbf{x} \rangle) : \mathbf{w} \in \mathbb{R}^d\}$
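The sketch below evaluates one member of $\mathcal{H}_{\text{LR}}$. It assumes that $\sigma$ is the logistic (sigmoid) function $\sigma(z) = 1/(1 + \exp(-z))$, which agrees with the likelihood $1/(1 + \exp(-y_i \langle \mathbf{w}, \mathbf{x}_i \rangle))$ used later in this lecture. The function names and the weights are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + exp(-z)), with values in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_hypothesis(w, x):
    """A member of H_LR: returns sigma(<w, x>)."""
    return sigmoid(np.dot(w, x))

w = np.array([0.5, -0.25])   # hypothetical weights
x = np.array([2.0, 4.0])
print(logistic_hypothesis(w, x))  # <w, x> = 0, so this prints 0.5
```

A halfspace outputs a hard label. By contrast, $\sigma(\langle \mathbf{w}, \mathbf{x} \rangle)$ outputs a number in (0, 1), and this number can be read as a probability.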
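$y \in \{-1, +1\}$

$$\hat{y} = \begin{cases} +1 & \text{if } h(\mathbf{x}, +1) > h(\mathbf{x}, -1) \\ -1 & \text{if } h(\mathbf{x}, +1) < h(\mathbf{x}, -1) \end{cases} \tag{20}$$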
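A small check of rule (20) follows. It assumes the score has the form $h(\mathbf{x}, y) = \sigma(y \langle \mathbf{w}, \mathbf{x} \rangle)$, which matches the likelihood $p_{\mathbf{w}}(y \mid \mathbf{x})$ in the negative log-likelihood (31). Under this assumption, $\sigma$ is strictly increasing, so $h(\mathbf{x}, +1) > h(\mathbf{x}, -1)$ exactly when $\langle \mathbf{w}, \mathbf{x} \rangle > 0$. Rule (20) then coincides with the halfspace prediction. The names `h` and `predict` and the random data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def h(w, x, y):
    """Assumed score h(x, y) = sigma(y * <w, x>) = p_w(y | x), for y in {-1, +1}."""
    return sigmoid(y * np.dot(w, x))

def predict(w, x):
    """Rule (20): output the label whose score h(x, y) is larger.

    Ties (<w, x> = 0) are not covered by (20); this sketch returns -1 for them.
    """
    return 1 if h(w, x, +1) > h(w, x, -1) else -1

# On random (hypothetical) data, rule (20) agrees with sign(<w, x>).
rng = np.random.default_rng(0)
w = rng.normal(size=3)
for x in rng.normal(size=(5, 3)):
    assert predict(w, x) == (1 if np.dot(w, x) > 0 else -1)
```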
¹More detail will be covered in the lecture on optimization methods.
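$$\begin{aligned}
\ell(\mathbf{w}) &= -\sum_{i=1}^{m} \log p_{\mathbf{w}}(y_i \mid \mathbf{x}_i) \\
&= -\sum_{i=1}^{m} \log \frac{1}{1 + \exp(-y_i \langle \mathbf{w}, \mathbf{x}_i \rangle)} \\
&= \sum_{i=1}^{m} \log\left(1 + \exp(-y_i \langle \mathbf{w}, \mathbf{x}_i \rangle)\right)
\end{aligned} \tag{31}$$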
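Here is a sketch of the loss (31) together with its gradient $\nabla \ell(\mathbf{w}) = -\sum_{i=1}^{m} \sigma(-y_i \langle \mathbf{w}, \mathbf{x}_i \rangle)\, y_i \mathbf{x}_i$. It also includes a plain fixed-step gradient descent loop. The optimizer, its step size `lr`, and the iteration count `num_steps` are hypothetical choices made for illustration only. When the data are separable, (31) has no finite minimizer, so the loop runs for a fixed number of steps. To avoid overflow, the code computes $\log(1 + e^{-z})$ with `np.logaddexp`.

```python
import numpy as np

def logistic_loss(w, X, y):
    """Eq. (31): sum_i log(1 + exp(-y_i <w, x_i>)), via the stable logaddexp."""
    margins = y * (X @ w)
    return np.sum(np.logaddexp(0.0, -margins))

def logistic_grad(w, X, y):
    """Gradient of (31): sum_i -y_i * sigma(-y_i <w, x_i>) * x_i."""
    margins = y * (X @ w)
    sig_neg = np.exp(-np.logaddexp(0.0, margins))  # sigma(-m) = 1 / (1 + exp(m))
    return X.T @ (-y * sig_neg)

def fit_logistic(X, y, lr=0.1, num_steps=500):
    """Fixed-step gradient descent on (31); lr and num_steps are hypothetical."""
    w = np.zeros(X.shape[1])
    for _ in range(num_steps):
        w = w - lr * logistic_grad(w, X, y)
    return w

# Toy data (hypothetical); the constant last feature acts as a bias term.
X = np.array([[2.0, 1.0, 1.0], [1.0, 3.0, 1.0], [-1.0, -2.0, 1.0], [-2.0, -1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = fit_logistic(X, y)
print(logistic_loss(np.zeros(3), X, y), logistic_loss(w, X, y))  # starts at 4 log 2
```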
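$\mathcal{H}_{\text{reg}} = \{\langle \mathbf{w}, \mathbf{x} \rangle : \mathbf{w} \in \mathbb{R}^d\}$

… $\mathcal{H}_{\text{reg}}$ such that $h(\mathbf{x})$ gives …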
$$\left(\sum_{i=1}^{m} \mathbf{x}_i \mathbf{x}_i^{\top}\right) \mathbf{w} = \sum_{i=1}^{m} y_i \mathbf{x}_i$$
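$$D = \begin{bmatrix} d_1 & & & \\ & \ddots & & \\ & & d_i & \\ & & & \ddots \end{bmatrix}, \qquad D^{+} = \begin{bmatrix} \frac{1}{d_1} & & & \\ & \ddots & & \\ & & \frac{1}{d_i} & \\ & & & \ddots \end{bmatrix}$$

$$AA^{+} = \begin{bmatrix} 1 & & & \\ & \ddots & & \\ & & 1 & \\ & & & \ddots \end{bmatrix} \tag{45}$$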
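The following sketch computes a pseudo-inverse by diagonalizing, as in (45). It assumes that $A = \sum_{i} \mathbf{x}_i \mathbf{x}_i^{\top}$ and $\mathbf{b} = \sum_{i} y_i \mathbf{x}_i$ (the usual least-squares setup) and that $A = V D V^{\top}$ is the eigendecomposition of this symmetric PSD matrix. $D^{+}$ inverts only the nonzero $d_i$, so $AA^{+}$ has eigenvalues 1 on the range of $A$ and 0 elsewhere. The helper names, the relative cutoff `rel_tol`, and the data are hypothetical. The data deliberately contain a duplicated feature, which makes $A$ singular.

```python
import numpy as np

def pinv_symmetric(A, rel_tol=1e-10):
    """Pseudo-inverse of a symmetric PSD matrix A via its eigendecomposition.

    With A = V D V^T, set A^+ = V D^+ V^T, where D^+ replaces each nonzero
    d_i by 1 / d_i and keeps (numerically) zero entries at zero.
    """
    d, V = np.linalg.eigh(A)                  # A = V @ np.diag(d) @ V.T
    cutoff = rel_tol * np.max(np.abs(d))      # eigenvalues below this count as zero
    d_plus = np.zeros_like(d)
    keep = np.abs(d) > cutoff
    d_plus[keep] = 1.0 / d[keep]
    return V @ np.diag(d_plus) @ V.T

def least_squares(X, y):
    """Solve A w = b with A = sum_i x_i x_i^T and b = sum_i y_i x_i via w = A^+ b."""
    A = X.T @ X    # sum of outer products x_i x_i^T
    b = X.T @ y    # sum of y_i x_i
    return pinv_symmetric(A) @ b

# Hypothetical data: the last two features are identical, so A is singular.
X = np.array([[1.0, 2.0, 2.0], [0.0, 1.0, 1.0], [3.0, -1.0, -1.0], [2.0, 0.5, 0.5]])
y = np.array([3.0, 1.0, 2.0, 2.5])
A = X.T @ X
w = least_squares(X, y)
print(np.allclose(A @ w, X.T @ y))                             # True: w solves A w = b
print(np.round(np.linalg.eigvalsh(A @ pinv_symmetric(A)), 6))  # two ones and a zero, cf. (45)
```

Because $A$ is singular, $A\mathbf{w} = \mathbf{b}$ has many solutions. The choice $\mathbf{w} = A^{+}\mathbf{b}$ picks the one with the smallest norm, and here that solution splits the weight equally between the two duplicated features.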
[Figure: the (w1, w2) plane, showing contours of the MSE and of the ℓ2 regularization term]
$\prod_{i=1}^{m} \mathcal{N}(y_i \mid h(\mathbf{x}_i), \sigma_1^2)$: likelihood function of the data $S$

… $\sigma_2^2$): prior distribution of $\mathbf{w}$
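Below is a sketch of the MAP estimate behind ℓ2-regularized (ridge) regression. It assumes the likelihood $\prod_i \mathcal{N}(y_i \mid \langle \mathbf{w}, \mathbf{x}_i \rangle, \sigma_1^2)$ and a zero-mean prior $\mathcal{N}(\mathbf{w} \mid \mathbf{0}, \sigma_2^2 I)$. The zero prior mean is an assumption, since the full prior is not shown here. Taking the negative log posterior and multiplying it by $2\sigma_1^2$ gives $\sum_i (y_i - \langle \mathbf{w}, \mathbf{x}_i \rangle)^2 + \lambda \|\mathbf{w}\|^2$ with $\lambda = \sigma_1^2 / \sigma_2^2$. This objective has the closed-form minimizer $(X^{\top}X + \lambda I)^{-1} X^{\top}\mathbf{y}$. The name `ridge_map` and the data are hypothetical.

```python
import numpy as np

def ridge_map(X, y, sigma1_sq, sigma2_sq):
    """MAP estimate under y_i ~ N(<w, x_i>, sigma1^2) and an assumed prior w ~ N(0, sigma2^2 I).

    Equivalent to minimizing sum_i (y_i - <w, x_i>)^2 + lam * ||w||^2
    with lam = sigma1^2 / sigma2^2.
    """
    lam = sigma1_sq / sigma2_sq
    d = X.shape[1]
    # Normal equations of the regularized objective: (X^T X + lam I) w = X^T y.
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Hypothetical data with a duplicated feature: X^T X is singular, X^T X + lam I is not.
X = np.array([[1.0, 2.0, 2.0], [0.0, 1.0, 1.0], [3.0, -1.0, -1.0], [2.0, 0.5, 0.5]])
y = np.array([3.0, 1.0, 2.0, 2.5])
for sigma2_sq in [0.01, 1.0, 100.0]:
    w = ridge_map(X, y, sigma1_sq=1.0, sigma2_sq=sigma2_sq)
    print(sigma2_sq, np.linalg.norm(w))  # a weaker prior (larger sigma2^2) gives a larger ||w||
```

A tighter prior (smaller $\sigma_2^2$) corresponds to a larger $\lambda$ and pulls $\mathbf{w}$ toward the origin. This is the trade-off between the MSE contours and the ℓ2 term that the figure illustrates.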