Learning From Data, Lecture 8: Linear Classification and Regression
Linear Classification · Linear Regression
M. Magdon-Ismail
CSCI 4100/6100
recap: Approximation Versus Generalization
[Figure: error versus VC dimension dvc. In-sample error decreases and model complexity grows with dvc, so Eout is minimized at d∗vc.]
The VC warranty had conditions for becoming void:
- You can't look at your data before choosing H.
- Data must be generated i.i.d. from P(x).
- Data and test case must come from the same P(x) (the same bin).
[Figure: bias-variance example for f(x) = sin(x); data (x, y) and the average hypothesis ḡ(x) for each model.]
H0 (constants): bias = 0.50, var = 0.25, Eout = 0.75. H1 (lines): bias = 0.21, var = 1.69, Eout = 1.90.
recap: Decomposing The Learning Curve
[Learning curves: expected error versus number of data points N. Left: VC decomposition, Eout = Ein + generalization error. Right: bias-variance decomposition, Eout = bias + variance.]
VC view: pick H that can generalize and has a good chance to fit the data.
Bias-variance view: pick (H, A) to approximate f and not behave wildly after seeing the data.
The 3 Learning Problems
Credit analysis, three ways:
- Approve or deny: classification, y = ±1.
- Amount of credit: regression, y ∈ R.
- Probability of default: logistic regression, y ∈ [0, 1].
The Linear Signal
All three models are built on the linear signal s = w^t x, where x is the augmented vector: x ∈ {1} × R^d (a constant coordinate x0 = 1 is prepended to the d input features).
Using the Linear Signal
- Classification: h(x) = sign(s), y = ±1.
- Regression: h(x) = s, y ∈ R.
- Logistic regression: h(x) = θ(s), y ∈ [0, 1].
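As a concrete illustration, here is a minimal numpy sketch of the shared signal (the helper names augment and signal are mine, not the lecture's):

```python
import numpy as np

def augment(x):
    """Prepend the constant coordinate x0 = 1: x in {1} x R^d."""
    return np.concatenate(([1.0], x))

def signal(w, x):
    """The linear signal s = w^t x (x already augmented)."""
    return np.dot(w, x)

# The same signal drives all three linear models.
s = signal(np.array([0.5, -1.0, 2.0]), augment(np.array([1.5, 0.25])))
label = np.sign(s)                 # classification:      h(x) = sign(s)
amount = s                         # regression:          h(x) = s
prob = 1.0 / (1.0 + np.exp(-s))    # logistic regression: h(x) = theta(s)
```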
Classification and PLA
h(x) = sign(w^t x)

Eout(h) ≤ Ein(h) + O(√((d/N) log N))

Linearly separable data ⇒ PLA converges to Ein = 0, using the update

w(t + 1) = w(t) + x∗ y∗,  where (x∗, y∗) is a misclassified data point.

Ein = 0 ⇒ Eout ≈ 0 (if f is well approximated by a linear fit).

Non-separable data? Use the pocket algorithm, and select good features.
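A runnable sketch of PLA as just described (numpy, with my function name; not the lecture's code):

```python
import numpy as np

def pla(X, y, max_iters=1000):
    """Perceptron learning algorithm.

    X: N x (d+1) matrix of augmented inputs (first column all ones).
    y: length-N vector of +/-1 labels.
    If the data are linearly separable (and max_iters suffices),
    returns w with Ein = 0.
    """
    w = np.zeros(X.shape[1])
    for _ in range(max_iters):
        misclassified = np.where(np.sign(X @ w) != y)[0]
        if misclassified.size == 0:
            break                    # Ein = 0: done
        i = misclassified[0]         # any misclassified point (x*, y*)
        w = w + y[i] * X[i]          # the update w(t+1) = w(t) + x* y*
    return w
```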
Non-Separable Data
The Pocket Algorithm
Minimizing Ein is a hard combinatorial problem. The pocket heuristic:
- Run PLA.
- At each step, keep the best Ein (and the corresponding w) seen so far.
(It's not rocket science, but it works.)
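A sketch of the pocket heuristic under the same conventions (a minimal illustration, not the lecture's implementation):

```python
import numpy as np

def pocket(X, y, max_iters=1000):
    """Run PLA updates, but keep ("pocket") the best weights seen so far."""
    rng = np.random.default_rng(0)
    w = np.zeros(X.shape[1])
    best_w = w.copy()
    best_ein = np.mean(np.sign(X @ w) != y)
    for _ in range(max_iters):
        misclassified = np.where(np.sign(X @ w) != y)[0]
        if misclassified.size == 0:
            return w                        # separable after all: Ein = 0
        i = rng.choice(misclassified)
        w = w + y[i] * X[i]                 # ordinary PLA update
        ein = np.mean(np.sign(X @ w) != y)  # Ein of the new w
        if ein < best_ein:                  # better than the pocketed w?
            best_w, best_ein = w.copy(), ein
    return best_w
```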
Digits
Input Is 256-Dimensional
[A 16 × 16 grayscale digit image: each raw input is a vector of 256 pixel intensities in [−1, 1].]
Intensity and Symmetry Features
A feature (dictionary.com): a prominent or conspicuous part or characteristic. Here: the average intensity and the symmetry of the digit image.
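One plausible way to compute these two features (the exact definitions are my assumption; the lecture only names the features):

```python
import numpy as np

def features(img):
    """Map a 16x16 grayscale digit image to (1, intensity, symmetry).

    intensity: average pixel value.
    symmetry: negative mean absolute difference between the image
    and its left-right mirror (0 means perfectly symmetric).
    """
    intensity = img.mean()
    symmetry = -np.abs(img - np.fliplr(img)).mean()
    return np.array([1.0, intensity, symmetry])  # augmented feature vector
```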
PLA on Digits Data
[Plot: Ein and Eout versus iteration number t, error on a log scale (1% to 50%), for t up to 1000.]
Pocket on Digits Data
[Plots: Ein and Eout versus iteration number t (log-scale error, 1% to 50%, t up to 1000) for PLA (left) and for the pocket algorithm (right).]
Regression
age            32 years
gender         male
salary         40,000
debt           26,000
years in job   1 year
years at home  3 years
...            ...

This time the output is an amount of credit: regression ≡ y ∈ R.
Squared Error
[Figure: least-squares fits; a line h(x) = w^t x in one dimension (axes x, y) and a plane in two dimensions (axes x1, x2, y).]
Ein(h) = (1/N) Σ_{n=1}^{N} (h(x_n) − y_n)^2
Matrix Representation
Ein(w) = (1/N) Σ_{n=1}^{N} (w^t x_n − y_n)^2
       = (1/N) ‖Xw − y‖^2
       = (1/N) (w^t X^t X w − 2 w^t X^t y + y^t y),

where X is the N × (d+1) data matrix with rows x_1^t, …, x_N^t, and y = (y_1, …, y_N)^t.
Pseudoinverse Solution
Vector calculus: to minimize Ein(w), set ∇w Ein(w) = 0. Using ∇w(w^t A w) = (A + A^t) w and ∇w(w^t b) = b, with A = X^t X (symmetric) and b = X^t y:

∇w Ein(w) = (2/N) (X^t X w − X^t y) = 0
⇒ w = (X^t X)^{−1} X^t y = X† y,

where X† = (X^t X)^{−1} X^t is the pseudoinverse of X (assuming X^t X is invertible).
The Linear Regression Algorithm
1. Construct the matrix X and vector y from the data.
2. Compute the pseudoinverse X† = (X^t X)^{−1} X^t.
3. Return w = X† y.
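A minimal numpy sketch of this one-step algorithm (using lstsq for numerical stability rather than forming (X^t X)^{−1} explicitly):

```python
import numpy as np

def linear_regression(X, y):
    """One-step linear regression: w = pseudoinverse(X) @ y.

    X: N x (d+1) matrix of augmented inputs; y: length-N targets.
    lstsq solves min_w ||Xw - y||^2, which equals (X^t X)^{-1} X^t y
    when X^t X is invertible.
    """
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def ein(w, X, y):
    """In-sample error Ein(w) = (1/N) ||Xw - y||^2."""
    r = X @ w - y
    return (r @ r) / len(y)
```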
Generalization
One can obtain a regression version of dvc. There are other bounds too, for example: E[Eout(h)] = E[Ein(h)] + O(d/N).
[Learning curves for a noisy linear target with noise variance σ²: expected Eout = σ²(1 + (d+1)/N) and expected Ein = σ²(1 − (d+1)/N); both approach σ² as N grows.]
Regression for Classification
Binary labels are also real values: yn = ±1 ∈ R.
- Use linear regression to get w with w^t x_n ≈ yn = ±1.
- Then sign(w^t x_n) will likely agree with yn = ±1.
- The regression weights also make good initial weights for a classification algorithm (sketch below).
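A self-contained sketch on toy data (the data generation and all names are mine, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: augmented inputs and noisy +/-1 labels from a linear target.
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
y = np.sign(X @ np.array([0.2, 1.0, -1.0]) + 0.1 * rng.normal(size=100))

# Treat the +/-1 labels as real targets and fit by least squares...
w, *_ = np.linalg.lstsq(X, y, rcond=None)
# ...then classify by thresholding the linear signal.
predictions = np.sign(X @ w)
print("Ein =", np.mean(predictions != y))
# w is also a sensible starting point for PLA/pocket instead of w = 0.
```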
Example: classifying "1" versus "not 1" (multiclass → two-class).
[Figure: the digits data in the (average intensity, symmetry) feature plane.]