Learning From Data Lecture 17 Memory and Efficiency in Nearest Neighbor
Memory Efficiency
M. Magdon-Ismail
CSCI 4100/6100
recap: Similarity and Nearest Neighbor
Similarity: $d(x, x') = \|x - x'\|$
1-NN rule; 21-NN rule
$k \to \infty,\ k/N \to 0 \implies E_{\text{out}} \to E^{*}$
$k = 3$; $k = \sqrt{N}$
$g(x) = \frac{1}{k} \sum_{i=1}^{k} y_{[i]}(x)$
$g(x) = \frac{1}{k} \sum_{i=1}^{k} \cdots$
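The averaging rule $g(x) = \frac{1}{k}\sum_{i=1}^{k} y_{[i]}(x)$ can be sketched in a few lines. This is a minimal illustration, not the lecture's code: the function name `knn_average` and the toy data are hypothetical, and taking the sign of the average to get a ±1 class label is an assumption that only makes sense for odd k.

```python
import numpy as np

def knn_average(X, y, x, k):
    """Average the targets of the k training points nearest to x."""
    dist = np.linalg.norm(X - x, axis=1)   # d(x, x_n) = ||x - x_n||
    nearest = np.argsort(dist)[:k]         # indices of x_[1], ..., x_[k]
    return y[nearest].mean()               # g(x) = (1/k) * sum_i y_[i](x)

# Toy data (hypothetical): four points on a line with +1/-1 targets.
X = np.array([[0.0], [1.0], [2.0], [10.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

g = knn_average(X, y, np.array([0.4]), k=3)   # averages the targets at 0, 1 and 2
label = np.sign(g)                            # assumed classification step (odd k)
```

Every call computes the distance to all N points, which is the cost the rest of the lecture tries to reduce.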
c A M L Creator: Malik Magdon-Ismail
Memory and Efficiency in Nearest Neighbor: 2 /25
Computational demands
Need to store all the data, O(Nd) memory.
$N = 10^6$, $d = 100$, double precision ≈ 1GB
Need to compute distance to every data point, O(Nd).
$N = 10^6$, $d = 100$, 3GHz processor ≈ 3ms (compute g(x))
$N = 10^6$, $d = 100$, 3GHz processor ≈ 1hr (compute CV error)
$N = 10^6$, $d = 100$, 3GHz processor > 1month (choose best k from among 1000 using CV)
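The estimates above follow from simple arithmetic. The sketch below takes the slide's 3ms per evaluation of g(x) as given and assumes that the CV error is leave-one-out, so it costs one evaluation per data point; that assumption is not stated on the slide but reproduces its numbers.

```python
N, d = 10**6, 100
BYTES_PER_DOUBLE = 8

memory_gb = N * d * BYTES_PER_DOUBLE / 1e9       # storage for all N points

query_seconds = 3e-3                             # the slide's estimate for one g(x)
cv_hours = N * query_seconds / 3600              # assumed leave-one-out CV: N evaluations
best_k_days = 1000 * N * query_seconds / 86400   # repeat the CV for 1000 values of k

print(f"memory: about {memory_gb:.1f} GB")
print(f"CV error: about {cv_hours:.2f} hours")
print(f"choosing k: about {best_k_days:.1f} days")
```

This gives roughly 0.8 GB, 50 minutes and 35 days, in line with the rounded figures on the slide.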
Two basic approaches
Reduce the amount of data: the 5-year-old does not remember every horse he has seen, only a few representative horses.
Store the data in a specialized data structure: developing geometric data structures that make finding nearest neighbors fast is an ongoing research field.
Irrelevant data
Decision boundary consistent
Training set consistent
Comparing
DB versus TS
Consistent ⇒ ($g(x_n) = y_n$)
[Figure panels: DB, TS]
Training set consistent (k = 3)
CNN (condensed nearest neighbor)
Consider the solid blue point:
Add a red point:
[Figure label: "add this point"]
CNN: add red point
CNN: algorithm
Minimum consistent set (MCS)? Finding one is NP-hard.
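Since finding a minimum consistent set is NP-hard, condensing is done greedily. The following is a minimal sketch of one such greedy loop, under these assumptions: labels are +1/-1, k is odd, there are no ties in distance, the starting subset is k random points, and "add a red point" means adding the closest unselected point whose label matches the class that the full data assigns to the inconsistent point. The names `knn_predict` and `condense` are hypothetical.

```python
import numpy as np

def knn_predict(X_ref, y_ref, X_query, k=1):
    """Majority vote of the k nearest reference points; labels are +1/-1, k odd."""
    diff = X_query[:, None, :] - X_ref[None, :, :]
    dist2 = (diff ** 2).sum(axis=2)                  # shape (n_query, n_ref)
    nearest = np.argsort(dist2, axis=1)[:, :k]
    return np.sign(y_ref[nearest].sum(axis=1))

def condense(X, y, k=1, seed=0):
    """Return indices of a training-set-consistent subset of (X, y)."""
    rng = np.random.default_rng(seed)
    N = len(X)
    selected = np.zeros(N, dtype=bool)
    selected[rng.choice(N, size=k, replace=False)] = True   # assumed start: k random points
    target = knn_predict(X, y, X, k)                 # classes given by the full data
    while True:
        pred = knn_predict(X[selected], y[selected], X, k)
        inconsistent = np.flatnonzero(pred != target)
        if inconsistent.size == 0:                   # selected set agrees with full data
            return np.flatnonzero(selected)
        i = inconsistent[0]                          # plays the role of the solid blue point
        # Candidates: unselected points labeled with the full-data class of x_i.
        cand = np.flatnonzero(~selected & (y == target[i]))
        dist2 = ((X[cand] - X[i]) ** 2).sum(axis=1)
        selected[cand[np.argmin(dist2)]] = True      # add the closest such point
```

Each pass adds one new point, so the loop stops after at most N additions; in the worst case it returns the full data set, which is trivially consistent.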
Digits Data
Condensing the Digits Data
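Finding the nearest neighbor
Clusters $S_1$, $S_2$ have centers $\mu_1$, $\mu_2$ and radii $r_1$, $r_2$.
Search $S_1$ first to find its nearest point to $x$, $\hat{x}_{[1]}$.
Every point in $S_2$ is at distance at least $\|x - \mu_2\| - r_2$ from $x$.
If $\|x - \hat{x}_{[1]}\| \le \|x - \mu_2\| - r_2$, then $\hat{x}_{[1]}$ is a nearest neighbor and $S_2$ need not be searched.
A branch and bound algorithm.
Can be applied recursively.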
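The branch and bound idea can be sketched for a flat (non-recursive) set of clusters. This is an illustration under assumptions: the cluster assignment `labels` is given, clusters are visited in order of increasing distance from x to the center, and the names `build_clusters` and `nearest_neighbor` are hypothetical.

```python
import numpy as np

def build_clusters(X, labels):
    """Summarize each cluster S_j by its center mu_j, radius r_j and member indices."""
    clusters = []
    for j in np.unique(labels):
        idx = np.flatnonzero(labels == j)
        mu = X[idx].mean(axis=0)
        r = np.linalg.norm(X[idx] - mu, axis=1).max()
        clusters.append((mu, r, idx))
    return clusters

def nearest_neighbor(x, X, clusters):
    """Return (index, distance, number of clusters actually searched)."""
    # Branch: visit clusters in order of increasing distance from x to the center.
    order = sorted(clusters, key=lambda c: np.linalg.norm(x - c[0]))
    best_idx, best_dist, searched = -1, np.inf, 0
    for mu, r, idx in order:
        # Bound: every point in this cluster is at least ||x - mu|| - r away from x.
        if best_dist <= np.linalg.norm(x - mu) - r:
            continue                                  # cluster cannot contain a closer point
        dist = np.linalg.norm(X[idx] - x, axis=1)     # brute-force search inside the cluster
        searched += 1
        j = np.argmin(dist)
        if dist[j] < best_dist:
            best_idx, best_dist = idx[j], dist[j]
    return best_idx, best_dist, searched
```

When the clusters are well separated and x lies close to one center, only that cluster is searched; when the bound fails, the search falls back to scanning further clusters and still returns an exact nearest neighbor.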
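When does the bound hold?
Since $\hat{x}_{[1]}$ is the nearest point to $x$ in $S_1$, the triangle inequality gives $\|x - \hat{x}_{[1]}\| \le \|x - \mu_1\| + r_1$.
The bound $\|x - \hat{x}_{[1]}\| \le \|x - \mu_2\| - r_2$ therefore holds whenever $\|x - \mu_1\| + r_1 \le \|x - \mu_2\| - r_2$.
So, it suffices that $r_1 + r_2 \le \|x - \mu_2\| - \|x - \mu_1\|$.
$\|x - \mu_1\| \approx 0$ means $\|x - \mu_2\| \approx \|\mu_2 - \mu_1\|$.
In that case, it suffices that $r_1 + r_2 \le \|\mu_2 - \mu_1\|$.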
Finding clusters – Lloyd's algorithm
$\mu_j = \frac{1}{|S_j|} \sum_{x_n \in S_j} x_n$; $r_j = \max_{x_n \in S_j} \|x_n - \mu_j\|$.
Furthest away point
Next furthest away point
All centers picked
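Picking the furthest away point, then the next furthest away point, until all centers are picked, can be sketched as a greedy loop. The sketch assumes that "furthest away" means furthest from the nearest center chosen so far and that the first center is an arbitrary data point (here `X[first]`); the function name `pick_centers` is hypothetical.

```python
import numpy as np

def pick_centers(X, n_centers, first=0):
    """Greedy selection: each new center is the point furthest from the centers so far."""
    centers = [first]                                # assumption: start from X[first]
    dist = np.linalg.norm(X - X[first], axis=1)      # distance to the nearest chosen center
    for _ in range(n_centers - 1):
        nxt = int(np.argmax(dist))                   # furthest away point
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(centers)

# Toy data (hypothetical): two nearby pairs and one isolated point.
X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [10.0, 1.0], [5.0, 8.0]])
centers = pick_centers(X, 3)
```

The chosen points spread out over the data, which gives Lloyd's algorithm a reasonable set of starting centers.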
Construct Voronoi regions
Update centers
Update Voronoi regions
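The alternation between constructing Voronoi regions and updating centers can be sketched as follows. This is a minimal version under assumptions: initial centers are supplied by the caller, iteration stops when the centers no longer move or after `n_iter` passes, and an empty region keeps its old center. The function name `lloyd` and its parameters are hypothetical.

```python
import numpy as np

def lloyd(X, centers, n_iter=20):
    """Alternate between Voronoi assignment and center updates."""
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Construct Voronoi regions: assign each point to its nearest center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Update centers: mu_j is the mean of the points in S_j.
        new_centers = centers.copy()
        for j in range(len(centers)):
            if np.any(labels == j):                  # an empty region keeps its old center
                new_centers[j] = X[labels == j].mean(axis=0)
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    # Radii: r_j is the largest distance from mu_j to a point of S_j (0 if S_j is empty).
    radii = np.array([
        np.linalg.norm(X[labels == j] - centers[j], axis=1).max() if np.any(labels == j) else 0.0
        for j in range(len(centers))
    ])
    return centers, radii, labels

# Toy data (hypothetical): two groups of two points each.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
centers, radii, labels = lloyd(X, centers=[[0.0, 0.0], [10.0, 2.0]])
```

The returned centers and radii are the quantities $\mu_j$ and $r_j$ that the branch and bound test needs.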
Preview RBF
k-NN: each neighbor has equal weight.
RBF: data further away from x have less weight.
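The contrast between the two weighting schemes can be made concrete. In this sketch the Gaussian form of the weights and the scale parameter `r` are assumptions chosen for illustration, not something the slide specifies; the names `knn_weights` and `gaussian_weights` are hypothetical.

```python
import numpy as np

def knn_weights(X, x, k):
    """Equal weight 1/k on the k nearest points, zero on all others."""
    dist = np.linalg.norm(X - x, axis=1)
    w = np.zeros(len(X))
    w[np.argsort(dist)[:k]] = 1.0 / k
    return w

def gaussian_weights(X, x, r=1.0):
    """Weights that decay smoothly with distance (assumed Gaussian with scale r)."""
    dist = np.linalg.norm(X - x, axis=1)
    w = np.exp(-0.5 * (dist / r) ** 2)
    return w / w.sum()                       # normalize so the weights sum to 1

# Toy data (hypothetical): three points on a line, query at the first one.
X = np.array([[0.0], [1.0], [3.0]])
x = np.array([0.0])
w_knn = knn_weights(X, x, k=2)
w_rbf = gaussian_weights(X, x, r=1.0)
```

With either scheme, a prediction is a weighted average of the targets; only the weights differ.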