Coresets for Data-efficient Training of Machine Learning Models
Baharan Mirzasoleiman, Jeff Bilmes, Jure Leskovec

Machine Learning Becomes Mainstream
Machine learning now powers personalized medicine, robotics, finance, and autonomous cars.

Data is the Fuel
[Figure: object detection performance in mAP@[.5,.95] on COCO minival [ ]]
[SGM’19]
The training objective is min_w ∑_{i∈V} f_i(w), where f_i is the loss function associated with training example i ∈ V (possibly plus a regularizer).
Convex f_i: linear regression, logistic regression, ridge regression, regularized support vector machines (SVM).
Non-convex f_i: neural networks.
Training data: {(x_i, y_i), i ∈ V}, with feature vector x_i and label y_i; the training data volume is n = |V|.
Incremental gradient methods (e.g., SGD) step through the examples i ∈ V one at a time:
w_i = w_{i−1} − α_k ∇f_i(w_{i−1}), i ∈ V,
where α_k is the step size at epoch k.
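As a concrete illustration (not from the talk), here is a minimal NumPy sketch of this per-example update for binary logistic regression; the function names and step-size schedule are illustrative assumptions:

```python
import numpy as np

def logistic_grad(w, x, y):
    """Gradient of the logistic loss f_i(w) = log(1 + exp(-y * <w, x>)), y in {-1, +1}."""
    margin = y * np.dot(w, x)
    return -y * x / (1.0 + np.exp(margin))

def incremental_gradient(X, y, epochs=10, alpha0=0.1):
    """One pass over all examples i in V per epoch, with decaying step size alpha_k."""
    w = np.zeros(X.shape[1])
    for k in range(1, epochs + 1):
        alpha_k = alpha0 / k
        for i in np.random.permutation(len(y)):
            w = w - alpha_k * logistic_grad(w, X[i], y[i])
    return w
```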
If we can find a small weighted subset S ⊆ V whose gradients approximate the full gradient, we get a speedup by only training on S.
Formally, find the smallest subset S* and weights γ_j > 0, s.t. the weighted gradient of S stays close to the full gradient for all w:
S* = argmin_{S⊆V} |S|, s.t. max_{w∈𝒳} ∥ ∑_{i∈V} ∇f_i(w) − ∑_{j∈S} γ_j ∇f_j(w) ∥ ≤ ϵ
[Figure: gradients at w, comparing the full gradient with the weighted gradient of S. Training data: {(x_i, y_i), i ∈ V}.]
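To make the objective concrete, a tiny sketch (my own illustration) that evaluates the approximation error ∥∑_{i∈V} ∇f_i(w) − ∑_{j∈S} γ_j ∇f_j(w)∥ at a fixed w, given any per-example gradient function such as logistic_grad above:

```python
import numpy as np

def grad_approx_error(w, X, y, subset, gammas, grad_fn):
    """Norm of (full gradient) minus (weighted gradient of the subset S) at w."""
    full = sum(grad_fn(w, X[i], y[i]) for i in range(len(y)))
    approx = sum(g * grad_fn(w, X[j], y[j]) for j, g in zip(subset, gammas))
    return np.linalg.norm(full - approx)
```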
It suffices to find a subset S* in which every data point has a nearby gradient: assigning each i to its closest element of S* and weighting by cluster size bounds the error, so we need, for all w ∈ 𝒳,
∑_{i∈V} min_{j∈S*} ∥∇f_i(w) − ∇f_j(w)∥ ≤ ϵ
The distances still depend on w. Taking the maximum over the parameter space gives a bound that holds for all w:
∑_{i∈V} min_{j∈S*} ∥∇f_i(w) − ∇f_j(w)∥ ≤ ∑_{i∈V} min_{j∈S*} max_{w∈𝒳} ∥∇f_i(w) − ∇f_j(w)∥ ≤ ϵ
d_ij = max_{w∈𝒳} ∥∇f_i(w) − ∇f_j(w)∥ is an upper bound on the gradient difference that no longer depends on w, so S* can be found by minimizing ∑_{i∈V} min_{j∈S*} d_ij over subsets.
How can the upper bounds d_ij be computed efficiently?
For neural networks, the gradient difference can be bounded via the gradients with respect to the input to the last layer [KF’19]:
∥∇f_i(w) − ∇f_j(w)∥ ≤ O(∥∇_{z_i^{(L)}} f_i(w) − ∇_{z_j^{(L)}} f_j(w)∥),
where z^{(L)} is the input to the last layer and x is the feature vector.
Linear/logistic/ridge regression, regularized SVM: the bound depends only on the feature vectors ∥x_i − x_j∥, so S can be found as a preprocessing step.
Neural networks: the last-layer gradient is cheap to compute, but we have to update S during training.
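For a softmax + cross-entropy output layer, the gradient with respect to the pre-softmax input z^{(L)} has the closed form softmax(z) − onehot(y), which is what makes this proxy cheap. A minimal sketch with illustrative names (the distance computation follows common practice, not necessarily the authors' code):

```python
import numpy as np

def last_layer_grads(logits, labels):
    """Per-example gradient of cross-entropy w.r.t. the last layer's input:
    softmax(z_i) - onehot(y_i). logits: (n, c), labels: (n,) ints."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(len(labels)), labels] -= 1.0
    return p

def pairwise_bounds(g):
    """Proxy for d_ij: Euclidean distances between last-layer gradients."""
    sq = (g ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * g @ g.T
    return np.sqrt(np.maximum(d2, 0.0))
```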
Minimizing ∑_{i∈V} min_{j∈S} d_ij can be turned into maximizing a submodular facility-location function, so a near-optimal S is found by the classic greedy algorithm; each selected element j ∈ S is then weighted by the size of the corresponding cluster (the number of points whose closest element in S is j). A sketch of this step follows below.
[Figure: loss over 1 epoch; legend w=0.05, w=0.1, w=0.2, w=0.3.]
CRAIG in summary: given the loss function and the gradients (or gradient proxies) of data points i ∈ V, select a weighted subset S from the dataset.
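A minimal sketch of the greedy selection with cluster-size weights, assuming a precomputed bound matrix d (e.g., from pairwise_bounds above); this is the standard greedy construction for this objective, not the authors' exact implementation:

```python
import numpy as np

def craig_select(d, k):
    """Greedily pick k medoids minimizing sum_i min_{j in S} d[j, i];
    returns the selected indices and their cluster-size weights gamma_j."""
    n = d.shape[0]
    selected = []
    closest = np.full(n, np.inf)  # distance of each point to the current subset
    for _ in range(k):
        totals = np.minimum(closest[None, :], d).sum(axis=1)  # total if row j is added
        totals[np.asarray(selected, dtype=int)] = np.inf      # don't re-pick
        j = int(np.argmin(totals))
        selected.append(j)
        closest = np.minimum(closest, d[j])
    # gamma_j = number of points whose closest selected element is j
    owner = np.asarray(selected)[np.argmin(d[np.asarray(selected)], axis=0)]
    gammas = np.array([(owner == j).sum() for j in selected], dtype=float)
    return np.asarray(selected), gammas
```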
Theorem: For a μ-strongly convex loss function, CRAIG with a decaying step size converges to a neighborhood of the optimal solution.
[Figure: convergence for different subsets; legend w=0.05, w=0.1, w=0.2, w=0.3.]
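Training then runs the same incremental updates as before, but only over the weighted subset, scaling each gradient by γ_j. A sketch under the same illustrative names as the earlier snippets:

```python
import numpy as np

def train_on_coreset(X, y, subset, gammas, grad_fn, epochs=10, alpha0=0.1):
    """Incremental gradient descent over the coreset:
    w <- w - alpha_k * gamma_j * grad f_j(w), for each j in S."""
    w = np.zeros(X.shape[1])
    for k in range(1, epochs + 1):
        alpha_k = alpha0 / k  # decaying step size, as in the theorem
        for j, gamma in zip(subset, gammas):
            w = w - alpha_k * gamma * grad_fn(w, X[j], y[j])
    return w
```

With, say, a 10% subset, each epoch touches a tenth of the data, which is where the speedup reported next comes from.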
CRAIG is complementary to all the above methods for speeding up training and can be combined with them.
[Figure: speedup from training on CRAIG subsets of size 10%–90%, compared with SGD on all data.]
Training ResNet20 on subsets of various sizes from CIFAR10 (50K points).