Distributed Training for Large-scale Logistic Models
Siddharth Gopal
Carnegie Mellon University
21 Aug 2013
Joint work with Yiming Yang, presented at ICML’13
\ldots_{j} \{ \ldots_{j}^\top \gamma + b_j \} \le \log \ldots \; \ldots_{j'} \{ c_{j'}^\top \gamma + d_{j'} \}
[Figure: Efficiency of Bound. Function value (y-axis) against iteration (x-axis) for two curves: Log-sum-exp and Upper-bound.]
[Figure: log(x) plotted together with curves for a = 0.3, a = 2, and a = 0.02.]
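The plot of log(x) with curves for several values of a is consistent with the linear upper bound log(x) ≤ a·x − log(a) − 1, whose "− log(a) − 1" tail appears in the later equations. A minimal sketch, assuming that is the bound being plotted; the function names `log_upper_bound` and `tightest_a` are illustrative and not from the talk.

```python
import math

def log_upper_bound(x, a):
    """Linear upper bound on log(x): a*x - log(a) - 1, for x > 0 and a > 0."""
    return a * x - math.log(a) - 1.0

def tightest_a(x):
    """Value of a at which the bound touches log(x) exactly: a = 1/x."""
    return 1.0 / x

if __name__ == "__main__":
    # The three slopes shown in the plot (assumed to be the bound's parameter a).
    for a in (0.3, 2.0, 0.02):
        x_touch = 1.0 / a  # point where the line is tangent to log(x)
        gap = log_upper_bound(x_touch, a) - math.log(x_touch)
        print(f"a={a}: tangent at x={x_touch:.4g}, gap there={gap:.2e}")
```

Each choice of a gives a line that lies on or above log(x) everywhere and is tangent to it at x = 1/a, so a small a is tight for large x and a large a is tight for small x.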
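Training objective over N instances and K classes:

\min_{w} \; \frac{\lambda}{2} \sum_{k=1}^{K} \|w_k\|^2 - \sum_{i=1}^{N} \sum_{k=1}^{K} y_{ik} \, w_k^\top x_i + \sum_{i=1}^{N} \log \sum_{k'=1}^{K} \exp(w_{k'}^\top x_i)

Bound on the log-sum-exp term, for any a_i > 0:

\log \sum_{k=1}^{K} \exp(w_k^\top x_i) \le a_i \sum_{k=1}^{K} \exp(w_k^\top x_i) - \log(a_i) - 1

Objective after applying the bound:

\min_{w, \, a > 0} \; \frac{\lambda}{2} \sum_{k=1}^{K} \|w_k\|^2 + \sum_{i=1}^{N} \Big[ - \sum_{k=1}^{K} y_{ik} \, w_k^\top x_i + a_i \sum_{k=1}^{K} \exp(w_k^\top x_i) - \log(a_i) - 1 \Big]

Here x_i is the i-th training instance, y_{ik} \in \{0, 1\} its class indicator, w_k the weight vector of class k, \lambda the regularization weight, and a_i the variational parameter of instance i.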
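Once the log-sum-exp term is replaced by a_i Σ_k exp(w_k^⊤ x_i) − log(a_i) − 1, every remaining term involves a single w_k, so the classes can be updated independently for fixed a_i. The sketch below illustrates that alternation on a toy problem. It assumes an L2-regularized multinomial logistic objective, and it uses plain backtracking gradient steps as a stand-in solver chosen for brevity, not the solver used in the talk. All function names and the toy data are hypothetical.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def exact_objective(W, X, Y, lam):
    """Regularized multinomial logistic loss; log-sum-exp couples all classes."""
    reg = 0.5 * lam * sum(dot(w, w) for w in W)
    loss = 0.0
    for x, y in zip(X, Y):
        scores = [dot(w, x) for w in W]
        loss += math.log(sum(math.exp(s) for s in scores)) - dot(y, scores)
    return reg + loss

def optimal_a(W, X):
    """Closed-form minimizer over a_i: a_i = 1 / sum_k exp(w_k . x_i)."""
    return [1.0 / sum(math.exp(dot(w, x)) for w in W) for x in X]

def class_objective(w_k, k, X, Y, a, lam):
    """The part of the bounded objective that depends only on w_k."""
    val = 0.5 * lam * dot(w_k, w_k)
    for x, y, a_i in zip(X, Y, a):
        s = dot(w_k, x)
        val += a_i * math.exp(s) - y[k] * s
    return val

def bounded_objective(W, X, Y, a, lam):
    """Independent per-class terms plus a term that depends only on a."""
    per_class = sum(class_objective(w, k, X, Y, a, lam) for k, w in enumerate(W))
    return per_class - sum(math.log(a_i) + 1.0 for a_i in a)

def class_gradient(w_k, k, X, Y, a, lam):
    """Gradient of class_objective with respect to w_k."""
    grad = [lam * wj for wj in w_k]
    for x, y, a_i in zip(X, Y, a):
        coeff = a_i * math.exp(dot(w_k, x)) - y[k]
        for j, xj in enumerate(x):
            grad[j] += coeff * xj
    return grad

def update_class(w_k, k, X, Y, a, lam):
    """One backtracking gradient step on class k; never increases its objective."""
    g = class_gradient(w_k, k, X, Y, a, lam)
    f0 = class_objective(w_k, k, X, Y, a, lam)
    step = 1.0
    while step > 1e-10:
        cand = [wj - step * gj for wj, gj in zip(w_k, g)]
        if class_objective(cand, k, X, Y, a, lam) <= f0:
            return cand
        step *= 0.5
    return w_k

def train(X, Y, num_classes, lam=0.1, iters=20):
    W = [[0.0] * len(X[0]) for _ in range(num_classes)]
    history = []
    for _ in range(iters):
        a = optimal_a(W, X)  # makes the bound tight at the current W
        # Each call reads only w_k and the shared a, so the K updates
        # could run on separate workers.
        W = [update_class(w, k, X, Y, a, lam) for k, w in enumerate(W)]
        history.append(exact_objective(W, X, Y, lam))
    return W, history

if __name__ == "__main__":
    X = [[1.0, 0.0], [0.9, 0.2], [0.0, 1.0], [0.1, 0.8]]  # toy instances
    Y = [[1, 0], [1, 0], [0, 1], [0, 1]]                  # one-hot labels
    W, history = train(X, Y, num_classes=2)
    print("objective per iteration:", [round(h, 4) for h in history])
```

Because the bound is tight after each `optimal_a` call and each class step can only lower its own term, the exact objective recorded in `history` cannot increase from one iteration to the next. The only quantity shared across classes is the vector of a_i values, which is what a distributed implementation would need to exchange between rounds.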
Dataset       #Instances  #Leaf-labels  #Features  #Parameters     Parameter size (approx.)
CLEF          10,000      63            80         5,040           40KB
NEWS20        11,260      20            53,975     1,079,500       4MB
LSHTC-small   4,463       1,139         51,033     227,760,279     911MB
LSHTC-large   93,805      12,294        347,256    4,269,165,264   17GB
[Figure: four convergence panels comparing ADMM, LC, LBFGS, and DM. The NEWS-20, CLEF, and LSHTC-small panels plot Difference from Optimum against Time Taken (secs) on log-scale axes; the LSHTC-large panel plots Objective against Time Taken (secs).]