Machine Learning Research at eBay: An Industry Lab Perspective
Dennis DeCoste eBay Research Labs
ML Summer School @ UC Santa Cruz, July 13, 2012
Dennis DeCoste @ research labs
NASA / Caltech JPL,
ML at eBay: Data and Apps
Needs and Opportunities for ML Research Randomized Algorithms: Scalable Large-Scale
$O(nz \cdot k + (m + n) k^2)$ vs $O(m n k)$
MATLAB [U,S,V] = svds(A,k)
[multicore, GPU, 2-pass]
See: "Finding structure with randomness", Halko, Martinsson, Tropp. SIAM Review, 2011.
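To get a feel for the gap between the two operation counts above, the following sketch evaluates both expressions with constants dropped. The sizes (a 20,000 x 20,000 matrix with 1% nonzeros, k = 100) and the function names are illustrative assumptions, not figures from the talk.

```python
def sparse_cost(nz, m, n, k):
    """Leading-order count nz*k + (m + n)*k^2, constants dropped."""
    return nz * k + (m + n) * k ** 2


def dense_cost(m, n, k):
    """Leading-order count m*n*k, constants dropped."""
    return m * n * k


# Assumed problem size: square matrix, 1% of entries nonzero, rank-100 target.
m = n = 20_000
k = 100
nz = m * n // 100

cost_sparse = sparse_cost(nz, m, n, k)
cost_dense = dense_cost(m, n, k)
ratio = cost_dense / cost_sparse

print(f"nz*k + (m+n)*k^2 = {cost_sparse:.3g}")
print(f"m*n*k            = {cost_dense:.3g}")
print(f"ratio            = {ratio:.1f}x")
```

Under these assumed sizes the count that scales with nz is about 50 times smaller. Real timings also depend on memory traffic and on the number of passes over the data.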
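U = randn(20*1000,100); V = randn(20*1000,100); A = U*V';
[i, j, v] = find(A);   % find returns rows, columns, values
[~,ids] = sort(v);
ids = [ ids(1:length(v)/200); ids(end-length(v)/200:end) ];   % smallest and largest 0.5%
A = sparse( i(ids), j(ids), v(ids), size(A,1), size(A,2) );
k = 100;   % assumption: k is not defined on the slide
tic; [Ur, Sr, Vr] = rsvd(A, k); toc   % rsvd is not a MATLAB built-in
tic; [U0, S0, V0] = svds(A, k); toc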
also useful for seeding, even if accuracy not sufficient
$\sum_{i=1}^{n} c_i \approx n$
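The relation above is read here as the Poisson-bootstrap property: if each of the n points receives an independent count c_i ~ Poisson(1), the counts sum to roughly n rather than exactly n. That reading is an assumption based on the bootstrap slides in this section. A minimal standard-library sketch follows; `poisson1` is an illustrative helper that uses Knuth's multiplication method.

```python
import math
import random


def poisson1(rng):
    """Draw one Poisson(lambda=1) variate with Knuth's multiplication method."""
    threshold = math.exp(-1.0)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


rng = random.Random(0)
n = 100_000
counts = [poisson1(rng) for _ in range(n)]  # one resampling weight per data point
total = sum(counts)

print("sum of counts:", total)
print("relative to n:", total / n)
```

Because the counts are independent, each data point (or each machine holding a shard) can draw its own weight without coordinating with the others, which is what makes this variant attractive at scale.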
BS = function(x, stat, k) {
  bs = replicate(k, stat( sample(x, replace=T) ))
  quantile(bs, c(0.025, 0.5, 0.975))
}
BS.pois = function(x, stat, k) {
  bs = replicate(k, stat( rep(x, rpois(length(x), lambda=1)) ))
  quantile(bs, c(0.025, 0.5, 0.975))
}
LMH.normal <- function(x) {
  mu = mean(x); stderr = sqrt(var(x)/length(x))
  c(mu - 1.96*stderr, mu, mu + 1.96*stderr)
}
n = 100*1000; k = 10*1000; x = runif(n,0,1); s = mean
print(quantile(replicate(k, s(runif(n,0,1))), c(0.025,0.5,0.975)))
# 0.4982299 0.5000069 0.5017628
print(LMH.normal(x))
# 0.4978307 0.4996247 0.5014186
print(BS(x,s,k))
# 0.4978188 0.4996341 0.5014037
print(BS.pois(x,s,k))
# 0.4978936 0.4996468 0.5014059
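For readers who do not use R, the three interval estimators above can be sketched with the Python standard library alone. This is a rough port under stated assumptions, not the author's code: the function names are illustrative, the quantiles are plain order statistics rather than R's interpolated `quantile`, and n and k are far smaller than on the slide so that it runs quickly.

```python
import math
import random
import statistics


def percentile_triplet(values):
    """Return the (2.5%, 50%, 97.5%) order statistics of the values."""
    s = sorted(values)
    return tuple(s[round(q * (len(s) - 1))] for q in (0.025, 0.5, 0.975))


def poisson1(rng):
    """Poisson(1) variate via Knuth's multiplication method."""
    threshold, count, product = math.exp(-1.0), 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def bootstrap(x, stat, k, rng):
    """Classic bootstrap: k resamples of size len(x), with replacement."""
    return percentile_triplet(stat(rng.choices(x, k=len(x))) for _ in range(k))


def poisson_bootstrap(x, stat, k, rng):
    """Poisson bootstrap: each point is repeated Poisson(1) times per resample."""
    def resample():
        return [xi for xi in x for _ in range(poisson1(rng))]
    return percentile_triplet(stat(resample()) for _ in range(k))


def normal_interval(x):
    """Normal-theory interval for the mean: mu -/+ 1.96 standard errors."""
    mu = statistics.fmean(x)
    se = math.sqrt(statistics.variance(x) / len(x))
    return mu - 1.96 * se, mu, mu + 1.96 * se


rng = random.Random(0)
n, k = 2000, 400  # assumed sizes, much smaller than the slide's 100,000 and 10,000
x = [rng.random() for _ in range(n)]

ci_normal = normal_interval(x)
ci_bs = bootstrap(x, statistics.fmean, k, rng)
ci_pois = poisson_bootstrap(x, statistics.fmean, k, rng)
for name, ci in (("normal", ci_normal), ("BS", ci_bs), ("BS.pois", ci_pois)):
    print(name, ci)
```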
BS = function(x, stat, k) {
  bs = replicate(k, stat( sample(x, replace=T) ))
  quantile(bs, c(0.025, 0.5, 0.975))
}
BS.pois = function(x, stat, k) {
  bs = replicate(k, stat( rep(x, rpois(length(x), lambda=1)) ))
  quantile(bs, c(0.025, 0.5, 0.975))
}
LMH.normal <- function(x) {
  mu = mean(x); stderr = sqrt(var(x)/length(x))
  c(mu - 1.96*stderr, mu, mu + 1.96*stderr)
}
n = 100*1000; k = 10*1000; x = rexp(n,1); s = median
print(quantile(replicate(k, s(rexp(n,1))), c(0.025,0.5,0.975)))
# 0.6869930 0.6931360 0.6994282
print(LMH.normal(x))   # built around mean(x), so it brackets the mean, not the median
# 0.9964869 1.0026975 1.0089082
print(BS(x,s,k))
# 0.6899971 0.6967001 0.7029585
print(BS.pois(x,s,k))
# 0.6899953 0.6967119 0.7029711
print(s(x))
# 0.6966489
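As a cross-check on the first interval printed above, the sampling distribution of the median can be approximated analytically. This uses the standard asymptotic result for sample quantiles and is not taken from the slides; here m denotes the true median of Exp(1) and f its density.

```latex
\[
m = \ln 2 \approx 0.6931, \qquad
f(m) = e^{-m} = \tfrac{1}{2}, \qquad
\operatorname{sd}(\hat m) \approx \frac{1}{2 f(m) \sqrt{n}} = \frac{1}{\sqrt{n}} \approx 0.00316
\quad (n = 10^5)
\]
\[
m \pm 1.96 \, \operatorname{sd}(\hat m) \approx [\,0.6869,\; 0.6993\,]
\]
```

This agrees with the simulated interval 0.6869930 to 0.6994282 to within about 1e-4. Both bootstrap intervals have nearly the same width but are centered on the sample median 0.6966489, as expected when resampling a single data set.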
150 GB data (n = 6M, d = 3000); 10 x 8 cores (60 GB) vs 20 x 4 cores (240 GB)
$b = n^{0.7}$ subsamples → r = 50 weighted (sum = n) resamplings
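The subsample-then-reweight recipe above matches the Bag of Little Bootstraps scheme; the slide does not name the method, so treat that identification as an assumption. A minimal sketch for the mean: draw a subsample of b = n^0.7 distinct points, then r times assign multinomial weights that sum to n and evaluate the weighted statistic. The number of subsamples `s`, the data, and all function names are illustrative.

```python
import random
from collections import Counter


def weighted_mean(points, counts, total):
    """Mean of the points where point i carries integer weight counts[i]."""
    return sum(points[i] * c for i, c in counts.items()) / total


def blb_mean_interval(x, rng, gamma=0.7, s=5, r=50):
    """Average the per-subsample 95% percentile intervals for the mean."""
    n = len(x)
    b = int(n ** gamma)                      # subsample size b = n^gamma
    lows, highs = [], []
    for _ in range(s):
        subsample = rng.sample(x, b)         # b distinct points, no replacement
        estimates = []
        for _ in range(r):
            # Multinomial weights over the b points; they sum to n, not b.
            counts = Counter(rng.choices(range(b), k=n))
            estimates.append(weighted_mean(subsample, counts, n))
        estimates.sort()
        lows.append(estimates[round(0.025 * (r - 1))])
        highs.append(estimates[round(0.975 * (r - 1))])
    return sum(lows) / s, sum(highs) / s, b


rng = random.Random(0)
x = [rng.random() for _ in range(5000)]      # assumed toy data, uniform on [0, 1)
low, high, b = blb_mean_interval(x, rng)
print(f"b = {b}, interval = [{low:.4f}, {high:.4f}]")
```

Each resample touches only b distinct points, so the working set per node stays small even though the weights represent a full sample of size n.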
[finite diffs: 2 loss evals / iter]
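One well-known optimizer that needs exactly two loss evaluations per iteration, regardless of dimension, is simultaneous perturbation stochastic approximation (SPSA). The slide does not name its method, so SPSA is used here purely as an assumed illustration. The constant gains `a` and `c`, the quadratic loss, and `target` are hypothetical choices for a noise-free toy problem.

```python
import random


def spsa_minimize(loss, theta, steps=200, a=0.1, c=0.01, seed=0):
    """Minimize loss with SPSA: one random +/-1 direction, two loss calls per step."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(steps):
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c * d for t, d in zip(theta, delta)]
        minus = [t - c * d for t, d in zip(theta, delta)]
        # The same two evaluations give the estimate for every coordinate.
        diff = (loss(plus) - loss(minus)) / (2.0 * c)
        theta = [t - a * diff / d for t, d in zip(theta, delta)]
    return theta


target = [1.0, -2.0, 0.5]


def quadratic(theta):
    """Toy loss with its minimum at target."""
    return sum((t - g) ** 2 for t, g in zip(theta, target))


estimate = spsa_minimize(quadratic, [0.0, 0.0, 0.0])
print(estimate)
```

A coordinate-wise central difference would need 2d loss evaluations per step in d dimensions; the random simultaneous perturbation trades that cost for a noisier gradient estimate.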
Needs and Opportunities for ML Research Systems Issues: The Need for Speed
X = randn(1000, 1000*1000, 'single'); Q = randn(1000, 1000, 'single'); q = Q(:, 1);
recall ||x − q||^2 ≡ x'*x - 2*x'*q + q'*q
[BLAS3 SGEMM or not]
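The identity above is what lets a batch of distance computations be rewritten as one large matrix product plus precomputed norms. A small sketch that checks the expansion against the direct formula; points are stored as rows here (the MATLAB snippet stores them as columns), and all names are illustrative.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def sq_dists_direct(points, q):
    """Squared Euclidean distance from q to every point, computed term by term."""
    return [sum((a - b) ** 2 for a, b in zip(x, q)) for x in points]


def sq_dists_expanded(points, q, point_norms=None):
    """Same distances via x.x - 2 x.q + q.q; the x.x terms can be cached."""
    if point_norms is None:
        point_norms = [dot(x, x) for x in points]
    qq = dot(q, q)
    return [xx - 2 * dot(x, q) + qq for x, xx in zip(points, point_norms)]


points = [[0, 0], [3, 4], [1, 2]]
q = [1, 1]
direct = sq_dists_direct(points, q)
expanded = sq_dists_expanded(points, q)
print(direct, expanded)
```

With many queries the `dot(x, q)` terms form the product X'*Q, which is exactly the shape a BLAS3 SGEMM call handles. In single precision the expanded form can lose accuracy through cancellation when x and q are nearly equal.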
(3.5 GHz)(12 cores)(4 SSE flops/cycle) ≈ 168 GF/s; X'*Q = 1 TF
[6 secs @ peak]
newest: Sandy Bridge AVX vs SSE: 2x more (8 vs 4 floats per SIMD instruction)
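The back-of-envelope numbers above can be reproduced directly. The sketch counts one flop per multiply-add, which is how the slide's 1 TF figure for X'*Q works out; counting the multiply and the add separately would double both the work and the time. Variable names are illustrative.

```python
clock_hz = 3_500_000_000      # 3.5 GHz
cores = 12
sse_flops_per_cycle = 4       # 128-bit SSE: 4 single-precision lanes
avx_flops_per_cycle = 8       # 256-bit AVX: 8 single-precision lanes

peak_sse = clock_hz * cores * sse_flops_per_cycle
peak_avx = clock_hz * cores * avx_flops_per_cycle

# X is 1000 x 1,000,000 and Q is 1000 x 1000, so X'*Q needs d * N * nq multiply-adds.
d, n_points, n_queries = 1000, 1_000_000, 1000
work = d * n_points * n_queries

seconds_sse = work / peak_sse
seconds_avx = work / peak_avx

print(f"SSE peak: {peak_sse / 1e9:.0f} GF/s, X'*Q at peak: {seconds_sse:.2f} s")
print(f"AVX peak: {peak_avx / 1e9:.0f} GF/s, X'*Q at peak: {seconds_avx:.2f} s")
```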
dual SATA3 SSDs: ≈ 1 GB/s reads
fast Maximum Margin Matrix Factorization (Rennie & Srebro, ICML 2005)
"Debunking the 100X GPU vs. CPU Myth" (Lee et al. (Intel), ISCA 2010)
Needs and Opportunities for ML Research Model Compilation: Decoupling Train vs Test
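$\sum_{i=1}^{N} \alpha_i K(x, s_i), \quad 0 \le \alpha_i \le C, \; s_i \in X$
$\sum_{i=1}^{M} \beta_i K(x, z_i)$ for $M \ll N$
$\sum_{i=1}^{N} \alpha_i \phi(s_i)$
$\sum_{i=1}^{M} \beta_i \phi(z_i)$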
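Reduced-set approaches usually pick the β and z so that the short expansion stays close to the long one in feature space, and that distance can be computed with kernel evaluations only. The sketch below assumes that objective (the slide shows the two expansions but not the criterion) and an RBF kernel; all names and the toy data are illustrative.

```python
import math


def rbf(u, v, gamma=1.0):
    """RBF kernel exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def cross_term(coef_a, pts_a, coef_b, pts_b, kernel):
    """Sum over i, j of coef_a[i] * coef_b[j] * K(pts_a[i], pts_b[j])."""
    return sum(a * b * kernel(p, q)
               for a, p in zip(coef_a, pts_a)
               for b, q in zip(coef_b, pts_b))


def feature_space_sq_dist(alpha, S, beta, Z, kernel=rbf):
    """|| sum_i alpha_i phi(s_i) - sum_j beta_j phi(z_j) ||^2 via the kernel trick."""
    return (cross_term(alpha, S, alpha, S, kernel)
            - 2.0 * cross_term(alpha, S, beta, Z, kernel)
            + cross_term(beta, Z, beta, Z, kernel))


# Two coincident support vectors collapse exactly into one reduced-set vector.
exact = feature_space_sq_dist([0.3, 0.7], [[0.0], [0.0]], [1.0], [[0.0]])
# A reduced-set vector placed far from the support vector is a poor substitute.
poor = feature_space_sq_dist([1.0], [[0.0]], [1.0], [[10.0]])
print(exact, poor)
```

The cost is N^2 + NM + M^2 kernel evaluations, paid once at compile time, in exchange for M instead of N kernel evaluations per test point.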
$\sum_{i=1}^{N} \alpha_i K(S_i, x)$
(e.g. approx K's (PCA20), so cost $\ll O(dN)$, kd-tree, etc.)
$g_k(x) = \sum_{j=1}^{k} \alpha_j K(S_j, x)$, until $g_k(x)$ outside $[L_k, H_k]$
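The early-exit idea above can be sketched with one simple, conservative choice of thresholds. Assuming signed coefficients (labels folded into α), no bias term, and a kernel bounded in [0, 1], the remaining terms can shift the sum by at most the unused positive or negative coefficient mass, so L_k and H_k can be taken as minus those masses. These bounds and the ordering by |α| are assumptions for illustration, not the thresholds from the talk.

```python
import math


def rbf(u, v, gamma=1.0):
    """RBF kernel, always in (0, 1]."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def classify_incremental(alpha, S, x, kernel=rbf):
    """Return (label, k): the sign of the kernel sum and the number of terms used."""
    order = sorted(range(len(alpha)), key=lambda i: -abs(alpha[i]))
    pos_rest = sum(a for a in alpha if a > 0)   # positive mass not yet consumed
    neg_rest = sum(a for a in alpha if a < 0)   # negative mass not yet consumed
    g = 0.0
    for k, i in enumerate(order, start=1):
        g += alpha[i] * kernel(S[i], x)
        if alpha[i] > 0:
            pos_rest -= alpha[i]
        else:
            neg_rest -= alpha[i]
        if g > -neg_rest:      # g_k above H_k: the final sum must be positive
            return 1, k
        if g < -pos_rest:      # g_k below L_k: the final sum must be negative
            return -1, k
    return (1 if g >= 0 else -1), len(alpha)


alpha = [2.0, -0.1, 0.1, -0.1]
S = [[0.0], [5.0], [6.0], [7.0]]
label, used = classify_incremental(alpha, S, [0.0])
print(label, used)
```

For the query at 0.0 the first term already decides the label, so three of the four kernel evaluations are skipped. Queries near the decision boundary still fall back to the full sum.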
$\sum_{i=1}^{N} \alpha_i K(S_i, x)$
Conclusion / Open Issues
NIPS, 1997.
C.-T. Chu, S. K. Kim, Y.-A. Lin, Y. Yu, G. R. Bradski, A. Y. Ng, and K. Olukotun. Map-reduce for machine learning on multicore. In NIPS, 2006.
NIPS, 1995.
machine classification via distance geometry. In ICML, 2002.
incremental approximate nearest support vectors. In ICML, 2003.
46:161–190, 2002.
algorithms for constructing approximate matrix decompositions. SIAM Rev., Survey and Review section, 53(2):217–288, June 2011.
2012.
for using hadoop on a cluster. In 1st International Workshop on Hot Topics in Cloud Data Processing (HotCDP 2012), ACM, 2012.