Machine Learning Research at eBay: An Industry Lab Perspective - - PowerPoint PPT Presentation



SLIDE 1

Machine Learning Research at eBay: An Industry Lab Perspective

Dennis DeCoste eBay Research Labs

ML Summer School @ UC Santa Cruz, July 13, 2012

Dennis DeCoste (eBay Research Labs) 1 / 66

SLIDE 2

Dennis DeCoste @ research labs

NASA / Caltech JPL, principal computer scientist
Yahoo! Research, founding Director, Machine Learning
Microsoft Live Labs, principal research scientist
Facebook, research scientist
eBay Research Labs, Director of Machine Learning

SLIDE 3

Outline

1. ML at eBay: Data and Apps

2. Needs and Opportunities for ML Research
   Randomized Algorithms: Scalable Large-Scale
   Systems Issues: The Need for Speed
   Model Compilation: Decoupling Train vs Test

3. Conclusion / Open Issues

SLIDE 4

ML at eBay: Data and Apps

“ML Apps” vs “Applied Research”

eBay product groups: "applied scientists"; driven by quarterly roadmaps
search, recommendation, catalog, fraud detection, ...

eRL focus: "what if" & foundational ML research

data: scaling to massive data (e.g. behavioral logs)
algos: stream, sample, randomize, mini-batch, ...
systems issues: exploit multi-cores, GPU, Hadoop, ...
methodology: decouple accurate training from cheap running

SLIDE 5

Data at eBay

users: behavior logs (query,click,view,wish,buy,...)

≈ 100 million users (buyers and sellers)

items: text (title,description,...), prices, images, ...

≈ 10 million items listed/day
billions of historic item listings
long tail: many have no product SKU id (e.g. antiques)
variety: auctions / Buy It Now, new/used, ...

SLIDE 6

Text Data: Project Origami

LDA, sentiment analysis (e.g. product/seller reviews), NLP, ...
information extraction (e.g. product properties)
item classification into the large catalog taxonomy

SLIDE 7

Example Product Descriptions

study: billions of descriptions (≈ 1 year)

SLIDE 8

Unsupervised Property Extraction

(Rohanimanesh, Mauge, Ruvini (eRL), ACL 2012)

big data + some structured sellers → simple heuristics suffice

popularity of a name/value pair ≡ # sellers using it
KN = known name (pattern 1 finds new names)
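This popularity heuristic amounts to counting distinct sellers per name/value pair; a sketch with invented data (the listings, names, and values below are purely illustrative):

```python
from collections import Counter

# hypothetical extracted triples: (seller_id, property_name, value)
listings = [
    (1, "brand", "nike"), (2, "brand", "nike"), (3, "brand", "nike"),
    (1, "color", "red"), (2, "colour", "red"),
]

# popularity of a name/value pair = number of distinct sellers using it
popularity = Counter((name, val) for _, name, val in set(listings))
print(popularity[("brand", "nike")])   # → 3
```

A pair many distinct sellers use ("brand"/"nike") is a trustworthy property; rare pairs ("colour"/"red") are candidates for synonym merging or noise.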

SLIDE 9

Example Discovered Properties (5 Cats)

SLIDE 10

Property Synonym Discovery

logistic regression with features: name string edit distance, common values, co-occurrence (anti!), ...
train set: hand-clustered discovered property names

SLIDE 11

Example Discovered Property Synonyms

precision = 91.8%, recall = 51%

(lack of value overlap: need clean/normalize)

SLIDE 12

Large eBay Taxonomy of Categories

SLIDE 13

Cats: Levels / Sizes, Skew

SLIDE 14

Cats: Two-Stage ML Approach

item titles: millions of unigrams

SLIDE 15

Cats: Grouping Algorithm

SLIDE 16

Cats: Example Group Result

SLIDE 17

Cats: Some Results

hier-ebay-struc: size-balanced grouping of the existing 6-level tree
> 20,000 classes and 83 million training examples

SLIDE 18

ML Apps at eBay: Search

SLIDE 19

eBay Search: Some ML Challenges

query text → rank product item listings

e.g. pairwise RankSVM; but whack-a-mole: fixes can lower (unseen) items

blending with deterministic user orderings ("Time Ending Soonest", cheapest first, ...)
correlating offline metrics to A/B tests
temporal data mining in Hadoop (Mobius)

SLIDE 20

“Null Search”: Zero-Recall Queries

(Singh, Parikh, Sundaresan (eRL), WWW 2012)

unique eBay challenges:
100 million queries / day, significant nulls
dynamic inventory, e.g. "warren buffet lunch"
seller / buyer vocab mismatch, e.g. "universal" vs "size 5" clock key

SLIDE 21

Null Search: Common Query Attributes

SLIDE 22

Null Search: Stats (Top-K, Historic Overlap)

billions of products in history, 10 million listed / day
past month's items overlap 30% of today's null queries
coverage 3x @ 10% (non-null vs null queries)

SLIDE 23

Null Search: Algo/Example

SLIDE 24

Null Search: Taxa Inferred per Search

SLIDE 25

Image Search

SLIDE 26

ML Apps at eBay: Merch

product merchandise recommendation

e.g. large-scale sparse user/item SVD (100 million by billions)

extra challenges: dynamic listings; items ≠ products

SLIDE 27

Needs and Opportunities for ML Research / Randomized Algorithms: Scalable Large-Scale

Outline

1. ML at eBay: Data and Apps

2. Needs and Opportunities for ML Research
   Randomized Algorithms: Scalable Large-Scale
   Systems Issues: The Need for Speed
   Model Compilation: Decoupling Train vs Test

3. Conclusion / Open Issues

SLIDE 28

Randomized Large-Scale SVD

function [U, S, V] = rsvd(A, k)
  % randomized SVD: A ≈ (Q*Q')*A for an orthonormal range basis Q
  [m, n] = size(A);  nz = nnz(A);
  P = randn(n, k + 5);                % small oversampling
  Y = full(A * P);                    % O(nz k)
  [Q, R] = qr(Y, 0);                  % O(m k^2)
  B = full(Q' * A);                   % O(k nz)
  [Uhat, S, V] = svd(B, 'econ');      % O(n k^2)
  U = Q * Uhat;                       % O(m k^2)
  U = U(:, 1:k); S = S(1:k, 1:k); V = V(:, 1:k);

total: O(nz k + (m + n) k^2), vs O(m n k) for MATLAB's [U,S,V] = svds(A, k)   [multicore, GPU, 2-pass]
See: "Finding structure with randomness", Halko, Martinsson, Tropp. SIAM Review, 2011.

SLIDE 29

RSVD vs SVD

U = randn(20*1000, 100); V = randn(20*1000, 100); A = U*V';
[i, j, v] = find(A);                                        % rows, cols, values
[~, ids] = sort(v);
ids = [ids(1:length(v)/200); ids(end-length(v)/200:end)];   % keep the ~1% most extreme entries
A = sparse(i(ids), j(ids), v(ids));
k = 100;                                                    % pick a rank, e.g. 100
tic; [Ur, Sr, Vr] = rsvd(A, k); toc
tic; [U0, S0, V0] = svds(A, k); toc

also useful for seeding, even if accuracy not sufficient

SLIDE 30

Reservoir Sampling

UNIX: head -1000 < rows.txt > subset.txt
want: samp -1000 < rows.txt > subset.txt

// reservoir sampling: uniform k-subset of a stream of unknown length
#include <cstdlib>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

void reservoir_sample(int k) {
  long n = 0;                      // items seen so far
  vector<string> R(k);
  string s;
  while (cin >> s) {
    if (n < k) {
      R[n] = s;
    } else {
      long i = rand() % (n + 1);   // uniform over [0, n]: note (n+1), not n
      if (i < k) R[i] = s;         // current item survives with prob k/(n+1)
    }
    n++;
  }
  if (n < k) k = n;
  for (int i = 0; i < k; i++) cout << R[i] << endl;
}

apps: streams, map-reduce, simulators
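The same algorithm as a Python sketch (the function name and seeded RNG are our choices); the key point is that the index is drawn uniformly over the n+1 items seen so far:

```python
import random

def reservoir_sample(stream, k, rng=random.Random(0)):
    """Keep a uniform random k-subset of a stream of unknown length."""
    R = []
    for n, item in enumerate(stream):   # n = number of items seen before this one
        if n < k:
            R.append(item)
        else:
            i = rng.randrange(n + 1)    # uniform over [0, n]
            if i < k:                   # current item survives with prob k/(n+1)
                R[i] = item
    return R

print(reservoir_sample(range(1000), 5))
```

Only one pass and O(k) memory, so it drops directly into a mapper or a log-stream consumer.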

SLIDE 31

Bootstrap Sampling

traditional bootstrap

sample with replacement (k times do: n → n)
per each of k samples: n times, pick from the n examples
e.g. k trainings → ensemble (e.g. random forest)
e.g. internet log aggregation: sums (clicks, revenue), ...
issue: sample by entry, or by user, or ...
confidence intervals (don't assume a normal distribution)

online bootstrap

popularized in ML by (Oza & Russell, AISTATS 2001)
per example: determine k counts, one per bootstrap sample
for each sample: c_i = rpois(λ = 1); Σ_{i=1}^n c_i ≈ n

good for: streaming, n unknown (e.g. user sampling)
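A stdlib-only Python sketch of this online bootstrap (the function names, the Knuth Poisson sampler, and the choice of k=50 replicates are ours, not from the slides): each arriving example joins replicate j with weight c ~ Poisson(1).

```python
import math
import random

def poisson1(rng):
    """Draw from Poisson(lambda=1) via Knuth's multiplication method."""
    L, k, p = math.exp(-1.0), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def online_bootstrap_means(stream, k, rng=random.Random(0)):
    """Maintain k bootstrap replicates of the mean in a single pass."""
    sums, counts = [0.0] * k, [0] * k
    for x in stream:
        for j in range(k):
            c = poisson1(rng)          # example appears c times in replicate j
            sums[j] += c * x
            counts[j] += c
    return sorted(s / n for s, n in zip(sums, counts))

data_rng = random.Random(1)
reps = online_bootstrap_means((data_rng.random() for _ in range(2000)), 50)
print(reps[1], reps[-2])   # rough 95% interval around the true mean 0.5
```

Nothing needs to know n in advance, which is exactly why this fits streams and user-level sampling.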

SLIDE 32

Bootstrap Example

BS = function(x, stat, k) {
  bs = replicate(k, stat(sample(x, replace=T)))
  quantile(bs, c(0.025, 0.5, 0.975))
}
BS.pois = function(x, stat, k) {
  bs = replicate(k, stat(rep(x, rpois(length(x), lambda=1))))
  quantile(bs, c(0.025, 0.5, 0.975))
}
LMH.normal <- function(x) {
  mu = mean(x); stderr = sqrt(var(x) / length(x))
  c(mu - 1.96*stderr, mu, mu + 1.96*stderr)
}
n = 100*1000; k = 10*1000; x = runif(n, 0, 1); s = mean
print(quantile(replicate(k, s(runif(n, 0, 1))), c(0.025, 0.5, 0.975)))
# 0.4982299 0.5000069 0.5017628   (true sampling distribution of the mean)
print(LMH.normal(x))
# 0.4978307 0.4996247 0.5014186
print(BS(x, s, k))
# 0.4978188 0.4996341 0.5014037
print(BS.pois(x, s, k))
# 0.4978936 0.4996468 0.5014059

SLIDE 33

Bootstrap Example 2

# BS, BS.pois, LMH.normal defined as on the previous slide
n = 100*1000; k = 10*1000; x = rexp(n, 1); s = median
print(quantile(replicate(k, s(rexp(n, 1))), c(0.025, 0.5, 0.975)))
# 0.6869930 0.6931360 0.6994282   (true sampling distribution of the median)
print(LMH.normal(x))
# 0.9964869 1.0026975 1.0089082   (normal CI around the mean -- wrong statistic!)
print(BS(x, s, k))
# 0.6899971 0.6967001 0.7029585
print(BS.pois(x, s, k))
# 0.6899953 0.6967119 0.7029711
print(s(x))
# 0.6966489

SLIDE 34

Bag of Little Bootstraps (BLB)

(Kleiner,Talwalkar,Sarkar,Jordan, ICML 2012)

150GB data (n=6M,d=3000); 10x8 cores (60GB) vs 20x4 cores (240GB)

Poisson BOOT vs BLB; s = 5 times:

b = n^0.7 subsamples → r = 50 weighted (sum = n) resamplings
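The BLB recipe can be sketched in a few lines of Python; this toy version uses the mean as the statistic, smaller s and r than the slide's s = 5, r = 50, and a naive O(n) multinomial draw (all our simplifications for brevity):

```python
import random
from statistics import mean

def blb_interval(x, s=3, r=20, rng=random.Random(0)):
    """Bag of Little Bootstraps sketch: s subsamples of size b = n**0.7;
    each is resampled r times with multinomial counts summing back to n."""
    n = len(x)
    b = int(n ** 0.7)
    lows, highs = [], []
    for _ in range(s):
        sub = rng.sample(x, b)
        stats = []
        for _ in range(r):
            counts = [0] * b
            for _ in range(n):              # naive multinomial(n, 1/b) draw
                counts[rng.randrange(b)] += 1
            stats.append(sum(c * v for c, v in zip(counts, sub)) / n)
        stats.sort()
        lows.append(stats[0])
        highs.append(stats[-1])
    return mean(lows), mean(highs)          # average the interval endpoints

data_rng = random.Random(1)
x = [data_rng.random() for _ in range(2000)]
lo, hi = blb_interval(x)
print(lo, hi)
```

The win is that each worker only ever materializes b ≪ n distinct points; the resampling weights, not copies of the data, carry the full-n scale.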

SLIDE 35

Many Optimization Tricks

growing bag of ML tricks:
kernel trick
hashing trick
random projection trick
...

simultaneous perturbation stochastic approximation

http://www.jhuapl.edu/spsa/

[finite diffs: 2 loss evals / iter]

see Mark Schmidt's MATLAB minFunc:
http://www.di.ens.fr/~mschmidt/Software/minFunc.html
Hessian-free, finite differences using complex steps, ...

...
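As one concrete instance from this bag of tricks, a minimal sketch of the hashing trick (the bucket count and CRC32-based signed hashing are illustrative choices, not from the slides):

```python
import zlib
from collections import defaultdict

def hash_features(tokens, n_buckets=2**10):
    """Hashing trick: fold token counts into a fixed-size sparse vector,
    with a sign bit so collisions cancel in expectation."""
    v = defaultdict(float)
    for t in tokens:
        h = zlib.crc32(t.encode())
        v[h % n_buckets] += 1.0 if (h >> 31) == 0 else -1.0
    return dict(v)

print(hash_features("new ipod nano 16gb blue".split()))
```

No vocabulary dictionary to build or ship, which matters when item titles contribute millions of unigrams.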

SLIDE 36

Needs and Opportunities for ML Research / Systems Issues: The Need for Speed

Outline

1. ML at eBay: Data and Apps

2. Needs and Opportunities for ML Research
   Randomized Algorithms: Scalable Large-Scale
   Systems Issues: The Need for Speed
   Model Compilation: Decoupling Train vs Test

3. Conclusion / Open Issues

SLIDE 37

ML Bottlenecks

most cases: dot products / distance computations
||x − w||^2 ≡ x'x − 2x'w + w'w

numbers every ML practitioner should know:
streaming / sequential disk speed = X MB/s ?
each CPU core = Y GFlops/s ?
sequential scan of RAM = Z GB per second ?
hint: large-scale usually (should) not be I/O bound?

SLIDE 38

ML Systems Issues: Why Care?

answers: "is the code for this algo already near-optimal?"
research needs speed more than production does?!

"code not even optimized yet" ≡ not studied at scale
real-time vs training (e.g. playback a year of data)
$$$: like 10x more machine ($1M buck → $10M bang)
relevance: research today, using tomorrow's machines
prudence: distributed computing ≠ silver bullet
... more (vs faster): "train(X)" = 1 op in a larger program

resource allocation: a model is 1 of many @ a company
e.g. faster training frees the shared, finite cluster for others

SLIDE 39

Motivation: Better ML Metrics

common paper claim: "our new X is competitive with SVMs", etc.
needed shift: accuracy → accuracy per training sec

opportunity: different methods at different stages
overheads and constants matter

especially for fast linear online methods ...

SLIDE 40

Quiz Time: Nearest Neighbors

X = randn(1000, 1000*1000, 'single'); Q = randn(1000, 1000, 'single'); q = Q(:, 1);

recall: ||x − q||^2 ≡ x'*x − 2*x'*q + q'*q

tic; D = X'*Q; toc, tic; d = X'*q; toc     % BLAS3 SGEMM or not

Q: speeds on 12-core 3.5Gz CPU?

8.13 s vs 0.138 s → 59x longer ... 1000/59 = 17x faster per query (cores + cache)

(3.5 GHz)(12 cores)(4 SSE flops) ≈ 168 GFlop/s; X'*Q = 1 TFlop [≈ 6 s at peak]

newest: Sandy Bridge AVX vs SSE: 2x more (8 vs 4 floats per instruction)

Q: speeds on a single core?

72.3 s vs 0.387 s → 187x longer ... 1000/187 = 5.3x faster per query (cache)

Q: speeds on GPU (e.g. Nvidia GTX 690)?

5.5 TFlop/s peak (vs 168 GFlop/s for the 12-core CPU @ 3.5 GHz) → > 20x faster

SLIDE 44

Quiz Time

X = ones(1024*1024*1024, 1); tic; sum(X); toc
→ 0.33 secs ≈ 24 GB/s

time cat file1GB.txt > /dev/null
→ > 100 MB/s reads

dual SATA3 SSDs: ≈ 1 GB/s reads

trick: train many models on one pass over the data stream (e.g. model selection, ensembles, ...)
moral: amortize cache and I/O costs
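The "scan Z GB of RAM per second" number can be measured directly; a pure-Python sketch (expect far less than the ≈ 24 GB/s MATLAB figure above, since here the interpreter, not memory bandwidth, is the bottleneck):

```python
import array
import time

# ~40 MB of doubles, all 1.0
X = array.array('d', [1.0]) * (5 * 1000 * 1000)
t0 = time.time()
total = sum(X)           # one sequential scan
dt = time.time() - t0
print(f"{X.itemsize * len(X) / dt / 1e9:.2f} GB/s effective scan rate")
```

The gap between this number and the hardware's bandwidth is exactly the "is the code already near-optimal?" question from the earlier slide.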

SLIDE 45

Example: MMMF Ensembles

give back speed → more accuracy (vs less time)

fast Maximum Margin Matrix Factorization (Rennie & Srebro, ICML 2005)

MMMF Ensembles (DeCoste, ICML 2006), each ≈ 20x faster

later: ensembles dominated Netflix Prize

SLIDE 46

MMMF Ensembles for MovieLens

SLIDE 47

GPU

e.g. MBP Retina: GT 650M ≈ 500GF (vs 10GF/CPU)

"Debunking the 100x GPU vs CPU Myth", (Lee et al (Intel), ISCA 2010)

OpenCL and Nvidia CUDA ...
... but for ML, first learn/use the CUBLAS API!
MATLAB Jacket (and GPUmat) insufficient

MAGMA BLAS/LAPACK: http://icl.cs.utk.edu/magma/ (also CULA LAPACK: www.culatools.com)

SLIDE 48

Hadoop / Map-Reduce

“Map-reduce for ML on multicore”, (Chu et al, NIPS 2006)

Apache Mahout? "compute @ data"!?
best: 15 MB/s; avg: 5 MB/s
dirty secret: more about scalable storage than compute

"Nobody ever got fired for using Hadoop on a cluster", (Rowstron et al (MSR), HotCDP 2012)

suboptimal for: iterative ML
good use: "select-join" → train set (e.g. Hive SQL) ... reducer → train @ a 40-core, 2-GPU, 1TB-RAM box

(compressed, column-stored training data often fits RAM!)

SLIDE 49

Needs and Opportunities for ML Research / Model Compilation: Decoupling Train vs Test

Outline

1. ML at eBay: Data and Apps

2. Needs and Opportunities for ML Research
   Randomized Algorithms: Scalable Large-Scale
   Systems Issues: The Need for Speed
   Model Compilation: Decoupling Train vs Test

3. Conclusion / Open Issues

SLIDE 50

Model Compilation: Why?

common scenario:

ML product team: "We just train Naive Bayes models – that's all we can afford to run/update in production"
"Ensembles, or 1000-hidden-unit neural nets? No way!"
need: existence proofs (e.g. Netflix Prize)

decoupling training model from execution model:

train-time: find existence proof M1 (e.g. noise, outliers)
compile-time: find fast run-time M2, guided by M1

avoids premature optimization bias on run-time; frees ML researchers to focus on what’s possible

SLIDE 51

Early Approach: TREPAN

(Craven and Shavlik, NIPS 1995)
goal: accurate neural net → explainable rules
instance of the Oracle Trick:

train M1 (neural net) on labeled data X1, L1
run M1 on X2 = X1 ∪ unlabeled ∪ phantom data
treat M1's outputs on X2 as new labels L2 (denoises L1)
train the new target model M2 on the expanded X2, L2

intuition: enough data → M2 mimics M1 well
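A toy end-to-end sketch of the Oracle Trick (everything here is invented for illustration: the teacher M1 is a 5-NN classifier on noisy 1-D labels, and the compiled target M2 is a single threshold):

```python
import random

rng = random.Random(0)

# train-time: teacher M1 = 5-NN on noisily labeled 1-D data (true concept: x > 0.5)
X1 = [rng.random() for _ in range(200)]
L1 = [(x > 0.5) ^ (rng.random() < 0.1) for x in X1]   # 10% label noise

def m1(q, k=5):
    idx = sorted(range(len(X1)), key=lambda i: abs(X1[i] - q))[:k]
    return sum(L1[i] for i in idx) > k / 2

# oracle trick: relabel a dense unlabeled pool with M1's (denoised) outputs
X2 = [i / 1000 for i in range(1000)]
L2 = [m1(x) for x in X2]

# compile-time: fit the cheap run-time model M2 (a single threshold) to (X2, L2)
def fit_threshold(X, L):
    return min((sum((x > t) != l for x, l in zip(X, L)), t) for t in X)[1]

theta = fit_threshold(X2, L2)
print(theta)
```

In this toy run M2's threshold lands near the true boundary despite the 10% label noise M1 trained on, while costing one comparison per query instead of a 200-point nearest-neighbor search.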

SLIDE 52

Model Compilation

step 1: learn the most accurate model possible (e.g. huge ensemble; don't worry about run-time cost)
step 2: "compile" it into a simpler model
goal: smaller/faster, retaining accuracy/robustness
"cross-compilation": e.g. train random forest → ship neural net
e.g. use expensive features only to denoise training

SLIDE 53

Compression of Large Ensembles

(Caruana et al, ICML 2004), forward selection

SLIDE 54

Early Approach: SVM → Reduced Sets

e.g. (Burges and Schölkopf, NIPS 1997)

SVM: f(x) = Σ_{i=1}^N α_i K(x, s_i), 0 ≤ α_i ≤ C, s_i ∈ X   (the SVs)

reduced set: g(x) = Σ_{i=1}^M β_i K(x, z_i) for M ≪ N

(hard, slow) global optimization: min_{β_i, z_i} ρ = ||W − V||, where
W = Σ_{i=1}^N α_i φ(s_i) and V = Σ_{i=1}^M β_i φ(z_i)

same cost for each query – even easier ones

SLIDE 55

Example: MNIST (d=784, 60k train, 10k test)

Virtual SVs (DeCoste and Scholkopf, MLJ 2002)

SLIDE 56

Nearest Support Vectors (NSV)

(DeCoste and Mazzoni, ICML 2003)

SVM: f(x) = Σ_{i=1}^N α_i K(S_i, x)

for a given query x:
NNscore_i(x) ≡ |α_i| K̂(S_i, x)
sort the (α_i, S_i) pairs, largest NNscore_i(x) first
(e.g. approximate kernels K̂ (PCA-20), so cost ≪ O(dN); kd-trees, etc.)
g_k(x) = Σ_{j=1}^k α_j K(S_j, x), stopping once g_k(x) falls outside [L_k, H_k]

example:
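A minimal Python sketch of this early-exit evaluation loop (the α's, kernel values, and L/H thresholds below are invented toy numbers):

```python
def nsv_predict(terms, L, H):
    """Early-exit kernel expansion.
    terms: (alpha_i, K(S_i, x)) pairs, sorted by decreasing |alpha_i| * Khat;
    L[k], H[k]: learned partial-sum thresholds.
    Returns (predicted sign, number of kernel terms actually evaluated)."""
    g = 0.0
    for k, (a, kv) in enumerate(terms):
        g += a * kv
        if g > H[k]:
            return +1, k + 1   # already confidently positive: stop early
        if g < L[k]:
            return -1, k + 1   # already confidently negative: stop early
    return (1 if g > 0 else -1), len(terms)

# hypothetical easy query: the first (largest) term already clears H[0]
terms = [(2.0, 1.0), (1.0, 0.5), (-0.5, 0.9), (0.25, 0.1)]
L_th = [-1.5, -1.0, -0.5, 0.0]
H_th = [1.5, 1.0, 0.5, 0.0]
print(nsv_predict(terms, L_th, H_th))   # → (1, 1)
```

Easy queries exit after one or two terms; only hard queries near the decision boundary pay for the full expansion.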

SLIDE 57

Example: 3 vs 8 Queries

SLIDE 58

NSV: Learning Thresholds

run the NSV algo over a representative pre-query sample:
massive training / unlabelled set
generate phantoms (e.g. munge, convex hull, etc.)

H_k = max g_k(x) s.t. g_k(x) > 0 yet f(x) < 0
L_k = min g_k(x) s.t. g_k(x) < 0 yet f(x) > 0
"tug of war" shuffling for imbalanced data
windowed outward smoothing, e.g. L_k = min over window k-w to k+w

SLIDE 59

Linear Filtering

linear models often suffice (e.g. 75-90% of queries)
before running NSV, filter the query with a linear SVM, using:

H_k = max g_k(x) s.t. f_LINEAR(x) > 0 yet f(x) < 0
L_k = min g_k(x) s.t. f_LINEAR(x) < 0 yet f(x) > 0

again, smooth outwards to account for noise/extrapolation
gave an additional 3-4x speedup

SLIDE 60

Some Pairwise Results (MNIST)

SLIDE 61

NSV Pairwise Speedups (MNIST, mean=111)

SLIDE 62

NSV Disagreements (MNIST)

SLIDE 63

Kernel Machine Exact Output Bounds

(DeCoste, ICML 2002); slower (e.g. MNIST: 5 vs 100) (computational geometry / incomplete Cholesky)

SLIDE 64

Proportionality to Query Difficulty

SVM: discriminative ≪ generative model
NSV: sign(f(x)) ≪ exact f(x) = Σ_{i=1}^M α_i K(S_i, x)

SLIDE 65

New Open Issue in ML

kernel machines & ensembles revolutionized ML (SVM, boosting, random forests of DTs, ...)
via model compilation, we can get both accuracy and speed
... but it is a "procedural hack"
challenge: get both without the "intermediate bulge"?

SLIDE 66

Conclusion / Open Issues

Some Open Issues / Needs

compile M1 → M2 vs directly learning M2 (deep learning?)
refocus on better metrics:

test accuracy per training sec
accuracy mean vs robustness, ...

systems issues critical

enable study @ scale (including brute-force baselines, e.g. kNN)

build on recent success with online/SGD/linear:

adaptive vs fixed learning-rate schedules
adaptive mini-batch sizes
randomized algos? ...

SLIDE 67

Acknowledgments

eRL team:

Neel Sundaresan, Nish Parikh, Gyanit Singh, Badrul Sarwar, Kamal Jain,
Eric Brill (VP of Research at eBay), C.J. Lin (on sabbatical at eRL), ...

Search team:

David Goldberg, Mike Mathieson, Dan Fain, Hugh Williams

labs.ebay.com

SLIDE 68

References I

C. Burges and B. Schölkopf. Improving the accuracy and speed of support vector machines. In NIPS, 1997.

R. Caruana, A. Niculescu, G. Crew, and A. Ksikes. Ensemble selection from libraries of models. In ICML, 2004.

C.-T. Chu, S. K. Kim, Y.-A. Lin, Y. Yu, G. R. Bradski, A. Y. Ng, and K. Olukotun. Map-reduce for machine learning on multicore. In NIPS, 2006.

M. Craven and J. Shavlik. Extracting tree-structured representations of trained networks. In NIPS, 1995.

D. DeCoste. Anytime interval-valued outputs for kernel machines: Fast support vector machine classification via distance geometry. In ICML, 2002.

D. DeCoste. Collaborative prediction using ensembles of maximum margin matrix factorizations. In ICML, 2006.

D. DeCoste and D. Mazzoni. Fast query-optimized kernel machine classification via incremental approximate nearest support vectors. In ICML, 2003.

D. DeCoste and B. Schölkopf. Training invariant support vector machines. Machine Learning, 46:161-190, 2002.

SLIDE 69

References II

N. Halko, P. G. Martinsson, and J. A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 53(2):217-288, June 2011.

A. Kleiner, A. Talwalkar, P. Sarkar, and M. Jordan. The big data bootstrap. In ICML, 2012.

N. C. Oza and S. J. Russell. Online bagging and boosting. In AISTATS, 2001.

J. D. M. Rennie and N. Srebro. Fast maximum margin matrix factorization for collaborative prediction. In ICML, 2005.

K. Rohanimanesh, K. Mauge, and J.-D. Ruvini. Structuring e-commerce inventory. In ACL, 2012.

A. Rowstron, D. Narayanan, A. Donnelly, G. O'Shea, and A. Douglas. Nobody ever got fired for using Hadoop on a cluster. In HotCDP, 2012.

G. Singh, N. Parikh, and N. Sundaresan. Rewriting null e-commerce queries to recommend products. In WWW, 2012.
