 
Building a 1500-Class Listing Categorizer from Implicit User Feedback
Arnau Tibau - arnau.tibau@letgo.com
October 2019, Data Council Barcelona
Arnau, Luca, Julien, Philipp, Antoine
Outline
1. The problem of listing categorization
2. Building a listing categorization dataset
3. Building a listing taxonomy
4. Image-based listing categorization
5. Conclusions & next steps
A local two-sided marketplace, where sellers sell & buyers buy
[Figure: a search for “bike” connecting Arnau and Luca; locations: Hoboken, Jersey City, New York]
A correct categorization is key for a good buyer experience
[Diagram: categorization feeds moderation, the recommender system, search and the listing catalog]
● Search: filters
● Recommender system: related listings, ...
● Moderation: illegal content: No
● Listing catalog: Category: Sports > Bicycles > Road bike; Brand: Fuji; ...
Why don’t you just ask sellers for the category?
[Plot: % of users who complete a posting vs. # of required fields in the posting process]
Why don’t you just ask sellers for the category? (II)
● Adversarial sellers: Is this an “iphone”?
● Posting mistakes: This is not a “Car”
Problem statement
Input listing:
Title: Joovy twin Roo+ Stroller
Description: Best stroller for infant twins in my opinion. Fits two infant car seats side by side. You pick the [...]
Price: $50
→ Listing categorizer → Baby & Child > Strollers > Twin stroller
It’s just a classical “supervised learning” setup.
Challenges
● Ambiguity: A TV, a TV stand, or a TV + TV stand?
● Noisy data: Does this look like a “Cardigan”?
● Open world: the rise and demise of “Fidget spinners”
● Adversarial actors: This is not an “iphone”
To train a model we need data, and labeled data is expensive
1500 categories x 1000 samples/category = 1.5M labeled samples

Service | $ / annotation (estimate) | Total estimate*
AWS Ground Truth (internal workforce) | 0.02 (AWS) + 0.06 (labor) | $120k
AWS Ground Truth (Mechanical Turk) | 0.02 (AWS) + 0.02 (labor) | $60k
Google Data Labeling | ? (Google) + 0.025 (labor) | -

With a recommended replication factor of 3x, we’re talking $180k-$360k for a moderately sized, static dataset.
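The totals in the table follow from simple arithmetic. The sketch below reproduces them from the per-annotation estimates quoted on the slide; the Google Data Labeling row is omitted because its platform fee is unknown, and the dictionary keys are just labels chosen for this example.

```python
# Per-annotation price estimates from the table (USD): platform fee + labor.
PRICES = {
    "AWS Ground Truth (internal workforce)": 0.02 + 0.06,
    "AWS Ground Truth (Mechanical Turk)": 0.02 + 0.02,
}

N_CLASSES = 1500
SAMPLES_PER_CLASS = 1000
REPLICATION = 3  # each sample labeled independently by 3 annotators


def total_cost(price_per_annotation, replication=1):
    """Total labeling cost in USD for the whole dataset."""
    n_samples = N_CLASSES * SAMPLES_PER_CLASS  # 1.5M labeled samples
    return n_samples * price_per_annotation * replication


for name, price in PRICES.items():
    single = total_cost(price)
    replicated = total_cost(price, REPLICATION)
    print(f"{name}: ${single:,.0f} single pass, ${replicated:,.0f} with {REPLICATION}x replication")
```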
Implicit user feedback provides noisy labels
[Figure: a user’s search query, followed by the listing they contacted (the feedback)]
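Implicit user feedback provides noisy labels (II)
[Figure: listings annotated with the search queries that preceded a contact, with the number of users who agreed in parentheses, e.g. “road bike” (3), “road bike” (1), “cannondale” (4); these form the consensus label]
We have tens of millions of different search queries a month.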
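A minimal sketch of how search-then-contact events could be aggregated into consensus labels. It assumes a hypothetical event schema of `(user_id, search_query, listing_id)` tuples, light query normalization (lowercasing, collapsing whitespace), and a hypothetical `min_users` threshold; the talk does not specify the actual aggregation rule.

```python
from collections import defaultdict


def consensus_labels(events, min_users=2):
    """Map listing_id -> (label, n_users) from search-then-contact events.

    events: iterable of (user_id, search_query, listing_id) tuples (hypothetical schema).
    """
    # listing -> normalized query -> distinct users who searched it and then contacted the listing
    votes = defaultdict(lambda: defaultdict(set))
    for user_id, query, listing_id in events:
        normalized = " ".join(query.lower().split())
        votes[listing_id][normalized].add(user_id)

    labels = {}
    for listing_id, queries in votes.items():
        # Most distinct users wins; ties broken alphabetically so the result is deterministic.
        query, users = min(queries.items(), key=lambda kv: (-len(kv[1]), kv[0]))
        if len(users) >= min_users:  # hypothetical agreement threshold
            labels[listing_id] = (query, len(users))
    return labels


events = [
    ("u1", "road bike", "L1"), ("u2", "Road Bike", "L1"), ("u3", "road  bike", "L1"),
    ("u4", "cannondale", "L1"),
    ("u5", "sub", "L2"),
]
print(consensus_labels(events))
```

Requiring several distinct users to agree is one simple way to filter out one-off noise such as curiosity clicks; it does nothing against correlated noise, where many users agree on the wrong query.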
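Challenges: potential selection bias
[Diagram: product supply (posted listings) and product demand (search queries) overlap; the listings with implicit annotations lie in the overlap]
Potential problems ahead if P(class | annotated) ≠ P(class), i.e. if the annotated listings are not representative of all listings.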
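One way to check for this bias, sketched below under the assumption that some proxy class (for example, the seller-chosen category) is available for every listing, annotated or not: compare the two class distributions via per-class ratios and the total variation distance. The function names and toy data are hypothetical.

```python
from collections import Counter


def class_distribution(labels):
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def selection_bias_report(all_classes, annotated_mask):
    """Compare P(class | annotated) with P(class).

    all_classes: one proxy class per listing (assumption: e.g. the seller-chosen category).
    annotated_mask: parallel booleans, True if the listing received an implicit label.
    """
    p_all = class_distribution(all_classes)
    p_annotated = class_distribution([c for c, a in zip(all_classes, annotated_mask) if a])
    # Total variation distance: 0 = identical distributions, 1 = disjoint support.
    tvd = 0.5 * sum(abs(p_annotated.get(c, 0.0) - p) for c, p in p_all.items())
    # Ratio > 1: class over-represented among annotated listings; < 1: under-represented.
    ratios = {c: p_annotated.get(c, 0.0) / p for c, p in p_all.items()}
    return tvd, ratios


# Toy data: bikes are searched (and thus annotated) far more often than chairs.
classes = ["bikes"] * 50 + ["chairs"] * 50
annotated = [True] * 40 + [False] * 10 + [True] * 10 + [False] * 40
tvd, ratios = selection_bias_report(classes, annotated)
print(round(tvd, 3), ratios)
```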
Challenges: uncorrelated label noise
Example: query “sub” (for sub-woofer); contacted listing: “Sub-mariner action figure”
● “Uncorrelated” because the label error depends weakly on the listing attributes
● Sources of this type of label noise:
  ○ Curiosity
  ○ Mistakes
  ○ …
● “Easy” to deal with through outlier detection
Challenges: correlated label noise
Example: query “cannondale bike”; contacted listings: “cannondale shoes”, “cannondale mountain bike”
● “Correlated” because the label error depends strongly on the listing attributes
● Sources of this type of label noise:
  ○ Listing similarity
  ○ User interest correlation
  ○ Taxonomy errors
● Hard to deal with
What is a listing taxonomy?
[Figure: taxonomy excerpt]
● cars
● fashion & accessories
● furniture
  ○ armoire, bed, bed frame, board, cabinet, couch, chair, chest, desk, dining set, ...
  ○ chair: adirondack, bean bag, bench, camper chair, commode chair, cuddle chair, dining chair, feeding chair, gaming chair, glider chair, highchair, lounge chair, massage chair, office armchair, office chair, ottoman, papasan chair, parson chair, ...
● ...
The leaves of the taxonomy are the classes in our classifier… and the tree structure encodes the relationships between labels.
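A taxonomy maps naturally onto a nested dictionary. The sketch below encodes a simplified, partly hypothetical excerpt of the tree above (here `desk` and `cars` are treated as leaves for brevity) and enumerates the root-to-leaf paths, each of which would be one classifier class.

```python
# Simplified excerpt of the taxonomy; an empty dict marks a leaf (hypothetical structure).
TAXONOMY = {
    "furniture": {
        "chair": {"adirondack": {}, "gaming chair": {}, "office chair": {}},
        "desk": {},
    },
    "cars": {},
}


def leaf_paths(tree, prefix=()):
    """Yield the root-to-leaf path of every leaf; each leaf is one classifier class."""
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            yield from leaf_paths(children, path)
        else:
            yield path


classes = [" > ".join(p) for p in leaf_paths(TAXONOMY)]
print(classes)
```

Because each class keeps its full path, coarser labels (e.g. everything under `furniture > chair`) can be recovered by truncating the path.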
What makes a good marketplace taxonomy?
[Figure: supply vs. demand]
● Good coverage of both supply and demand
● Minimizes user confusion while offering sufficient granularity
Building a taxonomy is typically a manual process
[Figure: unsorted class labels (tool chest, mountain bike, vanity tops, chair, electric bike, road bike, sofa, wall cabinets, welsh cabinets, adirondack chairs, hybrid bike, bmx bike, baseball card, business card, calling card, card collection) grouped by hand, e.g. Bikes → road bike, hybrid bike, bmx bike, ….]
This is a very time-consuming process (O(K^2) pairwise comparisons for K class labels)... can we help speed it up?
A data-driven process to define large taxonomies
[Figure: a K x N binary matrix with one row per candidate class and one column per listing, e.g. road bike: 0 1 0 1 1 ..., mountain bike: 0 0 0 1 1 ..., adirondack chair: 1 0 0 0 0 ...]
First we map potential classes to a vector space (K << N).
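The matrix can be assembled from `(class_label, listing_id)` pairs. This sketch assumes each listing may be tagged with several candidate class names (for example, every query that led to a contact) and marks an entry with 1 if the pair occurs at least once; the function name and toy data are hypothetical, and at realistic sizes a sparse matrix (e.g. `scipy.sparse`) would be the natural container.

```python
def incidence_matrix(labeled_pairs):
    """Build a K x N binary matrix: rows are candidate classes, columns are listings.

    labeled_pairs: iterable of (class_label, listing_id) (hypothetical input format).
    Entry [k][n] is 1 if listing n was tagged with class k at least once.
    """
    pairs = list(labeled_pairs)  # iterated more than once below
    classes = sorted({c for c, _ in pairs})
    listings = sorted({l for _, l in pairs})
    row = {c: i for i, c in enumerate(classes)}
    col = {l: j for j, l in enumerate(listings)}
    matrix = [[0] * len(listings) for _ in classes]
    for c, l in pairs:
        matrix[row[c]][col[l]] = 1
    return classes, listings, matrix


pairs = [
    ("road bike", "l2"), ("road bike", "l4"), ("road bike", "l5"),
    ("mountain bike", "l4"), ("mountain bike", "l5"),
    ("adirondack chair", "l1"),
]
classes, listings, matrix = incidence_matrix(pairs)
for name, row in zip(classes, matrix):
    print(f"{name:>16}: {row}")
```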
A data-driven process to define large taxonomies (II)
[Figure: the K x N binary matrix is mapped by your favorite embedding algorithm (word2vec, NNMF, LSA, ...) to a K x d matrix of dense class vectors, e.g. 0.4 1.1 -3.4 0.1 1 ...]
We “embed” these vectors into a lower-dimensional space (d dimensions).
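LSA, one of the options listed above, amounts to a truncated SVD of the class-by-listing matrix. The sketch below uses NumPy on a toy 3 x 4 matrix (hypothetical data): classes that share listings end up with nearly parallel embeddings, while a class with no shared listings stays orthogonal. At production scale a sparse solver such as scikit-learn's `TruncatedSVD` would be the more practical choice.

```python
import numpy as np


def lsa_embed(matrix, d):
    """Embed each row (candidate class) of a K x N matrix into d dimensions via truncated SVD."""
    X = np.asarray(matrix, dtype=float)
    # Thin SVD: X = U @ diag(S) @ Vt, with singular values sorted in descending order.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    # Keeping the top-d components gives the best rank-d approximation of X.
    return U[:, :d] * S[:d]


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy K=3 x N=4 class-by-listing matrix (hypothetical data).
classes = ["adirondack chair", "mountain bike", "road bike"]
matrix = [[1, 0, 0, 0],
          [0, 0, 1, 1],
          [0, 1, 1, 1]]
emb = lsa_embed(matrix, d=2)
sim_bikes = cosine(emb[1], emb[2])
sim_chair_bike = cosine(emb[0], emb[2])
print(round(sim_bikes, 3), round(sim_chair_bike, 3))
```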
A data-driven process to define large taxonomies (III)
Finally, we cluster those class vectors and inspect them manually (C clusters instead of K classes).
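The talk does not say which clustering algorithm is used; k-means is one common choice. Below is a plain NumPy implementation of Lloyd's algorithm applied to hypothetical 2-d class embeddings, producing C = 2 clusters that a person could then review instead of the K individual classes.

```python
import numpy as np


def kmeans(X, c, n_iter=100, seed=0):
    """Lloyd's k-means: assign each of the K class vectors to one of c clusters."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=c, replace=False)]  # random distinct initial centers
    for _ in range(n_iter):
        # Assign each vector to its nearest center (squared Euclidean distance).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        # Move each center to the mean of its members; keep it in place if the cluster is empty.
        new_centers = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(c)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign


# Hypothetical class embeddings (e.g. the output of an embedding step like LSA).
names = ["road bike", "mountain bike", "bmx bike", "office chair", "gaming chair", "adirondack chair"]
emb = [[5.0, 0.1], [4.8, 0.3], [5.2, -0.1], [0.1, 4.9], [0.2, 5.1], [-0.1, 5.0]]
clusters = {}
for name, cluster in zip(names, kmeans(emb, c=2)):
    clusters.setdefault(int(cluster), []).append(name)
print(list(clusters.values()))
```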
Why image-based listing categorization?
● At first, we only asked for one image at posting time*
● Very few listings had a description
● Even now that we have more information (name, description, price), the picture and the other information should be coherent
A virtuous training & debugging cycle
Taxonomy definition → Dataset generation → Dataset pre-processing (de-duplication, outlier cleaning) → Training → Automatic evaluation / Manual evaluation → Report inspection → (repeat)
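The de-duplication step can be as simple as hashing image bytes. The sketch below assumes samples are held in memory as hypothetical `(image_bytes, label)` pairs and drops exact duplicates only; re-encoded or resized copies would need perceptual hashing or embedding similarity instead. Removing duplicates before splitting also keeps the same image from landing in both the training and evaluation sets, which would inflate the metrics.

```python
import hashlib


def deduplicate(samples):
    """Drop exact duplicate images, keeping the first occurrence.

    samples: list of (image_bytes, label) pairs (hypothetical in-memory format).
    """
    seen = set()
    unique = []
    for image_bytes, label in samples:
        digest = hashlib.sha256(image_bytes).hexdigest()  # identical bytes -> identical digest
        if digest not in seen:
            seen.add(digest)
            unique.append((image_bytes, label))
    return unique


samples = [(b"img-a", "road bike"), (b"img-b", "sofa"), (b"img-a", "road bike")]
unique = deduplicate(samples)
print(len(samples), "->", len(unique))
```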
Uncorrelated label noise cleaning via Isolation Forests
[Figures: outliers detected for the class “dslr camera”; distribution of anomaly scores for 3 different classes]
Reference: Liu, Ting and Zhou, “Isolation-based Anomaly Detection”, ACM 2012.
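To make the idea concrete, here is a minimal pure-Python sketch of isolation-forest scoring following Liu, Ting and Zhou: random axis-aligned splits isolate anomalies in fewer steps, so a short average path length maps to a score near 1. It assumes each listing is represented by a numeric feature vector (for instance an image embedding) and that the forest is fit separately per class; the embedding, the number of trees and any score cut-off are assumptions, not details from the talk. In practice a library implementation such as `sklearn.ensemble.IsolationForest` would be the usual choice.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful BST search over n points (the paper's normalizer)."""
    if n > 2:
        return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n
    if n == 2:
        return 1.0
    return 0.0


def build_tree(points, depth, max_depth, rng):
    """Split on a random feature at a random threshold until points are isolated or depth runs out."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # simplification: this feature cannot separate the points, so stop here
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, tree, depth=0):
    """Depth at which x reaches a leaf, plus the expected extra depth for that leaf's size."""
    if tree[0] == "leaf":
        return depth + c(tree[1])
    _, dim, split, left, right = tree
    return path_length(x, left if x[dim] < split else right, depth + 1)


def isolation_forest_scores(points, n_trees=100, sample_size=256, seed=0):
    """Anomaly score s(x) = 2^(-E[h(x)] / c(psi)); values close to 1 indicate outliers."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(points, psi), 0, max_depth, rng) for _ in range(n_trees)]
    return [2.0 ** (-sum(path_length(x, t) for t in trees) / n_trees / c(psi)) for x in points]


# Toy data: 200 embeddings of one class plus one far-away point (e.g. a mislabeled listing).
gen = random.Random(42)
data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(200)] + [(8.0, 8.0)]
scores = isolation_forest_scores(data)
most_anomalous = max(range(len(data)), key=scores.__getitem__)
print(most_anomalous, round(scores[most_anomalous], 2))
```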
Training & deploying with AWS SageMaker
[Diagram: training/evaluation dataset → training.py → model artefacts → inference.py]
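For orientation, a sketch of what an `inference.py` could look like. The handler names `model_fn`, `input_fn`, `predict_fn` and `output_fn` follow the convention used by SageMaker's framework inference containers (e.g. PyTorch); everything else is hypothetical, including the `centroids.json` artefact and the nearest-centroid "model", which stands in for the real image classifier so the example runs without a deep learning framework. The last lines call the handlers locally, the way an endpoint would for one request.

```python
import json
import os
import tempfile


def model_fn(model_dir):
    """Load model artefacts from the directory where they were unpacked."""
    # Hypothetical artefact: class name -> centroid of that class's feature vectors.
    with open(os.path.join(model_dir, "centroids.json")) as f:
        return json.load(f)


def input_fn(request_body, request_content_type):
    """Deserialize the request into a feature vector."""
    if request_content_type != "application/json":
        raise ValueError(f"Unsupported content type: {request_content_type}")
    return json.loads(request_body)["features"]


def predict_fn(input_data, model):
    """Nearest-centroid stand-in for the real image classifier."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(input_data, centroid))
    return {"category": min(model, key=lambda name: sq_dist(model[name]))}


def output_fn(prediction, accept):
    """Serialize the prediction for the HTTP response."""
    return json.dumps(prediction)


# Local smoke test simulating a single endpoint request.
with tempfile.TemporaryDirectory() as model_dir:
    with open(os.path.join(model_dir, "centroids.json"), "w") as f:
        json.dump({"road bike": [1.0, 0.0], "office chair": [0.0, 1.0]}, f)
    model = model_fn(model_dir)
features = input_fn('{"features": [0.9, 0.2]}', "application/json")
response = output_fn(predict_fn(features, model), "application/json")
print(response)
```

Keeping deserialization, prediction and serialization in separate handlers makes each piece testable locally, which partly offsets the debugging and integration-testing pain listed among the cons.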
Training & deploying with AWS SageMaker (II)
Pros:
● Ease of use
● Flexibility (DL framework, model serialization/deserialization, data transformation)
● Endpoint autoscaling
● Integration with the AWS ecosystem (IAM roles, CloudWatch, ELB, etc.)
● Fast & responsive maintenance
Cons:
● Debugging
● Inference cost ($$$)
● Unstable SDK
● Integration testing is hard
Model & dataset debugging (I)
Local (per-class) metrics:
Class | Precision | Recall | F1-score
6901_masquerade_mask | 95% | 97% | 96%
11381_vinyl_figure | 92% | 94% | 93%
11361_video_game | 89% | 96% | 93%
…
Global metrics: precision, recall and accuracy at each taxonomy level (Category, Subcategory, Micro-cat)
Metrics are great for measuring progress but not that great for fixing problems: look at the mistakes!
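Both kinds of metrics are cheap to compute from leaf-level predictions. The sketch below assumes labels are stored as taxonomy paths joined by " > " (category > subcategory > micro-cat), which is a formatting assumption; truncating the path gives accuracy at coarser levels of the tree.

```python
from collections import Counter


def per_class_prf(y_true, y_pred):
    """Precision, recall and F1 for every class (leaf label)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # missed the true class t
    report = {}
    for cls in set(y_true) | set(y_pred):
        prec = tp[cls] / (tp[cls] + fp[cls]) if tp[cls] + fp[cls] else 0.0
        rec = tp[cls] / (tp[cls] + fn[cls]) if tp[cls] + fn[cls] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        report[cls] = (prec, rec, f1)
    return report


def accuracy_at_level(y_true, y_pred, level):
    """Accuracy after truncating 'category > subcategory > micro-cat' paths to `level` parts."""
    def cut(path):
        return tuple(path.split(" > ")[:level])
    hits = sum(cut(t) == cut(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


y_true = ["sports > bicycles > road bike", "sports > bicycles > road bike",
          "sports > bicycles > bmx bike", "home > furniture > office chair"]
y_pred = ["sports > bicycles > road bike", "sports > bicycles > bmx bike",
          "sports > bicycles > bmx bike", "home > furniture > office chair"]
report = per_class_prf(y_true, y_pred)
for level, name in [(1, "Category"), (2, "Subcategory"), (3, "Micro-cat")]:
    print(name, accuracy_at_level(y_true, y_pred, level))
```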
Model & dataset debugging (II): confusion report
[Figure: confusion report example; diagnosis: model capacity]
Model & dataset debugging (III): confusion report
[Figure: confusion report example; diagnosis: wrong label]
Model & dataset debugging (IV): confusion report
[Figure: confusion report example; diagnosis: indistinguishable from images alone]
Other potential issues:
● Low sample size
● Non-disjoint classes
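A confusion report of this kind can be produced by ranking the most frequent (true, predicted) error pairs and attaching a few example listings to each, so that a person can decide whether the cause is model capacity, a wrong label, or classes that are indistinguishable from images alone. The function and its parameters are a hypothetical sketch.

```python
from collections import Counter, defaultdict


def confusion_report(y_true, y_pred, listing_ids, top=10, examples_per_pair=3):
    """Rank the most frequent (true, predicted) mistakes and attach example listings to inspect."""
    counts = Counter()
    examples = defaultdict(list)
    for t, p, lid in zip(y_true, y_pred, listing_ids):
        if t != p:
            counts[(t, p)] += 1
            if len(examples[(t, p)]) < examples_per_pair:
                examples[(t, p)].append(lid)
    # most_common orders by count; ties keep first-encountered order.
    return [(t, p, n, examples[(t, p)]) for (t, p), n in counts.most_common(top)]


y_true = ["tv", "tv", "tv stand", "sofa", "tv"]
y_pred = ["tv stand", "tv stand", "tv", "sofa", "tv"]
ids = ["a", "b", "c", "d", "e"]
for row in confusion_report(y_true, y_pred, ids):
    print(row)
```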
Manual evaluation
● Noisy labels mean we can’t trust our accuracy metrics
● Customer perception ≠ metrics
Gathering human labels → EASY PEASY?
[Figure: manual annotations vs. our model’s prediction for a listing. Title: Joovy twin Roo+ Stroller; Description: Best stroller for infant twins in my opinion. Fits two infant car seats side by side. You pick the [...]; Price: $50]
Manual evaluation challenges (II)
● What do we ask for? Right/wrong, or choose the correct one?
● What do we show? All images, text, the category tree?
Same predictions, different category tree → 10% difference in manually evaluated accuracy
Conclusions & next steps
● Implicit feedback is a cost-effective way of annotating data
● Performance evaluation is not trivial
● Building pipelines with iteration in mind is KEY
● AWS SageMaker provides a good trade-off between flexibility and ease of use
Next steps:
● Resolving ambiguities → multi-image and multi-modal models
● Handling correlated label noise → noise-aware models
Thanks for your attention! Any questions?
arnau.tibau@letgo.com