Large-scale Product Categorization with Deep Models in Rakuten
May 8, 2017
Ali Cevahir / Denis Miller
Rakuten Institute of Technology / Rakuten, Inc.
https://rit.rakuten.co.jp / https://global.rakuten.com
About Rakuten
https://global.rakuten.com/corp/about/strength/data.html
Rakuten Group Services
– E-Commerce
– FinTech
– Digital Content
– Travel & Reservation
– Pro Sports
– Others
https://global.rakuten.com/corp/about/business/internet.html
Online Marketplace
[Diagram: Rakuten Ichiba connects merchants and shoppers; merchants receive branding, marketing, and EC consulting support.]
Problem and Solution
Introduction
Given a product, classify it into its correct category in the taxonomy.
Example: MACPHEE(マカフィー) 切り替えVネックニット (MACPHEE paneled V-neck knit) → Ladies Fashion > Tops > Knit Sweaters > Long Sleeves > V Neck
Proposed Solutions
Two deep models:
– Deep Belief Nets
– Deep Autoencoders + kNN
Two input sources:
– Titles
– Descriptions
Proposed Solutions
– Two-step classification: first classify into Level-1 categories, then into leaf categories ('others' excluded)
– Merchant-assigned categories are not always correct
Example: MACPHEE(マカフィー) 切り替えVネックニット → Ladies Fashion > Tops > Knit Sweaters > Long Sleeves > V Neck
CUDeep: A CUDA-based Deep Learning Framework
– In-house framework for training DBNs and deep autoencoders (DAEs)
– Implemented using cuBLAS and cuSPARSE
[Diagram: DBN vs. DAE. Both take the same ~1-million-dimensional input features X; the supervised DBN outputs class probabilities Y, while the DAE reconstructs X as X' and produces a semantic hash from its code layer. At this input size, the networks have billions of connections.]
CUDeep: A CUDA-based Deep Learning Framework
(Dauphin et al., 2011)
– Layer-wise training
– Backpropagation
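As a rough illustration of that recipe, here is a minimal NumPy sketch of greedy layer-wise pre-training with tied-weight autoencoder layers; the shapes and hyperparameters are toy assumptions, not CUDeep's actual values:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_layer(X, n_hidden, lr=0.1, epochs=5, batch=64):
    """Train one tied-weight autoencoder layer on X; return its
    parameters and the hidden representation fed to the next layer."""
    n_vis = X.shape[1]
    W = rng.normal(0.0, 0.01, (n_vis, n_hidden))
    b_h, b_v = np.zeros(n_hidden), np.zeros(n_vis)
    for _ in range(epochs):
        for i in range(0, len(X), batch):
            x = X[i:i + batch]
            h = sigmoid(x @ W + b_h)                  # encode
            x_rec = sigmoid(h @ W.T + b_v)            # decode, tied weights
            d_v = (x_rec - x) * x_rec * (1 - x_rec)   # output delta
            d_h = (d_v @ W) * h * (1 - h)             # hidden delta
            W -= lr * (d_v.T @ h + x.T @ d_h) / len(x)
            b_v -= lr * d_v.mean(0)
            b_h -= lr * d_h.mean(0)
    return W, b_h, sigmoid(X @ W + b_h)

# Greedy stacking: each layer is trained on the previous layer's output;
# afterwards the whole stack would be fine-tuned with backpropagation.
X = (rng.random((512, 200)) < 0.05).astype(float)   # toy sparse binary input
stack, H = [], X
for n_hidden in (100, 50):                          # hypothetical layer sizes
    W, b_h, H = pretrain_layer(H, n_hidden)
    stack.append((W, b_h))
```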
CUDeep: Some Design Decisions
Weights are stored in GPU memory: W[vis, hid1] for a 1M x 1000 first layer is 4 GB.
– Faster: no need to communicate weights between CPU and GPU
– Alternative: store weights in main memory and copy the weights to be updated to the GPU for each minibatch
Input data is stored in main memory:
– Device memory is limited
– Disk streaming is possible, but slower
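A back-of-the-envelope check of that trade-off (the weight size is from the slide; the ~12 GB/s effective PCIe bandwidth is an assumed figure, not from the talk):

```python
# First-layer weights: 1M visible x 1000 hidden, single precision.
n_vis, n_hid, float_bytes = 1_000_000, 1_000, 4
w_bytes = n_vis * n_hid * float_bytes
print(f"W[vis, hid1]: {w_bytes / 1e9:.1f} GB")        # 4.0 GB, fits on GPU

# If the weights lived in host memory instead, each minibatch would pay
# a PCIe round trip; assume ~12 GB/s effective PCIe gen3 bandwidth.
pcie_bw = 12e9
print(f"one-way copy per minibatch: ~{w_bytes / pcie_bw:.2f} s")  # ~0.33 s
```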
CUDeep: Some Design Decisions
During layer-wise pre-training, dense hidden activations are recomputed on the fly instead of being stored; storing them for the full dataset is not practical:
– 200 million sparse inputs (10 nonzeros per feature): 8 GB
– 2000-d dense activations: 1.6 TB
– 1000-d dense activations: 800 GB
– 64-d dense activations: 51.2 GB
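A sketch of the recompute-instead-of-store idea (shapes scaled down; `trained` stands in for whatever layers have been pre-trained so far):

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Sparse inputs are kept; dense activations are never materialized in full.
X = sp.random(100_000, 5_000, density=0.002, format="csr", random_state=0)
trained = [rng.normal(0.0, 0.01, (5_000, 2_000)),   # layers already
           rng.normal(0.0, 0.01, (2_000, 1_000))]   # pre-trained

def minibatch_activations(batch, layers):
    """Recompute the next layer's input for one minibatch by a forward
    pass through the already-trained layers."""
    h = batch                      # sparse minibatch
    for W in layers:
        h = sigmoid(h @ W)         # dense, but only minibatch-sized
    return h

for i in range(0, X.shape[0], 256):
    h = minibatch_activations(X[i:i + 256], trained)
    # ... train the next layer on h ...
```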
CUDA-kNN
The database of codes is partitioned offline with k-means clustering. A query is then answered in two steps:
1. Closest-cluster search
2. kNN within the closest cluster
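A minimal NumPy version of that two-step lookup (the GPU implementation parallelizes the distance computations; the cluster count and code dimensions here are toy values):

```python
import numpy as np

rng = np.random.default_rng(0)
codes = rng.random((10_000, 64)).astype(np.float32)   # 64-d codes (DAE output)
labels = rng.integers(0, 35, size=10_000)             # category of each item

# Offline: partition the database with a few Lloyd iterations of k-means.
n_clusters = 50
centers = codes[rng.choice(len(codes), n_clusters, replace=False)].copy()
for _ in range(10):
    d2 = ((codes ** 2).sum(1)[:, None] - 2 * codes @ centers.T
          + (centers ** 2).sum(1))
    assign = d2.argmin(1)
    for c in range(n_clusters):
        if (assign == c).any():
            centers[c] = codes[assign == c].mean(0)

def knn_classify(q, k=5):
    c = ((centers - q) ** 2).sum(1).argmin()      # 1. closest-cluster search
    members = np.flatnonzero(assign == c)
    d = ((codes[members] - q) ** 2).sum(1)        # 2. kNN inside that cluster
    nearest = members[np.argsort(d)[:k]]
    return np.bincount(labels[nearest]).argmax()  # majority vote

print(knn_classify(rng.random(64, dtype=np.float32)))
```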
2-Step Classification
– The same encoding is used for step 1 and step 2
– Level 1: 35 categories; Level 5: ~30,000 categories
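In control-flow terms, step 2 only searches within the genre chosen in step 1. This sketch uses hypothetical stub components (`encode`, `l1_clf`, per-genre `leaf_clfs`), not the actual models:

```python
import numpy as np

def two_step_predict(title, encode, l1_clf, leaf_clfs):
    """leaf_clfs maps each Level-1 genre to its own leaf-level classifier."""
    z = encode(title)              # the same encoding feeds both steps
    l1 = l1_clf(z)                 # step 1: pick one of ~35 L1 genres
    return l1, leaf_clfs[l1](z)    # step 2: leaf category within that genre

# Toy usage with stubs standing in for the trained models.
encode = lambda s: np.ones(4)
l1_clf = lambda z: "Books, Magazines & Comics"
leaf_clfs = {"Books, Magazines & Comics":
             lambda z: "Western Books > Fiction & Literature"}
print(two_step_predict("Sweet Mother - Isaac Andrews",
                       encode, l1_clf, leaf_clfs))
```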
Feature Extraction
Token normalization:
– Product codes: iPhone-4S → iphone4s
– Japanese counters: 4枚 ("4 sheets"; do not tokenize apart)
– Sizes and dimensions: 12Cm x 3 Cm → 12cmx3cm
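A sketch of those normalizations (the regexes are illustrative approximations of the rules named above, not the production tokenizer):

```python
import re

def normalize(title: str) -> str:
    t = title.lower()
    # Sizes and dimensions: "12Cm x 3 Cm" -> "12cmx3cm"
    t = re.sub(r"(\d+)\s*cm\s*x\s*(\d+)\s*cm", r"\1cmx\2cm", t)
    # Product codes: "iphone-4s" -> "iphone4s"
    t = re.sub(r"(?<=\w)-(?=\w)", "", t)
    # Japanese counters such as 4枚 are left as single tokens
    # (the tokenizer must not split the digit from its counter).
    return t

print(normalize("iPhone-4S"))     # iphone4s
print(normalize("12Cm x 3 Cm"))   # 12cmx3cm
```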
Only frequent tokens are kept:
– Good enough for L1 classification
– Fewer tokens exist in subcategories for L2 classification
Total dictionary size: 26M tokens; after keeping only frequent tokens: 800K.
Dataset Properties and Hardware Setup
– Rakuten Data Release (https://rit.rakuten.co.jp/opendata.html)
– Different merchants may sell the same items (duplicates)
– ~40% of products are assigned to leaf categories named “others”
Level-1 Genre Prediction Results (Step 1)
[Charts: Percent Recall @ N (N = 1-10) for L1 prediction, with and without "others" categories, for Title-DBN, Description-DBN, Title-kNN, Description-kNN, and the combined model.]
Overall Taxonomy Matching (Step 2)
[Charts: Percent Recall @ N for full taxonomy matching, with and without "others" categories.]
Sample Results: Merchant Correct / Algorithm Incorrect
Sweet Mother - Isaac Andrews
Merchant Category: Books, Magazines & Comics > Western Books > Books For Kids
Predicted Category: Books, Magazines & Comics > Western Books > Fiction & Literature
Sample Results: Merchant Incorrect / Algorithm Correct
トヨトミ [KS-67H] 電子点火式対流型石油ストーブ KS67H (Toyotomi [KS-67H] electronic-ignition convection oil stove)
Merchant Category: Flowers, Garden & DIY > DIY & Tools > Others
Predicted Category: Consumer Electronics > Seasonal Home Appliances > Heating > Oil Stove > 14+ tatami (wooden), 19+ tatami (reinforced concrete)
Sample Results: Merchant and Algorithm Both Correct
レンタル【RG87】袴 フルセット/大学生/小学生/高校生/中学生 (Rental [RG87] hakama full set / university / elementary / high school / junior high students)
Merchant Category: Women's Fashion > Japanese > Kimono > Hakama
Predicted Category: Women's Fashion > Japanese > Rental
Summary
– Large data
– Dynamic data: products and categories keep changing
– Not easy to replicate research output in these settings
Engineering Work
– Architecture
– Tuning for different GPU cards
– Dealing with the large data set
– Improving prediction accuracy
– Future work
System architecture
– Designed for scalability and availability
– Handles both single and batched input data
– nvidia-docker for GPU-based components (https://github.com/NVIDIA/nvidia-docker)
Classification data flow diagram
PROBLEMS & SOLUTIONS
GPU memory size difference
– Research environment: Titan X, 12,287 MiB
– Production environment: Tesla K80, 11,519 MiB
– A loss of 768 MiB of device memory in production
GPU memory size difference
– The different memory size required a series of experiments to find a new model configuration
– Input dimension was reduced from 1M to 900K, sacrificing some information
– Future work: use the latest GPUs with more memory to recover this information loss
[Network configuration: 900K input → 1K → 2K → N output units]
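The saving from the smaller dictionary shows up mostly in the first weight layer; a rough check, assuming single-precision weights and the 1K-unit first hidden layer from the diagram:

```python
def first_layer_mib(n_input, n_hidden=1_000, float_bytes=4):
    """Size of the input-to-first-hidden weight matrix in MiB."""
    return n_input * n_hidden * float_bytes / 2**20

print(f"1M inputs:   {first_layer_mib(1_000_000):,.0f} MiB")  # ~3,815 MiB
print(f"900K inputs: {first_layer_mib(900_000):,.0f} MiB")    # ~3,433 MiB
# ~381 MiB saved per copy of W; with a second buffer for gradients or
# momentum, the saving scales toward the 768 MiB Titan X vs. K80 gap.
```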
Extra-large data volume
– Millions of items
– Models trained on a single server with 2 Tesla K80 cards
– The full data set must be processed during both training and classification
Extra-large data volume
– File operations and data processing are highly time-consuming
– Solutions: multiprocessing everywhere; high-speed storage
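A sketch of the "multiprocessing everywhere" approach for the file-heavy preprocessing (the shard layout under `data/items` and the per-file work are placeholders):

```python
from multiprocessing import Pool
from pathlib import Path

def preprocess(path):
    """Placeholder per-shard work: e.g. tokenize and normalize titles."""
    with open(path, encoding="utf-8") as f:
        return path.name, sum(1 for _ in f)

if __name__ == "__main__":
    shards = sorted(Path("data/items").glob("*.tsv"))  # hypothetical layout
    with Pool() as pool:                               # one worker per core
        for name, n_lines in pool.imap_unordered(preprocess, shards):
            print(name, n_lines)
```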
Accuracy worse than in experiments
– Experiments achieved a 74% accuracy rate, and up to 88% in some categories
– With models trained from the latest data, accuracy was lower
– The cause: a few significant defects, addressed below
Shuffling input data
– Due to high correlation between consecutive samples, unshuffled data can produce biased gradients and poor convergence
– Solution: add a shuffling process to data preparation
[Diagram: input data preprocessing pipeline with an additional shuffling step]
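A sketch of the added step; for data too large to shuffle in memory at once, one common pattern (assumed here, not stated in the talk) is to shuffle the shard order and then the lines within each shard:

```python
import random

def shuffled_samples(shard_paths, seed=42):
    """Yield training samples in shuffled order without loading
    the entire data set: shuffle shards, then lines per shard."""
    rng = random.Random(seed)
    shards = list(shard_paths)
    rng.shuffle(shards)                 # break correlation across shards
    for path in shards:
        with open(path, encoding="utf-8") as f:
            lines = f.readlines()
        rng.shuffle(lines)              # break correlation within a shard
        yield from lines
```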
Tuning training parameters
– Models trained with the latest data initially had low accuracy
– Solution: increase the number of backpropagation epochs by 2.5x and decrease the bias multiplier by 10x
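Expressed as a configuration change (the field names and baseline values are hypothetical):

```python
config = {"bp_epochs": 10, "bias_multiplier": 1.0}     # hypothetical baseline
config["bp_epochs"] = int(config["bp_epochs"] * 2.5)   # 2.5x more epochs
config["bias_multiplier"] /= 10                        # bias multiplier / 10
```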
Grouping of categories
– Prediction accuracy was low for similar categories when they were separated across models
– Solution: group similar categories together
Accuracy improvement result
– From 51% to 80% overall accuracy
– 98% in popular categories
Most successful categories
FUTURE WORK
Next steps
– The current accuracy is not enough
– Need to improve the accuracy as much as possible
Biased item distribution in leaf categories
[Chart: # of items per category ID; a long-tail distribution]
– Results in low prediction accuracy for categories with few items
Experiment with fine-tuning
– Fine-tune categories with few items by running extra training iterations over the same input data, to increase their recognition
[Chart: dynamics of fine-tuning; extra training iterations for Category 1-4 vs. # of items (M)]
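One way to realize those extra iterations is to repeat small categories' data during fine-tuning; this schedule is purely illustrative:

```python
def extra_iterations(n_items, budget=16_000_000, cap=18):
    """Smaller categories get more passes over the same data
    (illustrative schedule with an arbitrary cap)."""
    return max(1, min(cap, budget // max(n_items, 1)))

category_sizes = {"Category 1": 16_000_000, "Category 2": 4_000_000,
                  "Category 3": 1_000_000, "Category 4": 250_000}  # toy sizes
for cat, n in category_sizes.items():
    print(cat, "->", extra_iterations(n), "iterations")
```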
Experiment with splitting models
– Separate categories into 3 groups and build model sets for them independently
– Requires more resources, but expected to yield a significant accuracy improvement
To meet business requirements:
– Need to reduce the time to train new models
– Need enhanced automation
THANK YOU!
Q&A
More about Rakuten: https://global.rakuten.com/corp/about/ https://global.rakuten.com/corp/careers/