Large-scale Product Categorization with Deep Models in Rakuten


SLIDE 1

Large-scale Product Categorization with Deep Models in Rakuten

May 8, 2017
Ali Cevahir / Denis Miller
Rakuten Institute of Technology / Rakuten, Inc.
https://rit.rakuten.co.jp / https://global.rakuten.com

SLIDE 2

About Rakuten

https://global.rakuten.com/corp/about/strength/data.html

SLIDE 3

Rakuten Group Services


E-Commerce, FinTech, Digital Content, Travel & Reservation, Pro Sports, Others

https://global.rakuten.com/corp/about/business/internet.html
SLIDE 4

Online Marketplace

  • Over 230,000,000 items in 30,000+ categories

[Diagram: Rakuten Ichiba connects Merchants and Shoppers through Branding, Marketing, and EC Consulting]

SLIDE 5

Problem and Solution

SLIDE 6

Introduction

  • Problem: Given product information, automatically classify it into its correct category


Example: MACPHEE(マカフィー) 切り替えVネックニット ("MACPHEE paneled V-neck knit")
→ Ladies Fashion > Tops > Knit Sweaters > Long Sleeves > V Neck

SLIDE 7

Proposed Solutions

  • 2 different models
    – Deep Belief Nets
    – Deep Autoencoders + kNN

  • 2 different data sources
    – Titles
    – Descriptions

  • Overall results aggregated
  • GPU Implementation
SLIDE 8

Proposed Solutions

  • 2-step classification
    – First classify into Level-1 categories
    – Then classify down to leaf categories
  • 81% match with merchant-assigned categories (‘others’ excluded)
    – Merchants are not always correct


Example: MACPHEE(マカフィー) 切り替えVネックニット
→ Ladies Fashion > Tops > Knit Sweaters > Long Sleeves > V Neck

SLIDE 9

CUDeep: A CUDA-based Deep Learning Framework

  • In-house command-line tool for training DBNs and DAEs
  • Written in CUDA, using cuBLAS and cuSPARSE

SLIDE 10

CUDeep: A CUDA-based Deep Learning Framework

  • Deep Belief Nets vs. Deep Autoencoders (sketched below)

[Diagram: A supervised DBN maps input features X (~1 million dimensions, billions of connections) to class probabilities Y; a deep autoencoder reconstructs X as X' and exposes a low-dimensional semantic hash at its bottleneck]
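As a rough illustration of the two architectures, here is a minimal NumPy sketch (not the CUDeep implementation; all layer sizes are toy values, and the real input is ~1M-dimensional):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n_in, n_hid, n_code, n_classes = 1000, 128, 64, 35
x = (rng.random(n_in) < 0.01).astype(np.float32)  # sparse 0-1 word vector

# Supervised path (DBN fine-tuned as a classifier): X -> hidden -> class probabilities Y'
W1 = rng.normal(0, 0.01, (n_in, n_hid))
W2 = rng.normal(0, 0.01, (n_hid, n_classes))
h = sigmoid(x @ W1)
logits = h @ W2
y_prob = np.exp(logits - logits.max())
y_prob /= y_prob.sum()                    # probabilities over the Level-1 genres

# Unsupervised path (deep autoencoder): X -> code -> reconstruction X'
We = rng.normal(0, 0.01, (n_in, n_code))
Wd = rng.normal(0, 0.01, (n_code, n_in))
code = sigmoid(x @ We)                    # bottleneck activations
semantic_hash = (code > 0.5).astype(int)  # binarized code, later searched with kNN
x_recon = sigmoid(code @ Wd)              # reconstruction X'
```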

SLIDE 11

CUDeep: A CUDA-based Deep Learning Framework

  • Selective Reconstruction (Dauphin et al., 2011); see the sketch below
  • Applied for both
    – Layer-wise training
    – Backpropagation
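A minimal NumPy sketch of the idea, in the spirit of reconstruction sampling (Dauphin et al., 2011): compute the reconstruction loss only over the nonzero inputs plus a reweighted random sample of the zeros, so the cost scales with the sparsity of the input rather than its ~1M dimension. The function names and the 5x zero-sampling ratio are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def selective_indices(x, zero_sample_ratio=5, rng=rng):
    """Pick units to reconstruct: all nonzeros plus a random sample of zeros."""
    nz = np.flatnonzero(x)
    zeros = np.flatnonzero(x == 0)
    n_sampled = min(len(zeros), zero_sample_ratio * max(len(nz), 1))
    sampled = rng.choice(zeros, size=n_sampled, replace=False)
    idx = np.concatenate([nz, sampled])
    # Importance weights: the sampled zero units stand in for all zero units.
    w = np.ones(len(idx))
    if n_sampled:
        w[len(nz):] = len(zeros) / n_sampled
    return idx, w

x = (rng.random(10_000) < 0.001).astype(np.float32)  # sparse 0-1 input
idx, w = selective_indices(x)
x_recon = rng.random(10_000)  # stand-in for the decoder's output
# Weighted squared reconstruction error over the selected units only:
loss = np.sum(w * (x_recon[idx] - x[idx]) ** 2) / x.size
```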

SLIDE 12

CUDeep: Some Design Decisions


W[vis, hid1]: 1M visible × 1000 hidden weights ≈ 4 GB

  • Keep neural net weights on the GPU (see the sketch below)
    – Faster: no need to communicate weights between CPU and GPU
    – Alternative: store weights in main memory and copy the weights to be updated to the GPU for each minibatch
  • Sparse input feature vectors are stored in main memory
    – Limited device memory
    – Disk streaming possible, but slower
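A sketch of this resident-weights pattern using CuPy as a stand-in (CUDeep itself is CUDA code built on cuBLAS and cuSPARSE; sizes here are scaled down from the real 1M × 1000 ≈ 4 GB first-layer matrix):

```python
import numpy as np
import scipy.sparse as sp
import cupy as cp                     # assumed available; any GPU array library works similarly
import cupyx.scipy.sparse as cpsp

# Weights stay resident on the GPU for the whole training run.
n_in, n_hid = 100_000, 256
W = cp.random.standard_normal((n_in, n_hid), dtype=cp.float32) * 0.01

# Sparse 0-1 input vectors stay in host (CPU) memory.
X_host = sp.random(50_000, n_in, density=1e-4, format="csr", dtype=np.float32)

for start in range(0, X_host.shape[0], 128):
    mb = X_host[start:start + 128]
    mb_gpu = cpsp.csr_matrix(mb)      # copy only this minibatch to the device
    h = cp.tanh(mb_gpu @ W)           # forward pass runs entirely on the GPU
    # ...gradients and the in-place update of W would go here...
    break                             # one step shown for brevity
```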

SLIDE 13

CUDeep: Some Design Decisions


During layer-wise pre-training:

  • Do not store intermediate outputs of hidden layers
  • Do feedforward computations instead
  • Intermediate outputs are dense
    – Not practical to store (see the arithmetic below)

  200 million sparse inputs (10 nonzeros/feature): 8 GB
  2000-d dense hidden outputs: 1.6 TB
  1000-d dense hidden outputs: 800 GB
  64-d dense codes: 51.2 GB
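The arithmetic behind those sizes, assuming 4-byte floats:

```python
n_items, bytes_per_float = 200_000_000, 4
print(n_items * 10 * bytes_per_float / 1e9)     # sparse inputs, 10 nonzeros each -> 8.0 GB
print(n_items * 2000 * bytes_per_float / 1e12)  # dense 2000-d layer outputs -> 1.6 TB
print(n_items * 1000 * bytes_per_float / 1e9)   # dense 1000-d layer outputs -> 800.0 GB
print(n_items * 64 * bytes_per_float / 1e9)     # dense 64-d codes -> 51.2 GB
```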

SLIDE 14

CUDA-kNN

  • Vector search engine
SLIDE 15

CUDA-kNN

  • Preprocessing: Multi-level k-means clustering
  • 2-step search (see the sketch below)
    1. Closest-cluster search
    2. kNN within the closest cluster

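A minimal NumPy sketch of the two-step search (single-level clustering shown; CUDA-kNN uses multi-level k-means and runs on the GPU, and all names and sizes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy database of item codes (e.g., 64-d autoencoder codes).
db = rng.random((10_000, 64)).astype(np.float32)

def kmeans(data, k, iters=10, rng=rng):
    """Preprocessing: plain k-means clustering of the database."""
    centers = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((data[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            members = data[assign == c]
            if len(members):
                centers[c] = members.mean(0)
    return centers, assign

centers, assign = kmeans(db, k=32)

def two_step_knn(query, k=5):
    # Step 1: closest-cluster search.
    c = np.argmin(((centers - query) ** 2).sum(-1))
    members = np.flatnonzero(assign == c)
    # Step 2: exact kNN within that cluster only.
    d = ((db[members] - query) ** 2).sum(-1)
    return members[np.argsort(d)[:k]]

print(two_step_knn(rng.random(64).astype(np.float32)))
```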

SLIDE 16

2-Step Classification

  • Step 1: 2 DBN & kNN models
  • Step 2: 2x35 DBN & kNN models (one set per Level-1 category); see the sketch below
  • 2 DAE models
    – Same encoding for step 1 and step 2

  Level 1: 35 categories / Level 5: ~30,000 categories
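The control flow, sketched with stub models (the `Model` wrapper, the hash-based predictions, and the dictionary dispatch are all illustrative placeholders, not the production API):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Model:
    predict: Callable  # stand-in for a trained DBN/kNN ensemble

# One router over the 35 Level-1 genres, plus one leaf model per genre.
l1_model = Model(predict=lambda x: hash(tuple(x)) % 35)
leaf_models = {g: Model(predict=lambda x, g=g: f"genre{g}/leaf{hash(tuple(x)) % 100}")
               for g in range(35)}

def classify(features):
    genre = l1_model.predict(features)                  # step 1: Level-1 category
    return genre, leaf_models[genre].predict(features)  # step 2: leaf category

print(classify((0.1, 0.4, 0.9)))
```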

SLIDE 17

Feature Extraction

  • Features: 0-1 word vectors
  • Mostly Japanese text
  • Normalize letters: アイフォン 4S → アイフォン 4s
  • Clean all HTML tags: <a href> link </a> → link
  • Regular expressions for (see the sketch below):
    – Product codes: iPhone-4S → iphone4s
    – Japanese counters: 4枚 (do not tokenize)
    – Sizes and dimensions: 12Cm x 3 Cm → 12cmx3cm
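A sketch of such normalization rules in Python; the exact patterns used in production are not given on the slide, so these regexes are hypothetical approximations:

```python
import re
import unicodedata

def normalize(text: str) -> str:
    # Letter normalization (width and case): アイフォン 4S -> アイフォン 4s
    text = unicodedata.normalize("NFKC", text).lower()
    # Strip HTML tags: <a href> link </a> -> link
    text = re.sub(r"<[^>]+>", " ", text)
    # Product codes: iphone-4s -> iphone4s
    text = re.sub(r"\b([a-z]+)-([a-z0-9]+)\b", r"\1\2", text)
    # Sizes and dimensions: 12cm x 3 cm -> 12cmx3cm
    text = re.sub(r"(\d+)\s*cm\s*x\s*(\d+)\s*cm", r"\1cmx\2cm", text)
    return text

print(normalize("MACPHEE <a href>link</a> iPhone-4S 12Cm x 3 Cm"))
# -> "macphee  link  iphone4s 12cmx3cm"
```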

SLIDE 18

Feature Extraction

  • Titles: 26M tokens
  • Descriptions: 47M tokens
  • Use only the 1M most frequent tokens (see the sketch below)
    – Good enough for L1 classification
    – Fewer tokens occur within subcategories for L2 classification


[Charts: token frequency distributions, with total dictionary sizes of 26M and 800K]
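One way to realize the frequency cutoff and the 0-1 word vectors (a sketch; `build_vocab`, the toy data, and the tiny `max_size` are illustrative, while production uses the top 1M tokens):

```python
from collections import Counter

def build_vocab(tokenized_titles, max_size=1_000_000):
    """Keep only the most frequent tokens."""
    counts = Counter(tok for title in tokenized_titles for tok in title)
    return {tok: i for i, (tok, _) in enumerate(counts.most_common(max_size))}

def to_binary_vector(tokens, vocab):
    """0-1 word vector, stored as the sorted set of vocabulary indices present."""
    return sorted({vocab[t] for t in tokens if t in vocab})

vocab = build_vocab([["red", "knit", "sweater"], ["knit", "dress"]], max_size=3)
print(to_binary_vector(["knit", "dress", "unknown"], vocab))
```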

SLIDE 19

Dataset Properties and Hardware Setup

  • 280 million (active and inactive) products

– Rakuten Data Release (https://rit.rakuten.co.jp/opendata.html)

  • Deduped by titles: 280 million → 172 million (see the sketch below)
    – Multiple merchants may sell the same items

  • 28,338 active categories

– ~40% of products are assigned to leaf categories named “others”

  • 90% of randomly selected products used for training
  • A Linux server with 4 Titan X GPUs
  • 2 x 12-core Intel CPUs
  • 96 GB main memory
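A straightforward way to dedupe by title (hash-based; the slide does not specify the normalization applied, so the lowercasing here is an assumption):

```python
import hashlib

def dedupe_by_title(items):
    """Drop items whose (normalized) title was already seen; merchants often
    list the same product, which took the corpus from 280M to 172M items."""
    seen, unique = set(), []
    for item in items:
        key = hashlib.md5(item["title"].strip().lower().encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique

print(len(dedupe_by_title([{"title": "Knit Sweater"}, {"title": "knit sweater"}])))
```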
SLIDE 20

Level-1 Genre Prediction Results (Step 1)

[Charts: Percent Recall @ N (N = 1..10) for L1 prediction, with and without “others” categories, comparing Title-DBN, Description-DBN, Title-kNN, Description-kNN, and Combined]
SLIDE 21

Overall Taxonomy Matching (Step 2)

[Charts: Percent Recall @ N (N = 1..10) for L5 prediction, with and without “others” categories, comparing Title-DBN, Description-DBN, Title-kNN, Description-kNN, Combined, and DBNs combined]
SLIDE 22

Sample Results: Merchant Correct / Algorithm Incorrect


Sweet Mother - Isaac Andrews

Merchant Category: Books, Magazines & Comics > Western Books > Books For Kids
Predicted Category: Books, Magazines & Comics > Western Books > Fiction & Literature

SLIDE 23

Sample Results: Merchant Incorrect / Algorithm Correct


トヨトミ[KS-67H]電子点火式対流型石油ストーブKS67H ("Toyotomi [KS-67H] electronic-ignition convection-type kerosene heater KS67H")

Merchant Category: Flowers, Garden & DIY > DIY & Tools > Others
Predicted Category: Consumer electronics > Seasonal home appliances > Heating > Oil stove > 14+ tatami (wooden) / 19+ tatami (rebar)

SLIDE 24

Sample Results: Merchant and Algorithm Both Correct


レンタル【RG87】袴 フルセット/大学生/小学生/高校生/中学生 ("Rental [RG87] hakama full set / university / elementary / high school / junior high school students")

Merchant Category: Women’s Fashion > Japanese > Kimono > Hakama
Predicted Category: Women’s Fashion > Japanese > Rental

SLIDE 25

Summary

  • Large-scale product categorization
  • A multi-modal deep learning approach
  • CUDA-based tools: CUDeep, CUDA-kNN
  • Noisy data, yet high agreement with manual labeling
  • Engineering challenges
    – Large data
    – Dynamic data: products and categories keep changing
    – Not easy to reproduce the research results under these settings

SLIDE 26

Engineering Work


  • Architecture
  • Tuning for different GPU cards
  • Dealing with large data sets
  • Improving prediction accuracy
  • Future work

SLIDE 27

System architecture

  • Designed for high scalability and availability
  • Supports requests with both single and multiple input items
  • Based on Docker; uses nvidia-docker for GPU-based components


https://github.com/NVIDIA/nvidia-docker

SLIDE 28

Classification data flow diagram

SLIDE 29

PROBLEMS & SOLUTIONS

SLIDE 30

GPU memory size difference

Research environment: Titan X, 12,287 MiB
Production environment: Tesla K80, 11,519 MiB
→ 768 MiB less device memory in production

SLIDE 31

GPU memory size difference

The different memory size required a series of experiments to find a new model configuration:

  • Reduce the input layer size, e.g. from 1M to 900K, sacrificing some information

Future work: use newer GPUs with more memory to recover this information loss.


SLIDE 32

Extra-large data volume

230 million items

  • 200 GB of raw data
  • 260 GB of tokenized items
  • 200+ GB of 70+ model files
  • 4 days to prepare training data
  • More than one week to train the models on a single server with 2 Tesla K80 cards
  • Extremely large memory usage during training and classification

SLIDE 33

Extra-large data volume

  • Issue
    – File operations and data processing are very time-consuming
  • Solution
    – Multiprocessing everywhere
    – High-speed storage

SLIDE 34

Accuracy worse than in the experiments

74% → 51%

  • Research showed an accuracy of 74% overall, and up to 88% in some categories
  • After first building the models from the latest data, accuracy was only 51%
  • Further investigation revealed a few significant defects

SLIDE 35

Shuffling input data

  • Issue
    – Highly correlated runs of sample data can produce biased gradients and poor convergence
  • Solution
    – Add a shuffling step to the data preparation pipeline (see the sketch below)


[Diagram: input data preprocessing pipeline with an additional shuffling step]
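A minimal sketch of the added shuffling step (an in-memory version assuming one training example per line; at the full ~200 GB scale, a sharded or external shuffle would be needed):

```python
import random

def shuffle_training_file(in_path: str, out_path: str, seed: int = 0) -> None:
    """Globally shuffle training examples so minibatches are not
    dominated by long runs of correlated, same-category items."""
    with open(in_path, encoding="utf-8") as f:
        lines = f.readlines()
    random.Random(seed).shuffle(lines)
    with open(out_path, "w", encoding="utf-8") as f:
        f.writelines(lines)
```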

SLIDE 36

Tuning training parameters

  • Issue
    – Models trained on the latest data had low accuracy
      • Smaller input layer size
      • Unbalanced item distribution across categories
  • Solution
    – Increase the number of backpropagation epochs by 2.5x and decrease the bias multiplier by 10x (see the sketch below)
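Expressed as a hypothetical configuration change; the field names and baseline values are illustrative assumptions, and only the 2.5x and 10x factors come from the slide:

```python
baseline = {"backprop_epochs": 10, "bias_multiplier": 1.0}  # assumed baseline values
tuned = {
    "backprop_epochs": int(baseline["backprop_epochs"] * 2.5),  # 2.5x more epochs
    "bias_multiplier": baseline["bias_multiplier"] / 10,        # 10x smaller bias multiplier
}
print(tuned)  # {'backprop_epochs': 25, 'bias_multiplier': 0.1}
```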

SLIDE 37

Grouping of categories

  • Issue
    – Low prediction accuracy for similar categories when their models are separated

  • Solution

– Group similar categories

SLIDE 38

Accuracy improvement result

51% → 80%~98%

  • Recovered the expected results
    – 80% overall accuracy
    – 98% in popular categories
  • Cost several months of work
SLIDE 39

Most successful categories

SLIDE 40

FUTURE WORK

SLIDE 41

Next steps

80% is not enough: the accuracy needs to be improved as much as possible

  • Data analysis
  • New experiments
SLIDE 42

Biased item distribution in leaf categories


[Chart: number of items per category ID, heavily skewed]

This results in low prediction accuracy for categories with few items.

SLIDE 43

Experiment with fine-tuning

  • Add extra training iterations for categories with few items, reusing the same input data to reinforce what the model learns (see the sketch below)
  • Experiments show a positive trend from this fine-tuning

[Chart: number of items (millions) for Categories 1-4, with extra training iterations applied in the model]
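A sketch of the scheduling idea; the threshold and boost factor are invented for illustration, as the slide only states that small categories get extra iterations over the same data:

```python
def epochs_per_category(category_sizes, base_epochs=5, small_threshold=100_000, boost=3):
    """Give categories with few items extra passes over their (repeated) data."""
    return {cat: base_epochs * (boost if n < small_threshold else 1)
            for cat, n in category_sizes.items()}

print(epochs_per_category({"shoes": 2_000_000, "oil_stoves": 40_000}))
# {'shoes': 5, 'oil_stoves': 15}
```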
SLIDE 44

Experiment with splitting models

Separate categories into 3 groups and build independent model sets for each:

  • Extra-small genres
  • Normal genres
  • Extra-large genres

This requires more resources, but is expected to yield a significant accuracy improvement.

SLIDE 45

To meet business requirements

  • Very frequent data updates
    – Need to reduce the time to train new models
  • Will need high-spec GPU servers and automation enhancements

SLIDE 46

THANK YOU!

Q&A


More about Rakuten:
https://global.rakuten.com/corp/about/
https://global.rakuten.com/corp/careers/