MRNet-Product2Vec: A Multi-task Recurrent Neural Network for Product Embeddings
Arijit Biswas, Mukul Bhutani and Subhajit Sanyal
Machine Learning, Amazon, Bangalore, India
{barijit,mbhutani,subhajs}@amazon.com
The Collaborators
Mukul Bhutani, Machine Learning, Amazon
Subhajit Sanyal, Machine Learning, Amazon
A Product in an E-commerce Company
Product attributes
- Title
- Color
- Size
- Material
- Category
- Item Type
- Hazardous indicator
- Batteries required
- High Value
- Target Gender
- Weight
- Offer
- Review
- Price
- View Count
Motivation
- Billions of products in the inventory
- Diverse set of ML problems involving products
- Product recommendation
- Duplicate Product Detection
- Product Safety Classification
- Price Estimation
- ....
- Any ML application needs a good set of features
- What is a good and useful featurization for products?
A Naïve Featurization
- Bag-of-words: TF-IDF representations (sketched below)
- Title
- Description
- Bullet Points etc.
- Although effective, often difficult to use in practice:
- Overfitting
- Computationally and Storage Inefficient
- Not Semantically Meaningful
- Increases the parameters in downstream ML algorithms
- Dense Low-dimensional Features could alleviate these issues
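For reference, a minimal sketch of this bag-of-words baseline, assuming scikit-learn; the example titles are made up:

```python
# Minimal sketch of the TF-IDF title featurization (assumes scikit-learn;
# the product titles below are hypothetical).
from sklearn.feature_extraction.text import TfidfVectorizer

titles = [
    "stainless steel water bottle 750 ml",
    "cotton crew neck t-shirt blue xl",
    "aa alkaline batteries pack of 12",
]

# Cap the vocabulary at 5000 terms, matching the 5000-dim TF-IDF title
# representation used later as a decoding target.
vectorizer = TfidfVectorizer(max_features=5000)
X = vectorizer.fit_transform(titles)

# X is sparse and high-dimensional: effective, but storage-inefficient
# and parameter-hungry for downstream models.
print(X.shape, X.nnz)
```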
Summary of Contributions
- We propose a novel product representation approach
- Dense, Low-dimensional, Generic
- As good as TF-IDF representation
- A Discriminative Multi-task Neural Network is trained
- Different signals pertaining to a product are explicitly injected
- Static: color, material, weight, size, sub-category
- Dynamic: price, popularity, views
- The learned representations should be generic
- The title of a product is fed into a bidirectional LSTM
- Hidden representation is “product embedding” or “product feature”
- Training: Embedding is fed to multiple classification/regression/decoding units
- Trained Jointly
- Referred to as a Multi-task Recurrent Neural Network (MRNet)
Prior Work
- Word/Document Embeddings
- Word2Vec [Mikolov, 2013]
- Paragraph2Vec/Doc2Vec [Le and Mikolov, 2014]
- Product Embeddings
- Prod2Vec [Grbovic, KDD 2015]
- Meta-Prod2Vec [Vasile, Recsys 2016]
- Designed for product recommendation
- Traditionally, Multi-task Learning is used for correlated tasks
- We use multi-task learning to make the product representations generic!
MRNet: Our Approach
- Different product signals are injected into MRNet
- To make the embedding generic (a code sketch follows the figure)
[Architecture figure: input words from the product title (Word 1, Word 2, ..., Word T) feed a bidirectional LSTM; its embedding layer (the product representation) feeds five jointly trained task heads: classification of static signals (Color, Size, Material, Category, Item Type, Hazardous, High-value, Target Gender, Weight), classification and regression on dynamic signals (Offers, Reviews, Price, # Views), and decoding of the TF-IDF representation of the title (5000 dim.).]
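A minimal PyTorch sketch of this architecture; the layer sizes, mean-pooling, and the particular task heads are illustrative assumptions, not the paper's exact configuration:

```python
# Sketch of an MRNet-style encoder with multiple task heads (assumes
# PyTorch; dimensions and task list are illustrative).
import torch
import torch.nn as nn

class MRNetSketch(nn.Module):
    def __init__(self, vocab_size=50000, emb_dim=128, hidden_dim=128,
                 n_categories=20, tfidf_dim=5000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Bidirectional LSTM over the words of the product title.
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        rep_dim = 2 * hidden_dim  # the "product embedding"
        # Task-specific heads sharing the same embedding layer.
        self.classify = nn.Linear(rep_dim, n_categories)  # e.g. category
        self.regress = nn.Linear(rep_dim, 1)              # e.g. price
        self.decode = nn.Linear(rep_dim, tfidf_dim)       # TF-IDF target

    def forward(self, title_ids):
        h, _ = self.lstm(self.embed(title_ids))
        rep = h.mean(dim=1)  # pooled hidden states = product representation
        return rep, self.classify(rep), self.regress(rep), self.decode(rep)

model = MRNetSketch()
rep, logits, price, tfidf_hat = model(torch.randint(0, 50000, (4, 12)))
print(rep.shape)  # (4, 256): dense, low-dimensional product features
```

The shared `rep` is the dense feature that downstream applications would consume; the heads exist only to inject signals during training.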
Loss and Optimization
- Joint Optimization
- Gradient is computed w.r.t. the full loss
- Alternating Optimization
- One task loss is selected at random
- Backpropagation is performed with that loss
- Only the weights of that task and the task-invariant layers are updated
- Both schemes are sketched below
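A plausible form of the joint objective, assuming per-task weights λ_t over T tasks and a shared encoder (the exact weighting scheme is an assumption, not from the slides):

```latex
% Assumed form: weighted sum of per-task losses (cross-entropy for
% classification, squared error for regression, reconstruction loss
% for decoding), all sharing the encoder parameters.
\mathcal{L}\left(\theta_{\text{shared}}, \theta_1, \ldots, \theta_T\right)
  = \sum_{t=1}^{T} \lambda_t \,
    \mathcal{L}_t\left(\theta_{\text{shared}}, \theta_t\right)
```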
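A minimal sketch of the two update schemes, assuming PyTorch and hypothetical per-task loss functions that take `(model, batch)` and return a scalar:

```python
# Sketch of joint vs. alternating optimization (assumes PyTorch;
# `model`, `batch`, and the task losses are hypothetical).
import random

def joint_step(model, batch, task_losses, weights, optimizer):
    # Joint optimization: one backward pass w.r.t. the full weighted loss.
    optimizer.zero_grad(set_to_none=True)
    total = sum(w * loss_fn(model, batch)
                for w, loss_fn in zip(weights, task_losses))
    total.backward()
    optimizer.step()

def alternating_step(model, batch, task_losses, optimizer):
    # Alternating optimization: pick one task loss at random and
    # backpropagate only through it. The other task heads receive no
    # gradient (grad stays None), so optimizer.step() skips them;
    # only the sampled head and the shared, task-invariant layers move.
    optimizer.zero_grad(set_to_none=True)
    loss_fn = random.choice(task_losses)
    loss_fn(model, batch).backward()
    optimizer.step()
```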
Product Group Agnostic Embeddings
- Products organized as Product Groups (PGs):
- Furniture, Jewelry, Books, Home, Clothes etc.
- Signals are often product-group specific:
- Weights of Home items are different from Jewelry
- Sizes of clothes (XL, XXL etc.) are different from furniture (king, queen)
- Embeddings are learned for each product group
- A sparse Autoencoder is used to obtain a PG-agnostic embedding (sketched below)
[Figure: embeddings specific to PG1, PG2, ..., PGN pass through fully connected linkages into a PG-agnostic embedding (sparsity enforced), which reconstructs each PG-specific embedding.]
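A minimal sketch of such a sparse autoencoder, assuming PyTorch; the L1 penalty, ReLU code, and dimensions are illustrative assumptions:

```python
# Sketch of the PG-agnostic sparse autoencoder (assumes PyTorch;
# penalty weight and sizes are illustrative).
import torch
import torch.nn as nn

class PGAgnosticAE(nn.Module):
    def __init__(self, pg_dim=256, shared_dim=256):
        super().__init__()
        self.encode = nn.Linear(pg_dim, shared_dim)  # fully connected
        self.decode = nn.Linear(shared_dim, pg_dim)  # fully connected

    def forward(self, pg_embedding):
        shared = torch.relu(self.encode(pg_embedding))  # PG-agnostic code
        return shared, self.decode(shared)

model = PGAgnosticAE()
x = torch.randn(8, 256)  # embeddings from one product group
shared, recon = model(x)
# Reconstruction loss plus an L1 term that enforces sparsity on the
# PG-agnostic embedding.
loss = nn.functional.mse_loss(recon, x) + 1e-3 * shared.abs().mean()
```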
Datasets
- Plugs: If a product has an electrical plug or not
- Binary, 205K samples
- SIOC: If a product ships in its own container
- Binary, 296K samples
- Browse Category classification
- Multi-class, 150K samples
- Ingestible Classification
- Binary, 1500 samples
- SIOC (unseen population)
- Binary, 150K training and 271 test samples
Experimental Results
- Baseline: TF-IDF-LR (logistic regression over TF-IDF features)
- Proposed MRNet is comparable to TF-IDF-LR in most scenarios!
Qualitative Results
Language Agnostic MRNet-Product2Vec
- Products from different marketplaces have their metadata in the language native to that region.
- We train a multi-modal Autoencoder to link representations of products from different marketplaces (sketched below).
- Training data split for the autoencoder (shared hidden layer over [Embedding:UK, Embedding:FR]):
- 1/3: input [Embedding:UK, Embedding:FR], output [Embedding:UK, Embedding:FR]
- 1/3: input [Embedding:UK, (0,0,...,0)], output [(0,0,...,0), Embedding:FR]
- 1/3: input [(0,0,...,0), Embedding:FR], output [Embedding:UK, (0,0,...,0)]
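A minimal sketch of the training-data construction, assuming NumPy arrays of aligned UK/FR embeddings for the same products:

```python
# Sketch of the three training splits for the multi-modal autoencoder
# (assumes NumPy; function and variable names are hypothetical).
import numpy as np

def make_training_data(emb_uk, emb_fr):
    """emb_uk, emb_fr: aligned (n, d) arrays for the same products."""
    zeros = np.zeros_like(emb_uk)
    both = np.hstack([emb_uk, emb_fr])
    # 1/3: both halves in, both halves out.
    x1, y1 = both, both
    # 1/3: UK in, FR out (missing half zeroed on each side).
    x2, y2 = np.hstack([emb_uk, zeros]), np.hstack([zeros, emb_fr])
    # 1/3: FR in, UK out.
    x3, y3 = np.hstack([zeros, emb_fr]), np.hstack([emb_uk, zeros])
    return np.vstack([x1, x2, x3]), np.vstack([y1, y2, y3])
```

Zeroing one half forces the shared hidden layer to predict either marketplace's embedding from the other, which is what makes the learned representation language agnostic.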
Qualitative Results (Language Agnostic)
- Nearest neighbors of French products in the UK marketplace
Conclusion and Future Work
- Propose a method for generic e-commerce product representation
- Inject various product signals into its embedding
- Comparable results w.r.t. the sparse, high-dimensional baseline
- Product group agnostic embeddings
- Language agnostic embeddings
- Incorporate more signals to make embeddings more generic
- Include product image information