Product2Vec: A Multi-task Recurrent Neural Network for Product Embeddings

Arijit Biswas, Mukul Bhutani and Subhajit Sanyal
Machine Learning, Amazon, Bangalore, India
{barijit,mbhutani,subhajs}@amazon.com


SLIDE 1

Product2Vec: A Multi-task Recurrent Neural Network for Product Embeddings

Arijit Biswas, Mukul Bhutani and Subhajit Sanyal
Machine Learning, Amazon, Bangalore, India
{barijit,mbhutani,subhajs}@amazon.com

SLIDE 2

The Collaborators

Mukul Bhutani, Machine Learning, Amazon
Subhajit Sanyal, Machine Learning, Amazon

SLIDE 3

A Product in an E-commerce Company

Product attributes

  • Title
  • Color
  • Size
  • Material
  • Category
  • Item Type
  • Hazardous indicator
  • Batteries required
  • High Value
  • Target Gender
  • Weight
  • Offer
  • Review
  • Price
  • View Count
SLIDE 4

Motivation

  • Billions of products in the inventory
  • Diverse set of ML problems involving products
  • Product recommendation
  • Duplicate Product Detection
  • Product Safety Classification
  • Price Estimation
  • ....
  • Any ML application needs a good set of features
  • What is a good and useful featurization for products?
SLIDE 5

A Naïve Featurization

  • Bag-of-words: TF-IDF representations
  • Title
  • Description
  • Bullet Points etc.
  • Although effective, often difficult to use in practice:
  • Overfitting
  • Computationally and storage inefficient
  • Not semantically meaningful
  • Increases the number of parameters in downstream ML algorithms
  • Dense Low-dimensional Features could alleviate these issues
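As a concrete illustration of the baseline, a bag-of-words TF-IDF featurizer can be written in a few lines of plain Python. The tiny corpus and the smoothing used here are illustrative, not taken from the paper.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute TF-IDF vectors for a list of tokenized documents.

    Each product title becomes a sparse, high-dimensional vector:
    one dimension per vocabulary word, most entries zero.
    """
    vocab = sorted({w for doc in docs for w in doc})
    n = len(docs)
    # document frequency: number of documents containing each word
    df = Counter(w for doc in docs for w in set(doc))
    # smoothed inverse document frequency
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1.0 for w in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[w] / len(doc) * idf[w] for w in vocab])
    return vocab, vectors

titles = [
    "red cotton shirt".split(),
    "blue cotton shirt".split(),
    "wooden dining table".split(),
]
vocab, vecs = tfidf_vectors(titles)
# Dimensionality equals vocabulary size; a real catalog needs thousands of dims.
print(len(vocab), len(vecs[0]))
```

Even on three titles, each vector already spans the full vocabulary, which is exactly the storage and parameter blow-up the slide describes.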
SLIDE 6

Summary of Contributions

  • We propose a novel product representation approach
  • Dense, Low-dimensional, Generic
  • As good as TF-IDF representation
  • A Discriminative Multi-task Neural Network is trained
  • Different signals pertaining to a product are explicitly injected
  • Static: color, material, weight, size, sub-category
  • Dynamic: price, popularity, views
  • The learned representations should be generic
  • The title of a product is fed into a bidirectional LSTM
  • Hidden representation is “product embedding” or “product feature”
  • Training: Embedding is fed to multiple classification/regression/decoding units
  • Trained Jointly
  • Referred to as a Multi-task Recurrent Neural Network (MRNet)
SLIDE 7

Prior Work

  • Word/Document Embeddings
  • Word2Vec [Mikolov, 2013]
  • Paragraph2Vec/Doc2Vec [Mikolov, 2014]
  • Product Embeddings
  • Prod2Vec [Grbovic, KDD 2015]
  • Meta-Prod2Vec [Vasile, Recsys 2016]
  • Designed for product recommendation
  • Traditionally, Multi-task Learning is used for correlated tasks
  • We use multi-task learning to make the product representations generic!

SLIDE 8

MRNet: Our Approach

[Figure: MRNet architecture. Input words from the product title (Word 1, Word 2, …, Word T) feed a bi-directional LSTM; its output is the embedding layer (the product representation), which feeds five jointly trained task heads: classification of static signals (Color, Size, Material, Category, Item Type, Hazardous, High-value, Target Gender, Weight), classification and regression on dynamic signals (Offers, Reviews, Price, # Views), and decoding of the TF-IDF representation of the title (5000 dim.).]

  • Different product signals are injected into MRNet
  • To make the embedding generic
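A minimal NumPy sketch of the multi-task idea in the figure. For brevity the bidirectional LSTM is replaced by an average of word embeddings, and all sizes, task heads, and weights are illustrative placeholders, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, EMB, HIDDEN = 100, 16, 32   # illustrative sizes, not the paper's

# Stand-in for the title encoder: word embeddings averaged, then projected.
# (MRNet uses a bidirectional LSTM; a mean keeps the sketch short.)
word_emb = rng.normal(size=(VOCAB, EMB))
W_enc = rng.normal(size=(EMB, HIDDEN))

def encode(title_word_ids):
    """Shared 'product embedding' consumed by every task head."""
    return np.tanh(word_emb[title_word_ids].mean(axis=0) @ W_enc)

# Task-specific heads, all reading the same shared embedding:
W_color = rng.normal(size=(HIDDEN, 10))    # classification head: 10 colors
W_price = rng.normal(size=(HIDDEN, 1))     # regression head: price
W_dec   = rng.normal(size=(HIDDEN, 5000))  # decoding head: TF-IDF of title

def forward(title_word_ids):
    z = encode(title_word_ids)
    color_logits = z @ W_color
    price = z @ W_price
    tfidf_recon = z @ W_dec
    return z, color_logits, price, tfidf_recon

z, logits, price, recon = forward([3, 17, 42])
print(z.shape, logits.shape, price.shape, recon.shape)
```

The key design point the slide makes is visible here: every head backpropagates into the same encoder, so the shared embedding `z` must serve all tasks at once.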

SLIDE 9

Loss and Optimization

SLIDE 10

Loss and Optimization

SLIDE 11

Loss and Optimization

SLIDE 12

Loss and Optimization

  • Joint Optimization
  • Gradient is computed w.r.t full loss
  • Alternating Optimization
  • Randomly one task loss is selected
  • Backpropagation is performed with that loss
  • Only the weights of that task and task-invariant layers are updated
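The two optimization schemes can be sketched on a toy model: one shared (task-invariant) linear layer feeding two task heads with squared losses. The data, dimensions, and learning rate are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=4)                           # features of one toy example
shared = rng.normal(size=(4, 3))                 # task-invariant layer
heads = [rng.normal(size=3) for _ in range(2)]   # one weight vector per task
targets = [1.0, -1.0]
LR = 0.01

def losses():
    h = x @ shared
    return [0.5 * (h @ heads[t] - targets[t]) ** 2 for t in range(2)]

def alternating_step(t):
    """Backpropagate one task's loss: only that task's head and the
    task-invariant (shared) layer are updated."""
    h = x @ shared
    err = h @ heads[t] - targets[t]
    g_head, g_shared = err * h, err * np.outer(x, heads[t])
    heads[t] -= LR * g_head
    shared[:, :] -= LR * g_shared

def joint_step():
    """Gradient of the full (summed) loss: every head and the shared layer move."""
    h = x @ shared
    g_shared = np.zeros_like(shared)
    for t in range(2):
        err = h @ heads[t] - targets[t]
        g_shared += err * np.outer(x, heads[t])
        heads[t] -= LR * err * h
    shared[:, :] -= LR * g_shared

# Alternating optimization: pick one task at random each step.
for _ in range(50):
    alternating_step(int(rng.integers(2)))
print("losses after 50 alternating steps:", losses())
```

Note how `alternating_step` never touches the other task's head, while `joint_step` moves everything; that is exactly the distinction the bullets draw.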
SLIDE 13

Loss and Optimization

SLIDE 14

Product Group Agnostic Embeddings

[Figure: embeddings specific to each product group (PG1, PG2, …, PGN) are mapped through fully connected linkages to a PG-agnostic embedding with sparsity enforced, and back out to each PG-specific embedding.]

Products organized as Product Groups (PGs):

  • Furniture, Jewelry, Books, Home, Clothes etc.

Signals are often product group specific:

  • Weights of Home items are different from Jewelry
  • Sizes of clothes (XL, XXL etc.) are different from furniture (king, queen)

  • Embeddings are learned for each product group
  • A sparse Autoencoder is used to obtain a PG-agnostic embedding
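A minimal sketch of the sparse-Autoencoder step, assuming a ReLU code with an L1 penalty as the sparsity mechanism (the slide does not spell out these details); all dimensions, weights, and the penalty strength are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
PG_DIM, AGNOSTIC_DIM = 64, 32      # illustrative sizes
W_enc = rng.normal(scale=0.1, size=(PG_DIM, AGNOSTIC_DIM))
W_dec = rng.normal(scale=0.1, size=(AGNOSTIC_DIM, PG_DIM))
L1 = 0.01                          # assumed sparsity weight

def relu(v):
    return np.maximum(v, 0.0)

def pg_agnostic(pg_embedding):
    """Map a PG-specific embedding to the shared, sparsity-enforced code."""
    return relu(pg_embedding @ W_enc)

def autoencoder_loss(pg_embedding):
    code = pg_agnostic(pg_embedding)
    recon = code @ W_dec
    # reconstruction error + L1 penalty that pushes the code toward sparsity
    return np.mean((recon - pg_embedding) ** 2) + L1 * np.abs(code).sum()

x = rng.normal(size=PG_DIM)        # a PG-specific product embedding
print(autoencoder_loss(x))
```

Training such an autoencoder over products from all PGs yields the single code space in which, say, Furniture and Jewelry embeddings become comparable.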

SLIDE 15

Datasets

  • Plugs: If a product has an electrical plug or not
  • Binary, 205K samples
  • SIOC: If a product ships in its own container
  • Binary, 296K samples
  • Browse Category classification
  • Multi-class, 150K samples
  • Ingestible Classification
  • Binary, 1500 samples
  • SIOC (unseen population)
  • Binary, 150K training and 271 test samples
SLIDE 16

Experimental Results

Baseline: TF-IDF-LR

Proposed MRNet is comparable to TF-IDF-LR in most scenarios!

SLIDE 17

Qualitative Results

SLIDE 18

Language Agnostic MRNet-Product2Vec

Products from different marketplaces have their metadata in the language native to that region. We train a multi-modal Autoencoder to link representations of products pertaining to different marketplaces.

[Figure: a multi-modal Autoencoder whose hidden layer connects concatenated UK and FR embeddings on the input side to UK and FR embeddings on the output side.]

Training data split:

  • 1/3 input: [Embedding:UK, Embedding:FR], output: [Embedding:UK, Embedding:FR]
  • 1/3 input: [Embedding:UK, (0,0,...,0)], output: [(0,0,...,0), Embedding:FR]
  • 1/3 input: [(0,0,...,0), Embedding:FR], output: [Embedding:UK, (0,0,...,0)]
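The training-data construction on this slide can be sketched directly: concatenate the two marketplace embeddings and zero-mask one side in two of the three splits. Dimensions and data are illustrative, and the autoencoder itself is omitted.

```python
import numpy as np

rng = np.random.default_rng(3)
DIM = 8                           # per-marketplace embedding size (illustrative)
uk = rng.normal(size=(9, DIM))    # UK embeddings of 9 shared products
fr = rng.normal(size=(9, DIM))    # FR embeddings of the same products

def make_training_pairs(uk, fr):
    """Build the three 1/3 splits from the slide by zero-masking one side."""
    n = len(uk)
    zeros = np.zeros_like(uk)
    a, b = n // 3, 2 * n // 3
    both_in  = np.hstack([uk[:a],  fr[:a]])      # both sides in, both out
    both_out = np.hstack([uk[:a],  fr[:a]])
    uk_in    = np.hstack([uk[a:b], zeros[a:b]])  # UK in -> predict FR
    fr_out   = np.hstack([zeros[a:b], fr[a:b]])
    fr_in    = np.hstack([zeros[b:], fr[b:]])    # FR in -> predict UK
    uk_out   = np.hstack([uk[b:],  zeros[b:]])
    X = np.vstack([both_in, uk_in, fr_in])
    Y = np.vstack([both_out, fr_out, uk_out])
    return X, Y

X, Y = make_training_pairs(uk, fr)
print(X.shape, Y.shape)   # (9, 16) each
```

Fitting the autoencoder on these pairs forces its hidden layer to translate between the two marketplaces, since masked inputs must still reconstruct the other language's embedding.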

SLIDE 19

Qualitative Results (Language Agnostic)

Nearest neighbors of French products in the UK marketplace.

SLIDE 20

Conclusion and Future Work

  • Propose a method for generic e-commerce product representation
  • Inject various product signals into its embedding
  • Comparable results w.r.t. the sparse, high-dimensional baseline
  • Product group agnostic embeddings
  • Language agnostic embeddings
  • Incorporate more signals: more generic
  • Include product image information
SLIDE 21