Embeddings @ Twitter: Making ML easy with Embeddings! Sept 2018 - PowerPoint PPT Presentation



SLIDE 1

Sept 2018

Embeddings @ Twitter

Making ML easy with Embeddings!

SLIDE 2

1. Team
2. What's an Embedding?
3. Why Embeddings?
4. Embeddings Pipeline
5. What's Next

Agenda

SLIDE 3

1. Team
2. What's an Embedding?
3. Why Embeddings?
4. Embeddings Pipeline
5. What's Next

Agenda

SLIDE 4

Section

Cortex

Team

SLIDE 5

To unify and advance recommendation systems.

Team

SLIDE 6

Recommendation Systems

Team

SLIDE 7

Home Explore

Team

SLIDE 8

Email

Team

SLIDE 9

Notifications

Team

SLIDE 10

Twitter

Team

SLIDE 11

1. Team and Product
2. What's an Embedding?
3. Why Embeddings?
4. Embeddings Pipeline
5. What's Next

Agenda

SLIDE 12

Discrete word → Continuous space

twitter:     [ 0.07, -0.001, -0.208 ]
@jack:       [ 0.427, 0.225, -0.082 ]
SF:          [ 0.541, 0.496, -0.362 ]
#TwitterNBA: [ 0.414, 0.068, -0.196 ]

What is an Embedding?

Model
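Concretely, an embedding table is just a lookup from discrete tokens to dense vectors, and similarity between tokens becomes geometry in that space. A minimal sketch, reusing the toy vectors above (the `cosine` and `most_similar` helpers are illustrative, not Twitter's actual code):

```python
import math

# Toy embedding table mapping discrete tokens to dense vectors
# (the 3-dimensional vectors from the slide; real embeddings are larger).
EMBEDDINGS = {
    "twitter":     [0.07, -0.001, -0.208],
    "@jack":       [0.427, 0.225, -0.082],
    "SF":          [0.541, 0.496, -0.362],
    "#TwitterNBA": [0.414, 0.068, -0.196],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def most_similar(token):
    """Return the other token whose embedding is closest by cosine similarity."""
    query = EMBEDDINGS[token]
    others = (t for t in EMBEDDINGS if t != token)
    return max(others, key=lambda t: cosine(query, EMBEDDINGS[t]))
```

With these particular toy vectors, the nearest neighbor of `"SF"` is `"@jack"`; the point is only that relationships between discrete entities become distances once they live in a continuous space.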

SLIDE 13

SLIDE 14

1. Team and Product
2. What's an Embedding?
3. Why Embeddings?
4. Embeddings Pipeline
5. What's Next

Agenda

SLIDE 15

Model Features
Lead to improved model performance when used as input features

Feature Compression
Reduced infrastructure cost and improved efficiency

Nearest Neighbor Search
Similarity search on the embedding space

Transfer Learning
Knowledge exchange between related domains while reducing training time and boosting performance

Why Embeddings?

SLIDE 16

Model Features

  • ML practitioners typically use one-hot encoding to represent categorical inputs
    ○ Incapable of encoding relationships
    ○ Sparsity issues make it less useful for large dimensions
  • Embeddings are outputs of ML models
    ○ Conserve relationships amongst entities
    ○ Compress the sparse input space into dense vectors

Why Embeddings?

SLIDE 17

SLIDE 18

Model Features

Why Embeddings?

SLIDE 19

Feature Compression

Why Embeddings?

SLIDE 20

Feature Compression

Why Embeddings?

SLIDE 21

Feature Compression

Why Embeddings?

  • Generate embeddings from a sub-network offline
  • Update at the same frequency as the raw features
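A minimal sketch of the idea, with hypothetical dimensions and an untrained stand-in for the sub-network: a small model projects a wide sparse feature vector down to a short dense embedding offline, and downstream models consume only the dense vector, cutting storage and serving cost.

```python
import random

random.seed(0)
RAW_DIM, EMB_DIM = 1000, 8  # hypothetical sizes for illustration

# Stand-in for a trained sub-network: a single linear layer
# (EMB_DIM rows, each projecting the RAW_DIM-wide input).
W = [[random.gauss(0, 0.01) for _ in range(RAW_DIM)] for _ in range(EMB_DIM)]

def compress(active_indices):
    """Project a sparse feature vector (given by the indices of its
    nonzero entries) down to a dense EMB_DIM-dimensional embedding."""
    return [sum(row[i] for i in active_indices) for row in W]

# 1000-dimensional sparse input with 3 active features -> 8 floats.
emb = compress([3, 42, 977])
```

Because the projection runs offline, the cached embeddings can be refreshed on the same schedule as the raw features they summarize.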
SLIDE 22

Feature Compression

Why Embeddings?

SLIDE 23

Nearest Neighbor Search

Why Embeddings?

SLIDE 24

Nearest Neighbor Search

Why Embeddings?

SLIDE 25

Nearest Neighbor Search

Why Embeddings?

  • Essential component for Candidate Generation pipelines
    ○ Co-embed users and items
    ○ Given a user, look up its neighbors
    ○ Use approximate methods to scale
  • Finds application in many other areas
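As a sketch, candidate generation reduces to a k-nearest-neighbor query over co-embedded users and items. The brute-force version below is what approximate (ANN) indexes replace at scale; all names and vectors are toy values, not Twitter data:

```python
import heapq
import math

def top_k(query, items, k=2):
    """Brute-force k-nearest-neighbor lookup by cosine similarity.
    At production scale this scan is replaced by an approximate
    nearest-neighbor (ANN) index."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) *
                      math.sqrt(sum(b * b for b in v)))
    return heapq.nlargest(k, items, key=lambda name: cos(query, items[name]))

# Hypothetical user and items co-embedded in one shared 2-D space.
user = [0.5, 0.1]
items = {
    "item_a": [0.49, 0.12],   # points the same way as the user
    "item_b": [-0.3, 0.8],    # points away
    "item_c": [0.1, -0.4],    # nearly orthogonal
}

candidates = top_k(user, items, k=1)
```

Here `candidates` is `["item_a"]`: the item whose direction best matches the user's. Co-embedding is what makes a single distance function meaningful across both entity types.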
SLIDE 26

Transfer Learning

Why Embeddings?

  • A model trained for one task is reused in another
    ○ Typically by initializing network weights and fine-tuning
  • Very attractive from a business point of view
    ○ Reduced development time
    ○ Cross-domain information sharing
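The mechanics above can be sketched in two steps: copy pretrained embedding weights into the new task's model, then fine-tune them with the new task's gradients instead of training from random initialization. All values below are hypothetical stand-ins:

```python
# Embeddings trained on a source task (toy values from the earlier slide).
pretrained = {"twitter": [0.07, -0.001, -0.208]}

# 1. Initialize the new model's embedding table from the pretrained one
#    (copy, so fine-tuning does not mutate the source weights).
model_weights = {tok: vec[:] for tok, vec in pretrained.items()}

# 2. Fine-tune: apply the new task's gradients (stand-in values here)
#    with a small learning rate, rather than learning from scratch.
lr = 0.1
grads = {"twitter": [0.01, -0.02, 0.00]}
for tok, g in grads.items():
    model_weights[tok] = [w - lr * gi
                          for w, gi in zip(model_weights[tok], g)]
```

Starting near a good solution is what saves development and training time; the closer the source and target domains, the less fine-tuning is needed.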

SLIDE 27

1. Team and Product
2. What's an Embedding?
3. Why Embeddings?
4. Embeddings Pipeline
5. What's Next

Agenda

SLIDE 28

Quality and Relevance
  • Enable adapting to evolving data distributions over time
  • If applicable, the learnt embeddings should be of value across product ML models

Creation & consumption with ease
  • Enable teams to learn embeddings at scale using the appropriate algorithm
  • Enable teams to consume embeddings at scale

Sharing & discoverability
  • Enable cross-team collaboration: improvements/learning in one domain can drive improvements elsewhere

Goals

Embedding pipeline

SLIDE 29

SLIDE 30

SLIDE 31

Item Selection & Data Preprocessing

Embedding pipeline

  • Identify the set of entities to learn embeddings for
  • Assemble a dataset that represents the relationships between these entities
    ○ Data representation is defined by the learning algorithm

SLIDE 32

Model Fitting

Embedding pipeline

  • Fit a model on the collected data
    ○ Use pre-built algorithms
    ○ Option to plug in a custom algorithm
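The plug-in idea can be sketched as a pipeline that exposes one uniform entry point: teams either pick a pre-built algorithm or pass their own callable with the same signature. All names below are hypothetical, not the pipeline's real API:

```python
# Hypothetical sketch: any algorithm is a callable that maps
# relationship data (e.g. co-occurrence pairs) -> {entity: vector}.

def fit_prebuilt(pairs, dim=4):
    """Stand-in for a pre-built algorithm: returns a trivial embedding
    table keyed by every entity seen in the co-occurrence pairs.
    (A real implementation would actually learn the vectors.)"""
    entities = {e for pair in pairs for e in pair}
    return {e: [0.0] * dim for e in entities}

def run_pipeline(pairs, algorithm=fit_prebuilt):
    """Fit step of the pipeline: `algorithm` may be pre-built or a
    custom plug-in, as long as it honors the same signature."""
    return algorithm(pairs)

# Fit on toy relationship data.
embs = run_pipeline([("user1", "itemA"), ("user1", "itemB")])
```

Keeping the contract to "relationship data in, embedding table out" is what lets the later stages (benchmarking, feature store publishing) work unchanged regardless of the algorithm.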

SLIDE 33

Benchmarking

Embedding pipeline

  • Developed a variety of standard benchmarking tasks for each type of embedding
    ○ User Topic Prediction: predictive performance of a logistic regression model learnt on the user's embedding
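A benchmark of this shape can be sketched as: fit a logistic regression on the embeddings and report classification accuracy. The implementation below is a toy from-scratch version on made-up, in-sample data; the real benchmark would use a proper library and held-out evaluation:

```python
import math

def benchmark_topic_prediction(embeddings, labels, epochs=200, lr=0.5):
    """Fit a logistic regression (plain SGD) on user embeddings and
    return training accuracy for a binary topic label."""
    dim = len(next(iter(embeddings.values())))
    w, b = [0.0] * dim, 0.0
    users = list(embeddings)
    for _ in range(epochs):
        for u in users:
            x, y = embeddings[u], labels[u]
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of log loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    correct = sum(
        (sum(wi * xi for wi, xi in zip(w, embeddings[u])) + b > 0)
        == (labels[u] == 1)
        for u in users
    )
    return correct / len(users)

# Toy, linearly separable user embeddings with a binary topic label.
emb = {"u1": [1.0, 0.1], "u2": [0.9, 0.2], "u3": [-1.0, -0.1], "u4": [-0.8, 0.0]}
lab = {"u1": 1, "u2": 1, "u3": 0, "u4": 0}
accuracy = benchmark_topic_prediction(emb, lab)
```

Higher accuracy under the same simple classifier suggests the embedding encodes more topic-relevant structure, which is the point of a standardized benchmark.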

SLIDE 34

Benchmarking

Embedding pipeline

  • Developed a variety of standard benchmarking tasks for each type of embedding
    ○ User Metadata Prediction: predictive performance of a logistic regression model learnt on the user's embedding

SLIDE 35

Benchmarking

Embedding pipeline

  • Developed a variety of standard benchmarking tasks for each type of embedding
    ○ User Follow Jaccard: Jaccard index of each user's embedding-similarity neighbors and their follow set
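The Jaccard index itself is simple set overlap: |A ∩ B| / |A ∪ B|. A minimal sketch of the comparison, with hypothetical neighbor and follow sets:

```python
def jaccard(a, b):
    """Jaccard index of two sets: |A ∩ B| / |A ∪ B| (0.0 if both empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 0.0

# Hypothetical data: the accounts nearest to u1 in embedding space,
# versus the accounts u1 actually follows.
embedding_neighbors = {"u1": {"a", "b", "c"}}
follow_set = {"u1": {"b", "c", "d"}}

# Overlap {b, c} over union {a, b, c, d} -> 2/4 = 0.5
score = jaccard(embedding_neighbors["u1"], follow_set["u1"])
```

A higher score means nearness in the embedding space lines up with the real follow graph, a useful sanity check for user embeddings.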

SLIDE 36

Feature Store

Embedding pipeline

  • Publish embeddings to the "feature store", Twitter's shared feature repository
  • Enables ML teams throughout Twitter to easily discover, access, and utilize freshly trained embeddings
    ○ Easy offline & online access
    ○ Discovery through UX

SLIDE 37

1. Team and Product
2. What's an Embedding?
3. Why Embeddings?
4. Embeddings Pipeline
5. What's Next

Agenda

SLIDE 38

What's Next?

  • New embedding learning algorithms
  • Increasing number of datasets available as embeddings
  • Large scale approximate nearest neighbor (ANN) solution
  • Further exploration with embeddings as means for feature compression
SLIDE 39

@tayal_abhishek

Thank you

September, 2018

SLIDE 40

Abhishek Tayal @tayal_abhishek

We are Hiring! #TwitterCortex #MLX
