A journey towards real-life results illustrated by using AI in Twitter's Timelines
Overview
- ML Workflows
- The Timelines Ranking case
- The power of the platform, opportunities
- Future
Deep Learning Workflows
- Pure Research
  ○ Model Exploration
- Applied Research
  ○ Dataset/Feature Exploration
  ○ Model Exploration
- Production
  ○ Feature Addition
  ○ Data Addition
  ○ Training
  ○ Deployment
  ○ A/B test
Deep Learning Workflows
- Pure Research
  ○ Model Exploration + Training → Very flexible modeling framework
- Applied Research
  ○ Dataset/Feature Exploration → Flexible data exploration framework
  ○ Model Exploration + Training → Flexible modeling framework
- Production
  ○ Feature Addition → Scalable data manipulation framework
  ○ Data Addition → Scalable data manipulation framework
  ○ Training → Fast, robust training engine
  ○ Deployment → Seamless and tested ML services
  ○ A/B test → Good A/B test environment
Deep Learning Workflows
[Diagram: the three workflow families -- Pure Research, Applied Research, Production]
PRODUCTION
- Model architecture doesn't matter (anymore)
- Large-scale data manipulation matters
- Fast training matters
- Ease of deployment matters
- Testing matters!!!
  ○ Training vs. online
  ○ Continuous integration
Data First Workflow
Case Study: Timelines Ranking (Blog Post @TwitterEng)
- Sparse features
- A few billion data samples
- Low latency
- Pipeline: candidate generation → heavy model → sort → publish
- Before: decision trees + other sparse techniques
- Probability prediction
Timelines Ranking: New Modules
Sparse Linear Layer
[Diagram: sparse input features V1 … Vn with example ranges -- is_vit {0,1}, has_image {0,1}, engagement_ratio [0,+∞), days_since [0,+∞), bama_word {0,1} -- feeding an output layer through activation F = Sigmoid/ReLU/PReLU/…]
N_j = F(Σ_i W_i,j * norm(V_i) + B_j)
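The point of the layer is that only the weight rows of the features actually present in a sample are touched. As a rough illustration of the equation above, here is a minimal PyTorch sketch; the class name, the padded `(ids, values)` batch layout, and the choice of ReLU are assumptions, not the production implementation:

```python
import torch
import torch.nn as nn

class SparseLinear(nn.Module):
    """Sketch of N_j = F(Σ_i W_i,j * norm(V_i) + B_j) over sparse inputs:
    only the weight rows of the active feature ids are read."""

    def __init__(self, num_features: int, output_size: int):
        super().__init__()
        self.weight = nn.Embedding(num_features, output_size)  # W: one row per feature id
        self.bias = nn.Parameter(torch.zeros(output_size))     # B

    def forward(self, ids: torch.Tensor, values: torch.Tensor) -> torch.Tensor:
        # ids, values: (batch, max_active); zero-padded values make padded ids
        # contribute nothing. norm() is assumed to be applied upstream.
        rows = self.weight(ids)                             # (batch, max_active, output_size)
        pre_act = (rows * values.unsqueeze(-1)).sum(dim=1)  # Σ_i W_i,j * norm(V_i)
        return torch.relu(pre_act + self.bias)              # F = ReLU (could be Sigmoid/PReLU)
```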
Sparse Linear Layer: Online Normalization
- Example: input feature value == 1M
  ⇒ weight_gradient == 1M ⇒ update == 1M * learning_rate ⇒ explosion
- Solution: normalization of input values
  norm(V_i) == V_i / max(all_abs_V_i) + b_i
  ○ norm(V_i) belongs to [-1, 1]
  ○ b_i is a trainable per-feature bias, to discriminate absence from presence of a feature
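A minimal sketch of this online normalization, again assuming padded `(ids, values)` batches; the module name and the running-max update are illustrative, not the production code:

```python
import torch
import torch.nn as nn

class OnlineMaxNorm(nn.Module):
    """Sketch: norm(V_i) = V_i / max(all_abs_V_i) + b_i, with the per-feature
    max tracked online during training, and b_i a trainable bias that lets the
    net tell an absent feature (exactly 0) apart from a present one."""

    def __init__(self, num_features: int):
        super().__init__()
        self.register_buffer("running_max", torch.ones(num_features))  # max |V_i| seen so far
        self.bias = nn.Parameter(torch.zeros(num_features))            # b_i

    def forward(self, ids: torch.Tensor, values: torch.Tensor) -> torch.Tensor:
        if self.training:  # update the per-feature running max with this batch
            self.running_max.scatter_reduce_(
                0, ids.reshape(-1), values.detach().abs().reshape(-1), reduce="amax")
        # V_i / max|V_i| lands in [-1, 1]: a huge raw value (e.g. 1M) no longer
        # produces a huge gradient and update
        return values / self.running_max[ids] + self.bias[ids]
```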
Sparse Linear Layer: Speedups (CPU -- i7 3790k)
Forward pass -- ~500 features -- output size == 50

Batch Size | vs PyTorch (1 thread) | vs TensorFlow (1 thread) | vs PyTorch (4 threads) | vs TF* (4 threads)
1          | 2.1x                  | 4.1x                     | 2.8x                   | 5.5x
16         | 1.7x                  | 1.7x                     | 4.6x                   | 4.3x
64         | 1.7x                  | 1.3x                     | 5.1x                   | 3.8x
256        | 1.8x                  | 1.2x                     | 5.6x                   | 3.5x
Sparse Linear Layer: Speedups (GPU -- Tesla M40 -- CUDA 7.5)
Forward pass -- ~500 features -- output size == 50

Batch Size | vs cuSparse
1          | 0.7x
16         | 4.4x
64         | 5.2x
256        | 2x
Split Nets
[Diagram: input features V1 … Vn -- has_image {0,1}, has_link {0,1}, engagement_ratio [0,+∞), days_since [0,+∞), bama_word {0,1} -- routed by type into separate subnets: SPLIT NET 1 (tweet binary features) … SPLIT NET K (engagement features)]
Split Nets
[Diagram: glue all split nets into a unique deep net (N*K neurons)]
Prevent overfitting -- Split by feature type
- Send "dense" features on one side (see the sketch after this list):
  ○ BINARY
  ○ CONTINUOUS
  ○ (SPARSE_CONTINUOUS)
- "Sparse" features on the other side:
  ○ DISCRETE
  ○ STRING
  ○ SPARSE_BINARY
  ○ (SPARSE_CONTINUOUS)
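A minimal sketch of the split-by-feature-type idea: each feature group gets its own small subnet, and their outputs are glued (concatenated) into one net on top. Group names, sizes, and the single-logit head are illustrative assumptions:

```python
import torch
import torch.nn as nn

class SplitNets(nn.Module):
    """Sketch: one subnet per feature group, glued by concatenation,
    so dense and sparse feature types do not share first-layer weights."""

    def __init__(self, group_sizes: dict[str, int], hidden: int = 32):
        super().__init__()
        self.splits = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(size, hidden), nn.ReLU())
            for name, size in group_sizes.items()
        })
        self.glue = nn.Linear(hidden * len(group_sizes), 1)  # the unique deep net on top

    def forward(self, groups: dict[str, torch.Tensor]) -> torch.Tensor:
        parts = [self.splits[name](groups[name]) for name in self.splits]
        return self.glue(torch.cat(parts, dim=-1))

# e.g. tweet binary features on one side, engagement features on the other
net = SplitNets({"tweet_binary": 10, "engagement": 4})
```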
Sampling -- Calibration
- Sample the training data according to a target positive ratio P
- The output average probability == P, not the true positive rate ⇒ need calibration
- Use isotonic calibration
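Because negatives are downsampled, the model's average output tracks the sampled ratio P rather than the true engagement rate. A minimal sketch of the fix using scikit-learn's IsotonicRegression (the scores and labels below are toy placeholders; in practice they come from a held-out, unsampled set):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

scores = np.array([0.10, 0.35, 0.40, 0.80, 0.90])  # raw model probabilities (toy data)
labels = np.array([0, 1, 0, 1, 1])                 # true engagement outcomes (toy data)

# Fit a monotone mapping from raw score to calibrated probability:
# ranking is preserved, the probability scale is corrected.
iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
iso.fit(scores, labels)
calibrated = iso.predict(scores)
```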
Feature Discretization
Intuition
- Max normalization is good to avoid explosions, BUT
- per-aggregate-feature min/max ranges are much larger,
- so max normalization generates very small input feature values,
- and the deep net has tremendous trouble learning on such small values
- Std/mean normalization? Better, but still not satisfying
Solution
- Discretization
Discretization
- Example: feature id == 10
- Over the entire dataset, compute equal-sized bins and assign each value a bin_id
- At inference time, for a key/value pair (id, value):
  ○ id → bin_id
  ○ value → 1
- Other possibilities: decision trees, ...
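A minimal sketch of such a discretizer, reading "equal-sized" as equal-frequency (quantile) bins; `num_bins` and the helper names are illustrative:

```python
import numpy as np

def fit_bins(values: np.ndarray, num_bins: int = 20) -> np.ndarray:
    """Compute bin edges over the entire dataset so each bin receives
    roughly the same number of samples."""
    return np.quantile(values, np.linspace(0.0, 1.0, num_bins + 1)[1:-1])

def discretize(value: float, edges: np.ndarray) -> int:
    """Inference: map a raw value to its bin_id; the value itself becomes 1."""
    return int(np.searchsorted(edges, value))

# e.g. a heavy-tailed aggregate feature (feature id == 10)
edges = fit_bins(np.random.exponential(scale=100.0, size=100_000))
bin_id = discretize(3.5, edges)  # the pair (10, 3.5) becomes (bin_id, 1)
```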
Final simplest architecture
1) Discretizer(s)
2) Sparse Layer with online normalization
3) MLP
4) Prediction
5) Isotonic Calibration
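Putting the pieces together, a compact self-contained sketch of steps 2-4 (step 1, the discretizer, runs upstream on the raw features; step 5, isotonic calibration, is fit on held-out data after training). All names and sizes are illustrative:

```python
import torch
import torch.nn as nn

class RankingNet(nn.Module):
    """Sketch: sparse layer (with normalization pre-applied) -> MLP -> probability."""

    def __init__(self, num_features: int, width: int = 50, hidden: int = 128):
        super().__init__()
        self.weight = nn.Embedding(num_features, width)  # 2) sparse linear layer
        self.bias = nn.Parameter(torch.zeros(width))
        self.mlp = nn.Sequential(                        # 3) MLP
            nn.Linear(width, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, ids: torch.Tensor, normed_values: torch.Tensor) -> torch.Tensor:
        # ids/normed_values: (batch, max_active), already discretized and normalized
        h = torch.relu((self.weight(ids) * normed_values.unsqueeze(-1)).sum(1) + self.bias)
        return torch.sigmoid(self.mlp(h)).squeeze(-1)    # 4) probability prediction
```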
The power of the platform
- Testing
- Tracking
- Automation
- Robustness
- Standardization
- Speed
- Workflow
- Examples
- Support
- Easy Multimodal (Text + media + sparse + …)
The power of the platform
- How to train all this?
  ○ Train the discretizer
  ○ Train the deep net
  ○ Calibrate the probabilities
  ○ Validate
  ○ ...
- Training loop + ML scheduler → one-liner
- Unique serialization format for params
The power of the platform
- How to deploy all this?
- Tight Twitter infra integration + saved model → one-liner deployment
- Arbitrary number of instances
- All the goodies from Twitter services infra!
- Seamless
The power of the platform
- How to test all this?
- Model offline validation → PredictionSet A
- Model online prediction → PredictionSet B
- PredictionSet A == PredictionSet B ??
- Yes → ready to ship
- Continuous integration
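A minimal sketch of that parity gate (tolerance and names are illustrative): the same inputs, scored through the offline validation path and through the deployed service, must agree before shipping.

```python
import numpy as np

def ready_to_ship(pred_a: np.ndarray, pred_b: np.ndarray, tol: float = 1e-6) -> bool:
    """PredictionSet A (offline validation) must match PredictionSet B
    (the online service replayed on the same inputs) up to numerical noise."""
    return bool(np.allclose(pred_a, pred_b, atol=tol))

offline = np.array([0.12, 0.87, 0.45])  # PredictionSet A (toy data)
online = np.array([0.12, 0.87, 0.45])   # PredictionSet B (toy data)
assert ready_to_ship(offline, online), "offline/online mismatch -- do not ship"
```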
Future
In a single platform:
- Abstract DAG of:
  ○ Services
  ○ Storages
  ○ Datasets
  ○ ...
- Model dependency handling
- Offline/Online feature mapping
- Coverage for all the workflows
- Bundling
- … Cloud?
Future of DL platforms
DAG of services
[Diagram: Models A-D linked through caches and storages, serving Timelines, Recommendations, ...]