
TVM for Ads Ranking @ Facebook, by Hao Lu, Ansha Yu, Yinghai Lu, Andrew Tulloch - PowerPoint PPT Presentation


  1. TVM for Ads Ranking @ Facebook. Hao Lu, Ansha Yu, Yinghai Lu, Andrew Tulloch

  2. Ads Ranking at Facebook. [Diagram: ads 1 through n are evaluated by models 1 through k; a single model (model 2) runs two batches, batch 1 and batch 2, at the same time; the models output predictions.]

  3. Ads Ranking at Facebook: Production Requirements
     • Model evaluations execute in parallel
     • Each model runs on a single thread
     • For each model, multiple batches can execute at the same time. In this case, weights are global and shared between threads, but activations are thread-local
     • Model weights are refreshed every few hours. Therefore, activations need to be released at the end of each inference to avoid running out of memory
     • Batch size is dynamic
     • C++ only
     • Multiple CPU architectures: avx512, avx2
     [Diagram repeated from slide 2: ads, models, batches, predictions.]
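The threading requirements on this slide can be illustrated with a small sketch. It is written in Python only for brevity (the slide states that production is C++ only), and the names `SharedModel`, `infer`, and `run_batches` are hypothetical, not taken from the presentation. It assumes a toy model that is a single dot product per input row. The weights are held once and read by every thread, each thread gets its own activation buffer sized to its batch, and the buffer is released when the inference ends.

```python
import threading


class SharedModel:
    """Toy model: weights are shared by all threads, activations are per thread."""

    def __init__(self, weights):
        self.weights = tuple(weights)    # read-only, shared across threads
        self._local = threading.local()  # each thread sees its own attributes

    def infer(self, batch):
        # Batch size is dynamic, so the activation buffer is sized per call.
        self._local.activations = [0.0] * len(batch)
        try:
            for i, row in enumerate(batch):
                self._local.activations[i] = sum(
                    w * x for w, x in zip(self.weights, row)
                )
            return list(self._local.activations)
        finally:
            # Release activations at the end of each inference.
            del self._local.activations

    def has_live_activations(self):
        """True if the calling thread still holds an activation buffer."""
        return hasattr(self._local, "activations")


def run_batches(model, batches):
    """Evaluate each batch on its own thread and collect results by index."""
    results = [None] * len(batches)

    def worker(idx):
        results[idx] = model.infer(batches[idx])

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(len(batches))
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For example, `run_batches(SharedModel([1.0, 2.0]), [[[1.0, 1.0]], [[2.0, 0.0], [0.0, 3.0]]])` evaluates a batch of one row and a batch of two rows concurrently against the same weights.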

  4. Model Architecture. MLP: multilayer perceptron (a sequence of FC layers, each followed by an activation function). [Diagram labels: TVM, EMB.] Source: https://ai.facebook.com/blog/dlrm-an-advanced-open-source-deep-learning-recommendation-model/
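The slide's definition of an MLP as a sequence of FC layers plus activation functions can be written out as a minimal sketch. This is plain Python with no dependencies; the function names are hypothetical, and the choice of ReLU for every layer is an assumption, since the slide does not name the activation function.

```python
def relu(v):
    """Element-wise ReLU (assumed activation; the slide does not specify one)."""
    return [max(0.0, x) for x in v]


def fully_connected(x, weight, bias):
    """Compute y = W x + b, with weight given as a list of output rows."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weight, bias)
    ]


def mlp_forward(x, layers):
    """Apply each (weight, bias) FC layer followed by the activation."""
    for weight, bias in layers:
        x = relu(fully_connected(x, weight, bias))
    return x
```

Each entry in `layers` is a `(weight, bias)` pair, so a two-layer MLP mapping 2 inputs to 2 hidden units to 1 output is a list of two such pairs.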

  5. Ads Ranking Models
     Implementation
     • JIT (not AOT), because models are updated periodically
     • Graph runtime does not manage memory
       • Weights are shared between threads for the same model
       • Activations are shared by instances of all graph runtimes
       • Activations are released after each iteration to avoid OOM
     Performance
     • Use MKL for FC for simplicity
     • 5-10% speedup from fusion
     • Runtime overhead eats into the speedup
     [Diagram: dense features + embeddings from Caffe2 (batch_size x) feed separate graph runtime instances for batch_size 1, batch_size 2, ..., batch_size n, which produce the prediction.]
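To make the "speedup from fusion" bullet concrete, the sketch below contrasts an unfused FC + bias + ReLU, which materializes two intermediate lists, with a fused version that produces each output in a single pass. This is only an illustration of what operator fusion means; it is not TVM's fusion pass, it does not use MKL, and the function names are hypothetical.

```python
def fc_relu_unfused(x, weight, bias):
    """Three separate passes, each materializing an intermediate list."""
    matmul = [sum(w * xi for w, xi in zip(row, x)) for row in weight]
    biased = [m + b for m, b in zip(matmul, bias)]
    return [max(0.0, v) for v in biased]


def fc_relu_fused(x, weight, bias):
    """One pass: bias add and ReLU are applied as each output is produced."""
    out = []
    for row, b in zip(weight, bias):
        acc = 0.0
        for w, xi in zip(row, x):
            acc += w * xi
        acc += b
        out.append(acc if acc > 0.0 else 0.0)
    return out
```

Both functions compute the same values; the fused form avoids writing and re-reading the intermediate buffers, which is where a compiler-generated fused kernel typically saves time.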

  6. What's Next
     Relay VM
     • Handles dynamic shapes
     • JIT compilation
     • Dynamic memory allocation
     Performance
     • Autotuning at scale
     • FBGEMM for fp16 and int8
     • Embedding lookup
