

  1. A Massively Scalable Architecture for Learning Representations from Heterogeneous Graphs. NVIDIA GPU Technology Conference 2019 - San Jose, CA. C. Bayan Bruss, Anish Khazane

  2. TODAY'S TALK: 1. Overview & Background. 2. Our Approach: how to handle heterogeneity in training large graph embedding models. 3. Results

  3. Who we are: Bayan Bruss, Anish Khazane

  4. SECTION ONE: OVERVIEW. A quick background on graph embeddings and some of the issues related to scaling them

  5. People can be disproportionately attracted to content that is sensational or provocative.

  6. Machine learning systems that learn how to serve content are prone to optimizing towards these types of content.

  7. Some common problems and solutions: 1. If it is a problem with content (spam, violent, racist, homophobic, etc.) -> flag & demote content that is deemed objectionable. 2. If it is a problem with users (fake accounts, malicious actors) -> eliminate fraudulent accounts.

  8. What's missing?

  9. Basic mechanics of a neural network recommender [slides 9-15 step through a diagram: a "User" node and an "Article" node, with an observed "Clicks On" edge between them and a "Recommended to" edge produced by the model]

  16. How can we add more fidelity to these models? 1. Treat heterogeneous graphs as containing distinct element types. 2. Model interactions depending on what type of entity is involved.

  17. A brief history of graph embeddings. Most common objective: learn a continuous vector for each node in a graph that preserves some local or global topological features about the neighborhood of that node. Early efforts focused on explicit matrix factorization: not very scalable, and highly tuned to specific topological attributes.

  18. Meanwhile, over in the language modeling world, Word2Vec blows things open. Mikolov, T., et al. "Distributed representations of words and phrases and their compositionality." Advances in Neural Information Processing Systems, 2013. Bengio, Y., Ducharme, R., Vincent, P., and Jauvin, C. "A neural probabilistic language model." Journal of Machine Learning Research, 2003.

  19. Quickly ported to graph embeddings: walks on a graph can be likened to sentences in a document [slides 19-26 animate random walks over a toy graph with nodes A-F, yielding walks such as ["D", "B", "A", "F"] and ["F", "C", "F", "E"]]

  27. Walks on graphs can be treated as sentences: ["D", "B", "A", "F"], ["F", "C", "F", "E"]
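To make the walks-as-sentences idea concrete, here is a minimal sketch (not the talk's code) that generates uniform random walks over the toy A-F graph from the slides; the exact edge set is assumed for illustration.

```python
import random

# Adjacency list for the slides' toy graph; the edge set is assumed.
graph = {
    "A": ["B", "C", "F"],
    "B": ["A", "D"],
    "C": ["A", "F"],
    "D": ["B"],
    "E": ["F"],
    "F": ["A", "C", "E"],
}

def random_walk(graph, start, length):
    """Uniform random walk of `length` nodes; reads like a short sentence."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(graph[walk[-1]]))
    return walk

# One walk per node, analogous to sampling sentences from a corpus.
corpus = [random_walk(graph, node, 4) for node in graph]
print(corpus)  # e.g. [["D", "B", "A", "F"], ["F", "C", "F", "E"], ...]
```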

  28. Graphs are different from language [slides 28-30 illustrate the differences visually]

  31. Graphs can be heterogeneous [diagram: a bipartite graph of NBA teams (Cavs, Heat, Lakers, Thunder, Rockets, Warriors) and players (Dion Waiters, LeBron James, JaVale McGee, Kevin Durant, James Harden, Steph Curry)]

  32. All this makes scale an even bigger challenge

  33. Homogeneous graphs are difficult. Dimensionality: millions or even billions of nodes. Sparsity: each node only interacts with a small subset of other nodes.

  34. Quickly hit limits on all resources: 1) An embedding space is an N x D matrix where each row corresponds to a node. 2) D is typically 100-200 (an arbitrary hyperparameter). 3) A 500M-node graph would take 200-400 GB. 4) That cannot be held in GPU memory. 5) It quickly exceeds the limits of a single worker. 6) Lots of small vector multiplications, ideal for GPUs. 7) Because of connectedness, sharding the matrix is challenging.
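The memory estimate on this slide is easy to verify; a quick back-of-the-envelope check, assuming float32 (4-byte) entries:

```python
# Back-of-the-envelope check of the slide's memory estimate,
# assuming float32 (4 bytes per entry).
n_nodes = 500_000_000            # 500M-node graph
bytes_per_entry = 4
for d in (100, 200):             # typical embedding dimension D
    gb = n_nodes * d * bytes_per_entry / 1e9
    print(f"D={d}: {gb:.0f} GB")  # D=100: 200 GB, D=200: 400 GB
```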

  35. Heterogeneous graphs are even harder. Have to keep K possible embedding spaces, with N nodes for each. Have to have an architecture that routes each lookup to the right embedding space (see the sketch below).
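A minimal sketch of such routing, assuming one in-memory NumPy table per node type; the type names and sizes here are hypothetical.

```python
import numpy as np

embedding_dim = 128
# One embedding matrix per node type: K spaces, each N_k x D.
# Type names and sizes are illustrative, not from the talk.
tables = {
    "user": np.random.randn(1_000, embedding_dim).astype(np.float32),
    "article": np.random.randn(5_000, embedding_dim).astype(np.float32),
}

def lookup(node_type, node_id):
    """Route a (type, id) pair to its type's table and return the row."""
    return tables[node_type][node_id]

vec = lookup("user", 42)  # 128-dim vector for user 42
```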

  36. It's also hard from an algorithmic perspective. We're working on this too, but it is not the focus of today's talk. See these interesting articles: "Metapath2Vec: Scalable Representation Learning for Heterogeneous Networks"; "CARL: Content-Aware Representation Learning for Heterogeneous Graphs".

  37. SECTION TWO: OUR APPROACH. Applied research: an architecture for handling heterogeneity at scale

  38. Quick primer on negative sampling. Original SkipGram model: need to compute a softmax over the entire vocabulary for each input.

  39. Quick primer on negative sampling. Original SkipGram model: VERY EXPENSIVE!

  40. The softmax can be approximated by a binary classification task. Original SkipGram model -> binary discriminator: w(t-2) vs. negative samples, w(t-1) vs. negative samples, w(t+1) vs. negative samples, w(t+2) vs. negative samples.
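A minimal NumPy sketch of this binary-classification view, not the talk's implementation: each (target, context) pair is scored toward label 1 with a sigmoid, and each (target, negative) pair toward label 0, avoiding a softmax over the full vocabulary.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_loss(target_vec, context_vec, negative_vecs):
    """Negative-sampling loss for one (target, context) pair.

    target_vec, context_vec: (D,) arrays; negative_vecs: (K, D) array.
    """
    pos = -np.log(sigmoid(context_vec @ target_vec))             # true pair -> 1
    neg = -np.sum(np.log(sigmoid(-negative_vecs @ target_vec)))  # fakes  -> 0
    return pos + neg
```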

  41. Use non-edges to generate negative samples [diagram: the A-F toy graph; the walks ["D", "B", "A", "F"] and ["F", "C", "F", "E"] give the context for B, while negatives for B are drawn from non-neighbors, e.g. ["F", "C", "F"]]
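A minimal sketch of negative generation from non-edges, using the same assumed toy graph: for a node, any node it shares no edge with is a candidate negative.

```python
import random

# Same assumed toy graph as above.
graph = {"A": ["B", "C", "F"], "B": ["A", "D"], "C": ["A", "F"],
         "D": ["B"], "E": ["F"], "F": ["A", "C", "E"]}

def sample_negatives(graph, node, k):
    """Draw k negatives from nodes that share no edge with `node`."""
    non_neighbors = [n for n in graph if n != node and n not in graph[node]]
    return [random.choice(non_neighbors) for _ in range(k)]

print(sample_negatives(graph, "B", 3))  # e.g. ["F", "C", "F"]
```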

  42. Walking on a heterogeneous graph [diagram: random walks over the team-player graph from slide 31]
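A minimal sketch of one way to walk such a graph, in the spirit of the metapath-style walks cited earlier: each hop crosses between the player and team node types. The roster edges below are illustrative.

```python
import random

# Bipartite player-team edges; rosters are illustrative and partial.
edges = {
    "LeBron James": ["Cavs", "Heat", "Lakers"],
    "Kevin Durant": ["Thunder", "Warriors"],
    "James Harden": ["Thunder", "Rockets"],
    "Cavs": ["LeBron James"],
    "Heat": ["LeBron James"],
    "Lakers": ["LeBron James"],
    "Thunder": ["Kevin Durant", "James Harden"],
    "Rockets": ["James Harden"],
    "Warriors": ["Kevin Durant"],
}

def typed_walk(start, length):
    """Random walk where every hop alternates player and team types."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(edges[walk[-1]]))
    return walk

print(typed_walk("Kevin Durant", 5))
# e.g. ["Kevin Durant", "Thunder", "James Harden", "Rockets", "James Harden"]
```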

  43. How to distribute (parallelize) training: 1. Split the training set across a number of workers that execute in parallel, asynchronously and unaware of each other's existence. 2. Create some form of centralized parameter repository that allows learning to be shared across all the workers.

  44. Parameter server partitioning: A parameter server can hold the embeddings table, which contains the vectors corresponding to each node in the graph. The embeddings table is an N x M table, where N is the number of nodes in the graph and M is a hyperparameter that denotes the number of embedding dimensions. (A sketch of this layout follows.)
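A minimal TensorFlow 1.x-style sketch of this layout, not the talk's code: the full N x M embeddings table lives on a parameter-server task, and each worker fetches only the rows it needs per batch. Cluster and session setup are omitted, and the sizes are illustrative.

```python
import tensorflow as tf  # TensorFlow 1.x API assumed

N, M = 50_000_000, 128   # nodes x embedding dimensions (illustrative)

# replica_device_setter places variables on ps tasks and ops on workers.
with tf.device(tf.train.replica_device_setter(ps_tasks=1)):
    embeddings = tf.get_variable(
        "node_embeddings", shape=[N, M],
        initializer=tf.random_uniform_initializer(-0.1, 0.1))
    node_ids = tf.placeholder(tf.int64, shape=[None])
    # Each worker pulls only the looked-up rows from the server.
    vectors = tf.nn.embedding_lookup(embeddings, node_ids)
```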

  45. Variable TensorFlow computational graphs [diagram]

  46. SECTION THREE: RESULTS

  47. Capital One heterogeneous data. Node type A: 18,856,021. Node type B: 32,107,404. Total nodes: 50,963,425. Edges: 280,422,628. Train time: 3 days on 28 workers.

  48. Friendster graph: a publicly available dataset with 68,349,466 vertices (users) and 2,586,147,869 edges (friendships). Sampled 80 positive and 5 x 80 = 400 negative edges per node as training data. The data was shuffled, split into chunks, and distributed across workers.

  49. Friendster graph [slides 49-50: result plots]

  51. Implications. Scalability: more nodes per entity type; more entity types. Convergence: faster as the number of workers increases.

  52. Limitations and future directions. Limitations: Python performance; not partitioning the embedding space; recomputing the computational graph for each batch could be optimized. Future directions: evaluate a C++ variant of the architecture; intelligently partition the graph so that each worker gets a component of the graph and only has to go to the server for the small subset of nodes in other components.

  53. THANK YOU
