

SLIDE 1

CS 744: PARAMETER SERVERS

Shivaram Venkataraman Fall 2019

SLIDE 2

ADMINISTRIVIA

  • Assignment 2 is out!
  • Course project groups due Oct 7
  • Introductions due Oct 17
SLIDE 3

Course stack (top to bottom):
  • Applications: Machine Learning, SQL, Streaming, Graph
  • Computational Engines
  • Scalable Storage Systems
  • Resource Management
  • Datacenter Architecture

SLIDE 4

Machine Learning so far: Bismarck
  • Supervised learning
  • Unified interface
  • Shared memory; model fits in memory

SLIDE 5

MOTIVATION

  • Large training data: 1 TB to 1 PB
  • Models with 10^9 to 10^12 parameters
  • Goals

    – Efficient communication
    – Flexible synchronization
    – Elastic scalability
    – Fault tolerance and durability

SLIDE 6

EXAMPLE WORKLOAD

Ad Click Prediction

  • Trillions of clicks per day
  • Very sparse feature vectors

Computation flow
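The computation-flow diagram is not reproduced here; in outline, each worker pulls only the weights for the features that appear in its minibatch (exploiting sparsity), computes a local gradient, and pushes it back to the servers for aggregation. Below is a minimal, self-contained sketch of that loop for sparse logistic regression; `ToyParameterServer` and `worker_step` are illustrative names, and the in-process dict stands in for a real sharded server group.

```python
import math
from collections import defaultdict

class ToyParameterServer:
    """In-process stand-in for the server group: keys are feature IDs,
    values are weights. Real servers shard this key space across machines."""
    def __init__(self, lr=0.1):
        self.weights = defaultdict(float)
        self.lr = lr

    def pull(self, keys):
        # Return only the requested (sparse) subset of the model.
        return {k: self.weights[k] for k in keys}

    def push(self, grads):
        # Server-side update rule (plain SGD here) applied on receipt.
        for k, g in grads.items():
            self.weights[k] -= self.lr * g

def worker_step(ps, minibatch):
    """One iteration of the computation flow: pull -> compute -> push.
    Each example is (dict mapping feature ID -> value, label in {0, 1})."""
    active = {fid for x, _ in minibatch for fid in x}  # sparse working set
    w = ps.pull(active)                                # fetch only needed keys
    grads = defaultdict(float)
    for x, y in minibatch:
        z = sum(w[f] * v for f, v in x.items())
        p = 1.0 / (1.0 + math.exp(-z))                 # logistic prediction
        for f, v in x.items():
            grads[f] += (p - y) * v                    # sparse logistic-loss gradient
    ps.push(grads)

ps = ToyParameterServer()
worker_step(ps, [({0: 1.0, 7: 2.0}, 1), ({3: 1.0}, 0)])
print(dict(ps.weights))
```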

SLIDE 7

ARCHITECTURE

SLIDE 8

REPRESENTATION

  • Key-value pairs, e.g., (featureID, weight)
  • Keys are assumed to be ordered, making it easier to apply linear algebra operations

  • Interface supports range push and pull: w.push(R, dest) sends the entries of w whose keys fall in range R to dest (sketched below)

  • Support for user-defined functions on server-side
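A rough illustration of the range interface (names and the dict-based "transport" are hypothetical, not the paper's actual implementation):

```python
import bisect

class RangedVector:
    """Sketch of the (key, value) vector abstraction with ordered keys.
    Method names mirror the slide's w.push(R, dest); the network is a dict."""
    def __init__(self, keys, values):
        self.keys = sorted(keys)                 # ordered keys enable range ops
        self.values = dict(zip(keys, values))

    def _in_range(self, key_range):
        lo, hi = key_range                       # half-open range [lo, hi)
        i = bisect.bisect_left(self.keys, lo)
        j = bisect.bisect_left(self.keys, hi)
        return self.keys[i:j]

    def push(self, key_range, dest):
        # Send every existing (key, value) entry in [lo, hi) to dest.
        for k in self._in_range(key_range):
            dest[k] = self.values[k]

    def pull(self, key_range, src):
        # Fetch values for keys in [lo, hi) from src.
        for k in self._in_range(key_range):
            self.values[k] = src.get(k, 0.0)

w = RangedVector([1, 4, 9, 16], [0.1, 0.2, 0.3, 0.4])
server = {}
w.push((1, 10), server)   # sends keys 1, 4, 9
print(server)             # {1: 0.1, 4: 0.2, 9: 0.3}
```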
SLIDE 9

TASK DEPENDENCY
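The dependency diagram itself is not reproduced here. In the paper, every push and pull is issued as an asynchronous task, and a caller can declare that a task run only after earlier tasks finish; that execute-after-finished dependency is the hook for the consistency models on the next slide. A minimal sketch using Python futures (all names illustrative):

```python
from concurrent.futures import ThreadPoolExecutor, wait

pool = ThreadPoolExecutor(max_workers=4)

def run_task(name, deps=()):
    """Run `name` asynchronously, but only after every task in `deps`
    has finished, mimicking execute-after-finished task dependencies."""
    def body():
        wait(list(deps))            # block until all dependencies complete
        return name
    return pool.submit(body)

t10 = run_task("iter_10")
t11 = run_task("iter_11", deps=[t10])   # iter_11 depends on iter_10
t12 = run_task("iter_12")               # independent: may run concurrently
print(t11.result(), t12.result())
```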

SLIDE 10

CONSISTENCY MODELS

User-defined filters:
  • Significantly-modified filter
  • KKT filter
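For instance, the significantly-modified filter only transmits entries whose value changed by more than a threshold since the last synchronization, cutting network traffic. A minimal sketch (the threshold and function name are illustrative):

```python
def significantly_modified(prev, curr, threshold=1e-3):
    """Keep only entries that moved by more than `threshold` since the
    last sync; effectively-unchanged entries are never sent."""
    return {k: v for k, v in curr.items()
            if abs(v - prev.get(k, 0.0)) > threshold}

prev = {0: 0.50, 1: 0.10, 2: -0.30}
curr = {0: 0.5004, 1: 0.25, 2: -0.30}
print(significantly_modified(prev, curr))   # {1: 0.25}; keys 0 and 2 filtered out
```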

SLIDE 11

IMPLEMENTATION: VECTOR CLOCKS
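The slide's figure is omitted. The idea in the paper: a naive vector clock with one timestamp per (node, key) pair is too large for 10^9+ keys, but range push/pull stamps many consecutive keys with the same time, so one clock entry can cover a whole key range, splitting only when an operation partially overlaps it. A minimal sketch of one node's clock (names illustrative):

```python
class RangeVectorClock:
    """One node's clock entries as disjoint, sorted (lo, hi, time) ranges
    covering the key space; a whole range shares one timestamp."""
    def __init__(self, lo, hi):
        self.ranges = [(lo, hi, 0)]

    def update(self, lo, hi, t):
        """Set the clock of [lo, hi) to t, splitting any partially
        overlapping ranges so untouched keys keep their old time."""
        out = []
        for rlo, rhi, rt in self.ranges:
            if rhi <= lo or rlo >= hi:          # no overlap: keep unchanged
                out.append((rlo, rhi, rt))
                continue
            if rlo < lo:                        # left remainder keeps old time
                out.append((rlo, lo, rt))
            out.append((max(rlo, lo), min(rhi, hi), t))
            if rhi > hi:                        # right remainder keeps old time
                out.append((hi, rhi, rt))
        self.ranges = sorted(out)

vc = RangeVectorClock(0, 100)
vc.update(10, 20, 3)         # only the touched span splits off
print(vc.ranges)             # [(0, 10, 0), (10, 20, 3), (20, 100, 0)]
```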

SLIDE 12

IMPLEMENTATION

Key Caching

  • Worker might send the same key lists again
  • Receiving node caches the key lists
  • Sender only needs to send a hash of the list

Value Compression

  • Lots of repeated values, zeros
  • Use Snappy to compress messages
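A minimal sketch of both optimizations together; hashlib/pickle are stand-ins for the real wire format, and zlib stands in for the Snappy codec the slide mentions:

```python
import hashlib, pickle, zlib

def key_signature(keys):
    """Short digest of a key list; after the first message, the sender
    transmits this hash instead of re-sending the identical key list."""
    return hashlib.sha1(pickle.dumps(keys)).hexdigest()

class Receiver:
    def __init__(self):
        self.key_cache = {}                    # signature -> cached key list

    def receive(self, sig, keys, payload):
        if keys is None:
            keys = self.key_cache[sig]         # cache hit: reuse cached keys
        else:
            self.key_cache[sig] = keys         # first message: remember them
        values = pickle.loads(zlib.decompress(payload))  # zlib stands in for Snappy
        return dict(zip(keys, values))

recv = Receiver()
keys = [3, 17, 42]
payload = zlib.compress(pickle.dumps([0.1, 0.0, 0.0]))  # runs of zeros compress well
sig = key_signature(keys)
recv.receive(sig, keys, payload)          # first push carries the key list
print(recv.receive(sig, None, payload))   # later pushes carry only the hash
```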
SLIDE 13

IMPLEMENTATION: REPLICATION

Replication after aggregation
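A minimal sketch of the idea: the master delays replication until all workers' pushes for a key have been aggregated, so it replicates one combined update instead of one per worker (the class name and the SGD update rule are illustrative):

```python
class ServerShard:
    """Master for a key range; `slaves` are plain dicts standing in for
    replica shards on other server nodes."""
    def __init__(self, slaves, num_workers, lr=0.1):
        self.slaves = slaves
        self.num_workers = num_workers
        self.lr = lr
        self.pending = {}                 # key -> (gradient sum, #pushes seen)
        self.weights = {}

    def push(self, key, grad):
        s, c = self.pending.get(key, (0.0, 0))
        s, c = s + grad, c + 1
        if c < self.num_workers:          # still waiting on other workers
            self.pending[key] = (s, c)
            return
        del self.pending[key]
        self.weights[key] = self.weights.get(key, 0.0) - self.lr * s
        for slave in self.slaves:         # one replication message, not one per push
            slave[key] = self.weights[key]

slaves = [{}, {}]
srv = ServerShard(slaves, num_workers=2)
srv.push(7, 0.5)
srv.push(7, 0.3)          # second push completes aggregation, then replicates
print(srv.weights, slaves)
```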

SLIDE 14

FAULT TOLERANCE

  • 1. The server manager assigns the new node a key range to serve as master (sketched below).
  • 2. The node fetches the range of data to maintain as master, plus k additional ranges to keep as slave.
  • 3. The server manager broadcasts the node changes; the recipients of the message may shrink their own data.
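A minimal sketch of step 1's range assignment via a consistent-hashing ring (positions and names are illustrative; the real system also hands out the k slave ranges):

```python
import bisect

class ServerManager:
    """Servers sit at points on a ring; each is master for the key range
    from its point up to the next server's point (wrapping around)."""
    def __init__(self):
        self.points = []          # sorted ring positions
        self.owners = {}          # position -> node id

    def add_node(self, node, position):
        bisect.insort(self.points, position)
        self.owners[position] = node
        # The new node's master range is [position, next point); the previous
        # owner of that span shrinks its data (step 3's "recipients may shrink").
        i = self.points.index(position)
        return position, self.points[(i + 1) % len(self.points)]

mgr = ServerManager()
mgr.add_node("S1", 0)
mgr.add_node("S2", 50)
print(mgr.add_node("S3", 25))   # S3 becomes master for [25, 50); S1 shrinks
```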

SLIDE 15

SPARSE LOGISTIC REGRESSION (LR)

SLIDE 16

DISCUSSION

https://forms.gle/35vrxyG6WLmSvCs38

SLIDE 17

What are some of the downsides of using PS compared to implementing Gradient Descent in Bismarck / Spark?

SLIDE 18
SLIDE 19

How would you integrate PS with a resource manager like Mesos? What would be some of the challenges?

SLIDE 20

NEXT STEPS

  • Next class: TensorFlow
  • Assignment 2 is out!
  • Course project deadlines: Oct 7 (topics), Oct 17 (proposals)