CS 744: BiSMARCK, Shivaram Venkataraman, Fall 2019


slide-1
SLIDE 1

CS 744: BiSMARCK

Shivaram Venkataraman Fall 2019

slide-2
SLIDE 2

ADMINISTRIVIA

  • Assignment 2 out!
  • Project groups extension
  • OH / Setup meeting by email
slide-3
SLIDE 3

COURSE PROJECT PROPOSAL

  • Propose topic, group (2 sentences): Oct 7
  • Project Proposal (2 pages): Oct 17
      • Introduction
      • Related Work
      • Timeline (with eval plan)

slide-4
SLIDE 4

WRITING AN INTRODUCTION

  • 1-2 paras: What problem are you solving, and why is it important? (needs citations)
  • 1-2 paras: How do other people solve it, and why do they fall short?
  • 1-2 paras: How do you plan to solve it, and why is your approach better?
  • 1 para: Anticipated results, or what experiments you will use

slide-5
SLIDE 5

WRITING RELATED WORK

  • Group related work into two or three buckets (1-2 paras per bucket)
  • Explain what the papers / projects do
  • Explain why they are different / insufficient

slide-6
SLIDE 6

  • Scalable Storage Systems
  • Datacenter Architecture
  • Resource Management
  • Computational Engines
  • Machine Learning
  • SQL
  • Streaming
  • Graph
  • Applications

slide-7
SLIDE 7

MACHINE LEARNING

  • Classification
  • Recommendation

slide-8
SLIDE 8

Optimization

  • Function
  • Data (Examples)
  • Model
  • Regularization
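The four labels on this slide annotate a single objective. Assuming the usual regularized empirical-risk form (the symbols w, z_i, f, P, and N are notation chosen here for illustration, not copied from the slide), it can be written as:

```latex
\min_{w} \; \sum_{i=1}^{N} f(w, z_i) + P(w)
```

Here f is the function (loss) evaluated per example, z_i are the data examples, w is the model, and P is the regularization term.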

slide-9
SLIDE 9

Convex Optimization

What is convex?

  • Linear Regression, Linear SVM
  • Kernel SVMs, Logistic Regression

What is not convex?

  • Graph mining, Deep Learning
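For reference, the standard definition of a convex function (not shown on the slide; the symbols f, x, y, and θ are notation chosen here) is:

```latex
f\big(\theta x + (1 - \theta) y\big) \le \theta f(x) + (1 - \theta) f(y)
\quad \text{for all } x, y \text{ and all } \theta \in [0, 1]
```

For such functions every local minimum is also a global minimum, which is what makes gradient-based methods a natural fit for the convex examples listed above.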

slide-10
SLIDE 10

Gradient Descent

Initialize w
For many iterations:
    Compute Gradient
    Update model
End
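The loop above can be sketched in a few lines of Python. This is a minimal illustration rather than code from the lecture: the least-squares objective, the toy data, the step size, and the names `gradient` and `gradient_descent` are all assumptions made for the example.

```python
def gradient(w, data):
    """Gradient of the mean squared error for the model y ~ w[0] * x + w[1]."""
    n = len(data)
    g_slope = sum(2 * (w[0] * x + w[1] - y) * x for x, y in data) / n
    g_bias = sum(2 * (w[0] * x + w[1] - y) for x, y in data) / n
    return [g_slope, g_bias]


def gradient_descent(data, step=0.05, iterations=2000):
    w = [0.0, 0.0]                       # Initialize w
    for _ in range(iterations):          # For many iterations
        g = gradient(w, data)            # Compute gradient over ALL examples
        w = [wi - step * gi for wi, gi in zip(w, g)]  # Update model
    return w


# Toy, noise-free data generated from y = 3x + 1 (an assumption for the demo).
data = [(x, 3.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]
w = gradient_descent(data)
```

Note that every iteration touches the full data set, which is the cost the incremental variant avoids.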

slide-11
SLIDE 11

INCREMENTAL Gradient Descent

Initialize w
For many iterations:
    Pick one point
    Compute Gradient
    Update model
End
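A minimal Python sketch of the incremental loop, under the same kind of assumptions as any toy example: the least-squares loss, the data, the step size, the seed, and the function names are illustrative choices, not taken from the slides.

```python
import random


def point_gradient(w, point):
    """Gradient of the squared error on ONE example for y ~ w[0] * x + w[1]."""
    x, y = point
    err = w[0] * x + w[1] - y
    return [2 * err * x, 2 * err]


def incremental_gradient_descent(data, step=0.02, iterations=5000, seed=0):
    rng = random.Random(seed)
    w = [0.0, 0.0]                        # Initialize w
    for _ in range(iterations):           # For many iterations
        point = rng.choice(data)          # Pick one point
        g = point_gradient(w, point)      # Compute gradient on that point only
        w = [wi - step * gi for wi, gi in zip(w, g)]  # Update model
    return w


# Toy, noise-free data generated from y = 3x + 1 (an assumption for the demo).
data = [(x, 3.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]
w = incremental_gradient_descent(data)
```

Each update reads a single example, so the per-step cost is independent of the data set size. That property is what lets the update run once per tuple during a table scan.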

slide-12
SLIDE 12
slide-13
SLIDE 13

Bismarck Architecture

slide-14
SLIDE 14

BISMARCK: USER DEFINED AGGREGATE

Three steps:

  • 1. initialize(state)
  • 2. transition(state, data)
  • 3. terminate(state)
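The three steps can be mimicked outside a database to see how they fit together. The sketch below is an assumption-heavy illustration: only the names `initialize`, `transition`, and `terminate` come from the slide, while the dictionary-based state, the linear model, the step size, and the `run_aggregate` driver (standing in for the database engine's scan) are hypothetical, and how state carries across epochs is simplified.

```python
def initialize(state):
    """Set up the aggregation state: here, a model and a step size."""
    state["w"] = [0.0, 0.0]
    state["step"] = 0.02
    return state


def transition(state, row):
    """Called once per tuple: take one incremental gradient step."""
    x, y = row
    w = state["w"]
    err = w[0] * x + w[1] - y
    w[0] -= state["step"] * 2 * err * x
    w[1] -= state["step"] * 2 * err
    return state


def terminate(state):
    """Return the final model once the scan is done."""
    return list(state["w"])


def run_aggregate(rows, epochs):
    """Hypothetical driver playing the role of the database engine."""
    state = initialize({})
    for _ in range(epochs):
        for row in rows:              # one pass over the table = one epoch
            transition(state, row)
    return terminate(state)


rows = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # toy data, y = 2x + 1
model = run_aggregate(rows, epochs=500)
```

The design point is that the engine only needs to know how to scan tuples and call `transition`; everything model-specific lives inside the three functions.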
slide-15
SLIDE 15

BISMARCK: LOGISTIC REGRESSION
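The slide's figure did not survive extraction. As a stand-in, here is a hedged sketch of what a per-tuple logistic regression step might look like. The function names, the labels in {-1, +1}, the step size, and the toy data are all assumptions for illustration, not the code shown in the lecture.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lr_transition(w, x, y, step):
    """One incremental gradient step on the loss log(1 + exp(-y * w.x))."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    scale = step * y * sigmoid(-margin)   # negative gradient is scale * x
    return [wi + scale * xi for wi, xi in zip(w, x)]


def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1


# Toy examples: features are [value, bias term]; label is the sign of value.
examples = [([-2.0, 1.0], -1), ([-1.0, 1.0], -1),
            ([1.0, 1.0], 1), ([2.0, 1.0], 1)]

w = [0.0, 0.0]
for _ in range(100):                      # 100 passes over the data
    for x, y in examples:
        w = lr_transition(w, x, y, step=0.5)
```

Only `lr_transition` is specific to logistic regression; swapping in a different loss means swapping that one function.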

slide-16
SLIDE 16

DATA ORDERING

Random sampling

  • Sample without replacement
  • Shuffle the data after each epoch

Shuffle once

  • Avoids pathological ordering
  • Much cheaper
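The two ordering policies differ only in where the shuffle happens. A small sketch, with hypothetical generator names and an in-memory list standing in for a table:

```python
import random


def reshuffle_epochs(data, epochs, seed=0):
    """Random sampling without replacement: a fresh shuffle before every epoch."""
    rng = random.Random(seed)
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)       # pays the shuffle cost once per epoch
        yield order


def shuffle_once_epochs(data, epochs, seed=0):
    """Shuffle once up front, then reuse the same order for every epoch."""
    order = list(data)
    random.Random(seed).shuffle(order)   # pays the shuffle cost a single time
    for _ in range(epochs):
        yield list(order)


data = list(range(10))
fresh = list(reshuffle_epochs(data, epochs=3))
once = list(shuffle_once_epochs(data, epochs=3))
```

Both policies visit every example exactly once per epoch. Shuffle-once gives up fresh randomness between epochs in exchange for not reordering the data repeatedly, while still breaking up a pathological stored order such as data sorted by label.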
slide-17
SLIDE 17

RESERVOIR SAMPLING

Select first m items
On the kth additional item:
    s = random in [0, m + k)
    if s < m:
        Put in slot s
    else:
        Drop the item
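The procedure above translates almost line for line into Python. In this sketch the function name `reservoir_sample` and the explicit `rng` argument are choices made for the example, and `s` is drawn as an integer.

```python
import random


def reservoir_sample(stream, m, rng):
    """Keep a uniform random sample of m items from a stream of unknown length."""
    reservoir = []
    k = 0                                  # items seen beyond the first m
    for item in stream:
        if len(reservoir) < m:
            reservoir.append(item)         # select the first m items
        else:
            k += 1
            s = rng.randrange(m + k)       # uniform integer in [0, m + k)
            if s < m:
                reservoir[s] = item        # put in slot s
            # else: drop the item
    return reservoir


sample = reservoir_sample(range(100), 5, random.Random(0))
```

The kth additional item is kept with probability m / (m + k), which is what keeps every item seen so far equally likely to be in the reservoir after a single pass.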

slide-18
SLIDE 18

Parallel gradients

Shared Memory:

  • Compute gradients in parallel
  • Average their updates
  • Or update in parallel
  • Locks?
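The first option, computing in parallel and averaging, can be sketched as follows. This is an illustration under stated assumptions: the shard layout, the linear model, the step size, and the function names are hypothetical. Python threads are used only to show the structure, since the interpreter lock prevents real speedup for this kind of arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor


def igd_pass(w, shard, step):
    """One incremental-gradient pass over a shard, starting from a copy of w."""
    w = list(w)
    for x, y in shard:
        err = w[0] * x + w[1] - y
        w[0] -= step * 2 * err * x
        w[1] -= step * 2 * err
    return w


def averaged_epoch(w, shards, step):
    """Each worker updates its own copy of the model; the copies are averaged."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        results = list(pool.map(lambda shard: igd_pass(w, shard, step), shards))
    return [sum(r[i] for r in results) / len(results) for i in range(len(w))]


# Toy, noise-free data from y = 3x + 1, split across two workers.
data = [(x, 3.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]]
shards = [data[0::2], data[1::2]]

w = [0.0, 0.0]
for _ in range(1000):
    w = averaged_epoch(w, shards, step=0.02)
```

Because every worker writes only to its private copy, no locks are needed here. The alternative on the slide, updating one shared model in parallel, avoids the averaging step but raises exactly the locking question the last bullet asks.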
slide-19
SLIDE 19

DISCUSSION

https://forms.gle/nFNEi2NZMNhZio1f7

slide-20
SLIDE 20

How would an implementation of GD look in Spark? Try to sketch an implementation. What would be similar / different to Bismarck?

slide-21
SLIDE 21

What are some ML scenarios where Bismarck architecture might prove to be limited?

slide-22
SLIDE 22
slide-23
SLIDE 23

NEXT STEPS

  • Next class: Parameter Server
  • Assignment 2 out!
  • Project Proposal
      • Groups by Oct 7
      • 2 pager by Oct 17