DIMMWITTED: A STUDY OF MAIN-MEMORY STATISTICAL ANALYTICS (presentation by Shivaram Venkataraman)

SLIDE 1

DIMMWITTED: A STUDY OF MAIN-MEMORY STATISTICAL ANALYTICS

Shivaram Venkataraman

SLIDE 2

MOTIVATION

How can we best use main memory?

  • Memory bandwidth: ~60 GB/s (r3.8xlarge instance on EC2)
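A back-of-the-envelope sketch of why bandwidth is the budget to plan around: if a pass over the data is limited only by memory bandwidth, it cannot finish faster than the data size divided by the ~60 GB/s quoted above. The function name, the 1 GB = 1e9 bytes convention, and the dataset size are assumptions for illustration.

```python
def min_pass_seconds(n_rows: int, n_cols: int,
                     bytes_per_value: int = 8,
                     bandwidth_gb_s: float = 60.0) -> float:
    """Lower bound on the time for one full pass over a dense n x d matrix,
    assuming the pass is limited only by memory bandwidth (1 GB = 1e9 bytes,
    no cache reuse, compute treated as free)."""
    total_bytes = n_rows * n_cols * bytes_per_value
    return total_bytes / (bandwidth_gb_s * 1e9)


# Hypothetical dataset: 10 million rows x 100 float64 features = 8 GB.
print(f"{min_pass_seconds(10_000_000, 100):.3f} s per pass")  # prints 0.133 s per pass
```

Iterative algorithms make many such passes, so how each pass touches memory matters as much as the arithmetic it performs.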

SLIDE 3

DESIGN SPACE

  • Access method
    – Row vs. column
    – Density
  • Replication
    – Data
    – Model

SLIDE 4

ITERATIVE ALGORITHMS: ACCESS METHOD

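[Figure: n × d data matrix, accessed by sampling rows vs. sampling columns]

  • Sample rows vs. columns
  • Broadly, “gradient” methods (sample rows) vs. “coordinate” methods (sample columns)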
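A minimal sketch of the two access patterns, assuming a least-squares objective 0.5 * ||A x - b||^2 chosen only for illustration: the row-wise version is SGD (one row at a time, updating every model entry that row touches), and the column-wise version is exact coordinate descent (one column at a time, updating a single model entry). Function names and the toy problem are hypothetical.

```python
import numpy as np


def sgd_epoch(A, b, x, step):
    """Row-wise access ("gradient" style): visit one row a_i at a time and
    take a stochastic gradient step on 0.5 * (a_i . x - b_i)^2."""
    for i in range(A.shape[0]):
        residual = A[i] @ x - b[i]
        x -= step * residual * A[i]      # writes every model entry row i touches
    return x


def cd_epoch(A, b, x):
    """Column-wise access ("coordinate" style): visit one column a_j at a time
    and minimise 0.5 * ||A x - b||^2 exactly over the single coordinate x_j."""
    r = A @ x - b                        # residual, kept in sync with x
    for j in range(A.shape[1]):
        col = A[:, j]
        sq_norm = col @ col
        if sq_norm == 0.0:
            continue                     # empty column: nothing to update
        delta = -(col @ r) / sq_norm     # exact minimiser along x_j
        x[j] += delta                    # writes a single model entry
        r += delta * col
    return x


# Usage on a small, noise-free least-squares problem.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5))
x_true = np.arange(1.0, 6.0)
b = A @ x_true

x_row = np.zeros(5)
x_col = np.zeros(5)
for _ in range(50):
    sgd_epoch(A, b, x_row, step=0.01)
    cd_epoch(A, b, x_col)
```

Both reach the same solution; what differs is the memory traffic: the row-wise pass reads rows and scatters writes across the model, while the column-wise pass reads columns and writes one model entry per step.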

SLIDE 5

DATA DENSITY: DENSE VS. SPARSE

Dense Linear Algebra

  • More FLOPs / CPU intensive
  • e.g., matrix-vector multiply: O(n * d)

Sparse Linear Algebra

  • Fewer FLOPs / memory-access (communication) intensive
  • e.g., matrix-vector multiply: O(nnz), where nnz is the number of non-zero entries, typically much smaller than n * d
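To make the dense vs. sparse operation counts concrete, here is a sketch of both matrix-vector products with an explicit multiply-add counter. It assumes CSR (compressed sparse row), one common sparse layout; the slide does not say which layout is used. Plain Python is used so the counts are visible.

```python
def dense_matvec(A, x):
    """Dense y = A x on a list-of-rows matrix: one multiply-add per entry,
    zeros included, so n * d operations."""
    y, ops = [], 0
    for row in A:
        acc = 0.0
        for a_ij, x_j in zip(row, x):
            acc += a_ij * x_j
            ops += 1
        y.append(acc)
    return y, ops


def csr_matvec(indptr, indices, data, x):
    """Sparse y = A x in CSR form: row i's non-zeros live in
    data[indptr[i]:indptr[i+1]], so only nnz multiply-adds are done."""
    n_rows = len(indptr) - 1
    y, ops = [0.0] * n_rows, 0
    for i in range(n_rows):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
            ops += 1
    return y, ops


# The same 4 x 3 matrix, stored densely and in CSR form (nnz = 5).
A_dense = [[1.0, 0.0, 0.0],
           [0.0, 0.0, 2.0],
           [0.0, 3.0, 0.0],
           [4.0, 0.0, 5.0]]
indptr, indices, data = [0, 1, 2, 3, 5], [0, 2, 1, 0, 2], [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

print(dense_matvec(A_dense, x))               # ([1.0, 6.0, 6.0, 19.0], 12)
print(csr_matvec(indptr, indices, data, x))   # ([1.0, 6.0, 6.0, 19.0], 5)
```

The sparse version does less arithmetic, but each operation performs an indirect load (`x[indices[k]]`), which is why sparse workloads tend to be bound by memory access rather than by FLOPs.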

SLIDE 6

DIMMWITTED: ACCESS METHODS

[Figure: panels labeled “Data” and “Model”]

SLIDE 7

REPLICATION

Model

  • Replica per core → similar to Spark: shared nothing
  • Replica per machine → shared memory
  • Hybrid: Replica per NUMA node

Data

  • Partition per core → similar to shared nothing
  • Replicate data per NUMA node?
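A minimal simulation of the model-replication spectrum, under stated assumptions: data is sharded across workers, workers that share a replica are run one after another (a real system would run them concurrently), and replicas are averaged once per epoch. The averaging rule, the least-squares objective, and all names are hypothetical choices for illustration, not DimmWitted's implementation.

```python
import numpy as np


def sgd_on_shard(A, b, x, step):
    """Plain SGD for least squares over the rows of one data shard (in place)."""
    for i in range(A.shape[0]):
        x -= step * (A[i] @ x - b[i]) * A[i]
    return x


def replicated_epoch(A, b, x, n_workers, workers_per_replica, step):
    """One epoch with the data sharded across n_workers.

    workers_per_replica = 1          -> one model replica per core (shared nothing)
    workers_per_replica = n_workers  -> one replica per machine (shared memory)
    anything in between              -> e.g. one replica per NUMA node

    Simplification: workers sharing a replica run sequentially, and the
    replicas are averaged once at the end of the epoch."""
    shards = np.array_split(np.arange(A.shape[0]), n_workers)
    replicas = []
    for first in range(0, n_workers, workers_per_replica):
        replica = x.copy()
        for w in range(first, min(first + workers_per_replica, n_workers)):
            rows = shards[w]
            sgd_on_shard(A[rows], b[rows], replica, step)
        replicas.append(replica)
    return np.mean(replicas, axis=0)


rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5))
x_true = np.arange(1.0, 6.0)
b = A @ x_true

# 4 workers: per-core (1), per-"node" (2 workers share a replica), per-machine (4).
results = {}
for k in (1, 2, 4):
    x = np.zeros(5)
    for _ in range(100):
        x = replicated_epoch(A, b, x, n_workers=4, workers_per_replica=k, step=0.01)
    results[k] = x
```

The trade-off this exposes: more replicas mean less sharing (no contention on one model) but each replica sees less of the data per epoch, so more epochs may be needed; the per-node setting sits between the two extremes.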
SLIDE 8

DIMMWITTED

SLIDE 9

OPTIMIZER

Inputs

  • f_row, f_col, f_ctr: user-supplied functions for row-wise, column-wise, and column-to-row access
  • data A ∈ ℝ^(N×d)
  • Initial model vector

Output

  • Execution plan for each CPU:
    – subset of data
    – model replica
    – access method to use
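A sketch of what the optimizer's per-CPU output could look like as a data structure. The field names, the string codes, and the rule for splitting rows are hypothetical; this only shows the shape of an execution plan (data subset, model replica, access method), not how DimmWitted's optimizer chooses among them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExecutionPlan:
    cpu: int
    rows: range          # subset of data this CPU processes
    replica_id: int      # model replica this CPU reads and writes
    access_method: str   # e.g. "row", "col" or "ctr" (column-to-row)


def make_plans(n_rows: int, n_cpus: int, cpus_per_node: int,
               model_repl: str, data_repl: str,
               access_method: str) -> List[ExecutionPlan]:
    """Build one plan per CPU.

    model_repl: "per_core" | "per_node" | "per_machine"
    data_repl:  "shard"    -> CPUs split the data into disjoint slices
                "per_node" -> every NUMA node holds a full copy,
                              split among the CPUs of that node
    """
    if n_cpus % cpus_per_node != 0:
        raise ValueError("sketch assumes equally sized NUMA nodes")
    replica_of = {"per_core": lambda cpu: cpu,
                  "per_node": lambda cpu: cpu // cpus_per_node,
                  "per_machine": lambda cpu: 0}[model_repl]
    plans = []
    for cpu in range(n_cpus):
        if data_repl == "shard":
            parts, part = n_cpus, cpu
        elif data_repl == "per_node":
            parts, part = cpus_per_node, cpu % cpus_per_node
        else:
            raise ValueError(f"unknown data replication: {data_repl}")
        rows = range(part * n_rows // parts, (part + 1) * n_rows // parts)
        plans.append(ExecutionPlan(cpu, rows, replica_of(cpu), access_method))
    return plans


# 4 CPUs on 2 NUMA nodes, one model replica per node, data sharded.
for plan in make_plans(100, 4, 2, "per_node", "shard", "row"):
    print(plan)
```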
SLIDE 10

ACCESS METHOD

  • Cost ratio: how much more expensive writes are than reads
  • Row-wise access is more efficient when writes are cheap
  • Column-to-row access becomes more efficient once the cost ratio exceeds some threshold
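A simplified cost model that reproduces the behaviour described above, offered as an assumption rather than the exact model DimmWitted uses: cost = reads + alpha * writes, with alpha the write/read cost ratio. Row-wise access reads each row once and writes every model entry that row touches; column-to-row access re-reads whole rows for each column but writes one model entry per column. Function names and the example sizes are hypothetical.

```python
import math


def access_costs(row_nnz, n_cols, alpha):
    """Simplified cost = reads + alpha * writes, where alpha = write cost / read cost.

    row_nnz[i] is the number of non-zeros n_i in row i.
    Row-wise:      read each row once (sum n_i), write the n_i model entries it
                   touches (sum n_i).
    Column-to-row: to update x_j, read every row with a non-zero in column j;
                   summed over all columns, row i is read n_i times
                   (sum n_i^2 reads), and one model entry is written per column
                   (n_cols writes).
    """
    s1 = sum(row_nnz)
    s2 = sum(n * n for n in row_nnz)
    row_wise = s1 + alpha * s1
    col_to_row = s2 + alpha * n_cols
    return row_wise, col_to_row


def crossover_alpha(row_nnz, n_cols):
    """Cost ratio above which column-to-row becomes cheaper than row-wise."""
    s1 = sum(row_nnz)
    s2 = sum(n * n for n in row_nnz)
    if s1 <= n_cols:
        return math.inf   # row-wise never loses under this model
    return (s2 - s1) / (s1 - n_cols)


# Hypothetical sparse problem: 1000 rows with 4 non-zeros each, 100 columns.
rows = [4] * 1000
print(access_costs(rows, 100, alpha=1.0))    # (8000.0, 16100.0): row-wise cheaper
print(access_costs(rows, 100, alpha=10.0))   # (44000.0, 17000.0): column-to-row cheaper
print(round(crossover_alpha(rows, 100), 3))  # 3.077
```

Under this model, row-wise access wins while writes are cheap, and column-to-row access takes over once alpha passes (sum n_i^2 - sum n_i) / (sum n_i - d).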

SLIDE 11

MODEL REPLICATION

SLIDE 12

DATA REPLICATION

SLIDE 13

TAKEAWAYS

  • Data access patterns matter, but the best choice depends on the problem
  • Model / data replication design space
  • “Optimizer” for ML
SLIDE 14

QUESTIONS / DISCUSSION?