A Resilient Framework for Iterative Linear Algebra Applications in X10



slide-1
SLIDE 1

A Resilient Framework for Iterative Linear Algebra Applications in X10

2015 ACM SIGPLAN X10 Workshop at PLDI

Sara S. Hamouda, Australian National University
Josh Milthorpe, IBM T.J. Watson Research Center
Peter E. Strazdins, Australian National University
Vijay Saraswat, IBM T.J. Watson Research Center

slide-2
SLIDE 2

Programmability vs. Resilience

  • Local memory view (SPMD, actor): MPI, Charm++, Erlang
  • Global memory view (SPMD): UPC, Titanium, CAF
  • Global memory view (asynchronous task parallelism): X10, Chapel

[Figure axis: Programmability]

slide-3
SLIDE 3

PPoPP 2014 - Resilient X10 Paper

slide-4
SLIDE 4

X10 Domain Specific Libraries

  • GML (Global Matrix Library)
  • ANUChem
  • ScaleGraph
  • M3RLite (Main Memory Map Reduce Lite)
  • Megaffic (Traffic flow simulation)
  • SatX10 (Parallel boolean satisfiability)
slide-5
SLIDE 5

X10 Domain Specific Libraries

  • Resilient GML (Global Matrix Library)
  • ANUChem
  • ScaleGraph
  • M3RLite (Main Memory Map Reduce Lite)
  • Megaffic (Traffic flow simulation)
  • SatX10 (Parallel boolean satisfiability)
slide-6
SLIDE 6

Outline

  • Resilient X10
  • GML

    – API Overview
    – Resilience Limitations
    – Resilience Enhancements
    – Performance Results

slide-7
SLIDE 7

Resilient X10

    // Task A
    try {
        at (p) {
            // Task B
            finish {
                at (q) async {
                    // Task C
                }
            }
        }
    } catch (dpe:DeadPlaceException) {
        // recovery step
    }
    // D

[Diagram: places r, q, and p; tasks A, B, and C; spawn arrows between tasks]

slide-9
SLIDE 9

Resilient X10

    // Task A
    try {
        at (p) {
            // Task B
            finish {
                at (q) async {
                    // Task C
                }
            }
        }
    } catch (dpe:DeadPlaceException) {
        // recovery step
    }
    // D

[Diagram: places r, q, and p; tasks A, B, and C; spawn arrows between tasks]

  • Resilient X10 supports only the sockets backend
  • Resilient Store
    – Centralized Store
    – Distributed Store (currently not supported)

slide-10
SLIDE 10

Outline

  • Resilient X10
  • GML

    – API Overview
    – Resilience Limitations
    – Resilience Enhancements
    – Performance Results

slide-11
SLIDE 11

Global Matrix Library (GML)

  • Distributed matrix library in X10
  • Simple programming model

    – Matrix based
    – Sequential style programming
    – Efficient iterative processing

  • Potential compilation target for high-level array languages

    – Provides fundamental vector/matrix routines
    – Supports dense and sparse matrix formats
    – Uses BLAS and LAPACK

slide-12
SLIDE 12

GML Vector/Matrix Classes

  • Single place: DenseMatrix, SymDense, TriDense, SparseCSC, SparseCSR, Vector
  • Multi-place, duplicated: DupDenseMatrix, DupSparseMatrix, DupVector
  • Multi-place, distributed (1 block per place): DistDenseMatrix, DistSparseMatrix
  • Multi-place, distributed (N blocks per place): DistBlockMatrix
  • Multi-place, distributed: DistVector

slide-13
SLIDE 13

PageRank Implementation in GML

    /* Matrix dimensions */
    var m:Long, n:Long;
    /* Matrix partitioning configurations */
    var rowBlocks:Long, colBlocks:Long, rowPlaces:Long, colPlaces:Long;
    /* Create GML objects */
    val G:DistBlockMatrix = DistBlockMatrix.make(m, n, rowBlocks, colBlocks, rowPlaces, colPlaces);
    val P:DupVector = DupVector.make(n);
    val U:DistVector = DistVector.make(n, G.getAggRowBs());
    val GP:DistVector = DistVector.make(n, G.getAggRowBs());
    /* Data initialization code omitted */

Algorithm: for (1..k): $P = \alpha G P + (1 - \alpha) E U^{T} P$

slide-14
SLIDE 14

PageRank Implementation in GML

    /* Data initialization code omitted */
    for (1..k) {
        GP.mult(G, P).scale(alpha);
        val UtP1a = U.dot(P) * (1-alpha);
        GP.copyTo(P.local());
        P.local().cellAdd(UtP1a);
        P.sync();
    }

Algorithm: for (1..k): $P = \alpha G P + (1 - \alpha) E U^{T} P$
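The update rule can be checked on a tiny graph without any distributed machinery. The following is a minimal single-process Python sketch, not GML code: it assumes G is a small dense column-stochastic matrix, E is the all-ones vector, and U is a uniform personalization vector. The function names and the 3-page graph are made up for illustration.

```python
def pagerank_step(G, P, U, alpha):
    """One iteration of P = alpha*G*P + (1 - alpha)*E*(U^T P), with E all ones."""
    n = len(P)
    # alpha * G * P, the analogue of GP.mult(G, P).scale(alpha)
    GP = [alpha * sum(G[i][j] * P[j] for j in range(n)) for i in range(n)]
    # (1 - alpha) * U^T P is a scalar, the analogue of U.dot(P) * (1 - alpha)
    utp = (1.0 - alpha) * sum(u * p for u, p in zip(U, P))
    # adding the scalar to every cell plays the role of cellAdd(UtP1a)
    return [gp + utp for gp in GP]


def pagerank(G, U, alpha, k):
    n = len(U)
    P = [1.0 / n] * n
    for _ in range(k):
        P = pagerank_step(G, P, U, alpha)
    return P


# Hypothetical 3-page graph; column j holds the out-link weights of page j.
G = [[0.0, 0.5, 1.0],
     [0.5, 0.0, 0.0],
     [0.5, 0.5, 0.0]]
U = [1.0 / 3] * 3
ranks = pagerank(G, U, 0.85, 30)
```

Because every column of G sums to 1 and U is uniform, the ranks keep summing to 1 across iterations, which is a convenient sanity check.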

slide-15
SLIDE 15

Outline

  • Resilient X10
  • GML

    – API Overview
    – Resilience Limitations
    – Resilience Enhancements
    – Performance Results

slide-16
SLIDE 16

GML Resilience Limitations

  • Fixed place distribution
  • Failure of a place results in the loss of GML objects
  • No built-in mechanism for restoring objects
slide-17
SLIDE 17

Resilience Enhancements (1)

  • Arbitrary and dynamic place distribution

    – make(..., places:PlaceGroup)
    – remake(..., newPlaces:PlaceGroup)

slide-18
SLIDE 18

DistVector Redistribution

    val pg = make_P0_P2_group();
    A.remake(pg);

[Diagram: layout of DistVector A across Place 0, Place 1, and Place 2 before remake, and across the new place group after remake]
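To make the effect of remake concrete, the sketch below computes which index range each place would own before and after the place group changes. It is a Python illustration only: an even block distribution is assumed, block_segments is a hypothetical helper, and it computes ownership without moving any data.

```python
def block_segments(n, places):
    """Split indices 0..n-1 into contiguous segments, one per place, as evenly as possible."""
    base, extra = divmod(n, len(places))
    segments = {}
    start = 0
    for idx, place in enumerate(places):
        size = base + (1 if idx < extra else 0)
        segments[place] = range(start, start + size)
        start += size
    return segments


before = block_segments(6, [0, 1, 2])   # three places, two elements each
after = block_segments(6, [0, 2])       # place group without Place 1, three elements each
```

In this model a 6-element vector goes from two elements per place on three places to three elements per place on the two remaining places.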

slide-19
SLIDE 19

Resilience Enhancements (2)

  • Added in-memory snapshot / restore capability to GML classes

    interface Snapshottable {
        def makeSnapshot():Snapshot;
        def restoreSnapshot(snapshot:Snapshot):void;
    }

slide-20
SLIDE 20

DistVector Snapshot/Restore

    val A = DistVector.make(6);
    A.init((i:Long)=> i*2.0);

[Diagram: A's PlaceLocalHandle, with the elements of A spread over Place 0, Place 1, and Place 2]

slide-21
SLIDE 21

DistVector Snapshot/Restore

    val A = DistVector.make(6);
    A.init((i:Long)=> i*2.0);
    val snap = A.makeSnapshot();

[Diagram: A's PlaceLocalHandle and elements on Place 0, Place 1, and Place 2, plus the snapshot snap with key/value entries at each place]

slide-22
SLIDE 22

DistVector Snapshot/Restore

    val A = DistVector.make(6);
    A.init((i:Long)=> i*2.0);
    val snap = A.makeSnapshot();

[Diagram: snap's key/value entries at Place 0, Place 1, and Place 2 filled with copies of A's elements; label: Copy from Snapshot]

slide-23
SLIDE 23

DistVector Snapshot/Restore

    val A = DistVector.make(6);
    A.init((i:Long)=> i*2.0);
    val snap = A.makeSnapshot();
    /* Place 1 failed */

[Diagram: A and snap across Place 0, Place 1, and Place 2 at the point where Place 1 fails]

slide-24
SLIDE 24

DistVector Snapshot/Restore

    val A = DistVector.make(6);
    A.init((i:Long)=> i*2.0);
    val snap = A.makeSnapshot();
    /* Place 1 failed */
    val pg = make_P0_P2_group();
    A.remake(pg);

[Diagram: only Place 0 and Place 2 remain; A has been remade over them and snap still holds its key/value entries]

slide-25
SLIDE 25

DistVector Snapshot/Restore

    val A = DistVector.make(6);
    A.init((i:Long)=> i*2.0);
    val snap = A.makeSnapshot();
    /* Place 1 failed */
    val pg = make_P0_P2_group();
    A.remake(pg);
    A.restoreSnapshot(snap);

[Diagram: A's elements on Place 0 and Place 2 after being copied back from snap; label: Copy from Snapshot]
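The snapshot, failure, remake, and restore sequence can be modelled in a few lines. The Python sketch below is a simplified single-process model, not the GML implementation. It assumes each key/value entry is held at the saving place and backed up at the next place, which matches the deck's later remark that recovery fails when two neighbouring processes fail. The Snapshot class and split helper are hypothetical.

```python
class Snapshot:
    """Toy key/value snapshot: each entry is kept at its owner and at the next place."""

    def __init__(self, places):
        self.places = list(places)
        self.store = {p: {} for p in self.places}

    def save(self, place, key, value):
        neighbour = self.places[(self.places.index(place) + 1) % len(self.places)]
        for holder in (place, neighbour):
            self.store[holder][key] = list(value)

    def kill(self, place):
        """Simulate a place failure: everything held in its memory is lost."""
        self.store.pop(place, None)

    def load(self, key):
        for entries in self.store.values():
            if key in entries:
                return entries[key]
        raise KeyError(key)


def split(values, places):
    """Block-distribute values over places, as evenly as possible."""
    base, extra = divmod(len(values), len(places))
    out, start = {}, 0
    for idx, p in enumerate(places):
        size = base + (1 if idx < extra else 0)
        out[p] = values[start:start + size]
        start += size
    return out


old_places = [0, 1, 2]
A = split([i * 2.0 for i in range(6)], old_places)   # models A.init((i:Long)=> i*2.0)
snap = Snapshot(old_places)
for p in old_places:                                  # models makeSnapshot()
    snap.save(p, p, A[p])

snap.kill(1)                                          # Place 1 failed
new_places = [0, 2]                                   # models remake(pg)
flat = [x for key in old_places for x in snap.load(key)]
A = split(flat, new_places)                           # models restoreSnapshot(snap)
```

After the restore, the model holds 0, 2, 4 at Place 0 and 6, 8, 10 at Place 2. Killing Place 2 as well would make the segment saved by Place 1 unrecoverable, since both of its copies would be gone.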

slide-26
SLIDE 26

(1) Iterative Programming Model

    interface ResilientIterativeApp {
        def step():void;
        def isFinished():Boolean;
        def checkpoint(store:AppResilientStore):void;
        def restore(newPlaces:PlaceGroup, store:AppResilientStore, snapshotIter:Long):void;
    }

slide-27
SLIDE 27

(2) Iterative Application Executor

    val store:AppResilientStore;
    while (!isFinished()) {
        try {
            if (restoreRequired) {
                val newPlaces = createRestorePlaceGroup();
                restore(newPlaces, store, checkpointIter);
                restoreRequired = false; // resume normal execution once the restore succeeds
            }
            step();
            if (iter % checkpointInterval == 0) {
                checkpoint(store);
                checkpointIter = iter;
            }
            iter++;
        } catch (dpe:DeadPlaceException) {
            restoreRequired = true;
        }
    }
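The control flow of the executor can be exercised with a toy application. The Python sketch below is an illustration under stated assumptions, not the GML executor: PlaceFailure stands in for DeadPlaceException, CountingApp is a made-up application that injects one failure at a chosen iteration, and an initial checkpoint is taken so that a restore is always possible.

```python
class PlaceFailure(Exception):
    """Stand-in for X10's DeadPlaceException."""


class CountingApp:
    """Toy iterative app: adds the iteration index to an accumulator each step."""

    def __init__(self, total_iters, fail_at=None):
        self.total = total_iters
        self.fail_at = fail_at      # iteration at which a single failure is injected
        self.iter = 0
        self.acc = 0
        self.saved = None
        self.restores = 0

    def is_finished(self):
        return self.iter >= self.total

    def step(self):
        if self.fail_at is not None and self.iter == self.fail_at:
            self.fail_at = None     # the failure happens only once
            raise PlaceFailure()
        self.acc += self.iter
        self.iter += 1

    def checkpoint(self):
        self.saved = (self.iter, self.acc)

    def restore(self):
        self.iter, self.acc = self.saved
        self.restores += 1


def run(app, checkpoint_interval):
    restore_required = False
    app.checkpoint()                # initial checkpoint (an assumption of this sketch)
    while not app.is_finished():
        try:
            if restore_required:
                app.restore()       # roll back to the last committed checkpoint
                restore_required = False
            app.step()
            if app.iter % checkpoint_interval == 0:
                app.checkpoint()
        except PlaceFailure:
            restore_required = True
    return app.acc


failing = CountingApp(30, fail_at=15)
result_with_failure = run(failing, 10)
result_without_failure = run(CountingApp(30), 10)
```

With a checkpoint every 10 iterations and a failure at iteration 15, the toy app rolls back to iteration 10, repeats five iterations, and finishes with the same result as the failure-free run.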

slide-28
SLIDE 28

(3) Application Resilient Store

  • Concurrent and atomic snapshot/restore for multiple GML objects

    class AppResilientStore {
        def startNewSnapshot();
        def save(obj:Snapshottable);
        def saveReadOnly(obj:Snapshottable);
        def commit();
        def cancelSnapshot();
        def restore();
    }
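One way to get atomic behaviour from such an interface is to build each new snapshot on the side and replace the last good one only at commit. The Python sketch below illustrates that idea and is not the GML implementation: ToyResilientStore and Box are hypothetical names, and it assumes that objects saved as read-only are captured once and reused by later snapshots.

```python
import copy


class Box:
    """Toy Snapshottable object holding a single value."""

    def __init__(self, value):
        self.value = value

    def make_snapshot(self):
        return copy.deepcopy(self.value)

    def restore_snapshot(self, snapshot):
        self.value = copy.deepcopy(snapshot)


class ToyResilientStore:
    def __init__(self):
        self.committed = None   # last complete snapshot: list of (object, data)
        self.pending = None     # snapshot under construction
        self.read_only = {}     # id(object) -> data, captured once

    def start_new_snapshot(self):
        self.pending = []

    def save(self, obj):
        self.pending.append((obj, obj.make_snapshot()))

    def save_read_only(self, obj):
        if id(obj) not in self.read_only:
            self.read_only[id(obj)] = obj.make_snapshot()
        self.pending.append((obj, self.read_only[id(obj)]))

    def commit(self):
        # Only now does the new snapshot replace the previous one.
        self.committed, self.pending = self.pending, None

    def cancel_snapshot(self):
        # A failure during checkpointing leaves the committed snapshot untouched.
        self.pending = None

    def restore(self):
        for obj, data in self.committed:
            obj.restore_snapshot(data)


G = Box("graph")
P = Box(1)
store = ToyResilientStore()
store.start_new_snapshot()
store.save_read_only(G)
store.save(P)
store.commit()

P.value = 2
store.start_new_snapshot()
store.save(P)
store.cancel_snapshot()      # simulated failure in the middle of a checkpoint

P.value = 3
store.restore()              # rolls back to the committed snapshot
```

The cancelled snapshot never becomes visible, so the restore returns P to the value recorded by the last commit.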

slide-29
SLIDE 29

PageRank Snapshot/Restore

    def checkpoint(store:AppResilientStore) {
        store.startNewSnapshot();
        store.saveReadOnly(G);
        store.saveReadOnly(U);
        store.save(P);
        store.commit();
    }

    def restore(newPlaces:PlaceGroup, store:AppResilientStore, snapshotIter:Long) {
        G.remake(..., newPlaces);
        U.remake(..., newPlaces);
        P.remake(newPlaces);
        store.restore();
        // restore other primitive variables
    }

slide-30
SLIDE 30

(4) Restore Modes

  • Restoration Modes

    – Shrink
    – Shrink-Rebalance
    – Replace Redundant

slide-31
SLIDE 31

Shrink

[Diagram: before remake, blocks b0 to b5 are spread over Place 0, Place 1, and Place 2; after remake, six blocks b0` to b5` are held by Place 0 and Place 2]

slide-32
SLIDE 32

Shrink Rebalance

[Diagram: before remake, blocks b0 to b5 are spread over Place 0, Place 1, and Place 2; after remake, four blocks c0 to c3 are held by Place 0 and Place 2]
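The difference between the restore modes comes down to how many blocks exist after the failure and which places own them. The Python sketch below is a model built from the two diagrams and is not the GML mapping code. It assumes a cyclic block-to-place assignment, reads Shrink as keeping the block count, Shrink-Rebalance as re-gridding to the original number of blocks per place, and Replace Redundant as substituting a spare place. All function names are hypothetical.

```python
def assign_blocks(num_blocks, places):
    """Deal block ids cyclically over the given places."""
    owner = {p: [] for p in places}
    for b in range(num_blocks):
        owner[places[b % len(places)]].append(b)
    return owner


def shrink(num_blocks, places, dead):
    """Keep the same blocks and remap them onto the surviving places."""
    live = [p for p in places if p != dead]
    return assign_blocks(num_blocks, live)


def shrink_rebalance(num_blocks, places, dead):
    """Re-grid the data so each surviving place keeps the original blocks-per-place count."""
    live = [p for p in places if p != dead]
    blocks_per_place = num_blocks // len(places)
    return assign_blocks(blocks_per_place * len(live), live)


def replace_redundant(num_blocks, places, dead, spare):
    """Swap the dead place for a spare one and keep the original layout."""
    new_places = [spare if p == dead else p for p in places]
    return assign_blocks(num_blocks, new_places)


before = assign_blocks(6, [0, 1, 2])
after_shrink = shrink(6, [0, 1, 2], dead=1)
after_rebalance = shrink_rebalance(6, [0, 1, 2], dead=1)
after_replace = replace_redundant(6, [0, 1, 2], dead=1, spare=3)
```

In this model Shrink leaves each survivor with three of the original six blocks, Shrink-Rebalance produces four new blocks (two per place), and Replace Redundant reproduces the original layout with the spare place standing in for Place 1.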

slide-33
SLIDE 33

Outline

  • Resilient X10
  • GML

    – API Overview
    – Resilience Limitations
    – Resilience Enhancements
    – Performance Results

slide-34
SLIDE 34

Experimental Setup

  • SoftLayer cluster hosted at IBM Almaden Research Center
    – 11 nodes, each with a four-core 2.6 GHz Intel Xeon E5-2650 CPU and 8 GB of memory
  • X10:
    – Native X10, version 2.5.2
    – 4 places per node, X10_NTHREADS=1
    – X10RT sockets backend
  • GML:
    – OpenBLAS version 0.2.13 (OPENBLAS_NUM_THREADS=1)

slide-35
SLIDE 35

Checkpoint and Restore Overheads

  • Checkpoint every 10 iterations (3 checkpoints per run)
  • A single place failure at iteration 15
  • Repeat the experiments with different restore modes:

    – Shrink
    – Shrink-Rebalance
    – Redundant

slide-36
SLIDE 36

Applications

  • Dense
    – LinReg (50,000 × 500 per place)
    – LogReg (50,000 × 500 per place)
  • Sparse
    – PageRank (2M edges per place)

slide-37
SLIDE 37

Resilient X10 Overhead

[Charts: Linear Regression, Logistic Regression, PageRank]

  • Overhead on 44 places: ~120%
  • Overhead on 44 places: ~100%
  • Overhead on 44 places: ~3%

slide-39
SLIDE 39

Time per Checkpoint

[Charts: Logistic Regression, PageRank, Linear Regression]

  • Overhead on 44 places: ~8%
  • Overhead from 12 to 44 places: ~18%
  • Overhead from 12 to 44 places: ~8%

slide-40
SLIDE 40

Restore Time

[Charts: Logistic Regression, Linear Regression, PageRank]

  • From 12 to 44 places in Redundant mode: ~25%
  • From 12 to 44 places in Redundant mode: ~9%
  • From 12 to 44 places in Redundant mode: ~67%

slide-41
SLIDE 41

3 Checkpoints + 1 Restore

[Charts: Logistic Regression, Linear Regression, PageRank]

  • Overhead relative to non-resilient X10 mode: ~273%
  • Overhead relative to non-resilient X10 mode: ~219%
  • Overhead relative to non-resilient X10 mode: ~31%

slide-42
SLIDE 42

Conclusions

  • We presented a framework for developing resilient linear algebra applications using X10
    – Simple to use
    – Generic enough to be used in other libraries (e.g. ScaleGraph)
    – Assumptions:
      • Place 0 is immortal
      • Recovery fails when two neighbouring processes fail
    – Reasonable scalability for dense matrix checkpoint / restore
    – The main source of performance overhead is the Resilient X10 mode itself
  • Full source code freely available at http://x10-lang.org as part of GML version 2.5.2

slide-43
SLIDE 43

Future Work

  • Improve Resilient GML's performance:
    – Enhancements in Resilient X10
      • Support MPI, avoid the centralized resilient store
    – More efficient fault tolerance techniques
    – Use Elastic X10
  • Compare Resilient GML with other frameworks (e.g. Spark).