CS 744: Powergraph Shivaram Venkataraman Fall 2020 ADMINISTRIVIA - - PowerPoint PPT Presentation



slide-1
SLIDE 1

CS 744: Powergraph

Shivaram Venkataraman Fall 2020

slide-2
SLIDE 2

ADMINISTRIVIA

  • Midterm update: tonight!
  • Course Project reminders
  • Discussion groups
  • Piazza: Group Number: email id
  • You can join the corresponding group
  • Extra OH slot: starting next week!

slide-3
SLIDE 3

Applications
Machine Learning | SQL | Streaming (Naiad, Spark Streaming) | Graph
Computational Engines
Scalable Storage Systems
Resource Management
Datacenter Architecture

slide-4
SLIDE 4

GRAPH DATA

Datasets → Application

  1. Social network "friend graph" → recommendation, PageRank
  2. Internet: web pages and links; hosts are connected
  3. [illegible]
  4. A paper cites another paper, which cites others, etc.
  5. Software dependencies (actor framework): Spark → Akka

slide-5
SLIDE 5

GRAPH ANALYTICS

Perform computations on graph-structured data

Examples
  • PageRank
  • Shortest path
  • Connected components
  • …

cf. SQL queries on tabular data

slide-6
SLIDE 6

PREGEL: PROGRAMMING MODEL

Message combiner(Message m1, Message m2):
    return Message(m1.value() + m2.value());

void PregelPageRank(Message msg):
    float total = msg.value();
    vertex.val = 0.15 + 0.85*total;
    foreach(nbr in out_neighbors):
        SendMsg(nbr, vertex.val/num_out_nbrs);

vertex.val is the state of this vertex.

  (1) Get messages from neighbors
  (2) Combiner coalesces messages
  (3) Perform computation using the combined message; check convergence
  (4) Send out msgs to neighbors
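The four steps above can be sketched as a single-machine Python loop. This is only an illustration of the vertex-program idea, not Pregel's actual API: the names `combiner` and `pregel_pagerank`, the dict-of-lists graph encoding, and the fixed number of supersteps (in place of a convergence check) are all assumptions made for the example.

```python
def combiner(m1, m2):
    """Coalesce two messages bound for the same vertex by summing them."""
    return m1 + m2


def pregel_pagerank(out_nbrs, num_supersteps=20):
    """Run PageRank as supersteps over a dict {vertex: [out-neighbors]}.

    Every vertex must appear as a key, even if it has no out-neighbors.
    """
    rank = {v: 1.0 for v in out_nbrs}  # state of each vertex
    inbox = {}                         # combined message per vertex
    for step in range(num_supersteps):
        outbox = {}
        for v, nbrs in out_nbrs.items():
            # (1)-(3): read the combined message and update the vertex state.
            # Superstep 0 has no incoming messages, so it only sends.
            if step > 0:
                rank[v] = 0.15 + 0.85 * inbox.get(v, 0.0)
            # (4): send rank / out-degree to every out-neighbor,
            # combining messages that target the same vertex.
            for nbr in nbrs:
                msg = rank[v] / len(nbrs)
                outbox[nbr] = combiner(outbox[nbr], msg) if nbr in outbox else msg
        inbox = outbox  # messages become visible in the next superstep
    return rank


if __name__ == "__main__":
    # Vertex 2 has no in-edges, so it settles at the base rank 0.15.
    print(pregel_pagerank({0: [1], 1: [0], 2: [0]}))
```

Note that a vertex only ever sees the combined message, never the individual neighbor values; that restriction is what lets the combiner run on the sender side.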

slide-7
SLIDE 7

NATURAL GRAPHS

  (1) Distribution of degree is skewed!
        • most vertices have small degree
        • some vertices have very high degree
  (2) High-degree vertices lead to skew in
        • communication
        • memory (state)
        • computation
  (3) Hard to partition such graphs
slide-8
SLIDE 8

POWERGRAPH

  • Programming Model: Gather-Apply-Scatter
  • Better Graph Partitioning with vertex cuts
  • Distributed execution (Sync, Async)

slide-9
SLIDE 9

GATHER-APPLY-SCATTER

  • Gather: Accumulate info from nbrs
  • Apply: Accumulated value to vertex
  • Scatter: Update adjacent edges, vertices

// gather_nbrs: IN_NBRS
gather(Du, D(u,v), Dv):
    return Dv.rank / #outNbrs(v)

sum(a, b): return a+b

apply(Du, acc):
    rnew = 0.15 + 0.85 * acc
    Du.delta = (rnew - Du.rank) / #outNbrs(u)
    Du.rank = rnew

// scatter_nbrs: OUT_NBRS
scatter(Du, D(u,v), Dv):
    if(|Du.delta| > ε) Activate(v)
    return delta

  • Gather returns an accumulator
  • You can combine accumulators! Similar to a reduction in Spark
  • delta → change in value
  • Activate on a neighboring vertex from scatter: allows us to only process necessary vertices in the next iteration
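A minimal single-machine sketch of the gather, apply, scatter steps and the activation idea follows. It is an assumption-laden illustration rather than PowerGraph's API: the function name `gas_pagerank`, the `in_nbrs`/`out_nbrs` dict encoding, and the `max_steps` cap are hypothetical, and out-degree is clamped to 1 to avoid dividing by zero for sink vertices.

```python
def gas_pagerank(in_nbrs, out_nbrs, eps=1e-3, max_steps=100):
    """Synchronous Gather-Apply-Scatter PageRank over an active set.

    in_nbrs / out_nbrs map every vertex to a list of neighbors.
    Returns (rank, number of steps executed).
    """
    rank = {v: 1.0 for v in out_nbrs}
    active = set(out_nbrs)  # all vertices start active
    steps = 0
    while active and steps < max_steps:
        # Gather + sum: accumulate rank / out-degree over in-neighbors.
        # All gathers read the ranks from before this step's apply.
        acc = {u: sum(rank[v] / len(out_nbrs[v]) for v in in_nbrs[u])
               for u in active}
        # Apply: write the new rank and remember the change (delta).
        delta = {}
        for u in active:
            rnew = 0.15 + 0.85 * acc[u]
            delta[u] = (rnew - rank[u]) / max(len(out_nbrs[u]), 1)
            rank[u] = rnew
        # Scatter: activate out-neighbors only if this vertex changed enough.
        next_active = set()
        for u in active:
            if abs(delta[u]) > eps:
                next_active.update(out_nbrs[u])
        active = next_active
        steps += 1
    return rank, steps


if __name__ == "__main__":
    out_nbrs = {0: [1], 1: [0], 2: [0]}
    in_nbrs = {0: [1, 2], 1: [0], 2: []}
    print(gas_pagerank(in_nbrs, out_nbrs))
```

On a 3-cycle every rank is already at its fixed point, so no vertex activates a neighbor and the loop stops after one step; this is the "only process necessary vertices" behaviour from the notes above.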

slide-10
SLIDE 10

EXECUTION MODEL, CACHING

Active Queue

Delta caching
  • Cache accumulator value for vertex
  • Optionally scatter returns a delta
  • Accumulate deltas

  • Could run into race conditions
  • Single machine: gather → apply → scatter over vertex and edge state
  • scatter → future operations
  • Sync vs. Async
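Delta caching can be illustrated with a small Python sketch. The class `DeltaCache` and its methods are hypothetical names invented for this example; they only show the idea that a cached accumulator can be patched with a neighbor's delta so the next gather is skipped.

```python
class DeltaCache:
    """Cache each vertex's accumulator and patch it with scattered deltas."""

    def __init__(self):
        self.acc = {}          # vertex -> cached accumulator value
        self.full_gathers = 0  # how many times we had to visit all neighbors

    def gather(self, u, in_nbrs, contribution):
        """Return the accumulator for u, recomputing only on a cache miss."""
        if u not in self.acc:
            self.full_gathers += 1
            self.acc[u] = sum(contribution(v) for v in in_nbrs[u])
        return self.acc[u]

    def post_delta(self, u, delta):
        """Called from a neighbor's scatter: fold its change into u's cache."""
        if u in self.acc:
            self.acc[u] += delta


if __name__ == "__main__":
    in_nbrs = {0: [1, 2]}
    values = {1: 2.0, 2: 3.0}
    cache = DeltaCache()
    print(cache.gather(0, in_nbrs, values.__getitem__))  # full gather
    values[1] = 2.5                # neighbor 1 changes by +0.5 ...
    cache.post_delta(0, 0.5)       # ... and scatters that delta to vertex 0
    print(cache.gather(0, in_nbrs, values.__getitem__), cache.full_gathers)
```

This only works when the accumulator can be updated incrementally (as with a sum); a vertex whose cache entry is missing falls back to a full gather.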

slide-11
SLIDE 11

SYNC VS ASYNC

Sync Execution
  • Gather for all active vertices, followed by Apply, Scatter
  • Barrier after each minor-step

Async Execution
  • Execute active vertices, as cores become available
  • No Barriers! Optionally serializable

  • Queue of operations: V1, V2, ...
  • G(u): reads neighbor vertex state and edge state
  • A(u): updates local vertex state
  • Barrier ensures the state update from A(u), A(v) is visible in the next minor-step, G(v)
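For contrast with the barrier-based execution described above, here is a single-threaded sketch of the async idea: vertices are taken off a queue one at a time, and each update is visible to later vertices immediately, with no barrier. The function name `async_pagerank`, the graph encoding, and the `max_updates` cap are assumptions for illustration; a real async engine would run these updates on multiple cores.

```python
from collections import deque


def async_pagerank(in_nbrs, out_nbrs, eps=1e-3, max_updates=10_000):
    """Process active vertices from a queue; updates are visible at once.

    Returns (rank, number of vertex updates executed).
    """
    rank = {v: 1.0 for v in out_nbrs}
    queue = deque(out_nbrs)  # active queue, initially every vertex
    queued = set(out_nbrs)   # avoids queuing a vertex twice
    updates = 0
    while queue and updates < max_updates:
        u = queue.popleft()
        queued.discard(u)
        # Gather reads whatever the neighbors hold right now (no barrier).
        acc = sum(rank[v] / len(out_nbrs[v]) for v in in_nbrs[u])
        rnew = 0.15 + 0.85 * acc
        delta = rnew - rank[u]
        rank[u] = rnew       # apply
        updates += 1
        if abs(delta) > eps:  # scatter: activate out-neighbors
            for v in out_nbrs[u]:
                if v not in queued:
                    queue.append(v)
                    queued.add(v)
    return rank, updates


if __name__ == "__main__":
    out_nbrs = {0: [1], 1: [0], 2: [0]}
    in_nbrs = {0: [1, 2], 1: [0], 2: []}
    print(async_pagerank(in_nbrs, out_nbrs))
```

Because this sketch is sequential it cannot exhibit the race conditions that motivate the optional serializable mode; it only shows why newer neighbor values can speed up convergence.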

slide-12
SLIDE 12

DISTRIBUTED EXECUTION

  • Symmetric system, no coordinator
  • Load graph (a partition) into each machine
  • Communicate across machines to spread updates, read state

slide-13
SLIDE 13

GRAPH PARTITIONING

Edge cut
  • Every vertex is placed on a machine
  • Edges might span across machines
  • Natural graphs: lots of edges across machines

Vertex cut
  • Every edge is placed on a machine
  • Vertices might be split across machines (mirrors)
  • Better balance for natural graphs!
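The difference between the two placements can be made concrete on a small star graph, which stands in for a high-degree vertex. The helper names `edge_cut_cost` and `vertex_cut_replicas` and the modulo placement rule are assumptions chosen to keep the example deterministic; they are not taken from the slides.

```python
def edge_cut_cost(edges, vertex_machine):
    """Count edges whose two endpoints sit on different machines."""
    return sum(1 for u, v in edges if vertex_machine[u] != vertex_machine[v])


def vertex_cut_replicas(edges, edge_machine):
    """Map each vertex to the set of machines that hold one of its edges."""
    replicas = {}
    for (u, v), m in zip(edges, edge_machine):
        replicas.setdefault(u, set()).add(m)
        replicas.setdefault(v, set()).add(m)
    return replicas


if __name__ == "__main__":
    num_machines = 4
    # Star graph: hub 0 connected to spokes 1..8.
    edges = [(0, s) for s in range(1, 9)]

    # Edge cut: place vertices (here by id modulo machines), edges may span.
    vertex_machine = {v: v % num_machines for v in range(9)}
    print("cut edges:", edge_cut_cost(edges, vertex_machine))

    # Vertex cut: place edges (here by index modulo machines), vertices may span.
    edge_machine = [i % num_machines for i in range(len(edges))]
    replicas = vertex_cut_replicas(edges, edge_machine)
    mirrors = sum(len(ms) - 1 for ms in replicas.values())
    print("hub replicas:", len(replicas[0]), "total mirrors:", mirrors)
```

With the vertex cut, the hub's cost is bounded by the number of machines rather than by its degree, which is the balance argument made in the notes above.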

slide-14
SLIDE 14

RANDOM, GREEDY OBLIVIOUS

Three distributed approaches, each streaming through the edges:
  • Random Placement: send edge to a random machine
  • Coordinated Greedy Placement: send edge to a machine that already has one of its vertices
  • Oblivious Greedy Placement: greedy in parallel, so you don't have perfect knowledge of the vertex → machine mapping
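The greedy rule can be sketched as a streaming loop. This is a simplified heuristic written for illustration, not PowerGraph's exact placement algorithm: the function name `greedy_place`, the least-loaded tie-breaking, and the `where` table are assumptions.

```python
def greedy_place(edges, num_machines):
    """Stream through edges, preferring machines that already hold an endpoint.

    Returns (placement, where): the machine chosen for each edge, and a
    table mapping each vertex to the set of machines holding a replica.
    """
    load = [0] * num_machines  # edges assigned to each machine so far
    where = {}                 # vertex -> machines that have seen it
    placement = []
    for u, v in edges:
        a_u, a_v = where.get(u, set()), where.get(v, set())
        # Prefer a machine holding both endpoints, then either endpoint,
        # and fall back to any machine when neither has been placed yet.
        candidates = (a_u & a_v) or (a_u | a_v) or set(range(num_machines))
        m = min(candidates, key=lambda i: (load[i], i))  # least loaded wins
        load[m] += 1
        where.setdefault(u, set()).add(m)
        where.setdefault(v, set()).add(m)
        placement.append(m)
    return placement, where


if __name__ == "__main__":
    placement, where = greedy_place([(0, 1), (2, 3), (1, 2), (0, 2)], 2)
    print(placement)  # machine chosen for each edge, in stream order
    print(where)      # vertex 2 ends up replicated on both machines
```

In this sketch the coordinated variant corresponds to all loaders sharing one `where` table, while running the function independently on each loader's share of the edge stream, each with its own local table, corresponds to the oblivious variant.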

slide-15
SLIDE 15

OTHER FEATURES

Async Serializable engine
  • Prevents adjacent vertices from running simultaneously
  • Acquire locks for all adjacent vertices

Fault Tolerance
  • Checkpoint at the end of each super-step for sync

slide-16
SLIDE 16

SUMMARY

  • Gather-Apply-Scatter programming model
  • Vertex cuts to handle power-law graphs
  • Balance computation, minimize communication

slide-17
SLIDE 17

DISCUSSION

https://forms.gle/rKB5hcJgT4NQsFgq8

slide-18
SLIDE 18

Consider the PageRank implementation in Spark vs synchronous PageRank in PowerGraph. What are some reasons why PowerGraph might be faster?

  • Activate ensures no wasteful computation
  • Fine-grained communication in PowerGraph
  • Better partitioning!
  • Delta caching avoids computation
slide-19
SLIDE 19
slide-20
SLIDE 20

NEXT STEPS

Next class: GraphX

  • Partitioning in Spark: co-partitioning across iterations!
  • PowerGraph has methods to pick what vertices go in a partition