Kineograph
Raymond Cheng (University of Washington, Microsoft Research) et al.
The challenge
Social networks (Facebook, Twitter) generate a lot of information
Let's analyze it!
Simple data mining won't do:
○ too much data
○ constant influx of new data
○ long computation time
○ support incremental computation
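[Figure: the ingest node receives the tweet "@Alice: @Bob, check out these #kittens!" and creates transaction T over @Alice, @Bob and #kittens. T is sent to the graph nodes: node(Alice) gets @Alice -> #kittens and @Alice -> @Bob, node(kittens) gets #kittens -> @Alice, node(Bob) gets @Bob -> @Alice. After receiving ACKs, the ingest node reports T to the progress table.]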
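The tweet example above can be sketched in a few lines of Python. This is only an illustration: the helper names, the regular expression, and the rule that every mention or hashtag also gets an edge back to the author are assumptions read off the diagram, not Kineograph's actual code.

```python
import re
from collections import defaultdict


def tweet_to_edges(author, text):
    """Turn one tweet into directed edges (hypothetical helper)."""
    edges = []
    # Mentions (@name) and hashtags (#tag) become vertices linked to the author.
    for target in re.findall(r"[@#]\w+", text):
        if target == author:
            continue
        edges.append((author, target))  # e.g. @Alice -> #kittens
        edges.append((target, author))  # e.g. #kittens -> @Alice
    return edges


def group_by_owner(edges):
    """Group edges by source vertex, i.e. by the graph node that would store them."""
    by_owner = defaultdict(list)
    for src, dst in edges:
        by_owner[src].append((src, dst))
    return dict(by_owner)


edges = tweet_to_edges("@Alice", "@Bob, check out these #kittens !")
transaction = group_by_owner(edges)
for owner, owned in transaction.items():
    print(owner, owned)
```

Each key of `transaction` corresponds to one graph node in the figure. In the sketch the whole transaction is built in memory, whereas the slides describe the pieces being sent to separate machines and acknowledged before T is reported.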
updates (i.e. sets of edges)
○ at this point, it's just stored in the queue
clock
○ in practice, every 10 seconds
and sends it to graph nodes
times specified in progress table
○ new updates are coming in parallel
propagation
neighbours
Bonus: it's incremental between snapshots!
○ landmarks - top vertices from TunkRank
○ calculate only paths passing through landmarks
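A minimal sketch of the landmark idea, assuming an unweighted, undirected graph stored as an adjacency dict and a landmark list that is simply given (in the slides it comes from TunkRank). The function names are illustrative. The estimate is the length of the best path forced through a landmark, so it is an upper bound on the true distance.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def build_landmark_table(graph, landmarks):
    """Precompute one distance map per landmark."""
    return {l: bfs_distances(graph, l) for l in landmarks}


def estimate_distance(table, u, v):
    """Shortest u-v path that passes through some landmark, or None."""
    candidates = [d[u] + d[v] for d in table.values() if u in d and v in d]
    return min(candidates) if candidates else None


# Toy graph: "hub" plays the role of a high-ranked landmark vertex.
graph = {
    "a": ["hub"],
    "b": ["hub", "c"],
    "c": ["b"],
    "hub": ["a", "b"],
}
table = build_landmark_table(graph, ["hub"])
print(estimate_distance(table, "a", "c"))  # exact here: a-hub-b-c
print(estimate_distance(table, "b", "c"))  # overestimate: true distance is 1
```

The second query shows the trade-off: paths that do not pass near a landmark are overestimated, in exchange for storing only one distance map per landmark.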
○ Intel Xeon (quad-core, 2.8 GHz) with 8 GB RAM
Twitter rate)
Decaying can help
TunkRank:
snapshooter):
○ simple replication
○ Paxos-based consensus
○ input data is cached until it is committed to a snapshot
○ if ingest node fails, all its transactions are discarded
○ another machine processes data from cache
next snapshot and starts working normally from there
○ maintain more logical partitions than nodes
○ to add a node, migrate some logical partitions to it
○ splitting logical partitions is possible too
○ new node starts working from the next snapshot - just as in failure recovery
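The logical-partition scheme above might look roughly like this in Python. The partition count, the CRC32 hash, and the "take from the most loaded node" migration rule are all assumptions made for the sketch; the slides only say that there are more logical partitions than nodes and that some of them migrate to a new node.

```python
import zlib

NUM_LOGICAL = 8  # assumed value; must exceed the number of physical nodes


def logical_partition(vertex):
    """Map a vertex to a logical partition with a hash that is stable across runs."""
    return zlib.crc32(vertex.encode("utf-8")) % NUM_LOGICAL


def initial_assignment(nodes):
    """Spread logical partitions round-robin over the physical nodes."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_LOGICAL)}


def add_node(assignment, new_node):
    """Migrate partitions from the most loaded nodes to new_node."""
    assignment = dict(assignment)
    node_count = len(set(assignment.values()) | {new_node})
    target = NUM_LOGICAL // node_count  # fair share for the new node
    moved = []
    while len(moved) < target:
        load = {}
        for p, n in assignment.items():
            load.setdefault(n, []).append(p)
        donor = max((n for n in load if n != new_node),
                    key=lambda n: len(load[n]))
        p = load[donor][-1]
        assignment[p] = new_node
        moved.append(p)
    return assignment, moved


assignment = initial_assignment(["n0", "n1"])
new_assignment, moved = add_node(assignment, "n2")
print(moved)
print(new_assignment[logical_partition("@Alice")])
```

Because vertices hash to logical partitions rather than to machines, adding a node only changes the partition-to-node table. The vertex-to-partition mapping stays fixed.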