Data Storage Solutions for Decentralized Online Social Networks

SLIDE 1

Data Storage Solutions for Decentralized Online Social Networks

— Anwitaman Datta

S* Aspects of Networked & Distributed Systems (SANDS), School of Computer Engineering, NTU Singapore

iSocial Summer School, KTH Stockholm

SLIDE 2

Research @ SANDS

[Word-cloud figure: codes for storage; trust models; social network analysis; secure / privacy-preserving computation primitives; networked distributed storage and data management systems; distributed key-value stores; P2P/F2F storage systems; data-center design; privacy-aware / privacy-preserving data aggregation, storage, sharing and analytics/data-mining; data/computation at 3rd party / outsourced; decentralized online social networking and collaboration; recommendation and decision support systems. Grouped into: Foundational, (Distributed) Systems, Applications]

SLIDES 3-8

DOSNish research at SANDS

  • Selective information dissemination using social links: GoDisco
  • Security issues: access control, Private Information Retrieval, …
  • DOSN architectures: PeerSoN, SuperNova, PriSM, …
  • P2P storage

http://sands.sce.ntu.edu.sg/

SLIDE 9

P2P Storage

Not the same as a file-sharing system: Peer-to-Peer (P2P) storage systems leverage the combined storage capacity of a network of storage devices (peers), typically contributed by autonomous end-users, as a common pool of storage space to store content reliably.

SLIDES 10-14

P2P Storage

Design space:

  • Reliability: availability & durability (focus of this talk)
  • Security & privacy: access control, integrity, free-riding, anonymity, privacy, …
  • Sophisticated functionalities: concurrency, version control, …

SLIDE 15

Realizing Reliability

The P2P storage design space:

Maintenance strategies
  • Proactive: eager (repair all); lazy, deterministic (threshold based); lazy, randomized
  • Reactive

Redundancy type
  • Replication
  • Erasure codes
  • New codes, e.g., self-repairing codes

Placement
  • Key based (e.g., DHTs)
  • Selective (e.g., at friends or trusted nodes; history or proximity based, etc.)
  • Random

Garbage collection
  • Diversity of online fragments
  • Duplicates of same fragment
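The eager vs. lazy distinction can be illustrated with a toy simulation (the failure model and all parameters are illustrative assumptions, not from the talk): eager maintenance triggers a repair after every loss, while threshold-based lazy maintenance lets redundancy erode until a threshold is crossed and then repairs in one batch, so it triggers far fewer repair events.

```python
import random

def repair_events(policy, n=12, threshold=8, rounds=1000, p_fail=0.02, seed=42):
    """Count repair *events* under eager vs lazy (threshold-based)
    maintenance of n redundancy blocks. Hypothetical parameters."""
    rng = random.Random(seed)
    live, events = n, 0
    for _ in range(rounds):
        # each live block fails independently this round with prob. p_fail
        live -= sum(rng.random() < p_fail for _ in range(live))
        # eager: repair as soon as anything is missing;
        # lazy:  repair only once live redundancy drops below the threshold
        if live < (n if policy == "eager" else threshold):
            events += 1
            live = n   # recreate all missing redundancy in one batch
    return events
```

The price of laziness is a window of reduced redundancy between repairs, which is why the threshold must stay safely above the decodability limit k.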

SLIDES 16-18

Redundancy Type

  • Replication
  • Erasure codes

[Figure: a data object is split into k blocks O1 … Ok and encoded into n blocks B1 … Bn, which are stored on storage devices in the network. Even with lost blocks, retrieving any k' (≥ k) of the blocks allows decoding back to the original k blocks and reconstructing the data.]
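As a concrete (toy) illustration of the figure, the simplest erasure code is a single-parity code with n = k + 1: any k of the n blocks suffice to decode. A general (n, k) code, e.g., Reed-Solomon, behaves the same way in principle. The sketch below also includes the standard binomial formula for the probability that at least k of n independently online blocks are retrievable; replication is the special case k = 1.

```python
from functools import reduce
from math import comb

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode(blocks):
    """Toy systematic single-parity erasure code: k equal-size data
    blocks plus one XOR parity block (n = k + 1); tolerates any one loss."""
    return blocks + [reduce(xor, blocks)]

def decode(stored, k):
    """stored: {index: block}, any k of the n = k + 1 encoded blocks."""
    if all(i in stored for i in range(k)):            # no data block lost
        return [stored[i] for i in range(k)]
    missing = next(i for i in range(k) if i not in stored)
    stored[missing] = reduce(xor, stored.values())    # XOR of the k survivors
    return [stored[i] for i in range(k)]

def availability(n, k, p):
    """P(object retrievable) = P(at least k of n blocks online),
    each block online independently with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))
```

For example, `availability(3, 1, p)` reproduces the familiar 1 - (1 - p)^3 of 3-way replication, while `availability(k + 1, k, p)` quantifies the toy code above.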

SLIDES 19-22

Redundancy placement

A rather complicated problem:

  • All peers fully cooperative and altruistic, but autonomous: system capacity and resource allocation; heterogeneity; coverage (history/prediction/…), …
  • Selfish/Byzantine peers: incentives, trust, enforcement, …
  • Security & privacy implications of data placement, …

SLIDES 23-29

Classical P2P storage systems

[Figure: DHT ID space; replicas placed along the successor list]

Distributed Hash Table (DHT) determines storage placement, e.g., CFS / OpenDHT

  • Pros: simple design, ease of locating data
  • Cons: mixes indexing with storage; high correlation of failures; cannot leverage other characteristics (e.g., locality, history, etc.); may lead to poor performance (access latency, repair cost, …)
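A minimal Chord-style sketch of key-based placement (the peer names and the 16-bit ID space are hypothetical): the object's key is hashed onto the ring, and replicas go to the next r nodes clockwise, i.e., the successor list. This is exactly what couples indexing with storage: the hash, not the peers' properties, dictates where data lives.

```python
import hashlib
from bisect import bisect_right

def ring_id(name, bits=16):
    """Hash a name onto a 2**bits identifier ring."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % (1 << bits)

def place_replicas(key, peers, r=3):
    """Place r replicas on the first r peers following hash(key) on the ring
    (the key's successor list)."""
    ring = sorted((ring_id(p), p) for p in peers)
    ids = [i for i, _ in ring]
    start = bisect_right(ids, ring_id(key)) % len(ring)
    return [ring[(start + j) % len(ring)][1] for j in range(r)]

peers = ["peerA", "peerB", "peerC", "peerD", "peerE"]
print(place_replicas("photo.jpg", peers))
```

Note that the r chosen peers are adjacent on the ring, which is also why failures of a key's replica set are correlated with churn in one small region of the ID space.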

SLIDES 30-33

Classical P2P storage systems

[Figure: DHT ID space; the successor list holds pointers to replicas]

Distributed Hash Table (DHT) as a directory, e.g., TotalRecall

  • Pros: flexible placement policy
  • Cons of TotalRecall, which placed at random: ???
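The directory idea, a DHT that maps an object key to pointers rather than to the data itself, can be sketched in a few lines (a plain dict stands in for the real DHT, and all names are hypothetical): any placement policy can be plugged in, because the index no longer dictates where data lives.

```python
directory = {}   # stand-in for a real DHT's put/get of small pointer records

def put(key, data, choose_peers, store):
    """Store data on peers chosen by an arbitrary policy; the DHT holds
    only the list of pointers, decoupling indexing from placement."""
    peers = choose_peers(key)            # random, friend-based, history-based, ...
    for p in peers:
        store(p, key, data)
    directory[key] = peers

def get(key, fetch):
    """Resolve pointers via the directory, then try the holders in turn."""
    for p in directory.get(key, []):
        data = fetch(p, key)
        if data is not None:
            return data
    return None
```

TotalRecall's weakness then shows up entirely inside `choose_peers`: picking holders uniformly at random ignores availability, locality, and history.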

SLIDES 34-50

Cloud assisted storage system

Source: Google tech talk on Wuala: http://www.youtube.com/watch?v=3xKZ4KGkQY8

[Figure: users issue GET requests that are routed through a DHT of superpeers to storage peers, with Wuala's dedicated storage data center as fallback]

Hybrid architecture (used previously in Wuala):

  • Index independent of storage
  • Many fragments per object
  • Suitable for sharing very large but static files
  • Parallel download
  • Piggy-backed, large DHT routing states, so very few hops needed, giving high throughput

SLIDES 51-54

More sophisticated heuristics

  • Incentives: reciprocity, trust/reputation, …
  • QoS: 24/7 coverage, locality, online/offline behavior (history/prediction), …
  • Control: de/centralized, local/global knowledge

SLIDE 55

Replica Placement in P2P Storage: Complexity and Game Theoretic Analyses (Rzadca et al., ICDCS 2010)

  • Replication model: a clique of replicas storing each other's data (reciprocity)
  • Explores both centralized and decentralized settings for clique formation
  • Challenge: centralized matching must find the right set of peers to optimize storage capacity utilization (proven NP-hard); decentralized matching uses an underlying gossip algorithm (T-Man) to explore partners
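Under the independence assumption commonly made in such analyses (an assumption here, not a claim about the paper's exact model), a clique's data unavailability is simply the probability that all members are offline at once, which is why who gets matched with whom matters so much.

```python
def clique_unavailability(availabilities):
    """P(no clique member is online), assuming independent uptimes."""
    u = 1.0
    for a in availabilities:
        u *= 1.0 - a
    return u

# Pairing two 90%-available peers leaves both with unavailability 0.01,
# while pairing a 90% peer with a 50% peer leaves both at 0.05:
# equitable / subgame-perfect matchings reward highly available peers,
# whereas random matching averages everyone out.
```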

SLIDES 56-57

Representative result (simulations with artificial data)

[Figure (Fig. 2): peers' expected data unavailability as a function of their availability under random, equitable, and subgame-perfect assignment; a histogram shows the number of peers in each availability bucket]

Good or bad?

SLIDES 58-62

How about F2F storage?

  • Friend-to-Friend instead of Peer-to-Peer
  • Translating "real life" trust into something useful for reliable "system" design
  • Maps naturally to the overlying social application
  • Anecdotal note: SafeBook also used friends-of-friends, for access control

SLIDES 63-64

Place data at friends: That's it?

Storing at all friends (the naïve baseline) is the best one can do in terms of achieving the highest possible availability, but it has very high overheads: storage and maintenance.

Find instead a "reasonable" subset of friends to store at!
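One natural way to pick such a subset is a greedy set-cover heuristic (assumed here for illustration; not necessarily the strategy any cited paper uses): repeatedly add the friend whose online time slots cover the most still-uncovered time.

```python
def greedy_subset(friend_slots, horizon):
    """Greedily choose friends to maximize coverage of time slots.

    friend_slots: {friend: set of online slot indices in range(horizon)}
    Returns (chosen friends, fraction of the horizon covered)."""
    covered, chosen = set(), []
    target = set(range(horizon))
    while covered != target:
        friend, gain = max(
            ((f, len(s - covered)) for f, s in friend_slots.items()
             if f not in chosen),
            key=lambda t: t[1], default=(None, 0))
        if not gain:          # remaining slots cannot be covered
            break
        chosen.append(friend)
        covered |= friend_slots[friend]
    return chosen, len(covered) / horizon
```

Greedy set cover is a classic approximation (within a logarithmic factor of optimal), which makes it a plausible stand-in for "reasonable" here.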

SLIDES 65-68

An empirical study of availability in friend-to-friend storage systems (Sharma et al., P2P 2011)

Look at the temporal online/offline behavior of friends:

  • Achievable coverage: what best availability can be achieved?
  • Criticality of friends: which friends are indispensable?

SLIDES 69-70

Evaluation

Data set: Italian instant messenger service

Pros:
  • Social + temporal characteristics
  • "May" reasonably reflect the online/offline behavior

Cons:
  • Not a P2P storage system trace
  • "Small", "incomplete" and "geographically localized"

  • 3436 nodes; 848 nodes in the largest component (note that many nodes had "neighbors" on other servers, for whom we did not have info); between 1-18 neighbors
  • Two weeks of data used: one for "learning", one for evaluation (time-of-day, day-of-week effects)

SLIDES 71-74

Representative results

  • AC (achievable coverage): 50% of nodes can get more than 90% availability
  • Crit (time covered using critical nodes): too much dependence on critical nodes
  • Per node: <achievable coverage, degree of criticality, # of friends>

If there are "enough" friends (> 10), it ought to be okay! (assuming storage capacity is not an issue)
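The AC and criticality notions can be made concrete over discrete time slots (a plausible formalization, assumed for illustration; the paper's exact definitions may differ): coverage is the fraction of slots with at least one friend online, and a friend is critical if it is the sole friend online in some slot.

```python
from collections import Counter

def achievable_coverage(friend_slots, horizon):
    """Fraction of time slots in range(horizon) with >= 1 friend online."""
    online = set().union(*friend_slots.values()) if friend_slots else set()
    return len(online & set(range(horizon))) / horizon

def critical_friends(friend_slots):
    """Friends that are the sole online friend in some time slot."""
    count = Counter(s for slots in friend_slots.values() for s in slots)
    return {f for f, slots in friend_slots.items()
            if any(count[s] == 1 for s in slots)}
```

A node whose coverage collapses when its critical friends are removed is exactly the "too much dependence" case flagged above.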

SLIDE 75

Bootstrapping pangs!

New peers with few friends in the system, or with no reputation of being highly available, will find it difficult to get started. This holds both in the game-theoretic study of reciprocity-based P2P cliques and in the analysis of ego-centric networks for F2F storage.

SLIDES 76-80

SuperNova: Super-peers Based Architecture for Decentralized Online Social Networks (Sharma et al., Comsnets 2012)

The big picture/premise:

  • Well-resourced nodes act as super-peers; their incentives could be reputation within an interest community, the ability to monetize (e.g., using ads), …
  • New nodes use super-peers for storage until they get established in the system, so that the super-peers are not over-burdened, nor become a bottleneck for established peers, …
  • Super-peers help with coordination, finding storage partners, etc.

SLIDE 81

Representative result

[Figure (Fig. 2): comparison of Friend's Time (FT) and Total Time (TT) for Deviation (D) and Non-Deviation (ND); panel (c) shows system performance]

Take with a huge pinch of salt: artificial data drives the simulations, with too many parameters …

SLIDES 82-83

Moving forward

Full-fledged (D)OSN, composed of:

  • Dynamic/social data store: high availability, high consistency, high rate of data updates, small volume of data
  • Security modules: encryption, access control, …
  • Social modules: analytics, search/navigation, recommendation, …
  • Bulk (static) data storage
  • Light-weight P2P OSN: a P2P overlay with basic services (DHT lookup, peer-sampling, etc.); could even be (multi-)cloud based; can be a small dynamic clique maintained aggressively