SLIDE 1

Experiences in Building and Operating ePOST, a Reliable Peer-to-Peer Application

Alan Mislove†‡ Ansley Post†‡ Andreas Haeberlen†‡ Peter Druschel†

†Max Planck Institute for Software Systems

‡Rice University

SLIDE 2

19.04.2006 EuroSys’06 Conference, Leuven, Belgium

Reliable P2P Systems: Myth or Reality?

  • For the past few years, much research interest in p2p
      • Highly scalable in nodes and data
      • Utilization of underused resources
      • Robust to a large range of workloads and failures
  • Most deployed systems are not reliable [Kazaa, Skype, etc.]
      • None attempt to store data reliably, durably, or securely
  • Led some to conclude that p2p can’t support reliable applications
  • Question: Can peer-to-peer systems provide reliable service?

SLIDE 3

Demonstration Application: ePOST

  • ePOST is an email service built using decentralized components
      • Completely decentralized, no ‘email servers’
  • Email is one of the most important Internet applications
      • Privacy
      • Integrity
      • Durability
      • Availability
  • Wanted to develop the system to the point where people rely on it

SLIDE 4

ePOST: Deployment

  • Built and deployed ePOST within our group
      • Running for over 2 years
      • Processed well over 500,000 email messages
  • Built ePOST to be more reliable than existing email systems
      • 16 users used ePOST as their primary email
      • Even my advisor!
  • Building the system revealed many challenges
      • Once the challenges were solved, ePOST provided reliable service
      • Robust: on numerous occasions, ePOST was the only mail service working

SLIDE 5

Rest of Talk

  • ePOST in detail
  • Challenges faced in building and deploying ePOST
  • Conclusion

SLIDE 6

ePOST: Architecture

  • Each participating node runs mail servers for the local user
      • Email service looks the same to users
  • Data stored cooperatively on participating machines
      • Machines form overlay
      • Replicated for redundancy
  • All data encrypted and signed
      • Prevents others from reading your email

[Figure: overlay of nodes, each serving its local user over IMAP, POP3, and SMTP; legend: Node, Email]

SLIDE 8

ePOST: Metadata Storage

  • Folders represented using logs
      • Entries represent changes
      • All entries self-authenticating
  • Log head points to most recent entry
      • Signed by owner due to mutability
  • Only local node has key material
      • All writes performed by owner
      • Maps the multi-access problem to a single-writer problem

[Figure: folder log whose log head points to the most recent entry; entries shown: Add Email #3, Delete Email #2, Mark #2 Read, Add Email #2; a later build appends Add Email #4]
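To make the log structure concrete, here is a minimal Python sketch of a single-writer folder log. It illustrates the idea under stated assumptions and is not ePOST's code. Each entry is named by the SHA-256 hash of its content, which is what makes it self-authenticating. Python's standard library has no public-key signatures, so an HMAC under a hypothetical owner key stands in for the owner's signature on the mutable log head. `FolderLog`, `append`, and `verify_chain` are hypothetical names.

```python
import hashlib
import hmac
import json
from typing import Dict, List, Optional


def entry_id(entry: dict) -> str:
    """Name an entry by the hash of its content, so tampering is detectable."""
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()


class FolderLog:
    """Hypothetical single-writer folder log; only the owner appends."""

    def __init__(self, owner_key: bytes):
        self.owner_key = owner_key               # key material held only by the local node
        self.store: Dict[str, dict] = {}         # stand-in for the DHT: id -> entry
        self.head: Optional[str] = None          # id of the most recent entry
        self.head_tag: str = ""                  # owner's authenticator over the head

    def _tag(self, head: Optional[str]) -> str:
        # Assumption: HMAC stands in for a public-key signature in this sketch.
        return hmac.new(self.owner_key, str(head).encode(), hashlib.sha256).hexdigest()

    def append(self, op: str, arg: str) -> str:
        entry = {"op": op, "arg": arg, "prev": self.head}
        eid = entry_id(entry)
        self.store[eid] = entry                  # immutable, self-authenticating entry
        self.head = eid                          # the mutable head moves to the new entry
        self.head_tag = self._tag(eid)
        return eid

    def verify_chain(self) -> List[str]:
        """Check the head tag and every hash link; return operations newest first."""
        if self.head is None:
            return []
        if not hmac.compare_digest(self._tag(self.head), self.head_tag):
            raise ValueError("log head is not authenticated by the owner")
        ops, eid = [], self.head
        while eid is not None:
            entry = self.store[eid]
            if entry_id(entry) != eid:
                raise ValueError("entry does not match its content hash")
            ops.append(f"{entry['op']} {entry['arg']}")
            eid = entry["prev"]
        return ops


log = FolderLog(owner_key=b"local-node-secret")
for op, arg in [("Add Email", "#2"), ("Mark Read", "#2"),
                ("Delete Email", "#2"), ("Add Email", "#3"), ("Add Email", "#4")]:
    log.append(op, arg)
print(log.verify_chain())
# ['Add Email #4', 'Add Email #3', 'Delete Email #2', 'Mark Read #2', 'Add Email #2']
```

Only the holder of the key can produce a valid head tag, so only the owner can advance the log. This is how the sketch turns the multi-access problem into a single-writer one.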

SLIDE 12

Challenges Faced

  • Network partitions
  • NATs and firewalls
  • Routing anomalies
  • Node churn
  • Correlated failures
  • Resource consumption
  • Data storage
  • Slow nodes
  • Hidden single points of failure
  • Data corruption
  • Comatose nodes
  • Complex failure modes
  • Very unsynchronized clocks
  • Lost key material
  • Disconnected nodes
  • Power failures
  • Resource exhaustion
  • Spam attacks on relays
  • Java eccentricities
  • Congested links
  • PlanetLab slice deletion
  • ...

SLIDE 13

Challenges Faced (covered in the rest of this talk)

  • Network partitions
  • Routing anomalies
  • Correlated failures
  • Resource consumption
  • Very unsynchronized clocks

SLIDE 14

Challenge: Network Partitions

  • Overlay originally had no special provisions for network partitions
      • Did not envision partitions as a significant problem
  • When a network failure occurs, nodes conclude that unreachable nodes are dead
      • Multiple overlays re-form
  • Network usually fails at access links
      • Generally one large overlay and one small overlay

SLIDE 17

How frequent are partitions?

  • Partitions occur often in PlanetLab
  • Usually a single subnet (PlanetLab site) becomes partitioned

[Figure: number of partitions over time (days)]

SLIDE 18

Impact of Network Partitions

  • Tradeoff between consistency and availability under partitions
      • Well-known tradeoff
      • ePOST resolves it in favor of availability
  • Partitions cause consistency problems
      • Data may be inaccessible from small partitions
      • Mutable data can diverge
      • Partitions persist unless action is taken

[Figure legend: Node, Email, Log Head; a later build marks data as unreachable (?)]

SLIDE 20

Partitions: Overlay Reintegration

  • To reintegrate the overlay:
      • Nodes remember recently deceased nodes
      • Periodically query these nodes and integrate missing nodes into the overlay
  • Protocol is periodic, and therefore stable
  • Tested on simulated failures as well as on PlanetLab
      • Overlay heals as expected
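The reintegration loop can be sketched as follows. This is a minimal illustration and not ePOST's implementation. `probe(node_id)` and `rejoin(node_id)` are hypothetical callbacks: the first stands for a liveness query, the second for re-inserting a node into the local routing state. The retention window `remember_seconds` is an assumed parameter. Time is passed in explicitly so the example is deterministic.

```python
import time
from typing import Callable, Dict, List, Optional


class Reintegrator:
    """Remembers recently 'deceased' nodes and periodically re-probes them."""

    def __init__(self, probe: Callable[[str], bool], rejoin: Callable[[str], None],
                 remember_seconds: float = 24 * 3600.0):
        self.probe = probe                        # hypothetical: True if the node answers
        self.rejoin = rejoin                      # hypothetical: re-add to routing state
        self.remember_seconds = remember_seconds  # assumed retention window
        self.deceased: Dict[str, float] = {}      # node id -> time it was declared dead

    def declare_dead(self, node_id: str, now: Optional[float] = None) -> None:
        self.deceased[node_id] = time.time() if now is None else now

    def tick(self, now: Optional[float] = None) -> List[str]:
        """One periodic round: forget stale entries, re-probe the others."""
        now = time.time() if now is None else now
        healed = []
        for node_id, died_at in list(self.deceased.items()):
            if now - died_at > self.remember_seconds:
                del self.deceased[node_id]        # too old: stop tracking it
            elif self.probe(node_id):
                del self.deceased[node_id]        # it was only partitioned away
                self.rejoin(node_id)
                healed.append(node_id)
        return healed


reachable = {"B"}                                 # after the partition heals, B answers again
joined: List[str] = []
r = Reintegrator(probe=lambda n: n in reachable, rejoin=joined.append,
                 remember_seconds=100)
r.declare_dead("B", now=0)
r.declare_dead("C", now=0)
print(r.tick(now=10))                             # ['B']
```

In a running node, `tick` would presumably be driven by a periodic timer rather than called by hand.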

SLIDE 22

Partitions: Data Divergence

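  • In ePOST, folders use a log-based data structure
      • Forked logs must be merged
      • Data divergence is unlikely due to single-writer behavior
  • To repair logs, merge entries and cancel destructive operations
      • Ensures no data loss

[Figure: folder log (entries shown: Add Email #3, Delete Email #2, Mark #2 Read, Add Email #2) that forks during a partition; later builds show divergent entries Delete Folder, Mark #4 Read, and Add Email #4]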
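The merge rule can be sketched in a few lines of Python. The sketch assumes each branch is a list of `(operation, argument)` entries ordered oldest first, and that the branches share a common prefix up to the fork. Which operations count as destructive (`DESTRUCTIVE`) is also an assumption for illustration. Non-destructive entries from both branches are kept. Destructive entries recorded after the fork are dropped, so the merge itself never makes an email disappear.

```python
from typing import List, Tuple

Entry = Tuple[str, str]                           # (operation, argument)
# Assumption for this sketch: which operations count as destructive.
DESTRUCTIVE = {"Delete Email", "Delete Folder"}


def common_prefix_len(a: List[Entry], b: List[Entry]) -> int:
    """Number of leading entries both branches share (the pre-fork history)."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def merge_forked_logs(a: List[Entry], b: List[Entry]) -> List[Entry]:
    """Merge two branches of a forked log, both given oldest entry first."""
    fork = common_prefix_len(a, b)
    merged = list(a[:fork])                       # history both sides agree on
    for entry in a[fork:] + b[fork:]:
        if entry[0] in DESTRUCTIVE:
            continue                              # cancel destructive ops: nothing is lost
        if entry not in merged:
            merged.append(entry)                  # keep the union of divergent entries
    return merged


shared = [("Add Email", "#2"), ("Mark Read", "#2"),
          ("Delete Email", "#2"), ("Add Email", "#3")]
side_a = shared + [("Delete Folder", "")]
side_b = shared + [("Add Email", "#4"), ("Mark Read", "#4")]
print(merge_forked_logs(side_a, side_b))
```

Here the Delete Folder issued on one side of the partition is cancelled. Add Email #4 and Mark #4 Read from the other side survive.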

SLIDE 25

Challenge: Routing Anomalies

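  • Overlay assumed that any two participating nodes could communicate
  • Internet routing anomalies (routing intransitivity) were a problem
      • Nodes disagree about the liveness of other nodes

[Figure: the direct link between Node A and one other node fails (x); that node concludes "A is dead" while two others report "A is alive"]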

SLIDE 28

Effect of Routing Anomalies

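  • Routing anomalies cause nodes to disagree on membership
  • Objects on disputed nodes may be inaccessible
      • Example: DHT lookup inconsistency
          • Overlay route locates object
          • Direct return path fails
      • Failure is permanent until node churn creates a new owner

[Figure: a lookup routed through the overlay reaches the node holding an email, but the direct return path fails (x), so the requester gets no answer (?)]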
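The lookup failure above can be reproduced in a few lines. The sketch assumes ownership works as in Pastry-like overlays: a key belongs to the live node whose ID is numerically closest to it on a circular ID space. Here that space is a toy 8-bit ring. The overlay route is assumed to reach the owner, but the reply travels on the direct IP path. If that path is broken, every retry fails the same way until membership, and with it ownership, changes. All names are hypothetical.

```python
from typing import List

RING = 256  # toy 8-bit identifier space (assumption for illustration)


def ring_distance(a: int, b: int) -> int:
    """Distance between two ids on the circular identifier space."""
    d = abs(a - b) % RING
    return min(d, RING - d)


def owner_of(key: int, live_nodes: List[int]) -> int:
    """The live node whose id is numerically closest to the key owns it."""
    return min(live_nodes, key=lambda n: (ring_distance(n, key), n))


def direct_ok(a: int, b: int) -> bool:
    """Hypothetical IP reachability: only the 80 <-> 10 path is broken."""
    return {a, b} != {80, 10}


def lookup(key: int, requester: int, live_nodes: List[int]) -> bool:
    """Overlay routing reaches the owner; the reply returns on the direct IP path."""
    owner = owner_of(key, live_nodes)
    return owner == requester or direct_ok(owner, requester)


nodes = [10, 80, 150, 220]
print(lookup(85, 10, nodes))          # False: owner 80 cannot reply to node 10
print(lookup(85, 10, nodes))          # still False: nothing has changed
print(lookup(85, 10, nodes + [84]))   # True: churn made node 84 the owner
```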

SLIDE 31

Routing Anomalies: Solution

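  • Liveness messages forwarded using source routing [DSR, IP]
      • Nodes advertise best routes to other nodes
      • If the direct path fails, route through another node
  • With source routing, about 8% of links in the PlanetLab ring are indirect

[Figure: the direct link to Node A has failed (x), but liveness messages are relayed through another node, so all nodes report "A is alive"]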
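A minimal sketch of the fallback described above. A node first checks a peer directly. If that fails, it tries one-hop source routes through relays that have advertised a route to the peer. `reachable(a, b)` is a hypothetical stand-in for sending a message over the underlying IP path. A real liveness check would add timeouts and retries.

```python
from typing import Callable, Iterable, List, Optional


def find_liveness_route(src: str, dst: str, advertised_relays: Iterable[str],
                        reachable: Callable[[str, str], bool]) -> Optional[List[str]]:
    """Return a working source route from src to dst, or None if dst seems dead."""
    if reachable(src, dst):
        return [src, dst]                         # direct IP path works
    for relay in advertised_relays:
        if relay in (src, dst):
            continue
        # One-hop source route src -> relay -> dst; each leg is ordinary IP routing.
        if reachable(src, relay) and reachable(relay, dst):
            return [src, relay, dst]
    return None                                   # no route found: treat dst as dead


# Hypothetical routing intransitivity: B cannot reach A directly, but C reaches both.
def reachable(x: str, y: str) -> bool:
    return {x, y} != {"A", "B"}


print(find_liveness_route("B", "A", ["C", "D"], reachable))  # ['B', 'C', 'A']
```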

SLIDE 33

Challenge: Correlated Failures

  • Initially assumed diverse node population
      • Independent failure probability
  • But many sources of correlated failures
      • DNS entries
      • Possible worm attack
      • Can cause data loss
  • Solution: Glacier [NSDI’05]
      • Erasure codes and redundancy to mask failure
      • Survives failure of 60% of nodes with 10x storage overhead

[Figure: an email stored as fragments on many nodes; a later build marks many nodes as failed (x)]
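Glacier trades storage for tolerance of correlated failures by means of erasure coding. The sketch below is a generic k-of-n erasure code built from polynomial interpolation over the prime field GF(257). It is not Glacier's actual code or parameters. An object of k symbols becomes n fragments, and any k surviving fragments reconstruct it, so the storage overhead is n/k. The function names are hypothetical.

```python
from typing import Dict, List

P = 257  # prime field size; every byte value 0..255 is a field element


def _interpolate(points: Dict[int, int], x: int) -> int:
    """Lagrange-evaluate at x the polynomial through the given (xi, yi) points, mod P."""
    total = 0
    for xi, yi in points.items():
        num, den = 1, 1
        for xj in points:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse of den
    return total


def encode(data: List[int], n: int) -> Dict[int, int]:
    """Systematic k-of-n code: fragment x holds the polynomial's value at x (1..n)."""
    k = len(data)
    if not 0 < k <= n < P:
        raise ValueError("need 0 < k <= n < P")
    base = {i + 1: v for i, v in enumerate(data)}            # data symbols sit at x = 1..k
    return {x: base[x] if x <= k else _interpolate(base, x) for x in range(1, n + 1)}


def decode(fragments: Dict[int, int], k: int) -> List[int]:
    """Rebuild the k data symbols from any k surviving fragments."""
    if len(fragments) < k:
        raise ValueError("too few fragments survived")
    chosen = dict(list(fragments.items())[:k])
    return [_interpolate(chosen, x) for x in range(1, k + 1)]


data = list(b"epost")                                        # k = 5 symbols
frags = encode(data, n=15)                                   # 3x storage overhead
survivors = {x: frags[x] for x in (2, 6, 9, 13, 15)}         # 10 of 15 holders failed
print(bytes(decode(survivors, k=5)))                         # b'epost'
```

If exactly 60% of an object's fragments are lost, the remaining 40% must still number at least k, which requires n/k of at least 2.5. The 10x overhead quoted on the slide presumably leaves a margin for objects that happen to lose more than 60% of their fragments.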

SLIDE 36

Challenge: Resource Consumption

  • Studied hard drive growth vs. data creation rate
      • Determined there was sufficient space
  • But did not anticipate the spam explosion
      • After 6 months, 75% of stored data was garbage
      • Space was sufficient, but maintaining replicas of garbage cost a lot of bandwidth
  • Solution: Lease-based storage
      • Renew useful objects
      • Avoids insecure delete operation
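A minimal sketch of lease-based storage. It assumes each stored object carries an expiration time that its owner must keep renewing. Objects nobody renews, such as spam the user deleted, simply expire and are garbage-collected, so the system never needs an explicit delete request. `LeaseStore` and its methods are hypothetical. The current time is passed in to keep the example deterministic. Because the expiration here is an absolute time chosen by the requester, a requester whose clock is far behind produces leases that are already expired, which is the problem described under unsynchronized clocks.

```python
from typing import Dict, Tuple


class LeaseStore:
    """Hypothetical storage node that keeps an object only while its lease is valid."""

    def __init__(self) -> None:
        self.objects: Dict[str, Tuple[bytes, float]] = {}   # key -> (data, expires_at)

    def put(self, key: str, data: bytes, expires_at: float, now: float) -> bool:
        if expires_at <= now:
            return False                                      # lease already in the past: reject
        self.objects[key] = (data, expires_at)
        return True

    def renew(self, key: str, expires_at: float, now: float) -> bool:
        """The owner periodically extends leases on objects it still needs."""
        if key not in self.objects or expires_at <= now:
            return False
        data, old_expiry = self.objects[key]
        self.objects[key] = (data, max(old_expiry, expires_at))  # never shorten a lease
        return True

    def collect_garbage(self, now: float) -> int:
        """Drop unrenewed objects; no explicit delete request is ever needed."""
        expired = [k for k, (_, exp) in self.objects.items() if exp <= now]
        for k in expired:
            del self.objects[k]
        return len(expired)


store = LeaseStore()
store.put("email-3", b"...", expires_at=100, now=0)
store.put("spam-17", b"...", expires_at=50, now=0)            # the user will not renew spam
store.renew("email-3", expires_at=200, now=90)
print(store.collect_garbage(now=150))                          # 1 (the spam expired)
print(sorted(store.objects))                                   # ['email-3']
```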

SLIDE 37

Challenge: Unsynchronized Clocks

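  • Assumed loosely synchronized clocks
      • Error of a few hours
  • Did not hold
      • One user’s clock was 2 years behind
      • Caused that user’s lease requests to fail
      • Never deleted any stored data
  • Solution: Counter-based leases
      • Do not use absolute time

[Figure: the node whose clock is 2 years behind requests a lease until May 10, which its clock places in 2004, so the request fails (x); with counter-based leases it requests 12 days instead]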
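One possible reading of "do not use absolute time", sketched under assumptions. The requester sends only a relative duration (for example, 12 days). The storing node converts that duration into an expiration using its own local monotonic counter, so the requester's clock never enters the computation. This illustrates the idea on the slide and is not ePOST's actual lease protocol. `CounterLeaseStore` and its methods are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple

DAY = 24 * 3600.0


class CounterLeaseStore:
    """Hypothetical storage node whose leases are durations on its own local counter."""

    def __init__(self, counter: Callable[[], float] = time.monotonic) -> None:
        self.counter = counter                    # local and monotonic; never synchronized
        self.objects: Dict[str, Tuple[bytes, float]] = {}

    def put_or_renew(self, key: str, data: bytes, duration_s: float) -> bool:
        """Store or refresh an object for duration_s seconds of local time."""
        if duration_s <= 0:
            return False
        expires_at = self.counter() + duration_s  # the requester's clock is never consulted
        old = self.objects.get(key)
        if old is not None:
            expires_at = max(expires_at, old[1])  # renewals never shorten a lease
        self.objects[key] = (data, expires_at)
        return True

    def collect_garbage(self) -> int:
        now = self.counter()
        expired = [k for k, (_, exp) in self.objects.items() if exp <= now]
        for k in expired:
            del self.objects[k]
        return len(expired)


fake_now = [0.0]                                  # controllable counter for the demo
store = CounterLeaseStore(counter=lambda: fake_now[0])
store.put_or_renew("email-3", b"...", duration_s=12 * DAY)   # "keep for 12 days"
fake_now[0] = 11 * DAY
print(store.collect_garbage())                    # 0: lease still valid
fake_now[0] = 13 * DAY
print(store.collect_garbage())                    # 1: lease expired without renewal
```

A requester whose wall clock is two years off still asks for "12 days", so the request succeeds regardless of the skew.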

SLIDE 41

Conclusion

  • Question: Can peer-to-peer systems support reliable applications?
      • Yes!
  • Built ePOST, a reliable decentralized mail system
      • Many users relied on ePOST for primary mail
  • Many challenges to providing reliable service
      • Network partitions, routing anomalies, ...
  • Challenges and techniques applicable to other systems
      • Human time-scale events, eventual consistency
      • Instant messaging, whiteboards, newsgroups, blogs, ...

SLIDE 42

Questions?

http://www.epostmail.org

Thanks to all of the ePOST users!
