Beyond Replicated Storage: Eventually-Consistent Distributed Data Structures


Konrad Iwanicki, University of Warsaw. PaPEC 2014, Amsterdam, the Netherlands, April 13th, 2014.


slide-1
SLIDE 1

Beyond Replicated Storage: Eventually-Consistent Distributed Data Structures

Konrad Iwanicki University of Warsaw

PaPEC 2014, Amsterdam, the Netherlands, April 13th, 2014

slide-2
SLIDE 2

What we do

  • Extreme distributed systems
slide-3
SLIDE 3

What we do

  • Extreme distributed systems
  • Based on wireless sensor networks
slide-4
SLIDE 4

What we do

  • Extreme distributed systems
  • Based on wireless sensor networks:
    – Hundreds or even thousands of nodes in a network
    – A single node is severely constrained in resources

slide-5
SLIDE 5

Extreme distributed systems

  • More complex than sense-and-send:
    – Sensing
    – Analyzing and deciding
    – Actuating
slide-6
SLIDE 6

Extreme distributed systems

  • More complex than sense-and-send (all collaborative):
    – Sensing
    – Analyzing and deciding
    – Actuating

slide-7
SLIDE 7

Extreme distributed systems

  • More complex than sense-and-send (all collaborative):
    – Sensing
    – Analyzing and deciding
    – Actuating
  • Subject to various challenges:
    – Resource constraints
    – Unreliable communication
    – Interactions with the surrounding environment

slide-8
SLIDE 8

Extreme distributed systems

  • More complex than sense-and-send (all collaborative):
    – Sensing
    – Analyzing and deciding
    – Actuating
  • Subject to various challenges:
    – Resource constraints
    – Unreliable communication
    – Interactions with the surrounding environment
  • Distributed algorithms are increasingly complex
    ➔ e.g., they employ specific organizations.

slide-9
SLIDE 9

Cluster hierarchy

Wireless connectivity between nodes.

slide-10
SLIDE 10

Cluster hierarchy

[Figure: the network with nodes P, L, B, Q, G, K, C, R, E, D, J, H, F, O, M, N, A]

Each node forms a level-0 cluster of which it becomes the head.
slide-11
SLIDE 11

Cluster hierarchy

P.L L.L I.H B.G Q.G G.G K.G C.G R.G E.G D.D J.D H.H F.H O.H M.H N.H A.L

Proximate level-0 clusters are grouped into level-1 clusters.

slide-12
SLIDE 12

Cluster hierarchy

P.L.G L.L.G I.H.G B.G.G Q.G.G G.G.G K.G.G C.G.G R.G.G E.G.G D.D.G J.D.G H.H.G F.H.G O.H.G M.H.G N.H.G A.L.G

And so on at higher levels, typically until a single cluster remains.

slide-13
SLIDE 13

Cluster hierarchy

P.L.G L.L.G I.H.G B.G.G Q.G.G G.G.G K.G.G C.G.G R.G.G E.G.G D.D.G J.D.G H.H.G F.H.G O.H.G M.H.G N.H.G A.L.G

The membership of a node in the hierarchy is reflected in the node’s label.
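
As a minimal sketch of how a label encodes membership, the snippet below splits a dot-separated label such as `P.L.G` into one cluster head per level. The function names and the string representation are assumptions made for illustration; the slides do not specify how labels are stored.

```python
def cluster_membership(label: str) -> list[tuple[int, str]]:
    """Return (level, cluster head) pairs encoded in a dot-separated label."""
    return list(enumerate(label.split(".")))


def lowest_shared_level(label_a: str, label_b: str):
    """Return the lowest level at which two nodes are in the same cluster.

    Returns None if the labels share no cluster at any common level.
    """
    heads_a, heads_b = label_a.split("."), label_b.split(".")
    for level, (a, b) in enumerate(zip(heads_a, heads_b)):
        if a == b:
            return level
    return None


print(cluster_membership("P.L.G"))            # [(0, 'P'), (1, 'L'), (2, 'G')]
print(lowest_shared_level("P.L.G", "A.L.G"))  # 1
print(lowest_shared_level("P.L.G", "D.D.G"))  # 2
```

Here nodes P and A meet in the level-1 cluster L, whereas P and D only meet in the top-level cluster G.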

slide-14
SLIDE 14

Cluster hierarchy

P.L.G L.L.G I.H.G B.G.G Q.G.G G.G.G K.G.G C.G.G R.G.G E.G.G D.D.G J.D.G H.H.G F.H.G O.H.G M.H.G N.H.G A.L.G

Each node also maintains information for each sibling cluster in the hierarchy (e.g., a routing entry).

slide-15
SLIDE 15

The problem

  • Using the hierarchy is relatively easy:
  • Routing
  • Aggregation
  • In-network storage
  • Maintaining it is a different story.
  • Connectivity changes
  • Node failures & arrivals
  • Nodes should be autonomous
slide-16
SLIDE 16

General scheme – gossiping

  • Each node maintains its state, which it occasionally updates.
  • For communication:
    – Each node operates in rounds.
    – In each round, it broadcasts its state to its neighbors.
    – It also receives the neighbors' states, which it merges with its own.

[Figure: a timeline divided into rounds; each round contains a transmission (Tx) and a reception (Rx)]
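
The round-based scheme can be sketched as a small simulation. Everything concrete here is an assumption for illustration: the `Node` class, the per-key version counter, and the "higher version wins" merge rule are not taken from the talk, which leaves the merge function unspecified.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    neighbors: list = field(default_factory=list)  # names of adjacent nodes
    state: dict = field(default_factory=dict)      # key -> (version, value)

    def update(self, key, value):
        """Local update: bump the version so that neighbors can detect it."""
        version = self.state.get(key, (0, None))[0] + 1
        self.state[key] = (version, value)

    def merge(self, received):
        """Merge a neighbor's state; the entry with the higher version wins."""
        for key, (version, value) in received.items():
            if version > self.state.get(key, (0, None))[0]:
                self.state[key] = (version, value)


def run_round(nodes):
    """One gossip round: every node broadcasts (Tx), then merges (Rx)."""
    # Tx: snapshot all states first, so information travels one hop per round.
    broadcasts = {node.name: dict(node.state) for node in nodes}
    # Rx: each node merges what its neighbors broadcast.
    for node in nodes:
        for neighbor in node.neighbors:
            node.merge(broadcasts[neighbor])


# A line topology A - B - C: an update made at A needs two rounds to reach C.
a, b, c = Node("A", ["B"]), Node("B", ["A", "C"]), Node("C", ["B"])
a.update("top", "G")
run_round([a, b, c])
print("top" in b.state, "top" in c.state)  # True False
run_round([a, b, c])
print("top" in c.state)                    # True
```

The two-round delay in this toy topology illustrates why it is hard to control when a piece of information reaches a particular node.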

slide-17
SLIDE 17

Observation

  • Gossiping can be efficient
slide-18
SLIDE 18

Observation

  • Gossiping can be efficient, but...
  • … it makes it difficult to control when a bit of information reaches a particular node.

slide-19
SLIDE 19

Observation

  • Gossiping can be efficient, but...
  • … it makes it difficult to control when a bit of information reaches a particular node.

  • We have:
  • Updates done by nodes
  • Lazy update propagation
slide-20
SLIDE 20

Observation

  • Gossiping can be efficient, but...
  • … it makes it difficult to control when a bit of information reaches a particular node.

  • We have:
  • Updates done by nodes
  • Lazy update propagation

Resemblance to eventually-consistent replicated storage systems.

slide-21
SLIDE 21

Observation

  • Gossiping can be efficient, but...
  • … it makes it difficult to control when a bit of information reaches a particular node.

  • We have:
  • Updates done by nodes
  • Lazy update propagation

Resemblance to eventually-consistent replicated storage systems.

Let's thus take a look at the problem of cluster hierarchy maintenance from the eventual consistency perspective.

slide-22
SLIDE 22

EC perspective

  • We treat the cluster hierarchy as a distributed structure.
  • The state of each node (its label and routing table) is a part of this structure.
  • Each node can autonomously update its local state:
    – Locally altering the structure.
  • The updates propagate through gossiping:
    – Eventually the structure becomes consistent globally.
slide-23
SLIDE 23

EC perspective

P.L.G L.L.G I.H.G B.G.G Q.G.G G.G.G K.G.G C.G.G R.G.G E.G.G D.D.G J.D.G H.H.G F.H.G O.H.G M.H.G N.H.G A.L.G

Consider node labels.

slide-24
SLIDE 24

EC perspective

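[Figure: a tree with root G; its children are the level-1 clusters L, G, D and H; the leaves are the nodes A, L, P (under L), B, C, E, G, K, Q, R (under G), D, J (under D), and F, H, I, M, N, O (under H)]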

They can be viewed as a distributed tree.
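
To make the tree view concrete, the sketch below rebuilds the logical tree from the node labels shown on the slides. The `build_tree` helper and its dictionary representation are assumptions for illustration only; in the real system no single node holds this global view.

```python
from collections import defaultdict

# Node labels as they appear on the cluster-hierarchy slides.
LABELS = [
    "P.L.G", "L.L.G", "I.H.G", "B.G.G", "Q.G.G", "G.G.G",
    "K.G.G", "C.G.G", "R.G.G", "E.G.G", "D.D.G", "J.D.G",
    "H.H.G", "F.H.G", "O.H.G", "M.H.G", "N.H.G", "A.L.G",
]


def build_tree(labels):
    """Map each (level, head) cluster to the sorted heads of its subclusters."""
    children = defaultdict(set)
    for label in labels:
        heads = label.split(".")
        for level in range(len(heads) - 1):
            # The level-`level` cluster is a child of the level-(level+1) one.
            children[(level + 1, heads[level + 1])].add(heads[level])
    return {cluster: sorted(subs) for cluster, subs in children.items()}


tree = build_tree(LABELS)
print(tree[(2, "G")])  # ['D', 'G', 'H', 'L']
print(tree[(1, "L")])  # ['A', 'L', 'P']
```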

slide-25
SLIDE 25

EC perspective

  • What is different from the “traditional” model?
  • The state of each node is not a replica.
  • On the contrary:

    – Some of its parts are unique.
    – Some are replicated at other nodes (to a varying degree).
  • On the global scale, the states of all nodes should form a coherent structure.
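
The "varying degree" of replication can be illustrated by counting how many nodes store each label element. This is only a sketch over a subset of the labels from the slides; the `replication_degree` helper is a hypothetical name, and routing tables are ignored.

```python
from collections import Counter


def replication_degree(labels):
    """Count how many nodes store each (level, head) element of a label."""
    counts = Counter()
    for label in labels:
        for level, head in enumerate(label.split(".")):
            counts[(level, head)] += 1
    return counts


# A subset of the labels from the slides: clusters L and D under root G.
degree = replication_degree(["P.L.G", "L.L.G", "A.L.G", "D.D.G", "J.D.G"])
print(degree[(0, "P")])  # 1: unique to node P
print(degree[(1, "L")])  # 3: replicated at the members of cluster L
print(degree[(2, "G")])  # 5: replicated at every node
```

The level-0 element is unique to its node, while elements for higher levels are held by progressively more nodes.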

slide-26
SLIDE 26

EC perspective

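Logical view:

[Figure: a tree with root G; its children are the level-1 clusters L, G, D and H; the leaves are the nodes A, L, P (under L), B, C, E, G, K, Q, R (under G), D, J (under D), and F, H, I, M, N, O (under H)]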

slide-27
SLIDE 27

EC perspective

Physical view:

[Figure: the tree as stored at the nodes; each node holds its own copy of every element of its label, so the level-1 elements L, G, D, H and the top-level element G appear once per member node]

slide-28
SLIDE 28

EC perspective

Physical view:

[Figure: the tree as stored at the nodes; each node holds its own copy of every element of its label]

Annotation: unique information.

slide-29
SLIDE 29

EC perspective

Physical view:

[Figure: the tree as stored at the nodes; each node holds its own copy of every element of its label]

Annotations: unique information; replicated information.

slide-30
SLIDE 30

EC perspective

Physical view:

[Figure: the tree as stored at the nodes; each node holds its own copy of every element of its label]

Annotations: unique information; replicated information. When replicated information changes at one node, the other nodes must update their state accordingly.

slide-31
SLIDE 31

EC perspective

Physical view:

[Figure: the tree as stored at the nodes; each node holds its own copy of every element of its label]

Annotations: unique information; replicated information. When replicated information changes at one node, the other nodes must update their state accordingly, but:

  • Updates can be concurrent
  • They are often not independent
  • They propagate lazily
  • (Think also about all the limitations of the nodes)

slide-32
SLIDE 32

EC-related challenges

  • How to decide that a given piece of the distributed structure should be updated?
  • How should such updates be performed, and which node(s) should perform them?
  • How can other nodes detect and merge the updates to their corresponding pieces of the distributed structure?

  • (How to do this under constrained resources?)
slide-33
SLIDE 33

Our solution

  • Details, for instance, in:
    – K. Iwanicki and M. van Steen. “Gossip-based self-management of a recursive area hierarchy for large wireless sensornets.” IEEE Transactions on Parallel and Distributed Systems, 21(4):562–576, April 2010.
    or
    – K. Iwanicki. “Hierarchical Routing in Low-Power Wireless Networks.” PhD thesis, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands, June 2010.

slide-34
SLIDE 34

Our solution – overview

  • Formalize the properties of a hierarchy as invariants, e.g.:
    1. Level-0 clusters correspond to individual nodes.
    2. There exists a single top-level cluster.
    3. Level-(i+1) clusters are composed of level-i clusters.
    4. Each level-(i+1) cluster has a central subcluster that is adjacent to all other subclusters of the cluster.
  • Maintaining a hierarchy = detecting and eliminating violations of the invariants.
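
As a rough illustration of "detecting violations", the sketch below checks invariants 1 to 3 over a complete set of labels. It rests on several assumptions: labels are dot-separated strings, invariant 3 is read as "every level-i cluster belongs to exactly one level-(i+1) cluster", and the check is done offline with global knowledge. In the actual system each node would only check the invariants relevant to its own state, and invariant 4 needs adjacency information that is not modeled here.

```python
def check_invariants(labels):
    """Return a list of violations found in a dict: node id -> label."""
    violations = []
    paths = {node: label.split(".") for node, label in labels.items()}

    # Invariant 1: each node heads its own level-0 cluster.
    for node, path in paths.items():
        if path[0] != node:
            violations.append(f"invariant 1: {node} does not head its level-0 cluster")

    # Invariant 2: all labels end in the same top-level cluster at the same level.
    tops = {(len(path), path[-1]) for path in paths.values()}
    if len(tops) > 1:
        violations.append("invariant 2: more than one top-level cluster")

    # Invariant 3 (as interpreted above): a level-i cluster has one parent.
    parent = {}
    reported = set()
    for path in paths.values():
        for level in range(len(path) - 1):
            cluster = (level, path[level])
            if parent.setdefault(cluster, path[level + 1]) != path[level + 1]:
                if cluster not in reported:
                    reported.add(cluster)
                    violations.append(
                        f"invariant 3: level-{level} cluster {path[level]} "
                        f"has two level-{level + 1} parents"
                    )
    return violations


print(check_invariants({"P": "P.L.G", "L": "L.L.G", "A": "A.L.G"}))  # []
print(check_invariants({"P": "P.L.G", "A": "A.L.D"}))
```

The second call reports both an invariant 2 and an invariant 3 violation, because the level-1 cluster L is claimed by two different top-level clusters.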

slide-35
SLIDE 35

Our solution – overview

  • The invariants are global
    ➔ They have to be maintained collaboratively by the nodes.
  • A node's state is local
    ➔ Each node is concerned only with those invariants that are relevant to its part of the distributed structure.
  • Eliminating a violation = local update operation
  • Propagating the update = eventually-consistent gossiping.

slide-36
SLIDE 36

Our solution - example

[Figure: diagrams of clusters X, Y and Ci illustrating the two operations]

Local operations for maintaining labels: label extension and label cut.
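
The slides name the two operations without defining them, so the following is only a guess at their simplest form on string labels: extension appends the head of a next-level cluster, and a cut truncates the label above a given level. The function names and signatures are hypothetical, and the conditions under which a node may apply them are not modeled.

```python
def extend_label(label: str, head: str) -> str:
    """Label extension (assumed form): join a next-level cluster headed by `head`."""
    return f"{label}.{head}"


def cut_label(label: str, level: int) -> str:
    """Label cut (assumed form): drop all elements above `level`."""
    heads = label.split(".")
    if not 0 <= level < len(heads):
        raise ValueError(f"level {level} is outside label {label!r}")
    return ".".join(heads[: level + 1])


print(cut_label("P.L.G", 1))                     # P.L
print(extend_label(cut_label("P.L.G", 1), "D"))  # P.L.D
```

Under these assumptions, moving from one top-level cluster to another is a cut followed by an extension.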

slide-37
SLIDE 37

Our solution - example

  • Our label:
  • P.L.G
slide-38
SLIDE 38

Our solution - example

  • Our label:
  • P.L.G
  • The label received in a gossip message:
  • A.L.D
slide-39
SLIDE 39

Our solution - example

  • Our label:
  • P.L.G
  • The label received in a gossip message:
  • A.L.D
  • What should we do?
slide-40
SLIDE 40

Our solution - example

  • Our label:
  • P.L.G
  • The label received in a gossip message:
  • A.L.D
  • What should we do:
    – leave our label as is, or
    – change it to P.L.D?
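
The slide leaves the question open, and the sketch below does not answer it. It only shows how a node might detect that a received label conflicts with its own: both labels name the same cluster at some level but disagree about the cluster one level up. The `first_conflict` helper is a hypothetical name; which label should win is exactly the policy question the slide raises.

```python
def first_conflict(own: str, received: str):
    """Return the lowest level whose cluster has two different parents, else None."""
    a, b = own.split("."), received.split(".")
    for level in range(min(len(a), len(b)) - 1):
        same_cluster = a[level] == b[level]
        different_parent = a[level + 1] != b[level + 1]
        if same_cluster and different_parent:
            return level
    return None


print(first_conflict("P.L.G", "A.L.D"))  # 1: cluster L is claimed by both G and D
print(first_conflict("P.L.G", "A.L.G"))  # None: the labels are consistent
```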
slide-41
SLIDE 41

Take-home message

  • Eventual consistency can offer lots of benefits to extreme distributed systems.
  • Distributed data structures also appear in other fields.
  • Eventually-consistent distributed data structures are poorly understood.

slide-42
SLIDE 42

Thank You

Questions?

Supported by the (Polish) National Science Centre (NCN) within the SONATA programme under grant no. DEC-2012/05/D/ST6/03582.