
slide-1
SLIDE 1

Data Management in Distributed Systems

Simon Schäffner Advisor: Stefan Liebald Technische Universität München Fakultät für Informatik Lehrstuhl für Netzarchitekturen und Netzdienste Garching, 13. Juli 2018

slide-2
SLIDE 2

Simon Schäffner

Distributed Systems

2

slide-4
SLIDE 4

“A collection of autonomous computing elements that appear to its users as a single coherent system” [1]

Simon Schäffner

Distributed Systems

2

Geographical Dispersion?

slide-5
SLIDE 5

Simon Schäffner

Input / output for most tasks
More complex than reading from / writing to an HDD
How is data organised? What data is stored on which nodes?

Data Management in Distributed Systems

3

slide-11
SLIDE 11

Simon Schäffner

Scalability Performance Consistency Redundancy Overhead Attack Resistance

Attributes of Data Management Strategies

9

slide-12
SLIDE 12

Simon Schäffner

Comparison of Data Management Strategies

10

slide-13
SLIDE 13

Simon Schäffner

Comparison of Data Management Strategies

11

slide-14
SLIDE 14

Simon Schäffner

Started out with Napster and Gnutella
Legally controversial usage
Interesting for completely legal use as well

Peer-To-Peer Filesharing

12

slide-15
SLIDE 15

Simon Schäffner

BitTorrent

13

Web Server

slide-16
SLIDE 16

Web Server .torrent

Simon Schäffner

BitTorrent

14

slide-17
SLIDE 17

.torrent

Simon Schäffner

BitTorrent

15

Web Server

slide-18
SLIDE 18

Simon Schäffner

BitTorrent

16

Web Server

{
  announce: http://bttracker.debian.org:6969/announce,
  comment: "Debian CD from cdimage.debian.org",
  creation date: 1520682848,
  httpseeds: [
    https://cdimage.debian.org/cdimage/release/9.4.0//srv/cdbuilder.debian.org/dst/deb-cd/weekly-builds/amd64/iso-cd/debian-9.4.0-amd64-netinst.iso
  ],
  info: {
    length: 305135616,
    name: debian-9.4.0-amd64-netinst.iso,
    piece length: 262144,
    …
  }
}
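The fields in this metadata file fit together arithmetically. As a quick sketch (plain Python, not a real torrent parser), the number of pieces follows directly from `length` and `piece length`:

```python
# Sketch: deriving the piece count from the .torrent metadata shown above.
# A real .torrent additionally carries one 20-byte SHA-1 hash per piece
# in the "pieces" field of the info dictionary.
import math

info = {
    "length": 305135616,     # total file size in bytes
    "piece length": 262144,  # size of one piece (256 KiB)
}

num_pieces = math.ceil(info["length"] / info["piece length"])
print(num_pieces)  # 1164
```

Here 305135616 happens to be an exact multiple of 262144, so the final piece is full-sized.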

slide-19
SLIDE 19

.torrent

Simon Schäffner

BitTorrent

17

Tracker

.torrent

http://bttracker.debian.org:6969/announce

.torrent

Origin

slide-24
SLIDE 24

Simon Schäffner

BitTorrent: Scalability

20

Source: [4]

slide-29
SLIDE 29

Scalability

  • Scales well with large #nodes

Performance

  • Good upload utilisation

Consistency

  • Content does not change
  • Checksums in metadata file

Redundancy

  • Max. redundancy possible

Overhead

  • Metadata file
  • 1/1000 of traffic to tracker [5]
  • Metainformation sent to other nodes

25

BitTorrent: Attributes (1)
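The "checksums in metadata file" point can be sketched as follows: a client hashes each downloaded piece with SHA-1 and compares the digest against the checksum from the .torrent file. The function name `verify_piece` is illustrative, not part of any real client's API:

```python
# Sketch of BitTorrent's consistency check: a piece is accepted only if
# its SHA-1 digest matches the 20-byte hash stored in the .torrent file.
import hashlib

def verify_piece(piece_data: bytes, expected_sha1: bytes) -> bool:
    """Return True iff the downloaded piece matches its metadata checksum."""
    return hashlib.sha1(piece_data).digest() == expected_sha1

# The expected hash would normally come from the .torrent metadata.
piece = b"some downloaded piece"
good_hash = hashlib.sha1(piece).digest()
print(verify_piece(piece, good_hash))               # True
print(verify_piece(b"corrupted data", good_hash))   # False
```

Corrupt pieces are simply re-requested from another peer, which is why a complete yet corrupt download is very unlikely.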

slide-30
SLIDE 30

Attack Resistance

  • Download of corrupt file very unlikely
  • Poisoning attacks possible

− Uploading large numbers of fake files / malware
− Flooding all peers with download requests

26

BitTorrent: Attributes (2)

slide-31
SLIDE 31

Simon Schäffner

BitTorrent

27

Data Update: no data updates
Scalability: ++ (upload utilisation independent of #nodes)
Performance: + (very good upload utilisation)
Consistency: no data updates; corrupt data identified
Redundancy: + (high)
Overhead: high, but aligns with goal
Attack Resistance: poisoning attacks possible

slide-32
SLIDE 32

Peer-to-peer distributed hash table (DHT)
Identifiers: 160-bit (both keys and node IDs)
Keys stored on "close" nodes

Simon Schäffner

Kademlia

28

d(x, y) = x ⊕ y
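The XOR metric above can be sketched in a few lines; the node IDs below are made-up toy values rather than real 160-bit identifiers:

```python
# Minimal sketch of Kademlia's XOR metric (not the full protocol):
# keys and node IDs share one ID space, and the distance between two
# identifiers is their bitwise XOR interpreted as an unsigned integer.
def xor_distance(x: int, y: int) -> int:
    """d(x, y) = x XOR y."""
    return x ^ y

# A key is stored on the nodes "closest" to it under this metric.
node_ids = [0b1000, 0b1011, 0b0010, 0b0111]
key = 0b1010
closest = min(node_ids, key=lambda n: xor_distance(n, key))
print(bin(closest))  # 0b1011 (distance 0b0001)
```

Note that XOR is a genuine metric here: d(x, x) = 0, it is symmetric, and it satisfies the triangle inequality, which is what makes the "closeness" notion well-defined.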

slide-33
SLIDE 33

Simon Schäffner

Kademlia

29

Data Update: active republication
Scalability: + (storage scales linearly with #nodes; lookup time scales with O(log_2 n))
Performance: ++ (parallel redundant requests, efficient caching)
Consistency: +/0 (deleted information remains in the system for max. 24 h)
Redundancy: + (<key, value> pairs stored on >= k nodes)
Overhead: optimized process republishing <key, value> pairs every hour
Attack Resistance: + (protection against node flooding)

slide-34
SLIDE 34

Simon Schäffner

First idea: proxies for caching static content
Internet now very dynamic; hit rates for proxies are low (25-40%) [9]
Content Delivery Networks (CDNs) are more elaborate

Content Delivery Networks

30

slide-35
SLIDE 35

Simon Schäffner

Content Delivery Networks

31

Source: [10]

slide-36
SLIDE 36

Simon Schäffner

>240,000 servers in >130 countries, within >1,700 networks [10]
Handles flash crowds by allocating more servers to the sites that currently need them
A nearby server gives low latency and low packet loss

Akamai

32

slide-37
SLIDE 37

Simon Schäffner

Akamai: Tiered Distribution

33

Source: [11]

slide-40
SLIDE 40

Akamai: Attributes (1)

Scalability

  • Large amount of data available at high speed within network

Performance

  • Overlay network (speed improvements up to 30-50%)
  • Border Gateway Protocol routes are sometimes not optimal
  • Increased reliability by offering alternate routes
  • Reduced packet loss by sending packets through parallel routes
  • Forward error correction techniques
  • Transport protocol optimisations over TCP
  • Application-level optimisations (content compression, application logic on edge servers)

Consistency

  • Standard techniques for caching (TTLs, versioned URLs)

36
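The "versioned URLs" consistency technique mentioned above can be sketched as follows; the URL scheme and helper function are assumptions for illustration, not Akamai's actual format:

```python
# Sketch of cache consistency via versioned URLs: embed a content hash
# in the URL, so updated content gets a new URL and cached copies of the
# old URL can never be served stale. Edge caches may then use long TTLs.
import hashlib

def versioned_url(base: str, path: str, content: bytes) -> str:
    """Build a URL that changes whenever the content changes."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    return f"{base}/{digest}/{path}"

u1 = versioned_url("https://cdn.example.com", "app.js", b"version 1")
u2 = versioned_url("https://cdn.example.com", "app.js", b"version 2")
print(u1)
print(u2)  # different content -> different URL
```

The alternative named on the slide, plain TTLs, trades freshness for cache hit rate instead.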

slide-43
SLIDE 43

Akamai: Attributes (2)

Redundancy

  • Dependent on customer’s needs
  • Tiered distribution provides balance between redundancy and fast availability

Overhead

  • Proprietary overlay network
  • Claim to have improvements over TCP (reduced setup & teardown time per connection)

Attack Resistance

  • No attackers within the closed network
  • Engineered to tolerate high failure rates of network and equipment

39

slide-44
SLIDE 44

Highly distributed nature is fundamental to high performance
The entire communication within the overlay network is optimised; the two small hops on either end should not matter

Simon Schäffner

Akamai

40

Data Update: dep. on customer
Scalability: ++ (>240,000 nodes)
Performance: ++ (optimised transfer speed in network)
Consistency: based on TTL / versioned URLs, but dep. on customer
Redundancy: ++ (tiered distribution allows for any degree)
Overhead: improvements over TCP, dep. on customer
Attack Resistance: ++ (only attacks from outside possible, high recovery rate)

slide-45
SLIDE 45

Simon Schäffner

Sharding: partition data over many servers
Redundancy
Online Transaction Processing (OLTP): each query may be handled by a different node
Online Analytical Processing (OLAP): the network can work together on a single query

Distributed Databases

41
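Sharding can be sketched as a hash of the key choosing the server. This naive hash-modulo placement is an assumption for illustration; production systems usually prefer consistent hashing so that adding or removing a node moves less data:

```python
# Sketch of sharding: each key is deterministically assigned to one of
# several servers by hashing. Every client computes the same placement,
# so no central directory is needed.
import hashlib

NODES = ["node-a", "node-b", "node-c"]

def node_for_key(key: str) -> str:
    """Map a key to its owning node via hash modulo #nodes."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[h % len(NODES)]

for k in ["user:17", "user:18", "order:99"]:
    print(k, "->", node_for_key(k))
```

In OLTP terms, each such key lookup can be served by its single owning node; OLAP-style queries would instead fan out across all shards and merge the results.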

slide-46
SLIDE 46

Simon Schäffner

Document storage NoSQL database

  • Non-relational
  • "Not only SQL" (SQL-like query languages supported)
  • Usually worse consistency, better

Documents: any number of fields and attachments
Documents are versioned
Use case: multiple offline clients, synchronising upon reconnection

CouchDB

42

slide-48
SLIDE 48

CouchDB: Attributes (1)

Scalability

  • Data lookup restricted to keys
  • Data can be partitioned, but nodes can still be queried individually

Performance

  • Storage on nodes: B-tree

− Lookup by key: O(log N)
− Lookup by key range: O(log N + K)

  • Multiversion Concurrency Control (MVCC)

− Queries see the state of the database at the beginning of the query for their whole lifetime
− Queries can run fully in parallel (as long as they do not write to the same dataset)

44
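The MVCC behaviour described above can be sketched with a toy snapshot store; this only illustrates the semantics, not CouchDB's actual B-tree storage engine:

```python
# Sketch of MVCC: every write produces a new immutable snapshot, and a
# query keeps reading the snapshot that was current when it started, so
# readers never block writers and never see a half-finished update.
class MVCCStore:
    def __init__(self):
        self.versions = [{}]            # list of immutable snapshots

    def write(self, key, value):
        snap = dict(self.versions[-1])  # copy-on-write of the latest state
        snap[key] = value
        self.versions.append(snap)      # readers of old snapshots unaffected

    def snapshot(self):
        return self.versions[-1]        # a query holds on to this dict

store = MVCCStore()
store.write("doc1", "v1")
reader_view = store.snapshot()          # a long-running query starts here
store.write("doc1", "v2")               # concurrent write
print(reader_view["doc1"])              # v1: the query still sees its snapshot
print(store.snapshot()["doc1"])         # v2: new queries see the new state
```

This is why reading queries can run fully in parallel: each works against its own frozen view.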

slide-50
SLIDE 50

CouchDB: Attributes (2)

Consistency

  • Every node is ACID compliant

− Atomicity
− Consistency
− Isolation
− Durability

  • Edit conflict triggered when two clients try to edit the same document
  • Edit conflict also triggered when an offline client reconnects and a document was edited on both node and network

− Each node deterministically decides which version wins
− Losing versions are still stored and replicated
− Every node sees the conflict and can resolve it

Redundancy

  • Fully configurable: able to run on a single node; full replication supported; partitioning supported

46
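A sketch of deterministic conflict resolution in this spirit; the (revision number, revision id) ordering used here is an illustrative stand-in, not CouchDB's exact rule:

```python
# Sketch: every replica applies the same total ordering over conflicting
# revisions, so all nodes independently pick the same winner without any
# coordination. Losing revisions are kept, not discarded.
def pick_winner(revisions):
    """Highest revision number wins; ties broken lexically by revision id."""
    return max(revisions, key=lambda rev: (rev["num"], rev["id"]))

conflict = [
    {"num": 2, "id": "a9f", "body": {"title": "draft 2 (laptop)"}},
    {"num": 2, "id": "c41", "body": {"title": "draft 2 (phone)"}},
]
winner = pick_winner(conflict)
print(winner["id"])  # c41 on every node; "a9f" survives as a losing revision
```

Because the rule is a pure function of the revisions themselves, a node that was offline reaches the same verdict as the rest of the network once it has replicated the conflicting versions.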

SLIDE 52

CouchDB: Attributes (3)

Overhead

  • Depends on replication

Attack Resistance

  • Simple authentication model by default
  • Allows for custom JavaScript function for update validation

48

slide-53
SLIDE 53

Simon Schäffner

CouchDB

49

Data Update: active
Scalability: ++ (data lookup by key only; data can be partitioned over many nodes)
Performance: ++ (key lookup locally O(log N); MVCC for parallel reading queries)
Consistency: ++ (eventual consistency; graceful conflict handling)
Redundancy: ++ (dep. on use-case)
Overhead: + (dep. on redundancy needed)
Attack Resistance: + (simple auth. model by default, can be customised)

slide-54
SLIDE 54

Simon Schäffner

Summary: Performance

50

slide-55
SLIDE 55

Simon Schäffner

Summary: Data Update

51

slide-56
SLIDE 56

Simon Schäffner

Summary: Data Allocation

52

slide-57
SLIDE 57

Simon Schäffner

Summary: Conflict Handling

53

slide-58
SLIDE 58

Simon Schäffner

Conclusion

54

Trend: active replication

  • More network overhead
  • Decreases the time the system is inconsistent

Currently all systems are popular (except for Zatara)
The number of distributed systems will probably only increase

slide-59
SLIDE 59

Simon Schäffner

[1] M. van Steen and A. S. Tanenbaum. A brief introduction to distributed systems. Computing, 98(10):967–1009, Oct 2016.
[2] J. Pouwelse, P. Garbacki, D. Epema, and H. Sips. The BitTorrent P2P file-sharing system: Measurements and analysis. In M. Castro and R. van Renesse, editors, Peer-to-Peer Systems IV, pages 205–216, Berlin, Heidelberg, 2005. Springer Berlin Heidelberg.
[3] Z. Zhang, Y. Li, Y. Chen, P. Cao, B. Deng, and X. Li. Understand the unfairness of BitTorrent. Dec 2010.
[4] A. Bharambe, C. Herley, and V. N. Padmanabhan. Analyzing and improving BitTorrent performance. Jan 2006.
[5] B. Cohen. Incentives build robustness in BitTorrent. http://www.bittorrent.org/bittorrentecon.pdf, May 2003. Last accessed: 27.05.2018 14:51.

Bibliography

55

slide-60
SLIDE 60

Simon Schäffner

[6] P. Maymounkov and D. Mazières. Kademlia: A peer-to-peer information system based on the XOR metric. In P. Druschel, F. Kaashoek, and A. Rowstron, editors, Peer-to-Peer Systems, pages 53–65, Berlin, Heidelberg, 2002. Springer Berlin Heidelberg.
[7] A. Loewenstern and A. Norberg. DHT protocol. http://www.bittorrent.org/beps/bep_0005.html, Jan 2008. Last accessed: 02.06.2018 15:11.
[8] vbuterin and J. Ray. Kademlia peer selection. https://github.com/ethereum/wiki/wiki/Kademlia-Peer-Selection, Oct 2015. Last accessed: 02.06.2018 15:13 (revision ea47c31).
[9] J. Dilley, B. Maggs, J. Parikh, H. Prokop, R. Sitaraman, and B. Weihl. Globally distributed content delivery. IEEE Internet Computing, 6(5):50–58, Sep 2002.
[10] Facts & figures. https://www.akamai.com/us/en/about/facts-figures.jsp. Last accessed: 27.05.2018 20:47.

Bibliography

56

slide-61
SLIDE 61

Simon Schäffner

[11] E. Nygren, R. K. Sitaraman, and J. Sun. The Akamai network: A platform for high-performance internet applications. SIGOPS Oper. Syst. Rev., 44(3):2–19, Aug 2010.
[12] B. Carstoiu and D. Carstoiu. High performance eventually consistent distributed database Zatara. In INC2010: 6th International Conference on Networked Computing, pages 1–6, May 2010.

Bibliography

57

slide-62
SLIDE 62

Any Questions?

Simon Schäffner Garching, 13. Juli 2018

slide-63
SLIDE 63

Thank you for your attention

Simon Schäffner Garching, 13. Juli 2018

slide-64
SLIDE 64

Simon Schäffner

Kademlia

60

d(x, y) = x ⊕ y

Source: [6]

slide-65
SLIDE 65

Simon Schäffner 61

d(x, y) = x ⊕ y

Source: [6]

Kademlia: Finding Another Node


slide-75
SLIDE 75

Kademlia: Attributes

Scalability

  • Storage scales linearly with #nodes
  • #nodes to contact for lookup of a value scales with O(log_2(n))

Performance

  • Parallel, redundant requests
  • Caching reduces likelihood of hotspots

Consistency

  • Original publisher has to republish every 24h
  • Each of the k nodes the <key,value> pair is stored on has to republish every 1h
  • Caching: TTL inversely proportional to the number of nodes between the caching node and the key's closest node

Redundancy

  • Every <key,value> pair is stored on k nodes

69

d(x, y) = x ⊕ y

slide-76
SLIDE 76

Kademlia: Attributes

Overhead

  • Optimised republishing (a node that receives a republish does not itself republish within the next hour)
  • Caching limited to short timespans

Attack Resistance

  • Protected against node flooding

70

slide-77
SLIDE 77

Simon Schäffner

Kademlia: Caching

71

Source: [6]

Node with value STORE

slide-78
SLIDE 78

Kademlia

Now used in

  • BitTorrent [7]
  • Ethereum [8]

72

slide-79
SLIDE 79

Simon Schäffner

Kademlia

73

Source: [7]

slide-80
SLIDE 80

Simon Schäffner

Kademlia

74

Source: [8]

slide-81
SLIDE 81

Simon Schäffner

First general use-case NoSQL database
Eventually consistent
Built to satisfy the needs of modern cloud applications

Zatara

75

slide-82
SLIDE 82

Simon Schäffner

Key types

  • Cache only: stored in memory on a single node, not replicated
  • Persistent: eventually consistent, stored on disk, replicated

Each key is mapped to a single node by a hashing algorithm
The client connects directly to that node to store the key

Zatara: Key Management

76
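A hedged sketch of this key-to-node mapping; the node names, the hash choice, and the pick-the-next-node replica rule are all invented for illustration and are not Zatara's actual scheme:

```python
# Sketch of Zatara-style key placement: a hash maps each key to exactly
# one home node, and the client connects to that node directly -- an
# O(1) lookup with no routing hops. Persistent keys are also replicated;
# cache-only keys live in memory on their home node alone.
from dataclasses import dataclass
import hashlib

NODES = ["node-1", "node-2", "node-3"]

@dataclass
class Key:
    name: str
    persistent: bool  # persistent keys are replicated; cache-only are not

def home_node(key: Key) -> str:
    """Hash the key name to pick the single node the client talks to."""
    h = int(hashlib.sha1(key.name.encode()).hexdigest(), 16)
    return NODES[h % len(NODES)]

def replica_nodes(key: Key) -> list:
    home = home_node(key)
    if not key.persistent:
        return [home]                           # cache-only: one node, in memory
    i = NODES.index(home)
    return [home, NODES[(i + 1) % len(NODES)]]  # persistent: stored on >= 2 nodes

print(replica_nodes(Key("session:42", persistent=False)))
print(replica_nodes(Key("user:42", persistent=True)))
```

Since the placement is a pure function of the key, any client can compute the home node locally and skip a directory lookup.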

slide-83
SLIDE 83

Simon Schäffner

Zatara: Node Management

77

Source: [12]

slide-84
SLIDE 84

Zatara: Attributes

Scalability

  • Regional, asynchronous replication

Performance

  • Keys cached in memory
  • Higher TTL for more popular keys
  • Evaluation on 196 Amazon EC2 instances [12]

78

[Chart: operations/sec (32,500–130,000) for SETKEY, GETKEY, PUSHRKEY, and PUSHLKEY; 196 nodes in 196 groups vs. 196 nodes in 98 groups]

slide-85
SLIDE 85

Zatara: Attributes

Consistency

  • Eventual consistency: guaranteed consistency after consistency window has closed

Redundancy

  • Key stored on >=2 nodes

Overhead

  • Asynchronous replication reduces overhead (group sizes of 2-4 recommended)

Attack Resistance

  • Simple authentication mechanism (only trusted clients should have access)

79

slide-86
SLIDE 86

Simon Schäffner

Zatara

80

Data Update: active
Scalability: ++ (asynchronous data replication; large #groups possible; O(1) key lookup)
Performance: ++ (in-memory caching, least recently used)
Consistency: + (eventual consistency)
Redundancy: + (keys guaranteed to be stored on >= 2 nodes)
Overhead: regional replication
Attack Resistance: simple authentication