CS535 Big Data 2/19/2020 Week 5-B Sangmi Lee Pallickara http://www.cs.colostate.edu/~cs535 Spring 2020 Colorado State University, page 1

CS535 BIG DATA

PART B. GEAR SESSIONS

SESSION 1: PETA-SCALE STORAGE SYSTEMS

Sangmi Lee Pallickara Computer Science, Colorado State University http://www.cs.colostate.edu/~cs535

Google had 2.5 million servers in 2016

FAQs

  • Quiz #2
  • 2/21 ~ 2/23
  • Spark and Storm
  • 10 questions
  • 30 minutes
  • Answers will be available at 9PM 2/24

CS535 Big Data | Computer Science | Colorado State University


Topics of Today's Class

  • GEAR Session I. Peta Scale Storage Systems
  • Lecture 2.
  • GFS I and II
  • Cassandra


GEAR Session 1. Peta-scale Storage Systems


GEAR Session 1. peta-scale storage systems

Lecture 2. Google File System and Hadoop Distributed File System

  • 3. Relaxed Consistency


Two breaks in the communication lines

[Figure: a communication network linking Boston, Chicago, Miami, LA, London, Paris, Rome, and Sydney, with two broken links]

  • A single machine can't partition, so it does not have to worry about partition tolerance
  • There is only one node: if it's up, it's available


Eventually consistent

  • At any time nodes may have replication inconsistencies
  • If there are no more updates (or updates can be ordered), eventually all nodes will be updated to the same value
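A minimal sketch of eventual convergence. Assumptions not on the slide: every update carries a timestamp that orders it, replicas resolve conflicts with last-writer-wins, and a hypothetical `anti_entropy` step exchanges entries between pairs of replicas. Once updates stop, a few exchanges leave every replica with the same value.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    # key -> (timestamp, value); the timestamp gives updates an order
    store: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        self.merge_entry(key, (ts, value))

    def merge_entry(self, key, entry):
        # Last-writer-wins: keep the entry with the larger timestamp
        current = self.store.get(key)
        if current is None or entry[0] > current[0]:
            self.store[key] = entry

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def anti_entropy(a, b):
    """Hypothetical sync step: exchange all entries in both directions."""
    for key, entry in list(a.store.items()):
        b.merge_entry(key, entry)
    for key, entry in list(b.store.items()):
        a.merge_entry(key, entry)


replicas = [Replica("r1"), Replica("r2"), Replica("r3")]
# Updates land on different replicas: temporarily inconsistent
replicas[0].write("x", "v1", ts=1)
replicas[2].write("x", "v2", ts=2)
print([r.read("x") for r in replicas])  # ['v1', None, 'v2']

# No more updates: pairwise syncs make every replica agree
anti_entropy(replicas[0], replicas[1])
anti_entropy(replicas[1], replicas[2])
anti_entropy(replicas[0], replicas[1])
print([r.read("x") for r in replicas])  # ['v2', 'v2', 'v2']
```

Equal timestamps are not handled; a real system would need a tie-breaker.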


GFS has a relaxed consistency model

  • Consistent: all clients see the same data, regardless of which replica they read from
  • Defined: the region is consistent AND clients see what the mutation wrote in its entirety
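The two definitions can be encoded in a small hypothetical helper: it compares a file region's bytes across replicas (consistent?) and against the bytes a mutation intended to write (defined?). This is only an illustration of the definitions, not something a GFS client computes.

```python
def classify_region(replica_bytes, intended):
    """Classify a file region using the GFS definitions.

    replica_bytes: the region's contents on each replica
    intended: the bytes the mutation meant to write there
    """
    consistent = all(b == replica_bytes[0] for b in replica_bytes)
    if not consistent:
        return "inconsistent"              # replicas disagree
    if replica_bytes[0] == intended:
        return "defined"                   # consistent and shows the whole mutation
    return "consistent but undefined"      # same bytes everywhere, but mingled writes


# Writer A wrote b"AAAA"; in the first case a concurrent writer's fragment interleaved
print(classify_region([b"AABB", b"AABB", b"AABB"], b"AAAA"))  # consistent but undefined
print(classify_region([b"AAAA", b"AAAA", b"AAAA"], b"AAAA"))  # defined
print(classify_region([b"AAAA", b"AABB", b"AAAA"], b"AAAA"))  # inconsistent
```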


Inconsistent and undefined

[Figure: replica contents after Operation A and Operation B]


Consistent but undefined

[Figure: replica contents after Operation A and Operation B]


Defined

[Figure: replica contents after Operation A and Operation B]


File state region after a mutation

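|                      | Write                    | Record Append                          |
|----------------------|--------------------------|----------------------------------------|
| Serial success       | Defined                  | Defined interspersed with inconsistent |
| Concurrent successes | Consistent but undefined | Defined interspersed with inconsistent |
| Failure              | Inconsistent             | Inconsistent                           |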


GEAR Session 1. peta-scale storage systems

Lecture 2. Google File System and Hadoop Distributed File System

  • 4. Handling write and append to a file


GFS uses leases to maintain a consistent mutation order across replicas

  • The master grants a chunk lease to one of the replicas, called the primary
  • The primary picks a serial order for all mutations to the chunk
  • The other replicas follow this order when applying mutations
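A minimal sketch of serial ordering, assuming forwarded mutations may reach a secondary out of order: the primary (the lease holder) stamps each mutation with the next serial number, and every replica applies mutations strictly in serial-number order, buffering early arrivals. Class and method names are hypothetical.

```python
import heapq


class Replica:
    def __init__(self):
        self.applied = []      # mutations in the order they were applied
        self.pending = []      # min-heap of (serial, mutation) not yet applicable
        self.next_serial = 1

    def receive(self, serial, mutation):
        heapq.heappush(self.pending, (serial, mutation))
        # Apply only the next expected serial number; anything early waits
        while self.pending and self.pending[0][0] == self.next_serial:
            _, m = heapq.heappop(self.pending)
            self.applied.append(m)
            self.next_serial += 1


class Primary(Replica):
    def __init__(self):
        super().__init__()
        self.counter = 0

    def assign(self, mutation):
        # The primary picks the serial order for all mutations to the chunk
        self.counter += 1
        self.receive(self.counter, mutation)
        return self.counter


primary, secondary = Primary(), Replica()
stamped = [(primary.assign(m), m) for m in ["w1", "w2", "w3"]]
# The secondary sees the forwarded mutations in reverse order
for serial, m in reversed(stamped):
    secondary.receive(serial, m)
print(primary.applied, secondary.applied)  # ['w1', 'w2', 'w3'] ['w1', 'w2', 'w3']
```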


The lease mechanism is designed to minimize communication with the master

  • A lease has an initial timeout of 60 seconds
  • As long as the chunk is being mutated, the primary can request and receive extensions
  • Extension requests and grants are piggybacked on heartbeat messages


Revocation and transfer of leases

  • The master may revoke a lease before it expires
  • If communication with the primary is lost, the master can safely grant the lease to another replica
  • But only after the old primary's lease period elapses
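A minimal sketch of the master's lease bookkeeping, assuming the 60-second timeout from the slides and an explicit `now` argument (seconds) so the example is deterministic. Extensions are a separate call here rather than riding on heartbeats, and early revocation is not modelled; `LeaseTable` and its methods are hypothetical names.

```python
LEASE_SECONDS = 60.0  # initial lease timeout from the slide


class LeaseTable:
    def __init__(self):
        self.leases = {}  # chunk id -> (primary replica, expiry time)

    def grant(self, chunk, replica, now):
        holder = self.leases.get(chunk)
        # Never hand out a second lease while the old one may still be in force
        if holder is not None and holder[0] != replica and now < holder[1]:
            return False
        self.leases[chunk] = (replica, now + LEASE_SECONDS)
        return True

    def extend(self, chunk, replica, now):
        holder = self.leases.get(chunk)
        # Only the current, unexpired primary may extend its lease
        if holder is None or holder[0] != replica or now >= holder[1]:
            return False
        self.leases[chunk] = (replica, now + LEASE_SECONDS)
        return True

    def primary(self, chunk, now):
        holder = self.leases.get(chunk)
        return holder[0] if holder is not None and now < holder[1] else None


table = LeaseTable()
table.grant("chunk-7", "A", now=0)
table.extend("chunk-7", "A", now=50)                 # still mutating: lease now ends at 110
too_early = table.grant("chunk-7", "B", now=100)     # A's lease has not elapsed yet
after_expiry = table.grant("chunk-7", "B", now=110)  # safe to move the lease to B
print(too_early, after_expiry)                       # False True
```

Waiting out the old lease is what makes the transfer safe: even if the old primary is alive but unreachable, it stops acting as primary once its lease has expired.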


How a write is actually performed

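[Figure: Client, MASTER, Primary Replica, Secondary Replica A, and Secondary Replica B]

  • 1. The client asks the master which chunkserver holds the current lease for the chunk and the locations of the other replicas
  • 2. The master replies with the identity of the primary and the locations of the other (secondary) replicas
  • 3. The client pushes the data to all the replicas
  • 4. The client sends a write request to the primary
  • 5. The primary forwards the write request to the secondary replicas
  • 6. The secondaries acknowledge to the primary
  • 7. The primary sends the final reply to the client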


Client pushes data to all the replicas [1/2]

  • Each chunkserver stores the pushed data in an LRU buffer until the data is used or aged out
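A minimal sketch of a chunkserver's push buffer. Assumption: a fixed capacity (counted in entries) stands in for "aged out". Pushed data waits, keyed by a hypothetical data id, until a write request consumes it or newer pushes evict it. `collections.OrderedDict` provides the LRU ordering.

```python
from collections import OrderedDict


class PushBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = OrderedDict()  # data id -> bytes, least recently used first

    def push(self, data_id, data):
        self.buf[data_id] = data
        self.buf.move_to_end(data_id)        # most recently used at the end
        while len(self.buf) > self.capacity:
            self.buf.popitem(last=False)     # age out the least recently used entry

    def take(self, data_id):
        # Called when the write request arrives: the data is used and released
        return self.buf.pop(data_id, None)


pb = PushBuffer(capacity=2)
pb.push("d1", b"aaa")
pb.push("d2", b"bbb")
pb.push("d3", b"ccc")                # evicts d1
print(pb.take("d1"), pb.take("d3"))  # None b'ccc'
```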


Client pushes data to all the replicas [2/2]

  • Once all the chunkservers acknowledge receipt of the data, the client sends a write request to the primary
  • The primary assigns consecutive serial numbers to the mutations and forwards the request to the other replicas


Data flow is decoupled from the control flow to use the network efficiently

  • Fully utilize each machine's network bandwidth
  • Avoid network bottlenecks and high-latency links
  • Leverage the network topology: distances are estimated from IP addresses
  • Pipeline the data transfer: once a chunkserver receives some data, it starts forwarding immediately
  • For transferring B bytes to R replicas, the ideal elapsed time is ≈ B/T + RL, where:
  • T is the network throughput
  • L is the latency to transfer bytes between two machines
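The formula can be checked numerically. The sketch assumes a 100 Mbps link, 1 MB of data (10^6 bytes), three replicas, and 0.1 ms per-hop latency; the comparison function, where the client sends the full data to each replica in turn, is an assumed non-pipelined alternative, not something the slide describes.

```python
def pipelined_time(B, T, R, L):
    """Ideal elapsed time to push B bytes down a chain of R replicas.

    B: bytes, T: throughput in bytes/s, R: number of replicas, L: per-hop latency in s.
    Each chunkserver forwards data as soon as it arrives, so the transfer
    costs one B/T plus one latency per hop.
    """
    return B / T + R * L


def client_fanout_time(B, T, R):
    """Assumed alternative: the client sends all B bytes to each replica in turn."""
    return R * B / T


MB = 1_000_000
T = 100e6 / 8            # 100 Mbps link, in bytes per second
print(round(pipelined_time(MB, T, R=3, L=0.0001), 4))   # 0.0803
print(round(client_fanout_time(MB, T, R=3), 4))         # 0.24
```

Pipelining makes the cost of extra replicas grow only with R·L rather than R·B/T.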


Append: Record sizes and fragmentation

  • The maximum record size is restricted to ¼ of the chunk size
  • This minimizes worst-case fragmentation
  • Internal fragmentation in each chunk …
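A minimal sketch of why the ¼ limit bounds fragmentation, using a toy chunk size of 64 units. Assumptions: when a record does not fit, the rest of the chunk is padded and the record goes at offset 0 of a new chunk (in GFS the client retries; here the retry is folded into one call). `AppendFile` is a hypothetical name. Padding per chunk is always smaller than one maximum record, so at most ¼ of a chunk is wasted.

```python
CHUNK_SIZE = 64                # toy chunk size
MAX_RECORD = CHUNK_SIZE // 4   # records are limited to 1/4 of the chunk size


class AppendFile:
    def __init__(self):
        self.chunks = [0]   # bytes used in each chunk
        self.padding = 0    # bytes wasted by padding

    def record_append(self, size):
        if size > MAX_RECORD:
            raise ValueError("record larger than 1/4 of the chunk size")
        used = self.chunks[-1]
        if used + size > CHUNK_SIZE:
            # Record does not fit: pad the rest of this chunk, continue in a new one
            self.padding += CHUNK_SIZE - used
            self.chunks[-1] = CHUNK_SIZE
            self.chunks.append(0)
            used = 0
        self.chunks[-1] = used + size
        return len(self.chunks) - 1, used   # (chunk index, offset chosen by the system)


f = AppendFile()
offsets = [f.record_append(15) for _ in range(5)]
print(offsets)    # [(0, 0), (0, 15), (0, 30), (0, 45), (1, 0)]
print(f.padding)  # 4
```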


Inconsistent Regions

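[Figure: replicas holding Data 1, Data 2, and Data 3; the append of Data 3 fails on one replica (failed/empty region)]

The user will retry storing Data 3

[Figure: after the retry, Data 3 is appended again on every replica, so some replicas hold Data 3 more than once next to the failed region]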


What if a record append fails at one of the replicas?

  • The client must retry the operation
  • Replicas of same chunk may contain
  • Different data
  • Duplicates of the same record
  • In whole or in part
  • Replicas of chunks are not bit-wise identical!
  • In most systems, replicas are identical


GFS only guarantees that the data will be written at least once as an atomic unit

  • For an operation to return success, the data must be written at the same offset on all the replicas
  • After the write, all replicas are at least as long as the end of the record
  • Any future record will be assigned a higher offset or a different chunk
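A toy simulation of at-least-once record append across three replicas, with hypothetical names. Assumptions: each replica is a list with one slot per record, the primary picks the current end of the chunk as the offset, and a failed write leaves a junk slot on that replica. The client's retry then lands at a higher offset on every replica, reproducing the duplicates in the figure.

```python
def record_append(replicas, record, fail_on=None):
    """Append `record` at the same offset on every replica.

    replicas: list of lists (one slot per record); fail_on: index of a replica
    whose write fails. Returns the offset on success, None on failure.
    """
    offset = max(len(r) for r in replicas)  # primary picks the end of the chunk
    ok = True
    for i, r in enumerate(replicas):
        # Pad shorter replicas so the record lands at the same offset everywhere
        r.extend([None] * (offset - len(r)))
        if i == fail_on:
            r.append("<garbage>")           # failed or partial write leaves junk
            ok = False
        else:
            r.append(record)
    return offset if ok else None


replicas = [["Data 1", "Data 2"] for _ in range(3)]
first = record_append(replicas, "Data 3", fail_on=2)   # fails on replica 2
retry = record_append(replicas, "Data 3")              # client retries
print(first, retry)   # None 3
for r in replicas:
    print(r)
# ['Data 1', 'Data 2', 'Data 3', 'Data 3']
# ['Data 1', 'Data 2', 'Data 3', 'Data 3']
# ['Data 1', 'Data 2', '<garbage>', 'Data 3']
```

Every replica holds Data 3 at the returned offset, but the replicas are not bit-wise identical.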


GEAR Session 1. peta-scale storage systems

Lecture 2. Google File System and Hadoop Distributed File System

Google File System II Colossus


Storage Software: Colossus (GFS2)

  • Next-generation cluster-level file system
  • Automatically sharded metadata layer
  • Distributed masters (64MB block size → 1MB)
  • Data typically written using Reed-Solomon (1.5x)
  • Client-driven replication, encoding and replication
  • Metadata space has enabled availability
  • Why Reed-Solomon?
  • Cost
  • Especially with cross cluster replication
  • More flexible cost vs. availability choices
  • Google File System II: Dawn of the Multiplying Master Nodes, http://www.theregister.co.uk/2009/08/12/google_file_system_part_deux/?page=1
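To make the cost argument concrete, here is a small comparison of storage overhead. The 6+3 scheme (6 data blocks plus 3 parity blocks) is only an illustrative choice that gives the slide's 1.5x figure, not a confirmed Colossus configuration. Both schemes survive the loss of any m of their k + m blocks.

```python
def overhead(data_blocks, parity_blocks):
    """Raw bytes stored per byte of user data."""
    return (data_blocks + parity_blocks) / data_blocks


schemes = {
    "3x replication": (1, 2),   # 1 original + 2 extra copies
    "6+3 Reed-Solomon": (6, 3), # 6 data + 3 parity blocks (illustrative)
}
for name, (k, m) in schemes.items():
    print(f"{name}: {overhead(k, m)}x storage, tolerates {m} lost blocks")
# 3x replication: 3.0x storage, tolerates 2 lost blocks
# 6+3 Reed-Solomon: 1.5x storage, tolerates 3 lost blocks
```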


GEAR Session 1. peta-scale storage systems

Lecture 2. Google File System and Hadoop Distributed File System

Google File System II Colossus

Reed-Solomon Codes


Reed-Solomon Codes

  • Block-based error correcting codes
  • Digital communication and storage
  • Storage devices (including tape, CD, DVD, barcodes, etc)
  • Wireless or mobile communications
  • Satellite communications
  • Digital TV
  • High-speed modems

SOURCE: https://en.wikiversity.org/wiki/Reed–Solomon_codes_for_coders


What does the R-S code do?

  • Takes a block of digital data
  • Adds extra “redundant” bits
  • If errors occur, the R-S decoder processes each block and, as long as the errors stay within the code's correction capability, recovers the original data

[Figure: Data source → Reed-Solomon encoder → communication channel or storage devices (noise, errors) → Reed-Solomon decoder → Data sink]


A Quick Example of the R-S encoding

  • 4+2 coding
  • Original files are broken into 4 pieces
  • 2 parity pieces are added
  • First piece of data “ABCD”, second piece of data “EFGH”…

[Figure: original data A B C D E F G H I J K L M N O P, split into the pieces ABCD, EFGH, IJKL, MNOP]


A Quick Example of the R-S encoding

  • Applying coding matrix

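Coding matrix × original data = original data plus two parity rows:

$$
\begin{bmatrix}
01 & 00 & 00 & 00\\
00 & 01 & 00 & 00\\
00 & 00 & 01 & 00\\
00 & 00 & 00 & 01\\
1b & 1c & 12 & 14\\
1c & 1b & 14 & 12
\end{bmatrix}
\times
\begin{bmatrix}
A & B & C & D\\
E & F & G & H\\
I & J & K & L\\
M & N & O & P
\end{bmatrix}
=
\begin{bmatrix}
A & B & C & D\\
E & F & G & H\\
I & J & K & L\\
M & N & O & P\\
51 & 52 & 53 & 49\\
55 & 56 & 57 & 25
\end{bmatrix}
$$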


A Quick Example of the R-S encoding

  • Data loss
  • 2 of 6 rows are lost

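$$
\begin{bmatrix}
01 & 00 & 00 & 00\\
00 & 01 & 00 & 00\\
00 & 00 & 01 & 00\\
00 & 00 & 00 & 01\\
1b & 1c & 12 & 14\\
1c & 1b & 14 & 12
\end{bmatrix}
\times
\begin{bmatrix}
A & B & C & D\\
E & F & G & H\\
I & J & K & L\\
M & N & O & P
\end{bmatrix}
=
\begin{bmatrix}
A & B & C & D\\
E & F & G & H\\
I & J & K & L\\
M & N & O & P\\
51 & 52 & 53 & 49\\
55 & 56 & 57 & 25
\end{bmatrix}
$$

The rows holding I J K L and M N O P (and the matching rows of the coding matrix) are lost.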


A Quick Example of the R-S encoding

  • Without 2 rows

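$$
\begin{bmatrix}
01 & 00 & 00 & 00\\
00 & 01 & 00 & 00\\
1b & 1c & 12 & 14\\
1c & 1b & 14 & 12
\end{bmatrix}
\times
\begin{bmatrix}
A & B & C & D\\
E & F & G & H\\
I & J & K & L\\
M & N & O & P
\end{bmatrix}
=
\begin{bmatrix}
A & B & C & D\\
E & F & G & H\\
51 & 52 & 53 & 49\\
55 & 56 & 57 & 25
\end{bmatrix}
$$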


A Quick Example of the R-S encoding

  • Multiplying each side with the inverted matrix

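$$
\begin{bmatrix}
01 & 00 & 00 & 00\\
00 & 01 & 00 & 00\\
8d & f6 & 7b & 01\\
f6 & 8d & 01 & 7b
\end{bmatrix}
\times
\begin{bmatrix}
01 & 00 & 00 & 00\\
00 & 01 & 00 & 00\\
1b & 1c & 12 & 14\\
1c & 1b & 14 & 12
\end{bmatrix}
\times
\begin{bmatrix}
A & B & C & D\\
E & F & G & H\\
I & J & K & L\\
M & N & O & P
\end{bmatrix}
=
\begin{bmatrix}
01 & 00 & 00 & 00\\
00 & 01 & 00 & 00\\
8d & f6 & 7b & 01\\
f6 & 8d & 01 & 7b
\end{bmatrix}
\times
\begin{bmatrix}
A & B & C & D\\
E & F & G & H\\
51 & 52 & 53 & 49\\
55 & 56 & 57 & 25
\end{bmatrix}
$$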


A Quick Example of the R-S encoding

  • The Inverse Matrix and the Coding Matrix Cancel Out

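$$
\underbrace{
\begin{bmatrix}
01 & 00 & 00 & 00\\
00 & 01 & 00 & 00\\
8d & f6 & 7b & 01\\
f6 & 8d & 01 & 7b
\end{bmatrix}
\times
\begin{bmatrix}
01 & 00 & 00 & 00\\
00 & 01 & 00 & 00\\
1b & 1c & 12 & 14\\
1c & 1b & 14 & 12
\end{bmatrix}
}_{=\,I}
\times
\begin{bmatrix}
A & B & C & D\\
E & F & G & H\\
I & J & K & L\\
M & N & O & P
\end{bmatrix}
=
\begin{bmatrix}
01 & 00 & 00 & 00\\
00 & 01 & 00 & 00\\
8d & f6 & 7b & 01\\
f6 & 8d & 01 & 7b
\end{bmatrix}
\times
\begin{bmatrix}
A & B & C & D\\
E & F & G & H\\
51 & 52 & 53 & 49\\
55 & 56 & 57 & 25
\end{bmatrix}
$$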


A Quick Example of the R-S encoding

  • Reconstructing the Original Data

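$$
\begin{bmatrix}
A & B & C & D\\
E & F & G & H\\
I & J & K & L\\
M & N & O & P
\end{bmatrix}
=
\begin{bmatrix}
01 & 00 & 00 & 00\\
00 & 01 & 00 & 00\\
8d & f6 & 7b & 01\\
f6 & 8d & 01 & 7b
\end{bmatrix}
\times
\begin{bmatrix}
A & B & C & D\\
E & F & G & H\\
51 & 52 & 53 & 49\\
55 & 56 & 57 & 25
\end{bmatrix}
$$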
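The 4+2 example can be reproduced in a short sketch. Assumption (the slides do not state the field): arithmetic is in GF(2^8) with reducing polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D). Under that field, the slides' parity rows and inverse matrix come out exactly as shown. Addition is XOR. The helper names are hypothetical, and `mat_inv` assumes the matrix is invertible.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= 0x11D           # reduce back into 8 bits
        b >>= 1
    return result


def gf_inv(a):
    # a^254 = a^-1, since the multiplicative group has 255 elements
    result, base, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exp >>= 1
    return result


def mat_mul(A, B):
    out = []
    for row in A:
        out_row = []
        for col in zip(*B):
            acc = 0
            for x, y in zip(row, col):
                acc ^= gf_mul(x, y)
            out_row.append(acc)
        out.append(out_row)
    return out


def mat_inv(M):
    """Gauss-Jordan elimination over GF(2^8)."""
    n = len(M)
    aug = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = gf_inv(aug[col][col])
        aug[col] = [gf_mul(x, inv) for x in aug[col]]   # make the pivot 1
        for r in range(n):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [x ^ gf_mul(factor, y) for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


CODING = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
          [0x1b, 0x1c, 0x12, 0x14], [0x1c, 0x1b, 0x14, 0x12]]
data = [list(s.encode()) for s in ("ABCD", "EFGH", "IJKL", "MNOP")]

encoded = mat_mul(CODING, data)        # 4 data rows + 2 parity rows
print([hex(x) for x in encoded[4]])    # ['0x51', '0x52', '0x53', '0x49']

# Lose the rows holding IJKL and MNOP; keep rows 0, 1, 4, 5
kept = [0, 1, 4, 5]
inverse = mat_inv([CODING[i] for i in kept])
recovered = mat_mul(inverse, [encoded[i] for i in kept])
print(bytes(sum(recovered, [])).decode())  # ABCDEFGHIJKLMNOP
```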


Properties of Reed-Solomon codes

  • RS(n,k) with s-bit symbols
  • The encoder takes k data symbols (blocks) of s bits each
  • It adds parity symbols to make an n-symbol codeword
  • There are n-k parity symbols of s bits each
  • A Reed-Solomon decoder can correct up to t symbols that contain errors in a codeword, where 2t = n-k
  • t = (n-k)/2 for n-k even
  • t = (n-k-1)/2 for n-k odd


Example

  • RS(255,223) with 8 bit symbols
  • Each code word contains 255 code word bytes
  • 223 bytes are data and 32 bytes are parity
  • n=255, k=223, s=8, 2t = 32, t=16
  • The decoder can correct any 16 symbol errors in the code word
  • Errors in up to 16 bytes anywhere in the codeword can be automatically corrected.

[Figure: a codeword of n symbols: k data symbols followed by 2t parity symbols]


The Maximum codeword length

  • For a symbol size s
  • The maximum codeword length n is
  • $n = 2^s - 1$
  • For example, the max length of a code with 8-bit symbols (s=8) is 255 bytes
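The parameter relationships on these two slides fit in a few lines. This is a hypothetical helper, not part of any Reed-Solomon library: it checks the maximum codeword length and computes the parity count and correctable errors. `(n - k) // 2` covers both the even and odd cases above.

```python
def rs_params(n, k, s):
    """Parameters of an RS(n, k) code with s-bit symbols."""
    max_n = 2 ** s - 1
    if n > max_n:
        raise ValueError(f"n={n} exceeds the maximum codeword length {max_n}")
    parity = n - k
    t = parity // 2     # floor((n - k) / 2): even and odd cases in one formula
    return {"parity_symbols": parity, "correctable_errors": t,
            "data_bits": k * s, "codeword_bits": n * s}


print(rs_params(255, 223, 8))
# {'parity_symbols': 32, 'correctable_errors': 16, 'data_bits': 1784, 'codeword_bits': 2040}
```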


Coding Gain

  • Coding Gain
  • The probability of an error remaining in the decoded data is (usually) much lower than the probability of an error if Reed-Solomon is not used
  • Example
  • A digital communication system is designed to operate at a Bit Error Ratio (BER) of $10^{-9}$
  • No more than 1 in $10^9$ bits are received in error
  • This can be achieved by boosting the power of the transmitter or by adding Reed-Solomon (or another type of Forward Error Correction)

  • Reed-Solomon allows the system to achieve this target BER with a lower transmitter output power
  • The power "saving" given by Reed-Solomon (in decibels) is the coding gain.
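As a worked illustration, assuming the "power saving" is expressed as a ratio of the transmitter powers needed to reach the same target BER without and with Reed-Solomon:

$$
G_{\text{dB}} = 10\log_{10}\frac{P_{\text{uncoded}}}{P_{\text{coded}}}
$$

For a hypothetical case where Reed-Solomon lets the transmitter reach a BER of $10^{-9}$ with half the power, $G = 10\log_{10} 2 \approx 3.01$ dB.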


GEAR Session 1. peta-scale storage systems

Lecture 2. Distributed No-SQL data storage system


In the Lambda architecture

[Figure: new data (01111001111…) feeds the batch layer (master dataset) and the speed layer; the serving layer holds batch views, the speed layer holds real-time views, and a query ("How many…?") draws on both]


NoSQL databases

  • Basic Idea
  • Operates without a schema
  • Allows users to add fields without having to define any changes in structure first
  • Useful when dealing with nonuniform data and custom fields
  • Stands for “Not Only SQL”
  • Handles data access with size and performance that demand a cluster
  • Improves the productivity of application development by using a more convenient data interaction style


Polyglot persistence

  • Using different data stores in different circumstances
  • Without picking a particular database for all situations
  • Most organizations have a mix of data storage technologies for different circumstances


NoSQL Storage Data Model: (1) Key-Value Store

  • Simple hash table
  • All access to the storage is via primary key
  • Get the value for the key
  • Put a value for a key
  • Delete a key
  • Add a key
  • “value” is stored as a blob
  • Without caring or knowing what’s inside
  • Application is responsible for understanding data
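A toy key-value store shows the model: a hash table accessed only through the key, with values kept as opaque bytes. The class is a hypothetical illustration, not any product's API. `put` covers both "add a key" and "put a value for a key".

```python
class KeyValueStore:
    """Toy key-value store: all access goes through the key,
    and values are blobs the store never inspects."""

    def __init__(self):
        self._table = {}  # a simple hash table

    def put(self, key: str, value: bytes) -> None:
        self._table[key] = value       # add a key or overwrite its value

    def get(self, key: str):
        return self._table.get(key)    # None if the key is absent

    def delete(self, key: str) -> None:
        self._table.pop(key, None)


store = KeyValueStore()
store.put("user:42", b'{"name": "alice"}')  # the application chooses the encoding
print(store.get("user:42"))                 # b'{"name": "alice"}'
store.delete("user:42")
print(store.get("user:42"))                 # None
```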


NoSQL Storage Data Model: (2) Document Storage Model

  • Documents
  • Self-describing
  • Data structure
  • Maps, collections, tree, and scalar values
  • Stores documents in the value part of the key-value store
  • MongoDB, CouchDB, OrientDB, RavenDB, etc.
  • Users can query the data inside the document
  • without having to retrieve the whole document
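In contrast to the key-value store, a document store can evaluate queries on fields inside each document and return only the requested fields. The sketch below is a hypothetical in-memory illustration. It uses a dotted path such as "address.city" for nested fields, and the `find` signature does not belong to MongoDB or any other product.

```python
class DocumentStore:
    """Toy document store: nested dicts stored under a key, queryable by field."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc_id, doc):
        self._docs[doc_id] = doc

    def find(self, path, value, fields):
        """Return only `fields` of documents whose dotted `path` equals `value`."""
        results = []
        for doc in self._docs.values():
            node = doc
            for part in path.split("."):
                node = node.get(part) if isinstance(node, dict) else None
            if node == value:
                results.append({f: doc[f] for f in fields if f in doc})
        return results


db = DocumentStore()
db.insert(1, {"name": "alice", "address": {"city": "Fort Collins"}, "tags": ["a"]})
db.insert(2, {"name": "bob", "address": {"city": "Denver"}})
print(db.find("address.city", "Denver", ["name"]))  # [{'name': 'bob'}]
```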


NoSQL Storage Data Model: (3) Column-Family Stores

  • Cassandra, HBase, Hypertable, and Amazon SimpleDB
  • Store data in column families as rows
  • Each row has many columns associated with a row key
  • Column families
  • Groups of related data that are often accessed together


GEAR Session 1. peta-scale storage systems

Lecture 2. Distributed No-SQL data storage system

Column Family NoSQL Storage system: Introduction to Apache Cassandra


This material is based on:

  • Avinash Lakshman and Prashant Malik, "Cassandra: A Decentralized Structured Storage System," ACM SIGOPS Operating Systems Review, Vol. 44, No. 2, April 2010, pp. 35-40
  • DataStax Documentation: Apache Cassandra
  • http://docs.datastax.com/en/cassandra/2.1/cassandra/gettingStartedCassandraIntro.html
  • Now an Apache open source project:
  • http://cassandra.apache.org


CAP Theorem

  • Eric Brewer
  • It is impossible for a distributed data store to simultaneously provide more than two of the following three guarantees:
  • Consistency: every read receives the most recent write or an error
  • Availability: every request receives a (non-error) response, without the guarantee that it contains the most recent write
  • Partition tolerance: the system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes


If you design a data storage system for,

  • Facebook, to store the Facebook content: how would you prioritize Consistency, Availability, and Partition tolerance? Why?


Facebook’s operational requirements

  • Performance
  • Reliability
  • Failures are the norm
  • Efficiency
  • Scalability
  • Support continuous growth of the platform


Inbox search problem

  • A feature that allows users to search through all of their messages
  • By name of the person who sent it
  • By a keyword that shows up in the text
  • Search through all the previous messages
  • To solve this problem, the system must handle a very high write throughput
  • Billions of writes per day
  • A large number of users


Now,

  • Cassandra is in use at,
  • Apple
  • CERN
  • Easou
  • Comcast
  • eBay
  • GitHub
  • Hulu
  • Instagram
  • Netflix
  • Reddit
  • The Weather Channel
  • And over 1500 more companies


GEAR Session 1. peta-scale storage systems

Lecture 2. Distributed No-SQL data storage system

Apache Cassandra

Data Model


Data Model (1/2)

  • Distributed multidimensional map indexed by a key
  • Row key
  • String with no size restrictions
  • Typically 16 ~ 36 bytes long
  • Every operation under a single row key is atomic
  • Value is an object
  • Highly structured


Data Model (2/2)

  • Columns are grouped into column families
  • Similar to Bigtable
  • Columns within a simple column family or a super column are sorted
  • Sorted either by time or by name


Super column family vs. Simple column family

Example (one row of a super column family):

    "alice": {
      "ccd17c10-d200-11e2-b7f6-29cc17aeed4c": {
        "sender": "bob",
        "sent": "2013-06-10 19:29:00+0100",
        "subject": "hello",
        "body": "hi"
      }
    }

  • Simple column family
  • Cassandra's native data model is two-dimensional: rows and columns
  • Super column family
  • Some uses require more dimensions: a family of values, e.g., messages
  • Columns that contain columns


API

  • insert(table, key, rowMutation)
  • get(table, key, columnName)
  • delete(table, key, columnName)
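A minimal in-memory sketch of the data model behind the three API calls above. Only the method names come from the slide. Everything else is an assumption for illustration: a rowMutation is modelled as a dict {column family: {column: value}}, a column is addressed as "family:column", one lock makes each single-row operation atomic, and columns are listed sorted by name. This is not Cassandra's implementation.

```python
import threading


class ToyColumnFamilyTable:
    """row key -> column family -> {column name: value}"""

    def __init__(self):
        self.rows = {}
        self.lock = threading.Lock()   # makes each single-row operation atomic here

    def insert(self, key, row_mutation):
        # row_mutation: {column_family: {column_name: value}}
        with self.lock:
            row = self.rows.setdefault(key, {})
            for family, columns in row_mutation.items():
                row.setdefault(family, {}).update(columns)

    def get(self, key, column_name):
        family, column = column_name.split(":", 1)
        with self.lock:
            return self.rows.get(key, {}).get(family, {}).get(column)

    def delete(self, key, column_name):
        family, column = column_name.split(":", 1)
        with self.lock:
            self.rows.get(key, {}).get(family, {}).pop(column, None)

    def columns(self, key, family):
        # Columns within a family, sorted by name
        with self.lock:
            return sorted(self.rows.get(key, {}).get(family, {}))


t = ToyColumnFamilyTable()
t.insert("alice", {"profile": {"name": "Alice", "city": "Fort Collins"}})
print(t.get("alice", "profile:name"))   # Alice
print(t.columns("alice", "profile"))    # ['city', 'name']
t.delete("alice", "profile:city")
print(t.columns("alice", "profile"))    # ['name']
```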


GEAR Session 1. peta-scale storage systems

Lecture 2. Distributed No-SQL data storage system

Apache Cassandra

Data Partitioning: Consistent Hashing


Non-consistent hashing vs. consistent hashing

  • When a hash table is resized
  • A non-consistent hashing algorithm requires rehashing the complete table
  • A consistent hashing algorithm requires only a partial rehash of the table
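To see the difference concretely, this small sketch counts how many of 100 keys change owner when a fifth node is added. Assumptions: keys are already integer identifiers 0 to 99 (the hashing step is skipped), the non-consistent scheme assigns key k to node k mod N, and the ring uses node identifiers 0, 25, 50, 75 with a new node at 60.

```python
def modulo_owner(key, num_nodes):
    return key % num_nodes               # node index


def ring_owner(key, node_ids):
    """Consistent hashing: first node id equal to or following the key."""
    for n in sorted(node_ids):
        if n >= key:
            return n
    return min(node_ids)                 # wrap around the circle


keys = range(100)                        # identifiers 0..99 on a ring of size 100

moved_mod = sum(modulo_owner(k, 4) != modulo_owner(k, 5) for k in keys)

before = [0, 25, 50, 75]
after = before + [60]                    # one node joins
moved_ring = sum(ring_owner(k, before) != ring_owner(k, after) for k in keys)

print(moved_mod, moved_ring)             # 80 10
```

With modulo hashing, 80 of the 100 keys move. On the ring, only keys 51 to 60 move, all of them to the new node.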


Consistent hashing [1/3]

[Figure: identifier circle with m = 3 (identifiers 0 to 7) and machines A, B, and C placed on it]

  • A consistent hash function assigns each node and key an m-bit identifier using a hashing function
  • A node's identifier is the hash value of its IP address
  • m-bit identifier: $2^m$ identifiers
  • m has to be big enough to make the probability of two nodes or keys hashing to the same identifier negligible


Consistent hashing [2/3]

[Figure: identifier circle with m = 3; machine B at identifier 1, machine C at identifier 5, and machine A]

  • Consistent hashing assigns keys to nodes: key k is assigned to the first node whose identifier is equal to or follows k in the identifier space ($2^m$ identifiers)
  • Machine B is the successor node of key 1: successor(1) = 1
  • Key 2 will be stored in machine C: successor(2) = 5
  • Key 3 will be stored in machine C: successor(3) = 5


Consistent hashing [3/3]

[Figure: the same identifier circle with machines A, B, and C, plus a new node N]

  • If machine C leaves the circle, successor(5) will point to A
  • If machine N joins the circle, successor(2) will point to N
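A minimal sketch of the successor rule on the slides' m = 3 circle. Assumptions: the slides do not give machine A's identifier, so it is placed at 7 here. The new node N is placed at 2. `node_id()` is a hypothetical helper that hashes an IP address with SHA-1 (the base hash Chord uses) and truncates the result to m bits.

```python
import hashlib

M = 3  # identifier circle with 2^3 = 8 identifiers


def node_id(ip):
    """Hypothetical: hash an IP address to an m-bit identifier."""
    return int.from_bytes(hashlib.sha1(ip.encode()).digest(), "big") % 2 ** M


def successor(k, nodes):
    """Name of the first node whose id is equal to or follows k, clockwise."""
    k %= 2 ** M
    by_id = sorted(nodes.items(), key=lambda item: item[1])
    for name, nid in by_id:
        if nid >= k:
            return name
    return by_id[0][0]   # wrap around past identifier 2^M - 1


nodes = {"B": 1, "C": 5, "A": 7}   # A's id is an assumption
print(successor(1, nodes), successor(2, nodes), successor(3, nodes))  # B C C

without_c = {n: i for n, i in nodes.items() if n != "C"}
print(successor(5, without_c))     # A

with_n = {**nodes, "N": 2}         # N's id is an assumption
print(successor(2, with_n))        # N
```

Only the keys between the departing or arriving node and its predecessor change owner. Every other assignment stays put.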


Scalable Key location

  • In consistent hashing:
  • Each node need only be aware of its successor node on the circle
  • Queries can be passed around the circle via these successor pointers until the query finds the resource
  • What is the disadvantage of this scheme?
  • It may require traversing all N nodes to find the appropriate mapping


GEAR Session 1. peta-scale storage systems

Lecture 2. Distributed No-SQL data storage system

Apache Cassandra

Data Partitioning: CHORD


This material is based on:

  • Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan. 2001. Chord: A scalable peer-to-peer lookup service for internet applications. In Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications (SIGCOMM '01). ACM, New York, NY, USA, 149-160. DOI=http://dx.doi.org/10.1145/383059.383071


Example of use

  • Apache Cassandra’s partitioning scheme
  • Couchbase
  • Openstack’s object storage service Swift
  • Akamai Content delivery network
  • Data partitioning in Voldemort
  • Partitioning component of Amazon’s storage system Dynamo


Scalable Key location in Chord

  • Let m be the number of bits in the key/node identifiers
  • Each node n maintains a routing table with at most m entries, called the finger table
  • The i-th entry in the table at node n contains the identity of the first node s that succeeds n by at least $2^{i-1}$ on the identifier circle
  • i.e., $s = \text{successor}(n + 2^{i-1})$, where $1 \le i \le m$ (and all arithmetic is modulo $2^m$)
  • s is called the i-th finger of node n


Definition of variables for node n, using m-bit identifiers

  • finger[i].start = $(n + 2^{i-1}) \bmod 2^m$, $1 \le i \le m$
  • finger[i].interval = [finger[i].start, finger[i+1].start)
  • finger[i].node = first node ≥ n.finger[i].start
  • successor = the next node on the identifier circle (finger[1].node)
  • predecessor = the previous node on the identifier circle


  • Each finger table entry includes both the Chord identifier and the IP address of the relevant node
  • The first finger of n is its immediate successor on the circle
  • Clockwise!


Finger tables

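[Figure: identifier circle with m = 3 (identifiers 0 to 7); nodes at identifiers 0, 1, and 3]

| Node | start | interval | succ |
|------|-------|----------|------|
| 0    | 1     | [1,2)    | 1    |
| 0    | 2     | [2,4)    | 3    |
| 0    | 4     | [4,0)    | 0    |
| 1    | 2     | [2,3)    | 3    |
| 1    | 3     | [3,5)    | 3    |
| 1    | 5     | [5,1)    | 0    |
| 3    | 4     | [4,5)    | 0    |
| 3    | 5     | [5,7)    | 0    |
| 3    | 7     | [7,3)    | 0    |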


Lookup process [1/3]

  • Each node stores information about only a small number of other nodes
  • A node's finger table generally does not contain enough information to determine the successor of an arbitrary key k
  • What happens when a node n does not know the successor of a key k?
  • If n finds a node whose ID is closer than its own to k, that node will know more about the identifier circle in the region of k than n does


Lookup process [2/3]

  • First, check whether the data is stored at n
  • If it is, return the data
  • Otherwise, n searches its finger table for the node j whose ID most immediately precedes k
  • n asks j for the node it knows whose ID is closest to k
  • Do not overshoot!
  • Rules: 1. Go clockwise 2. Never overshoot


Lookup process [3/3]

[Figure: identifier circle (m = 3) with nodes 0, 1, and 3 and their finger tables, as on the Finger tables slide]

  • 0. A request comes into node 3 to find the successor of identifier 1
  • 1. Node 3 wants to find the successor of identifier 1
  • 2. Identifier 1 belongs to [7,3)
  • 3. Check succ: 0
  • 4. Node 3 asks node 0 to find the successor of 1
  • 5. The successor of 1 is 1


Lookup process: example 1

[Figure: identifier circle (m = 3) with nodes 0, 1, and 3 and their finger tables, as on the Finger tables slide]

  • 0. A request comes into node (machine) 1 to find the successor of identifier 4
  • 1. Node 1 wants to find the successor of identifier 4
  • 2. Identifier 4 belongs to [3,5)
  • 3. Check succ: 3
  • 4. Node 1 asks node 3 to find the successor of 4
  • 5. The successor of 4 is 0


Lookup process: example 2

[Figure: identifier circle (m = 3) with nodes 0, 1, and 3 and their finger tables, as on the Finger tables slide]

  • 0. A request comes into node 3
  • 1. Node 3 wants to find the successor of identifier 0
  • 2. Identifier 0 belongs to [7,3)
  • 3. Check succ: 0
  • 4. Node 3 asks node 0 to find the successor of 0
  • 5. Node 0 itself uses identifier 0, so succ is 0
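A minimal Python sketch of the finger tables and the lookup walk on the three-node, m = 3 circle from these slides. It follows the iterative find_successor / closest_preceding_finger structure described in the Chord paper cited above, but the code is only an illustration. For Example 2, the sketch answers at node 3 directly, because node 3's successor (node 0) already covers identifier 0.

```python
M = 3
NODES = [0, 1, 3]   # node identifiers from the finger-table slide


def successor(k):
    """First node whose identifier is equal to or follows k on the circle."""
    k %= 2 ** M
    return min((n for n in NODES if n >= k), default=min(NODES))


def fingers(n):
    # finger[i].node = successor(n + 2^(i-1)) for i = 1..m (0-indexed here)
    return [successor(n + 2 ** i) for i in range(M)]


def between(x, a, b, inclusive_right=False):
    """Is x in the circular interval (a, b), or (a, b] if inclusive_right?"""
    if a < b:
        return a < x < b or (inclusive_right and x == b)
    return x > a or x < b or (inclusive_right and x == b)


def closest_preceding_finger(n, k):
    # Try the farthest finger first, but never overshoot k
    for f in reversed(fingers(n)):
        if between(f, n, k):
            return f
    return n


def find_successor(n, k):
    """Iterative lookup from node n; returns (successor of k, nodes visited)."""
    path = [n]
    while not between(k, n, successor(n + 1), inclusive_right=True):
        n = closest_preceding_finger(n, k)
        path.append(n)
    return successor(n + 1), path


for n in NODES:
    print(n, fingers(n))
# 0 [1, 3, 0]
# 1 [3, 3, 0]
# 3 [0, 0, 0]
print(find_successor(3, 1))   # (1, [3, 0])
print(find_successor(1, 4))   # (0, [1, 3])
print(find_successor(3, 0))   # (0, [3])
```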


Theorem 2.

  • With high probability (or under standard hardness assumptions), the number of nodes that must be contacted to find a successor in an N-node network is $O(\log N)$
  • Proof

Suppose that node n tries to resolve a query for the successor of k. Let p be the node that immediately precedes k. We analyze the number of steps to reach p. If n ≠ p, then n forwards its query to the closest predecessor of k in its finger table. Suppose p is in the i-th finger interval of node n; since this interval is not empty, node n will finger some node f in this interval. The distance between n and f is at least $2^{i-1}$.


Proof continued

f and p are both in n's i-th finger interval, and the distance between them is at most $2^{i-1}$. This means f is closer to p than to n or, equivalently, the distance from f to p is at most half of the distance from n to p. If the distance between the node handling the query and the predecessor p halves in each step, and is at most $2^m$ initially, then within m steps the distance will be 1 (you have arrived at p).

In fact, the number of forwardings necessary will be $O(\log N)$ with high probability: after $\log N$ forwardings, the distance between the current query node and the key k will be reduced to at most $2^m/N$. A range of that size contains $O(\log N)$ node identifiers with high probability, so the remaining steps need at most $O(\log N)$ more forwardings even if each advances by only one node.

  • The average lookup time is $\frac{1}{2}\log N$
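The halving step of the proof can be written as a recurrence on $d_j$, the identifier distance between the node handling the query and p after j forwardings. This is a restatement of the argument above, not an additional result:

$$
d_0 \le 2^m, \qquad d_{j+1} \le \frac{d_j}{2} \;\Longrightarrow\; d_j \le \frac{2^m}{2^j}, \qquad d_{\log_2 N} \le \frac{2^m}{N}
$$

With $m = 3$, for example, at most 3 halvings bring the distance from $2^3 = 8$ down to 1.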


Requirements for node joins

  • In a dynamic network, nodes can join (and leave) at any time
  • 1. Each node’s successor is correctly maintained
  • 2. For every key k, node successor(k) is responsible for k


Tasks to perform node join

  • 1. Initialize the predecessor and fingers of node n
  • 2. Update the fingers and predecessors of existing nodes to reflect the addition of n
  • 3. Notify the higher-layer software so that it can transfer state (e.g., values) associated with keys that node n is now responsible for
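Step 3 can be sketched as a single interval check. Assumption: keys live on the same identifier circle. When n joins between its predecessor and its successor, the keys in (predecessor, n] move from the successor to n. The function name is hypothetical.

```python
def keys_to_transfer(new_node, predecessor, successor_keys):
    """Keys the successor hands to new_node: those in the circular interval (predecessor, new_node]."""
    def in_range(k):
        if predecessor < new_node:
            return predecessor < k <= new_node
        return k > predecessor or k <= new_node   # interval wraps past 0

    return sorted(k for k in successor_keys if in_range(k))


# Nodes 0, 1, 3 on a 3-bit circle: node 0 holds keys 4, 5, 6, 7, 0.
# If a node with identifier 6 joins, its predecessor is 3 and its successor is 0.
print(keys_to_transfer(6, 3, [4, 5, 6, 7, 0]))   # [4, 5, 6]
```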


Questions?
