Alva Couch couch@cs.tufts.edu Consistency_and_Concurrency Page 1 - - PDF document


Consistency, Concurrency, ACID, and CAP
Tuesday, November 05, 2013 4:26 PM

Alva Couch
couch@cs.tufts.edu

This lecture was composed from fragments of two lectures in COMP150CPA: Cloud and Power-Aware Computing.



SLIDE 2
Consistency
Tuesday, November 05, 2013 12:13 PM

The most important attribute of a distributed system: how consistency is handled.

The definition of consistency: a consistent distributed object has the property that if many applications access the object via differing interfaces, all interfaces provide "the same view" of the object.

Example: credit card purchases
  • You purchase an item.
  • You ask what you purchased (perhaps through a different interface or mechanism than the one used to purchase the item).
  • If you always get an accurate depiction of what you purchased -- regardless of interface -- then the purchase system is consistent.

SLIDE 3
How inconsistency arises
Tuesday, November 05, 2013 12:18 PM

SLIDE 4
How inconsistency arises: horizontal scaling
Tuesday, November 05, 2013 12:23 PM

SLIDE 5
How inconsistency arises: enter the cloud
Tuesday, November 05, 2013 12:23 PM

SLIDE 6
The facts of life for inconsistency
Tuesday, November 05, 2013 12:20 PM

  • When using a "cloud resource", the server you utilize often changes for every transaction you make.
  • Reason: horizontal scalability (elasticity).
  • Unless servers somehow communicate, you receive an inconsistent view of the world.
  • "The cloud" is one model of server communication.
  • "The cloud" has its own concepts of consistency.

SLIDE 7
A cloud application...
Wednesday, January 27, 2010 9:05 AM

  • Is not a parallel or distributed computing program.
  • Instances of an application do not communicate with each other.
  • Instances have no local persistent memory.
  • The only communication between instances is in sharing the same cloud!

SLIDE 8
Distributable programs
Wednesday, January 27, 2010 9:06 AM

"Distributable" applications:
  • Are regular serial application programs.
  • Can have many concurrent instances running in different locations.
  • All instances do the same thing.
  • All instances have the same view of the world.

SLIDE 9
Distributed objects
Wednesday, January 27, 2010 9:08 AM

Distributed objects:
  • Give all instances of an application "the same view" of the world.
  • Are opaque to the application.
  • Are distributed in ways that the application cannot detect.

SLIDE 10
Classes and instances
Monday, January 24, 2011 7:58 PM

Best to think about cloud clients and services in terms of classes and instances.

For a client:
  • The class determines the kind of application.
  • An instance is one copy of a program that satisfies class requirements.
  • There can be many concurrent instances of a client (e.g., 1000 cellphone users).

For a service:
  • The class is the kind of service.
  • An instance is one copy of a program that provides the service.
  • There can be many instances of the same service (e.g., geographically distributed).

SLIDE 11
Binding
Monday, January 24, 2011 8:01 PM

The concept of binding:
  • You run an app on your cellphone.
  • It connects to a service.
  • It does something (e.g., recording your Lat/Long location).
  • It logs out.

What can you assume about this transaction? You cannot assume that
  • you'll ever get the same server again,
  • or that the server you get will have the same view of the cloud
(unless you know something more about the cloud…!)

Caveat: when you write a cloud service
  • All data must be stored in the cloud.
  • There is no useful concept of local data.

SLIDE 12
A model of cloud execution
Monday, January 24, 2011 7:54 PM

SLIDE 13
ACID, NoSQL, and CAP
Tuesday, November 05, 2013 12:58 PM

Three different approaches to storing cloud data:
  • ACID: the guarantees traditionally associated with Structured Query Language (SQL) databases.
  • NoSQL ("Not Only SQL"): keeps a chosen subset of the desirable ACID properties and leaves the others out.
  • The CAP Theorem: constrains the choices NoSQL can enable.

SLIDE 14
ACID
Wednesday, January 27, 2010 11:09 AM

Consistent datastores should exhibit what is commonly called ACID:
  • Atomicity: requested operations either occur or not, and there is nothing in between "occurring" and "not occurring".
  • Consistency: what you wrote is what you read.
  • Isolation: no factors other than your actions affect data.
  • Durability: what you wrote remains what you read, even after system failures.

Why isn't everything ACID? Why do non-ACID systems exist? This is a deep question....

SLIDE 15
Consistency and concurrency
Wednesday, January 27, 2010 9:19 AM

Two visible properties of a distributed object: consistency and concurrency.
  • Consistency: the extent to which "what you write" is "what you read" back, afterward.
  • Concurrency: what happens when two instances of your application try to do conflicting things at the same exact time?

SLIDE 16
Consistency
Wednesday, January 27, 2010 9:20 AM

The extent to which "what you write" into a distributed object is "what you read" later.

Two kinds:
  • Strong consistency: if you write something into a distributed object, you always read what you wrote, even immediately after the write.
  • Weak (eventual) consistency: if you write something into a distributed object, then eventually -- after some time passes -- you will be able to read it back. Immediate queries may return stale data.
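The difference between the two kinds can be made concrete with a toy model. The following is a minimal sketch, not how any real cloud datastore is built: the class name ReplicatedStore, the explicit propagate() step, and the "strong" flag are all assumptions made for illustration. A write lands on one replica; in the eventual case other replicas only see it after propagate() runs, while in the strong case the write does not return until every replica has it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model (hypothetical): one value per key, copied to several replicas. */
class ReplicatedStore {
    private final List<Map<String, String>> replicas = new ArrayList<>();
    // Writes accepted by one replica but not yet copied to the others.
    private final List<String[]> pending = new ArrayList<>();
    private final boolean strong;

    ReplicatedStore(int replicaCount, boolean strong) {
        this.strong = strong;
        for (int i = 0; i < replicaCount; i++) {
            replicas.add(new HashMap<>());
        }
    }

    /** Write through one replica. */
    void put(int replica, String key, String value) {
        replicas.get(replica).put(key, value);
        pending.add(new String[] {key, value});
        if (strong) {
            propagate(); // strong: do not return until every replica has the write
        }
    }

    /** Read from one replica; in the eventual case this may be stale. */
    String get(int replica, String key) {
        return replicas.get(replica).get(key);
    }

    /** Copy all pending writes to every replica ("after some time passes"). */
    void propagate() {
        for (String[] write : pending) {
            for (Map<String, String> r : replicas) {
                r.put(write[0], write[1]);
            }
        }
        pending.clear();
    }
}
```

With the flag set to false, a put on replica 0 followed immediately by a get on replica 1 returns nothing; the same get succeeds once propagate() has run.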

SLIDE 17
A file analogy
Wednesday, January 27, 2010 9:36 AM

  • Strong consistency is like writing to a regular file or database: what you write is always what you get back.
  • Eventual consistency is like writing something on pieces of paper and mailing them to many other people. What you get back depends upon which person you talk to and when you ask. Eventually, they'll all know about the change.

SLIDE 18
Concurrency
Wednesday, January 27, 2010 9:23 AM

How conflicting concurrent operations are handled.

Two kinds:
  • Strong concurrency: if two or more conflicting operations are requested at the same time, they are serialized and done in arrival order, and all are treated as succeeding. Thus the last request determines the outcome.
  • Weak ("opportunistic") concurrency: if two conflicting operations are requested at the same time, the first succeeds and the second fails. Thus the first request determines the outcome.
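The two kinds of concurrency above can be sketched on a single stored value. This is only an illustration under assumed names (Cell, writeLastWins, writeIfUnchanged are hypothetical): a version counter stands in for "did anyone else write since I looked?". The strong variant accepts every write, so the last one wins; the opportunistic variant rejects a write that was based on an out-of-date version, so the first one wins.

```java
/** Hypothetical single stored value with a version counter. */
class Cell {
    private int value;
    private int version;

    /** Returns {value, version} as seen right now. */
    synchronized int[] read() {
        return new int[] {value, version};
    }

    /** Strong concurrency: writes are serialized, every write succeeds, last one wins. */
    synchronized boolean writeLastWins(int newValue) {
        value = newValue;
        version++;
        return true;
    }

    /** Opportunistic concurrency: succeeds only if nobody wrote since expectedVersion. */
    synchronized boolean writeIfUnchanged(int expectedVersion, int newValue) {
        if (version != expectedVersion) {
            return false; // a conflicting write got there first
        }
        value = newValue;
        version++;
        return true;
    }
}
```

If two callers both read version 0 and then both call writeIfUnchanged(0, ...), the first call returns true and the second returns false, which is the "first succeeds, second fails" behavior described above.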

SLIDE 19
A file analogy
Wednesday, January 27, 2010 9:41 AM

  • On Linux, file writes exhibit strong concurrency, in the sense that conflicting writes all occur and the last one wins.
  • Likewise, in a database, conflicting operations in a stream are serialized and all occur -- the last one determines the outcome.
  • Opportunistic concurrency only occurs when there is some form of data locking, e.g., in a database transaction block.

SLIDE 20
Strong consistency and concurrency
Wednesday, January 27, 2010 9:54 AM

Consistency/Concurrency tradeoffs:
  • Obviously, we want both strong consistency and strong concurrency.
  • But we can't have both at the same time!

SLIDE 21
Strong consistency requirements
Wednesday, January 27, 2010 10:01 AM

Strong consistency requires:
  • Some form of read blocking until a consistent state is achieved,
  • which implies a (relatively) slow read time before unblocking,
  • which means we can't have strong concurrency!

SLIDE 22
Strong concurrency requirements
Wednesday, January 27, 2010 10:01 AM

Strong concurrency requires:
  • Some form of write sequencing.
  • A (relatively) fast write time, with little blocking,
  • which means writes need time to propagate,
  • which means we can't have strong consistency!

SLIDE 23
Two approaches to PaaS
Wednesday, January 27, 2010 10:03 AM

Google's "AppEngine":
  • Provides strong consistency,
  • at the expense of offering only opportunistic concurrency.

Amazon's "Dynamo":
  • Provides strong concurrency,
  • at the expense of exhibiting only eventual consistency.

SLIDE 24
AppEngine
Wednesday, January 27, 2010 10:18 AM

AppEngine properties:
  • Strong consistency: what you write is always what you read, even if you read at a (geographically) different place!
  • Opportunistic concurrency: updates can fail; the application is responsible for repeating failed update operations. Updates should be contained in "try" blocks!

SLIDE 25
Modifying distributed objects in AppEngine
Wednesday, January 27, 2010 11:23 AM

  • A distributed object retrieved from the persistence manager remains "attached" to the cloud.
  • If you "set" something in a persistent object, this implicitly modifies the cloud version and every copy in other concurrently running application instances! This is what strong consistency means!
  • So if some concurrent application instance sets something else in an object instance you fetched, your object will reflect that change (via strong consistency).
  • So mostly, you observe what appears to be strong concurrency.

SLIDE 26
The illusion of strong consistency
Monday, January 24, 2011 8:09 PM

How is this actually done? It's actually smoke and mirrors!

Creating strong consistency:
  • Every object is "dirty" if changed, and "clean" if not.
      • A class of objects is dirty if any instance is.
      • An instance is dirty if its data isn't completely propagated.
  • Very fast mechanisms for propagating "dirty" information (e.g., a bit array).
  • Relatively slow mechanisms for changing something from "dirty" to "clean": actually propagate the data, then relabel the thing as clean.
  • Applications get "dirty" info immediately, and then wait until the data is clean before proceeding!
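The dirty/clean mechanism described above can be sketched from the point of view of one node. This is a guess at the shape of the idea, not AppEngine's implementation: the class DirtyBitReplica and its method names are hypothetical, and a java.util.BitSet plays the role of the "bit array". Marking an object dirty is cheap and happens first; delivery of the actual data is the slow step that relabels it clean; a read only yields a value while the object is clean.

```java
import java.util.BitSet;
import java.util.OptionalInt;

/** Hypothetical view of one node holding copies of several integer-valued objects. */
class DirtyBitReplica {
    private final int[] data;
    private final BitSet dirty = new BitSet(); // fast-propagated "dirty" flags

    DirtyBitReplica(int size) {
        data = new int[size];
    }

    /** Fast path: another node announces that object i has changed. */
    void markDirty(int i) {
        dirty.set(i);
    }

    /** Slow path: the new data finally arrives, so the object is relabeled clean. */
    void deliver(int i, int value) {
        data[i] = value;
        dirty.clear(i);
    }

    /** Yields a value only while object i is clean; otherwise the caller has to wait. */
    OptionalInt tryRead(int i) {
        return dirty.get(i) ? OptionalInt.empty() : OptionalInt.of(data[i]);
    }
}
```

A caller that gets an empty result would block or poll until deliver() has run, which is the "wait until the data is clean before proceeding" step.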

SLIDE 27
An example object
Wednesday, January 27, 2010 11:28 AM

    import com.google.appengine.api.datastore.Key;

    import java.util.Date;
    import javax.jdo.annotations.IdGeneratorStrategy;
    import javax.jdo.annotations.IdentityType;
    import javax.jdo.annotations.PersistenceCapable;
    import javax.jdo.annotations.Persistent;
    import javax.jdo.annotations.PrimaryKey;

    @PersistenceCapable(identityType = IdentityType.APPLICATION)
    public class Employee {
        @PrimaryKey
        @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
        private Key key;

        @Persistent
        private String firstName;

        @Persistent
        private String lastName;

        @Persistent
        private Date hireDate;

        public Employee(String firstName, String lastName, Date hireDate) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.hireDate = hireDate;
        }

        // Accessors for the fields. JDO doesn't use these, but your application does.

SLIDE 28

        public Key getKey() {
            return key;
        }

        public String getFirstName() {
            return firstName;
        }

        public void setFirstName(String firstName) {
            this.firstName = firstName;
        }

        public String getLastName() {
            return lastName;
        }

        public void setLastName(String lastName) {
            this.lastName = lastName;
        }

        public Date getHireDate() {
            return hireDate;
        }

        public void setHireDate(Date hireDate) {
            this.hireDate = hireDate;
        }
    }

Pasted from <http://code.google.com/appengine/docs/java/datastore/dataclasses.html>

SLIDE 29
A hard thing to understand
Wednesday, January 27, 2010 1:57 PM

SLIDE 30
Persistent object caveats
Wednesday, January 27, 2010 2:38 PM

AppEngine persistent object caveats:
  • Changes to a persistent object occur when requested.
  • Access functions must be used; persistent data must be private; the Persistence Manager factory adds management code to make this happen!
  • Changes are reflected everywhere the object is being referenced!

SLIDE 31
What does strong consistency mean?
Wednesday, January 27, 2010 2:02 PM

  • When you change something in an instance of a persistent object, it is changed in every other image of that instance, including inside other instances of your application!
  • But this is a polite illusion; in reality, other instances of your application wait for data they need to arrive!

SLIDE 32
Is concurrency actually weak?
Wednesday, January 27, 2010 2:07 PM

  • If you are just modifying one object in straightforward ways, one-attribute-at-a-time, you might think there is strong concurrency.
  • But problems arise when you're trying to update an object, e.g., from its own current values.
  • The solution to these problems -- and not the problems themselves -- makes concurrency weak!

SLIDE 33
Updating an entity from its own values
Wednesday, January 27, 2010 2:08 PM

Consider the code:

    Key k = KeyFactory.createKey(Employee.class.getSimpleName(), "Alfred.Smith@example.com");
    // assume existence of persistent getSalary and setSalary methods
    Employee e = pm.getObjectById(Employee.class, k);
    e.setSalary(e.getSalary() + 100); // Give Alfred a raise!

Consider what happens when two application instances invoke this code at nearly the same time.

SLIDE 34
The problem of concurrent updates
Wednesday, January 27, 2010 2:11 PM

The code

    e.setSalary(e.getSalary()+100)

is the same thing as (and is implemented as!)

    tmp = e.getSalary()
    tmp = tmp + 100
    e.setSalary(tmp)

SLIDE 35
So, we can execute this as:
Wednesday, January 27, 2010 2:13 PM

So, we can execute this twice according to the following schedule:

    Instance 1        Instance 2
    e.getSalary
                      e.getSalary
    e.setSalary
                      e.setSalary

And Alfred gets a $100 raise rather than a $200 raise :(
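The schedule above can be replayed deterministically in plain Java, without any datastore. This is a sketch for illustration only: SalaryRecord is a hypothetical stand-in for the persistent Employee object, and the two "instances" are simulated by two local variables rather than by real concurrent processes.

```java
/** Hypothetical stand-in for the persistent Employee object. */
class SalaryRecord {
    private int salary;

    SalaryRecord(int salary) {
        this.salary = salary;
    }

    int getSalary() {
        return salary;
    }

    void setSalary(int salary) {
        this.salary = salary;
    }
}

class LostUpdateDemo {
    /** Replays the bad schedule: both instances read before either one writes. */
    static int interleavedRaises(int start) {
        SalaryRecord e = new SalaryRecord(start);
        int tmp1 = e.getSalary();  // instance 1 reads
        int tmp2 = e.getSalary();  // instance 2 reads the same old value
        e.setSalary(tmp1 + 100);   // instance 1 writes
        e.setSalary(tmp2 + 100);   // instance 2 overwrites instance 1's raise
        return e.getSalary();
    }

    /** The intended outcome: the second raise starts from the result of the first. */
    static int serialRaises(int start) {
        SalaryRecord e = new SalaryRecord(start);
        e.setSalary(e.getSalary() + 100);
        e.setSalary(e.getSalary() + 100);
        return e.getSalary();
    }
}
```

Starting from 1000, interleavedRaises returns 1100 while serialRaises returns 1200: one of the two raises is silently lost in the interleaved schedule.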

SLIDE 36
Transactions
Wednesday, January 27, 2010 2:16 PM

Transactions allow us to avoid that problem:
  • Identify operations that should be done without interruption.
  • Keep other concurrent things from interfering between begin() and commit().

E.g.:

    Key k = KeyFactory.createKey(Employee.class.getSimpleName(), "Alfred.Smith@example.com");
    try {
        pm.currentTransaction().begin();
        Employee e = pm.getObjectById(Employee.class, k);
        e.setSalary(e.getSalary() + 100); // Give Alfred a raise!
        pm.currentTransaction().commit();
    } catch (JDOCanRetryException ex) {
        // ouch: something prevented the transaction!
        throw ex; // share the pain!
    } finally {
        // if commit() did not complete, undo the partial transaction
        if (pm.currentTransaction().isActive()) {
            pm.currentTransaction().rollback();
        }
    }

SLIDE 37
How this works
Wednesday, January 27, 2010 2:32 PM

  • The transaction block (from begin() to commit()) attempts to execute before any other changes can be made to e.
  • If no other changes have been made to e between begin() and commit(), the transaction succeeds.
  • If some change in e has been made meanwhile, the transaction fails, e gets the changed values, the whole transaction is cancelled, and the application has to recover somehow (if it can).

SLIDE 38
How this behaves
Wednesday, January 27, 2010 2:34 PM

If two applications try to do this, then
  • the object will only change due to a commit.
  • If two commits interleave, an exception is thrown.
  • So we know we've goofed!

It is the use of transactions that "creates" weak concurrency, but without them, we have chaos!

SLIDE 39
Optimistic concurrency
Wednesday, January 27, 2010 2:23 PM

  • Transaction blocks delimit things that should be done together.
  • If something changes about the object between begin() and commit(), the transaction throws an exception.
  • So, the application knows that its request failed.

SLIDE 40
A subtle point
Wednesday, January 27, 2010 10:27 AM

A subtle point of object consistency:
  • To modify an object, you must read it.
  • If two object operations are attempted at the same time, it is possible for both to contain stale data with respect to each other.
  • Then the only reasonable choice is for the loser to start over by reading e again! Otherwise data is lost!
  • Thus the only reasonable choice that preserves transactional integrity is opportunistic concurrency!
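"The loser starts over by reading e again" amounts to a read, compute, commit loop. The sketch below shows that loop in plain Java under stated assumptions: VersionedSalary and RetryingRaise are hypothetical names, and a version number checked at commit time stands in for whatever conflict detection the real datastore performs between begin() and commit().

```java
/** Hypothetical record whose commits fail if someone else committed first. */
class VersionedSalary {
    private int salary;
    private int version;

    VersionedSalary(int salary) {
        this.salary = salary;
    }

    /** Returns {salary, version} as seen right now. */
    synchronized int[] read() {
        return new int[] {salary, version};
    }

    /** Commit succeeds only if the record is unchanged since expectedVersion was read. */
    synchronized boolean commit(int expectedVersion, int newSalary) {
        if (version != expectedVersion) {
            return false; // stale read: the caller must start over
        }
        salary = newSalary;
        version++;
        return true;
    }
}

class RetryingRaise {
    /** Applies a raise, re-reading after every failed commit; returns attempts used. */
    static int giveRaise(VersionedSalary e, int amount, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            int[] snapshot = e.read(); // start over from the current state
            if (e.commit(snapshot[1], snapshot[0] + amount)) {
                return attempt;
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts");
    }
}
```

Because a failed commit changes nothing and the retry recomputes from fresh data, two conflicting raises end up as 200 in total rather than 100. The cap on attempts is a design choice of this sketch; an application could equally report the failure to the user after the first conflict.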

SLIDE 41
When is strong concurrency a good choice?
Wednesday, January 27, 2010 10:34 AM

  • So far, we see that the outcome of concurrency is weak (transaction) consistency.
  • Why does strong concurrency work well in Amazon Dynamo? This is really subtle.
  • Dynamo is intended as a data warehouse.
  • Thus it is not a database itself, but rather, a historical record of a database.
  • Thus it is never necessary to invoke a transaction on it!

SLIDE 42
The NoSQL controversy
Wednesday, March 09, 2011 1:10 PM

  • NoSQL = "Not only Structured Query Language"
  • Often misunderstood: some more radical authors interpret NoSQL as "Prevent Structured Query Language" :)

In actuality, the NoSQL movement:
  • Assumes that the majority of queries are simple key fetches that don't require SQL structures.
  • Optimizes simple key fetches for quick response.
  • Often contains a fallback to interpreting full SQL, with performance disadvantages.
  • Or doesn't provide all ACID properties.

SLIDE 43
A typical NoSQL service
Wednesday, March 09, 2011 3:12 PM

Operations are:

    int put(KeyType key, ValueType value)

  • Assert the binding of a key to a value.
  • Subsequent puts update data.

    ValueType get(KeyType key)

  • Gets the last value associated with a key (most of the time).
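The two operations above map naturally onto a small Java interface. This is a minimal single-node sketch, assuming names that are not in the original (KeyValueStore, InMemoryStore) and assuming that the int returned by put is a status code with 0 meaning success; the slide does not say what the return value means.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical interface mirroring the put/get operations on the slide. */
interface KeyValueStore<K, V> {
    /** Binds key to value, replacing any earlier binding. Returns 0 on success (assumed convention). */
    int put(K key, V value);

    /** Returns the value bound to key, or null if there is none. */
    V get(K key);
}

/** Single-node, in-memory implementation; there is no replication, so reads are never stale. */
class InMemoryStore<K, V> implements KeyValueStore<K, V> {
    private final Map<K, V> data = new ConcurrentHashMap<>();

    @Override
    public int put(K key, V value) {
        data.put(key, value);
        return 0;
    }

    @Override
    public V get(K key) {
        return data.get(key);
    }
}
```

On a single node get always returns the last value put. The "most of the time" caveat only appears once the store is replicated and a read can reach a node that has not yet seen the latest put.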

SLIDE 44
Roots of NoSQL
Wednesday, March 09, 2011 1:13 PM

  • The most common use-case is retrieval by key.
  • Values are just bags of bits; the user can give them structure as needed.
  • The CAP theorem: one cannot build a distributed database system that exhibits all of strong Consistency, high Availability, and robustness in the presence of Partitioning (loss of messages).

SLIDE 45
The CAP Theorem
Wednesday, March 09, 2011 1:15 PM

The CAP conjecture (Eric Brewer, 2000): any distributed datastore can only exhibit two of the following three properties:
  • Consistency: all nodes have the same view of data.
  • (High) Availability: data is instantly available from the system.
  • Partitionability: the system continues to respond if messages are lost (due to system or network failure).

Proved to be true by Gilbert and Lynch (2002).

Very similar to my initial claim about the tension between consistency and concurrency.

Main contribution: cloud datastores can be categorized in terms of the two (of three) properties C, A, P that they exhibit.

In the cloud context:
  • AppEngine is in class CP = Consistency + Partitionability.
  • The only other reasonable cloud class is AP = Availability + Partitionability.
  • We don't want a cloud datastore that loses data (e.g., ¬P)!

SLIDE 46
Impact of the CAP theorem
Wednesday, March 09, 2011 1:56 PM

  • The CAP theorem's claim is related to my initial claim about the tension between consistency and concurrency.
  • Main contribution: cloud datastores can be categorized in terms of the two (of three) properties C, A, P that they exhibit.

In the cloud context:
  • AppEngine JDO is in class CP = Consistency + Partitionability.
  • The only other reasonable class is AP = Availability + Partitionability.
  • We don't want a cloud datastore that loses data! (e.g., ¬P)

SLIDE 47
The class AP
Wednesday, March 09, 2011 1:58 PM

The class AP contains datastores like:
  • Amazon Dynamo
  • Facebook's Cassandra
  • LinkedIn's Project Voldemort

Why so many?
  • Everybody needed one.
  • Google didn't publish theirs.
  • And it was CP, not AP.

SLIDE 48
How LinkedIn's colleague search actually works
Wednesday, March 09, 2011 2:02 PM

  • Periodically, a Hadoop job is run to identify potential friends.
  • Output is stored to a Voldemort (NoSQL) datastore.
  • A web service accesses the datastore in read-only mode.

Some notes:
  • Data changes slowly, so the Hadoop job only has to be done once/day or less.
  • The Voldemort datastore is read-only once the job is done.
  • So we don't need the C in CAP, because there are no consistency issues!
  • So Voldemort, in class AP, suffices.

SLIDE 49
What is Dynamo?
Wednesday, March 09, 2011 2:17 PM

  • The back-end cloud datastore for amazon.com itself.
  • Serves requests for shopping carts, purchases.
  • On an eventually consistent datastore…! How?

SLIDE 50
Unique features of Dynamo
Wednesday, March 09, 2011 2:27 PM

  • Vector clocks for conflict detection.
  • Business logic for conflict resolution.

SLIDE 51
Vector clocks
Wednesday, March 09, 2011 2:56 PM

  • Every change transaction contains a timestamp and a pointer to the previous version of the object.
  • As transactions flow through the system, they are accumulated in a local history on each node.

Local history is
  • Pruned if there are no conflicts.
  • Resolved (in an application-specific manner) if conflicts occur.
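Conflict detection with vector clocks can be sketched as follows. Note the assumption: this uses the textbook form of a vector clock (one counter per node), not the timestamp-plus-back-pointer history described above, and it is not Dynamo's actual code; the names VectorClock, tick, and relationTo are hypothetical. Two versions conflict exactly when neither clock is at least as large as the other in every position.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Textbook vector clock (illustrative): one counter per node that has written. */
class VectorClock {
    enum Order { BEFORE, AFTER, EQUAL, CONCURRENT }

    private final Map<String, Integer> counters = new HashMap<>();

    /** Returns a copy of this clock with the given node's counter advanced by one. */
    VectorClock tick(String node) {
        VectorClock next = new VectorClock();
        next.counters.putAll(counters);
        next.counters.merge(node, 1, Integer::sum);
        return next;
    }

    /** How this version relates to other; CONCURRENT means a conflict to resolve. */
    Order relationTo(VectorClock other) {
        boolean less = false;
        boolean greater = false;
        Set<String> nodes = new HashSet<>(counters.keySet());
        nodes.addAll(other.counters.keySet());
        for (String n : nodes) {
            int mine = counters.getOrDefault(n, 0);
            int theirs = other.counters.getOrDefault(n, 0);
            if (mine < theirs) less = true;
            if (mine > theirs) greater = true;
        }
        if (less && greater) return Order.CONCURRENT;
        if (less) return Order.BEFORE;
        if (greater) return Order.AFTER;
        return Order.EQUAL;
    }
}
```

When one version is BEFORE another, the older one can be pruned; when they are CONCURRENT, both must be kept until application-specific logic resolves them.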

SLIDE 52
Example: shopping cart
Wednesday, March 09, 2011 5:03 PM

SLIDE 53
Business-based recovery
Wednesday, March 09, 2011 3:05 PM

What happens when a conflict occurs is based upon business rules.
  • If object is a shopping cart, contents are merged.
  • If object is a purchase record, apparent duplicate purchases are eliminated.
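The two business rules above can be sketched as small reconciliation functions. This is an illustration only: the class name BusinessRecovery, the use of plain strings for cart items and purchase identifiers, and the choice of set union as the merge are all assumptions, not a description of Amazon's code.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical per-object-kind conflict resolution rules. */
class BusinessRecovery {
    /** Shopping cart rule: keep every item that appears in either conflicting version. */
    static Set<String> mergeCarts(Set<String> a, Set<String> b) {
        Set<String> merged = new TreeSet<>(a);
        merged.addAll(b);
        return merged;
    }

    /** Purchase rule: drop repeated purchase identifiers, keeping first-seen order. */
    static List<String> dropDuplicatePurchases(List<String> purchaseIds) {
        return new ArrayList<>(new LinkedHashSet<>(purchaseIds));
    }
}
```

A union merge never loses an added item, but an item removed in one version can come back if the other version still contains it. Whether that is acceptable is exactly the kind of business decision the slide refers to.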

SLIDE 54
Why?
Wednesday, March 09, 2011 3:16 PM

Why Amazon "gets away with" (eventually consistent) Dynamo:
  • I told you previously that one would not want to store business data in an eventually consistent store.
  • Amazon "gets away with" doing that, by
      • embedding business logic,
      • using vector clock updates,
      • for each kind of object, separately.
  • This is a complex game, which is why Dynamo isn't available to their customers.