Claret: Using Data Types for High Contention Distributed Transactions

Brandon Holt, Irene Zhang, Dan Ports, Mark Oskin, Luis Ceze. PaPoC’15 @ EuroSys.


slide-1
SLIDE 1

Claret

Brandon Holt, Irene Zhang, Dan Ports, Mark Oskin, Luis Ceze

Using Data Types for High Contention Distributed Transactions

PaPoC’15 @ EuroSys

slide-2
SLIDE 2 PaPoC’15 @ EuroSys – Claret 2

Brandon Holt @holtbg
At #EuroSys right now!

EuroSys 2015 @EuroSys2015
Co-located workshop: Principles and Practice of Consistency for Distributed Data. papoc.di.uminho.pt
16 9

Ellen DeGeneres @TheEllenShow
If only Bradley's arm was longer. Best photo ever. #oscars
2M 3.4M

slide-3
SLIDE 3 PaPoC’15 @ EuroSys – Claret 3

post[1003] ⟹ Post
{ author: user:92, content: "If only Bradley’s arm was longer. Best photo ever. #oscars" }

slide-4
SLIDE 4 PaPoC’15 @ EuroSys – Claret 3

retweets[1003] ⟹ Set
{ user:43, user:89, user:29, user:10, user:74 }

slide-5
SLIDE 5 PaPoC’15 @ EuroSys – Claret 3

Retweet

retweets[1003].add("user:53")

slide-6
SLIDE 6 PaPoC’15 @ EuroSys – Claret 3

View post

retweet_count = retweets[1003].size() # ...

slide-7
SLIDE 7 PaPoC’15 @ EuroSys – Claret 4

How do we make this scale?

slide-8
SLIDE 8 PaPoC’15 @ EuroSys – Claret 4

How do we make this scale?

NoSQL

slide-9
SLIDE 9 PaPoC’15 @ EuroSys – Claret

NoSQL

post:1003:author ⟹ 92
post:1003:content ⟹ "If only Bradley’s arm was longer. Best photo ever. #oscars"
retweeters:1003 ⟹ "user:29,user:89,user:74,user:10,user:43"

View post (which retweets will this contain?)

    retweets = get("retweeters:1003")
    # ...

Retweet (must be atomic)

    s = get("retweeters:1003")
    if "user:43" not in s:
        s += ",user:43"
        put("retweeters:1003", s)
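Why the Retweet read-modify-write must be atomic can be seen in a minimal Python sketch. It assumes a plain in-memory dict stands in for the key-value store; the `get` and `put` helpers and the second retweeter `user:53` are illustrative stand-ins, not a real client API. Two clients read before either writes, and one retweet is silently lost.

```python
# Hypothetical in-memory stand-in for the key-value store on the slide.
store = {"retweeters:1003": "user:29,user:89"}

def get(key):
    return store[key]

def put(key, value):
    store[key] = value

# Two clients both read before either writes (a non-atomic interleaving).
s1 = get("retweeters:1003")   # client A reads
s2 = get("retweeters:1003")   # client B reads the same old value

if "user:43" not in s1:
    put("retweeters:1003", s1 + ",user:43")   # client A writes

if "user:53" not in s2:
    put("retweeters:1003", s2 + ",user:53")   # client B overwrites A's update

final = get("retweeters:1003")   # user:43 is missing from the final value
```

A reader issuing `get("retweeters:1003")` between the two writes sees yet another value, which is the question the slide raises for View post.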

slide-10
SLIDE 10 PaPoC’15 @ EuroSys – Claret

Transactions? "Too expensive." "Don’t scale."

What if the datastore knew more?

More information → more chance for optimization

Opportunity: Use data types provided by the programmer

slide-11
SLIDE 11 PaPoC’15 @ EuroSys – Claret

Abstract Data Types in NoSQL

  • programmers express intent through types
  • flexible data model, no fixed schema
  • leverage ADT properties for transaction performance
  • sanely trade off consistency for scalability

slide-12
SLIDE 12 PaPoC’15 @ EuroSys – Claret 8

Leveraging Abstract Data Types in NoSQL

Commutativity

  • Transactional boosting
  • Combining

Approximate data types

  • Bounded inconsistency
  • Isolated eventual consistency (CRDTs)
  • Probabilistic data types

Evaluation: Claret prototype

slide-13
SLIDE 13 PaPoC’15 @ EuroSys – Claret 9
Commutativity

slide-14
SLIDE 14 PaPoC’15 @ EuroSys – Claret 11

Commutativity

post:1003 ⟹ { author: 92, content: "If only Bradley’s arm was longer. Best photo ever. #oscars" }
retweeters:1003 ⟹ { user:43, user:89, user:29, user:10, user:74 }

View post

    post = Map("post:1003").get()
    retweets = Set("retweeters:1003").size()
    # ...

many reads → okay

slide-15
SLIDE 15 PaPoC’15 @ EuroSys – Claret 11

Commutativity

Retweet

Set("retweeters:1003").add("user:53")

Retweet

Set("retweeters:1003").add("user:53")

slide-16
SLIDE 16 PaPoC’15 @ EuroSys – Claret 11

Commutativity

Retweet

Set("retweeters:1003").add("user:53")

Retweet

Set("retweeters:1003").add("user:53")

slide-17
SLIDE 17 PaPoC’15 @ EuroSys – Claret 11

Commutativity

Retweet (shown many times on the slide, one box per concurrent retweet)

    Set("retweeters:1003").add("user:53")
    # add post to followers’ timelines

many updates → contention

slide-18
SLIDE 18 PaPoC’15 @ EuroSys – Claret 12

Commutativity

Set adds commute!
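The claim that set adds commute can be checked directly. The sketch below uses Python's built-in `set` as a stand-in for the store's Set type (an assumption; the extra user ids are illustrative) and applies the same adds in every possible order.

```python
from itertools import permutations

# Current retweeters from the slide, plus three illustrative new retweeters.
initial = {"user:43", "user:89", "user:29", "user:10", "user:74"}
adds = ["user:53", "user:71", "user:22"]

def apply_adds(start, order):
    """Apply add operations to a copy of the set in the given order."""
    s = set(start)
    for user in order:
        s.add(user)
    return s

# Collect the final state reached by each of the 3! = 6 orderings.
results = {frozenset(apply_adds(initial, order)) for order in permutations(adds)}
all_orders_agree = len(results) == 1
```

Because every ordering reaches the same final state, the store is free to apply concurrent adds in whatever order they arrive.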

slide-19
SLIDE 19 PaPoC’15 @ EuroSys – Claret 13

Commutativity

For a given data type: which pairs of operations commute?

Commutativity Specification* for Set

method:              commutes with:   when:
add(x): void         add(y)           ∀x,y
remove(x): void      remove(y)        ∀x,y
                     add(y)           x ≠ y
size(): int          add(x)           x ∈ Set
                     remove(x)        x ∉ Set
contains(x): bool    add(y)           x ≠ y ∨ y ∈ Set
                     remove(y)        x ≠ y ∨ y ∉ Set
                     size()           ∀x

* M. Kulkarni, D. Nguyen, D. Prountzos, X. Sui, and K. Pingali. Exploiting the Commutativity Lattice. PLDI ’11.
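One way a datastore might consume such a specification is as a predicate over pairs of operations. The following Python sketch encodes the Set table; the tuple encoding of operations and the function name `commutes` are assumptions for illustration, and the read-only pairs (`size`/`size`, `contains`/`contains`) are added as trivially commuting even though the table does not list them.

```python
def commutes(op1, op2, contents):
    """Return True if op1 and op2 commute on a set with the given contents.

    Each op is a tuple: ("add", x), ("remove", x), ("contains", x) or ("size",).
    """
    # Try the pair in both orders so each table row is written only once.
    for a, b in ((op1, op2), (op2, op1)):
        kinds = (a[0], b[0])
        if kinds in (("add", "add"), ("remove", "remove")):
            return True                                   # for all x, y
        if kinds == ("remove", "add"):
            return a[1] != b[1]                           # x != y
        if kinds == ("size", "add"):
            return b[1] in contents                       # x already in Set
        if kinds == ("size", "remove"):
            return b[1] not in contents                   # x not in Set
        if kinds == ("contains", "add"):
            return a[1] != b[1] or b[1] in contents
        if kinds == ("contains", "remove"):
            return a[1] != b[1] or b[1] not in contents
        if kinds in (("contains", "size"), ("size", "size"),
                     ("contains", "contains")):
            return True                                   # read-only pairs
    return False


retweeters = {43, 89, 29, 10, 74}
adds_commute = commutes(("add", 53), ("add", 89), retweeters)
size_vs_new_add = commutes(("size",), ("add", 53), retweeters)
```

Here `adds_commute` is true while `size_vs_new_add` is false, since adding a new member changes what `size()` returns.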

slide-20
SLIDE 20 PaPoC’15 @ EuroSys – Claret 13

Commutativity

If the key/value store knew this, what could it do?


slide-21
SLIDE 21 PaPoC’15 @ EuroSys – Claret 14

Commutativity

T1

    Set("retweeters:1003").add(53)
    # add post to followers’ timelines
    f = followers(53).all()
    timeline(f[0]).push(1003)
    timeline(f[1]).push(1003)
    timeline(f[2]).push(1003)
    timeline(f[3]).push(1003)

T2

    Set("retweeters:1003").add(89)
    # add post to followers’ timelines
    f = followers(89).all()
    timeline(f[0]).push(1003)
    timeline(f[1]).push(1003)
    timeline(f[2]).push(1003)
    timeline(f[3]).push(1003)

Problem: contention → many aborts / retries

slide-24
SLIDE 24 PaPoC’15 @ EuroSys – Claret 15

Commutativity

T1 and T2 (same transactions as on the previous slide)

Problem: contention → many aborts / retries

Solution: Transactional boosting*

  • when operations commute, no need to abort their transactions

* M. Herlihy and E. Koskinen. Transactional Boosting: A Methodology for Highly-concurrent Transactional Objects. PPoPP 2008.
slide-25
SLIDE 25 PaPoC’15 @ EuroSys – Claret 16

Commutativity

T1 and T2 (same transactions as before), plus T3:

    Set("retweeters:1003").add(71)
    # add post to followers’ timelines
    f = followers(71).all()
    timeline(f[0]).push(1003)
    timeline(f[1]).push(1003)
    timeline(f[2]).push(1003)
    timeline(f[3]).push(1003)

Solution: Transactional boosting*

  • when operations commute, no need to abort their transactions
  • reduce abort rate → increase throughput

* M. Herlihy and E. Koskinen. Transactional Boosting: A Methodology for Highly-concurrent Transactional Objects. PPoPP 2008.
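
The idea behind boosting can be sketched as a per-record lock that admits several holders as long as their operations commute. This is a simplified illustration, not Claret's implementation: the `AbstractLock` class, its fixed `COMMUTING` table, and the reader `T4` are assumptions, and a full design would also need the state-dependent conditions from the commutativity specification and a way to undo operations of aborted transactions.

```python
class AbstractLock:
    """Per-record lock that admits any number of holders whose operations commute."""

    # Operation kinds assumed to commute unconditionally (a simplification).
    COMMUTING = {("add", "add"), ("size", "size")}

    def __init__(self):
        self.holders = {}  # transaction id -> operation kind

    def try_acquire(self, txn, op):
        """Grant the lock unless some other holder's operation conflicts with op."""
        for other_txn, other_op in self.holders.items():
            if other_txn != txn and (op, other_op) not in self.COMMUTING:
                return False  # conflict: caller must wait or abort
        self.holders[txn] = op
        return True

    def release(self, txn):
        self.holders.pop(txn, None)


lock = AbstractLock()                      # lock for "retweeters:1003"
t1 = lock.try_acquire("T1", "add")         # T1: add(53)
t2 = lock.try_acquire("T2", "add")         # T2: add(89), commutes with T1
t3 = lock.try_acquire("T3", "add")         # T3: add(71), commutes with both
reader = lock.try_acquire("T4", "size")    # size() conflicts with pending adds
```

All three retweet transactions hold the lock at once, so none of them aborts; only the hypothetical reader `T4` has to wait.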
slide-26
SLIDE 26 PaPoC’15 @ EuroSys – Claret 18

Commutativity

Problem: Serializing operations on hot records

    Set("retweeters:1003").add(53)
    Set("retweeters:1003").add(89)
    Set("retweeters:1003").add(71)
    Set("retweeters:1003").add(22)
    Set("retweeters:1003").add(11)
    Set("retweeters:1003").add(55)
    Set("retweeters:1003").add(42)
    Set("retweeters:1003").add(91)
    Set("retweeters:1003").add(96)

retweeters:1003 ⟹ { user:43, user:89, user:29, user:10, user:74 }

slide-29
SLIDE 29 PaPoC’15 @ EuroSys – Claret 19

Commutativity

Problem: Serializing operations on hot records

Solution: Combining*

  • merge multiple operations together and apply them all at once
  • parallelize and decrease contention

The nine individual add operations (53, 89, 71, 22, 11, 55, 42, 91, 96) are merged into three:

    Set("retweeters:1003").add([53,89,71])
    Set("retweeters:1003").add([22,11,55])
    Set("retweeters:1003").add([42,91,96])

retweeters:1003 ⟹ { user:43, user:89, user:29, user:10, user:74 }

* D. Hendler, I. Incze, N. Shavit, and M. Tzafrir. Flat combining and the synchronization-parallelism tradeoff. ACM Symposium on Parallelism in Algorithms and Architectures, 2010.
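
A rough sketch of combining, assuming a client-side buffer that batches adds per key before touching the hot record. The `Combiner` class, its `batch_size` parameter, and the dict-of-sets store are hypothetical; the slides do not say where or how Claret combines operations.

```python
from collections import defaultdict


class Combiner:
    """Buffers add operations per key and applies each buffer as one batch."""

    def __init__(self, store, batch_size=3):
        self.store = store                # dict: key -> set of members
        self.batch_size = batch_size
        self.pending = defaultdict(list)  # key -> buffered values
        self.batches_applied = 0

    def add(self, key, value):
        self.pending[key].append(value)
        if len(self.pending[key]) >= self.batch_size:
            self.flush(key)

    def flush(self, key):
        batch = self.pending.pop(key, [])
        if batch:
            # One combined update on the hot record instead of len(batch) updates.
            self.store.setdefault(key, set()).update(batch)
            self.batches_applied += 1


store = {"retweeters:1003": {43, 89, 29, 10, 74}}
combiner = Combiner(store)
for user in [53, 89, 71, 22, 11, 55, 42, 91, 96]:
    combiner.add("retweeters:1003", user)
```

The nine adds from the slide reach the record as three batched updates. Combining is only safe here because adds commute, so reordering them inside a batch does not change the result.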

slide-30
SLIDE 30 PaPoC’15 @ EuroSys – Claret 20

Approximate data types

slide-31
SLIDE 31 PaPoC’15 @ EuroSys – Claret 21

Approximate data types

Problem: Reads don’t commute with updates

View post

    # ...
    retweets = Set("retweeters:1003").size()
    # ...

Retweet

    Set("retweeters:1003").add("user:53")
    # ...

slide-32
SLIDE 32 PaPoC’15 @ EuroSys – Claret 21

Approximate data types

doesn’t need to be precise


slide-33
SLIDE 33 PaPoC’15 @ EuroSys – Claret 22

Approximate data types

Problem: Reads don’t commute with updates

Solution: Bounded inconsistency

  • allow some updates concurrently with reads
  • exposes additional "commutativity"

Retweet

    Set("retweeters:1003").add("user:53")
    # ...

View post

    # ...
    retweets = Set("retweeters:1003").size()
    # ...

View post

    # ...
    retweets = Set("retweeters:1003").approxSize<0.05>()
    # ...

slide-34
SLIDE 34 PaPoC’15 @ EuroSys – Claret 22

Approximate data types

5% error → 170,000 adds
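
The arithmetic behind that figure, as a small Python sketch. It assumes the error bound is relative to the size the reader last observed; the helper names `add_budget` and `within_bound` are illustrative and not part of any Claret API. `Fraction` is used so the 5% bound is exact rather than a rounded float.

```python
from fractions import Fraction


def add_budget(size, error_bound):
    """Adds that can proceed before a size estimate drifts past the bound."""
    return int(size * error_bound)


def within_bound(cached_size, true_size, error_bound):
    """True if a reader holding cached_size is still within the relative bound."""
    return abs(true_size - cached_size) <= cached_size * error_bound


bound = Fraction(5, 100)                 # approxSize<0.05>
budget = add_budget(3_400_000, bound)    # 3.4M retweets on the example post

still_ok = within_bound(3_400_000, 3_400_000 + budget, bound)
too_stale = not within_bound(3_400_000, 3_400_000 + budget + 1, bound)
```

With 3.4M retweets, `budget` comes out to 170,000: that many adds can run concurrently with readers before a reader's count has to be refreshed.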

slide-35
SLIDE 35 PaPoC’15 @ EuroSys – Claret 23

Approximate data types

Problem: Scaling → high latencies, low availability

Replica 0 Replica 1 Replica 2
slide-36
SLIDE 36 PaPoC’15 @ EuroSys – Claret 23

Approximate data types

everything replicated, all eventual consistency

slide-37
SLIDE 37 PaPoC’15 @ EuroSys – Claret 24

Approximate data types

Problem: Scaling → high latencies, low availability

Solution: Isolated eventual consistency via CRDTs

  • use CRDT data type only where needed for scaling or low latency
  • programmers choose what can be approximate

[Figure: Shard 0, Shard 1, Shard 2 holding the records foo:1, foo:7, foo:9; retweeters:1003 appears three times, once per shard.]
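
As an illustration of what a CRDT for `retweeters:1003` could look like, here is a grow-only set (G-Set), one of the simplest CRDTs. The slides do not say which CRDT Claret uses, so the `GSet` class and the three-replica setup are assumptions for the sketch.

```python
class GSet:
    """Grow-only set CRDT: supports add and merge (set union), never remove."""

    def __init__(self, items=()):
        self.items = set(items)

    def add(self, item):
        self.items.add(item)

    def merge(self, other):
        # Union is commutative, associative and idempotent, so replicas
        # converge regardless of the order or number of merges.
        self.items |= other.items

    def size(self):
        return len(self.items)


# One replica of retweeters:1003 per shard.
shard0 = GSet({"user:43", "user:89"})
shard1 = GSet({"user:43", "user:89"})
shard2 = GSet({"user:43", "user:89"})

shard0.add("user:53")          # retweet handled by shard 0
shard1.add("user:71")          # concurrent retweet handled by shard 1
stale_size = shard2.size()     # shard 2 has not seen either update yet

# Anti-entropy: every replica merges state from every other replica.
replicas = (shard0, shard1, shard2)
for a in replicas:
    for b in replicas:
        a.merge(b)
```

Before the merge, shard 2 reports a stale but valid size; afterwards all replicas agree. Only this record pays the price of eventual consistency, while other records keep their stronger guarantees.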

slide-38
SLIDE 38 PaPoC’15 @ EuroSys – Claret 25

Approximate data types

Problem: Can’t (or don’t want to) store all the data

Tweets per second

slide-40
SLIDE 40 PaPoC’15 @ EuroSys – Claret 26

Approximate data types

Problem: Can’t (or don’t want to) store all the data

Solution: Probabilistic data types

  • e.g. HyperLogLog, Bloom filter, Count-min sketch, T-digest
  • useful for tracking statistics, summary of high-volume data, or partially-materialized views

[Figure: Tweets per second]
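
To make one of the listed types concrete, below is a minimal Bloom filter in Python. The sizes (`num_bits`, `num_hashes`) and the SHA-256-based hashing are arbitrary choices for the sketch, not tuned values or anything taken from Claret.

```python
import hashlib


class BloomFilter:
    """Approximate set membership: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


seen = BloomFilter()
for user in ["user:43", "user:89", "user:29", "user:10", "user:74"]:
    seen.add(user)

known_member = seen.might_contain("user:43")
```

The filter uses a fixed 1024 bits no matter how many users are added. A negative answer is always correct, while a positive answer may occasionally be wrong, which is the kind of trade-off the slide describes.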

slide-41
SLIDE 41 PaPoC’15 @ EuroSys – Claret 28

Evaluation


slide-42
SLIDE 42 PaPoC’15 @ EuroSys – Claret 29

Evaluation

Claret: Key-value store with data types

  • simple two-phase commit protocol with locking (+transactional boosting)
  • experiments run with 4 shards, standard local Ethernet network, 8-core 2 GHz Intel Xeon processor per node

slide-43
SLIDE 43 PaPoC’15 @ EuroSys – Claret 30

Evaluation

Case study: Twitter clone

  • realistic synthetic graph (Kronecker, scale 14)
  • simple random user model, retweet more popular posts (viral effect)
slide-46
SLIDE 46 PaPoC’15 @ EuroSys – Claret

Evaluation

Case study: Twitter clone

  • realistic synthetic graph (Kronecker, scale 16)
  • simple random user model, retweet more popular posts (viral effect)

[Figure: Throughput (ktxns/s) and Average latency (ms) for the read-heavy and repost-heavy workloads; series: Locking / OCC, Claret, Claret-Approx; one direction is marked "(better)".]

slide-47
SLIDE 47 PaPoC’15 @ EuroSys – Claret 34

Claret

Abstract Data Types for NoSQL

  • Flexible data model lets programmers express intent
  • Commutativity: leverage type info for transaction performance
  • Approximate data types: sanely trade off consistency for scalability

slide-48
SLIDE 48

Claret

Brandon Holt, Irene Zhang, Dan Ports, Mark Oskin, Luis Ceze

Abstract Data Types for NoSQL