APACHE COUCHDB SYNC DEEP DIVE
by Jan Lehnardt at ApacheCon EU 2016 in Sevilla
JAN LEHNARDT
➤ CouchDB since 2006
➤ Apache CouchDB since 2008
➤ PMC Chair & VP of CouchDB since 2011
➤ Longest active contributor
➤ CEO at Neighbourhoodie Software in Berlin
Joined CouchDB in 2006, longest-standing contributor. Have done everything from evangelising and community work to core engineering. Still do all of the above.
* * *
Short intro to CouchDB. Intro again at 2:30pm at ApacheCon EU.
BASICS
➤ HTTP
➤ JSON
➤ Documents
➤ Unique IDs, content addressable revisions
JSON cuts ORM
BASICS
➤ Incremental, Persistent Map / Reduce for queries
➤ Changes, “what happened since?”, think `git log` but a real-time stream for your database
➤ API Compatible between single node and cluster, apps can grow without rewrite
➤ trade-off: no features that wouldn’t scale in single node version
MR: unique API compatible
- design from 10 years ago
- other databases have features that start failing unpredictably at scale
- CouchDB doesn’t have those features in the first place
DESIGN DECISIONS
➤ Data safety > *
➤ Fault tolerance
➤ Erlang: only one request can fail, not the whole server
➤ Crash-only design
➤ Everything is resumable
➤ Everything is idempotent
solo: could be a single-node instance or a cluster installation
hot spare: explain replication a bit
- one way, resume, delta, conflicts
read-only secondaries
multi-primary
multi-primary: us-east, us-west, eu-west
multi-primary: Sevilla, New York, Tokyo
Tree
City 1 to City 9, County 1 to County 3, State 1 and State 2, Country
Tree
Till 1 to Till 9, Store 1 to Store 3, Region 1 and Region 2, Corporate
Mesh c.f. Internet of Things / Industry of Things
This attitude is good, except when you want to train a new generation
LET’S MAKE YOU SYNC EXPERTS!
IDENTITY
we need to uniquely identify a data record
NATURAL KEYS
Last Name    First Name
Abblesworth  Antonia
Burrows      Bertram
Chickleston  Cecilia
Dunkington   David
Doe          John
Doe          Jane
Doe          Jane
SURROGATE KEYS: AUTO INCREMENT
ID  Last Name    First Name
1   Abblesworth  Antonia
2   Burrows      Bertram
3   Chickleston  Cecilia
4   Dunkington   David
5   Doe          John
6   Doe          Jane
7   Doe          Jane
SURROGATE KEYS: AUTO INCREMENT ISSUES
ID  Last Name    First Name
1   Abblesworth  Antonia
2   Burrows      Bertram
3   Chickleston  Cecilia
4   Dunkington   David
5   Doe          John
6   Doe          Jane
7   Doe          Jane
8   Ericsson     Eric

ID  Last Name    First Name
1   Abblesworth  Antonia
2   Burrows      Bertram
3   Chickleston  Cecilia
4   Dunkington   David
5   Doe          John
6   Doe          Jane
7   Doe          Jane
8   Easterburgh  Ethel
👏 👏
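The collision on ID 8 can be reproduced in a few lines. This is only a sketch: the `Node` class is hypothetical and stands in for two disconnected databases that each assign auto-increment IDs on their own.

```python
from itertools import count


class Node:
    """Hypothetical database node that assigns auto-increment IDs locally."""

    def __init__(self, start=1):
        self._next_id = count(start)
        self.rows = {}

    def insert(self, last_name, first_name):
        row_id = next(self._next_id)
        self.rows[row_id] = (last_name, first_name)
        return row_id


# Both nodes already hold the same seven rows, so both counters continue at 8.
node_a = Node(start=8)
node_b = Node(start=8)

id_a = node_a.insert("Ericsson", "Eric")
id_b = node_b.insert("Easterburgh", "Ethel")

# Same ID, different people: a sync cannot tell these records apart by ID.
collision = id_a == id_b and node_a.rows[id_a] != node_b.rows[id_b]
print(id_a, id_b, collision)
```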
SURROGATE KEYS: UUIDS
ID                           Last Name    First Name
A0D31890-6766-43E0-…         Abblesworth  Antonia
BE2EBD0B-3639-42FB-87D1…     Burrows      Bertram
CA7BEA04-798C-4903-…         Chickleston  Cecilia
DFFCFABC-6FBB-4BB2-…         Dunkington   David
E328E255-BF62-4743-…         Doe          John
F868B1D1-6956-41C1-8815-9…   Doe          Jane
G8DC258D-CF27-48E8-…         Doe          Jane
H232725B-E577-4F36-…         Ericsson     Eric

ID                           Last Name    First Name
A0D31890-6766-43E0-…         Abblesworth  Antonia
BE2EBD0B-3639-42FB-87D1…     Burrows      Bertram
CA7BEA04-798C-4903-…         Chickleston  Cecilia
DFFCFABC-6FBB-4BB2-…         Dunkington   David
E328E255-BF62-4743-…         Doe          John
F868B1D1-6956-41C1-8815-9…   Doe          Jane
G8DC258D-CF27-48E8-…         Doe          Jane
KDB685D8-50DF-496C-…         Easterburgh  Ethel
👎 👎
Upside:
- unique per record across nodes
- survives natural key changes
Downsides:
- no direct correlation between data and ID -> ICQ
- more storage space
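As a rough illustration of the upside, the sketch below lets two nodes pick IDs without talking to each other. It assumes random (version 4) UUIDs from Python's standard `uuid` module; the talk does not say which UUID flavour or algorithm is used.

```python
import uuid

# Two disconnected nodes each create an ID without coordinating.
id_on_node_a = uuid.uuid4()
id_on_node_b = uuid.uuid4()

records = {
    str(id_on_node_a): ("Ericsson", "Eric"),
    str(id_on_node_b): ("Easterburgh", "Ethel"),
}

# Random UUIDs are 128 bits wide, so a collision is extremely unlikely
# and both records survive a merge.
print(len(records))

# Downsides from the notes: the ID says nothing about the data,
# and its text form takes 36 characters instead of a small integer.
print(id_on_node_a, len(str(id_on_node_a)))
```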
SYNC INGREDIENTS
➤ Identity
WHAT’S NEW?
DELTA
ID          Last Name    First Name
A0D31890…   Abblesworth  Antonia
BE2EBD0B…   Burrows      Bertram
CA7BEA04…   Chickleston  Cecilia
DFFCFABC…   Dunkington   David
E328E255…   Doe          John
F868B1D1…   Doe          Jane
G8DC258D…   Doe          Jane
H232725B…   Ericsson     Eric
- 1. Send all documents to the target
- 2. Send only the new documents to the target
Database A
ID          Last Name    First Name
A0D31890…   Abblesworth  Antonia
BE2EBD0B…   Burrows      Bertram
CA7BEA04…   Chickleston  Cecilia
DFFCFABC…   Dunkington   David
By-Sequence
Update Sequence  Doc ID
1                CA7BEA04…
2                DFFCFABC…
3                A0D31890…
4                BE2EBD0B…
Database B (starts empty, then receives one document per update sequence)
After Sequence  ID          Last Name    First Name
1               CA7BEA04…   Chickleston  Cecilia
2               DFFCFABC…   Dunkington   David
3               A0D31890…   Abblesworth  Antonia
4               BE2EBD0B…   Burrows      Bertram
High Watermark: 4
Database A
ID          Last Name    First Name
A0D31890…   Abblesworth  Antonia
BE2EBD0B…   Burrows      Bertram
CA7BEA04…   Chickleston  Cecilia
DFFCFABC…   Dunkington   David
E328E255…   Doe          John
By-Sequence
Update Sequence  Doc ID
1                CA7BEA04…
2                DFFCFABC…
3                A0D31890…
4                BE2EBD0B…
5                E328E255…
Database B (before replicating update sequence 5)
ID          Last Name    First Name
A0D31890…   Abblesworth  Antonia
BE2EBD0B…   Burrows      Bertram
CA7BEA04…   Chickleston  Cecilia
DFFCFABC…   Dunkington   David
High Watermark: 4
Database B (after replicating update sequence 5)
ID          Last Name    First Name
A0D31890…   Abblesworth  Antonia
BE2EBD0B…   Burrows      Bertram
CA7BEA04…   Chickleston  Cecilia
DFFCFABC…   Dunkington   David
E328E255…   Doe          John
High Watermark: 5
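The high-watermark walk-through above can be sketched as follows. The `replicate` function and the plain dicts are hypothetical stand-ins, not CouchDB's replicator; document IDs are shortened to their visible prefix.

```python
def replicate(source_by_seq, source_docs, target_docs, high_watermark):
    """Copy every change after high_watermark to the target.

    source_by_seq is a list of (update_sequence, doc_id) pairs.
    Returns the new high watermark.
    """
    for seq, doc_id in sorted(source_by_seq):
        if seq <= high_watermark:
            continue  # already replicated in an earlier run
        target_docs[doc_id] = source_docs[doc_id]
        high_watermark = seq
    return high_watermark


db_a = {
    "CA7BEA04": ("Chickleston", "Cecilia"),
    "DFFCFABC": ("Dunkington", "David"),
    "A0D31890": ("Abblesworth", "Antonia"),
    "BE2EBD0B": ("Burrows", "Bertram"),
}
by_seq = [(1, "CA7BEA04"), (2, "DFFCFABC"), (3, "A0D31890"), (4, "BE2EBD0B")]
db_b = {}

# First run: the target is empty, so everything is sent.
watermark = replicate(by_seq, db_a, db_b, high_watermark=0)

# A new document arrives on A; the next run only sends sequence 5.
db_a["E328E255"] = ("Doe", "John")
by_seq.append((5, "E328E255"))
watermark = replicate(by_seq, db_a, db_b, high_watermark=watermark)
print(watermark, db_b == db_a)
```

Because the watermark is all the target has to remember, an interrupted run can simply start again from the last recorded value.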
Next: Updates
UPDATES
Database A
ID           Last Name    First Name
A0D31890…    Abblesworth  Antonia
BE2EBD0B…    Burrows      Bertram
CA7BEA04…    Chickleston  Cecilia
DFFCFABC…    Dunkington   David
E328E255…*   Esterhase    John
By-Sequence
Update Sequence  Doc ID
1                CA7BEA04…
2                DFFCFABC…
3                A0D31890…
4                BE2EBD0B…
5                E328E255…
6                E328E255…*
Database B (before replicating update sequence 6): E328E255… Doe John, High Watermark: 5
Database B (after replicating update sequence 6)
ID           Last Name    First Name
A0D31890…    Abblesworth  Antonia
BE2EBD0B…    Burrows      Bertram
CA7BEA04…    Chickleston  Cecilia
DFFCFABC…    Dunkington   David
E328E255…*   Esterhase    John
High Watermark: 6
Now we go back and make a typo
Database A
ID            Last Name    First Name
A0D31890…     Abblesworth  Antonia
BE2EBD0B…     Burrows      Bertram
CA7BEA04…     Chickleston  Cecilia
DFFCFABC…     Dunkington   David
E328E255…**   Esterhose    John
By-Sequence
Update Sequence  Doc ID
2                DFFCFABC…
3                A0D31890…
4                BE2EBD0B…
5                E328E255…
6                E328E255…*
7                E328E255…**
Database B (at High Watermark: 5): E328E255… Doe John
Database B (at High Watermark: 6): E328E255…* Esterhase John
Wouldn’t it be more efficient to just send update 7?
Database B
ID            Last Name    First Name
A0D31890…     Abblesworth  Antonia
BE2EBD0B…     Burrows      Bertram
CA7BEA04…     Chickleston  Cecilia
DFFCFABC…     Dunkington   David
E328E255…**   Esterhose    John
High Watermark: 7
Database A
ID            Last Name    First Name
A0D31890…     Abblesworth  Antonia
BE2EBD0B…     Burrows      Bertram
CA7BEA04…     Chickleston  Cecilia
DFFCFABC…     Dunkington   David
E328E255…**   Esterhose    John
By-Sequence
Update Sequence  Doc ID
1                CA7BEA04…
2                DFFCFABC…
3                A0D31890…
4                BE2EBD0B…
7                E328E255…**
Why yes it would -> make the sequence table unique for doc IDs.
Next: deletes
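One way to picture a sequence table that is unique per document ID is the sketch below. The `Database` class is hypothetical and only models the bookkeeping: each write moves the document to a fresh update sequence and forgets its older entry.

```python
class Database:
    """Hypothetical store that keeps one by-sequence entry per document."""

    def __init__(self):
        self.docs = {}      # doc_id -> current document body
        self.seq_of = {}    # doc_id -> update sequence of its latest change
        self.update_seq = 0

    def write(self, doc_id, body):
        self.update_seq += 1
        self.docs[doc_id] = body
        # Overwriting drops the document's older sequence entry.
        self.seq_of[doc_id] = self.update_seq

    def changes_since(self, high_watermark):
        """Return (seq, doc_id) pairs newer than high_watermark, in order."""
        return sorted(
            (seq, doc_id)
            for doc_id, seq in self.seq_of.items()
            if seq > high_watermark
        )


db = Database()
for doc_id, last_name in [
    ("CA7BEA04", "Chickleston"),
    ("DFFCFABC", "Dunkington"),
    ("A0D31890", "Abblesworth"),
    ("BE2EBD0B", "Burrows"),
    ("E328E255", "Doe"),
]:
    db.write(doc_id, last_name)
db.write("E328E255", "Esterhase")  # update sequence 6
db.write("E328E255", "Esterhose")  # update sequence 7

# A target at high watermark 5 only needs the latest version of the document.
print(db.changes_since(5))
print(db.changes_since(0))
```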
DELETES
Database A
ID            Last Name    First Name
A0D31890…     Abblesworth  Antonia
BE2EBD0B…     Burrows      Bertram
CA7BEA04…     Chickleston  Cecilia
DFFCFABC…     Dunkington   David
E328E255…**   Esterhose    John
By-Sequence
Update Sequence  Doc ID
2                DFFCFABC…
3                A0D31890…
4                BE2EBD0B…
7                E328E255…**
8                CA7BEA04…
Database B
ID            Last Name    First Name
A0D31890…     Abblesworth  Antonia
BE2EBD0B…     Burrows      Bertram
CA7BEA04…     Chickleston  Cecilia
DFFCFABC…     Dunkington   David
E328E255…**   Esterhose    John
High Watermark: 7, then 8 after replicating update sequence 8
SYNC INGREDIENTS
➤ Identity
➤ What happened since?
QUESTIONS?
VERSIONS
* ** business, what’s up with that?
AUTO INCREMENT
VERSIONS: AUTO INCREMENT?
Database A
ID          Version  Last Name    First Name
A0D31890…   1        Abblesworth  Antonia
Database B (before replication)
ID          Version  Last Name    First Name
Database B (after replication)
ID          Version  Last Name    First Name
A0D31890…   1        Abblesworth  Antonia
VERSIONS: AUTO INCREMENT?
Database A
ID          Version  Last Name     First Name
A0D31890…   2        Bobblesworth  Antonia
Database B
ID          Version  Last Name     First Name
A0D31890…   2        Wibblesworth  Antonia
👏 👏
What is it about auto-increment that is so appealing? It is strictly ordered; in math terms: monotonically increasing. What else is monotonically increasing?
TIME!
From: Falsehoods programmers believe about time, and the sequel.
http://infiniteundo.com/post/25326999628/falsehoods-programmers-believe-about-time
http://infiniteundo.com/post/25509354022/more-falsehoods-programmers-believe-about-time
ALL OF THESE STATEMENTS ARE FALSE
THE SYSTEM CLOCK WILL ALWAYS BE SET TO THE CORRECT LOCAL TIME.
THE SYSTEM CLOCK WILL ALWAYS BE SET TO A TIME THAT IS NOT WILDLY DIFFERENT FROM THE CORRECT LOCAL TIME.
IF THE SYSTEM CLOCK IS INCORRECT, IT WILL AT LEAST ALWAYS BE OFF BY A CONSISTENT NUMBER OF SECONDS.
THE SERVER CLOCK AND THE CLIENT CLOCK WILL ALWAYS BE SET TO THE SAME TIME.
THE SERVER CLOCK AND THE CLIENT CLOCK WILL ALWAYS BE SET TO AROUND THE SAME TIME.
OK, BUT THE TIME ON THE SERVER CLOCK AND TIME ON THE CLIENT CLOCK WOULD NEVER BE DIFFERENT BY A MATTER OF DECADES.
IF THE SERVER CLOCK AND THE CLIENT CLOCK ARE NOT IN SYNC, THEY WILL AT LEAST ALWAYS BE OUT OF SYNC BY A CONSISTENT NUMBER OF SECONDS.
THE SERVER CLOCK AND THE CLIENT CLOCK WILL USE THE SAME TIME ZONE.
THE SYSTEM CLOCK WILL NEVER BE SET TO A TIME THAT IS IN THE DISTANT PAST OR THE FAR FUTURE.
ONE MINUTE ON THE SYSTEM CLOCK HAS EXACTLY THE SAME DURATION AS ONE MINUTE ON ANY OTHER CLOCK
OK, BUT THE DURATION OF ONE MINUTE ON THE SYSTEM CLOCK WILL BE PRETTY CLOSE TO THE DURATION OF ONE MINUTE ON MOST OTHER CLOCKS.
FINE, BUT THE DURATION OF ONE MINUTE ON THE SYSTEM CLOCK WOULD NEVER BE MORE THAN AN HOUR.
A TIME STAMP OF SUFFICIENT PRECISION CAN SAFELY BE CONSIDERED UNIQUE.
IT’S POSSIBLE TO ESTABLISH A TOTAL ORDERING ON TIMESTAMPS THAT IS USEFUL OUTSIDE YOUR SYSTEM.
TIMESTAMPS ALWAYS ADVANCE MONOTONICALLY.
MY SOFTWARE IS ONLY USED INTERNALLY/LOCALLY, SO I DON’T HAVE TO WORRY ABOUT TIMEZONES
MY SOFTWARE STACK WILL HANDLE TIMEZONE WITHOUT ME NEEDING TO DO ANYTHING SPECIAL
ALL MEASUREMENTS OF TIME ON A GIVEN CLOCK WILL OCCUR WITHIN THE SAME FRAME OF REFERENCE.
In other words
TIME PASSES AT THE SAME SPEED ON TOP OF A MOUNTAIN AND AT THE BOTTOM OF A VALLEY.
TIMESTAMPS? NO
Spanner, 2FA
Whatever the exact scenario, this is plausible and has documented occurrences in small and large-scale systems: you can’t rely on timestamps to guarantee the order of two items, even if the timestamps were generated on the same device. Relying on them leads to data loss and/or duplication.
This is what happens under the hood when you suddenly have all your notes or contacts twice after syncing your phone and your desktop. It is also why that one contact always gets deleted when you sync from phone to desktop (but not the other way around).
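To make the data-loss case concrete, here is a small sketch of a timestamp-based "last write wins" merge. The function, the one-hour clock skew, and the timestamp values are all assumptions chosen for illustration.

```python
def last_write_wins(local, remote):
    """Keep whichever version carries the larger timestamp."""
    return remote if remote["timestamp"] > local["timestamp"] else local


# Assumed skew: the phone's clock runs an hour behind the desktop's clock.
CLOCK_SKEW = -3600

# Real order of events: the desktop edit happens first, the phone edit second.
desktop_edit = {"last_name": "Bobblesworth", "timestamp": 1000}
phone_edit = {"last_name": "Wibblesworth", "timestamp": 1060 + CLOCK_SKEW}

winner = last_write_wins(desktop_edit, phone_edit)

# The later edit (Wibblesworth) is silently discarded.
print(winner["last_name"])
```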
- Ok. How can we improve on timestamps?
Before we find out, we need to introduce one more new concept: conflicts.
DISTRIBUTED SYSTEMS INTERLUDE
We are trained to think conflicts are bad. Until you discover distributed systems
VERSIONS: EMBRACING CONFLICTS
Database A
ID          Version  Last Name     First Name
A0D31890…   2        Bobblesworth  Antonia
Database B
ID          Version  Last Name     First Name
A0D31890…   2        Wibblesworth  Antonia
👎 👎
In distributed systems, conflicts are just another natural state of data. We all know how this works in version control systems, with the big markers
<<<<<<<<<<<<<<< =============== >>>>>>>>>>>>>>>
We know how to deal with this.
As soon as you have two computers connected by a network, you have a distributed system. Most other databases pretend that’s not true. That’s why you are monitoring “replication lag” and suchlike.
CONFLICTS ARE COOL
WHAT’S BETTER THAN TIMESTAMPS?
VECTOR CLOCKS
➤ “Logical clock”
➤ Monotonically increase
➤ Causality, X happened after Y
➤ No representation of when precisely something happened
➤ This is a good thing!
[Diagram: a Client sends PUT abc (n=3) to a Cluster of Node A, Node B, and Node C, which hold the copies DB 1 Shard 1, DB 1 Shard 1*, and DB 1 Shard 1**]
Dynamo fast-intro
VECTOR CLOCKS
➤ “Logical clock”
➤ Monotonically increase
➤ Causality, X happened after Y
➤ No representation of when precisely something happened
➤ This is a good thing!
➤ Would introduce conflicts on each cluster-write.
➤ This is not a good thing
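For readers who have not met vector clocks before, the bullets above can be sketched like this. It is a generic textbook version, not code from CouchDB or Dynamo; the node names `A`, `B`, `C` are placeholders.

```python
def increment(clock, node):
    """Return a copy of clock with node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def happened_before(a, b):
    """True if no counter in a exceeds b and at least one is smaller."""
    nodes = set(a) | set(b)
    no_greater = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    some_smaller = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    return no_greater and some_smaller


def compare(a, b):
    """Classify two clocks as before, after, equal, or concurrent."""
    if happened_before(a, b):
        return "before"
    if happened_before(b, a):
        return "after"
    nodes = set(a) | set(b)
    same = all(a.get(n, 0) == b.get(n, 0) for n in nodes)
    return "equal" if same else "concurrent"


v1 = increment({}, "A")   # first write, accepted by node A
v2 = increment(v1, "B")   # node B updates what it saw from A
v3 = increment(v1, "C")   # node C updates the same version independently

print(compare(v1, v2))    # causality: v1 happened before v2
print(compare(v2, v3))    # no causal order: the two writes conflict
```

The clocks only record causality; nothing in them says when a write happened.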
CONTENT ADDRESSABLE VERSIONS
➤ Hash over document version contents
➤ currently MD5 🙊
➤ Ordered, stored in a list:
➤ [A, B, C, D, E]
➤ List not as elegant as vector clocks, but no other way.
➤ Limit
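A minimal sketch of the idea, under loud assumptions: the hash input here is a canonical JSON encoding, which is not necessarily what CouchDB feeds into MD5, and the history list is kept newest-first. Both helper functions are hypothetical.

```python
import hashlib
import json


def version_hash(doc):
    """MD5 over a canonical JSON encoding of the document contents."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def update(history, doc):
    """Return a new history list with doc's hash as the newest entry (first)."""
    return [version_hash(doc)] + history


h1 = update([], {"last_name": "Doe", "first_name": "John"})
h2 = update(h1, {"last_name": "Esterhase", "first_name": "John"})

# The same edit applied on another node yields the same version identifier,
# so identical writes do not show up as different versions.
same_edit_elsewhere = update(h1, {"first_name": "John", "last_name": "Esterhase"})
print(h2 == same_edit_elsewhere)
```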
REPLICATION DETAIL INTERLUDE
Doc 1 [Rev A]
Doc 1 [B, A]
Database
Doc 1 [C, B, A]
Database
Doc 1 [C, B, A]
Doc 1 [D, C, B, A]
Doc 1 [E, D, C, B, A]
Doc 1 [E, D, C, B, A]
Doc 1 [F, E, D, C, B, A]
Doc 1 [H, E, D, C, B, A]
Doc 1 [[F, H], E, D, C, B, A]
{ _id: "1", _conflicts: [F, H] }
replication details
A B C D E F H
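The revision lists above carry enough information to tell a plain update from a conflict. The sketch below assumes newest-first lists as shown on the slides; `is_ancestor` and `sync` are hypothetical helpers, not CouchDB's revision-tree code.

```python
def is_ancestor(older, newer):
    """True if history `older` is the tail of history `newer` (newest first)."""
    return len(older) <= len(newer) and newer[len(newer) - len(older):] == older


def sync(local, remote):
    """Return the leaf revisions that remain after exchanging both histories."""
    if is_ancestor(local, remote):
        return [remote[0]]   # remote simply extends local
    if is_ancestor(remote, local):
        return [local[0]]    # local simply extends remote
    # Neither extends the other: both leaves are kept as conflicts.
    return sorted([local[0], remote[0]])


# One side is behind: no conflict, the longer history wins.
print(sync(["C", "B", "A"], ["E", "D", "C", "B", "A"]))

# Both sides edited revision E independently: F and H conflict.
print(sync(["F", "E", "D", "C", "B", "A"], ["H", "E", "D", "C", "B", "A"]))
```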
SYNC INGREDIENTS
➤ Identity
➤ What happened since?
➤ Embrace Conflicts
➤ List of Content Addressable Versions
BONUS MATERIAL
OPERATIONAL TRANSFORMS
➤ Etherpad
➤ Google Docs / Wave 😭
➤ Good for usually connected collaborative text editing
➤ Stops working efficiently with longer disconnects
CRDTS
➤ Conflict-free Replicated Data Types
➤ No Conflicts!
➤ Only specific data types.
➤ Lists, trees, hashes, counters
➤ All with non-standard properties
➤ Good for special cases where they fit
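As one example of such a data type, here is a sketch of a grow-only counter (commonly called a G-Counter). It is a generic illustration and not taken from the talk; its non-standard property is that it can only count up.

```python
class GCounter:
    """Grow-only counter: one slot per node, merged with element-wise max."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}

    def increment(self, amount=1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def merge(self, other):
        for node, n in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), n)

    def value(self):
        return sum(self.counts.values())


a, b = GCounter("a"), GCounter("b")
a.increment()
a.increment()
b.increment()

# Merging in either direction, or repeatedly, converges to the same value.
a.merge(b)
b.merge(a)
a.merge(b)
print(a.value(), b.value())
```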
CONFLICT-FREE REPLICATED JSON DATATYPE
➤ CRJDT
➤ No Conflicts!
➤ Very new
➤ Haven’t read it yet
➤ Probably same limitation as with CRDTs: special semantics.
➤ https://arxiv.org/abs/1608.03960
THANK YOU!
Apache CouchDB Sync Deep Dive
Jan Lehnardt
@janl
jan@apache.org
Professional Support for Apache CouchDB: https://neighbourhood.ie