Aspect Ratio Test Slide APACHE COUCHDB SYNC DEEP DIVE by Jan - - PowerPoint PPT Presentation


Aspect Ratio Test Slide APACHE COUCHDB SYNC DEEP DIVE by Jan Lehnardt at ApacheCon EU 2016 in Sevilla JAN LEHNARDT CouchDB since 2006 Apache CouchDB since 2008 PMC Chair & VP of CouchDB since 2011 Longest active contributor


slide-1
SLIDE 1

Aspect Ratio Test Slide

slide-2
SLIDE 2

APACHE COUCHDB SYNC DEEP DIVE

by Jan Lehnardt at ApacheCon EU 2016 in Sevilla

slide-3
SLIDE 3

JAN LEHNARDT

➤ CouchDB since 2006
➤ Apache CouchDB since 2008
➤ PMC Chair & VP of CouchDB since 2011
➤ Longest active contributor
➤ CEO at Neighbourhoodie Software in Berlin

Joined CouchDB in 2006, longest-standing contributor. Have done everything from evangelising and community work to core engineering. Still do all of the above.

* * *

Short intro to CouchDB. Intro again at 2:30pm at ApacheCon EU.

slide-4
SLIDE 4
slide-5
SLIDE 5

BASICS

JSON cuts ORM

slide-6
SLIDE 6

BASICS

➤ HTTP

JSON cuts ORM

slide-7
SLIDE 7

BASICS

➤ HTTP ➤ JSON

JSON cuts ORM

slide-8
SLIDE 8

BASICS

➤ HTTP ➤ JSON ➤ Documents

JSON cuts ORM

slide-9
SLIDE 9

BASICS

➤ HTTP
➤ JSON
➤ Documents
➤ Unique IDs, content-addressable revisions

JSON cuts ORM

slide-10
SLIDE 10

BASICS

MR: unique API compatible

  • design from 10 years ago
  • other databases have features that start failing unpredictably at scale
  • CouchDB doesn’t have those features in the first place
slide-11
SLIDE 11

BASICS

➤ Incremental, Persistent Map / Reduce for queries

MR: unique API compatible

  • design from 10 years ago
  • other databases have features that start failing unpredictably at scale
  • CouchDB doesn’t have those features in the first place
slide-12
SLIDE 12

BASICS

➤ Incremental, Persistent Map / Reduce for queries
➤ Changes, “what happened since?”, think `git log` but a real-time stream for your database

MR: unique API compatible

  • design from 10 years ago
  • other databases have features that start failing unpredictably at scale
  • CouchDB doesn’t have those features in the first place
slide-13
SLIDE 13

BASICS

➤ Incremental, Persistent Map / Reduce for queries
➤ Changes, “what happened since?”, think `git log` but a real-time stream for your database
➤ API Compatible between single node and cluster, apps can grow without rewrite

MR: unique API compatible

  • design from 10 years ago
  • other databases have features that start failing unpredictably at scale
  • CouchDB doesn’t have those features in the first place
slide-14
SLIDE 14

BASICS

➤ Incremental, Persistent Map / Reduce for queries
➤ Changes, “what happened since?”, think `git log` but a real-time stream for your database
➤ API Compatible between single node and cluster, apps can grow without rewrite
➤ trade-off: no features that wouldn’t scale in single node version

MR: unique API compatible

  • design from 10 years ago
  • other databases have features that start failing unpredictably at scale
  • CouchDB doesn’t have those features in the first place
slide-15
SLIDE 15

DESIGN DECISIONS

slide-16
SLIDE 16

DESIGN DECISIONS

➤ Data safety > *

slide-17
SLIDE 17

DESIGN DECISIONS

➤ Data safety > * ➤ Fault tolerance

slide-18
SLIDE 18

DESIGN DECISIONS

➤ Data safety > *
➤ Fault tolerance
➤ Erlang: only one request can fail, not the whole server

slide-19
SLIDE 19

DESIGN DECISIONS

➤ Data safety > *
➤ Fault tolerance
➤ Erlang: only one request can fail, not the whole server
➤ Crash-only design

slide-20
SLIDE 20

DESIGN DECISIONS

➤ Data safety > *
➤ Fault tolerance
➤ Erlang: only one request can fail, not the whole server
➤ Crash-only design
➤ Everything is resumable

slide-21
SLIDE 21

DESIGN DECISIONS

➤ Data safety > *
➤ Fault tolerance
➤ Erlang: only one request can fail, not the whole server
➤ Crash-only design
➤ Everything is resumable
➤ Everything is idempotent

slide-22
SLIDE 22

solo could be single node instance or cluster installation

slide-23
SLIDE 23

hot spare explain replication a bit

  • one way, resume, delta, conflicts
slide-24
SLIDE 24

read-only secondaries

slide-25
SLIDE 25

multi-primary

slide-26
SLIDE 26

multi-primary

slide-27
SLIDE 27

us-east us-west eu-west

multi-primary

slide-28
SLIDE 28

Sevilla New York Tokyo

multi-primary

slide-29
SLIDE 29

Tree

slide-30
SLIDE 30

City 1 City 2 City 3 City 4 City 5 City 6 City 7 City 8 City 9

Tree

slide-31
SLIDE 31

City 1 City 2 City 3 City 4 City 5 City 6 City 7 City 8 City 9 County 1 County 2 County 3

Tree

slide-32
SLIDE 32

State 2 State 1 City 1 City 2 City 3 City 4 City 5 City 6 City 7 City 8 City 9 County 1 County 2 County 3

Tree

slide-33
SLIDE 33

State 2 State 1 City 1 City 2 City 3 City 4 City 5 City 6 City 7 City 8 City 9 County 1 County 2 County 3 Country

Tree

slide-34
SLIDE 34

Till 1 Till 2 Till 3 Store 1 Till 4 Till 5 City 6 Store 2 Region 1 Region 2 Till 7 Till 8 Till 9 Store 3 Corporate

Tree

slide-35
SLIDE 35

Mesh c.f. Internet of Things / Industry of Things

slide-36
SLIDE 36
slide-37
SLIDE 37

This attitude is good, except when you want to train a new generation

slide-38
SLIDE 38

LET’S MAKE YOU SYNC EXPERTS!

slide-39
SLIDE 39

IDENTITY

we need to uniquely identify a data record

slide-40
SLIDE 40

NATURAL KEYS

Last Name    First Name
Abblesworth  Antonia
Burrows      Bertram
Chickleston  Cecilia
Dunkington   David

slide-41
SLIDE 41

NATURAL KEYS

Last Name    First Name
Abblesworth  Antonia
Burrows      Bertram
Chickleston  Cecilia
Dunkington   David
Doe          John

slide-42
SLIDE 42

NATURAL KEYS

Last Name    First Name
Abblesworth  Antonia
Burrows      Bertram
Chickleston  Cecilia
Dunkington   David
Doe          John
Doe          Jane

slide-43
SLIDE 43

NATURAL KEYS

Last Name    First Name
Abblesworth  Antonia
Burrows      Bertram
Chickleston  Cecilia
Dunkington   David
Doe          John
Doe          Jane
Doe          Jane

slide-44
SLIDE 44

SURROGATE KEYS: AUTO INCREMENT

ID  Last Name    First Name
1   Abblesworth  Antonia
2   Burrows      Bertram
3   Chickleston  Cecilia
4   Dunkington   David
5   Doe          John
6   Doe          Jane
7   Doe          Jane

slide-45
SLIDE 45
slide-46
SLIDE 46

SURROGATE KEYS: AUTO INCREMENT ISSUES

ID  Last Name    First Name
1   Abblesworth  Antonia
2   Burrows      Bertram
3   Chickleston  Cecilia
4   Dunkington   David
5   Doe          John
6   Doe          Jane
7   Doe          Jane
8   Ericsson     Eric

ID  Last Name    First Name
1   Abblesworth  Antonia
2   Burrows      Bertram
3   Chickleston  Cecilia
4   Dunkington   David
5   Doe          John
6   Doe          Jane
7   Doe          Jane
8   Easterburgh  Ethel

slide-47
SLIDE 47

SURROGATE KEYS: AUTO INCREMENT ISSUES

ID  Last Name    First Name
1   Abblesworth  Antonia
2   Burrows      Bertram
3   Chickleston  Cecilia
4   Dunkington   David
5   Doe          John
6   Doe          Jane
7   Doe          Jane
8   Ericsson     Eric

ID  Last Name    First Name
1   Abblesworth  Antonia
2   Burrows      Bertram
3   Chickleston  Cecilia
4   Dunkington   David
5   Doe          John
6   Doe          Jane
7   Doe          Jane
8   Easterburgh  Ethel

👏 👏
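The collision on the two tables above can be reproduced with a small sketch. The `Node` class and its `insert` method are hypothetical and only model "each node picks the next free integer on its own"; they are not a CouchDB API.

```python
class Node:
    """Hypothetical database node that assigns auto-increment IDs locally."""

    def __init__(self, rows):
        self.rows = dict(rows)  # id -> (last_name, first_name)

    def insert(self, last_name, first_name):
        # Each node only knows its own rows, so it picks max(id) + 1.
        new_id = max(self.rows, default=0) + 1
        self.rows[new_id] = (last_name, first_name)
        return new_id


# Both nodes start from the same seven rows shown on the slides.
shared = {
    1: ("Abblesworth", "Antonia"),
    2: ("Burrows", "Bertram"),
    3: ("Chickleston", "Cecilia"),
    4: ("Dunkington", "David"),
    5: ("Doe", "John"),
    6: ("Doe", "Jane"),
    7: ("Doe", "Jane"),
}
node_a = Node(shared)
node_b = Node(shared)

# While disconnected, each node inserts a different person.
id_a = node_a.insert("Ericsson", "Eric")
id_b = node_b.insert("Easterburgh", "Ethel")

# Same ID, different records: a merge cannot tell them apart by ID alone.
collision = id_a == id_b and node_a.rows[id_a] != node_b.rows[id_b]
```

Both inserts receive ID 8, which is exactly the situation in the two tables.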

slide-48
SLIDE 48

SURROGATE KEYS: UUIDS

ID                         Last Name    First Name
A0D31890-6766-43E0-        Abblesworth  Antonia
BE2EBD0B-3639-42FB-87D1    Burrows      Bertram
CA7BEA04-798C-4903-        Chickleston  Cecilia
DFFCFABC-6FBB-4BB2-        Dunkington   David
E328E255-BF62-4743-        Doe          John
F868B1D1-6956-41C1-8815-9  Doe          Jane
G8DC258D-CF27-48E8-        Doe          Jane
H232725B-E577-4F36-        Ericsson     Eric

ID                         Last Name    First Name
A0D31890-6766-43E0-        Abblesworth  Antonia
BE2EBD0B-3639-42FB-87D1    Burrows      Bertram
CA7BEA04-798C-4903-        Chickleston  Cecilia
DFFCFABC-6FBB-4BB2-        Dunkington   David
E328E255-BF62-4743-        Doe          John
F868B1D1-6956-41C1-8815-9  Doe          Jane
G8DC258D-CF27-48E8-        Doe          Jane
KDB685D8-50DF-496C-        Easterburgh  Ethel

Upside:

  • unique per record across nodes
  • survives natural key changes

Downsides:

  • no direct correlation between data and ID -> ICQ
  • more storage space
slide-49
SLIDE 49

SURROGATE KEYS: UUIDS

ID                         Last Name    First Name
A0D31890-6766-43E0-        Abblesworth  Antonia
BE2EBD0B-3639-42FB-87D1    Burrows      Bertram
CA7BEA04-798C-4903-        Chickleston  Cecilia
DFFCFABC-6FBB-4BB2-        Dunkington   David
E328E255-BF62-4743-        Doe          John
F868B1D1-6956-41C1-8815-9  Doe          Jane
G8DC258D-CF27-48E8-        Doe          Jane
H232725B-E577-4F36-        Ericsson     Eric

ID                         Last Name    First Name
A0D31890-6766-43E0-        Abblesworth  Antonia
BE2EBD0B-3639-42FB-87D1    Burrows      Bertram
CA7BEA04-798C-4903-        Chickleston  Cecilia
DFFCFABC-6FBB-4BB2-        Dunkington   David
E328E255-BF62-4743-        Doe          John
F868B1D1-6956-41C1-8815-9  Doe          Jane
G8DC258D-CF27-48E8-        Doe          Jane
KDB685D8-50DF-496C-        Easterburgh  Ethel

👎 👎

Upside:

  • unique per record across nodes
  • survives natural key changes

Downsides:

  • no direct correlation between data and ID -> ICQ
  • more storage space
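A minimal sketch of the UUID approach, using Python's standard `uuid` module. The `new_record` helper and the field names are illustrative assumptions, not something taken from the talk.

```python
import uuid


def new_record(last_name, first_name):
    """Create a record whose ID is generated locally, without coordination."""
    return {
        "_id": str(uuid.uuid4()),  # random (version 4) UUID
        "last_name": last_name,
        "first_name": first_name,
    }


# Two disconnected nodes each create a record on their own.
node_a_doc = new_record("Ericsson", "Eric")
node_b_doc = new_record("Easterburgh", "Ethel")

# Merging by ID keeps both records, because the IDs do not clash.
merged = {doc["_id"]: doc for doc in (node_a_doc, node_b_doc)}

# The storage downside: 36 characters per ID instead of a small integer.
id_length = len(node_a_doc["_id"])
```

The ID says nothing about the person it belongs to, which is the "no direct correlation" downside listed above.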
slide-50
SLIDE 50

SYNC INGREDIENTS

➤ Identity

slide-51
SLIDE 51

WHAT’S NEW?

slide-52
SLIDE 52

DELTA

  • 1. Send all documents to the target

ID Last Name First Name A0D31890… Abblesworth Antonia BE2EBD0B… Burrows Bertram CA7BEA04… Chickleston Cecilia DFFCFABC… Dunkington David E328E255… Doe John F868B1D1… Doe Jane G8DC258D… Doe Jane H232725B… Ericsson Eric

slide-53
SLIDE 53

DELTA

  • 1. Send all documents to the target
  • 2. Send only the new documents to the target

Database

ID         Last Name    First Name
A0D31890…  Abblesworth  Antonia
BE2EBD0B…  Burrows      Bertram
CA7BEA04…  Chickleston  Cecilia
DFFCFABC…  Dunkington   David

By-Sequence

Update Sequence  Doc ID
1                CA7BEA04…
2                DFFCFABC…
3                A0D31890…
4                BE2EBD0B…

slide-54
SLIDE 54

Database A

ID Last Name First Name A0D31890… Abblesworth Antonia BE2EBD0B… Burrows Bertram CA7BEA04… Chickleston Cecilia DFFCFABC… Dunkington David

By-Sequence

Update Sequence Doc ID 1 CA7BEA04… 2 DFFCFABC… 3 A0D31890… 4 BE2EBD0B…

Database B

ID Last Name First Name

slide-55
SLIDE 55

Database A

ID Last Name First Name A0D31890… Abblesworth Antonia BE2EBD0B… Burrows Bertram CA7BEA04… Chickleston Cecilia DFFCFABC… Dunkington David

By-Sequence

Update Sequence Doc ID 1 CA7BEA04… 2 DFFCFABC… 3 A0D31890… 4 BE2EBD0B…

Database B

ID Last Name First Name CA7BEA04… Chickleston Cecilia

slide-56
SLIDE 56

Database A

ID Last Name First Name A0D31890… Abblesworth Antonia BE2EBD0B… Burrows Bertram CA7BEA04… Chickleston Cecilia DFFCFABC… Dunkington David

By-Sequence

Update Sequence Doc ID 1 CA7BEA04… 2 DFFCFABC… 3 A0D31890… 4 BE2EBD0B…

Database B

ID Last Name First Name CA7BEA04… Chickleston Cecilia DFFCFABC… Dunkington David

slide-57
SLIDE 57

Database A

ID Last Name First Name A0D31890… Abblesworth Antonia BE2EBD0B… Burrows Bertram CA7BEA04… Chickleston Cecilia DFFCFABC… Dunkington David

By-Sequence

Update Sequence Doc ID 1 CA7BEA04… 2 DFFCFABC… 3 A0D31890… 4 BE2EBD0B…

Database B

ID Last Name First Name A0D31890… Abblesworth Antonia CA7BEA04… Chickleston Cecilia DFFCFABC… Dunkington David

slide-58
SLIDE 58

Database A

ID Last Name First Name A0D31890… Abblesworth Antonia BE2EBD0B… Burrows Bertram CA7BEA04… Chickleston Cecilia DFFCFABC… Dunkington David

By-Sequence

Update Sequence Doc ID 1 CA7BEA04… 2 DFFCFABC… 3 A0D31890… 4 BE2EBD0B…

Database B

ID Last Name First Name A0D31890… Abblesworth Antonia BE2EBD0B… Burrows Bertram CA7BEA04… Chickleston Cecilia DFFCFABC… Dunkington David

slide-59
SLIDE 59

Database A

ID Last Name First Name A0D31890… Abblesworth Antonia BE2EBD0B… Burrows Bertram CA7BEA04… Chickleston Cecilia DFFCFABC… Dunkington David

By-Sequence

Update Sequence Doc ID 1 CA7BEA04… 2 DFFCFABC… 3 A0D31890… 4 BE2EBD0B…

Database B

ID Last Name First Name A0D31890… Abblesworth Antonia BE2EBD0B… Burrows Bertram CA7BEA04… Chickleston Cecilia DFFCFABC… Dunkington David

High Watermark: 4

slide-60
SLIDE 60

Database A

ID Last Name First Name A0D31890… Abblesworth Antonia BE2EBD0B… Burrows Bertram CA7BEA04… Chickleston Cecilia DFFCFABC… Dunkington David E328E255… Doe John

By-Sequence

Update Sequence Doc ID 1 CA7BEA04… 2 DFFCFABC… 3 A0D31890… 4 BE2EBD0B… 5 E328E255…

Database B

ID Last Name First Name A0D31890… Abblesworth Antonia BE2EBD0B… Burrows Bertram CA7BEA04… Chickleston Cecilia DFFCFABC… Dunkington David

High Watermark: 4

slide-61
SLIDE 61

Database B

ID Last Name First Name A0D31890… Abblesworth Antonia BE2EBD0B… Burrows Bertram CA7BEA04… Chickleston Cecilia DFFCFABC… Dunkington David E328E255… Doe John

High Watermark: 5

Database A

ID Last Name First Name A0D31890… Abblesworth Antonia BE2EBD0B… Burrows Bertram CA7BEA04… Chickleston Cecilia DFFCFABC… Dunkington David E328E255… Doe John

By-Sequence

Update Sequence Doc ID 1 CA7BEA04… 2 DFFCFABC… 3 A0D31890… 4 BE2EBD0B… 5 E328E255…

Next: Updates
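The walkthrough on the preceding slides can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the `replicate` function, the dictionaries, and the shortened IDs (standing in for the truncated UUIDs on the slides) are all illustrative, and real replication involves considerably more than this.

```python
def replicate(source_docs, source_by_seq, target_docs, checkpoint):
    """Copy changes newer than `checkpoint`; return (new_checkpoint, sent_ids)."""
    sent = []
    for seq in sorted(source_by_seq):
        if seq <= checkpoint:
            continue  # the target already has this change
        doc_id = source_by_seq[seq]
        target_docs[doc_id] = source_docs[doc_id]
        sent.append(doc_id)
        checkpoint = seq  # advance the high watermark
    return checkpoint, sent


# Database A and its by-sequence index, as on the slides.
db_a = {
    "CA7BEA04": "Chickleston, Cecilia",
    "DFFCFABC": "Dunkington, David",
    "A0D31890": "Abblesworth, Antonia",
    "BE2EBD0B": "Burrows, Bertram",
}
by_seq_a = {1: "CA7BEA04", 2: "DFFCFABC", 3: "A0D31890", 4: "BE2EBD0B"}
db_b = {}

# First run: everything is new to B, the high watermark ends at 4.
checkpoint, first_run = replicate(db_a, by_seq_a, db_b, checkpoint=0)

# A new document arrives on A; only it needs to travel on the next run.
db_a["E328E255"] = "Doe, John"
by_seq_a[5] = "E328E255"
checkpoint, second_run = replicate(db_a, by_seq_a, db_b, checkpoint)
```

Because the function only looks at sequence numbers above the checkpoint, rerunning it after an interruption picks up where it left off, which fits the "everything is resumable" design decision.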

slide-62
SLIDE 62

UPDATES

slide-63
SLIDE 63

Database B

ID Last Name First Name A0D31890… Abblesworth Antonia BE2EBD0B… Burrows Bertram CA7BEA04… Chickleston Cecilia DFFCFABC… Dunkington David E328E255… Doe John

High Watermark: 5

Database A

ID Last Name First Name A0D31890… Abblesworth Antonia BE2EBD0B… Burrows Bertram CA7BEA04… Chickleston Cecilia DFFCFABC… Dunkington David E328E255…* Esterhase John

By-Sequence

Update Sequence Doc ID 1 CA7BEA04… 2 DFFCFABC… 3 A0D31890… 4 BE2EBD0B… 5 E328E255… 6 E328E255…*

slide-64
SLIDE 64

Database B

ID Last Name First Name A0D31890… Abblesworth Antonia BE2EBD0B… Burrows Bertram CA7BEA04… Chickleston Cecilia DFFCFABC… Dunkington David E328E255…* Esterhase John

High Watermark: 6

Database A

ID Last Name First Name A0D31890… Abblesworth Antonia BE2EBD0B… Burrows Bertram CA7BEA04… Chickleston Cecilia DFFCFABC… Dunkington David E328E255…* Esterhase John

By-Sequence

Update Sequence Doc ID 1 CA7BEA04… 2 DFFCFABC… 3 A0D31890… 4 BE2EBD0B… 5 E328E255… 6 E328E255…*

Now we go back and make a typo

slide-65
SLIDE 65

Database B

ID Last Name First Name A0D31890… Abblesworth Antonia BE2EBD0B… Burrows Bertram CA7BEA04… Chickleston Cecilia DFFCFABC… Dunkington David E328E255… Doe John

High Watermark: 5

Database A

ID Last Name First Name A0D31890… Abblesworth Antonia BE2EBD0B… Burrows Bertram CA7BEA04… Chickleston Cecilia DFFCFABC… Dunkington David E328E255…** Esterhose John

By-Sequence

Update Sequence Doc ID 2 DFFCFABC… 3 A0D31890… 4 BE2EBD0B… 5 E328E255… 6 E328E255…* 7 E328E255…**

slide-66
SLIDE 66

Database B

ID Last Name First Name A0D31890… Abblesworth Antonia BE2EBD0B… Burrows Bertram CA7BEA04… Chickleston Cecilia DFFCFABC… Dunkington David E328E255…* Esterhase John

High Watermark: 6

Database A

ID Last Name First Name A0D31890… Abblesworth Antonia BE2EBD0B… Burrows Bertram CA7BEA04… Chickleston Cecilia DFFCFABC… Dunkington David E328E255…** Esterhose John

By-Sequence

Update Sequence Doc ID 2 DFFCFABC… 3 A0D31890… 4 BE2EBD0B… 5 E328E255… 6 E328E255…* 7 E328E255…**

Wouldn’t it be more efficient to just send update 7?

slide-67
SLIDE 67

Database B

ID Last Name First Name A0D31890… Abblesworth Antonia BE2EBD0B… Burrows Bertram CA7BEA04… Chickleston Cecilia DFFCFABC… Dunkington David E328E255…** Esterhose John

High Watermark: 7

Database A

ID Last Name First Name A0D31890… Abblesworth Antonia BE2EBD0B… Burrows Bertram CA7BEA04… Chickleston Cecilia DFFCFABC… Dunkington David E328E255…** Esterhose John

By-Sequence

Update Sequence Doc ID 1 CA7BEA04… 2 DFFCFABC… 3 A0D31890… 4 BE2EBD0B… 7 E328E255…**

Why yes it would -> make the sequence table unique for doc ids Next: deletes
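One way to picture "make the sequence table unique for doc ids" is the sketch below. The `Database` class and its methods are hypothetical; the point is only that a write removes the document's previous sequence entry before adding the new one.

```python
class Database:
    """Toy database whose by-sequence index holds one entry per document."""

    def __init__(self):
        self.docs = {}      # doc_id -> body
        self.by_seq = {}    # update sequence -> doc_id
        self.seq_of = {}    # doc_id -> its current update sequence
        self.update_seq = 0

    def write(self, doc_id, body):
        old_seq = self.seq_of.get(doc_id)
        if old_seq is not None:
            del self.by_seq[old_seq]  # drop the superseded entry
        self.update_seq += 1
        self.by_seq[self.update_seq] = doc_id
        self.seq_of[doc_id] = self.update_seq
        self.docs[doc_id] = body

    def changes_since(self, since):
        return [(s, self.by_seq[s]) for s in sorted(self.by_seq) if s > since]


db = Database()
db.write("CA7BEA04", "Chickleston, Cecilia")  # seq 1
db.write("DFFCFABC", "Dunkington, David")     # seq 2
db.write("A0D31890", "Abblesworth, Antonia")  # seq 3
db.write("BE2EBD0B", "Burrows, Bertram")      # seq 4
db.write("E328E255", "Doe, John")             # seq 5
db.write("E328E255", "Esterhase, John")       # seq 6 replaces seq 5
db.write("E328E255", "Esterhose, John")       # seq 7 replaces seq 6

# A target with high watermark 5 only has to fetch one change.
pending = db.changes_since(5)
remaining_seqs = sorted(db.by_seq)
```

The remaining sequence numbers are 1, 2, 3, 4, and 7, matching the by-sequence table on the slide above.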

slide-68
SLIDE 68

DELETES

slide-69
SLIDE 69

Database B

ID Last Name First Name A0D31890… Abblesworth Antonia BE2EBD0B… Burrows Bertram CA7BEA04… Chickleston Cecilia DFFCFABC… Dunkington David E328E255…** Esterhose John

High Watermark: 7

Database A

ID Last Name First Name A0D31890… Abblesworth Antonia BE2EBD0B… Burrows Bertram CA7BEA04… Chickleston Cecilia DFFCFABC… Dunkington David E328E255…** Esterhose John

By-Sequence

Update Sequence Doc ID 2 DFFCFABC… 3 A0D31890… 4 BE2EBD0B… 7 E328E255…** 8 CA7BEA04…

Wouldn’t it be more efficient to just send update 7? Why yes it would -> make the sequence table unique for doc ids

slide-70
SLIDE 70

Database B

ID Last Name First Name A0D31890… Abblesworth Antonia BE2EBD0B… Burrows Bertram CA7BEA04… Chickleston Cecilia DFFCFABC… Dunkington David E328E255…** Esterhose John

High Watermark: 8

Database A

ID Last Name First Name A0D31890… Abblesworth Antonia BE2EBD0B… Burrows Bertram CA7BEA04… Chickleston Cecilia DFFCFABC… Dunkington David E328E255…** Esterhose John

By-Sequence

Update Sequence Doc ID 2 DFFCFABC… 3 A0D31890… 4 BE2EBD0B… 7 E328E255…** 8 CA7BEA04…

Wouldn’t it be more efficient to just send update 7? Why yes it would -> make the sequence table unique for doc ids

slide-71
SLIDE 71

SYNC INGREDIENTS

➤ Identity ➤ What happened since?

slide-72
SLIDE 72

QUESTIONS?

slide-73
SLIDE 73

VERSIONS

* ** business, what’s up with that?

slide-74
SLIDE 74

AUTO INCREMENT

* ** business, what’s up with that?

slide-75
SLIDE 75

VERSIONS: AUTO INCREMENT?

Database A

ID Version Last Name First Name A0D31890 … 1 Abblesworth Antonia

Database B

ID Version Last Name First Name

slide-76
SLIDE 76

VERSIONS: AUTO INCREMENT?

Database A

ID Version Last Name First Name A0D31890 … 1 Abblesworth Antonia

Database B

ID Version Last Name First Name A0D31890 … 1 Abblesworth Antonia

slide-77
SLIDE 77

VERSIONS: AUTO INCREMENT?

Database A

ID         Version  Last Name     First Name
A0D31890…  2        Bobblesworth  Antonia

Database B

ID         Version  Last Name     First Name
A0D31890…  2        Wibblesworth  Antonia

What is it about auto-increment that is so appealing? It is strictly ordered; in math terms, monotonically increasing. What else is monotonically increasing?

slide-78
SLIDE 78

VERSIONS: AUTO INCREMENT?

Database A

ID Version Last Name First Name A0D31890 … 2 Bobblesworth Antonia

Database B

ID Version Last Name First Name A0D31890 … 2 Wibblesworth Antonia

👏 👏
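The problem with the two tables above can be shown with a naive replicator that compares version counters only. The `needs_transfer` rule and the document layout are assumptions made for illustration.

```python
def needs_transfer(source, target):
    """Naive rule: only send a document if the source version is higher."""
    return source["version"] > target["version"]


# Both databases start with the same version 1 of the document.
doc_a = {"_id": "A0D31890", "version": 1, "last_name": "Abblesworth"}
doc_b = dict(doc_a)

# While disconnected, each side edits the document and bumps the counter.
doc_a.update(version=doc_a["version"] + 1, last_name="Bobblesworth")
doc_b.update(version=doc_b["version"] + 1, last_name="Wibblesworth")

# Neither direction sees anything to send, yet the contents differ.
a_to_b = needs_transfer(doc_a, doc_b)
b_to_a = needs_transfer(doc_b, doc_a)
diverged = doc_a["last_name"] != doc_b["last_name"]
```

Both sides report version 2, so the counter alone cannot reveal that the documents have diverged.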

What is it about auto-increment that is so appealing? It is strictly ordered; in math terms, monotonically increasing. What else is monotonically increasing?

slide-79
SLIDE 79

TIME!

From: Falsehoods programmers believe about time, and the sequel. ALL OF THESE STATEMENTS ARE FALSE http://infiniteundo.com/post/25326999628/falsehoods-programmers-believe-about-time http://infiniteundo.com/post/25509354022/more-falsehoods-programmers-believe-about-time

slide-80
SLIDE 80

THE SYSTEM CLOCK WILL ALWAYS BE SET TO THE CORRECT LOCAL TIME.

slide-81
SLIDE 81

THE SYSTEM CLOCK WILL ALWAYS BE SET TO A TIME THAT IS NOT WILDLY DIFFERENT FROM THE CORRECT LOCAL TIME.

slide-82
SLIDE 82

IF THE SYSTEM CLOCK IS INCORRECT, IT WILL AT LEAST ALWAYS BE OFF BY A CONSISTENT NUMBER OF SECONDS.

slide-83
SLIDE 83

THE SERVER CLOCK AND THE CLIENT CLOCK WILL ALWAYS BE SET TO THE SAME TIME.

slide-84
SLIDE 84

THE SERVER CLOCK AND THE CLIENT CLOCK WILL ALWAYS BE SET TO AROUND THE SAME TIME.

slide-85
SLIDE 85

OK, BUT THE TIME ON THE SERVER CLOCK AND TIME ON THE CLIENT CLOCK WOULD NEVER BE DIFFERENT BY A MATTER OF DECADES.

slide-86
SLIDE 86

IF THE SERVER CLOCK AND THE CLIENT CLOCK ARE NOT IN SYNC, THEY WILL AT LEAST ALWAYS BE OUT OF SYNC BY A CONSISTENT NUMBER OF SECONDS.

slide-87
SLIDE 87

THE SERVER CLOCK AND THE CLIENT CLOCK WILL USE THE SAME TIME ZONE.

slide-88
SLIDE 88

THE SYSTEM CLOCK WILL NEVER BE SET TO A TIME THAT IS IN THE DISTANT PAST OR THE FAR FUTURE.

slide-89
SLIDE 89

ONE MINUTE ON THE SYSTEM CLOCK HAS EXACTLY THE SAME DURATION AS ONE MINUTE ON ANY OTHER CLOCK

slide-90
SLIDE 90

OK, BUT THE DURATION OF ONE MINUTE ON THE SYSTEM CLOCK WILL BE PRETTY CLOSE TO THE DURATION OF ONE MINUTE ON MOST OTHER CLOCKS.

slide-91
SLIDE 91

FINE, BUT THE DURATION OF ONE MINUTE ON THE SYSTEM CLOCK WOULD NEVER BE MORE THAN AN HOUR.

slide-92
SLIDE 92

A TIME STAMP OF SUFFICIENT PRECISION CAN SAFELY BE CONSIDERED UNIQUE.

slide-93
SLIDE 93

IT’S POSSIBLE TO ESTABLISH A TOTAL ORDERING ON TIMESTAMPS THAT IS USEFUL OUTSIDE YOUR SYSTEM.

slide-94
SLIDE 94

TIMESTAMPS ALWAYS ADVANCE MONOTONICALLY.

slide-95
SLIDE 95

MY SOFTWARE IS ONLY USED INTERNALLY/LOCALLY, SO I DON’T HAVE TO WORRY ABOUT TIMEZONES

slide-96
SLIDE 96

MY SOFTWARE STACK WILL HANDLE TIMEZONE WITHOUT ME NEEDING TO DO ANYTHING SPECIAL

slide-97
SLIDE 97

ALL MEASUREMENTS OF TIME ON A GIVEN CLOCK WILL OCCUR WITHIN THE SAME FRAME OF REFERENCE.

In other words

slide-98
SLIDE 98

TIME PASSES AT THE SAME SPEED ON TOP OF A MOUNTAIN AND AT THE BOTTOM OF A VALLEY.

slide-99
SLIDE 99

TIMESTAMPS? NO

Spanner, 2FA. Whatever the exact scenario, this is plausible and has documented occurrences in small and large-scale systems: you can’t rely on timestamps to guarantee the order of two items, even if the timestamps were generated on the same device, and relying on them leads to data loss and/or duplication. This is what happens under the hood when you suddenly have all your notes or contacts twice after syncing your phone and your desktop. It is also why that one contact always gets deleted when you sync from phone to desktop (but not the other way around).
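A small sketch of how a timestamp-based "last write wins" rule can lose data. The clock skew, the timestamps, and the `last_write_wins` function are invented for illustration.

```python
def last_write_wins(a, b):
    """Keep whichever edit carries the larger timestamp."""
    return a if a["timestamp"] >= b["timestamp"] else b


# True order of events: the phone edit happens first (t=1000),
# the desktop edit happens a minute later (t=1060).
phone_clock_skew = 120  # assumption: the phone clock runs two minutes fast
phone_edit = {"last_name": "Bobblesworth", "timestamp": 1000 + phone_clock_skew}
desktop_edit = {"last_name": "Wibblesworth", "timestamp": 1060}

# The older phone edit wins because its clock was ahead.
winner = last_write_wins(phone_edit, desktop_edit)
```

The newer desktop edit is silently discarded, and nothing in the data indicates that anything went wrong.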

  • Ok. How can we improve on timestamps?

Before we find out, we need to introduce one more new concept: conflicts.

slide-100
SLIDE 100

DISTRIBUTED SYSTEMS INTERLUDE

slide-101
SLIDE 101

VERSIONS: AUTO INCREMENT?

Database A

ID Version Last Name First Name A0D31890 … 2 Bobblesworth Antonia

Database B

ID Version Last Name First Name A0D31890 … 2 Wibblesworth Antonia

👏 👏

We are trained to think conflicts are bad. Until you discover distributed systems

slide-102
SLIDE 102

VERSIONS: EMBRACING CONFLICTS

Database A

ID Version Last Name First Name A0D31890 … 2 Bobblesworth Antonia

Database B

ID Version Last Name First Name A0D31890 … 2 Wibblesworth Antonia

👎 👎

In distributed systems, conflicts are just another natural state of data. We all know how this works in version control systems, with the big markers

slide-103
SLIDE 103

<<<<<<<<<<<<<<<
===============
>>>>>>>>>>>>>>>

We know how to deal with this.

slide-104
SLIDE 104

As soon as you have two computers connected by a network, you have a distributed system. Most other databases pretend that’s not true. That’s why you are monitoring “replication lag” and somesuch

slide-105
SLIDE 105

CONFLICTS ARE COOL

slide-106
SLIDE 106

CONFLICTS ARE COOL

slide-107
SLIDE 107

WHAT’S BETTER THAN TIMESTAMPS?

slide-108
SLIDE 108

VECTOR CLOCKS

➤ “Logical clock”
➤ Monotonically increase
➤ Causality, X happened after Y
➤ No representation of when precisely something happened

➤ This is a good thing!

slide-109
SLIDE 109

[Diagram: a client sends PUT abc (n=3) to a cluster of three nodes, Node A, Node B, and Node C, which hold the copies DB 1 Shard 1, DB 1 Shard 1*, and DB 1 Shard 1**]

Dynamo fast-intro

slide-110
SLIDE 110

VECTOR CLOCKS

➤ “Logical clock”
➤ Monotonically increase
➤ Causality, X happened after Y
➤ No representation of when precisely something happened
➤ This is a good thing!
➤ Would introduce conflicts on each cluster-write.
➤ This is not a good thing
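For readers who have not met vector clocks, here is a minimal sketch of the idea. The `increment` and `compare` helpers are illustrative and not taken from any particular library. The last lines show one possible reading of the concern above: if each cluster node stamps its copy of the same write with its own counter, the copies look concurrent.

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def compare(a, b):
    """Classify the causal relation between two vector clocks."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


base = increment({}, "A")          # {"A": 1}
later = increment(base, "A")       # {"A": 2}, causally after base
causal = compare(base, later)

# Two nodes stamp the same write independently.
on_node_a = increment(base, "A")   # {"A": 2}
on_node_b = increment(base, "B")   # {"A": 1, "B": 1}
cluster_write = compare(on_node_a, on_node_b)
```

The clocks capture "happened after" without any wall-clock time, but the two stamped copies compare as concurrent, which would be reported as a conflict.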

slide-111
SLIDE 111

CONTENT ADDRESSABLE VERSIONS

slide-112
SLIDE 112

CONTENT ADDRESSABLE VERSIONS

➤ Hash over document version contents

slide-113
SLIDE 113

CONTENT ADDRESSABLE VERSIONS

➤ Hash over document version contents ➤ currently MD5 🙊

slide-114
SLIDE 114

CONTENT ADDRESSABLE VERSIONS

➤ Hash over document version contents ➤ currently MD5 🙊 ➤ Ordered, stored in a list:

slide-115
SLIDE 115

CONTENT ADDRESSABLE VERSIONS

➤ Hash over document version contents ➤ currently MD5 🙊 ➤ Ordered, stored in a list: ➤ [A, B, C, D, E]

slide-116
SLIDE 116

CONTENT ADDRESSABLE VERSIONS

➤ Hash over document version contents
➤ currently MD5 🙊
➤ Ordered, stored in a list:
➤ [A, B, C, D, E]
➤ List not as elegant as vector clocks, but no other way.

slide-117
SLIDE 117

CONTENT ADDRESSABLE VERSIONS

➤ Hash over document version contents
➤ currently MD5 🙊
➤ Ordered, stored in a list:
➤ [A, B, C, D, E]
➤ List not as elegant as vector clocks, but no other way.
➤ Limit
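A rough sketch of content-addressable versions, assuming the hash covers the JSON body plus the parent revision. That input choice and the `next_revision` helper are assumptions for illustration; CouchDB's actual revision IDs are computed differently in detail.

```python
import hashlib
import json


def next_revision(body, history):
    """Return a new newest-first history with a hash-based revision ID."""
    parent = history[0] if history else ""
    payload = json.dumps(body, sort_keys=True) + parent
    digest = hashlib.md5(payload.encode("utf-8")).hexdigest()
    return [digest] + history


first = next_revision({"last_name": "Abblesworth"}, [])

# The same edit applied to the same parent on two nodes yields the same ID,
# so identical writes do not show up as conflicts.
on_node_a = next_revision({"last_name": "Bobblesworth"}, first)
on_node_b = next_revision({"last_name": "Bobblesworth"}, first)

# A different edit to the same parent yields a different ID.
on_node_c = next_revision({"last_name": "Wibblesworth"}, first)
```

This is presumably why hashes work where per-node counters do not: every replica that applies the same write arrives at the same revision without coordination.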

slide-118
SLIDE 118

REPLICATION DETAIL INTERLUDE

Database Database

replication details

slide-119
SLIDE 119

REPLICATION DETAIL INTERLUDE

Doc 1 [Rev A]

Database Database

replication details

slide-120
SLIDE 120

REPLICATION DETAIL INTERLUDE

Doc 1 [Rev A] Doc 1 [B, A]

Database Database

replication details

slide-121
SLIDE 121

REPLICATION DETAIL INTERLUDE

Doc 1 [Rev A] Doc 1 [B, A]

Database

Doc 1 [C, B, A]

Database

replication details

slide-122
SLIDE 122

REPLICATION DETAIL INTERLUDE

Doc 1 [Rev A] Doc 1 [B, A]

Database

Doc 1 [C, B, A]

Database

replication details

slide-123
SLIDE 123

REPLICATION DETAIL INTERLUDE

Doc 1 [Rev A] Doc 1 [B, A]

Database

Doc 1 [C, B, A]

Database

Doc 1 [C, B, A]

replication details

slide-124
SLIDE 124

REPLICATION DETAIL INTERLUDE

Doc 1 [Rev A] Doc 1 [B, A]

Database

Doc 1 [C, B, A]

Database

Doc 1 [C, B, A] Doc 1 [D, C, B, A]

replication details

slide-125
SLIDE 125

REPLICATION DETAIL INTERLUDE

Doc 1 [Rev A] Doc 1 [B, A]

Database

Doc 1 [C, B, A]

Database

Doc 1 [C, B, A] Doc 1 [D, C, B, A] Doc 1 [E, D, C, B, A]

replication details

slide-126
SLIDE 126

REPLICATION DETAIL INTERLUDE

Doc 1 [Rev A] Doc 1 [B, A]

Database

Doc 1 [C, B, A]

Database

Doc 1 [C, B, A] Doc 1 [D, C, B, A] Doc 1 [E, D, C, B, A]

replication details

slide-127
SLIDE 127

REPLICATION DETAIL INTERLUDE

Doc 1 [Rev A] Doc 1 [B, A]

Database

Doc 1 [C, B, A]

Database

Doc 1 [C, B, A] Doc 1 [D, C, B, A] Doc 1 [E, D, C, B, A] Doc 1 [E, D, C, B, A]

replication details

slide-128
SLIDE 128

REPLICATION DETAIL INTERLUDE

Doc 1 [Rev A] Doc 1 [B, A]

Database

Doc 1 [C, B, A]

Database

Doc 1 [C, B, A] Doc 1 [D, C, B, A] Doc 1 [E, D, C, B, A] Doc 1 [E, D, C, B, A] Doc 1 [F, E, D, C, B, A]

replication details

slide-129
SLIDE 129

REPLICATION DETAIL INTERLUDE

Doc 1 [Rev A] Doc 1 [B, A]

Database

Doc 1 [C, B, A]

Database

Doc 1 [C, B, A] Doc 1 [D, C, B, A] Doc 1 [E, D, C, B, A] Doc 1 [E, D, C, B, A] Doc 1 [F, E, D, C, B, A] Doc 1 [H, E, D, C, B, A]

replication details

slide-130
SLIDE 130

REPLICATION DETAIL INTERLUDE

Doc 1 [Rev A] Doc 1 [B, A]

Database

Doc 1 [C, B, A]

Database

Doc 1 [C, B, A] Doc 1 [D, C, B, A] Doc 1 [E, D, C, B, A] Doc 1 [E, D, C, B, A] Doc 1 [F, E, D, C, B, A] Doc 1 [H, E, D, C, B, A]

replication details

slide-131
SLIDE 131

REPLICATION DETAIL INTERLUDE

Doc 1 [Rev A] Doc 1 [B, A]

Database

Doc 1 [C, B, A]

Database

Doc 1 [C, B, A] Doc 1 [D, C, B, A] Doc 1 [E, D, C, B, A] Doc 1 [E, D, C, B, A] Doc 1 [F, E, D, C, B, A] Doc 1 [H, E, D, C, B, A] Doc 1 [[F, H], E, D, C, B, A]

replication details

slide-132
SLIDE 132

REPLICATION DETAIL INTERLUDE

Doc 1 [Rev A] Doc 1 [B, A]

Database

Doc 1 [C, B, A]

Database

Doc 1 [C, B, A] Doc 1 [D, C, B, A] Doc 1 [E, D, C, B, A] Doc 1 [E, D, C, B, A] Doc 1 [F, E, D, C, B, A] Doc 1 [H, E, D, C, B, A] Doc 1 [[F, H], E, D, C, B, A]

{ _id: "1", _conflicts: [F, H] }

replication details
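The revision lists on these slides are enough to decide whether two copies of a document conflict. The sketch below mirrors the slide notation (newest revision first); `compare_histories` is a hypothetical helper, not CouchDB's implementation.

```python
def compare_histories(local, remote):
    """Compare two newest-first revision lists for the same document."""
    if local[0] == remote[0]:
        return "same", []
    if local[0] in remote:
        # The local head is an ancestor of the remote head.
        return "remote is newer", []
    if remote[0] in local:
        return "local is newer", []
    # Neither head descends from the other: both edits must be kept.
    return "conflict", sorted([local[0], remote[0]])


# [F, E, D, C, B, A] meets [H, E, D, C, B, A], as on the slide.
status, conflicts = compare_histories(list("FEDCBA"), list("HEDCBA"))
doc = {"_id": "1", "_conflicts": conflicts}

# [C, B, A] meets [E, D, C, B, A]: a plain fast-forward, no conflict.
fast_forward, _ = compare_histories(list("CBA"), list("EDCBA"))
```

The conflict case produces the same `_conflicts: [F, H]` shape shown above, and both revisions stay available for the application to resolve.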

slide-133
SLIDE 133

A B C D E F H

slide-134
SLIDE 134

SYNC INGREDIENTS

➤ Identity ➤ What happened since? ➤ Embrace Conflicts ➤ List of Content Addressable Versions

slide-135
SLIDE 135

BONUS MATERIAL

slide-136
SLIDE 136

OPERATIONAL TRANSFORMS

slide-137
SLIDE 137

OPERATIONAL TRANSFORMS

➤ Etherpad

slide-138
SLIDE 138

OPERATIONAL TRANSFORMS

➤ Etherpad ➤ Google Docs / Wave 😭

slide-139
SLIDE 139

OPERATIONAL TRANSFORMS

➤ Etherpad
➤ Google Docs / Wave 😭
➤ Good for usually connected collaborative text editing

slide-140
SLIDE 140

OPERATIONAL TRANSFORMS

➤ Etherpad
➤ Google Docs / Wave 😭
➤ Good for usually connected collaborative text editing
➤ Stops working efficiently with longer disconnects

slide-141
SLIDE 141

CRDTS

slide-142
SLIDE 142

CRDTS

➤ Conflict-free Replicated Data Types

slide-143
SLIDE 143

CRDTS

➤ Conflict-free Replicated Data Types ➤ No Conflicts!

slide-144
SLIDE 144

CRDTS

➤ Conflict-free Replicated Data Types ➤ No Conflicts! ➤ Only specific data types.

slide-145
SLIDE 145

CRDTS

➤ Conflict-free Replicated Data Types ➤ No Conflicts! ➤ Only specific data types. ➤ Lists, trees, hashes, counters

slide-146
SLIDE 146

CRDTS

➤ Conflict-free Replicated Data Types ➤ No Conflicts! ➤ Only specific data types. ➤ Lists, trees, hashes, counters ➤ All with non-standard properties

slide-147
SLIDE 147

CRDTS

➤ Conflict-free Replicated Data Types ➤ No Conflicts! ➤ Only specific data types. ➤ Lists, trees, hashes, counters ➤ All with non-standard properties ➤ Good for special cases where they fit
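As an example of one such data type, here is a sketch of a grow-only counter (often called a G-Counter), one of the simplest CRDTs. The function names are illustrative. Its "non-standard property" is that it can only count up.

```python
def increment(counter, node, amount=1):
    """Each node only ever increases its own slot."""
    updated = dict(counter)
    updated[node] = updated.get(node, 0) + amount
    return updated


def merge(a, b):
    """Merging takes the per-node maximum, so merge order does not matter."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def value(counter):
    return sum(counter.values())


# Two disconnected replicas count independently.
replica_a = increment(increment({}, "A"), "A")  # {"A": 2}
replica_b = increment({}, "B")                  # {"B": 1}

# Either merge direction gives the same result, and merging twice is harmless.
merged_ab = merge(replica_a, replica_b)
merged_ba = merge(replica_b, replica_a)
total = value(merged_ab)
```

There is no conflict to resolve, but only because the data type restricts what operations are allowed.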

slide-148
SLIDE 148

CONFLICT-FREE REPLICATED JSON DATATYPE

slide-149
SLIDE 149

CONFLICT-FREE REPLICATED JSON DATATYPE

➤ CRJDT

slide-150
SLIDE 150

CONFLICT-FREE REPLICATED JSON DATATYPE

➤ CRJDT ➤ No Conflicts!

slide-151
SLIDE 151

CONFLICT-FREE REPLICATED JSON DATATYPE

➤ CRJDT ➤ No Conflicts! ➤ Very new

slide-152
SLIDE 152

CONFLICT-FREE REPLICATED JSON DATATYPE

➤ CRJDT ➤ No Conflicts! ➤ Very new ➤ Haven’t read it yet

slide-153
SLIDE 153

CONFLICT-FREE REPLICATED JSON DATATYPE

➤ CRJDT
➤ No Conflicts!
➤ Very new
➤ Haven’t read it yet
➤ Probably same limitation as with CRDTs: special semantics.

slide-154
SLIDE 154

CONFLICT-FREE REPLICATED JSON DATATYPE

➤ CRJDT
➤ No Conflicts!
➤ Very new
➤ Haven’t read it yet
➤ Probably same limitation as with CRDTs: special semantics.
➤ https://arxiv.org/abs/1608.03960

slide-155
SLIDE 155

THANK YOU!

Apache CouchDB Sync Deep Dive
Jan Lehnardt
@janl
jan@apache.org
Professional Support for Apache CouchDB: https://neighbourhood.ie

slide-156
SLIDE 156