SLIDE 1

Secondary reads: the good and the bad

Bartłomiej Nogaś

SLIDE 2

Agenda

  • Read Preference configuration
  • Lagging secondaries and stale or missing/duplicated data
  • What queries can be safely run on secondaries?
  • Improving read throughput: sharding vs reading from secondaries

SLIDE 3

Read preference configurations

And impact of step downs

SLIDE 4

Client Configuration options

ELIGIBLE NODE: a node that satisfies all the conditions defined in the Read Preference. A client distributes reads among eligible nodes at random.

SLIDE 6

Client Configuration options

  • serverSelectionTimeout
    ○ How long to wait for an eligible node
    ○ Defaults to 30 seconds
  • localThresholdMS (default: 15 ms)
    ○ Size of the latency window for selecting among available replica set members
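In a driver connection string these map to standard options; a minimal sketch (host names are placeholders; confirm exact option support against your driver version):

```
mongodb://host1:27017,host2:27017/mydb?replicaSet=rs0&readPreference=secondaryPreferred&serverSelectionTimeoutMS=30000&localThresholdMS=15
```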

SLIDE 7

Latency Window

  • Every 10 seconds (as of 3.2) the driver sends a heartbeat to measure
    network response time (last_RTT)
  • Average RTT is a weighted moving average; the last observation has
    weight 0.2 (the last nine samples together carry weight 1 - 0.8^9 ≈ 0.87)
  • localThresholdMS is relative to the server with the lowest RTT
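The moving average above can be sketched as an exponentially weighted update; a toy model (the function name is ours, not a driver API):

```javascript
// Sketch of the exponentially weighted moving average used for RTT,
// with the newest heartbeat sample weighted 0.2.
function updateAvgRtt(avgRtt, lastRtt, alpha) {
  if (alpha === undefined) alpha = 0.2;
  if (avgRtt === null) return lastRtt; // first sample seeds the average
  return alpha * lastRtt + (1 - alpha) * avgRtt;
}

// With weight 0.2 per new sample, the last nine samples together carry
// weight 1 - 0.8^9, roughly 0.87.
let avg = null;
for (const sample of [10, 12, 11, 30, 12]) {
  avg = updateAvgRtt(avg, sample);
}
```

A single slow heartbeat (the 30 ms spike) shifts the average only partway, which is exactly why selection is stable against one-off network blips.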
SLIDE 8

Available Read Preference Modes

  • Primary
  • Primary preferred
  • Secondary
  • Secondary preferred
  • Nearest

SLIDE 9

Primary and Secondary

PRIMARY

  • Read only from the Primary member
  • Exception if no Primary is available

SECONDARY

  • Read only from Secondary members within the latency window
  • Exception if there is no Secondary

SLIDE 10

Primary and Secondary Preferred

PRIMARY PREFERRED

  • Read from the Primary member
  • If no Primary is available, follow the procedure for the Secondary read preference

SECONDARY PREFERRED

  • Read from Secondary members within the latency window
  • If no Secondary is available, read from the Primary

SLIDE 11

Nearest

NEAREST: read from any member within the latency window.
WHEN TO USE: if you need the shortest response time.

SLIDE 12

Read Preference Tags

Multiple DC configuration

SLIDE 13

Read preference tags

  • A tag is a single key/value pair, e.g. {"dc": "A"}
  • A tag set is a document containing zero or more such tags,
    e.g. {"dc": "A", "role": "backup"}
  • Tags can’t be used with the Primary read preference
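In the mongo shell, tag sets are passed to readPref() as an array of tag documents tried in order (an empty document matches any member):

```javascript
// Prefer members tagged dc:A; the trailing {} falls back to any member.
db.test.find().readPref("nearest", [ { "dc": "A" }, { } ])
```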

SLIDE 15

Multiple DC configuration

  • Nearest with tags {"dc": "A"} will choose between P and S1
  • secondaryPreferred with tags {"dc": "B"} will read from S2 or S3,
    or P if no secondary is available

DC A ({"dc": "A"}): Primary (P), Secondary (S1)
DC B ({"dc": "B"}): Secondary (S2), Secondary (S3)

SLIDE 16

Multiple DC configuration

DC A ({"dc": "A"}): Primary (P), Secondary (S1)
DC B ({"dc": "B"}): Secondary (S2), Secondary (S3)

  • Note: setting mode Secondary with tags {"dc": "A"} would allow only node S1; if that node fails there will be no eligible members
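The eligibility rule above can be sketched as a simple filter; a toy model (all names are illustrative, not driver internals):

```javascript
// Sketch of tag-set eligibility: a member matches when it has every
// key/value pair from the tag set.
function matchesTagSet(memberTags, tagSet) {
  return Object.entries(tagSet).every(([k, v]) => memberTags[k] === v);
}

const members = [
  { name: "P",  type: "primary",   tags: { dc: "A" } },
  { name: "S1", type: "secondary", tags: { dc: "A" } },
  { name: "S2", type: "secondary", tags: { dc: "B" } },
  { name: "S3", type: "secondary", tags: { dc: "B" } },
];

// Mode Secondary with tags {dc: "A"}: only S1 qualifies, so losing S1
// leaves no eligible member at all.
const eligible = members.filter(
  (m) => m.type === "secondary" && matchesTagSet(m.tags, { dc: "A" })
);
```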

SLIDE 17

Agenda

  • Read Preference configuration
  • Lagging secondaries and stale or missing/duplicated data
  • What queries can be safely run on secondaries?
  • Improving read throughput: sharding vs reading from secondaries

SLIDE 18

Lagging secondaries

And stale or missing data

SLIDE 19

Stale data

  • Replication lag
    ○ rs.printSlaveReplicationInfo() (or rs.status())
  • Typically replication lag should not be more than a couple of seconds
  • The lag can grow large, for example when secondaries run on worse hardware than the primary

SLIDE 24

Stale data

Setup: Primary (P); Secondary (S1), lag 2 s; Secondary (S2), lag 4 s; Client (C)

  • Take an example:
    ○ An update is made on P to a document
    ○ The write is replicated to S1
    ○ C reads the document from S1 (gets the updated version)
    ○ Then C reads the same document from S2 (gets the old version)
  • It is important to monitor replication lag
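The sequence above can be modeled as a toy timeline (illustrative names, not MongoDB internals): a secondary lagging by `lag` seconds has applied only the writes made up to `now - lag`.

```javascript
// Two versions of the same document, written at t = 0 and t = 5.
const writes = [
  { t: 0, doc: { _id: 1, v: "old" } },
  { t: 5, doc: { _id: 1, v: "new" } },
];

// A secondary `lag` seconds behind serves the last write it has applied.
function readFromSecondary(now, lag) {
  const applied = writes.filter((w) => w.t <= now - lag);
  return applied.length ? applied[applied.length - 1].doc : null;
}

// At t = 7: S1 (2 s lag) already serves the update, S2 (4 s lag) does not,
// so back-to-back reads appear to go backwards in time.
const fromS1 = readFromSecondary(7, 2);
const fromS2 = readFromSecondary(7, 4);
```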

SLIDE 25

Changes in MongoDB 3.4

Setup: Primary (P); Secondary (S1), lag 2 s; Secondary (S2), lag 4 s

  • A maxStalenessMS parameter is added to the read preference (it ultimately shipped as maxStalenessSeconds)
  • This parameter defines the maximum replication lag a secondary may have and still be read from
  • Example: with the maximum staleness set to 3 seconds:
    ○ S1 (lag 2 s) will be eligible
    ○ S2 (lag 4 s) will not be eligible
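The filtering this adds to server selection can be sketched as follows (a toy model with illustrative names; the real driver estimates lag from heartbeat data rather than reading it directly):

```javascript
// A secondary is eligible only if its estimated lag does not exceed the
// configured maximum staleness.
function eligibleSecondaries(secondaries, maxStalenessMs) {
  return secondaries.filter((s) => s.lagMs <= maxStalenessMs);
}

// With a 3000 ms limit, S1 (2 s lag) passes and S2 (4 s lag) is excluded.
const out = eligibleSecondaries(
  [{ name: "S1", lagMs: 2000 }, { name: "S2", lagMs: 4000 }],
  3000
);
```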

SLIDE 26

Missing/Duplicated data in Sharded Cluster

TWO PROBLEMS

  • Duplicated/outdated data because of orphaned documents
  • Missing data because of not-yet-replicated chunk migrations

SERVER-3645: Inaccurate count for primary
SERVER-5931: Inconsistent read from secondary in sharded environment

SLIDE 27

Orphaned records and duplicated data

  • Duplicated and outdated results with the secondary readPreference
  • Orphaned documents arise from:
    ○ Failed balancer rounds
    ○ In-progress chunk migrations

ORPHANED DOCUMENT: in a sharded cluster, a document that also exists on a shard it doesn’t belong to.

SLIDE 30

Orphaned records and duplicated data

db.test shard key: { "_id": 1 }
  { "_id": MinKey } -> { "_id": 10 } on: test-rs0
  { "_id": 10 } -> { "_id": MaxKey } on: test-rs1

test-rs0/test, db.test.find():
  { "_id": 1, "rs": 0 }
  { "_id": 2, "rs": 0 }

test-rs1/test, db.test.find():
  { "_id": 12, "rs": 1 }
  { "_id": 2, "rs": 1 }

SLIDE 31

Orphaned records and duplicated data

test-rs0/test: { "_id": 1, "rs": 0 }, { "_id": 2, "rs": 0 }
test-rs1/test: { "_id": 12, "rs": 1 }, { "_id": 2, "rs": 1 }

Query with readPreference=primary:
  db.test.find().readPref("primary")
  -> { "_id": 1, "rs": 0 }, { "_id": 2, "rs": 0 }, { "_id": 12, "rs": 1 }

SLIDE 32

Orphaned records and duplicated data

test-rs0/test: { "_id": 1, "rs": 0 }, { "_id": 2, "rs": 0 }
test-rs1/test: { "_id": 12, "rs": 1 }, { "_id": 2, "rs": 1 }

Query with readPreference=secondary:
  db.test.find().readPref("secondary")
  -> { "_id": 1, "rs": 0 }, { "_id": 2, "rs": 0 }, { "_id": 12, "rs": 1 }, { "_id": 2, "rs": 1 }

(the orphaned { "_id": 2 } is returned twice)
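The duplicate can be modeled as the mongos merge step; a toy sketch (illustrative names, not server internals), assuming primary-targeted reads of this era filter by owned chunk ranges while secondary reads do not:

```javascript
// Which shard owns which _id range, per the chunk table above.
const ownsKey = {
  "test-rs0": (id) => id < 10,   // chunk MinKey -> 10
  "test-rs1": (id) => id >= 10,  // chunk 10 -> MaxKey
};
const shardData = {
  "test-rs0": [{ _id: 1, rs: 0 }, { _id: 2, rs: 0 }],
  "test-rs1": [{ _id: 12, rs: 1 }, { _id: 2, rs: 1 }], // _id 2 is orphaned here
};

// mongos merges per-shard results; with orphan filtering each shard returns
// only documents in its owned ranges, without it orphans leak through.
function clusterFind(filterOrphans) {
  return Object.entries(shardData).flatMap(([shard, docs]) =>
    filterOrphans ? docs.filter((d) => ownsKey[shard](d._id)) : docs
  );
}

const primaryRead = clusterFind(true);    // 3 documents, no duplicate
const secondaryRead = clusterFind(false); // 4 documents, _id 2 twice
```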

SLIDE 34

Orphaned records and duplicated data

test-rs0/test: { "_id": 1, "rs": 0 }, { "_id": 2, "rs": 0 }
test-rs1/test: { "_id": 12, "rs": 1 }, { "_id": 2, "rs": 1 }

Queries with readPreference=secondary:
  find({ "_id": 2 }).readPref("secondary") -> { "_id": 2, "rs": 0 }
  find({ "rs": 1 }).readPref("secondary") -> { "_id": 2, "rs": 1 }

SLIDE 37

Missing data with active balancer

  • By default the balancer migrates chunks with "writeConcern": { "w": 2 }
  • The writeConcern for the balancer can be changed
  • At the end of a migration the config database is updated

Setup: Primary (P), Secondary (S1), Secondary (S2), Client (C)
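Changing the balancer's migration write concern is done via the _secondaryThrottle setting in config.settings; a mongo-shell sketch based on the balancer settings of that era (run on a mongos; verify the exact form against your server version):

```javascript
// Run on a mongos, against the config database. In MongoDB 3.0+,
// _secondaryThrottle may be set to a write concern document.
use config
db.settings.update(
  { "_id": "balancer" },
  { $set: { "_secondaryThrottle": { "w": "majority" } } },
  { upsert: true }
)
```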

SLIDE 38

Is there a workaround?

  • Clean orphaned documents
  • Don’t use the automatic balancer
  • Set the balancer window
  • Issue only non-critical reads
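Cleaning orphaned documents is done with the cleanupOrphaned admin command, run against the primary of each shard's replica set (not through mongos); the usual loop, per the command's documented usage in this era, looks like:

```javascript
// Run on each shard primary. Each call cleans one range and reports where
// it stopped; loop until stoppedAtKey comes back null.
var nextKey = {};
var result;
while (nextKey != null) {
  result = db.adminCommand({ cleanupOrphaned: "test.test", startingFromKey: nextKey });
  if (result.ok != 1) {
    print("cleanupOrphaned failed or timed out; retry later");
    break;
  }
  nextKey = result.stoppedAtKey;
}
```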
SLIDE 39

Agenda

  • Read Preference configuration
  • Lagging secondaries and stale or missing/duplicated data
  • What queries can be safely run on secondaries?
  • Improving read throughput: sharding vs reading from secondaries

SLIDE 40

What queries can be run on secondaries?

SLIDE 41

What queries can be run on secondaries?

+ Counts
+ Offline data analysis
+ Backup jobs
+ Online queries that don’t require strong consistency

- Queries that require strong consistency

SLIDE 42

Agenda

  • Read Preference configuration
  • Lagging secondaries and stale or missing/duplicated data
  • What queries can be safely run on secondaries?
  • Improving read throughput: sharding vs reading from secondaries

SLIDE 43

Improving read throughput

Read from secondaries, sharding

SLIDE 44

Improving read throughput - Secondaries

+ Reduced read time in multi-data-center deployments
+ Efficient use of secondary indexes
+ Reduced network load
+ Reduced CPU load

- No immediate consistency
- Does not reduce index or working set size
- Every node in a replica set carries roughly the same write load

SLIDE 45

Improving read throughput - Sharding

+ Reduces index and working set size
+ Strong consistency
+ Also improves write throughput

- Queries on secondary indexes are inefficient (scatter-gather), so data de-normalization may be required
- Requires additional nodes for config servers and mongos

SLIDE 46

Further Readings

  • SERVER-5931: Secondary reads in sharded clusters need stronger consistency
  • Server Selection Specification
  • How to clean orphaned documents
SLIDE 47

Thank you

Contact: bartlomiej.nogas@allegrogroup.com