SCALE OUT AND CONQUER: ARCHITECTURAL DECISIONS BEHIND DISTRIBUTED IN-MEMORY SYSTEMS

slide-1
SLIDE 1

SCALE OUT AND CONQUER:

ARCHITECTURAL DECISIONS BEHIND DISTRIBUTED IN-MEMORY SYSTEMS

VLADIMIR OZEROV YAKOV ZHDANOV

slide-2
SLIDE 2

WHO?

Yakov Zhdanov:

  • GridGain’s Product Development VP
  • With GridGain since 2010
  • Apache Ignite committer and PMC
  • Passion for performance & scalability
  • Finding ways to make the product better
  • St. Petersburg, Russia
slide-3
SLIDE 3

WHY IN-MEMORY?

slide-4
SLIDE 4

PLAN

  • 1. Data partitioning and affinity function examples
slide-5
SLIDE 5

PLAN

  • 1. Data partitioning and affinity function examples
  • 2. Data affinity colocation
slide-6
SLIDE 6

PLAN

  • 1. Data partitioning and affinity function examples
  • 2. Data affinity colocation
  • 3. Synchronization in distributed systems
slide-7
SLIDE 7

PLAN

  • 1. Data partitioning and affinity function examples
  • 2. Data affinity colocation
  • 3. Synchronization in distributed systems
  • 4. Multithreading: local architecture
slide-8
SLIDE 8

PLAN

  • 1. Data partitioning and affinity function examples
  • 2. Data affinity colocation
  • 3. Synchronization in distributed systems
  • 4. Multithreading: local architecture
slide-9
SLIDE 9

On which node of the cluster does the key reside?

WHERE?

slide-10
SLIDE 10

AFFINITY

Partition → Node

slide-11
SLIDE 11

AFFINITY

slide-12
SLIDE 12

AFFINITY

Key → Partition → Node
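
A minimal sketch of the first step, key → partition, assuming the common approach of hashing the key into a fixed number of partitions (the constant and method name below are illustrative, not Ignite's API):

    // Fixed number of partitions, independent of the cluster size.
    static final int PARTITIONS = 1024;

    // Key -> partition: hash the key and wrap it into [0, PARTITIONS).
    static int partition(Object key) {
        int h = key.hashCode();
        return (h == Integer.MIN_VALUE ? 0 : Math.abs(h)) % PARTITIONS;
    }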

slide-13
SLIDE 13

AFFINITY

slide-14
SLIDE 14

NAIVE AFFINITY

slide-15
SLIDE 15

NAIVE AFFINITY

slide-16
SLIDE 16

NAIVE AFFINITY

slide-17
SLIDE 17

NAIVE AFFINITY

slide-18
SLIDE 18

NAIVE AFFINITY

slide-19
SLIDE 19

NAIVE AFFINITY

slide-20
SLIDE 20

NAIVE AFFINITY

NODE = F(PARTITION, NODES_COUNT);

Problem: the partition-to-node mapping depends on the node count, so every topology change remaps most partitions.
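
A minimal sketch of such a naive function in Java (names are illustrative): the owning node is simply the partition modulo the current node count, which is exactly why the mapping shifts when nodes join or leave.

    // Naive affinity: node index = partition modulo node count.
    static int naiveNode(int partition, int nodesCount) {
        return partition % nodesCount;
    }

    // naiveNode(5, 3) == 2, but after one more node joins
    // naiveNode(5, 4) == 1 -- most partitions move on every topology change.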

slide-21
SLIDE 21

AFFINITY: BETTER ALGORITHMS

  • Consistent hashing [1]
  • Rendezvous hashing (highest random weight, HRW) [2]

[1] https://en.wikipedia.org/wiki/Consistent_hashing
[2] https://en.wikipedia.org/wiki/Rendezvous_hashing

slide-22
SLIDE 22

RENDEZVOUS AFFINITY

WEIGHT = W(PARTITION, NODE);
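
A hedged sketch of rendezvous (HRW) affinity built on the weight function above; the hash used here is illustrative. Every partition is owned by the node with the highest weight, so when a node joins or leaves, only the partitions that this node wins or loses are moved.

    // Weight of a (partition, node) pair; any stable hash will do.
    static long weight(int partition, String nodeId) {
        return java.util.Objects.hash(partition, nodeId) & 0xFFFFFFFFL;
    }

    // The partition is assigned to the node with the highest weight.
    static String nodeFor(int partition, java.util.List<String> nodes) {
        String best = null;
        long bestWeight = Long.MIN_VALUE;

        for (String node : nodes) {
            long w = weight(partition, node);

            if (w > bestWeight) {
                bestWeight = w;
                best = node;
            }
        }

        return best;
    }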

slide-23
SLIDE 23

RENDEZVOUS AFFINITY

slide-24
SLIDE 24

RENDEZVOUS AFFINITY

slide-25
SLIDE 25

RENDEZVOUS AFFINITY

slide-26
SLIDE 26

RENDEZVOUS AFFINITY

slide-27
SLIDE 27

RENDEZVOUS AFFINITY

slide-28
SLIDE 28

RENDEZVOUS AFFINITY: EVEN DISTRIBUTION?

slide-29
SLIDE 29

PLAN

  • 1. Data partitioning and affinity function examples
  • 2. Data affinity colocation
  • 3. Synchronization in distributed systems
  • 4. Multithreading: local architecture
slide-30
SLIDE 30

TRANSACTIONS: NO COLOCATION

class Customer {
    long id;
    City city;
}

slide-31
SLIDE 31

TRANSACTIONS: NO COLOCATION

slide-32
SLIDE 32

TRANSACTIONS: NO COLOCATION

2 (2 nodes)

slide-33
SLIDE 33

TRANSACTIONS: NO COLOCATION

2 (2 nodes) 2 (primary + backup)

slide-34
SLIDE 34

TRANSACTIONS: NO COLOCATION

2 (2 nodes) 2 (primary + backup) 2 (two-phase commit)

slide-35
SLIDE 35

TRANSACTIONS: NO COLOCATION

2 (2 nodes) 2 (primary + backup) 2 (two-phase commit) 2 (request-response)

slide-36
SLIDE 36

TRANSACTIONS: NO COLOCATION

2 (2 nodes) × 2 (primary + backup) × 2 (two-phase commit) × 2 (request-response)

= 16 Messages

slide-37
SLIDE 37

TRANSACTIONS: NO COLOCATION

slide-38
SLIDE 38

TRANSACTIONS: WITH COLOCATION

class Customer {
    long id;

    @AffinityKeyMapped
    City city;
}
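
For context, a hedged sketch of how such colocation is usually expressed in Ignite: the annotated field lives on the cache key, so every customer of a given city hashes to the same partition and therefore the same primary node. The class and field names below are illustrative, not taken from the slides (equals/hashCode omitted for brevity).

    import org.apache.ignite.cache.affinity.AffinityKeyMapped;

    class CustomerKey {
        long customerId;

        // All keys with the same cityId land in the same partition.
        @AffinityKeyMapped
        long cityId;

        CustomerKey(long customerId, long cityId) {
            this.customerId = customerId;
            this.cityId = cityId;
        }
    }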

slide-39
SLIDE 39

TRANSACTIONS: WITH COLOCATION

slide-40
SLIDE 40

TRANSACTIONS: WITH COLOCATION

1 (1 node)

slide-41
SLIDE 41

TRANSACTIONS: WITH COLOCATION

1 (1 node) 2 (primary + backup)

slide-42
SLIDE 42

TRANSACTIONS: WITH COLOCATION

1 (1 node) 2 (primary + backup) (one-phase commit)

slide-43
SLIDE 43

TRANSACTIONS: WITH COLOCATION

1 (1 node) × 2 (primary + backup) × 1 (one-phase commit) × 2 (request-response)

slide-44
SLIDE 44

TRANSACTIONS: WITH COLOCATION

1 (1 node) × 2 (primary + backup) × 1 (one-phase commit) × 2 (request-response)

= 4 Messages

slide-45
SLIDE 45

TRANSACTIONS: COLOCATION VS NO COLOCATION

4 Messages

VS

16 Messages

slide-46
SLIDE 46

SQL

Let’s run a query on our data.
slide-47
SLIDE 47

SQL

No colocation: FULL SCAN

slide-48
SLIDE 48

SQL

No colocation: FULL SCAN

slide-49
SLIDE 49

SQL

No colocation: FULL SCAN

1/3x Latency

slide-50
SLIDE 50

SQL

No colocation: FULL SCAN

1/3x Latency 3x Capacity

slide-51
SLIDE 51

SQL

slide-52
SLIDE 52

SQL

1 node

slide-53
SLIDE 53

SQL

1 node N nodes

slide-54
SLIDE 54

SQL

What about complexity?

log₂(1_000_000) ≈ 20

slide-55
SLIDE 55

SQL

What about complexity?

log₂(1_000_000) ≈ 20
vs
log₂(333_333) ≈ 18 on each of the three nodes
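
A quick worked step (assuming the figures above are base-2 lookup costs): splitting the rows across three nodes only subtracts log₂ 3 from each index lookup.

    log₂(333_333) = log₂(1_000_000) − log₂(3) ≈ 19.93 − 1.58 ≈ 18.35

So an indexed lookup gets only about two comparisons cheaper, while every query still has to visit all three nodes.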

slide-56
SLIDE 56

SQL: INDEXED

slide-57
SLIDE 57

SQL

No colocation: INDEXED QUERY

Same latency! Same capacity!

slide-58
SLIDE 58

SQL: INDEX AND COLOCATION

Colocation: INDEXED QUERY

slide-59
SLIDE 59

SQL

Colocation: INDEXED QUERY

slide-60
SLIDE 60

SQL: INDEX AND COLOCATION

Colocation: INDEXED QUERY

Same latency, but 3x capacity!

slide-61
SLIDE 61

SQL: EVEN DISTRIBUTION WITH COLOCATION?

slide-62
SLIDE 62

SQL: JOINS IN DISTRIBUTED ENVIRONMENT

slide-63
SLIDE 63

SQL: JOINS WITH COLOCATION

slide-64
SLIDE 64

SQL: JOINS WITH REPLICATION

slide-65
SLIDE 65

PLAN

  • 1. Data partitioning and affinity function examples
  • 2. Data affinity colocation
  • 3. Synchronization in distributed systems
  • 4. Multithreading: local architecture
slide-66
SLIDE 66

SYNCHRONIZATION: LOCAL COUNTER

AtomicLong ctr;

long getNext() {
    return ctr.incrementAndGet();
}

slide-67
SLIDE 67

SYNCHRONIZATION: LOCAL (RE-INVENTING A BICYCLE)

AtomicLong ctr = new AtomicLong();

// Each thread reserves a batch of 1000 IDs from the shared counter,
// then hands them out locally without touching the AtomicLong.
ThreadLocal<Long> localCtr = ThreadLocal.withInitial(() -> 0L);

long getNext() {
    long res = localCtr.get();

    // Batch exhausted (or first call): reserve the next 1000 IDs.
    if (res % 1000 == 0)
        res = ctr.getAndAdd(1000);

    localCtr.set(++res);

    return res;
}

slide-68
SLIDE 68

SYNCHRONIZATION: LOCAL

slide-69
SLIDE 69

SYNCHRONIZATION: DISTRIBUTED

slide-70
SLIDE 70

SYNCHRONIZATION: COUNTER IN THE CLUSTER

Distributed implementation: thousands of ops/sec
Local implementation: millions of ops/sec

slide-71
SLIDE 71

SYNCHRONIZATION: COUNTER IN THE CLUSTER

Proper requirements:

  • Unique
  • Monotonically increasing
  • 8 bytes

slide-72
SLIDE 72

SYNCHRONIZATION: COUNTER IN THE CLUSTER

Requirements: unique, monotonically increasing, 8 bytes.

slide-73
SLIDE 73

SYNCHRONIZATION: COUNTER IN THE CLUSTER

Requirements: unique, monotonically increasing, 8 bytes.

slide-74
SLIDE 74

SYNCHRONIZATION: COUNTER IN THE CLUSTER

Requirements: unique, monotonically increasing, 8 bytes.

slide-75
SLIDE 75

SYNCHRONIZATION: COUNTER IN THE CLUSTER

Requirements: unique, monotonically increasing, 8 bytes.

See also: org.apache.ignite.lang.IgniteUuid
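
One way to meet "unique + 8 bytes" without any cluster round-trips is to dedicate the high bits of the long to a per-node prefix and fill the low bits from a node-local counter. The sketch below is an illustration of that idea, not the layout of IgniteUuid, and the IDs it produces are only monotonic within one node.

    import java.util.concurrent.atomic.AtomicLong;

    // Hypothetical local ID generator: high 16 bits identify the node,
    // low 48 bits come from a node-local counter. No cluster messages.
    class LocalIdGenerator {
        private final long nodePrefix;       // e.g. the node's order in the topology
        private final AtomicLong seq = new AtomicLong();

        LocalIdGenerator(int nodeOrder) {
            this.nodePrefix = ((long) nodeOrder) << 48;
        }

        long next() {
            // Unique cluster-wide, fits in 8 bytes, monotonic per node.
            return nodePrefix | seq.incrementAndGet();
        }
    }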

slide-76
SLIDE 76

SYNCHRONIZATION AS FRICTION FOR A CAR

slide-77
SLIDE 77

SYNCHRONIZATION: DATA TO CODE

slide-78
SLIDE 78

SYNCHRONIZATION: DATA TO CODE

Account acc = cache.get(accKey);

acc.add(100);

cache.put(accKey, acc);

slide-79
SLIDE 79

SYNCHRONIZATION: DATA TO CODE

// The whole Account value crosses the network twice:
// once for the get, once for the put.
Account acc = cache.get(accKey);

acc.add(100);

cache.put(accKey, acc);

slide-80
SLIDE 80

SYNCHRONIZATION: CODE TO DATA

slide-81
SLIDE 81

SYNCHRONIZATION: CODE TO DATA

cache.invoke(accKey, (entry, args) -> {
    Account acc = entry.getValue();

    acc.add(100);

    entry.setValue(acc);

    return null;
});

slide-82
SLIDE 82

SYNCHRONIZATION: CODE TO DATA

// Only the closure is shipped to the node that owns accKey;
// the Account value never leaves that node.
cache.invoke(accKey, (entry, args) -> {
    Account acc = entry.getValue();

    acc.add(100);

    entry.setValue(acc);

    return null;
});

slide-83
SLIDE 83

SYNCHRONIZATION: DATA TO CODE

What if we have a bug?!

slide-84
SLIDE 84

SYNCHRONIZATION: CODE TO DATA

What if we have a bug?!

slide-85
SLIDE 85

SYNCHRONIZATION: CODE TO DATA

What if we have a bug?!


slide-86
SLIDE 86

PLAN

  • 1. Data partitioning and affinity function examples
  • 2. Data affinity colocation
  • 3. Synchronization in distributed systems
  • 4. Multithreading: local architecture
slide-87
SLIDE 87

LOCAL TASKS DISTRIBUTION

slide-88
SLIDE 88

LOCAL TASKS DISTRIBUTION

slide-89
SLIDE 89

LOCAL TASKS DISTRIBUTION

slide-90
SLIDE 90

LOCAL TASKS DISTRIBUTION

slide-91
SLIDE 91

LOCAL TASKS DISTRIBUTION

slide-92
SLIDE 92

LOCAL TASKS DISTRIBUTION: THREAD PER PARTITION

slide-93
SLIDE 93

LOCAL TASKS DISTRIBUTION: THREAD PER PARTITION

slide-94
SLIDE 94

LOCAL TASKS DISTRIBUTION: THREAD PER PARTITION

slide-95
SLIDE 95

LOCAL TASKS DISTRIBUTION: THREAD PER PARTITION
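
A hedged sketch of the thread-per-partition idea (the stripe count and routing below are illustrative assumptions, not Ignite internals): every partition is pinned to one worker thread, so operations on the same partition run single-threaded and need no locks, while different partitions still proceed in parallel.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Hypothetical striped executor: partition -> dedicated single thread.
    class PartitionExecutor {
        private final ExecutorService[] stripes;

        PartitionExecutor(int stripeCount) {
            stripes = new ExecutorService[stripeCount];

            for (int i = 0; i < stripeCount; i++)
                stripes[i] = Executors.newSingleThreadExecutor();
        }

        // All tasks for one partition run on the same thread, so per-partition
        // state needs no locking. Simple operations get faster, but one
        // long-running task delays everything else queued for that partition.
        void execute(int partition, Runnable task) {
            stripes[partition % stripes.length].execute(task);
        }
    }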

slide-96
SLIDE 96

LESSONS LEARNED

1) Data partitioning: balance and stability

slide-97
SLIDE 97

LESSONS LEARNED

1) Data partitioning: balance and stability
2) Colocation: balance and efficiency

slide-98
SLIDE 98

LESSONS LEARNED

1) Data partitioning: balance and stability
2) Colocation: balance and efficiency
3) Data model: should be adapted accordingly

slide-99
SLIDE 99

LESSONS LEARNED

1) Data partitioning: balance and stability
2) Colocation: balance and efficiency
3) Data model: should be adapted accordingly
4) Synchronization: delicate and only when really needed

slide-100
SLIDE 100

LESSONS LEARNED

1) Data partitioning: balance and stability
2) Colocation: balance and efficiency
3) Data model: should be adapted accordingly
4) Synchronization: delicate and only when really needed
5) Thread per partition: can improve simple operations, but may also slow down complex ones

slide-101
SLIDE 101

CONTACTS

yzhdanov@gridgain.com
http://ignite.apache.org
dev@ignite.apache.org
user@ignite.apache.org

slide-102
SLIDE 102

QUESTIONS?

ANY QUESTIONS?