Highly resilient, multi-region Keystone deployments



SLIDE 1

Highly resilient, multi-region Keystone deployments

Michael Richardson // 22 May 2018

SLIDE 5

Purpose

Caveats

SQL-backed deployment
All regions are considered equal
"Standard" regions, not singular Edge nodes

SLIDE 6

Region1: Compute, Storage, Network, Orchestration, Images, Dashboard, Billing, LBaaS, VPNaaS, GPUs, CaaS, BYON, BYOIP

SLIDE 7

Region1: Compute, Storage, Network, Orchestration, Images, Dashboard, Billing, LBaaS, VPNaaS, Identity, GPUs, CaaS, BYON, BYOIP

SLIDE 8

Region1: Compute, Storage, Network, Orchestration, Images, Dashboard, Billing, LBaaS, VPNaaS, Identity, GPUs, CaaS, BYON, BYOIP
Region2: Compute, Storage, Network, Orchestration, Images, Dashboard, Billing, LBaaS, VPNaaS, Identity, GPUs, CaaS, BYON, BYOIP

SLIDE 9

Region1: Compute, Storage, Network, Orchestration, Images, Dashboard, Billing, LBaaS, VPNaaS, Identity, GPUs, CaaS, BYON, BYOIP
Region2: Compute, Storage, Network, Orchestration, Images, Dashboard, Billing, LBaaS, VPNaaS, Identity, GPUs, CaaS, BYON, BYOIP
Region3: Compute, Storage, Network, Orchestration, Images, Dashboard, Billing, LBaaS, VPNaaS, Identity, GPUs, CaaS, BYON, BYOIP

SLIDE 13

What if Keystone should fail?

No API requests
No Dashboard
No metrics collection
Data plane AOK
No orchestration

SLIDE 14

Keystone (then)

uWSGI + Nginx, HAProxy, Memcached
MariaDB Galera cluster
Giftwrapped Python virtualenv
Kilo, UUID tokens

SLIDE 15

[Diagram: The Internets; external proxies; internal proxies; API nodes 1-3; Memcache nodes 1-3; DB nodes 1-3]

SLIDE 16

Requirements

Loss of a region should not affect the operation of any other region

Major partition must continue as before
Minor partition may continue in read-only mode

All users and project data should be available in each region
Self-healing

SLIDE 17

Options

SLIDE 18

Multiple Keystones
A single Keystone

SLIDE 19

Federation

Solves a different problem.

SLIDE 20

CockroachDB

Too soon.
http://lists.openstack.org/pipermail/openstack-dev/2017-May/117018.html

SLIDE 21

Master-slave replication

Asynchronous.

SLIDE 22

Circular replication

Ye-olde multi-master.

SLIDE 23

Galera site-to-site replication

No benefit at this scale.

SLIDE 24

Design

A single, inter-region Galera cluster, with an odd number of nodes

Nodes in all regions
Providing data for all regions

One Keystone database

Synchronous replication
Inserts/updates/locks must be negligible
Fernet tokens
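
The odd node count matters because a Galera cluster keeps only the partition that holds a strict majority of votes writable. A simplified sketch of that majority rule follows. The three-nodes-per-region layout and the equal node weights are assumptions for illustration, not figures from the talk.

```python
# Hypothetical layout: three regions with three Galera nodes each (an assumption).
NODES_PER_REGION = {"region1": 3, "region2": 3, "region3": 3}


def has_quorum(reachable_regions, nodes_per_region=NODES_PER_REGION):
    """Return True if the nodes in reachable_regions form a strict majority.

    Assumes every node carries one vote (Galera's default weight).
    """
    total = sum(nodes_per_region.values())
    reachable = sum(nodes_per_region[r] for r in reachable_regions)
    return 2 * reachable > total


if __name__ == "__main__":
    # Losing one region leaves 6 of 9 votes: the major partition carries on.
    print(has_quorum({"region1", "region2"}))
    # An isolated region holds 3 of 9 votes: no quorum.
    print(has_quorum({"region3"}))
```

With an even total, a clean split leaves neither side with a majority, which is why the design calls for an odd number of nodes.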

SLIDE 25

Design

Distinct authentication endpoints in each region that back onto the same DB cluster
Other regions must be able to take over if all local nodes are down

Frontend, backend proxy configuration

Region-specific service (Nova, Neutron, etc.) DB clusters

But pointing to the inter-region Keystone service

SLIDE 26

Implementation

SLIDE 27

Keystone upgrade

Upgrade the Keystone APIs

Kilo to Mitaka
One release, micro-(scheduled)-outage
Straightforward

SLIDE 28

Keystone configuration

Multiple Keystone API nodes per region

As before (Nginx + uWSGI)
All sitting behind redundant HAProxy nodes

SLIDE 29

Multi-factor authentication

Password plus TOTP

Can be enabled per account by users
Works with the APIs

https://github.com/catalyst-cloud/adjutant-mfa
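
For reference, the TOTP half of "password plus TOTP" is the algorithm from RFC 6238. The sketch below uses only the Python standard library and is not taken from adjutant-mfa. The 30-second step, six digits and SHA-1 are the RFC defaults and are assumed here.

```python
import hashlib
import hmac
import struct
import time


def totp(secret: bytes, at_time=None, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) for the given raw secret."""
    if at_time is None:
        at_time = time.time()
    counter = int(at_time) // step
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


if __name__ == "__main__":
    # RFC 6238 test secret; a real deployment would use a per-user secret.
    print(totp(b"12345678901234567890"))
```

How the code is combined with the password on the wire is up to the plugin and is not shown here.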

SLIDE 33

What's inside?

curl $keystone_endpoint:35357/v3/auth/tokens \
  -H "X-Subject-Token: {{ fernet_token }}" \
  -H "X-Auth-Token: {{ admin_token }}" \
  | python -m json.tool
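
The call above asks Keystone to validate and decode the token. A little metadata is also readable from the token string itself: per the Fernet specification, a token is URL-safe base64 over a version byte (0x80), a 64-bit big-endian creation timestamp, a 128-bit IV, the ciphertext and a 256-bit HMAC. A minimal sketch that reads the unencrypted header, assuming the token may arrive with its base64 padding stripped; the payload itself stays opaque without the key.

```python
import base64
import struct


def fernet_header(token: str):
    """Return (version, created_at) from a Fernet token without decrypting it."""
    padded = token + "=" * (-len(token) % 4)  # restore any stripped padding
    raw = base64.urlsafe_b64decode(padded)
    version, created_at = struct.unpack(">BQ", raw[:9])
    if version != 0x80:
        raise ValueError("not a Fernet token")
    return version, created_at
```

`created_at` is seconds since the Unix epoch, so it can be passed to `datetime.fromtimestamp` to see when the token was issued.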

SLIDE 35

Inter-region Galera cluster

Redundant links between regions
Odd number of nodes per region

wsrep_dirty_reads = 1
wsrep_sync_wait = 0
wsrep_slave_threads = "n. cores"
max_connections = "high"
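
As a rough illustration, the settings above can be rendered into a `[mysqld]` fragment with the two placeholder values filled in. This is a sketch under assumptions: the thread count is taken from the local core count, and 4096 for `max_connections` is an arbitrary stand-in for "high", not a value from the talk.

```python
import os


def galera_settings(cores=None, max_connections=4096):
    """Build the wsrep options from the slide as a my.cnf [mysqld] fragment.

    max_connections=4096 is a placeholder for "high", not a recommendation.
    """
    cores = cores or os.cpu_count() or 1
    options = {
        "wsrep_dirty_reads": 1,          # allow reads on a non-primary node
        "wsrep_sync_wait": 0,            # no causality wait before reads
        "wsrep_slave_threads": cores,    # one applier thread per core
        "max_connections": max_connections,
    }
    lines = ["[mysqld]"] + [f"{k} = {v}" for k, v in options.items()]
    return "\n".join(lines)


if __name__ == "__main__":
    print(galera_settings())
```

`wsrep_dirty_reads` is what lets a node in a minor partition keep answering reads, which matches the read-only requirement stated earlier in the deck.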

SLIDE 36

Inter-node connectivity

mysql = tcp/3306
galera replication = {tcp,udp}/4567
galera IST = tcp/4568
galera SST = tcp/4444
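
A quick way to confirm that firewalls between regions pass these ports is to attempt a TCP connection to each from a peer node. A minimal sketch using the standard library; it covers the TCP ports only, and the function names are illustrative.

```python
import socket

# TCP ports from the slide; 4567 also carries UDP, which this check skips.
GALERA_TCP_PORTS = {"mysql": 3306, "replication": 4567, "ist": 4568, "sst": 4444}


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_node(host, ports=GALERA_TCP_PORTS):
    """Map each service name to whether its TCP port is reachable on host."""
    return {name: tcp_port_open(host, port) for name, port in ports.items()}
```

The IST and SST ports are only listening while a transfer is in progress, so a closed result there does not by itself indicate a firewall problem.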

SLIDE 37

Testing

Lock/load test across all regions

pseudo-code {
  while true; do
    check cluster size, cluster status
    for each in region A B C; do
      locking-operation in region ${each}
      for region in A B C; do
        read and verify from ${region}
      done
    done
  done
}
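
One round of the loop above might be sketched as follows. The three callables are hypothetical hooks standing in for real database access (cluster status query, a locking write, a read); none of them come from the talk.

```python
def lock_load_round(regions, cluster_ok, write_locked, read_back):
    """Run one round: write in each region, then verify the write everywhere.

    cluster_ok, write_locked and read_back are caller-supplied callables
    (hypothetical hooks standing in for real database access).
    Returns a list of (written_in, read_from) pairs that failed to verify.
    """
    if not cluster_ok():
        raise RuntimeError("cluster size/status check failed")
    failures = []
    for source in regions:
        value = write_locked(source)          # locking operation in `source`
        for target in regions:
            if read_back(target) != value:    # read and verify from `target`
                failures.append((source, target))
    return failures
```

Calling this in a `while True` loop and logging any non-empty result reproduces the shape of the test on the slide.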

SLIDE 38

Memcache configuration

One thread per core
512 MB for storage
Objects cached for 60 minutes
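
The first two settings map onto memcached's `-t` (threads) and `-m` (memory in MB) options. A small sketch that builds the command line from the local core count; where the 60-minute lifetime is configured is not stated on the slide, so it appears only as a constant in seconds.

```python
import os


def memcached_command(memory_mb=512, threads=None):
    """Build a memcached command line: -m is memory in MB, -t is thread count."""
    threads = threads or os.cpu_count() or 1
    return ["memcached", "-m", str(memory_mb), "-t", str(threads)]


# The 60-minute object lifetime, expressed in seconds for a cache TTL setting.
CACHE_TTL_SECONDS = 60 * 60
```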

SLIDE 40

Proxies and endpoints

Final cutover

Internal, external proxies updated
Migrate to new cluster

Endpoints in each region were now active, but not advertised.

Services in each region were re-pointed to the local endpoint
Additional endpoints now added to the Catalog

Endpoints "unversioned"

SLIDE 41

Proxies and endpoints

External proxy configuration

Local endpoints utilise remote regions' endpoints as backup servers
Allows the other regions to take over
Transparent failover
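
In HAProxy terms this is the `backup` keyword on a `server` line: backup servers receive traffic only once every non-backup server in the backend is down. The sketch below renders such a backend. All names and hostnames are placeholders, and port 5000 (Keystone's conventional public port) is an assumption rather than something shown in the talk.

```python
def haproxy_backend(name, local_servers, remote_servers, port=5000):
    """Render an HAProxy backend where remote-region servers are backups.

    HAProxy only sends traffic to `backup` servers once every non-backup
    server in the backend is down.
    """
    lines = [f"backend {name}", "    balance roundrobin"]
    for label, host in local_servers:
        lines.append(f"    server {label} {host}:{port} check")
    for label, host in remote_servers:
        lines.append(f"    server {label} {host}:{port} check backup")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hostnames below are placeholders, not taken from the talk.
    print(haproxy_backend(
        "keystone_region1",
        [("api1", "api1.region1.example"), ("api2", "api2.region1.example")],
        [("region2", "keystone.region2.example"),
         ("region3", "keystone.region3.example")],
    ))
```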

SLIDE 43

[Diagram: the layout from slide 15 (The Internets; external proxies; internal proxies; API, Memcache and DB nodes 1-3), shown three times, once per region]

SLIDE 48

Where to from here?

Global endpoints
MFA in/from master
Keystone from $release n-1.

SLIDE 49

Summary

Keystone: Fernet with caching
Galera: single DB, geo-distributed, redundant paths

SLIDE 50

Summary

External proxies

Other regions must be present as backup servers

Keepalived

DNS round-robin between multiple VRRP addresses
Configure each to have one master, the others as backup
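
The Keepalived arrangement can be sketched as a role plan: each VRRP address gets a different master proxy, with the remaining proxies as backups, and DNS then carries one A record per address. The proxy names, addresses and priority values below are illustrative assumptions.

```python
def vrrp_roles(proxies, vips):
    """Assign each VIP a different MASTER proxy; all other proxies are BACKUP.

    Returns {vip: {proxy: (state, priority)}}. Priorities are illustrative:
    the master gets the highest value, as keepalived elects on priority.
    """
    plan = {}
    for i, vip in enumerate(vips):
        master = proxies[i % len(proxies)]
        plan[vip] = {
            p: ("MASTER", 150) if p == master else ("BACKUP", 100)
            for p in proxies
        }
    return plan


if __name__ == "__main__":
    # Documentation-range addresses and made-up proxy names.
    for vip, roles in vrrp_roles(["proxy1", "proxy2"],
                                 ["192.0.2.10", "192.0.2.11"]).items():
        print(vip, roles)
```

Each entry would translate into one `vrrp_instance` block per proxy, so every proxy is active for one address and standby for the others.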

SLIDE 51

Thank you
