Highly resilient, multi-region Keystone deployments

Michael Richardson // 22 May 2018


  1. Highly resilient, multi-region Keystone deployments
     Michael Richardson // 22 May 2018

  4. Purpose. Caveats:
     - SQL-backed deployment
     - All regions are considered equal
     - "Standard" regions, not singular Edge nodes

  5. [Diagram] Region1 services: Compute, Storage, Network, Images, Dashboard, Billing, VPNaaS, LBaaS, Orchestration, GPUs, CaaS, BYON, BYOIP

  6. [Diagram] Region1 services, now with Identity: Compute, Storage, Network, Images, Dashboard, Billing, VPNaaS, Identity, LBaaS, Orchestration, GPUs, CaaS, BYON, BYOIP

  7. [Diagram] Region1 and Region2, each with the full service set from slide 6, including Identity

  8. [Diagram] Region1, Region2 and Region3, each with the full service set from slide 6, including Identity

  12. What if Keystone should fail?
      - No API requests
      - No Dashboard
      - No metrics collection
      - Data plane AOK
      - No orchestration

  13. Keystone (then)
      - uWSGI + Nginx, HAProxy, Memcached
      - MariaDB Galera cluster
      - Giftwrapped Python virtualenv
      - Kilo, UUID tokens

  14. [Diagram] The Internets; External proxies; API node1, node2, node3; Memcache node1, node2, node3; Internal proxies; DB node1, node2, node3

  15. Requirements
      - Loss of a region should not affect the operation of any other region
      - Major partition must continue as before
      - Minor partition may continue in read-only mode
      - All users and project data should be available in each region
      - Self-healing

  16. Options

  17. Multiple Keystones / A single Keystone

  18. Federation: Solves a different problem.

  19. CockroachDB: Too soon. http://lists.openstack.org/pipermail/openstack-dev/2017-May/117018.html

  20. Master-slave replication: Asynchronous.

  21. Circular replication: Ye-olde multi-master.

  22. Galera site-to-site replication: No benefit at this scale.

  23. Design
      - A single, inter-region Galera cluster, with an odd number of nodes
      - Nodes in all regions
      - Providing data for all regions
      - One Keystone database
      - Synchronous replication
      - Inserts/updates/locks must be negligible
      - Fernet tokens
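
The preference for an odd number of nodes follows from majority voting: a Galera partition stays writable only while it can still see more than half of the cluster. A minimal sketch of that arithmetic, assuming equal node weights and a hypothetical layout of three nodes in each of regions A, B and C (the slides do not give node counts):

```python
def has_quorum(reachable: int, total: int) -> bool:
    """True when a partition sees strictly more than half of all nodes."""
    return 2 * reachable > total


def partition_status(partitions: dict[str, int]) -> dict[str, str]:
    """Label each partition by whether it would keep quorum.

    `partitions` maps a partition name to the number of nodes it contains.
    """
    total = sum(partitions.values())
    return {
        name: "primary" if has_quorum(size, total) else "non-primary"
        for name, size in partitions.items()
    }


if __name__ == "__main__":
    # Hypothetical layout: 3 regions x 3 nodes; region C is cut off.
    print(partition_status({"A+B": 6, "C": 3}))
    # An even two-way split leaves no side with a majority.
    print(partition_status({"A": 2, "B": 2}))
```

With an even node count, a clean split can leave every side without a majority, which is the case an odd total avoids.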

  24. Design
      - Distinct authentication endpoints in each region that back onto the same DB cluster
      - Other regions must be able to take over if all nodes are down
      - Frontend, backend proxy configuration
      - Region-specific service (Nova, Neutron etc.) DB clusters
      - But, pointing to the inter-region Keystone service

  25. Implementation

  26. Keystone upgrade
      - Upgrade the Keystone APIs
      - Kilo to Mitaka
      - One release, micro-(scheduled)-outage
      - Straightforward

  27. Keystone configuration
      - Multiple Keystone API nodes per region
      - As before (Nginx + uWSGI)
      - All sitting behind redundant HAProxy nodes

  28. Multi-factor authentication
      - Password plus TOTP
      - Can be enabled per account by users
      - Works with the APIs
      - https://github.com/catalyst-cloud/adjutant-mfa

  32. What's inside?

          # Ask Keystone to validate a Fernet token and pretty-print the response
          curl $keystone_endpoint:35357/v3/auth/tokens \
            -H "X-Subject-Token: {{ fernet_token }}" \
            -H "X-Auth-Token: {{ admin_token }}" \
            | python -m json.tool
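
The same token lookup can be scripted. The sketch below builds the identical `GET /v3/auth/tokens` request with the Python standard library. The endpoint hostname and token values are placeholders, not values from the talk, and `validate()` is only useful against a reachable Keystone.

```python
import json
import urllib.request


def build_validate_request(endpoint: str, subject_token: str, admin_token: str):
    """Build (but do not send) GET /v3/auth/tokens for the given token."""
    return urllib.request.Request(
        f"{endpoint}/v3/auth/tokens",
        headers={
            "X-Subject-Token": subject_token,  # token being inspected
            "X-Auth-Token": admin_token,       # token authorising the lookup
        },
        method="GET",
    )


def validate(endpoint: str, subject_token: str, admin_token: str) -> dict:
    """Send the request and return the decoded JSON body."""
    req = build_validate_request(endpoint, subject_token, admin_token)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical endpoint and tokens; substitute real values before calling validate().
    req = build_validate_request("http://keystone.example:35357", "FERNET", "ADMIN")
    print(req.get_method(), req.full_url)
```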

  34. Inter-region Galera cluster
      - Redundant links between regions
      - Odd number of nodes per region
      - wsrep_dirty_reads = 1
      - wsrep_sync_wait = 0
      - wsrep_slave_threads = "n. cores"
      - max_connections = "high"

  35. Inter-node connectivity
      - mysql = tcp/3306
      - galera replication = {tcp,udp}/4567
      - galera IST = tcp/4568
      - galera SST = tcp/4444
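
A quick way to confirm that these ports are open between regions is a TCP connect probe. This is a minimal sketch using only the standard library. The hostname in the usage note is hypothetical, and the UDP side of port 4567 is not covered.

```python
import socket

# TCP ports from the slide; 4567 also carries UDP, which this probe does not test.
GALERA_TCP_PORTS = {
    "mysql": 3306,
    "galera replication": 4567,
    "galera IST": 4568,
    "galera SST": 4444,
}


def tcp_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_node(host: str, timeout: float = 2.0) -> dict[str, bool]:
    """Probe every Galera-related TCP port on one node."""
    return {
        name: tcp_open(host, port, timeout)
        for name, port in GALERA_TCP_PORTS.items()
    }
```

For example, `check_node("db-node1.region-b.example")` returns one boolean per port. A refused connection on the IST or SST port may only mean that no state transfer is in progress, so treat those two results with care.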

  36. Testing
      - Lock/load test across all regions
      - Pseudo-code:

            while true; do
                check cluster size, cluster status
                for each in A B C; do
                    locking-operation in region ${each}
                    for region in A B C; do
                        read and verify from ${region}
                    done
                done
            done
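
The test loop can be sketched in Python with the database operations injected as callables, so the control flow can be exercised without a cluster. The function name, the callables and the in-memory stand-in are illustrative assumptions. A real run would replace them with a locking write and a read against a node in each region.

```python
from itertools import count
from typing import Callable, Iterable


def run_lock_test(
    write: Callable[[str, int], None],
    read: Callable[[str], object],
    cluster_ok: Callable[[], bool],
    regions: Iterable[str] = ("A", "B", "C"),
    rounds: int = 1,
) -> list[tuple]:
    """Write in each region in turn and verify the value from every region.

    Returns a list of (writer, reader, value_seen) tuples for each mismatch,
    plus ("cluster", None, None) for any round where the cluster check failed.
    """
    regions = tuple(regions)
    failures = []
    seq = count(1)
    for _ in range(rounds):
        if not cluster_ok():  # cluster size / status check
            failures.append(("cluster", None, None))
            continue
        for writer in regions:
            value = next(seq)
            write(writer, value)  # locking operation in this region
            for reader in regions:
                seen = read(reader)  # read back from every region
                if seen != value:
                    failures.append((writer, reader, seen))
    return failures


if __name__ == "__main__":
    # In-memory stand-in for the cluster: every region sees the same value.
    store = {}
    print(run_lock_test(
        write=lambda region, value: store.update(v=value),
        read=lambda region: store["v"],
        cluster_ok=lambda: True,
    ))
```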

  37. Memcache configuration
      - One thread per core
      - 512 MB for storage
      - Objects cached for 60 minutes
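
These three settings could translate into a memcached command line and a Keystone cache section roughly as follows. The `-t` and `-m` flags are standard memcached options. Mapping "60 minutes" to `expiration_time` in Keystone's `[cache]` section, along with the backend name and server list, is an assumption to check against the oslo.cache documentation for the release in use.

```python
import os

CACHE_MEMORY_MB = 512          # from the slide
CACHE_TTL_SECONDS = 60 * 60    # 60 minutes, from the slide


def memcached_args(cores=None):
    """Build a memcached command line with one worker thread per core."""
    threads = cores or os.cpu_count() or 1
    return ["memcached", "-t", str(threads), "-m", str(CACHE_MEMORY_MB)]


def keystone_cache_options(servers):
    """Assumed [cache] settings for keystone.conf, returned as a dict."""
    return {
        "enabled": "true",
        "backend": "oslo_cache.memcache_pool",   # assumption, not from the slides
        "memcache_servers": ",".join(servers),   # hypothetical server list
        "expiration_time": str(CACHE_TTL_SECONDS),
    }


if __name__ == "__main__":
    print(" ".join(memcached_args()))
    print(keystone_cache_options(["192.0.2.21:11211", "192.0.2.22:11211"]))
```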

  39. Proxies and endpoints: final cutover
      - Internal, external proxies updated
      - Migrate to new cluster
      - Endpoints in each region were now active, but not advertised
      - Services in each region were re-pointed to the local endpoint
      - Additional endpoints now added to the Catalog
      - Endpoints "unversioned"

  40. Proxies and endpoints: external proxy configuration
      - Local endpoints utilise remote regions' endpoints as backup servers
      - Allows the other regions to take over
      - Transparent failover
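
In HAProxy terms, "remote endpoints as backup servers" usually means marking the remote `server` lines with the `backup` keyword. The sketch below renders such a backend stanza. The backend name, server labels, addresses and port 5000 are illustrative assumptions, not the configuration used in the talk.

```python
def render_backend(name, local, remote, port=5000):
    """Render an HAProxy backend: local servers active, remote ones as backups.

    `local` and `remote` map a server label to an address.
    """
    lines = [f"backend {name}"]
    for label, addr in local.items():
        lines.append(f"    server {label} {addr}:{port} check")
    for label, addr in remote.items():
        # `backup` servers only receive traffic once every non-backup server is down.
        lines.append(f"    server {label} {addr}:{port} check backup")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_backend(
        "keystone_region_a",
        local={"a-api1": "192.0.2.11", "a-api2": "192.0.2.12"},
        remote={"b-api1": "198.51.100.11", "c-api1": "203.0.113.11"},
    ))
```

Clients keep using the local region's address; traffic only crosses regions when every local API node fails its health check.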

  41. [Diagram] The stack from slide 14 (The Internets, External proxies, API node1 to node3, Memcache node1 to node3, Internal proxies, DB node1 to node3) shown three times

  46. Where to from here?
      - Global endpoints
      - MFA in/from master
      - Keystone from $release n-1

  47. Summary
      - Keystone: Fernet with caching
      - Galera: single DB, geo-distributed, redundant paths

  48. Summary
      - External proxies
        - Other regions must be present as backup servers
      - Keepalived
        - DNS round-robin between multiple VRRP addresses
        - Configure each to have one master, the others as backup
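
One way to give each VRRP address a different master is to rotate priorities across the proxies, since the highest priority wins the election. The sketch below computes such a plan. The proxy and VIP names and the base and step values are hypothetical, and the result would still need to be written into each node's keepalived configuration.

```python
def vrrp_priorities(proxies, vips, base=150, step=10):
    """Give each VIP a different preferred (highest-priority) proxy.

    Returns {vip: {proxy: priority}}; priorities are rotated so that
    VIP i prefers proxy i, then the next proxy, and so on.
    """
    n = len(proxies)
    plan = {}
    for i, vip in enumerate(vips):
        plan[vip] = {
            proxy: base - ((j - i) % n) * step
            for j, proxy in enumerate(proxies)
        }
    return plan


if __name__ == "__main__":
    plan = vrrp_priorities(["proxy1", "proxy2", "proxy3"], ["vip1", "vip2", "vip3"])
    for vip, priorities in plan.items():
        master = max(priorities, key=priorities.get)
        print(vip, "master:", master, priorities)
```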

  49. Thank you
