Operating Multi-Tenant Kafka Services for Developers

Data Council SF 2019, Ali Hamidi (Heroku Data)


  1. Operating Multi-Tenant Kafka Services for Developers Data Council SF 2019 Ali Hamidi - Heroku Data

  2. Agenda • Intro • Motivation • Single Tenant Dedicated • Multi-tenancy • Configuration & Tuning • Testing • Automation • Limitations Data Council SF 2019 - Heroku Data 2

  3. Intro • I am… Ali Hamidi, an engineer on the Heroku Data team at Salesforce. • Heroku is… a cloud platform that lets companies build, deliver, monitor and scale apps. • Heroku Data is… the team that provides secure, scalable data services on the Heroku Platform.

  7. Apache Kafka • Distributed Streaming Platform • Publish/Subscribe (=> Produce/Consume) • Durable message store (commit log) • Highly available

  10. Apache Kafka on Heroku • Fully Managed Service • Opinionated • Configured for best practices for most users*

  13. Use Cases • Decompose a monolithic app • Process high volume, real-time data streams • Power a real-time, event-driven architecture

  14. SHIFT Commerce • Decompose a monolithic app

  15. Quoine • QUOINE is a leading global fintech company that provides trading, exchange, and next-generation financial services powered by blockchain technology • Consume real-time cryptocurrency pricing data from individual markets and exchanges

  16. Caesars Entertainment • Ingest, aggregate, and process customer data in real time to provide the best customer experience • Real-time, event-driven architecture

  17. The Motivation

  18. Why Multi-tenant Kafka? • More accessible • Additional use cases • Development • Testing • Low volume production

  20. Single Tenant Dedicated

  23. Multi-tenancy

  25. Multi-tenancy • Resource isolation (Security, Performance, Safety) • Parity (Feature, Behaviour) • Compatibility • Costs (Resources, Operational)

  27. Security

  28. A tenant should not be able to access another tenant’s data

  33. Security • Access Control Lists (ACLs) • User A can carry out action B on resource C • Namespacing • wabash-58779.events
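
The ACL rule on the slide ("User A can carry out action B on resource C") and the namespaced topic name `wabash-58779.events` can be sketched together as below. This is only an illustration under assumptions: the "tenant name, dot, topic name" scheme is inferred from the single example on the slide, and `Acl`, `namespaced`, `is_allowed`, the action names and the second tenant `other-12345` are all hypothetical. A real deployment would presumably rely on Kafka's own ACL authorizer rather than application code like this.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Acl:
    """One rule: `user` may carry out `action` on resources under `prefix`."""
    user: str
    action: str   # illustrative action names such as "read" or "write"
    prefix: str   # tenant namespace, e.g. "wabash-58779."


def namespaced(tenant: str, topic: str) -> str:
    """Build a tenant-scoped topic name such as 'wabash-58779.events'."""
    return f"{tenant}.{topic}"


def is_allowed(acls: Sequence[Acl], user: str, action: str, resource: str) -> bool:
    """Deny by default; allow only if a rule matches user, action and prefix."""
    return any(
        a.user == user and a.action == action and resource.startswith(a.prefix)
        for a in acls
    )


acls = [
    Acl("wabash-58779", "read", "wabash-58779."),
    Acl("wabash-58779", "write", "wabash-58779."),
]
topic = namespaced("wabash-58779", "events")

print(topic)                                                        # wabash-58779.events
print(is_allowed(acls, "wabash-58779", "write", topic))             # True
# "other-12345" is a made-up second tenant; its namespace is out of reach.
print(is_allowed(acls, "wabash-58779", "read", "other-12345.events"))  # False
```

Because every rule is tied to a namespace prefix, a tenant cannot be granted access to another tenant's topics by accident: no rule for that prefix exists, so the default deny applies.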

  34. Performance

  35. A tenant should not adversely affect another tenant’s performance

  36. Performance • Quotas • Produce • Consume

  37. Safety

  38. A tenant should not jeopardise the stability of the cluster

  39. Safety • Limits • Topics • Partitions • Consumer Groups • Storage • Throughput

  40. Capacity = Message Throughput * Retention * Replication
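
A quick worked example of the capacity formula, with made-up inputs. The units (bytes per second, seconds, replica count) and the numbers are assumptions chosen for illustration, not figures from the talk, and `storage_capacity_bytes` is a hypothetical helper name.

```python
def storage_capacity_bytes(throughput_bytes_per_sec: float,
                           retention_seconds: float,
                           replication_factor: int) -> float:
    """Capacity = Message Throughput * Retention * Replication."""
    return throughput_bytes_per_sec * retention_seconds * replication_factor


# Illustrative inputs only: 100 KB/s sustained, 24 hours retention, 3 replicas.
throughput = 100_000          # bytes per second
retention = 24 * 60 * 60      # 86,400 seconds
replication = 3

capacity = storage_capacity_bytes(throughput, retention, replication)
print(f"{capacity / 1e9:.2f} GB")  # 25.92 GB
```

Even a modest 100 KB/s stream needs roughly 26 GB of disk across the cluster under these assumptions, which is why storage capacity shows up as its own per-tenant limit.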

  43. Safety • Limits • Topics • Partitions • Consumer Groups • Storage Capacity • Throughput • Monitoring • Limit enforcement!
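
Limit enforcement could look roughly like the sketch below, which rejects a topic-creation request before it reaches the cluster. Everything here is hypothetical: the class and function names, the choice to enforce in a control-plane style check, and the limit values are assumptions, since the slides name the limits but not how they are enforced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantLimits:
    max_topics: int
    max_partitions: int


@dataclass
class TenantUsage:
    topics: int = 0
    partitions: int = 0


class LimitExceeded(Exception):
    """Raised when a request would push a tenant past one of its limits."""


def create_topic(usage: TenantUsage, limits: TenantLimits, partitions: int) -> None:
    """Check limits first, then record the new topic against the tenant's usage."""
    if usage.topics + 1 > limits.max_topics:
        raise LimitExceeded("topic limit reached")
    if usage.partitions + partitions > limits.max_partitions:
        raise LimitExceeded("partition limit reached")
    usage.topics += 1
    usage.partitions += partitions


# Made-up limits for illustration.
limits = TenantLimits(max_topics=2, max_partitions=8)
usage = TenantUsage()

create_topic(usage, limits, partitions=4)
create_topic(usage, limits, partitions=4)
try:
    create_topic(usage, limits, partitions=1)
except LimitExceeded as err:
    print(f"rejected: {err}")  # rejected: topic limit reached
```

Rejecting the request up front keeps a single tenant's mistake from turning into extra partitions and metadata that every other tenant on the cluster has to pay for.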

  45. Parity

  46. For the service to be useful, it needs to behave like a normal cluster

  48. Parity • Access to a standard cluster • ...but with some limitations

  50. Compatibility

  51. The service needs to support standard clients • No vendor lock-in

  52. Compatibility • Open Source Apache Kafka • Not a fork • No custom code required • Use standard clients

  54. Costs

  55. The service needs to be financially feasible

  56. Resource Costs • Packing Density • Utilization

  57. Resource Costs • Cluster size? • No over-provisioning • Seamless upgrading • Can’t move tenants (can’t migrate message offsets)

  58. Operational Costs • Minimal operational burden • Minimize impact/blast radius

  59. Operational Costs • Safe defaults • Clusters similar to our dedicated ones • Automation (kind of our thing) • Testing (lots)

  60. Configuration & Tuning

  61. Configuration & Tuning • Partitions • Quotas • Topics & Consumer Groups • Guard Rails

  62. Partitions • Lots of partitions: 48,000 • Max file descriptors: 500,000 • Max mmap count: 500,000
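
Why the mmap limit has to be raised can be estimated with a little arithmetic. The Apache Kafka operations documentation notes that each log segment memory-maps an offset index and a time index, so a partition needs at least 2 map areas. The sketch below applies that to the slide's numbers. It assumes, purely for illustration, that all 48,000 partitions sit on one broker (the slide does not say whether the figure is per broker or per cluster), and `min_map_areas` is a hypothetical helper.

```python
MAP_AREAS_PER_SEGMENT = 2             # offset index + time index, one map area each
LINUX_DEFAULT_MAX_MAP_COUNT = 65_530  # common default for vm.max_map_count


def min_map_areas(partitions: int, segments_per_partition: int = 1) -> int:
    """Lower bound on map areas a broker needs for its partition index files."""
    return partitions * segments_per_partition * MAP_AREAS_PER_SEGMENT


partitions = 48_000         # from the slide
configured_limit = 500_000  # from the slide

needed = min_map_areas(partitions)
print(needed)                                # 96000
print(needed > LINUX_DEFAULT_MAX_MAP_COUNT)  # True: the default would not suffice
# Average segments per partition that fit under the configured limit:
print(configured_limit // (partitions * MAP_AREAS_PER_SEGMENT))  # 5
```

Under these assumptions a single segment per partition already exceeds the common Linux default, and a limit of 500,000 leaves room for about five segments per partition on average.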

  63. Quotas • Per broker! • Counterintuitive enforcement
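
One reading of why per-broker quotas are counterintuitive: Kafka applies a client's byte-rate quota on each broker separately, so the tenant's aggregate ceiling depends on how many brokers lead its partitions. The sketch below illustrates that under assumptions; the quota value, broker names and the helper `effective_ceiling` are hypothetical, and the slide itself does not spell out which behaviour the speaker had in mind.

```python
from typing import Dict


def effective_ceiling(per_broker_quota: int, leaders_by_broker: Dict[str, int]) -> int:
    """Upper bound on a tenant's aggregate byte rate.

    The quota applies independently on every broker that leads at least one
    of the tenant's partitions, so the ceilings add up across those brokers.
    """
    active_brokers = sum(1 for count in leaders_by_broker.values() if count > 0)
    return per_broker_quota * active_brokers


quota = 1_000_000  # hypothetical 1 MB/s produce quota, per broker

spread = {"broker-0": 2, "broker-1": 1, "broker-2": 1}    # leaders on 3 brokers
skewed = {"broker-0": 4, "broker-1": 0, "broker-2": 0}    # all leaders on 1 broker

print(effective_ceiling(quota, spread))  # 3000000
print(effective_ceiling(quota, skewed))  # 1000000
```

Two tenants with the same configured quota and the same number of partitions can therefore see different aggregate throughput, depending only on partition leader placement.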

  64. Topics & Consumer Groups • Explicit Topic creation • Explicit Consumer Group creation

  66. Guard Rails • Limit potential bad usage • “Customers don’t make mistakes, we make bad tools”

  69. # Heroku Data Control Plane
      min_retention_time = 24.hours
      max_retention_time = 7.days
      default_replication_factor = 3
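
A minimal sketch of how guard rails like these might be applied to a tenant's request. The three values come from the slide; the function name, the use of `timedelta`, and the decision to reject out-of-range values (rather than silently clamp them) are assumptions, since the slides show only the settings and not the control plane's behaviour.

```python
from datetime import timedelta

# Values taken from the slide.
MIN_RETENTION_TIME = timedelta(hours=24)
MAX_RETENTION_TIME = timedelta(days=7)
DEFAULT_REPLICATION_FACTOR = 3


def validate_retention(requested: timedelta) -> timedelta:
    """Accept a retention time only if it falls inside the guard rails."""
    if not MIN_RETENTION_TIME <= requested <= MAX_RETENTION_TIME:
        raise ValueError(
            f"retention must be between {MIN_RETENTION_TIME} and {MAX_RETENTION_TIME}"
        )
    return requested


print(validate_retention(timedelta(days=3)))  # 3 days, 0:00:00
try:
    validate_retention(timedelta(hours=1))
except ValueError as err:
    print(f"rejected: {err}")
```

Failing loudly on a one-hour retention request fits the "we make bad tools" framing: the tool refuses a setting that is likely to lose data before consumers can read it.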
