Dive into Streams with Brooklin
Celia Kung
LinkedIn

Outline
Background
Scenarios
Application Use Cases
Architecture
Current and Future

Background: Nearline Applications
Require near real-time response
○ E.g. Live search indices, Notifications
○ Data could be spread across multiple database systems
○ App devs should focus on event processing and not on data access
[Figure: Espresso (LinkedIn's document store), Microsoft EventHubs; Streaming Systems A, B, C, D; Nearline Applications]
solutions to stream data from and to each different system?
○ Slows down development
○ Hard to manage!
provisioned and individually configured
additional sources/destinations
types to many destination types
thousand streams simultaneously
[Figure: Sources: Databases (Espresso), Messaging Systems (Kafka, EventHubs, Kinesis). Destinations: Kafka, EventHubs, Kinesis. Applications.]
Scenario 1:
reflect her recent job change
colleagues of this change
[Figure: Updates; News Feed Service, Search Indices Service, Notifications Service, and Standardization Service, each issuing a query]
from the sources and don’t compete for resources with online queries
points in change timelines
updates to a change stream
consume from change streams
[Figure: Member DB, Updates, Messaging System; News Feed Service, Notifications Service, Standardization Service, Search Indices Service]
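The change-stream idea in this scenario can be sketched in a few lines. This is a minimal illustration, not Brooklin code: `ChangeStream`, `Consumer`, and the sample updates are hypothetical names invented for the example. It shows updates being published once to a stream and each application reading at its own position, so applications can sit at different points in the change timeline without querying the database.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeStream:
    """Append-only log of database updates (a stand-in for a messaging system)."""
    events: list = field(default_factory=list)

    def publish(self, update: dict) -> int:
        """Append one update and return its offset in the stream."""
        self.events.append(update)
        return len(self.events) - 1


@dataclass
class Consumer:
    """An application reading the stream at its own pace."""
    name: str
    stream: ChangeStream
    position: int = 0  # offset of the next event to read

    def poll(self, max_events: int = 1) -> list:
        """Return up to max_events unread updates and advance this consumer only."""
        batch = self.stream.events[self.position:self.position + max_events]
        self.position += len(batch)
        return batch


# Hypothetical profile updates published once to the stream.
stream = ChangeStream()
stream.publish({"member": 1, "field": "title", "value": "Engineer"})
stream.publish({"member": 1, "field": "company", "value": "Example Corp"})
stream.publish({"member": 2, "field": "title", "value": "Manager"})

news_feed = Consumer("news-feed", stream)
search_indices = Consumer("search-indices", stream)

news_feed.poll(max_events=3)       # the news feed reads all three updates
search_indices.poll(max_events=1)  # search indexing has read only the first
```

Each consumer keeps its own position, so a slow application does not hold back a fast one, and neither adds query load to the source database.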
Scenario 2:
○ cloud services
○ clusters
○ data centers
different environments
Obfuscation, Data formats
○ Issues with Kafka MirrorMaker (KMM): didn’t scale well, difficult to operate and manage, poor failure isolation
[Figure: Sources: Messaging systems, Microsoft EventHubs, Databases. Destinations: Messaging systems, Microsoft EventHubs, Databases.]
[Figure: Datacenters A, B, and C, each with tracking, aggregate tracking, metrics, and aggregate metrics clusters, and many KMM instances]
[Figure: Datacenters A, B, and C, each with tracking, aggregate tracking, metrics, and aggregate metrics clusters, and Brooklin in place of the KMM instances]
○ Entire pipeline, topic, topic-partition
○ Auto-resumes the partitions after a configurable duration
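The auto-resume behavior above can be sketched as a small bookkeeping class. This is an illustration under assumptions, not Brooklin's implementation: `PartitionPauser`, its method names, and the injected `clock` are hypothetical. It records when each paused topic-partition becomes eligible again and treats it as resumed once that time has passed.

```python
import time


class PartitionPauser:
    """Tracks paused topic-partitions and resumes them after a fixed duration."""

    def __init__(self, resume_after_seconds: float, clock=time.monotonic):
        self._resume_after = resume_after_seconds
        self._clock = clock        # injectable so the example can control time
        self._paused_until = {}    # (topic, partition) -> time at which to resume

    def pause(self, topic: str, partition: int) -> None:
        """Pause one topic-partition for the configured duration."""
        self._paused_until[(topic, partition)] = self._clock() + self._resume_after

    def is_paused(self, topic: str, partition: int) -> bool:
        """Return True while the pause is active; drop the entry once it expires."""
        key = (topic, partition)
        deadline = self._paused_until.get(key)
        if deadline is None:
            return False
        if self._clock() >= deadline:
            del self._paused_until[key]  # auto-resume
            return False
        return True


# Usage with a fake clock so the example does not need to sleep.
now = [0.0]
pauser = PartitionPauser(resume_after_seconds=60, clock=lambda: now[0])
pauser.pause("tracking", 3)
paused_at_start = pauser.is_paused("tracking", 3)
now[0] = 61.0
paused_after_wait = pauser.is_paused("tracking", 3)
```

Keying the table by (topic, partition) keeps the pause scoped to a single partition; pausing a whole topic or pipeline would need a coarser key.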
Security
Cache
Search Indices
ETL or Data warehouse
Materialized Views or Replication
Repartitioning
Adjunct Data
Bridge
Serde, Encryption, Policy
Standardization, Notifications …
Example:
[Figure: Member DB, Updates, News Feed Service]
○ Source Database: Espresso (Member DB, Profile table)
○ Destination: Kafka
○ Application: News Feed service
destination
Name: MemberProfileChangeStream
Source: MemberDB/ProfileTable
    Type: Espresso
    Partitions: 8
Destination: ProfileTopic
    Type: Kafka
    Partitions: 8
Metadata:
    Application: News Feed service
    Owner: newsfeed@linkedin.com
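The datastream definition above can be modeled as a small data structure and serialized for a create request. This is a sketch only: the `Datastream` and `Endpoint` classes and the JSON field names are assumptions made for illustration and are not Brooklin's actual REST schema. The values come from the example definition.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Endpoint:
    """One side of a datastream: a source or a destination."""
    name: str
    type: str
    partitions: int


@dataclass
class Datastream:
    """Describes what to stream, where to, and who owns the stream."""
    name: str
    source: Endpoint
    destination: Endpoint
    metadata: dict = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialize the definition; asdict also converts the nested endpoints."""
        return json.dumps(asdict(self), indent=2)


datastream = Datastream(
    name="MemberProfileChangeStream",
    source=Endpoint("MemberDB/ProfileTable", "Espresso", 8),
    destination=Endpoint("ProfileTopic", "Kafka", 8),
    metadata={"application": "News Feed service", "owner": "newsfeed@linkedin.com"},
)

# Hypothetical request body for a datastream create call.
payload = datastream.to_json()
```

Keeping source and destination as the same `Endpoint` shape reflects the symmetry of the definition: each side has a name, a type, and a partition count.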
[Figure: Member DB, News Feed service, Brooklin Client, Load Balancer, ZooKeeper, and three Brooklin Instances. Each instance contains a Datastream Management Service (DMS), an Espresso Consumer, a Coordinator (one marked Leader), and a Kafka Producer. Labeled step: create, POST /datastream.]
[Figure: ZooKeeper and three Brooklin Instances. Each instance contains a Datastream Management Service (DMS), a Coordinator (one marked Leader), Consumers A and B, and Producers X, Y, and Z.]
Sources & Destinations
○ Espresso
○ Oracle
○ Kafka
○ EventHubs
○ Kinesis

○ Kafka
○ EventHubs

Features
sources and destinations
datastreams across several source and destination types
maintained at partition level
improved latency with flushless-produce mode
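The slide names a flushless-produce mode without describing it. The sketch below assumes it means sending events without flushing the producer before recording progress, which requires knowing, per partition, the highest offset that is safe to resume from. `PartitionCheckpoints` and its methods are hypothetical names, not Brooklin's API; the technique shown is plain in-flight offset tracking.

```python
from collections import defaultdict


class PartitionCheckpoints:
    """Per-partition tracking of sent and acknowledged source offsets."""

    def __init__(self):
        self._in_flight = defaultdict(set)  # partition -> offsets sent but not yet acked
        self._highest_acked = {}            # partition -> highest acked offset

    def sent(self, partition: int, offset: int) -> None:
        """Record that an event was handed to the producer."""
        self._in_flight[partition].add(offset)

    def acked(self, partition: int, offset: int) -> None:
        """Record that the destination acknowledged an event."""
        self._in_flight[partition].discard(offset)
        previous = self._highest_acked.get(partition, -1)
        self._highest_acked[partition] = max(previous, offset)

    def safe_checkpoint(self, partition: int):
        """Highest offset with no unacked offset at or below it, or None."""
        pending = self._in_flight.get(partition, set())
        if pending:
            candidate = min(pending) - 1  # stop just before the oldest unacked event
        else:
            candidate = self._highest_acked.get(partition, -1)
        return candidate if candidate >= 0 else None


checkpoints = PartitionCheckpoints()
for offset in (0, 1, 2):
    checkpoints.sent(0, offset)
checkpoints.acked(0, 0)
checkpoints.acked(0, 2)                            # acks can arrive out of order
while_one_pending = checkpoints.safe_checkpoint(0)  # offset 1 is still in flight
checkpoints.acked(0, 1)
after_all_acked = checkpoints.safe_checkpoint(0)
```

Restarting from one past the safe checkpoint may resend events that were already acknowledged, so this approach gives at-least-once rather than exactly-once delivery.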
messages/day
datastreams
unique sources
applications
Brooklin streams with Espresso, Oracle, or EventHubs as the source
messages/day
datastreams
topics
Brooklin streams mirroring Kafka data
Sources & Destinations
○ MySQL
○ Cosmos DB
○ Azure SQL

○ Azure Blob storage
○ Kinesis
○ Cosmos DB
○ Azure SQL
○ Couchbase

Open Source
2019 (soon!)
Optimizations
Write multiple
Email: ckung@linkedin.com
LinkedIn: /in/celiakkung/