Dive into Streams with Brooklin (Celia Kung, LinkedIn) - PowerPoint presentation


SLIDE 1

Dive into Streams with Brooklin

Celia Kung

LinkedIn

SLIDE 2

Outline

  • Background
  • Scenarios
  • Application Use Cases
  • Architecture
  • Current and Future

SLIDE 3

Background

SLIDE 4

Nearline Applications

  • Require near real-time response
  • Thousands of applications at LinkedIn
    ○ E.g. live search indices, notifications

SLIDE 5

Nearline Applications

  • Require continuous, low-latency access to data
    ○ Data could be spread across multiple database systems
  • Need an easy way to move data to applications
    ○ App devs should focus on event processing and not on data access

SLIDE 6

Heterogeneous Data Systems

Espresso (LinkedIn’s document store)
Microsoft EventHubs

SLIDE 7

Building the Right Infrastructure

  • Build separate, specialized solutions to stream data from and to each different system?
    ○ Slows down development
    ○ Hard to manage!

(Diagram: separate streaming systems A, B, C, D, … each connecting a source such as Microsoft EventHubs to nearline applications)

SLIDE 8

Need a centralized, managed, and extensible service to continuously deliver data in near real-time

SLIDE 9

Brooklin

SLIDE 10

Brooklin

  • Streaming data pipeline service
  • Propagates data from many source types to many destination types
  • Streams are dynamically provisioned and individually configured
  • Multitenant: Can run several thousand streams simultaneously
  • Extensible: Plug-in support for additional sources/destinations

SLIDE 11

Pluggable Sources & Destinations

(Diagram: sources (Espresso, databases, messaging systems, Kafka, EventHubs, Kinesis) flow through Brooklin to destinations (Kafka, EventHubs, Kinesis) and on to applications)

SLIDE 12

Scenarios

SLIDE 13

Scenario 1: Change Data Capture

SLIDE 14

Capturing Live Updates

  • 1. Member updates her profile to reflect her recent job change

SLIDE 15

Capturing Live Updates

  • 2. LinkedIn wants to inform her colleagues of this change

SLIDE 16

Capturing Live Updates

(Diagram: the News Feed Service querying Member DB for updates)

SLIDE 17

Capturing Live Updates

(Diagram: the News Feed Service and Search Indices Service each querying Member DB for updates)

SLIDE 18

Capturing Live Updates

(Diagram: News Feed, Search Indices, Notifications, Standardization, and other services each querying Member DB for updates)

SLIDE 21

Change Data Capture (CDC)

  • Brooklin can stream database updates to a change stream
  • Data processing applications consume from change streams
  • Isolation: Applications are decoupled from the sources and don’t compete for resources with online queries
  • Applications can be at different points in change timelines

SLIDE 22

Change Data Capture (CDC)

(Diagram: updates from Member DB flow into a messaging system, from which the News Feed, Notifications, Standardization, and Search Indices services consume)
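The fan-out above (one change stream feeding several independent services) can be sketched as a simple dispatch loop. The `ChangeEvent` shape and the handler names are illustrative only, not Brooklin's actual API:

```python
# Sketch of CDC fan-out: one change stream, many independent consumers.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ChangeEvent:
    table: str    # e.g. "MemberDB/ProfileTable"
    key: str      # primary key of the changed row
    after: dict   # row image after the update

def fan_out(events: list[ChangeEvent],
            handlers: dict[str, Callable[[ChangeEvent], None]]) -> None:
    """Deliver every change event to every subscribed service.

    In the real system each service keeps its own position in the change
    stream, so a slow consumer never blocks the others; here we just
    dispatch synchronously for illustration.
    """
    for event in events:
        for handle in handlers.values():
            handle(event)

# Example: one profile update reaching two downstream services.
seen = {"news_feed": [], "search_index": []}
handlers = {
    "news_feed": lambda e: seen["news_feed"].append(e.key),
    "search_index": lambda e: seen["search_index"].append(e.key),
}
fan_out([ChangeEvent("MemberDB/ProfileTable", "member:42",
                     {"title": "Staff Engineer"})], handlers)
```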

SLIDE 23

Scenario 2: Streaming Bridge

SLIDE 24

Stream Data from X to Y

  • Across…
    ○ cloud services
    ○ clusters
    ○ data centers

SLIDE 25

Streaming Bridge

  • Data pipe to move data between different environments
  • Enforce policy: Encryption, Obfuscation, Data formats
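A bridge that enforces policy can be pictured as a per-record chain of transforms applied before the record leaves one environment for another. The transforms and field names below are hypothetical, not Brooklin's real policy hooks:

```python
# Minimal sketch of a bridge policy chain: obfuscate, then serialize.
import json

def obfuscate_email(record: dict) -> dict:
    """Mask the local part of an email address (illustrative policy)."""
    out = dict(record)
    if "email" in out:
        user, _, domain = out["email"].partition("@")
        out["email"] = user[:1] + "***@" + domain
    return out

def to_json_bytes(record: dict) -> bytes:
    """Destination data format: canonical JSON bytes."""
    return json.dumps(record, sort_keys=True).encode("utf-8")

def bridge(record: dict, policies) -> bytes:
    """Apply each policy in order, then serialize for the destination."""
    for policy in policies:
        record = policy(record)
    return to_json_bytes(record)

wire = bridge({"member": 42, "email": "celia@example.com"},
              [obfuscate_email])
```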

SLIDE 26

Mirroring Kafka Data

  • Aggregating data from all data centers into a centralized place
  • Moving data between LinkedIn and external cloud services (e.g. Azure)
  • Brooklin has replaced Kafka MirrorMaker (KMM) at LinkedIn
    ○ Issues with KMM: didn’t scale well, difficult to operate and manage, poor failure isolation

SLIDE 27

Use Brooklin to Mirror Kafka Data

(Diagram: messaging systems, Microsoft EventHubs, and databases appear on both the source and destination side, mirrored through Brooklin)

SLIDE 28

Kafka MirrorMaker Topology

(Diagram: datacenters A, B, and C each run many separate KMM instances to copy tracking and metrics topics into per-datacenter aggregate topics)

SLIDE 29

Brooklin Kafka Mirroring Topology

(Diagram: datacenters A, B, and C each run a single Brooklin cluster that mirrors all tracking and metrics topics into the aggregate topics)

SLIDE 30

Brooklin Kafka Mirroring

  • Optimized for stability and operability
  • Manually pause and resume mirroring at every level
    ○ Entire pipeline, topic, topic-partition
  • Can auto-pause partitions facing mirroring issues
    ○ Auto-resumes the partitions after a configurable duration
  • Flow of messages from other partitions is unaffected
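The auto-pause/auto-resume behavior above can be sketched as a per-partition deadline map: a failing partition is paused, then silently resumed once a configurable window elapses, while other partitions keep flowing. The class and timing below are illustrative, not Brooklin's implementation:

```python
# Sketch of auto-pause with auto-resume after a configurable duration.
import time

class PartitionFlow:
    def __init__(self, auto_resume_secs: float):
        self.auto_resume_secs = auto_resume_secs
        self.paused_until: dict[str, float] = {}  # partition -> deadline

    def auto_pause(self, partition: str) -> None:
        """Pause a partition that is facing mirroring issues."""
        self.paused_until[partition] = time.monotonic() + self.auto_resume_secs

    def is_paused(self, partition: str) -> bool:
        """Check pause state, auto-resuming once the window has elapsed."""
        deadline = self.paused_until.get(partition)
        if deadline is None:
            return False
        if time.monotonic() >= deadline:   # window elapsed: auto-resume
            del self.paused_until[partition]
            return False
        return True

flow = PartitionFlow(auto_resume_secs=0.05)
flow.auto_pause("topicA-0")
stuck = flow.is_paused("topicA-0")        # paused right after the failure
healthy = flow.is_paused("topicA-1")      # other partitions unaffected
time.sleep(0.06)
resumed = not flow.is_paused("topicA-0")  # auto-resumed after the window
```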
SLIDE 31

Application Use Cases


SLIDE 36

Application Use Cases

  • Security
  • Cache
  • Search Indices
  • ETL or Data warehouse
  • Materialized Views or Replication
  • Repartitioning


SLIDE 40

Application Use Cases

  • Adjunct Data
  • Bridge
  • Serde, Encryption, Policy
  • Standardization, Notifications, …

SLIDE 41

Architecture

SLIDE 42

Example: Stream updates made to Member Profile

SLIDE 43

Capturing Live Updates

(Diagram: updates flowing from Member DB to the News Feed Service)

SLIDE 44

Example

  • Scenario: Stream Espresso Member Profile updates into Kafka
    ○ Source Database: Espresso (Member DB, Profile table)
    ○ Destination: Kafka
    ○ Application: News Feed service

SLIDE 45

Datastream

  • Describes the data pipeline
  • Mapping between source and destination
  • Holds the configuration for the pipeline

Name: MemberProfileChangeStream
Source: MemberDB/ProfileTable
  Type: Espresso
  Partitions: 8
Destination: ProfileTopic
  Type: Kafka
  Partitions: 8
Metadata:
  Application: News Feed service
  Owner: newsfeed@linkedin.com
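The datastream above is what a client would submit when creating the pipeline (the walkthrough that follows starts with a `POST /datastream` call). The JSON field layout and host below are assumptions for illustration; only the values come from the slide:

```python
# The datastream descriptor, expressed as a JSON payload a client might
# POST to the Datastream Management Service. Field nesting is assumed.
import json

datastream = {
    "name": "MemberProfileChangeStream",
    "source": {"connectionString": "MemberDB/ProfileTable",
               "type": "Espresso", "partitions": 8},
    "destination": {"connectionString": "ProfileTopic",
                    "type": "Kafka", "partitions": 8},
    "metadata": {"application": "News Feed service",
                 "owner": "newsfeed@linkedin.com"},
}

payload = json.dumps(datastream)
# e.g. requests.post("https://brooklin.example.com/datastream", data=payload)
```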

SLIDE 46

  • 1. Client makes REST call to create datastream (POST /datastream)

(Diagram: the client’s request goes through a load balancer to one of several identical Brooklin instances; each instance runs a Datastream Management Service (DMS), a Coordinator (one of which is the leader), an Espresso consumer, and a Kafka producer, with ZooKeeper, the Member DB, and the News Feed service alongside)

SLIDE 47

  • 2. Create request goes to any Brooklin instance

SLIDE 48

  • 3. Datastream is written to ZooKeeper

SLIDE 49

  • 4. Leader coordinator is notified of new datastream

SLIDE 50

  • 5. Leader coordinator calculates work distribution

SLIDE 51

  • 6. Leader coordinator writes the assignments to ZooKeeper

SLIDE 52

  • 7. ZooKeeper is used to communicate the assignments

SLIDE 53

  • 8. Coordinators hand task assignments to consumers

SLIDE 54

  • 9. Consumers start streaming data from the source

SLIDE 55

  • 10. Consumers propagate data to producers

SLIDE 56

  • 11. Producers write data to the destination

SLIDE 57

  • 12. App consumes from Kafka

SLIDE 58

  • 13. Destinations can be shared by apps

SLIDE 59

Brooklin Architecture

(Diagram: multiple identical Brooklin instances coordinate through ZooKeeper; each runs a Datastream Management Service (DMS), a Coordinator (one elected leader), consumers A and B, and producers X, Y, and Z)
SLIDE 60

Current & Future

SLIDE 61

Current

Sources & Destinations

  • Consumers: Espresso, Oracle, Kafka, EventHubs, Kinesis
  • Producers: Kafka, EventHubs
  • APIs are standardized to support additional sources and destinations

Features

  • Multitenant: Can power thousands of datastreams across several source and destination types
  • Guarantees: At-least-once delivery; order is maintained at partition level
  • Kafka mirroring improvements: finer control of pipelines (pause/auto-pause partitions), improved latency with flushless-produce mode
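At-least-once delivery means an application may see the same message again after a retry, and ordering holds only within a partition. A common client-side response is to deduplicate by (partition, offset), sketched below; this is a generic consumer pattern, not a Brooklin API:

```python
# Sketch of idempotent consumption under at-least-once delivery.
def process_once(messages, seen: set, apply) -> int:
    """Apply each (partition, offset, value) at most once.

    `seen` holds already-processed (partition, offset) pairs, so a
    redelivered message is skipped instead of being applied twice.
    Returns the number of messages actually applied.
    """
    applied = 0
    for partition, offset, value in messages:
        key = (partition, offset)
        if key in seen:
            continue   # duplicate from a redelivery; skip it
        seen.add(key)
        apply(value)
        applied += 1
    return applied

out: list[str] = []
seen: set = set()
# The repeated (0, 1, "a") simulates an at-least-once redelivery.
n = process_once([(0, 1, "a"), (0, 2, "b"), (0, 1, "a")], seen, out.append)
```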

SLIDE 62

Brooklin in Production

Brooklin streams with Espresso, Oracle, or EventHubs as the source:

  • 38B messages/day
  • 2K+ datastreams
  • 1K+ unique sources
  • 200+ applications

SLIDE 63

Brooklin in Production

Brooklin streams mirroring Kafka data:

  • 2T+ messages/day
  • 200+ datastreams
  • 10K+ topics

SLIDE 64

Future

Sources & Destinations

  • Consumers: MySQL, Cosmos DB, Azure SQL
  • Producers: Azure Blob storage, Kinesis, Cosmos DB, Azure SQL, Couchbase

Open Source

  • Plan to open source Brooklin in 2019 (soon!)

Optimizations

  • Brooklin auto-scaling
  • Passthrough compression
  • Read optimizations: Read once, write multiple

SLIDE 65

Thank you

SLIDE 66

Questions?

Celia Kung
Email: ckung@linkedin.com
LinkedIn: /in/celiakkung/
