APACHE PULSAR - THE NEXT GENERATION MESSAGING AND QUEUING - PowerPoint PPT Presentation


SLIDE 1

APACHE PULSAR - THE NEXT GENERATION MESSAGING AND QUEUING

KARTHIK RAMASAMY

SENIOR DIRECTOR OF ENGINEERING

SPLUNK

@KARTHIKZ

SLIDE 2

Connected World

SLIDE 3

Ubiquity of Real-Time Data Streams & Events

SLIDE 4

EVENT/STREAM DATA PROCESSING

✦ Events are analyzed and processed as they arrive
✦ Decisions are timely, contextual and based on fresh data
✦ Decision latency is eliminated
✦ Data in motion

Ingest/Buffer → Analyze → Act

SLIDE 5

EVENT/STREAM PROCESSING PATTERNS

Microservices - Model inference - Workflows - Analytics - Monitoring

SLIDE 6

STREAM PROCESSING PATTERN

Compute Messaging Storage

Data Ingestion Data Processing Results Storage Data Storage Data Serving

SLIDE 7

APACHE PULSAR

Flexible messaging + queuing system backed by durable log storage

SLIDE 8

Key Concepts

SLIDE 9

Core concepts: Tenants, namespaces, topics

Apache Pulsar Cluster

Tenants Namespaces Topics

Marketing Sales Security

Analytics Campaigns Data transformation Data Integration Microservices

Visits Conversions Responses Conversions Transactions Interactions Log events Signatures Accesses

SLIDE 10

Topics

Topic

Producers Consumers

Time

Consumers Consumers Producers

SLIDE 11

Topic partitions

Topic - P0

Time

Topic - P1 Topic - P2

Producers Producers Consumers Consumers Consumers

SLIDE 12

Segments

Time

Segment 1 Segment 2 Segment 3 Segment 1 Segment 2 Segment 3 Segment 4 Segment 1 Segment 2 Segment 3

P0 P1 P2

SLIDE 13

Architecture

SLIDE 14

APACHE PULSAR

Diagram: producers and consumers connect to brokers (serving layer); brokers persist data to bookies (storage layer)

SERVING - Brokers can be added independently; traffic can be shifted quickly across brokers

STORAGE - Bookies can be added independently; new bookies will ramp up traffic quickly

SLIDE 15

APACHE PULSAR - BROKER

✦ The broker is the only point of interaction for clients (producers and consumers)
✦ Brokers acquire ownership of groups of topics and "serve" them
✦ The broker has no durable state
✦ Provides a service discovery mechanism for clients to connect to the right broker

SLIDE 16

APACHE PULSAR - BROKER

SLIDE 17

APACHE PULSAR - CONSISTENCY

Bookie Bookie Bookie Broker Producer

SLIDE 18

APACHE PULSAR - DURABILITY (NO DATA LOSS)

Diagram: the producer writes to the broker, which replicates each entry to multiple bookies; every bookie fsyncs the entry to its journal

SLIDE 19

APACHE PULSAR - ISOLATION

SLIDE 20

APACHE PULSAR - SEGMENT STORAGE

Diagram: topic entries are striped as Segments 1-4 and distributed across multiple bookies

SLIDE 21

APACHE PULSAR - RESILIENCY

Diagram: after a bookie failure, its segments are re-replicated across the remaining bookies

SLIDE 22

APACHE PULSAR - SEAMLESS CLUSTER EXPANSION

Diagram: after adding bookies, new segments (X, Y, Z) are placed on the new nodes; existing segments stay put

SLIDE 23

APACHE PULSAR - TIERED STORAGE

Diagram: older segments are offloaded to low-cost storage while recent segments remain on the bookies

SLIDE 24

Multi-tiered storage and serving

Diagram: a partition is served by brokers, with data tiered across processing (brokers), warm storage and cold storage

Tailing reads: served from in-memory cache
Catch-up reads: served from persistent storage layer
Historical reads: served from cold storage

SLIDE 25

PARTITIONS VS SEGMENTS - WHY SHOULD YOU CARE?

Legacy Architectures
# Storage co-resident with processing
# Partition-centric
# Cumbersome to scale - data redistribution, performance impact

Apache Pulsar
# Storage decoupled from processing
# Partitions stored as segments
# Flexible, easy scalability

Diagram: in the legacy model each broker holds a full partition copy (primary plus copies); in Pulsar, brokers serve partitions whose segments are spread across the storage layer

SLIDE 26

DEPLOYMENT IN K8S

Diagram: brokers (Broker1-3) deployed behind services (S1-3) and load balancers (LB1-3), with segments spread across the storage nodes

SLIDE 27

PARTITIONS VS SEGMENTS - WHY SHOULD YOU CARE?

✦ In Kafka, partitions are assigned to brokers "permanently"
✦ A single partition is stored entirely on a single node
✦ Retention is limited by a single node's storage capacity
✦ Failure recovery and capacity expansion require expensive "rebalancing"
✦ Rebalancing has a big impact on the system, affecting regular traffic
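The expansion cost described above can be made concrete with a toy model (all numbers and method names here are illustrative, not from any benchmark): a partition-centric cluster must copy whole partitions to a newly added node, while a segment-centric cluster simply places new segments there and moves nothing.

```java
public class ExpansionCost {
    // Illustrative model: growing a partition-centric cluster reassigns a
    // share of whole partitions to the new nodes, and each reassigned
    // partition must be copied over the network.
    static long partitionCentricBytesMoved(int partitions, long bytesPerPartition,
                                           int oldNodes, int newNodes) {
        int moved = partitions * (newNodes - oldNodes) / newNodes;
        return (long) moved * bytesPerPartition;
    }

    // Segment-centric model: existing segments stay put; only newly
    // written segments land on the new node.
    static long segmentCentricBytesMoved() {
        return 0;
    }

    public static void main(String[] args) {
        // 100 partitions of 10 GiB each, growing from 4 to 5 nodes:
        // 20 partitions (~200 GiB) get copied in the partition-centric model.
        System.out.println(partitionCentricBytesMoved(100, 10L << 30, 4, 5));
        System.out.println(segmentCentricBytesMoved());
    }
}
```

The point is not the exact numbers but that the moved-data term grows with retained partition size in one model and is identically zero in the other.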

SLIDE 28

UNIFIED MESSAGING MODEL - STREAMING

Pulsar topic/ partition

Producer 2 Producer 1 Consumer 1 Consumer 2

Subscription A M4 M3 M2 M1 M0 M4 M3 M2 M1 M0

X

Exclusive

SLIDE 29

UNIFIED MESSAGING MODEL - STREAMING

Pulsar topic/ partition

Producer 2 Producer 1 Consumer 1 Consumer 2

Subscription B M4 M3 M2 M1 M0 M4 M3 M2 M1 M0

Failover

In case of failure in consumer 1

SLIDE 30

UNIFIED MESSAGING MODEL - QUEUING

Pulsar topic/ partition

Producer 2 Producer 1 Consumer 2 Consumer 3

Subscription C M4 M3 M2 M1 M0

Shared

Traffic is equally distributed across consumers

Consumer 1

M4 M3 M2 M1 M0

SLIDE 31

DISASTER RECOVERY

Topic (T1) Topic (T1) Topic (T1)

Subscription (S1) Subscription (S1) Producer (P1) Consumer Producer (P3) Producer (P2) Consumer

Data Center A Data Center B Data Center C

✦ Integrated in the broker message flow
✦ Simple configuration to add/remove regions
✦ Asynchronous (default) and synchronous replication

SLIDE 32

Asynchronous replication example

• Two independent clusters, primary and standby
• Configured tenants and namespaces replicate to standby
• Data published to primary is asynchronously replicated to standby
• Producers and consumers restarted in second datacenter upon primary failure

Producers (active)

Datacenter 1

Consumers (active)

Pulsar Cluster (primary)

Datacenter 2

Producers (standby) Consumers (standby) Pulsar Cluster (standby) Pulsar replication ZooKeeper ZooKeeper

SLIDE 33

Synchronous replication example

• Each topic owned by one broker at a time, i.e. in one datacenter
• ZooKeeper cluster spread across multiple locations
• Broker commits writes to bookies in both datacenters
• In event of datacenter failure, broker in surviving datacenter assumes ownership of topic

Producers

Datacenter 1

Consumers

Pulsar Cluster

Datacenter 2

Producers Consumers

SLIDE 34

Replicated subscriptions

Producers

Datacenter 1

Consumers Pulsar Cluster 1 Subscriptions

Datacenter 2

Consumers Pulsar Cluster 2 Subscriptions

Pulsar Replication

Marker Marker Marker

SLIDE 35

MULTITENANCY - CLOUD NATIVE

Apache Pulsar Cluster

Product Safety ETL

Fraud Detection

Topic-1 Account History Topic-2 User Clustering Topic-1 Risk Classification

Marketing

Campaigns

ETL

Topic-1 Budgeted Spend Topic-2 Demographic Classification Topic-1 Location Resolution

Data Serving

Microservice Topic-1 Customer Authentication

10 TB 7 TB 5 TB

✦ Authentication
✦ Authorization
✦ Software isolation
  ๏ Storage quotas, flow control, back pressure, rate limiting
✦ Hardware isolation
  ๏ Constrain some tenants to a subset of brokers/bookies

SLIDE 36

PULSAR CLIENTS

Apache Pulsar Cluster

Java Python Go C++ C

SLIDE 37

PULSAR PRODUCER

PulsarClient client = PulsarClient.create(
    "http://broker.usw.example.com:8080");

Producer producer = client.createProducer(
    "persistent://my-property/us-west/my-namespace/my-topic");

// handles retries in case of failure
producer.send("my-message".getBytes());

// Async version:
producer.sendAsync("my-message".getBytes()).thenRun(() -> {
    // Message was persisted
});

SLIDE 38

PULSAR CONSUMER

PulsarClient client = PulsarClient.create(
    "http://broker.usw.example.com:8080");

Consumer consumer = client.subscribe(
    "persistent://my-property/us-west/my-namespace/my-topic",
    "my-subscription-name");

while (true) {
    // Wait for a message
    Message msg = consumer.receive();
    // getData() returns a byte[], so decode it before printing
    System.out.println("Received message: " + new String(msg.getData()));
    // Acknowledge the message so that it can be deleted by the broker
    consumer.acknowledge(msg);
}

SLIDE 39

SCHEMA REGISTRY

✦ Provides type safety to applications built on top of Pulsar
✦ Two approaches
  ๏ Client side - type safety enforcement is up to the application
  ๏ Server side - the system enforces type safety and ensures that producers and consumers remain in sync
✦ The schema registry enables clients to upload data schemas on a per-topic basis
✦ Schemas dictate which data types are recognized as valid for that topic
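Server-side enforcement boils down to a compatibility check when a producer or consumer connects with a schema. A minimal illustrative sketch of such a check (this is not Pulsar's actual implementation: a schema is reduced to a set of field names, and a candidate is accepted only if it merely adds fields):

```java
import java.util.Set;

public class SchemaCheck {
    // Illustrative rule: a candidate schema is compatible with the stored
    // schema if it still contains every stored field (i.e. it only adds).
    static boolean isCompatible(Set<String> stored, Set<String> candidate) {
        return candidate.containsAll(stored);
    }

    public static void main(String[] args) {
        Set<String> v1 = Set.of("sensorId", "value");
        Set<String> addsField = Set.of("sensorId", "value", "unit");
        Set<String> dropsField = Set.of("sensorId");
        System.out.println(isCompatible(v1, addsField));  // accepted
        System.out.println(isCompatible(v1, dropsField)); // rejected
    }
}
```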

SLIDE 40

PULSAR SCHEMAS - HOW DO THEY WORK?

✦ Enforced at the topic level
✦ A Pulsar schema consists of
  ๏ Name - refers to the topic to which the schema is applied
  ๏ Payload - binary representation of the schema
  ๏ Schema type - JSON, Protobuf or Avro
  ๏ User-defined properties - a map of strings to strings (application specific - e.g. the git hash of the schema)

SLIDE 41

SCHEMA VERSIONING

PulsarClient client = PulsarClient.builder()
    .serviceUrl("pulsar://broker.usw.example.com:6650")
    .build();

Producer<SensorReading> producer = client
    .newProducer(JSONSchema.of(SensorReading.class))
    .topic("sensor-data")
    .sendTimeout(3, TimeUnit.SECONDS)
    .create();

Scenario | What happens
No schema exists for the topic | Producer is created using the given schema
Schema already exists; producer connects using the same schema that is already stored | Schema is transmitted to the broker, which determines that it is already stored
Schema already exists; producer connects using a new schema that is compatible | Schema is transmitted, compatibility is determined, and it is stored as a new schema version

SLIDE 42

Processing framework

SLIDE 43

HOW TO PROCESS DATA MODELED AS STREAMS

✦ Consume data as it is produced (pub/sub)
✦ Lightweight compute - transform and react to data as it arrives
✦ Heavyweight compute - continuous data processing
✦ Interactive query of stored streams

SLIDE 44

LIGHT WEIGHT COMPUTE

f(x)

Incoming Messages Output Messages

ABSTRACT VIEW OF COMPUTE REPRESENTATION

SLIDE 45

TRADITIONAL COMPUTE REPRESENTATION

DAG

Source 1, Source 2 → Action → Action → Action → Sink 1, Sink 2

SLIDE 46

REALIZING COMPUTATION - EXPLICIT CODE

import java.util.Map;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class SplitSentence extends BaseBasicBolt {
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        // Emit one tuple per word in the incoming sentence
        String sentence = tuple.getStringByField("sentence");
        for (String word : sentence.split(" ")) {
            collector.emit(new Values(word));
        }
    }
}

STITCHED BY PROGRAMMERS

SLIDE 47

REALIZING COMPUTATION - FUNCTIONAL

Builder.newBuilder()
    .newSource(() -> StreamletUtils.randomFromList(SENTENCES))
    .flatMap(sentence -> Arrays.asList(sentence.toLowerCase().split("\\s+")))
    .reduceByKeyAndWindow(
        word -> word,
        word -> 1,
        WindowConfig.TumblingCountWindow(50),
        (x, y) -> x + y);

SLIDE 48

TRADITIONAL REAL TIME - SEPARATE SYSTEMS

Messaging Compute

SLIDE 49

TRADITIONAL REAL TIME SYSTEMS

DEVELOPER EXPERIENCE

✦ Powerful API, but complicated
✦ Does everyone really need to learn functional programming?
✦ Configurable and scalable, but with management overhead
✦ Edge systems have resource and management constraints

SLIDE 50

TRADITIONAL REAL TIME SYSTEMS

OPERATIONAL EXPERIENCE

✦ Multiple systems to operate
✦ IoT deployments routinely have thousands of edge systems
✦ Semantic differences
✦ Mismatch and duplication between systems
✦ Creates developer and operator friction

SLIDE 51

LESSONS LEARNT - USE CASES

✦ Data transformations
✦ Data classification
✦ Data enrichment
✦ Data routing
✦ Data extraction and loading
✦ Real-time aggregation
✦ Microservices

A significant set of processing tasks is exceedingly simple

SLIDE 52

EMERGENCE OF CLOUD - SERVERLESS

✦ Simple function API
✦ Functions are submitted to the system
✦ Runs per event
✦ Composition APIs to do complex things
✦ Wildly popular

SLIDE 53

SERVERLESS VS STREAMING

✦ Both are event-driven architectures
✦ Both can be used for analytics and data serving
✦ Both have composition APIs
  ๏ Configuration based for serverless
  ๏ DSL based for streaming
✦ Serverless typically does not guarantee ordering
✦ Serverless is pay per action

SLIDE 54

STREAM NATIVE COMPUTE USING FUNCTIONS

✦ Simplest possible API - a function or a procedure
✦ Support for multiple languages
✦ Use of the native API for each language
✦ Scales to more developers
✦ Use of message-bus-native concepts - input and output as topics
✦ Flexible runtime - simple standalone applications vs managed system applications

APPLYING INSIGHT GAINED FROM SERVERLESS

SLIDE 55

PULSAR FUNCTIONS

SDK LESS API

import java.util.function.Function;

public class ExclamationFunction implements Function<String, String> {
    @Override
    public String apply(String input) {
        return input + "!";
    }
}
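Because the SDK-less API is plain java.util.function.Function, such a function can be exercised and unit tested with no Pulsar dependency at all - a small sketch:

```java
import java.util.function.Function;

public class ExclamationFunction implements Function<String, String> {
    @Override
    public String apply(String input) {
        return input + "!";
    }

    public static void main(String[] args) {
        // Invoke the function directly, as the runtime would once per message
        Function<String, String> fn = new ExclamationFunction();
        System.out.println(fn.apply("hello")); // prints "hello!"
    }
}
```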

SLIDE 56

PULSAR FUNCTIONS

SDK API

import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.PulsarFunction;

public class ExclamationFunction implements PulsarFunction<String, String> {
    @Override
    public String process(String input, Context context) {
        return input + "!";
    }
}

SLIDE 57

PULSAR FUNCTIONS

✦ Function executed for every message of the input topic
✦ Support for multiple topics as inputs
✦ Function output goes into an output topic - the output can also be void
✦ SerDe takes care of serialization/deserialization of messages
  ๏ Custom SerDe can be provided by the users
  ๏ Integration with schema registry
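A custom SerDe reduces to a serialize/deserialize pair that round-trips a value through bytes. A self-contained sketch of that contract (the class name here is hypothetical; the actual SerDe interface and its registration in Pulsar's SDK are not shown):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical codec showing the SerDe contract: serialize(T) -> byte[]
// and deserialize(byte[]) -> T must be inverses of each other.
public class StringCodec {
    public byte[] serialize(String input) {
        return input.getBytes(StandardCharsets.UTF_8);
    }

    public String deserialize(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        StringCodec codec = new StringCodec();
        byte[] wire = codec.serialize("my-message");
        System.out.println(codec.deserialize(wire)); // prints "my-message"
    }
}
```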

SLIDE 58

PROCESSING GUARANTEES

✦ ATMOST_ONCE
  ๏ Message acked to Pulsar as soon as we receive it
✦ ATLEAST_ONCE
  ๏ Message acked to Pulsar after the function completes
  ๏ Default behavior - we don't want people to lose data
✦ EFFECTIVELY_ONCE
  ๏ Uses Pulsar's built-in effectively-once semantics
✦ Controlled at runtime by the user
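The difference between the first two modes is only where the ack happens relative to the function call. A toy simulation (the names are illustrative, not Pulsar APIs) shows why ack-after-completion is the safe default:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class AckOrder {
    // Toy broker queue: after a crash, a message is redelivered
    // if and only if it was never acknowledged.
    static boolean redeliveredAfterCrash(boolean ackOnReceive) {
        Deque<String> queue = new ArrayDeque<>();
        queue.add("m1");
        queue.peek();                        // function receives the message
        if (ackOnReceive) queue.remove();    // ATMOST_ONCE: ack immediately
        // ... function crashes here, before processing completes,
        // so the ATLEAST_ONCE ack (after completion) never happens
        return !queue.isEmpty();             // still queued => redelivered
    }

    public static void main(String[] args) {
        System.out.println(redeliveredAfterCrash(true));  // false: message lost
        System.out.println(redeliveredAfterCrash(false)); // true: redelivered
    }
}
```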

SLIDE 59

DEPLOYING FUNCTIONS - BROKER

Broker 1 Worker Function wordcount-1 Function transform-2 Broker 1 Worker Function transform-1 Function dataroute-1 Broker 1 Worker Function wordcount-2 Function transform-3 Node 1 Node 2 Node 3

SLIDE 60

DEPLOYING FUNCTIONS - WORKER NODES

Worker Function wordcount-1 Function transform-2 Worker Function transform-1 Function dataroute-1 Worker Function wordcount-2 Function transform-3 Node 1 Node 2 Node 3 Broker 1 Broker 2 Broker 3 Node 4 Node 5 Node 6

SLIDE 61

DEPLOYING FUNCTIONS - KUBERNETES

Function wordcount-1 Function transform-1 Function transform-3 Pod 1 Pod 2 Pod 3 Broker 1 Broker 2 Broker 3 Pod 7 Pod 8 Pod 9 Function dataroute-1 Function wordcount-2 Function transform-2 Pod 4 Pod 5 Pod 6

SLIDE 62

BUILT-IN STATE MANAGEMENT IN FUNCTIONS

✦ Functions can store state in built-in storage
  ๏ The framework provides a simple library to store and retrieve state
✦ Supports server-side operations like counters
✦ Simplified application development
  ๏ No need to stand up an extra system

SLIDE 63

DISTRIBUTED STATE IN FUNCTIONS

import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.PulsarFunction;

public class CounterFunction implements PulsarFunction<String, Void> {
    @Override
    public Void process(String input, Context context) throws Exception {
        for (String word : input.split("\\.")) {
            context.incrCounter(word, 1);
        }
        return null;
    }
}

SLIDE 64

PULSAR - DATA IN AND OUT

✦ Users can write custom code using the Pulsar producer and consumer API
✦ Challenges
  ๏ Where should the application publish data to, or consume data from, Pulsar?
  ๏ How should the application publish data to, or consume data from, Pulsar?
✦ Current systems have no organized, fault-tolerant way to run applications that ingress and egress data to and from external systems

SLIDE 65

PULSAR IO TO THE RESCUE

Apache Pulsar Cluster

Source Sink

SLIDE 66

PULSAR IO - EXECUTION

Broker 1 Worker Sink Cassandra-1 Source Kinesis-2 Broker 2 Worker Source Kinesis-1 Source Twitter-1 Broker 3 Worker Sink Cassandra-2 Source Kinesis-3 Node 1 Node 2 Node 3

Fault tolerance - Parallelism - Elasticity - Load balancing - On-demand updates

SLIDE 67

INTERACTIVE QUERYING OF STREAMS - PULSAR SQL

Diagram: a query coordinator fans out to segment readers, which read topic segments in parallel directly from storage

SLIDE 68

PULSAR PERFORMANCE

SLIDE 69

PULSAR PERFORMANCE - LATENCY

SLIDE 70

APACHE PULSAR VS. APACHE KAFKA

Multi-tenancy - A single cluster can support many tenants and use cases
Seamless cluster expansion - Expand the cluster without any downtime
High throughput & low latency - Can reach 1.8M messages/s in a single partition and publish latency of 5 ms at the 99th percentile
Durability - Data replicated and synced to disk
Geo-replication - Out-of-the-box support for geographically distributed applications
Unified messaging model - Supports both topic & queue semantics in a single model
Tiered storage - Hot/warm data for real-time access and cold event data in cheaper storage
Pulsar Functions - Flexible lightweight compute
Highly scalable - Can support millions of topics, which makes data modeling easier

SLIDE 71

Growing ecosystem

SLIDE 72

Examples of companies using Apache Pulsar

Open source adopters - Open source evaluators - Streamlio

• Outreach

Growing funnel of validation and leads from outbound, inbound and open source

SLIDE 73

Pulsar in Production

SLIDE 74

Yahoo!

Scenario

Need to collect and distribute user and data events to distributed global applications at Internet scale

Challenges

• Multiple technologies to handle messaging needs
• Multiple, siloed messaging clusters
• Hard to meet scale and performance requirements
• Complex, fragile environment

Solution

• Central event data bus using Apache Pulsar
• Consolidated multiple technologies and clusters into a single solution
• Fully replicated across 8 global datacenters
• Processing >100B messages/day, 2.3M topics
SLIDE 75

APACHE PULSAR IN PRODUCTION @SCALE

✦ 4+ years in production
✦ Serves 2.3 million topics
✦ 700 billion messages/day
✦ 500+ bookie nodes
✦ 200+ broker nodes
✦ Average latency < 5 ms; 99.9th percentile at 15 ms (with strong durability guarantees)
✦ Zero data loss
✦ 150+ applications
✦ Self-served provisioning
✦ Full-mesh cross-datacenter replication - 8+ data centers

SLIDE 76

Use Cases

SLIDE 77

Example use cases

Streaming data transformation
Data distribution
Real-time analytics
Real-time monitoring and notifications
IoT analytics
Event-driven workflows
Interactive applications
Log processing and analytics

SLIDE 78

Data-driven workflows

Scenario

Application processes incoming events and documents that generate processing workflows

Challenges

Operational burdens and scalability challenges of existing technologies growing as data grows

Solution

Process incoming events and data and create work queues in same system

Decrypt, extract, convert, dispatch, process, store

SLIDE 79

Data distribution

Data collected from multiple sources
Normalized, enriched, transformed and put into topics
Delivered to applications and users as data streams
Distribution and usage logged for auditing

Data Sources

SLIDE 80

Simplifying the data pipeline

Scenario

Retail analytics software provider brings together operational and market research data for insights.

Challenges

Existing Kinesis + Spark + data lake infrastructure was unnecessarily complex and burdensome to operate and maintain.

Solution

• Replaced Kinesis + Spark with Apache Pulsar
• Simplified data transformation pipeline
• Reduced operations burdens

SLIDE 81

Event sourcing

Problem

Event-driven applications require long-term retention of data streams, but current technologies are cumbersome and expensive to use for data retention and cannot efficiently replay data.

Solution

Deploy Apache Pulsar for long-term retention and scalable processing and distribution of event data.

Why Streamlio

• Architected for scalable and efficient long-term storage
• High performance, scalable processing and distribution of data due to unique architecture

SLIDE 82

IOT ENVIRONMENT

Diagram: devices (D) → smart devices → edge aggregator → cloud

Light Device
✦ Typically sensors
✦ Only one functionality
✦ Simple to configure
✦ Lightweight protocols to communicate

Smart Device
✦ Typically ARM based
✦ Multiple functionality
✦ Basic but generic computational logic, limited storage
✦ Lightweight and proprietary protocols to communicate

Edge Node
✦ Multicore based
✦ Versatile functionality
✦ Complex and generic computational logic, decent amount of storage
✦ Lightweight and proprietary protocols to communicate

Cloud
✦ Multiple machines
✦ Versatile functionality
✦ Complex and generic computational logic
✦ Lots of storage

SLIDE 83

IOT DATA FABRIC WITH APACHE PULSAR

Diagram: Apache Pulsar runs at every tier - on devices (D), edge nodes and in the cloud - connected by data replication and WebSocket APIs, with functions (filter-fn, xform-fn, aggr-fn) executing at each tier

SLIDE 84

Large Car Manufacturer: Connected vehicle

Scenario

Continuously arriving data generated by connected cars needs to be quickly collected, processed and distributed to applications and partners

Challenges

Requires scalability to handle growing data sources and volumes without a complex mix of technologies

Solution

Leverage the Streamlio solution to provide a data backbone that can receive, transform, and distribute data at scale

SLIDE 85

Large Car Manufacturer: Connected vehicle

Telemetry data from connected vehicles transmitted and published to Pulsar
Data cleansing, enrichment and refinement processed inside Pulsar
Data made available to internal teams for analysis and reports
Data feeds supplied to partners and partner applications

SLIDE 86

Large Car Manufacturer: Big Data Logging System

Scenario

Continuously ingest logs from big data systems for distribution to appropriate teams, with appropriate log transformations and enrichment

Challenges

Requires scalability to handle a growing set of big data systems and larger log volumes

Solution

Leverage the Streamlio Pulsar solution to provide a logging backbone that can ingest, transform, and distribute logs at scale

SLIDE 87

Large Car Manufacturer: Big Data Logging System

Pulsar functions route and transform logs to different teams (Team 1 logs, Team 2 logs)

SLIDE 88

Connected consumer

Connected consumer electronic devices
Emit event data that is collected and processed in Pulsar
Generating notifications and work requests
Distributed to microservices for processing
Supporting connected services and applications

SLIDE 89
SLIDE 90

MORE READINGS
SLIDE 91

QUESTIONS

SLIDE 92

STAY IN TOUCH

Twitter: @karthikz
Email: karthik@streaml.io

SLIDE 93

@karthikz