Matteo Merli
Guaranteed “effectively-once” messaging semantic
What is Apache Pulsar?
Distributed pub/sub messaging
Backed by a scalable log store (Apache BookKeeper)
Streaming & Queuing
Low latency
Multi-tenant
between brokers bookies
can be added independently
very quickly across brokers
up on traffic quickly
[Diagram: a Producer and a Consumer connect to Apache Pulsar (Pulsar Broker 1-3); the brokers store data in Apache BookKeeper (Bookie 1-5).]
broadcast”
producer as before
stream, …)
ProducerConfiguration conf = new ProducerConfiguration();
conf.setProducerName("my-producer-name");
conf.setSendTimeout(0, TimeUnit.SECONDS);
Producer producer = client.createProducer(MY_TOPIC, conf);
// Get last committed sequence id before crash
long lastSequenceId = producer.getLastSequenceId();
// Fictitious record reader class
RecordReader source = new RecordReader("/my/file/path");
long fileOffset = producer.getLastSequenceId();
source.seekToOffset(fileOffset);
while (source.hasNext()) {
    long currentOffset = source.currentOffset();
    Message msg = MessageBuilder.create()
        .setSequenceId(currentOffset)
        .setContent(source.next()).build();
    producer.send(msg);
}
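The producer snippets above rely on the broker remembering the last sequence id it persisted for each producer name. A minimal sketch of that idea follows, as a toy in-memory model rather than Pulsar's actual implementation; the class `DedupSketch` and all of its methods are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of broker-side deduplication. NOT Pulsar's real code:
// every name here is hypothetical and the state lives only in memory.
public class DedupSketch {
    // Highest sequence id persisted so far, keyed by producer name.
    private final Map<String, Long> lastSequenceIds = new HashMap<>();
    // Stand-in for the topic's persisted log.
    private final List<String> log = new ArrayList<>();

    /** Appends the payload unless its sequence id was already persisted. */
    public boolean publish(String producerName, long sequenceId, String payload) {
        long last = lastSequenceIds.getOrDefault(producerName, -1L);
        if (sequenceId <= last) {
            return false; // duplicate of something already stored: drop it
        }
        log.add(payload);
        lastSequenceIds.put(producerName, sequenceId);
        return true;
    }

    /** Returns the last persisted sequence id, or -1 if none. */
    public long getLastSequenceId(String producerName) {
        return lastSequenceIds.getOrDefault(producerName, -1L);
    }

    public List<String> log() {
        return log;
    }

    public static void main(String[] args) {
        DedupSketch broker = new DedupSketch();
        broker.publish("my-producer-name", 0, "a");
        broker.publish("my-producer-name", 1, "b");

        // The producer restarts with the same name and resumes from the
        // last sequence id the broker knows about.
        long resumeFrom = broker.getLastSequenceId("my-producer-name");
        broker.publish("my-producer-name", resumeFrom, "b");     // dropped
        broker.publish("my-producer-name", resumeFrom + 1, "c"); // accepted

        System.out.println(broker.log()); // prints [a, b, c]
    }
}
```

In this model, resending the record at the last sequence id is harmless, which is why the record reader above can simply seek back to that offset.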
Consumer consumer = client.subscribe(MY_TOPIC, MY_SUBSCRIPTION_NAME);
while (true) {
    Message msg = consumer.receive();
    // Process the message...
    consumer.acknowledge(msg);
}
deduplicate the processing results. Eg:
from
MessageId lastMessageId = recoverLastMessageIdFromDB();
Reader reader = client.createReader(MY_TOPIC, lastMessageId, new ReaderConfiguration());
while (true) {
    Message msg = reader.readNext();
    byte[] msgId = msg.getMessageId().toByteArray();
    // Process the message and store msgId atomically
}
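The comment "process the message and store msgId atomically" is the key step in the reader loop above. The sketch below illustrates it with a toy in-memory store, assuming message ids are plain list indexes; `AtomicCheckpointSketch`, `commit`, and `run` are hypothetical names, and a real application would use a database transaction instead of a synchronized method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: the "database" is an in-memory object and message ids
// are list indexes. All names are hypothetical.
public class AtomicCheckpointSketch {
    private final List<String> results = new ArrayList<>();
    private long lastProcessedId = -1; // -1 means nothing processed yet

    /** Stores the result and advances the position as one atomic step. */
    public synchronized void commit(long messageId, String result) {
        results.add(result);
        lastProcessedId = messageId;
    }

    public synchronized long lastProcessedId() {
        return lastProcessedId;
    }

    public synchronized List<String> results() {
        return new ArrayList<>(results);
    }

    /** Reads from just after the stored position; stops after maxMessages to simulate a crash. */
    public static void run(List<String> topic, AtomicCheckpointSketch store, int maxMessages) {
        int start = (int) store.lastProcessedId() + 1;
        for (int id = start; id < topic.size() && id - start < maxMessages; id++) {
            String result = topic.get(id) + "!"; // "process" the message
            store.commit(id, result);
        }
    }

    public static void main(String[] args) {
        List<String> topic = Arrays.asList("a", "b", "c", "d");
        AtomicCheckpointSketch store = new AtomicCheckpointSketch();

        run(topic, store, 2);                 // handles a and b, then "crashes"
        run(topic, store, Integer.MAX_VALUE); // restart: resumes at c

        System.out.println(store.results()); // prints [a!, b!, c!, d!]
    }
}
```

Because the result and the position change together, a restart never reprocesses a message whose result was already stored and never skips one whose result was lost.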
def process(input):
    return input + '!'
each source topic/partition, to ensure monotonic sequence ids
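The bullet above is truncated, but its stated goal (monotonic sequence ids per source topic/partition) can be illustrated with a toy model. The sketch assumes sequence ids are taken from the source message's offset within its partition, and that a producer rejects any id that is not higher than the previous one; `PerPartitionProducerSketch` and `ToyProducer` are hypothetical names, not Pulsar classes.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: shows why one output producer per source partition keeps
// sequence ids monotonic. All names and the id scheme are assumptions.
public class PerPartitionProducerSketch {
    /** Stand-in for a producer whose sequence ids must strictly increase. */
    static final class ToyProducer {
        private long lastSequenceId = -1;

        boolean send(long sequenceId) {
            if (sequenceId <= lastSequenceId) {
                return false; // treated as a duplicate and dropped
            }
            lastSequenceId = sequenceId;
            return true;
        }
    }

    /** Each message is {partition, offset}; all partitions share one producer. */
    public static int sendShared(long[][] messages) {
        ToyProducer shared = new ToyProducer();
        int accepted = 0;
        for (long[] m : messages) {
            if (shared.send(m[1])) {
                accepted++;
            }
        }
        return accepted;
    }

    /** Same input, but with one producer per source partition. */
    public static int sendPerPartition(long[][] messages) {
        Map<Long, ToyProducer> producers = new HashMap<>();
        int accepted = 0;
        for (long[] m : messages) {
            ToyProducer p = producers.computeIfAbsent(m[0], k -> new ToyProducer());
            if (p.send(m[1])) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Interleaved input from two partitions with unrelated offsets.
        long[][] messages = {{0, 5}, {1, 2}, {0, 6}, {1, 3}};
        System.out.println("shared=" + sendShared(messages)
                + " perPartition=" + sendPerPartition(messages));
        // prints shared=2 perPartition=4
    }
}
```

With a shared producer, valid messages from the second partition look like duplicates because their offsets are lower; separate producers avoid that false positive.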
[Benchmark charts: OpenMessaging Benchmark; 1 Topic / 1 Partition; 1 Partition / 1 Consumer; 1 KB messages]
Feature                          Kafka                                    Pulsar
Producer idempotency             Best-effort (in memory only)             Guaranteed after crash
Transactions                     2 phase commit                           No transactions
Dedup across producer sessions   No                                       Yes
Dedup with geo-replication       No                                       Yes
Throughput                       Lower (1 in-flight message/batch for     Equal