SLIDE 1

Nathan Marz Twitter

Distributed and fault-tolerant realtime computation

Storm

SLIDE 2

Basic info

  • Open sourced September 19th
  • Implementation is 15,000 lines of code
  • Used by over 25 companies
  • >2700 watchers on GitHub (most watched JVM project)

  • Very active mailing list
  • >2300 messages
  • >670 members
SLIDE 3

Before Storm

Queues → Workers

SLIDE 4

Example

(simplified)

SLIDE 5

Example

Workers schemify tweets and append to Hadoop

SLIDE 6

Example

Workers update statistics on URLs by incrementing counters in Cassandra

SLIDE 7

Example

Use mod/hashing to make sure same URL always goes to same worker
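The routing rule on this slide can be sketched in Python. This is illustrative only, not code from the deck; the worker count and the choice of MD5 as a stable hash are assumptions for the example:

```python
import hashlib

# Illustrative sketch of pre-Storm worker routing: mod-hash the URL so
# the same URL always lands on the same worker. NUM_WORKERS is an
# assumed configuration value.
NUM_WORKERS = 4

def worker_for(url: str) -> int:
    """Pick a worker index deterministically from the URL."""
    # A stable hash (hashlib, not Python's builtin hash()) keeps the
    # routing consistent across processes and restarts.
    digest = int(hashlib.md5(url.encode()).hexdigest(), 16)
    return digest % NUM_WORKERS

# The same URL is always routed to the same worker:
assert worker_for("http://example.com") == worker_for("http://example.com")
```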

SLIDE 8

Scaling

Deploy → Reconfigure/redeploy

SLIDE 9

Problems

  • Scaling is painful
  • Poor fault-tolerance
  • Coding is tedious
SLIDE 10

What we want

  • Guaranteed data processing
  • Horizontal scalability
  • Fault-tolerance
  • No intermediate message brokers!
  • Higher level abstraction than message passing
  • “Just works”
SLIDE 11

Storm

  • Guaranteed data processing
  • Horizontal scalability
  • Fault-tolerance
  • No intermediate message brokers!
  • Higher level abstraction than message passing
  • “Just works”

SLIDE 12

Use cases

  • Stream processing
  • Continuous computation
  • Distributed RPC

SLIDE 13

Storm Cluster

SLIDE 14

Storm Cluster

Master node (similar to Hadoop JobTracker)

SLIDE 15

Storm Cluster

Used for cluster coordination

SLIDE 16

Storm Cluster

Run worker processes

SLIDE 17

Starting a topology

SLIDE 18

Killing a topology

SLIDE 19

Concepts

  • Streams
  • Spouts
  • Bolts
  • Topologies
SLIDE 20

Streams

Unbounded sequence of tuples

Tuple → Tuple → Tuple → Tuple → Tuple → Tuple → Tuple

SLIDE 21

Spouts

Source of streams

SLIDE 22

Spout examples

  • Read from Kestrel queue
  • Read from Twitter streaming API
SLIDE 23

Bolts

Processes input streams and produces new streams

SLIDE 24

Bolts

  • Functions
  • Filters
  • Aggregation
  • Joins
  • Talk to databases
SLIDE 25

Topology

Network of spouts and bolts

SLIDE 26

Tasks

Spouts and bolts execute as many tasks across the cluster

SLIDE 27

Task execution

Tasks are spread across the cluster

SLIDE 28

Task execution

Tasks are spread across the cluster

SLIDE 29

Stream grouping

When a tuple is emitted, which task does it go to?

SLIDE 30

Stream grouping

  • Shuffle grouping: pick a random task
  • Fields grouping: mod hashing on a subset of tuple fields
  • All grouping: send to all tasks
  • Global grouping: pick task with lowest id
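The four groupings above can be sketched as task-selection functions. This is a hedged simulation of the semantics described on the slide, not Storm's internals; the task-id lists and tuple shape are assumptions:

```python
import hashlib
import random

# Each grouping maps (candidate task ids, tuple) to the task ids that
# should receive the tuple.

def shuffle_grouping(tasks, tup):
    return [random.choice(tasks)]            # pick a random task

def fields_grouping(tasks, tup, fields):
    # Mod-hash a subset of the tuple's fields, so equal field values
    # always reach the same task.
    key = tuple(tup[f] for f in fields)
    h = int(hashlib.md5(repr(key).encode()).hexdigest(), 16)
    return [tasks[h % len(tasks)]]

def all_grouping(tasks, tup):
    return list(tasks)                       # send to every task

def global_grouping(tasks, tup):
    return [min(tasks)]                      # task with the lowest id

tasks = [3, 7, 11]
t = {"url": "http://a", "count": 1}
# Fields grouping is deterministic for the same field values:
assert fields_grouping(tasks, t, ["url"]) == fields_grouping(tasks, t, ["url"])
```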
SLIDE 31

Topology

Groupings on the edges of the example topology: shuffle, [“url”], shuffle, shuffle, [“id1”, “id2”], all

SLIDE 32

Streaming word count

TopologyBuilder is used to construct topologies in Java

SLIDE 33

Streaming word count

Define a spout in the topology with parallelism of 5 tasks

SLIDE 34

Streaming word count

Split sentences into words with parallelism of 8 tasks

SLIDE 35

Consumer decides what data it receives and how it gets grouped

Streaming word count

Split sentences into words with parallelism of 8 tasks

SLIDE 36

Streaming word count

Create a word count stream
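The counting step can be sketched as follows. The deck's actual code is Java (TopologyBuilder) and is not reproduced in this transcript; this Python sketch only simulates the counting bolt's logic, assuming a fields grouping on "word" so every occurrence of a word reaches the same task:

```python
from collections import defaultdict

# Hedged sketch of a word-count bolt's per-task state: because a fields
# grouping on "word" routes equal words to the same task, a plain
# in-memory counter per task is sufficient.
class WordCountBolt:
    def __init__(self):
        self.counts = defaultdict(int)

    def execute(self, word):
        self.counts[word] += 1
        # Emit the word with its running count.
        return (word, self.counts[word])

bolt = WordCountBolt()
for w in ["storm", "storm", "cluster"]:
    bolt.execute(w)
assert bolt.counts["storm"] == 2
```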

SLIDE 37

Streaming word count

splitsentence.py
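The real splitsentence.py speaks Storm's multilang protocol over stdin/stdout (via the storm Python module); only the core splitting logic is sketched here:

```python
# Core logic of the split-sentence bolt: each sentence tuple is split
# on spaces and each word is emitted as a one-field tuple. The
# multilang plumbing of the real splitsentence.py is omitted.
def split_sentence(sentence):
    for word in sentence.split(" "):
        yield (word,)

assert list(split_sentence("the cow jumped")) == [("the",), ("cow",), ("jumped",)]
```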

SLIDE 38

Streaming word count

SLIDE 39

Streaming word count

Submitting topology to a cluster

SLIDE 40

Streaming word count

Running topology in local mode

SLIDE 41

Demo

SLIDE 42

Distributed RPC

Data flow for Distributed RPC

SLIDE 43

DRPC Example

Computing “reach” of a URL on the fly

SLIDE 44

Reach

Reach is the number of unique people exposed to a URL on Twitter

SLIDE 45

Computing reach

URL → Tweeters → Followers → Distinct followers → Count → Reach

SLIDE 46

Reach topology

SLIDE 47

Reach topology

SLIDE 48

Reach topology

SLIDE 49

Reach topology

Keep set of followers for each request id in memory

SLIDE 50

Reach topology

Update followers set when receive a new follower

SLIDE 51

Reach topology

Emit partial count after receiving all followers for a request id
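The three slides above describe the partial-uniquing bolt's state machine. A hedged Python sketch of that state (class and method names are illustrative, not from the deck):

```python
# Sketch of a partial-uniquing bolt: keep a set of followers per
# request id, and emit the distinct count once all followers for that
# request have arrived.
class PartialUniquer:
    def __init__(self):
        self.followers = {}   # request id -> set of follower ids

    def add(self, request_id, follower):
        # Update the followers set when a new follower arrives;
        # the set de-duplicates automatically.
        self.followers.setdefault(request_id, set()).add(follower)

    def partial_count(self, request_id):
        # Emit the partial count and release the in-memory state.
        return len(self.followers.pop(request_id, set()))

u = PartialUniquer()
for f in ["alice", "bob", "alice"]:
    u.add("req-1", f)
assert u.partial_count("req-1") == 2   # duplicate follower counted once
```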

SLIDE 52

Demo

SLIDE 53

Guaranteeing message processing

“Tuple tree”

SLIDE 54

Guaranteeing message processing

  • A spout tuple is not fully processed until all tuples in the tree have been completed

SLIDE 55

Guaranteeing message processing

  • If the tuple tree is not completed within a specified timeout, the spout tuple is replayed

SLIDE 56

Guaranteeing message processing

Reliability API

SLIDE 57

Guaranteeing message processing

“Anchoring” creates a new edge in the tuple tree

SLIDE 58

Guaranteeing message processing

Acking marks a single node in the tree as complete

SLIDE 59

Guaranteeing message processing

  • Storm tracks tuple trees for you in an extremely efficient way
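The efficient tracking is commonly described as XOR-based: every tuple gets a random 64-bit id, and each anchor and each ack XORs that id into a single per-spout-tuple value, which returns to zero exactly when every anchored tuple has also been acked. A sketch of the idea (the class is illustrative, not Storm's acker):

```python
import os

# Sketch of XOR-based tuple-tree tracking: anchoring and acking both
# XOR a tuple's random 64-bit id into one accumulator. Because x ^ x
# == 0, the accumulator is zero exactly when every anchored tuple has
# been acked, regardless of order, using constant memory per tree.
class TreeTracker:
    def __init__(self):
        self.ack_val = 0

    def anchor(self, tuple_id):
        self.ack_val ^= tuple_id

    def ack(self, tuple_id):
        self.ack_val ^= tuple_id

    def complete(self):
        return self.ack_val == 0

t = TreeTracker()
ids = [int.from_bytes(os.urandom(8), "big") for _ in range(3)]
for i in ids:
    t.anchor(i)
assert not t.complete()     # anchored but not yet acked
for i in ids:
    t.ack(i)
assert t.complete()         # every anchored tuple acked
```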

SLIDE 60

Transactional topologies

How do you do idempotent counting with an at-least-once delivery guarantee?

SLIDE 61

Transactional topologies

Won’t you overcount?

SLIDE 62

Transactional topologies

Transactional topologies solve this problem

SLIDE 63

Transactional topologies

Built completely on top of Storm’s primitives of streams, spouts, and bolts

SLIDE 64

Transactional topologies

Enables fault-tolerant, exactly-once messaging semantics

SLIDE 65

Transactional topologies

Process small batches of tuples

Batch 1 → Batch 2 → Batch 3

SLIDE 66

Transactional topologies

If a batch fails, replay the whole batch

Batch 1 → Batch 2 → Batch 3

SLIDE 67

Transactional topologies

Once a batch is completed, commit the batch

Batch 1 → Batch 2 → Batch 3

SLIDE 68

Transactional topologies

Bolts can optionally be “committers”

Batch 1 → Batch 2 → Batch 3

SLIDE 69

Transactional topologies

Commits are ordered. If there’s a failure during commit, the whole batch + commit is retried

Commit 1 → Commit 2 → Commit 3 → Commit 4 → Commit 4 (retried)

SLIDE 70

Example

SLIDE 71

Example

New instance of this object for every transaction attempt

SLIDE 72

Example

Aggregate the count for this batch

SLIDE 73

Example

Only update database if transaction ids differ

SLIDE 74

Example

This enables idempotency since commits are ordered
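The trick from the last few slides can be sketched in a few lines. A hedged simulation, with a plain dict standing in for the real database:

```python
# Sketch of transactional counting: store the last applied transaction
# id alongside the count, and skip the update when the same transaction
# is replayed. Because commits are ordered, seeing an equal txid means
# this exact batch was already applied.
db = {"count": 0, "txid": None}

def commit(batch_count, txid):
    if db["txid"] != txid:          # only update if transaction ids differ
        db["count"] += batch_count
        db["txid"] = txid

commit(5, txid=1)
commit(5, txid=1)   # replay of the same batch: no double counting
commit(3, txid=2)
assert db["count"] == 8
```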

SLIDE 75

Example

(Credit goes to Kafka devs for this trick)

SLIDE 76

Transactional topologies

Multiple batches can be processed in parallel, but commits are guaranteed to be ordered

SLIDE 77
Transactional topologies

  • Requires a more sophisticated source queue than Kestrel or RabbitMQ
  • storm-contrib has a transactional spout implementation for Kafka

SLIDE 78

Storm UI

SLIDE 79

Storm on EC2

One-click deploy tool: https://github.com/nathanmarz/storm-deploy

SLIDE 80

Starter code

Example topologies: https://github.com/nathanmarz/storm-starter

SLIDE 81

Documentation

SLIDE 82

Ecosystem

  • Scala, JRuby, and Clojure DSLs
  • Kestrel, Redis, AMQP, JMS, and other spout adapters
  • Multilang adapters
  • Cassandra, MongoDB integration
SLIDE 83

Questions?

http://github.com/nathanmarz/storm

SLIDE 84

Future work

  • State spout
  • Storm on Mesos
  • “Swapping”
  • Auto-scaling
  • Higher level abstractions
SLIDE 85

Implementation

KafkaTransactionalSpout

SLIDE 86

Implementation


SLIDE 87

Implementation


TransactionalSpout is a subtopology consisting of a spout and a bolt

SLIDE 88

Implementation


The spout consists of one task that coordinates the transactions

SLIDE 89

Implementation


The bolt emits the batches of tuples

SLIDE 90

Implementation


The coordinator emits a “batch” stream and a “commit” stream

SLIDE 91

Implementation


Batch stream

SLIDE 92

Implementation


Commit stream

SLIDE 93

Implementation


The coordinator reuses the tuple-tree framework to detect success or failure of batches or commits, and replays appropriately