

SLIDE 1

Streaming In Practice

KARTHIK RAMASAMY

@KARTHIKZ #TwitterHeron

SLIDE 2

TALK OUTLINE

I. HERON OVERVIEW
II. HERON PERFORMANCE
III. HERON BACKPRESSURE
IV. HERON LOAD SHEDDING
CONCLUSION

SLIDE 3

HERON OVERVIEW

SLIDE 4

STORM/HERON TERMINOLOGY

TOPOLOGY
Directed acyclic graph; vertices = computation, edges = streams of data tuples

SPOUTS
Sources of data tuples for the topology (examples: Kafka/Kestrel/MySQL/Postgres)

BOLTS
Process incoming tuples and emit outgoing tuples (examples: filtering/aggregation/join/arbitrary function)
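To make the terminology concrete, here is a minimal word-count sketch wired up with the Storm Java API (which Heron is designed to be API-compatible with; see slide 7). TestWordSpout ships with Storm; the counting bolt and the parallelism values are illustrative, not from the talk.

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.testing.TestWordSpout;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class WordCountTopology {
    // Bolt: processes incoming word tuples and emits (word, count) tuples.
    public static class CountBolt extends BaseBasicBolt {
        private final Map<String, Long> counts = new HashMap<>();

        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String word = tuple.getStringByField("word");
            long count = counts.merge(word, 1L, Long::sum);
            collector.emit(new Values(word, count));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word", "count"));
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // Spout: the source of data tuples (in production: Kafka, Kestrel, a DB, ...).
        builder.setSpout("words", new TestWordSpout(), 2);
        // Edge of the DAG: a fields grouping routes each word to one counter task.
        builder.setBolt("count", new CountBolt(), 4)
               .fieldsGrouping("words", new Fields("word"));
        StormSubmitter.submitTopology("word-count", new Config(), builder.createTopology());
    }
}
```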

SLIDE 5

STORM/HERON TOPOLOGY

[Diagram: example topology DAG — SPOUT 1 and SPOUT 2 feed tuples through BOLT 1 to BOLT 5]

SLIDE 6

WHY HERON?

  • PERFORMANCE PREDICTABILITY
  • EASE OF MANAGEABILITY
  • IMPROVED DEVELOPER PRODUCTIVITY

SLIDE 7

HERON DESIGN DECISIONS

FULLY API COMPATIBLE WITH STORM
Directed acyclic graph; topologies, spouts, and bolts

USE OF MAINSTREAM LANGUAGES
C++/Java/Python

TASK ISOLATION
Ease of debuggability/resource isolation/profiling

SLIDE 8

HERON ARCHITECTURE

[Diagram: topology submission — Topology 1 through Topology N are submitted to the Scheduler]

SLIDE 9

TOPOLOGY ARCHITECTURE

[Diagram: the Topology Master keeps the logical plan, physical plan, and execution state in sync with a ZooKeeper cluster; each container runs a Stream Manager, a Metrics Manager, and Heron instances I1-I4, and receives its physical plan from the Topology Master]

SLIDE 10

HERON SAMPLE TOPOLOGIES

SLIDE 11

HERON @TWITTER

  • Large amounts of data produced every day
  • Large cluster; several hundred topologies deployed
  • Several billion messages every day
  • Topologies range from 1 stage to 10 stages
  • 3x reduction in cores and memory
  • Heron has been in production for 2 years

SLIDE 12

HERON USE CASES

  • REAL-TIME ETL
  • REAL-TIME BI
  • SPAM DETECTION
  • REAL-TIME TRENDS
  • REAL-TIME ML
  • REAL-TIME MEDIA
  • REAL-TIME OPS

SLIDE 13

HERON ENVIRONMENT

  • Laptop/Server
  • Cluster/Aurora
  • Cluster/Mesos

SLIDE 14

HERON RESOURCE USAGE


SLIDE 15

HERON PERFORMANCE

Settings

COMPONENTS           EXPT #1   EXPT #2   EXPT #3   EXPT #4
Spout                25        100       200       300
Bolt                 25        100       200       300
# Heron containers   25        100       200       300
# Storm workers      25        100       200       300
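As a hedged illustration (not the authors' benchmark harness), the EXPT #1 column could be expressed through the Storm-compatible API as below; `setNumWorkers` sets the Storm worker count, and the number of Heron containers is the analogous knob on the Heron side. The acker setting anticipates the "acknowledgements enabled" note on the next slide; CountBolt is the hypothetical bolt sketched on slide 4.

```java
import org.apache.storm.Config;
import org.apache.storm.testing.TestWordSpout;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

// Illustrative settings for EXPT #1: spout = 25, bolt = 25, 25 workers/containers.
public class ExperimentOne {
    public static TopologyBuilder build(Config conf) {
        final int parallelism = 25;
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("words", new TestWordSpout(), parallelism);
        builder.setBolt("count", new WordCountTopology.CountBolt(), parallelism)
               .fieldsGrouping("words", new Fields("word"));
        conf.setNumWorkers(parallelism); // # Storm workers (Heron: # containers)
        conf.setNumAckers(parallelism);  // acknowledgements enabled
        return builder;
    }
}
```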

SLIDE 16

HERON PERFORMANCE

Word count topology, acknowledgements enabled:

[Chart: throughput in million tuples/min (350-1400) vs. spout parallelism (25, 100, 200, 500), Storm vs. Heron]

[Chart: latency in ms (625-2500) vs. spout parallelism (25, 100, 200, 500), Storm vs. Heron]

Throughput: 10-14x higher than Storm. Latency: 5-15x lower.

SLIDE 17

HERON RESOURCE USAGE

[Diagram: an Event Spout (60-100M tuples/min) feeds an Aggregate Bolt pipeline — Filter (8-12M/min) → Flat-Map (40-60M/min) → Aggregate with a 1-second cache → Output (25-42M/min) → Redis]
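A hedged sketch of the "Aggregate, cache 1 sec" stage above, assuming per-key counts buffered in memory and flushed downstream (toward the Redis writer) about once per second. Field names, the flush policy, and the bolt itself are illustrative, not Heron's production code; note how batching in the cache explains the output rate (25-42M/min) being lower than the flat-map input rate (40-60M/min).

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class AggregateBolt extends BaseRichBolt {
    private OutputCollector collector;
    private Map<String, Long> cache;   // the 1-second cache from the diagram
    private long lastFlushMs;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.cache = new HashMap<>();
        this.lastFlushMs = System.currentTimeMillis();
    }

    @Override
    public void execute(Tuple tuple) {
        // Aggregate: accumulate a count per key instead of emitting per tuple.
        cache.merge(tuple.getStringByField("key"), 1L, Long::sum);
        collector.ack(tuple);

        // Flush roughly once per second (checked on tuple arrival, which is
        // frequent at tens of millions of tuples per minute).
        long now = System.currentTimeMillis();
        if (now - lastFlushMs >= 1000) {
            for (Map.Entry<String, Long> e : cache.entrySet()) {
                collector.emit(new Values(e.getKey(), e.getValue()));
            }
            cache.clear();
            lastFlushMs = now;
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("key", "count"));
    }
}
```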

SLIDE 18

RESOURCE CONSUMPTION

        Cores Requested   Cores Used   Memory Requested (GB)   Memory Used (GB)
Redis   24                2-4          48                      N/A
Heron   120               30-50        200                     180

SLIDE 19

RESOURCE CONSUMPTION

[Pie chart: resource consumption split among Spout Instances, Bolt Instances, and Heron Overhead — segments of 7%, 9%, and 84%]

SLIDE 20

PROFILING SPOUTS

[Pie chart: where spout time goes — Deserialize, Parse/Filter, Mapping, Kafka Iterator, Kafka Fetch, and Rest — segments of 2%, 7%, 16%, 6%, 6%, and 63%]

SLIDE 21

PROFILING BOLTS

[Pie chart: where bolt time goes — Write Data, Serialize, Deserialize, Aggregation, Data Transport, and Rest — segments of 2%, 4%, 5%, 2%, 19%, and 68%]

SLIDE 22

RESOURCE CONSUMPTION - BREAKDOWN

[Pie chart: overall breakdown — Fetching Data, User Logic, Heron Usage, and Writing Data — segments of 8%, 11%, 21%, and 61%]

SLIDE 23

HERON BACKPRESSURE

SLIDE 24

BACK PRESSURE AND STRAGGLERS

  • PROVIDES PREDICTABILITY
  • PROCESSES DATA AT MAXIMUM RATE
  • REDUCES RECOVERY TIMES
  • HANDLES TEMPORARY SPIKES

Stragglers are the norm in multi-tenant distributed systems: bad machines, inadequate provisioning, and hot keys.
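The Heron paper describes spout backpressure: when instances in a container fall behind, its stream manager has the topology's spouts throttled until the backlog drains. The core idea can be seen in miniature with a bounded queue; the toy below illustrates the principle, not Heron's implementation.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Toy backpressure: a bounded buffer between a fast producer (spout) and a
// slow consumer (straggler bolt). put() blocks once the buffer fills, so the
// producer is forced down to the consumer's maximum rate instead of building
// an unbounded backlog it would later have to recover from.
public class BackpressureToy {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Long> buffer = new ArrayBlockingQueue<>(100);

        Thread producer = new Thread(() -> {
            try {
                for (long i = 0; i < 1000; i++) {
                    buffer.put(i); // blocks when the consumer falls behind
                }
            } catch (InterruptedException ignored) { }
        });

        Thread consumer = new Thread(() -> {
            try {
                for (long i = 0; i < 1000; i++) {
                    buffer.take();
                    Thread.sleep(1); // simulate a straggler
                }
            } catch (InterruptedException ignored) { }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
    }
}
```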

SLIDE 25

BACK PRESSURE AND STRAGGLERS

IN MOST SCENARIOS, BACK PRESSURE RECOVERS
Without any manual intervention

SOMETIMES USERS PREFER DROPPING DATA
They care only about the latest data

SUSTAINED BACK PRESSURE
Irrecoverable GC cycles; bad or faulty hosts

SLIDE 26

LOAD SHEDDING

SAMPLING-BASED APPROACHES
Down-sample the incoming stream and scale up the results
Easy to reason about if the sampling is uniform
Hard to achieve uniformity across distributed spouts

DROP-BASED APPROACHES (see the sketch below)
Simply drop older data
Spouts take a lag threshold and a lag adjustment value
Works well in practice
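A minimal sketch of the drop-based approach, assuming a Kafka-style spout whose position is an offset: when lag (latest offset minus current offset) crosses the threshold, the spout jumps forward by the adjustment value, dropping the oldest data. The class and parameter names are hypothetical.

```java
// Drop-based load shedding, sketched around the two knobs named above:
// a lag threshold (when to start shedding) and a lag adjustment (how far
// to jump forward, i.e., how much old data to drop).
public class DropBasedShedder {
    private final long lagThreshold;
    private final long lagAdjustment;

    public DropBasedShedder(long lagThreshold, long lagAdjustment) {
        this.lagThreshold = lagThreshold;
        this.lagAdjustment = lagAdjustment;
    }

    /** Returns the offset the spout should read from next. */
    public long nextOffset(long currentOffset, long latestOffset) {
        long lag = latestOffset - currentOffset;
        if (lag > lagThreshold) {
            // Shed: skip ahead, but never past the newest available data.
            return Math.min(currentOffset + lagAdjustment, latestOffset);
        }
        return currentOffset; // within budget: read everything
    }
}
```

For example, with a threshold of 1,000,000 and an adjustment of 500,000, a spout that falls 1.2M messages behind jumps forward 500K, dropping the oldest half-million messages while continuing to process the rest.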

SLIDE 27

CURIOUS TO LEARN MORE…

Twitter Heron: Stream Processing at Scale

Sanjeev Kulkarni, Nikunj Bhagat, Maosong Fu, Vikas Kedigehalli, Christopher Kellogg, Sailesh Mittal, Jignesh M. Patel*, Karthik Ramasamy, Siddarth Taneja

@sanjeevrk, @challenger_nik, @Louis_Fumaosong, @vikkyrk, @cckellogg, @saileshmittal, @pateljm, @karthikz, @staneja

Twitter, Inc., *University of Wisconsin – Madison

Storm @Twitter

Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthik Ramasamy, Jignesh M. Patel*, Sanjeev Kulkarni, Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, Nikunj Bhagat, Sailesh Mittal, Dmitriy Ryaboy

@ankitoshniwal, @staneja, @amits, @karthikz, @pateljm, @sanjeevrk, @jason_j, @krishnagade, @Louis_Fumaosong, @jakedonham, @challenger_nik, @saileshmittal, @squarecog

Twitter, Inc., *University of Wisconsin – Madison

Streaming@Twitter

Maosong Fu, Sailesh Mittal, Vikas Kedigehalli, Karthik Ramasamy, Michael Barry, Andrew Jorgensen, Christopher Kellogg, Neng Lu, Bill Graham, Jingwei Wu

Twitter, Inc.

SLIDE 28

#ThankYou

FOR LISTENING

SLIDE 29

QUESTIONS and ANSWERS

Go ahead. Ask away.

SLIDE 30

HERON LOAD SHEDDING