  1. Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron

  2. TALK OUTLINE: I. Motivation, II. Heron Overview, III. Heron Performance, IV. Operational Experiences

  3. OVERVIEW

  4. TWITTER IS REAL TIME: real time trends (emerging break out trends in Twitter, in the form of #hashtags), real time conversations (real time sports conversations related to a topic, e.g. a recent goal or touchdown), real time recommendations (real time product recommendations based on your behavior & profile), and real time search (real time search of tweets). ANALYZING BILLIONS OF EVENTS IN REAL TIME IS A CHALLENGE!

  5. TWITTER STORM: streaming platform for analyzing real-time data as it arrives, so you can react to data as it happens. Guaranteed message processing, horizontal scalability, robust fault tolerance, concise code (focus on logic).

  6. STORM TERMINOLOGY. TOPOLOGY: directed acyclic graph; vertices = computation, edges = streams of data tuples. SPOUTS: sources of data tuples for the topology (examples: Event Bus/Kafka/Kestrel/MySQL/Postgres). BOLTS: process incoming tuples and emit outgoing tuples (examples: filtering/aggregation/join/arbitrary function). A bolt sketch follows below.
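
To make the spout/bolt contract concrete, here is a minimal bolt sketch against the Storm API of that era (backtype.storm packages). ParseTweetBolt and its field names are illustrative, borrowed from the word count example on the following slides, not Twitter's actual code:

    import backtype.storm.topology.BasicOutputCollector;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseBasicBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    // A bolt processes incoming tuples and emits outgoing tuples.
    public class ParseTweetBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            // Split the tweet text and emit one tuple per word.
            for (String word : tuple.getStringByField("tweet").split("\\s+")) {
                collector.emit(new Values(word));
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }
    }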

  7. STORM TOPOLOGY [diagram: SPOUT 1 and SPOUT 2 feed BOLT 1, BOLT 2 and BOLT 3, which in turn feed BOLT 4 and BOLT 5]

  8. WORD COUNT TOPOLOGY (logical plan): live stream of Tweets → TWEET SPOUT → PARSE TWEET BOLT → WORD COUNT BOLT

  9. WORD COUNT TOPOLOGY: each component runs as multiple parallel tasks (tweet spout tasks, parse tweet bolt tasks, word count bolt tasks). When a parse tweet bolt task emits a tuple, which word count bolt task should it send it to?

  10. STREAM GROUPINGS. SHUFFLE GROUPING: random distribution of tuples. FIELDS GROUPING: group tuples by a field or multiple fields. ALL GROUPING: replicates tuples to all tasks. GLOBAL GROUPING: sends the entire stream to one task.

  11. WORD COUNT TOPOLOGY: TWEET SPOUT tasks →(shuffle grouping)→ PARSE TWEET BOLT tasks →(fields grouping)→ WORD COUNT BOLT tasks. A wiring sketch follows below.
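
A minimal wiring sketch of that word count topology, again in the Storm-era API that Heron keeps compatible. TweetSpout and WordCountBolt are assumed companion classes, and the parallelism of 25 simply mirrors experiment #1 later in the deck:

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.topology.TopologyBuilder;
    import backtype.storm.tuple.Fields;

    public class WordCountTopology {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();

            builder.setSpout("tweet-spout", new TweetSpout(), 25);

            // Shuffle grouping: any parser task may receive any tweet.
            builder.setBolt("parse-tweet-bolt", new ParseTweetBolt(), 25)
                   .shuffleGrouping("tweet-spout");

            // Fields grouping on "word": the same word always lands on the
            // same counting task, keeping per-word counts consistent.
            builder.setBolt("word-count-bolt", new WordCountBolt(), 25)
                   .fieldsGrouping("parse-tweet-bolt", new Fields("word"));

            StormSubmitter.submitTopology("word-count", new Config(),
                    builder.createTopology());
        }
    }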

  12. MOTIVATION

  13. STORM ARCHITECTURE: topology submission goes to Nimbus (master node), which writes assignment maps to the ZK cluster; supervisors on slave nodes run workers W1-W4. Issues: Nimbus bundles multiple functions (scheduling/monitoring), is a single point of failure, provides no resource reservation or isolation, and suffers storage contention.

  14. STORM WORKER: a JVM process hosts multiple executors (EXECUTOR 1, EXECUTOR 2), each running multiple tasks (TASK 1-5). Complex hierarchy; hard to debug; difficult to tune.

  15. DATA FLOW IN STORM WORKERS [diagram: user logic threads each read from their own in queue and funnel into a shared out queue; a send thread, a global receive thread and a global send thread move tuples between the outgoing message buffer and the kernel's TCP send/receive buffers]. Issues: queue contention; multiple languages in the data path.

  16. OVERLOADED ZOOKEEPER: scaled-up Storm [diagram: workers on nodes S1-S3 all write directly to the zk cluster]. Handled up to 1200 workers per cluster.

  17. OVERLOADED ZOOKEEPER: analyzing zookeeper traffic. Kafka spout: 67% (offset/partition is written every 2 secs). Storm runtime: 33% (workers write heartbeats every 3 secs).

  18. OVERLOADED ZOOKEEPER: heartbeat daemons [diagram: workers on S1-S3 send heartbeats to per-node heartbeat daemons backed by KV stores instead of writing to zk]. Scales to 5000 workers per cluster.

  19. STORM - DEPLOYMENT: the storm cluster starts as a single shared pool of machines.

  20. STORM - DEPLOYMENT: joe's topology is carved out of the shared pool into its own isolated pool.

  21. STORM - DEPLOYMENT: jane's topology gets a second isolated pool.

  22. STORM - DEPLOYMENT: dave's topology gets a third isolated pool, further fragmenting the cluster.

  23. STORM ISSUES. Lack of back pressure: drops tuples unpredictably. Efficiency: a serialization program consumes 75 cores at 30% CPU; a topology consumes 600 cores at 20-30% CPU. No batching: tuple-oriented system with implicit batching by 0MQ.

  24. EVOLUTION OR REVOLUTION? Fix storm or develop a new system? Fixing Storm: the fundamental issues require extensive rewriting (several queues for moving data); inflexible and requires a longer development cycle. Using existing open source solutions: issues working at scale / lacks required performance; incompatible API and long migration process.

  25. HERON

  26. HERON DESIGN GOALS. Fully API compatible with Storm: directed acyclic graph of topologies, spouts and bolts. Task isolation: ease of debuggability, resource isolation and profiling. Use of mainstream languages: C++/Java/Python.

  27. HERON ARCHITECTURE [diagram: topology submission goes to the scheduler, which runs Topology 1 through Topology N]

  28. TOPOLOGY ARCHITECTURE [diagram: the Topology Master keeps the logical plan, physical plan and topology execution state in the ZK cluster; each container runs a Stream Manager, a Metrics Manager and instances I1-I4, and syncs the physical plan with the master]

  29. TOPOLOGY MASTER: solely responsible for the entire topology. Assigns roles, does monitoring, reports metrics.

  30. TOPOLOGY MASTER: stores the logical plan, physical plan and topology execution state, coordinating through the ZK cluster, which prevents multiple TMs from becoming masters and allows other processes to discover the TM (see the sketch below).
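
The slide doesn't spell out the mechanism, but a standard way to get both properties from ZooKeeper is a single ephemeral node per topology. The path and election scheme below are assumptions for illustration, using the stock ZooKeeper client API:

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    class TopologyMasterRegistration {
        // Returns true if this process became the TM for the topology.
        static boolean tryBecomeMaster(ZooKeeper zk, String topology,
                                       byte[] tmAddress) throws Exception {
            try {
                // An ephemeral node disappears if the TM dies, freeing the
                // slot; a second TM's create() fails, so two masters cannot
                // coexist. Other processes read the node to discover the TM.
                zk.create("/heron/tmasters/" + topology, tmAddress,
                          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
                return true;
            } catch (KeeperException.NodeExistsException e) {
                return false;
            }
        }
    }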

  31. STREAM MANAGER: the routing engine. Routes tuples, applies back pressure, manages acks.

  32. STREAM MANAGER [diagram: spout S1 and bolts B2, B3, B4 exchanging tuples]

  33. STREAM MANAGER [diagram: instances S1, B2, B3, B4 in four containers connect only through their local stream managers]. Direct instance-to-instance connections would be O(n²) in the number of instances; routing through stream managers cuts this to O(k²) in the number of containers. For example, 1000 instances spread over 10 containers need on the order of 100 stream-manager connections instead of roughly 1,000,000.

  34. STREAM MANAGER: tcp back pressure [diagram: when B3 falls behind, the TCP windows fill up and pressure propagates through the stream managers]. Problem: SLOWS UPSTREAM AND DOWNSTREAM INSTANCES.

  35. STREAM MANAGER: spout back pressure [diagram: when an instance falls behind, the stream managers clamp the spouts (S1) so no new data enters the topology]

  36. STREAM MANAGER: stage by stage back pressure [diagram: back pressure propagates stage by stage, from the slow bolt back toward the spouts]

  37. STREAM MANAGER: back pressure advantages. Predictability: tuple failures are more deterministic. Self adjusts: the topology goes as fast as its slowest component. A sketch of the spout back-pressure idea follows below.
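
A minimal sketch of the spout back-pressure idea: watch a slow instance's data-in queue, pause the local spouts once it passes a high water mark, and resume them once it drains below a low water mark. The thresholds, names and SpoutController interface are all assumptions, not Heron's actual code:

    import java.util.concurrent.BlockingQueue;

    class BackPressureValve {
        private static final int HIGH_WATER_MARK = 8_000; // assumed sizes,
        private static final int LOW_WATER_MARK  = 2_000; // not Heron's

        private boolean active = false;

        // Called periodically with an instance's data-in queue.
        void update(BlockingQueue<?> dataInQueue, SpoutController spouts) {
            int depth = dataInQueue.size();
            if (!active && depth > HIGH_WATER_MARK) {
                // Queue passed the high water mark: stop the local spouts
                // so no new data enters the topology.
                spouts.pause();
                active = true;
            } else if (active && depth < LOW_WATER_MARK) {
                // Queue drained below the low water mark: resume them.
                spouts.resume();
                active = false;
            }
        }
    }

    interface SpoutController {
        void pause();
        void resume();
    }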

  38. HERON INSTANCE: does the real work! Runs one task, exposes the API, collects metrics.

  39. HERON INSTANCE [diagram: a Gateway thread talks to the Stream Manager and Metrics Manager; a Task Execution thread runs the user code; they communicate through data-in, data-out and metrics-out queues]. A two-thread sketch follows below.
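
A rough sketch of that two-thread split using plain Java queues; all names and the polling gateway loop are illustrative, not Heron's actual classes:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    class HeronInstanceSketch {
        public static void main(String[] args) {
            BlockingQueue<Object> dataIn  = new ArrayBlockingQueue<>(1024);
            BlockingQueue<Object> dataOut = new ArrayBlockingQueue<>(1024);

            // Gateway thread: the only thread doing network I/O with the
            // stream manager (and, in Heron, the metrics manager).
            Thread gateway = new Thread(() -> {
                while (true) {
                    Object in = receiveFromStreamManager(); // stubbed I/O
                    if (in != null) dataIn.offer(in);
                    Object out = dataOut.poll();
                    if (out != null) sendToStreamManager(out);
                }
            });

            // Task execution thread: runs only the user's spout/bolt logic,
            // so it can be debugged and profiled in isolation.
            Thread taskExecution = new Thread(() -> {
                try {
                    while (true) {
                        dataOut.put(runUserLogic(dataIn.take()));
                    }
                } catch (InterruptedException e) { /* shut down */ }
            });

            gateway.start();
            taskExecution.start();
        }

        static Object receiveFromStreamManager() { return null; } // stub
        static void sendToStreamManager(Object t) {}              // stub
        static Object runUserLogic(Object t)     { return t; }    // stub
    }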

  40. OPERATIONAL EXPERIENCES

  41. HERON DEPLOYMENT [diagram: Topology 1 through Topology N run under the Aurora scheduler, coordinated through the ZK cluster; alongside sit Aurora services and observability services: Heron Web, Heron Tracker, Heron VIZ]

  42. HERON SAMPLE TOPOLOGIES

  43. SAMPLE TOPOLOGY DASHBOARD

  44. HERON @TWITTER: Storm is decommissioned. Large amounts of data produced every day, large clusters, several topologies deployed (from 1 stage to 10 stages), several billion messages every day, and a 3x reduction in cores and memory.

  45. HERON PERFORMANCE

  46. HERON PERFORMANCE: settings

      COMPONENTS           EXPT #1   EXPT #2   EXPT #3   EXPT #4
      Spout                25        100       200       300
      Bolt                 25        100       200       300
      # Heron containers   25        100       200       300
      # Storm workers      25        100       200       300

  47. HERON PERFORMANCE: word count topology, acknowledgements enabled [charts: throughput (million tuples/min) and latency (ms) vs spout parallelism of 25/100/200/500, Storm vs Heron]. Heron: 10-14x higher throughput, 5-15x lower latency.

  48. HERON PERFORMANCE: word count topology, CPU usage [chart: # cores used vs spout parallelism, Storm vs Heron]. Heron uses 2-3x fewer cores.

  49. HERON PERFORMANCE: throughput and CPU usage with no acknowledgements, word count topology [charts: million tuples/min and # cores used vs spout parallelism, Storm vs Heron]

  50. HERON EXPERIMENT: RTAC topology. CLIENT EVENT SPOUT →(shuffle grouping)→ DISTRIBUTOR BOLT →(fields grouping)→ USER COUNT BOLT →(fields grouping)→ AGGREGATOR BOLT. A wiring sketch follows below.
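
For reference, a wiring sketch of the RTAC topology under the same Storm-era API; the component classes, the parallelism of 25, and the grouping field "user" are assumptions read off the slide, not Twitter's code:

    import backtype.storm.topology.TopologyBuilder;
    import backtype.storm.tuple.Fields;

    public class RtacTopology {
        public static TopologyBuilder build() {
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("client-event-spout", new ClientEventSpout(), 25);
            builder.setBolt("distributor-bolt", new DistributorBolt(), 25)
                   .shuffleGrouping("client-event-spout");
            builder.setBolt("user-count-bolt", new UserCountBolt(), 25)
                   .fieldsGrouping("distributor-bolt", new Fields("user"));
            builder.setBolt("aggregator-bolt", new AggregatorBolt(), 25)
                   .fieldsGrouping("user-count-bolt", new Fields("user"));
            return builder;
        }
    }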

  51. HERON PERFORMANCE: CPU usage, RTAC topology [charts: # cores used with no acknowledgements and with acknowledgements enabled, Storm vs Heron]

  52. HERON PERFORMANCE: latency with acknowledgements enabled, RTAC topology [chart: latency (ms), Storm vs Heron]

  53. CURIOUS TO LEARN MORE…

      Twitter Heron: Stream Processing at Scale. Sanjeev Kulkarni, Nikunj Bhagat, Maosong Fu, Vikas Kedigehalli, Christopher Kellogg, Sailesh Mittal, Jignesh M. Patel*, Karthik Ramasamy, Siddarth Taneja (@sanjeevrk, @challenger_nik, @Louis_Fumaosong, @vikkyrk, @cckellogg, @saileshmittal, @pateljm, @karthikz, @staneja). Twitter, Inc., *University of Wisconsin - Madison.

      Storm @Twitter. Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthik Ramasamy, Jignesh M. Patel*, Sanjeev Kulkarni, Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, Nikunj Bhagat, Sailesh Mittal, Dmitriy Ryaboy (@ankitoshniwal, @staneja, @amits, @karthikz, @pateljm, @sanjeevrk, @jason_j, @krishnagade, @Louis_Fumaosong, @jakedonham, @challenger_nik, @saileshmittal, @squarecog). Twitter, Inc., *University of Wisconsin - Madison.
