Challenges in Data Stream Processing

Valeria Cardellini
Course on Systems and Architectures for Big Data (Sistemi e Architetture per Big Data), A.A. 2019/2020
Laurea Magistrale in Ingegneria Informatica (Master's degree in Computer Engineering)
Macroarea di Ingegneria, Dipartimento di Ingegneria Civile e Ingegneria Informatica



Challenges

  • Let’s consider how to tackle the following challenges in DSP systems
  • 1. Optimize the DSP application
  • 2. Place the DSP operators on the underlying computing infrastructure
  • 3. Manage load variations
  • 4. Self-adapt at run-time
  • 5. Stateful operators
  • 6. Fault tolerance

Valeria Cardellini - SABD 2019/2020 1


Challenge 1: Optimize the DSP application

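  • Apply some transformation to the streaming graph
    – At design time or at run-time
  • Operator reordering
    – To avoid unnecessary data transfers
  • Redundancy elimination

[Figure: example operator graphs (operators A, B, C, D) before and after operator reordering and redundancy elimination]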
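Operator reordering can be illustrated with a small sketch. It assumes a selective filter whose predicate reads only fields that the map leaves untouched, so the two operators commute. The operator names (`expensive_enrich`, `is_hot`) and the data are hypothetical and only serve the example.

```python
# Hypothetical operators, used only to illustrate reordering.
def expensive_enrich(t):
    """Map operator: adds a derived field and counts its own invocations."""
    expensive_enrich.calls += 1
    return {**t, "fahrenheit": t["celsius"] * 9 / 5 + 32}

expensive_enrich.calls = 0

def is_hot(t):
    """Filter predicate: reads only 'celsius', so it commutes with the map."""
    return t["celsius"] > 30

def map_then_filter(stream):
    # Original order: every tuple goes through the expensive map.
    return [u for u in (expensive_enrich(t) for t in stream) if is_hot(u)]

def filter_then_map(stream):
    # Reordered: the filter runs first, so fewer tuples reach the map.
    return [expensive_enrich(t) for t in stream if is_hot(t)]

stream = [{"celsius": c} for c in (10, 35, 20, 40, 15)]

expensive_enrich.calls = 0
out_a = map_then_filter(stream)
calls_a = expensive_enrich.calls

expensive_enrich.calls = 0
out_b = filter_then_map(stream)
calls_b = expensive_enrich.calls
```

Both orders produce the same output, but the reordered pipeline invokes the map only on the tuples that survive the filter. In a distributed deployment the same idea reduces the data transferred between operators.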


Challenge 1: Optimize the DSP application

  • Operator separation
  • Operator fusion

[Figure: operator separation (A split into A1 and A2) and operator fusion (A and B merged into AB)]


Challenge 1: Optimize the DSP application

  • Operator fission (i.e., data parallelism)


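[Figure: operator fission, with a Split stage feeding parallel replicas of operator A and a Merge stage combining their outputs]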
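A minimal sketch of fission, assuming a keyed stream and hash-based splitting. The replicas are run one after the other here; a real DSP system would run them concurrently. The function names and the per-key sum are illustrative choices, not taken from a specific framework.

```python
import zlib
from collections import Counter

def split(stream, n_replicas):
    """Split: route each (key, value) tuple to a replica by hashing its key."""
    partitions = [[] for _ in range(n_replicas)]
    for key, value in stream:
        idx = zlib.crc32(key.encode()) % n_replicas  # stable across runs
        partitions[idx].append((key, value))
    return partitions

def replica_a(partition):
    """One replica of operator A: a per-key sum (illustrative operator)."""
    sums = Counter()
    for key, value in partition:
        sums[key] += value
    return sums

def merge(results):
    """Merge: combine replica outputs; keys are disjoint across replicas."""
    merged = Counter()
    for r in results:
        merged.update(r)
    return merged

stream = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
parallel = merge(replica_a(p) for p in split(stream, n_replicas=3))
sequential = replica_a(stream)
```

Because all tuples with the same key reach the same replica, the merged result equals the result of the single, non-replicated operator.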

At the streaming system layer

  • The previous challenge is addressed at the DSP application layer
  • What about the streaming system layer?
  • Two main classes of solutions to improve performance (e.g., to control application latency) at the streaming system layer
    1. Place the DSP operators
    2. Manage load variations


Challenge 2: Place DSP operators

[Figure: placement of a DSP application graph (operators 1 to 6; streams (1,2), (2,3), (2,4), (3,5), (4,5), (4,6)) onto computing nodes]

  • Determine, within a set of available distributed computing nodes, those nodes that should host and execute each operator instance of a DSP application

Challenge 2: Place DSP operators

  • Operator placement decision: a complex problem
    – Trade communication cost against resource utilization
  • When?
    – Initial (static) operator placement
      • Can be more expensive and comprehensive
    – Can also be at run-time
      • Place again all the operators or only a subset
      • Requires self-adaptation
      • We will focus on this issue later
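To make the trade-off concrete, here is a sketch of a simple greedy placement heuristic. It is not the placement policy used in the course or in any specific system. It assumes each operator has a resource demand, each node a capacity, each stream a data rate, and each node pair a network latency; all names and numbers are illustrative.

```python
def greedy_placement(operators, edges, capacity, latency):
    """Place operators in the given order on the feasible node that adds
    the least communication cost towards already-placed upstream operators."""
    free = dict(capacity)
    placement = {}
    for op, demand in operators.items():
        best_node, best_cost = None, None
        for node in free:
            if free[node] < demand:
                continue  # node cannot host this operator
            cost = sum(rate * latency[placement[u]][node]
                       for (u, v), rate in edges.items()
                       if v == op and u in placement)
            if best_cost is None or cost < best_cost:
                best_node, best_cost = node, cost
        if best_node is None:
            raise RuntimeError(f"no node can host operator {op}")
        placement[op] = best_node
        free[best_node] -= demand
    return placement

def communication_cost(placement, edges, latency):
    """Sum of (data rate x latency) over all streams."""
    return sum(rate * latency[placement[u]][placement[v]]
               for (u, v), rate in edges.items())

# Illustrative six-operator graph; every stream has data rate 10.
operators = {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1}          # operator -> demand
edges = {(1, 2): 10, (2, 3): 10, (2, 4): 10,
         (3, 5): 10, (4, 5): 10, (4, 6): 10}              # stream -> rate
capacity = {"n1": 3, "n2": 3}                              # node -> capacity
latency = {"n1": {"n1": 0, "n2": 5}, "n2": {"n1": 5, "n2": 0}}

placement = greedy_placement(operators, edges, capacity, latency)
total_cost = communication_cost(placement, edges, latency)
```

The heuristic co-locates communicating operators until a node is full, then spills to the other node. It is cheap but myopic, which is why more comprehensive (and more expensive) formulations are used for the initial placement.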


Challenge 3: Manage load variations

  • Typical stream processing workloads are:
    – High volume and high rate
    – Bursty, with workload spikes not known in advance
      • Twitter in 2013: average rate of 5,700 tweets per second
      • … but a significant peak of 144,000 tweets per second


Challenge 3: Manage load variations

  • Some solutions:
    – Admission control
    – Static reservation
      • Reserve specific resources in advance
      • Cons: over-provisioning and cost increase
    – Apply dynamic techniques such as load shedding
      • Selectively drop tuples at strategic points (e.g., when CPU usage exceeds a specific limit)
      • Cons: sacrifice accuracy and completeness

[Figure: load shedding, with a Shedder inserted in front of operator A]
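A minimal sketch of a random load shedder, assuming the CPU usage is supplied by some monitoring component (here it is just a parameter). The class name, the 0.8 limit and the 0.5 drop probability are illustrative assumptions.

```python
import random

class LoadShedder:
    """Drops tuples at random while the observed CPU usage exceeds a limit."""

    def __init__(self, cpu_limit, drop_probability, seed=0):
        self.cpu_limit = cpu_limit
        self.drop_probability = drop_probability
        self.rng = random.Random(seed)  # seeded only to make the example repeatable
        self.dropped = 0

    def admit(self, cpu_usage):
        """Return True if the tuple should be forwarded downstream."""
        if cpu_usage > self.cpu_limit and self.rng.random() < self.drop_probability:
            self.dropped += 1
            return False
        return True

def run(shedder, tuples, cpu_usage):
    """Forward only the admitted tuples to the downstream operator."""
    return [t for t in tuples if shedder.admit(cpu_usage)]

shedder = LoadShedder(cpu_limit=0.8, drop_probability=0.5)
normal = run(shedder, range(1000), cpu_usage=0.5)       # below the limit
overloaded = run(shedder, range(1000), cpu_usage=0.95)  # above the limit
```

Under normal load every tuple is forwarded; under overload roughly half are dropped, which is exactly the accuracy and completeness cost noted above. Smarter shedders drop tuples according to their estimated importance rather than at random.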


Challenge 3: Manage load variations

  • Some solutions (continued):
    – Use adaptive rate allocation
      • E.g., backpressure: the upstream operator that precedes the bottleneck operator stores data in an internal buffer to reduce the pressure; backpressure recursively propagates up to the source operators
    – Redistribute load, e.g., determine a new operator placement and relocate operators on computing nodes
      • Cons: available resources could be insufficient
  • What else?
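Backpressure over a single hop can be sketched with a bounded buffer: when the buffer between a source and a slow operator is full, the source blocks and is thereby slowed to the bottleneck's pace. This is a simplified illustration using Python threads, not how a specific DSP engine implements it; the buffer size and delays are arbitrary.

```python
import queue
import threading
import time

def source(out_q, n_items):
    for i in range(n_items):
        out_q.put(i)      # blocks while the buffer is full: this is the backpressure
    out_q.put(None)       # end-of-stream marker

def slow_operator(in_q, results, max_seen):
    while True:
        max_seen[0] = max(max_seen[0], in_q.qsize())
        item = in_q.get()
        if item is None:
            break
        time.sleep(0.001)  # simulate a bottleneck operator
        results.append(item * 2)

buffer = queue.Queue(maxsize=4)   # bounded internal buffer
results, max_seen = [], [0]
producer = threading.Thread(target=source, args=(buffer, 50))
consumer = threading.Thread(target=slow_operator, args=(buffer, results, max_seen))
producer.start()
consumer.start()
producer.join()
consumer.join()
```

No tuple is lost and the buffer never grows beyond its bound. With a chain of operators connected by bounded buffers, a full buffer blocks its producer, whose own input buffer then fills, so the slowdown propagates back to the sources.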


Exploit elasticity

  • Another solution:
    – Detect the bottleneck and solve it by exploiting elasticity: acquire and release resources when needed
    – How?
      • By hand: possible, but cumbersome
      • So what? MAPE!


Elastic data stream processing

  • Where?
    – At application layer (i.e., data parallelism)
      • I.e., apply the SPMD paradigm: concurrent execution of multiple replicas of the same operator on different data portions
      • Scale out (in) operators by adding (removing) operator replicas


Elastic data stream processing

  • Where?
    – At infrastructure layer
      • Scale computing resources horizontally (containers, virtual machines, physical machines)
      • Also scale computing resources vertically (containers, virtual machines)


Elastic stream processing

  • When and how to scale?
    – Open issues
    – Some simple examples:
      • When: threshold-based (like AWS Auto Scaling)
      • How: add/remove one operator replica at a time
      • Where: determine the location of the new replica randomly (or in a round-robin fashion)
  • Be careful: elasticity overhead is not zero!
    – In most streaming systems, a new placement decision must be run to take the new replicas into account
    – Dynamic scaling impacts stateful operators
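The simple policy above (threshold-based, one replica at a time, round-robin location) can be sketched as follows. The thresholds (0.8 and 0.3), the per-replica capacity of 100 tuples/s and the load trace are assumptions made for the example.

```python
def scale_decision(utilization, replicas, upper=0.8, lower=0.3,
                   min_replicas=1, max_replicas=8):
    """When/how: add or remove ONE replica when a threshold is crossed."""
    if utilization > upper and replicas < max_replicas:
        return replicas + 1
    if utilization < lower and replicas > min_replicas:
        return replicas - 1
    return replicas

nodes = ["n1", "n2", "n3"]       # candidate locations for new replicas
capacity_per_replica = 100       # tuples/s one replica can sustain (assumed)
replicas, added = 1, 0
history, locations = [], []

for load in [100, 300, 500, 500, 200, 50]:   # tuples/s per control interval
    utilization = load / (replicas * capacity_per_replica)
    new_replicas = scale_decision(utilization, replicas)
    if new_replicas > replicas:
        # Where: round-robin choice of the node hosting the new replica.
        locations.append(nodes[added % len(nodes)])
        added += 1
    replicas = new_replicas
    history.append(replicas)
```

On this trace the replica count evolves as 2, 3, 4, 5, 5, 4. The policy needs several control intervals to catch up with the spike because it adds one replica at a time, and it ignores the reconfiguration overhead entirely, which is one reason why "when and how to scale" remains an open issue.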


Challenge 4: Self-adapt at run-time

  • Many factors may change at run-time, e.g.,
    – Load variations, QoS of computing resources, cost of computing resources (e.g., due to dynamic pricing schemes), network characteristics, node mobility, …
  • How to adapt the DSP application when changes occur?
    – Enrich DSP systems with run-time adaptation capabilities
  • Which adaptation actions?
    – Migrate the operators to different computing nodes
    – Scale out/in the number of operator instances


Self-adaptive deployment

  • MAPE (Monitor, Analyze, Plan and Execute)
  • Plan phase: how to reconfigure the DSP application deployment
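A skeleton of one MAPE iteration may help fix the roles of the four phases. The system model is a toy (latency inversely proportional to the number of replicas) and every name and number is an assumption for illustration, not part of any real framework.

```python
def monitor(system):
    """Monitor: collect the metrics of interest."""
    return {"latency_ms": system["latency_ms"], "replicas": system["replicas"]}

def analyze(metrics, latency_target_ms):
    """Analyze: decide whether an adaptation is needed."""
    return metrics["latency_ms"] > latency_target_ms

def plan(metrics, adaptation_needed):
    """Plan: choose how to reconfigure the deployment."""
    if not adaptation_needed:
        return None
    return {"action": "scale_out", "replicas": metrics["replicas"] + 1}

def execute(system, action):
    """Execute: enact the plan on the (toy) system."""
    if action is not None:
        # Toy model: latency is inversely proportional to the replica count.
        work = system["latency_ms"] * system["replicas"]
        system["replicas"] = action["replicas"]
        system["latency_ms"] = work / system["replicas"]

def mape_iteration(system, latency_target_ms):
    metrics = monitor(system)
    needed = analyze(metrics, latency_target_ms)
    action = plan(metrics, needed)
    execute(system, action)
    return action

system = {"latency_ms": 300.0, "replicas": 1}
actions = [mape_iteration(system, latency_target_ms=100.0) for _ in range(5)]
```

The loop scales out twice (300 ms, then 150 ms, then 100 ms) and afterwards plans no further action because the target is met. In a real system the Plan phase is where placement and scaling policies plug in.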


Distributed Storm

  • We developed an extension of Storm, named Distributed Storm
  • Goals: to provide
    – Distributed monitoring
    – Distributed placement
    – Adaptation capabilities
  • Where: geo-distributed environment
  • Code available on GitHub: matnar.github.io/uniroma2-storm/
  • V. Cardellini, V. Grassi, F. Lo Presti, M. Nardelli, “Distributed QoS-aware scheduling in Storm”, ACM DEBS 2015.


Distributed Storm architecture


Distributed Storm: monitoring

  • QoSMonitor (for each worker node)
    – Estimate network latencies
      • Use a network coordinate system
      • Vivaldi’s algorithm: decentralized and gossip-based
    – Monitor QoS attributes
      • Node utilization and availability
  • Worker Monitor (for each worker process)
    – Monitor exchanged data rate among the operators
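The core idea of a network coordinate system can be sketched with a simplified Vivaldi-style update: each node moves its coordinate so that the Euclidean distance to a peer gets closer to the measured round-trip time. This sketch uses a constant step size `delta`; the full algorithm adapts the step to the estimated error of each node, which is omitted here. It is not the code used in Distributed Storm.

```python
import math

def distance(a, b):
    """Euclidean distance between two coordinates."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def vivaldi_update(xi, xj, rtt, delta=0.25):
    """Move xi along the line through xj so that distance(xi, xj)
    gets closer to the measured rtt (simplified, constant step)."""
    dist = distance(xi, xj)
    if dist == 0:
        # Coincident coordinates: pick an arbitrary direction.
        direction = [1.0] + [0.0] * (len(xi) - 1)
    else:
        direction = [(a - b) / dist for a, b in zip(xi, xj)]
    error = rtt - dist   # positive: nodes are too close in coordinate space
    return [a + delta * error * u for a, u in zip(xi, direction)]

# Two nodes whose coordinates are 5 apart, while the measured RTT is 10.
a, b = [0.0, 0.0], [3.0, 4.0]
measured_rtt = 10.0
first_step = vivaldi_update(a, b, measured_rtt)

# Gossip-style rounds: each node updates against the other in turn.
for _ in range(20):
    a = vivaldi_update(a, b, measured_rtt)
    b = vivaldi_update(b, a, measured_rtt)
final_error = abs(measured_rtt - distance(a, b))
```

After one update the coordinate distance grows from 5 to 6.25, and each further update removes a quarter of the remaining error. Once coordinates have converged, the latency between any two nodes can be estimated from their coordinates without a direct measurement.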


Distributed Storm: performance

[Figure: application performance with a load spike on a subset of nodes; annotated value: ~50%]


But distributed placement suffers from lack of coordination

  • We compared the fully distributed placement heuristic implemented in Distributed Storm (Pietzuch et al.) with our optimal placement policy (ODP)
  • Activations of the fully distributed algorithm lead to performance degradation


Reconfiguration challenges

  • Reconfiguring the deployment has a non-negligible cost
  • Can negatively affect application performance in the short term
    – Application freezing times caused by operator migration and scaling, especially for stateful operators
  • Solution:
    – Perform reconfiguration only when needed
    – Take into account the overhead of migrating and scaling the operators


Challenge 5: Stateful operators

  • State complicates things…
    1. Dynamic scaling: impacts state
    2. Operator re-placement: impacts state
    3. Recovery from failure: loss of state!

Approaches for stateful migration

  • Most streaming systems do not support stateful processing and migration (e.g., Storm)
    – Developers need to manage state
    – Typically combined with an external system to store state
    – Increased design complexity
  • Recent interest in research prototypes and production-ready streaming systems
    – E.g., Heron, Spark Streaming
  • Requirements for stateful operator migration
    – Safety (i.e., to preserve operation consistency)
    – Application transparency
    – Minimal footprint


Issues with stateful operators

  • Require mechanisms to:

– Migrate stateful operators

  • Pause-and-resume approach
  • Parallel track approach

– Partition streams and load balance among replicas


Stateful operator migration

  • Pause-and-resume approach
    1. Stop the migrating task
    2. Save state
    3. Terminate the migrating task and start it on the new node
    4. Restore state
    5. Resume stream processing
  • Application latency peak during migration
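The pause-and-resume steps can be sketched on a toy stateful operator. The operator, the use of `pickle` for the state snapshot, and the `pending` list of tuples buffered during the pause are all assumptions made for illustration; a real system would persist the snapshot to external storage and coordinate with upstream operators.

```python
import pickle

class CountingOperator:
    """Toy stateful operator: counts tuples per key."""

    def __init__(self, node):
        self.node = node
        self.counts = {}
        self.paused = False

    def process(self, key):
        if self.paused:
            raise RuntimeError("operator is paused")
        self.counts[key] = self.counts.get(key, 0) + 1

def migrate(old_task, new_node, pending):
    """Pause-and-resume migration of a task to new_node."""
    old_task.paused = True                      # 1. stop the migrating task
    snapshot = pickle.dumps(old_task.counts)    # 2. save state
    new_task = CountingOperator(new_node)       # 3. old task ends, new one starts
    new_task.counts = pickle.loads(snapshot)    # 4. restore state
    for key in pending:                         # 5. resume: drain buffered tuples
        new_task.process(key)
    return new_task

task = CountingOperator("node-1")
for key in ["a", "b", "a", "c"]:
    task.process(key)

# Tuples that arrived while the operator was paused (assumed to be buffered upstream).
pending = ["a", "b"]
new_task = migrate(task, "node-2", pending)
```

The tuples in `pending` wait for the whole save/restart/restore sequence before being processed, which is where the latency peak during migration comes from.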


Stateful operator migration

  • Parallel track approach
    – Old and new operator instances run concurrently until their state is synchronized
    – No latency peak
    – Requires enhanced mechanisms for synchronization


Issues for stateful migration: stream partitioning

  • How to identify the portion of state to migrate? Possible approaches:
    – Expose an API to let the user manually manage the state
    – Support only partitioned stateful operators
      • Partitioned stateful operators store independent state for each sub-stream identified by a partitioning key
      • Automatically determine, on the basis of a partitioning key, the optimal number of state partitions to be used and migrated
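A sketch of partitioned state, assuming the state is split into a fixed number of partitions by hashing the partitioning key. Migrating a partition then moves only the keys that hash to it. The class and function names, and the choice of four partitions, are illustrative.

```python
import zlib

NUM_PARTITIONS = 4  # assumed fixed number of state partitions

def partition_of(key):
    """Map a partitioning key to a state partition (stable hash)."""
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

class Replica:
    """Operator replica holding independent state per partition and key."""

    def __init__(self):
        self.state = {}  # partition id -> {key: value}

    def update(self, key, value):
        part = self.state.setdefault(partition_of(key), {})
        part[key] = part.get(key, 0) + value

def migrate_partition(src, dst, partition_id):
    """Move one state partition from src to dst; other partitions are untouched."""
    dst.state[partition_id] = src.state.pop(partition_id, {})

r1, r2 = Replica(), Replica()
for i in range(20):
    r1.update(f"user-{i}", 1)

p = partition_of("user-1")
moved_before = dict(r1.state[p])
keys_before = sum(len(part) for part in r1.state.values())
migrate_partition(r1, r2, p)
keys_after = (sum(len(part) for part in r1.state.values())
              + sum(len(part) for part in r2.state.values()))
```

After the migration, tuples whose key hashes to partition `p` must be routed to the new replica. Because the state of each key is independent, no other partition is affected by the move.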


Issues for stateful migration: load balancing

  • How to balance the load among multiple stateful replicas?
  • Can use consistent hashing
  • Can use partial key grouping
    – Uses two hash functions, so that a key can be sent to two different replicas instead of one
  • Only available in research prototypes
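A sketch of the idea behind partial key grouping on a skewed stream. Each key has two candidate replicas and every tuple goes to the currently less loaded one. Two simplifications are assumed here: the router sees the exact load of each replica (a real source would rely on a local estimate), and the second candidate is forced to differ from the first so the example behaves deterministically, whereas the technique as described uses two independent hash functions.

```python
import hashlib
from collections import Counter

def _hash(key, salt, n):
    """Salted hash of a key into the range [0, n)."""
    digest = hashlib.md5(f"{salt}:{key}".encode()).hexdigest()
    return int(digest, 16) % n

class PartialKeyGrouping:
    def __init__(self, n_replicas):
        self.n = n_replicas
        self.load = [0] * n_replicas   # tuples sent to each replica so far

    def candidates(self, key):
        c1 = _hash(key, "h1", self.n)
        # Assumption for this sketch: offset in 1..n-1 keeps c2 distinct from c1.
        c2 = (c1 + 1 + _hash(key, "h2", self.n - 1)) % self.n
        return c1, c2

    def route(self, key):
        """Send the tuple to the less loaded of the key's two candidates."""
        c1, c2 = self.candidates(key)
        target = c1 if self.load[c1] <= self.load[c2] else c2
        self.load[target] += 1
        return target

# Skewed stream: one hot key accounts for 90% of the tuples.
stream = ["hot"] * 900 + [f"k{i}" for i in range(100)]

# Plain key grouping: a single hash function, one replica per key.
kg_load = Counter(_hash(k, "h1", 4) for k in stream)

pkg = PartialKeyGrouping(4)
replicas_per_key = {}
for k in stream:
    replicas_per_key.setdefault(k, set()).add(pkg.route(k))
```

With plain key grouping the replica owning the hot key receives at least 900 tuples, while with partial key grouping the hot key is split over two replicas. The price is that the state of a key may live on two replicas, so a downstream step has to combine the partial results.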


Elastic stateful migration in Storm

  • We developed mechanisms for elasticity and stateful migration in Storm

[Figure: architecture showing Nimbus, ZooKeeper, the scheduler, MigrationNotifier and ElasticityManager, and four Supervisor nodes, each with worker slots, worker processes and a DDS, connected through the network]


  • V. Cardellini, M. Nardelli, D. Luzi, "Elastic stateful stream processing in Storm", HPCS 2016.

Elastic stateful migration in Storm

  • Migration protocol based on the pause-and-resume approach to relocate the operator internal state on a different node
  • Elasticity policy at the application level
    – Simple threshold-based policy

[Figure: migration protocol timeline, involving the DDS: MIGRATION NOTIFIED; MIGRATION MODE, SAVE STATE; first synchronization barrier (the migrating task can be terminated); new task in MIGRATION MODE, RESTORE STATE (if any); second synchronization barrier (streams are resumed); OPERATIONAL MODE]


Challenge 6: Guarantee fault tolerance

  • DSP applications run for a long time, so failures are unavoidable
  • Possible solutions:
    – Active replication
    – Check-pointing
    – Replay logs
  • Having different trade-offs between run-time cost in absence of failures and recovery cost
  • Large scale complicates things…
    – Network partitions and CAP theorem
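Check-pointing and log replay are often combined. A minimal sketch follows, assuming the input log and the last checkpoint survive the failure (e.g., on durable storage) while the in-memory state does not. The class name and the checkpoint interval are illustrative.

```python
import copy

class CheckpointedCounter:
    """Toy stateful operator with periodic checkpoints and log replay."""

    def __init__(self, checkpoint_every):
        self.checkpoint_every = checkpoint_every
        self.state = {}
        self.offset = 0              # number of log entries already applied
        self.checkpoint = ({}, 0)    # (state snapshot, offset), assumed durable

    def process(self, key):
        self.state[key] = self.state.get(key, 0) + 1
        self.offset += 1
        if self.offset % self.checkpoint_every == 0:
            self.checkpoint = (copy.deepcopy(self.state), self.offset)

    def recover(self, log):
        """Restore the last checkpoint, then replay the log suffix."""
        state, offset = self.checkpoint
        self.state, self.offset = copy.deepcopy(state), offset
        replayed = 0
        for key in log[offset:]:
            self.process(key)
            replayed += 1
        return replayed

log = list("abcabcab")               # durable input log (assumed)
op = CheckpointedCounter(checkpoint_every=3)
for key in log:
    op.process(key)
expected = dict(op.state)

# Simulated failure: the volatile state is lost, the checkpoint is not.
op.state, op.offset = {}, 0
replayed = op.recover(log)
```

The checkpoint interval sets the trade-off: frequent checkpoints cost more during normal operation but leave fewer tuples to replay after a failure (here only the last two).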


References

  • M. Hirzel, R. Soulé, S. Schneider, B. Gedik, R. Grimm, “A catalog of stream processing optimizations”, ACM Comput. Surv., 2014. http://bit.ly/2rtLljf
  • T. Heinze, L. Aniello, L. Querzoni, Z. Jerzak, “Cloud-based data stream processing”, Proc. ACM DEBS 2014. http://bit.ly/2sMzxMM
  • M. de Assuncao, A. da Silva Veith, R. Buyya, “Distributed data stream processing and edge computing: A survey on resource elasticity and future directions”, J. of Network and Computer Applications, 2018. https://hal.inria.fr/hal-01653842/document
  • V. Cardellini, V. Grassi, F. Lo Presti, M. Nardelli, “Distributed QoS-aware scheduling in Storm”, Proc. ACM DEBS 2015.
  • V. Cardellini, M. Nardelli, D. Luzi, “Elastic stateful stream processing in Storm”, Proc. HPCS 2016.
