SLIDE 1 An Experiment-Driven Performance Model of Stream Processing Operators in Fog Computing Environments
Hamidreza Arkian1, Guillaume Pierre1, Johan Tordsson2, Erik Elmroth2
1University of Rennes1/IRISA, France 2Elastisys AB, Sweden
SAC’20 - March 30-April 3, 2020 - Brno, Czech Republic
SLIDE 2
IoT-to-Cloud basic architecture
SLIDE 3
Cloud-based stream processing
Apache Flink
SLIDE 4
Challenges
Apache Flink
➢ Low throughput, low bandwidth, and high cost of the IoT-to-Cloud link
➢ Continuously generated streams of data at high rates
➢ Latency-sensitive applications
SLIDE 5
Fog-based stream processing
SLIDE 6
Stream processing in a Fog environment
[Figure: logical graph of a DSP application (Source → Operator 1 → Operators 2, 3, 4 → Sink) and its workflow execution model]
SLIDE 7
Stream processing in geo-distributed environments
[Figure: logical graph of the DSP application, its workflow execution model, and its deployment in a geo-distributed Fog environment, where Operator 2 is replicated into three replicas (Op2 Replica1, Replica2, Replica3) placed near the data sources]
SLIDE 8
Challenges
➢ Understanding the performance of a geo-distributed stream processing application is difficult.
➢ Any configuration decision can have a significant impact on performance.
SLIDE 9
Experimental setup
➢ Emulation of a real fog platform
- 32-core server ≈ 16 fog nodes (2 cores/node)
- Emulated network latencies
- Apache Flink
➢ Test application
- Input stream of 100,000 Tuple2 records
- The operator calls the Fibonacci function Fib(24) upon every processed record
➢ Performance metric: throughput
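The synthetic workload above can be sketched in standalone form. The real experiments run an Apache Flink job; this pure-Python version only mimics the per-record cost (function names and the sample size are illustrative, not from the paper):

```python
# Standalone sketch of the test workload: each record triggers a fixed
# CPU-bound Fib(24) computation, and throughput is records per second.
import time

def fib(n: int) -> int:
    """Naive recursive Fibonacci, used purely as a CPU-bound workload."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def process_stream(records, work=24):
    """Call Fib(work) for every (key, value) record and return the
    measured throughput in records per second."""
    start = time.perf_counter()
    for _key, _value in records:
        fib(work)  # fixed per-record computation cost
    elapsed = time.perf_counter() - start
    return len(records) / elapsed

# A small sample in place of the full 100,000-record input stream:
sample = [(i % 10, i) for i in range(20)]
rate = process_stream(sample)  # records/s on this machine
```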
SLIDE 10
Modeling operator replication
➢ n operator replicas should in principle process data n times faster than a single replica
➢ α represents the computation capacity of a single node
➢ We can determine the value of α based on one measurement
[Plot: measured throughput (Experiment) vs. model prediction (Model)]
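The replication model can be sketched as follows, assuming the linear form T(n) = α·n implied by the slide (function names are illustrative):

```python
# Replication model: with n identical replicas, predicted throughput is
# T(n) = alpha * n, where alpha is calibrated from a single measurement.

def calibrate_alpha(measured_throughput: float, replicas: int) -> float:
    """One measurement suffices: alpha = T(n0) / n0."""
    return measured_throughput / replicas

def predict_throughput(alpha: float, replicas: int) -> float:
    return alpha * replicas

# Example: one run with 2 replicas measured 1200 records/s.
alpha = calibrate_alpha(1200.0, 2)   # -> 600.0 records/s per replica
print(predict_throughput(alpha, 8))  # -> 4800.0
```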
SLIDE 11
Considering heterogeneous network delays
➢ Network delays between data sources and operator replicas slow down the whole system.
➢ When the network delays are heterogeneous, the dominating one is the greatest one (NDmax).
➢ γ represents the impact of network delays on overall performance.
➢ We can determine both α and γ based on two measurements.
[Plot: measured throughput (Experiment) vs. model prediction (Model)]
SLIDE 12
Improving the model’s accuracy
➢ Operator replication incurs some amount of parallelization inefficiency
➢ The speedup with n nodes is usually a little less than n
➢ β represents Flink's parallelization inefficiency
➢ We can determine α, β and γ based on three or more measurements
[Plot: measured throughput (Experiment) vs. model prediction (Model)]
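The slide names the three parameters but this transcript does not give the model's closed form. Purely as an illustration, the sketch below assumes a hypothetical form T(n, ND) = α·n^β / (1 + γ·NDmax) and calibrates all three parameters against three measurements by brute-force least squares; the functional form, names, and numbers are assumptions, not the paper's:

```python
# Hypothetical three-parameter model (illustrative form, not the paper's):
#   T(n, nd) = alpha * n**beta / (1 + gamma * nd)
# calibrated by least squares over a coarse parameter grid.

def predict(alpha, beta, gamma, n, nd):
    return alpha * n ** beta / (1 + gamma * nd)

def fit(measurements):
    """measurements: list of (n_replicas, nd_max, observed_throughput)."""
    best, best_err = None, float("inf")
    for a in range(100, 1001, 50):                    # alpha candidates
        for b in (0.80, 0.85, 0.90, 0.95, 1.0):       # beta candidates
            for g in (0.0, 0.005, 0.01, 0.02, 0.05):  # gamma candidates
                err = sum((predict(a, b, g, n, nd) - t) ** 2
                          for n, nd, t in measurements)
                if err < best_err:
                    best, best_err = (a, b, g), err
    return best

# Three synthetic measurements generated from alpha=600, beta=0.95, gamma=0.01:
obs = [(n, nd, predict(600, 0.95, 0.01, n, nd))
       for n, nd in [(1, 0), (4, 10), (8, 50)]]
alpha, beta, gamma = fit(obs)  # recovers the generating parameters
```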
SLIDE 13
Prediction accuracy
Accuracy metric: MAPE (Mean Absolute Percentage Error)
4 measurements, 2.0% accuracy
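A MAPE-style accuracy figure is computed from predictions and measurements as sketched below (the numbers are illustrative, not the paper's data):

```python
# MAPE = mean(|predicted - measured| / measured) * 100, in percent.

def mape(predicted, measured):
    return 100.0 * sum(abs(p - m) / m
                       for p, m in zip(predicted, measured)) / len(measured)

pred = [980.0, 2010.0, 3950.0]   # model predictions (records/s)
meas = [1000.0, 2000.0, 4000.0]  # measured throughputs (records/s)
print(round(mape(pred, meas), 2))  # -> 1.25
```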
SLIDE 14
What about modeling an entire (simple) workflow?
➢ The throughput of an entire workflow is determined by the slowest operator:
𝛲Workflow = min(𝛲Map+KeyBy, 𝛲Reduce)
[Plot: measured throughput (Experiment) vs. model prediction (Model) for the entire workflow]
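The bottleneck rule above is a one-liner in code: the pipeline sustains only the minimum of its operators' throughputs.

```python
# Workflow throughput is bounded by its slowest operator.

def workflow_throughput(operator_throughputs):
    return min(operator_throughputs)

# e.g. Map+KeyBy sustains 3200 rec/s while Reduce sustains 2500 rec/s:
print(workflow_throughput([3200.0, 2500.0]))  # -> 2500.0
```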
SLIDE 15
Can we reuse the parameters instead of multiple measurements?
➢ α cannot be reused because it is specific to the computation complexity of one operator.
➢ β and γ capture properties that are independent from the nature of the computation carried out by the operator.
➢ β and γ values of one operator's model might be reused for other operators' models.
Calibrated model for Operator 1: (α1, β1, γ1)    Uncalibrated model for Operator 2: (α2, β1, γ1)
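Parameter reuse can be sketched as follows, again under the hypothetical model form T(n, ND) = α·n^β / (1 + γ·NDmax) (an assumption of this sketch, not the paper's stated formula): β and γ calibrated for Operator 1 are carried over, so Operator 2 needs only one new measurement to recover its own α.

```python
# Reuse beta and gamma from Operator 1; recalibrate only alpha for
# Operator 2 from a single measurement (illustrative model form).

def predict(alpha, beta, gamma, n, nd):
    return alpha * n ** beta / (1 + gamma * nd)

def recalibrate_alpha(measured_t, n, nd, beta, gamma):
    """Invert the model at one measured point to recover alpha."""
    return measured_t * (1 + gamma * nd) / n ** beta

# Operator 1 was fully calibrated:
beta1, gamma1 = 0.95, 0.01
# One measurement of Operator 2: 2 replicas, NDmax = 10 ms, 900 rec/s.
alpha2 = recalibrate_alpha(900.0, 2, 10, beta1, gamma1)
print(predict(alpha2, beta1, gamma1, 2, 10))  # reproduces ~900.0
```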
SLIDE 16
Conclusions
➢ Heterogeneous network characteristics make it difficult to understand the performance of stream processing engines in geo-distributed environments.
➢ We proposed a predictive performance model for Apache Flink operators, backed by experimental measurements and evaluations.
➢ The model's predictions are accurate within ±2% of the actual values.
Hamidreza Arkian hamidreza.arkian@irisa.fr
Acknowledgment
This work is part of a project that has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 765452. The information and views set out in this publication are those of the author(s) and do not necessarily reflect the official opinion of the European Union. Neither the European Union institutions and bodies nor any person acting on their behalf may be held responsible for the use which may be made of the information contained therein.
Training the next generation of European Fog computing experts http://www.fogguru.eu/