Modeling Big Data Systems by Extending the Palladio Component Model

  1. Modeling Big Data Systems by Extending the Palladio Component Model
     6th Symposium on Software Performance (SSP) 2015, München, 2015-11-06
     Johannes Kroß (1), Andreas Brunnert (1), Helmut Krcmar (2)
     (1) fortiss GmbH, affiliated institute of Technische Universität München
     (2) Technische Universität München

  2. Agenda
     • Motivation
     • Development Process and Characteristics of Big Data Systems
     • Palladio Component Model (PCM) Meta-model Extension
     • Related Work
     • Conclusion and Future Work
     pmw.fortiss.org, München, 2015-11-06

  4. Motivation
     [Figure: logos of big data technologies: Cloudera, Apache Flume, Apache Spark, Splunk, IBM Netezza, HP Vertica, Voldemort, Tableau, Autonomy, Hortonworks, Cassandra, Apache HBase, ElephantDB, S4, Apache Storm, Amazon Kinesis, TIBCO, Apache Kafka, Apache Hadoop, VoltDB, MongoDB, Teradata Aster, EMC Greenplum, SAP, Apache Samza, Pentaho, MapR, Hana]
     • Various big data technologies with different characteristics
     • Casado and Younas (2015) list two main techniques that are common for big data systems, namely, batch and stream processing

  5. Motivation
     • The added value of big data systems for organizations depends on the performance of such systems (Barbierato et al. 2014)
     • Performance models allow for proactive evaluations of these systems
     • Existing performance meta-models for big data systems, however, focus on either
         … one processing paradigm such as stream processing, e.g., Ginis and Strom (2013)
         … or one technology such as Apache Hadoop MapReduce, e.g., Ge et al. (2013)
     • We propose a general performance meta-model to specify shared characteristics of big data systems

  7. Development Process of Big Data Systems: Component developers
     • Batch processing (e.g., using Apache Hadoop MapReduce)

         // Map: split each input line into words and emit (word, 1)
         public void map(Object key, Text value, ..)..{
           StringTokenizer itr = new StringTokenizer(value.toString());
           while (itr.hasMoreTokens()) {
             word.set(itr.nextToken());
             context.write(word, one);
           }
         }

         // Reduce: sum all counts emitted for the same word
         public void reduce(Text key, Iterable<IntWritable> values,..)..{
           int sum = 0;
           for (IntWritable val : values) {
             sum += val.get();
           }
           result.set(sum);
           context.write(key, result);
         }

     • Stream processing (e.g., using Apache Storm)

         // Update the running count for each incoming word and emit it
         public void execute(Tuple tuple, BasicOutputCollector collector) {
           String word = tuple.getString(0);
           Integer count = counts.get(word);
           if (count == null)
             count = 0;
           count++;
           counts.put(word, count);
           collector.emit(new Values(word, count));
         }
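The two snippets on this slide depend on the Hadoop and Storm runtimes and elide parts of their signatures. As a rough, framework-free illustration of the same contrast, the following sketch counts words once over a complete input (batch style) and once tuple by tuple with state kept between calls (stream style). The class and method names (`WordCountSketch`, `batchCount`, `StreamCounter`) are made up for this example and are not part of either framework.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

/** Hypothetical, framework-free sketch of batch versus stream word counting. */
final class WordCountSketch {

    /** Batch style: the whole input is available, so counts are computed in one pass. */
    static Map<String, Integer> batchCount(List<String> lines) {
        Map<String, Integer> result = new HashMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                // Corresponds to emitting (word, 1) in map and summing in reduce.
                result.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return result;
    }

    /** Stream style: one word arrives at a time and state is kept between calls. */
    static final class StreamCounter {
        private final Map<String, Integer> counts = new HashMap<>();

        /** Updates the running count for the word and returns the new value. */
        int execute(String word) {
            return counts.merge(word, 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        System.out.println(batchCount(Arrays.asList("a b a", "b c")));
        StreamCounter counter = new StreamCounter();
        for (String word : "a b a".split(" ")) {
            System.out.println(word + " -> " + counter.execute(word));
        }
    }
}
```

The batch variant returns a result only after all input has been read, whereas the stream variant produces an updated count after every tuple. These are the two processing styles the deck sets out to cover with one meta-model.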

  8. Development Process of Big Data Systems: System deployers
     • Resource environment (e.g., Apache YARN)
     [Figure: a Client, a Resource Manager, and several nodes with Node Managers and containers that run the Application Master, Map Tasks, and a Reduce Task]

  9. Characteristics of Big Data Systems
     • Based on the findings of previous work (Kroß et al. 2015), we derive the following requirements of big data systems, which we propose to implement
     1. Distribution and parallelization of operations
        • Component developers specify reusable software components consisting of operations using software frameworks like Apache Spark.
        • In doing so, they may specify the number of simultaneous and/or total executions of an operation, but they may also not know the definite number in advance.
     2. Clustering of resource containers
        • System deployers specify resource containers with resource roles (e.g., master or worker nodes), link them to a shared network, and logically group them into a computer cluster.

  11. PCM Meta-model Extension: Service effect specification (SEFF) actions (PCM Version 3.4.1)
      [Class diagram]
      • Classes shown: AbstractAction, AbstractInternalControlFlowAction, CallAction, CallReturnAction, ExternalCallAction, InterCallAction, SetVariableAction, VariableUsage, OperationRequiredRole, OperationSignature
      • DistributedCallAction with attributes totalForkCount : Integer and simultaneousForkCount : Integer
      • Other attribute shown: retryCount : Integer
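One possible reading of the two attributes on DistributedCallAction is that an operation runs `totalForkCount` times in total, with at most `simultaneousForkCount` executions active at once. The sketch below illustrates that reading with a fixed-size thread pool. It is an assumption made for illustration, not the simulation semantics of PCM or SimuCom, and the class `DistributedCallSketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: run an operation N times with at most K executions in parallel. */
final class DistributedCallSketch {
    final int totalForkCount;
    final int simultaneousForkCount;
    private final AtomicInteger running = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    DistributedCallSketch(int totalForkCount, int simultaneousForkCount) {
        if (totalForkCount < 1 || simultaneousForkCount < 1) {
            throw new IllegalArgumentException("fork counts must be positive");
        }
        this.totalForkCount = totalForkCount;
        this.simultaneousForkCount = simultaneousForkCount;
    }

    /** Executes the operation totalForkCount times and returns the number of completed runs. */
    int run(Runnable operation) {
        // The pool size bounds how many executions can be active at the same time.
        ExecutorService pool = Executors.newFixedThreadPool(simultaneousForkCount);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < totalForkCount; i++) {
                futures.add(pool.submit(() -> {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max); // record observed parallelism
                    try {
                        operation.run();
                    } finally {
                        running.decrementAndGet();
                    }
                }));
            }
            for (Future<?> future : futures) {
                future.get(); // wait for every fork to finish
            }
            return futures.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** Highest number of executions observed running at the same time. */
    int peakParallelism() {
        return peak.get();
    }

    /** Sequential rounds needed if every execution took equally long (ceiling division). */
    static int waves(int total, int simultaneous) {
        return (total + simultaneous - 1) / simultaneous;
    }

    public static void main(String[] args) {
        DistributedCallSketch call = new DistributedCallSketch(10, 3);
        int completed = call.run(() -> { });
        System.out.println(completed + " executions, peak parallelism " + call.peakParallelism());
    }
}
```

Under this reading, ten executions with three allowed in parallel need `waves(10, 3)`, that is four, rounds when all executions take the same time.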

  12. PCM Meta-model Extension: Resource environment (PCM Version 3.4.1)
      [Class diagram]
      • Classes shown: ResourceEnvironment, ResourceContainer, LinkingResource, ProcessingResourceSpecification, CommunicationLinkResourceSpecification, ClusterResourceSpecification
      • ClusterResourceSpecification attributes: resourceRole : ResourceRole, actionSchedulingPolicy : SchedulingPolicy
      • Enumeration SchedulingPolicy: DELAY, PROCESSOR_SHARING, FCFS, ROUND_ROBIN
      • Enumeration ResourceRole: CLUSTER, MASTER, WORKER
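To make the resource environment extension more concrete, the following sketch mirrors the two enumerations and the ClusterResourceSpecification attributes as plain Java types. The enumeration literals and attribute names come from the slide. Everything else is an assumption for illustration: the direct `members` list on a CLUSTER container, the helper `exampleCluster`, and the scheduling policies chosen in it. The associations of the actual meta-model are not recoverable from the slide text.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the cluster-related resource environment elements. */
final class ClusterModelSketch {

    enum SchedulingPolicy { DELAY, PROCESSOR_SHARING, FCFS, ROUND_ROBIN }

    enum ResourceRole { CLUSTER, MASTER, WORKER }

    /** Attributes as named on the slide. */
    static final class ClusterResourceSpecification {
        final ResourceRole resourceRole;
        final SchedulingPolicy actionSchedulingPolicy;

        ClusterResourceSpecification(ResourceRole resourceRole, SchedulingPolicy actionSchedulingPolicy) {
            this.resourceRole = resourceRole;
            this.actionSchedulingPolicy = actionSchedulingPolicy;
        }
    }

    static final class ResourceContainer {
        final String name;
        final ClusterResourceSpecification spec;
        // Assumption: a container with role CLUSTER groups its member containers directly.
        final List<ResourceContainer> members = new ArrayList<>();

        ResourceContainer(String name, ClusterResourceSpecification spec) {
            this.name = name;
            this.spec = spec;
        }

        /** Returns the direct members that have the given role. */
        List<ResourceContainer> membersWithRole(ResourceRole role) {
            List<ResourceContainer> result = new ArrayList<>();
            for (ResourceContainer member : members) {
                if (member.spec.resourceRole == role) {
                    result.add(member);
                }
            }
            return result;
        }
    }

    /** Builds one cluster with a single master and the given number of workers. */
    static ResourceContainer exampleCluster(int workers) {
        // The scheduling policies below are arbitrary picks for the example.
        ResourceContainer cluster = new ResourceContainer("cluster",
                new ClusterResourceSpecification(ResourceRole.CLUSTER, SchedulingPolicy.ROUND_ROBIN));
        cluster.members.add(new ResourceContainer("master",
                new ClusterResourceSpecification(ResourceRole.MASTER, SchedulingPolicy.FCFS)));
        for (int i = 1; i <= workers; i++) {
            cluster.members.add(new ResourceContainer("worker-" + i,
                    new ClusterResourceSpecification(ResourceRole.WORKER, SchedulingPolicy.FCFS)));
        }
        return cluster;
    }

    public static void main(String[] args) {
        ResourceContainer cluster = exampleCluster(3);
        System.out.println(cluster.membersWithRole(ResourceRole.WORKER).size() + " workers");
    }
}
```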

  13. PCM Meta-model Extension: Service effect specification (SEFF) diagram [figure]

  14. PCM Meta-model Extension: Resource environment diagram [figure]

  16. Related Work
      • Ginis and Strom (2013) present a method for predicting the response time of stream processes in distributed systems
      • Verma et al. (2011) introduce the ARIA framework, which focuses on scheduling strategies for single Apache Hadoop MapReduce jobs
      • Vianna et al. (2013) propose an analytical performance model that focuses on the pipeline between map and reduce jobs
      • Barbierato et al. (2013) and Ge et al. (2013) present modeling techniques for Apache Hadoop MapReduce that estimate response times only
      • Castiglione et al. (2014) use Markovian agents and mean field analysis to model big data batch applications and to provide information about the performance of cloud-based data processing architectures

  18. Conclusion and Future Work
      • We introduced a modeling approach that captures essential characteristics of data processing as found in big data systems
      • We presented two meta-model extensions for PCM
          … to model a computer cluster and
          … to apply distributed and parallel operations on this cluster
      • We plan to
          … complete the extension of the simulation framework SimuCom
          … fully evaluate our extensions for up- and downscaling scenarios
          … automatically derive performance models based on measurement data

  19. References
      • Barbierato, E., Gribaudo, M., Iacono, M.: Performance evaluation of NoSQL big-data applications using multi-formalism models. Future Generation Computer Systems 37(0), 345-353 (2014)
      • Casado, R., Younas, M.: Emerging trends and technologies in big data processing. Concurrency and Computation: Practice and Experience 27(8), 2078-2091 (2015)
      • Castiglione, A., Gribaudo, M., Iacono, M., Palmieri, F.: Modeling performances of concurrent big data applications. Software: Practice and Experience (2014)
      • Ge, S., Zide, M., Huet, F., Magoules, F., Lei, Y., Xuelian, L.: A Hadoop MapReduce performance prediction method. In: Proceedings of the IEEE 10th International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, pp. 820-825 (2013)
      • Ginis, R., Strom, R.E.: Method for predicting performance of distributed stream processing systems. US Patent 8,499,069, https://www.google.com/patents/US8499069 (2013)
      • Kroß, J., Brunnert, A., Prehofer, C., Runkler, T., Krcmar, H.: Stream processing on demand for lambda architectures. In: Beltrán, M., Knottenbelt, W., Bradley, J. (eds.) Computer Performance Engineering (Vol. 9272), pp. 243-257. Springer International Publishing (2015)
      • Verma, A., Cherkasova, L., Campbell, R.H.: ARIA: Automatic resource inference and allocation for MapReduce environments. In: Proceedings of the 8th ACM International Conference on Autonomic Computing, pp. 235-244. ACM, New York, NY, USA (2011)
      • Vianna, E., Comarela, G., Pontes, T., Almeida, J., Almeida, V., Wilkinson, K., Kuno, H., Dayal, U.: Analytical performance models for MapReduce workloads. International Journal of Parallel Programming 41(4), 495-525 (2013)

  20. Q&A
      Johannes Kroß
      kross@fortiss.org
      performancegroup@fortiss.org
      pmw.fortiss.org
