
Towards optimizing the efficiency of data-intensive data mashups based on the example of Map-Reduce

Original German title: Effizienz-Optimierung daten-intensiver Data Mashups am Beispiel von Map-Reduce. Pascal Hirmer, BTW 2017 BigDS Workshop.


  1. Effizienz-Optimierung daten-intensiver Data Mashups am Beispiel von Map-Reduce. Pascal Hirmer, BTW 2017 BigDS Workshop

  2. Towards optimizing the efficiency of data-intensive data mashups based on the example of Map-Reduce. Pascal Hirmer, BTW 2017 BigDS Workshop

  3. Motivation: Big Data
     • Big Data: the volume and complexity of data are increasing sharply
     • New paradigms: Internet of Things, Industrie 4.0, Data Lakes, …
     • Gaining knowledge through data processing and analysis (knowledge discovery) is important
     • But gaining knowledge is difficult because of the (at least) three Vs of Big Data:
       • Volume
       • Variety
       • Velocity

  4. Data Mashups: Definition
     • Goal: flow-based processing, analytics, and integration of data
     • Data operations are modeled based on the Pipes and Filters pattern
     • [Figure: example data flow with the operations extract (twice), filter, join, analyze]
     • Famous example: Yahoo! Pipes
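
The Pipes and Filters idea on this slide can be illustrated with a minimal Python sketch. It is not taken from the talk or from FlexMash: the sample records, field names, and function names (`keep_country`, `run_mashup`, and so on) are assumptions made up for illustration. Each function plays the role of one filter, and the return value of one function is piped into the next.

```python
# Hypothetical in-memory sources standing in for real data sources.
CUSTOMERS = [
    {"id": 1, "name": "Ada", "country": "DE"},
    {"id": 2, "name": "Bob", "country": "US"},
    {"id": 3, "name": "Eve", "country": "DE"},
]
ORDERS = [
    {"customer_id": 1, "total": 40.0},
    {"customer_id": 1, "total": 10.0},
    {"customer_id": 3, "total": 25.0},
    {"customer_id": 2, "total": 99.0},
]


def extract(source):
    """Extract: copy the records out of a source."""
    return [dict(record) for record in source]


def keep_country(customers, country):
    """Filter: keep only customers from the given country."""
    return [c for c in customers if c["country"] == country]


def join(customers, orders):
    """Join: attach each order to its customer, dropping unmatched orders."""
    by_id = {c["id"]: c for c in customers}
    return [
        {**by_id[o["customer_id"]], **o}
        for o in orders
        if o["customer_id"] in by_id
    ]


def analyze(rows):
    """Analyze: sum the order totals per customer name."""
    totals = {}
    for row in rows:
        totals[row["name"]] = totals.get(row["name"], 0.0) + row["total"]
    return totals


def run_mashup():
    """Wire the filters together: two extracts, one filter, one join, one analysis."""
    customers = keep_country(extract(CUSTOMERS), "DE")
    orders = extract(ORDERS)
    return analyze(join(customers, orders))
```

Because every step only consumes and produces plain lists of records, steps can be reordered or replaced without touching the others, which is the property the pattern is chosen for.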

  5. Motivation: Data Processing Tools
     • Data mashup tools, ETL tools, and data analytics tools (e.g., KNIME) offer means to process and analyze data
     • Focus on approaches that support abstract modeling based on the Pipes and Filters pattern
       • Nodes: data operations (e.g., extraction, transformation, analysis)
       • Edges: data flow
       • Nodes are associated with services that process the data (orchestrated by workflows)
     • These tools offer an explorative means to process data
     • The focus lies on the open-source data mashup tool FlexMash, developed at the University of Stuttgart
     • The concepts are also applicable to other data processing approaches
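
The node-and-edge model described on this slide can be sketched as a small graph executor. This is an illustrative assumption, not FlexMash code: `execute_plan`, `EXAMPLE_NODES`, and `EXAMPLE_EDGES` are hypothetical names, and plain Python callables stand in for the services that would process the data. The sketch runs the nodes in topological order so that every operation sees the outputs of its predecessors.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def execute_plan(nodes, edges):
    """Run a data flow graph.

    nodes: dict mapping a node name to a callable (the stand-in "service").
    edges: list of (source, target) pairs describing the data flow.
    Returns a dict with the output of every node.
    """
    predecessors = {name: [] for name in nodes}
    for source, target in edges:
        predecessors[target].append(source)

    results = {}
    # static_order() yields each node only after all of its predecessors.
    for name in TopologicalSorter(predecessors).static_order():
        inputs = [results[p] for p in predecessors[name]]
        results[name] = nodes[name](*inputs)
    return results


# Hypothetical example graph: two extractions, a filter, a join, an analysis.
EXAMPLE_NODES = {
    "extract_a": lambda: [1, 2, 3, 4],
    "extract_b": lambda: [3, 4, 5],
    "filter": lambda xs: [x for x in xs if x % 2 == 0],
    "join": lambda a, b: sorted(set(a) & set(b)),
    "analyze": lambda xs: len(xs),
}
EXAMPLE_EDGES = [
    ("extract_a", "filter"),
    ("filter", "join"),
    ("extract_b", "join"),
    ("join", "analyze"),
]
```

Calling `execute_plan(EXAMPLE_NODES, EXAMPLE_EDGES)` returns the intermediate output of every node, which fits the explorative style of working mentioned on the slide.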

  6. Motivation
     • Overall goal of this work: increasing the efficiency of service-based data processing
     • State of the art: data is processed "in-service" (in memory) → scalability and memory issues
     • [Figure: services S1 to S5]
     • Approach in a nutshell:
       • Move data processing to computing clusters and process data in parallel
       • Integrate modern data processing techniques and technologies (Map-Reduce, Apache Spark, …)
       • Cope with the generated overhead (where is the cost-value limit?)
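
To make the Map-Reduce idea concrete, here is a minimal word-count sketch in Python. It is an assumption-laden illustration, not the approach from the talk: a local thread pool stands in for a computing cluster, and `map_chunk`, `map_reduce_word_count`, and the `chunk_size` parameter are hypothetical names. In CPython, threads will not speed up this CPU-bound work; the sketch only shows the structure of splitting the input, mapping the chunks independently, and reducing the partial results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map step: count the words in one chunk of the input."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def map_reduce_word_count(lines, workers=4, chunk_size=2):
    """Split the input, map the chunks concurrently, then reduce the partial counts."""
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    # Reduce step: merge the per-chunk counters into one result.
    return reduce(lambda left, right: left + right, partials, Counter())
```

Splitting and merging is extra work that in-service processing does not have, so for small inputs the parallel variant can be slower. That is the overhead the slide asks about.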

  7. FlexMash
     • [Figure: FlexMash overview. Steps shown: Domain-specific Modeling (Modeling Tool, Mashup Modeler), Pattern Selection (Mashup Plan; pattern selection and combination, with example patterns Robust, Secure, Time-Critical, Robust & Secure, …), Pattern-based Transformation and Execution (Mashup Execution Environments, cloud-based execution), Visualization (Mashup Result)]

  8. FlexMash: Graphical User Interface
     • Download FlexMash on GitHub: https://github.com/hirmerpl/FlexMash

  9. Main contribution (I)
     • [Figure labels: Mashup Plan (non-executable) with the operations extract, filter, join, analyze; executable representation of the data flow model; service runtime with in-service data processing; parallel data processing based on computing clusters]

  10. Main contribution: decision between in-service and distributed/parallel processing
     • [Figure labels: Requirements (e.g., costs); Mashup Plan (non-executable); Transformation; executable model; Service Repository; Services; Policies/Capabilities]
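
One way to picture the decision on this slide is as a lookup in a service repository that matches requirements against service capabilities. The sketch below is purely hypothetical: the slides do not specify the decision logic, and the `Service` fields, the cost and size values, and the cheapest-candidate rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """Hypothetical repository entry with a few illustrative capabilities."""
    name: str
    operation: str        # e.g. "filter", "join"
    mode: str             # "in-service" or "distributed"
    cost_per_run: float   # assumed abstract cost unit
    max_input_mb: float   # largest input the service can handle


def select_service(repository, operation, input_mb, max_cost):
    """Pick the cheapest service that supports the operation and meets the requirements."""
    candidates = [
        s for s in repository
        if s.operation == operation
        and s.max_input_mb >= input_mb
        and s.cost_per_run <= max_cost
    ]
    if not candidates:
        raise LookupError(f"no service for {operation!r} meets the requirements")
    return min(candidates, key=lambda s: s.cost_per_run)


# Hypothetical repository: one in-service and one distributed filter service.
EXAMPLE_REPOSITORY = [
    Service("filter-local", "filter", "in-service", 1.0, 500.0),
    Service("filter-cluster", "filter", "distributed", 5.0, 1_000_000.0),
]
```

With these example values, a 100 MB input is routed to the cheaper in-service variant, while a 10,000 MB input exceeds its capacity and falls through to the distributed one. A real decision would also have to account for the parallelization overhead.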

  11. Conclusion and future work
     • First approach to increasing the efficiency of service-based data processing tools
     • Parallelization enables large efficiency gains
     • Finding the cost-value limit is difficult
     • Future and ongoing work:
       • Conducting measurements for comparison and for finding the cost-value limit
       • Concretizing the concepts
       • Generating Map-Reduce jobs

  12. Questions & Discussion

  13. Thank you!
     Pascal Hirmer
     Universität Stuttgart, Universitätsstraße 38, 70569 Stuttgart, Germany
     E-Mail: Pascal.Hirmer@ipvs.uni-stuttgart.de
     Phone: +49 (0) 711 685-88297
     Fax: +49 (0) 711 685-78217
