Towards Optimizing the Efficiency of Data-Intensive Data Mashups Based on the Example of Map-Reduce



SLIDE 1

Pascal Hirmer

BTW 2017 BigDS Workshop

Effizienz-Optimierung daten-intensiver Data Mashups am Beispiel von Map-Reduce (efficiency optimization of data-intensive data mashups using the example of Map-Reduce)

SLIDE 2

Pascal Hirmer

BTW 2017 BigDS Workshop

Towards optimizing the efficiency of data-intensive data mashups based on the example of Map-Reduce

SLIDE 3
  • Big Data: the volume and complexity of data are increasing rapidly
  • New paradigms: Internet of Things, Industrie 4.0, Data Lakes, …
  • It is important to gain knowledge through data processing and analysis (knowledge discovery)

  • But: gaining knowledge is difficult because of the (at least) three Vs of Big Data:
    • Volume
    • Variety
    • Velocity

Motivation: Big Data


SLIDE 4
  • Goal: flow-based processing, analytics, and integration of data
  • Modeling of data operations based on the Pipes and Filters pattern
  • Well-known example: Yahoo! Pipes


Data Mashups - Definition

[Figure: example data mashup built from the operations extract, extract, filter, join, and analyze]
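The pipes-and-filters idea behind such a mashup can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not FlexMash code: each filter is a generator stage, `pipe` wires the stages together, and the names `extract`, `keep`, `join_on`, and `analyze` as well as the sensor data are hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator

Record = dict
Stage = Callable[[Iterable[Record]], Iterator[Record]]


def extract(rows: Iterable[Record]) -> Iterator[Record]:
    """Source: emits records (hypothetical stand-in for reading a real data source)."""
    yield from rows


def keep(predicate: Callable[[Record], bool]) -> Stage:
    """Filter stage: passes on only the records that match the predicate."""
    def stage(records: Iterable[Record]) -> Iterator[Record]:
        return (r for r in records if predicate(r))
    return stage


def join_on(key: str, right: Iterable[Record]) -> Stage:
    """Join stage: hash join with a second input, assuming unique keys on the right."""
    index = {r[key]: r for r in right}
    def stage(records: Iterable[Record]) -> Iterator[Record]:
        return ({**r, **index[r[key]]} for r in records if r[key] in index)
    return stage


def pipe(source: Iterable[Record], *stages: Stage) -> Iterator[Record]:
    """Connects the stages so that each one consumes the previous one's output."""
    for stage in stages:
        source = stage(source)
    return iter(source)


def analyze(records: Iterable[Record]) -> dict:
    """Sink: counts the records per room."""
    return dict(Counter(r["room"] for r in records))


sensors = [
    {"id": 1, "temp": 21.5},
    {"id": 2, "temp": 35.0},
    {"id": 3, "temp": 40.2},
    {"id": 4, "temp": 33.0},
]
locations = [{"id": 2, "room": "A"}, {"id": 3, "room": "B"}, {"id": 4, "room": "B"}]

hot_per_room = analyze(
    pipe(extract(sensors), keep(lambda r: r["temp"] > 30), join_on("id", extract(locations)))
)
print(hot_per_room)  # {'A': 1, 'B': 2}
```

Because the stages are generators, records stream through the pipeline one at a time; the join, however, materializes its right-hand input in memory, which hints at the in-service memory pressure discussed on the following slides.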

SLIDE 5
  • Data mashup tools, ETL tools, and data analytics tools (e.g., KNIME) offer means to process and analyze data

  • Focus on approaches that support abstract modeling based on the pipes and filters pattern:
    • nodes: data operations (e.g., extraction, transformation, analysis)
    • edges: data flow
    • nodes are associated with services that process the data (orchestrated by workflows)
  • These approaches offer an explorative way to process data
  • This work focuses on the open-source data mashup tool FlexMash, developed at the University of Stuttgart
    • The concepts are also applicable to other data processing approaches

Motivation: Data Processing Tools
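The graph view described above (nodes bound to services, edges as data flow, execution orchestrated in dependency order) can be sketched as follows. This is an illustration under assumptions rather than FlexMash's actual engine: plain Python callables stand in for remote services, a topological sort from the standard library's `graphlib` stands in for a workflow engine, and `MashupGraph`, `orchestrate`, and the order data are hypothetical.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Any, Callable

Service = Callable[..., Any]


@dataclass
class MashupGraph:
    """Nodes are data operations bound to services; edges are the data flow."""
    services: dict[str, Service] = field(default_factory=dict)
    edges: list[tuple[str, str]] = field(default_factory=list)

    def add_node(self, name: str, service: Service) -> None:
        self.services[name] = service

    def connect(self, source: str, target: str) -> None:
        self.edges.append((source, target))


def orchestrate(graph: MashupGraph) -> dict[str, Any]:
    """Calls each node's service once all of its inputs are available."""
    predecessors: dict[str, list[str]] = {name: [] for name in graph.services}
    for source, target in graph.edges:
        predecessors[target].append(source)
    results: dict[str, Any] = {}
    for name in TopologicalSorter(predecessors).static_order():
        # Inputs are passed in the order in which the edges were connected.
        inputs = [results[p] for p in predecessors[name]]
        results[name] = graph.services[name](*inputs)
    return results


g = MashupGraph()
g.add_node("extract_orders", lambda: [
    {"customer": 1, "amount": 250},
    {"customer": 2, "amount": 80},
    {"customer": 1, "amount": 120},
])
g.add_node("extract_customers", lambda: [
    {"customer": 1, "country": "DE"},
    {"customer": 2, "country": "FR"},
])
g.add_node("filter", lambda orders: [o for o in orders if o["amount"] > 100])
g.add_node("join", lambda orders, customers: [
    {**o, **c} for o in orders for c in customers if o["customer"] == c["customer"]
])
g.add_node("analyze", lambda rows: sum(r["amount"] for r in rows))

g.connect("extract_orders", "filter")
g.connect("filter", "join")
g.connect("extract_customers", "join")
g.connect("join", "analyze")

print(orchestrate(g)["analyze"])  # 370
```

Here every node's service receives its complete input at once, which corresponds to the "in-service" processing that the next slide identifies as a scalability bottleneck.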


SLIDE 6
  • Overall goal of this work: Increasing the efficiency of service-based data processing
  • State of the art: data is processed "in-service" (in memory), which leads to scalability and memory issues
  • Approach in a nutshell:
    • Move data processing to computing clusters and process the data in parallel
    • Integrate modern data processing techniques and technologies (Map-Reduce, Apache Spark, …)
    • Cope with the resulting overhead (where is the cost-value limit?)

Motivation

[Figure: data mashup whose operations are executed by the services S1 to S5]
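The core idea of moving from in-service to parallel processing (partition the data, process the partitions independently, merge the partial results) can be sketched as below. Assumptions: a local thread pool stands in for a computing cluster (in CPython this shows the structure, not real speed-ups), and `in_service`, `chunk`, `parallel`, and `square_multiples_of_3` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Op = Callable[[List[int]], List[int]]


def in_service(records: List[int], op: Op) -> List[int]:
    """State of the art: the whole data set is processed in one service's memory."""
    return op(records)


def chunk(records: List[int], parts: int) -> List[List[int]]:
    """Splits the data into contiguous partitions of roughly equal size."""
    size = max(1, -(-len(records) // parts))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def parallel(records: List[int], op: Op, workers: int = 4) -> List[int]:
    """Processes the partitions concurrently and concatenates the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(op, chunk(records, workers)))  # map() keeps partition order
    return [r for part in partials for r in part]


def square_multiples_of_3(part: List[int]) -> List[int]:
    """A record-wise operation: filter multiples of 3, then square them."""
    return [x * x for x in part if x % 3 == 0]


data = list(range(1, 21))
print(parallel(data, square_multiples_of_3))  # [9, 36, 81, 144, 225, 324]
```

Simple concatenation is only correct for record-wise operations such as filters or projections. Joins and aggregations need all records with the same key to meet on one worker, which is what the shuffle phase of Map-Reduce provides.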


SLIDE 7

FlexMash

[Figure: FlexMash overview with three stages: Domain-specific Modeling (FlexMash Modeling Tool / Mashup Modeler, producing a Mashup Plan); Pattern-based Transformation and Execution (pattern selection and combination, e.g., Robust, Time-Critical, Secure, Robust & Secure, …; Mashup Execution Environments; cloud-based execution); and Visualization of the Mashup Result]


SLIDE 8

FlexMash – Graphical User Interface


Download

FlexMash on GitHub: https://github.com/hirmerpl/FlexMash

SLIDE 9

Main contribution (I)

[Figure: the non-executable Mashup Plan (extract, filter, join, analyze) is transformed into an executable representation of the data flow model, which is processed either in-service by the service runtime or by parallel data processing based on computing clusters]


SLIDE 10

Main contribution – decision: in-service vs. distributed/parallel

[Figure: a transformation step takes the non-executable Mashup Plan and requirements (e.g., costs) as input, uses the services and their policies/capabilities from a service repository, and produces an executable model]
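One way to make this decision concrete is a simple cost model, sketched here as an assumption since the slides leave the concrete decision logic (the cost-value limit) open. Suppose processing costs c seconds per record, so the in-service runtime for N records is c·N, while distributed execution on k worker nodes costs a fixed overhead O (deployment, job start-up, data transfer) plus c·N/k. Distribution then pays off once O + c·N/k < c·N, i.e. for N > O·k / (c·(k − 1)); independently, data that does not fit into the service's memory has to be distributed. The model, its parameters, and the names `Capabilities`, `Requirements`, `Workload`, `break_even_records`, and `choose_runtime` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Capabilities:
    """Hypothetical capabilities taken from a service repository."""
    service_memory_bytes: int   # memory available to the in-service runtime
    cluster_nodes: int          # worker nodes available for parallel processing


@dataclass
class Requirements:
    """Hypothetical user requirements, e.g. a node limit derived from a cost budget."""
    max_cluster_nodes: int


@dataclass
class Workload:
    records: int                # N
    bytes_per_record: int
    seconds_per_record: float   # c
    cluster_overhead_s: float   # O


def break_even_records(overhead_s: float, seconds_per_record: float, nodes: int) -> float:
    """Break-even size O*k / (c*(k-1)); distribution pays off above it (requires k > 1)."""
    return overhead_s * nodes / (seconds_per_record * (nodes - 1))


def choose_runtime(w: Workload, caps: Capabilities, req: Requirements) -> str:
    nodes = min(caps.cluster_nodes, req.max_cluster_nodes)
    if w.records * w.bytes_per_record > caps.service_memory_bytes:
        # The data does not fit into the service's memory.
        if nodes < 1:
            raise ValueError("data exceeds service memory and no cluster nodes are allowed")
        return "distributed"
    if nodes <= 1:
        return "in-service"  # a single node cannot amortize the overhead
    in_service_s = w.seconds_per_record * w.records
    distributed_s = w.cluster_overhead_s + in_service_s / nodes
    return "distributed" if distributed_s < in_service_s else "in-service"


caps = Capabilities(service_memory_bytes=1_000_000_000, cluster_nodes=8)
req = Requirements(max_cluster_nodes=8)
print(choose_runtime(Workload(10_000, 100, 1e-5, 30.0), caps, req))       # in-service
print(choose_runtime(Workload(5_000_000, 100, 1e-5, 30.0), caps, req))    # distributed
print(choose_runtime(Workload(100_000_000, 100, 1e-5, 30.0), caps, req))  # distributed
print(round(break_even_records(30.0, 1e-5, 8)))                           # 3428571
```

With these made-up numbers, the break-even point lies at roughly 3.4 million records, so the 5-million-record workload is distributed even though it would still fit into the service's memory. Real measurements, as planned in the future work, would be needed to calibrate O and c.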


SLIDE 11
  • First approach to increase the efficiency of service-based data processing tools
  • Parallelization enables large efficiency gains
  • Finding the cost-value limit is difficult
  • Future/ongoing work:
    • Conducting comparative measurements and determining the cost-value limit
    • Making the concepts more concrete
    • Generating Map-Reduce jobs

Conclusion and future work
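As a rough illustration of what generating a Map-Reduce job from a mashup node could involve, the sketch below derives a mapper and a reducer from a declarative aggregation node and runs them through a local map, shuffle, and reduce phase. The node format (`group_by`, `field`, `fn`), `generate_job`, and `run_locally` are hypothetical; an actual implementation would emit jobs for a framework such as Hadoop MapReduce or Apache Spark instead of running them in-process.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[Any, Any]
Mapper = Callable[[dict], Iterator[Pair]]
Reducer = Callable[[Any, List[Any]], Pair]

# Aggregation functions a node may request (hypothetical selection).
AGGREGATES: Dict[str, Callable[[List[Any]], Any]] = {
    "sum": sum, "min": min, "max": max, "count": len,
}


def generate_job(node: dict) -> Tuple[Mapper, Reducer]:
    """Derives a mapper and a reducer from a declarative aggregation node."""
    key, value, aggregate = node["group_by"], node["field"], AGGREGATES[node["fn"]]

    def mapper(record: dict) -> Iterator[Pair]:
        yield record[key], record[value]          # emit (group key, value)

    def reducer(group: Any, values: List[Any]) -> Pair:
        return group, aggregate(values)           # aggregate all values of one key

    return mapper, reducer


def run_locally(mapper: Mapper, reducer: Reducer, records: Iterable[dict]) -> Dict[Any, Any]:
    """Map, shuffle (sort and group by key), and reduce in a single process."""
    pairs = [pair for record in records for pair in mapper(record)]  # map phase
    pairs.sort(key=itemgetter(0))                                   # shuffle/sort phase
    return dict(
        reducer(group, [v for _, v in grouped])                     # reduce phase
        for group, grouped in groupby(pairs, key=itemgetter(0))
    )


readings = [
    {"room": "A", "temp": 21},
    {"room": "B", "temp": 35},
    {"room": "A", "temp": 24},
    {"room": "B", "temp": 31},
]
mapper, reducer = generate_job({"op": "aggregate", "group_by": "room", "field": "temp", "fn": "max"})
print(run_locally(mapper, reducer, readings))  # {'A': 24, 'B': 35}
```

Because the mapper only sees one record and the reducer only sees the values of one key, both could run on many nodes; the sort-and-group step is what a cluster framework performs as the shuffle between them.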


SLIDE 12


Questions & Discussion


SLIDE 13

Thank you!

Pascal Hirmer
Universität Stuttgart
Universitätsstraße 38, 70569 Stuttgart, Germany
E-mail: Pascal.Hirmer@ipvs.uni-stuttgart.de
Phone: +49 (0) 711 685-88297
Fax: +49 (0) 711 685-78217