Senska: Towards an Enterprise Streaming Benchmark, Dagstuhl Seminar

SLIDE 1

Senska Towards an Enterprise Streaming Benchmark

Dagstuhl Seminar 17441 - Big Stream Processing Systems
31st October 2017
Guenter Hesse

SLIDE 2

Motivation

  • In a GE battery production plant in New York State, 10,000 different data attributes are captured, some as often as every 250 ms [3]
  • Modern manufacturing equipment, e.g., injection molding machines, can generate up to terabytes of data daily [2]
  • By 2025, the total global worth of IoT technology is projected to reach USD 6.2 trillion [1]
  • Industrial manufacturing is one of the industry sectors investing the most in IoT [1]

[1] http://www.intel.com/content/www/us/en/internet-of-things/infographics/guide-to-iot.html
[2] Huber, M.F., Voigt, M., Ngomo, A.N.: Big data architecture for the semantic analysis of complex events in manufacturing. In: Informatik 2016, 46. Jahrestagung der Gesellschaft für Informatik, 26.-30. September 2016, Klagenfurt, Österreich. pp. 353-360 (2016)
[3] Weiner, S., Line, D.: Manufacturing and the data conundrum - Too much? Too little? Or just right? https://www.eiuperspectives.economist.com/sites/default/files/Manufacturing_Data_Conundrum_Jul14.pdf (2014). Accessed: 2017-03-01.

Image source: http://3.bp.blogspot.com/-9uJ1ni0tb7g/UcgNtWqKYrI/AAAAAAAACe4/iwxNn-eiaKM/s1600/moulding+machine+lg.jpg

SLIDE 3

Motivation

Data Stream Processing Systems over time (t), with system names taken from the logo images on the slide:

  • 2010: IBM InfoSphere Streams
  • 2013: Apache Storm
  • 2013: Apache Samza
  • 2013: Spark Streaming
  • 2014: Apache Flink
  • 2014: Azure Stream Analytics
  • 2015: Heron (Twitter)
  • 2015: (system not identifiable from the logo image link)
  • 2016: Apache Apex

SLIDE 4

Related Work

Existing stream processing benchmarks: Linear Road [1], StreamBench [2], RIoTBench [3]

[1] Arasu, A., Cherniack, M., Galvez, E., Maier, D., Maskey, A.S., Ryvkina, E., Stonebraker, M., Tibbetts, R.: Linear road: A stream data management benchmark. In: Proceedings of the Thirtieth International Conference on Very Large Data Bases - Volume 30. pp. 480-491. VLDB '04, VLDB Endowment (2004)
[2] Lu, R., Wu, G., Xie, B., Hu, J.: Stream bench: Towards benchmarking modern distributed stream computing frameworks. In: Proceedings of the 2014 IEEE/ACM 7th International Conference on Utility and Cloud Computing. pp. 69-78. UCC '14, IEEE Computer Society, Washington, DC, USA (2014)
[3] Shukla, A., Chaturvedi, S., Simmhan, Y.: Riotbench: A real-time iot benchmark for distributed stream processing platforms. CoRR abs/1701.08530 (2017)

SLIDE 5

Related Work

Linear Road

  • focus on single-node DSPS
  • barely any use of historical data
  • some queries are overly complex
  • limited metrics

StreamBench

  • typical streaming operations missing (e.g., window functions)
  • no validation

RIoTBench

  • no tool support (data ingestion, result validation)
  • no historical data
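
The StreamBench critique above names window functions as a typical streaming operation. To illustrate what such an operation looks like, here is a minimal sketch of a tumbling-window average in plain Python. The function name `tumbling_window_avg`, the `(timestamp_ms, value)` event format, and the 1,000 ms window size are assumptions made for this example; they are not taken from Senska or from any of the benchmarks listed.

```python
from collections import defaultdict


def tumbling_window_avg(events, window_ms=1000):
    """Average values per fixed, non-overlapping time window.

    events: iterable of (timestamp_ms, value) pairs (hypothetical format).
    Returns a dict mapping each window's start time (ms) to the mean value.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in events:
        # Integer division assigns every event to exactly one window.
        start = (ts // window_ms) * window_ms
        sums[start] += value
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


# Illustrative readings arriving every 250 ms.
readings = [(0, 1.0), (250, 2.0), (500, 3.0), (750, 4.0),
            (1000, 10.0), (1250, 20.0)]
print(tumbling_window_avg(readings))  # {0: 2.5, 1000: 15.0}
```

A real DSPS would evaluate such windows incrementally on an unbounded stream; the batch-style loop here only shows the grouping logic.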
SLIDE 6

Related Work

Currently, there is no satisfactory enterprise streaming benchmark

SLIDE 7

Contributions/Scope of Senska

  • Design of benchmark architecture
  • Definition and validation of query set
  • Design and development of benchmark toolkit for
      • Data ingestion
      • Result validation
      • Metric calculation
      • Systems’ setup
  • Reference implementation that can be used for benchmarking various systems
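
Result validation and metric calculation, two of the toolkit tasks listed above, can be sketched as follows. The slides do not say which metrics Senska computes or how results are compared, so everything here is an assumption for illustration: the function names `validate_results` and `latency_metrics`, the tuple-based record format, and the choice of mean, nearest-rank percentile, and maximum latency.

```python
import math


def validate_results(expected, actual):
    """Compare query output against a reference result.

    Returns (missing, unexpected) as sorted lists. Records are compared
    as sets, so duplicates are ignored (a simplifying assumption).
    """
    expected_set, actual_set = set(expected), set(actual)
    return sorted(expected_set - actual_set), sorted(actual_set - expected_set)


def latency_metrics(timings, percentile=90):
    """Compute latency statistics from (ingest_ms, output_ms) pairs."""
    latencies = sorted(out - inp for inp, out in timings)
    # Nearest-rank percentile: smallest value covering `percentile` % of samples.
    rank = math.ceil(percentile / 100 * len(latencies))
    return {
        "mean_ms": sum(latencies) / len(latencies),
        "percentile_ms": latencies[rank - 1],
        "max_ms": latencies[-1],
    }


# Hypothetical reference result and system output: (sensor, window_start, avg).
expected = [("m1", 0, 2.5), ("m1", 1000, 15.0)]
actual = [("m1", 0, 2.5)]
missing, unexpected = validate_results(expected, actual)
print(missing, unexpected)  # [('m1', 1000, 15.0)] []

timings = [(0, 20), (250, 260), (500, 540), (750, 765)]
print(latency_metrics(timings))
```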

SLIDE 8

Architecture

General streaming benchmark architecture:

  • Data Feeder
  • System Under Test (Query Implementation)
  • Result Validator

Architecture of Senska:

  • Data and Workload Generator (Toolkit)
  • Input Data (Sensor Data)
  • Data Sender (Toolkit)
  • Message Broker (Apache Kafka)
  • System Under Test (Benchmark Query Implementation)
  • DBMS (Transactional Data)
  • Result Validator and Metric Calculator (Toolkit)
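
The way these components fit together can be sketched as a toy pipeline. This is a simplified illustration and not Senska's implementation: an in-memory `queue.Queue` stands in for the message broker (Apache Kafka in the architecture), the DBMS with transactional data is left out, and the record format, the threshold filter query, and all function names are hypothetical.

```python
import queue


def generate_sensor_data(n):
    """Data and workload generator: deterministic fake sensor records."""
    return [{"sensor": "s1", "ts_ms": i * 250, "value": float(i)}
            for i in range(n)]


def data_sender(records, broker):
    """Data sender: push every record to the broker."""
    for record in records:
        broker.put(record)
    broker.put(None)  # end-of-stream marker (an assumption of this sketch)


def system_under_test(broker, threshold):
    """Benchmark query implementation: keep records above a threshold."""
    results = []
    while True:
        record = broker.get()
        if record is None:
            break
        if record["value"] > threshold:
            results.append(record)
    return results


def result_validator(records, results, threshold):
    """Result validator: recompute the expected output and compare."""
    expected = [r for r in records if r["value"] > threshold]
    return results == expected


broker = queue.Queue()  # in-memory stand-in for the message broker
records = generate_sensor_data(8)
data_sender(records, broker)
results = system_under_test(broker, threshold=4.0)
print(len(results), result_validator(records, results, 4.0))  # 3 True
```

Placing a broker between the sender and the system under test keeps data ingestion identical for every benchmarked system, so only the query implementation changes between runs.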

SLIDE 9

Architecture - In Detail

SLIDE 10

Thank you for your attention!

Guenter.Hesse@hpi.de
