REAL-TIME ANALYTICS WITH APACHE STORM, Mevlut Demir, PhD Student, The University of Texas at San Antonio



SLIDE 1

The University of Texas at San Antonio – Department of Electrical and Computer Engineering

REAL-TIME ANALYTICS WITH APACHE STORM

Mevlut Demir PhD Student

SLIDE 2

IN TODAY'S TALK

1- Problem Formulation
2- A Real-Time Framework and Its Components, with Existing Applications
3- Proposed Framework
4- Conclusion

SLIDE 3

1- INTRODUCTION

  • The number of IoT devices is increasing:
      • currently ~7 billion; ~50 billion projected by 2020 (exponential growth)
      • low manufacturing costs
      • availability of internet connections
  • IoT devices consist of:
      • CPU
      • memory storage
      • a wireless connection
  • IoT devices are equipped with:
      • sensors (produce data)
      • actuators (capable of receiving commands)

SLIDE 4

1- INTRODUCTION

  • An example of IoT in modern life: robots
      • limited on-board computation power
      • generate large amounts of data
  • Challenges:
      • latency
      • computation needs (on-board computing limits the robot's mobility due to weight and power demands)

*Google Images

SLIDE 5

1- INTRODUCTION

  • Solution:
      • scalable data processing platforms -> CLOUD
        It is a model for enabling ubiquitous, on-demand access to a shared pool of configurable computing resources (e.g., computer networks, servers, storage, applications and services), which can be rapidly provisioned and released with minimal management effort. [9]
      • becoming the standard for computation
  • Advantages of using central data processing:
      • the ability to easily draw from vast stores of information,
      • efficient allocation of computing resources,
      • a proclivity for parallelization.

SLIDE 6

1.1- REQUIREMENTS FOR IOT DEVICES

  • Data should be transferred in an efficient and scalable manner.
      • The traditional GET/POST approach is not suitable because it increases latency and network traffic.
  • Parallel processing
  • Real-time analysis
  • Batch analysis
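The GET/POST point can be illustrated with a toy message count. This is only a sketch under stated assumptions: the function names, the fixed polling interval, and the event times are all hypothetical, and the numbers are not measurements from the talk. It compares a client that polls at a fixed rate with a device that pushes a message only when a reading exists.

```python
def polling_requests(duration_s, poll_interval_s):
    """Requests issued by a client that polls at a fixed interval,
    whether or not new data is available (hypothetical model)."""
    return int(duration_s // poll_interval_s)


def push_messages(event_times_s, duration_s):
    """Messages sent when the device publishes only when a reading exists."""
    return sum(1 for t in event_times_s if 0 <= t < duration_s)


def worst_case_polling_delay(poll_interval_s):
    """A reading produced just after a poll waits almost one full interval."""
    return poll_interval_s
```

Under these assumptions, polling every 0.5 s for one minute costs 120 requests even if the device produced only a handful of readings, and each reading can wait up to 0.5 s before it is picked up.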

SLIDE 7

2- A REAL-TIME ARCHITECTURE

  • Gateway layer: drivers are deployed in the gateway layer.
  • Publish-subscribe messaging layer
  • Cloud-based big data processing layer: Apache Storm processes the data and sends the results back to the device.

Figure: IoT Cloud Architecture [1]

SLIDE 8

2.1- GATEWAY LAYER

  • Gateways are responsible for:
      • managing drivers
      • managing connections to the brokers
      • load balancing the device data across the brokers
      • updating the gateway master
      • updating the state information of gateways in ZooKeeper
  • Each gateway has a unique ID.
  • The gateway master is responsible for:
      • controlling the gateways
      • deploying/undeploying and starting/stopping the drivers

Figure: Gateway layer [2]
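The load-balancing responsibility can be sketched as follows. The slide does not say which balancing policy the gateways use, so round-robin is an assumption here, and the `Gateway` class, its fields, and the broker names are hypothetical.

```python
from itertools import cycle


class Gateway:
    """Toy gateway that spreads device messages across brokers.

    Round-robin is an assumed policy; the source only states that
    gateways handle load balancing of device data to the brokers.
    """

    def __init__(self, gateway_id, broker_names):
        if not broker_names:
            raise ValueError("at least one broker is required")
        self.gateway_id = gateway_id  # each gateway has a unique ID
        self.sent = {name: [] for name in broker_names}
        self._next_broker = cycle(broker_names)

    def forward(self, message):
        """Hand the message to the next broker in turn and return its name."""
        broker = next(self._next_broker)
        self.sent[broker].append(message)
        return broker
```

With two brokers, successive calls to `forward` alternate between them, so each broker receives about half of the device traffic.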

SLIDE 9

2.1- GATEWAY LAYER

  • Driver:
      • data bridge between a device and the cloud application
      • responsible for data conversion
      • has a name and a set of communication channels
      • can be deployed multiple times
  • Each channel has a unique name.

Figure: MQ Layer [2]
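A driver as described above might be modelled like this. It is a minimal sketch: the `Driver` class, its method names, and the idea of attaching one conversion function per channel are assumptions made for illustration, not the framework's actual API.

```python
import json


class Driver:
    """Toy driver: a named bridge that converts device data per channel."""

    def __init__(self, name):
        self.name = name
        self.channels = {}  # unique channel name -> conversion function

    def add_channel(self, channel_name, convert):
        """Register a channel; channel names must be unique within the driver."""
        if channel_name in self.channels:
            raise ValueError("channel %r already exists" % channel_name)
        self.channels[channel_name] = convert

    def to_cloud(self, channel_name, raw):
        """Convert a raw device payload into the format the cloud app expects."""
        return self.channels[channel_name](raw)
```

For example, a hypothetical temperature channel could turn the raw bytes `b"23.5"` into a JSON document with `driver.add_channel("temperature", lambda raw: json.dumps({"celsius": float(raw.decode())}))`.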

SLIDE 10

2.2- MESSAGING LAYER

  • RabbitMQ
      • topic-based publish-subscribe broker
      • has a rich API; topics can be easily created
      • supports the Advanced Message Queuing Protocol (AMQP) and Message Queuing Telemetry Transport (MQTT)
      • low latency
      • creates lightweight topics

Figure: RabbitMQ [3]
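The topic-based publish-subscribe pattern itself can be sketched in a few lines. This is an in-memory toy, not RabbitMQ's API (RabbitMQ routes through exchanges, routing keys, and queues); the `TopicBroker` class and the topic names are hypothetical.

```python
from collections import defaultdict


class TopicBroker:
    """In-memory stand-in for a topic-based publish-subscribe broker."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        """Register a callback to receive every message published on topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver message to all subscribers of topic; return how many got it."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)
```

Publishers and subscribers only share a topic name, so a device driver can publish sensor frames without knowing which processing component consumes them.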

SLIDE 11

2.2- MESSAGING LAYER

  • Kafka
      • topic-based publish-subscribe broker
      • messages are appended to a commit log
      • topics are divided into partitions
      • consumers can read the same topic in parallel
      • has its own messaging protocol
      • does not support AMQP or MQTT

Figure: Kafka [4]
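The commit-log and partition ideas can be sketched with a small in-memory model. This is not Kafka's client API; the `PartitionedLog` class is hypothetical, and the CRC32 hash used to pick a partition is an arbitrary stand-in chosen for this example.

```python
import zlib


class PartitionedLog:
    """Toy topic: a fixed number of append-only partitions."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        """Append value to the partition chosen by the key's hash.

        Returns (partition, offset); the offset is the record's position
        in that partition's log.
        """
        partition = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[partition].append(value)
        return partition, len(self.partitions[partition]) - 1

    def read(self, partition, offset):
        """Return all records in a partition from the given offset onward."""
        return self.partitions[partition][offset:]
```

Because each partition is an independent log, separate consumers can each read a different partition of the same topic at the same time, and a consumer resumes simply by remembering its last offset.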

SLIDE 12

2.3- ZOOKEEPER

  • Online and offline devices need to be detected.
  • Storm requires coordination among its processing units because of its distributed nature.

Figure: Discovery [2]
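One common way to detect online and offline devices is to treat a device as online only while its session is alive, which is the idea behind ZooKeeper's ephemeral nodes. The sketch below imitates that with heartbeats and a timeout; it does not use ZooKeeper's API, and the `LivenessRegistry` class and its timeout value are assumptions.

```python
class LivenessRegistry:
    """Toy registry: a device is online while its last heartbeat is recent."""

    def __init__(self, session_timeout_s):
        self.session_timeout_s = session_timeout_s
        self._last_seen = {}  # device ID -> time of last heartbeat (seconds)

    def heartbeat(self, device_id, now_s):
        """Record that the device was alive at time now_s."""
        self._last_seen[device_id] = now_s

    def online(self, now_s):
        """Devices whose last heartbeat is within the session timeout."""
        return sorted(d for d, t in self._last_seen.items()
                      if now_s - t <= self.session_timeout_s)

    def offline(self, now_s):
        """Devices whose session has timed out."""
        return sorted(d for d, t in self._last_seen.items()
                      if now_s - t > self.session_timeout_s)
```

Passing the current time in explicitly keeps the sketch deterministic; a real deployment would rely on the coordination service's own session handling.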

SLIDE 13

2.4- PROCESSING LAYER

  • Apache Storm
      • fault tolerant
      • horizontally scalable
      • handles large amounts of streaming data
      • open source
      • message guarantees
      • simple programming model
      • supports multiple programming languages

SLIDE 14

2.4- PROCESSING LAYER

  • Apache Storm concepts
      • Stream: the Storm data model, an unbounded sequence of tuples
      • Spout: a source of streams
      • Bolt: processes incoming streams and may emit new ones
      • Topology: a directed acyclic graph
          • Vertices: computation
          • Edges: streams of data tuples

Figure: Apache Storm [5]
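The spout, bolt, and topology concepts can be imitated with plain Python generators. This is a conceptual sketch only: it does not use Storm's API, it runs in a single process, and the function names and the scale/threshold logic are hypothetical.

```python
def sensor_spout(readings):
    """Spout: the source of the stream, emitting one tuple per reading."""
    for device_id, value in readings:
        yield (device_id, value)


def scale_bolt(stream, factor):
    """Bolt: transforms each incoming tuple and emits a new one."""
    for device_id, value in stream:
        yield (device_id, value * factor)


def threshold_bolt(stream, limit):
    """Bolt: filters the stream, passing only tuples above the limit."""
    for device_id, value in stream:
        if value > limit:
            yield (device_id, value)


def run_topology(readings):
    """Topology: sensor_spout -> scale_bolt -> threshold_bolt, a linear DAG."""
    return list(threshold_bolt(scale_bolt(sensor_spout(readings), 2), 10))
```

Each function is a vertex of the graph and each generator handed from one function to the next is an edge. In Storm the same wiring is declared once and then executed in parallel across a cluster.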

SLIDE 15

2.4- PROCESSING LAYER

  • Apache Storm
      • Grouping

Figure: Twitter [6]
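A grouping decides which task of a bolt receives each tuple. Storm offers several groupings, among them shuffle grouping and fields grouping; the sketch below imitates those two in plain Python. It is not Storm's API, and the function names, the fixed random seed, and the CRC32 hash are assumptions made for the example.

```python
import random
import zlib


def shuffle_grouping(tuples, num_tasks, seed=0):
    """Distribute tuples across tasks at random (seeded for repeatability)."""
    rng = random.Random(seed)
    tasks = [[] for _ in range(num_tasks)]
    for t in tuples:
        tasks[rng.randrange(num_tasks)].append(t)
    return tasks


def fields_grouping(tuples, num_tasks, field_index):
    """Route tuples by the hash of one field, so equal values share a task."""
    tasks = [[] for _ in range(num_tasks)]
    for t in tuples:
        key = str(t[field_index]).encode()
        tasks[zlib.crc32(key) % num_tasks].append(t)
    return tasks
```

Fields grouping matters whenever a bolt keeps per-key state, for example a running count per device: all tuples for one device must reach the same task. Shuffle grouping suits stateless work where only an even spread is needed.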

SLIDE 16

2.4- PROCESSING LAYER

  • Apache Storm

Figure: Storm cluster [5]

SLIDE 17

2.4- PROCESSING LAYER

  • Apache Storm

Figure: Topology

SLIDE 18

2.5- WRAP UP

Figure: IoT Cloud [2]

SLIDE 19

3- EXISTING APPLICATIONS

TurtleBot follows a large target in front of it by trying to maintain a constant distance to the target. Compressed depth images from the Kinect camera are sent to the cloud, and the processing topology calculates command messages, in the form of velocity vectors, in order to maintain a set distance from the large object in front of TurtleBot.

Figure: TurtleBot [7]
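The step from a measured distance to a velocity command can be sketched with a proportional controller. The slide does not state which control law the topology uses, so this is an assumption, as are the function name, the gain, the target distance, and the speed limit.

```python
def follow_command(distance_m, target_m=1.0, gain=0.5, max_speed=0.3):
    """Return a (linear m/s, angular rad/s) command from a measured distance.

    Proportional control is an assumed stand-in: the robot moves forward
    when the target is too far and backward when it is too close, with the
    speed clamped to max_speed. Steering is left at zero in this sketch.
    """
    error = distance_m - target_m
    linear = max(-max_speed, min(max_speed, gain * error))
    return (linear, 0.0)
```

In the described application the distance would come from the depth image processed in the cloud, and the resulting command would travel back through the broker to the robot. That round trip is why latency is the quantity the talk measures.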

SLIDE 20

3- EXISTING APPLICATIONS

  • Storm Nimbus and ZooKeeper -> 1 node
  • Gateway -> 2 nodes
  • Storm supervisors -> 3 nodes
  • Brokers -> 2 nodes

An instance of medium flavor has 2 VCPUs, 4 GB of memory, and 40 GB of HDD. 4 spouts and 4 bolts run in parallel.

SLIDE 21

3- EXISTING APPLICATIONS

Figure: Cloud Drivers [8]

SLIDE 22

3- EXISTING APPLICATIONS

Figures: Latency with RabbitMQ; Latency with Kafka [2]

SLIDE 23

3- EXISTING APPLICATIONS

Figures: Latency with RabbitMQ; Latency with Kafka [2]

SLIDE 24

3- EXISTING APPLICATIONS

Figure: Latency observed in the TurtleBot application. [2]

SLIDE 25

4- CONCLUSION

  • Introduced a scalable, distributed architecture and its components.
  • Apache Storm is a leading real-time processing engine.
  • RabbitMQ can be chosen when low latency is a requirement.
  • The proof of concept was verified with an example.
  • Proposed a new framework.

SLIDE 26

5- REFERENCES

  • [1] Kamburugamuve, Supun, et al. "Cloud-based parallel implementation of SLAM for mobile robots." Proceedings of the International Conference on Internet of Things and Cloud Computing. ACM, 2016.
  • [2] Kamburugamuve, Supun, Leif Christiansen, and Geoffrey Fox. "A framework for real time processing of sensor data in the cloud." Journal of Sensors 2015 (2015).
  • [3] http://www.rabbitmq.com/
  • [4] http://kafka.apache.org/
  • [5] http://storm.apache.org/
  • [6] http://www.twitter.com/
  • [7] http://www.turtlebot.com
  • [8] He, Hengjing, et al. "Cloud based real-time multi-robot collision avoidance for swarm robotics." International Journal of Grid and Distributed Computing, May 7 (2015).
  • [9] http://www.wikipedia.com
  • [10] http://www.tensorflow.org
  • [11] http://www.kubernetes.io
  • [12] http://www.github.com

SLIDE 27

Q&A

SLIDE 28

THANK YOU