An Open-Source Streaming Machine Learning and Real-Time Analytics Architecture Using an IoT Example



SLIDE 1

William Markito

@william_markito

Fred Melo

@fredmelo_br

An Open-Source Streaming Machine Learning and Real-Time Analytics Architecture

Using an IoT example

(incubating) (incubating)

SLIDE 2

Traditional Data Analytics - Limitations

HDFS

Data Lake

Store

Analytics

  • Hard to change
  • Labor intensive
  • Inefficient
  • No real-time information
  • ETL based
  • Data-source specific

SLIDE 3

Stream-based, Real-Time Closed-Loop Analytics

HDFS

Data Lake

Expert System / Machine Learning

In-Memory Real-Time Data

  • Continuous Learning
  • Continuous Improvement
  • Continuous Adapting

Data Stream Pipeline

  • Multiple Data Sources
  • Real-Time Processing
  • Store Everything

SLIDE 4

A Streaming Machine Learning for IoT Example

Sensor Data

Smart System

Historical: Learns with HISTORICAL TRENDS

"How were the temperature and vibration sensors reading when the latest failures happened?"

Real-Time: Evaluates LIVE DATA

"According to historical trends, there's an 80% chance this equipment would fail in the next 12 hours"

Live data becomes historical over time

Predictive Maintenance Scenario
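
The scenario on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Reading` and `SmartSystem` names, the sample values, and the nearest-neighbour vote are all hypothetical stand-ins, not the model used in the talk. It shows the loop the slide describes: historical readings are learned, a live reading is scored against them, and the live reading later joins the history.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Reading:
    temperature: float            # illustrative units
    vibration: float
    failed_within_12h: bool = False


@dataclass
class SmartSystem:
    history: list = field(default_factory=list)

    def learn(self, reading):
        """Live data becomes historical once its outcome is known."""
        self.history.append(reading)

    def failure_probability(self, live, k=3):
        """Share of the k most similar historical readings that preceded a failure."""
        if not self.history:
            return 0.0
        nearest = sorted(
            self.history,
            key=lambda r: math.hypot(r.temperature - live.temperature,
                                     r.vibration - live.vibration),
        )[:k]
        return sum(r.failed_within_12h for r in nearest) / len(nearest)


# Invented sample data: (temperature, vibration, failed within 12 hours)
system = SmartSystem()
for t, v, failed in [(60, 0.2, False), (62, 0.3, False), (65, 0.25, False),
                     (90, 1.1, True), (95, 1.3, True), (88, 0.9, True)]:
    system.learn(Reading(t, v, failed))

hot = system.failure_probability(Reading(91, 1.0))     # resembles past failures
cool = system.failure_probability(Reading(61, 0.25))   # resembles healthy readings
```

With this sample data `hot` evaluates to 1.0 and `cool` to 0.0; a real system would need far more history and a properly validated model.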

SLIDE 5

Info Analysis

  • Look at past trends (for similar input)
  • Evaluate current input
  • Score / Predict

Machine Learning

Streaming Machine Learning

SLIDE 6

Info Analysis

Filter [ json ]

Machine Learning

Streaming Machine Learning

SLIDE 7

Info Analysis

Filter Enrich

Machine Learning

Streaming Machine Learning

SLIDE 8

Info Analysis

Filter Enrich Transform

Machine Learning

Streaming Machine Learning

SLIDE 9

Info Analysis

Filter Enrich Transform

ML Model

Streaming Machine Learning

SLIDE 10

Info Analysis

Filter Enrich Transform Transform

ML Model

Streaming Machine Learning
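
The stages built up over the last few slides (Filter, Enrich, Transform, ML Model, Transform) can be sketched as chained Python generators. Everything here is an assumption made for illustration: the field names (`sensor`, `temp_f`), the `SENSOR_LOCATIONS` lookup table, and the fixed threshold standing in for a trained model are hypothetical and do not come from the talk.

```python
import json

# Hypothetical reference data used by the enrich step.
SENSOR_LOCATIONS = {"s1": "boiler-room", "s2": "roof"}


def filter_json(lines):
    """Filter: keep only lines that parse as JSON objects with the fields we need."""
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(event, dict) and {"sensor", "temp_f"} <= event.keys():
            yield event


def enrich(events):
    """Enrich: attach reference data that the raw event does not carry."""
    for event in events:
        yield {**event, "location": SENSOR_LOCATIONS.get(event["sensor"], "unknown")}


def transform(events):
    """Transform: convert Fahrenheit to Celsius, the unit the model stand-in expects."""
    for event in events:
        yield {**event, "temp_c": (event["temp_f"] - 32) * 5 / 9}


def score(events, threshold_c=80.0):
    """ML Model stand-in: a fixed threshold instead of a trained model."""
    for event in events:
        yield {**event, "alert": event["temp_c"] > threshold_c}


def to_output(events):
    """Second Transform: shape the scored event for the downstream sink."""
    for event in events:
        yield json.dumps({"sensor": event["sensor"],
                          "location": event["location"],
                          "alert": event["alert"]})


def pipeline(lines):
    """Compose the stages in the order shown on the slide."""
    return to_output(score(transform(enrich(filter_json(lines)))))
```

Because each stage is a generator, events flow through one at a time, which mirrors the streaming behaviour the slides describe; calling `list(pipeline(lines))` on a list of raw strings drains the stream.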

SLIDE 11

In-Memory Data Grid

Front-end

Update Push

ML Model

Streaming Machine Learning

SLIDE 12

Neural Network

In-Memory Data Grid

Real-time scoring Train

Supervised Learning Example

Streaming Machine Learning
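
To make the Train / Real-time scoring split concrete, here is a minimal sketch that uses a single logistic neuron as a stand-in for the neural network on the slide. The `Neuron` class, the normalised sample data, and the hyperparameters are assumptions for illustration; the talk does not specify the network or its training procedure.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class Neuron:
    """A single logistic unit: about the smallest possible 'neural network'."""

    def __init__(self, n_inputs, seed=0):
        rng = random.Random(seed)
        self.weights = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
        self.bias = 0.0

    def score(self, x):
        """Real-time scoring: estimated probability that x is a positive case."""
        return sigmoid(sum(w * xi for w, xi in zip(self.weights, x)) + self.bias)

    def train(self, samples, epochs=2000, lr=0.5):
        """Supervised training: stochastic gradient descent on log loss."""
        for _ in range(epochs):
            for x, label in samples:
                error = self.score(x) - label
                self.weights = [w - lr * error * xi
                                for w, xi in zip(self.weights, x)]
                self.bias -= lr * error


# Invented, normalised (temperature, vibration) readings labelled 1 for failure.
labelled = [([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.3, 0.3], 0),
            ([0.8, 0.9], 1), ([0.9, 0.7], 1), ([0.7, 0.8], 1)]

model = Neuron(n_inputs=2)
model.train(labelled)                      # Train on historical, labelled data
risky = model.score([0.85, 0.85])          # Score a live reading
healthy = model.score([0.15, 0.15])
```

In the architecture on the slide, training would run against data held in the in-memory data grid while scoring happens per event; here both run in one process for brevity.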

SLIDE 13

  • Ingest / Transform / Sink (SpringXD)
  • Store / Analyze
  • Fast Data
  • Distributed Computing
  • Predict / Machine Learning
  • Other Sources and Destinations
  • JMS

A Streaming Machine Learning Reference Architecture

SLIDE 14

Indoor Localization - Applied Example

SLIDE 15

Trilateration and its limitations

  • Noisy Data
  • Physical Barriers
  • Large Overlap Areas
  • Moving Targets
  • Inaccuracy

SLIDE 16

Particle Filters - Calculating the optimum solution

SLIDE 17

Particle Filters - Calculating the optimum solution
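
The slide images are not part of this transcript, so the following is a generic particle filter sketch rather than the speakers' implementation. It assumes three antennas at hypothetical positions in a 10 m by 10 m room, Gaussian measurement noise, and a random-walk motion model; all names and parameters are illustrative.

```python
import math
import random

# Hypothetical antenna positions in metres.
ANTENNAS = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]


def particle_filter_step(particles, measured, rng, motion_std=0.3, noise_std=1.0):
    """One predict / weight / resample cycle over a list of (x, y) particles."""
    # Predict: the target may have moved, so jitter every particle.
    moved = [(x + rng.gauss(0, motion_std), y + rng.gauss(0, motion_std))
             for x, y in particles]
    # Weight: particles whose antenna distances match the measurements score higher.
    weights = []
    for x, y in moved:
        w = 1.0
        for (ax, ay), d in zip(ANTENNAS, measured):
            err = math.hypot(x - ax, y - ay) - d
            w *= math.exp(-0.5 * (err / noise_std) ** 2)
        weights.append(w + 1e-300)  # guard against an all-zero weight vector
    # Resample: draw the next generation in proportion to the weights.
    return rng.choices(moved, weights=weights, k=len(moved))


def estimate(particles):
    """Location estimate: the mean of the particle cloud."""
    n = len(particles)
    return (sum(x for x, _ in particles) / n, sum(y for _, y in particles) / n)


def run_demo(true_position=(3.0, 4.0), steps=20, n_particles=1000, seed=42):
    """Track a stationary target from simulated noisy distance measurements."""
    rng = random.Random(seed)
    particles = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(n_particles)]
    for _ in range(steps):
        measured = [math.hypot(true_position[0] - ax, true_position[1] - ay)
                    + rng.gauss(0, 0.5)
                    for ax, ay in ANTENNAS]
        particles = particle_filter_step(particles, measured, rng)
    return estimate(particles)
```

Unlike a single trilateration, the particle cloud carries information from one measurement to the next, which is why this family of filters copes better with noisy data and moving targets.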


SLIDE 18

The Solution

  • 1. Capture signal strength
  • 2. Calculate distance from antenna
  • 3. Trilaterate different sensors to predict location in real-time
  • 4. Show on a map with live updates
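
Steps 2 and 3 above can be sketched as follows. The distance conversion assumes the common log-distance path loss model, with a hypothetical reference signal strength at 1 m and a hypothetical path loss exponent; the talk does not say which formula or constants the demo uses. The trilateration subtracts the first circle equation from the other two, which leaves a 2x2 linear system.

```python
import math


def rssi_to_distance(rssi_dbm, rssi_at_1m=-40.0, path_loss_exponent=2.0):
    """Step 2: estimate distance in metres from signal strength.

    Both default constants are assumptions; in practice they are calibrated
    per antenna and per environment.
    """
    return 10 ** ((rssi_at_1m - rssi_dbm) / (10 * path_loss_exponent))


def trilaterate(antennas, distances):
    """Step 3: solve for (x, y) given three antenna positions and distances."""
    (x1, y1), (x2, y2), (x3, y3) = antennas
    d1, d2, d3 = distances
    # Subtracting circle 1 from circles 2 and 3 removes the x^2 and y^2 terms.
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("antennas are collinear; position is ambiguous")
    # Cramer's rule for the 2x2 system.
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)


# Hypothetical layout: a device at (3, 4) seen by three antennas.
antennas = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
distances = [math.hypot(3 - ax, 4 - ay) for ax, ay in antennas]
position = trilaterate(antennas, distances)
```

With exact distances the solution recovers the device position; with noisy distances the three circles no longer meet in one point, which is the limitation listed on the trilateration slide.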

SLIDE 19

Architecture Overview

Ingest

SpringXD

Groovy

JSON HTTP

+ Distance

Transform

Sink

Calculate Device Distance

Predict Location

Spring Boot

Application Platform

GUI

SLIDE 20

  • Cache
      • Configurable through XML, Java
  • Region
      • Distributed java.util.Map on steroids
      • Highly available, redundant
  • Member
      • Locator, Server, Client
  • Callbacks
      • Listener, Writer, AsyncEventListener, Parallel/Serial

Geode Basic Concepts
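
The Region and Callbacks ideas can be illustrated with a toy, single-process Python class. This is not the Geode API: the `Region` class, its `put` method, and the callback signatures below are hypothetical, and nothing here is distributed or redundant. It only shows the pattern of a map whose writer runs before a change (and may reject it) and whose listeners run after it, which is how an update could be pushed to a front-end.

```python
class Region(dict):
    """A toy stand-in for a region: a map that fires callbacks on put."""

    def __init__(self):
        super().__init__()
        self.writer = None      # called before a change; may veto by raising
        self.listeners = []     # called after a change has been applied

    def put(self, key, value):
        old = self.get(key)
        if self.writer is not None:
            self.writer(key, old, value)
        self[key] = value
        for listener in self.listeners:
            listener(key, old, value)


def reject_negative(key, old, new):
    """Writer: validate before the region changes."""
    if new < 0:
        raise ValueError(f"negative reading for {key}")


pushed = []  # stands in for updates pushed to a front-end

locations = Region()
locations.writer = reject_negative
locations.listeners.append(lambda key, old, new: pushed.append((key, old, new)))

locations.put("device-1", 3.5)
locations.put("device-1", 4.0)
```

After these two puts, `pushed` holds `("device-1", None, 3.5)` followed by `("device-1", 3.5, 4.0)`, and a put with a negative value would raise before the map changes.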

SLIDE 21

Introduction to SpringXD

Runs as a distributed application or as a single node

SLIDE 22

Spring XD

A stream is composed from modules. Each module is deployed to a container and its channels are bound to the transport.
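
The module and channel idea can be sketched in plain Python, with threads standing in for containers and `queue.Queue` objects standing in for the transport. None of this is Spring XD code; `deploy`, `create_stream`, and `run_stream` are hypothetical names chosen for the illustration.

```python
import queue
import threading

_STOP = object()  # sentinel that tells each module to shut down


def deploy(module, inbound, outbound):
    """Run one module in its own thread, bound to an input and output channel."""
    def run():
        while True:
            message = inbound.get()
            if message is _STOP:
                outbound.put(_STOP)   # pass the shutdown signal downstream
                return
            result = module(message)
            if result is not None:    # returning None drops the message
                outbound.put(result)
    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def create_stream(*modules):
    """Chain modules so each one's output channel is the next one's input."""
    channels = [queue.Queue() for _ in range(len(modules) + 1)]
    threads = [deploy(m, channels[i], channels[i + 1])
               for i, m in enumerate(modules)]
    return channels[0], channels[-1], threads


def run_stream(modules, messages):
    """Feed messages through the stream and collect what reaches the end."""
    head, tail, threads = create_stream(*modules)
    for message in messages:
        head.put(message)
    head.put(_STOP)
    for thread in threads:
        thread.join()
    results = []
    while True:
        item = tail.get()
        if item is _STOP:
            return results
        results.append(item)
```

For example, `run_stream([str.strip, str.upper], [" hello ", "world"])` passes each message through both modules in order. Because modules only see their channels, any one of them could be moved to another process without changing the others, which is the property the slide attributes to binding channels to a transport.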

SLIDE 23

Demo

SLIDE 24

Why we selected these projects

  • Iterative & Exploratory model
  • Web based REPL
  • Multiple Interpreters
      • Apache Geode
      • Apache Spark
      • Markdown
      • Flink
      • Python…
  • In-memory & Persistent
  • Highly Consistent
  • Extreme transaction processing
  • Thousands of concurrent clients
  • Reliable event model
  • Productivity
  • Built-in connectors
  • Cloud Agnostic
  • Highly Scalable
  • Easy to set up
  • Streams without coding

SLIDE 25

Source code and detailed instructions available at:

https://github.com/Pivotal-Open-Source-Hub/WifiAnalyticsIoT


William Markito

@william_markito

Fred Melo

@fredmelo_br

Follow us on GitHub!

SLIDE 26

William Markito

@william_markito

Fred Melo

@fredmelo_br

Implementing a Highly Scalable In-Memory Stock Prediction System with Apache Geode (incubating), R and Spring XD

Room: Tohotom - 14:30, Sep 30


Fred Melo, Pivotal, William Markito, Pivotal