Implementing a highly scalable Stock prediction system with Apache Geode (incubating), Spring XD and Spark MLlib



slide-1
SLIDE 1


Pivotal Confidential–Internal Use Only

slide-2
SLIDE 2

Implementing a highly scalable Stock prediction system with Apache Geode (incubating), Spring XD and Spark MLlib

Fred Melo

@fredmelo_br

William Markito

@william_markito

slide-3
SLIDE 3

About us

Fred Melo, Technical Director for Data, fmelo@pivotal.io, @fredmelo_br

William Markito, Enterprise Architect for GemFire, wmarkito@pivotal.io, @william_markito

slide-4
SLIDE 4

A Simple Example

Data Sources, Look for patterns, Forecast

slide-5
SLIDE 5
slide-6
SLIDE 6

"Smart System"

Applicability

slide-7
SLIDE 7

What do we want to build?

Smart System

Historical: Learns with HISTORICAL TRENDS

Real-Time: Evaluates LIVE DATA

Live data becomes historical over time

Trading Data

"According to historical trends, there's an 80% chance this stock's price will go down within the next few minutes."

"How were the technical indicator readings when the latest price drops happened?"

slide-8
SLIDE 8

The Machine Learning Pipeline data flow

Live Data

Data Temperature

Hot: Apache Geode / GemFire

Cold: Apache HAWQ

Spring XD

Machine Learning model

1 - Live data is ingested into the grid

2 - Trained ML model compares new data to historical patterns

3 - Results are pushed immediately to deployed applications

4 - "Hot" data ages, becoming part of the historical dataset

5 - Re-training triggered, ML model updated.
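The five steps of the data flow can be sketched in a few lines of plain Python. This is only an illustration under loud assumptions: the `Pipeline` class, its `hot_capacity` and `retrain_every` parameters, and the "mean of historical prices" stand-in for the model are all hypothetical, and none of it uses the Geode, HAWQ, Spring XD or Spark APIs.

```python
from collections import deque


class Pipeline:
    """Toy version of the five-step flow (hypothetical, not the talk's code)."""

    def __init__(self, hot_capacity, retrain_every):
        self.hot = deque()        # stands in for the in-memory grid ("hot" data)
        self.historical = []      # stands in for the historical ("cold") store
        self.hot_capacity = hot_capacity
        self.retrain_every = retrain_every
        self.aged_since_training = 0
        self.model_mean = None    # placeholder "model": mean historical price
        self.subscribers = []     # stands in for deployed applications

    def retrain(self):
        # Step 5: rebuild the model from the historical dataset.
        if self.historical:
            self.model_mean = sum(self.historical) / len(self.historical)
        self.aged_since_training = 0

    def ingest(self, price):
        # Step 1: live data is ingested into the grid.
        self.hot.append(price)
        # Step 2: the trained model compares new data to historical patterns.
        if self.model_mean is not None:
            signal = "below" if price < self.model_mean else "at-or-above"
            # Step 3: results are pushed immediately to subscribers.
            for notify in self.subscribers:
                notify(price, signal)
        # Step 4: hot data ages into the historical dataset.
        while len(self.hot) > self.hot_capacity:
            self.historical.append(self.hot.popleft())
            self.aged_since_training += 1
        # Step 5 trigger: retrain once enough data has aged.
        if self.aged_since_training >= self.retrain_every:
            self.retrain()


events = []
pipeline = Pipeline(hot_capacity=2, retrain_every=2)
pipeline.subscribers.append(lambda price, signal: events.append((price, signal)))
for price in [10.0, 12.0, 11.0, 13.0, 9.0]:
    pipeline.ingest(price)
```

No result is pushed until the first retraining has happened, because until then there is no model to compare against.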

slide-9
SLIDE 9

The Machine Learning Pipeline data flow

Simplified Model

Live Data

Data Temperature: Hot, Warm

Apache Geode / GemFire

Spring XD

Machine Learning model

1 - Live data is ingested into the grid

2 - Trained ML model compares new data to historical patterns

3 - Results are pushed immediately to deployed applications

5 - Re-training triggered, ML model updated.

slide-10
SLIDE 10

Architecture diagram

Extensible, Open-Source, Fault-Tolerant, Horizontally Scalable, Cloud-Native

Real data, Simulator

SpringXD: Enrich, Filter, Split, Transform, Sink

Indicators, Machine Learning, Predict, Dashboard

Steps 1, 2, 3

/Stocks, /TechIndicators, /Predictions

slide-11
SLIDE 11

Too complex?? Eating it in small bites…

slide-12
SLIDE 12

SpringXD GemFire

slide-13
SLIDE 13

Architecture diagram (repeated from Slide 10)

slide-14
SLIDE 14

Apache Geode Concepts

/Stocks /TechIndicators /Predictions

  • Cache: configurable through XML, Java
  • Region: distributed java.util.Map on steroids; highly available, redundant
  • Member: Locator, Server, Client
  • Callbacks: Listener, Writer, AsyncEventListener, Parallel/Serial
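The "Region" and "Callbacks" ideas can be illustrated with a toy, single-process sketch. It assumes that a writer runs before a change and may reject it, while a listener runs after the change. The `Region` class and its methods below are hypothetical stand-ins written for this example; they are not the Geode API and provide no distribution or redundancy.

```python
class Region:
    """Toy stand-in for a region: a key/value map that fires callbacks."""

    def __init__(self, name):
        self.name = name
        self._data = {}
        self._writers = []    # run before a change; may veto by raising
        self._listeners = []  # run after a change has been applied

    def add_writer(self, fn):
        self._writers.append(fn)

    def add_listener(self, fn):
        self._listeners.append(fn)

    def put(self, key, value):
        for writer in self._writers:
            writer(key, value)            # an exception here aborts the put
        old = self._data.get(key)
        self._data[key] = value
        for listener in self._listeners:
            listener(key, old, value)

    def get(self, key):
        return self._data.get(key)


def reject_negative(key, value):
    # Hypothetical validation rule used only for this example.
    if value < 0:
        raise ValueError(f"negative price for {key}")


stocks = Region("/Stocks")
audit = []
stocks.add_writer(reject_negative)
stocks.add_listener(lambda key, old, new: audit.append((key, old, new)))

stocks.put("ABC", 10.0)
stocks.put("ABC", 10.5)
try:
    stocks.put("ABC", -1.0)
    rejected = False
except ValueError:
    rejected = True
```

The rejected put never reaches the listener, so the audit trail only records changes that were actually applied.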

slide-15
SLIDE 15

Apache Geode HA and Fault Tolerance

slide-16
SLIDE 16

Architecture diagram (repeated from Slide 10)

slide-17
SLIDE 17

SpringXD

Streams, Pipelines, Sources, Sinks, Filters, Taps

Enrich, Filter, Split, Transform, Sink, Predict

Steps 1, 2, 3
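The stream vocabulary above (source, filter, transform, tap, sink) maps naturally onto chained generators. The sketch below is a plain-Python analogy only: it does not use the Spring XD DSL or runtime, and the function names, the sample ticks and the `price_cents` field are assumptions made for illustration.

```python
def source(ticks):
    """Source: emits events into the stream."""
    for tick in ticks:
        yield tick


def filter_stage(events, predicate):
    """Filter: drops events that do not satisfy the predicate."""
    for event in events:
        if predicate(event):
            yield event


def transform(events, fn):
    """Transform: maps each event to a new event."""
    for event in events:
        yield fn(event)


def tap(events, observer):
    """Tap: observes events without altering the main stream."""
    for event in events:
        observer(event)
        yield event


def sink(events, store):
    """Sink: terminal stage that writes events to a destination."""
    for event in events:
        store.append(event)


# Hypothetical sample data; the second tick is deliberately invalid.
ticks = [
    {"symbol": "ABC", "price": 10.0},
    {"symbol": "ABC", "price": -1.0},
    {"symbol": "XYZ", "price": 20.0},
]

tapped, stored = [], []
stream = source(ticks)
stream = filter_stage(stream, lambda e: e["price"] > 0)
stream = tap(stream, lambda e: tapped.append(e["symbol"]))
stream = transform(stream, lambda e: {**e, "price_cents": int(e["price"] * 100)})
sink(stream, stored)
```

Because every stage is lazy, nothing runs until the sink pulls events through the chain, which loosely mirrors how a stream only does work once it is deployed and data arrives.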

slide-18
SLIDE 18

Architecture diagram (repeated from Slide 10)

slide-19
SLIDE 19

Machine Learning Model (e.g. Linear Regression)

Features: price(x), medium avg(x), relative strength(x)

Label: medium avg(x+1)
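A minimal sketch of the features-to-label idea, assuming ordinary least squares as the linear regression. It is written in plain Python rather than Spark MLlib, the training rows are made-up numbers, and the labels were generated as 0.2 * price + 0.8 * medium avg purely so the fit can be checked; none of this comes from the talk's dataset or code.

```python
def fit_linear_regression(rows, labels):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    X = [[1.0] + list(r) for r in rows]   # prepend an intercept column
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, labels))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]    # [intercept, w_price, w_avg, w_rs]


def predict(weights, features):
    return weights[0] + sum(w * f for w, f in zip(weights[1:], features))


# Hypothetical training data: (price(x), medium avg(x), relative strength(x)).
features = [
    (100.0, 98.0, 55.0),
    (102.0, 99.0, 60.0),
    (101.0, 100.0, 48.0),
    (99.0, 100.0, 40.0),
    (103.0, 101.0, 65.0),
    (104.0, 102.0, 70.0),
]
# Label: medium avg(x+1), synthesized here as 0.2 * price + 0.8 * medium avg.
labels = [98.4, 99.6, 100.2, 99.8, 101.4, 102.4]

weights = fit_linear_regression(features, labels)
next_avg = predict(weights, (105.0, 103.0, 72.0))
```

Solving the normal equations directly is fine for a handful of features; a distributed library would typically use an iterative or more numerically careful solver.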

slide-20
SLIDE 20

Machine Learning Model diagram (repeated from Slide 19)

slide-21
SLIDE 21

Demo Time

Error

slide-22
SLIDE 22

Source code and detailed instructions available at:

https://github.com/Pivotal-Open-Source-Hub/StockInference-Spark

Follow us on Twitter!

William Markito

@william_markito

Fred Melo

@fredmelo_br

slide-23
SLIDE 23
