Real-Time Analytics Meets Kubernetes: Tal Doron, Director, Technology Innovation

SLIDE 1

Real-Time Analytics Meets Kubernetes

Tal Doron Director, Technology Innovation

SLIDE 2

Tal Doron

Director, Technology Innovation

ABOUT ME

@taldoron

taldoron84

tald@gigaspaces.com

SLIDE 3

We provide one of the leading in-memory computing platforms for real-time insight to action and extreme transactional processing. With GigaSpaces, enterprises can operationalize machine learning and transactional processing to gain real-time insights on their data and act upon them in the moment.

About GigaSpaces

  • Direct customers: 300+
  • Fortune / Organizations: 50+ / 500+
  • Large installations in production (OEM): 5,000+
  • ISVs: 25+

InsightEdge is an in-memory real-time analytics platform for instant insights to action: it analyzes data as it's born and enriches it with historical context for smarter, faster decisions.

In-Memory Computing Platform for microsecond-scale transactional processing, data scalability, and powerful event-driven workflows.

SLIDE 4

Why

* Intro pictures from Wikipedia

SLIDE 5

Dinosaurs

SLIDE 6

Dinosaurs

SLIDE 7

Dinosaurs

SLIDE 8

We’ve looked up to the stars

SLIDE 9

Not without first passing through the clouds

SLIDE 10

It’s the smallest of opponents that are game changers

SLIDE 11

We needed to find a way to ship man there…

The first flight of an airplane, the Wright Flyer on December 17, 1903

SLIDE 12

How do we become cloud native?

  • Manage Large Deployments: cloud-ready, ZooKeeper-based, for large-scale and federated deployments
  • REST API Management: standards-based, utilizing
  • Containerization and Orchestration: Docker, Kubernetes, OpenShift, etc.
  • Application-driven Deployment: serverless-like user experience
  • Pluggable Elastic Resource Balancing: scheduling for dynamic re-partitioning and resource allocation
  • Telemetry and Cluster Intelligence: predictive maintenance / fault tolerance over large-scale deployments
SLIDE 13

Who’s using K8s?

SLIDE 14
  • An overview of Kubernetes and the value it brings for automating deployment, scaling, and management of containerized applications
  • How organizations can simplify management and container deployment on cloud, hybrid, or on-premises environments with GigaSpaces InsightEdge
  • 3 top open-source tools for production: Helm, Istio, and Prometheus
  • A Kubernetes services comparison between cloud providers: AWS vs. Azure vs. GCP

OVERVIEW

SLIDE 15

How Can You Gain the Most Value from Your Data?

[Figure: value of data versus time, with the time axis running from real-time through seconds, minutes, hours, days, and months. Categories along it run from preventive/predictive and actionable (time-critical decisions) through reactive to historical (traditional "batch" business intelligence).]

  • Near real-time data is highly valuable if you act on it in time
  • Historical + near real-time data is more valuable if you have the means to combine them

SLIDE 16

InsightEdge: Real-time Analytics for Instant Insights To Action

[Diagram: various data sources feed a real-time layer that provides unified real-time analytics, AI, and transactional processing over a distributed in-memory multi-model store (RAM, storage-class memory, SSD storage) holding hot and warm data, and delivers real-time insight to action to applications and dashboards. A batch layer holds cold data. Deploy anywhere: cloud or on-premise.]

  • No ETL, reduced complexity
  • Built-in integration with external Hadoop/data lakes and S3-like stores
  • Fast access to historical data
  • Automated life-cycle management
SLIDE 17

Kubernetes

  • At least 54% of the Fortune 500 were hiring for Kubernetes skills in 2017
  • Around 51% growth in Kubernetes' market share in 2018

SLIDE 18
  • #1 discussed project on GitHub
  • Top 2 in number of contributors
  • ~400K users on Slack

Kubernetes is the Winner

SLIDE 19
  • The leading orchestration tool compared with Docker Swarm, Mesos, OpenShift, and Cloud Foundry, and the most used CNCF project
  • All major cloud vendors offer a managed Kubernetes service (EKS, AKS, and GKE)
  • Apache Spark 2.3 has native Kubernetes support

Business Landscape

SLIDE 20

Why Kubernetes?

Key building blocks for a "cloud-like" platform as a service: desired-state scheduler, cooperative multi-tenancy, service account authentication, RBAC authorization, and HA architecture.

  • Auto deployment of data services, functions, and frameworks (Spark ML, SQL, Zeppelin, etc.)
  • Orchestration automation with cloud-native solutions (auto scale, self healing)

SLIDE 21

Kubernetes – Management POD

[Diagram: a Node running the Management POD, which contains the GSA, Lookup Service, Apache ZooKeeper, and REST Manager.]

  • Lookup Service (LUS): provides a mechanism for services to discover each other, for example by querying the LUS to find active GSCs.
  • Apache ZooKeeper: a centralized service used for space leader election
  • REST Manager: a RESTful API for managing the environment remotely from any platform

SLIDE 22

Kubernetes – Data POD

[Diagram: a Node running the GSA and Data PODs #1 through #N, each Data POD holding one data grid instance.]

  • Data Grid Instance: the fundamental unit of deployment in the data grid. A Processing Unit instance is the actual runtime entity.
  • Each Data POD contains a single instance, so that Kubernetes built-in controllers can provide cloud-native behavior (auto scale, self healing)
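A minimal sketch of the one-instance-per-pod idea as a plain Kubernetes StatefulSet. It assumes hypothetical names (demo-data, the image example.com/data-grid:latest) and a headless Service called demo-data that must be created separately; it is not the manifest the GigaSpaces chart actually generates. If a data pod fails or is deleted, the StatefulSet controller recreates it under the same name (for example demo-data-1), and a crashed container is restarted in place by the kubelet, which is the self-healing the slide refers to.

```yaml
# Sketch only: names, labels and image are hypothetical placeholders.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: demo-data
spec:
  serviceName: demo-data          # headless Service (created separately) giving each pod a stable DNS name
  replicas: 2                     # two data pods: demo-data-0 and demo-data-1
  selector:
    matchLabels:
      app: demo-data
  updateStrategy:
    type: RollingUpdate           # template changes replace pods one at a time, highest ordinal first
  template:
    metadata:
      labels:
        app: demo-data            # must match spec.selector
    spec:
      containers:
        - name: data-grid-instance                # exactly one grid instance per pod
          image: example.com/data-grid:latest     # hypothetical image
          resources:
            limits:
              memory: 512Mi
```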

SLIDE 23

Kubernetes – Spark POD

[Diagram: a client runs spark-submit; a Driver POD running the Spark driver sits on a node alongside the GSA, and Executor PODs running Spark executors are spread across Node A and Node B.]

  • Driver Pod: the Spark driver runs within a pod. The driver creates executors, connects to them, and executes the application code.
  • Executor Pod: when the application completes, the executor pods terminate and are cleaned up, but the driver pod persists logs and remains in “completed” state
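Because the driver pod creates and watches executor pods through the Kubernetes API, it needs a service account that is allowed to do so. The Apache Spark documentation suggests a spark service account bound to the built-in edit cluster role; the sketch below does this with a namespace-scoped RoleBinding, assuming the default namespace. The names are illustrative and not taken from the talk.

```yaml
# Service account the Spark driver pod runs under (name is illustrative).
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: default
---
# Grants the built-in "edit" ClusterRole, but only inside this namespace,
# which is enough for the driver to create, watch and delete executor pods.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-role
  namespace: default
subjects:
  - kind: ServiceAccount
    name: spark
    namespace: default
roleRef:
  kind: ClusterRole
  name: edit
  apiGroup: rbac.authorization.k8s.io
```

To use it, pass --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark to spark-submit so the driver runs under this account.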

SLIDE 24

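XAP High Level Overview (3,1: three partitions with one backup each)

[Diagram: three clients access the grid over REST and SELECT queries. Each backup data pod (marked ’) runs on a different node from its primary:]

  • Node 1: Data POD A, Data POD C’, Management POD #1
  • Node 2: Data POD B, Data POD A’, Management POD #2
  • Node 3: Data POD C, Data POD B’, Management POD #3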

SLIDE 25

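InsightEdge High Level Overview (3,1: three partitions with one backup each)

[Diagram: three clients access the cluster through spark-submit and SELECT queries. Each backup data pod (marked ’) runs on a different node from its primary:]

  • Node 1: Data POD A, Data POD C’, Management POD #1, Spark Executor POD
  • Node 2: Data POD B, Data POD A’, Management POD #2, Zeppelin POD, Spark Executor POD
  • Node 3: Data POD C, Data POD B’, Management POD #3, Spark Executor POD, Spark Driver POD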

SLIDE 26

Kubernetes Dashboard View

SLIDE 27
  • Apply pod anti-affinity using label selectors for both Data and Management PODs
  • For example: spread the primary and backup data pods of a service across zones
  • Each POD has a persistent identifier that is maintained across any rescheduling, using StatefulSets
  • For example: automated rolling updates, or scaling up data pods one by one

“Under the Hood” Guidelines
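A pod anti-affinity rule of the kind described above might look like the following pod-template fragment. It is a sketch that assumes the data pods carry hypothetical labels app: demo-data and partition: "1", so that the primary and backup of the same partition match each other and are forced into different zones (this would require, for instance, one StatefulSet per partition, since all pods of one StatefulSet share the same template labels). The labels a real chart applies may differ.

```yaml
# Fragment of a pod template spec (for example under a StatefulSet's
# spec.template.spec). The labels below are hypothetical.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: demo-data
            partition: "1"
        # No two pods matching the selector may be scheduled into the same zone.
        topologyKey: topology.kubernetes.io/zone
```

On a single-zone cluster, topologyKey: kubernetes.io/hostname (spread across nodes instead) or preferredDuringSchedulingIgnoredDuringExecution (a soft rule the scheduler may break) are the usual alternatives; older clusters label zones with failure-domain.beta.kubernetes.io/zone instead.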

SLIDE 28
  • Helm – the package manager for Kubernetes
  • Helm charts help you define, install, and upgrade both XAP and InsightEdge

Installation

# helm install gigaspaces/insightedge --version=14.0 --name demo

SLIDE 29
  • The following Helm command deploys a cluster with 3 partitions, with 512 MiB allocated to each partition:

Installation – Define Capacity

# helm install gigaspaces/insightedge --version=14.0 --name demo --set pu.partitions=3,pu.resources.limits.memory=512Mi
SLIDE 30
  • The following Helm command deploys a cluster in a high-availability topology, with anti-affinity enabled:

Installation – Define High Availability

# helm install gigaspaces/insightedge --version=14.0 --name demo --set pu.ha=true,pu.antiAffinity.enabled=true
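The capacity and high-availability settings can also be kept together in a values file instead of being passed as --set flags. Helm maps a dotted --set name such as pu.resources.limits.memory onto nested YAML keys, so this sketch assumes the chart reads exactly those nested paths; the file name demo-values.yaml is hypothetical.

```yaml
# demo-values.yaml (hypothetical file name).
# Mirrors --set pu.partitions=3,pu.resources.limits.memory=512Mi
# and     --set pu.ha=true,pu.antiAffinity.enabled=true
pu:
  partitions: 3          # number of partitions
  ha: true               # high-availability topology
  antiAffinity:
    enabled: true        # keep pods apart, per the chart's anti-affinity setting
  resources:
    limits:
      memory: 512Mi      # memory limit per partition
```

Usage, in the same Helm 2 syntax as the commands on these slides: # helm install gigaspaces/insightedge --version=14.0 --name demo -f demo-values.yaml (Helm 3 takes the release name as a positional argument instead of --name).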
SLIDE 31
  • Use liveness probes to notify Kubernetes that your application’s processes are unhealthy and that it should restart them
  • The probe calls a bash script

Testing for Liveness

livenessProbe:
  exec:
    command:
      - sh
      - -c
      - "data-pod-liveness 3181"
  initialDelaySeconds: 15
  timeoutSeconds: 5

SLIDE 32
  • Use readiness probes to notify Kubernetes that your application’s processes are able to process input; for example, while data is loading, the pod is not yet ready.
  • The probe calls a bash script

Testing for Readiness

readinessProbe:
  exec:
    command:
      - sh
      - -c
      - "data-pod-ready 2251"
  initialDelaySeconds: 15
  timeoutSeconds: 5

SLIDE 33

WAN Gateway – Real-time IMDG Data Replication

[Diagram: several sites, each with its own WAN Gateway, replicating in-memory data grid (IMDG) data between them; labels: Lang, API, Any Cloud.]

SLIDE 34

WAN Gateway

[Diagram: two clusters, Cluster A and Cluster B, with the same layout:]

  • Node 1: Data POD A, Data POD C’, Management POD
  • Node 2: Data POD B, Data POD A’, Management POD, Web UI POD
  • Node 3: Data POD C, Data POD B’, Management POD
  • Also shown: Data POD D, Data POD D’

[Each cluster runs a WAN Gateway POD, with a delegator and a sink, reachable through a public IP; the two gateway pods replicate data between the clusters.]
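One plain-Kubernetes way to give a WAN gateway pod the public IP shown in the diagram is a Service of type LoadBalancer, which asks the cloud provider for an external address. This is a sketch only: the label app: wan-gateway and port 8200 are hypothetical placeholders, since the gateway's real listening port comes from its own configuration.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: wan-gateway            # hypothetical name
spec:
  type: LoadBalancer           # the cloud provider allocates a public IP
  selector:
    app: wan-gateway           # hypothetical label on the WAN gateway pod
  ports:
    - name: gateway
      protocol: TCP
      port: 8200               # hypothetical: port exposed on the public IP
      targetPort: 8200         # hypothetical: port the gateway listens on
```

After applying it, kubectl get service wan-gateway shows the allocated address in the EXTERNAL-IP column once the provider has assigned one; the remote site's gateway would then be configured to reach that address.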

SLIDE 35

Replication Flow

[Diagram: New York, London, and Hong Kong sites. Each site has a Gateway Service (Delegator, Sink, Gateway Proxy) in front of a cluster with Primary and Backup Partitions 1 and 2, and Site DBs are kept up to date through asynchronous persistency.]

1. Updates in the New York cluster are pushed to the local Delegator.
2. The Delegator sends the updates to the list of target sites configured in the New York Gateway.
3. The London Sink writes the data to the London cluster.
4. Any conflicts that occur are resolved using the custom conflict resolution algorithm.

SLIDE 36

Auto Pod Failover

  • Node 1: Data POD A, Data POD B’, Management POD, Spark Executor POD
  • Node 2: Data POD B, Data POD A’, Web UI POD, Spark Executor POD, Spark Driver POD

SLIDE 37

Auto Pod Failover

  • Node 1: Data POD A, Data POD B’, Management POD, Spark Executor POD
  • Node 2: Data POD B, Data POD A’, Web UI POD, Spark Executor POD, Spark Driver POD

1. Data Pod B fails

SLIDE 38

Auto Pod Failover

  • Node 1: Data POD A, Data POD B’, Management POD, Spark Executor POD
  • Node 2: Data POD B, Data POD A’, Web UI POD, Spark Executor POD, Spark Driver POD

1. Data Pod B fails
2. Failover to Data Pod B’

SLIDE 39

Auto Pod Failover

  • Node 1: Data POD A, Data POD B’, Management POD, Spark Executor POD
  • Node 2: Data POD B, Data POD A’, Web UI POD, Spark Executor POD, Spark Driver POD

1. Data Pod B fails
2. Failover to Data Pod B’
3. Data Pod B is back up

SLIDE 40

Auto Pod Failback

  • Node 1: Data POD A, Data POD B’, Management POD, Spark Driver POD, Spark Executor POD
  • Node 2: Data POD B, Data POD A’, Web UI POD, Spark Executor POD

1. Data Pod B fails
2. Failover to Data Pod B’
3. Detect the failure and restart Pod B
4. Once it is ready, fail back to Pod B as the “preferred primary”

SLIDE 41

Automated Rolling Scale Up

  • Node 1: Data POD A, Data POD B’, Management POD, Spark Driver POD, Spark Executor POD
  • Node 2: Data POD B, Data POD A’, Web UI POD, Spark Executor POD

1. Take down Pod A’
2. Restart Pod A’ with 2x RAM
3. Fail over to Pod A’ and restart Pod A with 2x RAM
4. Fail back to Pod A

Repeat for each Pod.

SLIDE 42

Kubernetes Comparison

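| Feature / Service  | GCP               | Azure                               | AWS       | IBM       |
|--------------------|-------------------|-------------------------------------|-----------|-----------|
| Automatic update   | Auto or on-demand | On-demand                           | On-demand | On-demand |
| Auto-scaling nodes | Yes               | No, available through k8s autoscale | Yes       | No        |
| Node pools         | Yes               | No                                  | Yes       | No        |
| Multiple zones     | Yes               | No                                  | Yes       | Yes       |
| RBAC               | Yes               | Yes                                 | Yes       | Yes       |
| Bare metal nodes   | No                | No                                  | Yes       | Yes       |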

SLIDE 43

3 Key Technologies for Kubernetes

  • Istio – Service Mesh: Istio manages and routes encrypted network traffic, balances load across microservices, enforces access policies, verifies service identity, provides tracing, and aggregates service-to-service telemetry.
  • Prometheus – Monitoring: monitors applications and infrastructure running in Kubernetes, with support for service discovery, built-in alerts, and more.
  • Helm – Package Manager for Continuous Deployments: repeatable deployments without the overhead and complication of keeping dependencies up to date and consistent.
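To make the Prometheus service-discovery point concrete, here is a minimal prometheus.yml scrape job that discovers pods through the Kubernetes API and scrapes only those that opt in. It follows the widely used prometheus.io/scrape and prometheus.io/path annotation convention from Prometheus's example Kubernetes configuration; Prometheus does not enforce that convention itself, and whether the GigaSpaces pods expose metrics this way is an assumption.

```yaml
# prometheus.yml fragment
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod                       # one target per declared container port
    relabel_configs:
      # Keep only pods annotated prometheus.io/scrape: "true".
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Let a pod override the metrics path with prometheus.io/path.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Copy namespace and pod name onto every scraped series.
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```

Prometheus needs RBAC permission to list and watch pods for this discovery to work.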

SLIDE 44

RECORDED DEMO

LINK: https://www.youtube.com/watch?v=i4Z4__l8N9Q

SLIDE 45

Fetch InsightEdge Helm Chart

SLIDE 46

Installing a Data Grid

SLIDE 47

Monitoring

SLIDE 48

Running a Spark job

Run the InsightEdge submit script for the SparkPi example, which calculates an approximation of Pi and prints the result to the log. (Check the driver pod's log to see the calculated value, e.g. “Pi is roughly 3.1391756458782296”.)

SLIDE 49

Running an InsightEdge Spark Job

Run the InsightEdge submit script for the SaveRDD example, which generates 100,000 Products, converts them to an RDD, and saves them to the data grid.

SLIDE 50

Apache Zeppelin

Zeppelin URL: http://192.168.99.100:30990
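The Zeppelin URL points at a node IP and port 30990 (192.168.99.100 looks like a typical Minikube VirtualBox address, although that is an inference). In Kubernetes terms this is a NodePort Service. A sketch, assuming a hypothetical pod label app: zeppelin and Zeppelin's default HTTP port 8080:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: zeppelin              # hypothetical name
spec:
  type: NodePort              # reachable on <any node IP>:30990
  selector:
    app: zeppelin             # hypothetical label on the Zeppelin pod
  ports:
    - name: http
      port: 8080              # Service port inside the cluster
      targetPort: 8080        # Zeppelin's default HTTP port (zeppelin.server.port)
      nodePort: 30990         # must fall in the cluster's NodePort range (30000-32767 by default)
```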

SLIDE 51

SQL Queries

SLIDE 52

SQL Queries

SLIDE 53

Failover

SLIDE 54

Failover

SLIDE 55
SLIDE 56

To make a long story short, we’ve built spaceships

SLIDE 57

Tal Doron

Director, Technology Innovation

THANK YOU

@taldoron

taldoron84

tald@gigaspaces.com