In-mem DB Performance, Flash Cost Enabling Real-time AI June 2018



SLIDE 1

June 2018

In-mem DB Performance, Flash Cost Enabling Real-time AI

SLIDE 2

The Data-Driven Business Challenge

From Reactive to Proactive

[Chart: Value of Data (vertical axis) vs. Time to Action (horizontal axis: Real-time, Minutes, Days). Labeled stages: Batch, Interactive, Event-Driven, AI.]

SLIDE 3

Big and Slow or Small and Fast

[Diagram labels: Data Sources; Batch Layer; Real-time Layer; ETL Tools; Change Log; Batch Processing; Data Lake; View 1; View 2; In-Memory NoSQL; Stream; Stream Processing; Serving; Reports; Real-time Dashboard]

  • Big data but slow
  • Not up to date
  • Complex

Too slow

OR

  • Small amounts of data
  • Expensive
  • Lacks context

Limited context

SLIDE 4

Traditional Approach, DB over File over Flash

Traditional Layered Approach

[Diagram labels: Rigid APIs; Database; File System; VM; Hypervisor; HCI / Storage Stack; 10 GbE fabric; External (NVMeOF / Object); 10-100 GbE fabric]

  • Slow
  • Complex
  • Expensive

For every file IOs conducted by the DB

(Record, Redo/Undo, Metadata, ..)

SLIDE 5

New Cloud Databases Are Built to Scale Ops & Capacity

Decouple access, processing, and capacity and eliminate storage serialization

[Diagram tiers: API & Transaction; Distributed Processing & Cache; Capacity (Object)]

SLIDE 6

Breaking The Volume and Velocity Barrier

Re-engineer the stack to deliver memory speed with Flash density

[Diagram labels: Apps, APIs, and Functions; 100 GbE fabric; Real-time Firewall; Real-Time DB; 100TB NVMe Flash (direct attached)]

Support many standard APIs on a common DB Engine

Unique architecture which uses NVMe Flash as an extension of OS Memory

SLIDE 7

Breaking Performance Barriers – Design Principles

Zero processing wastes

  • CPU cache optimization and prediction
  • E2E zero buffer data flow (NIC to Disk, accelio)
  • Complete OS bypass

HW awareness

  • RDMA, NVMe (3DXP)
  • Vector processing operations
  • IRQ balancing and throttling

Never blocking, never locking, 100% parallelism

  • Latency optimized, QoS aware, data scheduler
  • Lockless, preemptless memory management
  • True scale out through parallelism

SLIDE 8

OK, any other challenges on the way to real-time AI?

SLIDE 9

90% of AI Today

Build feature vectors using batch and CSVs

Inspect, Improve

How do we form complex feature vectors in real-time?

How do we visualize or act on the results in real-time?

SLIDE 10

Moving to Continuous Ingest + AI + Serve Flow

External Data lakes

SLIDE 11

From Silos and ETLs to All-in-one DBs

Traditional: Unique Model Per Store

File, Object, K/V, Table (fixed), Document, Stream

[Diagram labels: Dir (tree) Name (tree) Key (Random hash) Extended Metadata Data Blob (immutable) Key (Random hash) Value Blob (immutable) Simple Metadata Key (Seq tree) Value (typed) Key (Seq tree) Value (Flex) attr Value (typed) Value (typed) Value (Flex) attr Topic Value Blob ts Value Blob Shard / Metric ts]

Multi-Model Store

[Diagram labels: Index Metadata & data; Simple Metadata Data Extents Key (hash) Name Base Metadata Path Value (Flex) attr code Value (Flex) attr code]

Multiple Indexes

  • Random, sequential and hierarchical
  • Size, time, type, owner

Any Data Type

  • Nested attributes (encoded)
  • Flexible value types
  • Can be organized and viewed as extents, rows, cols or logs

Column families

Independent tiering logic for indexes, metadata and data

SLIDE 12

Time Series Data Example

Raw time series sample data

Ingest/compress in real-time

Optimized TSDB Layout (per unique metric)

  • Labels: filter based on labels
  • Pre-aggregation arrays (to accelerate queries)
  • Data: T/V chunks with 10:1 Gorilla compression, thousands of samples

Real-time Consistency

50:1 Compression

10–100x Faster Queries
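
The pre-aggregation arrays named on this slide can be illustrated with a minimal Python sketch. It assumes fixed-width time buckets that each keep a count, sum, min and max, so that a range query reads a few buckets instead of every raw sample. The `Bucket` and `PreAggSeries` classes, their method names, and the 60-second bucket width are hypothetical and are not taken from the presentation or from any product API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Bucket:
    """Running aggregates for one fixed-width time window."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)


@dataclass
class PreAggSeries:
    """Hypothetical per-metric series that aggregates at ingest time."""
    labels: Dict[str, str]
    bucket_seconds: int = 60  # assumed bucket width
    buckets: Dict[int, Bucket] = field(default_factory=dict)

    def ingest(self, timestamp: int, value: float) -> None:
        # Round the timestamp down to the start of its bucket.
        start = timestamp - timestamp % self.bucket_seconds
        self.buckets.setdefault(start, Bucket()).add(value)

    def matches(self, **wanted: str) -> bool:
        # Label filter: every requested label must match exactly.
        return all(self.labels.get(k) == v for k, v in wanted.items())

    def average(self, start: int, end: int) -> Optional[float]:
        """Average over buckets whose start time lies in [start, end)."""
        hits = [b for t, b in self.buckets.items() if start <= t < end]
        count = sum(b.count for b in hits)
        return sum(b.total for b in hits) / count if count else None


series = PreAggSeries(labels={"host": "a", "metric": "cpu"})
for ts, value in [(0, 10.0), (30, 20.0), (60, 30.0), (90, 50.0)]:
    series.ingest(ts, value)
```

In this sketch a query such as `series.average(0, 120)` touches two buckets rather than four samples; with thousands of samples per bucket the saving grows accordingly, which is the general idea behind accelerating queries with pre-aggregates.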

SLIDE 13

Serverless, The New Stored Procedure

Traditional Dev and Ops Model

  • Write code + local testing
  • Build code and Docker image
  • CI/CD pipeline
  • Add logging and monitoring
  • Harden security
  • Provision servers + OS
  • Handle data/event feed
  • Handle failures/auto-scaling
  • Handle rolling upgrades
  • Configuration management

“Serverless” Development Model

  • Write code + local testing
  • Provide spec, push deploy

  • 1. Automated by the serverless platform
  • 2. Pay for what you use

80%

SLIDE 14

Addressing Serverless Limitations With Nuclio

Serverless for compute and data intensive tasks

100x faster than AWS Lambda!

Performance

  • Non-blocking, parallel
  • Zero copy, buffer reuse
  • Up to 400K events/sec/proc

Streaming and Batch

  • Auto-rebalance, checkpoints
  • Any source: Kafka, NATS, Kinesis, event-hub, iguazio, pub/sub, RabbitMQ, Cron

Statefulness

  • Data bindings
  • Shared volumes
  • Context cache

[Diagram labels: nuclio processor; Event Listeners; Function Workers; Shard 1, Shard 2, Shard 3, Shard 4; Workers; DB, MQ, File; Functions]
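
For readers who have not seen a Nuclio function, here is a minimal sketch of an event handler. The `handler(context, event)` entry point follows the convention of Nuclio's Python runtime. The `FakeLogger`, `FakeContext` and `FakeEvent` classes are hypothetical stand-ins added only so the sketch runs locally without the platform, and the JSON fields (`sensor`, `value`, `threshold`) are assumptions for illustration.

```python
import json


def handler(context, event):
    """Parse a JSON event and flag readings above a threshold."""
    body = event.body
    if isinstance(body, (bytes, bytearray)):
        body = body.decode("utf-8")
    # The body may arrive as text or as an already-parsed object.
    reading = json.loads(body) if isinstance(body, str) else body
    # Assumed payload fields; the default threshold of 100 is arbitrary.
    alert = reading["value"] > reading.get("threshold", 100)
    context.logger.info("processed reading from " + str(reading.get("sensor")))
    return json.dumps({"sensor": reading.get("sensor"), "alert": alert})


# Hypothetical local stand-ins for the objects the platform would provide.
class FakeLogger:
    def __init__(self):
        self.lines = []

    def info(self, message):
        self.lines.append(message)


class FakeContext:
    def __init__(self):
        self.logger = FakeLogger()


class FakeEvent:
    def __init__(self, body):
        self.body = body


ctx = FakeContext()
response = handler(ctx, FakeEvent(b'{"sensor": "s1", "value": 120}'))
```

The function body contains only the per-event logic; event listening, worker pooling and sharding are the parts the slide attributes to the nuclio processor.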

SLIDE 15

Delivering Intelligent Decisions in Real-Time

500TB of Raw Data

Ingested in real-time (compressed to 10TB)

[Diagram labels: Ingest; Enrich; AI; Act; Serve; Unified Real-Time DB; External Context; ML Models; Real-time triggers; Real-time and historical dashboards]

SLIDE 16

Cyber and Network Ops

A leading telco needs to predict network behavior in real-time:

  • Processing high message throughput from multiple streams at the rate of > 50K events/sec
  • Cross correlating with historical and external data in real-time
  • AI predictions/inferencing conducted on live data
  • Small footprint to fit network locations

SLIDE 17

Build and Operationalize Proactive Systems Faster

Traditional

  • Complex, skill gaps, slow to productize
  • No single view of ops, real-time, history
  • Reactive (no actions)

Continuous Analytics

  • Simple, just a few weeks to a working app
  • Unified view across ALL data
  • AI driven, proactive

[Diagram labels, shown on both sides: Spectrum; Netcool; SMOD; Streaming. Other labels: Data Source; Data Stores; Streaming ETL; Real-time; Batch; Real-time and Batch; REST API; Visualization; Visualization + Actions]
SLIDE 18

Predictive Maintenance Based on Real-time + Historical + Ops Data

[Diagram labels: Devices & Machines; intelligent edge; Process Sensor Data; Stream Trigger; NoSQL & Time Series API; Aggregate using Time Series APIs; Query APIs; Real-time dashboard; Real-time Alerts; Predicted Alerts; AI Predict; Web hook; Upload to Cloud; Azure ML; Update ML Model; Every 6 hours; Every 15 minutes]

  • ML Models
  • Machine Metadata
  • Environmental data

Time Series Vectors (Avg, Min/Max, Stdev per sensor)
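
The time series vectors on this slide (Avg, Min/Max, Stdev per sensor) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline from the presentation: the `feature_vectors` function, the `(sensor, value)` sample shape, and the choice of population standard deviation are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Tuple


def feature_vectors(samples: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Return [avg, min, max, stdev] for each sensor id in one time window."""
    by_sensor = defaultdict(list)
    for sensor, value in samples:
        by_sensor[sensor].append(value)
    return {
        # pstdev (population standard deviation) is an assumed choice;
        # it is also defined for a window holding a single sample.
        sensor: [mean(values), min(values), max(values), pstdev(values)]
        for sensor, values in by_sensor.items()
    }


window = [("temp", 20.0), ("temp", 22.0), ("temp", 24.0), ("vibration", 0.5)]
vectors = feature_vectors(window)
```

Each resulting vector could then be joined with the machine metadata and environmental data listed above before being passed to a model, which is the enrichment step the slide describes.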

SLIDE 19

Demo: Voice Driven Real-Time Analytics

[Diagram labels: SMART HOME DEVICE; GOOGLE MAP SERVICE; WEB UI (REACT); Voice Query; SQL API; SQL Query; AI; Update Locations]

SLIDE 20

Demo Video

SLIDE 21

Summary

Build continuous, data-driven and proactive apps

  • Deliver real-time analytics on fresh, historical and operational data
  • Optimize Flash usage to deliver in-memory speed at much lower costs
  • Create a unified data layer for stream processing, AI and serving
  • Adopt cloud-native and serverless approaches to gain agility

SLIDE 22

Thank You

info@iguazio.com | www.iguazio.com