SLIDE 1

How-to for real-time streaming and analytics at scale with Apache Kafka and Apache Ignite

Viktor Gamov, Confluent, @gamussa Denis Magda, GridGain, @denismagda

SLIDE 2

Digital Transformations Challenges

  • 10-100x more queries and transactions
  • 50x as much data today as a decade ago
  • Overnight analytics becomes real-time

  • 10-100x Queries and Transactions (per Sec)
  • 50x Data Storage (Big Data)
  • 10-1000x Faster Analytics (Hours to Sec)

Application Layer: Web-Scale Apps, Mobile Apps, IoT, Social Media

Data Layer: NoSQL, RDBMS, Hadoop

SLIDE 3

In-Memory Computing and Real-Time Streaming To Solve the Challenges

  • Performance increases 10x to 1,000x
  • Act faster by analyzing streams of data
  • Scalability up to petabytes of data

Application Layer: Web-Scale Apps, Mobile Apps, IoT, Social Media

GridGain In-Memory Computing Platform

Transactional Persistence

Confluent Streaming Platform

SLIDE 4

Pre-Streaming Era

SLIDE 5

Streaming-First World

SLIDE 6

Origins in Stream Processing

  • Serving layer: Apache Ignite, GridGain, etc.
  • Java apps with Kafka Streams or KSQL
  • Continuous computation
  • High-throughput streaming platform
  • API-based clustering

SLIDE 7

Apps, Stream Processing, Search, KV, RDBMS, DW, Real-Time Analytics, Monitoring

SLIDE 8

PRODUCER CONSUMER

Producer Application Consumer Application

  • Where to restart?
  • How to scale and parallelize?
  • What metrics to capture?
  • How to handle failure & retries?
  • How to properly use the producer / consumer API?
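
The "where to restart" and "failure & retries" questions above can be made concrete with a small sketch. It does not use the Kafka client API; `InMemoryLog`, `consume`, and `handler` are hypothetical names, and the log is a plain Python list standing in for one topic partition. The assumption illustrated is that the consumer commits an offset only after a batch has been processed, so a restart resumes from the last committed position.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryLog:
    """Stand-in for one topic partition: an append-only list of records."""
    records: list = field(default_factory=list)
    committed: int = 0  # next offset to read after a restart

    def append(self, value):
        self.records.append(value)

    def poll(self, offset, max_records=2):
        """Return up to max_records records starting at the given offset."""
        return self.records[offset:offset + max_records]


def consume(log, handler, max_retries=3):
    """Process records from the committed offset, committing after each batch."""
    offset = log.committed  # answers "where to restart?"
    while True:
        batch = log.poll(offset)
        if not batch:
            return
        for value in batch:
            for attempt in range(1, max_retries + 1):
                try:
                    handler(value)
                    break
                except RuntimeError:
                    if attempt == max_retries:
                        raise  # give up; the batch offset stays uncommitted
        offset += len(batch)
        log.committed = offset  # commit only after the whole batch succeeded


log = InMemoryLog()
for v in ["a", "b", "c", "d", "e"]:
    log.append(v)

seen = []
failures = {"c": 1}  # record "c" fails once, then succeeds on retry


def handler(value):
    if failures.get(value, 0) > 0:
        failures[value] -= 1
        raise RuntimeError("transient failure")
    seen.append(value)


consume(log, handler)
print(seen, log.committed)
```

Because the commit happens after the batch, a crash in the middle of a batch means the records already handled in that batch are delivered again after a restart. That is at-least-once processing, one of the trade-offs a hand-written consumer has to decide on.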

SLIDE 9

PRODUCER (KAFKA CONNECT): Source Connector, SMTs, Converter

CONSUMER (KAFKA CONNECT): Converter, SMTs, Sink Connector

  • Offset management
  • Elastic scalability
  • Parallelization
  • Task distribution
  • Metrics
  • Failure & retries
  • Configuration management
  • REST API
  • Schemas & data types
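
As a rough illustration of how the source-side pieces fit together, the sketch below passes records through a chain of single message transforms (SMTs) and then a converter. It is plain Python, not the Kafka Connect API: `mask_field`, `rename_field`, `json_converter`, and `run_source_pipeline` are hypothetical helpers, and the sample row is made up for the example.

```python
import json
from functools import reduce


def mask_field(name):
    """Build a transform that hides the value of one field."""
    def transform(record):
        return {**record, name: "****"} if name in record else record
    return transform


def rename_field(old, new):
    """Build a transform that renames one field."""
    def transform(record):
        if old not in record:
            return record
        renamed = {k: v for k, v in record.items() if k != old}
        renamed[new] = record[old]
        return renamed
    return transform


def json_converter(record):
    """Serialize a record to the bytes that would be written to a topic."""
    return json.dumps(record, sort_keys=True).encode("utf-8")


def run_source_pipeline(records, transforms, converter):
    """Apply each transform in order, then convert every record to bytes."""
    encoded = []
    for record in records:
        transformed = reduce(lambda rec, t: t(rec), transforms, record)
        encoded.append(converter(transformed))
    return encoded


# Hypothetical source rows, for illustration only.
rows = [{"id": 1, "secret": "abc", "city": "Toronto"}]
encoded = run_source_pipeline(
    rows,
    [mask_field("secret"), rename_field("city", "location")],
    json_converter,
)
print(encoded[0].decode("utf-8"))
```

On the sink side the order is reversed: the converter deserializes bytes first, and the transforms run on the resulting records before they reach the sink connector.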

SLIDE 10

Discover connectors, SMTs, and converters

SLIDE 11

Discover connectors, SMTs, and converters Descriptions, licensing, support, and more

SLIDE 12

User Population vs. Coding Sophistication

  • Core developers who use Java/Scala
  • Core developers who don’t use Java/Scala
  • Data engineers, architects, DevOps/SRE
  • BI analysts

streams

Lower the Bar to Enter the World

SLIDE 13

GridGain and Kafka Connect

💶

SLIDE 14

GridGain: Real-time Streaming and Analytics

SLIDE 15

Essential GridGain APIs

Distributed memory-centric storage

Combines the performance and scale of in-memory computing with disk durability and strong consistency in one system

Co-located Computations

Brings the computations to the servers where the data actually resides, eliminating the need to move data over the network

Distributed Key-Value

Read, write and transact with fast key-value APIs

Distributed SQL

Horizontally scalable, fault-tolerant distributed SQL database that treats memory and disk as active storage tiers

ACID Transactions

Supports distributed ACID transactions for key-value as well as SQL operations

Machine and Deep Learning

Set of simple, scalable and efficient tools that allow building predictive machine learning models without costly data transfers (ETL)
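
The co-located computation idea can be sketched in a few lines. This is a toy model, not the GridGain or Ignite API: `Node`, `Cluster`, and `affinity_call` are hypothetical names, a CRC32 hash of the key stands in for the real partition-to-node mapping, and the key and values are made up. The point shown is that the task travels to the node that owns the key, and only the small result travels back.

```python
import zlib


class Node:
    """One server holding a slice of the data in local memory."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def execute(self, task, key):
        # The task runs against this node's local data only.
        return task(self.data, key)


class Cluster:
    """Toy cluster that maps each key to exactly one owning node."""

    def __init__(self, node_names):
        self.nodes = [Node(n) for n in node_names]

    def owner(self, key):
        # Deterministic key-to-node mapping (assumed: simple hash modulo).
        index = zlib.crc32(str(key).encode("utf-8")) % len(self.nodes)
        return self.nodes[index]

    def put(self, key, value):
        self.owner(key).data[key] = value

    def affinity_call(self, key, task):
        # Send the computation to the data instead of fetching the data.
        return self.owner(key).execute(task, key)


cluster = Cluster(["node-a", "node-b", "node-c"])
cluster.put("sensor-1", [1, 2, 3])  # hypothetical key and values


def local_sum(local_data, key):
    return sum(local_data[key])


total = cluster.affinity_call("sensor-1", local_sum)
print(total)
```

Only the integer result crosses the (simulated) network boundary; the list of values never leaves the node that stores it.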

SLIDE 16

GridGain SQL For Real-Time Analytics

  1. Initial query
  2. Query execution over local data
  3. Reduce multiple results into one

Ignite Node (Canada): Toronto, Ottawa, Montreal, Calgary

Ignite Node (India): Mumbai, New Delhi
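
The three query steps on this slide can be imitated with the standard library alone. In the sketch below each "node" is a separate in-memory SQLite database holding its own share of the cities from the slide; this is an assumption made for illustration and says nothing about how GridGain or Ignite execute SQL internally. `make_node` and `distributed_count` are hypothetical names.

```python
import sqlite3


def make_node(country, cities):
    """Create one node: an in-memory database holding a partition of the data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE city (name TEXT, country TEXT)")
    conn.executemany(
        "INSERT INTO city VALUES (?, ?)", [(c, country) for c in cities]
    )
    return conn


nodes = [
    make_node("Canada", ["Toronto", "Ottawa", "Montreal", "Calgary"]),
    make_node("India", ["Mumbai", "New Delhi"]),
]


def distributed_count(nodes):
    query = "SELECT COUNT(*) FROM city"  # 1. initial query
    # 2. every node executes the query over its local data only
    partials = [node.execute(query).fetchone()[0] for node in nodes]
    # 3. the partial results are reduced into one
    return partials, sum(partials)


partials, total = distributed_count(nodes)
print(partials, total)
```

A count reduces by simple addition; other aggregates need their own reduce step (an average, for example, has to carry both a sum and a count from each node).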

SLIDE 17

Demo

SLIDE 18

Q&A