Real-time Analytics Powered by GPU-Accelerated Databases



SLIDE 1

Real-time Analytics Powered by GPU-Accelerated Databases

Chris Prendergast and Woody Christy GTC, May 8, 2017

SLIDE 2

Kinetica Background

  • 2009: United States Army Intelligence seeks a means to assess terrorist and other national security threats. No database on the market was fast or flexible enough to meet its needs. Founders Amit Vij and Nima Negahban pioneer the use of GPUs, building a GPU-accelerated database from the ground up.
  • 2012: GPUdb goes live with US Army Intelligence.
  • 2014: Commercialization. Entered production with USPS.
  • 2016: Rebranded to Kinetica. Seed funding. Moved HQ to San Francisco. Expanded management team. Hired field team.

Awards and patents:

  • Wins IDC HPC Innovation Excellence Award for work with US Army.
  • Wins IDC HPC Innovation Excellence Award for work with US Postal Service.
  • Patent granted for “Method and system for improving computational concurrency using a multi-threaded GPU calculation engine”.

SLIDE 3

Evolution of Analytics

  • Simple Reporting: List customer energy consumption in the past 3 years.
  • Standard Analytics: What is the average consumption by region monthly? Per household? Residential vs. commercial?
  • Real-time Analytics: What is the current energy consumption by a region / household? How does that compare to historic averages? How does it compare to other regions?
  • Machine Learning: Given location, history, demographic, usage, what is the likelihood of service issues/outage?
  • Deep Learning: Deduce from unspecified signals across a wide range of datasets the likelihood this customer will consume more/less energy? Have service interruption?

GPU Acceleration
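The first three stages map naturally onto SQL queries of increasing complexity. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in engine; the `consumption` table, its columns, and the sample rows are hypothetical and do not come from the presentation.

```python
import sqlite3

# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE consumption (customer TEXT, region TEXT, month TEXT, kwh REAL)"
)
conn.executemany(
    "INSERT INTO consumption VALUES (?, ?, ?, ?)",
    [
        ("c1", "north", "2017-03", 300.0),
        ("c1", "north", "2017-04", 320.0),
        ("c2", "north", "2017-04", 280.0),
        ("c3", "south", "2017-03", 400.0),
        ("c3", "south", "2017-04", 500.0),
    ],
)

# Simple reporting: list one customer's consumption history.
report = conn.execute(
    "SELECT month, kwh FROM consumption WHERE customer = 'c1' ORDER BY month"
).fetchall()

# Standard analytics: average consumption by region and month.
monthly_avg = conn.execute(
    "SELECT region, month, AVG(kwh) FROM consumption "
    "GROUP BY region, month ORDER BY region, month"
).fetchall()

# Real-time style question: latest month per region versus its historic average.
current_vs_history = conn.execute(
    """
    SELECT cur.region, cur.avg_kwh, hist.avg_kwh
    FROM (SELECT region, AVG(kwh) AS avg_kwh FROM consumption
          WHERE month = '2017-04' GROUP BY region) AS cur
    JOIN (SELECT region, AVG(kwh) AS avg_kwh FROM consumption
          WHERE month < '2017-04' GROUP BY region) AS hist
      ON cur.region = hist.region
    ORDER BY cur.region
    """
).fetchall()
```

In this toy data set the north region is level with its history while the south region is above it. The machine learning and deep learning stages go beyond what a plain query can express, which is why they are not sketched here.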

SLIDE 4

GPU Acceleration Overcomes Processing Bottlenecks

GPUs are designed around thousands of small, efficient cores that are well suited to performing repeated, similar instructions in parallel. This makes them a good fit for the compute-intensive workloads that large data sets require.

  • 4,000+ cores per device in many cases, versus 16 to 32 cores per typical CPU-based device.
  • High-performance computing is trending toward GPUs to solve massive processing challenges.
  • GPU acceleration brings high-performance compute to commodity hardware.
  • Parallel processing is ideal for scanning an entire dataset and for brute-force compute.
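The data-parallel pattern behind a brute-force scan can be sketched as follows: split a column into chunks, apply the same filter-and-sum step to every chunk, then combine the partial results. This is only an illustration of the pattern. CPU worker threads stand in for GPU cores, nothing here runs on a GPU, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(values, n_chunks):
    """Split a list into at most n_chunks contiguous slices."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def partial_sum_and_count(chunk, threshold):
    """Scan one chunk: the same filter-and-sum step applied to every element."""
    matching = [v for v in chunk if v > threshold]
    return sum(matching), len(matching)


def parallel_filtered_average(values, threshold, workers=4):
    """Brute-force scan of the full column, then combine the partial results."""
    chunks = chunked(values, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(
            pool.map(lambda c: partial_sum_and_count(c, threshold), chunks)
        )
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count if count else None


readings = list(range(1, 101))  # hypothetical column of 100 values
result = parallel_filtered_average(readings, threshold=50)
```

No index is consulted: every value is visited, and the work divides evenly because each chunk is processed independently. That independence is what lets the same approach spread across thousands of cores.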

SLIDE 5

Kinetica: A Distributed, In-Memory Database

  • GPU-accelerated database operations
  • Natural language processing based full-text search
  • Native GIS and IP-address object support
  • Real-time data handlers to ingest structured and unstructured data
  • Deep integration with open source and commercial frameworks and applications: Hadoop, Spark, NiFi, Accumulo, H2O, Tableau, Kibana and Caravel
  • Predictable scale-out for data ingestion and querying
  • No typical tuning, indexing, and tweaking
  • Distributed visualization pipeline built in

SLIDE 6

Kinetica: Unique Strengths & Capabilities

Fast, Distributed, OLAP Engine for Fast Moving, Large Scale Data

  • OLAP Performance, Scalability, Stability
  • Geospatial Processing & Visualization
  • API for GPU Powered Data & Compute Orchestration
  • Converged AI and BI
  • Native Geospatial and Visualization Pipeline
  • Fast Data In-Database Analytics
  • Interactive Location-Based Analytics

SLIDE 7

Challenges with Lambda and Kappa Architectures

What is the main problem?

  • A database or cache system serves up pre-computed aggregates.
  • It also takes a lot of effort to re-compute aggregates and to load the serving database or cache.

SLIDE 8

Performance BI

CASE STUDY : LARGE TELCO

  • Query 1: Simple average calculation on the 1.8B row table
  • Query 2: Sum aggregation with a subquery aggregation joining both tables

[Chart: query times against a Leading Enterprise Database. Values on the slide: 0.09s, 2.5s, 345s, 44s, 0.65s, 0.68s]

SLIDE 9

Real-Time, Advanced Analytics, Speed Layer for Teradata or Oracle

  • Parallel ingestion of events
  • Lambda-type architecture for Teradata or Oracle
  • Kinetica is the speed layer, with real-time analytic capabilities for millisecond SLAs
  • Converge machine learning, deep learning, NLP, streaming and location analytics with fast query, reporting and analytics (QR&A) using Kinetica and Teradata/Oracle

[Diagram labels: DATA IN MOTION AND REST; DATA WAREHOUSE / TRANSACTIONAL; Amazon Kinesis; STREAM / ETL PROCESSING; Kinetica Connectors; Fast GPU-accelerated, in-memory database; Converge ML, DL, Streaming, Location, and QR&A; ANALYSTS; MOBILE USERS; DASHBOARDS & APPLICATIONS; ALERTING SYSTEMS]

SLIDE 10

Speed Layer for Hadoop

  • Parallel ingestion of events
  • Kinetica is the speed layer, with real-time analytic capabilities
  • HDFS for archival store
  • Much looser coupling than a traditional lambda architecture
  • Batch-mode Spark or MR jobs can push data to Kinetica as needed for fast queries on data loaded from HDFS

[Diagram labels: EVENTS; MESSAGE BROKERS; Amazon Kinesis; STREAM PROCESSING; Parallel Ingestion; Kinetica Connectors; Put, get, scan; Execute complex analytics on the fly; HDFS (Hadoop Distributed File System); ANALYSTS; MOBILE USERS; DASHBOARDS & APPLICATIONS; ALERTING SYSTEMS]

SLIDE 11

SIMPLIFY YOUR ARCHITECTURE

  • No need to regularly recompute aggregates.
  • No need to load and manage a separate serving system or cache to make deep historical aggregates available to your stream processing code.
  • Aggregates are always up to date, as they are computed on demand; the latest events are always included.
  • Better performance with significantly reduced operational complexity, hardware footprint and cost.
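The idea of computing aggregates on demand, rather than maintaining a pre-computed serving layer, can be sketched with a toy in-memory store. The `OnDemandStore` class and its methods are hypothetical and are not the Kinetica API; the point is only that ingestion never touches an aggregate, so a query always sees the latest events.

```python
from collections import defaultdict


class OnDemandStore:
    """Hypothetical in-memory event store; not the Kinetica API."""

    def __init__(self):
        self._events = defaultdict(list)  # key -> list of (timestamp, value)

    def put(self, key, timestamp, value):
        """Ingest one event; no aggregate is updated or invalidated here."""
        self._events[key].append((timestamp, value))

    def average(self, key, since=None):
        """Compute the average at query time by scanning the raw events."""
        values = [
            v for t, v in self._events.get(key, [])
            if since is None or t >= since
        ]
        return sum(values) / len(values) if values else None


store = OnDemandStore()
store.put("meter-1", 100, 10.0)
store.put("meter-1", 200, 20.0)
before = store.average("meter-1")

store.put("meter-1", 300, 60.0)  # a new event arrives
after = store.average("meter-1")
recent = store.average("meter-1", since=200)
```

The trade-off is that every query pays for a scan, which is viable only when scans are fast enough; that is the role the presentation assigns to GPU acceleration.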

SLIDE 12

STREAMING ANALYTICS, SIMPLIFIED

[Diagram labels: EVENTS; MESSAGE BROKERS; Amazon Kinesis; STREAM PROCESSING; Kinetica Connectors; PUT, GET, SCAN; Execute complex analytics on the fly; ANALYSTS; MOBILE USERS; DASHBOARDS & APPLICATIONS; ALERTING SYSTEMS]

SLIDE 13

CASE STUDY : LOCATION BASED ANALYTICS

INTELLIGENCE: US Army - INSCOM

  • US Army’s in-memory computational engine for any data with a geospatial or temporal attribute, for a major joint cloud initiative within the Intelligence Community (IC ITE).
  • Intel analysts are able to conduct near real-time analytics, fuse SIGINT, ISR, and GEOINT streaming big data feeds, and visualize them in a web browser.
  • For the first time in history, military analysts are able to query and visualize billions to trillions of near real-time objects in a production environment.
  • Major executive military and congressional visibility.

U.S. Army INSCOM shift from Oracle to GPUdb:

  • Oracle Spatial: 92 minutes; GPUdb: 20 ms
  • 1 GPUdb server vs 42 servers with Oracle 10gR2 (2011)
  • 42x lower space
  • 28x lower cost
  • 38x lower power cost
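To put the two timings quoted on this slide on a common scale, the ratio can be worked out directly. The sketch below uses only the 92-minute and 20 ms figures from the slide; the query and hardware behind them are not described, so the ratio is only as meaningful as those figures.

```python
# Both figures are quoted on the slide; everything else is unit conversion.
oracle_ms = 92 * 60 * 1000   # 92 minutes expressed in milliseconds
gpudb_ms = 20                # 20 ms

speedup = oracle_ms / gpudb_ms
print(f"{speedup:,.0f}x")
```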

SLIDE 14

CASE STUDY : LOCATION BASED ANALYTICS

LOGISTICS: Route optimization

DISTRIBUTED ANALYSIS AT SCALE

  • 200,000 USPS devices emitting location each minute → 250+ million events captured and analyzed daily, tracked on 10 nodes.
  • 15,000 simultaneous sessions

USPS is the single largest logistics entity in the country, moving more individual items in four hours than the combination of UPS, FedEx, and DHL move all year.
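A quick sanity check of the event volume, assuming every one of the 200,000 devices reports once a minute around the clock (an upper bound; the slide does not state duty cycles):

```python
devices = 200_000                  # USPS devices, from the slide
events_per_device_per_minute = 1   # "emitting location each minute"
minutes_per_day = 24 * 60
nodes = 10                         # cluster size, from the slide

events_per_day = devices * events_per_device_per_minute * minutes_per_day
events_per_second = events_per_day / (24 * 60 * 60)
events_per_second_per_node = events_per_second / nodes
```

Under that assumption the ceiling is 288 million events per day, or roughly 3,300 events per second across the cluster, which is consistent with the "250+ million" figure quoted.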

SLIDE 15

CASE STUDY : LOCATION BASED ANALYTICS

PREDICTIVE INFRASTRUCTURE MANAGEMENT

LARGE UTILITY COMPANY

Kinetica operates as a speed layer with ESRI to monitor, manage, and predict infrastructure health.

SLIDE 16

CASE STUDY : LOCATION BASED ANALYTICS

LOGISTICS & FLEET MANAGEMENT

LARGE RETAILER

Kinetica enables agile tracking of shipments, helping store managers track inventory and arrival times.

  • Visibility and tracking of deliveries and trucks for store managers
  • ETA & notifications: provide estimated time of delivery, notifications, and custom location-based alerting
  • Route optimization based on truck size and on whether cargo is perishable or contains hazardous materials
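Location-based alerting of the kind listed on this slide often reduces to a geofence test: is a truck's last reported position within some radius of a store? The sketch below uses the standard haversine great-circle formula. The coordinates, truck IDs, and function names are hypothetical, and this is not how Kinetica itself implements geospatial filtering.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def trucks_near_store(store, trucks, radius_km):
    """Return IDs of trucks whose last reported position is inside the geofence."""
    return [
        truck_id
        for truck_id, (lat, lon) in trucks.items()
        if haversine_km(store[0], store[1], lat, lon) <= radius_km
    ]


store = (37.7749, -122.4194)           # hypothetical store location
trucks = {
    "truck-a": (37.7849, -122.4094),   # a short distance from the store
    "truck-b": (37.3382, -121.8863),   # tens of kilometres away
}
alerts = trucks_near_store(store, trucks, radius_km=5.0)
```

Only `truck-a` falls inside the 5 km geofence, so it is the only ID that would trigger an arrival notification in this example.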

SLIDE 17

CASE STUDY : LOCATION BASED ANALYTICS

PIPELINE & WELL ANALYTICS

ENERGY RESEARCH

Kinetica enables interactive query and geospatial visualization of large numbers of upstream and midstream assets.

  • Complex joins across several tables with 300M rows of data, approximately 100 GB in size.
  • Create custom visualizations and charts.
  • Visualization of wells by land ownership, region, etc.

SLIDE 18

CASE STUDY : ADVANCED IN-DATABASE ANALYTICS

LIFE SCIENCES : GENOMICS RESEARCH

GPU acceleration on Kinetica enables processing of transcriptomics to run simulations for drug research.

  • Seeking out signals from a massive collection of drug targets combined with historical data.
  • Accelerate simulations of chemical reactions.
  • In-database processing to develop models, leveraging GPU acceleration for performance and direct access to CUDA APIs via UDFs deployed within Kinetica.

"One of the things I like about Kinetica is it gives us more of a general-purpose use of the technology. There has been a lot of software created to answer certain questions [but] highly specialized tools have limited functionality and are tuned to do a certain workload."

Mark Ramsey, Chief Data Officer at GSK

SLIDE 19

CASE STUDY : ADVANCED IN-DATABASE ANALYTICS

RISK MANAGEMENT

MULTINATIONAL BANK

Large financial institution moves counterparty risk analysis from overnight to real-time.

  • Data is collected by an xVA library, which computes risk metrics for each trade.
  • Risk computations are becoming more complex and computationally heavy. xVA analysis needs to project years into the future.
  • Kinetica enables banks to move from batch/overnight analysis to a streaming/real-time system for flexible real-time monitoring by traders, auditors and management.

SLIDE 20

Faster Analytics on Inventory and Sales

CASE STUDY : LARGE RETAILER

  • Query 1: Sum of retail sales grouped by region
  • Query 2: Sum of inventory available grouped by type

[Chart: query times against an Enterprise In-Mem DB. Values on the slide: 34s, 44s, 0.65s, 0.68s]

SLIDE 21

Stop by Booth #431 and Get your Free T-shirt

www.kinetica.com