Real-time Analytics Powered by GPU-Accelerated Databases
Chris Prendergast and Woody Christy
GTC, May 8, 2017
Wins IDC HPC Innovation Excellence Award for work with US Postal Service.
Kinetica Background
2009 2012
United States Army Intelligence seeks a means to assess terrorist and other national security threats. No database in the market was fast or flexible enough to meet their needs. Founders Amit Vij and Nima Negahban begin pioneering the use of GPUs, building a GPU-accelerated database from the ground up.
2014
Commercialization: entered production with USPS.
2016
Rebranded to Kinetica. Seed funding. Moved HQ to San Francisco. Expanded management team. Hired field team.
Wins IDC HPC Innovation Excellence Award for work with US Army.
GPUdb goes live with US Army Intelligence.
Patent granted for “Method and system for improving computational concurrency using a multi-threaded GPU calculation engine”.
Evolution of Analytics
Simple Reporting: List customer energy consumption in the past 3 years.
Standard Analytics: What is the average consumption by region monthly? Per household? Residential vs. commercial?
Real-time Analytics: What is the current energy consumption by a region / household? How does that compare to historic averages? How does it compare to other regions?
Machine Learning: Given location, history, demographic, usage, what is the likelihood of service issues/outage?
Deep Learning: Deduce from unspecified signals across a wide range of datasets the likelihood this customer will consume more/less energy? Have service interruption?
GPU Acceleration Overcomes Processing Bottlenecks
- 4,000+ cores per device in many cases, versus 16 to 32 cores per typical CPU-based device.
- High performance computing is trending toward using GPUs to solve massive processing challenges.
- GPU acceleration brings high performance compute to commodity hardware.
- Parallel processing is ideal for scanning an entire dataset and for brute-force compute.
GPUs are designed around thousands of small, efficient cores that are well suited to performing repeated similar instructions in parallel. This makes them a good fit for the compute-intensive workloads that large data sets require.
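The scan-and-combine pattern behind brute-force parallel processing can be sketched in a few lines. This is only an illustration under stated assumptions: CPU threads stand in for GPU cores, the function names (`chunked`, `partial_sum_count`, `parallel_average`) and the sample readings are hypothetical, and nothing here reflects how Kinetica itself is implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(values, n_chunks):
    """Split values into at most n_chunks contiguous slices."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def partial_sum_count(chunk, predicate):
    """Scan one chunk: sum and count the rows that pass the filter."""
    matched = [v for v in chunk if predicate(v)]
    return sum(matched), len(matched)


def parallel_average(values, predicate, n_workers=4):
    """Brute-force filtered average: scan every chunk, then combine."""
    chunks = chunked(values, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda c: partial_sum_count(c, predicate), chunks))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count if count else None


# Hypothetical meter readings 1..100; average of the readings above 50.
readings = list(range(1, 101))
avg_over_50 = parallel_average(readings, lambda v: v > 50)
print(avg_over_50)
```

Each worker applies the same filter-and-sum step to its own slice and no index is consulted, which is the "repeated similar instructions in parallel" shape the slide describes; only the small per-chunk results need to be combined at the end.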
Kinetica: A Distributed, In-Memory Database
- GPU-accelerated database operations
- Natural language processing based full-text search
- Native GIS and IP-address object support
- Real-time data handlers to ingest structured and unstructured data
- Deep integration with open source and commercial frameworks and applications: Hadoop, Spark, NiFi, Accumulo, H2O, Tableau, Kibana and Caravel
- Predictable scale out for data ingestion and querying
- No typical tuning, indexing, and tweaking
- Distributed visualization pipeline built in
Kinetica: Unique Strengths & Capabilities
Fast, Distributed, OLAP Engine for Fast Moving, Large Scale Data
- OLAP Performance, Scalability, Stability
- Geospatial Processing & Visualization
- API for GPU Powered Data & Compute Orchestration
Converged AI and BI Native Geospatial and Visualization Pipeline Fast Data In-Database Analytics Interactive Location-Based Analytics
Challenges with Lambda and Kappa Architectures
What is the main problem?
- A database or cache system serves up pre-computed aggregates.
- It also takes a lot of effort to re-compute aggregates and to load the serving database or cache.
Performance BI
CASE STUDY : LARGE TELCO
Query 1 : Simple average calculation on the 1.8B row table
Query 2 : Sum aggregation with a subquery aggregation joining both tables
[Chart: query times compared with a Leading Enterprise Database; value labels: 0.09s, 2.5s, 345s, 44s, 0.65s, 0.68s]
Real-Time, Advanced Analytics, Speed Layer for Teradata or Oracle
- Parallel ingestion of events
- Lambda-type architecture for Teradata or Oracle
- Kinetica is the speed layer, with real-time analytic capabilities for millisecond SLAs
- Converge machine learning, deep learning, NLP, streaming and location analytics with fast Query, Reporting & Analytics (QR&A) using Kinetica & Teradata/Oracle
[Diagram labels: data in motion and at rest; data warehouse / transactional; Amazon Kinesis; stream / ETL processing; Kinetica Connectors; fast GPU-accelerated, in-memory database; converge ML, DL, streaming, location, and QR&A; analysts; mobile users; dashboards & applications; alerting systems]
Speed Layer for Hadoop
- Parallel ingestion of events
- Kinetica is the speed layer, with real-time analytic capabilities
- HDFS for archival store
- Much looser coupling than traditional lambda architecture
- Batch mode Spark or MR jobs can push data to Kinetica as needed for fast query on data loaded from HDFS
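The loose coupling between the speed layer and the archival store can be sketched as follows. This is a toy model under stated assumptions: `ArchiveStore` stands in for HDFS, `SpeedLayer` stands in for the in-memory database, and every class, method, and field name is hypothetical rather than part of any Kinetica or Hadoop API.

```python
from collections import defaultdict


class ArchiveStore:
    """Stand-in for HDFS: an append-only log that keeps every event."""

    def __init__(self):
        self.log = []

    def append(self, event):
        self.log.append(event)

    def read(self, since, until):
        """Return archived events with since <= ts < until."""
        return [e for e in self.log if since <= e["ts"] < until]


class SpeedLayer:
    """Stand-in for the in-memory database: holds the rows being queried."""

    def __init__(self):
        self.rows = []

    def put(self, event):
        self.rows.append(event)

    def load(self, events):
        """Bulk load, as a batch job might push rows read from the archive."""
        self.rows.extend(events)

    def evict_before(self, ts):
        """Drop rows older than ts; they remain in the archive."""
        self.rows = [r for r in self.rows if r["ts"] >= ts]

    def total_by(self, key):
        """Aggregate on demand over whatever is currently in memory."""
        totals = defaultdict(int)
        for row in self.rows:
            totals[row[key]] += row["value"]
        return dict(totals)


archive, speed = ArchiveStore(), SpeedLayer()
events = [
    {"ts": 1, "region": "east", "value": 10},
    {"ts": 2, "region": "west", "value": 5},
    {"ts": 3, "region": "east", "value": 7},
]
for event in events:
    speed.put(event)       # hot path: queryable immediately
    archive.append(event)  # cold path: kept for later reloads

speed.evict_before(3)                   # only the newest event stays in memory
recent_totals = speed.total_by("region")

speed.load(archive.read(1, 3))          # batch job pushes history back on request
full_totals = speed.total_by("region")
print(recent_totals, full_totals)
```

The two stores never call each other; a separate batch step decides when history is pushed back into memory, which is one way to read the "much looser coupling" bullet.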
[Diagram labels: events; message brokers; Amazon Kinesis; stream processing; Kinetica Connectors; put, get, scan; execute complex analytics on the fly; HDFS (Hadoop Distributed File System); analysts; mobile users; dashboards & applications; alerting systems]
SIMPLIFY YOUR ARCHITECTURE
STREAMING ANALYTICS, SIMPLIFIED
- No need to regularly recompute aggregates.
- No need to load and manage a separate serving system or cache to make deep historical aggregates available to your stream processing code.
- Aggregates are always up to date, as they are computed on demand; the latest events are always included.
- Better performance with significantly reduced operational complexity, hardware footprint and cost.
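The difference between serving pre-computed aggregates and computing them on demand can be shown with a toy comparison. This is a minimal sketch with hypothetical class names (`PrecomputedServing`, `OnDemand`); it models the staleness argument from the slide and does not describe any particular product's internals.

```python
class PrecomputedServing:
    """Lambda-style serving layer: aggregates are refreshed by a batch step."""

    def __init__(self):
        self.events = []
        self.cached_total = 0

    def ingest(self, value):
        self.events.append(value)  # stored, but not yet reflected in the cache

    def recompute(self):
        self.cached_total = sum(self.events)  # the periodic batch job

    def total(self):
        return self.cached_total  # may lag behind the latest events


class OnDemand:
    """Aggregate computed at query time over all events ingested so far."""

    def __init__(self):
        self.events = []

    def ingest(self, value):
        self.events.append(value)

    def total(self):
        return sum(self.events)  # always includes the newest event


precomputed, on_demand = PrecomputedServing(), OnDemand()
for value in (10, 20):
    precomputed.ingest(value)
    on_demand.ingest(value)
precomputed.recompute()       # batch job runs once

precomputed.ingest(5)         # a new event arrives after the batch job
on_demand.ingest(5)

stale_total = precomputed.total()
fresh_total = on_demand.total()
print(stale_total, fresh_total)
```

The on-demand path trades a scan at query time for the removal of the recompute-and-load step, which is only attractive when the scan itself is fast enough.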
[Diagram labels: events; message brokers; Amazon Kinesis; stream processing; Kinetica Connectors; put, get, scan; execute complex analytics on the fly; analysts; mobile users; dashboards & applications; alerting systems]
INTELLIGENCE: US Army - INSCOM
US Army’s in-memory computational engine for any data with a geospatial or temporal attribute, for a major joint cloud initiative within the Intelligence Community (IC ITE). Intel analysts are able to conduct near real-time analytics, fuse SIGINT, ISR, and GEOINT streaming big data feeds, and visualize the results in a web browser. For the first time in history, military analysts are able to query and visualize billions to trillions of near real-time objects in a production environment. Major executive military and congressional visibility.
U.S. Army INSCOM: Shift from Oracle to GPUdb
- Oracle Spatial (92 minutes) vs. GPUdb (20 ms)
- 1 GPUdb server vs. 42 servers with Oracle 10gR2 (2011)
- 42x lower space
- 28x lower cost
- 38x lower power cost
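For scale, the two timings quoted for INSCOM can be turned into a ratio. The arithmetic below assumes that the 92 minutes and the 20 ms refer to the same query on comparable data, which the slide implies but does not state; the variable names are illustrative.

```python
# Timings as quoted on the slide, converted to milliseconds.
oracle_ms = 92 * 60 * 1000   # Oracle Spatial: 92 minutes
gpudb_ms = 20                # GPUdb: 20 ms

speedup = oracle_ms // gpudb_ms

# Server counts as quoted on the slide.
oracle_servers, gpudb_servers = 42, 1
consolidation = oracle_servers // gpudb_servers

print(f"{speedup:,}x faster, {consolidation}x fewer servers")
```

The 42:1 server ratio matches the "42x lower space" figure on the same slide.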
CASE STUDY : LOCATION BASED ANALYTICS
LOGISTICS: Route optimization
DISTRIBUTED ANALYSIS AT SCALE
200,000 USPS devices emitting location each minute → 250+ million events captured and analyzed daily, tracked on 10 nodes.
USPS is the single largest logistics entity in the country, moving more individual items in four hours than UPS, FedEx, and DHL combined move all year.
15,000 simultaneous sessions
CASE STUDY : LOCATION BASED ANALYTICS
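The USPS device and event counts can be sanity-checked with simple arithmetic. The sketch below uses only the figures quoted on the slide; treating "250+ million" as exactly 250 million and spreading the load evenly over the day and over the 10 nodes are simplifying assumptions, so the derived rates are rough lower bounds.

```python
devices = 200_000              # devices emitting location each minute
events_per_day = 250_000_000   # "250+ million", taken as a lower bound
nodes = 10

# Upper bound if every device reported every minute, around the clock.
max_events_per_day = devices * 24 * 60

# Average ingest rate, overall and per node (assuming an even spread).
events_per_second = events_per_day / 86_400
per_node_per_second = events_per_second / nodes

# Hours per day each device would need to report to reach the daily total.
implied_active_hours = events_per_day / devices / 60

print(max_events_per_day, round(events_per_second), round(implied_active_hours, 1))
```

The quoted daily total sits just under the every-minute ceiling of 288 million, so the two figures on the slide are consistent with each other.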
PREDICTIVE INFRASTRUCTURE MANAGEMENT
Kinetica operates as a speed-layer with ESRI to monitor, manage, and predict infrastructure health.
LARGE UTILITY COMPANY
CASE STUDY : LOCATION BASED ANALYTICS
LOGISTICS & FLEET MANAGEMENT
Kinetica enables agile tracking of shipments, helping store managers track inventory and arrival times.
- Visibility and tracking of deliveries & trucks for store managers
- ETA & Notifications: provide estimated time of delivery, notifications and custom location-based alerting
- Route optimization based on truck size, and on whether cargo is perishable or contains hazardous materials.
LARGE RETAILER
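Location-based alerting of the kind listed above often reduces to a distance test against a point of interest. The following is a minimal sketch using the standard haversine formula; the function names, coordinates, truck IDs, and 5 km radius are all hypothetical, and the slide does not say how the retailer's alerts are actually computed.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def arrival_alerts(truck_positions, store, radius_km):
    """Return IDs of trucks currently inside the store's alert radius."""
    store_lat, store_lon = store
    return [
        truck_id
        for truck_id, (lat, lon) in truck_positions.items()
        if haversine_km(lat, lon, store_lat, store_lon) <= radius_km
    ]


# Hypothetical store and truck positions (latitude, longitude).
store = (37.7749, -122.4194)
trucks = {
    "truck-1": (37.7800, -122.4200),  # a few blocks away
    "truck-2": (37.3382, -121.8863),  # a different city
}
nearby = arrival_alerts(trucks, store, radius_km=5.0)
print(nearby)
```

Evaluating this test for every truck on every position update is a brute-force scan, the access pattern the earlier slides argue parallel hardware handles well.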
CASE STUDY : LOCATION BASED ANALYTICS
PIPELINE & WELL ANALYTICS
Kinetica enables interactive query and geospatial visualization of large numbers of upstream and midstream assets.
- Complex joins across several tables with 300M rows of data (approx. 100 GB in size).
- Create custom visualizations, charts.
- Visualization of wells by land ownership, region, etc.
ENERGY RESEARCH
CASE STUDY : LOCATION BASED ANALYTICS
LIFE SCIENCES : GENOMICS RESEARCH
CASE STUDY : ADVANCED IN-DATABASE ANALYTICS
GPU-acceleration on Kinetica enables processing of transcriptomics to run simulations for drug research.
- Seeking out signals from massive collection of drug targets combined with historical data.
- Accelerate simulations of chemical reactions.
- In-database processing to develop models, leveraging GPU acceleration for performance, and
direct access to CUDA APIs via UDFs deployed within Kinetica.
"One of the things I like about Kinetica is it gives us more of a general-purpose use of the technology. There has been a lot of software created to answer certain questions [but] highly specialized tools have limited functionality and are tuned to do a certain workload."
Mark Ramsey, Chief Data Officer at GSK
RISK MANAGEMENT
Large financial institution moves counterparty risk analysis from overnight to real-time.
- Data is collected by an xVA library, which computes risk metrics for each trade.
- Risk computations are becoming more complex and computationally heavy. xVA analysis needs to project years into the future.
- Kinetica enables banks to move from batch/overnight analysis to a streaming/real-time system for flexible real-time monitoring by traders, auditors and management.
MULTINATIONAL BANK
CASE STUDY : ADVANCED IN-DATABASE ANALYTICS
Faster Analytics on Inventory and Sales
CASE STUDY : LARGE RETAILER
Query 1 : Sum of retail sales grouped by region
Query 2 : Sum of inventory available grouped by type
[Chart: query times compared with an Enterprise In-Mem DB; value labels: 34s, 44s, 0.65s, 0.68s]
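The two benchmark queries are ordinary grouped sums. As an illustration of their shape only, the sketch below runs equivalent SQL against an in-memory SQLite database from the Python standard library; SQLite is a stand-in, and the table names, column names, and sample rows are hypothetical since the slide does not show the retailer's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    CREATE TABLE inventory (item_type TEXT, available INTEGER);
    INSERT INTO sales VALUES ('east', 100), ('west', 40), ('east', 60);
    INSERT INTO inventory VALUES ('grocery', 12), ('apparel', 7), ('grocery', 3);
""")

# Query 1: sum of retail sales grouped by region.
sales_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Query 2: sum of inventory available grouped by type.
inventory_by_type = conn.execute(
    "SELECT item_type, SUM(available) FROM inventory "
    "GROUP BY item_type ORDER BY item_type"
).fetchall()
conn.close()

print(sales_by_region, inventory_by_type)
```

Both queries touch every row of their table and keep only one running total per group, so they parallelize in the same scan-and-combine fashion as any other full-table aggregate.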