MapD #mapd @datarefined www.map-d.com 180 Sansome St. Todd Mostak - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Todd Mostak | todd@map-d.com | www.map-d.com | @datarefined
180 Sansome St., San Francisco, CA 94104

#mapd @datarefined

MapD

slide-2
SLIDE 2

MapD?

super-fast database built into GPU memory

Do?

world’s fastest real-time big data analytics, interactive visualization

Demo?

Twitter analytics platform: 1 billion+ tweets, millisecond response time

slide-3
SLIDE 3

The importance of interactivity

People have struggled for a long time to build interactive visualizations of big data that can deliver insight

Interactivity means:

  • Hypothesis testing can occur at “speed of thought”

How interactive is interactive enough?

  • According to a study by Jeffrey Heer and Zhicheng Liu, “an injected delay of half a second per operation adversely affects user performance in exploratory data analysis.”
  • Some types of latency are more detrimental than others:
  • For example, linking and brushing are more sensitive than zooming
slide-4
SLIDE 4

The Arrival of In-Memory Systems

  • Traditional RDBMS used to be too slow to serve as a back-end for interactive visualizations.
  • Queries of over a billion records could take minutes if not hours
  • But in-memory systems can execute such queries in a fraction of the time.
  • Both full DBMS and “pseudo”-DBMS solutions
  • But still often too slow
slide-5
SLIDE 5

Enter Map-D

slide-6
SLIDE 6

the technology

slide-7
SLIDE 7

Core Innovation

SQL-enabled column store database built into the memory architecture on GPUs and CPUs

Code developed from scratch to take advantage of:

  • Memory and computational bandwidth of multiple GPUs
  • Heterogeneous architectures (CPUs and GPUs)
  • Fast RDMA between GPUs on different nodes
  • GPU graphics pipeline

Two-level buffer pool across GPU and CPU memory

Shared scans: multiple queries of the same data can share memory bandwidth

System can scan data at > 2TB/sec per node, with > 10TB/sec per node logical throughput with shared scans
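The shared-scan idea can be illustrated with a small CPU-only Python sketch. This is not Map-D's implementation: the names `shared_scan` and `logical_throughput` and the toy predicates are hypothetical. It only shows several queries being answered in a single pass over a column, so the data is read once and the logical throughput scales with the number of queries sharing the scan.

```python
from typing import Callable, Dict, Iterable

Predicate = Callable[[int], bool]


def shared_scan(column: Iterable[int], queries: Dict[str, Predicate]) -> Dict[str, int]:
    """Count matching rows for every query in one pass over the column."""
    counts = {name: 0 for name in queries}
    for value in column:  # each value is read from memory once
        for name, predicate in queries.items():
            if predicate(value):
                counts[name] += 1
    return counts


def logical_throughput(physical_tb_per_sec: float, queries_sharing: int) -> float:
    """Logical throughput when several queries reuse one physical scan."""
    return physical_tb_per_sec * queries_sharing


# Toy data and queries (hypothetical, for illustration only).
column = list(range(10))
queries = {
    "even": lambda v: v % 2 == 0,
    "gt_6": lambda v: v > 6,
    "lt_3": lambda v: v < 3,
}
result = shared_scan(column, queries)
```

Under this simple model, the slide's figures (2TB/sec physical, 10TB/sec logical) would correspond to about five queries sharing each scan. That ratio is an inference from the two numbers, not something the slide states.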

slide-8
SLIDE 8

The Hardware

[Diagram: two nodes (Node 0 and Node 1) connected through a switch. Each node shows CPU 0 and CPU 1 linked by QPI, GPU 0 through GPU 3 attached over PCI, a RAID controller, devices labeled S1 through S4, and two IB adapters.]

slide-9
SLIDE 9

The Two-Level Buffer Pool

[Diagram: GPU Memory, CPU Memory, SSD]
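A two-level buffer pool of this shape can be sketched in a few lines of Python. The slide does not describe the eviction policy or the interface, so everything here is an assumption: the class name `TwoLevelBufferPool`, the LRU policy, the exclusive placement (a chunk sits in either the GPU tier or the CPU tier, not both), and a plain dict standing in for the SSD.

```python
from collections import OrderedDict


class TwoLevelBufferPool:
    """LRU cache with a small fast tier ("gpu") over a larger tier ("cpu"),
    backed by a dict that stands in for SSD storage. Hypothetical design."""

    def __init__(self, gpu_slots, cpu_slots, ssd):
        self.gpu = OrderedDict()
        self.cpu = OrderedDict()
        self.gpu_slots = gpu_slots
        self.cpu_slots = cpu_slots
        self.ssd = ssd

    def read(self, chunk_id):
        """Return (data, tier), where tier is the level the chunk was found in."""
        if chunk_id in self.gpu:
            self.gpu.move_to_end(chunk_id)  # mark as most recently used
            return self.gpu[chunk_id], "gpu"
        if chunk_id in self.cpu:
            data, tier = self.cpu.pop(chunk_id), "cpu"
        else:
            data, tier = self.ssd[chunk_id], "ssd"
        self._put_gpu(chunk_id, data)  # promote to the fast tier
        return data, tier

    def _put_gpu(self, chunk_id, data):
        self.gpu[chunk_id] = data
        if len(self.gpu) > self.gpu_slots:
            victim, victim_data = self.gpu.popitem(last=False)
            self._put_cpu(victim, victim_data)  # demote the LRU chunk

    def _put_cpu(self, chunk_id, data):
        self.cpu[chunk_id] = data
        if len(self.cpu) > self.cpu_slots:
            self.cpu.popitem(last=False)  # dropped; a copy remains on SSD


ssd = {i: f"chunk-{i}" for i in range(5)}
pool = TwoLevelBufferPool(gpu_slots=1, cpu_slots=2, ssd=ssd)
tiers = [pool.read(c)[1] for c in (0, 1, 0, 0)]
```

In this run the first two reads come from SSD, the third finds chunk 0 demoted to the CPU tier, and the fourth hits the GPU tier.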

slide-10
SLIDE 10

Shared Nothing Processing

Multiple GPUs, with data partitioned between them

[Diagram: Node 1, Node 2, and Node 3 each apply the same filter, text ILIKE ‘rain’, to their own partition]
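The partition-then-filter pattern can be sketched on the CPU as follows. This is an illustration under assumptions, not Map-D code: the function names are hypothetical, threads stand in for nodes, rows are split round-robin, and the filter is treated as a case-insensitive substring match (in standard PostgreSQL, `ILIKE 'rain'` without `%` wildcards would match only the whole string).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def partition(rows: List[str], num_nodes: int) -> List[List[str]]:
    """Split rows round-robin so each node owns a disjoint slice."""
    return [rows[i::num_nodes] for i in range(num_nodes)]


def filter_count(rows: List[str], needle: str) -> int:
    """Case-insensitive substring count over one node's partition."""
    needle = needle.lower()
    return sum(1 for text in rows if needle in text.lower())


def distributed_count(rows: List[str], needle: str, num_nodes: int = 3) -> Tuple[int, List[int]]:
    """Run the same filter on every partition, then merge the partial counts."""
    parts = partition(rows, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(lambda part: filter_count(part, needle), parts))
    return sum(partials), partials


tweets = [
    "Rain again today",
    "sunny morning",
    "I love the RAIN",
    "training day",
    "no clouds",
    "rainbow!",
]
total, partials = distributed_count(tweets, "rain")
```

Each worker touches only its own partition, and only the small partial counts are combined at the end. Because the match is a substring match, "training day" and "rainbow!" also count.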

slide-11
SLIDE 11

the product

slide-12
SLIDE 12

Product

GPU powered end-to-end big data analytics and visualization platform

Visualization

  • Image processing
  • OpenGL
  • H.264/VP8 streaming
  • GPU pipeline

Complex Analytics

  • Machine learning
  • Graph analytics

GPU in-memory SQL database

  • Scale to cluster of GPU nodes
  • SQL compiler
  • Shared scans
  • User defined functions
  • Hybrid GPU/CPU execution
  • OpenCL and CUDA

License

  • Simple: # of GPUs
  • Mobile/server versions

slide-13
SLIDE 13

MapD hardware architecture

  • Single GPU (12GB memory): Map-D code integrated into GPU memory
  • Single CPU (768GB memory): Map-D code integrated into CPU memory
  • NVIDIA TEGRA mobile chip (4GB memory): Map-D code integrated into chip memory
  • 8 cards = 4U box, 4 sockets = 4U box: Map-D code runs on GPU + CPU memory
  • 36U rack: ~400GB GPU, ~12TB CPU
  • Mobile Map-D running small datasets: native app, web-based service
  • Next Gen Flash: 40TB, 100GB/s

Data tiers: Small Data, Large Data, Big Data
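The rack totals can be roughly reproduced from the per-device figures. The slide does not say how the 36U rack is populated, so this Python check rests on assumptions: four 4U GPU boxes plus four 4U CPU boxes, 768GB counted per socket, and decimal units (1TB = 1000GB) for the flash figure.

```python
# Per-device figures taken from the slide.
GPU_MEMORY_GB = 12
GPUS_PER_BOX = 8          # "8 cards = 4U box"
CPU_MEMORY_GB = 768       # assumed to be per socket
SOCKETS_PER_BOX = 4       # "4 sockets = 4U box"
BOX_HEIGHT_U = 4

# Assumption (not on the slide): 4 GPU boxes and 4 CPU boxes share the rack.
GPU_BOXES = 4
CPU_BOXES = 4

gpu_total_gb = GPU_MEMORY_GB * GPUS_PER_BOX * GPU_BOXES      # compare with ~400GB
cpu_total_gb = CPU_MEMORY_GB * SOCKETS_PER_BOX * CPU_BOXES   # compare with ~12TB
rack_units_used = (GPU_BOXES + CPU_BOXES) * BOX_HEIGHT_U     # out of 36U

# Time to read the whole flash tier once at the quoted bandwidth.
FLASH_CAPACITY_GB = 40 * 1000
FLASH_BANDWIDTH_GB_PER_SEC = 100
flash_full_read_sec = FLASH_CAPACITY_GB / FLASH_BANDWIDTH_GB_PER_SEC
```

With these assumptions the rack holds 384GB of GPU memory and 12,288GB of CPU memory in 32U, close to the slide's ~400GB and ~12TB, and a full pass over 40TB of flash at 100GB/s would take about 400 seconds.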

slide-14
SLIDE 14

www.map-d.com @datarefined info@map-d.com

MapD