How we run SQL queries in-memory when available memory is - - PowerPoint PPT Presentation



SLIDE 1

How we run SQL queries in-memory when available memory is constrained with Kognitio analytical query streaming

Roger Gaskell, CEO
Andrew Maclean, CTO

SLIDE 2

The problem with in-memory is… there is never enough memory.

SLIDE 3

Who is Kognitio

Originally founded in 1988 as White Cross Systems (later merged with Kognitio), focused on developing a database that could support high-speed data analytics…
  • …where data would be held in computer memory…
  • …in a shared-nothing MPP (Massively Parallel Processing)

SLIDE 4

Quick intro to Kognitio

In-memory analytical platform
  • Provides ultra-fast, high-concurrency SQL for big data
  • Sophisticated support for embedding non-SQL programs in any language
  • High concurrency, mixed workloads

Massively parallel processing
  • Architected as scalable, shared-nothing, massively parallel processing
  • Data of interest held in-memory; queries satisfied exclusively in memory
  • Sits between where the data is stored and the data analysis tools and applications

Many deployment options
  • Standalone Linux compute cluster or existing Hadoop cluster
  • On-premise or in the cloud
SLIDE 5

Architecture

[Architecture diagram; the labels that survive are grouped below]

External data sources
  • Cloud storage
  • Other Hadoop clusters
  • Data warehouses and legacy systems
  • Data feeds

Persistence layer
  • Hive tables / HDFS file system
  • Local attached disk or NAS / Kognitio Linear File System

Kognitio analytical platform layer
  • Query coordinator
  • Processing
  • Persistent memory images

Application & client layer
  • Queries
  • Results
  • Analytics

SLIDE 6

When is Kognitio used?

Large data volumes
  • 0.5TB – 100TB
  • 100 million – trillions of records
  • Conventional technologies struggling to provide the required performance

Need for speed
  • Client needs high-speed, interactive, ad-hoc analytics, often using visualization tools like Qlik, Tableau, Power BI, MicroStrategy
  • High query throughput – data as a service

High concurrency, mixed workload
  • Pervasive or self-serve BI & analytics
  • Data-as-a-service applications

SLIDE 7

Never enough memory

[Diagram: available memory, split between Data and Work Space]

    select c.region_name, count(*), sum(o.price)
    from customers c, orders o
    where c.id = o.customer_id
    group by 1
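To make the example query concrete, it can be run against a toy dataset. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in SQL engine (it is not Kognitio), and the column types and sample rows are assumptions invented for illustration.

```python
import sqlite3

# The query from the slide, unchanged apart from line breaks.
QUERY = """
select c.region_name, count(*), sum(o.price)
from customers c, orders o
where c.id = o.customer_id
group by 1
"""


def run_example():
    """Run the slide's query on a tiny, made-up dataset and return sorted rows."""
    # In-memory SQLite database, used only as a stand-in engine.
    conn = sqlite3.connect(":memory:")
    conn.execute("create table customers (id integer, region_name text)")
    conn.execute("create table orders (customer_id integer, price integer)")
    # Hypothetical sample rows (not taken from the presentation).
    conn.executemany(
        "insert into customers values (?, ?)",
        [(1, "North"), (2, "South"), (3, "North")],
    )
    conn.executemany(
        "insert into orders values (?, ?)",
        [(1, 10), (1, 5), (2, 7), (3, 2)],
    )
    rows = sorted(conn.execute(QUERY).fetchall())
    conn.close()
    return rows
```

Calling `run_example()` returns one row per region with its order count and total price. The join pairs every order with its customer's region, and the aggregate keeps a running count and sum per region; intermediate state of this kind is what competes with the data itself for memory.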

SLIDE 8

“We love the speed but the ‘out of memory’ errors (when the system is busy or the query involves too much data) are very frustrating”

Early customer feedback

SLIDE 9

Possible approaches

Page to disk
  • Very slow
  • Can slow down queries even when there is plenty of work-space
  • Requires available disk space

Statically divide workspace
  • Limits concurrency
  • Inefficient use of workspace
  • Individual work-space can be exhausted while others are unused

Kognitio query streaming
  • Dynamic allocation of workspace
  • Dynamic re-sizing as load changes
  • In-memory makes re-computation of intermediate results very fast
  • Re-compute from raw data used to cope with constrained work-space
  • Never return out of memory errors

[Diagram labels: Session 1 through Session 8]
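The idea of re-computing from raw data instead of failing when work space is tight can be sketched in a few lines. This is a generic illustration, not Kognitio's implementation: the function names, the `max_groups` budget, the assumption of integer group keys, and the partition-doubling strategy are all hypothetical. When a pass over the in-memory rows would need more partial sums than the budget allows, the sketch splits the key space further and re-scans the raw rows rather than spilling to disk or raising an out-of-memory error.

```python
class WorkspaceFull(Exception):
    """Raised when a pass would need more partial results than the budget allows."""


def sum_one_partition(rows, part, n_parts, max_groups):
    """Sum values per key for the keys that fall in one partition of the key space."""
    workspace = {}
    for key, value in rows:
        if key % n_parts != part:
            continue  # this key belongs to another pass
        if key not in workspace and len(workspace) >= max_groups:
            raise WorkspaceFull
        workspace[key] = workspace.get(key, 0) + value
    return workspace


def streamed_group_sum(rows, max_groups):
    """Group-by-sum that never holds more than max_groups partial sums at once.

    rows must be a re-iterable sequence of (integer_key, value) pairs, because
    the raw data is re-scanned once per partition. The totals dict only collects
    output for this demo; a real engine would stream each partition's results out.
    """
    if max_groups < 1:
        raise ValueError("max_groups must be at least 1")
    n_parts = 1
    while True:
        try:
            totals = {}
            for part in range(n_parts):
                totals.update(sum_one_partition(rows, part, n_parts, max_groups))
            return totals, n_parts
        except WorkspaceFull:
            # Not enough work space: split the key space further and recompute.
            n_parts *= 2
```

With a generous budget the function finishes in a single pass; with a small budget it trades extra scans of the in-memory rows for a smaller footprint, which is only attractive because those scans are fast when the raw data is already in memory.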

SLIDE 10

Kognitio Query Streaming

    select c.region_name, count(*), sum(o.price)
    from customers c, orders o
    where c.id = o.customer_id
    group by 1

[Diagram: Conventional Plan vs. Streaming Plan]

Customer table distributed on customer.id
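The plan diagrams for this slide did not survive, so the following is only a generic illustration of the difference between materializing an intermediate result and streaming it, not a reconstruction of Kognitio's plans. The function names and sample data shapes are assumptions. Both functions evaluate the example query on one node; the first builds the full join result before aggregating, while the second feeds each joined row straight into the aggregate.

```python
def conventional_plan(customers, orders):
    """Join first, store the whole join result, then aggregate it."""
    region_of = dict(customers)  # customers rows are (id, region_name)
    # The full join result is materialized in work space.
    joined = [
        (region_of[customer_id], price)
        for customer_id, price in orders
        if customer_id in region_of
    ]
    result = {}
    for region, price in joined:
        count, total = result.get(region, (0, 0))
        result[region] = (count + 1, total + price)
    return result, len(joined)


def streaming_plan(customers, orders):
    """Pipe each joined row directly into the aggregate; nothing is stored in between."""
    region_of = dict(customers)

    def joined_rows():
        # Generator: yields one joined row at a time instead of building a list.
        for customer_id, price in orders:
            if customer_id in region_of:
                yield region_of[customer_id], price

    result = {}
    for region, price in joined_rows():
        count, total = result.get(region, (0, 0))
        result[region] = (count + 1, total + price)
    return result, 0  # zero materialized intermediate rows
```

Both functions return the same per-region `(count, total)` pairs; the second element of the return value reports how many intermediate join rows had to be held at once, which is the quantity that grows with data volume in the materializing version.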

SLIDE 11

Kognitio Query Streaming

    select c.region_name, count(*), sum(o.price)
    from customers c, orders o
    where c.id = o.customer_id
    group by 1

[Diagram: Conventional Plan vs. Streaming Plan]

Customer table NOT distributed on customer.id
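When a table is not distributed on the join key, a shared-nothing system generally has to move rows so that matching keys meet on the same node before joining locally. The sketch below illustrates that general technique only; it is not taken from the slides. It assumes integer ids, a hypothetical modulo placement rule, and that the orders are already placed by `customer_id`.

```python
def node_for(key, n_nodes):
    """Hypothetical placement rule: integer key modulo the number of nodes."""
    return key % n_nodes


def redistribute(partitions, key_of, n_nodes):
    """Move every row to the node that owns its key."""
    shuffled = [[] for _ in range(n_nodes)]
    for partition in partitions:
        for row in partition:
            shuffled[node_for(key_of(row), n_nodes)].append(row)
    return shuffled


def local_join_aggregate(customers, orders):
    """Join and aggregate the rows held by a single node."""
    region_of = dict(customers)  # customers rows are (id, region_name)
    partial = {}
    for customer_id, price in orders:
        if customer_id in region_of:
            region = region_of[customer_id]
            count, total = partial.get(region, (0, 0))
            partial[region] = (count + 1, total + price)
    return partial


def merge_partials(partials):
    """Combine per-node (count, total) pairs into the final answer."""
    merged = {}
    for partial in partials:
        for region, (count, total) in partial.items():
            merged_count, merged_total = merged.get(region, (0, 0))
            merged[region] = (merged_count + count, merged_total + total)
    return merged


def distributed_query(customer_parts, order_parts):
    """Redistribute customers on id, join locally on each node, then merge."""
    n_nodes = len(order_parts)
    customers_by_id = redistribute(customer_parts, lambda row: row[0], n_nodes)
    partials = [
        local_join_aggregate(node_customers, node_orders)
        for node_customers, node_orders in zip(customers_by_id, order_parts)
    ]
    return merge_partials(partials)
```

Skipping the `redistribute` step and joining each node's original customer rows with its orders would silently drop matches, because a customer and that customer's orders may sit on different nodes.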

SLIDE 12

How this looks

SLIDE 13

Each node optimising locally

SLIDE 15

Example use case

Retail data
  • n billions of POS transactions

Inmar Hadoop Cluster
  • Data in Hive ORC files

Kognitio on Hadoop
  • Data pinned in memory
  • SQL with embedded R processing

Clients pay to perform interactive ad-hoc retail analytics
SLIDE 16

Product Evolution

  • 1990 – 1st Gen: In-memory Database Appliance, “Transputer” based
  • 1996 – 2nd Gen: In-memory Database Appliance, “x86” based
  • 2003 – 3rd Gen: Software only, Commodity Servers

SLIDE 17

Hadoop is the only BI platform you need, with ultra-fast, high-concurrency SQL

kognitio.com
linkedin.com/company/kognitio
twitter.com/kognitio
youtube.com/kognitio
facebook.com/kognitio

USA: +1 855 KOGNITIO
UK: +44 1344 300770