In-memory Analytical Platforms - Aditya Satyadev

SLIDE 1

In-memory Analytical Platforms

Aditya Satyadev, Director, BizAcuity Solutions
Sachin Sangtani, Senior Technical Consultant, Kognitio

SLIDE 2

Agenda - update

  • Introduction
  • Context
  • In-memory Analytics
  • Benefits of adoption
  • About BizAcuity and Kognitio

SLIDE 3

Introduction

  • Aditya Satyadev – BizAcuity
  • Sachin Sangtani - Kognitio

SLIDE 4

Analytics - Traditional Approach..

  • Enterprise Data Warehouse
  • Monolithic Applications
  • MPP Architecture
  • BI Applications (Query Based, Caching, Daily Refresh)
  • Columnar Storage
  • ACID (Atomicity, Consistency, Isolation, Durability)

SLIDE 5

Analytics – Advanced Approach..

  • In-Memory BI Tools
  • Complex BI Application Design
  • OLAP Cubes
  • Dimensional Modeling
  • Additional Maintenance of Cube

SLIDE 6

Analytics – In-Memory Approach..

In-Memory Platforms

  • Kognitio
  • SAP HANA
  • Oracle Exalytics
  • ParAccel
  • EXASOL
  • ..

SLIDE 7

This is a test…

<begin style=“gratuitous advertising”> In the event of a real data emergency, please proceed to your nearest in-memory database vendor and purchase a large capacity of licenses <end style=“gratuitous advertising”>

SLIDE 8

Gartner Views

  • Media Tablets and Beyond
  • Mobile-centric Applications and Interfaces
  • Contextual and Social User Experience
  • Internet Of Things
  • Appstores and Marketplaces
  • Next-generation Analytics
  • Big Data & The Logical Datawarehouse
  • In-memory Computing
  • Extreme Low-energy Servers
  • Cloud Computing

SLIDE 9

In-Memory Analytical Platforms

  • Lower Latency
  • Eliminate Maintenance Windows
  • Higher Sophistication Around Data
  • Timeliness
  • Faster Iteration Through Requirements

SLIDE 10

Business Value

[Figure: Current State vs Desired State of business value across Data Quality, Data Profiling, Data Management, Reporting, OLAP, Forecasting, Predictive Modeling and Optimization]

SLIDE 11

Where does all the time, effort AND MONEY go?

  • Data Quality
  • Data Profiling
  • Data Management
  • Reporting & OLAP
  • Forecasting
  • Predictive Modeling
  • Optimization

SLIDE 12

Where SHOULD all the time, effort AND MONEY go?

In Memory Analytical Platform

  • Data Quality
  • Data Profiling
  • Data Management
  • Reporting & OLAP
  • Forecasting
  • Predictive Modeling
  • Optimization

SLIDE 13

Typical Analysis/Reporting Query

  • 11 Billion Row Fact Table
  • Six Tables
  • 4 Inline Nested Subqueries
  • Multiple Passes Through Fact Table
  • Aggregations/Group By
  • Numerous Predicates, including:
      – BETWEEN
      – NOT EQUAL TO
      – IN

  1. >1 Day
  2. 1 Day
  3. Hours
  4. Rewrite and pre-aggregate
  5. Few Minutes
  6. Few Seconds

-- Balance information of targeted accounts obtained from transaction table
select C.Client_ID,
       D.Demog_Group,
       D.Demog_Desc,
       1+avg(F.Credit_Limit_Changes) CL_Issued,
       sum(case when T.Trans_Type='C' then T.Transaction_Amount else 0 end)
         - sum(case when T.Trans_Type='D' then T.Transaction_Amount else 0 end) Balance,
       sum(case when T.Trans_Type='C' then T.Transaction_Amount else 0 end) Total_Credit,
       sum(case when T.Trans_Type='D' then T.Transaction_Amount else 0 end) Total_Debit,
       min(case when T.Trans_Type='C' then date '2009-11-15' - T.Effective_Date else 365*10 end) Days_Last_Credit,
       min(case when T.Trans_Type='D' then date '2009-11-15' - T.Effective_Date else 365*10 end) Days_Last_Debit
from DEMO_FS.V_FIN_ACCOUNT F,
     DEMO_FS.V_FIN_CLIENT C,
     DEMO_FS.V_FIN_CLIENT_ACCOUNT_LINK L,
     DEMO_FS.V_FIN_ADD_CLIENT A,
     DEMO_FS.V_FIN_DEMOG_DESCS D,
     DEMO_FS.V_FIN_CC_TRANS T,
     -- Query to produce campaign planning
     (
       select Account_ID,
              count(Trans_Year) Years_Present,
              sum(No_Trans) No_Trans,
              sum(Total_Spend) Total_Spend,
              case count(Trans_Year) when 1 then 'One-off' else 'Repeat' end Behavior_Flag
       from (
         select * from (
           select Account_ID,
                  Extract(Year from Effective_Date) Trans_Year,
                  count(Transaction_ID) No_Trans,
                  sum(Transaction_Amount) Total_Spend,
                  avg(Transaction_Amount) Avg_Spend
           from DEMO_FS.V_FIN_CC_TRANS
           where extract(year from Effective_Date)<2009
             and Trans_Type='D'
             and Account_ID<>9025011
             and actionid in (select actionid from DEMO_FS.V_FIN_actions where actionoriginid=1)
           group by Account_ID, Extract(Year from Effective_Date)
         ) Acc_Summary
         where No_Trans in (3,4,5,6)
           and Avg_Spend>1000
           and Trans_Year between 2004 and 2008
       ) Target_Accs
       group by Account_ID
     ) Campaign_Grouping
where Campaign_Grouping.Account_ID=L.Account_ID
  and L.Client_ID=C.Client_ID
  and C.Client_ID=A.Client_ID
  and A.Demog_Code=D.Demog_Code
  and D.Demog_code in (1,4,5,9,10,11,50,55)
  and Campaign_Grouping.Account_ID=F.Account_ID
  and Campaign_Grouping.Account_ID=T.Account_ID
  and T.Effective_Date < date '2009-11-15'
group by C.Client_ID, Demog_Group, Demog_Desc
order by Days_Last_Debit;
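
To make the nested part of the query above easier to follow, here is a minimal Python sketch of what the Campaign_Grouping subquery appears to compute: a per-account, per-year summary, a filter on that summary, and a 'One-off' or 'Repeat' flag. The toy rows and the function name `campaign_grouping` are illustrative assumptions, not part of the deck. The sketch also leaves out the transaction-type, account and action filters of the original.

```python
from collections import defaultdict

# Hypothetical toy rows: (account_id, year, transaction_amount), debits only.
transactions = [
    (1, 2005, 1500.0), (1, 2005, 1200.0), (1, 2005, 1800.0),
    (1, 2006, 2000.0), (1, 2006, 1100.0), (1, 2006, 1300.0),
    (2, 2007, 5000.0), (2, 2007, 4000.0), (2, 2007, 3000.0),
    (3, 2008, 100.0), (3, 2008, 200.0), (3, 2008, 300.0),
]


def campaign_grouping(rows):
    """Return {account_id: 'One-off' | 'Repeat'} for qualifying accounts."""
    # Step 1 (Acc_Summary): collect amounts per account and year.
    per_year = defaultdict(list)
    for account_id, year, amount in rows:
        per_year[(account_id, year)].append(amount)

    # Step 2 (Target_Accs): keep account-years with 3 to 6 transactions,
    # an average spend above 1000, and a year between 2004 and 2008.
    years_present = defaultdict(int)
    for (account_id, year), amounts in per_year.items():
        avg_spend = sum(amounts) / len(amounts)
        if len(amounts) in (3, 4, 5, 6) and avg_spend > 1000 and 2004 <= year <= 2008:
            years_present[account_id] += 1

    # Step 3 (Campaign_Grouping): flag by the number of qualifying years.
    return {acc: ("One-off" if n == 1 else "Repeat") for acc, n in years_present.items()}


flags = campaign_grouping(transactions)
print(flags)
```

Each step here is a full pass over its input, which is why the slide stresses multiple passes through an 11 billion row fact table.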

SLIDE 14

Word Problem

  • A user is running sales scorecards for a 3000+ person sales team globally. The report requires top down drill paths to go from global numbers down to the transactional detail for every sales person, comparing it to prior year numbers across five different measures. What would the typical IT response be?

  • 1. Run on the weekends only please
  • 2. Build pre-aggregated tables; will take 3 months
  • 3. Build extracts to load into Excel
  • 4. You can build these yourself and run them on demand

SLIDE 15

Business Analysis & Development

  • A user has asked that multiple million rows of facility-level data be made available intraday to monitor risk rating changes and re-compute probability of default based on a complex algorithm and alert users via push notifications within 3 minutes.

  • 1. Yeah and monkeys might fly out of my @^%$*^*&
  • 2. $$$$$$$$$$$$$$ ( * Rs. 60)
  • 3. Piece of cake…

SLIDE 16

What to look for in an In-Memory database

  • Use of RAM
  • Accessing RAM Effectively And Efficiently
  • Efficient Core to RAM Ratio
  • Memory Management
  • Messaging and Networking
  • True Symmetric MPP Architecture
  • Simplicity & Maturity

Stop Following Me!

SLIDE 17

Use of RAM

[Diagram: data is loaded from Hard Disk into RAM]

Performance baseline: Load speeds into memory in excess of 13TB/hour
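
As a rough illustration of what the 13TB/hour baseline means in practice, the sketch below converts a dataset size into a load time. It assumes a constant load rate, which is a simplification. The dataset sizes and the function name `load_time_minutes` are hypothetical.

```python
LOAD_RATE_TB_PER_HOUR = 13.0  # baseline quoted on the slide


def load_time_minutes(dataset_tb, rate_tb_per_hour=LOAD_RATE_TB_PER_HOUR):
    """Minutes needed to load dataset_tb terabytes at a constant rate."""
    if dataset_tb < 0 or rate_tb_per_hour <= 0:
        raise ValueError("size must be non-negative and rate must be positive")
    return dataset_tb / rate_tb_per_hour * 60


# Hypothetical dataset sizes, for illustration only.
for size_tb in (1, 5, 20):
    print(f"{size_tb} TB -> {load_time_minutes(size_tb):.1f} min")
```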

SLIDE 18

Accessing RAM effectively

SELECT state, count(id) FROM table GROUP BY state HAVING count(id) > 50000;

[Diagram: the query passes through a Compiler/Interpreter and becomes Machine Code that runs against a table with columns ID, Name, Description, Zip, State]

Performance baseline: 90% less code by going directly to machine code
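
The example query on this slide can be tried on a toy table. The sketch below uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in. It does not show machine-code generation and it is not the platform discussed in the deck. The table name `customers`, the sample rows and the lowered threshold are assumptions made so the example runs on nine rows.

```python
import sqlite3

# SQLite's ":memory:" mode is only a convenient stand-in for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER, name TEXT, description TEXT, zip TEXT, state TEXT)"
)

# Hypothetical rows: ids 3, 6 and 9 are in NY, the other six are in CA.
rows = [(i, f"name{i}", "", "00000", "CA" if i % 3 else "NY") for i in range(1, 10)]
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?, ?)", rows)

threshold = 4  # the slide uses 50000; lowered here to suit the toy data
result = conn.execute(
    "SELECT state, count(id) FROM customers "
    "GROUP BY state HAVING count(id) > ? ORDER BY state",
    (threshold,),
).fetchall()
print(result)
conn.close()
```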

SLIDE 19

Core to RAM Ratio

[Diagram: Cores and RAM]

Performance baseline: Ideally 4-12 GB/core in memory
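
A small sizing sketch, taking the 4-12 GB/core guideline at face value. The node sizes and the function names are hypothetical, chosen only to show the arithmetic.

```python
MIN_GB_PER_CORE = 4   # lower bound quoted on the slide
MAX_GB_PER_CORE = 12  # upper bound quoted on the slide


def ram_range_gb(cores):
    """Return the (minimum, maximum) RAM in GB suggested for a core count."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return cores * MIN_GB_PER_CORE, cores * MAX_GB_PER_CORE


def ratio_in_range(cores, ram_gb):
    """True if the RAM per core falls inside the 4-12 GB guideline."""
    return MIN_GB_PER_CORE <= ram_gb / cores <= MAX_GB_PER_CORE


# Hypothetical node with 16 cores.
low, high = ram_range_gb(16)
print(f"16 cores -> {low} to {high} GB of RAM")
print("16 cores with 128 GB in range:", ratio_in_range(16, 128))
print("32 cores with 64 GB in range:", ratio_in_range(32, 64))
```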

SLIDE 20

Memory Management

SELECT state, count(id) FROM table GROUP BY state HAVING count(id) > 50000;

[Diagram: the SELECT, GROUP BY and HAVING steps of the query, shown alongside Temp Space On Disk]
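
One reading of this slide is that the intermediate results of each query step should stay in RAM rather than spill to temp space on disk. The sketch below walks through the logical steps of the example query with every intermediate result held in ordinary Python objects. It illustrates the steps only and says nothing about how any particular product manages memory. The sample rows and the function name are assumptions.

```python
from collections import Counter


def group_and_filter(rows, threshold):
    """Apply GROUP BY state, count(id), then HAVING count(id) > threshold."""
    # GROUP BY state with count(id): one pass over rows held in memory.
    # count(id) ignores NULL ids, so rows whose id is None are skipped.
    counts = Counter(state for row_id, state in rows if row_id is not None)
    # HAVING: filter the grouped, still in-memory, intermediate result.
    return {state: n for state, n in counts.items() if n > threshold}


# Hypothetical (id, state) rows; the threshold is lowered to suit them.
sample = [(1, "CA"), (2, "CA"), (3, "NY"), (None, "NY"), (4, "CA"), (5, "TX")]
big_states = group_and_filter(sample, threshold=2)
print(big_states)
```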

SLIDE 21

Simplicity & Maturity

  • Pre Ordering data on load
  • Ecosystem Plug and Play
  • Near Linear Scalability
  • Multi-terabyte
  • Multi-node
  • Optimizer Hints
  • Pre-Aggregation
  • Disk I/O Contention
  • Maturity (20+ years vs 1)
  • Tune your BI tool
  • Simple to install/administer
  • Indexes
  • Projections
  • Temp Space on Disk
  • Presupposing order of data
  • Caching
  • Partitioning strategies
  • Science Projects

SLIDE 22

Recap

  • Use of RAM
  • Accessing RAM effectively
  • Core to RAM Ratio
  • Memory Management
  • True Symmetric MPP
  • Messaging and Networking
  • Simplicity

SLIDE 23

Typical Analysis/Reporting Query

  • 11 Billion Row Fact Table
  • Six Tables
  • 4 Inline Nested Subqueries
  • Multiple Passes Through Fact Table
  • Aggregations/Group By
  • Numerous Predicates, including:
      – BETWEEN
      – NOT EQUAL TO
      – IN

  1. >1 Day
  2. 1 Day
  3. Hours
  4. Rewrite and pre-aggregate
  5. Few Minutes
  6. Few Seconds

-- Balance information of targeted accounts obtained from transaction table
select C.Client_ID,
       D.Demog_Group,
       D.Demog_Desc,
       1+avg(F.Credit_Limit_Changes) CL_Issued,
       sum(case when T.Trans_Type='C' then T.Transaction_Amount else 0 end)
         - sum(case when T.Trans_Type='D' then T.Transaction_Amount else 0 end) Balance,
       sum(case when T.Trans_Type='C' then T.Transaction_Amount else 0 end) Total_Credit,
       sum(case when T.Trans_Type='D' then T.Transaction_Amount else 0 end) Total_Debit,
       min(case when T.Trans_Type='C' then date '2009-11-15' - T.Effective_Date else 365*10 end) Days_Last_Credit,
       min(case when T.Trans_Type='D' then date '2009-11-15' - T.Effective_Date else 365*10 end) Days_Last_Debit
from DEMO_FS.V_FIN_ACCOUNT F,
     DEMO_FS.V_FIN_CLIENT C,
     DEMO_FS.V_FIN_CLIENT_ACCOUNT_LINK L,
     DEMO_FS.V_FIN_ADD_CLIENT A,
     DEMO_FS.V_FIN_DEMOG_DESCS D,
     DEMO_FS.V_FIN_CC_TRANS T,
     -- Query to produce campaign planning
     (
       select Account_ID,
              count(Trans_Year) Years_Present,
              sum(No_Trans) No_Trans,
              sum(Total_Spend) Total_Spend,
              case count(Trans_Year) when 1 then 'One-off' else 'Repeat' end Behavior_Flag
       from (
         select * from (
           select Account_ID,
                  Extract(Year from Effective_Date) Trans_Year,
                  count(Transaction_ID) No_Trans,
                  sum(Transaction_Amount) Total_Spend,
                  avg(Transaction_Amount) Avg_Spend
           from DEMO_FS.V_FIN_CC_TRANS
           where extract(year from Effective_Date)<2009
             and Trans_Type='D'
             and Account_ID<>9025011
             and actionid in (select actionid from DEMO_FS.V_FIN_actions where actionoriginid=1)
           group by Account_ID, Extract(Year from Effective_Date)
         ) Acc_Summary
         where No_Trans in (3,4,5,6)
           and Avg_Spend>1000
           and Trans_Year between 2004 and 2008
       ) Target_Accs
       group by Account_ID
     ) Campaign_Grouping
where Campaign_Grouping.Account_ID=L.Account_ID
  and L.Client_ID=C.Client_ID
  and C.Client_ID=A.Client_ID
  and A.Demog_Code=D.Demog_Code
  and D.Demog_code in (1,4,5,9,10,11,50,55)
  and Campaign_Grouping.Account_ID=F.Account_ID
  and Campaign_Grouping.Account_ID=T.Account_ID
  and T.Effective_Date < date '2009-11-15'
group by C.Client_ID, Demog_Group, Demog_Desc
order by Days_Last_Debit;

SLIDE 24

Word Problem

  • A user is running sales scorecards for a 3000+ person sales team globally. The report requires top down drill paths to go from global numbers down to the transactional detail for every sales person, comparing it to prior year numbers across five different measures. What would the typical IT response be?

  • 1. Run on the weekends only please
  • 2. Build pre-aggregated tables; will take 3 months
  • 3. Build extracts to load into Excel
  • 4. You can build these yourself and run them on demand

SLIDE 25

Business Analysis & Development

  • A user has asked that multiple million rows of facility-level data be made available intraday to monitor risk rating changes and re-compute probability of default based on a complex algorithm and alert users via push notifications within 3 minutes.

  • 1. Yeah and monkeys might fly out of my @^%$*^*&
  • 2. $$$$$$$$$$$$$$
  • 3. Piece of cake…

SLIDE 26

Analytical Platform Reference Architecture

[Diagram: Kognitio reference architecture, in three layers]

Application & Client Layer
  • BI Tools (MicroStrategy, Cognos, BO, Alteryx, Tableau, neutrinoBI, LogiXML, etc.): ANSI SQL (ODBC, JDBC)
  • OLAP Clients and Excel (arcplan, Nova View, PolyVista, etc.): MDX, ODBO & XMLA (Virtual Cubes)
  • Reporting
  • Data Feeds
  • Queries go down to the platform and results come back

Kognitio Analytical Platform
  • Multiple nodes, each pairing CPU Cores with RAM
  • Near-line storage (optional)

Persistence Layer
  • Legacy Systems
  • Enterprise Data Warehouses
  • Hadoop Clusters

SLIDE 27

Evaluation considerations

  • Load speeds
      – 13TB/Hour
  • Maturity
      – Version of product; time in market; proven technology vs marketing hype
  • Scan speeds
      – A recent live test with a 72 blade server appliance scanned 1.8 trillion records in 38s!
  • CPU Utilization
      – No CPU cycles wasted on compression/decompression, rearranging data into columns, managing indexes, etc.
  • Load & Go
      – No special tuning
  • Commodity hardware
      – Vendor lock-in
  • Disk vs DRAM
      – 70Mb/Sec vs 6300Mb/Sec
  • Scale up and out to multi-terabyte instances
      – Appliance limitations
  • RAM vs Cache
  • Memory Management
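
The figures quoted in this list can be turned into per-second rates with a few lines of arithmetic. The sketch below uses only the numbers on the slide. Dividing the scan rate evenly across the 72 blades is an assumption, and the variable names are illustrative.

```python
# Figures quoted on the slide.
RECORDS_SCANNED = 1.8e12  # 1.8 trillion records
SCAN_SECONDS = 38
BLADES = 72
DISK_RATE = 70            # "70Mb/Sec" as quoted
DRAM_RATE = 6300          # "6300Mb/Sec" as quoted


def scan_rate_per_second(records, seconds):
    """Average records scanned per second over the whole test."""
    return records / seconds


total_rate = scan_rate_per_second(RECORDS_SCANNED, SCAN_SECONDS)
per_blade_rate = total_rate / BLADES  # assumes work is spread evenly
dram_speedup = DRAM_RATE / DISK_RATE

print(f"{total_rate:.3g} records/s overall")
print(f"{per_blade_rate:.3g} records/s per blade")
print(f"DRAM is {dram_speedup:.0f}x the quoted disk rate")
```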

SLIDE 28

Rethink Everything

Business Value Efficiency Quality Skillsets Performance

SLIDE 29

Q&A

SLIDE 30

Thank you!
