In-memory Analytical Platforms
Aditya Satyadev, Director, BizAcuity Solutions
Sachin Sangtani, Senior Technical Consultant, Kognitio
Agenda - update
– Introduction
– Context
– In-memory Analytics
– Benefits of adoption
Introduction
Analytics – Traditional Approach..
Analytics – Advanced Approach..
Analytics – In-Memory Approach..
In-Memory Platforms
This is a test…
<begin style=“gratuitous advertising”> In the event of a real data emergency, please proceed to your nearest in-memory database vendor and purchase a large capacity of licenses <end style=“gratuitous advertising”>
Gartner Views
– Media Tablets and Beyond
– Mobile-centric Applications and Interfaces
– Contextual and Social User Experience
– Internet Of Things
– Appstores and Marketplaces
– Next-generation Analytics
– Big Data & The Logical Data Warehouse
– In-memory Computing
– Extreme Low-energy Servers
– Cloud Computing
In-Memory Analytical Platforms
Faster Iteration Through Requirements
Business Value
[Figure: capabilities shown against Current State and Desired State: Data Quality, Data Profiling, Data Management, Reporting, OLAP, Forecasting, Predictive Modeling, Optimization]
Where does all the time, effort AND MONEY go?
[Figure labels: Data Quality, Data Profiling, Data Management, Reporting & OLAP, Modeling]
Where SHOULD all the time, effort AND MONEY go?
[Figure labels: Data Quality, Data Profiling, Data Management, Reporting & OLAP, Forecasting, Predictive Modeling, Optimization, In Memory Analytical Platform]
Typical Analysis/Reporting Query
– BETWEEN
– NOT EQUAL TO
– IN

select
    D.Demog_Group,
    D.Demog_Desc,
    1+avg(F.Credit_Limit_Changes) CL_Issued,
    sum(case when T.Trans_Type='C' then T.Transaction_Amount else 0 end)
      - sum(case when T.Trans_Type='D' then T.Transaction_Amount else 0 end) Balance,
    sum(case when T.Trans_Type='C' then T.Transaction_Amount else 0 end) Total_Credit,
    sum(case when T.Trans_Type='D' then T.Transaction_Amount else 0 end) Total_Debit,
    min(case when T.Trans_Type='C' then date '2009-11-15' - T.Effective_Date else 365*10 end) Days_Last_Credit,
    min(case when T.Trans_Type='D' then date '2009-11-15' - T.Effective_Date else 365*10 end) Days_Last_Debit
from
    DEMO_FS.V_FIN_ACCOUNT F,
    DEMO_FS.V_FIN_CLIENT C,
    DEMO_FS.V_FIN_CLIENT_ACCOUNT_LINK L,
    DEMO_FS.V_FIN_ADD_CLIENT A,
    DEMO_FS.V_FIN_DEMOG_DESCS D,
    DEMO_FS.V_FIN_CC_TRANS T,
    ( select
          Account_ID,
          count(Trans_Year) Years_Present,
          sum(No_Trans) No_Trans,
          sum(Total_Spend) Total_Spend,
          case count(Trans_Year) when 1 then 'One-off' else 'Repeat' end Behavior_Flag
      from
          ( select * from
              ( select
                    Account_ID,
                    Extract(Year from Effective_Date) Trans_Year,
                    count(Transaction_ID) No_Trans,
                    sum(Transaction_Amount) Total_Spend,
                    avg(Transaction_Amount) Avg_Spend
                from DEMO_FS.V_FIN_CC_TRANS
                where extract(year from Effective_Date)<2009
                  and Trans_Type='D'
                  and Account_ID<>9025011
                  and actionid in ( select actionid from DEMO_FS.V_FIN_actions where actionoriginid=1 )
                group by Account_ID, Extract(Year from Effective_Date)
              ) Acc_Summary
            where No_Trans in (3,4,5,6)
              and Avg_Spend>1000
              and Trans_Year between 2004 and 2008
          ) Target_Accs
      group by Account_ID
    ) Campaign_Grouping
where Campaign_Grouping.Account_ID=L.Account_ID
  and L.Client_ID=C.Client_ID
  and C.Client_ID=A.Client_ID
  and A.Demog_Code=D.Demog_Code
  and D.Demog_code in (1,4,5,9,10,11,50,55)
  and Campaign_Grouping.Account_ID=F.Account_ID
  and Campaign_Grouping.Account_ID=T.Account_ID
  and T.Effective_Date < date '2009-11-15'
group by C.Client_ID, Demog_Group, Demog_Desc

1. >1 Day
2. 1 Day
3. Hours
4. Rewrite and pre-aggregate
5. Few Minutes
6. Few Seconds
Word Problem
… team globally. The report requires top-down drill paths to go from global numbers down to the transactional detail for every salesperson, comparing it to prior-year numbers across five different measures. What would the typical IT response be?
Business Analysis & Development
… data be made available intraday to monitor risk rating changes, re-compute probability of default based on a complex algorithm, and alert users via push notifications within 3 minutes.
What to look for in an In-Memory database
– Use of RAM
– Accessing RAM Effectively and Efficiently
– Efficient Core to RAM Ratio
– Memory Management
– Messaging and Networking
– True Symmetric MPP Architecture
– Simplicity & Maturity
Stop Following Me!
Use of RAM
[Diagram: data is loaded from Hard Disk into RAM]
Performance baseline: load speeds into memory in excess of 13 TB/hour
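To put the load-speed baseline in perspective, the arithmetic can be sketched as below. This is only an illustration: the 13 TB/hour figure comes from the slide, while the function name `load_hours` and the data set sizes are hypothetical.

```python
def load_hours(data_tb: float, rate_tb_per_hour: float = 13.0) -> float:
    """Hours needed to load data_tb terabytes into memory at the given rate."""
    if rate_tb_per_hour <= 0:
        raise ValueError("rate must be positive")
    return data_tb / rate_tb_per_hour


# Hypothetical data set sizes, in terabytes.
for size_tb in (1, 13, 50):
    minutes = load_hours(size_tb) * 60
    print(f"{size_tb} TB -> about {minutes:.0f} minutes at 13 TB/hour")
```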
Accessing RAM effectively
[Diagram labels: Compiler/Interpreter, Machine Code; table columns: ID, Name, Description, Zip, State]
SELECT state, count(id) FROM table GROUP BY state HAVING count(id) > 50000;
Performance baseline: 90% less code by going directly to machine code
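For readers who want to see what the example query returns, here is a minimal sketch that runs it against a tiny in-memory table. It uses Python's built-in sqlite3 module purely as a stand-in; it is not the platform described in the slides and does not compile queries to machine code. The table name `customers`, the sample rows, and the lowered threshold are assumptions made so the example fits on a page.

```python
import sqlite3

# In-memory SQLite database used only as a convenient stand-in.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers "
    "(id INTEGER, name TEXT, description TEXT, zip TEXT, state TEXT)"
)

# Hypothetical rows: ids 3, 6 and 9 are in NY, the other six are in CA.
rows = [
    (i, f"name{i}", "", "00000", "NY" if i % 3 == 0 else "CA")
    for i in range(1, 10)
]
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?, ?)", rows)


def states_above(connection, threshold):
    """Return (state, row count) pairs for states with more than threshold rows."""
    query = (
        "SELECT state, count(id) FROM customers "
        "GROUP BY state HAVING count(id) > ? ORDER BY state"
    )
    return connection.execute(query, (threshold,)).fetchall()


# The slide uses a threshold of 50000; 5 is used here to suit the toy data.
print(states_above(conn, 5))
```

GROUP BY forms one group per state, and HAVING then filters whole groups by their row count, which is why only the larger state survives the threshold of 5.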
Core to RAM Ratio
[Diagram labels: Cores, RAM]
Performance baseline: ideally 4-12 GB of memory per core
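The 4-12 GB/core guideline can be turned into a quick sizing check. The sketch below assumes nothing beyond that range; the function names and the 16-core node are hypothetical.

```python
def ram_range_gb(cores: int, low_gb_per_core: float = 4.0,
                 high_gb_per_core: float = 12.0) -> tuple[float, float]:
    """RAM range (GB) that keeps a node within the per-core guideline."""
    return cores * low_gb_per_core, cores * high_gb_per_core


def within_guideline(cores: int, ram_gb: float) -> bool:
    """True if ram_gb falls inside the guideline range for this core count."""
    low, high = ram_range_gb(cores)
    return low <= ram_gb <= high


# Hypothetical node with 16 cores.
low, high = ram_range_gb(16)
print(f"16 cores: between {low:.0f} GB and {high:.0f} GB of RAM")
print("128 GB within guideline:", within_guideline(16, 128))
```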
Memory Management
SELECT state, count(id) FROM table GROUP BY state HAVING count(id) > 50000;
[Diagram labels: SELECT, GROUP BY, HAVING; Temp Space On Disk]
Simplicity & Maturity
– Pre Ordering data on load
– Ecosystem Plug and Play
– Near Linear Scalability
– Multi-terabyte
– Multi-node
– Optimizer Hints
– Pre-Aggregation
– Disk I/O Contention
– Maturity (20+ years vs 1)
– Tune your BI tool
– Simple to install/administer
– Indexes
– Projections
– Temp Space on Disk
– Presupposing order of data
– Caching
– Partitioning strategies
– Science Projects
Recap
– Typical Analysis/Reporting Query
– Word Problem
– Business Analysis & Development
Analytical Platform Reference Architecture
[Diagram: Kognitio reference architecture]
– Application & Client Layer: BI Tools (MicroStrategy, Cognos, BO, Alteryx, Tableau, neutrinoBI, LogiXML, etc.), OLAP Clients (arcplan, Nova View, PolyVista, etc.), Excel
– Interfaces: ANSI SQL (ODBC, JDBC); MDX, ODBO & XMLA (Virtual Cubes); Queries and Results in both cases
– Kognitio Analytical Platform: repeated blocks of CPU Cores and RAM; Reporting; Data Feeds
– Near-line storage (optional)
– Persistence Layer: Legacy Systems, Enterprise Data Warehouses, Hadoop Clusters
Evaluation considerations
– 13TB/Hour
– Version of product; time in market; proven technology vs marketing hype
– A recent live test with a 72-blade server appliance scanned 1.8 trillion records in 38 s!
– No CPU cycles wasted on compression/decompression, rearranging data into columns, managing indexes, etc.
– No special tuning
– Vendor lock-in
– 70Mb/Sec vs 6300Mb/Sec
– Appliance limitations
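The figures quoted in the list above can be reduced to rates with a little arithmetic. The sketch below uses only the numbers from the slide (1.8 trillion records, 38 seconds, 72 blades, and the two throughput values) and assumes the scan was spread evenly across the blades; the variable and function names are hypothetical.

```python
RECORDS_SCANNED = 1.8e12   # 1.8 trillion records (from the slide)
SCAN_SECONDS = 38          # elapsed time of the live test
BLADES = 72                # blades in the appliance


def scan_rate(records: float, seconds: float) -> float:
    """Records scanned per second."""
    return records / seconds


total_rate = scan_rate(RECORDS_SCANNED, SCAN_SECONDS)
# Assumes an even spread of work across blades, which the slide does not state.
per_blade_rate = total_rate / BLADES

# Ratio between the two throughput figures quoted on the slide.
throughput_ratio = 6300 / 70

print(f"Total:     {total_rate / 1e9:.1f} billion records/s")
print(f"Per blade: {per_blade_rate / 1e6:.1f} million records/s")
print(f"Throughput ratio: {throughput_ratio:.0f}x")
```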
Rethink Everything
Q&A
Thank you!