Infobright Technology Overview, Use Cases and Case Studies

The Database for Analytic Applications
April 13, 2010
David Lutz, Director, Technical Sales Consulting

Agenda
- Infobright Technology Overview
- Use Cases and Case Studies
- Migration to Infobright
- Getting Started

Infobright Innovation
- First commercial open source analytic database
- Knowledge Grid provides a significant advantage over other columnar databases
- Infobright, the economic data warehouse choice: fastest time-to-value, simplest administration
- Recognition: Cool Vendor in Data Management and Integration 2009; Partner of the Year 2009
Strong Momentum & Adoption
- Release 3.3 generally available
- More than 120 customers in 10 countries
- More than 40 partners on 6 continents
- A vibrant open source community: more than 1 million visitors, more than 35,000 downloads, more than 4,500 active community participants

Challenging Times: More Data
- More online activity: more web data
- Growth of mobile: more call data, web data
- Servers/networks: lots of log/event data
With increasing value in the details:
- Target individual customers
- Identify micro-segments
- Find security threats
- Identify fraud
"Enterprise data growth over the next 5 years is estimated to be 650%." (Gartner)

Challenging Times: More Requirements
- More users
- Diverse demands
- More data sources
With less:
- Time
- Resources
- Money
"The universe of applications for which analytics is now an important component continues to expand." (Wells Fargo Equity Research)

Analytic Infrastructure Requirements
- Handles large data volumes with less cost and complexity
- Meets business users' needs:
  - Fast query response for both static and ad hoc queries
  - Fast access to new data
  - Access to detailed data, not just aggregates
- Takes less IT time:
  - Easy to implement
  - No complex hardware configuration
  - No index creation, data partitioning or manual tuning
- Lower cost

Infobright Technology
Infobright is a high-performance analytic database that delivers fast query performance against large volumes of data with minimal IT effort.

What Is Unique about Infobright?
Infobright uses intelligence, not hardware, to drive query performance:
- It automatically creates information about the data (metadata) upon load
- It uses that metadata to eliminate or reduce the need to access data to respond to a query
- The less data that needs to be accessed, the faster the response
What this means to you:
- No need to partition data, create/maintain indexes or tune for performance
- Ad hoc queries are as fast as static queries, so users have total flexibility
- Ad hoc queries that may take hours with other databases run in minutes; queries that take minutes with other databases run in seconds

Infobright and MySQL
- Infobright is architected on MySQL, “the world’s most popular open source database”
- Provides a simple scalability path for MySQL users and OEMs
- No new management interface to learn
- MySQL integration enables seamless connectivity to BI tools and to MySQL drivers for ODBC, JDBC, C/C++, .NET, Perl, Python, PHP, Ruby, Tcl, etc.

Infobright Technology: Key Concepts
1. Column orientation
2. Data packs and compression
3. Knowledge Grid
4. Optimizer

1. Column vs. Row Orientation
Example table:
Employee_ID  Job         Dept        City
1            Shipping    Operations  Toronto
2            Receiving   Operations  Toronto
3            Accounting  Finance     Boston
Data stored in rows: 1, Shipping, Operations, Toronto | 2, Receiving, Operations, Toronto | 3, Accounting, Finance, Boston
Data stored in columns: 1, 2, 3 | Shipping, Receiving, Accounting | Operations, Operations, Finance | Toronto, Toronto, Boston
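The two layouts can be sketched in Python. This is an illustration only, assuming the three-row employees table above. The function names `row_layout` and `column_layout` are hypothetical, and nothing here describes how Infobright actually lays bytes out on disk.

```python
# Hypothetical sample: the three-row employees table from the slide.
COLUMNS = ("Employee_ID", "Job", "Dept", "City")
ROWS = [
    (1, "Shipping", "Operations", "Toronto"),
    (2, "Receiving", "Operations", "Toronto"),
    (3, "Accounting", "Finance", "Boston"),
]


def row_layout(rows):
    """Lay values out record by record: every field of row 1, then row 2, ..."""
    return [value for row in rows for value in row]


def column_layout(rows, names):
    """Lay values out column by column: every Employee_ID, then every Job, ..."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


if __name__ == "__main__":
    print(row_layout(ROWS))
    # A query that only needs City reads one list instead of every record.
    print(column_layout(ROWS, COLUMNS)["City"])
```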

1. Column vs. Row Orientation - Use Cases
Row oriented works if:
- All the columns are needed
- Transactional processing is required
Column oriented works if:
- Only relevant columns are needed
- Reports are aggregates (sum, count, average, etc.)
Benefits of column orientation:
- Very efficient compression
- Faster results for analytical queries
- Reading a column takes similar CPU resources as reading a row
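Column orientation compresses well because it stores similar values next to each other. The sketch below uses run-length encoding on a hypothetical Dept column. Run-length encoding is a generic technique chosen purely for illustration; it is not Infobright's compression algorithm.

```python
from itertools import groupby

# Hypothetical Dept column from a larger employees table, stored contiguously.
dept = ["Operations"] * 5 + ["Finance"] * 3 + ["Operations"] * 2


def run_length_encode(values):
    """Collapse each run of identical adjacent values into a (value, length) pair."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, length) pairs back into the original column."""
    return [value for value, length in pairs for _ in range(length)]


encoded = run_length_encode(dept)
print(encoded)  # [('Operations', 5), ('Finance', 3), ('Operations', 2)]
```

Ten stored values shrink to three pairs. In a row layout the same Dept values would be interleaved with IDs, jobs and cities, so runs like these would not form.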

2. Data Packs and Compression
Data packs:
- Each data pack contains 65,536 (64K) data values
- Compression is applied to each individual data pack
- The compression algorithm varies depending on data type and distribution (patent-pending compression algorithms)
Compression:
- Results vary depending on the distribution of data among data packs
- A typical overall compression ratio seen in the field is 10:1
- Some customers have seen results of 40:1 and higher
- For example, 1 TB of raw data compressed 10 to 1 would require only 100 GB of disk capacity
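The following is a minimal sketch of per-pack compression, assuming a column of 64-bit integers. Python's standard `zlib` stands in for Infobright's own patent-pending algorithms, and `split_into_packs` and `compress_pack` are hypothetical names. The only detail taken from the slide is the pack size of 65,536 values.

```python
import struct
import zlib

PACK_SIZE = 65_536  # values per Data Pack, as stated on the slide


def split_into_packs(values, pack_size=PACK_SIZE):
    """Split one column's values into consecutive Data Packs of pack_size values."""
    return [values[i:i + pack_size] for i in range(0, len(values), pack_size)]


def compress_pack(pack):
    """Compress a single pack of 64-bit integers independently of all other packs."""
    raw = struct.pack(f"<{len(pack)}q", *pack)  # little-endian signed 64-bit ints
    return raw, zlib.compress(raw, 9)


# Hypothetical column: 150,000 values that repeat every 100 rows.
column = [i % 100 for i in range(150_000)]
packs = split_into_packs(column)

for n, pack in enumerate(packs, start=1):
    raw, packed = compress_pack(pack)
    print(f"pack {n}: {len(pack)} values, {len(raw)} -> {len(packed)} bytes")
```

Because each pack is compressed on its own, any single pack can later be decompressed without touching its neighbours. The ratios this toy column produces say nothing about real-world results.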

2. What Your Data Looks Like Now
- Original data: 500 GB
- Compressed data: 50 GB (average compression ratio of 10:1)
- Plus the Knowledge Grid: < 0.5 GB (< 1% of the compressed data)
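The storage arithmetic can be written as a small helper. It assumes the typical 10:1 ratio and treats "< 1% of compressed data" as an upper bound on the Knowledge Grid. It also takes 1 TB as 1000 GB. `estimate_storage_gb` is a hypothetical name.

```python
def estimate_storage_gb(raw_gb, compression_ratio=10, grid_percent=1):
    """Return (compressed data size, Knowledge Grid upper bound), both in GB.

    Assumes the grid is at most grid_percent % of the compressed data.
    """
    compressed = raw_gb / compression_ratio
    grid_upper_bound = compressed * grid_percent / 100
    return compressed, grid_upper_bound


print(estimate_storage_gb(500))   # (50.0, 0.5)
print(estimate_storage_gb(1000))  # (100.0, 1.0): 1 TB at 10:1 needs about 100 GB
```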

3. The Knowledge Grid
The Knowledge Grid is information about the data. It applies to the whole table and is made up of Knowledge Nodes, which are built for each Data Pack:
- Data Pack Node (DPN), Numerical Histogram and Character Map (CMAP): built during LOAD
- Pack-to-Pack (P-2-P): built using JOIN
Knowledge Nodes either answer the query directly or identify only the relevant Data Packs, minimizing decompression.

3. Knowledge Grid Nodes - DPNs
Data Pack Nodes (DPNs) contain statistical and aggregate values for each Data Pack:
- MINIMUM value
- MAXIMUM value
- COUNT of all elements
- SUM of all values
- Number of NULLs
Example DPN: MIN = 1, MAX = 25000, COUNT = 65536, SUM = 58003500, NULLs = 1000.
DPNs help optimize the search by minimizing the need to decompress data. DPNs alone often contain enough information to resolve a query.
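Here is a sketch of how DPN-style statistics let some aggregates be answered without decompressing anything. The field names and helpers are hypothetical, and `None` stands in for NULL. COUNT follows the slide by counting all elements, NULLs included.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataPackNode:
    """DPN-style statistics for one Data Pack (field names are hypothetical)."""
    minimum: Optional[int]
    maximum: Optional[int]
    count: int   # all elements, NULLs included, following the slide
    total: int   # SUM of the non-NULL values
    nulls: int


def build_dpn(pack):
    """Compute the statistics once, at load time; None stands for NULL."""
    present = [v for v in pack if v is not None]
    return DataPackNode(
        minimum=min(present) if present else None,
        maximum=max(present) if present else None,
        count=len(pack),
        total=sum(present),
        nulls=len(pack) - len(present),
    )


def sum_from_dpns(dpns):
    """SUM(column) from the DPNs alone: no pack is decompressed."""
    return sum(d.total for d in dpns)


def max_from_dpns(dpns):
    """MAX(column) from the DPNs alone, skipping all-NULL packs."""
    return max(d.maximum for d in dpns if d.maximum is not None)


packs = [[3, 7, None, 5], [10, 2, 8, None]]  # two tiny hypothetical packs
dpns = [build_dpn(p) for p in packs]
print(sum_from_dpns(dpns), max_from_dpns(dpns))  # 35 10
```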

3. Knowledge Grid Nodes - Histograms
Numerical histograms: the MIN-MAX range from the DPN is divided into 1024 intervals. This Knowledge Node is a binary representation of whether a numerical value exists within each interval. If the MIN-MAX range is smaller than 1024, each "interval" is a distinct value.
Example (MIN = 1, MAX = 25000): the intervals are 1-24, 25-48, 49-72, ..., 24577-25000. Each interval is flagged 1 if at least one value in the Data Pack falls inside it and 0 otherwise (e.g. 1, 1, 0, ...).
Numerical histograms are very efficient at minimizing the Data Packs required to resolve a query with numerical constraints.
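The following is a sketch of a numerical-histogram filter. It assumes integer values and equal-width intervals computed with integer arithmetic. The exact interval boundaries are an assumption, so they differ slightly from the 1-24, 25-48, ... example. All function names are hypothetical.

```python
INTERVALS = 1024  # number of intervals per histogram, per the slide


def bucket(value, lo, hi, intervals=INTERVALS):
    """Map an integer in [lo, hi] to its interval index (equal-width, assumed)."""
    span = hi - lo + 1
    if span <= intervals:  # small range: each "interval" is one distinct value
        return value - lo
    return (value - lo) * intervals // span


def build_histogram(pack, intervals=INTERVALS):
    """Return (MIN, MAX, bits): bit i is 1 if some value falls in interval i."""
    lo, hi = min(pack), max(pack)
    bits = [0] * min(hi - lo + 1, intervals)
    for v in pack:
        bits[bucket(v, lo, hi, intervals)] = 1
    return lo, hi, bits


def may_match(histogram, query_lo, query_hi, intervals=INTERVALS):
    """False if no value in the pack can satisfy query_lo <= v <= query_hi."""
    lo, hi, bits = histogram
    if query_hi < lo or query_lo > hi:  # ruled out by MIN/MAX alone
        return False
    first = bucket(max(query_lo, lo), lo, hi, intervals)
    last = bucket(min(query_hi, hi), lo, hi, intervals)
    return any(bits[first:last + 1])


hist = build_histogram([1, 2, 30, 25000])  # hypothetical pack, MIN 1, MAX 25000
print(may_match(hist, 100, 200))  # False: inside MIN-MAX, but those intervals are empty
print(may_match(hist, 20, 40))    # True: the pack is suspect and must be checked
```

The first query shows the histogram's value. MIN and MAX alone could not rule the pack out, but the empty intervals can.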

3. Knowledge Grid Nodes - CMAPs
Character Maps (CMAPs): the first 64 positions of text fields are read. A CMAP is a binary representation of the occurrence of every possible character within those first 64 positions.
Example layout: the CMAP has one row per ASCII character (A, B, C, ..., a, b, ...) and one column per character position (1 to 64). A cell is 1 if that character occurs at that position in at least one value of the Data Pack, and 0 otherwise.
CMAPs are very efficient at resolving text-based search queries that involve the beginnings of strings.
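A CMAP-style prefix filter can be sketched with one bitmask per character position. Python integers stand in for the fixed ASCII grid, which is an assumption made for brevity; the information recorded is the same character-by-position occurrence. `build_cmap` and `may_start_with` are hypothetical names.

```python
CMAP_POSITIONS = 64  # only the first 64 characters of each value are mapped


def build_cmap(pack):
    """One bitmask per position; bit ord(ch) is set if ch occurs at that position."""
    cmap = [0] * CMAP_POSITIONS
    for text in pack:
        for pos, ch in enumerate(text[:CMAP_POSITIONS]):
            cmap[pos] |= 1 << ord(ch)
    return cmap


def may_start_with(cmap, prefix):
    """False if no value in the pack can start with prefix (e.g. LIKE 'Bo%')."""
    for pos, ch in enumerate(prefix[:CMAP_POSITIONS]):
        if not ((cmap[pos] >> ord(ch)) & 1):
            return False
    return True


cmap = build_cmap(["Boston", "Toronto", "Berlin"])  # hypothetical pack
print(may_start_with(cmap, "Bo"))  # True: "Boston" really matches
print(may_start_with(cmap, "Ba"))  # False: no value has 'a' in position 2
print(may_start_with(cmap, "Te"))  # True, yet nothing matches: pack is only suspect
```

The last call shows that a CMAP can only rule packs out. A positive answer marks the pack as suspect, not as a confirmed match.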

3. Knowledge Grid Nodes - P-2-P
Pack-to-Pack nodes (P-2-P) are a fourth type of Knowledge Node, created by a JOIN query. P-2-P nodes describe relationships between the Data Packs of the columns of joined tables (for example, Column A of Table 1 and Column C of Table 2).
- P-2-P nodes are stored in memory and persist for the duration of a session
- Query performance improves as joins are created and re-used
- Best practice is to "warm up" queries to pre-establish P-2-P nodes
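The following sketch shows what a Pack-to-Pack node could record. It assumes the node is simply the set of pack pairs whose join-key values overlap. The slides do not say how Infobright builds these nodes, and `build_p2p`, `Session` and the join name are hypothetical. The per-session cache mirrors the point that P-2-P nodes live in memory and are re-used.

```python
def build_p2p(packs_a, packs_c):
    """Pairs (i, j) where pack i of column A and pack j of column C share a key."""
    keys_c = [set(pack) for pack in packs_c]
    return {
        (i, j)
        for i, pack_a in enumerate(packs_a)
        for j, keys in enumerate(keys_c)
        if not keys.isdisjoint(pack_a)
    }


class Session:
    """Keeps P-2-P results in memory so later joins in the same session reuse them."""

    def __init__(self):
        self._p2p = {}

    def p2p(self, join_name, packs_a, packs_c):
        if join_name not in self._p2p:  # the first ("warm-up") query pays the cost
            self._p2p[join_name] = build_p2p(packs_a, packs_c)
        return self._p2p[join_name]


# Hypothetical packs: Table 1 Column A and Table 2 Column C.
table1_a = [[1, 2, 3], [10, 11, 12]]
table2_c = [[2, 4], [11, 20], [30, 31]]

session = Session()
print(sorted(session.p2p("t1.A = t2.C", table1_a, table2_c)))  # [(0, 0), (1, 1)]
```

Only 2 of the 6 possible pack pairs need to be read for this join, and a repeated join in the same session skips the computation entirely.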

4. Optimizer
1. Query received
2. Optimizer iterates on the Knowledge Grid
3. Each pass eliminates Data Packs
4. If any Data Packs are needed to resolve the query, only those are decompressed
(Figure: a question such as "How are my sales doing this year?" passes from the query through the Knowledge Grid to the results. Only the needed packs of the compressed data are touched.)
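The four steps can be sketched for a single numeric predicate. This is a simplified, single-pass illustration; the real optimizer iterates over several kinds of Knowledge Nodes. `make_pack`, `count_greater_than`, the JSON serialization and the use of zlib are all assumptions made for the sketch.

```python
import json
import zlib


def make_pack(values):
    """Store metadata uncompressed next to the compressed values (zlib as a stand-in)."""
    return {
        "min": min(values),
        "max": max(values),
        "count": len(values),
        "data": zlib.compress(json.dumps(values).encode()),
    }


def count_greater_than(packs, threshold):
    """COUNT(*) WHERE v > threshold; returns (count, packs decompressed)."""
    total, decompressed = 0, 0
    for pack in packs:
        if pack["max"] <= threshold:   # eliminated: no value can match
            continue
        if pack["min"] > threshold:    # answered by metadata: every value matches
            total += pack["count"]
            continue
        values = json.loads(zlib.decompress(pack["data"]))  # only these are opened
        decompressed += 1
        total += sum(1 for v in values if v > threshold)
    return total, decompressed


packs = [make_pack([1, 5, 9]), make_pack([20, 25, 30]), make_pack([8, 12, 15])]
print(count_greater_than(packs, 10))  # (5, 1): one of three packs decompressed
```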

A Simple Query Using the Knowledge Grid
    SELECT COUNT(*) FROM employees
    WHERE salary > 100000
      AND age < 35
      AND job = 'IT'
      AND city = 'San Mateo';
1. Find the Data Packs with salary > $100,000
2. Find the Data Packs that contain age < 35
3. Find the Data Packs that have job = 'IT'
4. Find the Data Packs that have city = 'San Mateo'
5. Now we eliminate all rows that have been flagged as irrelevant
6. Finally, we have identified the one Data Pack that needs to be decompressed; only this pack will be decompressed
Each Data Pack is classified as completely irrelevant, suspect, or all values match.
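The elimination above can be sketched as a three-way classification. Each column's pack gets a verdict, and the verdicts are combined with AND for each row of packs. Only MIN/MAX metadata is used. The metadata values, `Status`, `classify_row` and the helper names are all hypothetical.

```python
from enum import Enum


class Status(Enum):
    IRRELEVANT = "completely irrelevant"
    SUSPECT = "suspect"
    RELEVANT = "all values match"


def gt(meta, bound):
    lo, hi = meta
    if hi <= bound:
        return Status.IRRELEVANT
    return Status.RELEVANT if lo > bound else Status.SUSPECT


def lt(meta, bound):
    lo, hi = meta
    if lo >= bound:
        return Status.IRRELEVANT
    return Status.RELEVANT if hi < bound else Status.SUSPECT


def eq(meta, value):
    lo, hi = meta
    if value < lo or value > hi:
        return Status.IRRELEVANT
    return Status.RELEVANT if lo == hi == value else Status.SUSPECT


# WHERE salary > 100000 AND age < 35 AND job = 'IT' AND city = 'San Mateo'
QUERY = [("salary", gt, 100_000), ("age", lt, 35), ("job", eq, "IT"), ("city", eq, "San Mateo")]


def classify_row(pack_row):
    """AND the per-column verdicts for one row of packs (same rows, every column)."""
    verdicts = [check(pack_row[col], arg) for col, check, arg in QUERY]
    if Status.IRRELEVANT in verdicts:
        return Status.IRRELEVANT
    if all(v is Status.RELEVANT for v in verdicts):
        return Status.RELEVANT
    return Status.SUSPECT


# Hypothetical (MIN, MAX) metadata for four rows of packs.
pack_rows = [
    {"salary": (30_000, 90_000), "age": (22, 60), "job": ("Admin", "Sales"), "city": ("Austin", "Boston")},
    {"salary": (80_000, 250_000), "age": (25, 34), "job": ("IT", "IT"), "city": ("Oakland", "San Mateo")},
    {"salary": (120_000, 300_000), "age": (40, 65), "job": ("Finance", "Sales"), "city": ("Denver", "Seattle")},
    {"salary": (101_000, 200_000), "age": (23, 33), "job": ("IT", "IT"), "city": ("San Mateo", "San Mateo")},
]
for n, row in enumerate(pack_rows, start=1):
    print(n, classify_row(row).value)
```

Only row 2 is suspect, so only its packs would be decompressed. Row 4 can contribute its row count straight from the DPNs. Comparing text by MIN/MAX is a simplification here; on the slides, text predicates are what CMAPs are for.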
