Kaycee Lai, CEO & Founder Presto Summit NYC 2019
WHO WE ARE
EXEC TEAM
$400M+ from successful startup exits
Pedigree from GOOG, VMW, MSFT, ORCL
Kaycee Lai, CEO & Founder
- GM $120M P&L @EMC
- President @Waterline Data
- VP Sales @Virsto (VMware) @Avamar (EMC) @Delphix
Dr. Shuo Yang, VP, Engineering
- Ph.D. CS from Purdue Univ.
- Key member of "Borg" @Google
- Built cloud native analytics @EA
Azary Smotrich, Principal Architect
- Office of CTO @Oracle
- Founding Eng @ModelN
- Prescriptive Analytics @NASA
- Founding Eng. @Waterline
TOP INVESTORS
Successful track record nurturing startups to success
- Jocelyn Goldfein, Board Member, Zetta Ventures
- Arnold Silverman, Board Member, Discovery Ventures
- Graham Brooks, Board Member, .406 Ventures
- Jeff Parks, Investor, Riverwood
DOMAIN EXPERTISE
Data Ops / Big Data / Analytics / Cloud / Data Management / Cluster Management / Data Governance
GETTING ANSWERS FOR BI IS EVEN HARDER
[Figure: a seven-step BI process with the step labels Discover Data, Move Data, Prep Data, SQL Statement, Query Data, Visualize Data and Govern Data; the per-step durations shown range from hours to months.]
Resources Required: Business Analysts / IT / Data Scientists / DBAs / BI Developers / SIs
Resources Required: Data Governance / Office of CDO / Compliance
Not sure if the data is right until step 6!
>4 Months to answer 1 question
BI/ANALYTICS SHOULD BE ABOUT ANSWERING QUESTIONS…RIGHT?
OUR VISION TO SIMPLIFY BI & ANALYTICS
- 1. Connect: Any Data Source
- 2. Reveal: Location / Relationships / Instructions
- 3. Rationalize: Intent of question / Assembly logic / SQL Statement
- 4. Execute: Federated Query / BI Integration
Reduce a 4 month process to minutes
FAST, SCALABLE, SAAS PLATFORM ON CLOUD (AWS)
RATIONALIZE
Logical Guidance (Reasoner)
REVEAL
Relationships (Data Map)
CONNECT
Data Catalogs Data Sources
EXECUTE
Federated Query
DATA AS A SERVICE WITH PROMETHIUM
VISUALIZE
Self-Service Analytics / Instructions (Directions) / NLP Search (Question Builder) / Location (Data Explorer) / Data Discovery / Auto SQL Statement (SQL AI) / Data “Prep”
ARCHITECTURE
SCALABLE ARCHITECTURE
- DATA CONTEXT ENGINE
- FRONT END
- QUERY EXECUTION
- AI/NLP
KEY COMPONENTS
- DATA SOURCES
- QUERY EXECUTION
- 3RD PARTY DATA CATALOG
HOW IT WORKS - CONNECT
DATA SOURCES / 3rd PARTY DATA CATALOG
SMART BOTS: Cloud / JDBC / HDFS / Data Catalogs
INFO FROM SMARTBOTS:
- 1. API-Based (e.g. JDBC)
- 2. Name / Location / Schema
- 3. No heavy processing & data movement
- 4. Alt. Names: Tags / Synonyms
- 5. Data Quality
- 6. Lineage
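The connect step can be pictured as a metadata-only crawl: a bot asks the source for table names, location and schema through its API and never reads the rows. The sketch below is an illustration under stated assumptions, not Promethium's implementation. Python's built-in sqlite3 stands in for a JDBC source, and `harvest_metadata`, the record fields and the location string are hypothetical names.

```python
import sqlite3


def harvest_metadata(conn, location):
    """Collect name, location and column schema for each table; no rows are read."""
    records = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = [
            {"name": row[1], "type": row[2]}
            for row in conn.execute(f'PRAGMA table_info("{table}")')
        ]
        records.append({"name": table, "location": location, "schema": columns})
    return records


# Hypothetical source: an in-memory database standing in for a remote system.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER, region TEXT)")
source.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL)")

catalog = harvest_metadata(source, location="sqlite://:memory:")
for entry in catalog:
    print(entry["name"], [c["name"] for c in entry["schema"]])
```

Only catalog queries run against the source, which mirrors the "no heavy processing & data movement" point in the list.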
HOW IT WORKS - REVEAL
DATA CONTEXT ENGINE
DATA EXPLORER (FIND DATA)
- 1. Table/File/Column Name
- 2. Vendor Name / Data Type
- 3. Location (IP address / URL)
DIRECTIONS (ASSEMBLE)
- 1. Tables / Files
- 2. From what Vendor
- 3. Select / Join
DATA MAP (VISUALIZE)
- 1. Topology
- 2. Relationships
- 3. Alternate Versions
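A data map of this kind can be thought of as a graph whose nodes are tables and whose edges are candidate relationships. The slides do not say how relationships are discovered, so the sketch below assumes the simplest possible heuristic, shared column names. `build_data_map` and the sample table names are hypothetical.

```python
from itertools import combinations


def build_data_map(schemas):
    """Return (table_a, table_b, shared_columns) for every pair sharing a column name."""
    edges = []
    for (a, cols_a), (b, cols_b) in combinations(sorted(schemas.items()), 2):
        shared = sorted(set(cols_a) & set(cols_b))
        if shared:
            edges.append((a, b, shared))
    return edges


# Hypothetical tables from three different systems.
schemas = {
    "crm.customers": ["customer_id", "region"],
    "erp.orders": ["order_id", "customer_id", "total"],
    "web.sessions": ["session_id", "started_at"],
}

data_map = build_data_map(schemas)
print(data_map)
```

A real system would likely weigh data types, value overlap and catalog tags as well; name matching alone is only a starting point.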
HOW IT WORKS - RATIONALIZE
DATA CONTEXT ENGINE
DATA MAP (VISUALIZE)
- 1. Delete / Change Tables
- 2. Find Missing Tables via Catalog
DIRECTIONS (ASSEMBLE)
- 1. Change Join Types
- 2. Change Join Operators
- 3. Auto-Create SQL Statement for Presto
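To make the rationalize step concrete, the sketch below turns a set of directions (two tables, a join type and a join operator) into a SQL string using Presto's `catalog.schema.table` naming. It is a minimal illustration, not the product's SQL generator; `build_join_sql`, its parameters and the table names are assumptions.

```python
ALLOWED_JOINS = {"INNER", "LEFT", "RIGHT", "FULL"}
ALLOWED_OPERATORS = {"=", "<>", "<", "<=", ">", ">="}


def build_join_sql(left, right, left_key, right_key, columns,
                   join_type="INNER", operator="="):
    """Assemble a two-table join statement from user-adjustable directions."""
    if join_type not in ALLOWED_JOINS:
        raise ValueError(f"unsupported join type: {join_type}")
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported join operator: {operator}")
    select_list = ", ".join(columns)
    return (
        f"SELECT {select_list}\n"
        f"FROM {left} AS l\n"
        f"{join_type} JOIN {right} AS r\n"
        f"ON l.{left_key} {operator} r.{right_key}"
    )


# Hypothetical directions: orders live in MySQL, customers in Hive.
sql = build_join_sql(
    left="mysql.sales.orders",
    right="hive.crm.customers",
    left_key="customer_id",
    right_key="id",
    columns=["l.total", "r.region"],
    join_type="LEFT",
)
print(sql)
```

Changing `join_type` or `operator` and regenerating the statement corresponds to items 1 and 2 in the list.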
HOW IT WORKS - EXECUTE
VIRTUAL VIEW
Query Initiated
Direct Access to Data – No ETL
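The idea of a virtual view over sources that stay where they are can be tried locally. The sketch below is a stand-in, not a Presto deployment: it uses SQLite's `ATTACH DATABASE` so that two separate in-memory databases play the role of two source systems, and a temporary view joins them without copying either table. All table names and values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                  # first "source" (schema: main)
conn.execute("ATTACH DATABASE ':memory:' AS crm")   # second, separate "source"

conn.execute("CREATE TABLE main.orders (customer_id INTEGER, total REAL)")
conn.execute("CREATE TABLE crm.customers (id INTEGER, region TEXT)")
conn.executemany("INSERT INTO main.orders VALUES (?, ?)",
                 [(1, 10.0), (1, 5.0), (2, 7.5)])
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)",
                 [(1, "East"), (2, "West")])

# A TEMP view may reference tables in any attached database; no data is moved.
conn.execute("""
    CREATE TEMP VIEW revenue_by_region AS
    SELECT c.region AS region, SUM(o.total) AS revenue
    FROM main.orders AS o
    JOIN crm.customers AS c ON o.customer_id = c.id
    GROUP BY c.region
""")

rows = conn.execute(
    "SELECT region, revenue FROM revenue_by_region ORDER BY region"
).fetchall()
print(rows)
```

In a federated engine such as Presto, the same shape of query would name a different catalog for each table instead of an attached database.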
THE NO-ETL APPROACH
HOW IT WAS DONE: ~ 4 MONTHS
- 1-3. Build complex data pipelines / Copy/Move data to a data warehouse / lake / Schedule long running ETL jobs
- 4. Manually subset, join, write SQL statements
- 5. Query against data warehouse / lake
HOW IT CAN BE TODAY: ~ 4 MINUTES
- 1. Select data discovered
- 2. Run queries directly from source
SUPPORT & INTEGRATION
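[Figure: integration matrix of vendor logos. Categories: DATA SOURCES (RDBMS, HDFS, S3 based, Cloud) / DATA VIRTUALIZATION / DATA VISUALIZATION / DATA LINEAGE / DATA CATALOG / PLATFORM (PUBLIC CLOUD OR ON-PREM VPC). Named in text: SUPERSET. Items marked * are on the roadmap.]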
BUSINESS IMPACT
PAIN POINTS ADDRESSED
PROBLEM: FINDING DATA IS COMPLEX & TIME CONSUMING.
- Data is fractured across multiple systems, multiple vendors and multiple locations
- No single tool can search for data across the entire data estate
- Loading all of the data into a single repository is expensive & time consuming
SOLUTION:
- PROMETHIUM’S DATA EXPLORER ™: 1 SINGLE solution to find data without the need to move data. Reveals context of data across all vendors and systems
PROBLEM: ANSWERING QUESTIONS STILL TAKES HUGE MANUAL EFFORT POST DATA DISCOVERY.
- Need to know data relationships to know what / how to join
- SQL statements can take up to 8 hours to create
- Few people in the organization can write a valid SQL statement
SOLUTION:
- PROMETHIUM’S QUESTION BUILDER ™: NLP driven method to transform questions into data
- PROMETHIUM’S DATA MAP & DIRECTIONS: Instantly generates STEP BY STEP assembly directions + DATA MAP
- PROMETHIUM’S SQL AI: Instantly generates a valid SQL statement
PROBLEM: QUERIES ACROSS DIFFERENT SOURCES / VENDORS = HARD TO EXECUTE
- Insights are limited as SQL statements can only reflect data from one system
- Highly manual two-step process of moving data from each separate vendor, then manually joining the data
SOLUTION:
- PROMETHIUM’S KALEIDOSCOPE ™: 1-STEP federated query execution across various sources, with integration for BI tools such as Tableau
TODAY: TIME & EFFORT
TODAY
Time to find Data: 4 weeks
Time to Move Data: 5 days
Time to Subset/Model/Join: 2 months
Time to Write 1 SQL Statement: 8 hours
Time to Aggregate & Query Data: 3 days
# Data Analysts: 4
# Data Engineers: 2
# Business Analysts: 2
# of People Involved: 8
Amount of Time: 3+ months
# of Questions answered in 1 year: < 4
Cost of Asking 4 Questions: $549,973
Data Analyst Cost: $125,000
Data Engineer Cost: $250,000
Business Analyst Cost: $90,000
PROMETHIUM EFFICIENCY
TODAY / PROMETHIUM
Time to find Data: 4 weeks / 1 min
Time to Move Data: 5 days / (none)
Time to Subset/Model/Join: 2 months / 2 sec
Time to Write 1 SQL Statement: 8 hours / 1 min
Time to Aggregate & Query Data: 3 days / 1 min
# Data Analysts: 4 / 1
# Data Engineers: 2 / (none)
# Business Analysts: 2 / (none)
TODAY
# of People Involved: 8
Amount of Time: 3+ months
# of Questions answered in 1 year: < 4
Cost of Asking 4 Questions: $549,973
PROMETHIUM
# of People Involved: 1
Amount of Time: ~3 min
# of Questions answered in 1 year: 40,000
Cost of Asking 4 Questions: $7.30
People Efficiency: 7X less
Time Efficiency: ~ 10,000X less
Cost Efficiency: ~ 75,000X less
For 7x less resources & 75,000X less cost, Promethium can answer up to 10,000X more questions.
What can a business do if it has a 10,000X increase in efficiency to answer questions & gain insights?
Data Analyst Cost: $125,000 / Data Engineer Cost: $250,000 / Business Analyst Cost: $90,000
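The headline multipliers can be checked directly from the figures on the slide. The short calculation below uses only those figures, treating "< 4" questions per year as its upper bound of 4; the variable names are illustrative. The people ratio works out to 8 to 1, so the slide's "7X" reads most naturally as 7 fewer people.

```python
# Figures as stated on the slide; "< 4" questions per year is taken as 4.
today = {"people": 8, "questions_per_year": 4, "cost_of_4_questions": 549_973.00}
promethium = {"people": 1, "questions_per_year": 40_000, "cost_of_4_questions": 7.30}

people_ratio = today["people"] / promethium["people"]
question_ratio = promethium["questions_per_year"] / today["questions_per_year"]
cost_ratio = today["cost_of_4_questions"] / promethium["cost_of_4_questions"]

print(f"people:    {people_ratio:.0f}x fewer")
print(f"questions: {question_ratio:,.0f}x more")
print(f"cost:      {cost_ratio:,.0f}x lower")
```

The question and cost ratios match the slide's "10,000X" and "~ 75,000X" claims.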
AI-DRIVEN APPROACH WITH PROMETHIUM
Ask a Question (NLP) / Discover / Prep / Execute