Me vs BigQuery CEO @ Applications Databases Files Stripe Asana - - PowerPoint PPT Presentation

me
SMART_READER_LITE
LIVE PREVIEW

Me vs BigQuery CEO @ Applications Databases Files Stripe Asana - - PowerPoint PPT Presentation

Data Warehouse Benchmark: Redshift vs Snowflake Me vs BigQuery CEO @ Applications Databases Files Stripe Asana Instagram Amazon Aurora Amazon Cloudfront Xero Bing Ads Intercom Amazon RDS Amazon Kinesis Firehose Zendesk Braintree


slide-1
SLIDE 1

Data Warehouse Benchmark: Redshift vs Snowflake vs BigQuery

CEO @ Me

slide-2
SLIDE 2

Instagram Intercom iTunes Jira Magento MailChimp Mandrill Marketo Mixpanel NetSuite SuiteAnalytics Pardot QuickBooks Online Recurly Sailthru Salesforce SalesforceIQ SAP Business One Shopify

Events

Segment Snowplow

Files

Amazon Cloudfront Amazon Kinesis Firehose Amazon S3 Azure Blob Storage CSV Dropbox FTP FTPS Google Cloud Storage Google Sheets JSON SFTP

Databases

Amazon Aurora Amazon RDS Azure SQL Database DynamoDB Google Cloud SQL Heroku MariaDB MongoDB MySQL Oracle DB PostgreSQL SQL Server

Applications

Asana Bing Ads Braintree Payments Desk.com DoubleClick Dynamics (365, GP, AX) Eloqua Facebook Ad Insights Freshdesk FrontApp Github Google Adwords Google Analytics Google Analytics 360 Google Play Help Scout HubSpot Hybris

For an updated list of data sources visit fivetran.com/directory

Stripe Xero Zendesk Zendesk Chat (Zopim) Zuora Webhooks

slide-3
SLIDE 3

select author_email, count() from github.commit group by 1 select * from github.commit where sha = ‘feeec5a81da13e95a1911b09773f8228f8c0db76‘

Online Transaction Processing (OLTP) is very different from Online Analytical Processing (OLAP) This talk is about OLAP!

slide-4
SLIDE 4

commit file added removed changed xxx file1.txt 1 10 11 xxx file2.txt 100 100 xxx file3.txt 50 50 50 yyy file1.txt 1 10 11

xxx,file1.txt,1,10,11,xxx,file2.txt,100,0,100,xxx,file3.txt,50,50,50,yyy,file1.txt,1,10,11

Row Store:

commit file added removed changed xxx file1.txt 1 10 11 xxx file2.txt 100 100 xxx file3.txt 50 50 50 yyy file1.txt 1 10 11

Column Store:

xxx,xxx,xxx,yyy,file1.txt,file2.txt,file3.txt,1,100,50,1,10,0,50,10,11,100,50,11

select file, sum(changed) from github.commit group by 1

slide-5
SLIDE 5

C-store: the data warehouse that changed everything

2005 2011 2013 2015 2016 C-store BigQuery v1 Redshift Snowflake BigQuery v2

slide-6
SLIDE 6

2011: Early BigQuery

select foo, bar from large_table join other_large_table select why, did, you, invent, your, own, sql from google

Not so great at joins Nonstandard SQL-like language

2005 2011 2013 2015 2016 C-store BigQuery v1 Redshift Snowflake BigQuery v2

slide-7
SLIDE 7

2013: AWS Redshift takes off

2005 2011 2013 2015 2016 C-store BigQuery v1 Redshift Snowflake BigQuery v2

slide-8
SLIDE 8

Snowflake: store the data in S3!

(similar to BigQuery)

2005 2011 2013 2015 2016 C-store BigQuery v1 Redshift Snowflake BigQuery v2

slide-9
SLIDE 9

2016: BigQuery gets way better Fact-to-fact joins work! Standard SQL! DELETE and UPDATE!

update mytable set name = ‘Hello world!’ where id = 1

2005 2011 2013 2015 2016 C-store BigQuery v1 Redshift Snowflake BigQuery v2

slide-10
SLIDE 10

Benchmark time!

slide-11
SLIDE 11

What data did we query?

slide-12
SLIDE 12

What queries did we run?

slide-13
SLIDE 13

What is TPC-DS?

slide-14
SLIDE 14

How to run TPC-DS without cheating DON’T run the same query twice DON’T use dist keys DON’T use sort/partition keys DO apply compression encoding DO use a realistic (small) scale DO compare cost

slide-15
SLIDE 15

DON’T use dist keys

slide-16
SLIDE 16

DON’T use sort/partition keys

slide-17
SLIDE 17
slide-18
SLIDE 18
slide-19
SLIDE 19

How does this compare to

  • ther benchmarks?
slide-20
SLIDE 20

Amazon’s Redshift vs BigQuery benchmark

slide-21
SLIDE 21

Periscope’s Redshift vs Snowflake vs BQ

slide-22
SLIDE 22

Mark Litwintshik’s 1.1 billion taxi-rides

slide-23
SLIDE 23
slide-24
SLIDE 24

What really matters: ease of use

slide-25
SLIDE 25

Instagram Intercom iTunes Jira Magento MailChimp Mandrill Marketo Mixpanel NetSuite SuiteAnalytics Pardot QuickBooks Online Recurly Sailthru Salesforce SalesforceIQ SAP Business One Shopify

Events

Segment Snowplow

Files

Amazon Cloudfront Amazon Kinesis Firehose Amazon S3 Azure Blob Storage CSV Dropbox FTP FTPS Google Cloud Storage Google Sheets JSON SFTP

Databases

Amazon Aurora Amazon RDS Azure SQL Database DynamoDB Google Cloud SQL Heroku MariaDB MongoDB MySQL Oracle DB PostgreSQL SQL Server

Applications

Asana Bing Ads Braintree Payments Desk.com DoubleClick Dynamics (365, GP, AX) Eloqua Facebook Ad Insights Freshdesk FrontApp Github Google Adwords Google Analytics Google Analytics 360 Google Play Help Scout HubSpot Hybris

For an updated list of data sources visit fivetran.com/directory

Stripe Xero Zendesk Zendesk Chat (Zopim) Zuora Webhooks