Data Warehouse Benchmark: Redshift vs Snowflake vs BigQuery


  1. Data Warehouse Benchmark: Redshift vs Snowflake vs BigQuery. Me: CEO @

  2. Applications: Asana, Bing Ads, Braintree Payments, Desk.com, DoubleClick, Dynamics (365, GP, AX), Eloqua, Facebook Ad Insights, Freshdesk, FrontApp, Github, Google Adwords, Google Analytics, Google Analytics 360, Google Play, Help Scout, HubSpot, Hybris, Instagram, Intercom, iTunes, Jira, Magento, MailChimp, Mandrill, Marketo, Mixpanel, NetSuite SuiteAnalytics, Pardot, QuickBooks Online, Recurly, Sailthru, Salesforce, SalesforceIQ, SAP Business One, Shopify, Stripe, Xero, Zendesk, Zendesk Chat (Zopim), Zuora
     Databases: Amazon Aurora, Amazon RDS, Azure SQL Database, DynamoDB, Google Cloud SQL, Heroku, MariaDB, MongoDB, MySQL, Oracle DB, PostgreSQL, SQL Server
     Files: Amazon Cloudfront, Amazon Kinesis Firehose, Amazon S3, Azure Blob Storage, CSV, Dropbox, FTP, FTPS, Google Cloud Storage, Google Sheets, JSON, SFTP
     Events: Webhooks, Segment, Snowplow
     For an updated list of data sources visit fivetran.com/directory

  3. Online Transaction Processing (OLTP):
       select * from github.commit where sha = 'feeec5a81da13e95a1911b09773f8228f8c0db76'
     is very different from Online Analytical Processing (OLAP):
       select author_email, count(*) from github.commit group by 1
     This talk is about OLAP!
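
The contrast on slide 3 can be tried out with a minimal sketch using Python's built-in `sqlite3` module. The table name `github_commit`, its two columns, and the sample rows are assumptions made for illustration; they are not the speaker's dataset. The aggregate groups by the column name, which is equivalent to the slide's `group by 1`.

```python
import sqlite3

# Hypothetical stand-in for the github.commit table on the slide.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table github_commit (sha text primary key, author_email text)"
)
rows = [
    ("feeec5a81da13e95a1911b09773f8228f8c0db76", "alice@example.com"),
    ("0000000000000000000000000000000000000001", "alice@example.com"),
    ("0000000000000000000000000000000000000002", "bob@example.com"),
]
conn.executemany("insert into github_commit values (?, ?)", rows)

# OLTP: fetch one row by its key; touches a single record.
oltp_result = conn.execute(
    "select * from github_commit where sha = ?", (rows[0][0],)
).fetchall()

# OLAP: aggregate over every row; touches the whole table but few columns.
olap_result = conn.execute(
    "select author_email, count(*) from github_commit "
    "group by author_email order by author_email"
).fetchall()

print(oltp_result)
print(olap_result)
```

The point lookup reads one record by key, while the aggregate has to scan every row of one column, which is the access pattern the rest of the talk is concerned with.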

  4. Example table:
       commit  file       added  removed  changed
       xxx     file1.txt  1      10       11
       xxx     file2.txt  100    0        100
       xxx     file3.txt  50     50       50
       yyy     file1.txt  1      10       11
     Row Store (values laid out one row after another):
       xxx,file1.txt,1,10,11,xxx,file2.txt,100,0,100,xxx,file3.txt,50,50,50,yyy,file1.txt,1,10,11
     Column Store (values laid out one column after another):
       xxx,xxx,xxx,yyy,file1.txt,file2.txt,file3.txt,file1.txt,1,100,50,1,10,0,50,10,11,100,50,11
     Example query:
       select file, sum(changed) from github.commit group by 1
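
The two layouts on slide 4 can be sketched in a few lines of Python. This is only an illustration of the idea, not how any of the benchmarked warehouses store data; the names `row_store`, `column_store`, and `sum_changed_by_file` are hypothetical.

```python
from collections import defaultdict

columns = ("commit", "file", "added", "removed", "changed")
rows = [
    ("xxx", "file1.txt", 1, 10, 11),
    ("xxx", "file2.txt", 100, 0, 100),
    ("xxx", "file3.txt", 50, 50, 50),
    ("yyy", "file1.txt", 1, 10, 11),
]

# Row store: all values of row 1, then all values of row 2, and so on.
row_store = [value for row in rows for value in row]

# Column store: all values of column 1, then all values of column 2, and so on.
column_store = {
    name: [row[i] for row in rows] for i, name in enumerate(columns)
}


def sum_changed_by_file(store):
    """select file, sum(changed) ... group by 1, reading only two columns."""
    totals = defaultdict(int)
    for file, changed in zip(store["file"], store["changed"]):
        totals[file] += changed
    return dict(totals)


result = sum_changed_by_file(column_store)
row_layout = ",".join(str(v) for v in row_store)
column_layout = ",".join(str(v) for name in columns for v in column_store[name])

print(row_layout)
print(column_layout)
print(result)
```

The aggregate only reads the `file` and `changed` lists, whereas a row store would have to step over `commit`, `added`, and `removed` for every record to answer the same query.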

  5. C-store: the data warehouse that changed everything
     Timeline: 2005 C-store; 2011 BigQuery v1; 2013 Redshift; 2015 Snowflake; 2016 BigQuery v2

  6. 2011: Early BigQuery
     Not so great at joins:
       select foo, bar from large_table join other_large_table
     Nonstandard SQL-like language:
       select why, did, you, invent, your, own, sql from google

  7. 2013: AWS Redshift takes off

  8. Snowflake: store the data in S3! (similar to BigQuery)

  9. 2016: BigQuery gets way better
     Fact-to-fact joins work! Standard SQL! DELETE and UPDATE!
       update mytable set name = 'Hello world!' where id = 1

  10. Benchmark time!

  11. What data did we query?

  12. What queries did we run?

  13. What is TPC-DS?

  14. How to run TPC-DS without cheating
      DON’T run the same query twice
      DON’T use dist keys
      DON’T use sort/partition keys
      DO apply compression encoding
      DO use a realistic (small) scale
      DO compare cost

  15. DON’T use dist keys

  16. DON’T use sort/partition keys
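
Two of the rules from slide 14, running each query only once and comparing cost, can be sketched as a tiny harness. This is a minimal sketch under stated assumptions: `execute` is a hypothetical callable standing in for a real warehouse client, and `dollars_per_hour` assumes an hourly-priced cluster, which does not match every vendor's pricing model.

```python
import time


def run_once(queries, execute):
    """Time each distinct query exactly once.

    `queries` maps a query name to its SQL text; `execute` is a hypothetical
    callable that sends the SQL to the warehouse. A repeated SQL text raises
    an error, so a warm result cache is never what gets measured.
    """
    seen = set()
    timings = {}
    for name, sql in queries.items():
        if sql in seen:
            raise ValueError(f"query {name!r} repeats an earlier query")
        seen.add(sql)
        start = time.perf_counter()
        execute(sql)
        timings[name] = time.perf_counter() - start
    return timings


def cost_per_query(seconds, dollars_per_hour):
    """Cost of one query, assuming the cluster is billed by the hour."""
    return seconds / 3600 * dollars_per_hour


if __name__ == "__main__":
    # Stand-in executor that does nothing; replace with a real client call.
    timings = run_once({"q1": "select 1", "q2": "select 2"}, lambda sql: None)
    for name, seconds in timings.items():
        print(name, seconds, cost_per_query(seconds, dollars_per_hour=2.0))
```

Reporting dollars per query next to seconds per query keeps a larger, more expensive cluster from winning on speed alone.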

  17. How does this compare to other benchmarks?

  18. Amazon’s Redshift vs BigQuery benchmark

  19. Periscope’s Redshift vs Snowflake vs BQ

  20. Mark Litwintschik’s 1.1 billion taxi rides

  21. What really matters: ease of use

  22. Applications, Databases, Files, Events (same data source list as slide 2). For an updated list of data sources visit fivetran.com/directory
