Innovation at AWS
Eric Ferreira ericfe@amazon.com Principal Database Engineer Amazon Redshift
The Amazon Flywheel: focus on the things that stay the same (price, selection, delivery). Applying this at AWS: focus on the things that stay the same.
Assemble a team → Build → Internal beta → Private beta → Launch → Iterate
Add features that matter → Raise value → Increase adoption → Get feedback
Feature timeline:
– Service Launch (2/14)
– PDX (4/2)
– Temp Credentials (4/11)
– Unload Encrypted Files
– DUB (4/25)
– NRT (6/5)
– JDBC Fetch Size (6/27)
– Unload logs (7/5)
– 4-byte UTF-8 (7/18)
– SHA1 Builtin (7/15)
– Statement Timeout (7/22)
– Timezone, Epoch, Autoformat (7/25)
– WLM Timeout/Wildcards (8/1)
– CRC32 Builtin, CSV, Restore Progress (8/9)
– UTF-8 Substitution (8/29)
– JSON, Regex, Cursors (9/10)
– Split_part, Audit tables (10/3)
– SIN/SYD (10/8)
– HSM Support (11/11)
– Kinesis, EMR/HDFS/SSH copy, Distributed Tables, Audit Logging/CloudTrail, Concurrency, Resize Perf., Approximate Count Distinct, SNS Alerts, Cross-Region Backup (11/13)
– SOC1/2/3 (5/8)
– Sharing snapshots (7/18)
– Resource-Level IAM (8/9)
– PCI (8/22)
– Distributed Tables, Single-Node Cursor Support, Maximum Connections to 500 (12/13)
– EIP Support for VPC Clusters (12/28)
– New query monitoring system tables and DISTSTYLE ALL (1/13)
– Redshift on DW2 (SSD) Nodes (1/23)
– Compression for COPY from SSH, fetch size support for single-node clusters, new system tables with commit stats, row_number(), strtol(), and query termination (2/13)
– Resize progress indicator & Cluster Version (3/21)
– Regex_Substr, COPY from JSON (3/25)
– 50 slots, COPY from EMR, ECDHE ciphers (4/22)
– 3 new regex features, Unload to single file, FedRAMP (5/6)
[Diagram: AWS services for Collect, Store, and Analyze, including Kinesis, Direct Connect, AWS IoT, AWS Database Migration Service, AWS Import/Export Snowball, Amazon S3, Glacier, DynamoDB, EMR, Redshift, Machine Learning, QuickSight, Athena, EC2, Elasticsearch, and Lambda]
AWS Glue (slide fragments):
– …AWS KMS, with support for external HSMs
– …Amazon S3
– Crawlers for schema, data type, and partition inference
– …destination
– Pay only for the resources you consume
[Slide fragments: ~100,000 writes/sec & 500,000 reads/sec on the same hardware; replicas; a storage layer 6-way replicated across 3 Availability Zones]

Amazon Redshift:
– Join across exabytes of data in S3 using Redshift Spectrum, a serverless scale-out query layer that charges $5/TB scanned
– Cross-region backup capability for global disaster recovery
– Scale from 160 GB to 2 PB of compressed data with just a few clicks
Amazon EMR:
– Release 5.3: Hadoop 2.7.3, Hive 2.1, Spark 2.1, Zeppelin, Presto, HBase 1.2.3 and HBase on S3, Phoenix, Tez, Flink
– New applications added within 30 days of their open source release
– …and spot pricing
– …and storage; multiple clusters can run against the same data in S3
– Client-side encryption with customer managed keys and AWS KMS
Amazon Athena:
– …with no infrastructure to manage
– …window functions
– Avro, ORC, Parquet
– …run faster
AWS Lambda:
– Run code in response to events
– …required; billing in increments of 100 milliseconds
Amazon Kinesis:
– …Availability Zones with configurable retention
– Kinesis Firehose for easy integration with Amazon S3 and Redshift; Kinesis Analytics for streaming SQL
Amazon Elasticsearch Service
Amazon ML (machine learning applications):
– Deploy models in seconds
– …internal systems
– …AWS cloud; deploy models in batch and real-time modes
Amazon QuickSight
Columnar OLAP
[Diagram: related AWS services: AWS IAM, Amazon VPC, Amazon SWF, Amazon S3, AWS KMS, Amazon Route 53, Amazon CloudWatch, Amazon EC2]
Amazon Redshift (derived from PostgreSQL)
Leader Node
– SQL endpoint
– Stores metadata
– Coordinates parallel SQL processing

Compute Nodes (each 128 GB RAM, 16 TB disk, 16 cores)
– Local, columnar storage
– Executes queries in parallel
– Load, backup, restore

[Diagram: SQL clients/BI tools connect to the leader node via JDBC/ODBC; the leader node connects to the compute nodes over 10 GigE (HPC); ingestion, backup, and restore flow between the compute nodes and S3 / EMR / DynamoDB / SSH]
Row-based storage:
– Need to read everything
– Unnecessary I/O

aid | loc | dt
----|-----|-----------
1   | SFO | 2016-09-01
2   | JFK | 2016-09-14
3   | SFO | 2017-04-01
4   | JFK | 2017-05-14

CREATE TABLE audience (
   aid INT
  ,loc CHAR(3)
  ,dt  DATE
);
Columnar storage (same audience table and DDL):
– Only scan blocks for the relevant column
Compression (same audience table):

CREATE TABLE audience (
   aid INT     ENCODE LZO
  ,loc CHAR(3) ENCODE BYTEDICT
  ,dt  DATE    ENCODE RUNLENGTH
);
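Picking encodings by hand is optional; Redshift can recommend them from a sample of already-loaded data. A minimal sketch against the audience table above:

```sql
-- ANALYZE COMPRESSION samples existing rows and reports a suggested
-- encoding for each column of the table.
ANALYZE COMPRESSION audience;
```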
Zone maps (same audience table): in-memory block metadata records the MIN and MAX value of each block, so Redshift scans only the blocks that contain data for a given query.
SELECT COUNT(*) FROM LOGS WHERE DATE = '09-JUNE-2013'

Unsorted blocks (overlapping ranges; several blocks must be scanned):
  MIN: 01-JUNE-2013  MAX: 20-JUNE-2013
  MIN: 08-JUNE-2013  MAX: 30-JUNE-2013
  MIN: 12-JUNE-2013  MAX: 20-JUNE-2013
  MIN: 02-JUNE-2013  MAX: 25-JUNE-2013

Sorted blocks (disjoint ranges; only one block must be scanned):
  MIN: 01-JUNE-2013  MAX: 06-JUNE-2013
  MIN: 07-JUNE-2013  MAX: 12-JUNE-2013
  MIN: 13-JUNE-2013  MAX: 18-JUNE-2013
  MIN: 19-JUNE-2013  MAX: 24-JUNE-2013
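Zone maps prune best when blocks are sorted on the filtered column; a minimal sketch using a sort key (table and column names are illustrative):

```sql
-- Hypothetical table: a sort key keeps blocks in date order, so each
-- block's MIN/MAX range is narrow and disjoint.
CREATE TABLE logs (
   log_id  BIGINT
  ,logdate DATE
  ,message VARCHAR(256)
)
SORTKEY (logdate);

-- Only blocks whose MIN/MAX range covers this date are scanned:
SELECT COUNT(*) FROM logs WHERE logdate = '2013-06-09';
```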
Data is distributed throughout the cluster across slices:

[Diagram: Node 1 holds Slice 1 and Slice 2, Node 2 holds Slice 3 and Slice 4; the layout is shown three times in the original slide]
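Which slice a row lands on is governed by the table's distribution style; a sketch of the three styles (table and column names are illustrative):

```sql
-- KEY: rows with the same distribution key value land on the same slice,
-- co-locating matching rows for joins.
CREATE TABLE orders (order_id BIGINT, cust_id INT) DISTSTYLE KEY DISTKEY (cust_id);

-- EVEN: rows are spread round-robin across slices.
CREATE TABLE clicks (click_id BIGINT, url VARCHAR(256)) DISTSTYLE EVEN;

-- ALL: a full copy of the table on every node (small dimension tables).
CREATE TABLE locations (loc CHAR(3), city VARCHAR(64)) DISTSTYLE ALL;
```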
[Chart: data volume by year, 1990–2020: Generated Data vs. Available for Analysis]
Sources: Gartner, User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011; IDC, Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
Data lake (S3) processing:
– Directly access data in S3
– Scale out to thousands of nodes
– Open data formats
– Popular big data frameworks
– Anything you can dream up and code

Amazon Redshift:
– Super-fast local disk performance
– Sophisticated query optimization
– Join-optimized data formats
– Query using standard SQL
– Optimized for data warehousing

Amazon Redshift Spectrum:
– Fast @ exabyte scale
– Elastic & highly available
– On-demand, pay-per-query
– High concurrency: multiple clusters access the same data
– No ETL: query data in place using open file formats
– Full Amazon Redshift SQL support
– Data stays in S3
Query lifecycle with Amazon Redshift Spectrum:

[Diagram: SQL clients connect to Amazon Redshift via JDBC/ODBC; compute nodes 1, 2, 3, 4, … N call out to the Amazon Redshift Spectrum layer, which reads from Amazon S3 (exabyte-scale object storage) and a Data Catalog (Apache Hive Metastore compatible)]

1. A query is issued: SELECT COUNT(*) FROM S3.EXT_TABLE GROUP BY…
2. The query is optimized and compiled at the leader node, which determines what runs locally and what goes to Amazon Redshift Spectrum.
3. The query plan is sent to all compute nodes.
4. Compute nodes obtain partition info from the Data Catalog and dynamically prune partitions.
5. Each compute node issues multiple requests to the Amazon Redshift Spectrum layer.
6. Amazon Redshift Spectrum nodes scan your S3 data.
7. Amazon Redshift Spectrum projects, filters, joins, and aggregates.
8. Final aggregations and joins with local Amazon Redshift tables are done in-cluster.
9. The result is sent back to the client.
Roughly 140 TB of customer item order detail records for each day over the past 20 years; 190 million files across 15,000 partitions in S3 (one partition per day, for USA and rest of world). We need a billion-fold reduction in data processed. Running this query on a 1,000-node Hive cluster would take over 5 years.*

* Estimated using a 20-node Hive cluster & 1.4 TB of data, assuming linear scaling
* Query used a 20-node DC1.8XLarge Amazon Redshift cluster
* Not actual sales data; generated for this demo based on the data format used by Amazon Retail
– Leverages Amazon Redshift's advanced cost-based optimizer
– Pushes down projections, filters, aggregations, and join reduction
– Dynamic partition pruning to minimize data processed
– Automatic parallelization of query execution against S3 data
– Efficient join processing within the Amazon Redshift cluster
You pay for your Amazon Redshift cluster plus $5 per TB scanned from S3. Each query can leverage thousands of Amazon Redshift Spectrum nodes. You can reduce the TB scanned, and improve query performance, by:
– Partitioning data
– Using a columnar file format
– Compressing data
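Because Spectrum bills per TB scanned, a partition filter cuts both cost and latency; an illustrative query against a hypothetical date-partitioned external table:

```sql
-- Hypothetical external table partitioned by saledate: the WHERE clause
-- lets Spectrum scan a single partition instead of the full dataset.
SELECT COUNT(*)
FROM s3_ext.orders
WHERE saledate = '2017-05-14';
```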
Security: end-to-end data encryption; alerts & notifications; virtual private cloud; audit logging; certifications & compliance.
– Encrypt S3 data using SSE and AWS KMS
– Encrypt all Amazon Redshift data using KMS, AWS CloudHSM, or your on-premises HSMs
– Enforce SSL with perfect forward secrecy using ECDHE
– Amazon Redshift leader node in your VPC; compute nodes in a private VPC; Spectrum nodes in a private VPC, storing no state
– Communicate event-specific notifications via email, text message, or call with Amazon SNS
– All API calls are logged using AWS CloudTrail
– All SQL statements are logged within Amazon Redshift
– Certifications: PCI/DSS, FedRAMP, SOC1/2/3, HIPAA/BAA
Partition by date, time, and any other custom keys (e.g., year, month, day, hour).
CREATE EXTERNAL SCHEMA <schema_name> …

CREATE EXTERNAL TABLE <table_name>
  [PARTITIONED BY (<column_name> <data_type>, …)]
  STORED AS file_format
  LOCATION s3_location
  [TABLE PROPERTIES (property_name=property_value, …)];

Options (slide fragments): file formats; compression; encryption; column types (…and decimal); partitioning key; table type.

LOCATION examples:
– s3://mybucket/orders/.. (unpartitioned)
– s3://mybucket/orders/date=YYYY-MM-DD/.. (partitioned)
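A concrete sketch of the DDL above (schema, bucket, database, and IAM role names are all illustrative):

```sql
-- Hypothetical names throughout; the IAM role must grant S3 and catalog access.
CREATE EXTERNAL SCHEMA s3_ext
FROM DATA CATALOG DATABASE 'orders_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;

CREATE EXTERNAL TABLE s3_ext.orders (
   order_id BIGINT
  ,amount   DECIMAL(10,2)
)
PARTITIONED BY (saledate DATE)
STORED AS PARQUET
LOCATION 's3://mybucket/orders/';
```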
You can use Hive CREATE TABLE AS SELECT to convert data:

CREATE TABLE data_converted
STORED AS PARQUET
AS SELECT col_1, col_2, col_3 FROM data_source;
Or use Spark: about 20 lines of PySpark code running on Amazon EMR.
https://github.com/awslabs/aws-big-data-blog/tree/master/aws-blog-spark-parquet-conversion
Your data will get bigger:
– On average, data warehousing volumes grow 10x every 5 years
– The average Amazon Redshift customer doubles data each year

Amazon Redshift Spectrum makes data analysis simpler:
– Access your data without ETL pipelines
– Teams using Amazon EMR, Athena & Redshift can collaborate using the same data lake

Amazon Redshift Spectrum improves availability and concurrency:
– Run multiple Amazon Redshift clusters against common data
– Isolate jobs with tight SLAs from ad hoc analysis
[Diagram: the AWS analytics portfolio spanning storage, serverless compute, and data processing]
– Amazon Athena: interactive query
– AWS Glue: ETL & data catalog
– Amazon S3: exabyte-scale object storage
– Amazon Kinesis Firehose: real-time data streaming
– Amazon EMR: managed Hadoop applications
– AWS Lambda: trigger-based code execution
– AWS Glue Data Catalog: Hive-compatible metastore
– Amazon Redshift Spectrum: fast @ exabyte scale
– Amazon Redshift: petabyte-scale data warehousing
https://aws.amazon.com/blogs/big-data/amazon-redshift-engineerings- advanced-table-design-playbook-preamble-prerequisites-and-prioritization/
– Admin scripts: collection of utilities for running diagnostics on your cluster
– Admin views: collection of utilities for managing your cluster, generating schema DDL, etc.
– ColumnEncodingUtility: apply optimal column encoding to an established schema with data already loaded