Infrastructure as a Service (IaaS)
Google Compute Engine
AWS Elastic Compute Cloud (EC2)
Azure Virtual Machines
Digital Ocean
Google Compute Engine (GCE)
Infrastructure-as-a-Service
Hardware service for you to create and run virtual machine instances on
Lowest-level abstraction for cloud infrastructure
Flexible, but requires management
Good for arbitrary workloads
Provides vertical scaling options
Number of cores
Amount of RAM
Video card types
Type of disk (standard, SSD)
Up to 96 cores, 684 GB! (10/2017)
Billed at a sub-minute level
Portland State University CS 430P/530 Internet, Web & Cloud Systems
Seen previously
Segmentation and filtering for security
Instance templates and groups to auto-scale up at the VM level
Load balancing to distribute work across VMs globally
Pre-emptible VMs
VMs that Google can re-claim at any time if demand spikes
80% lower in cost!
Framestore
Video rendering for visual designers
15,000 cores needed at peak rendering times
Unneeded at quiet time
Not mission-critical, OK to be pre-empted temporarily
$300k saved by using on-demand, pre-emptible infrastructure over dedicated
Fault-tolerance built-in to application to restart interrupted jobs
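The fault-tolerance idea above (restart interrupted jobs rather than lose them) can be sketched as a job that checkpoints its progress. This is a minimal simulation under stated assumptions: `render_frames`, `run_until_complete`, the `checkpoint` dict, and the `preempt_at` parameter are hypothetical names for illustration, not part of any Google API, and a real job would persist its checkpoint outside the VM.

```python
class Preempted(Exception):
    """Raised to simulate the provider reclaiming the VM."""


def render_frames(total_frames, checkpoint, preempt_at=None):
    """Render frames, resuming from the last checkpointed frame.

    `checkpoint` is a plain dict standing in for durable storage
    (an assumption; it must survive the loss of the VM).
    """
    start = checkpoint.get("next_frame", 0)
    for frame in range(start, total_frames):
        if preempt_at is not None and frame == preempt_at:
            raise Preempted(f"VM reclaimed before frame {frame}")
        checkpoint.setdefault("done", []).append(frame)  # the "work"
        checkpoint["next_frame"] = frame + 1              # record progress


def run_until_complete(total_frames, preempt_at):
    """Restart the job after a pre-emption without redoing finished frames."""
    checkpoint = {}
    restarts = 0
    try:
        render_frames(total_frames, checkpoint, preempt_at=preempt_at)
    except Preempted:
        restarts += 1
        render_frames(total_frames, checkpoint)  # a new VM resumes from the checkpoint
    return checkpoint["done"], restarts
```

Calling `run_until_complete(5, 3)` simulates one pre-emption partway through a five-frame job; each frame is rendered exactly once across the two runs.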
Compute Engine access
All eventually hit API
REST API directly
Via Web UI
console.cloud.google.com
Via Cloud Shell or SDK
Command-line interface (CLI) with myriad command-line options
List
gcloud compute instances list
Create
gcloud compute instances create myinstance
Delete
gcloud compute instances delete myinstance
Via libraries and API
Through client libraries in many different languages (Python, JavaScript, Go, etc.)
Translates into HTTP/JSON API requests
Python example via google-api-python-client Python package
import googleapiclient.discovery

compute = googleapiclient.discovery.build('compute', 'v1')

def list_instances(compute, project, zone):
    # List the VM instances in one zone of a project
    result = compute.instances().list(project=project, zone=zone).execute()
    return result['items'] if 'items' in result else None
Via REST API
Demo via interactive API Explorer
From web console, APIs and Services ➔ Library ➔ Compute Engine API ➔ Try this API in APIs Explorer
Example: listing instances via API
Enable OAuth2
compute.instances.list
Example
REST API call
GET https://compute.googleapis.com/compute/v1/projects/{project}/zones/{zone}/instances
JSON response
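As a rough illustration of the REST call above, the request can be assembled with Python's standard library. This sketch only builds the request and does not send it; the project, zone, and access-token values are hypothetical placeholders, and `build_list_instances_request` is an illustrative helper name.

```python
import urllib.request

BASE = "https://compute.googleapis.com/compute/v1"


def build_list_instances_request(project, zone, access_token):
    """Build (but do not send) the GET request that lists instances."""
    url = f"{BASE}/projects/{project}/zones/{zone}/instances"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {access_token}"},  # OAuth2 bearer token
        method="GET",
    )


# Hypothetical project, zone, and token values, for illustration only.
req = build_list_instances_request("my-project", "us-west1-b", "ACCESS_TOKEN")
print(req.get_method(), req.full_url)
```

With a valid OAuth2 token, passing `req` to `urllib.request.urlopen` would return the JSON response body.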
Storage as a Service
Background: Google File System (GFS) 2003
Designed to support Google Search
Retrieving, storing, and querying of web pages at massive scale
Goals
Large data sets, high-throughput, low-latency querying
Durability and availability with very little management overhead
Dead disks simply replaced and system seamlessly adapts
Done via horizontal scaling and replication
http://research.google.com/archive/gfs-sosp2003.pdf
But, initially proprietary
Yahoo! later reverse-engineered GFS
Released as Hadoop Distributed File System (HDFS)
Open-sourced and distributed by Apache
Spun out commercially into …
Google Cloud Storage (GCS)
AWS equivalent is S3
Fully-managed (i.e. serverless), no-ops storage service
No administration or capacity management
Backed up and versioned automatically
Replicated and cached over multiple zones/regions
Fixed region for local computation
Multi-region for global file delivery
Adapts to load and access patterns for high availability and throughput
Low latency: 10s of ms on first use, then faster via migration
Data encrypted at rest when not being used and in flight
Key sharding with parts of keys in multiple jurisdictions
But, unencrypted when being used
Massive scale
Autism Speaks: 1300 genomes and > 100 TB of data
Projected to 10,000 genomes and > 1 PB of data
Model
Storage done via "buckets"
Buckets, like URLs, must be uniquely identifiable
Object-level storage
Access storage similar to accessing objects over the web
Present an identifier, receive the object back in its entirety
Different from block-level storage of disks
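The bucket and object model above can be sketched with a toy in-memory store. This is an illustrative assumption, not how Cloud Storage is implemented: `ObjectStore` and its methods are hypothetical names, and the point is only that bucket names are unique and objects are written and read whole.

```python
class ObjectStore:
    """Toy object store: uniquely named buckets holding whole objects."""

    def __init__(self):
        self._buckets = {}

    def create_bucket(self, name):
        # Bucket names share one namespace, so a duplicate is rejected.
        if name in self._buckets:
            raise ValueError(f"bucket name already taken: {name}")
        self._buckets[name] = {}

    def put(self, bucket, key, data: bytes):
        # Writing replaces the entire object; there is no partial update.
        self._buckets[bucket][key] = bytes(data)

    def get(self, bucket, key) -> bytes:
        # Reading returns the entire object for the given identifier.
        return self._buckets[bucket][key]
```

A block device, by contrast, would expose fixed-size blocks addressed by offset that can be rewritten individually.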
Applications
Good for large unstructured data that does not need to be queried
Images, Video, Zip files
Structured data that needs to be queried should use DBs
Used to feed and store data and logs from all cloud services
BigQuery, App Engine, Cloud SQL, Compute Engine, Dataflow/Dataproc, etc.
Access
Web interface
SDK via gsutil command and gs:// URI
gsutil ls
gsutil mb gs://xx-yy-zz
Client libraries (Python google-cloud-storage)
REST API (Storage JSON API)
from google.cloud import storage

storage_client = storage.Client()
bucket_name = 'my-new-bucket'
bucket = storage_client.create_bucket(bucket_name)
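The gs:// URIs used by gsutil combine a bucket name with an object path. A small sketch of splitting one apart, using only the standard library; `parse_gs_uri` is a hypothetical helper, not part of gsutil or the google-cloud-storage package.

```python
from urllib.parse import urlparse


def parse_gs_uri(uri):
    """Split 'gs://bucket/path/to/object' into (bucket, object_name)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs:// URI: {uri}")
    # The bucket sits where a hostname would; the rest is the object name.
    return parsed.netloc, parsed.path.lstrip("/")
```

A bare bucket URI such as `gs://xx-yy-zz` yields an empty object name.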
Database as a Service
Main types
SQL
Relational structured data
Complex querying using relations
Schema (statically typed data)
Strict transactional consistency
Vertical scaling
NoSQL
Non-relational, unstructured data
Simple, fast key-value lookup
Schemaless (dynamically typed data)
Loose eventual consistency
Horizontal scaling
What explains the last two design patterns?
CAP Theorem (Fox/Brewer 2000)
Any networked system can have at most two of three desirable properties
C = consistency
A = availability
P = partition-tolerance
Cannot have both strong consistency and high availability in the wake of network outages
Two consistency options for networked databases
ACID (atomicity, consistency, isolation, durability)
To achieve strong consistency, lose “A” availability in the face of a network partition “P”
Cannot perform transactions until all replicas fully on-line
Cloud SQL
BASE (basically available, soft state, eventual consistency)
To achieve high availability, lose “C” in the face of a network partition “P”
Cloud Bigtable & Cloud Datastore
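The trade-off between the two options above can be sketched with a toy simulation of replicas under a partition. All class names here (`Replica`, `CPStore`, `APStore`) are hypothetical, and the reconciliation step is deliberately naive; real databases use far more careful protocols.

```python
class Replica:
    def __init__(self):
        self.data = {}
        self.reachable = True  # False simulates a network partition


class CPStore:
    """ACID-style: refuse writes unless every replica is reachable."""

    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        if not all(r.reachable for r in self.replicas):
            return False  # give up availability to stay consistent
        for r in self.replicas:
            r.data[key] = value
        return True


class APStore:
    """BASE-style: accept writes on reachable replicas, reconcile later."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.pending = []  # writes that some replicas may not have seen

    def write(self, key, value):
        if not any(r.reachable for r in self.replicas):
            return False
        for r in self.replicas:
            if r.reachable:
                r.data[key] = value
        self.pending.append((key, value))
        return True  # available, but replicas may disagree for a while

    def heal(self):
        # Once the partition ends, replay pending writes on every replica.
        for key, value in self.pending:
            for r in self.replicas:
                r.data[key] = value
        self.pending.clear()
```

During a partition the CP store rejects the write, while the AP store accepts it and leaves the unreachable replica stale until `heal()` runs, which is the "eventual" in eventual consistency.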
Application drives consistency model
Bank accounts
Require strong consistency
High-score updates in a game?
Can survive with just eventual consistency
Need different implementations of databases (and DBaaS) to support different application requirements
Two architectural options
Server-based
Machines with pre-configured database software
Cloud SQL, AWS RDS (MySQL, Postgres, MS SQL Server, etc.)
Many backend databases, many DBaaS
Serverless
Fully managed, NoOps database services that automatically scale
Cloud Datastore, AWS DynamoDB
Cloud Spanner
Serverless applied to databases
[Figure: one task stack repeated for On-premises, IaaS, and DBaaS. Tasks: App optimization; Scaling; High availability; DB backups; DB patching; DB installation; OS patching; OS installation; Server maintenance; Rack and stack; Power, HVAC, network]
Server-based DBaaS
Cloud SQL
AWS RDS (Relational Database Service)
Azure SQL Database
Recall
Drop-in replacement for MySQL or Postgres relational database
AWS RDS with MS SQL Server, Oracle, MariaDB
Uses pre-configured VMs on demand
Vertical scaling (read and write)
Horizontal scaling only for reads via replicas
Accessed via standard drivers
Serverless DBaaS
Cloud Datastore (NoSQL)
AWS DynamoDB
Azure Cosmos DB
Cloud Datastore
Distributed, fully-managed NoSQL database optimized for reading
Schemaless, key-value store
Store entities and objects given a unique key
Stored object can be modified without conforming to some database schema
Limited querying (mostly gets and puts)
NoOps (i.e. serverless operation)
Autoscaled and managed, no configuration
Data automatically stored across multiple zones for availability
Programming API from App Engine for many languages
Cloud Datastore
Data organized by "Kind"
Similar to table in SQL, categorizes entities for queries
Each entity with unique key
Similar to a row in SQL, but not all entities of a Kind have the same properties
Each entity stores properties containing data
Properties similar to columns in SQL
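The Kind, entity, and property vocabulary above can be sketched with plain Python data structures. This is an illustration of the data model only; the `Entity` dataclass and the sample values are hypothetical and do not use the Cloud Datastore client library.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    kind: str   # plays the role of a SQL table name
    key: str    # unique within the kind, like a row identifier
    properties: dict = field(default_factory=dict)  # like columns, but free-form


# Two entities of the same kind with different property sets:
# nothing enforces a shared schema.
alice = Entity("Player", "alice", {"score": 120, "guild": "red"})
bob = Entity("Player", "bob", {"score": 95, "last_login": "2019-07-11"})

# A dict keyed by (kind, key) stands in for the datastore.
datastore = {(e.kind, e.key): e for e in (alice, bob)}
```

Both entities belong to the `Player` kind even though only one has a `guild` property, which a fixed SQL schema would not allow without NULL columns.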
Operations
Insert and retrieve data via put() and get()
Minimal query support via query() and scan()
Scan goes through entire datastore
Query done via indices built using keys specified by application (typically)
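The difference between an indexed query and a full scan can be sketched in memory. The method names mirror the slide; `TinyDatastore` itself is a hypothetical class, not the real client API, and it assumes a single application-chosen indexed property whose values are hashable.

```python
from collections import defaultdict


class TinyDatastore:
    """In-memory sketch of put/get/query/scan with one index."""

    def __init__(self, indexed_property):
        self.indexed_property = indexed_property  # chosen by the application
        self.entities = {}                        # key -> properties dict
        self.index = defaultdict(set)             # property value -> keys

    def put(self, key, properties):
        # Drop any stale index entry before overwriting the entity.
        old = self.entities.get(key)
        if old is not None and self.indexed_property in old:
            self.index[old[self.indexed_property]].discard(key)
        self.entities[key] = properties
        if self.indexed_property in properties:
            self.index[properties[self.indexed_property]].add(key)

    def get(self, key):
        return self.entities.get(key)

    def query(self, value):
        # Uses the index: touches only the matching keys.
        return sorted(self.index.get(value, ()))

    def scan(self, predicate):
        # No index: examines every entity in the datastore.
        return sorted(k for k, p in self.entities.items() if predicate(p))
```

`query` costs time proportional to the number of matches, while `scan` costs time proportional to the whole datastore, which is why scans are best avoided at scale.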
Summary
Transactions     No          Yes         No          Yes
Complex queries  No          No          No          Yes
Capacity         Petabytes+  Terabytes+  Petabytes+  Up to 500GB
Cloud Spanner
Amazon Aurora
"NewSQL"
Cloud Spanner (2017)
Horizontally scalable, relational ACID database
Management of machines explicit
Best of SQL
SQL queries, JOINs
Schemas, strong types
Strong consistency
Indexes, strong secondary keys
Best of NoSQL
Horizontal scaling
Spanner and the CAP theorem
C (consistency) over A (availability) just like ACID
Scale via synchronous replicas (unlike Cloud Datastore)
3 copies by default
But, when partitions happen, go into partition mode
Replicas use consensus mechanism to manage partitions
Replicas on the “majority” side of partition continue, those in minority lose availability
Engineer against P (partitions) via Google’s network to get five 9s reliability
Good for scaling OLTP (On-Line Transaction Processing) applications
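The majority rule described above (the side holding most replicas continues, the rest lose availability) can be sketched as a simple quorum check. This is a simplification under stated assumptions: `sides_that_stay_available` is a hypothetical helper, and a real consensus protocol does far more than count replicas.

```python
def sides_that_stay_available(total_replicas, partition_sides):
    """Return the sides of a partition that still hold a strict majority.

    `partition_sides` maps a side name to the number of replicas on it.
    """
    if sum(partition_sides.values()) != total_replicas:
        raise ValueError("sides must account for every replica")
    quorum = total_replicas // 2 + 1  # strict majority
    return [side for side, count in partition_sides.items() if count >= quorum]
```

With the default of 3 copies, a 2/1 split leaves the two-replica side able to continue; an even split of an even replica count leaves no side with a majority.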
https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45855.pdf
Example use cases
Require SQL with ACID at massive scale
Initially, manually-sharded MySQL
Columns and tables of each database split across multiple nodes
Resharding a multi-year process
Moved to Cloud Spanner
F1 paper: "A Distributed SQL Database that Scales"
https://research.google.com/pubs/pub41344.html
From sharded MySQL to Spanner
https://quizlet.com/blog/quizlet-cloud-spanner
Seamless integration of game data
https://blogs.unity3d.com/2018/06/21/bringing-connected-games-within-reach-with-google-cloud/
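One reason manual resharding, as in the sharded-MySQL cases above, is painful can be sketched with hash-based sharding. Note the assumption: the sketch shards by key hash modulo the shard count, a common simple scheme chosen here for illustration, not necessarily the scheme used in the systems cited; `shard_for` and `keys_moved` are hypothetical helpers.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard with a stable hash (illustrative scheme only)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def keys_moved(keys, old_shards, new_shards):
    """Count keys whose shard changes when the shard count changes."""
    return sum(
        1 for k in keys
        if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
```

Comparing `keys_moved(keys, 4, 5)` against `len(keys)` for a sample key set shows how much data has to be relocated when a single shard is added under this scheme.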
Blockchain.com
Wallet and explorer
https://cloudblog.withgoogle.com/products/databases/blockchain-scaling-and-saving-with-cloud-spanner (7/11/2019)
When it came time for Blockchain to expand its Explorer offering to include the Ethereum network, it turned to Cloud Spanner
"The company has achieved savings of 30% by replacing its previous database layer with (the on-demand scalability of) Cloud Spanner."
Other database types
Cloud Memorystore
In-memory caches
Consider on-line profiles for gamers
Read frequently
Update infrequently
Candidates for caching and replication
Redis
High-performance data retrieval from in-memory store
Used as sub-millisecond application caches for frequently accessed data
Cloud Memorystore, AWS ElastiCache
Hosted, fully managed, Redis/Memcached
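The read-often, update-rarely profile case above is a natural fit for a cache-aside pattern, sketched below. Plain dicts stand in for both the in-memory cache and the database, and `ProfileCache` is a hypothetical class; a real deployment would talk to Redis or Memcached through a client library.

```python
class ProfileCache:
    """Cache-aside over a slower backing store."""

    def __init__(self, database):
        self.database = database  # authoritative store (a dict stand-in)
        self.cache = {}           # stand-in for an in-memory cache such as Redis
        self.db_reads = 0         # counts how often the database is hit

    def read(self, user_id):
        if user_id in self.cache:
            return self.cache[user_id]      # hit: no database access
        self.db_reads += 1
        profile = self.database[user_id]    # miss: load from the database
        self.cache[user_id] = profile
        return profile

    def update(self, user_id, profile):
        self.database[user_id] = profile
        self.cache.pop(user_id, None)       # invalidate so the next read reloads
```

Repeated reads of the same profile reach the database once; an update invalidates the cached copy so the next read picks up the new value.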
Apache Kafka
Log-based datastores
Store data as initial base plus incremental changes
Similar to git and log-structured file systems
Build services around replaying logs
Example: New York Times
https://www.confluent.io/blog/publishing-apache-kafka-new-york-times/
161 years of published content
Requires search
Requires the latest version of content to show up immediately when published
Requires the ability to update search results based on new content
Requires the ability to update personalized results based on new settings of the user
Previous approach
Producers of content running one set of systems
Disparate schemas in each CMS
Disparate APIs to access content
Previous versions of documents not available or difficult to access
Schema and data formatting in silos of software hinder app development
Consumers of content running another set of systems
Disparate ways of operating
Log-based approach
All published content appended to a particular topic queue in chronological order
Services (e.g. search, personalization) access content by consuming the logs
Do away with databases and just keep logs
Databases don't do well with streaming changes
Force one into schemas that are hard to change and evolve
Log consumer always "replays" the log associated with the topic of interest
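The replay idea above can be sketched with an append-only list and a consumer that rebuilds its state from it. A Python list stands in for a Kafka topic; `TopicLog`, `build_search_index`, and the record fields `id` and `headline` are hypothetical names for illustration.

```python
class TopicLog:
    """Append-only log for one topic."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def replay(self, from_offset=0):
        # Consumers read in publication order, starting wherever they choose.
        return list(self._records[from_offset:])


def build_search_index(log):
    """A consumer that derives its own state by replaying the whole log."""
    latest = {}
    for record in log.replay():
        latest[record["id"]] = record["headline"]  # later versions win
    return latest
```

Because the log keeps every version in order, a new service can be added later and build its own view simply by replaying from offset 0.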
Log-based databases
Append-only transactional logs via Kafka
Blockchains (append-only databases)
Azure Blockchain Workbench (2018)
AWS Managed Blockchain, AWS QLDB*
"What the Internet did for communications, blockchain will do for trusted transactions"
- Ginni Rometty, IBM (2018)
Blockchain-as-a-Service
What is a blockchain?
Immutable ledger (transaction log)
Recall CRUD (create, read, update, delete)
Blockchain (append, read)
Highly replicated and distributed
Partition tolerance where majority continue
Eventual consistency
Sound familiar to Cloud Spanner?
It should… https://lemag.sfeir.com/blockchain-and-cloud-spanner/
But, public versions tolerant to byzantine (malicious) participants
Built assuming mutually distrustful participants
Security through cryptography and consensus protocols resistant to byzantine failures
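The "immutable ledger" property can be sketched as a hash-chained, append-only list. This covers only the cryptographic linking of blocks; it does not attempt replication, consensus, or byzantine fault tolerance, and the `Ledger` class and `block_hash` helper are hypothetical names.

```python
import hashlib
import json


def block_hash(index, data, prev_hash):
    """Hash a block's contents together with its predecessor's hash."""
    payload = json.dumps([index, data, prev_hash], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Ledger:
    """Append/read only: each block commits to the hash of the one before it."""

    def __init__(self):
        self.blocks = []

    def append(self, data):
        prev_hash = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        index = len(self.blocks)
        self.blocks.append({
            "index": index,
            "data": data,
            "prev_hash": prev_hash,
            "hash": block_hash(index, data, prev_hash),
        })

    def is_valid(self):
        # Recompute every hash; any edit to an earlier block breaks the chain.
        prev_hash = "0" * 64
        for block in self.blocks:
            expected = block_hash(block["index"], block["data"], prev_hash)
            if block["prev_hash"] != prev_hash or block["hash"] != expected:
                return False
            prev_hash = block["hash"]
        return True
```

Editing the data of an earlier block after the fact makes `is_valid()` return False, which is what makes in-place updates and deletes detectable.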