Infrastructure as a Service (IaaS)



slide-1
SLIDE 1

Google Compute Engine, AWS Elastic Compute Cloud (EC2), Azure Virtual Machines, DigitalOcean

Infrastructure as a Service (IaaS)

slide-2
SLIDE 2

Google Compute Engine (GCE)

 Infrastructure-as-a-Service

 Hardware service for you to create and run virtual machine instances on
 Lowest-level abstraction for cloud infrastructure
 Flexible, but requires management
 Good for arbitrary workloads

 Provides vertical scaling options

 Number of cores
 Amount of RAM
 Video card types
 Type of disk (standard, SSD)
 Up to 96 cores, 684 GB! (10/2017)

 Billed at a sub-minute level

Portland State University CS 430P/530 Internet, Web & Cloud Systems

slide-3
SLIDE 3

 Seen previously

 Segmentation and filtering for security
 Instance templates and groups to auto-scale up at the VM level
 Load balancing to distribute work across VMs globally

slide-4
SLIDE 4

Pre-emptible VMs

 VMs that Google can reclaim at any time if demand spikes

 80% lower in cost!
 Framestore

 Video rendering for visual designers
 15,000 cores needed at peak rendering times
 Unneeded at quiet time
 Not mission-critical, OK to be pre-empted temporarily

 $300k saved by using on-demand, pre-emptible infrastructure over dedicated
 Fault-tolerance built into application to restart interrupted jobs
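
The fault-tolerance point above can be sketched as a job runner that records progress and resumes after each pre-emption. This is a minimal simulation, not Framestore's actual pipeline: `Preempted`, `render_frames`, `run_until_complete`, and the pre-emption schedule are hypothetical names invented for illustration.

```python
class Preempted(Exception):
    """Raised when the (simulated) VM is reclaimed mid-job."""


def render_frames(frames, checkpoint, preempt_at):
    """Render frames in order, recording progress in `checkpoint`.

    `preempt_at` is a set of frame numbers at which the simulated VM is
    reclaimed; each pre-emption fires only once.
    """
    for frame in frames:
        if frame in checkpoint["done"]:
            continue  # already rendered before an earlier pre-emption
        if frame in preempt_at:
            preempt_at.discard(frame)
            raise Preempted(f"VM reclaimed before frame {frame}")
        checkpoint["done"].add(frame)


def run_until_complete(frames, preempt_at):
    """Restart the job after every pre-emption until all frames are done."""
    checkpoint = {"done": set()}
    restarts = 0
    while True:
        try:
            render_frames(frames, checkpoint, preempt_at)
            return checkpoint["done"], restarts
        except Preempted:
            restarts += 1  # a real system would request a new VM here


done, restarts = run_until_complete(range(10), preempt_at={3, 7})
```

Because completed frames are kept in the checkpoint, each restart repeats no finished work; only the interrupted frame is retried.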


slide-5
SLIDE 5

Compute Engine access

 All access methods eventually hit the API

REST API directly

slide-6
SLIDE 6

Via Web UI

 console.cloud.google.com

slide-7
SLIDE 7

Via Cloud Shell or SDK

 Command-line interface (CLI) with myriad command-line options
 List

gcloud compute instances list

 Create

gcloud compute instances create myinstance

 Delete

gcloud compute instances delete myinstance


slide-8
SLIDE 8

Via libraries and API

 Through client libraries in many different languages (Python, JavaScript, Go, etc.)

 Translates into HTTP/JSON API requests
 Python example via google-api-python-client Python package

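import googleapiclient.discovery

# Build a client for version v1 of the Compute Engine API
compute = googleapiclient.discovery.build('compute', 'v1')

def list_instances(compute, project, zone):
    # Return the VM instances in one zone of a project, or None if there are none
    result = compute.instances().list(project=project, zone=zone).execute()
    return result['items'] if 'items' in result else None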

slide-9
SLIDE 9

Via REST API

 Demo via interactive API Explorer

 From web console, APIs and Services ➔ Library ➔ Compute Engine API ➔ Try this API in APIs Explorer
 Example: listing instances via API

 Enable OAuth2

 compute.instances.list

slide-10
SLIDE 10

 Example

 REST API call

 GET

https://compute.googleapis.com/compute/v1/projects/{project}/zones/{zone}/instances
 JSON response
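
As a small illustration of the REST call above, the URL template can be filled in with the standard library. The helper name `instances_list_url` and the project and zone values are placeholders for illustration; a real request would also need an OAuth2 bearer token in its `Authorization` header, which this sketch does not obtain.

```python
from urllib.parse import quote

BASE = "https://compute.googleapis.com/compute/v1"


def instances_list_url(project, zone):
    """Return the instances.list URL for one project and zone."""
    project = quote(project, safe="")  # escape anything unsafe in a path segment
    zone = quote(zone, safe="")
    return f"{BASE}/projects/{project}/zones/{zone}/instances"


# Placeholder project and zone names
url = instances_list_url("my-project", "us-west1-b")
```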

slide-11
SLIDE 11

Storage as a Service

slide-12
SLIDE 12

Background: Google File System (GFS) 2003

 Designed to support Google Search

 Retrieving, storing, and querying of web pages at massive scale

 Goals

 Large data sets, high-throughput, low-latency querying
 Durability and availability with very little management overhead

 Dead disks simply replaced and system seamlessly adapts

 Done via horizontal scaling and replication

 http://research.google.com/archive/gfs-sosp2003.pdf

 But, initially proprietary

 Yahoo! later reverse-engineered GFS
 Released as Hadoop Distributed File System (HDFS)
 Open-sourced and distributed by Apache

 Spun out commercially into …

slide-13
SLIDE 13

Google Cloud Storage (GCS)

 AWS equivalent is S3
 Fully-managed (i.e. serverless), no-ops storage service

 No administration or capacity management
 Backed up and versioned automatically

 Replicated and cached over multiple zones/regions

 Fixed region for local computation
 Multi-region for global file delivery
 Adapts to load and access patterns for high availability and throughput

 Low latency: 10s of ms on first use, then faster via migration
 Data encrypted at rest when not being used and in flight

 Key sharding with parts of keys in multiple jurisdictions
 But, unencrypted when being used

 Massive scale

 Autism Speaks: 1300 genomes and > 100 TB of data
 Projected to 10,000 genomes and > 1 PB of data

slide-14
SLIDE 14

Model

 Storage done via "buckets"

 Buckets, like URLs, must be uniquely identifiable

 Object-level storage

 Access storage similar to accessing objects over the web
 Present an identifier, receive it back in its entirety
 Different from block-level storage of disks
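
The object-level model can be sketched with a toy in-memory bucket: objects are written and read whole, keyed by name. `Bucket` here is a hypothetical class for illustration only, not the Cloud Storage client API.

```python
class Bucket:
    """Toy object store: whole objects are written and read by name."""

    def __init__(self, name):
        self.name = name
        self._objects = {}

    def put(self, object_name, data: bytes):
        # Objects are replaced as a unit; there is no partial in-place update.
        self._objects[object_name] = bytes(data)

    def get(self, object_name) -> bytes:
        # Present an identifier, receive the object back in its entirety.
        return self._objects[object_name]


bucket = Bucket("xx-yy-zz")
bucket.put("images/logo.png", b"\x89PNG...")
bucket.put("images/logo.png", b"new contents")  # overwrite = full replacement
data = bucket.get("images/logo.png")
```

A block device, by contrast, would let a caller rewrite an individual block in the middle of a file without resending the rest.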

slide-15
SLIDE 15

Applications

 Good for large unstructured data that does not need to be queried

 Images, Video, Zip files
 Structured data that needs to be queried should use DBs

 Used to feed and store data and logs from all cloud services

 BigQuery, App Engine, Cloud SQL, Compute Engine, Dataflow/Dataproc, etc.

slide-16
SLIDE 16

Access

 Web interface
 SDK via gsutil command and gs:// URI

gsutil ls
gsutil mb gs://xx-yy-zz

 Client libraries (Python google-cloud-storage)
 REST API (Storage JSON API)

from google.cloud import storage

# Create a client using the default credentials, then create a bucket
storage_client = storage.Client()
bucket_name = 'my-new-bucket'
bucket = storage_client.create_bucket(bucket_name)
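
To make the gs:// URI form concrete, here is a minimal sketch that splits such a URI into its bucket and object name using only the standard library. `parse_gs_uri` is a hypothetical helper, and the object path is an invented example; object names containing `?` or `#` would need extra handling that this sketch omits.

```python
from urllib.parse import urlsplit


def parse_gs_uri(uri):
    """Split a gs:// URI into (bucket, object_name)."""
    parts = urlsplit(uri)
    if parts.scheme != "gs" or not parts.netloc:
        raise ValueError(f"not a gs:// URI: {uri!r}")
    # The bucket sits where a hostname would; the rest is the object name.
    return parts.netloc, parts.path.lstrip("/")


bucket, obj = parse_gs_uri("gs://xx-yy-zz/logs/2019/app.log")
```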

slide-17
SLIDE 17

Database as a Service

slide-18
SLIDE 18

Main types

 SQL

 Relational structured data
 Complex querying using relations
 Schema (statically typed data)
 Strict transactional consistency
 Vertical scaling

 NoSQL

 Non-relational, unstructured data
 Simple, fast key-value lookup
 Schemaless (dynamically typed data)
 Loose eventual consistency
 Horizontal scaling

What explains the last two design patterns?

slide-19
SLIDE 19

CAP Theorem (Fox/Brewer 2000)

 Any networked system can have at most two of three desirable properties

 C = consistency
 A = availability
 P = partition-tolerance

 Cannot have strong consistency in the wake of network outages with high availability

 Two consistency options for networked databases

 ACID (atomicity, consistency, isolation, durability)

 To achieve strong consistency, lose “A” availability in the face of a network partition “P”

 Cannot perform transactions until all replicas fully on-line
 Cloud SQL

 BASE (basically available, soft state, eventual consistency)

 To achieve high availability, lose “C” in the face of a network partition “P”
 Cloud Bigtable & Cloud Datastore
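
The trade-off between the two options can be sketched with a toy replica set during a partition. This is an illustrative simulation only: `Replica`, `write_cp`, and `write_ap` are hypothetical names, and real databases use far more involved protocols.

```python
class Replica:
    def __init__(self):
        self.value = None
        self.reachable = True  # False simulates being cut off by a partition


def write_cp(replicas, value):
    """ACID-style: refuse the write unless every replica is reachable."""
    if not all(r.reachable for r in replicas):
        return False  # give up availability to stay consistent
    for r in replicas:
        r.value = value
    return True


def write_ap(replicas, value):
    """BASE-style: accept the write on whichever replicas are reachable."""
    for r in replicas:
        if r.reachable:
            r.value = value
    return True  # stays available; unreachable replicas are now stale


cp = [Replica(), Replica(), Replica()]
ap = [Replica(), Replica(), Replica()]
cp[2].reachable = ap[2].reachable = False  # network partition

cp_ok = write_cp(cp, "balance=100")
ap_ok = write_ap(ap, "score=42")
ap_values = [r.value for r in ap]
```

The consistent store rejects the write and every replica still agrees; the available store accepts it and leaves one replica stale until the partition heals.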

slide-20
SLIDE 20

Application drives consistency model

 Bank accounts

 Require strong consistency

 High-score updates in a game?

 Can survive with just eventual consistency

 Need different implementations of databases (and DBaaS) to support different application requirements

slide-21
SLIDE 21

Two architectural options

 Server-based

 Machines with pre-configured database software
 Cloud SQL, AWS RDS (MySQL, Postgres, MS SQL Server, etc.)
 Many backend databases, many DBaaS

 Serverless

 Fully managed, NoOps, database services that automatically scale
 Cloud Datastore, AWS DynamoDB
 Cloud Spanner

slide-22
SLIDE 22

Serverless applied to databases

Figure: the same stack of responsibilities shown for On-premises, IaaS, and DBaaS. From top to bottom: App optimization, Scaling, High availability, DB backups, DB patching, DB installation, OS patching, OS installation, Server maintenance, Rack and stack, Power, HVAC, network.

slide-23
SLIDE 23

Server-based DBaaS

slide-24
SLIDE 24

Cloud SQL

AWS RDS (Relational Database Service), Azure SQL Database

slide-25
SLIDE 25

Recall

 Drop-in replacement for MySQL or Postgres relational database

 AWS RDS with MS SQL Server, Oracle, MariaDB

 Uses pre-configured VMs on demand

 Vertical scaling (read and write)
 Horizontal scaling only for reads via replicas

 Accessed via standard drivers
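
"Standard drivers" in Python generally means the DB-API 2.0 pattern of connect, cursor, execute, and fetch. The sketch below uses the built-in `sqlite3` module purely as a stand-in so it runs anywhere; connecting to Cloud SQL would use a MySQL or Postgres driver with real connection details instead. The `scores` table is invented for illustration, and the `?` placeholder style is specific to `sqlite3` (many MySQL and Postgres drivers use `%s`).

```python
import sqlite3  # stand-in driver; Cloud SQL would use a MySQL or Postgres driver

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Schema first: relational stores are statically typed
cur.execute("CREATE TABLE scores (player TEXT PRIMARY KEY, points INTEGER)")
cur.execute("INSERT INTO scores VALUES (?, ?)", ("alice", 10))
cur.execute("INSERT INTO scores VALUES (?, ?)", ("bob", 7))
conn.commit()

# Query through the same cursor interface
cur.execute("SELECT player, points FROM scores ORDER BY points DESC")
rows = cur.fetchall()
conn.close()
```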

slide-26
SLIDE 26

Serverless DBaaS

slide-27
SLIDE 27

Cloud Datastore (NoSQL)

AWS DynamoDB, Azure Cosmos DB

slide-28
SLIDE 28

Cloud Datastore

 Distributed, fully-managed NoSQL database optimized for reading

 Schemaless, key-value store

 Store entities and objects given a unique key
 Stored object can be modified without conforming to some database schema

 Limited querying (mostly gets and puts)
 NoOps (i.e. serverless operation)

 Autoscaled and managed, no configuration
 Data automatically stored across multiple zones for availability
 Programming API from App Engine for many languages

slide-29
SLIDE 29

Cloud Datastore

 Data organized by "Kind"

 Similar to table in SQL, categorizes entities for queries

 Each entity with unique key

 Similar to a row in SQL, but not all entries of a Kind have the same properties

 Each entity stores properties containing data

 Properties similar to columns in SQL

slide-30
SLIDE 30

Operations

 Insert and retrieve data via put() and get()
 Minimal query support via query() and scan()

 Scan goes through entire datastore
 Query done via indices built using keys specified by application (typically)
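
The four operations can be sketched with a toy in-memory store. This is not the Cloud Datastore client API: `ToyDatastore` and its method signatures are hypothetical, chosen only to show a schemaless put/get, an index-backed query, and a full scan side by side.

```python
from collections import defaultdict


class ToyDatastore:
    """In-memory stand-in for a schemaless key-value store with one index."""

    def __init__(self, indexed_property):
        self._entities = {}               # (kind, key) -> properties dict
        self._index = defaultdict(set)    # (kind, value) -> set of keys
        self._indexed = indexed_property

    def put(self, kind, key, **properties):
        old = self._entities.get((kind, key))
        if old is not None and self._indexed in old:
            # Drop the stale index entry before overwriting the entity
            self._index[(kind, old[self._indexed])].discard(key)
        self._entities[(kind, key)] = properties
        if self._indexed in properties:
            self._index[(kind, properties[self._indexed])].add(key)

    def get(self, kind, key):
        return self._entities.get((kind, key))

    def query(self, kind, value):
        """Index lookup: touches only the matching entities."""
        return sorted(self._index[(kind, value)])

    def scan(self, kind, predicate):
        """Full scan: examines every stored entity."""
        return sorted(k for (knd, k), props in self._entities.items()
                      if knd == kind and predicate(props))


db = ToyDatastore(indexed_property="team")
db.put("Player", "alice", team="red", score=10)
db.put("Player", "bob", team="blue")            # no score: schemaless
db.put("Player", "carol", team="red", score=3)

red_team = db.query("Player", "red")
has_score = db.scan("Player", lambda p: "score" in p)
```

Entities of the same Kind carry different properties here, which mirrors the row-versus-entity contrast drawn on the previous slide.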

slide-31
SLIDE 31

Summary

Transactions      No          Yes         No          Yes
Complex queries   No          No          No          Yes
Capacity          Petabytes+  Terabytes+  Petabytes+  Up to 500GB

slide-32
SLIDE 32

Cloud Spanner

Amazon Aurora "NewSQL"

slide-33
SLIDE 33

Cloud Spanner (2017)

 Horizontally scalable, relational ACID database

 Management of machines explicit

 Best of SQL

 SQL queries, JOINs
 Schemas, strong types
 Strong consistency
 Indexes, strong secondary keys

 Best of NoSQL

 Horizontal scaling

slide-34
SLIDE 34

Spanner and the CAP theorem

 C (consistency) over A (availability) just like ACID
 Scale via synchronous replicas (unlike Cloud Datastore)

 3 copies by default

 But, when partitions happen, go into partition mode

 Replicas use consensus mechanism to manage partitions
 Replicas on the “majority” side of partition continue, those in minority lose availability

 Engineer against P (partitions) via Google’s network to get five 9s of reliability

 Good for scaling OLTP (On-Line Transaction Processing) applications

https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45855.pdf
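
The majority rule during a partition can be sketched as a simple quorum check. This shows only the counting argument, not Spanner's actual consensus protocol; `available_groups` and the replica names are hypothetical.

```python
def available_groups(groups, total_replicas):
    """Return the partition groups that may keep serving.

    `groups` is a list of sets of replica names, one set per side of the
    partition. Only a side holding a strict majority of all replicas stays
    available, so at most one side can.
    """
    quorum = total_replicas // 2 + 1
    return [g for g in groups if len(g) >= quorum]


replicas = {"r1", "r2", "r3"}          # 3 copies by default, per the slide
split = [{"r1", "r2"}, {"r3"}]         # a partition isolates r3
serving = available_groups(split, len(replicas))

# With an even split, neither side has a majority
even_split = available_groups([{"a", "b"}, {"c", "d"}], 4)
```

Requiring a strict majority is what prevents both sides of a partition from accepting conflicting writes.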

slide-35
SLIDE 35

Example use cases

 Require SQL with ACID at massive scale
 Initially, manually-sharded MySQL

 Columns and tables of each database split across multiple nodes
 Resharding a multi-year process
 Moved to Cloud Spanner
 F1 paper: "A Distributed SQL Database that Scales"

https://research.google.com/pubs/pub41344.html

 From sharded MySQL to Spanner

 https://quizlet.com/blog/quizlet-cloud-spanner

 Seamless integration of game data

 https://blogs.unity3d.com/2018/06/21/bringing-connected-games-within-reach-with-google-cloud/
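
One way to see why resharding a manually sharded database is painful is to count how many keys change shards when the shard count changes. The slides do not say which sharding scheme was used; naive modulo sharding is assumed here purely for illustration, and `shard_for` and `moved_fraction` are hypothetical helpers.

```python
def shard_for(key, num_shards):
    """Naive modulo sharding: the shard depends on the shard count."""
    return key % num_shards


def moved_fraction(keys, old_shards, new_shards):
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(1 for k in keys
                if shard_for(k, old_shards) != shard_for(k, new_shards))
    return moved / len(keys)


keys = range(1000)
# Growing from 4 shards to 5 relocates most of the data
fraction = moved_fraction(keys, old_shards=4, new_shards=5)
```

Under this assumption, adding a single shard forces 80% of the keys to move, which hints at why a managed, automatically rebalancing store is attractive.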

slide-36
SLIDE 36

 Blockchain.com

 Wallet and explorer
 https://cloudblog.withgoogle.com/products/databases/blockchain-scaling-and-saving-with-cloud-spanner (7/11/2019)

 When it came time for Blockchain to expand its Explorer offering to include the Ethereum network, it turned to Cloud Spanner

 "The company has achieved savings of 30% by replacing its previous database layer with (the on-demand scalability of) Cloud Spanner."

slide-37
SLIDE 37

Other database types

slide-38
SLIDE 38

Cloud Memorystore

slide-39
SLIDE 39

In-memory caches

 Consider on-line profiles for gamers

 Read frequently
 Update infrequently

 Candidates for caching and replication
 Redis

 High-performance data retrieval from in-memory store
 Used as sub-millisecond application caches for frequently accessed data

 Cloud Memorystore, AWS ElastiCache

 Hosted, fully managed, Redis/Memcached
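
The read-frequently, update-infrequently case maps onto the cache-aside pattern. In this sketch a plain dict stands in for Redis or Memcached, and `ProfileService` is a hypothetical class for illustration; a real deployment would call a cache client over the network instead.

```python
class ProfileService:
    """Cache-aside reads; a dict stands in for Redis/Memcached."""

    def __init__(self, database):
        self._db = database      # slow, authoritative store
        self._cache = {}         # fast in-memory cache
        self.db_reads = 0

    def get_profile(self, gamer_id):
        if gamer_id in self._cache:
            return self._cache[gamer_id]      # cache hit
        self.db_reads += 1                    # cache miss: go to the database
        profile = self._db[gamer_id]
        self._cache[gamer_id] = profile
        return profile

    def update_profile(self, gamer_id, profile):
        self._db[gamer_id] = profile
        self._cache.pop(gamer_id, None)       # invalidate the stale copy


svc = ProfileService({"g1": {"name": "Ada", "level": 3}})
for _ in range(100):
    svc.get_profile("g1")                     # read frequently
reads_before_update = svc.db_reads
svc.update_profile("g1", {"name": "Ada", "level": 4})  # update infrequently
level = svc.get_profile("g1")["level"]
```

A hundred reads cost a single database access; the update invalidates the cached copy so the next read fetches the new value.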

slide-40
SLIDE 40

Apache Kafka

slide-41
SLIDE 41

Log-based datastores

 Store data as initial base plus incremental changes

 Similar to git and log-structured file systems

 Build services around replaying logs
 Example: New York Times

 https://www.confluent.io/blog/publishing-apache-kafka-new-york-times/

 161 years of published content
 Requires search
 Requires the latest version of content to show up immediately when published

 Requires the ability to update search results based on new content
 Requires the ability to update personalized results based on new settings of the user

slide-42
SLIDE 42

Previous approach

 Producers of content running one set of systems

 Disparate schemas in each CMS
 Disparate APIs to access content
 Previous versions of documents not available or difficult to access
 Schema and data formatting in silos of software hinder app development

 Consumers of content running another set of systems

 Disparate ways of operating

slide-43
SLIDE 43

Log-based approach

 All published content appended to a particular topic queue in chronological order

 Services (e.g. search, personalization) access content by consuming the logs

 Do away with databases and just keep logs

 Databases don't do well with streaming changes
 Force one into schemas that are hard to change and evolve

 Log consumer always "replays" the log associated with the topic of interest
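
The replay idea can be sketched with an append-only list and a consumer that rebuilds its own view from offset 0. This is not Kafka's client API: `TopicLog`, `build_latest_view`, and the record fields are hypothetical names for illustration.

```python
class TopicLog:
    """Append-only log; consumers read from an offset they track themselves."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1      # offset of the new record

    def read_from(self, offset):
        return self._records[offset:]


def build_latest_view(log):
    """Replay the whole log; later records for an id supersede earlier ones."""
    view = {}
    for record in log.read_from(0):
        view[record["id"]] = record["headline"]
    return view


log = TopicLog()
log.append({"id": "a1", "headline": "Draft headline"})
log.append({"id": "b2", "headline": "Second story"})
log.append({"id": "a1", "headline": "Corrected headline"})

latest = build_latest_view(log)
```

The log keeps every version, so a different consumer could replay the same records to build a revision history instead of a latest-version view.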

slide-44
SLIDE 44

Log-based databases

 Append-only transactional logs via Kafka

slide-45
SLIDE 45

Blockchains (append-only databases)

Azure Blockchain Workbench (2018), AWS Managed Blockchain, AWS QLDB*

"What the Internet did for communications, blockchain will do for trusted transactions"
Ginni Rometty, IBM (2018)
slide-46
SLIDE 46

Blockchain-as-a-Service

 What is a blockchain?
 Immutable ledger (transaction log)

 Recall CRUD (create, read, update, delete)
 Blockchain (append, read)

 Highly replicated and distributed

 Partition tolerance where majority continue
 Eventual consistency
 Sounds similar to Cloud Spanner?

 It should… https://lemag.sfeir.com/blockchain-and-cloud-spanner/

 But, public versions tolerant to byzantine (malicious) participants

 Built assuming mutually distrustful participants
 Security through cryptography and consensus protocols resistant to byzantine failures
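
The "immutable ledger" property comes from each block committing to the hash of its predecessor. The sketch below shows only that hash-linking with the standard library; it has no consensus protocol, signatures, or replication, and `block_hash`, `append_block`, and `is_valid` are hypothetical helpers.

```python
import hashlib
import json


def block_hash(index, data, prev_hash):
    payload = json.dumps([index, data, prev_hash]).encode()
    return hashlib.sha256(payload).hexdigest()


def append_block(chain, data):
    """Append-only: a new block commits to the hash of the previous one."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append({"index": index, "data": data, "prev_hash": prev_hash,
                  "hash": block_hash(index, data, prev_hash)})


def is_valid(chain):
    """Recompute every hash and check each block links to its predecessor."""
    prev_hash = "0" * 64
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(i, block["data"], prev_hash):
            return False
        prev_hash = block["hash"]
    return True


ledger = []
append_block(ledger, "alice pays bob 5")
append_block(ledger, "bob pays carol 2")
valid_before = is_valid(ledger)
ledger[0]["data"] = "alice pays bob 500"   # tamper with history
valid_after = is_valid(ledger)
```

Editing an old entry changes its hash, which breaks the link from every later block, so tampering is detectable by anyone who re-verifies the chain.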

slide-47
SLIDE 47

Data warehouses

Google BigQuery, AWS Athena, Azure Data Lake