Cloud Computing
RICS tutorial
Dan C. Marinescu
Computer Science Division EECS Department, UCF
Email:dcm@cs.ucf.edu
The tutorial is based on the book Cloud Computing: Theory and Practice (ISBN-13: 978-0124046276), published by Morgan Kaufmann.
http://www.amazon.com/Cloud-Computing-Practice-Dan-Marinescu/dp/0124046274/ref=sr_1_4?s=books&ie=UTF8&qid=1365357500&sr=1-4&keywords=Dan+C.+Marinescu
Cloud Computing - RICS May 2013 2
Network-centric computing and network-centric content.
Cloud computing: the good, challenges, and vulnerabilities.
Types of clouds.
Cloud delivery models.
Information processing can be done more efficiently on large farms
Grid computing – initiated by the National Labs in the early 1990s; targeted primarily at scientific computing.
Utility computing – initiated in 2005-2006 by IT companies and targeted at enterprise computing.
The focus of utility computing is on the business model for providing
Cloud computing is a path to utility computing embraced by major IT
Content: any type or volume of media, be it static or dynamic,
The “Future Internet” will be content-centric; the creation and
Data-intensive: large scale simulation in science and engineering
Network-intensive: transferring large volumes of data requires high
Low-latency networks for data streaming, parallel computing,
The systems are accessed using thin clients running on systems
The infrastructure should support some form of workflow
The web and the semantic web - expected to support composition of
The Grid - initiated in the early 1990s by National Laboratories and
Peer-to-peer systems
Computer clouds
Uses Internet technologies to offer scalable and elastic services. The
The resources used for these services can be metered and the users
The maintenance and security are ensured by service providers. The service providers can operate more efficiently due to specialization
Lower costs for the cloud service provider are passed to the cloud
Data is stored:
closer to the site where it is used;
in a device and in a location-independent manner.
The data storage strategy can increase reliability, as well as
Public Cloud - the infrastructure is made available to the general
Private Cloud - infrastructure operated solely for an organization.
Community Cloud - the infrastructure is shared by several
Hybrid Cloud - composition of two or more clouds (public, private, or
Resources such as CPU cycles, storage, network bandwidth are
When multiple applications share a system their peak demands for
Resources can be aggregated to support data-intensive
Data sharing facilitates collaborative activities.
Many applications
Eliminate the initial investment costs for a private computing
Cost reduction: concentration of resources creates the opportunity to
Elasticity: the ability to accommodate workloads with very large
User convenience: virtualization allows users to operate in familiar
It is in a better position to exploit recent advances in software, networking,
It is focused on enterprise computing; its adoption by industrial
A cloud consists of a homogeneous set of hardware and software
The resources are in a single administrative domain (AD). Security,
Availability of service; what happens when the service provider
Diversity of services, data organization, user interfaces available
Data confidentiality and auditability, a serious problem.
Data transfer bottleneck; many applications are data-intensive.
Performance unpredictability, one of the consequences of resource
How to use resource virtualization and performance isolation for QoS guarantees?
How to support elasticity, the ability to scale up and down quickly?
Resource management; is self-organization and self-management a
Security and confidentiality; major concern.
Addressing these challenges provides good research
Figure: cloud computing delivery models, deployment models, defining attributes, resources, and infrastructure.
Delivery models: Infrastructure as a Service (IaaS), Software as a Service (SaaS), Platform as a Service (PaaS).
Deployment models: private cloud, hybrid cloud, public cloud, community cloud.
Defining attributes: massive infrastructure, accessible via the Internet, utility computing, pay-per-usage, elasticity.
Resources: networks, compute and storage servers, services, applications.
Infrastructure: distributed infrastructure, resource virtualization, autonomous systems.
Applications are supplied by the service provider. The user does not manage or control the underlying cloud
Services offered include:
Enterprise services such as: workflow management, groupware and collaborative, supply chain, communications, digital signature, customer relationship management (CRM), desktop software, financial management, geo-spatial, and search.
Web 2.0 applications such as: metadata management, social networking, blogs, wiki services, and portal services.
Not suitable for real-time applications or those where data is not
Examples: Gmail, Google search engine.
Allows a cloud user to deploy consumer-created or acquired
The user:
has control over the deployed applications and, possibly, application hosting environment configurations;
does not manage or control the underlying cloud infrastructure including network, servers, operating systems, or storage.
Not particularly useful when:
the application must be portable;
proprietary programming languages are used;
the hardware and software must be customized to improve the performance of the application.
The user is able to deploy and run arbitrary software, which can
The user does not manage or control the underlying cloud
Services offered by this delivery model include: server hosting, web
Figure: the layers of the three delivery models.
Infrastructure as a Service: facilities, hardware, core connectivity, abstraction, API.
Platform as a Service: the IaaS layers plus integration and middleware.
Software as a Service: the PaaS layers plus data, metadata, applications, API, and presentation.
Figure: a cloud computing reference architecture. Actors: service consumer, service provider, carrier, broker, and auditor. The service provider box contains service management (business support, provisioning, portability/interoperability), the service layer (SaaS, PaaS, IaaS), the resource abstraction and control layer, and the physical resource layer (hardware, facility), together with security and privacy. The auditor performs security, privacy impact, and performance audits. The broker provides intermediation, aggregation, and arbitrage.
Paradigm shift with implications for computing ethics:
the control is relinquished to third-party services;
the data is stored on multiple sites administered by several
multiple services interoperate across the network.
Implications
unauthorized access; data corruption; infrastructure failure, and service unavailability.
Systems can span the boundaries of multiple organizations and
The complex structure of cloud services can make it difficult to
Identity fraud and theft are made possible by the unauthorized
Cloud service providers have already collected petabytes of
Privacy is affected by cultural differences; some cultures favor
Clouds are affected by malicious attacks and failures of the
Such events can affect the Internet domain name servers and
In 2004, an attack on Akamai caused a domain name outage and a major blackout that affected Google, Yahoo, and other sites.
In 2009, Google was the target of a denial-of-service attack that took down Google News and Gmail for several days.
In 2012, lightning caused a prolonged downtime at Amazon.
IaaS services from Amazon
Open-source platforms for private clouds
Cloud storage diversity and vendor lock-in
Cloud interoperability; the Intercloud
Energy use and ecological impact of large data centers
Service and compliance level agreements
Responsibility sharing between user and the cloud service provider
The cloud computing infrastructure at Amazon, Google, and Microsoft
Amazon is a pioneer in Infrastructure-as-a-Service (IaaS).
Google's efforts are focused on Software-as-a-Service (SaaS) and Platform-as-a-Service (PaaS).
Microsoft is involved in PaaS
Private clouds are an alternative to public clouds. Open-source cloud
Eucalyptus, OpenNebula, Nimbus, OpenStack
Amazon offers cloud services through a network of data centers on
In each region there are several availability zones interconnected by
An availability zone is a data center consisting of a large number of
Regions do not share resources and communicate through the
Figure: the services offered by AWS. Compute servers run EC2 instances; AWS storage servers provide S3, EBS, and SimpleDB; other servers run AWS services such as SQS, CloudWatch, the AWS management console, Elastic Beanstalk, CloudFront, ElastiCache, Elastic Load Balancer, and CloudFormation. The components are connected by a cloud interconnect; the figure also shows the Internet and NAT.
Retrieve the user input from the front-end.
Retrieve the disk image of a VM (Virtual Machine) from a repository.
Locate a system and request the VMM (Virtual Machine Monitor)
Invoke the Dynamic Host Configuration Protocol (DHCP) and the IP
A main attraction of the Amazon cloud computing is the low cost.
Route 53 - low-latency DNS service used to manage user's DNS public
Elastic MapReduce (EMR) - supports processing of large amounts of
Simple Workflow Service (SWF) - supports workflow management;
ElastiCache - enables web applications to retrieve data from a
DynamoDB - scalable and low-latency fully managed NoSQL database
CloudFront - web service for content delivery.
Elastic Load Balancer - automatically distributes the incoming
Elastic Beanstalk - automatically handles deployment, capacity
CloudFormation - allows the creation of a stack describing the
Automatically handles the deployment, capacity provisioning, load
Interacts with other services including EC2, S3, SNS, Elastic Load
The management functions provided by the service are:
deploy a new application version (or roll back to a previous version);
access to the results reported by the CloudWatch monitoring service;
email notifications when application status changes or application servers are added or removed; and
access to server log files without needing to log in to the application servers.
The service is available using: a Java platform, the PHP server-side
Eucalyptus - can be regarded as an open-source counterpart of
OpenNebula - a private cloud with users actually logging into the
Nimbus - a cloud solution for scientific applications based on Globus
the image storage, the credentials for user authentication, the requirement that a running Nimbus process can ssh into all
Risks when a large organization relies on a single cloud
cloud services may be unavailable for a short, or an extended
permanent data loss in case of a catastrophic system failure;
the provider may increase the prices for service.
Switching to another provider could be very costly due
A solution is to replicate the data to multiple cloud
Figure: (a) a RAID 5 controller stripes data blocks (a1, a2, a3, b1, ...) and their parity blocks (aP, bP, cP, dP) across Disk 1 to Disk 4; (b) a proxy stripes the same blocks and parity blocks across Cloud 1 to Cloud 4 on behalf of a client.
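Striping with parity across providers, as in panel (b), can be sketched in a few lines: a proxy splits an object into data blocks, computes one XOR parity block, and stores each block with a different cloud, so the object survives the loss of any single provider. This is a minimal sketch under stated assumptions: the four providers are simulated by an in-memory dictionary, the names `stripe`, `xor_blocks`, and `recover` are hypothetical, and, unlike RAID 5, the parity block is not rotated across providers.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

def stripe(data: bytes, n_data: int):
    """Split data into n_data equal blocks (zero-padded) plus one XOR parity block."""
    size = -(-len(data) // n_data)              # ceiling division
    padded = data.ljust(size * n_data, b"\0")
    blocks = [padded[i * size:(i + 1) * size] for i in range(n_data)]
    return blocks + [xor_blocks(blocks)]

def recover(stored, lost_index):
    """Rebuild the block held by a failed provider from the surviving blocks."""
    survivors = [b for i, b in enumerate(stored) if i != lost_index]
    return xor_blocks(survivors)

# Hypothetical providers: cloud-1..cloud-3 hold data blocks, cloud-4 holds parity.
original = b"data replicated across several cloud providers"
clouds = {f"cloud-{i + 1}": block for i, block in enumerate(stripe(original, 3))}

# Simulate an outage of cloud-2 and rebuild its block from the other three.
stored = list(clouds.values())
rebuilt = recover(stored, lost_index=1)
print(rebuilt == clouds["cloud-2"])  # True
```

The storage overhead is one extra block per stripe instead of a full copy per provider, which is the usual argument for parity over plain replication.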
Is an Intercloud, a federation of clouds that cooperate to
Not likely at this time:
there are no standards for either storage or processing;
the clouds are based on different delivery models;
the set of services supported by these delivery models is large
CSPs (Cloud Service Providers) believe that they have a
Security is a major concern for cloud users and an Intercloud
The energy consumption of large-scale data centers and their costs for
In 2006, the 6,000 data centers in the U.S. consumed 61 × 10^9 kWh of
The energy consumed by the data centers was expected to double from
The greenhouse gas emission due to the data centers is estimated to
The effort to reduce energy use is focused on computing, networking,
Operating efficiency of a system is captured by the performance per
The performance of supercomputers has increased 3.5 times faster
A typical Google cluster spends most of its time within the 10-50%
An energy-proportional system consumes no power when idle, very
By definition, an ideal energy-proportional system is always
Humans are a good approximation of an ideal energy proportional
Even when power requirements scale linearly with the load, the
Figure: power usage and energy efficiency (percent) as functions of the percentage of utilization, with the typical operating region marked.
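The effect of idle power on energy efficiency can be made concrete with a small model. The sketch assumes, for illustration only, a server that draws 50% of its peak power when idle and whose power grows linearly with utilization, and compares it with an ideal energy-proportional system; efficiency is taken as utilization divided by power, both as fractions of their maximum, so a fully loaded server scores 1.0.

```python
def power_linear(u: float, idle_fraction: float = 0.5) -> float:
    """Power (fraction of peak) growing linearly from an idle level.

    idle_fraction = 0.5 is an illustrative assumption, not a measured value.
    """
    return idle_fraction + (1.0 - idle_fraction) * u

def power_ideal(u: float) -> float:
    """Ideal energy-proportional system: no power when idle, peak power at full load."""
    return u

def efficiency(u: float, power) -> float:
    """Work per unit of power, normalized so that full load gives 1.0."""
    return u / power(u) if u > 0 else 0.0

for u in (0.1, 0.3, 0.5, 1.0):
    print(f"utilization {u:.0%}: linear {efficiency(u, power_linear):.2f}, "
          f"ideal {efficiency(u, power_ideal):.2f}")
```

In this model a server at 30% utilization, inside the 10-50% range mentioned above for Google clusters, delivers only about 0.46 of its peak efficiency, although its power does scale linearly with the load.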
SLA - a negotiated contract between the customer and CSP; can be
Identify and define the customer’s needs and constraints including the level of resources, security, timing, and QoS.
Provide a framework for understanding; a critical aspect of this framework is a clear definition of classes of service and the costs.
Simplify complex issues; clarify the boundaries between the responsibilities of clients and CSP in case of failures.
Reduce areas of conflict.
Encourage dialog in the event of disputes.
Eliminate unrealistic expectations.
Specifies the services that the customer receives, rather than how
Figure: the limits of responsibility between the cloud user and the cloud service provider for SaaS, PaaS, and IaaS. Each stack consists of interface, application, operating system, hypervisor, computing service, storage service, network, and local infrastructure; the part marked as user responsibility differs with the delivery model.
Potential loss of control/ownership of data.
Data integration, privacy enforcement, data encryption.
Data remanence after de-provisioning.
Multi-tenant data isolation.
Data location requirements within national borders.
Hypervisor security.
Audit data integrity protection.
Verification of subscriber policies through provider controls.
Certification/Accreditation requirements for a given cloud service.
Existing cloud applications and new opportunities
Architectural styles for cloud applications
Coordination based on a state machine model – ZooKeeper
The MapReduce programming model
Clouds for science and engineering
High performance computing on a cloud
Legacy applications on a cloud
Social computing, digital content, and cloud computing
Cloud computing is very attractive to the users:
Economic reasons
low infrastructure investment;
low cost - customers are only billed for resources used.
Convenience and performance
application developers enjoy the advantages of a just-in-time infrastructure; they are free to design an application without being concerned with the system where the application will run;
the potential to reduce the execution time of compute-intensive and data-intensive applications through parallelization. If an application can partition the workload into n segments and spawn n instances of itself, then the execution time could be reduced by a factor close to n.
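The parallelization argument assumes that the workload splits into n independent segments whose partial results can be combined. The sketch below shows that structure with a hypothetical workload (a sum of squares over a range); a thread pool stands in for the n cloud instances, so it illustrates the partition-and-combine pattern rather than measuring a real speedup (CPU-bound Python threads do not run in parallel). The names `partition`, `work`, and `run_parallel` are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(start: int, stop: int, n: int):
    """Split [start, stop) into n contiguous segments whose sizes differ by at most one."""
    base, extra = divmod(stop - start, n)
    bounds, lo = [], start
    for i in range(n):
        hi = lo + base + (1 if i < extra else 0)
        bounds.append((lo, hi))
        lo = hi
    return bounds

def work(segment):
    """Hypothetical compute-intensive task executed by one instance."""
    lo, hi = segment
    return sum(x * x for x in range(lo, hi))

def run_parallel(start: int, stop: int, n: int) -> int:
    """Spawn n workers, one per segment, and combine their partial results."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        return sum(pool.map(work, partition(start, stop, n)))

print(run_parallel(0, 1_000_000, 8) == work((0, 1_000_000)))  # True
```

With equal segments each instance processes about 1/n of the data, which is where the factor close to n comes from; the cost of starting instances and combining results keeps the real factor below n.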
Cloud computing is also beneficial for the providers of computing
Ideal applications for cloud computing:
Web services;
Database services;
Transaction-based services;
The resource requirements of
Applications unlikely to perform well on a cloud:
Applications with a complex workflow and multiple dependencies,
Applications which require intensive communication among
When the workload cannot be arbitrarily partitioned.
Challenges
Performance isolation is nearly impossible to reach in a real
Reliability - major concern; server failures expected when a
Cloud infrastructure exhibits latency and bandwidth fluctuations
Performance considerations limit the amount of data logging;
Three broad categories of existing applications:
Processing pipelines;
Batch processing systems; and
Web applications
Potentially new applications
Batch processing for decision support systems and business
Mobile interactive applications which process large volumes of
Science and engineering could greatly benefit from cloud
Indexing large datasets created by web crawler engines.
Data mining - searching large collections of records to locate
Image processing
image conversion, e.g., enlarge an image or create
compress or encrypt images.
Video transcoding from one video format to another, e.g., from
Document processing;
convert large collections of documents from one format to
encrypt the documents;
use Optical Character Recognition to produce digital images of
Generation of daily, weekly, monthly, and annual activity reports for
Processing, aggregation, and summaries of daily transactions for
Processing billing and payroll records.
Management of the software development, e.g., nightly updates of
Automatic testing and verification of software and hardware
Sites for online commerce.
Sites with a periodic or temporary presence:
conferences or other events;
active during a particular season (e.g., the holiday season).
Sites for promotional activities.
Sites that "sleep" during the night and auto-scale during the
Based on the client-server paradigm.
Stateless servers - view a client request as an independent
Often clients and servers communicate using Remote Procedure
Simple Object Access Protocol (SOAP) - application protocol for
Representational State Transfer (REST) - software architecture for
Cloud elasticity distribute computations and data across multiple
ZooKeeper
distributed coordination service for large-scale distributed systems;
high-throughput and low-latency service;
implements a version of the Paxos consensus algorithm;
open-source software written in Java with bindings for Java and C;
the servers in the pack communicate and elect a leader;
a database is replicated on each server; consistency of the
a client connects to a single server, synchronizes its clock with the
Figure: ZooKeeper. (a) Clients connect to the servers of the ensemble. (b) The path of read and write requests through a server's write processor, replicated database, and atomic broadcast. (c) Writes are handled by the leader, which coordinates the followers.
Messaging layer - responsible for the election of a new leader
The messaging protocol uses:
packets - sequences of bytes sent through a FIFO channel;
proposals - units of agreement;
messages - sequences of bytes atomically broadcast to all
A message is included into a proposal and it is agreed upon
Proposals are agreed upon by exchanging packets with a
Messaging layer guarantees
Reliable delivery: if a message m is delivered to one server, it
Total order: if message m is delivered before message n to
Causal order: if message n is sent after m has been delivered
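The reliable-delivery and total-order guarantees can be phrased as checks over the sequence of messages each server delivers. The sketch below is a hypothetical test helper, not part of ZooKeeper; it takes one delivery log per server and assumes each message appears at most once per log. Because the reliable-delivery statement above is truncated, the check uses its usual reading: a message delivered by one server is delivered by all servers.

```python
from itertools import combinations

def reliable_delivery(logs):
    """Every message delivered by some server is delivered by all servers (final logs)."""
    delivered = [set(log) for log in logs.values()]
    everything = set().union(*delivered)
    return all(d == everything for d in delivered)

def total_order(logs):
    """Messages delivered by two servers appear in the same relative order on both."""
    positions = {s: {m: i for i, m in enumerate(log)} for s, log in logs.items()}
    for a, b in combinations(positions.values(), 2):
        common = [m for m in a if m in b]
        if sorted(common, key=a.get) != sorted(common, key=b.get):
            return False
    return True

logs_ok = {"s1": ["m", "n", "p"], "s2": ["m", "n", "p"], "s3": ["m", "n", "p"]}
logs_bad = {"s1": ["m", "n", "p"], "s2": ["n", "m", "p"], "s3": ["m", "n"]}
print(reliable_delivery(logs_ok), total_order(logs_ok))    # True True
print(reliable_delivery(logs_bad), total_order(logs_bad))  # False False
```

Causal order cannot be checked from delivery logs alone; it also needs the send events, so it is left out of this sketch.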
The guarantees provided by Zookeeper:
Atomicity - a transaction either completes or fails.
Sequential consistency of updates - updates are applied
Single system image for the clients - a client receives the
Persistence of updates - once applied, an update persists
Reliability - the system is guaranteed to function correctly
The API is simple; it consists of seven operations:
create - add a node at a given location on the tree;
delete - delete a node;
exists - test whether a node exists at a given location;
get data - read data from a node;
set data - write data to a node;
get children - retrieve a list of the children of the node;
sync - wait for the data to propagate.
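To make the seven operations concrete, here is a toy, single-process model of a tree of data nodes. It is a hypothetical illustration only: the class and method names are invented for this sketch and are not the ZooKeeper client API, and the model has no replication, sessions, versions, or watches; `sync` is a no-op because there is only one copy of the data.

```python
class ZNodeTree:
    """Toy in-memory model of a hierarchical namespace of data nodes."""

    def __init__(self):
        self._data = {"/": b""}  # path -> data

    @staticmethod
    def _parent(path: str) -> str:
        return path.rsplit("/", 1)[0] or "/"

    def create(self, path: str, data: bytes = b"") -> str:
        """Add a node at a given location; its parent must already exist."""
        if path in self._data:
            raise KeyError(f"node exists: {path}")
        if self._parent(path) not in self._data:
            raise KeyError(f"no parent for: {path}")
        self._data[path] = data
        return path

    def delete(self, path: str) -> None:
        """Delete a node that has no children."""
        if self.get_children(path):
            raise ValueError(f"node has children: {path}")
        del self._data[path]

    def exists(self, path: str) -> bool:
        return path in self._data

    def get_data(self, path: str) -> bytes:
        return self._data[path]

    def set_data(self, path: str, data: bytes) -> None:
        if path not in self._data:
            raise KeyError(path)
        self._data[path] = data

    def get_children(self, path: str) -> list:
        """Names of the direct children of a node, sorted."""
        if path not in self._data:
            raise KeyError(path)
        return sorted(p.rsplit("/", 1)[1] for p in self._data
                      if p != "/" and self._parent(p) == path)

    def sync(self, path: str) -> None:
        """No-op here: a single copy of the data is always up to date."""

tree = ZNodeTree()
tree.create("/app")
tree.create("/app/config", b"replicas=5")
print(tree.get_children("/app"), tree.get_data("/app/config"))  # ['config'] b'replicas=5'
```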
Elasticity - the ability to use as many servers as necessary to optimally
How to divide the load
Transaction processing systems - a front-end distributes the incoming transactions to a number of back-end systems. As the workload increases, new back-end systems are added to the pool.
For data-intensive batch applications two types of divisible workloads:
modularly divisible - the workload partitioning is defined a priori;
arbitrarily divisible - the workload can be partitioned into an arbitrarily large number of smaller workloads of equal, or very close, size.
Many applications in physics, biology, and other areas of
Figure: MapReduce. The application's input data is split into segments 1 to M, each processed by a map instance that writes its results to a local disk; a master instance coordinates the map phase and the reduce phase, in which reduce instances 1 to R produce the results; shared storage holds the input and the output. The numbers 1 to 7 in the figure mark the steps of the computation.
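The map and reduce phases in the figure can be mimicked in a single process. The sketch below is a minimal word-count example that assumes the input has already been split into M segments: map instances emit (key, value) pairs, a shuffle step groups them by key, keys are assigned to R reducers by hash, and each reduce instance combines its groups. The function names are illustrative, not those of Hadoop or any other framework.

```python
from collections import defaultdict

def map_phase(segment: str):
    """A map instance: emit (word, 1) for every word in its segment."""
    return [(word.lower(), 1) for word in segment.split()]

def shuffle(mapped_outputs):
    """Group intermediate pairs by key, as the framework does between the phases."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(key: str, values):
    """A reduce instance: combine all values for one key."""
    return key, sum(values)

def map_reduce(segments, n_reducers: int = 2):
    groups = shuffle([map_phase(s) for s in segments])        # map phase + shuffle
    partitions = [dict() for _ in range(n_reducers)]
    for key, values in groups.items():                         # assign keys to reducers
        partitions[hash(key) % n_reducers][key] = values
    result = {}
    for part in partitions:                                    # reduce phase
        for key, values in part.items():
            k, total = reduce_phase(key, values)
            result[k] = total
    return result

segments = ["the cloud stores data", "the cloud runs map tasks", "data and tasks"]
print(sorted(map_reduce(segments).items())[:3])  # [('and', 1), ('cloud', 2), ('data', 2)]
```

Because the reducers work on disjoint sets of keys, the final result does not depend on which reducer a key is hashed to.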
The application illustrates the means to:
create an on-demand infrastructure;
run it on a massively distributed system in a manner that allows
GrepTheWeb
Performs a search of a very large set of records to identify
It is analogous to the Unix grep command.
The source is a collection of document URLs produced by the
Uses message passing to trigger the activities of multiple
by the web crawler;
the current status and to terminate the processing.
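Figure: GrepTheWeb. (a) The simplified workflow: a controller receives the input records and a regular expression, runs the search on an EC2 cluster, and uses SQS, S3, and SimpleDB for the output and status. (b) The detailed workflow: the launch, monitor, shutdown, and billing controllers communicate through the launch, monitor, shutdown, and billing queues; the billing controller uses a billing service; status is kept in a status DB in Amazon SimpleDB; input and output files are read from and written to Amazon S3 (get file, put file); the search runs on a Hadoop cluster with HDFS on Amazon EC2.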
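The message-passing structure of GrepTheWeb can be sketched with in-process queues standing in for the SQS queues: each controller takes a message from its queue, does its part of the work, and posts messages for the next controllers. This is a simplified, hypothetical model: Python's `queue.Queue` replaces Amazon SQS, a list of strings replaces the records in S3, a dictionary replaces the status DB in SimpleDB, the Hadoop job is reduced to a regular-expression filter, and the billing record format is invented.

```python
import queue
import re

launch_q, monitor_q, billing_q, shutdown_q = (queue.Queue() for _ in range(4))
status = {}  # stands in for the status database

def launch_controller(records):
    msg = launch_q.get()
    pattern = re.compile(msg["regex"])
    status[msg["job"]] = "running"
    # The "Hadoop job": keep the records that match the regular expression.
    matches = [r for r in records if pattern.search(r)]
    monitor_q.put({"job": msg["job"], "output": matches})

def monitor_controller():
    msg = monitor_q.get()
    status[msg["job"]] = "done"
    billing_q.put({"job": msg["job"], "count": len(msg["output"])})
    shutdown_q.put({"job": msg["job"]})
    return msg["output"]

def billing_controller():
    msg = billing_q.get()
    # Hypothetical usage record; a real billing service would price the resources used.
    return {"job": msg["job"], "matched_records": msg["count"]}

def shutdown_controller():
    msg = shutdown_q.get()
    status[msg["job"]] = "terminated"

records = ["cloud computing", "grid computing", "peer-to-peer", "cloud storage"]
launch_q.put({"job": "job-1", "regex": r"^cloud"})
launch_controller(records)
output = monitor_controller()
bill = billing_controller()
shutdown_controller()
print(output, bill, status)
```

Decoupling the controllers through queues lets each one fail, restart, or scale independently, which is the design choice the figure emphasizes.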
The generic problems in virtually all areas of science are:
Collection of experimental data.
Management of very large volumes of data.
Building and execution of models.
Integration of data and literature.
Documentation of the experiments.
Sharing the data with others; data preservation for long periods
All these activities require “big” data storage and systems capable
Phases of data discovery in large scientific data sets:
recognition of the information problem;
generation of search queries using one or more search engines;
evaluation of the search results;
evaluation of the web documents;
comparing information from different sources.
Large scientific data sets:
biomedical and genomic data from the National Center for
astrophysics data from NASA;
atmospheric data from the National Oceanic and Atmospheric
Comparative benchmark of EC2 and three supercomputers at the
Conclusion - communication intensive applications are affected by
Is it feasible to run legacy applications on a cloud?
Cirrus - a general platform for executing legacy Windows
BLAST - a biology code which finds regions of local similarity
AzureBLAST - a version of BLAST running on the Azure platform.
Figure: the Cirrus architecture. A web portal and a web service in a web role handle job registration; a job manager role contains the job scheduler, scaling engine, parametric engine, and sampling filter; tasks reach the workers through a dispatch queue; Azure tables and Azure blobs provide storage.
Figure: BigJob on Azure. A client reaches a portal; the BigJob manager exchanges queries and posted results with BigJob agents through queues; each agent runs in a worker role and executes a subset of the tasks 1 to n; the manager uses the Service Management API to start VMs, start replicas, and query their state; blob storage holds the data.
Networks allowing researchers to share data and provide a virtual
MyExperiment for biology.
nanoHub for nanoscience.
Volunteer computing - a large population of users donate resources
Mersenne Prime Search, SETI@Home, Folding@home, Storage@Home, PlanetLab
Berkeley Open Infrastructure for Network Computing (BOINC)
Virtual machine monitor
Virtual machine
Performance and security isolation
Architectural support for virtualization
x86 support for virtualization
Full and paravirtualization
Xen 1.0 and 2.0
Performance comparison of virtual machine monitors
The darker side of virtualization
Partitions the resources of a computer system into one or more virtual
A VMM allows
Multiple services to share the same platform.
Live migration - the movement of a server from one platform to
System modification while maintaining backward compatibility
Enforces isolation among the systems, thus security.
A VMM
Traps the privileged instructions executed by a guest OS and
Traps interrupts and dispatches them to the individual guest
Controls the virtual memory management.
Maintains a shadow page table for each guest OS and replicates
Monitors the system performance and takes corrective actions to
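A shadow page table composes two mappings: the guest OS's page table (guest virtual page to guest "physical" page) and the VMM's map from guest-physical pages to machine pages. The hardware walks only the shadow table, so the VMM must replicate every change the guest makes to its own page table. The sketch below models that bookkeeping with single-level page tables held in dictionaries; real page tables are multi-level, and the names `build_shadow`, `guest_writes_pte`, and `p2m` are hypothetical.

```python
def build_shadow(guest_page_table: dict, p2m: dict) -> dict:
    """Compose guest-virtual -> guest-physical with guest-physical -> machine pages."""
    return {vpn: p2m[gpfn] for vpn, gpfn in guest_page_table.items() if gpfn in p2m}

def guest_writes_pte(guest_page_table, shadow, p2m, vpn, gpfn):
    """The VMM traps the guest's page-table update and replicates it in the shadow table."""
    guest_page_table[vpn] = gpfn
    shadow[vpn] = p2m[gpfn]

# Guest OS view: virtual page -> what it believes is a physical page.
guest_pt = {0: 7, 1: 3}
# VMM view: guest-physical page -> real machine page.
p2m = {3: 120, 7: 45, 9: 300}

shadow = build_shadow(guest_pt, p2m)
guest_writes_pte(guest_pt, shadow, p2m, vpn=2, gpfn=9)
print(shadow)  # {0: 45, 1: 120, 2: 300}
```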
VM - isolated environment that appears to be a whole computer, but
Process VM - a virtual platform created for an individual process
System VM - supports an operating system together with many user
Traditional VM - supports multiple virtual machines and runs directly
Hybrid VM - shares the hardware with a host operating system and
Hosted VM - runs under a host operating system.
Figure: a taxonomy of virtual machines. Process VMs: multiprogrammed systems and binary optimizers (same ISA); dynamic translators and HLL VMs (different ISA). System VMs: traditional, hybrid, and hosted VMs (same ISA); whole-system and codesigned VMs (different ISA). The figure also shows the structure of a traditional VM (a VMM on the hardware hosts VM-1 to VM-n, each with a guest OS and applications), a hybrid VM, and a hosted VM (the VMM runs under a host OS).
The run-time behavior of an application is affected by other
Performance isolation - a critical condition for QoS guarantees in
A VMM is a much simpler and better specified system than a
The security vulnerability of VMMs is considerably reduced as the
Conditions for efficient virtualization
A program running under the VMM should exhibit a behavior essentially identical to that demonstrated when running on an equivalent machine directly.
The VMM should be in complete control of the virtualized resources.
A statistically significant fraction of machine instructions must be executed without the intervention of the VMM.
Two classes of machine instructions:
Sensitive - require special precautions at execution time:
Control sensitive - instructions that attempt to change either the memory allocation or the privileged mode.
Mode sensitive - instructions whose behavior is different in the privileged mode.
Innocuous - not sensitive.
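These conditions are usually summarized by the Popek and Goldberg criterion: an efficient VMM can be built with trap-and-emulate when every sensitive instruction is also privileged, so that it traps when a deprivileged guest OS executes it and the VMM can emulate it. The sketch below encodes that test on small, illustrative instruction sets; the toy ISA is hypothetical, and the x86 sets are deliberately incomplete. POPF and SGDT are sensitive but do not trap in user mode on the classic x86, which is why that architecture needed paravirtualization or hardware support.

```python
def virtualizable(sensitive: set, privileged: set) -> bool:
    """Trap-and-emulate works only if every sensitive instruction traps (is privileged)."""
    return sensitive <= privileged

def problem_instructions(sensitive: set, privileged: set) -> set:
    """Sensitive instructions that would execute silently in user mode."""
    return sensitive - privileged

# Hypothetical ISA in which every sensitive instruction is privileged.
toy_isa_sensitive = {"LOAD_PSW", "SET_TIMER", "SET_BASE_REGISTER"}
toy_isa_privileged = {"LOAD_PSW", "SET_TIMER", "SET_BASE_REGISTER", "HALT"}

# Illustrative, incomplete subsets of the classic x86 instruction set.
x86_sensitive = {"LGDT", "LIDT", "POPF", "SGDT"}
x86_privileged = {"LGDT", "LIDT", "HLT"}

print(virtualizable(toy_isa_sensitive, toy_isa_privileged))          # True
print(sorted(problem_instructions(x86_sensitive, x86_privileged)))   # ['POPF', 'SGDT']
```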
Full virtualization – a guest OS can run unchanged under the VMM
Requires a virtualizable architecture.
Examples: VMware
Paravirtualization - a guest operating system is modified to use only
Some aspects of the hardware cannot be virtualized.
Improved performance.
Present a simpler interface.
Examples: Xen, Denali
Figure: (a) full virtualization and (b) paravirtualization; in both, the guest OS runs on a hypervisor, which sits on a hardware abstraction layer above the hardware.
Ring de-privileging - a VMM forces the operating system and the
Ring aliasing - a guest OS is forced to run at a privilege level other
Address space compression - a VMM uses parts of the guest
Non-faulting access to privileged state - several store instructions
Guest system calls which cause transitions to/from privilege level 0
Interrupt virtualization - in response to a physical interrupt the VMM
Access to hidden state - elements of the system state, e.g.,
Ring compression - paging and segmentation protect VMM code
The task-priority register is frequently used by a guest OS; the VMM
Supports two modes of operations:
VMX root - for VMM operations;
VMX non-root - supports a VM.
The Virtual Machine Control Structure including host-state and
VM entry - the processor state is loaded from the guest-state area of the VM scheduled to run; then the control is transferred from the VMM to the VM.
VM exit - saves the processor state in the guest-state area of the running VM; then it loads the processor state from the host-state area and, finally, transfers control to the VMM.
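The VM entry and VM exit sequences amount to swapping the processor state through the two areas of the Virtual Machine Control Structure. The toy model below captures only that bookkeeping: the processor state is a dictionary with a couple of register names, and the classes and fields are hypothetical simplifications, not Intel's VMCS layout or the VMX instruction sequence. The exit reason used, CPUID, is an instruction that always causes a VM exit in VMX non-root operation.

```python
from dataclasses import dataclass, field

@dataclass
class VMCS:
    """Toy Virtual Machine Control Structure: a host-state and a guest-state area."""
    host_state: dict
    guest_state: dict
    exit_reason: str = ""

@dataclass
class Processor:
    mode: str = "VMX root"
    state: dict = field(default_factory=dict)

    def vm_entry(self, vmcs: VMCS) -> None:
        # Load the processor state from the guest-state area, then run the VM.
        self.state = dict(vmcs.guest_state)
        self.mode = "VMX non-root"

    def vm_exit(self, vmcs: VMCS, reason: str) -> None:
        # Save the guest's state, load the host state, and return control to the VMM.
        vmcs.guest_state = dict(self.state)
        vmcs.exit_reason = reason
        self.state = dict(vmcs.host_state)
        self.mode = "VMX root"

cpu = Processor(state={"rip": 0x1000, "rsp": 0x8000})
vmcs = VMCS(host_state={"rip": 0x1000, "rsp": 0x8000},
            guest_state={"rip": 0x400000, "rsp": 0x7FF000})
cpu.vm_entry(vmcs)
cpu.state["rip"] += 4                 # the guest executes some instructions
cpu.vm_exit(vmcs, reason="CPUID")     # e.g. the guest executed CPUID
print(cpu.mode, hex(vmcs.guest_state["rip"]))  # VMX root 0x400004
```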
Figure: (a) VMX root and VMX non-root operation, with transitions by VM entry and VM exit; (b) the virtual-machine control structure with its host-state and guest-state areas.
I/O MMU virtualization gives VMs direct access to
VT-d supports:
DMA address remapping, address translation for device DMA
Interrupt remapping, isolation of device interrupts and VM
I/O device assignment, the devices can be assigned by an
Reliability features, it reports and records DMA and interrupt
The goal of the Cambridge group - design a VMM capable of scaling
Linux, Minix, NetBSD, FreeBSD, NetWare, and OZONE can operate
Xen domain - ensemble of address spaces hosting a guest OS and
Dom0 - dedicated to execution of Xen control functions and privileged instructions
DomU - a user domain
Applications make system calls using hypercalls processed
Figure: Xen architecture. Xen exports a Domain0 control interface, a virtual x86 CPU, virtual physical memory, a virtual network, and virtual block devices; the guest operating systems use Xen-aware device drivers.
Xen runs at privilege Level 0, the guest OS at Level 1, and
The x86 architecture does not support either the tagging of TLB
Solution - load Xen in a 64 MB segment at the top of each address
Xen schedules individual domains using the Borrowed Virtual Time
A guest OS must register with Xen a description table with the
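In Borrowed Virtual Time, each domain accumulates virtual time in inverse proportion to its weight, and the scheduler runs the domain with the smallest effective virtual time; a latency-sensitive domain may "warp" back in virtual time to be dispatched sooner. The sketch below is a simplified model of that idea, assuming a fixed quantum and omitting the context-switch allowance and the warp time limits of the full algorithm; the domain names and weights are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Domain:
    name: str
    weight: float          # share of the CPU
    warp: float = 0.0      # virtual time a latency-sensitive domain may borrow
    warping: bool = False
    avt: float = 0.0       # actual virtual time

    def evt(self) -> float:
        """Effective virtual time: actual virtual time minus the warp, if warping."""
        return self.avt - (self.warp if self.warping else 0.0)

def schedule(domains, slices: int, quantum: float = 1.0):
    """Each quantum, run the domain with the smallest effective virtual time."""
    timeline = []
    for _ in range(slices):
        d = min(domains, key=Domain.evt)
        d.avt += quantum / d.weight   # virtual time advances inversely to weight
        timeline.append(d.name)
    return timeline

doms = [Domain("dom1", weight=2.0), Domain("dom2", weight=1.0)]
run = schedule(doms, slices=6)
print(run.count("dom1"), run.count("dom2"))  # 4 2
```

With weights 2 and 1, dom1 receives two thirds of the quanta, which is the proportional sharing BVT is meant to provide.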
XenStore – a Dom0 process.
Supports a system-wide registry and naming service.
Implemented as a hierarchical key-value storage.
A watch function informs listeners of changes to the keys in storage they have subscribed to.
Communicates with guest VMs via shared memory using Dom0 privileges.
Toolstack - responsible for creating, destroying, and
To create a new VM, a user provides a configuration file describing memory and CPU allocations and device configurations.
Toolstack parses this file and writes this information in XenStore. It takes advantage of Dom0 privileges to map guest memory, to load a kernel and virtual BIOS, and to set up initial communication channels with XenStore and with the virtual console when a new VM is created.
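A hierarchical key-value store with watches, the role XenStore plays above, can be modeled in a few lines. The `ToyStore` class below is a hypothetical sketch, not the XenStore protocol or its client library; the `/local/domain/<domid>` paths used with it only imitate the shape of a XenStore tree.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Callback = Callable[[str, str], None]


class ToyStore:
    """Hierarchical key-value registry with watches (a model, not the XenStore API)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._watches: Dict[str, List[Callback]] = defaultdict(list)

    def write(self, path: str, value: str) -> None:
        self._data[path] = value
        # fire every watch registered on this path or on one of its ancestors
        for prefix, callbacks in list(self._watches.items()):
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                for cb in callbacks:
                    cb(path, value)

    def read(self, path: str) -> str:
        return self._data[path]

    def children(self, path: str) -> List[str]:
        # names of the direct children of `path` in the hierarchy
        base = path.rstrip("/") + "/"
        return sorted({k[len(base):].split("/")[0] for k in self._data if k.startswith(base)})

    def watch(self, prefix: str, callback: Callback) -> None:
        self._watches[prefix].append(callback)
```

After `store.watch("/local/domain/1", cb)`, a toolstack-style write such as `store.write("/local/domain/1/memory/target", "524288")` triggers `cb`, while a write under `/local/domain/10` does not, because the prefix match requires a "/" separator.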
Each domain has one or more Virtual Network Interfaces (VIFs)
Split drivers have a front-end in the DomU and the back-end in Dom0.
Ring - a circular queue of descriptors allocated by a domain and
Two rings of buffer descriptors, one for packet sending and one for packet receiving.
To transmit a packet:
a guest OS enqueues a buffer descriptor on the send ring; Xen then copies the descriptor, checks its safety, copies only the packet header (not the payload), and executes the matching rules.
Xen zero-copy semantics for data transfer using I/O rings. (a) The communication between a guest domain and the driver domain over an I/O channel and an event channel; NIC is the Network Interface Controller. (b) The circular ring of buffers.
[Figure: (a) the guest domain (frontend) and the driver domain (backend, bridge, NIC) communicate through Xen over an I/O channel and an event channel; (b) the ring holds a request queue and a response queue of outstanding and unused descriptors, tracked by four pointers: request producer (shared, updated by the guest OS), request consumer (private pointer in Xen), response producer (shared, updated by Xen), and response consumer (private pointer maintained by the guest OS).]
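The four pointers in panel (b) can be illustrated with a toy ring in Python. This sketch assumes a single-threaded setting with no shared memory, no event-channel notifications, and string descriptors; the class and method names are hypothetical. Indices run freely and are reduced modulo the ring size, and a response overwrites the slot of the request it answers.

```python
from typing import Callable, List, Optional


class IORing:
    """Toy model of a descriptor ring shared by a guest frontend and a backend (hypothetical)."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.slots: List[Optional[str]] = [None] * size
        self.req_prod = 0  # shared: advanced by the guest OS when it posts a request
        self.req_cons = 0  # private to Xen/backend: next request to consume
        self.rsp_prod = 0  # shared: advanced by Xen/backend when it posts a response
        self.rsp_cons = 0  # private to the guest OS: next response to consume

    def post_request(self, desc: str) -> bool:
        # a slot is reusable only after the guest has consumed its response
        if self.req_prod - self.rsp_cons >= self.size:
            return False  # ring full: every slot is outstanding
        self.slots[self.req_prod % self.size] = desc
        self.req_prod += 1
        return True

    def backend_process(self, handler: Callable[[str], str]) -> int:
        handled = 0
        while self.req_cons < self.req_prod:
            idx = self.req_cons % self.size
            self.slots[idx] = handler(self.slots[idx])  # the response reuses the request's slot
            self.req_cons += 1
            self.rsp_prod += 1
            handled += 1
        return handled

    def collect_responses(self) -> List[str]:
        out = []
        while self.rsp_cons < self.rsp_prod:
            out.append(self.slots[self.rsp_cons % self.size])
            self.rsp_cons += 1
        return out
```

With a ring of size 2, a third request is refused until the guest has collected the two outstanding responses; only then are the slots free again.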
Optimization of:
Virtual interface - takes advantage of the capabilities of some
I/O channel - rather than copying a data buffer holding a packet,
Virtual memory - takes advantage of the superpage and global
In a layered structure a defense mechanism at some layer can be
It is feasible to insert a rogue VMM, a Virtual-Machine Based Rootkit (VMBR), between the physical hardware and an operating system.
Rootkit - malware with privileged access to a system. The VMBR can enable a separate malicious OS to run surreptitiously.
Under the protection of the VMBR the malicious OS could:
observe the data, the events, or the state of the target system; run services such as spam relays or distributed denial-of-service attacks; interfere with the application.
The insertion of a Virtual-Machine Based Rootkit (VMBR) as the lowest layer of the software stack running on the physical hardware; (a) below an operating system; (b) below a legitimate virtual machine monitor.
VMBR enables a malicious OS to run surreptitiously and makes it invisible to the genuine or the guest OS and to the application.
[Figure: (a) a stack of hardware, VMBR, operating system, and application, with a malicious OS running on the VMBR; (b) a stack of hardware, VMBR, virtual machine monitor, guest OS, and application, again with a malicious OS running on the VMBR.]
Policies and mechanisms
Tradeoffs
Resource bundling
Combinatorial auctions
Cloud resource management
Requires complex policies and decisions for multi-objective optimization.
It is challenging - the complexity of the system makes it impossible to have accurate global state information.
It is affected by unpredictable interactions with the environment, e.g., system failures and attacks.
Cloud service providers are faced with large, fluctuating loads, which challenge the claim of cloud elasticity.
The strategies for resource management for IaaS, PaaS, and SaaS are different.
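Control theory uses feedback to guarantee system stability and predict transient behavior.
Machine learning does not need a performance model of the system.
Utility-based approaches require a performance model and a mechanism to correlate user-level performance with cost.
Market-oriented/economic approaches do not require a model of the system, e.g., combinatorial auctions for bundles of resources.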
To reduce cost and save energy we may need to concentrate the
We may also need to operate at a lower clock rate; the performance
Resources in a cloud are allocated in bundles. Users get maximum benefit from a specific combination of resources:
Resource bundling complicates traditional resource allocation models
The bidding process aims to optimize an objective function f(x,p). In the context of cloud computing, an auction is the allocation of
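Users provide bids for desirable bundles and the price they are willing to pay.
Prices and allocation are set as a result of an auction. Ascending Clock Auction (ASCA) - the current price for each resource is represented by a "clock" seen by all participants.
The algorithm involves user bidding in multiple rounds; to allow a single-round auction, users are represented by proxies that place the bids on their behalf.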
The schematics of the ASCA algorithm; to allow for a single-round auction, users are represented by proxies which place the bids x_u(t). The auctioneer determines if there is an excess demand and, in that case, it raises the price of resources for which the demand exceeds the supply and requests new bids.
[Figure: users u_1, u_2, u_3, ..., u_U, each represented by a proxy that submits a bid x_u(t); the auctioneer aggregates the bids, Σ_u x_u(t), and sets the new prices p(t+1).]
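The clock-auction loop in the figure can be sketched as follows. Assumptions, none of them from the source: each user wants one fixed bundle and has a private maximum payment for it, proxies bid the bundle only while its cost at the current prices stays within that maximum, and the auctioneer raises each over-demanded price by a step proportional to the excess demand. The names `asca`, `step`, and `values` are hypothetical.

```python
from typing import List, Tuple


def asca(supply: List[float], bundles: List[List[float]], values: List[float],
         step: float = 1.0, max_rounds: int = 1000) -> Tuple[List[float], List[int]]:
    """Return the final unit prices and the indices of the winning users (sketch)."""
    k = len(supply)
    prices = [0.0] * k
    for _ in range(max_rounds):
        # each proxy bids for its user's bundle only while the bundle's cost at the
        # current prices does not exceed what the user is willing to pay
        bids = []
        for bundle, value in zip(bundles, values):
            cost = sum(p * q for p, q in zip(prices, bundle))
            bids.append(bundle if cost <= value else [0.0] * k)
        # excess demand z_r(t) = sum_u x_{u,r}(t) - supply_r for each resource r
        excess = [sum(b[r] for b in bids) - supply[r] for r in range(k)]
        if all(z <= 0 for z in excess):
            winners = [u for u, b in enumerate(bids) if any(q > 0 for q in b)]
            return prices, winners
        # raise the price only of the resources whose demand exceeds supply
        prices = [p + step * max(z, 0.0) for p, z in zip(prices, excess)]
    raise RuntimeError("no market-clearing prices within max_rounds")
```

For example, with two units of one resource and three users who each want one unit and will pay at most 5, 3, and 1, the price climbs to 2, the third user drops out, and the first two win at the same unit price. Every winner pays identical unit prices, which matches the fairness property of pricing algorithms; a step that is too large can, however, overshoot and drop more bidders than necessary.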
A pricing and allocation algorithm partitions the set of users in two disjoint sets, winners and losers.
Desirable properties of a pricing algorithm:
Be computationally tractable; traditional combinatorial auction algorithms, e.g., Vickrey-Clarke-Groves (VCG), are not computationally tractable.
Scale well - given the scale of the system and the number of requests for service, scalability is a necessary condition.
Be objective - partitioning into winners and losers should only be based on the price of a user's bid; if the price exceeds the threshold, then the user is a winner, otherwise the user is a loser.
Be fair - make sure that the prices are uniform; all winners within a given resource pool pay the same price.
Indicate clearly at the end of the auction the unit prices for each resource pool.
Indicate clearly to all participants the relationship between the supply and the demand in the system.
Cloud security risks.
Operating systems security.
Virtual machine security.
Security of virtualization.
Security risks posed by shared images.
Security risks posed by a management OS.
XOAR - breaking the monolithic design of the TCB.
Traditional threats - impact amplified due to the vast amount of cloud
New threats - cloud servers host multiple VMs; multiple applications
Authentication and authorization - the procedures in place for one
Third-party control - generates a spectrum of concerns caused by the
Availability of cloud services - system failures, power outages, and
Three actors involved; six types of attacks possible.
The user can be attacked by:
The service - SSL certificate spoofing, attacks on browser caches, or phishing attacks.
The cloud infrastructure - attacks that either originate at the cloud or spoof to originate from the cloud infrastructure.
The service can be attacked by:
A user - buffer overflow, SQL injection, and privilege escalation are the common types of attacks.
The cloud infrastructure - the most serious line of attack: limiting access to resources, privilege-related attacks, data distortion, and injecting additional operations.
The cloud infrastructure can be attacked by:
A user - attacks that target the cloud control system.
A service - requesting an excessive amount of resources and causing the exhaustion of the resources.
Surfaces of attacks in a cloud computing environment.
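[Figure: the three actors (user, service, cloud infrastructure) and their interactions (the user invokes the service and gets results, the user controls and monitors the cloud, the service requests resources and manages them), with six attack surfaces: Service-User, User-Service, Cloud-User, User-Cloud, Cloud-Service, and Service-Cloud.]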
Identified by a 2010 Cloud Security Alliance (CSA) report:
The abusive use of the cloud - the ability to conduct nefarious activities from the cloud.
APIs that are not fully secure - may not protect the users during a range of activities, from authentication and access control to monitoring and control of the application during runtime.
Malicious insiders - cloud service providers do not disclose their hiring standards and policies, so this can be a serious threat.
Shared technology.
Account hijacking.
Data loss or leakage - if the only copy of the data is stored on the cloud, then sensitive data is permanently lost when cloud data replication fails, followed by a storage media failure.
Unknown risk profile - exposure to the ignorance or underestimation of the risks of cloud computing.
The lack of transparency makes auditability a very difficult
Auditing guidelines elaborated by the National Institute of Standards and Technology (NIST):
the Federal Information Processing Standard (FIPS); the Federal Information Security Management Act (FISMA).
The unauthorized access to confidential information and the data theft
Data is more vulnerable in storage than during processing, as it is kept in storage for extended periods of time.
Threats during processing cannot be ignored; such threats can originate from flaws in the VMM, rogue VMs, or a VMBR.
There is the risk of unauthorized access and data theft posed by rogue
Lack of standardization is also a major concern. Users are concerned about the legal framework for enforcing cloud
Multi-tenancy is the root cause of many user concerns. Nevertheless,
The threats caused by multi-tenancy differ from one cloud delivery
The contract between the user and the Cloud Service Provider (CSP) should spell out explicitly:
CSP obligations to handle sensitive information securely and its obligation to comply with privacy laws.
CSP liabilities for mishandling sensitive information.
CSP liabilities for data loss.
The rules governing ownership of the data.
The geographical regions where information and backups can be stored.
A critical function of an OS is to protect applications against a wide range of attacks.
The elements of the mandatory OS security:
Access control - mechanisms to control access to system objects.
Authentication usage - mechanisms to authenticate a principal.
Cryptographic usage - policies and mechanisms used to protect the data
Commercial OS do not support a multi-layered security; only
Trusted paths mechanisms support user interactions with trusted
Closed-box platforms e.g., cellular phones, game consoles and ATM
Such facilities are not available to open-box platforms, the traditional
Commodity operating systems offer low assurance. An OS is a complex
An OS provides weak mechanisms for applications to authenticate to
An OS poorly isolates one application from another, once an application
Hybrid and hosted VMs expose the entire system to the vulnerability
In a traditional VM the Virtual Machine Monitor (VMM) controls the
A VMM controls the execution of privileged operations and can enforce memory isolation as well as disk and network access.
VMMs are considerably less complex and better structured than traditional operating systems; thus, they are in a better position to respond to security attacks.
A major challenge - a VMM sees only raw data regarding the state of a guest operating system, while security services typically operate at a higher logical level, e.g., at the level of a file rather than a disk block.
A secure TCB (Trusted Computing Base) is a necessary condition for
[Figure: (a) security services provided by the virtual machine monitor, with guest VMs each running a guest OS and an application; (b) a dedicated security services VM running a reduced guest OS, with the frontend of the VM security services in the virtual machine monitor; the Trusted Computing Base (TCB) is highlighted.]
Starvation of resources and denial of service for some VMs.
(a) badly configured resource limits for some VMs; (b) a rogue VM with the capability to bypass resource limits set in VMM.
VM side-channel attacks: malicious attack on one or more VMs by a
(a) lack of proper isolation of inter-VM traffic due to misconfiguration of the virtual network residing in the VMM;
(b) limitation of packet inspection devices to handle high-speed traffic, e.g., video traffic;
(c) presence of VM instances built from insecure VM images, e.g., a VM image having a guest OS without the latest patches.
Buffer overflow attacks.
Deployment of rogue or insecure VM; unauthorized users may
improper configuration of access controls on VM administrative tasks such as instance creation, launching, suspension, re-activation, and so on.
Presence of insecure and tampered VM images in the VM image repository.
(a) lack of access control to the VM image repository; (b) lack of mechanisms to verify the integrity of the images, e.g., digitally signed images.
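An integrity check of the kind that a repository could run before handing out an image can be sketched with Python's standard `hmac` and `hashlib` modules. This is an assumption-laden stand-in: a keyed SHA-256 MAC replaces a true digital signature (which would use an asymmetric key pair, so that those who can verify a tag cannot also forge one), and the function names are hypothetical.

```python
import hashlib
import hmac


def image_tag(image: bytes, key: bytes) -> str:
    """Keyed digest of a VM image (HMAC-SHA256 stands in for a real signature)."""
    digest = hashlib.sha256(image).digest()  # fingerprint of the image contents
    return hmac.new(key, digest, hashlib.sha256).hexdigest()


def verify_image(image: bytes, key: bytes, expected_tag: str) -> bool:
    """Accept the image only if its tag matches the one recorded at publication."""
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(image_tag(image, key), expected_tag)
```

A single modified byte in the image, or a tag computed with a different key, makes `verify_image` return `False`; for multi-gigabyte images the SHA-256 fingerprint would be computed incrementally over chunks rather than over one `bytes` object.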
The complete state of an operating system running under a virtual
Ability to support the IaaS delivery model; in this model a user selects an image matching the local environment used by the application and then uploads and runs the application on the cloud using this image.
Increased reliability; an operating system with all the applications running under it can be replicated and switched to a hot standby.
Improved intrusion prevention and detection; a clone can look for known patterns in system activity and detect intrusion. The operator can switch to a hot standby when suspicious events are detected.
More efficient and flexible software testing; instead of a very large number of dedicated systems running under different OSs, different versions of each OS, and different patches for each version, virtualization allows the multitude of OS instances to share a small number of physical systems.
Straightforward mechanisms to implement resource management
To balance the load of a system, a VMM can move an OS and the applications running under it to another server when the load on the current server exceeds a high water mark.
To reduce power consumption, the load of lightly loaded servers can be moved to other servers, and the lightly loaded servers can then be turned off or set to standby mode.
When secure logging and intrusion protection are implemented at the
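The two load-management policies above, migrating VMs off a server above a high water mark and emptying lightly loaded servers so they can be switched off, can be sketched as a planning function. The thresholds, the largest-VM-first and best-fit heuristics, and the names are assumptions for illustration; a real VMM would also weigh migration cost and memory, not only CPU load.

```python
from typing import Dict, List, Tuple

Move = Tuple[float, str, str]  # (VM load, source server, destination server)


def plan_migrations(servers: Dict[str, List[float]], high: float,
                    low: float) -> Tuple[List[Move], List[str]]:
    """servers maps each server to the loads of its VMs, as fractions of capacity."""
    placement = {s: list(vms) for s, vms in servers.items()}
    load = {s: sum(vms) for s, vms in placement.items()}
    moves: List[Move] = []

    def move(vm: float, src: str, dst: str) -> None:
        placement[src].remove(vm)
        placement[dst].append(vm)
        load[src] -= vm
        load[dst] += vm
        moves.append((vm, src, dst))

    # Load balancing: shed the largest VM from any server above the high water mark
    for src in sorted(placement):
        while load[src] > high:
            vm = max(placement[src])
            dsts = [d for d in placement if d != src and load[d] + vm <= high]
            if not dsts:
                break  # nowhere to put it without overloading another server
            move(vm, src, min(dsts, key=lambda d: (load[d], d)))

    # Consolidation: empty servers below the low water mark, then power them off
    off: List[str] = []
    for src in sorted(placement, key=lambda s: (load[s], s)):
        if load[src] >= low:
            continue
        active = [d for d in placement if d != src and d not in off]
        trial = {d: load[d] for d in active}
        plan: List[Tuple[float, str]] = []
        feasible = True
        for vm in sorted(placement[src], reverse=True):
            fits = [d for d in active if trial[d] + vm <= high]
            if not fits:
                feasible = False
                break
            dst = max(fits, key=lambda d: (trial[d], d))  # best fit: fullest server that fits
            trial[dst] += vm
            plan.append((vm, dst))
        if feasible:
            for vm, dst in plan:
                move(vm, src, dst)
            off.append(src)
    return moves, off
```

Consolidation is all-or-nothing per server: a lightly loaded server is switched off only if every one of its VMs fits elsewhere without pushing any destination above the high water mark.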
Diminished ability to manage the systems and track their status.
The number of physical systems in the inventory of an organization is limited by cost, space, energy consumption, and human support. Creating a virtual machine (VM) ultimately reduces to copying a file, hence the explosion in the number of VMs. The only limit on the number of VMs is the amount of storage space available.
Qualitative aspect of the explosion in the number of VMs - in a virtual environment, unlike a traditional one, the number of different operating systems, their versions, and the patch status of each version will be very diverse. Heterogeneity will tax the support team.
The software lifecycle has serious implications for security. The traditional assumption is that the software lifecycle is a straight line, hence patch management is based on monotonic forward progress. The virtual execution model maps to a tree structure rather than a line; indeed, at any point in time multiple instances of the VM can be created and then each
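The tree-shaped lifecycle can be made concrete with a small Python model. It is a hypothetical sketch: each node is a saved VM state, a clone copies its parent's patch level at creation time, and a query lists every state still missing a given patch. It shows why patching the original image does not fix clones made earlier.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class VMState:
    name: str
    patches: Set[str] = field(default_factory=set)
    children: List["VMState"] = field(default_factory=list)

    def fork(self, name: str, new_patches: Iterable[str] = ()) -> "VMState":
        # A clone copies the parent's patch level at the moment it is created;
        # later patches applied to the parent do not propagate to it.
        child = VMState(name, set(self.patches) | set(new_patches))
        self.children.append(child)
        return child


def missing_patch(node: VMState, patch: str) -> List[str]:
    """Names of all VM states in the subtree (pre-order) that lack `patch`."""
    found = [] if patch in node.patches else [node.name]
    for child in node.children:
        found.extend(missing_patch(child, patch))
    return found
```

If `base` is cloned into `web-v1` (then `web-v2`) and `db-v1` (patched, then `db-v2`), applying the patch to `base` afterwards still leaves `web-v1` and `web-v2` unpatched: patch management must walk the whole tree, not advance along a line.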
Infection may last indefinitely - some of the infected VMs may be
In a traditional computing environment a steady state can be reached.
A side effect of the ability to record in a file the complete state of a VM
Virtualization undermines the basic principle that time sensitive data
Image sharing is critical for the IaaS cloud delivery model. For
Amazon Machine Images (AMIs) accessible through the Quick Start and Community AMI menus of the EC2 service.
Many of the images analyzed by a recent report allowed a user to
A software vulnerability audit revealed that 98% of the Windows
Security risks:
Backdoors and leftover credentials. Unsolicited connections. Malware.
A virtual machine monitor or hypervisor is considerably smaller than an
The Trusted Computer Base (TCB) of a cloud computing environment
The management OS supports administrative tools, live migration,
In Xen the management operating system runs in Dom0; it manages
Allocate memory in the Dom0 address space and load the kernel of the guest operating system from the secondary storage.
Allocate memory for the new VM and use foreign mapping to load the kernel to the new VM.
Set up the initial page tables for the new VM.
Release the foreign mapping on the new VM memory, set up the virtual CPU registers, and launch the new VM.
The trusted computing base of a Xen-based environment includes the hardware, Xen, and the management operating system running in Dom0. The management OS supports administrative tools, live migration, device drivers, and device emulators. A guest operating system and applications run in each DomU.
[Figure: the x86 hardware runs Xen, which exports the Domain0 control interface, virtual x86 CPU, virtual physical memory, virtual network, and virtual block devices; the management OS (administrative tools, live migration, device drivers, device emulation) runs in Dom0, and each guest OS with its applications runs in a DomU.]
At the time it creates a DomU, a malicious Dom0 can:
Refuse to carry out the steps necessary to start the new VM.
Modify the kernel of the guest OS to allow a third party to monitor and control the execution of applications running under the new VM.
Undermine the integrity of the new VM by setting the wrong page tables and/or setting up wrong virtual CPU registers.
Refuse to release the foreign mapping and access the memory while the new VM is running.
At run time:
Dom0 exposes a set of abstract devices to the guest operating systems using split drivers with the frontend in a DomU and the backend in Dom0.
We have to ensure that run-time communication through Dom0 is encrypted. Transport Layer Security (TLS) does not guarantee that Dom0 cannot extract cryptographic keys from the memory of the OS and applications running in DomU.
The entire state of the system is maintained by XenStore. A malicious VM can deny other VMs access to XenStore; it can
To implement a secure run-time system we have to intercept and
New hypercalls are necessary to protect:
The privacy and integrity of the virtual CPU of a VM. When Dom0 wants to save the state of the VM, the hypercall should be intercepted and the contents of the virtual CPU registers should be encrypted. When DomU is restored, the virtual CPU context should be decrypted and then an integrity check should be carried out.
The privacy and integrity of the VM virtual memory. The page table update hypercall should be intercepted and the page should be encrypted, so that Dom0 handles only encrypted pages of the VM. To guarantee integrity, the hypervisor should calculate a hash of all the memory pages before they are saved by Dom0. An address translation is necessary, as a restored DomU may be allocated a different memory region.
The freshness of the virtual CPU and the memory of the VM. The solution is to add a version number to the hash.
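The integrity and freshness checks described above can be sketched on the hypervisor side with Python's `hmac` and `hashlib` modules. Encryption of pages and registers is omitted because the standard library has no cipher; the class name, the per-VM version counter, and the tag layout are assumptions, not the mechanism of any particular hypervisor.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple


class HypervisorSealer:
    """Hypothetical integrity/freshness check for VM state handed to Dom0 (no encryption)."""

    def __init__(self, key: bytes) -> None:
        self._key = key                    # held by the hypervisor, never exposed to Dom0
        self._latest: Dict[str, int] = {}  # latest version number issued per VM

    def _tag(self, vm_id: str, version: int, pages: List[bytes]) -> bytes:
        mac = hmac.new(self._key, digestmod=hashlib.sha256)
        mac.update(vm_id.encode())
        mac.update(version.to_bytes(8, "big"))  # the version number makes old tags stale
        for page in pages:
            # hash each page's contents (not its physical address), then fold it into the MAC
            mac.update(hashlib.sha256(page).digest())
        return mac.digest()

    def save(self, vm_id: str, pages: List[bytes]) -> Tuple[int, bytes]:
        version = self._latest.get(vm_id, 0) + 1
        self._latest[vm_id] = version
        return version, self._tag(vm_id, version, pages)

    def restore_ok(self, vm_id: str, pages: List[bytes], version: int, tag: bytes) -> bool:
        if version != self._latest.get(vm_id):
            return False  # a stale snapshot replayed by Dom0 fails the freshness check
        return hmac.compare_digest(tag, self._tag(vm_id, version, pages))
```

Because only page contents are hashed, a snapshot still verifies when the restored DomU is placed in a different memory region, while a tampered page or a replay of an older, once-valid snapshot is rejected.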
Xoar is a version of Xen designed to boost system security; based on
Maintain the functionality provided by Xen.
Ensure transparency with existing management and VM interfaces.
Tight control of privileges - each component should only have the privileges required by its function.
Minimize the interfaces of all components to reduce the possibility that a component can be used by an attacker.
Eliminate sharing. Make sharing explicit whenever it cannot be eliminated, to allow meaningful logging and auditing.
Reduce the opportunity of an attack targeting a system component by limiting the time window when the component runs.
The security model of Xoar assumes that threats come from:
A guest VM attempting to violate data integrity or confidentiality of another guest VM on the same platform, or to exploit the code of the guest.
Bugs in initialization code of the management virtual machine.
Permanent components - XenStore-State maintains all information regarding the state of the system.
Components used to boot the system; they self-destruct before any user VM is started. Two components discover the hardware configuration of the server, including the PCI drivers, and then boot the system:
PCIBack - virtualizes access to PCI bus configuration.
Bootstrapper - coordinates booting of the system.
Components restarted on each request:
XenStore-Logic.
Toolstack - handles VM management requests, e.g., it requests the Builder to create a new guest VM in response to a user request.
Builder - initiates user VMs.
Components restarted on a timer: the two components export physical storage device drivers and the physical network driver to a guest VM.
BlkBack - exports physical storage device drivers using udev rules.
NetBack - exports the physical network driver.
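The four lifetimes listed above can be illustrated with a toy supervisor in Python. It is a hypothetical model, not Xoar code: a restart simply discards a component's mutable state (standing in for rebooting it from a clean image), which shows how restarting limits the window in which tainted state can persist. The `Supervisor` class, the tick-based timer, and the `period` parameter are assumptions.

```python
from enum import Enum
from typing import Dict, List


class Policy(Enum):
    PERMANENT = "permanent"
    SELF_DESTRUCTING = "self-destructing"
    ON_REQUEST = "restarted on each request"
    ON_TIMER = "restarted on timer"


class Component:
    def __init__(self, name: str, policy: Policy, period: int = 0) -> None:
        self.name = name
        self.policy = policy
        self.period = period             # ticks between restarts (ON_TIMER only)
        self.state: Dict[str, str] = {}  # mutable state an attacker could taint
        self.alive = True
        self.restarts = 0

    def restart(self) -> None:
        self.state = {}                  # stands in for rebooting from a clean image
        self.restarts += 1


class Supervisor:
    def __init__(self, components: List[Component]) -> None:
        self.components = {c.name: c for c in components}
        self.clock = 0

    def boot_complete(self) -> None:
        # boot-time components are destroyed before any user VM starts
        for c in self.components.values():
            if c.policy is Policy.SELF_DESTRUCTING:
                c.alive = False

    def handle(self, name: str, request: str) -> str:
        c = self.components[name]
        if not c.alive:
            raise RuntimeError(f"{name} no longer exists")
        c.state["last_request"] = request
        if c.policy is Policy.ON_REQUEST:
            c.restart()                  # nothing survives from one request to the next
        return f"{name} handled {request}"

    def tick(self) -> None:
        self.clock += 1
        for c in self.components.values():
            if (c.policy is Policy.ON_TIMER and c.alive and c.period > 0
                    and self.clock % c.period == 0):
                c.restart()
```

In this model XenStore-State keeps its state, the Bootstrapper refuses requests once booting is complete, the Builder starts every request from a clean slate, and NetBack is reset every `period` ticks.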
Xoar has nine classes of components of four types: permanent, self-destructing, restarted upon request, and restarted on timer. A guest VM is started by the Builder at the request of the Toolstack; it is controlled by the XenStore-Logic. The devices used by the guest VM are emulated by the Qemu component.
[Figure: Xoar components around a guest VM, grouped by type - permanent: XenStore-State; self-destructing: PCIBack, Bootstrapper; restarted on each request: XenStore-Logic, Toolstack, Builder; restarted on timer: NetBack, BlkBack; QEMU provides device emulation.]
Component sharing between guest VMs in Xoar. Two VMs share only the XenStore components. Each one has a private version of the BlkBack, NetBack, and Toolstack.
[Figure: Virtual Machine A (with Qemu, BlkBack A, NetBack A, Toolstack A) and Virtual Machine B (with BlkBack B, NetBack B, Toolstack B), the Builder, and the shared XenStore (XenStore-Logic, XenStore-State), all running on Xen.]