Architecting for the cloud: lessons learned from 100 CloudStack deployments



SLIDE 1

Architecting for the cloud: lessons learned from 100 CloudStack deployments

Sheng Liang
CTO, Cloud Platforms, Citrix

SLIDE 2

CloudStack History

  • Sept 2008: VMOps Founded
  • Nov 2009: CloudStack 1.0 GA
  • May 2010: Cloud.com Launch & CloudStack 2.0 GA
  • July 2011: Citrix Acquires Cloud.com
  • April 2012: Apache CloudStack

SLIDE 3

The inventor of IaaS cloud – Amazon EC2

  • Open Source Xen Hypervisor
  • Amazon Proprietary Orchestration Software
  • EC2 API
  • Amazon eCommerce Platform
  • Networking
  • Storage
  • Commodity Servers

SLIDE 4

CloudStack is inspired by Amazon EC2

Amazon EC2:

  • Open Source Xen Hypervisor
  • Amazon Proprietary Orchestration Software
  • EC2 API
  • Amazon eCommerce Platform

CloudStack:

  • XenServer, ESX, Hyper-V, KVM, OVM
  • CloudStack
  • CloudPortal
  • Cloud APIs

Infrastructure:

  • Networking
  • Storage
  • Commodity Servers

SLIDE 5

There will be 1000s of clouds

  • Owner | Operator: IT, SP
  • Horizontal: General Purpose
  • Vertical: Special Purpose

Desktop Cloud

Data center mgmt and automation
SLIDE 6

Learning from 100s of CloudStack deployments

  • Enterprise
  • Service Providers
  • Web 2.0

SLIDE 7

What is the biggest difference between traditional-style data center automation and Amazon-style cloud?

SLIDE 8
SLIDE 9
SLIDE 10

How to handle failures

SLIDE 11

Annual Failure Rate of servers: 8%

  • Server failure comes from:
      • 70% - hard disk
      • 6% - RAID controller
      • 5% - memory
      • 18% - other factors
  • Application can still fail for other reasons:
      • Network failure
      • Software bugs
      • Human admin error

Kashi Venkatesh Vishwanath and Nachiappan Nagappan, Characterizing Cloud Computing Hardware Reliability, SoCC '10
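
To make the 8% figure concrete, here is a rough sketch of the arithmetic. It assumes that failures are independent and that the rate applies uniformly to every server; neither assumption comes from the slide. The function names and the fleet size of 1,000 servers are illustrative only.

```python
from __future__ import annotations

# Back-of-the-envelope failure arithmetic using the slide's figures.
# Assumption (not from the slide): failures are independent and the
# 8% annual failure rate applies uniformly to every server.

AFR = 0.08  # annual failure rate of a server, from the slide

# Share of server failures by cause, from the slide.
FAILURE_CAUSES = {
    "hard disk": 0.70,
    "RAID controller": 0.06,
    "memory": 0.05,
    "other factors": 0.18,
}


def expected_failures(servers: int, afr: float = AFR) -> float:
    """Expected number of server failures per year in a fleet."""
    return servers * afr


def prob_any_failure(servers: int, afr: float = AFR) -> float:
    """Probability that at least one server fails within a year."""
    return 1.0 - (1.0 - afr) ** servers


def failures_by_cause(servers: int, afr: float = AFR) -> dict[str, float]:
    """Split the expected failures across the causes listed on the slide."""
    total = expected_failures(servers, afr)
    return {cause: total * share for cause, share in FAILURE_CAUSES.items()}


if __name__ == "__main__":
    fleet = 1000  # hypothetical fleet size
    print(f"expected failures/year: {expected_failures(fleet):.0f}")
    print(f"P(at least one failure): {prob_any_failure(fleet):.6f}")
    for cause, count in failures_by_cause(fleet).items():
        print(f"  {cause}: {count:.1f}")
```

Under these assumptions a fleet of 1,000 servers sees about 80 server failures a year, roughly 56 of them caused by hard disks, so at that scale failure is routine rather than exceptional.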

SLIDE 12

  • Internet
  • Core Routers
  • Access Routers
  • Aggregation Switches
  • Load Balancers
  • Top of Rack Switches
  • Servers

SLIDE 13

Effectiveness of network redundancy in reducing failures: 40%

  • Bugs in failover mechanism
  • Incorrect configuration
  • Protocol issues such as TCP back-off, timeouts, and spanning tree reconfiguration

Phillipa Gill, Navendu Jain & Nachiappan Nagappan, Understanding Network Failures in Data Centers: Measurement, Analysis and Implications, SIGCOMM 2011

SLIDE 14

  • A. Promise users VM, storage, and networking will never fail
  • B. Backup VM for users and restore for users when failure happens
  • C. Tell users to expect failure. Users to backup VM and handle failure themselves
  • - no strategy to handle failures
SLIDE 15

zCloud East Zone AWS East Zone zCloud West Zone AWS West Zone

SLIDE 16

zCloud East Zone AWS East Zone zCloud West Zone AWS West Zone

Design for Failure

SLIDE 17

Cloud workloads

Traditional-Style

Reliable hardware, backup entire cloud, and restore for users when failure happens

  • Link aggregation
  • Storage multi-pathing
  • VM HA, fault tolerance
  • VM live migration
  • VM backup/snapshots
  • Strong consistency

Amazon-Style

Tell users to expect failure. Users to build apps that can withstand infrastructure failure

  • Multi-site redundancy
  • Chaos monkey
  • Ephemeral resources
  • Eventual consistency
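
A minimal sketch of the Amazon-style approach, in which the application itself withstands infrastructure failure instead of relying on the cloud to prevent it. The zone names, the `ZoneUnavailable` exception, and the `call_zone` callable are hypothetical stand-ins and are not part of CloudStack or AWS; the fake service is only a crude imitation of Chaos-Monkey-style fault injection.

```python
import random
from typing import Callable, Sequence


class ZoneUnavailable(Exception):
    """Raised by the hypothetical per-zone call when a zone cannot serve."""


def call_with_failover(
    zones: Sequence[str],
    call_zone: Callable[[str], str],
    attempts_per_zone: int = 2,
) -> str:
    """Try each zone in order, retrying a few times before moving on."""
    errors = []
    for zone in zones:
        for _ in range(attempts_per_zone):
            try:
                return call_zone(zone)
            except ZoneUnavailable as exc:
                errors.append(f"{zone}: {exc}")
    raise RuntimeError("all zones failed: " + "; ".join(errors))


def make_flaky_service(down: set, failure_rate: float, seed: int = 0):
    """Build a fake service for testing failover.

    Zones listed in `down` always fail; every other zone fails at random
    with probability `failure_rate`.
    """
    rng = random.Random(seed)

    def service(zone: str) -> str:
        if zone in down or rng.random() < failure_rate:
            raise ZoneUnavailable("simulated failure")
        return f"served from {zone}"

    return service


if __name__ == "__main__":
    # The east zone is down, so the request is served from the west zone.
    service = make_flaky_service(down={"east"}, failure_rate=0.0)
    print(call_with_failover(["east", "west"], service))
```

The design choice here is that the caller, not the infrastructure, decides what to do when a zone disappears, which is the point of the "expect failure" column.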

SLIDE 18

Designing a zone for a traditional workload

Enterprise Traditional-Style Availability Zone

  • Hypervisor: vSphere or XenServer
  • Storage: SAN
  • Networking: L2 VLANs
  • Network Services: Load Balancing, VPN
  • Multi-tier Apps: Ent App Mgmt

Diagram components: vCenter/XenCenter, Hypervisor Cluster, Enterprise Networking (e.g., VLAN), Enterprise Storage (e.g., SAN)
SLIDE 19

Designing a zone for an Amazon-style workload

Amazon-Style Availability Zone

  • Hypervisor: XenServer or KVM
  • Storage: Local, EBS
  • Networking: L3, SDN based L2, Elastic IP
  • Network Services: Security Groups, ELB, GSLB
  • Multi-tier Apps: 3rd Party Tools (e.g., RightScale, enStratus)

Diagram components: Software Defined Networks (e.g., Security Groups, EIP, ELB,...), Server Racks, Elastic Block Storage
SLIDE 20

Object store is critical for Amazon-style cloud

  • Users
  • ELB/GSLB
  • NetScaler
  • Availability Zone 1
  • Availability Zone 2
  • Storage Cloud
SLIDE 21

Same Cloud can Support Both Styles

  • Apache CloudStack Mgmt Server
  • AWS-style Availability Zone
  • Traditional Style Availability Zone
  • Object Storage
  • Replication/DR

SLIDE 22

Tests for a “true” cloud app

  • Does it require SAN or VLAN?
  • Does it run in multiple data centers?
  • Does it involve a distributed object store?
  • Is there a single point of failure?
SLIDE 23

Learning from 100s of CloudStack deployments

  • Enterprise
  • Service Providers
  • Web 2.0

  • Traditional-style
  • Mostly Amazon-style
  • Mostly traditional style

SLIDE 24

  • Internet
  • Load Balancer
  • Router
  • L3 Core Switch
  • Top of Rack Switch
  • Availability Zone 1
  • Availability Zone 2
  • Servers
  • Object Store
  • Pod 1, Pod 2, Pod 3, … Pod N
  • Primary CloudStack Mgmt Server Cluster
  • Primary MySQL
  • Standby CloudStack Mgmt Server Cluster
  • Backup MySQL
  • CloudStack Admin
SLIDE 25

Layer 3 cloud networking (security groups)

  • Web Security Group: Web VMs
  • DB Security Group: DB VMs
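
A minimal sketch of how Layer 3 isolation with security groups can work: every VM belongs to a group, traffic is denied by default, and an ingress rule admits traffic either from an address range or from members of another group. The data model, the port numbers (80 for web, 3306 for the database), and the rule set are assumptions made for illustration; this is not the CloudStack API.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class IngressRule:
    port: int
    source_cidr: Optional[str] = None   # allow by address range
    source_group: Optional[str] = None  # or allow by security group name


@dataclass
class SecurityGroup:
    name: str
    rules: list = field(default_factory=list)

    def allows(self, src_ip: str, src_group: Optional[str], port: int) -> bool:
        """Default deny: traffic passes only if some ingress rule matches."""
        for rule in self.rules:
            if rule.port != port:
                continue
            if rule.source_group is not None and rule.source_group == src_group:
                return True
            if rule.source_cidr is not None and (
                ip_address(src_ip) in ip_network(rule.source_cidr)
            ):
                return True
        return False


# Hypothetical policy: the web tier is open to everyone on port 80,
# while the database tier accepts connections only from web-group members.
web = SecurityGroup("web", [IngressRule(port=80, source_cidr="0.0.0.0/0")])
db = SecurityGroup("db", [IngressRule(port=3306, source_group="web")])

if __name__ == "__main__":
    print(web.allows("203.0.113.7", None, 80))     # outside client to web
    print(db.allows("10.0.0.5", "web", 3306))      # web VM to database
    print(db.allows("203.0.113.7", None, 3306))    # outside client to database
```

Because the rules refer to group membership rather than to VLANs, the same policy holds no matter which rack or pod a VM lands on.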
SLIDE 26

Layer 2 VLAN networking

  • User 1
  • User 2
SLIDE 27

OVS networking

  • User 1
  • User 2
  • OVS
  • GRE Key 1
  • GRE Key 2
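
A toy model of how keyed GRE tunnels can keep tenants apart on an overlay: each tenant network is assigned a key, the sending switch tags traffic with the sender's key, and the receiving switch delivers a frame only to a port that carries the same key. The class and VM names are hypothetical, and the pairing of User 1 with key 1 and User 2 with key 2 is an assumption read off the diagram labels. This is not Open vSwitch configuration or its API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    gre_key: int   # tenant identifier carried in the tunnel header
    dst_vm: str
    payload: str


class OverlaySwitch:
    """Toy model of a virtual switch terminating keyed tunnels."""

    def __init__(self) -> None:
        self._ports = {}  # VM name -> GRE key of the network it sits on

    def attach(self, vm: str, gre_key: int) -> None:
        """Plug a VM into the tenant network identified by gre_key."""
        self._ports[vm] = gre_key

    def encapsulate(self, src_vm: str, dst_vm: str, payload: str) -> Frame:
        """Tag outgoing traffic with the sender's tenant key."""
        return Frame(self._ports[src_vm], dst_vm, payload)

    def deliver(self, frame: Frame) -> bool:
        """Deliver only if the destination port carries the same key."""
        return self._ports.get(frame.dst_vm) == frame.gre_key


if __name__ == "__main__":
    host_a, host_b = OverlaySwitch(), OverlaySwitch()
    host_a.attach("user1-vm-a", 1)
    host_b.attach("user1-vm-b", 1)
    host_b.attach("user2-vm-b", 2)

    same_tenant = host_a.encapsulate("user1-vm-a", "user1-vm-b", "hello")
    cross_tenant = host_a.encapsulate("user1-vm-a", "user2-vm-b", "hello")
    print(host_b.deliver(same_tenant))   # same key on both ends
    print(host_b.deliver(cross_tenant))  # keys differ, frame is dropped
```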

SLIDE 28

Multi-tier virtual networking

  • Web subnet 10.1.1.0/24: Web VM 1, Web VM 2, Web VM 3, Web VM 4
  • App subnet 10.1.2.0/24: App VM 1, App VM 2
  • DB subnet 10.1.3.0/24: DB VM 1
  • Customer Premises
  • IPSec VPN
  • Internet
  • MPLS VLAN

Network Services

  • IPAM
  • DNS
  • LB [intra]
  • S-2-S VPN
  • Static Routes
  • ACLs
  • NAT, PF
  • FW [ingress & egress]
  • BGP

  • NetScaler VPX
  • Virtual Router
  • GRE Key 1, GRE Key 2, GRE Key 3
  • Public VLAN
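
The three subnets on this slide can be modelled with Python's standard `ipaddress` module. The sketch below maps an address to its tier and checks a flow against an ACL. The subnets come from the slide; the ACL itself (web may reach app, app may reach DB, nothing else) is a hypothetical policy chosen for illustration, as are the function names.

```python
from ipaddress import ip_address, ip_network
from typing import Optional

# Subnets as labelled on the slide.
TIERS = {
    "web": ip_network("10.1.1.0/24"),
    "app": ip_network("10.1.2.0/24"),
    "db": ip_network("10.1.3.0/24"),
}

# Hypothetical ACL: which tier may open connections to which.
ALLOWED_FLOWS = {("web", "app"), ("app", "db")}


def tier_of(ip: str) -> Optional[str]:
    """Return the tier whose subnet contains the address, if any."""
    addr = ip_address(ip)
    for name, net in TIERS.items():
        if addr in net:
            return name
    return None


def flow_allowed(src_ip: str, dst_ip: str) -> bool:
    """Check a flow between two addresses against the tier-level ACL."""
    return (tier_of(src_ip), tier_of(dst_ip)) in ALLOWED_FLOWS


if __name__ == "__main__":
    print(tier_of("10.1.1.10"))                    # address in the web subnet
    print(flow_allowed("10.1.1.10", "10.1.2.20"))  # web to app
    print(flow_allowed("10.1.1.10", "10.1.3.5"))   # web straight to DB
```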

SLIDE 29

Network flexibility

Network Services

  • L2 connectivity
  • IPAM
  • DNS
  • Routing
  • ACL
  • Firewall
  • NAT
  • VPN
  • LB
  • IDS
  • IPS

Network Isolation

  • No isolation
  • VLAN isolation
  • SDN overlays
  • L3 isolation

Service Providers

  • Virtual appliances
  • Hardware firewalls
  • LB appliances
  • SDN controllers
  • IDS/IPS appliances
  • VRF
  • Hypervisor
SLIDE 30

“The Apache Way”

  • Collaborative software development
  • Commercial-friendly standard license
  • Consistently high quality software
  • Respectful, honest, technical-based interaction
  • Faithful implementation of standards
  • Security as a mandatory feature
SLIDE 31

Apache CloudStack Community

                                      Pre Apache Move (Jan 2012)   June Actuals
# of companies endorsing project      1                            68
# of companies participating          10                           140
# of developers working on project    40                           238
SLIDE 32

Apache CloudStack community projects

  • SDN: Nicira; Midokura; Big Switch Networks; Stratosphere
  • Backup/DR: Sungard
  • Networking: Cisco; Brocade (ADX)
  • Smart Storage: Hadoop + S3 API for object store; NetApp (FlexPod, object store); Basho RIAK CS; Caringo object store; Cloudian S3
  • PaaS: CloudFoundry implementation through IronFoundry and Stackato teams; Engine Yard; Cumulogic; GigaSpaces

SLIDE 33

  • Workload requirements drive cloud architecture
  • There is real demand for SDN in cloud infrastructure
  • Open source developers drive cloud adoption

SLIDE 34

More info http://cloudstack.org