

SLIDE 1

HYPER COOL INFRASTRUCTURE

OPENSTACK SUMMIT BOSTON | MAY 2017
Randy Rubins, Sr. Cloud Consultant, Red Hat
May 10, 2017

SLIDE 2

WHY HYPER "COOL" INFRA?

  • Did not like "CONVERGED"
  • Needed to preserve a "C-word"
  • Could have used "COMPLEX" or "CRAMMED"
  • Ended up with a four-letter word that helped get my presentation accepted

SLIDE 3

AGENDA

  • What is HCI?
  • Drivers and Use Cases
  • Red Hat Hyperconverged Solutions
  • Architectural Considerations
  • Implementation Details
  • Performance and Scale Considerations
  • Futures
  • Q & A

SLIDE 4

HYPER-CONVERGED INFRASTRUCTURE

"Hyperconvergence moves away from multiple discrete systems that are packaged together and evolve into software-defined intelligent environments that all run in commodity, off-the-shelf x86 rack servers. Its infrastructures are made up of conforming x86 server systems equipped with direct-attached storage. It includes the ability to plug and play into a data center pool of like systems."

  • Wikipedia
SLIDE 5

HYPER-CONVERGED INFRASTRUCTURE

"Hyperconvergence delivers simplification and savings by consolidating all required functionality into a single infrastructure stack running on an efficient, elastic pool of x86 resources." and "... hyperconverged infrastructure delivers on the promise of the SDDC at the technological level."

  • Hyperconverged.org
SLIDE 6

HYPER-CONVERGED INFRASTRUCTURE

"The “hyper” in hyperconvergence comes from the hypervisor, or more generically, virtualization technology. Hyperconvergence means to bring the virtual aspects of infrastructure together with the physical, resulting in a single solution... The servers, storage and virtualization stack are not only bundled together, but are completely integrated and transparent to the administrator.... Hyper(visor) + Convergence = Hyperconvergence."

  • Scale Computing
SLIDE 7

HYPER-CONVERGED INFRASTRUCTURE

DRIVERS

  • Smaller hardware footprint
  • Lower cost of entry
  • Standardization
  • Maximized capacity utilization

USE CASES

  • D-NFV
  • vCPE
  • ROBO
  • Lab/Sandbox

SLIDE 8

HYPERCONVERGED SOLUTIONS

TRADITIONAL VIRTUALIZATION + PRIVATE CLOUD + CONTAINERIZED CLOUD APPS

SLIDE 9

RHV-S (GRAFTON)

Diagram: the self-hosted engine (hosted-engine VM) running on a three-node Gluster-backed cluster (grafton-0, grafton-1, grafton-2).

SLIDE 10

REQUIREMENTS/LIMITATIONS

RHV-S

1. Valid subscriptions for RHV 4.1 & RHGS 3.2
2. Exactly 3 physical nodes with adequate memory and storage
3. 2 network interfaces (gluster back-end and virtmgmt)
4. RAID10/5/6 supported/recommended
5. 1 hot spare drive recommended per node
6. RAID cards must use flash-backed write cache
7. 3-4 gluster volumes (engine, vmstore, data, shared_storage geo-replicated volume) - see the sketch below
8. 29-40 VMs supported
9. 4 vCPUs / 2TB max per VM supported

Currently in Beta/LA - subject to change in the GA version. Full details can be found here: http://red.ht/2qKwMKY

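The volumes from item 7 are created by gdeploy during setup and can be verified from any of the three nodes. A minimal check sketch, assuming the default volume names listed above:

# List the gluster volumes gdeploy created and inspect a couple of them.
gluster volume list
gluster volume info engine
gluster volume status vmstore
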
SLIDE 11

OSP-HCI

Diagram: undercloud (director) plus a six-node overcloud: three controller/Ceph MON nodes (overcloud-ctrl-mon-0/1/2) and three compute/Ceph OSD nodes (overcloud-comp-osd-0/1/2).
SLIDE 12

REQUIREMENTS/LIMITATIONS

OSP-HCI

1. Valid subscriptions for RHOSP 10 & RHCS 2.0
2. (1) OSP undercloud (aka "director") - can be a VM
3. (3) "OSP controller + Ceph MON" nodes
4. (3+) "OSP compute + Ceph OSD" nodes with adequate memory and storage
5. 10Gbps network interfaces for Ceph storage and OpenStack tenant networks
6. Up to 1 datacenter rack (42 nodes) for "OSP compute + Ceph OSD"

Currently in Tech Preview, soon to reach fully-supported status, GA being evaluated. Full details can be found here: http://red.ht/2jXvxkB

SLIDE 13

RHV-S + OSP-HCI

Diagram: the combined environment - the RHV-S cluster (grafton-0/1/2) with the hosted-engine, ansible-tower, and cloudforms VMs, the OSP undercloud, three overcloud controller/MON nodes (overcloud-ctrl-mon-0/1/2), and three overcloud compute/OSD nodes (overcloud-comp-osd-0/1/2).
SLIDE 14

HYPER COOL INFRA

Diagram: the same components shown as one stack - hosted-engine, undercloud, ansible-tower, and cloudforms VMs on the grafton-0/1/2 cluster, plus overcloud-ctrl-mon-0/1/2 and overcloud-comp-osd-0/1/2.
SLIDE 15

IMPLEMENTATION DETAILS

RHV-S

1. Install RHEL 7.3 and RHV 4.1 on (3) grafton nodes
2. Configure public key authentication based SSH (see the sketch after this list)
3. Deploy gluster via cockpit plugin / gdeploy
4. Deploy hosted-engine via cockpit plugin
5. Enable gluster functionality on hosted-engine
6. Create networks for gluster storage, provisioning, and the rest of the OSP isolated networks
7. Create master storage domain
8. Add remaining (2) hypervisors to hosted-engine
9. Upload RHEL 7.3 guest image
10. Create RHEL 7.3 template

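Step 2 boils down to passwordless root SSH between the nodes so that gdeploy and the cockpit plugin can drive all three of them. A minimal sketch, assuming the grafton-0/1/2 hostnames used above resolve:

# Generate a key on the first node and push it to all three nodes.
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
for host in grafton-0 grafton-1 grafton-2; do
  ssh-copy-id root@${host}
done
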
SLIDE 16

IMPLEMENTATION DETAILS

OSP-HCI

Deploy director (undercloud) on RHV-S using the RHEL 7.3 template

SLIDE 17

IMPLEMENTATION DETAILS

OSP-HCI

Install and configure director via the ansible-undercloud playbook. Role layout (abridged):

...
undercloud
├── files
│   └── certs
│       ├── build_undercloud_cert.sh
│       ├── cacert.pem
│       ├── openssl-undercloud.cnf
│       ├── privkey.pem
│       ├── stack.sudo
│       └── undercloud.pem
├── tasks
│   └── main.yml
├── templates
│   ├── hosts.j2
│   ├── instackenv.json.j2
│   ├── resolv.conf.j2
│   └── undercloud.conf.j2
└── undercloud.yml
...

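The playbook wraps the standard OSP 10 director installation flow. Roughly the manual equivalent (package and sample-file names per the OSP 10 documentation; the values that go into undercloud.conf are rendered from undercloud.conf.j2 above):

# Manual equivalent of the ansible-undercloud playbook (sketch, run on the director VM).
sudo yum install -y python-tripleoclient
cp /usr/share/instack-undercloud/undercloud.conf.sample ~/undercloud.conf
# Edit ~/undercloud.conf (provisioning network, DHCP/inspection ranges, SSL cert paths),
# then run as the stack user:
openstack undercloud install
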
SLIDE 18

IMPLEMENTATION DETAILS

OSP-HCI

Prepare and upload overcloud images (tasks from the playbook):

- name: extract overcloud images
  become_user: stack
  unarchive:
    copy: false
    src: /usr/share/rhosp-director-images/overcloud-full-latest-{{ osp_version }}.tar
    dest: /home/stack/images/

- name: extract ironic python agent images
  become_user: stack
  unarchive:
    copy: false
    src: /usr/share/rhosp-director-images/ironic-python-agent-latest-{{ osp_version }}.tar
    dest: /home/stack/images/

- name: set root password on overcloud image
  shell: export LIBGUESTFS_BACKEND=direct && virt-customize -a /home/stack/images/overcloud-full.qcow2 --root-password password:{{ admin_password }}

- name: upload overcloud images
  become_user: stack
  shell: source ~/stackrc && openstack overcloud image upload --image-path /home/stack/images --update-existing
  ignore_errors: true

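For reference, the same image preparation done by hand from the director node (a sketch; OSP_VERSION and ADMIN_PASSWORD are placeholders for the playbook variables above):

# Manual equivalent of the image tasks, run as the stack user.
OSP_VERSION=10.0            # placeholder for {{ osp_version }}
ADMIN_PASSWORD=changeme     # placeholder for {{ admin_password }}
mkdir -p ~/images && cd ~/images
tar -xf /usr/share/rhosp-director-images/overcloud-full-latest-${OSP_VERSION}.tar
tar -xf /usr/share/rhosp-director-images/ironic-python-agent-latest-${OSP_VERSION}.tar
export LIBGUESTFS_BACKEND=direct
virt-customize -a overcloud-full.qcow2 --root-password password:"${ADMIN_PASSWORD}"
source ~/stackrc
openstack overcloud image upload --image-path ~/images --update-existing
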
SLIDE 19

IMPLEMENTATION DETAILS

OSP-HCI

Customize tripleo heat templates based on the Reference Architecture doc:

[stack@director ~]$ tree custom-templates/
custom-templates/
├── ceph.yaml
├── certs
│   ├── build_overcloud_cert.sh
│   ├── cacert-oc.pem
│   ├── openssl-oc.cnf
│   ├── overcloud.pem
│   └── privkey-oc.pem
├── compute.yaml
├── custom-roles.yaml
├── enable-tls.yaml
├── first-boot-template.yaml
├── inject-trust-anchor.yaml
├── layout.yaml
├── network.yaml
├── nic-configs
│   ├── compute-nics.yaml
│   └── controller-nics.yaml
├── numa-systemd-osd.sh
├── post-deploy-template.yaml
├── rhel-registration
│   ├── environment-rhel-registration.yaml
│   ├── rhel-registration-resource-registry.yaml
│   ├── rhel-registration.yaml
│   └── scripts
│       ├── rhel-registration
│       └── rhel-unregistration
└── scripts
    ├── configure_fence.sh
    ├── deploy.sh
    ├── ironic-assign.sh
    ├── nova_mem_cpu_calc.py
    ├── nova_mem_cpu_calc_results.txt
    └── wipe-disk.sh

NOTE: Use Github repo https://github.com/RHsyseng/hci

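One way to seed the working directory is to start from the repo in the note; the directory layout inside the repo is an assumption here, and every environment file needs to be reviewed against your own network and disk layout before deploying:

# Clone the reference repo and copy its templates as a starting point (sketch).
git clone https://github.com/RHsyseng/hci ~/hci
cp -r ~/hci/custom-templates ~/custom-templates    # in-repo path is assumed
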
SLIDE 20

IMPLEMENTATION DETAILS

OSP-HCI

Add resource isolation and tuning to the custom templates (see the sketch below).

NOTE: Follow Chapter 7 of the OSP10/RHCS2 Reference Architecture Guide(!)

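A back-of-the-envelope version of what that chapter computes, using commonly cited rules of thumb (roughly 3 GB RAM and about one core per OSD, plus ~0.5 GB memory overhead per guest - these constants are assumptions here); the authoritative values come from the reference architecture's nova_mem_cpu_calc.py script:

# Illustrative sizing arithmetic only - use nova_mem_cpu_calc.py for real values.
HOST_RAM_GB=256; HOST_CORES=32; OSDS=12; AVG_GUEST_GB=2
GUESTS=$(( (HOST_RAM_GB - OSDS * 3) / AVG_GUEST_GB ))     # rough guest capacity per node
RESERVED_MB=$(( OSDS * 3 * 1024 + GUESTS * 512 ))         # memory held back from Nova
echo "approx guests: ${GUESTS}, nova reserved_host_memory_mb: ${RESERVED_MB}"
echo "cpu_allocation_ratio should discount roughly ${OSDS} of ${HOST_CORES} cores for OSDs"
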
SLIDE 21

IMPLEMENTATION DETAILS

OSP-HCI

Forced to use a KVM host and the virtual-bmc IPMI-to-libvirt proxy due to the lack of an oVirt/RHV ironic driver (diagram: KVM instances w/vbmc alongside RHV 4.1).

RFE: https://bugs.launchpad.net/ironic-staging-drivers/+bug/1564841

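The workaround itself is small: virtualbmc (vbmc) exposes each libvirt domain as an IPMI endpoint that ironic's pxe_ipmitool driver can reach. A sketch run on the KVM host, using the domain name, port, and credentials that appear in the instackenv.json on the next slide:

# Map a controller VM's libvirt domain to an IPMI listener (one vbmc per VM).
vbmc add hci-ctrl0 --port 6230 --username root --password calvin
vbmc start hci-ctrl0
vbmc list    # confirm the endpoint is running before registering nodes in ironic
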
SLIDE 22

IMPLEMENTATION DETAILS

OSP-HCI

Create the instackenv.json file, register (3) KVM instances and (3) OSP baremetal nodes, and run introspection.

osp-comp/ceph-osd (BM):
{
  "name": "hci-comp0",
  "pm_type": "pxe_ipmitool",
  "mac": [ "84:2b:2b:4a:0c:3f" ],
  "cpu": "1",
  "memory": "4096",
  "disk": "50",
  "arch": "x86_64",
  "pm_user": "root",
  "pm_password": "calvin",
  "pm_addr": "192.168.0.104",
  "capabilities": "node:comp0,boot_option:local"
}

osp-ctrl/ceph-mon (KVM):
{
  "name": "hci-ctrl0",
  "pm_type": "pxe_ipmitool",
  "mac": [ "52:54:00:b7:c2:7d" ],
  "cpu": "1",
  "memory": "4096",
  "disk": "50",
  "arch": "x86_64",
  "pm_user": "root",
  "pm_password": "calvin",
  "pm_addr": "192.168.2.10",
  "pm_port": "6230",
  "capabilities": "node:ctrl0,boot_option:local"
}
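The registration and introspection steps follow the standard OSP 10 director workflow (sketch, run as the stack user):

# Register the nodes from instackenv.json, set boot images, and introspect.
source ~/stackrc
openstack baremetal import --json ~/instackenv.json
openstack baremetal configure boot
openstack baremetal introspection bulk start
openstack baremetal node list    # nodes should end up in the 'available' state
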
SLIDE 23

IMPLEMENTATION DETAILS

OSP-HCI

Deploy the overcloud "hci" stack using the deploy.sh script:

source ~/stackrc
time openstack overcloud deploy \
  --stack hci \
  --templates \
  -r ~/custom-templates/custom-roles.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/tls-endpoints-public-dns.yaml \
  -e ~/custom-templates/enable-tls.yaml \
  -e ~/custom-templates/inject-trust-anchor.yaml \
  -e ~/custom-templates/rhel-registration/environment-rhel-registration.yaml \
  -e ~/custom-templates/rhel-registration/rhel-registration-resource-registry.yaml \
  -e ~/custom-templates/network.yaml \
  -e ~/custom-templates/ceph.yaml \
  -e ~/custom-templates/compute.yaml \
  -e ~/custom-templates/layout.yaml
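The deploy takes a while; a couple of commands that are handy in a second shell on the director while it runs (a sketch; the stack name "hci" comes from --stack above):

# Watch stack state and node provisioning progress.
source ~/stackrc
watch -n 30 "openstack stack list; openstack baremetal node list"
# If the deploy fails, drill into the failed resources:
openstack stack resource list hci | grep -i failed
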
SLIDE 24

IMPLEMENTATION DETAILS

OSP-HCI

Validate the deployment:

[stack@director ~]$ openstack catalog show keystone
+-----------+-------------------------------------------------+
| Field     | Value                                           |
+-----------+-------------------------------------------------+
| endpoints | regionOne                                       |
|           |   publicURL: https://hci.rrubins.lan:13000/v2.0 |
|           |   internalURL: http://172.20.15.102:5000/v2.0   |
|           |   adminURL: http://172.20.16.14:35357/v2.0      |
|           |                                                 |
| name      | keystone                                        |
| type      | identity                                        |
+-----------+-------------------------------------------------+

Stack CREATE COMPLETE (hci): Stack CREATE completed successfully

SLIDE 25

IMPLEMENTATION DETAILS

OSP-HCI

[root@hci-ctrl0 ~]# ceph -s
    cluster aaaabbbb-cccc-dddd-eeee-ff0123456789
     health HEALTH_OK
     monmap e1: 3 mons at {hci-ctrl0=172.20.17.200:6789/0,hci-ctrl1=172.20.17.201:6789/0,hci-ctrl2=172.20.17.202:6789/0}
            election epoch 6, quorum 0,1,2 hci-ctrl0,hci-ctrl1,hci-ctrl2
     osdmap e138: 21 osds: 21 up, 21 in
            flags sortbitwise
      pgmap v1071: 704 pgs, 6 pools, 1132 kB data, 76 objects
            816 MB used, 2764 GB / 2765 GB avail
                 704 active+clean

[stack@director ~]$ openstack server list
+--------------------------------------+-----------+--------+-----------------------+----------------+
| ID                                   | Name      | Status | Networks              | Image Name     |
+--------------------------------------+-----------+--------+-----------------------+----------------+
| e957e67f-a494-4ff0-a274-8b87c86e5bc8 | hci-comp2 | ACTIVE | ctlplane=172.20.16.11 | overcloud-full |
| 37ffa97b-08c9-44db-a597-28b2b2e28b28 | hci-ctrl2 | ACTIVE | ctlplane=172.20.16.9  | overcloud-full |
| 495a1c59-8cd5-48da-84d9-84a17192fdce | hci-comp1 | ACTIVE | ctlplane=172.20.16.13 | overcloud-full |
| 58998f26-48c3-436c-9368-e669f9cf16bd | hci-comp0 | ACTIVE | ctlplane=172.20.16.12 | overcloud-full |
| a4277acf-5862-489b-bba3-6a4b203e70ea | hci-ctrl1 | ACTIVE | ctlplane=172.20.16.17 | overcloud-full |
| e86ac0a0-af08-4a6b-a2a1-a080b8b74cb6 | hci-ctrl0 | ACTIVE | ctlplane=172.20.16.15 | overcloud-full |
+--------------------------------------+-----------+--------+-----------------------+----------------+

SLIDE 26

IMPLEMENTATION DETAILS

ADDITIONAL TASKS

  • Deploy CloudForms appliance on RHV-S
  • Configure infrastructure provider for RHV and cloud provider for OSP
  • Provision some RHV and OSP instances to validate functionality

SLIDE 27

POC HARDWARE

(6) Dell PowerEdge R710

  • (2) Intel Xeon E5520 quad-core CPUs
  • 96GB Memory
  • (1) 120GB SATA SSD Hard Drive
  • (7) 146GB SAS Hard Drive
  • (4) 1GbE Port
  • (2) 10Gb Port (Intel X520-DA2)

(1) Cisco SG300 52-port 1GbE switch
(1) Quanta LB6M 24-port 10GbE switch

SLIDE 28

PROPOSED HARDWARE

RHV-S + OSP-HCI

OSP-HCI (CTRL+MON):
(3) Dell PowerEdge R630
  • (2) Intel E5-2640v4 @ 2.4GHz CPUs
  • 128GB RAM
  • (1) H730 RAID Controller
  • (2) 400GB SATA SSD (RAID1)
  • (1) 1GbE quad-port (on-board)
  • (1) 10GbE dual-port DA/SFP+ (Intel X520-DA2 or X710-DA2)

RHV-S & OSP-HCI (COMP+OSD):
(6) Dell PowerEdge R730XD
  • (2) Intel E5-2640v4 @ 2.4GHz CPUs
  • 256GB RAM
  • (1) H730 RAID Controller
  • (2) 400GB SATA SSD (RAID1)
  • (12) 1.2TB SAS HDD
  • (3) 480GB SAS SSD
  • (1) 1GbE quad-port (on-board)
  • (1) 10GbE dual-port DA/SFP+ (Intel X520-DA2 or X710-DA2)

NETWORK SWITCHES:
(2) Dell Networking S3048-ON - (48) 1GbE ports
(2) Dell Networking S4048-ON - (48) 10GbE ports

SLIDE 29

PROPOSED HARDWARE

HYPER COOL INFRA

(6) Dell PowerEdge R730XD
  • (2) Intel E5-2640v4 @ 2.4GHz CPUs
  • 384GB RAM
  • (1) H730 RAID Controller
  • (2) 400GB SATA SSD (RAID1)
  • (12) 1.2TB SAS HDD
  • (3) 480GB SAS SSD
  • (1) 1GbE quad-port (on-board)
  • (2) 10GbE dual-port DA/SFP+ (Intel X520-DA2 or X710-DA2)

NETWORK SWITCHES:
(2) Dell Networking S4048-ON - (48) 10GbE SFP+ ports

SLIDE 30

PERFORMANCE AND SCALE CONSIDERATIONS

OSP-HCI

1. 10Gbps interfaces and jumbo frames for storage and tenant traffic
2. Set Nova reserved_memory and cpu_allocation_ratio based on calculations using the reference architecture supplied script
3. Avoid resource congestion with NUMA alignment and CPU pinning
4. Reduce Ceph backfill and recovery operations
5. Make sure the proper RHEL7 tuned profile is selected (throughput-performance)
6. Can scale osp-compute/ceph-osd nodes (3-42)

RHV-S

1. Use 10Gbps interfaces and jumbo frames for storage traffic
2. Gluster volumes must be configured with replica 3, features.shard enable, and features.shard-block-size 512MB (see the sketch after this list)
3. Currently a 3-node cluster that CANNOT be scaled(!) - planned for a future release

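RHV-S item 2 can be checked or applied per volume with the gluster CLI (gdeploy normally sets these during deployment; shown here only as a sketch against the vmstore volume):

# Verify replica count and sharding settings, then set them if needed.
gluster volume info vmstore | grep -iE 'replica|shard'
gluster volume set vmstore features.shard enable
gluster volume set vmstore features.shard-block-size 512MB
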
SLIDE 31

FUTURES

SHORT-TERM

  • Reduce footprint to 6 nodes for a fully-supported hybrid HCI solution
    * Requires completion of the pxe_ovirt ironic driver implementation (!)
  • Standardize on SDN (OVN)
    * Already a Tech Preview in RHV 4.1

LONGER-TERM

  • Containerize OSP services (Kolla, Kubernetes)
  • Further automate the HCI buildout and configuration using Ansible

SLIDE 32

Q&A

SLIDE 33

THANK YOU!

twitter.com/RedHatNews
youtube.com/redhat
facebook.com/redhatinc
plus.google.com/+RedHat
linkedin.com/company/red-hat