  1. Build and operate a CEPH Infrastructure - University of Pisa case study
     Simone Spinelli <simone.spinelli@unipi.it>
     17th TF-Storage meeting, Pisa, 13-14 October 2015

  2. Agenda
     ● CEPH@unipi: an overview
     ● Infrastructure bricks:
       – Network
       – OSD nodes
       – Monitor nodes
       – Racks
       – MGMT tools
     ● Performances
     ● Our experience
     ● Conclusions

  3. University of Pisa
     ● Large Italian university:
       – 70K students
       – 8K employees
       – not a campus but spread all over the city → no big datacenter, many small sites
     ● Owns and manages an optical infrastructure with an MPLS-based MAN on top
     ● Proud host of a GARR Network PoP
     ● Surrounded by other research/educational institutions (CNR, Sant'Anna, Scuola Normale, ...)

  4. How we use CEPH
     Currently in production as the backend of an OpenStack installation, it hosts:
     ● department tenants (web servers, etc.)
     ● tenants for research projects (DNA sequencing, etc.)
     ● tenants for us: multimedia content from e-learning platforms
     Working on:
     ● an email system for students hosted on OpenStack → RBD
     ● a sync & share platform → RadosGW

  5. Timeline
     ● Spring 2014: we started to plan:
       – capacity/replica planning
       – rack engineering (power/cooling)
       – bare metal management
       – configuration management
     ● Dec 2014: first testbed
     ● Feb 2015: 12-node cluster goes into production
     ● Jul 2015: OpenStack goes into production
     ● Oct 2015: start deploying new Ceph nodes (+12)

  6. Overview
     ● 3 sites (we started with 2):
       – one replica per site
       – 2 sites active for computing and storage
       – 1 site for storage and quorum
     ● 2 different network infrastructures:
       – services (1Gb and 10Gb)
       – storage (10Gb and 40Gb)

  7. Network
     ● Ceph client and cluster networks are realized as VLANs on the same switching infrastructure
     ● Redundancy and load balancing are achieved with LACP (a bonding sketch follows below)
     ● Switching platforms:
       – Juniper EX4550: 32p SFP
       – Juniper EX4200: 24p copper
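
     As a reference only, a minimal sketch of an LACP bond carrying a Ceph VLAN on Ubuntu 14.04
     (ifenslave + vlan packages); the interface names, VLAN ID and addresses are assumptions, not
     the actual unipi configuration:

       # /etc/network/interfaces (sketch)
       auto bond0
       iface bond0 inet manual
           bond-mode 802.3ad
           bond-miimon 100
           bond-lacp-rate fast
           bond-xmit-hash-policy layer3+4
           bond-slaves p1p1 p1p2

       # Ceph cluster network as a tagged VLAN on top of the bond (VLAN ID assumed)
       auto bond0.401
       iface bond0.401 inet static
           address 10.10.1.11
           netmask 255.255.255.0
           vlan-raw-device bond0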

  8. Storage ring
     ● Sites interconnected with a 2x40Gb ERP (Ethernet Ring Protection)
     ● For storage nodes, 1 Virtual Chassis (VC) per DC:
       – maximizes the bandwidth: 128Gb backend inside the VC
       – easy to configure and manage (NSSU)
       – no more than 8 nodes per VC
       – computing nodes go on a different VC

  9. Hardware: OSD nodes - DELL R720XD (2U)
     ● 2x Xeon E5-2603 @ 1.8GHz: 8 cores total
     ● 64GB RAM DDR3
     ● 2x 10Gb Intel X520 network adapters
     ● 12x 2TB SATA disks (6 disks/RU)
     ● 2x Samsung 850 256GB SSDs:
       – mdadm RAID1 for the OS
       – 6 partitions per disk for XFS journals (see the sketch below)
     ● Ubuntu 14.04, Linux 3.13.0-46-generic #77-Ubuntu
     ● Linux bonding driver:
       – no special functions
       – less complex
     ● Really easy to deploy with iDRAC
     ● Intended to be the virtual machine pool (faster)
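
     The deck does not show the deployment commands, but with the ceph-deploy tooling of that era,
     creating an OSD with its journal on one of the SSD partitions could look like this; the host
     and device names are made up:

       # data disk sdb, journal on SSD partition /dev/sdm1 (assumed names)
       ceph-deploy osd create osd-node-01:sdb:/dev/sdm1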

  10. Hardware: OSD nodes - Supermicro SSG6047R-OSD120H
      ● 2x Xeon E5-2630v2 @ 2.60GHz: 24 cores total
      ● 256GB RAM DDR3
      ● 4x 10Gb Intel X520 network adapters
      ● 30x 6TB SATA disks (7.5 disks/RU)
      ● 6x Intel S3700 SSDs for XFS journals: 1 disk → 5 OSDs (see the sketch below)
      ● 2x SSD in RAID1 for the OS (dedicated)
      ● Ubuntu 14.04, Linux 3.13.0-46-generic #77-Ubuntu
      ● Linux bonding driver:
        – no special functions
        – less complex
      ● Intended to be the object storage pool (slow)
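
      A sketch of how one journal SSD could be split five ways (20% each) before handing the
      partitions to the OSDs; the device name is an assumption:

        # split one journal SSD (assumed /dev/sdx) into 5 equal GPT partitions
        parted -s /dev/sdx mklabel gpt
        for i in 1 2 3 4 5; do
            parted -s /dev/sdx mkpart journal-$i $(( (i-1)*20 ))% $(( i*20 ))%
        done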

  11. Hardware: monitor nodes - Sun SunFire X4150
      ● Physical hardware, not virtual (3 in production, going to be 5)
      ● 2x Intel Xeon X5355 @ 2.66GHz
      ● 16GB RAM
      ● 2x 1Gb Intel NICs for the Ceph client network (LACP)
      ● 5x 120GB Intel S3500 SSDs in RAID10 + hot spare
      ● Ubuntu 14.04, Linux 3.13.0-46-generic #77-Ubuntu

  12. Racks plans
      NOW: computing and storage are in specific racks.
      For storage:
      ● 32U OSD nodes
      ● 2U monitor/cache
      ● 10U network
      For computing:
      ● 32U computing nodes
      ● 10U network
      IN PROGRESS: computing and storage will be mixed:
      ● 24U OSD nodes
      ● 4U computing nodes
      ● 2U monitor/cache
      ● 8U network
      ● the storage network fan-out is optimized

  13. Configuration essentials
      CRUSH tree:
        -1   262.1   root default
        -15   87.36    datacenter fibonacci
        -16   87.36      rack rack-c03-fib
        -14   87.36    datacenter serra
        -17   87.36      rack rack-02-ser
        -18   87.36      rack rack-03-ser
        -35   87.36    datacenter ingegneria
        -31    0          rack rack-01-ing
        -32    0          rack rack-02-ing
        -33    0          rack rack-03-ing
        -34    0          rack rack-04-ing

      CRUSH rule:
        rule serra_fibo_ing_high-end_ruleset {
            ruleset 3
            type replicated
            min_size 1
            max_size 10
            step take default
            step choose firstn 0 type datacenter
            step chooseleaf firstn 1 type host-high-end
            step emit
        }
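
      A rule like this is typically installed by editing a decompiled copy of the CRUSH map,
      recompiling it, and pointing the relevant pools at ruleset 3; a sketch with the standard
      commands of that era (the pool name is a placeholder):

        ceph osd getcrushmap -o crushmap.bin
        crushtool -d crushmap.bin -o crushmap.txt
        # add the rule above to crushmap.txt, then recompile and inject it
        crushtool -c crushmap.txt -o crushmap.new
        ceph osd setcrushmap -i crushmap.new
        ceph osd pool set <pool> crush_ruleset 3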

  14. Tools
      Just 3 people working on CEPH (not at 100%) and we need to grow quickly → automation is REALLY important
      ● Configuration management: Puppet (a minimal sketch follows below)
        – most of the classes are already production-ready
        – a lot of documentation (best practices, books, community)
      ● Bare metal installation: The Foreman
        – complete lifecycle for hardware
        – DHCP, DNS, Puppet ENC
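
      A hypothetical sketch of the kind of Puppet profile involved; the class name, parameters and
      template are invented for illustration and are not the actual unipi manifests:

        # hypothetical profile, for illustration only
        class profile::ceph_osd (
          $fsid,      # cluster fsid
          $mon_host,  # comma-separated list of monitor addresses
        ) {
          package { 'ceph':
            ensure => installed,
          }

          # render /etc/ceph/ceph.conf from an (assumed) ERB template
          file { '/etc/ceph/ceph.conf':
            content => template('profile/ceph.conf.erb'),
            require => Package['ceph'],
          }
        }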

  15. Tools
      For monitoring/alarming:
      ● Nagios + Check_MK:
        – alarms
        – graphing
      ● Rsyslog
      ● Looking at collectd + Graphite:
        – metrics correlation
      Test environment (Vagrant and VirtualBox) to test what is hardware independent:
      ● new functionalities
      ● Puppet classes
      ● upgrade procedures
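
      For the hardware-independent tests, a minimal Vagrantfile along these lines would spin up a
      few Ubuntu 14.04 VMs in VirtualBox; the box name, VM count and sizing are assumptions:

        # Vagrantfile sketch: three small Ubuntu 14.04 VMs for testing
        Vagrant.configure("2") do |config|
          config.vm.box = "ubuntu/trusty64"
          (1..3).each do |i|
            config.vm.define "ceph-test#{i}" do |node|
              node.vm.hostname = "ceph-test#{i}"
              node.vm.provider "virtualbox" do |vb|
                vb.memory = 1024
              end
            end
          end
        end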

  16. Openstack integration
      ● It works straightforwardly
      ● CEPH as a backend for:
        – volumes
        – VMs
        – images
      ● Copy on write: VM as a snapshot
      ● Shared storage → live migration
      ● Multiple pools are supported
      ● Current issues (OpenStack Juno, Ceph Giant):
        – massive volume deletion
        – evacuate
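
      For context, the RBD backend is wired in with a handful of settings in the OpenStack services,
      roughly as in the Ceph/OpenStack integration guide of that era; the pool names, users and UUID
      below are placeholders, not the actual unipi values:

        # cinder.conf - volumes on RBD
        [DEFAULT]
        volume_driver   = cinder.volume.drivers.rbd.RBDDriver
        rbd_pool        = volumes
        rbd_user        = cinder
        rbd_secret_uuid = <libvirt secret uuid>

        # glance-api.conf - images on RBD
        [glance_store]
        default_store  = rbd
        rbd_store_pool = images
        rbd_store_user = glance

        # nova.conf - VM disks on RBD
        [libvirt]
        images_type     = rbd
        images_rbd_pool = vms
        rbd_user        = cinder
        rbd_secret_uuid = <libvirt secret uuid>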

  17. Performances - ceph bench writes
                               10s run      60s run      120s run
      Total time run           10.353915    60.308706    120.537838
      Total writes made        1330         5942         12593
      Write size               4194304      4194304      4194304
      Bandwidth (MB/sec)       513.815      394.106      417.894
      Stddev Bandwidth         161.337      103.204      84.4311
      Max bandwidth (MB/sec)   564          524          560
      Min bandwidth (MB/sec)   0            0            0
      Average Latency          0.123224     0.162265     0.153105
      Stddev Latency           0.0928879    0.211504     0.175394
      Max latency              0.955342     2.71961      2.05649
      Min latency              0.045272     0.041313     0.038814
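
      The three columns are consistent with the standard RADOS write benchmark run for 10, 60 and
      120 seconds; the exact invocation is not in the slides, but it would look like this
      (--no-cleanup keeps the objects so the read benchmarks on the next slide have data to read):

        rados bench -p BenchPool 10 write --no-cleanup
        rados bench -p BenchPool 60 write --no-cleanup
        rados bench -p BenchPool 120 write --no-cleanup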

  18. Performances - ceph bench reads
      rand: rados bench -p BenchPool 10 rand
      seq:  rados bench -p BenchPool 10 seq

                           rand         seq
      Total time run       10.065519    10.057527
      Total reads made     1561         1561
      Read size            4194304      4194304
      Bandwidth (MB/sec)   620.336      620.829
      Average Latency      0.102881     0.102826
      Max latency          0.294117     0.328899
      Min latency          0.04644      0.041481

  19. Performances: adding VMs
      What to measure:
      ● see how latency is influenced by IOPS, measuring it while we add VMs (fixed load generator)
      ● see how total bandwidth decreases as VMs are added
      Setup:
      ● 40 VMs on OpenStack, each with two 10G volumes (pre-allocated with dd):
        – one with a bandwidth cap (100MB/s)
        – one with an IOPS cap (200 total)
      ● We use fio as the benchmark tool and dsh to launch it from a master node
      ● Reference: "Measure Ceph RBD performance in a quantitative way",
        https://software.intel.com/en-us/blogs/2013/10/25/measure-ceph-rbd-performance-in-a-quantitative-way-part-i

  20. Fio
      Random I/O (IOPS-capped):
        fio --size=1G \
            --runtime 60 \
            --ioengine=libaio \
            --direct=1 \
            --rw=randread [randwrite] \
            --name=fiojob \
            --blocksize=4K \
            --iodepth=2 \
            --rate_iops=200 \
            --output=randread.out

      Sequential I/O:
        fio --size=4G \
            --runtime=60 \
            --ioengine=libaio \
            --direct=1 \
            --rw=read [write] \
            --name=fiojob \
            --blocksize=128K [256K] \
            --iodepth=64 \
            --output=seqread.out
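
      A sketch of how the fio job could be fanned out from the master node with dsh; the group name
      is an assumption, and the flags are the usual dancer's dsh ones (-g group, -c concurrent,
      -M show machine names):

        # run the random-read job concurrently on all benchmark VMs
        dsh -g benchvms -c -M -- fio --size=1G --runtime 60 --ioengine=libaio \
            --direct=1 --rw=randread --name=fiojob --blocksize=4K \
            --iodepth=2 --rate_iops=200 --output=randread.out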

  21. Performances - write [results chart]

  22. Performances - write [results chart]

  23. Performances - read [results chart]

  24. Performances - read [results chart]
