Comet Virtual Clusters – What’s underneath?
Philip Papadopoulos San Diego Supercomputer Center ppapadopoulos@ucsd.edu
Overview
NSF Award #1341698, "Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science"
PI: Michael Norman
Co-PIs: Shawn Strande, Philip Papadopoulos, Robert Sinkovits, Nancy Wilkins-Diehr
SDSC project in collaboration with Indiana University (led by Geoffrey Fox)
Comet system architecture (diagram; recoverable details):
- 27 racks; 72 Haswell (HSWL) compute nodes with 320 GB memory per rack; 36 GPU nodes (Kepler GPUs); 4 large-memory nodes
- 7x 36-port FDR InfiniBand switches in each rack, wired as a full fat-tree; 4:1 oversubscription between racks; mid-tier InfiniBand above the racks
- Core InfiniBand: 2 x 108-port switches; IB-Ethernet bridges (4 x 18-port each)
- Performance Storage: 7.7 PB, 200 GB/s, 32 storage servers; Durable Storage: 6 PB, 100 GB/s, 64 storage servers
- Arista 40GbE (2x) core Ethernet; Juniper 100 Gbps research and education network access (Internet2); data mover nodes
- Additional support components (not shown for clarity): Ethernet management network (10 GbE), node-local storage, home file systems, VM image repository, login, data mover, management, and gateway hosts
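The rack-level figures above are self-consistent; a quick arithmetic sketch (the per-rack node and uplink counts are taken from the diagram fragments):

```python
# Fat-tree oversubscription arithmetic for one Comet rack.
nodes_per_rack = 72      # HSWL compute nodes per rack, one FDR link each
uplinks_per_rack = 18    # FDR uplinks from the rack toward the core

oversubscription = nodes_per_rack / uplinks_per_rack
print(f"{oversubscription:.0f}:1")   # 4:1, matching the slide

# Total standard compute nodes across the system
racks = 27
print(racks * nodes_per_rack)        # 1944
```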
InfiniBand routing constraints:
1. Each LAG group has a single IB local ID (LID).
2. IB switches are destination-routed; by default, all sources for the same destination LID take the same route (port).

LMC (LID Mask Control) helps: with LMC set, each port's single LID becomes 2^LMC addresses. At each switch level, there are now 2^LMC routes to a destination LID (better route dispersion). The trade-off: by raising LMC for better route balancing, you reduce the size of your usable LID space.

(Diagram: IB nodes reaching the LID of the LAG through the IB switch.)
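A small sketch of the LMC trade-off; the unicast LID ceiling (0xBFFF) comes from InfiniBand's addressing rules, and the loop range is illustrative:

```python
# LID Mask Control (LMC) trade-off: more routes per destination vs. fewer endpoints.
# InfiniBand unicast LIDs span 0x0001-0xBFFF (49,151 addresses).
UNICAST_LIDS = 0xBFFF  # 49151

for lmc in range(4):
    lids_per_port = 2 ** lmc                     # addresses (and routes) per endpoint
    max_endpoints = UNICAST_LIDS // lids_per_port
    print(f"LMC={lmc}: {lids_per_port} routes/endpoint, <= {max_endpoints} endpoints")
```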
PROBLEM: losing Ethernet paths from nodes to storage.

The IB-Ethernet bridges use proxy ARP: when an IPoIB node asks "Who has XX.YY?", a bridge "answers" with its own MAC address ("I do, at bb"). When the bridge then receives a packet destined for IP XX.YY, it forwards it (Layer 2) to the appropriate MAC on the Ethernet side, e.g., the Lustre storage (MAC aa) behind the Arista switch/router.

Our network config worked for 18+ months. Then a subnet change occurred, and an ARP flood ensued (2K nodes, each asking for O(64) Ethernet MAC addresses). The bridges could not respond to all ARP requests, and Lustre wasn't happy. The fix was to handle this function inside our Arista fabric.
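The bridge behavior can be sketched as a toy lookup; the class, IP address, and table below are purely illustrative (only the "aa"/"bb" MAC labels come from the slide):

```python
# Toy proxy-ARP responder, sketching what the IB-Ethernet bridge does:
# it claims any IP it can reach on the Ethernet side, answering with its own MAC.

class ProxyArpBridge:
    def __init__(self, own_mac, reachable):
        self.own_mac = own_mac       # MAC the bridge advertises ("bb" in the slide)
        self.reachable = reachable   # IP -> real MAC table on the Ethernet side

    def answer_arp(self, ip):
        """Return the MAC to put in the ARP reply, or None if not proxied."""
        return self.own_mac if ip in self.reachable else None

    def forward(self, ip):
        """Layer-2 forward: pick the real destination MAC for a received packet."""
        return self.reachable.get(ip)

bridge = ProxyArpBridge("bb", {"10.0.0.7": "aa"})  # hypothetical addresses
print(bridge.answer_arp("10.0.0.7"))  # bb  (bridge claims the IP)
print(bridge.forward("10.0.0.7"))     # aa  (real MAC of the Lustre server)
```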
Virtual cluster architecture (diagram):
- Nucleus: a persistent virtual front end, driven through an API.
- A disk image vault holds idle disk images; active virtual compute nodes run from disk images that are attached and synchronized.
- Physically: a frontend, hosting for virtual frontends, the disk image vault, and many compute nodes.
- Logically: each virtual cluster gets a virtual frontend plus virtual compute nodes on their own private network (Ethernet and InfiniBand), mapped onto the physical public/private networks.
All nodes have:
- Private Ethernet
- InfiniBand
- Local disk storage

Virtual compute nodes can network boot (PXE) from their virtual frontend. All disks retain state.
InfiniBand virtualization: 8% latency overhead, minimal bandwidth overhead.

Comet: Providing Virtualized HPC for XSEDE
Traditional I/O virtualization causes significant I/O performance degradation (e.g., excessive DMA interrupts). With SR-IOV (Single Root I/O Virtualization), InfiniBand host channel adapters expose multiple virtual functions, each lightweight but with its own DMA streams, memory space, and interrupts. Guests see near-native InfiniBand latency/bandwidth with minimal overhead.
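What SR-IOV looks like from the host can be sketched by parsing lspci-style output; the sample lines below are illustrative of a Mellanox ConnectX-3 host, not captured from Comet:

```python
# Sketch: counting SR-IOV virtual functions from lspci-style output.
# Sample lines are illustrative, not real Comet output.
lspci_output = """\
03:00.0 Infiniband controller: Mellanox Technologies MT27500 Family [ConnectX-3]
03:00.1 Infiniband controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]
03:00.2 Infiniband controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]
"""

lines = lspci_output.splitlines()
vfs = [l for l in lines if "Virtual Function" in l]      # lightweight guest-visible HCAs
pfs = [l for l in lines if "Virtual Function" not in l]  # the physical function
print(f"{len(pfs)} physical function(s), {len(vfs)} virtual function(s)")
```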
Application benchmarks (slide fragments; only recoverable details kept):
- WRF 3.4.1 – 3-hour forecast; gateway workload; built with AVX options.
- Benchmark case: 218 taxa, 10,000 generations; communication behavior noted ("all communication").
- Benchmark case: 218 taxa, 2,294 characters, 1,846 patterns, 100 bootstraps specified; SMP within a node, uses ibverbs for multi-node runs; gateway workload; AVX options.
$ cm comet iso list
1: CentOS-7-x86_64-NetInstall-1511.iso
2: ubuntu-16.04.2-server-amd64.iso
3: ipxe.iso
...<snip>...
19: Fedora-Server-netinst-x86_64-25-1.3.iso
20: ubuntu-14.04.4-server-amd64.iso

$ cm comet iso attach 2 vctNN
$ cm comet power on vctNN
$ cm comet console vctNN
Disk image attach/detach sequence (diagrams): the NAS exports each virtual compute node's disk image over iSCSI, using a target name of the form iqn.2001-04.com.nas-0-0-vm-compute-x. The physical compute node hosting virtual compute-x logs into that target to run the VM; on power off, the image is detached and synchronized back to the NAS.
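The per-node target naming can be sketched as a tiny helper; the function name is hypothetical, while the IQN pattern is the one shown in the diagrams:

```python
# Build the iSCSI qualified name (IQN) for a virtual compute node's disk image,
# following the slide's pattern: iqn.2001-04.com.nas-0-0-vm-compute-x
# (helper name is hypothetical, for illustration).

def image_iqn(node_index):
    return f"iqn.2001-04.com.nas-0-0-vm-compute-{node_index}"

print(image_iqn(3))  # iqn.2001-04.com.nas-0-0-vm-compute-3
```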
https://github.com/rocksclusters/img-storage-roll