400G Demonstrator for ISC ‘13
Post ISC phase 2013 Wolfgang Wünsch, Technische Universität Dresden Eduard Beier, T-Systems International
–public – E. Beier/ W. Wünsch 400G Demonstrator für ISC’13
Agenda
just click! (hyperlinked)
Partner
Back to Agenda
Purpose
The purpose of the project is to demonstrate that bandwidth beyond 100 GBit/s is feasible and useful.
Back to Agenda
Project Structure
Back to Agenda
Project Board: Prof. Dr. W. Nagel, Prof. Gentzsch, Geiger, …sen, …selowski, Jan Heichler
Project Management: E. Beier, W. Wünsch, n.n.
WP1 Performance Tests: Andy Georgi (System Performance Metering)
WP2 Parallel Filesystems: Klaus Gottschalk (Filesystem Optimizing)
WP3 Server & Storage: Beier/Wünsch (Server & Storage Project Management)
WP4 Transport: Maskos / Mayer (Planning / Engineering, WDM Project Management)
WP5 Layer 2/3: Daniel Nowara (Router Project Management)
WP6 SDN & NFV: Ralf Braun (SDN & NFV & Security)
WP7 Applications: Ferdinand Jamitzky (Applications Project Management)
WP8 Public Relations: Udo Schäfer (Project Marketing)
WP1: Performance Tests
measurements
ad: Andy Georgi. Back to Project Structure
WP2: Parallel File System
coordination with other WPs and partners
Concept)
ad: Klaus Gottschalk. Back to Project Structure
WP3: Server & Storage & IB etc.
Infiniband infrastructure in coordination with other WPs and partners
ad: Project Management. Back to Project Structure
WP4: Transport
infrastructure in coordination with other WPs and partners
ad: Stefan Maskos (Planning) / Heinz Mayer (Technology). Back to Project Structure
WP5: Layer 2/3
coordination with other WPs and partners
WP lead: Daniel Nowara. Back to Project Structure
WP6: SDN & NFV & Security
ad: Ralf Braun (T-Labs). Back to Project Structure
WP7: Applications
ad: Ferdinand Jamitzky. Back to Project Structure
WP8: Public Relations
etc)
ad: Udo Schäfer. Back to Project Structure
400G Demonstrator
Topology
10GbE for Demonstrator Computing Center
Euro Industriepark München
Back to Agenda
Turbine Development
Back to Agenda
Details:
Data volume: ~ 1 TB
Overall workflow: a multitude of independent simulation runs (HTC). Simulations run on HPC resources at different sites. Every simulation produces input data for subsequent simulations, which again run at different sites.
Thus, to avoid knock-on delays in workflow execution, data should instantly be available at the different sites!
GPFS: adopted features are Active File Management (AFM) and Stretched Cluster. Cross-site data replication allows running simulations without prior copying; AFM data replication implicitly provides a consistent backup.
Turbine Development: Benefits of GPFS Usage on 400G
Back to Turbine Development
Diagram: job schedule over time; a = 6, b = 5, n * m = 28, a * b = 30; Δt between Solver 1 and Solver 2; 240 min total.
Back to Turbine Development
400G: Bandwidth requirements for different job distribution setups
Extreme/HTC setup with a = n * m = 300, b = 1:
(write peak)
Required bandwidth: 400 GBit/s
Required machine size: > 19200 cores (when single jobs run on 64 cores)
"Gentle" setup with a = 50, b = 6:
data to disk to represent runtime differences over larger values of b:
Required bandwidth: 4 GBit/s
Required machine size: > 3200 cores (when single jobs run on 64 cores)
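The machine sizes above are simply the number of concurrent jobs times the per-job core count; a minimal sanity-check sketch (the helper name `required_cores` is ours, the 64-cores-per-job figure is from the slide):

```python
# Sanity check for the machine sizes quoted above: all `a` solver jobs
# run concurrently, each on 64 cores (figure taken from the slide).

def required_cores(parallel_jobs: int, cores_per_job: int = 64) -> int:
    """Minimum core count to run all jobs at the same time."""
    return parallel_jobs * cores_per_job

print(required_cores(300))  # extreme/HTC setup, a = 300 -> 19200
print(required_cores(50))   # "gentle" setup, a = 50     -> 3200
```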
Turbine Development: Benefits of GPFS Usage on 400G
Back to Turbine Development
Chemnitz
Turbine Development Setup
processing @ DSI
Calculation (Solver 1)
Calculation (Solver 2)
processing @ DSI. Back to Turbine Development
Chemnitz
Turbine Development & GPFS
Parallel Distributed File System GPFS
Back to Turbine Development
Order of 30 different models are used worldwide.
Experiments with these models produce 10s to 100s of PBytes of data.
Data need to be compared between multiple sites worldwide.
Movement of data should be within months *
Transfer Rate   Time to transport 1 PB of Data
10 Mbps         ~ 27 years
1 Gbps          ~ 97 days
100 Gbps        ~ 23 hours
* Otherwise the questions will be forgotten ;-)
Statistics taken from: "BER Network Requirements Workshop", LBNL report LBNL-4089E, 2010, p. 33. Recommended reading.
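The table's orders of magnitude follow from raw line rate alone; a quick sketch (assuming 1 PB = 10^15 bytes and no protocol overhead, which is why the results come out slightly below the quoted figures):

```python
# Reproduce the order of magnitude of the "time to transport 1 PB" table.
# Assumption: raw line rate, 1 PB = 1e15 bytes; real transfers add
# protocol overhead, which the slide's slightly larger figures reflect.

PETABYTE_BITS = 1e15 * 8  # 1 PB expressed in bits

def transfer_seconds(rate_bps: float) -> float:
    """Seconds needed to move 1 PB at the given rate in bit/s."""
    return PETABYTE_BITS / rate_bps

print(transfer_seconds(10e6) / (365 * 86400))  # 10 Mbps  -> ~25 years
print(transfer_seconds(1e9) / 86400)           # 1 Gbps   -> ~93 days
print(transfer_seconds(100e9) / 3600)          # 100 Gbps -> ~22 hours
```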
Climate Computing: Extremely High Bandwidth Requirements ('Very Big Data')
Back to Agenda
Climate Computing Application Setup
Diagram: Folder 1 CMIP, Folder 2 CMIP, Folder 3 CMIP; Federation, Preallocation; Model post-processing and analysis; Visualisation @ ISC '13 Leipzig. Back to Climate Computing
CCA & GPFS & iRODS
GPFS and/or iRODS Global Namespace
Back to Climate Computing
Service Recipient Relations
Diagram: Service Recipient (Distributed Folders; Federation, Preallocation); Research Client (PREP & POST on Cloud Resources); Calculation: Climate Computing and Turbine Development, TRACE on HPC Resources; Client evaluating results, e.g. TECPLOT.
Back to Agenda
400GBit/s Data Path
Legend: Router / Switch; 400G WDM Super Channel (4x100GbE) Link; Server; Storage; GPFS Filesystem
Back to Agenda
Router Router
12x Server
12x40 GbE
12x Server
12x40 GbE
400 GBit/s
36x700G Flash 36x700G Flash ∼7000 cores
IB FDR Network
12xIB FDR
TUD Cluster
∼2000 cores
IB FDR10 Network
12xIB FDR10
Throughput Targets
Per server: 5 GByte/s (40 GbE), 6 GByte/s (3x700G Flash), 7 GByte/s (IB FDR)
Aggregate over 12 servers (36x700G Flash, IB FDR Network, ∼7000 cores): 50 GByte/s (400G link), 60 GByte/s (12x40 GbE), 72 GByte/s (Flash), 84 GByte/s (12x IB FDR)
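These targets are consistent with simple per-link arithmetic (8 bits per byte, 12 servers); in the sketch below, the attribution of each figure to a component is our reading of the diagram:

```python
# Cross-check the aggregate throughput targets against per-link rates.
# Assumption: the mapping of figures to components is our reading of
# the diagram; ~6 GByte/s per server for the 3x700G flash cards.

def gbit_to_gbyte(gbit_per_s: float) -> float:
    """Convert a line rate in GBit/s to GByte/s."""
    return gbit_per_s / 8

SERVERS = 12
wdm_link = gbit_to_gbyte(400)           # 400G WDM super channel -> 50.0
ethernet = SERVERS * gbit_to_gbyte(40)  # 12x 40 GbE             -> 60.0
flash    = SERVERS * 6                  # 12x 6 GByte/s flash    -> 72
ib_fdr   = SERVERS * gbit_to_gbyte(56)  # 12x IB FDR (56 GBit/s) -> 84.0

print(wdm_link, ethernet, flash, ib_fdr)  # 50.0 60.0 72 84.0
```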
Back to Agenda
The Big Picture
Back to Agenda
Legend: Firewall / Encryption; Router / Switch; 400G WDM Super Channel (4x100GbE) Link; Server; Storage; Parallel File System
17x Server Firewall ∼200 cores
1x10 GbE
36x700G Flash 12x Server
Router
17x Server
17x10 GbE Router 2x100 GbE*
DATE Cluster 1 DATE Cluster 2
12x40 GbE 17x10 GbE
17x2T Disk 17x2T Disk
Router 1x10 GbE
400 GBit/s
36x700G Flash ∼2000 cores 12x Server
12x40 GbE
∼7000 cores
TUD Cluster RZG Cluster
IB FDR10 Network
12xIB FDR10
IB FDR Network
12xIB FDR
SGI Cluster
1x10 GbE
Firewall. *maybe not available during ISC
Connection RZG Router – RZG WDM and Connection TUD Router – TUD WDM
Type: 2 x LC (100GBaseLR4)
Length: TUD: 10m; RZG: 10m
Volume: 4 each (8 total)
7750 SR12E; 7750 SR12
1830 PSS
4x100 GbE
Legend: Router / Switch; 400G WDM Super Channel Link; WDM Terminal; Amplifier
1830 PSS
4x100 GbE
coherent Super Channel (2 x 16QAM @ 50 GHz Grid / 2 x 200 GBit/s)
OLA+ DGE OLA OLA OLA OLA OLA OLA OLA
70km 70km 70km 70km 70km 70km 70km 70km 70km
400 G WDM Super Channel
Back to Big Picture
Connection TUD Server – TUD IB Switch and Connection RZG Server – RZG IB Switch
Type: MPO (Infiniband FDR, 56GBit/s)
Length: TUD: 10m; RZG: 10m
Volume: 12 each side (24 total)
Infiniband Connections
Mellanox Connect-IB. Volume: 12 on each side (24 total)
Back to Big Picture
Mellanox active Cable (incl. QSFP). Volume: 12 on each side (24 total)
Connection TUD Server – TUD Router and Connection RZG Server – RZG Router
Type: MPO (40GBaseSR4)
Length: TUD: 10m; RZG: 10m
Volume: 24x10m
Mellanox ConnectX-3. Volume: 12 on each side (24 total)
Mellanox active Cable (incl. QSFP). Volume: 12 on each side (24 total)
40GbE Connections
Alcatel-Lucent 3-port 40GbE IMM. Volume: 4 @ RZG
Alcatel-Lucent 6-port 40GbE IMM (no picture). Volume: 2 @ TUD
Back to Big Picture
IBM iDataPlex dx360 M4 (Volume: 12 @ RZG)
Bull NovaScale R460 F3 (Volume: 12 @ TUD)
Server
Back to Big Picture
Connection Router – WDM
Type: LC singlemode (100GBaseLR10)
Length: 10m
Volume: 2
Back to Big Picture
Alcatel-Lucent 2-port 100GbE IMM. Volume: 3 @ TUD
Alcatel-Lucent 1-port 100GbE IMM (no picture). Volume: 4 @ RZG
Connection TUD Router – TUD 10GbE Cluster
Type: LC duplex multimode (10GBaseSR)
Length: ?
Volume: 17
Back to Big Picture
Alcatel-Lucent 7750 SR12 @ RZG; Alcatel-Lucent 7750 SR12E @ TUD
Back to Big Picture
Alcatel-Lucent 1830 PSS-32
Back to Big Picture
Back to Big Picture
EMC2 XtremSF 700GB SLC
Volume: 36 PCIe cards on each side (72 total)
Project Lifetime
Kickoff WS
28.1.-15.6.: Demonstrator Setup
ISC
21.6.-?: Getting through Test Item List (TIL)
closing WS
Back to Agenda
Press Release
Timeline Rev. C
Setup Server, Storage, Infiniband; Setup GPFS
final Performance Tests
RfS WDM & IP; Performance Tests done; RfS GPFS
Setup WDM, Router; DATE Applications final config
RfS Tunnel, Server; RfS Applications
Back to Agenda
no Application Tests possible; GPFS optimization
CW 19 | CW 20 | CW 21 | CW 22 | CW 23 | CW 24
We are here
Stop DATE
Demonstrator Application Test Environment (DATE) Objective
to get two highly sophisticated applications running @ 400G
integral part of the project; the application teams get access to new building blocks of the 'big picture' as soon as possible
Back to Agenda
DATE Phase 1
Legend: Router / Switch; 400G WDM Super Channel (4x100GbE) Link
17x Server
Server; Storage; Firewall / Encryption; GPFS File System
7750 SR12
17x Server
17x10 GbE; 7750 SR12; 2x100 GbE
DATE Cluster 1 DATE Cluster 2
17x10 GbE
17x2T Disk 17x2T Disk
Back to Timeline
DATE Phase 2
Legend: Router / Switch; 400G WDM Super Channel (4x100GbE) Link; Server; Storage; Firewall / Encryption; GPFS File System
17x Server 3xGPFS Server
7750 SR12
17x Server
17x10 GbE; 7750 SR12; 2x100 GbE
DATE Cluster 1 DATE Cluster 2
17x10 GbE
17x2T Disk 17x2T Disk 2xGPFS Server 2xGPFS Server
2x40 GbE
2x40 GbE
Back to Timeline
4x700G Flash 4x700G Flash
DATE Phase 3
Legend: Router / Switch; 400G WDM Super Channel (4x100GbE) Link; Server; Storage; Firewall / Encryption; GPFS File System
17x Server 3xGPFS Server
7750 SR12
17x Server
17x10 GbE; 7750 SR12; 2x100 GbE
DATE Cluster 1 DATE Cluster 2
17x10 GbE
17x2T Disk 17x2T Disk ∼7000 cores
TUD Cluster
IB FDR Network
2xGPFS Server 2xGPFS Server
2x40 GbE 23xIB FDR
2x40 GbE
Back to Timeline
4x700G Flash 4x700G Flash
Test Items Objective
networking, HPC, virtualization and other fields
implementation of the applications
partners are invited to contribute proposals. Back to Project Lifetime (21.6.-?)
Test Item List NFV (TSI)
Back to Agenda
Loadbalancer / Bundling / Performance / CoS / FCAPS (T-Labs)
21.6.-?
GPFS Network (TSI-SfR); SDN (TUD); RDMA over Ethernet
Network Functions Virtualisation (NFV)
Back to Test Item List
Example NFV Use Case
ESX Cluster
Firewall Firewall
Customer B
VLAN 2 VLAN 1
Each customer configures its own FW entity
Firewall
VLAN 1
Customer A
VLAN 1 VLAN 2 VLAN 2 VLAN 2 VLAN 1 VLAN 2
Internet
Router/Switch; Firewall VM; Standby Firewall VM
Back to Test Item List
NFV & ESX Test Setup
ESX Cluster
Client
40 GbE
Router/Switch; Firewall VM; Standard HW (2x E5-2670 + 128G RAM)
40 GbE | 40 GbE | 40 GbE | 40 GbE
Client
40 GbE
Back to Test Item List
Demonstrator NFV Setup
Legend: Router / Switch; Link; Server; Storage; Firewall / Encryption / Compression; GPFS File System
17x Server Firewall ∼200 cores
1x10 GbE
3xGPFS Server
7750 SR12
17x Server
17x10 GbE; 7750 SR12E; 2x100 GbE
DATE Cluster 1 DATE Cluster 2
17x10 GbE
17x2T Disk 17x2T Disk; SGI Cluster
7750 SR12; 1x10 GbE
∼2000 cores 3xGPFS Server ∼7000 cores
TUD Cluster RZG Cluster
IB FDR10 Network
3xIB FDR10
IB FDR Network
3xIB FDR 1x100 GbE
3xFEC
3x40 GbE 3x40 GbE
3xFEC
3x40 GbE 3x40 GbE
9x700G Flash 9x700G Flash
1x10 GbE
Back to Test Item List
Scheduled for: 21.6.-5.7.? Objectives
PCIe3.0) feasible?
NFV Objectives & Comments
Comments
important, even in HPC environments
Back to Test Item List
Loadbalancer / Bundling / Performance / CoS / FCAPS Setup
Back to Test Item List
GPFS Network
17x Server Firewall ∼200 cores
1x10 GbE
3xGPFS Server
7750 SR12
17x Server
17x10 GbE; 7750 SR12E; 2x100 GbE
DATE Cluster 1 DATE Cluster 2
3x40 GbE 17x10 GbE
17x2T Disk 17x2T Disk
7750 SR12; 1x10 GbE
∼2000 cores 3xGPFS Server
3x40 GbE
∼7000 cores
TUD Cluster RZG Cluster
IB FDR10 Network
3xIB FDR10
IB FDR Network
3xIB FDR 1x100 GbE
3xGPFS Server
3x40 GbE
Legend: Router / Switch; Link; Server; Storage; Firewall / Encryption / Compression; GPFS File System
12x700G Flash 12x700G Flash 12x700G Flash; SGI Cluster
1x10 GbE
Back to Test Item List
Scheduled for: 21.6.-5.7.? Objectives
can (data at) TUD be mirrored? (server overload?)
does GPFS still keep up?
Test Item GPFS Network
Comments
Back to Test Item List
Test Item SDN
Comments
* Set up a virtual SDN environment between ZIH & RZG using vSwitch
* Integration of active network elements with OpenFlow support would be desirable (e.g. also Barracuda's SDN gateway)
* Comparison of different available OpenFlow controllers (Beacon, Floodlight, FlowER, OpenDaylight, ...)
* Time frame: 1 month, but this can run in parallel to other investigations
Since presumably not much time will be available, I believe this is already very ambitious. I will have to see how many controllers can be set up and tested, but I would build the environment so that I can still work with it after the 400G showcase.
Back to Test Item List
Test Item RDMA over Ethernet
Comments
Back to Test Item List