FEDERICA – Beauty and the NOC
Tomasz Sroczynski, PSNC
7th TF-NOC meeting, Poznan, 12.12.12
What is FEDERICA? – main idea and purposes
FEDERICA – Federated E-infrastructure Dedicated to European Researchers Innovating in Computing Network Architectures
- FEDERICA allows researchers (NRENs as well as individuals, e.g. PhD students) to have their own testbed for network R&D purposes.
- Future Internet research activities (such as new routing protocols) are given priority.
- Each researcher has their own "slice" of isolated (VLAN, MPLS), virtualised computing and network infrastructure, which allows even disruptive experiments to be performed.
- Research on layers 2-7 of the ISO OSI model is possible.
- FEDERICA has been operational since 2009; it is currently supported as part of GN3 SA1 T3 (GARR as coordinator).
www.fp7-federica.eu
FEDERICA vs Clouds
Is FEDERICA a Cloud? It may be, if TaaS is considered part of Cloud services.

Feature | FEDERICA | IaaS
Complexity | Complex topologies, custom routing, up to 100 VLAN IDs per user | Simple clusters, one VLAN ID, NAT
User's choice of network topology | Yes | No
Special-purpose hardware | High-performance MX LR instances | No (except Amazon: special-purpose hardware for GPU processing)
Computing/storage capabilities | Limited | High
Reproducibility | Yes (e.g. QoS) | No
Purposes | Research on the Internet or networks in general, down to L2 of the ISO OSI model | Usually providing extra storage space in servers/data centers
Topology
Core PoPs – full mesh topology:
- Juniper MX480: dual CPU, 1 line card with 32 GbE ports, virtual and logical routing, VLAN, MPLS, IPv4, IPv6;
- 2x Sun Fire X2200 M2: 2x quad-core AMD @ 2 GHz, 32 GB RAM, 8x 1000/100/10 Ethernet NIC, 2x 500 GB HDD, VMware ESXi 5.0.
Non-core PoPs:
- Juniper EX3200;
- 1x Sun Fire X2200 M2.
Use cases (projects) and their slices
- IIDS – Intelligent Intrusion Detection System
- VMSR – Virtual Multi-Stage Routing
- NOVI – Networking innovations Over Virtualized Infrastructures
- BonFIRE – Building service testbeds for Future Internet Research and Experimentation
- CONFINE IP – Community Networks Testbed for the Future Internet
FEDERICA NOC
Currently (since 2011) the FEDERICA NOC is run entirely by Poznan Supercomputing and Networking Center:
- Monitoring of the FEDERICA infrastructure 24 hours a day, 365 days a year;
- Substrate management – administration, maintenance and provisioning of the FEDERICA resources;
- User support – network slice provisioning, user helpdesk.
Not a typical NOC:
- Virtual infrastructures over the substrate;
- Not only networking maintenance performed.
NOC general routine and procedures
The shape of the FEDERICA NOC is still evolving. However, we have developed a list of routine NOC duties:
- constant substrate monitoring (Nagios, G3, vSphere, SSH),
- constant slice monitoring (Nagios, G3, vSphere),
- regular device health checks (vSphere, SSH),
- regular (or after crucial changes have been made) Juniper equipment configuration backups (SSH),
- regular (or before risky maintenance work) VM backups (vSphere).
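The configuration backup duty above could be scripted along the following lines. This is only a minimal sketch, not the NOC's actual procedure: it assumes the Juniper device accepts the Junos CLI command "show configuration | display set" as a remote SSH command, and the user name "noc", the hostname and the file naming scheme are all hypothetical.

```python
from datetime import datetime, timezone


def backup_command(host: str, user: str = "noc") -> list[str]:
    """Build an ssh argument list that prints a Junos configuration as 'set' commands.

    The user name and host are placeholders (assumptions, not from the slides).
    """
    return ["ssh", f"{user}@{host}", "show configuration | display set"]


def backup_filename(host: str, when: datetime) -> str:
    """Name a backup file after the device and the backup time, converted to UTC."""
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{host}_{stamp}.conf"


if __name__ == "__main__":
    host = "mx480.pop.example.net"  # hypothetical hostname
    print(" ".join(backup_command(host)))
    print(backup_filename(host, datetime.now(timezone.utc)))
```

The argument list can be passed to subprocess.run with its output redirected to the generated file name; keeping one timestamped file per backup makes it easy to compare configurations before and after a crucial change.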
NOC recent activities
In the last few months our activities mostly involved:
- V-Node and Juniper equipment cleanup,
- FEDERICA Wiki update and cleanup,
- major and minor fixes, including restoration of one PoP switch.
One of the main tasks was to migrate the V-Nodes from VMware ESXi 3.5 to version 5.0.
In collaboration with the PSNC NOC:
- PSNC NOC procedures for supporting FEDERICA in substrate monitoring,
- a single mailing list for issue reporting,
- a new TTS group with access for FEDERICA NOC members, including relevant people outside PSNC (from other participating NRENs).
We have also developed a new release of the FEDERICA User FAQ, along with our internal knowledge base for the present and future FEDERICA NOC: information, procedures, JunOS configuration examples, troubleshooting advice.
Our tools
Some tools are obvious choices: Nagios, RT, SSH, eLOMs… And some others: FEDERICA Wiki, vSphere.
Our tools (continued)
G3 (developed and maintained by CESNET) monitors all substrate elements: links, routers, interfaces, virtualization servers and every single VM etc. (over 50,000 entities monitored).
Our tools (continued)
G3 produces graphs covering periods from one hour up to one year. It monitors, for example, memory activity, network data I/O rate, and CPU usage of both hosts and VMs.
Issues met (example 1)
Case: eLOM interfaces for all the servers at PSNC (two V-Nodes and the User Access Server) were unavailable.
Investigation: no pings, "CRITICAL" in Nagios… but all three interfaces went down simultaneously, within one minute!
Diagnosis: the patch cord from the switch between the MX and the V-Nodes had been accidentally unplugged…
This case taught us not to rule out even possibilities that sound silly.
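The clue in this case was timing: several interfaces failing within the same minute points to something they share, such as a cable or a switch. A minimal sketch of that check is shown below. It is an illustration only, not a feature of Nagios or G3; the event format, the host names and the one-minute window are assumptions.

```python
from datetime import datetime, timedelta


def simultaneous_groups(events, window=timedelta(minutes=1)):
    """Group (host, time) down events that start within `window` of a group's first event.

    Only groups with more than one host are returned, since those hint at a
    shared cause (for example a common link) rather than independent failures.
    """
    groups = []
    for host, when in sorted(events, key=lambda e: e[1]):
        if groups and when - groups[-1][0][1] <= window:
            groups[-1].append((host, when))
        else:
            groups.append([(host, when)])
    return [g for g in groups if len(g) > 1]


if __name__ == "__main__":
    t0 = datetime(2012, 1, 1, 12, 0, 0)  # arbitrary example time
    events = [
        ("vnode1-elom", t0),                        # hypothetical host names
        ("vnode2-elom", t0 + timedelta(seconds=20)),
        ("uas-elom", t0 + timedelta(seconds=40)),
        ("other-host", t0 + timedelta(hours=1)),
    ]
    for group in simultaneous_groups(events):
        print("possible shared cause:", [host for host, _ in group])
```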
Issues met (example 2)
Case: several interfaces on the REDIRIS V-Node were reported by G3 as being down.
Investigation: in vSphere the relevant vmnics were up, along with the corresponding EX interfaces, and traffic was fine.
Diagnosis: the MIB had to be updated with the "set interfaces ge-x/y/z unit 0 bandwidth 1g" configuration statement.
In general, the G3 monitoring system provided by CESNET along with Nagios does the job and is a very helpful tool for the NOC.
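When the same fix has to be applied to several interfaces, the statements can be generated rather than typed by hand. The sketch below only builds the text of the configuration statement quoted in the diagnosis; the interface names are hypothetical, and applying the statements to a device is left out.

```python
def bandwidth_statements(interfaces, bandwidth="1g", unit=0):
    """Return one Junos 'set interfaces ... bandwidth' statement per interface name."""
    return [
        f"set interfaces {name} unit {unit} bandwidth {bandwidth}"
        for name in interfaces
    ]


if __name__ == "__main__":
    # Hypothetical interface names, used for illustration only.
    for line in bandwidth_statements(["ge-0/0/1", "ge-0/0/2"]):
        print(line)
```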
Next steps (TODOs)
The FEDERICA NOC is going to carry out work that will help keep things in order and thus make operating on the substrate faster and easier:
- solving any other remaining issues,
- rearrangement of FEDERICA address pool usage,
- ACL update.