FEDERICA Beauty and the NOC Tomasz Sroczynski, PSNC 7th TF-NOC - - PowerPoint PPT Presentation



SLIDE 1
SLIDE 2

FEDERICA – Beauty and the NOC

Tomasz Sroczynski, PSNC 7th TF-NOC meeting, Poznan, 12.12.12

SLIDE 3

What is FEDERICA? – main idea and purposes

FEDERICA – Federated E-infrastructure Dedicated to European Researchers Innovating in Computing Network Architectures.

  • FEDERICA allows researchers (NRENs as well as individuals, e.g. PhD students) to have their own testbed for network R&D.
  • Future Internet research activities (e.g. new routing protocols) are given priority.
  • Each researcher gets their own "slice" of isolated (VLAN, MPLS), virtualised computing and network infrastructure, which makes it possible to perform even disruptive experiments.
  • Research on layers 2 to 7 of the ISO OSI model is possible.
  • FEDERICA has been operational since 2009. It is currently supported as part of GN3 SA1 T3 (GARR as coordinator).

www.fp7-federica.eu

SLIDE 4

FEDERICA vs Clouds

Is FEDERICA a Cloud? It may be, if TaaS is considered part of Cloud services. FEDERICA compared with typical IaaS offerings:

  • Complexity. FEDERICA: complex topologies, custom routing, up to 100 VLAN IDs per user. IaaS: simple clusters, one VLAN ID, NAT.
  • Network topology of the user's choice. FEDERICA: yes. IaaS: no.
  • Special-purpose hardware. FEDERICA: high-performance MX LR instances. IaaS: no (except Amazon, which offers special-purpose hardware for GPU processing).
  • Computing/storage capabilities. FEDERICA: limited. IaaS: high.
  • Reproducibility. FEDERICA: yes (e.g. QoS). IaaS: no.
  • Purposes. FEDERICA: research on the Internet or networks in general, down to L2 of the ISO OSI model. IaaS: usually providing extra storage space in servers/data centers.
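The per-user VLAN budget in the comparison above can be illustrated with a small allocator. This is a hypothetical sketch, not FEDERICA's provisioning tool. Only the limit of 100 VLAN IDs per user comes from the slide. The pool range (1000 to 1999) and the names `VlanPool`, `allocate` and `release` are assumptions made for illustration.

```python
# Only the 100-VLAN limit comes from the slide; the pool range is hypothetical.
MAX_VLANS_PER_USER = 100


class VlanPool:
    """Hypothetical allocator handing out VLAN IDs to user slices."""

    def __init__(self, first: int = 1000, last: int = 1999) -> None:
        # Valid 802.1Q VLAN IDs are 1..4094; the default range is an assumption.
        if not 1 <= first <= last <= 4094:
            raise ValueError("VLAN IDs must lie within 1..4094")
        self.free = list(range(first, last + 1))
        self.by_user = {}  # user -> list of VLAN IDs currently held

    def allocate(self, user: str, count: int) -> list[int]:
        owned = self.by_user.setdefault(user, [])
        if len(owned) + count > MAX_VLANS_PER_USER:
            raise ValueError(f"{user} would exceed {MAX_VLANS_PER_USER} VLAN IDs")
        if count > len(self.free):
            raise ValueError("VLAN pool exhausted")
        # Hand out the lowest free IDs first.
        ids, self.free = self.free[:count], self.free[count:]
        owned.extend(ids)
        return ids

    def release(self, user: str) -> None:
        # Return all of the user's VLAN IDs to the pool, keeping it sorted.
        self.free = sorted(self.free + self.by_user.pop(user, []))
```

For example, `VlanPool().allocate("slice-a", 3)` returns `[1000, 1001, 1002]`. A further request that would push `slice-a` past 100 IDs raises `ValueError` instead of silently over-allocating.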

SLIDE 5

Topology

Core PoPs (full mesh topology):

  • Juniper MX480: dual CPU, 1 line card with 32 GbE ports, virtual and logical routing, VLAN, MPLS, IPv4, IPv6;
  • 2x Sun Fire X2200 M2: 2x quad-core AMD CPU @ 2 GHz, 32 GB RAM, 8x 1000/100/10 Ethernet NICs, 2x 500 GB HDD, VMware ESXi 5.0.

Non-Core PoPs:

  • Juniper EX3200;
  • 1x Sun Fire X2200 M2.
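A full mesh needs one link for every pair of core PoPs, so n core PoPs need n(n-1)/2 links. The sketch below enumerates those links. The PoP names are hypothetical placeholders, since this slide does not list the core PoPs.

```python
from itertools import combinations


def full_mesh_links(pops: list[str]) -> list[tuple[str, str]]:
    """Return every unordered PoP pair; a full mesh needs one link per pair."""
    return list(combinations(pops, 2))


def link_count(n: int) -> int:
    # n fully meshed PoPs need n * (n - 1) / 2 point-to-point links.
    return n * (n - 1) // 2
```

For example, `full_mesh_links(["PoP-A", "PoP-B", "PoP-C"])` yields three pairs, `link_count(4)` is 6 and `link_count(8)` is 28. This quadratic growth is one general reason why full meshes are usually limited to a small core.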
SLIDE 6

Use cases (projects) and their slices

  • IIDS: Intelligent Intrusion Detection System
  • VMSR: Virtual Multi-Stage Routing
  • NOVI: Networking innovations Over Virtualized Infrastructures
  • BonFIRE: Building service testbeds for Future Internet Research and Experimentation
  • CONFINE IP: Community Networks Testbed for the Future Internet

SLIDE 7

FEDERICA NOC

Currently (since 2011) the FEDERICA NOC is run entirely by the Poznan Supercomputing and Networking Center:

  • Monitoring of the FEDERICA infrastructure 24 hours a day, 365 days a year;
  • Substrate management: administration, maintenance and provisioning of the FEDERICA resources;
  • User support: network slice provisioning, user helpdesk.

Not a typical NOC:

  • Virtual infrastructures over the substrate;
  • Not only network maintenance is performed.

SLIDE 8

NOC general routine and procedures

The shape of the FEDERICA NOC is still evolving. However, we have developed a list of routine NOC duties, which consists of:

  • constant substrate monitoring (Nagios, G3, vSphere, SSH),
  • constant slice monitoring (Nagios, G3, vSphere),
  • regular device health status checks (vSphere, SSH),
  • regular (or after crucial changes have been made) Juniper equipment configuration backups (SSH),
  • regular (or before risky maintenance work) VM backups (vSphere).
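The "regular, or after crucial changes" rule for Juniper configuration backups can be sketched as follows. This is a hypothetical helper, not the NOC's actual script. The host and user names, the backup directory, the file-naming scheme and the schedule interval are all assumptions. `show configuration | display set` is a standard JunOS CLI command. `fetch_config` assumes key-based SSH access to the router.

```python
import hashlib
import subprocess
from datetime import datetime, timedelta
from pathlib import Path

# JunOS CLI command that prints the configuration as "set" statements.
SHOW_CONFIG = "show configuration | display set"


def ssh_command(host: str, user: str) -> list[str]:
    """Build the ssh argv; host and user names are hypothetical placeholders."""
    return ["ssh", f"{user}@{host}", SHOW_CONFIG]


def backup_path(backup_dir: Path, host: str, when: datetime) -> Path:
    # One timestamped file per device and run (naming scheme is an assumption).
    return backup_dir / f"{host}-{when:%Y%m%d-%H%M}.conf"


def backup_due(last: datetime, now: datetime, interval: timedelta,
               last_hash: str, config_text: str) -> bool:
    """Back up on the regular schedule, or at once if the config changed."""
    changed = hashlib.sha256(config_text.encode()).hexdigest() != last_hash
    return changed or now - last >= interval


def fetch_config(host: str, user: str) -> str:
    # Runs ssh without a local shell, so the "|" reaches the JunOS CLI intact.
    result = subprocess.run(ssh_command(host, user), capture_output=True,
                            text=True, check=True, timeout=60)
    return result.stdout
```

In use, the NOC would fetch the current configuration, call `backup_due` with the hash of the last saved copy, and write the text to `backup_path(...)` only when a backup is due. Comparing hashes is what catches the "after crucial changes" case between scheduled runs.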
SLIDE 9

NOC recent activities

In the last few months, our activities have mostly involved:

  • V-Node and Juniper equipment cleanup,
  • FEDERICA Wiki update and cleanup,
  • major and minor fixes, including the restoration of one PoP switch.

One of the main tasks was migrating the V-Nodes from VMware ESXi 3.5 to version 5.0. In collaboration with the PSNC NOC, we set up:

  • PSNC NOC procedures for supporting FEDERICA in substrate monitoring,
  • a single mailing list for issue reporting,
  • a new TTS group accessible to FEDERICA NOC members, including relevant people outside PSNC (from other participating NRENs).

We have also developed a new release of the FEDERICA User FAQ, along with an internal knowledge base for the present and future FEDERICA NOC: information, procedures, JunOS configuration examples and troubleshooting advice.

SLIDE 10

Our tools

Some of our tools are the obvious choices: Nagios, RT, SSH, eLOMs… Others are less obvious: the FEDERICA Wiki and vSphere.

SLIDE 11

Our tools (continued)

G3 (developed and maintained by CESNET) monitors all substrate elements: links, routers, interfaces, virtualization servers, every single VM, etc. (over 50,000 monitored entities).

SLIDE 12

Our tools (continued)

G3 produces graphs covering periods from one hour up to one year. It monitors, for example, memory activity, network I/O rate and CPU usage of both hosts and VMs.
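This slide does not describe how G3 stores its history. A common technique in monitoring tools, used for example by RRDtool's AVERAGE consolidation, is to average fine-grained samples into coarser buckets so that a one-year graph stays small. The sketch below illustrates that generic technique under this assumption. It is not G3's implementation.

```python
from statistics import mean


def consolidate(samples: list[tuple[int, float]], step: int) -> list[tuple[int, float]]:
    """Average (unix_timestamp, value) samples into buckets of `step` seconds."""
    buckets: dict[int, list[float]] = {}
    for ts, value in samples:
        # Align each sample to the start of its bucket.
        buckets.setdefault(ts - ts % step, []).append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]
```

With 1-minute samples, an hour graph needs about 60 points. Consolidating the same series with `step=86400` gives one average per day, or roughly 365 points for a year graph.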

SLIDE 13

Issues met (example 1)

Case: the eLOM interfaces of all the servers in PSNC (two V-Nodes and the User Access Server) were unavailable.
Investigation: no pings, "CRITICAL" in Nagios… but all three interfaces went down simultaneously, within one minute!
Diagnosis: the patch cord from the switch between the MX and the V-Nodes had been accidentally unplugged…
This case taught us not to dismiss any possibility, even one that sounds silly.
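The clue in this case was timing: several hosts failing within the same minute points to a shared upstream element rather than three independent faults. The sketch below flags such groups from a list of outage start times. The `Alert` record, the host names and the 60-second window are hypothetical. This is not a Nagios feature or API.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    host: str
    went_down: int  # Unix timestamp when the host entered the CRITICAL state


def correlated_outages(alerts: list[Alert], window: int = 60) -> list[list[str]]:
    """Group hosts whose outages began within `window` seconds of a group's first outage."""
    groups: list[list[Alert]] = []
    for alert in sorted(alerts, key=lambda a: a.went_down):
        if groups and alert.went_down - groups[-1][0].went_down <= window:
            groups[-1].append(alert)
        else:
            groups.append([alert])
    # Only groups of two or more hint at a common cause (switch, patch cord, ...).
    return [[a.host for a in group] for group in groups if len(group) > 1]
```

Fed the three eLOM outages from this case plus an unrelated later one, the function would return only the three eLOM hosts as a single group. That group is a hint to check what they share before debugging each server separately.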

SLIDE 14

Issues met (example 2)

Case: several interfaces on the REDIRIS V-Node were indicated by G3 as being down.
Investigation: in vSphere, the relevant vmnics were up, as were the corresponding EX interfaces, and traffic was fine.
Diagnosis: the MIB had to be updated with the "set interfaces ge-x/y/z unit 0 bandwidth 1g" configuration statement.
In general, the G3 monitoring system, provided along with Nagios by CESNET, does the job and is indeed a very helpful tool for the NOC.
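When several interfaces need the same fix, the statements can be generated rather than typed by hand. This is a hypothetical helper. Only the statement pattern comes from the slide. The interface names in the usage example are placeholders, and the name check covers only `ge-`/`xe-` names of the form type-FPC/PIC/port.

```python
import re

# Matches names such as ge-0/0/1 or xe-1/2/3 (type-FPC/PIC/port).
IFNAME = re.compile(r"(ge|xe)-\d+/\d+/\d+")


def bandwidth_statements(interfaces: list[str], unit: int = 0,
                         bandwidth: str = "1g") -> list[str]:
    """Render one JunOS 'set' statement per interface, following the slide's pattern."""
    statements = []
    for ifname in interfaces:
        if not IFNAME.fullmatch(ifname):
            raise ValueError(f"unexpected interface name: {ifname}")
        statements.append(f"set interfaces {ifname} unit {unit} bandwidth {bandwidth}")
    return statements
```

For example, `bandwidth_statements(["ge-0/0/1", "ge-0/0/2"])` yields two `set` lines. These can be pasted into configuration mode on the switch and applied with `commit`.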

SLIDE 15

Next steps (TODOs)

The FEDERICA NOC plans the following operations, which will help keep things in order and make operating the substrate faster and easier:

  • resolving any remaining issues,
  • rearrangement of FEDERICA address pool usage,
  • ACL updates.
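The slides do not say how the address pool will be rearranged. One simple scheme is to carve the pool into equal, non-overlapping subnets and assign them to slices in order. The sketch below does this with Python's standard `ipaddress` module. The pool `192.0.2.0/24` (an RFC 5737 documentation range), the /28 slice size and the slice names are hypothetical placeholders.

```python
import ipaddress
from typing import Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def carve(pool: str, prefix: int, slices: list[str]) -> dict[str, Network]:
    """Assign consecutive, non-overlapping subnets of `pool` to slices in order."""
    subnets = ipaddress.ip_network(pool).subnets(new_prefix=prefix)
    plan = {}
    for name in slices:
        try:
            plan[name] = next(subnets)
        except StopIteration:
            raise ValueError(f"{pool} has no room for slice {name}") from None
    return plan
```

For example, `carve("192.0.2.0/24", 28, ["slice-a", "slice-b"])` assigns `192.0.2.0/28` and `192.0.2.16/28`. Fixed-size blocks keep the plan easy to audit and make it easier to express per-slice ACLs.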
SLIDE 16

Q&A

Thank you for your attention!