SURFsara NOC Flash talk, Erik Ruiter, Sr. Network Specialist


slide-1
SLIDE 1

Erik Ruiter, Sr. Network Specialist, SURFsara

SURFsara NOC Flash talk

TF-NOC Meeting Cambridge UK, 20-3-2014

slide-2
SLIDE 2

TF-NOC meeting Cambridge 2014 – NOC Flash Talk

Services

  • National supercomputer: Cartesius (capability computing)
  • National compute cluster: Lisa (capacity computing)
  • Grid compute & storage: Gina (middleware services)
  • HPC Cloud: IaaS (do-it-yourself)
  • Hadoop: data processing (map-reduce algorithm)
  • GPU cluster (computing on a video card)
  • Collaboratorium: remote collaboration (video wall)
  • Render cluster (data visualization)
  • Beehub / SURFDrive (Dropbox unlimited)

slide-3
SLIDE 3

Core Network: High level overview

  • Fully redundant topology
  • Core routers: 2x Juniper MX960
  • Internal firewall cluster: 2x Fortigate 311b + 5x Cisco 3750
  • External firewall cluster: 2x Fortigate 3040 + 2x Juniper EX4550

slide-4
SLIDE 4

Core Network: E-infra compute and storage network

SURFsara E-infra Network infrastructure

  • Connects the HPC environments in SURFsara
  • High capacity (160 Gbps between QNodes)
  • Used for East-West traffic
  • Low latency
  • High scalability (up to 786 x 10 Gbps)
  • Easy scaling
  • Single CLI management
  • Based on Juniper QFabric

slide-5
SLIDE 5

NOC Tools: Monitoring

Monitoring:

  • Icinga
      • Currently monitoring 149 hosts with 379 services
      • NConf is used for configuration (no manual text editing)
  • Cacti
  • Syslog daemon
      • Syslog-ng -> looking for a better solution (Logstash)

slide-6
SLIDE 6

NOC Tools: NfSen

NetFlow is enabled on our 2 core routers.

NfSen is used for:

  • Intrusion detection
      • Using alert triggers when suspicious traffic patterns are detected (only a few rules in place at the moment)
  • Traffic monitoring for our main uplinks:
      • SURFnet (10 Gbps)
      • LHCOPN (10 Gbps)
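
As a rough illustration of what an alert rule for suspicious traffic might look for, the sketch below flags source addresses that touch an unusually large number of distinct destination/port pairs, a common port-scan signature. It is a standalone Python example and not NfSen's own alerting mechanism; the `Flow` record, the `suspected_scanners` function and the default threshold of 100 are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple


class Flow(NamedTuple):
    """Hypothetical, simplified flow record (real NetFlow has many more fields)."""
    src: str
    dst: str
    dst_port: int


def suspected_scanners(flows: Iterable[Flow], max_targets: int = 100) -> List[str]:
    """Return sources that contacted more than max_targets distinct (dst, port) pairs."""
    targets_by_src = defaultdict(set)
    for flow in flows:
        targets_by_src[flow.src].add((flow.dst, flow.dst_port))
    return sorted(
        src for src, targets in targets_by_src.items() if len(targets) > max_targets
    )
```

Counting distinct targets rather than raw flows keeps a busy but legitimate client (many flows to one service) from tripping the rule.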

slide-7
SLIDE 7

NOC Tools: Management

Network management:

  • All elements reachable via SSH over the management infrastructure
  • All elements reachable through console port (on centralized console servers)
  • All elements authenticated by TACACS+
  • All elements have SNMPv2 read-only access (limited to a single IP address)

Configuration management:

  • Rancid + SVNWEB
  • Small in-house developed web interface to easily find configs
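
The core idea behind keeping device configurations in version control is that every change shows up as a diff between two snapshots. A minimal sketch of that idea in Python, using the standard `difflib` module; this is not how RANCID itself is implemented, and the function name `config_diff` and the default file name are hypothetical.

```python
import difflib


def config_diff(old: str, new: str, name: str = "router.conf") -> str:
    """Return a unified diff between two configuration snapshots.

    An empty string means the two snapshots are identical.
    """
    lines = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile=f"{name} (previous)",
        tofile=f"{name} (current)",
        lineterm="",
    )
    return "\n".join(lines)
```

Run against yesterday's and today's snapshot of a device, a non-empty result is what would be committed and reviewed.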

slide-8
SLIDE 8

NOC Ticketing system

Ticketing system:

  • Trac
      • Combined issue tracking and wiki system
      • Used for software development projects; can interface with SVN, Git, etc.
      • All departments within SURFsara use Trac, each with its own Trac wiki and ticket queue
      • Network access requests have a separate Trac queue: a request is first validated by the Security team, then assigned to the NOC team

slide-9
SLIDE 9

NOC Documentation

Documentation

  • MediaWiki / Trac
      • Stores all project and operational information
      • Currently looking into moving all documentation from MediaWiki to Trac
      • Exporting is difficult (different markup language; Trac supports fewer functions)
  • Racktables
      • Rack space
      • VLANs
      • IP space
      • Looking into storing cabling information
  • Racktables custom added features:
      • Daily script that does reverse DNS lookups to determine IP subnet occupation
      • Daily script that reads IP information from routers (SNMP) to document ‘routed by’ information
      • Racktables API is used to create a lightweight IP / VLAN overview
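
The reverse DNS occupation check mentioned above could look roughly like the following Python sketch. It is not the actual SURFsara script: the function names and the returned dictionary layout are assumptions, and the step that writes the result back into Racktables is left out. The resolver is passed in as a parameter so the logic can be exercised without touching real DNS.

```python
import ipaddress
import socket
from typing import Callable, Dict, Optional


def reverse_lookup(ip: str) -> Optional[str]:
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None


def subnet_occupation(
    cidr: str,
    resolver: Callable[[str], Optional[str]] = reverse_lookup,
) -> Dict[str, object]:
    """Count the host addresses in cidr that have a reverse DNS entry."""
    network = ipaddress.ip_network(cidr)
    names = {}
    total = 0
    for address in network.hosts():
        total += 1
        hostname = resolver(str(address))
        if hostname is not None:
            names[str(address)] = hostname
    return {"subnet": str(network), "total": total, "used": len(names), "names": names}
```

Note that this only measures addresses with a PTR record, so hosts without reverse DNS would be counted as free.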

slide-10
SLIDE 10

NOC Structure

  • Small team
      • 5 network engineers
      • 1 team leader
  • All engineers work on support, implementation and innovation projects
  • Rotating NOC duty days (once per week)
      • Answering mail
      • Small operational requests
      • Handling incidents
  • Support during working hours only

slide-11
SLIDE 11

NOC Frontend / Communication

  • Customers
      • Mainly internal -> system administrators of HPC systems
      • No real SLAs; however, providing redundant connectivity is getting more attention
  • Internal communications are mainly done through email and Trac tickets
  • For external communications we have a mailing list: noc@surfsara.nl
  • There is no helpdesk phone number

slide-12
SLIDE 12

www.surfsara.nl Erik Ruiter Erik.Ruiter@surfsara.nl