Slide: 1
ISDD Friday lecture – Bits, Bytes and certainly more than just Microsoft: an overview of the ESRF computing infrastructure
Slide: 2
Organisational
- Today, many of us are computer experts … or at least computer literate
- Home computing (PCs, Smartphones, tablets, Playstations, smartTVs, etc.)
- Desktop computing (office applications, data analysis, etc.)
- IT (Information Technology) or ICT (Information and Communication Technology) is transforming our lives
- Two Divisions provide professional computing support at ESRF: ISDD and TID
Slide: 3
Organisation chart – Computing Groups/Units: Software, Windows, UNIX, Network, Hotline, Management Information Systems & Web, Data Analysis, Accelerator Control, Beamline Control
Slide: 4
- Monthly Computer Coordination Meetings (CCMs)
- To discuss cross divisional computing matters like standards, support, developments
Participants:
- ISDD: G. Beruyer, JM. Chaize, C. Ferrero, A. Götz
- TID: R. Dimper, B. Lebayle, D. Porte
- AF. Maydew (notes)
- Bi-monthly Computer Security Working Group (CSWG) meetings
- To discuss all matters concerning IT security, define policies, follow up incidents
Participants:
- F. Calvelo-Vazquez, R. Dimper, L. Duparchy, B. Dupré, B. Lebayle, AF. Maydew (notes), C. Rolland
- Many thematic meetings:
- LINUX
- Buffer Storage, etc.
Organisational
“At this point in the meeting we’ll open a discussion of whether or not we needed to have this meeting.”
Slide: 5
This presentation is not about:
- Software design,
- software standards,
- control systems,
- field buses,
- stepper motor controllers,
- programmable logic controllers,
- digital electronics,
- Microsoft Office,
- OpenOffice,
- data analysis software,
- ISDD activities,
- EX2,
- CPER,
- … and a million other interesting things
This presentation is about:
- the computer rooms,
- the network,
- data storage, data management,
- IT support,
- upcoming projects
- …
- computer infrastructure!
Overview
Slide: 6
Outline
Organisational
- Overview
- Network
- Computer rooms
- Keeping our data safe
- Analysing data
- Around the desktop
- It's all virtual
- Databases
- What’s on our plate?
Slide: 7
Today IT (Information Technology) is underpinning everybody’s work
- Many systems/computers are critical for everyday work
- Desktop PC
- Printers
- Network
- Internet
- Databases
- Management Information Systems
- Smartphones
- to ensure functions like:
- Internet browsing
- Text editing
- Order processing
- Data analysis
- Vacation requests
- …
Overview
Slide: 8
Core mission of the ESRF – produce data (and publications!)
The data life cycle
Overview
Step 1: Generation → Step 2: Verification → Step 3: Transfer + Storage → Step 4: Transformation/Analysis → Step 5: Archival → Step 6: Publication → Step 7: Destruction
Slide: 9
Outline
Organisational Overview
- Network
- Computer rooms
- Keeping our data safe
- Analysing data
- Around the desktop
- It's all virtual
- Databases
- What’s on our plate?
Slide: 10
Network
IT is everywhere…the network is everywhere!
ESRF operates with a class B IP address:
- 160.103.a.b, where a = subnet and b = host (see the sketch below)
- Network speed: Mbps or Gbps = Mega- or Gigabits per second
Network backbone based on Extreme Networks switches:
- BlackDiamond 8K switches with multiple 10 Gbps backbone links
- On the beamlines: Extreme Summit X450-48P
- 398 switches, all with 1 Gbps or 10 Gbps ports
- Inter-switch links based on up to 8 x 10 Gbps ports
- Extreme Networks = fast (10 Gbps wire-speed routing, filtering), reliable (dual power, dual management, dual modules), stable
- First 40 Gbps ports ordered
Slide: 11
Inventory
- 280 networks
- 8 627 nodes
- 46 routers
- 398 network switches
- >15 000 x 1Gbps capable copper ports
- >1 000 x 10Gbps fibre ports
- Beamlines with 10Gbps uplinks:
- BM5, BM14, ID14, ID15, ID17Sat1, ID19, ID20, BM23, ID23, ID24, ID29, ID30
- Computers with “private” 10Gbps links:
- hexsalsa (ID15), wid15dimax (ID15), id19sat1 (ID19), lid29io (ID29), id29gate (ID29)
And the network is also:
- Wi-Fi, SSL gateways, firewall, copper cabling, fibre optic cabling, network monitoring, and more
- Network standby for the accelerators and beamlines
Network
Slide: 12
Network
Synopsis
Network synopsis diagram: 80 Gbps / 40 Gbps / 10 Gbps / 1 Gbps / 100 Mbps links plus backup links, connecting standard and high-throughput beamlines, the Control Room Building and Central Building computer rooms, the offices and the Internet.
Slide: 13
ESRF/ILL/EMBL are connected to the Internet via the RENATER network
Slide: 14
Metronet / Tigre fibre optic cabling diagram: two rings, Tigre 1 and Tigre 2 (each active at one end and passive at the other), link the ESRF Central Building, ILL/ILL17 and EMBL, with fibre optic terminations at the site entrance, H2, the restaurant and the roundabout (points A, B, C, D1, D2, E, Z5, ILL8) and routes via the A480 / Campus St Martin d'Hères and Avenue des Martyrs / INPG; site routers and active devices are marked.
Network – Internet cabling
Slide: 15
Firewall diagram: the ESRF, ILL and EMBL LANs on the ESRF/ILL premises each sit behind their own DMZ; a PacketShaper, a firewall+router and a level-2 switch connect them, with BGP routing over the redundant Tigre1/Tigre2 links to Renater Grenoble.
Network – Firewall et al
Slide: 16
Outline
Organisational Overview Network
- Computer rooms
- Keeping our data safe
- Analysing data
- Around the desktop
- It's all virtual
- Databases
- What’s on our plate?
Slide: 17
Computer rooms / data centres
- Two computer rooms – data centres
- CTRM – 150 kW electrical power, 110 m2
- Central Building – 370 kW electrical power, 300 m2
- Why?
- Never put all your eggs in one basket:
- keep a copy of all data in the two rooms (disks + tapes)
- split fault-tolerant systems between the two rooms
- Many technical rooms, at least one in each building – network hubs
Why a new and bigger data centre?
- Insufficient power
- Insufficient cooling
- Insufficient floor space
- Inadequate infrastructure
- Instant provisioning required:
- rack space
- network connections
- power outlets
Slide: 18
Data Centre - Construction
- Built around the existing computing room, with all equipment kept operational during the works
- Reinforced slab and false floor supporting 1000 kg/m2
- Fireproof glass windows
- Noise reduction
Slide: 19
- 10 months of works (excluding preparatory works)
- Dust minimized
- Noise minimized
- Disturbance minimized
- Cooling kept efficient
- Computing equipment kept up and running (even when replacing the racks!)
Data Centre - Construction
Slide: 20
Data Centre - design
- 300 m2
- 370 kW
- 1000 kg/m2
- Cold aisle / hot aisle
- Low density area = 66 racks, 170 kW
- High density area = 10 racks, 200 kW
Slide: 21
Data Centre – behind the scenes
- Dual power supply for all equipment
- Dual UPS in separate rooms
- Aerial cable trays for electricity + network
- Flexible and modular electrical distribution
- Dual cooling system = chilled water + air exchangers
- Smoke extraction system (in case of fire)
- Chilled water circuit for the high density area
- False floor: several fan-equipped tiles
Slide: 22
Cold aisle / Hot aisle principle – section view
Diagram: cold air is injected into the false floor in front of each rack; hot air is extracted above the computers.
Data Centre – cooling
Slide: 23
Cold aisle / Hot aisle principle – aerial view
Data Centre – cooling
Slide: 24
Why a high density area?
A perforated tile is not sufficient to cool a single rack full of powerful servers (20-30 kW/rack); free air flow typically allows for 10-15 kW/rack maximum.
The high density area is more efficient (cold air does not have to be pushed over 20 metres to reach the computers) and more reliable (one of its 6 dedicated AC units can fail without consequence).
Top-view diagram: racks, door and the 6 dedicated AC units.
Slide: 25
Data Centre
Photos: two rows of racks (cold aisle) and “the cube” (closed hot aisle, up to 200 kW)
Slide: 26
Data Centre – to house what?
- Network equipment
- Disk systems
- Tape libraries
- Infrastructure servers
Slide: 27
>160 infrastructure servers
Tag cloud of hosted services: AD, patching, antivirus, printing, LDAP, e-mail antivirus, e-mail antispam, UNIX printing, LINUX repository, sysadmin, netadmin, mailing lists, calendar, MySQL databases, DNS, SSH/NX, MIS, UNIX virtualization, NIS, DHCP, PXE/Rembo, licenses, web proxies, backup, storage, firewall (DNS, ssh), MAIL transfer agents, time, file sharing, wiki, Samba, Web/Plone, ISPyB database, ECAPS, OAR, performance clusters, Paleo database, Radius, performance network, Jira, Graindb database, web filtering, Linux clusters, WiFi service, terminal services – co-administered by the UNIX, WINDOWS and NETWORK groups.
Slide: 28
E-mail infrastructure
Diagram: two MAIL Transfer Agents feed the e-mail antivirus and e-mail antispam filters.
- ESRF spam filtering > 70% (60 000-400 000 e-mails/day for ESRF + ILL)
- Spam filtering > 50%
Slide: 29
Outline
Organisational Overview Network Computer rooms
- Keeping our data safe
- Analysing data
- Around the desktop
- Its all virtual
- Databases
- What’s on our plate?
Slide: 30
What is RAID?
- No, it's not the stuff to kill bugs
- RAID stands for Redundant Array of Independent/Inexpensive Disks
- There are different RAID levels, the most popular being RAID-0, RAID-1 and RAID-5 (see the parity sketch below)
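For intuition, a minimal Python sketch of the XOR parity idea behind RAID-5 (toy byte blocks standing in for disk stripes, not a real implementation):

```python
# RAID-5 in miniature: one XOR parity block per stripe means any single
# lost block can be rebuilt from the surviving blocks.
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

stripe = [b"AAAA", b"BBBB", b"CCCC"]   # data blocks on three disks
parity = xor_blocks(stripe)            # parity block on a fourth disk

# Disk 2 fails: reconstruct its block from the other disks plus parity.
rebuilt = xor_blocks([stripe[0], stripe[2], parity])
assert rebuilt == stripe[1]
```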
Slide: 31
Central disk storage
- All central disk storage is based on NetApp NAS filers
- Capacity:
- 150 TB legacy disk storage system
- 600 TB GX disk storage system
- 685 TB for NICE in total, including 311 TB for /data/visitor
- 120 filesystems, including 106 for beamline data
- 500 TB under commissioning
- Access modes:
- UNIX - NFS
- Windows - CIFS
Slide: 32
Central disk storage
Performance
- Performance – single thread (a measurement sketch follows below)
- Typical 80 MB/s write and 50 MB/s read on legacy and GX
- Typical 400 MB/s read/write requested in CFT 2011
- Typical 500 MB/s write and 200 MB/s read currently obtained
- Performance - overall
- Typical 1 GB/s total bandwidth for legacy and GX
- Typical 4 GB/s total bandwidth requested in CFT 2011
- Typical 1.7 GB/s total bandwidth currently obtained on new system
- New system will be used for /data/visitor exclusively
- The older GX systems will be reconfigured for higher performance (8 TB file systems)
- Next step: tendering in 2012 = 1 to 2 PB, probably using pNFS (parallel NFS, NFS v4.1) for higher performance
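For reference, a minimal sketch for measuring single-stream sequential write throughput, in the spirit of the MB/s figures above (the path is illustrative; on NFS the fsync matters, otherwise client-side caching inflates the result):

```python
# Write 1 GiB in 8 MiB chunks and report MB/s (hypothetical test path).
import os, time

path = "/tmp/throughput_test.bin"    # point at the filer under test
chunk = b"\0" * (8 * 1024 * 1024)
total = 1024**3

start = time.time()
with open(path, "wb") as f:
    for _ in range(total // len(chunk)):
        f.write(chunk)
    f.flush()
    os.fsync(f.fileno())             # force the data out of the page cache
elapsed = time.time() - start

print(f"{total / elapsed / 1e6:.0f} MB/s")
os.remove(path)
```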
Slide: 33
NICE data management policy
- /data/visitor (proposals)
- accounts and data deleted 30 days after the end date
- can be extended for one month on request, once only
- in-house data
- deleted after one year (twice a year, 30/06 and 31/12)
- can be kept indefinitely on request, i.e. user manages disk space
- home directories (10 GB) and e-mail (4 GB)
- kept indefinitely
Slide: 34
What is LTO?
- LTO = Linear Tape Open (successor of the DLT)
- First released in 2000 (HP/IBM), now in its 5th generation
- Serpentine recording/reading, multiple tracks at once
- Coherent and downward-compatible road map
- Same form factor for tapes and tape drives
- Tape cost: ~25 €/tape = ~25 €/TB
- 64 km of tape pass the heads to write an entire LTO-4 tape at 3.2 m/s (a pedestrian walks at 1.4 m/s)
Slide: 35
LTO evolution
LTO generation roadmap (attribute: LTO-1 / LTO-2 / LTO-3 / LTO-4 / LTO-5 / LTO-6 / LTO-7 / LTO-8):
- Release year: 2000 / 2003 / 2005 / 2007 / 2010 / TBA / TBA / TBA
- Native data capacity: 100 GB / 200 GB / 400 GB / 800 GB / 1.5 TB / 3.2 TB / 6.4 TB / 12.8 TB
- Max r/w speed: 20 / 40 / 80 / 120 / 140 / 200 / 315 / 472 MB/s
- Tape thickness (LTO-1 to LTO-5): 8.9 / 8.9 / 8 / 6.6 / 6.4 µm
- Tape length (LTO-1 to LTO-5): 609 / 609 / 680 / 820 / 846 m
- Tracks written per pass (LTO-1 to LTO-5): 8 / 8 / 16 / 16 / 16
- Passes to write an entire tape (LTO-1 to LTO-5): 48 / 64 / 44 / 56 / 80
- Total tracks (LTO-1 to LTO-5): 384 / 512 / 704 / 896 / 1280
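A quick sanity check derived from the table: how long it takes to stream a full tape at the drive's maximum native speed (real backup jobs rarely sustain this):

```python
# Native capacity (GB) / max speed (MB/s), straight from the table above.
capacity_gb = {"LTO-3": 400, "LTO-4": 800, "LTO-5": 1500}
speed_mb_s = {"LTO-3": 80, "LTO-4": 120, "LTO-5": 140}

for gen in capacity_gb:
    hours = capacity_gb[gen] * 1000 / speed_mb_s[gen] / 3600
    print(f"{gen}: {hours:.1f} h to fill at full speed")
# LTO-3: 1.4 h, LTO-4: 1.9 h, LTO-5: 3.0 h
```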
Slide: 36
Backup Overview
- 160 infrastructure servers
- 280 beamline workstations
- 6 Oracle databases
- NICE disk storage
- 2 tape libraries
- 3 file servers
- Time Navigator (TiNa) backup software
- 16 backup servers
Slide: 37
Backup Timeline
Full backups
- Backup of all current data
- Performed typically every month; the interval depends on the data and varies from once a week (databases) to every 3 months (LTPs)
- Done systematically prior to removing a proposal account
Incremental backups (see the sketch below)
- Backup of all new data, and of data modified since the last backup
- Performed daily (typically during the night)
Retention time
- Data kept for 6 months after it has been backed up
- Afterwards the backup media are re-used for new backups
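A minimal sketch of the incremental rule described above: pick up every file created or modified since the previous run. TiNa keeps its own catalogue; here a simple mtime comparison stands in for it (all names are illustrative):

```python
# Walk a tree and yield files changed since the last backup timestamp.
import os

def files_to_back_up(root, last_backup_time):
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > last_backup_time:
                yield path

# Example: everything under /data changed in the last 24 hours.
# import time; print(list(files_to_back_up("/data", time.time() - 86400)))
```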
Slide: 38
Tape Backup Libraries
2 STK SL8500 tape libraries:
- Capacity of 8,500 tapes each (37% used)
- 8 redundant robots (handbots) in each
- 63 LTO-3 and LTO-4 tape drives in total across both libraries
Data protection:
- 1 tape library in each computer room
- Data stored in one room is backed up in the other
- Some critical data is duplicated in both rooms
Slide: 39
Tape Backup Media
Tape usage by category: Visitors 46%, In-house 35%, Infrastructure 9%, Archiving 8%, Misc 2%
Over 6,300 tapes (LTO-3 + LTO-4) and 3.7 PB of data:
- 3.1 PB used by NICE, of which 1.7 PB for /data/visitor
- 325 TB used by infrastructure servers and databases
- 312 TB used by Data Archiving
Slide: 40
Further Backup Activities
- Beamline Backup
- Automated handling (installation & monitoring) of backup clients
- Data available to users for restoration around the clock
- Low-latency disk-based storage for fast backups & restores
- 21 TB total backup data (0.5% of the tape backup!)
- Data Archiving
- Currently 156 TB stored forever (2 x 200 tapes)
- Data duplicated on 2 sets of tapes in 2 libraries in 2 buildings
- Data will be migrated to newer tape technologies when needed
- Data Externalization
- Selected data (8 TB) stored in a safe place every 2 months
Slide: 41
Outline
Organisational Overview Network Computer rooms Keeping our data safe
- Analysing data
- Around the desktop
- It's all virtual
- Databases
- What’s on our plate?
Slide: 42
Intel Nehalem processor: 731 000 000 transistors
Slide: 43
Multicore architecture
- CPUs (Central Processing Units) reached a frequency/heat limit in 2003
- This was the end of sequential computing
- Since then, processors have more and more “cores”, i.e. independent processing units
- This triggered a software revolution, starting with games
- Multi-core architectures are now commonplace (see the sketch after this list)
- This is pushed to the extreme in GPUs (Graphics Processing Units)
- Nvidia Fermi processor = 512 cores, 1.2 GHz, 3 billion transistors!
- Low power consumption, i.e. many cores at low frequency
- A new challenge: how to get the data quickly in and out of the processors
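A minimal sketch of what the multicore shift means for software: the same CPU-bound work split across worker processes instead of running sequentially (chunk sizes are arbitrary):

```python
# Eight independent CPU-bound tasks spread over the available cores.
from multiprocessing import Pool

def busy(n):
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    chunks = [2_000_000] * 8
    with Pool() as pool:                  # one worker per core by default
        results = pool.map(busy, chunks)  # chunks execute in parallel
    print(sum(results))
```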
Slide: 44
Compute clusters
- OAR job scheduler (accessible via rnice, resource reservation, interactive / batch)
- Linux compute clusters
- NICE grid: 507 cores on 89 nodes
- Dedicated: 446 cores on 62 nodes (bliss, cronus, mx, violet)
- Many ageing HP and SUN pizza boxes, 1 IBM blade cluster (14 blades)
- 3 BullX clusters with:
- CPU blades – up to 96 GB RAM, 2 Intel processors
- GPU blades – up to 48 GB RAM, 2 Intel processors, 2 Fermi GPUs
- Optional InfiniBand
- Up to 18 CPU blades or 9 GPU blades per chassis
- Scientific software: Matlab, Mathematica, Octave, IDL, Python, etc.
MX group blade cluster (a flops sanity check follows below):
- 18 CPU blades
- 36 six-core Intel 3 GHz processors (8 GB RAM per core)
- 216 cores in total
- 1 728 GB RAM in total
- 2.6 Tflops
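The 2.6 Tflops figure checks out if one assumes 4 double-precision flops per core per cycle (one SSE add plus one SSE multiply on 2-wide vectors, typical for Intel cores of that generation; the factor 4 is our assumption, not stated on the slide):

```python
# 216 cores x 3 GHz x 4 flops/cycle (assumed SIMD width) ~ 2.6 Tflops.
cores, clock_hz, flops_per_cycle = 216, 3e9, 4
print(cores * clock_hz * flops_per_cycle / 1e12, "Tflops")  # 2.592
```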
Slide: 45
LinkSCEEM2 – GPU codes
- LinkSCEEM-2 – Linking Scientific Computing in Europe and the Eastern Mediterranean
- Porting SR data analysis code to GPUs (D. Karkoulis)
- SHADOW ray-tracing code optimised
- Comparison of CUDA and OpenCL
Slide: 46
Outline
Organisational Overview Network Computer rooms Keeping our data safe Analysing data
- Around the desktop
- It's all virtual
- Databases
- What’s on our plate?
Slide: 47
Around the desktop
- Windows office system
- store procurement,
- definition of standards,
- installation,
- printing,
- anti-virus,
- patching,
- file sharing,
- multimedia,
- loan pools,
- user support
- Hotline (Jira) → 20 calls per day on average
Slide: 48
PC procurement
- All PCs and laptops are from DELL
- Standard configurations for Windows PCs and Laptops in the Stores
Over 5 years: 810 PCs, 456 laptops
Slide: 49
The brand new DELL keyboard (Windows 8 compatible) – now available in the Stores!
Slide: 50
Outline
Organisational Overview Network Computer rooms Keeping our data safe Analysing data Around the desktop
- It's all virtual
- Databases
- What’s on our plate?
Slide: 51
Virtualisation
- Virtualisation makes it possible to optimise the use of server computers
- Several operating system instances run on a single physical server
- Operating system instances are independent, i.e.
- they are managed independently, like separate computers
- problems do not propagate to other instances
- Allows old UNIX releases to be kept, i.e. ideal for software development platforms
- Allows hardware usage to be optimised
- We use XEN and KVM; KVM will be our standard platform (see the sketch below)
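As an illustration of managing instances independently, a minimal sketch that lists the guests on a KVM host, assuming the libvirt Python bindings (the libvirt-python package) are installed:

```python
# List all domains (VMs) known to the local KVM hypervisor and their state.
import libvirt

conn = libvirt.openReadOnly("qemu:///system")
for dom in conn.listAllDomains():
    state = "running" if dom.isActive() else "shut off"
    print(f"{dom.name()}: {state}")
conn.close()
```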
Slide: 52
Cloud computing
- What is “cloud computing”?
- The next big thing in IT after the Grid hype
- A metaphor for the delivery of computing requirements as a service
- Sharing of resources for economies of scale
- Access through a web browser or a lightweight application
- Used by companies to meet unpredictable business needs (flexibility)
- Infrastructure as a Service (IaaS)
- Software as a Service (SaaS)
- Commonly known examples:
- Dropbox
- Picasa
- Google docs
- iCloud
- CERN, EMBL and ESA are currently investigating cloud computing within EIROforum
Slide: 53
Outline
Organisational Overview Network Computer rooms Keeping our data safe Analysing data Around the desktop It's all virtual
- Databases
- What’s on our plate?
Slide: 54
MIS infrastructure
Slide: 55
MIS applications
Applications
- E-business Suite
- Alfresco
- Orchestra
- SLX
- Sagere
- Ever
- Trésorerie
- Pleiades
- QlikView
- E-recruitment
- SALTO
- Cyberplus Paiement
- Business Objects
- ORACLE ERP
- SMIS
- Safety trainings
- ISPyB
- TomoDB
- TBS Pools
- Phone Directory
- Site entrance
- Gas tracking
- Store withdrawal
- Magellan
- Paperless PO
- BAT
- Budget expenditure
- Resource booking
- Allshare
- …
Support
- Web - Plone
- PC support
- Server support
- Backup
Slide: 56
Outline
Organisational Overview Network Computer rooms Keeping our data safe Analysing data Around the desktop It's all virtual Databases
- What’s on our plate?
Slide: 57
Key issues
- Extending our disk capacity
- Find the right balance between price, performance, reliability, ease of operation
- Replace ageing data analysis clusters
- Upgrade the ORACLE ERP system
- Upgrade the CMS (Content Management System) of our Web site
- Replace the RICOH photocopiers/printers
- Upgrade or replace PLEIADES
- Work on the Peer Review Process and the new BTAPs
- In the frame of the CRISP and PaN-data projects, and together with ISDD and EXPD:
- Work on the beamline local buffer solution
- Work on Identity Management, Authorisation
- Work on metadata capture, data preservation, data continuum
- Further discuss the data policy at ESRF
- Continue observing the EIROforum Cloud initiative
- Try to do all this despite a very difficult budgetary context
Slide: 58
CRISP WP 18 – The data challenge
- CRISP – WP18 ISDD + TID
- Typically 10 beamlines, each with 3 x 16 Mpixel detectors producing at 100-200 MB/s with sustained peak performance for minutes to hours → 21.7 TB/hour maximum (re-derived in the sketch below)
- Not all detectors operate simultaneously → 1 TB/hour
- Because of the cycle time of experiments → 100 GB/hour
- 2015 figure = 10 times more → 1 TB/hour = 24 TB/day
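Re-deriving the headline figure with the slide's own numbers (using the 200 MB/s upper bound):

```python
# 10 beamlines x 3 detectors x 200 MB/s, sustained for one hour.
beamlines, detectors, rate_mb_s = 10, 3, 200
tb_per_hour = beamlines * detectors * rate_mb_s * 3600 / 1e6
print(f"{tb_per_hour:.1f} TB/hour")  # 21.6, within rounding of the 21.7 above
```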
- Local buffer storage on the Beamlines to:
- Guarantee the data rate from the detector
- Allow for fast online data analysis
- Provide a buffer for 2 days of data production
- Allow automatic export of data
“Now that we can tell time, I’d like to suggest that we begin imposing deadlines.”
Slide: 59
CRISP WP 18 – Local buffer storage
- ESRF requirements
- Very fast write while reading for on-line data analysis
- Complementary to central disk storage
- 3 fast CCD detectors/experiment
- Peak write/read 300 MB/s now and 3 GB/s in 3 years
- Average (sustained) write/read 1/10th of the above
- Local buffer for 2 days (weekend), i.e. ~10 TB/beamline (a sizing sketch follows below)
- NFS v3/v4 and CIFS
- List 10,000 files in < 3 s
- Multiple 10 Gbps network attachments
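A rough check of the ~10 TB figure from the numbers above (the 2x headroom reading is our interpretation, not stated on the slide):

```python
# Two days at the sustained rate (1/10th of today's 300 MB/s peak).
sustained_mb_s = 300 / 10
tb = sustained_mb_s * 2 * 86400 / 1e6
print(f"{tb:.1f} TB")  # ~5.2 TB; the stated ~10 TB leaves ~2x headroom
```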
Slide: 60
- We are going to investigate, and possibly prototype, the following:
- High performance RAID hardware, SSDs
- RAM disks
- Parallel NFS
- Double buffering
- Linux kernel I/O scheduling
- Very recent LINUX kernels
- The challenge: finding the right balance between performance and ease of maintenance.
CRISP WP 18 – Local buffer storage
Slide: 61
Thank you for your attention!