FJPPL Computing Workshop: Operational experience with the second machine room at CC-IN2P3


SLIDE 1

Centre de Calcul de l’Institut National de Physique Nucléaire et de Physique des Particules

FJPPL Computing Workshop Operational experience with second machine room at CC-IN2P3

Xavier Canehan

SLIDE 2

Introduction

 2 computing rooms at CC-IN2P3 since 2011
 Critical choices made at conception led to substantial advantages
 Adaptability remains mandatory
 Monitoring and testing, even of the building itself
 Drawbacks

FJPPL Computing Workshop – Operational Experience – Xavier Canehan 10/03/2015


SLIDE 3

Centre de Calcul de l’Institut National de Physique Nucléaire et de Physique des Particules

2 computing rooms

Floor space and high power consumption


SLIDE 4


Vil-2 initial objectives

 10+ year perspective
 Modularity
 Multi-tier architecture
 Ease of deployment
 Modernity: hot water cooling

SLIDE 5


High quality results of the initial conception

 Modern computing room (details shown during visit)
 Multi-tier by design
 3-phase deployment
  • first phase dedicated to the computing farm
  • relying upon the regular budget

Initial plan for 2011-2019, targets:

Year  Racks  IT power  Tier
2011  50     0.6 MW    Tier II
2015  125    1.5 MW    Tier III
2019  240    3.6 MW    Tier III-IV

SLIDE 6


First phase: Tier II over 2 lines

 28 InRow cooling units, 18 to 20 kW each
 One 2 MVA UPS chain of 4 × 500 kVA UPS
 2 transformers of 1600 kVA
 3 chilling units for 2.4 MW, only one distribution circuit; backup through a 24 m³ water tank
 2 power lines: dedicated main up to 9 MW, 2 MW reservation on the backup line
 1/3 of floor space used at each level
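As a rough sanity check on these electrical figures, a small Python sketch: the 4 × 500 kVA chain rating and the phased IT targets come from the slides, while the 0.9 power factor used to convert kVA to kW is an assumed typical value, not stated in the deck.

```python
# Convert the UPS chain's apparent power (kVA) to usable real power (kW)
# and compare it with the phased IT targets from the deployment plan.
POWER_FACTOR = 0.9  # assumed typical value, not stated on the slides

ups_kva = 4 * 500   # one UPS chain: 4 x 500 kVA
ups_kw = ups_kva * POWER_FACTOR

targets_kw = {2011: 600, 2015: 1500, 2019: 3600}  # from the initial plan
for year, target in targets_kw.items():
    print(f"{year}: {target} kW target, within one {ups_kw:.0f} kW chain: "
          f"{target <= ups_kw}")
```

At this power factor a single chain covers the first two phases; the 2019 target would need additional chains.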

SLIDE 7


Site mean power usage, PUE advantage to Vil-2


[Chart: mean power usage, IT against total (kW), Vil-1 vs Vil-2]

Room   kW IT  kW Total    PUE
Vil-1  320    720 (-130)  1.84
Vil-2  300    440         1.46

 Site power consumption around 1.1 MW
 Best PUE in Vil-2
 Moving from Vil-1 to Vil-2 gains ~20% of power cost
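The PUE column follows directly from the two load columns (PUE = total facility power / IT power); a minimal check in Python, using the table's figures, with Vil-1's -130 kW correction applied as shown:

```python
# PUE (Power Usage Effectiveness) = total facility power / IT power.
# Figures from the table above; Vil-1's total carries a -130 kW correction.
rooms = {
    "Vil-1": {"it_kw": 320, "total_kw": 720 - 130},
    "Vil-2": {"it_kw": 300, "total_kw": 440},
}
for name, p in rooms.items():
    pue = p["total_kw"] / p["it_kw"]
    print(f"{name}: PUE = {pue:.2f}")
```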
SLIDE 8

Entry ticket cost vs fully functional Vil-1

 PUE is not linear: it works by steps, and the intercept is not null
  • besides investment costs, the operational cost of the electrical infrastructure must be taken into account
  • e.g. one UPS consumes up to 3 kW
 Other investment costs
  • water-cooled racks
  • PDU and rack power protection
 Vil-1 is fully redundant and deals with hygrometry
 Vil-2's initial target was deliberately limited


No point in ditching Vil-1

SLIDE 9

Centre de Calcul de l’Institut National de Physique Nucléaire et de Physique des Particules

Moving grounds

Adaptation remains mandatory


SLIDE 10

Have to evolve

 Environment changes
 Needs and perspectives clarify and evolve
 IT technology is volatile
 Infrastructure pace stays slower
 Monitoring everything


SLIDE 11


Environment: IT densification effects upon racks

 Floor space is no longer a problem
 14 Dell PowerEdge C6200 servers will
  • approach InRow cooling unit capacity (18 kW)
  • need 45 sockets on a 3-phase PDU (dedicated PDU development)

At rack limits: IT densification increases power sockets per rack and kW/m²

Racks are partially filled to limit constraints
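The rack-limit arithmetic above can be sketched as a simple feasibility check. The 18 kW InRow capacity and 45-socket PDU come from the slide; the per-server draw (~1.25 kW) and 3 sockets per server are illustrative assumptions.

```python
# Check a planned rack population against the InRow cooling capacity and
# the PDU socket count. Per-server figures are assumptions for illustration.
INROW_CAPACITY_KW = 18.0
PDU_SOCKETS = 45

def rack_fits(n_servers, kw_per_server=1.25, sockets_per_server=3):
    """True if the planned rack stays within cooling and socket limits."""
    power_ok = n_servers * kw_per_server <= INROW_CAPACITY_KW
    sockets_ok = n_servers * sockets_per_server <= PDU_SOCKETS
    return power_ok and sockets_ok

print(rack_fits(14))  # 17.5 kW, 42 sockets: fits
print(rack_fits(15))  # 18.75 kW exceeds the cooling budget
```

Under these assumptions the 14-server load sits just under the cooling unit's capacity, which is exactly the "at rack limits" situation the slide describes.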

SLIDE 12

Environment – Power costs evolution


 5% to 7% annual power cost increase
 Looking for IT efficiency
 A fine-tuned power contract helps to minimize costs

[Charts: annual power consumption; annual power cost, without tax; kWh cost, in € cents]

Seek the most efficient IT hardware in the most efficient room
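Compounded over a decade, a 5% to 7% annual increase is substantial; a short illustration, where the starting bill is a hypothetical figure:

```python
# Compound an annual power-cost increase over several years.
def projected_cost(initial_eur, annual_rate, years):
    return initial_eur * (1 + annual_rate) ** years

start = 1_000_000  # hypothetical annual power bill in euros
for rate in (0.05, 0.07):
    total = projected_cost(start, rate, 10)
    print(f"{rate:.0%}/year -> {total:,.0f} EUR after 10 years")
```

At 7% per year the bill nearly doubles in ten years, which is why both IT efficiency and the power contract matter.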

SLIDE 13

 Current hardware tolerates 35°C all year long
 Increasing room temperature lowers overall power consumption

Power efficiency, ASHRAE recommendations


 25°C server inlet setpoint: 40% fan activity for the cooling units, an 11 kW gain
 Less noise: 95-110 dB down to 85-95 dB
 Hot corridor temperature also increased: 36°C to 40°C

What will the next setpoint be?

SLIDE 14

Planning to cope with scientific needs in the foreseeable future


[Charts: estimated number of racks and estimated IT power (kW), 2014 to 2019]

  • Estimations from LHC and LSST/Euclid/CTA figures
  • Data modified with current densification factor

What if all new hardware goes into Vil-2 ?

Need Vil-2 adaptation to host storage systems

SLIDE 15

Initial plans revision

 Previous power and cooling distribution systems implied evolution by pairs of hot corridors
 Minimizing costs by reusing current infrastructure hardware
 2017 aim:
  • 80 racks, 1 MW IT
  • 1 lane Tier II and 1 lane Tier III


Infrastructure cost for each new lane: ~1.5 M€

Adapting the existing least-used Tier II lane in a 2-phase plan: less than 350 k€ per year

SLIDE 16

Introducing Tier III – Phase 1, power redundancy


[Diagram: hot aisles A/B (Tier II, in use) and C/D (Tier III), 2015 extension]

Details to be seen during the visit

SLIDE 17

Centre de Calcul de l’Institut National de Physique Nucléaire et de Physique des Particules

Drawbacks and limits

From details to major drawbacks


SLIDE 18

Infrastructure limits are easily forgotten

 Coping with interdependent limits
  • cooling capacity or cooling redundancy
  • power capacity and power redundancy
  • per rack, group of racks, aisle, distribution line
 Multi-tier ability adds an order of complexity
  • even more if you mix tiers in a single line
 Strict deployment plans needed


Monitoring is mandatory

SLIDE 19

Dealing with a cooling defect

Stopping the InRow cooling units for 20 minutes → +15°C:
 corridor temperature rises to around 49°C
 front temperature rises to 43°C


Our water tank provides a 20-minute delay. We need an efficient shutdown system.

SLIDE 20

Smart versus dumb shutdown systems

 Smart shutdown relies upon IPMI
  • detects a low continuous slope
  • or a fast temperature change
 IPMI needs the network: if network switches shut down before the servers, IPMI is useless
 Need a dumb backup power-cut system
 Won't reproduce a bad experience with a water leak
  • 20°C on the roof, +65°C in the hot corridor
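The two trigger conditions (a slow sustained drift, or a fast jump between samples) could be sketched like this; the thresholds and window size are illustrative assumptions, not the values used at CC-IN2P3:

```python
# Sketch of a smart-shutdown trigger: watch inlet temperatures (e.g. read
# via IPMI sensors) and fire on either a sustained slow drift or a fast
# jump between consecutive samples. All thresholds are assumed values.
from collections import deque

class ShutdownTrigger:
    def __init__(self, slope_limit=0.5, jump_limit=5.0, window=10):
        self.slope_limit = slope_limit  # °C per sample, sustained (assumed)
        self.jump_limit = jump_limit    # °C between two samples (assumed)
        self.samples = deque(maxlen=window)

    def update(self, temp_c):
        """Feed one temperature sample; return True if shutdown should start."""
        if self.samples:
            if temp_c - self.samples[-1] >= self.jump_limit:
                return True  # fast temperature change
            drift = (temp_c - self.samples[0]) / len(self.samples)
            if drift >= self.slope_limit:
                return True  # low continuous slope over the window
        self.samples.append(temp_c)
        return False
```

A dumb backup (a hard power cut on a thermostat) still has to sit behind this, since the smart path dies with the network switches.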


SLIDE 21

Cooling technology: the hot water choice

 Improving global campus Energy Reuse Efficiency?
 Land procurement
  • agreement to provide hot water on campus
 Hot water need
  • very efficient chillers, allowing reuse of hot water
  • silent hardware
 Fixed technology
  • campus is late
  • 3 years spent
SLIDE 22


No humans: no window, no faucet ?

SLIDE 23

 Won't ever be able to use direct free cooling
 But new cooling technologies are available

Cooling technology: improving efficiency


New cooling technologies change IT procurement

SLIDE 24


Very valuable modularity outcomes

 Ceiling rails
 Preset pipes
 Movable separation wall between used and free space
 Roof as a technical level

SLIDE 25

Questions? Thank you!
