


Facebook and the Open Compute Project

Charlie Manese, Infrastructure
NSF SDC, June 22, 2015
January 2015

[Timeline, 2011–2014: Data Center, Triplet Rack, Freedom Servers, Spitfire Server (AMD), Power Supply, Battery Cabinet, Mezzanine Card v1, Windmill (Intel), Watermark (AMD), Open Rack v1, Knox, Winterfell, Group Hug Micro Server (Panther), Cold Storage, Open Rack v2, Mezzanine Card v2, Honey Badger]

Open data center stack

[Stack diagram: Open Rack, Leopard, Knox, Wedge, 6-Pack, Battery, Power, Cold Storage, Cooling]

Servers & Storage · Network · Software · Data Center

HipHop Virtual Machine

5x–6x faster than Zend




Original OCP designs


▪ Energy efficiency: 38% better
▪ Cost: 24% lower




Efficiency gains with OCP

$2 Billion


Efficiency gains with OCP

▪ Annual energy savings: 80,000 homes
▪ Annual carbon savings: 95,000 cars


Design principles

▪ Efficiency
▪ Scale
▪ Simplicity
▪ Vanity Free
▪ Easy to Operate


DATA CENTER


Facebook greenfield datacenter

Goal

▪ Design and build the most efficient datacenter eco-system possible

Control

▪ Application
▪ Server configuration
▪ Datacenter design

Prineville, OR Forest City, NC

Luleå, Sweden


Electrical overview

▪ Eliminated the 480V-to-208V transformation
▪ Used 480/277 VAC distribution to IT equipment
▪ Removed the centralized UPS
▪ Implemented a 48VDC UPS system
▪ Result: a highly efficient electrical system with a small failure domain


Typical power vs. Prineville power

Typical power chain:
▪ Utility transformer, 480/277 VAC, 99.999% availability: 2% loss
▪ Centralized UPS (AC/DC, DC/AC), 480VAC: 6%–12% loss
▪ ASTS/PDU, 208/120 VAC: 3% loss
▪ Server power supply: 10% loss (assuming a 90%+ efficient PS)
▪ Stand-by generator
▪ Total loss up to the server: 21% to 27%

Prineville power chain:
▪ Utility transformer, 480/277 VAC, 99.9999% availability: 2% loss
▪ 480/277 VAC delivered directly to the FB server power supply, with a 48VDC DC UPS in stand-by
▪ FB server power supply: 5.5% loss
▪ Stand-by generator
▪ Total loss up to the server: 7.5%
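To see how the totals above come together, here is a minimal sketch (assuming, as the slide's totals imply, that the per-stage losses simply add):

    # Back-of-envelope check of the power-chain losses above.
    # Assumes per-stage losses are additive, which matches the slide's totals.
    typical = {"utility transformer": (2.0, 2.0), "centralized UPS": (6.0, 12.0),
               "ASTS/PDU": (3.0, 3.0), "server PSU": (10.0, 10.0)}
    prineville = {"utility transformer": 2.0, "FB server PSU": 5.5}

    low = sum(lo for lo, hi in typical.values())
    high = sum(hi for lo, hi in typical.values())
    print(f"Typical chain loss:    {low:.1f}% to {high:.1f}%")        # 21.0% to 27.0%
    print(f"Prineville chain loss: {sum(prineville.values()):.1f}%")  # 7.5%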


Reactor power panel

▪ Custom-fabricated RPP
▪ Delivers 165kW, 480/277V, 3-phase to the cabinet level
▪ Contains Cam-Lock connectors for maintenance wrap-around
▪ Line reactor:
  ▪ Reduces short-circuit current to < 10kA
  ▪ Corrects leading power factor toward unity (3% improvement)
  ▪ Reduces THD for improved electrical system performance (2% iTHD improvement)
▪ Power consumption: 360 W
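For scale, a rough back-of-envelope figure (derived from the numbers above, not stated on the slide) for the panel's own overhead:

    # RPP overhead relative to the power it delivers (back-of-envelope).
    delivered_w = 165_000   # 165 kW delivered to the cabinet level
    consumed_w = 360        # RPP / line reactor consumption
    print(f"RPP overhead: {consumed_w / delivered_w:.2%}")   # ~0.22%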

Battery cabinet

▪ Custom DC UPS
▪ 56kW or 85kW
▪ 480VAC, 3-phase input
▪ 45-second back-up
▪ 20 sealed VRLA batteries
▪ Battery validation system
▪ Six 48VDC outputs
▪ Two 50A 48VDC aux outputs
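As a rough check (derived from the figures above, not stated on the slide), the energy one cabinet must supply during its ride-through:

    # Energy delivered during the 45-second ride-through (back-of-envelope).
    power_w = 85_000   # worst case: 85 kW cabinet
    backup_s = 45
    energy_wh = power_w * backup_s / 3600
    print(f"Ride-through energy: {energy_wh:.0f} Wh (~{energy_wh / 1000:.1f} kWh)")   # about 1.1 kWh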

Mechanical overview

▪ Removed:
  ▪ Centralized chiller plant
  ▪ HVAC ductwork
▪ System basis of design:
  ▪ ASHRAE weather data: N = 50 years
  ▪ TC9.9 2008 recommended envelopes
▪ Built-up penthouse air handling system
▪ Server waste heat is used for office space heating


Typical datacenter cooling

[Diagram: cooling tower (CT), chiller, AHU, supply duct, data center, return ductwork]

Prineville datacenter cooling

[Diagram: 100% outside air intake, mixed air corridor, filter wall, evap system, fan wall, ductless supply into a common cold aisle, server cabinets with contained hot aisles, common return air plenum, relief air fan and relief air corridor]

PRN datacenter cooling


Cold aisle pressurization – ductless supply


Basis of design comparison

▪ PRN1A1B: 80ºF/27ºC inlet, 65% humidity, 20ºF/11ºC ΔT
▪ PRN1C1D: 85ºF/30ºC inlet, 80% humidity, 22ºF/11ºC ΔT
▪ FRC1A1B: 85ºF/30ºC inlet, 90% humidity, 22ºF/11ºC ΔT
▪ LLA1A1B: 85ºF/30ºC inlet, 80% humidity, 22ºF/11ºC ΔT
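The design ΔT drives how much air each server has to move. A generic sketch of that relationship (standard air properties and the 300 W server load are assumptions, not figures from the slide):

    # Airflow needed to remove a heat load at a given temperature rise (Q = m_dot * cp * dT).
    def cfm_required(watts, delta_t_f, rho=1.2, cp=1005):
        # Approximate airflow in CFM for `watts` of heat at a `delta_t_f` (deg F) rise.
        delta_t_c = delta_t_f * 5.0 / 9.0
        m3_per_s = watts / (rho * cp * delta_t_c)   # volumetric flow in m^3/s
        return m3_per_s * 2118.88                   # m^3/s -> CFM

    # Hypothetical 300 W server at the 22 deg F design delta-T:
    print(f"{cfm_required(300, 22):.0f} CFM")       # roughly 43 CFM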


RACK, SERVERS, AND STORAGE


Open Compute Rack: Open Rack

• Well-defined “Mechanical API” between the server and the rack
• Accepts any size equipment from 1U to 10U
• Wide 21” equipment bay for maximum space efficiency
• Shared 12V DC power system

Open Compute Server v2

• First step with shared components by reusing the PSU and fans between two servers
• Increased rack density without sacrificing efficiency or cost
• All new Facebook deployments in 2012 were “v2” servers


Open Compute Server v3

• Reuses the “v2” half-width motherboards
• Self-contained sled for Open Rack
• 3-across 2U form factor enables 80mm fans with 45 servers per rack
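A quick sanity check of that density figure (the sled count is implied by the numbers, not stated on the slide):

    # Rack density implied by the 3-across sled and the 45-servers-per-rack figure.
    servers_per_sled = 3
    servers_per_rack = 45
    sleds_per_rack = servers_per_rack // servers_per_sled
    print(f"{sleds_per_rack} sleds x {servers_per_sled} servers = {sleds_per_rack * servers_per_sled} servers per rack")   # 15 x 3 = 45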


Open Vault

• Storage JBOD for Open Rack
• Fills the volume of the rack without sacrificing hot-swap


NETWORK


Traffic growth


Fabric


Wedge


FBOSS


6-Pack


SERVICEABILITY


Complex designs

Typical large datacenter: 1000 Servers per Technician

Simple designs

Typical large datacenter: 1,000 servers per technician
Facebook datacenter: 25,000 servers per technician

Efficiency through serviceability

Standing-at-machine repair times (minutes)

OEM repairs:
  Part                    Pre-repair   Part swap   Additional steps   Post-repair   Total
  Hard drive (non-RAID)   2            3           –                  2             7
  DIMM (offline)          2            3           –                  2             7
  Motherboard             2            20          20                 2             44
  PSU (hot swap)          2            5           –                  2             9

OCP #1 repairs:
  Part                    Pre-repair   Part swap   Additional steps   Post-repair   Total
  Hard drive (non-RAID)   –            0.98        –                  –             0.98
  DIMM (offline)          –            0.82        –                  –             0.82
  Motherboard             2.5          10.41       –                  2.5           15.41
  PSU (hot swap)          –            0.65        –                  –             0.65
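Working from the totals in the table above, the relative speedup of OCP repairs (a derived comparison, not a figure from the slide):

    # Repair-time speedup, OEM vs. OCP #1, using the total minutes from the table above.
    oem = {"hard drive": 7, "DIMM": 7, "motherboard": 44, "PSU": 9}
    ocp = {"hard drive": 0.98, "DIMM": 0.82, "motherboard": 15.41, "PSU": 0.65}

    for part in oem:
        print(f"{part}: {oem[part] / ocp[part]:.1f}x faster")
    # hard drive ~7.1x, DIMM ~8.5x, motherboard ~2.9x, PSU ~13.8x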

First-time-fix repair rates

[Chart: monthly first-time-fix rate vs. target, Jul 2012 – Dec 2012; y-axis 85%–100%]

Let’s engage
