SLIDE 1

I/O Congestion Avoidance via Routing and Object Placement

David Dillow, Galen Shipman, Sarp Oral, and Zhe Zhang

SLIDE 2

Motivation

  • Goal: 240 GB/s routed
  • Direct-attached vs. Center-wide
  • Limited allocations
  • INCITE allocations average 27 million hours
  • Prefer to spend time computing
  • Performance issues at scale

SLIDE 3

Spider Resources

  • 48 DDN 9900 Couplets
    • 13,440 SATA 1 TB hard drives
    • DDR InfiniBand connectivity
  • 192 Dell PowerEdge 1950 OSS nodes
    • 16 GB memory
    • 2x quad-core Xeons @ 2.3 GHz
  • 4 Cisco 7024D 288-port DDR IB switches
  • 48 Flextronics 24-port DDR IB switches

SLIDE 4

Wiring up SION

[Diagram: SION InfiniBand wiring, with link counts between stages of 64, 32, 96, 96, 96, 96, 64, and 8 links]

SLIDE 5

Direct-attached Traffic Flow

[Diagram: direct-attached traffic flow: Client -> Fabric -> OSS -> Storage]

SLIDE 6

Direct-attached Raw I/O Baseline

SLIDE 7

Direct-attached Lustre Baseline

SLIDE 8

Writer Skew

SLIDE 9

SeaStar Bandwidth

SLIDE 10

Link Oversubscription

  • Each link can sustain ~3100 MB/s (unidirectional)
  • Each OST can contribute 180 MB/s with a balanced load presented to the DDN 9900
    • 260 MB/s individually
  • Therefore, each link can support 17 client-OST pairs at saturation
    • 11 client-OST pairs @ 260 MB/s
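
The saturation counts above follow from dividing the link rate by the per-OST rate; a minimal sketch of the arithmetic in Python (all values taken from this slide):

    # Rates from the slide, in MB/s
    link_bw = 3100          # sustained unidirectional link bandwidth
    ost_balanced = 180      # per-OST rate under a balanced DDN 9900 load
    ost_individual = 260    # per-OST rate when driven individually

    # Client-OST pairs one link can carry before saturating
    pairs_balanced = link_bw // ost_balanced      # 17
    pairs_individual = link_bw // ost_individual  # 11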

SLIDE 11

Link Oversubscription

  • 70% of tests had more than one link with 18 client-OST pairs
  • 42% had more than 34 pairs
  • 21% had more than 60
  • 3% had more than 70

But that's only part of the issue.

SLIDE 12

Imbalanced Sharing

SLIDE 13

Placing I/O in the Torus

  • We want to minimize link congestion
    • Prefer no more than 11 client-OST pairs per link
  • Easiest method is to place active clients topologically close to the servers
  • Use hop count as our metric

SLIDE 14

Placing I/O in the Torus

  • For each OST to be used (a minimal sketch follows this list):
    • Calculate the hop count from its OSS to each client
    • Pick the client with the lowest count
    • Remove that client from further consideration
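
A minimal sketch of this greedy assignment in Python, assuming a hop_count() helper that returns the torus distance between two nodes and an oss_of mapping from each OST to its serving OSS (the names and data structures are illustrative, not the production implementation):

    def place_clients(osts, clients, oss_of, hop_count):
        """Greedy placement: for each OST, pick the closest unused client."""
        available = set(clients)
        placement = {}
        for ost in osts:
            oss = oss_of[ost]
            # Client with the lowest hop count to this OST's OSS
            best = min(available, key=lambda c: hop_count(c, oss))
            placement[ost] = best
            # Remove it from further consideration
            available.remove(best)
        return placement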

SLIDE 15

Placing I/O in the Torus

[Diagram: traffic flow with placed clients: Client -> Fabric -> OSS -> Storage]

SLIDE 16

Placement Results

SLIDE 17

Improved Writer Skew

SLIDE 18

Does it work in a smaller space?

SLIDE 19

LNET Routing

  • Allows us to separate storage from the compute platform
  • Very simple in nature
    • List of routers for each remote LNET
    • Routers can have different weights
  • 1024 character max for the routes option (example below)
    • Use lctl add_route for larger configs
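
For illustration only: the NIDs, network names, and weight values below are invented, and the exact syntax should be checked against the Lustre manual for the deployed version. A routes entry of this general shape lists the routers a client may use to reach the remote LNET, with the numeric field acting as the weight:

    options lnet routes="o2ib0 1 10.10.10.[1-4]@ptl0; o2ib0 2 10.10.10.[5-8]@ptl0"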

SLIDE 20

Simple LNET Routing

  • 196 routers in the torus
  • Client uses each router in a weight class in a round-robin manner
    • 8 back-to-back messages to a single destination will use 8 different routers
  • Congestion in both torus and InfiniBand
  • No opportunity to improve placement to control congestion

SLIDE 21

Simple LNET Routing

[Diagram: routed traffic flow: Client -> Router -> Fabric -> Storage]

SLIDE 22

InfiniBand Congestion

SLIDE 23

Improved Routing Configurations

  • Aim to eliminate InfiniBand congestion
  • Aim to reduce torus congestion
  • Provide the ability for an application to determine which router will be used for a particular OST
    • given the OST-to-OSS mapping
    • given the client-to-router mappings

SLIDE 24

Nearest Neighbor

  • 32 sets of routers
    • One set for each leaf module
    • 6 OSS servers in each set
    • 6 routers in each set
  • Each client chooses the nearest router to talk to the OSSes in a set (sketched below)
  • Variable performance
    • by job size
    • by job location
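
A minimal sketch of this router choice in Python, assuming the same hop_count() helper as before plus illustrative router_sets and set_of_oss tables (not the production implementation):

    def nearest_router(client, oss, router_sets, set_of_oss, hop_count):
        """Pick the router in this OSS's set that is closest to the client."""
        routers = router_sets[set_of_oss[oss]]   # the 6 routers serving this OSS's set
        return min(routers, key=lambda r: hop_count(client, r))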

SLIDE 25

Nearest Neighbor

[Diagram: nearest-neighbor routing: a row of torus clients labeled 1 or 2 by the router group each uses; the labels cluster into long contiguous runs. Traffic flows from Clients to Router (Group A or B), across the Fabric, to Storage (Group A or B).]

SLIDE 26

Round Robin

  • Again, 32 sets of routers
  • Ordered list of routers for each set
  • Client chooses router (nid % 6) for the set
  • Throws I/O traffic around the torus
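
A minimal sketch of this rule in Python, treating the client NID as an integer and reusing the illustrative router_sets and set_of_oss tables from the nearest-neighbor sketch (the slide specifies only the nid % 6 selection):

    def round_robin_router(client_nid, oss, router_sets, set_of_oss):
        """Spread clients across the set's ordered router list by NID."""
        routers = router_sets[set_of_oss[oss]]      # ordered list of 6 routers
        return routers[client_nid % len(routers)]   # nid % 6, per the slide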

SLIDE 27

Round Robin

[Diagram: round-robin routing: the same row of torus clients, now with router-group labels (1 or 2) interleaved rather than clustered, spreading I/O traffic around the torus. Traffic flows from Clients to Router (Group A or B), across the Fabric, to Storage (Group A or B).]

SLIDE 28

Projection

  • 192 LNET networks
    • One for each OSS
  • One primary router for each LNET
    • Backup routers added with higher weights
  • Clients experience variable latency to OSSes based on location
  • Placement calculations similar to direct-attached

SLIDE 29

Projection

[Diagram: projected routing: Client -> primary Router -> Fabric -> Storage]

SLIDE 30

Routed Placement Results (IOR)

SLIDE 31

Conclusions

  • Goals exceeded: 244 GB/s on routed storage
  • “Projected” configuration in production
  • Working with library developers to bring this to users

SLIDE 32

Questions?

  • Contact info: David Dillow, 865-241-6602, dillowda@ornl.gov

This research used resources of the Oak Ridge Leadership Computing Facility, located in the National Center for Computational Sciences at Oak Ridge National Laboratory, which is supported by the Office of Science of the Department of Energy under Contract DE-AC05-00OR22725. Notice: This manuscript has been authored by UT-Battelle, LLC, under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.