Rx for Data Center Communication Scalability (PowerPoint PPT Presentation)

SLIDE 1

Rx for Data Center Communication Scalability

Ýmir Vigfússon, Gregory Chockler, Yoav Tock (IBM Research, Haifa Labs)
Hussam Abu-Libdeh, Robert Burgess, Ken Birman, Haoyuan Li (Cornell University)
Mahesh Balakrishnan (Microsoft Research, Silicon Valley)

SLIDE 2

IP Multicast in Data Centers

Useful

– IPMC is fast and widely supported
– Multicast and pub/sub are often used implicitly
– Lots of redundant traffic in data centers [Anand et al. SIGMETRICS ’09]

Rarely used

– IP Multicast has scalability problems!

SLIDE 3

IP Multicast in Data Centers

  • Switching hierarchies
SLIDE 4

IP Multicast in Data Centers

  • Switches have limited state space

Switch model (10 Gbps)                Group capacity
Alcatel-Lucent OmniSwitch OS6850-4       260
Cisco Catalyst 3750E-48PD-EF           1,000
D-Link DGS-3650                          864
Dell PowerConnect 6248P                   69
Extreme Summit X450a-48t                 792
Foundry FastIron Edge X 448+2XG          511
HP ProCurve 3500yl                     1,499

SLIDE 5

IP Multicast in Data Centers

SLIDE 6

IP Multicast in Data Centers

  • NICs also have limited state space

E.g. 16 exact-match addresses plus a 512-bit Bloom filter
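
To see why an imperfect hash filter pushes work to the kernel, here is a toy Bloom-filter model of NIC multicast filtering (illustrative only; the 512-bit size comes from the slide, the hashing scheme is made up):

```python
import hashlib

FILTER_BITS = 512  # filter size mentioned on the slide

def _bit_positions(addr: str, k: int = 3):
    """Derive k bit positions for a multicast address (illustrative hashing)."""
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{addr}".encode()).digest()
        yield int.from_bytes(digest[:4], "big") % FILTER_BITS

class NicMulticastFilter:
    """Toy model of a NIC's imperfect multicast filter."""

    def __init__(self):
        self.bits = [False] * FILTER_BITS

    def subscribe(self, addr: str) -> None:
        for pos in _bit_positions(addr):
            self.bits[pos] = True

    def accepts(self, addr: str) -> bool:
        # May return True for addresses that were never subscribed (false
        # positive); the kernel then has to discard the unwanted packets.
        return all(self.bits[pos] for pos in _bit_positions(addr))
```

Once many groups are subscribed, most bits are set and nearly every multicast frame passes the hardware filter, which is why the kernel ends up filtering in software.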

SLIDE 7

IP Multicast in Data Centers

SLIDE 8

IP Multicast in Data Centers

  • Kernel has to filter out unwanted packets!
SLIDE 9

IP Multicast in Data Centers

  • Packet loss triggers further problems

– Reliability layer may aggravate loss
– Major companies have suffered multicast storms

IPMC has dangerous scalability issues

SLIDE 10

Key ideas

  • Treat IPMC groups as a scarce resource

– Limit the number of physical IPMC groups
– Translate logical IPMC groups into either physical IPMC groups or multicast by iterated unicast.

  • Merge similar groups together
  • Dr. Multicast
SLIDE 11

Dr. Multicast

  • Transparent: Standard IPMC interface to user, standard IGMP interface to network.
  • Robust: Distributed, fault-tolerant service.
  • Optimizes resource use: Merges similar multicast groups together.
  • Scalable in number of groups: Limits number of physical IPMC groups.
SLIDE 12

Dr. Multicast

  • Library maps logical IPMC to physical IPMC or iterated unicast (see the sketch below)
  • Transparent to the application
– IPMC calls intercepted and modified
  • Transparent to the network
– Ordinary IPMC/IGMP traffic
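
A minimal sketch of the translation idea: the interposed library looks up the logical group in a mapping maintained by the local agent and either sends to a physical IPMC address or iterates over unicast addresses. Names such as `mapping` and `send_logical` are hypothetical, not the actual MCMD API:

```python
import socket

# Hypothetical mapping maintained by the local Dr. Multicast agent:
# logical group -> ("ipmc", physical address) or ("unicast", [member addresses])
mapping = {
    "224.5.6.7": ("ipmc", "224.1.2.3"),
    "224.5.6.8": ("unicast", ["10.0.0.4", "10.0.0.9"]),
}

def send_logical(sock: socket.socket, group: str, port: int, payload: bytes) -> None:
    """Send to a logical IPMC group via its current physical mapping."""
    kind, target = mapping[group]
    if kind == "ipmc":
        # Logical group backed by a real IPMC address: one packet on the wire.
        sock.sendto(payload, (target, port))
    else:
        # Logical group translated to iterated unicast: one packet per member.
        for member in target:
            sock.sendto(payload, (member, port))
```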

SLIDE 13

Dr. Multicast

  • Transparent: Standard IPMC interface to user, standard IGMP interface to network.
  • Robust: Distributed, fault-tolerant service.
  • Optimizes resource use: Merges similar multicast groups together.
  • Scalable in number of groups: Limits number of physical IPMC groups.
SLIDE 14

Dr. Multicast

  • Per-node agent maintains global group membership and mapping
– Library consults the local agent
  • Leader agent periodically computes a new mapping (see later).
  • State reconciled via gossip (sketched below)
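
The slides do not show the reconciliation details; below is a minimal sketch of one plausible gossip merge, assuming per-entry version numbers assigned by the leader (that versioning scheme is an assumption, not something stated on the slide):

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class MappingEntry:
    """One logical group's mapping, with a version stamped by the leader."""
    version: int
    target: str  # physical IPMC address, or "unicast"

def merge_gossip(local: dict[str, MappingEntry],
                 remote: dict[str, MappingEntry]) -> dict[str, MappingEntry]:
    """Anti-entropy merge: for each logical group, keep the newer mapping."""
    merged = dict(local)
    for group, entry in remote.items():
        if group not in merged or entry.version > merged[group].version:
            merged[group] = entry
    return merged
```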
SLIDE 15

Library Layer Overhead

  • Experiment measuring sends/sec at one sender
  • Sending to r addresses achieves roughly 1/r of the single-address send rate
  • Insignificant overhead when mapping a logical IPMC group to a physical IPMC group.

SLIDE 16

Network Overhead and Robustness

  • Experiment on 90 Emulab nodes

– Nodes introduced 10 at a time
– Total network traffic grows linearly
– Average traffic received per node
– Robust to major correlated failure: half of the nodes die

SLIDE 17

Dr. Multicast

  • Transparent: Standard IPMC interface to user, standard IGMP interface to network.
  • Robust: Distributed, fault-tolerant service.
  • Optimizes resource use: Merges similar multicast groups together.
  • Scalable in number of groups: Limits number of physical IPMC groups.
SLIDE 18

Optimization Questions

(Figure: bipartite mapping between multicast users and groups)

SLIDE 19

Optimization Questions

Assign IPMC and unicast addresses s.t.

  • Min. receiver filtering
  • Min. network traffic
  • Min. # IPMC addresses
  • … yet deliver all messages to interested parties
SLIDE 20

Optimization Questions

Assign IPMC and unicast addresses s.t.

  • Min. λ · (receiver filtering) + (1 − λ) · (network traffic)
  • At most M IPMC addresses (hard constraint)

  • Knob λ controls the relative costs of CPU filtering and of duplicate traffic.
  • Both λ and M are part of administrative policy. (An illustrative cost sketch follows.)
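
To make the weighted objective concrete, here is an illustrative Python cost model; the variable names and the exact cost terms (unwanted deliveries for filtering, wire transmissions for traffic) are assumptions for illustration, and the paper's formulation may differ in detail:

```python
from collections import defaultdict

def assignment_cost(members, assignment, lam=0.5):
    """Illustrative cost of mapping logical groups to IPMC channels or unicast.

    members:    {logical_group: set of receivers interested in it}
    assignment: {logical_group: ("ipmc", channel_id) or ("unicast",)}
    lam:        the λ knob weighting filtering against traffic.
    """
    # Receivers listening on each physical IPMC channel = union of merged groups.
    channel_receivers = defaultdict(set)
    for group, target in assignment.items():
        if target[0] == "ipmc":
            channel_receivers[target[1]] |= members[group]

    filtering = 0  # unwanted packets receivers must discard
    traffic = 0    # packets on the wire, one message per group
    for group, target in assignment.items():
        if target[0] == "ipmc":
            filtering += len(channel_receivers[target[1]] - members[group])
            traffic += 1
        else:  # iterated unicast: one copy per interested receiver
            traffic += len(members[group])

    return lam * filtering + (1 - lam) * traffic
```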

SLIDE 21

MCMD Heuristic

Groups in `user-interest' space: each group is a 0/1 membership vector over users, e.g. (1,1,1,1,1,0,1,0,1,0,1,1). (Example groups on the slide: GRAD STUDENTS, FREE FOOD.)

SLIDE 22

MCMD Heuristic

Groups in `user-interest' space

Grow M meta-groups around the groups greedily while cost decreases

SLIDE 23

MCMD Heuristic

Groups in `user-interest' space

Grow M meta-groups around the groups greedily while cost decreases
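
The slide describes the heuristic only at a high level. Below is a rough, self-contained illustration of greedy grouping, written as an agglomerative variant (merge the cheapest pair until only M meta-groups remain); it is not the actual MCMD algorithm, and `filtering_cost` and the stopping rule are simplifications:

```python
from itertools import combinations

def filtering_cost(groups):
    """Unwanted deliveries if these member sets share one IPMC channel."""
    union = set().union(*groups)
    return sum(len(union - g) for g in groups)

def greedy_meta_groups(members, m):
    """Merge logical groups into at most m meta-groups, always picking the
    merge that adds the least filtering cost.

    members: {group_name: set of interested users}
    Returns a list of meta-groups, each a list of group names.
    """
    metas = [[name] for name in members]  # start with one meta-group per group
    while len(metas) > m:
        best = None
        for a, b in combinations(range(len(metas)), 2):
            merged = [members[n] for n in metas[a] + metas[b]]
            delta = (filtering_cost(merged)
                     - filtering_cost([members[n] for n in metas[a]])
                     - filtering_cost([members[n] for n in metas[b]]))
            if best is None or delta < best[0]:
                best = (delta, a, b)
        _, a, b = best
        metas[a] = metas[a] + metas[b]
        metas.pop(b)  # b > a, so index a is unaffected
    return metas
```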

SLIDE 24

MCMD Heuristic

Groups in `user-interest' space: the resulting meta-groups are assigned either physical IPMC addresses (224.1.2.3, 224.1.2.4, 224.1.2.5) or unicast.

SLIDE 25

Data sets/models

  • Social:
– Yahoo! Groups
– Amazon Recommendations
– Wikipedia Edits
– LiveJournal Communities
– Mutual Interest Model

(Figure: users and groups)

SLIDE 26

MCMD Heuristic

  • Total cost on samples of 1000 logical groups.

– Costs drop exponentially with more IPMC addresses

SLIDE 27

Data sets/models

  • Social:
– Yahoo! Groups
– Amazon Recommendations
– Wikipedia Edits
– LiveJournal Communities
– Mutual Interest Model

  • Systems:
– IBM Websphere

(Figure: users and groups)

SLIDE 28

MCMD Heuristic

  • Total cost on IBM Websphere data set (simulation)

– Negligible costs when using only 4 IPMC addresses

SLIDE 29

MCMD Heuristic

  • Parallel Websphere cells (127 nodes each)

– Allow 1000 IPMC groups. Optimal until 250 cells.

(Plot: duplication costs and filtering costs)

SLIDE 30

Dr. Multicast

  • Transparent: Standard IPMC interface to user, standard IGMP interface to network.
  • Robust: Distributed, fault-tolerant service.
  • Optimizes resource use: Merges similar multicast groups together.
  • Scalable in number of groups: Limits number of physical IPMC groups.
SLIDE 31

Group Scalability

  • Experiment on Emulab with 1 receiver, 9 senders
  • MCMD prevents ill-effects when the # of groups scales up
SLIDE 32

Dr. Multicast

  • IPMC is useful, but has scalability problems
  • Dr. Multicast treats IPMC groups as scarce and sensitive resources
– Transparent, backward-compatible
– Scalable in the number of groups
– Robust against failures
– Optimizes resource use by merging similar groups
  • Enables safe and scalable use of multicast
SLIDE 33
SLIDE 34

Acceptable Use Policy

  • Assume a higher-level network management tool compiles policy into primitives (illustrated after this list).
  • Explicitly allow a process (user) to use IPMC groups:
– allow-join(process ID, logical group ID)
– allow-send(process ID, logical group ID)
  • Multicast by point-to-point unicast is always permitted.
  • Additional restraints:
– max-groups(process ID, limit)
– force-unicast(process ID, logical group ID)
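
A toy sketch of how these primitives could be represented and enforced; the data structures and method names are hypothetical, not the actual MCMD interface:

```python
from collections import defaultdict

class AcceptableUsePolicy:
    """Toy container for the policy primitives listed above."""

    def __init__(self):
        self.allow_join = set()         # (process_id, logical_group) pairs
        self.allow_send = set()         # (process_id, logical_group) pairs
        self.max_groups = {}            # process_id -> group limit
        self.force_unicast = set()      # (process_id, logical_group) pairs
        self.joined = defaultdict(set)  # process_id -> joined logical groups

    def can_join(self, pid, group):
        within_limit = len(self.joined[pid]) < self.max_groups.get(pid, float("inf"))
        return (pid, group) in self.allow_join and within_limit

    def join(self, pid, group):
        """Record a join if policy allows it; return whether it was permitted."""
        if self.can_join(pid, group):
            self.joined[pid].add(group)
            return True
        return False

    def can_send(self, pid, group):
        return (pid, group) in self.allow_send

    def must_use_unicast(self, pid, group):
        # Point-to-point unicast is always permitted, and may also be
        # forced for specific (process, group) pairs.
        return (pid, group) in self.force_unicast
```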
SLIDE 35

Group Similarity

  • IBM Websphere has remarkable structure
  • Typical for real-world systems?
– Only one data point.

SLIDE 36

Group Similarity

  • Def: Similarity of groups j, j' is …

(Plot: group similarity, IBM Websphere)
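
The similarity formula itself did not survive extraction from the slide. As a stand-in, a common overlap measure is the Jaccard index over member sets, which may or may not match the definition used here:

```python
def similarity(members_j: set, members_jprime: set) -> float:
    """Jaccard-style overlap between two groups' member sets (illustrative)."""
    if not members_j and not members_jprime:
        return 0.0
    return len(members_j & members_jprime) / len(members_j | members_jprime)
```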

SLIDE 37

Social data sets

  • User and group degree distributions appear to follow power laws.
  • Power-law degree distributions are often modeled by preferential attachment.
  • Mutual Interest model:
– Preferential attachment for bipartite graphs (sketched below).

(Figure: groups and users)
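
As a rough illustration of preferential attachment on a bipartite user-to-group graph (the growth rule and parameters below are assumptions; the actual Mutual Interest model is specified in the paper, not on this slide):

```python
import random

def mutual_interest_graph(n_users, n_groups, edges_per_user=3, seed=0):
    """Toy bipartite preferential attachment: each new user joins groups
    chosen with probability proportional to current group size (plus 1)."""
    rng = random.Random(seed)
    membership = {g: set() for g in range(n_groups)}
    for user in range(n_users):
        for _ in range(edges_per_user):
            weights = [len(membership[g]) + 1 for g in range(n_groups)]
            group = rng.choices(range(n_groups), weights=weights)[0]
            membership[group].add(user)
    return membership
```

Popular groups attract new members faster, which is one simple way to obtain the heavy-tailed group sizes observed in the social data sets.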

SLIDE 38

IP Multicast in Data Centers

  • Useful, but rarely used.
  • Various problems:
– Security
– Stability
– Scalability
  • Bottom line: Administrators have no control over IPMC.
– Thus they choose to disable it.

SLIDE 39

Wishlist

  • Policy: Enable control of IPMC.
  • Transparency: Should be backward compatible with hardware and software.
  • Scalability: Needs to scale in number of groups.
  • Robustness: Solution should not bring in new problems.

SLIDE 40

Data sets/models

  • What's in a "group"?
  • Social:
– Yahoo! Groups
– Amazon Recommendations
– Wikipedia Edits
– LiveJournal Communities
– Mutual Interest Model
  • Systems:
– IBM Websphere
– Hierarchy Model

(Figure: users and groups)

SLIDE 41

Systems Data Set

  • Distributed systems tend to be hierarchically structured.
  • Hierarchy model
– Motivated by Live Objects.

Thm: Expect a pair of users to overlap in … groups.

SLIDE 42

Group similarity

  • Def: Similarity of groups j, j' is …

(Plots: group similarity for Wikipedia and LiveJournal)

SLIDE 43

Group similarity

  • Def: Similarity of groups j, j' is …

(Plot: group similarity for the Mutual Interest Model)

SLIDE 44

Group communication

  • Most network traffic is unicast communication (one-to-one).
  • But a lot of content is identical:
– Audio streams, video broadcasts, remote updates, etc.
– Video traffic is forecast to be 90% of Internet traffic in 2013.
  • To minimize redundancy, it would be nice to use multicast communication (one-to-many).

SLIDE 45

IP Multicast in Data Centers

– Smaller scale – well-defined hierarchy
– Single administrative domain
– Firewalled – can ignore malicious behavior