Unicorn: Unified Resource Orchestration for Multi-Domain, Geo-Distributed Data Analytics




slide-1
SLIDE 1

Unicorn: Unified Resource Orchestration for Multi-Domain, Geo-Distributed Data Analytics

Qiao Xiang (1, 2), Jace Liu (1), Harvey Newman (3), Tony Wang (1, 2), Y. Richard Yang (1, 2), Jensen Zhang (1)

(1) Tongji University, (2) Yale University, (3) California Institute of Technology

November, 2017, INDIS Workshop, Denver, CO

slide-2
SLIDE 2

Background

  • Data-intensive applications rely on clusters of heterogeneous servers as the major computing platform.

  • Missing: a unified framework to manage a large set of distributively-owned, heterogeneous resources for multi-domain data analytics.

  • Members: a worldwide, multi-organizational collaboration among Caltech, Tongji University, Tsinghua University, Yale University, the OpenDaylight ALTO team and the Kytos team.

slide-3
SLIDE 3

Example Design Setting: Large Hadron Collider (LHC)

Figure source: cern.ch

slide-4
SLIDE 4

The Compact Muon Solenoid (CMS) Computing Model

[Figure: CMS tiered computing model with Tier-0, Tier-1 and Tier-2 sites and the CERN Analysis Facility (calibration). Labels: 200 Hz - 400 Hz; RAW: ~1.7-1.1 MB/evt.]

  • Large raw datasets from the LHC are held at the Tier-0 site.
  • RECO and AOD datasets are distributed to Tier-1 sites.
  • RECO, AOD and simulation datasets are transferred among Tier-1 to Tier-3 sites for analysis.
  • RAW data: tens of PB per year.
  • RECO and AOD: multiple times the size of the RAW data, depending on analysis requirements.

slide-5
SLIDE 5

Resource Orchestration in CMS: Challenges

  • It is a multi-domain science network.
  • Different domains (resource providers) provide heterogeneous resources.
  • Different resource providers use different controllers to manage their resources, especially networking resources, e.g., OpenDaylight and Kytos.
  • Different jobs run in the network.
– PB-scale dataset transfers
– Various HEP analytics

Figure source: cern.ch

slide-6
SLIDE 6

Multi-Domain Resource Orchestration: Design Requirements

  • Multi-controller coordination.
– Resource providers, which use different network controllers (e.g., OpenDaylight, Kytos, ONOS), can communicate and coordinate the orchestration process through a unified interface.
  • Consistent operation paradigm.
– Efficient resource utilization without resource overloading.
– Fast convergence.
  • Autonomy and privacy of resource providers.
– Resource providers can make and practice their own resource supply strategies with control of privacy.

Unicorn: a multi-domain, multi-controller (MDMC) resource orchestration system

slide-7
SLIDE 7

Unicorn: An MDMC Resource Orchestration System

  • Users send jobs to a logically centralized orchestrator;
  • The orchestrator sends resource reservation requests to different domains;
  • The reservation servers, running on top of different controllers, process the requests and return the result (success/fail).

[Figure: multi-controller coordination. Jobs arrive at the Global Resource Orchestrator, which exchanges reservation requests and reservation results with two Resource Reservation Servers, one running on ODL and one on Kytos.]

How does the orchestrator know how many resources to request for each job?

slide-8
SLIDE 8

Unicorn: An MDMC Resource Orchestration System

  • Add resource information servers that provide the resource information of each domain.
  • The orchestrator uses the queried resource information to compute the optimal resource reservation requests, and sends them to the reservation servers.

[Figure: consistent operation paradigm. The Global Resource Orchestrator receives jobs, exchanges resource discovery queries/responses with a Resource Information Server in each domain, and exchanges reservation requests/results with a Resource Reservation Server in each domain (OpenDaylight, Kytos).]
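The discover-then-reserve workflow on this slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Unicorn's actual interface: the `Domain` class, its `discover` and `reserve` methods, the single-capacity resource model and the `orchestrate` function are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    """Hypothetical stand-in for one domain's information and reservation servers."""
    name: str
    capacity_mbps: int      # assumed: one bottleneck capacity per domain
    reserved_mbps: int = 0

    def discover(self) -> int:
        """Resource discovery: report the bandwidth still available."""
        return self.capacity_mbps - self.reserved_mbps

    def reserve(self, amount_mbps: int) -> bool:
        """Reservation: succeed only if enough bandwidth remains."""
        if amount_mbps > self.discover():
            return False
        self.reserved_mbps += amount_mbps
        return True


def orchestrate(demand_mbps: int, domains: list) -> tuple:
    """Query every domain first, then reserve an amount all of them can supply."""
    grant = min([demand_mbps] + [d.discover() for d in domains])
    results = {d.name: d.reserve(grant) for d in domains}
    return grant, results


domains = [Domain("OpenDaylight", 100), Domain("Kytos", 40)]
grant, results = orchestrate(60, domains)
print(grant, results)
```

Because the orchestrator queries before it reserves, the request it sends is one that every domain on the path can accept, which is the point of adding the information servers.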

slide-9
SLIDE 9

Unicorn: An MDMC Resource Orchestration System

[Figure: the same architecture as on the previous slide: Global Resource Orchestrator, Resource Information Servers and Resource Reservation Servers on OpenDaylight and Kytos.]

Design requirement: design component
  • Consistent operation paradigm: Global orchestrator
  • Multi-controller coordination: Servers with unified interfaces
  • Provider autonomy and privacy: ?

How do resource information servers provide accurate resource information while still ensuring the autonomy and privacy of providers?

slide-10
SLIDE 10

Resource Information Server: Related Work

  • All-detail resource graph
– Examples: HTCondor, Mesos, YARN, etc.
– Nodes: computing/storage resources
– Links: networking resources
– Limitations:
    • Reveals all details of resources, compromising the privacy of clusters in the multi-domain setting.
    • Heterogeneity and dynamicity of resource supply lead to high overhead.
  • Alternative design: one-big-switch abstraction
– Example: P4P/ALTO.
– Limitation: cannot reveal the shared bottleneck resources between analytics tasks, leading to resource overloading and slow convergence.

slide-11
SLIDE 11

Resource Information Server: Solution

  • All-detail resource graph: extremely detailed; compromised privacy; high overhead.
  • One-big-switch abstraction: extremely abstract; cannot reveal shared bottleneck resources.

What is the right abstraction for multi-domain science networks?

  • Basic idea: the graph model is a limited way to represent resource availability; mathematical programming, such as linear programming, offers a more general, abstract constraint representation.
  • We refer to this feasible region representation as resource state abstraction (ReSA).

slide-12
SLIDE 12

Resource State Abstraction (ReSA): Example

  • For each link, use a linear constraint to represent the bandwidth sharing among flows that use this link.
  • Geometrically, ReSA is the feasible region of flow rates defined by these linear constraints.
  • However, some constraints are redundant, i.e., the feasible region of flow rates does not change when these constraints are removed.

[Figure: example topology with switches sw1 to sw8, hosts s1, d1, s2, d2, and link labels l1, l6, l7, l12. Each link: 100 Mbps.]

$r_1 \le b_i, \ \forall i \in \{1, 2, 5, 6\}$
$r_2 \le b_i, \ \forall i \in \{7, 8, 11, 12\}$
$r_1 + r_2 \le b_i, \ \forall i \in \{3, 4\}$

[Figure: feasible region of the flow rates, with axes $r_1$ and $r_2$ and a tick at 100 on each axis.]
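As a rough illustration of how the per-link constraints of this example could be generated, the Python sketch below builds one linear constraint per link from the set of flows that traverse it. The routes are an assumption inferred from the index sets in the constraints (flow 1 on links 1 to 6, flow 2 on links 7, 8, 3, 4, 11, 12, in an assumed order); `ROUTES`, `CAPACITY_MBPS` and `link_constraints` are hypothetical names.

```python
# Assumed routes: which links each flow traverses (order is illustrative only).
ROUTES = {
    1: [1, 2, 3, 4, 5, 6],
    2: [7, 8, 3, 4, 11, 12],
}
CAPACITY_MBPS = 100  # "Each link: 100 Mbps"


def link_constraints(routes, capacity):
    """Return {link: (coefficients, capacity)}, meaning sum(coeff * rate) <= capacity.

    The coefficient of a flow is 1 if the flow uses the link and 0 otherwise.
    """
    flows = sorted(routes)
    links = sorted({link for path in routes.values() for link in path})
    return {
        link: (tuple(1 if link in routes[f] else 0 for f in flows), capacity)
        for link in links
    }


constraints = link_constraints(ROUTES, CAPACITY_MBPS)
for link, (coefficients, capacity) in constraints.items():
    print(f"l{link}: {coefficients} . (r1, r2) <= {capacity}")
```

Links 3 and 4 get the coefficient vector `(1, 1)`, i.e. the shared constraint on the sum of both rates, while every other link constrains a single flow.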

slide-13
SLIDE 13

Minimal, Equivalent ReSA: Example

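$r_1 + r_2 \le 100$ Mbps

[Figure: feasible region of the flow rates, with axes $r_1$ and $r_2$ and a tick at 100 on each axis.]

  • Minimal, equivalent ReSA reveals shared bottleneck resources.

[Figure: example topology with switches sw1 to sw8, hosts s1, d1, s2, d2, and link labels l1, l6, l7, l12. Each link: 100 Mbps.]

$r_1 \le b_i, \ \forall i \in \{1, 2, 5, 6\}$
$r_2 \le b_i, \ \forall i \in \{7, 8, 11, 12\}$
$r_1 + r_2 \le b_i, \ \forall i \in \{3, 4\}$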

slide-14
SLIDE 14

ReSA for Multi-Domain, Resource Discovery

  • Accurate, efficient discovery process.
– Two-phase discovery decomposition.
– Path query: for each job, find all the domains it passes through.
– Resource query: ReSA query for all jobs entering the same domain.
  • Minimal information exposure of multiple resource providers.
– Secure multi-party computation.
  • Dynamic update of resource availability.
– Server-side event.

slide-15
SLIDE 15

Minimal Information Exposure of Resource Providers

  • Basic idea: a secure multi-party computational geometry protocol decides the redundancy of each linear inequality using vertex enumeration and halfspace tests.
  • ReSA servers from different domains do not reveal their own sets of linear inequalities to others during the protocol.

[Figure: ReSA Servers A, B, C and D hold the following sets of linear inequalities.]

$\{f_1 + f_2 \le 100\ \text{Gbps}\}$
$\{f_3 \le 200\ \text{Gbps}\}$
$\{f_1 + f_2 + f_3 \le 100\ \text{Gbps}\}$
$\{f_2 + f_3 \le 100\ \text{Gbps}\}$

[Figure, continued: the Global Resource Orchestrator receives $\{f_1 + f_2 + f_3 \le 100\ \text{Gbps}\}$ from one server and $\emptyset$ from each of the other three.]
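The outcome in this example can be checked with a plain halfspace test. The Python sketch below is a non-secure, single-party illustration only: it does not implement the secure multi-party protocol, it assumes flow rates are non-negative, and the vertex list of the region defined by the surviving inequality is written out by hand rather than enumerated. All names are hypothetical.

```python
# Vertices of {f >= 0, f1 + f2 + f3 <= 100}, a simplex, listed by hand (Gbps).
VERTICES = [(0, 0, 0), (100, 0, 0), (0, 100, 0), (0, 0, 100)]

# The three inequalities that the orchestrator does not receive.
OTHER_INEQUALITIES = {
    "f1 + f2 <= 100": ((1, 1, 0), 100),
    "f3 <= 200": ((0, 0, 1), 200),
    "f2 + f3 <= 100": ((0, 1, 1), 100),
}


def satisfies(point, inequality):
    """Halfspace test: does the point lie inside coefficients . x <= bound?"""
    coefficients, bound = inequality
    return sum(c * x for c, x in zip(coefficients, point)) <= bound


def is_redundant(inequality, vertices):
    """An inequality is redundant for a bounded region if every vertex satisfies it."""
    return all(satisfies(vertex, inequality) for vertex in vertices)


redundant = {
    label: is_redundant(inequality, VERTICES)
    for label, inequality in OTHER_INEQUALITIES.items()
}
print(redundant)
```

Every vertex of the region passes all three tests, so the single inequality on the sum of the three flows already describes the whole feasible region, which is consistent with three servers returning the empty set.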

slide-16
SLIDE 16

Putting Pieces Together

Design requirement: design component
  • Multi-domain orchestration: Global orchestrator
  • Multi-controller coordination: Servers with unified interfaces
  • Provider autonomy and privacy: Resource state abstraction

[Figure: jobs arrive at the Global Resource Orchestrator. Step 1: resource discovery queries/responses between the orchestrator and the ReSA Servers. Step 2: reservation requests/results between the orchestrator and the Resource Reservation Servers. The servers run on OpenDaylight and Kytos.]
slide-17
SLIDE 17

Unicorn Implementation

  • Orchestrator: ~2700 LoC of Python
  • ReSA server: ~2500 LoC of Java
  • Resource reservation server:
– Fast Data Transfer (FDT), FireQoS, Open vSwitch, etc.
  • Network controllers: OpenDaylight, Kytos
– Support for ONOS and Ryu is under development

slide-18
SLIDE 18

Evaluation

[Figure: four plots, each comparing the intra-domain resource view with the cross-domain resource view.
  • Number of linear inequalities vs. topology (Arpanet, Aarnet, Chinanet).
  • Compression ratio vs. topology (Arpanet, Aarnet, Chinanet).
  • Number of linear inequalities vs. number of jobs (5, 10, 20, 30).
  • Compression ratio vs. number of jobs (5, 10, 20, 30).]

Please refer to our paper for more details.

slide-19
SLIDE 19

Summary

  • Unicorn: a multi-domain, multi-controller resource orchestration system
– A global orchestrator achieves a consistent operation paradigm.
– Servers with unified interfaces enable multi-controller coordination.
– Resource state abstraction: a set of linear inequalities that represents resource availability, achieving accurate, minimal information exposure of resource providers.

Demonstration at SC17
  • When: 2-3pm on Tuesday and Wednesday
  • Where: booth 663 (Caltech Booth)
  • What: multi-domain resource orchestration across multiple booths and wide area networks.

Contact: Qiao Xiang (qiao.xiang@yale.edu)

slide-20
SLIDE 20

Backup Slides


slide-21
SLIDE 21

Resource State Abstraction (ReSA)

  • Basic idea: the graph model is a limited way to represent resource availability; mathematical programming, such as linear programming, offers a more general, abstract constraint representation.
  • Introduce mathematical programming constraints, representing feasible regions, as a more powerful, flexible representation of resource availability.
  • We refer to this feasible region representation as resource state abstraction (ReSA).

[Formula: objective of App $k$, subject to the resource constraints.]

slide-22
SLIDE 22

One-Big-Switch Abstraction: Example

  • Two flows
– $f_1$: $(s_1, d_1)$
– $f_2$: $(s_2, d_2)$
  • Resource information provided by one-big-switch abstraction
– $r_1 = 100$ Mb/s
– $r_2 = 100$ Mb/s
  • The view provided by one-big-switch abstraction can be ambiguous.

[Figure: hosts s1, d1, s2 and d2.]

slide-23
SLIDE 23

One-Big-Switch Abstraction: Ambiguity

  • No shared resources
  • Shared resources

[Figure: two panels, one per case. Each panel shows the example topology (switches sw1 to sw8; hosts s1, d1, s2, d2; link labels l1, l6, l7, l12; each link: 100 Mbps) next to a plot of the feasible region of the flow rates, with axes $r_1$ and $r_2$ and a tick at 100 on each axis.]

slide-24
SLIDE 24

Minimal, Equivalent ReSA Example - 1

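  • When there is no shared resource among flows, minimal, equivalent ReSA reduces to the one-big-switch abstraction.

[Figure: example topology with switches sw1 to sw8, hosts s1, d1, s2, d2, and link labels l1, l6, l7, l12. Each link: 100 Mbps.]

Original constraints:
$r_1 \le c_i, \ \forall i \in \{1, 2, 3, 4, 5, 6\}$
$r_2 \le c_i, \ \forall i \in \{7, 8, 9, 10, 11, 12\}$

Minimal, equivalent ReSA:
$r_1 \le 100$ Mbps
$r_2 \le 100$ Mbps

[Figure: feasible region of the flow rates, with axes $r_1$ and $r_2$ and a tick at 100 on each axis.]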

slide-25
SLIDE 25

How ReSA Works [2][3]

  • 1. Applications (clients) send a request for a set of flows $F$.
  • 2. The ReSA server collects network information, calculates the minimal, equivalent ReSA $\Pi'(F)$ for this request, and returns it to the client.
  • 3. Applications use $\Pi'(F)$ as constraints to compute the bandwidth requirements for flow set $F$, and send the request to the PCE server.
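Step 3 leaves the application's objective open. As one possible illustration, the Python sketch below assumes the application wants a max-min fair allocation and computes it by progressive filling over the returned constraints. The objective, the function name `max_min_allocation` and the constraint encoding `(coefficients, bound)` are assumptions made here, not part of the original design.

```python
from fractions import Fraction


def max_min_allocation(constraints, num_flows):
    """Progressive filling: raise all unfrozen rates equally until a constraint is tight.

    constraints: list of (coefficients, bound) meaning sum(coeff * rate) <= bound,
    with non-negative coefficients (assumed).
    """
    rates = [Fraction(0)] * num_flows
    active = set(range(num_flows))
    while active:
        steps = []
        for coefficients, bound in constraints:
            weight = sum(coefficients[j] for j in active)
            if weight > 0:
                used = sum(c * r for c, r in zip(coefficients, rates))
                steps.append(Fraction(bound - used) / weight)
        if not steps:
            raise ValueError("some flow is not limited by any constraint")
        step = min(steps)
        for j in active:
            rates[j] += step
        # Freeze every flow that appears in a constraint that is now tight.
        for coefficients, bound in constraints:
            if sum(c * r for c, r in zip(coefficients, rates)) == bound:
                active -= {j for j in active if coefficients[j] > 0}
    return rates


# Minimal, equivalent ReSA of the shared-bottleneck example: r1 + r2 <= 100 (Mbps).
resa_view = [((1, 1), 100)]
# One-big-switch view of the same network: r1 <= 100 and r2 <= 100 (Mbps).
one_big_switch_view = [((1, 0), 100), ((0, 1), 100)]

print(max_min_allocation(resa_view, 2))
print(max_min_allocation(one_big_switch_view, 2))
```

With the ReSA view the two flows split the shared 100 Mbps, whereas the one-big-switch view lets the application ask for 100 Mbps per flow, which the shared links cannot carry.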

[Figure: App 1 sends a set of flows $F$ to the ReSA Server and receives $\Pi'(F)$; it then sends $F$, with the bandwidth requirements computed based on $\Pi'(F)$, to the PCE Server. App 2 is also shown. Each application optimizes its objective subject to $\Pi'(F)$.]

[2] Gao et al., "ORSAP: Abstracting routing state on demand", poster, in IEEE ICNP 2016.
[3] Gao et al., "NOVA: towards on-demand equivalent network view abstraction for network optimization", in IEEE/ACM IWQoS 2017.

slide-26
SLIDE 26

Accurate, Efficient Resource Discovery

  • Basic idea: two-phase discovery decomposition into path query and resource query.
  • Path query: for each job, find all the domains it passes through (i.e., its domain-path).
– The outcome is equivalent to the set of all jobs entering each domain.
– Lemma: path query requires no more information exposure from each domain than current inter-domain routing protocols, e.g., BGP.
  • Resource query: for each domain, find its accurate resource availability for all the flows that enter this domain (i.e., ReSA).

slide-27
SLIDE 27

Path Query

  • Existing multi-domain routing protocols, e.g., BGP, provide the information to construct the domain path.
  • Lemma: path query requires no more information exposure from each domain than current inter-domain routing protocols, e.g., BGP.
– Proof: for any flow, the ingress point of each site it passes must be known by the last-hop site in order to forward the flow.

slide-28
SLIDE 28

Path Query Example

[Figure: an application exchanges messages 1 to 6 with Controller A, Controller B and Controller C, which manage Domain A, Domain B and Domain C. Address labels: 1.2.3.4, 5.6.7.8, 10.0.0.1, 10.0.2.3.]

  • 1. pQuery([{1.2.3.4, 5.6.7.8}], null) // null means site A is where the source resides.
  • 2. pResponse: [10.0.0.1]
  • 3. pQuery([{1.2.3.4, 5.6.7.8}], 10.0.0.1)
  • 4. pResponse: [10.0.2.3]
  • 5. pQuery([{1.2.3.4, 5.6.7.8}], 10.0.2.3)
  • 6. pResponse: [null] // reaches the destination site.
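The six messages of this example can be replayed with a small Python simulation. It is only a sketch: the lookup tables stand in for each controller's routing knowledge, the mapping from an ingress address to the domain that owns it is assumed to be known to the application, and `p_query`, `domain_path`, `NEXT_INGRESS` and `INGRESS_OWNER` are hypothetical names (`None` plays the role of `null`).

```python
# Hypothetical routing knowledge: (domain, ingress seen by that domain) -> next ingress.
NEXT_INGRESS = {
    ("A", None): "10.0.0.1",        # messages 1 and 2
    ("B", "10.0.0.1"): "10.0.2.3",  # messages 3 and 4
    ("C", "10.0.2.3"): None,        # messages 5 and 6: destination site reached
}
# Assumed to be known by the application: which domain owns each ingress point.
INGRESS_OWNER = {"10.0.0.1": "B", "10.0.2.3": "C"}


def p_query(domain, flows, ingress):
    """Simulated pQuery: one next-hop ingress (or None) per queried flow."""
    return [NEXT_INGRESS[(domain, ingress)] for _ in flows]


def domain_path(flow, source_domain):
    """Follow pResponses from the source domain until a response is None."""
    path, domain, ingress = [], source_domain, None
    while domain is not None:
        path.append(domain)
        (ingress,) = p_query(domain, [flow], ingress)
        domain = INGRESS_OWNER.get(ingress)
    return path


flow = ("1.2.3.4", "5.6.7.8")
print(domain_path(flow, "A"))
```

Each domain only reveals the ingress point of the next site, which is the information the lemma argues is already exposed by inter-domain routing.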

slide-29
SLIDE 29

Schedulability of ReSA View

  • Proposition: when the view represented by ReSA satisfies one of the following conditions:
– resources represented in the original set of constraints $C$ can be fully controlled and reserved on the edge, i.e., all the attributes of each resource can be reserved at end hosts;
– all the attributes computed in $C'$, the minimal, equivalent ReSA, can be fully controlled on the edge;
the ReSA view provides full schedulability of resources to a logically centralized resource orchestrator.

Question: what if resources are not controlled at end hosts, e.g., networking resources are controlled by TCP congestion control mechanisms?

slide-30
SLIDE 30

Minimal, Equivalent ReSA

  • Equivalence: two sets of linear constraints $\Pi$ and $\Pi'$ are equivalent if their corresponding feasible regions are identical.
  • Minimal, Equivalent ReSA Problem: given the raw resource state represented by a set of linear constraints $\Pi: \{\mathbf{A}\mathbf{x} \le \mathbf{b}\}$, find $\Pi'$, the minimal subset of $\Pi$ that is equivalent to $\Pi$.
  • MECS Algorithm:
– Iteratively select $c: a^T \mathbf{x} \le b \in \Pi$ and solve $y = \max a^T \mathbf{x}$, s.t. $\Pi \setminus \{c\}$.
– If $b < y$, put $c$ into $\Pi'$.
– A polynomial-time algorithm with proved optimality.
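The redundancy test behind MECS can be sketched in pure Python for small instances. The code below is an illustration under stated assumptions, not the authors' implementation: it assumes non-negative flow rates and non-negative coefficients, replaces the LP solver with brute-force vertex enumeration (exponential, so only suitable for toy examples), and removes a redundant constraint from the working set immediately so that exact duplicates do not eliminate each other. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def solve_square(rows, rhs):
    """Solve a square linear system by Gauss-Jordan elimination; None if singular."""
    n = len(rows)
    m = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(rows, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]


def maximize(objective, constraints, n):
    """Max of objective . x over {x >= 0, a . x <= b}; None if unbounded."""
    # With non-negative coefficients, the maximum is unbounded exactly when some
    # variable in the objective appears in no remaining constraint.
    for j in range(n):
        if objective[j] > 0 and all(a[j] == 0 for a, _ in constraints):
            return None
    halfspaces = list(constraints) + [
        (tuple(-1 if k == j else 0 for k in range(n)), 0) for j in range(n)
    ]
    best = None
    for subset in combinations(halfspaces, n):
        point = solve_square([a for a, _ in subset], [b for _, b in subset])
        if point is None:
            continue
        if all(sum(c * x for c, x in zip(a, point)) <= b for a, b in halfspaces):
            value = sum(c * x for c, x in zip(objective, point))
            best = value if best is None else max(best, value)
    return best


def minimal_equivalent(constraints, n):
    """Keep a constraint only if dropping it would enlarge the feasible region."""
    kept = list(constraints)
    i = 0
    while i < len(kept):
        a, b = kept[i]
        others = kept[:i] + kept[i + 1:]
        bound = maximize(a, others, n)
        if bound is not None and bound <= b:
            kept = others          # redundant: the rest already implies it
        else:
            i += 1                 # b < y (or y unbounded): keep it
    return kept


# Shared-bottleneck example: four links carry r1, four carry r2, two carry both.
raw = [((1, 0), 100)] * 4 + [((0, 1), 100)] * 4 + [((1, 1), 100)] * 2
print(minimal_equivalent(raw, 2))
```

On the ten raw constraints of the shared-bottleneck example, only one copy of the constraint on the sum of the two rates survives.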

slide-31
SLIDE 31

Unicorn Architecture

[Figure: Unicorn architecture. Jobs arrive at the Global Resource Orchestrator. Step 1: resource discovery queries/responses between the orchestrator and the ReSA Servers. Step 2: reservation requests/results between the orchestrator and the Resource Reservation Servers.]

slide-32
SLIDE 32

Resource Information Server: Related Work

  • All-detail resource graph
– Examples: HTCondor, Mesos, YARN, etc.
– Nodes: computing/storage resources
– Links: networking resources
  • Limitations
– Reveals all details of resources to applications, compromising the privacy of resource providers.
– Heterogeneity and dynamicity of resource supply lead to high overhead.

slide-33
SLIDE 33

Resource Information Server: Related Work

  • Alternative design: one-big-switch abstraction
– Example: P4P/ALTO [1].
– Combines the objectives of applications and providers.
– Decouples the decision variables of applications and providers in the constraints using primal-dual decomposition.
– The interface between applications and providers is the dual variables.
  • Limitation: cannot reveal the shared bottleneck resources between analytics tasks.
– An iterative approach that takes time to converge.
– Cannot prevent applications from overloading the resources.

[1] Xie et al., "P4P: Provider portal for applications", in SIGCOMM 2008.