

SLIDE 1

Multi-Cell Mobile Edge Coded Computing: Trading Communication and Computing for Distributed Matrix Multiplication

ISIT, June 21-26, 2020

SLIDE 2

Emerging Mobile Applications

◆ Computation-intensive ◆ Delay-sensitive

SLIDE 3

Mobile Edge Computing (MEC)

Provides IT and cloud-computing capabilities within the Radio Access Network (RAN) in close proximity to mobile subscribers [ETSI'14]

(Figure: network path from mobile users through the RAN and gateway to the CDN and web data center, with MEC servers at the network edge)

Improves user experience: ◆ Saves energy ◆ Reduces latency

[ETSI'14] "Mobile-edge computing—Introductory technical white paper," White Paper, ETSI, Sophia Antipolis, France, Sep. 2014.

SLIDE 4

Task Offloading Procedure

Three steps: input data uploading → distributed edge computing → output data downloading

(Figure: computation timeline for User 1, ..., User i, ..., User M across EN 1, ..., EN k, ..., EN K, showing uplink, computation, and downlink phases)

Challenges: ◆ Severe interference or deep fading ◆ Random server computing times, i.e., stragglers ◆ End-to-end times are significantly prolonged

SLIDE 5

Our Approach

Exploit computation replication and coded computing: investing more time in any one of the three task-offloading steps can reduce the time needed for the subsequent steps.

◆ Consider matrix multiplication in a linear inference task: compute V = AU ◆ Assign the input vectors U from users to multiple ENs ◆ Encode model A by hybrid MDS and repetition codes

Repeated assignment of U → creates spatial redundancy → enables transmission cooperation → mitigates interference
MDS-repetition coding for A → reduces the recovery threshold → overcomes stragglers

— U: users' input vectors, A: network-stored model, V: desired output vectors

Result: tradeoffs among upload, computing, and download latencies

SLIDE 6

Related Works

[Zhang'19] utilizes MDS-repetition codes:
◆ Assumes input vectors from all users are available at all ENs ◆ Proposes a computing-downloading strategy

[Li'20] exploits computation replication:
◆ Assumes computing times of ENs are deterministic; adopts a general task model ◆ Characterizes an upload-download latency tradeoff

Our work:
◆ Proposes a joint task assignment, upload, computing, and download policy ◆ Studies tradeoffs among upload, computing, and download latencies ◆ Converse: our policy is approximately optimal for sufficiently large upload times

[Zhang'19] J. Zhang and O. Simeone, "On model coding for distributed inference and transmission in mobile edge computing systems," IEEE Commun. Lett., vol. 23, no. 6, pp. 1065–1068, Jun. 2019.
[Li'20] K. Li, M. Tao, and Z. Chen, "Exploiting computation replication for mobile edge computing: A fundamental computation-communication tradeoff study," IEEE Trans. Wireless Commun., 2020.

SLIDE 7

MEC Network Model

(Figure: User 1, ..., User i, ..., User M connected to the ENs over uplink and downlink channels; each EN holds a stored encoded model, each user holds input data and desires output data)

◆ Each user i has N input vectors and desires N output vectors
◆ Each EN stores μm rows of the model A with dimension m × n
◆ Computing a row-vector product takes a certain time at each EN; the computing time of EN k depends on the set of input vectors from all users assigned to it, and is random across ENs, giving rise to stragglers

SLIDE 8

Performance Metric

◆ Repetition order r: the average number of ENs that are assigned the same input vector (defined as an average to avoid rounding complications)
◆ Recovery order q: the number of non-straggling ENs that return outputs; the remaining K − q ENs are stragglers
◆ Normalized uploading time (NULT), normalized computing time (NCT), and normalized downloading time (NDLT): the upload, computing, and download latencies, each normalized by a reference time
◆ Feasible region of (r, q): any q returning ENs must store enough information of A for computing the desired outputs
◆ For a given NULT, the compute-download latency region is the set of achievable (NCT, NDLT) pairs

SLIDE 9

Fundamental Question

Given an upload latency, what is the optimal tradeoff region between the computing and download latencies?

◆ Characterize inner and outer bounds on the compute-download latency region at any given upload latency
◆ Present tradeoffs among upload, computing, and download latencies

(Figure: outer and inner bounds of the compute-download latency region, NDLT (τ_d) versus NCT (τ_c), at a given upload latency for M = K = 10)

SLIDE 10

Example: Task Assignment & Upload

Setting: M = 5 users, K = 5 ENs, N = 5 input vectors per user, m = 40 row vectors, μ = 3/5, (r, q) = (4, 3)

◆ Each user divides its input vectors into C(5,4) = 5 subsets, each containing 1 input vector and assigned to a distinct subset of 4 ENs
◆ Uplink: a 5-transmitter, 5-receiver X-multicast channel with multicast group size 4, handled by interference alignment
◆ The optimal per-receiver DoF gives an approximated transmission rate, which yields the upload time and hence the NULT

(Figure: assignment of the input vectors u_{i,1}, ..., u_{i,5}, i = 1, ..., 5, across EN 1–EN 5; each EN receives 4 of the 5 subsets)

Since q = 3, any 2 ENs may be stragglers in the edge computing phase.
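The assignment above can be sanity-checked with a short sketch (stdlib only; the variable names are illustrative, not from the paper). It enumerates the C(5,4) = 5 groups of ENs and confirms that each EN ends up with 4 of each user's 5 input vectors, i.e., repetition order r = 4.

```python
from itertools import combinations

K = 5          # edge nodes
N = 5          # input vectors per user
r = 4          # repetition order: each input vector goes to r ENs

# Each user splits its N = C(K, r) = 5 input vectors into 5 subsets;
# subset j is uploaded to a distinct group of r = 4 ENs.
en_groups = list(combinations(range(K), r))   # C(5, 4) = 5 groups
assert len(en_groups) == N

# inputs_at[k] = indices of this user's input vectors held by EN k
inputs_at = {k: [j for j, g in enumerate(en_groups) if k in g] for k in range(K)}

# Every EN holds exactly r = 4 of the 5 input vectors ...
assert all(len(v) == r for v in inputs_at.values())
# ... and each input vector is replicated at exactly r = 4 ENs.
assert all(sum(j in inputs_at[k] for k in range(K)) == r for j in range(N))

print(inputs_at)
```

With 5 users this gives the 20 input vectors per EN that feed the 24 × 20 row-vector products on the next slide.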

SLIDE 11

Example: Coding & Edge Computing

Hybrid MDS-repetition codes:
◆ Coding rates: the MDS code rate and the repetition code rate are chosen to satisfy the storage constraint and the recovery condition, taking the maximum feasible value
◆ Encode A into Ac = [a_1, ..., a_60] with a (60, 40) MDS code, then split Ac into 10 submatrices, each with 6 rows and replicated at 2 ENs; each EN selects 24 rows to store

(Figure: placement of the coded rows a_1, ..., a_60 across EN 1–EN 5 under the inputs assignment at r = 4; each 6-row submatrix appears at two ENs)

Edge computing (q = 3):
◆ Each EN computes 24 × 20 row-vector products
◆ Waiting for the fastest 3 ENs determines the NCT; any 2 ENs may be stragglers
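The recovery condition in this example can be checked combinatorially. The sketch below (stdlib only) assumes, as the slide's figure suggests, that each of the 10 six-row submatrices is replicated at a distinct pair of ENs, and verifies that any q = 3 ENs jointly hold at least 40 distinct coded rows, which by the MDS property of the (60, 40) code suffices to recover A.

```python
from itertools import combinations

K, q = 5, 3                      # ENs; the fastest q must suffice
m, coded_rows = 40, 60           # A has 40 rows; (60, 40) MDS code
sub_rows = 6                     # rows per submatrix
n_sub = coded_rows // sub_rows   # 10 submatrices

# Assumed placement: submatrix s is replicated at the s-th pair of ENs
# (there are exactly C(5, 2) = 10 pairs, one per submatrix).
pairs = list(combinations(range(K), 2))
assert len(pairs) == n_sub

# subs_at[k] = submatrices stored at EN k
subs_at = {k: {s for s, p in enumerate(pairs) if k in p} for k in range(K)}

# Storage check: each EN stores 4 submatrices = 24 rows = mu*m (mu = 3/5)
assert all(len(subs_at[k]) * sub_rows == 24 for k in range(K))

# Recovery check: any q = 3 ENs cover >= 40 distinct coded rows, and any
# 40 rows of the (60, 40) MDS-coded Ac determine all 40 rows of A.
for ens in combinations(range(K), q):
    distinct = set().union(*(subs_at[k] for k in ens))
    assert len(distinct) * sub_rows >= m

print("any", q, "ENs recover A")
```

Any triple of ENs in fact covers exactly 9 of the 10 submatrices (54 rows), so the threshold of 40 rows is met with slack.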

SLIDE 12

Example: Output Data Download

◆ Divide the needed outputs into multiple groups; different groups are transmitted using TDMA ◆ The downlink channel for transmitting the outputs in each group is a cooperative X channel

Computation results replicated at two of the returning ENs:
◆ 30 outputs — 2-transmitter, 5-receiver MISO broadcast channel — optimal per-receiver DoF: 2/5 by zero-forcing (ZF) precoding

(Figure: inputs assignment at r = 4; A encoded by a (60, 40) MDS code into Ac = [a_1, ..., a_60]; MISO broadcast channel in the downlink)

SLIDE 13

Example: Output Data Download

◆ Divide the needed outputs into multiple groups; different groups are transmitted using TDMA ◆ The downlink channel for each group is a cooperative X channel

Computation results of the first group:
◆ 30 outputs — 2-transmitter, 5-receiver MISO broadcast channel — optimal per-receiver DoF: 2/5 by zero-forcing (ZF) precoding
◆ The remaining 34 × 5 = 170 needed outputs — 2-transmitter, 5-receiver X channel — optimal per-receiver DoF: 1/3 by asymptotic interference alignment (IA)
◆ NDLT: 3/40 + 51/100 = 117/200

(Figure: inputs assignment at r = 4; A encoded by a (60, 40) MDS code into Ac = [a_1, ..., a_60]; MISO broadcast channel and X channel in the downlink)

SLIDE 14

Example: Output Data Download

Computation results of two further groups, delivered in the same way:
◆ NDLT: 117/200 × 2 = 117/100

(Figure: inputs assignment at r = 4; A encoded by a (60, 40) MDS code into Ac = [a_1, ..., a_60]; MISO broadcast channel and X channel in the downlink)

SLIDE 15

Example: Output Data Download

Computation results of the last group:
◆ 3-transmitter, 5-receiver cooperative X channel with cooperation group size 2 ◆ 3-transmitter, 5-receiver X channel ◆ NDLT: (21/100 + 77/300) × 2 = 14/15

Total NDLT: 14/15 + (117/200) × 3 = 1613/600

(Figure: inputs assignment at r = 4; A encoded by a (60, 40) MDS code into Ac = [a_1, ..., a_60]; cooperative X channel and X channel in the downlink)

Note that, in the last round, the remaining number of outputs can be regarded as dividing evenly under a mild condition on the parameters.
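The per-group NDLT terms above are not fully shown on the slides, but the stated sums can be verified with exact rational arithmetic:

```python
from fractions import Fraction as F

# Slide 13: first group, ZF phase (3/40) plus IA phase (51/100)
assert F(3, 40) + F(51, 100) == F(117, 200)

# Slide 15: last group, (21/100 + 77/300) * 2
assert (F(21, 100) + F(77, 300)) * 2 == F(14, 15)

# Slide 15: total NDLT = last group + 3 groups at 117/200 each
total = F(14, 15) + F(117, 200) * 3
assert total == F(1613, 600)

print(total)  # 1613/600
```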

SLIDE 16

Achievable Results

At a pair (r, q) in the feasible region, the achievable NULT, NCT, and NDLT are characterized in closed form, with auxiliary parameters determined by r, q, and the system parameters.

For a given NULT, an inner bound of the compute-download latency region is obtained by time- and memory-sharing over all feasible values of q.

SLIDE 17

Special Cases

◆ r = K with only ZF precoding in the downlink: consistent with the normalized communication delay in [Zhang'19, Eq. (13)]
◆ q = K and μ = 1: recovers the communication load in [Li'17, Remark 5] and the NDT with cache-aided EN cooperation in [Sengupta'17, Eq. (25)]

[Zhang'19] J. Zhang and O. Simeone, "On model coding for distributed inference and transmission in mobile edge computing systems," IEEE Commun. Lett., vol. 23, no. 6, pp. 1065–1068, Jun. 2019.
[Li'17] S. Li, M. A. Maddah-Ali, and A. S. Avestimehr, "Communication-aware computing for edge processing," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Jun. 2017, pp. 2885–2889.
[Sengupta'17] A. Sengupta, R. Tandon, and O. Simeone, "Fog-aided wireless networks for content delivery: Fundamental latency tradeoffs," IEEE Trans. Inf. Theory, vol. 63, no. 10, pp. 6650–6678, Oct. 2017.

SLIDE 18

Converse

At a pair (r, q) in the feasible region, lower bounds on the NULT, NCT, and NDLT yield, for a given NULT, an outer bound on the compute-download latency region.

Optimality:
◆ The NULT is optimal
◆ The NDLT and NCT are within constant multiplicative gaps of their respective lower bounds for sufficiently large NULT

SLIDE 19

Numerical Results: Inner and Outer Bounds

Setting: M = K = 10, μ = 3/5

◆ At the maximum q (q = 10), the hybrid MDS-repetition code degrades to a pure repetition code
◆ As q increases, the NDLT is reduced at the expense of an increasing NCT
◆ Allowing a longer upload time (larger NULT) enlarges the compute-download latency region

SLIDE 20

Conclusions

◆ Propose a policy based on hybrid coded computing and on coordinated and cooperative interference management
◆ Characterize the tradeoff region between the computing and download latencies at a given upload latency, and provide the converse result
◆ Reveal the tradeoffs among upload, computing, and download latencies: increasing the upload latency can reduce both the computing and download latencies, and increasing the computing latency can reduce the download latency

Full version: https://arxiv.org/abs/2004.14170

SLIDE 21

Thank you!