Multi-Cell Mobile Edge Coded Computing: Trading Communication and Computing for Distributed Matrix Multiplication
ISIT, June 21-26, 2020

Emerging Mobile Applications
◼ Computation-intensive
◼ Delay-sensitive
Mobile Edge Computing (MEC)
◼ Provides IT and cloud-computing capabilities within the Radio Access Network (RAN) in close proximity to mobile subscribers [ETSI'14]
◼ Promotes user experience:
◆ Save energy ◆ Reduce latency
[Figure: mobile users served at the network edge rather than through the gateway, CDN, and web data center]
[ETSI'14] "Mobile-edge computing: Introductory technical white paper," White Paper, ETSI, Sophia Antipolis, France, Sep. 2014.
Task offloading procedure
◼ Input data uploading
◼ Distributed edge computing
◼ Output data downloading
[Figure: computation timeline of users 1, ..., M over ENs 1, ..., K: uplink, computation, and downlink phases]
Challenges:
◆ Severe interference or deep fading ◆ Random server computing times, i.e., stragglers ◆ End-to-end times are significantly prolonged
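The straggler effect can be illustrated with a small simulation. The exponential computing times and the specific K and q values below are illustrative assumptions, not the paper's model; the point is only that finishing once the fastest q of K servers are done is much faster than waiting for all K.

```python
import random

random.seed(0)
K, q, trials = 5, 3, 10_000   # servers, recovery order, Monte Carlo trials

wait_q = 0.0     # average time until the fastest q servers finish
wait_all = 0.0   # average time until every server finishes
for _ in range(trials):
    times = sorted(random.expovariate(1.0) for _ in range(K))
    wait_q += times[q - 1]     # q-th fastest finishing time
    wait_all += times[K - 1]   # slowest (straggler-dominated) finishing time
wait_q /= trials
wait_all /= trials
```

For Exp(1) times, the expected q-th order statistic of K is the partial harmonic sum 1/K + ... + 1/(K-q+1), so waiting for 3 of 5 costs about 0.78 time units versus about 2.28 for all 5.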
Our Approach
◼ Exploit computation replication and coded computing
◼ Investing more time in any one of the three task offloading steps can reduce the time needed for subsequent steps
◆ Consider matrix multiplication in a linear inference task ◆ Assign the input vectors U from users to multiple ENs ◆ Encode the model A by hybrid MDS and repetition codes
◆ Repeated assignment of U: creates spatial redundancy, enables transmission cooperation, mitigates interference ◆ MDS-Repetition coding for A: reduces the recovery threshold, overcomes stragglers
(U: users' input vectors; A: network-stored model; V: desired output vectors)
◼ Tradeoffs among upload, computing, and download latencies
Related Works
◼ [Zhang'19] utilizes MDS-Repetition codes
◆ Assumes input vectors from all users are available at all ENs ◆ Proposes a computing-downloading strategy
◼ [Li'20] exploits computation replication
◆ Assumes computing times of ENs are deterministic and adopts a general task model ◆ Characterizes an upload-download latency tradeoff
◼ Our work
◆ Proposes a joint task assignment, upload, computing, and download policy ◆ Studies tradeoffs among upload, computing, and download latencies ◆ Converse: our policy is approximately optimal for sufficiently large upload times

[Zhang'19] J. Zhang and O. Simeone, "On model coding for distributed inference and transmission in mobile edge computing systems," IEEE Commun. Letters, vol. 23, no. 6, pp. 1065-1068, Jun. 2019.
[Li'20] K. Li, M. Tao, and Z. Chen, "Exploiting computation replication for mobile edge computing: A fundamental computation-communication tradeoff study," IEEE Trans. Wireless Commun., 2020.
MEC Network Model
[Figure: M users connect over the wireless uplink and downlink to K ENs; the ENs store the encoded model, and the users hold the input data and desire the outputs]
◼ Each user i has N input vectors and desires N output vectors
◼ Each EN stores μm rows of the m×n model A
◼ Each row-vector product takes a given time to compute
◼ Each EN k is assigned a set of input vectors from all users
◼ The computing time of EN k is determined by its per-product time and the number of row-vector products it performs
Performance Metric
◼ Repetition order r: the average number of ENs that are assigned the same input vector
◼ Recovery order q: the number of non-straggling ENs needed to return outputs; the remaining K-q ENs are stragglers
◼ Normalized uploading time (NULT): the upload latency normalized by a reference time
◼ Normalized computing time (NCT): the computing latency normalized by a reference time
◼ Normalized downloading time (NDLT): the download latency normalized by a reference time
◆ Feasible (r, q) region: any q ENs must together store enough information of A for computing the outputs (parameters are chosen to avoid rounding complications)
◼ For a given NULT, the compute-download latency region is the set of achievable (NCT, NDLT) pairs
Fundamental Question
◼ Given an upload latency, what is the optimal tradeoff region between computing and download latencies?
◆ Characterize the inner and outer bounds on the compute-download latency region at any given upload latency ◆ Present tradeoffs among upload, computing, and download latencies
[Figure: outer and inner bounds on the compute-download latency region (NCT vs. NDLT) for M = K = 10]
Example: Task Assignment & Upload
◼ M = 5 users, K = 5 ENs, N = 5 input vectors, m = 40 row vectors, μ = 3/5, (r, q) = (4, 3)
◼ Each user divides its input vectors into C(5, 4) = 5 subsets, each containing 1 input vector, and each subset is assigned to a distinct subset of 4 ENs
◼ Uplink: 5-transmitter 5-receiver X-multicast channel with multicast group size 4
◆ Interference alignment achieves the optimal per-receiver DoF ◆ The approximated transmission rate and upload time give the NULT
[Figure: each EN k receives 4 of the 5 subsets u_{i,1}, ..., u_{i,5} from every user i = 1, ..., 5]
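The repeated subset-to-EN assignment in this example can be sketched directly with itertools. The data structures are hypothetical, but the counts match the example: each subset reaches r = 4 ENs, and each EN receives 4 subsets per user, i.e., 20 input vectors in total.

```python
from itertools import combinations

M, K, r = 5, 5, 4   # users, edge nodes, repetition order

# Each user splits its 5 inputs into C(5, 4) = 5 single-vector subsets;
# subset j of user i is uploaded to the j-th group of r = 4 ENs.
en_groups = list(combinations(range(K), r))    # the 5 possible groups of 4 ENs

assignment = {k: set() for k in range(K)}      # EN k -> assigned (user, subset) pairs
for i in range(M):
    for j, group in enumerate(en_groups):
        for k in group:
            assignment[k].add((i, j))
```

Since EN k belongs to C(4, 3) = 4 of the 5 groups, it ends up with 4 subsets from each of the 5 users, which is why each EN later computes products against 20 input vectors.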
Example: Coding & Edge Computing
◼ Hybrid MDS-Repetition codes
◆ Coding rates: the MDS code rate and repetition code rate are chosen as the maximum rates satisfying the storage constraint and the recovery condition
◆ Encode A into Ac with 60 rows via a (60, 40) MDS code, then split Ac into 10 submatrices, each with 6 rows and replicated at 2 ENs
◼ Edge computing (q = 3)
◆ Any 2 ENs may be stragglers in the edge computing phase ◆ Each EN computes 24×20 row-vector products ◆ Waiting for the fastest 3 ENs gives the NCT
[Figure: the (60, 40) MDS code maps A into Ac = [a_1, ..., a_60]; the MDS code rate and repetition code rate determine which 6-row submatrices of Ac are selected for storage at each EN, shown together with the input assignment at r = 4]
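The coding and straggler-recovery steps can be sketched as follows. A random generator matrix stands in for the (60, 40) MDS code (any 40 of its 60 rows are linearly independent with probability 1), and the pairwise submatrix placement is one balanced choice consistent with each EN storing 24 = μm rows; the paper's exact code, placement, and decoder may differ.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
m, n, coded, sub = 40, 8, 60, 6   # model rows, model width, coded rows, rows per submatrix
K, q = 5, 3                       # ENs, recovery order

# Random-linear stand-in for the (60, 40) MDS code.
A = rng.standard_normal((m, n))
G = rng.standard_normal((coded, m))
Ac = G @ A                        # 60 coded rows a_1, ..., a_60

# Assumed balanced placement: index the 10 six-row submatrices by EN pairs,
# so each submatrix is replicated at 2 ENs and each EN stores 24 rows.
stored = {k: [] for k in range(K)}
for s, (k1, k2) in enumerate(combinations(range(K), 2)):
    rows = list(range(s * sub, (s + 1) * sub))
    stored[k1] += rows
    stored[k2] += rows

# Edge computing with q = 3: ENs 3 and 4 straggle; the survivors return
# their coded row-vector products for one input vector u.
u = rng.standard_normal(n)
survivors = [0, 1, 2]
avail = sorted(set().union(*(stored[k] for k in survivors)))
coded_products = Ac[avail] @ u

# The survivors jointly hold more than 40 distinct coded rows, so A @ u is
# recovered by solving the overdetermined system G[avail] x = coded_products.
recovered, *_ = np.linalg.lstsq(G[avail], coded_products, rcond=None)
```

With this placement, any 3 ENs jointly hold 9 of the 10 submatrices (54 coded rows), comfortably above the recovery threshold of 40.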
Example: Output Data Download
◼ Any 2 ENs may be stragglers
◼ Divide the needed outputs into multiple groups
◆ Different groups are transmitted using TDMA ◆ The downlink channel for transmitting the outputs in each group is a cooperative X channel
◼ Computation results of the first group:
◆ 30 outputs
- 2-transmitter 5-receiver MISO broadcast channel - optimal per-receiver DoF: 2/5 by zero-forcing (ZF) precoding
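The ZF step can be sketched as a toy numpy example, assuming the two ENs replicating a group's outputs act jointly as one 2-antenna transmitter, and ignoring noise and power normalization:

```python
import numpy as np

rng = np.random.default_rng(1)
M_rx, n_tx = 5, 2   # 5 receiving users; 2 cooperating ENs form a 2-antenna transmitter

# One TDMA slot of the ZF scheme: the pair of scheduled receivers is served
# interference-free by inverting their 2 x 2 channel. Cycling through pairs
# serves each receiver in 2 of every 5 slots, i.e., per-receiver DoF 2/5.
H = rng.standard_normal((M_rx, n_tx)) + 1j * rng.standard_normal((M_rx, n_tx))
pair = [0, 3]                      # receivers scheduled in this slot
W = np.linalg.inv(H[pair])        # channel-inversion (ZF) precoder
s = rng.standard_normal(2) + 1j * rng.standard_normal(2)  # their two output symbols
y = H @ (W @ s)                   # noiseless received signals at all 5 users
```

The scheduled receivers observe exactly their intended symbols (H[pair] @ W is the identity); the other three users are simply served in later slots of the cycle.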
◆ The remaining 34×5 = 170 needed outputs:
- 2-transmitter 5-receiver X channel - optimal per-receiver DoF: 1/3 by asymptotic interference alignment (IA)
◆ NDLT of this group: 3/40 + 51/100 = 117/200
◼ Computation results of the next two groups (each symmetric to the first):
◆ NDLT: 117/200 × 2 = 117/100
◼ Computation results of the last group:
◆ 3-transmitter 5-receiver cooperative X channel with cooperation group size 2 ◆ 3-transmitter 5-receiver X channel ◆ NDLT: (21/100 + 77/300) × 2 = 14/15
◼ Total NDLT: 14/15 + (117/200) × 3 = 1613/600
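The NDLT bookkeeping in this example can be checked with exact rational arithmetic using Python's fractions module:

```python
from fractions import Fraction as F

# Per-group NDLTs from the example.
first = F(3, 40) + F(51, 100)          # ZF (MISO BC) part + IA (X channel) part
middle = F(117, 200) * 2               # the next two groups, symmetric to the first
last = (F(21, 100) + F(77, 300)) * 2   # cooperative X channel + X channel rounds

total = first + middle + last          # total NDLT of the download phase
```

This confirms first = 117/200, middle = 117/100, last = 14/15, and total = 1613/600.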
Achievable Results
◼ At a given pair (r, q), the proposed policy yields closed-form NULT, NCT, and NDLT expressions
◆ The number of outputs remaining in the last download round can be regarded as evenly divisible under the chosen parameters
◼ For a given NULT, the inner bound of the compute-download latency region follows by time- and memory-sharing over all feasible values of q
Special Cases
◼ r = K and using only ZF precoding in the downlink:
◆ Consistent with the normalized communication delay in [Zhang'19, Eq. (13)]
◼ q = K and μ = 1:
◆ Recovers the communication load in [Songze'17, Remark 5] and the NDT with cache-aided EN cooperation in [Sengupta'17, Eq. (25)]

[Songze'17] S. Li, M. A. Maddah-Ali, and A. S. Avestimehr, "Communication-aware computing for edge processing," in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Jun. 2017, pp. 2885-2889.
[Sengupta'17] A. Sengupta, R. Tandon, and O. Simeone, "Fog-aided wireless networks for content delivery: Fundamental latency tradeoffs," IEEE Trans. Inf. Theory, vol. 63, no. 10, pp. 6650-6678, 2017.
Converse
◼ At a given pair (r, q), lower bounds on the NULT, NCT, and NDLT are derived
◼ For a given NULT, these yield the outer bound of the compute-download latency region
◼ Optimality:
◆ The NULT is optimal ◆ The NDLT and NCT are within constant multiplicative gaps of their respective lower bounds for sufficiently large NULT
Numerical Results: Inner and Outer Bounds
◼ M = K = 10, μ = 3/5
◆ At the maximum q (q = 10), the hybrid MDS-repetition code degrades to a pure repetition code ◆ As q increases, the NDLT is reduced at the expense of an increasing NCT ◆ Allowing a longer upload time (increasing the NULT) enlarges the compute-download latency region
Conclusions
◼ Propose a policy based on hybrid coded computing and on coordinated and cooperative interference management
◼ Characterize the tradeoff region between computing and download latencies at a given upload latency
◼ Reveal the tradeoffs among upload, computing, and download latencies
◆ Increasing the upload latency can reduce both the computing and download latencies ◆ Increasing the computing latency can reduce the download latency
◼ Provide the converse result
◼ Full version: https://arxiv.org/abs/2004.14170

Thank you!