Kube-Knots: Resource Harvesting through Dynamic Container Orchestration in GPU-based Datacenters
Prashanth Thinakaran, Jashwant Raj Gunasekaran, Bikash Sharma, Chita Das, Mahmut Kandemir
September 25th, IEEE CLUSTER’19
1 https://openai.com/blog/ai-and-compute/
2 Schwartz, Roy, et al. "Green AI." arXiv preprint arXiv:1907.10597 (2019)
Pre GPU
Sub-PF GPU Training
Algorithmic Parallelism & TPUs
Most contributions have focused on improving accuracy, not resource efficiency!
Kube-Knots focuses on Green AI (Efficiency) instead of Red AI (Accuracy)
Predictable load over time
Tightly correlated metrics
No solid leads
[Figure: % GPU Memory Used (20 to 100) vs. Inference Batch Sizes (1 to 128); series: TF face imc key ner pos chk]
[Figure panels: App-Mix-1, App-Mix-2, App-Mix-3]
compared to GPU-Agnostic scheduler
to active GPUs.
[Figure panels: App-Mix-1, App-Mix-2, App-Mix-3]
“Workload Setup Docker TensorFlow / HPC experiments used in evaluation of kube-knots,” https://hub.docker.com/r/prashanth5192/gpu
Uniform
  Kubernetes default scheduler
  GPUs cannot be shared
  Low PPW and no QoS guarantees
Resource Agnostic Sharing
  First Fit Decreasing bin-packing
  High PPW
  Poor QoS and high queueing delays
Correlation Based Provisioning
  Utilization-metrics-based bin-packing
  High PPW
  Assured QoS but high queueing delays due to affinity constraints
Peak Prediction
  Predicts the resource peaks of co-scheduled apps by Auto Correlation Factor
  High PPW and assured QoS guarantees
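
The two building blocks named above, First Fit Decreasing bin-packing and autocorrelation of a utilization series, can be sketched roughly as follows. This is a minimal illustration, not the Kube-Knots implementation: the function names, the use of percent GPU memory as the packing dimension, and the textbook sample-autocorrelation formula are all assumptions made for the example, and the slides do not say how the authors compute their Auto Correlation Factor.

```python
from typing import List, Sequence


def first_fit_decreasing(demands: Sequence[float], capacity: float) -> List[List[float]]:
    """Pack demands into bins (GPUs) with the first-fit-decreasing heuristic.

    Hypothetical helper: each demand is one app's GPU memory need, in the
    same unit as capacity (for example percent of one GPU's memory).
    """
    bins: List[List[float]] = []
    free: List[float] = []  # remaining capacity of each open bin
    for demand in sorted(demands, reverse=True):  # largest demand first
        if demand > capacity:
            raise ValueError(f"demand {demand} exceeds bin capacity {capacity}")
        for i, room in enumerate(free):
            if demand <= room:  # first open bin that still fits
                bins[i].append(demand)
                free[i] -= demand
                break
        else:  # no open bin fits, so open a new one
            bins.append([demand])
            free.append(capacity - demand)
    return bins


def autocorrelation(series: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of a utilization series at the given lag.

    Values near 1 suggest the pattern repeats every `lag` samples, which is
    the kind of signal a peak predictor could exploit (assumption).
    """
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(series)")
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        return 0.0  # a flat series carries no periodic signal
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance


if __name__ == "__main__":
    # Five apps with GPU memory needs in percent, packed onto 100% GPUs.
    print(first_fit_decreasing([50, 30, 70, 20, 30], capacity=100))
    # A load that alternates every sample correlates with itself at lag 2.
    print(autocorrelation([0, 1, 0, 1, 0, 1, 0, 1], lag=2))
```

Packing on a single dimension keeps the sketch short; a real scheduler would also have to weigh compute utilization and QoS targets, which is where the slides locate the queueing-delay and QoS trade-offs.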