DADI Block-Level Image Service for Agile and Elastic Application Deployment
Huiba Li, Yifan Yuan, Rui Du, Kai Ma, Lanzheng Liu and Windsor Hsu Alibaba Group
The Problem
Container deployment (cold startup) is slow.
Long-tail latency reaches ...
Comparison of image service approaches

Block-Device-Based
  Features: regular file system, such as ext4; secure container and virtual machine
  Existing systems: Cider (based on Ceph; no layering format)
  Complexity: Low (stability↑, advanced features↑)
  Universality: App can choose a best-match file system, e.g. NTFS, and pack it into the image as a dependency.
  Security: small attack surface
  Overall: need the courage to walk alone (almost); TODO: layering

File-System-Based
  Features: interface directly; container image (due to inertia and following the crowd)
  Existing systems: CRFS, Teleport, CernVM-FS, Slacker, Wharf, CFS
  Complexity: High (stability↓, advanced features↓)
  Universality: Fixed features; may not match all applications (e.g. a Windows container on a Linux host).
  Security: large attack surface
  Overall: Technical advantage is insignificant.
Docker image layers
Image layers are downloaded from the Docker registry and untarred.
Each layer is a change set compared to the previous state (files added, modified, deleted). Image layers are read-only and shared.
The container layer is a change set compared to the image (files added, modified, deleted). It is read-write and private.
Usually the layers are stored in separate directories, and a merged view is created with the kernel module overlayfs.
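The layering described above can be illustrated with a small model. This is only a sketch under stated assumptions: each layer is represented as a hypothetical Python dict from path to content, and the `WHITEOUT` marker and `merged_view` function are names invented for this example. It is not Docker's on-disk format, and overlayfs resolves lookups lazily rather than building the whole view.

```python
# Hypothetical model: a layer maps path -> content; WHITEOUT marks a deletion.
WHITEOUT = object()


def merged_view(layers):
    """Return the file tree visible after stacking layers bottom to top."""
    view = {}
    for layer in layers:                  # lowest layer first
        for path, content in layer.items():
            if content is WHITEOUT:
                view.pop(path, None)      # file deleted by this layer
            else:
                view[path] = content      # file added or modified
    return view


# Two read-only image layers, shared by all containers of the image.
base = {"/bin/sh": "shell v1", "/etc/motd": "hello"}
update = {"/bin/sh": "shell v2", "/etc/motd": WHITEOUT, "/app/run.py": "print('hi')"}

# The read-write container layer, private to one container.
container = {"/tmp/scratch": "private data"}

image_view = merged_view([base, update])
container_view = merged_view([base, update, container])
print(sorted(container_view))
```

Because the image layers are never modified, any number of containers can stack their own private layer on the same shared image.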
[Figure: conventional container image path. Labels: App Processes in the container (user space); layers stored as directories (kernel space); Docker Registry; download, ungzip & untar.]
DADI building blocks
Overlay Block Device
ZFile
P2P on-demand read in a tree-structured topology
[Figure: DADI architecture. Labels: App Processes in the container; regular file system (ext4, etc.); virtual block device (OverlayBD); user space and kernel space; lsmd daemon; ZFile (layer blobs) stored on a file system (ext4, etc.), for downloaded layers and for new layers; P2P RPC.]
[Figure: index lookup for pread. The index is an array of Segments, each with an offset and a length, sorted by logical offsets. A read range maps to pieces of raw data to read and to holes. Annotation: save memory by combining.]
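An index sorted by logical offsets allows a read range to be resolved with a binary search followed by a short scan. The sketch below is a minimal illustration, not DADI's implementation: the `Segment` fields and the `lookup` function are assumptions made for this example, and it only reports which parts of the range are backed by raw data and which are holes.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    offset: int   # logical start of the segment
    length: int   # number of units covered (assumed > 0)

    @property
    def end(self) -> int:
        return self.offset + self.length


def lookup(index: List[Segment], offset: int, length: int) -> List[Tuple[str, int, int]]:
    """Split [offset, offset + length) into ('data', start, n) and
    ('hole', start, n) pieces. `index` must be sorted and non-overlapping."""
    end = offset + length
    # A real index would keep this array precomputed instead of rebuilding it.
    starts = [s.offset for s in index]
    i = bisect_right(starts, offset) - 1     # last segment starting at or before offset
    if i < 0 or index[i].end <= offset:
        i += 1                               # it does not cover offset; use the next one
    pieces = []
    pos = offset
    while pos < end:
        if i < len(index) and index[i].offset <= pos:
            n = min(index[i].end, end) - pos     # part covered by segment i
            pieces.append(("data", pos, n))
            i += 1
        else:
            nxt = index[i].offset if i < len(index) else end
            n = min(nxt, end) - pos              # gap before the next segment
            pieces.append(("hole", pos, n))
        pos += n
    return pieces


index = [Segment(5, 10), Segment(20, 10), Segment(40, 5)]
for piece in lookup(index, 8, 30):
    print(piece)
```

In this sketch a hole simply means that no segment covers that part of the range; how a reader fills it (for example with zeros) is left open.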
[Figure: index merge. The Segment indexes of individual layers are combined (+) into a single merged index (=>).]
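Merging layer indexes can be sketched as follows, assuming that a segment in an upper layer hides whatever a lower layer stores at the same logical offsets. The `layer` tag and the `merge` function are hypothetical names for this example, and the quadratic loop favors clarity over speed.

```python
from functools import reduce
from typing import List, NamedTuple


class Segment(NamedTuple):
    offset: int   # logical start of the segment
    length: int   # number of units covered
    layer: int    # hypothetical tag: which layer holds the raw data


def merge(lower: List[Segment], upper: List[Segment]) -> List[Segment]:
    """Merge two sorted, non-overlapping indexes; `upper` wins on overlap."""
    out = list(upper)
    for seg in lower:
        pos, end = seg.offset, seg.offset + seg.length
        for up in upper:
            u_start, u_end = up.offset, up.offset + up.length
            if u_end <= pos:
                continue                  # upper segment lies before the remainder
            if u_start >= end:
                break                     # upper segment lies after this one
            if u_start > pos:
                # The part of the lower segment in front of the overlap survives.
                out.append(Segment(pos, u_start - pos, seg.layer))
            pos = max(pos, u_end)         # skip the part hidden by the upper layer
            if pos >= end:
                break
        if pos < end:
            out.append(Segment(pos, end - pos, seg.layer))
    return sorted(out)                    # keep the result sorted by logical offset


layer0 = [Segment(0, 10, 0), Segment(20, 10, 0)]
layer1 = [Segment(5, 10, 1), Segment(25, 2, 1)]
merged = reduce(merge, [layer0, layer1])  # stack any number of layers, lowest first
print(merged)
```

With a single merged index, one lookup can answer a read without walking every layer's index in turn.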
[Figure: number of Segments in the merged index versus layer depth for production images. Merged index size of production images: 4.5K * 16 bytes = 72KB.]
[Figure: index queries per second versus index size (number of Segments). More than 6M QPS for production images.]
[Figure: IOPS (bs=8KB, non-cached) versus I/O queue depth, for Thin LVM, DADI w/o comp, and DADI - ZFile.]
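[Figure: layer formats. A writable layer consists of a Data file (R/W; Header and Raw Data, written by append) and an Index file (R/W; Header and Index, written by append). On commit they become a read-only Layer (RO) with parts Header, Raw Data, Index, and Trailer.]
[Figure: ZFile format. A ZFile has parts Header, [Dict], Compressed Chunks, Index, and Trailer. Its underlay file is the DADI layer blob, with parts Header, Raw Data, Index, and Trailer.]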
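The idea of compressed chunks plus an index can be sketched in a few lines. This is an illustration under assumptions, not the ZFile format: the fixed `CHUNK` size, the in-memory offset list, and the use of zlib are choices made for the example, since the slides do not specify the compression algorithm or the layout details.

```python
import zlib

CHUNK = 4096  # hypothetical size of each uncompressed chunk


def compress(data: bytes):
    """Compress fixed-size chunks independently.

    Returns (blob, index), where index[i] is the start of chunk i inside
    blob and index[-1] is the end of the last chunk."""
    parts, index, pos = [], [0], 0
    for i in range(0, len(data), CHUNK):
        chunk = zlib.compress(data[i:i + CHUNK])
        parts.append(chunk)
        pos += len(chunk)
        index.append(pos)
    return b"".join(parts), index


def pread(blob: bytes, index, offset: int, length: int) -> bytes:
    """Read `length` bytes at `offset` of the original data, decompressing
    only the chunks that overlap the requested range."""
    if length <= 0:
        return b""
    first = offset // CHUNK
    last = min((offset + length - 1) // CHUNK, len(index) - 2)
    out = b"".join(
        zlib.decompress(blob[index[i]:index[i + 1]])
        for i in range(first, last + 1)
    )
    skip = offset - first * CHUNK         # bytes to drop from the first chunk
    return out[skip:skip + length]


data = bytes(range(256)) * 100            # 25,600 bytes of sample data
blob, index = compress(data)
print(len(data), len(blob), len(index) - 1)
```

Because each chunk is compressed on its own, a random read touches only the chunks it overlaps, which is what makes on-demand reads from a compressed blob practical.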
[Figure: P2P topology. A Registry serves HTTP(S) requests; in each datacenter (Datacenter 1, Datacenter 2) a DADI-Root sits at the top of a tree of DADI-Agents, which exchange DADI requests.]
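To get a feel for how a tree-structured topology scales, the sketch below computes how many levels of agents a complete k-ary tree needs below its root. The `tree_depth` function and the assumption that every node serves exactly `arity` children are illustrative only; they describe neither DADI's actual placement policy nor the method behind its startup estimates.

```python
def tree_depth(n_agents: int, arity: int) -> int:
    """Levels of agents below the root needed to hold n_agents in a
    complete arity-ary tree (the root itself is level 0)."""
    if arity < 2:
        raise ValueError("arity must be at least 2")
    depth, capacity, level_size = 0, 0, 1
    while capacity < n_agents:
        level_size *= arity       # nodes on the next level
        capacity += level_size    # agents the tree can hold so far
        depth += 1
    return depth


for arity in (2, 3, 4, 5):
    print(arity, tree_depth(1_000, arity), tree_depth(100_000, arity))
```

The depth grows only logarithmically with the number of agents, so a request for a missing block passes through few hops even in a large deployment.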
[Figure: cold start latency (s), split into Image Pull and App Launch, for .tgz + ..., CRFS, pseudo-Slacker, DADI from Registry, and DADI from P2P Root.]
[Figure: warm startup latency (s) for Thin LVM (device mapper) and DADI, on NVMe SSD and Cloud Disk.]
[Figure: startup latency (s) with warm cache and cold cache, for app launch with prefetch and app launch.]
[Figure: cold startup latency (s) versus number of hosts (and containers), for pseudo-Slacker and DADI.]
[Figure: Large-Scale Startup of Agility on 1,000 hosts. Number of container instances started versus time (s), for Cold Startup 1, Cold Startup 2, Cold Startup 3, and Warm Startup.]
[Figure: Projected Hyper-Scale Startup of Agility (by evaluating a single branch of the P2P tree). Estimated startup latencies (s) versus number of containers (10K to 100K), for 2-ary, 3-ary, 4-ary, and 5-ary trees.]
(Agility is a small application specifically written in Python to assist the test.)
[Figure: Image Scanning with du. Time to du all files (s) for Thin LVM and DADI, on NVMe SSD and Cloud Disk.]
[Figure: Image Scanning with tar. Time to tar all files (s) for Thin LVM and DADI, on NVMe SSD and Cloud Disk.]