Workload-driven Analysis of File Systems in Shared Multi-Tier Data-Centers over InfiniBand
- K. Vaidyanathan
- P. Balaji
- H.-W. Jin
- D. K. Panda
Network-Based Computing Laboratory, Department of Computer Science and Engineering, The Ohio State University
– Primary means of electronic interaction: online book-stores, World-cup scores, stock markets
– High performance-to-cost ratio
– Proposed by industry and research environments [shah01]
[shah01] H. V. Shah, D. B. Minturn, A. Foong, G. L. McAlpine, R. S. Madukkarumukumana and G. J. Regnier, "CSP: A Novel System Architecture for Scalable Internet and Communication Services," USITS 2001
– Each tier provides specific services (serving static and dynamic content)
– Tiers communicate over high-speed interconnects like InfiniBand, Myrinet, etc.
[Figure: multi-tier data-center: Proxy Server -> Web Server (Apache) -> Application Server (PHP) -> Database Server (MySQL) -> Storage]
– Currently used by several ISPs and Web Service Providers (IBM, HP)
– Amount of data replicated increases linearly with the number of web-sites hosted
[Figure: shared data-center over a WAN: Websites A, B, and C are each replicated (A B C) across the Proxy, Web, Application, and Database Server tiers, backed by shared Storage]
– Local file system: content must be replicated on all nodes
– Network file system: no replication required, but each document must be fetched over the network
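The storage trade-off above can be made concrete with a back-of-the-envelope calculation (illustrative web-site sizes and node count, not numbers from the study):

```python
def replicated_storage(site_sizes_gb, n_nodes):
    """Local file systems: every node stores a full copy of every web-site."""
    return sum(site_sizes_gb) * n_nodes

def shared_storage(site_sizes_gb):
    """Network file system: a single shared copy serves all nodes."""
    return sum(site_sizes_gb)

sites = [25, 100, 450]   # hypothetical web-site sizes in GB
nodes = 8
print(replicated_storage(sites, nodes))  # 4600 GB, grows linearly with node count
print(shared_storage(sites))             # 575 GB, independent of node count
```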
[Figure: data-center tiers (Proxy, Web, Application, and Database Servers) connected to shared storage over a SAN]
Network-based File Systems
[Figure: network-based file system architecture, contrasting data-center interaction with file system interaction. Compute nodes (e.g., the Web Server) run a local file system with a client-side cache; a Metadata Manager serves metadata while I/O (OST) nodes hold the striped data behind a server-side cache]
– Small files benefit greatly if they fit in memory; otherwise we pay the penalty of accessing the disk
– Large files may not fit in memory and also incur high disk-access penalties
– Large files, if striped, can reside in the file system cache of multiple nodes
– Small files also benefit from the aggregate cache
– Multiple web-sites share the file system cache, so each web-site has less cache to utilize
– Bursts of requests to one web-site may cause cache pollution
– This can drastically reduce the number of cache hits
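The cache-pollution effect can be illustrated with a small simulation (a hypothetical sketch, not part of the original study): two web-sites share one LRU cache, and a burst of requests to site B evicts site A's hot documents, so A's next pass over its working set misses entirely.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache keyed by document name."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, doc):
        if doc in self.store:
            self.store.move_to_end(doc)          # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.store[doc] = True
            if len(self.store) > self.capacity:
                self.store.popitem(last=False)   # evict least recently used

# Site A repeatedly accesses a small hot set that fits in the cache.
cache = LRUCache(capacity=100)
hot_set = [f"A/doc{i}" for i in range(100)]
for _ in range(10):
    for doc in hot_set:
        cache.access(doc)
warm_hits = cache.hits     # 900 hits after the first (cold) pass

# A burst of requests to site B pollutes the shared cache ...
for i in range(500):
    cache.access(f"B/doc{i}")

# ... so site A's next pass over its hot set misses everywhere.
before = cache.misses
for doc in hot_set:
    cache.access(doc)
print("site A misses after burst:", cache.misses - before)  # 100 (all misses)
```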
– 8 SuperMicro SUPER X5DL8-GG nodes; Dual Intel Xeon 3.0 GHz processors – 512 KB L2 Cache, 2 GB memory; PCI-X 64 bit 133 MHz
– 8 SuperMicro SUPER P4DL6 nodes; Dual Intel Xeon 2.4 GHz processors – 512 KB L2 Cache, 512 MB memory; PCI-X 64 bit 133 MHz
– High temporal locality (constant α)
– Low temporal locality (varying α)
Class     File Sizes    Size
Class 0   1K – 250K     25 MB
Class 1   1K – 1MB      100 MB
Class 2   1K – 4MB      450 MB
Class 3   1K – 16MB     2 GB
Class 4   1K – 64MB     6 GB
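A Zipf request trace with a tunable α can be generated along these lines (a minimal sketch; the α parameter matches the workloads on the slides, the generator itself is an assumption):

```python
import random

def zipf_weights(n_files, alpha):
    """Relative access frequency of the file at rank i is proportional to 1/i^alpha."""
    return [1.0 / (rank ** alpha) for rank in range(1, n_files + 1)]

def generate_trace(n_files, alpha, n_requests, seed=0):
    """Draw a request trace; higher alpha means more temporal locality."""
    rng = random.Random(seed)
    weights = zipf_weights(n_files, alpha)
    files = list(range(n_files))
    return rng.choices(files, weights=weights, k=n_requests)

# Higher alpha -> the few most popular files absorb a larger share of requests.
for alpha in (0.9, 0.5):
    trace = generate_trace(n_files=1000, alpha=alpha, n_requests=10000)
    top10 = sum(1 for f in trace if f < 10) / len(trace)
    print(f"alpha={alpha}: {top10:.0%} of requests hit the 10 most popular files")
```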
– High overhead in metadata operations (open() and close())
– Read requests may involve multiple nodes of the file system due to striping of the file
                          1M      4K      1M      4K      1M      4K      1M      4K
Read Latency (no cache)   50713   3000    44108   9600    2379    1400    76312   1500
Read Latency (cache)      1998    7.7     13825   680     1578    4       1602    4
Open & Close              876     876     1060    1060    6       6       6       6
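The Open & Close row can be measured with a microbenchmark along these lines (a sketch; the numbers depend entirely on which file system hosts the target file):

```python
import os
import tempfile
import time

def open_close_latency_us(path, iterations=1000):
    """Average cost of an open() immediately followed by close(), in microseconds.
    On a network file system this includes a round-trip to the metadata manager."""
    start = time.perf_counter()
    for _ in range(iterations):
        fd = os.open(path, os.O_RDONLY)
        os.close(fd)
    return (time.perf_counter() - start) / iterations * 1e6

# Measure against a scratch file on whatever file system holds the temp directory.
with tempfile.NamedTemporaryFile() as f:
    print(f"open+close: {open_close_latency_us(f.name):.1f} us")
```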
[Figure: network traffic (#packets sent/received) for Zipf and TPC-W workloads, Classes 0–3, with ext3fs, PVFS, and Lustre]
– For PVFS, network traffic increases proportionally compared to the local file system
– For Lustre, the traffic is close to that of the local file system
– For dynamic content, the network traffic does not increase with the database size
[Figure: overall performance (TPS) with high temporal locality for Zipf and TPC-W workloads, Classes 0–3, with ext3fs, ramfs, PVFS, and Lustre]
[Figure: TPS for workloads with varying temporal locality (α from 0.8 down to 0.3) with ext3fs, PVFS, and Lustre]
[Figure: percentage of cached vs. non-cached content for single and shared data-centers, Zipf Classes 0–4]
[Figure: performance improvement under low, medium, and heavy load for Zipf Classes 0–2 and TPC-W Classes 0–2]
– Avoidance of cache pollution
– Reduced overhead of open() and close() operations for small files
[Figure: TPS for workloads with varying temporal locality (α = 0.75 down to 0.45): PVFS vs. PVFS with ramfs]
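The PVFS-with-ramfs scheme can be sketched as a simple dispatch rule (a hypothetical illustration of the idea, not the authors' implementation; the size threshold is an assumption):

```python
SMALL_FILE_LIMIT = 256 * 1024   # assumed cut-off for "small" files, in bytes

def choose_backend(size_bytes, hot, limit=SMALL_FILE_LIMIT):
    """Place small, frequently-accessed files in the memory file system (ramfs),
    avoiding open()/close() overhead and cache pollution on the network file
    system; everything else stays on PVFS."""
    return "ramfs" if hot and size_bytes <= limit else "pvfs"

print(choose_backend(4 * 1024, hot=True))      # ramfs
print(choose_backend(1024 * 1024, hot=True))   # pvfs: too large to pin in memory
print(choose_backend(4 * 1024, hot=False))     # pvfs: cold file, not worth caching
```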
– Under-utilization of the file system cache in clusters
– Cache pollution affects performance
– Analyzed the aggregate cache and cache pollution effects for each file system
– Combination of network and memory file systems for static content with low temporal locality
– Memory file system and local file system for static content with high temporal locality and for dynamic content
– Manage the memory cache and provide prioritization and QoS