Dr. Mais Nijim
20.07.2010
Motivation, Introduction, Related Work
Global online satellite image distribution system operated at the Earth Resources Observation and Science (EROS) Center of the U.S. Geological Survey. The EROS system motivates the need for prefetching to improve the performance of hybrid storage systems.
Reference:
Studies show that new data is growing at an annual rate of 30%. Supercomputing centers and rich-media providers such as Lawrence Livermore National Laboratory, Oak Ridge National Laboratory, NASA, Google, and CNN rely on large-scale storage systems to meet demanding requirements for large data capacity, high performance, and reliability.
Large-scale storage systems have to be developed to fulfill rapidly increasing demands for both large storage capacity and high I/O performance. Storage capacity is scaled by employing more disks; I/O performance is scaled by increasing the number of storage components.
Hybrid storage system: Solid State Drives, Hard Disks, Tapes
Solid State Disks: Highly accessed storage objects in a hybrid storage system can be prefetched and cached to high-speed storage components. Solid-state disks can be readily connected to existing storage systems.
Tape Storage: Hybrid storage systems are cost-effective because of the inexpensive tapes. Tape storage offers high reliability, long archival lifetime, and low cost. Tapes are an ideal storage platform for a wide variety of data-intensive applications.
Prefetching is a promising solution: it reduces the number of requests issued to HDDs or tapes by caching popular data in SSDs. Aggressive prefetching is needed to efficiently reduce I/O latency, but an overaggressive scheme may waste I/O bandwidth by transferring useless data.
Fig. 2: The Hybrid Storage System Architecture with Prefetching. Web users access an FTP server equipped with solid-state disks over a LAN; the FTP server is backed by a SAN. Upper-level prefetching moves data from the SAN to the FTP server on a data miss; lower-level prefetching moves data from the tapes to the SAN on a data miss.
Lower-level prefetching moves data from parallel tape storage to hard disks. Parallel tapes can increase the aggregate bandwidth between the disk storage and the tape storage by performing parallel load/unload operations.
To support parallelism, a data striping technique is used. To obtain the optimal striping, data size and workload are considered.
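As a hedged sketch of how round-robin data striping can work (the function names, the fixed stripe `unit`, and byte-level granularity are illustrative assumptions, not the paper's implementation):

```python
def stripe(data: bytes, num_devices: int, unit: int):
    """Split data into fixed-size stripe units and assign them
    round-robin across num_devices devices."""
    layout = [[] for _ in range(num_devices)]
    for i in range(0, len(data), unit):
        layout[(i // unit) % num_devices].append(data[i:i + unit])
    return layout

def device_for(offset: int, num_devices: int, unit: int) -> int:
    """Which device holds the byte at a given logical offset."""
    return (offset // unit) % num_devices
```

With 3 devices and a 2-byte unit, `stripe(b"abcdefgh", 3, 2)` places units `ab` and `gh` on device 0, `cd` on device 1, and `ef` on device 2, so sequential reads can be served by all devices in parallel.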
Striping can cause many small data requests and increase switch time. Data Placement Algorithm: we propose a data clustering algorithm that clusters related data together, since related data requests are highly likely to be requested together.
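One simple way to realize such a clustering (a sketch under assumed inputs: `request_windows` is a list of request batches and `threshold` is a hypothetical co-access cutoff; the paper's actual algorithm may differ):

```python
from collections import Counter
from itertools import combinations

def cluster_by_coaccess(request_windows, threshold):
    """Group objects that frequently appear in the same request window,
    so related objects can be placed together on the same tape/disk."""
    co = Counter()
    for window in request_windows:
        for a, b in combinations(sorted(set(window)), 2):
            co[(a, b)] += 1

    # Union-find merging of objects linked by frequent co-access.
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x
    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for (a, b), n in co.items():
        if n >= threshold:
            union(a, b)

    clusters = {}
    for obj in parent:
        clusters.setdefault(find(obj), set()).add(obj)
    return list(clusters.values())
```

Objects that co-occur at least `threshold` times end up in one cluster and can then be placed contiguously, turning many small requests into fewer large ones.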
Tape Library 1: O11, O21, O31, O41
Tape Library 2: O12, O22, O32, O42
Tape Library 3: O13, O23, O33, O43

Index                     | 1  | 2  | 3
Reference                 | O1 | O2 | O3
Priority                  | 4  | 3  | 2
Updated priority (Step 1) | 3  | 2  | 4

Disk 1: O11, O31, O13, O33
Disk 2: O12, O32, O21
LRU Eviction: either O11, O12, or O13
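The LRU eviction named above can be sketched with a standard ordered-dictionary cache (a minimal illustration; the class and method names are assumptions, not the paper's code):

```python
from collections import OrderedDict

class LRUCache:
    """Least-recently-used cache: on overflow, evict the object
    that has gone untouched the longest."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def access(self, key, value=None):
        if key in self.store:
            self.store.move_to_end(key)      # mark as most recently used
            return self.store[key]
        if len(self.store) >= self.capacity:
            self.store.popitem(last=False)   # evict the LRU entry
        self.store[key] = value
        return value
```

For example, with capacity 2, accessing O11, O12, then O11 again, then O13 evicts O12, because O11 was refreshed more recently.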
Lower-level prefetching (flowchart): Start with a request window R = {ri, ri+1, …, rj} and look up P(r) and Tape-library(r). If the data is already in the disk system, no prefetching is done. Otherwise, fetch data from as many tapes as possible and place it using round-robin data placement. If the disk is full, apply the LRU eviction policy.
The first component is solid-state partitioning (PaSSD). It dynamically partitions the array of solid-state disks among HDDs in such a way as to maximize I/O performance. Partitions are allocated dynamically depending on popularity, size of contents, and access pattern.
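A minimal sketch of popularity-proportional SSD partitioning in the spirit of PaSSD (the proportional-allocation rule is an assumption; the actual PaSSD policy also weighs content size and access pattern):

```python
def partition_ssd(ssd_blocks, popularity):
    """Allocate SSD blocks to disks in proportion to each
    disk's popularity score (e.g., recent access counts)."""
    total = sum(popularity.values())
    alloc = {d: int(ssd_blocks * p / total) for d, p in popularity.items()}
    # Hand any rounding leftover to the most popular disk.
    leftover = ssd_blocks - sum(alloc.values())
    alloc[max(popularity, key=popularity.get)] += leftover
    return alloc
```

With 100 SSD blocks and popularity scores {disk1: 3, disk2: 1}, disk1 receives 75 blocks and disk2 receives 25; reallocating periodically makes the partition track shifting popularity.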
Two approaches:
Upper-level prefetching (flowchart): Start with a request window R = {ri, ri+1, …, rj} and look up P(r), block(r), and disk(r). If the data is already in the solid-state disks, no prefetching is done. Otherwise, apply PaSSD and fetch data from as many disks as possible. If the solid-state disks are full, apply the LRU eviction policy.
Access time when prefetching is not carried out vs. access time when prefetching is carried out. The ultimate goal of our analytical model is to provide criteria that can mathematically evaluate the performance: the average access-time improvement S, where S is defined as
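The defining equation itself is missing from the extracted slide text. A plausible form consistent with the two access times named above (an assumption, with $T_{np}$ the average access time without prefetching and $T_p$ the average access time with prefetching) is:

```latex
S = \frac{T_{np} - T_{p}}{T_{np}}
```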
System Utilization: A job is defined as the retrieval of an object; therefore, the above equation gives the average retrieval time of an item.
In the server access model, we consider multiple users accessing the network through the FTP server, modeled as an M/G/1 round-robin queuing system. In this system, the average time to finish a job that requires a service time x is calculated as follows
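The formula is not present in the extracted text. The classical result for an M/G/1 round-robin (processor-sharing) queue, which this slide most likely shows (an assumption), gives the expected completion time of a job of service time $x$ in terms of the server utilization $\rho$:

```latex
T(x) = \frac{x}{1 - \rho}, \qquad \rho = \lambda \bar{x}
```

Here $\lambda$ is the request arrival rate and $\bar{x}$ the mean service time.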
Size of object located in disks Size of object located in tapes
The average service time x is calculated as
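The equation is absent from the extracted text; a decomposition consistent with the hit-ratio definitions on the next slide (an assumption, with $x_s$, $x_d$, $x_t$ the service times of the solid-state disks, the disk system, and the tape storage) would be:

```latex
\bar{x} = h_s x_s + h_d x_d + (1 - h_s - h_d)\, x_t
```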
Prefetching: a proportion hs of the user requests results in a hit in the solid-state disks, which means that this portion is served by the solid-state disks. The failure ratio fs = 1 - hs means that the remaining requests are located in the disk system and/or the tapes. The portion hd results in a hit in the disk system; fd = 1 - hs - hd means that the request is served by the tape storage.
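Putting these ratios together, the expected service time per request can be computed as a weighted sum over the three tiers (a sketch under the assumption that x_s, x_d, x_t are the per-tier service times; the weighting mirrors the hit-ratio definitions above):

```python
def expected_service_time(h_s, h_d, x_s, x_d, x_t):
    """Expected per-request service time given SSD hit ratio h_s,
    disk hit ratio h_d, and per-tier service times x_s, x_d, x_t."""
    f_d = 1.0 - h_s - h_d          # fraction served by tape storage
    assert 0.0 <= f_d <= 1.0, "hit ratios must sum to at most 1"
    return h_s * x_s + h_d * x_d + f_d * x_t
```

For instance, with h_s = 0.5, h_d = 0.3 and service times 1, 10, and 100 time units, the expected service time is 0.5 + 3 + 20 = 23.5, showing how raising the SSD hit ratio shrinks the costly tape term.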
Average number of items to be prefetched from the tapes to the disks; average number of items to be prefetched from the disks to the solid-state disks.
Probability of items to be prefetched from the tape system to the disks; probability of items to be prefetched from the disk system to the solid-state disks.
The hit ratio in the disk system will be increased by the number of prefetched items. When data objects are prefetched from the disk system to the solid-state disks, the hit ratio in the solid-state disks is expected to rise.
The use of large-scale parallel disk systems continues to rise as the demands of data-intensive applications with large capacities grow.
Traditional storage systems scale up storage capacity by employing more hard disk drives, which tends to be an expensive solution due to the ever-increasing cost of adding HDDs.
In hybrid storage systems, judiciously transferring data back and forth among SSDs, HDDs, and tapes is critical for I/O performance.
We presented a multi-layer prefetching algorithm (PreHySys) that can reduce the miss rate of the high-end storage components, thereby reducing the average response time for data requests in hybrid storage systems.