Dr. Mais Nijim · 20.07.2010 · Motivation, Introduction, Related Work (PowerPoint presentation)



slide-1
SLIDE 1
  • Dr. Mais Nijim

20.07.2010

1

slide-2
SLIDE 2

  • Motivation
  • Introduction
  • Related Work


slide-3
SLIDE 3

  • Global online satellite image distribution system operated at the Earth Resources Observation and Science (EROS) center of the U.S. Geological Survey
  • The EROS system motivates the need for prefetching to improve the performance of hybrid storage systems


slide-4
SLIDE 4

Reference:


slide-5
SLIDE 5

  • Studies show that new data is growing annually at a rate of 30%
  • Supercomputing centers and rich-media organizations such as Lawrence National Laboratory, Oak Ridge National Laboratory, NASA, Google, and CNN rely on large-scale storage systems to meet demanding requirements for large data capacity with high performance and reliability


slide-6
SLIDE 6

  • Large-scale storage systems have to be developed to fulfill rapidly increasing demands for both large storage capacity and high I/O performance
  • Storage capacity: employ more disks
  • I/O performance: increase the number of storage components


slide-7
SLIDE 7

Hybrid storage system:
  • Solid State Drives
  • Hard Disks
  • Tapes


slide-8
SLIDE 8

  • Solid State Disks
  • Highly accessed storage objects in a hybrid storage system can be prefetched and cached on high-speed storage components
  • Solid-state disks can be readily connected to any other type of storage device


slide-9
SLIDE 9

  • Tape Storage
  • Hybrid storage systems are cost-effective because of the inexpensive tapes
  • Tape storage systems have high reliability, long archive lifetime, and low cost
  • Tapes are an ideal storage platform for a wide variety of data-intensive applications


slide-10
SLIDE 10

  • Prefetching is a promising solution for reducing the latency of data transfers among SSDs, HDDs, and tapes
  • Prefetching aims at reducing the number of requests issued to HDDs or tapes by caching popular data in SSDs
  • Aggressive prefetching is needed to efficiently reduce I/O latency
  • An overaggressive scheme may waste I/O bandwidth by transferring useless data
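The idea can be made concrete with a minimal sketch: a popularity-driven prefetcher that copies objects into the fast tier once they become hot. This is an illustration of the general technique, not the paper's PreHySys algorithm; the class, threshold, and return values are assumptions.

```python
from collections import Counter


class PrefetchCache:
    """Sketch of popularity-based prefetching: objects whose access
    count crosses a threshold are copied into the fast (SSD) tier."""

    def __init__(self, capacity, threshold=3):
        self.capacity = capacity      # how many objects the fast tier holds
        self.threshold = threshold    # accesses before an object is prefetched
        self.fast_tier = set()        # objects currently cached on SSD
        self.counts = Counter()       # per-object access counts

    def access(self, obj):
        self.counts[obj] += 1
        if obj in self.fast_tier:
            return "ssd-hit"
        # Prefetch once the object is popular enough and space allows.
        if self.counts[obj] >= self.threshold and len(self.fast_tier) < self.capacity:
            self.fast_tier.add(obj)
        return "miss"
```

An overaggressive variant would simply be one with a very low threshold: it fills the fast tier with objects that may never be requested again, which is exactly the wasted-bandwidth concern raised above.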


slide-11
SLIDE 11

Fig. 2: The hybrid storage system architecture with prefetching. Web users access an FTP server equipped with solid-state disks over a LAN; on a data miss, upper-level prefetching moves data from the SAN to the FTP server, and on a further data miss, lower-level prefetching moves data from the tapes to the SAN.


slide-12
SLIDE 12

  • Move data from parallel tape storage to hard disks
  • Parallel tapes can increase the aggregate bandwidth between the disk storage and the tape storage through parallel load/unload operations


slide-13
SLIDE 13

  • To support parallelism, a data striping technique is used
  • To obtain the optimal striping, data size and workload are considered
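Data striping itself is simple to illustrate. The sketch below splits an object into fixed-size stripe units and assigns them round-robin across devices; the function name and the choice of a fixed stripe unit are illustrative assumptions (the slide notes that the optimal unit depends on data size and workload).

```python
def stripe(data: bytes, stripe_unit: int, n_devices: int):
    """Split `data` into stripe units and assign them round-robin
    across `n_devices` devices. Returns one list of units per device."""
    units = [data[i:i + stripe_unit] for i in range(0, len(data), stripe_unit)]
    devices = [[] for _ in range(n_devices)]
    for idx, unit in enumerate(units):
        devices[idx % n_devices].append(unit)
    return devices
```

With a stripe unit of 2 bytes across 3 devices, an 8-byte object yields four units spread as two on the first device and one on each of the others, so a read can proceed from all three devices in parallel.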


slide-14
SLIDE 14

  • Striping can cause many small data requests, which increases switch time
  • Data Placement Algorithm: propose a data clustering algorithm that clusters objects with a high probability of being requested together
  • Related data requests are likely to be issued together
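One building block for such a clustering algorithm is a co-request statistic: how often two objects appear in the same request batch. The sketch below computes that statistic from a request trace; it is in the spirit of the slide, not the paper's exact method, and the function name and trace format are assumptions.

```python
from collections import Counter
from itertools import combinations


def co_request_pairs(traces):
    """Count how often each pair of objects appears in the same request
    batch. A placement algorithm would co-locate high-count pairs so
    that related objects land on the same tape or disk."""
    pairs = Counter()
    for batch in traces:
        # Deduplicate within a batch, then count every unordered pair.
        for a, b in combinations(sorted(set(batch)), 2):
            pairs[(a, b)] += 1
    return pairs
```

Greedily merging the highest-count pairs into clusters then gives groups of objects that, per the slide's observation, are likely to be requested together.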


slide-15
SLIDE 15

Fig.: data placement example. Tape Library 1 holds O11, O21, O31, O41; Tape Library 2 holds O12, O22, O32, O42; Tape Library 3 holds O13, O23, O33, O43. A reference table (Index 1, 2, 3; References O1, O2, O3; Priorities 4, 3, 2) drives the placement steps: Disk 1 receives O11, O31, O13, O33 and Disk 2 receives O12, O32, O21. After the priorities are updated (3, 1, 2), the LRU policy evicts one of O11, O12, or O13.

slide-16
SLIDE 16

Flowchart (lower-level prefetching):
1. Start with the request batch R = {ri, ri+1, …, rj}; compute P(r) and Tape-library(r).
2. If the data is already in the disk system: no prefetching.
3. Otherwise, fetch the data from as many tapes as possible and place it using round-robin data placement.
4. If the disk is full, apply the LRU eviction policy.
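This lower-level flow can be rendered as a short Python sketch. The data structures (a set for the disk system, an ordered list for LRU bookkeeping) are illustrative assumptions; the tape-side fetch is modeled as a direct add rather than the paper's parallel fetch from multiple tape libraries.

```python
def lower_level_prefetch(requests, disk, disk_capacity, lru):
    """Straight-line rendering of the lower-level prefetching flow.
    `disk` is a set of objects cached on the disk system; `lru` lists
    cached objects with the least recently used first."""
    for r in requests:
        if r in disk:
            continue                     # data already on disk: no prefetching
        # Data miss: fetch the object from tape (the paper fetches from
        # as many tapes as possible with round-robin placement).
        if len(disk) >= disk_capacity:   # disk full: LRU eviction
            victim = lru.pop(0)
            disk.discard(victim)
        disk.add(r)
        lru.append(r)
```

The upper-level flow on a later slide has the same shape with solid-state disks in place of the disk system and PaSSD in place of round-robin placement.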


slide-17
SLIDE 17

  • The first component is solid-state partitioning (PaSSD)
  • It dynamically partitions the array of solid-state disks among the HDDs so as to maximize I/O performance
  • Partitions are allocated dynamically depending on popularity, size of contents, and access pattern


slide-18
SLIDE 18

Two approaches:

  • I. Content popularity based weight assignment
  • II. Collaborative popularity based weight assignment
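Approach I can be sketched as a proportional allocation: each HDD receives a share of the SSD array weighted by how popular its contents are. The exact weight formula is not in the extracted text, so the proportional rule, function name, and block granularity below are assumptions.

```python
def popularity_weights(access_counts, total_ssd_blocks):
    """Content-popularity-based weight assignment (sketch of approach I):
    give each HDD a share of the SSD array proportional to the access
    count of its contents."""
    total = sum(access_counts.values())
    return {disk: round(total_ssd_blocks * count / total)
            for disk, count in access_counts.items()}
```

A collaborative variant (approach II) would additionally share observations across disks, e.g. smoothing each disk's count toward the mean before allocating.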


slide-19
SLIDE 19

Flowchart (upper-level prefetching):
1. Start with the request batch R = {ri, ri+1, …, rj}; compute P(r), block(r), and disk(r).
2. If the data is in the solid-state disks: no prefetching.
3. Otherwise, apply PaSSD and fetch the data from as many disks as possible.
4. If the solid-state disks are full, apply the LRU eviction policy.


slide-20
SLIDE 20

  • Server Access Model
  • Access time with no prefetching
  • Access time with prefetching


slide-21
SLIDE 21

Access time when prefetching is not carried out
Access time when prefetching is carried out

  • The ultimate goal of our analytical model is to provide criteria that can mathematically evaluate the performance of our algorithm
  • Average access time improvement S, where S is defined as
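The slide's own definition of S was an image and did not survive extraction. A standard form for average access-time improvement, consistent with the two access times named above (an assumption, not the slide's formula), is:

```latex
S = \frac{\bar{t}_{\text{no-prefetch}} - \bar{t}_{\text{prefetch}}}{\bar{t}_{\text{no-prefetch}}}
```

so that S approaches 1 as prefetching drives the average access time toward zero.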


slide-22
SLIDE 22

System utilization: a job is defined as the retrieval time of an object; therefore, the above equation gives the average retrieval time of an item.

  • In the server access model, we consider multiple users accessing the network through the FTP server
  • We consider an M/G/1 round-robin queuing system
  • In this system, the average time to finish a job requiring a service time x is calculated as follows
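The formula itself was an image and is missing. For an M/G/1 queue under round-robin scheduling with vanishing quantum (processor sharing), the classical result for the mean time to finish a job of service requirement x is:

```latex
T(x) = \frac{x}{1 - \rho}, \qquad \rho = \lambda\,\bar{x}
```

where λ is the arrival rate and x̄ the mean service time. This is the textbook result, offered here as the likely intended expression rather than a recovered one.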


slide-23
SLIDE 23

s = s′ + s″

where s′ is the size of the object portion located on the disks and s″ is the size of the portion located on the tapes.


slide-24
SLIDE 24

  • The average service time x is calculated as
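The equation here was an image and is missing. Given the hit ratios hs, hd, and fd introduced on the next slide, a plausible form (an assumption, not recovered from the slide) is:

```latex
\bar{x} = h_s\, x_s + h_d\, x_d + f_d\, x_t
```

where x_s, x_d, and x_t denote the service times on the solid-state disks, hard disks, and tapes respectively.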


slide-25
SLIDE 25

  • Prefetching a proportion hs of the user requests results in a hit in the solid-state disks, meaning that this portion is served by the solid-state disks
  • The failure ratio is fs = 1 − hs, meaning that those requests are located in the disk system and/or on the tapes
  • The portion hd results in a hit in the disk system
  • fd = 1 − hs − hd means that the request is served by the tape storage
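These hit ratios determine the expected access latency directly. The sketch below treats an SSD hit as free and charges r_d for a disk hit and r_t for a tape access; the function and latency names are assumptions for illustration.

```python
def average_access_time(h_s, h_d, r_d, r_t):
    """Expected access latency under the hit-ratio model: a fraction
    h_s of requests hit the SSDs (cost 0), h_d hit the disk system
    (cost r_d), and the rest, f_d = 1 - h_s - h_d, go to tape (cost r_t)."""
    f_d = 1.0 - h_s - h_d
    return h_s * 0.0 + h_d * r_d + f_d * r_t
```

For example, with half the requests hitting SSD, 30% hitting disk at 10 ms, and the remaining 20% going to tape at 100 ms, the average is 23 ms, so even a modest tape fraction dominates the latency.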


slide-26
SLIDE 26

(slide body not extracted)

slide-27
SLIDE 27

(slide body not extracted)

slide-28
SLIDE 28

Average number of items to be prefetched from tapes to disks
Average number of items to be prefetched from disks to solid-state disks


slide-29
SLIDE 29

Probability of items to be prefetched from the tapes to the disk system
Probability of items to be prefetched from the disk system to the solid-state disks


slide-30
SLIDE 30

  • The hit ratio in the disk system will be increased by the number of prefetched items
  • When data objects are prefetched from the disk system to the solid-state disks, the hit ratio in the solid-state disks is expected to rise


slide-31
SLIDE 31

Access time with prefetching:

t = h_s · 0 + h_d · r_d + f_d · r_t

where the hit ratios h_d and f_d are updated to account for prefetching, using the average number of prefetched items n̄(F) and the prefetch probabilities p_1 (tapes to disks) and p_2 (disks to solid-state disks).


slide-32
SLIDE 32

(slide body not extracted)

slide-33
SLIDE 33

(slide body not extracted)

slide-34
SLIDE 34

  • The use of large-scale parallel disk systems continues to rise as demands from data-intensive applications with large capacities grow
  • Traditional storage systems scale up storage capacity by employing more hard disk drives, which tends to be an expensive solution due to the ever-increasing cost of HDDs
  • In hybrid storage systems, judiciously transferring data back and forth among SSDs, HDDs, and tapes is critical for I/O performance
  • We presented a multi-layer prefetching algorithm (PreHySys) that can reduce the miss rate of high-end storage components, thereby reducing the average response time for data requests in hybrid storage systems
