Grid Datafarm Architecture and Standardization of Grid File System




SLIDE 1

ISGC2004 July 2004, Taipei

National Institute of Advanced Industrial Science and Technology

Grid Datafarm Architecture and Standardization of Grid File System

Osamu Tatebe
Grid Technology Research Center, AIST

SLIDE 2

[Background] Petascale Data Intensive Computing

High Energy Physics
  • CERN LHC, KEK Belle
  • ~MB/collision, 100 collisions/sec
  • ~PB/year
  • 2000 physicists, 35 countries
  • [Figures: detector for ALICE experiment; detector for LHCb experiment]

Astronomical Data Analysis
  • Data analysis of the whole data
  • TB~PB/year/telescope
  • SUBARU telescope: 10 GB/night, 3 TB/year
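The volumes quoted on this slide can be checked with a back-of-the-envelope calculation. The sketch below assumes exactly 1 MB per collision, uninterrupted running for a full year, and decimal units; none of these assumptions is stated on the slide, which gives only approximate figures. The number of observing nights is inferred from the two SUBARU figures.

```python
# Back-of-the-envelope check of the data volumes quoted on the slide.
# Assumptions (not from the slide): exactly 1 MB per collision,
# continuous running all year, decimal units (1 PB = 1e9 MB).

SECONDS_PER_YEAR = 365 * 24 * 3600


def hep_volume_pb_per_year(mb_per_collision=1.0, collisions_per_sec=100):
    """Yearly data volume in PB for a detector read out continuously."""
    mb_per_year = mb_per_collision * collisions_per_sec * SECONDS_PER_YEAR
    return mb_per_year / 1e9


def nights_for_volume(tb_per_year=3.0, gb_per_night=10.0):
    """Observing nights implied by a yearly and a nightly data volume."""
    return tb_per_year * 1000 / gb_per_night  # 1 TB = 1000 GB


if __name__ == "__main__":
    print(f"HEP: {hep_volume_pb_per_year():.2f} PB/year")
    print(f"SUBARU: {nights_for_volume():.0f} observing nights/year")
```

Under these assumptions the high energy physics rate lands at a few PB/year, consistent with the "~PB/year" order of magnitude on the slide.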

SLIDE 3

Petascale Data-intensive Computing Requirements

  • Peta/Exabyte scale files, millions of millions of files
  • Scalable computational power
      > 1 TFLOPS, hopefully > 10 TFLOPS
  • Scalable parallel I/O throughput
      > 100 GB/s, hopefully > 1 TB/s within a system and between systems
  • Efficient global sharing with group-oriented authentication and access control
  • Fault tolerance / dynamic re-configuration
  • Resource management and scheduling
  • System monitoring and administration
  • Global computing environment

SLIDE 4

Goal and feature of Grid Datafarm

Goal
  • Dependable data sharing among multiple organizations
  • High-speed data access, high-performance data computing

Grid Datafarm
  • Gfarm File System: global dependable virtual file system
      Federates scratch disks in PCs
  • Parallel & distributed data computing
      Associates Computational Grid with Data Grid

Features
  • Secured based on Grid Security Infrastructure
  • Scalable depending on data size and usage scenarios
  • Data location transparent data access
  • Automatic and transparent replica selection for fault tolerance
  • High-performance data access and computing by accessing multiple dispersed storages in parallel (file affinity scheduling)
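The idea behind file affinity scheduling, running each task on a node that already stores the data it reads, can be illustrated with a small sketch. This is not Gfarm's scheduler: the function name, the `max_tasks_per_node` limit and the greedy least-loaded policy are all assumptions made for illustration.

```python
from collections import defaultdict


def schedule_by_file_affinity(replicas, max_tasks_per_node):
    """Assign one task per file to a node that stores a replica of it.

    replicas: dict mapping file name -> list of nodes holding a copy.
    Returns a dict mapping file name -> chosen node.
    (Hypothetical greedy policy, for illustration only.)
    """
    load = defaultdict(int)
    assignment = {}
    # Place the most constrained files (fewest replicas) first.
    for fname in sorted(replicas, key=lambda f: (len(replicas[f]), f)):
        candidates = [n for n in replicas[fname]
                      if load[n] < max_tasks_per_node]
        if not candidates:
            raise RuntimeError(f"no free node holds a replica of {fname}")
        # Among nodes holding the data, prefer the least loaded one.
        node = min(candidates, key=lambda n: (load[n], n))
        assignment[fname] = node
        load[node] += 1
    return assignment


if __name__ == "__main__":
    example = {
        "part0": ["n1", "n2"],
        "part1": ["n1"],
        "part2": ["n2", "n3"],
        "part3": ["n3"],
    }
    print(schedule_by_file_affinity(example, max_tasks_per_node=2))
```

Because every task reads from local disk, aggregate I/O bandwidth grows with the number of nodes instead of being capped by a shared file server.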

SLIDE 5

Grid Datafarm (1): Gfarm file system - World-wide virtual file system [CCGrid 2002]

  • Transparent access to dispersed file data in a Grid
  • POSIX I/O APIs
      Applications can access Gfarm file system without any modification
  • Automatic and transparent replica selection for fault tolerance and access-concentration avoidance

[Figure: Gfarm File System. Labels: virtual directory tree (/grid, ggf, jp, aist, gtrc, file1, file2, file3, file4), mapping, file system metadata, file replica creation (file1, file2).]
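Replica selection for fault tolerance and access-concentration avoidance can be sketched as a choice among the hosts that hold a copy of a file. The function and its inputs below are hypothetical and do not reflect Gfarm's actual policy; they only show the two criteria named on the slide.

```python
def select_replica(replica_hosts, alive, active_clients):
    """Pick a host from which to read a file (illustrative policy only).

    replica_hosts:  hosts that store a copy of the file.
    alive:          set of hosts currently reachable (fault tolerance).
    active_clients: dict host -> number of clients it is serving
                    (access-concentration avoidance).
    """
    reachable = [h for h in replica_hosts if h in alive]
    if not reachable:
        raise LookupError("no reachable replica")
    # Prefer the least busy host; break ties by name for determinism.
    return min(reachable, key=lambda h: (active_clients.get(h, 0), h))


if __name__ == "__main__":
    host = select_replica(["a", "b", "c"], alive={"b", "c"},
                          active_clients={"b": 5, "c": 1})
    print(host)
```

The application never sees this choice: it opens a path in the virtual directory tree and the file system picks the physical copy.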

SLIDE 6

Grid Datafarm (2): High-performance data access and computing support [CCGrid 2002]

  • Do not separate Storage and CPU
  • Parallel and distributed file I/O

SLIDE 7

Trans-Pacific Grid Datafarm testbed: Network and cluster configuration [SAINT2004]

[Figure: network and cluster configuration of the testbed. Sites and networks labeled: AIST, Titech, Univ Tsukuba, KEK, NII, Maffin, Tsukuba WAN, SuperSINET, APAN Tokyo XP, APAN/TransPAC, Los Angeles, Chicago, New York, Abilene, Indiana Univ, SDSC, Kasetsart Univ (Thailand), SC2003 Phoenix. Link labels: 622M, 1G, 2.4G, 2.4G(1G), 5G, 10G, OC-12 ATM. Bracketed values: [2.34 Gbps], [950 Mbps], [500 Mbps]. Cluster labels: 147 nodes, 16 TBytes, 4 GB/sec; 32 nodes, 23.3 TBytes, 2 GB/sec; 16 nodes, 11.7 TBytes, 1 GB/sec (twice); 10 nodes, 1 TBytes, 300 MB/sec; 7 nodes, 3.7 TBytes, 200 MB/sec.]

  • Trans-Pacific theoretical peak: 3.9 Gbps
  • Gfarm disk capacity: 70 TBytes
  • Disk read/write: 13 GB/sec

SLIDE 8

Scientific Application (1)

ATLAS Data Production
  • Distribution kit
  • Atlfast: fast simulation
      Input data stored in Gfarm file system, not NFS
  • G4sim: full simulation
  (Collaboration with ICEPP, KEK)

Belle Monte-Carlo Production
  • 30 TB data needs to be generated
  • 10 M events generated in a few days using a 50-node PC cluster
  • Simulation data will be generated in a distributed manner at tens of universities and KEK
  (Collaboration with KEK, U-Tokyo)

SLIDE 9

Scientific Application (2)

Astronomical Object Survey
  • Data analysis on the whole archive
  • 652 GBytes data observed by SUBARU telescope

Large configuration data from Lattice QCD
  • Three sets of hundreds of gluon field configurations on a 24^3*48 4-D space-time lattice (3 sets x 364.5 MB x 800 = 854.3 GB)
  • Generated by the CP-PACS parallel computer at Center for Computational Physics, Univ. of Tsukuba (300Gflops x years of CPU time)
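The Lattice QCD figures can be reproduced with a short calculation. The storage layout used below (four link variables per lattice site, each a full 3x3 complex matrix in double precision) is an assumption and is not stated on the slide; it does, however, give 364.5 MB per configuration when 1 MB is taken as 1024^2 bytes, and the slide's total then follows with 1 GB = 1024 MB.

```python
# Reproduces the Lattice QCD sizes on the slide.
# Assumed layout (not stated on the slide): 4 links per site, each a
# 3x3 complex matrix stored as double-precision real/imaginary pairs.

def gauge_config_bytes(nx=24, ny=24, nz=24, nt=48):
    """Size in bytes of one gluon field configuration on an nx*ny*nz*nt lattice."""
    sites = nx * ny * nz * nt
    links_per_site = 4          # one link per space-time direction
    complex_per_link = 3 * 3    # full 3x3 matrix
    bytes_per_complex = 16      # two 8-byte doubles
    return sites * links_per_site * complex_per_link * bytes_per_complex


def total_archive_gb(sets=3, configs_per_set=800):
    """Total archive size in GB, using 1 MB = 1024^2 bytes, 1 GB = 1024 MB."""
    per_config_mb = gauge_config_bytes() / 1024**2
    return sets * per_config_mb * configs_per_set / 1024


if __name__ == "__main__":
    print(f"per configuration: {gauge_config_bytes() / 1024**2} MB")
    print(f"total: {total_archive_gb():.1f} GB")
```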

SLIDE 10

File transfer via standard protocol

Multiple GridFTP servers for a single Gfarm file system

In collaboration with AIST and Kasetsart University, Thailand

SLIDE 11

GridFTP data transfer performance

[Figure: GridFTP download performance. x-axis: number of clients (1 to 7); y-axis: transfer rate [MB/sec] (20 to 120). Series: local file system; 1 server for Gfarm file system; 2 servers for Gfarm file system.]

Two GridFTP servers can provide almost peak performance (1 Gbps)
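To relate the 1 Gbps peak to the chart's MB/sec axis, a unit conversion is enough. The helper below is a convenience for this note, assuming decimal units and ignoring protocol overhead.

```python
def gbps_to_mb_per_sec(gbps):
    """Convert a link speed in Gbps to MB/sec (decimal units, no overhead)."""
    return gbps * 1000 / 8  # 1 Gbps = 1000 Mbps, 8 bits per byte


if __name__ == "__main__":
    print(gbps_to_mb_per_sec(1))
```

A 1 Gbps link therefore carries at most 125 MB/sec, which is why a transfer rate near the top of the chart's 120 MB/sec axis counts as almost peak performance.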

SLIDE 12

File Sharing by Windows PCs

Multiple Samba servers for Gfarm file system

SLIDE 13

Development Status and Future Plan

Gfarm v1
  • Reference implementation of Grid Datafarm architecture
  • Version 1.0.3.1 released on July 5, 2004 (http://datafarm.apgrid.org/)

Gfarm v2: towards *true* global virtual file system
  • POSIX compliant: supports read-write mode, advisory file locking, . . .
  • Robustness improved, security enhanced
  • Can be substituted for NFS, AFS, . . .

Application area
  • Scientific application (High energy physics, Astronomic data analysis, Bioinformatics, Computational Chemistry, Computational Physics, . . .)
  • Business application (Dependable data computing in eGovernment and eCommerce, . . .)
  • Other applications that need dependable file sharing among several organizations

Standardization effort with GGF Grid File System WG (GFS-WG)
  • Foster (world-wide) storage sharing and integration
  • Dependable data sharing, high-performance data access among several organizations

SLIDE 14

Global Grid Forum Grid File System WG

https://forge.gridforum.org/projects/gfs-wg/

SLIDE 15

Charter of GFS-WG

Two goals (two documents)
  • File System Directory Services
      Manage namespace for files, access control, and metadata management
  • Architecture for Grid File System Services
      Provides functionality of virtual file system in grid environment
      Facilitates federation and sharing of virtualized data
      Uses File System Directory Services and standard access protocols

SLIDE 16

File System Directory Services

Recommendation Document: “File System Directory Services Specification”
  • Provides global namespace
  • Manages virtual file system directories in association with access permission, and other application-specific metadata
  • http://www.ggf.org/Meetings/GGF11/Documents/File_System_Directory_Service_Specification.pdf
  • Will be submitted in GGF13 (March 2005)

SLIDE 17

Architecture of Grid File System Services

Recommendation Document: “Architecture for Grid File System Services”
  • Provides a composition of services to realize Grid File System using File System Directory Services and other common services
  • https://forge.gridforum.org/projects/gfs-wg/document/Architecture_Specification_for_Grid_File_System_Services/en/2
  • Will be submitted in GGF14 (June 2005)

SLIDE 18

File System Directory Services (1)

Web Service (aka Grid Service)
  • Directory PortType

Grid namespace (virtual directory tree, Grid namespace for files, OGSA naming)
  • Virtual directories
  • Junctions (referrals)
      Points to an object (logical or physical files, file systems, other File System Directory Service instances)
      Translation point to a grid-wide pointer

Attributes (and Status)
  • Attributes: ACL, times
  • (Status: open, lock)

[Figure: virtual directory tree (/grid, ggf, jp, data, gfs, file1, file2, file3, file4) with entries referring to EPR1 (WS-Addressing Endpoint Reference) and EPR2.]
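The namespace described on this slide, virtual directories whose leaves are junctions pointing at objects elsewhere, can be modelled with two small types. The class names, the string stand-in for an Endpoint Reference and the tree layout in the usage example are all hypothetical; the specification itself defines the real interface.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Junction:
    """Leaf that refers to an object outside this directory service."""
    epr: str  # stand-in for a WS-Addressing Endpoint Reference


@dataclass
class VirtualDirectory:
    """Inner node of the virtual directory tree."""
    entries: Dict[str, Union["VirtualDirectory", Junction]] = field(
        default_factory=dict)
    attributes: Dict[str, str] = field(default_factory=dict)  # e.g. ACL, times


def resolve(root, path):
    """Walk a path such as '/grid/jp/file1' and return the node it names."""
    node = root
    for name in [part for part in path.split("/") if part]:
        if not isinstance(node, VirtualDirectory):
            raise NotADirectoryError(path)
        node = node.entries[name]  # KeyError if the entry does not exist
    return node


if __name__ == "__main__":
    # Hypothetical layout; the slide's figure does not fix where files live.
    root = VirtualDirectory({
        "grid": VirtualDirectory({
            "jp": VirtualDirectory({"file1": Junction("EPR1")}),
        }),
    })
    print(resolve(root, "/grid/jp/file1").epr)
```

A junction is where the grid-wide name stops and a pointer to the actual object begins, which is what lets one directory service refer to files, file systems or other directory service instances.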

SLIDE 19

File System Directory Services (2)

Operations
  • lookup, list
      Obtain entry or entries that include target
  • create, delete, move
  • update
      Management of attributes and status

[Figure: virtual directory tree (/grid, ggf, jp, data, gfs, file1, file2, file3, file4).]
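The operations listed on this slide can be tried out on a toy in-memory namespace. The class below is a sketch under stated assumptions: the method signatures, the path-keyed dictionary and the error handling are invented for illustration and are not taken from the File System Directory Services specification.

```python
class DirectoryService:
    """Toy in-memory namespace keyed by absolute path (illustration only)."""

    def __init__(self):
        self._entries = {"/": {"type": "dir", "target": None, "attrs": {}}}

    def create(self, path, target=None, **attrs):
        """Create a directory, or a junction if a target is given."""
        parent = path.rsplit("/", 1)[0] or "/"
        if self._entries.get(parent, {}).get("type") != "dir":
            raise FileNotFoundError(parent)
        if path in self._entries:
            raise FileExistsError(path)
        kind = "dir" if target is None else "junction"
        self._entries[path] = {"type": kind, "target": target,
                               "attrs": dict(attrs)}

    def lookup(self, path):
        """Return the entry stored under path."""
        return self._entries[path]

    def list(self, path):
        """Return the paths of the direct children of a directory."""
        prefix = path.rstrip("/") + "/"
        children = []
        for p in self._entries:
            rest = p[len(prefix):]
            if p.startswith(prefix) and rest and "/" not in rest:
                children.append(p)
        return sorted(children)

    def update(self, path, **attrs):
        """Change attributes (for example an ACL) of an entry."""
        self._entries[path]["attrs"].update(attrs)

    def move(self, src, dst):
        """Rename an entry together with everything below it."""
        moved = {p: e for p, e in self._entries.items()
                 if p == src or p.startswith(src + "/")}
        if not moved:
            raise KeyError(src)
        for p, entry in moved.items():
            del self._entries[p]
            self._entries[dst + p[len(src):]] = entry

    def delete(self, path):
        """Remove an entry; non-empty directories are refused."""
        if self.list(path):
            raise OSError(f"{path} is not empty")
        del self._entries[path]
```

A short session: create `/grid` and `/grid/jp`, add `/grid/jp/file1` as a junction with `target="EPR1"`, then `move("/grid/jp", "/grid/ggf")` renames the directory and the junction beneath it in one step.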

SLIDE 20

Grid File System Services

  • Provides virtual file system by integrating and federating file systems in a grid
  • Service oriented architecture using several services
  • Architecture document
      Provides a composition of services to realize Grid file system
        What kind of functionalities each service should have
        How these services relate to each other

SLIDE 21

Architecture of Grid file system

[Figure: architecture of Grid file system. Components labeled: Applications, Users, . . . ; Client Library (POSIX APIs); NFSv4, . . . ; File System Services; File System Directory Services (virtual directory tree: /grid, ggf, jp, data, gfs, file1, file2, file3, file4); ReplicaSet Services (file4, file2); File Access Services (GridFTP server, SRM), . . . ; File System Server.]
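One way to read the figure is as a chain: a client library resolves a path through the directory service, asks a replica set service where copies live, and reads from a file access service. The sketch below follows that reading. Every class, method and data layout in it is invented for illustration; the slide only names the services and does not define their interfaces.

```python
class DirectoryServiceStub:
    """Maps a path in the virtual tree to a logical file identifier."""

    def __init__(self, namespace):
        self._namespace = namespace  # path -> logical file id

    def lookup(self, path):
        return self._namespace[path]


class ReplicaSetStub:
    """Maps a logical file identifier to the locations of its replicas."""

    def __init__(self, replicas):
        self._replicas = replicas  # file id -> [(server name, physical path)]

    def locations(self, file_id):
        return list(self._replicas[file_id])


class FileAccessStub:
    """Stands in for a file access service such as a GridFTP server."""

    def __init__(self, files):
        self._files = files  # physical path -> bytes

    def read(self, physical_path):
        return self._files[physical_path]


class ClientLibrary:
    """Composes the three services behind a single read(path) call."""

    def __init__(self, directory, replica_sets, servers):
        self._directory = directory
        self._replica_sets = replica_sets
        self._servers = servers  # server name -> FileAccessStub

    def read(self, path):
        file_id = self._directory.lookup(path)
        for server_name, physical_path in self._replica_sets.locations(file_id):
            server = self._servers.get(server_name)
            if server is None:
                continue  # server unavailable: fall through to next replica
            return server.read(physical_path)
        raise IOError(f"no available replica for {path}")
```

Keeping the namespace, the replica bookkeeping and the data transfer in separate services is what allows existing access protocols to be reused underneath a single virtual file system.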

SLIDE 22

Grid File System Directory Service Specification

(slide courtesy of Manuel Pereira at IBM Almaden Research Center)

SLIDE 23

Filesystem Directory Service Objects

  • Object oriented service interface
  • Foundational communication objects
      Reply
      Entry

SLIDE 24

Filesystem Directory Service Objects

SLIDE 25

Summary

Grid Datafarm architecture
  • Gfarm file system: global dependable virtual file system
      Dependable data sharing via shared file system among multiple organizations
  • Parallel & distributed data computing
      Associates Computational Grid with Data Grid
  • High-speed data access, high-performance data computing

Applications
  • ATLAS, Belle, Astronomical data analysis, Lattice QCD, . . .
  • GridFTP server, samba server, . . .

Standardization of Grid file system: GGF GFS-WG
  • File System Directory Service Specification
  • Architecture of Grid File System
  • Join gfs-wg@ggf.org mailing list, join weekly teleconference
  • Next GFS-WG F2F meeting will be at San Jose on August 6, 2004
  • Next GGF-12 will be on September 20-23, 2004 at Brussels, Belgium

GGF GFS-WG: https://forge.gridforum.org/projects/gfs-wg/
Grid Datafarm: https://datafarm.apgrid.org/