ISGC2004 July 2004, Taipei
National Institute of Advanced Industrial Science and Technology
Grid Datafarm Architecture and Standardization of Grid File System
Osamu Tatebe
Grid Technology Research Center, AIST
[Figures: detector for the ALICE experiment; detector for the LHCb experiment]
High Energy Physics
CERN LHC, KEK Belle
~MB/collision, 100 collisions/sec
~PB/year
2000 physicists, 35 countries
Astronomical Data Analysis
Data analysis of the whole data
TB~PB/year/telescope
SUBARU telescope: 10 GB/night, 3 TB/year
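The rates quoted above can be sanity-checked with a little arithmetic. The sketch below assumes that "~MB/collision" means 1 MB per collision and that data is recorded continuously all year; both are illustrative assumptions, not figures from the slides. It also derives how many observing nights per year the SUBARU numbers imply.

```python
# Back-of-the-envelope check of the data volumes quoted on the slide.
# Assumptions (not from the slides): 1 MB per collision, continuous running.
MB, GB, TB, PB = 10**6, 10**9, 10**12, 10**15
SECONDS_PER_YEAR = 365 * 24 * 3600


def yearly_volume(bytes_per_event: int, events_per_sec: int) -> int:
    """Bytes produced per year at a constant event rate."""
    return bytes_per_event * events_per_sec * SECONDS_PER_YEAR


# High energy physics: ~1 MB/collision at 100 collisions/sec.
hep_bytes = yearly_volume(1 * MB, 100)

# SUBARU: 3 TB/year at 10 GB/night implies this many nights per year.
subaru_nights = (3 * TB) // (10 * GB)

if __name__ == "__main__":
    print(f"HEP: {hep_bytes / PB:.2f} PB/year")
    print(f"SUBARU: {subaru_nights} observing nights/year")
```

Under these assumptions the high energy physics rate lands in the petabyte-per-year range, which is consistent with the "~PB/year" on the slide.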
Goal
Dependable data sharing among multiple organizations
High-speed data access, high-performance data computing
Grid Datafarm
Gfarm File System: global dependable virtual file system
Federates scratch disks in PCs
Parallel & distributed data computing
Associates Computational Grid with Data Grid
Features
Secured based on Grid Security Infrastructure
Scalable depending on data size and usage scenarios
Location-transparent data access
Automatic and transparent replica selection for fault tolerance
High-performance data access and computing by accessing multiple dispersed storages in parallel (file affinity scheduling)
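File affinity scheduling, as named in the last feature, can be pictured as sending each task to a node that already stores its input. The following is only a minimal sketch of that idea, not the Gfarm scheduler: the function name, the fragment names, and the node names are hypothetical, and the least-loaded tie-break is an assumption made for illustration.

```python
from collections import defaultdict


def schedule_by_affinity(fragments, replicas):
    """Assign each file fragment's task to a node that holds a copy of it.

    fragments: list of fragment names, in processing order.
    replicas:  dict mapping fragment name -> list of nodes storing a copy.
    Among the holders, the node with the fewest tasks so far is chosen
    (ties broken by node name), so work spreads across the storage nodes.
    """
    load = defaultdict(int)
    plan = {}
    for frag in fragments:
        holders = replicas.get(frag, [])
        if not holders:
            raise LookupError(f"no replica recorded for {frag}")
        node = min(holders, key=lambda n: (load[n], n))
        plan[frag] = node
        load[node] += 1
    return plan


# Hypothetical replica catalog for three fragments of one file.
example_replicas = {
    "file1.0": ["nodeA", "nodeB"],
    "file1.1": ["nodeB"],
    "file1.2": ["nodeA", "nodeC"],
}
example_plan = schedule_by_affinity(
    ["file1.0", "file1.1", "file1.2"], example_replicas
)

if __name__ == "__main__":
    for frag, node in example_plan.items():
        print(frag, "->", node)
```

Because every task reads from local disk, the aggregate read bandwidth grows with the number of nodes instead of being capped by one file server.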
Gfarm File System
[Figure: a virtual directory tree (nodes: /grid, ggf, jp, aist, gtrc, file1, file2, file3, file4) is mapped through file system metadata onto physical files; replicas of file1 and file2 illustrate file replica creation.]
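The mapping from the virtual directory tree to physical replicas can be illustrated with a small lookup table. This is a sketch under stated assumptions, not the Gfarm metadata schema: the logical paths, host names, and the `select_replica` helper are all hypothetical.

```python
# Hypothetical metadata: logical path -> list of (host, physical path) replicas.
METADATA = {
    "/grid/ggf/file1": [("host1", "/scratch/f1"), ("host2", "/scratch/f1")],
    "/grid/ggf/file2": [("host2", "/scratch/f2"), ("host3", "/scratch/f2")],
}


def select_replica(logical_path, metadata, alive_hosts):
    """Return the first replica of logical_path stored on a reachable host.

    Skipping unreachable hosts is what makes replica selection transparent
    to the caller: the same logical path keeps working after a node fails.
    """
    for host, physical_path in metadata.get(logical_path, []):
        if host in alive_hosts:
            return host, physical_path
    raise FileNotFoundError(f"no reachable replica for {logical_path}")


if __name__ == "__main__":
    # host1 is down, so the lookup falls through to the copy on host2.
    print(select_replica("/grid/ggf/file1", METADATA, {"host2", "host3"}))
```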
Do not separate Storage and CPU
Parallel and distributed file I/O
[Figure: network map of the Gfarm testbed. Sites and exchange points: AIST, Titech, Maffin, KEK, Univ Tsukuba, NII, APAN Tokyo XP, Tsukuba WAN, Indiana Univ, SDSC, Kasetsart Univ (Thailand), Los Angeles, Chicago, New York, SC2003 Phoenix. Networks: SuperSINET, APAN/TransPAC, Abilene. Link labels: 10G, 5G, 2.4G, 1G, 622M, OC-12 ATM.]
Clusters shown in the figure:
32 nodes, 23.3 TBytes, 2 GB/sec
16 nodes, 11.7 TBytes, 1 GB/sec (two clusters)
7 nodes, 3.7 TBytes, 200 MB/sec
10 nodes, 1 TBytes, 300 MB/sec
147 nodes, 16 TBytes, 4 GB/sec
Trans-Pacific theoretical peak: 3.9 Gbps
Gfarm disk capacity: 70 TBytes; disk read/write: 13 GB/sec
Bracketed rates in the figure: [2.34 Gbps], [950 Mbps], [500 Mbps]
Input data is stored in the Gfarm file system, not NFS
In collaboration with AIST and Kasetsart University, Thailand
[Figure: GridFTP download performance. x-axis: number of clients (1 to 7); y-axis: transfer rate [MB/sec] (20 to 120); series: local file system, 1 server for Gfarm file system, 2 servers for Gfarm file system.]
Two GridFTP servers can provide almost peak performance (1 Gbps)
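To relate the 1 Gbps peak to the MB/sec axis of the chart, a unit conversion helps. The helper below is a small illustrative sketch (the function name is hypothetical); it uses decimal units and ignores protocol overhead, so real payload rates are somewhat lower than the raw line rate.

```python
def gbps_to_mb_per_sec(gbps: float) -> float:
    """Convert a line rate in gigabits/sec to megabytes/sec (decimal units)."""
    return gbps * 1000 / 8


if __name__ == "__main__":
    # 1 Gbps corresponds to 125 MB/sec of raw line rate.
    print(gbps_to_mb_per_sec(1))
```

A raw line rate of 125 MB/sec sits just above the 120 MB/sec top of the chart's axis, which is why rates near that ceiling count as almost peak.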
Gfarm v1
Reference implementation of Grid Datafarm architecture
Version 1.0.3.1 released on July 5, 2004 (http://datafarm.apgrid.org/)
Gfarm v2: towards *true* global virtual file system
POSIX compliant: supports read-write mode, advisory file locking, . . .
Robustness improved, security enhanced
Can be substituted for NFS, AFS, . . .
Application area
Scientific applications (high energy physics, astronomical data analysis, bioinformatics, computational chemistry, computational physics, . . .)
Business applications (dependable data computing in eGovernment and eCommerce, . . .)
Other applications that need dependable file sharing among several
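For readers unfamiliar with the advisory file locking that POSIX compliance implies, here is a minimal sketch using Python's standard `fcntl` module on an ordinary local file. It does not use any Gfarm API; the `append_locked` helper is hypothetical, and the point is only that an application written this way could run unchanged on any file system that honors POSIX advisory locks. It requires a Unix-like system.

```python
import fcntl
import os
import tempfile


def append_locked(path: str, text: str) -> None:
    """Append text to path while holding an exclusive advisory lock.

    The lock is advisory: it only coordinates processes that also call
    lockf() on the file; it does not stop a process that ignores locking.
    """
    with open(path, "a") as f:
        fcntl.lockf(f, fcntl.LOCK_EX)  # blocks until the lock is granted
        try:
            f.write(text)
            f.flush()
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)


def demo() -> str:
    """Append two records to a temporary file and return its contents."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        append_locked(path, "record 1\n")
        append_locked(path, "record 2\n")
        with open(path) as f:
            return f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo(), end="")
```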
Standardization effort with GGF Grid File System WG (GFS-WG)
Foster (world-wide) storage sharing and integration
Dependable data sharing, high-performance data access among several
Web Service (aka Grid Service)
Grid namespace (virtual directory tree, Grid namespace for files, OGSA naming)
Points to an object (logical or physical files, file systems, other File System Directory Service instances)
Translation point to a grid-wide pointer
Attributes (and Status)
Attributes: ACL, times (Status: open, lock)
[Figure: virtual directory tree (nodes: /grid, ggf, jp, data, gfs, file1, file2, file3, file4) whose entries resolve to EPR1 and EPR2 (WS-Addressing Endpoint References).]
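The directory service described above can be sketched as a table from namespace paths to endpoint references, where an entry may also refer to another directory service instance. This is an illustrative model only, not the GFS-WG specification: the class names, fields, and the plain strings standing in for WS-Addressing Endpoint References are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    epr: str                    # stand-in for a WS-Addressing Endpoint Reference
    acl: dict = field(default_factory=dict)  # attribute: access control list
    mtime: float = 0.0          # attribute: modification time
    is_referral: bool = False   # True if the target is another directory service


class DirectoryService:
    """Toy namespace: binds paths to entries and resolves them to EPRs."""

    def __init__(self):
        self.entries = {}

    def bind(self, path: str, entry: Entry) -> None:
        self.entries[path] = entry

    def resolve(self, path: str):
        """Return (epr, remaining_path) using the longest bound prefix.

        A non-empty remaining_path is only allowed past a referral, which
        models the translation point to another service instance.
        """
        parts = path.strip("/").split("/")
        for i in range(len(parts), 0, -1):
            prefix = "/" + "/".join(parts[:i])
            entry = self.entries.get(prefix)
            if entry is None:
                continue
            rest = "/".join(parts[i:])
            if rest and not entry.is_referral:
                break  # a plain object cannot have children
            return entry.epr, rest
        raise FileNotFoundError(path)


ds = DirectoryService()
ds.bind("/grid/jp/file1", Entry(epr="EPR1", acl={"alice": "rw"}))
ds.bind("/grid/ggf", Entry(epr="EPR2", is_referral=True))

if __name__ == "__main__":
    print(ds.resolve("/grid/jp/file1"))
    print(ds.resolve("/grid/ggf/data/file3"))
```

In the second lookup the caller would forward the remaining path to the service behind the referral, which is how several directory service instances can be federated into one namespace.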
[Figure: Grid file system components. Applications, Users, . . .; POSIX APIs; Client Library; File System Directory Services (virtual directory tree under /grid with ggf, jp, data, gfs, file1 to file4); ReplicaSet Services; File System Services; File Access Services (GridFTP server, SRM), . . .; File System Server; NFSv4, . . .]
Specification (slide courtesy of Manuel Pereira at IBM Almaden Research Center)
Grid Datafarm architecture
Gfarm file system: global dependable virtual file system
Dependable data sharing via shared file system among multiple
Parallel & distributed data computing
Associates Computational Grid with Data Grid
High-speed data access, high-performance data computing
Applications
ATLAS, Belle, astronomical data analysis, Lattice QCD, . . .
GridFTP server, samba server, . . .
Standardization of Grid file system: GGF GFS-WG
File System Directory Service Specification
Architecture of Grid File System
Join gfs-wg@ggf.org mailing list, join weekly teleconference
Next GFS-WG F2F meeting will be at San Jose on August 6, 2004
Next GGF-12 will be on September 20-23, 2004 at Brussels, Belgium
GGF GFS-WG: https://forge.gridforum.org/projects/gfs-wg/
Grid Datafarm: https://datafarm.apgrid.org/