Development of e-Science Application Framework
Eric Yen, Simon C. Lin & Hurng-Chun Lee
ASGC, Academia Sinica, Taiwan
24 Jan. 2006, 21st APAN, Japan
LCG and EGEE Grid Sites in the Asia-Pacific Region
4 LCG sites in Taiwan; 12 LCG sites in Asia/Pacific
Academia Sinica Grid Computing Centre (ASGC)
- Tier-1 Centre for the LHC Computing Grid (LCG)
- Asian Operations Centre for LCG and EGEE
- Coordinator of the Asia/Pacific Federation in EGEE
Map of LCG and other Grid sites in the region: PAEC/NCP Islamabad, IHEP Beijing, KNU Daegu, Univ. Melbourne, GOG Singapore, KEK Tsukuba, ICEPP Tokyo, Taipei (ASGC, IPAS, NTU, NCU), VECC Kolkata, Tata Inst. Mumbai
AP Federation now shares the e-Infrastructure with WLCG
SC Workshop, Taipei, 30-31 Oct 2005 Jason Shih, ASGC
Perspectives of ASGC as a Tier-1
- Help Tier-2s troubleshoot services and functionality (IS, SRM, SE, etc.) before they join the Service Challenge
- Reach a persistent data transfer rate
- Increase the reliability and availability of T2 computing facilities (a form of stress testing)
- Maintain close communication with T2s, T0 and other T1s
- Gain experience before the LHC experiments begin
2005/12/16 Simon C. Lin / ASGC
Plan of AP Federation
VO Services: deployed from April 2005 in Taiwan (APROC)
- LCG: ATLAS, CMS
- BioInformatics, BioMed
- Geant4
- APeSci: for general e-Science collaboration services in the Asia-Pacific area
- APDG: for testing and testbed use only
- TWGRID: established for local services in Taiwan
Potential applications: LCG, Belle, nano, biomed, digital archive, earthquake, GeoGrid, astronomy, atmospheric science
SC Workshop, Taipei, 30-31 Oct 2005 Jason Shih, ASGC
Plans for T1/T2
T1-T2 test plan
- Which services/functionality need to be tested; recommendations and a checklist for T2 sites
- What has to be done before joining the Service Challenge
- Communication methods, and how to improve them if needed
- Scheduling of the plans and candidate sites; timeline for the testing
- SRM + FTS functionality testing
- Network performance tuning (jumbo frames?)
T1 expansion plan
- Computing power and storage; storage management, e.g. CASTOR2 + SRM
- Network improvement
A Worldwide Science Grid
- >200 sites
- >15,000 CPUs (with peaks >20,000 CPUs)
- ~14,000 jobs successfully completed per day
- 20 Virtual Organisations
- >800 registered users, representing thousands of scientists
EGEE Asia Pacific Services by Taiwan
- Production CA Services
- AP CIC/ROC
- VO Support
- Pre-production site
- User Support
- MW and technology development
- Application Development
- Education and Training
- Promotion and Outreach
- Scientific Linux Mirroring and Services
APROC
Taiwan acts as the Asia Pacific CIC and ROC in EGEE
APROC established in early 2005
Supports EGEE sites in Asia Pacific: Australia, Japan, India, Korea, Singapore, Taiwan
- 8 sites, 6 countries
Provides global and regional services
APROC EGEE-Wide Services
GStat
- Monitoring application to check the health of the Grid Information System (a query sketch follows this list): http://goc.grid.sinica.edu.tw/gstat/
GGUS Search
- Performs a Google search targeted at key Grid knowledge bases
GOCWiki
- Hosted wiki for user- and operations-related FAQs and guides
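The health checks a tool like GStat performs boil down to querying the Grid Information System, i.e. a BDII LDAP endpoint publishing the GLUE schema. Below is a minimal sketch of such a query, assuming Python with the ldap3 library and a hypothetical BDII hostname; the attribute names come from the GLUE 1.x schema, and this is an illustration rather than GStat's actual implementation.

```python
# Minimal sketch of a GStat-style information-system check.
# Assumptions: Python + ldap3, a reachable top-level BDII on port 2170;
# the hostname is a hypothetical placeholder.
from ldap3 import Server, Connection, ALL

BDII_HOST = "bdii.example.org"  # hypothetical endpoint, not from the slides

server = Server(f"ldap://{BDII_HOST}:2170", get_info=ALL)
conn = Connection(server, auto_bind=True)  # anonymous bind, as BDIIs allow

# Ask every published Computing Element for its ID and queue status.
conn.search(
    search_base="o=grid",
    search_filter="(objectClass=GlueCE)",
    attributes=["GlueCEUniqueID", "GlueCEStateStatus", "GlueCEStateWaitingJobs"],
)

for entry in conn.entries:
    # A CE that stops publishing, or publishes "Closed"/"Draining",
    # is the kind of condition a monitor like GStat would flag.
    print(entry.GlueCEUniqueID, entry.GlueCEStateStatus, entry.GlueCEStateWaitingJobs)
```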
APROC EGEE Regional Services
Site Registration and Certification
Monitoring and Daily Operations
- Problem diagnosis, tracking and troubleshooting
Middleware certification test-bed
- New release testing, supplemental documentation
Release support and coordination
- Updates, upgrades and installation
Security coordination
- With the Operational Security Coordination Team (OSCT)
VO Services
- CA for collaborators in Asia-Pacific
- VOMS, LFC, RB, BDII support for new VOs in the region
Support Services
- Web portal and documentation
- User and operations ticketing system
Education and Training
Note: gLite and the development of EGEE were introduced in all the events run by ASGC.

Event                   | Date            | Attendees | Venue
China Grid LCG Training | 16-18 May 2004  | 40        | Beijing, China
ISGC 2004 Tutorial      | 26 July 2004    | 50        | AS, Taiwan
Grid Workshop           | 16-18 Aug. 2004 | 50        | Shang-Dong, China
NTHU                    | 22-23 Dec. 2004 | 110       | Shin-Chu, Taiwan
NCKU                    | 9-10 Mar. 2005  | 80        | Tainan, Taiwan
ISGC 2005 Tutorial      | 25 Apr. 2005    | 80        | AS, Taiwan
Tung-Hai Univ.          | June 2005       | 100       | Tai-chung, Taiwan
EGEE Workshop           | Aug. 2005       | 80        | 20th APAN, Taiwan
Service Challenge
2005/11/20 Min-Hong Tsai / ASGC
GOC APEL Accounting excluding non-LHC VO (Biomed)
ASGC Usage
2005/11/20 Min-Hong Tsai / ASGC
SRM Services
- Increased to four pool nodes for more parallel GridFTP transfers
- SRMCP's stream and TCP buffer options did not function; worked around by configuring the SRM server
- Transfer rate can reach 80 MB/s; the average is 50 MB/s (see the sketch below)
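The tuning described above is about pushing more parallel GridFTP streams and larger TCP buffers per transfer. As an illustration only (not the exact commands used at ASGC, whose client was srmcp), a small Python wrapper around globus-url-copy with its standard -p (parallel streams) and -tcp-bs (TCP buffer size) options shows the kind of client-side knobs involved; both endpoints are hypothetical.

```python
# Illustrative sketch: drive a parallel GridFTP transfer from Python.
# Assumptions: globus-url-copy is installed and a valid Grid proxy exists;
# source and destination URLs are hypothetical placeholders.
import subprocess

SRC = "gsiftp://se01.example.org/castor/data/file.root"   # hypothetical
DST = "gsiftp://pool01.example.org/dpm/data/file.root"    # hypothetical

cmd = [
    "globus-url-copy",
    "-p", "10",              # 10 parallel TCP streams per transfer
    "-tcp-bs", "2097152",    # 2 MB TCP buffer, sized to the bandwidth-delay product
    SRC, DST,
]
subprocess.run(cmd, check=True)
```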
Atlas SC3 DDM - ASGC VOBOX
Average throughput per day (01/18, 2006)
Latest update can be found at: http://atlas-ddm-monitoring.web.cern.ch/atlas-ddm-monitoring/all.php
Total cumulative data transferred (01/18, 2006)
Tasks/deliverables
Batch services
- Deliver production-quality batch services
- Frontline consultancy and support for the batch scheduler
- Customized tool suites for secure and consistent management
Manage hierarchical storage
- Production-quality data management services, including planning, procurement, and operation of both software and hardware
- Meet the data transfer rate requirement declared in the MoU
- Share operational experiences and procedures with other Tier-1s and Tier-2s
- High availability and load balancing (HA + L/B)
Middleware support
- Frontline consultancy and support for other tiers in tweaking configurations, troubleshooting, and maintenance procedures
- Certification testing for pre-released LCG tags
- Installation guides/notes where the official release lacks them
- Training courses
ARDA
Goal: coordinate the prototyping of distributed analysis systems for the LHC experiments using a grid.
ARDA-ASGC collaboration: since mid-2003
- Building a push/pull model prototype (2003)
- Integrating ATLAS/LHCb analysis tools with gLite (2004)
- Providing the first integration testing and usage document for the ATLAS tool DIAL (2004)
- CMS monitoring system development (2005)
  - Monitoring system integrating R-GMA and MonALISA
  - ARDA/CMS analysis prototype: Dashboard
ARDA Taiwan Team: http://lcg.web.cern.ch/LCG/activities/arda/team.html
- 4 FTEs participated: 2 FTEs at CERN, the other 2 in Taiwan
mpiBLAST-g2
mpiBLAST-g2 (ASGC, Taiwan, and PRAGMA): http://bits.sinica.edu.tw/mpiBlast/index_en.php
A GT2-enabled parallel BLAST that runs on the Grid
- GT2 GASSCOPY API
- MPICH-G2
Enhancements over mpiBLAST by ASGC
- Cross-cluster job execution
- Remote database sharing
- Helper tools for
  - database replication
  - automatic resource specification and job submission (with a static resource table)
  - multi-query job splitting and result merging (a splitting/merging sketch follows)
- Close link with the mpiBLAST development team
  - New mpiBLAST patches can be quickly applied to mpiBLAST-g2
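To illustrate the multi-query splitting and result merging idea (not the actual mpiBLAST-g2 helper tools), the sketch below splits a multi-sequence FASTA query file into fixed-size chunks for independent BLAST jobs and concatenates their text outputs afterwards; file names and the chunk size are hypothetical.

```python
# Minimal sketch of multi-query job splitting and result merging.
# Assumption: queries.fasta holds many query sequences; each chunk becomes
# one independent BLAST job whose text output is merged back at the end.
from pathlib import Path

def split_fasta(path, seqs_per_chunk=50):
    """Split a multi-FASTA file into chunks of at most seqs_per_chunk sequences."""
    chunks, current, count = [], [], 0
    for line in Path(path).read_text().splitlines():
        if line.startswith(">"):
            if count == seqs_per_chunk:
                chunks.append("\n".join(current))   # close the full chunk
                current, count = [], 0
            count += 1
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks

def merge_results(result_files, merged="blast_results.txt"):
    """Concatenate per-chunk BLAST outputs in submission order."""
    with open(merged, "w") as out:
        for rf in result_files:
            out.write(Path(rf).read_text())
    return merged
```

Each chunk would then be written to its own file and submitted as a separate Grid job; the submission step itself is omitted here.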
28 April, 2005 ISGC 2005, Taiwan
SC2004 mpiBLAST-g2 demonstration
Figure: demonstration sites, including KISTI
mpiBLAST-g2 current deployment
- From PRAGMA GOC: http://pragma-goc.rocksclusters.org
mpiBLAST-g2 Performance Evaluation (perfect case)
Plots: elapsed time and speedup, with curves for Searching + Merging, BioSeq fetching, and Overall
Database: est_human, ~3.5 GBytes; queries: 441 test sequences, ~300 KBytes
- Overall speedup is approximately linear
mpiBLAST-g2 Performance Evaluation (worst case)
Plots: elapsed time and speedup, with curves for Searching + Merging, BioSeq fetching, and Overall
Database: drosophila NT, ~122 MBytes; queries: 441 test sequences, ~300 KBytes
- The overall speedup is limited by the unscalable BioSeq fetching (see the note below)
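One way to make that limitation precise (my gloss, not a formula from the slides) is Amdahl's law: if a fraction f of the per-query work (here, BioSeq fetching) does not parallelize, the speedup on n workers is bounded no matter how many workers are added.

```latex
% Speedup with a non-scaling fraction f of the work, on n workers:
S(n) = \frac{T(1)}{T(n)} = \frac{1}{f + \frac{1 - f}{n}},
\qquad \lim_{n \to \infty} S(n) = \frac{1}{f}.
```

Under this reading, the small drosophila database has a relatively large fetching fraction f, so S(n) flattens early, while for the 3.5 GB est_human database f is small and the overall curve stays close to linear.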
Summary
- Two Grid-enabled BLAST implementations (mpiBLAST-g2 and DIANE-BLAST) were introduced for efficiently handling BLAST jobs on the Grid
- Both implementations are based on the master-worker model for distributing BLAST jobs on the Grid
- mpiBLAST-g2 has good scalability and speedup in some cases
  - Requires a fault-tolerant MPI implementation for error recovery
  - In the unscalable cases, BioSeq fetching is the bottleneck
- DIANE-BLAST provides a flexible mechanism for error recovery
  - Any master-worker workflow can easily be plugged into this framework
  - Job thread control should be improved to achieve good performance and scalability
Data Grid for Digital Archives
Long-Term Archives for AS NDAP Contents
Project                                                                            | Total Files | Total Size (MB)
Treasured Historical Artifacts (珍藏歷史文物)                                        | 3,353       | 4,495,853.22
Administration (管理員)                                                             | 1,095       | 981.33
Taiwan Malacofauna (台灣貝類相)                                                      | 3,878       | 21,869.78
Modern China Historical Maps and Remote Sensing Imagery Archive (近代中國歷史地圖與遙測影像資訊典藏計畫) | 33,671 | 364,554.69
Language Archives Project (語言典藏計畫)                                             | 1           | 7.05
Technology R&D Sub-project (技術研發分項計畫)                                        | 39,315      | 98,246.45
Fish Database (魚類資料庫)                                                           | 32,070      | 4,199.32
Institute of Taiwan History (台史所)                                                 | 34,040      | 44,157.20
Native Plants of Taiwan (台灣本土植物)                                               | 31,027      | 1,578,654.76
Important Modern Diplomatic and Economic Archives (近代外交經濟重要檔案計畫)          | 603,997     | 20,601,428.38
Taiwan Indigenous Peoples (台灣原住民)                                               | 601,715     | 1,516,242.05
Total                                                                              | 1,384,162   | 28,726,194.23
Table I. Size of Digital Contents of NDAP
                     | 2002      | 2003      | 2004      | 2005      | Total
Total Data Size (GB) | 22,810.00 | 38,550.00 | 63,480.00 | 70,216.02 | 195,056.02
AS Production (GB)   | 22,800.68 | 31,622.17 | 47,430.79 | 55,757.47 | 157,611.11

Table II. Details of NDAP Production in 2005
          | Metadata Size (MB) | Metadata Records | Data Size (GB)
All Inst. | 56,204.40          | 1,035,538        | 70,216.02
AS        | 53,434.13          | 763,431          | 55,757.47
Atmospheric Sciences
Atmosphere Databank Architecture (diagram)
- SRB MCAT server: srb001; SRB servers and storage: ASCC (srb002, lcg00104 (TB), lcg00105 (TB), gis252 (TB), gis212 (TB)), NCU (databank), NTNU (dms), NTU (monsoon (TB), dbar_rs1, dbar_rs2), on Linux and Windows hosts
- Users access the databank via command lines, applications, and a portal/web client
- Use LAS (Live Access Server) to access datasets from the SRB system, integrated with Google Earth
Industrial Program
- NSC-Quanta Collaboration
  - To help the Quanta Blade System achieve the best performance for HPC and Grid computing
  - Quanta is the largest notebook manufacturer in the world
  - Participants: AS, NTU, NCTS, NTHU, NCHC
  - Scientific research disciplines: materials science, nanotechnology, computational chemistry, bioinformatics, engineering, etc.
  - Performance tuning, Grid benchmarking
e-Science Application Framework
Grid Application Architecture
Layered architecture to improve the usability of Grid applications
Two frameworks built on top of the current Grid middleware to
- provide a friendly graphical user interface
- handle Grid application logic in an efficient way
- reduce the effort of application gridification
Grid Application Logic Framework
An integration of tools for handling different Grid application computing models (i.e. application logics)
Through it, different application logics are handled on the Grid in an efficient way, with attention to
- stability
- scalability
- performance
Grid Application User Interface Framework
A container-like framework in which one can easily build an application-oriented graphical user interface to interact with the application logic running on the Grid
A set of APIs for graphical Grid application user interface development
Grid Application Portal (GAPortal)
A prototype of the Grid application user interface framework, based on the NRPGM BioPortal
A web-based portal of Grid applications
Grid-enabled components (see the proxy sketch after this list)
- Grid proxy management
  - proxy delegation (from the web-browser side to the Grid worker node)
  - proxy renewal (integration with MyProxy)
- Grid platform adapter
  - LCG/EGEE platform
  - GT-based Grids
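As an illustration of the proxy-renewal piece (not the actual GAPortal code), the sketch below shows a portal-side helper that retrieves a delegated credential from a MyProxy server with the standard myproxy-logon client, so long-running Grid jobs keep a valid proxy; the server name, account, and output path are hypothetical.

```python
# Illustrative sketch of proxy renewal via MyProxy.
# Assumptions: the myproxy-logon client is installed; hostname, username
# and output path below are hypothetical placeholders.
import subprocess

def renew_proxy(user, server="myproxy.example.org",
                hours=12, out="/tmp/x509up_portal"):
    """Fetch a fresh delegated proxy for `user` from a MyProxy server."""
    # NOTE: authentication (passphrase or an authorized-retriever credential)
    # depends on the MyProxy server policy and is not shown here.
    subprocess.run(
        [
            "myproxy-logon",
            "-s", server,        # MyProxy server holding the delegated credential
            "-l", user,          # account under which the credential was stored
            "-t", str(hours),    # requested proxy lifetime in hours
            "-o", out,           # where to write the renewed proxy
        ],
        check=True,
    )
    return out
```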
DIANE & GANGA
DIANE is a "skeleton" for efficiently handling master-worker type application logic in a distributed computing environment
GANGA is a lightweight framework aiming to distribute HEP analysis tasks across heterogeneous computing environments
DIANE + GANGA = a framework for efficiently handling master-worker type applications on the Grid
A collaboration with the LCG-ARDA team at CERN
DIANE/GANGA framework
Key features (a minimal pull-model sketch follows)
- Automatic load balancing by a task pull model
- Worker health detection
- User-defined failure recovery mechanism
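To make the task-pull idea concrete (a generic sketch, not the DIANE API), the master below hands tasks to whichever worker asks next, so faster workers naturally take more of the load, and a task whose worker fails is simply re-queued; the thread-based workers stand in for remote Grid workers.

```python
# Generic master-worker pull-model sketch (not DIANE itself): workers pull
# tasks from a shared queue, so load balancing is automatic, and a failed
# task is re-queued (a simple user-defined recovery policy).
import queue
import threading

tasks = queue.Queue()
results, results_lock = [], threading.Lock()
failed_once = set()

for i in range(20):                     # toy workload: 20 independent tasks
    tasks.put(i)

def run_task(n):
    # simulate a transient worker failure on the first attempt at task 13
    if n == 13 and n not in failed_once:
        failed_once.add(n)
        raise RuntimeError("simulated worker crash")
    return n * n

def worker():
    while True:
        try:
            n = tasks.get_nowait()
        except queue.Empty:
            return                      # no work left: the worker exits
        try:
            r = run_task(n)
            with results_lock:
                results.append((n, r))
        except Exception:
            tasks.put(n)                # recovery: re-queue for another pull

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))                  # all 20 tasks complete despite the crash
```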
Summary
A production Grid application environment is now available
- ~95% system reliability
- Test job success rate > 90%
An application-driven approach is the best policy for e-Science infrastructure development.
The success of e-Science lies in the worldwide ...