EGEE Asia Pacific Regional Operation Center, Min-Hong Tsai, ASGC

SLIDE 1

EGEE-II INFSO-RI-031688

Enabling Grids for E-sciencE

www.eu-egee.org

EGEE Asia Pacific Regional Operation Center

Min-Hong Tsai

ASGC ISGC 2007 March 29, Taipei

http://www.eu-egee.org/

http://www.twgrid.org/aproc/

SLIDE 2

Agenda

  • APROC Introduction
  • Status
  • Joining EGEE
SLIDE 3

APROC Introduction I

  • APROC Mission
    – Provide deployment support facilitating Grid expansion
    – Maximize the availability of Grid services
  • Supports EGEE sites in Asia Pacific since April 2005
    – 20 production sites, 8 countries
    – 9 sites joined EGEE since last ISGC: recently HKU, KISTI
    – 3 sites in certification process
      • Philippines: Advanced Science and Technology Institute
      • Korea: KONKUK
      • Mongolia: (MAS IPT) Mongolian Academy of Sciences
SLIDE 4

APROC Services

  • Site Deployment Support
    – Registration
    – Installation
    – Certification
  • Operations Support
    – Monitoring, troubleshooting
    – Problem tracking
    – Software updates and security coordination
    – Regional VO services: VOMS and LFC
  • ASGCCA CA Service
    – Provides certificates for AP EGEE/LCG sites without a domestic CA
  • EGEE Operations
    – CIC-on-duty: EGEE global operations
    – Monitoring tool development: GStat and GGUS Search
    – TPM: front-line user support (Q4 2006)
    – OSCT: Incident Response duty (Dec 2006)

SLIDE 5

APROC Usage

  • New Active VOs: Belle and TWGrid
  • This year: 200 KSI2K Years
  • Last year: 41 KSI2K Years
SLIDE 6

APROC Availability

  • Daily snapshots of SAM results for the region
    – Availability increased to the 70-80% range, up from 60-70% half a year ago
  • CT: mostly replica management failures
    – Sensitive to Information System access/performance
    – Request: data management clients should be able to fail over to a secondary BDII
  • Network Issues
    – Often the root cause of CT, JL and JS
    – Network-congested sites set up a local top-level BDII
      • Increase default update timeout and breath time
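The failover request above could look roughly like the following sketch. It is not taken from any EGEE data management client: the endpoint names are hypothetical, and the query is passed in as a plain callable (here a stand-in called `fake_query`) so the failover logic does not depend on a particular LDAP library.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


class AllEndpointsFailed(Exception):
    """Raised when every BDII endpoint in the list has failed."""


def query_with_failover(
    endpoints: Iterable[str],
    query: Callable[[str], T],
) -> Tuple[str, T]:
    """Try each endpoint in order; return (endpoint, result) from the first success.

    `query` is a hypothetical callable that contacts one endpoint and raises
    OSError (which covers timeouts and refused connections) on failure.
    """
    errors: List[str] = []
    for endpoint in endpoints:
        try:
            return endpoint, query(endpoint)
        except OSError as exc:
            # Remember why this endpoint failed, then move on to the next one.
            errors.append(f"{endpoint}: {exc}")
    raise AllEndpointsFailed("; ".join(errors))


# Hypothetical endpoints: a primary top-level BDII and a secondary one.
BDII_ENDPOINTS = ["bdii-primary.example.org:2170", "bdii-secondary.example.org:2170"]


def fake_query(endpoint: str) -> str:
    """Stand-in for a real LDAP search; the primary is simulated as unreachable."""
    if endpoint.startswith("bdii-primary"):
        raise TimeoutError("simulated timeout")
    return "ok"


used, result = query_with_failover(BDII_ENDPOINTS, fake_query)
print(used, result)
```

Keeping the endpoint list ordered means the primary is always tried first, so a client only pays the timeout cost when the primary is actually unavailable.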

[Figure: two charts. (1) "avail" and "reliab" series, y-axis 20 to 100, x-axis 2005-04 to 2007-02. (2) Share of SAM results by category (SD, CT, JL, JS, ER, OK), y-axis 0% to 100%, x-axis 2005-04 to 2007-01. Chart annotations: 2.4, 2.6, 2.7, 3.0; Remove Slow BDII; JS from SSH upgrade; LHC OPN; Hardware Failure.]
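As a rough illustration of how availability and reliability figures can be derived from daily snapshots, the sketch below tallies one status label per day. The formulas are assumptions made for this example (availability counts OK days over all days; reliability excludes days labelled SD, read here as scheduled downtime), and the sample data is invented, not taken from the charts.

```python
from collections import Counter
from typing import Dict, Iterable

# Status labels as they appear in the chart legend. Treating "OK" as available
# and "SD" as scheduled downtime is an assumption made for this sketch.
STATUSES = ("SD", "CT", "JL", "JS", "ER", "OK")


def summarize(daily_status: Iterable[str]) -> Dict[str, float]:
    """Return availability and reliability (as fractions) for daily snapshots.

    availability = OK days / all days
    reliability  = OK days / (all days - scheduled-downtime days)
    """
    counts = Counter(daily_status)
    unknown = set(counts) - set(STATUSES)
    if unknown:
        raise ValueError(f"unexpected status labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no snapshots given")
    unscheduled = total - counts["SD"]
    return {
        "availability": counts["OK"] / total,
        # If every day was scheduled downtime, reliability is undefined; report 0.0.
        "reliability": counts["OK"] / unscheduled if unscheduled else 0.0,
    }


# Made-up ten-day sample, not data from the slide.
sample = ["OK", "OK", "CT", "OK", "SD", "OK", "JS", "OK", "OK", "OK"]
stats = summarize(sample)
print(stats)
```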

SLIDE 7

Monitoring and Notification

  • Planned integration of Asset DB
  • Nagios plugins developed
  • CE
  • LFC
  • VOMS
  • Storage
  • IT services
  • OS
  • Notification via Email

– SMS transmission device currently being tested

SLIDE 8

Nagios Regional Monitoring

  • Tests run at a higher frequency
    – Every 5-10 minutes
    – Faster response to faults
  • Add customized plugins
    – Run low-level tests for faster isolation of problems
    – Tests may not be available in global monitoring tools yet
    – Ability to run tests on the target host via NRPE
  • Management Interface
    – Acknowledgement
    – On-demand execution of tests
    – Historical availability
    – Test dependencies

http://lists.grid.sinica.edu.tw/apwiki/Nagios_monitoring_-_APROC_sites
http://lists.grid.sinica.edu.tw/apwiki/Nagios_Plugins_Description
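A customized low-level plugin of the kind described above might be sketched as follows. This is not one of the APROC plugins: the function names and the thresholds are made up for illustration. The only convention relied on is the standard Nagios plugin contract, where the exit code carries the state (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the first line of output is the status text.

```python
import socket
import time
from typing import Optional, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(elapsed: Optional[float], warn: float, crit: float) -> Tuple[int, str]:
    """Map a connect time in seconds (None = connection failed) to a Nagios state."""
    if elapsed is None:
        return CRITICAL, "CRITICAL - connection failed"
    if elapsed >= crit:
        return CRITICAL, f"CRITICAL - connect took {elapsed:.2f}s"
    if elapsed >= warn:
        return WARNING, f"WARNING - connect took {elapsed:.2f}s"
    return OK, f"OK - connect took {elapsed:.2f}s"


def time_connect(host: str, port: int, timeout: float = 10.0) -> Optional[float]:
    """Return the TCP connect time in seconds, or None if the connection fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def run_check(host: str, port: int, warn: float = 2.0, crit: float = 5.0) -> int:
    """Run the check, print the status line, and return the exit code.

    A real plugin would end with sys.exit(run_check(host, port)).
    """
    code, message = classify(time_connect(host, port), warn, crit)
    print(message)
    return code


# Demonstrate the classification with made-up timings (no network access).
for sample in (0.3, 2.5, None):
    print(classify(sample, warn=2.0, crit=5.0))
```

Separating the measurement (`time_connect`) from the state mapping (`classify`) keeps the threshold logic easy to test without a live service.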

SLIDE 9

Plans

  • Increase monitoring coverage
    – Information System
    – Network performance monitoring
      • Available/achievable bandwidth
      • Full mesh monitoring
  • Improve troubleshooting tools
    – http://lists.grid.sinica.edu.tw/apwiki/APROC/Troubleshooting_Guides
    – FAQ system
    – Service diagnostic scripts
  • Integration of ticketing system with GGUS
  • Training
    – EGEE Induction at GridAsia 2007, June 5, 2007, Singapore
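To give a sense of what full mesh monitoring implies for scheduling, the sketch below enumerates the site pairs a bandwidth test would have to cover. The site names are hypothetical and the function is an illustration, not part of any APROC tool.

```python
from itertools import combinations, permutations
from typing import List, Tuple


def mesh_pairs(sites: List[str], directed: bool = True) -> List[Tuple[str, str]]:
    """List every (source, destination) pair for a full-mesh test schedule.

    directed=True keeps A->B and B->A as separate tests, which matters when
    bandwidth differs by direction; directed=False tests each pair once.
    """
    if directed:
        return list(permutations(sites, 2))
    return list(combinations(sites, 2))


# Hypothetical site names, for illustration only.
sites = ["site-a", "site-b", "site-c", "site-d"]
for src, dst in mesh_pairs(sites, directed=False):
    print(f"{src} <-> {dst}")

# With the 20 production sites mentioned earlier, the mesh grows quickly.
n = 20
print(n * (n - 1), "directed tests,", n * (n - 1) // 2, "undirected tests")
```

Because the number of tests grows quadratically with the number of sites, test frequency and duration would need to be chosen with the region's limited bandwidth in mind.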

SLIDE 10

Joining EGEE Infrastructure

  • Contact APROC
  • If a domestic CA is not available
    – Register as an ASGCCA RA during ISGC
  • Dedicate an administrator with Unix experience
  • Allocate servers
    – 5: UI, CE, WN, DPM, MON
    – 3: CE/WN, MON, DPM
      • UI can be installed in a user account
      • Consider a virtual machine for MON
  • Study the user guide and installation manual
  • Send the configuration file to APROC for review before deployment
  • Complete the registration and certification process
SLIDE 11

Long Term Operations

  • Establish a domestic CA if none exists
  • Increase availability and resource levels
  • Establish a domestic operations structure
    – Operations procedures
    – Tools: monitoring and notification, ticketing system
    – User and administrator support
  • Training for administrators and users
  • Collaborate with APROC in regional operations
  • Q: Need for a regional experimental Grid?
SLIDE 12

Issues in Asia Pacific

  • No regional projects to promote collaboration in EGEE
  • Network bandwidth
    – Low capacity: regional and last mile
    – Usage-based billing
  • Need for training
    – Training for trainers
    – Application training
    – E-learning material
  • However, EGEE already provides
    – M/W development and integration
    – Operations structure, coordination and support
    – Close to 200 user communities

SLIDE 13

Summary

  • APROC provides EGEE operations support services to Asia Pacific
  • EGEE sites in the region have grown to 20, with utilization of 200 KSI2K years
  • We have also improved availability, but there is still significant room for improvement
  • We look forward to more sites joining EGEE in the region and the possibility of further collaboration
    – Applications
    – Operations
  • Feedback on what we can improve
SLIDE 14

Thank You for Your Attention!

  • Questions?
    – roc@lists.grid.sinica.edu.tw
    – http://www.twgrid.org/aproc/
  • Thanks to efforts from:
    – T1/APROC Team
      • Jason Shih
      • Dave Wei
      • Felix Lee
      • Joanna Huang
      • Aries Hong
      • Hung-Che Jen
      • Jinny Chien
      • Shu-Ting Liao
      • Yi-Ping Wu
      • Min Tsai