Core Service Failures: Results from TIC WG


Enabling Grids for E-sciencE
Core Service Failures: Results from TIC WG
Marcin Radecki
1st OAT Meeting, CERN, 6-7 May 2008
www.eu-egee.org
EGEE-II INFSO-RI-031688
EGEE and gLite are registered trademarks


Where is the problem?
• Service availability monitoring is not only a function of site services
  – Example: the lcg-cr command uses the LFC and the top-level BDII, which are not in the administrative domain of the site
[Figure labels: VO LFC; Reality; Picture seen by monitoring; site boundary]

By what means can we tell that a problem occurred?
Three ways of determining a Core Service problem:
1. Error message coming from gLite
   1. Improved SAM sensors deployed at regional SAM instance in CE
   2. Ambiguous error messages (GGUS #33813)
   3. Network stack problem: not enough information passed at the application layer
2. Last Core Service status in SAM DB
   – Can be used by monitoring tools to avoid raising alarms on OK sites
   – Solution may pose unacceptable load on SAM DB
3. Heuristics by which a Core Service failure is seen as many sites failing
   – Problem with services running on a performance edge
   – Problem with bad firewall config etc.
Without the error message as a reliable source of information, we can only reach a limited level of certainty.
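
The third approach, the "many sites failing" heuristic, could be sketched roughly as follows. This is only an illustration: the function name, the input dictionaries, the site and service names, and the `threshold` and `min_sites` parameters are all assumptions and do not come from the slides or from SAM.

```python
from collections import defaultdict


def suspect_core_services(results, site_deps, threshold=0.8, min_sites=3):
    """Return core services whose dependent sites mostly fail.

    results:   dict mapping site name -> True (test passed) / False (failed)
    site_deps: dict mapping site name -> set of core service names that the
               site's test relies on (hypothetical names, e.g. an LFC or BDII)
    threshold: fraction of dependent sites that must fail (assumed value)
    min_sites: minimum number of dependent sites needed to judge (assumed value)
    """
    failed = defaultdict(int)
    total = defaultdict(int)
    for site, ok in results.items():
        for service in site_deps.get(site, ()):
            total[service] += 1
            if not ok:
                failed[service] += 1

    suspects = {}
    for service, n in total.items():
        ratio = failed[service] / n
        # Too few dependent sites gives no basis for a verdict either way.
        if n >= min_sites and ratio >= threshold:
            suspects[service] = ratio
    return suspects


# Hypothetical test results and dependencies for five sites.
results = {"site-a": False, "site-b": False, "site-c": False,
           "site-d": True, "site-e": True}
site_deps = {
    "site-a": {"lfc-1", "bdii-1"},
    "site-b": {"lfc-1", "bdii-1"},
    "site-c": {"lfc-1", "bdii-2"},
    "site-d": {"lfc-2", "bdii-1"},
    "site-e": {"lfc-2", "bdii-2"},
}
suspects = suspect_core_services(results, site_deps)
print(suspects)
```

As the slide notes, such a heuristic only gives a level of certainty: a service running on its performance edge or a bad firewall configuration can make many sites fail without the Core Service itself being down.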

Conclusion: How to improve?
• Long-term goals
  – Common error message format from all gLite components
  – Avoid design with dependencies on remote services, or locate them within the site boundaries if possible
  – If not possible, improve reliability of Core Services
• Short-term goals
  – Limit impact of network on monitoring
    ▪ locate monitoring closer to sites
    ▪ integrate network monitoring results into service monitoring
  – Put more "intelligence" into monitoring; in case of a service failure:
    ▪ compare last core service result in SAM DB
    ▪ run the test twice
    ▪ run additional check on the dependency service
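
The three "intelligence" checks listed for a service failure can be combined into a small triage routine. The sketch below is a hedged illustration, not the working group's implementation: the order of the checks, the `Verdict` values, and the callables passed in are all assumptions, and no real SAM DB API is used.

```python
from enum import Enum


class Verdict(Enum):
    SITE_ALARM = "raise alarm against the site"
    CORE_SERVICE = "suspect core service, do not blame the site"
    TRANSIENT = "transient failure, no alarm"


def triage_failure(run_test, last_core_status_ok, check_dependency):
    """Decide what to do after a site test has failed once.

    run_test:            callable, returns True if the site test passes
    last_core_status_ok: last known core service status, True meaning OK
                         (assumed to have been read from the SAM DB beforehand)
    check_dependency:    callable, returns True if a direct probe of the
                         dependency service succeeds
    """
    # 1. Compare with the last core service result.
    if not last_core_status_ok:
        return Verdict.CORE_SERVICE
    # 2. Run the test a second time to filter out transient problems.
    if run_test():
        return Verdict.TRANSIENT
    # 3. Run an additional check on the dependency service.
    if not check_dependency():
        return Verdict.CORE_SERVICE
    return Verdict.SITE_ALARM


# Hypothetical scenario: core service was last seen OK, the re-run still
# fails, and the direct probe of the dependency succeeds.
verdict = triage_failure(lambda: False, True, lambda: True)
print(verdict.value)
```

Checking the stored status first avoids re-running tests when the Core Service is already known to be down, although reading it on every failure is exactly the lookup that an earlier slide warns may load the SAM DB.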

References
• Full report on Core Services failures
  – http://galaxy.agh.edu.pl/~radecki/cod-cs-failures.pdf
• TIC web page on Core Service
  – http://goc.grid.sinica.edu.tw/gocwiki/Tools_Improvements_for_COD/FailuresDueToCoreServices
