SLIDE 1
PROBLEM
- Reliable distributed systems must handle crash failures
- Application crashes, hardware failures, etc.
- Detecting failures can take longer than recovery
- Building a fast, reliable and unobtrusive failure detector is
challenging
- Distributed systems are built on an asynchronous
communication environment
- Existing failure detection techniques (e.g., end-to-end