Lookup NU author(s): Dr Paul Ezhilchelvan, Emeritus Professor Santosh Shrivastava
There is a large category of distributed systems that use component (e.g., process, object) replication for availability. A large part of the effort involved in crafting these systems lies in maintaining the cardinality of the set of replicas. For example, in primary-secondary replication, if one component crashes, a replacement must be created on some operational machine so that the set always contains at least two components. In systems where failed components are recreated on other machines, the internal composition of a component group (referred to as a unit) may be seen to 'walk' over a number of machines during normal system operation. We are interested in the problem of recovery after a total failure of a unit (a disaster); that is, recovery after all or a large number of unit members have failed or been partitioned such that the unit can no longer function normally. Disaster recovery requires that once sufficient members belonging to the unit have restarted or been reconnected, the unit should resume functioning without further delay. A particular requirement is that only the components belonging to the last unit configuration be part of the post-disaster unit configuration. This paper presents an algorithm which a component can execute to determine whether it belonged to the last unit configuration. The algorithm has been developed in the context of an asynchronous distributed system where message delays are unknown and therefore a slow component can appear as crashed or disconnected.
Author(s): Black D, Ezhilchelvan PD, Shrivastava SK
Publication type: Report
Publication status: Published
Series Title: Department of Computing Science Technical Report Series
Year: 1997
Pages: 19
Print publication date: 01/01/1997
Source Publication Date: 1997
Report Number: 602
Institution: Department of Computing Science, University of Newcastle upon Tyne
Place Published: Newcastle upon Tyne
URL: http://www.cs.ncl.ac.uk/publications/trs/papers/602.pdf