Lookup NU author(s): Rem Gensh, Dr Ashur Rafiev, Emeritus Professor Alexander Romanovsky, Dr Fei Xia, Professor Alex Yakovlev
This is the final published version of a conference proceedings (inc. abstract) that has been published in its final definitive form by IEEE Computer Society, 2017.
For re-use rights please refer to the publisher's terms and conditions.
The optimality and maintainability of fault tolerance mechanisms in a computer system have typically not been a major topic of concern, mostly because fault tolerance is a non-functional system requirement. This paper proposes a Holistic Fault Tolerance architecture, based on centralised fault tolerance management, with related functionality distributed across the entire system. The most suitable error detection and error recovery strategies for a given application are chosen by a special crosscutting controller depending on error rates, system performance and resource utilisation requirements. We discuss the motivation for introducing this holistic fault tolerance architecture and reason about its benefits from the point of view of optimal system operation and improved maintainability. The advantages and possible implementation challenges of the proposed approach are demonstrated by a real-world application.
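As an illustration only, the controller's selection step described in the abstract might be sketched as follows. This is not the authors' implementation: the `Strategy` and `SystemState` types, the `choose_strategy` function, the selection rule, and all numeric values are assumptions introduced here, and checkpointing and triple modular redundancy are used simply as familiar example strategies.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Strategy:
    """A candidate detection/recovery strategy (all fields are hypothetical)."""
    name: str
    coverage: float        # fraction of errors detected and recovered, 0..1
    time_overhead: float   # relative slowdown, e.g. 0.15 means 15% slower
    resource_cost: float   # relative extra resource use, e.g. 2.0 means 200% more


@dataclass(frozen=True)
class SystemState:
    """Observed error rate plus the requirements the controller must respect."""
    error_rate: float               # observed errors per unit of work
    max_time_overhead: float        # performance requirement
    max_resource_cost: float        # resource utilisation requirement
    max_residual_error_rate: float  # tolerated rate of unhandled errors


def choose_strategy(strategies: Sequence[Strategy],
                    state: SystemState) -> Optional[Strategy]:
    """Pick a strategy for the current state (an assumed rule, not the paper's)."""
    # Keep only strategies that fit the performance and resource budgets.
    feasible = [s for s in strategies
                if s.time_overhead <= state.max_time_overhead
                and s.resource_cost <= state.max_resource_cost]
    if not feasible:
        return None

    def residual(s: Strategy) -> float:
        # Errors that slip through: observed rate times the uncovered fraction.
        return state.error_rate * (1.0 - s.coverage)

    adequate = [s for s in feasible
                if residual(s) <= state.max_residual_error_rate]
    if adequate:
        # Among adequate strategies, prefer the cheapest one.
        return min(adequate, key=lambda s: (s.time_overhead, s.resource_cost))
    # Nothing meets the reliability target: fall back to the best coverage.
    return max(feasible, key=lambda s: s.coverage)


# Hypothetical strategy catalogue with made-up figures.
CATALOGUE = [
    Strategy("none", coverage=0.0, time_overhead=0.0, resource_cost=0.0),
    Strategy("checkpoint_rollback", coverage=0.9, time_overhead=0.15, resource_cost=0.2),
    Strategy("triple_modular_redundancy", coverage=0.99, time_overhead=0.05, resource_cost=2.0),
]
```

In this sketch the controller would be called again whenever the observed error rate or the requirements change, so a system running with no protection at a low error rate could switch to a stronger strategy as errors become more frequent.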
Author(s): Gensh R, Rafiev A, Romanovsky A, Garcia A, Xia F, Yakovlev A
Publication type: Conference Proceedings (inc. Abstract)
Publication status: Published
Conference Name: 18th IEEE International Symposium on High-Assurance Systems Engineering (HASE 2017)
Year of Conference: 2017
Pages: 5-8
Online publication date: 27/04/2017
Acceptance date: 02/04/2016
Date deposited: 27/06/2017
Publisher: IEEE Computer Society
URL: https://doi.org/10.1109/HASE.2017.13
DOI: 10.1109/HASE.2017.13
ISBN: 9781509046355