Lookup NU author(s): Dr James Smith, Professor Paul Watson
It is argued that there is a significant class of pipelined, large-grain data flow computations whose wide-area distribution and long-running nature suggest a need for fault tolerance, but for which existing approaches appear either costly or incomplete. An example, which motivated this paper, is the execution of queries over distributed databases. This paper presents an approach that exploits some limited input from the application layer in order to implement a low-overhead recovery protocol for such data flow computations. Over a large range of possible data flow graphs, the protocol is shown to tolerate a single machine failure per execution of the data flow computation, and in many cases to provide a greater degree of fault tolerance.
Author(s): Smith J, Watson P
Publication type: Report
Publication status: Published
Series Title: School of Computing Science Technical Report Series
Year: 2004
Pages: 16
Print publication date: 01/04/2004
Source Publication Date: April 2004
Report Number: 836
Institution: School of Computing Science, University of Newcastle upon Tyne
Place Published: Newcastle upon Tyne
URL: http://www.cs.ncl.ac.uk/publications/trs/papers/836.pdf