NU author(s): Dr Jacek Cala, Professor Paolo Missier
This is the authors' accepted manuscript of a conference proceedings (inc. abstract) that has been published in its final definitive form by Springer, 2018.
For re-use rights please refer to the publisher's terms and conditions.
Many resource-intensive analytics processes evolve over time following new versions of the reference datasets and software dependencies they use. We focus on scenarios in which any version change has the potential to affect many outcomes, as is the case for instance in high-throughput genomics, where the same process is used to analyse large cohorts of patient genomes, or cases. As any version change is unlikely to affect the entire population, an efficient strategy for restoring the currency of the outcomes first requires identifying the scope of a change, i.e., the subset of affected data products. In this paper we describe a generic and reusable provenance-based approach to address this scope discovery problem. It applies to a scenario where the process consists of complex hierarchical components, where different input cases are processed using different version configurations of each component, and where separate provenance traces are collected for the executions of each of the components. We show how a new data structure, called a restart tree, is computed and exploited to manage the change scope discovery problem.
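The abstract does not spell out how a restart tree is built. Purely as a rough illustration of the scope discovery idea, the sketch below assumes a simplified, hypothetical model in which each case's provenance is a tree of component executions, each recording the dependency versions it used. For every case it keeps only the paths from the top-level execution down to executions that used the changed version, and it drops unaffected cases entirely. The names `Execution`, `restart_tree` and `change_scope`, and the example dependency names and versions, are illustrative assumptions, not the authors' data structure or algorithm.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Execution:
    """Hypothetical provenance record for one execution of a (possibly composite) component."""
    component: str
    deps: Dict[str, str] = field(default_factory=dict)  # dependency name -> version used
    children: List["Execution"] = field(default_factory=list)  # sub-component executions


def restart_tree(ex: Execution, dep: str, old_version: str) -> Optional[Execution]:
    """Return a pruned copy of `ex` containing only the paths that lead to executions
    which used `dep` at `old_version`, or None if this subtree is unaffected."""
    # Recurse first, so that an ancestor is kept whenever any descendant is affected.
    kept = [t for t in (restart_tree(c, dep, old_version) for c in ex.children) if t is not None]
    used_old = ex.deps.get(dep) == old_version
    if not used_old and not kept:
        return None
    return Execution(ex.component, dict(ex.deps), kept)


def change_scope(cases: Dict[str, Execution], dep: str, old_version: str) -> Dict[str, Execution]:
    """Map each affected case ID to its restart tree; unaffected cases are omitted."""
    scope = {}
    for case_id, root in cases.items():
        tree = restart_tree(root, dep, old_version)
        if tree is not None:
            scope[case_id] = tree
    return scope


# Illustrative cohort: two cases processed with different version configurations.
cases = {
    "patient-1": Execution("pipeline", children=[
        Execution("align", {"ref_genome": "GRCh37"}),
        Execution("annotate", {"clinvar": "2017-10"}),
    ]),
    "patient-2": Execution("pipeline", children=[
        Execution("align", {"ref_genome": "GRCh38"}),
        Execution("annotate", {"clinvar": "2018-01"}),
    ]),
}

scope = change_scope(cases, "clinvar", "2017-10")
print(sorted(scope))  # ['patient-1']
print([c.component for c in scope["patient-1"].children])  # ['annotate']
```

In this toy model a change to one dependency version selects only the cases whose traces record that version, and within each selected case only the sub-executions that would need to be re-run, together with their enclosing parents.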
Author(s): Cala J, Missier P
Editor(s): Belhajjame K; Gehani A; Alper P
Publication type: Conference Proceedings (inc. Abstract)
Publication status: Published
Conference Name: 7th International Provenance and Annotation Workshop, IPAW 2018
Year of Conference: 2018
Pages: 3-15
Print publication date: 06/09/2018
Online publication date: 06/09/2018
Acceptance date: 09/01/2018
Date deposited: 02/06/2018
ISSN: 0302-9743
Publisher: Springer
URL: https://doi.org/10.1007/978-3-319-98379-0_1
DOI: 10.1007/978-3-319-98379-0_1
Notes: From IPAW 2018: Provenance and Annotation of Data and Processes
Series Title: Lecture Notes in Computer Science (LNCS)
ISBN: 9783319983783