Handling Overlapping Asymmetric Data Sets—A Twice Penalized P-Spline Approach

NU author(s): Matt McTeer, Professor Quentin Anstee, Professor Paolo Missier

Licence

This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).


Abstract

© 2024 by the authors.

Aims: Overlapping asymmetric data sets arise when a large cohort of observations has only a small amount of information recorded, and within this group there exists a smaller cohort for which extensive further information is available. Imputing the missing data is unwise if the cohort sizes differ substantially; therefore, we aim to develop a way of modelling the smaller cohort whilst taking the larger cohort into account.

Methods: Starting from traditional once penalized P-Spline approximations, we create a second penalty term based on discrepancies in the marginal value of covariates that exist in both cohorts. The resulting twice penalized P-Spline is designed firstly to prevent over- or under-fitting of the smaller cohort and secondly to take the larger cohort into account.

Results: Across a series of data simulations, penalty parameter tunings, and model adaptations, our twice penalized model offers up to a 58% and 46% improvement in model fit for a continuous and a binary response, respectively, against existing B-Spline and once penalized P-Spline methods. Applying our model to an individual’s risk of developing steatohepatitis, we report an improvement of over 65% relative to existing methods.

Conclusions: We propose a twice penalized P-Spline method which can vastly improve the model fit of overlapping asymmetric data sets upon a common predictive endpoint, without the need for missing data imputation.
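
The structure described under Methods can be sketched roughly as follows. This is an illustration only and not the authors' exact formulation. A standard once penalized P-spline estimates B-spline coefficients by penalized least squares with a difference penalty, and a twice penalized version adds a second term. All symbols below (B, theta, D_d, lambda_1, lambda_2, m_S, m_L, Delta) are notation assumed here for illustration. In particular, the second penalty is written generically, since its specific form is defined in the paper itself.

```latex
% Once penalized P-spline for a continuous response, smaller cohort only.
%   y         : response vector
%   B         : B-spline basis matrix evaluated at the covariates
%   \theta    : spline coefficients
%   D_d       : d-th order difference matrix acting on \theta
%   \lambda_1 : smoothing parameter, \lambda_1 \ge 0
\hat{\theta}_{\mathrm{once}}
  = \arg\min_{\theta}\;
    \lVert y - B\theta \rVert^{2}
    + \lambda_{1}\,\lVert D_{d}\,\theta \rVert^{2}

% Twice penalized version (assumed generic form).
%   m_S(\theta) : marginal value of a shared covariate implied by the
%                 smaller-cohort model
%   \hat{m}_L   : the same marginal value estimated from the larger cohort
%   \Delta      : a non-negative measure of discrepancy between the two
%   \lambda_2   : weight of the second penalty, \lambda_2 \ge 0
\hat{\theta}_{\mathrm{twice}}
  = \arg\min_{\theta}\;
    \lVert y - B\theta \rVert^{2}
    + \lambda_{1}\,\lVert D_{d}\,\theta \rVert^{2}
    + \lambda_{2}\,\Delta\bigl(m_{S}(\theta),\,\hat{m}_{L}\bigr)
```

Under these assumptions, setting lambda_2 = 0 recovers the once penalized fit, and lambda_1 = lambda_2 = 0 recovers an unpenalized B-spline regression. For a binary response, the squared-error term would typically be replaced by a negative binomial log-likelihood.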


Publication metadata

Author(s): McTeer M, Henderson R, Anstee QM, Missier P

Publication type: Article

Publication status: Published

Journal: Mathematics

Year: 2024

Volume: 12

Issue: 5

Online publication date: 05/03/2024

Acceptance date: 03/03/2024

Date deposited: 26/03/2024

ISSN (electronic): 2227-7390

Publisher: Multidisciplinary Digital Publishing Institute (MDPI)

URL: https://doi.org/10.3390/math12050777

DOI: 10.3390/math12050777

Data Access Statement: Data underpinning this study are not publicly available. The European NAFLD Registry protocol has been published in [25], including details of sample handling and processing, and the network of recruitment sites. Patient-level data will not be made available due to the various constraints imposed by ethics panels across all the different countries from which patients were recruited and the need to maintain patient confidentiality. The point of contact for any inquiries regarding the European NAFLD Registry is Quentin M. Anstee via email: NAFLD.Registry@newcastle.ac.uk


Funding

Funder name: European Union’s Horizon 2020