Investigation of prediction accuracy and the impact of sample size, ancestry and tissue in transcriptome wide association studies

Fryett, JJ; Morris, AP; Cordell, HJ

doi:10.1002/gepi.22290

NU author(s): James Fryett, Professor Heather Cordell

Licence

This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).

Abstract

In transcriptome-wide association studies (TWAS), gene expression values are predicted using genotype data, and tested for association with a phenotype. The power of this approach to detect associations relies, at least in part, on the accuracy of the prediction. Here we compare the prediction accuracy of six different methods - LASSO, Ridge regression, Elastic net, Best Linear Unbiased Predictor, Bayesian Sparse Linear Mixed Model and Random Forests - by performing cross-validation using data from the Geuvadis Project. We also examine prediction accuracy (a) at different sample sizes, (b) when ancestry of the prediction model training and testing populations is different, and (c) when the tissue used to train the model is different from the tissue to be predicted. We find that, for most genes, expression cannot be accurately predicted, but in general sparse statistical models tend to outperform polygenic models at prediction. Average prediction accuracy is reduced when the model training set size is reduced or when predicting across ancestries, and is marginally reduced when predicting across tissues. We conclude that using sparse statistical models and development of large reference panels across multiple ethnicities and tissues will lead to better prediction of gene expression, and thus may improve TWAS power.
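The cross-validation comparison described in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: it uses simulated genotypes rather than Geuvadis data, fits only ridge regression (one of the six methods named), and measures accuracy as the squared correlation between held-out predicted and simulated expression. The function names, allele frequency (0.3), effect sizes and penalty value are all assumptions chosen for illustration.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ridge_fit(X, y, lam):
    """Return (intercept, weights) minimising ||y - b0 - Xw||^2 + lam * ||w||^2."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Normal equations (Xc'Xc + lam * I) w = Xc'yc; lam > 0 keeps them solvable.
    A = [[sum(Xc[i][j] * Xc[i][k] for i in range(n)) + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    b = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    w = solve(A, b)
    b0 = y_mean - sum(w[j] * means[j] for j in range(p))
    return b0, w


def squared_correlation(a, b):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov * cov / (va * vb)


def cross_validated_r2(X, y, k=5, lam=1.0, seed=0):
    """k-fold cross-validation: every sample is predicted by a model that never saw it."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    preds = [0.0] * len(y)
    for fold in range(k):
        held_out = set(idx[fold::k])
        train = [i for i in idx if i not in held_out]
        b0, w = ridge_fit([X[i] for i in train], [y[i] for i in train], lam)
        for i in held_out:
            preds[i] = b0 + sum(wj * xj for wj, xj in zip(w, X[i]))
    return squared_correlation(preds, y)


def simulate(n, p, seed):
    """Hypothetical data: p SNP genotypes coded 0/1/2, two of which affect expression."""
    rng = random.Random(seed)
    X = [[int(rng.random() < 0.3) + int(rng.random() < 0.3) for _ in range(p)]
         for _ in range(n)]
    y = [1.0 * row[0] - 0.7 * row[3] + rng.gauss(0.0, 1.0) for row in X]
    return X, y


X, y = simulate(200, 20, seed=1)
for n_train in (50, 200):
    r2 = cross_validated_r2(X[:n_train], y[:n_train])
    print(f"samples={n_train}  cross-validated R^2={r2:.3f}")
```

Rerunning the loop with different sample counts gives a rough feel for the sample-size comparison in point (a). Comparing sparse against polygenic models would require swapping `ridge_fit` for a sparse estimator such as LASSO, which is not implemented here.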

Publication metadata

Author(s): Fryett JJ, Morris AP, Cordell HJ

Publication type: Article

Publication status: Published

Journal: Genetic Epidemiology

Year: 2020

Volume: 44

Issue: 5

Pages: 425-441

Print publication date: 01/07/2020

Online publication date: 19/03/2020

Acceptance date: 06/03/2020

Date deposited: 07/03/2020

ISSN (print): 0741-0395

ISSN (electronic): 1098-2272

Publisher: John Wiley & Sons, Inc.

URL: https://doi.org/10.1002/gepi.22290

DOI: 10.1002/gepi.22290
