TY - JOUR
T1 - Ordered multinomial regression for genetic association analysis of ordinal phenotypes at Biobank scale
AU - German, Christopher A.
AU - Sinsheimer, Janet S.
AU - Klimentidis, Yann C.
AU - Zhou, Hua
AU - Zhou, Jin J.
N1 - Publisher Copyright: © 2019 Wiley Periodicals, Inc.
PY - 2020/4/1
Y1 - 2020/4/1
N2 - Logistic regression is the primary analysis tool for binary traits in genome-wide association studies (GWAS). Multinomial regression extends logistic regression to multiple categories. However, many phenotypes more naturally take ordered, discrete values. Examples include (a) subtypes defined from multiple sources of clinical information and (b) derived phenotypes generated by specific phenotyping algorithms for electronic health records (EHR). GWAS of ordinal traits have been problematic. Dichotomizing can lead to a range of arbitrary cutoff values, generating inconsistent, hard to interpret results. Using multinomial regression ignores trait value hierarchy and potentially loses power. Treating ordinal data as quantitative can lead to misleading inference. To address these issues, we analyze ordinal traits with an ordered, multinomial model. This approach increases power and leads to more interpretable results. We derive efficient algorithms for computing test statistics, making ordinal trait GWAS computationally practical for Biobank scale data. Our method is available as a Julia package OrdinalGWAS.jl. Application to a COPDGene study confirms previously found signals based on binary case–control status, but with more significance. Additionally, we demonstrate the capability of our package to run on UK Biobank data by analyzing hypertension as an ordinal trait.
KW - electronic health record
KW - genome-wide association study
KW - ordered multinomial regression
UR - http://www.scopus.com/inward/record.url?scp=85077372481&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85077372481&partnerID=8YFLogxK
U2 - 10.1002/gepi.22276
DO - 10.1002/gepi.22276
M3 - Article
C2 - 31879980
AN - SCOPUS:85077372481
SN - 0741-0395
VL - 44
SP - 248
EP - 260
JO - Genetic Epidemiology
JF - Genetic Epidemiology
IS - 3
ER -