TY - JOUR
T1 - Boosting gene mapping power and efficiency with efficient exact variance component tests of single nucleotide polymorphism sets
AU - Zhou, Jin J.
AU - Hu, Tao
AU - Qiao, Dandi
AU - Cho, Michael H.
AU - Zhou, Hua
N1 - Publisher Copyright: © 2016 by the Genetics Society of America.
PY - 2016/11
Y1 - 2016/11
N2 - Single nucleotide polymorphism (SNP) set tests have been a powerful method in analyzing next-generation sequencing (NGS) data. The popular sequence kernel association test (SKAT) method tests a set of variants as random effects in the linear mixed model setting. Its P-value is calculated based on asymptotic theory that requires a large sample size. Therefore, it is known that SKAT is conservative and can lose power at small or moderate sample sizes. Given the current cost of sequencing technology, scales of NGS are still limited. In this report, we derive and implement computationally efficient, exact (nonasymptotic) score (eScore), likelihood ratio (eLRT), and restricted likelihood ratio (eRLRT) tests, EXACTVCTEST, that can achieve high power even when sample sizes are small. We perform simulation studies under various genetic scenarios. Our EXACTVCTEST (i.e., eScore, eLRT, eRLRT) exhibits well-controlled type I error. Under the alternative model, eScore P-values are universally smaller than those from SKAT. eLRT and eRLRT demonstrate significantly higher power than eScore, SKAT, and SKAT optimal (SKAT-o) across all scenarios and various sample sizes. We applied these tests to an exome sequencing study. Our findings replicate previous results and shed light on rare variant effects within genes. The software package is implemented in the open source, high-performance technical computing language JULIA, and is freely available at https://github.com/Tao-Hu/VarianceComponentTest.jl. Analysis of each trait in the exome sequencing data set with 399 individuals and 16,619 genes takes around 1 min on a desktop computer.
AB - Single nucleotide polymorphism (SNP) set tests have been a powerful method in analyzing next-generation sequencing (NGS) data. The popular sequence kernel association test (SKAT) method tests a set of variants as random effects in the linear mixed model setting. Its P-value is calculated based on asymptotic theory that requires a large sample size. Therefore, it is known that SKAT is conservative and can lose power at small or moderate sample sizes. Given the current cost of sequencing technology, scales of NGS are still limited. In this report, we derive and implement computationally efficient, exact (nonasymptotic) score (eScore), likelihood ratio (eLRT), and restricted likelihood ratio (eRLRT) tests, EXACTVCTEST, that can achieve high power even when sample sizes are small. We perform simulation studies under various genetic scenarios. Our EXACTVCTEST (i.e., eScore, eLRT, eRLRT) exhibits well-controlled type I error. Under the alternative model, eScore P-values are universally smaller than those from SKAT. eLRT and eRLRT demonstrate significantly higher power than eScore, SKAT, and SKAT optimal (SKAT-o) across all scenarios and various sample sizes. We applied these tests to an exome sequencing study. Our findings replicate previous results and shed light on rare variant effects within genes. The software package is implemented in the open source, high-performance technical computing language JULIA, and is freely available at https://github.com/Tao-Hu/VarianceComponentTest.jl. Analysis of each trait in the exome sequencing data set with 399 individuals and 16,619 genes takes around 1 min on a desktop computer.
KW - Exact tests
KW - Linear mixed effect model
KW - Next-generation sequencing studies
KW - SNP set tests
KW - Small sample sizes
UR - http://www.scopus.com/inward/record.url?scp=84994908295&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84994908295&partnerID=8YFLogxK
U2 - 10.1534/genetics.116.190454
DO - 10.1534/genetics.116.190454
M3 - Article
C2 - 27646141
AN - SCOPUS:84994908295
SN - 0016-6731
VL - 204
SP - 921
EP - 931
JO - Genetics
JF - Genetics
IS - 3
ER -