TY - JOUR
T1 - Denoising Autoencoder, A Deep Learning Algorithm, Aids the Identification of A Novel Molecular Signature of Lung Adenocarcinoma
AU - Wang, Jun
AU - Xie, Xueying
AU - Shi, Junchao
AU - He, Wenjun
AU - Chen, Qi
AU - Chen, Liang
AU - Gu, Wanjun
AU - Zhou, Tong
N1 - Funding Information:
This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61372164 to XX, 61471112 to WG, and 61571109 to WG), the Key R&D Program of Jiangsu Province, China (Grant No. BE2016002-3 to WG), the Fundamental Research Funds for the Central Universities, China (Grant No. 2242017K3DN04 to WG), the Clinical Research Cultivation Program, China (Grant No. 2017CX010 to LC), and the Social Development Foundation of Jiangsu Province – Clinical Frontier Technology, China (Grant No. BE2018746 to LC).
Publisher Copyright:
© 2020 The Authors
PY - 2020/8
Y1 - 2020/8
N2 - Precise biomarker development is a key step in disease management. However, most published biomarkers were derived from a relatively small number of samples using supervised approaches. Recent advances in unsupervised machine learning promise to leverage very large datasets for making better predictions of disease biomarkers. The denoising autoencoder (DA) is an unsupervised deep learning algorithm, a stochastic version of the autoencoder. The principle of the DA is to force the hidden layer of the autoencoder to capture more robust features by reconstructing a clean input from a corrupted one. Here, a DA model was applied to analyze integrated transcriptomic data from 13 published lung cancer studies, which consisted of 1916 human lung tissue samples. Using DA, we discovered a molecular signature composed of multiple genes for lung adenocarcinoma (ADC). In independent validation cohorts, the proposed molecular signature proved to be an effective classifier for lung cancer histological subtypes. The signature also predicts clinical outcome in lung ADC independently of traditional prognostic factors. More importantly, this signature exhibits superior prognostic power compared with other published prognostic genes. Our study suggests that unsupervised learning is helpful for biomarker development in the era of precision medicine.
AB - Precise biomarker development is a key step in disease management. However, most published biomarkers were derived from a relatively small number of samples using supervised approaches. Recent advances in unsupervised machine learning promise to leverage very large datasets for making better predictions of disease biomarkers. The denoising autoencoder (DA) is an unsupervised deep learning algorithm, a stochastic version of the autoencoder. The principle of the DA is to force the hidden layer of the autoencoder to capture more robust features by reconstructing a clean input from a corrupted one. Here, a DA model was applied to analyze integrated transcriptomic data from 13 published lung cancer studies, which consisted of 1916 human lung tissue samples. Using DA, we discovered a molecular signature composed of multiple genes for lung adenocarcinoma (ADC). In independent validation cohorts, the proposed molecular signature proved to be an effective classifier for lung cancer histological subtypes. The signature also predicts clinical outcome in lung ADC independently of traditional prognostic factors. More importantly, this signature exhibits superior prognostic power compared with other published prognostic genes. Our study suggests that unsupervised learning is helpful for biomarker development in the era of precision medicine.
KW - Denoising autoencoder
KW - Lung cancer
KW - Molecular signature
KW - Prognosis
KW - Unsupervised learning
UR - http://www.scopus.com/inward/record.url?scp=85106257606&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85106257606&partnerID=8YFLogxK
U2 - 10.1016/j.gpb.2019.02.003
DO - 10.1016/j.gpb.2019.02.003
M3 - Article
C2 - 33346087
AN - SCOPUS:85106257606
SN - 1672-0229
VL - 18
SP - 468
EP - 480
JO - Genomics, Proteomics and Bioinformatics
JF - Genomics, Proteomics and Bioinformatics
IS - 4
ER -