TY - JOUR
T1 - DeepBiome
T2 - A Phylogenetic Tree Informed Deep Neural Network for Microbiome Data Analysis
AU - Zhai, Jing
AU - Choi, Youngwon
AU - Yang, Xingyi
AU - Chen, Yin
AU - Knox, Kenneth
AU - Twigg, Homer L.
AU - Won, Joong Ho
AU - Zhou, Hua
AU - Zhou, Jin J.
N1 - Publisher Copyright: © The Author(s) under exclusive licence to International Chinese Statistical Association 2024.
PY - 2024
Y1 - 2024
AB - Evidence linking the microbiome to human health is rapidly growing. The microbiome profile has the potential as a novel predictive biomarker for many diseases. However, tables of bacterial counts are typically sparse, and bacteria are classified within a hierarchy of taxonomic levels, ranging from species to phylum. Existing tools focus on identifying microbiome associations at either the community level or a specific, pre-defined taxonomic level. Incorporating the evolutionary relationship between bacteria can enhance data interpretation. This approach allows for aggregating microbiome contributions, leading to more accurate and interpretable results. We present DeepBiome, a phylogeny-informed neural network architecture, to predict phenotypes from microbiome counts and uncover the microbiome–phenotype association network. It utilizes microbiome abundance as input and employs phylogenetic taxonomy to guide the neural network’s architecture. Leveraging phylogenetic information, DeepBiome reduces the need for extensive tuning of the deep learning architecture, minimizes overfitting, and, crucially, enables the visualization of the path from microbiome counts to disease. It is applicable to both regression and classification problems. Simulation studies and real-life data analysis have shown that DeepBiome is both highly accurate and efficient. It offers deep insights into complex microbiome–phenotype associations, even with small to moderate training sample sizes. In practice, the specific taxonomic level at which microbiome clusters tag the association remains unknown. Therefore, the main advantage of the presented method over other analytical methods is that it offers an ecological and evolutionary understanding of host–microbe interactions, which is important for microbiome-based medicine. DeepBiome is implemented using Python packages Keras and TensorFlow. It is an open-source tool available at https://github.com/Young-won/DeepBiome.
KW - Metagenomics
KW - Mixed taxonomic levels
KW - Neural networks
KW - Phylogenetic tree
KW - Prediction
UR - http://www.scopus.com/inward/record.url?scp=85195898614&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85195898614&partnerID=8YFLogxK
U2 - 10.1007/s12561-024-09434-9
DO - 10.1007/s12561-024-09434-9
M3 - Article
AN - SCOPUS:85195898614
SN - 1867-1764
JO - Statistics in Biosciences
JF - Statistics in Biosciences
ER -