DeepBiome: A Phylogenetic Tree Informed Deep Neural Network for Microbiome Data Analysis

Jing Zhai, Youngwon Choi, Xingyi Yang, Yin Chen, Kenneth Knox, Homer L. Twigg, Joong Ho Won, Hua Zhou, Jin J. Zhou

Research output: Contribution to journalArticlepeer-review

Abstract

Evidence linking the microbiome to human health is rapidly growing. The microbiome profile has the potential as a novel predictive biomarker for many diseases. However, tables of bacterial counts are typically sparse, and bacteria are classified within a hierarchy of taxonomic levels, ranging from species to phylum. Existing tools focus on identifying microbiome associations at either the community level or a specific, pre-defined taxonomic level. Incorporating the evolutionary relationship between bacteria can enhance data interpretation. This approach allows for aggregating microbiome contributions, leading to more accurate and interpretable results. We present DeepBiome, a phylogeny-informed neural network architecture, to predict phenotypes from microbiome counts and uncover the microbiome–phenotype association network. It utilizes microbiome abundance as input and employs phylogenetic taxonomy to guide the neural network’s architecture. Leveraging phylogenetic information, DeepBiome reduces the need for extensive tuning of the deep learning architecture, minimizes overfitting, and, crucially, enables the visualization of the path from microbiome counts to disease. It is applicable to both regression and classification problems. Simulation studies and real-life data analysis have shown that DeepBiome is both highly accurate and efficient. It offers deep insights into complex microbiome–phenotype associations, even with small to moderate training sample sizes. In practice, the specific taxonomic level at which microbiome clusters tag the association remains unknown. Therefore, the main advantage of the presented method over other analytical methods is that it offers an ecological and evolutionary understanding of host–microbe interactions, which is important for microbiome-based medicine. DeepBiome is implemented using Python packages Keras and TensorFlow. It is an open-source tool available at https://github.com/Young-won/DeepBiome.

Original languageEnglish (US)
JournalStatistics in Biosciences
DOIs
StateAccepted/In press - 2024

Keywords

  • Metagenomics
  • Mixed taxonomic levels
  • Neural networks
  • Phylogenetic tree
  • Prediction

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry, Genetics and Molecular Biology (miscellaneous)

Fingerprint

Dive into the research topics of 'DeepBiome: A Phylogenetic Tree Informed Deep Neural Network for Microbiome Data Analysis'. Together they form a unique fingerprint.

Cite this