Motivation: The advent of high-throughput next generation sequencing technology has greatly promoted the field of metagenomics where previously unattainable information about microbial communities can be discovered. Detecting differentially abundant features (e.g. species or genes) plays a critical role in revealing the contributors (i.e. pathogens) to the biological or medical status of microbial samples. However, currently available statistical methods lack power in detecting differentially abundant features contrasting different biological or medical conditions, in particular, for time series metagenomic sequencing data. We have proposed a novel procedure, metaDprof, which is built upon a spline-based method assuming heterogeneous error, to meet the challenges of detecting differentially abundant features from metagenomic samples by comparing different biological/medical conditions across time. It contains two stages: (i) global detection on features and (ii) time interval detection for significant features. The detection procedures in both stages are based on sound statistical support. Results: Compared with existing methods the new method metaDprof shows the best performance in comprehensive simulation studies. Not only can it accurately detect features relating to the biological condition or disease status of samples but it also can accurately detect the starting and ending time points when the differences arise. The proposed method is also applied to a real metagenomic dataset and the results provide an interesting angle to understand the relationship between the microbiota in mouse gut and diet type.
ASJC Scopus subject areas
- Statistics and Probability
- Molecular Biology
- Computer Science Applications
- Computational Theory and Mathematics
- Computational Mathematics