TY - JOUR
T1 - Identification and quantitation of clinically relevant microbes in patient samples
T2 - Comparison of three k-mer based classifiers for speed, accuracy, and sensitivity
AU - Watts, George S.
AU - Thornton, James E.
AU - Youens-Clark, Ken
AU - Ponsero, Alise J.
AU - Slepian, Marvin J.
AU - Menashi, Emmanuel
AU - Hu, Charles
AU - Deng, Wuquan
AU - Armstrong, David G.
AU - Reed, Spenser
AU - Cranmer, Lee D.
AU - Hurwitz, Bonnie L.
N1 - Funding Information:
Sequence data was generated by the Genomics Shared Resource at the University of Arizona Cancer Center, supported by the Southwest Environmental Health Sciences Center, NIEHS grant ES06694, and the Arizona Cancer Center, NIH grant CA23074. James Thornton Jr. was supported by start-up funds provided by the University of Arizona Bio5 Institute. FN sample collection and sequencing support were provided by a New Idea Award to George Watts from the Leukemia and Lymphoma Society. VAP sample collection and sequencing support were provided by Flinn Foundation award #2097 to Emmanuel Menashi. DFU sample processing and sequencing support were provided by start-up funds awarded to Bonnie Hurwitz by the University of Arizona. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Publisher Copyright:
© 2019 Watts et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2019
Y1 - 2019
N2 - Infections are a serious health concern worldwide, particularly in vulnerable populations such as the immunocompromised, elderly, and young. Advances in metagenomic sequencing availability, speed, and decreased cost offer the opportunity to supplement or even replace culture-based identification of pathogens with DNA sequence-based diagnostics. Adopting metagenomic analysis for clinical use requires that all aspects of the workflow are optimized and tested, including data analysis and computational time and resources. We tested the accuracy, sensitivity, and resource requirements of three top metagenomic taxonomic classifiers that use fast k-mer based algorithms: Centrifuge, CLARK, and KrakenUniq. Binary mixtures of bacteria showed all three reliably identified organisms down to 1% relative abundance, while only the relative abundance estimates of Centrifuge and CLARK were accurate. All three classifiers identified the organisms present in their default databases from a mock bacterial community of 20 organisms, but only Centrifuge had no false positives. In addition, Centrifuge required far fewer computational resources and far less time for analysis. Centrifuge analysis of metagenomes obtained from samples of VAP, infected DFUs, and FN showed Centrifuge identified pathogenic bacteria and one virus that were corroborated by culture or a clinical PCR assay. Importantly, in both diabetic foot ulcer patients, metagenomic sequencing identified pathogens 4-6 weeks before culture. Finally, we show that Centrifuge results were minimally affected by elimination of time-consuming read quality control and host screening steps.
AB - Infections are a serious health concern worldwide, particularly in vulnerable populations such as the immunocompromised, elderly, and young. Advances in metagenomic sequencing availability, speed, and decreased cost offer the opportunity to supplement or even replace culture-based identification of pathogens with DNA sequence-based diagnostics. Adopting metagenomic analysis for clinical use requires that all aspects of the workflow are optimized and tested, including data analysis and computational time and resources. We tested the accuracy, sensitivity, and resource requirements of three top metagenomic taxonomic classifiers that use fast k-mer based algorithms: Centrifuge, CLARK, and KrakenUniq. Binary mixtures of bacteria showed all three reliably identified organisms down to 1% relative abundance, while only the relative abundance estimates of Centrifuge and CLARK were accurate. All three classifiers identified the organisms present in their default databases from a mock bacterial community of 20 organisms, but only Centrifuge had no false positives. In addition, Centrifuge required far fewer computational resources and far less time for analysis. Centrifuge analysis of metagenomes obtained from samples of VAP, infected DFUs, and FN showed Centrifuge identified pathogenic bacteria and one virus that were corroborated by culture or a clinical PCR assay. Importantly, in both diabetic foot ulcer patients, metagenomic sequencing identified pathogens 4-6 weeks before culture. Finally, we show that Centrifuge results were minimally affected by elimination of time-consuming read quality control and host screening steps.
UR - http://www.scopus.com/inward/record.url?scp=85076108681&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85076108681&partnerID=8YFLogxK
U2 - 10.1371/journal.pcbi.1006863
DO - 10.1371/journal.pcbi.1006863
M3 - Article
C2 - 31756192
AN - SCOPUS:85076108681
SN - 1553-734X
VL - 15
JO - PLoS Computational Biology
JF - PLoS Computational Biology
IS - 11
M1 - e1006863
ER -