TY - JOUR
T1 - Evaluation of in Silico Multifeature Libraries for Providing Evidence for the Presence of Small Molecules in Synthetic Blinded Samples
AU - Nuñez, Jamie R.
AU - Colby, Sean M.
AU - Thomas, Dennis G.
AU - Tfaily, Malak M.
AU - Tolic, Nikola
AU - Ulrich, Elin M.
AU - Sobus, Jon R.
AU - Metz, Thomas O.
AU - Teeguarden, Justin G.
AU - Renslow, Ryan S.
N1 - Funding Information:
This research was partially supported by the Genomic Science Program (GSP), Office of Biological and Environmental Research (OBER), the U.S. Department of Energy (DOE), and is a contribution of the Pacific Northwest National Laboratory (PNNL) Metabolic and Spatial Interactions in Communities (MOSAIC) Scientific Focus Area (SFA). The Multi-Attribute Matching Engine (MAME) was fully developed under MOSAIC funding. Portions of this research were also supported by the National Institutes of Health, National Institute of Environmental Health Sciences grant U2CES030170, the United States Environmental Protection Agency (Interagency Agreement DW-089-92452001-0 in support of DOE Project No. 68955A), the National Cancer Institute (grant R03CA222443), and a PNNL Laboratory Directed Research and Development program, the Microbiomes in Transition (MinT) Initiative. This work was performed in the W. R. Wiley Environmental Molecular Sciences Laboratory (EMSL), a DOE national scientific user facility at the PNNL. The NWChem calculations were performed using the Cascade supercomputer at the EMSL. PNNL is operated by Battelle for the DOE under contract DE-AC05-76RL01830.
Publisher Copyright:
Copyright © 2019 American Chemical Society.
PY - 2019/9/23
Y1 - 2019/9/23
N2 - The current gold standard for unambiguous molecular identification in metabolomics analysis is comparing two or more orthogonal properties from the analysis of authentic reference materials (standards) to experimental data acquired in the same laboratory with the same analytical methods. This represents a significant limitation for comprehensive chemical identification of small molecules in complex samples. The process is time consuming and costly, and the majority of molecules are not yet represented by standards. Thus, there is a need to assemble evidence for the presence of small molecules in complex samples through the use of libraries containing calculated chemical properties. To address this need, we developed a Multi-Attribute Matching Engine (MAME) and a library derived in part from our in silico chemical library engine (ISiCLE). Here, we describe an initial evaluation of these methods in a blinded analysis of synthetic chemical mixtures as part of the U.S. Environmental Protection Agency's (EPA) Non-Targeted Analysis Collaborative Trial (ENTACT, Phase 1). For molecules in all mixtures, the initial blinded false negative rate (FNR), false discovery rate (FDR), and accuracy were 57%, 77%, and 91%, respectively. For high evidence scores, the FDR was 35%. After unblinding of the sample compositions, we optimized the scoring parameters to better exploit the available evidence and increased the accuracy for molecules suspected as present. The final FNR, FDR, and accuracy were 67%, 53%, and 96%, respectively. For high evidence scores, the FDR was 10%. This study demonstrates that multiattribute matching methods in conjunction with in silico libraries may one day enable reduced reliance on experimentally derived libraries for building evidence for the presence of molecules in complex samples.
UR - http://www.scopus.com/inward/record.url?scp=85071870177&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85071870177&partnerID=8YFLogxK
U2 - 10.1021/acs.jcim.9b00444
DO - 10.1021/acs.jcim.9b00444
M3 - Article
C2 - 31430141
AN - SCOPUS:85071870177
SN - 1549-9596
VL - 59
SP - 4052
EP - 4060
JO - Journal of Chemical Information and Modeling
JF - Journal of Chemical Information and Modeling
IS - 9
ER -