TY - JOUR
T1 - CAT-GxD
T2 - Centralized access to gene expression datasets
AU - Roxas, Bryan Angelo P.
AU - Roxas, Jennifer Lising
AU - Guo, Jessica S.
AU - LeBauer, David S.
AU - McCarthy, Fiona
AU - Vedantam, Gayatri
AU - Viswanathan, V. K.
N1 - Publisher Copyright: © 2025
PY - 2025/12
Y1 - 2025/12
AB - Objectives: Extensive multi-omic bacterial gene expression datasets are publicly available, but tools that unify these datasets for interpretation and hypothesis testing are limited. The complexity of these datasets and the need for specialized bioinformatics and programming expertise pose significant barriers for researchers attempting to query and analyze them. Our objective was to develop an integrated search engine that simplifies access to publicly available gene expression data to facilitate comparison and analysis of multi-omics datasets. Methods: We developed the Centralized Access to Gene Expression Datasets (CAT-GxD) search engine to provide integrated access to, and facilitate analysis of, publicly available transcriptomics and proteomics datasets of the CDC Urgent Threat pathogen Clostridioides difficile. Manual data curation was performed to integrate and standardize all 74 nonredundant transcriptomics and quantitative proteomics datasets available in the Gene Expression Omnibus (GEO) database and through the ProteomeXchange Consortium. The CAT-GxD search engine, built on the open-source R Shiny framework, is available at https://viz.datascience.arizona.edu/catgxd/. Results: CAT-GxD successfully consolidated disparate transcriptomics and proteomics datasets, supporting interpretation and hypothesis testing. CAT-GxD provides customizable visualization of gene expression data under different conditions. We demonstrate the utility of CAT-GxD in analyzing the contribution of RNA polymerase, nitrogen-limitation N (RpoN) to C. difficile biology, and highlight the RpoN-dependent regulation of genes treated with succinate and the secondary bile acid deoxycholate. Conclusions: CAT-GxD streamlines the analysis of C. difficile multi-omic data, reducing complexity and analysis time. It facilitates the generation of novel hypotheses and the identification of anti-infective targets, and can be adapted to incorporate data analysis paradigms for diverse organisms.
KW - Bacterial gene expression datasets
KW - Clostridioides difficile
KW - Database
KW - Multi-omic
KW - Proteomics
KW - Transcriptomics
UR - https://www.scopus.com/pages/publications/105019108683
UR - https://www.scopus.com/inward/citedby.url?scp=105019108683&partnerID=8YFLogxK
U2 - 10.1016/j.anaerobe.2025.103005
DO - 10.1016/j.anaerobe.2025.103005
M3 - Article
C2 - 41092974
AN - SCOPUS:105019108683
SN - 1075-9964
VL - 96
JO - Anaerobe
JF - Anaerobe
M1 - 103005
ER -