CAT-GxD: Centralized access to gene expression datasets

Bryan Angelo P. Roxas, Jennifer Lising Roxas, Jessica S. Guo, David S. LeBauer, Fiona McCarthy, Gayatri Vedantam, V. K. Viswanathan

Research output: Contribution to journal › Article › peer-review

Abstract

Objectives: Extensive multi-omic bacterial gene expression datasets are publicly available, but tools that unify these datasets for interpretation and hypothesis testing are limited. The complexity of these datasets and the specialized bioinformatics and programming expertise they require pose major barriers for researchers attempting to query and analyze them. Our objective was to develop an integrated search engine that simplifies access to publicly available gene expression data and facilitates comparison and analysis of multi-omic datasets.

Methods: We developed the Centralized Access to Gene Expression Datasets (CAT-GxD) search engine to provide integrated access to, and facilitate analysis of, publicly available transcriptomics and proteomics datasets of the CDC Urgent Threat pathogen Clostridioides difficile. Manual data curation was performed to integrate and standardize all 74 nonredundant transcriptomics and quantitative proteomics datasets available from the Gene Expression Omnibus (GEO) database and the ProteomeXchange Consortium. The CAT-GxD search engine, built on the open-source R Shiny framework, is available at https://viz.datascience.arizona.edu/catgxd/.

Results: CAT-GxD successfully consolidated disparate transcriptomics and proteomics datasets, supporting interpretation and hypothesis testing. CAT-GxD provides customizable visualization of gene expression data under different conditions. We demonstrate the utility of CAT-GxD in analyzing the contribution of RNA polymerase, nitrogen-limitation N (RpoN) to C. difficile biology, and highlight RpoN-dependent gene regulation under treatment with succinate and the secondary bile acid deoxycholate.

Conclusions: CAT-GxD streamlines the analysis of C. difficile multi-omic data, reducing complexity and analysis time. It facilitates the generation of novel hypotheses and the identification of anti-infective targets, and can be adapted to incorporate data analysis paradigms for diverse organisms.

Original language: English (US)
Article number: 103005
Journal: Anaerobe
Volume: 96
State: Published - Dec 2025

Keywords

  • Bacterial gene expression datasets
  • Clostridioides difficile
  • Database
  • Multi-omic
  • Proteomics
  • Transcriptomics

ASJC Scopus subject areas

  • Microbiology
  • Infectious Diseases
