Long intergenic noncoding RNAs (lincRNAs) are a large yet enigmatic class of eukaryotic transcripts that can have critical biological functions. The wealth of RNA-sequencing (RNA-seq) data available for plants provides the opportunity to implement a harmonized identification and annotation effort for lincRNAs that enables cross-species functional and genomic comparisons as well as prioritization of functional candidates. In this study, we processed 424 Tera base pairs of RNA-seq data from 416,000 experiments to identify ∼130,000 lincRNAs in four Brassicaceae: Arabidopsis thaliana, Camelina sativa, Brassica rapa, and Eutrema salsugineum. We used nanopore RNA-seq, transcriptome-wide structural information, peptide data, and epigenomic data to characterize these lincRNAs and identify conserved motifs. We then used comparative genomic and transcriptomic approaches to highlight lincRNAs in our data set with sequence or transcriptional conservation. Finally, we used guilt-by-association analyses to assign putative functions to lincRNAs within our data set. We tested this approach on a subset of lincRNAs associated with germination and seed development, observing germination defects for Arabidopsis lines harboring T-DNA insertions at these loci. LincRNAs with Brassicaceae-conserved putative miRNA binding motifs, small open reading frames, or abiotic-stress modulated expression are a few of the annotations that will guide functional analyses into this cryptic portion of the transcriptome.
ASJC Scopus subject areas
- Plant Science