TY - JOUR
T1 - Identification and functional annotation of long intergenic non-coding RNAs in Brassicaceae
AU - Palos, Kyle
AU - Nelson Dittrich, Anna C.
AU - Yu, Li'ang
AU - Brock, Jordan R.
AU - Railey, Caylyn E.
AU - Wu, Hsin Yen Larry
AU - Sokolowska, Ewelina
AU - Skirycz, Aleksandra
AU - Hsu, Polly Yingshan
AU - Gregory, Brian D.
AU - Lyons, Eric
AU - Beilstein, Mark A.
AU - Nelson, Andrew D.L.
N1 - Publisher Copyright: © The Author(s) 2022. Published by Oxford University Press on behalf of American Society of Plant Biologists.
PY - 2022/9
Y1 - 2022/9
AB - Long intergenic noncoding RNAs (lincRNAs) are a large yet enigmatic class of eukaryotic transcripts that can have critical biological functions. The wealth of RNA-sequencing (RNA-seq) data available for plants provides the opportunity to implement a harmonized identification and annotation effort for lincRNAs that enables cross-species functional and genomic comparisons as well as prioritization of functional candidates. In this study, we processed >24 Tera base pairs of RNA-seq data from >16,000 experiments to identify ∼130,000 lincRNAs in four Brassicaceae: Arabidopsis thaliana, Camelina sativa, Brassica rapa, and Eutrema salsugineum. We used nanopore RNA-seq, transcriptome-wide structural information, peptide data, and epigenomic data to characterize these lincRNAs and identify conserved motifs. We then used comparative genomic and transcriptomic approaches to highlight lincRNAs in our data set with sequence or transcriptional conservation. Finally, we used guilt-by-association analyses to assign putative functions to lincRNAs within our data set. We tested this approach on a subset of lincRNAs associated with germination and seed development, observing germination defects for Arabidopsis lines harboring T-DNA insertions at these loci. LincRNAs with Brassicaceae-conserved putative miRNA binding motifs, small open reading frames, or abiotic-stress modulated expression are a few of the annotations that will guide functional analyses into this cryptic portion of the transcriptome.
UR - http://www.scopus.com/inward/record.url?scp=85136509743&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85136509743&partnerID=8YFLogxK
U2 - 10.1093/plcell/koac166
DO - 10.1093/plcell/koac166
M3 - Article
C2 - 35666179
AN - SCOPUS:85136509743
SN - 1040-4651
VL - 34
SP - 3233
EP - 3260
JO - Plant Cell
JF - Plant Cell
IS - 9
ER -