TY - JOUR
T1 - Identification and functional annotation of long intergenic non-coding RNAs in Brassicaceae
AU - Palos, Kyle
AU - Nelson Dittrich, Anna C.
AU - Yu, Li'ang
AU - Brock, Jordan R.
AU - Railey, Caylyn E.
AU - Wu, Hsin Yen Larry
AU - Sokolowska, Ewelina
AU - Skirycz, Aleksandra
AU - Hsu, Polly Yingshan
AU - Gregory, Brian D.
AU - Lyons, Eric
AU - Beilstein, Mark A.
AU - Nelson, Andrew D.L.
N1 - Funding Information:
The authors would like to acknowledge the NSF Graduate Research Fellowship Grant DGE-1746060 (awarded to K.P.), NSF-MCB 2051885 (awarded to P.Y.H.), NSF-IOS 1758532 (awarded to A.D.L.N.), NSF-IOS 1444490 (awarded to E.L., B.D.G., and M.A.B.), NSF-DBI-1743442 to E.L., and NSF-IOS 2023310 (awarded to A.D.L.N., B.D.G., and E.L.). The authors would like to thank the plant scientists who have contributed genomic and transcriptomic data to the NCBI SRA, making this work possible. The authors, particularly J.R.B., would like to thank Dr. Ken Olsen, Washington University in St Louis, for thoughtful discussion regarding the Camelina experiments. The authors would also like to thank Dr. Bob Schmitz, University of Georgia, Athens, for his helpful advice on analyzing Arabidopsis and Eutrema epigenetic data. Finally, the authors would like to thank members of the Stress Architecture and RNA Biology Cluster and Skirycz lab at BTI and the PaBeBaMo group at the University of Arizona School of Plant Sciences for helpful discussion.
Publisher Copyright:
© The Author(s) 2022. Published by Oxford University Press on behalf of American Society of Plant Biologists.
PY - 2022/9
Y1 - 2022/9
N2 - Long intergenic noncoding RNAs (lincRNAs) are a large yet enigmatic class of eukaryotic transcripts that can have critical biological functions. The wealth of RNA-sequencing (RNA-seq) data available for plants provides the opportunity to implement a harmonized identification and annotation effort for lincRNAs that enables cross-species functional and genomic comparisons as well as prioritization of functional candidates. In this study, we processed >24 Tera base pairs of RNA-seq data from >16,000 experiments to identify ∼130,000 lincRNAs in four Brassicaceae: Arabidopsis thaliana, Camelina sativa, Brassica rapa, and Eutrema salsugineum. We used nanopore RNA-seq, transcriptome-wide structural information, peptide data, and epigenomic data to characterize these lincRNAs and identify conserved motifs. We then used comparative genomic and transcriptomic approaches to highlight lincRNAs in our data set with sequence or transcriptional conservation. Finally, we used guilt-by-association analyses to assign putative functions to lincRNAs within our data set. We tested this approach on a subset of lincRNAs associated with germination and seed development, observing germination defects for Arabidopsis lines harboring T-DNA insertions at these loci. LincRNAs with Brassicaceae-conserved putative miRNA binding motifs, small open reading frames, or abiotic-stress modulated expression are a few of the annotations that will guide functional analyses into this cryptic portion of the transcriptome.
AB - Long intergenic noncoding RNAs (lincRNAs) are a large yet enigmatic class of eukaryotic transcripts that can have critical biological functions. The wealth of RNA-sequencing (RNA-seq) data available for plants provides the opportunity to implement a harmonized identification and annotation effort for lincRNAs that enables cross-species functional and genomic comparisons as well as prioritization of functional candidates. In this study, we processed >24 Tera base pairs of RNA-seq data from >16,000 experiments to identify ∼130,000 lincRNAs in four Brassicaceae: Arabidopsis thaliana, Camelina sativa, Brassica rapa, and Eutrema salsugineum. We used nanopore RNA-seq, transcriptome-wide structural information, peptide data, and epigenomic data to characterize these lincRNAs and identify conserved motifs. We then used comparative genomic and transcriptomic approaches to highlight lincRNAs in our data set with sequence or transcriptional conservation. Finally, we used guilt-by-association analyses to assign putative functions to lincRNAs within our data set. We tested this approach on a subset of lincRNAs associated with germination and seed development, observing germination defects for Arabidopsis lines harboring T-DNA insertions at these loci. LincRNAs with Brassicaceae-conserved putative miRNA binding motifs, small open reading frames, or abiotic-stress modulated expression are a few of the annotations that will guide functional analyses into this cryptic portion of the transcriptome.
UR - http://www.scopus.com/inward/record.url?scp=85136509743&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85136509743&partnerID=8YFLogxK
U2 - 10.1093/plcell/koac166
DO - 10.1093/plcell/koac166
M3 - Article
C2 - 35666179
AN - SCOPUS:85136509743
SN - 1040-4651
VL - 34
SP - 3233
EP - 3260
JO - Plant Cell
JF - Plant Cell
IS - 9
ER -