Progress Towards Plant Community Transcriptomics: Pilot RNA-Seq Data from 24 Species of Vascular Plants at Harvard Forest

  • Hannah E. Marx (Contributor)
  • Stacy A. Jorgensen (Contributor)
  • Eldridge Wisely (Contributor)
  • Zheng Li (Contributor)
  • Katrina M Dlugosch (Contributor)
  • Michael S Barker (Contributor)



Assembled transcriptomes of 24 vascular plant species from Harvard Forest. Transcriptomes for each species were sequenced and assembled as described below. Additional details are available in the associated manuscript. Raw reads for each species are available at NCBI SRA SRP127805 and BioProject PRJNA422719.

Taxon selection and sampling

The Harvard Forest Flora (Jenkins et al., 2008) was used to select taxa to represent each category (native/invasive, diploid/polyploid). Invasive species status was determined from the Harvard Forest Flora Database (Jenkins and Motzkin, 2009). Putative diploid and neo-polyploid species were identified from chromosome counts obtained from the Chromosome Counts Database (Rice et al., 2015). Congeneric species pairs were selected based on their phylogenetic relatedness. The Harvard Forest Flora Database was used to calculate recent encounter rates of each target species and to locate sampling sites. Tissue from mature leaves was collected from an individual representing each target species at two time points (July and August) during the 2016 growing season. For perennial species, the same individual was sampled at both time points; for annuals, the same population was sampled. Field sampling for plant RNA-seq followed the protocol described in Yang et al. (2017). Leaf tissues were flash frozen in liquid nitrogen in the field and shipped on dry ice to the University of Arizona for RNA extraction.

RNA extraction and RNA-seq

Total RNA was extracted from leaf tissue collected at each time point for all species using the Spectrum Plant Total RNA Kit (Sigma-Aldrich Co., St. Louis, MO, USA) following Protocol A. cDNA was prepared from the RNA by single primer isothermal amplification using Nugen’s Ovation RNA-Seq System (Catalogue # 7102-A01), automated on the Apollo 324 liquid handler (Wafergen).
cDNA was quantified on the Nanodrop (Thermo Fisher Scientific) and sheared to fragments of approximately 300 bp using the Covaris M220 ultrasonicator. Libraries were generated using Kapa Biosystems’ library preparation kit (KK8201). Fragments were end repaired and A-tailed, and individual indexes and adapters (Bioo, catalogue #520999) were ligated to each sample. The adapter-ligated molecules were cleaned using AMPure beads (Agencourt Bioscience/Beckman Coulter, A63883) and amplified with Kapa’s HiFi enzyme (KK2502). Each library was then analyzed for fragment size on an Agilent TapeStation and quantified by qPCR (KAPA Library Quantification Kit, KK4835) on Thermo Fisher Scientific’s QuantStudio 5 before multiplex pooling (13-16 samples per lane) and paired-end sequencing at 2x150 bp on the Illumina NextSeq 500 platform at Arizona State University’s CLAS Genomics Core facility. Raw read quality was assessed using FastQC (Andrews, 2010).

De novo transcriptome assembly

Raw sequence reads were processed using the SnoWhite pipeline (Barker et al., 2010a; Dlugosch et al., 2013), which included trimming adapter sequences and bases with a quality score below 20 from the 3' ends of all reads, removing reads that consist entirely of primer and/or adapter fragments using TagDust (Lassmann et al., 2009), and removing polyA/T tails with SeqClean. The cleaned reads from each sample time point were merged by pairs and pooled to assemble a reference de novo transcriptome for each species. All transcriptomes were assembled with SOAPdenovo-Trans v1.03 (Xie et al., 2014) using a k-mer size of 57.
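The 3' quality-trimming step described above can be illustrated with a minimal Python sketch. This is not the SnoWhite implementation; it only shows the general idea of removing trailing bases whose quality score falls below 20. The function name `trim_3prime` and the assumption of Phred+33 quality encoding are ours and do not come from the dataset description.

```python
def trim_3prime(sequence, quality_string, min_quality=20, offset=33):
    """Remove bases from the 3' end while their Phred score is below min_quality.

    Assumes Phred+33 encoded quality strings (an assumption for this sketch).
    Returns the trimmed sequence and its matching quality string.
    """
    if len(sequence) != len(quality_string):
        raise ValueError("sequence and quality string must have equal length")

    end = len(sequence)
    # Walk backwards from the 3' end until a base meets the threshold.
    while end > 0 and ord(quality_string[end - 1]) - offset < min_quality:
        end -= 1
    return sequence[:end], quality_string[:end]


# Hypothetical read: the last two bases have Phred scores of 2 and 0.
trimmed_seq, trimmed_qual = trim_3prime("ACGTACGT", "IIIII5#!")
print(trimmed_seq, trimmed_qual)
```

In this example the final two bases are removed, while the base with a score of exactly 20 is kept, because only scores strictly below the threshold are trimmed.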
Date made available: 2020
