TY - JOUR
T1 - Single-molecule sequencing of the desiccation-tolerant grass Oropetium thomaeum
AU - VanBuren, Robert
AU - Bryant, Doug
AU - Edger, Patrick P.
AU - Tang, Haibao
AU - Burgess, Diane
AU - Challabathula, Dinakar
AU - Spittle, Kristi
AU - Hall, Richard
AU - Gu, Jenny
AU - Lyons, Eric
AU - Freeling, Michael
AU - Bartels, Dorothea
AU - Ten Hallers, Boudewijn
AU - Hastie, Alex
AU - Michael, Todd P.
AU - Mockler, Todd C.
N1 - Publisher Copyright: © 2015 Macmillan Publishers Limited. All rights reserved.
PY - 2015/11/26
Y1 - 2015/11/26
N2 - Plant genomes, and eukaryotic genomes in general, are typically repetitive, polyploid and heterozygous, which complicates genome assembly. The short read lengths of early Sanger and current next-generation sequencing platforms hinder assembly through complex repeat regions, and many draft and reference genomes are fragmented, lacking skewed GC and repetitive intergenic sequences, which are gaining importance due to projects like the Encyclopedia of DNA Elements (ENCODE). Here we report the whole-genome sequencing and assembly of the desiccation-tolerant grass Oropetium thomaeum. Using only single-molecule real-time sequencing, which generates long (>16 kilobases) reads with random errors, we assembled 99% (244 megabases) of the Oropetium genome into 625 contigs with an N50 length of 2.4 megabases. Oropetium is an example of a 'near-complete' draft genome which includes gapless coverage over gene space as well as intergenic sequences such as centromeres, telomeres, transposable elements and rRNA clusters that are typically unassembled in draft genomes. Oropetium has 28,466 protein-coding genes and 43% repeat sequences, yet with 30% more compact euchromatic regions it is the smallest known grass genome. The Oropetium genome demonstrates the utility of single-molecule real-time sequencing for assembling high-quality plant and other eukaryotic genomes, and serves as a valuable resource for the plant comparative genomics community.
UR - http://www.scopus.com/inward/record.url?scp=84948440327&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84948440327&partnerID=8YFLogxK
U2 - 10.1038/nature15714
DO - 10.1038/nature15714
M3 - Article
C2 - 26560029
AN - SCOPUS:84948440327
SN - 0028-0836
VL - 527
SP - 508
EP - 511
JO - Nature
JF - Nature
IS - 7579
ER -