TY - JOUR
T1 - Evolutionary Dynamics of Abundant 7-bp Satellites in the Genome of Drosophila virilis
AU - Flynn, Jullien M.
AU - Long, Manyuan
AU - Wing, Rod A.
AU - Clark, Andrew G.
AU - Arkhipova, Irina
N1 - Funding Information:
We thank Yasir Ahmed-Braimah for helpful discussions and advice for some analyses. We also thank Bryant McAllister for providing Drosophila americana strains along with their fusion status. We are grateful for Elissa Cosgrove’s help with some computational trouble-shooting and Asha Jain’s help in preparing DNA sequencing libraries. We also thank Danny Miller for useful discussions and for providing the raw Nanopore reads from his study. Jianwei Zhang helped with PacBio raw data transfers. Sarah Kingan, Jane Landolin, and Greg Young from Pacific Biosciences were very helpful in exploring the causes for the artifactual repeats and in producing the HiFi data. We thank Amanda Larracuente for advice on FISH protocols and the Cornell Imaging Facility for use of their microscope. This project was funded by National Institutes of Health Grant Number GM116113 to R.A.W., M.L., and A.G.C. and Grant Number GM119125 to A.G.C. and Daniel Barbash. J.M.F. was supported by a Natural Sciences and Engineering Research Council of Canada Doctoral Scholarship.
Publisher Copyright:
© 2020 The Author(s). Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved.
PY - 2020/5/1
Y1 - 2020/5/1
N2 - The factors that drive the rapid changes in abundance of tandem arrays of highly repetitive sequences, known as satellite DNA, are not well understood. Drosophila virilis has one of the highest relative amounts of simple satellites of any organism that has been studied, with an estimated >40% of its genome composed of a few related 7-bp satellites. Here, we use D. virilis as a model to understand technical biases affecting satellite sequencing and the evolutionary processes that drive satellite composition. By analyzing sequencing data from Illumina, PacBio, and Nanopore platforms, we identify platform-specific biases and suggest best practices for accurate characterization of satellites by sequencing. We use comparative genomics and cytogenetics to demonstrate that the highly abundant AAACTAC satellite family arose from a related satellite in the branch leading to the virilis phylad 4.5-11 Ma before exploding in abundance in some species of the clade. The most abundant satellite is conserved in sequence and location in the pericentromeric region but has diverged widely in abundance among species, whereas the satellites nearest the centromere are rapidly turning over in sequence composition. By analyzing multiple strains of D. virilis, we saw that the abundances of two centromere-proximal satellites are anticorrelated along a geographical gradient, which we suggest could be caused by ongoing conflicts at the centromere. In conclusion, we illuminate several key attributes of satellite evolutionary dynamics that we hypothesize to be driven by processes including selection, meiotic drive, and constraints on satellite sequence and abundance.
KW - centromeres
KW - comparative genomics
KW - long-read sequencing
KW - repetitive DNA
UR - http://www.scopus.com/inward/record.url?scp=85084103400&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85084103400&partnerID=8YFLogxK
U2 - 10.1093/molbev/msaa010
DO - 10.1093/molbev/msaa010
M3 - Article
C2 - 31960929
AN - SCOPUS:85084103400
SN - 0737-4038
VL - 37
SP - 1362
EP - 1375
JO - Molecular Biology and Evolution
JF - Molecular Biology and Evolution
IS - 5
ER -