TY - JOUR
T1 - Expanding the stdpopsim species catalog, and lessons learned for realistic genome simulations
AU - Lauterbur, M. Elise
AU - Cavassim, Maria Izabel A.
AU - Gladstein, Ariella L.
AU - Gower, Graham
AU - Pope, Nathaniel S.
AU - Tsambos, Georgia
AU - Adrion, Jeffrey
AU - Belsare, Saurabh
AU - Biddanda, Arjun
AU - Caudill, Victoria
AU - Cury, Jean
AU - Echevarria, Ignacio
AU - Haller, Benjamin C.
AU - Hasan, Ahmed R.
AU - Huang, Xin
AU - Iasi, Leonardo Nicola Martin
AU - Noskova, Ekaterina
AU - Obsteter, Jana
AU - Pavinato, Vitor Antonio Correa
AU - Pearson, Alice
AU - Peede, David
AU - Perez, Manolo F.
AU - Rodrigues, Murillo F.
AU - Smith, Chris C.R.
AU - Spence, Jeffrey P.
AU - Teterina, Anastasia
AU - Tittes, Silas
AU - Unneberg, Per
AU - Vazquez, Juan Manuel
AU - Waples, Ryan K.
AU - Wohns, Anthony Wilder
AU - Wong, Yan
AU - Baumdicker, Franz
AU - Cartwright, Reed A.
AU - Gorjanc, Gregor
AU - Gutenkunst, Ryan N.
AU - Kelleher, Jerome
AU - Kern, Andrew D.
AU - Ragsdale, Aaron P.
AU - Ralph, Peter L.
AU - Schrider, Daniel R.
AU - Gronau, Ilan
N1 - Funding Information:
We wish to thank the dozens of workshop attendees, and especially the two dozen or so hackathon participants, whose combined feedback motivated many of the updates made to stdpopsim in the past two years. Funding (funder and grant reference: recipient): Human Frontier Science Program RGY0075/2019: Jean Cury; Brown University Predoctoral Training Program in Biological Data Science (NIH T32 GM128596): David Peede; Science for Life Laboratory Knut and Alice Wallenberg Foundation: Per Unneberg; Deutsche Forschungsgemeinschaft EXC 2064/1 - Project number 390727645: Franz Baumdicker; Deutsche Forschungsgemeinschaft EXC 2124 - Project number 390838134: Franz Baumdicker; National Science Foundation DBI-1929850: Reed A Cartwright; University of Edinburgh BBS/E/D/30002275: Gregor Gorjanc; National Institute of General Medical Sciences R01GM127348: Ryan N Gutenkunst; Robertson Foundation: Jerome Kelleher; National Institute of General Medical Sciences R01HG010774: Andrew D Kern; National Institute of General Medical Sciences R35GM138286: Daniel R Schrider.
Publisher Copyright:
© Lauterbur et al.
PY - 2023/6
Y1 - 2023/6
N2 - Simulation is a key tool in population genetics for both methods development and empirical research, but producing simulations that recapitulate the main features of genomic datasets remains a major obstacle. Today, more realistic simulations are possible thanks to large increases in the quantity and quality of available genetic data, and the sophistication of inference and simulation software. However, implementing these simulations still requires substantial time and specialized knowledge. These challenges are especially pronounced for simulating genomes for species that are not well-studied, since it is not always clear what information is required to produce simulations with a level of realism sufficient to confidently answer a given question. The community-developed framework stdpopsim seeks to lower this barrier by facilitating the simulation of complex population genetic models using up-to-date information. The initial version of stdpopsim focused on establishing this framework using six well-characterized model species (Adrion et al., 2020). Here, we report on major improvements made in the new release of stdpopsim (version 0.2), which includes a significant expansion of the species catalog and substantial additions to simulation capabilities. Features added to improve the realism of the simulated genomes include non-crossover recombination and provision of species-specific genomic annotations. Through community-driven efforts, we expanded the number of species in the catalog more than threefold and broadened coverage across the tree of life. During the process of expanding the catalog, we have identified common sticking points and developed the best practices for setting up genome-scale simulations. We describe the input data required for generating a realistic simulation, suggest good practices for obtaining the relevant information from the literature, and discuss common pitfalls and major considerations. These improvements to stdpopsim aim to further promote the use of realistic whole-genome population genetic simulations, especially in non-model organisms, making them available, transparent, and accessible to everyone.
UR - http://www.scopus.com/inward/record.url?scp=85163100933&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85163100933&partnerID=8YFLogxK
U2 - 10.7554/eLife.84874
DO - 10.7554/eLife.84874
M3 - Article
C2 - 37342968
AN - SCOPUS:85163100933
SN - 2050-084X
VL - 12
JO - eLife
JF - eLife
M1 - RP84874
ER -