Modeling Biases from Low-Pass Genome Sequencing to Enable Accurate Population Genetic Inferences

Emanuel M. Fonseca, Linh N. Tran, Hannah Mendoza, Ryan N. Gutenkunst

Research output: Contribution to journal (article, peer-reviewed)

Abstract

Low-pass genome sequencing is cost-effective and enables analysis of large cohorts. However, it introduces biases by reducing heterozygous genotypes and low-frequency alleles, impacting subsequent analyses such as model-based demographic history inference. Several approaches exist for inferring an unbiased allele frequency spectrum (AFS) from low-pass data, but they can introduce spurious noise into the AFS. Rather than correcting the AFS, here, we developed an approach that incorporates low-pass biases into the demographic modeling and directly analyzes the AFS from low-pass data. Our probabilistic model captures biases from the Genome Analysis Toolkit multisample calling pipeline, and we implemented it in the population genomic inference software dadi. We evaluated the model using simulated low-pass datasets and found that it alleviated low-pass biases in inferred demographic parameters. We further validated the model by downsampling 1000 Genomes Project data, demonstrating its effectiveness on real data. Our model is widely applicable and substantially improves model-based inferences from low-pass population genomic data.
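The loss of heterozygous genotypes mentioned in the abstract can be illustrated with a simple read-sampling calculation. The sketch below is not the paper's probabilistic model and does not reproduce the GATK multisample calling pipeline or the dadi implementation. It assumes error-free reads, each read drawn from either allele with probability 1/2, and Poisson-distributed depth. The function names `prob_het_missed` and `expected_het_missed` are hypothetical, introduced only for this illustration.

```python
import math


def prob_het_missed(depth: int) -> float:
    """Probability that reads at a true heterozygous site reveal only one allele.

    Assumes error-free reads, each sampling either allele with probability 1/2.
    With zero reads the heterozygote is never observed.
    """
    if depth == 0:
        return 1.0
    # All reads carry allele A, or all carry allele B.
    return 2 * 0.5 ** depth


def expected_het_missed(mean_depth: float, max_depth: int = 100) -> float:
    """Average prob_het_missed over Poisson-distributed depth (an assumption).

    The Poisson probabilities are built iteratively to avoid large factorials;
    the sum is truncated at max_depth, which is ample for low-pass coverage.
    """
    pois = math.exp(-mean_depth)  # P(depth = 0)
    total = pois * prob_het_missed(0)
    for d in range(1, max_depth + 1):
        pois *= mean_depth / d  # P(depth = d) from P(depth = d - 1)
        total += pois * prob_het_missed(d)
    return total


if __name__ == "__main__":
    for coverage in (1.0, 3.0, 10.0, 30.0):
        print(f"mean depth {coverage:>4}: "
              f"fraction of heterozygotes not observable = "
              f"{expected_het_missed(coverage):.4f}")
```

Under these assumptions the expected fraction has the closed form 2·exp(−λ/2) − exp(−λ) for mean depth λ, so it falls quickly as coverage increases. A real calling pipeline behaves differently in detail, which is why the authors model the pipeline's biases directly.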

Original language: English (US)
Article number: msaf002
Journal: Molecular Biology and Evolution
Volume: 42
Issue number: 1
State: Published - Jan 1 2025

Keywords

  • GATK multisample calling
  • allele frequency spectrum
  • demography inference
  • inbreeding
  • low-pass sequencing

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Molecular Biology
  • Genetics
