Allele identification in assembled genomic sequence datasets

Katrina M. Dlugosch, Aurélie Bonin

Research output: Chapter in Book/Report/Conference proceeding (Chapter)


Allelic variation within species provides fundamental insights into the evolution and ecology of organisms, and information about this variation is becoming increasingly available in sequence datasets of multiple and/or outbred individuals. Unfortunately, identifying true allelic variants poses a number of challenges, given the presence of both sequencing errors and alleles from other closely related loci. We outline the key considerations involved in this process, including assessing the accuracy of allele resolution in sequence assembly, clustering of alleles within and among individuals, and identifying clusters that are most likely to correspond to true allelic variants of a single locus. Our focus is particularly on the case where alleles must be identified without a fully resolved reference genome, and where sequence depth information cannot be used to infer the putative number of loci sharing a sequence, such as in transcriptome or post-assembly datasets. Throughout, we provide information about publicly available tools to aid allele identification in such cases.

Original language: English (US)
Title of host publication: Data Production and Analysis in Population Genomics
Subtitle of host publication: Methods and Protocols
Publisher: Humana Press Inc.
Number of pages: 15
ISBN (Print): 9781617798696
State: Published - 2012

Publication series

Name: Methods in Molecular Biology
ISSN (Print): 1064-3745


Keywords

  • AllelePipe
  • Allelic variation
  • Gene duplication
  • Granularity
  • Maximum likelihood clustering
  • Next-generation sequencing
  • Paralogs
  • Single-linkage clustering
  • Transcriptome data

ASJC Scopus subject areas

  • Molecular Biology
  • Genetics

