The recent de novo origin of protein C-termini

Matthew E. Andreatta, Joshua A. Levine, Scott G. Foy, Lynette D. Guzman, Luke J. Kosinski, Matthew H.J. Cordes, Joanna Masel

Research output: Contribution to journal › Article › peer-review

9 Scopus citations


Protein-coding sequences can arise either from duplication and divergence of existing sequences, or de novo from noncoding DNA. Unfortunately, recently evolved de novo genes can be hard to distinguish from false positives, making their study difficult. Here, we study a more tractable version of the process of conversion of noncoding sequence into coding: the co-option of short segments of noncoding sequence into the C-termini of existing proteins via the loss of a stop codon. Because we study recent additions to potentially old genes, we are able to apply a variety of stringent quality filters to our annotations of what is a true protein-coding gene, discarding the putative proteins of unknown function that are typical of recent fully de novo genes. We identify 54 examples of C-terminal extensions in Saccharomyces and 28 in Drosophila, all of them recent enough to still be polymorphic. We find one putative gene fusion that turns out, on close inspection, to be the product of replicated assembly errors, further highlighting the issue of false positives in the study of rare events. Four of the Saccharomyces C-terminal extensions (to ADH1, ARP8, TPM2, and PIS1) that survived our quality filters are predicted to lead to significant modification of a protein domain structure.
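
The mechanism described in the abstract, in which loss of a stop codon lets translation continue into formerly noncoding 3' sequence until the next in-frame stop, can be illustrated with a minimal Python sketch. The toy sequences and the helper names `translate` and `c_terminal_extension` are hypothetical and chosen only for illustration; this is not the authors' pipeline, and it applies none of the quality filters mentioned above.

```python
# Toy illustration of a C-terminal extension caused by stop codon loss.
# All sequences and function names here are hypothetical examples.

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate from the first base until the first in-frame stop codon."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":  # stop codon ends translation
            break
        protein.append(residue)
    return "".join(protein)


def c_terminal_extension(ancestral: str, derived: str) -> str:
    """Return residues appended to the C-terminus in the derived allele.

    Returns an empty string if the derived protein does not start with
    the complete ancestral protein.
    """
    old, new = translate(ancestral), translate(derived)
    return new[len(old):] if new.startswith(old) else ""


# Ancestral allele: ATG GCT AAA TAA | GGT CTT TGA (3' region is noncoding).
ancestral = "ATGGCTAAATAAGGTCTTTGA"
# Derived allele: a T->C substitution turns the stop codon TAA into CAA (Gln).
derived = ancestral[:9] + "C" + ancestral[10:]

print(translate(ancestral))                      # MAK
print(translate(derived))                        # MAKQGL
print(c_terminal_extension(ancestral, derived))  # QGL
```

In this toy case the derived protein gains three residues before translation reaches the next in-frame stop codon; a polymorphic extension of the kind reported in the abstract would correspond to both alleles segregating in the same population.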

Original language: English (US)
Pages (from-to): 1686-1701
Number of pages: 16
Journal: Genome Biology and Evolution
Issue number: 6
State: Published - Jun 2015

Keywords


  • Gene birth
  • Origin of novelty
  • Protein structure
  • Stop codon readthrough

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Genetics

