TY - JOUR
T1 - Gene birth contributes to structural disorder encoded by overlapping genes
AU - Willis, Sara
AU - Masel, Joanna
N1 - Funding Information:
We thank Rafik Neme and Matt Cordes for comments on an earlier draft of this manuscript, Scott Foy for sharing his IUPred interface script, David Karlin for alerting us to issues annotating the relative ages of TGBp2 and TGBp3, and Arlin Stoltzfus for pointers to the mutation-driven evolution literature. This work was supported by the John Templeton Foundation (39667, 60814) and the National Institutes of Health (GM104040).
Publisher Copyright:
© 2018 by the Genetics Society of America.
PY - 2018/9
Y1 - 2018/9
N2 - The same nucleotide sequence can encode two protein products in different reading frames. Overlapping gene regions encode higher levels of intrinsic structural disorder (ISD) than nonoverlapping genes (39% vs. 25% in our viral dataset). This might be because of the intrinsic properties of the genetic code, because one member per pair was recently born de novo in a process that favors high ISD, or because high ISD relieves increased evolutionary constraint imposed by dual-coding. Here, we quantify the relative contributions of these three alternative hypotheses. We estimate that the recency of de novo gene birth explains 32% or more of the elevation in ISD in overlapping regions of viral genes. While the two reading frames within a same-strand overlapping gene pair have markedly different ISD tendencies that must be controlled for, their effects cancel out to make no net contribution to ISD. The remaining elevation of ISD in the older members of overlapping gene pairs, presumed due to the need to alleviate evolutionary constraint, was already present prior to the origin of the overlap. Same-strand overlapping gene birth events can occur in two different frames, favoring high ISD either in the ancestral gene or in the novel gene; surprisingly, most de novo gene birth events contained completely within the body of an ancestral gene favor high ISD in the ancestral gene (23 phylogenetically independent events vs. 1). This can be explained by mutation bias favoring the frame with more start codons and fewer stop codons.
KW - Alternative reading frame
KW - Evolutionary constraint
KW - Gene age
KW - Mutation-driven evolution
KW - Overprinting
UR - http://www.scopus.com/inward/record.url?scp=85052654706&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85052654706&partnerID=8YFLogxK
U2 - 10.1534/genetics.118.301249
DO - 10.1534/genetics.118.301249
M3 - Article
C2 - 30026186
AN - SCOPUS:85052654706
SN - 0016-6731
VL - 210
SP - 303
EP - 313
JO - Genetics
JF - Genetics
IS - 1
ER -