TY - JOUR
T1 - A genome-wide pairwise-identity-based proposal for the classification of viruses in the genus Mastrevirus (family Geminiviridae)
AU - Muhire, Brejnev
AU - Martin, Darren P.
AU - Brown, Judith K.
AU - Navas-Castillo, Jesús
AU - Moriones, Enrique
AU - Zerbini, F. Murilo
AU - Rivera-Bustamante, Rafael
AU - Malathi, V. G.
AU - Briddon, Rob W.
AU - Varsani, Arvind
N1 - Funding Information:
BM is funded by the University of Cape Town, South Africa. JNC and EM are members of the Research Group AGR-214, partially funded by Consejería de Economía, Innovación y Ciencia, Junta de Andalucía, Spain, cofinanced by FEDER-FSE. The authors would like to thank the Center for High Performance Computing in Cape Town and the Information Communication Technology Services Department at the University of Cape Town for use of their high-performance computing clusters. The authors would additionally like to thank Claude Fauquet for reading through and commenting on the manuscript.
PY - 2013/6
Y1 - 2013/6
N2 - Recent advances in the ease with which the genomes of small circular single-stranded DNA viruses can be amplified, cloned, and sequenced have greatly accelerated the rate at which full genome sequences of mastreviruses (family Geminiviridae, genus Mastrevirus) are being deposited in public sequence databases. Although guidelines currently exist for species-level classification of newly determined, complete mastrevirus genome sequences, these are difficult to apply to large sequence datasets and are permissive enough that, effectively, a high degree of leeway exists for the proposal of new species and strains. The lack of a standardised and rigorous method for testing whether a new genome sequence deserves such a classification is resulting in increasing numbers of questionable mastrevirus species proposals. Importantly, the recommended sequence alignment and pairwise identity calculation protocols of the current guidelines could easily be modified to make the classification of newly determined mastrevirus genome sequences significantly more objective. Here, we propose modified versions of these protocols that should substantially minimise the degree of classification inconsistency that is permissible under the current system. To facilitate the objective application of these guidelines for mastrevirus species demarcation, we additionally present a user-friendly computer program, SDT (species demarcation tool), for calculating and graphically displaying pairwise genome identity scores. We apply SDT to the 939 full genome sequences of mastreviruses that were publicly available in May 2012, and based on the distribution of pairwise identity scores yielded by our protocol, we propose mastrevirus species and strain demarcation thresholds of >78 % and >94 % identity, respectively.
AB - Recent advances in the ease with which the genomes of small circular single-stranded DNA viruses can be amplified, cloned, and sequenced have greatly accelerated the rate at which full genome sequences of mastreviruses (family Geminiviridae, genus Mastrevirus) are being deposited in public sequence databases. Although guidelines currently exist for species-level classification of newly determined, complete mastrevirus genome sequences, these are difficult to apply to large sequence datasets and are permissive enough that, effectively, a high degree of leeway exists for the proposal of new species and strains. The lack of a standardised and rigorous method for testing whether a new genome sequence deserves such a classification is resulting in increasing numbers of questionable mastrevirus species proposals. Importantly, the recommended sequence alignment and pairwise identity calculation protocols of the current guidelines could easily be modified to make the classification of newly determined mastrevirus genome sequences significantly more objective. Here, we propose modified versions of these protocols that should substantially minimise the degree of classification inconsistency that is permissible under the current system. To facilitate the objective application of these guidelines for mastrevirus species demarcation, we additionally present a user-friendly computer program, SDT (species demarcation tool), for calculating and graphically displaying pairwise genome identity scores. We apply SDT to the 939 full genome sequences of mastreviruses that were publicly available in May 2012, and based on the distribution of pairwise identity scores yielded by our protocol, we propose mastrevirus species and strain demarcation thresholds of >78 % and >94 % identity, respectively.
UR - http://www.scopus.com/inward/record.url?scp=84878598727&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84878598727&partnerID=8YFLogxK
U2 - 10.1007/s00705-012-1601-7
DO - 10.1007/s00705-012-1601-7
M3 - Article
C2 - 23340592
AN - SCOPUS:84878598727
SN - 0304-8608
VL - 158
SP - 1411
EP - 1424
JO - Archives of Virology
JF - Archives of Virology
IS - 6
ER -