TY - GEN

T1 - OMG! Orthologs in Multiple Genomes - Competing graph-theoretical formulations

AU - Zheng, Chunfang

AU - Swenson, Krister

AU - Lyons, Eric

AU - Sankoff, David

PY - 2011

Y1 - 2011

N2 - From the set of all pairwise homologies, weighted by sequence similarities, among a set of genomes, we seek disjoint orthology sets of genes, in which each element is orthogonal to all other genes (on a different genome) in the same set. In a graph-theoretical formulation, where genes are vertices and weighted edges represent homologies, we suggest three criteria, with three different biological motivations, for evaluating the partition of genes produced by deletion of a subset of edges: i) minimum weight edge removal, ii) minimum degree-zero vertex creation, and iii) maximum number of edges in the transitive closure of the graph after edge deletion. For each of the problems, all either proved or conjectured to be NP-hard, we suggest approximate and heuristic algorithms of finding orthology sets satisfying the criteria, and show how to incorporate genomes that have a whole genome duplication event in their immediate lineage. We apply this to ten flowering plant genomes, involving 160,000 different genes in given pairwise homologies. We evaluate the results in a number of ways and recommend criterion iii) as best suited to applications to multiple gene order alignment.

AB - From the set of all pairwise homologies, weighted by sequence similarities, among a set of genomes, we seek disjoint orthology sets of genes, in which each element is orthogonal to all other genes (on a different genome) in the same set. In a graph-theoretical formulation, where genes are vertices and weighted edges represent homologies, we suggest three criteria, with three different biological motivations, for evaluating the partition of genes produced by deletion of a subset of edges: i) minimum weight edge removal, ii) minimum degree-zero vertex creation, and iii) maximum number of edges in the transitive closure of the graph after edge deletion. For each of the problems, all either proved or conjectured to be NP-hard, we suggest approximate and heuristic algorithms of finding orthology sets satisfying the criteria, and show how to incorporate genomes that have a whole genome duplication event in their immediate lineage. We apply this to ten flowering plant genomes, involving 160,000 different genes in given pairwise homologies. We evaluate the results in a number of ways and recommend criterion iii) as best suited to applications to multiple gene order alignment.

UR - http://www.scopus.com/inward/record.url?scp=80052984663&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80052984663&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-23038-7_30

DO - 10.1007/978-3-642-23038-7_30

M3 - Conference contribution

AN - SCOPUS:80052984663

SN - 9783642230370

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 364

EP - 375

BT - Algorithms in Bioinformatics - 11th International Workshop, WABI 2011, Proceedings

PB - Springer-Verlag

ER -