Abstract
Background: Chaining is a major problem in constructing gene families.

Results: We define a new kind of cluster on graphs with strong and weak edges: soft cliques with backbones (SCWiB). SCWiB differs from other cluster definitions in how it controls the "chaining effect": it requires every cluster to satisfy a tolerant edge-density criterion that takes cluster size into account. We implement algorithms for decomposing a graph of similarities into SCWiBs. We compare example output from SCWiB and from the Markov Cluster Algorithm (MCL), and we also compare some curated Arabidopsis thaliana gene families with the results of automatic clustering. We apply our method to 44 published, annotated angiosperm genomes and find that Amborella trichopoda differs from all the others in having substantially and systematically smaller proportions of moderate- and large-size gene families.

Conclusions: We offer several possible evolutionary explanations for this result.
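The abstract does not give the exact SCWiB definition. As a rough illustration only, the sketch below shows how a size-aware density test can block chaining. It combines the standard s-plex condition (one of the listed keywords): every cluster member must be adjacent to at least |cluster| - s other members. It adds a hypothetical backbone rule: the strong edges alone must connect the cluster. The function names and the fixed tolerance `s` are assumptions made for this example. They are not the authors' algorithm or parameters.

```python
from collections import deque


def normalize(edges):
    """Store undirected edges as frozensets and drop self-loops."""
    return {frozenset(e) for e in edges if len(set(e)) == 2}


def is_s_plex(cluster, edges, s):
    """Standard s-plex test: each vertex must be adjacent to at least
    len(cluster) - s other cluster members (s = 1 gives a clique)."""
    cluster = set(cluster)
    deg = {v: 0 for v in cluster}
    for e in normalize(edges):
        if e <= cluster:  # count only edges with both ends inside the cluster
            for v in e:
                deg[v] += 1
    return all(d >= len(cluster) - s for d in deg.values())


def backbone_connected(cluster, strong_edges):
    """Hypothetical backbone rule: strong edges alone must connect the cluster."""
    cluster = set(cluster)
    if len(cluster) <= 1:
        return True
    adj = {v: set() for v in cluster}
    for e in normalize(strong_edges):
        if e <= cluster:
            u, v = tuple(e)
            adj[u].add(v)
            adj[v].add(u)
    # Breadth-first search over strong edges from an arbitrary member.
    start = next(iter(cluster))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u] - seen:
            seen.add(w)
            queue.append(w)
    return seen == cluster


def accept_cluster(cluster, strong_edges, weak_edges, s):
    """Hypothetical acceptance test: dense enough over all edges (s-plex)
    and held together by a strong-edge backbone."""
    all_edges = normalize(strong_edges) | normalize(weak_edges)
    return is_s_plex(cluster, all_edges, s) and backbone_connected(cluster, strong_edges)


# A chain 0-1-2-3-4-5 of strong edges: single-linkage clustering would merge it all.
chain = [(i, i + 1) for i in range(5)]
print(accept_cluster(range(6), chain, [], s=2))  # False: endpoints have degree 1 < 6 - 2

# A 4-vertex near-clique missing only the 0-3 edge.
strong = [(0, 1), (1, 2), (2, 3)]
weak = [(0, 2), (1, 3)]
print(accept_cluster(range(4), strong, weak, s=2))  # True
```

Because the required degree grows with cluster size, a long chain of pairwise links fails the test. A small, nearly complete group passes, even when only a subset of its edges is strong.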
Field | Value |
---|---|
Original language | English (US) |
Article number | S8 |
Journal | BMC Genomics |
Volume | 15 |
Issue number | 6 |
State | Published Oct 17, 2014 |
Keywords
- Amborella trichopoda
- Angiosperms
- Clustering
- Gene families
- S-plex
ASJC Scopus subject areas
- Biotechnology
- Genetics