TY - JOUR
T1 - Topological Analysis of Large-scale Biomedical Terminology Structures
AU - Bales, Michael E.
AU - Lussier, Yves A.
AU - Johnson, Stephen B.
N1 - Funding Information:
Support for this research was provided by NLM training grant 5T15LM07079. This work was partially supported by grants LM008308-01 and 1U54CA121852. The authors would like to thank Drs. Olivier Bodenreider, James Cimino, William Hole, and Adam Rothschild, who provided invaluable guidance during the preparation of this manuscript. We also thank Drs. Patrick Mary and David Auber for support with Tulip software. In addition, this manuscript has benefited greatly from the insightful comments of two anonymous reviewers. We thank them for their diligence.
PY - 2007/11
Y1 - 2007/11
AB - Objective: To characterize global structural features of large-scale biomedical terminologies using currently emerging statistical approaches. Design: Given the rapid growth of terminologies, this research was designed to address scalability. We selected 16 terminologies covering a variety of domains from the UMLS Metathesaurus, a collection of terminological systems. Each was modeled as a network in which nodes were atomic concepts and links were relationships asserted by the source vocabulary. For comparison against each terminology, we created three random networks of equivalent size and density. Measurements: Average node degree, node degree distribution, clustering coefficient, average path length. Results: Eight of 16 terminologies exhibited the small-world characteristics of a short average path length and strong local clustering. An overlapping subset of nine exhibited a power-law distribution in node degrees, indicative of a scale-free architecture. We attribute these features to specific design constraints. Constraints on node connectivity, common in more synthetic classification systems, localize the effects of changes and deletions. In contrast, small-world and scale-free features, common in comprehensive medical terminologies, promote flexible navigation and less restrictive, organic-like growth. Conclusion: Although thought of as synthetic, grid-like structures, some controlled terminologies are structurally indistinguishable from natural language networks. This paradoxical result suggests that terminology structure is shaped not only by formal logic-based semantics, but also by rules analogous to those that govern social networks and biological systems. Graph theoretic modeling shows early promise as a framework for describing terminology structure. Deeper understanding of these techniques may inform the development of scalable terminologies and ontologies.
UR - http://www.scopus.com/inward/record.url?scp=35648992978&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=35648992978&partnerID=8YFLogxK
U2 - 10.1197/jamia.M2080
DO - 10.1197/jamia.M2080
M3 - Article
C2 - 17712094
AN - SCOPUS:35648992978
SN - 1067-5027
VL - 14
SP - 788
EP - 797
JO - Journal of the American Medical Informatics Association
JF - Journal of the American Medical Informatics Association
IS - 6
ER -