MEDRank: Using graph-based concept ranking to index biomedical texts

Jorge R. Herskovic, Trevor Cohen, Devika Subramanian, M. Sriram Iyengar, Jack W. Smith, Elmer V. Bernstam

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

Background: As the volume of biomedical text increases exponentially, automatic indexing becomes increasingly important. However, existing approaches do not distinguish central (or core) concepts from concepts mentioned only in passing. We focus on the problem of indexing MEDLINE records, a process currently performed by highly trained human indexers at the National Library of Medicine (NLM). NLM indexers are assisted by a system called the Medical Text Indexer (MTI), which suggests candidate indexing terms.

Objective: To improve the ability of MTI to select the core terms in MEDLINE abstracts. These core concepts are deemed most important and are designated as "major headings" by MEDLINE indexers. We introduce and evaluate a graph-based indexing methodology called MEDRank, which generates concept graphs from biomedical text and then ranks the concepts within these graphs to identify the most important ones.

Methods: We inserted a MEDRank step into MTI and compared MTI's output with and without MEDRank to the MEDLINE indexers' selected terms for a sample of 11,803 PubMed Central articles. We also tested which terms human raters preferred among those generated by the MEDLINE indexers, MTI without MEDRank, and MTI with MEDRank for a sample of 36 PubMed Central articles.

Results: MEDRank improved recall of designated major headings by 30% over MTI without MEDRank (0.489 vs. 0.376). Overall recall was only slightly higher (6.5%, 0.490 vs. 0.460), as was F2 (3%, 0.408 vs. 0.396). However, overall precision was 3.9% lower (0.268 vs. 0.279). Human raters preferred terms generated by MTI with MEDRank over terms generated by MTI without MEDRank (by an average of 1.00 more term per article), and preferred terms generated by MTI with MEDRank and by the MEDLINE indexers at the same rate.

Conclusions: The addition of MEDRank to MTI significantly improved the retrieval of core concepts in MEDLINE abstracts and more closely matched human expectations compared to MTI without MEDRank. In addition, MEDRank slightly improved overall recall and F2.
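The percentages quoted in the Results are relative changes between the paired scores, and F2 is the recall-weighted member of the F-beta family. The sketch below recomputes the relative changes from the figures in the abstract and gives the standard F-beta formula for reference. The function names `relative_change` and `f_beta` are illustrative and not taken from the paper. The abstract does not say how F2 was aggregated across articles, so applying `f_beta` to the overall precision and recall need not reproduce the reported F2 values.

```python
def relative_change(new: float, old: float) -> float:
    """Relative change of `new` over `old`, expressed as a percentage."""
    return (new - old) / old * 100.0


def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Standard F-beta score; beta=2 weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# (MTI with MEDRank, MTI without MEDRank) pairs copied from the abstract.
reported = {
    "major-heading recall": (0.489, 0.376),
    "overall recall": (0.490, 0.460),
    "F2": (0.408, 0.396),
    "overall precision": (0.268, 0.279),
}

for name, (with_medrank, without_medrank) in reported.items():
    change = relative_change(with_medrank, without_medrank)
    print(f"{name}: {change:+.1f}%")
```

Rounded, the computed changes agree with the 30%, 6.5%, 3% and 3.9% figures stated in the Results.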

Original language: English (US)
Pages (from-to): 431-441
Number of pages: 11
Journal: International Journal of Medical Informatics
Volume: 80
Issue number: 6
State: Published - Jun 2011
Externally published: Yes

Keywords

  • Abstracting and indexing as topic
  • Algorithms
  • Automatic data processing
  • Digital libraries
  • MEDLINE
  • Medical informatics
  • Natural language processing
  • PubMed

ASJC Scopus subject areas

  • Health Informatics
