Abstract
“Long tail” data are generally understood to be smaller, heterogeneous, researcher-held data that present unique data management and scholarly communication challenges. These data are presumed to be concentrated in relatively lower-funded projects, which lack sufficient resources for curation. To better understand the nature and distribution of long tail data, we examine National Science Foundation (NSF) funding patterns using Latent Dirichlet Allocation (LDA) and bibliographic data. We also introduce the concept of “Topic Investment” to capture differences in topics across funding levels and to illuminate the distribution of funding across topics. Using astronomy as a case study, we explore possible associations between topic, funding level, and research output, with implications for research policy and practice. We find that although topics differ in their funding levels and publication patterns, the dynamics predicted by the “long tail” theoretical framework presented here can be observed within NSF-funded topics in astronomy.
Original language | English (US) |
---|---|
Article number | e276 |
Journal | Proceedings of the Association for Information Science and Technology |
Volume | 57 |
Issue number | 1 |
State | Published - 2020 |
Keywords
- astronomy
- data curation
- long tail
- research funding
- topic analysis
ASJC Scopus subject areas
- General Computer Science
- Library and Information Sciences