TY - GEN
T1 - Mining Semantic Relations in Data References to Understand the Roles of Research Data in Academic Literature
AU - Fan, Lizhou
AU - Lafia, Sara
AU - Wofford, Morgan
AU - Thomer, Andrea
AU - Yakel, Elizabeth
AU - Hemphill, Libby
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Research data serves important roles in scientific discovery and academic innovation. To appropriately assign credit for data work and to measure the value of research data, it is essential to articulate how data are actually used in research. We leveraged a combination of computational methods and human analysis to characterize different types of data use by mining semantic relations from the phrases where data are referenced in academic literature. In particular, we investigated references to data in the bibliography of a large social science data archive, the Inter-university Consortium for Political and Social Research (ICPSR). After retrieving and extracting semantic relations as subject-relation-object triples, we used rule-based methods to classify them. We then annotated samples from 11 frequent classes of data reference triples and found that they vary primarily along two dimensions of data use: proximity and function. Proximity describes the distance between the author and the data they reference (e.g., direct or indirect engagement). Function describes the role that data plays in each reference (e.g., describing interaction or providing context). These semantic relationships between authors and data reveal the ways data are used in scientific publications. Evidence of the variety of ways data are used can help stakeholders in research data curation and stewardship - including data providers, data curators, and data users - recognize the myriad ways that their investments in data sharing are realized.
KW - information extraction
KW - knowledge discovery
KW - research data management
KW - semantic triples
KW - text mining
UR - https://www.scopus.com/pages/publications/85174576117
U2 - 10.1109/JCDL57899.2023.00039
DO - 10.1109/JCDL57899.2023.00039
M3 - Conference contribution
AN - SCOPUS:85174576117
T3 - Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
SP - 215
EP - 227
BT - Proceedings - 2023 ACM/IEEE Joint Conference on Digital Libraries, JCDL 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 ACM/IEEE Joint Conference on Digital Libraries, JCDL 2023
Y2 - 26 June 2023 through 30 June 2023
ER -