Heterogeneous Metric Learning with Content-Based Regularization for Software Artifact Retrieval

Liang Wu, Liang Du, Bo Liu, Guandong Xu, Yong Ge, Yanjie Fu, Jianhui Li, Yuanchun Zhou, Hui Xiong

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

11 Scopus citations

Abstract

Software artifact retrieval aims to effectively locate software artifacts, such as a piece of source code, in a large code repository. This problem has traditionally been addressed through textual queries: information retrieval techniques are applied based on the textual similarity between queries and the textual representation of software artifacts, which is generated by collecting words from the comments, identifiers, and descriptions of programs. However, in addition to this semantic information, rich information is embedded in the source code itself. The source code, if analyzed properly, can be a rich source for enhancing software artifact retrieval. To this end, in this paper we develop a feature extraction method for source code. Specifically, this method captures both the inherent information in the source code and the semantic information hidden in its comments, descriptions, and identifiers. Moreover, we design a heterogeneous metric learning approach that integrates code features and text features into the same latent semantic space. This, in turn, helps to measure artifact similarity by exploiting the joint power of both code and text features. Finally, extensive experiments on real-world data show that the proposed method improves the performance of software artifact retrieval by a significant margin.

Original language: English (US)
Title of host publication: Proceedings - 14th IEEE International Conference on Data Mining, ICDM 2014
Editors: Ravi Kumar, Hannu Toivonen, Jian Pei, Joshua Zhexue Huang, Xindong Wu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 610-619
Number of pages: 10
Edition: January
ISBN (Electronic): 9781479943029
DOIs
State: Published - Jan 1 2014
Externally published: Yes
Event: 14th IEEE International Conference on Data Mining, ICDM 2014 - Shenzhen, China
Duration: Dec 14 2014 → Dec 17 2014

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
Number: January
Volume: 2015-January
ISSN (Print): 1550-4786

Conference

Conference: 14th IEEE International Conference on Data Mining, ICDM 2014
Country/Territory: China
City: Shenzhen
Period: 12/14/14 → 12/17/14

ASJC Scopus subject areas

  • General Engineering
