Short and sparse text topic modeling via self-aggregation

Xiaojun Quan, Chunyu Kit, Yong Ge, Sinno Jialin Pan

Research output: Conference contribution (Chapter in Book/Report/Conference proceeding)

144 Scopus citations

Abstract

The overwhelming amount of short text data on social media and elsewhere has posed great challenges to topic modeling due to the sparsity problem. Most existing attempts to alleviate this problem resort to heuristic strategies that aggregate short texts into pseudo-documents before applying standard topic modeling. Although such strategies do not generalize well to other genres of short texts, their success sheds light on how to develop a more general solution. In this paper, we present a novel model towards this goal by integrating topic modeling with short text aggregation during topic inference. The aggregation is founded on the general topical affinity of texts rather than on particular heuristics, making the model readily applicable to various kinds of short texts. Experimental results on real-world datasets validate the effectiveness of this new model, suggesting that it can distill more meaningful topics from short texts.
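To make "aggregation founded on topical affinity" concrete, here is a minimal sketch in Python. It is not the authors' model: the abstract says their aggregation is integrated into topic inference, whereas this sketch is a hard, k-means-style stand-in that assumes each short text already has an inferred topic distribution. The helper names `cosine` and `aggregate_by_affinity`, the cosine-similarity affinity measure, and the centroid initialisation are all hypothetical choices made for illustration.

```python
import math
from typing import List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def aggregate_by_affinity(theta: List[List[float]], num_pseudo: int,
                          iters: int = 10) -> List[int]:
    """Hypothetical helper: group short texts into pseudo-documents.

    theta[d] is an assumed, previously inferred topic distribution for
    short text d. Returns assignment[d], the pseudo-document index of text d.
    """
    # Initialise pseudo-document profiles with the first num_pseudo texts
    # (a deterministic, purely illustrative choice).
    centroids = [list(theta[i]) for i in range(num_pseudo)]
    assignment = [0] * len(theta)
    for _ in range(iters):
        # Assignment step: each text joins the most topically similar
        # pseudo-document (ties go to the lowest index).
        new_assignment = [
            max(range(num_pseudo), key=lambda p: cosine(t, centroids[p]))
            for t in theta
        ]
        # Update step: a pseudo-document's topic profile is the mean
        # topic distribution of its member texts.
        for p in range(num_pseudo):
            members = [theta[d] for d, a in enumerate(new_assignment) if a == p]
            if members:
                k = len(members[0])
                centroids[p] = [sum(m[j] for m in members) / len(members)
                                for j in range(k)]
        if new_assignment == assignment:
            break
        assignment = new_assignment
    return assignment
```

For example, with two topics, `aggregate_by_affinity([[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]], 2)` places the two texts dominated by the first topic in one pseudo-document and the other two in the second. A standard topic model could then treat each pseudo-document as a single longer document, which is the intuition behind aggregation; the paper's model instead performs this grouping jointly with topic inference rather than as a separate hard clustering pass.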

Original language: English (US)
Title of host publication: IJCAI 2015 - Proceedings of the 24th International Joint Conference on Artificial Intelligence
Editors: Michael Wooldridge, Qiang Yang
Publisher: International Joint Conferences on Artificial Intelligence
Pages: 2270-2276
Number of pages: 7
ISBN (Electronic): 9781577357384
State: Published, 2015
Externally published: Yes
Event: 24th International Joint Conference on Artificial Intelligence, IJCAI 2015 - Buenos Aires, Argentina
Duration: Jul 25, 2015 to Jul 31, 2015

Publication series

Name: IJCAI International Joint Conference on Artificial Intelligence
Volume: 2015-January
ISSN (Print): 1045-0823

Conference

Conference: 24th International Joint Conference on Artificial Intelligence, IJCAI 2015
Country/Territory: Argentina
City: Buenos Aires
Period: 7/25/15 to 7/31/15

ASJC Scopus subject areas

  • Artificial Intelligence

