Categorization and analysis of text in computer mediated communication archives using visualization

Ahmed Abbasi, Hsinchun Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

21 Scopus citations

Abstract

Digital libraries (DLs) for online discourse contain large amounts of valuable information that is difficult to navigate and analyze. Visualization systems developed to improve the analysis and navigation of computer-mediated communication (CMC) archives focus primarily on interaction information, with little emphasis on textual content. In this paper we present a system that provides DL exploration services such as visualization, categorization, and analysis for CMC text. The system incorporates an extended feature set composed of stylistic, topical, and sentiment-related features to enable richer content representation. The system also includes the Ink Blot technique, which uses decision tree models and text overlay to visualize CMC messages. Ink Blots can be used for text categorization and analysis across forums, authors, threads, messages, and over time. The proposed system's analysis capabilities were evaluated with a series of examples and a qualitative user study. Empirical categorization experiments comparing the Ink Blot technique against a benchmark support vector machine classifier were also conducted. The results demonstrated the efficacy of the Ink Blot technique for text categorization and highlighted the effectiveness of the extended feature set in improving it.

Original language: English (US)
Title of host publication: Proceedings of the 7th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2007
Subtitle of host publication: Building and Sustaining the Digital Environment
Pages: 11-18
Number of pages: 8
State: Published - 2007
Event: 7th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2007: Building and Sustaining the Digital Environment - Vancouver, BC, Canada
Duration: Jun 18 2007 to Jun 23 2007

Publication series

Name: Proceedings of the ACM International Conference on Digital Libraries

Other

Other: 7th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2007: Building and Sustaining the Digital Environment
Country/Territory: Canada
City: Vancouver, BC
Period: 6/18/07 to 6/23/07

Keywords

  • Computer mediated communication
  • Text mining
  • Visualization

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Computer Science Applications
  • Library and Information Sciences
