TY - GEN
T1 - Categorization and analysis of text in computer mediated communication archives using visualization
AU - Abbasi, Ahmed
AU - Chen, Hsinchun
PY - 2007
Y1 - 2007
AB - Digital libraries (DLs) for online discourse contain large amounts of valuable information that is difficult to navigate and analyze. Visualization systems developed to improve the analysis and navigation of computer-mediated communication (CMC) archives focus primarily on interaction information, with little emphasis on textual content. In this paper we present a system that provides DL exploration services such as visualization, categorization, and analysis for CMC text. The system incorporates an extended feature set composed of stylistic, topical, and sentiment-related features to enable richer content representation. The system also includes the Ink Blot technique, which uses decision tree models and text overlay to visualize CMC messages. Ink Blots can be used for text categorization and analysis across forums, authors, threads, and messages, and over time. The proposed system's analysis capabilities were evaluated with a series of examples and a qualitative user study. Empirical categorization experiments comparing the Ink Blot technique against a benchmark support vector machine classifier were also conducted. The results demonstrated the efficacy of the Ink Blot technique for text categorization and highlighted the effectiveness of the extended feature set in improving it.
KW - Computer mediated communication
KW - Text mining
KW - Visualization
UR - http://www.scopus.com/inward/record.url?scp=36349003655&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=36349003655&partnerID=8YFLogxK
U2 - 10.1145/1255175.1255178
DO - 10.1145/1255175.1255178
M3 - Conference contribution
AN - SCOPUS:36349003655
SN - 1595936440
SN - 9781595936448
T3 - Proceedings of the ACM International Conference on Digital Libraries
SP - 11
EP - 18
BT - Proceedings of the 7th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2007
T2 - 7th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2007: Building and Sustaining the Digital Environment
Y2 - 18 June 2007 through 23 June 2007
ER -