Abstract
There has been increasing interest in using data from social media, search engines, and other web sources for predictive analytics across many domains. Although these datasets have shown significant promise in different contexts, mounting evidence suggests that many of the resulting findings may be distorted. The causes include the loosely structured nature of the textual data, noise from anomalous media spikes, and the use of misleading terms and phrases. We introduce a novel and efficient framework that combines natural language processing (NLP) and machine learning classification techniques to extract signal from social media text. We tested the methodology on two large, real-world social media datasets, where it achieved an overall accuracy of 88% with high per-class precision and recall. The methodology described in this paper can be applied to a variety of purposes to improve analyses of social media and web text and, in turn, the predictions built on them.
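The abstract reports results as overall accuracy together with per-class precision and recall. The following is a minimal sketch of how these standard metrics are computed for single-label classifier output. It is not the authors' evaluation code. The function name `evaluate_predictions` and the example labels `relevant` and `noise` are hypothetical.

```python
from collections import Counter


def evaluate_predictions(y_true, y_pred):
    """Return (overall accuracy, {label: (precision, recall)}).

    Hypothetical helper for illustration; assumes one label per post.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        return 0.0, {}

    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    true_pos = Counter(t for t, p in pairs if t == p)  # TP per class
    predicted = Counter(y_pred)                        # TP + FP per class
    actual = Counter(y_true)                           # TP + FN per class

    per_class = {}
    for label in sorted(set(y_true) | set(y_pred)):
        precision = true_pos[label] / predicted[label] if predicted[label] else 0.0
        recall = true_pos[label] / actual[label] if actual[label] else 0.0
        per_class[label] = (precision, recall)
    return accuracy, per_class


if __name__ == "__main__":
    # Hypothetical labels: is a post relevant signal or noise?
    gold = ["relevant", "noise", "relevant", "noise", "relevant"]
    pred = ["relevant", "noise", "noise", "noise", "relevant"]
    acc, scores = evaluate_predictions(gold, pred)
    print(acc, scores)
```

Reporting per-class precision and recall alongside accuracy matters because accuracy alone can mask poor performance on a minority class. This is a common situation when most collected posts are noise.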
Original language | English (US) |
---|---|
State | Published - 2015 |
Event | 25th Annual Workshop on Information Technologies and Systems, WITS 2015 - Dallas, United States Duration: Dec 12 2015 → Dec 13 2015 |
Other
Other | 25th Annual Workshop on Information Technologies and Systems, WITS 2015 |
---|---|
Country/Territory | United States |
City | Dallas |
Period | 12/12/15 → 12/13/15 |
ASJC Scopus subject areas
- Information Systems