TY - GEN
T1 - We know where you are tweeting from
T2 - 3rd IEEE International Congress on Big Data, BigData Congress 2014
AU - Alsudais, Abdulkareem
AU - Leroy, Gondy
AU - Corso, Anthony
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/9/22
Y1 - 2014/9/22
AB - Identifying the type of place a user is tweeting from is important for many business and social applications; for example, user profiles can help local businesses identify current and potential clients and their interests. We used Random Forest to identify six location categories: active life, eating out, hotels, nightlife, shopping, and shows. We evaluated 16 features for use in classification. The features are generated from the textual content of the tweet, the metadata associated with the tweet, and the geographical area the user is tweeting from. We trained our classifier by analyzing 43,149 reviews from Yelp and by examining two Twitter datasets. The first is an original dataset consisting of 6,359 tweets, and the second is a stratified one containing 2,400 tweets uniformly distributed among the six categories. We evaluated our approach by creating a gold standard. Using 60% of our tweets for training and 40% for testing, our approach assigned the correct location category to 74% of tweets in the original dataset and 77% of tweets in the stratified dataset. The results could be beneficial for research and business.
KW - Natural Language Processing
KW - Random Forests
KW - location analytics
UR - http://www.scopus.com/inward/record.url?scp=84923884974&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84923884974&partnerID=8YFLogxK
U2 - 10.1109/BigData.Congress.2014.91
DO - 10.1109/BigData.Congress.2014.91
M3 - Conference contribution
AN - SCOPUS:84923884974
T3 - Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014
SP - 594
EP - 600
BT - Proceedings - 2014 IEEE International Congress on Big Data, BigData Congress 2014
A2 - Chen, Peter
A2 - Jain, Hemant
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 27 June 2014 through 2 July 2014
ER -