TY - GEN
T1 - Analyzing the language of food on social media
AU - Fried, Daniel
AU - Surdeanu, Mihai
AU - Kobourov, Stephen
AU - Hingle, Melanie
AU - Bell, Dane
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2015/1/7
Y1 - 2015/1/7
N2 - We investigate the predictive power behind the language of food on social media. We collect a corpus of over three million food-related posts from Twitter and demonstrate that many latent population characteristics can be directly predicted from this data: overweight rate, diabetes rate, political leaning, and home geographical location of authors. For all tasks, our language-based models significantly outperform the majority-class baselines. Performance is further improved with more complex natural language processing, such as topic modeling. We analyze which textual features have greatest predictive power for these datasets, providing insight into the connections between the language of food, geographic locale, and community characteristics. Lastly, we design and implement an online system for real-time query and visualization of the dataset. Visualization tools, such as geo-referenced heatmaps and temporal histograms, allow us to discover more complex, global patterns mirrored in the language of food.
UR - http://www.scopus.com/inward/record.url?scp=84921776765&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84921776765&partnerID=8YFLogxK
U2 - 10.1109/BigData.2014.7004305
DO - 10.1109/BigData.2014.7004305
M3 - Conference contribution
AN - SCOPUS:84921776765
T3 - Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014
SP - 778
EP - 783
BT - Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014
A2 - Chang, Wo
A2 - Huan, Jun
A2 - Cercone, Nick
A2 - Pyne, Saumyadipta
A2 - Honavar, Vasant
A2 - Lin, Jimmy
A2 - Hu, Xiaohua Tony
A2 - Aggarwal, Charu
A2 - Mobasher, Bamshad
A2 - Pei, Jian
A2 - Nambiar, Raghunath
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2nd IEEE International Conference on Big Data, IEEE Big Data 2014
Y2 - 27 October 2014 through 30 October 2014
ER -