TY - GEN
T1 - Age and gender prediction on health forum data
AU - Shrestha, Prasha
AU - Bethard, Steven
AU - Pedersen, Ted
AU - Rey-Villamizar, Nicolas
AU - Sadeque, Farig
AU - Solorio, Thamar
N1 - Funding Information: This project was partially supported by NSF award No. 1462141.
PY - 2016
Y1 - 2016
AB - Health support forums have become a rich source of data that can be used to improve health care outcomes. A user profile, including information such as age and gender, can support targeted analysis of forum data. But users might not always disclose their age and gender. It is desirable then to be able to automatically extract this information from users' content. However, to the best of our knowledge there is no such resource for author profiling of health forum data. Here we present a large corpus, with close to 85,000 users, for profiling and also outline our approach and benchmark results to automatically detect a user's age and gender from their forum posts. We use a mix of features from a user's text as well as forum-specific features to obtain accuracy well above the baseline, thus showing that both our dataset and our method are useful and valid.
KW - Age
KW - Author profiling
KW - Gender
KW - Medical forums
UR - http://www.scopus.com/inward/record.url?scp=85017406666&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85017406666&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85017406666
T3 - Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016
SP - 3394
EP - 3401
BT - Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016
A2 - Calzolari, Nicoletta
A2 - Choukri, Khalid
A2 - Mazo, Helene
A2 - Moreno, Asuncion
A2 - Declerck, Thierry
A2 - Goggi, Sara
A2 - Grobelnik, Marko
A2 - Odijk, Jan
A2 - Piperidis, Stelios
A2 - Maegaard, Bente
A2 - Mariani, Joseph
PB - European Language Resources Association (ELRA)
T2 - 10th International Conference on Language Resources and Evaluation, LREC 2016
Y2 - 23 May 2016 through 28 May 2016
ER -