TY - GEN
T1 - Predicting the visual focus of attention in multi-person discussion videos
AU - Bai, Chongyang
AU - Kumar, Srijan
AU - Leskovec, Jure
AU - Metzger, Miriam
AU - Nunamaker, Jay F.
AU - Subrahmanian, V. S.
N1 - Funding Information:
This work was funded in part by ARO Grant W911NF1610342, NSF OAC-1835598, DARPA MCS, ARO MURI, JD.com, Amazon, and the Stanford Data Science Initiative. JL is a Chan Zuckerberg Biohub investigator.
Publisher Copyright:
© 2019 International Joint Conferences on Artificial Intelligence. All rights reserved.
PY - 2019
Y1 - 2019
AB - Visual focus of attention in multi-person discussions is a crucial nonverbal indicator in tasks such as interpersonal relation inference, speech transcription, and deception detection. However, predicting the focus of attention remains challenging because the focus changes rapidly, the discussions are highly dynamic, and people's behaviors are interdependent. Here we propose ICAF (Iterative Collective Attention Focus), a collective classification model that jointly learns the visual focus of attention of all people. Every person is modeled by a separate classifier, and ICAF models the people collectively: the predictions of all other people's classifiers are used as inputs to each person's classifier, explicitly incorporating the interdependencies between people's behaviors. We evaluate ICAF with supervised prediction on a novel dataset of 5 videos (35 people, 109 minutes, 7604 labels in all) of the popular Resistance game and on a widely studied meeting dataset. ICAF outperforms the strongest baseline by 1%-5% in accuracy at predicting people's visual focus of attention. Further, we propose a lightly supervised technique for training models in the absence of training labels. The lightly supervised ICAF performs on par with the supervised ICAF, demonstrating its effectiveness and its generality to previously unseen videos.
UR - http://www.scopus.com/inward/record.url?scp=85074953535&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85074953535&partnerID=8YFLogxK
U2 - 10.24963/ijcai.2019/626
DO - 10.24963/ijcai.2019/626
M3 - Conference contribution
AN - SCOPUS:85074953535
T3 - IJCAI International Joint Conference on Artificial Intelligence
SP - 4504
EP - 4510
BT - Proceedings of the 28th International Joint Conference on Artificial Intelligence, IJCAI 2019
A2 - Kraus, Sarit
PB - International Joint Conferences on Artificial Intelligence
T2 - 28th International Joint Conference on Artificial Intelligence, IJCAI 2019
Y2 - 10 August 2019 through 16 August 2019
ER -