TY - GEN
T1 - Multiple-Gaze Geometry
T2 - 15th European Conference on Computer Vision, ECCV 2018
AU - Brau, Ernesto
AU - Guan, Jinyan
AU - Jeffries, Tanya
AU - Barnard, Kobus
N1 - Publisher Copyright:
© 2018, Springer Nature Switzerland AG.
PY - 2018
Y1 - 2018
N2 - We develop using person gaze direction for scene understanding. In particular, we use intersecting gazes to learn 3D locations that people tend to look at, which is analogous to having multiple camera views. The 3D locations that we discover need not be visible to the camera. Conversely, knowing 3D locations of scene elements that draw visual attention, such as other people in the scene, can help infer gaze direction. We provide a Bayesian generative model for the temporal scene that captures the joint probability of camera parameters, locations of people, their gaze, what they are looking at, and locations of visual attention. Both the number of people in the scene and the number of extra objects that draw attention are unknown and need to be inferred. To execute this joint inference we use a probabilistic data association approach that enables principled comparison of model hypotheses. We use MCMC for inference over the discrete correspondence variables, and approximate the marginalization over continuous parameters using the Metropolis-Laplace approximation, using Hamiltonian (Hybrid) Monte Carlo for maximization. As existing data sets do not provide the 3D locations of what people are looking at, we contribute a small data set that does. On this data set, we infer what people are looking at with 59% precision compared with 13% for a baseline approach, and where those objects are within about 0.58 m.
AB - We develop using person gaze direction for scene understanding. In particular, we use intersecting gazes to learn 3D locations that people tend to look at, which is analogous to having multiple camera views. The 3D locations that we discover need not be visible to the camera. Conversely, knowing 3D locations of scene elements that draw visual attention, such as other people in the scene, can help infer gaze direction. We provide a Bayesian generative model for the temporal scene that captures the joint probability of camera parameters, locations of people, their gaze, what they are looking at, and locations of visual attention. Both the number of people in the scene and the number of extra objects that draw attention are unknown and need to be inferred. To execute this joint inference we use a probabilistic data association approach that enables principled comparison of model hypotheses. We use MCMC for inference over the discrete correspondence variables, and approximate the marginalization over continuous parameters using the Metropolis-Laplace approximation, using Hamiltonian (Hybrid) Monte Carlo for maximization. As existing data sets do not provide the 3D locations of what people are looking at, we contribute a small data set that does. On this data set, we infer what people are looking at with 59% precision compared with 13% for a baseline approach, and where those objects are within about 0.58 m.
KW - 3D gaze estimation
KW - 3D temporal scene understanding
KW - Discovering objects
KW - MCMC
KW - Model selection
KW - Monocular video
UR - http://www.scopus.com/inward/record.url?scp=85055458488&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85055458488&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-01225-0_38
DO - 10.1007/978-3-030-01225-0_38
M3 - Conference contribution
AN - SCOPUS:85055458488
SN - 9783030012243
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 641
EP - 659
BT - Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings
A2 - Ferrari, Vittorio
A2 - Sminchisescu, Cristian
A2 - Weiss, Yair
A2 - Hebert, Martial
PB - Springer-Verlag
Y2 - 8 September 2018 through 14 September 2018
ER -