TY - JOUR
T1 - PANDA: A Gigapixel-Level Human-Centric Video Dataset
T2 - 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020
AU - Wang, Xueyang
AU - Zhang, Xiya
AU - Zhu, Yinheng
AU - Guo, Yuchen
AU - Yuan, Xiaoyun
AU - Xiang, Liuyu
AU - Wang, Zerun
AU - Ding, Guiguang
AU - Brady, David
AU - Dai, Qionghai
AU - Fang, Lu
N1 - Funding Information:
* These authors contributed equally to this work. Corresponding author email: fanglu@sz.tsinghua.edu.cn. This work is supported in part by the Natural Science Foundation of China (NSFC) under contract Nos. 61722209, 6181001011, 61971260 and U1936202, and in part by the Shenzhen Science and Technology Research and Development Funds (JCYJ20180507183706645).
Publisher Copyright:
© 2020 IEEE.
PY - 2020
Y1 - 2020
N2 - We present PANDA, the first gigaPixel-level humAN-centric viDeo dAtaset, for large-scale, long-term, and multi-object visual analysis. The videos in PANDA were captured by a gigapixel camera and cover real-world scenes with both wide field-of-view (∼1 km² area) and high-resolution details (∼gigapixel-level/frame). The scenes may contain 4k head counts with over 100× scale variation. PANDA provides enriched and hierarchical ground-truth annotations, including 15,974.6k bounding boxes, 111.8k fine-grained attribute labels, 12.7k trajectories, 2.2k groups and 2.9k interactions. We benchmark the human detection and tracking tasks. Due to the vast variance of pedestrian pose, scale, occlusion and trajectory, existing approaches are challenged by both accuracy and efficiency. Given the uniqueness of PANDA with both wide FoV and high resolution, a new task of interaction-aware group detection is introduced. We design a ‘global-to-local zoom-in’ framework, where global trajectories and local interactions are simultaneously encoded, yielding promising results. We believe PANDA will contribute to the community of artificial intelligence and praxeology by understanding human behaviors and interactions in large-scale real-world scenes. PANDA Website: http://www.panda-dataset.com.
AB - We present PANDA, the first gigaPixel-level humAN-centric viDeo dAtaset, for large-scale, long-term, and multi-object visual analysis. The videos in PANDA were captured by a gigapixel camera and cover real-world scenes with both wide field-of-view (∼1 km² area) and high-resolution details (∼gigapixel-level/frame). The scenes may contain 4k head counts with over 100× scale variation. PANDA provides enriched and hierarchical ground-truth annotations, including 15,974.6k bounding boxes, 111.8k fine-grained attribute labels, 12.7k trajectories, 2.2k groups and 2.9k interactions. We benchmark the human detection and tracking tasks. Due to the vast variance of pedestrian pose, scale, occlusion and trajectory, existing approaches are challenged by both accuracy and efficiency. Given the uniqueness of PANDA with both wide FoV and high resolution, a new task of interaction-aware group detection is introduced. We design a ‘global-to-local zoom-in’ framework, where global trajectories and local interactions are simultaneously encoded, yielding promising results. We believe PANDA will contribute to the community of artificial intelligence and praxeology by understanding human behaviors and interactions in large-scale real-world scenes. PANDA Website: http://www.panda-dataset.com.
UR - http://www.scopus.com/inward/record.url?scp=85092360580&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85092360580&partnerID=8YFLogxK
U2 - 10.1109/CVPR42600.2020.00333
DO - 10.1109/CVPR42600.2020.00333
M3 - Conference article
AN - SCOPUS:85092360580
SP - 3265
EP - 3275
JO - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
JF - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SN - 1063-6919
M1 - 9156646
Y2 - 14 June 2020 through 19 June 2020
ER -