Unmanned vehicles (UVs) play a key role in autonomous surveillance missions. A major task that UVs must undertake in such missions is to detect targets and track their locations. In this paper, a bi-level target-detection/localization algorithm based on computer vision is proposed that exploits the complementary capabilities of UVs operating as a cooperative team. The considered scenario involves an unmanned aerial vehicle (UAV) and a team of multiple unmanned ground vehicles (UGVs) that collaboratively track and control crowds in a border area. A modified moving-object detection algorithm is applied to track targets (i.e., crowds) from a camera mounted on the moving UAV. Because the UAV's imagery has lower resolution and the UAV is farther from the crowd, the UGVs, which view the crowd at higher resolution, are used for target classification to detect the pedestrians at the crowd's boundary at each time instant. A new collaborative localization method is then proposed to transform the crowd's image positions into real-world geographic locations. Finally, a testbed combining an agent-based simulation model with real UVs is constructed to evaluate the performance of both the detection and localization algorithms. The preliminary results demonstrate the effectiveness of the proposed approach for autonomous surveillance using a cooperative UAV and UGVs.