Visual tracking has drawn sustained attention from researchers and engineers over the past decades. Despite steady progress, it remains difficult to build a tracker that balances accuracy, robustness, and speed in complex scenarios involving occlusion, illumination change, and scale variation.
Combining region proposal networks with Siamese networks has recently brought much progress. Several important problems remain unsettled, however: the sampling process produces a large number of easy negative samples, and deep neural networks yield features that are predominantly semantic. Both can make a visual tracker less accurate in spatial localization and less robust.
A research team led by Dr. ZHANG Ximing from the Xi'an Institute of Optics and Precision Mechanics (XIOPM) of the Chinese Academy of Sciences has proposed a novel tracking strategy that aims to make cameras more intelligent through accurate and robust visual tracking algorithms.
By analyzing the feature transfer function of spatial transformer networks, the researchers enabled the tracker to cope with spatial transformations such as heavy scale change and rotation. With a shrinkage loss, the networks down-weight easy samples during training, which alleviates the data imbalance issue.
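To illustrate how a shrinkage loss can down-weight easy samples, here is a minimal sketch in Python. The article does not give the team's exact formula, so the form below, a squared error multiplied by a sigmoid-shaped modulating factor, and the parameter values `a` and `c` are assumptions taken from a commonly cited formulation, not the authors' implementation.

```python
import math


def l2_loss(pred, target):
    """Plain squared error, shown for comparison."""
    return (pred - target) ** 2


def shrinkage_loss(pred, target, a=10.0, c=0.2):
    """Squared error scaled by a modulating factor in (0, 1).

    a (shrinkage speed) and c (error level around which shrinking
    happens) are illustrative values, not taken from the article.
    """
    l = abs(pred - target)
    # The factor is close to 0 for small errors (easy samples)
    # and close to 1 for large errors (hard samples).
    modulation = 1.0 / (1.0 + math.exp(a * (c - l)))
    return l * l * modulation


if __name__ == "__main__":
    for name, pred in (("easy", 0.05), ("hard", 0.80)):
        kept = shrinkage_loss(pred, 0.0) / l2_loss(pred, 0.0)
        print(f"{name} sample keeps {kept:.3f} of its squared error")
```

Under these assumed parameters, an easy sample contributes only a small fraction of its squared error, while a hard sample keeps almost all of it, so the many easy negatives no longer dominate training.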
Considering the redundancy among proposals, the researchers found that multiple cues, such as shape, color, and scale, can be used to select high-quality proposals. This refinement not only improves tracking performance in complex scenarios but also reduces the computational effort. The results were published in Sensors.
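The multi-cue refinement described above can be pictured with a small Python sketch. Everything in it is hypothetical: the article does not say how the cues are computed or combined, so the cue definitions, the equal weights, the `(x, y, w, h)` box layout, and the function names are illustrative assumptions only.

```python
import math


def shape_cue(box, ref):
    """Aspect-ratio similarity in (0, 1]; boxes are (x, y, w, h)."""
    r = (box[2] / box[3]) / (ref[2] / ref[3])
    return min(r, 1.0 / r)


def scale_cue(box, ref):
    """Area similarity in (0, 1]."""
    s = (box[2] * box[3]) / (ref[2] * ref[3])
    return min(s, 1.0 / s)


def color_cue(hist, ref_hist):
    """Bhattacharyya coefficient of two normalised color histograms."""
    return sum(math.sqrt(p * q) for p, q in zip(hist, ref_hist))


def refine_proposals(proposals, ref_box, ref_hist, keep=2,
                     weights=(1 / 3, 1 / 3, 1 / 3)):
    """Keep the `keep` proposals that best match the reference target.

    The weighted sum and the equal weights are assumptions made for
    illustration, not the published method.
    """
    scored = []
    for p in proposals:
        score = (weights[0] * shape_cue(p["box"], ref_box)
                 + weights[1] * scale_cue(p["box"], ref_box)
                 + weights[2] * color_cue(p["hist"], ref_hist))
        scored.append((score, p))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [p for _, p in scored[:keep]]
```

Discarding low-scoring proposals early is what would save computation in such a scheme: later, more expensive stages only see the few candidates that agree with the target on all cues.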
Using the proposed method, they carried out experimental comparisons with state-of-the-art methods on Large-scale Single Object Test Bench, Wild applications and UAV tracking scenarios.
"The tracker can handle not only accurate tracking on a stable platform, but also robust tracking under fast motion," said Dr. ZHANG.
The work puts forward the idea that cameras can become more intelligent through visual tracking algorithms and deep neural networks. In the near future, the researchers plan to address the problem of "tracking-by-understanding", so that a smart camera could truly think as people do.
Comparison of results between the proposed algorithm and state-of-the-art methods. (Image by XIOPM)