Researchers from the Changchun Institute of Optics, Fine Mechanics and Physics of the Chinese Academy of Sciences have developed a novel autofocus method that harnesses the power of deep learning to dynamically select regions of interest in grayscale images. The study was published in Sensors.
Traditional autofocus methods can be divided into active and passive categories. Active focusing relies on external sensors, increasing costs and complexity. In contrast, passive focusing assesses image quality to control focus, but fixed focusing windows and evaluation functions often lead to focusing failures, especially in complex scenes.
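The evaluation functions mentioned above are typically sharpness metrics computed over a fixed window. As a minimal sketch of this classic passive approach, the example below scores focus with the variance of the Laplacian, a common (illustrative, not paper-specific) focus measure; a lens controller would sweep the focus motor and keep the position that maximizes this score.

```python
import numpy as np

def laplacian_sharpness(gray: np.ndarray) -> float:
    """Variance of the Laplacian: a common passive focus measure.
    Higher values indicate a sharper (better-focused) image."""
    k = np.array([[0, 1, 0],
                  [1, -4, 1],
                  [0, 1, 0]], dtype=np.float64)
    g = gray.astype(np.float64)
    # Valid 2-D convolution via shifted slices (no SciPy dependency).
    h, w = g.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            out += k[i, j] * g[i:i + h - 2, j:j + w - 2]
    return float(out.var())

# A sharp striped pattern scores higher than a blurred copy of itself.
sharp = np.zeros((32, 32)); sharp[::2] = 255.0
blurred = (sharp + np.roll(sharp, 1, axis=0)) / 2.0  # crude 2-tap blur
assert laplacian_sharpness(sharp) > laplacian_sharpness(blurred)
```

Because the window and metric are fixed, a bright light spot or a low-texture scene inside the window can defeat this measure, which is exactly the failure mode the study targets.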
Moreover, the lack of comprehensive datasets has hindered the widespread adoption of deep learning methods in autofocus. Traditional image-based autofocus solutions also suffer from issues such as misjudging bright light spots and focal breathing, in which changes in magnification and light intensity during focusing distort the sharpness evaluation.
In this study, researchers embarked on a three-step method to solve these problems. First, they constructed a comprehensive dataset of grayscale image sequences with continuous focusing adjustments, capturing diverse scenes from simple to complex and at varying focal lengths. This dataset serves as a valuable resource for training and evaluating autofocus algorithms.
Next, the researchers reformulated autofocus as an ordinal regression task and proposed two focusing strategies: full-stack search and single-frame prediction. These strategies enable the network to adaptively attend to salient regions within the frame, eliminating the need for a pre-selected focusing window.
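In ordinal regression, discrete focus positions are predicted while preserving their ordering, unlike plain classification. As a minimal sketch (the decoding rule and the number of focus positions below are illustrative assumptions, not details from the paper), one common scheme has the network emit K-1 per-threshold probabilities and counts how many thresholds are exceeded:

```python
import numpy as np

def decode_ordinal(probs: np.ndarray) -> int:
    """Decode an ordinal-regression output into a focus-position index.

    probs: shape (K-1,), where probs[k] estimates P(true position > k).
    The predicted position is the number of thresholds exceeded, so
    nearby focus positions stay ordered and errors stay small.
    """
    return int((probs > 0.5).sum())

# Hypothetical network output for a lens with K = 6 focus positions:
p = np.array([0.98, 0.91, 0.70, 0.22, 0.05])
print(decode_ordinal(p))  # -> 3
```

This framing penalizes a prediction one step away from the true focus position far less than one many steps away, which suits a physical focus motor.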
Finally, researchers designed a MobileViT network equipped with a linear self-attention mechanism. This lightweight yet powerful network achieves dynamic autofocus with minimal computational cost, ensuring fast and accurate focusing.
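Linear self-attention reduces the quadratic cost of standard attention in the sequence length. The sketch below shows one common linear-attention formulation (kernel feature maps, in the style of Katharopoulos et al.), offered only as an assumption-laden illustration of the idea, not the exact mechanism used in the paper's network:

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Linear self-attention: O(n * d^2) instead of O(n^2 * d).

    Uses the kernel trick phi(Q) @ (phi(K).T @ V) with phi = elu + 1,
    so the n-by-n attention matrix is never materialized.
    """
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1 > 0
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                              # (d, d_v) key-value summary
    z = Qp @ Kp.sum(axis=0, keepdims=True).T   # normalizer, shape (n, 1)
    return (Qp @ kv) / (z + eps)

rng = np.random.default_rng(0)
n, d = 8, 4                     # toy token count and feature width
Q, K, V = rng.normal(size=(3, n, d))
out = linear_attention(Q, K, V)
assert out.shape == (n, d)
```

Because the cost grows linearly with the number of tokens, this kind of mechanism is what makes millisecond-scale inference plausible on resource-constrained imaging hardware.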
Experiments showed that the full-stack search strategy achieved a mean absolute error (MAE) of 0.094 with a focusing time of 27.8 milliseconds, while the single-frame prediction strategy achieved an MAE of 0.142 in just 27.5 milliseconds. These results underscore the superior performance of the deep learning-based autofocus method.
This deep learning-based autofocus method underscores the potential of AI in enhancing traditional imaging technologies. Future research could explore the application of this method to color images and video sequences. In addition, optimizing the network architecture and focusing strategies could lead to even faster and more accurate focusing.