New artificial intelligence framework developed for target detection technology
Researchers from the Hefei Institutes of Physical Science (HFIPS) of the Chinese Academy of Sciences (CAS) have proposed a new artificial intelligence framework for fast, high-precision, real-time target detection.
The results were published in Expert Systems with Applications.
In recent years, deep learning theory has driven the rapid development of artificial intelligence technology, and object detection based on deep learning has succeeded in many industrial applications. Most current research, however, improves either the speed or the accuracy of target detection and rarely both at once. Achieving detection that is both fast and accurate has become an important challenge in the field of artificial intelligence.
In this study, the researchers found that one of the main weaknesses of deep-learning-based target detection lies in the repeated feature extraction and fusion performed by deep network structures, which incurs unnecessary computational cost.
They therefore proposed a multi-input single-output (MiSo) target recognition framework. Unlike the traditional multi-input multi-output model, it reduces model complexity and inference time.
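The multi-input single-output idea can be illustrated with a toy sketch: several feature maps at different resolutions are brought to one common size and merged into a single map, so that a detection head would only need to run once. This is not the authors' encoder. The function names, the nearest-neighbour resizing, and the plain averaging are all assumptions chosen to keep the example short.

```python
from typing import List

Grid = List[List[float]]  # one single-channel feature map


def resize_nearest(fmap: Grid, out_h: int, out_w: int) -> Grid:
    """Resize a feature map with nearest-neighbour sampling (illustrative choice)."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [
        [fmap[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def fuse_multi_in_single_out(fmaps: List[Grid], out_h: int, out_w: int) -> Grid:
    """Merge several input maps into ONE output map by averaging (hypothetical fusion rule)."""
    resized = [resize_nearest(f, out_h, out_w) for f in fmaps]
    n = len(resized)
    return [
        [sum(f[r][c] for f in resized) / n for c in range(out_w)]
        for r in range(out_h)
    ]


# Three made-up feature levels: fine (4x4), middle (2x2), coarse (1x1).
fine = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
middle = [[1.0, 2.0], [3.0, 4.0]]
coarse = [[10.0]]

# A single 2x2 map comes out; a detector head would be applied to this map only.
fused = fuse_multi_in_single_out([fine, middle, coarse], 2, 2)
print(fused)
```

In a multi-output design, each of the three levels would get its own prediction pass. Here only the fused map is passed on, which is where the saving in inference time would come from.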
Furthermore, within this framework and building on the effective receptive field (eRF) detection theory they had proposed earlier, the researchers designed three new learning mechanisms to extract hot-spot feature information more accurately and efficiently: a receptive field adjustment mechanism, a residual attention self-learning mechanism, and an eRF-based dynamic balance sampling strategy.
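To give a sense of what adjusting a receptive field means, the sketch below computes the theoretical receptive field of a stack of convolution layers using the standard recurrence. It is a general illustration only: the theoretical receptive field is an upper bound and not the effective receptive field (eRF) the researchers work with, and the layer configurations are made up rather than taken from M2YOLOF.

```python
from typing import List, Tuple

# Each layer is (kernel_size, stride, dilation); these tuples are illustrative only.
Layer = Tuple[int, int, int]


def receptive_field(layers: List[Layer]) -> int:
    """Theoretical receptive field (in input pixels) of one output position."""
    rf = 1    # a single input pixel sees itself
    jump = 1  # distance in input pixels between neighbouring positions at this depth
    for kernel, stride, dilation in layers:
        rf += (kernel - 1) * dilation * jump
        jump *= stride
    return rf


# Three plain 3x3 convolutions versus three 3x3 convolutions with growing dilation.
plain = [(3, 1, 1), (3, 1, 1), (3, 1, 1)]
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]

print(receptive_field(plain))    # 7
print(receptive_field(dilated))  # 15
```

With the same number of layers and weights, changing the dilation roughly doubles the area each output position can see. Dilation is one common way to adjust a receptive field on a single feature map; the article does not say which technique the researchers used.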
"We named it M2YOLOF," said Wang Hongqiang, who led the team. "It detects objects on one feature map and performs well on small objects. It's as fast as YOLOF (You Only Look One-level Feature), but more accurate."
The team tested it on a standard benchmark dataset, where it achieved 39.2 average precision (AP) at a speed of 29 frames per second. That is 2.6 AP higher than the existing state-of-the-art TridentNet-R50.
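The reported figures can be turned into a per-frame time budget and an implied baseline score with simple arithmetic. The sketch below uses only the numbers quoted in this article. The variable names are illustrative, and the latency figure assumes frames are processed one at a time, which the article does not state.

```python
# Figures quoted in the article.
fps = 29.0          # frames per second
ap_m2yolof = 39.2   # average precision of M2YOLOF
ap_gain = 2.6       # reported margin over TridentNet-R50

# Time available per frame, assuming one frame is processed at a time.
latency_ms = 1000.0 / fps

# Baseline AP implied by the reported margin (derived here, not quoted directly).
ap_baseline = round(ap_m2yolof - ap_gain, 1)

# The same margin expressed relative to the implied baseline.
relative_gain_pct = round(100.0 * ap_gain / ap_baseline, 1)

print(f"{latency_ms:.1f} ms per frame")
print(f"implied baseline AP: {ap_baseline}")
print(f"relative AP gain: {relative_gain_pct}%")
```

At 29 frames per second the detector has about 34.5 milliseconds per frame, and the 2.6 AP margin implies a baseline of 36.6 AP, a relative improvement of roughly 7 percent.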
The method offers a new approach for both research on and industrial applications of target detection.
More information: Qijin Wang et al, M2YOLOF: Based on effective receptive fields and multiple-in-single-out encoder for object detection, Expert Systems with Applications (2022). DOI: 10.1016/j.eswa.2022.118928