AI worse at recognizing images than humans
Researchers from HSE University and Moscow Polytechnic University have found that AI models cannot reproduce certain features of human vision because they are not tightly coupled to the underlying physiology, and that this makes them worse at recognizing images. The results of the study were published in the Proceedings of the Seventh International Congress on Information and Communication Technology.
To understand how machine perception of images differs from human perception, scientists uploaded images of classical visual illusions to the IBM Watson Visual Recognition online service. Most of them were geometric silhouettes, partially hidden by geometric shapes of the background color. The system tried to determine the nature of the image and indicated the degree of certainty in its response.
It turned out that the system was unable to recognize any of the imaginary figures except a colored imaginary triangle, which it recognized correctly thanks to its high contrast with the background.
"Objects similar to those that we used during the experiment can be found in real life," says Vladimir Vinnikov, an analyst at the Laboratory of Methods for Big Data Analysis of the HSE Faculty of Computer Science and author of the study. "For example, the autopilot of a car or airplane perceives a trailer or a radio tower, which at night are indicated only by marker lights, the same way as we perceive imaginary geometric shapes."
The human eye is constantly moving involuntarily, and the photosensitive surface of its retina has the shape of a hemisphere. A person can see an illusion if the image works as a vector figure, i.e., if it includes reference points and curves connecting them. Thanks to constant eye movement, a physiological feature of our vision, the human imagination completes the picture.
In optoelectronic systems everything is arranged differently. Their light-sensitive matrix has a flat, usually rectangular shape, and the lens system itself is not nearly as free in movement as the human eye. Therefore, artificial intelligence cannot complete imaginary lines that connect fragments of a geometric illusion. Machine vision sees only what is actually depicted, whereas people complete the image in their imagination based on its outlines.
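To make the point about "what is actually depicted" concrete, here is a minimal Python sketch. It is not taken from the study: the figure (a Kanizsa-style triangle), the function names `kanizsa_triangle` and `supported_fraction`, and all sizes are assumptions chosen for illustration. It draws three dark discs with wedges cut out in the background color, then measures how much of the imaginary triangle's outline is backed by real pixel contrast.

```python
import math

def kanizsa_triangle(size=120, radius=12):
    """Return (grid, verts): a size x size grid of 0 (background) / 1 (ink)
    and the three vertices of the imaginary triangle."""
    cx = cy = size / 2
    r_tri = size * 0.32
    verts = [(cx + r_tri * math.cos(math.radians(a)),
              cy + r_tri * math.sin(math.radians(a))) for a in (-90, 30, 150)]

    def inside_triangle(x, y):
        # Same sign of all three edge cross products means the point is inside.
        cross = [(x2 - x1) * (y - y1) - (y2 - y1) * (x - x1)
                 for (x1, y1), (x2, y2) in zip(verts, verts[1:] + verts[:1])]
        return all(c >= 0 for c in cross) or all(c <= 0 for c in cross)

    grid = [[0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            in_disc = any(math.hypot(x - vx, y - vy) <= radius for vx, vy in verts)
            # Ink only the part of each disc that lies outside the triangle,
            # so the wedge inside the triangle stays in the background color.
            if in_disc and not inside_triangle(x, y):
                grid[y][x] = 1
    return grid, verts

def supported_fraction(grid, verts, samples=200, offset=1.5):
    """Fraction of the triangle outline where the pixels on the two sides
    of the edge actually differ, i.e. where a physical contour exists."""
    supported = total = 0
    for (x1, y1), (x2, y2) in zip(verts, verts[1:] + verts[:1]):
        length = math.hypot(x2 - x1, y2 - y1)
        nx, ny = -(y2 - y1) / length, (x2 - x1) / length  # unit normal to the edge
        for i in range(samples):
            t = (i + 0.5) / samples
            px, py = x1 + t * (x2 - x1), y1 + t * (y2 - y1)
            a = grid[round(py + offset * ny)][round(px + offset * nx)]
            b = grid[round(py - offset * ny)][round(px - offset * nx)]
            supported += a != b
            total += 1
    return supported / total

grid, verts = kanizsa_triangle()
print(f"outline backed by real contrast: {supported_fraction(grid, verts):.0%}")
```

With these assumed sizes, well under half of the outline is backed by any difference in pixel values. The rest of the triangle exists only for a viewer who completes it, which is the part a purely pixel-based system has nothing to work with.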
Today, neural network image recognition systems are actively spreading in the commercial sector. However, the question of how accurately machines recognize images is still open. Human lives may depend on the accuracy of recognition. For example, an accident may occur if the autopilot of a car or airplane does not recognize an object with low contrast relative to the background and is not able to dodge an obstacle in time.
The scientists believe that the inaccuracy of machine image recognition can be corrected. For example, the recognition of raster images, which are grids of pixels, could be complemented by simulating the physiological features of eye movement that allow the eye to see two-dimensional and three-dimensional scenes. An alternative is to add a vector description of the image, so that the machine can be programmed to traverse the image along the trajectories specified by the vectors.
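The difference between a raster and a vector description can be sketched in a few lines of Python. This is an illustrative assumption, not the method proposed in the paper: `VectorShape`, `trajectory` and `ink_coverage` are hypothetical names, and the 9x9 raster is a toy example. The vector description stores only reference points and yields a trajectory along the lines connecting them, while the raster holds ink at the three corners alone.

```python
from dataclasses import dataclass
import math

@dataclass
class VectorShape:
    """Hypothetical vector description: reference points joined by straight lines."""
    points: list          # reference points as (x, y) tuples
    closed: bool = True   # whether the last point connects back to the first

    def trajectory(self, step=1.0):
        """Yield points spaced about `step` apart along the outline."""
        pts = self.points + self.points[:1] if self.closed else self.points
        for (x1, y1), (x2, y2) in zip(pts, pts[1:]):
            n = max(1, math.ceil(math.hypot(x2 - x1, y2 - y1) / step))
            for i in range(n):
                t = i / n
                yield (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

def ink_coverage(raster, shape, step=1.0):
    """Fraction of trajectory points that land on an inked raster pixel."""
    hits = [raster[round(y)][round(x)] for x, y in shape.trajectory(step)]
    return sum(hits) / len(hits)

# Toy raster: only the three corners of a triangle are inked,
# loosely like marker lights on an otherwise invisible object.
corners = [(1, 1), (7, 1), (4, 7)]
raster = [[0] * 9 for _ in range(9)]
for x, y in corners:
    raster[y][x] = 1

triangle = VectorShape(corners)
outline = list(triangle.trajectory())
print(len(outline), "trajectory points,", ink_coverage(raster, triangle), "of them on ink")
```

In this toy case the trajectory covers the whole outline even though only the reference points themselves land on ink. A system that follows the vectors gets the complete contour, while one that reads only the pixels gets three isolated dots.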
"Imaginary objects should definitely be used as tests in systems that depend on the recognition of photo and video streams, for example, in autopilots of cars or drones. This will help to avoid the risks associated with the use of machine intelligence systems in industry and transport systems," says Vinnikov.
More information: Vladimir Vinnikov et al, Deficiencies of Computational Image Recognition in Comparison to Human Counterpart, Proceedings of Seventh International Congress on Information and Communication Technology (2022). DOI: 10.1007/978-981-19-1607-6_43