July 16, 2018

A new machine learning strategy that could enhance computer vision

by Ingrid Fadelli, Tech Xplore

Image query example from the study. The model learns features that encode the semantic content of images well: given a query image (left), it retrieves images that are semantically similar (they depict the same type of object), even when they are visually dissimilar (different colours, backgrounds or compositions). Credit: arXiv:1807.02110 [cs.CV]

Researchers from the Universitat Autonoma de Barcelona, Carnegie Mellon University and International Institute of Information Technology, Hyderabad, India, have developed a technique that could allow deep learning algorithms to learn the visual features of images in a self-supervised fashion, without the need for annotations by human researchers.

To achieve strong results in computer vision tasks, deep learning algorithms need to be trained on large-scale annotated datasets that include extensive information about every image. However, collecting and manually annotating these images requires huge amounts of time, resources, and human effort.

"We aim to give computers the capability to read and understand textual information in any type of image in the real world," says Dimosthenis Karatzas, one of the researchers who carried out the study, in an interview with Tech Xplore.

Humans routinely rely on textual information to interpret the situations they encounter, and to describe what is happening around them or in a particular image. Researchers are now trying to give machines similar capabilities, which could greatly reduce the resources spent on annotating large datasets.

In their study, Karatzas and his colleagues designed computational models that join textual information about images with the visual information contained within them, using data from Wikipedia or other online platforms. They then used these models to train deep-learning algorithms on how to select good visual features that semantically describe images.

As in other models based on convolutional neural networks (CNNs), features are learned end-to-end, with different layers automatically learning to focus on different things, ranging from pixel level details in the first layers to more abstract features in the last ones.
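To make "pixel-level details in the first layers" concrete, here is a minimal sketch of what a single convolutional filter computes. It is an illustration only: the 2x2 vertical-edge kernel is hand-written, whereas a CNN learns its kernels during training, and the function names are hypothetical rather than taken from the study.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation of a grayscale image with a small kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Keep positive responses only, as in a typical CNN activation."""
    return [[max(0, v) for v in row] for row in feature_map]


# Toy 4x4 image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-written kernel (an assumption for illustration) that responds to
# dark-to-bright transitions from left to right.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = relu(conv2d(image, vertical_edge))
# The response is non-zero only in the column where the edge sits.
```

Deeper layers apply the same operation to the feature maps produced by earlier layers, which is how responses to simple edges can be combined into more abstract features.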

The model developed by Karatzas and his colleagues, however, does not require specific annotations for each image. Instead, the textual context where the image is found (e.g. a Wikipedia article) acts as the supervisory signal.
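The idea of text acting as the supervisory signal can be sketched roughly as follows. The article does not describe how the text is converted into a training target, so this sketch makes its own assumptions: the keyword sets below are hypothetical stand-ins for topics that a real system would learn from a text corpus, and the "predicted" scores stand in for the output of an image network.

```python
import math
from collections import Counter

# Hypothetical topics (an assumption for illustration, not from the study).
TOPIC_KEYWORDS = {
    "animals": {"dog", "cat", "species", "bird"},
    "buildings": {"church", "tower", "bridge", "castle"},
    "vehicles": {"car", "engine", "train", "wheel"},
}
TOPICS = sorted(TOPIC_KEYWORDS)  # ["animals", "buildings", "vehicles"]


def text_to_target(text, smoothing=1.0):
    """Turn the text surrounding an image into a soft label over topics."""
    words = Counter(text.lower().split())
    counts = [
        sum(words[w] for w in TOPIC_KEYWORDS[topic]) + smoothing
        for topic in TOPICS
    ]
    total = sum(counts)
    return [c / total for c in counts]


def softmax(scores):
    """Convert raw network scores into a probability distribution."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def cross_entropy(target, predicted):
    """Loss that is small when the prediction matches the text-derived target."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted))


article = "The dog is a domesticated species and the cat is another species"
target = text_to_target(article)  # no human annotation involved

# Two hypothetical image-network outputs for the image found in this article.
agrees_with_text = softmax([3.0, 0.0, 0.0])     # mostly "animals"
disagrees_with_text = softmax([0.0, 0.0, 3.0])  # mostly "vehicles"

loss_good = cross_entropy(target, agrees_with_text)
loss_bad = cross_entropy(target, disagrees_with_text)
```

In this toy setup the prediction that agrees with the article text receives the lower loss, so minimising the loss would push the image network towards features that reflect what the surrounding text is about.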

In other words, the new technique provides an alternative to fully unsupervised algorithms: it uses non-visual elements that are correlated with the images as the source of supervision for self-supervised training.

"This turns out to be a very efficient way to learn how to represent images in a computer, without requiring any explicit annotations – labels about the content of the images – which take a lot of time and manual effort to generate," explains Karatzas. "These new image representations, learnt in a self-supervised way, are discriminatory enough to be used in a range of typical computer vision tasks, such as image classification and object detection."

The methodology developed by the researchers uses text as the supervisory signal for learning useful image features. This could open up new possibilities for deep learning, allowing algorithms to learn good-quality image features without the need for annotations, simply by analysing textual and visual sources that are readily available online.
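Once image features have been learned, one common way to use them, for instance for the kind of image query shown in the figure at the top of the article, is nearest-neighbour search in feature space. The sketch below assumes cosine similarity as the ranking measure. The three-dimensional vectors and file names are made up for illustration and are not outputs of the researchers' model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query, gallery, k=2):
    """Return the names of the k gallery images most similar to the query."""
    ranked = sorted(
        gallery,
        key=lambda name: cosine_similarity(query, gallery[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical feature vectors; a real system would compute them with a CNN.
gallery = {
    "red_car.jpg": [0.9, 0.1, 0.0],
    "blue_car.jpg": [0.8, 0.2, 0.1],
    "church.jpg": [0.1, 0.9, 0.2],
    "cat.jpg": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # features of a query image showing a car

top_matches = retrieve(query, gallery, k=2)
```

With these toy vectors the two car images rank highest even though their file names suggest different colours, which mirrors the distinction the article draws between semantic and visual similarity.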

By training their algorithms using images from the internet, the researchers highlighted the value of content that is readily available online.

"Our study demonstrated that the Web can be exploited as a pool of noisy data to learn useful representations about image content," says Karatzas. "We are not the first, nor the only ones that hinted towards this direction, but our work has demonstrated a specific way to do so, making use of Wikipedia articles as the data to learn from."

In future studies, Karatzas and his colleagues will try to identify the best ways to use image-embedded textual information to automatically describe and answer questions about image content.

"We will continue our work on the joint-embedding of textual and visual information, looking for novel ways to perform semantic retrieval by tapping on noisy information available in the Web and Social Media," adds Karatzas.

More information: TextTopicNet - Self-Supervised Learning of Visual Features Through Embedding Images on Semantic Text Spaces, arXiv:1807.02110 [cs.CV] arxiv.org/abs/1807.02110

Citation: A new machine learning strategy that could enhance computer vision (2018, July 16) retrieved 17 July 2024 from https://techxplore.com/news/2018-07-machine-strategy-vision.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
