May 21, 2019 feature

CycleMatch: a new approach for matching images and text

by Ingrid Fadelli , Tech Xplore

Researchers at Leiden University and the National University of Defense Technology (NUDT), in China, have recently developed a new approach for image-text matching, called CycleMatch. Their approach, presented in a paper published in Elsevier's Pattern Recognition journal, is based on cycle-consistent learning, a technique that is sometimes used to train artificial neural networks on image-to-image translation tasks. The general idea behind cycle-consistency is that when transforming source data into target data and then vice versa, one should finally obtain the original source samples.

When it comes to developing artificial intelligence (AI) tools that perform well in multi-modal or multimedia-based tasks, finding ways to bridge images and text representations is of crucial importance. Past studies have tried to achieve this by uncovering semantics or features that are relevant to both vision and language.

When training algorithms on correlations between different modalities, however, these studies have often neglected or failed to address intra-modal semantic consistency, which is the consistency of semantics for the individual modalities (i.e. vision and language). To address this shortcoming, the team of researchers at Leiden University and NUDT proposed an approach that applies cycle-consistent embeddings to a deep neural network for matching visual and textual representations.

"Our approach, named as CycleMatch, can maintain both inter-modal correlations and intra-modal consistency by cascading dual mappings and reconstructed mappings in a cyclic fashion," the researchers wrote in their paper. "Moreover, in order to achieve a robust inference, we propose to employ two late-fusion approaches: average fusion and adaptive fusion."

The approach devised by the researchers integrates three feature embeddings (dual, reconstructed and latent embeddings) with a neural network for image-text matching. The method has two cycle branches, one starting from an image feature in the visual space and one from a text feature in the textual space.

For each of these cycles, their approach achieves a dual mapping, translating an input feature in the source space into a dual embedding in the target space. The researchers then apply reconstructed mapping, trying to translate this dual embedding back to the source space.

Their approach also allows the researchers to acquire a 'latent space' during both dual and reconstructed mappings, and subsequently correlate latent embeddings. Contrarily to other techniques for image-text matching, therefore, their method can learn both inter-modal mappings (i.e. image-to-text and text-to-image) and intra-modal mappings (image-to-image and text-to-text).

To evaluate their approach, the researchers carried out a series of experiments using two renowned multi-modal datasets, Flickr30K and MSCOCO. Their method achieved state-of-the-art results, outperforming traditional approaches and leading to significant improvements in cross-modal retrieval.

These findings suggest that cycle-consistent embeddings could enhance the performance of neural networks in multi-modal tasks, such as image-text matching, allowing them to acquire both inter-modal and intra-modal mappings. In their future work, the researchers plan to develop their approach further, by taking into account local relations in matching images and text (e.g. semantic correlations between visual regions and phrases).

More information: Yu Liu et al. CycleMatch: A cycle-consistent embedding network for image-text matching, Pattern Recognition (2019). DOI: 10.1016/j.patcog.2019.05.008

Citation: CycleMatch: a new approach for matching images and text (2019, May 21) retrieved 17 July 2024 from https://techxplore.com/news/2019-05-cyclematch-approach-images-text.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.

Explore further

A new approach for low-resource machine transliteration using RNNs

173 shares

Feedback to editors

Flexible electronics researchers develop a completely stretchy lithium-ion battery

2 hours ago

A strategy to enhance the stability of perovskite solar cells under reverse bias conditions

3 hours ago

Engineers evaluate cybersecurity risks associated with EV fast-charging equipment

18 hours ago

Machine learning framework maps global rooftop growth for sustainable energy and urban planning

20 hours ago

Giving drones wrap-and-grip wings to allow them to land on poles and tree limbs

22 hours ago

Large language models make human-like reasoning mistakes, researchers find

23 hours ago

Unveiling a new class of synthetic fuels

23 hours ago

Microsoft unveils software that allows LLMs to work with spreadsheets

23 hours ago

New technique to assess a general-purpose AI model's reliability before it's deployed

Jul 16, 2024

New system enables intuitive teleoperation of a robotic manipulator in real-time

Jul 16, 2024

Load comments (0)

CycleMatch: a new approach for matching images and text

Flexible electronics researchers develop a completely stretchy lithium-ion battery

A strategy to enhance the stability of perovskite solar cells under reverse bias conditions

Engineers evaluate cybersecurity risks associated with EV fast-charging equipment

Machine learning framework maps global rooftop growth for sustainable energy and urban planning

Giving drones wrap-and-grip wings to allow them to land on poles and tree limbs

Large language models make human-like reasoning mistakes, researchers find

Unveiling a new class of synthetic fuels

Microsoft unveils software that allows LLMs to work with spreadsheets

New technique to assess a general-purpose AI model's reliability before it's deployed

New system enables intuitive teleoperation of a robotic manipulator in real-time

A new approach for low-resource machine transliteration using RNNs

A new vehicle search system for video surveillance networks

Generating cross-modal sensory data for robotic visual-tactile perception

A deep learning-based method to detect cyberbullying on Twitter

A system to generate new song lyrics that match the style of specific artists

A hierarchical RNN-based model to predict scene graphs for images

Machine learning framework maps global rooftop growth for sustainable energy and urban planning

Microsoft unveils software that allows LLMs to work with spreadsheets

Large language models make human-like reasoning mistakes, researchers find

New system enables intuitive teleoperation of a robotic manipulator in real-time

New technique to assess a general-purpose AI model's reliability before it's deployed

A new neural network makes decisions like a human would

Phys.org

Medical Xpress

Science X

CycleMatch: a new approach for matching images and text

Flexible electronics researchers develop a completely stretchy lithium-ion battery

A strategy to enhance the stability of perovskite solar cells under reverse bias conditions

Engineers evaluate cybersecurity risks associated with EV fast-charging equipment

Machine learning framework maps global rooftop growth for sustainable energy and urban planning

Giving drones wrap-and-grip wings to allow them to land on poles and tree limbs

Large language models make human-like reasoning mistakes, researchers find

Unveiling a new class of synthetic fuels

Microsoft unveils software that allows LLMs to work with spreadsheets

New technique to assess a general-purpose AI model's reliability before it's deployed

New system enables intuitive teleoperation of a robotic manipulator in real-time

Related Stories

A new approach for low-resource machine transliteration using RNNs

A new vehicle search system for video surveillance networks

Generating cross-modal sensory data for robotic visual-tactile perception

A deep learning-based method to detect cyberbullying on Twitter

A system to generate new song lyrics that match the style of specific artists

A hierarchical RNN-based model to predict scene graphs for images

Recommended for you

Machine learning framework maps global rooftop growth for sustainable energy and urban planning

Microsoft unveils software that allows LLMs to work with spreadsheets

Large language models make human-like reasoning mistakes, researchers find

New system enables intuitive teleoperation of a robotic manipulator in real-time

New technique to assess a general-purpose AI model's reliability before it's deployed

A new neural network makes decisions like a human would

Your Privacy