
A new foundation model dubbed RingMo has been developed to improve the accuracy of remote sensing image interpretation, according to the Aerospace Information Research Institute (AIR) of the Chinese Academy of Sciences (CAS).

The study titled "RingMo: A Remote Sensing Foundation Model with Masked Image Modeling" was published in IEEE Transactions on Geoscience and Remote Sensing.

Remote sensing images are used in tasks such as scene classification and change detection, and their growing availability has driven rapid progress in image interpretation. The most widely used training approach is to take models pre-trained on ImageNet and fine-tune them on remote sensing data for a specific task, as in the sketch below.
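For illustration, such a transfer-learning step might look like the following minimal sketch; the backbone, class count, and hyperparameters here are placeholder assumptions, not details from the study:

```python
import torch
import torch.nn as nn
from torchvision import models

# Hypothetical setup: adapt an ImageNet-pretrained ResNet-50 to a
# remote sensing scene-classification task with a placeholder class count.
num_classes = 45

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, num_classes)  # replace the ImageNet head

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One fine-tuning step on a dummy batch of 224x224 RGB image patches.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_classes, (8,))

logits = model(images)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
```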

However, this approach has problems, such as the domain gap between natural and remote sensing scenes and the poor generalization capacity of the resulting models. It is therefore necessary to develop a foundation model with general remote sensing feature representations. Because large amounts of unlabeled remote sensing data are available, self-supervised learning is better suited to the field than fully supervised learning.

The study proposes a remote sensing foundation model framework that leverages the benefits of generative self-supervised learning for remote sensing images. RingMo is built on a large-scale dataset of two million remote sensing images collected from satellite and aerial platforms, covering a wide range of scenes and objects around the world. In addition, a training method is designed for the dense and small objects that characterize complicated remote sensing scenes.
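Masked image modeling, the generic self-supervised technique named in the paper's title, trains a network to reconstruct randomly hidden image patches from the visible ones. The following minimal sketch illustrates that general idea only; the tiny encoder, decoder, and all hyperparameters are illustrative assumptions, not RingMo's actual architecture:

```python
import torch
import torch.nn as nn

# Minimal masked-image-modeling sketch: mask random patches, encode the image,
# and reconstruct the pixels of the masked patches only.
patch_size, mask_ratio, dim = 16, 0.75, 128

class TinyMIM(nn.Module):
    def __init__(self, img_size=224):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        self.embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2,
        )
        # Lightweight decoder predicts the raw pixels of each patch.
        self.decoder = nn.Linear(dim, patch_size * patch_size * 3)

    def forward(self, imgs):
        tokens = self.embed(imgs).flatten(2).transpose(1, 2)       # (B, N, dim)
        B, N, _ = tokens.shape
        mask = torch.rand(B, N, device=imgs.device) < mask_ratio   # True = masked
        tokens = torch.where(mask.unsqueeze(-1),
                             self.mask_token.expand(B, N, dim), tokens)
        pred = self.decoder(self.encoder(tokens))                  # (B, N, P*P*3)
        # Target: the original pixels, regrouped into per-patch vectors.
        target = (
            imgs.unfold(2, patch_size, patch_size)
                .unfold(3, patch_size, patch_size)
                .permute(0, 2, 3, 1, 4, 5)
                .reshape(B, N, -1)
        )
        # Reconstruction loss is computed only on the masked patches.
        return (pred - target).abs()[mask].mean()

model = TinyMIM()
loss = model(torch.randn(2, 3, 224, 224))
loss.backward()
```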

RingMo is the first generative foundation model for cross-modal remote sensing data. In the future, the model can be applied to 3D reconstruction, residential construction, transportation, water conservancy, and other fields.

More information: Xian Sun et al, RingMo: A Remote Sensing Foundation Model with Masked Image Modeling, IEEE Transactions on Geoscience and Remote Sensing (2022). DOI: 10.1109/TGRS.2022.3194732