A machine learning system that is capable of virtually removing buildings from a live view

Scientists at Osaka University have created a machine learning system that is capable of virtually removing buildings from a live view. By running generative adversarial network (GAN) algorithms on a remote server, the team was able to stream the edited view to a mobile device in real time. This work may help accelerate urban renewal processes that depend on community agreement.
Some necessary urban renewal tasks, such as demolishing old buildings, are delayed because of the difficulty of convincing stakeholders to commit resources to a project. For instance, building owners and nearby residents may understand the plan differently, which can lead to conflict and delays. This can create a paradox in which a project would be feasible to begin only after it has already been accomplished. Without access to a time machine, this may seem to lead to intractable situations in civil planning.
Now, a team of researchers at Osaka University has helped address this concern with a new machine learning algorithm that provides real-time augmented reality video showing the view after a building has been removed. "Our method enables users to intuitively understand what the future landscape will look like, which can contribute to reducing the time and cost for forming a consensus," first author Takuya Kikuchi says.
Because the mobile device communicates with a server, all the processing can be done remotely, so any smartphone or tablet can be used at the location of the building. To make the algorithm fast enough to provide real-time augmented video, the team applied semantic segmentation to the input image. This allows the deep learning model to classify the image pixel by pixel, as opposed to conventional methods that attempt 3D object detection.
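The article does not describe the segmentation model itself. As a rough illustration of what classifying an image "pixel by pixel" means, the Python sketch below assumes a model has already produced one score per class for every pixel. It picks the highest-scoring class at each pixel and then marks the pixels of the building class as the region a later inpainting step would need to fill. The class IDs, the `segment` and `removal_mask` helpers, and the toy scores are all hypothetical, not taken from the paper.

```python
from typing import List

# Hypothetical class IDs for illustration only; not the paper's label set.
SKY, BUILDING, VEGETATION = 0, 1, 2

Scores = List[List[List[float]]]  # scores[row][col][class_id] -> model score


def segment(scores: Scores) -> List[List[int]]:
    """Semantic segmentation output: assign each pixel its highest-scoring class."""
    return [
        [max(range(len(pixel)), key=lambda c: pixel[c]) for pixel in row]
        for row in scores
    ]


def removal_mask(labels: List[List[int]], target: int = BUILDING) -> List[List[bool]]:
    """Mark the pixels belonging to the class that should be removed from the view."""
    return [[label == target for label in row] for row in labels]


if __name__ == "__main__":
    # A toy 2x2 "image" with scores for (sky, building, vegetation) at each pixel.
    scores = [
        [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1]],
        [[0.1, 0.8, 0.1], [0.1, 0.2, 0.7]],
    ]
    labels = segment(scores)
    print(labels)                # [[0, 1], [1, 2]]
    print(removal_mask(labels))  # [[False, True], [True, False]]
```

Working per pixel like this yields a mask directly, with no need to reconstruct the building as a 3D object first.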

GAN algorithms use two competing neural networks, a generator and a discriminator. The generator is trained to create increasingly realistic images, while the discriminator is tasked with determining whether an image is real or artificially generated. "By learning in this way, the GAN algorithm can produce images that do not actually exist but are plausible," corresponding author Tomohiro Fukuda says. In this application, high-accuracy processing was possible as long as the building to be removed took up no more than 15% of the screen. In field tests, the team achieved virtual demolition video streamed at an average rate of 5.71 frames per second, which may greatly assist on-site community enhancement.
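To make the "two competing networks" idea concrete, the following minimal Python sketch writes out the standard GAN objectives as binary cross-entropy losses on the discriminator's output probabilities (the formulation introduced by Goodfellow et al. in 2014, with the commonly used non-saturating generator loss). These are generic textbook losses, not the loss functions used by Kikuchi and colleagues. The hypothetical `within_coverage_limit` helper mirrors the reported 15% limit by checking what fraction of a boolean pixel mask (True where the building is) is covered.

```python
import math
from typing import List

# Generic GAN losses, not the paper's exact objective.
# d_real: discriminator's probability that a real image is real, in (0, 1).
# d_fake: discriminator's probability that a generated image is real, in (0, 1).


def discriminator_loss(d_real: float, d_fake: float) -> float:
    """Binary cross-entropy for D: small when D outputs ~1 for real and ~0 for fake."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake: float) -> float:
    """Non-saturating loss for G: small when D is fooled into outputting ~1 for fake."""
    return -math.log(d_fake)


def within_coverage_limit(mask: List[List[bool]], max_fraction: float = 0.15) -> bool:
    """Hypothetical helper: True if the masked target covers at most max_fraction
    of the frame (0.15 mirrors the limit reported in the article)."""
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("mask must contain at least one pixel")
    covered = sum(sum(row) for row in mask)  # True counts as 1
    return covered / total <= max_fraction


if __name__ == "__main__":
    # A confident, correct discriminator has a low loss...
    print(round(discriminator_loss(0.9, 0.1), 3))  # 0.211
    # ...while one that cannot tell real from fake has a higher loss.
    print(round(discriminator_loss(0.5, 0.5), 3))  # 1.386
    # The generator's loss falls as it fools the discriminator more often.
    print(round(generator_loss(0.1), 3), round(generator_loss(0.9), 3))  # 2.303 0.105
```

Training alternates between lowering the discriminator's loss and lowering the generator's loss, so each network's improvement raises the bar for the other. At the reported average of 5.71 frames per second, the interval between displayed frames works out to roughly 1000 / 5.71 ≈ 175 milliseconds.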

More information: Takuya Kikuchi et al., Diminished reality using semantic segmentation and generative adversarial network for landscape assessment: Evaluation of image inpainting according to colour vision, Journal of Computational Design and Engineering (2022). DOI: 10.1093/jcde/qwac067