Co-learning to improve autonomous driving
Self-driving cars are both fascinating and fear-inducing, as they must accurately assess and navigate a rapidly changing environment. Computer vision, which uses computation to extract information from imagery, is an important aspect of autonomous driving, with tasks ranging from low-level ones, such as determining how far a given location is from the vehicle, to higher-level ones, such as determining whether there is a pedestrian in the road.
Nathan Jacobs, professor of computer science & engineering in the McKelvey School of Engineering at Washington University in St. Louis, and a team of graduate students have developed a joint learning framework to optimize two low-level tasks: stereo matching and optical flow. Stereo matching generates a map of the disparities between two images and is a critical step in estimating depth for obstacle avoidance. Optical flow estimates per-pixel motion between video frames and is useful for working out how objects are moving as well as how the camera is moving relative to them.
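To illustrate why a disparity map matters for depth estimation, here is a minimal sketch of the standard relation for a rectified stereo pair, depth = focal length × baseline / disparity. It is not taken from the team's code; the function name and the example camera values (a 1,000-pixel focal length and a 0.5-meter baseline) are assumptions chosen only for illustration.

```python
import math


def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in meters for a rectified stereo pair: Z = f * B / d.

    All parameter names and units here are illustrative assumptions.
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point infinitely far away.
        return math.inf
    return focal_length_px * baseline_m / disparity_px


# Hypothetical camera: 1,000 px focal length, cameras 0.5 m apart.
print(depth_from_disparity(50.0, 1000.0, 0.5))  # prints 10.0
```

Because depth is inversely proportional to disparity, a small disparity error on a distant object turns into a large depth error, which is one reason accurate stereo matching matters for obstacle avoidance.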
The team's work is published on the arXiv preprint server.
Ultimately, stereo matching and optical flow both aim to understand the pixel-wise displacement between images and use that information to capture a scene's depth and motion. The team's co-training approach addresses both tasks simultaneously, leveraging their inherent similarities. The framework, which Jacobs presented on Nov. 23 at the British Machine Vision Conference in Aberdeen, UK, outperforms comparable methods that tackle stereo matching and optical flow estimation in isolation.
One of the big challenges in training models for these tasks is acquiring high-quality training data, which can be both difficult and costly, Jacobs said. The team's method capitalizes on effective techniques for image-to-image translation between the computer-generated synthetic and real image domains. This approach allows their model to excel in real-world scenarios while relying solely on ground-truth information from synthetic images.
"Our approach overcomes one of the important challenges in optical flow and stereo, obtaining accurate ground truth," Jacobs said. "Since we can obtain a lot of simulated training data, we get more accurate models than training only on the available labeled real-image datasets. More accurate stereo and optical flow estimates reduce errors that would otherwise propagate through the rest of the autonomous driving pipeline system, such as obstacle avoidance."
More information: Zhexiao Xiong et al., StereoFlowGAN: Co-training for Stereo and Flow with Unsupervised Domain Adaptation, arXiv (2023). DOI: 10.48550/arxiv.2309.01842