LOKI: An intention dataset to train models for pedestrian and vehicle trajectory prediction
Human decision-making processes are inherently hierarchical. This means that they involve several levels of reasoning and different planning strategies that operate simultaneously to achieve both short-term and long-term goals.
Over the past decade or so, an increasing number of computer scientists have been trying to develop computational tools and techniques that could replicate human decision-making processes, allowing robots, autonomous vehicles or other devices to make decisions faster and more efficiently. This is particularly important for robotic systems performing actions that directly impact the safety of humans, such as self-driving cars.
Researchers at Honda Research Institute U.S., Honda R&D, and UC Berkeley have recently compiled LOKI, a dataset that could be used to train models that predict the trajectories of pedestrians and vehicles on the road. This dataset, described in a paper pre-published on arXiv and set to be presented at the ICCV 2021 conference, contains carefully labeled images of different agents (e.g., pedestrians, bicycles and cars) on the street, captured from the perspective of a driver.
"In our recent paper, we propose to explicitly reason about agents' long-term goals as well as their short-term intents for predicting future trajectories of traffic agents in driving scenes," Chiho Choi, one of the researchers who carried out the study, told TechXplore. "We define long-term goals to be a final position an agent wants to reach for a given prediction horizon, while intent refers to how an agent accomplishes their goal."
Choi and his colleagues hypothesized that to predict the trajectories of traffic agents effectively, machine learning techniques need to consider a complex hierarchy of short-term and long-term goals. Based on the predicted agent motions, a model can then plan the movements of a robot or vehicle more efficiently.
The researchers thus set out to develop an architecture that considers both short- and long-term goals as key components of frame-wise intention estimation. The results of these considerations then influence its trajectory prediction module.
"Consider a vehicle at an intersection where the vehicle wants to reach its ultimate goal of turning left to its final goal point," Choi explained. "When reasoning about the agent's motion intent to turn left, it is important to consider not only agent dynamics but also how intent is subject to change based on many factors including i) the agent's own will, ii) social interactions, iii) environmental constraints, iv) contextual cues."
The LOKI dataset contains hundreds of RGB images portraying different agents in traffic. Each of these images has corresponding LiDAR point clouds with detailed, frame-wise labels for all traffic agents.
The dataset has three unique classes of labels. The first class consists of intention labels, which specify 'how' an actor decides to reach a given goal via a series of actions. The second consists of environmental labels, which provide information about the environment that impacts the intentions of agents (e.g., 'road exit' or 'road entrance' positions, 'traffic light', 'traffic sign', 'lane information', etc.). The third class includes contextual labels that could also affect the future behavior of agents, such as weather-related information, road conditions, gender and age of pedestrians, and so on.
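To make the three label classes more concrete, here is a minimal Python sketch of how frame-wise annotations of this kind could be represented. It is not the actual LOKI file format or schema: the class names, field names, the 2D position and the example intention values ("moving", "turn_left") are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class AgentFrameLabel:
    """Hypothetical per-agent, per-frame record (not LOKI's real schema)."""
    agent_id: str
    agent_type: str                # e.g. "pedestrian", "car", "bicycle"
    position: Tuple[float, float]  # assumed 2D ground-plane position
    intention: str                 # intention label, e.g. "turn_left" (illustrative)
    context: Dict[str, str] = field(default_factory=dict)  # e.g. pedestrian age


@dataclass
class FrameLabel:
    """Hypothetical record holding all labels for one frame."""
    frame_index: int
    agents: List[AgentFrameLabel]
    environment: List[str]         # environmental labels, e.g. "traffic light"
    context: Dict[str, str] = field(default_factory=dict)  # e.g. weather, road condition


def intention_changes(frames: List[FrameLabel], agent_id: str) -> List[Tuple[int, str]]:
    """Return (frame_index, intention) for each frame where the agent's intention changes."""
    changes: List[Tuple[int, str]] = []
    previous = None
    for frame in sorted(frames, key=lambda f: f.frame_index):
        for agent in frame.agents:
            if agent.agent_id == agent_id and agent.intention != previous:
                changes.append((frame.frame_index, agent.intention))
                previous = agent.intention
    return changes
```

With records like these, calling `intention_changes(frames, "a1")` would list the frames at which a given agent's labeled intention switches, which is one simple way to inspect how intent evolves over a sequence.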
"We provide a comprehensive understanding of how intent changes over a long time horizon," Choi said. "In doing so, the LOKI dataset is the first that can be used as a benchmark for intention understanding for heterogeneous traffic agents (i.e., cars, trucks, bicycles, pedestrians, etc.)."
In addition to compiling the LOKI dataset, Choi and his colleagues developed a model that explores how the factors considered by LOKI can affect the future behavior of agents. This model can predict the intentions and trajectories of different agents on the road with high levels of accuracy, specifically considering the impact of i) an agent's own will, ii) social interactions, iii) environmental constraints, and iv) contextual information on its short-term actions and decision-making process.
The researchers evaluated their model in a series of tests and found that it outperformed other state-of-the-art trajectory-prediction methods by up to 27%. In the future, the model could be used to enhance the safety and performance of autonomous vehicles. In addition, other research teams could use the LOKI dataset to train their own models for predicting the trajectories of pedestrians and vehicles on the road.
"We already started exploring other research directions aimed at jointly reasoning about intentions and trajectories while considering different internal/external factors such as agents' will, social interactions and environmental factors," Choi said. "Our immediate plan is to further explore the intention-based prediction space not only for trajectories but also for general human motions and behaviors. We are currently working on expanding the LOKI dataset in this direction and believe our highly flexible dataset will encourage the prediction community to further advance these domains."
More information: Harshayu Girase et al, LOKI: Long term and key intentions for trajectory prediction, arXiv:2108.08236 [cs.CV] arxiv.org/abs/2108.08236
© 2021 Science X Network