Berkeley Lab researcher Sherry Li (Credit: Roy Kaltschmidt/Berkeley Lab)

Urban traffic roughly follows a periodic pattern associated with the typical 9-to-5 work schedule. When an accident happens, however, that pattern is disrupted. Designing traffic flow models that remain accurate during accidents is a major challenge for traffic engineers, who must adapt to unforeseen traffic scenarios in real time.

A team of Lawrence Berkeley National Lab computer scientists is working with the California Department of Transportation (Caltrans) to use high-performance computing (HPC) and machine learning to help improve Caltrans' decision-making when incidents occur. The research was done in conjunction with California Partners for Advanced Transportation Technology (PATH), part of UC Berkeley's Institute of Transportation Studies (ITS), and Connected Corridors, a collaborative program to research, develop, and test an Integrated Corridor Management approach to managing transportation corridors in California.

Caltrans and Connected Corridors are implementing the system on a trial basis in Los Angeles County through the I-210 pilot. The goal is to use real-time data from partners in Southern California at the city, county, and state levels to improve Caltrans' real-time decision-making by executing coordinated multijurisdictional incident response plans that limit the negative impacts of these events. The first iteration of this system will be deployed in the cities of Arcadia, Duarte, Monrovia, and Pasadena in 2020, with plans for future deployments around the state.

"Many traffic-flow prediction methods exist, and each can be advantageous in the right situation," said Sherry Li, a mathematician in Berkeley Lab's Computational Research Division (CRD). "To alleviate the pain of relying on human operators who sometimes blindly trust one particular model, our goal was to integrate multiple models that produce more stable and accurate traffic predictions. We did this by designing an ensemble-learning algorithm that combines different sub-models."

Ensemble learning is the art of combining a diverse set of learners (individual models) to improve, on the fly, the stability and predictive power of the model. This idea has been explored by machine learning researchers for a long time. What is special about traffic flow is the temporal characteristic; traffic flow measurements are correlated over time, as are the prediction results from different individual models.

In the Berkeley Lab-Caltrans collaboration, the ensemble model takes into account the mutual dependency of sub-models and assigns "shares of vote" that balance their individual performance against their codependency. The ensemble model also values recent prediction performance more than older historical performance. In the end, the combined model was better than any of the single models used in testing, in both prediction accuracy and stability.
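The weighting idea described above can be illustrated with a minimal sketch. This is not the team's algorithm. It assumes a classic minimum-variance forecast combination, in which the "shares of vote" are computed from a matrix of sub-model error co-movements, and it assumes that recency is handled by exponentially discounting older errors. The function names and the `decay` and `ridge` parameters are hypothetical.

```python
from typing import List, Sequence


def discounted_error_cov(errors: Sequence[Sequence[float]], decay: float) -> List[List[float]]:
    """Exponentially discounted second-moment matrix of sub-model errors.

    errors[t][i] is the error of sub-model i at time step t (oldest first).
    Recent steps get weight close to 1; older steps are discounted by `decay`.
    """
    n = len(errors[0])
    steps = len(errors)
    cov = [[0.0] * n for _ in range(n)]
    total = 0.0
    for t, row in enumerate(errors):
        w = decay ** (steps - 1 - t)
        total += w
        for i in range(n):
            for j in range(n):
                cov[i][j] += w * row[i] * row[j]
    return [[c / total for c in r] for r in cov]


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ensemble_weights(errors: Sequence[Sequence[float]],
                     decay: float = 0.9,
                     ridge: float = 1e-6) -> List[float]:
    """Minimum-variance combination weights (assumed scheme, sums to 1).

    Strongly correlated sub-models share their vote instead of being
    counted twice; a small ridge term keeps the matrix invertible.
    Weights may be negative when sub-model errors are highly correlated.
    """
    cov = discounted_error_cov(errors, decay)
    n = len(cov)
    for i in range(n):
        cov[i][i] += ridge
    raw = solve(cov, [1.0] * n)
    total = sum(raw)
    return [r / total for r in raw]


def combine(predictions: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the sub-model predictions."""
    return sum(p * w for p, w in zip(predictions, weights))
```

In this sketch, a sub-model whose recent errors are small and uncorrelated with the others receives a larger share, while lowering `decay` makes the weights react faster to recent performance.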

The project started with funding from Berkeley Lab's Laboratory Directed Research and Development (LDRD) program. The goal was to build a computational framework that would enable HPC applications specific to transportation, such as optimization and control of traffic equilibrium. Brian Peterson, a systems development manager at PATH, leads Connected Corridors' systems development team. Hongyuan Zhan, a former Berkeley Lab Computing Sciences summer student from Penn State, was a major contributor to the Connected Corridors work for this research.

Traffic flow prediction by the TDEC algorithm, a model combination scheme that can track the actual traffic more closely than a pool of individual candidate models. The green line is the prediction range, the blue line is the true flow, and the red line is the TDEC algorithm prediction. Credit: Hongyuan Zhan

Real-time data, real-time decision-making

Using data collected from Caltrans sensors in California, the project yielded novel algorithms that achieved accurate predictions on a 15-minute rolling basis. The team then validated and integrated the new algorithms using real-time traffic data from the Connected Corridors system: a streaming-based, real-time transportation data hub in which Spark MLlib, a scalable machine-learning library, provides machine-learning models that can be used within the proposed ensemble-learning framework. The specific implementation of this work generated predicted traffic flows at points on the freeway where sensing was present. These predictions could in turn be used to predict traffic demands at freeway entrances and traffic flows at freeway exits.

Ensemble learning partly addresses the issue of different types of vehicles in traffic; however, it does not address sudden changes caused by construction or incidents. The research team applied online (real-time) learning techniques to enable the algorithm to learn not just from the past, but to adapt to new traffic conditions along the way in real time.
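One way to picture online learning in this setting is a combiner that revises its sub-model weights every time a new flow measurement arrives. The sketch below is an assumption for illustration, not the method used in the project: it applies the standard exponentially weighted forecaster with a "fixed-share" mixing step, which keeps every sub-model's weight above a small floor so the combiner can recover quickly after a sudden change. The class name and the `eta`, `share`, and `scale` parameters are hypothetical.

```python
import math
from typing import List, Sequence


class OnlineEnsemble:
    """Online combiner: exponential weights plus fixed-share mixing (assumed scheme)."""

    def __init__(self, n_models: int, eta: float = 0.5,
                 share: float = 0.05, scale: float = 100.0) -> None:
        self.eta = eta        # learning rate: how hard a bad prediction is penalized
        self.share = share    # fraction of weight redistributed evenly each step
        self.scale = scale    # typical error size, used to normalize losses
        self.weights: List[float] = [1.0 / n_models] * n_models

    def predict(self, predictions: Sequence[float]) -> float:
        """Weighted average of the current sub-model predictions."""
        return sum(w * p for w, p in zip(self.weights, predictions))

    def update(self, predictions: Sequence[float], actual: float) -> None:
        """Revise the weights once the true flow for this step is observed."""
        losses = [((p - actual) / self.scale) ** 2 for p in predictions]
        best = min(losses)  # subtracting the best loss avoids numerical underflow
        scaled = [w * math.exp(-self.eta * (loss - best))
                  for w, loss in zip(self.weights, losses)]
        total = sum(scaled)
        n = len(scaled)
        # Fixed-share step: no weight ever drops below share / n, so a
        # sub-model that becomes accurate after an incident can regain weight.
        self.weights = [(1.0 - self.share) * s / total + self.share / n
                        for s in scaled]
```

A typical loop would call `predict` at the start of each interval and `update` once the sensor reading for that interval is available. Raising `share` makes the combiner quicker to switch sub-models after a disruption, at the cost of noisier weights in steady traffic.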

The algorithm could be used in combination with these technologies for more accurate and timely traffic prediction and to aid real-time traffic control, such as rerouting traffic, altering traffic light configurations, and other corrective measures.

"The first deployment of the Connected Corridors program is intended to validate the concept and quantify improvements in travel times, traffic flow, and delays under real-world conditions," Peterson said. "Traffic modeling has indicated that significant improvements are possible with the traffic management strategies being developed. Future deployments are in the planning stage with opportunities for ongoing system improvements and new approaches."

In addition to Li, Peterson, and Zhan, other contributors to this project include Berkeley Lab researcher John Wu and ITS' Gabriel Gomes.