Traffic light control for a single intersection is a well-known problem that affects traffic flow in cities and beyond. There are numerous techniques for managing this problem, ranging from naive policies that rotate through the intersection phases with fixed timings to adaptive policies that sense traffic load on different routes and prioritize accordingly.
In this project, we focus on a single junction and adopt an adaptive approach that makes decisions based on the intersection’s current state and a prediction of what might occur in the future.
To achieve this, we employ the Monte Carlo Tree Search (MCTS) algorithm, which explores the decision tree using a policy informed by simulation outcomes. We implement this algorithm with two different tree policies.
One policy we use is the Upper Confidence Bound for Trees (UCT), and the other is Maximum-Entropy Tree Search (MENTS).
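To make the difference between the two tree policies concrete, the following is a minimal sketch, not the project's implementation. It assumes the standard UCB1-style score commonly used for UCT and a plain temperature softmax over soft value estimates as a simplified stand-in for the MENTS policy (the full method also mixes in a uniform exploration term, omitted here). The function names, the exploration constant `c`, and `temperature` are hypothetical.

```python
import math


def uct_scores(q_values, visit_counts, c=math.sqrt(2)):
    """UCB1-style score per action: mean value plus an exploration bonus.

    q_values[i] is the average simulated return of action i and
    visit_counts[i] is how often it was tried (assumed bookkeeping).
    """
    total = sum(visit_counts)
    scores = []
    for q, n in zip(q_values, visit_counts):
        if n == 0:
            # Unvisited actions are tried first.
            scores.append(math.inf)
        else:
            scores.append(q + c * math.sqrt(math.log(total) / n))
    return scores


def softmax_policy(soft_q_values, temperature=1.0):
    """Simplified maximum-entropy style policy: sample actions with
    probability proportional to exp(Q / temperature)."""
    m = max(soft_q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in soft_q_values]
    z = sum(exps)
    return [e / z for e in exps]


# UCT picks the action with the highest score (deterministic choice).
scores = uct_scores([0.5, 0.5], [10, 1])
best_action = scores.index(max(scores))

# The softmax policy yields a distribution to sample from instead.
probs = softmax_policy([2.0, 0.0], temperature=0.5)
```

In this sketch the softmax policy depends directly on the scale of the value estimates relative to the temperature, which is one reason reward normalization and temperature choice can matter for a MENTS-style policy.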
We compared the results of these algorithms with those of a naive fixed-time control policy and a random control policy. The UCT-based algorithm achieved results approximately 5% better than the best-performing naive policy for any given intersection.
The MENTS-based algorithm achieved results similar to those of an optimal naive policy but was highly unstable under changes to the intersection model and hyperparameters.
Our main conclusion is that, for our simulation model, the MCTS algorithm with UCT is the best-performing policy among those tested. It is significantly more stable, does not require parameter tuning for different intersections, and consistently achieves better results.
This conclusion contradicts the primary paper on which our project was based. While the issue likely lies in our implementation, we were unable to replicate the paper’s results or achieve better performance using MENTS. Based on our understanding, the problem likely stems from parameter determination and normalization of the simulation results.