Solving transportation problems using deep reinforcement learning

In our project, we explored the application of Reinforcement Learning to a smart transportation grid. We learnt fundamental concepts such as Markov Decision Processes (MDP), Reinforcement Learning, and Deep Reinforcement Learning. We used OpenAI Gym and RDDL to generate the environment needed for our research.

We wanted our algorithm to encourage cooperation between the agents, making them unselfish. We hoped that this approach would improve the performance of the whole system.

To do so, we created a decentralized algorithm based on DQN (Deep Q-Network). We examined four extensions: reward sharing (each agent sees its neighbors’ rewards as well as its own), state sharing (each agent sees its neighbors’ states as well as its own), a Stackelberg game (the agents are split into leaders and followers, and the leaders choose their actions before the followers), and encoding data with an LSTM (Long Short-Term Memory, a special kind of recurrent neural network).
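
The sharing ideas and the leader/follower ordering can be illustrated with a small sketch. This is not the project's actual implementation: the neighbor map, the `weight` mixing parameter, the rule of adding a weighted mean of neighbor rewards, and all function names are assumptions chosen for illustration only.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical neighbor map for three agents on a line: 0 - 1 - 2.
NEIGHBORS: Dict[int, List[int]] = {0: [1], 1: [0, 2], 2: [1]}


def shared_reward(agent: int, rewards: Sequence[float],
                  neighbors: Dict[int, List[int]], weight: float = 0.5) -> float:
    """Own reward plus a weighted mean of the neighbors' rewards (assumed rule)."""
    nbrs = neighbors[agent]
    if not nbrs:
        return rewards[agent]
    neighbor_mean = sum(rewards[n] for n in nbrs) / len(nbrs)
    return rewards[agent] + weight * neighbor_mean


def shared_state(agent: int, states: Sequence[Sequence[float]],
                 neighbors: Dict[int, List[int]]) -> List[float]:
    """Concatenate the agent's own observation with its neighbors' observations."""
    obs = list(states[agent])
    for n in neighbors[agent]:
        obs.extend(states[n])
    return obs


def stackelberg_actions(leaders: Sequence[int], followers: Sequence[int],
                        policy: Callable[[int, Sequence[float], Dict[int, int]], int],
                        states: Sequence[Sequence[float]]) -> Dict[int, int]:
    """Leaders act first; followers then act knowing the leaders' actions."""
    # Leaders choose without seeing anyone else's action.
    leader_actions = {a: policy(a, states[a], {}) for a in leaders}
    actions = dict(leader_actions)
    # Followers receive the leaders' choices as extra input.
    for a in followers:
        actions[a] = policy(a, states[a], leader_actions)
    return actions
```

In a DQN setting, `shared_reward` would replace the raw reward stored in each agent's replay buffer, and `shared_state` would replace the raw observation fed to its Q-network. Because agents can have different numbers of neighbors, the concatenated observation would need padding to a fixed length before being passed to a network.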

The project was a great success. Although we couldn’t improve on the results of the standard DQN, we learnt a lot. When we chose this project, we knew that conducting research is a risk, and that we might not get any results. But that’s life, and that’s research. We had a great time, and we got so much from doing it.