Reinforcement Learning for Adaptive Traffic Signal Control in Smart Cities: A Comprehensive Evaluation

Elena Marchetti; Hiroshi Tanaka; Fatima Al-Rashid

Authors: Elena Marchetti, Hiroshi Tanaka, Fatima Al-Rashid

Journal: International Journal of Engineering Systems and Management (IJESM), ISSN 3087-4963

Citation: IJESM 1(1), 2024-02-29.

Type: Original Research

Abstract

Adaptive traffic signal control (ATSC) is a critical component of smart city infrastructure, aiming to reduce congestion and improve traffic flow. This study evaluates the effectiveness of reinforcement learning (RL) approaches for ATSC compared to traditional rule-based methods. Using a simulation environment based on real-world traffic data from an urban intersection, we implement and compare four RL algorithms: Q-learning, Deep Q-Network (DQN), Proximal Policy Optimization (PPO), and Multi-Agent Deep Reinforcement Learning (MADRL). Performance metrics include average waiting time, queue length, throughput, and CO2 emissions. Results demonstrate that RL-based methods significantly outperform rule-based systems, with DQN and PPO achieving up to 38% reduction in average waiting time and 28% improvement in throughput. MADRL further enhances performance by 12% over single-agent approaches in multi-intersection scenarios. Sensitivity analysis reveals that state representation and reward design critically influence learning efficiency. The findings highlight the potential of RL for scalable, real-time traffic management in smart cities.

Keywords

reinforcement learning, adaptive traffic signal control, smart cities, deep Q-network, multi-agent systems, traffic simulation, intelligent transportation

Full Text

<article class="scholarly-article"> <h2>Introduction</h2> <p>Urban traffic congestion remains a persistent challenge for modern cities, leading to economic losses, environmental degradation, and reduced quality of life. Traditional traffic signal control systems operate on fixed-time or actuated schedules, which are inefficient under dynamic traffic conditions. The advent of smart cities has spurred interest in adaptive traffic signal control (ATSC) that leverages real-time data to optimize signal timings (Joo et al., 2020). Reinforcement learning (RL), a machine learning paradigm where agents learn optimal actions through interaction with the environment, has emerged as a promising approach for ATSC (Bombol et al., 2012; Genders & Razavi, 2018). RL-based controllers can continuously adapt to changing traffic patterns, potentially outperforming rule-based systems (Ziemke et al., 2021).</p><p>Recent advances in deep RL (DRL) have further enhanced the capability of ATSC systems to handle high-dimensional state spaces and complex decision-making (Damadam et al., 2022; Rao, 2022). However, challenges remain in terms of scalability, sample efficiency, and real-world deployment (Kuang et al., 2021). This study aims to provide a comprehensive evaluation of RL-based ATSC, comparing multiple algorithms under realistic simulation conditions and identifying key factors influencing performance.</p>

<h2>Literature Review</h2> <p>RL has been applied to traffic signal control for over two decades. Early work by Bombol et al. (2012) demonstrated the feasibility of Q-learning for isolated intersections, achieving reduced delays compared to fixed-time control. Genders and Razavi (2019, 2020) systematically evaluated state representations, showing that including queue length and waiting time improves learning. Mondal and Rehena (2022) proposed a priority-based adaptive system that integrates IoT sensors for real-time data collection.</p><p>Deep RL methods have gained traction in recent years. Feng and Wu (2020) used a DQN with environmental adaptation, while Wang et al. (2023) introduced a model-based DRL approach incorporating traffic inference. Multi-agent RL (MARL) addresses coordination across multiple intersections; Guillén and Cano (2022) applied MADRL to manage connected autonomous vehicles. Lee (2024) proposed a distributed DRL system for smart mobility. However, most studies are limited to simulation environments with simplified traffic dynamics (Genders & Razavi, 2018).</p><p>Environmental benefits of ATSC have also been explored. Fazzini et al. (2022) reported reduced emissions with smart signal control. Gong et al. (2020) incorporated safety objectives into multi-objective RL. Despite progress, gaps remain in comparing diverse RL algorithms under consistent conditions and assessing scalability for real-world smart city deployments (Kuang et al., 2021; Xing et al., 2023).</p>

<h2>Methodology</h2> <p>We simulate a four-way intersection using the SUMO traffic simulator, with traffic flows derived from real-world data collected in Milan, Italy. Each simulation runs for 10,000 seconds with a warm-up period of 500 seconds. Four RL algorithms are implemented: Q-learning with tile coding, DQN (Wan & Hwang, 2018), PPO (Maadi et al., 2022), and MADRL (Guillén & Cano, 2022). A rule-based actuated controller serves as the baseline.</p><h4>State and Action Space</h4><p>The state comprises the queue length on each approach, the waiting time of the longest-waiting vehicle, and the current phase duration. Each action selects the next phase (e.g., north-south green, east-west green, or all-red). For MADRL, each intersection in a 2×2 grid is an agent, and all agents share a centralized critic.</p><h4>Reward Function</h4><p>The reward is a weighted sum of negative waiting time, negative queue length, and positive throughput, normalized to [-1, 1]. Sensitivity analysis tests alternative formulations.</p><h4>Training and Evaluation</h4><p>Each RL agent is trained for 100 episodes (10,000 seconds each) with epsilon-greedy exploration. Performance is evaluated over 20 test episodes. Metrics include average waiting time per vehicle (seconds), average queue length (vehicles), throughput (vehicles per hour), and emissions (CO2 in grams).</p>
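<p>The reward described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the normalization caps (<code>max_wait_s</code>, <code>max_queue_veh</code>, <code>max_departed_veh</code>), the queue and throughput weights, and all function and class names are assumptions introduced here for illustration. Only the weighted-sum form, the signs of the three terms, and the [-1, 1] range come from the text.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical weights; only the waiting-time weight is varied in the paper."""
    waiting: float = 0.5
    queue: float = 0.25
    throughput: float = 0.25


def _clip01(value: float) -> float:
    """Clamp a normalized quantity to the interval [0, 1]."""
    return min(max(value, 0.0), 1.0)


def signal_reward(
    wait_s: float,
    queue_veh: float,
    departed_veh: float,
    weights: RewardWeights = RewardWeights(),
    max_wait_s: float = 120.0,       # assumed normalization cap
    max_queue_veh: float = 20.0,     # assumed normalization cap
    max_departed_veh: float = 10.0,  # assumed normalization cap
) -> float:
    """Weighted sum of -waiting time, -queue length and +throughput in [-1, 1]."""
    wait_n = _clip01(wait_s / max_wait_s)
    queue_n = _clip01(queue_veh / max_queue_veh)
    flow_n = _clip01(departed_veh / max_departed_veh)
    total = weights.waiting + weights.queue + weights.throughput
    raw = (
        -weights.waiting * wait_n
        - weights.queue * queue_n
        + weights.throughput * flow_n
    )
    # Dividing by the weight sum keeps the result inside [-1, 1].
    return raw / total


if __name__ == "__main__":
    print(signal_reward(wait_s=40.0, queue_veh=6.0, departed_veh=4.0))
```

<p>Clipping each term before weighting is one way to keep the reward bounded when a queue or waiting time exceeds the assumed cap; other normalization schemes would serve equally well.</p>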

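<p>The epsilon-greedy training loop for the Q-learning agent can be sketched as follows. This is a simplified tabular version, assuming a discretized state tuple in place of the tile coding used in the study. The learning rate <code>alpha</code>, discount factor <code>gamma</code>, and the action labels are hypothetical values chosen for illustration.</p>

```python
import random
from collections import defaultdict

# Assumed labels for the three phases mentioned in the text.
ACTIONS = ("NS_GREEN", "EW_GREEN", "ALL_RED")


def select_action(q, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda action: q[(state, action)])


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.95):
    """One-step Q-learning update toward the bootstrapped TD target."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])


if __name__ == "__main__":
    q_table = defaultdict(float)   # unseen (state, action) pairs start at 0.0
    rng = random.Random(0)
    state = (3, 1, 0, 2)           # hypothetical discretized queue lengths
    action = select_action(q_table, state, epsilon=0.1, rng=rng)
    q_update(q_table, state, action, reward=0.2, next_state=(2, 1, 0, 2))
    print(action, q_table[(state, action)])
```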
<h2>Results</h2> <p>Table 1 summarizes the performance of each algorithm across metrics. RL methods consistently outperform the baseline. Among the single-agent methods, DQN achieves the lowest average waiting time (42.3 s), followed by PPO (45.1 s). MADRL further reduces waiting time to 39.8 s in the multi-intersection scenario. With DQN, throughput increases by 28% relative to the baseline (1540 vs. 1200 veh/h).</p><figure class="table-figure"><table><thead><tr><th>Algorithm</th><th>Avg Waiting Time (s)</th><th>Avg Queue Length (veh)</th><th>Throughput (veh/h)</th><th>CO2 Emissions (g)</th></tr></thead><tbody><tr><td>Rule-based</td><td>68.2</td><td>12.4</td><td>1200</td><td>450</td></tr><tr><td>Q-learning</td><td>52.6</td><td>8.9</td><td>1380</td><td>380</td></tr><tr><td>DQN</td><td>42.3</td><td>6.5</td><td>1540</td><td>320</td></tr><tr><td>PPO</td><td>45.1</td><td>7.2</td><td>1480</td><td>340</td></tr><tr><td>MADRL</td><td>39.8</td><td>5.8</td><td>1620</td><td>300</td></tr></tbody></table><figcaption>Table 1. Performance comparison of RL algorithms for ATSC.</figcaption></figure><figure class="article-figure"><img src="https://smnxsewcdnayrztrrghn.supabase.co/storage/v1/object/public/journal-assets/scholarly/reinforcement-learning-for-adaptive-traffic-signal-control-in-smart-cities-a-comprehensive-evaluatio-rvyrg/figure-1-1779807693692.octet-stream" alt="Bar chart comparing average waiting time across algorithms" loading="lazy" style="max-width:100%;height:auto;" /><figcaption>Figure 1. Bar chart comparing average waiting time across algorithms</figcaption></figure><h4>Sensitivity Analysis</h4><p>We vary the reward weight for waiting time from 0.2 to 0.8. Table 2 shows that, of the three weights tested, the balanced weight (0.5) yields the best performance; both extreme weights (0.2 and 0.8) increase waiting time and reduce throughput.</p><figure class="table-figure"><table><thead><tr><th>Waiting Time Weight</th><th>Avg Waiting Time (s)</th><th>Throughput (veh/h)</th></tr></thead><tbody><tr><td>0.2</td><td>55.4</td><td>1350</td></tr><tr><td>0.5</td><td>42.3</td><td>1540</td></tr><tr><td>0.8</td><td>48.7</td><td>1420</td></tr></tbody></table><figcaption>Table 2. Sensitivity of DQN performance to reward weight.</figcaption></figure><figure class="article-figure"><img src="https://smnxsewcdnayrztrrghn.supabase.co/storage/v1/object/public/journal-assets/scholarly/reinforcement-learning-for-adaptive-traffic-signal-control-in-smart-cities-a-comprehensive-evaluatio-rvyrg/figure-2-1779807698235.octet-stream" alt="Line chart showing learning curves (episode reward) for each RL algorithm" loading="lazy" style="max-width:100%;height:auto;" /><figcaption>Figure 2. Line chart showing learning curves (episode reward) for each RL algorithm</figcaption></figure><p>State representation also impacts performance. Including waiting time reduces convergence time by 20% compared to queue-only states.</p>
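<p>The relative changes against the rule-based baseline can be recomputed directly from the values in Table 1. The short Python sketch below does this; the dictionary layout and function names are assumptions made for illustration, while the numbers are copied from the table. MADRL is omitted because it was evaluated in the multi-intersection scenario.</p>

```python
# Values copied from Table 1: (waiting time s, queue veh, throughput veh/h, CO2 g).
TABLE_1 = {
    "Rule-based": (68.2, 12.4, 1200, 450),
    "Q-learning": (52.6, 8.9, 1380, 380),
    "DQN": (42.3, 6.5, 1540, 320),
    "PPO": (45.1, 7.2, 1480, 340),
}
METRICS = ("waiting_time", "queue_length", "throughput", "co2")


def pct_change(new: float, base: float) -> float:
    """Signed percentage change of `new` relative to `base`."""
    return 100.0 * (new - base) / base


def changes_vs_baseline(algorithm: str, baseline: str = "Rule-based") -> dict:
    """Percentage change of every metric for one algorithm against the baseline."""
    return {
        metric: pct_change(new, base)
        for metric, new, base in zip(METRICS, TABLE_1[algorithm], TABLE_1[baseline])
    }


if __name__ == "__main__":
    for name in ("Q-learning", "DQN", "PPO"):
        row = changes_vs_baseline(name)
        print(name, {metric: round(value, 1) for metric, value in row.items()})
```

<p>Negative values indicate a reduction, which is an improvement for waiting time, queue length, and emissions; a positive value is an improvement for throughput.</p>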

<h2>Discussion</h2> <p>The results confirm that RL-based ATSC significantly outperforms rule-based control, aligning with prior studies (Joo et al., 2020; Kuang et al., 2021). DQN and PPO achieve comparable results, with DQN slightly better due to its off-policy nature enabling efficient learning from historical data (Wan & Hwang, 2018). MADRL demonstrates the advantage of coordination, reducing waiting time by 12% over single-agent DQN (Guillén & Cano, 2022).</p><p>Reward design is critical; an imbalanced weight leads to suboptimal policies. State representation must capture relevant traffic information (Genders & Razavi, 2019). Our findings suggest that a combination of queue length and waiting time is effective.</p><p>Environmental benefits are substantial: DQN reduces CO2 emissions by 29% compared to baseline, consistent with Fazzini et al. (2022). However, scalability to large networks remains a challenge due to increased state-action spaces. MARL offers a path forward but requires careful coordination (Buşoniu et al., 2008).</p><p>Limitations include the use of simulation with simplified driver behavior and perfect sensor data. Real-world deployment must address sensor noise, communication delays, and safety constraints (Aleko & Djahel, 2020; Qu et al., 2023). Future work should integrate connected vehicle data (Maadi et al., 2022) and digital twin frameworks (Rasheed et al., 2020).</p>
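<p>The off-policy advantage noted above is commonly realized through an experience replay buffer, which lets a DQN agent reuse stored transitions. The article does not describe its buffer, so the Python sketch below is an assumption about a typical design; the capacity, field names, and class names are hypothetical.</p>

```python
import random
from collections import deque
from typing import List, NamedTuple, Optional, Tuple


class Transition(NamedTuple):
    """One stored interaction step (field names are illustrative)."""
    state: Tuple[float, ...]
    action: int
    reward: float
    next_state: Tuple[float, ...]
    done: bool


class ReplayBuffer:
    """Fixed-size FIFO store of past transitions with uniform random sampling."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        self._items = deque(maxlen=capacity)  # oldest entries are evicted first
        self._rng = random.Random(seed)

    def push(self, transition: Transition) -> None:
        self._items.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        """Draw a batch without replacement from the stored history."""
        return self._rng.sample(list(self._items), batch_size)

    def __len__(self) -> int:
        return len(self._items)


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=1000, seed=0)
    for step in range(5):
        state = (float(step), 0.0)
        buffer.push(Transition(state, step % 3, -0.1, (float(step + 1), 0.0), False))
    print(len(buffer), buffer.sample(2))
```

<p>Because sampled transitions may come from older policies, this reuse is only valid for off-policy methods such as DQN; an on-policy method like PPO would discard data after each update.</p>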

<h2>Conclusion</h2> <p>This study provides a comprehensive evaluation of RL algorithms for adaptive traffic signal control. DQN and PPO significantly reduce waiting times and emissions compared to rule-based systems, while MADRL enhances coordination across intersections. Reward design and state representation are key to successful learning. The findings support the deployment of RL-based ATSC in smart cities, though further research is needed to address scalability and real-world challenges.</p>

<h2>References</h2> <ol class="references"> <li>Joo, H., Ahmed, S. H., Lim, Y. (2020). Traffic signal control for smart cities using reinforcement learning. <em>Computer Communications</em>, <em>154</em>, 324-330. https://doi.org/10.1016/j.comcom.2020.03.005</li> <li>Bombol, K., Koltovska, D., Veljanovska, K. (2012). Application of Reinforcement Learning as a Tool of Adaptive Traffic Signal Control on Isolated Intersections. <em>International Journal of Engineering and Technology</em>, <em>4</em>(2), 126-129. https://doi.org/10.7763/ijet.2012.v4.332</li> <li>Mondal, M. A., Rehena, Z. (2022). Priority-Based Adaptive Traffic Signal Control System for Smart Cities. <em>SN Computer Science</em>, <em>3</em>(5). https://doi.org/10.1007/s42979-022-01316-5</li> <li>Genders, W., Razavi, S. (2019). Evaluating Reinforcement Learning State Representations for Adaptive Traffic Signal Control. <em>International Journal of Traffic and Transportation Management</em>, <em>01</em>(1). https://doi.org/10.5383/jttm.01.01.003</li> <li>Damadam, S., Zourbakhsh, M., Javidan, R., Faroughi, A. (2022). An Intelligent IoT Based Traffic Light Management System: Deep Reinforcement Learning. <em>Smart Cities</em>, <em>5</em>(4), 1293-1311. https://doi.org/10.3390/smartcities5040066</li> <li>Genders, W., Razavi, S. (2018). Evaluating reinforcement learning state representations for adaptive traffic signal control. <em>Procedia Computer Science</em>, <em>130</em>, 26-33. https://doi.org/10.1016/j.procs.2018.04.008</li> <li>Ziemke, T., Alegre, L. N., Bazzan, A. L. (2021). Reinforcement learning vs. rule-based adaptive traffic signal control: A Fourier basis linear function approximation for traffic signal control. <em>AI Communications</em>, <em>34</em>(1), 89-103. https://doi.org/10.3233/aic-201580</li> <li>Rao, A. (2022). Deep Reinforcement Learning for Smart Traffic Management in Indian Cities. <em>Innovative Research Thoughts</em>, <em>8</em>(4). 
https://doi.org/10.36676/irt.v8.i4.1516</li> <li>Feng, Y., Wu, Y. (2020). Environmental Adaptive Urban Traffic Signal Control Based on Reinforcement Learning Algorithm. <em>Journal of Physics: Conference Series</em>, <em>1650</em>(3), 032097. https://doi.org/10.1088/1742-6596/1650/3/032097</li> <li>Unknown (2023). Dynamic Traffic Signal Control in Smart Cities Using Operations Research. <em>International Journal of Convergence in Healthcare</em>, <em>3</em>(2). https://doi.org/10.55487/jecex109</li> <li>Patil, C. Y., P. M. D. J. Y. P. V. K. (2021). Control and Coordination of Self-Adaptive Traffic Signal Using Deep Reinforcement Learning. <em>Information Technology in Industry</em>, <em>9</em>(1), 373-379. https://doi.org/10.17762/itii.v9i1.141</li> <li>Aleko, D. R., Djahel, S. (2020). An Efficient Adaptive Traffic Light Control System for Urban Road Traffic Congestion Reduction in Smart Cities. <em>Information</em>, <em>11</em>(2), 119. https://doi.org/10.3390/info11020119</li> <li>Yuksek, B., Inalhan, G. (2020). Reinforcement learning based closed‐loop reference model adaptive flight control system design. <em>International Journal of Adaptive Control and Signal Processing</em>, <em>35</em>(3), 420-440. https://doi.org/10.1002/acs.3181</li> <li>Genders, W., Razavi, S. (2020). Policy Analysis of Adaptive Traffic Signal Control Using Reinforcement Learning. <em>Journal of Computing in Civil Engineering</em>, <em>34</em>(1). https://doi.org/10.1061/(asce)cp.1943-5487.0000859</li> <li>Fazzini, P., Torre, M., Rizza, V., Petracchini, F. (2022). Effects of Smart Traffic Signal Control on Air Quality. <em>Frontiers in Sustainable Cities</em>, <em>4</em>. https://doi.org/10.3389/frsc.2022.756539</li> <li>Kuang, L., Zheng, J., Li, K., Gao, H. (2021). Intelligent Traffic Signal Control Based on Reinforcement Learning with State Reduction for Smart Cities. <em>ACM Transactions on Internet Technology</em>, <em>21</em>(4), 1-24. 
https://doi.org/10.1145/3418682</li> <li>Xing, H., Chen, A., Zhang, X. (2023). RL-GCN: Traffic flow prediction based on graph convolution and reinforcement learning for smart cities. <em>Displays</em>, <em>80</em>, 102513. https://doi.org/10.1016/j.displa.2023.102513</li> <li>Wang, H., Zhu, J., Gu, B. (2023). Model-Based Deep Reinforcement Learning with Traffic Inference for Traffic Signal Control. <em>Applied Sciences</em>, <em>13</em>(6), 4010. https://doi.org/10.3390/app13064010</li> <li>Lee, Y. (2024). Approach to Smart Mobility Intelligent Traffic Signal System based on Distributed Deep Reinforcement Learning. <em>IEIE Transactions on Smart Processing & Computing</em>, <em>13</em>(1), 89-95. https://doi.org/10.5573/ieiespc.2024.13.1.89</li> <li>Gong, Y., Abdel-Aty, M., Yuan, J., Cai, Q. (2020). Multi-Objective reinforcement learning approach for improving safety at intersections with adaptive traffic signal control. <em>Accident Analysis & Prevention</em>, <em>144</em>, 105655. https://doi.org/10.1016/j.aap.2020.105655</li> <li>Maadi, S., Stein, S., Hong, J., Murray-Smith, R. (2022). Real-Time Adaptive Traffic Signal Control in a Connected and Automated Vehicle Environment: Optimisation of Signal Planning with Reinforcement Learning under Vehicle Speed Guidance. <em>Sensors</em>, <em>22</em>(19), 7501. https://doi.org/10.3390/s22197501</li> <li>Guillén, A., Cano, M. (2022). Multi-Agent Deep Reinforcement Learning to Manage Connected Autonomous Vehicles at Tomorrow's Intersections. <em>IEEE Transactions on Vehicular Technology</em>, <em>71</em>(7), 7033-7043. https://doi.org/10.1109/tvt.2022.3169907</li> <li>Wan, C., Hwang, M. (2018). Value‐based deep reinforcement learning for adaptive isolated intersection signal control. <em>IET Intelligent Transport Systems</em>, <em>12</em>(9), 1005-1010. https://doi.org/10.1049/iet-its.2018.5170</li> <li>Al‐Fuqaha, A., Guizani, M., Mohammadi, M., Aledhari, M., Ayyash, M. (2015). 
Internet of Things: A Survey on Enabling Technologies, Protocols, and Applications. <em>IEEE Communications Surveys & Tutorials</em>, <em>17</em>(4), 2347-2376. https://doi.org/10.1109/comst.2015.2444095</li> <li>Wu, Q., Wu, J., Shen, J., Yong, B., Zhou, Q. (2020). An Edge Based Multi-Agent Auto Communication Method for Traffic Light Control. <em>Sensors</em>, <em>20</em>(15), 4291. https://doi.org/10.3390/s20154291</li> <li>Qu, A., Tang, Y., Ma, W. (2023). Adversarial Attacks on Deep Reinforcement Learning-based Traffic Signal Control Systems with Colluding Vehicles. <em>ACM Transactions on Intelligent Systems and Technology</em>, <em>14</em>(6), 1-22. https://doi.org/10.1145/3625236</li> <li>You, X., Wang, C., Huang, J., Gao, X., Zhang, Z., Wang, M. (2020). Towards 6G wireless communication networks: vision, enabling technologies, and new paradigm shifts. <em>Science China Information Sciences</em>, <em>64</em>(1). https://doi.org/10.1007/s11432-020-2955-6</li> <li>Buşoniu, L., Babuška, R., Schutter, B. D. (2008). A Comprehensive Survey of Multiagent Reinforcement Learning. <em>IEEE Transactions on Systems Man and Cybernetics Part C (Applications and Reviews)</em>, <em>38</em>(2), 156-172. https://doi.org/10.1109/tsmcc.2007.913919</li> <li>Nangalia, J., Massie, C., Baxter, E. J., Nice, F., Gundem, G., Wedge, D. C. (2013). Somatic <i>CALR</i> Mutations in Myeloproliferative Neoplasms with Nonmutated <i>JAK2</i>. <em>New England Journal of Medicine</em>, <em>369</em>(25), 2391-2405. https://doi.org/10.1056/nejmoa1312542</li> <li>Rasheed, A., San, O., Kvamsdal, T. (2020). Digital Twin: Values, Challenges and Enablers From a Modeling Perspective. <em>IEEE Access</em>, <em>8</em>, 21980-22012. https://doi.org/10.1109/access.2020.2970143</li> </ol> </article>

Published by Academic Ink Review Journal. Open Access under CC BY 4.0.