PPO-Driven Reinforcement Learning Congestion Control Under High-BDP Wide-Area Deployment: A Scaling Analysis

Ghermezian, Ali; Karpov, Kirill; Kachan, Dmitry; Kirova, Veronika; Siemens, Eduard

doi:10.25673/123807

Proceedings of International Conference on Applied Innovation in IT · 2026/04/22 · Vol. 14 · Issue 2 · pp. 43–49

Ali Ghermezian, Kirill Karpov, Dmitry Kachan, Veronika Kirova and Eduard Siemens

Abstract

Reinforcement learning (RL) has recently been explored as an adaptive alternative to hand-designed congestion control, yet most reported evaluations remain confined to simulated or moderate-bandwidth environments. This paper studies the scaling behavior of an Aurora-style Proximal Policy Optimization (PPO) congestion control policy when it is deployed on a real high-bandwidth, high bandwidth-delay product (BDP) wide-area network (WAN) path with a 10 Gb/s interface budget. The purpose is not to claim a new state-of-the-art controller, but to identify how a simulator-trained PPO policy behaves when transferred to multi-gigabit operation. We integrate the policy through a user-space PCC shim and analyze transport-level logs, including achieved receive rate, loss dynamics, rate oscillations, and available round-trip time (RTT) indicators. The results show a persistent gap between nominal link capacity and achieved single-flow goodput: the analyzed run reaches a mean receive rate of 49.6 Mb/s and a peak of 438.0 Mb/s, corresponding to 0.50% average and 4.38% peak utilization of the 10 Gb/s budget. Additional TCP CUBIC and TCP BBR baselines on the same path reached up to 9.91 Gb/s and 9.72 Gb/s, respectively, with four flows, confirming that the route itself can sustain multi-gigabit throughput. The rate distribution is heavy-tailed, with short high-rate episodes followed by overshoot-collapse dynamics, bursty loss, and transient delay inflation. These findings indicate that scale-aware training, more robust reward normalization, and lower-overhead pacing are needed before PPO-driven congestion control can generalize reliably to high-BDP WAN deployments.
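
The utilization figures quoted in the abstract follow directly from dividing each measured rate by the 10 Gb/s interface budget. The short Python sketch below reproduces that arithmetic as a sanity check. The function name `utilization_percent` and the constant `LINK_BUDGET_MBPS` are illustrative names introduced here, not identifiers from the paper or its tooling. Note that the CUBIC and BBR figures are four-flow aggregates, so they are not directly comparable to the single-flow PPO numbers.

```python
# Sanity check of the utilization percentages stated in the abstract.
# All names here are hypothetical; only the numeric inputs come from the text.

LINK_BUDGET_MBPS = 10_000.0  # 10 Gb/s interface budget, expressed in Mb/s


def utilization_percent(rate_mbps: float, budget_mbps: float = LINK_BUDGET_MBPS) -> float:
    """Return a rate as a percentage of the link budget."""
    return 100.0 * rate_mbps / budget_mbps


# Single-flow PPO run (mean and peak receive rate from the abstract).
ppo_mean_util = utilization_percent(49.6)
ppo_peak_util = utilization_percent(438.0)

# Four-flow TCP baselines (aggregate rates from the abstract, converted to Mb/s).
cubic_util = utilization_percent(9_910.0)
bbr_util = utilization_percent(9_720.0)

print(f"PPO mean: {ppo_mean_util:.2f}%  PPO peak: {ppo_peak_util:.2f}%")
print(f"CUBIC (4 flows): {cubic_util:.1f}%  BBR (4 flows): {bbr_util:.1f}%")
```

The mean value evaluates to 0.496%, which rounds to the 0.50% reported in the abstract.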

Keywords

Reinforcement Learning, Congestion Control, Proximal Policy Optimization (PPO), High Bandwidth-Delay Product (BDP), Wide-Area Networks (WAN), Aurora.
