Code used in the paper "ENERO: Efficient real-time WAN routing optimization with Deep Reinforcement Learning". In this paper, the DRL agent is implemented with the PPO algorithm.
I am not quite sure what "k_middlepoints" means. Does it mean we assume that there are k middle points?
Another question concerns line 50 of environment16.py: self.top_K_critical_demands = False # If we want to take the top X% of the 5 most loaded links
Does "K" mean the same thing here? Could you please provide more explanation?