This repository contains the numerical experiments from the paper by Ziping and me, "A Natural Extension To Online Algorithms For Hybrid RL With Limited Coverage", which you can find at https://arxiv.org/abs/2403.09701.
The simulations demonstrate that appending offline data to the experience replay buffer can encourage sufficient exploration of the portion of the state-action space that lacks good coverage. This yields a simple but effective extension of an online RL algorithm to the hybrid RL setting.
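The core idea can be sketched as pre-loading a replay buffer with offline transitions before online interaction begins. The following is a minimal illustrative sketch, not the repository's actual code; the buffer class, transition format, and all names are hypothetical.

```python
import random
from collections import deque


class ReplayBuffer:
    """Minimal FIFO replay buffer (hypothetical sketch)."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        # transition: (state, action, reward, next_state)
        self.storage.append(transition)

    def sample(self, batch_size):
        # Uniformly sample a minibatch (without replacement).
        return random.sample(self.storage, min(batch_size, len(self.storage)))


# Hybrid extension: seed the buffer with offline transitions
# collected by a behavior policy (data here is illustrative).
offline_data = [("s0", "a0", 1.0, "s1"), ("s1", "a1", 0.0, "s2")]
buffer = ReplayBuffer(capacity=10_000)
for transition in offline_data:
    buffer.add(transition)

# Online transitions go into the same buffer, so every minibatch
# mixes offline coverage with freshly explored experience.
buffer.add(("s2", "a2", 0.5, "s3"))
batch = buffer.sample(3)
```

Because the offline partition is already represented in the buffer, the learner's value estimates there are grounded by data, freeing online interaction to concentrate on the poorly covered complement.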
That is, where the "offline partition" is the portion of the state-action space well-visited by the behavior policy and the "online partition" is its complement, we see that the hybrid RL algorithm visits the online partition more often than the online RL algorithm does, and vice versa for the offline partition. This holds in both a tabular forest management simulator and a Tetris simulator that we cast as a linear MDP.