Designed a novel end-to-end algorithm in which Reinforcement Learning (RL) is guided by an online learning approach to MPC (Model Predictive Control), leading to optimal policies for continuous control tasks.
Simulation results (the control task is to swing up and balance the pole):
a) Control of the cart-pole with Augmented Random Search (ARS, a model-free RL method):
The pole fails to stay in the upright position after swing-up when ARS is used alone.
b) Control of the cart-pole with Augmented Random Search + online learning MPC:
The pole stays balanced indefinitely; online learning guides the Augmented Random Search (RL) policy.
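
For readers unfamiliar with ARS, the basic update it performs can be sketched as below. This is a minimal illustration of the standard ARS step (without observation normalization), not the code used to produce the results above, and it does not include the online learning MPC guidance. The names `ars_step` and `toy_return`, the quadratic stand-in for an episode return, and all hyperparameter values are assumptions made for the example; in a real experiment `rollout` would run one cart-pole episode and return its total reward.

```python
import numpy as np


def ars_step(theta, rollout, rng, n_dirs=8, top_b=4, step_size=0.05, noise=0.1):
    """One basic ARS update on policy parameters `theta` (hypothetical helper)."""
    # Sample random perturbation directions with the same shape as theta.
    deltas = rng.standard_normal((n_dirs,) + theta.shape)
    # Evaluate the return for each direction, perturbing both ways.
    r_plus = np.array([rollout(theta + noise * d) for d in deltas])
    r_minus = np.array([rollout(theta - noise * d) for d in deltas])
    # Keep the top_b directions, ranked by the better of their two returns.
    best = np.argsort(np.maximum(r_plus, r_minus))[::-1][:top_b]
    # Scale the step by the standard deviation of the returns that are used.
    sigma_r = np.concatenate([r_plus[best], r_minus[best]]).std() + 1e-8
    # Finite-difference estimate of the ascent direction.
    direction = np.tensordot(r_plus[best] - r_minus[best], deltas[best], axes=1)
    return theta + step_size / (top_b * sigma_r) * direction


# Assumption: a toy quadratic stands in for the return of a cart-pole episode.
TARGET = np.array([1.0, -2.0, 0.5])


def toy_return(theta):
    return -float(np.sum((theta - TARGET) ** 2))


rng = np.random.default_rng(0)
theta = np.zeros(3)
initial_return = toy_return(theta)
for _ in range(200):
    theta = ars_step(theta, toy_return, rng)
final_return = toy_return(theta)
```

Because ARS relies only on episode returns and uses no model of the dynamics, it is a natural candidate for additional guidance, which is the role online learning MPC plays in (b).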