A quick solution to the cart problem in tinygrad, written as a way to learn the basics of reinforcement learning. It contains three algorithms:
- Vanilla policy gradients
- Actor critic
- Proximal policy optimisation
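
For orientation, here is a minimal sketch of the loss each of the three algorithms optimises, written in plain Python on lists rather than tinygrad tensors. The function names and the `clip_eps=0.2` default are assumptions for illustration and are not taken from this repo's code.

```python
import math

def discounted_returns(rewards, gamma):
    """Reward-to-go for one episode: G_t = r_t + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def vpg_loss(log_probs, returns):
    """Vanilla policy gradient: minimise -mean(log pi(a_t|s_t) * G_t)."""
    return -sum(lp * g for lp, g in zip(log_probs, returns)) / len(returns)

def advantages(returns, values):
    """Actor critic: subtract the critic's value estimate as a baseline."""
    return [g - v for g, v in zip(returns, values)]

def ppo_clip_loss(new_log_probs, old_log_probs, advs, clip_eps=0.2):
    """PPO clipped surrogate; clip_eps is an assumed default."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advs):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        total += min(ratio * adv, clipped * adv)
    return -total / len(advs)
```

In an actual training loop the log-probabilities and values would come from the policy and critic networks, and the losses would be built from tensors so that gradients can flow through them.
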

I also ran a quick check on all of them to see how well they handle hyperparameter changes (the threshold for "solved" is running for 10K steps):
- VPG solved 4/16
- A2C solved 5/16
- PPO solved 10/16

You can look at the learning graphs for each gamma/learning rate combination in the results folder.
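
A sweep like this could be sketched roughly as follows. Everything here is hypothetical: `sweep`, `fake_train`, and the grid values are illustrative names and numbers, and the 4x4 grid is only an assumption based on the 16 runs per algorithm. `fake_train` is a stand-in that returns made-up episode lengths, not real results.

```python
from itertools import product

SOLVED_STEPS = 10_000  # "solved" threshold: the episode runs for 10K steps

def sweep(train_fn, gammas, learning_rates, solved_steps=SOLVED_STEPS):
    """Run train_fn on every (gamma, lr) pair and count the solved runs.

    train_fn is a hypothetical callable returning the best episode
    length reached for that configuration.
    """
    results = {}
    for gamma, lr in product(gammas, learning_rates):
        results[(gamma, lr)] = train_fn(gamma, lr)
    solved = sum(1 for steps in results.values() if steps >= solved_steps)
    return solved, results

def fake_train(gamma, lr):
    # Stand-in for a real training run; the numbers are made up.
    return 10_000 if gamma >= 0.95 and lr <= 1e-3 else 500

# Assumed grid values, chosen only to give 16 combinations.
gammas = [0.9, 0.95, 0.99, 0.999]
learning_rates = [1e-4, 1e-3, 1e-2, 1e-1]

solved, results = sweep(fake_train, gammas, learning_rates)
print(f"solved {solved}/{len(results)}")
```

Swapping `fake_train` for a function that trains one of the three algorithms and reports its longest episode would give the solved count for that algorithm.
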