
retro-contest-sonic

A student implementation of the World Models paper with documentation.

Ongoing project.

TODO

CURRENTLY DOING

DONE

  • β-VAE for the Visual model
  • MDN-LSTM for the Memory model
  • CMA-ES for the Controller model
  • Training pipelines for the 3 models
  • Human recordings to generate data
  • MongoDB to store data
  • LSTM and VAE trained "successfully"
  • Multiprocessing of the evaluation of a set of parameters given by the CMA-ES
  • Submit learnt agents

LONG TERM PLAN?

  • Cleaner code, more optimized and documented
  • Game agnostic
  • Continue training / testing better architectures
  • Online training instead of using a database

How to launch the scripts

  • Install the modules listed in requirements.txt, along with PyTorch 0.4 and MongoDB
  • Buy or find the ROMs of Sonic The Hedgehog and install them with gym-retro.

Once you've done that, you will need to train the 3 components:
python train_vae.py
python train_lstm.py --folder=xxx
python train_controller.py --folder=xxx
where xxx is the number of the folder created in saved_models/

While training the VAE and the LSTM, pictures will be saved in the results/ folder.

Once you're done, you can use your best trained controller to play a random level with: python play_best --folder=xxx
Don't forget to change RENDER_TICK in const.py to 1 so you can see what's happening.

Resources

Differences with the official paper

  • No temperature
  • No flipping of the loss sign during training (to encourage exploration)
  • β-VAE instead of VAE
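
The β-VAE swap above amounts to a single change in the objective: the KL term is scaled by a factor β > 1. A minimal sketch of that loss, assuming a diagonal-Gaussian latent parameterized by mu and logvar (the names and the squared-error reconstruction term are illustrative, not taken from this repository's code):

```python
import numpy as np

def beta_vae_loss(recon, target, mu, logvar, beta=4.0):
    """Reconstruction loss plus beta-weighted KL divergence.

    With beta = 1 this reduces to the standard VAE objective;
    beta > 1 pressures the latent toward more disentangled factors.
    """
    # Pixel-wise squared error as a stand-in reconstruction term
    recon_loss = np.sum((recon - target) ** 2)
    # Closed-form KL between N(mu, exp(logvar)) and N(0, 1)
    kl = -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar))
    return recon_loss + beta * kl
```
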

retro-contest-sonic's People

Contributors: dylandjian, emilwallner

retro-contest-sonic's Issues

How to generate the dataset

Hi, does this repository provide a method for generating the dataset? If so (I noticed files named "jerk.py" and "human.py"), how can I run this code to generate the dataset?

Thanks for your help, looking forward to your response :D

How to get the buttons info?

Hi jian, hope you are doing well. env.py shows the info directly: buttons = ["B", "A", "MODE", "START", "UP", "DOWN", "LEFT", "RIGHT", "C", "Y", "X", "Z"]. Where can we find this information for a new game? Any suggestions would be appreciated.

Controller and CMA-ES : number of parameters.

Hey !

Thanks for the PyTorch code, it is pretty useful. The writeup is great too.

I have two questions regarding CMA-ES and the Controller (which is the policy mapping states to actions).

  1. Regarding the number of parameters in the policy

The goal of CMA-ES is to optimize the policy of the controller, which in your case is the neural network defined here. This network, composed of 2 fully connected layers, has over 1M parameters: (1024*2 + 200) * 512 + 512 * 4 = 1,153,024. Would you expect CMA-ES to work in such a high-dimensional parameter space? In the World Models paper, they justify using CMA-ES by intentionally using a linear policy, which has fewer than 1k parameters. So it seems odd to use an MLP for the policy.
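
The arithmetic above can be checked directly. The layer sizes (1024*2 + 200 inputs, a 512-unit hidden layer, 4 outputs) come from the question itself; the helper name is illustrative:

```python
def mlp_param_count(n_in, n_hidden, n_out):
    # Weight matrices only, no biases, matching the calculation in the question
    return n_in * n_hidden + n_hidden * n_out

total = mlp_param_count(1024 * 2 + 200, 512, 4)
print(total)  # 1153024
```
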

  2. Regarding what is passed as input to CMA-ES

Also, I don't understand why the number of parameters passed to CMA-ES is PARAMS_FC1 + LATENT_VEC + 512. Shouldn't this number be the number of parameters in the policy, i.e. the Controller? Then it should be (PARAMS_FC1 + LATENT_VEC) * 512 + 512 * ACTION_SPACE (as in the calculation above).
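
To make the point concrete: the solution vector CMA-ES samples should have one entry per policy weight, and it has to round-trip back into the layer matrices. A sketch, assuming a two-layer controller with no biases (the constants mirror the names in the question and are assumptions, not this repository's actual code):

```python
import numpy as np

PARAMS_FC1 = 2048    # assumed: 1024 * 2, the VAE-latent part of the input
LATENT_VEC = 200     # assumed: the LSTM-hidden part of the input
HIDDEN = 512
ACTION_SPACE = 4

def unflatten(solution):
    """Split a flat CMA-ES solution vector back into the two weight matrices."""
    n1 = (PARAMS_FC1 + LATENT_VEC) * HIDDEN
    w1 = solution[:n1].reshape(PARAMS_FC1 + LATENT_VEC, HIDDEN)
    w2 = solution[n1:].reshape(HIDDEN, ACTION_SPACE)
    return w1, w2

# Total length the solution vector must have for the reshape to work
n_params = (PARAMS_FC1 + LATENT_VEC) * HIDDEN + HIDDEN * ACTION_SPACE
w1, w2 = unflatten(np.zeros(n_params))
```

Any shorter vector, such as one of length PARAMS_FC1 + LATENT_VEC + 512, cannot be reshaped into both weight matrices.
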

Unable to run train_vae.py

Even after generating the dataset as you described in #2, I get the following error when running python train_vae.py:
[TRAIN] Fetching: 25 new run from the db
[TRAIN] Last id: 0, added runs: 4 added frames: 10031
[TRAIN] current iteration: 10, averaged loss: 32442.459
Traceback (most recent call last):
  File "train_vae.py", line 140, in <module>
    main()
  File "/home/paperspace/anaconda3/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/paperspace/anaconda3/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/paperspace/anaconda3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/paperspace/anaconda3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "train_vae.py", line 136, in main
    train_vae(str(current_time))
  File "train_vae.py", line 102, in train_vae
    traverse_latent_space(vae, frames[0], frames[-1], total_ite)
  File "/home/paperspace/dev/retro-contest-sonic/lib/visu.py", line 24, in traverse_latent_space
    save_image(res, 'results/vae/sample_traverse_{}.png'.format(total_ite))
  File "/home/paperspace/anaconda3/lib/python3.6/site-packages/torchvision/utils.py", line 104, in save_image
    im.save(filename)
  File "/home/paperspace/anaconda3/lib/python3.6/site-packages/PIL/Image.py", line 1932, in save
    fp = builtins.open(filename, "w+b")
FileNotFoundError: [Errno 2] No such file or directory: 'results/vae/sample_traverse_20.png'
Also, I had to manually hardcode the current_time variable in train_vae.py, since the script uses the current time as the timestamp while I had generated the retro_contest MongoDB dataset earlier.
