dl2023

Deep Learning Project 2023 Autumn Semester ETH

Main steps in the project:

agents: Folder containing all agent implementations (except the DT agent)
dt_model: Folder containing stored DT Agent models
dqm_cql_model: Folder containing all stored CQL/DQN models
trajectories: Folder containing stored trajectories (potentially files too large for github)
board.py: Implementation of the connect four board for use in the environment
env.py: Implementation of the environment used by the agents
env_opp_integrated: Implementation of the environment where the opponent is an integrated part of the environment instead of a second separate player/agent
networks.py: Implementation of the networks used in the various agents
notebook.ipynb: Notebook used for training and evaluating the DQN and CQL Agents using the default environment
notebook_opp_integrated: Notebook used for training and evaluating the DQN and CQL Agents using the env_opp_integrated environment
trainer.py: Implementation of the Trainer used for training the DQN and CQL Agents
trainer_opp_integrated.py: Implementation of the Trainer with integrated opponent environment.
dt_notebook.ipynb: Notebook containing the Implementation, Training and Evaluation of the DT Agent. Contains code for generating and storing trajectories used for training. Contains code for offline and online training of DT Agents.
transformer_utils.py: Contains helper functions for the DT Agent code.
utils.py: Contains helper functions for the DQN / CQL Agents and the implemented environment

Open notebook.ipynb
Go to the "Agent Evaluation" cell. Here you only need to set agent_type/opponent_type and agent_network_type/opponent_network_type. The code will run the evaluation for all three models.
For more information about the training and evaluation of the Q-Agents as well as all results open evaluation_result_file/eval.txt.

All the relevant code for generating trajectories and training the DT agent and evaluating it is in the dt_notebook.ipynb file. Simply executing the notebook from top to bottom (while specifying file names for storing or loading models/trajectories) runs all the code. In its current version, when running the notebook, trajectories are generated (currently only 10) instead of read from a file. The training of the transformer is commented out, simply uncomment one line of code "trainer.train()". The online training is performed over 2000 rounds by default, the model which is trained can be switched by loading a different (potentially pretrained) model. The transformer is evaluated in 100 epispodes against a chosen opponent (random opponent currently).
The best way to execute the notebook is to go cell by cell and perform minor changes for filenames, opponents or rounds/iterations according to personal wishes.
To recreate the results shown in the paper, load the selected model (see dt_model folder) and run the evaluation function in the lowest cell with the wanted oppon.ent. Prior cells need to be executed due to code dependencies!