An Exploration of Machine Learning Algorithms on the Multi-Arm Bandit Task
The purpose of this project was to explore various implementations of reinforcement learning algorithms using a basic multi-arm bandit task. The algorithm variables were the memory depth of trial results, the method used to calculate expected value, and the decision policy. The task variables were the number of arms, the outcome values, the outcome probability structure, and the jump frequency.
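As a rough illustration of how these variables fit together, a minimal sketch might look like the following. It is not the code used in this project. The names `BanditTask`, `Agent`, and `run` are hypothetical, the sliding-window mean and epsilon-greedy policy are just one possible choice of expected value method and decision policy, and a "jump" is assumed here to mean that the outcome probabilities are reshuffled across arms every fixed number of trials.

```python
import random
from collections import deque


class BanditTask:
    """Task side: number of arms, outcome values, probabilities, jump frequency."""

    def __init__(self, probs, values, jump_every, rng):
        self.probs = list(probs)      # outcome probability per arm
        self.values = list(values)    # outcome value paid on a win, per arm
        self.jump_every = jump_every  # assumed meaning: reshuffle every N trials (0 = never)
        self.rng = rng
        self.trial = 0

    def pull(self, arm):
        self.trial += 1
        reward = self.values[arm] if self.rng.random() < self.probs[arm] else 0.0
        # Assumption: a "jump" permutes the probabilities among the arms.
        if self.jump_every and self.trial % self.jump_every == 0:
            self.rng.shuffle(self.probs)
        return reward


class Agent:
    """Algorithm side: memory depth, expected value method, decision policy."""

    def __init__(self, n_arms, memory_depth, epsilon, rng):
        # Memory depth: only the last `memory_depth` results per arm are kept.
        self.history = [deque(maxlen=memory_depth) for _ in range(n_arms)]
        self.epsilon = epsilon
        self.rng = rng

    def expected_value(self, arm):
        # Expected value method (illustrative): mean of the remembered results.
        h = self.history[arm]
        return sum(h) / len(h) if h else 0.0

    def choose(self):
        # Decision policy (illustrative): epsilon-greedy.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.history))
        evs = [self.expected_value(a) for a in range(len(self.history))]
        return max(range(len(evs)), key=evs.__getitem__)

    def update(self, arm, reward):
        self.history[arm].append(reward)


def run(n_trials=200, seed=0):
    """Run one simulated session and return the total reward collected."""
    rng = random.Random(seed)
    task = BanditTask(probs=[0.2, 0.5, 0.8], values=[1.0, 1.0, 1.0],
                      jump_every=50, rng=rng)
    agent = Agent(n_arms=3, memory_depth=10, epsilon=0.1, rng=rng)
    total = 0.0
    for _ in range(n_trials):
        arm = agent.choose()
        reward = task.pull(arm)
        agent.update(arm, reward)
        total += reward
    return total
```

Calling `run(seed=0)` twice returns the same total because all randomness flows through one seeded generator, which makes it easier to compare conditions that differ in only one variable at a time.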
Having completed the build, test, and analysis stages of this project, I recognize that the research methods I employed were not sound enough to support firm conclusions. The research design was far too complex, and it also failed to focus directly on the right group differences. The experience has taught me better ways to organize research methods for future projects.