kristjankorjus / replicating-deepmind
Reproducing the results of "Playing Atari with Deep Reinforcement Learning" by DeepMind
License: GNU General Public License v3.0
Play a small example by hand and make sure that the program outputs the same numbers as we have on paper.
Wait a minute... are you saying you have copied DeepMind's functionality and written your own Atari game agent?
The function should create an 84x84 numpy array made of 8x8 squares, each black or white, like a 10x10 chessboard with some additional room near the right and bottom borders.
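As a sketch of such a test image generator (the function name is an assumption, not the project's actual code):

```python
import numpy as np

def make_checkerboard(size=84, square=8):
    """Build a size x size grayscale test image of black/white squares.

    With size=84 and square=8 this gives a 10x10 chessboard pattern plus
    4 extra pixels of pattern near the right and bottom borders.
    """
    rows = np.arange(size) // square  # which square-row each pixel belongs to
    cols = np.arange(size) // square  # which square-column each pixel belongs to
    board = (rows[:, None] + cols[None, :]) % 2  # alternate 0/1 per square
    return (board * 255).astype(np.uint8)
```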
The ALE class should encapsulate only the connection with ALE and should be decoupled from MemoryD. In particular, all references to memory should be removed and moved to main.py.
Also, I see no reason why ALE should be in a separate directory; why not put it in src with main.py?
The MemoryD class contains some non-trivial logic. As it is a completely independent class, it should be easy to write tests for it. This would ensure that it works correctly. And maybe do some profiling as well.
The NeuralNet class should have only the methods train() and predict(); everything else (in particular predict_best_action() and minibatch processing) should be moved to main.py. NeuralNet should be a simple wrapper around ConvNet, which would allow using it in other projects too.
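A minimal sketch of the proposed interface; the ConvNet method names used here (train_batch, forward) are assumptions, not the project's actual API:

```python
class NeuralNet:
    """Thin wrapper around a convolutional network: train() and predict() only."""

    def __init__(self, convnet):
        self.convnet = convnet

    def train(self, inputs, targets):
        # forward + backward pass on one minibatch; delegates to the wrapped net
        return self.convnet.train_batch(inputs, targets)

    def predict(self, inputs):
        # forward pass only; returns predicted Q-values for the given states
        return self.convnet.forward(inputs)
```

Keeping action selection and minibatch assembly out of this class is what makes it reusable elsewhere.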
EENet has GPUs in Tartu
ALE and cuda-convnet2 should be defined as submodules of DeepMind. This way you will get the latest version of both ALE and cuda-convnet2 when doing a checkout. Also, we can push our fixes directly to those projects.
http://stackoverflow.com/questions/5252450/using-someone-elses-repo-as-a-git-submodule-on-github
What to do with the cuda-convnet2 patches? We can leave them as manual work, prepare them as patch files to be applied automatically during make, or hope that they are included in the next release.
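For illustration, after `git submodule add` the resulting `.gitmodules` might look roughly like this (the paths and upstream URLs are assumptions and would need checking):

```ini
[submodule "ale"]
    path = ale
    url = https://github.com/mgbellemare/Arcade-Learning-Environment.git
[submodule "cuda-convnet2"]
    path = cuda-convnet2
    url = https://github.com/akrizhevsky/cuda-convnet2.git
```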
The weights should be initialized so that the initial estimates of expected rewards (when feeding input to the freshly initialized network) are of the same order of magnitude as, or rather a few orders of magnitude smaller than, the reward we give for breaking a tile (reward=1). At the moment the rewards from the randomly initialized network go as far as -200 or +200.
We need to decrease the weight values, because then adding a reward of 1 to a desired transition/state would really make us choose that same transition next time.
This should be done in the constructors of the individual layers (the way we initialize W and B).
Also, the biases are all initialized at zero at the moment; we need to change that.
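A sketch of a small-magnitude initialization for both W and b; the function name and the 0.01 scale are assumptions, chosen only to illustrate keeping initial outputs well below the reward of 1:

```python
import numpy as np

def init_layer(n_in, n_out, scale=0.01, rng=None):
    """Initialize a layer with small random weights and small non-zero biases."""
    rng = rng or np.random.RandomState(42)
    W = rng.normal(0.0, scale, size=(n_in, n_out))
    b = rng.normal(0.0, scale, size=n_out)  # small random biases, not zero
    return W, b
```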
On lines 124 and 126 the statement predict_rewards([transition['prestate']]) shows up twice. Computing this only once would bring down the number of neural network evaluations per frame.
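A sketch of evaluating the network once and reusing the result; predict_rewards and the transition dict mirror the names quoted above, but the surrounding helper is illustrative, not the project's actual code:

```python
def choose_action(transition, predict_rewards):
    """Pick the greedy action using a single network evaluation."""
    rewards = predict_rewards([transition['prestate']])  # evaluate the net once
    # reuse `rewards` everywhere instead of calling predict_rewards again
    best_action = max(range(len(rewards[0])), key=lambda a: rewards[0][a])
    return best_action, rewards
```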
The memory is of a fixed length, so when we reach 100000 transitions in memory, we need to start overwriting the first transitions. This is not implemented so far.
Attention should be paid to figuring out how to extract transitions for a minibatch once part of the transitions have been overwritten. For example: if we have overwritten transitions up to position 10 in memory and the minibatch asks for transition nr 11, then the 3 previous "images" in the memory do not correspond to what actually happened before transition 11. So we either
1) give a repetition of the 11th image instead of img 10, img 9 and img 8, as we do when transitions are requested at the beginning of a new game. The downside is that such a transition (the same image for 4 frames and then an action) never actually takes place;
or
2) forbid the minibatch from asking for transitions at those locations. Considering we have another 1M transitions to choose from, forbidding the selection of 3 of them seems like no problem.
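A minimal sketch of the wraparound write plus option 2 (excluding indices just behind the write pointer when sampling); the class and method names are assumptions, and only the action column is shown to keep the example short:

```python
import numpy as np

class RingMemory:
    """Fixed-size transition memory that overwrites the oldest entries."""

    def __init__(self, capacity=100000):
        self.capacity = capacity
        self.actions = np.zeros(capacity, dtype=np.uint8)
        self.count = 0  # total transitions ever written

    def add(self, action):
        self.actions[self.count % self.capacity] = action  # wrap around
        self.count += 1

    def valid_indices(self, history=3):
        """Indices safe to sample from. After wraparound, the `history` slots
        just at/after the write pointer are excluded, because their preceding
        frames have already been overwritten (option 2 above)."""
        n = min(self.count, self.capacity)
        write = self.count % self.capacity
        if self.count <= self.capacity:
            return list(range(history, n))  # skip the first few, as at game start
        return [i for i in range(n) if (i - write) % self.capacity >= history]
```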
@RDTm, could you write a quick manual on getting the emulator running?
I downloaded this about a month ago and ran it on my GPU.
Is frames_played incremented anywhere? I was printing out the value of epsilon on every frame and it seemed to stay at 0.9.
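For reference, the paper anneals epsilon linearly from 1.0 to 0.1 over the first million frames; a sketch of that schedule (the helper name is an assumption):

```python
def epsilon_for_frame(frames_played, start=1.0, end=0.1, anneal_frames=1_000_000):
    """Linear epsilon annealing; stays at `end` after `anneal_frames`."""
    if frames_played >= anneal_frames:
        return end
    fraction = frames_played / anneal_frames
    return start + fraction * (end - start)
```

If frames_played is never incremented, a schedule like this will of course return the same value forever.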
For each frame, the loop in preprocessor.py must run 210*160 times, which is a bit inefficient:

```python
for i in range(len(image_string) / 2):
    num_rows = i % width
    num_cols = i / width
    hex1 = int(image_string[i * 2], 16)
    # Division by 2 because: http://en.wikipedia.org/wiki/List_of_video_game_console_palettes
    hex2 = int(image_string[i * 2 + 1], 16) / 2
    gray_val = int(arr[hex2, hex1])
    pixels[num_rows, num_cols] = (gray_val, gray_val, gray_val)
```
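The per-pixel loop could likely be replaced with vectorized numpy indexing. A sketch under stated assumptions: `palette` stands in for `arr` above, the hex digits are lowercase, and the result is returned in the conventional (height, width) layout rather than the loop's transposed indexing:

```python
import numpy as np

def preprocess(image_string, palette, width=160, height=210):
    """Vectorized grayscale lookup: 2 lowercase hex chars per pixel."""
    codes = np.frombuffer(image_string.encode('ascii'), dtype=np.uint8).astype(np.int16)
    # map ASCII hex digits ('0'-'9', 'a'-'f') to their numeric values
    vals = np.where(codes >= ord('a'), codes - ord('a') + 10, codes - ord('0'))
    hex1 = vals[0::2]           # first hex digit of each pixel
    hex2 = vals[1::2] // 2      # second digit, halved as in the original loop
    gray = palette[hex2, hex1]  # one fancy-indexing lookup for all pixels
    return gray.reshape(height, width)
```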
```python
# Crop and downscale image
roi = (0, 33, 160, 193)  # region of interest is lines 33 to 193
img = img.crop(roi)
new_size = 84, 84
img.thumbnail(new_size)
```
We need to create a function in main.py to save the learned network parameters to a file after a desired number of games has been played.
We need to add a constructor to NeuralNet that builds the neural net from given weight values.
Minibatch components are addressed using indexes; they should be addressed using names. Also rename the variables in NeuralNet.train() to prestate and poststate.
I am still not clear about the function φ in Algorithm 1. It is obvious from the paper that by using the function φ the input to the Q-network is reduced to an 84×84×4 image. But how does it do that?
In Algorithm 1 we found that s_{t+1} = s_t, a_t, x_{t+1} and φ_{t+1} = φ(s_{t+1}).
This confuses me. What on earth is s_{t+1}? Does that mean:
s1 = x1
s2 = s1,a1,x2 = x1,a1,x2
s3 = s2,a2,x3 = x1,a1,x2,a2,x3
s4 = s3,a3,x4 = x1,a1,x2,a2,x3,a3,x4
......
So how does φ process s3, for instance? Should φ_3 equal φ(s3) = φ(x1,a1,x2,a2,x3)? I find this hard to understand.
I would appreciate it if anyone could help.
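For what it's worth, one common reading of the paper is that φ discards the actions and keeps only the last four preprocessed screens of s_t, stacking them into the 84×84×4 input. A sketch of that reading (an interpretation, not the project's actual code):

```python
import numpy as np

def phi(state_frames):
    """Map a sequence of preprocessed 84x84 frames to an 84x84x4 input.

    `state_frames` stands for the x_1..x_t in s_t; actions are dropped and
    only the last four frames are stacked, repeating the earliest available
    frame when fewer than four exist (e.g. at the start of a game).
    """
    frames = list(state_frames)[-4:]
    while len(frames) < 4:
        frames.insert(0, frames[0])   # pad at the front with the oldest frame
    return np.stack(frames, axis=-1)  # shape (84, 84, 4)
```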