Comments (12)

lake4790k commented on August 21, 2024

Even after this fix the CPU performance was poor compared to GPU, so I looked further and I think I found another problem in Experience.store:

self.states[self.index] = state:float():mul(self.imgDiscLevels) -- float -> byte

If state is a FloatTensor to begin with, float() doesn't copy, so mul() modifies the state tensor in place. That same tensor is the observation that was put in the stateBuffer, so the buffer contents get corrupted as well.

If I fix this by doing an explicit clone at the above location, CPU finally seems to converge as well as GPU. This is really important for me so I have a correct comparison baseline for the async learning.
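
For illustration, here is a minimal sketch of the aliasing described above (the tensor shape and the 255 scale factor are placeholders standing in for the real observation and imgDiscLevels, not the repo's actual values):

-- sketch only: why state:float():mul(...) can corrupt the original observation
local state = torch.FloatTensor(3, 84, 84):fill(0.5) -- stand-in observation
-- :float() returns state itself (no copy) because it is already a FloatTensor,
-- so the in-place :mul() also scales the caller's observation
local stored = state:float():mul(255)
print(state[1][1][1])                                 -- 127.5: the observation was mutated

-- the explicit-clone fix operates on a private copy instead
local state2 = torch.FloatTensor(3, 84, 84):fill(0.5)
local stored2 = state2:clone():mul(255)
print(state2[1][1][1])                                -- still 0.5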

Kaixhin commented on August 21, 2024

Nice spot! I'd err on the side of removing in-place operations where they can corrupt other tensors, rather than doing that clone there. I'm working on a fix branch now, so please have a look and see if that's OK. I'll go through the code now, check whether anything else might be affected, and report back once I've had a look through.
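
As an illustration of that alternative (a sketch only, not the actual contents of the fix branch; states and imgDiscLevels are stand-ins for the Experience fields), the non-destructive torch.* form writes the scaled result into the replay slot and leaves the caller's tensor untouched:

local imgDiscLevels = 255                              -- stand-in for self.imgDiscLevels
local states = torch.ByteTensor(10, 3, 84, 84)         -- stand-in for the preallocated replay slots
local state = torch.FloatTensor(3, 84, 84):fill(0.5)   -- incoming observation
-- torch.mul(tensor, value) allocates a new result tensor instead of mutating tensor,
-- and :copy() into the ByteTensor slot does the float -> byte conversion
states[1]:copy(torch.mul(state, imgDiscLevels))
print(state[1][1][1])                                  -- still 0.5: the observation is untouched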

Kaixhin commented on August 21, 2024

Done - please check the changes (between master and fix), and I will merge it into master.

lake4790k commented on August 21, 2024

Many thx for the quick fix. Yes, the torch.* functions are much better; in-place is evil.

I think the issue I first mentioned is still there: if the environment is catch, the tensor argument to CircularQueue:push is the same screen tensor that catch just updated, so this
self.queue[self.length] = tensor:typeAs(self.queue[1])

will put the same tensor coming from inside catch in multiple positions in the queue, and when catch updates the screen, all of them will be modified.

catch just redraws the screen in the same tensor internally and returns its screen tensor (exposing it). Maybe it should rather be defensive and return a clone?
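
A sketch of that queue aliasing, with illustrative names rather than the repo's code (the slot only stores a reference, so every push of the env's reused screen buffer ends up pointing at the latest frame):

local screen = torch.FloatTensor(1, 24, 24):zero() -- stand-in for catch's internal screen buffer
local queue = {}
queue[1] = screen        -- push without copying: the slot aliases the env buffer
screen:fill(1)           -- the env "redraws" the screen for the next step
queue[2] = screen
print(queue[1]:sum(), queue[2]:sum()) -- both report the new frame: 576 576
-- the defensive fix discussed here: the env returns screen:clone(), or the push
-- stores tensor:clone():typeAs(self.queue[1]) so each slot owns its own copy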

Kaixhin commented on August 21, 2024

Yes, this problem might crop up again, so it's best to be defensive here - I've added the suggested fix and some comments explaining what's going on.

lake4790k commented on August 21, 2024

I ran the updated code; catch now converges similarly for CPU/GPU, just like with my ad hoc fixes, thx!

Kaixhin commented on August 21, 2024

@lake4790k I'm finally getting round to finishing my Torch blog post on the DQN + Dueling DQN and think you deserve a mention in the acknowledgements - just let me know what name you want.

lake4790k commented on August 21, 2024

thx, the name is Laszlo Keri. I'll try to earn it; the async method should be ready soon...

Kaixhin commented on August 21, 2024

Just finished a run on CPU and compared it to one I did yesterday - I get slightly faster learning in the beginning, but it slows down and ends up at the same score (~0.8). My previous run had a rather linear score curve. Runs on CPU and GPU with the current code are very similar. Debugging DRL is a tricky job...

lake4790k commented on August 21, 2024

Yes, I also saw those two different profiles (linear vs. fast-then-slow) on different runs, both ending up at around 0.8. The DeepMind papers also show a wide range of learning curves when they plot the best 5 agents, so this should be fine. I also see that CPU/GPU have similar profiles, which is what one would expect.

lake4790k commented on August 21, 2024

@Kaixhin btw how much time did it take for you to converge up to 0.8 with catch? For me it's a lot, even for this simple game: CPU takes at least 30 mins, GPU more like at least 45 mins... is this normal or too slow...?

Kaixhin commented on August 21, 2024

@lake4790k It takes me about 1.5 hours to finish training for CPU and 2 hours for GPU. It takes about 20-30 minutes less than that to get to epoch 35-40, which is where it seems to converge. But I am running this on a laptop with a 2.4GHz Intel Core i7 and an NVIDIA GeForce GT 650M, so a good desktop should get times like yours.
