Comments (12)

lake4790k commented on August 21, 2024

Even after this fix the CPU performance was poor compared to GPU, so I looked further and I think I found another problem in Experience.store:

self.states[self.index] = state:float():mul(self.imgDiscLevels) -- float -> byte

If state is a FloatTensor to begin with, float() doesn't copy, so mul() modifies the state tensor in place. That same tensor is the observation that was put in the stateBuffer, so the buffer contents get corrupted as well.

If I fix this by doing an explicit clone at the above location, CPU finally seems to converge as well as GPU. This is really important for me so I have a correct comparison baseline for the async learning.
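
For illustration, here is a minimal sketch of the aliasing described above (the tensor shape and the 255 scale factor are placeholders standing in for the real observation and imgDiscLevels, not the repo's actual values):

-- sketch only: why state:float():mul(...) can corrupt the original observation
local state = torch.FloatTensor(3, 84, 84):fill(0.5) -- stand-in observation
-- :float() returns state itself (no copy) because it is already a FloatTensor,
-- so the in-place :mul() also scales the caller's observation
local stored = state:float():mul(255)
print(state[1][1][1])                                 -- 127.5: the observation was mutated

-- the explicit-clone fix operates on a private copy instead
local state2 = torch.FloatTensor(3, 84, 84):fill(0.5)
local stored2 = state2:clone():mul(255)
print(state2[1][1][1])                                -- still 0.5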

Kaixhin commented on August 21, 2024

Nice spot! I'd err on the side of removing in-place operations where they can corrupt other tensors, rather than doing that clone there. I'm working on a fix branch now, so please have a look and see if that's OK. I'll go through the code now, check whether anything else might be affected, and report back once I've had a look through.
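
As an illustration of that alternative (a sketch only, not the actual contents of the fix branch; states and imgDiscLevels are stand-ins for the Experience fields), the non-destructive torch.* form writes the scaled result into the replay slot and leaves the caller's tensor untouched:

local imgDiscLevels = 255                              -- stand-in for self.imgDiscLevels
local states = torch.ByteTensor(10, 3, 84, 84)         -- stand-in for the preallocated replay slots
local state = torch.FloatTensor(3, 84, 84):fill(0.5)   -- incoming observation
-- torch.mul(tensor, value) allocates a new result tensor instead of mutating tensor,
-- and :copy() into the ByteTensor slot does the float -> byte conversion
states[1]:copy(torch.mul(state, imgDiscLevels))
print(state[1][1][1])                                  -- still 0.5: the observation is untouched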

Kaixhin commented on August 21, 2024

Done - please check the changes (between master and fix), and I will merge it into master.

lake4790k commented on August 21, 2024

Many thx for the quick fix. Yes, the torch.* functions are much better; in-place is evil.

I think the issue I first mentioned is still there: if the environment is catch, the tensor argument to CircularQueue:push is the same screen tensor that catch just updated, so this
self.queue[self.length] = tensor:typeAs(self.queue[1])

will put the same tensor coming from inside catch in multiple positions in the queue, and when catch updates the screen, all of them will be modified.

catch just redraws the screen in the same tensor internally and returns its screen tensor (exposing it). Maybe it should rather be defensive and return a clone?
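
A sketch of that queue aliasing, with illustrative names rather than the repo's code (the slot only stores a reference, so every push of the env's reused screen buffer ends up pointing at the latest frame):

local screen = torch.FloatTensor(1, 24, 24):zero() -- stand-in for catch's internal screen buffer
local queue = {}
queue[1] = screen        -- push without copying: the slot aliases the env buffer
screen:fill(1)           -- the env "redraws" the screen for the next step
queue[2] = screen
print(queue[1]:sum(), queue[2]:sum()) -- both report the new frame: 576 576
-- the defensive fix discussed here: the env returns screen:clone(), or the push
-- stores tensor:clone():typeAs(self.queue[1]) so each slot owns its own copy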

Kaixhin commented on August 21, 2024

Yes, this problem might crop up again, so it's best to be defensive here - I've added the suggested fix and some comments explaining what's going on.

lake4790k commented on August 21, 2024

I ran the updated code; catch now converges similarly for CPU/GPU, just like with my ad hoc fixes, thx!

Kaixhin commented on August 21, 2024

@lake4790k I'm finally getting round to finishing my Torch blog post on the DQN + Dueling DQN and think you deserve a mention in the acknowledgements - just let me know what name you want.

lake4790k commented on August 21, 2024

thx, the name is Laszlo Keri. I'll try to earn it; the async method should be ready soon...

Kaixhin commented on August 21, 2024

Just finished a run on CPU and compared it to one I did yesterday - I get slightly faster learning in the beginning, but it slows down and ends up at the same score (~0.8). My previous run had a rather linear score curve. Runs on CPU and GPU with the current code are very similar. Debugging DRL is a tricky job...

lake4790k commented on August 21, 2024

Yes, I also saw those two different profiles (linear vs. fast-then-slow) on different runs, both ending up at around 0.8. The DeepMind papers also show a wide range of learning curves when they plot the best 5 agents, so this should be fine. I also see that CPU/GPU have similar profiles, which is what one would expect.

lake4790k commented on August 21, 2024

@Kaixhin btw how much time did it take for you to converge up to 0.8 with catch? For me it's a lot, even for this simple game: CPU takes at least 30 mins, GPU more like at least 45 mins... is this normal or too slow...?

Kaixhin commented on August 21, 2024

@lake4790k It takes me about 1.5 hours to finish training for CPU and 2 hours for GPU. It takes about 20-30 minutes less than that to get to epoch 35-40, which is where it seems to converge. But I am running this on a laptop with a 2.4GHz Intel Core i7 and an NVIDIA GeForce GT 650M, so a good desktop should get times like yours.
