I would like to know what scores the agents achieve. Could this information be

What is the actual performance? (atari, CLOSED)

kaixhin commented on August 21, 2024

What is the actual performance?

from atari.

Comments (7)

Kaixhin commented on August 21, 2024

Not sure this can be provided easily, as it takes a lot of time and a decent GPU, plus not all runs achieve the scores reported in the paper due to randomness (personal communication with Tom Schaul). Basically, for the items that aren't crossed out in the readme, the agents have matched the score on one game, as reported in the corresponding paper. Not sure about all of the asynchronous agents, but A3C seems to be fine as well (see #48).

cgel commented on August 21, 2024

The problem is that there are lots of implementations around that don't match the results from the papers. If the agents in this repo do, you should at least mention it.
Also, do you happen to have the boards from a few runs lying around?

Kaixhin commented on August 21, 2024

I've put up all the plots that I have to confirm. The dueling DQN doesn't quite match reported performance, but is definitely better than the double DQN - not sure if the discrepancy is in the single run or if something else is the issue.

cgel commented on August 21, 2024

That is very helpful!
There is some extra information needed though. Are those results using the training epsilon or the testing one?

Kaixhin commented on August 21, 2024

Testing epsilon, matching the value set in the Double DQN paper.

cgel commented on August 21, 2024

That is an epsilon 50 times smaller than the one other papers used. It might be strongly skewing the results in favour of your implementation.
I think you should report the results in a form like: DQN (Space Invaders), epsilon=0.001
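
To make the concern concrete, here is a minimal Python sketch of epsilon-greedy action selection at test time. It is not code from this repo: the function names and Q-values are made up for illustration, and the comparison value 0.05 is only inferred from "50 times" 0.001. It simply counts how often the evaluated agent deviates from its greedy action under each epsilon.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Return a random action with probability epsilon, else the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def non_greedy_fraction(epsilon, steps=100_000, seed=0):
    """Fraction of steps on which the chosen action differs from the greedy one."""
    rng = random.Random(seed)
    q_values = [0.1, 0.5, 0.2]  # hypothetical Q-values; action 1 is greedy
    greedy = 1
    deviations = sum(
        epsilon_greedy(q_values, epsilon, rng) != greedy for _ in range(steps)
    )
    return deviations / steps


if __name__ == "__main__":
    # 0.001 is the test epsilon discussed in this thread; 0.05 is 50 times larger.
    for eps in (0.001, 0.05):
        print(f"epsilon={eps}: non-greedy fraction = {non_greedy_fraction(eps):.5f}")
```

With the smaller epsilon the agent acts almost purely greedily during evaluation, which is why scores obtained under different test epsilons are hard to compare directly.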

Kaixhin commented on August 21, 2024

That is the epsilon for testing reported with the double Q paper (see Robustness to Human starts) and the dueling network paper (see 4.1. Policy evaluation). I've now amended the readme to make this clear, alongside the proper citations. I don't currently have the resources to run more experiments, but if you wish to adjust the line I mentioned and pass me the results I'll put them up as well.
