Comments (6)

rPortelas commented on August 22, 2024

@emailweixu It is true that Freeway is challenging in terms of exploration. However, both the EfficientZero paper and the original MuZero paper (check Table S1 in the appendix) report non-zero performance on it, so we should be able to reproduce it.

from efficientzero.

emailweixu commented on August 22, 2024

@rPortelas I know both EfficientZero and MuZero reported reasonable performance on Freeway. The original MuZero is not open-sourced, so I cannot re-run the experiments and cannot know for sure. But since it trained on many more frames (20B frames), it is more likely to obtain reward through random exploration. Furthermore, the original MuZero paper didn't describe how the weights of the models are initialized. It is possible that non-zero initialization of the last prediction layer can get some reward, since non-zero initialization can make the initial policy not uniformly random. In fact, I did try non-zero initialization with EfficientZero (changing init_zero from True to False). It did get some reward during training, but the final performance was still much lower than the reported number. Note that zero initialization is explicitly described by EfficientZero in A.1.
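To make the initialization point concrete, here is a minimal sketch (not EfficientZero's actual network code) of why a zero-initialized last layer gives a uniform prior. It assumes a bias-free linear policy head followed by a softmax; the feature values, the 3-action count, and the helper names `softmax` and `policy_logits` are made up for illustration. It only shows the network prior, not the visit-count policy that MCTS produces from it.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_logits(features, weights):
    """Hypothetical bias-free linear last layer: one weight row per action."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


# Arbitrary hidden features; any values give the same result for zero weights.
features = [0.3, -1.2, 0.8, 2.0]
num_actions = 3  # assumed action count for this sketch

# Zero-initialized last layer: every logit is 0, so the softmax is uniform.
zero_weights = [[0.0] * len(features) for _ in range(num_actions)]
uniform_policy = softmax(policy_logits(features, zero_weights))

# Non-zero (random) initialization: logits differ, so the prior is skewed
# toward whichever action the random weights happen to favour.
rng = random.Random(0)
random_weights = [
    [rng.gauss(0.0, 1.0) for _ in features] for _ in range(num_actions)
]
skewed_policy = softmax(policy_logits(features, random_weights))

print("zero init:    ", uniform_policy)
print("non-zero init:", skewed_policy)
```

With zero weights the prior is exactly uniform regardless of the input, which is consistent with the observation that only the non-zero initialization run picked up some reward.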


rPortelas commented on August 22, 2024

Strengthening the relevance of @emailweixu's reproducibility issue:

Here are my performance results on Freeway, 4 seeds:
[image: freeway_4seeds]

All 4 seeds ended training with a score of 0; however, 1 seed did manage to reach a reward of 21.5 at some points during training.

I used the provided train.sh script (so 4 GPUs), with the following modifications to fit my setup: "--object_store_memory 100000000000" and "--num_cpus 80", which should not impact performance.

This issue is related to issue #21, which points out another reproducibility issue. See issue #21 for potential reasons.

Best,
Rémy


emailweixu commented on August 22, 2024

@rPortelas Actually, I have reasons to believe that a zero score on Freeway is expected. If you play Freeway yourself, you can see that it needs consistent exploration in one direction (UP) for many steps in order to get any reward. However, in the current implementation of EfficientZero, the behavior policy is a stochastic policy based on the MCTS result. At the beginning of training, the policy from MCTS is close to uniform given how EfficientZero is initialized (i.e. zero initialization for the last layer of the prediction nets), which makes it very hard to consistently go UP. Other algorithms such as CURL or SPR use a greedy policy (coupled with noisy nets) and are more likely to have consistent exploration behavior.
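A rough way to see the size of this effect is a toy 1-D walk, sketched below. This is not the Atari Freeway environment: there are no cars or collisions, and the target height, horizon, and action probabilities are all made-up assumptions. The function `reach_rate` is a hypothetical helper that estimates how often a fixed stochastic policy over UP / DOWN / NOOP climbs to a target height within a step budget.

```python
import random


def reach_rate(p_up, p_down, target=30, horizon=100, episodes=2000, seed=0):
    """Fraction of episodes in which the walker reaches `target`.

    Each step the agent moves UP (+1) with probability p_up, DOWN (-1,
    floored at 0) with probability p_down, and otherwise stays put (NOOP).
    """
    rng = random.Random(seed)
    successes = 0
    for _ in range(episodes):
        height = 0
        for _ in range(horizon):
            u = rng.random()
            if u < p_up:
                height += 1
            elif u < p_up + p_down:
                height = max(0, height - 1)
            if height >= target:
                successes += 1
                break
    return successes / episodes


# Uniform policy over the three actions, as with a near-uniform MCTS policy.
uniform_rate = reach_rate(1.0 / 3.0, 1.0 / 3.0)

# A policy that mostly commits to UP, standing in for consistent exploration.
biased_rate = reach_rate(0.8, 0.1)

print(f"uniform policy reach rate: {uniform_rate:.4f}")
print(f"UP-biased policy reach rate: {biased_rate:.4f}")
```

In this toy setting the uniform policy has no upward drift, so reaching the target depends on a rare lucky streak, while the UP-biased policy gets there almost every time. The real game is harder still, since collisions push the player back.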


szrlee commented on August 22, 2024

Thanks for the discussion!
Any follow-up message so far?


emailweixu commented on August 22, 2024

@rPortelas did you try the "raw" version you mentioned in #21 on Freeway?

