
Comments (6)

lebrice commented on August 16, 2024

When using the gym.vector.VectorEnv API (which is what you're using in this case since batch_size=2048), you don't need to reset the individual envs because they already auto-reset when the episode in any given env is done.

Also just FYI:

  • when you have done=True for the environment at a given index env_idx, then obs[env_idx] is the first observation of the next episode, not the final observation of the previous episode.
  • Likewise, when done[env_idx]==True, the reward for env env_idx is the reward for the last action you sent, i.e. the final action of the episode that just ended. Any non-zero rewards you see at that index on later steps belong to the next episode.
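To make the two points above concrete, here is a minimal sketch of the auto-reset behaviour using a toy stand-in. `ToyAutoResetVectorEnv` and its `episode_lengths` argument are made up for illustration; this is not the gym or brax implementation, just the semantics described above under the assumption that the observation is a per-env step counter and the reward equals the action.

```python
class ToyAutoResetVectorEnv:
    """Hypothetical stand-in for a vectorized env (not the gym or brax API)."""

    def __init__(self, episode_lengths):
        # One episode length per sub-env, so episodes end at different times.
        self.episode_lengths = list(episode_lengths)
        self.t = [0] * len(self.episode_lengths)

    def reset(self):
        self.t = [0] * len(self.episode_lengths)
        return list(self.t)

    def step(self, actions):
        obs, rewards, dones = [], [], []
        for i, action in enumerate(actions):
            self.t[i] += 1
            # Reward for the action just sent (the last one of the episode if done).
            rewards.append(float(action))
            done = self.t[i] >= self.episode_lengths[i]
            dones.append(done)
            if done:
                # Auto-reset: only this sub-env starts a new episode.
                self.t[i] = 0
            # After an auto-reset this is the FIRST observation of the next episode.
            obs.append(self.t[i])
        return obs, rewards, dones
```

With `ToyAutoResetVectorEnv([2, 3])`, the second step reports done for env 0 while its observation is already 0 (the start of the next episode), and on the following step done for env 0 is back to False.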

Hope this helps :)

from brax.

cdfreeman-google commented on August 16, 2024

We have some new reset logic plumbing that should resolve this issue in the next couple of days. Originally, we found that we didn't really need to reset during rollouts: we'd just run a rollout for a fixed episode length and then mask out frames after the point at which a done was triggered. Admittedly, this is not what folks are used to, so we'll introduce a wrapper that does the actual resetting as you'd usually expect.
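For readers unfamiliar with the masking approach described above, a rough sketch of the idea for a single env might look like this. The function names are hypothetical, and whether the frame on which done fires is itself kept is an assumption here (it is kept in this sketch); the actual brax training code may differ.

```python
def mask_after_first_done(dones):
    """Per-step mask for one fixed-length rollout.

    1.0 up to and including the first step where done is True, 0.0 afterwards.
    (Keeping the done frame itself is an assumption of this sketch.)
    """
    mask = []
    alive = True
    for done in dones:
        mask.append(1.0 if alive else 0.0)
        if done:
            alive = False
    return mask


def masked_return(rewards, dones):
    """Sum of rewards over the frames that survive the mask."""
    mask = mask_after_first_done(dones)
    return sum(r * m for r, m in zip(rewards, mask))
```

For example, with `dones = [False, False, True, True, True]` the mask is `[1.0, 1.0, 1.0, 0.0, 0.0]`, so only the first three rewards count toward the return.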


erikfrey commented on August 16, 2024

OK! This should be addressed. Envs now reset by default after done=True. If you wish to control auto-resetting yourself, you can still get the old behavior by calling envs.create(..., auto_reset=False).
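To illustrate the difference between the two modes, here is a small sketch with toy classes. `ToyEnv`, `ResetOnDone`, and `collect_dones` are hypothetical names invented for this example and are not brax's implementation; the sketch also assumes that, without resetting, an env keeps reporting done once its episode has ended.

```python
class ToyEnv:
    """Hypothetical single env whose episode ends after `episode_length` steps."""

    def __init__(self, episode_length):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        # Assumption: without a reset, done stays True once the episode is over.
        done = self.t >= self.episode_length
        return self.t, 1.0, done


class ResetOnDone:
    """Illustrative wrapper: resets the wrapped env as soon as it reports done."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done = self.env.step(action)
        if done:
            # Return the first observation of the next episode, keep done=True once.
            obs = self.env.reset()
        return obs, reward, done


def collect_dones(env, num_steps):
    """Step the env with a dummy action and record the done flag at each step."""
    env.reset()
    return [env.step(0)[2] for _ in range(num_steps)]
```

Under these assumptions, `collect_dones(ToyEnv(2), 5)` keeps returning True after the second step, whereas `collect_dones(ResetOnDone(ToyEnv(2)), 5)` reports True only on the step that ends each episode.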


kayuksel commented on August 16, 2024

@lebrice Thank you very much for the help!
But if they're in auto-reset mode, done[env_idx] of next episode should be False.
I checked the next value for the same env_idx, but it was still True. Is that a bug?


lebrice commented on August 16, 2024

done[env_idx] of next episode should be False.

No! (Edit: maybe I'm misunderstanding your problem though)

Are you saying that you get done[env_idx]==True multiple steps in a row for the same env_idx?


kayuksel commented on August 16, 2024

Are you saying that you get done[env_idx]==True multiple steps in a row for the same env_idx?

@lebrice Yes.
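One way to confirm the symptom discussed here is to log the done flags over a few steps and check for indices that stay True on consecutive steps. The helper below is a hypothetical diagnostic, not part of gym or brax, and it assumes the flags have been collected as a list (over time) of per-env booleans. Note that with auto-reset, a legitimate one-step episode would also show up, so a hit is a hint rather than proof of a bug.

```python
def envs_with_repeated_done(done_history):
    """Return sorted env indices whose done flag was True on two consecutive steps.

    done_history: list over time steps, each entry a list of per-env booleans.
    """
    repeated = set()
    for prev, curr in zip(done_history, done_history[1:]):
        for env_idx, (p, c) in enumerate(zip(prev, curr)):
            if p and c:
                repeated.add(env_idx)
    return sorted(repeated)
```

For instance, a history of `[[False, False], [True, False], [True, False], [False, True]]` flags env 0 only, since env 1 is done on a single step.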

