
alignment's People

Contributors

ranjaykrishna, zixianma


alignment's Issues

Continuing from the previous question

    Hi, thanks for the questions! It's expected that the end step for all episodes is 25: the max number of steps is set to 25 by default, and it can stay at 25 even if you enable early stopping when the goal is achieved. As for the difference between `test_reward` and `test_bench/step_reward`, it comes from two sources. First, the reward and benchmark loggers log things slightly differently: as far as I remember from my notes, the reward logger resets at the end of each episode, whereas the benchmark logger resets only once, at the collector's init(), so the trends can differ. Second, `test_bench/step_reward` additionally divides the episode reward by the number of steps in that episode (i.e., it is the average reward per step). Please check the code for the reward and benchmark loggers as well as `offpolicy_trainer` for your own understanding, and feel free to write your own logger for your purposes! Let me know if you have any other questions, thanks!

Originally posted by @zixianma in #3 (comment)
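To make the second difference concrete, here is a minimal sketch of how the two metrics can diverge once episode lengths vary; the variable names (`episode_rewards`, `episode_lengths`) are illustrative and are not the repository's actual logger fields:

```python
# Illustrative only: two test episodes with different lengths.
episode_rewards = [10.0, 30.0]   # total reward collected in each episode
episode_lengths = [25, 10]       # number of steps taken in each episode

# test_reward-style metric: mean of per-episode total rewards
test_reward = sum(episode_rewards) / len(episode_rewards)    # 20.0

# test_bench/step_reward-style metric: per-episode reward averaged per step
per_step = [r / n for r, n in zip(episode_rewards, episode_lengths)]
step_reward = sum(per_step) / len(per_step)                  # (0.4 + 3.0) / 2 = 1.7

print(test_reward, step_reward)
```

Because the two quantities weight episodes differently, their trends over training need not track each other.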

Thanks for your reply to the previous question. Following your suggestion, I checked the code for SimpleSpreadBenchmarkLogger and found a line that may be the key to the difference between the two metrics (i.e., test_reward and test_bench/step_reward). Here is the code:

bench_data = elem['n'][0]

Here you only add the info of the first agent (i.e., `elem['n'][0]`). However, in the default setting there are 5 agents, so the length of `elem['n']` is 5, and each element of `elem['n']` holds different info (and therefore different rewards) for a different agent. The computation of test_reward does not drop the other agents this way, so the two trends differ. Could you check whether my understanding is correct? Thanks!
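If that reading is right, one possible fix is to average the benchmark info over all agents instead of keeping only the first entry. A minimal sketch, assuming each element of `elem['n']` is a dict with a numeric 'reward' field (the real structure in the repository may differ):

```python
# Hypothetical sketch: aggregate benchmark info over all agents instead of
# keeping only the first entry. Assumes each element of elem['n'] is a dict
# with a numeric 'reward' field; the real structure may differ.
elem = {'n': [{'reward': 1.0}, {'reward': 0.5}, {'reward': 0.0},
              {'reward': 2.0}, {'reward': 1.5}]}

agent_infos = elem['n']
mean_reward = sum(info['reward'] for info in agent_infos) / len(agent_infos)
print(mean_reward)  # 1.0, averaged over all 5 agents instead of just the first
```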

Reproducing the results in the paper

I recently read the paper preprint on arXiv, and it's nice to have the official implementation here, thank you! I ran into some problems getting it up and running. For example, when reproducing the ELIGN-adv result on Predator-Prey (2v2), the code raised an error in sacd_multi_wm.py, line 307, where nonzero_obs_count contains zeros and the division yields NaN. The exact command I used is python train_multi_sacd.py --task simple_tag_in --num-good-agents 2 --num-adversaries 2 --obs-radius 0.5 --intr-rew elign_adv --epoch 100 --save-models --benchmark --logdir log/simple_tag_2_2_elign_adv_32procs --wandb-enabled --training-num 32 --test-num 32.
I'm also a little confused about the training time in the arXiv preprint: 100+ epochs × 800K episodes per epoch × 25 timesteps per episode comes to more than 2×10^9 timesteps, which seems rather large; could the authors confirm this? It would be great to have scripts that reproduce the main results, in case I made a mistake in the command-line arguments. Thank you!
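As a stopgap while waiting for an official fix, a common pattern for this failure mode is to guard the division so that agents with zero observed neighbors contribute zero instead of NaN. A minimal PyTorch sketch; the variable names mirror the issue report, not the repository's exact code at line 307:

```python
import torch

# Hypothetical guard mirroring the reported failure: `nonzero_obs_count`
# may contain zeros, and dividing by it produces NaN.
reward_sum = torch.tensor([3.0, 0.0, 5.0])
nonzero_obs_count = torch.tensor([3.0, 0.0, 2.0])

# Clamp the divisor to at least 1 so empty counts yield 0 instead of NaN.
avg_reward = reward_sum / nonzero_obs_count.clamp(min=1.0)
print(avg_reward)  # tensor([1.0000, 0.0000, 2.5000])
```

Whether zeroing out is the semantically correct behavior here depends on how the intrinsic reward is meant to treat agents with nothing in their observation radius, so this is only a workaround.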

Questions about the rewards in logger

Hi, I ran your code with three different configs and obtained three variants. Looking at the TensorBoard logs, I find that the trend of 'test_reward' differs from that of 'test_bench/step_reward'. Could you explain why? Also, the end step for every episode is 25; is that normal? Here is my TensorBoard.

[TensorBoard screenshot]

Confused about the number of episodes and steps per episode while training

Hello, I am very interested in your work on intrinsic rewards. However, I am confused that the number of training episodes and steps per episode in the paper disagrees with the code. The paper says training uses "800K episodes of 25 timesteps" and evaluation uses "1K test episodes", but the code sets "--epoch 10" and "--step-per-epoch 1000" by default. What puzzles me further is that the default code setting actually trains better. I am not sure whether something went wrong; I just want to know the correct number of episodes and steps per episode.

Thanks.
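For concreteness, here is the arithmetic behind the mismatch. Note that the mapping from --step-per-epoch to environment steps is an assumption: recent tianshou versions count collected transitions, while older ones count gradient steps, so the exact gap may differ:

```python
# Illustrative arithmetic comparing the two training budgets.
# Paper: 800K episodes of 25 timesteps each.
paper_timesteps = 800_000 * 25            # 20,000,000 environment steps

# Code defaults: --epoch 10, --step-per-epoch 1000.
# ASSUMPTION: step-per-epoch counts environment steps (tianshou-style);
# older tianshou versions count gradient steps instead, so this may be off.
code_timesteps = 10 * 1000                # 10,000 steps under that assumption

print(paper_timesteps // code_timesteps)  # 2000x gap
```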
