opendilab / DI-engine
OpenDILab Decision AI Engine. The Most Comprehensive Reinforcement Learning Framework B.P.
Home Page: https://di-engine-docs.readthedocs.io
License: Apache License 2.0
import ding, torch, sys
print(ding.__version__, torch.__version__, sys.version, sys.platform)
# v0.2.3 1.8.1 3.9.12 (main, Mar 26 2022, 15:51:15)
WARNING:root:If you want to use numba to speed up segment tree, please install numba first
Traceback (most recent call last):
File "/Users/zzhaoao/Documents/RL/New/DI-engine/dizoo/gfootball/entry/parallel/gfootball_ppo_parallel_config.py", line 102, in <module>
parallel_pipeline(config, seed=0)
File "/Users/zzhaoao/Documents/RL/New/DI-engine/ding/entry/parallel_entry.py", line 52, in parallel_pipeline
launch_coordinator(config.seed, config, learner_handle=learner_handle, collector_handle=collector_handle)
File "/Users/zzhaoao/Documents/RL/New/DI-engine/ding/entry/parallel_entry.py", line 125, in launch_coordinator
coordinator = Coordinator(config)
File "/Users/zzhaoao/Documents/RL/New/DI-engine/ding/worker/coordinator/coordinator.py", line 61, in __init__
self._exp_name = cfg.main.exp_name
AttributeError: 'EasyDict' object has no attribute 'exp_name'
WARNING:root:If you want to use numba to speed up segment tree, please install numba first
WARNING:root:If you want to use numba to speed up segment tree, please install numba first
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/local/Cellar/[email protected]/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/usr/local/Cellar/[email protected]/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
File "/usr/local/Cellar/[email protected]/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/synchronize.py", line 110, in __setstate__
self._semlock = _multiprocessing.SemLock._rebuild(*state)
FileNotFoundError: [Errno 2] No such file or directory
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/local/Cellar/[email protected]/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/usr/local/Cellar/[email protected]/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
File "/usr/local/Cellar/[email protected]/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/synchronize.py", line 110, in __setstate__
self._semlock = _multiprocessing.SemLock._rebuild(*state)
FileNotFoundError: [Errno 2] No such file or directory
Exception ignored in: <function Coordinator.__del__ at 0x14d374790>
Traceback (most recent call last):
File "/Users/zzhaoao/Documents/RL/New/DI-engine/ding/worker/coordinator/coordinator.py", line 289, in __del__
self.close()
File "/Users/zzhaoao/Documents/RL/New/DI-engine/ding/worker/coordinator/coordinator.py", line 268, in close
if self._end_flag:
AttributeError: 'Coordinator' object has no attribute '_end_flag'
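The secondary AttributeError is the classic symptom of __del__ running on a half-constructed object: __init__ raised at cfg.main.exp_name before _end_flag was ever assigned. A minimal defensive sketch of the pattern (not DI-engine's actual code):

class Coordinator:
    def __init__(self, cfg):
        self._end_flag = False              # assign before anything that can raise
        self._exp_name = cfg.main.exp_name  # the line that raised above

    def close(self):
        if getattr(self, '_end_flag', True):  # tolerate partial construction
            return
        self._end_flag = True
        # ... release workers here ...

    def __del__(self):
        self.close()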
Hi all,
Nice project. We want to start using it. After reading the docs and the config dizoo/competitive_rl/entry/cpong_dqn_default_config.py for league training, some things are still unclear to us. Do you have a channel where trivial questions can be discussed frequently, like a WeChat group or a Slack channel?
The error message is as follows:
Traceback (most recent call last):
File "/root/cityflow/my_cityflow/PPO_Continuous/cityflow_ppo_continuous_train.py", line 201, in
serial_pipeline_onpolicy([main_config, create_config], seed=0)
File "/root/cityflow/my_cityflow/PPO_Continuous/cityflow_ppo_continuous_train.py", line 193, in serial_pipeline_onpolicy
learner.train(new_data, collector.envstep)
File "/root/DI-engine/ding/worker/learner/base_learner.py", line 166, in wrapper
ret = fn(*args, **kwargs)
File "/root/DI-engine/ding/worker/learner/base_learner.py", line 203, in train
log_vars = self._policy.forward(data)
File "/root/DI-engine/ding/policy/ppo.py", line 214, in _forward_learn
ppo_loss, ppo_info = ppo_error_continuous(ppo_batch, self._clip_ratio)
File "/root/DI-engine/ding/rl_utils/ppo.py", line 181, in ppo_error_continuous
dist_new = Independent(Normal(mu_sigma_new['mu'], mu_sigma_new['sigma']), 1)
File "/opt/conda/lib/python3.6/site-packages/torch/distributions/normal.py", line 50, in init
super(Normal, self).init(batch_shape, validate_args=validate_args)
File "/opt/conda/lib/python3.6/site-packages/torch/distributions/distribution.py", line 56, in init
f"Expected parameter {param} "
ValueError: Expected parameter loc (Tensor of shape (64, 1)) of distribution Normal(loc: torch.Size([64, 1]), scale: torch.Size([64, 1])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan],
        [nan],
        ...
        [nan]], grad_fn=)
(64 identical rows of nan, elided)
The parameter configuration is as follows:
policy=dict(
    cuda=False,
    action_space='continuous',
    recompute_adv=True,
    model=dict(
        obs_shape=20,
        action_shape=1,
        action_space='continuous',
        share_encoder=True,
        encoder_hidden_size_list=[256, 64],
        actor_head_hidden_size=64,
        actor_head_layer_num=1,
        critic_head_hidden_size=64,
        critic_head_layer_num=1,
        activation=nn.ReLU(),
        norm_type=None,
        sigma_type='conditioned',
        fixed_sigma_value=0.3,
        bound_type='tanh',
    ),
    learn=dict(
        multi_gpu=False,
        epoch_per_collect=5,
        batch_size=64,
        learning_rate=3e-4,
        value_weight=0.5,
        entropy_weight=0.01,
        clip_ratio=0.2,
        adv_norm=True,
        value_norm=True,
        ignore_done=False,
        grad_clip_type='clip_norm',
        grad_clip_value=0.5,
    ),
    collect=dict(
        n_sample=640,
        unroll_len=1,
        discount_factor=0.99,
        gae_lambda=0.95,
    ),
)
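A hedged first step for localizing such NaNs, assuming access to the actor output before the distribution is built (the names output['mu'] and output['sigma'] are placeholders, not DI-engine's actual variables):

import torch

# Debug only: anomaly mode locates the op that first produces NaN/Inf,
# but slows training considerably, so remove it afterwards.
torch.autograd.set_detect_anomaly(True)

# Guard the actor outputs before constructing Normal(mu, sigma):
assert torch.isfinite(output['mu']).all(), 'actor mu contains NaN/Inf'
assert torch.isfinite(output['sigma']).all(), 'actor sigma contains NaN/Inf'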
import ding, torch, sys
print(ding.__version__, torch.__version__, sys.version, sys.platform)
If I use n_episode with the SAC policy, it raises this error: "AssertionError: n_episode/n_sample in policy cfg can't be not None at the same time".
I found that there is a default config value n_sample=1 in the SAC policy, so if I define n_episode, both config keys exist at the same time.
I suggest deleting the default n_sample from the config.
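A minimal workaround sketch under that diagnosis (plain dicts are used here for clarity; the real config is an EasyDict with the same layout):

collect_cfg = dict(n_sample=1)        # default inherited from the SAC policy config
collect_cfg['n_episode'] = 8          # the value the user actually wants
collect_cfg.pop('n_sample', None)     # drop the default so only one key remains
assert ('n_sample' in collect_cfg) != ('n_episode' in collect_cfg)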
import ding, torch, sys
print(ding.__version__, torch.__version__, sys.version, sys.platform)
We will design a brand-new interactive interface in DI-engine 1.0, including a programmatic API and CLI commands. It will support most reinforcement learning scenarios, and the rest will be implemented by our elastic atomic components.
Here are some design guidelines for the new interfaces:
Any suggestions are welcome; please leave your comments in this channel.
Hello there, I'm something of a newbie here. I am trying to reproduce some of the Atari results with R2D2, and I've been unable to. I've been blocked on this for quite some time, and it would be a great help if anyone could assist me here.
Thank you.
Hi, I want to use your code as a baseline, but I am not sure whether it is good enough to match the results in the original paper.
Could you please provide more information, such as training curves?
import ding, torch, sys
print(ding.__version__, torch.__version__, sys.version, sys.platform)
Added this issue as suggested by @PaParaZz1.
TrueSkill is a ranking metric developed by Microsoft for game matchmaking. Unlike Elo, which measures only an agent's strength, TrueSkill can measure both strength and stability. Each player starts with mu=25.000 and sigma=8.333. The former (mu) measures strength and the latter (sigma) measures stability. After receiving the payoffs of one match, mu and sigma are updated accordingly by the TrueSkill API. The final agent score can be defined as mu - 3 * sigma to take both strength and stability into consideration.
Currently this metric is missing in the league demo. It would be better to add it.
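For reference, a minimal sketch with the trueskill package (pip install trueskill), using the score definition quoted above:

import trueskill

alice, bob = trueskill.Rating(), trueskill.Rating()  # defaults: mu=25.000, sigma=8.333
alice, bob = trueskill.rate_1vs1(alice, bob)         # update both ratings after alice beats bob
score = alice.mu - 3 * alice.sigma                   # combine strength and stability
print(alice, score)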
slimevolleygym is a Pong-like physics game environment from the open-source community. It follows the standard OpenAI Gym interface. Naive PPO self-play achieves a score of -0.371 ± 1.085 in the SlimeVolley-v0 env against the built-in AI (report).
It would be good to benchmark OpenDILab's league training and see whether it can achieve better results.
from torch.distributions import Independent, Normal, TransformedDistribution
from torch.distributions.transforms import TanhTransform

dist = TransformedDistribution(Independent(Normal(mu, sigma), 1), [TanhTransform()])
next_action = dist.rsample()
next_log_prob = dist.log_prob(next_action)  # log_prob needs the sampled value
This is much easier and more importantly, more numerically stable.
2. I also recommend the practice in mbsac when creating variants of a policy. Many of the copies in configs and _init_learn are unnecessary (for example, in SQILSACPolicy).
3. There are two SAC papers, sac-v1 and sac-v2, from the same research group. I think we should link to both, since it is sac-v2 that proposes automatic entropy adjustment.
4. Maybe we can delete value_network instead of hardcoding value_network=False; it is not commonly used, and is not even used in sac-v2.
5. Maybe we can create a subdirectory for SAC instead of squeezing every variant into a single file, since SAC is a very good and commonly used baseline. If a subdirectory is not necessary, we should at least move SACPolicy to the top of the file instead of SACDiscretePolicy, which is not commonly used.
File "/root/cityflow/my_cityflow/SAC/cityflow_sac_train.py", line 177, in serial_pipeline
random_collect(cfg.policy, policy, collector, collector_env, commander, replay_buffer)
File "/root/DI-engine/ding/entry/utils.py", line 40, in random_collect
new_data = collector.collect(n_sample=policy_cfg.random_collect_size, policy_kwargs=collect_kwargs)
File "/root/DI-engine/ding/worker/collector/sample_serial_collector.py", line 251, in collect
self._obs_pool[env_id], self._policy_output_pool[env_id], timestep
File "/root/DI-engine/ding/policy/sac.py", line 453, in _process_transition
'logit': model_output['logit'],
KeyError: 'logit'
# ding version `v0.2.0`, linux platform
CPU utilization is very low: far from 100%, below 5% on average.
Steps to reproduce:
1. Clone the repo and git checkout main (currently on 0fcfdf26).
2. Run python3 dizoo/slime_volley/entry/slime_volley_selfplay_ppo_main.py.
3. Open htop to check CPU usage: only one core is occupied on a multi-core machine.
4. During training, run mpstat 3: the %idle column should be below 20%, but the current value is 97%.
import ding, torch, sys
print(ding.__version__, torch.__version__, sys.version, sys.platform)
v0.3.1 1.8.1+cpu 3.8.12 (default, Oct 12 2021, 03:01:40) [MSC v.1916 64 bit (AMD64)] win32
When running the basic example: python3 -u dizoo/classic_control/cartpole/entry/cartpole_dqn_main.py
It shows the following error.
Traceback (most recent call last):
File "C:/ProgramData/Anaconda3/envs/PYTORCH/Lib/site-packages/dizoo/classic_control/cartpole/entry/cartpole_dqn_main.py", line 91, in <module>
main(cartpole_dqn_config)
File "C:/ProgramData/Anaconda3/envs/PYTORCH/Lib/site-packages/dizoo/classic_control/cartpole/entry/cartpole_dqn_main.py", line 84, in main
evaluator = InteractionSerialEvaluator(
File "C:\ProgramData\Anaconda3\envs\PYTORCH\lib\site-packages\ding\worker\collector\interaction_serial_evaluator.py", line 56, in __init__
self.reset(policy, env)
File "C:\ProgramData\Anaconda3\envs\PYTORCH\lib\site-packages\ding\worker\collector\interaction_serial_evaluator.py", line 112, in reset
self.reset_env(_env)
File "C:\ProgramData\Anaconda3\envs\PYTORCH\lib\site-packages\ding\worker\collector\interaction_serial_evaluator.py", line 76, in reset_env
self._env.launch()
File "C:\ProgramData\Anaconda3\envs\PYTORCH\lib\site-packages\ding\envs\env_manager\base_env_manager.py", line 199, in launch
[2022-05-22 16:15:02] ERROR Env 0 reset has exceeded max retries(1) base_env_manager.py:274
self.reset(reset_param)
File "C:\ProgramData\Anaconda3\envs\PYTORCH\lib\site-packages\ding\envs\env_manager\base_env_manager.py", line 242, in reset
self._reset(env_id)
File "C:\ProgramData\Anaconda3\envs\PYTORCH\lib\site-packages\ding\envs\env_manager\base_env_manager.py", line 281, in _reset
raise runtime_error
File "C:\ProgramData\Anaconda3\envs\PYTORCH\lib\site-packages\ding\envs\env_manager\base_env_manager.py", line 259, in _reset
obs = reset_fn()
File "C:\ProgramData\Anaconda3\envs\PYTORCH\lib\site-packages\ding\envs\env_manager\base_env_manager.py", line 251, in reset_fn
return self._envs[env_id].reset(**self._reset_param[env_id])
File "C:\ProgramData\Anaconda3\envs\PYTORCH\lib\site-packages\ding\envs\env\ding_env_wrapper.py", line 68, in reset
obs = self._env.reset()
File "C:\ProgramData\Anaconda3\envs\PYTORCH\lib\site-packages\gym\wrappers\record_video.py", line 58, in reset
self.start_video_recorder()
File "C:\ProgramData\Anaconda3\envs\PYTORCH\lib\site-packages\gym\wrappers\record_video.py", line 75, in start_video_recorder
self.video_recorder.capture_frame()
File "C:\ProgramData\Anaconda3\envs\PYTORCH\lib\site-packages\gym\wrappers\monitoring\video_recorder.py", line 155, in capture_frame
self._encode_image_frame(frame)
File "C:\ProgramData\Anaconda3\envs\PYTORCH\lib\site-packages\gym\wrappers\monitoring\video_recorder.py", line 213, in _encode_image_frame
self.encoder = ImageEncoder(
File "C:\ProgramData\Anaconda3\envs\PYTORCH\lib\site-packages\gym\wrappers\monitoring\video_recorder.py", line 337, in __init__
raise error.DependencyNotInstalled(
RuntimeError: Env 0 reset has exceeded max retries(1), and the latest exception is: DependencyNotInstalled("Found neither the ffmpeg nor avconv executables. On OS X, you can install ffmpeg via `brew install ffmpeg`. On most Ubuntu variants, `sudo apt-get install ffmpeg` should do it. On Ubuntu 14.04, however, you'll need to install avconv with `sudo apt-get install libav-tools`.")
Exception ignored in: <function InteractionSerialEvaluator.__del__ at 0x0000017CA3252280>
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\envs\PYTORCH\lib\site-packages\ding\worker\collector\interaction_serial_evaluator.py", line 138, in __del__
File "C:\ProgramData\Anaconda3\envs\PYTORCH\lib\site-packages\ding\worker\collector\interaction_serial_evaluator.py", line 125, in close
AttributeError: 'InteractionSerialEvaluator' object has no attribute '_end_flag'
Process finished with exit code 1
import ding, torch, sys
print(ding.__version__, torch.__version__, sys.version, sys.platform)
Hi there, I am new to DI-engine.
I am trying to implement the pointer network for my own environment.
The most relevant resource I can find is the docs about RNNs here. It seems that I can treat the pointer network as a kind of RNN and wrap each decoding output as hidden_state. But the encoder output (the encoder is also an LSTM) is used in every decoding step as well. Can I wrap it as another hidden_state?
I noticed on Slack that a similar architecture has been implemented in DI-star.
Can you give me directions on how to make it work?
Also, I am not sure which parts of the code I should modify. It would be good if you could point me to the docs/tutorials on customizing models.
This issue is a collection of various interesting agent demonstrations trained with DI-engine; it will be updated continually.
Mario 1-1
Mario 1-2
rocket landing
SMAC 5m VS 6m
SMAC MMM
SMAC MMM2
SMAC 3s5z
lunarlander
gfootball
slime_volley
Can you provide a complete example of environment migration? I can't migrate my own environment.
import ding, torch, sys
print(ding.__version__, torch.__version__, sys.version, sys.platform)
I followed the code in the "Distributed - Async and parallel" section of the docs to implement parallelism, but I found that task.parallel_ctx in both processes always remains its original value.
I also can't find a function that changes task.parallel_ctx anywhere under ding/framework/, so I can't tell what's wrong.
I want to know how task.parallel_ctx in one process is synchronized with the ctx of another process. Thanks!
Would it be possible to get an example of training MAPPO in a sample multi-agent environment?
import ding, torch, sys
print(ding.__version__, torch.__version__, sys.version, sys.platform)
I ran the DDPG and TD3 algorithms, which use RegressionHead, and checked their initialized weights. However, head.main.1 does not seem to have been initialized properly.
It's only head.main.1; head.main.0 is initialized properly.
When I use multiple GPUs to train my model for GoBigger, this error occurs: RuntimeError: Caught RuntimeError in replica 0 on device 0
Details in opendilab/Gobigger-Explore#3
In the README, could you provide information on the observation/action spaces that are supported by each algorithm?
Specifically, do BC and GAIL implementations support image and dict observation spaces?
I would like to ask about the details of using state standardization.
The state is a one-dimensional vector composed of three feature vectors. The first feature vector has a value range of about 0-40, for example: [2, 1, 8, 12, 12, 4, 1, 2]. The range of the second feature vector is approximately 0-11, for example: [2.3, 1.4, 0.2, 0.9, 8.4, 7.1, 8.3, 9.4]. The third feature vector is one-hot, for example: [0, 0, 1, 0, 0, 0, 0, 0].
In this case, can I use ObsNormEnv directly? I don't think so.
So I would like to ask for your advice; thank you very much.
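For illustration, a per-segment normalization sketch (segment lengths of 8 are assumed from the examples above; a running normalization over the whole vector, as in ObsNormEnv, would also distort the one-hot part):

import numpy as np

def normalize_obs(obs: np.ndarray) -> np.ndarray:
    obs = obs.astype(np.float32).copy()
    obs[0:8] /= 40.0    # first feature group, range ~0-40
    obs[8:16] /= 11.0   # second feature group, range ~0-11
    # obs[16:24] is one-hot and is left untouched
    return obs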
As a follow up to #153, you guys don't need to separately support the SMAC API; you can just use the PettingZoo API since SMAC supports it and it's fairly heavily used.
Although many multi-agent RL algorithms are implemented here, there are very few application examples.
From the quick start:
from ding.config import compile_config
from ding.envs import BaseEnvManager, DingEnvWrapper
from ding.model import DQN
from ding.policy import DQNPolicy
from ding.worker import BaseLearner, SampleCollector, BaseSerialEvaluator, AdvancedReplayBuffer
from dizoo.classic_control.cartpole.config.cartpole_dqn_config import cartpole_dqn_config
# compile config
cfg = compile_config(
cartpole_dqn_config,
BaseEnvManager,
DQNPolicy,
BaseLearner,
SampleCollector,
BaseSerialEvaluator,
AdvancedReplayBuffer,
save_cfg=True
)
This raises
Traceback (most recent call last):
File "r2d2/main.py", line 7, in <module>
from ding.worker import BaseLearner, SampleCollector, BaseSerialEvaluator, AdvancedReplayBuffer
ImportError: cannot import name 'SampleCollector' from 'ding.worker' (/Users/ethanbrooks/Library/Caches/pypoetry/virtualenvs/r2d2-3EWJbHPG-py3.8/lib/python3.8/site-packages/ding/worker/__init__.py)
It appears that BaseSerialEvaluator is also no longer present in the library.
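For what it's worth, tracebacks elsewhere in this thread reference sample_serial_collector.py and InteractionSerialEvaluator, which suggests these workers were renamed; a hedged guess at the updated imports (unverified against any specific release):

from ding.worker import BaseLearner, SampleSerialCollector, \
    InteractionSerialEvaluator, AdvancedReplayBuffer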
I suppose this was meant to be an in-place function here, masked_fill_() to be exact; otherwise it wouldn't work.
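A tiny demonstration of the difference (standard PyTorch semantics):

import torch

x = torch.zeros(3)
mask = torch.tensor([True, False, True])
x.masked_fill(mask, 1.0)   # out-of-place: result is discarded, x unchanged
x.masked_fill_(mask, 1.0)  # in-place: x becomes tensor([1., 0., 1.])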
print(torch.__version__, sys.version, sys.platform)
1.11.0+cu113 3.9.7 (default, Sep 16 2021, 16:59:28) [MSC v.1916 64 bit (AMD64)] win32
I have already installed torch 1.11.0+cu113, but pip install DI-engine downloads torch-1.10.0-cp38-cp38-win_amd64.whl, which has no CUDA support.
This is inconsistent with the manual:
https://di-engine-docs.readthedocs.io/zh_CN/latest/01_quickstart/installation_zh.html
"After CUDA is installed, PyTorch with Nvidia CUDA acceleration will be fetched and installed automatically when you install DI-engine's dependencies."
After installation, this leads to the following conflicts:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchvision 0.12.0+cu113 requires torch==1.11.0, but you have torch 1.10.0 which is incompatible.
torchaudio 0.11.0+cu113 requires torch==1.11.0, but you have torch 1.10.0 which is incompatible.
tianshou 0.4.9 requires gym>=0.23.1, but you have gym 0.20.0 which is incompatible.
nbconvert 6.5.0 requires jinja2>=3.0, but you have jinja2 2.11.3 which is incompatible.
Uninstalling torch, upgrading gym, and reinstalling torch 1.11.0+cu113 leads to the following conflicts:
di-engine 0.4.0 requires gym==0.20.0, but you have gym 0.25.1 which is incompatible.
di-engine 0.4.0 requires torch<=1.10.0,>=1.1.0, but you have torch 1.11.0+cu113 which is incompatible.
DI-engine's version requirements are pinned too rigidly.
import ding, torch, sys
print(ding.__version__, torch.__version__, sys.version, sys.platform)
>>> print(ding.__version__, torch.__version__, sys.version, sys.platform)
v0.2.2 1.10.0+cu102 3.6.10 |Anaconda, Inc.| (default, Mar 25 2020, 23:51:54)
[GCC 7.3.0] linux
Following the DI-engine docs on the usage of Task and Parallel, I converted the GoBigger di-baseline training to a parallel and asynchronous form.
Training tests show that using Parallel is slower than using Task alone. What could be the reason?
with Task(async_mode=True) as task:
    task.use_step_wrapper(StepTimer(print_per_step=1))
    task.use(evalute(random_evaluator, rule_evaluator, model, task), filter_labels=["standalone", "node.1"])
    task.use(collect(epsilon_greedy, collector, replay_buffer), filter_labels=["standalone", "node.0"])
    task.use(training(cfg, learner, replay_buffer, task, model), filter_labels=["standalone", "node.0"])
    task.run(max_step=max_iterations)
import ding, torch, sys
print(ding.__version__, torch.__version__, sys.version, sys.platform)
There are many bugs in the current version (v0.3.1) of DI-engine's gfootball environment. I have tried to fix some of them, but some problems still exist that are beyond my ability, so I think the environment needs systematic maintenance and updates. As far as the code I have tested goes, only the files in dizoo/gfootball/envs/tests work well (after some bug fixes), and the fundamental features mentioned in the docs (play against the built-in AI & self-play) are basically unusable.
Besides, gfootball is an environment with great potential both in academia and in practice, so I strongly recommend adding the following features:
Thanks. I think DI-engine is an excellent framework with great potential; I hope it keeps getting better.
import ding, torch, sys
print(ding.__version__, torch.__version__, sys.version, sys.platform)
Add a league training pipeline to the slime_volleyball environment, and achieve better performance than the self-play results (#23).
Related Discussion: #61
import ding, torch, sys
print(ding.__version__, torch.__version__, sys.version, sys.platform)
My DING version is f1bf66. My PyTorch version is 1.7.1+cu101. My Python version is 3.7.11 (default, Jul 27 2021, 14:32:16) [GCC 7.5.0] on Linux.
I followed the docs' guidance to enable multi-GPU training: I added the config term config.policy.learn.multi_gpu=True in demo/simple_rl/ppo_train.py. But I get the following exception:
WARNING:root:If you want to use numba to speed up segment tree, please install numba first
pygame 1.9.6
Hello from the pygame community. https://www.pygame.org/contribute.html
[ENV] Setting seed: 0
Traceback (most recent call last):
File "imgppo_train.py", line 189, in
main(main_config)
File "imgppo_train.py", line 158, in main
policy = PPOPolicy(cfg.policy, model=model)
File "/home/qhzhang/code/DI-engine/ding/policy/base_policy.py", line 81, in init
self._init_multi_gpu_setting(model)
File "/home/qhzhang/code/DI-engine/ding/policy/base_policy.py", line 101, in _init_multi_gpu_setting
broadcast(param.data, 0)
File "/home/qhzhang/anaconda3/envs/didrive/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 859, in broadcast
_check_default_pg()
File "/home/qhzhang/anaconda3/envs/didrive/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 211, in _check_default_pg
"Default process group is not initialized"
AssertionError: Default process group is not initialized
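The traceback says the default process group was never created. A generic sketch of the missing initialization (DI-engine may ship its own launcher or helper for this, so treat the snippet as an assumption):

import torch.distributed as dist

# Assumes a launcher such as torchrun has set MASTER_ADDR, MASTER_PORT,
# RANK and WORLD_SIZE in the environment.
dist.init_process_group(backend='nccl', init_method='env://')
# ... then construct PPOPolicy with cfg.policy.learn.multi_gpu=True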
import ding, torch, sys
print(ding.__version__, torch.__version__, sys.version, sys.platform)
When I used an episode collector together with the SAC policy, it raised the following exception.
Traceback (most recent call last):
File "/home/tianhan/codes/astraea/src/train/astraea_episode_ma_sac_config.py", line 96, in <module>
serial_pipeline([main_config, create_config], seed = 9)
File "/home/tianhan/codes/astraea/third_party/DI-engine/ding/entry/serial_entry.py", line 91, in serial_pipeline
new_data = collector.collect(n_sample=cfg.policy.random_collect_size, policy_kwargs=collect_kwargs)
TypeError: collect() got an unexpected keyword argument 'n_sample'
I think the reason is that the SAC policy has a default random_collect_size, which is not compatible with episode collectors (e.g. EpisodeSerialCollector, which requires n_episode as an argument instead of n_sample).
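A hypothetical dispatch fix inside random_collect (the names follow the traceback above; the branch itself is an assumption, not DI-engine's actual code):

if policy_cfg.collect.get('n_episode') is not None:
    new_data = collector.collect(
        n_episode=policy_cfg.collect.n_episode, policy_kwargs=collect_kwargs)
else:
    new_data = collector.collect(
        n_sample=policy_cfg.random_collect_size, policy_kwargs=collect_kwargs)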
RL training can be unstable and can easily fall into a local optimum. Visualization and monitoring metrics are therefore extremely important. Assume there are 3 roles in league training (MA, ME, LE); it would be better to visualize the metrics for each of these roles over training time.
import ding, torch, sys
print(ding.__version__, torch.__version__, sys.version, sys.platform)
value_norm is used in _get_train_sample in the PPO policy, which is called from the collector's _process_timestep function. However, in parallel mode the collector doesn't have _value_norm, which is only initialized in _init_learn. This raises the exception "AttributeError: 'PPOCommandModePolicy' object has no attribute '_value_norm'".
Currently, when I use ding --help, it prints out the following:
usage: ding [-h] [--cfg CFG] [--seed SEED] [--device DEVICE]
optional arguments:
-h, --help show this help message and exit
--cfg CFG
--seed SEED
--device DEVICE
which is not the expected help information.
I searched the code and found that this behaviour is caused by this line of code; please try to fix it.
The unit test here only checks whether the file was produced successfully, but in practice image files frequently have generation issues (for example, broken styles, element rendering failures, etc.). As a result, such a test does not serve its purpose.
Here's an idea:
Test matplotlib image content in unit tests (see https://github.com/matplotlib/pytest-mpl).
Use an image-similarity unit test, such as https://github.com/Apkawa/pytest-image-diff.
In addition, files will be generated under the project directory, which could have unintended consequences for the Git workspace and may be added to the repository by later git add commands. For unit tests, please use a mocked path, such as isolated_directory in hbutils.
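A minimal pytest-mpl sketch of such a test (generate the reference image once with pytest --mpl-generate-path=baseline, then compare with pytest --mpl):

import matplotlib
matplotlib.use('Agg')  # headless backend for CI
import matplotlib.pyplot as plt
import pytest

@pytest.mark.mpl_image_compare
def test_reward_curve():
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0.1, 0.5, 0.9])
    return fig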
In reinforcement learning there is the well-known explore-exploit dilemma. In league training, better entropy-coefficient scheduling is crucial for the following reasons:
(1) If the policy's entropy drops to zero too fast, it may get stuck in a local optimum and fail to explore more states.
(2) If the policy's entropy drops too slowly, it may fail to select the right action at pivotal moments, and training is very slow.
One solution to the above problem is good scheduling. Assuming there are validation measurements we can use, such as win rate, we decrease the entropy coefficient only when the win rate is on a plateau.
This is similar to the learning-rate scheduler ReduceLROnPlateau in PyTorch (link).
Could you let us know whether there could be some documentation on how entropy scheduling can be supported?
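A minimal plateau-based scheduler sketch mirroring ReduceLROnPlateau (all names and defaults here are hypothetical, not an existing DI-engine API):

class EntropyCoefScheduler:
    def __init__(self, coef=0.01, factor=0.5, patience=10, min_coef=1e-4):
        self.coef, self.factor = coef, factor
        self.patience, self.min_coef = patience, min_coef
        self.best, self.num_bad = float('-inf'), 0

    def step(self, win_rate):
        if win_rate > self.best:
            self.best, self.num_bad = win_rate, 0
        else:
            self.num_bad += 1
        if self.num_bad >= self.patience:  # win rate has plateaued
            self.coef = max(self.coef * self.factor, self.min_coef)
            self.num_bad = 0
        return self.coef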
Hey, for all your multi-agent environments have you considered using the pettingzoo API?
Hi, when running cartpole_ppo_rnd_main.py, a bug occurs. I want to know the reason and the corresponding solution. The traceback is below; looking forward to your answer.
Traceback (most recent call last):
File "/home/jgp/.conda/envs/jgpenv/lib/python3.8/site-packages/dizoo/classic_control/cartpole/entry/cartpole_ppo_rnd_main.py", line 70, in
main(cartpole_ppo_rnd_config)
File "/home/jgp/.conda/envs/jgpenv/lib/python3.8/site-packages/dizoo/classic_control/cartpole/entry/cartpole_ppo_rnd_main.py", line 60, in main
reward_model.train()
File "/home/jgp/.conda/envs/jgpenv/lib/python3.8/site-packages/ding/reward_model/rnd_reward_model.py", line 97, in train
self._train()
File "/home/jgp/.conda/envs/jgpenv/lib/python3.8/site-packages/ding/reward_model/rnd_reward_model.py", line 81, in _train
if self.cfg.obs_norm:
AttributeError: 'EasyDict' object has no attribute 'obs_norm'
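A hedged workaround until the default config is fixed (the key name comes from the traceback; whether the RND reward model needs further companion keys is unknown):

main_config.reward_model.obs_norm = True  # config object name assumed from the entry file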
I propose that m1 (arm64) be added to do_platform_test; there are so many failed wheels with default set-ups.
Originally posted by @1733-afk in #259 (comment)
How do I run GTrXL with the PPO policy? Can someone provide an example?
Look at https://pypi.org/project/DI-engine/. It says:
The author of this package has not provided a project description
Anyone opening the PyPI page will be confused about what this package is. 😕
This information should be configured in setup.py, so just take a look at the implementation in treevalue.
After that, the content of the README will be visible on the PyPI site, as with treevalue. (Some links are down; I'm fixing them 😸)
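A sketch of the relevant setup() fields (standard setuptools usage; the exact values are placeholders):

from setuptools import setup

with open('README.md', encoding='utf-8') as f:
    long_description = f.read()

setup(
    name='DI-engine',
    description='OpenDILab Decision AI Engine',
    long_description=long_description,
    long_description_content_type='text/markdown',
)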
import ding, torch, sys
print(ding.__version__, torch.__version__, sys.version, sys.platform)
Dear all,
I just tried to customize a reward model with DI-engine, but I found that we can only get the following data (as input to the collect_data function):
My question is: how can I get more data in the reward model, such as the 'info' returned by the env?
Looking forward to a reply, and thank you.
I used to use 2> error.txt to redirect only the error messages, which leaves the other information on the terminal.
But after the logging-system refactoring, 2> error.txt redirects all of the output, including the evaluator and collector outputs and so on.
Now the checkpoints and log data are stored in two separate folders. Should we introduce a higher-level folder, named EXPERIMENT_NAME or so, to store all data of a single experiment in it?
By the way, the name format of the checkpoint folder looks quite weird to me: why are there two "_" in the middle of the date? I suggest making it "checkpoints_MODELNAME"; that is enough. It is not reasonable to write the creation time in the folder name, since the folder contains lots of checkpoints created at different times.
Hello, I'm a beginner in IRL and I want to reproduce the results of the TREX algorithm by running dizoo/mujoco/entry/mujoco_trex_main.py in the repo. But there is an ImportError: cannot import name 'serial_pipeline_trex_onpolicy' from 'ding.entry'. I think this is because there is no corresponding file named serial_entry_trex.py containing the functions serial_pipeline_trex_onpolicy and serial_pipeline_trex.
Can someone help me solve this? Thank you.