
copo's Introduction

Code for Coordinated Policy Optimization

Webpage | Code | Paper | Talk (English) | Talk (Chinese) | Results&Models

Changelog:
+ Jan 29, 2024: Fixed dependency issues.
+ Feb 19, 2023: 🎉 Uploaded the torch implementation of CoPO, compatible with ray==2.2.0.
+ Oct 22, 2022: Updated the latest experiment results, curves and models!
+ July 7, 2022: Removed redundant files; use the `train_all_*` scripts instead.
+ June 23, 2022: Added a demo script to draw population evaluation results (see the FAQ section).
+ June 22, 2022: Updated the README to include an FAQ; updated the evaluate-population script.

Hi there! This is the source code of the paper “Learning to Simulate Self-Driven Particles System with Coordinated Policy Optimization”.

Please follow the tutorial below to kick off the reproduction of our results.

🎉 Results, curves and models

Please refer to https://github.com/metadriverse/metadrive-benchmark/tree/main/MARL for the latest training and evaluation results, learning curves, models, and scripts to draw figures.

Installation

# Create virtual environment
conda create -n copo python=3.7
conda activate copo

# Install MetaDrive version 0.2.5
pip install git+https://github.com/metadriverse/metadrive.git@releases/0.2.5

# Install dependency
pip install torch  # Make sure your torch is successfully installed! Especially when using GPU!

# Install CoPO repo
git clone https://github.com/decisionforce/CoPO
cd CoPO/copo_code
pip install -e .

# For running the **torch implementation**, update ray:
# (If you are using the TF implementation, stick to ray==1.2.0 instead.)
pip install -U ray==2.2.0 "ray[rllib]==2.2.0"
pip install -U "numpy<1.24.0"
pip uninstall opencv-python
pip uninstall opencv-python-headless
pip install opencv-python==4.5.5.64
pip install pydantic==1.9.0

Please make sure MetaDrive (version 0.2.5, as above) is installed. Note that we do not support gym>=0.20.0; in setup.py we specify gym==0.19.0.
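
To quickly verify the installation, you can run a minimal sanity check like the sketch below. It assumes MetaDrive 0.2.5 exposes MultiAgentIntersectionEnv under metadrive.envs.marl_envs and follows the old 4-tuple gym step API; adjust if your version differs.

from metadrive.envs.marl_envs import MultiAgentIntersectionEnv

# Remember: each process may only host one MetaDrive instance.
env = MultiAgentIntersectionEnv(dict(num_agents=4))
try:
    obs = env.reset()
    for _ in range(10):
        # Neutral [steering, throttle] action for every currently active agent.
        actions = {agent_id: [0.0, 0.0] for agent_id in obs.keys()}
        obs, rewards, dones, infos = env.step(actions)
        if dones["__all__"]:
            obs = env.reset()
    print("MetaDrive multi-agent environment is working.")
finally:
    env.close()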

Training

+ 🎉 We have updated the torch version of our algorithms to be compatible with ray==2.2.0!

Please take a look at the scripts:

cd ~/CoPO  # Go to repo root.
python ./copo_code/copo/torch_copo/train_ccppo.py
python ./copo_code/copo/torch_copo/train_ippo.py
python ./copo_code/copo/torch_copo/train_copo.py

Note that you can even kick off training on a MacBook with this code!


(The following is the old guideline for running the TF version.)

As a quick start, you can begin training CoPO in the Intersection environment immediately after installation by running:

cd copo_code/copo/
python train_all_copo_dist.py --exp-name copo 

Please visit each training script to adjust the hyper-parameters. The general way to run training is as follows:

cd copo_code/copo/
python train_all_ALGO.py --exp-name EXPNAME

Here EXPNAME is an arbitrary name for the experiment. One experiment contains multiple concurrent trials with different random seeds or hyper-parameter sweeps. By default the experiment name is TEST. ALGO is the shorthand for an algorithm:

ippo  # Individual Policy Optimization
ccppo  # Mean Field Policy Optimization
cl  # Curriculum Learning
copo_dist  # Coordinated Policy Optimization (Ours)

You can also use GPUs via python train_all_ALGO.py --exp-name EXPNAME --num-gpus 4. By default, we run 8 trials with different seeds for one environment and one algorithm. If this overwhelms your computing resources, please take a look at the training scripts and modify them yourself (see the sketch below). If you want to inspect the code in detail, e.g. by setting breakpoints, see the FAQ section on how to run CoPO in local mode.
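
For example, here is a minimal sketch of such a modification, assuming your chosen script sweeps the environment seed with tune.grid_search as in the FAQ example below (get_rllib_compatible_env, tune and args come from the training script itself); shrinking the sweep launches fewer concurrent trials:

# Hypothetical edit inside a train_all_ALGO.py script:
config = dict(
    env=get_rllib_compatible_env(MultiAgentRoundaboutEnv),
    # Original sweep: tune.grid_search([5000, 6000, ..., 12000]) -> 8 concurrent trials.
    env_config=dict(start_seed=tune.grid_search([5000, 6000])),  # only 2 trials now
    num_gpus=0.25 if args.num_gpus != 0 else 0,
)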

Visualization

We provide trained models for all algorithms in all environments. A simple command brings up a visualization of the population behaviors!

cd copo_code/copo

# Download and unzip this file:
wget https://github.com/metadriverse/metadrive-benchmark/releases/download/asset-marl/new_best_checkpoints.zip
unzip new_best_checkpoints.zip

python new_vis.py 

# By default, we show the CoPO population in the Intersection environment.
# If you want to see others, try:
python new_vis.py --env round --algo ippo

# Or you can use the native renderer for 3D rendering:
#  Press H to show helper message
#  Press Q to switch to third-person view
python new_vis.py --env tollgate --algo cl --use_native_render

We hope you enjoy the interesting behaviors learned in this work! Please feel free to contact us if you have any questions, thanks!

There are two legacy visualization scripts, vis_from_checkpoint.py and vis.py. However, the performance of the agents they load varies greatly due to changes in the MetaDrive environment. new_vis.py instead runs the trained models from the latest benchmark: https://github.com/metadriverse/metadrive-benchmark/tree/main/MARL

Evaluation

Evaluation collects the test-time performance of your agents. We use the evaluation results to draw the radar figure with three metrics: safety, completeness and efficiency.

You can easily evaluate your trained agents via the provided script copo_code/copo/eval.py. Suppose you trained with python train_all_ippo.py --exp-name my_ippo; then you can run the evaluation via:

cd copo_code/copo/

# Training
python train_all_ippo.py --exp-name my_ippo

# Evaluating
python eval.py --root my_ippo

The evaluation results will be saved to copo_code/copo/evaluate_results. Please refer to this link for the scripts to draw figures: https://github.com/metadriverse/metadrive-benchmark/tree/main/MARL
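
If you want a quick look at the raw numbers before drawing figures, a hypothetical sketch like the one below may help. The file layout and the presence of CSV output are assumptions, so please check the evaluate_results folder and the plotting scripts in the benchmark repo for the real format; it also assumes pandas is installed and that you run it from copo_code/copo/.

import glob
import os

import pandas as pd

# Hypothetical: aggregate whatever CSV files eval.py dumped into evaluate_results.
result_files = glob.glob(os.path.join("evaluate_results", "*.csv"))
if not result_files:
    raise FileNotFoundError("No CSV files found; check the actual output format of eval.py.")
results = pd.concat([pd.read_csv(f) for f in result_files], ignore_index=True)

# Inspect whatever metric columns eval.py produced (the safety/completeness/efficiency ingredients).
print(results.columns.tolist())
print(results.describe())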

FAQ

How to run CoPO in local mode?

If you want to dive into the code and understand what is happening, you can set Ray's local mode to True, in which case all code runs in a single process so that you can easily set breakpoints and step through the code.

However, this raises a problem with the native CoPO scripts, since MetaDrive has a strict singleton requirement: each process can only host one MetaDrive instance, imposed by the simulation engine.
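
To make the constraint concrete, here is a minimal sketch. The exact point at which MetaDrive rejects the second instance, and the error message, are assumptions; the point is simply that two instances cannot live in the same process.

from metadrive.envs.marl_envs import MultiAgentIntersectionEnv

env1 = MultiAgentIntersectionEnv()
env1.reset()  # the first instance initializes the simulation engine: fine

env2 = MultiAgentIntersectionEnv()
try:
    env2.reset()  # a second instance in the same process is expected to be rejected
except Exception as e:
    print("MetaDrive singleton constraint hit:", e)
finally:
    env1.close()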

To solve this issue, we need to make several small modifications to the training scripts. Here is the procedure to set up local mode:

  1. Set config["num_workers"] = 1, indicating that you ask RLLib to set up only one process.
  2. Remove config["evaluation_config"] if any. In CoPO we don't test agents in environments other than the training environments, so this step is usually unnecessary.
  3. Remove all tune.grid_search([...]) calls by setting each config entry to a single value.
  4. Set train(..., local_mode=True).

Here is the example code for training IPPO in the roundabout environment, provided natively in the CoPO codebase:

...
config = dict(
    env=get_rllib_compatible_env(MultiAgentRoundaboutEnv),
    env_config=dict(start_seed=tune.grid_search([5000, 6000, 7000, 8000, 9000, 10000, 11000, 12000]), ),
    num_gpus=0.25 if args.num_gpus != 0 else 0,
)
train(
    IPPOTrainer,
    exp_name=exp_name,
    keep_checkpoints_num=5,
    stop=stop,
    config=get_ippo_config(config),
    num_gpus=args.num_gpus,
    num_seeds=1,
    test_mode=args.test,
    custom_callback=MultiAgentDrivingCallbacks,
    # local_mode=True
)

After the aforementioned modifications, the code becomes:

config = dict(
    env=get_rllib_compatible_env(MultiAgentRoundaboutEnv),
    env_config=dict(start_seed=5000),  # <<<=== Modified!
    num_gpus=0.25 if args.num_gpus != 0 else 0,
    num_workers=1,  # <<<=== Modified!
)
train(
    IPPOTrainer,
    exp_name=exp_name,
    keep_checkpoints_num=5,
    stop=stop,
    config=get_ippo_config(config),
    num_gpus=args.num_gpus,
    num_seeds=1,
    test_mode=args.test,
    custom_callback=MultiAgentDrivingCallbacks,
    local_mode=True  # <<<=== Modified!
)

Now you can run the training script with a debugger! Please make sure to revert those changes before launching full-scale training. Thanks!

Can I use GPU for training?

Yes. Apart from specifying python train_all_ALGO.py --num-gpus 4 to tell RLLib "I have 4 GPUs in this computer!", you can also modify the num_gpus entry WITHIN the config dict. The num_gpus inside the config dict specifies the number of GPUs each trial will consume. By default, config["num_gpus"]=0.5 means each trial uses 0.5 GPU, so if your computer has 4 GPUs and sufficient CPUs, RLLib will launch 8 concurrent trials. Note that these numbers are scheduling hints and do not reflect the true resource consumption.
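
For illustration, here is a sketch of that kind of edit, reusing the config dict from the local-mode FAQ example above (get_rllib_compatible_env, tune and the seed values come from that example; the numbers here are only an illustration):

config = dict(
    env=get_rllib_compatible_env(MultiAgentRoundaboutEnv),
    env_config=dict(start_seed=tune.grid_search([5000, 6000, 7000, 8000])),
    num_gpus=1.0,  # each trial reserves one full GPU in Ray's scheduler (not a hard limit)
)
# Launch with: python train_all_ALGO.py --exp-name EXPNAME --num-gpus 4
# so that RLLib knows 4 GPUs exist; at most 4 such trials then run concurrently.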

Opencv-python error

AttributeError: module 'cv2' has no attribute 'gapi_wip_gst_GStreamerPipeline'

Try:

pip uninstall opencv-python
pip uninstall opencv-python-headless
pip install opencv-python==4.5.5.64

Ray dashboard error

TypeError: __init_subclass__() takes no keyword arguments

Try:

pip install pydantic==1.9.0

Citation

@article{peng2021learning,
  title={Learning to Simulate Self-Driven Particles System with Coordinated Policy Optimization},
  author={Peng, Zhenghao and Hui, Ka Ming and Liu, Chunxiao and Zhou, Bolei},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}

copo's People

Contributors

pengzhenghao, quanyili, zhoubolei


copo's Issues

help wanted: ray.exceptions.RayTaskError(KeyError)

Description
I am in the process of running training.

Operating System
ubuntu 18.04

Problems
When I run the command 'python inter/train_cl.py --exp-name inter_cl', 'ray.exceptions.RayTaskError(KeyError)' and 'KeyError: step_reward' happened.
I hope to get help, thank you!

error:
Failure # 1 (occurred at 2022-04-22_09-18-36)
Traceback (most recent call last):
File "/home/behazy/anaconda3/envs/copo/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 586, in _process_trial
results = self.trial_executor.fetch_result(trial)
File "/home/behazy/anaconda3/envs/copo/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 609, in fetch_result
result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
File "/home/behazy/anaconda3/envs/copo/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 47, in wrapper
return func(*args, **kwargs)
File "/home/behazy/anaconda3/envs/copo/lib/python3.7/site-packages/ray/worker.py", line 1456, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(KeyError): ray::IPPOCL.train_buffered() (pid=55550, ip=192.168.79.142)
File "python/ray/_raylet.pyx", line 480, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 432, in ray._raylet.execute_task.function_executor
File "/home/behazy/anaconda3/envs/copo/lib/python3.7/site-packages/ray/tune/trainable.py", line 167, in train_buffered
result = self.train()
File "/home/behazy/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 526, in train
raise e
File "/home/behazy/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 515, in train
result = Trainable.train(self)
File "/home/behazy/anaconda3/envs/copo/lib/python3.7/site-packages/ray/tune/trainable.py", line 226, in train
result = self.step()
File "/home/behazy/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 148, in step
res = next(self.train_exec_impl)
File "/home/behazy/anaconda3/envs/copo/lib/python3.7/site-packages/ray/util/iter.py", line 756, in next
return next(self.built_iterator)
File "/home/behazy/anaconda3/envs/copo/lib/python3.7/site-packages/ray/util/iter.py", line 783, in apply_foreach
for item in it:
File "/home/behazy/anaconda3/envs/copo/lib/python3.7/site-packages/ray/util/iter.py", line 783, in apply_foreach
for item in it:
File "/home/behazy/anaconda3/envs/copo/lib/python3.7/site-packages/ray/util/iter.py", line 843, in apply_filter
for item in it:
File "/home/behazy/anaconda3/envs/copo/lib/python3.7/site-packages/ray/util/iter.py", line 843, in apply_filter
for item in it:
File "/home/behazy/anaconda3/envs/copo/lib/python3.7/site-packages/ray/util/iter.py", line 783, in apply_foreach
for item in it:
File "/home/behazy/anaconda3/envs/copo/lib/python3.7/site-packages/ray/util/iter.py", line 783, in apply_foreach
for item in it:
File "/home/behazy/anaconda3/envs/copo/lib/python3.7/site-packages/ray/util/iter.py", line 783, in apply_foreach
for item in it:
[Previous line repeated 1 more time]
File "/home/behazy/anaconda3/envs/copo/lib/python3.7/site-packages/ray/util/iter.py", line 876, in apply_flatten
for item in it:
File "/home/behazy/anaconda3/envs/copo/lib/python3.7/site-packages/ray/util/iter.py", line 828, in add_wait_hooks
item = next(it)
File "/home/behazy/anaconda3/envs/copo/lib/python3.7/site-packages/ray/util/iter.py", line 783, in apply_foreach
for item in it:
File "/home/behazy/anaconda3/envs/copo/lib/python3.7/site-packages/ray/util/iter.py", line 783, in apply_foreach
for item in it:
File "/home/behazy/anaconda3/envs/copo/lib/python3.7/site-packages/ray/util/iter.py", line 783, in apply_foreach
for item in it:
[Previous line repeated 1 more time]
File "/home/behazy/anaconda3/envs/copo/lib/python3.7/site-packages/ray/util/iter.py", line 471, in base_iterator
yield ray.get(futures, timeout=timeout)
File "/home/behazy/anaconda3/envs/copo/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 47, in wrapper
return func(*args, **kwargs)
ray.exceptions.RayTaskError(KeyError): ray::RolloutWorker.par_iter_next() (pid=55549, ip=192.168.79.142)
File "python/ray/_raylet.pyx", line 480, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 432, in ray._raylet.execute_task.function_executor
File "/home/behazy/anaconda3/envs/copo/lib/python3.7/site-packages/ray/util/iter.py", line 1152, in par_iter_next
return next(self.local_it)
File "/home/behazy/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 327, in gen_rollouts
yield self.sample()
File "/home/behazy/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 662, in sample
batches = [self.input_reader.next()]
File "/home/behazy/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 95, in next
batches = [self.get_data()]
File "/home/behazy/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 224, in get_data
item = next(self.rollout_provider)
File "/home/behazy/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 620, in _env_runner
sample_collector=sample_collector,
File "/home/behazy/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 1124, in _process_observations_w_trajectory_view_api
env_index=env_id)
File "/home/behazy/CoPO/copo_code/copo/callbacks.py", line 41, in on_episode_step
episode.user_data["step_reward"][k].append(info["step_reward"])
KeyError: 'step_reward'

Why can't I find the module "metadrive.constants"?

Hello, Mr. Peng,
I ran your official code with gym==0.19.0 and ray==2.2.0, and downloaded MetaDrive with the "git clone" command.

However, there is an error:

Successfully registered the following environments: ['MetaDrive-validation-v0', 'MetaDrive-10env-v0', 'MetaDrive-100envs-v0', 'MetaDrive-1000envs-v0', 'SafeMetaDrive-validation-v0', 'SafeMetaDrive-10env-v0', 'SafeMetaDrive-100envs-v0', 'SafeMetaDrive-1000envs-v0', 'MARLTollgate-v0', 'MARLBottleneck-v0', 'MARLRoundabout-v0', 'MARLIntersection-v0', 'MARLParkingLot-v0', 'MARLMetaDrive-v0'].
Successfully initialize Ray!
Available resources:  {'object_store_memory': 8912057548.0, 'accelerator_type:G': 1.0, 'memory': 17824115099.0, 'node:192.168.1.104': 1.0, 'CPU': 96.0}
== Status ==
Current time: 2023-03-10 23:34:46 (running for 00:00:00.41)
Memory usage on this node: 9.2/31.0 GiB 
Using FIFO scheduling algorithm.
Resources requested: 1.9999999999999998/96 CPUs, 0/0 GPUs, 0.0/16.6 GiB heap, 0.0/8.3 GiB objects (0.0/1.0 accelerator_type:G)
Result logdir: /home/ps/Documents/Projects/CoPO/TEST
Number of trials: 5/5 (4 PENDING, 1 RUNNING)
+---------------------------------------------------+----------+-------+----------------------+--------+-----------------+
| Trial name                                        | status   | loc   | env                  |   seed |   vf_clip_param |
|---------------------------------------------------+----------+-------+----------------------+--------+-----------------|
| IPPOTrainer_MultiAgentIntersectionEnv_15c85_00000 | RUNNING  |       | MultiAgentInter_50d0 |      0 |              10 |
| IPPOTrainer_MultiAgentIntersectionEnv_15c85_00001 | PENDING  |       | MultiAgentInter_50d0 |      0 |              20 |
| IPPOTrainer_MultiAgentIntersectionEnv_15c85_00002 | PENDING  |       | MultiAgentInter_50d0 |      0 |              50 |
| IPPOTrainer_MultiAgentIntersectionEnv_15c85_00003 | PENDING  |       | MultiAgentInter_50d0 |      0 |             100 |
| IPPOTrainer_MultiAgentIntersectionEnv_15c85_00004 | PENDING  |       | MultiAgentInter_50d0 |      0 |            1000 |
+---------------------------------------------------+----------+-------+----------------------+--------+-----------------+


2023-03-10 23:34:50,656 ERROR trial_runner.py:1088 -- Trial IPPOTrainer_MultiAgentIntersectionEnv_15c85_00000: Error processing event.
ray.tune.error._TuneNoNextExecutorEventError: Traceback (most recent call last):
  File "/home/ps/miniconda3/envs/copo/lib/python3.7/site-packages/ray/tune/execution/ray_trial_executor.py", line 1070, in get_next_executor_event
    future_result = ray.get(ready_future)
  File "/home/ps/miniconda3/envs/copo/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/ps/miniconda3/envs/copo/lib/python3.7/site-packages/ray/_private/worker.py", line 2311, in get
    raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::IPPOTrainer.__init__() (pid=12557, ip=192.168.1.104, repr=IPPOTrainer)
  File "/home/ps/miniconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/algorithms/algorithm.py", line 368, in __init__
    config.validate()
  File "/home/ps/Documents/Projects/CoPO/copo_code/copo/torch_copo/algo_ippo.py", line 50, in validate
    from metadrive.constants import DEFAULT_AGENT
ModuleNotFoundError: No module named 'metadrive.constants'

Hope you can help me to solve this issue, sincerely thanks :)

visualization problem

When I have my checkpoint folder, how can I visualize it? When I run vis_from_checkpoint.py directly, something goes wrong. I know the checkpoint needs to be processed, but I don't know how to do it. Hope you can give me detailed hints or examples. Thanks a lot.

torch_copo error

~/CoPO/copo_code/copo/torch_copo$ python train_copo.py
2023-04-11 16:53:27.584910: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/yl-02/.mujoco/mujoco200/bin
2023-04-11 16:53:27.584930: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
/home/yl-02/anaconda3/envs/didrive/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:513: DeprecationWarning: np.object is a deprecated alias for the builtin object. To silence this warning, use object by itself. Doing this will not modify any behavior and is safe.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
np.object,
/home/yl-02/anaconda3/envs/didrive/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:521: DeprecationWarning: np.bool is a deprecated alias for the builtin bool. To silence this warning, use bool by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.bool_ here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
np.bool,
/home/yl-02/anaconda3/envs/didrive/lib/python3.7/site-packages/tensorflow/python/framework/tensor_util.py:107: DeprecationWarning: np.object is a deprecated alias for the builtin object. To silence this warning, use object by itself. Doing this will not modify any behavior and is safe.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
np.object:
/home/yl-02/anaconda3/envs/didrive/lib/python3.7/site-packages/tensorflow/python/framework/tensor_util.py:109: DeprecationWarning: np.bool is a deprecated alias for the builtin bool. To silence this warning, use bool by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.bool_ here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
np.bool:
/home/yl-02/anaconda3/envs/didrive/lib/python3.7/site-packages/tensorflow/python/autograph/utils/testing.py:21: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
/home/yl-02/anaconda3/envs/didrive/lib/python3.7/site-packages/h5py/init.py:46: DeprecationWarning: np.typeDict is a deprecated alias for np.sctypeDict.
from ._conv import register_converters as _register_converters
Successfully registered the following environments: ['MetaDrive-validation-v0', 'MetaDrive-10env-v0', 'MetaDrive-100envs-v0', 'MetaDrive-1000envs-v0', 'SafeMetaDrive-validation-v0', 'SafeMetaDrive-10env-v0', 'SafeMetaDrive-100envs-v0', 'SafeMetaDrive-1000envs-v0', 'MARLTollgate-v0', 'MARLBottleneck-v0', 'MARLRoundabout-v0', 'MARLIntersection-v0', 'MARLParkingLot-v0', 'MARLMetaDrive-v0'].
/home/yl-02/anaconda3/envs/didrive/lib/python3.7/site-packages/tensorflow_probability/python/internal/backend/numpy/dtype.py:82: DeprecationWarning: np.bool is a deprecated alias for the builtin bool. To silence this warning, use bool by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.bool_ here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
bool = np.bool # pylint: disable=redefined-builtin
/home/yl-02/anaconda3/envs/didrive/lib/python3.7/site-packages/tensorflow_probability/python/internal/backend/numpy/dtype.py:112: DeprecationWarning: np.str is a deprecated alias for the builtin str. To silence this warning, use str by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.str_ here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
string = getattr(np, 'str', getattr(np, 'string', None))
/home/yl-02/anaconda3/envs/didrive/lib/python3.7/site-packages/tensorflow_probability/python/mcmc/sample_halton_sequence.py:373: DeprecationWarning: np.bool is a deprecated alias for the builtin bool. To silence this warning, use bool by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.bool_ here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
sieve = np.ones(n // 3 + (n % 6 == 2), dtype=np.bool)
2023-04-11 16:53:30,752 ERROR services.py:1195 -- Failed to start the dashboard: Failed to start the dashboard, return code 1
The last 10 lines of /tmp/ray/session_2023-04-11_16-53-29_150060_3096856/logs/dashboard.log:
File "", line 677, in _load_unlocked
File "", line 728, in exec_module
File "", line 219, in _call_with_frames_removed
File "/home/yl-02/anaconda3/envs/didrive/lib/python3.7/site-packages/ray/dashboard/modules/snapshot/snapshot_head.py", line 40, in
class RayActivityResponse(BaseModel, extra=Extra.allow):
File "pydantic/main.py", line 309, in pydantic.main.ModelMetaclass.new
File "/home/yl-02/anaconda3/envs/didrive/lib/python3.7/abc.py", line 126, in new
cls = super().new(mcls, name, bases, namespace, **kwargs)
TypeError: __init_subclass__() takes no keyword arguments
2023-04-11 16:53:30,752 ERROR services.py:1196 -- Failed to start the dashboard, return code 1
The last 10 lines of /tmp/ray/session_2023-04-11_16-53-29_150060_3096856/logs/dashboard.log:
File "", line 677, in _load_unlocked
File "", line 728, in exec_module
File "", line 219, in _call_with_frames_removed
File "/home/yl-02/anaconda3/envs/didrive/lib/python3.7/site-packages/ray/dashboard/modules/snapshot/snapshot_head.py", line 40, in
class RayActivityResponse(BaseModel, extra=Extra.allow):
File "pydantic/main.py", line 309, in pydantic.main.ModelMetaclass.new
File "/home/yl-02/anaconda3/envs/didrive/lib/python3.7/abc.py", line 126, in new
cls = super().new(mcls, name, bases, namespace, **kwargs)
TypeError: __init_subclass__() takes no keyword arguments
Traceback (most recent call last):
File "/home/yl-02/anaconda3/envs/didrive/lib/python3.7/site-packages/ray/_private/services.py", line 1181, in start_api_server
raise Exception(err_msg + last_log_str)
Exception: Failed to start the dashboard, return code 1
The last 10 lines of /tmp/ray/session_2023-04-11_16-53-29_150060_3096856/logs/dashboard.log:
File "", line 677, in _load_unlocked
File "", line 728, in exec_module
File "", line 219, in _call_with_frames_removed
File "/home/yl-02/anaconda3/envs/didrive/lib/python3.7/site-packages/ray/dashboard/modules/snapshot/snapshot_head.py", line 40, in
class RayActivityResponse(BaseModel, extra=Extra.allow):
File "pydantic/main.py", line 309, in pydantic.main.ModelMetaclass.new
File "/home/yl-02/anaconda3/envs/didrive/lib/python3.7/abc.py", line 126, in new
cls = super().new(mcls, name, bases, namespace, **kwargs)
TypeError: __init_subclass__() takes no keyword arguments
Successfully initialize Ray!
Available resources: {'object_store_memory': 4120841011.0, 'node:192.168.50.150': 1.0, 'CPU': 16.0, 'accelerator_type:G': 1.0, 'memory': 8241682023.0}
/home/yl-02/anaconda3/envs/didrive/lib/python3.7/site-packages/gym/spaces/box.py:84: UserWarning: WARN: Box bound precision lowered by casting to float32
logger.warn(f"Box bound precision lowered by casting to {self.dtype}")

When I run it, it hits this error.

Run in CARLA?

What a great job! I want to ask: can CoPO work in the CARLA simulator?

AssertionError in visualizing trained models.

Hi! Thanks for the excellent work!
After I used train_all_copo_dist.py to train the model and tried to visualize it, I found that the dimensions of OBS and Weights did not match; the same problem even occurs with the <new_best_checkpoints>. This happened in every environment I trained, and the observation dimension (obs.shape[1]) is always 1 less than the input dimension of the weights (weights.shape[0]).
But I can visualize the <new_best_checkpoints> with new_vis.py, which is weird.

Traceback (most recent call last):
  File "/home/CoPO_tf/CoPO/copo_code/copo/vis_from_checkpoint.py", line 91, in <module>
    action = policy_function(o, d)
  File "/home/CoPO_tf/CoPO/copo_code/copo/eval/get_policy_function.py", line 170, in __call__
    actions = self.policy(obs_to_be_eval)
  File "/home/CoPO_tf/CoPO/copo_code/copo/vis_from_checkpoint.py", line 72, in policy
    ret = policy_class(weights, obs, policy_name=policy_name, layer_name_suffix="_1", deterministic=deterministic)
  File "/home/CoPO_tf/CoPO/copo_code/copo/eval/get_policy_function.py", line 61, in _compute_actions_for_tf_policy
    assert obs.shape[1] == weights[s].shape[0], (obs.shape, weights[s].shape)
AssertionError: ((20, 96), (97, 256))

Agent ID KeyError

If anyone else is facing this same issue: I tried reinstalling MetaDrive to the latest version, but the latest one requires some dependencies incompatible with the older Python version.

Failure # 1 (occurred at 2024-04-28_19-08-33)
ray::CoPOTrainer.train() (pid=517814, ip=130.127.106.66, repr=CoPOTrainer)
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/ray/tune/trainable/trainable.py", line 367, in train
    raise skipped from exception_cause(skipped)
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/ray/tune/trainable/trainable.py", line 364, in train
    result = self.step()
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/algorithms/algorithm.py", line 749, in step
    results, train_iter_ctx = self._run_one_training_iteration()
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/algorithms/algorithm.py", line 2623, in _run_one_training_iteration
    results = self.training_step()
  File "/home/abastol/working/cleanCOPO/algo_copo.py", line 534, in training_step
    worker_set=self.workers, max_env_steps=self.config.train_batch_size
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/execution/rollout_ops.py", line 86, in synchronous_parallel_sample
    lambda w: w.sample(), local_worker=False, healthy_only=True
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 696, in foreach_worker
    handle_remote_call_result_errors(remote_results, self._ignore_worker_failures)
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 73, in handle_remote_call_result_errors
    raise r.get()
ray.exceptions.RayTaskError(KeyError): ray::RolloutWorker.apply() (pid=517895, ip=130.127.106.66, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7af041348c10>)
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/utils/actor_manager.py", line 183, in apply
    raise e
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/utils/actor_manager.py", line 174, in apply
    return func(self, *args, **kwargs)
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/execution/rollout_ops.py", line 86, in <lambda>
    lambda w: w.sample(), local_worker=False, healthy_only=True
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 900, in sample
    batches = [self.input_reader.next()]
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 92, in next
    batches = [self.get_data()]
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 285, in get_data
    item = next(self._env_runner)
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 721, in _env_runner
    base_env.send_actions(actions_to_send)
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/env/multi_agent_env.py", line 615, in send_actions
    raise e
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/env/multi_agent_env.py", line 608, in send_actions
    obs, rewards, dones, infos = env.step(agent_dict)
  File "/home/abastol/working/cleanCOPO/utils/env_wrappers.py", line 309, in step
    o, r, d, i = super(LCFEnv, self).step(actions)
  File "/home/abastol/working/cleanCOPO/utils/env_wrappers.py", line 96, in step
    self._update_distance_map(dones=d)
  File "/home/abastol/working/cleanCOPO/utils/env_wrappers.py", line 143, in _update_distance_map
    if hasattr(self, "vehicles_including_just_terminated"):
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/metadrive/envs/base_env.py", line 490, in vehicles_including_just_terminated
    ret.update(self.agent_manager.just_terminated_agents)
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/metadrive/manager/agent_manager.py", line 276, in just_terminated_agents
    for agent_name, v_name in self._agents_finished_this_frame.items()
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/metadrive/manager/agent_manager.py", line 276, in <dictcomp>
    for agent_name, v_name in self._agents_finished_this_frame.items()
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/metadrive/manager/agent_manager.py", line 289, in get_agent
    object_name = self.agent_to_object(agent_name)
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/metadrive/manager/agent_manager.py", line 316, in agent_to_object
    return self._agent_to_object[agent_id]
KeyError: 'agent15'
Failure # 2 (occurred at 2024-04-28_19-09-13)
ray::CoPOTrainer.train() (pid=533950, ip=130.127.106.66, repr=CoPOTrainer)
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/ray/tune/trainable/trainable.py", line 367, in train
    raise skipped from exception_cause(skipped)
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/ray/tune/trainable/trainable.py", line 364, in train
    result = self.step()
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/algorithms/algorithm.py", line 749, in step
    results, train_iter_ctx = self._run_one_training_iteration()
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/algorithms/algorithm.py", line 2623, in _run_one_training_iteration
    results = self.training_step()
  File "/home/abastol/working/cleanCOPO/algo_copo.py", line 534, in training_step
    worker_set=self.workers, max_env_steps=self.config.train_batch_size
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/execution/rollout_ops.py", line 86, in synchronous_parallel_sample
    lambda w: w.sample(), local_worker=False, healthy_only=True
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 696, in foreach_worker
    handle_remote_call_result_errors(remote_results, self._ignore_worker_failures)
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 73, in handle_remote_call_result_errors
    raise r.get()
ray.exceptions.RayTaskError(KeyError): ray::RolloutWorker.apply() (pid=534063, ip=130.127.106.66, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x72f28a442f10>)
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/utils/actor_manager.py", line 183, in apply
    raise e
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/utils/actor_manager.py", line 174, in apply
    return func(self, *args, **kwargs)
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/execution/rollout_ops.py", line 86, in <lambda>
    lambda w: w.sample(), local_worker=False, healthy_only=True
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 900, in sample
    batches = [self.input_reader.next()]
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 92, in next
    batches = [self.get_data()]
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 285, in get_data
    item = next(self._env_runner)
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 721, in _env_runner
    base_env.send_actions(actions_to_send)
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/env/multi_agent_env.py", line 615, in send_actions
    raise e
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/env/multi_agent_env.py", line 608, in send_actions
    obs, rewards, dones, infos = env.step(agent_dict)
  File "/home/abastol/working/cleanCOPO/utils/env_wrappers.py", line 309, in step
    o, r, d, i = super(LCFEnv, self).step(actions)
  File "/home/abastol/working/cleanCOPO/utils/env_wrappers.py", line 96, in step
    self._update_distance_map(dones=d)
  File "/home/abastol/working/cleanCOPO/utils/env_wrappers.py", line 143, in _update_distance_map
    if hasattr(self, "vehicles_including_just_terminated"):
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/metadrive/envs/base_env.py", line 490, in vehicles_including_just_terminated
    ret.update(self.agent_manager.just_terminated_agents)
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/metadrive/manager/agent_manager.py", line 276, in just_terminated_agents
    for agent_name, v_name in self._agents_finished_this_frame.items()
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/metadrive/manager/agent_manager.py", line 276, in <dictcomp>
    for agent_name, v_name in self._agents_finished_this_frame.items()
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/metadrive/manager/agent_manager.py", line 289, in get_agent
    object_name = self.agent_to_object(agent_name)
  File "/home/abastol/modules/anaconda3/envs/copo/lib/python3.7/site-packages/metadrive/manager/agent_manager.py", line 316, in agent_to_object
    return self._agent_to_object[agent_id]
KeyError: 'agent29'

Eval error

Hi! I got this error when trying to run evaluate_population.py. How does this happen? (screenshot of the error attached)

Relationship between networks

Please explain the relationship between the policy network, the individual value network, the neighborhood value network, and the global value network, and how parameters are transferred between them.

Visualizing when training

Thanks for your code and contribution! It's so great!
Then, I have some questions about visualization and local mode.

First, is it possible to visualize the training process?
Although it might cost a lot of memory and efficiency, I prefer to check my scenes while training the model.

Secondly, I'm not sure I understand what you mean by local mode. Does it mean I can solve the first question with it? Would you mind explaining in more detail?

Sorry to bother you.
Thanks a lot!!

Code implementation

Hello, your code is implemented with ray and tf. Can you please provide a copy with pytorch? Thanks.

When training multi-agent (train_all_copo_dist.py), the observation type is not a Dictionary but a Box

When executing 'train_all_copo_dist.py', I was watching the process of generating training data.
In this project, because the observation is multi-agent, I think the observation space should be a gym Dict (as written in the annotation in 'metadrive/manager/agent_manager.py'),
but I found that the generated obs_space is a Box type,
and it executed normally (no errors).
(screenshot attached)

How can I fix this problem? Or is this not a problem?

What is the version of "ray" for the torch CoPO?

I notice that "ray.rllib.algorithms" is imported in the torch version, but the version of ray in "setup.py" is still 1.2.0.

Should I upgrade ray to version 2.x, or modify the functions in torch CoPO to adapt to ray 1.2.0?

Thanks.

TypeError: 'str' object is not callable

(copo) user@user-virtual-machine:~/CoPO/copo_code/copo$ python vis.py
Successfully registered the following environments: ['MetaDrive-validation-v0', 'MetaDrive-10env-v0', 'MetaDrive-100envs-v0', 'MetaDrive-1000envs-v0', 'SafeMetaDrive-validation-v0', 'SafeMetaDrive-10env-v0', 'SafeMetaDrive-100envs-v0', 'SafeMetaDrive-1000envs-v0', 'MARLTollgate-v0', 'MARLBottleneck-v0', 'MARLRoundabout-v0', 'MARLIntersection-v0', 'MARLParkingLot-v0', 'MARLMetaDrive-v0'].
Traceback (most recent call last):
File "vis.py", line 49, in
action = policy_function(o, d)
File "/home/user/CoPO/copo_code/copo/eval/get_policy_function.py", line 153, in call
actions = self.policy(obs_to_be_eval)
TypeError: 'str' object is not callable

conflict dependencies

The conflict is caused by:
copo 0.0.0 depends on gym==0.19.0
ray[rllib] 2.2.0 depends on gym<0.24.0 and >=0.21.0; extra == "rllib"

Some Visualization Issues

Hello, I am very interested in the CoPO project!
But at the moment I have some problems, I hope you can clear my confusion, thanks!

  1. The following error is prompted when running vis_from_checkpoint.py (screenshot attached).
    My path points to checkpoint-480 as shown (screenshot attached). What is the cause of the error? Am I running the script the wrong way?

  2. I don't understand how the .npz files in the best_checkpoints folder are generated (screenshot attached).

  3. You declare the checkpoint format as checkpoint-xxx in vis_from_checkpoint.py, but declare checkpoint_name as {ALGO}_{ENV}_{INDEX}.npz in get_policy_function.py (screenshots attached).
    Which way should I follow? Do I need to convert checkpoint-xxx files to .npz files? How do I convert them?

  4. What does "Note that if you are restoring CoPO checkpoint, you need to implement appropriate wrapper to encode the LCF into the observation and feed them to the neural network." in vis_from_checkpoint.py mean (screenshot attached)?

  5. How is the following visualization made (screenshot attached)? The vehicle trajectories and collision locations are visually displayed, which is great!

Very much looking forward to your reply! Thank you for taking the time to answer these questions!

Running on a remote server?

Hello, I want to run the visualization program python vis.py on a remote server, but the following problem occurs.
(screenshot attached)
Can the visualization program be started on a remote server?

Fail to restore data from checkpoint

Hi Zhenghao,
I trained the intersection using torch copo (train_copo.py) and tried to evaluate the performance using copo_code/new_vis.py.
However, it gives TypeError: an integer is required (got type bytes) at unpickled = pickle.loads(data).
I used the file that was stored in TEST/CoPOTraininger_Multi.../checkpoint_000440/algorithm_state.pkl. Did I use the wrong pkl file?

Originally posted by @XilunZhangRobo in #25 (comment)

Installation issues caused by gym's version.

I have installed gym==0.19.0 and there is an error about the conflict between copo==0.0.0 and ray[rllib]==2.2.0.
The error just is like this:
"
The conflict is caused by:
copo 0.0.0 depends on gym==0.19.0
ray[rllib] 2.2.0 depends on gym<0.24.0 and >=0.21.0; extra == "rllib"
"
(screenshot attached)

I have tried uninstalling ray==2.2.0 but the error persists... How can I solve this problem?

Visualize PGMap

Hello, I reproduced your code. In addition to the five scenarios in the paper, there is also a PGMap scenario; its success rate is very high while the other scenarios have a very low success rate, so I want to visualize PGMap.
The model I trained has been converted into .npz according to the previous requirements, and the five scenarios in the paper can be visualized normally apart from the low success rate (which may mean they are not trained well). But visualizing PGMap gives a success rate of 0! It collides halfway every time, which does not match the training success rate of 0.8.

(screenshot from 2022-10-16 14-13-57 attached)

I added the PGMap scene to the vis.py file

It also prompts that the variable meta_svo_lookup_table is required. Noting that it consists of a mean and a std, I found them in progress.csv and added the two variables.
I would like to ask which step is wrong, or what needs to be added, to make the success rate normal.

(screenshots from 2022-10-16 attached)

How can I reproduce experimental results?

Hello, I am very impressed with the CoPO project. Thank you for sharing a great paper and code.
I wanted to see the trained multi-agent policies, so I visualized them using the weights stored in copo_code/copo/best_checkpoint/ and the copo_code/vis.py file (without any modifications).
However, unlike in the paper, the rendered agents had lower performance (a lower success rate).
How should I modify the code to see agents performing as well as in your paper?
I look forward to your reply. Thank you.

When I run train_copo, I get an error during the run

2023-06-19 21:22:48,277 ERROR trial_runner.py:1088 -- Trial CoPOTrainer_MultiAgentIntersectionEnv_0e9dd_00000: Error processing event.
ray.exceptions.RayTaskError(AttributeError): ray::CoPOTrainer.save() (pid=22984, ip=127.0.0.1, repr=CoPOTrainer)
File "python\ray_raylet.pyx", line 830, in ray._raylet.execute_task
File "python\ray_raylet.pyx", line 834, in ray._raylet.execute_task
File "python\ray_raylet.pyx", line 780, in ray._raylet.execute_task.function_executor
File "D:\Anaconda\envs\copo\lib\site-packages\ray_private\function_manager.py", line 674, in actor_method_executor
return method(__ray_actor, *args, **kwargs)
File "D:\Anaconda\envs\copo\lib\site-packages\ray\util\tracing\tracing_helper.py", line 466, in _resume_span
return method(self, *_args, **_kwargs)
File "D:\Anaconda\envs\copo\lib\site-packages\ray\tune\trainable\trainable.py", line 473, in save
checkpoint_dict_or_path = self.save_checkpoint(checkpoint_dir)
File "D:\Anaconda\envs\copo\lib\site-packages\ray\util\tracing\tracing_helper.py", line 466, in _resume_span
return method(self, *_args, **_kwargs)
File "D:\Anaconda\envs\copo\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 1959, in save_checkpoint
state = self.getstate()
File "D:\Anaconda\envs\copo\lib\site-packages\ray\util\tracing\tracing_helper.py", line 466, in _resume_span
return method(self, *_args, **_kwargs)
File "D:\Anaconda\envs\copo\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 2382, in getstate
state["worker"] = self.workers.local_worker().get_state()
File "D:\Anaconda\envs\copo\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 1602, in get_state
policy_states[pid] = self.policy_map[pid].get_state()
File "D:\Anaconda\envs\copo\lib\site-packages\ray\rllib\policy\torch_mixins.py", line 98, in get_state
state = super().get_state()
File "D:\Anaconda\envs\copo\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 907, in get_state
state = super().get_state()
File "D:\Anaconda\envs\copo\lib\site-packages\ray\rllib\policy\policy.py", line 917, in get_state
state["policy_spec"] = policy_spec.serialize()
File "D:\Anaconda\envs\copo\lib\site-packages\ray\rllib\policy\policy.py", line 133, in serialize
"action_space": space_to_dict(self.action_space),
File "D:\Anaconda\envs\copo\lib\site-packages\ray\rllib\utils\serialization.py", line 140, in space_to_dict
d = {"space": gym_space_to_dict(space)}
File "D:\Anaconda\envs\copo\lib\site-packages\ray\rllib\utils\serialization.py", line 119, in gym_space_to_dict
return _box(space)
File "D:\Anaconda\envs\copo\lib\site-packages\ray\rllib\utils\serialization.py", line 62, in _box
"shape": sp._shape, # shape is a tuple.
AttributeError: 'Box' object has no attribute '_shape'

Why does this error occur, and how can I correct it?

Torch CoPO error seems to occur when an agent is done

The following error occurred after training for about 80 epochs; it seems to happen because some agent is done and removed from the dict.
My MetaDrive version is 0.2.5. Could someone kindly give me some suggestions on this issue?

Failure # 1 (occurred at 2023-07-10_17-02-34)
ray::CoPOTrainer.train() (pid=356776, ip=10.0.0.10, repr=CoPOTrainer)
  File "/home/weirme/miniconda3/envs/copo/lib/python3.7/site-packages/ray/tune/trainable/trainable.py", line 367, in train
    raise skipped from exception_cause(skipped)
  File "/home/weirme/miniconda3/envs/copo/lib/python3.7/site-packages/ray/tune/trainable/trainable.py", line 364, in train
    result = self.step()
  File "/home/weirme/miniconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/algorithms/algorithm.py", line 749, in step
    results, train_iter_ctx = self._run_one_training_iteration()
  File "/home/weirme/miniconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/algorithms/algorithm.py", line 2623, in _run_one_training_iteration
    results = self.training_step()
  File "/home/weirme/Repo/CoPO/copo_code/copo/torch_copo/algo_copo.py", line 524, in training_step
    worker_set=self.workers, max_env_steps=self.config.train_batch_size
  File "/home/weirme/miniconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/execution/rollout_ops.py", line 86, in synchronous_parallel_sample
    lambda w: w.sample(), local_worker=False, healthy_only=True
  File "/home/weirme/miniconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 696, in foreach_worker
    handle_remote_call_result_errors(remote_results, self._ignore_worker_failures)
  File "/home/weirme/miniconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 73, in handle_remote_call_result_errors
    raise r.get()
ray.exceptions.RayTaskError(KeyError): ray::RolloutWorker.apply() (pid=356826, ip=10.0.0.10, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f0627d6c710>)
  File "/home/weirme/miniconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/utils/actor_manager.py", line 183, in apply
    raise e
  File "/home/weirme/miniconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/utils/actor_manager.py", line 174, in apply
    return func(self, *args, **kwargs)
  File "/home/weirme/miniconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/execution/rollout_ops.py", line 86, in <lambda>
    lambda w: w.sample(), local_worker=False, healthy_only=True
  File "/home/weirme/miniconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 900, in sample
    batches = [self.input_reader.next()]
  File "/home/weirme/miniconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 92, in next
    batches = [self.get_data()]
  File "/home/weirme/miniconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 285, in get_data
    item = next(self._env_runner)
  File "/home/weirme/miniconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 721, in _env_runner
    base_env.send_actions(actions_to_send)
  File "/home/weirme/miniconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/env/multi_agent_env.py", line 615, in send_actions
    raise e
  File "/home/weirme/miniconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/env/multi_agent_env.py", line 608, in send_actions
    obs, rewards, dones, infos = env.step(agent_dict)
  File "/home/weirme/Repo/CoPO/copo_code/copo/torch_copo/utils/env_wrappers.py", line 309, in step
    o, r, d, i = super(LCFEnv, self).step(actions)
  File "/home/weirme/Repo/CoPO/copo_code/copo/torch_copo/utils/env_wrappers.py", line 96, in step
    self._update_distance_map(dones=d)
  File "/home/weirme/Repo/CoPO/copo_code/copo/torch_copo/utils/env_wrappers.py", line 143, in _update_distance_map
    if hasattr(self, "vehicles_including_just_terminated"):
  File "/home/weirme/Repo/metadrive/metadrive/envs/base_env.py", line 490, in vehicles_including_just_terminated
    ret.update(self.agent_manager.just_terminated_agents)
  File "/home/weirme/Repo/metadrive/metadrive/manager/agent_manager.py", line 276, in just_terminated_agents
    for agent_name, v_name in self._agents_finished_this_frame.items()
  File "/home/weirme/Repo/metadrive/metadrive/manager/agent_manager.py", line 276, in <dictcomp>
    for agent_name, v_name in self._agents_finished_this_frame.items()
  File "/home/weirme/Repo/metadrive/metadrive/manager/agent_manager.py", line 289, in get_agent
    object_name = self.agent_to_object(agent_name)
  File "/home/weirme/Repo/metadrive/metadrive/manager/agent_manager.py", line 316, in agent_to_object
    return self._agent_to_object[agent_id]
KeyError: 'agent24'
Failure # 2 (occurred at 2023-07-10_17-02-49)
ray::CoPOTrainer.train() (pid=364049, ip=10.0.0.10, repr=CoPOTrainer)
  File "/home/weirme/miniconda3/envs/copo/lib/python3.7/site-packages/ray/tune/trainable/trainable.py", line 367, in train
    raise skipped from exception_cause(skipped)
  File "/home/weirme/miniconda3/envs/copo/lib/python3.7/site-packages/ray/tune/trainable/trainable.py", line 364, in train
    result = self.step()
  File "/home/weirme/miniconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/algorithms/algorithm.py", line 749, in step
    results, train_iter_ctx = self._run_one_training_iteration()
  File "/home/weirme/miniconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/algorithms/algorithm.py", line 2623, in _run_one_training_iteration
    results = self.training_step()
  File "/home/weirme/Repo/CoPO/copo_code/copo/torch_copo/algo_copo.py", line 524, in training_step
    worker_set=self.workers, max_env_steps=self.config.train_batch_size
  File "/home/weirme/miniconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/execution/rollout_ops.py", line 86, in synchronous_parallel_sample
    lambda w: w.sample(), local_worker=False, healthy_only=True
  File "/home/weirme/miniconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 696, in foreach_worker
    handle_remote_call_result_errors(remote_results, self._ignore_worker_failures)
  File "/home/weirme/miniconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 73, in handle_remote_call_result_errors
    raise r.get()
ray.exceptions.RayTaskError(KeyError): ray::RolloutWorker.apply() (pid=364112, ip=10.0.0.10, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f2d5fbd7d50>)
  File "/home/weirme/miniconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/utils/actor_manager.py", line 183, in apply
    raise e
  File "/home/weirme/miniconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/utils/actor_manager.py", line 174, in apply
    return func(self, *args, **kwargs)
  File "/home/weirme/miniconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/execution/rollout_ops.py", line 86, in <lambda>
    lambda w: w.sample(), local_worker=False, healthy_only=True
  File "/home/weirme/miniconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 900, in sample
    batches = [self.input_reader.next()]
  File "/home/weirme/miniconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 92, in next
    batches = [self.get_data()]
  File "/home/weirme/miniconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 285, in get_data
    item = next(self._env_runner)
  File "/home/weirme/miniconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 721, in _env_runner
    base_env.send_actions(actions_to_send)
  File "/home/weirme/miniconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/env/multi_agent_env.py", line 615, in send_actions
    raise e
  File "/home/weirme/miniconda3/envs/copo/lib/python3.7/site-packages/ray/rllib/env/multi_agent_env.py", line 608, in send_actions
    obs, rewards, dones, infos = env.step(agent_dict)
  File "/home/weirme/Repo/CoPO/copo_code/copo/torch_copo/utils/env_wrappers.py", line 309, in step
    o, r, d, i = super(LCFEnv, self).step(actions)
  File "/home/weirme/Repo/CoPO/copo_code/copo/torch_copo/utils/env_wrappers.py", line 96, in step
    self._update_distance_map(dones=d)
  File "/home/weirme/Repo/CoPO/copo_code/copo/torch_copo/utils/env_wrappers.py", line 143, in _update_distance_map
    if hasattr(self, "vehicles_including_just_terminated"):
  File "/home/weirme/Repo/metadrive/metadrive/envs/base_env.py", line 490, in vehicles_including_just_terminated
    ret.update(self.agent_manager.just_terminated_agents)
  File "/home/weirme/Repo/metadrive/metadrive/manager/agent_manager.py", line 276, in just_terminated_agents
    for agent_name, v_name in self._agents_finished_this_frame.items()
  File "/home/weirme/Repo/metadrive/metadrive/manager/agent_manager.py", line 276, in <dictcomp>
    for agent_name, v_name in self._agents_finished_this_frame.items()
  File "/home/weirme/Repo/metadrive/metadrive/manager/agent_manager.py", line 289, in get_agent
    object_name = self.agent_to_object(agent_name)
  File "/home/weirme/Repo/metadrive/metadrive/manager/agent_manager.py", line 316, in agent_to_object
    return self._agent_to_object[agent_id]
KeyError: 'agent14'
