
crowdnav's Introduction

CrowdNav

Website | Paper | Video

This repository contains the code for our ICRA 2019 paper. For more details, please refer to the paper Crowd-Robot Interaction: Crowd-aware Robot Navigation with Attention-based Deep Reinforcement Learning.

Please find our more recent work in the following links

Abstract

Mobility in an effective and socially-compliant manner is an essential yet challenging task for robots operating in crowded spaces. Recent works have shown the power of deep reinforcement learning techniques to learn socially cooperative policies. However, their cooperation ability deteriorates as the crowd grows since they typically relax the problem as a one-way Human-Robot interaction problem. In this work, we want to go beyond first-order Human-Robot interaction and more explicitly model Crowd-Robot Interaction (CRI). We propose to (i) rethink pairwise interactions with a self-attention mechanism, and (ii) jointly model Human-Robot as well as Human-Human interactions in the deep reinforcement learning framework. Our model captures the Human-Human interactions occurring in dense crowds that indirectly affect the robot's anticipation capability. Our proposed attentive pooling mechanism learns the collective importance of neighboring humans with respect to their future states. Various experiments demonstrate that our model can anticipate human dynamics and navigate in crowds with time efficiency, outperforming state-of-the-art methods.

Method Overview

Setup

  1. Install Python-RVO2 library
  2. Install crowd_sim and crowd_nav as editable pip packages from the repository root:
pip install -e .
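A hedged sketch of completing both steps from scratch; the Python-RVO2 repository URL and its Cython/setup.py build commands are assumptions based on that project's usual instructions, and the paths are illustrative:
git clone https://github.com/sybrenstuvel/Python-RVO2.git
cd Python-RVO2 && pip install cython && python setup.py build && python setup.py install
cd ../CrowdNav && pip install -e .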

Getting Started

This repository is organized in two parts: the crowd_sim/ folder contains the simulation environment and the crowd_nav/ folder contains the code for training and testing the policies. Details of the simulation framework can be found here. Below are the instructions for training and testing policies; they should be executed inside the crowd_nav/ folder.

  1. Train a policy.
python train.py --policy sarl
  2. Test policies with 500 test cases.
python test.py --policy orca --phase test
python test.py --policy sarl --model_dir data/output --phase test
  3. Run a policy for one episode and visualize the result (i.e. visualize a test case).
python test.py --policy orca --phase test --visualize --test_case 0
python test.py --policy sarl --model_dir data/output --phase test --visualize --test_case 0
  4. Plot the training curve.
python utils/plot.py data/output/output.log
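train.py also accepts the options listed in its usage string (quoted in one of the issues below), such as --output_dir, --gpu and --debug; for example:
python train.py --policy sarl --output_dir data/output --gpu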

Simulation Videos

Videos comparing the CADRL, LSTM-RL, SARL and OM-SARL policies.

Learning Curve

Learning curve comparison between different methods in an invisible setting.

Citation

If you find the code or paper useful for your research, please cite our paper:

@inproceedings{chen2019crowd,
  title={Crowd-robot interaction: Crowd-aware robot navigation with attention-based deep reinforcement learning},
  author={Chen, Changan and Liu, Yuejiang and Kreiss, Sven and Alahi, Alexandre},
  booktitle={2019 International Conference on Robotics and Automation (ICRA)},
  pages={6015--6022},
  year={2019},
  organization={IEEE}
}

crowdnav's People

Contributors

changanvr, lunox, svenkreiss, tessavdheiden, yuejiangliu


crowdnav's Issues

training issue

Hi, sorry to bother you.
I don't want to run imitation learning, so I set imitation_learning=False in train.py, but I get the following error:

Traceback (most recent call last):
  File "train.py", line 188, in <module>
    main()
  File "train.py", line 140, in main
    explorer.run_k_episodes(il_episodes, 'train', update_memory=True, imitation_learning=False)
  File "../crowd_nav/utils/explorer.py", line 72, in run_k_episodes
    self.update_memory(states, actions, rewards, imitation_learning)
  File "../crowd_nav/utils/explorer.py", line 116, in update_memory
    value = reward + gamma_bar * self.target_model(next_state.unsqueeze(0)).data.item()
AttributeError: 'JointState' object has no attribute 'unsqueeze'

Can you tell me how to solve it? Thanks very much!

Application on python2.7

Hi,
I use your model with ROS, but ROS only supports Python 2.7. I tried to run the code under Python 2.7 but it failed; there are some syntax differences between Python 2 and 3. I really need some modules such as ORCA. Do you have any ideas on how to cope with this?
Thank you!

I have two questions about testing

Hi, I read your paper about CADRL; it is very impressive to me.
By the way, I want to know whether it is possible to change the number of agents.
(1) If I want to test with 6 humans or robots, what should I change in the code?
(2) After changing the code and training in a 6-agent environment, would I be able to test those weights in an environment with 10 or more agents? Would CADRL still work well?

haha... sorry, I have one more question.
Why must the humans' policy be ORCA?
My opinion is that ORCA is not really a human policy; it should rather be a robot policy.

Static Human

Hello, I'm sorry to bother you. I tried to set several static humans whose start points are the same as their goals, but I found that during navigation the static circles also move; they don't keep still. I don't know why. Is this situation normal? I mixed the generate_circle_crossing_human strategy and the generate_square_crossing_human strategy; could this cause the problem?

Training error

(py36) hailong@hailong-HP-Compaq-Pro-6300-MT:~/CrowdNav/crowd_nav$ python train.py --policy sarl
Output directory already exists! Overwrite the folder? (y/n)y
2019-06-26 15:14:04, INFO: Current git head hash code: %s
double free or corruption (!prev)
Aborted (core dumped)
(py36) hailong@hailong-H

Number of humans in 'train' and 'test'

Excuse me.

I'm confused about human_num in crowd_sim.py.
Why is human_num set to 1 when self.robot.policy.multiagent_training is False?

np.random.seed(counter_offset[phase] + self.case_counter[phase])
if phase in ['train', 'val']:
    human_num = self.human_num if self.robot.policy.multiagent_training else 1
    self.generate_random_human_position(human_num=human_num, rule=self.train_val_sim)
else:
    self.generate_random_human_position(human_num=self.human_num, rule=self.test_sim)
# case_counter is always between 0 and case_size[phase]
self.case_counter[phase] = (self.case_counter[phase] + 1) % self.case_size[phase]
What I want to know is:
In the training stage, do we use 1 human agent or 5 human agents (env.config has [sim] human_num = 5)?

I really need your help, thanks~
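A minimal sketch, in plain Python, of what the quoted conditional evaluates to, assuming human_num = 5 in env.config; the multiagent_training value is illustrative:

human_num = 5                  # [sim] human_num in env.config
multiagent_training = True     # value taken from the robot's policy

for phase in ['train', 'val', 'test']:
    n = human_num if (phase == 'test' or multiagent_training) else 1
    print(phase, n)            # all 5 when multiagent_training is True; 1, 1, 5 when it is False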

ROS implementation

How can the different policies be implemented in ROS and take sensor input (camera)?

Problem of importing crowd_sim

When I run 'python train.py --policy sarl', I get the following error:

Traceback (most recent call last):
  File "train.py", line 10, in <module>
    from crowd_sim.envs.utils.robot import Robot
  File "/home/sam/CrowdNav/crowd_sim/envs/__init__.py", line 1, in <module>
    from crowd_sim.envs import crowd_sim
  File "/home/sam/CrowdNav/crowd_sim/envs/crowd_sim.py", line 548
    nonlocal global_step
    ^
SyntaxError: invalid syntax

Would you please help me to solve this problem? Thank you very much!

Social robot

Hi Changan!

Some interesting results I gained with altering the reward for discomfort distance:
reward = (dmin - self.discomfort_dist) * self.discomfort_penalty_factor * self.time_step * (1 - (self.robot.vx**2 + self.robot.vy**2)**.5)

Some additional settings:
discomfort_dist = 1.2
[robot] visible = true

What this does is penalize the robot not only for getting close to humans, but also for being close while moving at high speed (nervous behavior).

Please let me know if you want the video and/or whether I can commit this.

How to train from RL only?

If I just want to go straight to RL, can I comment out (#) the following code?

# imitation learning

# if args.resume:
#     if not os.path.exists(rl_weight_file):
#         logging.error('RL weights does not exist')
#     model.load_state_dict(torch.load(rl_weight_file))
#     rl_weight_file = os.path.join(args.output_dir, 'resumed_rl_model.pth')
#     logging.info('Load reinforcement learning trained weights. Resume training')
# elif os.path.exists(il_weight_file):
#     model.load_state_dict(torch.load(il_weight_file))
#     logging.info('Load imitation learning trained weights.')
# else:
#     il_episodes = train_config.getint('imitation_learning', 'il_episodes')
#     il_policy = train_config.get('imitation_learning', 'il_policy')
#     il_epochs = train_config.getint('imitation_learning', 'il_epochs')
#     il_learning_rate = train_config.getfloat('imitation_learning', 'il_learning_rate')
#     trainer.set_learning_rate(il_learning_rate)
#     if robot.visible:
#         safety_space = 0
#     else:
#         safety_space = train_config.getfloat('imitation_learning', 'safety_space')
#     il_policy = policy_factory[il_policy]()
#     il_policy.multiagent_training = policy.multiagent_training
#     il_policy.safety_space = safety_space
#     robot.set_policy(il_policy)
#     explorer.run_k_episodes(il_episodes, 'train', update_memory=True, imitation_learning=True)  ##
#     trainer.optimize_epoch(il_epochs)  ##
#     torch.save(model.state_dict(), il_weight_file)
#     logging.info('Finish imitation learning. Weights saved.')
#     logging.info('Experience set size: %d/%d', len(memory), memory.capacity)
explorer.update_target_model(model)

I have tried it, but it surprises me that the first validation run got a success rate of 1.
Is there something wrong with this change? How can I achieve RL-only training?

I really need your help. Thanks~

Is using the CPU faster than the GPU?

Hi, I tried to train on an RTX 2080 but it is too slow: 5000 episodes take about 18 h. The paper says training on a CPU took only 10 h. I wonder whether CPU training is the recommended option. Thank you!

compared algorithm question

Could you please tell me the computational cost of ORCA, CADRL, LSTM-RL and SARL?
I feel that these algorithms are not in the same category, so they are not easy to compare. ORCA may perform best in other environments; is that true? What is their computational cost?

Problem in changing the optimizer

I tried different optimizers during training: SGD and Adam. However, after I modified trainer.py to use self.optimizer = optim.Adam(self.model.parameters(), lr=learning_rate) and ran python train.py --policy sarl in the terminal, I get:

Traceback (most recent call last):
  File "train.py", line 177, in <module>
    main()
  File "train.py", line 159, in main
    explorer.run_k_episodes(env.case_size['val'], 'val', episode=episode)
  File "/home/jackeywang/CrowdNav/crowd_nav/utils/explorer.py", line 42, in run_k_episodes
    action = self.robot.act(ob)
  File "/home/jackeywang/CrowdNav/crowd_sim/envs/utils/robot.py", line 13, in act
    action = self.policy.predict(state)
  File "/home/jackeywang/CrowdNav/crowd_nav/policy/multi_human_rl.py", line 58, in predict
    raise ValueError('Value network is not well trained. ')
ValueError: Value network is not well trained.

I can't figure out why a different optimizer causes an error like this; I hope you can help.

Questions about the policy of human

Hi! I would like to know whether the humans' policy could be CADRL or SARL for the purpose of crowd simulation. I noticed that you have implemented the ORCA and Linear policies for humans, but I don't know how to replace ORCA with CADRL or SARL. Could you give me some advice? Thank you!

If I use RGB

I wonder how you trained your robot when using RGB images as input?

some questions about setup

Hi, sorry to bother you. I cannot understand the second command in the setup, which says "install crowd_sim and crowd_nav into pip". As far as I know, crowd_nav is a folder name; what should I do to install these requirements? I would like to ask for your help, thank you very much.

training problem

Sorry to bother you. Do you know how to deal with this error?

Traceback (most recent call last):
  File "/home/naiyisiji/Projects/CrowdNav/crowd_nav/train.py", line 177, in <module>
    main()
  File "/home/naiyisiji/Projects/CrowdNav/crowd_nav/train.py", line 129, in main
    explorer.run_k_episodes(il_episodes, 'train', update_memory=True, imitation_learning=True)
  File "/home/naiyisiji/Projects/CrowdNav/crowd_nav/utils/explorer.py", line 36, in run_k_episodes
    ob = self.env.reset(phase)
TypeError: reset() takes 1 positional argument but 2 were given

GPU computation

Hi!

If you would like to use the GPU together with the occupancy map, the map has to be put on CUDA as well.

Could you make me a collaborator, so I can commit those changes (2 lines in multi_human_rl.py)?

Best,
Tessa

training issues

python test.py --policy sarl --model_dir data/output --phase test --visualize --test_case 0 --traj
Traceback (most recent call last):
  File "test.py", line 8, in <module>
    from crowd_nav.utils.explorer import Explorer
  File "/home/ubuntu/J/TLSGAN-master/crowd_nav/utils/explorer.py", line 7, in <module>
    from crowd_nav.policy.policy_factory import policy_factory
  File "/home/ubuntu/J/TLSGAN-master/crowd_nav/policy/policy_factory.py", line 6, in <module>
    policy_factory['cadrl'] = CADRL
TypeError: 'module' object does not support item assignment

questions about 'agent.py'

Hello! I am interested in your fantastic research work and have some questions while reading your code:
In agent.py (/crowd_sim/envs/utils/), I noticed these lines in the get_next_observable_state function (when the kinematics is 'unicycle'):
next_vx = action.v * np.cos(self.theta)
next_vy = action.v * np.sin(self.theta)

My question is: why should it be self.theta? I mean, why are next_vx and next_vy not computed using the new theta = self.theta + action.r?
Could you please explain it? Thank you very much!
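A minimal sketch contrasting the two conventions being asked about, for a unicycle action with speed v and rotation r; this illustrates the question and is not the repository's implementation:

import numpy as np

def next_velocity(theta, v, r, use_updated_heading=False):
    # either keep the current heading (self.theta) or rotate it first (self.theta + action.r)
    heading = theta + r if use_updated_heading else theta
    return v * np.cos(heading), v * np.sin(heading)

print(next_velocity(0.0, 1.0, np.pi / 6))         # velocity along the old heading
print(next_velocity(0.0, 1.0, np.pi / 6, True))   # velocity along the rotated heading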

question about different number of pedestrians

Hello,
I saw that you mentioned in your paper that the attention module can handle a variable number of pedestrians, but when looking at your code I saw that you set the number of other pedestrians to 5, and I did not see any handling of variable lengths (I saw the padding code in explorer.py, but it is commented out).
Therefore, do you assume a fixed number of pedestrians during training, so that the model needs to be retrained when the number of pedestrians changes, or is there something I haven't noticed?
Looking forward to your answer.
Hello,
I'm not sure whether I expressed myself clearly, and since you also seem to read Chinese, let me repeat the question: in the paper you mention that the attention module can handle a varying number of pedestrians, but in the code the number of pedestrians is set directly to 5 (I saw some padding code in explorer.py, but it is commented out). So do you assume a fixed number of pedestrians during training? If so, would the model need to be retrained for an environment with a different number of pedestrians, or is there something else I have missed?
Looking forward to your reply, and best wishes with your research.

Training Issue

Sorry to bother you, I am new to reinforcement learning and I'd like to ask you some questions.

  1. How can I change the number of training iterations?
  2. Can I visualize the process during training?
  3. Are there pre-trained IL or RL models available?

Looking forward to your reply. Thank you very much.

TypeError: 'NoneType' object is not subscriptable

Dear Prof. Chen,
I tried to run the command "python test.py --policy orca --phase test --visualize --test_case 4 --gpu", and the output was "TypeError: 'NoneType' object is not subscriptable". I tried to fix it but it didn't work. Any help is appreciated.
Thanks,
Taku
################################################################################
Details: python test.py --policy orca --phase test --visualize --test_case 4 --gpu
2018-10-10 10:22:17, INFO: Using device: cuda:0
hereconfigs/policy.config
configs/env.config
2018-10-10 10:22:17, INFO: human number: 5
2018-10-10 10:22:17, INFO: Not randomize human's radius and preferred speed
2018-10-10 10:22:17, INFO: Training simulation: circle_crossing, test simulation: circle_crossing
2018-10-10 10:22:17, INFO: Square width: 10.0, circle width: 4.0
2018-10-10 10:22:17, INFO: ORCA agent buffer: 0.000000
2018-10-10 10:22:17, INFO: Agent is invisible and has holonomic kinematic constraint
args.video_file None
5
Traceback (most recent call last):
  File "test.py", line 115, in <module>
    main()
  File "test.py", line 104, in main
    env.render('video', args.video_file)
  File "/CrowdNav/crowd_sim/envs/crowd_sim.py", line 504, in render
    humans = [plt.Circle(human_positions[0][i], self.humans[i].radius, fill=False, color=(self.attention_weights[0][i], 0, 0)) for i in range(len(self.humans))]
  File "/CrowdNav/crowd_sim/envs/crowd_sim.py", line 504, in <listcomp>
    humans = [plt.Circle(human_positions[0][i], self.humans[i].radius, fill=False, color=(self.attention_weights[0][i], 0, 0)) for i in range(len(self.humans))]
TypeError: 'NoneType' object is not subscriptable

The training stopped halfway

When I start to train the agent, some errors stop the training partway through.
For example:

2022-04-01 22:02:19, INFO: Current git head hash code: %s
2022-04-01 22:02:19, INFO: Using device: cpu
2022-04-01 22:02:19, INFO: Policy: CADRL without occupancy map
2022-04-01 22:02:19, INFO: human number: 5
2022-04-01 22:02:19, INFO: Not randomize human's radius and preferred speed
2022-04-01 22:02:19, INFO: Training simulation: circle_crossing, test simulation: circle_crossing
2022-04-01 22:02:19, INFO: Square width: 10.0, circle width: 4.0
2022-04-01 22:02:19, INFO: Current learning rate: 0.010000
malloc_consolidate(): invalid chunk size

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

or

..........................
...........................
2022-04-01 21:41:03, INFO: TRAIN in episode 321 has success rate: 0.00, collision rate: 0.00, nav time: 25.00, total reward: 0.0000
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

Can I ask what caused this?

how to train/test cadrl and lstm_rl?

Hello,
First of all, thank you for your open-source code. I want to know how to train/test the CADRL and LSTM-RL models. When I try:
python test.py --policy cadrl --model_dir data/output --phase test
or add --il like this:
python test.py --policy cadrl --model_dir data/output --phase test --il
I get this error:
2018-11-17 21:58:04, INFO: Using device: cpu
2018-11-17 21:58:04, INFO: Policy: CADRL without occupancy map
Traceback (most recent call last):
  File "test.py", line 123, in <module>
    main()
  File "test.py", line 60, in main
    policy.get_model().load_state_dict(torch.load(model_weights))
  File "/home/tcj/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 719, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ValueNetwork:
	Missing key(s) in state_dict: "value_network.0.bias", "value_network.0.weight", "value_network.2.bias", "value_network.2.weight", "value_network.4.bias", "value_network.4.weight", "value_network.6.bias", "value_network.6.weight".
	Unexpected key(s) in state_dict: "mlp1.0.weight", "mlp1.0.bias", "mlp1.2.weight", "mlp1.2.bias", "mlp2.0.weight", "mlp2.0.bias", "mlp2.2.weight", "mlp2.2.bias", "attention.0.weight", "attention.0.bias", "attention.2.weight", "attention.2.bias", "attention.4.weight", "attention.4.bias", "mlp3.0.weight", "mlp3.0.bias", "mlp3.2.weight", "mlp3.2.bias", "mlp3.4.weight", "mlp3.4.bias", "mlp3.6.weight", "mlp3.6.bias".
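The missing/unexpected keys above suggest that data/output contains SARL weights (mlp1/mlp2/attention parameters) while the CADRL value network expects value_network.* parameters. A hedged example of training CADRL into its own output directory first and then testing from it, using the --output_dir flag shown in train.py's usage string elsewhere on this page (the directory name is illustrative):
python train.py --policy cadrl --output_dir data/output_cadrl
python test.py --policy cadrl --model_dir data/output_cadrl --phase test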

About using the GPU

When I run: python train.py --policy sarl
Every time it prints: INFO: Using device: cpu

I wonder how I can use the GPU to speed up training?

My server's setting, GPU: GTX 1080Ti × 3 (VRAM: 11GB × 3)

Thanks a lot.
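train.py's usage string (quoted in another issue on this page) lists a --gpu flag; a hedged example of requesting the GPU explicitly:
python train.py --policy sarl --gpu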

Real-world Experiments code

Hello, thanks for your contribution to DRL.
I'm interested in mobile robots with DRL. I read your paper and watched the video of the test on a Segway.
Could you open-source the code for the Segway port?

Regards.

Grid locations to be corrected?

Hi Changan!

Super code! In multi_human_rl.py the grid locations are computed incorrectly. Please alter this piece of code (around line 154) to:

dm[int(index) + self.cell_num ** 2].append(other_vx[i])
dm[int(index) + self.cell_num ** 2 * 2].append(other_vy[i])
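A minimal, self-contained sketch of the channel-major layout these two corrected lines imply (cell i of channel c stored at dm[i + c * cell_num ** 2]); cell_num and the sample values below are illustrative, not the repository's configuration:

cell_num = 4
om_channel_size = 3                                # occupancy, vx, vy channels
dm = [[] for _ in range(cell_num ** 2 * om_channel_size)]

def add_human(index, vx, vy):
    # record a neighbouring human that falls into grid cell `index`
    dm[int(index)].append(1)                       # channel 0: occupancy
    dm[int(index) + cell_num ** 2].append(vx)      # channel 1: x-velocity
    dm[int(index) + cell_num ** 2 * 2].append(vy)  # channel 2: y-velocity

add_human(5, 0.3, -0.1)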

Question in terms of the differences between mlp1 and mlp2

Thanks for your great repo!
According to your paper, mlp1 is used to embed the joint state of the robot and each human, while mlp2 is used to obtain the pairwise interaction feature between the robot and each human. However, based on the code, they look identical to me (both are sequences of linear layers; perhaps the only difference is that mlp1 has a final ReLU and mlp2 doesn't).

I am wondering why they (i.e., mlp1 and mlp2) have different purposes if they are the same? Or why are the outputs of mlp1 not the pairwise interaction features between the robot and each human?

Thanks!
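A minimal PyTorch sketch of the two modules as the question describes them (a per-pair embedding MLP followed by a pairwise-interaction MLP); the layer sizes are illustrative, not the repository's configured values:

import torch
import torch.nn as nn

# mlp1: embeds each rotated robot-human joint state; mlp2: maps that embedding
# to a pairwise interaction feature (illustrative sizes).
mlp1 = nn.Sequential(nn.Linear(13, 150), nn.ReLU(), nn.Linear(150, 100), nn.ReLU())
mlp2 = nn.Sequential(nn.Linear(100, 100), nn.ReLU(), nn.Linear(100, 50))

pair_states = torch.randn(5, 13)       # 5 humans, 13-d joint state per robot-human pair
features = mlp2(mlp1(pair_states))     # one interaction feature per human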

drawing issue

Sorry to bother you. Could you please tell me how to draw a figure like this (the trajectory comparison in an invisible test case, Figure 5 of your paper)?

training issues

Sorry to bother you, I want to do a comparative experiment with SARL, so the first step is to train the other algorithms. When I run the command "python train.py --policy orca", it prints:

(CrowdNav) hurong@hurong:~/CrowdNav/crowd_nav$ python train.py --policy orca
Output directory already exists! Overwrite the folder? (y/n)y
2021-02-05 12:56:38, INFO: Current git head hash code: %s
2021-02-05 12:56:38, INFO: Using device: cpu
usage: Parse configuration file [-h] [--env_config ENV_CONFIG]
[--policy POLICY]
[--policy_config POLICY_CONFIG]
[--train_config TRAIN_CONFIG]
[--output_dir OUTPUT_DIR] [--weights WEIGHTS]
[--resume] [--gpu] [--debug]
Parse configuration file: error: Policy has to be trainable

Could you please tell me why I cannot train ORCA?

NameError: name 'orientation_arrows' is not defined

Dear Prof. Chen,
Thank you for getting back to me. I ran this program on my computer, but when I tried to run the command "python test.py --policy sarl --model_dir data/sarl --phase test --visualize --test_case 0", the output was "NameError: name 'orientation_arrows' is not defined". I am not sure what I should do to fix it. Any help is appreciated.
Thanks,
Zhang Di

Is the transition probability from time t to time t + ∆t assumed to be known?

Hi, thanks again for your repo.
I am wondering: is the transition probability from time t to time t + ∆t assumed to be known?
In [1] the transition probability is unknown, and the authors therefore proposed to employ policy-based learning.

Thanks!

[1] Michael Everett‡ , Yu Fan Chen† , and Jonathan P. How‡. Motion Planning Among Dynamic, Decision-Making Agents with Deep Reinforcement Learning.

command problem

Sorry to bother you, I do not know what these commands mean, e.g. the "--" symbols.
If I want to test the SARL policy, can I just run the single command "python test.py --policy sarl --model_dir data/output --phase test"? There is another command, "python test.py --policy orca --phase test", which I think tests the ORCA policy, but I only want to test the SARL policy.
Another question: what does "--model_dir data/output --phase test" mean?
Please help me, I will be grateful. Thank you.

some questions about the experiment in real world

Hi~ I've read your paper in detail, and I would like to ask how you used the onboard sensors to perceive the position and velocity of the pedestrians (i.e., obstacles) around the Segway robot in the real-world experiment. I plan to do similar experiments but I'm not sure how to design the perception module to get pedestrians' states in the real world.

Tuning Parameters

How did you tune or decide the RL parameters?
Is there a systematic way to tune them, or were they decided heuristically?

train question

Excuse me, can you tell me how to train the agent with the linear policy? Also, after I modified the training parameters and imitation learning finished, when the command explorer.run_k_episodes(env.case_size['val'], 'val', episode=episode) runs, the lines action = self.robot.act(ob) and ob, reward, done, info = self.env.step(action) show that action is None (in vx = human.vx - action.vx). Can you give me some advice?

Using a trained torch model in C++?

Hi:
My goal is to test these architectures in a C++ evaluation project to compare with my own method, so I need to run these trained models from the C++ project as simply as possible. Does torch have a C++ interface, or should I call the Python functions from C++?
I read some answers on Google but nothing was clear, so could you give me some advice? Thank you.
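PyTorch does provide a C++ API (libtorch); one hedged route is to export the trained network to TorchScript from Python and load it in C++ with torch::jit::load. The toy network and input shape below are placeholders, not the repository's ValueNetwork:

import torch
import torch.nn as nn

# stand-in for the trained value network; replace with the real model
model = nn.Sequential(nn.Linear(13, 64), nn.ReLU(), nn.Linear(64, 1))
example = torch.zeros(1, 13)                 # illustrative input shape
traced = torch.jit.trace(model, example)     # record the forward pass
traced.save("value_network_traced.pt")       # loadable from C++ via torch::jit::load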

train.py error

When I train a policy, I get the following error:
Output directory already exists! Overwrite the folder? (y/n)n
Traceback (most recent call last):
  File "train.py", line 177, in <module>
    main()
  File "train.py", line 57, in main
    repo = git.Repo(search_parent_directories=True)
  File "/home/ljx/anaconda3/envs/CrowdNav/lib/python3.6/site-packages/git/repo/base.py", line 184, in __init__
    raise InvalidGitRepositoryError(epath)
git.exc.InvalidGitRepositoryError: /home/ljx/桌面/CrowdNav-master/crowd_nav

Is it a problem with the installation package version?

How to set self.kinematics while running CADRL

In the CADRL paper by Chen et al., the heading angle of the robot (i.e. self.robot.theta) is part of the robot's state. But the default kinematics of this repo, as defined in crowd_nav/configs/policy.config, is "holonomic", and when kinematics is set to "holonomic" the angle theta never changes throughout an episode (it is always set to 90 degrees).

So should I set kinematics to "unicycle" while training or testing CADRL?

Thank you a lot!

Visible

Hi Changan,

You mentioned that setting the robot to visible lets the robot learn that humans avoid it.

Are the videos in the readme from the invisible setting?

Question about experiment

Hi:
The Segway is a differential-drive robot, but your paper assumes the robot is holonomic. What is the difference between these two kinematic modes in your method?

Testing Issue

Hi,
I want to run this repo on my robot, but I am facing some problems:

After changing the Robot class, could I use the model weights without retraining?
I found that every time I change the Robot class I need to run pip install -e . and retrain the model; otherwise it responds with "ValueError: Value network is not well trained."

Is there an easy way to use my own input?
I have collected real-time human data, but I don't know how to use that data other than writing a new test.py.

Some weight might be missing

Hi,

I tried to run python test.py --policy sarl --model_dir data/output --phase test --visualize --test_case 0 from the README after the training step finished, but the following error occurs:

2020-06-16 11:53:35, INFO: Using device: cpu
2020-06-16 11:53:35, INFO: Policy: SARL w/ global state
Traceback (most recent call last):
  File "test.py", line 113, in <module>
    main()
  File "test.py", line 59, in main
    policy.get_model().load_state_dict(torch.load(model_weights))
  File "/home/jxk/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 769, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ValueNetwork:
	Missing key(s) in state_dict: "mlp1.0.weight", "mlp1.0.bias", "mlp1.2.weight", "mlp1.2.bias", "mlp2.0.weight", "mlp2.0.bias", "mlp2.2.weight", "mlp2.2.bias", "attention.0.weight", "attention.0.bias", "attention.2.weight", "attention.2.bias", "attention.4.weight", "attention.4.bias", "mlp3.0.weight", "mlp3.0.bias", "mlp3.2.weight", "mlp3.2.bias", "mlp3.4.weight", "mlp3.4.bias", "mlp3.6.weight", "mlp3.6.bias". 
	Unexpected key(s) in state_dict: "value_network.0.weight", "value_network.0.bias", "value_network.2.weight", "value_network.2.bias", "value_network.4.weight", "value_network.4.bias", "value_network.6.weight", "value_network.6.bias".

I followed the process in the README step by step; can you tell me what is going on or what I might have missed?

Thank you!

Action space question

I'm sorry to bother you. Excuse me: among the 80 discrete actions in the action space, why is the linear velocity exponentially spaced?

Questions about action sampling

Hi, I have some questions about the following code in the build_action_space function (cadrl.py, line 82):
speeds = [(np.exp((i + 1) / self.speed_samples) - 1) / (np.e - 1) * v_pref for i in range(self.speed_samples)]
The comment says the actions are sampled uniformly, so I cannot understand why the speeds are given by an exponential function within (0, 1]?
Thank you very much!
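A runnable restatement of that line with illustrative values of speed_samples and v_pref, showing the resulting grid: the spacing is exponential in the index, denser near zero, and the last sample equals v_pref exactly:

import numpy as np

speed_samples, v_pref = 5, 1.0
speeds = [(np.exp((i + 1) / speed_samples) - 1) / (np.e - 1) * v_pref
          for i in range(speed_samples)]
print(speeds)   # approx [0.13, 0.29, 0.48, 0.71, 1.0]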

Where to find test scenarios?

Hi, great work on integrating many RL-based policies into one system! I am working on my own method and want to compare it with existing methods. Where can I find the test scenarios so I can make the comparison?

Many thanks!
