mhauskn / dqn-hfo

License: MIT License

CMake 5.53% C++ 75.92% Shell 18.55%

dqn-hfo's Issues

Try fewer state features?

Hi @mhauskn. Thanks a lot for your work. I'm curious about one thing. According to the HFO manual, there are 58 (+1) state features in the setting with a single offense player, but these features are overcomplete, as the manual itself notes. For example, only 2 features are needed to locate one player, yet there are many more Landmark and Proximity features. Have you considered or tried using fewer state features?

Understanding Gradient Inversion

Sorry if this is a stupid question. I am trying to implement gradient inversion in PyTorch based on the paper, and I would like to ask for some clarification. Is the inversion applied to all the layers, or only to the last layer? If it is the former, we would have to keep the output of each layer.

Thanks a lot in advance for your help.
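
A minimal sketch, assuming the inverting-gradients rule as I understand it from the paper: the gradient with respect to each bounded action parameter is scaled by the remaining headroom in the direction the gradient points, and its sign flips once the parameter leaves its bounds. Under that reading the rule is applied once, to the gradient arriving at the actor's output, and ordinary backpropagation carries it through the earlier layers, so no per-layer outputs need to be stored. The function and argument names are hypothetical, the sign convention (positive gradient means "increase the parameter") is an assumption, and this is plain Python rather than PyTorch.

```python
def invert_gradients(grads, params, p_min, p_max):
    """Scale each gradient by the headroom left for its parameter.

    grads  : gradient for each action parameter (assumed convention:
             a positive value suggests increasing the parameter)
    params : current parameter values output by the actor
    p_min, p_max : per-parameter lower and upper bounds
    """
    inverted = []
    for g, p, lo, hi in zip(grads, params, p_min, p_max):
        width = hi - lo
        if g > 0:
            # Gradient wants to increase p: shrink it as p approaches hi.
            inverted.append(g * (hi - p) / width)
        else:
            # Gradient wants to decrease p: shrink it as p approaches lo.
            inverted.append(g * (p - lo) / width)
    return inverted


# One parameter inside its bounds, one already past the upper bound.
result = invert_gradients(
    grads=[1.0, 1.0],
    params=[0.5, 1.5],
    p_min=[-1.0, -1.0],
    p_max=[1.0, 1.0],
)
print(result)
```

For the second parameter the scaled gradient becomes negative, which pushes the parameter back toward its valid range. In a PyTorch implementation the same scaling could be applied to the gradient of Q with respect to the actor's output before backpropagating through the actor.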

Shortcuts for visualization?

My server doesn't support X11 display forwarding, so I have two questions:
1. Is there an easier way to visualize the learnt policies than copying the rcg logs to my local machine and rendering screenshots via the HFO GUI?
2. Is there a shortcut to visualize the policies while training is in progress? I would really like an alternative to waiting until the end of a run to see what kind of policy is being learnt (apart from the messy approach of saving checkpoints at frequent intervals and then performing (1) above).

Not able to progress in learning?

I'm trying to run the test job ./bin/dqn -save state/test -alsologtostderr --nogpu on the latest version of HFO without a GPU. Even after 2000 iterations I'm not seeing any improvement in episode reward. Could this be an issue with the HFO version I'm using? Some sample output after ~1500 iterations:

I0101 11:53:57.222157 29593 dqn_main.cpp:355] [Agent0] Episode 1478 reward = -0.00117409
I0101 11:53:57.345566 29593 sgd_solver.cpp:92] Gradient clipping: scaling down gradients (L2 norm 1.23605e+09 > 10) by scale factor 8.09029e-09
I0101 11:53:57.636891 29593 sgd_solver.cpp:92] Gradient clipping: scaling down gradients (L2 norm 1.07481e+09 > 10) by scale factor 9.30401e-09
I0101 11:53:57.838814 29593 sgd_solver.cpp:92] Gradient clipping: scaling down gradients (L2 norm 1.40991e+09 > 10) by scale factor 7.09265e-09
I0101 11:53:58.040874 29593 sgd_solver.cpp:92] Gradient clipping: scaling down gradients (L2 norm 1.47654e+09 > 10) by scale factor 6.77259e-09
I0101 11:53:58.246521 29593 sgd_solver.cpp:92] Gradient clipping: scaling down gradients (L2 norm 1.52825e+09 > 10) by scale factor 6.54343e-09
I0101 11:53:58.463399 29593 sgd_solver.cpp:92] Gradient clipping: scaling down gradients (L2 norm 1.20801e+09 > 10) by scale factor 8.2781e-09
I0101 11:53:58.706378 29593 sgd_solver.cpp:92] Gradient clipping: scaling down gradients (L2 norm 1.23667e+09 > 10) by scale factor 8.08623e-09
I0101 11:53:58.941028 29593 sgd_solver.cpp:92] Gradient clipping: scaling down gradients (L2 norm 1.38584e+09 > 10) by scale factor 7.21582e-09
I0101 11:53:59.179901 29593 sgd_solver.cpp:92] Gradient clipping: scaling down gradients (L2 norm 4.34253e+08 > 10) by scale factor 2.30281e-08
I0101 11:53:59.376896 29593 sgd_solver.cpp:92] Gradient clipping: scaling down gradients (L2 norm 1.66873e+09 > 10) by scale factor 5.99256e-09
EndOfTrial: 0 / 1580 162979 OUT_OF_TIME
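
For readers interpreting the log above, here is a small sketch of the arithmetic behind the clipping lines. It assumes the scale factor is simply the clip threshold divided by the gradient's L2 norm, which matches the logged numbers; the function name is hypothetical. Norms on the order of 1e9 against a threshold of 10 suggest the gradients are exploding well before clipping is applied.

```python
def clip_scale(l2_norm, clip_threshold):
    """Return the factor by which gradients are scaled down.

    Assumption: gradients are rescaled so that their L2 norm equals the
    threshold whenever the norm exceeds it, and left untouched otherwise.
    """
    if l2_norm > clip_threshold:
        return clip_threshold / l2_norm
    return 1.0


# Values taken from the first clipping line in the log above.
scale = clip_scale(1.23605e9, 10.0)
print(f"{scale:.5e}")
```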

Runtime error (./bin/dqn -save state/test -alsologtostderr)

Sorry, I am getting a Bus error (core dump), and the process doesn't terminate afterwards.

I1003 17:56:31.852213 29161 hfo_game.cpp:39] Starting server with command: ./bin/HFO --fullstate --frames-per-trial 500 --port 43927 --offense-agents 1 --offense-npcs 0 --defense-agents 0 --defense-npcs 0 --ball-x-min 0.000000 --ball-x-max 0.200000 --offense-on-ball 0 --headless --no-logging
I1003 17:56:31.852248 29162 dqn_main.cpp:207] Thread 0, port=43927, save_prefix=state/test_agent0
Creating team Agent2d (base)
Creating team Agent2d (base)
Launch npc base_left-1
I1003 17:56:32.149988 29162 dqn_main.cpp:218] Found Resumable(s): [state/test_agent0] , ,
*** Aborted at 1570092992 (unix time) try "date -d @1570092992" if you are using GNU date ***
PC: @ 0x55ad53588d12 dqn::IPLayer()
*** SIGBUS (@0x0) received by PID 29160 (TID 0x7f6b0389b700) from PID 0; stack trace: ***
@ 0x7f6b16dff890 (unknown)
@ 0x55ad53588d12 dqn::IPLayer()
@ 0x55ad5358992d dqn::Tower()
@ 0x55ad53593649 dqn::CreateActorNet()
@ 0x55ad535b7df7 KeepPlayingGames()
@ 0x55ad535b9dc5 std::thread::_Impl<>::_M_run()
@ 0x7f6b15fc69e0 (unknown)
@ 0x7f6b16df46db start_thread
@ 0x7f6b1568388f clone
Launch npc base_left-2
Launch npc base_left-3
Bus error (core dump)
Launch npc base_left-4
Launch npc base_left-5
Launch npc base_left-6
Launch npc base_left-7
Launch npc base_left-8
Launch npc base_left-9
Launch npc base_left-10
Waiting for player-controlled agent base_left-0: config_dir=/home/ci/HFO/bin/teams/base/config/formations-dt, server_port=43927,server_addr=localhost, team_name=base_left, play_goalie=False

What is the range of the low-level Dash "power" parameter?

The HFO manual states:

So the range of power is [-100, 100].

But your other paper, "On-Policy vs. Off-Policy Updates for Deep Reinforcement Learning", states:

So what is the actual range of power?

Thank you very much for your answer.

Create Actor Critic Net prototxt files

Call them actor.prototxt and critic.prototxt.

Please submit PR when done. Thanks!

Update method

I am not familiar with Caffe, so can anyone help me understand the update used by this parameterized-action method? Is it the same as the standard DDPG update, or are the critic's gradients calculated in two parts, with respect to the discrete action and the parameters separately? If the gradients are in two parts, how is the actor network updated?
Thanks in advance!

Core dumped when running without GPU.

When I run the scenario branch on the CPU (i.e. without --gpu), I get the following error during the very first episode:

base_left: init ok.  unum: 11 side: l
base_left 11:  KickTable created.
base_left 11: [0, 1]  prepare see synch
base_left: init ok.  unum: 11 side: l
base_left 11:  KickTable created.
base_left 11: [0, 1]  prepare see synch
base_left: init ok.  unum: 11 side: l
base_left 11:  KickTable created.
base_left 11: [0, 1]  prepare see synch
E0405 10:29:42.568827 27393 common.cpp:104] Cannot create Cublas handle. Cublas won't be available.
E0405 10:29:42.584054 27393 common.cpp:111] Cannot create Curand generator. Curand won't be available.
F0405 10:29:42.607831 27393 syncedmem.hpp:18] Check failed: error == cudaSuccess (30 vs. 0)  unknown error
*** Check failure stack trace: ***
    @     0x7fa2e6b795cd  google::LogMessage::Fail()
    @     0x7fa2e6b7b433  google::LogMessage::SendToLog()
    @     0x7fa2e6b7915b  google::LogMessage::Flush()
    @     0x7fa2e6b7be1e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fa2e5fcd081  caffe::SyncedMemory::mutable_cpu_data()
    @     0x7fa2e5fd9631  caffe::Blob<>::Reshape()
    @     0x7fa2e5fd9aba  caffe::Blob<>::Reshape()
    @     0x7fa2e604a8c0  caffe::MemoryDataLayer<>::DataLayerSetUp()
    @     0x7fa2e5ff41b5  caffe::Net<>::Init()
    @     0x7fa2e5ff5bf1  caffe::Net<>::Net()
    @     0x7fa2e613b85a  caffe::Solver<>::InitTrainNet()
    @     0x7fa2e613cb57  caffe::Solver<>::Init()
    @     0x7fa2e613cefa  caffe::Solver<>::Solver()
    @     0x7fa2e600c653  caffe::Creator_AdamSolver<>()
    @           0x465cb9  caffe::SolverRegistry<>::CreateSolver()
    @           0x452ed2  dqn::DQN::Initialize()
    @           0x453f6b  dqn::DQN::DQN()
    @           0x478d61  KeepPlayingGames()
    @           0x47bad2  std::thread::_Impl<>::_M_run()
    @     0x7fa2e56f6c80  (unknown)
    @     0x7fa2e6fc56ba  start_thread
    @     0x7fa2e4e5c41d  clone
    @              (nil)  (unknown)
./run.sh: line 27: 27291 Aborted                 (core dumped)

Am I missing something? Do I need to do something else to run the code on just a CPU?

Commit f385067 doesn't build.

After checking out this branch (with the communication models), when I do a make -j4 after a cmake from inside the build folder, I get the following error:

[ 13%] Built target dummy_goalie
[ 26%] Built target dummy_teammate
[ 33%] Building CXX object CMakeFiles/dqn.dir/src/dqn.cpp.o
[ 46%] Built target passer
[ 53%] Building CXX object CMakeFiles/dqn.dir/src/tasks/move_to_ball.cpp.o
[ 60%] Building CXX object CMakeFiles/dqn.dir/src/dqn_main.cpp.o
[ 73%] Built target chaser
[ 80%] Building CXX object CMakeFiles/dqn.dir/src/tasks/task.cpp.o
[ 86%] Building CXX object CMakeFiles/dqn.dir/src/tasks/kick_to_goal.cpp.o
/data/Code/dqn-hfo_comm/dqn-hfo/src/dqn.cpp: In function ‘void dqn::DotProductLayer(caffe::NetParameter&, const string&, const std::vector<std::__cxx11::basic_string<char> >&, const std::vector<std::__cxx11::basic_string<char> >&, const boost::optional<caffe::Phase>&, int)’:
/data/Code/dqn-hfo_comm/dqn-hfo/src/dqn.cpp:442:3: error: ‘DotProductParameter’ is not a member of ‘caffe’
   caffe::DotProductParameter* param = layer.mutable_dot_product_param();
   ^
/data/Code/dqn-hfo_comm/dqn-hfo/src/dqn.cpp:442:31: error: ‘param’ was not declared in this scope
   caffe::DotProductParameter* param = layer.mutable_dot_product_param();
                               ^
/data/Code/dqn-hfo_comm/dqn-hfo/src/dqn.cpp:442:45: error: ‘class caffe::LayerParameter’ has no member named ‘mutable_dot_product_param’
   caffe::DotProductParameter* param = layer.mutable_dot_product_param();
                                             ^
/data/Code/dqn-hfo_comm/dqn-hfo/src/dqn.cpp: In function ‘void dqn::ParameterLayer(caffe::NetParameter&, const string&, std::__cxx11::string, std::vector<int>)’:
/data/Code/dqn-hfo_comm/dqn-hfo/src/dqn.cpp:453:3: error: ‘ParameterParameter’ is not a member of ‘caffe’
   caffe::ParameterParameter* param = layer.mutable_parameter_param();
   ^
/data/Code/dqn-hfo_comm/dqn-hfo/src/dqn.cpp:453:30: error: ‘param’ was not declared in this scope
   caffe::ParameterParameter* param = layer.mutable_parameter_param();
                              ^
/data/Code/dqn-hfo_comm/dqn-hfo/src/dqn.cpp:453:44: error: ‘class caffe::LayerParameter’ has no member named ‘mutable_parameter_param’
   caffe::ParameterParameter* param = layer.mutable_parameter_param();
                                            ^
CMakeFiles/dqn.dir/build.make:86: recipe for target 'CMakeFiles/dqn.dir/src/dqn.cpp.o' failed
make[2]: *** [CMakeFiles/dqn.dir/src/dqn.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
CMakeFiles/Makefile2:104: recipe for target 'CMakeFiles/dqn.dir/all' failed
make[1]: *** [CMakeFiles/dqn.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

Which commit of caffe should we be using with this branch?

Thanks!

scripts/train.sh: option -- unrecognized - ignored

Sorry to bother you. After running a test job, I executed scripts/train.sh and got the following output:
option -- unrecognized - ignored
Usage: cluster graphfile
Do you know why this occurs? Could our versions of the cluster command be different? I want to use your code to generate actions and deploy them in RoboCup 2D. Looking forward to your response.

unix time error?

Has anyone seen this error and/or know how to resolve it? After invoking the command:

mkdir state && ./bin/dqn -save state/test -alsologtostderr

Once the game starts, I get:
*** Aborted at 1511282572 (unix time) try "date -d @1511282572" if you are using GNU date ***

I did not have this problem when running in CPU_ONLY mode. It only appeared after I got CUDA and the GPU working. The Caffe runtest passes all tests.

See below.

I1121 16:42:52.013945 26478 dqn_main.cpp:355] [Agent0] Episode 0 reward = 0
EndOfTrial: 0 / 2 205 OUT_OF_TIME
I1121 16:42:52.031786 26478 dqn_main.cpp:355] [Agent0] Episode 1 reward = 0
EndOfTrial: 0 / 3 308 OUT_OF_TIME
I1121 16:42:52.047994 26478 dqn_main.cpp:355] [Agent0] Episode 2 reward = 0
EndOfTrial: 0 / 4 411 OUT_OF_TIME
I1121 16:42:52.064990 26478 dqn_main.cpp:355] [Agent0] Episode 3 reward = 0
EndOfTrial: 0 / 5 514 OUT_OF_TIME
I1121 16:42:52.081012 26478 dqn_main.cpp:355] [Agent0] Episode 4 reward = 0
EndOfTrial: 0 / 6 617 OUT_OF_TIME
I1121 16:42:52.097050 26478 dqn_main.cpp:355] [Agent0] Episode 5 reward = 0
EndOfTrial: 0 / 7 720 OUT_OF_TIME
I1121 16:42:52.113097 26478 dqn_main.cpp:355] [Agent0] Episode 6 reward = 0
EndOfTrial: 0 / 8 823 OUT_OF_TIME
I1121 16:42:52.131078 26478 dqn_main.cpp:355] [Agent0] Episode 7 reward = 0
EndOfTrial: 0 / 9 926 OUT_OF_TIME
I1121 16:42:52.148632 26478 dqn_main.cpp:355] [Agent0] Episode 8 reward = 0
EndOfTrial: 0 / 10 1029 OUT_OF_TIME
I1121 16:42:52.165283 26478 dqn_main.cpp:355] [Agent0] Episode 9 reward = 0
EndOfTrial: 0 / 11 1132 OUT_OF_TIME
I1121 16:42:52.181612 26478 dqn_main.cpp:355] [Agent0] Episode 10 reward = 0
*** Aborted at 1511282572 (unix time) try "date -d @1511282572" if you are using GNU date ***
PC: @     0x7fedcf9a1a5e (unknown)
*** SIGSEGV (@0xb05c40000) received by PID 26475 (TID 0x7fedacbbc700) from PID 96731136; stack trace: ***
    @     0x7fedcf8644b0 (unknown)
    @     0x7fedcf9a1a5e (unknown)
    @           0x440a2d dqn::ZeroGradParameters<>()
    @           0x433339 dqn::DQN::UpdateActorCritic()
    @           0x43b588 dqn::DQN::Update()
    @           0x458cc0 KeepPlayingGames()
    @           0x45ac61 std::thread::_Impl<>::_M_run()
    @     0x7fedd01d0c80 (unknown)
    @     0x7fedd13176ba start_thread
    @     0x7fedcf9363dd clone
    @                0x0 (unknown)
Segmentation fault (core dumped)
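
As a side note on reading this log: the "Aborted at ... (unix time)" line only records when the crash happened. The failure itself is the SIGSEGV reported below it, inside dqn::ZeroGradParameters<>(). A short sketch of the conversion that the log's "date -d" hint performs (the helper name is hypothetical):

```python
from datetime import datetime, timezone


def unix_to_utc(timestamp):
    """Format a Unix timestamp (seconds since the epoch) as a UTC string."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


# Timestamp copied from the "Aborted at" line in the log above.
print(unix_to_utc(1511282572))
```

The result lines up with the 16:42:52 timestamps on the surrounding log lines, which is consistent with the timestamp being a crash time rather than a clock-related error.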

reproduce problem: env is running!

Dear mhauskn,

I want to reproduce this experiment using TensorFlow, but I have run into some problems. The most important one is that the agent and the environment communicate over the network, so the environment keeps running while I am training the policy. Likewise, when I use the network to infer an action (which takes some time to compute), the environment keeps running as well.

So how can I make the environment wait for the agent?

Thanks!

cublas_v2.h: No such file or directory

Hi mhauskn,
I am interested in your research and want to rebuild your project dqn-hfo. However, when I run make on the project, the compiler reports a fatal error: cublas_v2.h: No such file or directory. Have you encountered this problem? Also, could you provide some documentation to help us get started with the project? Thanks.

Problem while connecting to HFO

Hi,
Thanks for sharing the code.

Seems like there's a problem in connecting to the server. When I run dqn I get these errors:

base_left: init ok.  unum: 11 side: l
***ERROR*** RCSSParamParser. unknown parameter name or invalid value. name=[hfo] value=[1]
***ERROR*** RCSSParamParser. unknown parameter name or invalid value. name=[hfo_log_dated] value=[1]
***ERROR*** RCSSParamParser. unknown parameter name or invalid value. name=[hfo_log_dir] value=[log/]
***ERROR*** RCSSParamParser. unknown parameter name or invalid value. name=[hfo_log_fixed] value=[0]
***ERROR*** RCSSParamParser. unknown parameter name or invalid value. name=[hfo_log_fixed_name] value=[rcssserver]
***ERROR*** RCSSParamParser. unknown parameter name or invalid value. name=[hfo_logging] value=[0]
***ERROR*** RCSSParamParser. unknown parameter name or invalid value. name=[hfo_max_frames] value=[-1]
***ERROR*** RCSSParamParser. unknown parameter name or invalid value. name=[hfo_max_trial_time] value=[500]
***ERROR*** RCSSParamParser. unknown parameter name or invalid value. name=[hfo_max_trials] value=[-1]
***ERROR*** RCSSParamParser. unknown parameter name or invalid value. name=[hfo_max_untouched_time] value=[100]
***ERROR*** RCSSParamParser. unknown parameter name or invalid value. name=[hfo_offense_on_ball] value=[0]
Launch npc base_right-1
Launch npc base_right-2
base_left 11:  KickTable created.
base_left 11: [-1, 0] recv error message [(error only_init_allowed_on_init_port)]

and then the server responds the agent with this message at every cycle of the game (and the monitor doesn't appear):

EndOfTrial: 0 / 3 308 OUT_OF_TIME
I0407 20:50:46.703956 11890 dqn_main.cpp:300] [Agent0] Episode 2 reward = 0.0376374
base_left 11: [308, 0] recv error message [(error illegal_command_form)]
base_left 11: [308, 0] lost dash? at [308, 0] sense=100 internal=101
base_left 11: [308, 0] lost turn_neck? at [308, 0] sense=323 internal=324
base_left 11: [308, 0] lost change_view? at [308, 0] sense=49 internal=50

HFO demos run smoothly and it seems like dqn can't send the commands to HFO properly.

State size 59 vs 58 error

Hi,
I tried the code and it works well on one of my machines, but on a new server running Ubuntu 14.04 it fails with the following strange error: the state size is 59, while it should be 58.
Can you enlighten me as to why this could happen?

Detailed error log:
...
Launch npc base_right-2
Launch npc base_right-3
Launch npc base_right-4
Launch npc base_right-5
Launch npc base_right-6
Launch npc base_right-7
base_left 11: [0, 7] see synch.
Launch npc base_right-8
Launch npc base_right-9
Launch npc base_right-10
Launch npc base_right-11
Checking all players connected
Starting game
F0823 00:37:30.385576 2669 dqn_main.cpp:110] Check failed: current_state.size() == dqn.state_size() (59 vs. 58)
*** Check failure stack trace: ***
@ 0x7f4619803dbd google::LogMessage::Fail()
@ 0x7f4619805c5d google::LogMessage::SendToLog()
@ 0x7f46198039ac google::LogMessage::Flush()
@ 0x7f461980657e google::LogMessageFatal::~LogMessageFatal()
@ 0x453f89 PlayOneEpisode()
@ 0x455479 KeepPlayingGames()
@ 0x4568df std::thread::_Impl<>::_M_run()
@ 0x7f461826ea60 (unknown)
@ 0x7f46192e0184 start_thread
@ 0x7f46179d637d (unknown)
Aborted (core dumped)

server down

Each time I run dqn, it ends with the error below: "Server Down!". It always appears after an Actor/Critic Iteration message.
What is the reason for this error, and how can I fix it?

I0711 15:58:31.193724 5373 dqn.cpp:807] [Agent0] Critic Iteration 64000, loss = 0.00130656
I0711 15:58:31.193823 5373 dqn.cpp:813] [Agent0] Actor Iteration 64000, avg_q_value = 0.0267761
base_left 11: [154620, 0] recv error message [(error illegal_command_form)]
base_left 11: waited 5 seconds. server down??
F0711 15:58:36.237735 5373 hfo_game.cpp:114] Server Down!
*** Check failure stack trace: ***
@ 0x7fbacb009daa (unknown)
@ 0x7fbacb009ce4 (unknown)
@ 0x7fbacb0096e6 (unknown)
@ 0x7fbacb00c687 (unknown)
@ 0x458189 HFOGameState::update()
@ 0x42c683 PlayOneEpisode()
@ 0x42e864 KeepPlayingGames()
@ 0x42fd6f std::thread::_Impl<>::_M_run()
@ 0x7fbac9eb6a60 (unknown)
@ 0x7fbaca9b6184 start_thread
@ 0x7fbac961e37d (unknown)
@ (nil) (unknown)
[1] 5370 abort (core dumped) ./dqn -save state/test -alsologtostderr

I was trying out dqn-hfo and...

It looks like you're using a branch of HFO (I forget the name, but there's only one). I tried to build that branch and it seems to hang after it asks me for passwords. Is it stable enough that I should go this route? Or should I stick to the master version of HFO and then find a version of dqn-hfo that works with it?

I'm interested in the HFO interface. It seems like a cool concept. I'd like to follow what you're doing with it, so I'd prefer to run the latest code if possible.
