jmichaux / intrinsic-motivation


Using multiple sensor modalities to improve exploration for robotic manipulation tasks with sparse rewards

Python 44.95% Shell 55.05%

deep-reinforcement-learning intrinsic-motivation curiosity robotics exploration


intrinsic-motivation's Issues

MuJoCo errors

The Error

I'm getting the following error in MuJoCo:

raise MujocoException('Got MuJoCo Warning: {}'.format(warn))
mujoco_py.builder.MujocoException: Got MuJoCo Warning: Nan, Inf or huge value in QACC at DOF 0. The simulation is unstable. Time = 0.4000.

Based on several related reports, the error is caused by a simulation time step or actions that are too large.

My observations

There are two things that I noticed:

The entropy increases over time, from 27 to 165.
The actions predicted by the network are quite large, on the order of 1e7, around the time the bug occurs.
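
The two observations above may be related. As a rough sketch, assuming the policy is a diagonal Gaussian (the issue does not say so) and that the reported entropy is for a single action distribution rather than a sum over a batch, the entropy can be inverted to see what standard deviation it implies. The function names and the action dimension of 4 are assumptions made for illustration.

```python
import math
import numpy as np

# Constant entropy contribution of each Gaussian dimension with unit std.
_PER_DIM_CONST = 0.5 * (1.0 + math.log(2.0 * math.pi))

def diag_gaussian_entropy(log_std):
    """Entropy in nats of a diagonal Gaussian, given per-dimension log std."""
    log_std = np.asarray(log_std, dtype=np.float64)
    return float(np.sum(log_std + _PER_DIM_CONST))

def mean_log_std_for_entropy(entropy, action_dim):
    """Average per-dimension log std that would yield the given entropy."""
    return entropy / action_dim - _PER_DIM_CONST

ACTION_DIM = 4  # assumed for illustration; not stated in the issue

for reported_entropy in (27.0, 165.0):
    log_std = mean_log_std_for_entropy(reported_entropy, ACTION_DIM)
    print(f"entropy={reported_entropy}: implied mean log std = {log_std:.2f}")
```

Under these assumptions, a growing entropy means the policy's log standard deviation is growing, so sampled actions far outside the action bounds would be expected. Bounding the log standard deviation is one mitigation that is often tried in this situation.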

Reproducing the bug

First install the repo. Then, to reproduce the bug starting from the very beginning of the training run:

python main.py

To reproduce the bug starting at a later point in the training run:

python main.py --debug

Questions for debugging

Why does it matter that the actions are large? The environment clips the actions. See below:

https://github.com/jmichaux/multimodal-curiosity/blob/c507206a79595573fcd808cebcf60f98d26789db/multimodal_curiosity/envs/robot_env.py#L63
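
One detail worth checking here, as a sketch rather than a diagnosis: clipping with NumPy bounds large and infinite values, but it passes NaN through unchanged. Whether the network ever emits NaN in this repo is not known from the issue. The bounds of -1 and 1 and the helper names are assumptions for illustration.

```python
import numpy as np

def clip_action(action, low=-1.0, high=1.0):
    """Clip an action to [low, high]; the bounds here are assumed."""
    return np.clip(action, low, high)

def is_finite_action(action):
    """True only if every component is a finite number (no NaN or Inf)."""
    return bool(np.all(np.isfinite(action)))

huge = np.array([1e7, -1e7, 0.5, 0.0])
bad = np.array([np.nan, np.inf, 0.5, 0.0])

clipped_huge = clip_action(huge)  # large values are bounded as expected
clipped_bad = clip_action(bad)    # Inf is bounded, NaN survives the clip

print(clipped_huge)
print(clipped_bad)
print(is_finite_action(huge), is_finite_action(bad))
```

A check like `is_finite_action` before each environment step would show whether a non-finite action reaches the simulator despite the clip.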

Will I get errors if I always feed in zeros for the actions?

If I load the bad model, it doesn't matter if all the actions are zero. I still see the bug.

What if I change the environment?

I see the same error if I use FetchReachDense-v1 or FetchPushDense-v1.

Is my cpu the problem?
Do I need to normalize the observations?
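
On the normalization question, a common approach is to keep running statistics of the observations and standardize each observation before it reaches the network. The following is a minimal sketch of that idea; the class name, the clipping range of 5, and the flat observation vector are assumptions, not code from this repo.

```python
import numpy as np

class RunningNormalizer:
    """Tracks a running mean and variance and standardizes observations."""

    def __init__(self, shape, clip=5.0):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = 1e-8  # tiny prior count avoids division by zero
        self.clip = clip

    def update(self, batch):
        """Merge a batch of observations (rows) into the running statistics."""
        batch = np.asarray(batch, dtype=np.float64)
        b_mean, b_var, b_count = batch.mean(axis=0), batch.var(axis=0), batch.shape[0]
        delta = b_mean - self.mean
        total = self.count + b_count
        # Parallel-variance merge of the old and new statistics.
        m2 = (self.var * self.count + b_var * b_count
              + delta ** 2 * self.count * b_count / total)
        self.mean = self.mean + delta * b_count / total
        self.var = m2 / total
        self.count = total

    def __call__(self, obs):
        """Standardize an observation and clip it to [-clip, clip]."""
        z = (np.asarray(obs, dtype=np.float64) - self.mean) / np.sqrt(self.var + 1e-8)
        return np.clip(z, -self.clip, self.clip)

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=2.0, size=(200, 3))
normalizer = RunningNormalizer(shape=(3,))
normalizer.update(data[:100])
normalizer.update(data[100:])
normalized = normalizer(data)
```

Clipping the standardized values keeps a single outlier observation from producing very large network inputs.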

Is this based on a research paper?

I'm trying to do research and beat SOTA for a school project, and I want to know whether this is based on a paper that I could improve upon.

ModuleNotFoundError: No module named 'multimodal_envs'

Hi, jmichaux!
I'm interested in your project and I've installed all the libraries you mentioned. I get the following error when I run main.py:

ModuleNotFoundError: No module named 'multimodal_envs'

I can't find any helpful information, and any help would be appreciated. Thanks.

When should intrinsic rewards be given?

Right now intrinsic rewards are given on every transition. Intuitively, this isn't quite right because we end up rewarding the agent when it dies. This could, in theory, lead to excessive exploration à la the Noisy TV problem. In practice it doesn't really seem to matter. But it would be interesting to see whether we can speed up learning by giving the exploration bonus only on transitions where the agent doesn't fail.
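
The proposed change could be sketched as a mask applied to the bonus before it is added to the environment reward. This is only an illustration: the function name, the `failed` flag, and the weight `beta` are hypothetical and do not come from this repo.

```python
import numpy as np

def combine_rewards(extrinsic, intrinsic, failed, beta=0.1):
    """Add a weighted exploration bonus, except on transitions flagged as failures.

    extrinsic, intrinsic: per-transition rewards.
    failed: booleans, True where the agent failed on that transition.
    beta: bonus weight (hypothetical value).
    """
    extrinsic = np.asarray(extrinsic, dtype=np.float64)
    intrinsic = np.asarray(intrinsic, dtype=np.float64)
    keep = 1.0 - np.asarray(failed, dtype=np.float64)  # 0 on failures, 1 otherwise
    return extrinsic + beta * keep * intrinsic

total = combine_rewards(
    extrinsic=[0.0, -1.0],
    intrinsic=[2.0, 4.0],
    failed=[False, True],
    beta=0.5,
)
print(total)
```

In the usage above, the second transition is flagged as a failure, so it keeps only its environment reward.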

Parallelize environments and/or training?

Right now I am only parallelizing the environments to collect more data using the same agent. Does it make sense to use multiple agents for updating the weights? How would I do this? MPI?
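
One pattern that might fit is synchronous data-parallel training: each worker computes gradients on its own rollouts, the gradients are averaged, and every worker applies the same update. The sketch below simulates the workers inside one process and uses a linear least-squares loss as a stand-in for the policy loss. All names are hypothetical. With MPI, the averaging step would typically be an all-reduce across processes.

```python
import numpy as np

def local_gradient(weights, x, y):
    """Gradient of 0.5 * mean((x @ weights - y)**2) on one worker's data."""
    residual = x @ weights - y
    return x.T @ residual / len(y)

def averaged_step(weights, shards, lr=0.1):
    """One synchronous update: average per-worker gradients, then step."""
    grads = [local_gradient(weights, x, y) for x, y in shards]
    return weights - lr * np.mean(grads, axis=0)

rng = np.random.default_rng(0)
x_all = rng.normal(size=(8, 3))
y_all = rng.normal(size=8)
w0 = np.zeros(3)

# Two simulated workers, each holding an equal-sized shard of the data.
shards = [(x_all[:4], y_all[:4]), (x_all[4:], y_all[4:])]
w1 = averaged_step(w0, shards)
```

With equal-sized shards, the averaged gradient equals the gradient over the combined data, so this is equivalent to a single agent training on a larger batch.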

Exploration bonus makes training worse

The exploration bonus makes training worse.
