Minimal implementation of Decision Transformer: Reinforcement Learning via Sequence Modeling in PyTorch for MuJoCo control tasks in OpenAI Gym

License: MIT License

Python 100.00%

deep-learning deep-reinforcement-learning machine-learning mujoco offline-reinforcement-learning openai-gym pytorch pytorch-transformers reinforcement-learning robotics transformer


min-decision-transformer's Issues

Any plan to reproduce the atari experiments as well?

Thank you for your simplified code! The original DT repo uses minGPT for the Atari experiments as well, so a reproduction is (theoretically) possible. Is there any plan to include this part?

How to calculate the last result scores in the table?

As shown in the table, DT (this repo) got 69.43 on Hopper and 75.47 on Walker2d. I would like to ask how these scores are calculated from the CSV results.
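For context, a common convention for D4RL results is the normalized score, which rescales a raw return so that a random policy maps to 0 and an expert policy maps to 100. The sketch below assumes the table follows that convention; the reference returns and evaluation returns are hypothetical placeholders, and which CSV rows are averaged is something only the author can confirm.

```python
def d4rl_normalized_score(raw_return, ref_min, ref_max):
    """Rescale a raw return so ref_min maps to 0 and ref_max maps to 100."""
    return 100.0 * (raw_return - ref_min) / (ref_max - ref_min)


# Hypothetical reference returns, for illustration only (not real D4RL values).
REF_MIN = 0.0     # assumed return of a random policy
REF_MAX = 2000.0  # assumed return of an expert policy

# Hypothetical raw evaluation returns, e.g. one per logged evaluation.
eval_returns = [1200.0, 1500.0, 1300.0]

avg_return = sum(eval_returns) / len(eval_returns)
score = d4rl_normalized_score(avg_return, REF_MIN, REF_MAX)
print(f"normalized score: {score:.2f}")
```

Averaging raw returns and then normalizing gives the same number as normalizing each return and then averaging, because the mapping is affine.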

Training is ok, but failed to eval.

Hello👋,

Thank you for open-sourcing the code for the min decision transformer. Your code has been tremendously helpful in helping me understand DT.

However, I am currently facing an issue. During training, the action loss decreases steadily, but the evaluation results have been consistently poor, to the point that training appears to have no discernible effect. I've been grappling with this problem for a while now and can't figure out why it is happening.

By the way, I haven't tested it on the three environments, namely halfcheetah, hopper, and walker2d, mainly because I've been struggling with configuring d4rl. I'm using the upgraded version of d4rl provided by Farama, specifically on the pointmaze offline dataset.

If you could spare some time to assist me with this, I would be immensely grateful!

The position of the `dropout` operation differs from the official repo?

In the official repo, the attention part is:

    w = nn.Softmax(dim=-1)(w)
    w = self.attn_dropout(w)
    # Mask heads if we want to
    if head_mask is not None:
        w = w * head_mask

    outputs = [torch.matmul(w, v)]

Here the dropout is applied directly after the softmax and before the matmul.

On the other hand, in this implementation:

    normalized_weights = F.softmax(weights, dim=-1)
    # attention (B, N, T, D)
    # normalized_weights.shape: (B, N, T, T)
    # v.shape: (B, N, T, D)
    attention = self.att_drop(normalized_weights @ v)

Here the dropout is applied last, after the matmul.

In my opinion, the two are different. What do you think? :-)
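To make the difference concrete, here is a minimal sketch that uses NumPy as a stand-in for PyTorch, with a hand-written inverted dropout (the `dropout` helper and all sizes are illustrative assumptions, not code from either repo). Dropping entries of the attention weights removes individual token-to-token links, while dropping entries of the output zeroes whole output features.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, p = 4, 3, 0.5  # sequence length, head dim, dropout prob (illustrative)


def dropout(x, p, rng):
    """Inverted dropout: zero each entry with probability p, rescale the rest."""
    if p == 0.0:
        return x
    keep = rng.random(x.shape) >= p
    return x * keep / (1.0 - p)


# Softmax over random logits to get (T, T) attention weights.
logits = rng.normal(size=(T, T))
w = np.exp(logits - logits.max(axis=-1, keepdims=True))
w = w / w.sum(axis=-1, keepdims=True)
v = rng.normal(size=(T, D))

# Ordering quoted from the official repo: the mask has shape (T, T).
out_weights_dropped = dropout(w, p, rng) @ v

# Ordering quoted from this repo: the mask has shape (T, D).
out_output_dropped = dropout(w @ v, p, rng)

print(out_weights_dropped)
print(out_output_dropped)
```

Both variants have the same expectation as `w @ v`, and they coincide when `p` is 0 (for example at evaluation time), but during training they inject noise in different places.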

The calculation of state_mean and state_std in d4rl_info.py

Hello there, we are currently trying to use the code to give some other dataset a go, like walker2d-random-v2. So we would like to kindly ask how state_mean and state_std in d4rl_info are calculated, or whether there is any open data for these values. Thank you very much!
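For anyone with the same question, one plausible approach is to take the per-dimension mean and standard deviation over every observation in the dataset. The sketch below assumes that convention and assumes each trajectory is a dict with an `"observations"` array; the helper name `compute_state_stats`, the `eps` value, and the synthetic data are hypothetical, so please check against the repo before relying on it.

```python
import numpy as np


def compute_state_stats(trajectories, eps=1e-6):
    """Per-dimension mean and std over all observations of all trajectories."""
    states = np.concatenate([t["observations"] for t in trajectories], axis=0)
    state_mean = states.mean(axis=0)
    # eps keeps the later division safe when a dimension is constant.
    state_std = states.std(axis=0) + eps
    return state_mean, state_std


# Synthetic stand-in for a dataset: two trajectories with 17-dim observations.
rng = np.random.default_rng(0)
trajectories = [
    {"observations": rng.normal(loc=2.0, scale=3.0, size=(50, 17))},
    {"observations": rng.normal(loc=2.0, scale=3.0, size=(80, 17))},
]

state_mean, state_std = compute_state_stats(trajectories)

# The statistics are then used to normalize observations before the model sees them.
normalized = (trajectories[0]["observations"] - state_mean) / state_std
print(state_mean.shape, state_std.shape)
```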

oscillations of eval score

Hi nik!

I have trained on walker2d and other environments several times. The settings and hyperparameters all follow the original DT. I noticed some strange behavior: in most environments, DT reaches its best score between 10,000 and 20,000 training steps, with no obvious improvement after 20,000 steps. Sometimes a trough also occurs during that period. Could you give me some clues about this?

Debugging a custom gym environment

Hey,
I am trying to train this on my custom gym environment but the model isn't learning at all. Any idea what could be the probable cause?

nikhilbarhate99 / min-decision-transformer

