
sea-snell / implicit-language-q-learning


Official code from the paper "Offline RL for Natural Language Generation with Implicit Language Q Learning"

Home Page: https://sea-snell.github.io/ILQL_site/

License: MIT License

Languages: Python 86.50%, Shell 13.50%

Topics: implicit-q-learning, iql, language-model, nlp, offline-rl, python, pytorch, q-learning, reinforcement-learning

implicit-language-q-learning's People

Contributors

sea-snell

implicit-language-q-learning's Issues

Error Running Monte Carlo Policy for Wordle

Hi. When I try to run the Wordle game using the Monte Carlo policy, it fails with an error. I am using Python 3.9.7 on Ubuntu 22.04.

Here is my terminal output:
[screenshot of terminal error output]

(I hope the screenshot works.)
The code I am running is the playing-Wordle script you provided (https://github.com/Sea-Snell/Implicit-Language-Q-Learning#playing-wordle), edited to use the basic Monte Carlo policy script (https://github.com/Sea-Snell/Implicit-Language-Q-Learning#montecarlopolicy) and to print the reward.
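
The overall structure of what I'm running is roughly the self-contained sketch below. In my real script the environment and policy are the WordleEnvironment and MonteCarloPolicy objects constructed exactly as in the two README snippets linked above; here they are stubbed out with made-up classes just to show the control flow and where I print the reward.

    # Self-contained sketch of my script's structure (stub classes, not the repo's API).
    class StubWordleEnv:
        def __init__(self):
            self.turns = 0

        def reset(self):
            self.turns = 0
            return "initial state"

        def is_terminal(self):
            return self.turns >= 6  # Wordle allows at most six guesses

        def step(self, action):
            self.turns += 1
            return "next state", -1.0  # dummy per-turn reward

    class StubMonteCarloPolicy:
        def act(self, state):
            return "crane"  # dummy guess

    env, policy = StubWordleEnv(), StubMonteCarloPolicy()
    state = env.reset()
    total_reward = 0.0
    while not env.is_terminal():
        action = policy.act(state)
        state, reward = env.step(action)
        total_reward += reward  # the reward-tracking part I added

    print("total reward:", total_reward)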

I appreciate any help you can provide in fixing this error. If you need more information, please don't hesitate to ask.
Thanks!

Question about stds reported in the paper

Hello!

Could you please explain how the standard deviations reported for performance in your paper are calculated? In RL, the std is usually reported over different training seeds, but I haven't seen a seed argument for training, and I also couldn't find this information in the paper.
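
For reference, this is the kind of seed-wise statistic I have in mind (a generic sketch of my own, not code from this repo; the numbers are made up purely to illustrate the computation):

    import numpy as np

    # Hypothetical final evaluation scores from runs with different training seeds.
    scores_per_seed = np.array([0.412, 0.398, 0.421, 0.405])

    mean = scores_per_seed.mean()
    std = scores_per_seed.std(ddof=1)  # sample std across seeds

    print(f"mean = {mean:.3f}, std = {std:.3f}")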

Question about the max steps

Hi,
Thanks for your great work. However, when I try to reproduce the results in the paper, I am unsure how to set the max steps (it is null in config.yaml).
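To make the question concrete, this is how I would guess a null max_steps is meant to be resolved; none of these names come from the repo, they only illustrate what I am asking:

    # Hypothetical sketch: how might the training script interpret max_steps: null?
    def resolve_max_steps(max_steps, epochs, steps_per_epoch):
        if max_steps is None:  # "null" in config.yaml
            return epochs * steps_per_epoch  # fall back to an epoch-based budget?
        return max_steps

    print(resolve_max_steps(None, epochs=10, steps_per_epoch=500))   # -> 5000
    print(resolve_max_steps(20000, epochs=10, steps_per_epoch=500))  # -> 20000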
I appreciate any suggestions.

Could not find the Euclidean distance based reward cache for the visual dialogue task

Dear authors,

Thank you for your great work and for sharing this comprehensive code base. I was wondering if you could please share the Euclidean distance based reward cache for the visual dialogue task, i.e., point ii below, quoted from the documentation. I downloaded data.zip from the Google Drive link, and it does not seem to contain [split]_rank_reward_cache2.json. Thank you!

reward_cache: Optional[str]=None – where the rewards for each dialogue are stored. If None, it will set all rewards to None. We provide caches for two reward functions:

i. The reward for the percentile-rank reward function we used in our paper is cached at: data/vis_dialogue/processed/visdial_0.5/[split]_rank_reward_cache1.json, where [split] is replaced by one of train, val, or test.

ii. The Euclidean distance based reward used by the paper Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning is cached at: data/vis_dialogue/processed/visdial_0.5/[split]_reward_cache2.json, where [split] is replaced by one of train, val, or test.
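
For reference, this is how I am trying to use the cache (a rough sketch; the path follows the documentation quoted above, while the assumption that the JSON maps dialogue ids to rewards is mine):

    import json

    # Sketch: load the cached Euclidean-distance rewards for one split.
    split = "train"
    cache_path = f"data/vis_dialogue/processed/visdial_0.5/{split}_reward_cache2.json"

    with open(cache_path) as f:
        reward_cache = json.load(f)  # assumed: dialogue id -> reward

    print(f"loaded {len(reward_cache)} entries from {cache_path}")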

There is no token_reward in wordle train_bc file

Dear authors,

Thanks for your great work. However, when I try to run train_bc.py for the Wordle task, I get the following error:

  File "/data2/xxx/Implicit-Language-Q-Learning/src/wordle/load_objects.py", line 99, in load_human_dataset
    token_reward = load_item(config['token_reward'], device, verbose=verbose)
KeyError: 'token_reward'

Could you please tell me more about how to resolve this?
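
In case it helps, this is the kind of config entry I guessed might be missing. Only the token_reward key comes from the KeyError above; every other name and value here is a placeholder I made up to illustrate the question:

    # Hypothetical sketch of a dataset config entry for the wordle train_bc script.
    dataset_config = {
        "name": "wordle_human_dataset",       # placeholder
        "token_reward": {
            "name": "constant_token_reward",  # placeholder; probably not the real registered name
            "c": 0.0,
        },
    }

    # The loader in load_objects.py then does something like:
    #   token_reward = load_item(config['token_reward'], device, verbose=verbose)
    print(dataset_config["token_reward"])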

I also have another question: why do we need to run train_bc and obtain the weights first? Why can't we train ILQL directly?

Thanks!

A question on beta hyperparameter

Hello, thanks for your work!

I was going through the code and found this line for the advantage:
https://github.com/Sea-Snell/Implicit-Language-Q-Learning/blob/4af8c5c12baf69c743dc9753e377a427109b7e93/src/models/iql_model.py#L329C15-L329C15

The beta parameter is supposed to control the impact of the reweighting, but in all of the provided configs beta is set to 0, which means the reweighting does not affect the output at all. That does not seem to be what we want. Am I missing something, or are the values in the configs simply incorrect? The model's default value of beta is 1, and the paper reports using other values for this parameter, while the configs set it to 0.
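
To illustrate why beta = 0 looks like a no-op to me, here is a simplified sketch of exponential advantage reweighting (my own illustration, not the exact expression at the linked line in iql_model.py):

    import torch

    # Simplified sketch of exponential advantage reweighting.
    advantages = torch.tensor([-0.5, 0.0, 1.2, 2.0])

    def reweight(advantages, beta):
        return torch.exp(beta * advantages)

    print(reweight(advantages, beta=1.0))  # weights differ per token/action
    print(reweight(advantages, beta=0.0))  # all weights are exactly 1.0 -> no reweighting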
