
sea-snell / implicit-language-q-learning


Official code from the paper "Offline RL for Natural Language Generation with Implicit Language Q Learning"

Home Page: https://sea-snell.github.io/ILQL_site/

License: MIT License

Languages: Python 86.50%, Shell 13.50%

Topics: implicit-q-learning, iql, language-model, nlp, offline-rl, python, pytorch, q-learning, reinforcement-learning

implicit-language-q-learning's People

Contributors

sea-snell

implicit-language-q-learning's Issues

Error Running Monte Carlo Policy for Wordle

Hi. When I try to run the Wordle game using the Monte Carlo policy, it fails with an error. I am using Python 3.9.7 on Ubuntu 22.04.

Here is my terminal output:
[screenshot of terminal error output]

(I hope the screenshot works.)
The code I am running is the playing-Wordle script you provided (https://github.com/Sea-Snell/Implicit-Language-Q-Learning#playing-wordle), edited to use the basic Monte Carlo policy script (https://github.com/Sea-Snell/Implicit-Language-Q-Learning#montecarlopolicy) and to print the reward.
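
The overall structure of what I'm running is roughly the self-contained sketch below. In my real script the environment and policy are the WordleEnvironment and MonteCarloPolicy objects constructed exactly as in the two README snippets linked above; here they are stubbed out with made-up classes just to show the control flow and where I print the reward.

    # Self-contained sketch of my script's structure (stub classes, not the repo's API).
    class StubWordleEnv:
        def __init__(self):
            self.turns = 0

        def reset(self):
            self.turns = 0
            return "initial state"

        def is_terminal(self):
            return self.turns >= 6  # Wordle allows at most six guesses

        def step(self, action):
            self.turns += 1
            return "next state", -1.0  # dummy per-turn reward

    class StubMonteCarloPolicy:
        def act(self, state):
            return "crane"  # dummy guess

    env, policy = StubWordleEnv(), StubMonteCarloPolicy()
    state = env.reset()
    total_reward = 0.0
    while not env.is_terminal():
        action = policy.act(state)
        state, reward = env.step(action)
        total_reward += reward  # the reward-tracking part I added

    print("total reward:", total_reward)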

I appreciate any help you can provide in fixing this error. If you need more information, please don't hesitate to ask.
Thanks!

Question about stds reported in the paper

Hello!

Could you please explain how the standard deviations reported for performance in your paper are calculated? In RL, the std is usually reported over different training seeds, but I haven't seen a seed argument for training, and I also couldn't find this information in the paper.
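
For reference, this is the kind of seed-wise statistic I have in mind (a generic sketch of my own, not code from this repo; the numbers are made up purely to illustrate the computation):

    import numpy as np

    # Hypothetical final evaluation scores from runs with different training seeds.
    scores_per_seed = np.array([0.412, 0.398, 0.421, 0.405])

    mean = scores_per_seed.mean()
    std = scores_per_seed.std(ddof=1)  # sample std across seeds

    print(f"mean = {mean:.3f}, std = {std:.3f}")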

Question about the max steps

Hi,
Thanks for your great work. However, when I try to reproduce the results in the paper, I am unsure how to set the max steps (it is null in config.yaml).
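To make the question concrete, this is how I would guess a null max_steps is meant to be resolved; none of these names come from the repo, they only illustrate what I am asking:

    # Hypothetical sketch: how might the training script interpret max_steps: null?
    def resolve_max_steps(max_steps, epochs, steps_per_epoch):
        if max_steps is None:  # "null" in config.yaml
            return epochs * steps_per_epoch  # fall back to an epoch-based budget?
        return max_steps

    print(resolve_max_steps(None, epochs=10, steps_per_epoch=500))   # -> 5000
    print(resolve_max_steps(20000, epochs=10, steps_per_epoch=500))  # -> 20000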
I appreciate any suggestions.

Could not find the Euclidean distance based reward cache for the visual dialogue task

Dear authors,

Thank you for your great work and for sharing this comprehensive code base. I was wondering if you could please share the Euclidean distance based reward cache for the visual dialogue task, i.e., point ii below, quoted from the documentation. I downloaded data.zip from the Google Drive link, and it does not seem to contain [split]_rank_reward_cache2.json. Thank you!

reward_cache: Optional[str]=None – where the rewards for each dialogue are stored. If None, it will set all rewards to None. We provide caches for two reward functions:

i. The reward for the percentile-rank reward function we used in our paper is cached at: data/vis_dialogue/processed/visdial_0.5/[split]_rank_reward_cache1.json, where [split] is replaced by one of train, val, or test.

ii. The Euclidean distance based reward used by the paper Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning is cached at: data/vis_dialogue/processed/visdial_0.5/[split]_reward_cache2.json, where [split] is replaced by one of train, val, or test.
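
For reference, this is how I am trying to use the cache (a rough sketch; the path follows the documentation quoted above, while the assumption that the JSON maps dialogue ids to rewards is mine):

    import json

    # Sketch: load the cached Euclidean-distance rewards for one split.
    split = "train"
    cache_path = f"data/vis_dialogue/processed/visdial_0.5/{split}_reward_cache2.json"

    with open(cache_path) as f:
        reward_cache = json.load(f)  # assumed: dialogue id -> reward

    print(f"loaded {len(reward_cache)} entries from {cache_path}")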

There is no token_reward in wordle train_bc file

Dear authors,

Thanks for your great work. However, when I try to run train_bc.py for the Wordle task, I get the following error:

  File "/data2/xxx/Implicit-Language-Q-Learning/src/wordle/load_objects.py", line 99, in load_human_dataset
    token_reward = load_item(config['token_reward'], device, verbose=verbose)
KeyError: 'token_reward'

Could you please tell me more about how to resolve this?
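
In case it helps, this is the kind of config entry I guessed might be missing. Only the token_reward key comes from the KeyError above; every other name and value here is a placeholder I made up to illustrate the question:

    # Hypothetical sketch of a dataset config entry for the wordle train_bc script.
    dataset_config = {
        "name": "wordle_human_dataset",       # placeholder
        "token_reward": {
            "name": "constant_token_reward",  # placeholder; probably not the real registered name
            "c": 0.0,
        },
    }

    # The loader in load_objects.py then does something like:
    #   token_reward = load_item(config['token_reward'], device, verbose=verbose)
    print(dataset_config["token_reward"])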

I also have another question: why do we need to run train_bc and obtain the weights first? Why can't we train ILQL directly?

Thanks!

A question on beta hyperparameter

Hello, thanks for your work!

I was going through the code and found this line for the advantage:
https://github.com/Sea-Snell/Implicit-Language-Q-Learning/blob/4af8c5c12baf69c743dc9753e377a427109b7e93/src/models/iql_model.py#L329C15-L329C15

The beta parameter is supposed to control the impact of the reweighting, but in all of the provided configs beta is set to 0, which means the reweighting does not affect the output at all. That does not seem to be what we want. Am I missing something, or are the values in the configs simply incorrect? The model's default value of beta is 1, and the paper reports using other values for this parameter, while the configs set it to 0.
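
To illustrate why beta = 0 looks like a no-op to me, here is a simplified sketch of exponential advantage reweighting (my own illustration, not the exact expression at the linked line in iql_model.py):

    import torch

    # Simplified sketch of exponential advantage reweighting.
    advantages = torch.tensor([-0.5, 0.0, 1.2, 2.0])

    def reweight(advantages, beta):
        return torch.exp(beta * advantages)

    print(reweight(advantages, beta=1.0))  # weights differ per token/action
    print(reweight(advantages, beta=0.0))  # all weights are exactly 1.0 -> no reweighting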
