
gdc's Introduction

Generative Distributional Control

Generative Distributional Control (GDC) is a general framework for imposing constraints on samples of pretrained language models. The constraints can be either pointwise (e.g. all samples must be non-offensive) or distributional (e.g. a specified percentage of samples must mention females).
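Concretely, both kinds of constraints can be viewed as feature functions over samples paired with target expectations; a pointwise constraint is simply the special case whose target moment is 1. The sketch below is purely illustrative (the feature names, regexes, and the `constraints` list are assumptions for exposition, not the repo's actual API):

```python
import re

def non_offensive(text: str) -> float:
    """Hypothetical pointwise constraint: 1.0 if the sample passes a
    placeholder offensiveness check, else 0.0. Target moment = 1.0,
    i.e. every sample must satisfy it."""
    return 0.0 if re.search(r"\b(badword1|badword2)\b", text.lower()) else 1.0

def mentions_female(text: str) -> float:
    """Hypothetical distributional feature: 1.0 if the sample mentions a
    female entity. A target moment of 0.5 asks that 50% of samples satisfy
    it on average, rather than each sample individually."""
    return 1.0 if re.search(r"\b(she|her|woman|women)\b", text.lower()) else 0.0

# (feature function, target expectation under the desired distribution p)
constraints = [(non_offensive, 1.0), (mentions_female, 0.5)]
```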

This repo contains code accompanying the following three papers:

gdc's People

Contributors

germank, hadyelsahar, mukhal, tomekkorbak



gdc's Issues

learning with discriminator.json converges too slowly

Hi, thanks for the great work.

I'm trying to reproduce the experiment on pointwise constraints.
However, training seems to converge very slowly on discriminator.json specifically, in contrast to the other configurations.
After 7 days of training on two V100 GPUs, Eval/b(x)_mean is still fluctuating around 0.25–0.28 (compared to the result reported in the paper). Eval/KL(p || pi) is still decreasing, so I believe training has not converged yet.

Are there any differences between the configuration in this repo and the one used to train the model that produced the results in the paper?
Or could you give any recommendations for speeding up training?

Thanks in advance!
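For reference, Eval/b(x)_mean mentioned above is the average constraint satisfaction of samples drawn from the current policy, i.e. a Monte Carlo estimate of E_{x~pi}[b(x)]. A rough sketch of how such an estimate can be computed, assuming a Hugging Face-style policy and tokenizer; `scorer` and the sampling parameters are placeholders, not the repo's actual evaluation code:

```python
import torch

@torch.no_grad()
def estimate_b_mean(policy, tokenizer, scorer, n_samples=512, batch=32, max_len=40):
    """Estimate E_{x ~ pi}[b(x)]: sample from the policy and average a
    binary constraint score b(x). `scorer` stands in for e.g. a
    discriminator/classifier returning 0 or 1 per generated text."""
    scores = []
    for _ in range(n_samples // batch):
        ids = policy.generate(
            input_ids=torch.full((batch, 1), tokenizer.bos_token_id),
            do_sample=True,
            max_length=max_len,
        )
        texts = tokenizer.batch_decode(ids, skip_special_tokens=True)
        scores.extend(scorer(t) for t in texts)
    return sum(scores) / len(scores)
```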

effect of reducing batch_size and forward_batch_size

Hi! I have a clarification question about batch_size and forward_batch_size in the experiment configurations.

The paper mentions that all experiments use a batch size of 2048. This large batch size (together with a forward_batch_size of 256, as in configs/gdc/pointwise/word.json) requires the full 48 GB of GPU memory and is unfortunately too much for my hardware (I only have 12 GB GPUs, sadly...).

That said, I also noticed that batch_size and forward_batch_size are reduced to 64 and 32, respectively, in configs/gdc/pointwise/wordlist.json, while the rest of the configuration remains the same as in word.json. With these smaller batch sizes, I was able to train the GDC model comfortably on my modest GPUs.

My question is: how do batch_size and forward_batch_size affect the results? Do you think I can reach similar performance if I use the provided wordlist.json configuration as is (i.e., without changing anything in the file), in particular with the smaller batch sizes (64 and 32, respectively)? I am aware that I could keep batch_size at 2048 and use a small forward_batch_size such as 32, but that runs extremely slowly... 64 and 32 are preferred for my use case.

Thanks in advance! By the way, great work and thanks for sharing the code!
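For readers hitting the same memory limits: in trainers of this style, batch_size is typically the number of samples per optimization step, while forward_batch_size only controls how many of those samples go through the model per forward pass, so it trades memory for wall-clock time without changing the gradient. A minimal sketch of that micro-batching idea (function and argument names are illustrative, not the repo's code):

```python
def step_with_micro_batches(model, optimizer, batch, forward_batch_size, compute_loss):
    """Illustrative gradient accumulation: `batch` is one logical batch
    (e.g. 2048 samples); only forward_batch_size samples are pushed through
    the model at a time, so peak memory depends on the chunk size while the
    optimizer still steps on the full-batch gradient."""
    optimizer.zero_grad()
    n_chunks = 0
    for start in range(0, len(batch), forward_batch_size):
        chunk = batch[start:start + forward_batch_size]
        loss = compute_loss(model, chunk)  # placeholder per-chunk loss
        loss.backward()                    # gradients accumulate across chunks
        n_chunks += 1
    for p in model.parameters():
        if p.grad is not None:
            p.grad /= n_chunks             # average instead of sum
    optimizer.step()
```

Under that reading, lowering only forward_batch_size should mainly cost speed, whereas lowering batch_size itself (as in wordlist.json) changes the optimization: the per-step gradient and any per-batch statistics are estimated from fewer samples, so results may differ from the 2048-sample runs reported in the paper.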
