The examples explain optimizing GPT-2 text generation, but can it also be used to opti

Usage for masked word prediction (trl issue, closed)

huggingface commented on September 26, 2024

Usage for masked word prediction

Comments (2)

ozyyshr commented on September 26, 2024

Hi, thanks for the great work. I also want to know whether and how it can be used for masked token predictions. Thanks in advance!

lvwerra commented on September 26, 2024

Reinforcement learning is designed for sequential decision problems and thus works well for causal language modeling (such as GPT-2), where each generated token is one step in a sequence of decisions. BERT, however, does not fall into that category: masked word prediction is a one-shot prediction, not a sequential one as in language modeling. So I don't think it is straightforward to adapt this approach.
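The distinction between sequential and one-shot prediction can be illustrated with a small toy sketch. Nothing here is trl or transformers code: `toy_policy`, `sample_causal`, `predict_masked`, and the four-token `VOCAB` are hypothetical names invented for illustration, and the "model" is a hand-written probability table rather than GPT-2 or BERT. The only point is to show that causal generation yields one decision (and one log-probability) per generated token, while filling a single masked position yields exactly one.

```python
import math
import random

# Hypothetical toy vocabulary; a real model has tens of thousands of tokens.
VOCAB = ["a", "b", "c", "<eos>"]


def toy_policy(prefix):
    """Hypothetical stand-in for a language model.

    Returns a probability distribution over VOCAB given the tokens seen so far.
    The end-of-sequence token becomes more likely as the prefix grows.
    """
    p_eos = min(0.9, 0.1 + 0.2 * len(prefix))
    p_other = (1.0 - p_eos) / (len(VOCAB) - 1)
    return [p_other, p_other, p_other, p_eos]


def sample_causal(rng, max_len=8):
    """Sequential case: every generated token is one action in an episode."""
    tokens, logprobs = [], []
    for _ in range(max_len):
        probs = toy_policy(tokens)  # the next decision depends on earlier ones
        idx = rng.choices(range(len(VOCAB)), weights=probs)[0]
        tokens.append(VOCAB[idx])
        logprobs.append(math.log(probs[idx]))
        if VOCAB[idx] == "<eos>":
            break
    return tokens, logprobs


def predict_masked(rng, context):
    """One-shot case: a single prediction fills the single masked position."""
    visible = [t for t in context if t != "<mask>"]
    probs = toy_policy(visible)
    idx = rng.choices(range(len(VOCAB)), weights=probs)[0]
    return VOCAB[idx], [math.log(probs[idx])]


rng = random.Random(0)
tokens, causal_logprobs = sample_causal(rng)
filled, masked_logprobs = predict_masked(rng, ["a", "<mask>", "c"])

print(len(causal_logprobs), "decision(s) in the causal episode")
print(len(masked_logprobs), "decision(s) in the masked prediction")
```

In the causal case a reward on the finished text has to be credited across a whole trajectory of token-level decisions, which is the setting the comment above describes as a natural fit for reinforcement learning. In the masked case the trajectory has length one, so there is no sequence of decisions to optimize over.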

