vinairesearch / bartpho

BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese (INTERSPEECH 2022)

License: MIT License

bartpho sequence-to-sequence bart pretrained-models text-summarization vietnamese-nlp

bartpho's Issues

Using BARTpho for Text summarization on Vietnamese

Thank you so much for this fantastic work. I'm currently doing research on the BART model and have some questions about the BARTpho model that I hope you can clarify.
I have been using this training paradigm, https://github.com/yixinL7/BRIO, for text summarization. It uses the pretrained BART model facebook/bart-large-cnn as the baseline and achieves pretty good results on the CNN/DM dataset. I figured I could replace the baseline model with BARTpho and do the same with my custom dataset, but the validation results during training were quite poor no matter how I changed the configuration.
So my questions are:

Do I need to fine-tune the BARTpho model on my custom dataset? I guess the facebook/bart-large-cnn model achieves good results because it has already been fine-tuned on the CNN/DM dataset.
If so, can you show me how to fine-tune BARTpho on Colab? Since it is a large model, I don't think Colab has enough resources.
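A rough sketch of memory-saving settings one might try when fine-tuning on a single Colab GPU. It assumes a recent `transformers` release that provides `Seq2SeqTrainingArguments`; the helper names, output directory and all numeric values are hypothetical starting points, not settings from the BARTpho authors.

```python
def colab_training_kwargs(output_dir):
    """Hypothetical memory-saving settings for a single small GPU (e.g. Colab)."""
    return {
        "output_dir": output_dir,
        # Small per-step batch; accumulate gradients to emulate a larger batch.
        "per_device_train_batch_size": 2,
        "gradient_accumulation_steps": 16,
        # Mixed precision reduces activation memory on CUDA GPUs.
        "fp16": True,
        # Recompute activations in the backward pass instead of storing them all.
        "gradient_checkpointing": True,
        "learning_rate": 3e-5,
        "num_train_epochs": 3,
        # Run generate() during evaluation so metrics like ROUGE see decoded text.
        "predict_with_generate": True,
    }


def effective_batch_size(kwargs, num_devices=1):
    """Number of examples contributing to one optimizer step."""
    return (kwargs["per_device_train_batch_size"]
            * kwargs["gradient_accumulation_steps"]
            * num_devices)


def build_training_args(output_dir):
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(**colab_training_kwargs(output_dir))
```

Because `fp16=True` needs a CUDA device, `build_training_args` is meant to be called on a GPU runtime; the trade-off is slower steps (checkpointing, accumulation) in exchange for fitting the model in memory.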

Please give sample code for text summarization

Hello @datquocnguyen,
Can you provide sample code for the text summarization task with this model?

Thank you.
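A hedged sketch of summarization inference, assuming you already have a BARTpho checkpoint fine-tuned for summarization saved in a local directory (`checkpoint_dir` is hypothetical). The released `vinai/bartpho-*` checkpoints are pre-trained with a denoising objective, so they are not expected to produce summaries without fine-tuning. The helper names, the 1024-token source limit and the beam-search values are illustrative assumptions.

```python
def generation_kwargs(num_beams=4, max_new_tokens=128):
    """Beam-search settings for generate(); the values are illustrative, not tuned."""
    if num_beams < 1 or max_new_tokens < 1:
        raise ValueError("num_beams and max_new_tokens must be positive")
    return {
        "num_beams": num_beams,
        "max_new_tokens": max_new_tokens,
        "early_stopping": True,
        "no_repeat_ngram_size": 3,
    }


def summarize(texts, checkpoint_dir, max_source_length=1024, **gen_overrides):
    """Summarize a list of documents with a (hypothetical) fine-tuned checkpoint."""
    # Lazy imports: torch and transformers are only needed when summarizing.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint_dir)
    model.eval()
    batch = tokenizer(texts, max_length=max_source_length, truncation=True,
                      padding=True, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**batch, **{**generation_kwargs(), **gen_overrides})
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

Keyword arguments passed to `summarize` override the defaults, e.g. `summarize(docs, checkpoint_dir, num_beams=6)`.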

Cannot load BartPhoTokenizer from pretrained when work with huggingface/transformers

Hi authors,

I get an error when I load the pretrained tokenizer as below:

from transformers import AutoModel, AutoTokenizer
bartpho_syllable = AutoModel.from_pretrained("vinai/bartpho-syllable")
syllable_tokenizer = AutoTokenizer.from_pretrained("vinai/bartpho-syllable", use_fast=True)

AttributeError: module transformers.models.bartpho has no attribute BartphoTokenizer

I'm using Google Colab. Could you let me know how to resolve this issue?
Thanks.
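This AttributeError is most likely what transformers raises when `sentencepiece` is not installed: the SentencePiece-based `BartphoTokenizer` is then not exposed from `transformers.models.bartpho`. As far as I know, only the slow tokenizer for BARTpho is in the main transformers release, so `use_fast=True` cannot give a fast one anyway. A minimal sketch under those assumptions that checks the dependency first; the helper names are hypothetical.

```python
import importlib.util


def missing_tokenizer_deps():
    """Return optional packages the slow BARTpho tokenizer needs that are not installed."""
    # Assumption: BartphoTokenizer is SentencePiece-based, so transformers only
    # exposes it when the `sentencepiece` package is importable.
    return [name for name in ("sentencepiece",) if importlib.util.find_spec(name) is None]


def load_bartpho_tokenizer(name="vinai/bartpho-syllable"):
    missing = missing_tokenizer_deps()
    if missing:
        raise ImportError(
            f"Install {', '.join(missing)} (e.g. `pip install {' '.join(missing)}`) "
            "and restart the runtime before loading the tokenizer."
        )
    from transformers import AutoTokenizer

    # Request the slow (SentencePiece) tokenizer explicitly.
    return AutoTokenizer.from_pretrained(name, use_fast=False)
```

On Colab, run `pip install sentencepiece` and restart the runtime, since a `transformers` module that was already imported will not pick up the new package.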

Do you have plan to release bartpho version 2 ?

Hi Mr. Đạt,
Do you plan to release a version 2 of bartpho-word-base, trained on 20GB of Wikipedia and news text plus 120GB of text from OSCAR-2301, like phobert-base?

Multiple Mask Tokens

I want to ask about multiple mask tokens. For example, given TXT = "chúng tôi [mask] nghiên [mask] viên", I want to return the top_k candidates for the [mask] at position 1 and the top_k candidates for the [mask] at position 2 at the same time. Does the model support this?
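BARTpho is a sequence-to-sequence model trained with BART-style text infilling, where one mask token may stand for a span of zero or more tokens, so it reconstructs the whole sentence rather than scoring each mask position independently like BERT. One workaround, sketched here as an assumption rather than a supported feature, is to take the k best beam-search reconstructions and read both filled positions from each. It assumes the tokenizer's mask token is `<mask>` (the code uses `tokenizer.mask_token` at run time) and that decoding starts from `</s>` as in BART; the helper names are hypothetical.

```python
import re


def to_bartpho_masks(text, mask_token="<mask>"):
    """Replace BERT-style "[mask]" placeholders with the tokenizer's mask token."""
    return re.sub(r"\[mask\]", mask_token, text, flags=re.IGNORECASE)


def top_k_fillings(text, k=5, model_name="vinai/bartpho-syllable"):
    """Return the k best full reconstructions of `text` found by beam search."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    inputs = tokenizer(to_bartpho_masks(text, tokenizer.mask_token), return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        num_beams=max(k, 4),  # num_return_sequences must not exceed num_beams
        num_return_sequences=k,
        # Illustrative slack: a mask may expand to several tokens.
        max_new_tokens=inputs["input_ids"].shape[1] + 10,
        # Assumption: decoding starts from </s>, following the BART convention.
        decoder_start_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

The candidates are joint fillings of both masks, ranked by sequence score, which is different from two independent per-position top-k lists.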

What is your decoder_start_token_id?

Could you update the decoder_start_token_id in model.config? I found that it is None in config.json.

Thank you.
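A hedged sketch for filling in the missing `decoder_start_token_id` after loading. It assumes BARTpho follows the BART/fairseq convention in which decoding starts from the `</s>` (eos) token, which is also what mBART-style right-shifting produces for labels that end in `</s>`; please check this against the authors' answer before relying on it. The helper name is hypothetical.

```python
def ensure_decoder_start_token(model, tokenizer):
    """Set decoder_start_token_id to eos when the checkpoint's config leaves it unset.

    Assumption: BARTpho decodes from </s> (eos), as in BART/fairseq.
    """
    start_id = model.config.decoder_start_token_id
    if start_id is None:
        start_id = tokenizer.eos_token_id
        model.config.decoder_start_token_id = start_id
    # Newer transformers versions read generation settings from model.generation_config.
    gen_cfg = getattr(model, "generation_config", None)
    if gen_cfg is not None and getattr(gen_cfg, "decoder_start_token_id", None) is None:
        gen_cfg.decoder_start_token_id = start_id
    return start_id


# Usage (requires transformers and network access):
# model = AutoModelForSeq2SeqLM.from_pretrained("vinai/bartpho-syllable")
# tokenizer = AutoTokenizer.from_pretrained("vinai/bartpho-syllable")
# ensure_decoder_start_token(model, tokenizer)
```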

[deleted]

Pretraining BARTpho

How many epochs was BARTpho trained for?

Where is the `config.json` file?

Hi @datquocnguyen,

I am very interested in your project. I followed your tutorial and tried to train a new model, but I cannot find config.json in fairseq-bartpho-word.zip. Can you tell me how to get it?

Thank you.
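The fairseq archive is meant for fairseq's own scripts, which, as far as I know, take their settings from the checkpoint and command-line arguments rather than from a Hugging Face `config.json`. If a `config.json` is what you need (for example, for the transformers format), one option, assuming network access to the Hugging Face Hub, is to fetch the config published with `vinai/bartpho-word` and save it locally. The helper name and output directory are hypothetical.

```python
import os


def export_hf_config(model_name="vinai/bartpho-word", out_dir="bartpho-word-config"):
    """Download the Hub config for `model_name` and write it to out_dir/config.json."""
    from transformers import AutoConfig  # lazy import; needs network on first call

    config = AutoConfig.from_pretrained(model_name)
    config.save_pretrained(out_dir)  # creates out_dir if needed and writes config.json
    return os.path.join(out_dir, "config.json")
```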

Fine-tuning BARTpho with vinai/bartpho-syllable: error undefined fairseqs_ids_to_tokens, and 'unk' when using vinai/bartpho-word

I get an "undefined fairseqs_ids_to_tokens" error when I try to fine-tune based on vinai/bartpho-syllable. Can you suggest a solution for this issue?

I have tried the approach suggested here: [https://github.com//issues/2#issuecomment-1146988402]

I have also followed the instructions provided for this CSV file format:

When I use vinai/bartpho-word for training, the training process itself has no issue, but the predictions contain the 'unk' token.
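One common cause of 'unk' with vinai/bartpho-word is feeding raw text: as I understand the BARTpho README, BARTpho-word expects word-segmented input in which the syllables of a multi-syllable word are joined by underscores, produced with a Vietnamese word segmenter such as VnCoreNLP's RDRSegmenter. Whether that explains this particular report is an assumption. The sketch below only formats already-segmented words (for both documents and target summaries) and undoes the underscores in the output; it does not segment text itself, and the helper names are hypothetical.

```python
def to_bartpho_word_input(segmented_words):
    """Join pre-segmented words into the underscore format BARTpho-word expects.

    `segmented_words` is assumed to come from a word segmenter (e.g. VnCoreNLP);
    each item is one word whose syllables are separated by spaces.
    """
    return " ".join("_".join(word.split()) for word in segmented_words)


def from_bartpho_word_output(text):
    """Turn underscore-joined model output back into syllable-spaced text."""
    return text.replace("_", " ")
```

For example, the segmented words `["Chúng tôi", "là", "những", "nghiên cứu viên"]` become `"Chúng_tôi là những nghiên_cứu_viên"`.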

How can I do fine-tuning BARTpho for text summarization ?

When I load BARTpho-word via AutoModelForSeq2SeqLM to fine-tune it for text summarization, decoder_start_token_id is None. How can I load the model correctly? Or, how can I fine-tune BARTpho for text summarization on my dataset? My code is below:
from transformers import AutoModelForSeq2SeqLM
# model_args: the script's parsed model arguments (not shown)
model = AutoModelForSeq2SeqLM.from_pretrained(model_args.model_name_or_path)
print(model.config.decoder_start_token_id)  # prints: None
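A sketch of the tokenization step a summarization fine-tuning run needs, in the style of the Hugging Face summarization examples. Assumptions: a dataset with hypothetical `document` and `summary` columns, a transformers version recent enough to accept `text_target` in the tokenizer call, and illustrative length limits. For BARTpho-word, both columns would need to be word-segmented first.

```python
def make_preprocess_fn(tokenizer, max_source_length=1024, max_target_length=256,
                       text_column="document", summary_column="summary"):
    """Build a batched map function that tokenizes (document, summary) pairs."""

    def preprocess(batch):
        model_inputs = tokenizer(batch[text_column], max_length=max_source_length,
                                 truncation=True)
        # Tokenize the summaries as decoder targets; BART-style models shift
        # the labels right internally to build the decoder inputs.
        labels = tokenizer(text_target=batch[summary_column],
                           max_length=max_target_length, truncation=True)
        model_inputs["labels"] = labels["input_ids"]
        return model_inputs

    return preprocess
```

With the `datasets` library this would typically be applied as `dataset.map(make_preprocess_fn(tokenizer), batched=True)`; keeping separate source and target limits lets long articles be truncated without also truncating short summaries.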

Code to train model to reproduce the model and evaluate in all experiments

I would like to see the source code for Sections 3 and 4 of the paper.

Train model with unsupervised denoising objective

Hi authors,
I plan to pretrain a BARTpho model on my custom Vietnamese datasets with the denoising objective (text infilling + sentence permutation, as suggested in your paper). I have checked all the issues and found this related one: #8; however, I still cannot find any example or notebook at the HF link you gave that shows how to pretrain BART on a custom dataset in a denoising manner.

Could you please provide me with a link showing how to pretrain BART? I would be very grateful.
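The BART paper describes text infilling as masking about 30% of the tokens in spans whose lengths are drawn from a Poisson distribution (λ = 3), replacing each span with a single mask token, and sentence permutation as shuffling a document's sentences. The sketch below illustrates those two noise functions in plain Python on whitespace tokens. It is a simplified illustration under those assumptions, not fairseq's `DenoisingDataset` and not the BARTpho authors' pre-training code; a real setup would apply the noise to subword tokens.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's Poisson sampler; adequate for small lambda such as 3."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def text_infilling(tokens, rng, mask_token="<mask>", mask_ratio=0.3, lam=3.0):
    """Mask about mask_ratio of the tokens in Poisson-length spans, one mask per run."""
    n = len(tokens)
    if n == 0:
        return []
    budget = min(n, max(1, round(n * mask_ratio)))
    masked = [False] * n
    count = 0
    while count < budget:
        # Simplification: BART also allows zero-length spans (pure insertions);
        # here every span covers at least one token.
        length = max(1, sample_poisson(lam, rng))
        start = rng.randrange(n)
        for i in range(start, min(n, start + length)):
            if not masked[i]:
                masked[i] = True
                count += 1
    out = []
    for i, tok in enumerate(tokens):
        if not masked[i]:
            out.append(tok)
        elif i == 0 or not masked[i - 1]:
            out.append(mask_token)  # a run of masked tokens collapses into one mask
    return out


def sentence_permutation(sentences, rng):
    """Return the sentences in a random order."""
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    return shuffled


def make_denoising_pair(sentences, rng, mask_token="<mask>"):
    """Build (noised source tokens, original target tokens) for one document."""
    target = " ".join(sentences).split()
    permuted = " ".join(sentence_permutation(sentences, rng)).split()
    return text_infilling(permuted, rng, mask_token), target
```

The model is then trained to reconstruct `target` from the noised source, which is the denoising objective the paper refers to.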

Cannot load BartPhoTokenizer when I load the syllable tokenizer

When I try to load the syllable tokenizer, I get the error "module transformers.models.bartpho has no attribute BartPhoTokenizer".
I am using transformers 4.15.0. I tried clearing my ~/.cache/huggingface/transformers folder, but I still get the same error.
Can you help me? Thanks.
