
GoEmotions for Portuguese

This repository contains scripts for downloading the GoEmotions dataset, translating it into Portuguese, and fine-tuning a BERT model for Portuguese emotion classification. Both the original GoEmotions taxonomy and the Ekman taxonomy are supported.
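The Ekman taxonomy groups the fine-grained GoEmotions labels into six basic emotions plus neutral. The sketch below uses the mapping published with the original (English) GoEmotions dataset. Whether this repository applies the same mapping to its Portuguese labels is an assumption, and `to_ekman` is a hypothetical helper, not part of the repository.

```python
# Sketch: collapse fine-grained GoEmotions labels into Ekman's six basic
# emotions plus "neutral". The mapping is the one released with the original
# English GoEmotions dataset; its use in this repository is an assumption.
EKMAN_MAPPING = {
    "anger": ["anger", "annoyance", "disapproval"],
    "disgust": ["disgust"],
    "fear": ["fear", "nervousness"],
    "joy": ["joy", "amusement", "approval", "excitement", "gratitude", "love",
            "optimism", "relief", "pride", "admiration", "desire", "caring"],
    "sadness": ["sadness", "disappointment", "embarrassment", "grief", "remorse"],
    "surprise": ["surprise", "realization", "confusion", "curiosity"],
}

# Invert the mapping: fine-grained label -> Ekman label.
FINE_TO_EKMAN = {fine: ekman for ekman, fines in EKMAN_MAPPING.items() for fine in fines}
FINE_TO_EKMAN["neutral"] = "neutral"


def to_ekman(labels):
    """Map fine-grained labels to a sorted list of unique Ekman labels."""
    return sorted({FINE_TO_EKMAN[label] for label in labels})


if __name__ == "__main__":
    print(to_ekman(["love", "annoyance", "admiration"]))  # ['anger', 'joy']
```

Because several fine-grained labels can map to the same Ekman class, the result is deduplicated, so a text tagged with both "love" and "admiration" ends up with a single "joy" label.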

Step 1: Requirements

Create a Python environment, clone this repository, and run:

pip install -r requirements.txt

Step 2: Dataset

First, download and translate the original dataset by running the following script:

python translate.py

When the script finishes, all the required data should be in the dataset folder.
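GoEmotions is a multi-label dataset, so each text can carry several emotion ids. The following sketch turns one row into a multi-hot label vector, assuming the translated files keep the original GoEmotions TSV layout (text, comma-separated label ids, comment id). The file layout actually produced by translate.py is not confirmed here, and `parse_row` is a hypothetical helper.

```python
# Sketch: convert one GoEmotions-style TSV row into a multi-hot label vector.
# Assumption: rows look like "text<TAB>label_ids<TAB>comment_id", as in the
# original GoEmotions release; translate.py's output may differ.
NUM_LABELS = 28  # 27 emotions + neutral in the original taxonomy


def parse_row(line, num_labels=NUM_LABELS):
    """Return (text, multi_hot) for a single TSV line."""
    # rsplit keeps any stray tab inside the text attached to the text field.
    text, label_ids, _comment_id = line.rstrip("\n").rsplit("\t", 2)
    multi_hot = [0] * num_labels
    for label_id in label_ids.split(","):
        multi_hot[int(label_id)] = 1
    return text, multi_hot


if __name__ == "__main__":
    text, labels = parse_row("Eu te amo\t18\tabc123\n")
    print(text, [i for i, v in enumerate(labels) if v])
```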

Step 3: Fine-tuning

Skip this step if you don't want to fine-tune the model yourself; instead, use one of the fine-tuned models available in Step 4.

To fine-tune the model, run:

python run_training.py \
--model_output_dir fine_tuned_model

To save the model to a different folder, pass its name in the --model_output_dir argument.

The following optional arguments are also available:

--batch_size
Batch size for training (default = 16)
--max_seq_length
Maximum sequence length (default = 128)
--model_name
Name of the model to be used (default = "neuralmind/bert-base-portuguese-cased")
--n_epochs
Number of fine-tuning epochs (default = 4)
--warmup_proportion
Fraction of the training steps used for learning-rate warmup, a float between 0 and 1 (default = 0.2)
--beta
Beta parameter of the Class-Balanced Loss weighting method (default = 0.999)
--no_cuda
If set to True, the GPU will not be used (default = False)
--seed
Seed for the pseudo-random number generators of PyTorch, NumPy, and Python's random module (default = 42)
--taxonomy
Taxonomy to use: original or ekman (default = "original")
--resume_from_checkpoint
Path to a checkpoint file to resume training from (default = None)
--accumulate_grad_batches
Number of batches over which gradients are accumulated before each optimizer step (default = 1)
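The --beta argument controls Class-Balanced Loss (Cui et al., 2019), which weights each class by the inverse of its "effective number" of samples, (1 - beta^n) / (1 - beta), where n is the class count. The sketch below assumes the weights are derived from per-class training counts and normalized to sum to the number of classes; how run_training.py applies them inside its loss is not shown, and `class_balanced_weights` is a hypothetical helper.

```python
# Sketch of Class-Balanced Loss weights (Cui et al., 2019).
# Assumption: weights come from per-class training counts; the way
# run_training.py plugs them into its loss is not reproduced here.
def class_balanced_weights(counts, beta=0.999):
    """Weight each class by 1 / effective_n, rescaled to sum to len(counts).

    effective_n = (1 - beta**n) / (1 - beta); beta must be in [0, 1).
    """
    if any(n < 1 for n in counts):
        raise ValueError("every class needs at least one training example")
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in counts]
    scale = len(counts) / sum(raw)
    return [w * scale for w in raw]


if __name__ == "__main__":
    # Hypothetical counts: one frequent class and two rare ones.
    print(class_balanced_weights([10000, 100, 10]))
```

With beta = 0 every class gets the same weight, and as beta approaches 1 the weights approach plain inverse-frequency weighting, so rare emotions receive larger weights.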

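The --batch_size, --n_epochs, --accumulate_grad_batches, and --warmup_proportion arguments together determine how many optimizer steps are taken and how many of them are spent warming up the learning rate. A minimal sketch of that arithmetic, assuming a linear warmup followed by linear decay as in typical BERT fine-tuning; run_training.py's exact scheduler is not confirmed, and `schedule_steps` and `lr_multiplier` are hypothetical helpers.

```python
import math


# Sketch: optimizer-step and warmup-step counts for the training arguments.
# Assumption: linear warmup then linear decay; the repository's scheduler
# may differ in details such as rounding.
def schedule_steps(n_examples, batch_size=16, n_epochs=4,
                   accumulate_grad_batches=1, warmup_proportion=0.2):
    batches_per_epoch = math.ceil(n_examples / batch_size)
    # One optimizer step happens every `accumulate_grad_batches` batches.
    steps_per_epoch = math.ceil(batches_per_epoch / accumulate_grad_batches)
    total_steps = steps_per_epoch * n_epochs
    warmup_steps = int(warmup_proportion * total_steps)
    return total_steps, warmup_steps


def lr_multiplier(step, total_steps, warmup_steps):
    """Scale factor for the base learning rate: 0 -> 1 during warmup, then 1 -> 0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


if __name__ == "__main__":
    total, warmup = schedule_steps(n_examples=1000, accumulate_grad_batches=2)
    print(total, warmup)  # 128 25
```

Doubling --accumulate_grad_batches roughly halves the number of optimizer steps while keeping the number of examples seen per epoch the same, so the warmup length shrinks with it.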
Step 4: Fine-tuned Model Files

Skip this step if you performed the fine-tuning in Step 3.

Links to the fine-tuned models (these models support only the original taxonomy):

BERTimbau_base_GoEmotions_portuguese

BERTimbau_large_GoEmotions_portuguese

Unzip the archive and place all the files in the 'fine_tuned_model' folder.

Step 5: Using the Fine-tuned Model

from pprint import pprint

from transformers import BertForSequenceClassification, BertTokenizer, pipeline

# Folder containing the fine-tuned model files
model_path = 'fine_tuned_model'

model = BertForSequenceClassification.from_pretrained(model_path)
tokenizer = BertTokenizer.from_pretrained(model_path)

# top_k=None returns the score of every label
# (older transformers versions use return_all_scores=True instead)
classifier = pipeline('text-classification', model=model, tokenizer=tokenizer, top_k=None)

# Minimum score for a label to be kept; several labels may pass for one text
threshold = 0.3

inputs = [
    'Eu te amo',
    'Eu acho que você é uma ótima pessoa',
    'Eu odeio aquele cara',
]

output = classifier(inputs)

# For each input, keep only the labels whose score reaches the threshold
predictions = [
    [x for x in prediction if x['score'] >= threshold]
    for prediction in output
]

pprint(predictions)

# Output
# [[{'label': 'amor', 'score': 0.9658263325691223}],
#  [{'label': 'admiração', 'score': 0.9569578170776367}],
#  [{'label': 'raiva', 'score': 0.6997460126876831}]]
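The threshold step above keeps every label whose score reaches 0.3, so a text can receive zero, one, or several emotions rather than only the top one. The framework-free sketch below shows the same idea starting from raw logits, assuming a multi-label setup where each logit goes through an independent sigmoid. Depending on the model's configuration, the transformers pipeline may apply softmax instead, and `labels_above_threshold` and the logits are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Sketch: multi-label thresholding over raw logits (assumes sigmoid outputs).
def labels_above_threshold(logits, label_names, threshold=0.3):
    """Return every label whose sigmoid probability is >= threshold."""
    return [
        {"label": name, "score": sigmoid(z)}
        for name, z in zip(label_names, logits)
        if sigmoid(z) >= threshold
    ]


if __name__ == "__main__":
    # Hypothetical logits for three labels.
    print(labels_above_threshold([3.0, -2.0, -0.5], ["amor", "raiva", "admiração"]))
```

Raising the threshold trades recall for precision: with 0.6 instead of 0.3, only "amor" would survive in this example.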

Contributors

luzo0