FinBERT: Financial Sentiment Analysis with BERT

FinBERT is an NLP model for analyzing the sentiment of financial text. It is built by further pre-training the BERT language model on a large financial corpus and then fine-tuning it for financial sentiment classification. For details, see the paper FinBERT: Financial Sentiment Analysis with Pre-trained Language Models.

Important Note: The FinBERT implementation relies on Hugging Face's pytorch_pretrained_bert library and its implementation of BERT for sequence classification tasks. pytorch_pretrained_bert is an earlier version of the transformers library. Migrating the FinBERT code to transformers is a top priority for us in the near future.

Installing

Before cloning the repository, make sure git-lfs is installed in your environment. The instructions to do so can be found here. Then install the dependencies by creating the Conda environment finbert from the provided environment.yml file and activating it:

conda env create -f environment.yml
conda activate finbert
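If git-lfs was not installed before cloning, files tracked with Git LFS arrive as small text pointer files rather than their real contents. The sketch below is not part of the repository; it is a hypothetical helper, find_lfs_pointers, that walks a directory and flags files that still begin with the Git LFS pointer header. Scanning the models directory is an assumption based on the model paths used elsewhere in this README.

```python
from pathlib import Path
from typing import List

# First line of a Git LFS pointer file, as defined by the Git LFS spec.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(root: str) -> List[str]:
    """Return paths under root whose contents start like an un-fetched Git LFS pointer."""
    pointers = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        # Reading just the prefix is enough to recognize a pointer file.
        with open(path, "rb") as fh:
            head = fh.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            pointers.append(str(path))
    return sorted(pointers)


if __name__ == "__main__":
    stale = find_lfs_pointers("models")
    if stale:
        print("Still Git LFS pointers (try `git lfs pull`):")
        for p in stale:
            print("  " + p)
    else:
        print("No Git LFS pointer files found under models/.")
```

If any files are reported, running git lfs pull inside the clone (with git-lfs installed) should replace the pointers with the actual files.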

Models

There are two models in this repo: a language model that has been further pre-trained on Reuters TRC2, and a classifier model that has been fine-tuned on the Financial PhraseBank.

Datasets

Two datasets are used for FinBERT. The further pre-training of the language model is done on a subset of the Reuters TRC2 dataset. This dataset is not public, but researchers can apply for access here.

For sentiment analysis, we used the Financial PhraseBank from Malo et al. (2014). The dataset can be downloaded from this link. To train the model on the same dataset, download it and then create three files, train.csv, validation.csv and test.csv, under the data/sentiment_data folder.
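A minimal sketch of producing those three files, under several assumptions that are not stated in this README: the downloaded file (for example one of the Sentences_*.txt files in the PhraseBank archive) holds one sentence@label pair per line in Latin-1 encoding, the training code reads tab-separated files with text and label columns, and an 80/10/10 random split is acceptable. The helper names read_phrasebank and split_phrasebank are hypothetical; check the repository's data-loading code before relying on this layout.

```python
import csv
import os
import random
from typing import List, Tuple

LABELS = {"positive", "negative", "neutral"}


def read_phrasebank(path: str, encoding: str = "latin-1") -> List[Tuple[str, str]]:
    """Parse lines of the form 'sentence@label' into (sentence, label) pairs."""
    pairs = []
    with open(path, encoding=encoding) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            # Split on the last '@' in case the sentence itself contains one.
            sentence, label = line.rsplit("@", 1)
            if label not in LABELS:
                raise ValueError(f"unexpected label {label!r} in line: {line!r}")
            pairs.append((sentence, label))
    return pairs


def split_phrasebank(src: str, out_dir: str, seed: int = 0,
                     train_frac: float = 0.8, val_frac: float = 0.1) -> None:
    """Shuffle the examples and write train.csv, validation.csv and test.csv to out_dir."""
    pairs = read_phrasebank(src)
    random.Random(seed).shuffle(pairs)
    n_train = int(len(pairs) * train_frac)
    n_val = int(len(pairs) * val_frac)
    splits = {
        "train.csv": pairs[:n_train],
        "validation.csv": pairs[n_train:n_train + n_val],
        "test.csv": pairs[n_train + n_val:],
    }
    os.makedirs(out_dir, exist_ok=True)
    for name, rows in splits.items():
        with open(os.path.join(out_dir, name), "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh, delimiter="\t")
            writer.writerow(["text", "label"])  # assumed header layout
            writer.writerows(rows)
```

Calling split_phrasebank with the path of the downloaded sentences file and "data/sentiment_data" as out_dir writes the three files; fixing the seed keeps the split reproducible across runs.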

Training the model

Training is done in the finbert_training.ipynb notebook. The trained model is saved to models/classifier_model/finbert-sentiment. The training parameters are set in the notebook as follows:

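# cl_data_path, bertmodel and cl_path are set in earlier cells of finbert_training.ipynb
config = Config(data_dir=cl_data_path,
                bert_model=bertmodel,
                num_train_epochs=4.0,
                model_dir=cl_path,
                max_seq_length=64,
                train_batch_size=32,
                learning_rate=2e-5,
                output_mode='classification',
                warm_up_proportion=0.2,
                local_rank=-1,          # -1: no distributed training
                discriminate=True,      # discriminative fine-tuning
                gradual_unfreeze=True)  # gradual unfreezing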
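To see what warm_up_proportion=0.2 implies together with train_batch_size and num_train_epochs, the sketch below computes a linear warm-up to the base learning rate over the first 20% of optimizer steps, followed by linear decay to zero. This is an illustrative schedule under stated assumptions (one optimizer step per batch, no gradient accumulation), not a reproduction of the schedule inside pytorch_pretrained_bert; the training-set size of 3000 and the helper names are hypothetical.

```python
import math


def total_steps(num_examples: int, batch_size: int, num_epochs: float) -> int:
    """Number of optimizer steps, assuming one step per (possibly partial) batch."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return int(steps_per_epoch * num_epochs)


def warmup_linear_lr(step: int, total: int, base_lr: float, warmup_proportion: float) -> float:
    """Linear warm-up to base_lr over the first warmup_proportion of steps, then linear decay to 0."""
    warmup_steps = max(1, int(total * warmup_proportion))
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    # Linearly shrink over the remaining post-warm-up steps, never going below zero.
    return base_lr * max(0.0, (total - step) / max(1, total - warmup_steps))


if __name__ == "__main__":
    n = 3000  # hypothetical training-set size
    total = total_steps(n, batch_size=32, num_epochs=4.0)
    for s in (0, total // 10, int(total * 0.2), total // 2, total):
        print(s, warmup_linear_lr(s, total, base_lr=2e-5, warmup_proportion=0.2))
```

The learning rate peaks at learning_rate exactly when warm-up ends; a larger warm_up_proportion spends more of the short fine-tuning run at reduced rates.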

The last two Config parameters, discriminate and gradual_unfreeze, determine whether discriminative fine-tuning and gradual unfreezing, respectively, are applied to guard against catastrophic forgetting.
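Discriminative fine-tuning gives lower encoder layers smaller learning rates than upper ones, and gradual unfreezing trains the top of the network first and then unfreezes lower layers step by step. The sketch below illustrates both ideas for a 12-layer encoder, with layer 0 at the bottom and layer 11 at the top. The decay factor of 1.2 and the choice of one newly unfrozen layer per stage are illustrative assumptions, not values taken from this README, and the function names are hypothetical; this is not the repository's implementation.

```python
from typing import Dict, List


def discriminative_lrs(base_lr: float, num_layers: int, decay: float) -> Dict[int, float]:
    """Learning rate per layer: the top layer (num_layers - 1) gets base_lr, and each
    layer below gets the rate of the layer above divided by decay."""
    return {i: base_lr / decay ** (num_layers - 1 - i) for i in range(num_layers)}


def unfrozen_layers(stage: int, num_layers: int) -> List[int]:
    """Layers trainable at a given stage of gradual unfreezing: stage 0 trains only the
    top layer, and each later stage unfreezes the next layer down."""
    first = max(0, num_layers - 1 - stage)
    return list(range(first, num_layers))


if __name__ == "__main__":
    lrs = discriminative_lrs(2e-5, num_layers=12, decay=1.2)
    print({i: f"{lr:.2e}" for i, lr in lrs.items()})
    for stage in range(3):
        print(stage, unfrozen_layers(stage, 12))
```

In PyTorch, each per-layer rate would typically become the lr of its own optimizer parameter group, and layers outside unfrozen_layers(stage, ...) would have requires_grad set to False until their stage arrives. A classification head on top of the encoder is not modeled here.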

Getting predictions

We provide a script to quickly get sentiment predictions using FinBERT. Given a .txt file, predict.py produces a .csv file containing the sentences in the text, the softmax probabilities for the three labels, the predicted label, and a sentiment score computed as the probability of positive minus the probability of negative.
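The per-sentence values described above can be sketched as follows: apply a softmax to the classifier's three logits, take the most probable label as the prediction, and compute the sentiment score as P(positive) minus P(negative), which always lies between -1 and 1. The label order positive, negative, neutral is an assumption, so check the label list used by your version of the code; score_sentence is a hypothetical helper.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumed order of the classifier's output logits.
LABELS = ("positive", "negative", "neutral")


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_sentence(logits: Sequence[float]) -> Tuple[Dict[str, float], str, float]:
    """Return (probabilities by label, predicted label, sentiment score)."""
    probs = dict(zip(LABELS, softmax(logits)))
    prediction = max(probs, key=probs.get)
    sentiment_score = probs["positive"] - probs["negative"]
    return probs, prediction, sentiment_score


if __name__ == "__main__":
    probs, pred, score = score_sentence([2.0, -1.0, 0.5])
    print(pred, round(score, 3))
```

A sentence that is confidently neutral gets a score near 0 even when its predicted label is neutral with high probability, so the score and the label carry different information.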

For example, to get predictions for the provided example text, test.txt, run the following from the command line:

python predict.py --text_path test.txt --output_dir output/ --model_path models/classifier_model/finbert-sentiment
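Once predict.py has written its .csv file to the output directory, the per-sentence results can be aggregated, for example into label counts and a mean sentiment score for the whole document. The column names prediction and sentiment_score and the comma delimiter are assumptions here, so check the header of the file your version writes and pass the actual names if they differ. summarize_predictions is a hypothetical helper.

```python
import csv
from collections import Counter
from typing import Dict, List, Union


def summarize_predictions(csv_path: str,
                          label_col: str = "prediction",
                          score_col: str = "sentiment_score") -> Dict[str, Union[int, float, Dict[str, int]]]:
    """Count predicted labels and average the sentiment score over all sentences."""
    counts: Counter = Counter()
    scores: List[float] = []
    with open(csv_path, newline="", encoding="utf-8") as fh:
        # DictReader keys each row by the header line; extra columns are ignored.
        for row in csv.DictReader(fh):
            counts[row[label_col]] += 1
            scores.append(float(row[score_col]))
    mean_score = sum(scores) / len(scores) if scores else 0.0
    return {
        "n_sentences": len(scores),
        "label_counts": dict(counts),
        "mean_sentiment_score": mean_score,
    }
```

Pass the path of the generated .csv file; an empty file yields zero sentences and a mean score of 0.0 rather than a division error.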

Disclaimer

This is not an official Prosus product. It is the outcome of an intern research project in the Prosus AI team.

About Prosus

Prosus is a global consumer internet group and one of the largest technology investors in the world. Operating and investing globally in markets with long-term growth potential, Prosus builds leading consumer internet companies that empower people and enrich communities. For more information, please visit www.prosus.com.

Contact information

Please contact Dogu Araci dogu.araci[at]naspers[dot]com and Zulkuf Genc zulkuf.genc[at]naspers[dot]com with any FinBERT-related issues and questions.