Giter VIP home page Giter VIP logo

finbert's Introduction

FinBERT

***** June 2, 2022: More fine-tuned FinBERT models available*****

Visit FinBERT.AI for more details on the recent development of FinBERT.

We have fine-tuned FinBERT pretrained model on several financial NLP tasks, all outperforming traditional machine learning models, deep learning models, and fine-tuned BERT models. All the fine-tuned FinBERT models are publicly hosted at Huggingface ๐Ÿค—. Specifically, we have the following:

  • FinBERT-Pretrained: The pretrained FinBERT model on large-scale financial text. link
  • FinBERT-Sentiment: for sentiment classification task. link
  • FinBERT-ESG: for ESG classification task. link
  • FinBERT-FLS: for forward-looking statement (FLS) classification task. link

In this Github repo,

Background:

FinBERT is a BERT model pre-trained on financial communication text. The purpose is to enhance finaincal NLP research and practice. It is trained on the following three finanical communication corpus. The total corpora size is 4.9B tokens.

  • Corporate Reports 10-K & 10-Q: 2.5B tokens
  • Earnings Call Transcripts: 1.3B tokens
  • Analyst Reports: 1.1B tokens

FinBERT results in state-of-the-art performance on various financial NLP task, including sentiment analysis, ESG classification, forward-looking statement (FLS) classification. With the release of FinBERT, we hope practitioners and researchers can utilize FinBERT for a wider range of applications where the prediction target goes beyond sentiment, such as financial-related outcomes including stock returns, stock volatilities, corporate fraud, etc.


***** July 30, 2021: migrated to Huggingface ๐Ÿค—*****

The fine-tuned FinBERT model for financial sentiment classification has been uploaded and integrated with Huggingface's transformers library. This model is fine-tuned on 10,000 manually annotated (positive, negative, neutral) sentences from analyst reports. This model achieves superior performance on financial tone anlaysis task. If you are simply interested in using FinBERT for financial tone analysis, give it a try.

from transformers import BertTokenizer, BertForSequenceClassification
import numpy as np

finbert = BertForSequenceClassification.from_pretrained('yiyanghkust/finbert-tone',num_labels=3)
tokenizer = BertTokenizer.from_pretrained('yiyanghkust/finbert-tone')

sentences = ["there is a shortage of capital, and we need extra financing", 
             "growth is strong and we have plenty of liquidity", 
             "there are doubts about our finances", 
             "profits are flat"]

inputs = tokenizer(sentences, return_tensors="pt", padding=True)
outputs = finbert(**inputs)[0]

labels = {0:'neutral', 1:'positive',2:'negative'}
for idx, sent in enumerate(sentences):
    print(sent, '----', labels[np.argmax(outputs.detach().numpy()[idx])])
    
'''
there is a shortage of capital, and we need extra financing ---- negative
growth is strong and we have plenty of liquidity ---- positive
there are doubts about our finances ---- negative
profits are flat ---- neutral
'''
    

***** Jun 16, 2020: Pretrained FinBERT Model Released*****

We provide four versions of pre-trained FinBERT weights.

FinVocab is a new WordPiece vocabulary on our finanical corpora using the SentencePiece library. We produce both cased and uncased versions of FinVocab, with sizes of 28,573 and 30,873 tokens respectively. This is very similar to the 28,996 and 30,522 token sizes of the original BERT cased and uncased BaseVocab.

Citation

@misc{yang2020finbert,
    title={FinBERT: A Pretrained Language Model for Financial Communications},
    author={Yi Yang and Mark Christopher Siy UY and Allen Huang},
    year={2020},
    eprint={2006.08097},
    archivePrefix={arXiv},
    }

Contact

Please post a Github issue or contact [email protected] if you have any questions.

finbert's People

Contributors

yya518 avatar mcsuy1998 avatar mikaeldusenne avatar mcsuy avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.