How to create train.csv, validation.csv, and test.csv for finbert (closed)

prosusai commented on September 15, 2024

how to create train.csv, validation.csv, test.csv

from finbert.

Comments (11)

evanzd commented on September 15, 2024 5

The labels provided by @valentintsl are not balanced. I'm using the script below to create these datasets:

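import pandas as pd

# Read the raw bytes and decode leniently: bytes that are not valid UTF-8 are dropped.
with open('Sentences_50Agree.txt', 'rb') as f:
    data = f.read().decode(errors='ignore')

# Each line is "sentence@label". Accept both '\r\n' and '\n' line endings,
# and split on the last '@' only, in case a sentence itself contains '@'.
lines = data.strip().replace('\r\n', '\n').split('\n')
df = pd.DataFrame([x.rsplit('@', 1) for x in lines], columns=['text', 'label'])

# Shuffle each class separately so every split keeps the same class proportions.
pos = df.query('label=="positive"')
pos = pos.sample(len(pos), random_state=0)

neg = df.query('label=="negative"')
neg = neg.sample(len(neg), random_state=0)

neu = df.query('label=="neutral"')
neu = neu.sample(len(neu), random_state=0)

# 20% of each class goes to validation and another 20% to test.
n_pos = int(len(pos)*0.2)
n_neg = int(len(neg)*0.2)
n_neu = int(len(neu)*0.2)

# First 60% -> train, next 20% -> validation, last 20% -> test.
# Files are tab-separated and keep the index column (to_csv writes it by default).
pd.concat([pos[:-n_pos*2], neg[:-n_neg*2], neu[:-n_neu*2]], axis=0).to_csv('train.csv', sep='\t')
pd.concat([pos[-n_pos*2:-n_pos], neg[-n_neg*2:-n_neg], neu[-n_neu*2:-n_neu]], axis=0).to_csv('validation.csv', sep='\t')
pd.concat([pos[-n_pos:], neg[-n_neg:], neu[-n_neu:]], axis=0).to_csv('test.csv', sep='\t')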
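
Since the point of the script is to keep the label proportions the same across the three files, it may help to check them after writing. The sketch below is only an illustration: `label_shares` is a hypothetical helper name, and the small `demo` frame is made-up data standing in for one of the splits.

```python
import pandas as pd

def label_shares(df: pd.DataFrame) -> pd.Series:
    """Return the fraction of rows carrying each label, sorted by label name."""
    return df['label'].value_counts(normalize=True).sort_index()

# Made-up frame standing in for one split (not real PhraseBank data).
demo = pd.DataFrame({
    'text': ['a', 'b', 'c', 'd'],
    'label': ['neutral', 'neutral', 'positive', 'negative'],
})

shares = label_shares(demo)
print(shares)
```

For the real files, something like `label_shares(pd.read_csv('train.csv', sep='\t', index_col=0))` can be run on each split; the three results should be close to each other if the split is stratified.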

valentintsl commented on September 15, 2024 4

Hello mates!
After going through the code thoroughly, I worked out the format expected for the train, validation, and test inputs.
The CSV files need the column names [text, label] WITH THE INDEX column.
The fields must be separated by a tab ('\t').
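
As a rough illustration of that layout, the sketch below writes a tiny frame with pandas and reads it back. The two sentences are made up for the example, and an in-memory buffer is used instead of a real file; `DataFrame.to_csv` writes the index by default, which gives the extra leading column.

```python
import io
import pandas as pd

# Two made-up rows, only to show the layout.
df = pd.DataFrame({
    'text': ['Profit rose compared with last year.', 'Sales fell sharply.'],
    'label': ['positive', 'negative'],
})

buf = io.StringIO()
df.to_csv(buf, sep='\t')          # index=True is the default, so the index is written
content = buf.getvalue()

# The header has an empty name for the index column, then text and label.
header = content.splitlines()[0].split('\t')
print(header)

# Reading it back with the first column as the index recovers the frame.
round_trip = pd.read_csv(io.StringIO(content), sep='\t', index_col=0)
print(round_trip)
```

Passing a file name such as 'train.csv' in place of the buffer produces the same layout on disk.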

I attach the csv files for the Financial Phrase Bank dataset that I made.

FinancialPhraseBankforFinBERT.zip

Valentin TASSEL

emes83 commented on September 15, 2024

Hi, how do I set up and create train.csv, validation.csv, and test.csv from the Financial PhraseBank data?

The same here.

akmalsabri commented on September 15, 2024

@emes83 or maybe we need to create them on our own from the link https://www.researchgate.net/publication/251231364_FinancialPhraseBank-v10

emes83 commented on September 15, 2024

@emes83 or maybe we need to create them on our own from the link https://www.researchgate.net/publication/251231364_FinancialPhraseBank-v10

Probably, yes, but how? I'd like to know how to do this so that I can adapt the solution to another language in the future.

akmalsabri commented on September 15, 2024

@emes83 I tried this data: #5 (comment). Maybe you should try it too.
But I still get IndexError: list index out of range when running train_data = finbert.get_data('train')

emes83 commented on September 15, 2024

@emes83 I tried this data: #5 (comment). Maybe you should try it too.
But I still get IndexError: list index out of range when running train_data = finbert.get_data('train')

The same here.

praslisa commented on September 15, 2024

I am getting the same error, IndexError: list index out of range, and the dataset in the link has invalid characters as well. Has anyone managed to fix this?
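
One possible cause of this IndexError, which is an assumption and not something confirmed in this thread, is that some rows do not have the three tab-separated fields (index, text, label) described in the other comments. The sketch below uses a hypothetical helper, `find_bad_rows`, to list rows with an unexpected field count; the demo file is made up and written to a temporary directory.

```python
import csv
import os
import tempfile

def find_bad_rows(path, expected_fields=3):
    """Return (row_number, fields) for rows whose field count differs from expected_fields."""
    bad = []
    with open(path, newline='', encoding='utf-8') as f:
        for row_no, fields in enumerate(csv.reader(f, delimiter='\t'), start=1):
            if len(fields) != expected_fields:
                bad.append((row_no, fields))
    return bad

# Demo on a made-up file where one row was saved with commas by mistake.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'train.csv')
    with open(path, 'w', newline='', encoding='utf-8') as f:
        f.write('\ttext\tlabel\n')
        f.write('0\tProfit rose.\tpositive\n')
        f.write('1,Sales fell.,negative\n')
    bad_rows = find_bad_rows(path)

print(bad_rows)
```

If the function returns rows for a real train.csv, re-exporting the file with a tab separator and the index column is a reasonable first thing to try.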

emes83 commented on September 15, 2024

Can somebody please share the train/validation/test files in a proper, working format?

nithinreddyy commented on September 15, 2024

trained_model = finbert.train(train_examples = train_data, model = model)

Error is

TypeError                                 Traceback (most recent call last)
<ipython-input-11-2ebf0cb3d4e8> in <module>
----> 1 trained_model = finbert.train(train_examples = train_data, model = model)

~\finBERT-master\finbert\finbert.py in train(self, train_examples, model)
    482                     print('No best model found')
    483                 torch.save({'epoch': str(i), 'state_dict': model.state_dict()},
--> 484                            self.config.model_dir / ('temporary' + str(i)))
    485                 best_model = i
    486 

TypeError: unsupported operand type(s) for /: 'str' and 'str'

Can anyone check this and help me, please?
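
The traceback shows the `/` operator being applied to two strings, which suggests that `self.config.model_dir` holds a plain string where a `pathlib.Path` is expected. A minimal sketch of the difference follows; the directory name is a made-up placeholder, and where exactly the path is passed into the config is not shown in this thread.

```python
from pathlib import Path

model_dir = 'models/sentiment'   # hypothetical directory, given as a plain string

# A plain string does not support '/', which reproduces the error above.
try:
    model_dir / ('temporary' + str(0))
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Wrapping the string in Path makes the same expression valid.
checkpoint = Path(model_dir) / ('temporary' + str(0))
print(checkpoint)
```

So converting the model directory to a `Path` before it reaches the config is likely to get past this error.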

doguaraci commented on September 15, 2024

You can find the instructions for creating these files in the updated README.
