Comments (11)
The labels provided by @valentintsl are not balanced. I'm using the script below to create these datasets:
import pandas as pd

# Read the raw file; each line has the form "sentence@label".
with open('Sentences_50Agree.txt', 'rb') as f:
    data = f.read().decode(errors='ignore')
# Split on the last '@' only, so a sentence that contains '@' stays intact.
# splitlines() handles both '\r\n' and '\n' line endings.
rows = [x.rsplit('@', 1) for x in data.strip().splitlines()]
df = pd.DataFrame(rows, columns=['text', 'label'])

# Shuffle each class separately.
pos = df.query('label=="positive"')
pos = pos.sample(len(pos), random_state=0)
neg = df.query('label=="negative"')
neg = neg.sample(len(neg), random_state=0)
neu = df.query('label=="neutral"')
neu = neu.sample(len(neu), random_state=0)

# Hold out 20% of each class for validation and another 20% for test.
n_pos = int(len(pos)*0.2)
n_neg = int(len(neg)*0.2)
n_neu = int(len(neu)*0.2)

# Write tab-separated files (index included): 60% train, 20% validation, 20% test per class.
pd.concat([pos[:-n_pos*2], neg[:-n_neg*2], neu[:-n_neu*2]], axis=0).to_csv('train.csv', sep='\t')
pd.concat([pos[-n_pos*2:-n_pos], neg[-n_neg*2:-n_neg], neu[-n_neu*2:-n_neu]], axis=0).to_csv('validation.csv', sep='\t')
pd.concat([pos[-n_pos:], neg[-n_neg:], neu[-n_neu:]], axis=0).to_csv('test.csv', sep='\t')
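The negative-index slicing in this script can be hard to read at a glance. Below is a minimal sketch of the same per-class 60/20/20 arithmetic on a plain list. The function name `split_60_20_20` and the guard for very small classes are assumptions added for illustration; they are not part of the script posted in this thread.

```python
def split_60_20_20(items):
    """Split one class's (already shuffled) samples into train/validation/test by position."""
    n = int(len(items) * 0.2)  # size of the validation part and of the test part
    if n == 0:
        # With n == 0, items[:-0] would be an empty list, so a class with
        # fewer than 5 samples is kept entirely in train (illustrative choice).
        return list(items), [], []
    return items[:-2 * n], items[-2 * n:-n], items[-n:]


# Ten stand-in samples: 6 go to train, 2 to validation, 2 to test.
train, validation, test = split_60_20_20(list(range(10)))
print(len(train), len(validation), len(test))
```

Applying the same split to each label separately keeps the label proportions roughly equal across the three files.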
from finbert.
Hello mates!
After digging through the code, I worked out the format expected for the train, validation and test inputs.
The CSV files need to have the column names [text, label], and the index column must be included.
The fields need to be separated by a tab ('\t').
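As an illustration of that layout, here is a minimal sketch using pandas. The two example sentences and the helper name `to_finbert_tsv` are made up for this sketch and are not taken from the dataset or from the finBERT code.

```python
import io

import pandas as pd

# Two made-up rows, purely illustrative.
df = pd.DataFrame(
    {
        "text": ["Profit rose compared with last year.", "The company issued no guidance."],
        "label": ["positive", "neutral"],
    }
)


def to_finbert_tsv(frame: pd.DataFrame) -> str:
    """Serialize a frame with the index as the first column and tabs as separators."""
    buffer = io.StringIO()
    # index=True is the pandas default; it is spelled out because the index is required here.
    frame.to_csv(buffer, sep="\t", index=True)
    return buffer.getvalue()


tsv_text = to_finbert_tsv(df)
print(tsv_text)
```

The header line starts with an empty field for the unnamed index, followed by `text` and `label`. To write a file instead, pass a path such as 'train.csv' to `to_csv` in place of the buffer.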
I attach the csv files for the Financial Phrase Bank dataset that I made.
FinancialPhraseBankforFinBERT.zip
Valentin TASSEL
from finbert.
Hi, how do I set up and create train.csv, validation.csv and test.csv from the Financial Phrase Bank data?
The same here.
from finbert.
@emes83 or maybe we need to create them on our own from the link https://www.researchgate.net/publication/251231364_FinancialPhraseBank-v10
from finbert.
@emes83 or maybe we need to create them on our own from the link https://www.researchgate.net/publication/251231364_FinancialPhraseBank-v10
Maybe. I believe so, yes, but how? I'd like to know how to do this so that I can adapt the solution to another language in the future.
from finbert.
@emes83 I tried this data: #5 (comment). Maybe you should try it.
But I still get the error IndexError: list index out of range
when running train_data = finbert.get_data('train')
from finbert.
@emes83 I tried this data: #5 (comment). Maybe you should try it.
But I still get the error IndexError: list index out of range
when running train_data = finbert.get_data('train')
The same here.
from finbert.
I am getting the same error, IndexError: list index out of range, and the dataset in the link contains invalid characters as well. Could anyone fix the issue?
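The thread does not establish what causes this IndexError. One thing worth ruling out is a row that does not have the three tab-separated fields (index, text, label) described earlier in the thread. The sketch below checks for such rows with the standard library only; the function name `find_bad_rows` and the sample text are hypothetical.

```python
import csv
import io


def find_bad_rows(tsv_text, expected_columns=3):
    """Return (line_number, row) pairs whose field count differs from expected_columns."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    bad = []
    for line_number, row in enumerate(reader, start=1):
        if len(row) != expected_columns:
            bad.append((line_number, row))
    return bad


# Made-up sample: the last row is missing its label field.
sample = "\ttext\tlabel\n0\tSales grew.\tpositive\n1\tThis line lost its label\n"
print(find_bad_rows(sample))
```

To check a real file, read it with `open(path, encoding='utf-8', errors='replace').read()` and pass the result in; the `errors='replace'` option also makes undecodable bytes visible as replacement characters instead of raising.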
from finbert.
Can somebody please share the train/validation/test files in a proper, working format?
from finbert.
trained_model = finbert.train(train_examples = train_data, model = model)
Error is
TypeError Traceback (most recent call last)
<ipython-input-11-2ebf0cb3d4e8> in <module>
----> 1 trained_model = finbert.train(train_examples = train_data, model = model)
~\finBERT-master\finbert\finbert.py in train(self, train_examples, model)
482 print('No best model found')
483 torch.save({'epoch': str(i), 'state_dict': model.state_dict()},
--> 484 self.config.model_dir / ('temporary' + str(i)))
485 best_model = i
486
TypeError: unsupported operand type(s) for /: 'str' and 'str'
Can anyone check this and help me, please?
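Going only by the traceback, the `/` operator is being applied to two plain strings, which suggests that `model_dir` holds a `str` rather than a `pathlib.Path`. How the config object was built in the poster's notebook is not shown, so this is an assumption. The sketch below reproduces the error and shows that wrapping the directory in `Path` makes the same expression work; the path 'models/sentiment' is hypothetical.

```python
from pathlib import Path

model_dir = "models/sentiment"  # hypothetical directory, stored as a plain string
epoch = 0

# A str on the left-hand side of "/" raises the TypeError seen in the traceback.
try:
    model_dir / ("temporary" + str(epoch))
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Wrapping the directory in Path makes "/" join path components instead.
checkpoint_path = Path(model_dir) / ("temporary" + str(epoch))
print(checkpoint_path.name)
```

Under that assumption, passing `Path(...)` instead of a string when setting the model directory would avoid the error.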
from finbert.
You can find the instructions for creating these files in the updated README.
from finbert.