labteral / ernie
Simple State-of-the-Art BERT-Based Sentence Classification with Keras / TensorFlow 2. Built with HuggingFace's Transformers.
License: Apache License 2.0
In your project root you have the LICENSE file, which is Apache-2.0, but in your setup.py file, here and here, you declare the license as GPL-3.0.
Those are incompatible, and it would be great if you clarified which one applies.
I'm also trying to use this in nexB/scancode-results-analyzer. Could you explain why transformers is pinned to an old version?
Hi,
I'm trying to install ernie and it seems to require an inconsistent version of pandas.
https://pasteboard.co/IXXIry1.png
Can you advise on what to do?
This runs fine on Colab.
It also throws errors saying it cannot find TensorFlow, even though tf is installed.
from ernie import SplitStrategies, AggregationStrategies

texts = ["Oh, that's great!", "That's really bad"]
probabilities = classifier.predict(
    texts,
    split_strategy=SplitStrategies.GroupedSentencesWithoutUrls,
    aggregation_strategy=AggregationStrategies.Mean,
)
Hi there, first of all, thank you for your great work. It looks like this project only supports English at the moment; are there any plans to support other languages, such as Chinese or French?
Hi,
Thanks for a nice library. Is it possible to use a custom (HuggingFace) BERT model in the sentence classification pipeline?
Hi
classifier = SentenceClassifier(model_name=Models.BertBaseUncased, max_length=128, labels_no=2)
classifier.load_dataset(train, validation_split=0.2)
classifier.fine_tune(epochs=4, learning_rate=2e-5, training_batch_size=32, validation_batch_size=64)

Error:
KeyError Traceback (most recent call last)
/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2645 try:
-> 2646 return self._engine.get_loc(key)
2647 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 0
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-6-17537c46bdcd> in <module>
1 classifier = SentenceClassifier(model_name=Models.BertBaseUncased, max_length=128, labels_no=2)
----> 2 classifier.load_dataset(train ,validation_split=0.2)
3 classifier.fine_tune(epochs=4, learning_rate=2e-5, training_batch_size=32, validation_batch_size=64)
/opt/anaconda3/lib/python3.7/site-packages/ernie/ernie.py in load_dataset(self, dataframe, csv_path, validation_split)
251 raise NotImplementedError
252
--> 253 sentences = list(dataframe[0])
254 labels = dataframe[1].values
255
/opt/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
2798 if self.columns.nlevels > 1:
2799 return self._getitem_multilevel(key)
-> 2800 indexer = self.columns.get_loc(key)
2801 if is_integer(indexer):
2802 indexer = [indexer]
/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2646 return self._engine.get_loc(key)
2647 except KeyError:
-> 2648 return self._engine.get_loc(self._maybe_cast_indexer(key))
2649 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2650 if indexer.ndim > 1 or indexer.size > 1:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 0
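The traceback shows that ernie's load_dataset accesses dataframe[0] and dataframe[1], i.e. columns literally labelled 0 and 1. If the frame was read from a CSV with a header row, its columns have string names and dataframe[0] raises KeyError: 0. A minimal sketch of preparing such a frame (the column names "text" and "label" are hypothetical):

```python
import pandas as pd

# A frame as it might come from pd.read_csv() with a header row:
# string column names, so train[0] would raise KeyError: 0.
train = pd.DataFrame({"text": ["good movie", "bad movie"],
                      "label": [1, 0]})

# Relabel the columns positionally so load_dataset's
# dataframe[0] / dataframe[1] lookups succeed.
train.columns = range(train.shape[1])

sentences = list(train[0])
labels = train[1].values
```

Alternatively, reading the CSV with header=None produces integer column labels from the start.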
Thank you for the convenient interface for HuggingFace.
In batch mode, DistilBertBaseUncased scores 21 texts/second on a modest CPU, but I would like to score millions of texts, so I would like to compare it with other models such as SqueezeBERT and MobileBERT. Would you be willing to add support for some more models?
Thanks for this great package.
While finding the optimal learning rate using Keras LR Finder, how could it be incorporated?

from keras_lr_finder import LRFinder

classifier = SentenceClassifier(model_name=Models.BertBaseUncased, max_length=256, labels_no=2)
classifier.load_dataset(train1, validation_split=0.1)
lr_finder = LRFinder(classifier)
lr_finder.find(classifier, 0.0001, 1, 5, 1)

This gives an error. Could you please suggest an alternative, or any other way of finding the optimal learning rate?
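One likely cause of the error: keras_lr_finder's LRFinder expects a compiled tf.keras model plus raw training arrays, not an ernie SentenceClassifier wrapper. Independent of the library, the LR range test itself is just a geometric sweep of the learning rate over a fixed number of batches; a self-contained sketch of that schedule, using the start/end values from the snippet above:

```python
def lr_schedule(start_lr, end_lr, num_steps):
    """Geometric learning-rate sweep used by LR range tests:
    multiply the LR by a constant factor each step so it moves
    from start_lr to end_lr in num_steps steps."""
    factor = (end_lr / start_lr) ** (1.0 / (num_steps - 1))
    return [start_lr * factor ** i for i in range(num_steps)]

lrs = lr_schedule(1e-4, 1.0, 5)
```

In an LR range test you would record the training loss at each of these rates and pick a value slightly below the point where the loss decreases fastest.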
Hi, after saving the model to a folder, I load it with these lines:
from ernie import SentenceClassifier, Models
classifier=SentenceClassifier('../input/model-predictions/ernie-autosave/bert/1592945713203/')
Every time I load the saved model this way, the predictions seem biased: it always predicts 1, even for class-0 examples. Could you look into this issue?
Nice work!
I often start out with much more unlabelled than labelled data. Is it possible to do masked language model fine-tuning (without the classification head) to start with on the full set of data before adding the classifier?
If not, would a second-best approach be to do it iteratively, i.e., train on the small amount of labelled data, predict for the unlabelled data, fine-tune on the labels and predictions, and then re-train just on the labelled data?
In load_dataset.py, sentences and labels are hardcoded to the first and second columns of the input dataframe. Is there a way to use Ernie if I have more than one feature column?
sentences = list(dataframe[dataframe.columns[0]])
labels = dataframe[dataframe.columns[1]].values
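Since load_dataset only ever reads the first two columns, one workaround is to select (and order) the sentence and label columns into a new two-column frame before passing it in. The column names below are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2],
                   "text": ["great product", "terrible product"],
                   "extra_feature": [0.5, 0.9],
                   "label": [1, 0]})

# Keep only the sentence column first and the label column second,
# matching the order load_dataset expects.
ernie_df = df[["text", "label"]]
```

Note that this only reorders the input; the extra feature columns are simply dropped, since the classifier consumes a single text column.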
Running it with DistilBERT produced the following error:
Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least steps_per_epoch * epochs batches.
I tried
Originally posted by @surya-narayanan in #5 (comment)
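That Keras warning means the data pipeline yields fewer batches than steps_per_epoch * epochs requests. With N training examples and batch size B, one pass over the data can supply at most ceil(N / B) batches, so the numbers have to be consistent. A quick check (the example counts are hypothetical):

```python
import math

n_examples = 1000   # training rows remaining after the validation split
batch_size = 32
epochs = 4

# Maximum number of batches one epoch can supply.
steps_per_epoch = math.ceil(n_examples / batch_size)
total_batches_needed = steps_per_epoch * epochs
```

If training is configured to consume more than steps_per_epoch batches per epoch, the generator runs dry mid-training and Keras interrupts with exactly this message.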
How would you fine-tune if you want to use k-folds?
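ernie itself does not appear to offer k-fold support; load_dataset only takes a single validation_split. One approach is to generate the fold indices yourself and fine-tune a fresh SentenceClassifier per fold on dataframe.iloc[train_idx]. A sketch of the index generation in pure Python (no sklearn assumed):

```python
def kfold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds and yield
    (train_indices, validation_indices) pairs, one per fold."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i
                 for idx in fold]
        yield train, val

splits = list(kfold_indices(10, 5))
```

For real use you would shuffle (or stratify) the row order first; contiguous folds are only safe if the rows are already in random order.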
When I try to import ernie, I get the following error:

from transformers import (
ImportError: cannot import name 'AutoModel' from 'transformers' (/home/janpaulus/miniconda3/envs/ernie/lib/python3.7/site-packages/transformers/__init__.py)

There seems to be a problem with the AutoModel class, which isn't essential for the ernie.py file (at least for my usage). My workaround in the ernie.py file is the "removal" of the AutoModel import:
import tensorflow as tf
import numpy as np
from transformers import (
AutoTokenizer,
#AutoModel,
TFAutoModelForSequenceClassification,
)
Maybe transformers version 2.4.1 isn't the right one?
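A likely explanation: in transformers 2.x, AutoModel is the PyTorch auto class, so the import can fail in a TensorFlow-only environment, while the TFAutoModelForSequenceClassification class ernie actually uses imports fine. Commenting the import out, as above, works; a more defensive variant of the same workaround is a small stdlib helper that degrades to None instead of crashing:

```python
import importlib

def optional_import(module_name, attr):
    """Return module_name.attr, or None if the module or attribute
    is unavailable (e.g. a PyTorch-only class in a TF-only env)."""
    try:
        return getattr(importlib.import_module(module_name), attr)
    except (ImportError, AttributeError):
        return None

AutoModel = optional_import("transformers", "AutoModel")  # None if unavailable
sqrt = optional_import("math", "sqrt")                     # stdlib: always present
missing = optional_import("no_such_module_xyz", "thing")   # cleanly None
```

This keeps the rest of ernie.py importable regardless of whether the optional class resolves.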