Comments (26)
Hello Emil! Thanks for the interest - we are thinking of adding more models in more languages. In particular, we are currently looking at French, Italian and Dutch. Which languages / tasks are you most interested in?
from flair.
It's a bit different depending on if it's for hobby projects or work projects.
Hobby projects: Swedish / POS, Swedish / NER
Business projects: Nordics (Swedish/Norwegian/Danish/Finnish), German, Spanish, English, Polish. POS and NER.
from flair.
Ok great! The German models (POS/NER) will be put online probably sometime next week.
We will also progressively add more languages in the near future. Of your list, I think Polish and Spanish are the most likely to be added soonish, though I can't say exactly when.
from flair.
Is there something about adding a new language that I could help with?
For instance, there is one big Swedish dataset with POS and NER tags called SUC 3.0. It's available for download here: https://spraakbanken.gu.se/eng/resource/suc3
from flair.
Yes, if you're interested you could train a new model for Swedish POS or NER. You would probably need to adapt the NLPTaskDataFetcher for the task you want to train it on, but otherwise could probably use pretty much the same code as given here (and in the experiments section).
I've added Swedish word embeddings to the project. I will also add issues for this task if you are interested!
from flair.
Fantastic! If you add the issues I’ll see where I can help.
from flair.
Are you planning to work on the Portuguese language?
from flair.
@eduardompereira I am not sure how quickly we can get around to Portuguese, so we'd welcome contributions here! If it helps, we could package standard word embeddings for Portuguese with the next release? Are you aware of good NER datasets for Portuguese?
from flair.
Hi there.
Any news on the French models?
For NER and POS tagging there is the French WikiNER dataset, which comes in a format that is quite easy to adapt:
https://github.com/dice-group/FOX/tree/master/input/Wikiner
For the word embeddings, one can also use the French fastText embeddings:
https://fasttext.cc/docs/en/crawl-vectors.html
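To illustrate what "easily adaptable" could mean in practice, here is a minimal sketch of converting WikiNER-style lines into a column format with one token per line. It assumes each input line holds one sentence whose items look like `word|POS|NER`; the sample sentence, its tags, and the function names are hypothetical, and the exact column layout a given Flair version expects should be checked against the tutorials.

```python
def wikiner_line_to_columns(line):
    """Convert one WikiNER-style sentence line into 'word POS NER' rows.

    Assumes items are separated by whitespace and shaped like word|POS|NER.
    """
    rows = []
    for item in line.split():
        # Split from the right so a '|' inside the word itself is preserved.
        parts = item.rsplit("|", 2)
        if len(parts) != 3:
            continue  # skip malformed items rather than guess their tags
        word, pos, ner = parts
        rows.append(f"{word} {pos} {ner}")
    return rows


def convert_wikiner(lines):
    """Yield column-format lines, with an empty line after each sentence."""
    for line in lines:
        rows = wikiner_line_to_columns(line.strip())
        if rows:
            yield from rows
            yield ""


# Hypothetical sample sentence, only for illustration.
sample = ["Paris|NNP|I-LOC est|VBZ|O belle|JJ|O .|.|O"]
converted = list(convert_wikiner(sample))
print("\n".join(converted))
```

The empty line between sentences follows the usual CoNLL-style convention, which is why the generator emits one after every non-empty sentence.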
from flair.
Hi @mhham thanks for the pointers - more languages are definitely planned and French is high up on our priority list. I am hoping that the next release will be a lot more multilingual than currently, but I am not sure how quickly we can get around to which language. Of course contributions are always welcome!
from flair.
Hi, thanks for the great work! I wonder how many languages flair supports for NER now? From what I see in release 0.4, it seems that English, German, Dutch, French, Italian, Spanish, Portuguese and Polish are supported?
Btw, are there any updates on Nordic language models @EmilStenstrom? I am currently working with NER in Norwegian, so it would be very useful :) Thanks!
from flair.
I've trained FlairEmbeddings on Wikipedia dumps + OPUS (1 epoch) for some more languages:
no, fa, ar, id, pl, da, hi, nl, eu, sl, he, hr, fi, bg, cs and sv.
I'll provide them as soon as I have checked their performance on UD :)
from flair.
Thanks for the reply! @stefan-it
I just read Tutorial 2, so pretrained NER models are available for German, French and Dutch, right?
from flair.
Yes, you could also test our multilingual NER model, which can detect entities in English, German, Dutch and Spanish (and even other languages a little) even though it is only one model.
from flair.
Thanks for the pointer! Will try that out:)
from flair.
Hi @alanakbik . I study NER for Portuguese, and for "general" NER models, I believe the best dataset is the one spacy uses, which is the one from WikiNER (Learning multilingual named entity recognition from Wikipedia) .
As for Portuguese word embeddings, there's a lab at a university here in Brazil that trained many different word embedding models for Portuguese here. In order for them to be available in flair, should they be added to embeddings.WordEmbeddings?
from flair.
@pvcastro yes good idea. We've already added the WikiNER dataset for Portuguese (see tutorial). You can load it with:
from flair.data_fetcher import NLPTaskDataFetcher, NLPTask
original_corpus = NLPTaskDataFetcher.load_corpus(NLPTask.WIKINER_PORTUGUESE)
Aside from this, I think it would be good to support a downloading and conversion routine for word embeddings such as the ones you linked, to make it easy to start experimenting with them!
from flair.
OK, great. I'll work on this and submit a PR soon.
Thanks @alanakbik!
from flair.
Hi guys! Flair is amazing. I am reading up on your project because I am writing my MSc thesis in NLP. I was wondering if Flair supports the Greek language?
from flair.
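Hello @jimkts - only one embedding type currently supports Greek, namely BytePairEmbeddings, which you could use to embed sentences and train models for Greek:
from flair.data import Sentence
from flair.embeddings import BytePairEmbeddings
# load byte pair embeddings for Greek ("el") and embed an example sentence
embeddings = BytePairEmbeddings("el")
sentence = Sentence('Αγαπώ την Ελλάδα')
embeddings.embed(sentence)
# print each token together with its embedding vector
for token in sentence:
    print(token)
    print(token.embedding)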
In order to train a model, you would need to add a Greek training dataset. For instance, the Greek Universal Dependency Treebank or a dataset for Named Entity Recognition. You can check out the tutorials on how to read in your own datasets or train your own models. If you have questions do let us know - we'd be happy to add Greek support.
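As one possible first step toward the Greek training data mentioned above, a treebank in CoNLL-U format could be reduced to a simple two-column file (token and universal POS tag). The sketch below uses only the Python standard library; the function name and the sample annotations are hypothetical, and the column layout that a given Flair version expects is an assumption to verify against the tutorials.

```python
def conllu_to_pos_columns(lines):
    """Reduce CoNLL-U lines to 'FORM UPOS' rows, one empty row per sentence end.

    CoNLL-U token lines have 10 tab-separated fields; FORM is the second
    field and UPOS the fourth. Comment lines start with '#'.
    """
    out = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # An empty line closes the current sentence.
            if out and out[-1] != "":
                out.append("")
            continue
        if line.startswith("#"):
            continue  # sentence-level comments and metadata
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        token_id, form, upos = cols[0], cols[1], cols[3]
        # Skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
        if "-" in token_id or "." in token_id:
            continue
        out.append(f"{form} {upos}")
    if out and out[-1] != "":
        out.append("")
    return out


# Hypothetical annotations for the example sentence, only for illustration.
sample = [
    "# text = Αγαπώ την Ελλάδα",
    "1\tΑγαπώ\tαγαπώ\tVERB\t_\t_\t0\troot\t_\t_",
    "2\tτην\tο\tDET\t_\t_\t3\tdet\t_\t_",
    "3\tΕλλάδα\tΕλλάδα\tPROPN\t_\t_\t1\tobj\t_\t_",
    "",
]
columns = conllu_to_pos_columns(sample)
print("\n".join(columns))
```

The resulting rows could be written to train, dev and test files and then read with whichever column-corpus reader the installed Flair version provides.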
from flair.
@jimkts I could train Flair embeddings for Greek if you want :)
Meanwhile, you could also try the multilingual BERT model (it also includes Greek, trained on Wikipedia).
from flair.
Hello @alanakbik, I trained gensim Word2Vec on a big Greek corpus (~17 GB and ~3500000 words). How can I use this pre-trained model in Flair?
from flair.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
from flair.
Dear Alan, is there any support for Italian NER? Or does Italian NER require training a new model? Thanks
from flair.
Hello @marcomoriatbi there is no pre-trained model for Italian NER yet. You could try 'ner-multi', which was trained on 4 languages and also works to some degree for languages related to the ones it was trained on. I tried this model for French and it worked ok, so maybe that extends to Italian as well.
Otherwise, you would need to train your own Italian NER model. Italian Flair embeddings are included, but on the dataset side, we currently only include NER datasets for Italian that were automatically generated: WIKINER_ITALIAN, WIKIANN and XTREME (see here for more info). I think there are better NER datasets for Italian out there.
from flair.
Hi, I am looking through Flair and wondering whether it supports Vietnamese. If not, will it in the future? Thank you!
from flair.