Hi! Flair looks amazing. Clean code, easy to use. Thanks for making it open source!

Support for more languages? about flair HOT 26 CLOSED

flairnlp commented on May 19, 2024

Support for more languages?

from flair.

Comments (26)

alanakbik commented on May 19, 2024

Hello Emil! Thanks for the interest - we are thinking of adding more models in more languages. In particular, we are currently looking at French, Italian and Dutch. Which languages / tasks are you most interested in?

from flair.

EmilStenstrom commented on May 19, 2024

It's a bit different depending on if it's for hobby projects or work projects.

Hobby projects: Swedish / POS, Swedish / NER
Business projects: Nordics (Swedish/Norwegian/Danish/Finnish), German, Spanish, English, Polish. POS and NER.

from flair.

alanakbik commented on May 19, 2024

Ok great! The German models (POS/NER) will be put online probably sometime next week.

We will also progressively add more languages in the near future. Of your list, I think Polish and Spanish are the most likely to be added soonish, though I can't say exactly when.

from flair.

EmilStenstrom commented on May 19, 2024

Is there something about adding a new language that I could help with?

For instance, there is one big Swedish dataset with POS and NER tags called SUC 3.0. It's available for download here: https://spraakbanken.gu.se/eng/resource/suc3

from flair.

alanakbik commented on May 19, 2024

Yes, if you're interested you could train a new model for Swedish POS or NER. You would probably need to adapt the NLPTaskDataFetcher for the task you want to train it on, but otherwise could probably use pretty much the same code as given here (and in the experiments section).

I've added Swedish word embeddings to the project. I will also add issues for this task if you are interested!
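
As a rough sketch of the data-preparation side of this: CoNLL-style column files (one token per line, whitespace-separated columns, a blank line between sentences) are the usual input for sequence-labelling corpora. The helper below is hypothetical and not part of Flair. It assumes the SUC 3.0 annotations have already been extracted into (token, POS, NER) tuples, and the tags in the example are only illustrative.

```python
from typing import Iterable, List, Tuple

# Hypothetical type: one sentence is a list of (token, pos_tag, ner_tag) tuples.
TaggedSentence = List[Tuple[str, str, str]]


def to_column_format(sentences: Iterable[TaggedSentence]) -> str:
    """Render tagged sentences as CoNLL-style column text.

    One token per line, columns separated by a single space,
    and an empty line after each sentence.
    """
    lines: List[str] = []
    for sentence in sentences:
        for token, pos_tag, ner_tag in sentence:
            # Tokens containing spaces would break the column layout,
            # so inner whitespace is replaced with an underscore.
            safe_token = "_".join(token.split())
            lines.append(f"{safe_token} {pos_tag} {ner_tag}")
        lines.append("")  # sentence boundary
    return "\n".join(lines)


# Illustrative input only; the tags are not taken from SUC 3.0 itself.
example = [
    [("Emil", "PM", "B-PER"), ("bor", "VB", "O"), ("i", "PP", "O"), ("Sverige", "PM", "B-LOC")],
]
column_text = to_column_format(example)
```

The resulting text can be written to train/dev/test files, after which the data fetcher only needs to know which column holds which annotation.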

from flair.

EmilStenstrom commented on May 19, 2024

Fantastic! If you add the issues I’ll see where I can help.

from flair.

eduardompereira commented on May 19, 2024

Are you planning to work on Portuguese language?

from flair.

alanakbik commented on May 19, 2024

@eduardompereira I am not sure how quickly we can get around to Portuguese, so we'd welcome contributions here! If it helps, we could package standard word embeddings for Portuguese with the next release? Are you aware of good NER datasets for Portuguese?

from flair.

mhham commented on May 19, 2024

Hi there.
Any news on the French models?
For NER and POS tagging there is the French WikiNER dataset, which comes in a fairly easy-to-adapt format:
https://github.com/dice-group/FOX/tree/master/input/Wikiner

For the word embeddings, one can also use the French fastText embeddings:
https://fasttext.cc/docs/en/crawl-vectors.html

from flair.

alanakbik commented on May 19, 2024

Hi @mhham thanks for the pointers - more languages are definitely planned and French is high up on our priority list. I am hoping that the next release will be a lot more multilingual than currently, but I am not sure how quickly we can get around to which language. Of course contributions are always welcome!

from flair.

lz-chen commented on May 19, 2024

Hi, thanks for the great work! How many languages does Flair support for NER now? From what I see in release 0.4, it seems that English, German, Dutch, French, Italian, Spanish, Portuguese and Polish are supported?
By the way, are there any updates on Nordic language models, @EmilStenstrom? I am currently working on NER in Norwegian, so it would be very useful :) Thanks!

from flair.

stefan-it commented on May 19, 2024

I've trained FlairEmbeddings on Wikipedia dumps + OPUS (1 epoch) for some more languages:

no, fa, ar, id, pl, da, hi, nl, eu, sl, he, hr, fi, bg, cs and sv.

I'll provide them as soon as I have checked their performance on UD :)

from flair.

lz-chen commented on May 19, 2024

Thanks for the reply! @stefan-it
I just read Tutorial 2, so pretrained NER models are available for German, French and Dutch, right?

from flair.

alanakbik commented on May 19, 2024

Yes, you could also test our multilingual NER model, which can detect entities in English, German, Dutch and Spanish (and even other languages a little) even though it is only one model.

from flair.

lz-chen commented on May 19, 2024

Thanks for the pointer! Will try that out:)

from flair.

pvcastro commented on May 19, 2024

> @eduardompereira I am not sure how quickly we can get around to Portuguese, so we'd welcome contributions here! If it helps, we could package standard word embeddings for Portuguese with the next release? Are you aware of good NER datasets for Portuguese?

Hi @alanakbik. I study NER for Portuguese, and for "general" NER models, I believe the best dataset is the one spaCy uses, which is the one from WikiNER (Learning multilingual named entity recognition from Wikipedia).
As for Portuguese word embeddings, a lab at a university here in Brazil has trained many different word embedding models for Portuguese, available here. In order for them to be available in Flair, should they be added to embeddings.WordEmbeddings?

from flair.

alanakbik commented on May 19, 2024

@pvcastro yes good idea. We've already added the WikiNER dataset for Portuguese (see tutorial). You can load it with:

from flair.data_fetcher import NLPTaskDataFetcher, NLPTask
original_corpus = NLPTaskDataFetcher.load_corpus(NLPTask.WIKINER_PORTUGUESE)

Aside from this, I think it would be good to support a downloading and conversion routine for word embeddings such as the ones you linked, to make it easy to start experimenting with them!
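
A minimal sketch of what the conversion half of such a routine might look like, assuming the embeddings are distributed in the plain word2vec text format (a header line with vocabulary size and dimension, then one word and its vector per line). That format is an assumption about the linked files, and `read_word2vec_text` is a hypothetical helper, not an existing Flair function.

```python
import io
from typing import Dict, List, TextIO


def read_word2vec_text(handle: TextIO) -> Dict[str, List[float]]:
    """Parse embeddings stored in the word2vec text format.

    The first line holds "<vocab_size> <dimension>"; every following line
    holds a word followed by <dimension> space-separated floats.
    """
    header = handle.readline().split()
    vocab_size, dimension = int(header[0]), int(header[1])

    vectors: Dict[str, List[float]] = {}
    for line in handle:
        # Split from the right so that a word containing spaces stays intact.
        parts = line.rstrip("\n").rsplit(" ", dimension)
        if len(parts) != dimension + 1:
            continue  # skip malformed or empty lines
        word, values = parts[0], parts[1:]
        vectors[word] = [float(value) for value in values]

    if len(vectors) != vocab_size:
        raise ValueError(f"expected {vocab_size} vectors, found {len(vectors)}")
    return vectors


# Tiny in-memory example standing in for a downloaded embeddings file.
sample = io.StringIO("2 3\ncasa 0.1 0.2 0.3\nrio 0.4 0.5 0.6\n")
embeddings = read_word2vec_text(sample)
```

A real routine would stream the file rather than hold every vector in a Python dict, and would then save the result in whatever format the embedding class expects.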

from flair.

pvcastro commented on May 19, 2024

OK, great. I'll work on this and submit a PR soon.
Thanks @alanakbik!

from flair.

jimkts commented on May 19, 2024

Hi guys! Flair is amazing. I am looking into your project because I am writing my MSc thesis in NLP. I was wondering whether Flair supports the Greek language?

from flair.

alanakbik commented on May 19, 2024

Hello @jimkts - only one embedding type currently supports Greek, namely BytePairEmbeddings, which you could use to embed sentences and train models for Greek:

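from flair.data import Sentence
from flair.embeddings import BytePairEmbeddings

# load byte-pair embeddings for Greek (language code "el")
embeddings = BytePairEmbeddings("el")

# embed an example sentence
sentence = Sentence('Αγαπώ την Ελλάδα')
embeddings.embed(sentence)

# print each token together with its embedding vector
for token in sentence:
    print(token)
    print(token.embedding)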

In order to train a model, you would need to add a Greek training dataset. For instance, the Greek Universal Dependency Treebank or a dataset for Named Entity Recognition. You can check out the tutorials on how to read in your own datasets or train your own models. If you have questions do let us know - we'd be happy to add Greek support.
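
To give a feel for what reading such a treebank involves: Universal Dependencies treebanks are distributed in the CoNLL-U format, with ten tab-separated columns per token, comment lines starting with `#`, and a blank line between sentences. The reader below is a hypothetical stdlib-only sketch, not Flair's own loader, and the sample annotations are hand-written for illustration rather than copied from the Greek treebank.

```python
from typing import Iterable, Iterator, List, Tuple


def read_conllu_pos(lines: Iterable[str]) -> Iterator[List[Tuple[str, str]]]:
    """Yield each sentence as a list of (word form, universal POS tag) pairs."""
    sentence: List[Tuple[str, str]] = []
    for raw_line in lines:
        line = raw_line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment such as "# text = ..."
        columns = line.split("\t")
        token_id, form, upos = columns[0], columns[1], columns[3]
        if "-" in token_id or "." in token_id:
            continue  # multiword token range or empty node
        sentence.append((form, upos))
    if sentence:
        yield sentence  # input did not end with a blank line


# Hand-written sample in CoNLL-U layout (illustrative annotations).
sample = [
    "# text = Αγαπώ την Ελλάδα",
    "1\tΑγαπώ\tαγαπώ\tVERB\t_\t_\t0\troot\t_\t_",
    "2\tτην\tο\tDET\t_\t_\t3\tdet\t_\t_",
    "3\tΕλλάδα\tΕλλάδα\tPROPN\t_\t_\t1\tobj\t_\t_",
    "",
]
sentences = list(read_conllu_pos(sample))
```

The (form, tag) pairs can then be turned into whatever sentence and label objects the training code expects.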

from flair.

stefan-it commented on May 19, 2024

@jimkts I could train Flair embeddings for Greek if you want :)

Meanwhile, you could also try the multilingual BERT model (it also includes Greek, trained on Wikipedia).

from flair.

jimkts commented on May 19, 2024

> Hello @jimkts - only one embedding type currently supports Greek, namely BytePairEmbeddings, which you could use to embed sentences and train models for Greek:
>
> embeddings = BytePairEmbeddings("el")
>
> sentence = Sentence('Αγαπώ την Ελλάδα')
> embeddings.embed(sentence)
>
> for token in sentence:
>     print(token)
>     print(token.embedding)
>
> In order to train a model, you would need to add a Greek training dataset. For instance, the Greek Universal Dependency Treebank or a dataset for Named Entity Recognition. You can check out the tutorials on how to read in your own datasets or train your own models. If you have questions do let us know - we'd be happy to add Greek support.

Hello @alanakbik. I trained a gensim Word2Vec model on a big Greek corpus (~17 Gb and ~3500000 words). How can I use this pre-trained model in Flair?

from flair.

stale commented on May 19, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

from flair.

marcomoriatbi commented on May 19, 2024

> Hello Emil! Thanks for the interest - we are thinking of adding more models in more languages. In particular, we are currently looking at French, Italian and Dutch. Which languages / tasks are you most interested in?

Dear Alan, is there any support for Italian NER? Does Italian NER require training a new model? Thanks

from flair.

alanakbik commented on May 19, 2024

Hello @marcomoriatbi, there is no pre-trained model for Italian NER yet. You could try 'ner-multi', which was trained over 4 languages and also kind of works for languages related to the ones it was trained on. I tried this model for French and it worked ok, so maybe that extends to Italian as well.

Otherwise, you would need to train your own Italian NER model. There are Italian Flair embeddings included, but on the dataset side, we currently only include NER datasets for Italian that were automatically generated: WIKINER_ITALIAN, WIKIANN and XTREME (see here for more info). I think there are better NER datasets for Italian out there.

from flair.

longsc2603 commented on May 19, 2024

Hi, I am looking through Flair and wondering whether it supports Vietnamese. If not, will it in the future? Thank you!

from flair.
