Comments (26)

alanakbik commented on May 19, 2024

Hello Emil! Thanks for the interest - we are thinking of adding more models in more languages. In particular, we are currently looking at French, Italian and Dutch. Which languages / tasks are you most interested in?

from flair.

EmilStenstrom commented on May 19, 2024

It's a bit different depending on if it's for hobby projects or work projects.

Hobby projects: Swedish / POS, Swedish / NER
Business projects: Nordics (Swedish/Norwegian/Danish/Finnish), German, Spanish, English, Polish. POS and NER.

alanakbik commented on May 19, 2024

Ok great! The German models (POS/NER) will be put online probably sometime next week.

We will also progressively add more languages in the near future. Of your list, I think Polish and Spanish are the most likely to be added soonish, though I can't say exactly when.

EmilStenstrom commented on May 19, 2024

Is there something about adding a new language that I could help with?

For instance, there is one big Swedish dataset with POS and NER tags called SUC 3.0. It's available for download here: https://spraakbanken.gu.se/eng/resource/suc3

alanakbik commented on May 19, 2024

Yes, if you're interested you could train a new model for Swedish POS or NER. You would probably need to adapt the NLPTaskDataFetcher for the task you want to train it on, but otherwise could probably use pretty much the same code as given here (and in the experiments section).

I've added Swedish word embeddings to the project. I will also add issues for this task if you are interested!

EmilStenstrom commented on May 19, 2024

Fantastic! If you add the issues I’ll see where I can help.

eduardompereira commented on May 19, 2024

Are you planning to work on the Portuguese language?

alanakbik commented on May 19, 2024

@eduardompereira I am not sure how quickly we can get around to Portuguese, so we'd welcome contributions here! If it helps, we could package standard word embeddings for Portuguese with the next release? Are you aware of good NER datasets for Portuguese?

mhham commented on May 19, 2024

Hi there.
Any news on the French models?
For NER and POS tagging there is the WikiNER French dataset, which comes in an easily adaptable format:
https://github.com/dice-group/FOX/tree/master/input/Wikiner

For the word embeddings, one can also use the French fastText embeddings:
https://fasttext.cc/docs/en/crawl-vectors.html

alanakbik commented on May 19, 2024

Hi @mhham thanks for the pointers - more languages are definitely planned and French is high up on our priority list. I am hoping that the next release will be a lot more multilingual than currently, but I am not sure how quickly we can get around to which language. Of course contributions are always welcome!

lz-chen commented on May 19, 2024

Hi, thanks for the great work! I wonder how many languages flair supports for NER now? From what I see in release 0.4, it seems that English, German, Dutch, French, Italian, Spanish, Portuguese and Polish are supported?
By the way, are there any updates on Nordic language models @EmilStenstrom? I am currently working on NER in Norwegian, so it would be very useful :) Thanks!

stefan-it commented on May 19, 2024

I've trained FlairEmbeddings on Wikipedia dumps + OPUS (1 epoch) for some more languages:

no, fa, ar, id, pl, da, hi, nl, eu, sl, he, hr, fi, bg, cs and sv.

I'll provide them as soon as I have checked their performance on UD :)

lz-chen commented on May 19, 2024

Thanks for the reply, @stefan-it!
I just read Tutorial 2, so pretrained NER models are available for German, French and Dutch, right?

alanakbik commented on May 19, 2024

Yes, you could also test our multilingual NER model, which can detect entities in English, German, Dutch and Spanish (and even other languages a little) even though it is only one model.

lz-chen commented on May 19, 2024

Thanks for the pointer! Will try that out:)

pvcastro commented on May 19, 2024

Hi @alanakbik. I study NER for Portuguese, and for "general" NER models I believe the best dataset is the one spaCy uses, which comes from WikiNER (Learning multilingual named entity recognition from Wikipedia).
As for Portuguese word embeddings, there's a lab at a university here in Brazil that trained many different word embedding models for Portuguese, available here. In order for them to be available in flair, should they be added to embeddings.WordEmbeddings?

alanakbik commented on May 19, 2024

@pvcastro yes good idea. We've already added the WikiNER dataset for Portuguese (see tutorial). You can load it with:

from flair.data_fetcher import NLPTaskDataFetcher, NLPTask
original_corpus = NLPTaskDataFetcher.load_corpus(NLPTask.WIKINER_PORTUGUESE)

Aside from this, I think it would be good to support a downloading and conversion routine for word embeddings such as the ones you linked, to make it easy to start experimenting with them!

pvcastro commented on May 19, 2024

OK, great. I'll work on this and submit a PR soon.
Thanks @alanakbik!

jimkts commented on May 19, 2024

Hi guys! Flair is amazing. I am reading up on your project because I am writing my MSc thesis in NLP. I was wondering whether Flair supports the Greek language?

alanakbik commented on May 19, 2024

Hello @jimkts - only one embedding type currently supports Greek, namely BytePairEmbeddings, which you could use to embed sentences and train models for Greek:

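from flair.data import Sentence
from flair.embeddings import BytePairEmbeddings

# load the byte-pair embeddings for Greek ("el")
embeddings = BytePairEmbeddings("el")

# embed an example sentence
sentence = Sentence('Αγαπώ την Ελλάδα')
embeddings.embed(sentence)

# print each token together with its embedding vector
for token in sentence:
    print(token)
    print(token.embedding)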

In order to train a model, you would need to add a Greek training dataset. For instance, the Greek Universal Dependency Treebank or a dataset for Named Entity Recognition. You can check out the tutorials on how to read in your own datasets or train your own models. If you have questions do let us know - we'd be happy to add Greek support.
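
For anyone following along, the "read in your own dataset" step can be illustrated without any library. Tagging datasets are commonly distributed as column files, with one token and its tag per line and a blank line between sentences. The snippet below is only a minimal sketch under that assumption: the helper name `read_column_data` and the toy Greek sample are made up for illustration, and this is not Flair's own reader.

```python
from typing import Iterable, List, Tuple

# A tagged sentence is a list of (token, tag) pairs.
TaggedSentence = List[Tuple[str, str]]


def read_column_data(lines: Iterable[str]) -> List[TaggedSentence]:
    """Group 'token ... tag' lines into sentences (hypothetical helper).

    Assumes whitespace-separated columns, the token in the first column,
    the tag in the last column, and blank lines between sentences.
    """
    sentences: List[TaggedSentence] = []
    current: TaggedSentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the sentence collected so far.
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split()
        current.append((fields[0], fields[-1]))
    # Keep the last sentence if the input does not end with a blank line.
    if current:
        sentences.append(current)
    return sentences


# Hand-made toy sample, not taken from any real dataset.
sample = """\
Αγαπώ O
την O
Ελλάδα B-LOC

Η O
Αθήνα B-LOC
"""

sentences = read_column_data(sample.splitlines())
print(len(sentences))
print(sentences[0])
```

Once a dataset is in this shape, the token/tag pairs can be handed to whichever corpus loader the tutorials describe for your Flair version.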

stefan-it commented on May 19, 2024

@jimkts I could train Flair embeddings for Greek if you want :)

Meanwhile, you could also try the multilingual BERT model (it also includes Greek, trained on Wikipedia).

jimkts commented on May 19, 2024

Hello @alanakbik, I trained gensim Word2Vec on a big Greek corpus (~17 GB and ~3,500,000 words). How can I use this pre-trained model in Flair?

stale commented on May 19, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

marcomoriatbi commented on May 19, 2024

Dear Alan, is there any support for Italian NER? Does Italian NER require training a new model? Thanks

alanakbik commented on May 19, 2024

Hello @marcomoriatbi, there is no pre-trained model for Italian NER yet. You could try 'ner-multi', which was trained over 4 languages and also works reasonably well for languages related to the ones it was trained on. I tried this model for French and it worked OK, so maybe that extends to Italian as well.

Otherwise, you would need to train your own Italian NER model. There are Italian Flair embeddings included, but on the dataset side, we currently only include NER datasets for Italian that were automatically generated: WIKINER_ITALIAN, WIKIANN and XTREME (see here for more info). I think there are better NER datasets for Italian out there.

longsc2603 commented on May 19, 2024

Hi, I am looking through Flair and wondering whether it supports Vietnamese. If not, will it in the future? Thank you!
