
Comments (4)

foxik commented on July 19, 2024

BTW, I consider nametag to be very weak currently -- it is not very accurate (it is unchanged since ~2014) and requires a tagger+lemmatizer to work; we use it only for Czech.

As for extracting the tagger -- the released UDPipe models actually contain two MorphoDiTa models -- one is a tagger predicting UPOS, XPOS & Feats, and the other one is a lemmatizer predicting UPOS & Lemmas. I do not think it is possible to extract the models using the existing binaries, but it would be trivial to write a tool that does, if you want one.

from nametag.

jwijffels commented on July 19, 2024

I have some .udpipe models in which the part-of-speech tagger and the lemmatizer were trained as a single MorphoDiTa model, so for now I can still use that tagger externally to test NameTag out.

Some background:

  • I'm working on 15th-19th century corpora whose text is a combination of Dutch dialects with French & Latin, and
  • which are obtained by either manual transcription of images or automated (error-prone) extraction of text from images using Transkribus or Tesseract.

I don't mind using pre-deep-learning machine learning techniques: my laptop is still from 2013, and the users of the models are historians who have no clue about computer programming.

Feel free to provide any advice on tooling that would be more suitable. The requirements that I have are

  • a named entity recognition model can be trained and scored on a regular CPU-only computer in decent time
  • the toolkit should not assume pretrained embeddings exist
  • preferably written in C++ without any very complex Makefile wizardry, so that I can easily wrap it up in an R package in 1 day instead of 1 week

For example, I couldn't find any open-source biLSTM-CRF model that matches the above requirements. I would be interested in pointers to any tooling you would advise.


foxik commented on July 19, 2024

I do not really have any suggestions -- NameTag generally fulfils the "not much computational performance required" criterion. The disadvantages are the required morphological model (but if you already have it, it is not a problem) and lower than state-of-the-art performance (it does not even use a CRF layer -- it uses a MEMM with dynamic decoding only; and the implemented feature templates are not that strong). But I do not know of any low-resource alternative (we are still using it for Czech)...
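
For readers unfamiliar with the term, here is a minimal sketch of what decoding a MEMM with dynamic programming can look like. It assumes that "dynamic decoding" means Viterbi-style search over locally normalized tag probabilities; it is not NameTag's actual code. The names `viterbi_memm`, `toy_model` and `TAGS`, as well as the toy probabilities, are hypothetical and exist only for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical local model: given the previous tag and the current token,
# return a probability distribution over the current tag (this is the
# locally normalized classifier that defines a MEMM).
LocalModel = Callable[[str, str], Dict[str, float]]


def _log(p: float) -> float:
    # Guard against log(0) for tags the local model gives no mass to.
    return math.log(p) if p > 0.0 else math.log(1e-300)


def viterbi_memm(tokens: Sequence[str], tags: Sequence[str],
                 local_model: LocalModel, start_tag: str = "<s>") -> List[str]:
    """Return the tag sequence maximizing the product of local probabilities."""
    if not tokens:
        return []
    # best[i][t]: best log-probability of a tag sequence for tokens[:i+1] ending in t
    # back[i][t]: the previous tag on that best sequence
    first = local_model(start_tag, tokens[0])
    best = [{t: _log(first.get(t, 0.0)) for t in tags}]
    back = [{t: start_tag for t in tags}]

    for i in range(1, len(tokens)):
        # One local distribution per possible previous tag.
        dists = {p: local_model(p, tokens[i]) for p in tags}
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p] + _log(dists[p].get(t, 0.0)))
            scores[t] = best[i - 1][prev] + _log(dists[prev].get(t, 0.0))
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path


TAGS = ["O", "B-PER", "I-PER"]


def toy_model(prev_tag: str, token: str) -> Dict[str, float]:
    # Toy stand-in for a trained classifier: capitalized tokens look like names.
    if token[:1].isupper():
        if prev_tag in ("B-PER", "I-PER"):
            return {"I-PER": 0.7, "B-PER": 0.1, "O": 0.2}
        return {"B-PER": 0.7, "O": 0.25, "I-PER": 0.05}
    return {"O": 0.9, "B-PER": 0.05, "I-PER": 0.05}


if __name__ == "__main__":
    print(viterbi_memm(["met", "Jan", "Janssens", "yesterday"], TAGS, toy_model))
    # prints ['O', 'B-PER', 'I-PER', 'O']
```

A CRF would normalize scores over whole tag sequences instead of per position; the decoding loop would look the same, only the scores fed into it would differ.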

When the new UDPipe appears (yes, it is bordering on vaporware at this moment, I am unfortunately aware), we plan NER + NEL modules too; but they will require substantially more computational resources (especially for training)...


jwijffels commented on July 19, 2024

Thanks for the messages and the advice. Looking forward to the vaporware announcements :)

