
Comments (9)

AnakTeka commented on May 18, 2024 4

+1

I'm doing sequence-to-sequence labeling right now, but the data sometimes contains grammatical errors or typos. It would be really great to be able to fix errors directly in doccano; right now I have to export the data from doccano to my computer, edit it, and then import it back into doccano.

from doccano.

Hironsan commented on May 18, 2024 4

Yes. We already have an API for editing documents, so we can add this feature by implementing the frontend.

But one problem is that if we edit a document that already has annotations in a sequence labeling project, those annotations become useless because their offsets no longer match the text. An example is as follows:

Original Text: Plsident Obama
Annotation: {'start_offset': 9, 'end_offset': 14}

Edited Text: Plesident Obama
Annotation: {'start_offset': 9, 'end_offset': 14}  # incorrect
True annotation: {'start_offset': 10, 'end_offset': 15} 

One solution is to delete all of a document's annotations whenever the document is edited. This is my thought.
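
To make the offset problem concrete, here is a minimal Python sketch that slices both versions of the text with the same offsets. The `span_text` helper is hypothetical and only for illustration; it is not part of doccano.

```python
def span_text(text, annotation):
    """Return the substring that an annotation's offsets point at."""
    return text[annotation["start_offset"]:annotation["end_offset"]]


original = "Plsident Obama"
edited = "Plesident Obama"

stale = {"start_offset": 9, "end_offset": 14}       # offsets stored before the edit
corrected = {"start_offset": 10, "end_offset": 15}  # offsets after inserting one character

print(repr(span_text(original, stale)))    # 'Obama'
print(repr(span_text(edited, stale)))      # ' Obam'  (shifted by the inserted 'e')
print(repr(span_text(edited, corrected)))  # 'Obama'
```

Any insertion or deletion before a label shifts everything after it, which is why keeping the old offsets unchanged is unsafe.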


SeekPoint commented on May 18, 2024 3

Yes, it is a very clear and straightforward demand.


Hironsan commented on May 18, 2024

Lack of information.


SeekPoint commented on May 18, 2024

Is this feature planned?


sudodoki commented on May 18, 2024

well, other possible options might be:

  1. Only delete labels whose start_offset comes after the edited character position.
  2. Compare old_doc[start_offset:end_offset] with new_doc[start_offset:end_offset] and keep the label if they are equal. This covers option 1 as well as edits that swap characters without adding or removing any outside of the labeled items.

But this might still not work for everyone
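
The two options above could be sketched roughly as follows. This is an illustration only: the function names are hypothetical, labels are assumed to be plain dicts with `start_offset` and `end_offset`, and option 1 is written as a slightly more conservative variant that also drops a label overlapping the edit position.

```python
def first_difference(old_doc, new_doc):
    """Index of the first character at which the two strings differ."""
    limit = min(len(old_doc), len(new_doc))
    for i in range(limit):
        if old_doc[i] != new_doc[i]:
            return i
    return limit


def keep_labels_before_edit(old_doc, new_doc, labels):
    """Option 1 (conservative variant): keep labels that end at or before the first edit."""
    position = first_difference(old_doc, new_doc)
    return [label for label in labels if label["end_offset"] <= position]


def keep_unchanged_spans(old_doc, new_doc, labels):
    """Option 2: keep labels whose covered text is identical in both versions."""
    return [
        label
        for label in labels
        if old_doc[label["start_offset"]:label["end_offset"]]
        == new_doc[label["start_offset"]:label["end_offset"]]
    ]


old_doc = "Obama vsiited Paris"
new_doc = "Obama visited Paris"  # two characters swapped, length unchanged
labels = [
    {"start_offset": 0, "end_offset": 5},    # "Obama"
    {"start_offset": 14, "end_offset": 19},  # "Paris"
]

print(keep_labels_before_edit(old_doc, new_doc, labels))  # keeps only "Obama"
print(keep_unchanged_spans(old_doc, new_doc, labels))     # keeps both labels
```

With a same-length edit outside the labeled spans, option 2 keeps both labels while option 1 drops everything after the edit. If a character is inserted or deleted before a label, both approaches drop that label.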


SeekPoint commented on May 18, 2024

I think it is not a big issue.
An alternative solution is to save both the original text and the modified text.


icoxfog417 commented on May 18, 2024

Implementing a data editing feature in doccano is a little overwhelming. It would be good to define the scope of the tool.


BwandoWando commented on May 18, 2024

I support this suggestion. I am seeing misspelled text, wrong punctuation, and so on. I would love to be able to fix typos, grammatical errors, etc. as I see them.

