Comments (9)
+1
I'm doing sequence-to-sequence labeling right now, but sometimes the data contains grammatical errors or typos. It would be really great if we could fix errors directly in doccano. Right now I have to export the data from doccano to my computer, edit it, and then import it back into doccano.
from doccano.
Yes. We already have an API for editing documents, so we can implement this feature by implementing the frontend.
But one problem is that if we edit an annotated document in a sequence labeling project, the existing annotations become useless, because their offsets no longer match the text. An example is as follows:
Original Text: Plsident Obama
Annotation: {'start_offset': 9, 'end_offset': 14}
Edited Text: Plesident Obama
Annotation: {'start_offset': 9, 'end_offset': 14} # incorrect
Correct annotation: {'start_offset': 10, 'end_offset': 15}
One solution is to delete all of a document's annotations whenever we edit it. This is my thought.
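The offset problem in this example can be reproduced in a few lines of Python. This is only an illustrative sketch: the helper name span_text is hypothetical and not part of doccano, and the annotation dict simply mirrors the one shown above.

```python
# Illustration only: span_text is a hypothetical helper, not a doccano function.
original = "Plsident Obama"
edited = "Plesident Obama"
annotation = {"start_offset": 9, "end_offset": 14}
corrected = {"start_offset": 10, "end_offset": 15}


def span_text(text, ann):
    """Return the substring that an annotation's offsets point at."""
    return text[ann["start_offset"]:ann["end_offset"]]


print(repr(span_text(original, annotation)))  # 'Obama'
print(repr(span_text(edited, annotation)))    # ' Obam' (stale offsets)
print(repr(span_text(edited, corrected)))     # 'Obama'
```

Inserting a single character before the labeled span shifts everything after it by one, so the old offsets now cover the wrong characters.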
from doccano.
Yes, it is a very clear and straightforward demand.
from doccano.
Lack of information.
from doccano.
Is this feature planned?
from doccano.
Well, other possible options might be:
- only delete labels whose start_offset comes after the edited character position
- compare the actual labeled text, old_doc[start_offset:end_offset] == new_doc[start_offset:end_offset], and keep the labels for which it is equal (this covers option 1 as well as cases where edits swap characters, without adding or removing them, outside of labeled items)
But this might still not work for everyone.
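A rough Python sketch of the two options above, to make the trade-off concrete. Everything here is an assumption for illustration: the function names are hypothetical, labels are plain dicts with start_offset and end_offset as in the earlier example, and "edited character position" is taken to mean the first index where the old and new text differ. None of this is doccano's actual code.

```python
# Sketch only: hypothetical helpers, not part of doccano.

def first_changed_position(old_doc, new_doc):
    """Index of the first character where the two texts differ."""
    limit = min(len(old_doc), len(new_doc))
    for i in range(limit):
        if old_doc[i] != new_doc[i]:
            return i
    return limit


def drop_labels_after_edit(old_doc, new_doc, labels):
    """Option 1: delete labels that start at or after the first changed position."""
    edit_pos = first_changed_position(old_doc, new_doc)
    return [lab for lab in labels if lab["start_offset"] < edit_pos]


def keep_unchanged_labels(old_doc, new_doc, labels):
    """Option 2: keep labels whose offsets still cover the same text."""
    kept = []
    for lab in labels:
        start, end = lab["start_offset"], lab["end_offset"]
        if old_doc[start:end] == new_doc[start:end]:
            kept.append(lab)
    return kept


old_doc = "Obama is plsident"
new_doc = "Obama is plesident"
labels = [{"start_offset": 0, "end_offset": 5}]  # "Obama"

print(drop_labels_after_edit(old_doc, new_doc, labels))
print(keep_unchanged_labels(old_doc, new_doc, labels))
```

In this usage example the typo comes after the label, so both options keep it. Note that option 1 as written would also keep a label that starts before the edit but extends past it, even though its text has changed; option 2 catches that case.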
from doccano.
I think it is not a big issue.
An alternative solution is to save both the original text and the modified text.
from doccano.
Implementing a data editing feature in doccano is a little overwhelming. It would be good to define the scope of the tool first.
from doccano.
I support this suggestion. I am seeing incorrectly spelled text, wrong punctuation, and so on. I would love to be able to fix typos, grammatical errors, etc. as I see them.
from doccano.