ner-annotator's Introduction

NER Annotator for spaCy

NER Annotator for spaCy lets you create training data for a custom NER model with custom tags.

Screenshots

NER Annotator Screenshot

Development

Requirements

  1. Node.js 14.x
  2. Yarn Package Manager
  3. Rust (for building desktop versions)

Running it locally for development

  1. Open a terminal and start the development server for the UI
yarn
yarn serve

Now go to http://localhost:8081/ner-annotator/

Developing the desktop application

The desktop applications have been created using Tauri.

yarn tauri:serve

To build the final binaries, run:

yarn tauri:build

Credits

  1. App Icon - Ornithology icons created by Freepik - Flaticon

ner-annotator's People

Contributors

alvi-khan, faran-javaid, lelvilamp, leonkunert, lyihaoo, tecoholic, totalus

ner-annotator's Issues

No way to open a new file.

The only way to open a new file is to refresh the page (website) or re-open the application (desktop).

Add Previous Button to go to the previous entry

It would be good to add a 'Previous' or 'Back' button (the exact opposite of the 'Skip' button) so that we can navigate back to the previous sentence or line and add more labels or edit the labels marked in the current session.

Keyboard shortcuts

Keyboard shortcuts would speed up the annotation process.
Suggested shortcuts:

  • Changing Classes (1-9)
  • Save (Space | Enter | S)
  • Skip (Space | Tab | Enter | S)
  • Back (Backspace | B)
  • Reset (Esc | R)

Install

I got the following error upon install:

yarn install v1.22.10
[1/4] 🔍  Resolving packages...
[2/4] 🚚  Fetching packages...
error [email protected]: The engine "node" is incompatible with this module. Expected version ">=10". Got "8.12.0"
error Found incompatible module.
info Visit https://yarnpkg.com/en/docs/cli/install for documentation about this command.

Install

Hello!

When I run yarn install, I get this:

[1/4] Resolving packages...
[2/4] Fetching packages...
info [email protected]: The platform "linux" is incompatible with this module.
info "[email protected]" is an optional dependency and failed compatibility check. Excluding it from installation.
info [email protected]: The platform "linux" is incompatible with this module.
info "[email protected]" is an optional dependency and failed compatibility check. Excluding it from installation.
info [email protected]: The platform "linux" is incompatible with this module.
info "[email protected]" is an optional dependency and failed compatibility check. Excluding it from installation.
[3/4] Linking dependencies...
warning " > @fortawesome/[email protected]" has incorrect peer dependency "[email protected]".
warning " > [email protected]" has unmet peer dependency "webpack@^4.36.0 || ^5.0.0".
[4/4] Building fresh packages...
[-/7] ⠄ waiting...
[-/7] ⠄ waiting...
[-/7] ⠄ waiting...
[-/7] ⠄ waiting...
error /home/mike/WDC4_1/Neo/ner-annotator/ui/node_modules/node-sass: Command failed.
Exit code: 1

Then there is a lot of extraction output, and finally:

Node.js v17.0.1
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command

Can you help please?

Ambiguity in indexing of labels

Hi @tecoholic. First of all, thanks for this great repository. Secondly, I would like to ask a question.
Can you please explain which string the indexes refer to (the original text or the text after tokenization)? When I test the exported JSON file, the indexes do not seem to line up.
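
One way to investigate this question is to slice the original text with each exported offset and look at what comes out. The following is only a minimal sketch: it assumes the export has the shape used by the sample code later on this page (a "classes" list and an "annotations" list of [text, {"entities": [[start, end, label], ...]}] pairs) and that the offsets are character positions. Both points are assumptions here, not confirmed behaviour of the tool, and check_offsets is a hypothetical helper name.

```python
def check_offsets(export):
    """Return (label, start, end, covered_text) for every exported entity.

    Assumes character offsets into the original (untokenized) text.
    """
    rows = []
    for item in export["annotations"]:
        if not item:  # tolerate empty entries in the list
            continue
        text, annotation = item
        for start, end, label in annotation.get("entities", []):
            rows.append((label, start, end, text[start:end]))
    return rows


# Hand-written sample in the assumed export shape (not real tool output).
sample = {
    "classes": ["STORE"],
    "annotations": [
        ["I am going to walmart", {"entities": [[14, 21, "STORE"]]}],
    ],
}

for label, start, end, covered in check_offsets(sample):
    print(label, start, end, repr(covered))
```

If the printed substrings match what was highlighted in the UI, the offsets are character positions in the original string; if they are shifted, the mismatch is worth reporting together with the offending line.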

Allow importing multiple files for annotation

NER Annotator works with only one text file at a time, but any serious project will have multiple files to annotate. It would be useful to either

  • allow loading multiple files at the same time or
  • provide a way to load the next file at the completion of one file

This way all the tags can be exported as a single JSON file.

Note

Since the tool is browser-based, memory considerations have to be kept in mind.

Type of labeling format used

Hello, my question is about the type of labeling format used. Was it IOB, IOB2, IOBES, or another type of format?
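
For readers who need a token-level scheme, here is a hedged sketch of deriving IOB2 tags from character spans. It assumes the export stores entities as [start, end, label] character offsets (as the sample code later on this page suggests, though this is not confirmed here) and it uses plain whitespace tokenization, which is a simplification and not what spaCy's tokenizer does. The function name to_iob2 is hypothetical.

```python
import re


def to_iob2(text, entities):
    """Tag whitespace-separated tokens with IOB2 labels from character spans."""
    tagged = []
    for match in re.finditer(r"\S+", text):
        tag = "O"
        for start, end, label in entities:
            # A token is part of an entity when it lies fully inside the span.
            if match.start() >= start and match.end() <= end:
                prefix = "B" if match.start() == start else "I"
                tag = f"{prefix}-{label}"
                break
        tagged.append((match.group(), tag))
    return tagged


print(to_iob2("I live in New York", [[10, 18, "GPE"]]))
```

Tokens that only partly overlap a span are tagged "O" in this sketch, so spans that cut through a word would need separate handling.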

Running tool on remote server for collaborative tagging activity

I was wondering whether I can start the ner-annotator tool on a remote server so that multiple users can tag different corpora simultaneously.

Users would select a corpus from their local machine, but the server would not be running on the local machine.

Has anyone tried this? Let me know if anyone has ideas on how to achieve it.

Thanks.

Very Cool Tools - Two Questions

First, thanks a lot for creating this tool!

It's really useful to me since I have recently started using spaCy as well. Two questions:

  1. While annotating a file, could there be a way to go back and re-annotate previous sentences so that we can correct any errors? Or, after completing a file, could there be a button to go back to the home page? (For now I do it by reloading the page.)
  2. I don't know if it's a bug or not, but when I annotate a sentence and assign a tag to the last word, the last word disappears. I then found that if my selection includes the last word, the whole selected portion of the sentence disappears. I am not sure what is happening. The file I uploaded was a text file with sentences listed line by line; I'm not sure if that matters.

One more thing: I am willing to contribute to your repo if you need some extra help. Since I am a data scientist, I may only be able to help with the Python part, but if you need it, I am happy to.

Thanks,
Guoyi

How to set common NER tags for all the different text files

Hello Team,

I appreciate you building this beautiful tool for annotating text files for NER.

I want to annotate 10000 different text files with a fixed set of NER tags that is common to all of them. At the moment I have to add the same NER tags repeatedly for every text file.

I am running the application locally on localhost. Can you please help me make a code change in my local copy so that the common tags are fixed for all text files? Since I do not have much experience with front-end applications, any help would really be appreciated.

Possibility to move line to the end

Maybe it would be helpful to be able to move the line currently being tagged to the very end, e.g. to have more time to think about the entities. Personally, I need this all the time.

Docker install fails to build server.Dockerfile

It appears the Python slim image does not include GCC, which is needed:

#8 6.315 gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/usr/local/include/python3.10 -c regex_3/_regex.c -o build/temp.linux-x86_64-3.10/regex_3/_regex.o
#8 6.315 error: command 'gcc' failed: No such file or directory
#8 6.315 ----------------------------------------
#8 6.316 ERROR: Command errored out with exit status 1: /usr/local/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-7i04flgx/regex_ac4e0ddea9bf47599020bcf405efdfaa/setup.py'"'"'; file='"'"'/tmp/pip-install-7i04flgx/regex_ac4e0ddea9bf47599020bcf405efdfaa/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-2vi5dim1/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.10/regex Check the logs for full command output.

executor failed running [/bin/sh -c pip install --no-cache-dir -r requirements.txt]: exit code: 1

Annotate a bunch of text files

It seems I can annotate only one text file at a time. I have a bunch of text files. The tool creates a separate JSON file for each of them, and I have to add the labels again each time I annotate a file.
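
Until the tool supports multiple files, the separate exports can be combined afterwards. The sketch below assumes each export is a JSON object with a "classes" list and an "annotations" list (the shape used by the sample code later on this page); merge_exports and merge_files are hypothetical helper names, not part of the tool.

```python
import json
from pathlib import Path


def merge_exports(exports):
    """Combine several export dicts into one, keeping each class only once."""
    merged = {"classes": [], "annotations": []}
    for export in exports:
        for label in export.get("classes", []):
            if label not in merged["classes"]:
                merged["classes"].append(label)
        # Drop empty entries so they cannot break training later.
        merged["annotations"].extend(a for a in export.get("annotations", []) if a)
    return merged


def merge_files(folder, out_path):
    """Merge every *.json file in `folder` and write the result to `out_path`."""
    paths = sorted(Path(folder).glob("*.json"))
    exports = [json.loads(p.read_text(encoding="utf-8")) for p in paths]
    merged = merge_exports(exports)
    Path(out_path).write_text(json.dumps(merged, ensure_ascii=False), encoding="utf-8")
    return merged
```

Write the merged file to a different folder than the inputs, otherwise a second run would pick it up as one of the exports.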

Question about POS and DEP

Hi!

I'm new to the NLP world, so I have a question about POS tags and dependencies. When you train your blank spaCy model for NER, you omit pipeline components such as the tagger and the parser. My question is how I would add them to training_data.spacy before training the model. I'm asking you because your video is the most detailed one I have seen on this topic, and it's excellent! Can you help me with this issue?

Regards,

Text Encoding

The tool doesn't understand ANSI (Microsoft) encoding.

Could we change the encoding or find a way to detect it?

I am using the Windows MSI installer.

Thank you for your tool!
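
As a workaround outside the tool, a file can be re-encoded before it is imported. This is a minimal sketch that assumes the tool reads files as UTF-8 and that "ANSI" here means Windows-1252 (cp1252), which is the usual case on Western-locale Windows but is an assumption; to_utf8 is a hypothetical helper name.

```python
from pathlib import Path


def to_utf8(src, dst, source_encoding="cp1252"):
    """Read `src` in `source_encoding` and write the same text to `dst` as UTF-8."""
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding="utf-8")
    return text


# Example call (file names are placeholders):
# to_utf8("notes-ansi.txt", "notes-utf8.txt")
```

If the file uses a different legacy code page, pass its name as source_encoding; a wrong guess usually shows up as garbled accented characters in the output.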

Indexing issue at last sentence.

Saving or skipping on the last sentence still increments the current index. Repeatedly clicking save results in the annotations for the last sentence being repeatedly added to the annotations list since the index is still incrementing.
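
The behaviour described above can be avoided by refusing to advance once the last sentence has been handled. The following is only a Python model of that guard, not the tool's actual UI code; the class and attribute names are hypothetical.

```python
class AnnotationSession:
    """Toy model of a sentence-by-sentence annotation session."""

    def __init__(self, sentences):
        self.sentences = list(sentences)
        self.index = 0
        self.saved = {}  # sentence index -> entities, so re-saving cannot duplicate

    @property
    def finished(self):
        return self.index >= len(self.sentences)

    def save(self, entities):
        """Store entities for the current sentence; do nothing once finished."""
        if self.finished:
            return False
        self.saved[self.index] = entities
        self.index += 1
        return True

    def skip(self):
        """Move on without saving; do nothing once finished."""
        if self.finished:
            return False
        self.index += 1
        return True


session = AnnotationSession(["first line", "second line"])
session.save([[0, 5, "A"]])
session.save([])
session.save([])  # ignored: already past the last sentence
print(session.index, len(session.saved))
```

Keying saved annotations by sentence index is a second safeguard: even if the guard were missing, saving the same sentence again would overwrite its entry instead of appending a duplicate.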

Capability to annotate portion of a word

In my project, because the text is not properly cleansed, I often want to annotate a portion of a word. Can we introduce support for this in the tool?

For example, Label Studio supports this today.

Also, as a clarification: even if we introduce this support in the tool, does spaCy support partial words?

Network error while importing a file

The UI opens, but I get a network error while importing a file or performing an action.
Below is the displayed error:

localhost: port says
Error: Network error.

Kindly help with this.

Annotation tool for relation extraction

Thanks for this amazing tool, it works very well.
Would you know of a data annotation tool for relation extraction? I tried searching the web but couldn't find a tool that can do relation extraction. The UBAI tool is a paid service and doesn't work well for custom models. It would be great if you could recommend a free tool for relation extraction.

Unable to tag portions of a string

Looking to tag portions of a string.

AAABBBCCCDDDEEE.FFFFFF.GGGGGGGG

You cannot tag AAA, BBB, CCC, DDD, EEE, FFFFFF, or GGGGGGGG individually. The tool selects the entire "word" automatically. An override to allow fragmented tagging would be helpful.

Currently selected tag not loaded from local storage.

If there are tags stored in local storage and we refresh the page, the tags are loaded but none of them are selected. This allows us to select text for labelling without first selecting a tag. The result is an entity with the label 'Unlabelled'.

Issue about the code.

Steps to reproduce:

1. Git clone the repo and build the source code.
2. Run python serve.py
3. yarn serve
4. open browser then go to http://localhost:8080/

a. Issue one:
While annotating a file, could there be a way to go back and re-annotate previous sentences so that we can correct any errors? Or, after completing a file, could there be a button to go back to the home page? (For now I do it by reloading the page.)

For now, I don't see any option to go back to the previous annotation. There are only Reset, Save and Skip.

b. Issue two:
When I annotate a sentence and assign a tag to the last word, the last word disappears. I then found that if my selection includes the last word, the whole selected portion of the sentence disappears. I am not sure what is happening. The file I uploaded was a text file with sentences listed line by line; I'm not sure if that matters.

For example, I defined an entity named "A" and annotated the sentence "I am going to walmart". If I assign A to "going", it is fine.

But if I assign A to "walmart", the word "walmart" disappears.

Please let me know if you need any other details.

Thanks,
Guoyi

Sample Code Update

Thanks @tecoholic for this! It must be immensely helpful for a lot of people!

I have fixed some typos and also added an extra condition, because an empty item in the list was causing a weird error. Feel free to use it if you think it's better!

import json
import spacy

# Load the training data exported by NER Annotator
with open('your-annotations.json') as fp:
    training_data = json.load(fp)

# Prepare an empty model to train (spaCy v2 API)
nlp = spacy.blank('en')
nlp.vocab.vectors.name = 'demo'
ner = nlp.create_pipe('ner')
nlp.add_pipe(ner, last=True)

# Add the custom NER tags as entities into the model
for label in training_data["classes"]:
    ner.add_label(label)

# Train the model
optimizer = nlp.begin_training()

for text, annotations in training_data["annotations"]:
    if len(text) > 0:  # in case an empty sentence was saved while annotating
        nlp.update([text], [annotations], sgd=optimizer)
