d5555 / neuralgym

53.0 2.0 10.0 318.01 MB

🚀GUI for training spaCy models

License: MIT License

Python 100.00%

spacy nlp-machine-learning neural-networks training-neural-net spacy-gui spacy-models

neuralgym's Introduction

NeuralGym v1.1

Python app for training spaCy models.
Check out TagEditor for creating training data.

Installation

Option 1: No installation required. Download the archive files NGym.7z.001 and NGym.7z.002 into the same folder and extract NGym.7z.001.
Then launch ng.exe.

Option 2: install from the command line (cmd)

git clone https://github.com/d5555/NeuralGym
pip install NeuralGym/.

To run the application, open a terminal (cmd) and type python -m ngym or ngym, or start the Python interpreter and import the package:

python
>>> import ngym

How to use

Create an output directory where the trained model will be saved.
Select train and dev data files in spaCy format (.spacy). You can use TagEditor to create your training dataset. For demonstration purposes, two dataset files are included, imdb_train.spacy (400 docs) and imdb_dev.spacy (100 docs), annotated with POS tags, dependencies, named entities (NER) and text categories.
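If you prefer to build .spacy files with a script instead of TagEditor, spaCy 3's DocBin class can write them. This is a minimal sketch, assuming NeuralGym reads the same binary DocBin files that spacy train uses (the extension suggests so, but the README does not say). The toy examples, the helper build_docbin and the file name my_train.spacy are hypothetical, and only entities and text categories are set here, not POS tags or dependencies.

```python
import spacy
from spacy.tokens import DocBin

# Blank English pipeline: only the tokenizer is needed to create Doc objects.
nlp = spacy.blank("en")

# Hypothetical toy data: (text, [(start_char, end_char, label), ...], cats)
TRAIN = [
    ("I loved Star Wars.", [(8, 17, "WORK_OF_ART")], {"POSITIVE": 1.0, "NEGATIVE": 0.0}),
    ("A dull film.", [], {"POSITIVE": 0.0, "NEGATIVE": 1.0}),
]


def build_docbin(examples):
    """Turn (text, entities, cats) triples into a DocBin ready to save."""
    db = DocBin()
    for text, ents, cats in examples:
        doc = nlp.make_doc(text)
        spans = []
        for start, end, label in ents:
            # char_span returns None if the offsets do not match token boundaries.
            span = doc.char_span(start, end, label=label)
            if span is None:
                raise ValueError(f"{text[start:end]!r} does not align with tokens")
            spans.append(span)
        doc.ents = spans
        doc.cats = cats
        db.add(doc)
    return db


build_docbin(TRAIN).to_disk("my_train.spacy")
```

Reading the file back with DocBin().from_disk(...).get_docs(nlp.vocab) is a quick way to confirm it is not empty before selecting it in the app.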
Select a source model (any model compatible with spaCy 3.0+) for training from source. You can either specify a model name, e.g. en_core_web_sm, or select a folder containing the model. If you specify only the model name without a full path, either place the model in the application's main folder (including the model's dist-info folder) or click Add sys path to add the Python folder where spaCy models are installed. Usually this is Python...\Lib\site-packages, for example "C:\Python39\Lib\site-packages".
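To find the site-packages folder of the Python installation that holds your spaCy models, the standard library can report it. The sketch below prints candidate folders and shows one way a folder can be made importable; the helper names are hypothetical, and that the Add sys path button does exactly this is an assumption. Independently of NeuralGym, spacy.load accepts either an installed package name such as en_core_web_sm or a path to a model folder.

```python
import site
import sys
import sysconfig


def site_packages_dirs():
    """Folders where pip usually installs packages such as en_core_web_sm."""
    dirs = [sysconfig.get_paths()["purelib"]]
    # site.getsitepackages() can be missing in some older virtualenvs.
    for d in getattr(site, "getsitepackages", lambda: [])():
        if d not in dirs:
            dirs.append(d)
    return dirs


def add_sys_path(folder):
    """Hypothetical equivalent of Add sys path: make `folder` importable."""
    if folder not in sys.path:
        sys.path.append(folder)


for d in site_packages_dirs():
    print(d)
```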
To train from source, check the corresponding From source boxes under Training options; uncheck them to start from a blank model.
Labels in the training data should match the labels in the source model; otherwise, start from a blank model.
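Before training from source, it can help to list the labels that the training data uses but the source model does not know. This sketch works on (text, {"entities": [...]}) pairs like the samples in the issues on this page; for .spacy files the labels could be collected from doc.ents instead. The helper names and the stand-in label set are hypothetical; with spaCy 3 the real NER labels of a loaded model are available as nlp.get_pipe("ner").labels.

```python
def entity_labels(examples):
    """Collect NER labels from (text, {"entities": [(start, end, label)]}) pairs."""
    return {
        label
        for _, annotations in examples
        for _, _, label in annotations.get("entities", [])
    }


def labels_missing_from_model(train_labels, model_labels):
    """Labels used in the training data that the source model does not know."""
    return sorted(set(train_labels) - set(model_labels))


examples = [
    ("Legion Bridge is busy.", {"entities": [(0, 13, "PLACE")]}),
    ("Apple hired Ann.", {"entities": [(0, 5, "ORG"), (12, 15, "PERSON")]}),
]
# Hypothetical stand-in for the source model's labels.
model_labels = ("ORG", "PERSON", "GPE")
print(labels_missing_from_model(entity_labels(examples), model_labels))  # ['PLACE']
```

A non-empty result is a sign to start from a blank model, as recommended above.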
Check Use averages to save the model with averaged parameters once training is done.
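Parameter averaging keeps an average of the weights alongside the weights being trained and saves the averaged copy, which tends to be less noisy than the result of the last update. In spaCy 3 configs this is the use_averages setting of [training.optimizer]; that NeuralGym's checkbox maps to it is an assumption. The sketch below shows the idea with a plain cumulative mean over weight snapshots; Thinc's optimizer uses its own weighting, so this illustrates the concept rather than the library's formula.

```python
def averaged_parameters(snapshots):
    """Cumulative mean of a sequence of weight vectors (lists of floats)."""
    avg = None
    for step, weights in enumerate(snapshots, start=1):
        if avg is None:
            avg = list(weights)
        else:
            # Incremental mean: avg += (w - avg) / step
            avg = [a + (w - a) / step for a, w in zip(avg, weights)]
    return avg


# Weights after three hypothetical training steps.
print(averaged_parameters([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))  # [3.0, 4.0]
```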
Press Start to begin training. You can interrupt the training process at any time by clicking Stop.
After training completes, the output directory will contain two folders, 'Best model' and 'Last model'.
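'Best model' is presumably the checkpoint that scored highest on the dev data and 'Last model' the state at the end of training, analogous to the model-best and model-last folders written by spacy train; this is an assumption. Either folder should load with spacy.load, which accepts a path. The hypothetical helper below prefers the best model and falls back to the last one.

```python
from pathlib import Path


def trained_model_dir(output_dir):
    """Return the 'Best model' folder if present, else 'Last model'."""
    for name in ("Best model", "Last model"):
        candidate = Path(output_dir) / name
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError(f"no trained model found in {output_dir}")


# Usage (requires spaCy and a finished run; the path is hypothetical):
# import spacy
# nlp = spacy.load(trained_model_dir("C:/my_output"))
# doc = nlp("Some text to analyse.")
```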
The Reset button restores the default settings in case of an error; alternatively, delete 'config.cfg' in the main folder.

*If you want to contribute to the project and make it better, your help is very welcome.
[email protected]

neuralgym's People

todun shalevy1 doriclaudino brunomrtz fajarlabs copperdong biobot500 sandy1811 eolas-bith phymucs

neuralgym's Issues

Can't access ng_utils

Using the second option, I am unable to run the app. It is giving error:
ng_utils not found

Error in the training loop occured list indices must be integers or slices, not NoneType

I used TagEditor on a text and got a plain .txt file with the entity structure for import into NeuralGym. After importing it and starting NG, I got this error:

Error in the training loop occured list indices must be integers or slices, not NoneType

After inspecting and re-importing all sentences one by one, I found the particular sentence that generates this error:

("The study so far has focused on KITO between the Beltway and OYUI in Gaithersburg and on the Beltway between the George Washington Parkway in Virginia and Route 5 in Prince George's, including the notoriously traffic-choked Legion Bridge.", { 'words': ['The', 'study', 'so', 'far', 'has', 'focused', 'on', 'KITO', 'between', 'the', 'Beltway', 'and', 'OYUI', 'in', 'Gaithersburg', 'and', 'on', 'the', 'Beltway', 'between', 'the', 'George', 'Washington', 'Parkway', 'in', 'Virginia', 'and Route', '5', 'in', 'Prince', 'George', "'s", ',', 'including', 'the', 'notoriously', 'traffic', '-', 'choked', 'Legion', 'Bridge', '.'], 'entities': [(32, 37, 'PLACE'), (50, 67, 'PLACE'), (115, 140, 'PLACE'), (153, 164, 'PLACE'), (226, 239, 'PLACE')] }),
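A misaligned span like this can be caught before training. The sketch below (hypothetical helpers token_offsets and misaligned_entities, not part of NeuralGym) rebuilds each token's character offsets from the 'words' list and flags every entity whose start or end does not fall on a token boundary. In the sentence above, text[32:37] is "KITO " with a trailing space, so the first entity would be flagged. Whether this is exactly what triggers the NoneType error inside NeuralGym is an assumption; spaCy 3 also offers spacy.training.offsets_to_biluo_tags, which marks misaligned tokens with '-'.

```python
def token_offsets(text, words):
    """Return (start, end) character offsets for each word, matched in order."""
    offsets = []
    pos = 0
    for word in words:
        # Skip the whitespace between tokens.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if not text.startswith(word, pos):
            raise ValueError(f"word {word!r} does not match text at offset {pos}")
        offsets.append((pos, pos + len(word)))
        pos += len(word)
    return offsets


def misaligned_entities(text, words, entities):
    """Return entities whose start or end does not coincide with a token boundary."""
    offsets = token_offsets(text, words)
    starts = {start for start, _ in offsets}
    ends = {end for _, end in offsets}
    return [
        (start, end, label, text[start:end])
        for start, end, label in entities
        if start not in starts or end not in ends
    ]


text = "The study so far has focused on KITO between the Beltway"
words = ["The", "study", "so", "far", "has", "focused", "on", "KITO",
         "between", "the", "Beltway"]
print(misaligned_entities(text, words, [(32, 37, "PLACE"), (32, 36, "PLACE")]))
# prints [(32, 37, 'PLACE', 'KITO ')]
```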

cross platform support

It is not clear whether it only supports Windows.
Would it be possible to port the project to macOS/Linux?

Symantec detected a virus

On running ng.exe, my antivirus flagged it as a virus:

Security risk detected: WS.Reputation.1

Hello. I have a question on how to make training data (NER)

I was looking at the train_data.txt file to train the model.

("""It's a visually stunning movie, finding moments both macro and micro to highlight the beautiful imagination that "Star Wars" can evoke.""", {
'words': ['It', "'s", 'a', 'visually', 'stunning', 'movie', ',', 'finding', 'moments', 'both', 'macro', 'and', 'micro', 'to', 'highlight', 'the', 'beautiful', 'imagination', 'that', '"', 'Star', 'Wars', '"', 'can', 'evoke', '.'],
'entities': [(25, 30, 'PRODUCT'), (114, 123, 'WORK_OF_ART')],
'heads': [1, 1, 5, 4, 5, 1, 1, 1, 7, 10, 8, 10, 10, 14, 7, 17, 17, 14, 24, 24, 21, 24, 23, 24, 17, 1],
'deps': ['nsubj', 'ROOT', 'det', 'advmod', 'amod', 'attr', 'punct', 'advcl', 'dobj', 'preconj', 'amod', 'cc', 'conj', 'aux', 'advcl', 'det', 'amod', 'dobj', 'mark', 'punct', 'compound', 'nsubj', 'punct', 'aux', 'relcl', 'punct'],
'tags': ['PRP', 'VBZ', 'DT', 'RB', 'JJ', 'NN', ',', 'VBG', 'NNS', 'CC', 'JJ', 'CC', 'JJ', 'TO', 'VB', 'DT', 'JJ', 'NN', 'IN', '``', 'NNP', 'NNS', "''", 'MD', 'VB', '.'],
'cats': {'POSITIVE': True, 'NEGATIVE': False}
})

What do the numbers in 'entities' mean?
Is there documentation explaining what 'heads', 'deps', 'tags' and 'cats' are?

Thanks for reading.
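In this format, each entity is a (start_char, end_char, label) triple of character offsets into the raw text, with the end offset exclusive; 'heads' gives, for every token in 'words', the index of its syntactic head (the ROOT token points to itself, as index 1 does here); 'deps' holds the dependency label of each token; 'tags' are fine-grained part-of-speech tags (Penn Treebank style in this sample); and 'cats' maps text-category labels to True/False. The short script below uses only the sample data to print both views so the offsets can be checked by eye; it illustrates the format and is not official NeuralGym documentation.

```python
# The sample annotation from train_data.txt (only the fields used here).
text = """It's a visually stunning movie, finding moments both macro and micro to highlight the beautiful imagination that "Star Wars" can evoke."""
annotations = {
    "words": ["It", "'s", "a", "visually", "stunning", "movie", ",", "finding",
              "moments", "both", "macro", "and", "micro", "to", "highlight",
              "the", "beautiful", "imagination", "that", '"', "Star", "Wars",
              '"', "can", "evoke", "."],
    "entities": [(25, 30, "PRODUCT"), (114, 123, "WORK_OF_ART")],
    "heads": [1, 1, 5, 4, 5, 1, 1, 1, 7, 10, 8, 10, 10, 14, 7, 17, 17, 14,
              24, 24, 21, 24, 23, 24, 17, 1],
    "deps": ["nsubj", "ROOT", "det", "advmod", "amod", "attr", "punct", "advcl",
             "dobj", "preconj", "amod", "cc", "conj", "aux", "advcl", "det",
             "amod", "dobj", "mark", "punct", "compound", "nsubj", "punct",
             "aux", "relcl", "punct"],
}

# entities: (start_char, end_char, label); end is exclusive, like a Python slice.
for start, end, label in annotations["entities"]:
    print(label, repr(text[start:end]))
# PRODUCT 'movie'
# WORK_OF_ART 'Star Wars'

# heads[i] is the index (in "words") of the token that token i attaches to;
# the ROOT token points to itself. deps[i] is the label of that arc.
words, heads, deps = annotations["words"], annotations["heads"], annotations["deps"]
for word, head, dep in zip(words, heads, deps):
    print(f"{word!r:14} --{dep}--> {words[head]!r}")
```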

I can't use NeuralGYM

How can I use this tool inside an Anaconda environment? When I run it directly on Windows, I receive the following error:

spaCy2.2.4
Variables initialization...
Output_dir:"D:/Steven/TRABAJO_CK/AMI/OCR_Codigos/opencv-text-recognition/pruebas/"
Creating blank 'en' model
Loading training data...
TRAIN_DATA loaded from path: "D:\Steven\TRABAJO_CK\AMI\OCR_Codigos\opencv-text-recognition\pruebas\TOOL.txt"
Number of examples: 2
n_iter = 100
learn_rate = 0.001
drop = 0.2
batch_start = 4.0
batch_stop = 32.0
batch_compound = 1.001
Selected pipeline components: ['tagger', 'parser', 'ner', 'textcat']
Initializing variables error:
'str' object has no attribute 'get'
spaCy2.2.4
Variables initialization...
Output_dir:"D:/Steven/TRABAJO_CK/AMI/OCR_Codigos/opencv-text-recognition/pruebas/"
Creating blank 'en' model
Loading training data...
TRAIN_DATA loaded from path: "D:\Steven\TRABAJO_CK\AMI\OCR_Codigos\opencv-text-recognition\pruebas\TOOL.txt"
Number of examples: 2
n_iter = 100
learn_rate = 0.001
drop = 0.2
batch_start = 4.0
batch_stop = 32.0
batch_compound = 1.001
Selected pipeline components: ['ner']
Initializing variables error:
'str' object has no attribute 'get'

Any help would be appreciated.
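The message 'str' object has no attribute 'get' means that code called .get() on a string where it expected a dictionary. One plausible cause (an assumption, not confirmed by the log) is a training file in which the annotations part of an example is a string rather than a dict. The hypothetical helper below parses a text file of (text, annotations) tuples, in the layout of the samples on this page, with ast.literal_eval and reports examples whose shape is wrong.

```python
import ast


def check_train_data(raw):
    """Parse (text, annotations) tuples and report malformed examples.

    Assumes `raw` holds Python-literal tuples separated by commas, as in the
    samples on this page; that layout is an assumption, not NeuralGym's spec.
    """
    examples = ast.literal_eval("[" + raw + "]")
    problems = []
    for i, example in enumerate(examples):
        if not (isinstance(example, tuple) and len(example) == 2):
            problems.append(f"example {i}: expected a (text, annotations) pair")
            continue
        text, annotations = example
        if not isinstance(text, str):
            problems.append(f"example {i}: text is {type(text).__name__}, not str")
        if not isinstance(annotations, dict):
            problems.append(
                f"example {i}: annotations is {type(annotations).__name__}, not dict"
            )
    return examples, problems


raw = '''("Good film", {"entities": []}),
("Bad film", "entities"),'''
_, problems = check_train_data(raw)
print(problems)  # ['example 1: annotations is str, not dict']
```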

[E923] It looks like there is no proper sample data to initialize the Model of component 'tok2vec'

Hey there, I unfortunately always get the following error:

Initializing pipeline*
[2023-11-02 22:10:16,436] [INFO] Set up nlp object from config
[2023-11-02 22:10:16,445] [INFO] Pipeline: ['tok2vec', 'ner']
[2023-11-02 22:10:16,453] [INFO] Added vocab lookups: lexeme_norm
[2023-11-02 22:10:16,454] [INFO] Created vocabulary
[2023-11-02 22:10:16,454] [INFO] Finished initializing nlp object
Error in the training loop occured
[E923] It looks like there is no proper sample data to initialize the Model of component 'tok2vec'. To check your input data paths and annotation, run: python -m spacy debug data config.cfg

All the paths are correct; I have checked them several times. It works when I do it "manually", but I can't get it to work in the program, which is a shame because this seems much more comfortable :D