neuralgym's Introduction

NeuralGym v1.1

A Python app for training spaCy models.
Check out TagEditor for creating training data.

Installation

Option 1: No installation required. Download the archive files NGym.7z.001 and NGym.7z.002 into the same folder, then extract NGym.7z.001.
Launch ng.exe.

Option 2: from cmd

git clone https://github.com/d5555/NeuralGym
pip install ./NeuralGym

To run the application, open a terminal (cmd) and type python -m ngym or ngym, or start it from the Python interpreter:

python
>>> import ngym

How to use

  1. Create an output directory where the trained model will be saved.
  2. Select the train and dev data files in spaCy format. You can use TagEditor to create your training dataset. For demonstration purposes, two dataset files are included: imdb_train.spacy (400 docs) and imdb_dev.spacy (100 docs), annotated with POS tags, dependencies, NER and text categories.
  3. Select a source model (any spaCy model compatible with spaCy 3.0+) to train from source. You can either specify a source model name, e.g. en_core_web_sm, or select a folder containing the model. If you specify the model name without a full path, either place the model in the application's main folder (including the model's dist-info folder) or add the path to the Python folder where spaCy models are installed by pressing the Add sys path button. Usually this is Python...\Lib\site-packages, for example "C:\Python39\Lib\site-packages".
    To train from source, check the corresponding From source boxes under Training options; uncheck them to start from a blank model.
    Labels in the training data should match the labels in the original model; otherwise, start from a blank model.
  4. Check Use averages to save the model with parameter averaging once training is done.
  5. Press Start to begin training. You can interrupt the training process at any time by clicking Stop.
  6. After training completes, the output directory will contain two folders, 'Best model' and 'Last model'.
  7. The Reset button restores the default settings in case of an error. Alternatively, delete 'config.cfg' in the main folder.
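Once training has finished, the saved pipeline can be loaded back into Python. The following is a minimal sketch, not part of NeuralGym itself: the helper names trained_model_dir and load_trained are hypothetical, and it assumes the folders on disk are named exactly 'Best model' and 'Last model' as in step 6 and that spaCy 3.0+ is installed.

```python
from pathlib import Path

# Folder names taken from step 6; their exact spelling on disk is an assumption.
MODEL_FOLDERS = ("Best model", "Last model")


def trained_model_dir(output_dir, which="Best model"):
    """Return the path of one of the two model folders in the output directory."""
    if which not in MODEL_FOLDERS:
        raise ValueError(f"which must be one of {MODEL_FOLDERS}, got {which!r}")
    return Path(output_dir) / which


def load_trained(output_dir, which="Best model"):
    """Load a trained pipeline with spaCy (hypothetical helper)."""
    import spacy  # imported lazily so the path helper works without spaCy

    return spacy.load(trained_model_dir(output_dir, which))


# Example: show which folder would be loaded for a hypothetical output directory.
print(trained_model_dir("my_output"))
```

Calling load_trained("my_output") then returns a spaCy Language object, which can be applied to text in the usual way, for example doc = nlp("some text").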


If you want to contribute to the project and make it better, your help is very welcome.
[email protected]

neuralgym's People

Contributors

d5555

neuralgym's Issues

Can't access ng_utils

Using the second option, I am unable to run the app. It gives the error:
ng_utils not found

*Error in the training loop occured* list indices must be integers or slices, not NoneType

I used TagEditor on a text and got a normal txt file with the entity structure, ready for import into NeuralGym. After importing it and starting NG, I got this error:

Error in the training loop occured list indices must be integers or slices, not NoneType

After inspecting and re-importing the sentences one by one, I found the particular sentence that triggers this error:

("The study so far has focused on KITO between the Beltway and OYUI in Gaithersburg and on the Beltway between the George Washington Parkway in Virginia and Route 5 in Prince George's, including the notoriously traffic-choked Legion Bridge.", { 'words': ['The', 'study', 'so', 'far', 'has', 'focused', 'on', 'KITO', 'between', 'the', 'Beltway', 'and', 'OYUI', 'in', 'Gaithersburg', 'and', 'on', 'the', 'Beltway', 'between', 'the', 'George', 'Washington', 'Parkway', 'in', 'Virginia', 'and Route', '5', 'in', 'Prince', 'George', "'s", ',', 'including', 'the', 'notoriously', 'traffic', '-', 'choked', 'Legion', 'Bridge', '.'], 'entities': [(32, 37, 'PLACE'), (50, 67, 'PLACE'), (115, 140, 'PLACE'), (153, 164, 'PLACE'), (226, 239, 'PLACE')] }),
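Records like the one above can be screened before import. Note that its 'words' list contains the token 'and Route', which includes a space. The sketch below is a hedged, standard-library-only check and is not NeuralGym's own validation: the function names token_offsets and check_record are hypothetical, and whether the issues it flags are the actual cause of the error above is not confirmed. It reports tokens that contain whitespace and entity spans that do not start and end on token boundaries.

```python
def token_offsets(text, words):
    """Locate each word in text, scanning left to right.

    Returns a list of (start, end) character offsets, or None for a
    word that cannot be found after the previous one.
    """
    offsets = []
    pos = 0
    for word in words:
        start = text.find(word, pos)
        if start == -1:
            offsets.append(None)
            continue
        end = start + len(word)
        offsets.append((start, end))
        pos = end
    return offsets


def check_record(text, annotations):
    """Return a list of human-readable problems found in one record."""
    problems = []
    words = annotations.get("words", [])
    offsets = token_offsets(text, words)
    for word, offset in zip(words, offsets):
        if any(ch.isspace() for ch in word):
            problems.append(f"token {word!r} contains whitespace")
        if offset is None:
            problems.append(f"token {word!r} not found in text")
    starts = {o[0] for o in offsets if o is not None}
    ends = {o[1] for o in offsets if o is not None}
    for start, end, label in annotations.get("entities", []):
        if start not in starts or end not in ends:
            problems.append(
                f"entity ({start}, {end}, {label!r}) -> {text[start:end]!r} "
                "is not aligned with token boundaries"
            )
    return problems


# Small made-up record that reproduces the 'and Route' style of token.
sample_text = "and Route 5"
sample_ann = {"words": ["and Route", "5"], "entities": [(4, 10, "PLACE")]}
for problem in check_record(sample_text, sample_ann):
    print(problem)
```

An empty list means the record passed both checks; it does not guarantee that the record will train without errors.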

cross platform support

It is not clear whether the project only supports Windows.
Would it be possible to port it to macOS/Linux?

Symantec detected a virus

When I ran ng.exe, my antivirus flagged it as a virus:

Security risk detected: WS.Reputation.1

Hello, I have a question about how to create training data (NER)

I was looking at the train_data.txt file to train the model.

("""It's a visually stunning movie, finding moments both macro and micro to highlight the beautiful imagination that "Star Wars" can evoke.""", {
'words': ['It', "'s", 'a', 'visually', 'stunning', 'movie', ',', 'finding', 'moments', 'both', 'macro', 'and', 'micro', 'to', 'highlight', 'the', 'beautiful', 'imagination', 'that', '"', 'Star', 'Wars', '"', 'can', 'evoke', '.'],
'entities': [(25, 30, 'PRODUCT'), (114, 123, 'WORK_OF_ART')],
'heads': [1, 1, 5, 4, 5, 1, 1, 1, 7, 10, 8, 10, 10, 14, 7, 17, 17, 14, 24, 24, 21, 24, 23, 24, 17, 1],
'deps': ['nsubj', 'ROOT', 'det', 'advmod', 'amod', 'attr', 'punct', 'advcl', 'dobj', 'preconj', 'amod', 'cc', 'conj', 'aux', 'advcl', 'det', 'amod', 'dobj', 'mark', 'punct', 'compound', 'nsubj', 'punct', 'aux', 'relcl', 'punct'],
'tags': ['PRP', 'VBZ', 'DT', 'RB', 'JJ', 'NN', ',', 'VBG', 'NNS', 'CC', 'JJ', 'CC', 'JJ', 'TO', 'VB', 'DT', 'JJ', 'NN', 'IN', '``', 'NNP', 'NNS', "''", 'MD', 'VB', '.'],
'cats': {'POSITIVE': True, 'NEGATIVE': False}
})

  1. What do the numbers in 'entities' mean?

  2. Is there documentation explaining what 'heads', 'deps', 'tags' and 'cats' are?

 Thanks for reading.
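The sample record above can be read directly in Python. The sketch below uses only the values from that record (shortened to its first six tokens) and assumes it follows spaCy's v2-style training annotations: each entity is (start_char, end_char, label) with the end offset exclusive, 'heads' holds the index of each token's syntactic head, 'deps' the dependency label, 'tags' the fine-grained part-of-speech tag, and 'cats' the text classification labels for the whole text.

```python
text = """It's a visually stunning movie, finding moments both macro and micro to highlight the beautiful imagination that "Star Wars" can evoke."""

# Entities are character offsets into the text: (start, end, label),
# where text[start:end] is the annotated span.
entities = [(25, 30, 'PRODUCT'), (114, 123, 'WORK_OF_ART')]
for start, end, label in entities:
    print(label, repr(text[start:end]))

# First six tokens of the record, with their per-token annotations.
words = ['It', "'s", 'a', 'visually', 'stunning', 'movie']
heads = [1, 1, 5, 4, 5, 1]
deps = ['nsubj', 'ROOT', 'det', 'advmod', 'amod', 'attr']
tags = ['PRP', 'VBZ', 'DT', 'RB', 'JJ', 'NN']

# heads[i] is the index of the token that words[i] attaches to;
# the ROOT token points at itself.
for word, head, dep, tag in zip(words, heads, deps, tags):
    print(f"{word!r} ({tag}) --{dep}--> {words[head]!r}")
```

Here text[25:30] is 'movie' and text[114:123] is 'Star Wars', so the two numbers in each entity are the start and end character positions of the labelled span.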

I can't use NeuralGYM

How can I use this tool inside an Anaconda environment? When I run it directly in Windows, I get the following error:

spaCy2.2.4
Variables initialization...
Output_dir:"D:/Steven/TRABAJO_CK/AMI/OCR_Codigos/opencv-text-recognition/pruebas/"
Creating blank 'en' model
Loading training data...
TRAIN_DATA loaded from path: "D:\Steven\TRABAJO_CK\AMI\OCR_Codigos\opencv-text-recognition\pruebas\TOOL.txt"
Number of examples: 2
n_iter = 100
learn_rate = 0.001
drop = 0.2
batch_start = 4.0
batch_stop = 32.0
batch_compound = 1.001
Selected pipeline components: ['tagger', 'parser', 'ner', 'textcat']
Initializing variables error:
'str' object has no attribute 'get'
spaCy2.2.4
Variables initialization...
Output_dir:"D:/Steven/TRABAJO_CK/AMI/OCR_Codigos/opencv-text-recognition/pruebas/"
Creating blank 'en' model
Loading training data...
TRAIN_DATA loaded from path: "D:\Steven\TRABAJO_CK\AMI\OCR_Codigos\opencv-text-recognition\pruebas\TOOL.txt"
Number of examples: 2
n_iter = 100
learn_rate = 0.001
drop = 0.2
batch_start = 4.0
batch_stop = 32.0
batch_compound = 1.001
Selected pipeline components: ['ner']
Initializing variables error:
'str' object has no attribute 'get'

Any help would be appreciated.
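In Python, the message 'str' object has no attribute 'get' appears when code expecting a dict receives a string. One possible cause, which the log above does not confirm, is a training record whose annotations part is not a dict. The sketch below is a hedged check under that assumption: it assumes the training file holds a Python literal list of (text, annotations) tuples like the sample records shown elsewhere on this page, and the function name check_shape is hypothetical.

```python
import ast


def check_shape(records):
    """Report records that are not (text: str, annotations: dict) pairs."""
    problems = []
    for i, record in enumerate(records):
        if not (isinstance(record, (tuple, list)) and len(record) == 2):
            problems.append(f"record {i}: expected a (text, annotations) pair")
            continue
        text, annotations = record
        if not isinstance(text, str):
            problems.append(f"record {i}: text is {type(text).__name__}, not str")
        if not isinstance(annotations, dict):
            problems.append(
                f"record {i}: annotations is {type(annotations).__name__}, not dict"
            )
    return problems


# Made-up file content: the second record has a string where a dict is expected.
sample_source = (
    '[("Open the tool", {"entities": [(9, 13, "TOOL")]}), '
    '("Close the tool", "entities")]'
)
records = ast.literal_eval(sample_source)
for problem in check_shape(records):
    print(problem)
```

To check a real file, replace sample_source with the file's contents read as text; ast.literal_eval raises an error if the file is not a valid Python literal, which is itself a useful signal.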

[E923] It looks like there is no proper sample data to initialize the Model of component 'tok2vec'

Hey there, I unfortunately always get the following error:

Initializing pipeline*
[2023-11-02 22:10:16,436] [INFO] Set up nlp object from config
[2023-11-02 22:10:16,445] [INFO] Pipeline: ['tok2vec', 'ner']
[2023-11-02 22:10:16,453] [INFO] Added vocab lookups: lexeme_norm
[2023-11-02 22:10:16,454] [INFO] Created vocabulary
[2023-11-02 22:10:16,454] [INFO] Finished initializing nlp object
Error in the training loop occured
[E923] It looks like there is no proper sample data to initialize the Model of component 'tok2vec'. To check your input data paths and annotation, run: python -m spacy debug data config.cfg

All the paths are correct and have been checked several times. Training works when I do it "manually", but I can't get it to work in the program, which is a shame because this seems much more convenient :D
