Comments (12)

amn41 commented on May 18, 2024

The Spanish MITIE models are here. If you unzip them and find the feature extractor file, you should use that as your mitie_file. If you find that the tokenizer isn't working perfectly for Spanish, we can address that.

from rasa.

angelo337 commented on May 18, 2024

I just downloaded that model and put all that information in the config file; however, I am getting the error below.
Would you please point out how to fix it?
Thanks

creangel@creangel_hadoop:~/Downloads/mitie/rasa_nlu$ time python -m rasa_nlu.train -c config.json
Training to recognize 4 categories: 'saludo', 'restaurante_busqueda', 'afirmacion', 'despedida'
Train classifier
extracting text features
now do training
num training samples: 63
C: 200 f-score: 0.709677
C: 400 f-score: 0.709677
C: 300 f-score: 0.709677
C: 100 f-score: 0.709677
C: 0.01 f-score: 0.612903
C: 600 f-score: 0.709677
C: 1400 f-score: 0.709677
C: 3000 f-score: 0.709677
C: 5000 f-score: 0.709677
C: 2550 f-score: 0.709677
C: 1325 f-score: 0.709677
C: 712.5 f-score: 0.709677
C: 406.25 f-score: 0.709677
C: 253.125 f-score: 0.709677
C: 176.562 f-score: 0.709677
C: 138.281 f-score: 0.709677
C: 119.141 f-score: 0.709677
C: 109.57 f-score: 0.709677
C: 104.785 f-score: 0.709677
C: 102.393 f-score: 0.709677
C: 101.196 f-score: 0.709677
C: 100.598 f-score: 0.709677
C: 100.299 f-score: 0.709677
best C: 100.598
test on train:
20 0 0 0
0 8 0 0
0 0 21 0
0 0 0 14

overall accuracy: 1
Training time: 429 seconds.
df.number_of_classes(): 4

Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "build/bdist.linux-x86_64/egg/rasa_nlu/train.py", line 65, in <module>
File "build/bdist.linux-x86_64/egg/rasa_nlu/train.py", line 59, in do_train
File "build/bdist.linux-x86_64/egg/rasa_nlu/trainers/mitie_trainer.py", line 25, in train
File "build/bdist.linux-x86_64/egg/rasa_nlu/trainers/mitie_trainer.py", line 42, in train_entity_extractor
File "build/bdist.linux-x86_64/egg/rasa_nlu/trainers/mitie_trainer.py", line 31, in start_and_end
IndexError: list index out of range

amn41 commented on May 18, 2024

Looks like there's an error picking up one of your entities. I can't tell whether this is a bug or a problem with your data without seeing it.

Please try training intents only (e.g. removing any entities from your training data), and then add them back one by one until you trigger this error. Then please post here the training example which causes the error.

angelo337 commented on May 18, 2024

Hi there,
I just tried your suggestion and it works like a charm. I figured out that my mistake was that I started counting sentences from 1 instead of 0.
It is fixed now.
Thanks
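For anyone hitting the same IndexError: if the mistake is in the numeric positions attached to entities (one possible reading of the comment above), a quick check over the training data can find the offending example before training. This is only a minimal Python 3 sketch. The field names `text`, `entities`, `start`, `end` and `value` are assumed to match your training file, and `check_entity_offsets` is a hypothetical helper, not part of rasa_nlu.

```python
def check_entity_offsets(example):
    """Return a list of problems found in one training example.

    Assumes 0-based character offsets where text[start:end] == value.
    """
    problems = []
    text = example["text"]
    for ent in example.get("entities", []):
        start, end = ent["start"], ent["end"]
        if not (0 <= start < end <= len(text)):
            problems.append("offsets out of range: %r" % (ent,))
        elif text[start:end] != ent["value"]:
            problems.append(
                "text[%d:%d] is %r, expected %r"
                % (start, end, text[start:end], ent["value"])
            )
    return problems


# Hypothetical example: the same entity with 1-based and 0-based offsets.
one_based = {
    "text": "busco un restaurante mexicano",
    "entities": [{"start": 22, "end": 30, "value": "mexicano", "entity": "cocina"}],
}
zero_based = {
    "text": "busco un restaurante mexicano",
    "entities": [{"start": 21, "end": 29, "value": "mexicano", "entity": "cocina"}],
}

print(check_entity_offsets(one_based))   # one problem reported
print(check_entity_offsets(zero_based))  # []
```

Running this over every example in the file should point at the entries whose offsets were counted from 1.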

oziee commented on May 18, 2024

I have the same problem @angelo337 had: IndexError: list index out of range
I am using the expressions.json file from wit.ai

Is there a problem with training on wit data?
expressions.json.zip

amn41 commented on May 18, 2024

Thanks for sharing your training data! I'm able to reproduce this error. It's down to the fact that you have entities like 'perth' in the sentence "what is perths weather like next week". MITIE can only handle entities made up of whole tokens. I will handle this edge case in rasa, but it will still return "perths" rather than "perth" as your location, so for now you will have to resolve that entity yourself. It's on the roadmap to come up with a solution to that, though.
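To see why 'perth' is a problem here, it helps to compare the entity's character span with the token boundaries. The sketch below is Python 3 and uses plain whitespace splitting as a stand-in for MITIE's tokenizer (an assumption; the real tokenizer may split differently). `token_spans` and `aligns_with_tokens` are hypothetical helpers.

```python
import re


def token_spans(text):
    """Whitespace tokenization, returning (start, end) character spans."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def aligns_with_tokens(text, start, end):
    """True if [start, end) begins and ends exactly on token boundaries."""
    spans = token_spans(text)
    starts = {s for s, _ in spans}
    ends = {e for _, e in spans}
    return start in starts and end in ends


text = "what is perths weather like next week"
start = text.index("perth")
end = start + len("perth")

print(aligns_with_tokens(text, start, end))      # False: "perth" stops inside the token "perths"
print(aligns_with_tokens(text, start, end + 1))  # True: "perths" is a whole token
```

A check like this over the training data would flag every entity that ends in the middle of a token.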

amn41 commented on May 18, 2024

Although, thinking about it, we could explicitly insert a whitespace in these cases. I will create a new issue and make a proposal.
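One possible reading of the whitespace idea, sketched in Python 3: pad the entity span with spaces so that it becomes a token of its own. This is not the proposal itself, just an illustration; `pad_entity` is a hypothetical helper, and note that it leaves a stray "s" token behind.

```python
def pad_entity(text, start, end):
    """Insert spaces so that text[start:end] is surrounded by whitespace.

    Returns (new_text, new_start, new_end).
    """
    # Pad on the right first so the left-hand offsets stay valid.
    if end < len(text) and not text[end].isspace():
        text = text[:end] + " " + text[end:]
    if start > 0 and not text[start - 1].isspace():
        text = text[:start] + " " + text[start:]
        start, end = start + 1, end + 1
    return text, start, end


text = "what is perths weather like next week"
new_text, s, e = pad_entity(text, 8, 13)
print(new_text)       # what is perth s weather like next week
print(new_text[s:e])  # perth
```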

beeva-lisettegarcia commented on May 18, 2024

Hello

I would like to use rasa for Spanish texts.
I already downloaded the Spanish MITIE model and prepared the config file.
During training, I get the following error:

python -m rasa_nlu.train -c config.json
Training to recognize 8 categories: 'greet', 'restaurant_search', 'affirm', 'goodbye', 'saludo', 'busqueda_restaurante', 'afirmacion', 'despedida'
Train classifier
extracting text features
now do training
num training samples: 44
C: 200 f-score: 0.525
C: 400 f-score: 0.525
C: 300 f-score: 0.525
C: 100 f-score: 0.525
C: 0.01 f-score: 0.575
C: 50.005 f-score: 0.525
C: 25.0075 f-score: 0.525
C: 12.5088 f-score: 0.525
C: 6.25938 f-score: 0.525
C: 3.13469 f-score: 0.525
C: 1.57234 f-score: 0.525
C: 0.791172 f-score: 0.525
C: 0.400586 f-score: 0.525
best C: 0.01
test on train:
5 0 0 0 0 0 0 0
0 8 0 0 0 0 0 0
0 0 6 0 0 0 1 0
0 0 0 5 0 0 0 0
1 0 0 0 1 0 0 0
0 0 0 0 0 8 0 0
0 0 0 0 0 0 5 0
0 0 0 1 0 0 0 3

overall accuracy: 0.931818
Training time: 854 seconds.
df.number_of_classes(): 8

Traceback (most recent call last):
File "/home/lisettegarcia/miniconda3/envs/py2/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/home/lisettegarcia/miniconda3/envs/py2/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/home/lisettegarcia/miniconda3/envs/py2/lib/python2.7/site-packages/rasa_nlu/train.py", line 65, in <module>
do_train(config)
File "/home/lisettegarcia/miniconda3/envs/py2/lib/python2.7/site-packages/rasa_nlu/train.py", line 59, in do_train
trainer.train(training_data)
File "/home/lisettegarcia/miniconda3/envs/py2/lib/python2.7/site-packages/rasa_nlu/trainers/mitie_trainer.py", line 30, in train
self.entity_extractor = self.train_entity_extractor(data.entity_examples)
File "/home/lisettegarcia/miniconda3/envs/py2/lib/python2.7/site-packages/rasa_nlu/trainers/mitie_trainer.py", line 53, in train_entity_extractor
start, end = self.find_entity(ent, text)
File "/home/lisettegarcia/miniconda3/envs/py2/lib/python2.7/site-packages/rasa_nlu/trainers/mitie_trainer.py", line 35, in find_entity
tokens, offsets = tk.tokenize_with_offsets(text)
File "/home/lisettegarcia/miniconda3/envs/py2/lib/python2.7/site-packages/rasa_nlu/tokenizers/mitie_tokenizer.py", line 24, in tokenize_with_offsets
offset += m.start()
AttributeError: 'NoneType' object has no attribute 'start'

Tracing the error, I found that the problem is in

(mitie_tokenizer.py)
line 22 m = re.search(re.escape(tok), _text[offset:])

The search returns None when the text contains words with accents.

Any ideas?

Thanks
busq_restaurante_Data.json.zip
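For readers following along, here is a small Python 3 sketch of how a token search like the one on line 22 can return None for accented words. It assumes the cause is an encoding mismatch between the tokens and the text, which is a guess and not confirmed by the traceback. This `tokenize_with_offsets` is a simplified stand-in, not the actual rasa_nlu function.

```python
import re


def tokenize_with_offsets(text, tokens):
    """Return the start offset of each token in text, searching left to right.

    Raises ValueError with a readable message instead of failing on None.
    """
    offsets = []
    offset = 0
    for tok in tokens:
        m = re.search(re.escape(tok), text[offset:])
        if m is None:
            raise ValueError(
                "token %r not found in %r after offset %d" % (tok, text, offset)
            )
        offsets.append(offset + m.start())
        offset += m.end()
    return offsets


text = "quiero comida española"
tokens = text.split()
print(tokenize_with_offsets(text, tokens))  # [0, 7, 14]

# Simulate tokens that went through a wrong decode step (UTF-8 bytes read as Latin-1).
mojibake = [t.encode("utf-8").decode("latin-1") for t in tokens]
try:
    tokenize_with_offsets(text, mojibake)
except ValueError as err:
    print(err)  # the accented token no longer matches the original text
```

The ASCII-only tokens still match, which fits the observation that only words with accents trigger the failure.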

frankai commented on May 18, 2024

I have the same problem as @beeva-lisettegarcia when training with Spanish accents. The problem appears to be in the mitie_tokenizer.py script. Any ideas or clues on how to fix it? Thanks!

tmbo commented on May 18, 2024

@beeva-lisettegarcia @frankai I just pushed a change that should fix the encoding issue (unfortunately the test that should have ensured this functionality had a bug of its own 😓 ). It would be great if you could test it to see whether it solves your issue.

For the future: please avoid re-using closed issues. Don't hesitate to create new ones; just make sure first that the exact problem is not already covered by an existing issue.

cbonadio commented on May 18, 2024

I had the same issue as @beeva-lisettegarcia and @frankai. I have now pulled the changes and it is working.

Thanks

beeva-lisettegarcia commented on May 18, 2024

Thanks, now it is working :-)
