Hi there, is it possible to use RASA in Spanish, with the MITIE model in Spanish?

spanish usage about rasa HOT 12 CLOSED

rasahq commented on May 18, 2024

spanish usage

from rasa.

Comments (12)

amn41 commented on May 18, 2024

The Spanish MITIE models are available from the MITIE releases on GitHub (mit-nlp/MITIE). If you unzip them and find the feature extractor file, you should use that as your mitie_file. If you find that the tokenizer isn't working perfectly for Spanish, we can address that.
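
As a rough illustration of the setup described above, the sketch below builds a config that points mitie_file at the extracted feature extractor. Only mitie_file is named in this thread. The other keys ("backend", "path", "data") are assumptions based on what legacy rasa_nlu sample configs looked like, and every path, including the file name total_word_feature_extractor.dat, is hypothetical.

```python
import json

# Hypothetical paths: adjust them to wherever you unzipped the Spanish models.
config = {
    "backend": "mitie",  # assumed key name for selecting the MITIE backend
    "mitie_file": "./MITIE-models/spanish/total_word_feature_extractor.dat",
    "path": "./models",  # assumed: directory where trained models are stored
    "data": "./data/training_es.json",  # assumed: your Spanish training data
}


def render_config(cfg):
    """Serialize the config dict to the JSON text that goes in config.json."""
    return json.dumps(cfg, indent=2, sort_keys=True)


config_text = render_config(config)
```

Saving config_text as config.json would give a file that can be passed to the trainer with the -c flag, as in the commands shown in this thread.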

angelo337 commented on May 18, 2024

I just downloaded that model and put all that information in the config file, but I am getting the error below.
Could you please point out how to fix it?
Thanks

creangel@creangel_hadoop:~/Downloads/mitie/rasa_nlu$ time python -m rasa_nlu.train -c config.json
Training to recognize 4 categories: 'saludo', 'restaurante_busqueda', 'afirmacion', 'despedida'
Train classifier
extracting text features
now do training
num training samples: 63
C: 200 f-score: 0.709677
C: 400 f-score: 0.709677
C: 300 f-score: 0.709677
C: 100 f-score: 0.709677
C: 0.01 f-score: 0.612903
C: 600 f-score: 0.709677
C: 1400 f-score: 0.709677
C: 3000 f-score: 0.709677
C: 5000 f-score: 0.709677
C: 2550 f-score: 0.709677
C: 1325 f-score: 0.709677
C: 712.5 f-score: 0.709677
C: 406.25 f-score: 0.709677
C: 253.125 f-score: 0.709677
C: 176.562 f-score: 0.709677
C: 138.281 f-score: 0.709677
C: 119.141 f-score: 0.709677
C: 109.57 f-score: 0.709677
C: 104.785 f-score: 0.709677
C: 102.393 f-score: 0.709677
C: 101.196 f-score: 0.709677
C: 100.598 f-score: 0.709677
C: 100.299 f-score: 0.709677
best C: 100.598
test on train:
20 0 0 0
0 8 0 0
0 0 21 0
0 0 0 14

overall accuracy: 1
Training time: 429 seconds.
df.number_of_classes(): 4

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "build/bdist.linux-x86_64/egg/rasa_nlu/train.py", line 65, in <module>
  File "build/bdist.linux-x86_64/egg/rasa_nlu/train.py", line 59, in do_train
  File "build/bdist.linux-x86_64/egg/rasa_nlu/trainers/mitie_trainer.py", line 25, in train
  File "build/bdist.linux-x86_64/egg/rasa_nlu/trainers/mitie_trainer.py", line 42, in train_entity_extractor
  File "build/bdist.linux-x86_64/egg/rasa_nlu/trainers/mitie_trainer.py", line 31, in start_and_end
IndexError: list index out of range

amn41 commented on May 18, 2024

It looks like there's an error picking up one of your entities. I can't tell whether this is a bug or a problem with your data without seeing it.

Please try training intents only (i.e. removing all entities from your training data), and then add the entities back one by one until you trigger this error. Then please post the training example that causes the error here.

angelo337 commented on May 18, 2024

Hi there,
I just tried your suggestion and it works like a charm. I figured out that my mistake was that I started counting in the sentences from 1 instead of 0.
It is fixed now.
Thanks
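
The comment does not say exactly which numbers were off by one. Assuming they were the entity start and end positions in the training data, a minimal sketch of computing them with 0-based indexing looks like the following. The example sentence, the entity name "cocina", and the helper entity_span are hypothetical; the start/end/value/entity layout is assumed from the legacy rasa_nlu training format.

```python
def entity_span(text, value):
    """Return 0-based (start, end) offsets of value in text, end exclusive."""
    start = text.find(value)
    if start == -1:
        raise ValueError("entity %r does not occur in %r" % (value, text))
    return start, start + len(value)


text = "busco un restaurante mexicano"
start, end = entity_span(text, "mexicano")

# Hypothetical training example in the assumed legacy layout.
example = {
    "text": text,
    "intent": "restaurante_busqueda",
    "entities": [
        {"start": start, "end": end, "value": "mexicano", "entity": "cocina"}
    ],
}

# Sanity check: slicing with the offsets must give back the entity value.
span_is_consistent = text[start:end] == "mexicano"
```

Running the same slice check over every entity in a training file is a quick way to spot offsets that were counted from 1.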

oziee commented on May 18, 2024

I have the same problem @angelo337 had: IndexError: list index out of range.
I am using the expressions.json file from wit.ai.

Is there a problem with training on wit data?
expressions.json.zip

amn41 commented on May 18, 2024

Thanks for sharing your training data! I'm able to reproduce this error. It comes down to the fact that you have entities like 'perth' in the sentence "what is perths weather like next week". MITIE can only handle entities made up of whole tokens. I will handle this edge case in rasa, but it will still return "perths" rather than "perth" as your location, so for now you will have to resolve that entity yourself. A solution to that is on the roadmap, though.
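
To find training examples affected by this, one option is to check whether each entity span starts and ends on a token boundary. The sketch below uses a plain whitespace tokenizer as a stand-in for MITIE's tokenizer, which is an assumption, and the helper names are hypothetical.

```python
import re


def token_spans(text):
    """Return (start, end) character spans of whitespace-separated tokens."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def aligns_with_tokens(text, start, end):
    """True if the span starts at a token start and ends at a token end."""
    spans = token_spans(text)
    starts = {s for s, _ in spans}
    ends = {e for _, e in spans}
    return start in starts and end in ends


text = "what is perths weather like next week"
start = text.index("perth")
end = start + len("perth")

# 'perth' stops one character before the token 'perths' ends.
partial_token_ok = aligns_with_tokens(text, start, end)
# Extending the span to cover the whole token 'perths' makes it align.
whole_token_ok = aligns_with_tokens(text, start, end + 1)
```

Examples where the check fails are the ones likely to trigger the IndexError until the edge case is handled.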

amn41 commented on May 18, 2024

Although, thinking about it, we could explicitly insert whitespace in these cases. I will create a new issue and make a proposal.

beeva-lisettegarcia commented on May 18, 2024

Hello,

I would like to use rasa for Spanish texts.
I have already downloaded the Spanish MITIE model and prepared the config file.
During training, I get the following error:

python -m rasa_nlu.train -c config.json
Training to recognize 8 categories: 'greet', 'restaurant_search', 'affirm', 'goodbye', 'saludo', 'busqueda_restaurante', 'afirmacion', 'despedida'
Train classifier
extracting text features
now do training
num training samples: 44
C: 200 f-score: 0.525
C: 400 f-score: 0.525
C: 300 f-score: 0.525
C: 100 f-score: 0.525
C: 0.01 f-score: 0.575
C: 50.005 f-score: 0.525
C: 25.0075 f-score: 0.525
C: 12.5088 f-score: 0.525
C: 6.25938 f-score: 0.525
C: 3.13469 f-score: 0.525
C: 1.57234 f-score: 0.525
C: 0.791172 f-score: 0.525
C: 0.400586 f-score: 0.525
best C: 0.01
test on train:
5 0 0 0 0 0 0 0
0 8 0 0 0 0 0 0
0 0 6 0 0 0 1 0
0 0 0 5 0 0 0 0
1 0 0 0 1 0 0 0
0 0 0 0 0 8 0 0
0 0 0 0 0 0 5 0
0 0 0 1 0 0 0 3

overall accuracy: 0.931818
Training time: 854 seconds.
df.number_of_classes(): 8

Traceback (most recent call last):
  File "/home/lisettegarcia/miniconda3/envs/py2/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/lisettegarcia/miniconda3/envs/py2/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/lisettegarcia/miniconda3/envs/py2/lib/python2.7/site-packages/rasa_nlu/train.py", line 65, in <module>
    do_train(config)
  File "/home/lisettegarcia/miniconda3/envs/py2/lib/python2.7/site-packages/rasa_nlu/train.py", line 59, in do_train
    trainer.train(training_data)
  File "/home/lisettegarcia/miniconda3/envs/py2/lib/python2.7/site-packages/rasa_nlu/trainers/mitie_trainer.py", line 30, in train
    self.entity_extractor = self.train_entity_extractor(data.entity_examples)
  File "/home/lisettegarcia/miniconda3/envs/py2/lib/python2.7/site-packages/rasa_nlu/trainers/mitie_trainer.py", line 53, in train_entity_extractor
    start, end = self.find_entity(ent, text)
  File "/home/lisettegarcia/miniconda3/envs/py2/lib/python2.7/site-packages/rasa_nlu/trainers/mitie_trainer.py", line 35, in find_entity
    tokens, offsets = tk.tokenize_with_offsets(text)
  File "/home/lisettegarcia/miniconda3/envs/py2/lib/python2.7/site-packages/rasa_nlu/tokenizers/mitie_tokenizer.py", line 24, in tokenize_with_offsets
    offset += m.start()
AttributeError: 'NoneType' object has no attribute 'start'

Tracing the error, I found that the problem is in mitie_tokenizer.py, at line 22:

m = re.search(re.escape(tok), _text[offset:])

The search returns no match when the text contains words with accents.

Any ideas?

Thanks
busq_restaurante_Data.json.zip
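
The thread does not show the tokenizer code beyond that one line, so the following is only a sketch of how a text encoding mismatch can make re.search return None on accented words. locate_tokens is a hypothetical stand-in for tokenize_with_offsets, and the wrongly decoded token list simulates a tokenizer that returns text in a different encoding than its input; neither is taken from rasa_nlu.

```python
import re


def locate_tokens(text, tokens):
    """Return the start offset of each token in text, scanning left to right."""
    offsets = []
    offset = 0
    for tok in tokens:
        m = re.search(re.escape(tok), text[offset:])
        if m is None:
            # Without this guard, m.start() fails with the AttributeError
            # reported in the traceback above.
            raise ValueError("token %r not found after offset %d" % (tok, offset))
        offset += m.start()
        offsets.append(offset)
        offset += len(tok)
    return offsets


text = "quiero un café barato"
tokens = text.split()
good_offsets = locate_tokens(text, tokens)

# Simulate tokens whose UTF-8 bytes were decoded with the wrong codec.
mangled = [t.encode("utf-8").decode("latin-1") for t in tokens]
try:
    locate_tokens(text, mangled)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

The unaccented tokens still match in the mangled list; only the accented one fails, which is consistent with the error appearing only on words with accents.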

frankai commented on May 18, 2024

I have the same problem as @beeva-lisettegarcia when training with Spanish accents. The problem appears to be in the mitie_tokenizer.py script. Any idea or clue on how to fix it? Thanks!

tmbo commented on May 18, 2024

@beeva-lisettegarcia @frankai I just pushed a change that should fix the encoding issue (unfortunately, the test that should have ensured this functionality had a bug of its own 😓). It would be great if you could test it to see whether it solves your issue.

For the future: please avoid re-using closed issues. Don't hesitate to create new issues. The only thing you should do first is make sure the exact problem is not already covered by an existing issue.

cbonadio commented on May 18, 2024

I had the same issue as @beeva-lisettegarcia and @frankai. I have now pulled the changes and it is working.

Thanks

beeva-lisettegarcia commented on May 18, 2024

Thanks, now it is working :-)
