Comments (12)
The Spanish MITIE models are here. If you unzip them and find the feature extractor file, you should use that as your mitie_file.
If you find that the tokenizer isn't working perfectly for Spanish, we can address that.
from rasa.
I just downloaded that model and put all that information in the config file; however, I am getting the error below.
Could you please point out how to fix it?
Thanks
creangel@creangel_hadoop:~/Downloads/mitie/rasa_nlu$ time python -m rasa_nlu.train -c config.json
Training to recognize 4 categories: 'saludo', 'restaurante_busqueda', 'afirmacion', 'despedida'
Train classifier
extracting text features
now do training
num training samples: 63
C: 200 f-score: 0.709677
C: 400 f-score: 0.709677
C: 300 f-score: 0.709677
C: 100 f-score: 0.709677
C: 0.01 f-score: 0.612903
C: 600 f-score: 0.709677
C: 1400 f-score: 0.709677
C: 3000 f-score: 0.709677
C: 5000 f-score: 0.709677
C: 2550 f-score: 0.709677
C: 1325 f-score: 0.709677
C: 712.5 f-score: 0.709677
C: 406.25 f-score: 0.709677
C: 253.125 f-score: 0.709677
C: 176.562 f-score: 0.709677
C: 138.281 f-score: 0.709677
C: 119.141 f-score: 0.709677
C: 109.57 f-score: 0.709677
C: 104.785 f-score: 0.709677
C: 102.393 f-score: 0.709677
C: 101.196 f-score: 0.709677
C: 100.598 f-score: 0.709677
C: 100.299 f-score: 0.709677
best C: 100.598
test on train:
20 0 0 0
0 8 0 0
0 0 21 0
0 0 0 14
overall accuracy: 1
Training time: 429 seconds.
df.number_of_classes(): 4
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "build/bdist.linux-x86_64/egg/rasa_nlu/train.py", line 65, in <module>
File "build/bdist.linux-x86_64/egg/rasa_nlu/train.py", line 59, in do_train
File "build/bdist.linux-x86_64/egg/rasa_nlu/trainers/mitie_trainer.py", line 25, in train
File "build/bdist.linux-x86_64/egg/rasa_nlu/trainers/mitie_trainer.py", line 42, in train_entity_extractor
File "build/bdist.linux-x86_64/egg/rasa_nlu/trainers/mitie_trainer.py", line 31, in start_and_end
IndexError: list index out of range
from rasa.
Looks like there's an error picking up one of your entities. I can't tell whether this is a bug or a problem with your data without seeing it.
Please try training intents only (i.e. remove all entities from your training data), and then add them back one by one until you trigger this error. Then please post the training example that causes the error here.
from rasa.
Hi there,
I just tried your suggestion and it worked like a charm. I figured out that my mistake was that I started counting sentences from 1 instead of 0.
It is fixed now.
Thanks
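For anyone hitting the same IndexError, a quick sanity check of the entity offsets can save a long training run. The sketch below is not part of rasa_nlu. It assumes each training example is a dict with a "text" field and an "entities" list carrying "start", "end" and "value", and that offsets are 0-based with an exclusive end. The helper name find_bad_offsets and the sample data are made up for illustration.

```python
def find_bad_offsets(examples):
    """Return (text, entity) pairs whose start/end do not select the entity value."""
    bad = []
    for ex in examples:
        text = ex["text"]
        for ent in ex.get("entities", []):
            # Python slices are 0-based and end-exclusive, so an annotation
            # counted from 1 selects the wrong characters and is reported here.
            if text[ent["start"]:ent["end"]] != ent["value"]:
                bad.append((text, ent))
    return bad


# Hypothetical examples: the first is counted from 0, the second from 1.
examples = [
    {"text": "busco un restaurante mexicano",
     "entities": [{"start": 21, "end": 29, "value": "mexicano", "entity": "cocina"}]},
    {"text": "busco un restaurante mexicano",
     "entities": [{"start": 22, "end": 30, "value": "mexicano", "entity": "cocina"}]},
]

for text, ent in find_bad_offsets(examples):
    print("offset mismatch in %r: %r" % (text, ent))
```

Running this over the entity examples before training should list every annotation whose offsets do not match its value.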
from rasa.
I have the same problem @angelo337 had: IndexError: list index out of range
I am using the expressions.json file from wit.ai.
Is there a problem with training on wit data?
expressions.json.zip
from rasa.
Thanks for sharing your training data! I'm able to reproduce this error. It's caused by entities like 'perth' in the sentence "what is perths weather like next week": MITIE can only handle entities made up of whole tokens. I will handle this edge case in rasa, but it will still return "perths" rather than "perth" as your location, so for now you will have to resolve that entity yourself. A solution to that is on the roadmap, though.
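To find such examples in a data set, one could check whether each entity span starts and ends on a token boundary. This is only a rough sketch: it uses whitespace splitting as a stand-in for the MITIE tokenizer (an assumption, since the real tokenizer may split differently), and the function names are hypothetical.

```python
import re


def token_spans(text):
    """Return (start, end) character spans of whitespace-separated tokens."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def aligns_with_tokens(text, start, end):
    """True if [start, end) begins at a token start and finishes at a token end."""
    spans = token_spans(text)
    starts = {s for s, _ in spans}
    ends = {e for _, e in spans}
    return start in starts and end in ends


text = "what is perths weather like next week"
print(aligns_with_tokens(text, 8, 13))  # span for "perth": stops inside the token "perths"
print(aligns_with_tokens(text, 8, 14))  # span for "perths": covers the whole token
```

Entities for which the check fails are the ones that currently have to be resolved by hand.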
from rasa.
Although, thinking about it, we could explicitly insert a whitespace in these cases. I will create a new issue and make a proposal.
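One possible reading of the whitespace idea, sketched outside of rasa: pad the entity span with a space wherever it touches other characters, and return the adjusted offsets. The function name pad_entity is hypothetical, and this is not the proposal that was eventually made.

```python
def pad_entity(text, start, end):
    """Insert spaces around text[start:end] so the entity becomes its own token.

    Returns the new text and the entity's new (start, end) offsets.
    """
    before, entity, after = text[:start], text[start:end], text[end:]
    if before and not before[-1].isspace():
        before += " "
    if after and not after[0].isspace():
        after = " " + after
    new_start = len(before)
    return before + entity + after, new_start, new_start + len(entity)


padded, start, end = pad_entity("what is perths weather like next week", 8, 13)
print(padded)             # what is perth s weather like next week
print(padded[start:end])  # perth
```

The trade-off is that the training text no longer matches what users typed, so the same padding would presumably be needed at prediction time.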
from rasa.
Hello,
I would like to use rasa for Spanish texts.
I have already downloaded the Spanish MITIE model and prepared the config file.
During training, I get the following error:
python -m rasa_nlu.train -c config.json
Training to recognize 8 categories: 'greet', 'restaurant_search', 'affirm', 'goodbye', 'saludo', 'busqueda_restaurante', 'afirmacion', 'despedida'
Train classifier
extracting text features
now do training
num training samples: 44
C: 200 f-score: 0.525
C: 400 f-score: 0.525
C: 300 f-score: 0.525
C: 100 f-score: 0.525
C: 0.01 f-score: 0.575
C: 50.005 f-score: 0.525
C: 25.0075 f-score: 0.525
C: 12.5088 f-score: 0.525
C: 6.25938 f-score: 0.525
C: 3.13469 f-score: 0.525
C: 1.57234 f-score: 0.525
C: 0.791172 f-score: 0.525
C: 0.400586 f-score: 0.525
best C: 0.01
test on train:
5 0 0 0 0 0 0 0
0 8 0 0 0 0 0 0
0 0 6 0 0 0 1 0
0 0 0 5 0 0 0 0
1 0 0 0 1 0 0 0
0 0 0 0 0 8 0 0
0 0 0 0 0 0 5 0
0 0 0 1 0 0 0 3
overall accuracy: 0.931818
Training time: 854 seconds.
df.number_of_classes(): 8
Traceback (most recent call last):
File "/home/lisettegarcia/miniconda3/envs/py2/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/home/lisettegarcia/miniconda3/envs/py2/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/home/lisettegarcia/miniconda3/envs/py2/lib/python2.7/site-packages/rasa_nlu/train.py", line 65, in <module>
do_train(config)
File "/home/lisettegarcia/miniconda3/envs/py2/lib/python2.7/site-packages/rasa_nlu/train.py", line 59, in do_train
trainer.train(training_data)
File "/home/lisettegarcia/miniconda3/envs/py2/lib/python2.7/site-packages/rasa_nlu/trainers/mitie_trainer.py", line 30, in train
self.entity_extractor = self.train_entity_extractor(data.entity_examples)
File "/home/lisettegarcia/miniconda3/envs/py2/lib/python2.7/site-packages/rasa_nlu/trainers/mitie_trainer.py", line 53, in train_entity_extractor
start, end = self.find_entity(ent, text)
File "/home/lisettegarcia/miniconda3/envs/py2/lib/python2.7/site-packages/rasa_nlu/trainers/mitie_trainer.py", line 35, in find_entity
tokens, offsets = tk.tokenize_with_offsets(text)
File "/home/lisettegarcia/miniconda3/envs/py2/lib/python2.7/site-packages/rasa_nlu/tokenizers/mitie_tokenizer.py", line 24, in tokenize_with_offsets
offset += m.start()
AttributeError: 'NoneType' object has no attribute 'start'
Tracing the error, I found that the problem is in mitie_tokenizer.py, line 22:
m = re.search(re.escape(tok), _text[offset:])
The search returns None when the text contains words with accents.
Any ideas?
Thanks
busq_restaurante_Data.json.zip
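To illustrate the failure mode, here is a simplified Python 3 re-creation of the offset search quoted above. It is not the actual mitie_tokenizer.py code and not a proposed fix. The mis-decoded token is an assumption about what happens when tokens and text end up in different encodings, and the ValueError guard is added only to make the failure explicit.

```python
import re


def tokenize_with_offsets(text, tokens):
    """Locate each token in text, in order, and return its start offset."""
    offsets = []
    offset = 0
    for tok in tokens:
        m = re.search(re.escape(tok), text[offset:])
        if m is None:
            # Without this guard, m.start() raises the AttributeError seen above.
            raise ValueError("token %r not found after offset %d" % (tok, offset))
        offsets.append(offset + m.start())
        offset += m.end()
    return offsets


text = "búsqueda de restaurante"
print(tokenize_with_offsets(text, text.split()))  # [0, 9, 12]

# Simulate a token whose UTF-8 bytes were decoded with the wrong codec.
garbled = "búsqueda".encode("utf-8").decode("latin-1")
try:
    tokenize_with_offsets(text, [garbled])
except ValueError as err:
    print(err)
```

If the tokens and the text agree on encoding, the search finds every token. If they do not, accented words are the first to go missing, which matches the traceback.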
from rasa.
I have the same problem as @beeva-lisettegarcia when training with Spanish accents. The problem appears to be in the mitie_tokenizer.py script. Any idea or clue on how to fix it? Thanks!
from rasa.
@beeva-lisettegarcia @frankai I just pushed a change that should fix the encoding issue (unfortunately the test that should have ensured this functionality had a bug of its own 😓). It would be great if you could test it to see whether it solves your issue.
For the future: please avoid re-using closed issues. Don't hesitate to create new issues; just make sure the exact problem is not already covered by an existing issue.
from rasa.
I had the same issue as @beeva-lisettegarcia and @frankai. I have now pulled the changes and it is working.
Thanks
from rasa.
Thanks, now it is working :-)
from rasa.