rasahq / rasa

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

Home Page: https://rasa.com/docs/rasa/

License: Apache License 2.0

Languages: Python 99.30%, Makefile 0.22%, Dockerfile 0.22%, HTML 0.06%, Shell 0.10%, HCL 0.09%, Jinja 0.02%
Topics: nlp, machine-learning, machine-learning-library, bot, bots, botkit, rasa, wit, nlu, conversational-bots

rasa's Introduction

Rasa Open Source



💡 We're migrating issues to Jira 💡

Starting January 2023, issues for Rasa Open Source are located in this Jira board. You can browse issues without being logged in; if you want to create issues, you'll need to create a Jira account.


An image of Sara, the Rasa mascot bird, holding a flag that reads Open Source with one wing, and a wrench in the other

Rasa is an open source machine learning framework to automate text and voice-based conversations. With Rasa, you can build contextual assistants on:

  • Facebook Messenger
  • Slack
  • Google Hangouts
  • Webex Teams
  • Microsoft Bot Framework
  • Rocket.Chat
  • Mattermost
  • Telegram
  • Twilio
  • Your own custom conversational channels

or voice assistants as:

  • Alexa Skills
  • Google Home Actions

Rasa helps you build contextual assistants capable of having layered conversations with lots of back-and-forth. In order for a human to have a meaningful exchange with a contextual assistant, the assistant needs to be able to use context to build on things that were previously discussed – Rasa enables you to build assistants that can do this in a scalable way.

There's a lot more background information in this blog post.



Where to get help

There is extensive documentation in the Rasa Docs. Make sure to select the correct version so you are looking at the docs for the version you installed.

Please use the Rasa Community Forum for quick answers to questions.


How to contribute

We are very happy to receive and merge your contributions into this repository!

To contribute via pull request, follow these steps:

  1. Create an issue describing the bug/improvement you want to work on or pick up an existing issue in Jira
  2. Follow our Pull Request guidelines: write code, tests, documentation, and a changelog entry, and follow our Code Style
  3. Create a pull request describing your changes

For more detailed instructions on how to contribute code, check out these code contributor guidelines.

You can find more information about how to contribute to Rasa (in lots of different ways!) on our website.

Your pull request will be reviewed by a maintainer, who will get back to you about any necessary changes or questions. You will also be asked to sign a Contributor License Agreement.

Development Internals

Installing Poetry

Rasa uses Poetry for packaging and dependency management. If you want to build it from source, you have to install Poetry first. Please follow the official guide to see all possible options.

To update an existing Poetry installation to the version currently used in rasa, run:

    poetry self update <version>
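
You can check which version is currently installed with (assuming Poetry is already on your PATH):

    poetry --version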

Managing environments

The official Poetry guide suggests using pyenv or a similar tool to switch easily between Python versions. This is how it can be done:

pyenv install 3.10.10
pyenv local 3.10.10  # Activate Python 3.10.10 for the current project

Note: If you have trouble installing a specific version of Python on your system, it might be worth trying another supported version.

By default, Poetry will try to use the currently activated Python version to create the virtual environment for the current project automatically. You can also create and activate a virtual environment manually — in this case, Poetry should pick it up and use it to install the dependencies. For example:

python -m venv .venv
source .venv/bin/activate

You can make sure that the environment is picked up by executing

poetry env info

Building from source

To install dependencies and rasa itself in editable mode, execute

make install

Note for macOS users: under macOS Big Sur we've seen compiler issues for some dependencies. Running export SYSTEM_VERSION_COMPAT=1 before the installation helped.

Installing optional dependencies

To install rasa's optional dependencies, run:

make install-full

Note for macOS users: The command make install-full could result in a failure while installing tokenizers (issue described in depth here).

To resolve it, follow these steps to install a Rust compiler:

brew install rustup
rustup-init

After initialising the Rust compiler, restart the console and verify the installation:

rustc --version

If the PATH variable was not set up automatically, run:

export PATH="$HOME/.cargo/bin:$PATH"
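
To make this persist across new shell sessions, you can append the same line to your shell profile, for example (assuming bash; use ~/.zshrc for zsh):

echo 'export PATH="$HOME/.cargo/bin:$PATH"' >> ~/.bashrc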

Running and changing the documentation

First of all, install all the required dependencies:

make install install-docs

After the installation has finished, you can run and view the documentation locally using:

make livedocs

It should open a new tab with the local version of the docs in your browser; if not, visit http://localhost:3000. You can now change the docs locally, and the web page will automatically reload and apply your changes.

Running the Tests

To run the tests, make sure that you have the development requirements installed:

make prepare-tests-ubuntu # Only on Ubuntu and Debian based systems
make prepare-tests-macos  # Only on macOS

Then, run the tests:

make test

They can also be run across multiple jobs to save some time:

JOBS=[n] make test

Where [n] is the desired number of jobs. If omitted, [n] will be chosen automatically by pytest.
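
For example, to run the tests across four parallel jobs (the number here is an arbitrary choice):

JOBS=4 make test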

Running the Integration Tests

To run the integration tests, make sure that you have the development requirements installed:

make prepare-tests-ubuntu # Only on Ubuntu and Debian based systems
make prepare-tests-macos  # Only on macOS

Then, start the required services with the following command, which uses Docker Compose:

make run-integration-containers

Finally, you can run the integration tests like this:

make test-integration

Resolving merge conflicts

Poetry doesn't include a built-in way to resolve merge conflicts in the lock file poetry.lock. However, there is a great tool called poetry-merge-lock. Here is how you can install it:

pip install poetry-merge-lock

Just execute this command to resolve merge conflicts in poetry.lock automatically:

poetry-merge-lock
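
A typical flow might look like this (a sketch, assuming a merge from main that conflicts in poetry.lock):

git merge main        # stops with a conflict in poetry.lock
poetry-merge-lock     # resolves the lock file automatically
git add poetry.lock
git commit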

Build a Docker image locally

To build a Docker image on your local machine, execute the following command:

make build-docker

The Docker image is available on your local machine as rasa:localdev.

Code Style

To ensure a standardized code style, we use the formatter black. To ensure our type annotations are correct, we use the type checker mypy. If your code is not formatted properly or doesn't type check, the GitHub build will fail.

Formatting

If you want to automatically format your code on every commit, you can use pre-commit. Just install it via pip install pre-commit and execute pre-commit install in the root folder. This will add a hook to the repository, which reformats files on every commit.
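
In short, the setup described above boils down to two commands, run from the repository root:

pip install pre-commit
pre-commit install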

If you want to set it up manually, install black via poetry install. To reformat files, execute

make formatter

Type Checking

If you want to check types on the codebase, install mypy using poetry install. To check the types, execute

make types

Deploying documentation updates

We use Docusaurus v2 to build docs for tagged versions and for the main branch. To run Docusaurus, install Node.js 12.x. The static site that gets built is pushed to the documentation branch of this repo.

We host the site on netlify. On main branch builds (see .github/workflows/documentation.yml), we push the built docs to the documentation branch. Netlify automatically re-deploys the docs pages whenever there is a change to that branch.

Releases

Rasa has implemented robust policies governing version naming, as well as release pace for major, minor, and patch releases.

The values for a given version number (MAJOR.MINOR.PATCH) are incremented as follows:

  • MAJOR version for incompatible API changes or other breaking changes.
  • MINOR version for functionality added in a backward compatible manner.
  • PATCH version for backward compatible bug fixes.
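
For illustration, starting from a hypothetical current version 3.4.2:

3.4.2 -> 4.0.0   (MAJOR: a breaking change)
3.4.2 -> 3.5.0   (MINOR: new backward compatible functionality)
3.4.2 -> 3.4.3   (PATCH: a backward compatible bug fix)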

The following list describes the version types and their expected release cadence:

  • Major: for significant changes, or when any backward-incompatible changes are introduced to the API or data model. Target cadence: every 1–2 years.
  • Minor: for when new backward-compatible functionality is introduced, a minor feature is introduced, or a set of smaller features is rolled out. Target cadence: roughly quarterly.
  • Patch: for backward-compatible bug fixes that fix incorrect behavior. Target cadence: as needed.

While this table represents our target release frequency, we reserve the right to modify it based on changing market conditions and technical requirements.

Maintenance Policy

Our End of Life policy defines how long a given release is considered supported, as well as how long a release is considered to be still in active development or maintenance.

The maintenance duration and end of life for every release are shown on our website as part of the Product Release and Maintenance Policy.

Cutting a Major / Minor release

A week before release day

  1. Make sure the milestone already exists and is scheduled for the correct date.
  2. Take a look at the issues & PRs that are in the milestone: does it look about right for the release highlights we are planning to ship? Does it look like anything is missing? Don't worry about being aware of every PR that should be in, but it's useful to take a moment to evaluate what's assigned to the milestone.
  3. Post a message on the engineering Slack channel, letting the team know you'll be the one cutting the upcoming release, as well as:
    1. Providing the link to the appropriate milestone
    2. Reminding everyone to go over their issues and PRs and please assign them to the milestone
    3. Reminding everyone of the scheduled date for the release

A day before release day

  1. Go over the milestone and evaluate the status of any PR merging that's happening. Follow up with people on their bugs and fixes. If the release introduces new bugs or regressions that can't be fixed in time, we should discuss this on Slack and decide how to move forward. If an issue is not ready to be merged in time, we remove the issue / PR from the milestone and notify the PR owner and the product manager on Slack. The PR / issue owners are responsible for communicating any issues that might be release-relevant. Postponing the release should be considered an edge-case scenario.

Release day! 🚀

  1. At the start of the day, post a small message on Slack announcing release day! Communicate that you'll be handling the release, and the time you're aiming to start releasing (again, no later than 4pm, as issues may arise and cause delays). This message should be posted early in the morning, before moving forward with any of the release steps, to give people enough time to check their PRs and issues and plan any remaining work. A template of the Slack message can be found here. The release time should be communicated transparently so that others can plan any necessary steps accordingly. If there are bigger changes, this should be communicated.
  2. Make sure the milestone is empty (everything has been either merged or moved to the next milestone)
  3. Once everything in the milestone is taken care of, post a small message on Slack communicating you are about to start the release process (in case anything is missing).
  4. You may now do the release by following the instructions outlined in the Rasa Open Source README!

After a Major release

After a Major release has been completed, please follow these instructions to complete the documentation update.

Steps to release a new version

Releasing a new version is quite simple, as the packages are built and distributed by GitHub Actions.

Release steps:

  1. Make sure all dependencies are up to date (especially Rasa SDK)
    • For Rasa SDK, except in the case of a patch release, that means first creating a new Rasa SDK release (make sure the version numbers between the new Rasa and Rasa SDK releases match)
    • Once the tag with the new Rasa SDK release is pushed and the package appears on PyPI, the dependency in the rasa repository can be resolved (see below).
  2. If this is a minor / major release: Make sure all fixes from currently supported minor versions have been merged from their respective release branches (e.g. 3.3.x) back into main.
  3. In case of a minor release, create a new branch that corresponds to the new release, e.g.
     git checkout -b 1.2.x
     git push origin 1.2.x
  4. Switch to the branch you want to cut the release from (main in case of a major, the <major>.<minor>.x branch for minors and patches)
    • Update the rasa-sdk entry in pyproject.toml with the new release version and run poetry update. This creates a new poetry.lock file with all dependencies resolved.
    • Commit the changes with git commit -am "bump rasa-sdk dependency" but do not push them. They will be automatically picked up by the following step.
  5. If this is a major release, update the list of actively maintained versions in the README and in the docs.
  6. Run make release
  7. Create a PR against the release branch (e.g. 1.2.x)
  8. Once your PR is merged, tag a new release (this SHOULD always happen on the release branch), e.g. using
    git checkout 1.2.x
    git pull origin 1.2.x
    git tag 1.2.0 -m "next release"
    git push origin 1.2.0 --tags
    GitHub will build this tag and publish the build artifacts.
  9. After all the steps are completed, and if everything goes well, we should see a message automatically posted in the company's Slack (product channel) like this one
  10. If no message appears in the channel then you can do the following checks:
    • Check the workflows in GitHub Actions and make sure that the merged PR of the current release completed successfully. To find your PR easily, you can use the filters event: push and branch: <version number> (for example, for release 2.4, see here)
    • If the workflow did not complete, try to re-run it in case that solves the problem
    • If the problem persists, also check the log files and try to find the root cause of the issue
    • If you still cannot resolve the error, contact the infrastructure team, providing any helpful information from your investigation
  11. After the message is posted correctly in the product channel, also check the product-engineering-alerts channel for any alerts related to the Rasa Open Source release, like this one

Cutting a Patch release

Patch releases are simpler to cut, since they are meant to contain only bugfixes.

The only things you need to do to cut a patch release are:

  1. Notify the engineering team on Slack that you are planning to cut a patch, in case someone has an important fix to add.
  2. Make sure the bugfix(es) are in the release branch you will use (e.g. if you are cutting a 2.0.4 patch, you will need your fixes to be on the 2.0.x release branch). All patch releases must come from a .x branch!
  3. Once you're ready to release the Rasa Open Source patch, check out the branch, run make release, follow the steps, and get the PR merged.
  4. Once the PR is in, pull the .x branch again and push the tag!
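
A minimal sketch of these steps for a hypothetical 2.0.4 patch, mirroring the tagging commands from the major/minor release instructions above:

git checkout 2.0.x
make release            # follow the steps, then open a PR against 2.0.x
# ...once the PR is merged:
git pull origin 2.0.x
git tag 2.0.4 -m "next release"
git push origin 2.0.4 --tags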

Additional Release Tasks

Note: This is only required if the released version is the highest version available. For instance, perform the following steps when the released version is greater than the version on main.

To check compatibility between the newly released Rasa version and the latest version of Rasa X/Enterprise, we perform the following steps:

  1. Following a new Rasa release, an automated pull request is created in Rasa-X-Demo.
  2. Once the above PR is merged, follow the instructions here to release a new version.
  3. Update the new version in the Rasa X/Enterprise env file. The Rasa-X-Demo project uses the updated Rasa version to train and test a model, which in turn is used by our CI to run tests in the Rasa X/Enterprise repository, thus validating compatibility between Rasa and Rasa X/Enterprise.

Actively maintained versions

Please refer to the Rasa Product Release and Maintenance Policy page.

License

Licensed under the Apache License, Version 2.0. Copyright 2022 Rasa Technologies GmbH. Copy of the license.

A list of the Licenses of the dependencies of the project can be found at the bottom of the Libraries Summary.

rasa's People

Contributors

akelad, alwx, amn41, ancalita, b-quachtran, dakshvar22, dependabot[bot], erohmensing, evgeniiaraz, federicotdn, ghostvv, hotthoughts, indam23, joejuzl, kedz, koaning, m-vdb, metcalfetom, msamogh, radovanzrasa, rasabot, ricwo, samsucik, sancharigr, tabergma, tawakalt, tmbo, twerkmeister, vcidst, wochinge


rasa's Issues

server should handle nonascii text

If we add some non-ASCII text in test_post_parse(), the encoding isn't right. See lines 57-59 in test_server.py:

        req = requests.post(http_server + "/parse", json={"q": u"hello ńöñàśçií"})
        expected = [{"entities": {}, "confidence": None, "intent": "greet", "_text": "hello ńöñàśçií"}]
        assert req.status_code == 200 and req.json() == expected

This fails.

Is there support for slot filling?

Hi there

Is there support for slot filling, where the user can provide all the necessary info for a given intent over multiple interactions?
Example:
User: I'm hungry
Bot: What kind of food would you like to eat?
User: Mexican
Bot: Are you looking to order delivery or go out?
User: Delivery
Bot: Here are some nearby options for Mexican food for delivery: ...

Or for a simpler more classic example - think booking an appointment where you're asked to provide time and with whom for instance.

Is there already support for this type of scenario? If so, could you point me to key parts of code/tutorial of relevance? If not, is this on the roadmap, and is there some workaround way to do this in the meantime?

Thank you

support for training data fetched from DB

I like this idea, suggested by @3x14159265, of being able to load training data from a DB rather than a json file. Primarily because a DB is a more flexible & robust way to store that data, and not because of performance.

If possible, I don't want rasa NLU to dictate what database people use to store their stuff. So maybe we can create an abstract TrainingDataLoader class, from which everything else derives, including the ones which read json files. Then users can subclass that to read from whichever DB they like.
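
A minimal sketch of what that could look like (all names here are hypothetical, not existing rasa NLU classes):

import json
from abc import ABC, abstractmethod

class TrainingDataLoader(ABC):
    """Base interface: subclass this to load training data from any source."""

    @abstractmethod
    def load(self):
        """Return the parsed training data."""

class JsonFileLoader(TrainingDataLoader):
    """The existing json-file readers could derive from the base class."""

    def __init__(self, path):
        self.path = path

    def load(self):
        with open(self.path, encoding="utf-8") as f:
            return json.load(f)

class MyDatabaseLoader(TrainingDataLoader):
    """Users would implement load() against whichever DB they like."""

    def __init__(self, connection):
        self.connection = connection

    def load(self):
        raise NotImplementedError("query your own DB here")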

entities: list or dict?

@plauto you mentioned there was some confusion around when entities is a dict and when it's a list. Let's figure out what exactly is wrong

MITIE install fails on Windows 10

Hi,

I'm trying to install the MITIE backend on Windows 10 using pip install git+https://github.com/mit-nlp/MITIE.git. However, the install fails at the build step with the following error message:
Collecting git+https://github.com/mit-nlp/MITIE.git
Cloning https://github.com/mit-nlp/MITIE.git to c:\users\njones\appdata\local\temp\pip-97mlgx-build
Installing collected packages: mitie
Running setup.py install for mitie: started
Running setup.py install for mitie: finished with status 'error'
Complete output from command c:\python27\python.exe -u -c "import setuptools, tokenize;__file__='c:\users\njones\appdata\local\temp\pip-97mlgx-build\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record c:\users\njones\appdata\local\temp\pip-ydwrn7-record\install-record.txt --single-version-externally-managed --compile:
running install
running build
error: [Error 2] The system cannot find the file specified

----------------------------------------

Command "c:\python27\python.exe -u -c "import setuptools, tokenize;__file__='c:\users\njones\appdata\local\temp\pip-97mlgx-build\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record c:\users\njones\appdata\local\temp\pip-ydwrn7-record\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in c:\users\njones\appdata\local\temp\pip-97mlgx-build

Any suggestions on what I could be doing wrong would be really helpful!

Thanks

KeyError 'text'

I have trained the demo-restaurants.json and booted up a server, but when I query, I get a KeyError Exception

Exception happened during processing of request from ('127.0.0.1', 43896)
Traceback (most recent call last):
  File "/usr/lib/python2.7/SocketServer.py", line 290, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib/python2.7/SocketServer.py", line 318, in process_request
    self.finish_request(request, client_address)
  File "/usr/lib/python2.7/SocketServer.py", line 331, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/home/lalithmohan/.local/lib/python2.7/site-packages/rasa_nlu/server.py", line 75, in <lambda>
    self.server = HTTPServer(('', self.config.port), lambda *args: RasaRequestHandler(self.data_router, *args))
  File "/home/lalithmohan/.local/lib/python2.7/site-packages/rasa_nlu/server.py", line 159, in __init__
    BaseHTTPRequestHandler.__init__(self, *args)
  File "/usr/lib/python2.7/SocketServer.py", line 652, in __init__
    self.handle()
  File "/usr/lib/python2.7/BaseHTTPServer.py", line 340, in handle
    self.handle_one_request()
  File "/usr/lib/python2.7/BaseHTTPServer.py", line 328, in handle_one_request
    method()
  File "/home/lalithmohan/.local/lib/python2.7/site-packages/rasa_nlu/server.py", line 201, in do_POST
    self.wfile.write(self.get_response(data_dict))
  File "/home/lalithmohan/.local/lib/python2.7/site-packages/rasa_nlu/server.py", line 175, in get_response
    result = self.data_router.parse(data["text"])
KeyError: 'text'

I have both the data and mitie_file in the same directory and have installed the mitie backend. I also tried sending the key as "text" instead of "q" in the POST, but it gives the wrong intent and entities. I don't know what I'm missing here. Here's my config.json:

{ "backend": "mitie", "path" : "./", "mitie_file" : "total_word_feature_extractor.dat", "data" : "demo-rasa.json" }

Add a little more detail to the ReadTheDocs documentation

Firstly, I think this is an amazing project (a big thank you for it and the blog post about the details behind it, which is well worth a read!)

I expect you've got a lot of things on the To Do list, but expanding a little on the detail in the ReadTheDocs documentation in a couple of places would help newbies a lot.

A couple of places:

  1. Installation assumes a certain amount of experience (perhaps this is a good way to keep out those with very little technical ability, as they may bring more problems, but as it stands, it doesn't list the actual steps required - you have a few lines in the ReadMe of the repo here which could be added; otherwise the only details in ReadTheDocs are for the backend choices)

  2. Some more detail on the syntax for the training data

  • it's possible to infer it from the demo file, but initially a few things weren't clear regarding what is required or not (e.g. you don't actually need the items in the "intent_examples" to match up to those in the "entity_examples")
  • the numbering for the start/end of entities (i.e. it's zero-based and the end currently needs to be one more than the expected end: a single-character entity would typically start and end at the same position, but in the current syntax the end is start+1; see the sketch after this list)
  • what (if anything) is the benefit of including examples with no entities in the "entity_examples" section
  • that neither "intent_examples" nor "entity_examples" can be empty, and that for entities it is recommended to have more than one example of each case
  • whether entities can be two words or not
  • any particular formatting or names etc that won't work or causes issues
  • whether multiple files can be used in the regular Rasa format (they can be if it's working with output from API.AI)
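
As a sketch of that numbering convention (a hypothetical example, not from the docs), a single-character entity "B" in "gate B" would be annotated as:

{
  "text": "gate B",
  "entities": [
    {
      "start": 5,
      "end": 6,
      "value": "B",
      "entity": "gate"
    }
  ]
}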

Anyway, this isn't meant as a rude demand, just a friendly comment so it's tracked - if I get time, is this the sort of contribution you'd welcome from fellow developers/users?

[Question] Deploying to Heroku

There probably needs to be more documentation on how to deploy to heroku. I've added a deploy button in #89 and upon attempting to deploy, I now have to or have the option to fill in the following.

  • RASA_TOKEN (token for validating requests)
    • This is automatically generated, but what is this used for? Do I need to look up this value later?
  • RASA_BACKEND (which backend to use)
  • RASA_PATH (where to save models)
    • Am I generating these models? Do I need to back these up?
  • RASA_LOGDIR (where to save logs)
    • Fine as is. Could talk more about what is being logged.
  • RASA_MITIE_FILE (file containing mitie feature extractor)
    • If I'm using spacy_sklearn for my backend, why is this still required?
  • RASA_SERVER_MODEL_DIR (dir containing model which should be used to process requests)
  • AWS_SECRET_ACCESS_KEY (secret key for S3 access)
    • What permissions are required for uploading? Just PutObject or something else?
  • AWS_ACCESS_KEY_ID (key id for S3 access)
    • What permissions are required for uploading? Just PutObject or something else?
  • BUCKET_NAME (name of s3 bucket)
    • Is there a specific directory structure for the s3 bucket?
  • AWS_REGION (aws region of S3 bucket)
    • Fine as is.

There probably should be another document such as deploying-to-heroku.md or deploy/heroku.md explaining what each variable does.

further german support

we should add tests to test_tokenizers.py and test_featurizers.py that make use of spaCy's German models.

we should also throw an error if someone tries to use mitie as a backend but has the language set to 'de'

logging verbosity options

I think we should give the user some control over the level of logging output.

option 1: default is to be very verbose, make a -q quiet command line flag to suppress

option 2: default is minimal logging, make -v verbose flag to help debugging
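
A minimal sketch of option 2 using argparse (illustrative only, not the actual rasa_nlu CLI wiring):

import argparse
import logging

parser = argparse.ArgumentParser()
parser.add_argument("-v", "--verbose", action="store_true",
                    help="enable verbose output to help debugging")
args = parser.parse_args()

# default is minimal logging; -v switches on debug-level output
logging.basicConfig(level=logging.DEBUG if args.verbose else logging.WARNING)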

use unicode throughout

I think we should read training data files in as unicode, and just represent all text with unicode objects and not str.

In training_data.py I would propose we read the json files like this

import codecs
import json

f = codecs.open('data.json', encoding='utf-8')
data = json.loads(f.read())

then we should only have unicode objects everywhere and we can avoid using any str at all. NB this may have downstream effects in e.g. spacy_tokenizer.py so we should run all the tests to check everything still works.

Fulfillment is Empty

Hi,

I have Rasa installed with MITIE backend. I trained my models from API.ai, using my own and using the ones I found in the examples folder. Unfortunately, it responds with an empty "fulfillment" every time I make a query. Please advise.

Best,
Aaron

add error message when no logfile key is specified in config.json

Problem
Two scenarios:

  • no config file specified. Empty dict is used as default configuration;
  • config file specified, but the dictionary holds no logfile key;

DataRouter() in server.py assigns self.logfile=config["logfile"], but the lookup fails when the key is missing.

Possible Solution
verify whether the key is present in the config dict, and either:

  • trigger an error and exit;
  • use default value.
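
A minimal sketch of the second option (the default file name here is hypothetical):

def resolve_logfile(config):
    # dict.get avoids the KeyError and falls back to a default
    return config.get("logfile", "rasa_nlu_log.json")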

visualization generates errors

$ python -m rasa_nlu.visualize ./rasa_nlu_log.json

It launches, but when I do a curl, I get:

Exception happened during processing of request from ('127.0.0.1', 63478)
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/SocketServer.py", line 295, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/SocketServer.py", line 321, in process_request
    self.finish_request(request, client_address)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/SocketServer.py", line 334, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "build/bdist.macosx-10.12-intel/egg/rasa_nlu/visualize.py", line 14, in <lambda>
  File "build/bdist.macosx-10.12-intel/egg/rasa_nlu/visualize.py", line 27, in __init__
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/SocketServer.py", line 655, in __init__
    self.handle()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/BaseHTTPServer.py", line 340, in handle
    self.handle_one_request()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/BaseHTTPServer.py", line 328, in handle_one_request
    method()
  File "build/bdist.macosx-10.12-intel/egg/rasa_nlu/visualize.py", line 37, in do_GET
  File "build/bdist.macosx-10.12-intel/egg/rasa_nlu/training_data.py", line 18, in __init__
    self.fformat = self.guess_format(self.files)
  File "build/bdist.macosx-10.12-intel/egg/rasa_nlu/training_data.py", line 60, in guess_format
    if "data" in "data" and type(filedata.get("data")) is list:
AttributeError: 'list' object has no attribute 'get'

It seems that it cannot load the file, but the file exists and contains:

[
  {
    "entities": {},
    "intent": "greet",
    "text": "hello"
  }
]

Add confidence level to entities and intent

Would be great to return the confidence for an intent and/or entity. How does it work at the moment? Does it return the best match? Happy to help implementing this as well :)

Build a Botkit Middleware

Botkit is a Node-based tool for building bots. Many developers use API-powered NLP tools right now via a middleware system.

We should create a middleware for Botkit that:

  • sends messages received by the bot to the developer's rasa endpoint and amends the message with intent and entity information
  • exposes a trigger conditional that allows the bot to 'hear' intents from rasa

Though the specific details will be different, this will be virtually identical in form to any of the other NLP middlewares, such as botkit-wit:
https://github.com/SnowStormIO/botkit-witai

One click button to deploy example application

We have a "deploy to Heroku" button set up, the purpose of which is to let people try out rasa NLU SUPER quickly without setting up a virtualenv, installing numpy/scipy/mitie, etc.

with the new HTTP /train endpoint, this means that users can deploy to heroku, and then just POST their downloaded data from wit/LUIS/api to train a model.

the free Heroku VPSs are very resource constrained; we only have 0.5 GB of memory to play with, see https://devcenter.heroku.com/articles/dyno-types . Running spacy will be a problem, because it's pretty memory heavy. So I think we should default to using mitie in that case.

Problem with MITIE (as of now) is that the intent classifier has really stupid line search defaults hard-coded in the C++ code. This means it can take hours to train a pretty simple model, even on a decent laptop.

My fork of MITIE has a temporary hack to disable the line search altogether, which actually works fine. We just need to test that the Heroku servers correctly install my fork https://github.com/amn41/MITIE and then test that we can in fact train a model in reasonable time.

The other issue is that the filesystem gets wiped regularly and without warning on Heroku, so we may have to download the mitie total_word_feature_extractor.dat file each time we train a model.

PR #31 adds functionality to persist trained models to S3, specifically so that we don't rely on the server's filesystem any more. Models should get uploaded automatically if the right environment variables are set.

spanish usage

Hi there,
Is it possible to use Rasa in Spanish, with the MITIE model in Spanish?
If so, could you please point me to some resources covering all the changes needed?
Thanks,
Angelo

document `intent` versus `entity` for wit users

wit no longer has a native concept of an intent, so new users of wit won't know what that means. We'll also have to figure out how to parse exported wit apps which don't have any intents defined.

server response

First of all, thanks for the great work!
I'm by no means a Python expert; I just wanted to try how this server works, and have encountered several issues down the road, some of which I could solve, some not... So, from the beginning:

  • numpy must be installed; maybe I missed that in the docs, but it threw an error until I installed it
  • while trying to train, after installing MITIE following the steps in the docs, it won't work, at least in my setup, unless I give some extra params on the command line; the full command line I had to use is
    python -m rasa_nlu.train -c config.json -b mitie -d data/demo-rasa.json -p model
    Despite the fact that mitie is configured in config.json, if it's missing on the command line, the trainer will throw an error, picking 'keyword' and saying it's not implemented. The same goes for the rest of the parameters.
  • after training is finished, I'm running the server as advised in the docs. Here I have two issues: I can't see the file with commands sent to the server (it's not being created), but more importantly, no matter what text I send for analysis, the server constantly answers like this:
    {"error": "Invalid parse parameter specified"}
    Command line is the advised one:
    curl -X POST -d '{"text":"I am looking for Chinese food"}' http://localhost:5000/parse

Can you please tell me what I'm doing wrong? Where should I look for hints?

Thanks a lot!

MITIEInterpreter

I have got an error while running main.py; it seems to point to MITIEInterpreter. The __init__ function takes 3 parameters and 3 were passed into the function (interpreter = MITIEInterpreter(config.classifier_file, config.ner_file, config.fe_file)) via main.py, but I still get an error. Could you look into it please?

self.extractor = named_entity_extractor(ner_file,fe_filepath)
TypeError: __init__() takes exactly 2 arguments (3 given)

tests for update_config

we should have some tests for the update_config method. This merges options set in the config file, passed on the command line, and set in environment variables. The tests should make clear to the user what takes priority, and check for any conflicts.

Error running sklearn backend on windows

I am getting the following error when running:

python -m rasa_nlu.server -c config.json -e api

INFO:root:using spacy + sklearn backend
Traceback (most recent call last):
  File "C:\Users\\Anaconda2\lib\runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "C:\Users\\Anaconda2\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "build\bdist.win-amd64\egg\rasa_nlu\server.py", line 247, in 
  File "build\bdist.win-amd64\egg\rasa_nlu\server.py", line 20, in __init__
  File "build\bdist.win-amd64\egg\rasa_nlu\server.py", line 54, in __create_interpreter
  File "build\bdist.win-amd64\egg\rasa_nlu\interpreters\spacy_sklearn_interpreter.py", line 15, in __init__
  File "C:\Users\\Anaconda2\lib\pickle.py", line 1384, in load
    return Unpickler(file).load()
  File "C:\Users\\Anaconda2\lib\pickle.py", line 864, in load
    dispatch[key](self)
  File "C:\Users\\Anaconda2\lib\pickle.py", line 1096, in load_global
    klass = self.find_class(module, name)
  File "C:\Users\\Anaconda2\lib\pickle.py", line 1130, in find_class
    __import__(module)
ImportError: No module named sklearn_intent_classifier

I have tried switching over to MITIE, but have had no luck getting that to work either.

I am trying to run a model that was exported from api.ai.

Any help solving this issue would be much appreciated.

fix documentation for api endpoints

we currently claim that 'emulating' a service only affects the json format of the response. This is incorrect, because the API methods actually also change, so that the user should only have to change the url they're sending requests to. We need to make clear this is always the case (including for api.ai) and document it as such.

Visualizer requires mitie but doc doesn't mention

Hello!
I was going through the tutorial and attempted to visualize, but got the following error. After installing MITIE, all was happy. Should we note this in the docs, or am I overlooking something?

$ python -m rasa_nlu.visualize data/demo-rasa.json
192.168.200.1 - - [21/Dec/2016 21:30:12] "GET / HTTP/1.1" 200 -
----------------------------------------
Exception happened during processing of request from ('192.168.200.1', 56752)
Traceback (most recent call last):
  File "/usr/lib/python2.7/SocketServer.py", line 295, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib/python2.7/SocketServer.py", line 321, in process_request
    self.finish_request(request, client_address)
  File "/usr/lib/python2.7/SocketServer.py", line 334, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "build/bdist.linux-x86_64/egg/rasa_nlu/visualize.py", line 14, in <lambda>
  File "build/bdist.linux-x86_64/egg/rasa_nlu/visualize.py", line 27, in __init__
  File "/usr/lib/python2.7/SocketServer.py", line 655, in __init__
    self.handle()
  File "/usr/lib/python2.7/BaseHTTPServer.py", line 340, in handle
    self.handle_one_request()
  File "/usr/lib/python2.7/BaseHTTPServer.py", line 328, in handle_one_request
    method()
  File "build/bdist.linux-x86_64/egg/rasa_nlu/visualize.py", line 37, in do_GET
  File "build/bdist.linux-x86_64/egg/rasa_nlu/training_data.py", line 25, in __init__
    from rasa_nlu.tokenizers.mitie_tokenizer import MITIETokenizer
  File "build/bdist.linux-x86_64/egg/rasa_nlu/tokenizers/mitie_tokenizer.py", line 1, in <module>
ImportError: No module named mitie

"advanced" number entity extraction

Hello,
Is it possible to train Rasa to extract numbers even in "tricky" cases like: 100K -> 100000, 100,000 -> 100000, 100.5K -> 100500, etc.?
The wit/number entity managed to extract these very well.

Thanks, Yevgeny

Rasa.ai added value

Hey,

Rasa.ai looks like a very interesting framework and seems to address the main weak points of the current hosted NLP API solutions.

We've been working for a while with spaCy, which has done a good job for us till now.
Should we consider switching to rasa.ai, and for what reasons? I mean, it seems like your framework uses spaCy a lot: for example, your tokenizer uses spaCy's (or mitie's). So what are the use cases you aimed to simplify/enhance, and how does your framework do it better than the backend libraries it wraps?

MITIE is always required

I tried to use the spacy backend without installing MITIE & scipy. Installing them resolved the issue.

Error message below:

python -m rasa_nlu.train -c config.json
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "build/bdist.linux-x86_64/egg/rasa_nlu/train.py", line 44, in <module>
  File "build/bdist.linux-x86_64/egg/rasa_nlu/train.py", line 36, in do_train
  File "build/bdist.linux-x86_64/egg/rasa_nlu/train.py", line 23, in create_trainer
  File "build/bdist.linux-x86_64/egg/rasa_nlu/trainers/spacy_sklearn_trainer.py", line 5, in <module>
  File "build/bdist.linux-x86_64/egg/rasa_nlu/classifiers/sklearn_intent_classifier.py", line 2, in <module>
ImportError: No module named mitie

Training error

I am doing restaurants tutorial mentioned at http://rasa-nlu.readthedocs.io/en/latest/tutorial.html#section-tutorial

Doing it the spacy_sklearn way

Downloaded spacy and all the models

I had scikit-learn installed, which was 0.18

I had data/demo-rasa.json, so I mentioned it in config.json, and not demo-restaurants.json

When I fired the following command:
python -m rasa_nlu.train -c config.json

I got error:

ImportError: cannot import name check_arrays

Thinking that it's a scikit version issue (ref: http://stackoverflow.com/questions/29596237/import-check-arrays-from-sklearn), I installed 0.17.1, which gave the following error:

ImportError: No module named model_selection

I am stuck now... not sure what to do next. Any hints?

tests for train.py and server.py

these scripts are currently not well covered by unit tests. I think coveralls doesn't report that because they're considered 'outside' the rasa_nlu module

Why supported languages are hardcoded?

Hi all,

first of all, thank you for this awesome work!

I'm almost new to NLP & co in general, so I've started studying as much as possible to get some grasp. Meanwhile, I've started tinkering with rasa_nlu and, after running the default provided examples, I tried to use it with the Italian language, but I can't specify it in the configuration file (or CLI) because the languages are hardcoded (en and de).

I'm aware it needs a total_word_feature_extractor (at least for the MITIE backend); I've generated one from a relatively small Italian corpus, but I can't use it anyway in rasa_nlu.

I've also read somewhere that it's possible to avoid a predefined language model at the cost of very low quality results, but at the point where I am, it's totally acceptable.

So, is it possible (or is it planned) to support more languages and/or work around the hardcoded languages?

Thanks in advance for the replies, keep up the good job!

Wrong language is used by spacy interpreter

Although the language is set to "de" in the described model (metadata.json), the interpreter is still using "en" as the default language.

I have fixed that by overwriting the language_name in SpacySklearnInterpreter if "language" exists in the metadata.

ImportError: No module named mitie

I keep on getting this error when trying to run the train command.

python -m rasa_nlu.train -c config.json
Traceback (most recent call last):
  File "D:\Programs\Python27\lib\runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "D:\Programs\Python27\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "build\bdist.win32\egg\rasa_nlu\train.py", line 65, in <module>
  File "build\bdist.win32\egg\rasa_nlu\train.py", line 54, in do_train
  File "build\bdist.win32\egg\rasa_nlu\train.py", line 28, in create_trainer
  File "build\bdist.win32\egg\rasa_nlu\trainers\mitie_trainer.py", line 1, in <module>
ImportError: No module named mitie

I already have the MITIE models and config.json set up accordingly:

{
  "backend": "mitie",
  "path" : "./",
  "mitie_file" : "D:\\Projects\\mities\\english\\total_word_feature_extractor.dat",
  "data" : "./data/demo-restaurants.json"
}

Is there something that I might be missing?

Apache license problematic for vendorizing code, e.g. visualizer

Hi,

First, congrats on the launch! Glad to see the Python NLP ecosystem getting even stronger. We're rolling out more spaCy models early next year that should be relevant to you guys.

I noticed some administrivia I'd like to raise, though. It's useful to get these things straightened out early...

I see you guys have chosen to go with the Apache license. I think this might be a problem, because it means you'll have to think carefully about the licensing of code that ends up in your repository.

There's already an example of the problem in rasa's entity visualizer, which includes code copy-pasted from our entity visualizer. The relevant commit is here.

If you make Rasa MIT licensed, you'll be able to cut-and-paste code from MIT libraries like this, so long as you attribute it. But if you keep Rasa under the Apache license, you'll only be able to include Apache licensed code in the library.

While this isn't a big deal for the visualizer, you could later get some substantial patches that are actually sourced from an MIT library, without you knowing it. By distributing this under the Apache license, you'll be giving people the impression that they don't need to attribute the code to the original author. You won't be able to fix this problem by simply adding an attribution line — you'll have to delete the code, or change your license.

Matt

update to rasa output format

We should make the output format of parsed sentences identical to the rasa native data format

e.g. parsing "Show me chinese restaurants" currently returns :

{
        "text": "show me chinese restaurants",
        "intent": "inform",
        "entities": { 
            "cuisine" : "chinese"
        }
}

I'm proposing to change that to

{
     "text": "show me chinese restaurants",
     "intent": "inform",
     "entities": [ 
          {
            "start": 8,
            "end": 15,
            "value": "chinese",
            "entity": "cuisine"
          }
     ]
}

which is the same as what we expect as input for training a model. Then we can log exactly these responses while the server is running and make it easy to add them back to the training data to improve models

Example of Using RASA as a Library

In the project's description, it says that

rasa NLU is written in Python, but you can use it from any language through a HTTP API. If your project is written in Python you can simply import the relevant classes.

The documentation gives an example of how one would use RASA as a server, but there are no examples of how one would use RASA as a library directly in Python.

Could you add a basic example of model training and usage from Python?

simple web app for creating training data

I'm sure some users will want some kind of GUI for rasa NLU.

We currently have the visualizer which can render training data. Would be cool to have a simple web app which can create formatted training data files, similarly to the wit.ai / LUIS interfaces.

Just comment here if you would like to work on it, or have suggestions on how it should be done

Unable to detect entities from exported data of api.ai

I imported one agent's data from api.ai to rasa and faced the issues mentioned below in the response:

  1. Not able to detect entities.
  2. It was working fine for exact statements, but for even slightly different ones it was detecting random intents.
  3. Even for gibberish data it was detecting one or another intent. Taking into account that we can't get a confidence score for the detection, it's impossible to know whether we have detected the correct intent or not.

To debug the issue further, if you want the exported data, let me know and I will mail it to you.

Intent classifier without entities

Hi there,
I'm trying to train an intent classifier without entities:

{
  "rasa_nlu_data": {
    "entity_examples": [
     ], 
    "intent_examples": [
      {
        "text": "hey", 
        "intent": "greet"
      }
     ...

And I got the following error:

raise Exception("You can't call train() on an empty trainer.")
Exception: You can't call train() on an empty trainer

Is there a way to use only intents?

Thanks in advance

numpy is always required

the MITIEFeaturizer actually uses numpy, in the create_bow_vecs function.
We should document this.

multitenancy support

This suggestion also comes from @3x14159265 .
The idea is to have rasa NLU provide multiple apps, e.g. have several models loaded into memory and serve requests based on them (routed by the URL).

The simplest approach is to start a separate server for each model, and use a supervisor. But each process will have word vectors loaded into memory, which means you can't fit very many on a server.

A better way would be to have several models loaded within one server, although I think only the spaCy backend would actually be able to share the large memory component between them. It would probably be doable to modify MITIE to support that as well.

To help plan this out, would be really helpful if people wrote their intended deployment setup here, so we can discuss various trade-offs.
