Comments (9)
Hey @tmbo, thanks for the insights!
Regarding "I'd like to keep that and rather separate model persistence and data loading": in this case, I think we could introduce something like an AbstractDataLoader, which gets implemented for loading training data from the file system or MongoDB, and an AbstractPersistor for persistence, which will include AWS, GCS and MongoDB (no change to the current implementation except introducing a MongoDB class).
from rasa.
@amn41 So basically, we can extend that abstract class for MongoDB and store the training JSON (for example in wit or api format) inside a MongoDB collection, right?
At this point, do you think it would be OK to trigger training through the POST /parse endpoint that we have? There we could specify a connector, host, username, password and perhaps a collection...
The do_POST method would then call the appropriate DB connector through the Data Router.
So, for example, we could provide POST /parse with:
- the training data:
  {
    "provider": "training_data",
    "data": { ....... }
  }
- a DB connector and collection:
  {
    "provider": "mongoDB",
    "config": {
      "host": "....",
      "user": "....",
      "pass": "....",
      "collection": "...."
    }
  }
- or a connector registered, along with its configuration, in config.json, so that you just refer to a specific connector/collection to trigger the training process:
  {
    "provider": "mongoDB",
    "collection": "...."
  }
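A minimal sketch of how the handler could interpret the three request bodies above, assuming the body has already been decoded from JSON. resolve_training_source and the "registered" dictionary (standing in for connectors declared in config.json) are hypothetical names, not part of Rasa.

```python
from typing import Any, Dict, Tuple

# Hypothetical registry of connectors declared in config.json (third option above).
RegisteredConnectors = Dict[str, Dict[str, Any]]


def resolve_training_source(
    payload: Dict[str, Any], registered: RegisteredConnectors
) -> Tuple[str, Dict[str, Any]]:
    """Decide where the training data should come from, based on the request body."""
    provider = payload.get("provider")
    if provider is None:
        raise ValueError("request body must contain a 'provider' field")

    # Option 1: the training data is sent inline with the request.
    if provider == "training_data":
        return "inline", {"data": payload["data"]}

    # Option 2: the full connector configuration is sent with the request.
    if "config" in payload:
        return provider, dict(payload["config"])

    # Option 3: refer to a connector registered in config.json and only
    # choose the collection to read from (copy so the registry is not mutated).
    if provider not in registered:
        raise ValueError(f"unknown provider: {provider!r}")
    config = dict(registered[provider])
    if "collection" in payload:
        config["collection"] = payload["collection"]
    return provider, config
```

The returned (source, config) pair is what do_POST would hand to the matching connector; keeping the decision in one pure function makes it easy to test without a running database.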
Closing for now. If this becomes relevant again, we can reopen it.
Hi guys! 😃
I had a chat with @tmbo regarding this and went through the source.
The Rasa-specific I/O operations I noticed are:
- Training phase:
  1. Opening the config file: config.json
  2. Loading the training data: data.json
  3. Saving the model:
       ner
        |——config.json
        |——model
       intent_classifier.pkl
       metadata.json
       training_data.json
- Testing phase:
  4. Opening the config file
  5. Loading the model
  6. Logging
Considering 2, 3 and 5, I would like to suggest (and implement) the following solution:
- Introduce an abstract DataAdapter:
    from abc import ABC, abstractmethod

    class DataAdapter(ABC):
        @abstractmethod
        def read_training_data(self):
            pass

        @abstractmethod
        def save_model(self):
            pass

        @abstractmethod
        def read_model(self):
            pass
- DataAdapter gets implemented in several ways, e.g. MongoAdapter, FileAdapter and S3Adapter.
- Such an instance gets passed to the Trainer, MetaData, etc.
- When data is required, call read_training_data(); the same goes for the other operations.
- Leave room for others to implement DataAdapter when needed (e.g. a DynamoAdapter for DynamoDB).
- Delete persister.py, as S3Adapter and FileAdapter do its work. Some of the code in data_router.py may be removed as well.
- Change config.json, e.g.:
    { ... "training_data": { "source": "file_system", "path": "./data/training_data.json" } ... }
  or
    { ... "training_data": { "source": "mongo_db", "host": "", ..., "collection": "" } ... }
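A minimal, runnable sketch of the proposal, assuming the training data and the model metadata are plain JSON files. FileAdapter's constructor, adapter_from_config and the "model_dir" config key are hypothetical names used for illustration only; a real model would also include pickled components rather than a single JSON file.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Dict


class DataAdapter(ABC):
    """One object responsible for I/O steps 2, 3 and 5 above."""

    @abstractmethod
    def read_training_data(self) -> Dict[str, Any]: ...

    @abstractmethod
    def save_model(self, model: Dict[str, Any]) -> None: ...

    @abstractmethod
    def read_model(self) -> Dict[str, Any]: ...


class FileAdapter(DataAdapter):
    """File-system implementation; the 'model' is stored as JSON for illustration."""

    def __init__(self, data_path: str, model_dir: str) -> None:
        self.data_path = Path(data_path)
        self.model_file = Path(model_dir) / "metadata.json"

    def read_training_data(self) -> Dict[str, Any]:
        return json.loads(self.data_path.read_text(encoding="utf-8"))

    def save_model(self, model: Dict[str, Any]) -> None:
        self.model_file.parent.mkdir(parents=True, exist_ok=True)
        self.model_file.write_text(json.dumps(model), encoding="utf-8")

    def read_model(self) -> Dict[str, Any]:
        return json.loads(self.model_file.read_text(encoding="utf-8"))


def adapter_from_config(config: Dict[str, Any]) -> DataAdapter:
    """Choose an adapter from the proposed "training_data" section of config.json."""
    section = config["training_data"]
    if section["source"] == "file_system":
        # "model_dir" is a hypothetical key for where models are written.
        return FileAdapter(section["path"], config.get("model_dir", "./models"))
    # "mongo_db", "s3", ... adapters would be dispatched here once implemented.
    raise ValueError(f"unsupported training data source: {section['source']!r}")
```

A MongoAdapter or S3Adapter would subclass DataAdapter the same way, and only adapter_from_config would need to know about the new "source" value.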
What do you think?
@dinal24 Thanks for the nice writeup. Some thoughts:
- it should be possible to support "mixed" data adapters, e.g. reading the data from mongo but writing the models onto disk (I think writing the models into mongo is a rare use case anyway).
- not sure yet about the configuration format, but that is a different topic we are currently thinking about (e.g. the nesting is tough if arguments should be passed in via the command line or environment variables), so anything will be fine here for the time being.
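To illustrate the "mixed" case, here is a minimal sketch in which reading training data and storing models are two independent objects combined by a thin adapter. All class names are hypothetical, not existing Rasa classes, and the MongoDB side is replaced by an in-memory stand-in so the example runs without a database.

```python
import json
from pathlib import Path
from typing import Any, Dict, Protocol


class TrainingDataReader(Protocol):
    """Anything that can produce training data (file, MongoDB, HTTP, ...)."""

    def read_training_data(self) -> Dict[str, Any]: ...


class ModelStore(Protocol):
    """Anything that can persist and reload a trained model."""

    def save_model(self, model: Dict[str, Any]) -> None: ...

    def read_model(self) -> Dict[str, Any]: ...


class InMemoryReader:
    """Hypothetical stand-in for a MongoDB reader: serves pre-loaded documents."""

    def __init__(self, documents: Dict[str, Any]) -> None:
        self.documents = documents

    def read_training_data(self) -> Dict[str, Any]:
        return self.documents


class DiskModelStore:
    """Writes the model to a local directory (stored as JSON for illustration)."""

    def __init__(self, model_dir: str) -> None:
        self.model_file = Path(model_dir) / "metadata.json"

    def save_model(self, model: Dict[str, Any]) -> None:
        self.model_file.parent.mkdir(parents=True, exist_ok=True)
        self.model_file.write_text(json.dumps(model), encoding="utf-8")

    def read_model(self) -> Dict[str, Any]:
        return json.loads(self.model_file.read_text(encoding="utf-8"))


class MixedAdapter:
    """Pairs any reader with any model store, e.g. read from mongo, write to disk."""

    def __init__(self, reader: TrainingDataReader, store: ModelStore) -> None:
        self.reader = reader
        self.store = store

    def read_training_data(self) -> Dict[str, Any]:
        return self.reader.read_training_data()

    def save_model(self, model: Dict[str, Any]) -> None:
        self.store.save_model(model)

    def read_model(self) -> Dict[str, Any]:
        return self.store.read_model()
```

Because each side only has to satisfy a small Protocol, a new source (or a new model store) can be added without touching the other half.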
Hey @tmbo, thanks for your thoughts. Here is an altered solution.
We keep the abstract DataAdapter and implement a default class such as DefaultDataAdapter, which includes all the functionality of the latest Rasa (S3 and file) plus MongoDB.
class DefaultDataAdapter(DataAdapter):
    def __init__(self, train_conf, s_model_conf, r_model_conf):
        # deduces the following from the input params, e.g.:
        # self.train_type = 'mongodb'
        # self.train_param = 'mongodb://localhost:27020/mydb'
        # self.save_model_type = 'file'
        # self.save_model_param = './models'
        pass

    def read_training_data(self):
        if self.train_type == 'mongodb':
            # load from MongoDB
            pass
        elif self.train_type == 'file':
            # load from the file system
            pass

    def save_model(self):
        # similarly, dispatch on self.save_model_type
        pass

    def read_model(self):
        # similarly
        pass
We can keep the configuration as it is: extract the information from config.json, create a DataAdapter instance and pass it wherever necessary. MongoDB settings can also be supplied via the terminal as a standard MongoDB URI string.
Anyone who needs a different combination can easily implement DataAdapter and use it.
I believe logs could also be stored in MongoDB instead of the file system, if required. That would need another method such as write_log(), which can be implemented in the future.
I hope this addresses your concerns. 😀
Alright, here are my thoughts. I'd rather keep it as simple as possible for now; we can always add more abstractions later if we feel it needs more structure. From your previous idea, I really liked that every data source was implemented in its own class. I'd like to keep that and rather separate model persistence and data loading (that doesn't mean an implementation like mongo can't use a helper class to share implementation details between the two).
That said, I'd love it if you could introduce an interface to load data (the interface to persist models already exists with two implementations) and integrate that. I think this is a good start to get you coding. Don't hesitate to share your PR early so we can continue to exchange ideas, you can still change it afterwards.
There is now the possibility to fetch training data from an HTTP endpoint, which should be fine for most use cases.
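For reference, fetching training data over HTTP needs little more than the standard library. This is a generic sketch, not Rasa's actual implementation: fetch_training_data is a hypothetical name, and the endpoint is assumed to return the training data as JSON.

```python
import json
from typing import Any, Dict
from urllib.request import Request, urlopen


def fetch_training_data(url: str, timeout: float = 10.0) -> Dict[str, Any]:
    """Download training data (JSON) from a URL and return it as a dict."""
    request = Request(url, headers={"Accept": "application/json"})
    # urlopen raises urllib.error.HTTPError / URLError on failure; we let them propagate.
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Since urllib also understands file:// URLs, the same function can read a local JSON file, which is convenient for testing.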
Where can we find these updates, and instructions on how to use them, in the documentation?