
Comments (9)

dinal24 commented on May 21, 2024

Hey @tmbo Thanks for the insights!

I'd like to keep that and rather separate model persistence and data loading

I think in this case we do something like introducing an AbstractDataLoader, which gets implemented for loading training data from the file system or MongoDB. An AbstractPersistor for persistence will cover AWS, GCS, and MongoDB (no change to the current implementation except introducing a MongoDB class).
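The split between loading and persistence described above can be sketched as follows (the class and method names here are hypothetical illustrations, not the actual rasa API):

```python
import json
from abc import ABC, abstractmethod

class AbstractDataLoader(ABC):
    """Loads training data from some backend (file system, MongoDB, ...)."""

    @abstractmethod
    def load_training_data(self):
        ...

class FileDataLoader(AbstractDataLoader):
    """Hypothetical file-system implementation of the loader interface."""

    def __init__(self, path):
        self.path = path

    def load_training_data(self):
        # read the training JSON from disk
        with open(self.path) as f:
            return json.load(f)
```

A MongoDB implementation would subclass the same interface, so the trainer never needs to know where the data came from.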

from rasa.

plauto commented on May 21, 2024

@amn41 So basically, we can extend that abstract class for MongoDB and store the training JSON (for example for wit or api) inside a MongoDB collection, right?

At this point, do you think it would be OK to perform the training using the POST /parse endpoint that we have? Here we could maybe specify a connector, host, username, password, and maybe a collection...
The do_POST method would then call the appropriate DB connector through the Data Router.

So for example we could think of providing POST /parse with:

  • the training data
{
  "provider": "training_data",
  "data": { ... }
}
  • a db connector and collection
{
  "provider": "mongoDB",
  "config": { 
      "host": "....", 
      "user": "....",
      "pass": "....",
      "collection": "...."
  }
}
  • or you can register a connector along with its configuration in config.json and you just refer to a specific connector/collection to trigger the training process:
{
  "provider": "mongoDB",
  "collection": "...."
}
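A minimal sketch of how the server side could dispatch on the three payload variants above (the payload contents and the `resolve_training_source` helper are hypothetical, not an existing rasa function):

```python
# Hypothetical payloads mirroring the three variants proposed above.
inline_payload = {"provider": "training_data", "data": {"common_examples": []}}
connector_payload = {
    "provider": "mongoDB",
    "config": {"host": "localhost", "user": "u", "pass": "p", "collection": "nlu"},
}
registered_payload = {"provider": "mongoDB", "collection": "nlu"}

def resolve_training_source(payload):
    """Dispatch on the 'provider' field, as the proposal sketches."""
    if payload["provider"] == "training_data":
        # training data shipped inline with the request
        return ("inline", payload["data"])
    if payload["provider"] == "mongoDB":
        if "config" in payload:
            # full connection details come with the request
            return ("mongo", payload["config"])
        # connector was pre-registered in config.json; only the
        # collection name is needed to trigger training
        return ("mongo-registered", payload["collection"])
    raise ValueError("unknown provider: " + payload["provider"])
```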


amn41 commented on May 21, 2024

Closing for now. If this becomes relevant again, we can reopen.


dinal24 commented on May 21, 2024

Hi guys! 😃

I had a chat with @tmbo regarding this and went through the source.

The Rasa-specific I/O operations I noticed are:

  • Training phase:
  1. Opening config file :

    config.json
    
  2. Loading training data :

     data.json
    
  3. Saving model :

     ner
        |——config.json
        |——model
     intent_classifier.pkl
     metadata.json
     training_data.json
    
  • Testing phase:
  1. Opening config file
  2. Loading model
  3. Logging

Considering loading training data, saving models, and loading models:

I would like to suggest and implement a solution as below:

  1. Introduce an abstract DataAdapter:

     from abc import ABC, abstractmethod

     class DataAdapter(ABC):
        @abstractmethod
        def read_training_data(self):
            pass
    
        @abstractmethod
        def save_model(self):
            pass
    
        @abstractmethod
        def read_model(self):
            pass
    
  2. DataAdapter gets implemented in several ways, e.g. MongoAdapter, FileAdapter, and S3Adapter.

  3. An adapter instance gets passed around to the Trainer, MetaData, etc.

  4. When data is required, call read_training_data(); similarly for the other operations.

  5. Leave the room for others to implement DataAdapter when needed. (e.g. DynamoAdapter for dynamodb)

  6. Delete persister.py, as S3Adapter and FileAdapter do that work. We may also remove some of the code in data_router.py.

  7. Change config.json, e.g:

     {
        ...
        "training_data": { "source": "file_system", "path": "./data/training_data.json" }
        ...
     }
    

    or

     {
        ...
        "training_data": { "source": "mongo_db", "host": "", ..., "collection": "" }
        ...
     }
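Wiring the config change in item 7 to the adapters in item 2 could look like this (a sketch under the proposed config layout; the adapter classes and factory are hypothetical):

```python
class FileAdapter:
    """Hypothetical file-system DataAdapter implementation."""
    def __init__(self, path):
        self.path = path

class MongoAdapter:
    """Hypothetical MongoDB DataAdapter implementation."""
    def __init__(self, host, collection):
        self.host = host
        self.collection = collection

def adapter_from_config(config):
    """Pick a DataAdapter implementation based on the 'source' field
    of the proposed training_data config block."""
    td = config["training_data"]
    if td["source"] == "file_system":
        return FileAdapter(td["path"])
    if td["source"] == "mongo_db":
        return MongoAdapter(td["host"], td["collection"])
    raise ValueError("unknown source: " + td["source"])
```

This keeps step 5 open as well: supporting a new backend only means adding a class and one branch (or a registry lookup) to the factory.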
    

What do you think?


tmbo commented on May 21, 2024

@dinal24 Thanks for the nice writeup. Some thoughts:

  • it should be possible to support "mixed" data adapters, e.g. reading the data from mongo but writing the models onto disk (I think writing the models into mongo is a rare use case anyway).
  • not sure yet about the configuration format, but that is a different topic we are currently thinking about (e.g. the nesting is tough if arguments should be passed in via command line or environment), so anything will be fine here for the time being.
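The "mixed" adapter idea in the first bullet can be sketched as a small composite that delegates to two independent backends (class names are hypothetical):

```python
class MixedDataAdapter:
    """Hypothetical composite adapter: delegates data loading to one
    backend and model persistence to another, so e.g. training data can
    come from MongoDB while models are written to disk."""

    def __init__(self, loader, persistor):
        self.loader = loader        # anything with read_training_data()
        self.persistor = persistor  # anything with save_model()

    def read_training_data(self):
        return self.loader.read_training_data()

    def save_model(self, model):
        return self.persistor.save_model(model)
```

Because loading and persistence are separate objects, any combination of backends works without writing a new adapter class per pair.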


dinal24 commented on May 21, 2024

Hey @tmbo, thanks for your thoughts. I've thought of an altered solution.

We keep the abstract DataAdapter and implement a default class like DefaultDataAdapter. It will include all the functionality of the latest rasa (S3 & file) plus Mongo.

      class DefaultDataAdapter(DataAdapter):
          def __init__(self, train_conf, s_model_conf, r_model_conf):
              # deduce the backends from the input params, e.g.:
              # self.train_type = 'mongodb'
              # self.train_param = 'mongodb://localhost:27020/mydb'
              # self.save_model_type = 'file'
              # self.save_model_param = './models'
              pass
  
          def read_training_data(self):
              if self.train_type == 'mongodb':
                  # load from mongo
                  pass
              elif self.train_type == 'file':
                  # load from fs
                  pass
  
          def save_model(self):
              # similarly
              pass
  
          def read_model(self):
              # similarly
              pass

We can keep the configuration the same: extract the information from config.json, create a DataAdapter instance, and pass it around as necessary. The MongoDB preference can also be supplied on the terminal as a standard Mongo URI string.
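Splitting such a Mongo URI into the pieces DefaultDataAdapter needs can be done with the standard library (a sketch; a real implementation would likely use pymongo's own URI parser, which also handles replica sets and options):

```python
from urllib.parse import urlsplit

def parse_mongo_uri(uri):
    """Split a standard MongoDB URI, e.g. 'mongodb://localhost:27020/mydb',
    into host, port, and database name."""
    parts = urlsplit(uri)
    return {
        "host": parts.hostname,          # 'localhost'
        "port": parts.port,              # 27020
        "db": parts.path.lstrip("/"),    # 'mydb'
    }
```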

If someone needs a different combination, they can easily implement DataAdapter themselves and use it.

I believe logs can also be stored in Mongo instead of the file system (if required). That would need another function like write_log(); the implementation can be done in the future.

I hope this addresses your concerns. 😀


tmbo commented on May 21, 2024

Alright, here are my thoughts. I'd rather keep it as simple as possible for now; we can always add more abstractions later if we feel it needs more structure. From your previous idea I really liked that every data source had its own class it was implemented in. I'd like to keep that and rather separate model persistence from data loading (that doesn't mean an implementation like Mongo can't use a helper class to share implementation details between the two).

That said, I'd love it if you could introduce an interface to load data (the interface to persist models already exists with two implementations) and integrate that. I think this is a good start to get you coding. Don't hesitate to share your PR early so we can continue to exchange ideas, you can still change it afterwards.


tmbo commented on May 21, 2024

There is now the possibility to fetch training data from an HTTP endpoint, which should be fine for most use cases.
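Fetching training data over HTTP can be as simple as the following sketch (the function name is illustrative; the actual rasa option name and payload format may differ):

```python
import json
from urllib.request import urlopen

def load_training_data_from_url(url):
    """Fetch training data as JSON from an HTTP endpoint."""
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))
```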


anushka17agarwal commented on May 21, 2024

Can we find these updates and how to use them in the documentation?

