mukeshmk / fm-learn

Federated Meta-Learning: a concept that allows everyone to benefit from the data that is generated through machine learning libraries.

Home Page: https://fmlearn.herokuapp.com/

Languages: Python 23.89%, CSS 53.05%, JavaScript 14.96%, HTML 8.09%
Topics: federated-learning, machine-learning, scikit-learn, federated-meta-learning, meta-learning, fmlearn, fm-learn, algorithm-selection, algorithm-selection-library, recommender-system

fm-learn's Introduction

FMLearn

“Federated Meta-Learning” (FML) is a concept that allows everyone to benefit from the data that is generated through software libraries, including machine learning and data science libraries.

We have built FMLearn, an application developed using the client-server model, to allow the exchange of meta-data about machine learning models for the purpose of meta-learned algorithm selection and configuration.

scikit-learn has been forked and a package has been developed in it to make API calls to FMLearn.

Using FMLearn to identify the best-performing algorithm for a dataset, that is, the one with the lowest mean squared error (MSE), reduces the repetitive effort and time spent rewriting and executing code and correcting possible human errors.

Important Links:

Proposal: Federated Meta-Learning

Publication: Federated Meta-Learning: Democratizing Algorithm Selection Across Disciplines and Software Libraries

GitHub Repo for the modified scikit-learn: mukeshmk/scikit-learn

fm-learn's People

Contributors

dependabot[bot], mukeshmk

fm-learn's Issues

prediction of an algorithm when no model/dataset exists (cold start problem)

Predicting an algorithm when no model or dataset exists (the cold start problem) is a large feature/issue in its own right. For now, the goal is only to prevent it from causing the run-time error below:

Traceback (most recent call last):
  File "/app/.heroku/python/lib/python3.7/site-packages/flask/app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "/app/.heroku/python/lib/python3.7/site-packages/flask/app.py", line 1951, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/app/.heroku/python/lib/python3.7/site-packages/flask/app.py", line 1820, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/app/.heroku/python/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/app/.heroku/python/lib/python3.7/site-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/app/.heroku/python/lib/python3.7/site-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/app/src/api.py", line 102, in predict_fmlearn
    tt_encoder = fml.get_encoders()[utils.TARGET_TYPE]
KeyError: 'Target Type'

Temporary Fix:
Add a fail-safe that checks for an empty database and returns a JSON object indicating that no model has been trained?

Proper Fix:
TODO: think of a way to solve the cold start problem?
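
The temporary fix could look roughly like the sketch below. It assumes the encoders are available as a plain dictionary, as the `fml.get_encoders()[utils.TARGET_TYPE]` lookup in the traceback suggests. The helper name `predict_or_not_trained`, the `run_prediction` callable, the response fields, and the 503 status code are all hypothetical and not taken from the project.

```python
TARGET_TYPE = "Target Type"  # key seen in the KeyError above


def predict_or_not_trained(encoders, run_prediction):
    """Return (body, status) without raising KeyError on an untrained model.

    encoders: dict of fitted encoders; assumed empty when nothing is trained.
    run_prediction: hypothetical callable that takes the target-type encoder.
    """
    tt_encoder = encoders.get(TARGET_TYPE)
    if tt_encoder is None:
        # Empty database or untrained model: report it instead of crashing.
        return {"status": "error", "message": "No model has been trained"}, 503
    return {"status": "ok", "prediction": run_prediction(tt_encoder)}, 200
```

In a Flask view the returned dictionary would typically be passed through `jsonify` together with the status code.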

heroku deployment failed: reason - empty database

Heroku Deployment Failed: reason - empty database.

When the application is deployed for the first time, or started after the database has been cleared, the step that loads the data and trains a model breaks: the loaded data frame is empty, so getting Xy for model training fails with the error below.

Traceback (most recent call last):
  File "app.py", line 24, in <module>
    from src.api import metrics_api
  File "/home/blake/code/fm-learn/src/api.py", line 22, in <module>
    fml.load_data()
  File "/home/blake/code/fm-learn/src/fmlearn.py", line 60, in load_data
    self._X, self._y = utils.get_Xy(self._df)
  File "/home/blake/code/fm-learn/src/utils/utils.py", line 60, in get_Xy
    y = df[[DATASET_HASH]]
  File "/home/blake/code/fm-learn/venv/lib/python3.7/site-packages/pandas/core/frame.py", line 2806, in __getitem__
    indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
  File "/home/blake/code/fm-learn/venv/lib/python3.7/site-packages/pandas/core/indexing.py", line 1553, in _get_listlike_indexer
    keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
  File "/home/blake/code/fm-learn/venv/lib/python3.7/site-packages/pandas/core/indexing.py", line 1640, in _validate_read_indexer
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index(['Dataset Hash'], dtype='object')] are in the [columns]"

Possible Fix:
Skip training the model if the data frame is empty, and train later once data is available?

Other Issues to be considered:

  • #18 prediction of an algorithm when no model/dataset exists (cold start problem)
  • #19 when to load the data and train model when said scenario occurs?
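
A minimal sketch of the possible fix, assuming the loaded data is a pandas-style object with `.empty` and `.columns`. The function name `can_train` is hypothetical; only the column name `Dataset Hash` comes from the traceback.

```python
DATASET_HASH = "Dataset Hash"  # column named in the KeyError above


def can_train(df):
    """Return True only if df has rows and the expected label column.

    Works with any object exposing pandas-style `.empty` and `.columns`,
    so an empty frame loaded from a fresh database returns False.
    """
    if df is None or df.empty:
        return False
    return DATASET_HASH in df.columns
```

A caller such as `load_data()` could check `can_train(self._df)` before calling `utils.get_Xy` and simply leave the model untrained when it returns `False`.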

force retrain of model if the model is older than a set time frame

Complete the following TODO in the predict() function:

# TODO: force retrain of model if the model is older than a set time frame?

Ideally, the model is trained once with the existing data (possibly at server start?) and then retrained periodically, each time a set time frame has passed, by reloading the data from the DB.

Since there is no plan to use background processes to trigger automatic retraining, an alternative is to check, whenever predict() is called via the API, whether the set time has passed, and to retrain the model first if it has.

This is a possible workaround for the lack of a background process, though not an ideal one.
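
The workaround described above can be sketched as follows. The names `RetrainGate` and `predict_with_retrain` are hypothetical, as are the `retrain` and `predict` callables; the clock is injectable so the logic can be tested without waiting.

```python
import time


class RetrainGate:
    """Tracks when the model was last trained and whether it is stale."""

    def __init__(self, max_age_seconds, clock=time.monotonic):
        self.max_age_seconds = max_age_seconds
        self._clock = clock
        self._trained_at = None  # None means "never trained"

    def is_stale(self):
        if self._trained_at is None:
            return True
        return self._clock() - self._trained_at >= self.max_age_seconds

    def mark_trained(self):
        self._trained_at = self._clock()


def predict_with_retrain(gate, retrain, predict, features):
    """Retrain first if the model is too old, then predict."""
    if gate.is_stale():
        retrain()  # e.g. reload data from the DB and fit again
        gate.mark_trained()
    return predict(features)
```

One consequence of this design is that the request that happens to trigger the retrain pays the full training cost, which is why the issue calls it a workaround rather than an ideal solution.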

check the shape of the input dataframe

Complete the following TODO in the predict() function.

Check the shape of the input data frame against the shape of the data frame used to train the model, possibly stored in class variables?

# TODO: check if the shape of the input df matches that used to train the model.

NOTE: Decide what should happen if the data is reloaded.
#12 might have to be fixed before this.
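
A minimal sketch of such a check, assuming both shapes are `(rows, columns)` tuples such as a DataFrame's `.shape`. The function name `check_input_shape` is hypothetical, and comparing only the column count is a design assumption, since a prediction request will usually have far fewer rows than the training data.

```python
def check_input_shape(input_shape, trained_shape):
    """Raise ValueError if the column counts differ.

    Both arguments are (rows, columns) tuples, e.g. a DataFrame's `.shape`.
    Only the column count is compared; row counts may legitimately differ.
    """
    if len(input_shape) != 2 or len(trained_shape) != 2:
        raise ValueError("expected 2-D shapes of the form (rows, columns)")
    if input_shape[1] != trained_shape[1]:
        raise ValueError(
            f"input has {input_shape[1]} columns, "
            f"but the model was trained on {trained_shape[1]}"
        )
```

If the data is reloaded and the model retrained, the stored training shape would need to be updated at the same time so the check compares against the current model.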

Change the currently used key-based SHA256 to a one-way hash function

Change the currently used key-based SHA256 to a one-way hash function.
Remove the encryption and decryption process from this project; the implementation will then only handle matches based on hash values.
v1 of the project will not make it possible to retrieve the original data from the hash value.

Make sure to update sklearn based on these comments as well.

Based on comments from Joeran Beel, after meeting on 02-04-2020.
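
As a rough illustration, assuming the intended replacement is a plain, unkeyed SHA-256 digest, matching on hash values alone could look like the sketch below. The function names are hypothetical, and how the project serializes a dataset to bytes is not specified here.

```python
import hashlib


def dataset_hash(data: bytes) -> str:
    """Unkeyed SHA-256 digest; the input cannot be recovered from it."""
    return hashlib.sha256(data).hexdigest()


def same_dataset(hash_a: str, hash_b: str) -> bool:
    """Two datasets are treated as a match when their digests are equal."""
    return hash_a == hash_b
```

Because no key is involved, any client hashing the same bytes gets the same digest, which is what allows the server to match records without ever seeing the underlying data.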

when to load the data and train model for the first time

when to load the data and train model for the first time?

This scenario was discovered with #17 and should be partially solved by #18, though only with the proposed temporary fix. Should the first training of the model occur once the database initially holds about MAX_NEW_RECORDS records?

In the future, this number could be made configurable.
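
One way this threshold could be made configurable is sketched below. It assumes an environment variable named `MAX_NEW_RECORDS`; the default of 50 and both function names are hypothetical, since the issue does not state a value.

```python
import os

DEFAULT_MAX_NEW_RECORDS = 50  # hypothetical default; not stated in the issue


def first_training_threshold(env=os.environ):
    """Read the threshold from the environment, falling back to the default."""
    return int(env.get("MAX_NEW_RECORDS", DEFAULT_MAX_NEW_RECORDS))


def ready_for_first_training(record_count, threshold):
    """True once the database holds at least `threshold` records."""
    return record_count >= threshold
```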
