Comments (9)
Hey Maher,
No, you don't need to use RTD, you can simply build the docs locally. If you haven't already, install all the packages for development with
`pip install -r requirements_dev.txt`
This will install Sphinx (the documentation generator), the RTD theme, and some Sphinx extensions needed in the doc-building process.
Then you should be able to build the docs locally by running
`cd doc`
`make html`
You can check the results in `doc/build/html`.
I'll add all this in the CONTRIBUTING.md file!
from surprise.
@NicolasHug,
I want to add documentation for the new feature. Before the pull request, I want to test it on readthedocs to see if it looks good. I read a little, and it seems that I have to create an account and publish the docs from my forked repository to do so.
Can you confirm that this is the way to go, or is there another alternative that I am not aware of?
Thanks,
Maher
Hello @NicolasHug,
I have finished the implementation and basic documentation of the feature, and pushed the changes to my repo here. When you can, please take a look at the docs to confirm that the functionality is as you expected.
In addition, I want to write a user-guide page, but I need your opinion on where to add it.
Should it be added under the "Advanced Usage" section, should I create a separate section, or should I add a Jupyter notebook?
Happy new year btw :)
Maher
This is awesome!
For the doc, you can put it as the first subsection of "Advanced Usage", right before "Manually iterate over folds". As this is a major feature, you can also add a quick example in the README under the "example" section (if you don't, no problem, I'll do it ^^).
I haven't thoroughly tested the code for now (I'll come back to you, probably on Monday), so these are just a few remarks after taking a quick glance at it:
- In the test file, could we set the number of epochs and number of factors to lower values so that the tests run faster?
- We can make a direct reference to the GridSearchCV of scikit-learn in the doc, so that people know what this is about.
- Try to be as pythonic as possible. For example, using `enumerate` in a for loop would be better than counting the number of iterations (`combination_counter += 1`). These two talks (talk1, talk2) by Raymond Hettinger have been of great help to me.
- I'd rather not use `u'strings'` here; is it possible to avoid them?
- Could `cv_results_` be a `defaultdict(list)`? I think this would avoid the `if key in self.cv_results_.keys(): ...`
- I never used attributes that end (or start) with underscores, so maybe rename `best_params_` and the likes to `best_params`.
- Is this true: "Depending on the nature of the `data` parameter, it may or may not perform cross-validation."?
- As `best_params_` and co are `defaultdict`s, I don't think the tests in `test_measure_is_not_case_sensitive` do what you think they do (no exception will ever be raised).
- Maybe add a little bit of (meaningful) comments to the code?
- Don't forget to be PEP8 compliant! I see many lines exceeding 80 characters; flake8 is your friend.
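The `enumerate` and `defaultdict(list)` remarks above can be sketched together in a few lines. This is an illustrative toy, not surprise's actual code; all names here (`param_combinations`, `cv_results`) are made up:

```python
from collections import defaultdict

# enumerate() replaces a manual combination_counter, and defaultdict(list)
# removes the need for an "if key in cv_results" check before appending.
param_combinations = [{'n_epochs': 5}, {'n_epochs': 10}]
cv_results = defaultdict(list)

for i, params in enumerate(param_combinations, start=1):
    # i counts 1, 2, ... without any explicit counter variable
    cv_results['params'].append(params)
    cv_results['combination'].append('{}/{}'.format(i, len(param_combinations)))

print(cv_results['combination'])  # ['1/2', '2/2']
```

Accessing a missing key on the `defaultdict` simply yields an empty list, which is exactly what makes the membership check unnecessary.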
Also, please create a pull request so that we can review the changes there. Your upcoming commits will be automatically added to the pull request discussion.
Thank you so much for putting time into it, this will be a very attractive feature!
Happy new year :)!
Nicolas
Hey, coming back to you on this :)
It all seems to work very well; I only have a few more remarks:
- Maybe the verbose message could be changed to something like "Computing MAE and RMSE for parameters combination 1/8"?
- Could we use an `OrderedDict` for the parameters? Or something that would keep the parameters list in the same order as they were given?
- Could we use the `evaluate()` function of the `evaluate` module instead of computing accuracies manually? To avoid a name clash you would need something like `performances = globals()['evaluate'](algo_instance, data)`
- I think something like `measures = [measure.upper() for measure in measures]` at the beginning of the function would save a lot of `measures.upper()` calls later on
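The name-clash workaround can be sketched like this. The class and function below are minimal stand-ins for illustration, not surprise's real `GridSearch` and `evaluate`:

```python
def evaluate(algo, data, measures):
    # Stand-in for the module-level evaluate(): returns dummy performances.
    return {m: 0.0 for m in measures}

class GridSearch:
    def evaluate(self, data, measures):
        # Normalize once at the top instead of calling .upper() everywhere.
        measures = [measure.upper() for measure in measures]
        # Inside this method, the bare name 'evaluate' would resolve to the
        # method itself; globals() fetches the module-level function instead.
        return globals()['evaluate'](self, data, measures)

print(GridSearch().evaluate(None, ['rmse', 'mae']))  # {'RMSE': 0.0, 'MAE': 0.0}
```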
And I think this is pretty much it :)
Nicolas
Hello Nicolas,
I have created the pull request #7. I have also added a user guide at the beginning of the advanced usage section.
Please find my comments on each of the points above:
- In the test file, could we set the number of epochs and number of factors to lower values so that the tests run faster? --> I have reduced the iterations in most functions. They could be reduced further, but that requires changing the assertions.
- We can make a direct reference to the GridSearchCV of scikit-learn in the doc, so that people know what this is about. --> Done
- Try to be as pythonic as possible. For example, using `enumerate` in a for loop would be better than counting the number of iterations (`combination_counter += 1`). These two talks (talk1, talk2) by Raymond Hettinger have been of great help to me. --> Done, thanks to the talks :)
- I'd rather not use `u'strings'` here; is it possible to avoid them? --> I didn't get what you meant by this point. Can you please clarify it for me?
- Could `cv_results_` be a `defaultdict(list)`? I think this would avoid the `if key in self.cv_results_.keys(): ...` --> Done in a pythonic way. Awesome talks again
- I never used attributes that end (or start) with underscores, so maybe rename `best_params_` and the likes to `best_params`. --> I don't know why either. I kept the GridSearchCV convention. Let me know if you confirm renaming them.
- Is this true: "Depending on the nature of the `data` parameter, it may or may not perform cross-validation."? --> Since I am using the `evaluate` method you implemented and the data is used as an input, whether cross-validation is performed depends on whether the data was divided into folds. Am I missing something?
- As `best_params_` and co are `defaultdict`s, I don't think the tests in `test_measure_is_not_case_sensitive` do what you think they do (no exception will ever be raised). --> Corrected
- Maybe add a little bit of (meaningful) comments to the code? --> Done. Let me know if it's enough.
- Don't forget to be PEP8 compliant! I see many lines exceeding 80 characters; flake8 is your friend. --> All lines should be less than 80 chars now.
- Maybe the verbose message could be changed to something like "Computing MAE and RMSE for parameters combination 1/8"? --> I changed the verbose message a little. However, I didn't add the accuracy names such as MAE and RMSE, since I am using the `evaluate` method, which adds them by default. Note that the docs of `evaluate` say that with verbosity 0 it shouldn't print anything, but it actually prints the accuracy measures.
- Could we use an `OrderedDict` for the parameters? Or something that would keep the parameters list in the same order as they were given? --> The moment a dict is defined, its order is lost, so to my knowledge there is no way to save the order of dict keys. A solution would be changing the input of the function to an ordered list, but I think it is easier and more convenient to keep it a normal dict.
- Could we use the `evaluate()` function of the `evaluate` module instead of computing accuracies manually? To avoid a name clash you would need something like `performances = globals()['evaluate'](algo_instance, data)` --> This is my bad; I should have used it from the start instead of doing it manually. Now that it's done, the code is much cleaner.
- I think something like `measures = [measure.upper() for measure in measures]` at the beginning of the function would save a lot of `measures.upper()` calls later on. --> Done
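On the ordering point: `collections.OrderedDict` does keep keys in insertion order (and since Python 3.7, plain dicts do too, though that guarantee did not exist when this thread was written). A minimal sketch, with an illustrative `param_grid`:

```python
from collections import OrderedDict

# An OrderedDict preserves the order in which parameters were given,
# so iterating over the grid visits keys exactly as the caller wrote them.
param_grid = OrderedDict([('n_epochs', [5, 10]),
                          ('lr_all', [0.002, 0.005]),
                          ('reg_all', [0.4, 0.6])])

print(list(param_grid))  # ['n_epochs', 'lr_all', 'reg_all']
```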
Notes:
- My background is Java, where good practice is to write a lot of small functions and unit tests; that is not so much the case in Python. Any enhancements you add to the code will be of great benefit for me.
- I have created a duplicate case-insensitive dict because the one you already had overrides the `__str__` method. This is not good for `best_params_` and co, because we are no longer able to print them. Nevertheless, I think it is better to use inheritance than to duplicate the code, but I didn't want to mess with the code you've written.
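The inheritance approach Maher alludes to could look like this. `CaseInsensitiveDict` below is a minimal stand-in for the existing helper (the real one differs); the point is that a subclass can keep the case-insensitive behaviour while restoring normal printing, instead of duplicating the code:

```python
class CaseInsensitiveDict(dict):
    """Stand-in for the existing helper: keys are lower-cased, and imagine
    __str__ is overridden in a way that makes printing unhelpful."""
    def __setitem__(self, key, value):
        dict.__setitem__(self, key.lower(), value)
    def __getitem__(self, key):
        return dict.__getitem__(self, key.lower())
    def __str__(self):
        return '<unprintable>'

class PrintableCaseInsensitiveDict(CaseInsensitiveDict):
    """Inherit the case-insensitive lookups, restore dict's normal printing."""
    def __str__(self):
        return dict.__repr__(self)

d = PrintableCaseInsensitiveDict()
d['RMSE'] = 1.0
print(str(d))      # {'rmse': 1.0}
print(d['rmse'])   # 1.0 -- lookups stay case-insensitive
```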
Looking forward to the merge ;)
Maher
Awesome!
I put my remarks on the pull request.
Closing this as it has been solved by 714be0b
param_grid = {'n_epochs': [5, 10], 'lr_all': [0.002, 0.005],
              'reg_all': [0.4, 0.6]}
# Cross-validation using grid search
grid_search = GridSearchCV(SVD, param_grid, measures=['rmse', 'mae'], cv=3, n_jobs=1)
# Find the best parameters on the data set
data = Dataset.load_builtin('ml-100k')

After running the above code I got the error below. Can anyone help? I have also checked that there is a keyword `measures` in the grid search of surprise.

TypeError                                 Traceback (most recent call last)
in ()
      6               'reg_all': [0.4, 0.6]}
      7 # Cross-validation using grid search
----> 8 grid_search = GridSearchCV(SVD, param_grid, measures=['rmse', 'mae'], cv=3, n_jobs=1)
      9 # Find the best parameters on the data set
     10 data = Dataset.load_builtin('ml-100k')

TypeError: __init__() got an unexpected keyword argument 'measures'
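This `TypeError` is the signature of calling a constructor that does not accept `measures`. A likely cause (an assumption, since the poster's imports are not shown) is importing scikit-learn's `GridSearchCV`, or surprise's old `GridSearch` class, instead of `surprise.model_selection.GridSearchCV`. A minimal reproduction with a hypothetical stand-in class:

```python
# GridSearchWithoutMeasures is a made-up stand-in for any GridSearchCV-like
# class whose __init__ lacks a 'measures' parameter.
class GridSearchWithoutMeasures:
    def __init__(self, algo_class, param_grid):
        self.algo_class = algo_class
        self.param_grid = param_grid

try:
    GridSearchWithoutMeasures(object, {}, measures=['rmse', 'mae'])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # ... got an unexpected keyword argument 'measures'
```

Checking which module the imported `GridSearchCV` actually comes from (`print(GridSearchCV.__module__)`) would confirm or rule out this explanation.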
Related Issues (20)
- build_anti_testset() takes a long time and at the end it does not work
- Question: surprise for implicit rating data?
- Can Surprise work with PySpark?
- What to do if the dataset I want to read has more than three parameters
- A bug when importing data from DataFrame
- How do I apply ALS minimization in SVD?
- Error: Sample larger than population or is negative?
- Issues with running Surprise on M1 Mac
- Trainset does not recommend new products
- Cross-validation kNN wrong results on custom dataset
- Possible memory leakage in SVDpp
- GridSearchCV always recommends the first parameter combination as best
- Wrong mapping of the raw IDs to the internal IDs
- How to remove NumPy installation in setup.py
- Couldn't install Surprise on Windows
- No timestamp data in trainset
- Couldn't install Surprise
- How to do k-fold cross-validation on trainset (e.g. splitting movielens-100k using the u1 split, then k-fold cross-validation on u1.base, test on u1.test)
- Unexpected RMSE differences in SVD models with almost the same training data
- Compatibility with Python 3.12