Comments (9)
Hey Maher,
No, you don't need to use RTD, you can simply build the docs locally. If you haven't already, install all the packages for development with
`pip install -r requirements_dev.txt`
This will install Sphinx (the documentation generator), the RTD theme, and some Sphinx extensions needed in the doc-building process.
Then you should be able to build the docs locally by running
`cd doc`
`make html`
You can check the results in `doc/build/html`.
I'll add all this in the CONTRIBUTING.md file!
from surprise.
@NicolasHug,
I want to add documentation for the new feature. Before the pull request, I want to test it on readthedocs to see if it looks good. I read a little, and it seems that I have to create an account and publish the docs from my forked repository to do so.
Can you confirm that this is the way to go, or is there another alternative that I am not aware of?
Thanks,
Maher
Hello @NicolasHug,
I have finished the implementation and basic documentation of the feature, and pushed the changes to my repo here. When you can, please take a look at the docs to confirm that the functionality is as you expected.
In addition, I want to write a user-guide page, but I need your opinion on where to add it.
Should it be added under the "Advanced Usage" section, should I create a separate section, or should I add a Jupyter notebook?
Happy new year btw :)
Maher
This is awesome!
For the doc, you can put it as the first subsection of "Advanced Usage", right before "Manually iterate over folds". As this is a major feature, you can also add a quick example in the README under the "example" section (if you don't, no problem, I'll do it ^^).
I haven't thoroughly tested the code for now (I'll come back to you, probably on Monday), so these are just a few remarks after taking a quick glance at it:
- In the test file, could we set the number of epochs and number of factors to lower values so that the tests run faster?
- We can make a direct reference to the GridSearchCV of scikit-learn in the doc, so that people know what this is about.
- Try to be as pythonic as possible. For example, using `enumerate` in a for loop would be better than counting the number of iterations (`combination_counter += 1`). These two talks (talk1, talk2) by Raymond Hettinger have been of great help to me.
- I'd rather not use `u'strings'` here; is it possible to avoid them?
- Could `cv_results_` be a `defaultdict(list)`? I think this would avoid the `if key in self.cv_results_.keys(): ...`
- I never used attributes that end (or start) with underscores, so maybe rename `best_params_` and the likes to `best_params`.
- Is this true: "Depending on the nature of the `data` parameter, it may or may not perform cross-validation."?
- As `best_params_` and co are `defaultdict`s, I don't think the tests in `test_measure_is_not_case_sensitive` do what you think they do (no exception will ever be raised).
- Maybe add a little bit of (meaningful) comments to the code?
- Don't forget to be PEP8 compliant! I see many lines exceeding 80 characters; flake8 is your friend.
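The `enumerate` and `defaultdict(list)` remarks above can be sketched together in a few lines. This is an illustrative toy, not surprise's actual code; all names here (`param_combinations`, `cv_results`) are made up:

```python
from collections import defaultdict

# enumerate() replaces a manual combination_counter, and defaultdict(list)
# removes the need for an "if key in cv_results" check before appending.
param_combinations = [{'n_epochs': 5}, {'n_epochs': 10}]
cv_results = defaultdict(list)

for i, params in enumerate(param_combinations, start=1):
    # i counts 1, 2, ... without any explicit counter variable
    cv_results['params'].append(params)
    cv_results['combination'].append('{}/{}'.format(i, len(param_combinations)))

print(cv_results['combination'])  # ['1/2', '2/2']
```

Accessing a missing key on the `defaultdict` simply yields an empty list, which is exactly what makes the membership check unnecessary.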
Also, please create a pull request so that we can review the changes there. Your upcoming commits will be automatically added to the pull request discussion.
Thank you so much for putting time into it, this will be a very attractive feature!
Happy new year :)!
Nicolas
Hey, coming back to you on this :)
It all seems to work very well; I only have a few more remarks:
- Maybe the verbose message could be changed to something like "Computing MAE and RMSE for parameters combination 1/8"?
- Could we use an `OrderedDict` for the parameters? Or something that would keep the parameters list in the same order as they were given?
- Could we use the `evaluate()` function of the `evaluate` module instead of computing accuracies manually? To avoid a name clash you would need something like `performances = globals()['evaluate'](algo_instance, data)`
- I think something like `measures = [measure.upper() for measure in measures]` at the beginning of the function would save a lot of `measures.upper()` calls later on
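The name-clash workaround can be sketched like this. The class and function below are minimal stand-ins for illustration, not surprise's real `GridSearch` and `evaluate`:

```python
def evaluate(algo, data, measures):
    # Stand-in for the module-level evaluate(): returns dummy performances.
    return {m: 0.0 for m in measures}

class GridSearch:
    def evaluate(self, data, measures):
        # Normalize once at the top instead of calling .upper() everywhere.
        measures = [measure.upper() for measure in measures]
        # Inside this method, the bare name 'evaluate' would resolve to the
        # method itself; globals() fetches the module-level function instead.
        return globals()['evaluate'](self, data, measures)

print(GridSearch().evaluate(None, ['rmse', 'mae']))  # {'RMSE': 0.0, 'MAE': 0.0}
```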
And I think this is pretty much it :)
Nicolas
Hello Nicolas,
I have created the pull request #7. I have also added a user guide at the beginning of the advanced usage section.
Please find my comments on each of the points above:
- In the test file, could we set the number of epochs and number of factors to lower values so that the tests run faster? --> I have reduced the iterations in most functions. They could be reduced further, but that requires changing the assertions.
- We can make a direct reference to the GridSearchCV of scikit-learn in the doc, so that people know what this is about. --> Done
- Try to be as pythonic as possible. For example, using `enumerate` in a for loop would be better than counting the number of iterations (`combination_counter += 1`). These two talks (talk1, talk2) by Raymond Hettinger have been of great help to me. --> Done, thanks to the talks :)
- I'd rather not use `u'strings'` here; is it possible to avoid them? --> I didn't get what you meant by this point. Can you please clarify it for me?
- Could `cv_results_` be a `defaultdict(list)`? I think this would avoid the `if key in self.cv_results_.keys(): ...` --> Done in a pythonic way. Awesome talks again
- I never used attributes that end (or start) with underscores, so maybe rename `best_params_` and the likes to `best_params`. --> I don't know why either. I kept the GridSearchCV convention. Let me know if you confirm renaming them.
- Is this true: "Depending on the nature of the `data` parameter, it may or may not perform cross-validation."? --> Since I am using the `evaluate` method you implemented and the data is used as an input, whether cross-validation is performed depends on whether the data was divided into folds. Am I missing something?
- As `best_params_` and co are `defaultdict`s, I don't think the tests in `test_measure_is_not_case_sensitive` do what you think they do (no exception will ever be raised). --> Corrected
- Maybe add a little bit of (meaningful) comments to the code? --> Done. Let me know if it's enough.
- Don't forget to be PEP8 compliant! I see many lines exceeding 80 characters; flake8 is your friend. --> All lines should be less than 80 chars now.
- Maybe the verbose message could be changed to something like "Computing MAE and RMSE for parameters combination 1/8"? --> I changed the verbose message a little. However, I didn't add the accuracy names such as MAE and RMSE, since I am using the `evaluate` method, which adds them by default. Note that the docs of `evaluate` say that with verbosity 0 it shouldn't print anything, but it actually prints the accuracy measures.
- Could we use an `OrderedDict` for the parameters? Or something that would keep the parameters list in the same order as they were given? --> The moment a dict is defined, its order is lost, so to my knowledge there is no way to save the order of dict keys. A solution would be changing the input of the function to an ordered list, but I think it is easier and more convenient to keep it a normal dict.
- Could we use the `evaluate()` function of the `evaluate` module instead of computing accuracies manually? To avoid a name clash you would need something like `performances = globals()['evaluate'](algo_instance, data)` --> This is my bad; I should have used it from the start instead of doing it manually. Now that it's done, the code is much cleaner.
- I think something like `measures = [measure.upper() for measure in measures]` at the beginning of the function would save a lot of `measures.upper()` calls later on. --> Done
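On the ordering point: `collections.OrderedDict` does keep keys in insertion order (and since Python 3.7, plain dicts do too, though that guarantee did not exist when this thread was written). A minimal sketch, with an illustrative `param_grid`:

```python
from collections import OrderedDict

# An OrderedDict preserves the order in which parameters were given,
# so iterating over the grid visits keys exactly as the caller wrote them.
param_grid = OrderedDict([('n_epochs', [5, 10]),
                          ('lr_all', [0.002, 0.005]),
                          ('reg_all', [0.4, 0.6])])

print(list(param_grid))  # ['n_epochs', 'lr_all', 'reg_all']
```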
Notes:
- My background is Java, where good practice is to write a lot of small functions and unit tests; that is not so much the case in Python. Any enhancements you add to the code will be of great benefit for me.
- I have created a duplicate case-insensitive dict because the one you already had overrides the `__str__` method. This is not good for `best_params_` and co, because we are no longer able to print them. Nevertheless, I think it is better to use inheritance than to duplicate the code, but I didn't want to mess with the code you've written.
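The inheritance approach Maher alludes to could look like this. `CaseInsensitiveDict` below is a minimal stand-in for the existing helper (the real one differs); the point is that a subclass can keep the case-insensitive behaviour while restoring normal printing, instead of duplicating the code:

```python
class CaseInsensitiveDict(dict):
    """Stand-in for the existing helper: keys are lower-cased, and imagine
    __str__ is overridden in a way that makes printing unhelpful."""
    def __setitem__(self, key, value):
        dict.__setitem__(self, key.lower(), value)
    def __getitem__(self, key):
        return dict.__getitem__(self, key.lower())
    def __str__(self):
        return '<unprintable>'

class PrintableCaseInsensitiveDict(CaseInsensitiveDict):
    """Inherit the case-insensitive lookups, restore dict's normal printing."""
    def __str__(self):
        return dict.__repr__(self)

d = PrintableCaseInsensitiveDict()
d['RMSE'] = 1.0
print(str(d))      # {'rmse': 1.0}
print(d['rmse'])   # 1.0 -- lookups stay case-insensitive
```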
Looking forward to the merge ;)
Maher
Awesome!
I put my remarks on the pull request.
Closing this as it has been solved by 714be0b
param_grid = {'n_epochs': [5, 10], 'lr_all': [0.002, 0.005],
              'reg_all': [0.4, 0.6]}
# Cross-validation using grid search
grid_search = GridSearchCV(SVD, param_grid, measures=['rmse', 'mae'], cv=3, n_jobs=1)
# Find the best parameters on the data set
data = Dataset.load_builtin('ml-100k')

After running the above code I got the error below. Can anyone help? I have also checked that there is a keyword `measures` in the grid search of surprise.

TypeError                                 Traceback (most recent call last)
in ()
      6               'reg_all': [0.4, 0.6]}
      7 # Cross-validation using grid search
----> 8 grid_search = GridSearchCV(SVD, param_grid, measures=['rmse', 'mae'], cv=3, n_jobs=1)
      9 # Find the best parameters on the data set
     10 data = Dataset.load_builtin('ml-100k')

TypeError: __init__() got an unexpected keyword argument 'measures'
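This `TypeError` is the signature of calling a constructor that does not accept `measures`. A likely cause (an assumption, since the poster's imports are not shown) is importing scikit-learn's `GridSearchCV`, or surprise's old `GridSearch` class, instead of `surprise.model_selection.GridSearchCV`. A minimal reproduction with a hypothetical stand-in class:

```python
# GridSearchWithoutMeasures is a made-up stand-in for any GridSearchCV-like
# class whose __init__ lacks a 'measures' parameter.
class GridSearchWithoutMeasures:
    def __init__(self, algo_class, param_grid):
        self.algo_class = algo_class
        self.param_grid = param_grid

try:
    GridSearchWithoutMeasures(object, {}, measures=['rmse', 'mae'])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # ... got an unexpected keyword argument 'measures'
```

Checking which module the imported `GridSearchCV` actually comes from (`print(GridSearchCV.__module__)`) would confirm or rule out this explanation.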
Related Issues (20)
- build_anti_testset() takes a long time and at the end it does not work
- Question: surprise for implicit rating data?
- Can Surprise work with PySpark?
- What to do if the dataset I want to read has more than three parameters
- A bug when importing data from DataFrame
- How do I apply ALS minimization in SVD?
- Error: Sample larger than population or is negative?
- Issues with running Surprise on M1 Mac
- Trainset does not recommend new products
- Cross-validation kNN wrong results on custom dataset
- Possible memory leakage in SVDpp
- GridSearchCV always recommends the first parameter combination as best
- Wrong mapping of the raw IDs to the internal IDs
- How to remove NumPy installation in setup.py
- Couldn't install Surprise on Windows
- No timestamp data in trainset
- Couldn't install Surprise
- How to do k-fold cross-validation on trainset (e.g. splitting movielens-100k using the u1 split, then k-fold cross-validation on u1.base, test on u1.test)
- Unexpected RMSE differences in SVD models with almost the same training data
- Compatibility with Python 3.12