
Comments (13)

irecsys commented on September 15, 2024

Thanks for your discussion. I'd like to give you some explanations about this; hopefully they help answer your questions.

1). First of all, some metric calculations in LibRec are incorrect, such as AUC; the AUC calculation in CARSKit is therefore incorrect too. The other metrics, however, can basically be trusted.
2). You cannot apply a non-context-aware data set to CARSKit. As you mentioned, a data set applied to CARSKit should be in the following format:
user,item,rating,context:na
3). In CARS, the evaluation differs from the traditional one in terms of ranking metrics. It is evaluated per context per user, since we recommend a list of items to each (user, context) pair.
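To make 3) concrete, here is a minimal sketch of the grouping idea; the TestRating type and the method name are made up for illustration and are not CARSKit classes:

```java
import java.util.*;

public class PerContextEval {
    // Assumed record of a withheld test rating (ids are illustrative).
    record TestRating(int user, int context, int item) {}

    // Group test items by (user, context): ranking metrics such as precision
    // are then computed once per group, i.e. per (user, context) pair,
    // rather than once per user as in traditional evaluation.
    static Map<List<Integer>, List<Integer>> groupByUserContext(List<TestRating> test) {
        Map<List<Integer>, List<Integer>> groups = new LinkedHashMap<>();
        for (TestRating r : test)
            groups.computeIfAbsent(List.of(r.user(), r.context()), k -> new ArrayList<>())
                  .add(r.item());
        return groups;
    }
}
```

A user appearing in two contexts therefore yields two evaluated recommendation lists, not one.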

Hopefully those insights help you understand the difference between LibRec and CARSKit. Let me know if you have further questions.

from carskit.

irecsys commented on September 15, 2024

A sample of the data format you should prepare:

user,item,rating,time,location
1,applebees,1,weekday,school
1,burger king,4,weekday,school
1,carls jr,5,weekday,school
1,costco,5,weekday,school
1,el mazateno,1,weekday,school
1,kentucky fried chicken,5,weekday,school
1,mc donals,1,weekday,school
2,applebees,5,weekday,school
2,daruma,5,weekday,school

The system will recognize this format and convert it.
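As a rough illustration of what the conversion does with the context columns (a sketch only; the class and method names here are made up, not the actual CARSKit conversion code):

```java
import java.util.*;

public class ContextConverter {
    // Sketch: rewrite a row like "1,applebees,1,weekday,school" (with header
    // "user,item,rating,time,location") into dimension:condition form,
    // e.g. "1,applebees,1,time:weekday,location:school". Illustrative only.
    public static String toConditionForm(String header, String row) {
        String[] cols = header.split(",");
        String[] vals = row.split(",");
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < vals.length; i++) {
            if (i > 0) out.append(",");
            // first three columns are user, item, rating; the rest are contexts
            out.append(i < 3 ? vals[i] : cols[i] + ":" + vals[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String header = "user,item,rating,time,location";
        System.out.println(toConditionForm(header, "1,applebees,1,weekday,school"));
        // prints: 1,applebees,1,time:weekday,location:school
    }
}
```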


basvank commented on September 15, 2024

Thanks for your quick response. Regarding your explanations:

1). Good to know. I had not seen this mentioned anywhere, but it indeed explains some of the problems in the results.
2). I understand this, but as I showed in my initial message, I converted the context-unaware MovieLens 100K data set to a context-aware one with the context NA for each and every rating, as explained in the user guide. So the data set is indeed context-aware, but with only the context NA.
3). This also makes sense, but as I understand it, a context-aware data set in which every rating has context NA, as in the example I showed (so the last value for each record is 1 for context:na), basically "reduces" to a context-unaware variant. The library should (and in fact does) recommend items for (x, context=context:na) for all users, so there is only one context per user. Since all ratings share the same context context:na, every user and every item/rating is in the same context, which is therefore irrelevant to the recommendation process. In this case, a context-unaware recommender should thus produce the same results as a context-aware one. This is also what I want, because I want the results of context-unaware recommenders (such as ItemKNN, UserKNN and SVD++) as a baseline against which to compare the context-aware algorithms later on.


basvank commented on September 15, 2024

This is the data set I use; as you can see, it is already context-aware with only one context:

movielens100kratings.txt


irecsys commented on September 15, 2024

Okay, if your data format and experiments were all correct, two remaining reasons come to mind:

1). Based on your experimental results, I suspect there are differences in the evaluations between LibRec and CARSKit. Take ranking evaluation as an example: LibRec does not evaluate all items for ranking; there is a selection process for the item candidates. CARSKit followed this approach and made changes accordingly. I guess this is the main reason. You can double-check the evalRanking() function in Recommender.java; I will double-check it too.

2). You may also double-check the "CARSKit.Workspace" folder. There is a file named "ratings_binary", which is the final rating file used for prediction. Check whether this file is in the correct format.

Note that not every context-aware recommender works better than non-contextual ones. It varies from domain to domain and data set to data set, especially depending on which context variables are used in the data. I know you have not gone that far yet. Just FYI.


basvank commented on September 15, 2024

1). I am aware of this; that will be a point to investigate in my thesis. At the moment, however, I am only looking at establishing context-unaware baselines, so this should not be a problem for now.

2). Do you mean that the splitting into training and test data is done randomly? That is a good point. I think I know a way of eliminating this: if I manually split the data into training and test sets and supply those to both libraries, which they support via the test-set option in evaluation.setup if I understand correctly, the comparison would be easier to make. We could then also compare the output files to see whether they have anything in common at all. Do you agree?

3). I have checked this, and it looks exactly like the movielens100kratings.txt file I uploaded above, so that's fine.
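Regarding 2), the configuration I have in mind would look roughly like this (the paths are placeholders, and I am not completely sure of the key names, so they should be checked against the LibRec/CARSKit user guides):

```
dataset.ratings=data/movielens100k_train.txt
evaluation.setup=test-set -f data/movielens100k_test.txt
```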


irecsys commented on September 15, 2024

Yes, the evaluation is important; you can simply use training-testing evaluation. Regarding cross validation, there was previously a bug in LibRec, but I remember it was fixed. You can double-check the output files (the rating predictions) to see whether the folds are the same for different algorithms. I still suspect the evalRanking() function, where CARSKit's evaluation differs a little from the normal one in LibRec.


basvank commented on September 15, 2024

Ok, I will have another look at it tomorrow. Thanks for the feedback.


basvank commented on September 15, 2024

I believe I found the cause of the differences between LibRec and CARSKit: CARSKit includes already rated items in the recommendation list, while LibRec does not. This causes the other items to drop in the list (since already rated items almost always rank higher) or even to fall out of it. However, already rated items do not count as correctly predicted items, so they have a negative effect on all metrics.

It seems that this behavior is due to this code, where it becomes apparent that this is a deliberate choice. I understand the consideration, but after giving it some thought, I believe the required behavior may differ depending on the use case:

  1. The case supported by the current behavior, where already rated items are also shown in the top-N recommendation list
  2. A case where already rated items are not shown in the context in which they are rated, but are shown in other contexts for a particular user. For instance: when I have watched a movie while at home alone I do not want to see that movie recommended at a later time while watching alone at home. However, when I am with a friend that has not seen that particular movie, I might want to get it recommended because I want to show it to him and am willing to watch it again if it was very good. This would mean that the exact (user, item, context) combinations that appear in the training set should be filtered from the recommendations.
  3. A case where users do not want to see items that they have rated/bought in any context at all, but the context information is used to improve the recommendations for other users. This is actually my use case. For instance, a webshop where purchased items do not have to be shown to buyers, but the fact that they bought an item in a certain context (for instance in the weekend) can help improve the recommendations for other users of the system. This would mean that all (user, item) combinations that appear in the training set, without considering the context, should be filtered from the recommendations.

Do you agree that these are different use cases that CARSKit could support? I think it could be a setting, with the current behavior as the default. Having looked at it quickly, I believe the line I referred to above can be altered to support the second use case; implementing the third use case might be a bit more complicated.
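To illustrate the difference between the last two cases, here is a rough sketch of the two filtering policies (the Rated type and the method names are hypothetical, not CARSKit code):

```java
import java.util.*;

public class CandidateFilter {
    // Hypothetical record of a training-set rating.
    record Rated(int user, int item, int context) {}

    // Use case 2: drop only the exact (user, item, context) triples
    // that appear in the training set.
    static boolean keepPerContext(Set<Rated> train, int user, int item, int context) {
        return !train.contains(new Rated(user, item, context));
    }

    // Use case 3: drop any (user, item) pair seen in training,
    // regardless of context.
    static boolean keepAnyContext(Set<Rated> train, int user, int item) {
        for (Rated r : train)
            if (r.user() == user && r.item() == item) return false;
        return true;
    }
}
```

Under policy 2, an item rated in one context can still be recommended in another context; under policy 3, it never reappears for that user.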


irecsys commented on September 15, 2024

Well, thanks for your finding. Yes, as I mentioned before, I guessed the difference lay in the evalRanking() function. Now I remember the reason for this operation: in a CARS data set, users may rate an item more than once, so it is not necessary to restrict the ranking to unique (user, item) pairs.

Given your suggestions and concerns, we could actually add context as another constraint and make sure an item is not added to the candidate list if the user has already rated it in that specific context. What do you think about that? As you mentioned, the 3rd case is complicated; we may only be able to evaluate the algorithms in a uniform and general setting. What do you think?


irecsys commented on September 15, 2024

I have updated the evalRanking() in Recommender.java. Let me know if you have further questions.


basvank commented on September 15, 2024

I have left a small comment on your commit.


irecsys commented on September 15, 2024

Hello, the change above removes items that have been rated by a given user in a given context from the candidate list used for evaluation.

Also, if you are interested in revising and building the CARSKit library, please let me know. I will add you to the contributor list.

