carskit's People

Contributors

irecsys


carskit's Issues

NA values

Hi,
1) Is it mandatory to have some NA values in the context fields if I plan to use a similarity-based CAMF model?
2) If the answer to 1 is yes, how should I assign them in my dataset? Should I just randomly change some context conditions to NA?
3) Could you confirm that the format below is correct for the input file? I get 1 for Rec10 all the time when I choose CAMF models.

userid,itemid,rating,p1,p2,p3,p4
1,1,3,NA,NA,NA,NA
1,1,3,NA,NA,NA,NA
1,1,3,NA,NA,NA,NA
1,1,3,NA,NA,NA,NA
1,1,4,X-Large,X-Small,Nominal,Small
1,1,3,NA,NA,NA,NA
1,1,2,X-Large,X-Large,Numeric,X-Large
1,1,3,NA,NA,NA,NA
1,1,3,NA,NA,NA,NA
1,1,3,NA,NA,NA,NA
1,1,2,NA,NA,NA,NA
1,1,2,X-Large,X-Large,Numeric,X-Large
1,1,2,X-Large,X-Large,Numeric,X-Large
1,1,5,NA,NA,NA,NA
1,1,5,NA,NA,NA,NA
1,1,5,NA,NA,NA,NA
1,1,5,NA,NA,NA,NA
1,1,5,NA,NA,NA,NA
1,1,5,NA,NA,NA,NA
1,1,5,NA,NA,NA,NA
1,1,5,NA,NA,NA,NA
1,1,5,NA,NA,NA,NA
1,1,5,NA,NA,NA,NA
1,1,5,NA,NA,NA,NA
1,1,5,NA,NA,NA,NA
1,1,5,NA,NA,NA,NA
1,1,5,NA,NA,NA,NA
1,1,5,NA,NA,NA,NA
1,1,4,X-Large,X-Small,Numeric,X-Small
1,1,5,NA,NA,NA,NA
1,1,5,NA,NA,NA,NA
1,1,5,NA,NA,NA,NA
1,1,5,NA,NA,NA,NA
1,1,5,NA,NA,NA,NA
1,1,5,NA,NA,NA,NA
1,1,4,Large,X-Small,Numeric,X-Small
1,1,4,X-Large,X-Small,Numeric,X-Small
1,1,4,X-Large,X-Small,Numeric,X-Small
1,1,4,Large,X-Small,Numeric,X-Small
1,1,5,NA,NA,NA,NA
Thank you.
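Not an official answer, but on question 2 above: if context values are simply unknown, they can be normalized to the literal string NA rather than assigned randomly. A minimal sketch, assuming a CSV laid out like the sample above (column count and names are from that sample, not from CARSKit itself):

```python
import csv
import io

def fill_missing_context(rows, n_context):
    """Replace empty context fields with the literal 'NA' string."""
    fixed = []
    for row in rows:
        head, ctx = row[:-n_context], row[-n_context:]
        fixed.append(head + [c if c.strip() else "NA" for c in ctx])
    return fixed

raw = "userid,itemid,rating,p1,p2,p3,p4\n1,1,4,X-Large,,Nominal,\n1,1,3,,,,\n"
reader = csv.reader(io.StringIO(raw))
header = next(reader)  # keep the header row separate
rows = fill_missing_context(list(reader), n_context=4)
print(rows)  # empty context cells become 'NA'
```

This only normalizes missing values; whether similarity-based CAMF requires NA rows at all is the question for the maintainer.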

Blank result

I used CSLIM_ICS and CSLIM_LCS on TripAdvisor2's rating data, but the final results are Pre5: NA, Pre10: NA, Rec5: NA, Rec10: NA, ..., and the recommendation file is empty.
What does this mean?

Using many different models

Is it possible to change the config file so that different models can be run in one pass? For example, if I want the results from both MF and BiasedMF, is there a way other than running the tool multiple times?
Thank you so much.
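As far as I can tell there is no batch option inside the config itself, so a common workaround is to generate one config per model and invoke the jar once per config. A sketch (the `-c` flag is how the jar is usually invoked; verify against your CARSKit version, and the base config here is a stub, not a complete settings file):

```python
import re

BASE = """recommender=MF
num.factors=10
num.max.iter=120
"""

def config_for(model, base=BASE):
    """Return a copy of the base config with the recommender line swapped."""
    return re.sub(r"(?m)^recommender=.*$", "recommender=" + model, base)

# One config (and one jar invocation) per model; commands are printed, not run.
for model in ("MF", "BiasedMF"):
    print("--- setting_%s.conf ---" % model)
    print(config_for(model), end="")
    print("java -jar CARSKit-v0.3.0.jar -c setting_%s.conf" % model)
```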

Evaluation for item recommendation

Hello and thanks for this very interesting piece of software.

I am trying to use it for item recommendation based on positive-only input (interactions between a user and an item).
I have some questions related to the format of the input and the algorithm:

  1. How should I encode the ratings?
    I set all the ratings as 1, obtaining a compact format input like:
    user, item, rating, location
    1, 2, 1, home
    2, 1, 1, home
    1, 3, 1, work
    ...
    Does the algorithm take care explicitly of the user-item negative interactions (i.e. when a user did not interact with an item)?
  2. I would be interested in obtaining top-K recommendations where the candidate item list also contains items that have already been interacted with (i.e. 'rated'). This is to obtain higher accuracy, since my users tend to interact again with items they have previously experienced.
    Is it possible to obtain this behaviour?
  3. (related to 2.) I am using a test set in a separate file to assess precision. My test file contains items that have already been rated by a user (possibly in the same context too). This seems to generate an error, since when I run CARSKit I obtain:

[INFO ] 2016-04-12 17:41:59,227 -- WorkingPath: /home/paolo/raisDataScience/recommender_system/CARSkit/CARSKit.Workspace/
[INFO ] 2016-04-12 17:41:59,241 -- Your original rating data path: /home/paolo/raisDataScience/recommender_system/CARSkit/train.csv
[INFO ] 2016-04-12 17:41:59,241 -- Current working path: /home/paolo/raisDataScience/recommender_system/CARSkit/CARSKit.Workspace/
[WARN ] 2016-04-12 17:41:59,246 -- You rating data is in Compact format. CARSKit is working on transformation on the data format...
[INFO ] 2016-04-12 17:42:02,694 -- Data transformaton completed (from Compact to Binary format). See new rating file: /home/paolo/raisDataScience/recommender_system/CARSkit/CARSKit.Workspace/ratings_binary.txt
[INFO ] 2016-04-12 17:42:02,726 -- Dataset: ...ARSKit.Workspace/ratings_binary.txt
[INFO ] 2016-04-12 17:42:02,732 -- DataPath: /home/paolo/raisDataScience/recommender_system/CARSkit/CARSKit.Workspace/ratings_binary.txt
[INFO ] 2016-04-12 17:42:05,780 -- Rating data set has been successfully loaded.
[INFO ] 2016-04-12 17:42:05,878 --
/*****************************************************************************************************
*

  • Dataset: /home/paolo/raisDataScience/recommender_system/CARSkit/CARSKit.Workspace/ratings_binary.txt
  • User amount: 17113
  • Item amount: 78
  • Rate amount: 207493
  • Context dimensions: 1 (location)
  • Context conditions: 25 (location: 25)
  • Context situations: 24
  • Contextual Data density: 15.5447%
  • Scale distribution: [1.0 x 207493]
  • Average value of all ratings: 1.000000
  • Standard deviation of all ratings: 0.000000
  • Mode of all rating values: 1.000000
  • Median of all rating values: 1.000000
    *
    *****************************************************************************************************/
    [INFO ] 2016-04-12 17:42:05,878 -- Dataset: ...ARSKit.Workspace/ratings_binary.txt
    [INFO ] 2016-04-12 17:42:05,878 -- DataPath: /home/paolo/raisDataScience/recommender_system/CARSkit/CARSKit.Workspace/ratings_binary.txt
    [INFO ] 2016-04-12 17:42:08,577 -- Rating data set has been successfully loaded.
    [INFO ] 2016-04-12 17:42:08,579 -- With Setup: test-set -f /home/paolo/raisDataScience/recommender_system/CARSkit/test.csv
    [INFO ] 2016-04-12 17:42:08,580 -- Dataset: ...recommender_system/CARSkit/test.csv
    [INFO ] 2016-04-12 17:42:08,580 -- DataPath: /home/paolo/raisDataScience/recommender_system/CARSkit/test.csv
    [ERROR] 2016-04-12 17:42:08,580 -- value already present: 0
    java.lang.IllegalArgumentException: value already present: 0
    at com.google.common.collect.HashBiMap.put(HashBiMap.java:238)
    at com.google.common.collect.HashBiMap.put(HashBiMap.java:215)
    at carskit.data.processor.DataDAO.readData(DataDAO.java:169)
    at carskit.main.CARSKit.runAlgorithm(CARSKit.java:317)
    at carskit.main.CARSKit.execute(CARSKit.java:115)
    at carskit.main.CARSKit.main(CARSKit.java:87)

Just to add all the information, my config file is:

dataset.ratings.wins=C:\Users\irecs\Desktop\Data\music\ratings.txt
dataset.ratings.lins=/home/paolo/raisDataScience/recommender_system/CARSkit/train.csv

dataset.social.wins=-1
dataset.social.lins=1

ratings.setup=-threshold -1 -datatransformation 1

recommender=CAMF_C

evaluation.setup=test-set -f /home/paolo/raisDataScience/recommender_system/CARSkit/test.csv
item.ranking=on -topN 10

output.setup=-folder CARSKit.Workspace -verbose on, off --to-clipboard --to-file results_all.txt

guava.cache.spec=maximumSize=200,expireAfterAccess=2m

num.factors=10
num.max.iter=120

learn.rate=2e-10 -max -1 -bold-driver

reg.lambda=0.001 -u 0.001 -i 0.001 -b 0.001 -s 0.001 -c 0.001
pgm.setup=-alpha 2 -beta 0.5 -burn-in 300 -sample-lag 10 -interval 100

similarity=PCC
num.shrinkage=-1

num.neighbors=10

AoBPR=-lambda 0.3
BUCM=-gamma 0.5
BHfree=-k 10 -l 10 -gamma 0.2 -sigma 0.01
FISM=-rho 100 -alpha 0.5
Hybrid=-lambda 0.5
LDCC=-ku 20 -kv 19 -au 1 -av 1 -beta 1
PD=-sigma 2.5
PRankD=-alpha 20
RankALS=-sw on
RSTE=-alpha 0.4
SLIM=-l1 1 -l2 5 -k 50
CSLIM_C=-lw1 1 -lw2 5 -lc1 1 -lc2 5 -k 20 -als 0
CSLIM_CUCI=-lw1 1 -lw2 5 -lc1 1 -lc2 5 10 -1 -als 0
CSLIM_CI=-lw1 1 -lw2 5 -lc1 1 -lc2 5 -k 20 -als 0
CSLIM_CU=-lw1 1 -lw2 5 -lc1 1 -lc2 5 -k 10 -als 0
GCSLIM_CC=-lw1 1 -lw2 5 -lc1 1 -lc2 5 -k -1 -als 0
CSLIM_ICS=-lw1 1 -lw2 5 -k -1 -als 0
CSLIM_LCS=-lw1 1 -lw2 5 -k -1 -als 0
CSLIM_MCS=-lw1 1 -lw2 5 -k -1 -als 0
GCSLIM_ICS=-lw1 1 -lw2 5 -k -1 -als 0
GCSLIM_LCS=-lw1 1 -lw2 5 -k -1 -als 0
GCSLIM_MCS=-lw1 1 -lw2 5 -k -1 -als 0
FM=-lw 0.01 -lf 0.02

Performance of context aware algorithms

Hi,

I am trying out CARSKit with the DePaul Movie dataset and I find that the contextual algos consistently perform worse than traditional collaborative filtering algorithms or even the average based algorithms.

I have not changed any of the algorithm-specific default hyperparameters in settings.conf. The generated results are shown below; I have highlighted the ones that perform better than the rest.

Are these results expected? Is there another dataset on which the contextual algorithms might perform better? Also, if there is a benchmarks page (à la LibRec) that I have missed, please point me to it.

As you can see, context unaware algorithms seem to be performing better. Please let me know if I have missed something here.

RESULTS:
Final Results by SlopeOne, MAE: 0.967844, RMSE: 1.181897, NAME: 0.241961, rMAE: 0.946509, rRMSE: 1.211459, MPE: 0.000000, carskit.alg.baseline.cf.SlopeOne@56c86535, Time: '00:00','00:00'
Final Results by ItemKNN, MAE: 0.868002, RMSE: 1.098535, NAME: 0.217000, rMAE: 0.837544, rRMSE: 1.130362, MPE: 0.000000, 10, PCC, -1, Time: '00:00','00:00'
Final Results by UserKNN, MAE: 0.916442, RMSE: 1.136917, NAME: 0.229111, rMAE: 0.892226, rRMSE: 1.171608, MPE: 0.000000, 10, PCC, -1, Time: '00:00','00:00'
Final Results by PMF, MAE: 2.329682, RMSE: 2.725427, NAME: 0.582421, rMAE: 2.329682, rRMSE: 2.725427, MPE: 0.000000, numFactors: 10, numIter: 120, lrate: 2.0E-10, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:00','00:00'
Final Results by BPMF, MAE: 0.852885, RMSE: 1.086851, NAME: 0.213221, rMAE: 0.828192, rRMSE: 1.123668, MPE: 0.000000, 10, 120, Time: '00:04','00:00'
Final Results by BiasedMF, MAE: 1.231312, RMSE: 1.423191, NAME: 0.307828, rMAE: 1.230463, rRMSE: 1.460495, MPE: 0.000000, numFactors: 10, numIter: 120, lrate: 2.0E-10, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:00','00:00'

Final Results by NMF, MAE: 0.729386, RMSE: 0.994550, NAME: 0.182347, rMAE: 0.696364, rRMSE: 1.036728, MPE: 0.000000, 10, 120, Time: '00:00','00:00'

Final Results by SVD++, MAE: 1.237963, RMSE: 1.430500, NAME: 0.309491, rMAE: 1.248961, rRMSE: 1.479570, MPE: 0.000000, numFactors: 10, numIter: 120, lrate: 2.0E-10, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:01','00:00'
Final Results by UserSplitting-BiasedMF, MAE: 1.230567, RMSE: 1.424304, NAME: 0.307642, rMAE: 1.235632, rRMSE: 1.471055, MPE: 0.000000, numFactors: 10, numIter: 120, lrate: 2.0E-10, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:00','00:00'
Final Results by UserSplitting-ItemKNN, MAE: 0.858941, RMSE: 1.089768, NAME: 0.214735, rMAE: 0.839733, rRMSE: 1.130370, MPE: 0.000000, 10, PCC, -1, Time: '00:00','00:00'
Final Results by UserSplitting-UserKNN, MAE: 0.907304, RMSE: 1.136996, NAME: 0.226826, rMAE: 0.886262, rRMSE: 1.175547, MPE: 0.000000, 10, PCC, -1, Time: '00:00','00:00'
Final Results by UserSplitting-SlopeOne, MAE: 0.940398, RMSE: 1.166736, NAME: 0.235100, rMAE: 0.915888, rRMSE: 1.198899, MPE: 0.000000, carskit.alg.baseline.cf.SlopeOne@771a1d97, Time: '00:00','00:00'
Final Results by UserSplitting-PMF, MAE: 2.329682, RMSE: 2.725427, NAME: 0.582421, rMAE: 2.329682, rRMSE: 2.725427, MPE: 0.000000, numFactors: 10, numIter: 120, lrate: 2.0E-10, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:00','00:00'
Final Results by UserSplitting-BPMF, MAE: 0.869493, RMSE: 1.123035, NAME: 0.217373, rMAE: 0.839136, rRMSE: 1.152975, MPE: 0.000000, 10, 120, Time: '00:05','00:00'
Final Results by UserSplitting-BiasedMF, MAE: 1.229375, RMSE: 1.419656, NAME: 0.307344, rMAE: 1.231460, rRMSE: 1.456512, MPE: 0.000000, numFactors: 10, numIter: 120, lrate: 2.0E-10, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:00','00:00'

Final Results by UserSplitting-NMF, MAE: 0.769139, RMSE: 1.050730, NAME: 0.192285, rMAE: 0.743096, rRMSE: 1.094921, MPE: 0.000000, 10, 120, Time: '00:00','00:00'

Final Results by UserSplitting-SVD++, MAE: 1.233742, RMSE: 1.425546, NAME: 0.308436, rMAE: 1.234045, rRMSE: 1.466769, MPE: 0.000000, numFactors: 10, numIter: 120, lrate: 2.0E-10, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:01','00:00'
Final Results by UserSplitting-UserAvg, MAE: 1.120046, RMSE: 1.330542, NAME: 0.280011, rMAE: 1.097438, rRMSE: 1.357503, MPE: 0.000000, carskit.alg.baseline.avg.UserAverage@78d5cfd6, Time: '00:00','00:00'
Final Results by UserSplitting-ItemAvg, MAE: 1.090122, RMSE: 1.312299, NAME: 0.272530, rMAE: 1.073971, rRMSE: 1.345288, MPE: 0.000000, carskit.alg.baseline.avg.ItemAverage@1d402894, Time: '00:00','00:00'

Final Results by UserSplitting-UserItemAvg, MAE: 0.745786, RMSE: 1.071021, NAME: 0.186446, rMAE: 0.744489, rRMSE: 1.107227, MPE: 0.000000, carskit.alg.baseline.avg.UserItemAverage@5b7fd935, Time: '00:00','00:00'

Final Results by UserItemAvg, MAE: 0.689877, RMSE: 1.004177, NAME: 0.172469, rMAE: 0.668129, rRMSE: 1.036691, MPE: 0.000000, carskit.alg.baseline.avg.UserItemAverage@637719cf, Time: '00:00','00:00'

Final Results by ItemSplitting-UserItemAvg, MAE: 0.706875, RMSE: 1.022935, NAME: 0.176719, rMAE: 0.696765, rRMSE: 1.059279, MPE: 0.000000, carskit.alg.baseline.avg.UserItemAverage@644f96a0, Time: '00:00','00:00'

Final Results by ItemSplitting-ItemKNN, MAE: 0.860276, RMSE: 1.091466, NAME: 0.215069, rMAE: 0.833373, rRMSE: 1.130851, MPE: 0.000000, 10, PCC, -1, Time: '00:00','00:00'
Final Results by ItemSplitting-UserKNN, MAE: 0.914196, RMSE: 1.136213, NAME: 0.228549, rMAE: 0.887852, rRMSE: 1.172677, MPE: 0.000000, 10, PCC, -1, Time: '00:00','00:00'
Final Results by ItemSplitting-SlopeOne, MAE: 0.962192, RMSE: 1.176141, NAME: 0.240548, rMAE: 0.941140, rRMSE: 1.210475, MPE: 0.000000, carskit.alg.baseline.cf.SlopeOne@16d8db20, Time: '00:00','00:00'

Final Results by ItemSplitting-NMF, MAE: 0.764128, RMSE: 1.040546, NAME: 0.191032, rMAE: 0.735947, rRMSE: 1.081225, MPE: 0.000000, 10, 120, Time: '00:00','00:00'

Final Results by ItemSplitting-ItemAvg, MAE: 1.101381, RMSE: 1.305931, NAME: 0.275345, rMAE: 1.083516, rRMSE: 1.337027, MPE: 0.000000, carskit.alg.baseline.avg.ItemAverage@78d5cfd6, Time: '00:00','00:00'
Final Results by UISplitting-UserItemAvg, MAE: 0.764234, RMSE: 1.091865, NAME: 0.191058, rMAE: 0.765368, rRMSE: 1.128514, MPE: 0.000000, carskit.alg.baseline.avg.UserItemAverage@3315a56d, Time: '00:00','00:00'

CONTEXT AWARE:
Final Results by ContextAvg, MAE: 1.210466, RMSE: 1.405670, NAME: 0.302616, rMAE: 1.176579, rRMSE: 1.440965, MPE: 0.000000, carskit.alg.baseline.avg.ContextAverage@13065590, Time: '00:00','00:00'
Final Results by ContextAvg, MAE: 1.210466, RMSE: 1.405670, NAME: 0.302616, rMAE: 1.176579, rRMSE: 1.440965, MPE: 0.000000, carskit.alg.baseline.avg.ContextAverage@61877c15, Time: '00:00','00:00'
Final Results by ItemContextAvg, MAE: 1.088244, RMSE: 1.313791, NAME: 0.272061, rMAE: 1.058464, rRMSE: 1.340398, MPE: 0.000000, carskit.alg.baseline.avg.ItemContextAverage@5c877f84, Time: '00:00','00:00'
Final Results by UserContextAvg, MAE: 1.027653, RMSE: 1.248563, NAME: 0.256913, rMAE: 1.013124, rRMSE: 1.294667, MPE: 0.000000, carskit.alg.baseline.avg.UserContextAverage@2f178e05, Time: '00:00','00:00'
Final Results by CPTF, MAE: 2.329682, RMSE: 2.725427, NAME: 0.582421, rMAE: 2.329682, rRMSE: 2.725427, MPE: 0.000000, numFactors: 10, numIter: 120, lrate: 2.0E-10, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:00','00:00'
Final Results by CAMF_CI, MAE: 1.549310, RMSE: 2.006931, NAME: 0.387328, rMAE: 1.562343, rRMSE: 2.050521, MPE: 0.000000, numFactors: 10, numIter: 120, lrate: 2.0E-10, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:01','00:00'
Final Results by CAMF_CU, MAE: 1.539712, RMSE: 2.004897, NAME: 0.384928, rMAE: 1.550807, rRMSE: 2.045866, MPE: 0.000000, numFactors: 10, numIter: 120, lrate: 2.0E-10, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:01','00:00'
Final Results by CAMF_C, MAE: 1.227395, RMSE: 1.438526, NAME: 0.306849, rMAE: 1.225086, rRMSE: 1.498170, MPE: 0.000000, numFactors: 10, numIter: 120, lrate: 2.0E-10, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:01','00:00'
Final Results by CAMF_CUCI, MAE: 1.231175, RMSE: 1.429833, NAME: 0.307794, rMAE: 1.229866, rRMSE: 1.476612, MPE: 0.000000, numFactors: 10, numIter: 120, lrate: 2.0E-10, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:04','00:00'
Final Results by CAMF_ICS, MAE: 2.329682, RMSE: 2.725427, NAME: 0.582421, rMAE: 2.329682, rRMSE: 2.725427, MPE: 0.000000, numFactors: 10, numIter: 120, lrate: 2.0E-10, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:02','00:00'
Final Results by CAMF_LCS, MAE: 2.300570, RMSE: 2.697210, NAME: 0.575143, rMAE: 2.302239, rRMSE: 2.700686, MPE: 0.000000, numFactors: 10, numIter: 120, lrate: 2.0E-10, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:03','00:00'
Final Results by CAMF_MCS, MAE: 2.329682, RMSE: 2.725427, NAME: 0.582421, rMAE: 2.329682, rRMSE: 2.725427, MPE: 0.000000, numFactors: 10, numIter: 120, lrate: 2.0E-10, maxlrate: -1.0, regB: 0.001, regU: 0.001, regI: 0.001, regC: 0.001, isBoldDriver: true, Time: '00:03','00:00'

CARSKit library use

Is there a demonstration of using the CARSKit library in a web application? Please provide some examples of calling it from a web application, for newbies.

Evaluation using test-set

Hi,

I recently used CARSKit to compare some state-of-the-art context-aware recommendation algorithms. I would like to evaluate them by supplying the training and test sets manually.

It creates the binary file, but then I get the error "value already present: 0".
I checked, and there are no duplicate lines present in both the train and test files.
What could be the problem?

Here is my config file:

dataset.ratings.wins=C:\train.csv
dataset.social.wins=-1
dataset.social.lins=-1
ratings.setup=-threshold 3 -datatransformation -1
recommender=camf_ci
evaluation.setup=test-set -f C:\testFile_0.csv
item.ranking=off -topN 10
output.setup=-folder CARSKit.Workspace -verbose on, off --to-file results.txt
guava.cache.spec=maximumSize=200,expireAfterAccess=2m
########## Model-based Methods ##########
num.factors=10
num.max.iter=100
learn.rate=2e-2 -max -1 -bold-driver
reg.lambda=0.0001 -c 0.001
pgm.setup=-alpha 2 -beta 0.5 -burn-in 300 -sample-lag 10 -interval 100
similarity=pcc
num.shrinkage=-1
num.neighbors=10

The error output:
java.lang.IllegalArgumentException: value already present: 0
at com.google.common.collect.HashBiMap.put(HashBiMap.java:238)
at com.google.common.collect.HashBiMap.put(HashBiMap.java:215)
at carskit.data.processor.DataDAO.readData(DataDAO.java:208)
at carskit.main.CARSKit.runAlgorithm(CARSKit.java:319)
at carskit.main.CARSKit.execute(CARSKit.java:121)
at carskit.main.CARSKit.main(CARSKit.java:93)

Thanks in advance for your help.
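I can't confirm the root cause, but the stack trace points at DataDAO.readData building an id mapping from the test file, so one thing worth ruling out is test entries whose user or item ids never occur in the training file. A quick diagnostic sketch, assuming the compact user,item,rating,... layout (the ids and column order here are illustrative only):

```python
import csv

def ids_in(lines):
    """Collect user and item ids from compact-format rating lines (user,item,rating,...)."""
    users, items = set(), set()
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    for row in reader:
        users.add(row[0])
        items.add(row[1])
    return users, items

train = ["user,item,rating,location", "1,2,1,home", "2,1,1,work"]
test = ["user,item,rating,location", "1,3,1,home"]

tr_u, tr_i = ids_in(train)
te_u, te_i = ids_in(test)
print("test users unseen in training:", te_u - tr_u)  # set()
print("test items unseen in training:", te_i - tr_i)  # {'3'}
```

If both sets come back empty against your real files, the collision is more likely inside CARSKit's own transformation step.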

No recommender is specified!

Dear Yong,

Recently, I came across your toolkit as well as your recently published paper "Context-Aware Collaborative Filtering Using Context Similarity: An Empirical Comparison". I am working on reproducing your results, but when I try different CARS algorithms (i.e., chen1, chen2, FM) I get an error: java.lang.Exception: No recommender is specified! I had a look at the Java code: all the algorithms are called in the main, and they exist in their corresponding folders. I tried changing parameters, but it didn't work.

For algorithms like ExactFiltering and CPTF, it started running but I got the following results:

Final Results by BPRR2, Pre5: 0.048804,Pre10: 0.034689, Rec5: 0.048804, Rec10: 0.069378, AUC5: 0.610541, AUC10: 0.651134, MAP5: 0.028704, MAP10: 0.032047,NDCG5: 0.055237, NDCG10: 0.066468,MRR5: 0.129486, MRR10: 0.140072, -1.0,10,0.02,-1.0,1.0E-4,1.0E-4,100, Time: '05:59','00:04'
Final Results by BPRR1, Pre5: 0.021053,Pre10: 0.017225, Rec5: 0.021053, Rec10: 0.034450, AUC5: 0.550194, AUC10: 0.578849, MAP5: 0.013529, MAP10: 0.015608,NDCG5: 0.025646, NDCG10: 0.033053,MRR5: 0.064354, MRR10: 0.072185, -1.0,10,0.02,-1.0,1.0E-4,1.0E-4,100, Time: '06:00','00:04'
Final Results by ExactFiltering, Pre5: 0.000000,Pre10: 0.000000, Rec5: 0.000000, Rec10: 0.000000, AUC5: 0.500000, AUC10: 0.500000, MAP5: 0.000000, MAP10: 0.000000,NDCG5: 0.000000, NDCG10: 0.000000,MRR5: 0.000000, MRR10: 0.000000, 20, pcc, -1, Time: '00:00','30:09'
Final Results by CPTF, Pre5: 0.000000,Pre10: 0.000000, Rec5: 0.000000, Rec10: 0.000000, AUC5: 0.500000, AUC10: 0.500000, MAP5: 0.000000, MAP10: 0.000000,NDCG5: 0.000000, NDCG10: 0.000000,MRR5: 0.000000, MRR10: 0.000000, 10, 0.02, -1.0, 1.0E-4, 100, true, Time: '00:00','00:10'

P.S., The problems happen only when I use CARS algorithms!

Snippet of my config file:

dataset.ratings.lins=[.... path to my data ]/.csv

dataset.social.wins=-1
dataset.social.lins=-1

ratings.setup=-threshold -1 -datatransformation 1 -fullstat -1

recommender=chen2
....

I look forward to hearing back from you.
Thank you.
Best,
-- Manel.

Setting up seed for algorithms with initialization

Hi Prof. Zheng,

I saw that the CV setup can specify a random seed, but I think that is only for data partitioning?

I want to reproduce results (MAP, NDCG) using CAMF_CU, whose fitting requires initialization and gradient descent. Is there a way to fix the initialization for the gradient descent, i.e., fix the initial values?

Also, how do you compile the Java sources into a jar file from the command line? Would you mind sharing that command here?

Thank you for your consideration.

Regarding data sparsity in context aware datasets

Hello,

I am interested in the publicly available "In-car music" dataset and tried using it to train CARSKit's algorithms.
The dataset is very sparse, and I need a way to handle this so that the algorithms can give accurate results.
I already came across the option of deleting rows/columns with sparse values, but that is not possible here, since almost all rows and columns contain many missing values.
Could you please suggest a solution to this problem?
Thanks in advance for your help.

Regards,
Madhuri
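For anyone quantifying this: the sparsity at issue is what CARSKit reports as "Contextual Data density", i.e. the observed fraction of the user x item x context-situation cube. A sketch of the computation (the counts below are placeholders for illustration, not the real In-car music statistics):

```python
def contextual_density(n_ratings, n_users, n_items, n_situations):
    """Fraction of the user x item x context-situation cube that is observed."""
    return n_ratings / (n_users * n_items * n_situations)

# Placeholder counts, illustration only.
print(f"{contextual_density(4000, 42, 139, 26) * 100:.4f}%")
```

Even a dataset that looks dense as a plain user-item matrix becomes very sparse once each context situation gets its own slice, which is why contextual models often need more data than their 2D counterparts.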

Issues with running UserContextAverage

Hi Prof. Zheng,

I found a NullPointerException when running the User Context Average method.

Could you please look at this issue in particular?

[Screenshot attached: Screen Shot 2020-09-30 at 1.53.26 pm]

Thank you for your consideration.

Getting started with CARSKit on GNU/Linux

Hello! We chatted a bit right after your great tutorial at ACM SAC this year. :)

I was getting started with CARSKit, by simply changing dataset.ratings.lins in settings.conf to ./data/Movie_DePaulMovie/ratings.txt - the current path of the dataset on my system.

This is the output of CARSKit-v0.3.0.jar:

$ java -jar CARSKit-v0.3.0.jar 
[INFO ] 2016-08-15 01:28:58,715 -- WorkingPath: ./data/Movie_DePaulMovie/CARSKit.Workspace/
[INFO ] 2016-08-15 01:28:58,726 -- Your original rating data path: ./data/Movie_DePaulMovie/ratings.txt
[INFO ] 2016-08-15 01:28:58,726 -- Current working path: ./data/Movie_DePaulMovie/CARSKit.Workspace/
[INFO ] 2016-08-15 01:28:58,762 -- Dataset: ...ARSKit.Workspace/ratings_binary.txt
[INFO ] 2016-08-15 01:28:58,765 -- DataPath: ./data/Movie_DePaulMovie/CARSKit.Workspace/ratings_binary.txt
[ERROR] 2016-08-15 01:28:58,765 -- 
java.lang.NullPointerException
    at java.io.File.<init>(File.java:277)
    at happy.coding.io.FileIO.getReader(FileIO.java:154)
    at carskit.data.processor.DataDAO.readData(DataDAO.java:198)
    at carskit.main.CARSKit.readData(CARSKit.java:250)
    at carskit.main.CARSKit.execute(CARSKit.java:117)
    at carskit.main.CARSKit.main(CARSKit.java:92)

What could be the problem?

Best, Pasquale

Using CARSKit with implicit feedback

Hi :)

I am using CARSKit with BPR for top-N recommendation. My dataset is purchase data so I have only implicit feedback (more precisely I just know what the user bought). I added some cool stuff like weather information and demographic stuff about the cities of the stores etc.

So far, because of the implicit feedback, I have only applied BPR with pre-filtering techniques (user-, item-, and UI-splitting).

  1. Is there another approach I should try out? For example CSLIM?
  2. Is it possible to use a hybrid filtering approach like DCR/DCW/BPSO in combination with BPR?

Many thanks in advance!! :)

Little detail

The version variable is still "0.1.0" in the 0.2 release:

protected static String version = "0.1.0";

MAE and RMSE for SLIM

I configured item.ranking=off -topN 10, but the results for SLIM were still Pre, Rec, ...
How can I get MAE and RMSE values?

CARS - dependent-dev models

Hi Dr.,
What I understand from the tool's documentation is that the CARS dependent-sim models are used for ranking only.

My question is:
Can we use the deviation-based models (CARS dependent-dev) for rating prediction?

I just tried them with the new release and they worked; I just wanted to confirm this.
I look forward to your reply, with appreciation.

Error when using DCW and DCR

Hi,

I recently tried out the newly added DCW and DCR algorithms, but I am getting an error in the ContextSimilarity() function in DCW and the ContextRelaxation() function in DCR. The error is java.lang.IndexOutOfBoundsException: Index: 1, Size: 1, and it happens at conds1.get(i) in both classes.

Can you check if you get the same error as well? Or, if not, let me know what settings and data file you use so that I can check if it's a problem on my side?

Thanks for the help!

Value of r

Hi,

Please can you help with the following question :

If I am using the line (evaluation.setup=given-ratio -r 0.6): as I understand it, r represents the proportion of training data relative to testing, which in turn corresponds to the matrix density, so if r=0.6 the matrix density is 60%. Is my understanding correct?

I ran the tool several times, changing the value of r to 0.1, 0.4, 0.6, and 0.8, because I want the MAE at matrix densities of 10%, 40%, 60%, and 80%.
But I am surprised that MAE and RMSE increase as density increases, when they should decrease as the training set grows.

Not sure what I did wrong, my data file has the following format:

userid,itemid,rating,p1,p2,p3,p4
1,1,1,1,0.339,0,0.866
1,1,0.65,1,0.339,1,0.298
1,1,0.3,0.043,1,1,0.082

where p1,p2,p3,p4 are the four contexts I am using.

Your help is highly appreciated.
Thanks in advance.

Splitting data set

I use the MovieLens 100K dataset, with 20% of the data for testing. In this test set I want each user to have only 10 ratings. How can I configure the setting.conf file to do that?
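For what it's worth: CARSKit's evaluation.setup is inherited from LibRec, and LibRec 1.x supports a given-n protocol that fixes the number of ratings kept per user in one split. Whether your CARSKit version exposes it is something to verify against its user guide; if it does, the line would look roughly like:

```
evaluation.setup=given-n -N 10
```

Note that in LibRec the given-n count applies to the training side of the split, so check the semantics carefully against your goal of 10 ratings per user in the test set.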

Results differ between CARSKit and LibRec?

When going through the literature and the internet on context-aware recommender systems, I came across your CARSKit library, which looks very promising. I am interested in it for my master's thesis, for which I will compare the behavior of different context-aware recommender approaches and their context-unaware counterparts across different datasets, all of which is supported by CARSKit.

However, when I started experimenting with the library I came across some unexpected behavior. I started with the MovieLens 100K context-unaware dataset but it did not produce the results I expected based on the data on http://www.librec.net/example.html. As far as I understand, the context-unaware algorithms are exact copies of the implementations provided by the LibRec library, so it should produce at least similar results. Please correct me if I am wrong here.

The rating results are similar, with maximum differences of 0.14% in RMSE and MAE, so not significant. However, I found out the results for top N recommendation (ranking) differ significantly. For this, I found the following results:

Prec@5 Prec@10 Recall@5 Recall@10 AUC MAP NDCG MRR
ItemKNN site 0.318 0.260 0.103 0.164 0.885 0.187 0.536 0.554
ItemKNN librec 0.321 0.259 0.105 0.162 0.907 0.093 0.198 0.550
ItemKNN cars 0.158 0.140 0.069 0.116 0.864 0.053 0.125 0.345
UserKNN site 0.338 0.280 0.116 0.182 0.884 0.208 0.554 0.569
UserKNN librec 0.327 0.278 0.115 0.181 0.915 0.104 0.214 0.556
UserKNN cars 0.089 0.098 0.033 0.078 0.803 0.023 0.071 0.202
SVD++ site N/A N/A N/A N/A N/A N/A N/A N/A
SVD++ librec 0.038 0.039 0.009 0.018 0.632 0.005 0.021 0.081
SVD++ cars 0.025 0.028 0.006 0.014 0.607 0.004 0.015 0.056

Where "[[algorithm]] site" is what is reported on http://www.librec.net/example.html, "[[algorithm]] librec" is what is produced by LibRec v1.3 and "[[algorithm]] cars" is what is produced by CARSKit v0.2.0.

So, I summarized the results as follows, showing the relative difference between the results reported on the LibRec site and produced by the two algorithms for the ItemKNN and UserKNN algorithms and the relative difference between LibRec and CARSKit for the SVD++ algorithm, which has no results on the site:

Prec@5 Prec@10 Recall@5 Recall@10 AUC MAP NDCG MRR
ItemKNN % difference librec wrt site 0.94% -0.38% 1.94% -1.22% 2.49% -50.27% -63.06% -0.72%
ItemKNN % difference cars wrt site -50.31% -46.15% -33.01% -29.27% -2.37% -71.66% -76.68% -37.73%
Prec@5 Prec@10 Recall@5 Recall@10 AUC MAP NDCG MRR
UserKNN % difference librec wrt site -3.25% -0.71% -0.86% -0.55% 3.51% -50.00% -61.37% -2.28%
UserKNN % difference cars wrt site -73.67% -65.00% -71.55% -57.14% -9.16% -88.94% -87.18% -64.50%
Prec@5 Prec@10 Recall@5 Recall@10 AUC MAP NDCG MRR
SVD++ % difference cars wrt librec -34.21% -28.21% -33.33% -22.22% -3.96% -20.00% -28.57% -30.86%

So, as you can see, all LibRec results are within 4% of what is reported on their site for all metrics except MAP and NDCG (I still need to figure out why that is, but it seems unrelated), while the CARSKit results differ by at least 30% and up to 90% for all metrics except AUC.

Of course I realize that LibRec could have it all wrong, both in the library and therefore also on their site, but as you claim that CARS is based on LibRec the difference should at least be explainable. Furthermore, my gut feeling says that LibRec has it right as it is widely adopted and their results correspond closely to for instance the results reported by M. Levy and K. Jack in Efficient Top-N Recommendation by Linear Regression (RecSys conference 2013).

One final thing I can think of is that I used the wrong format to represent a context-unaware data set. Based on your user guide I formatted the file as follows, which as I understand gives all ratings the context NA and thus the data set should be context-unaware:

user,item,rating,context:na
196,242,3,1
186,302,3,1
22,377,1,1
244,51,2,1
...
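The conversion into that layout is mechanical; for reference, a minimal sketch (purely illustrative, not part of CARSKit):

```python
import csv

def add_na_context(lines):
    """Append a constant context:na column (value 1) to plain user,item,rating lines."""
    reader = csv.reader(lines)
    out = [next(reader) + ["context:na"]]       # extend the header
    out.extend(row + ["1"] for row in reader)   # every rating gets the NA context
    return out

plain = ["user,item,rating", "196,242,3", "186,302,3"]
for row in add_na_context(plain):
    print(",".join(row))
```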

So, summarizing, I was wondering if you have any explanation for this behavior, whether this is a known problem or if this is the desired behavior? If not, do you have any idea what the cause of this difference can be? I am willing and able to dive into the code, but at first glance it seems similar to LibRec. So maybe you are able to indicate where this library significantly differs?

Hope this helps you and we can figure it out! Keep up the good work!

For reference, here are the log outputs for the runs I did to arrive at the above results, to show that I used the same settings for LibRec and CARSKit (based on what is reported on the LibRec site):

UserKNN rating librec
[INFO ] 2016-02-01 xx:xx:xx,xxx -- UserKNN,0.736409,0.943499,0.184102,0.699130,0.988028,0.576380,,60, PCC, 25,'xx:xx','xx:xx'
ItemKNN rating librec
[INFO ] 2016-02-01 xx:xx:xx,xxx -- ItemKNN,0.723676,0.923718,0.180919,0.686630,0.970820,0.572490,,40, PCC, 2500,'xx:xx','xx:xx'

UserKNN rating cars
[INFO ] 2016-02-01 xx:xx:xx,xxx -- Final Results by UserKNN, MAE: 0.736947, RMSE: 0.944363, NAME: 0.184237, rMAE: 0.700390, rRMSE: 0.988958, MPE: 0.000000, 60, PCC, 25, Time: 'xx:xx','xx:xx'
ItemKNN rating cars
[INFO ] 2016-02-01 xx:xx:xx,xxx -- Final Results by ItemKNN, MAE: 0.724341, RMSE: 0.924782, NAME: 0.181085, rMAE: 0.687070, rRMSE: 0.972629, MPE: 0.000000, 40, PCC, 2500, Time: 'xx:xx','xx:xx'


UserKNN top N librec
[INFO ] 2016-02-01 xx:xx:xx,xxx -- UserKNN,0.327498,0.277956,0.114772,0.180838,0.914600,0.103961,0.213676,0.555877,,80, COS, 50,'xx:xx','xx:xx'
ItemKNN top N librec
[INFO ] 2016-02-01 xx:xx:xx,xxx -- ItemKNN,0.320504,0.259231,0.104593,0.161601,0.907029,0.092651,0.197906,0.550298,,80, COS, 50,'xx:xx','xx:xx'

UserKNN top N cars
[INFO ] 2016-02-01 xx:xx:xx,xxx -- Final Results by UserKNN, Pre5: 0.089234,Pre10: 0.098344, Rec5: 0.033394, Rec10: 0.077723, AUC: 0.803486, MAP: 0.022585, NDCG: 0.070543, MRR: 0.201989, 80, COS, 50, Time: 'xx:xx','xx:xx'
ItemKNN top N cars
[INFO ] 2016-02-01 xx:xx:xx,xxx -- Final Results by ItemKNN, Pre5: 0.157872,Pre10: 0.139542, Rec5: 0.069404, Rec10: 0.115820, AUC: 0.863906, MAP: 0.052911, NDCG: 0.124604, MRR: 0.344676, 80, COS, 50, Time: 'xx:xx','xx:xx'


SVD++ rating librec
[INFO ] 2016-02-01 xx:xx:xx,xxx -- SVD++,0.718764,0.912503,0.179691,0.681520,0.956593,0.575460,,5, 0.01, -1.0, 0.1, 0.1, 0.1, 100, true,'xx:xx','xx:xx'

SVD++ rating cars
[INFO ] 2016-02-01 xx:xx:xx,xxx -- Final Results by SVD++, MAE: 0.720267, RMSE: 0.913879, NAME: 0.180067, rMAE: 0.682750, rRMSE: 0.958558, MPE: 0.000000, numFactors: 5, numIter: 100, lrate: 0.01, maxlrate: -1.0, regB: 0.1, regU: 0.1, regI: 0.1, regC: 0.1, isBoldDriver: true, Time: 'xx:xx','xx:xx'


SVD++ top N librec
[INFO ] 2016-02-01 xx:xx:xx,xxx -- SVD++,0.038287,0.039476,0.009236,0.018094,0.632358,0.005420,0.020654,0.081348,,5, 0.01, -1.0, 0.1, 0.1, 0.1, 100, true,'xx:xx','xx:xx'

SVD++ top N cars
[INFO ] 2016-02-01 xx:xx:xx,xxx -- Final Results by SVD++, Pre5: 0.025109,Pre10: 0.027889, Rec5: 0.006315, Rec10: 0.014330, AUC: 0.607030, MAP: 0.003629, NDCG: 0.014905, MRR: 0.056455, numFactors: 5, numIter: 100, lrate: 0.01, maxlrate: -1.0, regB: 0.1, regU: 0.1, regI: 0.1, regC: 0.1, isBoldDriver: true, Time: 'xx:xx','xx:xx'

Splitting Approaches

Hi,

I'm trying to use BPR with UISplitting. Every time I run with my settings.conf, the log tells me that 0 items/users have been split. Is this normal?

My procedure: I created two settings.conf files, one for transforming the test set into binary format and one for the real process, with my train set as the training set and the (binary-transformed) test set as the test set.
I also tried to use my test set directly as a non-binary CSV file together with the train set, but that gives me an error, so I think it is fine to convert the test set in a first step and then use the output as the new test set.

    1. Create the binary format from the test set.
      Code snippet:
dataset.ratings.lins=/home/[...]/test_carskit_Bundesland.csv
dataset.social.lins=-1
ratings.setup=-threshold -1 -datatransformation 1 -fullstat -1
[...]
evaluation.setup=test-set -f /home/[...]/train_carskit_Bundesland.csv

After this step I extract the converted test set (now ratings_binary.txt) from the created CARSKit.Workspace folder and put it next to my train set. Then I delete the CARSKit.Workspace folder along with the debug.log and results.txt files.

    2. Run the normal approach.
      Code snippet:
dataset.ratings.lins=/home/[...]/train_carskit_Bundesland.csv
dataset.social.lins=-1
ratings.setup=-threshold -1 -datatransformation 1 -fullstat -1
recommender= uisplitting -traditional bpr
evaluation.setup=test-set -f /home/[...]/ratings_binary.txt

When I now run this, it first converts the train set into binary format, which is fine. After doing so it starts the UI split and tells me that 0 items and 0 users have been split. I don't know whether that is okay, because the process continues with BPR afterwards and doesn't give an error. But once it has finished and I evaluate the results, the curves of the context-split run and the plain BPR run look very similar. So I thought I might be doing something wrong here.

Can you help me please :)
Thank you very much!

This is my output:

/**********************************************************************************************************
 *
 * Dataset: /home/[...]/CARSKit.Workspace/ratings_binary.txt
 * 
 * Statistics of U-I-C Matrix:
 * User amount: 508769
 * Item amount: 93689
 * Rate amount: 4118854
 * Context dimensions: 1 (bundesland)
 * Context conditions: 12 (bundesland: 12)
 * Context situations: 11
 * Data density: 0.0007%
 * Scale distribution: [1.0 x 4118854]
 * Average value of all ratings: 1.000000
 * Standard deviation of all ratings: 0.000000
 * Mode of all rating values: 1.000000
 * Median of all rating values: 1.000000
 *
 **********************************************************************************************************/
With Setup: test-set -f /home/jannis/Fileshares/matti/CROSS_VALIDATION/1/Bundesland/ratings_binary.txt
Dataset: ...ION/1/Bundesland/ratings_binary.txt
DataPath: /home/jannis/Fileshares/matti/CROSS_VALIDATION/1/Bundesland/ratings_binary.txt
Rating data set has been successfully loaded.
0 items have been splitted.
0 users have been splitted.
UI Splitting is done... Algorithm 'bpr' will be applied to the transformed data set.
Density of transformed 2D rating matrix ============================== 0.0075777913744593155
Final Results by UISplitting-BPR, Pre1: 0.012440,Pre2: 0.010356,Pre3: 0.008427,Pre4: 0.007442,Pre5: 0.006712,Pre6: 0.006134,Pre7: 0.005674,Pre8: 0.005283,Pre9: 0.004979,Pre10: 0.004729,Pre11: 0.004500,Pre12: 0.004297,Pre13: 0.004124,Pre14: 0.003963,Pre15: 0.003813,Pre16: 0.003681,Pre17: 0.003572,Pre18: 0.003469,Pre19: 0.003379,Pre20: 0.003295, Rec1: 0.005252,Rec2: 0.008719,Rec3: 0.010527,Rec4: 0.012352,Rec5: 0.013880,Rec6: 0.015140,Rec7: 0.016400,Rec8: 0.017554,Rec9: 0.018630, Rec10: 0.019651, Rec11: 0.020667, Rec12: 0.021555, Rec13: 0.022390, Rec14: 0.023233, Rec15: 0.023917, Rec16: 0.024644, Rec17: 0.025447, Rec18: 0.026170, Rec19: 0.026991, Rec20: 0.027826, AUC: 0.531280, MAP: 0.009775, NDCG: 0.017080, MRR: 0.022582, -1.0,10,0.02,-1.0,1.0E-4,1.0E-4,100, Time: '02:11:02','01:02:48'
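One thing I notice in my own statistics output above: the standard deviation of all ratings is 0.000000, i.e. every rating is 1.0. If the split criterion tests whether an item's (or user's) ratings differ significantly between contexts, which is how item splitting is usually described in the literature (a guess on my side; I have not checked the CARSKit source), then a constant-rating data set can never produce a significant difference, and "0 items/users have been split" would be expected rather than a bug. A toy illustration with Welch's t statistic (standard library only; the criterion CARSKit actually uses may differ):

```python
from statistics import mean, pvariance

def welch_t(a, b):
    """Welch's t statistic; returns 0.0 when both samples are constant."""
    denom = (pvariance(a) / len(a) + pvariance(b) / len(b)) ** 0.5
    if denom == 0.0:
        return 0.0  # zero variance everywhere -> no evidence of a difference
    return (mean(a) - mean(b)) / denom

# Binary data: ratings in context A vs. context B are all 1.0 -> nothing to split on.
print(welch_t([1.0, 1.0, 1.0], [1.0, 1.0, 1.0]))  # → 0.0
# Graded ratings, by contrast, can expose a context effect:
print(welch_t([5.0, 4.0, 5.0], [2.0, 1.0, 2.0]))
```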
