quora / qmf
A fast and scalable C++ library for implicit-feedback matrix factorization models
License: Apache License 2.0
Does qmf support non-contiguous user and item labels? (Apologies for the obvious question, but I have not studied the source code yet.)
It might be difficult to reproduce, but every now and then both the train and test loss are infinite from the beginning to the end of the run.
Have you ever noticed that?
As for ALS:
when iterating, does the left factor always need to be initialized to 0?
What if I just use setFactors(genUnif) to initialize the left factor in the first round of optimization,
and then keep the fitted factors in later iterations?
And if I change ALS training to an incremental mini-batch mode, do I still need to initialize the left factor to 0?
Thanks very much
Should I use cosine or dot product to calculate the similarity between a user & an item?
(Since there can be negative latent factors for users & items, is cosine still suitable?)
Thanks very much
As the user-behavior time window slides, behavior at different times should affect the user factor differently (for example, a click from yesterday and a click from 30 days ago should influence it quite differently).
My question is: how can I do this in BPR?
Can I just update the old factor with a decay rate,
such as
new_factor = old_factor * 0.99999999 + some_loss_error
Thanks very much
Hi Guys - we're playing around with QMF. Quick question - when running the following command
/qmf/bin$ ./wals --train_dataset=space.train --user_factors=user_factors --item_factors=item_factors --regularization_lambda=0.05 --confidence_weight=40 --nepochs=10 --nfactors=30 --nthreads=4 --test_avg_metrics=p@5,r@5 --test_always --num_test_users=30
We're expecting some form of output or information regarding the evaluation based on the documentation. The output from the binary is here.
./wals --train_dataset=space.train --user_factors=user_factors --item_factors=item_factors --regularization_lambda=0.05 --confidence_weight=40 --nepochs=10 --nfactors=30 --nthreads=4 --test_avg_metrics=p@5,r@5 --test_always --num_test_users=30
I0926 13:56:18.471247 18817 wals.cpp:85] loading training data
I0926 13:56:22.179965 18817 wals.cpp:95] training
I0926 13:56:33.102653 18817 WALSEngine.cpp:80] epoch 1: train loss = 54.118
I0926 13:56:43.987740 18817 WALSEngine.cpp:80] epoch 2: train loss = 53.8029
I0926 13:56:54.861482 18817 WALSEngine.cpp:80] epoch 3: train loss = 53.79
I0926 13:57:05.761577 18817 WALSEngine.cpp:80] epoch 4: train loss = 53.7868
I0926 13:57:16.658241 18817 WALSEngine.cpp:80] epoch 5: train loss = 53.7855
I0926 13:57:27.557302 18817 WALSEngine.cpp:80] epoch 6: train loss = 53.7849
I0926 13:57:38.434825 18817 WALSEngine.cpp:80] epoch 7: train loss = 53.7845
I0926 13:57:49.277307 18817 WALSEngine.cpp:80] epoch 8: train loss = 53.7843
I0926 13:58:00.133127 18817 WALSEngine.cpp:80] epoch 9: train loss = 53.7841
I0926 13:58:11.047634 18817 WALSEngine.cpp:80] epoch 10: train loss = 53.784
I0926 13:58:11.047685 18817 wals.cpp:99] saving model output
and no files are created in the bin dir. Can you give any suggestions as to what we could be doing wrong?
Are evalset_ and the step
evaluate(epoch);
necessary if we don't care about the loss rate in an online environment?
Without the eval set, we can save a lot of time.
Could you create a release/tag?
I've installed all the required libraries, but I got the following error when building the binaries with make. Can someone tell me how to fix it? Thanks!
Scanning dependencies of target BPREngineTest
[ 53%] Building CXX object CMakeFiles/BPREngineTest.dir/qmf/test/BPREngineTest.cpp.o
/home/a30135/tmp/qmf/qmf/test/BPREngineTest.cpp: In member function ‘virtual void qmf::BPREngine_init_Test::TestBody()’:
/home/a30135/tmp/qmf/qmf/test/BPREngineTest.cpp:33:78: error: could not convert ‘{{3, 2}, {5, 2}, {3, 4}, {6, 2}, {7, 10}}’ from ‘<brace-enclosed initializer list>’ to ‘std::vector<qmf::DatasetElem>’
std::vector<DatasetElem> dataset = {{3, 2}, {5, 2}, {3, 4}, {6, 2}, {7, 10}};
^
/home/a30135/tmp/qmf/qmf/test/BPREngineTest.cpp:61:76: error: could not convert ‘{{5, 4}, {3, 10}, {6, 12}, {8, 13}}’ from ‘<brace-enclosed initializer list>’ to ‘std::vector<qmf::DatasetElem>’
std::vector<DatasetElem> testDataset = {{5, 4}, {3, 10}, {6, 12}, {8, 13}};
^
/home/a30135/tmp/qmf/qmf/test/BPREngineTest.cpp: In member function ‘virtual void qmf::BPREngine_optimize_Test::TestBody()’:
/home/a30135/tmp/qmf/qmf/test/BPREngineTest.cpp:108:55: error: could not convert ‘{{1, 1}, {2, 2}}’ from ‘<brace-enclosed initializer list>’ to ‘std::vector<qmf::DatasetElem>’
std::vector<DatasetElem> dataset = {{1, 1}, {2, 2}};
^
/home/a30135/tmp/qmf/qmf/test/BPREngineTest.cpp:121:71: error: could not convert ‘{{1, 1}, {1, 3}, {2, 2}, {3, 1}}’ from ‘<brace-enclosed initializer list>’ to ‘std::vector<qmf::DatasetElem>’
std::vector<DatasetElem> dataset = {{1, 1}, {1, 3}, {2, 2}, {3, 1}};
^
/home/a30135/tmp/qmf/qmf/test/BPREngineTest.cpp:140:71: error: could not convert ‘{{1, 1}, {1, 3}, {2, 2}, {3, 1}}’ from ‘<brace-enclosed initializer list>’ to ‘std::vector<qmf::DatasetElem>’
std::vector<DatasetElem> dataset = {{1, 1}, {1, 3}, {2, 2}, {3, 1}};
^
make[2]: *** [CMakeFiles/BPREngineTest.dir/qmf/test/BPREngineTest.cpp.o] Error 1
make[1]: *** [CMakeFiles/BPREngineTest.dir/all] Error 2
make: *** [all] Error 2
Hi All --
Has anyone ever benchmarked this package on the MovieLens datasets? I'm looking at this for the first time and trying to figure out which hyperparameters need to be tuned to get good performance.
Thanks!
~ Ben
EDIT: Related -- if anyone has a pointer to a properly parameterized example of using this library, that'd also be super helpful.
Is test_dataset not available with wals, or is it missing from the documentation?
Hi Guys - I'm getting the following error when running make. Not sure if this is an isolated issue.
/qmf$ make
[ 42%] Built target qmf
[ 46%] Built target gtest
[ 50%] Built target gtest_main
Scanning dependencies of target BPREngineTest
[ 53%] Building CXX object CMakeFiles/BPREngineTest.dir/qmf/test/BPREngineTest.cpp.o
/home/ubuntu/qmf/qmf/test/BPREngineTest.cpp: In member function ‘virtual void qmf::BPREngine_init_Test::TestBody()’:
/home/ubuntu/qmf/qmf/test/BPREngineTest.cpp:33:78: error: could not convert ‘{{3, 2}, {5, 2}, {3, 4}, {6, 2}, {7, 10}}’ from ‘<brace-enclosed initializer list>’ to ‘std::vector<qmf::DatasetElem>’
std::vector<DatasetElem> dataset = {{3, 2}, {5, 2}, {3, 4}, {6, 2}, {7, 10}};
^
/home/ubuntu/qmf/qmf/test/BPREngineTest.cpp:61:76: error: could not convert ‘{{5, 4}, {3, 10}, {6, 12}, {8, 13}}’ from ‘<brace-enclosed initializer list>’ to ‘std::vector<qmf::DatasetElem>’
std::vector<DatasetElem> testDataset = {{5, 4}, {3, 10}, {6, 12}, {8, 13}};
^
/home/ubuntu/qmf/qmf/test/BPREngineTest.cpp: In member function ‘virtual void qmf::BPREngine_optimize_Test::TestBody()’:
/home/ubuntu/qmf/qmf/test/BPREngineTest.cpp:108:55: error: could not convert ‘{{1, 1}, {2, 2}}’ from ‘<brace-enclosed initializer list>’ to ‘std::vector<qmf::DatasetElem>’
std::vector<DatasetElem> dataset = {{1, 1}, {2, 2}};
^
/home/ubuntu/qmf/qmf/test/BPREngineTest.cpp:121:71: error: could not convert ‘{{1, 1}, {1, 3}, {2, 2}, {3, 1}}’ from ‘<brace-enclosed initializer list>’ to ‘std::vector<qmf::DatasetElem>’
std::vector<DatasetElem> dataset = {{1, 1}, {1, 3}, {2, 2}, {3, 1}};
^
/home/ubuntu/qmf/qmf/test/BPREngineTest.cpp:140:71: error: could not convert ‘{{1, 1}, {1, 3}, {2, 2}, {3, 1}}’ from ‘<brace-enclosed initializer list>’ to ‘std::vector<qmf::DatasetElem>’
std::vector<DatasetElem> dataset = {{1, 1}, {1, 3}, {2, 2}, {3, 1}};
^
make[2]: *** [CMakeFiles/BPREngineTest.dir/qmf/test/BPREngineTest.cpp.o] Error 1
make[1]: *** [CMakeFiles/BPREngineTest.dir/all] Error 2
make: *** [all] Error 2
Matrix WALSEngine::computeXtX(const Matrix& X)
Can the qmf library be installed on a Mac? Thanks.
When I use BPR, the train loss and test loss are inf, and the result is inf.
As for the loss function and the update of the latent vectors (parameters),
I find the implementation in the code differs from the paper.
In the code:
Double BPREngine::loss(const Double scoreDifference) const {
return log(1.0 + exp(-scoreDifference));
}
In the paper (the update rule, pasted from the image below):
http://photo27.hexun.com/p/2019/0201/632421/b_vip_11118BF2E45A1C8033DC6DA3576F7A2C.jpg
Θ = Θ + α( e^(-Xuij) / (1 + e^(-Xuij)) · ... )
Hi all,
I just ran BPR,
and my input data is like this:
5 million users
680,000 items
5 billion clicks
My machine details are at the bottom.
It takes about 3.5 hours to train to convergence with 20 epochs.
Is there any way to accelerate the training?
My machine is like this:
lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.3.1611 (Core)
Release: 7.3.1611
Codename: Core
cat /proc/cpuinfo| grep "physical id"| sort| uniq| wc -l
2
cat /proc/cpuinfo| grep "cpu cores"| uniq
cpu cores : 8
cat /proc/cpuinfo| grep "processor"| wc -l
32
Thanks very much
Linking CXX executable test/BPREngineTest
/usr/local/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/../../../../lib64/liblapack.a(dsytrs.f.o): In function `dsytrs_':
dsytrs.f:(.text+0x1ec): undefined reference to `dswap_'
dsytrs.f:(.text+0x247): undefined reference to `dscal_'
dsytrs.f:(.text+0x3ab): undefined reference to `dswap_'
dsytrs.f:(.text+0x3fc): undefined reference to `dswap_'
dsytrs.f:(.text+0x492): undefined reference to `dger_'
dsytrs.f:(.text+0x4cf): undefined reference to `dswap_'
dsytrs.f:(.text+0x57f): undefined reference to `dger_'
dsytrs.f:(.text+0x5f2): undefined reference to `dger_'
dsytrs.f:(.text+0x68b): undefined reference to `dgemv_'
dsytrs.f:(.text+0x6ce): undefined reference to `dswap_'
dsytrs.f:(.text+0x725): undefined reference to `dger_'
dsytrs.f:(.text+0x764): undefined reference to `dscal_'
dsytrs.f:(.text+0x815): undefined reference to `dger_'
dsytrs.f:(.text+0x86b): undefined reference to `dger_'
dsytrs.f:(.text+0x9bd): undefined reference to `dgemv_'
I ran wals and got 2 output files: one is the user matrix and the other is the item matrix.
My question is: is there a tool that can multiply the two matrices together to get the final user-item score matrix, which can be used to generate recommendations for each user?
I have the following 2 factor files as the output of wals
// user factors file
user1 factor_1 factor_2 factor_3 ...
user2 factor_1 factor_2 factor_3 ...
user3 factor_1 factor_2 factor_3 ...
...
// item factors file
item1 factor_1 factor_2 factor_3 ...
item2 factor_1 factor_2 factor_3 ...
item3 factor_1 factor_2 factor_3 ...
...
What I want is something like this
user1 item3|score item1|score item5|score ...
user2 item9|score item2|score item8|score ...
...
For each user, item recommendations are sorted by their scores.
If I have the output above, I can recommend item3, item1, item5 to user1 and item9, item2, item8 to user2, etc.
Is there a tool to do this, or do I have to write something to multiply the 2 matrices and do the filtering/sorting myself?
Thanks,
Eddie
Hi all --
In some other recommender systems, there's a flag to filter the items in the training set from the test metrics -- is there something like that in qmf?
That is, it doesn't make sense to compute p@k on the test set if we allow the top-k predictions to contain items that we observed in the train set, and therefore know won't appear in the test set.
Thanks
I can't install it on CentOS 7; I got an undefined-reference error involving google::base.
I executed the command "cmake .", but when I run "make" an error occurs: "qmf-master/qmf/utils/ThreadPool-inl.h:34:21: error: using invalid field", pointing at the code "tasks_.emplace([package]{ (*package)(); });". I have installed the dependent libs glog, gflags and lapack; my cmake version is 3.10.2 and my gcc version is 7.5.0.
According to the documentation:
--eval_num_neg (default 3): number of random negatives per positive used to generate the fixed evaluation sets mentioned above
However, I see it only once in the code, in a DEFINE:
./qmf/bpr.cpp:42:DEFINE_uint64(eval_num_neg, 3, "number of negatives generated per positive in evaluation");
What is the purpose of this flag?
Can I combine BPR (offline, for long-term static behavior) with ALS (near-line, for short-term mini-batch incremental behavior)?
Recommendation algorithms deal with ever-growing user feedback. For BPR, I can use long-term userid+itemid pairs to train the model and get the user/item latent factors (we can do that daily).
As the data grows (especially with incremental short-term user behavior), we can assume the user factors are static (new users have a cold-start problem under collaborative filtering anyway, so we can simply ignore them). We can then use the daily-trained user/item factors as the initial values for the mini-batch iterations (instead of random initialization), which we can run hourly. We fix the user vectors and iterate over only the newly arriving user+item pairs to update the item factors; call this partial ALS (alternating only on the item side while keeping the user side static). Doing this, the newly arriving data gets trained (the related items' factors are updated), and the result should be almost as good as training on the full data.