quora / qmf
A fast and scalable C++ library for implicit-feedback matrix factorization models
License: Apache License 2.0
Does qmf support non-contiguous user and item labels? (Apologies for the obvious question, but I have not studied the source code yet.)
It might be difficult to reproduce, but every now and then both the train and test loss are infinite from the beginning to the end of the run.
Have you ever noticed that?
As for ALS:
when iterating, does the left factor always need to be initialized to 0?
What if I just use setFactors(genUnif) to initialize the left factor in the first round of optimization,
and then keep the fitted factors in later iterations?
And if I change ALS training to an incremental mini-batch mode, do I still need to initialize the left factor to 0?
Thanks very much
Should I use cosine or dot product to calculate the similarity between a user & an item?
(Since there can be negative latent factors for users & items, is cosine still suitable?)
Thanks very much
As the user-behavior time window slides, behavior at different times should affect the user factor differently (for example, a click from yesterday and a click from 30 days ago should influence it quite differently).
My question is: how can I do this in BPR?
Can I just update the old factor with a decay rate,
such as
new_factor = old_factor * 0.99999999 + some_loss_error
Thanks very much
Hi Guys - we're playing around with QMF. Quick question - when running the following command
/qmf/bin$ ./wals --train_dataset=space.train --user_factors=user_factors --item_factors=item_factors --regularization_lambda=0.05 --confidence_weight=40 --nepochs=10 --nfactors=30 --nthreads=4 --test_avg_metrics=p@5,r@5 --test_always --num_test_users=30
We're expecting some form of output or information regarding the evaluation based on the documentation. The output from the binary is here.
./wals --train_dataset=space.train --user_factors=user_factors --item_factors=item_factors --regularization_lambda=0.05 --confidence_weight=40 --nepochs=10 --nfactors=30 --nthreads=4 --test_avg_metrics=p@5,r@5 --test_always --num_test_users=30
I0926 13:56:18.471247 18817 wals.cpp:85] loading training data
I0926 13:56:22.179965 18817 wals.cpp:95] training
I0926 13:56:33.102653 18817 WALSEngine.cpp:80] epoch 1: train loss = 54.118
I0926 13:56:43.987740 18817 WALSEngine.cpp:80] epoch 2: train loss = 53.8029
I0926 13:56:54.861482 18817 WALSEngine.cpp:80] epoch 3: train loss = 53.79
I0926 13:57:05.761577 18817 WALSEngine.cpp:80] epoch 4: train loss = 53.7868
I0926 13:57:16.658241 18817 WALSEngine.cpp:80] epoch 5: train loss = 53.7855
I0926 13:57:27.557302 18817 WALSEngine.cpp:80] epoch 6: train loss = 53.7849
I0926 13:57:38.434825 18817 WALSEngine.cpp:80] epoch 7: train loss = 53.7845
I0926 13:57:49.277307 18817 WALSEngine.cpp:80] epoch 8: train loss = 53.7843
I0926 13:58:00.133127 18817 WALSEngine.cpp:80] epoch 9: train loss = 53.7841
I0926 13:58:11.047634 18817 WALSEngine.cpp:80] epoch 10: train loss = 53.784
I0926 13:58:11.047685 18817 wals.cpp:99] saving model output
and no files are created in the bin dir. Can you give any suggestions as to what we could be doing wrong?
Are evalset_ and the step
evaluate(epoch);
necessary if we don't care about the loss rate in an online environment?
Without the eval set, we can save a lot of time.
Could you create a release/tag?
I've installed all the required libraries, but I got the following error when building the binaries with make. Can someone tell me how to fix it? Thanks!
Scanning dependencies of target BPREngineTest
[ 53%] Building CXX object CMakeFiles/BPREngineTest.dir/qmf/test/BPREngineTest.cpp.o
/home/a30135/tmp/qmf/qmf/test/BPREngineTest.cpp: In member function ‘virtual void qmf::BPREngine_init_Test::TestBody()’:
/home/a30135/tmp/qmf/qmf/test/BPREngineTest.cpp:33:78: error: could not convert ‘{{3, 2}, {5, 2}, {3, 4}, {6, 2}, {7, 10}}’ from ‘<brace-enclosed initializer list>’ to ‘std::vector<qmf::DatasetElem>’
std::vector<DatasetElem> dataset = {{3, 2}, {5, 2}, {3, 4}, {6, 2}, {7, 10}};
^
/home/a30135/tmp/qmf/qmf/test/BPREngineTest.cpp:61:76: error: could not convert ‘{{5, 4}, {3, 10}, {6, 12}, {8, 13}}’ from ‘<brace-enclosed initializer list>’ to ‘std::vector<qmf::DatasetElem>’
std::vector<DatasetElem> testDataset = {{5, 4}, {3, 10}, {6, 12}, {8, 13}};
^
/home/a30135/tmp/qmf/qmf/test/BPREngineTest.cpp: In member function ‘virtual void qmf::BPREngine_optimize_Test::TestBody()’:
/home/a30135/tmp/qmf/qmf/test/BPREngineTest.cpp:108:55: error: could not convert ‘{{1, 1}, {2, 2}}’ from ‘<brace-enclosed initializer list>’ to ‘std::vector<qmf::DatasetElem>’
std::vector<DatasetElem> dataset = {{1, 1}, {2, 2}};
^
/home/a30135/tmp/qmf/qmf/test/BPREngineTest.cpp:121:71: error: could not convert ‘{{1, 1}, {1, 3}, {2, 2}, {3, 1}}’ from ‘<brace-enclosed initializer list>’ to ‘std::vector<qmf::DatasetElem>’
std::vector<DatasetElem> dataset = {{1, 1}, {1, 3}, {2, 2}, {3, 1}};
^
/home/a30135/tmp/qmf/qmf/test/BPREngineTest.cpp:140:71: error: could not convert ‘{{1, 1}, {1, 3}, {2, 2}, {3, 1}}’ from ‘<brace-enclosed initializer list>’ to ‘std::vector<qmf::DatasetElem>’
std::vector<DatasetElem> dataset = {{1, 1}, {1, 3}, {2, 2}, {3, 1}};
^
make[2]: *** [CMakeFiles/BPREngineTest.dir/qmf/test/BPREngineTest.cpp.o] Error 1
make[1]: *** [CMakeFiles/BPREngineTest.dir/all] Error 2
make: *** [all] Error 2
Hi All --
Has anyone ever benchmarked this package on the MovieLens datasets? I'm looking at this for the first time and trying to figure out which hyperparameters need to be tuned to get good performance.
Thanks!
~ Ben
EDIT: Related -- if anyone has a pointer to a properly parameterized example of using this library, that'd also be super helpful.
Is test_dataset not available with wals, or is it missing from the documentation?
Hi Guys - I'm getting the following error when running make. Not sure if this is an isolated issue.
/qmf$ make
[ 42%] Built target qmf
[ 46%] Built target gtest
[ 50%] Built target gtest_main
Scanning dependencies of target BPREngineTest
[ 53%] Building CXX object CMakeFiles/BPREngineTest.dir/qmf/test/BPREngineTest.cpp.o
/home/ubuntu/qmf/qmf/test/BPREngineTest.cpp: In member function ‘virtual void qmf::BPREngine_init_Test::TestBody()’:
/home/ubuntu/qmf/qmf/test/BPREngineTest.cpp:33:78: error: could not convert ‘{{3, 2}, {5, 2}, {3, 4}, {6, 2}, {7, 10}}’ from ‘<brace-enclosed initializer list>’ to ‘std::vector<qmf::DatasetElem>’
std::vector<DatasetElem> dataset = {{3, 2}, {5, 2}, {3, 4}, {6, 2}, {7, 10}};
^
/home/ubuntu/qmf/qmf/test/BPREngineTest.cpp:61:76: error: could not convert ‘{{5, 4}, {3, 10}, {6, 12}, {8, 13}}’ from ‘<brace-enclosed initializer list>’ to ‘std::vector<qmf::DatasetElem>’
std::vector<DatasetElem> testDataset = {{5, 4}, {3, 10}, {6, 12}, {8, 13}};
^
/home/ubuntu/qmf/qmf/test/BPREngineTest.cpp: In member function ‘virtual void qmf::BPREngine_optimize_Test::TestBody()’:
/home/ubuntu/qmf/qmf/test/BPREngineTest.cpp:108:55: error: could not convert ‘{{1, 1}, {2, 2}}’ from ‘<brace-enclosed initializer list>’ to ‘std::vector<qmf::DatasetElem>’
std::vector<DatasetElem> dataset = {{1, 1}, {2, 2}};
^
/home/ubuntu/qmf/qmf/test/BPREngineTest.cpp:121:71: error: could not convert ‘{{1, 1}, {1, 3}, {2, 2}, {3, 1}}’ from ‘<brace-enclosed initializer list>’ to ‘std::vector<qmf::DatasetElem>’
std::vector<DatasetElem> dataset = {{1, 1}, {1, 3}, {2, 2}, {3, 1}};
^
/home/ubuntu/qmf/qmf/test/BPREngineTest.cpp:140:71: error: could not convert ‘{{1, 1}, {1, 3}, {2, 2}, {3, 1}}’ from ‘<brace-enclosed initializer list>’ to ‘std::vector<qmf::DatasetElem>’
std::vector<DatasetElem> dataset = {{1, 1}, {1, 3}, {2, 2}, {3, 1}};
^
make[2]: *** [CMakeFiles/BPREngineTest.dir/qmf/test/BPREngineTest.cpp.o] Error 1
make[1]: *** [CMakeFiles/BPREngineTest.dir/all] Error 2
make: *** [all] Error 2
Matrix WALSEngine::computeXtX(const Matrix& X)
Can the qmf library be installed on a Mac? Thanks.
When I use BPR, the train loss and test loss are inf, and the result is inf.
As for the loss function and the update of the latent vectors (parameters),
I find the implementation in the code differs from the paper.
In the code:
Double BPREngine::loss(const Double scoreDifference) const {
return log(1.0 + exp(-scoreDifference));
}
In the paper (the update rule, pasted from the image below):
http://photo27.hexun.com/p/2019/0201/632421/b_vip_11118BF2E45A1C8033DC6DA3576F7A2C.jpg
Θ = Θ + α( e^(-Xuij) / (1 + e^(-Xuij)) · ... )
Hi all,
I just ran BPR,
and my input data is like this:
5 million users
680,000 items
5 billion clicks
My machine details are at the bottom.
It takes about 3.5 hours to train to convergence with 20 epochs.
Is there any way to accelerate the training?
My machine is like this:
lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.3.1611 (Core)
Release: 7.3.1611
Codename: Core
cat /proc/cpuinfo| grep "physical id"| sort| uniq| wc -l
2
cat /proc/cpuinfo| grep "cpu cores"| uniq
cpu cores : 8
cat /proc/cpuinfo| grep "processor"| wc -l
32
Thanks very much
Linking CXX executable test/BPREngineTest
/usr/local/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/../../../../lib64/liblapack.a(dsytrs.f.o): In function `dsytrs_':
dsytrs.f:(.text+0x1ec): undefined reference to `dswap_'
dsytrs.f:(.text+0x247): undefined reference to `dscal_'
dsytrs.f:(.text+0x3ab): undefined reference to `dswap_'
dsytrs.f:(.text+0x3fc): undefined reference to `dswap_'
dsytrs.f:(.text+0x492): undefined reference to `dger_'
dsytrs.f:(.text+0x4cf): undefined reference to `dswap_'
dsytrs.f:(.text+0x57f): undefined reference to `dger_'
dsytrs.f:(.text+0x5f2): undefined reference to `dger_'
dsytrs.f:(.text+0x68b): undefined reference to `dgemv_'
dsytrs.f:(.text+0x6ce): undefined reference to `dswap_'
dsytrs.f:(.text+0x725): undefined reference to `dger_'
dsytrs.f:(.text+0x764): undefined reference to `dscal_'
dsytrs.f:(.text+0x815): undefined reference to `dger_'
dsytrs.f:(.text+0x86b): undefined reference to `dger_'
dsytrs.f:(.text+0x9bd): undefined reference to `dgemv_'
I ran wals and got 2 output files: one is the user matrix and the other is the item matrix.
My question is: is there a tool that can multiply the two matrices together to get the final user-item score matrix, which can be used to generate recommendations for each user?
I have the following 2 factor files as the output of wals
// user factors file
user1 factor_1 factor_2 factor_3 ...
user2 factor_1 factor_2 factor_3 ...
user3 factor_1 factor_2 factor_3 ...
...
// item factors file
item1 factor_1 factor_2 factor_3 ...
item2 factor_1 factor_2 factor_3 ...
item3 factor_1 factor_2 factor_3 ...
...
What I want is something like this
user1 item3|score item1|score item5|score ...
user2 item9|score item2|score item8|score ...
...
For each user, item recommendations are sorted by their scores.
If I have the output above, I can recommend item3, item1, item5 to user1 and item9, item2, item8 to user2, etc.
Is there a tool to do this, or do I have to write something to multiply the 2 matrices and do the filtering/sorting myself?
Thanks,
Eddie
Hi all --
In some other recommender systems, there's a flag to filter the items in the training set from the test metrics -- is there something like that in qmf?
That is, it doesn't make sense to compute p@k on the test set if we allow the top-k predictions to contain items that we observed in the train set, and therefore know won't appear in the test set.
Thanks
I can't install it on CentOS 7; I got an undefined-reference error involving google::base.
I executed the command "cmake .", but when I run "make" an error occurs: "qmf-master/qmf/utils/ThreadPool-inl.h:34:21: error: using invalid field", pointing at the code "tasks_.emplace([package]{ (*package)(); });". I have installed the dependent libs glog, gflags and lapack; my cmake version is 3.10.2 and my gcc version is 7.5.0.
According to the documentation:
--eval_num_neg (default 3): number of random negatives per positive used to generate the fixed evaluation sets mentioned above
However, I see it only once in the code, in a DEFINE:
./qmf/bpr.cpp:42:DEFINE_uint64(eval_num_neg, 3, "number of negatives generated per positive in evaluation");
What is the purpose of this flag?
Can I combine BPR (offline, for long-term static behavior) with ALS (near-line, for short-term mini-batch incremental behavior)?
Recommendation algorithms deal with ever-growing user feedback. For BPR, I can use long-term userid+itemid pairs to train the model and get the user/item latent factors (we can do that daily).
As the data grows (especially with incremental short-term user behavior), we can assume the user factors are static (new users have a cold-start problem under collaborative filtering anyway, so we can simply ignore them). We can then use the daily-trained user/item factors as the initial values for the mini-batch iterations (instead of random initialization), which we can run hourly. We fix the user vectors and iterate over only the newly arriving user+item pairs to update the item factors; call this partial ALS (alternating only on the item side while keeping the user side static). Doing this, the newly arriving data gets trained (the related items' factors are updated), and the result should be almost as good as training on the full data.