
alphafm's People

Contributors

castellanzhang


alphafm's Issues

Can fractional labels be supported?

FM can also be used for regression, but from the code it looks like the label can only be binary. If I want to learn a score in [0, 1], i.e. do regression, how should I modify the code? Thanks!

fm_predict fails with "load model error"

cat /data/zhangbo_ret/data/test_data/ptr_train_libsvm_data_alpha_test_off.txt | /root/work/FM_Recall/alphaFM/bin/fm_predict -core 15 -dim 8 -m /data/zhangbo_ret/data/test_data/alpha_fm_ptr_param_train.txt -mf txt -out /data/zhangbo_ret/data/test_data/alpha_fm_ptr_predict_result.txt

The model parameter file I trained is alpha_fm_ptr_param_train.txt, with dim 8, but when I run prediction as described in the README it keeps failing with load model... load model error!
What could be going on? Adding the -mf flag did not help. Has anyone run into this?

How is the label in the predict output file determined?

[screenshot of fm_predict output omitted]

Hi, I used fm_predict to output the label and the probability. I assumed label = 1 when score > 0.5, but the output does not seem to bear that out. How is the label in the output file determined?

Segmentation fault

Occasionally I hit a segmentation fault during training.
Is there any way to avoid it, or to track down the cause?

Bug in load model

Hi, I ran into a bug while using alphaFM:
I trained with a non-default dim setting such as 1,1,2, but prediction then fails with an exception.
The root cause is that outputModel does not write the factor num:

void ftrl_model::outputModel(ofstream& out)
{
    out << "bias " << *muBias << endl;
    for(unordered_map<string, ftrl_model_unit*>::iterator iter = muMap.begin(); iter != muMap.end(); ++iter)
    {
        out << iter->first << " " << *(iter->second) << endl;
    }
}

whereas loadModel falls back to the default factor num, so the per-line size check fails:

while(getline(in, line))
{
    strVec.clear();
    utils::splitString(line, ' ', &strVec);
    if(strVec.size() != 3 * factor_num + 4)
    {
        return false;
    }
    string& index = strVec[0];
    ftrl_model_unit* pMU = new ftrl_model_unit(factor_num, strVec);
    muMap[index] = pMU;
}

How well does alphaFM perform?

Hi,
Many thanks for alphaFM; it is fast and easy to use. On my data, however, it performs far worse than an LR model (compared using alphaLR). How does alphaFM perform in your own tests?

During online training, after loading the model it hangs at "start! init end!" and never starts training

Below is my training output. After init end! was printed at 02:40:45, training had still not started eight hours later. The machine has 40 cores, so 13 threads should be no problem; top shows no other programs using much memory. What are the possible causes?
the train command is:
hadoop fs -cat hdfs://xxxxx/20190124/22/* | /home/stat/alphaFM/bin/fm_train -imf txt -im model_test.txt -init_stdev 0 -core 13 -w_l1 9.91 -w_alpha 0.01 -dim 1,1,0 -mf txt -m model_test.txt
load model...
model loading finished
[2019-01-25 02:40:45] start!
[2019-01-25 02:40:45] init end!

fm_predict, multi-core, random order

When using fm_predict with -core 30, the output order differs from the input order.
That is fine if you only need AUC, but for GAUC you cannot associate each label/score with its uid, because fm_predict's output order is random.
You can preserve the order with -core 1, but that is very slow.

Any improvements?

Question about memory usage

Is this project still maintained? The code appears to have a serious memory leak; I don't know C++, but looking at the code I don't see anything that frees memory.
Memory keeps growing during training: after 1.7 billion samples it has reached nearly 80% of RAM, about 100 GB.

          total        used        free      shared  buff/cache   available

Mem: 125G 101G 6.2G 8.0M 18G 18G

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22797 algorit+ 20 0 0.098t 0.097t 1544 S 50.2 79.4 1280:04 fm_train

Sample weights

Suppose I have 98 negative samples (feature, 0)
and 2 positive samples (feature, 1).

Could these 100 samples be collapsed into one (feature, 0) with weight 98 and one (feature, 1) with weight 2?

A question about the PS framework

alphaFM really is a great tool; much respect!
BTW, in your blog you mentioned implementing distributed training on a parameter-server (PS) framework. That sounds very appealing; have you implemented it yet? Any pointers would be much appreciated.

Have you tried the Adam optimizer?

Hi, it's me again.

(1) In my experiments, changing mult to the standard h(x) - y and mapping labels below 0 to 0 improved AUC slightly (by about 0.002);

(2) I'm now trying Adam in place of FTRL; have you run that experiment before?

Best regards!

no member named '_Hash_node_base' in namespace 'std::__1::__detail'

macOS

g++ -O3 fm_train.cpp src/Frame/pc_frame.cpp src/Utils/utils.cpp -I . -std=c++11 -o bin/fm_train -lpthread
In file included from fm_train.cpp:5:
In file included from ./src/FTRL/ftrl_trainer.h:5:
In file included from ./src/FTRL/ftrl_model.h:14:
./src/FTRL/../Mem/my_allocator.h:42:58: error: expected expression
if(typeid(T) == typeid(__detail::_Hash_node_base*))
^
./src/FTRL/../Mem/my_allocator.h:42:42: error: no member named '_Hash_node_base' in namespace 'std::__1::__detail'
if(typeid(T) == typeid(__detail::_Hash_node_base*))

./src/FTRL/../Mem/my_allocator.h:53:32: error: no template named '_Hash_node' in namespace 'std::__1::__detail'; did you mean '__hash_node'?
if(typeid(T) != typeid(std::__detail::_Hash_node<std::pair<const char* const, MODEL_UNIT >, false>))
^~~~~~~~~~~~~~~~~~~~~~~~~
__hash_node
/Library/Developer/CommandLineTools/usr/include/c++/v1/__hash_table:95:8: note: '__hash_node' declared here
struct __hash_node
^
In file included from fm_train.cpp:5:
In file included from ./src/FTRL/ftrl_trainer.h:5:
In file included from ./src/FTRL/ftrl_model.h:14:
./src/FTRL/../Mem/my_allocator.h:53:104: error: template argument for template type parameter must be a type
if(typeid(T) != typeid(std::__detail::_Hash_node<std::pair<const char* const, MODEL_UNIT >, false>))
^~~~~
/Library/Developer/CommandLineTools/usr/include/c++/v1/__hash_table:94:28: note: template parameter is declared here
template <class _Tp, class _VoidPtr>
^
In file included from fm_train.cpp:5:
In file included from ./src/FTRL/ftrl_trainer.h:5:
In file included from ./src/FTRL/ftrl_model.h:14:
./src/FTRL/../Mem/my_allocator.h:64:58: error: expected expression
if(typeid(T) == typeid(__detail::_Hash_node_base*))
^
./src/FTRL/../Mem/my_allocator.h:64:42: error: no member named '_Hash_node_base' in namespace 'std::__1::__detail'
if(typeid(T) == typeid(__detail::_Hash_node_base*))
~~~~~~~~~~^
./src/FTRL/../Mem/my_allocator.h:82:16: error: use of undeclared identifier '_Hash_impl'
return _Hash_impl::hash(key, strlen(key));
^
In file included from fm_train.cpp:5:
In file included from ./src/FTRL/ftrl_trainer.h:5:
./src/FTRL/ftrl_model.h:44:27: error: no template named '_Hash_node' in namespace 'std::__1::__detail'; did you mean '__hash_node'?
using node_type = std::__detail::_Hash_node<std::pair<const char* const, ftrl_model_unit >, false>;
^~~~~~~~~~~~~~~~~~~~~~~~~
__hash_node
/Library/Developer/CommandLineTools/usr/include/c++/v1/__hash_table:95:8: note: '__hash_node' declared here
struct __hash_node
^
In file included from fm_train.cpp:5:
In file included from ./src/FTRL/ftrl_trainer.h:5:
./src/FTRL/ftrl_model.h:44:104: error: template argument for template type parameter must be a type
using node_type = std::__detail::_Hash_node<std::pair<const char* const, ftrl_model_unit >, false>;
^~~~~
/Library/Developer/CommandLineTools/usr/include/c++/v1/__hash_table:94:28: note: template parameter is declared here
template <class _Tp, class _VoidPtr>
^
In file included from fm_train.cpp:5:
In file included from ./src/FTRL/ftrl_trainer.h:5:
./src/FTRL/ftrl_model.h:45:61: error: use of undeclared identifier 'node_type'
size_t offset_this = get_value_offset_in_Hash_node((node_type*)NULL);
^
./src/FTRL/ftrl_model.h:45:71: error: expected expression
size_t offset_this = get_value_offset_in_Hash_node((node_type*)NULL);
^
./src/FTRL/ftrl_model.h:46:33: error: unknown type name 'node_type'; did you mean 'true_type'?
size_t padding = sizeof(node_type) - offset_this - class_size;
^~~~~~~~~
true_type
/Library/Developer/CommandLineTools/usr/include/c++/v1/type_traits:540:38: note: 'true_type' declared here
typedef _LIBCPP_BOOL_CONSTANT(true) true_type;
^
In file included from fm_train.cpp:5:
In file included from ./src/FTRL/ftrl_trainer.h:5:
In file included from ./src/FTRL/ftrl_model.h:4:
/Library/Developer/CommandLineTools/usr/include/c++/v1/unordered_map:826:5: error: static_assert failed due to requirement 'is_same<value_type, typename
allocator_type::value_type>::value' "Invalid allocator::value_type"
static_assert((is_same<value_type, typename allocator_type::value_type>::value),
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./src/FTRL/ftrl_model.h:204:20: note: in instantiation of template class 'std::__1::unordered_map<const char *, ftrl_model_unit, my_hash, my_equal,
my_allocator<std::__1::pair<const char *, ftrl_model_unit >, float, ftrl_model_unit> >' requested here
my_hash_map muMap;
^
./src/FTRL/ftrl_trainer.h:191:18: note: in instantiation of template class 'ftrl_model' requested here
pModel = new ftrl_model(opt.factor_num, opt.init_mean, opt.init_stdev);
^
fm_train.cpp:40:21: note: in instantiation of member function 'ftrl_trainer::ftrl_trainer' requested here
ftrl_trainer trainer(opt);
^
fm_train.cpp:87:16: note: in instantiation of function template specialization 'train' requested here
return train(opt);
^
In file included from fm_train.cpp:5:
In file included from ./src/FTRL/ftrl_trainer.h:5:
In file included from ./src/FTRL/ftrl_model.h:4:
/Library/Developer/CommandLineTools/usr/include/c++/v1/unordered_map:826:5: error: static_assert failed due to requirement 'is_same<value_type, typename
allocator_type::value_type>::value' "Invalid allocator::value_type"
static_assert((is_same<value_type, typename allocator_type::value_type>::value),
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./src/FTRL/ftrl_model.h:204:20: note: in instantiation of template class 'std::__1::unordered_map<const char *, ftrl_model_unit, my_hash, my_equal,
my_allocator<std::__1::pair<const char *, ftrl_model_unit >, double, ftrl_model_unit> >' requested here
my_hash_map muMap;
^
./src/FTRL/ftrl_trainer.h:191:18: note: in instantiation of template class 'ftrl_model' requested here
pModel = new ftrl_model(opt.factor_num, opt.init_mean, opt.init_stdev);
^
fm_train.cpp:40:21: note: in instantiation of member function 'ftrl_trainer::ftrl_trainer' requested here
ftrl_trainer trainer(opt);
^
fm_train.cpp:89:12: note: in instantiation of function template specialization 'train' requested here
return train(opt);
^
14 errors generated.
src/Frame/pc_frame.cpp:9:5: warning: 'sem_init' is deprecated [-Wdeprecated-declarations]
sem_init(&semPro, 0, 1);
^
/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/sys/semaphore.h:55:42: note: 'sem_init' has been explicitly marked deprecated here
int sem_init(sem_t *, int, unsigned int) __deprecated;
^
/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/sys/cdefs.h:176:40: note: expanded from macro '__deprecated'
#define __deprecated attribute((deprecated))
^
src/Frame/pc_frame.cpp:10:5: warning: 'sem_init' is deprecated [-Wdeprecated-declarations]
sem_init(&semCon, 0, 0);
^
/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/sys/semaphore.h:55:42: note: 'sem_init' has been explicitly marked deprecated here
int sem_init(sem_t *, int, unsigned int) __deprecated;
^
/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/sys/cdefs.h:176:40: note: expanded from macro '__deprecated'
#define __deprecated attribute((deprecated))
^
2 warnings generated.
make: *** [all] Error 1

The locking in FTRL's train looks problematic

With multi-threaded training, one thread may be updating a parameter while another thread reads it. The program runs fine thanks to the lock, but it seems a single sample's update to the parameters can still be applied inconsistently.
Line 280: mu.mtx.lock();
I think the lock should be taken outside the loop body.

What loss function does the code use?

I see gi computed as follows:
double mult = y * (1 / (1 + exp(-p * y)) - 1);
double w_gi = mult * xi;

Shouldn't it be p - y? Why is mult computed that way?
Any explanation would be appreciated!

Converting a model from txt to bin reports: read file error

Command:
./model_bin_tool -task 4 -im ./models/click_model_20200510_filter -om ./models/click_model_20200510_filter_bin -dim 8 -mnt float
Model:
F0 -0.264242
weekday_6 0.0173578 -0.0102694 0 -0.022693 0.00254934 0.0396216 0 0 -0.00407052
dist_interval_4000_4500 -0.00691738 0 0.00269233 0.000255495 0 -0.00661624 0.00172624 -0.0268299 -0.00641555
rec 0.00342799 0 0 -0.0272095 0.0139266 -0.0251638 0 -0.00760475 0.000963799
dist_interval_3500_4000 0.0140954 0.00791469 -0.0205997 0.0250574 0 0.041754 -0.0493869 0.00497005 -0.0204439

As online training proceeds, w_z keeps growing, hurting sparsity and lowering AUC

Hi, I'm using alphaFM to train an LR model online. As training continues, each feature's occurrence count keeps growing, and so does the absolute value of w_z. A feature's weight is zeroed only when w_l1 > abs(w_z), so as w_z grows the number of features with non-zero weights explodes and AUC gradually drops. From the FTRL formulas this seems unavoidable. How can it be addressed? Should w_l1 be increased gradually as training proceeds?
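For context, FTRL-Proximal's closed-form per-coordinate weight makes the behavior explicit: the coordinate is zero only while the accumulated term z_i stays inside the L1 threshold.

```latex
w_i =
\begin{cases}
0, & |z_i| \le \lambda_1, \\[4pt]
-\left(\lambda_2 + \dfrac{\beta + \sqrt{n_i}}{\alpha}\right)^{-1}
  \bigl(z_i - \operatorname{sgn}(z_i)\,\lambda_1\bigr), & \text{otherwise,}
\end{cases}
```

so once |z_i| drifts past a fixed λ₁, that coordinate stays dense.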

Looking to get in touch to discuss FM

Hi CastellanZhang,
I'm also working on FM and related algorithms and am very interested in your project, for example how to further improve single-machine throughput with OpenMP, and how many passes over the data FTRL actually needs. I'd like to ask you about these and more. My email is [email protected]; what is yours? I hope we can get in touch.

Dropping features that appear too rarely

Hi, I'd like training to ignore features that appear too rarely, only including them once they have appeared often enough.
Is it enough to add a counter cnt to model_unit and then, in src/FTRL/ftrl_trainer.h, change
if(fabs(mu.w_zi) <= w_l1)
to
if(fabs(mu.w_zi) <= w_l1 || mu.cnt <= cntThre)
?

Thanks!

Cross-platform model loading

Hi, is there a corresponding Java library? If a model trained with the C++ code is saved, what is a good way to load it from Java?

about exp/log

I use FTRL to optimize logistic regression and find it slower than newGLMNET.
I wonder why; is it because computing the loss requires exp?

Why is my alphaFM AUC far worse than LR's?

In the prediction file, the first field is the label (1/-1), followed by the probability. When computing AUC, I derived the model's predicted probability as follows:

  • if the label is 1 (in the fm_pre.txt file), the model's probability of class 1 is prob;
  • if the label is -1, the model's probability of class 1 is 1 - prob.

Feeding the probabilities and labels prepared this way into BinaryClassificationMetrics yields an AUC below 1%, while LR gets close to 90%. Where might things be going wrong?

Question about alphaFM's model output

In the model that alphaFM outputs, the bias comes with three values, but for both FM and LR shouldn't the bias be just a single w0?
Also, with dim set to 1,1,0 the model is standard LR, so why does each feature name also map to three values?
Thanks!

Throughput won't stay up

I sync samples down from HDFS with -text and pipe them into fm_train. At first processing is fast and CPU utilization reaches the core count (-core 48; all 48 threads initially saturate the CPU). But after a few minutes CPU drops to around 1400% and never recovers. Does anyone know what might cause this?

How do I transform the piped data in flight?

@CastellanZhang Hi! The README says:
because alphaFM reads training samples through a pipe, besides saving memory you can also apply all kinds of intermediate transformations to the input, such as sampling or format conversion
I don't really know C++ 😓 and couldn't find how to do this in the source. Could you give a usage example? Thanks!
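A sketch of the idea (the demo data and the sampling rule are hypothetical): any command inserted between the data source and fm_train transforms the stream, for example keeping every positive line but only every second negative line with awk. In a real run the printf would be replaced by the actual source, e.g. hadoop fs -cat, and the result piped into ./bin/fm_train exactly as in the README.

```shell
# Hypothetical demo stream standing in for `hadoop fs -cat .../*`;
# the awk filter keeps every positive line and every second line overall,
# i.e. a crude negative downsample. Prints 3 of the 4 lines.
printf '1 f1:1\n-1 f2:1\n-1 f3:1\n-1 f4:1\n' \
  | awk '$1 == 1 || NR % 2 == 0'
# Real usage would continue: ... | ./bin/fm_train -core 4 -dim 1,1,8 -m model.txt
```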

Would you consider adding a CV mode?

Many thanks to the author for this generous contribution. If the online serving engine is Java, for example, is there any plan to provide model-loading interfaces for FM prediction in other languages?

Same data: dim 1,1,8 gives much lower AUC than dim 1,1,0

Hi, a question: with 30 million samples and the same 120k-dimensional sparse features, AUC is 0.70 with dim 1,1,0 but only 0.54 with dim 1,1,8. I have tuned hyperparameters for a long time with no improvement.
This is strange; in principle FM should fit better than LR, so why does AUC drop so much?

force_v_sparse

force_v_sparse forces v = 0 whenever w = 0. Is there any theoretical justification for this?
For example, age alone may have no effect on the prediction, while age combined with gender might.
If age's w = 0 forces age's v = 0, won't every cross feature involving age become 0 as well?
