nzc / dnn_ctr

A framework for CTR prediction problems. The project contains FNN, PNN, DeepFM, NFM, etc.

Python 100.00%

dnn_ctr's Issues

About the data

Where does the data used here come from?

Importing data_preprocess from utils raises this error; please advise

cannot import name 'data_preprocess'

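About NFM

Author, you may have misunderstood NFM.

https://github.com/nzc/dnn_ctr/blob/master/model/NFM.py#L235

Looking at this line: suppose the data has 39 fields and the FM embedding vectors have length 4. The vector you build from FM has 39*(39-1)/2 = 741 entries, namely the pairwise inner products of the fields' latent vectors.

However, the official implementation uses the latent vector obtained after the FM summation, which in this setting should have length 4.

You can check the official implementation (this is a torch port; the original TF version should be roughly the same):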

https://github.com/guoyang9/NFM-pyorch
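
To make the size difference concrete, here is a minimal NumPy sketch. NumPy is used instead of torch only to keep the example dependency-light, and the random embeddings and variable names are illustrative assumptions, not code from either repository. It builds the 741 pairwise inner products described above and, separately, a bi-interaction pooled vector of length 4 in the style of the NFM paper.

```python
import itertools
import numpy as np

num_fields, embed_dim = 39, 4
rng = np.random.default_rng(0)
# One embedding vector per field for a single sample (illustrative random values).
emb = rng.normal(size=(num_fields, embed_dim))
pairs = list(itertools.combinations(range(num_fields), 2))

# Pairwise inner products <v_i, v_j> for i < j: one scalar per field pair.
pairwise = np.array([emb[i] @ emb[j] for i, j in pairs])

# Bi-interaction pooling: sum of element-wise products v_i * v_j for i < j,
# which keeps the embedding dimension.
bi_interaction = sum(emb[i] * emb[j] for i, j in pairs)

print(pairwise.shape)        # (741,)
print(bi_interaction.shape)  # (4,)
```

The two quantities are related: summing the 4 entries of bi_interaction gives the same number as summing all 741 inner products. As inputs to the hidden layers, though, they have very different widths.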

About the NFM model

The paper uses a closed-form identity to simplify this computation.
NFM.py seems to use a loop instead; isn't that less efficient?
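
The identity in question is presumably the bi-interaction simplification from the NFM paper, f_BI = 1/2 * [ (sum_i x_i v_i)^2 - sum_i (x_i v_i)^2 ], with the squares taken element-wise. The NumPy sketch below compares the explicit double loop with the closed form on a small batch. It is an illustration under that assumption, not the repository's code, and the function names are hypothetical.

```python
import numpy as np

def bi_interaction_loop(xv):
    """xv: (batch, num_fields, embed_dim) array holding x_i * v_i.

    Explicit pairwise sum, quadratic in the number of fields.
    """
    batch, num_fields, embed_dim = xv.shape
    out = np.zeros((batch, embed_dim))
    for i in range(num_fields):
        for j in range(i + 1, num_fields):
            out += xv[:, i, :] * xv[:, j, :]
    return out

def bi_interaction_closed_form(xv):
    """Same quantity, linear in the number of fields."""
    sum_then_square = xv.sum(axis=1) ** 2
    square_then_sum = (xv ** 2).sum(axis=1)
    return 0.5 * (sum_then_square - square_then_sum)

rng = np.random.default_rng(0)
xv = rng.normal(size=(8, 39, 4))
assert np.allclose(bi_interaction_loop(xv), bi_interaction_closed_form(xv))
```

The closed form touches each field once, so the cost is linear in the number of fields and the operations vectorize over the batch.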

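DeepFM, multi-class classification

Will this code provide a multi-class version? I tried to adapt it to multi-class classification but failed.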

About DeepFM

Hello, I ran your code and got the following error. How can I fix it? This is my first time using PyTorch.
RuntimeError: cuda runtime error (10) : invalid device ordinal at torch/csrc/cuda/Module.cpp:88

Has anyone tested with the full dataset? Can it reach the loss reported in the paper?

About data preprocessing

Hello, when I use the kaggle-2014-criteo winner's code to generate category_emb.csv, I get this error. How can I fix it?

/bin/sh: 1: ./converters/pre-a.py: not found
cat: tr.gbdt.dense.tmp.0: No such file or directory
cat: tr.gbdt.sparse.tmp.0: No such file or directory

Unable to find ffnn.pkl file

Hi,
Could you please help me fix the following issue?
Traceback (most recent call last):
  File "main.py", line 4, in <module>
    from model import FNN
  File "/root/dnn-ctr-fnn/dnn_ctr/model/FNN.py", line 540, in <module>
    fnn.load_state_dict(torch.load('./data/model/ffnn.pkl'))
  File "/root/.local/lib/python2.7/site-packages/torch/serialization.py", line 356, in load
    f = open(f, 'rb')
IOError: [Errno 2] No such file or directory: './data/model/ffnn.pkl'

For dcn model

In dnn_ctr/model/DCN.py, line 232 in c750fec:

    x_l = torch.sum(x_0 * x_l, 1).view([-1,1]) * getattr(self,'cross_weight_'+str(i+1)).view([1,-1]) + getattr(self,'cross_bias_'+str(i+1)) + x_l

x_0 * x_l should be replaced by torch.matmul(x_0, x_l.t()), right?
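
For reference, the cross layer in the DCN paper is x_{l+1} = x_0 x_l^T w_l + b_l + x_l for a single sample. With batched tensors of shape (batch, d), torch.matmul(x_0, x_l.t()) would produce a (batch, batch) matrix that mixes samples, so it is probably not the intended fix either. The NumPy sketch below (NumPy stands in for torch, and the function names are hypothetical) compares three things: the per-sample formula, an equivalent batched form that never builds the d x d outer product, and a transcription of the quoted line.

```python
import numpy as np

def cross_layer_explicit(x0, xl, w, b):
    """Paper formula per sample: outer(x0, xl) applied to w, plus b and xl."""
    out = np.empty_like(x0)
    for n in range(x0.shape[0]):
        out[n] = np.outer(x0[n], xl[n]) @ w + b + xl[n]
    return out

def cross_layer_batched(x0, xl, w, b):
    """Equivalent form: (xl . w) is one scalar per sample that rescales x0."""
    scale = xl @ w                      # shape (batch,)
    return x0 * scale[:, None] + b + xl

def cross_layer_quoted(x0, xl, w, b):
    """NumPy transcription of the quoted line: (x0 . xl) rescales w instead."""
    return np.sum(x0 * xl, axis=1)[:, None] * w[None, :] + b + xl

rng = np.random.default_rng(0)
x0, xl = rng.normal(size=(5, 6)), rng.normal(size=(5, 6))
w, b = rng.normal(size=6), rng.normal(size=6)

explicit = cross_layer_explicit(x0, xl, w, b)
print(np.allclose(explicit, cross_layer_batched(x0, xl, w, b)))
print(np.allclose(explicit, cross_layer_quoted(x0, xl, w, b)))
```

In this sketch the batched form agrees with the per-sample formula, while the transcription of the quoted line does not. The quoted line scales w by the inner product of x_0 and x_l, whereas the paper's formula scales x_0 by the inner product of x_l and w.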

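Gradient computation for the embedding in DeepFM

Hello, I have a question about gradient computation. For the embedding-layer parameters in DeepFM (i.e. second_order_emb), does torch compute the gradients from the deep part and the FM part separately and then sum them to obtain the update step? Also, are there any tricks for initializing this embedding layer?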
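
On the first question: when a parameter feeds several branches of one loss, reverse-mode autodiff (as in PyTorch) accumulates the contribution from each branch, so the total gradient is their sum. The sketch below checks this by hand in NumPy on a toy model. The two branches, their weights, and all names are illustrative assumptions, not the DeepFM code in this repository.

```python
import numpy as np

rng = np.random.default_rng(0)
e = rng.normal(size=4)          # shared embedding vector (illustrative)
u = rng.normal(size=4)          # another field's embedding, used by the FM branch
W1 = rng.normal(size=(3, 4))    # toy deep-branch weights
w2 = rng.normal(size=3)

def fm_branch(e):
    return e @ u                          # second-order interaction <e, u>

def deep_branch(e):
    return w2 @ np.tanh(W1 @ e)           # one hidden layer, scalar output

def total(e):
    return fm_branch(e) + deep_branch(e)  # both branches feed the same logit

# Analytic gradient of each branch with respect to the shared embedding.
grad_fm = u
grad_deep = W1.T @ (w2 * (1.0 - np.tanh(W1 @ e) ** 2))

# Central finite differences on the combined output.
eps = 1e-6
basis = np.eye(4)
grad_numeric = np.array([
    (total(e + eps * basis[k]) - total(e - eps * basis[k])) / (2 * eps)
    for k in range(4)
])

assert np.allclose(grad_fm + grad_deep, grad_numeric, atol=1e-6)
```

The numerical gradient of the combined output matches the sum of the two per-branch gradients, which is the quantity an optimizer step would then use.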

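About the feature engineering approach

Hello! I'm glad to have found your code, but we ran into some problems when training on our own dataset.
Question 1: feature engineering
We are curious about how the features are encoded. If possible, could you share the code (or a link to it) that encodes the features of the Criteo dataset?
Question 2: label encoding
While reading the code, I noticed that when the data is loaded, index corresponds to the contents of Xi_train, which is the data read from the csv, whereas value is the labels 1-39. This puzzles me. If possible, could you briefly describe the reason for doing it this way? Or could you briefly explain what each column of the embed label csv represents?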

About the DeepFM model

Hello, after forking your code and running main.py, I get an error.

The file **./data/category_emb.csv** is missing. How should I generate this file?
Thanks.

Already updated

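About DCN

First of all, thanks for sharing. The paper says "We propose the DCN model that enables Web-scale automatic feature learning with both sparse and dense inputs", which means sparse data can be used as training data. Does your code provide such an interface?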

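Dataset loading

I see the author has a small dataset that is read directly with data_preprocess.py into a dict. When predicting on real data, should the whole dataset be loaded into memory at once? I tried that and the process was killed for running out of memory. How can this be solved?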
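
One common way around this is to stream the file in mini-batches instead of building one dict for the whole dataset. The sketch below uses only the standard library. The function name iter_batches, the column layout (label in the first column), and the file path in the usage note are assumptions for illustration, not the format that data_preprocess.py expects.

```python
import csv
from itertools import islice

def iter_batches(path, batch_size):
    """Yield (labels, rows) mini-batches without loading the whole file."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        while True:
            # Pull at most batch_size rows from the reader.
            chunk = list(islice(reader, batch_size))
            if not chunk:
                break
            # Assumed layout: first column is the 0/1 label, the rest are features.
            labels = [int(row[0]) for row in chunk]
            rows = [row[1:] for row in chunk]
            yield labels, rows
```

A training loop could then iterate with something like `for labels, rows in iter_batches("train.csv", 4096): ...`, so only one batch is held in memory at a time.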

The comment on the DIN model in model says NFM

Is the comment wrong?

Strongly recommend supporting Python 3 by default

Times have changed; I hope Python 3 compatibility can be added.

About FNN

Hello, what causes this error when running FNN?
RuntimeError: cuda runtime error (8) : invalid device function at /pytorch/aten/src/THC/generated/../generic/THCTensorMathReduce.cu:18
