
Comments (19)

CastellanZhang avatar CastellanZhang commented on August 17, 2024 1

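@420742882, there is also the input order of the samples. Ideally they are ordered by sample timestamp; they must not be grouped by user, or have all the positive samples clustered together, or anything like that. Because FTRL is an online learning framework, it is quite sensitive to sample order, so if you have no timestamps it is best to shuffle the data.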

from alphafm.
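The ordering advice above can be sketched as a small preprocessing step. This is only an illustration, not part of alphaFM: the helper `order_samples`, the `timestamp_of` callback, and the sample lines are all hypothetical, and it assumes the training set fits in memory.

```python
import random

def order_samples(lines, timestamp_of=None, seed=0):
    """Sort samples by timestamp when one is available, otherwise shuffle them.

    `timestamp_of` is a hypothetical callback that extracts a sortable
    timestamp from one sample line; pass None when no timestamp exists.
    """
    samples = list(lines)
    if timestamp_of is not None:
        return sorted(samples, key=timestamp_of)
    rng = random.Random(seed)  # fixed seed so the shuffle is reproducible
    rng.shuffle(samples)
    return samples

# Hypothetical "label feature:value" lines with all positives grouped first,
# which is the kind of ordering the comment above warns against.
grouped = ["1 a:1", "1 b:1", "0 a:1", "0 c:1"]
shuffled = order_samples(grouped)
```

For files too large for memory, an external shuffle (for example the `shuf` command-line tool) would serve the same purpose before the data is fed to the trainer.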

CasyWang avatar CasyWang commented on August 17, 2024

Any conclusion here?

CastellanZhang avatar CastellanZhang commented on August 17, 2024

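@420742882, machine learning depends on many stages: data selection, feature construction, model selection, parameter tuning, evaluation metrics, and so on. Debating in general terms which of two algorithms is better is pointless, and there is the "no free lunch theorem" besides. Otherwise everyone could just use the "best" algorithm on every problem and be done with it. In my experience, on most of our ad CTR data, with the same high-dimensional features as input and well-tuned parameters, FM improves AUC over LR at the third decimal place. If the feature engineering or the parameter tuning is poor, it is perfectly normal for even a DNN to lose to LR, let alone FM.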

CasyWang avatar CasyWang commented on August 17, 2024

CastellanZhang avatar CastellanZhang commented on August 17, 2024

@CasyWang, exactly the same high-dimensional features (both cross features and non-cross features), i.e. exactly the same samples as input; only the model differs.

420742882 avatar 420742882 commented on August 17, 2024

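@CastellanZhang, sorry, I did not explain clearly. I meant comparing alpha lr with our own lr, on exactly the same samples and features. alpha lr performs much worse than our lr. As for FM improving over LR at the third decimal place, I agree with that.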

CastellanZhang avatar CastellanZhang commented on August 17, 2024

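@420742882, I do not know how your own LR is implemented. If it is also online-learning FTRL, the results should be fully consistent. If it is a traditional batch optimization algorithm such as OWL-QN, then data volume matters. The data in our ad business is all large-scale, i.e. there are plenty of samples relative to the feature dimensionality, and in my rigorous comparisons a single pass over the samples was enough to match a batch LR algorithm, or even beat it. With less data, multiple passes may be needed. Someone asked me a similar question before, see: CastellanZhang/alphaFM_softmax#2
As I said, each problem has to be analyzed on its own terms.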
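To make the "single pass" point concrete, here is a minimal sketch of per-coordinate FTRL-Proximal for logistic regression, following the commonly published form of the algorithm. It is not alphaFM's code: the class name, the default hyperparameters, and the toy stream are assumptions for illustration, and `alpha` is only assumed to play the role of the learning-rate parameter that this thread calls w_alpha.

```python
import math
from collections import defaultdict

class FtrlLogisticRegression:
    """Illustrative FTRL-Proximal logistic regression over sparse dict features."""

    def __init__(self, alpha=0.1, beta=1.0, l1=0.1, l2=0.1):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = defaultdict(float)  # per-feature adjusted gradient sums
        self.n = defaultdict(float)  # per-feature sums of squared gradients

    def _weight(self, i):
        # Closed-form weight; L1 keeps rarely-updated features at exactly zero.
        z = self.z.get(i, 0.0)
        if abs(z) <= self.l1:
            return 0.0
        n = self.n.get(i, 0.0)
        sign = 1.0 if z > 0 else -1.0
        return -(z - sign * self.l1) / ((self.beta + math.sqrt(n)) / self.alpha + self.l2)

    def predict(self, x):
        s = sum(self._weight(i) * v for i, v in x.items())
        s = max(min(s, 35.0), -35.0)  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-s))

    def update(self, x, y):
        # One online step: predict, then update z and n for each active feature.
        p = self.predict(x)
        for i, v in x.items():
            g = (p - y) * v  # gradient of logloss w.r.t. weight i
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * self._weight(i)
            self.n[i] += g * g
        return p

# Hypothetical toy stream: feature "a" is always clicked, "b" never is.
model = FtrlLogisticRegression()
for _ in range(100):
    model.update({"a": 1.0}, 1)
    model.update({"b": 1.0}, 0)
```

Each sample is seen once and the state is updated immediately, which is why the order in which samples arrive affects the result more than it would with a batch optimizer.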

420742882 avatar 420742882 commented on August 17, 2024

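The experiment's data volume (hundreds of millions of samples) and feature dimensionality are both large enough. I will look into it further. Thanks a lot!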

420742882 avatar 420742882 commented on August 17, 2024

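@CastellanZhang, thanks a lot! After reshuffling the samples, AUC is now very close to the baseline lr: the gap was 10 percentage points before and is 1 percentage point now. Would sorting the samples by timestamp give even better results?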

CastellanZhang avatar CastellanZhang commented on August 17, 2024

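@420742882, you're welcome. I cannot say for sure whether sorting by timestamp would be better; you will have to keep experimenting. All I can say is that doing so will certainly not be much worse. Beyond that, tune the parameters carefully or increase the sample size further, and you should be able to match lr.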

420742882 avatar 420742882 commented on August 17, 2024

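@CastellanZhang After tuning the w_alpha parameter, auc and logloss now beat the baseline lr. But in the online A/B test, performance dropped sharply. What could the cause be? Is the predicted ctr inaccurate? Our samples were subsampled, which pushes the predicted ctr up. Comparing the predictions of ftrl and lr, ftrl predicts even higher ctr.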
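On the point that sampling inflates the predicted ctr: if the sampling was negative downsampling with a known keep rate (an assumption; the thread does not say how the samples were drawn), a commonly used correction maps each prediction back to the original scale. The function name and the numbers below are hypothetical.

```python
def recalibrate(p, negative_keep_rate):
    """Undo the bias from negative downsampling.

    p: probability predicted by a model trained on data where each negative
       was kept with probability `negative_keep_rate` and all positives kept.
    Returns the estimate on the original, unsampled distribution.
    """
    if not 0.0 < negative_keep_rate <= 1.0:
        raise ValueError("negative_keep_rate must be in (0, 1]")
    return p / (p + (1.0 - p) / negative_keep_rate)

# Hypothetical check: a true ctr of 1% looks like about 9.2% after keeping
# only 10% of the negatives; the correction should recover the 1%.
true_ctr, keep = 0.01, 0.1
sampled_ctr = true_ctr / (true_ctr + (1.0 - true_ctr) * keep)
corrected = recalibrate(sampled_ctr, keep)
```

The mapping is monotonic, so it leaves AUC unchanged while changing the absolute ctr values, which matters whenever the online system uses the predicted value itself rather than only the ranking.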

CastellanZhang avatar CastellanZhang commented on August 17, 2024

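@420742882, auc and logloss both being better than the baseline shows that the training tool itself is fine. As for online performance, I know nothing about your specific business, so I cannot comment. Online behavior involves many factors, and it is very likely not caused by the training tool; you will need to debug the whole pipeline.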

420742882 avatar 420742882 commented on August 17, 2024

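@CastellanZhang Hello. Our business is ctr prediction. Offline, auc improves by 0.002 and logloss is also better than the baseline, but online performance is still worse than the baseline. Could you suggest some directions for troubleshooting? The new model and the base model run on the same online code, so the chance of a bug is fairly small.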

shenleiz avatar shenleiz commented on August 17, 2024

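Hello author, while using alphaFM I found that the online performance was fine at first, but after a while it declined and was not as good as right after launch. I am not sure whether this is a problem with FTRL or something else. Have you run into anything similar?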

CastellanZhang avatar CastellanZhang commented on August 17, 2024

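@shenleiz, many of our FTRL models have been running for over two years without problems. As far as I know FTRL is a very mature algorithm that companies have used widely since 2013, so there is no need to worry about whether the algorithm itself is correct. Think carefully about which stage went wrong. In particular, if it worked before and stopped working recently, what changed between the two? If you started everything over, would it work as well as it did at first? You need to run the experiments and analysis yourself.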

CasyWang avatar CasyWang commented on August 17, 2024

lcshr123 avatar lcshr123 commented on August 17, 2024

After a model has been running for a while, the feature dimensionality keeps growing and some overfitting sets in, which makes performance drop. At that point you need to retrain.

Is this a common problem with FTRL? I ran into the same situation. I would have thought online learning should be able to keep training indefinitely.

aromazyl avatar aromazyl commented on August 17, 2024

FTRL is designed for optimizing convex problems, and FM is non-convex.
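For readers unfamiliar with why FM is non-convex, the standard second-order factorization machine model can be written as below. The notation is the usual textbook one and does not come from this thread: $w_0$ is the bias, $w_i$ the linear weights, and $\mathbf{v}_i$ the latent vector of feature $i$.

```latex
\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i
  + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j
```

LR uses only the first two terms, which are linear in the parameters, so its logloss is convex. The pairwise term multiplies two sets of parameters, $\mathbf{v}_i$ and $\mathbf{v}_j$, so the FM loss is not jointly convex in all parameters.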

dotsonliu avatar dotsonliu commented on August 17, 2024

Which performs better, xlearn or alphafm? Offline, xlearn's auc is 3 points higher.
