Comments (9)

xjzhou avatar xjzhou commented on September 14, 2024

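I have the same impression. If I want to improve the results, which is the simplest and most effective thing to optimize: the corpus data, the training code, or the model?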

from text-antispam.

pakrchen avatar pakrchen commented on September 14, 2024

What exactly do you mean by poor results? What output do you get from each of the three models?

xjzhou avatar xjzhou commented on September 14, 2024

Let me give a few concrete examples. I used the default server.py and tested a few inputs:

image

image

image

pakrchen avatar pakrchen commented on September 14, 2024

In both text2vec.py and text_features.py, spam is labeled 0 and pass is labeled 1. Under that labeling, the results you posted should be correct.
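
To make the convention above easier to check against server output, here is a minimal sketch of a lookup from class id to name. Only the mapping (0 = spam, 1 = pass) comes from this thread; `LABEL_NAMES` and `describe_prediction` are hypothetical names used for illustration and are not part of the text-antispam code.

```python
# Label convention described in this thread: 0 = spam, 1 = pass.
# LABEL_NAMES and describe_prediction are illustrative names, not project code.
LABEL_NAMES = {0: "spam", 1: "pass"}


def describe_prediction(label_id: int) -> str:
    """Return a human-readable name for a predicted class id."""
    if label_id not in LABEL_NAMES:
        raise ValueError(f"unexpected label id: {label_id}")
    return LABEL_NAMES[label_id]


print(describe_prediction(0))  # spam
print(describe_prediction(1))  # pass
```

Reading a raw 0 or 1 without such a mapping is an easy way to mistake a correct spam prediction for a wrong one.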

pakrchen avatar pakrchen commented on September 14, 2024

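If you have labeled data from your own business scenario, adding it to the training data is the simplest and most effective way to improve results.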

xjzhou avatar xjzhou commented on September 14, 2024

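Could we connect on WeChat (my WeChat ID is in the screenshot)? Haha, it just occurred to me that this message would itself get caught by antispam. I also guessed that 0 might mean spam, but this case actually comes out as 1.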

image

pakrchen avatar pakrchen commented on September 14, 2024

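The training dataset consists mainly of SMS messages. Samples like "加我微信有福利" ("add me on WeChat for perks") tend to appear in online comments, so they have to be added to the dataset and the model retrained.
Source of the Chinese SMS samples in the training dataset: https://github.com/hrwhisper/SpamMessage/tree/master/data
Source of the English SMS samples: http://www.esp.uem.es/jmgomez/smsspamcorpus/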
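
As a rough illustration of "add the samples to the dataset and retrain", the sketch below merges extra labeled comment samples into an existing SMS sample list before training. It assumes samples are held as (text, label) pairs with 0 = spam and 1 = pass; `merge_samples` is a hypothetical helper, the sample texts other than "加我微信有福利" are made-up placeholders, and the project's real data files and loading code may look quite different.

```python
import random

SPAM, PASS = 0, 1  # label convention stated earlier in this thread


def merge_samples(base, extra, seed=0):
    """Combine two lists of (text, label) pairs, drop exact duplicates, shuffle."""
    seen = set()
    merged = []
    for text, label in list(base) + list(extra):
        key = (text.strip(), label)
        # Skip empty texts and pairs that were already added.
        if key[0] and key not in seen:
            seen.add(key)
            merged.append(key)
    # Fixed seed so the shuffled order is reproducible between runs.
    random.Random(seed).shuffle(merged)
    return merged


# Placeholder SMS samples (invented for this example).
sms_samples = [("您的验证码是1234", PASS), ("恭喜中奖,点击领取", SPAM)]
# Comment-style spam of the kind mentioned above, listed twice on purpose.
comment_samples = [("加我微信有福利", SPAM), ("加我微信有福利", SPAM)]

training_set = merge_samples(sms_samples, comment_samples)
print(len(training_set))  # 3
```

The merged list would then be fed to whatever training script is in use; the retraining step itself is not shown here.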

vinayakumarr avatar vinayakumarr commented on September 14, 2024

How can I do this at the character level? I also need to apply it to 21 classes. What changes would have to be made?

zhaocaicat avatar zhaocaicat commented on September 14, 2024

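The training dataset consists mainly of SMS messages. Samples like "加我微信有福利" ("add me on WeChat for perks") tend to appear in online comments, so they have to be added to the dataset and the model retrained.
Source of the Chinese SMS samples in the training dataset: https://github.com/hrwhisper/SpamMessage/tree/master/data
Source of the English SMS samples: http://www.esp.uem.es/jmgomez/smsspamcorpus/
So for variant (obfuscated) text like this, do the variants themselves have to be added to the training samples before the network can recognize them?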
