
Comments (9)

daochenzha commented on May 27, 2024

@FYNIXqwq That's right, changing the deck does have an effect.

from douzero.

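FYNIXqwq commented on May 27, 2024

@daochenzha Training in no-shuffle mode currently has one major difficulty. When the dealing style is changed to the no-shuffle type (tidy hands and a very high bomb_count), the loss early in training is extremely high, climbing from the order of 10^2 all the way to 10^4. How should I tune the parameters in this situation? (I have tried batch_size values of 16 and 32, and learning_rate values from 10^-7 to 10^-3.)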


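FYNIXqwq commented on May 27, 2024

@daochenzha The deck is generated as follows. First, the ranks (3 through A, 2, small joker, big joker) are shuffled. Then, following the shuffled order, 1 card is generated for each joker and 4 cards for each of 3 through A and 2. Finally, the generated deck is cut. This imitates the "imperfections" of no-shuffle mode without the deck being "too tidy". The whole process generates a single 64-bit random integer as the seed; the rank permutation, the number of cuts, and the cut positions are all derived from the seed by taking remainders.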
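
The procedure described above could be sketched roughly as follows. This is only an illustration, not the poster's actual code: the function name `make_no_shuffle_deck`, the `max_cuts` parameter, the integer rank encoding (3 to 14 for 3 through A, 17 for 2, 20 and 30 for the jokers), and the way the seed is consumed with `divmod` are all assumptions.

```python
import random

# Assumed rank encoding: 3..14 for 3..A, 17 for 2, 20 and 30 for the two jokers.
RANKS = list(range(3, 15)) + [17, 20, 30]
JOKERS = {20, 30}


def make_no_shuffle_deck(seed: int, max_cuts: int = 3) -> list[int]:
    """Build a 54-card deck of grouped ranks, then cut it, driven by one seed."""
    state = seed

    def take(modulus: int) -> int:
        # Consume part of the seed: return state % modulus and keep the quotient.
        nonlocal state
        state, remainder = divmod(state, modulus)
        return remainder

    # Fisher-Yates shuffle of the 15 ranks, using remainders of the seed.
    ranks = RANKS.copy()
    for i in range(len(ranks) - 1, 0, -1):
        j = take(i + 1)
        ranks[i], ranks[j] = ranks[j], ranks[i]

    # One card per joker, four cards for every other rank, in shuffled rank order.
    deck = []
    for rank in ranks:
        deck.extend([rank] * (1 if rank in JOKERS else 4))

    # Cut the deck between 1 and max_cuts times at seed-derived positions.
    for _ in range(1 + take(max_cuts)):
        pos = 1 + take(len(deck) - 1)
        deck = deck[pos:] + deck[:pos]
    return deck


seed = random.getrandbits(64)  # the single 64-bit random integer
deck = make_no_shuffle_deck(seed)
```

Because every choice comes from the one seed, the same seed always reproduces the same deck, which makes individual deals easy to replay when debugging.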


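FYNIXqwq commented on May 27, 2024

@daochenzha When training in shuffled mode with the default settings, the loss early in training is only in the single digits, but in no-shuffle mode the loss can fluctuate anywhere from tens to tens of thousands.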


FYNIXqwq commented on May 27, 2024

If I train on win rate instead, the loss does not get this high.


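FYNIXqwq commented on May 27, 2024

If I train on win rate instead, the loss does not get this high.

Inspired by this, I tried modifying the reward rule to reduce the influence of bomb_count on the reward, so that its exponential scaling no longer skews the values so severely.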
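
To see why exponential scaling matters, here is a small arithmetic sketch. It assumes the score-based reward doubles with every bomb (magnitude 2^bomb_count) and that the loss is a squared error against a value estimate that still predicts about zero early in training. The function names are made up for illustration.

```python
def adp_style_reward(won: bool, bomb_count: int) -> float:
    # Assumption: the score doubles with every bomb, win or lose.
    magnitude = 2.0 ** bomb_count
    return magnitude if won else -magnitude


def win_rate_reward(won: bool) -> float:
    # Win-rate style target: bomb_count is ignored entirely.
    return 1.0 if won else -1.0


def squared_error(target: float, prediction: float = 0.0) -> float:
    # Per-sample squared error for an estimate that still predicts about zero.
    return (target - prediction) ** 2


for bombs in range(8):
    target = adp_style_reward(True, bombs)
    print(f"bombs={bombs} target={target:g} squared_error={squared_error(target):g}")
```

Under these assumptions the per-sample squared error grows as 4^bomb_count: 256 at four bombs and 16384 at seven, which matches the 10^2 to 10^4 range reported earlier in the thread. A win-rate target is always +1 or -1, so its squared error against a zero prediction stays at 1.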


FYNIXqwq commented on May 27, 2024

image


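FYNIXqwq commented on May 27, 2024

This new reward scheme encourages the AI to raise the bomb count when it is very likely to win, while greatly reducing the penalty when the bomb count is 3 or lower, so the AI does not learn to simply give up.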
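
The exact rule is not given in the text of this thread, so the following is only a hypothetical sketch of a reward with the two properties described: wins pay more as the bomb count rises, and losses with a bomb count of 3 or lower cost very little. The function name, the linear growth, and every constant here are assumptions chosen for illustration, not the poster's values.

```python
def shaped_reward(
    won: bool,
    bomb_count: int,
    small_bomb_threshold: int = 3,  # assumed cutoff, taken from "3 or lower"
    small_penalty: float = 0.5,     # hypothetical reduced penalty
) -> float:
    if won:
        # Winning with more bombs pays more, but only linearly, not exponentially.
        return 1.0 + bomb_count
    if bomb_count <= small_bomb_threshold:
        # Losing a low-bomb game costs little, so trying to win stays worthwhile.
        return -small_penalty
    # Losing a high-bomb game is still penalised in proportion to the bomb count.
    return -(1.0 + bomb_count)


for bombs in (0, 3, 4, 6):
    print(bombs, shaped_reward(True, bombs), shaped_reward(False, bombs))
```

With linear growth the targets stay within a small range, which keeps a squared-error loss far below what exponentially scaled targets produce.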

