
Comments (6)

mx54039q commented on August 21, 2024

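Hello,

  1. The description here is an erratum: the model does use a channel-wise Gaussian distribution. Thanks for pointing it out; I will correct it later. A Gaussian fits the original distribution of the data more closely, so empirically it should work better than the fully factorized density model, but the effect on overall model performance is small.
  2. Assuming the training strategy and the rest of the architecture are fine, I can think of two possible causes: 1) After my pretrained model is loaded, the fully factorized density model is re-initialized, which may throw the context model and the hyperprior model out of balance early in training; you can check whether bpp_z looks normal. 2) The image input range, [-1,1] -> [0,1]; this may affect the optimization of the Transformer structure.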
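Both checks in point 2 can be sketched in a few lines. This is a minimal illustration, not code from the entroformer repository: it assumes per-element likelihoods for the hyper-latent z are available as a flat sequence, and the names `bits_per_pixel` and `to_unit_range` and the sample values are hypothetical.

```python
import math

def bits_per_pixel(likelihoods, num_pixels):
    """Estimated rate: sum of -log2(p) over all latent elements, divided by the pixel count."""
    total_bits = sum(-math.log2(p) for p in likelihoods)
    return total_bits / num_pixels

def to_unit_range(x):
    """Map a pixel value from [-1, 1] to [0, 1]."""
    return (x + 1.0) / 2.0

# Hypothetical likelihoods of four z elements for a 4x4 image patch (16 pixels).
z_likelihoods = [0.5, 0.25, 0.5, 0.25]
bpp_z = bits_per_pixel(z_likelihoods, num_pixels=16)
print(f"bpp_z = {bpp_z:.3f}")
```

Logging bpp_z separately from bpp_y during the first training steps makes it easier to see whether a re-initialized prior on z is dominating the rate.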

from entroformer.

mx54039q commented on August 21, 2024


TNTWEN commented on August 21, 2024

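Hello! Thank you for your reply!
It may well be a problem with loading the pretrained weights. I am also trying to train from scratch to see whether that improves the results.

Thank you very much for your work! Best wishes!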


TNTWEN commented on August 21, 2024

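@mx54039q
Hello! While reading the code, I noticed that self.dim in the line below should probably be self.key_value_proj_dim, i.e. the d_k of multi-head attention should be 64 rather than 384: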

self.scale = self.dim ** -0.5 if config.scale else 1.
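For reference, the difference between the two scaling factors can be sketched as follows. This assumes the usual scaled dot-product convention of multiplying the logits by d_k ** -0.5; the head count of 6 is inferred from 384 / 64 and is an assumption, as is the helper name `attention_scale`.

```python
def attention_scale(dim, num_heads, use_scale=True):
    """Return the logit scaling factor d_k ** -0.5, where d_k is the per-head dimension."""
    key_value_proj_dim = dim // num_heads  # d_k, here 384 // 6 = 64
    return key_value_proj_dim ** -0.5 if use_scale else 1.0

per_head = attention_scale(384, 6)  # 64 ** -0.5 = 0.125
full_dim = 384 ** -0.5              # what self.dim ** -0.5 yields, roughly 0.051
print(per_head, full_dim)
```

With the full model dimension, the logits are scaled down about 2.4 times more than with the per-head dimension, which flattens the attention distribution.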


mx54039q commented on August 21, 2024

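You are right, this looks like an implementation error on my part, although it has little effect on training. Feel free to change it, and if it gives an improvement, please let me know. Thanks. :)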


TNTWEN commented on August 21, 2024

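@mx54039q
I have found the reason for my earlier poor training results: during training I only applied the add-noise operation to y and did not use a straight-through estimator (STE). Since I only use the bidirectional context model, I made the forward pass consistent with compress and decompress, so the add-noise operation is no longer needed. Following the compressai coding scheme, I use quantize_ste(y - means_hat) + means_hat.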
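The expression above can be illustrated with a small PyTorch sketch. The local `quantize_ste` here is a stand-in written for this example (round in the forward pass, identity gradient in the backward pass), not a copy of the thread author's training code, and the tensor values are made up.

```python
import torch

def quantize_ste(x):
    """Round in the forward pass; let gradients pass through unchanged (straight-through)."""
    return (torch.round(x) - x).detach() + x

# Hypothetical latent values and predicted means.
y = torch.tensor([0.2, 1.7, -2.4], requires_grad=True)
means_hat = torch.tensor([0.5, 1.0, -2.0])

# Quantize the residual around the predicted mean, then add the mean back.
y_hat = quantize_ste(y - means_hat) + means_hat

# The gradient of y_hat with respect to y is 1 everywhere, despite the rounding.
y_hat.sum().backward()
print(y_hat, y.grad)
```

Because the forward value is the rounded one, training sees the same quantized y_hat that compress and decompress produce, which is the consistency described in the comment.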

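I changed the image input range to [-1,1], selected 15,000 images as the dataset, and loaded your weights. I have trained two models so far, and the results are quite good. The compress/decompress results on Kodak are:
λ=0.02 bpp: 0.607 PSNR: 35.275
λ=0.18 bpp: 1.623 PSNR: 41.401
These should be very close to the results in your paper.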

I am now trying your random-mask pretraining scheme. I changed dk for the pretraining run and will see how finetuning goes, then get back to you ^-^

