
entroformer's Introduction

Python >=3.7 PyTorch >=1.7

[ICLR2022] Entroformer: A Transformer-based Entropy Model for Learned Image Compression [pdf]

The official repository for Entroformer: A Transformer-based Entropy Model for Learned Image Compression.

Pipeline

[figure: framework overview]

Evaluation on Kodak Dataset

[figure: rate-distortion results on Kodak]

Requirements

Prerequisites

Clone the repo and create a conda environment as follows:

conda create --name entroformer python=3.7
conda activate entroformer
conda install pytorch=1.7 torchvision cudatoolkit=10.1
pip install torchac

(We use PyTorch 1.7, CUDA 10.1. We use torchac for arithmetic coding.)
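
As an optional sanity check of the environment, a minimal Python sketch like the one below verifies that the dependencies import correctly (the file name is hypothetical and not part of the repo):

# check_env.py -- hypothetical helper; verifies the dependencies import correctly
import torch
import torchac  # arithmetic-coding backend used by the compress/decompress scripts

print("PyTorch:", torch.__version__)            # expect a 1.7.x build
print("CUDA available:", torch.cuda.is_available())
print("torchac imported OK")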

Test Dataset

Kodak Dataset

kodak
├── image1.jpg 
├── image2.jpg
└── ...
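
If you do not have Kodak locally, the sketch below downloads the canonical 24 PNG images (the r0k.us mirror, output folder, and file names are assumptions; adjust them to match your data path):

# download_kodak.py -- illustrative sketch, not part of this repository
import os
import urllib.request

out_dir = "kodak"
os.makedirs(out_dir, exist_ok=True)
for i in range(1, 25):  # the Kodak set contains 24 images: kodim01.png .. kodim24.png
    name = f"kodim{i:02d}.png"
    url = f"http://r0k.us/graphics/kodak/kodak/{name}"
    dst = os.path.join(out_dir, name)
    if not os.path.exists(dst):
        print("downloading", url)
        urllib.request.urlretrieve(url, dst)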

Train & Evaluate & Compress & Decompress

Train:

sh scripts/pretrain.sh 0.3
sh scripts/train.sh [tradeoff_lambda(e.g. 0.02)]
(You may use your own dataset by modifying the train/test data path.)

Evaluate:

# Kodak
sh scripts/test.sh [/path/to/kodak] [model_path]
(sh test_parallel.sh [/path/to/kodak] [model_path])

Compress:

sh scripts/compress.sh [original.png] [model_path]
(sh compress_parallel.sh [original.png] [model_path])

Decompress:

sh scripts/decompress.sh [original.bin] [model_path]
(sh decompress_parallel.sh [original.bin] [model_path])
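
To sanity-check a full round trip, a sketch along these lines runs both scripts and reports the actual bits per pixel of the produced bitstream (the output file name, checkpoint path, and script arguments are assumptions; adjust them to the scripts you use):

# roundtrip_check.py -- illustrative sketch, not part of this repository
import os
import subprocess
from PIL import Image

image_path = "original.png"
bin_path = "original.bin"                               # assumed output name of compress.sh
model_path = "pretrained/entroformer_lambda0.01.pth"    # hypothetical checkpoint path

subprocess.run(["sh", "scripts/compress.sh", image_path, model_path], check=True)
subprocess.run(["sh", "scripts/decompress.sh", bin_path, model_path], check=True)

w, h = Image.open(image_path).size
bpp = os.path.getsize(bin_path) * 8 / (w * h)
print(f"bitstream: {os.path.getsize(bin_path)} bytes, {bpp:.4f} bpp")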

Trained Models

Download the pre-trained models optimized for MSE.

Note: We have reorganized the code, so the performance is slightly different from the paper's.

Acknowledgement

Codebase adapted from L3C-image-compression and torchac.

Citation

If you find this code useful for your research, please cite our paper

@InProceedings{Yichen_2022_ICLR,
    author    = {Qian, Yichen and Lin, Ming and Sun, Xiuyu and Tan, Zhiyu and Jin, Rong},
    title     = {Entroformer: A Transformer-based Entropy Model for Learned Image Compression},
    booktitle = {International Conference on Learning Representations},
    month     = {May},
    year      = {2022},
}

entroformer's People

Contributors

mx54039q


entroformer's Issues

Obtaining the training set and selecting images

Hello, thank you for your work and for open-sourcing it.
I would like to reproduce your work with compressai, but my dataset contains images smaller than 384 pixels, and the reproduced results depend on the dataset. The paper states: "We choose 14886 images from OpenImage (Krasin et al., 2017) as our training data." OpenImage seems to have many versions, and the number of images is enormous.
How did you select images to ensure diversity in the dataset? Also, would it be convenient for you to upload your dataset?
Looking forward to your reply, thank you!

Decompressed

When decompressing an image I ran into: RuntimeError: "min_all" not implemented for 'Half'
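
A common workaround (not an official fix for this repo) when an op has no float16 kernel is to cast the tensor to float32 before the call; a minimal sketch:

import torch

y = torch.randn(1, 192, 16, 16).half()   # stand-in for a half-precision latent tensor
y_min = y.float().min()                  # cast to float32 first; .min() may lack a Half kernel on CPU
print(y_min)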

Compressed binary file size

Hi,
I tried to compress an image with your pre-trained models,
but the resulting binary file has the same size (2.9 MB) for different lambda models (entroformer_lambda0.1.pth, entroformer_lambda0.01.pth).
I expected different binary file sizes: the file produced with entroformer_lambda0.01.pth should be smaller than the one produced with entroformer_lambda0.1.pth.
Am I missing something?

thank you

Inquiry: Extending Model Capability for High Resolution Image Compression

Hi,

I've been experimenting with your model and found it to be quite effective for compressing low-resolution images. However, I've encountered some challenges when attempting to compress high-resolution images, such as the JPEG AI test images, where the model seems to underperform or fail.

Could you provide any insights or recommendations on how to adapt or extend the model to handle high-resolution image compression more effectively? Are there specific parameters or modifications that can be made to accommodate larger image sizes?

I appreciate your work on this project and any guidance you can offer on this matter.

Thank you!

The RD points on Kodak

Thanks for sharing the code! If possible, could you provide the RD points (the x and y values) on the Kodak dataset? (Preferably at least 4 RD points.) I would like to compare them with the RD performance of my algorithm.

About pretrain and finetune

Hello!
I previously reimplemented entroformer under the compressai framework, loaded the pretrained model you provided, and successfully reproduced the results in your paper. Recently I tried training from scratch and found it rather difficult, so I would like to confirm the hyperparameter settings with you and ask how the training scheme might be improved.

Dataset: 15k images collected from OpenImage. All trained models use the parallel autoregressive variant.

Pretrain:
Following https://github.com/damo-cv/entroformer/blob/main/scripts/pretrain.sh
I pretrained with batch size 8, 1000 epochs, λ = 0.3, a constant learning rate of 1e-4, and patch size 256. Random masking was enabled with mask_ratio 0.5.

Finetune:
Batch size 8, random masking disabled, patch size 384, λ = 0.02, learning rate initialized to 1e-4. For the learning-rate schedule I tried the one you provide (with 500 epochs) as well as ReduceLROnPlateau with factor=0.5 and patience=20; the two differ little. However, the result is somewhat worse than finetuning from your pretrained model.

On the Kodak test set, loading your pretrained model gives Loss: 1.017, bpp: 0.607, PSNR: 35.275.
The pretrain+finetune result is noticeably worse: Loss: 1.036, bpp: 0.6181, PSNR: 35.2041.

In particular, once the learning rate drops to 2.5e-05, the loss barely decreases any further.

So I would like to confirm a few hyperparameter choices with you, and I hope you can suggest ways to improve training:
1. During pretraining you set patchSize to 256 rather than 384. With so many epochs, patchSize 256 trains faster, but does it hurt the quality of the pretrained model?
2. Your batch size is always 8. Is this training task sensitive to batch size, and would a larger batch size help?
3. My results feel stuck in a local optimum. During finetuning, the first 25 epochs are warmup and the learning rate stays at 1e-4 for 75 epochs, which feels short. Would it make sense to keep the learning rate at 1e-4 during finetuning and train for more epochs to find a better optimum?

About the bpp calculation

Hello, I have a question about how bpp is computed. The intermediate features y and z have lower resolution than the original image x. For example, if the original image is 3x256x256, y might be C1x100x100 and z, smaller still, might be C2x50x50. When computing bpp, do we divide by the resolution of the original image (256x256), or by the resolution of the intermediate features (100x100 or 50x50)?
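
For reference, the usual convention in learned image compression is to divide the total estimated bits of both y and z by the number of pixels of the original image, not of the latents. A sketch with hypothetical shapes and tensor names:

import torch

H, W = 256, 256                                      # original image resolution
p_y = torch.rand(1, 192, 100, 100).clamp_min(1e-9)   # illustrative per-element likelihoods of y
p_z = torch.rand(1, 128, 50, 50).clamp_min(1e-9)     # illustrative per-element likelihoods of z

bits = -(torch.log2(p_y).sum() + torch.log2(p_z).sum())
bpp = bits / (H * W)                                 # divide by original-image pixels, not latent pixels
print(f"bpp = {bpp.item():.4f}")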

Checkpoint for psnr = 37.72

Dear author,

Thank you for your valuable work! I am experimenting with your code and could not find the pretrained model that produces 37.72 dB on the Kodak dataset. Would it be possible to share the checkpoint for that quality? In addition, if you have parallel entroformer checkpoints for higher lambdas, could you share them as well?

Thanks in advance

About prob_model and training

Hello! While trying out your project I have a few questions.
1. The paper says the hyperprior uses a channel-wise fully factorized density model without any assumed distribution, but the prob_model in the code seems to be a channel-wise Gaussian. What is the advantage of using a Gaussian?
2. I ported entroformer to the compressai framework for experiments. The training set is a filtered subset of OpenImage train_f, about 90k images, trained with a (384,384) patch size. The network input range is [0,1], the hyperprior uses a fully factorized density model, and the training strategy matches your implementation. Starting from your pretrained model, the results are not ideal: on Kodak they are about the same as Minnen (2018). With the same dataset, training under your code works well. I would like to hear your opinion on whether the fully factorized density model is the cause.

Thank you very much for your work!

Very high PSNR at low bitrate on the Kodak dataset

Hello, thank you for your work and for open-sourcing it. I trained with your code via sh scripts/train.sh 0.1800, but changed the settings to depth=2, heads=4, --na unidirectional, --attn_topk -1, and removed the code below:
#if(hasattr(self, 'sos_pred_token')):
#    sos_pred_token = repeat(self.sos_pred_token, '() n d -> b n d', b = batch_size)
#    out = torch.cat((sos_pred_token, out[:,:-1,:]), dim=1)
The training set is the 800 images of DIV2K with random-crop training. I found that when changing --alpha (i.e., lambda), even to a large value (e.g., --alpha 0.18: bpp 0.1684, PSNR 39.04), my bitrate on the Kodak dataset is very low and the PSNR very high. Could you explain why this happens? Looking forward to your reply, thank you.

ValueError when running the demo

When I test compression with the provided pretrained model on a 500x500 image from the Kodak dataset, I get a ValueError from torchac (screenshot omitted). The relevant part of the torchac source is:
Lp = cdf_float.shape[-1]
if sym.max() >= Lp - 1:
    raise ValueError
Is this error caused by an incorrect parameter setting?

