googolxx / stf

PyTorch implementation of the paper "The Devil Is in the Details: Window-based Attention for Image Compression".

License: Apache License 2.0

Python 87.90% C++ 12.10%

attention-mechanism python pytorch transformer

stf's Issues

A problem when loading the model

The code is as follows:

device = "cuda" if torch.cuda.is_available() else "cpu"
net = WACNN()
model = torch.load("ckpt/cnn_0036_best.pth.tar")
net.load_state_dict(model["state_dict"])
net.eval()

This raises an error.

In addition, when the training command is given a checkpoint, as in
CUDA_VISIBLE_DEVICES=0,1,2 python train.py -d suim_data -e 1000 --batch-size 64 --save --save_path ckpt/cnn_0036.pth.tar --checkpoint ckpt/cnn_0036.pth.tar -m cnn --cuda --lambda 0.0035
the checkpoint is simply ignored.

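Some questions

1. How do I train an SSIM model? Do I also change lambda to the SSIM value, change the loss to the SSIM loss, and then train six models from scratch (as you did)? According to InterDigitalInc/CompressAI#213, fine-tuning from the MSE models is enough. How is that fine-tuning done?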

About training time

What machine did you use, and how long does one training run take?

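MS-SSIM results on the CLIC Professional Validation dataset

Hello, I would like to confirm: for the CLIC dataset, did you test on the 41 images of the CLIC 2020 professional validation set?

I am running some tests on the CLIC 2020 professional validation set. Since you have not yet released models trained for MS-SSIM, I used the cheng2020-attn model (optimized for MS-SSIM) officially released by CompressAI. The results are below (to save time I used entropy-estimation mode, which differs little from actual compression):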
{
"ms-ssim": [
0.9485660457029575,
0.9640006800977196,
0.9746105394712309,
0.982909100811656,
0.9871920754269856,
0.9904760369440404
],
"bpp": [
0.07344438667159255,
0.11335148580554055,
0.1676517518736967,
0.25349851770371923,
0.35020289660953896,
0.4888123557334993
]
}

In dB, the MS-SSIM values are [
12.8875,
14.43706,
15.95347,
17.67235,
18.92521,
20.21182,
]
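My results differ considerably from those in the right panel of Figure 7 in the paper (the paper's results are higher overall). Do you know where the problem is? Is it the dataset, or some detail of the test procedure?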
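For readers comparing numbers: the dB values listed above agree with the usual conversion -10 * log10(1 - MS-SSIM). A minimal sketch of that conversion follows; the function name `ms_ssim_to_db` is made up for illustration, and the thread does not confirm that the paper's figure uses the same convention.

```python
import math


def ms_ssim_to_db(ms_ssim: float) -> float:
    """Convert an MS-SSIM score in [0, 1) to decibels: -10 * log10(1 - MS-SSIM)."""
    return -10.0 * math.log10(1.0 - ms_ssim)


# First two MS-SSIM values from the JSON quoted in the issue.
for value in (0.9485660457029575, 0.9640006800977196):
    print(round(ms_ssim_to_db(value), 4))
```

If two sets of curves use different conversions or average per image before converting, the dB values will not line up even on identical data.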

In addition, for the MSE-optimized cheng2020-attn model, my results are as follows:
{
"psnr": [
30.32762732156893,
31.62418156135373,
33.03751177904083,
34.74842425090511,
36.019728218636864,
37.394848800287015
],
"bpp": [
0.08608534591408765,
0.12756635571216665,
0.19314934858461705,
0.3055826674510793,
0.42445681334995644,
0.5825808095495876
]
}
These also differ slightly from the results in the paper, but not by much.

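Question

A question about training details: to change the learning rate, how do I stop training and how do I resume it? Does this have to be done manually?
Thank you!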

Training epochs

Thank you for sharing the codes of your great work.

I have a question about the training epochs that you used for your paper and for pre-trained models.
In your paper, the number of iterations is 1.8M, and your model sees 1.8M * 16 = 28.8M images in total. However, in your README, the number of epochs is 1000, and your model sees 0.3M (the size of the subset of OpenImagesV6) * 1000 = 300M images in total.
The two numbers do not match, so I would appreciate it if you could tell me how you trained the model for the paper and for the pre-trained models.
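The mismatch can be made concrete with the numbers quoted in this issue. The sketch below only restates that arithmetic; it does not say which schedule the authors actually used.

```python
# Numbers quoted in the issue above.
iterations = 1_800_000      # training iterations reported in the paper
batch_size = 16             # batch size implied by "1.8M * 16"
subset_size = 300_000       # size of the OpenImagesV6 subset (0.3M)
readme_epochs = 1000        # value passed to -e in the README command

images_seen_paper = iterations * batch_size
images_seen_readme = subset_size * readme_epochs
equivalent_epochs = images_seen_paper / subset_size

print(images_seen_paper)    # images seen under the paper's iteration count
print(images_seen_readme)   # images seen under the README's epoch count
print(equivalent_epochs)    # epochs over the subset that match the paper's count
```

Under these figures, the paper's iteration count corresponds to roughly a tenth of the 1000 epochs given in the README.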

Running the model raises the error "A load persistent id instruction was encountered, but no persistent_load function was specified." I don't know how to fix it.

Using A100 GPU to train the model

Training the "cnn" model on an A100 GPU may produce the following error:

NVIDIA A100-PCIE-40GB with CUDA capability sm_80 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA A100-PCIE-40GB GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

dimension y_hat before and after entropy coding

Hi,

Sorry to bother you again. I just have one question about the STF model (but I suppose it could also apply for the CNN one).

When I call print(y_hat.shape) before and after the entropy coding, in the forward(self, x) function of the SymmetricalTransFormer class (in the stf.py file), I don't get the same dimensions.
Before entropy coding I get [1, 384, 48, 80], and after entropy decoding I get [1, 384, 640, 48] (for an input tensor of shape [1, 3, 768, 1280]).

Aren't they supposed to be the same? Can you please tell me what I'm doing wrong?

Thank You !!

Here's your code with my "print"s :

`
def forward(self, x):
    """Forward function."""
    x = self.patch_embed(x)

    Wh, Ww = x.size(2), x.size(3)
    x = x.flatten(2).transpose(1, 2)
    x = self.pos_drop(x)
    for i in range(self.num_layers):
        layer = self.layers[i]
        x, Wh, Ww = layer(x, Wh, Ww)

    y = x
    C = self.embed_dim * 8
    y = y.view(-1, Wh, Ww, C).permute(0, 3, 1, 2).contiguous()
    y_shape = y.shape[2:]

    ####### 
    print("y.shape : ", y.shape)
    #########

    z = self.h_a(y)
    _, z_likelihoods = self.entropy_bottleneck(z)
    z_offset = self.entropy_bottleneck._get_medians()
    z_tmp = z - z_offset
    z_hat = ste_round(z_tmp) + z_offset

    latent_scales = self.h_scale_s(z_hat)
    latent_means = self.h_mean_s(z_hat)

    y_slices = y.chunk(self.num_slices, 1)
    y_hat_slices = []
    y_likelihood = []

    for slice_index, y_slice in enumerate(y_slices):
        support_slices = (y_hat_slices if self.max_support_slices < 0 else y_hat_slices[:self.max_support_slices])
        mean_support = torch.cat([latent_means] + support_slices, dim=1)
        mu = self.cc_mean_transforms[slice_index](mean_support)
        mu = mu[:, :, :y_shape[0], :y_shape[1]]

        scale_support = torch.cat([latent_scales] + support_slices, dim=1)
        scale = self.cc_scale_transforms[slice_index](scale_support)
        scale = scale[:, :, :y_shape[0], :y_shape[1]]

        _, y_slice_likelihood = self.gaussian_conditional(y_slice, scale, mu)

        y_likelihood.append(y_slice_likelihood)
        y_hat_slice = ste_round(y_slice - mu) + mu

        lrp_support = torch.cat([mean_support, y_hat_slice], dim=1)
        lrp = self.lrp_transforms[slice_index](lrp_support)
        lrp = 0.5 * torch.tanh(lrp)
        y_hat_slice += lrp

        y_hat_slices.append(y_hat_slice)

    y_hat = torch.cat(y_hat_slices, dim=1)
    y_likelihoods = torch.cat(y_likelihood, dim=1)

    y_hat = y_hat.permute(0, 2, 3, 1).contiguous().view(-1, Wh*Ww, C)
    for i in range(self.num_layers):
        layer = self.syn_layers[i]
        y_hat, Wh, Ww = layer(y_hat, Wh, Ww)

    ###########
    print("y_hat.shape : ", (y_hat.view(-1, Wh, Ww, self.embed_dim)).shape)
    ###########

    x_hat = self.end_conv(y_hat.view(-1, Wh, Ww, self.embed_dim).permute(0, 3, 1, 2).contiguous())
    return {
        "x_hat": x_hat,
        "likelihoods": {"y": y_likelihoods, "z": z_likelihoods},
    }`
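One possible reading of the two shapes, offered as a sketch and not as the maintainers' answer: in the code above, the second print runs after the loop over self.syn_layers, so it shows the synthesis output just before end_conv, and it prints a channels-last view (N, H, W, C). The values patch_size=2, embed_dim=48 (so that C = embed_dim * 8 = 384) and three 2x downsampling stages are assumptions inferred from the shapes quoted in the issue, and the helper names are made up for illustration.

```python
def latent_hw(height, width, patch_size=2, num_merges=3):
    """Spatial size of y after patch embedding and `num_merges` 2x downsamplings."""
    factor = patch_size * 2 ** num_merges
    return height // factor, width // factor


def pre_end_conv_hw(height, width, patch_size=2):
    """Spatial size of the synthesis output just before end_conv."""
    return height // patch_size, width // patch_size


embed_dim = 48        # assumed, so that C = embed_dim * 8 = 384
H, W = 768, 1280      # input size quoted in the issue

# First print: y in channels-first layout (N, C, H, W).
y_shape = (1, embed_dim * 8, *latent_hw(H, W))
# Second print: view(-1, Wh, Ww, embed_dim), i.e. channels-last (N, H, W, C).
y_hat_view = (1, *pre_end_conv_hw(H, W), embed_dim)

print(y_shape)
print(y_hat_view)
```

Under these assumptions the leading 384 in the second shape is a height (768 / 2), not the channel count, so the two prints are not comparing the same tensor. Printing y_hat.shape right after torch.cat(y_hat_slices, dim=1) would give the like-for-like comparison with y.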

【Inquiries on Training Issues】

Dear Googolxx,

I am retraining the PyTorch implementation with 150k randomly chosen training images and 50k testing images.

I follow the training usage "CUDA_VISIBLE_DEVICES=0,1 python train.py -d /path/to/image/dataset/ -e 1000 --batch-size 16 --save --save_path /path/to/save/ -m stf --cuda --lambda 0.0035".

The problem is that training is slow: four hours per epoch, so "-e 1000" would take about 166 days for one lambda.

I am using a 4090 GPU, and the training status is as follows:

One training process also owns many PIDs (maybe this is what causes the slow training?).

Am I missing something, or are there tips or tricks I should consider? What steps can be taken to increase the training speed?

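A question about the data

Hello, is the data in Table 1 the result of re-running the methods on your own dataset, or did you use the results provided by CompressAI? Is all the data in the rate-distortion curves also from runs on your own dataset, or from CompressAI's results? Thank you!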

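Failed to import compressai

Hello author, when I import compressai I get the following error:
ModuleNotFoundError: No module named 'compressai._CXX'
I think the cause may be that only the modified compressai in your repository was imported, not the original compressai repository. However, I have already installed the original compressai with pip install compressai. How should I resolve this?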

Pretrained models

Could you share the pretrained models? Training the network from scratch may take a long time.

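A question about the CompressAI platform

Hello author, I saw that the comparisons in your paper include images compressed with many traditional methods (JPEG, BPG, VVC, etc.). From the CompressAI documentation, it seems these methods can only be used to evaluate a dataset and obtain metrics such as bpp and PSNR. Is it possible to use CompressAI to obtain the images compressed by these traditional methods? If so, how? Thank you very much!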

Lightweight pre-trained model or

Hello!
Running the pre-trained model exceeds the capacity of our server. It fails with the error "CUDA out of memory. Tried to allocate 4.34 GiB (GPU 0; 31.75 GiB total capacity; 845.96 MiB already allocated; 3.41 GiB free; 858.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF".
Is there a lightweight pre-trained model? Or can we reduce the number of images used when testing the pre-trained model?
Any reply would be appreciated!

Numbers in the paper

Hi, thanks for your great work!

Is it possible to release the numbers of your R-D curve in your paper, Figure 6, like what CompressAI does? It would help us a lot to study and learn from your work.

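Compressing high-resolution images

Hello author, have you tried using the CNN pretrained model to compress fairly large images? After downloading your CNN-based pretrained model, I compressed a 3.6M image and got the following result: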

"name": "cnn",
"description": "Inference (ans)",
"results": {
"psnr": [
36.9149467424472
],
"ms-ssim": [
0.9617322683334351
],
"bpp": [
0.05067232837933475
],
"encoding_time": [
0.5175447463989258
],
"decoding_time": [
0.34578394889831543
]
}
Looking at the PSNR alone, the result seems very good, but the bpp is only about 0.05. Is this result normal? I then compressed another image of about 6M and got:
"name": "cnn",
"description": "Inference (ans)",
"results": {
"psnr": [
35.43835283028224
],
"ms-ssim": [
0.9664726257324219
],
"bpp": [
0.03681968530415619
],
"encoding_time": [
1.4469022750854492
],
"decoding_time": [
0.8462481498718262
]
}
The bpp is about 0.036, which seems strange. Have you compressed fairly large images, and what results did you get? I would be very grateful for a reply!
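As background for the numbers above, bpp is the size of the compressed bitstream in bits divided by the number of pixels, so it already accounts for image resolution. A minimal sketch of that definition follows; the file size and resolution in the example are hypothetical, not taken from the issue.

```python
def bits_per_pixel(compressed_bytes: int, height: int, width: int) -> float:
    """Bits per pixel: total bits in the compressed stream over the pixel count."""
    return compressed_bytes * 8 / (height * width)


# Hypothetical example: a 76,800-byte bitstream for a 3000 x 4000 image.
print(bits_per_pixel(76_800, 3000, 4000))
```

Whether a value near 0.05 is expected depends on the image content and on which lambda the checkpoint was trained with, which the thread does not settle.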

A question

Hello, do you know how this kind of figure is generated? It looks somewhat similar to the bitmap in your paper.

Plug and Play

Hello, I put the Win_noShift_Attention module from your code into my project, but performance dropped; I suspect I integrated it incorrectly. Could you tell me how to configure this module? My field is monocular depth estimation.

the results of executing the pre-training model

Sorry to disturb you.
Could you please tell me what results running the pre-trained model produces? Is it only the values of MSE, bpp, PSNR and so on, or something else? Where can I see the results? What should I expect to see if I have run the pre-trained model successfully?
By the way, are you Chinese? Can we communicate in Chinese? Haha.
I would appreciate any reply.

Pre-trained model weights

Hi,

First of all, thanks for your great work!
I wonder if it is possible to release all pre-trained model weights (right now I think only two are available) that correspond to the results in the paper.
I would like to evaluate the STF model on various datasets (other than Kodak) so that I can compare it with my own research.

Thanks. Any feedback is appreciated.

Bit allocation map

Hi !

Great work !

I have just one question : how did you compute the bits allocation map in Fig 3 in your article ?

Thanks !
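The thread does not say how the figure in the paper was produced. One common way to visualise bit allocation in learned codecs, offered here as an assumption and not as the authors' method, is to convert the likelihoods of the latent into bits with -log2 and sum over channels at each spatial position. The sketch below uses plain nested lists so it runs without any dependencies; `bit_allocation_map` is a hypothetical helper name.

```python
import math


def bit_allocation_map(likelihoods):
    """Map likelihoods[c][h][w] to bits[h][w] by summing -log2(p) over channels."""
    channels = len(likelihoods)
    height = len(likelihoods[0])
    width = len(likelihoods[0][0])
    return [
        [
            sum(-math.log2(likelihoods[c][h][w]) for c in range(channels))
            for w in range(width)
        ]
        for h in range(height)
    ]


# Toy latent with 2 channels and a 1 x 2 spatial grid.
toy = [[[0.5, 0.25]], [[0.5, 0.125]]]
print(bit_allocation_map(toy))
```

With a real model, the same reduction would be applied to the likelihood tensor returned for y; picking a single channel instead of summing gives a per-channel map.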

About the CNN model

Hello author, why is the shift model used in the CNN model? Thank you very much.

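About model performance and training parameters

I trained on a dataset of 400 lossless high-resolution images (4000*6000) with epoch=200, batch size=2 and patch_size=(256,256), producing models for several λ values. However, with λ=0.25 the PSNR on both high-resolution test images and Kodak test images is very low, and even with λ set to 0.483 the PSNR of the compressed images remains low. Is this caused by the amount of training data and the training parameter settings? How should I change them?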

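[bug] Error when testing with the provided model

The download page shows the pretrained model cnn_025_best.pth.tar as more than 800 MB, but the downloaded file is only 250 MB. Testing this model raises the following error:
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory. Online sources say this happens because the model file is corrupted. What exactly is the cause?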
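A size mismatch like the one described usually points to an incomplete download. Checkpoints written in PyTorch's zip-based format keep the zip central directory at the end of the file, so a truncated file loses it. The following is a minimal sketch of a pre-check using only the standard library; the helper name is hypothetical, and a False result for a checkpoint saved in the older non-zip format would not by itself mean corruption.

```python
import os
import zipfile


def checkpoint_looks_complete(path: str) -> bool:
    """Return True if the file exists and has a readable zip central directory."""
    return os.path.isfile(path) and zipfile.is_zipfile(path)


# File name taken from the issue; adjust to wherever the download was saved.
path = "cnn_025_best.pth.tar"
if os.path.isfile(path):
    print(round(os.path.getsize(path) / 2**20, 1), "MiB on disk")
print("complete zip archive:", checkpoint_looks_complete(path))
```

If the size on disk is well below the size shown on the download page, downloading the file again is the first thing to try.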

specific path

Hi !
Thank you for the great work! Could you please give me the specific paths, e.g. for the evaluate command: -r (reconstruction path) and -p (path of the checkpoint)? Thank you very much!

Regarding the issue of multi-GPU training

Regarding the issue of multi-GPU training, the official 'compressai' is still in the experimental stage for multi-GPU support. Does your model support multi-GPU? When will it support multi-GPU training, and when will it not?

model optimized with ms-ssim

Hi,
Thank you for your great work!

In the paper, you mention that your models were optimized for MSE and MS-SSIM.
However, I could not find MS-SSIM optimization in the code.
Could you please advise on this?

Thank you!
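For context, CompressAI's example training script combines rate and distortion as lmbda * 255^2 * MSE + bpp. A commonly used MS-SSIM counterpart replaces the distortion term with 1 - MS-SSIM. The scalar sketch below illustrates both forms; it is not taken from this repository, the function name is hypothetical, and the lambda used for MS-SSIM in the example is an arbitrary placeholder.

```python
def rd_loss(bpp: float, quality: float, lmbda: float, metric: str = "mse") -> float:
    """Rate-distortion loss for one batch, using scalar inputs for illustration.

    `quality` is the MSE when metric == "mse" and the MS-SSIM score
    when metric == "ms-ssim".
    """
    if metric == "mse":
        return lmbda * 255 ** 2 * quality + bpp
    if metric == "ms-ssim":
        return lmbda * (1.0 - quality) + bpp
    raise ValueError(f"unknown metric: {metric}")


print(rd_loss(bpp=0.5, quality=0.001, lmbda=0.0035))                  # MSE objective
print(rd_loss(bpp=0.5, quality=0.98, lmbda=10.0, metric="ms-ssim"))   # MS-SSIM objective
```

Because the two distortion terms live on different scales, the lambda values used for MSE training cannot be reused unchanged for MS-SSIM.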

I'm having some problems.

Hello author. I would like to know how you obtained the RD points; I don't know how to write the code.
How do I get the JSON file in the results?

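Visualization of WAM, NLAM, and w/o attention module for the channel with maximal entropy.

Hello author, I read the comparison of bit allocation between WAM and NLAM in your paper and would like to know how you implemented it. This question has puzzled me for a long time, and I hope you can help. Thank you very much!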

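Model training

Hello, 1: When training your model, the source code uses MultiStepLR to modify the learning rate, but the learning rate still does not change during training. Have you fixed this bug? If so, how did you fix it?
2: Another question: how do you load the pretrained model within the CompressAI framework?
Looking forward to your reply, and thank you!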
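As a reference point for the first question, a MultiStepLR-style schedule multiplies the base learning rate by gamma once for every milestone that has been reached. In PyTorch the schedule only advances when scheduler.step() is called (normally once per epoch), and the value actually in use can be read from optimizer.param_groups[0]["lr"]. The pure-Python sketch below shows the expected values; the milestones and gamma are hypothetical, and the thread does not say what caused the behaviour reported here.

```python
from bisect import bisect_right


def multistep_lr(base_lr: float, milestones, gamma: float, epoch: int) -> float:
    """Learning rate a MultiStepLR-style schedule yields at the given epoch."""
    return base_lr * gamma ** bisect_right(sorted(milestones), epoch)


# Hypothetical schedule: start at 1e-4 and decay by 10x at epochs 450 and 550.
for epoch in (0, 449, 450, 550):
    print(epoch, multistep_lr(1e-4, [450, 550], 0.1, epoch))
```

If the logged learning rate stays at its initial value, comparing it against this closed form at a known epoch shows whether the scheduler is being stepped at all.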

How to obtain PSNR and MS-SSIM at different bpp

Getting weird results with STF, need help !

Hi !
Thank you for the great work !

I'm having some difficulty reproducing the results on other datasets, though, and I could really use your help!

I trained STF (transformer version) on the Waymo and BDD100K open-source datasets (images from the front camera of an autonomous vehicle, 180000 images in total), for almost the same lambda values [0.0009, 0.0018, 0.0035, 0.0067, 0.013, 0.025, 0.0483], using MSE loss for 200 epochs each. In the Weights & Biases app, the training results look fine (see graph below), don't they?

But when I test my new weights on Waymo, I get weird results (see the graph below):
- on cropped (256, 256) images (just like the validation during training), the results make sense (green curve)
- on full-size images (with the appropriate padding, just like your compressai.utils.eval_model code), the results don't make sense (red curve)
- the black curve shows the results I get on Waymo using the weights trained with lambda = [0.0035, 0.025] on OpenImages that you generously provide
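For readers unfamiliar with the padding step mentioned in the second bullet, evaluation scripts for learned codecs typically pad height and width up to a multiple of a fixed value and split the padding between both sides. The sketch below shows that arithmetic with a multiple of 64 as an assumed default; the value a given model needs depends on its total downsampling factor and, for window-based models, on the window size, so treat both the value and the helper names as assumptions.

```python
def padded_size(height: int, width: int, multiple: int = 64):
    """Smallest (height, width) >= the input that is divisible by `multiple`."""
    new_h = (height + multiple - 1) // multiple * multiple
    new_w = (width + multiple - 1) // multiple * multiple
    return new_h, new_w


def center_padding(height: int, width: int, multiple: int = 64):
    """Padding as (left, right, top, bottom) that centers the image."""
    new_h, new_w = padded_size(height, width, multiple)
    left = (new_w - width) // 2
    right = new_w - width - left
    top = (new_h - height) // 2
    bottom = new_h - height - top
    return left, right, top, bottom


# Hypothetical 1080 x 1920 frame.
print(padded_size(1080, 1920))
print(center_padding(1080, 1920))
```

One quick check when cropped and full-size results disagree is to confirm that the padded size is also compatible with the model's window partitioning, not only with its stride.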

I have run these tests multiple times and always get the same strange results.

Do you have any idea where the bug could come from? Is it the training datasets? Why does cropping change the results?

I would be extremely grateful for your help !!!

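About the channel-count settings of the two models

Hello author!

I am using the CAVE dataset (31 channels) to train hyperspectral image compression, so I need to change the number of image channels for training. I would therefore like to ask how the image channel settings differ between the cnn model and the stf model.

First, in the CNN model it seems that only this part near line 31 needs to change, replacing the original 3 with 31, after which training runs normally.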

self.g_a = nn.Sequential(
    conv(31, N, kernel_size=5, stride=2),  # changed from 3 to 31
    ...
)

Next, in the STF model, I tried changing the default value of in_chans to 31 at lines 351 and 388, but running it still reports a channel mismatch error.

    def __init__(self, patch_size=4, in_chans=31, embed_dim=96, norm_layer=None):
        super().__init__()
        ...
class SymmetricalTransFormer(CompressionModel):
    def __init__(self,
                pretrain_img_size=256,
                patch_size=2,
                in_chans=31, #input channel num = 31
                ...

The error output is as follows:

Namespace(aux_learning_rate=0.001, batch_size=16, checkpoint=None, clip_max_norm=1.0, cuda=True, dataset='CAVE', epochs=1000, learning_rate=0.0001, lmbda=0.0035, model='stf', num_workers=30, patch_size=(256, 256), save=True, save_path='stf/ckpt/', seed=None, test_batch_size=64)
Learning rate: 0.0001
/data2/zhaoshuyi/anaconda3/envs/compressai/lib/python3.7/site-packages/torch/nn/modules/loss.py:528: UserWarning: Using a target size (torch.Size([16, 31, 128, 128])) that is different to the input size (torch.Size([16, 3, 128, 128])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  return F.mse_loss(input, target, reduction=self.reduction)
Traceback (most recent call last):
  File "stf/train.py", line 367, in <module>
    main(sys.argv[1:])
  File "stf/train.py", line 342, in main
    args.clip_max_norm,
  File "stf/train.py", line 132, in train_one_epoch
    out_criterion = criterion(out_net, d)
  File "/data2/zhaoshuyi/anaconda3/envs/compressai/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "stf/train.py", line 52, in forward
    out["mse_loss"] = self.mse(output["x_hat"], target)
  File "/data2/zhaoshuyi/anaconda3/envs/compressai/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data2/zhaoshuyi/anaconda3/envs/compressai/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 528, in forward
    return F.mse_loss(input, target, reduction=self.reduction)
  File "/data2/zhaoshuyi/anaconda3/envs/compressai/lib/python3.7/site-packages/torch/nn/functional.py", line 2928, in mse_loss
    expanded_input, expanded_target = torch.broadcast_tensors(input, target)
  File "/data2/zhaoshuyi/anaconda3/envs/compressai/lib/python3.7/site-packages/torch/functional.py", line 74, in broadcast_tensors
    return _VF.broadcast_tensors(tensors)  # type: ignore
RuntimeError: The size of tensor a (3) must match the size of tensor b (31) at non-singleton dimension 1

So, what else needs to be modified to support hyperspectral image compression? Thank you very much!