I am currently reproducing the DCVC family of models (DCVC-TCM, DCVC-HEM, DCVC-DC). I have implemented the training_step using pytorch_lightning as shown below.
However, the rate-distortion performance after training does not match the published results, and I observe the same gap for all three models.
If anyone has run into a similar pattern and found a fix, it would be great to work on it together!
Feel free to reach out to me via email. (I would also be happy to share my minor modifications to the model classes.)
```python
import bisect
import random

import torch
import torch.nn.functional as F
import torch.optim as optim
import torchvision
from pytorch_lightning import LightningModule
from torch.nn.modules.utils import consume_prefix_in_state_dict_if_present

from src.models.image_model import IntraNoAR
from src.models.video_model import DMC
from src.utils.common import AverageMeter

def get_stage_config(current_epoch):
# borders_of_stages = [1, 4, 7, 10, 16, 21, 24, 25, 27, 30] # Default
borders_of_stages = [1, 4, 7, 10, 21, 26, 29, 30, 32, 35] # early More
# borders_of_stages = [1, 4, 7, 10, 16, 21, 27, 29, 35, 38] # More
# borders_of_stages = [1, 4, 7, 10, 16, 21, 28, 30, 30, 30] # no avg_loss
# borders_of_stages = [0, 0, 0, 0, 0, 0, 5, 7, 10, 11] # Fine-tuning final stages
# borders_of_stages = [0, 0, 0, 0, 0, 0, 8, 10, 11, 11] # Fine-tuning no avg_loss (with total_epochs=10)
    # Stage = index of the first border that current_epoch has not reached yet;
    # epochs past the last border are clamped to the final stage. Stage: 0 - 9.
    stage = min(bisect.bisect_right(borders_of_stages, current_epoch), 9)
config = {}
config["loss"] = []
config["stage"] = stage
# 1. Number of Frames
if stage < 5:
config["nframes"] = 2
elif stage < 6:
config["nframes"] = 3
elif stage < 10:
config["nframes"] = 5
# 2. Loss
if stage < 1:
config["loss"].append("mv_dist")
elif stage < 2:
config["loss"].append("mv_dist")
config["loss"].append("mv_rate")
elif stage < 3:
config["loss"].append("x_dist")
elif stage < 4:
config["loss"].append("x_dist")
config["loss"].append("x_rate")
elif stage < 10:
config["loss"].append("x_dist")
config["loss"].append("x_rate")
config["loss"].append("mv_rate")
# 3. Learning Rate
if stage < 7:
config["lr"] = 1e-4
elif stage < 8:
config["lr"] = 1e-5
elif stage < 9:
config["lr"] = 5e-5
elif stage < 10:
config["lr"] = 1e-5
# 4. Loss Avg.
if stage < 8:
config["avg_loss"] = False
elif stage < 10:
config["avg_loss"] = True
# 5. mode.
if stage < 2:
config["mode"] = "inter"
elif stage < 4:
config["mode"] = "recon"
elif stage < 10:
config["mode"] = "all"
return config
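
# Worked example with the active "early More" borders: epoch 12 falls in
# stage 4, so get_stage_config(12) returns
#   {"loss": ["x_dist", "x_rate", "mv_rate"], "stage": 4, "nframes": 2,
#    "lr": 1e-4, "avg_loss": False, "mode": "all"}
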
class DCLightning(LightningModule):
    def __init__(self, kwargs):
        # Training schedule ("< N" refers to the default epoch borders above):
        # ----------------- Single P-frame --------------------
# Stage 0: mv_dist < 1
# Stage 1: mv_dist & mv_rate < 4
# Stage 2: x_dist < 7
# Stage 3: x dist & x_rate < 10
# Stage 4: x dist & x_rate & mv_rate < 16
# ----------------- Dual P-frame --------------------
# Stage 5: x dist & x_rate & mv_rate < 21
# ----------------- Four P-frame --------------------
# Stage 6: x dist & x_rate & mv_rate < 24
# Stage 7: x dist & x_rate & mv_rate (1e-5) < 25
# Stage 8: x dist & x_rate & mv_rate (5e-5) (avg_loss) < 27
# Stage 9: x dist & x_rate & mv_rate (1e-5) (avg_loss) < 30
super().__init__()
self.i_frame_model = self.load_i_frame_model()
self.p_frame_model = DMC()
self.q_index_to_lambda = {
# 0: 340,
# 1: 680,
# 2: 1520,
# 3: 3360,
0: 85,
1: 170,
2: 380,
3: 840,
}
        # Hierarchical per-P-frame distortion weights (used once nframes >= 5).
        self.weights = [0.5, 1.2, 0.5, 0.9]
self.automatic_optimization = False
self.single = kwargs["single"]
self.quality = kwargs["quality"]
# (out_net, frames[i+1], q_index=q, objective=objective)
def rate_distortion_loss(
self,
out_net,
target,
q_index: int,
objective: list,
frame_idx: int,
):
bpp = torch.tensor(0.0).to(out_net["dpb"]["ref_frame"].device)
if "mv_rate" in objective:
bpp += out_net["bpp_mv_y"] + out_net["bpp_mv_z"]
if "x_rate" in objective:
bpp += out_net["bpp_y"] + out_net["bpp_z"]
out = {"bpp": bpp}
out["mse"] = F.mse_loss(out_net["dpb"]["ref_frame"], target)
out["psnr"] = 10 * torch.log10(1 * 1 / out["mse"])
if self.use_weighted_loss:
out["loss"] = (
self.q_index_to_lambda[q_index] * out["mse"] * self.weights[frame_idx]
+ out["bpp"]
)
else:
out["loss"] = self.q_index_to_lambda[q_index] * out["mse"] + out["bpp"]
return out
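    # Example: with q_index=3 (lambda = 840), no frame weighting, mse = 1e-4,
    # and bpp = 0.05, the loss is 840 * 1e-4 + 0.05 = 0.134.
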
    def update(self, force=True):
        return self.p_frame_model.update(force=force)

    def compress(self, ref_frame, x):
        return self.p_frame_model.compress(ref_frame, x, self.quality)

    def decompress(
        self, ref_frame, mv_y_string, mv_z_string, y_string, z_string, height, width
    ):
        return self.p_frame_model.decompress(
            ref_frame, y_string, z_string, mv_y_string, mv_z_string, height, width
        )
def training_step(self, batch, batch_idx):
config = get_stage_config(self.current_epoch)
lr = config["lr"]
nframes = config["nframes"]
objective = config["loss"]
use_avg_loss = config["avg_loss"]
mode = config["mode"]
        self.use_weighted_loss = nframes >= 5
        q = random.randint(0, 3) if not self.single else self.quality
        # Apply the stage learning rate. NOTE: this reaches into Lightning's
        # private LightningOptimizer wrapper; all parameters share one group.
        opt = self.optimizers()
        opt._optimizer.param_groups[0]["lr"] = lr
# Batch: [B, T, C, H, W]
seq_len = batch.shape[1]
frames = [image.squeeze(1) for image in batch.chunk(seq_len, 1)][:nframes]
# I frame compression
with torch.no_grad():
# (x, q_in_ckpt=False, q_index=None):
self.i_frame_model.eval()
x_hat = self.i_frame_model(frames[0], q_in_ckpt=True, q_index=q)["x_hat"]
dpb = {
"ref_frame": x_hat,
"ref_feature": None,
"ref_mv_feature": None,
"ref_y": None,
"ref_mv_y": None,
}
if batch_idx % 100 == 0:
self.log_images(
{
f"train_x_ori_{0}": frames[0],
f"train_x_recon_{0}": dpb['ref_frame']
},
batch_idx,
)
# Iterative Update
if mode == "inter":
step = self.p_frame_model.forward_inter
elif mode == "recon":
step = self.p_frame_model.forward_recon
elif mode == "all":
step = self.p_frame_model.forward_all
else:
raise NotImplementedError
total_psnr = AverageMeter()
total_bpp = AverageMeter()
total_mse = AverageMeter()
total_loss = AverageMeter()
avg_loss = 0
for i in range(nframes - 1):
# (x, dpb, q_index, frame_idx):
out_net = step(frames[i + 1], dpb, q_index=q, frame_idx=i)
dpb = out_net["dpb"]
out_criterion = self.rate_distortion_loss(
out_net,
frames[i + 1],
q_index=q,
objective=objective,
frame_idx=i,
)
if not use_avg_loss:
opt.zero_grad()
self.manual_backward(out_criterion["loss"])
self.clip_gradients(
opt, gradient_clip_val=1.0, gradient_clip_algorithm="norm"
)
opt.step()
                # Detach the decoded picture buffer so the next frame's update
                # does not backpropagate through this one.
                if nframes >= 3:
                    for k in dpb.keys():
                        dpb[k] = dpb[k].detach()
else:
avg_loss += out_criterion["loss"]
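                # Unlike the per-frame branch above, dpb is deliberately left
                # attached here, so the averaged loss backpropagates through
                # the whole cascaded sequence of P-frames in one step.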
if batch_idx % 100 == 0:
self.log_images(
{
f"train_x_ori_{i+1}": frames[i+1],
f"train_x_recon_{i+1}": dpb['ref_frame']
},
batch_idx,
)
total_psnr.update(out_criterion["psnr"].item())
total_bpp.update(out_criterion["bpp"].item())
total_mse.update(out_criterion["mse"].item())
total_loss.update(out_criterion["loss"].item())
if use_avg_loss:
            # TODO: is dividing avg_loss by the sequence length actually needed?
            # AdamW seems to cope with the unnormalized sum, but Adam does not
            # without the division.
opt.zero_grad()
self.manual_backward(avg_loss / (nframes - 1))
self.clip_gradients(
opt, gradient_clip_val=1.0, gradient_clip_algorithm="norm"
)
opt.step()
self.log_dict(
{
"avg_psnr": total_psnr.avg,
"avg_bpp": total_bpp.avg,
"avg_mse": total_mse.avg,
"avg_loss": total_loss.avg,
},
sync_dist=True,
)
def log_images(self, log_dict, batch_idx):
if self.global_rank == 0:
for key in log_dict.keys():
self.logger.experiment.add_image(
key,
                    torchvision.utils.make_grid(log_dict[key].detach().cpu()),
self.current_epoch * 100000 + batch_idx,
dataformats="CHW",
)
def validation_step(self, batch, batch_idx):
with torch.no_grad():
nframes = 5
objective = ["mv_rate", "x_rate", "x_dist"]
            self.use_weighted_loss = nframes >= 5
            for q in range(4):
                # Batch: [B, T, C, H, W]
                seq_len = batch.shape[1]
                frames = [image.squeeze(1) for image in batch.chunk(seq_len, 1)]
recon_frames = []
# I frame compression
# (x, q_in_ckpt=False, q_index=None):
x_hat = self.i_frame_model(frames[0], q_in_ckpt=True, q_index=q)[
"x_hat"
]
dpb = {
"ref_frame": x_hat,
"ref_feature": None,
"ref_mv_feature": None,
"ref_y": None,
"ref_mv_y": None,
}
recon_frames.append(x_hat)
# Iterative Update
step = self.p_frame_model.forward_all
total_psnr = AverageMeter()
total_bpp = AverageMeter()
total_mse = AverageMeter()
total_loss = AverageMeter()
for i in range(nframes - 1):
# (x, dpb, q_index, frame_idx):
                out_net = step(frames[i + 1], dpb, q_index=q, frame_idx=i)
out_criterion = self.rate_distortion_loss(
out_net,
frames[i + 1],
q_index=q,
objective=objective,
frame_idx=i,
)
dpb = out_net["dpb"]
recon_frames.append(dpb["ref_frame"])
total_psnr.update(out_criterion["psnr"].item())
total_bpp.update(out_criterion["bpp"].item())
total_mse.update(out_criterion["mse"].item())
total_loss.update(out_criterion["loss"].item())
self.log_dict(
{
f"val_avg_psnr/q{q}": total_psnr.avg,
f"val_avg_bpp/q{q}": total_bpp.avg,
f"val_avg_mse/q{q}": total_mse.avg,
f"val_avg_loss/q{q}": total_loss.avg,
},
sync_dist=True,
)
if batch_idx == 2:
self.log_images(
{
f"val_x_ori/q{q}": torch.cat(frames, dim=0),
f"val_x_recon/q{q}": torch.cat(recon_frames, dim=0),
},
batch_idx
)
    def configure_optimizers(self):
        # Only the P-frame model is optimized; the initial lr is overwritten
        # every training step by the stage schedule.
        optimizer = optim.AdamW(self.p_frame_model.parameters(), lr=1e-4)
        # optimizer = optim.Adam(self.p_frame_model.parameters(), lr=1e-4)
        return {"optimizer": optimizer}
def load_i_frame_model(self):
i_frame_net = IntraNoAR()
ckpt = torch.load(
"../../checkpoints/cvpr2023_image_psnr.pth.tar",
map_location=torch.device("cpu"),
)
if "state_dict" in ckpt:
ckpt = ckpt["state_dict"]
if "net" in ckpt:
ckpt = ckpt["net"]
consume_prefix_in_state_dict_if_present(ckpt, prefix="module.")
i_frame_net.load_state_dict(ckpt)
i_frame_net.eval()
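        # NOTE: eval() alone does not freeze weights. Since training_step wraps
        # the I-frame forward in torch.no_grad() and configure_optimizers only
        # registers p_frame_model parameters, the I-frame model is effectively
        # frozen anyway, but excluding it from autograd is cheap insurance:
        # for p in i_frame_net.parameters():
        #     p.requires_grad_(False)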
return i_frame_net
```
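
For reference, this is roughly how I drive the module; `VideoDataModule` is a placeholder for my own Vimeo-90K data pipeline, not part of the DCVC code:

```python
from pytorch_lightning import Trainer

# Hypothetical entry point; VideoDataModule stands in for my own loader that
# yields batches shaped [B, T, C, H, W] with pixel values in [0, 1].
model = DCLightning({"single": False, "quality": 3})
datamodule = VideoDataModule(...)

trainer = Trainer(
    max_epochs=35,  # matches the last border of the active "early More" schedule
    accelerator="gpu",
    devices=1,
)
trainer.fit(model, datamodule=datamodule)
```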