
[ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.

Home Page: https://dachuanshi.com/UPop-Project/

License: BSD 3-Clause "New" or "Revised" License

Languages: Python 98.08%, Shell 1.92%
Topics: efficient-deep-learning, model-compression, multimodal-learning, vision-language-transformer, image-captioning, image-text-retrieval, visual-question-answering, visual-reasoning, text-image-retrieval, framework

upop's Introduction

UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers

Badges: Build | Paper | Code | Website | Blog | PyTorch | License

🧐 A Quick Look

  • What is it: UPop is the first structured pruning framework for vision-language Transformers. It enables effective structured pruning on various multi-modal & uni-modal tasks (including Visual Reasoning, Image Captioning, Visual Question Answer, Image-Text Retrieval, Text-Image Retrieval, Image Classification and Image Segmentation), datasets (including NLVR2, COCO Caption, VQAv2, COCO, Flickr30K, ImageNet and ADE20K), and model architectures (including BLIP, CLIP, DeiT and Segmenter).

    [Video: overview.mp4]
  • What challenge does it tackle: The video above demonstrates that the Unified Search adopted by UPop removes the burden of repeated experiments (e.g., grid search) for finding optimal compression ratios across different modalities and structures. Furthermore, the Progressive Pruning adopted by UPop eliminates the weight gap between the searched model and the pruned subnetwork to be retrained, thereby achieving better convergence and performance, especially at high compression ratios. A conceptual sketch of the progressive mask schedule is given after the table below.

  • How about the performance: On multimodal tasks, for example, UPop can achieve 2x compression with only 1.2% and 2.0% accuracy loss on the VQAv2 dataset for Visual Question Answer and the NLVR2 dataset for Visual Reasoning, respectively. On unimodal tasks, for example, UPop can achieve 1.5x and 1.2x compression without any loss of accuracy on the ImageNet dataset for Image Classification and the ADE20K dataset for Image Segmentation, respectively. Some examples of vector-level structured granularity are as follows.

    | Example (Task • Dataset • Model • Metric) | Performance | Parameters (M) | FLOPs (G) |
    | --- | --- | --- | --- |
    | Visual Reasoning • NLVR2 • BLIP • Acc | $83.1 \rightarrow 81.1_{\color{red}\downarrow 2.0}$ | $259.5 \rightarrow 150.2_{\color{ForestGreen}\downarrow 42\%}$ | $132.5 \rightarrow 89.4_{\color{ForestGreen}\downarrow 33\%}$ |
    | Image Caption • Caption COCO • BLIP • SPICE | $23.8 \rightarrow 23.3_{\color{red}\downarrow 0.5}$ | $224.0 \rightarrow 127.1_{\color{ForestGreen}\downarrow 43\%}$ | $65.7 \rightarrow 39.8_{\color{ForestGreen}\downarrow 39\%}$ |
    | Visual Question Answer • VQAv2 • BLIP • Acc | $77.5 \rightarrow 76.3_{\color{red}\downarrow 1.2}$ | $361.6 \rightarrow 211.3_{\color{ForestGreen}\downarrow 42\%}$ | $186.1 \rightarrow 109.4_{\color{ForestGreen}\downarrow 41\%}$ |
    | Image-Text Retrieval • COCO • BLIP • R@1 | $81.9 \rightarrow 77.4_{\color{red}\downarrow 4.5}$ | $447.6 \rightarrow 248.9_{\color{ForestGreen}\downarrow 44\%}$ | $153.2 \rightarrow 88.3_{\color{ForestGreen}\downarrow 42\%}$ |
    | Image-Text Retrieval • COCO • CLIP • R@1 | $71.5 \rightarrow 70.8_{\color{red}\downarrow 0.7}$ | $856.0 \rightarrow 473.7_{\color{ForestGreen}\downarrow 45\%}$ | $395.7 \rightarrow 196.3_{\color{ForestGreen}\downarrow 50\%}$ |
    | Text-Image Retrieval • COCO • BLIP • R@1 | $64.3 \rightarrow 59.8_{\color{red}\downarrow 4.5}$ | $447.6 \rightarrow 248.9_{\color{ForestGreen}\downarrow 44\%}$ | $153.2 \rightarrow 88.3_{\color{ForestGreen}\downarrow 42\%}$ |
    | Text-Image Retrieval • COCO • CLIP • R@1 | $56.8 \rightarrow 53.1_{\color{red}\downarrow 3.7}$ | $856.0 \rightarrow 473.7_{\color{ForestGreen}\downarrow 45\%}$ | $395.7 \rightarrow 196.3_{\color{ForestGreen}\downarrow 50\%}$ |
    | Image-Text Retrieval • Flickr30K • BLIP • R@1 | $96.8 \rightarrow 92.2_{\color{red}\downarrow 4.4}$ | $447.6 \rightarrow 250.5_{\color{ForestGreen}\downarrow 44\%}$ | $153.2 \rightarrow 91.0_{\color{ForestGreen}\downarrow 41\%}$ |
    | Image-Text Retrieval • Flickr30K • CLIP • R@1 | $96.8 \rightarrow 93.2_{\color{red}\downarrow 3.6}$ | $856.0 \rightarrow 474.3_{\color{ForestGreen}\downarrow 45\%}$ | $395.7 \rightarrow 201.1_{\color{ForestGreen}\downarrow 49\%}$ |
    | Text-Image Retrieval • Flickr30K • BLIP • R@1 | $86.9 \rightarrow 82.0_{\color{red}\downarrow 4.9}$ | $447.6 \rightarrow 250.5_{\color{ForestGreen}\downarrow 44\%}$ | $153.2 \rightarrow 91.0_{\color{ForestGreen}\downarrow 41\%}$ |
    | Text-Image Retrieval • Flickr30K • CLIP • R@1 | $86.6 \rightarrow 80.5_{\color{red}\downarrow 6.1}$ | $856.0 \rightarrow 474.3_{\color{ForestGreen}\downarrow 45\%}$ | $395.7 \rightarrow 201.1_{\color{ForestGreen}\downarrow 49\%}$ |
    | Classification • ImageNet • DeiT • Acc@1 | $79.9 \rightarrow 80.2_{\color{ForestGreen}\uparrow 0.3}$ | $22.0 \rightarrow 15.7_{\color{ForestGreen}\downarrow 29\%}$ | $4.6 \rightarrow 3.2_{\color{ForestGreen}\downarrow 30\%}$ |
    | Classification • ImageNet • DeiT • Acc@5 | $95.0 \rightarrow 95.1_{\color{ForestGreen}\uparrow 0.1}$ | $22.0 \rightarrow 15.7_{\color{ForestGreen}\downarrow 29\%}$ | $4.6 \rightarrow 3.2_{\color{ForestGreen}\downarrow 30\%}$ |
    | Segmentation • ADE20K • Segmenter • $\text{mIoU}^s$ | $45.3 \rightarrow 45.3_{\color{ForestGreen}\uparrow 0.0}$ | $26.4 \rightarrow 21.5_{\color{ForestGreen}\downarrow 19\%}$ | $38.6 \rightarrow 30.4_{\color{ForestGreen}\downarrow 21\%}$ |
    | Segmentation • ADE20K • Segmenter • $\text{mIoU}^m$ | $46.9 \rightarrow 47.1_{\color{ForestGreen}\uparrow 0.2}$ | $26.4 \rightarrow 21.5_{\color{ForestGreen}\downarrow 19\%}$ | $38.6 \rightarrow 30.4_{\color{ForestGreen}\downarrow 21\%}$ |
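
The following is a minimal, conceptual Python sketch of the Progressive Pruning idea described above, not the repository's actual implementation: the names progressive_update, alpha, accumulated_grad, pi, and p are illustrative. A learnable mask gates each prunable unit, the accumulated gradient of the mask acts as an importance signal, and as the intermediate ratio pi grows toward the target ratio p, the masks of the units selected for pruning decay smoothly toward zero.

```python
# Conceptual sketch of Progressive Pruning (illustrative only; not the repo's code).
import torch

def progressive_update(alpha: torch.Tensor, accumulated_grad: torch.Tensor,
                       pi: float, p: float) -> None:
    k = max(1, int(pi * alpha.numel()))                    # units to prune at ratio pi
    threshold = torch.topk(accumulated_grad.flatten(), k).values.min()
    prune = accumulated_grad >= threshold                  # largest-gradient units are pruned
    with torch.no_grad():
        # Kept units stay at 1; pruned units take the value 1 - pi/p, which
        # reaches exactly 0 once pi has grown to the target ratio p, so there
        # is no abrupt weight gap before retraining.
        alpha.copy_(torch.where(prune,
                                torch.full_like(alpha, 1.0 - pi / p),
                                torch.ones_like(alpha)))

# Illustrative usage for 2x compression (p = 0.5), raising pi over a few intervals.
alpha = torch.ones(1024)                  # e.g. masks over attention heads / FFN neurons
accumulated_grad = torch.rand(1024)       # stand-in for the real accumulated gradients
for pi in (0.1, 0.2, 0.3, 0.4, 0.5):
    progressive_update(alpha, accumulated_grad, pi, p=0.5)
print(float((alpha == 0).float().mean()))  # roughly 0.5 of the masks end at exactly 0
```

In the repository's actual search code, the threshold additionally weights attention and FFN masks differently (see the compression_weight snippet quoted in the Issues section below).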

🥳 What's New

  • (Jun 2023) We worked on a new project, CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers, which effectively reduces computational cost for acceleration. [Paper] [Code] 💡

  • (Jun 30, 2023) We released the implementation, scripts, checkpoints, and logs. [Code] [Website] 🚩

  • (Apr 25, 2023) Our work UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers was accepted by ICML 2023. [Paper] [ArXiv] 🎉

🏃 Installation

The code is tested with PyTorch==1.11.0, CUDA==11.3.1, and Python==3.8.13. The dependencies can be installed by:

conda env create -f environment.yml

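As an optional sanity check (not part of the repository), the following Python snippet prints the interpreter, PyTorch, and CUDA versions so they can be compared against the tested versions listed above:

```python
# Optional post-install check; the versions in the comments are the ones
# reported above as tested, other versions may work but are untested.
import sys
import torch

print("Python :", sys.version.split()[0])                                         # tested: 3.8.13
print("PyTorch:", torch.__version__)                                              # tested: 1.11.0
print("CUDA   :", torch.version.cuda, "| available:", torch.cuda.is_available())  # tested: 11.3.1
```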

🚀 Visual Reasoning on the NLVR2 Dataset

  • Dataset & Annotation

    Download the NLVR2 dataset, unzip it under the datasets folder, and accordingly modify the image_root in config. Download all-in-one annotations (including annotations for Visual Reasoning, Image Caption, VQA, Image-Text Retrieval, and Text-Image Retrieval tasks) from Google Drive or Baidu Drive, unzip it under the annotation folder, and accordingly modify the annotation in config. See here for expected folder structures.

  • Evaluation

    Download compressed checkpoints from the table below, put them under the output folder, and accordingly modify the --pretrained of the scripts. For example, to evaluate a 2x compressed model:

    python -m torch.distributed.run --nproc_per_node=8 compress_nlvr.py --evaluate \
    --pretrained output/nlvr_nlvr2_compression_2x/model_base_nlvr_nlvr2_2x_compressed.pth \
    --config ./configs/nlvr.yaml \
    --output_dir output/nlvr_nlvr2_compression_2x
  • Compression

    Download the uncompressed model from the table below, put it under the pretrained folder, and accordingly modify the pretrained in config. For example, to conduct a 2x compression on 8 A100 GPUs:

    python -m torch.distributed.run --nproc_per_node=8 compress_nlvr.py --p 0.5 --epoch 15 \
    --pretrained pretrained/model_base_nlvr.pth \
    --config ./configs/nlvr.yaml \
    --output_dir output/nlvr_nlvr2_compression_2x
  • Download

    | Reduction | Uncompressed Model | Compression Script | Training Log | Compressed Checkpoint | Evaluation Script |
    | --- | --- | --- | --- | --- | --- |
    | 2x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
    | 3x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
    | 4x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
    | 5x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
    | 10x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |

🚀 Image Caption on the COCO Caption Dataset

  • Dataset & Annotation

    Download the COCO Caption dataset, unzip it under the datasets folder, and accordingly modify the image_root in config. Download all-in-one annotations from Google Drive or Baidu Drive, unzip it under the annotation folder, and accordingly modify the annotation in config. See here for expected folder structures.

  • Evaluation

    Download compressed checkpoints from the table below, put them under the output folder, and accordingly modify the --pretrained of the scripts. For example, to evaluate a 2x compressed model:

    python -m torch.distributed.run --nproc_per_node=8 compress_caption.py --evaluate \
    --pretrained output/caption_coco_compression_2x/model_base_caption_capfilt_large_coco_2x_compressed.pth \
    --config ./configs/caption_coco.yaml \
    --output_dir output/caption_coco_compression_2x
  • Compression

    Download the uncompressed model from the table below, put it under the pretrained folder, and accordingly modify the pretrained in config. For example, to conduct a 2x compression on 8 A100 GPUs:

    python -m torch.distributed.run --nproc_per_node=8 compress_caption.py --p 0.5 --epoch 5 \
    --pretrained pretrained/model_base_caption_capfilt_large.pth \
    --config ./configs/caption_coco.yaml \
    --output_dir output/caption_coco_compression_2x
  • Download

    | Reduction | Uncompressed Model | Compression Script | Training Log | Compressed Checkpoint | Evaluation Script |
    | --- | --- | --- | --- | --- | --- |
    | 2x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
    | 4x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |

🚀 Visual Question Answer on the VQAv2 Dataset

  • Dataset & Annotation

    Download the VQAv2 dataset and Visual Genome dataset, unzip them under the datasets folder, and accordingly modify the image_root in config. Download all-in-one annotations from Google Drive or Baidu Drive, unzip it under the annotation folder, and accordingly modify the annotation in config. See here for expected folder structures.

  • Evaluation

    Download compressed checkpoints from the table below, put them under the output folder, and accordingly modify the --pretrained of the scripts. For example, to evaluate a 2x compressed model:

    [!Note] The scripts will generate an answers file vqa_result.json, which should be submitted to the official evaluation server to obtain the results.

    python -m torch.distributed.run --nproc_per_node=8 compress_vqa.py --evaluate \
    --pretrained output/vqa_vqa2_compression_2x/model_base_vqa_capfilt_large_vqa2_2x_compressed.pth \
    --config ./configs/vqa.yaml \
    --output_dir output/vqa_vqa2_compression_2x
  • Compression

    Download the uncompressed model from the table below, put it under the pretrained folder, and accordingly modify the pretrained in config. For example, to conduct a 2x compression on 8 A100 GPUs:

    python -m torch.distributed.run --nproc_per_node=8 compress_vqa.py --p 0.5 --epoch 10 \
    --pretrained pretrained/model_base_vqa_capfilt_large.pth \
    --config ./configs/vqa.yaml \
    --output_dir output/vqa_vqa2_compression_2x
  • Download

    | Reduction | Uncompressed Model | Compression Script | Training Log | Compressed Checkpoint | Evaluation Script |
    | --- | --- | --- | --- | --- | --- |
    | 2x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
    | 4x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |

🚀 Image-Text and Text-Image Retrieval on the COCO Dataset

  • Dataset & Annotation

    Download the COCO dataset, unzip it under the datasets folder, and accordingly modify the image_root in config. Download all-in-one annotations from Google Drive or Baidu Drive, unzip it under the annotation folder, and accordingly modify the annotation in config. See here for expected folder structures.

  • Evaluation

    Download compressed checkpoints from the table below, put them under the output folder, and accordingly modify the --pretrained of the scripts. For example, to evaluate a 2x compressed model:

    python -m torch.distributed.run --nproc_per_node=8 compress_retrieval.py --evaluate \
    --pretrained output/retrieval_coco_compression_2x/model_base_retrieval_coco_2x_compressed.pth --config ./configs/retrieval_coco.yaml \
    --output_dir output/retrieval_coco_compression_2x
  • Compression

    Download the uncompressed model from the table below, put it under the pretrained folder, and accordingly modify the pretrained in config. For example, to conduct a 2x compression on 8 A100 GPUs:

    python -m torch.distributed.run --nproc_per_node=8 compress_retrieval.py --p 0.5 --epoch 6 \
    --pretrained pretrained/model_base_retrieval_coco.pth \
    --config ./configs/retrieval_coco.yaml \
    --output_dir output/retrieval_coco_compression_2x
  • Download

    | Reduction | Uncompressed Model | Compression Script | Training Log | Compressed Checkpoint | Evaluation Script |
    | --- | --- | --- | --- | --- | --- |
    | 2x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
    | 4x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |

🚀 Image-Text and Text-Image Retrieval on the Flickr30K Dataset

  • Dataset & Annotation

    Download the Flickr30k dataset, unzip it under the datasets folder, and accordingly modify the image_root in config. Download all-in-one annotations from Google Drive or Baidu Drive, unzip it under the annotation folder, and accordingly modify the annotation in config. See here for expected folder structures.

  • Evaluation

    Download compressed checkpoints from the table below, put them under the output folder, and accordingly modify the --pretrained of the scripts. For example, to evaluate a 2x compressed model:

    python -m torch.distributed.run --nproc_per_node=8 compress_retrieval_flickr.py --evaluate \
    --pretrained output/retrieval_flickr_compression_2x/model_base_retrieval_flickr_2x_compressed.pth \
    --config ./configs/retrieval_flickr.yaml \
    --output_dir output/retrieval_flickr_compression_2x
  • Compression

    Download the uncompressed model from the table below, put it under the pretrained folder, and accordingly modify the pretrained in config. For example, to conduct a 2x compression on 8 A100 GPUs:

    python -m torch.distributed.run --nproc_per_node=8 compress_retrieval_flickr.py --p 0.5 --epoch 12 \
    --pretrained pretrained/model_base_retrieval_flickr.pth \
    --config ./configs/retrieval_flickr.yaml \
    --output_dir output/retrieval_flickr_compression_2x
  • Download

    | Reduction | Uncompressed Model | Compression Script | Training Log | Compressed Checkpoint | Evaluation Script |
    | --- | --- | --- | --- | --- | --- |
    | 2x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
    | 4x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |

🚀 Image-Text and Text-Image Retrieval on the COCO Dataset with CLIP

  • Dataset & Annotation

    Download the COCO dataset, unzip it under the datasets folder, and accordingly modify the image_root in config. Download all-in-one annotations from Google Drive or Baidu Drive, unzip it under the annotation folder, and accordingly modify the annotation in config. See here for expected folder structures.

  • Evaluation

    Download compressed checkpoints from the table below, put them under the output folder, and accordingly modify the --pretrained of the scripts. For example, to evaluate a 2x compressed model:

    python -m torch.distributed.run --nproc_per_node=8 compress_retrieval_clip.py --evaluate \
    --pretrained output/retrieval_coco_clip_compression_2x/clip_large_retrieval_coco_2x_compressed.pth \
    --config ./configs/retrieval_coco_clip.yaml \
    --output_dir output/retrieval_coco_clip_compression_2x
  • Compression

    Download the uncompressed model from the table below, put it under the pretrained folder, and accordingly modify the pretrained in config. For example, to conduct a 2x compression on 8 A100 GPUs:

    python -m torch.distributed.run --nproc_per_node=8 compress_retrieval_clip.py --p 0.5 --epoch 6 \
    --pretrained pretrained/clip_large_retrieval_coco.pth \
    --config ./configs/retrieval_coco_clip.yaml \
    --output_dir output/retrieval_coco_clip_compression_2x
  • Download

    | Reduction | Uncompressed Model | Compression Script | Training Log | Compressed Checkpoint | Evaluation Script |
    | --- | --- | --- | --- | --- | --- |
    | 2x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
    | 4x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |

🚀 Image-Text and Text-Image Retrieval on the Flickr30K Dataset with CLIP

  • Dataset & Annotation

    Download the Flickr30k dataset, unzip it under the datasets folder, and accordingly modify the image_root in config. Download all-in-one annotations from Google Drive or Baidu Drive, unzip it under the annotation folder, and accordingly modify the annotation in config. See here for expected folder structures.

  • Evaluation

    Download compressed checkpoints from the table below, put them under the output folder, and accordingly modify the --pretrained of the scripts. For example, to evaluate a 2x compressed model:

    python -m torch.distributed.run --nproc_per_node=8 compress_retrieval_clip.py --evaluate \
    --pretrained output/retrieval_flickr_clip_compression_2x/clip_large_retrieval_flickr_2x_compressed.pth \
    --config ./configs/retrieval_flickr_clip.yaml \
    --output_dir output/retrieval_flickr_clip_compression_2x
  • Compression

    Download the uncompressed model from the table below, put it under the pretrained folder, and accordingly modify the pretrained in config. For example, to conduct a 2x compression on 8 A100 GPUs:

    python -m torch.distributed.run --nproc_per_node=8 compress_retrieval_clip.py --p 0.5 --epoch 12 \
    --pretrained pretrained/clip_large_retrieval_flickr.pth \
    --config ./configs/retrieval_flickr_clip.yaml \
    --output_dir output/retrieval_flickr_clip_compression_2x
  • Download

    | Reduction | Uncompressed Model | Compression Script | Training Log | Compressed Checkpoint | Evaluation Script |
    | --- | --- | --- | --- | --- | --- |
    | 2x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
    | 4x | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |

🚀 Image Classification on the ImageNet Dataset

  • Dataset & Annotation

    Download the ImageNet dataset, unzip it under the datasets folder, and accordingly modify the option --data-path in compression and evaluation scripts. See here for expected folder structures.

  • Evaluation

    Download compressed checkpoints from the table below, put them under the output folder, and accordingly modify the option --resume of the scripts. For example, to evaluate a 50% compressed model:

    python -m torch.distributed.run --nproc_per_node=8 compress_deit.py --eval --dist-eval \
    --data-path datasets/vision/imagenet \
    --model deit_small_patch16_224 \
    --resume output/train_deit_small_patch16_224_60s_300r_050x/deit_small_patch16_224_050x_compressed.pth
  • Compression

    Download the uncompressed model from the table below, put it under the pretrained folder, and accordingly modify the option --finetune of the scripts. For example, to conduct a 50% compression on 8 A100 GPUs:

    python -m torch.distributed.run --nproc_per_node=8 compress_deit.py \
    --data-path datasets/vision/imagenet \
    --finetune pretrained/deit_small_patch16_224-cd65a155.pth \
    --model deit_small_patch16_224 \
    --epochs-search 60 \
    --epochs 300 \
    --batch-size 512 \
    --lr-search 1e-4 \
    --lr 1e-4 \
    --warmup-epochs 0 \
    --p 0.5 \
    --interval 800 \
    --output_dir output/train_deit_small_patch16_224_60s_300r_050x
  • Download

    | Reduction | Uncompressed Model | Compression Script | Training Log | Compressed Checkpoint | Evaluation Script |
    | --- | --- | --- | --- | --- | --- |
    | 10% | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
    | 20% | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
    | 30% | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
    | 40% | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
    | 50% | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |

🚀 Image Segmentation on the ADE20K Dataset

  • Dataset & Annotation

    Download the ADE20K dataset, unzip it under the datasets folder, and accordingly modify the option --dataset in compression and evaluation scripts. See here for expected folder structures.

  • Evaluation

    Download compressed checkpoints from the table below, put them under the output folder, accordingly modify the path option of the scripts, and export the folder of datasets as the environment variable DATASET. For example, to evaluate a 30% compressed model:

    export DATASET=datasets/vision
    
    # for single-scale testing
    python -m torch.distributed.run --nproc_per_node=4 segm/eval/miou.py \
    output/seg_small_mask_16s_64r_030x/seg_small_mask_030x_compressed.pth ade20k --singlescale
    
    # for multi-scale testing
    python -m torch.distributed.run --nproc_per_node=4 segm/eval/miou.py \
    output/seg_small_mask_16s_64r_030x/seg_small_mask_030x_compressed.pth ade20k --multiscale
  • Compression

    Download the uncompressed model from the table below, put it under the pretrained folder, accordingly modify the option --pretrained of the scripts, and export the folder of datasets as the environment variable DATASET. For example, to conduct a 30% compression on 4 A100 GPUs:

    export DATASET=datasets/vision
    
    python -m torch.distributed.run --nproc_per_node=4 segm/train.py --dataset ade20k \
    --backbone vit_small_patch16_384 --decoder mask_transformer --no-resume \
    --pretrained pretrained/seg_small_mask.pth \
    --epochs-search 16 \
    --epochs 64 \
    --batch-size 64 \
    --lr-search 4e-3 \
    --lr 4e-3 \
    --p 0.30 \
    --interval 200 \
    --log-dir output/seg_small_mask_16s_64r_030x
  • Download

    | Reduction | Uncompressed Model | Compression Script | Training Log | Compressed Checkpoint | Evaluation Script |
    | --- | --- | --- | --- | --- | --- |
    | 10% | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
    | 15% | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
    | 20% | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |
    | 30% | Google/Baidu | Link | Google/Baidu | Google/Baidu | Link |

📑 Common Issues

1. Evaluation with single GPU

  • For BLIP and CLIP models, evaluate the 2x compressed BLIP model on the NLVR2 dataset as an example:

    python compress_nlvr.py --evaluate \
    --pretrained output/nlvr_nlvr2_compression_2x/model_base_nlvr_nlvr2_2x_compressed.pth \
    --config ./configs/nlvr.yaml \
    --output_dir output/nlvr_nlvr2_compression_2x
  • For DeiT, evaluate the 50% compressed model on the ImageNet dataset as an example:

    [!Note] Note that the option --dist-eval is omitted when evaluating with a single GPU.

    python compress_deit.py --eval \
    --data-path datasets/vision/imagenet \
    --model deit_small_patch16_224 \
    --resume output/train_deit_small_patch16_224_60s_300r_050x/deit_small_patch16_224_050x_compressed.pth
  • For Segmenter, evaluate the 30% compressed model on the ADE20k dataset as an example:

    export DATASET=datasets/vision
    
    # for single-scale testing
    python segm/eval/miou.py \
    output/seg_small_mask_16s_64r_030x/seg_small_mask_030x_compressed.pth ade20k --singlescale
    
    # for multi-scale testing
    python segm/eval/miou.py \
    output/seg_small_mask_16s_64r_030x/seg_small_mask_030x_compressed.pth ade20k --multiscale

2. Compress with single GPU

  • For BLIP and CLIP models, compress the BLIP model to half on the NLVR2 dataset as an example:

    python compress_nlvr.py --p 0.5 --epoch 15 \
    --pretrained pretrained/model_base_nlvr.pth \
    --config ./configs/nlvr.yaml \
    --output_dir output/nlvr_nlvr2_compression_2x
  • For DeiT, conduct a 50% compression on the ImageNet dataset as an example:

    python compress_deit.py \
    --data-path datasets/vision/imagenet \
    --finetune pretrained/deit_small_patch16_224-cd65a155.pth \
    --model deit_small_patch16_224 \
    --epochs-search 60 \
    --epochs 300 \
    --batch-size 512 \
    --lr-search 1e-4 \
    --lr 1e-4 \
    --warmup-epochs 0 \
    --p 0.5 \
    --interval 800 \
    --output_dir output/train_deit_small_patch16_224_60s_300r_050x
  • For Segmenter, conduct a 30% compression on the ADE20K dataset as an example:

    export DATASET=datasets/vision
    
    python segm/train.py --dataset ade20k \
    --backbone vit_small_patch16_384 --decoder mask_transformer --no-resume \
    --pretrained pretrained/seg_small_mask.pth \
    --epochs-search 16 \
    --epochs 64 \
    --batch-size 64 \
    --lr-search 4e-3 \
    --lr 4e-3 \
    --p 0.30 \
    --interval 200 \
    --log-dir output/seg_small_mask_16s_64r_030x

3. Out of memory during the evaluation

  • For BLIP and CLIP models, change the batch_size_test (or the batch_size for the Image Caption task) in the corresponding config file to a smaller number; a hypothetical helper for this is sketched after this list.
  • For DeiT, modify the option --batch-size of the scripts to a smaller number.
  • For Segmenter, the default batch size of the evaluation is 1. For single-scale testing, peak GPU memory usage on a single card is less than 5 GB, so it should run on most GPUs. For multi-scale testing, peak GPU memory usage on a single card is about 13 GB, which may require a GPU with larger memory.
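
As referenced above, a hypothetical helper for lowering the batch-size entries of a BLIP/CLIP config might look like the sketch below. It assumes the configs are plain YAML files using the key names mentioned in this section (batch_size_test, batch_size_train, batch_size); the function name shrink_batch_sizes is illustrative.

```python
# Hypothetical helper (not part of the repository): halve the batch-size keys
# in a config file before evaluation or compression if you hit out-of-memory.
import yaml  # PyYAML

def shrink_batch_sizes(config_path: str, factor: int = 2) -> None:
    with open(config_path) as f:
        cfg = yaml.safe_load(f)
    for key in ("batch_size_test", "batch_size_train", "batch_size"):
        if key in cfg:
            cfg[key] = max(1, cfg[key] // factor)   # never go below a batch size of 1
    with open(config_path, "w") as f:
        yaml.safe_dump(cfg, f, sort_keys=False)

# Example: shrink_batch_sizes("./configs/nlvr.yaml")
```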

4. Out of memory during the compression

  • For BLIP and CLIP models, change the batch_size_train and batch_size_test (or the batch_size for the Image Caption task) in the corresponding config file to a smaller number. Besides, the option --amp for compression scripts can be used to enable mixed precision. Compress the BLIP model to half on the NLVR2 dataset as an example:

    python -m torch.distributed.run --nproc_per_node=8 compress_nlvr.py --p 0.5 --epoch 15 --amp \
    --pretrained pretrained/model_base_nlvr.pth \
    --config ./configs/nlvr.yaml \
    --output_dir output/nlvr_nlvr2_compression_2x

    [!WARNING]
    Note that using mixed precision may produce NaN gradients. Since UPop takes gradients as the metric to determine pruned positions, NaN gradients may disrupt this determination and degrade performance. A conceptual NaN-gradient check is sketched at the end of this list.

  • For DeiT and Segmenter, modify the option --batch-size of the scripts to a smaller number. Mixed precision is temporarily not supported, as it frequently causes NaN gradients.
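
As mentioned in the warning above, a conceptual (non-repository) helper for spotting NaN gradients after a backward pass with --amp could look like this; model and the call site are assumptions for illustration.

```python
# Conceptual NaN-gradient check: list parameters whose gradients contain NaNs,
# since NaNs would corrupt the accumulated-gradient importance metric that
# UPop uses to decide which positions to prune.
import torch

def nan_grad_params(model: torch.nn.Module):
    return [name for name, p in model.named_parameters()
            if p.grad is not None and torch.isnan(p.grad).any()]

# Usage inside a training loop, right after the (scaled) backward pass:
#   bad = nan_grad_params(model)
#   if bad:
#       print("NaN gradients detected in:", bad)
```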

🌲 Expected Folder Structures

├── annotation
│   ├── answer_list.json
│   ├── coco_gt
│   │   ├── coco_karpathy_test_gt.json
│   │   └── coco_karpathy_val_gt.json
│   ├── ...
├── clip                                               
├── compress_caption.py       
├── compress_deit.py        
├── compress_nlvr.py                  
├── compress ...    
├── configs                                             
├── data                                        
├── datasets
│   └── vision
│       ├── coco
│       ├── flickr
│       ├── NLVR2     
│       ├── ...                                                                              
├── deit   
├── log                                     
├── models            
├── output                                    
├── pretrained
│   ├── bert-base-uncased
│   ├── clip_large_retrieval_coco.pth
│   ├── clip_large_retrieval_flickr.pth
│   ├── ...       
├── segm                                                                                   
├── transform                                                                           
└── utils.py                                

💬 Acknowledgments

This code is built upon BLIP, CLIP, DeiT, Segmenter, and timm. Thanks for these awesome open-source projects!

✨ Citation

If you find our work or this code useful, please consider citing the corresponding paper:

@InProceedings{pmlr-v202-shi23e,
  title = {{UP}op: Unified and Progressive Pruning for Compressing Vision-Language Transformers},
  author = {Shi, Dachuan and Tao, Chaofan and Jin, Ying and Yang, Zhendong and Yuan, Chun and Wang, Jiaqi},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages = {31292--31311},
  year = {2023},
  volume = {202},
  publisher = {PMLR}
}

upop's People

Contributors

chaofantao, sdc17


upop's Issues

Question about the datasets

Could you provide cloud-drive copies of the VQAv2 and Visual Genome datasets? The links provided there can no longer be opened or downloaded.

Question about accumulated gradients metric

Dear author,

Hello, I have read your paper and code. UPop uses the accumulated gradient of the mask as the metric to evaluate weight importance. However, I don't understand why UPop prunes the parts with large accumulated gradients. Does it mean that the parts with larger accumulated gradients are less important? Is there any related research supporting this, or is it based on intuition? Could you please provide some clarification?

Thank you.

Hello, I have some questions about fine-tuning

I would like to prune the BLIP model and then fine-tune it on my own project's dataset. Should I use my own dataset to compute the loss during the retraining stage?
Which other hyperparameters should I adjust, and are there any strategies that could improve the fine-tuning results?
I am new to this area and would appreciate your guidance. Thank you very much.

Runtime error caused by clip/mock.py or deit/mock.py while evaluating or compressing

  • For CLIP models, clip/mock.py is used to patch our modification of nn.MultiheadAttention. It was adapted from the source code of nn.MultiheadAttention in PyTorch==1.11.0, and has also been tested on PyTorch==1.12.1 and PyTorch==1.13.1. However, it may not be compatible with other PyTorch versions that we have not tested. If you encounter this error on another version, you may switch to version 1.11.0 or create your own patch file by referring to clip/mock.py.

  • For DeiT models, deit/mock.py is used to patch our modification of timm.models. It was adapted from the source code of timm.models.vision_transformer in timm==0.4.12 with torchvision==0.12.0. It may not be compatible with other timm and torchvision versions that we have not tested. If you encounter this error on another version, you may switch to the versions above, or create your own patch file by referring to deit/mock.py.
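
As an optional pre-flight check (not part of the repository), one could compare the installed versions against those listed above before running the CLIP or DeiT scripts; the TESTED table below simply restates the versions mentioned in this answer.

```python
# Warn if installed torch / timm / torchvision versions differ from those the
# patch files (clip/mock.py, deit/mock.py) were written against and tested on.
import torch, timm, torchvision

TESTED = {
    "torch": {"1.11.0", "1.12.1", "1.13.1"},
    "timm": {"0.4.12"},
    "torchvision": {"0.12.0"},
}

for name, module in (("torch", torch), ("timm", timm), ("torchvision", torchvision)):
    version = module.__version__.split("+")[0]
    if version not in TESTED[name]:
        print(f"Warning: {name}=={version} is untested with the patch files; "
              f"consider one of {sorted(TESTED[name])} or adapting the patch.")
```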

Problem with CLIP implementation

There seems to be a bug in the CLIP implementation: nn.MultiheadAttention does not have a parameter called "search", which your customized implementation adds to other modules to give the model a learnable pruning parameter. However, the command runs without error. I would appreciate your help with this.

Availability of Pretrained Weights for Chinese Users

Hi,

Could you provide a version of the pretrained weights shared through cloud storage platforms that are widely used in China, such as Baidu Netdisk or OneDrive? Some users in mainland China have reported difficulties in accessing the pretrained weights because a proxy or VPN connection is needed to bypass internet censorship.

Best

Problem installing petrel-oss-sdk v2.2.1-2-g1505ef3-master from environment.yaml pip dependencies

Hi,

Thanks for your valuable work and sharing the code.
When trying to install the dependencies listed in the environment.yaml file for UPop, I encountered an error related to the petrel-oss-sdk package. Specifically, the version specified in the file (v2.2.1-2-g1505ef3-master) could not be installed using pip.

I would appreciate any suggestions or insights on how to resolve this issue and successfully install the petrel-oss-sdk package from the environment.yaml file.

best

Hello, I have some questions about the progressive pruning step and hope you can clarify

Hello, excellent work!
I have a question about the progressive pruning part. In the paper, you propose using the magnitude of the accumulated gradient as the importance measure for the alpha parameters, but the paper does not seem to state whether the elements changed when alpha is updated are those with large gradient values or those with small ones.
In the code, I see:

    sorted_alpha_grad, indices = torch.sort(alpha_grad, descending=True)
    compression_weight = torch.ones_like(indices)
    compression_weight[indices < alpha_grad_attn.numel()] = 36  # 36 = 12 (number of heads) * [1 (weights of query) + 1 (weights of key) + 1 (weights of value)]
    threshold = sorted_alpha_grad[torch.argmin(torch.abs(torch.cumsum(compression_weight, 0) - torch.sum(compression_weight)*pi))]

    def update(module, grad):
        mask = ((grad <= threshold) | (grad <= torch.min(grad)))
        module.data.copy_(mask + (~mask)*(1 - pi/p))

This part seems to show that during the update it is the parameters whose gradients are larger than the threshold that are changed and pushed toward 0. This confuses me: why are the parameters with larger gradients considered less important, so that they can be driven to 0? I hope you can clarify.

GPU memory is exhausted

Hello, when I tried running your code, my GPU memory was exhausted. The error message is as follows:

RuntimeError: CUDA out of memory. Tried to allocate 290.00 MiB (GPU 0; 31.75 GiB total capacity; 29.78 GiB already allocated; 113.50 MiB free; 30.27 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I traced the out-of-memory error to the line loss = model(image, caption, alpha=alpha, idx=idx)

How can I resolve this out-of-memory problem? Thanks!
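
Besides lowering the batch sizes as described in the Common Issues section above, the error message itself suggests experimenting with max_split_size_mb to reduce allocator fragmentation. A hypothetical way to try that from Python, before any CUDA memory is allocated, is shown below; the value 128 is purely illustrative.

```python
# Must be set before the first CUDA allocation; reducing batch sizes remains
# the primary remedy, this only mitigates allocator fragmentation.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # illustrative value
```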

Loading checkpoints

Hello!
After compression, the model structure changes as well. Does that mean the checkpoints can no longer be loaded with the original build-model function?
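
One hypothetical way to investigate this (not an official utility of the repository) is to inspect the tensor shapes stored in a compressed checkpoint before deciding how to rebuild the model; the path below reuses the 2x NLVR2 checkpoint from the evaluation example earlier in this README, and the "model" key handling is an assumption since checkpoint layouts can differ.

```python
# Peek at a compressed checkpoint to see how the pruned shapes differ from the
# original architecture.
import torch

ckpt = torch.load(
    "output/nlvr_nlvr2_compression_2x/model_base_nlvr_nlvr2_2x_compressed.pth",
    map_location="cpu",
)
state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
for name, tensor in list(state_dict.items())[:10]:   # first few entries only
    print(name, tuple(tensor.shape))
```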
