
recon's Introduction

🪖 ReCon: Contrast with Reconstruct


Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining (ICML 2023)
Zekun Qi*, Runpei Dong*, Guofan Fan, Zheng Ge, Xiangyu Zhang, Kaisheng Ma and Li Yi

OpenReview | arXiv | Models

This repository contains the code release of Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining (ICML 2023). ReCon is also short for reconnaissance 🪖.

Contrast with Reconstruct

News

  • 💥 Mar, 2024: Check out our latest work ShapeLLM, which achieves 95.25% fine-tuned accuracy and 65.4% zero-shot accuracy on ScanObjectNN
  • 📌 Aug, 2023: Check out our exploration of efficient conditional 3D generation VPP
  • 📌 Jun, 2023: Check out our exploration of pre-training in 3D scenes Point-GCC
  • 🎉 Apr, 2023: ReCon accepted by ICML 2023
  • 💥 Feb, 2023: Check out our previous work ACT, which has been accepted by ICLR 2023

1. Requirements

PyTorch >= 1.7.0; Python >= 3.7; CUDA >= 9.0; GCC >= 4.9; torchvision

# Quick Start
conda create -n recon python=3.8 -y
conda activate recon

conda install pytorch==1.10.0 torchvision==0.11.0 cudatoolkit=11.3 -c pytorch -c nvidia
# pip install torch==1.10.0+cu113 torchvision==0.11.0+cu113 torchaudio==0.10.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html
# Install basic required packages
pip install -r requirements.txt
# Chamfer Distance
cd ./extensions/chamfer_dist && python setup.py install --user
# PointNet++
pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"

2. Datasets

We use ShapeNet, ScanObjectNN, ModelNet40 and ShapeNetPart in this work. See DATASET.md for details.

3. ReCon Models

| Task | Dataset | Config | Acc. | Checkpoints Download |
| --- | --- | --- | --- | --- |
| Pre-training | ShapeNet | pretrain_base.yaml | N.A. | ReCon |
| Classification | ScanObjectNN | finetune_scan_hardest.yaml | 91.26% | PB_T50_RS |
| Classification | ScanObjectNN | finetune_scan_objbg.yaml | 95.35% | OBJ_BG |
| Classification | ScanObjectNN | finetune_scan_objonly.yaml | 93.80% | OBJ_ONLY |
| Classification | ModelNet40 (1k) | finetune_modelnet.yaml | 94.5% | ModelNet_1k |
| Classification | ModelNet40 (8k) | finetune_modelnet_8k.yaml | 94.7% | ModelNet_8k |
| Zero-Shot | ModelNet10 | zeroshot_modelnet10.yaml | 75.6% | ReCon zero-shot |
| Zero-Shot | ModelNet10* | zeroshot_modelnet10.yaml | 81.6% | ReCon zero-shot |
| Zero-Shot | ModelNet40 | zeroshot_modelnet40.yaml | 61.7% | ReCon zero-shot |
| Zero-Shot | ModelNet40* | zeroshot_modelnet40.yaml | 66.8% | ReCon zero-shot |
| Zero-Shot | ScanObjectNN | zeroshot_scan_objonly.yaml | 43.7% | ReCon zero-shot |
| Linear SVM | ModelNet40 | svm.yaml | 93.4% | ReCon svm |
| Part Segmentation | ShapeNetPart | segmentation | 86.4% mIoU | part seg |

| Task | Dataset | Config | 5w10s (%) | 5w20s (%) | 10w10s (%) | 10w20s (%) | Download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Few-shot learning | ModelNet40 | fewshot.yaml | 97.3 ± 1.9 | 98.9 ± 1.2 | 93.3 ± 3.9 | 95.8 ± 3.0 | ReCon |

The checkpoints and logs have been released on Google Drive. You can use the voting strategy during classification testing to reproduce the performance reported in the paper. For classification downstream tasks, we randomly select 8 seeds and keep the best checkpoint. For zero-shot learning, * means that all of the train/test data are used for zero-shot transfer.
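
Before fine-tuning, it can help to confirm that a downloaded checkpoint loads at all. A hedged snippet (not from the repository; the stored key names are whatever the release uses, and this only prints what is inside):

# Hedged snippet: load a downloaded checkpoint on CPU and list its top-level keys.
import torch
ckpt = torch.load('<path/to/pre-trained/model>', map_location='cpu')
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])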

4. ReCon Pre-training

To pre-train with the default configuration, run the script:

sh scripts/pretrain.sh <GPU> <exp_name>

If you want to try different models, masking ratios, etc., first create a new config file and pass its path to --config:

CUDA_VISIBLE_DEVICES=<GPU> python main.py --config <config_path> --exp_name <exp_name>
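
The configs are plain YAML, so one way to derive a variant is to load an existing file, change a field, and write a new one. A sketch only; the file path and the key path for the masking ratio below are assumptions, so check the actual config schema before relying on them:

# Hedged sketch: derive a new pre-training config from an existing one.
import yaml

with open('cfgs/pretrain_base.yaml') as f:         # path is illustrative
    cfg = yaml.safe_load(f)
cfg['model']['mask_ratio'] = 0.75                  # hypothetical key path; verify in the real config
with open('cfgs/pretrain_custom.yaml', 'w') as f:
    yaml.safe_dump(cfg, f)

You would then pass the new file (here cfgs/pretrain_custom.yaml) via --config as shown above.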

5. ReCon Classification Fine-tuning

To fine-tune with the default configuration, run the script:

bash scripts/cls.sh <GPU> <exp_name> <path/to/pre-trained/model>

Alternatively, you can use the commands below.

To fine-tune on ScanObjectNN, run:

CUDA_VISIBLE_DEVICES=<GPUs> python main.py --config cfgs/full/finetune_scan_hardest.yaml \
--finetune_model --exp_name <exp_name> --ckpts <path/to/pre-trained/model>

To fine-tune on ModelNet40, run:

CUDA_VISIBLE_DEVICES=<GPUs> python main.py --config cfgs/full/finetune_modelnet.yaml \
--finetune_model --exp_name <exp_name> --ckpts <path/to/pre-trained/model>

6. ReCon Test & Voting

To test with voting using the default configuration, run the script:

bash scripts/test.sh <GPU> <exp_name> <path/to/best/fine-tuned/model>

or:

CUDA_VISIBLE_DEVICES=<GPUs> python main.py --test --config cfgs/finetune_modelnet.yaml \
--exp_name <output_file_name> --ckpts <path/to/best/fine-tuned/model>
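
Conceptually, test-time voting averages the classifier's logits over several augmented copies of each point cloud and takes the argmax of the average. The exact augmentation and number of votes used by the test script may differ; the sketch below assumes simple random rescaling:

# Hedged illustration of test-time voting; ReCon's test script may use different augmentations.
import torch

def vote_predict(model, points, n_votes=10, scale_range=(0.8, 1.2)):
    """points: (B, N, 3) tensor; model returns (B, num_classes) logits."""
    logits_sum = 0
    for _ in range(n_votes):
        scale = torch.empty(points.size(0), 1, 1, device=points.device).uniform_(*scale_range)
        logits_sum = logits_sum + model(points * scale)
    return (logits_sum / n_votes).argmax(dim=-1)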

7. ReCon Few-Shot

For few-shot learning with the default configuration, run the script:

sh scripts/fewshot.sh <GPU> <exp_name> <path/to/pre-trained/model> <way> <shot> <fold>

or:

CUDA_VISIBLE_DEVICES=<GPUs> python main.py --config cfgs/full/fewshot.yaml --finetune_model \
--ckpts <path/to/pre-trained/model> --exp_name <exp_name> --way <5 or 10> --shot <10 or 20> --fold <0-9>
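
Few-shot numbers such as 97.3 ± 1.9 in the results table are the mean and standard deviation of accuracy over the 10 folds (the usual protocol for this benchmark), so all 10 folds need to be run for a given way/shot setting. A small helper for the aggregation (the fold accuracies below are made-up placeholders):

# Hedged helper for aggregating per-fold few-shot accuracies into "mean ± std".
import statistics

def summarize_folds(accuracies):
    """accuracies: per-fold accuracies in percent, one value per fold."""
    return f"{statistics.mean(accuracies):.1f} ± {statistics.stdev(accuracies):.1f}"

print(summarize_folds([97.2, 98.0, 95.6, 99.2, 96.8, 97.6, 98.4, 96.0, 97.5, 96.7]))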

8. ReCon Zero-Shot

For zero-shot evaluation with the default configuration, run the script:

bash scripts/zeroshot.sh <GPU> <exp_name> <path/to/pre-trained/model>
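
At a high level, zero-shot classification embeds each point cloud with the pre-trained encoder and compares it against CLIP text embeddings of the candidate category names; the class with the highest cosine similarity wins. The sketch below is schematic only; ReCon's actual prompt templates and projection heads are defined in the config and code:

# Schematic zero-shot classification by cosine similarity; details differ from ReCon's code.
import torch
import torch.nn.functional as F

def zeroshot_classify(point_feats, text_feats):
    """point_feats: (B, D) point-cloud embeddings; text_feats: (C, D) class-name text embeddings."""
    point_feats = F.normalize(point_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    return (point_feats @ text_feats.t()).argmax(dim=-1)   # (B,) predicted class indices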

9. ReCon Part Segmentation

For part segmentation on ShapeNetPart, run:

cd segmentation
bash seg.sh <GPU> <exp_name> <path/to/pre-trained/model>

or:

cd segmentation
python main.py --ckpts <path/to/pre-trained/model> --log_dir <path/to/log/dir> --learning_rate 0.0001 --epoch 300

To test part segmentation on ShapeNetPart, run:

cd segmentation
bash test.sh <GPU> <exp_name> <path/to/best/fine-tuned/model>
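
The 86.4% mIoU in the results table corresponds to instance-averaged IoU, the metric usually reported on ShapeNetPart: for each shape, IoU is computed over the parts of its category and averaged, then averaged across all shapes. A hedged sketch of the per-shape computation (the repository's evaluation code may handle edge cases differently):

# Hedged sketch of per-shape part IoU; average the returned values over all shapes for instance mIoU.
import numpy as np

def shape_miou(pred, gt, part_ids):
    """pred, gt: (N,) per-point part labels for one shape; part_ids: valid part ids of its category."""
    ious = []
    for p in part_ids:
        inter = np.sum((pred == p) & (gt == p))
        union = np.sum((pred == p) | (gt == p))
        ious.append(1.0 if union == 0 else inter / union)  # common convention: absent part counts as IoU 1
    return float(np.mean(ious))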

10. ReCon Linear SVM

For the linear SVM evaluation on ModelNet40, run:

sh scripts/svm.sh <GPU> <exp_name> <path/to/pre-trained/model> 
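
The linear SVM protocol freezes the pre-trained encoder, extracts one global feature vector per shape, and fits a linear SVM on those features. A hedged outline using scikit-learn (the regularization constant and feature extraction details are assumptions, not the script's actual settings):

# Hedged outline of linear SVM evaluation on frozen encoder features; hyperparameters are illustrative.
import numpy as np
from sklearn.svm import LinearSVC

def linear_svm_eval(train_feats, train_labels, test_feats, test_labels, C=0.01):
    """feats: (N, D) arrays of frozen encoder features; labels: (N,) integer class ids."""
    clf = LinearSVC(C=C, max_iter=10000)
    clf.fit(train_feats, train_labels)
    return float((clf.predict(test_feats) == test_labels).mean())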

11. Visualization

We use the PointVisualizaiton repo to render point cloud images, including specified color rendering and attention distribution rendering.

Contact

If you have any questions related to the code or the paper, feel free to email Zekun ([email protected]) or Runpei ([email protected]).

License

ReCon is released under the MIT License. See the LICENSE file for more details. In addition, the licensing information for the pointnet2 modules is available here.

Acknowledgements

This codebase is built upon Point-MAE, Point-BERT, CLIP, Pointnet2_PyTorch, and ACT.

Citation

If you find our work useful in your research, please consider citing:

@inproceedings{qi2023recon,
  title={Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining},
  author={Qi, Zekun and Dong, Runpei and Fan, Guofan and Ge, Zheng and Zhang, Xiangyu and Ma, Kaisheng and Yi, Li},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2023}
}

and closely related work ACT and ShapeLLM:

@inproceedings{dong2023act,
  title={Autoencoders as Cross-Modal Teachers: Can Pretrained 2D Image Transformers Help 3D Representation Learning?},
  author={Runpei Dong and Zekun Qi and Linfeng Zhang and Junbo Zhang and Jianjian Sun and Zheng Ge and Li Yi and Kaisheng Ma},
  booktitle={The Eleventh International Conference on Learning Representations (ICLR)},
  year={2023},
  url={https://openreview.net/forum?id=8Oun8ZUVe8N}
}
@article{qi2024shapellm,
  author = {Qi, Zekun and Dong, Runpei and Zhang, Shaochen and Geng, Haoran and Han, Chunrui and Ge, Zheng and Wang, He and Yi, Li and Ma, Kaisheng},
  title = {ShapeLLM: Universal 3D Object Understanding for Embodied Interaction},
  journal = {arXiv preprint arXiv:2402.17766},
  year = {2024}
}


recon's Issues

Zero-shot classification of ModelNet40

Thanks for your amazing work! I was trying to run zero-shot classification on ModelNet40 using the pipeline you provided. However, I only got 61.2% accuracy (vs. 66.8% reported in the paper) using the following script:

python main.py \
--config=cfgs/zeroshot/modelnet40.yaml \
--zeroshot \
--exp_name=zeroshot_modelnet \
--ckpts=ckpts/zeroshot_66_78.pth

Am I missing something here? Thanks!

About pretrain log files

Thanks for your amazing work! Could you provide the pretrain log files? I want to check whether I'm running it incorrectly.
Thank you!

About reproducing the experiment result

Hello, thank you for your great work. I encountered some issues while attempting to reproduce your experiment.

I downloaded your pretrained model from Google Cloud, fine-tuned it on an RTX 3090, and obtained the following results: 93.97% on OBJ_BG, 92.08% on OBJ_ONLY, and 89.97% on PB_T50_RS (without voting, seed = 0). However, I couldn't achieve comparable results to those reported in the paper, which are 95.18%, 93.63%, and 90.63%, respectively.

After reading this issue, I learned about the correct method to reproduce the results. I then attempted using seed 32174, but the results remained the same at 93.97% on OBJ_BG. In general, it seems unlikely that the seed alone would cause such a significant performance difference (e.g., 93.97% in my case vs. 95.18% in your reported results).

Could you please provide guidance on how to accurately reproduce the experiment? Thank you very much.

Unknown args and configs in the given logs

In the given logs on Google Drive, there are some unknown args and configs that can't be found in the released code, for example "args.pretrain_prompt : False" and "config.model.cls_sample : 256" in hardest_90_63.log, and "config.model.cls_embeding : False" in objbg_95_18.log. What are they? Are these args and configs related to the final results?

The pretrain ckpts choice for classification and segmentation task

Dear @qizekun ,

Thanks for your very nice work. I recently noticed that reproducing the results of point cloud pre-training usually requires a decent pre-trained checkpoint.

My question is whether the same pre-trained checkpoint is used for both the classification and the segmentation tasks.

For example, if I find that ckpt-ep250 works well for classification, is it right to also use it for part segmentation? Or, to obtain a decent part segmentation result, do I need to choose another checkpoint (e.g., ckpt-ep300)?

Thanks in advance for your answer.

Best regards and have a nice day,

I ran into a problem

When I run the program, I get an error (screenshot attached). Is my data not downloaded, or is a package not installed?
Looking forward to your reply, thank you

About ShapeNet55/34 Dataset

Thank you so much for your excellent work. When I run the pre-training code sh scripts/pretrain.sh <GPU> <exp_name>, I get an error, as shown:

[ WARN:[email protected]] global /io/opencv/modules/imgcodecs/src/loadsave.cpp (239) findDecoder imread_('/media/data/data01/wcs/data/ShapeNet55-34/shapenet_img/02747177-.png'): can't open/read file: check file path/integrity

I'm pretty sure it's a problem with the ShapeNet55/34 dataset. As you said, “the image data is different from the pointcloud data in some samples, you need to update the meta-data "ShapeNet55-34/ShapeNet-55/train.txt & test.txt" from Our Google Drive.”

But I don't know what modification I need to make. Can you tell me? Thanks.

about experiment result

Hi, I downloaded your pretrained model from Google Cloud and fine-tuned it on an NVIDIA 3090. I achieved 92.38% on the ModelNet40 SVM task and 94.49% / 92.6% / 89.62% on the ScanObjectNN tasks, with the random seed set to 0.

Is this related to the server and PyTorch environment I'm using? Or do I need to remove the random-seed setting and run multiple times?

Thank you!
