
eva's Introduction

EVA: Visual Representation Fantasies from BAAI


Contact

  • We are hiring at all levels at the BAAI Vision Team, including full-time researchers, engineers, and interns. If you are interested in working with us on foundation models, self-supervised learning, and multimodal learning, please contact Xinlong Wang ([email protected]).

License

The content of this project is licensed under the terms described in LICENSE.


eva's People

Contributors

asden, camielk, caoyue10, encounter1997, ivysochyn, jiahuichen-github, quan-sun, robert-zwr, wxinlong, yuxin-cv

eva's Issues

The MIM target for EVA

Sorry to bother you. I notice that EVA uses the projected CLIP feature as the MIM target. I am wondering why you don't use the feature before projection, since that is the original representation of the ViT model. Was this choice determined by experimental performance?
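For context, a minimal sketch of the objective under discussion (illustrative names; EVA regresses CLIP vision features at the masked-out positions, written here with a cosine-similarity loss):

import torch
import torch.nn.functional as F

def mim_loss(student_feats, clip_feats, mask):
    """Toy MIM objective: regress the (projected) CLIP feature at masked patches.

    student_feats: (B, N, D) predictions from the student ViT
    clip_feats:    (B, N, D) targets from the frozen CLIP vision tower
    mask:          (B, N) bool, True where a patch was masked out
    """
    pred = F.normalize(student_feats[mask], dim=-1)  # (M, D), M = total masked patches
    target = F.normalize(clip_feats[mask], dim=-1)
    return -(pred * target).sum(dim=-1).mean()       # negative cosine similarity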

Have you tried Swin Transformer?

In the current EVA version, the backbone is a vanilla ViT, but I noticed that one of the authors is an inventor of Swin Transformer.
Have you tried replacing the ViT with a Swin Transformer, and how does EVA perform in that case?
Thank you!

COCO object detection image size and inference speed

First of all, thank you for publishing all this hard work!

After some detectron2 struggles (it has to be built from the EVA fork in /EVA/det/, not from the facebookresearch repo!) I managed to run the COCO evaluation example and reproduce your object detection and instance segmentation results. Amazing stuff!!

My first question is regarding the image size. The detectron2 default image size for COCO is 1024. In your paper you say that you fine-tuned on COCO and LVIS using 1280² inputs, but in the code example published here the default image size is 1536 (cascade_mask_rcnn_vitdet_eva_1536.py). Why did you use 1536 here?

I did some inference experiments and it seems 1280 is indeed an optimal image size for inference:

image_size   inference speed (3090)   bbox mAP (on 100 val2017 images only)
1536         ~2.35 s/iter             67.1
1280         ~1.05 s/iter             67.4
1024         ~1.23 s/iter             66.5
612          ~0.95 s/iter             59.8

Second question: I used AMP to autocast the torch model to FP16, which already reduced the 1280² inference time from 1.05 s/iter to 0.75 s/iter. I am interested in further optimizing the model for faster inference. Do you think it is possible to optimize your model using TensorRT or ONNX?
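For reference, a minimal sketch of the FP16 autocast inference described above (a torchvision classifier stands in for the detector; the same pattern applies to the detectron2 model):

import torch
import torchvision

model = torchvision.models.resnet50().cuda().eval()  # stand-in for the detector
x = torch.randn(1, 3, 1280, 1280, device='cuda')

# Autocast runs eligible ops in FP16 while keeping numerically sensitive ops in FP32.
with torch.no_grad(), torch.cuda.amp.autocast():
    y = model(x)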

Thanks again!

EVA 2.0-CLIP

Hi, nice work on EVA-2.0! Is there any plan to train EVA 2.0 on CLIP, or to scale EVA 2.0 to giant size?

Can't load instance segmentation model

Hello, thanks for the amazing work :)

I ran into a problem when trying to load the instance segmentation model from its .py config file, since I want to do single-image inference. Can you please help me? My code is:

from detectron2.config import LazyConfig
from detectron2.modeling import build_model

# I would like to use this model: EVA/det/projects/ViTDet/configs/COCO/cascade_mask_rcnn_vitdet_eva.py 
config_file = 'path/to/COCO/instance_segmentation model'
checkpoint_file = './drive/MyDrive/eva_coco_seg.pth'

cfg = LazyConfig.load(config_file)
model = build_model(cfg)

And I get the following error:

/usr/local/lib/python3.8/dist-packages/omegaconf/dictconfig.py in _get_node(self, key, validate_access, validate_key, throw_on_missing_value, throw_on_missing_key)
    478         if value is None:
    479             if throw_on_missing_key:
--> 480                 raise ConfigKeyError(f"Missing key {key!s}")
    481         elif throw_on_missing_value and value._is_missing():
    482             raise MissingMandatoryValue("Missing mandatory value: $KEY")

ConfigAttributeError: Missing key MODEL
    full_key: MODEL
    object_type=dict

I also tried other detectron2 methods for loading the model, but they still do not work.
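A possible explanation (hedged; untested against this exact config): LazyConfig-style .py configs expose the model under cfg.model rather than the yacs MODEL node that build_model expects, which matches the ConfigAttributeError above. A sketch along these lines should avoid it:

from detectron2.checkpoint import DetectionCheckpointer
from detectron2.config import LazyConfig, instantiate

cfg = LazyConfig.load('path/to/COCO/cascade_mask_rcnn_vitdet_eva.py')
model = instantiate(cfg.model)  # lazy configs keep the model under cfg.model, not MODEL
DetectionCheckpointer(model).load('./drive/MyDrive/eva_coco_seg.pth')
model.eval()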

The setting of the pt_hw_seq_len hyperparameter

Hi, thanks for the great work. I am confused about the meaning of pt_hw_seq_len. Is there a specific reason why it is set to 16, e.g., following the pre-training setting, where 224/14 = 16? Can it be set to 1, or be equal to ft_seq_len? In those cases, the position t would be normalized to 0-1 or left unnormalized.
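For context, a sketch of how such a rescaling is often implemented (illustrative only, not necessarily EVA's exact code): the fine-tuning grid is mapped into the coordinate range seen during pre-training, so the rotary frequencies match those the model was trained with.

import torch

def rope_positions(ft_seq_len: int, pt_hw_seq_len: int = 16) -> torch.Tensor:
    # Rescale fine-tuning grid positions into the pre-training coordinate range
    # (224 px / patch size 14 = 16 positions per axis during pre-training).
    return torch.arange(ft_seq_len) / ft_seq_len * pt_hw_seq_len

Under this reading, pt_hw_seq_len = 1 would normalize positions to [0, 1), and pt_hw_seq_len = ft_seq_len would leave them unnormalized, which matches the two cases the question describes.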

Some questions about EVA pretraining

  1. In your opinion, is EVA a method of both model scaling and data scaling? Does pretraining with more data (such as the data used in CLIP fine-tuning) yield better results than using only the 30M images described in the paper? What data is optimal for EVA?
  2. How does the teacher model influence the performance of the student model? Is it possible to replace the CLIP model with a supervised model trained on IN21k?
  3. In #19, you mentioned that ViT has many desirable properties, but some researchers have observed that scaling up ViT models can lead to instability during fp16 training (such as overflow in the forward pass and underflow in the backward pass). Why do you continue to choose ViT as the backbone over alternatives like Swin Transformer, and could you elaborate on ViT's "good properties"? Thank you.

detection inference error

Hi, thanks for the great work!

I tried detection inference with the following code

!python demo/demo.py \
    --config-file projects/ViTDet/configs/COCO/cascade_mask_rcnn_vitdet_eva.py \
    --input test.jpg \
    --output detect_test \
    --opts MODEL.WEIGHTS eva_coco_det.pth

And the following error occurred.

yaml.parser.ParserError: expected '<document start>', but found '<scalar>'
  in "projects\ViTDet\configs\COCO\cascade_mask_rcnn_vitdet_eva.py", line 6, column 5

Do you know why this error occurs?
Or can you show me sample detection inference code?

Thank you very much!
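A likely cause (an inference from the traceback, not confirmed by the maintainers here): demo.py builds its config with cfg.merge_from_file, which parses --config-file as YAML, so passing a LazyConfig-style .py file makes the YAML parser choke on Python source. Inference with these configs needs LazyConfig.load plus a small custom script, along the lines of the loading sketch above and the single-image sketch further below.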

warning while training

Hi there,
I am experiencing this warning while training:

[01/13 09:56:11 d2.engine.train_loop]: Starting training from iteration 0
/usr/local/lib/python3.8/dist-packages/shapely/set_operations.py:133: RuntimeWarning: invalid value encountered in intersection
  return lib.intersection(a, b, **kwargs)
(the shapely warning above repeats many times throughout training)
/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
[01/13 09:56:45 d2.utils.events]: eta: 20:18:34 iter: 19 total_loss: 14.61 loss_cls_stage0: 4.223 loss_box_reg_stage0: 0.002018 loss_cls_stage1: 4.189 loss_box_reg_stage1: 0.003367 loss_cls_stage2: 4.404 loss_box_reg_stage2: 0.00404 loss_mask: 0.693 loss_rpn_cls: 0.6935 loss_rpn_loc: 0.3204 time: 1.6258 data_time: 0.0255 lr: 9.7405e-07 max_mem: 28114M

.py file for single image inference

Thank you so much for releasing this, I can't wait to try it on my own data. Are you planning to release a .py script for single-image inference?
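Until an official script lands, here is a minimal single-image inference sketch under stated assumptions (the config path and checkpoint name are illustrative; the input follows detectron2's standard list-of-dicts format with a BGR, CHW, uint8 image):

import cv2
import torch
from detectron2.checkpoint import DetectionCheckpointer
from detectron2.config import LazyConfig, instantiate

cfg = LazyConfig.load('projects/ViTDet/configs/COCO/cascade_mask_rcnn_vitdet_eva.py')
model = instantiate(cfg.model)
DetectionCheckpointer(model).load('eva_coco_det.pth')  # hypothetical checkpoint path
model = model.eval().cuda()

img = cv2.imread('test.jpg')  # BGR, HWC, uint8 -- detectron2's default input format
inputs = [{
    'image': torch.as_tensor(img.transpose(2, 0, 1).copy()),  # HWC -> CHW tensor
    'height': img.shape[0],  # resolution at which predictions are returned
    'width': img.shape[1],
}]
with torch.no_grad():
    instances = model(inputs)[0]['instances']  # boxes, scores, classes, masks

Note that a real pipeline would also resize the input the way the eval dataloader does (e.g., ResizeShortestEdge, as in detectron2's DefaultPredictor); this sketch feeds the raw image for brevity.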

Installation on Colab

Please, I am having issues installing EVA.
Can anyone help me install and configure EVA in a Google Colab notebook?
If possible, please share a Colab notebook.

assert result.hostname is not None

Thanks for your excellent work!
When I use --num-gpus=2, I get the following error. I can't solve the problem and hope to get your help.

Traceback (most recent call last):
  File "tools/lazyconfig_train_net.py", line 125, in <module>
    launch(
  File "/public/home/1/code/EVA/det/detectron2/engine/launch.py", line 67, in launch
    mp.spawn(
  File "/public/home/1/Anaconda3/envs/EVA/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/public/home/1/Anaconda3/envs/EVA/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/public/home/1/Anaconda3/envs/EVA/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/public/home/1/Anaconda3/envs/EVA/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/public/home/1/code/EVA/det/detectron2/engine/launch.py", line 108, in _distributed_worker
    raise e
  File "/public/home/1/code/EVA/det/detectron2/engine/launch.py", line 98, in _distributed_worker
    dist.init_process_group(
  File "/public/home/1/Anaconda3/envs/EVA/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 520, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
  File "/public/home/1/Anaconda3/envs/EVA/lib/python3.8/site-packages/torch/distributed/rendezvous.py", line 141, in _tcp_rendezvous_handler
    assert result.hostname is not None
AssertionError

This is the command I run:

python tools/lazyconfig_train_net.py --num-gpus 2 \
    --num-machines 1 --machine-rank 0 --dist-url "tcp://$MASTER_ADDR:60900" \
    --config-file projects/ViTDet/configs/COCO/cascade_mask_rcnn_vitdet_eva.py
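A plausible cause (an assumption based on where the assertion lives, not something confirmed in this thread): if MASTER_ADDR is unset in the shell, --dist-url expands to "tcp://:60900", whose parsed hostname is None, which is exactly the assertion that fails:

from urllib.parse import urlparse

# With MASTER_ADDR unset, the rendezvous URL has an empty host:
print(urlparse('tcp://:60900').hostname)           # None -> the assert fires
print(urlparse('tcp://127.0.0.1:60900').hostname)  # '127.0.0.1' -> the assert passes

Exporting MASTER_ADDR (e.g., to 127.0.0.1 on a single machine) before launching would be the first thing to check.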

EVA trained on Objects365

Hello,

I am trying to work with EVA pretrained on Objects365, but I have a few questions about it. When I load the model I get this message:


Some model parameters or buffers are not found in the checkpoint:
roi_heads.mask_head.deconv.{bias, weight}
roi_heads.mask_head.mask_fcn1.norm.{bias, weight}
roi_heads.mask_head.mask_fcn1.weight
roi_heads.mask_head.mask_fcn2.norm.{bias, weight}
roi_heads.mask_head.mask_fcn2.weight
roi_heads.mask_head.mask_fcn3.norm.{bias, weight}
roi_heads.mask_head.mask_fcn3.weight
roi_heads.mask_head.mask_fcn4.norm.{bias, weight}
roi_heads.mask_head.mask_fcn4.weight
roi_heads.mask_head.predictor.{bias, weight}

Does that mean the model has not been trained with a Mask R-CNN architecture? So, only object detection pretraining?

Would it be possible to share the lazy config file of this checkpoint ?

Thanks!

Representation of Image in EVA

Hi, the EVA model is trained with image tokens, and you use average pooling for the image representation during fine-tuning, so I think the CLS token is not well learned. But EVA-CLIP is initialized from the EVA model, whose CLS token is not well learned, and uses the CLS token for pre-training. I am wondering why you do not use average pooling for the image representation to align with the text?
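To make the two representations being compared concrete (toy tensors standing in for real ViT outputs):

import torch

feats = torch.randn(2, 1 + 196, 768)  # toy ViT output: CLS token followed by 14x14 patch tokens
cls_repr = feats[:, 0]                # CLS-token representation (what the question says EVA-CLIP aligns)
avg_repr = feats[:, 1:].mean(dim=1)   # average-pooled patch representation (used during fine-tuning)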

How to handle position encoding at different resolutions?

Thanks to the authors for the great work!

In EVA-02, the model changes the position encoding from RPE to RoPE. I'd like to understand how this change behaves when the input resolution differs from the pre-training one. Is it possible to feed any resolution for zero-shot tasks, or does the model have to be fine-tuned for different resolutions, as in detection or segmentation?
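One property worth noting (a general fact about RoPE, not a claim about EVA-02's exact implementation): rotary embeddings keep no learned per-position table; they rotate query/key feature pairs by angles computed from the position, so a new resolution only requires a new position grid (optionally rescaled, as in the pt_hw_seq_len sketch above) rather than resizing a learned parameter tensor the way RPE tables must be. A minimal 1D sketch:

import torch

def apply_rope_1d(x: torch.Tensor, pos: torch.Tensor) -> torch.Tensor:
    """Rotate consecutive feature pairs of x (..., N, D) by position-dependent angles."""
    dim = x.shape[-1]
    inv_freq = 10000.0 ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    angles = pos[:, None] * inv_freq                # (N, D/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    return torch.stack([x1 * cos - x2 * sin,
                        x1 * sin + x2 * cos], dim=-1).flatten(-2)

# Works for any sequence length: only the position grid changes with resolution.
q = torch.randn(8, 196, 64)  # (heads, tokens, head_dim)
q_rot = apply_rope_1d(q, torch.arange(196, dtype=torch.float32))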

The pos_emb used in EVA pretraining and EVA det

Hello, thanks for your great work! Sorry to bother you, but I have some doubts about the position embeddings in the code.
(1) I find that the pos_emb used in EVA pretraining is abs_pos_emb: line 68 sets parser.set_defaults(abs_pos_emb=True)
(https://github.com/baaivision/EVA/blob/master/eva/run_eva_pretraining.py).
As far as I know, the original BEiT code uses rel_pos. Why did you make this modification? Is this setting helpful for pre-training or for some downstream tasks?
(2) I found that when performing detection tasks (eva_det), the ViT's pos_emb uses rel_pos, which is inconsistent with the pre-training stage.
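For readers following the discussion, a schematic of the two variants being contrasted (toy shapes only, not EVA's actual modules):

import torch

B, N, D, H = 2, 196, 768, 12
tokens = torch.randn(B, N, D)

# Absolute position embedding: one learned vector per position, added to the tokens once.
abs_pos_emb = torch.zeros(1, N, D)
tokens = tokens + abs_pos_emb

# Relative position bias: a learned per-head (N, N) bias added to the attention logits instead,
# so the model encodes offsets between tokens rather than absolute locations.
rel_pos_bias = torch.zeros(H, N, N)
attn_logits = torch.randn(B, H, N, N) + rel_pos_bias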

About the pre-training models

Hello, could you please post the pre-training models, e.g., ViT-Base/16 or ViT-Large/16?

Thanks

Error during fine-tuning EVA object detector

Hello,

I am fine-tuning EVA on my custom dataset. I ran into the following error (it also happens when fine-tuning on COCO):

File "train.py", line 187, in <module>
  main(args)
File "train.py", line 169, in main
  trainer.train(0, cfg.train.max_iter)
File "/home/appuser/eva_repo/det/detectron2/engine/train_loop.py", line 149, in train
  self.run_step()
File "/home/appuser/eva_repo/det/detectron2/engine/train_loop.py", line 421, in run_step
  self.grad_scaler.scale(losses).backward()
File "/home/appuser/.local/lib/python3.7/site-packages/torch/_tensor.py", line 255, in backward
  torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/home/appuser/.local/lib/python3.7/site-packages/torch/autograd/__init__.py", line 149, in backward
  allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
File "/home/appuser/.local/lib/python3.7/site-packages/torch/autograd/function.py", line 87, in apply
  return self._forward_cls.backward(self, *args)  # type: ignore[attr-defined]
File "/home/appuser/.local/lib/python3.7/site-packages/fairscale/nn/checkpoint/checkpoint_activations.py", line 331, in backward
  outputs = ctx.run_function(*unpacked_args, **unpacked_kwargs)
File "/home/appuser/eva_repo/det/detectron2/modeling/backbone/vit.py", line 289, in forward
  x = self.attn(x)
File "/home/appuser/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
  return forward_call(*input, **kwargs)
File "/home/appuser/eva_repo/det/detectron2/modeling/backbone/vit.py", line 139, in forward
  attn = add_decomposed_rel_pos(attn, q, self.rel_pos_h, self.rel_pos_w, (H, W), (H, W))
File "/home/appuser/eva_repo/det/detectron2/modeling/backbone/utils.py", line 133, in add_decomposed_rel_pos
  Rh = get_rel_pos(q_h, k_h, rel_pos_h)
File "/home/appuser/eva_repo/det/detectron2/modeling/backbone/utils.py", line 100, in get_rel_pos
  z = rel_pos[:, i].view(src_size).cpu().float().numpy()
RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.

The error is caused by this line:

z = rel_pos[:, i].view(src_size).cpu().float().numpy()

I fixed it by changing it to:

z = rel_pos[:, i].view(src_size).cpu().float().detach().numpy()

After this change, the training runs without issue and the loss decreases steadily.

But I am not sure that I understand the full implications of this change.
Calling .detach() means that gradients do not flow through this tensor. Or is that not an issue for this call? Did you not get this error during training?
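For reference, a minimal demonstration of what .detach() does (generic PyTorch, not EVA code):

import torch

x = torch.randn(3, requires_grad=True)
# x.numpy()     # RuntimeError: can't call numpy() on a tensor that requires grad
y = x.detach()  # same storage, but cut out of the autograd graph
z = y.numpy()   # fine; z shares memory with x

# The caveats: no gradient flows back through y or z, and in-place edits to z would
# silently modify x's data -- both worth checking when detaching inside forward().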

I am running EVA inside Docker with CUDA 11.1, Python 3.7, torch 1.9.0, torchvision 0.10.0, and mmcv-full 1.6.1, but I doubt this is a versioning issue.

Distillation of large ViT

Hello,

In your opinion, what is the best way to distill a large vision transformer (e.g., ViT-g) into a small one (e.g., ViT-B)?

There seem to be many alternatives: MIM as in EVA, a distillation token as in DeiT, more classical response-based or feature-based methods, etc.

Thanks,
Simon

CUDA out of memory

Thanks for your excellent work. When I train on custom datasets and reduce the total batch size to 8 (one node, 8 A100s), the error still happens. I don't know why.

Inference with EVA

Can anyone please show me how to perform inference (which script or command to use) on a single image or multiple images after training the EVA model on my custom datasets?

bug: detectron2 build fails after cleanup commit

Hello,

The recent cleanup commit (73b3708) causes the detectron2 build to fail.

Steps to reproduce:

cd /path/to/EVA/det
python -m pip install -e .

Result:

Obtaining file:///home/appuser/eva_repo/det
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [8 lines of output]
      running egg_info
      creating /tmp/pip-pip-egg-info-6d_v4u3e/detectron2.egg-info
      writing /tmp/pip-pip-egg-info-6d_v4u3e/detectron2.egg-info/PKG-INFO
      writing dependency_links to /tmp/pip-pip-egg-info-6d_v4u3e/detectron2.egg-info/dependency_links.txt
      writing requirements to /tmp/pip-pip-egg-info-6d_v4u3e/detectron2.egg-info/requires.txt
      writing top-level names to /tmp/pip-pip-egg-info-6d_v4u3e/detectron2.egg-info/top_level.txt
      writing manifest file '/tmp/pip-pip-egg-info-6d_v4u3e/detectron2.egg-info/SOURCES.txt'
      error: package directory 'projects/PointRend/point_rend' does not exist
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Expected result:
detectron2 is installed without error

Proposed fix:
Change from

EVA/det/setup.py, lines 141 to 145 (at 56ee4e7):

PROJECTS = {
    "detectron2.projects.point_rend": "projects/PointRend/point_rend",
    "detectron2.projects.deeplab": "projects/DeepLab/deeplab",
    "detectron2.projects.panoptic_deeplab": "projects/Panoptic-DeepLab/panoptic_deeplab",
}

To

PROJECTS = {}

There is no dependency on these other projects, so the references to them can be safely removed.

Performance with MAE style pretraining

Hi, I noticed that during EVA pretraining there are two settings: MAE style and BEiT style. I am wondering about the performance of the MAE style; is there any comparison between these two styles of pretraining?
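To make the distinction concrete, a schematic of the two masking styles (toy tensors; not the repository's actual data pipeline):

import torch

B, N, D = 2, 196, 768
patches = torch.randn(B, N, D)
mask = torch.rand(N) < 0.4   # True = masked patch (shared across the batch here for simplicity)
mask_token = torch.zeros(D)

# BEiT style: the encoder sees the full-length sequence; masked patches become a mask token.
x_beit = torch.where(mask[None, :, None], mask_token, patches)  # (B, N, D)

# MAE style: masked patches are dropped; the encoder only sees the visible patches.
x_mae = patches[:, ~mask]                                       # (B, N_visible, D)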

About the FPS

Thank you for your great work! By the way, I want to ask about your FPS, because I cannot find it in the paper. I got 0.7 s per task on my own A5000 (one card). That seems slower than other methods; or did I forget to change some config?

Looking forward to your reply! @Yuxin-CV

Typos in EVA/EVA-02/asuka/README.md

You always give the pretraining scripts as "python -m torch.distributed.launch --nproc_per_node=8 --nnodes=${WORLD_SIZE} --node_rank=${RANK} --master_addr=${MASTER_ADDR} --master_port=12345 --use_env run_beit_pretraining.py". I believe you mean run_eva_pretraining.py?

About the EVA-02 paper on arXiv

I sincerely suggest changing every mention of the title (EVA-02) in this paper back to black. So many words in red are extremely distracting and hurt the overall reading experience.

How long does it take to train EVA-02?

Great job on your solid work! I am truly impressed! :)

Although I could find GPU-time statistics in the EVA-01 paper, I couldn't locate the pre-training compute used for EVA-02.

The usage of BEiT_win

Hi, it is very kind of you to provide BEiT with window attention. Did you try this model in segmentation for more memory-efficient usage, and how is the accuracy? In addition, you seem to drop the cls embedding and change the number of relative position biases from ((2Wh-1) * (2Ww-1) + 3) to ((2Wh-1) * (2Ww-1)). So the relative position bias will be initialized randomly rather than loaded from the pre-trained weights?
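For concreteness, the two table sizes being discussed (following BEiT's convention, where the three extra entries handle cls-to-token, token-to-cls, and cls-to-cls attention):

# Relative position bias table sizes for a window of (Wh, Ww) patches:
Wh = Ww = 14
num_with_cls = (2 * Wh - 1) * (2 * Ww - 1) + 3  # BEiT: +3 entries for the cls token
num_without_cls = (2 * Wh - 1) * (2 * Ww - 1)   # windowed variant without the cls embedding
print(num_with_cls, num_without_cls)            # 732 729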
